On the Politics of Power, Data, and Academia
Or an overly critical take on a solid work-in-progress paper.
There is, without any question, an aura of authority that comes with data. Take any awkward Thanksgiving dinner debate over unemployed people receiving food stamps and inject some actual statistics into it. What do you get? Well, a pretty conclusive answer that no, people who are on food stamps are, for the most part, employed. Of course, there is still more that could go into such a debate, but the data more or less ends it. In fact, if your conservative uncle-in-law is anything like mine, this probably leads him to complain about your use of “fancy statistics” or “numbers.” The point is that data can be, and often is, conclusive. It provides concrete answers to critical questions, cutting through the speculation that comes with social theory or dinner table arguments.
Right?
Yesterday, I took part in my research and things (RAT) lab's reading group. In it, we read a paper titled “Understanding Creators’ Acceptance of Content Reuse.” The paper was published in the CSCW companion, a venue for works in progress on Computer-Supported Cooperative Work & Social Computing. In the work, the authors conduct a fairly large survey, 478 participants to be exact, of Instagram users' preferences on content reuse.
For those new to the world of digital platforms, it's pretty common practice to reuse data created on platforms like Instagram or Twitter. As an example, academics will commonly use Reddit or Twitter as a source for developing sentiment analysis on specific topics. To do so, they will either employ a web scraper, a tool that automatically combs through webpages and extracts data, or rely on free mass access to the data that the platform provides through its developer API. Because this usually happens without the consent of the people who created the data, it is somewhat of a moral grey area in academic research and is often discussed in such circles.
Numerous other actors also scrape and use the data. More problematic examples include using the data to train artificial intelligence models, collecting and selling it en masse to data brokers, or scraping it for search engine use. Again, most, if not all, of this happens without the informed consent of the individuals, and thus poses a problem for a data-just society. This is also exacerbated by the threats and harms that tools like AI or large language models pose. For instance, I have heard accounts of trans people's data being used to create models for the detection of trans individuals.
To try and measure people’s acceptance of data reuse, the authors created a survey detailing a number of cases in which data may be reused. These include all of the ones described above, along with a few others, like search engines and individual identification systems. Participants were then asked if each scenario was acceptable, unacceptable, or negotiable. Participants who answered negotiable were also given a number of possible requirements for the use of their data. These included receiving compensation, credit, and anonymization.
The paper then reports on the results of said survey. Most of the analysis is expository, highlighting what answers were common among the different scenarios. After a small amount of discussion about what the data shows, including its implications for policy and researchers, the authors also share their ideas for future work.
At face value, this seems like a solid academic paper. It highlights a problem within the human-computer interaction space, conducts a survey to help deepen our understanding of the problem, and proposes some insights into how said problem may be solved. Many of my fellow lab members also seemed to find the paper relatively strong. A few gripes were shared around some vagueness in the study's prompts and a lack of reporting around how participants were recruited; however, no one, apart from myself, found anything really objectionable.
So what gives? Why do I, someone who is extremely sensitive to forms of data exploitation, find this paper a bit problematic? After significant reflection and refinement, I have narrowed it down to three issues. First, and most critically, their specific methodology is completely blind to the dynamics of power. Second, the paper fails to properly contextualize any of its results, in part due to its sole reliance on quantitative work. And third, the paper takes what I would call an advocacy approach to challenging current data standards, an approach far too passive to meaningfully solve the problem.
Let’s take each point in turn.
Before arriving at the paper, I want to first discuss what is meant by the dynamics of power. While not obvious from the paper’s prose, the work is providing a challenge to the status quo. It's not uncommon for tech companies to refer to much of the data they use as “waste” or “exhaust”, a byproduct of engaging in the digital age. This framing, adopted in part to obscure the power that access to large data sets gives them, treats data as something that has little economic value, especially in small quantities. The subtext is that individuals shouldn’t worry about how their data is used or who controls it. Borrowing from the capitalist playbook, the focus is on the product and its associated value, not how it was produced.
This paper does the opposite. Drawing from critical accounts of data, labor, and political economy, it sees data as a product of labor. Individuals expend effort in the production of data, both through their content and usage, and thus data creators should have some control over it.
What we are really talking about, albeit at a significantly lower level, is power: the power to have control over how one’s data is being used. (Note that by one's data, we are utilizing the authors' broader conception of data, meaning that one’s data refers to the data produced by the actions of an individual.) Thus the paper is really asking the question: if you had the power to control your data, data that you produced, what would you do with it?
At face value, this doesn’t seem like a very complex issue. Any just society should be based on consent and protecting the rights of individuals. Therefore, the natural way to fix the issue of data exploitation is to start with individuals. And, for the most part, I agree. However, power is a complex topic, one that needs to be handled with care.
Perhaps the most important point, for me, is that power is dialectic. People's perceptions of power, both individually and systemically, are shaped by their participation, and vice versa.
A great parallel case study comes from the literature on unionization, social movements, and industrial sociology. One of the most important pieces of wisdom from 50 years of study is that what people want, or, in the unionization case, what they think they can win, is directly informed by their perceptions of their power. Do workers feel like they have a lot of power? Then they are significantly more likely to report wanting more. Do workers feel like they are lacking in power? Then they are probably going to just want a 3% raise.
Looking at this study, the paper makes no attempt to engage with any questions of power. Instead, it takes the significantly easier approach of observing, simply asking participants what they would do if given data autonomy. Nothing is provided to help us understand why users reported these desires; the paper simply highlights that they had different desires. If we want to actually develop an understanding of data reuse, we need to not just observe, but inquire. Inquire into how we arrived at this moment in time and how participants came to hold these beliefs.
A simple counterargument to my point, and one that was brought up in the associated discussion, is that it is out of scope. However, this is not an argument I find particularly convincing. What is really meant by out of scope is that the broader academic community doesn’t think it necessary to engage with these questions for publication. Perhaps I am wrong, but isn’t the point of academic research to challenge norms? Why should the minimum required to publish be a meaningful metric? Obviously, there are larger systemic issues within academia, the nature of publish or perish, and the coercive nature of capitalism that play a factor. However, I seriously doubt that conducting a number of interviews, focus groups, or other more active methods would pose a significant barrier, and any of these could help provide meaningful depth around the questions I have raised.
This is probably a good time to move into my second point, as the argument is now likely obvious: the lack of proper contextual analysis or qualitative work. I opened this essay with an anecdote on data and the power behind it. I now want to provide another anecdote, this time about the failings of data.
In 2019, workers at the Chattanooga Volkswagen plant failed to unionize, with 776 voting yes and 833 voting no. The loss came as a surprise to the UAW, as polling of the workers had estimated a super-majority of workers would vote yes. So what happened? Later reflections showed that the UAW was asking the wrong questions. Instead of looking at things like the percentage of workers who would wear a union shirt or pin, they based their analysis on self-reports of support. Thus when tensions rose and the election started, many workers switched their votes for fear of retaliation, job loss, etc.
The point of this anecdote is that individual data, data that lacks proper contextual analysis, especially when it comes to complex issues of power, politics, and economy, is practically useless. And unfortunately, the study did exactly this. It engaged with potentially high-impact questions, those that could require significant changes to many tech company business models, while providing no analysis, discussion, or literature review on any of these critical topics.
This is not to say that the study of individual wants, without intervention or education, is not useful in the production of knowledge. Nor is this to say that in crafting policy around these issues we shouldn’t focus on community wants and needs. Such study is useful and such a focus is needed; both, I think, are unarguably true. Instead, it is to critique the lack of additional research done on how this data came to be. If we want to craft policy around data ownership and rights, a motivation of this paper, it's critical not to just observe. We also need to challenge, to educate, to empower. It is only from a state of empowerment that meaningful policy, policy that actually answers the harms brought up by previous works, can be created.
One could also rectify this issue by discussing, either in the literature review or the discussion section, the potential issues of crafting policy from just this observation-driven study. One could argue that such a caveat is obvious and thus not needed. But I disagree. Drawing on Noam Chomsky’s critiques of the academy, we need to be thinking critically about how our work might also be harmful. What isn’t said can be just as important as what is said, and including a section that highlights how this work should fit within the development of policy would go a long way toward alleviating potential harms.
Finally, I want to engage with my third point, the passive nature of this paper's advocacy. This, much like my other two points, is a critique based not on the paper individually, but on its broader context within the academic system. The paper is, both implicitly and explicitly, advocating for change: change around how companies utilize data as a tool for profit, with the main mechanism for change being policy.
At face value, there is nothing wrong with advocating for policy changes. Policy is generally a pretty strong force for ensuring change. It's structural, insofar as the state is structural. It is, at least theoretically, a democratic process, and is also backed by military enforcement. However, history teaches us that policy and advocacy aren’t really the way to go about enacting large change, especially when it challenges capital. Movements that brought about large systemic change, like the civil rights or progressive movements of the 20th century, were not decided by court cases or advocacy work (although they have since been gutted by them). Instead, they were won out in the street, through protests and disruptions, through marches, and sometimes through violence.
Attempting to enact policy changes around data, especially when tech companies are the most powerful capital interest on the block, is going to require significant work. Work that goes not just beyond academia, but also into activism, into the streets.
So what is the harm of a paper not promoting activism? Well, I would argue that it gives academics, myself included, a pass to not participate, to not organize. More importantly, this framing strips us of our power. It says, “This is a problem, but it’s not ours to solve”. But, as academics, we have a lot of power, more so than we probably realize. Framing the problem as something to solve in policy reduces that power; it tells us that we can’t. Going back to my earlier point, power is dialectic: it reflects and refracts through our perceptions and the conditions around it. Choosing to argue for change through policy does little to build it and does a lot to reduce it. After all, isn’t the pen supposedly mightier than the sword?
The paper also does little to assist data activists. Perhaps the most important feature of the paper's data is that it highlights that people are not OK with the status quo, that there is something to organize around. It does not, however, give organizers or activists anything to really sink their teeth into. Some of my favorite organizing spaces (labor unions, the Debt Collective, and the Occupy Wall Street movement) all draw on academic work. The authors could have created easy-to-read figures, tied the framing of data as “waste” to corporate interests, or even engaged in actual organizing. All of these actions have been done before by academics, so why not by these authors?
I want to conclude this essay by reflecting on the purpose of writing it. An initial read of the above might conclude that I disliked the study and associated paper, that I see it as a failure. But that isn’t really accurate. Instead, I am more frustrated with the paper. It does a fantastic job of starting the conversation around data, political economy, consent, and the like. It highlights that many data creators are not OK with how their data is being used today. But instead of going that extra step, it stops short, choosing to engage with consent only at face value.
My frustration isn’t grounded in its failures, but in what could have been. The potential to empirically explore the issues around data and political economy, through both qualitative and quantitative data, is so high. So high, in fact, that I, after reading what is, by all accounts, a solid work-in-progress paper, decided to write an essay-length critique.
Perhaps the takeaway, then, is that we should keep working in this space, and the authors should keep working on this problem. That more people should get involved in this discussion, this debate. And that, in doing so, we explore not just what is, but what could be.