Researchers lack the money, infrastructure and coordinated standards to get to grips with an unregulated information environment. A new, centralised and properly resourced research institute is needed
Academics and policymakers need a new CERN-like central facility to study the swirl of information and news circulating online, researchers argue, or democracy risks being eroded.
Amateurish software engineering, a lack of long-term support for research infrastructure, and patchy access to tech platform data are leaving academics in the dark about what the explosion of social media content, podcasts and videos is doing to society and democracy.
“Current approaches to studying the information environment and its effects on democracy are not keeping pace with rapidly evolving threats,” says A CERN Model for Studying the Information Environment, a recent paper from the Carnegie Endowment, a US-based think tank.
In the view of co-author Jacob Shapiro, a politics professor at Princeton University, researchers have only the faintest grasp of what is going on online.
For example, academics still don’t know whether banning people from social media platforms nudges their followers to tone down controversial content, or whether it leads to an exodus to other platforms with looser controls, he said. It is just one of many crucial yet unanswered questions.
“In general we lack even basic observations of this environment,” he said. “So when platforms make changes, their impact isn’t measured, even imperfectly.”
Researchers need a “well engineered system set up to collect information on a regular basis” to answer these questions, Shapiro said. But on the whole, social scientists do not have the software engineering skills to build these data pipelines.
“What takes an industry quality engineer a week or two takes a month,” he said. “We’re using people’s time incredibly poorly.”
What’s more, researchers need far more resources than they currently have to analyse the torrent of video content uploaded onto sites like TikTok, the paper says. Instead, researchers just use transcripts of what is said, missing out on visual messages.
A further problem is that researchers have not agreed on certain standardised ways of measuring variables online, Shapiro noted, making it hard to know whether differences in results are merely due to study design.
For example, there are several different ways to detect someone’s location on social media, and each different method risks skewing results one way or the other.
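To illustrate the point, here is a minimal sketch of how two common location-inference heuristics can disagree for the same user: the free-text location written in a profile versus the geotags attached to posts. The field names and the toy gazetteer are hypothetical, chosen only for illustration, not any platform’s actual data model.

```python
# Hypothetical illustration: two ways of inferring a user's country from
# social media data, which can disagree and skew downstream results.

GAZETTEER = {"paris": "FR", "london": "GB", "springfield": "US"}  # toy lookup


def country_from_profile(profile_location: str) -> str | None:
    """Parse the free-text location a user typed into their profile."""
    key = profile_location.strip().lower()
    return GAZETTEER.get(key)  # ambiguous or joke locations return None


def country_from_geotags(post_countries: list[str]) -> str | None:
    """Take the most frequent country among a user's geotagged posts."""
    if not post_countries:
        return None
    return max(set(post_countries), key=post_countries.count)


# The same user can be assigned different countries by each method.
user = {"profile_location": "Paris", "post_countries": ["GB", "GB", "FR"]}
print(country_from_profile(user["profile_location"]))  # FR
print(country_from_geotags(user["post_countries"]))    # GB
```

Any study that picks one of these heuristics over the other will classify some users differently, which is exactly the kind of design choice that makes results hard to compare across papers.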
Monthly stats
Despite its importance, monitoring of the online information environment may be as embryonic as official unemployment statistics were in, say, the mid-twentieth century.
The US Bureau of Labor Statistics has an annual budget of more than $600 million, Shapiro pointed out, in order to divine key metrics like the inflation and unemployment rates. “We have nothing remotely comparable for understanding the information environment,” he said.
Reliable, regular statistics about the online world might be just as important. “In many areas of public policy, we find it very useful to have month to month measures of how things are going,” he said.
This might include whether users are linking to sources when asserting things online, or how much of the conversation is about politics, as opposed to sport or entertainment.
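As a toy example of what such a month-to-month measure could look like, the sketch below computes the share of posts that include a link to a source. The post schema (a `created` date and a `text` field) is a hypothetical structure chosen for illustration, not an agreed standard.

```python
import re
from datetime import date

URL_PATTERN = re.compile(r"https?://\S+")


def monthly_link_rate(posts: list[dict]) -> dict[str, float]:
    """Fraction of posts per month that contain at least one URL.

    Each post is a dict with a 'created' date and a 'text' field
    (a hypothetical schema used only for this example).
    """
    totals: dict[str, int] = {}
    with_links: dict[str, int] = {}
    for post in posts:
        month = post["created"].strftime("%Y-%m")
        totals[month] = totals.get(month, 0) + 1
        if URL_PATTERN.search(post["text"]):
            with_links[month] = with_links.get(month, 0) + 1
    return {m: with_links.get(m, 0) / n for m, n in totals.items()}


posts = [
    {"created": date(2024, 5, 2), "text": "See https://example.org for data"},
    {"created": date(2024, 5, 9), "text": "Trust me, this is true"},
]
print(monthly_link_rate(posts))  # {'2024-05': 0.5}
```

Tracked consistently over time, a simple indicator like this could serve the same role as a monthly unemployment figure: not a full picture, but a regular, comparable signal of how the conversation is changing.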
A new CERN?
The answer is a new CERN-like centralised research infrastructure, according to Shapiro and his co-author, Alicia Wanless, a senior fellow at the Carnegie Endowment. It would have the permanent funding to build reliable data pipelines. And backing from multiple countries would stop the centre being abused by any one nation that took an authoritarian turn.
Fabio Giglietto, an associate professor mapping Italian news online at the University of Urbino, thinks the idea is a good one. A “CERN-like model” would help solve the “fragmentation” of the research community, he said.
But not everyone is convinced. “I have heard people calling for a CERN for the oceans, a CERN for the fight against cancer, a CERN for AI,” said Robert-Jan Smits, president of Eindhoven University of Technology, and former director general of research and innovation at the European Commission. “But CERN is CERN and the model cannot be copied that easily.”
For a start, CERN, located on the Franco-Swiss border, is centralised in one place for a reason: its unique collection of physical experimental equipment, such as the Large Hadron Collider.
An equivalent body studying the information environment would work differently, said Smits, with “light” coordination across the many research projects already underway all over the world. “I can hardly see it function as an institution,” he said.
But Shapiro thinks building a physical centre for such work has its merits. “Many companies are finding there’s a value to physical colocation that is hard to replicate,” he said. “Universities spend huge amounts on travel to bring people in to have conversations.”
There’s also a question about goals. CERN can point to key successes, such as finding experimental evidence of the Higgs boson.
A new CERN for the information environment would have a mission “to save democracy,” the Carnegie paper says. But this would be an easy goal for Scandinavian countries, Smits pointed out, and a near impossible ask for totalitarian or authoritarian countries.
Data access
One other big hurdle for researchers of the internet is that the data they need is often kept under lock and key by the tech companies that own social media platforms. Access can be patchy, and reliant on the goodwill of tech firms.
“Unlike what happens with physics, the data needed to address the most pressing questions are owned by private companies,” said Giglietto.
The EU’s recently approved Digital Services Act should help researchers get easier access to data held by online platforms. The act contains a clause allowing researchers who have an academic affiliation and are domain experts to request data from online platforms, so long as the request does not compromise trade secrets.