Time to rethink content

Sponsored by: NGI
06 Sep 2022 | News

As misinformation goes well beyond news websites, we need to critically evaluate all kinds of information and content

The fight against misinformation can be compared to a big clean-up initiative: Think of your neighbourhood covered with trash, not only from last week, but several years. The littering happens quickly, but cleaning up takes much longer.

It is important to rethink, critically, what should be considered content. Yes, a news article is clearly content. But there are many other bits of information that either precede or follow traditional news articles. And all this additional information must be reliable and trustworthy.

When falsified content goes viral, lives might be in danger

One big criticism of large social media platforms is their failure to foresee, and then act against, dynamically generated misinformation based on posts, comments and discussions. When falsified content goes viral, lives can be in danger. There are documented cases where false information was used to incite anger among a group, sometimes leading to angry mobs in the streets burning down the house of a victim of such allegations.

Therefore, all elements of what we consider trustworthy information must be verifiable. In addition to written text, the definition of content must include pictures, artwork/visuals, videos, video stills, numerical data, system data, raw or aggregated data, and algorithms. We must also include user-generated content, such as discussions, opinions and other interactions on social media. On a technical level, it is still very difficult to separate one from the other if only the words are analysed. Therefore, the metadata must be included in the analysis, too.

Most difficult: Mixing true and false information

When correct content is taken out of context and combined or mixed with false or fabricated information, automated searches find it hard to distinguish between the two. Humans might be able to see the difference, but there is no feasible way for every item of information to be checked for plausibility or truth by a human. It is therefore important to evaluate the search strings and search methods used to identify new information.

For example, technical systems can be tricked through the manipulation of publication dates. To get a search spider to pick up existing content as new, it can be sufficient to re-publish the content, potentially on a different website and under a different URL and IP address. 

If a website methodically refreshes dates for the content on its pages there are not many ways for search engines to detect this. With a little bit of know-how, it is possible to trick search spider software into believing that a recycled article has been published very recently.
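The detection problem described above can be illustrated with a small sketch. The approach below is a hypothetical illustration, not how any particular search engine works: a crawler fingerprints the normalised body text of each article and records the date it first saw that fingerprint. If the same body reappears later with a newer claimed publication date, the article is flagged as recycled rather than new. All function names here are made up for the example.

```python
import hashlib
import re
from datetime import date

# First-seen index: content fingerprint -> date the crawler first saw it.
# A real system would use a persistent store; a dict is enough for a sketch.
first_seen: dict[str, date] = {}

def fingerprint(text: str) -> str:
    """Normalise whitespace and case before hashing, so trivial
    re-formatting of a recycled article does not change the fingerprint."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def check_article(text: str, claimed_date: date) -> str:
    """Classify a crawled article as 'new', 'known' or 'recycled'."""
    fp = fingerprint(text)
    if fp not in first_seen:
        first_seen[fp] = claimed_date
        return "new"
    # Same body seen before: a newer claimed date suggests a refreshed
    # timestamp rather than genuinely new content.
    if claimed_date > first_seen[fp]:
        return "recycled"
    return "known"
```

Even this simple scheme only works if the crawler has seen the original first; it cannot help when the recycled copy, on a new URL and IP address, is encountered before the source, which is part of why the problem is hard in practice.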

Fraud detection is far from foolproof

Much of the data used by Google for search depends on what website owners and content creators provide. Although there are all kinds of fraud detection systems, they are largely unable to verify, at the source, data points for content published via content management systems. Search engines therefore have few options. Only a fraction of content sources intentionally falsify what they publish, but because they often go undetected, they can do a great deal of harm.

Why falsify content? Presumably, the main and most frequent motivation is simply to make money. Running a partially automated fake news system and connecting it to an advertising platform can result in a very good payout.

This is a big business. “According to the World Federation of Advertisers (WFA), it is estimated that by 2025, over $50 billion will be wasted annually on ad fraud,” says a report published by industry organisation IAB Europe.

Fake political articles can be very profitable

After the 2016 election in the U.S., researchers found that a considerable number of entirely faked articles were coming from Macedonia. Some people there had learned how to make money through digital advertising and the key was to write entirely falsified, but outrageous articles about political candidates. The wilder the allegations, the better the click rates and shares for such content. This created a mini-industry based on “fake news” in the region. 

The techniques that make ad fraud successful can also be used for political or criminal disinformation campaigns. The financial motivation behind ad fraud builds experience and a great deal of practical knowledge about how to mislead existing platforms, which can then be reused for targeted disinformation campaigns.

These are some of the arguments for broadening our understanding of what content is. Over time, there should be detection measures, even at the source where the information is published, to enable full verification.

Examples of funded projects from TruBlo

TruBlo is an EU-funded project, part of the Next Generation Internet initiative, which aims to create new tools using blockchain to enhance the trustworthiness of content. The project provides funding for innovative ideas to fight misinformation on social media.

Among the ten projects which received funding in the first open call, there are several which are explicitly looking for new ways to detect falsified information and content. Some examples are listed below; the full list of funded projects can be found here.

CONTOUR – Trusted content for tourism marketing purposes

LEDGEAIR – Aircraft data mining framework

ShoppEx – to restore the trust between retailers/brands and consumers
