INESC-TEC: Can artificial intelligence be an ally of democracy?

02 Jul 2024 | Network Updates | Update from INESC Brussels HUB
These updates are republished press releases and communications from members of the Science|Business Network

On June 9, across the European Union, millions of people went to the polls to elect their representatives in the European Parliament. For several weeks, the media provided comprehensive coverage of the different campaigns, with emphasis on debates and rallies. There were many interviews and analyses of electoral programmes, together with the presentation of political trajectories and the discussion of future paths for each country.

To create this electoral narrative, it became necessary to go back a few years, and to understand the evolution of candidates and parties in the Portuguese and European political context. How did the journalists, who disseminated the information, manage to search for and select the most relevant data? How did they organise the events in a meaningful timeline? How did they make the process of telling this story easier?

In addition to interviews, reports and news coverage (with or without multiple sources), journalists tend to resort to information research, whether in public documents, on social networks or in other media. But no human being has the ability to read every text and analyse it thoroughly; so they resort to keywords as clues to find their way. But what if they used decision-support tools, capable of collecting and organising relevant information and performing this time-consuming research work? Add AI to the equation, and a new world of possibilities opens up. Shall we find out?

There is a Portuguese software tool, available for free, capable of extracting keywords from texts

If there's one thing we've learned from literature and Hollywood movies, it's that math can really help us solve problems. Take Murph, from Interstellar, who uses equations to save humanity. In Contact, scientist Eleanor Arroway decodes an extraterrestrial message through sequences of prime numbers and other advanced mathematical concepts. In the book (and movie) The Da Vinci Code, Robert Langdon manages to solve a series of puzzles (and a murder!) using mathematics - even the Fibonacci sequence.

From the big screen to the real world: here's YAKE! (Yet Another Keyword Extractor), a tool that uses statistics to extract keywords from a text. Ricardo Campos and Alípio Jorge, researchers at INESC TEC, joined a team of computer science and mathematics (statistics) experts from the Universities of Beira Interior and Innsbruck to develop software capable of going through a text and - based on a set of mathematical formulations - determining relevant words with a significant level of accuracy. But what are keywords after all? Ricardo Campos explained: "the notion of keyword is subjective, from a theoretical point of view, but also objective, from a practical point of view - namely since it is based on measurable criteria, like frequency, context, etc. Keywords are not a theme, but rather elements that characterise the general idea of a text. For example, the theme may be sports, but the keywords are the name of the player, the club or expressions like ‘impressive victory’".

Hence, AI plays a vital role in analysing large volumes of data, automating the extraction of useful information. The YAKE! project goes a little further: unlike neural network models, it uses a system that does not require training - thus making it easily adaptable to other languages. "Nowadays, and given the complexity of these neural network models, it is still difficult to know the reasons why a certain model reaches a certain conclusion. The YAKE! project is based on a set of easily interpretable statistics. It may be slightly less effective than high-end models - namely large language models (LLMs)[1] - but it is still a very intuitive software that extracts keywords quickly and easily, supported by a set of mathematical heuristics. One of the main issues of AI today is precisely the interpretability and understandability of models, which do not explain their replies or decisions. This could lead to a case where, for instance, a person standing trial in court is convicted by a machine decision that is little to nothing interpretable. Naturally, we cannot take such a risk", explained Ricardo Campos.
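YAKE!'s real scoring combines several statistics (word casing, position in the text, frequency, relatedness to context, dispersion across sentences). As a toy illustration of the training-free idea only - this is not YAKE!'s actual formula, and the scoring rule below is invented for the example - here is a minimal sketch that favours words which are frequent and appear early in a text:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real extractor would use a full one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "on", "for", "about"}

def toy_keywords(text, top=5):
    """Rank words by a made-up heuristic: frequent words that first
    appear early in the text get a lower (better) score, mirroring
    YAKE!'s lower-is-more-relevant convention."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    freq = Counter(w for w in words if w not in STOPWORDS)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)  # index of each word's first occurrence
    scores = {w: (1 + first_pos[w]) / (f * f) for w, f in freq.items()}
    return sorted(scores, key=scores.get)[:top]

text = ("The election debate focused on taxes. Taxes dominated the "
        "debate, and the candidates argued about taxes.")
print(toy_keywords(text, top=3))  # ['taxes', 'debate', 'election']
```

The point the researchers make survives even in this caricature: every number in the score is inspectable, so you can always say exactly why a word ranked where it did - which is precisely what opaque neural models cannot offer.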

So, how can we use YAKE!? Let's go back to the journalists' work during the European Elections. The software could have been used, for example, to infer (in real time) the keywords during a political debate; or maybe to generate a word cloud with the most important words from the programmes of political parties - or even from all the news about a specific campaign. In other words, the search for information would focus on what is effectively relevant, not on everything that is written. Did we manage to entice you, journalists around the world?

YAKE! has left the hands of its creators, and it belongs to the world now. How? "As researchers, we always favour a scientific outlook rather than a business perspective concerning our research; and we aim to give back to the scientific community, for all the opportunities granted to us. Hence, we decided to design YAKE! as open-source software - and it is currently being used in more than 1000 other open-source projects, like the General Index", mentioned the "creator" of this technology. But there is one rule: whoever uses it must disclose how they intend to use it.

A window into the past that helps us understand the present

Ricardo Campos, an aficionado of the past, didn't stop there. Using an adapted version of YAKE!, Conta-me Histórias works almost like a search engine for the past - mainly because it's integrated into Arquivo.pt, a Portuguese web content preservation platform. For instance, concerning politics and the coverage of the European Elections, one could search for the name of a politician associated with a specific theme (environment, taxes, health, etc.) and quickly build a narrative with the most relevant news over the past decade. Ricardo Campos carried out a similar exercise (with a political figure from the past): "at that time, based on the set of news, I tried to detect inconsistencies in Pedro Passos Coelho's political discourse about “taxes”. And how can we do that? We use YAKE! to analyse the vast volume of data available on Arquivo.pt - not to extract keywords, but to select, among thousands of news items, those that are most relevant".
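Conta-me Histórias' actual pipeline (YAKE!-based relevance ranking over Arquivo.pt) is far more sophisticated, but the underlying idea - per time period, keep the headline most related to a query - can be sketched in a few lines. The data and the word-overlap score below are invented for illustration:

```python
from collections import defaultdict

def build_timeline(query, articles):
    """For each year, keep the headline sharing the most words with the
    query -- a naive stand-in for the relevance ranking an archive-backed
    tool would use. `articles` is a list of (ISO date, headline) pairs."""
    terms = set(query.lower().split())
    by_year = defaultdict(list)
    for date, headline in articles:
        overlap = len(terms & set(headline.lower().split()))
        if overlap > 0:  # keep only headlines that mention the query at all
            by_year[date[:4]].append((overlap, headline))
    # Best-scoring headline per year, in chronological order.
    return {year: max(items)[1] for year, items in sorted(by_year.items())}

news = [
    ("2014-05-01", "Minister defends new taxes on fuel"),
    ("2014-09-12", "Opposition slams taxes and health cuts"),
    ("2019-03-20", "Parliament debates taxes reform"),
    ("2019-07-02", "Football club wins the cup"),
]
print(build_timeline("taxes", news))
```

Swap the toy overlap score for a proper relevance measure and the invented list for an archive query, and you have the skeleton of a "search engine for the past": one representative story per period, in order.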

It is a useful tool for journalists, since it can retrieve news from the past that no longer exists on the “conventional” web and provide essential information to tell a story.

From journalists to the average user, how can Conta-me Histórias be useful?

"Nowadays, most people get their news from feeds (content delivery systems) that learn what we like and start to favour a certain type of content. This is what we call filter bubbles: we are "bombarded" with the set of news that meets our preferences. This is a problem that ultimately leads us to extreme and polarising positions", explained Ricardo Campos.

October 7, 2023. Rocket-warning sirens echoed in several Israeli cities. People started to share reports of an imminent attack. Word spread quickly: a coordinated Hamas attack on Israel near the Gaza Strip. Since then, there have been several headlines about this topic all around the world. Journalists, commentators, historians, politicians, academics and organisations: all of them expressed their opinion. Many people investigated, while others carried out fieldwork; and many studied a conflict that has been going on for decades. Where did it start? Who is right? What are the sources? When searching through a search engine, the results are based on the latest information available. However, recent information does not always provide a comprehensive temporal and spatial interpretation of a certain question. And this is where Conta-me Histórias can really make a difference.

Analyse data to tackle misinformation

Evelin Freire Amorim, a researcher at INESC TEC who is involved in similar projects, believes that gathering relevant information quickly is vital to help people make decisions. Text2Story and StorySense, two projects led by Ricardo Campos and Alípio Jorge, help to summarise information and visualise it in a diagram, highlighting characters and actions and the relationships between them.

“When we navigate a world filled with information, it is useful to use a tool that helps us understand said information, as well as certain events and the relationships between them, in an almost automatic way. In this sense, we're no longer just extracting keywords or aggregating information on a topic. We are looking for answers to more complex questions that require greater understanding", she said.

We do know that this tool is quite useful for journalists, since it allows an overview of the available news and how it has been written over time. According to Evelin, the application of said tools is not limited to these domains: “Text2Story can be used in healthcare, with the compilation of medical records of a given patient concerning a certain diagnosis. It is very useful for professionals to quickly understand, for example, how a patient with lung cancer evolves over time. Legal professionals can also benefit from this tool, since it allows compiling and summarising relevant information from a court ruling, while relating it to other key information".

StorySense goes a little further, since it connects to knowledge bases and interprets information. "I'm from Brazil, so when people talk about the Social Democratic Party, it's not clear to me where this party stands on the political spectrum. Having a knowledge base could help me understand this, via a compilation of information on the topic. StorySense provides me with the background knowledge that allows me to mitigate misinformation and even helps models extract information more efficiently", stated the researcher.

Back to the European Elections: these tools could identify different narrative arcs, i.e., the most common narrative line concerning a party or a candidate, making it easier to establish cause-effect connections (Is the far right really spreading racism? We'll leave this one for AI to answer).

Each of these tools can help identify fake and out-of-context news, but there is still a lot of work to be done - many algorithms away. "I enjoy ideas that have an impact on society. Misinformation undermines democratic balance; hence, I'd like to work on a project that uses narrative extraction resources to help people in this sense. Particularly since political programmes do not showcase everything about a party - and can even convey ideas that are never applied", said Evelin, regarding her main motivation as a researcher.

Wouldn't it be perfect to have a kind of large-scale polygraph, capable of bringing to the public (quickly and automatically) the truth about certain narratives or events? "This team was recently part of a project that involves, among others, a team from the Joint Research Centre (European Commission), with the objective of identifying persuasion techniques. In this sense, we were challenged to participate in this project, by carrying out the annotation and identification of said techniques in Portuguese texts - with the help of a team of linguists, led by Purificação Silvano and António Leal. We selected 104 documents in Portuguese, annotated from the point of view of 23 classes of persuasion techniques, resulting in 1727 annotations. Our texts - and, above all, their annotations (more than 30,000) - will now be used by AI teams, so they can develop algorithms for the automatic identification of persuasion techniques, since this task cannot be done manually, particularly with large volumes of data", said Ricardo Campos.

AI and critical thinking? It is possible!

We are taking an effective step in the development of models that can automatically label the various dimensions of texts. The annotation work performed by linguists is, as we have already observed, vital to leverage the emergence of other models - especially concerning European Portuguese, as LLMs mainly use English texts. The more data (annotations) we have, the easier it will be for the scientific community to create models and ontologies[2].

Portugal is progressing in the European and global AI landscape, and the main concern is to ensure that ethical and privacy issues are safeguarded when using data to create models. Moreover, this is a discussion that the European Commission has been leading recently.

Another question that arises when it comes to AI is whether we are sacrificing authenticity. Evelin Amorim believes that tools like those developed within the scope of Text2Story and StorySense move in the opposite direction, as they only compile information - leaving critical thinking to people. Tools like ChatGPT, on the other hand, generate standard replies without any room for creativity. The INESC TEC researcher also warns about the inherent issue of racial and gender bias in these models. "If a person depends only on the information generated by the models, without debating with peers or colleagues, they will receive very distorted answers, leading to even more biases. If we ask a model to complete the sentence 'João works in the ward, so he's a...', the answer will be ‘doctor’ and not ‘nurse’. Why? Because the model is based on the most basic idea, on what is unconsciously in the minds of our patriarchal society", she added.

Back to the European Elections, we leave some questions to our readers. Have you read all the programmes of the political parties? What are the keywords from each one? Do you know the political path and proposals of each of the candidates? Do you know if your position on a certain question has remained the same over time? If you answered "no", don't worry. After all, unlike Barack Obama, we are not equipped with a team of data scientists and journalists to help us in the decision-making process. The good news is that five years from now, in the next European Elections, you'll already know where to look for answers!

[1] Generative AI models trained with large amounts of text to understand and generate natural language, as is the case with ChatGPT.

[2] In computer science, an ontology is a formal representation of a set of concepts within a domain and the relationship between those concepts.

This article was first published on 28 June by INESC TEC.
