In a post-truth age, academics must step up and ensure the reliability of the information fed into AI models
We are witnessing remarkable advancements in neural networks, big data models, intensive data analysis and machine learning. While much of the basic research that enabled these developments was funded by public resources, the widespread application of these technologies - such as generative AI models like ChatGPT or Bard - has been developed and commercialised by private companies, driven primarily by profit rather than the public good or the pursuit of an informed society.
This raises the fundamental question of what role universities play when powerful and wealthy private entities dominate the direction of digital innovation.
Generative AI technologies already have, and will continue to have, a wide societal impact. It is therefore crucial to understand how these technologies work and what they produce. At their core, they analyse vast amounts of information to find patterns and create outputs that answer our questions, queries or demands, based on the patterns the machines have learned from the input data.
With sufficient input data, these systems can generate anything from computer code to pop music, paintings and poems. While these tools can enhance our knowledge and are increasingly being used by university students to support their studies, caution is warranted regarding how they are used and how much we trust their outputs. The critical issue is the quality and reliability of the data used to train these systems.
Traditionally - before the digital age and the advent of the internet - publishing involved robust validation. Mainstream media outlets hesitated to publish uncorroborated stories, with rare exceptions for whistleblowing in the public interest.
Scientific papers, initially published as an exchange of information between scientists rather than as a means of gaining prestige and funding, underwent rigorous peer review and were considered trusted information. Any claims in scientific papers had to be backed by previous research, which had to be properly cited. Thus, there was some control to ensure that only truthful and scientifically sound material was published.
The digital revolution has drastically transformed this landscape. Today, anyone can write and publish anything, and even the most implausible ideas can find their place in the vast repository of human written knowledge online.
This is the root of the problem of the veracity of the information AI systems are trained on. Just as the quality of a final dish depends on the quality of the ingredients used, the quality of the information generated by AI systems relies on the veracity of the information used to train these models.
In this so-called post-truth era, where the notion that all opinions are equally valid prevails, and with technology now allowing anything to be published and absorbed into the public domain of human knowledge, we face the challenge of assessing the quality of the input data that generative AI models gather.
Private companies are generally neither interested in sorting through this information nor likely to have the capacity to do so. Therefore, if we want generative AI models to provide accurate information in the future, we must address this crucial issue and explore viable solutions.
One potential solution could involve universities collaborating to create an academic or public generative AI model that exclusively relies on peer-reviewed scientific papers and other verifiable sources. A coalition of European (or even global) universities could join forces to develop an ethical and trustworthy AI model by carefully controlling the sources used for training.
Such a model would serve as a valuable tool for both the general public and the academic community. Whether we like it or not, these tools are being used and will inevitably become more pervasive, particularly among students. Instead of avoiding AI, we should embrace and refine it to serve the public good and advance universities' missions.
However, in discussing such a public generative AI tool, built on verifiable and proven information, we encounter another important question regarding scientific publishing. While this topic alone warrants a comprehensive discussion, many agree that the scientific publishing system is deeply flawed. With 10,000 scientific papers (one out of every 500 published) retracted due to scientific misconduct in 2023 alone, an inflation of publishers and scientific papers, profit-driven publishers running the show (in a sector that is already one of the most profitable of all) and reward systems that prize the quantity of publications and the reputation of the journal over the quality of individual publications, it is clear that major changes are needed.
Even if we restrict AI training to peer-reviewed papers, these systemic flaws remain. Therefore, the academic community must tackle these two interconnected issues simultaneously: building verifiable AI models based on high-quality publications and reforming the scientific publishing system. Addressing these challenges is critical to restoring trust in science and encouraging societies to once again value true knowledge.
Universities must embrace AI and contribute to its development as part of the public domain for the advancement of humanity and the benefit of society. If we shy away from this, others will take the lead, likely with detrimental effects, similar to what we have already observed with social media, which spreads misinformation, polarises society, and creates distrust in fundamental societal values such as democracy.
We live in a time of profound societal change, and universities bear significant responsibility to uphold democratic values, foster knowledge-based societies, and ensure respect for human rights for everyone. As institutions that have endured centuries of societal upheaval by balancing tradition with adaptability, universities must continue to evolve. Embracing the tools being adopted by the broader population - while ensuring their ethical and beneficial application - is essential. By actively shaping the development and use of AI, universities can safeguard democracy, uphold human rights, and contribute to the preservation of our planet for future generations.
Gregor Majdič is rector of the University of Ljubljana.