In 2001, The New York Times published a story about a new website that was “looking for ways to enable visitors… to collaborate to change the site themselves”.
The journalist Peter Meyers wondered if the world needed another encyclopaedia. “Vast information is already available on the Web, and… search engines have made it easier to find it,” he wrote.
At that time, Wikipedia, the site he was referring to, held about 10,000 entries.
Today, as it turns 25, it is one of the internet’s longest-lasting websites, and the default reference source for students, writers, the curious, the uninformed, and anyone in the middle of an argument about who’s right.
The English version contains over 7 million pages. Versions in other widely used languages, such as French, Spanish and German, have over 2 million pages each.
But the institution’s future is now uncertain.
Key challenges include AI search, which serves up summarised answers in place of links, and AI-generated content, which is slipping past Wikipedia’s strict, layered moderation system.
In January 2024, an unknown contributor added an entry for the Ottoman fortress of Amberlisihar, “built in 1466 by Mehmed the Conqueror in Trabzon, Turkey”. It detailed its construction, history, damage during various wars, and its present-day status.
There was no Amberlisihar fortress. The entry had been generated by a large language model (LLM), right down to its detailed citations, which were also fake. The fraud went undetected for 10 months.
Hoaxes and misinformation have existed on Wikipedia since its inception. The site has also been accused of bias. Wikipedia editors with IP addresses linked to British Petroleum, for instance, were found to have modified articles concerning the company and its environmental controversies. The British public-relations firm Bell Pottinger made hundreds of edits on behalf of its clients. Marketers and public-relations executives worked on pages for corporations and politicians, sanitising them and downplaying scandals and criticism.
But the vigilance of a massive global editing and moderating team (an estimated 270,000 in English alone) managed to keep such activity in check. Volunteers certified as moderators spend hours every day checking new entries relating to areas of their expertise.
AI has made this harder.
SO WHAT EXACTLY IS CHANGING?
Since 2002, bots have been allowed on Wikipedia. A bot created in that year, Rambot, “transformed census data into short new articles about towns in the United States; the vast majority of town, city, and county articles were started by it.” Bots have also been used to detect vandalism, and to aid in translation.
But LLMs are different.
For most of its history, the platform’s infiltrators were identifiable humans with agendas: companies, governments, political activists, hoaxers and pranksters. AI changes the scale of the problem. One person can now generate hundreds of plausible edits, complete with fabricated citations.
Every entry on the site is a battlefield, and AI represents a new weapon in a long-running war.
As of March, the English Wikipedia community has prohibited the use of AI to add content to articles, with exceptions for certain kinds of copy-editing and machine translation. This begins to address the question of how the platform itself should use artificial intelligence.
But there is another aspect to how AI affects Wikipedia’s future.
Today’s AI-generated search-engine summaries mean that few people need to click through to Wikipedia any more. Yet virtually every major LLM is trained on Wikipedia.
According to data from OpenAI, English Wikipedia accounted for roughly 3% of GPT-3’s weighted training corpus. That figure doesn’t tell the whole story. Because the platform’s information is structured, verified and, in a sense, peer-reviewed, it is worth far more to LLMs than Reddit posts and Tumblr blogs.
In essence, AI is swallowing Wikipedia’s millions of words of painstakingly sourced and fiercely argued material, only to render the site itself invisible.
Wikipedia has thrived on the dedication of human volunteers, their obsession with their subjects and their interest in ensuring the platform remains a dependable resource. Now, they are forced to divert time and effort towards combating a flood of AI-generated content, at a time when potted AI summaries reduce traffic to the site. How long before this wears them down, or declining visibility reduces the number of new editors willing to help?
At 25, Wikipedia risks becoming foundational and invisible at the same time. And here is the twist: in attempting to make the encyclopaedia redundant, AI risks undermining its own foundation too.
(K Narayanan writes on films, videogames, books and, occasionally, technology)
