More or Less

Did AI researchers let AI hallucinations into scientific papers?

9 min
Feb 21, 2026
Summary

The episode investigates AI hallucinations in academic research papers published at NeurIPS, a premier machine learning conference. Researchers discovered over 100 fabricated citations across 50 papers, revealing that busy AI researchers are using AI tools to write sections of their papers without proper verification, creating false references that undermine academic integrity.

Insights
  • AI researchers themselves are vulnerable to AI hallucinations despite their expertise, indicating the problem is systemic across the industry
  • There are significant career and financial incentives (job offers, $100M+ funding) driving researchers to publish prolifically, creating pressure that leads to shortcuts
  • AI hallucination detection requires specialized, trained AI systems rather than general-purpose models like ChatGPT
  • Academic review processes rely heavily on trust rather than verification, making them vulnerable to undetected AI-generated errors
  • AI bias disproportionately affects non-anglophone researchers, with the system more likely to fabricate citations for non-English names
Trends
  • Widespread adoption of AI writing tools in academic research without adequate verification mechanisms
  • Growing tension between speed of publication and accuracy in competitive academic fields
  • Need for specialized AI detection tools to identify AI-generated errors in professional contexts
  • Bias in AI systems affecting citation generation for non-Western researchers
  • Erosion of trust in peer review processes due to AI-assisted paper writing
  • Emergence of AI verification services as a new business category
  • Pressure on academic conferences to develop clearer AI usage guidelines
Topics
  • AI Hallucinations in Academic Publishing
  • Fabricated Citations in Research Papers
  • NeurIPS Conference Quality Control
  • AI-Generated Content Verification
  • Academic Integrity and AI Tools
  • Peer Review Process Vulnerabilities
  • AI Bias in Citation Generation
  • Large Language Model Limitations
  • Research Paper Writing Automation
  • Academic Publishing Incentive Structures
  • AI Detection and Monitoring Services
  • Non-Anglophone Researcher Bias
  • Machine Learning Research Standards
  • ChatGPT and Academic Misuse
  • Scientific Paper Reproducibility
Companies
OpenAI
Creator of ChatGPT, discussed as an example of AI tools that hallucinate and are used by researchers
Google
Mentioned as a major tech company with industry researchers publishing at NeurIPS
Meta
Mentioned as a major tech company with industry researchers publishing at NeurIPS
Anthropic
Creator of Claude, mentioned as an example of AI tools that can hallucinate
People
Alex Tway
CTO and co-founder of GPT, the company that discovered 100+ hallucinated citations in NeurIPS papers
Quotes
"In some ways, it's a weird point of pride, I think, to be hallucinated by an AI. That's definitely one sign that you've made it in the industry."
Alex Tway
"The most accurate methods to do this are using AI itself, but extremely specialized for this purpose."
Alex Tway
"If we can't trust that your paper was even human-reviewed, so the AI is making mistakes in your paper and you're not catching it, then how can we trust that everything else created by the researcher was also reviewed by a human and not hallucinated by AI?"
Alex Tway
"It would just start chaining together highly likely names of researchers, such as, like it would start chaining Chinese initials, like HYX, XZ, like N, blah, blah, blah, blah, blah, blah, like just like a string of like 10 three-letter acronyms."
Alex Tway
Full Transcript
This BBC podcast is supported by ads outside the UK. Hello and thanks for downloading the More or Less podcast. We're the programme that looks at the numbers in the news, in life, and in AI hallucinations. I'm Tom Coles. As the small print warns you if you've ever asked ChatGPT to help your kid with their maths homework, AI can make mistakes. Despite having all the confidence of your overconfident friend, some of the stuff that AI engines like ChatGPT, Gemini, Grok or Claude confidently tell you is essentially made up. To be totally fair, everything a large language model tells you is just what it thinks is the most likely answer, and much of the time the most likely thing is factually accurate. Sometimes, though, it's totally fictitious, and this false stuff is sometimes called a hallucination. Whether these hallucinations matter depends on what you're using AI for and whether they are spotted and sorted out. So the team on More or Less were slightly surprised to see the following headline in Fortune magazine: one of the world's top academic AI conferences accepted research papers with 100-plus AI-hallucinated citations. You might think that the top AI researchers in the world would be careful about using AI to write their research papers. So is this number right? And what does it mean if it is? People have started to share that they're getting citations from these big hallucinations. And it's a mixture of, I think, pride and bewilderment.
This is Alex Tway, the CTO and co-founder of GPT, the company that found these 100 AI hallucinations in research papers. They're like, hey, this LLM knows so much about my research, it thinks I wrote all these papers I didn't. In some ways, it's a weird point of pride, I think, to be hallucinated by an AI. That's definitely one sign that you've made it in the industry. If you're new to this subject, it might sound strange to talk about a computer program hallucinating. They're not out in the desert taking mind-altering substances, after all. The reality is a little more prosaic. What happens is a researcher might say, oh, can you write this section of the paper for me, and make sure to add a lot of citations? And the AI will do that, but it's essentially doing it without any references. And so it has to start making things up to make them look like real, high-quality citations, but they don't actually correspond to anything real. Alex certainly has skin in this game. The company he runs offers a service to organisations publishing these papers to help root out AI slop. But no one is denying that there are AI hallucinations in these papers. So what exactly did they find? First, the context. The papers in question were published as part of a big AI conference known as NeurIPS. It's essentially the premier event for machine learning. This get-together attracts the brightest minds in AI, or machine learning, as Alex calls it, from academic researchers at top universities to industry researchers at big tech companies like Meta and Google. Alex says that in this booming industry, getting your paper published really matters. Having a couple of papers in these conferences can get you an OpenAI job. If you're a startup company, having a couple of these papers in these conferences can mean raising $100 million from investors. This means there's a massive incentive for researchers to pump out a lot of papers for consideration at conferences like this one.
Once a year, they'll receive, let's say, around 20,000 submissions, and then they'll accept about 5,000 of them. This is where Alex's company comes in. They took those 5,000-odd papers that were selected for publication and, oh, wait a minute, asked AI to take a look? Yeah, so that's the funny thing. The most accurate methods to do this are using AI itself, but extremely specialized for this purpose. And so to make our specialized AI able to find hallucinations, we have to train it on countless examples where we actually labelled, hey, this is a hallucination, this isn't. And it drastically improves at that task compared to an off-the-shelf ChatGPT or something like that. So essentially we ran our hallucination detector on these papers. For this experiment, they didn't look at the text of the papers themselves, but just the citations, the references to other papers which lodge the research in the wider web of academic publishing. These are easier to verify as wrong, as you can easily check against the real thing. And we would go through each citation and try to verify whether or not it exists, searching through massive, large-scale search engines, academic databases, and so on. And so we might get a bunch of potential matches. Most of the time, the details of the paper cited, author, title, and date published, did match a real scientific paper. Sometimes, they really didn't. We were able to pretty easily find at least 100 hallucinations over 50 papers. This isn't an exhaustive list, by the way; they just stopped counting when they found a suitable round number. And these hallucinations took a variety of forms. About 39 were completely non-existent publications. The other 61 had a combination of fabricated authors, people who don't exist or exist but never wrote a paper like that, fake titles, fake links or URLs, and so on. Some of the dodgy citations contained real authors who didn't write those papers.
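The verification step described here, checking a cited author, title, and year against records of real papers, can be sketched roughly as follows. This is a hypothetical illustration, not the company's actual detector: every function name and data record below is made up for the example.

```python
# Rough sketch of citation verification: does a cited paper match any
# known record? A real system would query academic databases and search
# engines; here a small in-memory list stands in for that database.
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    authors: tuple  # author surnames, lowercased
    title: str
    year: int


def normalize(title: str) -> str:
    """Lowercase and drop punctuation so minor formatting
    differences don't count as mismatches."""
    return "".join(c for c in title.lower() if c.isalnum() or c.isspace()).strip()


def looks_real(cited: Citation, database: list) -> bool:
    """A citation 'looks real' if some known record matches its
    title and year and shares at least one author."""
    for record in database:
        if (normalize(cited.title) == normalize(record.title)
                and cited.year == record.year
                and set(cited.authors) & set(record.authors)):
            return True
    return False


# Toy stand-in database (hypothetical entries).
known = [
    Citation(("vaswani",), "Attention Is All You Need", 2017),
    Citation(("smith", "jones"), "A Real Paper", 2020),
]

print(looks_real(Citation(("vaswani",), "Attention is all you need!", 2017), known))  # True
print(looks_real(Citation(("hyx", "xz"), "Deep Learning for Everything", 2021), known))  # False
```

Exact string matching is a deliberate simplification: a production checker would need fuzzy matching and multiple data sources, since a citation absent from one database may still exist elsewhere.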
Some had wonky names or odd titles, but others were just completely made up. This one is my absolute favourite: the authors were "first name, last name, and others", which I imagine is a coincidence if all three of those were real people. We asked Professor First Name and Doctor Last Name for comment, but didn't hear back. Look, it seems clear from this research that incredibly busy, under-pressure researchers are using AI to write the boring bits of their papers. It's quite funny, really, that AI researchers aren't immune to its charming overconfidence. But beyond the obvious irony, why does this matter? In the culture of computer science, it can often be challenging to reproduce some of these experiments, and the reviewers don't have time to do so in their own capacity. And so there's so much trust that goes into, did you write your code correctly? Is your data correct? They're often at scales that are impossible to verify without a lot of actual work. And so if we can't trust that your paper was even human-reviewed, so the AI is making mistakes in your paper and you're not catching it, then how can we trust that everything else created by the researcher was also reviewed by a human and not hallucinated by AI? For their part, the organisers of the NeurIPS conference told us that while they accept researchers are using AI to write their papers and hallucinations can get through the review process, they do not believe the research in these papers would necessarily be invalidated by the discovery of AI hallucinations. At the same time, they're continuously refining their guidance for both authors and reviewers as the use of AI rapidly evolves. There is an implication for society in all of this too. Alex says that because of biases in the material the AI has learned from, citations of non-anglophone researchers seem to go wrong more often than others. Although this wasn't from a NeurIPS paper, there are some pretty odd things going on.
We found that it would just start chaining together highly likely names of researchers, such as, like it would start chaining Chinese initials, like HYX, XZ, like N, blah, blah, blah, blah, blah, blah, like just like a string of like 10 three-letter acronyms. And you could just tell that the LLM thinks, oh, if I had to make up a citation, all I have to do is just write Chinese names. Thanks to Alex Tway. That's it for this week. If you've seen a number in the news you think we should take a look at, email moreorless@bbc.co.uk. We'll be back next week. Until then, goodbye.