Maddy Mahoney puts ChatGPT to the test to see whether it holds up to The Review’s fact-checking standards

An image of a hand-drawn robot appears beneath the words "Accuracy Test." The robot is holding papers, and speech bubbles behind it say "Plague Outbreak?", "White House", "Monarch butterfly migration" and "American Revolution".
Art by Nuha Farhan

Since its launch in November 2022, ChatGPT, the artificial intelligence chatbot developed by OpenAI, has been sparking discussions about what role AI can and should play in the future of journalism. Sure, it might be able to write interview questions for you, but it also might plagiarize your work and steal your job. But perhaps most worryingly, there are serious concerns about the accuracy of ChatGPT that call into question whether it can really be relied upon to support solid journalistic work. It has a lot of answers, but are they the right ones?

So, we decided to run a little test. At The Review, everything we publish must go through a rigorous fact-checking process that involves verifying each and every fact (often multiple facts per sentence!) using either one primary source (a live video, a person who experienced an event firsthand) or multiple secondary ones (reputable news articles, for example). Our work almost always gets tweaked and changed during this process, because the bar for accuracy is high. Below, we take ChatGPT to task by applying these same standards to a few of its answers.

Test 1

Let’s start with something with plenty of available sources: What makes the monarch butterfly migration in North America unique?

ChatGPT: The monarch butterfly migration in North America is unique for several reasons. Here are a few key factors that make this migration so remarkable:

  1. The distance traveled: Monarch butterflies undertake a long-distance migration each year, traveling up to 3,000 miles (4,800 kilometers) from their breeding grounds in Canada and the United States to their wintering grounds in central Mexico. This makes their migration one of the longest known insect migrations in the world.
  2. The multigenerational aspect: Unlike many other migratory animals, monarch butterflies complete their journey over the course of several generations. Each year, a new generation of monarchs is born and makes the journey south to Mexico, while subsequent generations make the return trip north in the spring.
  3. The use of a specific wintering location: Monarch butterflies overwinter in a very specific area in central Mexico, where they cluster together in large numbers on the branches of oyamel fir trees. This location is so important to the survival of the species that it has been designated as a UNESCO World Heritage Site.
  4. The visual spectacle: The sheer number of monarch butterflies that congregate in central Mexico each winter is truly breathtaking. The clusters of butterflies on the trees can be so dense that they make the branches bend under their weight, and the sight of millions of butterflies taking flight at once is a sight to behold.

Observations: ChatGPT did pretty well on this one! It neglected to factor in the portion of the monarch population that winters in California, and it confused the oyamel fir forests with oyamel fir trees. Some small tweaks needed, but nothing major.

Test 2

Alright, now let’s do something a little more niche: When was the most recent outbreak of the plague?

ChatGPT: The most recent outbreak of the plague occurred in Madagascar in 2017. The plague is a bacterial infection caused by the bacterium Yersinia pestis, and it primarily affects rodents, but can also be transmitted to humans through flea bites or through contact with infected animals or people. The outbreak in Madagascar began in August 2017 and resulted in 2,348 confirmed cases and 202 deaths by the end of the outbreak in November 2017. The majority of cases were of the bubonic form of the plague, which causes swollen lymph nodes, but there were also cases of the more severe pneumonic form, which can be transmitted from person to person through the air.

Observations: Okay, ChatGPT struggled a bit with this one. It stumbled over context: yes, bubonic plague does cause swollen lymph nodes, but given the disease’s very high death rates, that description is oversimplified to the point of inaccuracy. And interestingly, it seems to have substituted a general truth for a specific one. It’s true that bubonic plague is the most common form of plague generally, but the fact that this specific outbreak went against the trend, with a majority of cases being pneumonic plague, is a critically important piece of information.

Test 3

Next, a question about a topic that’s rife with misinformation: What happened on January 6, 2021 at the White House and who was involved?

ChatGPT: On January 6, 2021, a violent insurrection took place at the United States Capitol building in Washington, D.C., not the White House, which is the residence of the President of the United States. The insurrection was aimed at preventing Congress from certifying the results of the 2020 presidential election, which has been won by Joe Biden. 

The event was sparked by a rally near the White House, where then-President Donald Trump spoke to his supporters and repeated baseless claims of widespread voter fraud in the election. He encouraged his supporters to march to the Capitol building, where Congress was meeting to certify the electoral college results. 

As a result of the insurrection, rioters breached the Capitol building, resulting in violence, destruction of property, and injuries to law enforcement officers and other individuals. Five people died, including a Capitol police officer, and dozens more were injured. 

Many high-profile individuals, including members of Congress, were present at the Capitol building when the insurrection took place. Some politicians, such as Senators Ted Cruz and Josh Hawley, were criticized for their role in promoting baseless claims of voter fraud that fueled the insurrection. President Trump was also widely criticized for his role in inciting the insurrection, leading to his second impeachment by the House of Representatives, although he was ultimately acquitted by the Senate. 

Observations: Not bad. Many of the basic facts are correct, but ChatGPT muddled both numbers and cause and effect. Correcting “individuals” to “individual” may seem like a small change, but it does make a difference, and amounts like “dozens” are pretty open to interpretation. As for cause and effect, there were some mistakes and missing context. It’s not quite accurate to say that a complex event was caused by a single speech, and if violence occurs both before and after rioters enter a building, it isn’t accurate to say that the violence was a result of their break-in.

Final Thoughts: Whatever else may happen, it seems like our fact-checkers won’t be out of a job anytime soon!

About the author


Maddy Mahoney is a second-year Master of Journalism student at Toronto Metropolitan University. She is a freelance journalist whose work has appeared in Xtra Magazine, Toronto Life, CBC Arts, and others. She was formerly an Editorial Intern at Maisonneuve magazine and the Web Editor for CJRU 1280AM. She’s interested in long-form features, queer storytelling, subcultures, and weird internet trends.
