This AI Legal Battle Could Reshape Journalism in Canada

“Big tech, again and again, shows itself to be an industry that moves with entitlement and lack of care,” wrote author Michael Melgaard in a contribution for The Walrus in 2023. Not long before the magazine’s interviews with Melgaard and other Canadian writers, The Atlantic’s Alex Reisner had exposed the contents of Books3, a text database used to train LLaMA, Meta’s large language model (LLM) for AI-generated text. Alongside publicly available data such as Wikipedia entries, the database contained more than 191,000 copyrighted ebooks scraped from the internet, including the torrented works of Canadian authors. The point of feeding literary texts into LLaMA, Reisner wrote, was to produce “humanlike” language.
After the Atlantic article was published, Futurism reported, citing interviews with sources, that Sports Illustrated had been publishing AI-generated articles. The articles were attributed to fake authors with blurry, uncanny profile photos and generic bios. LLMs were no longer just ingesting human writing; they were attempting to replace it. “That unsettles me more than any breach of my copyright,” wrote author Joan Thomas for The Walrus. “The degrading of our culture with more crap content.”
By late 2024, Canadian media had joined the fight against AI scraping. In a lawsuit filed that November, Torstar, Postmedia, The Globe and Mail, The Canadian Press, and CBC/Radio-Canada accused OpenAI of scraping their content without permission or compensation. Their joint statement claims that by using “large swathes” of their work to train its LLM, OpenAI is violating Canadian copyright law. Until the case is heard, however, whether AI’s use of published media legally constitutes theft remains an open question.
The news media companies are seeking $20,000 in damages per infringement—per scraped article—or a fee proportionate to the value of their entire digital publication archives. “Given that they’re talking about millions of documents,” says lawyer and Thompson Rivers University professor Robert Diab, “you can do the math.”
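To make that math concrete, here is a minimal back-of-envelope sketch in Python; the one-million-article count is a hypothetical stand-in, since the claim refers only to “millions of documents”:

    # Rough estimate of the statutory damages sought.
    # assumed_articles is a hypothetical figure for illustration only;
    # the lawsuit speaks of "millions of documents" without an exact count.
    DAMAGES_PER_ARTICLE_CAD = 20_000   # damages sought per scraped article
    assumed_articles = 1_000_000       # hypothetical article count

    total_cad = DAMAGES_PER_ARTICLE_CAD * assumed_articles
    print(f"${total_cad:,} CAD")       # prints: $20,000,000,000 CAD

At even a million articles, the claim would run to twenty billion dollars.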
Financial Damage

The Canadian companies’ lawsuit follows The New York Times’s 2023 case against OpenAI, which revealed that ChatGPT had been regurgitating paywalled articles word for word to its users. There, the financial damage was clear: subscription revenue lost to pirated articles. North of the border, the legal argument is murkier.
The case hinges on how courts interpret copyright law, wrote Diab for The Conversation. OpenAI claims ChatGPT “learns” from the copied text of news articles in order to adapt the service to new prompts, not to appropriate ideas written by human journalists. That argument leaves room to contend that OpenAI’s process is not equivalent to a writer stealing ideas from hard-earned research or copying and pasting paragraphs.
What’s more, even if OpenAI’s processes are deemed unauthorized copying, the five media companies will still need to prove that the copying does not fall under “fair dealing,” Canada’s version of fair use. “The Supreme Court of Canada has said that if you engage in unauthorized copying for research purposes, you still fall within the exception,” says Diab, “even if you’re doing so for a profit.”
The legal ambiguity leaves Canada playing catch-up. The country’s fair-dealing laws do not yet account for the continuously expanding capabilities of LLMs, and minimal AI regulation now could carry serious consequences later. “Journalism is in the public interest,” say the five media companies in their statement. “OpenAI using other companies’ journalism for their own commercial gain is not.”
Beyond Copyright Violation

To tech experts and civil liberties advocates, the use of journalism to improve AI doesn’t just violate copyright law. Training generative tech on human writing, with little Canadian regulation to check it, could have serious consequences for the privacy and safety of journalists and consumers alike.
On November 2, 2023, Daniel Konikoff of the Canadian Civil Liberties Association urged Parliament in a letter to revise the draft Artificial Intelligence and Data Act (AIDA), part of Bill C-27, a tech privacy and safety bill. While AIDA promised to “ensure the development of responsible AI,” civil society groups pointed out that lawmakers had neglected to consult the public on how AI should be regulated. “Public interest technology needs to be developed by the public,” says Christelle Tessono, a research and policy assistant at Toronto Metropolitan University’s think tank, The Dais, and a signatory of the letter.
Recent research has found that critical thinking skills diminish with increased reliance on AI tools, and the growing use of services like ChatGPT could undermine how critically people consume news, says Tessono. “I feel like we’re not going to be able to be as critical as we are now when reading literature.”
The future is uncertain, Tessono continues, especially as the data stores and capabilities of LLMs like ChatGPT grow. “The steps to analyze, propose new laws, and operationalize them will take time,” she says. “It’s natural, but during that time OpenAI will continue to produce and develop their products.”
By December 4, 2024, ChatGPT had reached 300 million weekly active users. Among them were lawyers and journalists citing made-up research, while AI-generated articles falsely credited to real journalists were being published in news outlets around the world.