[News clips playing]
[Crowds cheering]: Canada! This is not a fringe.
[News Anchor]: Take a look at the fast-growing crowds in front of Parliament Hill some of the first to arrive ahead of this weekend’s protest against the government’s vaccine rules.
[News Anchor 2]: But first, Hockey Canada says they are dealing with a new allegation of sexual assault. They say it involves members of the 2003 World Juniors men’s team. That tournament was co-hosted in Halifax...
[News Anchor 3]: ChatGPT. Maybe you’ve heard of it. If you haven’t, then get ready because this promises to be the viral sensation that could completely reset...
[News clips fade out]
[PULL QUOTES co-host, Tara De Boer starts talking]
Tara De Boer: I think we can agree that 2022 was an eventful year for those who cover the news. From industry developments to groundbreaking and, at times, controversial reporting. Since September, we at the Review of Journalism have been busy examining these areas. For five of the six episodes to come, we will be joined by Review of Journalism writers to talk about their thoroughly researched stories and provide a behind-the-scenes account of the reporting that went into them.
Tim Cooke: But first, we want to take this episode to examine a hugely significant issue that broke in November 2022.
[Intro music plays softly in background]
The newest incarnation of text generating AI, most notably ChatGPT. Welcome to the sixth season of Pull Quotes. We’re your hosts, Tim Cooke,
TD: Tara De Boer,
Silas Le Blanc: and Silas Le Blanc.
PART 1 – Generative AI in newswriting: CNET case study
TC: Types of certificates of deposit. What is a second chance chequing account? It was seemingly business as usual for CNET, the U.S. online tech news publication. The site’s money section was churning out content, including dull financial explainers, with SEO-friendly headlines. As happens at news organizations, the attribution of some of these was nonspecific: ‘CNET money staff.’ But in the case of over 70 such articles—published from November 2022 through early January—this was misleading. If you clicked their bylines, there was a drop-down. “This article was generated using automation technology, and thoroughly edited and fact-checked by an editor on our editorial staff.” This was the only indication of AI input, and it still didn’t tell the whole story. Reputable outlets like CNET have been publishing automatically generated content for years. Previously, such articles were formulaic, covering things like corporate earnings and sports results, and based on pre-written templates. To produce the articles, the automation software analyzed incoming data, selected the appropriate templates and inserted any relevant data points. But CNET’s explainers were too intricate for that. CNET was using its own ChatGPT-esque tool for dynamic, real-time natural language generation.
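The older, template-based automation described above can be pictured with a minimal sketch. The template wording, company name, and figures below are invented for illustration; real systems use libraries of pre-written templates and live data feeds.

```python
# A toy version of template-based article generation: structured data in,
# slots in a pre-written template filled out. All names and figures are
# invented for illustration.
TEMPLATE = (
    "{company} reported quarterly revenue of ${revenue}M, "
    "{direction} {change}% from the same period last year."
)

def generate_article(data: dict) -> str:
    """Select wording based on the data, then fill the template slots."""
    direction = "up" if data["change"] >= 0 else "down"
    return TEMPLATE.format(
        company=data["company"],
        revenue=data["revenue"],
        direction=direction,
        change=abs(data["change"]),
    )

print(generate_article({"company": "Acme Corp", "revenue": 120, "change": -4}))
```

The contrast with CNET’s experiment is the point: a template tool can only rearrange the data it is given, while a large language model generates the sentences themselves.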
[Chime, soft background music plays]
It was only an experiment. But one CNET failed to clearly announce to its readers, or even to many of its own staff. It took a January 11th Twitter thread from an online marketing expert who had clicked on said bylines for it to fess up. In a statement to readers, CNET admitted that a[n] “AI engine had drafted each article or gathered some of its information,” but offered no apology. It just hastily amended the byline to make the AI aspects clearer. There was outrage on the internet. People were declaring the move the beginning of a large-scale replacement of human newsroom staff. And then, things got worse. It turned out that the editing and fact-checking process hadn’t been thorough enough. A CNET audit revealed that roughly half the AI-generated explainers included factual errors. They also contained significant plagiarism. CNET paused the use of the tool.
The ChatGPT breed of text-generating AI is wending its way into the newsroom. Besides CNET, BuzzFeed and the soon-to-launch startup The Newsroom have announced similar experiments. Major newspaper publishers Axel Springer and U.K.-based Reach have indicated they plan to roll it out across their titles. Its wider and more successful deployment may only be a matter of time. But what might this look like? What might the consequences be for news journalism? The Review has examined generative AI before. In 2020, Mitchell Consky’s feature provided an overview of the issue and what it meant for journalism. It touched on OpenAI’s GPT-2 model. In the same year, Pull Quotes conducted an experiment to see if the software could mimic the collective voice of the magazine. Both concluded that news writers could sleep easy—for the time being. But in November 2022, OpenAI released ChatGPT, powered by the more advanced GPT-3; it has just released GPT-4. So, we thought it was time to follow up. First, I think we should unpack exactly how ChatGPT and its ilk, known as “large language models,” actually work. For this, I needed the help of a computer scientist.
PART 2: Dr. Alona Fyshe describes how generative AI works
Dr. Alona Fyshe: I’m Alona Fyshe, I’m an associate professor at the University of Alberta; my position is joint between computing science and psychology.
TC: Okay, so my first question is: could you give me an explanation, for the layperson, of how large language models are built and how exactly they work?
AF: Large language models are built with neural networks, they’re trained to predict a missing word, usually, it’s the last word of a sequence, but they can also be trained to predict a missing word inside of a sequence of words. So, take a sentence and take a word out, get the neural network to predict the middle word, or the next word that’s coming. Yeah, in recent years, they’ve grown to be very large, which is why we call them large language models.
TC: Okay, so a neural network is sort of trying to recreate the structure of the human brain.
AF: Yeah, that depends who you ask [laughs]. If you ask somebody who knows something about the brain, they would, they would probably say, no. Neural networks were definitely inspired by the brain, the individual neurons themselves are connected to each other in a way that is reminiscent of how neurons in our brains are connected. But there are huge differences between brains and neural networks, as I’m sure you’ve heard other people tell you.
TC: Another crucial component of understanding these large language models is the training that they undergo. Can you maybe tell me a little bit about that?
AF: Yeah, so they’re trained on sequences of text. Oftentimes, they’re trained to predict the next word in a sequence. So, it turns out even as you’re listening to me talk right now, you are also predicting the next word of what I’m about to say. And if I said something unexpected, you would have a particular brain response to that unexpected word. So similarly, computer models, these language models can be trained to predict the next word in a sequence. And it turns out, to do that task well, you need to know a lot about language. And so that very simple prediction task–just the next word in a sequence–trains language models to do some pretty impressive things. They learn grammar, they learn sentence structure, they learn to match quotes, use punctuation correctly, reuse the proper names of things inside of text. All of this comes just for the simple task of predicting the next word in a sequence.
TC: And hopefully, this is not crap I read on the internet, but I sort of read somewhere that the way they do this is like, there’s so much text in these training models, so the way it works to sort of train it to predict things is sort of, well, somehow, words, phrases are blanked out in the text of this training data, and you sort of see if the computer can predict what comes next. Is that sort of how it works?
AF: Right. Yeah. So, there are two ways to do it. One is the next word prediction, which I just described. There’s also masked language models, in which we would mask or blank out a word inside of a sentence and have the model predict that missing word. They’re sort of… I mean, they’re similar, just in one case, you have context on both sides of the word. And on the other case, you only have context on one side.
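The next-word prediction Fyshe describes can be made concrete with a toy sketch. The tiny corpus below is invented; real large language models use neural networks trained on vast amounts of text, but the prediction task is the same in spirit.

```python
from collections import Counter, defaultdict

# A toy next-word predictor: count, for each word in a tiny corpus,
# which word most often follows it.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` during training."""
    return following[word].most_common(1)[0][0]

print(predict_next("sat"))  # "on": both training sentences continue "sat on"
```

A masked-language variant would hide a word in the middle of a sentence and use counts (or, in real models, network weights) conditioned on context from both sides rather than just the left.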
TC: From your understanding, what was preventing this breakthrough in text-generating AI, these large language models, until recently?
AF: I think it might feel recent, but I think it’s been chugging along behind the scenes for, you know, a decade, a decade and a half now, getting progressively better. And for multiple reasons. One, I would say, is that the amount of data has increased and continues to increase. People keep writing stuff on the internet, which is great for those of us trying to build models, because they keep creating training data for us. But another thing that has changed is the amount of compute power available. GPUs have become much more powerful than they used to be. There have also been other advances: Transformers were invented, which allowed models to consider an entire context, an entire history of text, rather than just the most recent context. And that’s helped also. But another thing that I think is most apparent in the recent models is that a lot of work has gone into aligning the AI to produce the text that we want it to. As you know, as a journalist, there are multiple next things you could say in an article, and choosing the right next thing, or the best next thing, is a task of weighing different options. So I was saying earlier that these language models are trained to predict the next word. Actually, the most recent iterations have been given more information: not just what the next word is, but also a ranking of possible next words. That ranking wasn’t available earlier, because we were using just text by itself. What they ended up doing was having, for a given context, multiple people generate not the next word but the next sentence. And then, based on those next sentences, they would have another set of people rank them for what was the best next thing to say. So the model gets not just one next thing you could say, but multiple next things, and an ordering of them: which one is best, which one’s second best.
I think that particular piece of information was really useful to machines. And it required a lot of human effort; huge teams of people had to be hired and trained in order to do this.
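The human-ranking data Fyshe describes can be sketched as follows. The context and candidate continuations below are invented; the expansion of a ranking into pairwise preferences is, broadly, how such judgments are commonly prepared for training.

```python
# One context, several candidate continuations, ranked best-to-worst by
# human annotators. All text here is invented for illustration.
context = "A CNET audit revealed that"
ranked = [
    "roughly half the AI-generated explainers included factual errors.",  # best
    "errors were found in the explainers, the audit it found them.",
    "explainers audit the errors found half.",                            # worst
]

# A ranking is commonly expanded into pairwise preferences: every
# higher-ranked candidate is preferred over every lower-ranked one.
pairs = [
    (ranked[i], ranked[j])
    for i in range(len(ranked))
    for j in range(i + 1, len(ranked))
]
print(len(pairs))  # 3 preference pairs from 3 ranked candidates
```

Each pair says only "this continuation is better than that one," which is exactly the extra signal, beyond the next word alone, that Fyshe says proved so useful.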
TC: So one question that occurred to me, which might get people who are not so technologically inclined, like myself, a little riled up about this technology: people have misconceptions about how intelligent these things are, whether words such as ‘think’ or ‘understand’ can be ascribed to them, whether they can actually think or understand language. Can you maybe define the intelligence that these models have, if you can? That’s a huge question.
AF: Yeah. I mean, so, no, I can’t. But I can talk to that. So, what do we mean by think and understand? I think everybody has a little bit different understanding of what those words mean. But what I can say is that these language models, they don’t live in the world, they don’t exist. They don’t understand physics, they don’t know how to judge the emotion of a face; they’re missing huge parts of the human experience. So they do not understand the world like we do. And that comes up a lot of the time, if you play around with these language models, when you ask them questions that require them to understand something about the world that’s hard to get through text alone. And one of the easy examples is physics. There are a lot of properties of physics that, of course, show up sort of implicitly in text, but are very rarely described explicitly in text. So like the rolling of a ball down a hill that gets stopped by a curb, you know what I mean? How objects act in the world doesn’t show up as strongly in the text generated by language models, because language models don’t live in the world.
TC: Yeah, I’ve definitely seen things like people asking it, ‘which is heavier, a battery or a picture frame?’ And that’s never been talked about on the internet. So obviously, it has no idea what that prompt is on about.
AF: Yeah. Yeah. So, I mean, that shows up a lot, and I mean, how will we solve that? I think there’s lots of possible ways. But I think the long-term thing is that language models need to be in the world to understand the world.
PART 3 – Lucas Timmons on how newsrooms can use generative AI
TC: Now, let’s consider what large language models might mean for news journalism. Lucas Timmons is a digital developer at Torstar. Among other things, Timmons develops template-based article generation tools, so I thought he’d be a good person to help make sense of current large language models’ potential impact. His work features in the Review story I mentioned earlier; please check it out. You can also read about it in J-Source. I’ve added links in the show notes, as well as one to his personal website.
TC: I was wondering if we could now look ahead to the next development in text-generating AI technology. The version of ChatGPT everyone can currently access on the internet is not able to browse the internet; it can only draw on its training data. But once tools like these are hooked up to the internet and able to browse it for new information, do you think they could be used to write the main bulk of news articles, barring a bit of additional human reporting? And if so, do you think newsrooms will deploy these tools for that purpose, for news writing?
Lucas Timmons: I think that could help, definitely. I don’t think we’re anywhere close enough yet to it being something that, you know, is going to see widespread adoption. Like you said, you have to be wary of the data these models were trained on, and the information that they’re using. Is that information correct? Is it up to date? I’ve tried it, and we’ve had some varied results; I’ve experimented with it. Like I said, the current AI tools are prone to both errors and biases. And they often produce dull, sort of unoriginal writing, or in the worst-case scenario, you know, plagiarized portions of text, which is incredibly problematic if you’re working in news. Even if you could get it to generate something that you think is good, how do you know that part of that’s not plagiarized?
TC: So do you think there could be any additional types of news content, if not actual news articles themselves, that could be automated using these? A use that springs to mind for me is backgrounders and explainers on complex issues that normally would have taken a journalist quite a lot of time to research. Could maybe those be produced using these tools? Or do you think they’re still too unreliable?
LT: Yeah, I think absolutely, there is. I think that’s a great use case for it. But again, you’d have to make sure that the data going in is correct. When I worked at the Canadian Press, I built a tool to parse previous election data and the latest census data in order to allow the reporters to do research. And, you know, dealing with StatCan data can be rough, especially the census data. It’s hard to make comparisons between datasets if you don’t have all the data; it’s almost impossible to rank them unless you pull in all the data. The software that I wrote allowed reporters to do that type of thing, and to do it quickly. It saved them a ton of time, and they used that to inform the types of stories they were writing, like riding profiles or background information. And I think that’s a really good use case. Even though the tool generated stories, like a 600-word story that we knew was 100 percent factually correct, like I said, it’s dull, it’s kind of boring. It’s not something that you would just publish on its own. So I think it’s something that could be used for augmenting your reporting and coverage. But again, you have to be entirely sure that the data going in is correct. You know, what’s that phrase… ‘garbage in, garbage out.’
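Timmons’ point about rankings needing the full dataset can be sketched simply. The riding names and figures below are invented; the design point is that a rank is only meaningful once the metric exists for every riding being compared.

```python
# You can only rank ridings on a metric if you have that metric for every
# riding; a partial data pull silently produces wrong ranks.
# All figures below are invented for illustration.
median_income = {
    "Riding A": 52_000,
    "Riding B": 61_500,
    "Riding C": 48_200,
}

ranked = sorted(median_income.items(), key=lambda kv: kv[1], reverse=True)
for rank, (riding, income) in enumerate(ranked, start=1):
    print(f"{rank}. {riding}: ${income:,}")
```

A research tool built on this kind of complete, verified data can feed riding profiles and background stories, which is the augmentation role Timmons describes.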
TC: So, on the topic of sort of, yeah, using as an aid in your reporting, could you see it being used as an aid for actual writing? And I’ve seen in my research people using ChatGPT to sort of edit things, where they’re going “oh rewrite this in the style of Ernest Hemingway,” for example, to cut down on excess words. Or, I don’t know, helping them produce like a rough first draft and then refine, or helping them generate ideas using it. Is that sort of how you might see it being used, rather than directly with the writing?
LT: Yeah, I think that’s another great use case for it. We’ve been using tools that are really basic versions of this sort of thing for a long time. Everyone uses a spellcheck. In journalism, people use Tansa, and it’s a little more sophisticated than just a spellcheck. But yeah, if you’re asking it to check grammar, and you’re confident that it understands the rules of grammar, absolutely, you can tighten up your writing that way. I’d suggest that being a tight writer from the start, by honing your craft, is probably better than asking the computer to do it for you every time, just in terms of the time you’d save. But absolutely, it’s something you could do. I think the thing is, you’re going to have to know how to deal with these tools in the proper way, how to ask them to give you the output that you want. I’ve been using one in my own work; it’s not exactly in writing, but GitHub has software called Copilot, which uses the OpenAI Codex to suggest code and entire functions in real time while you’re writing. It’s kind of like having a computer be your pair programmer. And it helps, and it’s great. The problem is, it can’t think. And like these AIs, they can’t think. You can get really good at telling it what to do, and it’ll give you kind of the output that you want. But you’re still going to have to debug, and you’re still going to have to verify the results.
TC: Could these also be used for copy editing? Could you not, in the prompt, say, I don’t know, ‘please edit this using the CP style guide’?
LT: Yeah, if it’s been trained on the CP style guide, absolutely. I think it would be great for summaries, if you can submit the full text and say, ‘give me, you know, a great SEO headline for this story,’ or ‘give me a great description or something that we can use on social media.’ ‘Give me a summary of this, that’s under 240 characters,’ and you can learn to prompt it really well to try to get it to do what you want. And as you teach yourself how to do that, I think that’s a great possible use for this.
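The summary use Timmons describes could look something like the sketch below. The prompt wording is illustrative and the model call itself is left out; the design point is that a newsroom tool should verify the character constraint locally rather than trust the model to respect it.

```python
def build_summary_prompt(article_text: str, max_chars: int = 240) -> str:
    """Assemble a prompt of the kind Timmons describes: full story text in,
    a character-limited social-media summary requested out.
    The exact wording here is a hypothetical example."""
    return (
        f"Give me a summary of this story that's under {max_chars} characters, "
        f"suitable for social media:\n\n{article_text}"
    )

def within_limit(summary: str, max_chars: int = 240) -> bool:
    # Never trust the model to respect the limit; check it yourself
    # before anything is published.
    return len(summary) <= max_chars

prompt = build_summary_prompt("CNET paused its AI writing tool after an audit...")
print(within_limit("Short enough."))  # True
```

Learning to phrase these prompts well, and wrapping checks like `within_limit` around the output, is the prompt-craft skill Timmons returns to later in the conversation.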
TC: So you don’t really see, sort of, any widespread sweeping patterns of adoption of this technology. It’s more that individual reporters will sort of work it into their workflow, if and when they can. You don’t see it sort of making gigantic, catastrophic, devastating waves through the industry?
LT: Not at first, no. I mean, who knows how fast the technology is going to change? I think if you can incorporate it into your workflow, it’s going to help you out a lot. But again, you have to be wary of the pitfalls that we talked about. The thing that bugs me, and worries me a little bit, is that it is sophisticated enough now to write stories if you don’t care whether or not they’re completely accurate. And it’s pretty easy to set up a news website, throw some terrible ads on there, or Taboola or whatever, and just start printing stuff. We’ve seen the sort of devastating effects that misinformation can have in democracies and around the world. And with this sort of tool, you could set up a website pretty quick; you could actually ask it to design the website for you, and it would probably write you a lot of the code. And then, if you connect to it through their API, you could have it write stories for you based on certain topics. And if you don’t care whether or not they’re correct, you put them online and just see if you can make some money that way. And that’s sort of something that we’re going to have to worry about. It’s not ethical by any means. But you know, it’s the internet.
TC: So your biggest concern is that this could be used as a tool for the forces out there sort of competing with audiences for attention with the reputable journalism industry, rather than the direct effect of this technology on the industry itself?
LT: Uh, yeah, that’s what I’m most worried about. Like, I don’t think we’re at a point where it’s going to start replacing people in real, you know, legitimate newsrooms. I’m sure a lot of the companies would love to try, but it’s just, you know, those lawsuits can be expensive if you print something libellous. And your reputation can be damaged pretty quickly if you print something that is completely untrue. But the bad actors out there don’t care about those things. They just, you know, they’ll make their money and they’ll shut the websites down if something bad happens. You know, that’s what bugs me. And, you know, I’m not going to name any specific sites, but you know exactly who I’m talking about when I say that. You know, clickbait farms and that type of thing. You know, sadly, they can drive discourse, especially if you’re confirming biases that people already have, even if they’re not true. You know, the world’s pretty politicized, especially now, especially online.
TC: And something you mentioned at the beginning of that particular response there, you don’t think this is going to have a particularly marked effect on sort of staffing levels in newsrooms. Do you think there’ll be sort of a qualitative change in staffing? As in new hires in newsrooms, from now on will sort of be expected to have the ability to sort of interface with these tools and possibly even manage them? So yeah, more qualitative change to the type of skills that will be required from people because of this?
LT: Yeah, absolutely. For example, you know, if you’re doing research, you can type in the subject that you’re researching into Google, and you’ve got like your top 10 million results or whatever, right, and you probably don’t go past the first page. But if you know how to use the search engine properly, you know, you can do some really powerful stuff. You can look for specific types of documents, you know, you can exclude certain words, you can require certain phrases. Like, there’s a lot of really advanced sort of searching you can do on Google, for example. And that’s a great skill to have, if you’re trying to do journalism well. So by that same token, if you’re able to learn how to write these prompts correctly, and get what you need out of this tool, you’ll be at an advantage, absolutely, if you’re looking for a reporting job. Whether or not that’s going to be something that newsrooms explicitly ask, I don’t know. You know, I don’t think that’s going to be something that’s explicitly asked for, at least not in the short term. But if it’s going to make you faster, or make you better, then absolutely it’s a skill you want to have to put yourself out there, you know, stand out from everyone else who’s looking for work.
TC: Do you think that news organizations themselves will be able to develop their own large language tools, to ensure that they know exactly how the algorithms work and what data goes in, possibly training them using a corpus of their own articles? So they can shape the output, to the best of their ability, into something that they want and something they can rely on? Do you think that is a prospect? Or do you think these will remain, for the foreseeable future, the preserve of the major technology companies that are currently producing them?
LT: I would love it if newsrooms did that. I think it’s going to come down to the fact that that’s expensive to do, and you know, they’re not exactly flush with cash right now. I think it’s totally worth it, though. Like I mentioned before with the election research bot that I created: if you could train a model using that information, and sort of get it to understand that, you could find amazing story ideas in there, or look for outliers, or that type of thing. It’s just going to come down to it being time-consuming and expensive to do, and those are two things that newsrooms typically avoid. But the tools are becoming more and more democratized every day. You can learn machine learning from tons of different websites now for free, you can get TensorFlow, you can go to GitHub and find people who’ve written code. You can stand on the shoulders of giants, I guess, is the phrase, and learn to do this sort of thing yourself and get better at it. But it all really comes back to what we talked about before in terms of accuracy: are you going to be able to train a model that you’re going to be happy with? That is going to be accurate? That seems like a pretty difficult thing to do.
TC: So, something that makes me wonder: for me, it’s not just a question of accuracy, building your own tools like this rather than relying on technology company-supplied ones. For me, it’s also a question of being able to be fully transparent and accountable to your audience. Because if you can’t tell them exactly how your content is produced, does that not throw your legitimacy as a professional journalism outfit into question?
LT: Yeah, I think that’s a great point. In the work that we do, that I do, at Torstar, the stories that are automatically generated have, you know, some boilerplate text that says that on them. We’ve got a landing page for the Open Data Team, which explains the type of stuff that we do. And I think that is absolutely important in terms of transparency and trust. And then every newsroom is going to have its own statement on, you know, ethical guidelines, whether they’re part of the Trust Project or not, that type of thing. So yeah, if you can’t explain how you came up with a story, I think it is going to be pretty hard to get people to trust you. And if you’re a legitimate news outlet, that’s something you’ve got to be worried about. But like I said before, the thing that sort of bothers me are the ones that don’t care whether or not they’re legitimate.
TC: As my final question: we’ve talked about how transparency about the fact that a piece of content has been generated by AI is fundamental. This might be an unfair question, but are there any other ethical best practices you think are completely essential for if and when these large language models come to be used in newsrooms more widely? Things that are actually essential, that have to be done, for you?
LT: Yeah, I would say, you know, informed consent. If there are models that are being trained (let’s say there was a specific journalism model out there, being trained on news stories), is the source of the news stories aware that it’s being used to train this model? And has it consented to its content being used to train a model? There’s a bunch of… is data online free? There are big debates there that we’re not going to get into. But if a story is posted online, does that mean it’s free for one of these models to take it as part of the corpus and train on it? I don’t know. I think that’s an area worth investigation. So yeah, whether the data is sourced ethically, I guess. Which is kind of a strange phrase. But what’s going in there, I think, is absolutely something, and then, you know, look at how it’s being used in terms of ethics as well, because…
TC: It took so long for this profession to start devising ethical codes of practice. And now, it looks like, we’ll soon have to rewrite them overnight to incorporate this technology.
LT: Yeah, it’s innovation, right? Like, you know, things happen so quickly. And if there’s a sea change, like you’re always struggling to catch up, that’s the way it’s always going to be. And it’s, you know, it’s a good thing, because innovation is sorely needed in news. But it’s not so much whether you can do something, it’s if you should do something, and I think we don’t often take the time to think about that.
TC: I think that’s an excellent way to end this discussion: taking a look ahead, in that way, at the challenges that need to be confronted. Thank you so much for your time and your expertise, and for going through all that with me, Lucas.
LT: Oh, you’re very welcome!
SL: We hope you enjoyed that conversation. Stay tuned for the next five episodes, which will tackle five intriguing stories and introduce you to some talented journalists from the Review of Journalism. From Silas Le Blanc, Tara De Boer, and Tim Cooke: thanks for listening to the first episode of this season of Pull Quotes, and we hope you stick around for the ride.