Courtesy iStock

The growing privacy concerns around using transcription services like Otter.ai, Descript, and Trint

Who's Listening

When Michael Lista went to stay 10 days at a lonely, one-storey motel in Emerson, Manitoba, he overpacked. His suitcase was full to the brim with clothes he didn’t end up wearing. It turned out that all he needed were his travel essentials: a pair of jeans, a pair of boots, a parka, and, most important, a beat-up Sony voice recorder, complete with scratches on its small LED screen and dust in the crevices of the speaker grilles. For Lista, a self-described Luddite, that palm-sized device is his baby, the single most important tool for his line of work.

His room at the motel was a relic of a past long gone. A frayed, low-pile carpet lined the floor, and the beige wallpaper was stained by humidity and the passage of time. Lista had no complaints, though, for two reasons. The first was that the kind folks who managed the motel charged less than $50 per night. The second was that the Maple Leaf Motel was exactly where Lista intended to be. His goal was to tell the story of his next-door neighbours: asylum-seekers who, fearing President Donald Trump’s anti-immigration regime, fled the United States and ended up in a small town with a population of fewer than 700, some 200 metres north of the border.

Lista chronicled the story of refugees like Ahmed, a gay man who endured capsizing migrant boats, deadly spiders, a robbery, and imprisonment during his three-year journey from Ecuador to southern Winnipeg. He shared the story of a young man named Koffi, who feared deportation back to Ghana. Emerson, being so close to the U.S. border, was a town filled with Odyssean tales like these, of people who were forced to cross the lines of legality to escape the perils of their home countries. And because their stories so often involved breaking immigration laws, Lista had a concern: would he be putting his sources in jeopardy simply by speaking with them?

This concern explains the significance of Lista’s recorder. Had he recorded the interview on a device that uses cloud-based storage, like a cell phone or a newer laptop, Lista’s saved data, and thus Ahmed’s confidential information, might have ended up on a server in Silicon Valley. Lista would be powerless should the U.S. government choose to subpoena the information on those servers. In short, by allowing Apple (through iCloud) or Google (through Google Drive) access to his interviews, Lista believes he would be failing his moral duty as a journalist to protect the confidentiality of his sources.

That moral duty, he says, is to protect the unspoken contract between source and reporter: in return for sharing their story, sources are promised fair representation and protection of their information. Lista says the confidentiality promised by a journalist to their source, regardless of the content of the information disclosed, should be no different from that guaranteed by a priest in confession or a lawyer to their client. To voluntarily grant third-party companies access to this information is an abdication of this promise. As recent advancements in digital tools change the practice of journalism, the contract Lista describes, between reporter and source, is increasingly eroding.

And so Lista remains firm in his conviction that, as a reporter, particularly an investigative one, he should always employ an “air gap” with his recording devices. An air-gapped device is completely isolated from network connections, with all processing and storage handled on the device itself. Lista’s recorder is an example: it has no capability to connect to the Internet and it does not automatically sync its data with other devices.

But air-gapping is only one step in the protection of source confidentiality. Lista has other recommendations. He conducts all phone interviews, if possible, via landline because governments are much more capable of obtaining information from cell phones (and also because the higher sonic quality of landlines allows for a quieter, clearer, and more intimate conversation that helps build greater rapport with sources). He avoids cloud-based word processors like Google Docs and he never, ever, uploads his interviews to transcription websites.

Today, transcription is a facet of journalism. Writing, or typing, out the entirety of a conversation ensures that the journalist can quote sources accurately, word for word, and in the context the speaker intended. For journalists, there are few sins as unforgivable as an inaccurate quote. To spare readers from inaccuracy, as is their duty, some journalists transcribe their interviews.

Among journalists, the methods of transcription are varied. Freelancer Isabel Slone plays the entire recording back with her headphones in and types away on her keyboard. CBC reporter Stu Mills prefers to use his background in radio to his advantage. He notes when his interview subject says anything that would be suitable for a radio clip, and then he extracts that specific audio clip and transcribes it. “You’re looking for the value of the experience, where they’re not just saying ‘the truck weighed 3,400 kilograms,’” says Mills. “But if he said ‘the truck looked like a beached whale laying on its side…’”

The variety expands to some journalism classrooms. Carleton University journalism professor Paul Adams teaches an exercise in which his students are given a long section of an interview to transcribe word for word. But he doesn’t give specific instructions on how they should go about it. Adams is noticing a new trend among his students: their reliance on automatic transcription services.

Today, many transcription services are AI-based, using complex algorithms to convert speech into text. Companies that offer AI transcription, such as Trint, Otter.ai, Temi, and Descript, use machine learning to improve the accuracy of their algorithms. Put simply, these services identify patterns by parsing user data. The more user data they have to parse, the more patterns they can identify, which leads to a better, more accurate final result. There are several advantages to transcribing through these platforms. They can be accessed on multiple devices: a journalist can upload an interview on their computer and then access the transcription on their mobile phone at the touch of a button. But AI transcription’s greatest asset is its speed: hour-long interviews can be fully transcribed within minutes.

Four years later, Lista remembers the details of Ahmed’s story. Ahmed was beaten to the brink of death for his sexuality in his home country of Ghana. He flew to Ecuador to begin a journey to the United States, in an attempt to obtain refugee status. In Colombia, he was robbed of all his belongings, save a photocopy of his birth certificate and some cash. When Ahmed finally reached the U.S., he was imprisoned and ordered to return to Ghana. Ahmed’s life, and those of the other asylum-seekers Lista spoke to, were the stakes of the story he was about to tell. Had Lista used a transcription service, there would be a chance of giving the U.S. government access to the information of his sources. It’s not impossible that a simple click of the “upload” button could have sent Ahmed back to Ghana.

But it didn’t. In early 2017, Ahmed obtained refugee status, which would protect him from persecution outside of Canada. As he reported the story, Lista, perhaps due to his unwavering conviction that source information must be protected, did not end up jeopardizing the lives of those he interviewed. It may have been Lista’s methods (his air-gapping of his recording apparatus and his refusal to involve networked software in his reporting) that protected the privacy of Ahmed and others.

Or maybe there was no actual risk at all. Trint, a U.K.-based AI-transcription platform, promises users that all user data is encrypted using AES 256, a famously rigorous encryption algorithm used by the U.S. government. Otter.ai states in its privacy policy that it will “not share Personal Information or Customer Data with others except as Customer requests per written instructions or by sending a message.” Descript, a company based in San Francisco, pledges HTTPS encryption and claims to employ a “Data Protection Officer.”


But privacy and technology lawyer Aaron Baer warns against blind faith in what companies present as their privacy policy and their terms of use. The problem is twofold, says Baer. The first is that companies are required by law to have privacy policies and that terms of use are almost always broad and in the company’s favour. The second is that there is no guarantee that companies are following policies or that they even know what their policies are. “The reality is,” says Baer, “you don’t have a clue, with any app or any platform that you’re using, what is happening with your data.”

In February 2020, U.S. officials were reported to have said that the Chinese telecommunications giant Huawei, through backdoor access, is capable of covertly obtaining data from companies that use its equipment. Huawei, already the world’s largest supplier of telecommunications hardware, is poised to become far and away the global leader in 5G equipment. This is hugely troubling because AI-based services, like transcription, are soon to be carried out exclusively on 5G networks and equipment, thanks to 5G’s unmatched speed and efficiency.

Huawei, being based in China, is legally mandated to relinquish obtained data to the Chinese government because of the country’s intelligence laws. On top of that, the Chinese government has “considerable sway over all Chinese private companies,” and, in 2018, even awarded Huawei $222 million in government grants, according to reports by the Council on Foreign Relations.

In other words, because a) it is a near certainty that machine-learning, AI-based tools like transcription will soon operate exclusively on 5G equipment, and b) Huawei is asserting its dominance in the 5G market, user data uploaded to these services will be at risk of being collected by Huawei and handed to the Chinese government. Journalists who use transcription services would therefore be giving Huawei and the Chinese government access to source information. This is one of the key reasons why Lista describes the use of transcription services on the part of journalists as madness. “It’s a compromise of the chain of information gathering—allowing Huawei into the 5G network,” he says. “I can’t believe people [use transcription services]. It’s so irresponsible.”

The fear that companies are accessing private user data is not just a futuristic hypothetical. Last year, Google admitted that more than 1,000 private conversations from users in the EU were leaked by a third-party company that was hired to analyze the voice clips. The tech giant promptly halted voice transcriptions done through their Google Assistant app within countries in the European Union.

This incident revealed compounding problems in the realm of data privacy. Not only did a company leak user data, uploaded solely for transcription purposes, but it was also revealed that Google was sharing unencrypted information with hidden third parties. This case is emblematic of Lista’s concerns: by granting access to source information, voluntarily or involuntarily, the journalist cedes their capability to protect that source. Worse still, many privacy policies, including those of Otter.ai, Descript, Temi, and Trint, state that the companies will comply with any legal obligation to disclose personal data. Though a standard phrase in privacy policies, it is precisely what Lista fears: the possibility of a subpoena.

“We need to be more diligent than everyone else,” says Lista. “In the same way when someone speaks to their priest, or lawyer, or doctor, we need to circumscribe the boundaries of our conversation and protect them at all costs.”

Ann Cavoukian, a world-leading privacy expert, offers a more hopeful perspective on the future of automated transcription. Cavoukian believes that consumers are becoming increasingly conscious of privacy concerns when using services offered by digital companies. In 2009, when she was the information and privacy commissioner of Ontario, she published a document outlining the principles of Privacy by Design (PbD), an engineering concept in which privacy is a primary concern and is embedded in every stage of building a system. She describes PbD as being able to offer a competitive advantage for businesses because their clients actively seek privacy.

Privacy by Design would certainly be welcomed by journalists as an assurance that their information could not end up in the wrong hands. PbD is a concept that is becoming increasingly popular in modern networked data systems, including transcription services. One example of a transcription service that adheres to some principles of PbD is Descript. Descript is open about its privacy-ensuring process, going as far as to explain the method of encryption the company employs on a webpage solely dedicated to the security of its platform.

In the same vein, Sam Liang, co-founder and CEO of San Francisco-based Otter.ai, says that his company is dedicated to user privacy. But, like Cavoukian and Baer, he concedes that journalists who use the service are ultimately left to rely on blind faith that companies, his included, are following the policies they lay out. When asked how he can truly guarantee users that their data is secure, Liang deflects, comparing it to using other third-party sites like Zoom or Dropbox. Liang sees journalism as a collaborative medium between writers, editors, and sources, in which data has to be quickly transmitted and processed by different parties constantly. For that reason, the use of third-party companies in journalism is a norm, says Liang. And it’s true. Some journalists record calls with a third-party app such as TapeACall or store recorded interviews in Dropbox or Google Drive accounts.

And so, Baer’s warning that users will never truly be able to know what happens with their data, no matter how trustworthy a company appears, persists. The future of privacy in transcription services may be bright, but Cavoukian agrees with Baer. By using them, a journalist will always be leaving the conversation they have with a source in the hands of businesses. She recalls a morning when she and a reporter from The Globe and Mail spoke confidentially. “If I thought what I was saying to him was going to a transcription service…I would never speak to him!” says Cavoukian. “I wouldn’t give him any information. Why would I do that? There’s got to be a trust and respect between a reporter and who he’s talking to.”

But what if Cavoukian and the Globe reporter weren’t speaking in confidence? What if they were having a conversation about their favourite way to scramble an egg or any other topic of zero sensitivity? It may be that the distinction between sensitive and non-sensitive information is one to be made by the journalist. Adams says he would have grave concerns about privacy if he were interviewing a whistleblower or dealing with a sexual assault story, but that his concerns would be lessened if he were interviewing a subject as public as Justin Trudeau, for example.

In cases where an interview topic does not even register on a hypothetical spectrum of information sensitivity, the permissibility of transcription service use shifts from an “absolutely not” to a grey area with which many journalists are familiar. Journalistic questions can often be broken down into questions about ethics, to which the answers are often ambiguous. Cavoukian raises an argument about consent. “Why would anyone give information to a reporter,” she wonders, “if it can then go anywhere without my consent whatsoever?”

Maybe it’s time for journalists to begin asking for consent if they intend to upload a conversation to a third-party transcription service—another addition to the list of journalism best practices, right next to confirming that an interview will be on the record.

Or maybe journalists should seek alternative methods of accurately quoting their sources. They could use Slone’s approach of getting dialed in and typing away. Or they could isolate audio clips and transcribe only certain selections, the way Mills does. Or maybe they should be more like Lista, who doesn’t transcribe at all. Lista, a staunch believer in the principle that the tone and emotional context of his sources are lost in transcription, simply hits the playback button on his beat-up Sony voice recorder and types the quote he wants directly into his piece.

Maybe Lista really is a Luddite overly alarmed by privacy issues. Or maybe it’s not a coincidence that late in the year of their initial interview, Ahmed lived in an apartment in Winnipeg, where he was hard at work as a volunteer, helping other refugee claimants in need.

Lucas Lee’s Ranking: Transcription Services

A comparison of available tools based on privacy ethics, security and handiness

WITH THE ADVANCEMENT of artificial intelligence and machine learning, one task performed by many journalists has changed: transcription. Artificial intelligence makes converting speech to text faster and simpler than ever. But which method of transcription is most secure? Which platform protects the conversations shared by journalists and sources? Here’s my (worst to best) ranking of the more popular transcription services available:


7. Rev (human transcription)

Rev is a San Francisco-based company that offers transcription at a rate of $1.25 (U.S.) per minute, done by human freelancers (called Revvers). Although Rev promises TLS 1.2 encryption, the fact that Rev employs people to listen to recordings means the company lands at the bottom of the list. It is recommended that journalists refrain from granting access to interviews and recordings to strangers, despite the fact that Rev claims that all Revvers are vetted and have signed NDAs. Rev does offer AI-based transcription, which is the better option to process information divulged by sources.


6. Temi

Temi, another San Francisco-based company, charges 25 cents per minute for transcription performed by its proprietary algorithm. Temi’s low ranking on this list is due to several issues I have with its privacy policy. The first is its vague description of what it considers “personal data.” Temi defines personal data as: “when you voluntarily provide such information such as when you register for access to the Services (for example, your name and mailing address), use certain Services, contact us with inquiries or respond to one of our surveys.” The lack of definition of “certain Services” lands Temi at the sixth spot.

Another area of concern is that by using Temi, the user agrees that their “Personal Data” may be transferred in the event of corporate sale, merger, reorganization, dissolution, or a similar event. The vague nature of both descriptions is why I recommend against using Temi.


5. Otter.ai

California-based Otter.ai provides, in my view, a more comprehensive privacy policy than Temi’s. Otter.ai lists the privacy regulations by which it abides: EU customers are granted rights under the General Data Protection Regulation (GDPR), which is one of the most stringent pieces of privacy legislation ever passed. But Otter.ai is far from perfect. In regard to customer data, users grant the company “a worldwide, non-exclusive, royalty-free, license to access, process, copy, export, and display Customer Data.” The broadness of this language is one of the reasons I do not recommend the Otter.ai service. Another issue is this: Otter.ai claims to encrypt many of its services, not all.


4. Descript

Descript’s security policy is comprehensive and transparent. The company claims to employ a Data Protection Officer accountable for the implementation of its security practices. The company claims it is compliant with the GDPR and the California Consumer Privacy Act. But I do not recommend Descript either, far from it. Descript does not have a proprietary machine-learning algorithm: it uses Google Cloud Speech-to-Text for its AI transcription and Rev for its human-powered transcription. This means that Descript is sharing user data with Google and Rev. Trusting Descript with information means trusting Google and Rev as well.


3. Trint

U.K.-based Trint has a lot going for it as the transcription service of choice for journalists. In fact, it was founded by a journalist—an award-winning war correspondent named Jeff Kofman. Trint claims it is ISO/IEC 27001 certified, which is a standard of security requirements created by the International Organization for Standardization. Trint claims it encrypts data with TLS 1.2 during upload, and uses AES 256, a very secure algorithm, while the data is at rest. But I don’t recommend Trint either.

Trint, like all other transcription services listed before it, is legally compelled to disclose user data if need be. Its policy states that the company may “disclose your personal information to third parties, the courts and/or regulators or law enforcement agencies in connection with proceedings or investigations anywhere in the world where we are compelled or believe it is reasonable to do so.” By accepting this clause, the journalist relinquishes any control over source information.


2. oTranscribe

One transcription service I do recommend is oTranscribe, because oTranscribe is not really a transcription service at all. It is a web page that allows users to upload their recordings and start typing away, themselves. Manually. The benefit is that the platform allows users to avoid switching from window to window when transcribing. And it’s secure too—the recording and transcription files never leave the user’s computer, and oTranscribe can prove it, because unlike all the aforementioned companies, oTranscribe is open source.


1. Sitting your behind down, plugging headphones in, and going all-in

The absolute best way to transcribe interviews is the old-fashioned way. Sources should feel confident that journalists will keep the information they share, whether or not confidential, away from the hands of companies and governments. The only way to fulfill that duty to sources is to keep all information away from third-party companies. My recommendation is this: avoid using TapeACall. Refrain from storing any source information using cloud services, like Google Drive, iCloud, or Dropbox. Never use online word processors like Google Docs. If using Microsoft Word, disable auto-syncing. And, for the love of God, never use AI transcription.

This list, and the opinions stated within it, are purely the author’s.

—Lucas Lee
