Generative AI and the Future of News - a tale of two worlds

Jul 27, 2023

Last week, the New York Times reported that Google is testing an AI tool that enables news organizations to create content using AI. The exact capabilities of the product are unclear, but based on one comment from Google, it helps with tasks like writing headlines and modifying writing styles. Amidst fears about the impact this could have on journalist jobs, Google quickly put out a clarifying statement:

Our goal is to give journalists the choice of using these emerging technologies in a way that enhances their work and productivity. Quite simply these tools are not intended to, and cannot, replace the essential role journalists have in reporting, creating, and fact-checking their articles.

This has sparked a heated conversation about the future of news with generative AI. The companies making AI products argue that this technology will enable media organizations and journalists to be more effective, while critics contend that it could potentially harm journalist jobs, increase misinformation, and flood the market with low-quality AI-generated content.

In this piece, we unpack the impact of generative AI on the future of news by exploring a few things:

Types of news articles today by complexity
The business of news
The anatomy of producing a news article
What generative AI can and cannot do well
Generative AI and the Future of News - A tale of two worlds

Types of news articles today by complexity

There are several types of news content on the internet, each with varying levels of complexity in their production. The complexity typically arises from factors like timeliness, amount of research required, and the story being told.

The low complexity articles include:

Factual/data news (e.g. an article listing mortgage rates in San Francisco, or an article with numbers from a company’s earnings call) - these are relatively straightforward, with minimal subjective opinions or perspectives
Interest-based/informational news (e.g. summer recipes published on NY Times Cooking, or an article explaining what the Fed interest rate means) - there is some creativity in selecting topics, but the focus is informational and caters to specific interests
Breaking news (e.g. an article about a CEO resigning, or an article describing an active weather event) - these are typically short articles about a rapidly unfolding event, with limited initial information and emerging facts

The high complexity articles include:

News coverage (with context, research, and facts) - these provide a more detailed explanation of the news, backed by research, additional context and often interviews with people, and so require more time and effort to produce; they are also extensively fact-checked
Interpretive news - this includes opinion pieces, op-eds, and analyses (like this one) that provide interpretations / perspectives / opinions on current issues; they are often subjective and require extensive research to back up the perspectives
Feature pieces - these are typically deep dives on topics that might not be hot news right now but are important issues; investigative journalism falls in this category. These require extensive research and interviews, often spanning months, as well as creative storytelling

I bring up this categorization because the purpose behind, and the process involved in, producing these two types of articles are very different, and consequently they will evolve differently with the use of generative AI. With this in mind, let’s talk about how news makes money.

The business of news

The business of news is tricky - most news organizations operate on an ad-supported media model, and a small subset have successfully managed to pivot to subscriptions. This has major implications for the type of content a news organization produces.

The subscription-based news businesses (NYTimes notably now gets ~70% of revenue from subscriptions) have a content strategy that’s tied directly to consumer value - focus on high quality content + diverse interest-based content to make the subscription worthwhile. For example, of the types of content above, NYT primarily does news coverage (that is well-researched), interpretive news and feature pieces for their core news offering, backed up by interest-based content like NYT Cooking and Wirecutter. Some other publications like the Wall Street Journal and the Washington Post have made strides with subscriptions using a similar approach.

However, most publications are still ad-supported and will continue to be so for the foreseeable future. This means that they are focused on generating high engagement: more eyeballs → more ad inventory → more revenue. The strategy that has been most effective for them is augmenting high complexity content (like news coverage, interpretive news) with a ton of high volume, low complexity content (like factual/data news, interest-based/informational news).

This strategy works because high complexity content helps deliver long-term value to consumers, while the ton of low complexity content helps get eyeballs as well as win the SEO game. The SEO loop looks something like: high volume of content that gets clicks → search engines think your content is valuable → all of your content gets ranked higher → more eyeballs.

This is no criticism of the strategy - playing the SEO game is a necessity for ad-supported media. The reality is that news media is a terrible business - the internet disrupted how content was created and distributed (mostly through Google Search / Meta today), and news organizations have not recovered from the impact of that disruption. There are attempts to fix the imbalance, which are still evolving - Canada/Australia passed laws that force Google / Meta to pay what’s essentially a “disruptor tax” to support good journalism (full analysis here), some organizations like NPR are partially supported by federal funding, and some newspapers like the Washington Post are subsidized by (generally well-meaning) billionaires.

All this to say - news media / journalism is an absolutely essential public utility for a well-functioning democracy but a not-so-great business. Therefore, the more self-reliant these news businesses can be (without having to depend on regulation or billionaires), the more effective they can be towards achieving their mission. Generative AI can’t solve all of these structural challenges (especially content distribution) but it can definitely make content creation more efficient without necessarily compromising on quality.

The anatomy of producing a news article

To understand which part of news production generative AI will have the most impact on, it’s helpful to understand the different steps involved in getting a news article published.

We can break down the effort into a few sequential components:

Research - finding a topic, gathering facts / data, talking to the right people for quotes or perspectives
Storytelling - making sense of the research to come up with a compelling storyline
Writing - turning the storyline into actual words that make an article, buttoning up the storyline, backing it up with the right data / adding supporting links
Editing - improving the writing, coming up with catchier headlines, having better visuals, fact-checking to meet standards
Distribution - publishing, distributing across channels like social

Based on some research and also influenced by my writing experience, I estimate that the effort of producing an article breaks down into something like: Research (30%), Storytelling (20%), Writing (20%), Editing (20%) and Distribution (10%). Take the actual numbers with a grain of salt, but they are directionally accurate.

Now, note that not all formats of news articles require all of the steps - the low complexity pieces might not need to rigorously go through all steps in the life cycle above but the high complexity pieces do.

As an example, this article took me ~8-9 hours to finish (so did most other articles on my Substack), and I would like to think that my articles fall under interpretive news. If I wrote a low complexity piece, I could probably get it done in ~2 hours. Another data point - this NYT reporter says he can typically get a news article done in a couple of hours while a feature piece might take up to 6 months.
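To make the rough split concrete, here is a minimal sketch that applies my estimated percentages to a total time budget. The names (`EFFORT_SHARE`, `hours_per_step`) are made up for illustration, and the 8.5-hour input is just the midpoint of the ~8-9 hours mentioned above, so treat the output as directional only.

```python
# Rough effort shares per step, taken from the estimate in the text.
# These are directional guesses, not measured data.
EFFORT_SHARE = {
    "research": 0.30,
    "storytelling": 0.20,
    "writing": 0.20,
    "editing": 0.20,
    "distribution": 0.10,
}


def hours_per_step(total_hours: float) -> dict:
    """Split a total time budget across the production steps."""
    return {
        step: round(total_hours * share, 2)
        for step, share in EFFORT_SHARE.items()
    }


# Assumption: 8.5 hours, the midpoint of the ~8-9 hours for an interpretive piece.
breakdown = hours_per_step(8.5)
for step, hours in breakdown.items():
    print(f"{step:<13}{hours:.2f} h")
```

Swapping in ~2 hours for a low complexity piece gives the same proportions, though in practice those pieces skip or shorten some steps, so the split matters less there.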

So, what can generative AI actually help with? It basically comes down to what steps in the news production process can current (and future) products do well.

What generative AI can and cannot do well

Here’s my take on each step. Spoiler - current products are somewhat effective for writing and editing, they can be effective for research if the right products are built, and they will continue to be bad at storytelling.

Research - surprisingly low capability

Most current generative AI products (like ChatGPT, Google Bard) have surprisingly low capability when it comes to Research:

There are definitely certain capabilities that they are good at. For example, they are good at giving arguments for a particular point of view or providing inspiration for new topics.
They are average-to-good at summarizing content and in particular, answering questions based on reading an article. For example, you can ask them to read this article and list out different steps involved in news production.
However, they often produce factually incorrect information (“hallucinations”), do not provide links / sources to what they claim is the truth, and have questionably sourced data with potential copyright infringement. See screenshot below.

(left-ChatGPT) Good at creating an argument vs (right-Bard) bad at giving fact-checked information or sources

For a generic use case (say you want to draft an email to a future LinkedIn connection), these issues don’t matter. But when you are writing a news article, not fact-checking can severely impair your brand perception.

Though the research tools can be used for directionally useful research (for example, you want to know how much market share Bing and Yahoo search have), writers / journalists still have to do additional work to find a new source to link because current tools neither provide links nor have fully accurate data.

Using cleanly sourced data + being able to provide answers with trustworthy links is a big opportunity for anyone building a research product for writers, and it’s very likely that new companies will emerge here.

Storytelling - low capability

Generative AI’s storytelling capabilities are fairly weak today. Here’s an example: I did a bunch of research for this article, put together the research in an organized format and asked ChatGPT to give me a storyline. See screenshot for results.

At first glance, the storylines look like they “make sense”. But that’s literally all there is - they make sense at a surface level. In reality, none of these storylines are very compelling, and the research notes I provided had much more detail and nuance than the surface-level results provided here. You can make the argument that these are still decent, but in the context of writing an article, this was completely useless to me - at best, this was inspiration/ideas, not a storyline based on the research I submitted.

(left) Research notes fed to ChatGPT, (right) storylines generated

If you have a low complexity story, it will get the job done for you. If you are building a nuanced story, or have the data and need help constructing a story, current products do not get the job done. Given the high amount of subjectivity involved, I am not bullish it will get better and this will continue to be the part where writers can add most value.

Writing - medium capability

If you provide current products with a detailed storyline of what you want to say, they can generate a barely decent v1 of the content. The output is still pretty cookie-cutter and it is pretty challenging to instruct the current models to craft the language in a way that tells a story. See screenshot for an example output after I fed ChatGPT detailed notes about the storyline for this article.

At first glance, you might think it looks okay for a v1 draft. It is not - the tone isn’t right, the story doesn’t flow and it always looks like it’s been written by a bot. It’s very generic and does not articulate the story despite being fed a very specific narrative. If I published that article, you would not read it. And that’s the challenge with the writing capability today - it can work for low complexity articles but for anything more complex, you essentially have to rewrite the whole draft.

(left) detailed storyline fed to ChatGPT, (right) v1 draft of article

The big product unlock here would be enabling effective human instruction - writers don’t want to take terrible cookie-cutter versions and redo the whole thing line by line. What they would like is some form of user interaction construct that lets a writer feed in a storyline and sequentially craft the article section by section, while giving active feedback to the AI tool. The cherry on top would be being able to customize the writing style by feeding in past articles written by the same writer.

I think the underlying models today do have the capability to do this and there is innovation needed at the UI layer, which I believe will happen near-term.

Editing - medium capability

Tools today have a good amount of capability to review articles, find errors and make corrections. These tools are also very good at cosmetic tasks, like coming up with ideas for catchy headlines or section titles that might perform well.

However, there is still work needed at the UI layer to make this usable for editing - there are some partial solutions today like Notion AI which lets you improve language and make sentences shorter / longer from within a Notion page, but it does not capture the full context of the page. ChatGPT does a good job of making edits across a full article but lacks the ability to take instructions to easily edit specific sections and also cannot support links (i.e. I give it a blurb with text that has hyperlinks, I get back text without any links).

I absolutely hate the editing part of the writing process, and I am sure several writers and journalists do as well - generative AI can most definitely minimize some of this editing grunt work near-term.

Distribution - medium capability

There are tools emerging here that, for example, help you generate social snippets or identify parts of your story that might go more viral. These will likely continue getting better in the future.

Generative AI and the future of News - A tale of two worlds

Based on the above analysis, you can see a clear duality emerging:

Low complexity formats, which require simpler research & storytelling, and can be written & edited easily, will start to be more and more AI-generated (or heavily AI-assisted)
High complexity formats, which require more complex research and storytelling capabilities that don’t exist today, will continue to be primarily created by journalists but generative AI can bring a good amount of efficiency by minimizing writing and editing grunt work

While AI-generated low complexity content seems bad at first glance, these articles are primarily written for SEO purposes or to augment existing high quality content, and the race to commoditization started long before the generative AI wave. For example, the Associated Press has been using bots to publish articles reporting company earnings since 2014. The upside here - this is not the kind of content journalists want to spend time creating, and automating it will free up their time for high complexity content.

More and more high complexity content will emerge. Research and storytelling capabilities with products today are limited, which means that the ability to construct a compelling story based on information and story-tell in a unique way will continue to be the biggest currency journalists have. This, accelerated by AI writing and editing tools that take the grunt work out of publishing quality content, will be a big boon for journalists.

What about some of the concerns raised around using AI for news? Some are fair but I believe they are mostly solvable:

Wave of junk SEO content - Google has taken the stance that they won’t penalize AI-generated content and have gotten criticism for opening up a world of junk content, including factually incorrect articles. That’s a somewhat fair critique but I believe Google will crack down on this - not out of the goodness of their hearts but because the crux of a search product is to give users useful results. Google already has penalties for SEO hacking practices (like keyword stuffing, link farming) and it would be easy to extend this framework to AI content.
Journalists will lose jobs - Some have brought up the concern that journalists might lose jobs, or be in a less advantageous position (like writers in Hollywood who are currently on strike). The big difference here is that while there are a large number of writers in Hollywood, newsroom employment in the US has fallen 26% since 2008, not because we don’t need journalists but because news is a bad business. Generative AI can help fix the economics of the business while continuing to empower journalists to do what they love.

Conclusion

I am by no means an AI maximalist and I absolutely think there are real risks to AI which need to be addressed as we scale the technology. However, I think it’s important to analyze every AI-impacted market / situation separately and not bundle them into one big AI impact problem.

In the case of news, generative AI can tremendously improve the economics of news businesses. The products are not there today - there is a clear need for AI products that thoughtfully solve writers’ needs without a brute-force, chat-based, language model interface, but I’m confident these will emerge near-term.

Low complexity writing will be more and more AI generated and that’s okay - it can help companies run an efficient SEO machine, while journalists (assisted by AI for grunt work) bring to life a lot more high complexity content that elevates the public discourse.

See you next week!

Unpacked
