Over a year of graphing the news

The anniversary passed without my notice, but I realized I’ve been running my little hobby project to track US newspaper stories for over a year! It is by far my longest running project not tied to a paycheck, and I think I’ve written 12 posts on the project over that time:

And that’s with my life getting dominated by COVID-19 response work since April! This will be post 13, and is going to serve as a sort of summary of the work I have done over the past year. Data is from 08.09.2019 to 08.31.2020.

As with most pipelines, the bulk of the code writing happened at the start. All the scripts to grab newspaper front page PDFs from the Newseum’s website, identify headlines and group the text into stories (see example below) had to happen before I could start comparing stories from different newspapers!

Above: Before/After visualization showing how I identify the different possible headlines in the front page PDF and then associate the story text with the headline.

After that, I needed some method of identifying if stories were on the same topic. Some natural language processing and graph work did the trick, generating these graphs of each day’s stories, which clustered stories on similar topics:

Above: Example graphs of stories from several days. Each green dot (node) represents a single story and each black or red line (edge) links 2 stories on the same topic. Some days, it seemed like there was only a single story in town. Other days, nothing really seemed to be dominating the news. Normally 1-2 topics were the clear front runners.

At this point, I turned my focus to the largest clusters and started identifying which topic was such a big deal that so many different stories were being written on it that day. There was a bit of manual effort in this step, although the computer did most of the work. Over a year later, a lot of news has come and gone…

The top story for each day over the past year+. Far too many to visualize!

But now that I have a year’s worth of data, I can go back and see which of those topics can be grouped together. For example, stories about the Democratic primary contenders, Biden’s pick of Harris as a VP candidate, and the USPS delivery issues can all be lumped together into stories about the US 2020 presidential election. This reduces the number of topics from 148 to 22, although it did add another layer of subjectivity to my work.
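A minimal sketch of what that regrouping could look like, assuming each day's top topic is already stored as a short text label. The labels, the META_TOPICS lookup, and the sample days are all made up for illustration; they are not the actual categories or code behind the numbers above.

```python
from collections import Counter

# Hypothetical lookup from fine-grained daily topics to broader meta-topics.
META_TOPICS = {
    "democratic primary": "2020 presidential election",
    "harris vp pick": "2020 presidential election",
    "usps delivery issues": "2020 presidential election",
    "impeachment inquiry": "impeachment",
    "senate trial": "impeachment",
}

def to_meta_topic(topic):
    """Return the meta-topic for a daily topic, or the topic itself if unmapped."""
    return META_TOPICS.get(topic, topic)

# Hypothetical top topic per day (date -> label).
daily_top_topic = {
    "2020-01-27": "kobe bryant",
    "2020-02-04": "democratic primary",
    "2020-02-05": "senate trial",
    "2020-08-12": "harris vp pick",
    "2020-08-18": "usps delivery issues",
}

# Count how many days each meta-topic spent as the top story.
days_per_meta_topic = Counter(to_meta_topic(t) for t in daily_top_topic.values())
for meta_topic, days in days_per_meta_topic.most_common():
    print(f"{meta_topic}: {days} day(s)")
```

Keeping the mapping in one table makes the subjective part easy to see and easy to revise: changing a single entry moves every matching day into a different bucket.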

Looking at the distribution of these 22 topics, something that immediately jumps out is how just a few topics really dominated the year’s news.

Leading the way are stories on COVID-19, followed by domestic presidential politics: first the impeachment of President Trump, then the 2020 presidential election. Only 2 other topics were the leading story of the day for more than 15 days: US international relations (think ‘US abandoning the Kurds’, or the ‘trade war with China’) and police killings and reactions (think protests over Floyd’s killing).

Just looking at these top meta-topics is much easier than looking at the 148 original ones!

Fraction of stories on the top topic to number of newspapers studied. Roughly speaking, if the fraction is 0.8 then you’d expect about 80% of the newspapers to have a story on the topic for a given day. Only showing topics with 20+ days as the top daily topic.

One thing that really jumps out is how clustered these stories are. Only 1 or 2 stories can really dominate the news for a period of time. Another thing to see is just how few other topics led the day’s news over the past year. Of the 376 days of data I collected, 289 of them had a top story which fell into 1 of these 5 groups.

So what to do with all this data? Honestly, I’m still not sure. I’ve done a lot of visualizations, but now I have enough data to start pushing into other realms. Maybe trying to predict how long a topic will keep its leading record? Training a bot to produce similar sounding stories? But I think it’s time to stop fiddling with the pipeline and start figuring out new things to do with the data, or time to find a new project! Maybe go back to some of that agent based modeling.

COVID-19 Check in

I’m fortunate enough to have a job directly contributing to the effort against COVID-19 in the United States. That has meant long workdays, vanished weekends, and no time for the personal hobby projects that go into this blog. Today was a day off though! So here’s a quick dump of all the COVID-19 related analyses I have running in the background.

Coronavirus in the headlines

Coronavirus started showing up on U.S. newspaper front pages in late January, and it has since taken over as the new reality. (Not surprising with how much the pandemic is changing lives, but still impressive to see!)

I’ve visualized this before, but here are some updated word clouds made from all my gathered front pages since late February. Watch how coronavirus literally grows to dominate the clouds, and never releases that dominance:

WC_COVID19_200409
Growth of coronavirus and related terms in the headlines and text of stories from U.S. newspaper front pages. The long bar is my hacky way of forcing the wordclouds to reflect relative word frequency between days.

This dominance shows up no matter how many different ways you slice the data. Here are some visuals of the fraction of papers that mention the word “coronavirus” somewhere on their front pages:

By mid March, almost every paper had a story mentioning that single word at least once. I don’t think that’s ever happened in my analysis before. Even President Trump and his impeachment didn’t get close to that dominance:

A new COVID-19 reality

Despite coronavirus showing up in every paper every day, my previous measures of topic dominance don’t really capture this. My previous work used the similar language between stories to automatically detect when stories were on the same topic. It worked great for showing how important the impeachment inquiry and trial were in the American narrative. But for coronavirus, I never see a huge breakthrough story dominating in the same way:

multiDayFractionRed200409
The top 2 main topics of U.S. newspaper front pages for 2020. Stories that share similar language are grouped into topics. The top 2 topics with the largest number of stories are manually assigned a label (e.g. “Democratic primary”) that links topics between days. The 2nd largest topic is plotted in front of the largest topic.

So what’s happening? The impeachment was a very localized topic. Each day, all the papers were publishing stories hashing the latest event to have come out of Washington DC. The language of all those stories was very similar, so my little algorithm had no trouble linking them together.

COVID-19 is different. It’s impacting everything everywhere. And because of that, it’s really become a new reality in a way that impeachment never did. There are COVID-19 stories about the economy, politics, hospitals, New York City, Washington State, social distancing, … The list never ends. And my little algorithm is seeing each of those as its own topic, rather than the single meta topic of coronavirus. That shows up in the above bar chart. For the past month, nearly every one of the top topics has been about something coronavirus related.

We can see this by looking at one of the graphs the algorithm generates of stories linked into topics and layering on top of that an indication if the word “coronavirus” was in the story:

graph200409Coronavirus
A graph of the many Topics found in front pages from 04/08/2020. Every story (blue dot) that was linked to at least 1 other story (black line) shown. Stories that are part of the top 2 largest topics shown as green and yellow. Any story that mentioned “coronavirus” circled in red.

As you can see, the word “coronavirus” shows up in a huge number of the stories. But there are several separate topics being identified. However, a closer inspection shows they are all about how coronavirus is affecting our lives:

  • The largest topic is on NYC’s death toll.
  • The next is about Trump firing an oversight watchdog.
  • The 3rd largest is on churches finding ways to meet virtually.
  • The 4th is about a nursing home.
  • The 5th covers the acting Secretary of the Navy resigning.

All those stories have to do with the new COVID-19 reality.

This may not have been my best blog post ever, but time is very limited and I wanted to get out what I could!

 

Stay safe out there all.

The only story in town: COVID-19

The exponential spread and unprecedented actions to combat COVID-19 have made the virus the only major story in town. Over the last few weeks, the virus has been dominating the headlines in ways not even the Trump impeachment trial could achieve.

Below is a GIF of word clouds generated from the front pages of U.S. newspapers over the past few months. The time span covers Trump’s acquittal, the death of Kobe Bryant, and Biden’s dominance of the Democratic primary. But for the last few weeks, the only story has been coronavirus.

wordclouds_200322
Wordclouds from words in the headlines and story bodies from U.S. newspaper front pages over the past few months. The size of a word roughly correlates to the frequency of that word that day.

A simple key term examination shows that pretty much every paper in recent days has had a story about COVID-19 somewhere on its front page:

KeytermCoronavirusZoom20200322

That is incredible. Even at the height of the impeachment trial, the same key term was never found on such a large fraction of the papers:

KeytermImpeach20200322

I’ve been tracking this using some text similarity and graphing techniques for over 6 months now. I compare the text from each extracted story to the text in every other story and identify when the stories share similar language. The result is a graph where each node (green dot) represents a story extracted from a paper and each edge (connecting line) means the language of the 2 stories was found to be very similar. An example of this graph is shown below from March 14th, the day after President Trump declared a national emergency over COVID-19:

graph200214

(Not shown are stories that didn’t share any connection to any other story.)
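To make the node-and-edge idea concrete, here is a rough sketch under stated assumptions. It swaps in a plain Jaccard word-overlap score and a made-up threshold of 0.3 for whatever similarity measure the real pipeline uses, and the story ids and texts are invented.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercase word set for a story (a very crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Shared words divided by all words across both stories (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_edges(stories, threshold=0.3):
    """Link every pair of stories whose similarity meets the threshold."""
    bags = {sid: tokens(text) for sid, text in stories.items()}
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(bags), 2)
        if jaccard(bags[s1], bags[s2]) >= threshold
    ]

def clusters(edges):
    """Group linked stories into connected components, largest first."""
    neighbors = {}
    for s1, s2 in edges:
        neighbors.setdefault(s1, set()).add(s2)
        neighbors.setdefault(s2, set()).add(s1)
    seen, found = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        found.append(component)
    return sorted(found, key=len, reverse=True)

# Invented example stories keyed by a hypothetical "paper_story" id.
stories = {
    "paper_a_1": "Trump declares national emergency over coronavirus outbreak",
    "paper_b_1": "President Trump declares national emergency over coronavirus",
    "paper_c_1": "Trump declares a national emergency as coronavirus spreads",
    "paper_a_2": "City council approves new budget for road repairs",
}
edges = build_edges(stories)
top_clusters = clusters(edges)
print(top_clusters[0])  # the three emergency stories form one cluster
```

The budget story never appears in any cluster because it has no edges, which mirrors the unconnected stories left out of the figure.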

On March 14th, the story about Trump declaring the emergency dominated the headlines. It’s that large cluster in the center of the figure. On days when a single story isn’t so dominant, often a second substantial cluster also appears. But from my past work, it’s pretty rare to have more than 2 substantial clusters on a given day.

So for the past 6 months I’ve been tracking the topics of these top 2 clusters. Each day I manually review them and identify what topic they are about. Often, a story will dominate the headlines 1 day and vanish the next. Think Kobe Bryant’s crash–a huge story one day and gone from the front pages the next. Other topics have staying power, like the impeachment trial or the presidential primary. COVID-19 definitely looks like one of those staying stories:

multiDayFractionRed200322
Fraction of newspapers with a story on a given topic. Plotted are histograms for both the first and second largest topic clusters found each day, with second plotted in front of the first.

One important thing to point out: despite dominating the headlines, COVID-19 has smaller histogram numbers in the above graph than, say, the impeachment trial. I have done some preliminary work on this, and it seems to be because the COVID-19 stories change geographically. For the impeachment trial, all the stories were about events in DC, so the stories were all very similar to one another. But for COVID-19, stories tend to be more local, so my algorithms often miss identifying these as being part of the same topic. (If you have good eyes, you can see that for the last few days both the 1st and 2nd topics have been on COVID-19!)

I can visualize this failure of my current algorithms by superimposing stories that contain a particular key term on a given graph.

Here’s such a visual from today’s graph.

graph200322Coronavirus

Each story about COVID-19 is ringed in red. Notice how most of the stories in the larger clusters have this ring? The top cluster is stories about New York and hospital supplies. The next largest cluster is on the government’s economic rescue plan. The 3rd cluster is about charities closing down because of the virus. Different topics, but all tied together by a meta topic of the coronavirus.

So long story short: COVID-19 is a huge story right now and my algorithms weren’t built to handle this kind of thing. But what they show is what you’d expect: the story is everywhere and probably won’t be going anywhere anytime soon.

The Dominating Growth of Coronavirus in the News

Over the past week, we seem to have hit an inflection point for how much ink newspaper front pages are giving to the coronavirus epidemic. Just check out this GIF of word clouds from over the past few weeks (selected stills below) to see how headlines have become dominated by COVID-19:

covid19

(A similar visual from President Trump’s impeachment trial is down below.)

I’ve been collecting pdfs of newspaper front pages for over 6 months. For each day, I’ve developed algorithms to identify the headlines on the paper, the corresponding news stories, and extract the text. From that point, I can take all the text and count up how many times individual words appear. If there is a single national story that many of the papers are reporting on, words associated with that story will show up a lot and be correspondingly large in the word cloud. For example, yesterday’s stories about coronavirus and the declared national emergency:

20200314

When there isn’t a strong national story dominating the headlines, word choice is spread out over many topics and no word frequencies get particularly large. That’s what we see on days like Feb. 23:

20200223

What I find especially impressive about coronavirus’ domination of the headlines is how strong it appears against other large news events. Just look how it compares to headlines after Weinstein was convicted or Super Tuesday:


A few Q&As for the curious:

Q: What’s with that long line (———-) that keeps showing up?

A: The package I’m using to generate the word clouds (aptly named “wordcloud”) scales the sizes of the words in the cloud based on the word with the highest frequency count. That’s a problem for me, because I want to visualize how word frequency changes day by day. So I insert a fake word (———-) with a constant frequency count as a benchmark. Here’s an example of what a word cloud for Feb. 23rd looks like without and with this benchmark word:

benchmark200223

By adding the benchmark, I can more easily show the dominance of a few words on days when a national story like coronavirus takes hold of the front pages.

Q: So, can I use that benchmark word (———-) to determine how frequently another word was used?

A: …Sort of. There are some complications. The scaling algorithm in word cloud takes into account the length of a word in a not always intuitive way. So the size of 300 mentions of ‘trump’ may look different than the size of 300 mentions of ‘coronavirus’. Also, the number of newspapers I’m pulling data from each day changes. Sundays especially are low paper count days. If I just did a raw count of word frequencies in the word clouds, it would look like every Sunday nothing was happening. So I scale the word counts by the number of papers in that day’s collection.
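For the curious, the two tricks above (dividing counts by the number of papers, and adding a fixed benchmark word) can be sketched like this. It is an illustration under assumptions, not the real pipeline: the benchmark value of 1.0, the ASCII stand-in for the benchmark word, and the sample texts are all invented.

```python
import re
from collections import Counter

BENCHMARK_WORD = "----------"  # ASCII stand-in for the fake benchmark word
BENCHMARK_FREQ = 1.0           # hypothetical constant: one mention per paper

def scaled_frequencies(story_texts, n_papers):
    """Word counts divided by the day's paper count, plus the benchmark entry."""
    counts = Counter()
    for text in story_texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    freqs = {word: n / n_papers for word, n in counts.items()}
    freqs[BENCHMARK_WORD] = BENCHMARK_FREQ  # same value every day
    return freqs

# A weekday with 4 papers and a Sunday with 2 papers, with the same coverage per paper.
texts = ["coronavirus cases rise", "coronavirus emergency"]
weekday = scaled_frequencies(texts * 2, n_papers=4)
sunday = scaled_frequencies(texts, n_papers=2)

print(weekday["coronavirus"], sunday["coronavirus"])  # equal after scaling
```

A dictionary like this could then be handed to the wordcloud package's `WordCloud.generate_from_frequencies`, which takes word-to-frequency mappings directly. Because the benchmark entry never changes, a word that outgrows it visibly dwarfs the bar in that day's cloud.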

Q: Why generate 2 word clouds for headlines and body text each day?

A: Because I can! Occasionally the size of words in the body word cloud doesn’t match those in the headline one. I bet there is an interesting reason for that, but I haven’t gotten around to investigating it. But easy to generate the data now and go back to check later, yeah? For example, here are the word clouds after Kobe Bryant’s helicopter crash.

20200127

Comparison to the Trump Impeachment Trial:

I can do the same analysis on newspaper text from during President Trump’s impeachment trial. See below:

impeach

Again, we can see surges in the word cloud at various important points during the trial, with the largest surge occurring after Trump was acquitted:

20200206

I’m also continuing all the graph analysis work and multi-day topic trend work I’ve written about previously. I’ll try to get a post on that work and coronavirus up soon!

Coronavirus (COVID-19) in the News

The coronavirus has been around since late 2019, and has been a continuous presence on the front page of U.S. newspapers since mid-January. But if you’re in the USA and feel like the intensity around the virus just went up a notch, you’re not alone.

Stories about the virus had their highest presence on U.S. newspaper front pages today (2/26/2020). Although nowhere near the levels of coverage that the impeachment trial of President Trump attained, stories about the virus have been flirting with being the top news story all month:

multiDayFractionRedZoom200226
Top 2 topics from US newspaper front pages for 2020. Topics that appeared in top 2 for 5+ days are given a color. Legend includes topics from 2019 (sorry! See my previous posts for those topics.).

Digging a bit beneath the surface, I can see that stories about COVID-19 never truly fell off the front pages, even if other topics became the top topic for the day.

KeytermCoronavirusZoom20200226
Fraction of stories that contained the word coronavirus somewhere in their body or headline. (Fraction values are slightly different than indicated in previous figure due to different methodologies, discussed below.)

So, how has the coverage of coronavirus changed over time? Well, one way to look at that is to check in on those top topics that were about the coronavirus and see what types of terms were being used in those stories.

Those top topics were found by taking all the hundreds of different stories that were on the front pages of different newspapers and forming a graph representation of how similar the language was in each story to every other story. (See my previous post for more info.) In the graph, each story is one of the nodes (little dots). If 2 stories shared a lot of language, an edge (line) was drawn to connect them.

graph200226
Graph of stories from 2/26/2020. Each story is represented by a green dot. Stories with similar language connected with a black edge. (An additional connecting routine links similar topics with red edges.)

What falls out of this is clumps of stories that are about similar topics. This process is not exact (hobby project!). Many stories don’t get connected (for reasons ranging from how the data was processed to how large the story was on the original front page). Below is the same graph, but with every story that mentions the word ‘coronavirus’ shown in blue while all other stories are in red. You can see a lot of stories in the center clump, but a good deal of stories on the edges too.

graph200226Coronavirus
Graph of stories from 02/26/2020 showing which stories contained the term “coronavirus.” While most of the stories are found in the day’s top topic, several stories mentioning coronavirus were not linked into the topic.

The final piece of the puzzle is a manual review, where I go look at the 2 largest clumps (or topics) and either assign them to an existing category or create a new category for them. To make my life easier, I pull some common terms out of the headlines that make up the topic (see this link for more about that!).

The downside of this design was that only the top 2 topics were identified and sorted. While stories about the virus have been constant in the front pages of the newspapers for over a month, many of those days didn’t have a story about the virus in the top 2 topics.

So I went back over the past month and identified all the stories which mentioned the virus. Then I found the topic clump in the graph that had the highest number of those stories. Finally, I pulled the top headline terms for that topic clump.
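Those three steps can be sketched roughly as follows. This is an illustration only: the story records, the field names ("clump", "headline", "body"), and the simple word counting are assumptions, and no stop words are filtered out.

```python
import re
from collections import Counter

def top_clump_for_term(stories, term, n_terms=5):
    """Find the clump holding the most stories that mention `term`.

    Returns (stories mentioning term, clump id, matches in clump, top headline terms).
    """
    matched = [s for s in stories if term in (s["headline"] + " " + s["body"]).lower()]
    per_clump = Counter(s["clump"] for s in matched if s["clump"] is not None)
    if not per_clump:
        return len(matched), None, 0, []
    clump_id, n_in_clump = per_clump.most_common(1)[0]
    # Headline terms come from every story in that clump, matched or not.
    words = Counter()
    for s in stories:
        if s["clump"] == clump_id:
            words.update(set(re.findall(r"[a-z0-9]+", s["headline"].lower())))
    top_terms = sorted(word for word, _ in words.most_common(n_terms))
    return len(matched), clump_id, n_in_clump, top_terms

# Invented stories; clump None means the story was not linked to any other story.
stories = [
    {"clump": 0, "headline": "CDC warns virus will spread", "body": "Officials said the coronavirus may spread."},
    {"clump": 0, "headline": "CDC officials warn of virus spread", "body": "The coronavirus outbreak is growing."},
    {"clump": 1, "headline": "Dow drops again", "body": "Markets fell on coronavirus fears."},
    {"clump": None, "headline": "Local bakery opens", "body": "Fresh paczki daily."},
]
result = top_clump_for_term(stories, "coronavirus", n_terms=3)
print(result)
```

Because the headline terms come from the whole clump, a clump that only holds a handful of matching stories can easily surface terms that have nothing to do with the virus.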

What do those terms show? Early on, the headlines were all focused on this novel virus coming out of China. Words like “china”, “city”, “outbreak”, and “new” show up early. Later, words like “quarantined” and “cases” started to appear. And most recently, “CDC” and references to other countries, like “America”, start to show up. Below I include a table of all these results. Check it out if you are interested!

Columns: date; number of stories mentioning coronavirus (Mentions); graph clump with most matched stories (Clump); number of matched stories in that graph clump (Matched); top terms from the headlines of that graph clump.

Date      Mentions  Clump  Matched  Top headline terms
20200121  10        63     2        china|concerns|confirmed|declareaglobal|human
20200122  42        1      9        1st|case|china|new|virus
20200123  32        2      8        chinese|city|outbreak|travel|virus
20200124  31        6      4        china|cities|stirs|stop|virus
20200125  55        3      12       2nd|china|coronavirus|new|virus
20200126  21        2      4        calls|grave|situation|virus|xi
20200127  55        6      8        confirmed|coronavirus|oc
20200128  68        4      9        cases|confi|coronavirus|rmed|tested
20200129  104       1      41       cdc|china|coronavirus|flight|testing
20200130  79        5      10       evacuees|land|socal|southland|virus
20200131  120       1      43       declares|emergency|global|person|virus
20200201  93        1      23       china|declares|emergency|outbreak|virus
20200202  77        1      15       2020dems|focus|iowa|unity|virus
20200203  77        2      26       china|death|outside|philippines|virus
20200204  72        2      13       china|coronavirus|hospital|opens|virus
20200205  58        9      5        child|hospital|observation|quarantined|taken
20200206  44        23     3        81rst|case|coronavirus|county|dane
20200207  62        4      10       china|dies|doctor|virus|warned
20200208  78        2      14       anger|china|death|doctor|virus
20200209  64        1      16       cases|china|citizen|coronavirus|virus
20200210  108       0      25       cases|china|sars|toll|virus
20200211  88        4      10       cases|off|rise|ship|virus
20200212  85        3      13       cleared|coronavirus|ends|leave|quarantine
20200213  62        2      13       cases|hope|misery|new|outbreak
20200214  80        2      21       coronavirus|count|hospitals|prepare|virus
20200215  102       1      32       flu|hits|kids|second|wave
20200216  39        14     4        county|expenses|feds|pay|quarantine
20200217  93        0      50       americans|cruise|quarantine|ship|trade
20200218  134       0      71       americans|bases|cruise|passengers|quarantined
20200219  37        85     2        businesses|coronavirus|effects|feel|feeling
20200220  29        30     3        cancer|freedom|march|rising|risk
20200221  33        35     3        americans|flu|new|virus|worry
20200222  33        13     4        7k|home|new|sacramento|stay
20200223  66        5      3        delicious|fat|paczki|pastry|say
20200224  45        1      12       contain|italy|korea|outbreak|virus
20200225  126       1      51       000|asia|dow|pushes|virus
20200226  205       0      104      cdc|officials|spread|virus|warn

If you noticed that some of those top terms didn’t seem to match coronavirus, you’re absolutely right! On several days, very few stories about the coronavirus combined into a large topic. Take 02/23/2020. Only 3 stories were found in the same group. Naturally, that group’s top headlines were about a different topic. Still, on the whole, we can see the trend of the global virus in these headlines. Which is pretty cool!

So, I’d really like to make an awesome visual of this, but don’t have any stellar ideas. Do you? Please let me know!


Update on 03/01/2020: The increased presence of coronavirus in the news is continuing. Quick update to include the most recent data:

KeytermCoronavirusZoom20200301

 

Coronavirus in the News

My current pipeline for analyzing the front pages of U.S. newspapers focuses on just the top 2 topics of each day. It’s a pipeline that does a good job tracking when a story really breaks through the noise and dominates the headlines–such as the ongoing impeachment of President Trump or Hurricane Dorian from September 2019.

multiDayFractionRed200203
Bar graph of the number of stories on a given topic divided by the number of newspapers in that day’s dataset. This is close to the fraction of papers covering a given topic, but doesn’t account for 1 paper running multiple front page stories on the same topic. Only the top 2 topics from each day are plotted. Beyond that, noise overwhelms the signal. Only topics that had 5+ days as one of these daily top 2 topics are highlighted in the legend. Missing data from early January visible as the gap.
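The quantity in that bar graph is simple to write down. Below is a minimal sketch, assuming each story has already been tagged with its paper and a topic label; the records and field names are hypothetical.

```python
from collections import Counter

def top_topic_fractions(stories, n_top=2):
    """Stories per topic divided by the number of distinct papers, largest topics first."""
    n_papers = len({s["paper"] for s in stories})
    per_topic = Counter(s["topic"] for s in stories if s["topic"] is not None)
    return [(topic, n / n_papers) for topic, n in per_topic.most_common(n_top)]

# Invented day of data: 4 papers, with paper A running 2 impeachment stories.
stories = [
    {"paper": "A", "topic": "impeachment"},
    {"paper": "A", "topic": "impeachment"},
    {"paper": "B", "topic": "impeachment"},
    {"paper": "C", "topic": "impeachment"},
    {"paper": "D", "topic": "coronavirus"},
    {"paper": "A", "topic": "coronavirus"},
    {"paper": "B", "topic": None},  # story not linked to any topic
]
fractions = top_topic_fractions(stories)
print(fractions)
```

In this made-up day the impeachment value comes out at 1.0 even though only 3 of the 4 papers ran an impeachment story, which is exactly the double counting caveat in the caption.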

What this process doesn’t do is capture any build up to the story. It also tends to favor fast paced stories that grab the collective attention of the country for a few days (impeachment notwithstanding!).

Recently, stories on the coronavirus outbreak finally pushed their way into the top 2 daily topics:

multiDayFractionRedZoom200203
Zoom in on top 2 daily topics in 2020. Impeachment still dominates, but coronavirus starts showing up late January.

But over the past few days those stories have been overshadowed by topics like the Iowa caucus and the Super Bowl.

I’m working on a clever way of tracking these topics outside of the top 2 daily stories. One element of that is tracking the usage of key words and phrases over time. For example, here’s the fraction of U.S. newspapers that mentioned the word ‘dorian’ somewhere on their front page:

KeytermDorian20200203
Fraction of newspapers mentioning the keyterm ‘dorian’ somewhere on their front page. Large spike correlates with Hurricane Dorian threatening the eastern seaboard.
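A keyterm fraction like this one boils down to a per-day count. Here is a minimal sketch, assuming the extracted front page text is stored per day and per paper; the nested dictionary layout and the sample text are invented, and plain substring matching is a deliberately crude choice.

```python
def keyterm_fraction(front_pages, term):
    """Fraction of papers per day whose front page text contains `term`."""
    term = term.lower()
    fractions = {}
    for day, pages in sorted(front_pages.items()):
        if not pages:
            continue  # no papers collected that day, so no fraction to report
        hits = sum(1 for text in pages.values() if term in text.lower())
        fractions[day] = hits / len(pages)
    return fractions

# Invented data: day -> {paper name -> front page text}.
front_pages = {
    "2019-08-30": {
        "paper_a": "Hurricane Dorian strengthens offshore",
        "paper_b": "School board votes on budget",
        "paper_c": "Dorian threatens Florida coast",
        "paper_d": "County fair opens this weekend",
    },
    "2019-09-02": {
        "paper_a": "Dorian batters the Bahamas",
        "paper_b": "Coast braces for Dorian's arrival",
    },
    "2019-09-20": {
        "paper_a": "Climate strike draws crowds",
        "paper_b": "GM workers walk out",
    },
    "2020-01-01": {},  # a day with no data is skipped rather than plotted as zero
}
dorian = keyterm_fraction(front_pages, "dorian")
print(dorian)
```

Dividing by the number of papers collected that day keeps low-count days, such as Sundays, comparable with the rest of the week.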

Dorian shows a clear spike in usage followed by a trailing edge as stories about the hurricane faded from the front page (and occasional stories about recovery kept popping back up). Contrast that with keyterms that are very, very common in newspapers these days: ‘Trump’ and ‘impeach’:

Trump is a constant presence, while impeach has been a common word since soon after the Ukraine scandal broke.

This kind of keyterm tracking lets me peek into the staying power of certain stories. For example, when the U.S. killed the leader of ISIS, it was a huge story…for 1 day. After that–nothing:

KeytermAlBaghdadi20200203
The U.S. killing of ISIS leader Al-Baghdadi generated huge headlines for a single day. Then fell off the radar.

Bringing everything back to coronavirus, a few interesting differences jump out. First, and as you might expect, usage only showed up recently:

KeytermCoronavirus20200203
Coronavirus mentions in U.S. newspapers showing a rapid growth in usage in late Jan. 2020.

But if you zoom in on that last month, a couple interesting things jump out…

KeytermCoronavirusZoom20200203
Coronavirus usage in 2020 newspaper front pages. Even while the topic fell off the radar in my top topics pipeline, the usage of the term didn’t fade.

First, even though the coronavirus topic fell out of the top 2 on Feb. 2nd and 3rd, it’s still very much a story.

Second, it first made front page news on January 9th. That came from the Wall Street Journal, which had this line on its front page.

Chinese scientists investigating a mystery illness that has sickened dozens in central China have discovered a new strain of coronavirus. A9

So, the coronavirus was making it onto the front page when there were only dozens of reported infections. Wouldn’t it be cool to build a tracker to ID those mentions early? That’s a hard project. Maybe I’ll try it!

Tracking key terms is one thing if you already know the term to look for. But how would we have known to look for the term coronavirus on January 9th? The top topic system is built to auto-identify daily topics. I’ll be working on merging these pipelines to try to make a system that can detect stories on the rise. Stay tuned!

Newspaper Trends in Early 2020

Another brief update to bring this blog temporarily up-to-date on my tracking of front page stories in U.S. newspapers. At the end of 2019, the dominating story line was the impeachment of President Trump. So far in 2020, the dominating story line is…the impeachment of President Trump.

multiDayFractionRed200202_gapExplain

Note the sad gap over the first few days of 2020. The Newseum is the source of my raw data–the PDFs of newspaper front pages. On January 1st, the structure of that website changed and I didn’t get around to fixing the pipeline for a few days. So some data was tragically lost. 😦

While no 2020 story has achieved the single day dominance that occurred after the President was officially impeached by the House on December 18th, the impeachment story continued to dominate the front pages throughout the Senate trial (still technically ongoing as of this writing).

However, a few different stories managed to squeeze in the gaps left by the impeachment saga. First, the U.S. killing of Iran’s General Soleimani occurred in early January. The biggest news days probably happened during the lost data days, but the killing did have several days as the top story even after the data pipeline started flowing again. Compared to the U.S. killing of ISIS’ leader in late October (second biggest story in the figure above), the killing of Soleimani had substantially more staying power. The killing of the leader of ISIS dropped off the top story list after just 2 days. Soleimani lasted substantially longer than that.
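One way to put a number on staying power is the longest streak of consecutive days a topic spends as the top story. The sketch below is a simplification that looks only at a single top topic per day rather than the top 2, and the dates and labels are made up rather than taken from my data.

```python
from datetime import date, timedelta

def longest_runs(top_topic_by_day):
    """Longest streak of consecutive calendar days each topic spent as top story."""
    best = {}
    run_topic, run_len, prev_day = None, 0, None
    for day in sorted(top_topic_by_day):
        topic = top_topic_by_day[day]
        consecutive = prev_day is not None and day - prev_day == timedelta(days=1)
        if topic == run_topic and consecutive:
            run_len += 1
        else:
            # A new topic or a gap in the data starts a fresh streak.
            run_topic, run_len = topic, 1
        best[topic] = max(best.get(topic, 0), run_len)
        prev_day = day
    return best

# Invented example: a missing day (Nov. 2) breaks the impeachment streak.
top_topic_by_day = {
    date(2019, 10, 28): "isis leader killed",
    date(2019, 10, 29): "isis leader killed",
    date(2019, 10, 30): "impeachment",
    date(2019, 10, 31): "impeachment",
    date(2019, 11, 1): "impeachment",
    date(2019, 11, 3): "impeachment",
}
runs = longest_runs(top_topic_by_day)
print(runs)
```

Treating a missing day as the end of a streak is a conservative choice; with gaps like the one in early January, it will understate how long a story really lasted.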

Finally, the new 2019 coronavirus coming out of China has started to break into the top story list in U.S. newspapers. More on that in a later post.

President Trump Impeached

This is old news by now, but the President was impeached by the House of Representatives a few days ago. There wasn’t a ton of suspense leading up to the vote (I feel we all had a pretty good idea how it was going to play out well before the first vote was cast), but that didn’t stop it from being a big, big news story the next day.

Since early August, I’ve been harvesting the front page newspaper PDFs from the Newseum and running analysis to identify how many stories in those papers are about the same topics. The day after Trump’s impeachment, it was the only story in town:

191219_hist

(Above: poorly labeled histogram. Horizontal axis labeled ‘group number’ should be called topics in that day’s collection of newspapers. Vertical axis is the number of stories written about that topic. As you can see, 1 topic dominated all the rest.)

I generate histograms like the one above by identifying all the stories in all the newspaper front pages, then doing analysis on the similarity of each story. If stories are similar to one another, I connect them as being part of the same topic. Once completed, I get a nice graph that shows the size of each topic from the day’s news:

graph191219

(Above: graph of all the stories found from hundreds of US newspaper front pages from 12/19/19. Each cluster represents a topic. The large collection at the center is impeachment.)

Again, one story and only one story dominates on Dec. 19th. An analysis of the headlines associated with those stories tells us what they are all about:

Subgraph0_191219.png

(Above: the large central subgraph. The title is made up of common words found in the headlines associated with the stories, which give us the topic.)

This large central subgraph is a red flag that a big event went down. One that all those papers’ editors collectively thought was a big deal. For comparison, here’s the graph from a slow news day (12/01/19):

graph191201

(Above: slow news day graph. No single topic dominates the graph.)

I’ve been collecting this data since August, and have been posting updates on the impeachment process as it’s played out. Looking back over all those days of data shows just how long lasting this impeachment story has been:

multiDayFraction191219

(Above: Complete graph of both the largest topic each day and the 2nd largest topic each day. If a topic continued onto multiple days, it was given a color and label in the key to the right.)

I’ve been collecting data long enough that this plot is getting pretty messy. Here is a simplified graph, with only a few of the longest lasting topics highlighted:

multiDayFractionRed191219

(Above: the same histogram as before, now only looking at the top few topics with the longest staying power. Even in this view, impeachment is in a category of its own.)

In mid September, I thought the Hurricane Dorian story was long lasting. But look at how much longer the impeachment story has stayed in the headlines! It’s an incredible run. Now to see what happens as the process moves to the U.S. Senate…

The Trump-Ukraine Impeachment Story: Just keeps going!

Another update on my tracking of the country’s front page stories. The last few updates have all been about the impeachment story coming out of DC, and today is no exception.

In my last post from Oct. 20th, the impeachment story seemed to be fading out of the news cycle. The strong dominance from the early days (where the topic was a front page story in over half the daily papers!) had faded behind news about Turkey attacking our Kurdish allies in Syria. It was an open question if the impeachment story had run its course in the national news.

top2StoriesManualCats191109

Turns out, no. It had not run its course and has re-emerged as the dominant story line of the recent news cycle. Even when a large news event happened, it didn’t knock the topic off the lead for long. The US killing of the head of the Islamic State was the largest single day story I have on record, but it faded just as fast.

So far, the impeachment story has outlasted major stories on:

  1. The Turkish attack on the Kurds,
  2. An epidemic of vaping deaths,
  3. The GM union strike,
  4. Climate Change,
  5. California wildfires, and
  6. The killing of the leader of the Islamic State.

That’s some staying power! And as the House’s impeachment inquiry heads into the public hearing phase, I’m going to predict the topic won’t fade away unless something really big happens.

Trump-Ukraine Impeachment story: 1 month in

Back for a quick post on the Trump-Ukraine impeachment. In my last post, the story had been dominating national headlines for 2 weeks. But a month in, it’s finally faded as the top story line. The rapid US withdrawal from Syria and the Turkish invasion of Kurdish territory quickly grew to dominate the top spot over the last week:

topStoryManualCats191020

I was curious just how far the impeachment story fell off the radar. My current method was only looking at the top story, so I doubled back and repeated the same analysis for the 2nd most popular story as well:

top2StoriesManualCats191020

In the plot above, the 2nd most popular topic of the day is plotted in front of the top topic of the day. On most days, that 2nd most popular topic was a one-off, but occasionally it was part of a multi-day theme.

If we look at the days in early to mid October, we can see the impeachment inquiry story and the Turkish invasion of Kurdish territory story start to trade back and forth between the top 2 spots. So while the impeachment story sank slightly from its dominating position in late September, it was still just beneath the surface!

But over the last few days, the impeachment story fell from even the top 2 spots. What event will bring it back to the forefront of the country’s papers? Only time will tell!