An analysis of tweets from U.S. senators and the president shows something interesting: by some metrics, Pres. Trump tweets like a Democratic senator. In both sentiment and frequency, the president looks much more like an average Democrat than like any senator in his own Republican party.
President as an average Democrat
To find this trend, I collected every tweet (excluding retweets) from the president and the 100 U.S. senators for 2017. From there, I wrote a program that judged the sentiment of each tweet and classified it as either 0 (negative) or 1 (positive). The program has a measured accuracy of 77%, which is middling for these kinds of models but high enough to make claims about averaged results (more on the program at the bottom of the post!).
With this information in hand, I looked at the average sentiment of each politician's Twitter handle over the first 4 months of 2017. Some senators have very negative Twitter handles (nearly 3 out of every 4 tweets from Sen. Elizabeth Warren (D) of Massachusetts are negative), while others are much more positive (almost 9 out of 10 tweets from Sen. Thad Cochran (R) of Mississippi are positive). You can look at the scores for every senator here.
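The averaging step itself is simple: each tweet is a 0 or a 1, and a handle's score is just the fraction of positive tweets. A minimal sketch (the handle names and labels here are made up for illustration, not my real data):

```python
# Each tweet has already been classified as 0 (negative) or 1 (positive).
# A handle's sentiment score is the mean of its labels.
tweets = {
    "SenExampleA": [0, 0, 1, 0],       # hypothetical labels
    "SenExampleB": [1, 1, 1, 0, 1],
}

def average_sentiment(labels):
    """Fraction of positive tweets for one handle (0 = all negative, 1 = all positive)."""
    return sum(labels) / len(labels)

scores = {handle: average_sentiment(labels) for handle, labels in tweets.items()}
print(scores)
```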
Combining these average sentiment values for each senator into a single histogram, I can clearly see that the two parties in Congress have different distributions:
While there is some overlap, Democrats tend to have more negative tweets with an average sentiment value of .51 (roughly half positive and half negative). Republicans are substantially more positive, with an average of .67. I also found President Trump’s average over the same time period: .47, which places him squarely with the Democrats.
Something this first histogram hides is the number of tweets from each senator used to calculate the average sentiment score. Sen. Warren had 442 tweets during the 4 months, and Sen. Cochran had 108: plenty to get a nice, robust average. On the other hand, Sen. John Neely Kennedy (R) of Louisiana had exactly 1 tweet during that time period. The histogram makes him look like a very positive person, but 1 tweet isn't really enough to make a claim about the sentiment of his Twitter feed.
I can make a similar histogram for the total number of tweets each senator made during these 4 months:
Again, Pres. Trump falls in the center of the Democrat distribution and on the periphery of his own party.
So in both sentiment and frequency, Pres. Trump looks much more like a Democrat than a Republican. There might be further conclusions to draw from that fact, but I don't have any on hand right now. If you're curious which senator's Twitter handle looks most like the president's, it is Sen. Maria Cantwell (D) of Washington. In both average sentiment (Cantwell: .472, Trump: .474) and tweet counts (Cantwell: 722, Trump: 641), Sen. Cantwell and Pres. Trump are quite similar. (Policy-wise, of course, it is a different story!)
Trends over time
Instead of looking at each senator and averaging over the full 4 months, I can look at each party and find the average sentiment each week. That lets me see how the sentiment of each party in the Senate changes with time.
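Grouping tweets into weekly buckets before averaging is a small bookkeeping step. A sketch of how it can be done with ISO week numbers (the dates and labels below are made up; this is the idea, not my exact code):

```python
from collections import defaultdict
from datetime import date

# (date, party, label) triples -- a tiny made-up sample
tweets = [
    (date(2017, 1, 2), "D", 0),
    (date(2017, 1, 3), "D", 1),
    (date(2017, 1, 4), "R", 1),
    (date(2017, 1, 11), "D", 0),
]

# Bucket labels by (party, ISO week number), then average each bucket.
weekly = defaultdict(list)
for day, party, label in tweets:
    week = day.isocalendar()[1]   # ISO week of the year
    weekly[(party, week)].append(label)

weekly_avg = {key: sum(v) / len(v) for key, v in weekly.items()}
```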
You might expect the 2 curves to move in opposite directions. That is, when Democrats have lots of negative tweets, Republicans have lots of positive tweets. But that's not what I found. The 2 curves have the same rises and falls in weekly sentiment; it's only the overall average that separates the two parties:
I’ve also included the President’s weekly averages. His curve has a lot more variance, which isn’t surprising: the President has fewer tweets per week than the combined voices (fingers?) of the Senate.
I suspect there’s something interesting hidden in the fact that the Democrats and Republicans rise and fall together, but I haven’t teased that out as of yet. Let me know if you have an explanation!
NPR did it first
I’m not the first person with this idea. NPR did a similar analysis on Pres. Trump a month ago. Their results don’t perfectly overlap with mine, but that is to be expected. They used a different sentiment analysis technique and probably had a similar accuracy rate. However, the same broad features (a dip near the start of February, a couple of peaks at the start of March and April) are present in both. A nice little check that my program isn’t totally off the mark with its sentiment analysis.
That’s it for the political insights for this post. The last section looks a bit more at the computer program I wrote to do this analysis and the processing I did on the tweets to make them play nicely with the program.
Sentiment analysis with machine learning
The computer program I made to determine the sentiment of all these tweets is a neural network, a type of machine learning algorithm that can be very accurate if cleverly designed and given enough training data. The webcomic XKCD recently had a pretty good description of how these machine learning algorithms work:
The important bit the comic misses is how you stir. Every time you go and check the answers coming out of the system, the results tell you how to stir such that next time the answers are a little bit more correct. This is why these systems require a lot of training data: data where you know the answers before sending it through the linear algebra. Eventually, with enough checking of the results from the training data and enough stirring, these systems give accurate answers (they are said to have been 'trained') and can be used on data where we don't know the answers in advance.
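In practice the "stirring" is gradient descent: measure the error on the training data, then nudge the parameters in the direction that shrinks it. A toy illustration of the check-then-stir loop with a single weight (this is not my actual network, just the core idea):

```python
# Toy "stirring": one weight, squared error, gradient descent.
# The true relationship in the training data is y = 3 * x,
# and we start with a deliberately wrong weight.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = 0.0
for _ in range(100):
    # Check the answers: average gradient of squared error over the data...
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    # ...then stir: nudge the weight so next time the errors are smaller.
    w -= 0.05 * grad
print(w)   # converges toward 3.0
```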
If you’ve read about neural networks before, you might be more familiar with figures that include the nodes of a neural network. I’ll include a figure of that too, but please keep in mind this figure and the XKCD comic above show the exact same thing (with the exception of highlighting the importance of the special stir):
My design was pretty basic: a vanilla neural network readily available to Python users as part of the sklearn package (MLPClassifier). For training data, I used 200,000 of the 1.5M labeled tweets from here, evenly split between positive and negative tweets.
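The sklearn setup looks roughly like this. The data below is a random stand-in (the real inputs were 50-dimensional tweet vectors with human-assigned labels), and the hyperparameters shown are illustrative defaults, not necessarily the ones I tuned:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stand-in for the real training set: random 50-dim "tweet vectors"
# with 0/1 sentiment labels following an easy, made-up decision rule.
rng = np.random.RandomState(0)
X = rng.randn(200, 50)
y = (X[:, 0] > 0).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))   # accuracy on the training data
```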
Neural networks work best with numeric input, but a tweet is a collection of words. To convert from words to numbers, I made use of Stanford's GloVe lookup table. Using the table, I can convert almost any tweeted word into a numeric vector of some dimension (for this work I used 50-dimensional word vectors). These word vectors are nice because they encode a lot of the relationships between words; e.g. the vectors for 'hotel' and 'motel' are more similar to each other than the vectors for 'hotel' and 'physics.'
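The conversion from tweet to network input is then just a lookup-and-average. Here is a sketch with a tiny made-up stand-in for the GloVe table (the real one has 50-dimensional vectors for over a million words):

```python
import numpy as np

# Toy stand-in for the GloVe lookup table; real vectors are 50-dim.
glove = {
    "hotel":   np.array([0.8, 0.1]),
    "motel":   np.array([0.7, 0.2]),
    "physics": np.array([-0.5, 0.9]),
}

def tweet_to_vector(tweet, table, dim=2):
    """Average the vectors of the words in a tweet that the table knows."""
    vecs = [table[w] for w in tweet.lower().split() if w in table]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity: higher means the vectors point the same way."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = tweet_to_vector("my hotel motel", glove)
```

Even in this toy table, 'hotel' is closer to 'motel' than to 'physics' by cosine similarity, which is the property that makes the real GloVe vectors useful here.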
I also looked at only some of the words in each tweet. From my training data, I can find which words appear more often in positive tweets than in negative ones, or vice versa; e.g. 'terrible' appears more often in negative tweets. Limiting myself to the word vectors of these semi-predictive words improved the accuracy of my neural network:
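One way to pick those semi-predictive words is to count how often each word appears in each class and keep the lopsided ones. A sketch under that assumption (the tiny training set and the threshold of 2x are made up; my real filter may have differed in detail):

```python
from collections import Counter

# Tiny made-up training set of (tweet, label) pairs; 0 = negative, 1 = positive.
train = [
    ("what a terrible day", 0),
    ("terrible service again", 0),
    ("what a great day", 1),
    ("great news today", 1),
]

pos_counts, neg_counts = Counter(), Counter()
for text, label in train:
    (pos_counts if label == 1 else neg_counts).update(text.split())

def is_predictive(word, ratio=2.0):
    """Keep words that show up much more often in one class than the other."""
    p = pos_counts[word] + 1   # add-one smoothing avoids dividing by zero
    n = neg_counts[word] + 1
    return p / n >= ratio or n / p >= ratio

predictive = {w for w in set(pos_counts) | set(neg_counts) if is_predictive(w)}
```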
All of this work was done on my laptop in a Jupyter notebook environment running Anaconda Python. Run times varied from under a minute to an hour, depending on the part of the program and the amount of training data. Each part of the program got its own notebook, which I've uploaded to GitHub here.