Saturday, July 4, 2020

Is America coming together or drifting apart with time? 😰


In part two of this series of articles, we set the stage for making graphs from our dataset using principal component analysis (PCA). In this post, we will finally get into the data and draw conclusions from our speech data. 

Diving into the Data

Figure 1 - Speeches by Decade
Since our dataset spans such a broad range (1789 - 2019), let's start by visualizing our data over time. To do this, we can aggregate our speeches by decade and plot the PCA components for each decade. The result can be seen in Figure-1. It is interesting to note that the decades don't just form a single trend. A given decade's closest neighbour can be sixty to one hundred years away from it; some interesting examples of this are 1800 - 1900 and 1990 - 2010. With more knowledge of American history, there are likely some interesting conclusions that could be drawn from this data, but I'll leave that up to the reader. It is worth noting that going from 100D to 3D introduces a lot of error, since three components cannot capture everything in the original 100 dimensions, so all results should be taken with caution.
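As a rough sketch of the aggregation step, the snippet below groups speeches by decade and averages their vectors. The (year, vector) layout, the function names and the choice of a simple mean are all assumptions made for illustration, and the toy 3-dimensional vectors stand in for the real 100-dimensional ones. The PCA projection would then be applied to the resulting decade vectors.

```python
from collections import defaultdict


def decade_of(year):
    """Return the first year of the decade containing `year`, e.g. 1789 -> 1780."""
    return (year // 10) * 10


def mean_vector_by_decade(speeches):
    """Average the vectors of all speeches that fall in the same decade.

    `speeches` is an iterable of (year, vector) pairs. This layout is an
    assumption for the sketch, not necessarily how the real dataset is stored.
    """
    groups = defaultdict(list)
    for year, vector in speeches:
        groups[decade_of(year)].append(vector)

    means = {}
    for decade, vectors in sorted(groups.items()):
        n = len(vectors)
        # zip(*vectors) walks the vectors one dimension at a time.
        means[decade] = [sum(column) / n for column in zip(*vectors)]
    return means


# Toy data: hypothetical 3-dimensional vectors, not real speech embeddings.
toy_speeches = [
    (1801, [1.0, 0.0, 2.0]),
    (1809, [3.0, 2.0, 0.0]),
    (2019, [0.5, 0.5, 0.5]),
]
decade_means = mean_vector_by_decade(toy_speeches)
```

Each key of `decade_means` is a decade and each value is one point to feed into the PCA step, so every decade ends up as a single dot in the plot.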

The Modern Era

In Figure-1, it can be seen that speeches tend to clump together over time. This trend is most evident in the modern era (1940s onwards) and can be seen in greater detail in Figure-2. The strong clumping of decades suggests that word usage has been consistent throughout the modern era. If we take Figure-2 at face value, the tighter grouping of the Modern Era (yellow circle) shows that word usage has become less diverse. In contrast, the Previous Era (red circle) shows that the decades before 1940 each exhibited much more distinctive word usage.

Figure 2 - Modern vs previous era comparison
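One way to put a number on how tightly an era clumps is the average distance of its decades from the era's centre point. This is only a sketch of that idea, not necessarily how the circles in the figure were drawn; the function name and the toy coordinates are made up for illustration.

```python
import math


def mean_distance_to_centroid(points):
    """Average Euclidean distance of each point from the centroid of all points.

    A smaller value means the points are grouped more tightly.
    """
    n = len(points)
    centroid = [sum(column) / n for column in zip(*points)]
    return sum(math.dist(point, centroid) for point in points) / n


# Hypothetical 3D PCA coordinates, one point per decade (not real results).
modern_points = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
previous_points = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]

modern_spread = mean_distance_to_centroid(modern_points)
previous_spread = mean_distance_to_centroid(previous_points)
```

With real coordinates, a smaller spread for the modern era than for the previous era would support the tighter grouping seen in the plot, subject to the same 100D to 3D caveat as before.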

Digging Deeper

One of the most intuitive ways to compare eras is to analyze word usage. In Figure-3, we can see a comparison of the top 10 most frequently used words by era. Next to each word in the modern era, we can see how the frequency of that word has changed between eras.

Figure 3 - Modern vs. Previous Rhetoric
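A comparison like this can be sketched with a simple word counter. The snippet below assumes each era has already been reduced to a flat list of cleaned tokens (lowercased, with stop words removed); that preprocessing, the function names and the toy tokens are all assumptions for illustration.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def top_words_with_change(modern_tokens, previous_tokens, n=10):
    """Return (word, modern_frequency, change_since_previous_era) for the
    n most frequent modern words. Ties are broken alphabetically."""
    modern = relative_frequencies(modern_tokens)
    previous = relative_frequencies(previous_tokens)
    top = sorted(modern, key=lambda word: (-modern[word], word))[:n]
    # A word absent from the previous era counts as frequency 0 there.
    return [(word, modern[word], modern[word] - previous.get(word, 0.0))
            for word in top]


# Toy tokens, not taken from the real speeches.
modern_tokens = ["people", "people", "must", "world"]
previous_tokens = ["states", "government", "people", "congress"]

comparison = top_words_with_change(modern_tokens, previous_tokens, n=2)
```

Relative frequencies are used rather than raw counts so that the two eras stay comparable even if one contains far more text than the other.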

Comparing the word usage of the modern era to that of the previous era, several trends jump out to me. Firstly, it seems that in the modern era there are a lot more mentions of people (People, Us, President, American). Secondly, there appears to be much more urgency, with the addition of "must" in the modern era. Thirdly, the addition of more geographic words (World and America) gives the modern era a more global flavour. Together, these three characteristics (Individualism, Urgency and Geography) are the central themes of the modern era.

In the previous era, the unique rhetoric consists of States, Government, United, Congress and Country. To me, these words all seem to have a lot in common and can be grouped together as "government-y" words. In my opinion, these words appeal more to government figures (senators, congresspeople, etc.) than to the people. These differences make sense when we consider the communication mediums of each era (the previous era largely lacked radio and television).