Foreword
In part two of this series, we set the stage for making graphs from our dataset using principal component analysis (PCA). In this post, we will finally get into the speech data and draw conclusions from it.
Diving into the Data
|Figure 1 - Speeches by Decade|
Since our dataset spans such a broad range (1789 - 2019), let's start by visualizing our data over time. To do this, we can aggregate our speeches by decade and plot the PCA components for each decade. The result can be seen in Figure-1. It is interesting to note that the decades don't form a single trend. A given decade's closest neighbour can be sixty to one hundred years away from it; some interesting examples of this are 1800 - 1900 and 1990 - 2010. With more knowledge of American history, there are likely some interesting conclusions that could be drawn from this data, but I'll leave that up to the reader. It is worth noting that going from 100D to 3D introduces a lot of error, so all results should be taken with caution.
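To make the decade aggregation concrete, here is a minimal sketch of how it could be done. It assumes each speech has already been turned into a 100-dimensional vector, as set up in part two. The function names, the random stand-in data, and the choice to average the vectors within a decade are all assumptions for illustration, not necessarily how the figures in this post were produced. The last line prints the share of variance kept by the three components, which is one way to put a number on the error mentioned above.

```python
import numpy as np

def decade_of(year):
    """Map a year to the start of its decade, e.g. 1789 -> 1780."""
    return (year // 10) * 10

def aggregate_by_decade(years, vectors):
    """Average the speech vectors that fall within each decade (an assumed choice)."""
    vectors = np.asarray(vectors, dtype=float)
    decades = np.array([decade_of(int(y)) for y in years])
    labels = np.unique(decades)  # sorted list of decades that actually occur
    means = np.vstack([vectors[decades == d].mean(axis=0) for d in labels])
    return labels, means

def pca_project(X, n_components=3):
    """Project the rows of X onto their top principal components using an SVD."""
    centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    return coords, explained[:n_components]

# Hypothetical stand-in data: 200 speeches with random 100-D vectors.
rng = np.random.default_rng(0)
years = rng.integers(1789, 2020, size=200)
vectors = rng.normal(size=(200, 100))

labels, decade_vectors = aggregate_by_decade(years, vectors)
coords, explained = pca_project(decade_vectors, n_components=3)
print(coords.shape)  # (number of decades, 3)
print(f"variance kept by 3 components: {explained.sum():.1%}")
```

Each row of `coords` is one decade's position in the 3D plot. The lower the variance kept, the more cautious we should be about reading meaning into distances between decades.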
The Modern Era
In Figure-1, it can be seen that speeches tend to clump together over time. This trend is most evident in the modern era (1940s onwards) and can be seen in greater detail in Figure-2. The strong clumping of decades suggests that word usage has been consistent in the modern era. If we take Figure-2 at face value, the tighter grouping of the Modern Era (yellow circle) shows that word usage has become less diverse. In contrast, the Previous Era (red circle) shows that decades before 1940 each exhibited much more distinctive word usage.
|Figure 2 - Modern vs previous era comparison|
One of the most intuitive ways to compare eras is to analyze word usage. In Figure-3, we can see a comparison of the top 10 most frequently used words by era. Next to each word in the modern era, we can see how its frequency has changed between eras.
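For readers who want to try this kind of comparison themselves, the sketch below shows one way it could be computed. The function names, the simple regex tokenizer, the optional stop-word set, and the toy speeches are all assumptions for illustration. The figures in this post may have been built with different preprocessing.

```python
from collections import Counter
import re

def word_frequencies(speeches, stop_words=frozenset()):
    """Return each word's share of all counted words across the speeches."""
    counts = Counter()
    for text in speeches:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

def top_words_with_change(modern, previous, k=10, stop_words=frozenset()):
    """Top-k modern words with their frequency and the change from the previous era."""
    modern_freq = word_frequencies(modern, stop_words)
    previous_freq = word_frequencies(previous, stop_words)
    top = sorted(modern_freq, key=modern_freq.get, reverse=True)[:k]
    return [(w, modern_freq[w], modern_freq[w] - previous_freq.get(w, 0.0))
            for w in top]

# Hypothetical toy speeches, only to show the shape of the output.
previous_era = ["The government of the United States and Congress",
                "Congress and the States"]
modern_era = ["The American people and the world",
              "People of America"]
stop = frozenset({"the", "of", "and"})

for word, freq, change in top_words_with_change(modern_era, previous_era,
                                                k=3, stop_words=stop):
    print(f"{word:10s} {freq:.2f} {change:+.2f}")
```

A positive change means the word takes up a larger share of the modern era's words than of the previous era's. Using shares rather than raw counts keeps the comparison fair when the two eras contain different amounts of text.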
In the previous era, the unique rhetoric consists of States, Government, United, Congress and Country. To me, these words all seem very common and can be grouped together as "government-y" words. In my opinion, these words appeal more to government figures (senators, congresspeople, etc.) than to the people. This difference makes sense given the communication mediums of the time: without radio and television, speeches did not reach the public directly.