In this article, we will learn how to scrape Google Ngram using Python. Google Ngram, also known as the Google Books Ngram Viewer, is an online search engine that charts how frequently a set of search strings appears in books over time, which makes it a way to gauge the popularity of a word or phrase.
The Ngram Viewer offers several filter options, including the language/genre of the books (also called the corpus) and the range of years in which the books were published. By default, the search is case-sensitive.
If we search for "Albert Einstein" in Google Ngram, the search result will look like this.
We can even compare the popularity of different phrases in the same search result by separating them with commas. For example, we can compare the popularity of "Albert Einstein" vs "Isaac Newton" from the years 1850 to 1900 across different books written in the English language.
If we search for "Albert Einstein" in Google Ngram with the years ranging from 1850 to 1860, the English corpus, and smoothing set to 0, we will see a graph as shown in the image above. The URL of this search query will look like this.
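As a rough sketch of how such a URL is put together, the snippet below encodes the search settings as query parameters. The endpoint path, the parameter names (`content`, `year_start`, `year_end`, `corpus`, `smoothing`), and the corpus value `en` are assumptions about how Ngram Viewer URLs are commonly structured, and `build_ngram_url` is a hypothetical helper name. Compare the result with the address bar of your own search before relying on it.

```python
from urllib.parse import urlencode

# Assumed address of the Ngram Viewer graph page.
NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(query, start_year=1850, end_year=1860,
                    corpus="en", smoothing=0, base=NGRAM_GRAPH_URL):
    """Return a search URL for the given phrase and filters (hypothetical helper)."""
    params = {
        "content": query,          # the phrase(s) to search for
        "year_start": start_year,  # first year of the range
        "year_end": end_year,      # last year of the range
        "corpus": corpus,          # assumed identifier for the English corpus
        "smoothing": smoothing,    # 0 means raw yearly values
    }
    # urlencode escapes spaces as '+' and keeps the parameters in dict order.
    return base + "?" + urlencode(params)


print(build_ngram_url("Albert Einstein"))
```

Keeping the settings in a dictionary means each filter can be changed without editing the URL string by hand.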
The data behind this graph is also available as JSON, and we can extract it using Python.
Now, we will create a function that extracts the data from Google Ngram's website. Read the comments in the code to follow along.
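A minimal sketch of such a function is shown here, using only the standard library. It assumes an unofficial JSON endpoint at `https://books.google.com/ngrams/json`, the query parameters named in the code, and a response that is a list of objects with `ngram` and `timeseries` keys. The names `run_query` and `fetch_text` are illustrative, and the corpus value `en` is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed JSON endpoint; it is not an officially documented API.
NGRAM_JSON_URL = "https://books.google.com/ngrams/json"


def fetch_text(url):
    """Download a URL and return the response body as text."""
    # Some servers reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def run_query(query, start_year=1850, end_year=1860,
              corpus="en", smoothing=0, fetch=fetch_text):
    """Return a list of (ngram, yearly_frequencies) tuples for the query."""
    params = {
        "content": query,
        "year_start": start_year,
        "year_end": end_year,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    url = NGRAM_JSON_URL + "?" + urlencode(params)
    # The response is assumed to be a JSON list with one object per phrase,
    # each holding an 'ngram' label and a 'timeseries' list of frequencies.
    payload = json.loads(fetch(url))
    # An empty list means no data was found for the query.
    return [(entry["ngram"], entry["timeseries"]) for entry in payload]
```

The `fetch` argument exists so the download step can be swapped out, for example with a stub during testing. With network access, `run_query("Albert Einstein")` should return a list shaped like the following output, although the exact values depend on the corpus version.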
[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10, 0.0, 0.0])]
We can even enter multiple phrases in the same query by separating each phrase with commas.
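One way to prepare such a query is to join the phrases with commas before sending them. The sketch below uses a hypothetical helper, `join_phrases`, and assumes the phrases are passed in a `content` query parameter.

```python
from urllib.parse import urlencode


def join_phrases(phrases):
    """Combine phrases into one comma-separated query string (hypothetical helper)."""
    # Strip stray whitespace and skip empty entries so the query stays clean.
    return ",".join(p.strip() for p in phrases if p.strip())


query = join_phrases(["Albert Einstein", " Isaac Newton "])
print(query)

# urlencode escapes the comma as %2C and each space as '+'.
print(urlencode({"content": query}))
```

Passing the combined string to the query function returns one tuple per phrase, as in the output that follows.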
[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10,
0.0, 0.0]), ('Isaac Newton', [1.568728407619346e-06, 1.135979687205690e-06,
1.140318772741011e-06, 1.102130454455618e-06, 1.34806168716750e-06,
2.039112359852879e-06, 1.356955749542976e-06, 1.121004174819972e-06,
1.223622120960499e-06, 1.18965874662535e-06, 1.077695060303085e-06])]
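Each list in the output holds one value per year, so the 11 values above correspond to 1850 through 1860. The sketch below pairs the frequencies with their years, assuming the values are consecutive and start at the first year of the query. `to_year_map` is a hypothetical helper name.

```python
def to_year_map(timeseries, start_year):
    """Pair each frequency with its year, assuming one value per consecutive year."""
    return {start_year + offset: value for offset, value in enumerate(timeseries)}


# Values copied from the 'Albert Einstein' output above (1850 to 1860).
einstein = [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
            1.014315520464492e-09, 6.44787723214079e-10, 0.0,
            7.01216085197131e-10, 0.0, 0.0]

by_year = to_year_map(einstein, 1850)

# Find the year in which the phrase was most frequent.
peak_year = max(by_year, key=by_year.get)
print(peak_year, by_year[peak_year])
```

A dictionary keyed by year makes it easier to look up a single year or to hand the data to a plotting library.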