
W7 - MongoDB in Python (Me)

The document discusses installing and using MongoDB with Python. It explains how to connect to a MongoDB database, how JSON data maps to Python and MongoDB structures, and how to query and retrieve documents from MongoDB collections using methods like find() and find_one().

Uploaded by

gihankumar4678
Copyright: © All Rights Reserved

MongoDB in Python

1 Install Python package and test MongoDB


Python needs a driver to access a MongoDB database. The Python driver package for
MongoDB is “PyMongo”.
Run the command below to install the “PyMongo” package (e.g. use PowerShell in Anaconda):
pip3 install pymongo

[1]: # If the code below executes with no errors, "pymongo" is installed and ready to be used.

import pymongo

To run the Python code for MongoDB, the MongoDB server (mongod) must be started first.
To work with a database in MongoDB, start by creating a MongoClient object with a connection
URL that contains the correct IP address (and port) of the server, then select the database you want by name.
MongoDB creates the database lazily: if it does not exist yet, it is created once it first gets content.

2 MongoDB structure
JSON <> Python
JSON (JavaScript Object Notation) is the basis of MongoDB’s data format.
JSON has two collection structures: objects map string keys to values, and arrays order values.
JSON data types have equivalents in Python:
• JSON objects are like Python dictionaries with string-type keys.
• JSON arrays are like Python lists.
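The mapping above can be checked with Python's standard `json` module. This is a minimal sketch; the JSON text below is a made-up sample, not a record from the Nobel dataset.

```python
import json

# A made-up JSON document (an assumption for illustration only).
raw = '{"firstname": "Ada", "prizes": [{"year": "1903"}, {"year": "1911"}], "alive": false, "died": null}'

doc = json.loads(raw)

print(type(doc).__name__)            # dict: a JSON object becomes a dictionary
print(type(doc["prizes"]).__name__)  # list: a JSON array becomes a list
print(doc["prizes"][1]["year"])      # 1911: nested values are reached step by step
print(doc["alive"], doc["died"])     # False None: JSON false/null map to False/None
```

A dictionary inside the "prizes" list plays the role that MongoDB calls a subdocument.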
JSON <> Python <> MongoDB

• A MongoDB database maps names to collections. You can access collections by name the
same way you would access values in a Python dictionary.
• A collection, in turn, is like a list of dictionaries, called “documents” by MongoDB. When a
dictionary is a value within a document, that’s a subdocument.
• Values in a document can be any of the JSON types (strings, numbers, true/false, null, arrays,
and objects). MongoDB also supports some types native to Python but not to JSON. Two
examples are dates and regular expressions.
Accessing databases and collections
There are two ways to access databases and collections from a client object:
• One way is square bracket notation, as if a client is a dictionary of databases, with database
names as keys. A database in turn is like a dictionary of collections, with collection names as keys.
• Another way is dot notation. Databases are attributes of a client, and collections are
attributes of a database.

[2]: """
# The code below fetches the latest data from the Nobel Prize website and
# stores it in collections

import pymongo
import json
import requests

# Connect to the local MongoDB server
client = pymongo.MongoClient("mongodb://localhost:27017/")

# Create a database 'nobel'
nobel = client["nobel"]

# Return a list of your system's databases:
# In MongoDB, a database is not created until it gets content.
# print(client.list_database_names())

for collection_name in ["prizes", "laureates"]:
    # collect the data from the API
    url = "http://api.nobelprize.org/v1/{}.json".format(collection_name[:-1])
    response = requests.get(url)
    # convert the JSON response to Python objects
    documents = response.json()[collection_name]
    # Create collections on the fly
    nobel[collection_name].insert_many(documents)

# Show the number of documents in a collection
print(nobel["prizes"].count_documents({}))
print(nobel["laureates"].count_documents({}))

# delete the collections
nobel["prizes"].drop()
nobel["laureates"].drop()
"""

[3]: # Read JSON files and load them into collections

import json
from pymongo import MongoClient

# make a connection to the MongoDB server
client = MongoClient("mongodb://localhost:27017/")

# create a database named 'nobel'
nobel = client["nobel"]

# Create collections
prizes = nobel.prizes
laureates = nobel.laureates
# alternative ways to create collections
# prizes = nobel["prizes"]
# laureates = nobel["laureates"]

# Load the JSON files into the collections
with open('prizes.json') as file:
    file_data = json.load(file)
    # Insert the loaded data into the collection:
    # if the JSON file contains more than one entry (a list),
    # insert_many is used; otherwise insert_one is used
    if isinstance(file_data, list):
        prizes.insert_many(file_data)
    else:
        prizes.insert_one(file_data)

with open('laureates.json') as file:
    file_data = json.load(file)
    if isinstance(file_data, list):
        laureates.insert_many(file_data)
    else:
        laureates.insert_one(file_data)

# check
print("List of collection names: ", nobel.list_collection_names())

# check the number of documents in a collection.
# remember to pass an empty filter document {} to the count_documents() method
print("Number of documents in the prizes collection: ",
      prizes.count_documents({}))
print("Number of documents in the laureates collection: ",
      laureates.count_documents({}))
print("\n")

# The .find_one() method of a collection can be used to retrieve a single document.
# This method accepts an optional filter argument specifying the pattern that
# the document must match.
# You can specify no filter or an empty document filter ({}), in which case
# MongoDB will return the document that is first in the internal order of the collection.
prize = prizes.find_one()
laureate = laureates.find_one()

# Print the sample prize and laureate documents
print("One document of the Prizes collection: \n", prize); print("\n")
print("One document of the Laureates collection: \n", laureate); print("\n")
print("Type of a document:", type(laureate))

# To print all documents in a collection
# for doc in prizes.find({}):
#     print(doc)

# Get the list of fields (keys) present in each type of document
prize_fields = list(prize.keys())
laureate_fields = list(laureate.keys())

print("The keys in a Prize document: \n", prize_fields); print("\n")
print("The keys in a Laureates document: \n", laureate_fields)

# delete collections
#prizes.drop()
#laureates.drop()

# delete database
#client.drop_database('nobel')

List of collection names: ['laureates', 'prizes']


Number of documents in the prizes collection: 590
Number of documents in the laureates collection: 934

One document of the Prizes collection:


{'_id': ObjectId('5fcf861e166123acf44a9dcb'), 'year': '2018', 'category':
'physics', 'overallMotivation': '“for groundbreaking inventions in the field of
laser physics”', 'laureates': [{'id': '960', 'firstname': 'Arthur', 'surname':
'Ashkin', 'motivation': '"for the optical tweezers and their application to
biological systems"', 'share': '2'}, {'id': '961', 'firstname': 'Gérard',
'surname': 'Mourou', 'motivation': '"for their method of generating high-
intensity, ultra-short optical pulses"', 'share': '4'}, {'id': '962',
'firstname': 'Donna', 'surname': 'Strickland', 'motivation': '"for their method
of generating high-intensity, ultra-short optical pulses"', 'share': '4'}]}

One document of the Laureates collection:


{'_id': ObjectId('5fcf861e166123acf44aa019'), 'id': '2', 'firstname': 'Hendrik
Antoon', 'surname': 'Lorentz', 'born': '1853-07-18', 'died': '1928-02-04',
'bornCountry': 'the Netherlands', 'bornCountryCode': 'NL', 'bornCity': 'Arnhem',

'diedCountry': 'the Netherlands', 'diedCountryCode': 'NL', 'gender': 'male',
'prizes': [{'year': '1902', 'category': 'physics', 'share': '2', 'motivation':
'"in recognition of the extraordinary service they rendered by their researches
into the influence of magnetism upon radiation phenomena"', 'affiliations':
[{'name': 'Leiden University', 'city': 'Leiden', 'country': 'the
Netherlands'}]}]}

Type of a document: <class 'dict'>


The keys in a Prize document:
['_id', 'year', 'category', 'overallMotivation', 'laureates']

The keys in a Laureates document:


['_id', 'id', 'firstname', 'surname', 'born', 'died', 'bornCountry',
'bornCountryCode', 'bornCity', 'diedCountry', 'diedCountryCode', 'gender',
'prizes']

3 Query collections / finding documents


In MongoDB we use the find() and find_one() methods to find data in a collection:
• The find_one() method returns the first document that matches the selection (or None if nothing matches).
• The find() method returns a cursor over all documents that match the selection.
These methods accept optional parameters:
• The first parameter of the find() method is the selection/filter criteria expressed as a document.
• The second parameter of the find() method is an object describing which fields to include in the result.
The filter document mirrors the structure of documents to match in the collection. Operators can
be placed in a filter document to wrap around a field and its acceptable values.
• Operators in MongoDB have a dollar-sign prefix (e.g. $in, $ne, $exists).
• Comparison operators order string values lexicographically, i.e. alphabetically, character by
character. Numeric values are compared numerically.
• Dot notation (.) can be used to query document substructure; it gives a full path to a field
from the document root. You can also reference an array element by its numerical index using
dot notation (e.g. "prizes.2" is the third element of the "prizes" array).
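Two of the points above are easy to try in plain Python, without a database. The sketch below first shows why lexicographic comparison matters when years are stored as strings, as they are in this dataset. It then uses `has_path`, a hypothetical helper written for this illustration (not part of PyMongo), to mimic what a dotted path with an array index refers to. It handles only explicit indexes, which is a simplification of how MongoDB resolves paths, and the sample document is made up.

```python
# Strings compare character by character, not by numeric value.
print("1945" <= "2018")   # True: equal width, so string order matches numeric order
print("900" < "1945")     # False: "9" sorts after "1"
print(900 < 1945)         # True: numbers compare numerically


def has_path(doc, path):
    """Return True if a dotted path such as "prizes.2" exists in doc.

    Hypothetical helper for illustration only: numeric parts index into
    lists, other parts are looked up as dictionary keys.
    """
    current = doc
    for part in path.split("."):
        if isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return False
    return True


# Made-up laureate document with three prizes.
laureate = {"surname": "Example",
            "prizes": [{"year": "1917"}, {"year": "1944"}, {"year": "1963"}]}

print(has_path(laureate, "prizes.2"))       # True: a third prize exists
print(has_path(laureate, "prizes.3"))       # False: there is no fourth prize
print(has_path(laureate, "prizes.0.year"))  # True: field inside the first prize
```

This is why a filter such as {"prizes.2": {"$exists": True}} selects laureates with at least three prizes.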

[14]: # Create a filter for Germany-born laureates who died in the USA and with the
# first name "Albert"
criteria = {'diedCountry': 'USA',
            'bornCountry': 'Germany',
            'firstname': 'Albert'}
# Save the count
count = nobel.laureates.count_documents(criteria)
print(count)

# Save a filter for laureates born in the USA, Canada, or Mexico
criteria = {"bornCountry": {"$in": ["USA", "Canada", "Mexico"]}}
# Count them and save the count
count = nobel.laureates.count_documents(criteria)
print(count)

# Save a filter for laureates who died in the USA and were not born there
criteria = {'diedCountry': 'USA',
            'bornCountry': {"$ne": 'USA'}}
# Count them
count = nobel.laureates.count_documents(criteria)
print(count)

# Use dot notation. Filter for laureates born in Austria with a non-Austria prize
# affiliation
criteria = {"bornCountry": "Austria",
            "prizes.affiliations.country": {"$ne": "Austria"}}
# Count the number of such laureates
count = laureates.count_documents(criteria)
print(count)

# Filter for documents without a "born" field
criteria = {"born": {"$exists": False}}
# Save the count
count = laureates.count_documents(criteria)
print(count)

# Use dot notation. Filter for laureates with at least three prizes
criteria = {"prizes.2": {"$exists": True}}
# Find one laureate with at least three prizes
doc = laureates.find_one(criteria)
# Print the document
print(doc)

1
291
69
10
0
{'_id': ObjectId('5fcce8b621f0b9f44a855e18'), 'id': '482', 'firstname': 'Comité
international de la Croix Rouge (International Committee of the Red Cross)',
'born': '0000-00-00', 'died': '0000-00-00', 'gender': 'org', 'prizes': [{'year':
'1917', 'category': 'peace', 'share': '1', 'affiliations': [[]]}, {'year':
'1944', 'category': 'peace', 'share': '1', 'affiliations': [[]]}, {'year':
'1963', 'category': 'peace', 'share': '2', 'affiliations': [[]]}]}

4 Working with Distinct Values and Sets


4.1 Distinct values
The distinct() method finds the distinct values of a specified field across a single collection and
returns them in a list.
The distinct(key, filter=None) method takes the following parameters:
• key: the name of the field for which the distinct values need to be found.
• filter: (optional) a query document that specifies the documents from which to retrieve the
distinct values. The method fetches field values only from those matching documents.
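To make the idea concrete without a running server, here is a plain-Python sketch of what a distinct query computes. `distinct_values` is a hypothetical stand-in written for this note, not PyMongo's implementation. It handles only top-level fields, takes an ordinary Python function in place of a MongoDB filter document, and runs on made-up documents.

```python
def distinct_values(docs, key, matches=lambda doc: True):
    """Collect the distinct values of a top-level field, in first-seen order.

    Hypothetical helper: `matches` plays the role of the optional filter.
    """
    seen = []
    for doc in docs:
        if key in doc and matches(doc) and doc[key] not in seen:
            seen.append(doc[key])
    return seen


# Made-up documents (assumption for illustration only).
docs = [
    {"bornCountry": "Germany", "diedCountry": "USA"},
    {"bornCountry": "USA", "diedCountry": "USA"},
    {"bornCountry": "Germany", "diedCountry": "Switzerland"},
    {"bornCountry": "France"},
]

born = distinct_values(docs, "bornCountry")
died = distinct_values(docs, "diedCountry")
print(born)                   # ['Germany', 'USA', 'France']
print(set(died) - set(born))  # {'Switzerland'}

# Restrict to documents that match a condition, like the optional filter argument.
print(distinct_values(docs, "diedCountry",
                      lambda doc: doc["bornCountry"] == "Germany"))  # ['USA', 'Switzerland']
```

Converting the results to sets, as the notebook cells do, makes it easy to compare two fields with set difference.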

[5]: # All the values (countries) for the "diedCountry" field. Convert the result to a set
died_countries = set(laureates.distinct("diedCountry"))
# All the values (countries) for the "bornCountry" field. Convert the result to a set
born_countries = set(laureates.distinct("bornCountry"))
# Countries recorded as countries of death but not as countries of birth
print(died_countries - born_countries)

# The number of distinct countries of laureate affiliation for prizes
count = len(laureates.distinct("prizes.affiliations.country"))
print(count)

# Save a filter for prize documents with three or more laureates
criteria = {"laureates.2": {"$exists": True}}

# Save the set of distinct prize categories in documents satisfying the criteria,
# i.e., all prize categories that have been shared by three or more laureates.
triple_play_categories = set(prizes.distinct("category", criteria))
# Find the prize categories that are not shared by three or more laureates
print(set(prizes.distinct("category")) - triple_play_categories)

# The ratio of the number of laureates who won an unshared prize
# in categories other than "physics", "chemistry" and "medicine"
# in or after 1945 to the number of laureates who shared such a prize
# in or after 1945

# Save a filter for laureates with unshared prizes
unshared = {
    "prizes": {"$elemMatch": {
        "category": {"$nin": ["physics", "chemistry", "medicine"]},
        "share": "1",
        "year": {"$gte": "1945"},
    }}}

# Save a filter for laureates with shared prizes
shared = {
    "prizes": {"$elemMatch": {
        "category": {"$nin": ["physics", "chemistry", "medicine"]},
        "share": {"$ne": "1"},
        "year": {"$gte": "1945"},
    }}}

ratio = laureates.count_documents(unshared) / laureates.count_documents(shared)
print("Ratio: ", ratio)

# Organizations that won prizes before 1945 versus in or after 1945:
# the in-or-after-1945 count as a share of both counts combined.
# An organization with prizes in both periods matches both filters.

# Save a filter for organization laureates with prizes won before 1945
before = {
    "gender": "org",
    "prizes.year": {"$lt": "1945"},
}

# Save a filter for organization laureates with prizes won in or after 1945
in_or_after = {
    "gender": "org",
    "prizes.year": {"$gte": "1945"},
}

n_before = laureates.count_documents(before)
n_in_or_after = laureates.count_documents(in_or_after)
ratio = n_in_or_after / (n_in_or_after + n_before)
print("Ratio: ", ratio)

{'Greece', 'Israel', 'East Germany', 'Gabon', 'Tunisia', 'USSR', 'Jamaica',
'Northern Rhodesia (now Zambia)', 'Barbados', 'Puerto Rico', 'Yugoslavia (now
Serbia)', 'Czechoslovakia', 'Philippines'}
29
{'literature'}
Ratio: 1.3653846153846154
Ratio: 0.84

4.2 Filtering with regular expressions


For string-valued fields, regular expressions are a powerful way to match a field’s value to a pattern.
Finding a substring with the Regex operator:
• To match the beginning of a field’s value, use the caret character (^). Anchor it to the
beginning of the string you pass to Regex.
• To match the end of a field’s value, use the dollar sign ($). Anchor it to the end of the string
you pass to Regex.
• To escape a character that has a special meaning in a regular expression, such as a
parenthesis, put a backslash (\) in front of it.
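The three rules can be tried out with Python's own `re` module before sending a pattern to MongoDB. This is only a sketch: MongoDB evaluates patterns with its own regular expression engine, but the anchors and escapes used here behave the same way in both. The country strings are a small hand-picked sample, and `matching` is a helper defined just for this example.

```python
import re

# A small hand-picked sample of "bornCountry"-style values.
countries = [
    "Germany",
    "Germany (now Poland)",
    "Prussia (now Germany)",
    "West Germany (now Germany)",
    "France",
]


def matching(pattern):
    """Return the sample values that contain a match for the pattern."""
    return [country for country in countries if re.search(pattern, country)]


# Raw strings (r"...") keep the backslash intact for the regex engine.
print(matching(r"Germany"))         # substring: every value except 'France'
print(matching(r"^Germany"))        # ['Germany', 'Germany (now Poland)']
print(matching(r"^Germany \(now"))  # ['Germany (now Poland)']
print(matching(r"now Germany\)$"))  # ['Prussia (now Germany)', 'West Germany (now Germany)']
```

Writing patterns as raw strings avoids Python treating "\(" as a string escape sequence.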

[12]: from bson.regex import Regex

# Filter for laureates with "Germany" in their "bornCountry" value
criteria = {"bornCountry": Regex("Germany")}
print(set(laureates.distinct("bornCountry", criteria)))

# Filter for laureates with a "bornCountry" value starting with "Germany"
criteria = {"bornCountry": Regex("^Germany")}
print(set(laureates.distinct("bornCountry", criteria)))

# Filter for "bornCountry" values starting with "Germany (now".
# The parenthesis is escaped, and a raw string keeps the backslash intact.
criteria = {"bornCountry": Regex(r"^Germany \(now")}
print(set(laureates.distinct("bornCountry", criteria)))

# Filter for "bornCountry" values ending with "now Germany)"
criteria = {"bornCountry": Regex(r"now Germany\)$")}
print(set(laureates.distinct("bornCountry", criteria)))

# Filter on "transistor" as a substring of a laureate's "prizes.motivation"
# field value to find these laureates.

# Save a filter for laureates with prize motivation values containing
# "transistor" as a substring
criteria = {"prizes.motivation": Regex("transistor")}

# Save the field names corresponding to a laureate's first name and last name
first, last = "firstname", "surname"
print([(laureate[first], laureate[last])
       for laureate in laureates.find(criteria)])

{'East Friesland (now Germany)', 'Germany (now France)', 'Germany (now Poland)',
'Germany (now Russia)', 'West Germany (now Germany)', 'Germany', 'Hesse-Kassel
(now Germany)', 'W&uuml;rttemberg (now Germany)', 'Mecklenburg (now Germany)',
'Schleswig (now Germany)', 'Prussia (now Germany)', 'Bavaria (now Germany)'}
{'Germany (now Russia)', 'Germany', 'Germany (now France)', 'Germany (now
Poland)'}
{'Germany (now Russia)', 'Germany (now France)', 'Germany (now Poland)'}
{'East Friesland (now Germany)', 'West Germany (now Germany)', 'Hesse-Kassel
(now Germany)', 'W&uuml;rttemberg (now Germany)', 'Mecklenburg (now Germany)',
'Schleswig (now Germany)', 'Prussia (now Germany)', 'Bavaria (now Germany)'}
[('William Bradford', 'Shockley'), ('John', 'Bardeen'), ('Walter Houser',
'Brattain')]

5 Get only what you need and fast


5.1 Projection
In MongoDB, we fetch projections by specifying what document fields interest us. We can do this
by passing a dictionary as a second argument to the find() method of a collection.
• For each field that we want to include in the projection, we give a value of 1.
• Fields that we don’t include in the dictionary are not included in the projection.
• The exception is a document’s "_id" field, which is always included in a projection by default.
We must assign it the value 0 in the projection dictionary to leave it out.
PyMongo also accepts a list of field names to include in place of the dictionary.
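The inclusion rules can be mimicked in a few lines of plain Python. The `project` function below is a hypothetical illustration, not PyMongo code. It handles only top-level fields and inclusion-style projections, and it is applied to a made-up document.

```python
def project(doc, fields):
    """Apply an inclusion projection to one document (top-level fields only).

    Hypothetical helper: `fields` maps field names to 1 (include);
    "_id" is kept unless it is explicitly set to 0.
    """
    result = {}
    if fields.get("_id", 1) and "_id" in doc:
        result["_id"] = doc["_id"]
    for name, flag in fields.items():
        if name != "_id" and flag and name in doc:
            result[name] = doc[name]
    return result


# Made-up document (assumption for illustration only).
doc = {"_id": 1, "firstname": "Ada", "surname": "Example", "gender": "female"}

print(project(doc, {"firstname": 1, "surname": 1}))
# {'_id': 1, 'firstname': 'Ada', 'surname': 'Example'}

print(project(doc, {"firstname": 1, "surname": 1, "_id": 0}))
# {'firstname': 'Ada', 'surname': 'Example'}
```

The "gender" field is dropped in both calls because it was never listed, while "_id" disappears only when it is set to 0.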

[5]: # Find laureates whose first name starts with "G" and last name starts with "S"
# Use projection to select only firstname and surname
docs = laureates.find(
    filter={"firstname": {"$regex": "^G"},
            "surname": {"$regex": "^S"}},
    projection=["firstname", "surname"])

# Iterate over docs and concatenate first name and surname
full_names = [doc["firstname"] + " " + doc["surname"] for doc in docs]

# Print the full names
print(full_names)

['George D. Snell', 'Gustav Stresemann', 'Glenn Theodore Seaborg', 'George J.
Stigler', 'George F. Smoot', 'George E. Smith', 'George P. Smith', 'George
Bernard Shaw', 'Giorgos Seferis']

5.2 Sorting
We pass a “sort” argument to the find() method, giving a list of field-direction pairs. The list
can contain multiple entries, so you can sort first by one field and then by other fields, i.e.
primary and secondary sorting. The directions are:
• ascending: 1
• descending: -1
As an alternative to passing extra parameters to the find() method, we can chain the find() method
with the sort() method of the returned cursor, which takes one parameter for “fieldname” and one
parameter for “direction” (ascending is the default direction).
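Primary and secondary sorting can be pictured with ordinary Python lists. The sketch below is an in-memory analogy, not what the server does internally. `sort_docs` is a hypothetical helper that accepts the same list of field-direction pairs, and the documents are made up.

```python
from operator import itemgetter


def sort_docs(docs, sort_spec):
    """Sort dictionaries by (field, direction) pairs: 1 ascending, -1 descending.

    Hypothetical helper for illustration only.
    """
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant field first
    # and the most significant field last gives primary/secondary ordering.
    for field, direction in reversed(sort_spec):
        result.sort(key=itemgetter(field), reverse=(direction == -1))
    return result


# Made-up prize documents (assumption for illustration only).
docs = [
    {"year": "2017", "category": "physics"},
    {"year": "2018", "category": "peace"},
    {"year": "2017", "category": "chemistry"},
    {"year": "2018", "category": "chemistry"},
]

# Newest year first; within a year, categories in alphabetical order.
for doc in sort_docs(docs, [("year", -1), ("category", 1)]):
    print(doc["year"], doc["category"])
# 2018 chemistry
# 2018 peace
# 2017 chemistry
# 2017 physics
```

Because the years are strings of equal width, sorting them as text gives the same order as sorting them as numbers.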

[16]: # This exercise explores the prizes in the physics category.
# You will use Python to sort laureates for one prize by last name, and
# then MongoDB to sort prizes by year.

from operator import itemgetter

# This function takes a prize document as an argument, extracts all the
# laureates from that document, arranges them in alphabetical order, and
# returns a string containing the last names separated by " and ".
def all_laureates(prize):
    # sort the laureates by surname
    sorted_laureates = sorted(prize["laureates"], key=itemgetter("surname"))

    # extract surnames
    surnames = [laureate["surname"] for laureate in sorted_laureates]

    # concatenate surnames separated with " and "
    all_names = " and ".join(surnames)

    return all_names

# find physics prizes, project year and name, and sort by year
docs = prizes.find(
    filter={"category": "physics"},
    projection=["year", "laureates.firstname", "laureates.surname"],
    sort=[("year", 1), ("laureates.surname", 1)])

# print the year and laureate names (from all_laureates)
for doc in docs:
    print("{year}: {names}".format(year=doc["year"], names=all_laureates(doc)))

1901: Röntgen
1902: Lorentz and Zeeman
1903: Becquerel and Curie and Curie, née Sklodowska
1904: (John William Strutt)

1905: von Lenard
1906: Thomson
1907: Michelson
1908: Lippmann
1909: Braun and Marconi
1910: van der Waals
1911: Wien
1912: Dalén
1913: Kamerlingh Onnes
1914: von Laue
1915: Bragg and Bragg
1917: Barkla
1918: Planck
1919: Stark
1920: Guillaume
1921: Einstein
1922: Bohr
1923: Millikan
1924: Siegbahn
1925: Franck and Hertz
1926: Perrin
1927: Compton and Wilson
1928: Richardson
1929: de Broglie
1930: Raman
1932: Heisenberg
1933: Dirac and Schrödinger
1935: Chadwick
1936: Anderson and Hess
1937: Davisson and Thomson
1938: Fermi
1939: Lawrence
1943: Stern
1944: Rabi
1945: Pauli
1946: Bridgman
1947: Appleton
1948: Blackett
1949: Yukawa
1950: Powell
1951: Cockcroft and Walton
1952: Bloch and Purcell
1953: Zernike
1954: Born and Bothe
1955: Kusch and Lamb
1956: Bardeen and Brattain and Shockley
1957: Lee and Yang
1958: Cherenkov and Frank and Tamm

1959: Chamberlain and Segrè
1960: Glaser
1961: Hofstadter and Mössbauer
1962: Landau
1963: Goeppert Mayer and Jensen and Wigner
1964: Basov and Prokhorov and Townes
1965: Feynman and Schwinger and Tomonaga
1966: Kastler
1967: Bethe
1968: Alvarez
1969: Gell-Mann
1970: Alfvén and Néel
1971: Gabor
1972: Bardeen and Cooper and Schrieffer
1973: Esaki and Giaever and Josephson
1974: Hewish and Ryle
1975: Bohr and Mottelson and Rainwater
1976: Richter and Ting
1977: Anderson and Mott and van Vleck
1978: Kapitsa and Penzias and Wilson
1979: Glashow and Salam and Weinberg
1980: Cronin and Fitch
1981: Bloembergen and Schawlow and Siegbahn
1982: Wilson
1983: Chandrasekhar and Fowler
1984: Rubbia and van der Meer
1985: von Klitzing
1986: Binnig and Rohrer and Ruska
1987: Bednorz and Müller
1988: Lederman and Schwartz and Steinberger
1989: Dehmelt and Paul and Ramsey
1990: Friedman and Kendall and Taylor
1991: de Gennes
1992: Charpak
1993: Hulse and Taylor Jr.
1994: Brockhouse and Shull
1995: Perl and Reines
1996: Lee and Osheroff and Richardson
1997: Chu and Cohen-Tannoudji and Phillips
1998: Laughlin and Störmer and Tsui
1999: 't Hooft and Veltman
2000: Alferov and Kilby and Kroemer
2001: Cornell and Ketterle and Wieman
2002: Davis Jr. and Giacconi and Koshiba
2003: Abrikosov and Ginzburg and Leggett
2004: Gross and Politzer and Wilczek
2005: Glauber and Hall and Hänsch
2006: Mather and Smoot

2007: Fert and Grünberg
2008: Kobayashi and Maskawa and Nambu
2009: Boyle and Kao and Smith
2010: Geim and Novoselov
2011: Perlmutter and Riess and Schmidt
2012: Haroche and Wineland
2013: Englert and Higgs
2014: Akasaki and Amano and Nakamura
2015: Kajita and McDonald
2016: Haldane and Kosterlitz and Thouless
2017: Barish and Thorne and Weiss
2018: Ashkin and Mourou and Strickland

[18]: # Use sorting by multiple fields to see which prize categories are missing
# in which years.

# Find the original prize categories established in 1901 by
# looking at the distinct values of the "category" field for prizes from year 1901.
# Use the .distinct("field_name", criteria) method of a collection to
# find distinct values of "field_name" among documents satisfying criteria.
original_categories = prizes.distinct("category", {"year": "1901"})
print(original_categories)

# Project year and category, and sort.
# Fetch ONLY the year and category from all the documents (without the "_id" field).
# Sort by "year" in descending order, then by "category" in ascending order.
docs = prizes.find(
    filter={},
    projection={"year": 1, "category": 1, "_id": 0},
    sort=[("year", -1), ("category", 1)]
)

# print the documents
for doc in docs:
    print(doc)

['chemistry', 'literature', 'medicine', 'peace', 'physics']


{'year': '2018', 'category': 'chemistry'}
{'year': '2018', 'category': 'economics'}
{'year': '2018', 'category': 'medicine'}
{'year': '2018', 'category': 'peace'}
{'year': '2018', 'category': 'physics'}
{'year': '2017', 'category': 'chemistry'}
{'year': '2017', 'category': 'economics'}
{'year': '2017', 'category': 'literature'}
{'year': '2017', 'category': 'medicine'}

{'year': '2017', 'category': 'peace'}
{'year': '2017', 'category': 'physics'}
{'year': '2016', 'category': 'chemistry'}
{'year': '2016', 'category': 'economics'}
{'year': '2016', 'category': 'literature'}
{'year': '2016', 'category': 'medicine'}
{'year': '2016', 'category': 'peace'}
{'year': '2016', 'category': 'physics'}
{'year': '2015', 'category': 'chemistry'}
{'year': '2015', 'category': 'economics'}
{'year': '2015', 'category': 'literature'}
{'year': '2015', 'category': 'medicine'}
{'year': '2015', 'category': 'peace'}
{'year': '2015', 'category': 'physics'}
{'year': '2014', 'category': 'chemistry'}
{'year': '2014', 'category': 'economics'}
{'year': '2014', 'category': 'literature'}
{'year': '2014', 'category': 'medicine'}
{'year': '2014', 'category': 'peace'}
{'year': '2014', 'category': 'physics'}
{'year': '2013', 'category': 'chemistry'}
{'year': '2013', 'category': 'economics'}
{'year': '2013', 'category': 'literature'}
{'year': '2013', 'category': 'medicine'}
{'year': '2013', 'category': 'peace'}
{'year': '2013', 'category': 'physics'}
{'year': '2012', 'category': 'chemistry'}
{'year': '2012', 'category': 'economics'}
{'year': '2012', 'category': 'literature'}
{'year': '2012', 'category': 'medicine'}
{'year': '2012', 'category': 'peace'}
{'year': '2012', 'category': 'physics'}
{'year': '2011', 'category': 'chemistry'}
{'year': '2011', 'category': 'economics'}
{'year': '2011', 'category': 'literature'}
{'year': '2011', 'category': 'medicine'}
{'year': '2011', 'category': 'peace'}
{'year': '2011', 'category': 'physics'}
{'year': '2010', 'category': 'chemistry'}
{'year': '2010', 'category': 'economics'}
{'year': '2010', 'category': 'literature'}
{'year': '2010', 'category': 'medicine'}
{'year': '2010', 'category': 'peace'}
{'year': '2010', 'category': 'physics'}
{'year': '2009', 'category': 'chemistry'}
{'year': '2009', 'category': 'economics'}
{'year': '2009', 'category': 'literature'}
{'year': '2009', 'category': 'medicine'}

{'year': '2009', 'category': 'peace'}
{'year': '2009', 'category': 'physics'}
{'year': '2008', 'category': 'chemistry'}
{'year': '2008', 'category': 'economics'}
{'year': '2008', 'category': 'literature'}
{'year': '2008', 'category': 'medicine'}
{'year': '2008', 'category': 'peace'}
{'year': '2008', 'category': 'physics'}
{'year': '2007', 'category': 'chemistry'}
{'year': '2007', 'category': 'economics'}
{'year': '2007', 'category': 'literature'}
{'year': '2007', 'category': 'medicine'}
{'year': '2007', 'category': 'peace'}
{'year': '2007', 'category': 'physics'}
{'year': '2006', 'category': 'chemistry'}
{'year': '2006', 'category': 'economics'}
{'year': '2006', 'category': 'literature'}
{'year': '2006', 'category': 'medicine'}
{'year': '2006', 'category': 'peace'}
{'year': '2006', 'category': 'physics'}
{'year': '2005', 'category': 'chemistry'}
{'year': '2005', 'category': 'economics'}
{'year': '2005', 'category': 'literature'}
{'year': '2005', 'category': 'medicine'}
{'year': '2005', 'category': 'peace'}
{'year': '2005', 'category': 'physics'}
{'year': '2004', 'category': 'chemistry'}
{'year': '2004', 'category': 'economics'}
{'year': '2004', 'category': 'literature'}
{'year': '2004', 'category': 'medicine'}
{'year': '2004', 'category': 'peace'}
{'year': '2004', 'category': 'physics'}
{'year': '2003', 'category': 'chemistry'}
{'year': '2003', 'category': 'economics'}
{'year': '2003', 'category': 'literature'}
{'year': '2003', 'category': 'medicine'}
{'year': '2003', 'category': 'peace'}
{'year': '2003', 'category': 'physics'}
{'year': '2002', 'category': 'chemistry'}
{'year': '2002', 'category': 'economics'}
{'year': '2002', 'category': 'literature'}
{'year': '2002', 'category': 'medicine'}
{'year': '2002', 'category': 'peace'}
{'year': '2002', 'category': 'physics'}
{'year': '2001', 'category': 'chemistry'}
{'year': '2001', 'category': 'economics'}
{'year': '2001', 'category': 'literature'}
{'year': '2001', 'category': 'medicine'}

{'year': '2001', 'category': 'peace'}
{'year': '2001', 'category': 'physics'}
{'year': '2000', 'category': 'chemistry'}
{'year': '2000', 'category': 'economics'}
{'year': '2000', 'category': 'literature'}
{'year': '2000', 'category': 'medicine'}
{'year': '2000', 'category': 'peace'}
{'year': '2000', 'category': 'physics'}
{'year': '1999', 'category': 'chemistry'}
{'year': '1999', 'category': 'economics'}
{'year': '1999', 'category': 'literature'}
{'year': '1999', 'category': 'medicine'}
{'year': '1999', 'category': 'peace'}
{'year': '1999', 'category': 'physics'}
{'year': '1998', 'category': 'chemistry'}
{'year': '1998', 'category': 'economics'}
{'year': '1998', 'category': 'literature'}
{'year': '1998', 'category': 'medicine'}
{'year': '1998', 'category': 'peace'}
{'year': '1998', 'category': 'physics'}
{'year': '1997', 'category': 'chemistry'}
{'year': '1997', 'category': 'economics'}
{'year': '1997', 'category': 'literature'}
{'year': '1997', 'category': 'medicine'}
{'year': '1997', 'category': 'peace'}
{'year': '1997', 'category': 'physics'}
{'year': '1996', 'category': 'chemistry'}
{'year': '1996', 'category': 'economics'}
{'year': '1996', 'category': 'literature'}
{'year': '1996', 'category': 'medicine'}
{'year': '1996', 'category': 'peace'}
{'year': '1996', 'category': 'physics'}
{'year': '1995', 'category': 'chemistry'}
{'year': '1995', 'category': 'economics'}
{'year': '1995', 'category': 'literature'}
{'year': '1995', 'category': 'medicine'}
{'year': '1995', 'category': 'peace'}
{'year': '1995', 'category': 'physics'}
{'year': '1994', 'category': 'chemistry'}
{'year': '1994', 'category': 'economics'}
{'year': '1994', 'category': 'literature'}
{'year': '1994', 'category': 'medicine'}
{'year': '1994', 'category': 'peace'}
{'year': '1994', 'category': 'physics'}
{'year': '1993', 'category': 'chemistry'}
{'year': '1993', 'category': 'economics'}
{'year': '1993', 'category': 'literature'}
{'year': '1993', 'category': 'medicine'}

{'year': '1993', 'category': 'peace'}
{'year': '1993', 'category': 'physics'}
{'year': '1992', 'category': 'chemistry'}
{'year': '1992', 'category': 'economics'}
{'year': '1992', 'category': 'literature'}
{'year': '1992', 'category': 'medicine'}
{'year': '1992', 'category': 'peace'}
{'year': '1992', 'category': 'physics'}
{'year': '1991', 'category': 'chemistry'}
{'year': '1991', 'category': 'economics'}
{'year': '1991', 'category': 'literature'}
{'year': '1991', 'category': 'medicine'}
{'year': '1991', 'category': 'peace'}
{'year': '1991', 'category': 'physics'}
{'year': '1990', 'category': 'chemistry'}
{'year': '1990', 'category': 'economics'}
{'year': '1990', 'category': 'literature'}
{'year': '1990', 'category': 'medicine'}
{'year': '1990', 'category': 'peace'}
{'year': '1990', 'category': 'physics'}
{'year': '1989', 'category': 'chemistry'}
{'year': '1989', 'category': 'economics'}
{'year': '1989', 'category': 'literature'}
{'year': '1989', 'category': 'medicine'}
{'year': '1989', 'category': 'peace'}
{'year': '1989', 'category': 'physics'}
{'year': '1988', 'category': 'chemistry'}
{'year': '1988', 'category': 'economics'}
{'year': '1988', 'category': 'literature'}
{'year': '1988', 'category': 'medicine'}
{'year': '1988', 'category': 'peace'}
{'year': '1988', 'category': 'physics'}
{'year': '1987', 'category': 'chemistry'}
{'year': '1987', 'category': 'economics'}
{'year': '1987', 'category': 'literature'}
{'year': '1987', 'category': 'medicine'}
{'year': '1987', 'category': 'peace'}
{'year': '1987', 'category': 'physics'}
{'year': '1986', 'category': 'chemistry'}
{'year': '1986', 'category': 'economics'}
{'year': '1986', 'category': 'literature'}
{'year': '1986', 'category': 'medicine'}
{'year': '1986', 'category': 'peace'}
{'year': '1986', 'category': 'physics'}
{'year': '1985', 'category': 'chemistry'}
{'year': '1985', 'category': 'economics'}
{'year': '1985', 'category': 'literature'}
{'year': '1985', 'category': 'medicine'}
{'year': '1985', 'category': 'peace'}
{'year': '1985', 'category': 'physics'}
{'year': '1984', 'category': 'chemistry'}
{'year': '1984', 'category': 'economics'}
{'year': '1984', 'category': 'literature'}
{'year': '1984', 'category': 'medicine'}
{'year': '1984', 'category': 'peace'}
{'year': '1984', 'category': 'physics'}
{'year': '1983', 'category': 'chemistry'}
{'year': '1983', 'category': 'economics'}
{'year': '1983', 'category': 'literature'}
{'year': '1983', 'category': 'medicine'}
{'year': '1983', 'category': 'peace'}
{'year': '1983', 'category': 'physics'}
{'year': '1982', 'category': 'chemistry'}
{'year': '1982', 'category': 'economics'}
{'year': '1982', 'category': 'literature'}
{'year': '1982', 'category': 'medicine'}
{'year': '1982', 'category': 'peace'}
{'year': '1982', 'category': 'physics'}
{'year': '1981', 'category': 'chemistry'}
{'year': '1981', 'category': 'economics'}
{'year': '1981', 'category': 'literature'}
{'year': '1981', 'category': 'medicine'}
{'year': '1981', 'category': 'peace'}
{'year': '1981', 'category': 'physics'}
{'year': '1980', 'category': 'chemistry'}
{'year': '1980', 'category': 'economics'}
{'year': '1980', 'category': 'literature'}
{'year': '1980', 'category': 'medicine'}
{'year': '1980', 'category': 'peace'}
{'year': '1980', 'category': 'physics'}
{'year': '1979', 'category': 'chemistry'}
{'year': '1979', 'category': 'economics'}
{'year': '1979', 'category': 'literature'}
{'year': '1979', 'category': 'medicine'}
{'year': '1979', 'category': 'peace'}
{'year': '1979', 'category': 'physics'}
{'year': '1978', 'category': 'chemistry'}
{'year': '1978', 'category': 'economics'}
{'year': '1978', 'category': 'literature'}
{'year': '1978', 'category': 'medicine'}
{'year': '1978', 'category': 'peace'}
{'year': '1978', 'category': 'physics'}
{'year': '1977', 'category': 'chemistry'}
{'year': '1977', 'category': 'economics'}
{'year': '1977', 'category': 'literature'}
{'year': '1977', 'category': 'medicine'}
{'year': '1977', 'category': 'peace'}
{'year': '1977', 'category': 'physics'}
{'year': '1976', 'category': 'chemistry'}
{'year': '1976', 'category': 'economics'}
{'year': '1976', 'category': 'literature'}
{'year': '1976', 'category': 'medicine'}
{'year': '1976', 'category': 'peace'}
{'year': '1976', 'category': 'physics'}
{'year': '1975', 'category': 'chemistry'}
{'year': '1975', 'category': 'economics'}
{'year': '1975', 'category': 'literature'}
{'year': '1975', 'category': 'medicine'}
{'year': '1975', 'category': 'peace'}
{'year': '1975', 'category': 'physics'}
{'year': '1974', 'category': 'chemistry'}
{'year': '1974', 'category': 'economics'}
{'year': '1974', 'category': 'literature'}
{'year': '1974', 'category': 'medicine'}
{'year': '1974', 'category': 'peace'}
{'year': '1974', 'category': 'physics'}
{'year': '1973', 'category': 'chemistry'}
{'year': '1973', 'category': 'economics'}
{'year': '1973', 'category': 'literature'}
{'year': '1973', 'category': 'medicine'}
{'year': '1973', 'category': 'peace'}
{'year': '1973', 'category': 'physics'}
{'year': '1972', 'category': 'chemistry'}
{'year': '1972', 'category': 'economics'}
{'year': '1972', 'category': 'literature'}
{'year': '1972', 'category': 'medicine'}
{'year': '1972', 'category': 'physics'}
{'year': '1971', 'category': 'chemistry'}
{'year': '1971', 'category': 'economics'}
{'year': '1971', 'category': 'literature'}
{'year': '1971', 'category': 'medicine'}
{'year': '1971', 'category': 'peace'}
{'year': '1971', 'category': 'physics'}
{'year': '1970', 'category': 'chemistry'}
{'year': '1970', 'category': 'economics'}
{'year': '1970', 'category': 'literature'}
{'year': '1970', 'category': 'medicine'}
{'year': '1970', 'category': 'peace'}
{'year': '1970', 'category': 'physics'}
{'year': '1969', 'category': 'chemistry'}
{'year': '1969', 'category': 'economics'}
{'year': '1969', 'category': 'literature'}
{'year': '1969', 'category': 'medicine'}
{'year': '1969', 'category': 'peace'}
{'year': '1969', 'category': 'physics'}
{'year': '1968', 'category': 'chemistry'}
{'year': '1968', 'category': 'literature'}
{'year': '1968', 'category': 'medicine'}
{'year': '1968', 'category': 'peace'}
{'year': '1968', 'category': 'physics'}
{'year': '1967', 'category': 'chemistry'}
{'year': '1967', 'category': 'literature'}
{'year': '1967', 'category': 'medicine'}
{'year': '1967', 'category': 'physics'}
{'year': '1966', 'category': 'chemistry'}
{'year': '1966', 'category': 'literature'}
{'year': '1966', 'category': 'medicine'}
{'year': '1966', 'category': 'physics'}
{'year': '1965', 'category': 'chemistry'}
{'year': '1965', 'category': 'literature'}
{'year': '1965', 'category': 'medicine'}
{'year': '1965', 'category': 'peace'}
{'year': '1965', 'category': 'physics'}
{'year': '1964', 'category': 'chemistry'}
{'year': '1964', 'category': 'literature'}
{'year': '1964', 'category': 'medicine'}
{'year': '1964', 'category': 'peace'}
{'year': '1964', 'category': 'physics'}
{'year': '1963', 'category': 'chemistry'}
{'year': '1963', 'category': 'literature'}
{'year': '1963', 'category': 'medicine'}
{'year': '1963', 'category': 'peace'}
{'year': '1963', 'category': 'physics'}
{'year': '1962', 'category': 'chemistry'}
{'year': '1962', 'category': 'literature'}
{'year': '1962', 'category': 'medicine'}
{'year': '1962', 'category': 'peace'}
{'year': '1962', 'category': 'physics'}
{'year': '1961', 'category': 'chemistry'}
{'year': '1961', 'category': 'literature'}
{'year': '1961', 'category': 'medicine'}
{'year': '1961', 'category': 'peace'}
{'year': '1961', 'category': 'physics'}
{'year': '1960', 'category': 'chemistry'}
{'year': '1960', 'category': 'literature'}
{'year': '1960', 'category': 'medicine'}
{'year': '1960', 'category': 'peace'}
{'year': '1960', 'category': 'physics'}
{'year': '1959', 'category': 'chemistry'}
{'year': '1959', 'category': 'literature'}
{'year': '1959', 'category': 'medicine'}
{'year': '1959', 'category': 'peace'}
{'year': '1959', 'category': 'physics'}
{'year': '1958', 'category': 'chemistry'}
{'year': '1958', 'category': 'literature'}
{'year': '1958', 'category': 'medicine'}
{'year': '1958', 'category': 'peace'}
{'year': '1958', 'category': 'physics'}
{'year': '1957', 'category': 'chemistry'}
{'year': '1957', 'category': 'literature'}
{'year': '1957', 'category': 'medicine'}
{'year': '1957', 'category': 'peace'}
{'year': '1957', 'category': 'physics'}
{'year': '1956', 'category': 'chemistry'}
{'year': '1956', 'category': 'literature'}
{'year': '1956', 'category': 'medicine'}
{'year': '1956', 'category': 'physics'}
{'year': '1955', 'category': 'chemistry'}
{'year': '1955', 'category': 'literature'}
{'year': '1955', 'category': 'medicine'}
{'year': '1955', 'category': 'physics'}
{'year': '1954', 'category': 'chemistry'}
{'year': '1954', 'category': 'literature'}
{'year': '1954', 'category': 'medicine'}
{'year': '1954', 'category': 'peace'}
{'year': '1954', 'category': 'physics'}
{'year': '1953', 'category': 'chemistry'}
{'year': '1953', 'category': 'literature'}
{'year': '1953', 'category': 'medicine'}
{'year': '1953', 'category': 'peace'}
{'year': '1953', 'category': 'physics'}
{'year': '1952', 'category': 'chemistry'}
{'year': '1952', 'category': 'literature'}
{'year': '1952', 'category': 'medicine'}
{'year': '1952', 'category': 'peace'}
{'year': '1952', 'category': 'physics'}
{'year': '1951', 'category': 'chemistry'}
{'year': '1951', 'category': 'literature'}
{'year': '1951', 'category': 'medicine'}
{'year': '1951', 'category': 'peace'}
{'year': '1951', 'category': 'physics'}
{'year': '1950', 'category': 'chemistry'}
{'year': '1950', 'category': 'literature'}
{'year': '1950', 'category': 'medicine'}
{'year': '1950', 'category': 'peace'}
{'year': '1950', 'category': 'physics'}
{'year': '1949', 'category': 'chemistry'}
{'year': '1949', 'category': 'literature'}
{'year': '1949', 'category': 'medicine'}
{'year': '1949', 'category': 'peace'}
{'year': '1949', 'category': 'physics'}
{'year': '1948', 'category': 'chemistry'}
{'year': '1948', 'category': 'literature'}
{'year': '1948', 'category': 'medicine'}
{'year': '1948', 'category': 'physics'}
{'year': '1947', 'category': 'chemistry'}
{'year': '1947', 'category': 'literature'}
{'year': '1947', 'category': 'medicine'}
{'year': '1947', 'category': 'peace'}
{'year': '1947', 'category': 'physics'}
{'year': '1946', 'category': 'chemistry'}
{'year': '1946', 'category': 'literature'}
{'year': '1946', 'category': 'medicine'}
{'year': '1946', 'category': 'peace'}
{'year': '1946', 'category': 'physics'}
{'year': '1945', 'category': 'chemistry'}
{'year': '1945', 'category': 'literature'}
{'year': '1945', 'category': 'medicine'}
{'year': '1945', 'category': 'peace'}
{'year': '1945', 'category': 'physics'}
{'year': '1944', 'category': 'chemistry'}
{'year': '1944', 'category': 'literature'}
{'year': '1944', 'category': 'medicine'}
{'year': '1944', 'category': 'peace'}
{'year': '1944', 'category': 'physics'}
{'year': '1943', 'category': 'chemistry'}
{'year': '1943', 'category': 'medicine'}
{'year': '1943', 'category': 'physics'}
{'year': '1939', 'category': 'chemistry'}
{'year': '1939', 'category': 'literature'}
{'year': '1939', 'category': 'medicine'}
{'year': '1939', 'category': 'physics'}
{'year': '1938', 'category': 'chemistry'}
{'year': '1938', 'category': 'literature'}
{'year': '1938', 'category': 'medicine'}
{'year': '1938', 'category': 'peace'}
{'year': '1938', 'category': 'physics'}
{'year': '1937', 'category': 'chemistry'}
{'year': '1937', 'category': 'literature'}
{'year': '1937', 'category': 'medicine'}
{'year': '1937', 'category': 'peace'}
{'year': '1937', 'category': 'physics'}
{'year': '1936', 'category': 'chemistry'}
{'year': '1936', 'category': 'literature'}
{'year': '1936', 'category': 'medicine'}
{'year': '1936', 'category': 'peace'}
{'year': '1936', 'category': 'physics'}
{'year': '1935', 'category': 'chemistry'}
{'year': '1935', 'category': 'medicine'}
{'year': '1935', 'category': 'peace'}
{'year': '1935', 'category': 'physics'}
{'year': '1934', 'category': 'chemistry'}
{'year': '1934', 'category': 'literature'}
{'year': '1934', 'category': 'medicine'}
{'year': '1934', 'category': 'peace'}
{'year': '1933', 'category': 'literature'}
{'year': '1933', 'category': 'medicine'}
{'year': '1933', 'category': 'peace'}
{'year': '1933', 'category': 'physics'}
{'year': '1932', 'category': 'chemistry'}
{'year': '1932', 'category': 'literature'}
{'year': '1932', 'category': 'medicine'}
{'year': '1932', 'category': 'physics'}
{'year': '1931', 'category': 'chemistry'}
{'year': '1931', 'category': 'literature'}
{'year': '1931', 'category': 'medicine'}
{'year': '1931', 'category': 'peace'}
{'year': '1930', 'category': 'chemistry'}
{'year': '1930', 'category': 'literature'}
{'year': '1930', 'category': 'medicine'}
{'year': '1930', 'category': 'peace'}
{'year': '1930', 'category': 'physics'}
{'year': '1929', 'category': 'chemistry'}
{'year': '1929', 'category': 'literature'}
{'year': '1929', 'category': 'medicine'}
{'year': '1929', 'category': 'peace'}
{'year': '1929', 'category': 'physics'}
{'year': '1928', 'category': 'chemistry'}
{'year': '1928', 'category': 'literature'}
{'year': '1928', 'category': 'medicine'}
{'year': '1928', 'category': 'physics'}
{'year': '1927', 'category': 'chemistry'}
{'year': '1927', 'category': 'literature'}
{'year': '1927', 'category': 'medicine'}
{'year': '1927', 'category': 'peace'}
{'year': '1927', 'category': 'physics'}
{'year': '1926', 'category': 'chemistry'}
{'year': '1926', 'category': 'literature'}
{'year': '1926', 'category': 'medicine'}
{'year': '1926', 'category': 'peace'}
{'year': '1926', 'category': 'physics'}
{'year': '1925', 'category': 'chemistry'}
{'year': '1925', 'category': 'literature'}
{'year': '1925', 'category': 'peace'}
{'year': '1925', 'category': 'physics'}
{'year': '1924', 'category': 'literature'}
{'year': '1924', 'category': 'medicine'}
{'year': '1924', 'category': 'physics'}
{'year': '1923', 'category': 'chemistry'}
{'year': '1923', 'category': 'literature'}
{'year': '1923', 'category': 'medicine'}
{'year': '1923', 'category': 'physics'}
{'year': '1922', 'category': 'chemistry'}
{'year': '1922', 'category': 'literature'}
{'year': '1922', 'category': 'medicine'}
{'year': '1922', 'category': 'peace'}
{'year': '1922', 'category': 'physics'}
{'year': '1921', 'category': 'chemistry'}
{'year': '1921', 'category': 'literature'}
{'year': '1921', 'category': 'peace'}
{'year': '1921', 'category': 'physics'}
{'year': '1920', 'category': 'chemistry'}
{'year': '1920', 'category': 'literature'}
{'year': '1920', 'category': 'medicine'}
{'year': '1920', 'category': 'peace'}
{'year': '1920', 'category': 'physics'}
{'year': '1919', 'category': 'literature'}
{'year': '1919', 'category': 'medicine'}
{'year': '1919', 'category': 'peace'}
{'year': '1919', 'category': 'physics'}
{'year': '1918', 'category': 'chemistry'}
{'year': '1918', 'category': 'physics'}
{'year': '1917', 'category': 'literature'}
{'year': '1917', 'category': 'peace'}
{'year': '1917', 'category': 'physics'}
{'year': '1916', 'category': 'literature'}
{'year': '1915', 'category': 'chemistry'}
{'year': '1915', 'category': 'literature'}
{'year': '1915', 'category': 'physics'}
{'year': '1914', 'category': 'chemistry'}
{'year': '1914', 'category': 'medicine'}
{'year': '1914', 'category': 'physics'}
{'year': '1913', 'category': 'chemistry'}
{'year': '1913', 'category': 'literature'}
{'year': '1913', 'category': 'medicine'}
{'year': '1913', 'category': 'peace'}
{'year': '1913', 'category': 'physics'}
{'year': '1912', 'category': 'chemistry'}
{'year': '1912', 'category': 'literature'}
{'year': '1912', 'category': 'medicine'}
{'year': '1912', 'category': 'peace'}
{'year': '1912', 'category': 'physics'}
{'year': '1911', 'category': 'chemistry'}
{'year': '1911', 'category': 'literature'}
{'year': '1911', 'category': 'medicine'}
{'year': '1911', 'category': 'peace'}
{'year': '1911', 'category': 'physics'}
{'year': '1910', 'category': 'chemistry'}
{'year': '1910', 'category': 'literature'}
{'year': '1910', 'category': 'medicine'}
{'year': '1910', 'category': 'peace'}
{'year': '1910', 'category': 'physics'}
{'year': '1909', 'category': 'chemistry'}
{'year': '1909', 'category': 'literature'}
{'year': '1909', 'category': 'medicine'}
{'year': '1909', 'category': 'peace'}
{'year': '1909', 'category': 'physics'}
{'year': '1908', 'category': 'chemistry'}
{'year': '1908', 'category': 'literature'}
{'year': '1908', 'category': 'medicine'}
{'year': '1908', 'category': 'peace'}
{'year': '1908', 'category': 'physics'}
{'year': '1907', 'category': 'chemistry'}
{'year': '1907', 'category': 'literature'}
{'year': '1907', 'category': 'medicine'}
{'year': '1907', 'category': 'peace'}
{'year': '1907', 'category': 'physics'}
{'year': '1906', 'category': 'chemistry'}
{'year': '1906', 'category': 'literature'}
{'year': '1906', 'category': 'medicine'}
{'year': '1906', 'category': 'peace'}
{'year': '1906', 'category': 'physics'}
{'year': '1905', 'category': 'chemistry'}
{'year': '1905', 'category': 'literature'}
{'year': '1905', 'category': 'medicine'}
{'year': '1905', 'category': 'peace'}
{'year': '1905', 'category': 'physics'}
{'year': '1904', 'category': 'chemistry'}
{'year': '1904', 'category': 'literature'}
{'year': '1904', 'category': 'medicine'}
{'year': '1904', 'category': 'peace'}
{'year': '1904', 'category': 'physics'}
{'year': '1903', 'category': 'chemistry'}
{'year': '1903', 'category': 'literature'}
{'year': '1903', 'category': 'medicine'}
{'year': '1903', 'category': 'peace'}
{'year': '1903', 'category': 'physics'}
{'year': '1902', 'category': 'chemistry'}
{'year': '1902', 'category': 'literature'}
{'year': '1902', 'category': 'medicine'}
{'year': '1902', 'category': 'peace'}
{'year': '1902', 'category': 'physics'}
{'year': '1901', 'category': 'chemistry'}
{'year': '1901', 'category': 'literature'}
{'year': '1901', 'category': 'medicine'}
{'year': '1901', 'category': 'peace'}
{'year': '1901', 'category': 'physics'}

5.3 Indexing
An index in MongoDB is a special data structure that stores the values of selected fields from the
documents of the collection on which the index is created, ordered by those values.
• Indexes speed up search operations: instead of scanning every document in the collection,
MongoDB searches the index, which holds only a few fields.
• On the other hand, too many indexes can slow down insert, update and delete operations,
because every write must also update the indexes, and the indexes take up additional space.
When to use an index
• First, when you expect to get only one or a few documents back. If your typical queries fetch
most if not all documents, you might as well scan the whole collection; making MongoDB maintain
an index is then a waste of time.
• Second, when you have very large documents or very large collections. Rather than loading
these into memory from disk, MongoDB can use the much smaller indexes.
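The trade-off above can be made concrete with a minimal pure-Python analogy. This is not MongoDB's actual implementation: a sorted list of (value, position) pairs stands in for an index on "category", and a binary search replaces the full scan. The five sample documents are taken from prize output shown in this document; the helper names find_by_category and scan_by_category are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

# A tiny stand-in collection (year and category only).
docs = [
    {"year": "1903", "category": "physics"},
    {"year": "1946", "category": "chemistry"},
    {"year": "1947", "category": "medicine"},
    {"year": "1958", "category": "medicine"},
    {"year": "1963", "category": "physics"},
]

# The "index": (field value, position in docs), kept sorted by value.
category_index = sorted((doc["category"], pos) for pos, doc in enumerate(docs))
keys = [key for key, _ in category_index]


def find_by_category(category):
    """Binary-search the index, then fetch only the matching documents."""
    lo = bisect_left(keys, category)
    hi = bisect_right(keys, category)
    return [docs[pos] for _, pos in category_index[lo:hi]]


def scan_by_category(category):
    """Full collection scan: look at every document."""
    return [doc for doc in docs if doc["category"] == category]


print(find_by_category("medicine"))
```

Both functions return the same documents, but the indexed lookup touches only the matching entries. The cost is that every insert, update or delete would also have to keep category_index sorted, which mirrors the write overhead described above.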
Index operations (mongo shell syntax)
• Creating an index
– An index model is a list of (field, direction) pairs, where direction is either 1 (ascending)
or -1 (descending).
– db.collection_name.createIndex({field_name: 1 or -1})
• Finding indexes in a collection
– db.collection_name.getIndexes()
• Dropping indexes
– db.collection_name.dropIndex({field_name: 1})
– db.collection_name.dropIndexes()
In PyMongo the corresponding collection methods are create_index(), index_information(),
drop_index() and drop_indexes().
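In PyMongo, a key specification is commonly written as a list of (field, direction) tuples rather than as a shell-style document. The sketch below converts one form into the other and checks the directions. Note that to_index_model is a hypothetical helper, not part of PyMongo, and the commented-out calls assume a collection object named prizes like the one used in this document.

```python
def to_index_model(spec):
    """Turn a shell-style key document such as {"category": 1, "year": -1}
    into a list of (field, direction) pairs."""
    model = []
    for field, direction in spec.items():
        if direction not in (1, -1):
            raise ValueError(f"direction for {field!r} must be 1 or -1")
        model.append((field, direction))
    return model


index_model = to_index_model({"category": 1, "year": -1})
print(index_model)

# With a live collection (assumed here to be called `prizes`):
# name = prizes.create_index(index_model)   # returns the name of the index
# prizes.index_information()                # describes the existing indexes
# prizes.drop_index(name)                   # drops that one index
```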

[20]: # Example of creating an index that speeds up finding prizes by category and
# then sorting results by decreasing year.

# Specify an index model for compound sorting.
index_model = [("category", 1), ("year", -1)]
prizes.create_index(index_model)

# Build a string report of the last single-laureate year for each distinct
# category, one category per line.
report = ""

# For each distinct prize category, find the latest-year prize (requiring a
# descending sort by year) of that category (so, find matches for that
# category) with a laureate share of "1".
for category in sorted(prizes.distinct("category")):
    doc = prizes.find_one(
        {"category": category, "laureates.share": "1"},
        sort=[("year", -1)]
    )
    report += "{category}: {year}\n".format(**doc)

print(report)

chemistry: 2011
economics: 2017
literature: 2017
medicine: 2016
peace: 2017
physics: 1992
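Why a compound index on (category ascending, year descending) suits this query can be sketched in plain Python: once entries are ordered that way, the first entry of each category is already that category's latest year. The sketch below is an analogy only. It ignores the "laureates.share" filter and uses five (category, year) pairs taken from prize output shown in this document.

```python
from itertools import groupby

# (category, year) pairs; "year" is stored as a string in this dataset.
entries = [
    ("physics", "1903"), ("chemistry", "1946"), ("medicine", "1947"),
    ("medicine", "1958"), ("physics", "1963"),
]

# Order like the index model [("category", 1), ("year", -1)].
# Python's sort is stable, so sort by the secondary key first.
ordered = sorted(entries, key=lambda e: e[1], reverse=True)  # year descending
ordered = sorted(ordered, key=lambda e: e[0])                # category ascending

# The first entry of each category run is that category's latest year.
latest = {
    category: next(run)[1]
    for category, run in groupby(ordered, key=lambda e: e[0])
}
print(latest)
```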

[4]: # Some countries are, for one or more laureates, both their country of birth
# ("bornCountry") and a country of affiliation for one or more of their prizes
# ("prizes.affiliations.country").
# Find the five countries of birth with the highest counts of such laureates.

from collections import Counter

# Create an index on country of birth to ensure efficient gathering of
# distinct values and counting of documents.
laureates.create_index([("bornCountry", 1)])

# Collect a count of laureates for each country of birth.
n_born_and_affiliated = {
    country: laureates.count_documents({
        "bornCountry": country,
        "prizes.affiliations.country": country
    })
    for country in laureates.distinct("bornCountry")
}

five_most_common = Counter(n_born_and_affiliated).most_common(5)
print(five_most_common)

[('USA', 241), ('United Kingdom', 56), ('France', 26), ('Germany', 19), ('Japan', 17)]

5.4 Limits
To limit the number of results in MongoDB, use the limit() method. It takes one parameter, a
number defining how many documents to return.
Besides limiting the number of results, we can also skip results server-side. Using the "skip"
parameter together with a limit gives pagination: the limit sets the number of results per page,
and the skip sets how many results to pass over before the page starts.
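The arithmetic behind skip-and-limit pagination can be sketched without a database. In the example below, page_window and paginate are hypothetical helper names, pages are assumed to be numbered from 1, and a list of year strings stands in for query results.

```python
def page_window(page, per_page):
    """Return (skip, limit) for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return (page - 1) * per_page, per_page


def paginate(results, page, per_page):
    """Apply skip and limit to an in-memory list, as a cursor would."""
    skip, limit = page_window(page, per_page)
    return results[skip:skip + limit]


years = [str(y) for y in range(1901, 1911)]  # ten stand-in results
print(paginate(years, page=2, per_page=3))   # ['1904', '1905', '1906']

# With PyMongo the same two numbers would go to the cursor, for example:
# skip, limit = page_window(2, 3)
# prizes.find(filter_).sort("year").skip(skip).limit(limit)
```

Sorting before paginating matters: without a defined order, consecutive pages are not guaranteed to be consistent with one another.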

[7]: # Find the first five prizes with one or more laureates sharing 1/4 of the prize.
# Project the prize category, year, and laureates' motivations.

from pprint import pprint

# Save to filter_ the filter document to fetch only prizes with one or more
# quarter-share laureates, i.e. with a "laureates.share" of "4".
filter_ = {'laureates.share': '4'}

# Save to projection the list of field names so that prize category, year and
# laureates' motivations ("laureates.motivation") may be fetched for inspection.
projection = ['category', 'year', 'laureates.motivation']

# Save to cursor a cursor that will yield prizes, sorted by ascending year.
# Limit this to five prizes, and sort using the most concise specification.
cursor = prizes.find(filter_, projection).sort("year").limit(5)
pprint(list(cursor))

[{'_id': ObjectId('5fcf861e166123acf44a9fc0'),
'category': 'physics',
'laureates': [{'motivation': '"in recognition of the extraordinary services '
'he has rendered by his discovery of '
'spontaneous radioactivity"'},
{'motivation': '"in recognition of the extraordinary services '
'they have rendered by their joint researches '
'on the radiation phenomena discovered by '
'Professor Henri Becquerel"'},
{'motivation': '"in recognition of the extraordinary services '
'they have rendered by their joint researches '
'on the radiation phenomena discovered by '
'Professor Henri Becquerel"'}],
'year': '1903'},
{'_id': ObjectId('5fcf861e166123acf44a9f67'),
'category': 'chemistry',
'laureates': [{'motivation': '"for his discovery that enzymes can be '
'crystallized"'},
{'motivation': '"for their preparation of enzymes and virus '
'proteins in a pure form"'},
{'motivation': '"for their preparation of enzymes and virus '
'proteins in a pure form"'}],
'year': '1946'},
{'_id': ObjectId('5fcf861e166123acf44a9f40'),
'category': 'medicine',
'laureates': [{'motivation': '"for their discovery of the course of the '
'catalytic conversion of glycogen"'},
{'motivation': '"for their discovery of the course of the '
'catalytic conversion of glycogen"'},
{'motivation': '"for his discovery of the part played by the '
'hormone of the anterior pituitary lobe in the '
'metabolism of sugar"'}],
'year': '1947'},
{'_id': ObjectId('5fcf861e166123acf44a9f21'),
'category': 'medicine',
'laureates': [{'motivation': '"for their discovery that genes act by '
'regulating definite chemical events"'},
{'motivation': '"for their discovery that genes act by '
'regulating definite chemical events"'},
{'motivation': '"for his discoveries concerning genetic '
'recombination and the organization of the '
'genetic material of bacteria"'}],
'year': '1958'},
{'_id': ObjectId('5fcf861e166123acf44a9f01'),
'category': 'physics',
'laureates': [{'motivation': '"for his contributions to the theory of the '
'atomic nucleus and the elementary particles, '
'particularly through the discovery and '
'application of fundamental symmetry '
'principles"'},
{'motivation': '"for their discoveries concerning nuclear '
'shell structure"'},
{'motivation': '"for their discoveries concerning nuclear '
'shell structure"'}],
'year': '1963'}]

6 Aggregation pipeline
The aggregation pipeline is a framework for data aggregation modeled on the concept of data
processing pipelines. Documents enter a multi-stage pipeline that transforms them into
aggregated results. Aggregation pipelines can be constructed for flexible and powerful analyses.
The MongoDB aggregation pipeline consists of a list (sequence) of stages. Each stage transforms
the documents as they pass through the pipeline.
db.collection.aggregate([stage_1,stage_2, …])
Common pipeline stages include:
• $project – selects and reshapes fields
• $match – filters documents
• $group – aggregates documents by a key
• $sort – sorts documents
• $skip – skips a number of documents
• $limit – limits the number of documents
• $unwind – outputs one document per element of an array field
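To illustrate how documents flow through stages one after another, here is a toy executor in pure Python. It is only a sketch under strong assumptions: it supports equality-only $match, inclusion-only $project, single-key $sort, $skip and $limit, and it bears no relation to how MongoDB evaluates pipelines internally. The function name run_pipeline and the sample list prizes_sample are hypothetical.

```python
def run_pipeline(docs, pipeline):
    """Apply a small subset of stages to a list of dicts, one stage at a time."""
    for stage in pipeline:
        (name, arg), = stage.items()          # each stage is a one-key dict
        if name == "$match":                  # equality conditions only
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in arg.items())]
        elif name == "$project":              # keep only the listed fields
            docs = [{k: d[k] for k in arg if k in d} for d in docs]
        elif name == "$sort":                 # single key, direction 1 or -1
            (key, direction), = arg.items()
            docs = sorted(docs, key=lambda d: d[key],
                          reverse=(direction == -1))
        elif name == "$skip":
            docs = docs[arg:]
        elif name == "$limit":
            docs = docs[:arg]
        else:
            raise NotImplementedError(name)
    return docs


prizes_sample = [
    {"year": "1903", "category": "physics"},
    {"year": "1946", "category": "chemistry"},
    {"year": "1947", "category": "medicine"},
    {"year": "1958", "category": "medicine"},
    {"year": "1963", "category": "physics"},
]

result = run_pipeline(prizes_sample, [
    {"$match": {"category": "medicine"}},
    {"$project": {"year": 1}},
    {"$sort": {"year": -1}},
    {"$limit": 1},
])
print(result)  # [{'year': '1958'}]
```

Each stage receives the output of the previous one, so the order of stages matters: limiting before sorting, for instance, would give a different answer.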

[9]: # Create an aggregation pipeline that yields birth-country and
# prize-affiliation-country information for three non-organization laureates.

# Translate cursor to aggregation pipeline:
# the find collection method's "filter" parameter maps to the "$match" aggregation stage,
# its "projection" parameter maps to the "$project" stage, and
# the "limit" parameter (or cursor method) maps to the "$limit" stage.
pipeline = [
    {"$match": {"gender": {"$ne": "org"}}},
    {"$project": {"bornCountry": 1, "prizes.affiliations.country": 1}},
    {"$limit": 3}
]

for doc in laureates.aggregate(pipeline):
    print("{bornCountry}: {prizes}".format(**doc))

the Netherlands: [{'affiliations': [{'country': 'the Netherlands'}]}]
USA: [{'affiliations': [{'country': 'USA'}]}]
USA: [{'affiliations': [{'country': 'USA'}]}]
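The mapping from find() arguments to pipeline stages described in the cell above can be written down as a small function. This is a sketch only: find_to_pipeline is a hypothetical helper, not a PyMongo function, and it covers just the filter, projection, skip and limit arguments.

```python
def find_to_pipeline(filter_=None, projection=None, skip=0, limit=0):
    """Build an aggregation pipeline roughly equivalent to a find() call."""
    pipeline = []
    if filter_:
        pipeline.append({"$match": filter_})
    if projection:
        # find() also accepts a list of field names; $project needs a document.
        if not isinstance(projection, dict):
            projection = {field: 1 for field in projection}
        pipeline.append({"$project": projection})
    if skip:
        pipeline.append({"$skip": skip})
    if limit:
        pipeline.append({"$limit": limit})
    return pipeline


pipeline = find_to_pipeline(
    filter_={"gender": {"$ne": "org"}},
    projection=["bornCountry", "prizes.affiliations.country"],
    limit=3,
)
print(pipeline)
```

The result is the same list of stages as the hand-written pipeline in the cell above, and could be passed to laureates.aggregate() in the same way.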

[12]: # Construct an aggregation pipeline to collect, in reverse chronological order
# (i.e. descending year), prize documents for all original categories
# (that is, $in categories awarded in 1901).
# Project only the prize year and category (including document _id is fine).
# The aggregation cursor will be fed to Python's itertools.groupby function to
# group prizes by year.
# For each year that at least one of the original prize categories was missing,
# a line with all missing categories for that year will be printed.

from collections import OrderedDict
from itertools import groupby
from operator import itemgetter

original_categories = set(prizes.distinct("category", {"year": "1901"}))

# Save a pipeline to collect original-category prizes.
pipeline = [
    {"$match": {"category": {"$in": list(original_categories)}}},
    {"$project": {"category": 1, "year": 1}},
    {"$sort": OrderedDict([("year", -1)])}
]
cursor = prizes.aggregate(pipeline)
for key, group in groupby(cursor, key=itemgetter("year")):
    missing = original_categories - {doc["category"] for doc in group}
    if missing:
        print("{year}: {missing}".format(year=key, missing=", ".join(sorted(missing))))

2018: literature
1972: peace
1967: peace
1966: peace
1956: peace
1955: peace
1948: peace
1943: literature, peace
1939: peace
1935: literature
1934: physics
1933: chemistry
1932: peace
1931: physics
1928: peace
1925: medicine
1924: chemistry, peace
1923: peace
1921: medicine
1919: chemistry
1918: literature, medicine, peace
1917: chemistry, medicine
1916: chemistry, medicine, peace, physics
1915: medicine, peace
1914: literature, peace
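The $sort stage in the pipeline above is not optional: itertools.groupby only merges adjacent items with equal keys. The sketch below shows this on a small list that stands in for the aggregation cursor; the rows for 1918 and 1917 are taken from the prize listing earlier in this document, and missing_by_year is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

original_categories = {"chemistry", "literature", "medicine", "peace", "physics"}

# Stand-in for the sorted aggregation cursor.
rows = [
    {"year": "1918", "category": "chemistry"},
    {"year": "1918", "category": "physics"},
    {"year": "1917", "category": "literature"},
    {"year": "1917", "category": "peace"},
    {"year": "1917", "category": "physics"},
]


def missing_by_year(rows):
    """Map each year to its sorted list of missing original categories."""
    report = {}
    for year, group in groupby(rows, key=itemgetter("year")):
        missing = original_categories - {doc["category"] for doc in group}
        if missing:
            report[year] = sorted(missing)
    return report


print(missing_by_year(rows))

# If equal years are not adjacent, groupby yields one group per run.
shuffled = [rows[0], rows[2], rows[1]]
n_runs = len(list(groupby(shuffled, key=itemgetter("year"))))
print(n_runs)  # 3 runs for only 2 distinct years
```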

[14]: # Fill out the pipeline to determine the number of prizes awarded (at least
# partly) to organizations.
# To do this, first $match on the "gender" that designates organizations.
# Then, use a field path to project the number of prizes for each organization
# as the "$size" of the "prizes" array.
# Recall that to specify the value of a field "<my_field>", you use the field
# path "$<my_field>".
# Finally, use a single group {"_id": None} to sum over the values of all
# organizations' prize counts.

pipeline = [
    {"$match": {"gender": "org"}},
    {"$project": {"n_prizes": {"$size": "$prizes"}}},
    {"$group": {"_id": None, "n_prizes_total": {"$sum": "$n_prizes"}}}
]

print(list(laureates.aggregate(pipeline)))

[{'_id': None, 'n_prizes_total': 27}]
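The three stages of that pipeline correspond to three familiar Python steps: filter, measure, sum. The sketch below mirrors them on made-up records. The list laureates_sample and its placeholder values are assumptions for illustration and are not taken from the real collection.

```python
# Made-up records: two organizations and one person.
laureates_sample = [
    {"gender": "org", "prizes": [{"year": "y1"}, {"year": "y2"}]},
    {"gender": "org", "prizes": [{"year": "y3"}]},
    {"gender": "female", "prizes": [{"year": "y4"}]},
]

# $match: keep only organizations.
matched = [d for d in laureates_sample if d["gender"] == "org"]

# $project with $size: the length of each "prizes" array.
projected = [{"n_prizes": len(d["prizes"])} for d in matched]

# $group with _id None and $sum: a single total over all documents.
result = [{"_id": None,
           "n_prizes_total": sum(d["n_prizes"] for d in projected)}]
print(result)  # [{'_id': None, 'n_prizes_total': 3}]
```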

[17]: # Use an aggregation pipeline that:
# 1. Filters for original prize categories (i.e. sans economics),
# 2. Projects category and year,
# 3. Groups distinct prize categories awarded by year,
# 4. Projects prize categories not awarded by year,
# 5. Filters for years with missing prize categories, and
# 6. Returns a cursor of documents in reverse chronological order, one per year,
#    each with a list of missing prize categories for that year.

from collections import OrderedDict

original_categories = sorted(set(prizes.distinct("category", {"year": "1901"})))

pipeline = [
    {"$match": {"category": {"$in": original_categories}}},
    {"$project": {"category": 1, "year": 1}},

    # Collect the set of category values for each prize year:
    # the $group stage outputs one document per prize year ("_id" is set to the
    # field path for year) with the set of categories awarded that year.
    {"$group": {"_id": "$year", "categories": {"$addToSet": "$category"}}},

    # Project categories *not* awarded (i.e., that are missing this year).
    # Given the intermediate collection of year-keyed documents, $project a field
    # named "missing" with the (original) categories not awarded that year.
    # Again, mind your field paths!
    {"$project": {"missing": {"$setDifference": [original_categories, "$categories"]}}},

    # Only pass through documents with at least one missing prize category.
    {"$match": {"missing.0": {"$exists": True}}},

    # Sort in reverse chronological order. Note that "_id" is a distinct year
    # at this stage.
    {"$sort": OrderedDict([("_id", -1)])},
]
for doc in prizes.aggregate(pipeline):
    print("{year}: {missing}".format(year=doc["_id"], missing=", ".join(sorted(doc["missing"]))))

2018: literature
1972: peace
1967: peace
1966: peace
1956: peace
1955: peace
1948: peace
1943: literature, peace
1939: peace
1935: literature
1934: physics
1933: chemistry
1932: peace
1931: physics
1928: peace
1925: medicine
1924: chemistry, peace
1923: peace
1921: medicine
1919: chemistry
1918: literature, medicine, peace
1917: chemistry, medicine
1916: chemistry, medicine, peace, physics
1915: medicine, peace
1914: literature, peace
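The server-side stages used in this pipeline have close Python counterparts: $addToSet behaves like adding to a set, $setDifference like set subtraction, and the match on "missing.0" like keeping only non-empty lists. The sketch below is an in-memory analogy, using rows for 1918, 1917 and 1913 from the prize listing earlier in this document.

```python
from collections import defaultdict

original_categories = ["chemistry", "literature", "medicine", "peace", "physics"]

rows = [
    {"year": "1918", "category": "chemistry"},
    {"year": "1918", "category": "physics"},
    {"year": "1917", "category": "literature"},
    {"year": "1917", "category": "peace"},
    {"year": "1917", "category": "physics"},
] + [{"year": "1913", "category": c} for c in original_categories]

# $group with $addToSet: one set of awarded categories per year.
categories_by_year = defaultdict(set)
for row in rows:
    categories_by_year[row["year"]].add(row["category"])

# $project with $setDifference: original categories not awarded that year.
missing_by_year = {
    year: sorted(set(original_categories) - awarded)
    for year, awarded in categories_by_year.items()
}

# $match on "missing.0": keep years whose list has at least one element.
missing_by_year = {year: m for year, m in missing_by_year.items() if m}

# $sort by _id descending: reverse chronological order.
for year in sorted(missing_by_year, reverse=True):
    print(f"{year}: {', '.join(missing_by_year[year])}")
```

Compared with the earlier itertools.groupby version, this approach does not depend on the input order, just as $group does not require a preceding $sort.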

[18]: # Build an aggregation pipeline to get the count of laureates who either did or
# did not win a prize with an affiliation country that is a substring of their
# country of birth. For example, the prize affiliation country "Germany" should
# match the country of birth "Prussia (now Germany)".

key_ac = "prizes.affiliations.country"
key_bc = "bornCountry"
pipeline = [
    {"$project": {key_bc: 1, key_ac: 1}},

    # Use $unwind stages to ensure a single prize affiliation country per
    # pipeline document.
    {"$unwind": "$prizes"},
    {"$unwind": "$prizes.affiliations"},

    # Filter out prize-affiliation-country values that are "empty" (null, not
    # present, etc.) by ensuring values are "$in" the list of known values.
    {"$match": {key_ac: {"$in": laureates.distinct(key_ac)}}},

    # Note: $indexOfBytes takes [string, substring], so as written this
    # expression looks for the birth country inside the affiliation country.
    {"$project": {"affilCountrySameAsBorn": {
        "$gte": [{"$indexOfBytes": ["$"+key_ac, "$"+key_bc]}, 0]}}},

    # Produce a count of documents for each value of "affilCountrySameAsBorn"
    # (True or False) by adding 1 to the running sum.
    {"$group": {"_id": "$affilCountrySameAsBorn",
                "count": {"$sum": 1}}},
]
for doc in laureates.aggregate(pipeline):
    print(doc)

{'_id': False, 'count': 261}
{'_id': True, 'count': 477}
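What the two $unwind stages do can be sketched in pure Python: each one replaces an array field with a single element and emits one copy of the document per element. The unwind function below is a simplified, hypothetical stand-in that assumes every parent along a dotted path is already a plain dict. The sample laureate is made up, and str.find plays the role of a substring test that returns -1 when nothing is found.

```python
import copy


def unwind(docs, path):
    """Yield one copy of each document per element of the array at `path`."""
    keys = path.split(".")
    for doc in docs:
        parent = doc
        for key in keys[:-1]:
            parent = parent.get(key, {})
        for element in parent.get(keys[-1]) or []:
            new_doc = copy.deepcopy(doc)
            target = new_doc
            for key in keys[:-1]:
                target = target[key]
            target[keys[-1]] = element   # replace the array with one element
            yield new_doc


# A made-up laureate with two prizes and three affiliations in total.
sample = [{
    "bornCountry": "Prussia (now Germany)",
    "prizes": [
        {"affiliations": [{"country": "Germany"}, {"country": "USA"}]},
        {"affiliations": [{"country": "Germany"}]},
    ],
}]

flat = list(unwind(unwind(sample, "prizes"), "prizes.affiliations"))

# The stated intent: is the affiliation country a substring of the birth country?
flags = [
    doc["bornCountry"].find(doc["prizes"]["affiliations"]["country"]) >= 0
    for doc in flat
]
print(len(flat), flags)
```

After both unwinds, every document carries exactly one affiliation country, which is what makes a per-document comparison possible.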

[21]: # Some prize categories have laureates hailing from a greater number of
# countries than do other categories. Build an aggregation pipeline for the
# prizes collection to collect these numbers, using a $lookup stage to obtain
# laureate countries of birth.

pipeline = [
    # $unwind the laureates array field to output one pipeline document for
    # each array element.
    {"$unwind": "$laureates"},

    # Pull in laureate bios with a $lookup stage.
    {"$lookup": {
        "from": "laureates", "foreignField": "id",
        "localField": "laureates.id", "as": "laureate_bios"}},

    # Unwind the new laureate_bios array field (each laureate has only a
    # single biography document).
    {"$unwind": "$laureate_bios"},
    {"$project": {"category": 1,
                  "bornCountry": "$laureate_bios.bornCountry"}},

    # Collect the set of bornCountries associated with each prize category.
    {"$group": {"_id": "$category",
                "bornCountries": {"$addToSet": "$bornCountry"}}},

    # Project out the size of each category's (set of) bornCountries.
    {"$project": {"category": 1,
                  "nBornCountries": {"$size": "$bornCountries"}}},
    {"$sort": {"nBornCountries": -1}},
]

for doc in prizes.aggregate(pipeline):
    print(doc)

{'_id': 'literature', 'nBornCountries': 55}
{'_id': 'peace', 'nBornCountries': 50}
{'_id': 'chemistry', 'nBornCountries': 48}
{'_id': 'physics', 'nBornCountries': 44}
{'_id': 'medicine', 'nBornCountries': 44}
{'_id': 'economics', 'nBornCountries': 21}
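The $lookup stage acts like a join between two collections on a shared key. A minimal in-memory analogy is sketched below with made-up sample data; prizes_sample, laureates_sample and the placeholder country names are assumptions for illustration. A dictionary keyed on "id" plays the role of the foreign collection.

```python
from collections import defaultdict

# Made-up stand-ins for the two collections.
prizes_sample = [
    {"category": "physics", "laureates": [{"id": "1"}, {"id": "2"}]},
    {"category": "chemistry", "laureates": [{"id": "2"}]},
]
laureates_sample = [
    {"id": "1", "bornCountry": "Country A"},
    {"id": "2", "bornCountry": "Country B"},
]

# Prepare the foreign side of the join: bios grouped by "id".
bios_by_id = defaultdict(list)
for bio in laureates_sample:
    bios_by_id[bio["id"]].append(bio)

born_countries = defaultdict(set)
for prize in prizes_sample:
    for laureate in prize["laureates"]:                    # $unwind "$laureates"
        for bio in bios_by_id.get(laureate["id"], []):     # $lookup, then $unwind
            # $group with $addToSet, keyed on category
            born_countries[prize["category"]].add(bio["bornCountry"])

# $project with $size, then $sort by nBornCountries descending.
result = sorted(
    ({"_id": category, "nBornCountries": len(countries)}
     for category, countries in born_countries.items()),
    key=lambda d: d["nBornCountries"],
    reverse=True,
)
print(result)
```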
