Nested Data Structures in Python
In fact, the items in a list can be any type of Python object. For example, we can have a list of lists.
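The listing that the following discussion refers to is not reproduced here. A minimal sketch consistent with the description is given below; the contents of the first inner list are an assumption, while the other values come from the text.

```python
nested1 = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']]  # ['a', 'b', 'c'] is assumed
print(nested1[0])        # the first item, which is itself a list
print(len(nested1))      # the number of top-level items
nested1.append(['i'])    # appends a one-element list, not the string 'i'
print(nested1)
```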
Line 2 prints out the first item from the list that nested1 is bound to. That item is itself a list,
so it prints out with square brackets. It has length 3, which prints out on line 3. Line 4 adds a new
item to nested1. It is a list with one element, 'i' (it is a list with one element, not just the string 'i').
When you get to step 4 of the execution, take a look at the object that variable nested1
points to. It is a list of three items, numbered 0, 1, and 2. The item in slot 1 is small enough that it is
shown right there as a list containing items “d” and “e”. The item in slot 0 didn’t quite fit, so it is
shown in the figure as a pointer to another separate list; same thing for the item in slot 2, the list
['f', 'g', 'h'].
There’s no special meaning to whether the list is shown embedded or with a pointer to it: that’s just
CodeLens making the best use of space that it can. In fact, if you go on to step 5, you’ll see that,
with the addition of a fourth item, the list ['i'], CodeLens has chosen to show all four lists embedded
in the top-level list.
With a nested list, you can make complex expressions to get or set a value in a sub-list.
Lines 1-4 above probably look pretty natural to you. Line 5 illustrates the left to right processing of
expressions. nested1[1] evaluates to the second inner list, so nested1[1][1] evaluates to its second
element, 'e'. Line 6 is just a reminder that you index into a literal list, one that is written out, the
same way as you can index into a list referred to by a variable. [10, 20, 30] creates a list. [1]
indexes into that list, pulling out the second item, 20.
Just as with a function call where the return value can be thought of as replacing the text of the
function call in an expression, you can evaluate an expression like that in line 7 from left to right.
Because the value of nested1[1] is the list ['d', 'e'], nested1[1][0] is the same as ['d',
'e'][0]. So line 7 is equivalent to lines 2 and 4; it is a simpler way of pulling out the first item from
the second list.
At first, expressions like that on line 7 may look foreign. They will soon feel more natural, and you
will end up using them a lot. Once you are comfortable with them, the only time you will write code
like lines 2-4 is when you aren’t quite sure what your data’s structure is, and so you need to
incrementally write and debug your code. Often, you will start by writing code like lines 2-4, then,
once you’re sure it’s working, replace it with something like line 7.
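The seven-line listing discussed above is not reproduced here. A sketch that is consistent with the line references might look like the following; the contents of the first inner list are an assumption, and blank lines are left out so that the line numbers match the discussion.

```python
nested1 = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']]
y = nested1[1]
print(y)
print(y[0])
print(nested1[1][1])
print([10, 20, 30][1])
print(nested1[1][0])
```

Lines 2 and 4 take two steps (bind the inner list to y, then index it), while line 7 reaches the same value with one chained expression.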
def square(x):
    return x*x
L = [square, abs, lambda x: x+1]
print("****names****")
for f in L:
    print(f)
print("****call each of them****")
for f in L:
    print(f(-2))
print("****just the first one in the list****")
print(L[0])
print(L[0](3))
Here, L is a list with three items. All those items are functions. The first is the function square
that is defined on lines 1 and 2. The second is the built-in Python function abs. The third is an
anonymous function that returns one more than its input.
In the first for loop, we do not call the functions; we just output their printed representations.
The output <function square> confirms that square truly is a function object. Our online
environment is not able to produce a nice printed representation of the built-in function abs,
so it just outputs <unknown>.
In the second for loop, we call each of the functions, passing in the value -2 each time and
printing whatever value the function returns.
The last two lines just emphasize that there’s nothing special about lists of functions. They
follow all the same rules for how Python treats any other list. Because L[0] picks out the function
square, L[0](3) calls the function square, passing it the argument 3.
Nested Dictionaries
Just as lists can contain items of any type, the value associated with a key in a dictionary can
also be an object of any type. In particular, it is often useful to have a list or a dictionary as a value in
a dictionary. And of course, those lists or dictionaries can also contain lists and dictionaries. There
can be many layers of nesting.
Only the values in dictionaries can be objects of arbitrary type. The keys in dictionaries must
be one of the immutable data types (numbers, strings, tuples).
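As a small illustration of these two points, the sketch below builds a dictionary whose values are a list and another dictionary, and then tries a list as a key. All names and values here are hypothetical and chosen only for the example.

```python
# Hypothetical data, chosen only for illustration.
info = {
    'scores': [90, 85],                 # a list as a value
    'address': {'city': 'Ann Arbor'},   # a dictionary as a value
}
info['address']['zip'] = '48109'        # set a value one level down
first_score = info['scores'][0]         # get a value one level down

points = {(0, 0): 'origin'}             # a tuple of numbers can be a key

try:
    bad = {['a', 'b']: 1}               # a list is mutable, so it cannot be a key
except TypeError as err:
    message = str(err)

print(first_score, info['address'], points[(0, 0)])
print(message)
```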
Python provides a module for working with data in JSON format. The module is called json. We will be using two
functions in this module, loads and dumps.
json.loads() takes a string as input and produces a Python object (a dictionary or a list) as output.
Consider, for example, some data that we might get from Apple’s iTunes, in the JSON format:
import json
a_string = '\n\n\n{\n "resultCount":25,\n "results": [\n{"wrapperType":"track", "kind":"podcast", "collectionId":10892}]}'
print(a_string)
d = json.loads(a_string)
print("------")
print(type(d))
print(d.keys())
print(d['resultCount'])
# print(a_string['resultCount'])
The other function we will use is dumps. It does the inverse of loads. It takes a Python object,
typically a dictionary or a list, and returns a string in JSON format. It has a few other parameters.
Two useful parameters are sort_keys and indent. When the value True is passed for the sort_keys
parameter, the keys of dictionaries are output in alphabetical order with their values. When an
integer is passed for the indent parameter, the output is spread over multiple lines, with that
many spaces of indentation for each level of nesting.
import json
def pretty(obj):
    return json.dumps(obj, sort_keys=True, indent=2)
d = {'key1': {'c': True, 'a': 90, '5': 50}, 'key2':{'b': 3, 'c': "yes"}}
print(d)
print('--------')
print(pretty(d))
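To make the "inverse" relationship between dumps and loads concrete, here is a short round-trip sketch. The dictionary is a made-up example; only json.dumps and json.loads from the standard library are used.

```python
import json

d = {'key1': {'c': True, 'a': 90}, 'key2': [1, 2, 3]}   # made-up example data

as_text = json.dumps(d, sort_keys=True)   # Python object -> JSON string
restored = json.loads(as_text)            # JSON string -> Python object

print(type(as_text))
print(as_text)
print(restored == d)
```

Note that the JSON text spells the Python value True as true; loads converts it back.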
Nested Iteration
When you have nested data structures, especially lists and/or dictionaries, you will frequently need
nested for loops to traverse them.
Line 3 executes once for each top-level list, three times in all. With each sub-list, line 5 executes
once for each item in the sub-list. Try stepping through it in CodeLens to make sure you understand
what the nested iteration does.
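The listing these line numbers refer to is not reproduced here. A sketch consistent with the description could be the following, where the list contents and the printed labels are assumptions.

```python
nested1 = [['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']]
for x in nested1:
    print("level1: ")
    for y in x:
        print("     level2: " + y)
```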
Structuring Nested Data
When constructing your own nested data, it is a good idea to keep the structure consistent across
each level. For example, if you have a list of dictionaries, then each dictionary should have the same
structure, meaning the same keys and the same type of value associated with a particular key in all
the dictionaries. The reason is that any deviation in the structure will require
extra code to handle those special cases. The more the structure deviates, the more special cases
you will have to handle.
For example, let’s reconsider this nested iteration, but suppose not all the items in the outer list are
lists.
We can solve this with special casing, a conditional that checks the type.
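A sketch of that special casing is shown below, assuming a hypothetical outer list nested2 in which the first two items are plain integers. Without the type check, the inner loop would fail on those items with a TypeError, because an integer is not iterable. The seen list is added only so the result is easy to inspect.

```python
nested2 = [1, 2, ['a', 'b', 'c'], ['d', 'e'], ['f', 'g', 'h']]  # hypothetical data
seen = []
for x in nested2:
    print("level1: ")
    if isinstance(x, list):
        # Only lists get the inner loop.
        for y in x:
            print("     level2: {}".format(y))
            seen.append(y)
    else:
        # Anything else is handled as a single value.
        print(x)
        seen.append(x)
```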
Assuming that you don’t want aliased lists inside your copy of a nested list, you’ll need to
perform nested iteration to copy each inner list.
Or, equivalently, you could take advantage of the slice operator to do the copying of the inner list.
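Both approaches can be sketched as follows, using a small made-up two-level list. The variable names are illustrative only.

```python
original = [['dogs', 'puppies'], ['cats', 'kittens']]   # made-up two-level list

# Copy by nested iteration: build a new inner list for each inner list.
copied_by_loop = []
for inner_list in original:
    new_inner = []
    for item in inner_list:
        new_inner.append(item)
    copied_by_loop.append(new_inner)

# Copy with the slice operator: inner_list[:] makes a new list with the same items.
copied_by_slice = []
for inner_list in original:
    copied_by_slice.append(inner_list[:])

original[0].append('canines')   # change an inner list of the original
print(original)
print(copied_by_loop)
print(copied_by_slice)
```

Because each inner list was copied, neither copy sees the item appended to the original.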
The process above works fine when there are only two levels in a nested list. However, if
we want to make a copy of a nested list that has more than two levels, then we recommend using
the copy module. The copy module has a function called deepcopy that will take care of the
operation for you.
import copy
original = [['canines', ['dogs', 'puppies']], ['felines', ['cats', 'kittens']]]
shallow_copy_version = original[:]
deeply_copied_version = copy.deepcopy(original)
original.append("Hi there")
original[0].append(["marsupials"])
print("-------- Original -----------")
print(original)
print("-------- deep copy -----------")
print(deeply_copied_version)
print("-------- shallow copy -----------")
print(shallow_copy_version)
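One way to see why the two copies behave differently is to compare object identity with the is operator. This is a supplementary sketch reusing the same data, not part of the listing above.

```python
import copy

original = [['canines', ['dogs', 'puppies']], ['felines', ['cats', 'kittens']]]
shallow_copy_version = original[:]
deeply_copied_version = copy.deepcopy(original)

# The shallow copy is a new outer list whose slots alias the same inner lists.
outer_is_shared = shallow_copy_version is original
inner_is_shared = shallow_copy_version[0] is original[0]

# The deep copy shares no list objects with the original at any level.
deep_inner_is_shared = deeply_copied_version[0] is original[0]
deep_innermost_is_shared = deeply_copied_version[0][1] is original[0][1]

print(outer_is_shared, inner_is_shared)
print(deep_inner_is_shared, deep_innermost_is_shared)
```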
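Follow the system described below and you will have success with extracting nested data. The
process involves the following steps:
1. Understand the nesting of the current object.
2. Extract one object at the next level down.
3. Repeat the process with the extracted object.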
To illustrate this, we will walk through extracting information from data formatted the way it is
returned by the Twitter API. This nested dictionary results from querying Twitter, asking for three tweets
matching “University of Michigan”. As you’ll see, it’s quite a daunting data structure, even when
printed with nice indentation as it’s shown below.
res = {
"search_metadata": {
"count": 3,
"completed_in": 0.015,
"max_id_str": "536624519285583872",
"since_id_str": "0",
"next_results":
"?max_id=536623674942439424&q=University%20of%20Michigan&count=3&include_entiti
es=1",
"refresh_url":
"?since_id=536624519285583872&q=University%20of%20Michigan&include_entities=1",
"since_id": 0,
"query": "University+of+Michigan",
"max_id": 536624519285583872
},
"statuses": [
{
"contributors": None,
"truncated": False,
"text": "RT @mikeweber25: I'm decommiting from the university of Michigan
thank you Michigan for the love and support I'll remake my decision at
the\u2026",
"in_reply_to_status_id": None,
"id": 536624519285583872,
"favorite_count": 0,
"source": "<a href=\"http://twitter.com/download/iphone\"
rel=\"nofollow\">Twitter for iPhone</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [
{
"id": 1119996684,
"indices": [
3,
15
],
"id_str": "1119996684",
"screen_name": "mikeweber25",
"name": "Mikey"
}
],
"hashtags": [],
"urls": []
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 2014,
"id_str": "536624519285583872",
"favorited": False,
"retweeted_status": {
"contributors": None,
"truncated": False,
"text": "I'm decommiting from the university of Michigan thank you
Michigan for the love and support I'll remake my decision at the army bowl",
"in_reply_to_status_id": None,
"id": 536300265616322560,
"favorite_count": 1583,
"source": "<a href=\"http://twitter.com/download/iphone\"
rel=\"nofollow\">Twitter for iPhone</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [],
"hashtags": [],
"urls": []
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 2014,
"id_str": "536300265616322560",
"favorited": False,
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "666666",
"default_profile_image": False,
"id": 1119996684,
"profile_background_image_url_https":
"https://abs.twimg.com/images/themes/theme9/bg.gif",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/534465900343083008/A09dIq1d_normal.jpeg",
"profile_sidebar_fill_color": "252429",
"entities": {
"description": {
"urls": []
}
},
"followers_count": 5444,
"profile_sidebar_border_color": "FFFFFF",
"id_str": "1119996684",
"profile_background_color": "C0DEED",
"listed_count": 36,
"is_translation_enabled": False,
"utc_offset": None,
"statuses_count": 6525,
"description": "Mike Weber (U.S Army All American) DETROIT CTSENIOR
State Champion",
"friends_count": 693,
"location": "",
"profile_link_color": "0084B4",
"profile_image_url":
"http://pbs.twimg.com/profile_images/534465900343083008/A09dIq1d_normal.jpeg",
"following": False,
"geo_enabled": False,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/1119996684/1416261575",
"profile_background_image_url":
"http://abs.twimg.com/images/themes/theme9/bg.gif",
"name": "Mikey",
"lang": "en",
"profile_background_tile": False,
"favourites_count": 1401,
"screen_name": "mikeweber25",
"notifications": False,
"url": None,
"created_at": "Fri Jan 25 18:45:53 +0000 2013",
"contributors_enabled": False,
"time_zone": None,
"protected": False,
"default_profile": False,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"lang": "en",
"created_at": "Sat Nov 22 23:28:41 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
},
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "333333",
"default_profile_image": False,
"id": 2435537208,
"profile_background_image_url_https":
"https://abs.twimg.com/images/themes/theme1/bg.png",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/532694075947110400/oZEP5XNQ_normal.jpeg",
"profile_sidebar_fill_color": "DDEEF6",
"entities": {
"description": {
"urls": []
}
},
"followers_count": 161,
"profile_sidebar_border_color": "C0DEED",
"id_str": "2435537208",
"profile_background_color": "C0DEED",
"listed_count": 0,
"is_translation_enabled": False,
"utc_offset": None,
"statuses_count": 524,
"description": "Delasalle '17 Baseball & Football.",
"friends_count": 255,
"location": "",
"profile_link_color": "0084B4",
"profile_image_url":
"http://pbs.twimg.com/profile_images/532694075947110400/oZEP5XNQ_normal.jpeg",
"following": False,
"geo_enabled": False,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/2435537208/1406779364",
"profile_background_image_url":
"http://abs.twimg.com/images/themes/theme1/bg.png",
"name": "Andrew Brooks",
"lang": "en",
"profile_background_tile": False,
"favourites_count": 555,
"screen_name": "31brooks_",
"notifications": False,
"url": None,
"created_at": "Wed Apr 09 14:34:41 +0000 2014",
"contributors_enabled": False,
"time_zone": None,
"protected": False,
"default_profile": True,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"lang": "en",
"created_at": "Sun Nov 23 20:57:10 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
},
{
"contributors": None,
"truncated": False,
"text": "RT @Plantedd: The University of Michigan moved a big Bur Oak
yesterday. 65ft tall. 350+ tons. http://t.co/v2Y6vl3f9e",
"in_reply_to_status_id": None,
"id": 536624216305848320,
"favorite_count": 0,
"source": "<a href=\"http://tapbots.com/tweetbot\"
rel=\"nofollow\">Tweetbot for i\u039fS</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [
{
"id": 462890283,
"indices": [
3,
12
],
"id_str": "462890283",
"screen_name": "Plantedd",
"name": "David Wong"
}
],
"hashtags": [],
"urls": [],
"media": [
{
"source_status_id_str": "526276522374889472",
"expanded_url":
"http://twitter.com/Plantedd/status/526276522374889472/photo/1",
"display_url": "pic.twitter.com/v2Y6vl3f9e",
"url": "http://t.co/v2Y6vl3f9e",
"media_url_https":
"https://pbs.twimg.com/media/B021tLsIYAADq21.jpg",
"source_status_id": 526276522374889472,
"id_str": "526276519308845056",
"sizes": {
"small": {
"h": 191,
"resize": "fit",
"w": 340
},
"large": {
"h": 576,
"resize": "fit",
"w": 1024
},
"medium": {
"h": 337,
"resize": "fit",
"w": 600
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"indices": [
94,
116
],
"type": "photo",
"id": 526276519308845056,
"media_url": "http://pbs.twimg.com/media/B021tLsIYAADq21.jpg"
}
]
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 27,
"id_str": "536624216305848320",
"favorited": False,
"retweeted_status": {
"contributors": None,
"truncated": False,
"text": "The University of Michigan moved a big Bur Oak yesterday. 65ft
tall. 350+ tons. http://t.co/v2Y6vl3f9e",
"in_reply_to_status_id": None,
"id": 526276522374889472,
"favorite_count": 25,
"source": "<a href=\"http://twitter.com/download/iphone\"
rel=\"nofollow\">Twitter for iPhone</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [],
"hashtags": [],
"urls": [],
"media": [
{
"expanded_url":
"http://twitter.com/Plantedd/status/526276522374889472/photo/1",
"display_url": "pic.twitter.com/v2Y6vl3f9e",
"url": "http://t.co/v2Y6vl3f9e",
"media_url_https":
"https://pbs.twimg.com/media/B021tLsIYAADq21.jpg",
"id_str": "526276519308845056",
"sizes": {
"small": {
"h": 191,
"resize": "fit",
"w": 340
},
"large": {
"h": 576,
"resize": "fit",
"w": 1024
},
"medium": {
"h": 337,
"resize": "fit",
"w": 600
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"indices": [
80,
102
],
"type": "photo",
"id": 526276519308845056,
"media_url": "http://pbs.twimg.com/media/B021tLsIYAADq21.jpg"
}
]
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 27,
"id_str": "526276522374889472",
"favorited": False,
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "333333",
"default_profile_image": False,
"id": 462890283,
"profile_background_image_url_https":
"https://abs.twimg.com/images/themes/theme1/bg.png",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/1791926707/Plantedd_Logo__square__normal.
jpg",
"profile_sidebar_fill_color": "DDEEF6",
"entities": {
"url": {
"urls": [
{
"url": "http://t.co/ZOnsCHvoKt",
"indices": [
0,
22
],
"expanded_url": "http://www.plantedd.com",
"display_url": "plantedd.com"
}
]
},
"description": {
"urls": []
}
},
"followers_count": 2598,
"profile_sidebar_border_color": "C0DEED",
"id_str": "462890283",
"profile_background_color": "C0DEED",
"listed_count": 61,
"is_translation_enabled": False,
"utc_offset": 0,
"statuses_count": 8157,
"description": "Hello, I'm the supervillain behind Plantedd. We're an
online market for plant lovers plotting to take over the world by making it
simple to find and buy plants.",
"friends_count": 2664,
"location": "UK",
"profile_link_color": "0084B4",
"profile_image_url":
"http://pbs.twimg.com/profile_images/1791926707/Plantedd_Logo__square__normal.j
pg",
"following": False,
"geo_enabled": False,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/462890283/1398254314",
"profile_background_image_url":
"http://abs.twimg.com/images/themes/theme1/bg.png",
"name": "David Wong",
"lang": "en",
"profile_background_tile": False,
"favourites_count": 371,
"screen_name": "Plantedd",
"notifications": False,
"url": "http://t.co/ZOnsCHvoKt",
"created_at": "Fri Jan 13 13:46:46 +0000 2012",
"contributors_enabled": False,
"time_zone": "Edinburgh",
"protected": False,
"default_profile": True,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"possibly_sensitive": False,
"lang": "en",
"created_at": "Sun Oct 26 07:37:55 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
},
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "2A48AE",
"default_profile_image": False,
"id": 104940733,
"profile_background_image_url_https":
"https://abs.twimg.com/images/themes/theme17/bg.gif",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/2878477539/78e20432088b5ee2addc9ce3362fd4
61_normal.jpeg",
"profile_sidebar_fill_color": "6378B1",
"entities": {
"description": {
"urls": []
}
},
"followers_count": 149,
"profile_sidebar_border_color": "FBD0C9",
"id_str": "104940733",
"profile_background_color": "0C003D",
"listed_count": 18,
"is_translation_enabled": False,
"utc_offset": 0,
"statuses_count": 16031,
"description": "Have you any dreams you'd like to sell?",
"friends_count": 248,
"location": "",
"profile_link_color": "0F1B7C",
"profile_image_url":
"http://pbs.twimg.com/profile_images/2878477539/78e20432088b5ee2addc9ce3362fd46
1_normal.jpeg",
"following": False,
"geo_enabled": False,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/104940733/1410032966",
"profile_background_image_url":
"http://abs.twimg.com/images/themes/theme17/bg.gif",
"name": "Heather",
"lang": "en",
"profile_background_tile": False,
"favourites_count": 777,
"screen_name": "froyoho",
"notifications": False,
"url": None,
"created_at": "Thu Jan 14 21:37:54 +0000 2010",
"contributors_enabled": False,
"time_zone": "London",
"protected": False,
"default_profile": False,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"possibly_sensitive": False,
"lang": "en",
"created_at": "Sun Nov 23 20:55:57 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
},
{
"contributors": None,
"truncated": False,
"text": "RT @NotableHistory: Madonna, 18 year old freshman at the
University of Michigan, 1976 http://t.co/x2dm1G67ea",
"in_reply_to_status_id": None,
"id": 536623674942439425,
"favorite_count": 0,
"source": "<a href=\"http://twitter.com/download/android\"
rel=\"nofollow\">Twitter for Android</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [
{
"id": 844766941,
"indices": [
3,
18
],
"id_str": "844766941",
"screen_name": "NotableHistory",
"name": "OnThisDay & Facts"
}
],
"hashtags": [],
"urls": [],
"media": [
{
"source_status_id_str": "536610190334779392",
"expanded_url":
"http://twitter.com/NotableHistory/status/536610190334779392/photo/1",
"display_url": "pic.twitter.com/x2dm1G67ea",
"url": "http://t.co/x2dm1G67ea",
"media_url_https":
"https://pbs.twimg.com/media/B3EXbQkCMAEipwM.jpg",
"source_status_id": 536610190334779392,
"id_str": "536235587703812097",
"sizes": {
"small": {
"h": 487,
"resize": "fit",
"w": 340
},
"large": {
"h": 918,
"resize": "fit",
"w": 640
},
"medium": {
"h": 860,
"resize": "fit",
"w": 600
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"indices": [
86,
108
],
"type": "photo",
"id": 536235587703812097,
"media_url": "http://pbs.twimg.com/media/B3EXbQkCMAEipwM.jpg"
}
]
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 9,
"id_str": "536623674942439425",
"favorited": False,
"retweeted_status": {
"contributors": None,
"truncated": False,
"text": "Madonna, 18 year old freshman at the University of Michigan,
1976 http://t.co/x2dm1G67ea",
"in_reply_to_status_id": None,
"id": 536610190334779392,
"favorite_count": 13,
"source": "<a href=\"https://ads.twitter.com\" rel=\"nofollow\">Twitter
Ads</a>",
"retweeted": False,
"coordinates": None,
"entities": {
"symbols": [],
"user_mentions": [],
"hashtags": [],
"urls": [],
"media": [
{
"expanded_url":
"http://twitter.com/NotableHistory/status/536610190334779392/photo/1",
"display_url": "pic.twitter.com/x2dm1G67ea",
"url": "http://t.co/x2dm1G67ea",
"media_url_https":
"https://pbs.twimg.com/media/B3EXbQkCMAEipwM.jpg",
"id_str": "536235587703812097",
"sizes": {
"small": {
"h": 487,
"resize": "fit",
"w": 340
},
"large": {
"h": 918,
"resize": "fit",
"w": 640
},
"medium": {
"h": 860,
"resize": "fit",
"w": 600
},
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
}
},
"indices": [
66,
88
],
"type": "photo",
"id": 536235587703812097,
"media_url": "http://pbs.twimg.com/media/B3EXbQkCMAEipwM.jpg"
}
]
},
"in_reply_to_screen_name": None,
"in_reply_to_user_id": None,
"retweet_count": 9,
"id_str": "536610190334779392",
"favorited": False,
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "333333",
"default_profile_image": False,
"id": 844766941,
"profile_background_image_url_https":
"https://pbs.twimg.com/profile_background_images/458461302696837121/rGlGdWsc.pn
g",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/481243404320251905/gCr1cVP2_normal.png",
"profile_sidebar_fill_color": "DDFFCC",
"entities": {
"url": {
"urls": [
{
"url": "http://t.co/9fTPk5A4wh",
"indices": [
0,
22
],
"expanded_url": "http://notablefacts.com/",
"display_url": "notablefacts.com"
}
]
},
"description": {
"urls": []
}
},
"followers_count": 73817,
"profile_sidebar_border_color": "FFFFFF",
"id_str": "844766941",
"profile_background_color": "9AE4E8",
"listed_count": 485,
"is_translation_enabled": False,
"utc_offset": -21600,
"statuses_count": 38841,
"description": "On This Day in History, Historical Pictures & other
Interesting [email protected]",
"friends_count": 43594,
"location": "",
"profile_link_color": "0084B4",
"profile_image_url":
"http://pbs.twimg.com/profile_images/481243404320251905/gCr1cVP2_normal.png",
"following": False,
"geo_enabled": False,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/844766941/1411076349",
"profile_background_image_url":
"http://pbs.twimg.com/profile_background_images/458461302696837121/rGlGdWsc.png
",
"name": "OnThisDay & Facts",
"lang": "en",
"profile_background_tile": True,
"favourites_count": 1383,
"screen_name": "NotableHistory",
"notifications": False,
"url": "http://t.co/9fTPk5A4wh",
"created_at": "Tue Sep 25 03:08:59 +0000 2012",
"contributors_enabled": False,
"time_zone": "Central Time (US & Canada)",
"protected": False,
"default_profile": False,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"possibly_sensitive": False,
"lang": "en",
"created_at": "Sun Nov 23 20:00:13 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
},
"user": {
"follow_request_sent": False,
"profile_use_background_image": True,
"profile_text_color": "333333",
"default_profile_image": False,
"id": 818185729,
"profile_background_image_url_https":
"https://abs.twimg.com/images/themes/theme1/bg.png",
"verified": False,
"profile_location": None,
"profile_image_url_https":
"https://pbs.twimg.com/profile_images/486215801498640384/rz9o7LnF_normal.jpeg",
"profile_sidebar_fill_color": "DDEEF6",
"entities": {
"description": {
"urls": []
}
},
"followers_count": 302,
"profile_sidebar_border_color": "C0DEED",
"id_str": "818185729",
"profile_background_color": "C0DEED",
"listed_count": 0,
"is_translation_enabled": False,
"utc_offset": None,
"statuses_count": 395,
"description": "Formerly with California Dept of General Services, now
freelancing around the Sacramento area...",
"friends_count": 1521,
"location": "Citrus Heights, CA",
"profile_link_color": "0084B4",
"profile_image_url":
"http://pbs.twimg.com/profile_images/486215801498640384/rz9o7LnF_normal.jpeg",
"following": False,
"geo_enabled": True,
"profile_banner_url":
"https://pbs.twimg.com/profile_banners/818185729/1383764759",
"profile_background_image_url":
"http://abs.twimg.com/images/themes/theme1/bg.png",
"name": "M Duncan",
"lang": "en",
"profile_background_tile": False,
"favourites_count": 6544,
"screen_name": "MDuncan95814",
"notifications": False,
"url": None,
"created_at": "Tue Sep 11 21:02:09 +0000 2012",
"contributors_enabled": False,
"time_zone": None,
"protected": False,
"default_profile": True,
"is_translator": False
},
"geo": None,
"in_reply_to_user_id_str": None,
"possibly_sensitive": False,
"lang": "en",
"created_at": "Sun Nov 23 20:53:48 +0000 2014",
"in_reply_to_status_id_str": None,
"place": None,
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
}
}
]
}
Understand
At any level of the extraction process, the first task is to make sure you understand the current
object you have extracted. There are a few options here.
1. Print the entire object. If it’s small enough, you may be able to make sense of the
printout directly. If it’s a little bit larger, you may find it helpful to “pretty-print” it, with
indentation showing the level of nesting of the data. We don’t have a way to pretty-print
in our online browser-based environment, but if you’re running code with a full Python
interpreter, you can use the dumps function in the json module. For example:
import json
print(json.dumps(res, indent=2))
2. If printing the entire object gives you something that’s too unwieldy, you have other
options for making sense of it.
○ Copy and paste it to a site like https://jsoneditoronline.org/ which will let you
explore and collapse levels.
○ Print the type of the object.
○ If it’s a dictionary:
■ print the keys
○ If it’s a list:
■ print its length
■ print the type of the first item
■ print the first item if it’s of manageable size
import json
print(json.dumps(res, indent=2)[:100])
print("-----------")
print(type(res))
print(res.keys())
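The checklist above can be wrapped in a small helper. The function name describe and the sample data are hypothetical, introduced only for this sketch; it uses nothing beyond built-in type checks.

```python
def describe(obj):
    """Return a one-line summary of a nested object (hypothetical helper)."""
    if isinstance(obj, dict):
        # For a dictionary, the keys are the most useful thing to see.
        return "dict with keys {}".format(list(obj.keys()))
    if isinstance(obj, list):
        # For a list, report its length and the type of its first item.
        first = type(obj[0]).__name__ if obj else "nothing"
        return "list of length {}, first item is {}".format(len(obj), first)
    # Anything else is small enough to show directly.
    return "{}: {!r}".format(type(obj).__name__, obj)

# A tiny stand-in with the same top-level shape as res.
sample = {"statuses": [{"text": "hi"}], "search_metadata": {"count": 3}}
print(describe(sample))
print(describe(sample["statuses"]))
print(describe(sample["search_metadata"]["count"]))
```

Calling describe on the object at each level gives a quick answer to "what am I holding?" before deciding how to extract from it.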
Extract
In the extraction phase, you will be diving one level deeper into the nested data.
1. If it’s a dictionary, figure out which key has the value you’re looking for, and get its value. For
example: res2 = res['statuses']
2. If it’s a list, you will typically want to do something with each of the items (e.g.,
extracting something from each and accumulating the results in a list). For that you’ll want a for
loop, such as for res2 in res. During your exploration phase, however, it will be easier to
debug things if you work with just one item. One trick for doing that is to iterate over a slice of
the list containing just one item. For example, for res2 in res[:1].
print(type(res))
print(res.keys())
res2 = res['statuses']
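The one-item slice trick from step 2 can be tried on a small stand-in list. The data and variable names below are hypothetical.

```python
tweets = [{"text": "first"}, {"text": "second"}, {"text": "third"}]  # hypothetical stand-in data

texts = []
for tweet in tweets[:1]:      # the slice holds only the first item
    texts.append(tweet["text"])
print(texts)

empty = []
print(empty[:1])              # slicing an empty list gives an empty list, not an error
```

One reason to prefer the slice over indexing with [0] during exploration is that the loop body stays the same when you later drop the [:1] to process every item.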
Repeat
Now you’ll repeat the Understand and Extract processes at the next level.
Level 2
First understand.
print(type(res))
print(res.keys())
res2 = res['statuses']
print("----Level 2-----")
print(type(res2)) # it's a list!
print(len(res2))
It’s a list, with three items, so it’s a good guess that each item represents one tweet.
Now extract. Since it’s a list, we’ll want to work with each item, but to keep things manageable for
now, let’s use the trick for just looking at the first item. Later we’ll switch to processing all the items.
import json
print(type(res))
print(res.keys())
res2 = res['statuses']
print("----Level 2: a list of tweets-----")
print(type(res2)) # it's a list!
print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    print(json.dumps(res3, indent=2)[:30])
Level 3
First understand.
import json
print(type(res))
print(res.keys())
res2 = res['statuses']
print("----Level 2: a list of tweets-----")
print(type(res2)) # it's a list!
print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    print(json.dumps(res3, indent=2)[:30])
    print(type(res3)) # it's a dictionary
    print(res3.keys())
Then extract. Let’s pull out the information about who sent each of the tweets. Probably that’s the
value associated with the ‘user’ key.
import json
print(type(res))
print(res.keys())
res2 = res['statuses']
print("----Level 2: a list of tweets-----")
print(type(res2)) # it's a list!
print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    print(json.dumps(res3, indent=2)[:30])
    res4 = res3['user']
Now repeat.
Level 4 - Understand
import json
print(type(res))
print(res.keys())
res2 = res['statuses']
print("----Level 2: a list of tweets-----")
print(type(res2)) # it's a list!
print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    print(json.dumps(res3, indent=2)[:30])
    res4 = res3['user']
    print("----Level 4: the user who wrote the tweet----")
    print(type(res4)) # it's a dictionary
    print(res4.keys())
Extract. Let’s print out the user’s screen name and when their account was created.
import json
# print(type(res))
# print(res.keys())
res2 = res['statuses']
# print("----Level 2: a list of tweets-----")
# print(type(res2)) # it's a list!
# print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2[:1]:
    print("----Level 3: a tweet----")
    # print(json.dumps(res3, indent=2)[:30])
    res4 = res3['user']
    print("----Level 4: the user who wrote the tweet----")
    # print(type(res4)) # it's a dictionary
    # print(res4.keys())
    print(res4['screen_name'], res4['created_at'])
Now, we may want to go back and have it extract for all the items rather than only the first item in
res2.
import json
# print(type(res))
# print(res.keys())
res2 = res['statuses']
#print("----Level 2: a list of tweets-----")
#print(type(res2)) # it's a list!
#print(len(res2)) # looks like one item representing each of the three tweets
for res3 in res2:
    #print("----Level 3: a tweet----")
    #print(json.dumps(res3, indent=2)[:30])
    res4 = res3['user']
    #print("----Level 4: the user who wrote the tweet----")
    #print(type(res4)) # it's a dictionary
    #print(res4.keys())
    print(res4['screen_name'], res4['created_at'])
Reflections
Notice that each time we descend a level in a dictionary, we have a [] picking out a key. Each time
we look inside a list, we will have a for loop. If there are lists at multiple levels, we will have nested
for loops.
Once you’ve figured out how to extract everything you want, you may choose to collapse things with
multiple extractions in a single expression. For example, we could have this shorter version.
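The shorter listing is not reproduced here. A sketch of what such a collapsed version could look like follows. To keep it self-contained, res is rebound to a trimmed-down stand-in that keeps only the two fields being extracted (the values are copied from the data above); with the full dictionary, the loop would be unchanged.

```python
# Trimmed-down stand-in for res: only the fields this loop touches.
res = {"statuses": [
    {"user": {"screen_name": "31brooks_", "created_at": "Wed Apr 09 14:34:41 +0000 2014"}},
    {"user": {"screen_name": "froyoho", "created_at": "Thu Jan 14 21:37:54 +0000 2010"}},
    {"user": {"screen_name": "MDuncan95814", "created_at": "Tue Sep 11 21:02:09 +0000 2012"}},
]}

for res3 in res['statuses']:
    print(res3['user']['screen_name'], res3['user']['created_at'])
```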
Even with this compact code, we can still count off how many levels of nesting we have extracted
from, in this case four. res['statuses'] says we have descended one level (in a dictionary). for res3
in… says we have descended another level (in a list). ['user'] is descending one more level, and
['screen_name'] is descending one more level.
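When lists appear at more than one level, the same counting applies, with one for loop per list. As a sketch, the following collects the screen names of the users mentioned in each tweet. The stand-in below keeps only the path statuses, entities, user_mentions, screen_name, with values copied from the data above; the variable names are illustrative.

```python
# Trimmed-down stand-in: only the path needed for this extraction.
res = {"statuses": [
    {"entities": {"user_mentions": [{"screen_name": "mikeweber25"}]}},
    {"entities": {"user_mentions": [{"screen_name": "Plantedd"}]}},
    {"entities": {"user_mentions": [{"screen_name": "NotableHistory"}]}},
]}

mentioned = []
for tweet in res['statuses']:                            # list of tweets: first loop
    for mention in tweet['entities']['user_mentions']:   # list of mentions: nested loop
        mentioned.append(mention['screen_name'])
print(mentioned)
```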