Pandas Plots
Pandas also provides options for visualizing data. Here are some examples:
Syntax:
df.plot(x = X, y = y, kind = '<plot type>')
X = value on the X axis
y = value on the y axis
A bar plot to visualize mean acceleration in different years.
df.groupby('model_year')[['acceleration']].mean().plot(kind = 'bar');
A histogram to visualize the distribution of the number of cylinders.
df['cylinders'].plot(kind = 'hist')
A scatter plot to visualize the relationship between weight and mpg.
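The scatter plot described above can be drawn with the same df.plot() call, passing the two column names as x and y. The sketch below is self-contained: the weight and mpg values are hypothetical, and it selects matplotlib's non-interactive "Agg" backend so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical values for the weight and mpg columns mentioned above.
df = pd.DataFrame({
    "weight": [3504, 3693, 2372, 2130, 1985],
    "mpg": [18.0, 15.0, 24.0, 27.0, 31.0],
})

# x and y name the columns drawn on each axis; pandas uses them as axis labels.
ax = df.plot(kind="scatter", x="weight", y="mpg")
print(ax.get_xlabel(), ax.get_ylabel())
```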
A bar plot to visualize the mean acceleration for each number of cylinders, sorted by that mean.
df.groupby('cylinders')[['acceleration']].mean().sort_values('acceleration').plot(kind = 'bar')
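Selecting the acceleration column before calling .mean() matters when the dataframe also holds text columns: recent pandas versions raise an error when asked to average strings. The sketch below uses a small hypothetical dataset that includes a made-up "name" text column to show this. It checks that sort_values() orders the bars from the lowest to the highest mean.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical rows; "name" is an illustrative text column.
df = pd.DataFrame({
    "cylinders": [8, 8, 4, 4, 4, 6],
    "acceleration": [12.0, 11.5, 15.0, 14.5, 16.4, 14.0],
    "name": ["a", "b", "c", "d", "e", "f"],
})

# Select the column first so that only acceleration is averaged.
means = (df.groupby("cylinders")[["acceleration"]]
           .mean()
           .sort_values("acceleration"))
print(means)

ax = means.plot(kind="bar")  # one bar per cylinder count, in sorted order
```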
Pandas Exercise
Problem Statement:
This exercise uses a rainfall dataset that contains region-wise (district-wise) rainfall across India.
import pandas as pd

df = pd.DataFrame([[0.23,'f1'],[5.36,'f2']],
                  index = list('pq'),
                  columns = list('ab'))
Do the following:
4. Display the rows in which any element matches any element of the given list:
lst = ['f30','f50','f2','f0']
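One possible approach to task 4 combines DataFrame.isin() with any(axis = 1). This is a sketch, not necessarily the intended solution. isin() marks every cell whose value occurs in the list. any(axis = 1) then keeps a row if at least one of its cells was marked.

```python
import pandas as pd

df = pd.DataFrame([[0.23, 'f1'], [5.36, 'f2']],
                  index=list('pq'),
                  columns=list('ab'))
lst = ['f30', 'f50', 'f2', 'f0']

# True for each cell whose value appears in lst.
mask = df.isin(lst)

# Keep the rows where at least one cell matched.
matches = df[mask.any(axis=1)]
print(matches)
```

Only row q survives, because 'f2' is the only value of df that appears in lst.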
Merging
There are a few more commonly used operations in Pandas:
Merging
Reshaping
Pivot Tables
Grouping
Let us understand their importance through some illustrations, starting with merging datasets.
Suppose we are given two datasets from an experiment, each holding its own features. Our task is to form a single dataset that combines all the features of each observation. To do so, we can use the concat() function along the column axis (axis = 1).
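import pandas as pd

data1 = pd.DataFrame([[15, 12, -3], [33, 54, 21], [10, 32, 22]],
                     columns = list('ABC'))
data2 = pd.DataFrame([[10, 1.00, 3], [33, -54.00, 2], [10, 0.32, 2]],
                     columns = list('DEF'))

print(data1)
#     A   B   C
# 0  15  12  -3
# 1  33  54  21
# 2  10  32  22

print(data2)
#     D      E  F
# 0  10   1.00  3
# 1  33 -54.00  2
# 2  10   0.32  2

# axis = 1 places the columns of data2 next to the columns of data1
print(pd.concat([data1, data2], axis = 1))
#     A   B   C   D      E  F
# 0  15  12  -3  10   1.00  3
# 1  33  54  21  33 -54.00  2
# 2  10  32  22  10   0.32  2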
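With axis = 1, concat() matches rows by their index labels, not by their position. This is what keeps each observation's features together. The sketch below uses two small hypothetical tables whose indexes only partly overlap. The observation that is missing from the second table gets NaN instead of being shifted.

```python
import pandas as pd

# Hypothetical feature tables; the index labels identify the observation.
left = pd.DataFrame({"A": [15, 33, 10]}, index=[0, 1, 2])
right = pd.DataFrame({"D": [10, 33]}, index=[1, 2])

# Rows are aligned on the index; observation 0 has no D value, so it gets NaN.
combined = pd.concat([left, right], axis=1)
print(combined)
```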
Suppose an individual recorded observations of the same 3 features on two occasions and now wants to combine all these samples into a single dataset. We can do this with the same concat() function, this time along the row axis (axis = 0, the default).
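import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.random.randn(9).reshape(3,3),
                     columns = list('ABC'))
data2 = pd.DataFrame(np.arange(9).reshape(3,3),
                     columns = list('ABC'))

print(data1)
# a 3 x 3 table of random values with columns A, B, C

print(data2)
#    A  B  C
# 0  0  1  2
# 1  3  4  5
# 2  6  7  8

# axis = 0 (the default) stacks the rows of data2 below the rows of data1
print(pd.concat([data1, data2], axis = 0))
# 6 rows with columns A, B, C; the index labels 0, 1, 2 appear twice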
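Stacking keeps each input's original row labels, so the result contains the labels 0, 1, 2 twice. If those labels carry no meaning, concat() can renumber the rows through its ignore_index parameter, as in this sketch:

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.random.randn(9).reshape(3, 3), columns=list('ABC'))
data2 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'))

# ignore_index=True discards the original row labels and numbers
# the combined rows 0..5 instead of repeating 0, 1, 2.
stacked = pd.concat([data1, data2], axis=0, ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3, 4, 5]
```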
Pivot Tables
Reshaping a dataset is a good starting point for a quick, text-based view of its contents. Closely related to reshaping are pivot tables, which are often more effective for summarizing the data at a glance.
To understand pivot tables, we take the same dataframe as in the previous example and add a new feature, 'Score'.
import pandas as pd

df = pd.DataFrame([
    ...
    'Game', 'Score'],
    ...

print(df)
Pivot tables come in handy when we need to condense a dataset with many features into fewer features for quick visualization. Examples include finding which medal both IND and USA have won, or listing the game(s) in which India won Silver.
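The sketch below shows both ideas on a small hypothetical table. The rows and the "Medal" column name are illustrative assumptions; they do not come from the dataset above. pivot_table() builds one row per country and one column per medal, with each cell holding the summed Score. A plain boolean filter answers the second question.

```python
import pandas as pd

# Hypothetical medal data; the rows and the "Medal" column are illustrative only.
df = pd.DataFrame([["IND", "Gold", "Hockey", 9],
                   ["IND", "Silver", "Shooting", 7],
                   ["USA", "Gold", "Swimming", 10],
                   ["USA", "Silver", "Hockey", 6],
                   ["USA", "Gold", "Shooting", 8]],
                  columns=["Country", "Medal", "Game", "Score"])

# One row per country, one column per medal, cells hold the summed Score.
table = df.pivot_table(index="Country", columns="Medal",
                       values="Score", aggfunc="sum")
print(table)

# Games in which IND won Silver.
silver_games = df.loc[(df["Country"] == "IND") & (df["Medal"] == "Silver"),
                      "Game"].tolist()
print(silver_games)
```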
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ...

print(df)
...
# Country
Grouping
To understand grouping, a concept similar to GROUP BY in databases, suppose we are given a dataset of laptop and desktop sales. The same category can appear in several observations, each with a different sales price. To calculate the total sales of each category, we can group the rows by category and apply an aggregate function such as sum().
To do so, we create a small dummy dataset and sum the sales of each category.
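import pandas as pd

df = pd.DataFrame([["Laptop", 1000],
                   ["Laptop", 2520],
                   ["Desktop", 3000]],
                  columns = ['Category', 'Sales'])

print(df)
#   Category  Sales
# 0   Laptop   1000
# 1   Laptop   2520
# 2  Desktop   3000

# group the rows that share a Category and add up their Sales
print(df.groupby('Category').sum())
#           Sales
# Category
# Desktop    3000
# Laptop     3520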
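sum() is only one choice of function. A group can be summarized by several aggregates at once by passing a list of function names to agg(). The sketch below uses the same kind of dummy data. It adds one extra hypothetical Desktop sale so that both groups have more than one row.

```python
import pandas as pd

# Dummy sales rows; the second Desktop sale is a hypothetical addition.
df = pd.DataFrame([["Laptop", 1000],
                   ["Laptop", 2520],
                   ["Desktop", 3000],
                   ["Desktop", 2000]],
                  columns=["Category", "Sales"])

# One row per category, one column per aggregate function.
summary = df.groupby("Category")["Sales"].agg(["sum", "mean", "count"])
print(summary)
```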
Problem Statement:
Given a dataframe df with three attributes: set_name (system name), spd_per_day (the day on which the speed was recorded, d1 or d2) and speed (network speed in MBps):
import pandas as pd

sys = ['s1','s1','s1','s1',
       's2','s2','s2','s2']
net_day = ['d1','d1','d2','d2',
           'd1','d1','d2','d2']
spd = [...]   # eight speed values, one per row
df = pd.DataFrame({'set_name':sys,
                   'spd_per_day':net_day,
                   'speed':spd})
Do the following:
1. Construct a dataframe new_df in which the dataset is grouped by system (s1 and s2) and by day (d1 and d2), holding the median speed of each system on each day. Also, give the speed attribute the secondary (second-level) column name 'Median'.
2. Sort new_df in ascending order of the median speed.
Hints/Answers:
1. Answer:
              Median
0  s1  d1       6.35
1  s1  d2       8.95
2  s2  d1       3.65
3  s2  d2      14.40

2. Answer:
              Median
2  s2  d1       3.65
0  s1  d1       6.35
1  s1  d2       8.95
3  s2  d2      14.40
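One way to produce answers of this shape is sketched below. The original speed values are not given here, so the sketch uses hypothetical speeds and prints different medians. It also renames the system list to sys_names, a hypothetical name that avoids shadowing Python's sys module. Passing a list of functions to agg() creates a two-level column ('speed', 'median'), and rename() turns its second level into 'Median'. reset_index() then moves the group keys back into ordinary columns.

```python
import pandas as pd

sys_names = ['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2']
net_day = ['d1', 'd1', 'd2', 'd2', 'd1', 'd1', 'd2', 'd2']
spd = [6, 8, 9, 11, 3, 5, 14, 16]  # hypothetical speeds

df = pd.DataFrame({'set_name': sys_names,
                   'spd_per_day': net_day,
                   'speed': spd})

# 1. Median speed per (system, day), with 'Median' as the second-level name.
new_df = (df.groupby(['set_name', 'spd_per_day'])
            .agg({'speed': ['median']})
            .rename(columns={'median': 'Median'})
            .reset_index())

# 2. Sort ascending by the two-level median column.
new_df = new_df.sort_values(by=[('speed', 'Median')])
print(new_df)
```

As in the expected answer, sorting keeps the original row labels, so the index reads 2, 0, 1, 3.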