Applications of ICT
Python Libraries
Python – Lambda
A lambda function is a small anonymous function.
A lambda function can take any number of arguments, but can only have
one expression.
lambda arguments : expression
x = lambda a : a + 10
print(x(5))
Output: 15
Sum arguments a, b, and c and return the result:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))
Output: 13
Lambda functions can take any number of arguments.
Example
Multiply argument a by argument b and return the result:
x = lambda a, b : a * b
print(x(5, 6))
Output: 30
Why Use Lambda Functions?
The power of lambda is better shown when you use it as
an anonymous function inside another function.
Say you have a function definition that takes one argument,
and that argument will be multiplied by an unknown
number:
def myfunc(n):
    return lambda a : a * n
Use that function definition to make a function that always doubles the
number you send in:
def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
print(mydoubler(11))
Output: 22
Or, use the same function definition to make both functions in the
same program:
def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))
print(mytripler(11))
Output:
22
33
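Another common use, sketched here as an illustration, is passing a lambda directly as an argument to a built-in function such as sorted() or map(). The student names and scores are made-up sample data and are not part of the original material.

```python
# Hypothetical sample data: (name, score) pairs.
students = [("Ali", 72), ("Sara", 91), ("Omar", 85)]

# sorted() calls the lambda once per item to obtain the sort key (the score).
by_score = sorted(students, key=lambda s: s[1], reverse=True)
print(by_score)

# map() applies the lambda to every element of the list.
doubled = list(map(lambda n: n * 2, [1, 2, 3]))
print(doubled)
```

In both calls the lambda is never given a name, which is the sense in which it is "anonymous".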
NumPy
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domains of linear
algebra, Fourier transforms, and matrices.
• In Python we have lists that serve the purpose of arrays,
but they are slow to process.
• The array object in NumPy is called ndarray. It provides
a lot of supporting functions that make working with
ndarray very easy.
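As a rough illustration of those supporting functions, the sketch below compares a plain list with an ndarray. The price values are hypothetical; the point is only that arithmetic on an ndarray applies to every element without an explicit loop.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]  # hypothetical sample values

# With a list, arithmetic on every element needs a loop or comprehension.
list_result = [p * 1.5 for p in prices]

# With an ndarray, the same operation is applied element by element.
arr = np.array(prices)
arr_result = arr * 1.5

print(list_result)
print(arr_result)
print(arr.sum(), arr.mean())  # built-in helpers on the ndarray
```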
Installation of NumPy
pip install numpy
Once NumPy is installed, import it in your applications by
adding the import keyword:
import numpy
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output: [1 2 3 4 5]
Create a NumPy ndarray Object
NumPy is used to work with arrays. The array object in
NumPy is called ndarray.
We can create a NumPy ndarray object by using the array()
function.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
To create an ndarray, we can pass a list, tuple or any array-like
object into the array() function, and it will be converted
into an ndarray:
import numpy as np
# using a tuple
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Output: [1 2 3 4 5]
Dimensions in Arrays
A dimension in arrays is one level of array depth (nested
arrays).
• 0-D arrays, or scalars, are the elements in an array.
Each value in an array is a 0-D array.
import numpy as np
arr = np.array(42)
print(arr)
print(type(arr))
Output:
42
<class 'numpy.ndarray'>
1-D Arrays
An array that has 0-D arrays as its elements is called a
uni-dimensional or 1-D array.
These are the most common and basic arrays.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
2-D Arrays
• An array that has 1-D arrays as its elements is
called a 2-D array.
• These are often used to represent matrices or 2nd
order tensors.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output:
[[1 2 3]
 [4 5 6]]
Check number of dimensions
NumPy arrays provide the ndim attribute, which returns the number of
dimensions of the array as an integer.
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
Output:
0
1
2
3
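Alongside ndim, the shape attribute reports how many elements each dimension holds. The sketch below is an added illustration; it redefines the 3-D array d so that the block is self-contained.

```python
import numpy as np

d = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])

print(d.ndim)   # number of dimensions
print(d.shape)  # size of each dimension: 2 blocks, 2 rows, 3 values

# The length of the shape tuple always equals ndim.
same = len(d.shape) == d.ndim
print(same)
```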
Access Array Elements
• Array indexing is the same as accessing an array
element.
• You can access an array element by referring to
its index number. Indexes in NumPy arrays start at 0, so the
first element has index 0 and the second has index 1.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Output: 1
• Get the third and fourth elements from the following array
and add them.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Output: 7
Access 2-D Arrays
To access elements from 2-D arrays we can use comma
separated integers representing the dimension and the
index of the element.
Think of 2-D arrays like a table with rows and columns,
where the dimension represents the row and the index
represents the column.
Access the 2nd element on the 1st row:
import numpy as np
arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])
print('2nd element on 1st row:', arr[0, 1])
Output: 2nd element on 1st row: 2
Access the element on the 2nd row, 5th column:
import numpy as np
arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])
print('5th element on 2nd row:', arr[1, 4])
Output: 5th element on 2nd row: 10
Access 3-D Arrays
Access the third element of the second array of the first array:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
Output: 6
Example Explained
arr[0, 1, 2] prints the value 6.
And this is why:
The first number represents the first dimension, which contains two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]
The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
Since we selected 1, we are left with the second array:
[4, 5, 6]
The third number represents the third dimension, which contains three values:
4
5
6
Since we selected 2, we end up with the third value:
6
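The walk-through above can be reproduced one index at a time. This is a minimal sketch; the names first, second, and value are chosen here for illustration only.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

first = arr[0]      # first dimension: the 2-D array [[1, 2, 3], [4, 5, 6]]
second = first[1]   # second dimension: the 1-D array [4, 5, 6]
value = second[2]   # third dimension: the single value 6

print(first.tolist())
print(second.tolist())
print(value)

# Indexing step by step gives the same result as arr[0, 1, 2].
print(value == arr[0, 1, 2])
```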
Negative Indexing
Use negative indexing to access an array from the end.
Print the last element from the 2nd dimension:
import numpy as np
arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])
print('Last element from 2nd dim:', arr[1, -1])
Output: Last element from 2nd dim: 10
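A few more negative indexes on the same array, as a small sketch: -1 refers to the last position along a dimension, -2 to the one before it, and so on.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])

last_of_second_row = arr[1, -1]   # last column of the second row
last_of_last_row = arr[-1, -1]    # -1 also works for the row index
second_to_last = arr[0, -2]       # second-to-last column of the first row

print(last_of_second_row, last_of_last_row, second_to_last)
```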
Pandas
• Pandas is a Python library used for working with data
sets.
• It has functions for analyzing, cleaning, exploring, and
manipulating data.
• It can be installed with the pip command.
• Once installed, it can be imported into your Python code.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
• Is there a correlation between two or more
columns?
• What is the average value?
• Max value?
• Min value?
Pandas is also able to delete rows that are not
relevant, or that contain wrong values, like empty or
NULL values. This is called cleaning the data.
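The questions above map onto a handful of Pandas methods. The sketch below uses a small made-up DataFrame (the column names and values are assumptions for illustration) with one empty cell.

```python
import pandas as pd

# Hypothetical workout data; None marks an empty cell.
df = pd.DataFrame({
    "duration": [30, 45, 60, 60],
    "calories": [200.0, 300.0, None, 410.0],
})

average = df["calories"].mean()   # empty cells are skipped
highest = df["calories"].max()
lowest = df["calories"].min()
print(average, highest, lowest)

# Cleaning: drop every row that contains an empty cell.
cleaned = df.dropna()
print(len(cleaned))
```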
Installation of Pandas
pip install pandas
Once Pandas is installed, import it in your applications by
adding the import keyword:
import pandas
Example
import pandas as pd
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
Output:
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
Series: A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
Labels: If nothing else is specified, the values are labeled
with their index number. The first value has index 0, the second
has index 1, and so on.
The label can be used to access a specified value.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar[0])
Output: 1
Create Labels:
With the index argument, you can name your own labels.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
Output:
x    1
y    7
z    2
dtype: int64
You can access an item by referring to the label.
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar["y"])
Output: 7
Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when
creating a Series. The keys of the dictionary become the labels.
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Output:
day1    420
day2    380
day3    390
dtype: int64
To select only some of the items in the dictionary, use the
index argument and specify only the items you want to
include in the Series.
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)
Output:
day1    420
day2    380
dtype: int64
DataFrames
• Data sets in Pandas are usually multi-dimensional tables,
called DataFrames.
• A Series is like a column; a DataFrame is the whole table.
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45
• A DataFrame is like a table with rows and columns.
• Pandas uses the loc attribute to return one or more
specified row(s).
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.loc[0])
Output:
calories    420
duration     50
Name: 0, dtype: int64
Named Indexes
• With the index argument, you can name your own
indexes.
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
Output:
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Locate Named Indexes
• Use the named index in the loc attribute to return the
specified row(s).
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df.loc["day2"])
Output:
calories    380
duration     40
Name: day2, dtype: int64
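The slides say loc can return "one or more" rows. The sketch below, added as an illustration, shows the difference: a single label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

one_row = df.loc["day2"]              # single label: a Series
two_rows = df.loc[["day1", "day3"]]   # list of labels: a DataFrame

print(type(one_row).__name__)
print(type(two_rows).__name__)
print(two_rows)
```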
Load Files into a DataFrame
• If your data sets are stored in a file, Pandas can load
them into a DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output:
     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
Read CSV Files
• A simple way to store big data sets is to use CSV files
(comma-separated values files).
• CSV files contain plain text and are a well-known format
that can be read by everyone, including Pandas.
• Use to_string() to print the entire DataFrame instead of a
shortened view.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
Output (first rows shown):
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.5
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
...
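To try read_csv() without a data.csv file on disk, the CSV text can be supplied from memory. This is a sketch under that assumption: io.StringIO stands in for the file, and only the first three rows of the data shown above are included.

```python
import io
import pandas as pd

# A few rows in CSV form; the first line holds the column names.
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)            # (rows, columns)
print(list(df.columns))
print(df.to_string())
```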
Dictionary as JSON
If the JSON data is available as a Python dictionary, it can be loaded
into a DataFrame directly:
import pandas as pd
data = {
    "Duration": {"0": 60, "1": 60, "2": 60, "3": 45, "4": 45, "5": 60},
    "Pulse": {"0": 110, "1": 117, "2": 103, "3": 109, "4": 117, "5": 102},
    "Maxpulse": {"0": 130, "1": 145, "2": 135, "3": 175, "4": 148, "5": 127},
    "Calories": {"0": 409, "1": 479, "2": 340, "3": 282, "4": 406, "5": 300}
}
df = pd.DataFrame(data)
print(df)
Output:
   Duration  Pulse  Maxpulse  Calories
0        60    110       130       409
1        60    117       145       479
2        60    103       135       340
3        45    109       175       282
4        45    117       148       406
5        60    102       127       300
Viewing the Data
• One of the most used methods for getting a quick
overview of the DataFrame is the head() method.
• The head() method returns the headers and a specified
number of rows, starting from the top. If no number is given,
it returns the first 5 rows.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(9))
Output:
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.5
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
Info About the Data
• The DataFrame object has a method called info() that
gives you more information about the data set.
• info() prints its report itself and returns None, so it does
not need to be wrapped in print().
• In the report below, Calories has 164 non-null values out of
169 rows, so 5 rows have no Calories value.
import pandas as pd
df = pd.read_csv('data.csv')
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
Data Cleaning
Data cleaning means fixing bad data in your data set.
Bad data could be:
– Empty cells
– Data in wrong format
– Wrong data
– Duplicates
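Two of these problems can be counted directly before any cleaning is done. The sketch below uses a small made-up DataFrame (values are assumptions for illustration) containing one empty cell and one duplicated row.

```python
import pandas as pd

# Hypothetical data: row 2 has an empty Calories cell, row 3 repeats row 0.
df = pd.DataFrame({
    "Duration": [60, 450, 60, 60],
    "Calories": [409.1, 253.3, None, 409.1],
})

empty_per_column = df.isna().sum()       # number of empty cells per column
duplicate_rows = df.duplicated().sum()   # number of rows that repeat an earlier row

print(empty_per_column)
print(duplicate_rows)
```

Wrong values such as the Duration of 450 cannot be counted this way; they need a rule about which values are plausible.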
Data Set
Duration Date Pulse Maxpulse Calories
0 60 '2020/12/01' 110 130 NaN
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
Data Set Issues
• The data set contains some empty cells ("Date" in row
22, and "Calories" in row 18 and 28).
• The data set contains wrong format ("Date" in row 26).
• The data set contains wrong data ("Duration" in row 7).
• The data set contains duplicates (row 11 and 12).
Empty Cells
• One way to deal with empty cells is to remove rows that
contain empty cells.
• By default, dropna() returns a new DataFrame and leaves the
original unchanged.
import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
# df.dropna(inplace = True) would change df itself instead
print(new_df.to_string())
Output (first rows shown; row 0 is gone because its Calories cell was empty):
   Duration          Date  Pulse  Maxpulse  Calories
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
6        60  '2020/12/07'    110       136     374.0
7       450  '2020/12/08'    104       134     253.3
Replace Empty Values
• Another way is to insert a new value instead. This way you do not
have to delete entire rows just because of some empty cells.
• The fillna() method replaces empty cells with a value.
import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)
print(df.to_string())
Output (first rows shown; the empty Calories cell in row 0 now holds 130):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     130.0
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
Replace Only For Specified Columns
• To only replace empty values for one column, specify
the column name for the DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
# Assign the result back to the column. Calling fillna(inplace = True)
# on a selected column may not update df in recent pandas versions.
df["Calories"] = df["Calories"].fillna(130)
print(df.to_string())
Output (first rows shown):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     130.0
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
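Replace Using Mean, Median, or Mode
• A common way to replace empty cells is to calculate the
mean, median or mode value of the column.
• Mean is the average value, median is the value in the middle
after sorting all values, and mode is the value that appears most often.
• Pandas provides the mean(), median() and mode() methods for this.
import pandas as pd
df = pd.read_csv('data.csv')
x = df["Calories"].mean()
df["Calories"] = df["Calories"].fillna(x)
print(df.to_string())
Output (first rows shown):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     450
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0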
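The three replacement values can differ noticeably. The sketch below computes each one on a small made-up Series (the numbers are assumptions chosen so that the results are easy to check by hand).

```python
import pandas as pd

# Hypothetical Calories values with one empty cell.
calories = pd.Series([300.0, 400.0, None, 300.0, 500.0])

mean_value = calories.mean()       # (300 + 400 + 300 + 500) / 4
median_value = calories.median()   # middle of 300, 300, 400, 500
mode_value = calories.mode()[0]    # mode() returns a Series; take its first entry

print(mean_value, median_value, mode_value)

# Fill the empty cell with the mean and assign the result to a new Series.
filled = calories.fillna(mean_value)
print(filled.tolist())
```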
Data of Wrong Format
• We have two cells with the wrong format. Check out rows 22 and
26: the 'Date' column should be a string that represents a date.
• The to_datetime() method converts all cells in the 'Date' column
into dates.
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())
Output (rows 21 to 26 shown):
    Duration       Date  Pulse  Maxpulse  Calories
21        60 2020-12-21    108       131     364.2
22        45        NaT    100       119     282.0
23        60 2020-12-23    130       101     300.0
24        45 2020-12-24    105       132     246.0
25        60 2020-12-25    102       126     334.5
26        60 2020-12-26    100       120     250.
• The empty date in row 22 got a NaT (Not a Time) value. One way to
deal with empty values is simply removing the entire row.
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.dropna(subset=['Date'], inplace = True)
print(df.to_string())
Output (rows 21 to 26 shown; row 22 has been removed):
    Duration       Date  Pulse  Maxpulse  Calories
21        60 2020-12-21    108       131     364.2
23        60 2020-12-23    130       101     300.0
24        45 2020-12-24    105       132     246.0
25        60 2020-12-25    102       126     334.5
26        60 2020-12-26    100       120     250.
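The same two steps can be tried on a few in-memory rows. In this sketch the rows are made up, and the explicit format string and errors="coerce" are choices made here, not something the slides prescribe: with them, any cell that does not match the expected date format becomes NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical rows: one empty date and one unparseable date.
df = pd.DataFrame({
    "Date": ["2020/12/21", None, "2020/12/23", "not a date"],
    "Duration": [60, 45, 60, 60],
})

# Cells that cannot be parsed with the given format are turned into NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")
missing = int(df["Date"].isna().sum())

# Remove the rows whose Date is NaT.
df = df.dropna(subset=["Date"])

print(missing)
print(df.to_string())
```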
Wrong Data – Replacing Values
• One way to fix wrong values is to replace them with something
else. The Duration in row 7 is 450, while the other rows shown are
between 30 and 60, so it is most likely a typo for 45.
import pandas as pd
df = pd.read_csv('data.csv')
df.loc[7,'Duration'] = 45
print(df.to_string())
Output (rows 5 to 8 shown):
   Duration          Date  Pulse  Maxpulse  Calories
5        60  '2020/12/06'    102       127     300.0
6        60  '2020/12/07'    110       136     374.0
7        45  '2020/12/08'    104       134     253.3
8        30  '2020/12/09'    109       133     195.1
• To replace wrong data for larger data sets you can create some
rules, e.g. set some boundaries for legal values, and replace any
values that are outside of the boundaries.
import pandas as pd
df = pd.read_csv('data.csv')
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
print(df.to_string())
Output (first rows shown):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
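The same rule can also be written without a loop. This is an alternative sketch, not the method used in the slides: a Boolean condition selects the offending rows, or clip() caps the whole column in one call. The Duration values are made up.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 450, 45, 130]})  # hypothetical values

# Option 1: clip() returns a copy with every value above 120 capped at 120.
clipped = df["Duration"].clip(upper=120)

# Option 2: select the rows that break the rule and overwrite them in place.
df.loc[df["Duration"] > 120, "Duration"] = 120

print(clipped.tolist())
print(df["Duration"].tolist())
```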
Wrong Data – Removing Rows
• Another way of handling wrong data is to remove the rows
that contain wrong data.
import pandas as pd
df = pd.read_csv('data.csv')
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
print(df.to_string())
Output (first rows shown):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
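Discovering Duplicates
• Duplicate rows are rows that have been registered more
than one time.
• The duplicated() method returns a Boolean value for each row:
True for every row that is a duplicate, otherwise False.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.duplicated())
Output (first rows shown):
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
...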
Removing Duplicates
• To remove duplicates, use the drop_duplicates() method.
import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())
Output (rows 9 to 14 shown; the duplicate row 12 has been removed):
    Duration          Date  Pulse  Maxpulse  Calories
9         60  '2020/12/10'     98       124     269.0
10        60  '2020/12/11'    103       147     329.3
11        60  '2020/12/12'    100       120     250.7
13        60  '2020/12/13'    106       128     345.3
14        60  '2020/12/14'    104       132     379.3
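Both methods can be checked on a few in-memory rows. The sketch below uses four made-up rows in which the third repeats the second; only the later copy is flagged and removed.

```python
import pandas as pd

# Hypothetical rows: index 2 is an exact copy of index 1.
df = pd.DataFrame({
    "Date": ["2020/12/11", "2020/12/12", "2020/12/12", "2020/12/13"],
    "Pulse": [103, 100, 100, 106],
})

flags = df.duplicated()          # True only for the repeated row
deduped = df.drop_duplicates()   # returns a new DataFrame without it

print(flags.tolist())
print(deduped.index.tolist())
```

The remaining rows keep their original index labels, which is why the output in the slide jumps from 11 to 13.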
Finding Relationships
• A great aspect of the Pandas module is the corr() method.
• The corr() method calculates the relationship between
each column in your data set.
import pandas as pd
df = pd.read_csv('data.csv')
print(df.corr())
Output:
          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.059452 -0.250033  0.344341
Pulse    -0.059452  1.000000  0.269672  0.481791
Maxpulse -0.250033  0.269672  1.000000  0.335392
Calories  0.344341  0.481791  0.335392  1.000000
• The corr() method only uses numeric columns. In pandas 2.0 and
later it no longer skips non-numeric columns automatically: pass
numeric_only = True, as in df.corr(numeric_only = True), when the
DataFrame contains non-numeric columns such as text.
Results Explained
• The result of the corr() method is a table with a lot of
numbers that represent how strong the relationship is
between two columns.
• The number varies from -1 to 1.
• 1 means that there is a 1 to 1 relationship (a perfect
correlation): each time a value goes up in the first column,
the other one goes up as well.
• 0.9 is also a good relationship: if you increase one
value, the other will probably increase as well.
• -0.9 is just as strong a relationship as 0.9, but if you
increase one value, the other will probably go down.
• 0.2 means NOT a good relationship: if one
value goes up, that does not mean that the other will.
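The two extreme cases are easy to reproduce. In the sketch below the columns are made up for illustration: "up" rises in step with "x" and "down" falls in step with it, so the correlations come out at 1 and -1 (up to floating-point rounding).

```python
import pandas as pd

# Hypothetical columns with perfect positive and perfect negative relationships.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "up": [10, 20, 30, 40],
    "down": [8, 6, 4, 2],
})

corr = df.corr()
x_up = round(corr.loc["x", "up"], 6)
x_down = round(corr.loc["x", "down"], 6)

print(x_up, x_down)
```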
Pandas - Plotting
Plotting
• Pandas uses the plot() method to create diagrams.
• We can use Pyplot, a submodule of the Matplotlib library to
visualize the diagram on the screen.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()
Scatter Plot
• Specify that you want a scatter plot with the kind argument:
kind = 'scatter'
• A scatter plot needs an x- and a y-axis.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()
Remember: In the previous example, we learned that the correlation
between "Duration" and "Calories" was 0.922721, and we concluded that
higher duration means more calories burned.
• Let's create another scatterplot, where there is a bad
relationship between the columns, like "Duration" and
"Maxpulse", with the correlation 0.009403:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')
plt.show()
Histogram
• Use the kind argument to specify that you want a
histogram:
• kind = 'hist'
• A histogram needs only one column.
• A histogram shows us the frequency of each interval, e.g.
how many workouts lasted between 50 and 60 minutes?
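The counting behind a histogram can be inspected without drawing anything. The sketch below uses NumPy's histogram() function on a few made-up durations; the values and the 10-minute bin edges are assumptions for illustration.

```python
import numpy as np

durations = [45, 60, 60, 30, 55, 60, 45, 50]  # hypothetical workout lengths

# Each bin covers 10 minutes: 30-40, 40-50, 50-60, 60-70.
# A value equal to an edge falls into the bin that starts at that edge.
counts, edges = np.histogram(durations, bins=[30, 40, 50, 60, 70])

print(counts.tolist())
print(edges.tolist())
```

Each count is the height of one bar in the plotted histogram.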
• We will use the "Duration" column to create the histogram.
• The histogram tells us that there were over 100 workouts that lasted
between 50 and 60 minutes.
***
df["Duration"].plot(kind = 'hist')
***