Python Libraries

Explaining python libraries

Uploaded by

umeume3636
Applications of ICT

Python Libraries
Python – Lambda

A lambda function is a small anonymous function.

A lambda function can take any number of arguments, but can only have
one expression.

lambda arguments : expression

x = lambda a : a + 10
print(x(5))

Output:
15
Python – Lambda

Sum arguments a, b, and c and return the result:

x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

Output:
13
Python – Lambda

Lambda functions can take any number of arguments:

Example
Multiply argument a with argument b and return the result:

x = lambda a, b : a * b
print(x(5, 6))

Output:
30
Python – Lambda

Why Use Lambda Functions?

The power of lambda is better shown when you use it as an anonymous function inside another function.

Say you have a function definition that takes one argument, and that argument will be multiplied by an unknown number:

def myfunc(n):
    return lambda a : a * n
Python – Lambda

Use that function definition to make a function that always doubles the
number you send in:

def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)

print(mydoubler(11))

Output:
22
Python – Lambda

Or, use the same function definition to make both functions in the same program:

def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)

print(mydoubler(11))
print(mytripler(11))

Output:
22
33
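Another common place for an anonymous function is as an argument to another function, for example the key argument of the built-in sorted(). The short sketch below is an added illustration, not part of the original slides; the names pairs and by_second are made up for the example.

```python
# Hypothetical sample data: (name, score) pairs
pairs = [("a", 3), ("b", 1), ("c", 2)]

# The lambda tells sorted() to compare the items by their second element
by_second = sorted(pairs, key=lambda item: item[1])

print(by_second)  # [('b', 1), ('c', 2), ('a', 3)]
```

Because the lambda is only needed in this one call, there is no need to give it a name with def.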
NumPy

• NumPy is a Python library used for working with arrays.

• It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

• In Python we have lists that serve the purpose of arrays, but they are slow to process.

• The array object in NumPy is called ndarray; it provides a lot of supporting functions that make working with ndarray very easy.
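To make the difference between a list and an ndarray concrete, here is a minimal sketch that doubles every value both ways. It is an added illustration with made-up variable names, and it does not measure speed; it only shows that the ndarray applies the operation to all elements at once.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain list: the loop runs element by element in Python
doubled_list = [v * 2 for v in values]

# ndarray: the multiplication is applied to the whole array in one step
doubled_arr = np.array(values) * 2

print(doubled_list)  # [2, 4, 6, 8, 10]
print(doubled_arr)   # [ 2  4  6  8 10]
```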
NumPy

Installation of NumPy

pip install numpy

Once NumPy is installed, import it in your applications by adding the import keyword:

import numpy
NumPy

Example

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

Output:
[1 2 3 4 5]
NumPy

Create a NumPy ndarray Object


NumPy is used to work with arrays. The array object in NumPy is called ndarray.
We can create a NumPy ndarray object by using the array() function.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(type(arr))

Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
NumPy

Create a NumPy ndarray Object


To create an ndarray, we can pass a list, tuple or any array-like object into the array() method, and it will be converted into an ndarray:

import numpy as np

# using tuple
arr = np.array((1, 2, 3, 4, 5))

print(arr)

Output:
[1 2 3 4 5]
NumPy

Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays).
• 0-D Arrays: 0-D arrays, or scalars, are the elements in an array. Each value in an array is a 0-D array.

import numpy as np

arr = np.array(42)

print(arr)
print(type(arr))

Output:
42
<class 'numpy.ndarray'>
NumPy

1-D Arrays
An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

These are the most common and basic arrays.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)
print(type(arr))

Output:
[1 2 3 4 5]
<class 'numpy.ndarray'>
NumPy

2-D Arrays
• An array that has 1-D arrays as its elements is called a 2-D array.
• These are often used to represent matrices or 2nd-order tensors.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

Output:
[[1 2 3]
 [4 5 6]]
NumPy

Check number of dimensions


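NumPy arrays provide the ndim attribute, which returns an integer that tells us how many dimensions the array has.

import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Output:
0
1
2
3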
NumPy

Access Array Elements


• Array indexing is the same as accessing an array element.
• You can access an array element by referring to its index number.
• Indexes in NumPy arrays start at 0, so the first element has index 0.

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[0])

Output:
1
NumPy

Access Array Elements


• Get the third and fourth elements from the following array and add them.

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])

Output:
7
NumPy

Access 2-D Arrays

To access elements from 2-D arrays we can use comma-separated integers representing the dimension and the index of the element.

Think of 2-D arrays like a table with rows and columns, where the dimension represents the row and the index represents the column.
NumPy

Access 2-D Arrays

Access the 2nd element on the 1st row:

import numpy as np

arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])

print('2nd element on 1st row:', arr[0, 1])

Output:
2nd element on 1st row: 2
NumPy

Access 2-D Arrays


Access the element on the 2nd row, 5th column:

import numpy as np

arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])

print('5th element on 2nd row:', arr[1, 4])

Output:
5th element on 2nd row: 10
NumPy

Access 3-D Arrays

Access the third element of the second array of the first array:

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

print(arr[0, 1, 2])

Output:
6
NumPy

Example Explained

arr[0, 1, 2] prints the value 6.

And this is why:

The first number represents the first dimension, which contains two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]

The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
NumPy

Example Explained

Since we selected 1, we are left with the second array:


[4, 5, 6]

The third number represents the third dimension, which contains three values:
4
5
6
Since we selected 2, we end up with the third value:
6
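The explanation above can be followed step by step in code. The sketch below is an added illustration that peels off one dimension at a time; the names first, second and third are made up for the example.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

first = arr[0]      # first dimension: [[1 2 3] [4 5 6]]
second = first[1]   # second dimension: [4 5 6]
third = second[2]   # third dimension: 6

print(third)         # 6
print(arr[0, 1, 2])  # 6, the same element in a single step
```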
NumPy

Negative Indexing

Use negative indexing to access an array from the end.

Print the last element from the 2nd dim:

import numpy as np

arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10]])

print('Last element from 2nd dim:', arr[1, -1])

Output:
Last element from 2nd dim: 10
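As a small added illustration, the sketch below shows that index -1 refers to the same element as the last positive index, and that negative indexes can be used in every dimension. The name last_col is made up for the example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])

last_col = arr.shape[1] - 1  # index of the last column, here 4

print(arr[1, -1])        # 10
print(arr[1, last_col])  # 10, the same element
print(arr[-1, -2])       # 9, last row, second-to-last column
```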
Pandas

• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• It can be installed using the pip command.

• Once installed, it can be imported into your Python code.


Pandas

What Can Pandas Do?


Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• Max value?
• Min value?

Pandas is also able to delete rows that are not relevant or that contain wrong values, like empty or NULL values. This is called cleaning the data.
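The questions above map onto short method calls. The sketch below is an added illustration on a made-up four-row data set; the column names and values are assumptions chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration
df = pd.DataFrame({
    "duration": [60, 45, 30, 60],
    "calories": [400.0, 300.0, None, 410.0],
})

print(df["calories"].mean())  # 370.0, the average of the non-empty values
print(df["calories"].max())   # 410.0
print(df["calories"].min())   # 300.0

# Cleaning: drop the row whose "calories" cell is empty
cleaned = df.dropna()
print(len(cleaned))           # 3
```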
Pandas

Installation of Pandas

pip install pandas

Once Pandas is installed, import it in your applications by adding the import keyword:

import pandas
Pandas

Example

import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)
print(myvar)

Output:
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
Pandas

Series: A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

Output:
0    1
1    7
2    2
dtype: int64
Pandas

Labels: If nothing else is specified, the values are labeled with their index number.
The label can be used to access a specified value.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar[0])

Output:
1
Pandas

Create Labels:
With the index argument, you can name your own labels.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

Output:
x    1
y    7
z    2
dtype: int64
Pandas

Create Labels:
With the index argument, you can name your own labels.
You can access an item by referring to the label.

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar["y"])

Output:
7
Pandas

Key/Value Objects as Series


You can also use a key/value object, like a dictionary, when creating a Series.

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)
print(myvar)

Output:
day1    420
day2    380
day3    390
dtype: int64
Pandas

Key/Value Objects as Series


To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])
print(myvar)

Output:
day1    420
day2    380
dtype: int64
Pandas

DataFrames
• Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
• A Series is like a column; a DataFrame is the whole table.

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data)
print(df)

Output:
   calories  duration
0       420        50
1       380        40
2       390        45
Pandas

DataFrames
• A DataFrame is like a table with rows and columns.
• Pandas uses the loc attribute to return one or more specified row(s).

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data)
print(df.loc[0])

Output:
calories    420
duration     50
Name: 0, dtype: int64
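The phrase "one or more specified row(s)" can be illustrated with a short added sketch: a single label gives back a Series, while a list of labels gives back a DataFrame. The names one_row and two_rows are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

one_row = df.loc[0]        # a single label returns a Series
two_rows = df.loc[[0, 1]]  # a list of labels returns a DataFrame

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(two_rows)
```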
Pandas

Named Indexes
• With the index argument, you can name your own indexes.

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)

Output:
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Pandas

Locate Name Indexes


• Use the named index in the loc attribute to return the specified row(s).

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df.loc["day2"])

Output:
calories    380
duration     40
Name: day2, dtype: int64
Pandas

Load Files into a DataFrame


• If your data sets are stored in a file, Pandas can load them into a DataFrame.

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

Output:
     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4
Pandas

Read CSV Files


• A simple way to store big data sets is to use CSV files (comma separated files).
• CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.
• Use to_string() to print the entire DataFrame instead of a shortened view.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Output (first rows):
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.5
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
...
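Since data.csv is not included with these slides, the following added sketch reads the same kind of data from a string instead of a file. It assumes the CSV layout implied by the output above (a header row followed by comma-separated values) and reuses three of the rows shown there; io.StringIO is used so that read_csv() can treat the text like a file.

```python
import io
import pandas as pd

# A few rows in CSV form, copied from the output shown above
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,109,175,282.4
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 4)
print(list(df.columns))  # ['Duration', 'Pulse', 'Maxpulse', 'Calories']
print(df.to_string())
```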
Pandas

Dictionary as JSON

JSON objects have the same format as Python dictionaries, so JSON-style data held in a dictionary can be loaded into a DataFrame directly:

import pandas as pd

data = {
    "Duration": {"0": 60, "1": 60, "2": 60, "3": 45, "4": 45, "5": 60},
    "Pulse": {"0": 110, "1": 117, "2": 103, "3": 109, "4": 117, "5": 102},
    "Maxpulse": {"0": 130, "1": 145, "2": 135, "3": 175, "4": 148, "5": 127},
    "Calories": {"0": 409, "1": 479, "2": 340, "3": 282, "4": 406, "5": 300}
}

df = pd.DataFrame(data)

print(df)
Pandas

Dictionary as JSON

Output:
   Duration  Pulse  Maxpulse  Calories
0        60    110       130       409
1        60    117       145       479
2        60    103       135       340
3        45    109       175       282
4        45    117       148       406
5        60    102       127       300
Pandas

Viewing the Data


• One of the most used methods for getting a quick overview of the DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows, starting from the top.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(9))

Output:
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0
5        60    102       127     300.5
6        60    110       136     374.0
7        45    104       134     253.3
8        30    109       133     195.1
Pandas

Info About the Data


• The DataFrame object has a method called info() that gives you more information about the data set.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.info())

Output (excerpt):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
Pandas

Data Cleaning
Data cleaning means fixing bad data in your data set.
Bad data could be:
– Empty cells
– Data in wrong format
– Wrong data
– Duplicates
Pandas

Data Set

Duration Date Pulse Maxpulse Calories


0 60 '2020/12/01' 110 130 NaN
1 60 '2020/12/02' 117 145 479.0
2 60 '2020/12/03' 103 135 340.0
3 45 '2020/12/04' 109 175 282.4
4 45 '2020/12/05' 117 148 406.0
5 60 '2020/12/06' 102 127 300.0
6 60 '2020/12/07' 110 136 374.0
7 450 '2020/12/08' 104 134 253.3
8 30 '2020/12/09' 109 133 195.1
9 60 '2020/12/10' 98 124 269.0
10 60 '2020/12/11' 103 147 329.3
11 60 '2020/12/12' 100 120 250.7
12 60 '2020/12/12' 100 120 250.7
13 60 '2020/12/13' 106 128 345.3
14 60 '2020/12/14' 104 132 379.3
15 60 '2020/12/15' 98 123 275.0
16 60 '2020/12/16' 98 120 215.2
17 60 '2020/12/17' 100 120 300.0
18 45 '2020/12/18' 90 112 NaN
Pandas

Data Set Issues

• The data set contains some empty cells ("Date" in row 22, and "Calories" in row 18 and 28).

• The data set contains wrong format ("Date" in row 26).

• The data set contains wrong data ("Duration" in row 7).

• The data set contains duplicates (row 11 and 12).
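Problems like these can also be counted in code rather than spotted by eye. The sketch below is an added illustration on a made-up five-row data set containing one empty cell, one outlier and one duplicate; the 120-minute limit is an assumption used only for the example.

```python
import pandas as pd

# Hypothetical miniature data set with one empty cell, one outlier, one duplicate
df = pd.DataFrame({
    "Duration": [60, 450, 60, 60, 45],
    "Calories": [479.0, 253.3, 250.7, 250.7, None],
})

empty_per_column = df.isna().sum()       # number of empty cells in each column
duplicate_rows = df.duplicated().sum()   # rows identical to an earlier row
too_long = (df["Duration"] > 120).sum()  # durations above the assumed limit

print(empty_per_column["Calories"])  # 1
print(duplicate_rows)                # 1
print(too_long)                      # 1
```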


Pandas

Empty Cells
• One way to deal with empty cells is to remove rows that contain empty cells.
• By default, dropna() returns a new DataFrame and does not change the original.

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()
# To change the original DataFrame instead, use:
# df.dropna(inplace = True)

print(new_df.to_string())

Output (first rows):
   Duration          Date  Pulse  Maxpulse  Calories
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
6        60  '2020/12/07'    110       136     374.0
7       450  '2020/12/08'    104       134     253.3
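The difference between the two forms of dropna() can be checked on a tiny made-up data set. This is an added sketch; the values are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Calories": [None, 479.0, 282.4],
})

new_df = df.dropna()     # returns a new DataFrame; df is left untouched

print(len(df))           # 3
print(len(new_df))       # 2

df.dropna(inplace=True)  # modifies df itself and returns None
print(len(df))           # 2
```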
Pandas

Replace Empty Values


• Another way is to insert a new value instead. This way you do not have to delete entire rows just because of some empty cells.

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

print(df.to_string())

Output (first rows):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     130.0
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
Pandas

Replace Only For Specified Columns


• To only replace empty values for one column, specify the column name for the DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

# Passing a dictionary limits the replacement to the named column
df.fillna({"Calories": 130}, inplace = True)

print(df.to_string())

Output (first rows):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     130.0
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
Pandas

Replace Using Mean, Median, or Mode


• A common way to replace empty cells is to calculate the mean, median or mode value of the column.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

# Passing a dictionary limits the replacement to the named column
df.fillna({"Calories": x}, inplace = True)

print(df.to_string())

Output (first rows, as shown on the slide):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     450.0
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
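The median and mode work the same way as the mean. The sketch below is an added illustration on made-up values; note that mode() returns a Series, because several values can share the highest frequency, so the first entry is taken with [0].

```python
import pandas as pd

# Hypothetical values, made up for this illustration
df = pd.DataFrame({"Calories": [300.0, 300.0, 340.0, 420.0, None]})

mean_value = df["Calories"].mean()      # 340.0
median_value = df["Calories"].median()  # 320.0
mode_value = df["Calories"].mode()[0]   # 300.0

# Fill the empty cell with the median and assign the result back to the column
df["Calories"] = df["Calories"].fillna(median_value)

print(df["Calories"].tolist())  # [300.0, 300.0, 340.0, 420.0, 320.0]
```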
Pandas

Data of Wrong Format


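• In the full data set, two cells in the 'Date' column need attention: row 22 is empty and row 26 is in the wrong format. One fix is to convert all cells in the 'Date' column into dates.
• Depending on the pandas version and on how the badly formatted cell is written, to_datetime() may need the format='mixed' argument (available in pandas 2.0 and later).

import pandas as pd

df = pd.read_csv('data.csv')

df['Date'] = pd.to_datetime(df['Date'])

print(df.to_string())

Output (excerpt):
    Duration       Date  Pulse  Maxpulse  Calories
21        60 2020-12-21    108       131     364.2
22        45        NaT    100       119     282.0
23        60 2020-12-23    130       101     300.0
24        45 2020-12-24    105       132     246.0
25        60 2020-12-25    102       126     334.5
26        60 2020-12-26    100       120     250.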
Pandas

Data of Wrong Format


• The empty date in row 22 got a NaT (Not a Time) value. One way to deal with empty values is simply to remove the entire row.

import pandas as pd

df = pd.read_csv('data.csv')

df['Date'] = pd.to_datetime(df['Date'])

df.dropna(subset=['Date'], inplace = True)

print(df.to_string())

Output (excerpt):
    Duration       Date  Pulse  Maxpulse  Calories
21        60 2020-12-21    108       131     364.2
23        60 2020-12-23    130       101     300.0
24        45 2020-12-24    105       132     246.0
25        60 2020-12-25    102       126     334.5
26        60 2020-12-26    100       120     250.
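The two steps above can be tried on a small made-up data set. This added sketch assumes one valid date, one unparseable string and one empty cell; it uses the errors="coerce" argument of to_datetime(), which turns values that cannot be parsed into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical sample: one valid date, one unparseable string, one empty cell
df = pd.DataFrame({
    "Date": ["2020/12/21", "not a date", None],
    "Pulse": [108, 100, 130],
})

# errors="coerce" converts values that cannot be parsed into NaT
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
print(df["Date"].isna().sum())  # 2

# Remove only the rows whose "Date" is empty
df = df.dropna(subset=["Date"])
print(len(df))                  # 1
```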
Pandas

Wrong Data – Replacing Values


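• "Wrong data" does not have to be empty cells or a wrong format; it can simply be a wrong value. In our data set, the "Duration" in row 7 is 450, while all other rows are between 30 and 60.
• One way to fix wrong values is to replace them with something else. Here we set "Duration" in row 7 to 45:

import pandas as pd

df = pd.read_csv('data.csv')

df.loc[7,'Duration'] = 45

print(df.to_string())

Output (excerpt):
   Duration          Date  Pulse  Maxpulse  Calories
5        60  '2020/12/06'    102       127     300.0
6        60  '2020/12/07'    110       136     374.0
7        45  '2020/12/08'    104       134     253.3
8        30  '2020/12/09'    109       133     195.1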
Pandas

Wrong Data – Replacing Values


• To replace wrong data in larger data sets you can create some rules, e.g. set some boundaries for legal values, and replace any values that are outside of the boundaries.

import pandas as pd

df = pd.read_csv('data.csv')

# Loop through all rows and cap "Duration" at 120
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120

print(df.to_string())

Output (first rows):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
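The same rule can also be written without an explicit loop. This is an added alternative sketch, not the slide's method: it uses a boolean condition inside loc to select all offending rows at once, on made-up durations.

```python
import pandas as pd

# Hypothetical durations; 450 and 130 are the out-of-range values
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Select every row where the condition holds and overwrite it in one step
df.loc[df["Duration"] > 120, "Duration"] = 120

print(df["Duration"].tolist())  # [60, 120, 45, 120]
```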
Pandas

Wrong Data – Replacing Values


• Another way of handling wrong data is to remove the rows that contain wrong data.

import pandas as pd

df = pd.read_csv('data.csv')

# Loop through all rows and delete those where "Duration" is above 120
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)

print(df.to_string())

Output (first rows):
   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
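Removing rows can likewise be done with a boolean condition instead of a loop. This is an added alternative sketch on made-up durations; the name kept is hypothetical.

```python
import pandas as pd

# Hypothetical durations; 450 and 130 are the out-of-range values
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Keep only the rows that satisfy the rule; the original index labels survive
kept = df[df["Duration"] <= 120]

print(kept["Duration"].tolist())  # [60, 45]
print(kept.index.tolist())        # [0, 2]
```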
Pandas

Discovering Duplicates
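• Duplicate rows are rows that have been registered more than one time.
• The duplicated() method returns a Boolean value for each row: True for every row that is a duplicate, otherwise False.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.duplicated())

Output (first rows):
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
...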
Pandas

Removing Duplicates
• To remove duplicates, use the drop_duplicates() method.

import pandas as pd

df = pd.read_csv('data.csv')

df.drop_duplicates(inplace = True)

print(df.to_string())

Output (excerpt; row 12 has been removed):
    Duration          Date  Pulse  Maxpulse  Calories
9         60  '2020/12/10'     98       124     269.0
10        60  '2020/12/11'    103       147     329.3
11        60  '2020/12/12'    100       120     250.7
13        60  '2020/12/13'    106       128     345.3
14        60  '2020/12/14'    104       132     379.3
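Both methods can be tried without data.csv. The added sketch below rebuilds rows 10 to 12 of the data set shown earlier (without the "Pulse" and "Maxpulse" columns, to keep it short); the name deduped is made up for the example.

```python
import pandas as pd

# Rows 10, 11 and 12 of the data set above; the last two are identical
df = pd.DataFrame({
    "Duration": [60, 60, 60],
    "Date": ["2020/12/11", "2020/12/12", "2020/12/12"],
    "Calories": [329.3, 250.7, 250.7],
})

print(df.duplicated().tolist())  # [False, False, True]

deduped = df.drop_duplicates()   # keeps the first copy of each row
print(deduped.index.tolist())    # [0, 1]
```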
Pandas

Finding Relationships
• A great aspect of the Pandas module is the corr() method.
• The corr() method calculates the relationship between each column in your data set.

import pandas as pd

df = pd.read_csv('data.csv')

print(df.corr())

Output:
          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.059452 -0.250033  0.344341
Pulse    -0.059452  1.000000  0.269672  0.481791
Maxpulse -0.250033  0.269672  1.000000  0.335392
Calories  0.344341  0.481791  0.335392  1.000000
Pandas

Finding Relationships
• The corr() method calculates the relationship between each column in your data set.
• The corr() method ignores "not numeric" columns. In pandas 2.0 and later this is no longer the default, so pass numeric_only = True if the data set contains non-numeric columns.

          Duration     Pulse  Maxpulse  Calories
Duration  1.000000 -0.059452 -0.250033  0.344341
Pulse    -0.059452  1.000000  0.269672  0.481791
Maxpulse -0.250033  0.269672  1.000000  0.335392
Calories  0.344341  0.481791  0.335392  1.000000
Pandas

Results Explained

• The result of the corr() method is a table of numbers that represent how strong the relationship is between two columns.

• The number varies from -1 to 1.

• 1 means that there is a 1 to 1 relationship (a perfect correlation), and for this data set, each time a value went up in the first column, the other one went up as well.
Pandas

Results Explained

• 0.9 is also a good relationship: if you increase one value, the other will probably increase as well.

• -0.9 would be just as good a relationship as 0.9, but if you increase one value, the other will probably go down.

• 0.2 means NOT a good relationship: if one value goes up, that does not mean that the other will.
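The two extreme cases can be reproduced with columns built for the purpose. This is an added sketch; the column names x, double_x and minus_x are made up for the example.

```python
import pandas as pd

# Hypothetical columns constructed to show the extreme cases
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],     # rises whenever x rises
    "minus_x": [-1, -2, -3, -4, -5],  # falls whenever x rises
})

corr = df.corr()

print(round(corr.loc["x", "double_x"], 6))  # 1.0
print(round(corr.loc["x", "minus_x"], 6))   # -1.0
```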
Pandas

Pandas - Plotting
Pandas

Dataset Plotting
• We can use a plotting library called Matplotlib.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot()
plt.show()
Pandas

Plotting
• Pandas uses the plot() method to create diagrams.
• We can use Pyplot, a submodule of the Matplotlib library to
visualize the diagram on the screen.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot()
plt.show()
Pandas

Scatter Plot
• Specify that you want a scatter plot with the kind argument:
• kind = 'scatter'
• A scatter plot needs an x- and a y-axis.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')

plt.show()
Pandas

Remember: In the previous example, we learned that the correlation between "Duration" and "Calories" was 0.922721, and we concluded with the fact that higher duration means more calories burned.
Pandas

• Let's create another scatterplot, where there is a bad relationship between the columns, like "Duration" and "Maxpulse", with the correlation 0.009403:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Maxpulse')

plt.show()
Pandas

Histogram
• Use the kind argument to specify that you want a histogram:

• kind = 'hist'

• A histogram needs only one column.

• A histogram shows us the frequency of each interval, e.g. how many workouts lasted between 50 and 60 minutes?
Pandas

Histogram
• We will use the "Duration" column to create the histogram.
• The histogram tells us that there were over 100 workouts that lasted between 50 and 60 minutes.

***
df["Duration"].plot(kind = 'hist')
***
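To see what "the frequency of each interval" means without drawing anything, the counts behind a histogram can be computed directly. This added sketch uses NumPy's histogram() function on made-up durations with assumed 10-minute intervals; it is an illustration of the idea, not what plot(kind = 'hist') prints.

```python
import numpy as np

# Hypothetical workout durations in minutes
durations = [45, 60, 60, 30, 55, 58, 60, 45]

# Interval edges: 30-40, 40-50, 50-60, 60-70 (the last interval includes 70)
counts, edges = np.histogram(durations, bins=[30, 40, 50, 60, 70])

print(counts.tolist())  # [1, 2, 2, 3]
print(edges.tolist())   # [30, 40, 50, 60, 70]
```

Each count is the height of one bar: here two workouts fall in the 50 to 60 interval (55 and 58), while the three 60-minute workouts are counted in the next interval.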
