Lecture Material 8

This document covers correlation analysis in Python with pandas, seaborn, and SciPy. It computes Pearson and Spearman correlation matrices, visualizes them as heatmaps, and uses regression plots and pair plots to explore relationships between variables.

Uploaded by

Ali Naseer
Lecture_material_8

January 31, 2024

1 Correlation
[ ]: import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

[ ]: # Load the iris dataset for the correlation examples
df=sns.load_dataset('iris')

[ ]: df.head()

[ ]:    sepal_length  sepal_width  petal_length  petal_width species
     0           5.1          3.5           1.4          0.2  setosa
     1           4.9          3.0           1.4          0.2  setosa
     2           4.7          3.2           1.3          0.2  setosa
     3           4.6          3.1           1.5          0.2  setosa
     4           5.0          3.6           1.4          0.2  setosa

[ ]: df=df[['sepal_length','sepal_width','petal_length','petal_width']]
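The cell above selects the four numeric columns by name. As an alternative sketch, `select_dtypes` can pick the numeric columns automatically. This matters because pandas 2.0 and later no longer drop text columns such as `species` silently when `corr()` is called. The `sample` frame below is a stand-in built from the five rows printed by `df.head()`, not the full dataset.

```python
import pandas as pd

# Stand-in for the iris frame: the five rows shown by df.head() above
sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0],
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4],
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2],
    'species': ['setosa'] * 5,
})

# Keep only numeric columns; the text column 'species' is excluded
numeric = sample.select_dtypes(include='number')
print(list(numeric.columns))
```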

[ ]: df.corr(method='pearson') # Pearson: linear association; suited to roughly normal data

[ ]:               sepal_length  sepal_width  petal_length  petal_width
     sepal_length      1.000000    -0.117570      0.871754     0.817941
     sepal_width      -0.117570     1.000000     -0.428440    -0.366126
     petal_length      0.871754    -0.428440      1.000000     0.962865
     petal_width       0.817941    -0.366126      0.962865     1.000000

[ ]: df.corr(method='spearman') # Spearman: rank-based; suited to non-normal data or monotonic, non-linear relationships

[ ]:               sepal_length  sepal_width  petal_length  petal_width
     sepal_length      1.000000    -0.166778      0.881898     0.834289
     sepal_width      -0.166778     1.000000     -0.309635    -0.289032
     petal_length      0.881898    -0.309635      1.000000     0.937667
     petal_width       0.834289    -0.289032      0.937667     1.000000
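The difference between the two methods is easier to see on data where the relationship is monotonic but not linear. The sketch below uses hypothetical toy data (not part of the lecture) and also checks that Spearman's coefficient matches Pearson's coefficient computed on the ranks. The names `toy`, `pearson_r`, `spearman_r` and `rank_r` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y increases with x, but exponentially rather than linearly
toy = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
toy['y'] = np.exp(toy['x'])

# Pearson measures linear association, so it falls short of 1 here
pearson_r = toy['x'].corr(toy['y'], method='pearson')

# Spearman only looks at the ordering of the values
spearman_r = toy['x'].corr(toy['y'], method='spearman')

# Spearman computed by hand: Pearson applied to the ranks
rank_r = toy['x'].rank().corr(toy['y'].rank(), method='pearson')

print('pearson: %.3f, spearman: %.3f, pearson on ranks: %.3f'
      % (pearson_r, spearman_r, rank_r))
```

On the iris data the two matrices are close to each other, which suggests the relationships there are already roughly linear.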

[ ]: sns.regplot(x=df['sepal_length'], y=df['petal_length']) # scatter plot with a fitted linear regression line

[ ]: <Axes: xlabel='sepal_length', ylabel='petal_length'>

[ ]: phool=sns.load_dataset('iris')
phool.head(5)

[ ]:    sepal_length  sepal_width  petal_length  petal_width species
     0           5.1          3.5           1.4          0.2  setosa
     1           4.9          3.0           1.4          0.2  setosa
     2           4.7          3.2           1.3          0.2  setosa
     3           4.6          3.1           1.5          0.2  setosa
     4           5.0          3.6           1.4          0.2  setosa

[ ]: sns.regplot(x=phool['sepal_length'], y=phool['sepal_width'])

[ ]: <Axes: xlabel='sepal_length', ylabel='sepal_width'>

[ ]: corr=df.corr(method="pearson")

[ ]: sns.heatmap(corr)

[ ]: <Axes: >

[ ]: sns.heatmap(corr, annot= True)

[ ]: <Axes: >
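The annotated heatmap makes the strongest cells easy to spot by eye. The same ranking can be read off programmatically, sketched here with the Pearson values copied from the `df.corr(method='pearson')` output earlier in the notebook. The names `corr_values`, `pairs` and `ranked` are illustrative.

```python
import numpy as np
import pandas as pd

cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Pearson correlations copied from the notebook output
corr_values = pd.DataFrame(
    [[1.000000, -0.117570, 0.871754, 0.817941],
     [-0.117570, 1.000000, -0.428440, -0.366126],
     [0.871754, -0.428440, 1.000000, 0.962865],
     [0.817941, -0.366126, 0.962865, 1.000000]],
    index=cols, columns=cols)

# Indices of the strict upper triangle: each pair once, diagonal excluded
i, j = np.triu_indices(len(cols), k=1)
pairs = pd.Series(
    corr_values.to_numpy()[i, j],
    index=['%s ~ %s' % (cols[a], cols[b]) for a, b in zip(i, j)])

# Sort by absolute strength, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```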

[ ]: # pandas Styler (used in the next cell) requires Jinja2
%pip install Jinja2

[ ]: corr.style.background_gradient(cmap='coolwarm')

[ ]: <pandas.io.formats.style.Styler at 0x156de8600e0>

[ ]: sns.pairplot(df) # pair plot of the raw measurements; plotting the correlation matrix itself is not meaningful

[ ]: <seaborn.axisgrid.PairGrid at 0x156d5ef5880>

[ ]: penguins=sns.load_dataset('penguins')
penguins.head(-10) # a negative n shows all rows except the last 10

[ ]:      species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
     0     Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
     1     Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
     2     Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
     3     Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
     4     Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
     ..       ...        ...             ...            ...                ...          ...     ...
     329   Gentoo     Biscoe            48.1           15.1              209.0       5500.0    Male
     330   Gentoo     Biscoe            50.5           15.2              216.0       5000.0  Female
     331   Gentoo     Biscoe            49.8           15.9              229.0       5950.0    Male
     332   Gentoo     Biscoe            43.5           15.2              213.0       4650.0  Female
     333   Gentoo     Biscoe            51.5           16.3              230.0       5500.0    Male

[334 rows x 7 columns]
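Row 3 of the penguins output contains only NaN measurements. pandas correlation methods exclude missing values pairwise, so such rows do not need to be removed by hand before computing a coefficient. A minimal sketch, using only the first five rows printed above (so the value says nothing about the full dataset); `sample`, `r_with_nan` and `r_dropped` are illustrative names.

```python
import numpy as np
import pandas as pd

# First five penguins rows, copied from the output above (row 3 is missing)
sample = pd.DataFrame({
    'bill_length_mm': [39.1, 39.5, 40.3, np.nan, 36.7],
    'body_mass_g': [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
})

# Series.corr skips pairs where either value is missing
r_with_nan = sample['bill_length_mm'].corr(sample['body_mass_g'])

# Same computation after dropping the incomplete row explicitly
clean = sample.dropna()
r_dropped = clean['bill_length_mm'].corr(clean['body_mass_g'])

print('with NaN row: %.3f, after dropna: %.3f' % (r_with_nan, r_dropped))
```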

[ ]: sns.pairplot(penguins, hue='species')

[ ]: <seaborn.axisgrid.PairGrid at 0x156e8d31cd0>

[ ]: sns.pairplot(penguins, hue='species', diag_kind='hist')

[ ]: <seaborn.axisgrid.PairGrid at 0x156de85be00>

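[ ]: # Calculating the Pearson correlation for a single pair of columns
from scipy.stats import pearsonr
# pearsonr returns the coefficient and a p-value; use a new name so the
# correlation matrix stored in `corr` above is not overwritten
r, p_value = pearsonr(phool['sepal_length'], phool['petal_length'])
print("Pearson's correlation: %.3f" % r)

Pearson's correlation: 0.872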
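The coefficient itself is the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below computes it by hand with NumPy and cross-checks the result against `np.corrcoef`. It uses only the five iris rows printed by `df.head()`, so the value differs from the full-dataset 0.872. The names `r_manual` and `r_numpy` are illustrative.

```python
import numpy as np

# First five iris rows, copied from the df.head() output earlier in the notebook
sepal_length = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
petal_length = np.array([1.4, 1.4, 1.3, 1.5, 1.4])

# Deviations of each value from its column mean
dx = sepal_length - sepal_length.mean()
dy = petal_length - petal_length.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check against NumPy's correlation matrix (off-diagonal entry)
r_numpy = np.corrcoef(sepal_length, petal_length)[0, 1]

print('manual: %.3f, numpy: %.3f' % (r_manual, r_numpy))
```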
