Dev Lab Record
Ex 1. Install the data analysis and visualization tool: R / Python / Power BI.
DATE:
1. Installation:
5. Indexing Using Labels in Pandas
# prints rows with index labels 0 through 5 (label 5 included) and all columns
df.loc[0:5, :]
# prints from the row labelled 5 onwards, with all columns
df.loc[5:, :]
# prints the "Time period" values for rows up to and including label 5
df.loc[:5, "Time period"]
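Label slicing with .loc includes both endpoints, which is why df.loc[0:5, :] returns six rows rather than five. A minimal sketch of the difference from position-based .iloc, assuming a small made-up frame (the values are illustrative and are not the lab dataset):

```python
import pandas as pd

# Hypothetical stand-in for the lab's dataset; values are made up
df = pd.DataFrame({
    "Time period": list(range(2000, 2010)),
    "Observation Value": [1.2, 3.4, 2.2, 5.1, 4.8, 3.9, 6.0, 2.7, 4.4, 5.5],
})

# .loc slices by label and includes BOTH endpoints: labels 0..5 -> 6 rows
first_rows = df.loc[0:5, :]

# .iloc slices by position and excludes the stop: positions 0..4 -> 5 rows
first_five = df.iloc[0:5, :]

# Selecting a single column by label returns a Series
periods = df.loc[:5, "Time period"]

print(len(first_rows), len(first_five), type(periods).__name__)
```

With the default integer index, labels and positions coincide, so the two differ only in whether the stop value is included.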
6. Installation
pip install matplotlib
7. Pandas Plotting
# import the required module
import matplotlib.pyplot as plt
# plot a histogram
df['Observation Value'].hist(bins=10)
# shows presence of a lot of outliers/extreme values
df.boxplot(column='Observation Value', by = 'Time period')
# plotting points as a scatter plot
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label= "stars", color= "m",marker= "*", s=30)
# x-axis label
plt.xlabel('Observation Value')
# y-axis label
plt.ylabel('Time period')
# function to show the plot
plt.show()
Output:
Ex 2. Perform exploratory data analysis (EDA) on datasets such as an
email data set. Export all your emails as a dataset, import them into
a pandas data frame, visualize them, and get different insights from
the data.
DATE:
PROGRAM:
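The program for this exercise is not included in the record. One possible sketch, assuming the emails were exported in mbox format: the standard-library mailbox module reads the file and the headers are loaded into a pandas data frame. The three sample messages, their addresses, and the temporary file are made up so that the sketch runs on its own.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage
from email.utils import parsedate_to_datetime

import pandas as pd

# Build a tiny mbox file so the sketch is self-contained.
# In practice, point mbox_path at your own exported mailbox instead.
samples = [
    ("alice@example.com", "Mon, 01 Jan 2024 09:15:00 +0000", "Welcome"),
    ("bob@example.com", "Mon, 01 Jan 2024 18:40:00 +0000", "Report"),
    ("alice@example.com", "Tue, 02 Jan 2024 09:05:00 +0000", "Reminder"),
]
mbox_path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
box = mailbox.mbox(mbox_path)
for sender, date, subject in samples:
    msg = EmailMessage()
    msg["From"] = sender
    msg["Date"] = date
    msg["Subject"] = subject
    msg.set_content("placeholder body")
    box.add(msg)
box.close()  # flushes the messages to disk

# Import the headers of every message into a DataFrame
rows = [
    {"from": m["From"], "subject": m["Subject"],
     "date": parsedate_to_datetime(m["Date"])}
    for m in mailbox.mbox(mbox_path)
]
emails = pd.DataFrame(rows)
emails["hour"] = emails["date"].dt.hour

per_sender = emails["from"].value_counts()              # messages per sender
per_hour = emails["hour"].value_counts().sort_index()   # messages per hour of day
print(per_sender)
print(per_hour)
```

Both summaries are Series, so per_sender.plot.bar() or per_hour.plot.bar() would give a first visualization.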
OUTPUT:
Ex 3. Working with NumPy arrays, Pandas data frames, basic plots using Matplotlib.
DATE:
NumPy arrays using Matplotlib
Program:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y=2*x+5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
Output:
import pandas as pd
Basic plots
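The code for this part is not included in the record. A minimal sketch of three basic plot types (line, bar, scatter) drawn from a NumPy array and a pandas data frame; the marks table and the output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up marks table, used only for illustration
marks = pd.DataFrame(
    {"maths": [72, 85, 64, 90], "physics": [68, 79, 70, 88]},
    index=["A", "B", "C", "D"],
)
marks["total"] = marks["maths"] + marks["physics"]  # vectorised column arithmetic

x = np.linspace(0, 2 * np.pi, 50)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(x, np.sin(x))                                       # line plot from NumPy arrays
axes[0].set_title("Line")
marks[["maths", "physics"]].plot.bar(ax=axes[1], title="Bar")    # grouped bar chart
axes[2].scatter(marks["maths"], marks["physics"])                # scatter plot
axes[2].set_title("Scatter")
fig.tight_layout()
fig.savefig("basic_plots.png")  # in an interactive session, use plt.show() instead
```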
Ex 4. Explore various variable and row filters in Python for cleaning data. Apply
various plot features in Python on sample data sets and visualize them.
DATE:
Program:
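import pandas as pd
import numpy as np
# five rows of random data, reindexed so that rows 'b', 'd' and 'g' become NaN
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'], columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)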
Output:
import pandas as pd
import numpy as np
print (df['one'].isnull())
Output:
a False
b True
c False
d True
e False
f False
g True
h False
Name: one, dtype: bool
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
# reindexing introduces row 'b', which is filled with NaN
df = df.reindex(['a', 'b', 'c'])
print(df)
print("NaN replaced with '0':")
print(df.fillna(0))
Output:
Output:
import pandas as pd
import numpy as np
df = pd.DataFrame({'one':[10,20,30,40,50,2000],
'two':[1000,0,30,40,50,60]})
print (df.replace({1000:10,2000:60}))
Output:
   one  two
0   10   10
1   20    0
2   30   30
3   40   40
4   50   50
5   60   60
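The programs above handle missing and wrong values, while the exercise title also asks for variable and row filters. A short sketch of both, on a made-up frame with one missing value and one obvious outlier:

```python
import numpy as np
import pandas as pd

# Made-up sample data with a missing value and an obvious outlier
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "age": [23, np.nan, 31, 45, 29],
    "score": [10, 20, 30, 2000, 50],
})

# Variable (column) filter: keep only the columns of interest
subset = df[["age", "score"]]

# Row filter: boolean mask keeping plausible scores only
plausible = df[df["score"] < 1000]

# Row filter: drop rows where 'age' is missing
complete = df.dropna(subset=["age"])

# Combine conditions with & (and) or | (or); comparisons need parentheses
clean = df[(df["score"] < 1000) & df["age"].notna()]
print(clean)
```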
Ex 5. Perform Time Series Analysis and apply the various visualization techniques.
DATE:
Program:
plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
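The call above relies on a plot_df helper and a data frame df that are not defined in this record. A minimal sketch of what such a helper might look like; its signature is an assumption, and a synthetic monthly series stands in for the drug sales data, which is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to get interactive windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_df(df, x, y, title="", xlabel="Date", ylabel="Value"):
    """Hypothetical helper: draw y against x as a line chart and return the Axes."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(x, y, color="tab:red")
    ax.set(title=title, xlabel=xlabel, ylabel=ylabel)
    return ax

# Synthetic monthly series (trend + yearly cycle), NOT the drug sales data
idx = pd.date_range("1992-01-01", periods=48, freq="MS")
t = np.arange(48)
df = pd.DataFrame({"value": 5 + 0.1 * t + np.sin(2 * np.pi * t / 12)}, index=idx)

ax = plot_df(df, x=df.index, y=df["value"], title="Synthetic monthly series")

# A 12-month rolling mean smooths out the yearly cycle and exposes the trend
df["rolling_12"] = df["value"].rolling(window=12).mean()
ax.plot(df.index, df["rolling_12"], color="tab:blue")
ax.figure.savefig("time_series.png")  # in an interactive session, use plt.show()
```

The first eleven rolling values are NaN because a full 12-month window is not yet available.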
Output:
Ex 6. Perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction, etc.
DATE:
Program:
import plotly.express as px
import pandas as pd
print("getting data")
df=px.data.carshare()
print(df.head(10))
print(df.tail(10))
fig = px.scatter_mapbox(df,
                        lon=df["centroid_lon"],
                        lat=df["centroid_lat"],
                        zoom=10,
                        color=df["peak_hour"],
                        size=df["car_hours"],
                        width=1200,
                        height=900,
                        title="CAR SHARE SCATTER MAP")
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":50,"b":10})
fig.show()
Output
Ex 7. Build cartographic visualization for multiple datasets involving
various countries of the world, states and districts in India, etc.
DATE:
Program:
Output:
Ex 8. Perform EDA on Wine Quality Data Set
DATE:
Program:
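The program for this exercise is not included in the record. A sketch of typical first EDA steps, run on a tiny made-up stand-in that borrows a few column names from the Wine Quality data; the six rows are invented and carry no real measurements.

```python
import pandas as pd

# Tiny made-up stand-in with a few Wine Quality style columns.
# For the real analysis, load your own copy of the dataset with pd.read_csv.
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "pH": [3.51, 3.20, 3.26, 3.16, 3.30, 3.39],
    "quality": [5, 5, 6, 6, 7, 7],
})

print(wine.describe())                        # summary statistics per column
print(wine.isnull().sum())                    # missing values per column
print(wine["quality"].value_counts())         # class balance of the target
by_quality = wine.groupby("quality").mean()   # feature means per quality score
corr = wine.corr()                            # pairwise Pearson correlations
print(by_quality)
print(corr["quality"].sort_values(ascending=False))
```

The UCI copies of the dataset are semicolon-separated, so loading them usually needs sep=";" in pd.read_csv.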
Output:
Ex 9. Use a case study on a data set and apply the various EDA and
visualization techniques and present an analysis report.
DATE:
Program:
import pandas as pd
import numpy as np
import seaborn as sns
#Load the data
df =pd.read_csv('titanic.csv')
#View the data
df.head()
#Basic information
df.info()
df.describe()
Describe the data - Descriptive statistics.
Duplicate values
df.duplicated().sum()
Output:
0
This means there are no duplicate rows in the dataset.
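When the count is not zero, the repeated rows can be dropped with drop_duplicates(). A small sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame where the last row repeats the first
df = pd.DataFrame({"Name": ["Ann", "Ben", "Ann"], "Pclass": [1, 3, 1]})

n_dupes = df.duplicated().sum()   # counts rows identical to an earlier row
deduped = df.drop_duplicates()    # keeps the first occurrence of each row
print(n_dupes, len(deduped))
```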
Unique values in the data
#unique values
df['Pclass'].unique()
df['Survived'].unique()
df['Sex'].unique()
array([3, 1, 2], dtype=int64)
array([0, 1], dtype=int64)
array(['male', 'female'],dtype=object)
Visualize the Unique counts
#Plot the unique values
sns.countplot(x=df['Pclass'])
df.isnull().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Replace the Null values
The replace() function replaces all the null values with a specified value.
#Replace null values
df.replace(np.nan, 0, inplace=True)
#Check the changes now
df.isnull().sum()
PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Embarked       0
dtype: int64
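Replacing every null with one value treats numeric and text columns alike. A sketch of a per-column alternative, assuming made-up rows shaped like two of the Titanic columns: the numeric column is filled with its median and the text column with a placeholder label.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like two of the Titanic columns used above
df = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0],
    "Cabin": [np.nan, "C85", np.nan, "E46"],
})

# Numeric column: fill with the median so the column stays numeric
df["Age"] = df["Age"].fillna(df["Age"].median())
# Text column: fill with an explicit placeholder label
df["Cabin"] = df["Cabin"].fillna("Unknown")

print(df.isnull().sum().sum())
print(df["Age"].dtype)
```

Keeping Age numeric matters for later steps such as describe() and the correlation plot.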
A quick box plot
df[['Fare']].boxplot()
#Correlation plot (numeric columns only)
sns.heatmap(df.corr(numeric_only=True))