Dev Lab Record

The document provides a comprehensive guide on data analysis and visualization using tools such as Python, Pandas, and Matplotlib. It covers installation, creating DataFrames, importing data, indexing, exploratory data analysis (EDA), time series analysis, and cartographic visualization, along with various coding examples. Additionally, it includes methods for handling missing data, visualizing datasets, and performing statistical analysis.

Uploaded by

zabuzarkhan00


Ex 1.

Install the data analysis and visualization tool: R / Python / Power BI.
DATE:

1. Installation:

pip install pandas

2. Creating A DataFrame in Pandas:

import pandas as pd
# assign two Series to s1 and s2
s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])
# frame the Series objects into a DataFrame (each Series becomes a row)
df = pd.DataFrame([s1, s2])
# show the data frame
df
# build a DataFrame another way,
# passing explicit index and column labels
dframe = pd.DataFrame([[1, 2], ["Ashish", "Sid"]],
                      index=["r1", "r2"],
                      columns=["c1", "c2"])
dframe
# build a DataFrame another way: from a dict-like container
dframe = pd.DataFrame({
    "c1": [1, "Ashish"],
    "c2": [2, "Sid"]})
dframe
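
The two construction styles above lay the data out differently: a list of Series becomes rows, while a dict maps column names to column values. A minimal sketch that reuses the values from the listing (the names `by_rows` and `by_cols` are illustrative, not part of the record):

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])

# A list of Series: each Series becomes one ROW
by_rows = pd.DataFrame([s1, s2])

# A dict of lists: each key becomes one COLUMN
by_cols = pd.DataFrame({"c1": [1, "Ashish"], "c2": [2, "Sid"]})

print(by_rows.shape)             # (2, 2)
print(by_rows.iloc[0].tolist())  # first row holds the values of s1
print(by_cols["c1"].tolist())    # first column holds 1 and "Ashish"
```

Checking `shape` and one row or column right after construction is a quick way to confirm that the data landed in the intended orientation.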
3. Importing Data with Pandas

# Import the pandas library, renamed as pd

import pandas as pd
# Read IND_data.csv into a DataFrame, assigned to df
df = pd.read_csv("IND_data.csv")
# Prints the first 5 rows of a DataFrame as default
df.head()
# Prints no. of rows and columns of a DataFrame
df.shape

4. Indexing DataFrames with Pandas

# prints first 5 rows and every column, which replicates df.head()
df.iloc[0:5,:]
# prints all rows and columns
df.iloc[:,:]
# prints rows from position 5 onwards and the first 5 columns
df.iloc[5:,:5]

5. Indexing Using Labels in Pandas
# prints rows labelled 0 through 5 (loc includes the end label) and every column of df
df.loc[0:5,:]
# prints rows from label 5 onwards and all columns
df.loc[5:,:]
# prints the "Time period" values for rows labelled 0 through 5
df.loc[:5,"Time period"]
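
The difference between position-based and label-based slicing is easy to miss: `iloc` excludes the end of a slice, while `loc` includes it. A small sketch on a made-up frame (the `demo` data is a hypothetical stand-in for IND_data.csv, reusing its column names):

```python
import pandas as pd

# Hypothetical stand-in for IND_data.csv with a default 0..9 integer index
demo = pd.DataFrame({"Time period": range(2000, 2010),
                     "Observation Value": range(10)})

by_position = demo.iloc[0:5, :]   # positions 0..4, end excluded
by_label = demo.loc[0:5, :]       # labels 0..5, end included

print(len(by_position))  # 5
print(len(by_label))     # 6
```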

6. Installation
pip install matplotlib

7. Pandas Plotting
# import the required module
import matplotlib.pyplot as plt
# plot a histogram
df['Observation Value'].hist(bins=10)
# shows presence of a lot of outliers/extreme values
df.boxplot(column='Observation Value', by = 'Time period')
# plotting points as a scatter plot
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label= "stars", color= "m",marker= "*", s=30)
# x-axis label
plt.xlabel('Observation Value')
# y-axis label
plt.ylabel('Time period')
# function to show the plot
plt.show()

Output:

Ex 2. Perform exploratory data analysis (EDA) on datasets such as an email data set. Export all your emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
DATE:

PROGRAM:

#import required modules

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#read the dataset to be accessed
df = pd.read_csv(r"C:\Users\sibis\Desktop\studies\2nd sem\dev lab\EMAIL.csv")
#first five rows
print(df.head())
#dimensions of the table
print(df.shape)
#table info (info() prints its summary directly)
df.info()
#table statistics
print(df.describe())
#unique values of a particular attribute
print(df.TO.unique())
#subject of mail
plt.plot(df["FROM"],df["SUBJECT"])
plt.show()
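
Beyond the summary calls above, two simple ways to get insights from email data are counting messages per sender and filtering subjects by keyword. The sketch below uses a few invented rows in place of EMAIL.csv; only the column names FROM, TO and SUBJECT come from the program, and the addresses and subjects are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for EMAIL.csv (column names follow the listing)
emails = pd.DataFrame({
    "FROM": ["a@example.com", "b@example.com", "a@example.com", "c@example.com"],
    "TO": ["me@example.com"] * 4,
    "SUBJECT": ["Invoice", "Meeting", "Invoice reminder", "Newsletter"],
})

# Number of emails per sender, most frequent first
per_sender = emails["FROM"].value_counts()
print(per_sender)

# Subjects that mention a keyword (case-insensitive)
invoices = emails[emails["SUBJECT"].str.contains("invoice", case=False)]
print(invoices)
```

The `per_sender` Series could also be drawn with `per_sender.plot(kind="bar")` to see the busiest senders at a glance.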

OUTPUT:

Ex 3. Working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
DATE:

3(a) NumPy arrays using matplotlib

Program:

import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y=2*x+5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
Output:

3(b)Pandas dataframe using matplotlib

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Name': ['John', 'Sammy', 'Joe'], 'Age': [45, 38, 90]})
df.plot(x="Name", y="Age", kind="bar")
plt.show()

Basic plots

import matplotlib.pyplot as plt

x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()

Ex 4. Explore various variable and row filters in Python for cleaning data. Apply various plot features in Python on sample data sets and visualize them.
DATE:

Program:

# import the pandas library
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df)

Output:

one two three

a 0.077988 0.476149 0.965836
b NaN NaN NaN
c -0.390208 -0.551605 -2.301950
d NaN NaN NaN
e -2.000303 -0.788201 1.510072

Program: (Checking for missing values)

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])

df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

print(df['one'].isnull())

Output:

a False
b True
c False
d True
e False
f False
g True
h False
Name: one, dtype: bool
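
The boolean mask from `isnull()` can be summarised instead of read row by row. A minimal sketch, assuming the same reindexed frame as the program but with a fixed random seed (the seed and the names `missing_per_column` and `rows_with_missing` are additions for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
df = pd.DataFrame(rng.standard_normal((5, 3)),
                  index=["a", "c", "e", "f", "h"],
                  columns=["one", "two", "three"])
df = df.reindex(["a", "b", "c", "d", "e", "f", "g", "h"])

# NaN count in each column
missing_per_column = df.isnull().sum()

# Labels of the rows that contain at least one NaN
rows_with_missing = df[df.isnull().any(axis=1)].index.tolist()

print(missing_per_column)
print(rows_with_missing)   # ['b', 'd', 'g']
```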

Program:(filling missing data)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3), index=['a', 'c', 'e'],
                  columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c'])
print(df)
print ("NaN replaced with '0':")
print (df.fillna(0))

Output:

one two three

a -0.576991 -0.741695 0.553172
b NaN NaN NaN
c 0.744328 -1.735166 1.749580

NaN replaced with '0':

one two three
a -0.576991 -0.741695 0.553172
b 0.000000 0.000000 0.000000
c 0.744328 -1.735166 1.749580
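
Filling with 0 is only one option. When a zero would distort the data, a common alternative is to fill each gap with that column's mean. The sketch below uses a small made-up frame rather than the random one above, so the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing row
df = pd.DataFrame({"one": [1.0, np.nan, 3.0],
                   "two": [10.0, np.nan, 30.0]},
                  index=["a", "b", "c"])

# df.mean() skips NaN, so each gap gets the mean of its own column
filled = df.fillna(df.mean())
print(filled)
```

Here row `b` becomes 2.0 and 20.0, the means of the remaining values in each column.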

Program:(Drop missing values)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f','h'],columns=['one', 'two', 'three'])
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print (df.dropna())

Output:

one two three

a 0.077988 0.476149 0.965836
c -0.390208 -0.551605 -2.301950
e -2.000303 -0.788201 1.510072
f -0.930230 -0.670473 1.14661

Program:(Replace missing or generic values)

import pandas as pd
import numpy as np
df = pd.DataFrame({'one':[10,20,30,40,50,2000],
'two':[1000,0,30,40,50,60]})
print (df.replace({1000:10,2000:60}))

Output:

one two
0 10 10
1 20 0
2 30 30
3 40 40
4 50 50
5 60 60
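
The exercise title also mentions row and variable filters. A minimal sketch on the same frame: a boolean mask keeps rows that satisfy a condition, and a list of column names keeps selected variables. The threshold of 100 is an assumption chosen only to separate the two extreme values from the rest.

```python
import pandas as pd

df = pd.DataFrame({"one": [10, 20, 30, 40, 50, 2000],
                   "two": [1000, 0, 30, 40, 50, 60]})

# Row filter: keep rows where both columns are at most 100 (assumed threshold)
plausible = df[(df["one"] <= 100) & (df["two"] <= 100)]

# Variable (column) filter: keep only column "one"
only_one = plausible[["one"]]

print(plausible)
print(only_one)
```

Unlike `replace`, filtering drops the suspicious rows entirely instead of substituting new values for them.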

Ex 5. Perform Time Series Analysis and apply the various visualization techniques.
DATE:

Program:

import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()
        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869

# Time series data source: fpp package in R.

import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
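
A line plot is usually only the first step in time series analysis. Two further techniques are a rolling mean, which smooths out seasonality, and aggregation by year. The sketch below runs on a synthetic monthly series (it is not the a10 data, so that it works without a network connection); the column name `value` mirrors the program, and everything else is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (NOT the a10 data): linear trend plus a yearly cycle
dates = pd.date_range("1991-07-01", periods=48, freq="MS")
values = np.arange(48) * 0.1 + np.sin(2 * np.pi * np.arange(48) / 12)
ts = pd.DataFrame({"value": values}, index=dates)

# A 12-month rolling mean averages out the yearly cycle
ts["rolling_12"] = ts["value"].rolling(window=12).mean()

# Yearly totals computed from the monthly values
yearly = ts["value"].groupby(ts.index.year).sum()

print(ts.tail())
print(yearly)
```

The first 11 entries of `rolling_12` are NaN because a full 12-month window is not yet available. Plotting `value` and `rolling_12` on the same axes shows the trend underneath the seasonal swings.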

Output:

Ex 6. Perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction, etc.
DATE:

Program:

import plotly.express as px
import pandas as pd
print("getting data")
df=px.data.carshare()
print(df.head(10))
print(df.tail(10))
fig = px.scatter_mapbox(df,
                        lon=df["centroid_lon"],
                        lat=df["centroid_lat"],
                        zoom=10,
                        color=df["peak_hour"],
                        size=df["car_hours"],
                        width=1200,
                        height=900,
                        title="CAR SHARE SCATTER MAP")
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":50,"b":10})
fig.show()

Output

Ex 7. Build cartographic visualization for multiple datasets involving various countries of the world, states and districts in India, etc.
DATE:

Program:

Output:

Ex 8. Perform EDA on Wine Quality Data Set
DATE:

Program:

Output:

Ex 9. Use a case study on a data set and apply the various EDA and visualization techniques and present an analysis report.
DATE:

Program:
import pandas as pd
import numpy as np
import seaborn as sns
#Load the data
df =pd.read_csv('titanic.csv')
#View the data
df.head()

#Basic information

df.info()

#Describe the data

df.describe()

Describe the data - Descriptive statistics.

Duplicate values
df.duplicated().sum()

Output:
0
This means there are no duplicate rows in our dataset.

Unique values in the data
#unique values
df['Pclass'].unique()
df['Survived'].unique()
df['Sex'].unique()
Output:
array([3, 1, 2], dtype=int64)
array([0, 1], dtype=int64)
array(['male', 'female'], dtype=object)
Visualize the Unique counts
#Plot the unique values
sns.countplot(x=df['Pclass'])

df.isnull().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0

Replace the Null values
The replace() function replaces every null value with a specified value.
#Replace null values (note: the string '0' turns numeric columns such as Age into object dtype)
df.replace(np.nan,'0',inplace = True)
#Check the changes now
df.isnull().sum()
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 0
Embarked 0
dtype: int64
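
Replacing nulls with the string '0' removes every NaN, but it also mixes text into numeric columns; the data types listed for this exercise show Age as object for that reason. One possible alternative, sketched here on a tiny made-up table (the values are invented, and only the column names Age and Fare come from the dataset), is to fill Age with its median so the column stays numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic table
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, np.nan, 30.0],
                   "Fare": [7.25, 8.05, 71.28, 13.0, 8.46]})

# What the program does: every NaN becomes the string '0'
as_string = df.replace(np.nan, "0")

# Alternative: fill only Age, using the median of the known ages
as_median = df.fillna({"Age": df["Age"].median()})

print(as_string["Age"].dtype)
print(as_median["Age"].dtype)   # float64
```

Keeping Age numeric means it can still take part in statistics such as `describe()` and correlation.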

Know the datatypes

df.dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age object
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

Filter the Data

df[df['Pclass']==1].head()

A quick box plot
df[['Fare']].boxplot()

Correlation Plot - EDA

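#numeric_only=True skips text columns, which recent pandas versions would otherwise reject
df.corr(numeric_only=True)

#Correlation plot
sns.heatmap(df.corr(numeric_only=True))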
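
To see what the correlation table contains before drawing the heatmap, the sketch below computes it on a few invented rows (the values are assumptions; only the column names Pclass, Fare and Sex come from the dataset). It assumes a pandas version that supports the `numeric_only` argument of `DataFrame.corr`.

```python
import pandas as pd

# Hypothetical rows with two numeric columns and one text column
df = pd.DataFrame({"Pclass": [3, 1, 3, 1, 2],
                   "Fare": [7.25, 71.28, 7.92, 53.10, 13.00],
                   "Sex": ["male", "female", "female", "female", "male"]})

# Text columns such as Sex are left out of the result
corr = df.corr(numeric_only=True)
print(corr)
```

In this toy data the Pclass and Fare entry is negative, since the first-class rows carry the highest fares. Passing `annot=True` to `sns.heatmap` prints each coefficient inside its cell.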
