
Aids Lab

The document is a lab manual for an Artificial Intelligence and Data Science course at Anna University, detailing various exercises related to data analysis, visualization, and cleaning using Python libraries such as Pandas and Matplotlib. It includes installation procedures, exploratory data analysis, working with NumPy arrays and Pandas data frames, creating plots, and data cleaning techniques. Each exercise concludes with a result statement confirming successful completion of the tasks.


AD3301 dev lab - dev lab manual

Artificial intelligence and data science (Anna University)


Downloaded by hanipriya nirmal ([email protected])

EX.NO:1 INSTALLATION OF DATA ANALYSIS AND VISUALIZATION TOOL

AIM :

To install data analysis and visualization tools.

INSTALLATION PROCEDURE:

Python is a great language for doing data analysis, primarily because of the fantastic
ecosystem of data-centric Python packages. Pandas is one of those packages, and
makes importing and analyzing data much easier

Step 1: To ensure that the system is up to date, open a terminal window and refresh the
package lists:

$ sudo apt update

Step 2: Install Python 3 and pip:

$ sudo apt install python3 python3-pip

Step 3: Verify the installation by checking the installed version:

$ python3 --version

Step 4 : Install pandas and numpy

$ pip install pandas

$ pip install numpy

Step 5: Install visualization tool

$ pip install matplotlib
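
Once the steps above have finished, a short script can confirm that the interpreter actually sees each package. The following is a minimal sketch and not part of the original procedure; it uses only the standard library, and the helper name `check_packages` is an illustrative assumption.

```python
import importlib.util

# Packages installed in Steps 4 and 5
REQUIRED = ["pandas", "numpy", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching pip command from Step 4 or Step 5.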

RESULT:

Thus, the data analysis and visualization tools were installed successfully.


EX.NO:2 EXPLORATORY DATA ANALYSIS

AIM:

To perform exploratory data analysis (EDA) on datasets such as an email data set: export all
your emails as a dataset, import them into a pandas data frame, visualize them, and get
different insights from the data.

ALGORITHM:
Step 1: Import necessary libraries
Step 2: Create a simple example dataset
Step 3: Display the dataset
Step 4: Check for missing values
Step 5: Visualize the distribution of email lengths, count and common words
Step 6: Stop
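
Step 5 relies on two simple measurements: the length of each message and how often each word occurs. Before plotting, these can be computed directly. The sketch below uses only the standard library; the four sample messages mirror the example dataset with normal spacing assumed, and the helper name `word_counts` is hypothetical.

```python
import re
from collections import Counter


def word_counts(messages):
    """Count lower-cased words across all messages."""
    words = []
    for text in messages:
        # Keep runs of letters/apostrophes; punctuation is dropped
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words)


# Sample messages (assumed spacing; replace with your own email bodies)
sample = ["Hello, how are you?", "Meeting tomorrow?",
          "Important update", "Check this out!"]

lengths = [len(message) for message in sample]   # input for the histogram
counts = word_counts(sample)                     # input for a word cloud

print("Lengths:", lengths)
print("Most common words:", counts.most_common(3))
```

A word cloud is essentially a picture of `counts`: the more frequent a word, the larger it is drawn.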

PROGRAM:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud  # third-party package: pip install wordcloud

# Create a simple example dataset (replace this with your actual dataset)
data = {'sender': ['sender1', 'sender2', 'sender1', 'sender3'],
        'content': ['Hello, how are you?', 'Meeting tomorrow?',
                    'Important update', 'Check this out!']}
email_data = pd.DataFrame(data)

# Display the dataset
print(email_data)

# Check for missing values
print(email_data.isnull().sum())

# Visualize the distribution of email lengths
email_data['email_length'] = email_data['content'].apply(len)
plt.figure(figsize=(10, 6))
plt.hist(email_data['email_length'], bins=30, edgecolor='black')
plt.title('Distribution of Email Lengths')
plt.xlabel('Email Length')
plt.ylabel('Frequency')
plt.show()

# Visualize the count of emails per sender
plt.figure(figsize=(8, 5))
email_data['sender'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Count of Emails per Sender')
plt.xlabel('Sender')
plt.ylabel('Count')
plt.show()

# Visualize the most common words in emails
all_emails = ' '.join(email_data['content'])
wordcloud = WordCloud(width=800, height=400,
                      background_color='white').generate(all_emails)
plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Email Content')
plt.show()


OUTPUT:

RESULT:

Thus, exploratory data analysis (EDA) on an email data set was performed successfully.

EX.NO:3.1 NUMPY ARRAYS

AIM:
To work with NumPy arrays, Pandas data frames, and create basic plots using Matplotlib.

PROGRAM
import numpy as np

# Creating NumPy arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([6, 7, 8, 9, 10])

# Basic operations on arrays
sum_array = arr1 + arr2
product_array = arr1 * arr2
mean_value = np.mean(arr1)

# Displaying array contents and results
print("Array 1:", arr1)
print("Array 2:", arr2)
print("Sum of arrays:", sum_array)
print("Product of arrays:", product_array)
print("Mean of Array 1:", mean_value)


OUTPUT

Array 1: [1 2 3 4 5]
Array 2: [ 6  7  8  9 10]
Sum of arrays: [ 7  9 11 13 15]
Product of arrays: [ 6 14 24 36 50]
Mean of Array 1: 3.0

RESULT:

Thus, working with NumPy arrays, Pandas data frames, and basic plots using Matplotlib was
completed successfully.

Ex.No:3.2 PANDAS DATAFRAMES

AIM:
To work with Pandas data frames.

PROGRAM

import pandas as pd

# Creating a Pandas DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 22, 35, 28],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston']}
df = pd.DataFrame(data)

# Displaying the DataFrame
print("Original DataFrame:")
print(df)

# Basic DataFrame operations
average_age = df['Age'].mean()
youngest_person = df[df['Age'] == df['Age'].min()]

# Displaying results of operations
print("\nAverage Age:", average_age)
print("\nYoungest Person:")
print(youngest_person)

OUTPUT:
Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
3    David   35        Chicago
4    Emily   28         Boston

Average Age: 28.0

Youngest Person:
      Name  Age         City
2  Charlie   22  Los Angeles


RESULT:
Thus, working with Pandas data frames was completed successfully.

EX.NO:3.3 BASIC PLOTS USING MATPLOTLIB

AIM:
To create basic plots using Matplotlib.

PROGRAM
import matplotlib.pyplot as plt

# Example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Line plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Line Plot', marker='o', linestyle='-', color='blue')
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

# Scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, label='Scatter Plot', color='red', marker='x')
plt.title('Simple Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:
1. Line Plot:

2. Scatter Plot:


RESULT:
Thus, basic plots were created using Matplotlib successfully.

Ex.No:3.4 CUSTOMIZING PLOT

AIM:
To customize plots using Matplotlib.

Program

import numpy as np
import matplotlib.pyplot as plt

# Generate data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

# Plot the data with custom styles
ax.plot(x, y1, label='Sine Wave', color='blue', linestyle='--', linewidth=2)
ax.plot(x, y2, label='Cosine Wave', color='red', linestyle='-', linewidth=2)

# Customize axes labels and title
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Customized Sine and Cosine Waves')

# Add a legend
ax.legend()

# Add grid
ax.grid(True, linestyle='--', alpha=0.7)

# Show the plot
plt.show()

OUTPUT:


RESULT:
Thus, customizing plots using Matplotlib was completed successfully.

EX.NO:4 DATA CLEANING USING R

AIM:
To explore various variable and row filtering techniques in R for cleaning data, and then
apply different plot features on a sample dataset.

PROGRAM
import pandas as pd

# Load a sample dataset (e.g., Iris dataset)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

df = pd.read_csv(url, header=None, names=column_names)

# Display the first few rows of the dataset
print("Sample Dataset:")
print(df.head())


OUTPUT:
Sample Dataset:
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


RESULT:
Thus, various variable and row filtering techniques in R for cleaning data were explored, and
different plot features were applied to a sample dataset successfully.

EX.NO:4.1 EXPLORING DATA

AIM:
To explore and understand the underlying patterns, trends, and relationships within the
dataset.

PROGRAM
import pandas as pd

# Load a sample dataset (e.g., Iris dataset)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
df = pd.read_csv(url, header=None, names=column_names)

# Display basic information about the dataset
print("Dataset Information:")
print(df.info())

# Display summary statistics
print("\nSummary Statistics:")
print(df.describe())

# Display the first few rows of the dataset
print("\nFirst Few Rows of the Dataset:")
print(df.head())

# Display unique classes in the 'class' column
print("\nUnique Classes:")
print(df['class'].unique())


OUTPUT:

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   class         150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
None

Summary Statistics:
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.054000      3.758667     1.198667
std        0.828066     0.433594      1.764420     0.763161
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

First Few Rows of the Dataset:
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Unique Classes:
['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']

RESULT:
Thus, the underlying patterns, trends, and relationships within the dataset were explored and
understood successfully.

EX.NO:4.2 VARIABLE FILTERS

AIM:
To apply variable filters in order to refine data based on specific conditions or criteria.

PROGRAM
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Filter specific variables (columns)
selected_columns = ['Name', 'City']
filtered_df = df[selected_columns]

# Display the filtered DataFrame
print("\nFiltered DataFrame:")
print(filtered_df)

OUTPUT:

Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles

Filtered DataFrame:
      Name           City
0    Alice       New York
1      Bob  San Francisco
2  Charlie    Los Angeles


RESULT:
Filtered data will display only the values or records that meet the defined criteria, improving
data analysis and decision-making.

EX.NO:4.3 ROW FILTERS

Aim:
To apply a row filter to restrict data visibility to only specific rows based on predefined
conditions.

PROGRAM

import pandas as pd
# Sample data
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [20, 22, 21, 23],
        'Grade': [85, 92, 78, 95]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

OUTPUT:
Original DataFrame:
      Name  Age  Grade
0    Alice   20     85
1      Bob   22     92
2  Charlie   21     78
3    David   23     95
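
The listing above builds and prints the frame but stops before any filtering takes place. A row filter itself could be sketched as follows, using boolean masks on the same sample data. The thresholds (Grade above 80, Age of at least 22) and the variable names `high_grade` and `older_high` are illustrative assumptions, not values from the manual.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [20, 22, 21, 23],
        'Grade': [85, 92, 78, 95]}
df = pd.DataFrame(data)

# Boolean mask: keep only rows whose Grade exceeds the assumed threshold of 80
high_grade = df[df['Grade'] > 80]

# Conditions combine with & (and) or | (or); each one needs its own parentheses
older_high = df[(df['Grade'] > 80) & (df['Age'] >= 22)]

print("Grade > 80:")
print(high_grade)
print("\nGrade > 80 and Age >= 22:")
print(older_high)
```

The original index labels are preserved in the filtered frames, which makes it easy to trace each row back to the source data.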


Result:

The dataset will display only the rows that meet the filter criteria, improving focus and analysis
efficiency.

EX.NO:4.4 DATA CLEANING

Aim:
To identify and correct errors, inconsistencies, and inaccuracies in the dataset to ensure its quality
and reliability.

PROGRAM

import pandas as pd

# Sample data with missing values and duplicates
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],
        'Age': [20, None, 21, 23, 22],
        'Grade': [85, 92, 78, 95, 92]}

df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)


OUTPUT:
Original DataFrame:
      Name   Age  Grade
0    Alice  20.0     85
1      Bob   NaN     92
2  Charlie  21.0     78
3    David  23.0     95
4    Alice  22.0     92

In this example, we have a DataFrame with a missing value in the 'Age' column and a duplicate
entry for the name 'Alice'.
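
The listing stops at displaying the raw frame. One possible cleaning pass is sketched below under two assumptions that the manual does not state: the missing age is filled with the column mean, and only the first row per name is kept. Both are illustrative choices, and the variable name `cleaned` is hypothetical.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice'],
        'Age': [20, None, 21, 23, 22],
        'Grade': [85, 92, 78, 95, 92]}
df = pd.DataFrame(data)

cleaned = df.copy()

# Fill the missing Age with the mean of the known ages (assumed strategy)
cleaned['Age'] = cleaned['Age'].fillna(cleaned['Age'].mean())

# The two 'Alice' rows differ in Age and Grade, so a plain drop_duplicates()
# would keep both; restricting the check to 'Name' keeps only the first one
cleaned = cleaned.drop_duplicates(subset='Name', keep='first').reset_index(drop=True)

print(cleaned)
```

Whether to fill, drop, or flag missing values depends on the analysis; `dropna()` is the simpler alternative when losing the row is acceptable.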


RESULT:
Thus, the various variable and row filtering techniques in R for cleaning data were explored, and
different plot features were applied to a sample dataset successfully.

EX.NO:5 TIME SERIES
AIM:
To perform time series analysis and apply various visualization techniques.
DEFINITION
Time series analysis involves analyzing and modeling data collected over time to identify patterns
and trends and to make predictions. Here, I'll demonstrate time series analysis using Python with
the pandas, matplotlib, and seaborn libraries. I'll use a simple example with a synthetic time
series dataset.

First, ensure you have the required libraries installed:

$ pip install pandas matplotlib seaborn

Now, let's create a synthetic time series dataset and perform basic time series analysis:
PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Generate a synthetic time series dataset
np.random.seed(42)
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(9), freq='D')
data = {'Date': days, 'Value': np.random.randint(10, 100, size=len(days))}
df = pd.DataFrame(data)

# Display the dataset
print("Time Series Dataset:")
print(df.head())

# Visualize the time series data
plt.figure(figsize=(10, 6))
sns.lineplot(x='Date', y='Value', data=df)
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()
OUTPUT:

This will generate a synthetic time series dataset and plot it using a line chart.

Now, let's perform some basic time series analysis and visualization techniques:


EX.NO:5.1 TREND ANALYSIS
DEFINITION:
Trend analysis examines data over time to identify long-term patterns, helping discern upward,
downward, or stable directional changes.
PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate a synthetic time series dataset
np.random.seed(42)
date_today = pd.to_datetime('2023-01-01')
days = pd.date_range(date_today, date_today + pd.to_timedelta(365, unit='D'), freq='D')
values = np.cumsum(np.random.randn(len(days)))
df = pd.DataFrame({'Date': days, 'Value': values})

# Plot the original time series data
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Value'], label='Original Data')
plt.title('Original Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

OUTPUT:
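
The program for this exercise plots only the raw series. One common way to expose the trend is to smooth the series with a rolling mean. The sketch below rebuilds the same synthetic series so that it runs on its own; the 30-day centred window and the column name `Trend` are illustrative assumptions rather than anything the manual specifies.

```python
import numpy as np
import pandas as pd

# Same synthetic random-walk series as in the program (366 daily values)
np.random.seed(42)
days = pd.date_range('2023-01-01', periods=366, freq='D')
values = np.cumsum(np.random.randn(len(days)))
df = pd.DataFrame({'Date': days, 'Value': values})

# Rolling mean over an assumed 30-day window, centred on each date.
# The first and last few rows have no full window and therefore stay NaN.
window = 30
df['Trend'] = df['Value'].rolling(window=window, center=True).mean()

print(df[['Date', 'Value', 'Trend']].iloc[window:window + 5])
```

Plotting `df['Trend']` on the same axes as `df['Value']` shows the smoothed direction of the series; a larger window gives a smoother but less responsive trend line.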


EX.NO : 5.2 SEASONAL DECOMPOSITION

DEFINITION:
Seasonal decomposition separates a time series into components: trend, seasonal, and residual,
aiding analysis by isolating underlying patterns and fluctuations.

PROGRAM
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate a synthetic time series dataset with a clear seasonal component
np.random.seed(42)
date_today = pd.to_datetime('2023-01-01')
days = pd.date_range(date_today, date_today + pd.to_timedelta(365, unit='D'), freq='D')

# Creating a seasonal component
seasonal_component = np.sin(2 * np.pi * np.arange(len(days)) / 365 * 7)

# Generating synthetic data with trend and seasonal components
trend_component = np.cumsum(np.random.randn(len(days)))
values = trend_component + 10 * seasonal_component

df = pd.DataFrame({'Date': days, 'Value': values})

# Plot the original time series data
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Value'], label='Original Data')
plt.title('Original Time Series Data with Seasonal Component')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

OUTPUT:
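
The program for this exercise generates and plots a seasonal series but does not split it into components. The following is a minimal sketch of a classical additive decomposition written with NumPy only. It is not the manual's method: the function name `decompose_additive`, the restriction to an odd period, and the 7-day demonstration series are all assumptions made for illustration. A library routine such as `seasonal_decompose` from statsmodels is the more usual choice in practice.

```python
import numpy as np


def decompose_additive(values, period):
    """Split a series into trend + seasonal + residual (additive model).

    Assumes an odd `period` so the moving-average window is centred.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles an odd period")
    values = np.asarray(values, dtype=float)
    n = len(values)
    half = period // 2

    # Trend: centred moving average spanning exactly one period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(values, np.ones(period) / period, mode='valid')

    # Seasonal: mean of the detrended values at each position in the cycle
    detrended = values - trend
    pattern = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    pattern -= pattern.mean()  # centre the pattern around zero
    seasonal = np.tile(pattern, n // period + 1)[:n]

    # Residual: what trend and seasonal do not explain (NaN at the edges)
    residual = values - trend - seasonal
    return trend, seasonal, residual


# Illustrative data: a linear trend plus a 7-day cycle (assumed, not the manual's series)
t = np.arange(140)
series = 0.1 * t + 5 * np.sin(2 * np.pi * t / 7)
trend, seasonal, residual = decompose_additive(series, period=7)

print("Seasonal pattern:", np.round(seasonal[:7], 2))
print("Largest residual:", np.nanmax(np.abs(residual)))
```

Because the demonstration series contains no noise, the residual is close to zero; on real data the residual carries whatever the trend and seasonal parts cannot explain.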


Result:

Thus, time series analysis was performed and various visualization techniques were applied
successfully.

EX.NO : 6 INTERACTIVE MAP VISUALIZATION

AIM:
To perform data analysis and representation on a map using various map data
sets with mouse rollover effect, user interaction, etc.

DEFINITION:
Performing data analysis and representation on a map involves visualizing geographic
data in a way that provides insights into spatial patterns. A popular tool for this is the
`folium` library in Python, which allows you to create interactive maps. Below is a basic
example using `folium` with mouse rollover effect and user interaction.

PROGRAM
import folium

# Load a GeoJSON dataset (you can use your own GeoJSON file or find one online)
geojson_data = "https://raw.githubusercontent.com/nvkelso/natural-earth-vector/master/geojson/ne_110m_admin_0_countries.geojson"

# Create a folium map centered at latitude 0, longitude 0 (a whole-world view)
m = folium.Map(location=[0, 0], zoom_start=2)

# Add GeoJSON data with mouse rollover effect
folium.GeoJson(
    geojson_data,
    name='geojson',
    style_function=lambda x: {'fillColor': 'green', 'color': 'black'},
    highlight_function=lambda x: {'fillColor': 'yellow', 'color': 'blue'},
    # The field must match a property key in the GeoJSON file; recent Natural Earth
    # files are assumed to use 'NAME' (older copies use 'name')
    tooltip=folium.features.GeoJsonTooltip(fields=['NAME'], labels=False,
                                           sticky=False)
).add_to(m)

# Add layer control for user interaction
folium.LayerControl().add_to(m)

# Save the map to an HTML file
m.save("interactive_map_with_interaction.html")

OUTPUT:

When you run the script, it will create an HTML file (in this case,
"interactive_map_with_interaction.html") in the same directory where you run the script.
Open this HTML file in a web browser to visualize the map with mouse rollover effects.


RESULT:

Thus, data analysis and representation on a map using various map data sets with
mouse rollover effect, user interaction, etc. was performed successfully.

EX.NO : 7 CARTOGRAPHIC VISUALIZATION

AIM:

To build cartographic visualization for multiple datasets involving various
countries of the world; states and districts in India etc.

DEFINITION:
Creating cartographic visualizations for multiple datasets involving various countries,
states, or districts often involves combining data with geographical boundaries. Here's a
Python script that utilizes `folium` to create visualizations for both world
countries and states in India, using a few marker locations for illustration.

PROGRAM

import folium

# Create a folium map centered around a specific location (India)
m = folium.Map(location=[20.5937, 78.9629], zoom_start=5)

# Add a marker for a few world countries
folium.Marker([37.7749, -122.4194], popup='USA').add_to(m)
folium.Marker([35.8617, 104.1954], popup='China').add_to(m)
folium.Marker([20.5937, 78.9629], popup='India').add_to(m)

# Add a marker for a few Indian states
folium.Marker([19.7515, 75.7139], popup='Maharashtra').add_to(m)
folium.Marker([27.0238, 74.2179], popup='Rajasthan').add_to(m)

# Save the map
m.save("world_and_india_visualization_simple.html")

PROCEDURE:
You need to run the provided code on your local machine to see the output. When you run the
script, it will generate an HTML file named "world_and_india_visualization_simple.html" in
the same directory where you saved the script.
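
When several datasets have to share one map, it often helps to hold the locations as GeoJSON rather than as individual marker calls. The sketch below converts the marker locations from the program into a GeoJSON FeatureCollection using only the standard library. The helper name `to_feature_collection` and the `name` property are illustrative assumptions; the one firm rule is that GeoJSON stores coordinates as [longitude, latitude], the reverse of the [latitude, longitude] order that `folium.Marker` takes.

```python
import json

# (label, latitude, longitude) taken from the markers in the program
places = [
    ("USA", 37.7749, -122.4194),
    ("China", 35.8617, 104.1954),
    ("India", 20.5937, 78.9629),
    ("Maharashtra", 19.7515, 75.7139),
    ("Rajasthan", 27.0238, 74.2179),
]


def to_feature_collection(points):
    """Build a GeoJSON FeatureCollection of Point features."""
    features = []
    for name, lat, lon in points:
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            # GeoJSON order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(places)
print(json.dumps(collection["features"][0], indent=2))
```

A dictionary of this shape can be passed to `folium.GeoJson` in place of a file path or URL, so country-level and state-level layers can be added to the same map separately.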

OUTPUT:


RESULT:

Thus, cartographic visualization for multiple datasets involving various
countries of the world; states and districts in India was built successfully.

EX.NO : 8 EDA ON WINE QUALITY DATA SET

AIM:

To perform EDA on the Wine Quality data set.

DEFINITION:
Exploratory Data Analysis (EDA) is a crucial step in understanding the characteristics of a
dataset. Let's perform EDA on a wine quality dataset. For this example, I'll use the Wine
Quality dataset available in the UCI Machine Learning Repository.

PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Wine Quality dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
wine_data = pd.read_csv(url, sep=';')

# Display the first few rows of the dataset
print(wine_data.head())

# Summary statistics
print(wine_data.describe())

# Distribution of Wine Quality
sns.countplot(x='quality', data=wine_data)
plt.title('Distribution of Wine Quality')
plt.show()

# Correlation heatmap
correlation_matrix = wine_data.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

# Pair plot for selected features
selected_features = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
                     'chlorides', 'quality']
sns.pairplot(wine_data[selected_features], hue='quality', markers='o')
# suptitle places the title above the whole grid rather than on the last subplot
plt.suptitle('Pairplot of Selected Features', y=1.02)
plt.show()

OUTPUT:


RESULT:

Thus, EDA on the Wine Quality data set was performed successfully.

EX.NO : 9 CASE STUDY ON A DATA SET TO PRESENT AN ANALYSIS REPORT

AIM:
To use a case study on a data set, apply the various EDA and visualization
techniques, and present an analysis report.

DEFINITION:

PROGRAM

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (replace 'college_students.csv' with your actual file path)
df = pd.read_csv('college_students.csv')

# Display the first few rows of the dataset
print(df.head())

# Check for missing values
print(df.isnull().sum())

# Summary statistics
print(df.describe())

# Univariate Analysis
# Histograms for numerical variables
df.plot(kind='hist', subplots=True, layout=(2, 2), figsize=(12, 10), bins=20,
        title='Histograms')
plt.show()

# Bar plot for categorical variables (e.g., Gender)
df['Gender'].value_counts().plot(kind='bar', color=['skyblue', 'pink'])
plt.title('Distribution of Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

# Bivariate Analysis
# Pair plot for numerical variables
sns.pairplot(df, hue='Gender', markers=['o', 's'], height=3)
plt.suptitle('Pair Plot of Numerical Variables by Gender', y=1.02)
plt.show()

# Correlation heatmap (numeric columns only; Gender and Grade are text)
correlation_matrix = df.corr(numeric_only=True)
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

# Insights and Analysis
# You can print and analyze key insights based on the EDA performed

# Save the visualizations or export the analysis report as needed
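
The "Insights and Analysis" step is left open in the program. One simple kind of insight for a report is a per-group summary, for example mean GPA and study hours by gender. The sketch below shows the idea with the standard library only, using the first five data rows of the CSV listing so that it runs without the file; the helper name `mean_by_group` is hypothetical, and the figures describe those five rows only, not the full dataset.

```python
import csv
import io
from collections import defaultdict

# First five data rows of the CSV listing, embedded so the sketch runs on its own
SAMPLE = """Age,Gender,GPA,StudyHours,Grade
24,Female,2.76,24,A
21,Male,2.53,25,D
22,Male,3.24,14,D
24,Female,2.77,12,F
20,Female,3.05,21,B
"""


def mean_by_group(rows, group_key, value_key):
    """Average a numeric column separately for each value of group_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += float(row[value_key])
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
gpa_by_gender = mean_by_group(rows, "Gender", "GPA")
hours_by_gender = mean_by_group(rows, "Gender", "StudyHours")

for gender in sorted(gpa_by_gender):
    print(f"{gender}: mean GPA {gpa_by_gender[gender]:.2f}, "
          f"mean study hours {hours_by_gender[gender]:.1f}")
```

With the full file loaded in pandas, `df.groupby('Gender')['GPA'].mean()` produces the same kind of summary in one line.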

CSV file (college_students.csv):
Age,Gender,GPA,StudyHours,Grade
24,Female,2.76,24,A
21,Male,2.53,25,D
22,Male,3.24,14,D
24,Female,2.77,12,F
20,Female,3.05,21,B


22,Female,3.62,29,F
22,Female,3.58,13,B
24,Female,2.96,25,F
19,Female,3.31,16,C
20,Female,3.26,22,C
24,Female,3.45,19,C
20,Male,2.88,16,C
20,Female,3.38,23,A
22,Female,3.97,14,C
21,Male,3.23,12,D
20,Female,3.86,20,D
23,Male,3.15,20,A
22,Male,3.03,27,C
19,Female,3.47,24,C
21,Male,3.5,21,C
23,Male,3.8,18,F
23,Male,2.85,19,B
19,Male,3.25,21,F
21,Female,3.36,26,B
22,Male,3.65,15,C
18,Female,2.57,16,C
21,Male,3.99,23,F
19,Male,3.2,22,F
23,Male,2.92,17,B
22,Male,3.83,19,D
21,Female,3.62,18,B
18,Female,3.93,27,F


18,Male,3.0,11,F
20,Male,3.33,14,A
20,Female,3.36,14,F
24,Male,3.97,15,A
19,Male,2.61,28,D
21,Male,2.96,17,B
21,Female,2.79,25,B
24,Female,2.9,22,A
23,Female,3.23,10,B
23,Male,3.06,29,F
24,Male,3.09,26,C
23,Female,3.77,16,A
20,Female,3.9,22,B
21,Female,2.61,13,A
24,Female,2.81,13,A
21,Male,3.51,15,C
18,Female,3.04,28,F
20,Male,2.88,21,A
22,Female,2.94,16,B
20,Male,2.98,19,D
24,Female,3.77,28,A
22,Female,2.7,16,A
18,Female,3.56,12,C
24,Female,3.33,22,F
19,Male,2.94,22,D
21,Female,3.13,27,B
18,Male,2.88,29,D
21,Male,3.42,17,B
23,Male,2.62,18,F
19,Male,2.51,16,B
19,Female,3.44,10,C
18,Male,2.79,12,C
19,Male,2.61,22,C
22,Male,3.1,26,C
19,Female,2.58,10,D
21,Female,3.83,15,F
21,Female,2.54,15,B
24,Female,3.37,21,B
21,Male,3.16,22,C
24,Male,3.51,22,C
21,Female,2.99,24,A
22,Male,2.73,25,F
24,Male,3.97,20,D
20,Male,3.76,14,B
23,Female,3.79,13,A
18,Female,2.88,12,A
21,Male,2.56,28,B
19,Female,2.95,29,D


21,Female,3.31,27,A
19,Female,2.99,24,A
23,Female,3.74,18,F
23,Female,2.91,26,D
23,Male,3.95,23,A
19,Female,3.19,24,D
21,Male,3.76,10,B
23,Male,2.79,12,C
22,Female,3.12,25,A
24,Male,3.55,20,F
19,Male,2.71,21,B
19,Male,2.7,19,D
21,Female,3.95,25,B
19,Female,3.57,17,A
19,Male,2.56,15,D
23,Female,3.1,21,C
21,Female,3.15,17,B
23,Female,3.62,13,A
24,Female,2.88,17,F
24,Male,2.78,27,D
23,Male,2.62,14,B
24,Female,3.14,18,B
21,Female,3.53,13,C
18,Male,2.59,26,C
23,Male,3.87,18,F
22,Female,3.16,10,F
22,Male,2.86,29,A
19,Female,2.64,22,A
24,Female,2.77,25,F
22,Male,3.9,22,F
19,Male,3.46,23,D
18,Female,3.28,12,C
21,Female,3.49,15,A
21,Male,3.15,27,C
21,Female,3.6,28,C
22,Male,2.57,14,F
18,Female,3.35,24,D
22,Male,2.74,11,B
24,Male,2.68,19,D
22,Male,3.01,27,D
18,Female,2.64,22,C
18,Female,2.64,14,D
24,Male,2.97,10,A
18,Female,3.97,10,C
18,Male,2.76,27,A
21,Male,2.53,24,B
24,Female,3.65,26,C
20,Female,3.71,20,B


20,Male,3.02,26,C
18,Female,3.2,22,F
20,Female,3.47,10,D
20,Male,2.57,11,F
18,Male,3.92,18,B
20,Female,3.83,12,D
22,Male,2.89,10,C
19,Male,2.52,25,D
24,Female,3.9,15,A
19,Male,3.25,26,D
18,Male,3.31,14,A
21,Female,3.53,14,D
24,Female,3.42,15,A
18,Male,3.92,12,B
21,Male,3.92,14,F
19,Female,3.8,14,C
18,Female,3.45,19,D
24,Female,3.7,19,F
24,Female,3.52,28,C
23,Male,3.36,26,C
22,Female,2.69,23,A
20,Female,3.72,18,B
21,Male,3.73,23,B
23,Female,3.44,10,F
20,Male,3.73,28,B
20,Female,3.48,22,D
18,Female,2.81,22,B
20,Female,2.91,13,F
22,Male,2.82,10,B
24,Male,3.07,26,D
23,Female,2.56,17,A
20,Male,3.43,11,F
18,Male,3.0,17,A
22,Male,3.48,16,A
19,Male,3.08,11,A
24,Male,3.52,12,C
24,Male,3.01,27,C
23,Female,2.89,21,A
24,Female,3.24,10,F
20,Male,3.54,21,D
18,Female,3.02,14,D
24,Female,3.9,26,B
24,Female,2.56,25,F
19,Female,3.13,24,C
19,Male,3.95,24,A
21,Female,3.32,14,B
22,Female,3.14,23,D
20,Male,3.35,11,C


24,Male,3.36,20,C
24,Female,3.6,28,A
18,Female,2.69,16,D
21,Female,2.88,15,F
22,Female,3.37,11,C
21,Female,3.8,15,A
23,Female,3.34,27,F
22,Male,2.86,11,D
24,Female,3.52,27,C
24,Female,3.61,24,F
22,Male,2.86,28,F
24,Male,3.07,11,F
20,Male,3.3,29,C
22,Male,3.24,15,C
21,Female,3.08,10,B
22,Female,2.95,24,D
24,Female,2.65,19,A
20,Female,2.58,28,F
20,Female,3.94,26,B
23,Female,3.77,14,A
21,Male,3.03,13,B
19,Female,3.94,19,C
19,Male,3.52,26,F
22,Male,3.22,19,A


OUTPUT:


RESULT:
Thus, a case study on a data set was carried out, the various EDA and visualization
techniques were applied, and an analysis report was presented successfully.

3 pages
(English) Why 99.6% of You Will Never Be RICH - The Education Trap (DownSub - Com)
No ratings yet
(English) Why 99.6% of You Will Never Be RICH - The Education Trap (DownSub - Com)
3 pages
Arbor Press Drawings PDF
No ratings yet
Arbor Press Drawings PDF
16 pages
2-Door Pit Latrine Design-1
No ratings yet
2-Door Pit Latrine Design-1
1 page
Entrepreneurship Development
100% (2)
Entrepreneurship Development
94 pages
David Osborn
No ratings yet
David Osborn
9 pages
Answers For Tutorial Chapter 4
No ratings yet
Answers For Tutorial Chapter 4
8 pages
Part I Question 05 Revised
No ratings yet
Part I Question 05 Revised
13 pages
WEEK 12 - Business Transactions and Their Analysis PT 2
No ratings yet
WEEK 12 - Business Transactions and Their Analysis PT 2
27 pages
TCS Advanced Quantitative and Reasoning Ability - Trainer Handout
No ratings yet
TCS Advanced Quantitative and Reasoning Ability - Trainer Handout
10 pages
2026 Junior High School - List A
No ratings yet
2026 Junior High School - List A
2 pages
PM Wbs Guide
No ratings yet
PM Wbs Guide
2 pages
WB Presentation
No ratings yet
WB Presentation
23 pages
PlasmaPro 80
No ratings yet
PlasmaPro 80
8 pages
Song Chan Relay 875B 1CC F C
No ratings yet
Song Chan Relay 875B 1CC F C
4 pages
Keyword Tool Export - Keyword Suggestions - Analyst Jobs
No ratings yet
Keyword Tool Export - Keyword Suggestions - Analyst Jobs
13 pages
Professional Summary: Vikneswary S.Salwam
No ratings yet
Professional Summary: Vikneswary S.Salwam
10 pages
17262/TPTY GNT EXP Sleeper Class (SL)
No ratings yet
17262/TPTY GNT EXP Sleeper Class (SL)
2 pages
Chemical Engineering Science: Christian Witz, Daniel Treffer, Timo Hardiman, Johannes Khinast
No ratings yet
Chemical Engineering Science: Christian Witz, Daniel Treffer, Timo Hardiman, Johannes Khinast
13 pages
A Performative Approach To Urban Informality: Learning From Mexico City and Rio de Janeiro
No ratings yet
A Performative Approach To Urban Informality: Learning From Mexico City and Rio de Janeiro
15 pages
Ghaziabad Lawyers / Ghaziabad Law Firms: Find A Lawyer Select India City
No ratings yet
Ghaziabad Lawyers / Ghaziabad Law Firms: Find A Lawyer Select India City
3 pages
DAF LF55 Workshop Service Repair Manual 2001-2006 Download PDF
No ratings yet
DAF LF55 Workshop Service Repair Manual 2001-2006 Download PDF
7 pages