
Eda 4 5

EDA-PYTHON

Uploaded by

arafaths062
Copyright
© © All Rights Reserved

EDA LAB

UNIT-II

4. Generate the following charts for a dataset.

a) Polar Chart b) Histogram c) Lollipop chart
a) Polar Chart
A polar chart is a diagram that is plotted on a polar axis. Its coordinates are
angle and radius, as opposed to the Cartesian system of x and y coordinates.
Sometimes, it is also referred to as a spider web plot.

# Let's assume you have five courses in your academic year:
subjects = ["C programming", "Numerical methods", "Operating system",
            "DBMS", "Computer Networks"]
# And you planned to obtain the following grades in each subject.
# The sixth value repeats the first so that the plotted outline closes:
plannedGrade = [90, 95, 92, 68, 68, 90]
# However, after your final examination, these are the grades you got:
actualGrade = [75, 89, 89, 80, 80, 75]

The first significant step is to initialize the spider plot. This can be done by
setting the figure size and polar projection.

# Import the required libraries:
import numpy as np
import matplotlib.pyplot as plt
# Prepare the dataset and set up theta (one angle per plotted point):
theta = np.linspace(0, 2 * np.pi, len(plannedGrade))
# Initialize the plot with the figure size and polar projection:
plt.figure(figsize=(10, 6))
plt.subplot(polar=True)
# Get the grid lines to align with each of the subject names:
(lines, labels) = plt.thetagrids(range(0, 360, int(360 / len(subjects))),
                                 subjects)
# Use the plt.plot method to plot the graph and fill the area under it:
plt.plot(theta, plannedGrade, label='Planned Grades')
plt.fill(theta, plannedGrade, 'b', alpha=0.2)
# Now, we plot the actual grades obtained:
plt.plot(theta, actualGrade, label='Actual Grades')
# We add a legend and a comprehensible title to the plot.
# Labelling each line in plt.plot keeps the filled area out of the legend:
plt.legend(loc=1)
plt.title("Plan vs Actual grades by Subject")
# Finally, we show the plot on the screen:
plt.show()
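
The grade lists above carry six values for five subjects because the last value repeats the first to close the outline. A minimal sketch of building that closed loop programmatically is given below. The helper name close_loop and the variable names are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

subjects = ["C programming", "Numerical methods", "Operating system",
            "DBMS", "Computer Networks"]
# One grade per subject (the first five values of the lists used above).
planned = [90, 95, 92, 68, 68]
actual = [75, 89, 89, 80, 80]


def close_loop(values):
    """Hypothetical helper: repeat the first value at the end of the list."""
    values = list(values)
    return values + values[:1]


# One angle per subject, evenly spaced, excluding 2*pi itself.
theta = np.linspace(0, 2 * np.pi, len(subjects), endpoint=False)

theta_closed = close_loop(theta)
planned_closed = close_loop(planned)
actual_closed = close_loop(actual)

print(np.degrees(theta).round(1))  # angle of each subject in degrees
print(planned_closed)
print(actual_closed)
```

The closed lists can then be passed to plt.plot and plt.fill on a polar axis in place of the hand-written six-value lists.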

b) Histogram
Histogram plots are used to depict the distribution of any continuous variable.
These types of plots are very popular in statistical analysis.

import numpy as np
import matplotlib.pyplot as plt
# Draw 250 samples from a normal distribution with mean 170 and standard deviation 10:
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
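
To see the numbers behind the bars, the same binning can be computed without drawing anything. The sketch below assumes 10 equal-width bins, which is matplotlib's default when plt.hist is called without a bins argument. The seed value is an arbitrary choice made only so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the run is repeatable
x = rng.normal(loc=170, scale=10, size=250)

# np.histogram returns the per-bin counts and the bin edges without plotting.
counts, edges = np.histogram(x, bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {n}")
```

Every sample falls into exactly one bin, so the counts add up to the sample size.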
c) Lollipop chart
A lollipop chart is a variation of the bar chart in which each thick bar is
replaced by a thin line with a dot ("o") at its end. Lollipop charts are
preferred when there are many values to show, because a large number of bars
tends to merge into a cluttered block.

Lollipop charts can be built in Python with the matplotlib library, as shown in
the example below. The strategy here is to use the stem() function.
A lollipop plot displays each element of a dataset as a segment and a circle.

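# Create a dataframe
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'group': list(map(chr, range(65, 85))),
                   'values': np.random.uniform(size=20)})

# Reorder it following the values:
ordered_df = df.sort_values(by='values')
my_range = range(1, len(df.index) + 1)

# Make the plot. Passing my_range as the x positions keeps the stems
# aligned with the tick labels set below:
plt.stem(my_range, ordered_df['values'])
plt.xticks(my_range, ordered_df['group'])
plt.show()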

CONCLUSIONS:

EDA LAB
UNIT-II

5. Case Study: Perform Exploratory Data Analysis with Personal Email Data

Code:
import pandas as pd  # Python library for data analysis and data frames
import numpy as np   # Numerical Python library for linear algebra computations
pd.set_option('display.max_columns', None)  # display all columns

# Visualisation libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# (the line above is a Jupyter/IPython magic; omit it in a plain Python script)

import warnings
warnings.filterwarnings("ignore")  # To prevent kernel from showing any warning

# Import the dataset below from Kaggle
train_df = pd.read_csv('train_F3fUq2S.csv')
train_df.sample(5)

train_df.shape
train_df.info()

train_df.isnull().sum()

train_df.describe()
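
The Kaggle file is not reproduced here, so the following sketch uses a tiny made-up frame to illustrate what the inspection calls above return. The name demo_df, the columns n_words and n_links, and all values are invented for illustration. Only click_rate is a column name taken from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df; names and values are invented.
demo_df = pd.DataFrame({
    "n_words": [76, 54, np.nan, 88],
    "n_links": [3, 1, 4, np.nan],
    "click_rate": [0.10, 0.70, 0.02, np.nan],
})

missing = demo_df.isnull().sum()   # number of missing values per column
summary = demo_df.describe()       # count, mean, std, min, quartiles, max

print(demo_df.shape)               # (rows, columns)
print(missing)
print(summary.loc[["count", "mean"]])
```

Note that describe() skips missing values, so its count row can be lower than the number of rows reported by shape.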

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Select only numeric columns from the DataFrame


numeric_df = train_df.select_dtypes(include=['number'])

# Calculate correlation for numeric columns only


corr = numeric_df.corr()

plt.figure(figsize=(20, 8))
sns.heatmap(corr, cmap="YlGnBu", annot=True)
plt.show()
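
A heatmap with many columns can be hard to read, so it may help to rank the features by how strongly they correlate with the target. The sketch below does this on synthetic data. The column names strong_feature and noise_feature and the generated values are assumptions made for illustration, with click_rate standing in for the target column of the exercise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
strong = rng.normal(size=n)

# Synthetic stand-in for the numeric columns of train_df.
demo_df = pd.DataFrame({
    "strong_feature": strong,
    "noise_feature": rng.normal(size=n),
    "click_rate": 2.0 * strong + rng.normal(scale=0.1, size=n),
})

corr = demo_df.corr()

# Correlation of every feature with the target, strongest first by magnitude.
target_corr = corr["click_rate"].drop("click_rate")
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)

print(ranked)
```

Sorting by absolute value keeps strongly negative correlations near the top while the printed values retain their sign.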
# Dropping redundant columns
train_df.drop(['campaign_id', 'is_timer'], axis=1, inplace=True)

train_df.rename(columns={'is_image': 'no_image', 'is_quote': 'no_quote',
                         'is_emoticons': 'no_emoticons'}, inplace=True)

BIVARIATE ANALYSIS
# Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis.
# It involves the analysis of two variables (often denoted as X, Y), for the
# purpose of determining the empirical relationship between them.
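
One common way to quantify the relationship between two numeric variables is the Pearson correlation coefficient. The sketch below computes it from its definition and compares the result with pandas. The X and Y values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up paired observations of two variables X and Y.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.1, 3.9, 6.2, 8.0, 9.8])

# Pearson r: sum of products of deviations, divided by the square root
# of the product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient (Pearson is its default method).
r_pandas = x.corr(y)

print(r_manual, r_pandas)
```

A value close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.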

# num_cols and cat_cols are not defined in this listing; they are assumed to be
# lists of the numeric and categorical column names of train_df, set up beforehand.
_, ax1 = plt.subplots(2, 2, figsize=(25, 20))
for i, col in enumerate(num_cols):
    if col != 'click_rate':
        sns.scatterplot(x=col, y='click_rate', data=train_df, ax=ax1[i // 2, i % 2])
plt.show()

_, ax1 = plt.subplots(8, 2, figsize=(25, 50))
for i, col in enumerate(cat_cols):
    sns.barplot(x=col, y='click_rate', data=train_df, ax=ax1[i // 2, i % 2])
plt.show()

_, ax1 = plt.subplots(8, 2, figsize=(25, 50))
for i, col in enumerate(cat_cols):
    sns.boxplot(x=col, y='click_rate', data=train_df, ax=ax1[i // 2, i % 2])
plt.show()
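
By default, each bar in a seaborn bar plot shows the mean of the target within a category, and each box in a box plot spans the first to third quartile with a line at the median. The sketch below computes those numbers directly on a tiny made-up frame. The column name category and all values are assumptions made for illustration.

```python
import pandas as pd

# Made-up data: one categorical column and the target.
demo_df = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "click_rate": [0.10, 0.20, 0.30, 0.40, 0.40, 1.00],
})

grouped = demo_df.groupby("category")["click_rate"]

# Bar height: mean click_rate per category.
means = grouped.mean()

# Box edges and median: 25th, 50th and 75th percentiles per category.
quartiles = grouped.quantile([0.25, 0.5, 0.75]).unstack()

print(means)
print(quartiles)
```

Comparing the two tables shows why both plots are useful: a single large value pulls the mean of category B well above its median.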
CONCLUSIONS:
