Eda 4 5
UNIT-II
The first significant step is to initialize the spider plot. This can be done by
setting the figure size and polar projection.
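As a minimal sketch of this initialisation step, the snippet below creates a figure with a polar projection and draws a single closed polygon on it. The category names and scores are hypothetical and serve only as an illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical categories and scores, for illustration only
categories = ["A", "B", "C", "D", "E"]
values = [4, 3, 5, 2, 4]

# One angle per category, evenly spaced around the circle
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()

# Repeat the first point so that the polygon closes
values_closed = values + values[:1]
angles_closed = angles + angles[:1]

# Initialise the spider plot: set the figure size and the polar projection
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, polar=True)

# Draw the outline and fill the enclosed area
ax.plot(angles_closed, values_closed, linewidth=1)
ax.fill(angles_closed, values_closed, alpha=0.3)

# Label each axis with its category name
ax.set_xticks(angles)
ax.set_xticklabels(categories)
```

In a notebook with %matplotlib inline the figure is displayed automatically; in a script, call plt.show() at the end.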
b) Histogram
Histogram plots are used to depict the distribution of any continuous variable.
These types of plots are very popular in statistical analysis.
# Create a dataframe
import numpy as np  # needed for np.random.uniform
import pandas as pd
df = pd.DataFrame({'group': list(map(chr, range(65, 85))),
                   'values': np.random.uniform(size=20)})
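A histogram of a continuous column such as 'values' could be drawn as sketched below. The fixed seed, the number of bins and the axis labels are assumptions chosen for illustration; they are not taken from the lab sheet.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild a small dataframe of 20 uniform random values (seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({"values": rng.uniform(size=20)})

# Plot the distribution of the continuous variable in 5 equal-width bins
fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, _ = ax.hist(df["values"], bins=5, edgecolor="black")

ax.set_xlabel("values")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of values")
```

The bin count controls how coarse the picture of the distribution is, so it is worth trying a few values of bins on real data.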
CONCLUSIONS:
EDA LAB
5. Case Study: Perform Exploratory Data Analysis with Personal Email Data
Code:
import pandas as pd  # Python library for data analysis and data frames
import numpy as np  # Numerical Python library for linear algebra computations
pd.set_option('display.max_columns', None)  # display all columns
# Visualisation libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")  # prevent the kernel from showing any warnings
# Download the dataset from Kaggle before running this cell
train_df = pd.read_csv('train_F3fUq2S.csv')
train_df.sample(5)
train_df.shape
train_df.info()
train_df.isnull().sum()
train_df.describe()
# Assumption: corr is the correlation matrix of the numeric columns of train_df
# (numeric_only requires pandas >= 1.5)
corr = train_df.corr(numeric_only=True)
plt.figure(figsize=(20, 8))
sns.heatmap(corr, cmap="YlGnBu", annot=True)
plt.show()
# Drop redundant columns
train_df.drop(['campaign_id', 'is_timer'], axis=1, inplace=True)
train_df.rename(columns={'is_image': 'no_image', 'is_quote': 'no_quote', 'is_emoticons': 'no_emoticons'}, inplace=True)
BIVARIATE ANALYSIS
# Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis.
# It involves the analysis of two variables (often denoted as X, Y), for the purpose of
# determining the empirical relationship between them.
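To make the idea concrete, the sketch below measures and plots the relationship between two variables X and Y. The data are synthetic (generated here with an assumed linear relationship plus noise) and are not the columns of the email dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data: Y depends linearly on X plus random noise (assumption)
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)
demo_df = pd.DataFrame({"X": x, "Y": y})

# Pearson correlation coefficient between the two variables
r = demo_df["X"].corr(demo_df["Y"])

# Scatter plot to inspect the empirical relationship visually
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(demo_df["X"], demo_df["Y"], alpha=0.6)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title(f"Y against X (Pearson r = {r:.2f})")
```

The same two steps, a correlation coefficient and a scatter plot, can be applied to any pair of numeric columns in train_df.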