EDA Expt – 01
Theory:
Exploratory Data Analysis (EDA) is an analysis approach that identifies general patterns in
the data, including outliers and features of the data that might be unexpected. EDA is an
important first step in any data analysis: it is the critical process of performing initial
investigations on data to discover patterns, spot anomalies, test hypotheses, and check
assumptions with the help of summary statistics and graphical representations. In the
automotive industry, data analytics enhances supply chain management by providing insights
into the entire supply chain, from raw material suppliers to finished-vehicle dealerships. This
information can be used to improve efficiency, reduce costs, and mitigate risks. An automobile
dataset typically includes information about various types of vehicles, such as cars, trucks,
and motorcycles, and may record the make, model, year, and manufacturer of each vehicle.
Types of EDA:
1. Univariate Analysis
2. Bivariate Analysis
3. Multivariate Analysis
1. Univariate Analysis
Definition: Focuses on analyzing a single variable at a time.
Purpose: To understand the variable's distribution, central tendency, and spread.
Techniques: Descriptive statistics (mean, median, mode, variance, standard deviation) and
visualizations (histograms, box plots, bar charts, pie charts).
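A minimal univariate sketch with pandas, computing the descriptive statistics listed above for a single variable. The `price` series and its values are hypothetical toy numbers chosen for easy arithmetic, not rows from the dataset.

```python
import pandas as pd

# Hypothetical toy values for one numeric variable (not real dataset rows)
price = pd.Series([8000, 10000, 10000, 12000, 15000], name="price")

# Central tendency
mean_price = price.mean()
median_price = price.median()
mode_price = price.mode().iloc[0]  # most frequent value (first one if tied)

# Spread: pandas uses the sample formulas (ddof=1) by default
variance_price = price.var()
std_price = price.std()

# count, mean, std, min, quartiles and max in one table
print(price.describe())
```

Here the mean is 11000, the median and mode are both 10000, and the sample variance is 7,000,000.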
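2. Bivariate Analysis
Definition: Focuses on the relationship between two variables.
Purpose: To detect associations, correlations, or differences between the two variables.
Techniques: Correlation coefficients, cross-tabulation.
Visualizations (scatter plots, line charts, grouped bar charts).
3. Multivariate Analysis
Definition: Analyzes three or more variables simultaneously.
Purpose: To understand how several variables interact with one another.
Techniques: Correlation matrices.
Visualizations (heatmaps, pair plots).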
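For bivariate and multivariate analysis, a common numeric summary is the Pearson correlation. The sketch below computes one pairwise correlation and a full correlation matrix (the matrix a heatmap typically visualizes). The frame is hypothetical: the column names are modeled on automobile attributes and the values are made up.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions
df = pd.DataFrame({
    "horsepower": [70, 90, 110, 150, 200],
    "price": [7000, 9000, 11000, 15000, 20000],
    "city-mpg": [35, 30, 26, 21, 17],
})

# Bivariate: Pearson correlation between two variables
hp_price_corr = df["horsepower"].corr(df["price"])

# Multivariate: pairwise correlation matrix of all (numeric) columns
corr_matrix = df.corr()
print(corr_matrix.round(2))
```

In this toy frame price is exactly 100 times horsepower, so their correlation is 1, while city-mpg falls as horsepower rises and correlates negatively with it.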
Common visualization techniques used in EDA:
1. Histograms
2. Scatter plots
3. Box plots
4. Bar charts
5. Line charts
6. Heatmaps
7. Pair plots
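A minimal matplotlib sketch of the plot types listed above, on hypothetical toy data (column names and values are assumptions). It uses the off-screen `Agg` backend so it runs without a display; in a notebook you would normally let the figures render inline or call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display required
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical toy data for illustration only
df = pd.DataFrame({
    "horsepower": [70, 90, 110, 150, 200, 95],
    "price": [7000, 9500, 11000, 16000, 21000, 9800],
    "body-style": ["sedan", "hatchback", "sedan", "sedan", "convertible", "hatchback"],
})

fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# 1. Histogram: distribution of one numeric variable
axes[0, 0].hist(df["price"], bins=5)
axes[0, 0].set_title("Histogram of price")

# 2. Scatter plot: relationship between two numeric variables
axes[0, 1].scatter(df["horsepower"], df["price"])
axes[0, 1].set_title("Horsepower vs price")

# 3. Box plot: median, quartiles and potential outliers
axes[0, 2].boxplot(df["price"].to_numpy())
axes[0, 2].set_title("Box plot of price")

# 4. Bar chart: frequency of each category
counts = df["body-style"].value_counts()
axes[1, 0].bar(counts.index.tolist(), counts.to_numpy())
axes[1, 0].set_title("Count by body style")

# 5. Line chart: price ordered by horsepower
ordered = df.sort_values("horsepower")
axes[1, 1].plot(ordered["horsepower"], ordered["price"], marker="o")
axes[1, 1].set_title("Price by horsepower (line)")

# 6. Heatmap: correlation matrix of the numeric columns
corr = df[["horsepower", "price"]].corr()
im = axes[1, 2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 2].set_xticks(range(len(corr.columns)))
axes[1, 2].set_xticklabels(corr.columns)
axes[1, 2].set_yticks(range(len(corr.columns)))
axes[1, 2].set_yticklabels(corr.columns)
axes[1, 2].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[1, 2])
fig.tight_layout()

# 7. Pair plot: scatter plots of every numeric pair, histograms on the diagonal
pair_axes = scatter_matrix(df[["horsepower", "price"]], figsize=(6, 6), diagonal="hist")
```

`scatter_matrix` is pandas' built-in pair plot; seaborn's `pairplot` is a common alternative if that library is available.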
Conclusion:
EDA is an essential process for data scientists to analyze the data before drawing final
conclusions. It helps data scientists identify errors and abnormal events, promotes a better
understanding of patterns within the data, and clarifies the role of each variable in the dataset.
EDA precedes confirmatory data analysis, the more formal and rigorous approach used to test
hypotheses, make predictions, and draw conclusions based on the data. EDA itself helps us
comprehend the underlying patterns, identify potential biases or confounding factors, and
communicate hidden insights in the data. The steps carried out in this experiment follow.
Data Loading
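The code of the loading cell is not preserved. A minimal sketch, assuming the data is a headerless CSV in the layout of the UCI Automobile dataset, where missing values are written as `?` and are kept as strings (which is what the cleaning code in the Missing Data step expects). The helper `load_automobile`, the short column list, and the in-memory toy rows standing in for a real file are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical subset of column names; the real file has more columns
COLUMNS = ["symboling", "normalized-losses", "make", "num-of-doors", "bore", "stroke", "price"]

def load_automobile(source, columns=COLUMNS):
    """Hypothetical helper: read a headerless CSV and assign column names.

    '?' placeholders are left as strings so later steps can detect them.
    """
    return pd.read_csv(source, header=None, names=columns)

# Toy rows standing in for a file on disk (values are made up)
sample = io.StringIO(
    "3,?,make-a,two,3.50,2.70,13000\n"
    "2,160,make-b,four,3.20,3.40,14000\n"
    "1,?,make-c,?,?,?,8000\n"
)
df_automobile = load_automobile(sample)
print(df_automobile.shape)
print(df_automobile.head())
```

With a real file you would pass its path instead of the `StringIO` object.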
Data Cleaning
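The Data Cleaning cell's code is not preserved either. One plausible first step, assuming missing values are encoded as the string `?`, is to count the placeholders per column to see which columns need imputation. The frame `df_example` below is a hypothetical toy example.

```python
import pandas as pd

# Hypothetical toy frame where '?' marks a missing value
df_example = pd.DataFrame({
    "normalized-losses": ["?", "160", "?", "150"],
    "num-of-doors": ["two", "four", "?", "four"],
    "price": ["13000", "?", "8000", "17000"],
})

# Count '?' placeholders in every column
missing_counts = (df_example == "?").sum()

# Show only the columns that actually contain missing values, most first
print(missing_counts[missing_counts > 0].sort_values(ascending=False))
```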
Missing Data
Fill the missing values of normalized-losses, price, horsepower, peak-rpm, bore, and stroke with
the respective column mean. Fill the missing values of the categorical number-of-doors column
with the column's mode, i.e. four.
# df_automobile is created in the Data Loading step; '?' marks a missing value.
# For each column: take the mean of the known values, then replace '?' with it.
df_temp = df_automobile[df_automobile['normalized-losses'] != '?']
normalised_mean = df_temp['normalized-losses'].astype(int).mean()
# astype(int) truncates the imputed mean to a whole number
df_automobile['normalized-losses'] = df_automobile['normalized-losses'].replace('?', normalised_mean).astype(int)

df_temp = df_automobile[df_automobile['bore'] != '?']
normalised_mean = df_temp['bore'].astype(float).mean()
df_automobile['bore'] = df_automobile['bore'].replace('?', normalised_mean).astype(float)

df_temp = df_automobile[df_automobile['stroke'] != '?']
normalised_mean = df_temp['stroke'].astype(float).mean()
df_automobile['stroke'] = df_automobile['stroke'].replace('?', normalised_mean).astype(float)
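The cell above imputes only normalized-losses, bore, and stroke, while the plan also covers price, horsepower, peak-rpm, and the number of doors. A compact sketch of the same mean/mode strategy applied to a list of columns, assuming `?` marks missing values. The helper `impute_automobile`, the toy frame, and the column name `num-of-doors` are hypothetical.

```python
import pandas as pd

def impute_automobile(df, numeric_cols, categorical_cols):
    """Hypothetical helper: return a copy in which '?' is replaced by the
    column mean (numeric columns) or the column mode (categorical columns)."""
    out = df.copy()
    for col in numeric_cols:
        values = pd.to_numeric(out[col], errors="coerce")  # '?' becomes NaN
        out[col] = values.fillna(values.mean())            # mean skips NaN
    for col in categorical_cols:
        known = out.loc[out[col] != "?", col]
        out[col] = out[col].replace("?", known.mode().iloc[0])
    return out

# Hypothetical toy data
df = pd.DataFrame({
    "price": ["10000", "?", "14000"],
    "horsepower": ["100", "120", "?"],
    "peak-rpm": ["5000", "?", "6000"],
    "num-of-doors": ["four", "?", "four"],
})
clean = impute_automobile(df, ["price", "horsepower", "peak-rpm"], ["num-of-doors"])
print(clean)
```

Unlike the cell above, this keeps imputed numeric columns as floats instead of truncating with `astype(int)`, and it leaves the input frame unchanged.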
Findings