Graphical Representation
Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.
Guidelines:
1. An assignment submission is considered complete only when correct, executable code is
submitted along with documentation explaining the method and results. A submission missing
either of these is invalid and will not be counted as correct.
2. Ensure that you submit your assignments correctly. Resubmission is not allowed.
3. After submission, you can evaluate your work against the answer keys provided (available
only after you submit).
Make a table as shown above and provide information about each feature, such as its data
type and its relevance to model building. If a feature is not relevant, give the reason along
with a description of the feature.
Problem Statements:
1. Univariate plots for UNIV data (Plot must have Title, X & Y label)
A) Plot a numerical column with 3 different plots.
B) What are bin parameters? What are the methods to define the number of bins and
bin sizes?
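#Plotting
# Assumes df is the UNIV data already loaded as a pandas DataFrame
import matplotlib.pyplot as plt
#Histogram
plt.hist(df.workex)
plt.title('Histogram of workex')
plt.xlabel('workex')
plt.ylabel('Frequency')
plt.show()
#Boxplot
plt.boxplot(df.workex)
plt.title('Boxplot of workex')
# A vertical boxplot puts the values on the y-axis
plt.ylabel('workex')
plt.show()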
B) Bins are used to divide the range of data into intervals in a histogram. Choosing an
appropriate number of bins and bin size is important for effectively visualizing the data. There
are several methods to determine the number of bins:
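Square Root Choice: number of bins k = ceil(sqrt(n)), where n is the number of observations.
Sturges' Formula: number of bins k = ceil(log2(n)) + 1.
Scott's Rule: bin width h = 3.49 * s / n^(1/3), where s is the sample standard deviation.
Freedman-Diaconis Rule: bin width h = 2 * IQR / n^(1/3), where IQR is the interquartile range.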
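The four rules listed above can be compared on a small sample. The sketch below is a minimal illustration using only the Python standard library; the function names `bin_suggestions` and `width_to_bins` and the `workex` values are hypothetical, and the formulas are the commonly quoted textbook forms of each rule (conventions differ slightly between references).

```python
import math
import statistics

def bin_suggestions(values):
    """Return the bin count or bin width suggested by four common rules."""
    n = len(values)
    sigma = statistics.stdev(values)               # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    return {
        "sqrt_bins": math.ceil(math.sqrt(n)),          # Square Root Choice
        "sturges_bins": math.ceil(math.log2(n)) + 1,   # Sturges' Formula
        "scott_width": 3.49 * sigma / n ** (1 / 3),    # Scott's Rule
        "fd_width": 2 * iqr / n ** (1 / 3),            # Freedman-Diaconis Rule
    }

def width_to_bins(values, width):
    """Convert a bin width into a bin count over the observed data range."""
    return math.ceil((max(values) - min(values)) / width)

# Hypothetical work-experience values, for illustration only
workex = [0, 6, 12, 12, 18, 24, 24, 30, 36, 36, 42, 48, 54, 60, 72, 96]

suggestions = bin_suggestions(workex)
scott_bins = width_to_bins(workex, suggestions["scott_width"])
fd_bins = width_to_bins(workex, suggestions["fd_width"])
print(suggestions, scott_bins, fd_bins)
```

The first two rules depend only on the sample size, while Scott's and Freedman-Diaconis also use the spread of the data, so they react to how dispersed the column is. A resulting count can be passed to `plt.hist` through its `bins` argument.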
C) Density plots, particularly those created using kernel density estimation (KDE), can extend
beyond the range of the data. KDE estimates the underlying probability density function by
placing a smooth kernel on every observation, so the curve has tails that reach past the
smallest and largest observed values. The density plot does not represent the actual data; it
is a smoothed estimate of the data distribution.
This extension is a feature of the KDE smoothing process, not an error. To focus on the
observed range, limit the x-axis of the plot to the minimum and maximum of the data.
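To see why the curve reaches past the data, the following sketch builds a Gaussian KDE by hand and evaluates it below the smallest observation. It is a simplified illustration, not the implementation used by seaborn or any other plotting library; the `gmat` values and the bandwidth of 20 are assumptions chosen for the example.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE built on `data`."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, each centred on that observation
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Hypothetical GMAT scores, for illustration only
gmat = [620, 640, 660, 680, 700, 710, 720, 740]

density = gaussian_kde(gmat, bandwidth=20.0)
below_min = density(min(gmat) - 20)  # one bandwidth below the smallest score
inside = density(700)                # a point well inside the data range

print(f"density below the minimum: {below_min:.5f}")
print(f"density inside the range:  {inside:.5f}")
```

The estimate below the minimum is small but not zero, which is the tail seen on the plot. When plotting with seaborn, `kdeplot` accepts a `cut` parameter, and `cut=0` truncates the curve at the data limits; `plt.xlim(min_value, max_value)` restricts the visible axis instead.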
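D) value_counts = categorical_data.value_counts()
# Take the labels from value_counts so each bar is paired with its own count
plt.bar(value_counts.index, value_counts.values)
plt.xlabel('Category')
plt.ylabel('Count')
plt.show()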
#Plotting
plt.scatter(df.age,df.Salaries)
plt.title('Scatterplot of numerical columns')
plt.xlabel('age')
plt.ylabel('Salaries')
plt.grid()
plt.show()
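B) #Bargraph
# Categorical column on the x-axis, mean of the numerical column as bar height
mean_salary = df.groupby('Sex')['Salaries'].mean()
plt.bar(mean_salary.index, mean_salary.values, align='center')
plt.title('Bargraph of numerical and categorical columns')
plt.xlabel('Sex')
plt.ylabel('Mean Salaries')
plt.show()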
#line graph
import seaborn as sns
sns.lineplot(data=df, x='Salaries', y='age')
plt.show()
C) The major difference between a bar chart and a histogram is that the bars of a bar chart are
separated by gaps, because each bar stands for a distinct category, whereas the bars of a
histogram are adjacent, because the bins cover a continuous numerical range. Both charts are
useful for summarizing a large amount of data.
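The difference is easiest to see in the data each chart is drawn from. The sketch below, using only the standard library, computes what a bar chart and a histogram would display; the helper names `bar_chart_data` and `histogram_data` and the sample `sex` and `age` values are hypothetical and exist only for this illustration.

```python
from collections import Counter

def bar_chart_data(categories):
    """Counts per discrete category: one separate bar per label."""
    return dict(Counter(categories))

def histogram_data(values, n_bins):
    """Counts per contiguous, equal-width interval covering the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a new one
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical columns, for illustration only
sex = ["M", "F", "F", "M", "F"]
age = [22, 25, 27, 31, 34, 35, 41, 48]

bars = bar_chart_data(sex)
edges, counts = histogram_data(age, n_bins=3)

print(bars)    # one count per category label
print(edges)   # shared bin boundaries, so neighbouring bars touch
print(counts)  # one count per interval
```

The bar chart has no ordering or distance between its labels, so its bars can be reordered freely. The histogram's bins share their edges, which is why reordering them or leaving gaps would misrepresent the data.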
sns.pairplot(numerical_column)
4. Plot Skewness & Probability distribution for each column of marks data. (Hist, box, density)
A) What is a normal distribution, and what is the relationship between mean,
median & mode?
B) Which data variables are positively skewed, and what is the relationship
between mean, median & mode?
C) Which data variables are negatively skewed, and what is the relationship between
mean, median & mode?
D) What are the distinctive differences between skewness and distribution?
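#Plotting
cols = ['workex', 'gmat']
# Label each series explicitly so the legend matches the plotted columns
plt.hist(df[cols], label=cols)
plt.legend()
plt.show()
plt.boxplot(df[cols])
plt.xticks([1, 2], cols)
plt.show()
sns.kdeplot(data=df[cols])
plt.show()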
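The relationship between mean, median and mode asked about in parts A to C can be checked numerically. The sketch below uses three small made-up samples (not the marks data) and a moment-based skewness, which is one common definition among several; the function names are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)

def summarize(values):
    """Collect the statistics that the three questions compare."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "skew": skewness(values),
    }

# Made-up samples, for illustration only
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]
left_skewed = [1, 5, 7, 8, 9, 9, 10, 10, 10]

sym = summarize(symmetric)
pos = summarize(right_skewed)
neg = summarize(left_skewed)

for name, stats in [("symmetric", sym), ("positive", pos), ("negative", neg)]:
    print(name, stats)
```

In these samples the symmetric case has mean, median and mode equal, the positively skewed case has mean > median > mode, and the negatively skewed case has mean < median < mode. This ordering is the usual rule of thumb for unimodal data; real columns can deviate from it, so it is worth confirming against the plots.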