
AIDS C04 Session 24


21CS2213RA

AI for Data Science

Session -24

Contents: Density diagrams, Mean, Standard Deviation, Median, Quantiles and correlations

The topics covered
• Representing statistical measures:
• Density diagrams
• Mean, Standard Deviation
• Median
• Quantiles
• Correlations
Density Plot
• A Density plot is a smoothed, continuous version of a histogram
estimated from the data.
• The most common form of estimation is known as kernel density
estimation.
• In this method, a continuous curve (the kernel) is drawn at every
individual data point and all of these curves are then added together
to make a single smooth density estimation.
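A minimal sketch of the kernel density idea described above, using only the standard library. The Gaussian kernel, the bandwidth of 1.0, and the helper names gaussian_kernel and kde are assumptions chosen for illustration; the summed curves are also divided by the number of points times the bandwidth so that the result integrates to about 1.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel curve."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, x, bandwidth):
    """Sum one kernel per data point at position x, then normalise."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)

random.seed(0)  # fixed seed so the sketch is reproducible
data = [random.gauss(10, 3) for _ in range(100)]
bandwidth = 1.0  # assumed smoothing value, not taken from the slides

step = 0.1
grid = [i * step for i in range(-100, 301)]  # x values from -10 to 30
density = [kde(data, x, bandwidth) for x in grid]

# A density estimate should enclose an area of roughly 1
area = sum(d * step for d in density)
print(f"area under the estimated curve: {area:.3f}")
```

A larger bandwidth widens each kernel and gives a smoother curve; a smaller one follows the individual data points more closely.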
Why Density Plot?
• It visualizes the distribution of data over a continuous interval or time
period.
• This chart is a variation of a Histogram that uses kernel smoothing to
plot values, allowing for smoother distributions by smoothing out the
noise.
• The peaks of a Density Plot help display where values are
concentrated over the interval.
• An advantage Density Plots have over Histograms is that they're better at
determining the distribution shape because they're not affected by the
number of bins used (each bar used in a typical histogram).
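The point about bins can be illustrated with a small sketch that counts the same data into different numbers of bins. The helper bin_counts and the bin counts 4 and 20 are hypothetical choices for illustration. A density plot has no bins, although its shape still depends on the smoothing (bandwidth) setting.

```python
import random

def bin_counts(data, n_bins, lo, hi):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        if lo <= value <= hi:
            # Clamp so that value == hi lands in the last bin
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts

random.seed(1)  # fixed seed so the sketch is reproducible
data = [random.gauss(10, 3) for _ in range(100)]
lo, hi = min(data), max(data)

coarse = bin_counts(data, 4, lo, hi)   # few wide bars
fine = bin_counts(data, 20, lo, hi)    # many narrow bars
print("4 bins: ", coarse)
print("20 bins:", fine)
```

Both lists describe the same 100 values, yet the apparent shape of the histogram differs between them.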
Example of Density Plot
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Generate data: 100 samples from a normal distribution (mean 10, std 3)
data = np.random.normal(10, 3, 100)

# Fit the kernel density estimate; bw_method=0.5 is the smoothing parameter
density = gaussian_kde(data, bw_method=0.5)

# Specify the limits of the x-axis
x_vals = np.linspace(0, 20, 200)

plt.plot(x_vals, density(x_vals))
plt.show()
Statistical measures
• Statistics, in general, is the practice of collecting, tabulating,
and interpreting numerical data.
• With statistics, we can see how data can be used to solve complex
problems.
Descriptive Statistics

• Descriptive statistics generally means describing the data with the
help of some representative methods like charts, tables, Excel files,
etc.
• The data is described in such a way that it can express some
meaningful information that can also be used to find some future
trends.
• Describing and summarizing a single variable is called univariate
analysis.
• Describing a statistical relationship between two variables is
called bivariate analysis.
• Describing the statistical relationship between multiple variables is
called multivariate analysis.
Mean
• It is the sum of the observations divided by the total number of
observations, also known as the average.
• The mean() function returns the mean or average of the data passed as its
argument. If the passed argument is empty, StatisticsError is raised.
• Example:

# mean()
import statistics

# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]

# using mean() to calculate the average of the list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))
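The session contents also list standard deviation, which measures how far the values spread around the mean. A minimal sketch using the same statistics module and the same list as the mean example: pstdev() treats the data as a whole population, stdev() treats it as a sample. The variable names are chosen for illustration.

```python
import statistics

li = [1, 2, 3, 3, 2, 2, 2, 1]  # same list as in the mean example

mean = statistics.mean(li)
population_sd = statistics.pstdev(li)  # divides by n
sample_sd = statistics.stdev(li)       # divides by n - 1

print("mean:", mean)
print("population standard deviation:", round(population_sd, 4))
print("sample standard deviation:", round(sample_sd, 4))

# An empty argument raises StatisticsError, as noted for mean()
try:
    statistics.mean([])
except statistics.StatisticsError as err:
    print("empty data:", err)
```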
median()
• The median() function is used to calculate the median, i.e. the middle
value of the sorted data (the average of the two middle values when the
count is even). If the passed argument is empty, StatisticsError is raised.
Calculating Median
• Step 1: Arrange the data in increasing order and then find the middle
value.
• Step 2: Calculate the median using the function.
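The two steps can be sketched as follows. The lists odd_data and even_data and the helper median_by_sorting are hypothetical, made up for illustration. The last lines touch on quantiles, which the session contents list: statistics.quantiles() with n=4 returns the three quartile cut points, and the middle one is the median.

```python
import statistics

def median_by_sorting(values):
    """Step 1 done by hand: sort, then pick the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

odd_data = [7, 3, 9, 1, 5]   # hypothetical values
even_data = [7, 3, 9, 1]     # hypothetical values

# Step 2: the library function should agree with the manual result
print(median_by_sorting(odd_data), statistics.median(odd_data))
print(median_by_sorting(even_data), statistics.median(even_data))

# Quartiles split the sorted data into four parts
quartiles = statistics.quantiles(odd_data, n=4)
print("quartiles:", quartiles)
```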
Mode
• Mode is the value that occurs most often in the data set. In the example
data set, 150 occurs twice, so it is the mode.
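A short sketch of finding the mode in Python. The list values is hypothetical: the slide's original data set is not reproduced here, so only the fact that 150 appears twice is carried over.

```python
import statistics
from collections import Counter

values = [120, 150, 130, 150, 170]  # hypothetical data; 150 appears twice

# Count occurrences by hand, then compare with the library function
counts = Counter(values)
most_common_value, frequency = counts.most_common(1)[0]
print(most_common_value, "occurs", frequency, "times")
print("statistics.mode:", statistics.mode(values))

# multimode() returns every value tied for the highest count
tied = statistics.multimode([1, 1, 2, 2, 3])
print("tied modes:", tied)
```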
Correlations and Heatmap
• A correlation heatmap is a graphical representation of a correlation
matrix representing the correlation between different variables.
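• The correlation coefficient can take any value from -1 to 1: values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little or no linear relationship.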
• Correlation between two random variables or bivariate data does not
necessarily imply a causal relationship.
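To make the -1 to 1 range concrete, here is a minimal sketch of the Pearson correlation coefficient computed by hand. The helper pearson and the lists hours, rising, falling, and noisy are hypothetical, introduced only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]        # hypothetical variable
rising = [2, 4, 6, 8, 10]      # moves exactly with hours
falling = [10, 8, 6, 4, 2]     # moves exactly against hours
noisy = [2, 1, 4, 3, 5]        # loosely follows hours

print("rising: ", pearson(hours, rising))
print("falling:", pearson(hours, falling))
print("noisy:  ", pearson(hours, noisy))
```

A high coefficient here says only that the two lists move together; it does not show that one causes the other.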
How to create seaborn correlation heatmap
Steps:
• Install seaborn package
• Ex: pip install seaborn
• Import all required modules
• Import the file where your data is stored
• Plot a heatmap
• Display it using matplotlib
Example
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# import file with data
data = pd.read_csv("data.csv")

# correlation matrix of the numeric columns
# (numeric_only requires pandas 1.5 or later)
corr = data.corr(numeric_only=True)
print(corr)
dataplot = sns.heatmap(corr, cmap="YlGnBu", annot=True)

# displaying heatmap
plt.show()
Thank you
