SQL Notes

The document discusses descriptive statistics and methods for summarizing data, including measures of central tendency like mean, median, and mode. It also covers measures of dispersion such as range and variance. Additionally, it discusses concepts like skewness, different visual plots like histograms and box plots, correlation, hypothesis testing, model evaluation, and association rule mining. Examples are provided to illustrate key statistical concepts.

Uploaded by

Preeti
Copyright
© All Rights Reserved

Descriptive Statistics:

Descriptive statistics are methods of summarizing and organizing data to provide meaningful
insights. They help in understanding the main features of a dataset.

https://www.analyticsvidhya.com/blog/2021/06/descriptive-statistics-a-beginners-guide/

Measures of Central Tendency:

https://byjus.com/maths/central-tendency/

Median:

Definition: The middle value when the data is sorted (for an even number of values, the
average of the two middle values). It is far less affected by extreme values than the mean.
Example: Median of {10, 15, 20, 25, 30} = 20

Mode:

Definition: The most frequently occurring value in a dataset.
Example: Mode of {10, 15, 20, 25, 30, 20} = 20
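
The median and mode examples above can be checked with Python's standard `statistics` module. This is only a sketch: the extra value 1000 is a hypothetical outlier added to illustrate how the mean and the median react to an extreme value.

```python
from statistics import mean, median, mode

data = [10, 15, 20, 25, 30]
with_outlier = data + [1000]  # hypothetical extreme value, not from the notes

print(median(data))       # 20
print(mode(data + [20]))  # 20, because 20 now occurs twice

# The mean jumps from 20 to roughly 183.3, while the median only moves to 22.5.
print(mean(data), mean(with_outlier))
print(median(data), median(with_outlier))
```

With an even number of values, `median` averages the two middle values, which is why the second median is 22.5 rather than an element of the list.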

Measures of Dispersion:

Range:

Definition: The difference between the maximum and minimum values in a dataset.
Example: Range of {10, 15, 20, 25, 30} = 30 - 10 = 20
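
The range calculation can be written as a small helper. The function name `value_range` is an illustrative choice, not something defined in the notes.

```python
def value_range(values):
    """Return the maximum minus the minimum of a non-empty dataset."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


data = [10, 15, 20, 25, 30]
print(value_range(data))  # 30 - 10 = 20
```

Because the range depends only on the two most extreme values, a single outlier can change it drastically.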
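
Variance:

Definition: The average of the squared deviations from the mean. It measures how far the
values spread out around the mean.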
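
As a sketch, assuming the standard definition of variance (the average squared deviation from the mean), the calculation for the same example dataset looks like this. Both the population form (divide by n) and the sample form (divide by n - 1) are shown, since the notes do not say which one is intended.

```python
from statistics import mean, pvariance, variance

data = [10, 15, 20, 25, 30]

m = mean(data)                               # 20
squared_devs = [(x - m) ** 2 for x in data]  # [100, 25, 0, 25, 100]

population_var = sum(squared_devs) / len(data)    # 250 / 5 = 50.0
sample_var = sum(squared_devs) / (len(data) - 1)  # 250 / 4 = 62.5

# The standard library gives the same results.
print(population_var, pvariance(data))
print(sample_var, variance(data))
```

The square root of the variance is the standard deviation, which is expressed in the same units as the data.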

Skewness:
Definition: A measure of the asymmetry of a probability distribution about its mean.
Positive Skewness: Right-skewed (tail on the right side is longer).
Negative Skewness: Left-skewed (tail on the left side is longer).
Example: A positively skewed distribution might represent income data where a few
individuals have very high incomes.
https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/
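
One common way to put a number on skewness is the moment-based coefficient (third central moment divided by the 1.5 power of the second). The sketch below assumes that definition; other conventions exist, and the income figures are hypothetical values chosen to mimic the example above.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


incomes = [20, 25, 30, 35, 40, 200]   # hypothetical: one very high income
mirrored = [-x for x in incomes]      # same shape, flipped left to right
symmetric = [10, 15, 20, 25, 30]

print(skewness(incomes))    # positive: long right tail
print(skewness(mirrored))   # negative: long left tail
print(skewness(symmetric))  # 0.0: no asymmetry
```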

Visual Plots:

Histogram:

Definition: A graphical representation of the distribution of a numerical dataset, drawn as
adjacent bars.
Key Points: Shows the frequency of data within certain ranges (bins).
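
The counting step behind a histogram can be sketched without any plotting library. The dataset, the bin width of 10, and the helper name `histogram_counts` are all assumptions made for illustration.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count values per half-open bin [low, high), with bins starting at `start`."""
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + i * bin_width, start + (i + 1) * bin_width): c
        for i, c in sorted(counts.items())
    }


data = [12, 15, 21, 22, 28, 35, 37, 38, 41]  # hypothetical values
bins = histogram_counts(data, bin_width=10, start=10)

# Print a simple text histogram, one row per bin.
for (low, high), count in bins.items():
    print(f"{low}-{high}: {'#' * count}")
```

Changing the bin width changes the apparent shape of the distribution, so it is worth trying more than one width.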

Box Plot (Box-and-Whisker Plot):

Definition: Displays the distribution of data based on a five-number summary (minimum,
first quartile, median, third quartile, and maximum).
Key Points: Highlights median, quartiles, and outliers.
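
The numbers behind a box plot can be computed directly. This sketch assumes one particular quartile convention (the median of the lower and upper halves, leaving out the middle value when the count is odd) and the common 1.5 x IQR rule for flagging outliers; plotting libraries may use slightly different conventions. The value 100 is a hypothetical outlier.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 15, 20, 25, 30]
print(five_number_summary(data))  # (10, 12.5, 20, 27.5, 30)
print(iqr_outliers(data + [100]))  # [100]
```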

Scatter Plot:

Definition: Displays individual data points on a two-dimensional graph, with one variable
on each axis.
Key Points: Useful for showing relationships between two variables.

Questions and Answers:


Q1: Why is the median preferred over the mean in certain situations?

A1: The median is less sensitive to extreme values (outliers) and provides a better
representation of central tendency when the data is skewed.

Q2: How is skewness interpreted in a dataset?

A2: Positive skewness indicates a longer right tail, while negative skewness indicates a
longer left tail. The magnitude of skewness quantifies the degree of asymmetry.

Q3: What information does a box plot convey about a dataset?

A3: A box plot visually represents the distribution's central tendency and spread, and
identifies potential outliers using the quartiles and the interquartile range.

Q4: When would you use a scatter plot in data analysis?

A4: A scatter plot is useful for identifying relationships between two variables, visualizing
patterns, and assessing correlations in data.
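
The correlation that a scatter plot suggests visually can be quantified with the Pearson correlation coefficient. The sketch below implements the textbook formula by hand; the two small datasets and the function name `pearson_r` are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # hypothetical: increases exactly with xs
falling = [10, 8, 6, 4, 2]  # hypothetical: decreases exactly with xs

print(pearson_r(xs, rising))   # close to +1: strong positive linear relationship
print(pearson_r(xs, falling))  # close to -1: strong negative linear relationship
```

A value near 0 indicates no linear relationship, although a scatter plot may still reveal a non-linear pattern.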

CORRELATION

https://www.analyticsvidhya.com/blog/2021/05/shape-of-data-skewness-and-kurtosis/

Hypothesis Testing

https://www.simplilearn.com/tutorials/statistics-tutorial/hypothesis-testing-in-statistics

Evaluation Model:

https://www.analyticsvidhya.com/blog/2021/12/evaluation-of-classification-model/

Association Rule Mining in Python:

https://www.datacamp.com/tutorial/association-rule-mining-python
