Unit-IV of Data Science

It's a data science notes ppt

Uploaded by

dimplejangid1808

Unit-IV : Statistics

Dr. Amit Kumar Chaturvedi


Assistant Prof., CSE(MCA) Dept.,
Engineering College, Ajmer
Basic Terminology in Statistics
Statistics is a form of mathematical analysis that uses quantified models
and representations for a given set of experimental data or real-life
studies. Its main advantage is that it presents information in an easily
interpretable way. To work confidently with statistics, we should be
familiar with certain terms and concepts. They are:
• Understand the Type of Analytics
• Probability
• Central Tendency
• Variability
• Relationship Between Variables
• Probability Distribution
• Hypothesis Testing and Statistical Significance
• Regression
Understand the Type of Analytics

• Descriptive Analytics tells us what happened in the past and
helps a business understand how it is performing by providing
context to help stakeholders interpret information.
• Diagnostic Analytics takes descriptive data a step further and
helps you understand why something happened in the past.
• Predictive Analytics predicts what is most likely to happen in
the future and provides companies with actionable insights
based on the information.
• Prescriptive Analytics provides recommendations regarding
actions that will take advantage of the predictions and guide
the possible actions toward a solution.
Probability
Probability is the measure of the likelihood that an event will occur in a Random Experiment.
Complement: P(A) + P(A′) = 1
Intersection: P(A∩B) = P(A|B)P(B); for independent events this reduces to P(A)P(B)
Union: P(A∪B) = P(A) + P(B) − P(A∩B)

Fig. Intersection and Union.


Conditional Probability: P(A|B) is a measure of the probability of one event occurring with some relationship to one or
more other events. P(A|B)=P(A∩B)/P(B), when P(B)>0.
Independent Events: Two events are independent if the occurrence of one does not affect the probability of occurrence
of the other. P(A∩B)=P(A)P(B) where P(A) != 0 and P(B) != 0 , P(A|B)=P(A), P(B|A)=P(B)
Mutually Exclusive Events: Two events are mutually exclusive if they cannot both occur at the same time. P(A∩B)=0 and
P(A∪B)=P(A)+P(B).
Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to
the event: P(A|B) = P(B|A)P(A)/P(B), when P(B) > 0.
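The notes contain no code, so here is an illustrative Python sketch of Bayes' Theorem. The screening-test scenario and its numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are invented for the example, not taken from the slides:

```python
# Sketch of Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Posterior P(A|B) from the likelihood, prior, and evidence."""
    return p_b_given_a * p_a / p_b

p_a = 0.01              # prior: P(disease)           (made-up number)
p_b_given_a = 0.99      # likelihood: P(positive | disease)
p_b_given_not_a = 0.05  # false-positive rate: P(positive | no disease)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(posterior)   # ≈ 0.1667 — far below the 0.99 test sensitivity
```

The point of the example: even a very accurate test yields a modest posterior when the prior P(A) is small.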
Central Tendency
• Mean: The average of the dataset.
• Median: The middle value of an ordered dataset.
• Mode: The most frequent value in the dataset. If
the data have multiple values that occurred the
most frequently, we have a multimodal
distribution.
• Skewness: A measure of symmetry.
• Kurtosis: A measure of whether the data are
heavy-tailed or light-tailed relative to a normal
distribution
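The central-tendency measures above can be computed with Python's standard `statistics` module; the small dataset is made up for illustration:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # average of the dataset
median = statistics.median(data)  # middle value of the ordered dataset
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)

# statistics.multimode returns every mode of a multimodal dataset
print(statistics.multimode([1, 1, 2, 2, 3]))
```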
Variability
• Range: The difference between the highest and lowest value in the dataset.
• Percentiles, Quartiles and Interquartile Range (IQR)
• Percentiles — A measure that indicates the value below which a given
percentage of observations in a group of observations falls.
• Quartiles — Values that divide the ordered data points into four more or
less equal parts, or quarters.
• Interquartile Range (IQR)— A measure of statistical dispersion and
variability based on dividing a data set into quartiles. IQR = Q3 − Q1

Fig. Percentiles, Quartiles and Interquartile Range (IQR).

Fig. Population and Sample Variance and Standard Deviation:
population variance σ² = Σ(x − µ)²/N; sample variance s² = Σ(x − x̄)²/(n − 1);
the standard deviation is the square root of the variance.
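A standard-library sketch of the spread measures above; the data are made up. Note that `statistics.quantiles` uses the "exclusive" interpolation method by default, so quartile values may differ slightly from hand-computed ones:

```python
import statistics

data = [4, 7, 9, 11, 12, 20]

value_range = max(data) - min(data)          # range = highest - lowest

# With n=4, statistics.quantiles returns the three quartiles Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                # interquartile range

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n
sample_std = statistics.stdev(data)          # sqrt of sample variance

print(value_range, iqr, sample_var, population_var)
```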
Relationship Between Variables
• Causality: Relationship between two events where one event is affected by the other.
• Covariance: A quantitative measure of the joint variability between two or more
variables.
• Correlation: Measure the relationship between two variables and ranges from -1 to 1,
the normalized version of covariance.

Fig. Covariance and Correlation.


Probability Distributions
Probability Distribution Functions
Probability Mass Function (PMF): A function that gives the probability that
a discrete random variable is exactly equal to some value.
Probability Density Function (PDF): A function for continuous data where the
value at any given sample can be interpreted as providing a relative likelihood that
the value of the random variable would equal that sample.
Cumulative Distribution Function (CDF): A function that gives the probability that a
random variable is less than or equal to a certain value.

Fig. Comparison between PMF, PDF, and CDF.


Continuous Probability Distribution
• Uniform Distribution: Also called a rectangular distribution, is a probability
distribution where all outcomes are equally likely.
• Normal/Gaussian Distribution: The curve of the distribution is bell-shaped and
symmetrical and is related to the Central Limit Theorem that the sampling
distribution of the sample means approaches a normal distribution as the
sample size gets larger.
• Exponential Distribution: A probability distribution of the time
between the events in a Poisson point process.
• Chi-Square Distribution: The distribution of the sum of squared
standard normal deviates.
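The Central Limit Theorem mentioned above can be sketched with a small simulation: sample means drawn from a (non-normal) uniform distribution cluster around the population mean 0.5. The sample sizes, repetition count, and seed are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(42)

def sample_mean(sample_size: int) -> float:
    """Mean of one sample drawn from Uniform(0, 1)."""
    return statistics.mean(random.random() for _ in range(sample_size))

# The sampling distribution of the mean concentrates around 0.5,
# with standard deviation roughly sqrt(1/12)/sqrt(n) ≈ 0.029 for n = 100.
means = [sample_mean(100) for _ in range(1000)]
print(statistics.mean(means), statistics.stdev(means))
```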
Discrete Probability Distribution
Bernoulli Distribution: The distribution of a random variable
which takes a single trial and only 2 possible outcomes,
namely 1(success) with probability p, and 0(failure) with
probability (1-p).
Binomial Distribution: The distribution of the number of
successes in a sequence of n independent experiments, and
each with only 2 possible outcomes, namely 1(success) with
probability p, and 0(failure) with probability (1-p).
Poisson Distribution: The distribution that expresses the
probability of a given number of events k occurring in a fixed
interval of time if these events occur with a known constant
average rate λ and independently of the time.
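The discrete PMFs described above can be written out from their standard formulas using only the `math` module; the parameter values passed in at the bottom are arbitrary examples:

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); n = 1 gives the Bernoulli case."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

print(binomial_pmf(1, 1, 0.3))   # Bernoulli: P(success) = p = 0.3
print(binomial_pmf(2, 10, 0.5))  # 2 successes in 10 fair trials
print(poisson_pmf(0, 2.0))       # no events when the rate is 2
```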
Hypothesis Testing and Statistical Significance

Null and Alternative Hypothesis


Null Hypothesis: A general statement that there is no relationship between two
measured phenomena or no association among groups. Alternative Hypothesis: The
statement contrary to the null hypothesis.
In statistical hypothesis testing, a type I error is the rejection of a true null hypothesis,
while a type II error is the non-rejection of a false null hypothesis.
Interpretation
P-value: The probability of the test statistic being at least as extreme as the one
observed given that the null hypothesis is true. When p-value > α, we fail to reject
the null hypothesis, while p-value ≤ α, we reject the null hypothesis, and we can
conclude that we have a significant result.
Critical Value: A point on the scale of the test statistic beyond which we reject the
null hypothesis and is derived from the level of significance α of the test. It
depends upon a test statistic, which is specific to the type of test, and the
significance level, α, which defines the sensitivity of the test.
Significance Level and Rejection Region: The rejection region is actually
dependent on the significance level. The significance level is denoted by α and is
the probability of rejecting the null hypothesis if it is true.
Z-Test
A Z-test is any statistical test for which the distribution of the test statistic under
the null hypothesis can be approximated by a normal distribution and tests the
mean of a distribution in which we already know the population variance.
Therefore, many statistical tests can be conveniently performed as approximate Z-
tests if the sample size is large or the population variance is known.
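A sketch of a one-sample, two-sided Z-test: the sample, the hypothesized mean, and the "known" population standard deviation below are all invented for illustration. The normal CDF is built from `math.erf`, so no third-party library is needed:

```python
import math
import statistics

sample = [52, 49, 55, 51, 53, 50, 54, 48, 56, 52]
mu0 = 50.0    # hypothesized population mean under H0 (made up)
sigma = 3.0   # assumed known population standard deviation (made up)

n = len(sample)
# Z statistic: how many standard errors the sample mean is from mu0
z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p_value = 2 * (1 - normal_cdf(abs(z)))   # two-sided p-value
print(z, p_value)   # reject H0 at alpha = 0.05 if p_value <= 0.05
```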
T-Test
A T-test is the statistical test used when the population variance is unknown and the
sample size is small (n < 30).
Paired sample means that we collect data twice from the same group, person, item,
or thing. Independent sample implies that the two samples must have come from
two completely different populations.
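The independent-samples case can be sketched by computing the pooled (equal-variance) t statistic. The two samples are made up; the p-value would require the t distribution, which the standard library lacks, so only the statistic is shown:

```python
import math
import statistics

a = [12.1, 13.4, 11.8, 12.9, 13.0]   # made-up group A measurements
b = [11.2, 11.9, 12.0, 11.5, 11.7]   # made-up group B measurements

na, nb = len(a), len(b)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 = ((na - 1) * statistics.variance(a)
       + (nb - 1) * statistics.variance(b)) / (na + nb - 2)

# t statistic: difference in sample means over its standard error
t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
print(t)   # compare against the t critical value with na + nb - 2 = 8 df
```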
ANOVA (Analysis of Variance)
ANOVA is the way to find out if experimental results are significant. One-way
ANOVA compares the means of two or more independent groups using only one
independent variable. Two-way ANOVA is the extension of one-way ANOVA using
two independent variables to calculate the main effect and interaction effect.
Chi-Square Test
The Chi-Square Test checks how well the observed counts of a discrete set of data
points match the counts expected under a model. The Goodness of Fit Test determines
whether a sample of one categorical variable fits a hypothesized distribution. The
Chi-Square Test for Independence compares two categorical variables to see if there
is a relationship between them.
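The goodness-of-fit statistic is the sum of (observed − expected)² / expected over the categories. A fair six-sided die with invented roll counts serves as the example; the p-value would need the chi-square distribution, so only the statistic is shown:

```python
observed = [8, 12, 9, 11, 10, 10]   # made-up counts from 60 die rolls
expected = [60 / 6] * 6             # fair-die expectation: 10 per face

# Chi-square goodness-of-fit statistic
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)   # compare to the critical value with 6 - 1 = 5 degrees of freedom
```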
Regression

Linear Regression
Assumptions of Linear Regression
• Linear Relationship
• Multivariate Normality
• No or Little Multicollinearity
• No or Little Autocorrelation
• Homoscedasticity
Linear Regression is a linear approach to modeling the relationship between a
dependent variable and one independent variable. An independent variable is a
variable that is controlled in a scientific experiment to test the effects on the
dependent variable. A dependent variable is a variable being measured in a
scientific experiment.

Linear Regression Formula: y = β0 + β1x + ε, where β0 is the intercept, β1 the slope, and ε the error term.


Multiple Linear Regression is a linear approach to modeling the relationship
between a dependent variable and two or more independent variables.
Steps for Running the Linear Regression
Step 1: Understand the model description, causality, and directionality
Step 2: Check the data, categorical data, missing data, and outliers
Outlier is a data point that differs significantly from other observations. We can use the standard deviation
method and interquartile range (IQR) method.
Dummy variable takes only the value 0 or 1 to indicate the effect for categorical variables.
Step 3: Simple Analysis — Check the effect comparing between dependent variable to independent variable and
independent variable to independent variable
Use scatter plots to check the correlation
Multicollinearity occurs when two or more independent variables are highly correlated. We can use the Variance
Inflation Factor (VIF) to measure it: if VIF > 5, the variables are highly correlated, and if VIF > 10, there is certainly
multicollinearity among the variables.
Interaction Term implies a change in the slope from one value to another value.
Step 4: Multiple Linear Regression — Check the model and the correct variables
Step 5: Residual Analysis
Check normal distribution and normality for the residuals.
Homoscedasticity describes a situation in which the error term is the same across all values of the independent
variables and means that the residuals are equal across the regression line.
Step 6: Interpretation of Regression Output
R-Squared is a statistical measure of fit that indicates how much variation of a dependent variable is explained by
the independent variables. A higher R-Squared value represents smaller differences between the observed data
and fitted values.
P-value
Regression Equation
Populations, Samples, Parameters, and Statistics
The field of inferential statistics enables you to make educated guesses about the numerical
characteristics of large groups. The logic of sampling gives you a way to test conclusions about
such groups using only a small portion of its members.
A population is a group of phenomena that have something in common. The term often refers
to a group of people, as in the following examples:
• All registered voters in Crawford County
• All members of the International Machinists Union
• All Americans who played golf at least once in the past year

But populations can refer to things as well as people:


• All widgets produced last Tuesday by the Acme Widget Company
• All daily maximum temperatures in July for major U.S. cities
• All basal ganglia cells from a particular rhesus monkey

These values in the population are called parameters. Parameters are the usually unknown
characteristics of the entire population, such as the population mean and median. Sample
statistics describe the characteristics of the fraction of the population that is taken as the
sample; once the sample is drawn, the sample mean and median are fixed and known.
• In practice, we select a sample of the population instead. A sample is a smaller
group of members of a population selected to represent the population. In order to use
statistics to learn things about the population, the sample must be random. A random
sample is one in which every member of a population has an equal chance of being
selected. The most commonly used sample is a simple random sample. It requires that
every possible sample of the selected size has an equal chance of being used.
• A parameter is a characteristic of a population. A statistic is a characteristic of a
sample. Inferential statistics enables you to make an educated guess about a
population parameter based on a statistic computed from a sample randomly drawn
from that population
Estimate, Estimator
What is an estimator?
In machine learning, an estimator is an equation for
picking the “best,” or most likely accurate, data
model based upon observations in reality. Not to be
confused with estimation in general, the estimator
is the formula that evaluates a given quantity (the
estimand) and generates an estimate. This estimate
is then inserted into the deep learning classifier
system to determine what action to take.
Uses of Estimators
• By quantifying guesses, estimators are how machine learning in
theory is implemented in practice. Without the ability to estimate the
parameters of a dataset (such as the layers in a neural network or the
bandwidth in a kernel), there would be no way for an AI system to
“learn.”
• A simple example of estimators and estimation in practice is the so-
called “German Tank Problem” from World War Two. The Allies had
no way to know for sure how many tanks the Germans were building
every month. By counting the serial numbers of captured or
destroyed tanks (the estimand), Allied statisticians created an
estimator rule. This equation calculated the maximum possible
number of tanks based upon the sequential serial numbers, and applied
minimum-variance analysis to generate the most likely estimate for
how many new tanks Germany was building.
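The German Tank Problem estimator described above has a well-known closed form: the minimum-variance unbiased estimate is N̂ = m + m/k − 1, where m is the largest serial number observed and k is the number of observations. The serial numbers below are invented; the annotation syntax needs Python 3.9+:

```python
def estimate_total(serials: list[int]) -> float:
    """Minimum-variance unbiased estimate of the population maximum
    (total number of tanks) from a sample of observed serial numbers."""
    k = len(serials)   # how many tanks were captured or destroyed
    m = max(serials)   # largest serial number seen
    return m + m / k - 1

captured = [19, 40, 42, 60]      # hypothetical captured-tank serial numbers
print(estimate_total(captured))  # estimated total production
```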
Types of Estimators
Estimators come in two broad categories—point and interval. Point equations
generate single value results, such as standard deviation, that can be plugged into
a deep learning algorithm’s classifier functions. Interval equations generate a
range of likely values, such as a confidence interval, for analysis.
In addition, each estimator rule can be tailored to generate different types of
estimates:
• Biased - Either an overestimate or an underestimate.
• Efficient - Has the smallest possible variance; the smallest-variance estimator is
referred to as the “best” estimator.
• Invariant: Less flexible estimates that aren’t easily changed by data
transformations.
• Shrinkage: An unprocessed estimate that’s combined with other variables to
create complex estimates.
• Sufficient: Estimating the total population’s parameter from a limited dataset.
• Unbiased: An exact-match estimate value that neither underestimates nor
overestimates.
Properties of Good Estimators
A distinction is made between an estimate and an estimator.
The numerical value of the sample mean is said to be an
estimate of the population mean figure. On the other hand,
the statistical measure used, that is, the method of estimation
is referred to as an estimator. A good estimator, as common
sense dictates, is close to the parameter being estimated. Its
quality is to be evaluated in terms of the following properties:
• Unbiasedness
• Efficient
• Consistent
• Sufficient
1. Unbiasedness.
An estimator is said to be unbiased if its expected value is identical with the population
parameter being estimated. That is, if θ̂ is an unbiased estimator of θ, then we must have
E(θ̂) = θ. Many estimators are “asymptotically unbiased” in the sense that the bias reduces to a
practically insignificant value (zero) when n becomes sufficiently large. The estimator s² is an
example.
It should be noted that bias in estimation is not necessarily undesirable. It may turn out to be
an asset in some situations.
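Unbiasedness can be checked by simulation: the sample variance with divisor n − 1 is unbiased for the population variance, while the divisor-n version systematically underestimates it by the factor (n − 1)/n. The population (Normal with variance 4), sample size, and seed are arbitrary choices:

```python
import random
import statistics

random.seed(7)
population_var = 4.0   # true variance of Normal(mu=0, sigma=2)

n = 5
biased, unbiased = [], []
for _ in range(20000):
    sample = [random.gauss(0, 2) for _ in range(n)]
    unbiased.append(statistics.variance(sample))   # divides by n - 1
    biased.append(statistics.pvariance(sample))    # divides by n

print(statistics.mean(unbiased))  # close to 4.0
print(statistics.mean(biased))    # close to 4.0 * (n - 1) / n = 3.2
```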

2. Consistency.
If an estimator, say θ̂, approaches the parameter θ closer and closer as the sample size n
increases, θ̂ is said to be a consistent estimator of θ. Stated somewhat more rigorously, the
estimator θ̂ is said to be a consistent estimator of θ if, as n approaches infinity, the probability
approaches 1 that θ̂ will differ from the parameter θ by no more than an arbitrary constant.

The sample mean is an unbiased estimator of µ no matter what form the population
distribution assumes, while the sample median is an unbiased estimate of µ only if the
population distribution is symmetrical. The sample mean is better than the sample median as
an estimate of µ in terms of both unbiasedness and consistency.
3. Efficiency.
The concept of efficiency refers to the sampling variability of an estimator. If two
competing estimators are both unbiased, the one with the smaller variance (for a given
sample size) is said to be relatively more efficient. Stated in a somewhat different
language, an estimator θ is said to be more efficient than another estimator θ 2 for θ if
the variance of the first is less than the variance of the second. The smaller the variance
of the estimator, the more concentrated is the distribution of the estimator around the
parameter being estimated and, therefore, the better this estimator is.

4. Sufficiency.
An estimator is said to be sufficient if it conveys as much information as possible about
the parameter that is contained in the sample. The significance of sufficiency lies in the
fact that if a sufficient estimator exists, it is absolutely unnecessary to consider any
other estimator; a sufficient estimator ensures that all the information a sample
can furnish with respect to the estimation of a parameter is being utilized.
Many methods have been devised for estimating parameters that may provide
estimators satisfying these properties. The two important methods are the least square
method and the method of maximum likelihood.
Estimate and Estimators
Let X be a random variable having distribution fX(x; θ),
where θ is an unknown parameter, and let X1, X2, …, Xn be
a random sample of size n taken on X.
The problem of point estimation is to pick a statistic
g(X1, X2, …, Xn) that best estimates the parameter θ.
Once observed, the numerical value g(x1, x2, …, xn) is
called an estimate, and the statistic g(X1, X2, …, Xn) is
called an estimator.
What are Point Estimators?
Point estimators are defined as the functions that are used to find an approximate
value of a population parameter from random samples of the population. They
take the help of the sample data of a population to determine a point estimate or a
statistic that serves as the best estimate of an unknown parameter of a population.
Sample Mean and Sample Variance
Two important statistics:
Let X1, X2, X3, …, Xn be a random sample, then:
The sample mean is denoted by x̄, and the sample variance is
denoted by s².
Here x̄ and s² are called sample statistics.
The population parameters are denoted by:
σ² = population variance, and µ = population mean

Fig. Population and Sample Mean: µ = Σx/N (population), x̄ = Σx/n (sample).

Fig. Population and Sample Variance: σ² = Σ(x − µ)²/N (population), s² = Σ(x − x̄)²/(n − 1) (sample).
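The contrast between population parameters and sample statistics can be sketched with a made-up finite population and a simple random sample drawn from it; the population (integers 1..100), sample size, and seed are arbitrary:

```python
import random
import statistics

random.seed(1)
population = list(range(1, 101))           # made-up population: 1..100

mu = statistics.mean(population)           # population mean µ
sigma2 = statistics.pvariance(population)  # population variance σ² (divide by N)

sample = random.sample(population, 10)     # simple random sample, n = 10
x_bar = statistics.mean(sample)            # sample mean x̄ (a statistic)
s2 = statistics.variance(sample)           # sample variance s² (divide by n - 1)

print(mu, sigma2)
print(x_bar, s2)
```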
Measures of Spread of Data
In the world of data science, some of the most important
decisions regarding analyses are made while performing
exploratory data analysis on data-sets. While
understanding the concepts of Mean, Median and Mode
help analysts get started with the basic structure of the
data set, these are just the measures of central
tendency and don’t provide an overview of the entire
data set. Understanding Range, Interquartile Range
(IQR), Standard Deviation and Variance help us to
understand how spread out our data are from one
another.
When we discuss measures of spread, we are considering
numeric values that are associated with how far our points
are from one another.
Common measures of spread include:
• Range
• Interquartile Range (IQR)
• Standard Deviation
• Variance
It is easiest to understand the spread of our data visually and
the most common visual for quantitative data is the
Histogram.
