Stats Notes

The document discusses different statistical concepts and techniques. It defines random and non-random sampling, explaining that random sampling allows for an equal probability of selection while non-random sampling relies on factors like convenience. It also discusses the differences between statistics and parameters, and types of data like qualitative and quantitative data. Finally, it provides an overview of techniques like simple random sampling, stratified random sampling, and the five steps of the six sigma methodology.

Uploaded by Maryam Hussain

Differences:

Random Sampling (Probability) vs. Non-Random Sampling (Non-Probability):

1. In random sampling, every sample has an equal, known probability of being selected; in non-random sampling, selection is based on factors such as the convenience, judgement, and experience of the researcher, not on probability.
2. Random sampling is unbiased in nature; non-random sampling is biased in nature.
3. A random sample is representative of the entire population; a non-random sample lacks representation of the entire population.
4. In random sampling, no unit has zero probability of being selected; in non-random sampling, some units may have zero probability of ever being selected.
5. Random sampling is the simplest sampling technique; non-random sampling methods are somewhat more complex.

Simple Random Sampling: With Replacement vs. Without Replacement:

1. Sampling with replacement returns each selected unit to the population before the next draw and is used to find probabilities with replacement; sampling without replacement does not return units, and is used to figure out probabilities without replacement.
2. With replacement, successive draws are independent of each other; without replacement, successive draws are dependent on each other.
3. Sampling with replacement allows us to use the same dataset multiple times to build models, as opposed to going out and gathering new data, which can be time-consuming and expensive; sampling without replacement is used when we don't want any given unit (for example, a household) to appear twice in the sample.
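As a minimal sketch (the item list below is invented for illustration), the two schemes can be contrasted with Python's `random` module: `choices` draws with replacement, `sample` without.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
items = ["A", "B", "C", "D", "E"]  # hypothetical households

# With replacement: the same unit can be drawn more than once,
# so successive draws are independent of each other.
with_repl = random.choices(items, k=3)

# Without replacement: a drawn unit cannot appear again,
# so successive draws are dependent on earlier ones.
without_repl = random.sample(items, k=3)

print(with_repl, without_repl)
```

With replacement duplicates are possible; without replacement the three drawn households are guaranteed to be distinct.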

Statistic vs. Parameter:

1. A statistic is a number describing a sample; a parameter is a number describing a whole population.
2. We can use sample statistics to make educated guesses about population parameters; the goal of quantitative research is to understand characteristics of populations by finding parameters.
3. Computing statistics is easy, time-saving, and feasible; measuring parameters directly is often difficult, time-consuming, or unfeasible.
4. Examples of statistics: sample mean, sample variance. Examples of parameters: population mean, population variance.
5. Example: the standard deviation of the weights of avocados from one farm is a statistic; the standard deviation of the weights of all avocados in the region is a parameter.
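A small illustration of the distinction, using a made-up "population" of avocado weights: the population mean is a parameter, and the mean of a random sample is a statistic that estimates it.

```python
import random
import statistics

random.seed(1)
# Hypothetical population: weights (g) of all avocados in a region
population = [random.gauss(200, 15) for _ in range(10_000)]

# Parameter: describes the whole population (often unknowable in practice)
population_mean = statistics.fmean(population)

# Statistic: describes a sample, used to estimate the parameter
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(population_mean, sample_mean)
```

The sample mean will land close to, but not exactly on, the population mean; that gap is what inferential statistics quantifies.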

Simple Random Sampling vs. Stratified Random Sampling:

1. A simple random sample is used to represent the entire population, randomly selecting individuals from it without any other consideration; a stratified random sample, on the other hand, first divides the population into smaller groups, or strata, based on shared characteristics.
2. Simple random sampling is economical in nature and less time-consuming; stratified sampling is also economical and less time-consuming, with less chance of bias and higher accuracy than simple random sampling.
3. Simple random sampling carries a chance of bias and the difficulty of getting a representative sample; stratified sampling requires defining the categorical variable by which subgroups are created (for instance, age group, gender, occupation, income, education, religion, region, etc.).
4. Simple random sampling is often used when very little information is available about the population; stratified sampling draws a selection from each group, the size of which is based on its proportion of the entire population.
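The stratified procedure above can be sketched as follows (the population and the `age_group` stratifying variable are hypothetical): split the population into strata, then draw the same fraction from each stratum.

```python
import random
from collections import defaultdict

random.seed(2)
# Hypothetical population of 1,000 units with a stratifying variable
population = [{"id": i, "age_group": random.choice(["18-30", "31-50", "51+"])}
              for i in range(1000)]

def stratified_sample(pop, key, frac):
    """Draw the same fraction from each stratum (proportional design)."""
    strata = defaultdict(list)
    for unit in pop:
        strata[unit[key]].append(unit)  # group units by stratum
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * frac))  # stratum sample size
        sample.extend(random.sample(group, k))  # SRS within the stratum
    return sample

sample = stratified_sample(population, "age_group", 0.10)
print(len(sample))
```

Unlike a plain simple random sample, this guarantees every age group appears in the sample in roughly its population proportion.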

Proportional Allocation vs. Optimum Allocation:

1. Proportional allocation has higher variance; optimum allocation has the least variance.
2. In proportional allocation, sample units are selected from within each stratum in proportion to the stratum size; in optimum allocation, the cheaper the cost per unit in a stratum, the larger the sample that should be drawn from that stratum.
3. Proportional allocation is appropriate when different parts of the population should be proportionally represented in the sample; allocating the sample to the different strata in accordance with stratum size, variability, and cost is called the principle of optimum allocation.
4. In proportional allocation, each stratum's sample size depends directly on the number of units in the stratum; in optimum allocation, the larger the variability within a stratum, the larger the sample that should be drawn from it.
5. Under optimum allocation, the sample size for stratum h is proportional to N_h * S_h / sqrt(c_h), where N_h is the stratum size, S_h the stratum standard deviation, and c_h the cost per unit in that stratum.
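Assuming hypothetical stratum sizes, standard deviations, and per-unit costs, both allocations can be computed side by side; the optimum (cost-based Neyman) weight for stratum h is N_h * S_h / sqrt(c_h).

```python
import math

# Hypothetical strata: sizes N_h, standard deviations S_h, costs per unit c_h
N = [500, 300, 200]
S = [10.0, 20.0, 5.0]
c = [1.0, 4.0, 1.0]
n = 100  # total sample size to allocate

# Proportional allocation: n_h proportional to stratum size N_h
prop = [n * Nh / sum(N) for Nh in N]

# Optimum allocation: n_h proportional to N_h * S_h / sqrt(c_h),
# so bigger, more variable, cheaper strata get larger samples
w = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c)]
opt = [n * wh / sum(w) for wh in w]

print(prop, opt)
```

Both lists sum to the total sample size; they differ in how the 100 units are split across the three strata.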

One-Way ANOVA vs. Two-Way ANOVA:

1. One-way ANOVA is a test that allows one to make comparisons between the means of three or more groups of data; two-way ANOVA makes the same comparisons where two independent variables are considered.
2. A one-way ANOVA has one independent variable; a two-way ANOVA has two independent variables.
3. One-way ANOVA tests the effect of three or more groups of one independent variable on a dependent variable; two-way ANOVA tests the effect of multiple groups of two independent variables on a dependent variable and on each other.
4. In one-way ANOVA the number of samples is three or more; in two-way ANOVA each variable should have multiple samples.
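The one-way ANOVA F statistic can be computed by hand from the between-group and within-group sums of squares; the three groups below are invented for illustration.

```python
from statistics import fmean

# Hypothetical measurements for three groups of one independent variable
groups = [
    [4.1, 5.2, 6.0, 5.5],
    [7.9, 8.1, 7.4, 8.6],
    [5.0, 4.8, 5.9, 5.3],
]

k = len(groups)                     # number of groups
n = sum(len(g) for g in groups)     # total number of observations
grand_mean = fmean(x for g in groups for x in g)

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their group mean
ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) indicates that at least one group mean differs from the others.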
Questions and Answers:
Statistics and Types:
Statistics:
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and
presentation of data. In applying statistics to a scientific, industrial, or social problem, it is
conventional to begin with a statistical population or a statistical model to be studied.
Populations can be diverse groups of people or objects such as "all people living in a country" or
"every atom composing a crystal". Statistics deals with every aspect of data, including the
planning of data collection in terms of the design of surveys and experiments. There are two
branches:
Descriptive:
Descriptive statistics deals with the presentation and collection of data. This is usually the first part of a statistical analysis. It is usually not as simple as it sounds, and the statistician needs to be aware of designing experiments, choosing the right focus group, and avoiding biases that can easily creep into the experiment.
Inferential:
Inferential statistics, as the name suggests, involves drawing the right conclusions from the
statistical analysis that has been performed using descriptive statistics. In the end, it is the
inferences that make studies important and this aspect is dealt with in inferential statistics.

Data and its Types:


Data:
In statistics, groups of individual data points may be classified as belonging to any of various
statistical data types, e.g., categorical ("red", "blue", "green"), real number (1.68, -5, 1.7e+6),
odd number (1,3,5) etc. The data type is a fundamental component of the semantic content of the
variable, and controls which sorts of probability distribution can logically be used to describe the
variable, the permissible operations on the variable, the type of regression analysis used to
predict the variable, etc. The concept of data type is similar to the concept of level of
measurement, but more specific: For example, count data require a different distribution (e.g., a
Poisson distribution or binomial distribution) than non-negative real-valued data require, but
both fall under the same level of measurement (a ratio scale).
Types:
There are different types of data in Statistics, that are collected, analysed, interpreted and
presented. The data are the individual pieces of factual information recorded, and it is used for
the purpose of the analysis process. There are two types of data; qualitative data and quantitative
data.
Qualitative Data:
Qualitative data, also known as categorical data, describes data that fits into categories. Qualitative data are not numerical. Categorical information involves categorical variables that describe features such as a person's gender, home town, etc. Categorical measures are defined in terms of natural language specifications, not in terms of numbers. Qualitative data includes nominal data and ordinal data.
Quantitative Data:
Quantitative data, also known as numerical data, represents numerical values (i.e., how much, how often, how many). Numerical data gives information about the quantities of a specific thing. Some examples of numerical data are height, length, size, weight, and so on. Quantitative data can be classified into two types based on the data sets: discrete data and continuous data.

5 steps of six sigma:


The Six Sigma Methodology comprises five data-driven stages: Define, Measure, Analyze, Improve and Control (DMAIC).
1. Define - The “Define” stage seeks to identify all the pertinent information necessary to
break down a project, problem or process into tangible, actionable terms. It emphasizes
the concrete, grounding process improvements in actual, quantifiable and qualifiable
information rather than abstract goals.
2. Measure - In the “Measure” phase, organizations assess where current process
capabilities are. While they understand they need to make improvements and have listed
those improvements concretely in the Define phase, they must first measure current
performance to establish a baseline against which improvement can be judged.
3. Analyze - The “Analyze” phase examines the data amassed during the Measure stage to
isolate the exact root causes of process inefficiencies, defects and discrepancies. In short,
it extracts meaning from your data. Insights gleaned from this analysis begin scaffolding
the tangible process improvements for your team or organization to implement.
4. Improve - The “Improve” phase initiates formal action plans meant to solve the root
problems identified during analysis. Organizations directly address what they’ve
identified as problem root causes, typically deploying a Design of Experiments plan to
isolate different variables and co-factors until the true obstacle is found.
5. Control - In the final phase, “Control,” Six Sigma teams create a control plan and deploy
the new standardized process. The control plan outlines improved daily workflows,
which result in critical business process variables abiding by accepted quality control
variances.

Importance of QC Chart:
QC Chart in Food Safety and Quality:
A quality control chart is a graphical representation of whether a firm's products or processes are
meeting their intended specifications. If problems appear to arise, the quality control chart can be
used to identify the degree by which they vary from those specifications and help in error
correction.
The food industry deals with highly sensitive products. This is one of the key reasons why maintaining quality standards and adhering to quality requirements are imperative for players in the food industry. When it comes to food items, most of us tend to repeatedly buy the same brand, which we perceive to be of good quality and matching our expectations.
Also, in the case of companies in this industry, even a small incident where the quality of
products has been compromised could tarnish the brand image. Consequently, the company’s
profits could go crashing down the hill. This makes having appropriate quality control measures
highly necessary for brands dealing in food products. Quality control (QC) is a reactive process
and aims to identify and rectify the defects in finished products. It can be achieved by identifying
and eliminating sources of quality problems to ensure customer’s requirements are continually
met. It involves the inspection aspect of quality management and is typically the responsibility of
a specific team tasked with testing products for defects.

Scales of Measurement:
In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio. These are simply ways to sub-categorize different types of data.
1. Nominal - The nominal scale of measurement defines the identity property of data. This
scale has certain characteristics, but doesn’t have any form of numerical meaning. The
data can be placed into categories but can’t be multiplied, divided, added or subtracted
from one another. It’s also not possible to measure the difference between data points.
2. Ordinal - The ordinal scale defines data that is placed in a specific order. While each
value is ranked, there’s no information that specifies what differentiates the categories
from each other. These values can’t be added to or subtracted from.
3. Interval - The interval scale contains properties of nominal and ordered data, but the
difference between data points can be quantified. This type of data shows both the order
of the variables and the exact differences between the variables. They can be added to or
subtracted from each other, but not multiplied or divided. For example, 40 degrees is not
20 degrees multiplied by two.
4. Ratio - The ratio scale of measurement includes the properties of the other three scales.
The data is nominal and defined by an identity, can be classified in order,
contains intervals and can be broken down into exact values. Weight, height and distance
are all examples of ratio variables. Data in the ratio scale can be added, subtracted,
divided and multiplied.
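A quick illustration of why ratios are meaningful on a ratio scale (Kelvin, which has a true zero) but not on an interval scale (Celsius), while differences are meaningful on both:

```python
# Celsius is an interval scale: differences are meaningful, ratios are not.
c1, c2 = 20.0, 40.0
ratio_celsius = c2 / c1          # 2.0, but "twice as hot" is NOT meaningful

# Kelvin has a true zero, making it a ratio scale: ratios ARE meaningful.
k1, k2 = c1 + 273.15, c2 + 273.15
ratio_kelvin = k2 / k1           # the physically meaningful ratio

# The interval (difference) is the same on both scales.
print(c2 - c1, k2 - k1, round(ratio_kelvin, 2))
```

This is exactly the "40 degrees is not 20 degrees multiplied by two" point from the interval-scale description above.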
Hypothesis testing:
Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution. First, a
tentative assumption is made about the parameter or distribution. This assumption is called the
null hypothesis and is denoted by H0. In a statistical hypothesis test, a null hypothesis and an
alternative hypothesis are proposed for the probability distribution of the data.
Five Steps in Hypothesis Testing:
1. Specify the Null Hypothesis - The null hypothesis (H0) is a statement of no effect,
relationship, or difference between two or more groups or factors. In research studies, a
researcher is usually interested in disproving the null hypothesis.
2. Specify the Alternative Hypothesis - The alternative hypothesis (H1) is the statement that
there is an effect or difference. This is usually the hypothesis the researcher is interested
in proving. The alternative hypothesis can be one-sided (specifying only one direction,
e.g., lower) or two-sided.
3. Set the Significance Level (α) - The significance level (denoted by the Greek letter alpha,
α) is generally set at 0.05. This means there is a 5% chance of rejecting the null
hypothesis when it is actually true (a Type I error).
4. Calculate the Test Statistic and Corresponding P-Value - Hypothesis testing generally
uses a test statistic that compares groups or examines associations between variables.
5. Draw a Conclusion - Reject the null hypothesis if the p-value is below the significance
level; otherwise, fail to reject it.
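The five steps can be sketched as a two-sided one-sample test in pure Python (using a normal approximation for the p-value rather than the t distribution; the sample data is invented for illustration):

```python
import math
from statistics import fmean, stdev

# Steps 1-2: H0: population mean = 50; H1: mean != 50 (two-sided)
mu0 = 50.0
# Step 3: significance level
alpha = 0.05

# Hypothetical sample measurements
sample = [52.1, 49.8, 53.4, 51.0, 50.6, 52.8, 48.9, 51.7, 52.3, 50.9]

# Step 4: test statistic and p-value
n = len(sample)
z = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
# Two-sided p-value from the standard normal: p = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

# Step 5: reject H0 if p < alpha, otherwise fail to reject
reject = p_value < alpha
print(round(z, 2), round(p_value, 4), reject)
```

With this (made-up) sample the statistic is large and the p-value falls below 0.05, so H0 would be rejected; a real analysis with n = 10 would normally use the t distribution instead of the normal.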

Definition:
X-Chart:
In statistical process monitoring, the X-chart is a type of scheme, popularly known as a control chart, used to monitor the mean and range of a normally distributed variable simultaneously, when samples are collected at regular intervals from a business or industrial process.
X-Bar Chart:
In industrial statistics, the X-bar chart is a type of Shewhart control chart that is used to monitor
the arithmetic means of successive samples of constant size, n. This type of control chart is used
for characteristics that can be measured on a continuous scale, such as weight, temperature,
thickness etc.
S-Chart:
s charts are used to monitor the mean and variation of a process based on samples taken from the
process at given times (hours, shifts, days, weeks, months, etc.). The measurements of the
samples at a given time constitute a subgroup. Typically, an initial series of subgroups is used to
estimate the mean and standard deviation of a process.
P-Chart:
In statistical quality control, the p-chart is a type of control chart used to monitor the proportion
of nonconforming units in a sample, where the sample proportion nonconforming is defined as
the ratio of the number of nonconforming units to the sample size, n.
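As a rough sketch with invented numbers, the 3-sigma control limits for an X-bar chart (with an assumed known process standard deviation) and for a p-chart can be computed as follows:

```python
import math
from statistics import fmean

# Hypothetical subgroup means from a process (subgroup size n = 5)
subgroup_means = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
n = 5
sigma = 0.5  # assumed (known) process standard deviation

# X-bar chart: centre line and 3-sigma limits for the subgroup means
centre = fmean(subgroup_means)
ucl = centre + 3 * sigma / math.sqrt(n)
lcl = centre - 3 * sigma / math.sqrt(n)

# p-chart: limits for a proportion nonconforming p_bar with sample size m
p_bar, m = 0.04, 200
se = math.sqrt(p_bar * (1 - p_bar) / m)
p_ucl = p_bar + 3 * se
p_lcl = max(0.0, p_bar - 3 * se)  # a proportion cannot be negative

print(round(centre, 4), round(ucl, 3), round(lcl, 3), round(p_ucl, 4), p_lcl)
```

Points falling outside the upper or lower control limit signal that the process may be out of statistical control and should be investigated.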
