Unit 1

Uploaded by

pes2ug23cs007

UNIT 1

Date : 21/09/2024
Subject : #UE23MA242A

What is Data?
Data refers to individual facts, statistics or items of
information that are collected through observation

Data v/s Information

1. Data:
Data is usually raw facts formatted in a specific way.
Data is based on record and observation
Unorganized
2. Information:
Information usually has additional meaning beyond the facts
themselves
Based on Analysis of Data
Organized

Example:
Data => {13.5C, 14.7C, 19.9C}
Information => Highest temp = 19.9C and lowest temp = 13.5C
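
A minimal sketch of the example above, showing how the information is derived from the raw data. The variable names are illustrative only.

```python
# Raw data: temperature readings in degrees Celsius (from the example above).
readings_c = [13.5, 14.7, 19.9]

# Information: summaries obtained by analysing the raw data.
highest = max(readings_c)
lowest = min(readings_c)

print(f"Highest temp = {highest}C and lowest temp = {lowest}C")
```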

Structured, Semi-Structured and Unstructured
1. Structured:
This type of data follows a fixed format, so it is easily
addressable for effective analysis. Ex: tables in a database
2. Semi-Structured:
This type of data does not have relations, but it still has
some structure in it. Ex: XML -> has tags to organize the
data but no relations
3. Unstructured
This type of data has no predefined structure at all. Ex:
Audio, Video, Location etc

Why do we need Data Science?

The main reason we need data science is to process and interpret
data, which helps in making better decisions and supports growth,
optimization etc
Unstructured data is very hard to work with, yet its volume keeps
increasing. Data Science provides a way to extract valuable
information from this unstructured data

The 6 V's of Big Data

Volume: The amount of data that is generated
Velocity: The speed with which data is generated
Variety: The types of data being generated
Veracity: The trustworthiness of the data
Value: The value this data provides to the end user, business
etc
Variability: The ways in which data can be used and formatted

What is Statistical Analysis?

It's the science of collecting, exploring and presenting large
amounts of data to discover underlying patterns and trends
The basic idea of any statistical method is to make inferences
about a population by studying a relatively small sample chosen
from the population

Population:
A population is the entire collection of data/objects about
which the information is sought

Sample:
A sample is a subset of the Population, containing the
data/objects over which the outcomes are actually observed

Sample Size:
The number of items in the considered sample. This number
cannot exceed the population size and is usually much smaller
The process of selecting observations in order to make an
inference that can be generalized to the population is known as
Sampling

Population v/s Sample

Population | Sample
It is the complete set | It is a subset of the population
Hard to define and observe | Easier to define and observe
Time consuming and costly to study | Faster and cheaper to study
Contains all the members of a specified group | Contains a small part of the population that represents the population
Eg: All the countries of the world | Eg: Countries with data available about GDP and birth rates since 2000

Why Sampling?
Necessity:
Sometimes it’s simply not possible to study the whole
population due to its size or inaccessibility.
Practicality:
It’s easier and more efficient to collect data from a
sample.
Cost-effectiveness:
There are fewer participant, laboratory, equipment, and
researcher costs involved.
Manageability:
Storing and running statistical analyses on smaller datasets
is easier and more reliable.
Saves time:
As the sample is much smaller than the population, data
collection is faster

Characteristics of Sample:
1. Unbiased, which means it should contain all types of data in
the same proportion as the population
2. It should represent the population
3. It should be Goal-Oriented
4. It should be random i.e. every item in the population must have
an equal chance of getting selected
5. It should be appropriately sized: it cannot be very small
compared to the population and cannot be bigger than the
population. It should be just large enough to satisfy the
other characteristics

Types of Population:
1. Tangible or Concrete Population:
This type of population consists of physical objects such as
cars, bolts, apples etc which can be counted. Thus, this
population is usually finite and involves counting
Each time an item is sampled (without replacement), the
population size decreases by 1
2. Conceptual Population:
A population that does not consist of physical objects is
called a Conceptual Population
This type of population usually consists of all the values
that could be obtained by measuring something multiple times
The size of the population is usually very large
Example:
1. A geologist weighs a rock several times on a scale ->
Conceptual Population
3. Target Population:
It is the entire population to which researchers want to
generalise their conclusions
4. Study Population:
It is the population that the researchers have access to, it
can be due to geographic limitations, permissions, money etc
It is a subset of the Target Population
Sampling Breakdown:
Study : Find the mean weight of all students of all universities
in India.

1. To whom do you want to generalize the results?
All universities in India
So this is our Target or Theoretical population
2. What population can you get access to?
All universities in Karnataka
So this is our Study population
3. How can you get access to them?
List of Universities in Karnataka
So this is our Sampling frame (Sampling frame is the list of
items or events from which potential respondents are drawn)
4. Who is in your study?
Two Universities from Karnataka
So this is our Sample

Types of Sampling Methods:

Read the Advantages and Disadvantages of each Sampling Method
from Slides

Samples are of two types: Non-Probability Samples and Probability
Samples

Probability Sampling
It is a type of sampling in which every unit in the population
has a non-zero chance/probability of being selected
This type of sampling reduces bias
When every item in the population has an equal probability of
getting selected in the sample it is known as Equal Probability
Sampling or Self-Weighting
Probability Samples are of four types: Simple Random, Systematic,
Stratified and Cluster

1) Simple Random:

As the name suggests, it is an entirely random method of
selecting the sample
Here, each item has an equal probability of getting selected
Sampling Frame should be the entire Population
Best to use when sample size is small
Usually, all the items in the Population are assigned a number
and numbers are chosen at random to make the sample
Probability (in %) of an item getting selected is

$$\left(\frac{n}{N}\right) \times 100$$

n -> Sample Size, N -> Population Size
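
A minimal sketch of simple random sampling, assuming the population fits in a list and each item has been assigned a number. The function names and the population of 50 items are illustrative, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def selection_probability_percent(n, N):
    """Chance (in %) that a given item ends up in a sample of size n."""
    return 100 * n / N

population = list(range(1, 51))      # items numbered 1..50 (hypothetical)
sample = simple_random_sample(population, n=10, seed=42)
print(sample)
print(selection_probability_percent(10, 50))   # 20.0
```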

1.5) (i) Simple Random Sampling With Replacement:

In this sampling method, a random item is selected, measured or
recorded, and then returned to the population where it can be
selected again
This means a single item can be sampled multiple times
Each time we sample a unit, all units have the same probability
of being sampled

1.5) (ii) Simple Random Sampling Without Replacement:

As the name suggests, in this sampling method there is no
replacement; hence, a single unit cannot be sampled more than
once
This means the selection probability of the remaining items
changes after each draw
Example: take a box of 10 balls. Initially each ball has a
probability of 10% (1/10 * 100) of being drawn, but once we remove
a ball it becomes 11.11% (1/9 * 100) for each remaining ball
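
The difference between the two methods can be sketched with the box of 10 balls. This is only an illustration: the fixed seed and the choice of 5 draws are assumptions made for repeatability.

```python
import random

rng = random.Random(0)               # fixed seed, only for repeatability
balls = list(range(1, 11))           # a box of 10 numbered balls

# With replacement: the box never shrinks, so repeats are possible
# and every draw gives each ball the same 1/10 chance.
with_replacement = rng.choices(balls, k=5)

# Without replacement: each drawn ball leaves the box, so the chance
# of any remaining ball rises from 1/10 to 1/9, then 1/8, and so on.
box = balls.copy()
without_replacement = []
chances = []
for _ in range(5):
    chances.append(round(100 / len(box), 2))   # chance (%) per remaining ball
    drawn = rng.choice(box)
    box.remove(drawn)
    without_replacement.append(drawn)

print(with_replacement)
print(without_replacement)
print(chances)   # [10.0, 11.11, 12.5, 14.29, 16.67]
```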

2) Systematic:
In this type of sampling, we arrange the population in some
systematic order and then pick out samples at regular intervals
The first element is selected randomly from the first K elements
Then we choose every K'th element after it, where K = (Population
Size / Sample Size)
This is also an Equal Probability Sampling method
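
A minimal sketch of systematic sampling, assuming the population is already ordered in a list and that the sample size does not exceed the population size. The function name and the 100-item population are illustrative.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    k = len(population) // n                     # sampling interval K = N / n
    start = random.Random(seed).randrange(k)     # random start in 0..k-1
    return population[start::k][:n]

population = list(range(1, 101))                 # 100 ordered items (hypothetical)
sample = systematic_sample(population, n=10, seed=1)
print(sample)
```

Here K = 100 / 10 = 10, so consecutive sampled items are always 10 positions apart.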

3) Stratified:

The population is broken down into smaller sub-populations called
Strata, then any of the sampling methods is applied within each
stratum to create a sample
Usually we apply simple random sampling to choose items from
each stratum
The strata can be chosen in terms of common characteristics but
we need to make sure that none of the strata overlap
The main advantage of this method is that the minorities in the
population are also represented
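
A minimal sketch of stratified sampling with simple random sampling inside each stratum. Taking the same fraction from every stratum (and at least one item) is an assumption made for this illustration; the notes do not fix an allocation rule. The student data is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)    # each item lands in exactly one stratum
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))   # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 90 undergraduates and 10 postgraduates.
students = [("UG", i) for i in range(90)] + [("PG", i) for i in range(10)]
sample = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=7)
print(sample)
```

Even though postgraduates are only 10% of this population, the sample is guaranteed to contain one of them.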

4) Cluster:

In this type of sampling, the population is again divided into
non-overlapping groups called clusters, each of which is ideally
a miniature of the population (in stratified sampling we broke
down the population based on common traits)
Then entire clusters are selected randomly to be part of the
Sample

4.5) (i) One Stage Cluster Sampling:

In this, clusters are made -> random clusters are selected ->
entire clusters are put in the sample

4.5) (ii) Two Stage Cluster Sampling:

In this, clusters are made -> random clusters are selected ->
random items from the randomly chosen clusters are selected ->
put in the sample
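
The two variants can be sketched with one function. The classroom clusters, the function name and the `per_cluster` parameter are hypothetical and only serve the illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """One-stage if per_cluster is None, otherwise two-stage."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)     # stage 1: pick whole clusters
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                      # one-stage: keep every member
        else:
            sample.extend(rng.sample(members, per_cluster))   # stage 2: sub-sample
    return chosen, sample

# Hypothetical clusters: 5 classrooms of 20 students each.
clusters = {f"room{r}": [f"room{r}-s{i}" for i in range(20)] for r in range(5)}
_, one_stage = cluster_sample(clusters, n_clusters=2, seed=3)
_, two_stage = cluster_sample(clusters, n_clusters=2, per_cluster=5, seed=3)
print(len(one_stage), len(two_stage))   # 40 10
```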

Strata v/s Clusters:

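Every stratum is represented in the sample, since each stratum
contributes one or more elements to the sample
Only the selected clusters are represented in the sample, since
clusters that are not chosen contribute no elements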

Non-Probability Sampling:
In this type of sampling, not every item has a known, non-zero
chance of getting selected in the sample.
This means some items might have a 0 probability of getting
selected; these items are usually referred to as Out of
coverage/Undercovered items
It involves the selection of elements based on assumptions
regarding the population of interest
It gives a more biased sample

Non-Probability Samples are of four types: Convenience, Judgement,
Snowball and Quota

1) Convenience Sampling:

Sometimes it is also known as grab or opportunity or haphazard
or accidental sampling
The sample is drawn from the part of the population which is
close at hand to the researcher
The researcher cannot generalise about the population from this,
as the sample will not be representative enough of the
population

2) Judgement Sampling:

Also known as Purposive sampling
In this type of sampling, the researcher chooses the sample based
on what they think is best for the research
This is used when a limited number of people have expertise in
the area being researched
It is not a scientific way of sampling data

3) Snowball Sampling:

In this type of sampling, survey subjects are selected based on
referral from other survey respondents
This method is effective when the sampling frame is difficult to
identify

4) Quota Sampling:
In this sampling, sample elements are selected until the Quota
controls are satisfied
The population is first divided into mutually exclusive
sub-groups, as in Stratified sampling, and then judgement sampling
is used to select items from each segment based on a specific
proportion

Sample Statistic v/s Population Parameter

Sample Statistic:
It is a piece of information that you get from a small part
of the population or sample
It can also be called the statistic computed from sample
data
Ex: Sample average, median etc
Population Parameter:
A statistical measure for a given population
It refers to the entire population
Ex: mean and variance of the population

Errors in Sampling
1. Sampling Error or random error
The discrepancy between a sample statistic and its
population parameter is called sampling error
It occurs because only a sample, rather than the whole
population, is observed, so the sample is never perfectly
representative of the population
2. Non-Sampling or Systematic Error
Occurs during data collection, causing the data to differ
from the true values
Sampling Bias:
This bias occurs when the sample is not representative of
the population
It can be either Selection bias or Non-Response Bias
Selection Bias:
It is a bias in which the samples are chosen in such
a way that some members of the intended population
have a higher or lower sampling probability than
others
Non-Response Bias:
This occurs when certain groups of the population are
absent from the sample, for example because selected
members do not respond

Types of Data:
NOIR => Nominal Ordinal Interval Ratio

1. Qualitative (N-O):
Measurements that cannot be recorded on a naturally
occurring numerical scale
This information can be grouped into categories but not
measured by number
Nominal and Ordinal
Nominal:
Data that can be categorized without any natural order
Ex: Gender (Male, Female), Colours(Red, Green and Blue)
Ordinal:
Data has a natural order, but does not have a regular
interval between them
Ex: Grades (A,B,C), Satisfaction Levels
2. Quantitative (I-R):
These are measurements that can be recorded on a naturally
occurring numerical scale
This data is readily open to statistical analysis and can be
plotted on various graphs
Discrete, Continuous, Interval and Ratio
Discrete:
If the values of the set are distinct and separate then
it is said to be discrete data
Usually bar charts are used to display this data
It has a countable number of possible values
Continuous:
If the values in the set can take any value within an
interval, it is said to be continuous data
Interval:
In this data type, data is measured along a scale with
regular intervals
Although they are placed at regular intervals, they do
not have a meaningful zero
Ex: Temperature in Celsius (0 Celsius does not mean "No
temperature")
Ratio:
Similar to interval data but it has a meaningful zero
point
It is the most precise type of data and allows for all
statistical techniques
Ex: Weight (0 means no weight)

Variables and Attributes

Variables are placeholders that can hold any type of data
Attributes refer to characteristics or properties of an entity.
For example, in a dataset of children, name, age and sex can be
the attributes
Types of variables are the same as the types of data
The type of an attribute depends on the properties it possesses
Nominal - Distinctiveness
Ordinal - Distinctiveness and Order
Interval - Distinctiveness, Order and Addition
Ratio - Distinctiveness, Order, Addition and Multiplication

Types of Studies:
Observational:
No interference from the researcher, subjects are just
observed
Experimental:
The researcher intervenes to perform an experiment and then
observations are made
Usually, an experimental study is carried out with two groups,
namely Control and Experimental
The control group receives no intervention from the
researcher, so it is not exposed to the independent
variable and serves as a baseline
The experimental group is the group on which the
experiment is conducted, and where the effect of the
independent variable is observed

Types of Statistics:
Descriptive Statistics:
Involves organization, summarization and display of data
It uses numerical and graphical methods to look for patterns
in a data set, to summarize the given data and to display
the data in a convenient form
Inferential Statistics:
Involves using a sample to infer something about the
population
It uses sample data to make estimates, generalizations,
decisions and predictions about a larger set of data
There are two main areas of Inferential Statistics:
Estimating Parameters:
This means taking a sample statistic (from sample
data) and using it to say something about a
population parameter
Hypothesis Testing:
This is where sample data is used to answer research
questions

Descriptive Statistics:
Measures of central tendency:
There are 3 different types of average: mean, median and mode
All of them summarize where the centre of the data is

Mean:
It is the arithmetic average of the given data
Population mean

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

N -> Population Size
x_i -> Measurement Values

Sample Mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

n -> Sample Size
x_i -> Measurement Values

Weighted mean:
It is the average where some of the elements contribute more to
the mean value

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

w -> Weights of the data
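
A minimal sketch of the mean and the weighted mean. The marks and credit weights are made-up values used only to show the arithmetic.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

marks = [80, 90, 70]        # hypothetical scores
credits = [4, 3, 1]         # hypothetical weights

print(mean(marks))                    # 240 / 3 = 80.0
print(weighted_mean(marks, credits))  # (320 + 270 + 70) / 8 = 82.5
```

The weighted mean is pulled towards the values with the larger weights.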

Trimmed Mean:

It is computed by arranging the sample data in order, trimming
an equal number of values from both ends and then finding the
mean of the remaining values
If p% of the data is trimmed from each end then the mean is
called the p% trimmed mean
If the sample size is denoted by n and a p% trim is required then
the number of data points to be removed from each end is

$$\frac{np}{100}$$

This mean is used to reduce the effect of outliers on the mean
This method is suited to data that is heavily skewed or has
erratic deviations
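
A minimal sketch of the p% trimmed mean, assuming p is below 50 and that np/100 is rounded down when it is not a whole number (the notes do not say how to round). The data values are made up.

```python
def trimmed_mean(values, p):
    """p% trimmed mean: drop n*p/100 points from each end, then average the rest."""
    data = sorted(values)
    cut = int(len(data) * p / 100)        # points removed from EACH end (rounded down)
    kept = data[cut:len(data) - cut]
    return sum(kept) / len(kept)

data = [8, 11, 5, 9, 7, 6, 3616, 10, 12, 4]   # one extreme outlier
print(sum(data) / len(data))    # plain mean: 368.8
print(trimmed_mean(data, 10))   # 10% trimmed mean: 8.5
```

Removing one value from each end discards the outlier 3616, so the trimmed mean stays close to the bulk of the data.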

Median:

It is the value separating the upper half of the values from the
lower half
It is the middle number of a sorted sample
To find the median we first arrange the data in ascending order
then,
If the no. of elements is odd then the median is the value of the
(n+1)/2 th term
If the no. of elements is even then the median is the average of
the (n/2) th and (n/2 + 1) th terms
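
A minimal sketch of the median. For an odd count it returns the middle term; for an even count it averages the two middle terms, i.e. the (n/2) th and (n/2 + 1) th terms of the sorted data.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # the (n+1)/2 th term, counting from 1
    return (data[mid - 1] + data[mid]) / 2     # terms n/2 and n/2 + 1, counting from 1

print(median([7, 3, 9]))       # 7
print(median([7, 3, 9, 5]))    # (5 + 7) / 2 = 6.0
```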

Mode:

It is the value that occurs the most number of times in the
sample data
If a sample has only one distinct mode it is called Unimodal
If it has 2 distinct modes -> Bimodal
If it has more than 2 distinct modes -> Multimodal
If all the values occur about equally often, so that no value
stands out as the mode -> Uniform
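
A minimal sketch that returns every value tied for the highest frequency, so the number of modes can be read off directly. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """All values tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))          # [2]        -> unimodal
print(modes([1, 1, 2, 3, 3]))       # [1, 3]     -> bimodal
print(modes([1, 1, 2, 2, 3, 3]))    # [1, 2, 3]  -> every value equally frequent
```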

Empirical Formula

mean − mode ≈ 3 ∗ (mean − median), for moderately skewed data

Skewness:
It is the measure of asymmetry of the distribution about its
mean
Skewness can be positive, negative, zero or undefined
A symmetric distribution is one where the left and right sides
of the distribution are balanced. The mean, median and mode are
the same
A skewed distribution is one where the left and right sides of
the distribution are imbalanced. The mean and median are pulled
further towards the longer tail than the mode.
mean < median < mode -> Left Skewed
mode < median < mean -> Right Skewed
mean = median = mode -> Symmetric

Measures of Spread/Deviation:
It tells us how much the data is spread out, or how
homogeneous/heterogeneous the data is
There are two main ways to measure the spread of data
Absolute:
Has the same unit as the data and is usually based on the
deviations of the observations, such as the standard
deviation etc
Relative:
This is unit-free and is used to compare the deviation of
two or more data sets

Range:

Most common and easily understandable
Difference b/w Max and Min of the data set
It can sometimes be misleading as it is affected a lot when the
outlier values are very big
Example : Consider {8,11,5,9,7,6,3616}. Although all the other
values lie between 5 and 11, the range is 3616 - 5 = 3611, which
does not accurately tell the spread of the dataset
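
The arithmetic of this example can be checked with a short sketch. Note that the minimum of the set is 5, so the range works out to 3616 - 5 = 3611.

```python
data = [8, 11, 5, 9, 7, 6, 3616]

data_range = max(data) - min(data)
print(min(data), max(data), data_range)   # 5 3616 3611

# Dropping the single outlier shows how much it dominated the range.
without_outlier = sorted(data)[:-1]
range_without_outlier = max(without_outlier) - min(without_outlier)
print(range_without_outlier)              # 11 - 5 = 6
```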

Percentile:
A percentile is a comparison measure between a particular value
and the rest of the values in the dataset
For example, if you have scored 75 marks and are ranked in the
85th percentile, that means that 75 marks is greater than 85% of
the scores
The percentile rank is calculated using

$$R = \left(\frac{P}{100}\right)(n + 1)$$

n -> Sample Size

The pth percentile of a sample divides the distribution such
that
p% of the sample values are less than the pth percentile
(100 - p)% of the sample values are greater than the pth
percentile
Steps to calculate the pth percentile:
1. Order the sample in ascending order
2. Compute the rank R
3. If the rank is an integer then the sample value in this
position is the pth percentile
4. Otherwise it is the average of the sample values at the
integer positions just below and just above the rank
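
The steps above can be sketched as follows, assuming the rank R = (P/100)(n + 1) lands inside the sample (between 1 and n). The scores are made-up values and the function name is illustrative.

```python
def percentile(values, p):
    """p-th percentile using the rank R = (p/100) * (n + 1)."""
    data = sorted(values)                       # step 1: ascending order
    n = len(data)
    rank = p / 100 * (n + 1)                    # step 2: position, counting from 1
    if rank < 1 or rank > n:
        raise ValueError("rank falls outside the sample for this p and n")
    lower = int(rank)
    if rank == lower:
        return data[lower - 1]                  # step 3: integer rank, take that value
    return (data[lower - 1] + data[lower]) / 2  # step 4: average the two neighbours

scores = [35, 40, 45, 50, 55, 60, 65, 70, 75]   # n = 9 (hypothetical)
print(percentile(scores, 50))   # rank 5.0 -> 55
print(percentile(scores, 25))   # rank 2.5 -> (40 + 45) / 2 = 42.5
```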

Quartile:
In this method the distribution is divided into 4 equal parts
The values must first be arranged in ascending order
Position of Q1 = 0.25(n + 1)
Position of Q2 = 0.5(n + 1), which is also the median
Position of Q3 = 0.75(n + 1)
If a position is an integer we directly take the value at that
position; if not, we take the average of the preceding and
succeeding values

InterQuartile Range:

It is the distance/range between the 75th Percentile and the
25th Percentile
IQR = Q3 - Q1
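
A minimal sketch of the quartiles and the IQR using the positions 0.25(n + 1), 0.5(n + 1) and 0.75(n + 1). It assumes the sample is large enough for all three positions to fall inside it; the helper names and data are illustrative.

```python
def value_at_position(sorted_data, pos):
    """Value at a position counted from 1; average the neighbours if pos is fractional."""
    lower = int(pos)
    if pos == lower:
        return sorted_data[lower - 1]
    return (sorted_data[lower - 1] + sorted_data[lower]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) of the data."""
    data = sorted(values)
    n = len(data)
    return tuple(value_at_position(data, q * (n + 1)) for q in (0.25, 0.5, 0.75))

data = [2, 4, 6, 8, 10, 12, 14]     # n = 7, so the positions are 2, 4 and 6
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3)    # 4 8 12
print(iqr)           # 8
```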

Variance:
It is the measure of how spread the values are from the
centre/average of the distribution

$$\text{Sample Variance, } s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

n -> Sample Size

$$\text{Population Variance, } \sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

N -> Population Size

Steps to calculate:
1. Find the mean of the given data
2. Subtract the mean from each data value to get the deviations
3. Square each deviation found in step 2
4. Add all the squared deviations and divide by N for a
population and by n-1 for a sample

Standard Deviation:
Standard deviation = sqrt(Variance)
Larger the standard deviation, greater the spread
Just like mean, std. deviation is affected by outlier values
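
A minimal sketch of the variance steps and the standard deviation. The `sample` flag is an illustrative parameter that switches the divisor between n - 1 and N; the data values are made up.

```python
import math

def variance(values, sample=True):
    """Mean, deviations, squared deviations, then divide by n - 1 or N."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n     # n - 1 for a sample, N for a population
    return sum(squared_deviations) / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical values, mean = 5
print(variance(data, sample=False))              # 32 / 8 = 4.0
print(math.sqrt(variance(data, sample=False)))   # 2.0
print(variance(data, sample=True))               # 32 / 7
```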

Chebyshev's Inequality:
This states that at least 1 - (1/k^2) of the data from a sample
must fall within k standard deviations of the mean, for any k > 1
Example: For k = 2, we have 1 - (1/k^2) = 1 - 1/4 = 3/4 = 75%.
According to Chebyshev's Inequality at least 75% of the data
should lie within 2 standard deviations of the mean
The inequality is usually written as

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

In questions we will mostly use the rephrased version which is

$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
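
A minimal sketch that compares the Chebyshev bound with the share of values actually lying within k standard deviations of the mean. The data set and the function names are illustrative; the population standard deviation is used.

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)

data = [8, 11, 5, 9, 7, 6, 3616]     # skewed data with one outlier
print(chebyshev_bound(2))            # 0.75
print(fraction_within(data, 2))      # 6 of the 7 values, about 0.857
```

The bound holds for any data set, which is why it is usually conservative: here 6 of 7 values are inside, above the guaranteed 75%.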

Do the questions on slides 367 - 369 in Unit 1 Combined Slides

Sampling Distribution:
It is a probability distribution of a sample statistic like
sample mean etc taken from different samples from the same
population
It is used to estimate population parameters
If X1...Xn is a simple random sample from a population with mean
μ and variance σ², then the sample mean X̄ is a random variable
with

$$\mu_{\bar{x}} = \mu$$

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
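
These results can be checked with a small simulation. This is only a sketch under stated assumptions: the population is 10,000 made-up uniform values, samples of size 25 are drawn with replacement, and 2,000 samples are taken.

```python
import random
import statistics

rng = random.Random(0)
population = [rng.uniform(0, 100) for _ in range(10_000)]   # hypothetical population
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25
# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
print(mu, mean_of_means)            # the two values should be close
print(sigma / n**0.5, sd_of_means)  # likewise
```

The mean of the sample means lands near μ, and their standard deviation lands near σ/√n.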

Central Limit Theorem:

Point Estimate:
A quantity calculated from the data is called a statistic, and
the quantity used to estimate an unknown constant or parameter
is called Point Estimator
Properties of Point Estimator:
Bias:
The difference between the expected value of the estimator
and the value of the parameter being estimated; an estimator
is unbiased when the two are equal
Consistency:
This describes whether the point estimator gets closer to
the true value as the sample size increases
Efficiency:
A very efficient point estimator should have the following
Least Variance
Least Bias
Consistent
Goodness Measure of a Point Estimator - Mean Squared Error
Method to construct a Point Estimator - Maximum Likelihood
Estimate

Mean Squared Error:

MSE combines both bias and uncertainty (variance)
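
The standard way this combination is written is shown below, as a reference sketch. The symbols are assumptions not used elsewhere in these notes: θ is the parameter being estimated and θ̂ is its point estimator.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big(\mathrm{Bias}(\hat{\theta})\big)^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
```

For an unbiased estimator the bias term is zero, so the MSE equals the variance.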

Maximum Likelihood Estimate:

Refer from page 69 of vibha notes (combined) in downloads

Next : _MCSE_/UNIT 2
