Stat 1

The document discusses statistics and sampling theory. It defines key concepts in statistics such as population, sample, parameter, and statistic. It also covers different types of data scales, sources of data, and sampling methods. Probability and non-probability sampling techniques are explained.

Uploaded by

yodahekahsay19
Copyright
© All Rights Reserved

CHAPTER ONE

1.1 Definition of statistics

 What is Statistics?
Statistics:-
 It is a field of study concerned with summarizing data,
interpreting data, and making decisions based on data.
 It refers to the methodology for collecting, classifying,
summarizing, organizing, presenting, analysing and
interpreting numerical information.
 In the broadest sense, “statistics” refers to a range of
techniques and procedures for analysing, interpreting,
displaying, and making decisions based on data.
 A quantity calculated in a sample to estimate a value in a
population is called a "statistic".
07/04/2024
1.2. Structure or Classification of Statistics

Descriptive Statistics (DS)


 Descriptive statistics are numbers that are used to summarize and describe data
without any attempt to infer from the data.
 Descriptive statistics are just descriptive.
 They do not involve generalizing beyond the data at hand.
 Descriptive statistics is a collection of methods for summarizing data (e.g., mean,
median, mode, range, variance, graphs).
 The word “data” refers to the information that has been collected from an experiment, a
survey, a historical record, etc.
 (By the way, “data” is plural. One piece of information is called a “datum.”)
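The summary measures listed above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the list `marks` is hypothetical data made up for illustration and does not come from the slides.

```python
import statistics

# Hypothetical exam marks, used only for illustration.
marks = [45, 52, 52, 60, 61, 65, 70]

summary = {
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    "range": max(marks) - min(marks),
    # Sample variance (divides by n - 1).
    "sample variance": statistics.variance(marks),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each value only describes the seven marks at hand; nothing is inferred about a wider group of students.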

 A sample is typically a small subset of the population.


 In statistics, we often rely on a sample (that is, a small subset of a larger set
of data) to draw inferences about the larger set.
 The larger set is known as the population from which the sample is drawn.
 Example: You have been hired by the National Election Commission to
examine how the Ethiopian people feel about the fairness of the voting
procedures in Ethiopia. Who will you ask?
1.3. Applications of Statistics
 It is applied in marketing, e-commerce, banking, finance, human resources,
production, and information technology.
 In addition, this mathematical discipline has been a prominent part of research and
is widely used in data mining, medicine, aerospace, robotics, psychology, and
machine learning.
 It is also important in economics, government, and the public sector, where
statistical data are a significant part of decision-making.
 For example, it is used for public surveys, weather forecasts, sports scoring, and
budgeting.
 In business and finance, statistical data facilitate decision-making. For instance, a
watch manufacturing company can use statistical tools to determine the percentage
of defective watches in every lot.
 At a macro level, it helps in understanding a country’s financial state and
measuring economic growth. At a micro level, statistics helps analysts determine a
company’s business income, earnings, and revenue-generating capacity.
 Whether in preparing budgets, making financial forecasts, or monitoring the
performance of a company or a country, statistics is everywhere.
1.4. Types of Data (Scale of measurement)
Types of scales or data based on their measurement scales
Four fundamental scales:
1. Nominal 2. Ordinal 3. Interval 4. Ratio
1. Nominal Scale:
 names or categories.
 examples include:
gender
handedness
favourite colour
religion
 The essential point about nominal scales is that they do not imply any
ordering among the responses.
 For example, when classifying people according to their favourite colour,
there is no sense in which green is placed “ahead of” blue.
 Responses are merely categorized.
2. Ordinal Scales
 Ordinal:
 names or categories
 order is meaningful
 Examples include:
 consumer satisfaction ratings
 military rank
 class ranking

3. Interval Scales
 Interval: names or categories, the order is meaningful, and intervals have
the same interpretation.
 Example: Celsius temperature scale
 Problem: there is no true zero point (0 °C does not mean "no temperature"),
so ratios of values are not meaningful.
4. Ratio Scales
Ratio:
 The highest and most informative scale. It has the qualities of the nominal,
ordinal, and interval scales, with the addition of an absolute zero point.
 You can think of a ratio scale as the three earlier scales rolled up in one.
 Like a nominal scale, it provides a name or category for each object (the
numbers serve as labels). Like an ordinal scale, the objects are ordered (in terms
of the ordering of the numbers). Like an interval scale, the same difference at
two places on the scale has the same meaning.
 In addition, the same ratio at two places on the scale also carries the same
meaning.
 Example: the amount of money you have in your
pocket right now (25 cents, 55 cents, etc.).
 Money is measured on a ratio scale because, in addition to having the properties
of an interval scale, it has a true zero point: if you have zero money, this implies
the absence of money.
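The difference between an interval scale and a ratio scale can be checked numerically. The sketch below is an illustration added here, not part of the original slides: it compares two Celsius readings with the same readings on the Kelvin scale, which (unlike Celsius) has a true zero. The two temperatures are arbitrary example values.

```python
# Two temperatures recorded on the Celsius (interval) scale.
c_low, c_high = 10.0, 20.0

# The same temperatures on the Kelvin scale, which has a true zero point.
k_low, k_high = c_low + 273.15, c_high + 273.15

# Differences agree on both scales: intervals are meaningful.
diff_celsius = c_high - c_low
diff_kelvin = k_high - k_low

# Ratios disagree: "twice as hot" is an artefact of the arbitrary Celsius zero.
ratio_celsius = c_high / c_low
ratio_kelvin = k_high / k_low

print(f"difference: {diff_celsius:.2f} (Celsius) vs {diff_kelvin:.2f} (Kelvin)")
print(f"ratio:      {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```

The differences match while the ratios do not, which is why ratios are only interpreted on scales with an absolute zero, such as money, weight, or Kelvin temperature.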
1.5. Sources of Data
 A data source is the place from which the data being used originate.
 The following are the two sources of data:
 Internal sources
• When data are collected from the reports and records of the organisation itself,
the source is known as an internal source.
• For example, a company publishes its annual report on profit and loss, total
sales, loans, wages, etc.
 External sources
• When data are collected from sources outside the organisation, the source is
known as an external source.
• For example, if a tour and travel company obtains information on Tigray
tourism from the Tigray Tourism office, this is an external source of
data.
 The information collected from internal sources is called “primary data,” while
the information gathered from outside references is called “secondary data.”

Types of Data Based on their source
A) Primary data
• Primary data means first-hand information collected by an investigator.
• It is collected for the first time.
• It is original and more reliable.
• For example, the population census conducted by the government of
Ethiopia every ten years is primary data.
B) Secondary data
• Secondary data refers to second-hand information.
• It is not collected by the investigator but is obtained from already published
or unpublished sources.
• For example, the address of a person taken from the telephone
directory or the phone number of a company taken from Justdial are
secondary data.
CHAPTER 2: SAMPLING THEORY
 Sampling: The process or method of sample selection from the
population.
2.1. Basic Concepts
Definitions:
1. Parameter: Characteristic or measure obtained from a population.
2. Statistic: Characteristic or measure obtained from a sample.
3. Sampling: The process or method of sample selection from the
population.
4. Sampling unit: the ultimate unit to be sampled, that is, an element of the
population to be sampled.
Examples: - If somebody studies the socio-economic status of
households, the household is the sampling unit.
- If one studies the performance of freshman students in some college, the
student is the sampling unit.
5. Sampling frame: the list of all elements in a population.
6. Errors in sample surveys:
 There are two types of errors:
• a) Sampling error: the discrepancy between the
population value and the sample value. It may arise from
inappropriate sampling techniques.
• b) Non-sampling errors: errors due to procedural
bias, such as:
- Incorrect responses
- Measurement errors
- Errors at different stages of data processing.
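To make the terms parameter, statistic, and sampling error concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the population of household incomes is generated at random, and the sizes (1,000 households, a sample of 50) are arbitrary. Note that even a properly drawn random sample usually differs from the population value by chance.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: monthly incomes of N = 1,000 households.
population = [rng.randint(2_000, 20_000) for _ in range(1_000)]
parameter = statistics.mean(population)   # characteristic of the population

# A sample of n = 50 households drawn from that population.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)       # characteristic of the sample

# Sampling error: discrepancy between the sample value and the population value.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.1f}")
print(f"statistic (sample mean):     {statistic:.1f}")
print(f"sampling error:              {sampling_error:.1f}")
```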
2.2. Reasons for Sampling
 The Need for Sampling
 Costs/economy
 Timeliness
 Large size of many populations
 Inaccessibility of the entire population
 Destructive nature of many tests
 Reliability or accuracy.
2.3. A Review of Methods of Sampling
 The following two methods are used to collect information about
the population:
 Census: each and every element or unit of the population is
studied.
 Sampling: a small part of the population is selected for study.
 There are two principal methods of drawing a sample from a
population (techniques of sampling).
A. Probability Sampling: each unit in the population
has a known, nonzero chance of being selected to become part of the sample.
Simple random sampling, stratified sampling, cluster sampling and
systematic sampling are probability sampling techniques.
B. Non-probability Sampling: there is no way of estimating the probability
that each individual will be included in the sample. Quota sampling
and judgmental sampling are examples of non-probability sampling.
A. Probability Sampling:
1. Simple Random Sampling: a method of probability sampling in which
every unit in the population has an equal, nonzero chance of being selected as part
of the sample. In other words, each element of the population has an equal and
independent chance of being included in the sample. For a sample of size n drawn
from a population of size N, this probability is n/N.
How to select a simple random sample?
i. Lottery method
ii. Random number method
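The random number method can be sketched with Python's standard `random` module. The population size N = 100 and sample size n = 20 below are assumed values chosen for illustration, and the sampling frame is simply the unit numbers 1 to N.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

N = 100   # population size (assumed for illustration)
n = 20    # sample size (assumed for illustration)
sampling_frame = list(range(1, N + 1))   # units numbered 1..N

# Draw n distinct units; every unit has the same chance of being included.
sample = rng.sample(sampling_frame, n)
inclusion_probability = n / N

print(sorted(sample))
print(f"inclusion probability n/N = {inclusion_probability}")
```

`random.sample` draws without replacement, which corresponds to drawing numbered slips in the lottery method without putting them back.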
2. Stratified sampling:
 It is a three-step process.
Step 1- Divide the population into a number of groups, usually termed
'strata', which differ from one another but are each homogeneous within
themselves (e.g., by income level, sex, education level, etc.);
Step 2- Select an independent simple random sample from each stratum;
Step 3- Form the final sample by consolidating all sample elements chosen in step 2.
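The three steps can be sketched as follows. This is only an illustration under stated assumptions: the population, the stratum sizes, and the 10% sampling fraction (proportional allocation) are all hypothetical, and the slides do not prescribe how many units to take from each stratum.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of (person_id, income_level) pairs.
stratum_sizes = {"low": 150, "middle": 100, "high": 50}
population = []
for level, size in stratum_sizes.items():
    for _ in range(size):
        population.append((len(population), level))

# Step 1: divide the population into strata by income level.
strata = {}
for person_id, level in population:
    strata.setdefault(level, []).append(person_id)

# Step 2: draw an independent simple random sample from each stratum.
# Assumption: proportional allocation with a 10% sampling fraction.
sampling_fraction = 0.10
stratum_samples = {
    level: rng.sample(members, round(sampling_fraction * len(members)))
    for level, members in strata.items()
}

# Step 3: consolidate the stratum samples into the final sample.
final_sample = [pid for chosen in stratum_samples.values() for pid in chosen]

for level, chosen in stratum_samples.items():
    print(f"{level}: {len(chosen)} of {len(strata[level])} selected")
print(f"final sample size: {len(final_sample)}")
```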

3. Systematic Sampling: In systematic sampling only one random number is
needed throughout the entire sampling process.
 To use systematic sampling, a researcher needs:
[i] a sampling frame of the population;
[ii] a skip interval (K), calculated as K = N/n, where N is the population
size and n is the required sample size.
 The first element (number), which is between 1 and K, is determined using
simple random sampling, and then the next items are selected using the skip
interval.
 For instance, the jth unit is selected first, and then the (j+K)th, (j+2K)th, …, etc.
until the required sample size is obtained.
 Example: if a lecturer wants to randomly select 20 students from a class of 100
students using systematic sampling, K = 100/20 = 5. She can take the first element
between 1 and 5 using simple random sampling and then select every 5th element
starting from that element.
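The lecturer example (20 students out of 100) can be sketched as below. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size so that the skip interval is a whole number.

```python
import random

def systematic_sample(frame, n, rng):
    """Select n units from the sampling frame using a fixed skip interval."""
    N = len(frame)
    K = N // n                 # skip interval (assumes N is a multiple of n)
    j = rng.randint(1, K)      # the single random number: a start between 1 and K
    # Take units j, j + K, j + 2K, ... (1-based positions in the frame).
    return [frame[j - 1 + i * K] for i in range(n)]

rng = random.Random(3)               # fixed seed so the run is repeatable
students = list(range(1, 101))       # a class of 100 students, numbered 1..100
sample = systematic_sample(students, 20, rng)

print(sample)
```

Only the starting point is random; every later selection follows from it by adding the skip interval of 5.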
4. Cluster Sampling: a type of sampling which
involves dividing the population into groups (or clusters). Then, one or
more clusters are chosen at random and the individuals within the chosen
clusters are sampled.
 A two-step process:
Step 1- The defined population is divided into a number of mutually exclusive
and collectively exhaustive heterogeneous groups or clusters;
Step 2- Select a simple random sample of clusters.
 One special type of cluster sampling is called area sampling, a common form
in which the clusters consist of geographic areas, such as districts, housing
blocks or townships.
 Area sampling could be one-stage, two-stage, or multi-stage.
• How to Take an Area Sample Using Subdivisions
• Step 1: Determine the geographic area to be surveyed, and identify its
subdivisions. Each subdivision cluster should be highly similar to all
others. For example, choose ten housing blocks within 2 kilometers of
the proposed site [say, Model Town] for your new retail outlet; assign
each a number.
• Step 2: Decide on the use of one-stage or two-stage cluster sampling.
Assume that you decide to use two-stage cluster sampling.
• Step 3: Using random numbers, select the housing blocks to be
sampled. Here, you select 4 blocks randomly, say numbers #102,
#104, #106, and #108.
• Step 4: Using some probability method of sample selection, select the
households in each of the chosen housing blocks to be included in the
sample. Identify a random starting point (say, apartment no. 103).
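The two-stage procedure above can be sketched as follows. The details are assumptions made for illustration: the ten blocks are numbered 101 to 110, each block is given 40 households, and the second stage uses simple random sampling of 8 households per block (the slides leave the second-stage method open).

```python
import random

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical frame: ten housing blocks numbered 101..110,
# each containing 40 households labelled "<block>-<household>".
blocks = {
    block: [f"{block}-{h:02d}" for h in range(1, 41)]
    for block in range(101, 111)
}

# Stage 1: select 4 of the 10 blocks at random.
chosen_blocks = rng.sample(sorted(blocks), 4)

# Stage 2: within each chosen block, select households by simple random
# sampling (an assumed choice; any probability method could be used here).
households_per_block = 8
sample = {
    block: rng.sample(blocks[block], households_per_block)
    for block in chosen_blocks
}

for block, households in sample.items():
    print(block, households)
```

In one-stage cluster sampling, the second step would be skipped and every household in the chosen blocks would be surveyed.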
B. Non-probability sampling:
 Non-probability sampling: The selection of units in the sample from the
population is not governed by the laws of probability.
 For example, the units are selected on the basis of the personal judgment of
the surveyor.
 Persons volunteering to take some medical test or to drink a new type of
coffee also constitute a non-random sample.
Four main types
i. Convenience sampling: the sample is drawn at the convenience of the researcher.
 Common in exploratory research.
ii. Judgmental (purposive) sampling: the researcher deliberately selects certain
units from the universe on the basis of his or her judgment.
 Common in commercial marketing research projects.
iii. Quota sampling: in this method, the decision maker requires the sample to
contain a certain number of items with a given characteristic.
 Individuals are selected from each quota.
iv. Snowball sampling: if the population is hard to access, snowball sampling
can be used, in which existing participants refer the researcher to further
members of the population.
3. DATA COLLECTION AND PRESENTATION
3.1. Data types and sources
Definition of Data: data refer to the numerical description of the quantitative
aspects of things.
 In other words, data are the facts and figures that are collected,
summarized, analysed, and interpreted.
 The singular form of data is datum.
Types of data and data measurements
 Data can be divided into quantitative and qualitative.
 Qualitative data are data that can be placed into distinct categories,
according to some characteristics or attributes. Qualitative data could be
nominal or ordinal. Examples of qualitative data are gender (male and
female), income categories (low, middle, high), education level (diploma,
degree, masters, PhD), etc.
 Quantitative data are numerical, can be ordered or ranked, and
support meaningful arithmetic operations.
 Examples of quantitative data are age, height, weight, income, expenditure, etc.
3.2. Methods of data collection (from respondents)
 A number of methods can be used to gather data from the so-called
'respondents'.
Methods of data collection
i. Primary data
 Interview
 Questionnaire
 Observation
 Focus Group Discussions (FGDs)
 Key Informant Interview
ii. Secondary data
 Published and unpublished documents
 Journals, Magazines and Newspapers
 Bureau of Statistics
 The choice of a data collection method depends on:
i) Nature and scope of the problem
ii) Cost, time and resources
iii) Degree of accuracy desired
Data presentation and frequency distribution
1. Tabular Method: Once data are generated from a representative
sample using appropriate data collection tools or obtained from
secondary sources, they must be presented in a meaningful manner.
Ungrouped data
 Example: The marital status of 60 adults classified as single, married,
divorced and widowed is given below:

Grouped data: frequency distribution


 Note, however, that when the number of possible values of our
variable is very large, the discrete frequency distribution will no longer
be a condensed presentation.
 The data then have to be treated as continuous and distributed into
classes.
Some concepts and terminologies

 Class Frequency (or frequency): refers to the number of items that
belong to a class or number of observations in a particular class.
 Class limits (C.L.): the lowest and highest values that can be
included in a class, stated such that there is a gap between successive classes.
 The lower class limit (L.C.L.) of a class is a value such that no lower
value can fall into that class, whereas the upper class limit (U.C.L.)
of a class is a value such that no higher value can fall into that class.
 Class Boundary (C.B.) or Real class limits: class boundaries are the
lowest and the highest values in each class when there is no gap
between successive classes.
 To work with the distribution of a variable as if it were continuous, we
make use of these real class limits (also known as class boundaries).
 Class width (w) or class interval: The class width for a class in a
frequency distribution is found by subtracting the lower (or upper)
class limit of one class from the lower (or upper) class limit of the
next class.
 For example, the class width of the above example is seven (7). That
is, 31-24=7 or 37-30=7.
 Class Mark (C.M.) or class midpoint: The class mark is the midpoint
of the class interval, that is, the value which lies midway between
the lower and upper limits of the class. It is obtained as:
C.M. = (L.C.L. + U.C.L.) / 2
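As a worked illustration of class width, class boundaries, and class mark, the sketch below uses two successive classes with limits 24-30 and 31-37. These limits are inferred from the arithmetic in the width example (31-24 and 37-30), so treat them as an assumption. The boundaries are found by the usual convention of splitting the gap between successive classes in half.

```python
# Two successive classes with limits 24-30 and 31-37 (whole-number data).
class_limits = [(24, 30), (31, 37)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]            # 31 - 30 = 1

# Class width: lower limit of the next class minus lower limit of this class.
class_width = class_limits[1][0] - class_limits[0][0]    # 31 - 24 = 7

summary = []
for lower, upper in class_limits:
    # Class boundaries: extend each limit by half the gap.
    boundaries = (lower - gap / 2, upper + gap / 2)
    # Class mark: midpoint of the lower and upper class limits.
    class_mark = (lower + upper) / 2
    summary.append((boundaries, class_mark))
    print(f"limits {lower}-{upper}: boundaries {boundaries}, class mark {class_mark}")

print(f"class width: {class_width}")
```

The upper boundary of the first class equals the lower boundary of the second (30.5), so there is no gap between successive classes.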
Practical steps in constructing continuous frequency distribution
1. Decide the number of classes (k): Select the number of classes desired,
usually between 5 and 20, or use Sturges' rule k = 1 + 3.322 log10(n), where
"k" refers to the number of desired classes and "n" is the total number
of observations.
 Example: If n = 10, k = 4.32 ≈ 4; if n = 100, k = 7.644 ≈ 8; if
n = 1000, k = 10.97 ≈ 11.
2. Compute the Range (R): R = Maximum value - Minimum value.
3. Determine the Class Width (w): If the number of classes is known and
a uniform class width is to be used, compute w = R/k and
round up to the nearest integer.
 Note: As far as possible, a class width of 5 or a multiple of 5 is
convenient and facilitates computations.
4. Determine the Class Limits: Pick a suitable starting point less than or
equal to the minimum value.
 The starting point is called the lower limit of the first class. Continue to
add the class width to this lower limit to get the rest of the lower limits.
5. Determine the Upper Class Limits: To find the upper limit of the first
class, subtract 1 (one) from the lower limit of the second class (for data
recorded in whole numbers). Then continue to add the class width to this
upper limit to find the rest of the upper limits.
6. Determine the frequency of each class: Frequency of each class can
be determined simply by counting the number of observations belonging
to each class.
7. Sum up the frequencies of the classes to check that the total equals
the total number of observations collected.
 Example: Construct a continuous frequency distribution for the
following raw data on marks (out of 100) obtained by 50 students in
Statistics.
57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48,
55, 70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42,
35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40.
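The steps above can be sketched in code for these 50 marks. This is one possible construction, not necessarily the table shown in the original slides: it assumes Sturges' rule rounded to the nearest integer, a class width rounded up, and the minimum mark as the starting point.

```python
import math

marks = [
    57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48,
    55, 70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42,
    35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40,
]

n = len(marks)

# Step 1: number of classes by Sturges' rule, rounded to the nearest integer.
k = round(1 + 3.322 * math.log10(n))

# Step 2: range of the data.
data_range = max(marks) - min(marks)

# Step 3: uniform class width, rounded up to the nearest integer.
w = math.ceil(data_range / k)

# Steps 4-5: class limits, starting at the minimum value (an assumed choice).
start = min(marks)
classes = [(start + i * w, start + (i + 1) * w - 1) for i in range(k)]

# Step 6: count the observations falling in each class.
frequencies = [sum(lo <= x <= hi for x in marks) for lo, hi in classes]

# Step 7: the frequencies must add up to the number of observations.
assert sum(frequencies) == n

print(f"n = {n}, k = {k}, range = {data_range}, w = {w}")
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

With these choices the classes run from 16-25 to 76-85, and the width of 10 is a multiple of 5, in line with the note in step 3.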
Cumulative Frequency Distribution
 The cumulative frequency of a class tells us how often the values fall
below or above that class.
 As the name indicates, it cumulates frequencies starting at the
lowest or the highest class boundary.
 There are two types of cumulative frequency distributions: the “less
than” and the “more than” cumulative frequency distributions.
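Both types of cumulative frequency can be computed with a running sum. The sketch below assumes the class frequencies obtained by tallying the 50 marks from the earlier example into the classes 16-25, 26-35, ..., 76-85; that choice of classes is one possibility and may differ from the table in the original slides.

```python
from itertools import accumulate

# Class frequencies for the classes 16-25, 26-35, ..., 76-85 (assumed classes).
frequencies = [4, 5, 12, 14, 12, 2, 1]

# "Less than" type: running total from the lowest class upward.
less_than = list(accumulate(frequencies))

# "More than" type: running total from the highest class downward.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print("less than:", less_than)
print("more than:", more_than)
```

The last "less than" value and the first "more than" value both equal the total number of observations, which is a quick check on the arithmetic.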
2. Graphical methods of data presentation

• The histogram
• Frequency polygon
• Ogive curves
• The line graphs
• Scatter plots (two variables)

3. Diagrammatical Presentation of Data

• Diagrammatical presentation of data is usually used
to present categorical data.
• The two most commonly used charts (usually for
qualitative data) are the bar diagram (bar chart) and the pie
diagram (pie chart).
 Bar chart
 Pie chart (diagram)
4. Measures of Central Tendency