Stat 1
What is Statistics?
Statistics:-
It is a field of study concerned with summarizing data,
interpreting data, and making decisions based on data.
It refers to the methodology for collecting, classifying,
summarizing, organizing, presenting, analysing and
interpreting numerical information.
In the broadest sense, “statistics” refers to a range of
techniques and procedures for analysing, interpreting,
displaying, and making decisions based on data.
A quantity calculated in a sample to estimate a value in a
population is called a "statistic".
07/04/2024
1.1. Structure or classification of Statistics?
3. Interval Scales
Interval: values name categories, their order is meaningful, and equal intervals
have the same interpretation anywhere on the scale.
Example: Celsius temperature scale
Problem: no true zero point, so ratios of values are not meaningful.
4. Ratio Scales
Ratio: the highest and most informative scale. It contains the qualities of the
nominal, ordinal, and interval scales, with the addition of an absolute zero point.
Example: amount of money; zero money indicates the absence of money.
You can think of a ratio scale as the three earlier scales rolled up in one.
Like a nominal scale, it provides a name or category for each object (the
numbers serve as labels). Like an ordinal scale, the objects are ordered (in terms
of the ordering of the numbers). Like an interval scale, the same difference at
two places on the scale has the same meaning.
And in addition, the same ratio at two places on the scale also carries the same
meaning.
Example: a ratio scale is illustrated by the amount of money you have in your
pocket right now (25 cents, 55 cents, etc.).
Money is measured on a ratio scale because, in addition to having the properties
of an interval scale, it has a true zero point: if you have zero money, this implies
the absence of money.
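The difference between an interval scale and a ratio scale can be checked numerically. The sketch below is an illustration only, not part of the original notes: it assumes the standard Celsius-to-Kelvin conversion (K = °C + 273.15), and the temperatures and amounts of money are made-up values.

```python
# Illustrative sketch: ratios are meaningful on a ratio scale but not on an
# interval scale. All numeric values below are made-up examples.

def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return c + 273.15

t_low_c, t_high_c = 10.0, 20.0

# Differences survive the change of scale: equal intervals mean the same thing.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios do not: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
ratio_c = t_high_c / t_low_c
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

# Money has a true zero, so the ratio is the same whatever the unit.
cents_a, cents_b = 25, 50
ratio_cents = cents_b / cents_a
ratio_dollars = (cents_b / 100) / (cents_a / 100)

print("difference:", diff_c, "vs", round(diff_k, 6))
print("temperature ratio:", ratio_c, "vs", round(ratio_k, 4))
print("money ratio:", ratio_cents, "vs", ratio_dollars)
```

The temperature ratio changes when the zero point moves, while the money ratio does not; this is what "a true zero point" buys on a ratio scale.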
Summary
1.5. Sources of Data
A data source is the place from which the data being used originate.
The following are the two sources of data:
Internal sources
• When data are collected from the reports and records of the organisation itself, the
source is known as an internal source.
• For example, a company publishes its annual report on profit and loss, total
sales, loans, wages, etc.
External sources
• When data are collected from sources outside the organisation, the source is known
as an external source.
• For example, if a tour and travel company obtains information on Tigray
tourism from the Tigray Tourism office, that office is an external source of
data.
The information collected from internal sources is called “primary data,” while
the information gathered from outside references is called “secondary data.”
Types of Data Based on their source
A) Primary data
• Primary data means first-hand information collected by an investigator.
• It is collected for the first time.
• It is original and more reliable.
• For example, the population census conducted by the government of
Ethiopia every ten years is primary data.
B) Secondary data
• Secondary data refers to second-hand information.
• It is not collected first-hand by the investigator but is obtained from already published
or unpublished sources.
• For example, the address of a person taken from the telephone
directory or the phone number of a company taken from Just dial are
secondary data.
CHAPTER 2: SAMPLING THEORY
Sampling: The process or method of sample selection from the
population.
2.1. Basic Concepts
Definitions:
1. Parameter: Characteristic or measure obtained from a population.
2. Statistic: Characteristic or measure obtained from a sample.
3. Sampling: The process or method of sample selection from the
population.
4. Sampling unit: the ultimate unit to be sampled or elements of the
population to be sampled.
Examples: - If somebody studies the socio-economic status of
households, the household is the sampling unit.
- If one studies the performance of freshman students in some college, the
student is the sampling unit.
5. Sampling frame: the list of all elements in a population.
6. Errors in sample surveys:
There are two types of errors:
• a) Sampling error: the discrepancy between the
population value and the sample value. It may arise when
inappropriate sampling techniques are applied.
• b) Non-sampling errors: errors due to procedural
bias, such as:
- Incorrect responses
- Measurement errors
- Errors at different stages in processing the data.
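Sampling error, as defined above, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 1,000 incomes is generated artificially, the sample mean is used as the statistic, and `sampling_error` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 made-up incomes.
population = [rng.gauss(5000, 1200) for _ in range(1000)]
parameter = statistics.mean(population)  # the population value

def sampling_error(n):
    """Sample mean minus population mean for one simple random sample of size n."""
    sample = rng.sample(population, n)  # drawn without replacement
    return statistics.mean(sample) - parameter

# The error varies from sample to sample; a census (n = N) has no sampling error.
for n in (10, 100, 1000):
    print(n, round(sampling_error(n), 2))
```

Non-sampling errors (wrong responses, measurement or processing mistakes) are not captured by this sketch; they can remain even when the whole population is studied.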
2.2. Reasons for Sampling
The Need for Sampling
Costs/economy
Timeliness
Large size of many populations
Inaccessibility of the entire population
Destructive nature of many tests
Reliability or accuracy.
2.3. A Review of Methods of Sampling
The following two methods are used to collect information about
the population
Census: When each and every element or unit of the population is
studied.
Sampling: When a small part of the population is selected for study.
There are two principal methods of drawing a sample from a
population (Techniques of Sampling).
A. Probability Sampling: each unit in the population has a known,
nonzero chance of being selected to become part of the sample.
Sampling techniques such as simple random sampling, stratified
sampling, cluster sampling and systematic sampling are probability
sampling.
B. Non-probability Sampling: there is no way of estimating the probability
that each individual will be included in the sample. Quota sampling
and judgmental sampling are examples of non-probability sampling.
A. Probability Sampling:
1. Simple Random Sample: a method of probability sampling in which
every unit in the population has an equal, nonzero chance of being selected as part
of the sample. In other words, each element of the population has an equal and
independent chance of being included in the sample. For a sample of size n drawn
from a population of size N, this probability is n/N.
How to select a simple random sample?
i. Lottery method
ii. Random number method
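The random number method can be sketched in a few lines. This is an illustration under assumptions: the sampling frame is a hypothetical list of units numbered 1 to 100, and `simple_random_sample` is a helper name introduced here, built on the standard library's `random.sample`.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical sampling frame: units numbered 1..100 (N = 100).
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)

# Each unit's chance of ending up in the sample is n/N.
inclusion_probability = len(sample) / len(frame)

print(sorted(sample))
print(inclusion_probability)
```

The lottery method is the physical counterpart: numbered slips are mixed and n of them are drawn without looking.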
2. Stratified sampling:
It is a three-step process.
Step 1- Divide the population into a number of groups, usually termed
'strata', which differ from one another but are each homogeneous within
themselves (e.g., by income level, sex, education level, etc.);
Step 2- Select an independent simple random sample from each stratum;
Step 3- Form the final sample by consolidating all sample elements chosen in step 2.
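The three steps can be sketched as follows. This is a minimal illustration, not the method prescribed by the notes: it assumes proportional allocation (the same sampling fraction in every stratum), and the households, income levels and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Stratified sample with the same sampling fraction in every stratum."""
    rng = random.Random(seed)

    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: draw an independent simple random sample from each stratum.
    # Step 3: consolidate the stratum samples into the final sample.
    sample = []
    for name in sorted(strata):
        members = strata[name]
        n_h = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical population: 100 households tagged with an income level.
households = (
    [(i, "low") for i in range(60)]
    + [(i, "middle") for i in range(60, 90)]
    + [(i, "high") for i in range(90, 100)]
)
sample = stratified_sample(households, lambda h: h[1], fraction=0.1, seed=7)
print(sample)
```

With a 10% fraction the sample contains 6 low, 3 middle and 1 high income household, so every stratum is represented in proportion to its size.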
3. Systematic Sampling: In systematic sampling only one random number is
needed throughout the entire sampling process.
To use systematic sampling, a researcher needs:
[i] a sampling frame of the population;
[ii] a skip interval (K) calculated as K = N/n, where N is the population size
and n is the desired sample size.
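A minimal sketch of systematic sampling follows. It assumes the usual skip interval K = N/n (population size divided by sample size, rounded down here), a hypothetical frame of 100 numbered units, and a helper name `systematic_sample` introduced for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every K-th unit of the frame after one random start."""
    N = len(frame)
    k = N // n                   # skip interval K (assumes N >= n)
    rng = random.Random(seed)
    start = rng.randrange(k)     # the single random number: a position in 0..K-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: units numbered 1..100, sample size 10, so K = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

Once the random start is fixed, the rest of the sample is fully determined, which is why only one random number is needed.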
Summary
B. Non-probability sampling:
The selection of units in the sample from the
population is not governed by the laws of probability.
For example, the units are selected on the basis of the personal judgment of
the surveyor.
People who volunteer to take a medical test or to try a new type of
coffee likewise form a sample that is not selected at random.
Four main types
i. Convenience sampling: drawn at the convenience of the researcher.
Common in exploratory research.
ii. Judgmental (Purposive) sampling: sampling based on the researcher's judgment;
the researcher deliberately selects certain units from the universe.
Common in commercial marketing research projects.
iii. Quota sampling: in this method, the decision maker requires the sample to
contain a certain number of items with a given characteristic.
Individuals are selected from each quota.
iv. Snowball sampling: if the population is hard to access, snowball sampling
can be used, in which initial respondents refer the researcher to further members
of the population.
Summary
3. DATA COLLECTION AND PRESENTATION
3.1. Data types and sources
Definition of Data: data refer to the numerical description of the quantitative
aspects of things.
In other words, data are the facts and figures that are collected,
summarized, analysed, and interpreted.
The singular form of data is datum.
Types of data and data measurements
Data can be divided into quantitative and qualitative.
Qualitative data are data that can be placed into distinct categories,
according to some characteristics or attributes. Qualitative data could be
nominal or ordinal. Examples of qualitative data are gender (male and
female), income categories (low, middle, high), education level (diploma,
degree, masters, PhD), etc.
Quantitative data are numerical and can be ordered or ranked, and
arithmetic operations on them are meaningful.
Examples of quantitative data are age, height, weight, income, and expenditure.
3.2. Methods of data collection (from respondents)
A number of methods can be used to gather data from the so-called
'respondents'.
Methods of data collection
i. Primary data
Interview
Questionnaire
Observation
Focus Group Discussions (FGDs)
Key Informant Interview
ii. Secondary data
Published and unpublished documents
Journals, Magazines and Newspapers
Bureau of Statistics
A method of data collection depends on:
i) Nature and scope of the problem
ii) Cost, time and resources
iii) Degree of accuracy desired
Data presentation and frequency distribution
1. Tabular Method: Once data are generated from a representative
sample using appropriate data collection tools or obtained from
secondary sources, they must be presented in a meaningful manner.
Ungrouped data
Example: The marital status of 60 adults classified as single, married,
divorced and widowed is given below:
Class width (w) or class interval: The class width for a class in a
frequency distribution is found by subtracting the lower (or upper)
class limit of one class from the lower (or upper) class limit of the
next class.
For example, the class width of the above example is seven (7). That
is, 31-24=7 or 37-30=7.
Class Mark (C.M.) or class midpoint: The class mark is the midpoint
of the class interval, that is, the value which lies midway between
the lower and upper limits of the class. It is obtained as:
C.M. = (lower class limit + upper class limit) / 2
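Class width and class mark can be computed directly from the class limits. The sketch below is illustrative: the two classes 24-30 and 31-37 are an assumption inferred from the differences 31-24 and 37-30 quoted above, and the function names are hypothetical.

```python
# Two adjacent classes as (lower limit, upper limit). These limits are an
# assumption inferred from the text's "31-24=7 or 37-30=7".
classes = [(24, 30), (31, 37)]

def class_width(current, following):
    """Lower limit of the next class minus lower limit of the current class."""
    return following[0] - current[0]

def class_mark(cls):
    """Midpoint between the lower and upper limits of a class."""
    lower, upper = cls
    return (lower + upper) / 2

width = class_width(classes[0], classes[1])
marks = [class_mark(c) for c in classes]
print(width, marks)
```

Consecutive class marks differ by exactly the class width, which gives a quick check on a finished table.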
Practical steps in constructing continuous frequency distribution
1. Decide the number of classes (k): Select the number of classes desired,
usually between 5 and 20, or use Sturges' rule k = 1 + 3.322 log10(n), where
"k" refers to the number of desired classes and "n" is the total number
of observations.
Example: If n = 10, k = 4.32 ≈ 4; if n = 100, k = 7.644 ≈ 8; if
n = 1000, k = 10.97 ≈ 11.
2. Compute the Range(R): R= Maximum value- Minimum value.
3. Determine the Class Width (w): If the number of classes is known and
if it is decided to use a uniform class width, we use w = Range/k,
rounded up to the nearest integer, where Range is the difference
between the highest and the smallest value of the data.
Note: As far as possible, a class width of 5 or a multiple of 5 is
convenient and facilitates computations.
4. Determine the Class Limits: Pick a suitable starting point less than or
equal to the minimum value.
The starting point is called the lower limit of the first class. Continue to
add the class width to this lower limit to get the rest of the lower limits.
5. Determine the Upper class limits: To find the upper limit of the first
class, subtract 1 (one) from the lower limit of the second class. Then
continue to add the class width to this upper limit to find the rest of the
upper limits.
6. Determine the frequency of each class: Frequency of each class can
be determined simply by counting the number of observations belonging
to each class.
7. Sum up the frequency of each class to check whether it is equal to
the total number of data collected from the field or not.
Example: Construct a continuous frequency distribution for the
following raw data on marks (out of 100) obtained by 50 students in
Statistics.
57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48,
55, 70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42,
35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40.
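One possible way to work through the seven steps for these marks is sketched below. It is not necessarily the solution intended by the notes: it assumes k is taken from Sturges' rule and rounded to the nearest integer, and that the starting point is the minimum value itself. Other admissible choices (for example a rounder starting point) give a different but equally valid table.

```python
import math

marks = [
    57, 53, 65, 55, 50, 45, 64, 52, 16, 46, 42, 63, 33, 64, 53, 25, 54, 35, 48,
    55, 70, 47, 39, 58, 52, 36, 65, 75, 26, 20, 55, 60, 83, 61, 45, 63, 49, 42,
    35, 18, 51, 45, 42, 65, 39, 59, 45, 41, 30, 40,
]
n = len(marks)

# Step 1: number of classes from Sturges' rule, rounded to the nearest integer.
k = round(1 + 3.322 * math.log10(n))

# Step 2: range of the data.
data_range = max(marks) - min(marks)

# Step 3: uniform class width, rounded up to the nearest integer.
w = math.ceil(data_range / k)

# Step 4: lower limits, starting at the minimum value (an assumed starting point).
lower_limits = [min(marks) + i * w for i in range(k)]

# Step 5: each upper limit is one less than the next class's lower limit.
classes = [(lo, lo + w - 1) for lo in lower_limits]

# Step 6: count the observations falling in each class.
frequencies = [sum(lo <= x <= hi for x in marks) for lo, hi in classes]

# Step 7: the class frequencies must add up to the number of observations.
assert sum(frequencies) == n

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

Under these assumptions the width comes out as a multiple of 5, in line with the note in step 3.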
Cumulative Frequency Distribution
The cumulative frequency of a class tells us how often the values fall
below or above that class.
Or as the name indicates it cumulates frequencies starting at the
lowest or the highest class boundary.
There are two types of cumulative frequency distributions: the "less
than" and the "more than" cumulative frequency distributions.
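Both cumulative distributions can be obtained by running totals over the class frequencies. The sketch below uses illustrative class limits and frequencies for seven classes (assumed values, not taken from a table in the notes) and the standard library's `itertools.accumulate`.

```python
from itertools import accumulate

# Illustrative classes and frequencies (assumed values for demonstration).
classes = ["16-25", "26-35", "36-45", "46-55", "56-65", "66-75", "76-85"]
frequencies = [4, 5, 12, 14, 12, 2, 1]

# "Less than" type: accumulate from the lowest class upwards.
less_than = list(accumulate(frequencies))

# "More than" type: accumulate from the highest class downwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for cls, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{cls}: f={f}, less-than cf={lt}, more-than cf={mt}")
```

The last "less than" value and the first "more than" value both equal the total number of observations, which serves as a check.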
2. Graphical methods of data presentation
• The histogram
• Frequency polygon
• Ogive curves
• The line graphs
• Scatter plots (two variables)
3. Diagrammatical Presentation of Data
4. Measures of Central Tendency