Business Statistics Note

STA 111 (INTRODUCTION TO STATISTICS) LECTURE NOTES

1.1 DEFINE STATISTICS

Definition of Statistics:

Statistics can be defined as a branch of science concerned with the collection, organization,
summarization, presentation and analysis of data (COSPA).

Data: Data are the actual measurements or observations recorded on individuals. A datum
(singular) is a single measurement or observation, usually referred to as a score or raw
score.

Statistic: A formula or numerical measurement that describes some characteristic of a sample.

There are two branches of statistics, namely: Descriptive and Inferential statistics.

Descriptive Statistics: This consists of methods for organizing and summarizing information
(data) in a clear and effective way.

Examples:

• Percentage description of the voting results of an election.
• Average amount spent by selected first-year students.
• Graph of rainfall data for a period of 12 months.
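The first two descriptive examples reduce to simple arithmetic. The Python sketch below computes vote percentages and an average from small data sets. The ballots and naira amounts are invented for illustration, and the function name `vote_percentages` is our own.

```python
from collections import Counter

def vote_percentages(votes):
    """Return each candidate's share of the votes, as a percentage of all votes cast."""
    counts = Counter(votes)
    total = len(votes)
    return {candidate: 100 * n / total for candidate, n in counts.items()}

# Hypothetical ballots (illustrative only, not real election data)
ballots = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

# Hypothetical amounts (in naira) spent by five selected first-year students
spending = [1500, 2000, 1200, 1800, 2500]

percentages = vote_percentages(ballots)
average_spent = sum(spending) / len(spending)

print(percentages)    # {'A': 50.0, 'B': 30.0, 'C': 20.0}
print(average_spent)  # 1800.0
```

Both results summarize the data without any claim beyond the data themselves, which is what makes them descriptive rather than inferential.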

Inferential Statistics: This involves methods of drawing conclusions about a population based
on information obtained from a sample of that population. It also involves testing hypotheses,
making predictions, forecasting values of population parameters, and decision making.

Examples:

• A statistical test to determine the effect of one variable on another.
• An experiment to determine the effectiveness of a drug.
• Building and evaluation of a statistical model.

1.2 SOURCES OF STATISTICAL DATA

Basically, there are two sources from which statistical data can be obtained, namely, published
and unpublished sources.

1.2.1 PUBLISHED SOURCES

These include:

i. Government Publications: Ministries and departments of national and state governments
publish data relating to their activities. The most important producers of published data in
Nigeria are the National Bureau of Statistics (NBS) and the Central Bank of Nigeria (CBN). Their
publications cover topics such as employment, savings, investments, imports/exports, etc. The
data are published periodically, i.e. monthly, quarterly or annually.

ii. International Publications: International bodies such as the IMF, World Bank, UNESCO, UNICEF,
WTO and WHO also publish data relating to their fields of work. These are used as published
secondary data.

iii. Reports of Committees and Commissions: Union and state governments sometimes appoint
committees or commissions, such as a Finance Commission, Minority Commission or Planning
Commission, to research a particular problem. Each committee is given a fixed term to
investigate the matter. At the end of the term, it presents its report to the appropriate
authority, and the report is then published. The data are analyzed to find the required
solutions.

iv. Publications by Trade and Business Associations: Large trade and business associations also
publish periodic data about trade and industry, which are very useful. These data are used by
scholars to analyze various problems facing the country. Individual industries also publish data
about their own production and other aspects of their operations.

v. Newspapers, Magazines and Journals: These are among the main providers of data on a
day-to-day basis.

1.2.2 UNPUBLISHED SOURCES

These include firms, individuals and private publications.

i. Firms: A firm can be defined as two or more persons carrying on a business. A firm may carry
out market research on its own in order to tackle its problems. However, small firms that cannot
carry out research themselves usually engage research agencies when the need arises.

ii. Individuals: These are data collected by individuals, for example for their theses, research
articles, term papers, etc.

iii. Private Publications: Some private institutions belonging to large education houses also
publish data on different topics. These topics may include development, employment,
imports/exports or the balance of payments position, etc. Stock exchanges also publish data on
the companies listed with them.

1.4 USES OF STATISTICS

Statistics can be applied in all disciplines. Let us consider some examples:

1. Statistics helps in providing a better understanding and exact description of natural
phenomena.
2. Statistics helps in the proper and efficient planning of a statistical inquiry in any field of
study.
3. Statistics helps in collecting appropriate quantitative and qualitative data.
4. Statistics helps in presenting complex data in a suitable tabular, diagrammatic or graphic
form for easy and clear comprehension.
5. Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.
6. Statistics helps in drawing valid inferences about population parameters from sample data,
along with a measure of their reliability.

1.5 TYPES OF DATA

There are two types of data, namely, quantitative and qualitative data.

Quantitative Data

Quantitative data are numerical: they are counts or numerical measurements, also called 'scale'
data. Quantitative data can be either discrete, for example age (in completed years) or the
number of customers that visit a bank in a day, or continuous, for example salary, weight and
height.

Qualitative Data

Qualitative data, also called categorical data, are information that cannot be measured
numerically but can be coded into categories for analysis. Each value is chosen from a set of
non-overlapping categories. For example, Marital Status has the categories 'Single', 'Married',
'Divorced', 'Separated', 'Widowed' and 'Engaged'. Another example is Gender, with the categories
'Male' and 'Female'.

1.6 SCALES OF MEASUREMENT

In statistics, the term measurement is used more broadly and is more appropriately termed
scales of measurement. Scales of measurement refer to the ways in which variables/numbers are
defined and categorized. Each scale of measurement has certain properties, which in turn
determine which statistical analyses are appropriate. The four scales of measurement are
nominal, ordinal, interval and ratio.

Nominal Data
Nominal data are data that can be coded into categories without any particular order. For
example, in a data set, males could be coded as 0 and females as 1; the marital status of an
individual could be coded as 1 if married, 2 if single, 3 if separated, 4 if divorced and 5 if
widowed. Such data are categorical, but the order of the categories (and of their codes) is
meaningless. Data that consist of only two categories, like male and female or dead and alive,
are called binary (or binomial) data, while those that consist of more than two categories, like
married, single, separated, divorced and widowed, are known as multinomial data.

Ordinal Data
Ordinal data are categorical, but the categories have a logical order. These data can be ranked
in order of magnitude, for example Good, Better, Best, where Good can be coded as 1, Better as 2
and Best as 3 to show the order. Rating items or products usually generates ordinal data, and
most of the scores and scales used in research are ordinal, for example rating scores/scales for
taste, smell, ease of application of products,
etc.
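The practical difference between nominal and ordinal codes can be sketched in Python. The codes follow the examples above; the responses are hypothetical. For nominal data only counting (for example, the mode) makes sense, while ordinal codes also support ranking, so a median category can be reported.

```python
from statistics import median_low, mode

# Nominal codes from the marital-status example: labels only, order is meaningless
MARITAL_CODES = {"married": 1, "single": 2, "separated": 3, "divorced": 4, "widowed": 5}

# Ordinal codes from the rating example: the numbers preserve the order Good < Better < Best
RATING_CODES = {"Good": 1, "Better": 2, "Best": 3}
RATING_LABELS = {code: label for label, code in RATING_CODES.items()}

# Hypothetical responses, for illustration only
marital = ["single", "married", "single", "widowed", "single"]
ratings = ["Good", "Best", "Better", "Better", "Good"]

marital_coded = [MARITAL_CODES[m] for m in marital]   # [2, 1, 2, 5, 2]
rating_coded = [RATING_CODES[r] for r in ratings]     # [1, 3, 2, 2, 1]

# Nominal: the most frequent category is meaningful; an "average status" is not
most_common_status = mode(marital)

# Ordinal: categories can be ranked, so sorting and the median category are meaningful.
# median_low always returns one of the observed codes rather than a midpoint.
ranked = sorted(ratings, key=RATING_CODES.get)
median_rating = RATING_LABELS[median_low(rating_coded)]

print(most_common_status)  # single
print(median_rating)       # Better
```

Averaging `marital_coded` would run without error but produce a number with no meaning, which is exactly why the scale of measurement matters when choosing an analysis.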

Interval Data
Interval data are measured along a scale on which each position is equidistant from the next,
but the scale has no natural (true) zero. Equal distances on the scale therefore represent equal
differences: the difference between 10 °C and 20 °C is the same as that between 30 °C and 40 °C.
Because the zero point is arbitrary, however, ratios are not meaningful; 20 °C is not twice as hot
as 10 °C. Examples include temperature on the Celsius (or Fahrenheit) scale and calendar years.

Ratio Data
Ratio data have all the qualities of interval data (natural order, equal intervals) plus a natural
zero point. This is the most frequently used type of data. Examples of ratio data are height,
weight, length, etc. With this type of data, it can meaningfully be said that a length of 10 m is
double a length of 5 m. This ratio holds true regardless of the unit in which the object is
measured (e.g., meters or yards), because of the presence of a natural
zero.
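A short numerical check makes the interval/ratio distinction concrete. The sketch converts the same measurements into a different unit and compares ratios: the length ratio survives the change from meters to yards, but the Celsius "ratio" does not survive conversion to Fahrenheit, although differences remain comparable. The conversion constants (1 yard = 0.9144 m; °F = °C × 9/5 + 32) are standard definitions, not taken from the notes.

```python
def meters_to_yards(m):
    """1 yard is defined as exactly 0.9144 m."""
    return m / 0.9144

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Ratio scale (natural zero): 10 m is twice 5 m in any unit
ratio_in_meters = 10 / 5
ratio_in_yards = meters_to_yards(10) / meters_to_yards(5)

# Interval scale (arbitrary zero): the "ratio" 20 °C / 10 °C changes with the unit
ratio_in_celsius = 20 / 10
ratio_in_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)   # 68 / 50

# Differences on an interval scale are still comparable: a 10 °C gap is an 18 °F gap
gap_in_fahrenheit = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)

print(ratio_in_meters, ratio_in_yards)          # 2.0 2.0
print(ratio_in_celsius, ratio_in_fahrenheit)    # 2.0 1.36
```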

1.7 CLASSIFICATION OF DATA

Data are classified into two categories: primary and secondary.

1. Primary data refer to self-acquired data. Here, the researcher or interviewer collects the
information (data) personally and uses it for the purpose for which it was collected. For
example, when the government uses data collected in a population census on employment,
deaths, school age, etc., to formulate policies that tackle these aspects of the nation's
problems, the data are primary.
2. On the other hand, if one uses data already collected by someone else for a study/research,
the data are secondary, since the user was not the person who collected them.

Some Advantages of using Primary data:

1. The investigator collects data specific to the problem under study.
2. The investigator can be confident about the quality of the data collected.
3. If required, it may be possible to obtain additional data during the study period.

Some Disadvantages of using Primary data:

1. The investigator has to contend with all the hassles of data collection:
▪ deciding why, what, how and when to collect
▪ getting the data collected (personally or through others)
▪ getting funding and dealing with funding agencies
▪ ethical considerations (consent, permissions, etc.)
2. The investigator must ensure that the data collected are of a high standard.
3. The cost of obtaining the data is often the major expense in a study.

Some Advantages of using Secondary data:

1. The data are already there, so there are no hassles of data collection.
2. It is less expensive.
3. The investigator is not personally responsible for the quality of the data.

Some Disadvantages of using Secondary data:

1. The investigator cannot decide what is collected (for instance, when specific data about
something are required).
2. One can only hope that the data are of good quality.
3. Obtaining additional data (or even clarification) about something is usually not possible.

2.0 METHODS OF DATA COLLECTION

There are five major methods through which data can be collected. They are the interview
method, questionnaire, observation, register and focus group discussion (FGD) methods.

2.2.1 THE PERSONAL INTERVIEW METHOD

An interview is a data-collection technique that involves oral questioning of respondents, either
individually or as a group.

In market research, this is by far the most commonly used way of collecting information from the
general public.

Advantages:
1. Can be flexible with respondents.
2. More information can be collected.
3. Develops relationship with respondents.
4. Helps get full range and depth of information.
5. Help can be given to those respondents who are unable to understand the questions.

Disadvantages:
1. Can take much time.
2. Can be hard to analyze and compare.
3. Can be expensive.
4. The interviewer can bias respondents' responses.

2.2.2 OBSERVATION METHOD

Observation is a technique that involves systematically selecting, watching and recording
behavior and characteristics of living beings, objects or phenomena. Observation of human
behavior is a much-used data collection technique. It can be undertaken in different ways:

• Participant observation: The observer takes part in the situation he or she observes.
For example, a social worker may take a job as a factory worker to learn the habits and
customs of the community being observed.
• Non-participant observation: The observer watches the situation, openly or concealed,
but does not participate. For example, by observing the "traffic" flow in a supermarket
before and after making changes in the store layout.

Advantages:
1. It can keep the system undisturbed.
2. The actual actions or habits of persons are observed and noted.

Disadvantages:
1. Can be expensive.
2. Can be difficult to interpret observed behaviors.
3. Can be complex to categorize observations.
4. Opinions and attitudes cannot usually be obtained by observation.
5. The observer's presence can influence the behavior of program participants.
6. Actions which took place before the study may not be observed.

2.2.3 REGISTRATION METHOD

This method involves data or information recorded over time, either when the events occurred or
after their occurrence. Examples include registration of births, deaths, marriages, divorces,
immigration and emigration, motor accidents, industrial accidents, etc.

Advantages

1. It is relatively inexpensive.
2. Relatively fast and easy to access.

Disadvantages

1. Limited to documents available.
2. Difficult to verify quality of information.
3. Leaves out tacit and informal knowledge.

2.2.4 QUESTIONNAIRE METHOD

A questionnaire (also referred to as a self-administered questionnaire) is a data collection tool
in which written questions are presented that are to be answered by the respondents in written
form. The process of designing and distributing a questionnaire for the purpose of collecting
information is also referred to as a survey. Questionnaires can be administered in different ways,
such as by:

• Sending questionnaires by mail with clear instructions on how to answer the questions and
asking for mailed responses.
• Gathering all or part of the respondents in one place at one time, giving oral or written
instructions, and letting the respondents fill out the questionnaires.
• Hand-delivering questionnaires to respondents and collecting them later.
• Designing and distributing them electronically online, using the computer.

Advantages:

1. Relatively inexpensive.
2. There is no interviewer bias.
3. Ability to reach more participants.
4. Summarizes findings in a clear and precise way.
5. The respondent has time to consult any necessary documents.

Disadvantages:

1. Usefulness depends on response rate.
2. Difficult to verify quality of information.
3. The wrong person may complete the form.
4. Risk of losing subtle differences in responses.
5. Spontaneous answers cannot be collected.
6. Only simple questions and instructions can be given.

2.1 QUESTIONNAIRE DESIGN

Questionnaire design is a skill that requires careful consideration of the topic under study.
There are three types of questions used in questionnaires, namely: closed-ended questions,
open-ended questions and Likert-scale questions.

CLOSED-ENDED QUESTION
Closed-ended questions are the questions that have options for the respondents to choose from.
For example:

Age: (a) 20 – 29 (b) 30 – 39 (c) 40 and above

OPEN-ENDED QUESTION
Open-ended questions are questions that have no options for the respondent to choose from,
rather the respondent is free to answer as much as s/he can. For example:
Age: _________

LIKERT-SCALE QUESTION
Likert-scale questions are closed-ended questions whose options form an ordered scale. For example:

Nigeria is a failed state:

(a) Strongly Agree (b) Agree (c) Neutral (d) Disagree (e) Strongly Disagree.

To design a questionnaire, the following must be considered:
1. The topic/purpose of the research.
2. The specific objectives of the research.
3. The type of research being carried out.
4. The demographic background of the respondents.
5. The type of questionnaire to be designed.
6. The types of questions that will capture the required information.
7. Operationalization of the variables in order to quantify the responses.
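Operationalization often means turning each response option into a number. The sketch below applies one common coding convention to the Likert item shown earlier, scoring Strongly Agree as 5 down to Strongly Disagree as 1. That coding, the helper names and the answers are assumptions for illustration; the notes do not prescribe a particular scheme.

```python
from collections import Counter

# Assumed coding convention (not prescribed by the notes): higher = stronger agreement
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Neutral": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}

def code_responses(responses):
    """Convert Likert labels to numeric codes; an unknown label raises KeyError."""
    return [LIKERT_CODES[r] for r in responses]

def frequency_table(codes):
    """Count respondents per code, listing every code from 5 down to 1 (zeros included)."""
    counts = Counter(codes)
    return {code: counts.get(code, 0) for code in sorted(LIKERT_CODES.values(), reverse=True)}

# Hypothetical answers from six respondents
answers = ["Agree", "Neutral", "Strongly Agree", "Agree", "Disagree", "Agree"]
codes = code_responses(answers)    # [4, 3, 5, 4, 2, 4]
table = frequency_table(codes)     # {5: 1, 4: 3, 3: 1, 2: 1, 1: 0}
```

Raising an error on an unexpected label is deliberate: a misspelled option should be caught during data entry rather than silently dropped.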

2.2 PROBLEMS AND TYPES OF ERRORS THAT ARISE IN DATA COLLECTION

These are the major problems and errors that arise in data collection:

1. The use of an inadequate frame.
2. Sampling error.
3. A poorly designed questionnaire.
4. Recording and measurement errors.
5. Non-response problems.

3.0 SAMPLING AND SAMPLING TECHNIQUES

Sampling is the process of selecting units (e.g., people, organizations or items) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen.

Sampling Frame
A sampling frame is the list of units from which a sample is drawn. It is made up of all those
within a population who can be sampled, and may include individuals, households, institutions,
experimental groups, etc.

Statistical Survey/Sample Survey

A statistical survey or a sample survey is an investigation about the characteristics of a
phenomenon by means of collecting data from a sample of the population and estimating their
characteristics through the systematic use of statistical methodology.

Census
A census is a complete enumeration of a population or groups at a point in time with respect to
well-defined characteristics (population, production). Data are collected for a specific reference
period. A census should be taken at regular intervals in order to have comparable information
available; therefore, most statistical censuses are conducted every 5 or 10 years. Data are usually
collected through questionnaires mailed to respondents, via the Internet, by an enumerator visiting
respondents, or by contacting them by telephone.

• An advantage is that censuses provide better data than surveys for small geographic
areas or sub-groups of the population. Census data can also provide a basis for sampling
frames used in subsequent surveys.
• The major disadvantage of censuses is usually the high cost associated with planning and
conducting them, and processing the resulting data.

Population
In statistics, a population is the totality of items or individuals in which one is interested.

Sample
A sample is a subset of a population. We use a sample to make inferences about the population.

3.1 REASONS FOR SAMPLING

1. To save cost.
2. To save time.
3. To reduce workload.
4. To reduce the use of resources.
5. To reduce labour.
6. To obtain more accurate results.

3.2 ADVANTAGES AND DISADVANTAGES OF SAMPLING

Advantages of Sampling
1. Very accurate.
2. Economical in nature.
3. Very reliable.
4. Highly suitable for different kinds of surveys.
5. Takes less time.
6. When the population (universe) is very large, sampling is the only practical method of
collecting the data.

Disadvantages of Sampling
1. Inadequacy of the samples.
2. Chances for bias.
3. Problems of accuracy.
4. Difficulty of getting the representative sample.
5. Untrained manpower.
6. Absence of the informants.
7. Chances of committing sampling errors.

3.3 PILOT SURVEYS

Pilot surveys are also referred to as pilot enquiries. A pilot survey is a preliminary survey used
to gather information prior to conducting a survey on a larger scale.

Purposes of a Pilot Survey

The main purposes of carrying out a pilot survey are to determine whether conducting a
large-scale survey is worth the effort and to evaluate what it would look like if carried out. Other
specific purposes are to:
1. Determine the efficiency of the future survey.
2. Smooth out difficulties before administering the main survey.
3. Find out how long it takes to answer the questions, in order to modify the main survey.
4. Determine the best group size for the main survey.
5. Determine the appropriateness of the questions for the target population.
6. Test the clarity of the instructions, i.e. whether all the respondents in the pilot sample are
able to follow the directions as indicated.
7. Provide better information on whether the type of survey is effective in fulfilling the
purpose of the study.
8. Save financial resources: if errors in the questionnaire or interview are found early on, there
is less chance of unreliable results or, worse, of having to start over again after conducting the
survey.

3.4 SAMPLING METHODS

Sampling methods are divided into two major types, namely: probability and non-probability
sampling methods.

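Probability Sampling
A probability sampling method is one in which every unit in the population has a known, non-zero
chance of being selected in the sample. In the simplest designs, such as simple random sampling,
every unit has an equal chance.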

Probability sampling methods include:

• Simple Random Sampling
• Systematic Sampling
• Stratified Random Sampling
• Cluster Sampling
• Multistage Sampling
• Multiphase Sampling

Non-Probability Sampling
Non-probability sampling is a method of sampling in which the chances of the elements of the
population being selected are unknown and generally unequal. It involves the selection of
elements based on assumptions regarding the population of interest, which form the criteria for
selection. Hence, because the selection of elements is non-random, non-probability sampling does
not allow the estimation of sampling errors.

Non-probability sampling methods include:

• Convenience Sampling
• Quota Sampling
• Purposive or Judgmental Sampling
• Snowball Sampling

PROBABILITY SAMPLING METHODS

SIMPLE RANDOM SAMPLING

This method of sampling is used when the population is small, homogeneous and readily available.
It allows the greatest number of possible samples. It is done by assigning a number to each unit in
the sampling frame; a table of random numbers is then used to determine which units are to be
selected.
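In software, a pseudo-random number generator usually stands in for the table of random numbers. The Python sketch below numbers a hypothetical frame of 100 units and draws a simple random sample without replacement using the standard library's `random.Random.sample`; the function name and the frame are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every subset of size n is equally likely."""
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the size of the frame")
    rng = random.Random(seed)   # stands in for a printed table of random numbers
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Passing a seed makes the draw reproducible, which helps when the selection has to be documented or checked later.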

Advantages
• Estimates are easy to calculate.
• Simple random sampling is always an equal probability of selection (EPS) design, but not all
EPS designs are simple random sampling.

Disadvantages
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient
numbers for study.

REPLACEMENT OF SELECTED UNITS

Sampling schemes may be without replacement ('WOR': no element can be selected more than
once in the same sample) or with replacement ('WR': an element may appear multiple times in
one sample).

For example, if we catch fish, measure them, and immediately return them to the water before
continuing with the sample, this is a WR design, because we might end up catching and
measuring the same fish more than once. However, if we do not return the fish to the water (e.g.
if we eat the fish), this becomes a WOR design.
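The fish example maps directly onto two standard-library calls: `random.Random.choices` draws with replacement and `random.Random.sample` draws without replacement. The pond of five tagged fish below is hypothetical, as are the helper names.

```python
import random

def sample_wr(frame, n, rng):
    """With replacement (WR): each draw is independent, so a unit can be drawn again."""
    return rng.choices(frame, k=n)

def sample_wor(frame, n, rng):
    """Without replacement (WOR): a drawn unit cannot be drawn again."""
    return rng.sample(frame, n)

fish = ["fish-1", "fish-2", "fish-3", "fish-4", "fish-5"]   # hypothetical pond
rng = random.Random(7)

wr = sample_wr(fish, 8, rng)    # 8 draws from 5 fish: at least one fish must repeat
wor = sample_wor(fish, 5, rng)  # every fish exactly once, in random order

try:
    sample_wor(fish, 8, rng)    # WOR cannot draw more units than the population holds
except ValueError:
    wor_too_large = True
else:
    wor_too_large = False
```

Drawing eight with replacement from five fish is guaranteed to repeat at least one fish, whereas a WOR sample can never be larger than the population.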

SYSTEMATIC SAMPLING
Systematic sampling relies on arranging the target population according to some ordering scheme
and then selecting elements at regular intervals through that ordered list.

Systematic sampling involves a random start and then proceeds with the selection of every kth
element from then onwards. In this case, k = N/n, where N is the population size and n is the
sample size.

It is important that the starting point is not automatically the first in the list, but is instead
randomly chosen from within the first to the kth element in the list.

A simple example would be to select every 10th name from the telephone directory (an 'every
10th' sample, also referred to as 'sampling with a skip of 10').
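The procedure can be sketched in Python, taking k = N/n rounded down to a whole number when N is not an exact multiple of n (a common convention, assumed here). The random start is drawn from the first k positions, as described above. The "directory" is a hypothetical list of 1,000 entry numbers, so n = 100 gives an every-10th sample.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start among the first k units."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                                 # sampling interval (skip)
    start = random.Random(seed).randrange(k)   # 0-based index of the 1st..k-th element
    # start + (n - 1) * k <= n * k - 1 <= N - 1, so every index is inside the frame
    return [frame[start + i * k] for i in range(n)]

directory = list(range(1, 1001))   # hypothetical directory entries numbered 1..1000
sample = systematic_sample(directory, 100, seed=3)
```

Only one random number is needed, which is what makes the method quick to carry out by hand; the trade-off is that any periodic pattern in the list with period k can bias the sample.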

STRATIFIED RANDOM SAMPLING

Stratified random sampling involves categorizing the members of the population into mutually
exclusive and collectively exhaustive groups. An independent simple random sample is then
drawn from each group. Stratified sampling techniques can provide more precise estimates if the
population being surveyed is more heterogeneous than the categorized groups, can enable the
researcher to determine desired levels of sampling precision for each group, and can provide
administrative efficiency. An example of a stratified sample would be a sample conducted to
determine the average income earned by families in Nigeria. To obtain more precise
estimates of income, the researcher may want to stratify the sample by geographic region (north,
south, etc.) and/or stratify the sample by urban, suburban and rural groupings. If the differences
in income among the regions or groupings are greater than the income differences within the
regions or groupings, precision of the estimates is improved. In addition, if the research
organization has branch offices located in these regions, the administration of the survey can be
decentralized and perhaps conducted in a more cost-efficient manner.
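A minimal sketch of the stratified design in Python: units are grouped into mutually exclusive strata, then an independent simple random sample is drawn within each. The family records, region labels and per-stratum sample sizes are hypothetical; here the sizes are chosen in proportion to stratum size (10% of 300 and 200 families), which is one common allocation but not the only one.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, sizes, seed=None):
    """Independent simple random sample of sizes[s] units from each stratum s."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # each unit belongs to exactly one stratum
    return {s: rng.sample(strata[s], n) for s, n in sizes.items()}

# Hypothetical families: (family id, region)
families = [(i, "north") for i in range(300)] + [(i, "south") for i in range(300, 500)]

sample = stratified_sample(families, stratum_of=lambda f: f[1],
                           sizes={"north": 30, "south": 20}, seed=11)
```

Because each stratum is sampled separately, every region is guaranteed its share of the sample, which a single simple random sample of 50 families could not promise.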

CLUSTER SAMPLING

Cluster sampling is similar to stratified sampling because the population to be sampled is
subdivided into mutually exclusive groups. However, in cluster sampling the groups are defined
so as to maintain the heterogeneity of the population. It is the researcher's goal to establish
clusters that are representative of the population as a whole, although in practice this may be
difficult to achieve. After the clusters are established, a simple random sample of the clusters is
drawn and the members of the chosen clusters are sampled. If all of the elements (members) of
the clusters selected are sampled, then the sampling procedure is defined as one-stage cluster
sampling. If a random sample of the elements of each selected cluster is drawn, then the
sampling procedure is defined as two-stage cluster sampling. Cluster sampling is frequently
employed when the researcher is unable to compile a comprehensive list of all the elements in
the population of interest. A cluster sample might be used by a researcher attempting to measure
the age distribution of persons residing in a particular region of the country. It would be much
more difficult for the researcher to compile a list of every person residing in that region than to
compile a list of residential addresses. In this example, each address would represent a cluster of
elements (persons) to be sampled. If the elements contained in the clusters are as
heterogeneous as the population, then estimates derived from cluster sampling are as precise as
those from simple random sampling. However, if the heterogeneity of the clusters is less than
that of the population, the estimates will be less precise.
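The one-stage and two-stage variants differ only in what happens inside each chosen cluster. In the Python sketch below, clusters are hypothetical residential addresses mapped to the persons living there; the function name and its `per_cluster` parameter are illustrative assumptions.

```python
import random

def cluster_sample(clusters, m, per_cluster=None, seed=None):
    """
    clusters: dict mapping a cluster id (e.g. an address) to its elements (persons).
    m: number of clusters drawn by simple random sampling.
    per_cluster=None -> one-stage: keep every element of each chosen cluster.
    per_cluster=int  -> two-stage: simple random sample of that many elements per cluster.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # stage 1: SRS of clusters
    result = {}
    for c in chosen:
        members = clusters[c]
        if per_cluster is None:
            result[c] = list(members)
        else:                                  # stage 2: SRS within the cluster
            result[c] = rng.sample(members, min(per_cluster, len(members)))
    return result

# Hypothetical addresses with 1 to 4 residents each
households = {f"addr-{i:02d}": [f"addr-{i:02d}/person-{j}" for j in range(1, 2 + i % 4)]
              for i in range(20)}

one_stage = cluster_sample(households, 5, seed=1)
two_stage = cluster_sample(households, 5, per_cluster=2, seed=1)
```

Only the list of addresses is needed to start; a list of persons is built just for the chosen clusters, which is the practical advantage described above.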

NON-PROBABILITY SAMPLING METHODS

CONVENIENCE SAMPLING

As the name implies, convenience sampling involves choosing respondents at the convenience of
the researcher.

Examples of convenience samples include people-in-the-street interviews; the sampling of people
to whom the researcher has easy access, such as a class of students; and studies that use people
who have volunteered to be questioned as a result of an advertisement or another type of
promotion. A drawback to this methodology is the lack of sampling accuracy. Because the
probability of inclusion in the sample is unknown for each respondent, none of the reliability or
sampling precision statistics can be calculated. Convenience samples, however, are employed by
researchers because the time and cost of collecting information can be reduced.

QUOTA SAMPLING
Quota sampling is often confused with stratified and cluster sampling methodologies. All of these
methodologies sample a population that has been subdivided into classes or categories. The
primary difference between the methodologies is that with stratified and cluster sampling the
classes are mutually exclusive and are isolated prior to sampling. Thus, the probability of being
selected is known, and members of the population selected to be sampled are not arbitrarily
disqualified from being included in the results. In quota sampling, the classes cannot be isolated
prior to sampling and respondents are categorized into the classes as the survey proceeds. As
each class fills or reaches its quota, additional respondents that would have fallen into these
classes are rejected or excluded from the results. An example of a quota sample would be a
survey in which the researcher desires to obtain a certain number of respondents from various
income categories. Generally, researchers do not know the incomes of the persons they are
sampling until they ask about income. Therefore, the researcher is unable to subdivide the
population from which the sample is drawn into mutually exclusive income categories prior to
drawing the sample. Bias can be introduced into this type of sample when the respondents who
are rejected, because the class to which they belong has reached its quota, differ from those who
are used.
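The quota mechanism, in which classes fill up as the survey proceeds and later respondents in a full class are rejected, can be sketched in a few lines of Python. The respondents, their incomes and the income bands (thresholds in naira) are hypothetical. Note that the order in which respondents are reached, not a random mechanism, decides who is included.

```python
def income_band(income):
    """Hypothetical monthly income bands in naira."""
    if income < 50_000:
        return "low"
    if income < 200_000:
        return "middle"
    return "high"

def quota_sample(respondents, classify, quotas):
    """Accept respondents in the order reached until every class quota is met."""
    filled = {c: [] for c in quotas}
    rejected = []
    for person in respondents:
        c = classify(person)          # the class is only known once the person is asked
        if c in filled and len(filled[c]) < quotas[c]:
            filled[c].append(person)
        else:
            rejected.append(person)   # class already full: excluded from the results
        if all(len(filled[k]) == quotas[k] for k in quotas):
            break                     # every quota met: stop interviewing
    return filled, rejected

respondents = [("Ada", 30_000), ("Bayo", 45_000), ("Chi", 120_000), ("Dele", 20_000),
               ("Efe", 350_000), ("Femi", 90_000), ("Gbenga", 500_000)]
filled, rejected = quota_sample(respondents, lambda p: income_band(p[1]),
                                {"low": 2, "middle": 1, "high": 1})
```

Here Dele is rejected only because the low-income quota was already full; if people like Dele differ systematically from Ada and Bayo, that rejection is the source of bias described above.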

PURPOSIVE OR JUDGMENTAL SAMPLING

In judgmental or purposive sampling, the researcher employs his or her own "expert" judgment
about whom to include in the sample. Prior knowledge and research skill are used in
selecting the respondents or elements to be sampled. An example of this type of sample would
be a study of potential users of a new recreational facility that is limited to those persons who
live within two miles of the new facility. Expert judgment, based on past experience, indicates
that most of the use of this type of facility comes from persons living within two miles. However,
by limiting the sample to only this group, usage projections may not be reliable if the usage
characteristics of the new facility vary from those previously experienced. As with all non-
probability sampling methods, the degree and direction of error introduced by the researcher
cannot be measured and statistics that measure the precision of the estimates cannot be
calculated.
