Mat105 Study Guide
ASSOCIATION
STATISTICS
Statistics is concerned with the scientific method for collecting, organizing, summarizing,
presenting and analyzing data for useful purposes.
The two major areas of statistics are descriptive statistics and inferential statistics.
Descriptive statistics describes the properties of sample and population data.
Inferential statistics uses those properties to test hypotheses and draw conclusions.
Statistics can be used to make better informed business and investing decisions.
The mathematical theories behind statistics rely heavily on differential and integral calculus,
linear algebra, and probability theory.
Statisticians are people who practise statistics: they collect, analyze, and interpret data.
Inferential statistics are used to make generalizations about large groups (populations) based on samples.
Descriptive statistics mostly focus on the central tendency, variability, and distribution of sample
data.
Descriptive statistical tools include the mean (average), variance, skewness, and kurtosis.
Inferential statistical tools include linear regression analysis, analysis of variance
(ANOVA), logit/probit models, and null hypothesis testing.
In statistics, we learn about the population and the sample.
A population may refer to the people living in a region or, more generally, to the entire pool from
which a statistical sample is taken.
A sample is a smaller number of similar objects or events drawn from the population; by studying
its characteristics, we learn about the population.
Statisticians use samples because in many cases gathering comprehensive data about an entire
population is too costly, difficult, or flat-out impossible. Statistics therefore starts with a sample
that can be conveniently or affordably observed.
A statistic is an observed characteristic of the sample data, such as the sample mean.
Statistics are used to make inferences or educated guesses about the unmeasured characteristics
of the broader population, which are called parameters.
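To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The population of 1,000 exam scores, the seed, and the sample size of 30 are invented for illustration. The population mean is a fixed parameter; the mean of each random sample is a statistic that changes from sample to sample.

```python
import random
import statistics

# Hypothetical population: 1,000 exam scores, invented for illustration only.
rng = random.Random(1)
population = [rng.randint(0, 100) for _ in range(1000)]

# Parameter: a fixed characteristic of the whole population (here, its mean).
mu = statistics.mean(population)

# Statistics: the same characteristic computed on three random samples of 30.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(3)]

print(f"population mean (parameter): {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {xbar:.2f}")
```

Running it shows three sample means scattered around the single population mean.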
Sampling techniques are grouped into two categories: probability sampling
and non-probability sampling.
Probability sampling is alternatively known as random sampling.
Probability sampling relies on randomization.
Examples of probability sampling are simple random sampling, stratified sampling, systematic
sampling, cluster sampling, and multi-stage sampling.
In simple random sampling, every element has an equal chance of being selected as part of the
sample.
We use simple random sampling when we don’t have any kind of prior information about the
target population.
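Example of simple random sampling: randomly selecting 20 students from a class of 50 students.
Each student has an equal chance of being selected. Here the probability of a particular student being
picked on any single draw is 1/50, and the probability of that student ending up in the sample of 20
is 20/50 = 0.4.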
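A minimal Python sketch of the class example, using the standard-library `random` module. The roll numbers, the seed, and the number of trials are assumptions for illustration. It draws one sample of 20 from 50, then repeats the draw many times to show that a given student appears in roughly 20/50 = 40% of samples, while the chance of being picked on any single draw is 1/50.

```python
import random

# Class of 50 students, identified by hypothetical roll numbers 1 to 50.
students = list(range(1, 51))

# random.sample draws 20 distinct students; each student is equally likely to be chosen.
rng = random.Random(42)
chosen = rng.sample(students, 20)
print(sorted(chosen))

# Check the inclusion probability empirically: how often is student 1 selected?
trials = 10_000
hits = sum(1 in rng.sample(students, 20) for _ in range(trials))
print(hits / trials)  # should be close to 20/50 = 0.4
```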
Stratified sampling divides the elements of the population into small subgroups (strata) based on
similarity, in such a way that the elements within each group are homogeneous and the groups are
heterogeneous relative to one another. Elements are then randomly selected from each of these
strata.
We need to have prior information about the population to create subgroups.
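Stratified sampling can be sketched in Python as "group first, then sample randomly within each group". The population of 12 students, the year-of-study strata, and the equal allocation of 2 per stratum are assumptions for illustration; in practice the number drawn from each stratum is often made proportional to the stratum's size instead.

```python
import random

# Hypothetical population of 12 students, each tagged with a stratum (year of study).
population = [(f"S{i:02d}", f"Year {(i - 1) // 4 + 1}") for i in range(1, 13)]

def stratified_sample(units, per_stratum, seed=None):
    """Group units into strata, then take a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit_id, stratum in units:
        strata.setdefault(stratum, []).append(unit_id)
    # Equal allocation: the same number of elements is drawn from every stratum.
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

print(stratified_sample(population, per_stratum=2, seed=7))
```

Note that the strata must be known before sampling, which is why prior information about the population is needed.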
In cluster sampling, the entire population is divided into clusters or sections, and then some
clusters are randomly selected.
All the elements of each selected cluster are used in the sample.
Clusters are identified using details such as age, sex, location, etc.
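The key contrast with stratified sampling is that cluster sampling randomizes over whole groups and then keeps everyone inside the chosen groups. A minimal Python sketch, assuming hypothetical clusters defined by location (hostels):

```python
import random

# Hypothetical population already divided into clusters by location (hostels).
clusters = {
    "Hostel A": ["A1", "A2", "A3"],
    "Hostel B": ["B1", "B2"],
    "Hostel C": ["C1", "C2", "C3", "C4"],
    "Hostel D": ["D1", "D2", "D3"],
}

def cluster_sample(groups, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every element of each selected cluster."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(groups), n_clusters)
    return [unit for name in selected for unit in groups[name]]

print(cluster_sample(clusters, n_clusters=2, seed=3))
```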
In systematic sampling, elements are selected at a fixed interval from an ordered list; only the
first element is selected at random.
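A minimal Python sketch of systematic sampling, assuming a hypothetical ordered list of 100 units and a desired sample of 10. The interval is k = N / n (rounded down), the starting point is chosen at random from the first k units, and every k-th unit after it is taken.

```python
import random

def systematic_sample(units, n, seed=None):
    """Choose a random start in the first interval, then take every k-th unit."""
    k = len(units) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # the only random step
    return units[start::k][:n]

# Hypothetical population of 100 numbered units; we want a sample of 10.
population = list(range(1, 101))
print(systematic_sample(population, 10, seed=0))
```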
Multi-stage sampling combines two or more of the methods described above.
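One common combination, sketched here in Python as an illustration rather than the only design, is two-stage sampling: cluster sampling picks some groups, then simple random sampling picks elements inside each chosen group. The five schools of 20 students each are hypothetical.

```python
import random

# Hypothetical population: 5 schools (clusters), each with 20 students.
schools = {f"School {c}": [f"{c}{i}" for i in range(1, 21)] for c in "ABCDE"}

def two_stage_sample(groups, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: simple random sample inside each."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(groups), n_clusters)
    return {name: rng.sample(groups[name], n_per_cluster) for name in selected}

print(two_stage_sample(schools, n_clusters=2, n_per_cluster=5, seed=11))
```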
Non-probability sampling does not rely on randomization.
Non-probability sampling is also known as non-random sampling.
Examples of non-probability sampling are convenience sampling, purposive sampling, quota sampling,
and referral/snowball sampling.
In convenience sampling, elements are selected based on their availability.
Convenience sampling is used when the sample is rare and also costly to obtain.
Purposive sampling is based on the intention or purpose of the study.
Quota sampling depends on some pre-set standard; for example, the proportion of each characteristic
in the sample should match that in the population.
Referral/snowball sampling is used in situations where the population is completely
unknown and rare; existing participants refer further participants.
Qualitative data focus
Data can be collected by using sampling methods or experiments.
The information collected through censuses and surveys, in a routine manner, or from other sources is
called raw data.
When raw data are arranged into groups or classes, they are known as grouped data.
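Turning raw data into grouped data amounts to counting how many values fall in each class. A minimal Python sketch, assuming 15 hypothetical marks (non-negative whole numbers) and classes of width 10 (0-9, 10-19, and so on):

```python
from collections import Counter

# Hypothetical raw data: marks of 15 students (non-negative whole numbers).
raw = [23, 7, 15, 31, 18, 2, 27, 35, 12, 9, 21, 38, 16, 25, 30]

def group_into_classes(values, width):
    """Count how many raw values fall into each class lo to lo + width - 1."""
    counts = Counter((v // width) * width for v in values)   # lower limit of each value's class
    return {f"{lo}-{lo + width - 1}": counts[lo] for lo in sorted(counts)}

print(group_into_classes(raw, 10))   # {'0-9': 3, '10-19': 4, '20-29': 4, '30-39': 4}
```

Only classes that contain at least one value appear in the result; empty classes would need to be added explicitly.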
There are two types of data: primary data and secondary data.
Primary data are data that are used for the specific purpose for which they were collected.
Secondary data are used for purposes other than those for which they were originally collected.
The persons from whom information is collected are known as informants or respondents.
The data collected by an individual or his agents is primary data for him and secondary data for
all others.
Data whose observations are made with respect to a quality are called qualitative data.
Examples of qualitative data are crop varieties, shape of seeds, and soil type.
Qualitative variables are termed attributes.
Classification is the process of arranging data into groups or classes according to the common
characteristics possessed by the individual items.
Quantitative data are observations expressed as numbers, obtained by counting or measuring.
Discrete and continuous data are examples of quantitative data.
Discrete data take distinct, countable values (for example, the number of students in a class) and
can be recorded exactly.
Continuous data can take any value within a range, so they cannot be measured with complete
precision.
Height, weight, temperature and length are examples of continuous data.
PRACTICE QUESTIONS
1. The scatter in a series of values about the average is called: (a) Central tendency (b) Dispersion
(c) Skewness (d) Symmetry
2. The measurement of spread or scatter of the individual values around the central point is
called: (a) Measures of dispersion (b) Measures of central tendency (c) Measures of skewness
(d) Measures of kurtosis
3. The degree to which numerical data tend to spread about an average value is called: (a) Constant
(b) Flatness (c) Variation (d) Skewness
4. The measures of dispersion can never be: (a) Positive (b) Zero (c) Negative (d) Equal to 2
5. Given below the four sets of observations. Which set has the minimum variation? (a) 46, 48, 50,
52, 54 (b) 30, 40, 50, 60, 70 (c) 40, 50, 60, 70, 80 (d) 48, 49, 50, 51, 52
6. Which of the following is an absolute measure of dispersion? (a) Coefficient of variation (b)
Coefficient of dispersion (c) Standard deviation (d) Coefficient of skewness
7. The measure of dispersion which uses only two observations is called: (a) Mean (b) Median (c)
Range (d) Coefficient of variation
8. The range of the scores 29, 3, 143, 27, 99 is: (a) 140 (b) 143 (c) 146 (d) 70
9. If the observations of a variable X are, -4, -20, -30, -44 and -36, then the value of the range will
be: (a) -48 (b) 40 (c) -40 (d) 48
10. The maximum value in a series is 25 and its range is 15; the minimum value of the series is: (a)
10 (b) 15 (c) 25 (d) 35
11. The mean deviation of the scores 12, 15, 18 is: (a) 6 (b) 0 (c) 3
12. Mean deviation computed from a set of data is always: (a) Negative (b) Equal to standard
deviation (c) More than standard deviation (d) Less than standard deviation
13. Standard deviation of the values 2, 4, 6, 8 is 2.236, then standard deviation of the values 4, 8, 12,
16 is: (a) 0 (b) 4.472 (c) 4.236 (d) 2.236
14. Primary data and _____________ data are the same: (a) Grouped (b) Secondary (c) Ungrouped
(d) None of these
15. The data which have already been collected by someone are called: (a) Raw data (b) Array data
(c) Secondary data (d) Fictitious data
16. The grouped data is also called: (a) Raw data (b) Primary data (c) Secondary data (d) Qualitative data
17. Given X1 = 12, X2 = 19, X3 = 10, X4 = 7, then ΣXi (i = 1 to 4) equals? (a) 36 (b) 48 (c) 41 (d) 29
18. Data collected by NADRA to issue computerized identity cards (CICs) are: (a) Unofficial data
(b) Qualitative data (c) Secondary data (d) Primary data
19. The number of accidents in a city during 2010 is (a)Discrete variable (b)Continuous variable
(c)Qualitative variable (d)Constant
20. The questionnaire survey method is used to collect (a)Secondary data (b)Qualitative variable
(c)Primary data (d)None of these
21. The measure of central tendency listed below is: (a)The raw score (b)The mean (c)The range
(d)Standard deviation
22. The population mean µ is called: (a)Discrete variable (b)Continuous variable (c)Parameter
(d)Sampling unit
23. Standard deviation is the square root of (a)variance (b) mean (c) none of the options (d) all of
the options
24. For a moderately skewed distribution, the value of the mode is calculated as: (a) 2 Mean - 3 Median
(b) 3 Median - 2 Mean (c) 2 Mean + Mode (d) 3 Median - Mode
25. Sum of mode and median of the data 12, 15, 11, 13, 18, 11, 13, 12, 13: (a) 26 (b) 31 (c) 36
(d) 25
26. The difference of mode and mean is equal to (a) 3(mean - median) (b) 2(mean - median)
(c) 3(mean - mode) (d) 2(mode - mean)
27. The median of 7, 6, 4, 8, 2, 5, 11 is (a)6 (b) 12 (c)11 (d)4
28. The mode of 12, 17, 16, 14, 13, 16, 11, 14 is (a)13 (b)11 (c)14 (d)14 and 16
29. If the mean of 6 numbers is 41 then the sum of these numbers is (a)250 (b)246 (c)134 (d)456
30. If the mean of 6 numbers is 17 then the sum of numbers is (a)102 (b)103 (c)150 (d)120
31. The following represents age distribution of students in an elementary class. Find the mode of
the values: 7, 9, 10, 13, 11, 7, 9, 19, 12, 11, 9, 7, 9, 10, 11. (a) 7 (b)9 (c) 10 (d)11
32. These numbers are taken from the number of people that attended a particular church every
Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the mean. (a)25 (b)210 (c)62 (d)30
33. A numerical value used as a summary measure for a sample, such as a sample mean, is known as
a (a)Population Parameter (b)Sample Parameter (c)Sample Statistic (d)Population Mean
34. Statistics branches include (a)Applied Statistics (b)Mathematical Statistics (c)Industrial statistics
(d)a and b
35. Individual respondents, focus groups, and panels of respondents are categorised as (a)Primary
Data Sources (b)Secondary Data Sources (c)Itemized Data Sources (d)Pointed Data Sources
36. The variables whose calculation is done according to the height, length, and weight are
categorised as (a)Discrete Variables (b)Flowchart Variables (c)Measuring Variables
(d)Continuous Variables
37. The frequency distribution whose most values are dispersed to the left or right of the mode is
classified as (a) skewed (b)explored (c)bimodal (d) unimodal
38. What is the coefficient of variation of a sample that has a mean of 20 and standard deviation of
2? (a)1000% (b)10% (c)-1000% (d)-10%
39. Which scenario will give you a coefficient of variation of 30%? (a)Mean of 30 and standard
deviation of 15 (b)Mean of 4 and standard deviation of 16 (c)Mean of 3 and standard deviation
of 3 (d)Mean of 10 and standard deviation of 3
40. The standard deviation is divided by the coefficient of variation to calculate (a)arithmetic mean
(b)coefficient of arithmetic (c)coefficient of variance (d)multiplier of deviation
41. A researcher has collected the following sample data: 5, 12, 6, 8, 5, 6, 7, 5, 12, 4. What is the sum of its
median and mean? (a) 7 (b)6 (c)13 (d)1
42. From the above information, find the square of the mode. (a)5 (b)25 (c)10 (d)0
43. Which of the following is an example of non-numeric ordinal data? (a)Income (b)Price of commodity
(c)Occupation (d)Rating in beauty contest
44. A schedule in statistics refers to (a)Examination time table (b)set of questions used to gather
pieces of information which is filled by an informant or respondent. (c)A set of questions used to
gather information and filled by the investigator him/herself. (d) A set of past examination
questions.
45. Which of the following sampling methods does not need a sampling frame? (a)Simple random
sampling (b)Purposive sampling (c)Systematic sampling (d)Cluster sampling
46. The following are qualities of a good questionnaire except (a)That each question in the
questionnaire must be precise and unambiguous. (b)Avoidance of leading question in the
questionnaire. (c)That a questionnaire must be lengthy in order to accommodate many
questions. (d)That a questionnaire must be well structured into sections such that questions in
each section are related.
47. Which of the following is not a measure of partition? (a)Median (b)Mode (c)Percentile
(d)Quantiles (e)Quintile
48. In the graphical method of obtaining the quartiles, which of the following diagrams is used?
(a)Bar chart (b)Histogram (c)Pie chart (d)Ogive
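Many of the questions above rely on the mean, median, mode, range, mean deviation, standard deviation, and coefficient of variation. A minimal Python sketch for checking such calculations with the standard-library `statistics` module follows; the helper names `value_range`, `mean_deviation`, and `coefficient_of_variation` are our own, not library functions. It uses `statistics.pstdev` (population standard deviation, dividing by n), which reproduces the 2.236 quoted in question 13; `statistics.stdev` would divide by n - 1 instead. `statistics.multimode` needs Python 3.8 or later.

```python
import statistics

def value_range(xs):
    """Range: largest observation minus smallest (uses only two observations)."""
    return max(xs) - min(xs)

def mean_deviation(xs):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = statistics.mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def coefficient_of_variation(xs):
    """Standard deviation as a percentage of the mean (population SD)."""
    return statistics.pstdev(xs) / statistics.mean(xs) * 100

data = [2, 4, 6, 8]                                       # values from question 13
print(statistics.mean(data), statistics.median(data))     # 5 5.0
print(value_range(data))                                  # 6
print(mean_deviation(data))                               # 2.0
print(round(statistics.pstdev(data), 3))                  # 2.236
print(round(statistics.pstdev([2 * x for x in data]), 3)) # 4.472
print(round(coefficient_of_variation(data), 2))           # 44.72
print(statistics.multimode([1, 2, 2, 3, 3]))              # [2, 3]
```

Doubling every value doubles the standard deviation, and `multimode` returns every most-frequent value, which matters when a data set has two modes.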
a. a discrete variable
b. a continuous variable
c. a constant
d. a qualitative variable
102. The weights of students in a college/school are a
a. Discrete Variable
b. Continuous Variable
c. Qualitative Variable
d. None of these
103. The number of accidents in a city during 2010 is
a. Discrete variable
b. Continuous variable
c. Qualitative variable
d. Constant
104. Which of these represents qualitative data?
a. Height of a student
b. Liking or disliking of a product by 500 persons
c. Income of a government servant in a city
d. Yield from a wheat plot
105. The life of a TV picture tube is a
a. Discrete variable
b. Continuous variable
c. Qualitative variable
d. Constant
106. The first hand and unorganized form of data is called
a. Secondary data
b. Organized data
c. Primary data
d. None of these
107. The data which have already been collected by someone are called
a. Raw data
b. Array data
c. Secondary data
d. Fictitious data
108. Census reports used as a source of data are a
a. Primary source
b. Secondary source
c. Organized data
d. None of these
109. The grouped data is also called
a. Raw data
b. Primary data
c. Secondary data
d. Qualitative data
The following data show the number of hours worked by 200 statistics students.
Number of Hours    Frequency
0-9                40
10-19              50
20-29              70
30-39              40
35. Refer to the information above. The class width for this distribution
a. is 9
b. is 10
c. is 11
d. varies from class to class
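A minimal Python sketch for working with the hours-worked table, reading the third class as 20-29. Class width is measured between consecutive lower class limits (equivalently, between class boundaries), not as the upper limit minus the lower limit of a single class. The sketch also checks that the frequencies total 200 and estimates the mean from class midpoints, which assumes values are spread evenly within each class.

```python
# (lower limit, upper limit, frequency) for each class of the hours-worked table,
# reading the third class as 20-29.
classes = [(0, 9, 40), (10, 19, 50), (20, 29, 70), (30, 39, 40)]

# Class width: difference between the lower limits of consecutive classes.
widths = {b[0] - a[0] for a, b in zip(classes, classes[1:])}
print(widths)                           # {10}

# The frequencies should add up to the 200 students surveyed.
total = sum(f for _, _, f in classes)
print(total)                            # 200

# Class midpoints and the mean estimated from the grouped data.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
grouped_mean = sum(f * m for (_, _, f), m in zip(classes, midpoints)) / total
print(midpoints, grouped_mean)          # [4.5, 14.5, 24.5, 34.5] 20.0
```

A single-element set of widths confirms that every class has the same width.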
37. What is one of the distinctions between a population parameter and a sample statistic?
a. A population parameter is only based on conceptual measurements, but a sample statistic is
based on a combination of real and conceptual measurements.
b. A sample statistic changes each time you try to measure it, but a population parameter
remains fixed.
c. A population parameter changes each time you try to measure it, but a sample statistic
remains fixed across samples.
d. The true value of a sample statistic can never be known but the true value of a population
parameter can be known
38. Which of the following would be most likely to produce selection bias in a survey?
a. Using questions with biased wording.
b. Only receiving responses from half of the people in the sample.
c. Conducting interviews by telephone instead of in person.
d. Using a random sample of students at a university to estimate the proportion of people who
think the legal drinking age should be lowered
39. Average of all observations in a set of data is known as
a. median
b. range
c. mean
d. mode
40. Which one of these statistics is unaffected by outliers?
a. Mean b. Interquartile range c. Standard deviation d. Range
41. The skewness of a curve can also be measured using --------------
a. mode b. moments c. curve d. mean
42. ------------------------ is a measure of departure of a curve from symmetry
43. The formula for the Karl Pearson coefficient of skewness is ------------------
Income      Frequency (f)
100-200     5
200-300     10
300-400     20
400-500     15
500-600     4
600-700     1
Total       55
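As practice with the income table, here is a minimal Python sketch that computes the grouped mean, median, and standard deviation, then Karl Pearson's coefficient of skewness in its median form, 3(mean - median)/SD. It assumes each class is represented by its midpoint and uses the population form of the standard deviation (dividing by n); both are common textbook conventions, not something the table itself specifies. The mode form, (mean - mode)/SD, would need a grouped-mode interpolation not shown here.

```python
import math

# Income classes (lower, upper, frequency) from the table above.
classes = [(100, 200, 5), (200, 300, 10), (300, 400, 20),
           (400, 500, 15), (500, 600, 4), (600, 700, 1)]

n = sum(f for _, _, f in classes)
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Grouped mean: sum(f * x) / n, with x the class midpoint.
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Grouped standard deviation (population form): sqrt(sum(f * x^2) / n - mean^2).
sd = math.sqrt(sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2)

# Grouped median: L + (n/2 - cf) / f * h, using the first class whose
# cumulative frequency reaches n/2.
cum = 0
median = None
for lo, hi, f in classes:
    if cum + f >= n / 2:
        median = lo + (n / 2 - cum) / f * (hi - lo)
        break
    cum += f

# Karl Pearson's coefficient of skewness (median form): 3(mean - median) / sd.
skewness = 3 * (mean - median) / sd

print(f"n = {n}, mean = {mean:.2f}, median = {median:.2f}, sd = {sd:.2f}")
print(f"Pearson skewness = {skewness:.4f}")
```

The coefficient comes out slightly negative and close to zero, so this distribution is nearly symmetric.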
ASISA-LASU