Statistics Notes

Ch 1 and 2

Uploaded by

kayleighjblair13
Copyright
© All Rights Reserved

1.

o Statistics – the science of collecting, organizing, summarizing, and
analyzing information to draw conclusions or answer questions. It also
involves providing a measure of confidence in any conclusions
o Population – entire group of individuals to be studied
o Individual – person or object that is member of population being
studied
o Sample – subset of population that is being studied
o Statistic – numeric summary of sample (survey)
o Parameter – numeric summary of population
1. Determine whether underlined value is a parameter or statistic.
a. Average grade for a class of 25 students in Elementary Statistics
was 83.2% - parameter
Process of statistics
1. Identify research objective
2. Collect data needed to answer questions
3. Describe data
4. Perform inference
o Descriptive statistics – consist of organizing and summarizing data.
Describe data through numerical summaries, tables, and graphs
o Inferential statistics – uses methods that take a result from a sample,
extend it to the population, and measure the reliability of the result
o Variables – the characteristics of the individuals within the population
(weight, age, etc.)
o Qualitative or categorical variables – allow for classification of
individuals based on some attribute or characteristics – hair color
o Quantitative variables – provide numerical measures of individuals.
Math operations such as addition and subtraction can be performed
and provide meaningful results. – weight, age (a social security number
is qualitative: it is written in digits, but arithmetic on it is meaningless)
o Discrete variable – a quantitative variable that has either a finite
number of possible values or a countable number of possible values
(“no decimals”) – number of deer in a forest
o Continuous variable – a quantitative variable that has an infinite
number of possible values that are not countable “has decimals” –
amount of oil in a car at any given time
o Individuals – a person or object that is a member of the population
being studied
o Variables – characteristics of the individuals within the population
o Data – describes characteristics of an individual
o Nominal level of measurement: name, label or categorize (hair color) –
color of a car
o Ordinal level of measurement: properties of nominal but you can
arrange by rank or order (car size) – class in high school
o Interval level of measurement: properties of ordinal and the differences
in the values of the variable have meaning (year of birth). Addition and
subtraction are okay – year of graduation
o Ratio level of measurement: same properties of interval and the ratios
of the values of the variable have meaning. Zero in this measurement
means absence of the quantity. Multiplication and division can be
performed at this level (value of a car) – weight of students

1.2 Simple Random Sampling

o Random sampling – process of using chance to select individuals from
a population to be included in the sample
o If convenience is used to obtain a sample, the results of the
survey are meaningless
o Four Basic Sampling techniques
o Simple random sampling
o Stratified sampling
o Systematic sampling
o Cluster sampling
o A sample of size n from a population of size N is obtained through
simple random sampling if every possible sample of size n has an
equally likely chance of occurring. The sample is then called a simple
random sample
o Sample without replacement – an individual who is selected is
removed from the population and cannot be chosen again
o Sample with replacement – a selected individual is placed back into the
population and could be chosen a second time
1. Given the following names, pick three names at random using Table 1
in Appendix A
1. Frank 2. David 3. Bill
4. Mary 5. Sherry 6. Sam
7. Eric 8. Michelle 9. Danielle
o Number the names in some manner (as above) – Michelle, Sam, Sherry
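The same kind of draw can be sketched in Python instead of a random-number table. This is only an illustration: the `simple_random_sample` helper and the `seed` argument are assumptions added here, and the names it picks will generally differ from the table-based answer above.

```python
import random

# The nine names from the exercise, in their numbered order.
names = ["Frank", "David", "Bill", "Mary", "Sherry", "Sam",
         "Eric", "Michelle", "Danielle"]

def simple_random_sample(frame, n, seed=None):
    """Draw n individuals without replacement.

    random.sample gives every subset of size n the same chance,
    which matches the definition of a simple random sample.
    """
    rng = random.Random(seed)  # seed is optional, only for repeatability
    return rng.sample(frame, n)

sample = simple_random_sample(names, 3, seed=1)
print(sample)
```

Because the draw is without replacement, no name can appear twice in the result.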
o Frame – a list of all the individuals within the population
o Stratified sample – obtained by separating the population into
nonoverlapping groups called strata and then obtaining a simple
random sample from each stratum. The individuals within each
stratum should be homogeneous (or similar) in some way. Need a
frame. – democrats, republicans, independent
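A minimal sketch of stratified sampling, assuming a made-up frame of voter IDs grouped by party. The IDs, the `stratified_sample` helper, and the choice of two individuals per stratum are all hypothetical.

```python
import random

# Hypothetical frame: voter IDs already separated into nonoverlapping strata.
strata = {
    "Democrat": ["D1", "D2", "D3", "D4", "D5"],
    "Republican": ["R1", "R2", "R3", "R4", "R5"],
    "Independent": ["I1", "I2", "I3", "I4"],
}

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the same size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

sample = stratified_sample(strata, 2, seed=0)
for party, chosen in sample.items():
    print(party, chosen)
```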
o Systematic sample – obtained by selecting every kth individual
from the population. The first individual selected corresponds to a
random number between 1 and k. Doesn’t require a frame
o Steps in systematic sampling
o If possible, approximate the population size, N
o Determine the sample size desired, n
o Compute N/n and round down to the nearest integer. This
value is k
o Randomly select a number between 1 and k. Call this number
p
o The sample will consist of the following individuals: p, p+k,
p+2k, …, p+(n-1)k
o Cluster sample – is obtained by selecting all individuals within a
randomly selected collection or group of individuals. Don’t need a
complete frame of the groupings
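A small sketch of cluster sampling, assuming hypothetical city blocks as the clusters. The block names, resident labels, and the `cluster_sample` helper are invented for illustration.

```python
import random

# Hypothetical clusters: residents grouped by city block.
clusters = {
    "Block A": ["a1", "a2", "a3"],
    "Block B": ["b1", "b2", "b3"],
    "Block C": ["c1", "c2", "c3"],
    "Block D": ["d1", "d2", "d3"],
}

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

sample = cluster_sample(clusters, 2, seed=0)
print(sample)
```

The randomness is in which clusters are picked; inside a picked cluster nobody is left out.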
o Homogeneous – similar
o Heterogeneous – dissimilar
o Convenience sample – a sample in which the individuals are easily
obtained
o Most popular type
▪ Self-selected – individuals themselves decide to
participate (voluntary response samples)

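N = 3300, n = 70
$k = 3300/70 \approx 47.14$, rounded down to $k = 47$

N = 4000, n = 35
a) $k = \frac{N}{n} = \frac{4000}{35} \approx 114.29$, rounded down to $k = 114$
b) With $p = 3$: $p,\ p + k,\ p + 2k,\ \dots,\ p + (n-1)k = 3,\ 117,\ 231,\ \dots,\ 3879$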
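The systematic sampling steps, and the N = 4000, n = 35 example, can be checked with a short sketch. The `systematic_sample` helper is an assumption added here; passing `p` explicitly reproduces the worked example, while leaving it out picks p at random as the steps describe.

```python
import random

def systematic_sample(N, n, p=None, seed=None):
    """Return k and the positions p, p+k, ..., p+(n-1)k."""
    k = N // n  # N/n rounded down to the nearest integer
    if p is None:
        # randint is inclusive on both ends: a number between 1 and k
        p = random.Random(seed).randint(1, k)
    return k, [p + i * k for i in range(n)]

k, chosen = systematic_sample(4000, 35, p=3)
print(k)           # 114
print(chosen[:3])  # [3, 117, 231]
print(chosen[-1])  # 3879
```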

2.1-2.3

o Frequency distribution – lists each category of data and the number of
times each category occurs
o Relative frequency – frequency/sum of all frequencies
o Relative frequency distribution – lists each category of data together
with the relative frequency
o Can use the Excel function COUNTIF to help build the frequency distribution
o Ex: =COUNTIF(A1:A25, "Red")
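Outside Excel, the same counting can be sketched in Python. The list of colors below is made-up data, and `Counter` plays the role of one COUNTIF per category.

```python
from collections import Counter

# Hypothetical qualitative data (e.g. car colors).
colors = ["Red", "Blue", "Red", "Green", "Blue", "Red"]

freq = Counter(colors)      # category -> number of times it occurs
total = sum(freq.values())  # sum of all frequencies
rel_freq = {category: count / total for category, count in freq.items()}

for category in freq:
    print(category, freq[category], round(rel_freq[category], 3))
```

The relative frequencies should always add up to 1, which is a quick check on the table.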
o Bar graph – constructed by labeling each category of data on the horizontal
axis and the frequency or relative frequency of the category on the vertical axis.
Rectangles of equal width are drawn for each category. The height of
each rectangle is equal to the frequency or relative frequency of that category
o Bar graph from a frequency distribution (Excel)
o Put categories in column A, frequencies in column B
o Highlight them
o “insert” tab -> column -> 2D Clustered Column
o Pareto chart – a bar graph whose bars are drawn in decreasing order of
frequency or relative frequency
o Side by side bar graph – compares two data sets
o Pie chart – a circle divided into sectors. Each sector represents a
category of data. The area of each sector is proportional to the
frequency of the category
o Put categories in column A, frequencies in column B
o Highlight them
o “Insert” Tab -> Pie -> Pie
o Histogram – constructed by drawing rectangles for each class of data.
The height of each rectangle is the frequency or relative frequency of
the class. Width of each rectangle is the same and the rectangles
touch each other
o Frequency distribution – lists data values (either individually or by
groups of intervals) along with their corresponding frequencies (or
counts). Can be displayed in a bar chart
o Lower-class limits (LCL) – smallest numbers that can belong to the
different classes
o Upper-class limits (UCL) – largest numbers that can belong to the
different classes
o Open ended distributions = the first class has no lower-class limit, or
the last class has no upper-class limit
o Gap – the difference between the lower-class limit of one class and the
upper-class limit of the class right above or below it (depending on
data being ascending or descending)
o ½ gap = multiply the Gap by 0.5
o Upper-class boundary (UCB) = UCL + ½ Gap
o Lower-class boundary (LCB) = LCL - ½ Gap
o Class width – the difference between two consecutive lower-class limits
(or boundaries) or two consecutive upper-class limits (or boundaries)
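A worked sketch of the gap, boundary, and class width definitions, assuming three hypothetical classes 10–19, 20–29, and 30–39.

```python
# Hypothetical class limits for three classes: 10-19, 20-29, 30-39.
lower_limits = [10, 20, 30]
upper_limits = [19, 29, 39]

gap = lower_limits[1] - upper_limits[0]  # LCL of one class minus UCL of the previous
half_gap = gap * 0.5

lower_boundaries = [lcl - half_gap for lcl in lower_limits]  # LCB = LCL - 1/2 gap
upper_boundaries = [ucl + half_gap for ucl in upper_limits]  # UCB = UCL + 1/2 gap
class_width = lower_limits[1] - lower_limits[0]              # consecutive LCLs

print(gap, class_width)   # 1 10
print(lower_boundaries)   # [9.5, 19.5, 29.5]
print(upper_boundaries)   # [19.5, 29.5, 39.5]
```

Each upper-class boundary equals the next class's lower-class boundary, so the boundaries leave no gaps between classes.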
o Constructing frequency distribution
o Decide on number of classes: n (between 5-20)
o Calculate class width (highest value minus lowest value then
divide by n). Round up to a convenient number
o Choose the LCL for the beginning class using lowest data value
or convenient number
o Keep adding class width over and over to get other LCL’s of other
classes
o Figure UCL from these (usually a gap of 1, 0.1, 0.01, etc. is
assumed)
▪ If you have one decimal place in your data, then use 0.1
(since it has one decimal)
o Go through data and put tally marks in the appropriate class and
then add them all at the end
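The construction steps can be sketched as follows, assuming whole-number data and a gap of 1. The scores, the `frequency_distribution` helper, and the rule of widening the classes by 1 when the range divides evenly are assumptions made here so that the highest value still lands in the last class.

```python
import math

def frequency_distribution(data, num_classes, gap=1):
    """Return (LCL, UCL, frequency) for each class; assumes integer data."""
    lowest, highest = min(data), max(data)
    width = math.ceil((highest - lowest) / num_classes)  # round up
    if lowest + width * num_classes <= highest:
        width += 1  # range divided evenly: widen so the top value fits
    classes = []
    for i in range(num_classes):
        lcl = lowest + i * width      # keep adding the class width
        ucl = lcl + width - gap       # UCL sits one gap below the next LCL
        count = sum(1 for x in data if lcl <= x <= ucl)  # the tally
        classes.append((lcl, ucl, count))
    return classes

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 85, 88, 91, 97]
table = frequency_distribution(scores, 5)
for lcl, ucl, count in table:
    print(f"{lcl}-{ucl}: {count}")
```

For these scores the classes come out as 52–61, 62–71, 72–81, 82–91, and 92–101, with frequencies 3, 4, 3, 4, and 1.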
o Relative frequency distribution – replace the frequency with the
following formula
o Class frequency/sum of all frequencies
o Creating histogram (excel)
o Put data into column A (let’s say it is in A1:A25)
o Put UCLs (bins) into column B (let’s say it is in B1:B5)
o Click data tab -> “data analysis”
o Choose histogram -> choose appropriate settings and click “ok”
button
o Creating histogram (calculator)
o Put data into L1 (STAT button -> EDIT)
o “2nd” button, “y=” button, “enter” on Plot1
o Choose “on”, “third graph type”, L1, 1
o “window” button:
▪ Xmin=LCL of first class
▪ Xmax=LCL of class beyond the last class
▪ Xscl= class width
▪ Ymin=0
▪ Ymax=your guess
o “graph” button and use “trace” button to determine frequency
o Stem-and-leaf plot – use the digits to the left of the rightmost digit to
form the stem. Each rightmost digit forms a leaf
o 147 would have 14 as stem and 7 as leaf
o Construction of a stem-and-leaf plot
o Stems are the digits to the left of the rightmost digit, leaves are
the rightmost digit
o Write stems in a vertical column in increasing order with a
vertical line to the right (don’t repeat duplicates)
o Write each leaf to the right of this vertical line (from smallest to
largest, these can repeat)
▪ If data is bunched up, can split each stem into two or more
(can repeat). This is referred to as a split stem
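A minimal stem-and-leaf sketch, assuming non-negative whole numbers. The data are hypothetical apart from 147, and the `stem_and_leaf` helper is an illustration that does not handle split stems.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its sorted leaves (rightmost digit is the leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):        # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)  # 147 -> stem 14, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

data = [147, 152, 138, 141, 147, 159, 133]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch only lists stems that actually occur; a stem with no leaves would need to be added by hand to keep the column unbroken.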
o Dot plot
o Place each observation horizontally in increasing order and place
a dot above the observation each time it is observed
o Frequency polygon – a graph that uses points, connected by line
segments, to represent the frequencies for the classes. It is
constructed by plotting a point above each class midpoint on a
horizontal axis at a height equal to the frequency of the class. Next,
line segments are drawn connecting consecutive points. Two additional
line segments are drawn connecting each end of the graph with the
horizontal axis
o Put midpoints in column A, frequencies in column B
o Insert a row at the beginning and at the end with the appropriate
midpoints (use class width) and put zeros for the frequency
o Highlight midpoints and frequencies
o Choose insert tab -> scatter -> scatter with straight lines and
markers
o Cumulative frequency distribution – displays the aggregate frequency
of the category. In other words, for discrete data, it displays the total
number of observations less than or equal to the category. For
continuous data, it displays the total number of observations less than
or equal to the upper-class limit of a class
o Cumulative relative frequency distribution – displays the proportion
(or percentage) of observations less than or equal to the category for
discrete data and the proportion (or percentage) of observations less
than or equal to the upper-class limit for continuous data
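Cumulative and cumulative relative frequencies are running totals, which can be sketched like this. The upper-class limits and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical classes, identified by their upper-class limits.
upper_limits = [61, 71, 81, 91, 101]
frequencies = [3, 4, 3, 4, 1]

cumulative = list(accumulate(frequencies))  # running total of frequencies
total = cumulative[-1]                      # last entry is the number of observations
cumulative_relative = [c / total for c in cumulative]

for ucl, c, cr in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {ucl}: {c} ({cr:.2f})")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.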
o Ogive – a graph that represents the cumulative frequency or
cumulative relative frequency for the class. It is constructed by plotting
points whose x-coordinates are the upper-class limits and whose
y-coordinates are the cumulative frequencies or cumulative relative
frequencies of the class. Then line segments are drawn connecting
consecutive points. An additional line segment is drawn connecting the
first point to the horizontal axis at a location representing the upper
limit of the class that would precede the first class (if it existed)
o Put UCLs in column A, cumulative frequencies or relative
frequencies in column B
o Insert a row at the beginning with the appropriate UCL (use class
width) and put zero for the cumulative frequency
o Highlight UCLs and frequencies
o Choose insert tab -> scatter -> scatter with straight lines and
markers
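The points that go into the ogive's scatter chart can be computed with a sketch like the one below. The `ogive_points` helper and the class data are hypothetical; the extra starting point is the first UCL minus the class width, with a cumulative frequency of zero.

```python
from itertools import accumulate

def ogive_points(upper_limits, frequencies, class_width):
    """Return (x, y) points: UCLs against cumulative frequencies."""
    cumulative = list(accumulate(frequencies))
    # UCL of the class that would precede the first class, at height 0.
    start = (upper_limits[0] - class_width, 0)
    return [start] + list(zip(upper_limits, cumulative))

points = ogive_points([61, 71, 81, 91, 101], [3, 4, 3, 4, 1], class_width=10)
print(points)
```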
o Time series plot – obtained by plotting the time in which a variable is
measured on the horizontal axis and the corresponding value of the
variable on the vertical axis. Line segments are then drawn connecting
the points
o Put dates in column A, quantity in column B (whatever you are
studying ex: stock prices)
o Highlight dates and values
o Choose insert tab -> scatter -> scatter with straight lines and
markers

3.1

o Mean – measure of the center found by adding all the values and
dividing the total by the number of values. Best to use if the frequency
distribution is roughly symmetric
o Sample mean - $\bar{x} = \frac{\sum x_i}{n}$
o Population mean - $\mu = \frac{\sum x_i}{N}$
o Mean on calculator
o Enter values into L1 (“stat” button – edit)
o “2nd” button, “stat” button
o Choose “math”
o Choose “mean”
o “enter” button
o “2nd” button, “1” button
o “)” button
o “enter” button
o Trimmed mean – drop the smallest and largest values and then find the
mean
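A short sketch of the mean and of the trimmed mean as these notes describe it (dropping exactly one smallest and one largest value). The data and helper names are hypothetical.

```python
def mean(values):
    """Add all the values and divide by how many there are."""
    return sum(values) / len(values)

def trimmed_mean(values):
    """Drop the single smallest and single largest value, then average."""
    trimmed = sorted(values)[1:-1]
    return mean(trimmed)

data = [2, 4, 6, 8, 100]
print(mean(data))          # 24.0
print(trimmed_mean(data))  # 6.0
```

Dropping the extreme values pulls the result from 24 back to 6, which is much closer to the bulk of the data.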
o Median – measure of center that is the middle value when the original
data values are arranged in order. Denoted by M. Best to use if the
frequency distribution is skewed left or right
o Finding by hand
▪ Arrange them in order
▪ If the number of values is odd: median is exact number in
middle
▪ If the number of values is even: find the mean of two
middle numbers
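The by-hand procedure translates directly into a small sketch; the `median` helper name is an assumption.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if even)."""
    ordered = sorted(values)  # arrange them in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # odd count: exact middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```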
o A numerical summary of data is said to be resistant if extreme values
(very large or small) relative to the data do not affect its value
substantially
o If the data are skewed left or skewed right, then the median is the
better measure of central tendency. If the data are symmetric, the
mean is the better measure of central tendency
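Resistance can be seen by changing one value to an extreme one and comparing the two measures. This sketch uses Python's standard `statistics` module on made-up data.

```python
import statistics

data = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 1000]  # same data with one extreme value

mean_before = statistics.mean(data)            # mean 6
median_before = statistics.median(data)        # median 6
mean_after = statistics.mean(with_outlier)     # mean 204
median_after = statistics.median(with_outlier) # median still 6

print(mean_before, median_before)
print(mean_after, median_after)
```

The median does not move at all, while the mean jumps from 6 to 204, which is why the median is called resistant and the mean is not.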
o Mode – value that occurs most frequently
o Bimodal – two values occur with same greatest frequency
o Multimodal – more than 2 values occur with same greatest frequency
o Mode is often used where our measurement is names, categories, etc.
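A sketch for finding the mode(s), which also works for names and categories. The `modes` helper is hypothetical; it returns every value tied for the greatest frequency, so one result means a single mode, two means bimodal, and more means multimodal.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3]))                              # [2]
print(modes(["red", "blue", "red", "blue", "green"]))   # ['red', 'blue']
```

If every value occurs exactly once, this sketch returns all of them; in that situation the data are usually said to have no mode.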
o Midrange – value midway between highest and lowest values in the
data
o (High value + low value) / 2
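As a quick check of the midrange, note that the highest and lowest values are added first and the sum is then divided by 2. The `midrange` helper and the data are hypothetical.

```python
def midrange(values):
    """Value midway between the highest and lowest data values."""
    return (max(values) + min(values)) / 2

print(midrange([2, 4, 6, 8, 100]))  # 51.0
```

Because it depends only on the two most extreme values, the midrange is not resistant.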
