Smat3: Statistics and Probability
What is statistics?
Overview of terminology and definitions
STATISTICS
Statistics involves defining a research question and then translating that question into a
statistical statement that can be tested using data. The process includes:
- designing the study
- collecting the data
- summarizing the data
- analyzing the data
- generalizing to the people or population of interest
- communicating the findings to a general audience
2 CATEGORIES OF STATISTICS
Descriptive Statistics - summarizing a sample of data using plots or numeric summaries such as
means, medians, and standard deviations.
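As a minimal sketch of numeric summaries, the snippet below computes a mean, median, and standard deviation with Python's standard `statistics` module. It borrows the 30 ages from the frequency distribution exercise in these notes; the variable names are illustrative only.

```python
import statistics

# Ages of the 30 senior citizen respondents used in the FDT exercise in these notes.
ages = [
    61, 64, 74, 80, 63, 73, 75, 64, 65, 68,
    71, 63, 72, 76, 69, 70, 74, 68, 70, 65,
    64, 62, 63, 68, 67, 69, 68, 66, 63, 64,
]

mean_age = statistics.mean(ages)      # arithmetic average of the sample
median_age = statistics.median(ages)  # middle value of the sorted sample
sd_age = statistics.stdev(ages)       # sample standard deviation (divides by n - 1)

print(f"mean = {mean_age:.2f}, median = {median_age}, sd = {sd_age:.2f}")
```

`statistics.stdev` uses the sample (n - 1) formula; `statistics.pstdev` would be the choice if the 30 ages were treated as an entire population.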
Inferential Statistics - using data or summaries from a sample to infer something about a
population, that is, generalizing from the sample to the population. It can be divided into three areas:
Estimation - estimating a population quantity such as the average or mean (what is the average
salary of a CEO?)
Hypothesis testing - answering questions like (does a CEO who's six feet or taller
earn more on average than one who's not?)
Prediction - answering questions like (for a particular CEO, what does our model estimate their
salary would be?)
VOCABULARY
Unit (Subject) - the entities on which data are collected.
Variable - a recorded characteristic of a unit or person. A data set or data table is usually
organized with individuals or units in rows and variables in columns.
Population - the group of interest for our study, that is, who or what we are interested in
studying.
Sample - a subset of the population selected for study. The results of any study are generalizable
only to the population from which the sample was drawn.
Population Parameter - the quantity we'd like to know for the entire population.
Sample Statistic - the estimate of a parameter computed from the sample.
Mu (μ) for the population mean and x-bar (x̄) for the sample mean
Sigma (σ) for the population standard deviation and s for the sample standard deviation
Rho (ρ) for the population correlation and r for the sample correlation
(Population) Parameter
The true (population) difference in depression rates for those who exercise vs. those who don't.
(Sample) Statistic
The difference in depression rates for those who exercise vs. those who don't in the sample.
External Validity
Is the result from our study generalizable to the general population of university students?
Internal Validity
a. Probability Sampling - Samples are chosen in such a way that each member of the
population has a known, though not necessarily equal, chance of being included in the
sample. It is also called unbiased sampling.
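As one illustration of probability sampling, the sketch below draws a simple random sample, the special case where every member has the same known chance n/N of being included. The population of 100 numbered units, the sample size of 10, and the fixed seed are all hypothetical choices made for the example, not values from these notes.

```python
import random

# Hypothetical population: 100 units identified by the numbers 1..100.
population = list(range(1, 101))
sample_size = 10

# A seeded generator makes the illustration reproducible; the seed is arbitrary.
rng = random.Random(42)

# random.sample draws without replacement, so no unit is selected twice.
sample = rng.sample(population, sample_size)

# Under simple random sampling every unit has the same known inclusion chance n/N.
inclusion_probability = sample_size / len(population)

print(sorted(sample))
print(f"chance of inclusion for each unit = {inclusion_probability}")
```

Other probability designs (for example stratified sampling) give units unequal chances, but those chances are still known in advance.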
4. Data Presentation
Three ways to present data:
b. Tabular Presentation
*Numerical values are presented using tables.
*Information is lost in a tabular presentation of data.
*A frequency distribution table is applicable to qualitative variables.
c. Graphical Presentation
*Trends are easily seen in graphs compared to tables.
*Data can be presented well using pictures or figures such as pictographs.
*Pie charts are used to present data as parts of one whole.
*Line graphs are for time-series data.
*It is better to present data using graphs than tables, as graphs are much better to
look at.
5. Constructing a frequency distribution table (FDT) for grouped data. Consider the data
below, which show the ages of 30 randomly selected senior citizen respondents of Barangay Nueva.
n = 30
61 64 74 80 63 73 75 64 65 68
71 63 72 76 69 70 74 68 70 65
64 62 63 68 67 69 68 66 63 64
Determine: range = HV - LV = 80 - 61 = 19
class size (i) = 3.47, rounded up to 4
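The arithmetic above can be checked with a short sketch. The notes do not say how 3.47 was obtained; the value is consistent with dividing the range by the square root of n (about 5.48 classes), so that rule is used here as an assumption rather than as the notes' stated method.

```python
import math

ages = [
    61, 64, 74, 80, 63, 73, 75, 64, 65, 68,
    71, 63, 72, 76, 69, 70, 74, 68, 70, 65,
    64, 62, 63, 68, 67, 69, 68, 66, 63, 64,
]

n = len(ages)
data_range = max(ages) - min(ages)  # HV - LV

# ASSUMPTION: number of classes taken as sqrt(n); the notes do not state the rule.
approx_classes = math.sqrt(n)

raw_class_size = data_range / approx_classes
class_size = math.ceil(raw_class_size)  # round up so the classes cover the whole range

print(f"range = {data_range}")
print(f"raw class size = {raw_class_size:.2f}, rounded up to {class_size}")
```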
* Class Boundary
* Ogive (< and > cumulative frequency)
* Relative frequency
n = 30
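The quantities listed above can be tabulated with a short sketch. It assumes a class size of 4 and a first class that starts at the lowest value, 61; the notes do not fix the starting point, so that choice and the column names are illustrative.

```python
ages = [
    61, 64, 74, 80, 63, 73, 75, 64, 65, 68,
    71, 63, 72, 76, 69, 70, 74, 68, 70, 65,
    64, 62, 63, 68, 67, 69, 68, 66, 63, 64,
]
n = len(ages)
class_size = 4         # from the class-size step
lower = min(ages)      # ASSUMPTION: first class starts at the lowest value

rows = []
less_cf = 0
while lower <= max(ages):
    upper = lower + class_size - 1
    freq = sum(lower <= age <= upper for age in ages)
    less_cf += freq
    rows.append({
        "class": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),  # class boundaries
        "freq": freq,
        "rel_freq": freq / n,                      # relative frequency
        "less_cf": less_cf,                        # "<" cumulative frequency
    })
    lower += class_size

# ">" cumulative frequency: how many observations fall in this class or above.
remaining = n
for row in rows:
    row["greater_cf"] = remaining
    remaining -= row["freq"]

print("class   boundaries    f   rf     <cf  >cf")
for row in rows:
    lo, hi = row["class"]
    lb, ub = row["boundaries"]
    print(f"{lo}-{hi}   {lb}-{ub}   {row['freq']:>2}  {row['rel_freq']:.3f}  "
          f"{row['less_cf']:>3}  {row['greater_cf']:>3}")
```

The "<" and ">" cumulative frequencies are the values plotted against the class boundaries when drawing the two ogives.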
Graphical Presentation: using a line graph and a bar graph.
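As a rough, text-only stand-in for the bar graph, the sketch below prints one bar of `#` characters per class. It again assumes a class size of 4 with the first class starting at 61, and it does not cover the line graph, which would plot the same frequencies as connected points.

```python
ages = [
    61, 64, 74, 80, 63, 73, 75, 64, 65, 68,
    71, 63, 72, 76, 69, 70, 74, 68, 70, 65,
    64, 62, 63, 68, 67, 69, 68, 66, 63, 64,
]
class_size = 4     # ASSUMPTION: same class size as in the FDT exercise
lower = min(ages)  # ASSUMPTION: first class starts at the lowest value

bars = []
while lower <= max(ages):
    upper = lower + class_size - 1
    freq = sum(lower <= age <= upper for age in ages)
    # One '#' per respondent in the class, followed by the count.
    bars.append(f"{lower}-{upper} | {'#' * freq} {freq}")
    lower += class_size

print("\n".join(bars))
```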