STS

Overview of Statistics

Statistics
- It provides procedures for collecting, presenting, organizing, and interpreting data in order to arrive at meaningful information.

Importance of Statistics
Statistics plays a major role in many aspects of our lives.
It is used in sports, for example, to help a general manager decide which player might be the best fit for a team.
It is used in politics to help candidates understand how the public feels about various policies.
It is used in medicine to help determine the effectiveness of new drugs.
Statistical research in business enables managers to analyze past performance, predict future business practices, and lead organizations effectively. Statistics can describe markets, inform advertising, set prices, and respond to changes in consumer demand.
Statistics, as a quantitative tool widely used in economics and finance, can help shape effective monetary and fiscal policies and develop pricing models for financial assets such as equities, bonds, currencies, and derivative securities.

Computer Software
Imagine you've just spent weeks, months, or even years gathering data for a research project, and now you want to analyze it all to find out what it means. If the data seems too massive to handle, then you use computer software to deal with the data and make sure the results are useful and informative.

SPSS (Statistical Package for the Social Sciences)
- It is perhaps the most widely used statistics software package within human behavior research. SPSS offers the ability to easily compile descriptive statistics, parametric and non-parametric analyses, as well as graphical depictions of results through the graphical user interface (GUI).

R
- It is a free statistical software package that is widely used across both human behavior research and other fields. While R is very powerful software, it also has a steep learning curve, requiring a certain degree of coding.

MATLAB
- It is an analytical platform and programming language that is widely used by engineers and scientists. As with R, the learning path is steep, and you will be required to create your own code at some point.

SAS
- It is a statistical analysis platform that offers options to use either the GUI or to create scripts for more advanced analyses. It is a premium solution that is widely used in business, healthcare, and human behavior research alike.

GraphPad Prism
- It is premium software primarily used within statistics related to biology, but offers a range of capabilities that can be used across various fields.

Minitab
- It offers a range of both basic and fairly advanced statistical tools for data analysis.

Excel
- It offers a wide variety of tools for data visualization and simple statistics. It is simple to generate summary metrics and customizable graphics and figures, making it a usable tool for many who want to see the basics of their data.

Basic Terminologies in Statistics
population - consists of all the members of the group about which you want to draw a conclusion.
sample - is a portion or part of the population of interest selected for analysis.

Experimental Classification
Dependent variables or outcome variables
- measure the behavior of subjects and are expected to be influenced by the independent variable.
- Example: To predict the effect of fertilizer on the growth of plants, the dependent variable is the growth of plants while the independent variable is the amount of fertilizer used.

Mathematical Classification
Continuous variables
- are quantitative variables that have an infinite number of possible values. These values cannot be counted, but they can be measured.
- Some examples of these variables are height, weight, volume, etc.

Process of Statistics
1. Identify the research objective.
A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.
2. Collect the information needed to answer the questions.
Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.
3. Organize and summarize the information.
Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.
4. Draw conclusions from the information.
In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.
Important Note: If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Lesson 1.2

Data collection
- It is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Consequences of Improperly Collected Data
Inability to answer research questions accurately.
Inability to repeat and validate the study.
Distorted findings resulting in wasted resources.
Misleading other researchers to pursue fruitless avenues of investigation.
Compromising decisions for public policy.
Causing harm to human participants and animal subjects.

Steps in Data Gathering
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Methods of Data Collection

Primary data can be collected by:
1. Direct personal interviews – The researcher has direct contact with the interviewee. The researcher gathers information by asking the interviewee questions.
2. Indirect/Questionnaire Method – This method of data collection gathers written responses to a prepared set of questions, without direct contact between the researcher and the respondent.
Questions can either be:
An open-ended question is a type of question that does not include response categories. This type of question is usually appropriate for collecting subjective data.
A closed-ended question is a type of question that includes a list of response categories from which the respondent selects an answer. This type of question is usually appropriate for collecting objective data.
3. Focus Group – It is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.
4. Experiment – It is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.
5. Observation – It is a method of collecting data on the phenomenon of interest by recording the observations made about the phenomenon as it actually happens. It involves collecting information without asking questions.

Secondary data can be collected from:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of government departments.
5. Information from official publications.

Take Note:
Always investigate the validity and reliability of the data by examining the collection method employed by your source.
Do not use inappropriate data for your research.

sample size
- It is typically denoted by n and it is always a positive integer.
- No exact sample size can be prescribed here, because it varies across research settings.
- However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note:
Representativeness, not size, is the more important consideration.
Use no fewer than 30 subjects if possible.
If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

Choosing a sample size depends on non-statistical considerations and statistical considerations.
Non-statistical considerations – These may include availability of resources, manpower, budget, ethics, and sampling frame.
Statistical considerations – These include the desired precision of the estimate.

Three criteria need to be specified to determine the appropriate sample size:
1. Level of Precision – Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.
2. Confidence Level – It is a statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.
3. Degree of Variability – Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to get an optimum level of precision.

Reason for Sampling
It is important that the individuals included in a sample represent a cross section of individuals in the population.
If a sample is not representative, it is biased, and you cannot generalize to the population from your statistical data.

Definitions
Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.
Target population - The complete collection of observations we want to study.
Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.
Sample - A subset of a population.
Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.
Sampling Bias - This refers to problems in your sampling that make your sample unrepresentative of your population.

Lesson 2.1

Measures of Central Tendency

measure of central tendency
- Commonly referred to as an average, it is a single value that represents a data set.
- Its purpose is to locate the center of a data set.

Three Different Measures of Central Tendency

a. Mean or arithmetic mean
- It is the most frequently used measure of central tendency.
- It is the only common measure in which all values play an equal role, meaning that to determine its value you need to consider all the values of the given data set.
- It is appropriate for determining the central tendency of interval or ratio data.

Properties of Mean
A set of data has only one mean.
Mean can be applied for interval and ratio data.
All values in the data set are included in computing the mean.
The mean is very useful in comparing two or more data sets.
Mean is most appropriate for symmetrical data.
Mean is affected by extremely small or large values (outliers) in a data set.

b. Median
- It is the midpoint of the data array.
- When the data set is ordered, whether ascending or descending, it is called a data array.
- It is an appropriate measure of central tendency for data that are ordinal or above, but is most valuable for ordinal data.

Properties of Median
The median is unique; there is only one median for a set of data.
The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the middle observation.
Median is not affected by extremely small or large values.
Median can be applied for ordinal, interval, and ratio data.
Median is most appropriate for skewed data.

c. Mode
- It is the value in a data set that appears most frequently. Like the median and unlike the mean, the extreme values in a data set do not affect the mode.
- A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
- If the data has two values with the same greatest frequency, both values are considered the mode and the data set is bimodal.
- If a data set has more than two modes, the data set is said to be multimodal.
- There are also cases in which all values in the data set have the same frequency. When this occurs, the data set is said to have no mode.

Properties of Mode
The mode is found by locating the most frequently occurring value.
The mode is the easiest average to compute.
There can be more than one mode or even no mode in any given data set.
Mode is not affected by extremely small or large values.
Mode can be applied for nominal, ordinal, interval, and ratio data.

Lesson 2.3
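
As a brief recap of the three measures of central tendency from Lesson 2.1, the following is a minimal Python sketch. The data set `scores` and the helper `modes` are hypothetical and are not part of the original notes. The helper follows the convention in these notes that a data set whose values all occur equally often has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the most frequent value(s) in sorted order.

    Returns an empty list when every distinct value occurs equally often,
    which these notes call "no mode" (an assumption about the convention).
    """
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical data; 40 is an extreme value (outlier).
scores = [2, 3, 3, 5, 7, 10, 40]
print("mean:", mean(scores), "median:", median(scores), "mode:", modes(scores))

# Dropping the outlier moves the mean much more than the median or the mode.
trimmed = scores[:-1]
print("mean:", mean(trimmed), "median:", median(trimmed), "mode:", modes(trimmed))
```

With the outlier present the mean is 10 while the median is 5. Without it the mean drops to 5 and the median only moves to 4, which illustrates why the median is preferred for skewed data.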
Kurtosis
- It is a measure of the combined sizes of the two tails. It is often described as telling you how tall and sharp the central peak is, relative to a standard bell curve.
- It is actually the measure of outliers present in the distribution. The outliers in a sample,
therefore, have even more effect on the kurtosis than they do on the skewness.
Higher kurtosis means more of the variance is the result of infrequent extreme deviations,
as opposed to frequent modestly sized deviations. In other words, it is the tails that mostly
account for kurtosis, not the central peak.
The kurtosis decreases as the tails become lighter. It increases as the tails become
heavier.
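
The notes do not state a formula for kurtosis. One common definition that is consistent with the reference value of 3 for the normal distribution is the fourth standardized moment, written here in population form. The symbols $n$, $x_i$, and $\bar{x}$ (the number of observations, the individual values, and their mean) are introduced only for this illustration, and sample-corrected variants also exist.

```latex
\[
\text{Kurtosis}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}
         {\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{2}}
\]
```

Because deviations are raised to the fourth power, values far from the mean dominate the numerator, which is why heavier tails raise the kurtosis.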
a. Mesokurtic (Kurtosis=3)
- This distribution has a kurtosis statistic similar to that of the normal distribution.
b. Leptokurtic (Kurtosis>3)
- The peak is higher and sharper than that of a normal distribution, and the data are heavy-tailed, with a profusion of outliers.
c. Platykurtic (Kurtosis<3)
- Compared to a normal distribution, its tails are shorter and thinner, and often its central
peak is lower and broader.
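
The three categories above can be illustrated with a short Python sketch. It assumes the population form of kurtosis (fourth central moment divided by the squared variance), which the notes do not spell out. The function names, the `tolerance` cutoff, and the two data sets are hypothetical choices made for illustration.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment over squared variance.

    Assumes the data are not all identical (the variance must be nonzero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / (m2 ** 2)


def classify_kurtosis(k, tolerance=0.1):
    """Label a kurtosis value relative to the normal distribution's value of 3.

    tolerance is a hypothetical cutoff, since a sample kurtosis is
    rarely exactly 3.
    """
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]                     # evenly spread, light tails
heavy = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly central, two outliers

for name, data in (("flat", flat), ("heavy", heavy)):
    k = kurtosis(data)
    print(name, round(k, 2), classify_kurtosis(k))
```

The evenly spread data give a kurtosis of 1.7 (platykurtic), while the data with two extreme values give 5.0 (leptokurtic), in line with the point that the tails, not the central peak, mostly account for kurtosis.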