
Midterms Matm

The document provides an overview of statistics, its historical context, and its applications in various fields such as medicine, politics, insurance, and sports. It discusses different types of statistics, data measurement scales, measures of central tendency and variation, correlation analysis, and regression analysis. Additionally, it explains key concepts like population, sample, hypothesis testing, and the significance of normal distribution in data analysis.

Uploaded by

Arlene Alemania
Copyright
© All Rights Reserved

INTRODUCTION TO STATISTICS

Statistics
• Came from the Latin word “status,” which means state.
• Since ancient times, state leaders have used statistics to determine how much tax to levy on their subjects and how many soldiers are needed in an expected war.
• Under capitalism, not only the leaders of the state but also capitalists are interested in statistical surveys, which has increased the demand for data processing in profit-driven activities such as insurance.

Data
• In statistics, data are always the result of experiment, observation, investigation, or other means. They often appear as numerical figures and are then evaluated to turn them into useful knowledge.
• For most people, “statistics” is a scary thing that must be avoided as much as possible because they think that it is a collection of numbers and vague formulas.

Top 10 Uses of Statistics in Our Day-to-Day Life
1. Predictions
2. Testing
3. Forecasts
4. Preparedness
5. Predicting
6. Political
7. Insurance
8. Consumer
9. Financial
10. Sports

Predictions
• The figures help us make predictions about something that is going to happen in the future.

Testing
• Quality testing is another important use of statistics in every area of life.

Forecasts
• Have you ever seen weather forecasting? Do you know how the government ...

Preparedness
• Statistics is also helpful in emergency preparedness.

Predicting
• Statistics even plays a role in the medical field.

Political
• Statistics are crucial in a political campaign. Without statistics, no one can run a political campaign.

Insurance
• Insurance is a vast industry. There are hundreds of kinds of insurance, e.g., car, bike, and life insurance.

Consumer
• Statistics are widely used in consumer goods products.

Financial
• The financial market relies heavily on statistics.

Sports
• There are many uses of statistics in sports. Every sport requires statistics to make it more effective.

DIVISION OF STATISTICS

Types of Statistics
1. Descriptive Statistics
2. Inferential Statistics

Statistics
• The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics
• Collecting, summarizing, and describing data.
• Deals with the collection and presentation of data and the collection of summarizing values that describe the group's characteristics.
• The most common summarizing values are the measures of central tendency and variation.

Inferential Statistics
• Drawing conclusions and/or making decisions concerning a population based only on sample data.
• Deals with the predictions and inferences based on the analysis and interpretation of the results of the information gathered by the statistician.
• Some of the common statistical tools of inferential statistics are the t-test, z-test, analysis of variance, chi-square, and Pearson r.
Major Groupings of Data
1. Quantitative Data (Numerical)
   a. Discrete
   b. Continuous
      i. Interval Scale Data
      ii. Ratio Scale Data
2. Qualitative Data (Categorical)
   a. Nominal Scale Data (Named)
   b. Ordinal Scale Data (Ordered)
      i. Nominal with Order
      ii. Nominal without Order
      iii. Dichotomous

Quantitative Data
• Data that can be measured with numbers, such as distance, duration, length, revenue, speed.

Discrete
• Whole numbers (integers) that cannot be divided, such as the number of eggs, number of wins, or number of dogs. You can’t have 3.2 dogs.
• Binary (dichotomous) data, which take only two values, are a special case.

Continuous
• Numbers that can be broken into finer and finer units (usually within a range). Weight, height, and temperature are all examples (3.4981637081 lbs).

Qualitative Data
• Non-numerical data that is usually textual and descriptive, like “mostly satisfied,” “brown eyes,” “female,” “yes/no,” etc.

Variable
• A characteristic or attribute associated with the population being studied.
• Variables are further classified as categorical (qualitative) or numerical (quantitative).

Discrete Variables
• Values obtained by counting.
• e.g., number of pages in a book, shoe size, number of people in a race.

Continuous Variables
• Values obtained by measuring, all of which cannot be put into a list because they can have any value in some interval of real numbers.
• e.g., length of a film, temperature, time taken to run a race.

Scales of Measurement
• Measurement scales are subdivided into four categories. When drawing inferences from a random sample, the type of measurement scale must be carefully chosen.

Nominal
• Classifies elements into two or more categories or classes; any numbers assigned indicate only that the elements are different, not their order or magnitude.

Nominal Scale
• Assigns responses to different categories.
• No numerical differences between categories.
• e.g., gender, marital status, state of residence, college major, SSN, zip code, student ID.

Ordinal
• A scale that ranks individuals in terms of the degree to which they possess a characteristic of interest.

Ordinal Scale
• Set of categories that are ordered from least to most.
• The numerical distance from each category to the next is not known.
• e.g., Miss America results, military rank, letter grade in class, degrees held, medical condition, rank order of your preference from 1 to 4.

Interval
• In addition to ordering scores from high to low, it also establishes a uniform unit in the scale so that equal distances between scores are of equal magnitude. There is no absolute zero in this scale.
Interval Scale
• A scale with values where there is the same numerical distance between each value.
• This scale has an arbitrary zero point (no true, meaningful zero point).
• e.g., current temperature, many behavioral science questionnaires, IQ.

Ratio
• In addition to being an interval scale, it also has an absolute zero.

Ratio Scale
• A scale with scores where there is the same numerical distance between each score.
• The scale has a true, meaningful zero point that anchors comparisons, such as “Maribel’s income is 35% more than Susan’s.”
• e.g., weight of a package of candy, number of times you return to a restaurant after visiting it the first time, amount of money in your checking account, number of questions correct on a quiz, distance from San Antonio to Laredo.

POPULATION AND SAMPLE

Population
• Defined as the groups of people, animals, places, things, or ideas to which any conclusions based on characteristics of a sample will be applied.

Sample
• A subgroup of the population.

PARAMETER AND STATISTICS

Parameter
• A numerical measure that describes a characteristic of the population.

Statistic
• A numerical measure that is used to describe a characteristic of a sample.

MEASURES OF CENTRAL TENDENCY

Measures of Centre or Central Location / Central Tendency
• Describe a whole set of data with a single value that represents the middle or centre of its distribution.
• In other words, they are a way to describe the center of a data set.
• They let us know what is normal or average for a set of data.
• They also condense the data set down to one representative value, which is useful when you are working with large amounts of data.

Mode
• Most frequent data point.
• Mode exists as a data point.
• Unaffected by extreme values.
• Useful for qualitative data.
• May have more than 1 value.

Median
• Value that divides ranked data points into halves: 50% larger than it, 50% smaller.
• May not exist as a data point in the set.
• Influenced by the position of items, but not their values.
Mean
• Most stable measure.
• Affected by extreme values.
• May not exist as a data point in the set.

Mean
• The mean by definition is the sum of all the values in the observation or a dataset divided by the total number of observations. This is also known as the arithmetic average.
• The mean can be used for both continuous and discrete numeric data, but not for categorical data, as the values cannot be summed.
• As the mean includes every value in the distribution, the mean is influenced by outliers (which are numbers that are much higher or much lower than the rest of the data set) and skewed (asymmetric) distributions.
• The mean is applicable to ratio and interval data.

Median
• The median is considered the physical middle point in a distribution because it is located at the center position when the values are arranged in ascending or descending order, which in turn divides the distribution in half (there are 50% of observations on either side of the median value).
• If a distribution has an odd number of observations, the median value is the middle value.
• If it has an even number, the median value is the mean or average of the two middle values.

Mode
• The mode can be found for both numerical and categorical (non-numerical) data. It is the most commonly occurring value in a distribution.
• There can be more than one mode for the same distribution of data (bi-modal or multi-modal), thus limiting the ability of the mode to describe the center of the distribution.
• In some particular cases, the distribution may have no mode at all (e.g., if all values are different). In such cases, it may be better to use the median or mean, or to group the data into appropriate intervals and find the modal class.

OTHER MEASURES OF POSITION

Quantiles
• Quantiles are values that divide an ordered score distribution into equal parts.

Three (3) Kinds of Quantiles
1. Quartile
2. Decile
3. Percentile

Quartile
• A measure of position that divides the ordered observations or score distribution into 4 equal parts.

Decile
• A measure of position that divides the ordered observations or score distribution into 10 equal parts.

Percentile
• A measure of position that divides the ordered observations or score distribution into 100 equal parts.

Q2 = D5 = P50 = Median

MEASURES OF VARIATION

Measures of Variation
• A measure of variation enables you to know how varied the observations are, whether there are extreme values in the distribution, or whether the values are very close to each other.
• If the measure of variation is zero, it means that there is no variation at all and that the observations are all alike, or homogeneous. Otherwise, they are heterogeneous.
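
The measures of central tendency and position described in this part of the notes can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical, and the `"inclusive"` quantile method is an assumption (other conventions give slightly different cut points). It also illustrates that the second quartile, fifth decile, and 50th percentile all coincide with the median.

```python
import statistics

# Hypothetical quiz scores (illustrative data, not from the notes).
scores = [70, 75, 80, 80, 85, 90, 95, 100]

mean = statistics.mean(scores)        # sum of all values / number of values
median = statistics.median(scores)    # average of the two middle values (even count)
modes = statistics.multimode(scores)  # a list, because there may be more than one mode

# quantiles(n=k) returns the k-1 cut points that split the ordered data into k parts.
# The "inclusive" method is an assumed convention for this sketch.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
d5 = statistics.quantiles(scores, n=10, method="inclusive")[4]     # 5th decile
p50 = statistics.quantiles(scores, n=100, method="inclusive")[49]  # 50th percentile

print("mean:", mean, "median:", median, "mode(s):", modes)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Q2 == D5 == P50 == median:", q2 == d5 == p50 == median)
```

Using `multimode` rather than `mode` keeps every most-frequent value, which matches the point that a distribution may be bi-modal or multi-modal.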

The Common Measures of Variation
1. Range
2. Mean Absolute Deviation
3. Variance
4. Standard Deviation
5. Quartile Deviation and Interquartile Range
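
The five measures listed above can be sketched in a few lines of Python. This is an illustrative example only: the data are hypothetical, the variance divides by N (the population form), and the quartile deviation is taken to be half the interquartile range, which is a common convention but an assumption here.

```python
import math
import statistics

# Hypothetical observations (illustrative data, not from the notes).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# 1. Range: highest value minus lowest value.
data_range = max(data) - min(data)

# 2. Mean absolute deviation: average distance of each value from the mean.
mad = sum(abs(x - mean) for x in data) / n

# 3. Variance: average squared deviation from the mean (dividing by N).
variance = sum((x - mean) ** 2 for x in data) / n

# 4. Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# 5. Interquartile range and quartile deviation.
#    The "inclusive" quantile method is an assumed convention.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
quartile_deviation = iqr / 2

print(data_range, mad, variance, std_dev, iqr, quartile_deviation)
```

For a sample rather than a whole population, the sum of squared deviations is usually divided by N - 1 instead of N.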
Range
• The range is the simplest form of measuring the variation of a distribution. It is the difference between the highest and lowest values.
• The range is simple to compute and is useful when you wish to evaluate the whole of a dataset.
• The range is useful for showing the spread within a dataset and for comparing the spread between similar datasets.

Mean Absolute Deviation (MAD)
• The mean absolute deviation of a set of data is the average distance between each data value and the mean.

Variance
• Measures the dispersion of a set of data points around their mean.
• Variance is another measure of variation which can be used instead of the range.
• The variance considers the deviation of each observation from the mean.
• To obtain the variance of a distribution, first square the deviation from the mean of each raw score and add them together.
• Then, divide the resulting sum by N, the total number of cases.

Standard Deviation
• The standard deviation, (σ) for a population and (s) for a sample, is the square root of the value of the variance.

Quartile Deviation (Inter-Quartile Deviation)
• The quartile deviation is another way of determining the spread of a distribution in terms of quartiles.

NORMAL DISTRIBUTION

Normal Distribution
• Many scientific and business data and natural relationships, such as weight, height, etc., when displayed using a histogram or frequency curve are bell-shaped and symmetrical.

Normal Curve
• The graphical representation of the normal distribution.

Characteristics of a Normal Curve
• Mathematical model represented by a bell-shaped curve which is symmetric with respect to the mean.
• The normal curve does not intersect or touch the horizontal axis.
• The mean, median, and mode of the normal curve distribution are equal.
• The area under the normal curve is equal to 1 or 100%.
• The standardized normal distribution has a mean of 0 and standard deviation of 1.

Standard Score (Z-Score)
• It’s a measure of how many standard deviations below or above the population mean a raw score is.
• Z-scores are expressed in terms of standard deviations from their means.
• These z-scores have a distribution with a mean of 0 and a standard deviation of 1.

CORRELATION ANALYSIS

Correlation
• It is a statistical technique used to determine the degree to which two variables (x and y) are related.
• It finds the relationship between two quantitative variables without being able to infer causal relationships.
• The goal of a correlation analysis is to see the strength and the nature/direction of the relationship between two variables.
• Bivariate data involve two variables that are taken from a sample or population.

Scatter Diagram
• Two variables are positively correlated if the values of the two variables tend to increase together.
• Two variables are negatively correlated if the values of one variable increase while the values of the other decrease.
• Two variables are not correlated, or they have zero correlation, if one variable neither increases nor decreases while the other increases.

Simple Correlation Coefficient (r)
• It is also called Pearson's correlation or Product Moment Correlation Coefficient.
• It measures the nature and strength of the relationship between two quantitative variables.

Correlation Coefficient
• The sign of r denotes the nature of association.
• The value of r denotes the strength of association.
• The value of r ranges between −1 and +1.
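
One way to connect standard scores with the correlation coefficient is to compute r as the average product of paired z-scores. The sketch below assumes population standard deviations (dividing by N) and uses hypothetical study-hours and exam-score data; the function names `z_scores` and `pearson_r` are illustrative, not taken from the notes.

```python
import math


def z_scores(values):
    """Convert raw scores to z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Pearson r as the mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical bivariate data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

r = pearson_r(hours, scores)
print("z-scores of hours:", z_scores(hours))
print("r =", round(r, 3))  # positive sign: the two variables increase together
```

A value of r close to +1 or −1 indicates a strong linear association, while a value near 0 indicates a weak one.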
Correlation
• Correlation describes the strength of a linear
relationship between two variables.

Regression Line
• Calculates the “best-fit” line for a certain set of
data.
• The regression line makes the sum of the squares
of the residuals smaller than for any other line.

Regression Equation
• Regression equation describes the regression line
mathematically.
Pearson Correlation Coefficient
• The Pearson correlation coefficient, r, indicates how far away the data points are from the line of best fit (i.e., how well the data points fit this model/line of best fit).

Testing of Hypothesis
• Hypothesis testing in statistics is a way to test the results of a survey or experiment to see whether you have meaningful results.

Null Hypothesis (Ho)
• A general statement that there is no relationship between two phenomena under consideration or that there is no association between two groups.

Alternative Hypothesis (Ha)
• A statement that there is a relationship between two selected variables in a study.

Spearman Rank Correlation Coefficient (ρ)
• When the entries in a set of data are ranks, Spearman’s rank correlation coefficient ρ (also known as Spearman’s rho) is used in hypothesis testing.
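
As a rough illustration, Spearman's rho can be computed by ranking each variable and then applying the Pearson formula to the ranks. The sketch below assumes tied values share the average of their ranks; the function names and the two judges' rankings are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson r computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical rankings of five contestants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_a, judge_b)
print("rho =", round(rho, 3))
```

When there are no ties, this agrees with the shortcut formula 1 − 6Σd² / (n(n² − 1)), where d is the difference between paired ranks.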

REGRESSION ANALYSIS

Regression Analysis
• A technique concerned with predicting some variables by knowing others.
• The process of predicting variable Y using variable X.

Regression
• Uses a variable (x) to predict some outcome
variable (y)
• Tells you how values in y change as a function of
changes in values of x.
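
The regression ideas above can be sketched with the usual least-squares formulas: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept makes the line pass through the point of means. The data and the helper names `fit_line`, `predict`, and `sum_squared_residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


def predict(x, slope, intercept):
    """Use the regression equation to predict y from x."""
    return intercept + slope * x


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - predict(x, slope, intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print("regression equation: y =", round(intercept, 2), "+", round(slope, 2), "* x")
print("predicted y at x = 6:", round(predict(6, slope, intercept), 2))

# Any other line gives a larger sum of squared residuals than the fitted one.
best = sum_squared_residuals(xs, ys, slope, intercept)
other = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print("fitted line is better:", best < other)
```

The slope tells you how much the predicted y changes for each one-unit change in x.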
