Statistics
1. Meaning of Statistics
2. Scope of Statistics
● Descriptive Statistics:
● Inferential Statistics:
○ Definition: Involves methods that use a random sample of data taken from a
population to make inferences about the population.
○ Techniques: Hypothesis testing, confidence intervals, and regression
analysis.
● Applications:
3. Nature of Statistics
○ Science: Involves systematic methods and principles for data collection and
analysis.
○ Art: Requires skill and judgment in data interpretation and presentation.
● Collective Data:
4. Characteristics of Statistics
5. Importance of Statistics
6. Limitations of Statistics
2. Types of Data
● Primary Data:
○ Definition: Data collected firsthand for a specific research purpose.
○ Sources: Surveys, interviews, observations, experiments.
○ Advantages:
■ Relevant and specific to the study.
■ Up-to-date information.
○ Disadvantages:
■ Time-consuming and costly.
■ Requires planning and methodology.
● Secondary Data:
○ Definition: Data that has already been collected and published by others for
different purposes.
○ Sources: Books, articles, reports, government publications, databases.
○ Advantages:
■ Quick and easy to obtain.
■ Cost-effective.
○ Disadvantages:
■ May not be relevant or specific to the research.
■ Data quality and accuracy may vary.
● Non-Response Bias: Lack of responses from certain segments can lead to biased
results.
● Sampling Errors: Differences between sample results and the true population values that arise because only a sample, not the whole population, is observed.
● Data Quality: Ensuring accuracy, consistency, and validity of collected data.
● Time and Cost Constraints: Limited resources can affect the data collection
process.
● Informed Consent: Participants should be informed about the study and give
consent before participating.
● Confidentiality: Ensure the privacy of participants and protect sensitive information.
● Avoiding Misrepresentation: Present findings accurately without manipulation or
distortion.
● Raw Data: Unprocessed data collected directly from the source, often unorganized
and difficult to analyze.
● Organized Data: Data that has been processed and arranged in a structured format,
making it more useful for analysis.
● Tabular Form:
4. Frequency Distribution
5. Cumulative Frequency
● Ungrouped Data: Raw data presented in its original form without any summarization
(e.g., individual test scores).
● Grouped Data: Data that is organized into classes or categories (e.g., scores
organized into ranges).
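The step from ungrouped to grouped data can be sketched in a few lines of Python. The scores and the class width of 10 below are invented for illustration and do not come from these notes; each class is treated as including its lower limit and excluding its upper limit, which is one common convention.

```python
from collections import Counter

# Hypothetical raw (ungrouped) test scores, invented for illustration.
scores = [4, 12, 17, 8, 25, 13, 19, 2, 28, 15]

class_width = 10  # assumed class width

# Map each score to the lower limit of its class [lower, lower + width).
counts = Counter((s // class_width) * class_width for s in scores)

# Build the frequency distribution with a running (cumulative) total.
cumulative = 0
table = []
for lower in sorted(counts):
    cumulative += counts[lower]
    table.append((lower, lower + class_width, counts[lower], cumulative))

for lower, upper, freq, cum in table:
    print(f"{lower:>2} - {upper:<2}  f = {freq}  cumulative f = {cum}")
```

The last cumulative frequency always equals the number of observations, which is a quick check that no score was dropped while grouping.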
● Definition: Measures of central tendency are statistical measures that describe the
center or typical value of a dataset.
● Purpose: They provide a summary measure that represents the entire dataset,
helping to understand its overall distribution.
● Arithmetic Mean: The average of a dataset, calculated by dividing the sum of all
values by the number of values.
● Median: The middle value of a dataset when it is ordered.
● Mode: The value that occurs most frequently in a dataset.
3. Arithmetic Mean
Class interval   Frequency (f)   Midpoint (x)   f × x
0 - 10           3               5              15
10 - 20          5               15             75
20 - 30          2               25             50
Total            10                             140
■ Mean:
○ \bar{x} = \frac{\sum f x}{\sum f} = \frac{140}{10} = 14
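The grouped-data calculation above can be checked with a short Python sketch. It assumes, as the table does, that each class is represented by its midpoint; the variable names are illustrative only.

```python
# Class intervals and frequencies from the table above: (lower, upper, f).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

total_f = 0      # sum of frequencies
total_fx = 0.0   # sum of frequency x midpoint

for lower, upper, f in classes:
    midpoint = (lower + upper) / 2  # class mark x
    total_f += f
    total_fx += f * midpoint

grouped_mean = total_fx / total_f
print(total_f, total_fx, grouped_mean)  # 10 140.0 14.0
```

Because midpoints stand in for the actual values, the grouped mean is an approximation of the mean of the underlying raw data.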
2. Median
Definition: The median is the middle value of a dataset when ordered from smallest to
largest.
Characteristics:
- Divides the dataset into two equal halves.
- Not affected by extreme values (outliers).
Calculation of Median:
Example:
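A minimal Python sketch of the median calculation follows. The data values are invented for illustration. For an even number of values it averages the two middle values, which is the usual convention but is not spelled out in these notes.

```python
def median(values):
    """Return the middle value of the data after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data; 100 is a deliberate outlier.
odd_scores = [7, 3, 9, 5, 100]
even_scores = [7, 3, 9, 5]

print(median(odd_scores))   # 7
print(median(even_scores))  # 6.0
```

The outlier 100 does not move the median of the first dataset, while the arithmetic mean of the same values is 24.8.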
3. Mode
Definition: The mode is the value that appears most frequently in a dataset.
Characteristics:
- Can be used with categorical, ordinal, and numerical data.
- A dataset may have one mode (unimodal), more than one mode (bimodal or multimodal),
or no mode.
Calculation of Mode:
Example:
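The mode can be found by counting how often each value occurs. The sketch below uses invented data and a hypothetical helper named modes; it treats a dataset in which every value occurs exactly once as having no mode, which is an assumed convention.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] if no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if all values occur once, there is no mode.
    if top == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([2, 3, 3, 5, 7]))          # unimodal
print(modes([1, 1, 2, 2, 3]))          # bimodal
print(modes([1, 2, 3]))                # no mode
print(modes(["red", "blue", "red"]))   # categorical data
```

Unlike the mean and median, this works for categorical data because it only counts occurrences and never does arithmetic on the values.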
Example:
2. Types of Correlation
Positive Correlation: When one variable increases, the other variable also tends to increase
(e.g., height and weight).
Negative Correlation: When one variable increases, the other variable tends to decrease
(e.g., temperature and heating demand).
No Correlation: No relationship exists between the two variables; changes in one are not
associated with systematic changes in the other.
3. Coefficient of Correlation
Definition: A numerical measure that quantifies the degree of correlation between two
variables.
Range: The coefficient of correlation (denoted as r) ranges from -1 to +1:
- r = +1: Perfect positive correlation
- r = -1: Perfect negative correlation
- r = 0: No linear correlation
Steps to Calculate:
1. Gather paired data (x, y).
2. Compute sums and sums of squares.
3. Substitute values into the formula.
Example:
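The steps above can be sketched in Python. The formula itself is not given in these notes, so the sketch assumes the standard computational form of Pearson's coefficient, which uses exactly the sums and sums of squares from step 2: r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]). The paired data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from sums and sums of squares (assumed formula)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)

    # Step 2: sums, sum of products, and sums of squares.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Step 3: substitute into the formula.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return numerator / denominator

# Step 1: hypothetical paired data (x, y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))
```

With these values the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so the numerator is 30 and the denominator is sqrt(1500), giving a moderately strong positive r.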
Definition: Spearman's rank correlation coefficient (r_s) measures the strength and direction of association between two ranked variables.
Formula:
r_s = 1 - [6 Σ d_i²] / [n(n² - 1)]
Where:
- d_i = Difference between the two ranks of each pair
- n = Number of pairs
Steps to Calculate:
Example:
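A minimal Python sketch of the rank-difference formula follows. The marks are invented for illustration, the helper names are hypothetical, and the sketch assumes there are no tied values, because the formula in this simple form does not account for tied ranks.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled by this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    # d is the difference between the two ranks of each pair.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical marks of five students in two subjects.
maths = [85, 60, 73, 40, 90]
science = [93, 75, 65, 50, 80]

print(ranks(maths))    # [4, 2, 3, 1, 5]
print(ranks(science))  # [5, 3, 2, 1, 4]
rs = spearman_rs(maths, science)
print(round(rs, 2))
```

Here the rank differences are -1, -1, 1, 0 and 1, so the sum of squared differences is 4 and the coefficient is 1 - 24/120.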
6. Limitations of Correlation
- Causation vs. Correlation: Correlation does not imply causation; two variables may
correlate without one causing the other.
- Sensitivity to Outliers: Extreme values can significantly affect the correlation coefficient.
- Non-Linear Relationships: Pearson’s correlation only measures linear relationships; it may
not capture other types of relationships effectively.
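Two of these limitations can be illustrated with a small Python sketch. All data are invented, and the helper uses the deviation-from-mean form of Pearson's coefficient, which is an assumption since the notes do not state a formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Non-linear relationship: y is fully determined by x (y = x^2),
# yet the linear correlation is zero.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_nonlinear = pearson_r(xs, ys)

# Sensitivity to outliers: a perfect negative relationship...
clean_x = [1, 2, 3, 4, 5]
clean_y = [5, 4, 3, 2, 1]
r_clean = pearson_r(clean_x, clean_y)

# ...turns strongly positive after adding a single extreme point.
r_outlier = pearson_r(clean_x + [20], clean_y + [20])

print(r_nonlinear, round(r_clean, 2), round(r_outlier, 2))
```

A scatter plot of the data is the usual safeguard: it shows both the curved pattern and the isolated point that the single number r hides.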