
STAT Module I Notes

Statistics is the mathematical science of collecting, analyzing, interpreting, presenting, and organizing data, essential for informed decision-making across various fields. It includes primary and secondary data, discrete and continuous data, and types of statistics such as descriptive and inferential. The document also covers data collection methods, frequency distribution, and measures of central tendency, highlighting the importance of statistical tools in understanding complex information.

Uploaded by shristipuskar
Copyright © All Rights Reserved

MODULE 1

Definition of statistics
Statistics can be defined as the mathematical science that involves the
collection, analysis, interpretation, presentation, and organisation of data. It
encompasses methods for summarising and drawing meaningful inferences
from data, enabling individuals to make informed decisions, understand
patterns, and uncover insights within various phenomena. Statistics plays a
crucial role in various fields, from scientific research and business analysis to
social studies and policymaking, by providing tools to handle and extract
meaning from complex and often uncertain information.
Characteristics of Statistics
• Statistics should deal with aggregates of individuals rather than with individuals alone.
• Statistics should be expressed as numerical figures.
• Statistics should be collected for a predetermined purpose.
• Statistics collected should allow comparison with other data.
What is Data?
Definition: Facts or figures, numerical or otherwise, collected with a definite
purpose are called data.

Primary Data
Primary data is collected for the first time by the researcher, through direct observation, experience or measurement, specifically for the research at hand.
It is also described as raw data or first-hand information.
Collecting primary data is relatively costly.
The data is mainly collected through observations, physical testing, mailed
questionnaires, surveys, personal interviews, telephonic interviews, case
studies, focus groups, etc.
Secondary Data
Secondary data is second-hand data that has already been collected and recorded by other researchers for their own purposes, not for the current research problem.
It can be obtained from sources such as government publications, censuses, internal organisation records, books, journal articles, websites and reports.
Secondary data is affordable and readily available, so using it saves cost and time.
The main disadvantage is that the information was assembled for some other purpose, so it may not meet the needs of the present research, and it may be inaccurate.
Discrete vs Continuous Data
Discrete data (countable) is information that can take only certain distinct values. These values do not have to be whole numbers, but they are fixed values, such as shoe size, number of teeth and number of children.
Many discrete variables are counts, which take non-negative integer values (0, 1, 2, 3 and so on).
Continuous data (measurable) is data that can take any value within a range. Height, weight, temperature and length are all examples of continuous data.
A continuous measurement is limited only by the precision of the measuring instrument; for example, a person's weight might be recorded as 62 kg, 62.5 kg or 62.47 kg, and it can also change over time.
Types of statistics
Descriptive Statistics: This involves summarising and presenting data in a
meaningful way. Measures such as mean (average), median (middle value),
mode (most common value), and measures of dispersion like range and
standard deviation fall under this category.
Inferential Statistics: This involves drawing conclusions about a population based on a sample of data. It includes techniques such as hypothesis testing, confidence intervals, and regression analysis.
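The descriptive measures named above can be computed with Python's standard `statistics` module. This is a minimal sketch: the marks are made-up illustrative data, and `stdev` is the sample standard deviation (divisor n - 1), which is one of two common conventions.

```python
import statistics

# Hypothetical marks of 10 students (illustrative data, not from the notes)
marks = [45, 52, 52, 60, 61, 63, 70, 72, 78, 90]

mean = statistics.mean(marks)          # arithmetic mean: sum of values / number of values
median = statistics.median(marks)      # middle value of the sorted data (average of the two middle values here)
mode = statistics.mode(marks)          # most common value
data_range = max(marks) - min(marks)   # largest value minus smallest value
sd = statistics.stdev(marks)           # sample standard deviation (n - 1 in the denominator)

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}, sd={sd:.2f}")
```

Inferential techniques such as hypothesis testing go beyond this summary: they use such sample values to say something about the wider population.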
SCOPE OF STATISTICS:
The scope of statistics is broad and encompasses various aspects related to
data collection, analysis, interpretation, and utilisation. Here are some key
components within the scope of statistics:
Biostatistics: Applies statistical methods to biological and medical data, aiding
in clinical trials, epidemiological studies, and medical research.
Econometrics: Applies statistical techniques to economic data, helping
economists analyse economic relationships and forecast trends.
Social Sciences: Utilizes statistics to study social phenomena, including
demographics, public opinion, and behaviour patterns.
Business and Marketing: Uses statistics to analyse market trends, consumer
preferences, and business performance metrics.
Environmental Statistics: Applies statistical methods to analyse environmental
data, such as pollution levels, climate patterns, and ecological changes.
Quality Control: In industries, statistics is used to monitor and enhance
product and process quality through methods like Six Sigma.
Psychometrics: Applies statistical methods to measure psychological traits and
abilities, commonly used in educational and psychological testing.
Statistical Ethics: Addresses ethical considerations such as data privacy,
avoiding bias, and accurately presenting results.
Educational Statistics: Analyzes educational data to evaluate educational systems and student performance and to inform policy decisions.
The scope of statistics continues to evolve with advancements in data science,
machine learning, and technology. It is crucial in providing insights, supporting
decision-making, and advancing knowledge across various disciplines.
Data Presentation
There are two types of statistical presentation of data: graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. The shape, centre and spread of the data usually describe the overall pattern. An individual value that falls outside the overall pattern is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots and box plots are used for numerical variables.
Histogram
A histogram is a graphical display of data using bars of different heights. In
a histogram, each bar groups numbers into ranges. Taller bars show that
more data falls in that range. A histogram displays the shape and spread of
continuous sample data.
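The grouping step behind a histogram, putting each value into a range and counting, can be sketched in a few lines of Python. This is an illustrative sketch, not a plotting library: the heights are hypothetical, the bins are assumed to be equal-width and to include their lower edge but not their upper edge, and the bars are drawn as text.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin this value belongs to
        if 0 <= k < n_bins:            # values outside the chosen range are ignored
            counts[k] += 1
    return counts

# Hypothetical heights (cm) of 12 students, for illustration only
heights = [151, 155, 158, 160, 162, 163, 165, 167, 168, 172, 175, 181]

counts = histogram_counts(heights, start=150, width=10, n_bins=4)
for k, c in enumerate(counts):
    low = 150 + k * 10
    print(f"{low}-{low + 10}: {'#' * c}")  # bar length = number of values in that range
```

The tallest bar (160-170) shows where most of the data falls, which is exactly what a histogram is meant to reveal.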

Classification of Data
There are four types of classification. They are:
Geographical classification
When data are classified on the basis of location or area, the classification is called geographical classification.
Chronological classification
Chronological classification means classification based on time, like months,
years etc.
Qualitative classification
In Qualitative classification, data are classified based on some attributes or
quality such as gender, colour of hair, literacy and religion. In this type of
classification, the attribute under study cannot be measured. It can only be
found whether it is present or absent in the units of study.
Quantitative classification
Quantitative classification refers to the classification of data according to
some characteristics which can be measured, such as height, weight,
income, profits etc.
Types of quantitative classification
There are two types of quantitative classification: discrete frequency distribution and continuous frequency distribution.
In this type of classification, there are two elements: the variable and the frequency.
Variable
Variable refers to the characteristic that varies in magnitude or quantity.
E.g. weight of the students. A variable may be discrete or continuous.
Frequency
Frequency refers to the number of times each value of the variable is repeated. For example, if 50 students have a weight of 60 kg, the frequency of the value 60 kg is 50.
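A discrete frequency distribution simply lists each distinct value of the variable with its frequency. As a minimal sketch, Python's `collections.Counter` does the counting; the household data below is hypothetical.

```python
from collections import Counter

# Hypothetical data: number of children in 12 households
children = [0, 1, 2, 2, 1, 3, 2, 0, 1, 2, 4, 2]

freq = Counter(children)  # maps each distinct value to the number of times it occurs

print("Children  Frequency")
for value in sorted(freq):
    print(f"{value:>8}  {freq[value]:>9}")
```

Each row of the printed table pairs a value of the variable with its frequency, and the frequencies add up to the total number of observations.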
Frequency distribution
The following technical terms are important when a continuous frequency distribution is formed:
Class limits: Class limits are the lowest and highest values that can be included in a class. For example, take the class 51-55. The lowest value of the class is 51 and the highest value is 55; in this class there can be no value less than 51 or more than 55. 51 is the lower class limit and 55 is the upper class limit.
Class interval: The difference between the upper and lower limits of a class is known as the class interval of that class.
Class frequency: The number of observations corresponding to a particular class is known as the frequency of that class.
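The terms above can be seen together in a small continuous frequency table. This sketch assumes the exclusive method of forming classes, where each class includes its lower limit but not its upper limit (so 50-55 holds values from 50 up to, but not including, 55); under that method the class interval (upper limit minus lower limit) equals the width of every class. The marks are made up for illustration.

```python
# Hypothetical marks of 15 students (illustrative data only)
marks = [50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 62, 63, 66, 68, 69]

def frequency_table(values, lower, upper, interval):
    """Return (lower_limit, upper_limit, class_frequency) for the classes
    [lower, lower + interval), [lower + interval, lower + 2*interval), ... up to upper."""
    table = []
    for lo in range(lower, upper, interval):
        hi = lo + interval
        freq = sum(1 for v in values if lo <= v < hi)  # exclusive method: upper limit not included
        table.append((lo, hi, freq))
    return table

for lo, hi, f in frequency_table(marks, 50, 70, 5):
    print(f"class {lo}-{hi}  class interval={hi - lo}  class frequency={f}")
```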
Methods of collecting data
Data collection involves gathering information for analysis and research
purposes. There are several methods for collecting data, each suited to
different types of research questions, study designs, and resources. Here are
some common methods of data collection:
Surveys: Surveys involve asking a set of standardized questions to a sample of
individuals or groups. Surveys can be conducted through interviews (in-person,
phone, or video), questionnaires (paper-based or online), or email. Surveys are
useful for collecting self-reported information and opinions.
Observations: This method involves systematically watching and recording
behaviors, events, or phenomena as they naturally occur. Observations can be
participant (the researcher is involved) or non-participant (the researcher is an
observer), and they're commonly used in fields like anthropology, psychology,
and social sciences.
Experiments: Experiments involve manipulating one or more variables and
observing their effects on other variables in a controlled environment.
Experiments are often used to establish cause-and-effect relationships.
Case Studies: Case studies involve an in-depth examination of a specific
individual, group, event, or situation. Researchers gather extensive data to
understand the complexities and nuances of the subject.
Archival Research: Researchers analyze existing records, documents, and data
sources to gather information. This method is particularly useful for historical
research or when access to participants is challenging.
Secondary Data Analysis: Researchers use existing data collected for other
purposes. This method can save time and resources, but it's crucial to ensure
that the data is relevant and reliable.
Content Analysis: This method involves analyzing the content of texts,
documents, media, or other communication sources to extract patterns,
themes, and insights.
Focus Groups: In a focus group, a small group of participants discuss specific
topics or issues in a facilitated discussion. This method is useful for exploring
opinions, attitudes, and perceptions.
Sampling: Sampling involves selecting a subset of a larger population for data
collection. Common sampling methods include random sampling, stratified
sampling, and convenience sampling.
Census: A census involves collecting data from every member of a population. While comprehensive, censuses can be resource-intensive and time-consuming.
The choice of data collection method depends on research objectives, available
resources, the type of data needed, and ethical considerations. Researchers
should carefully select and tailor their data collection methods to ensure the
accuracy, validity, and relevance of the collected information.
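Two of the sampling methods listed above, simple random sampling and stratified sampling, can be sketched with Python's `random` module. The population of 20 students and its split into "first" and "second" year groups are hypothetical, and the stratified version assumes proportional allocation (each stratum contributes the same fraction of its members), which is one common choice among several.

```python
import random

# Hypothetical population: 20 students, 12 in the first year and 8 in the second year
population = [{"id": i, "year": "first" if i < 12 else "second"} for i in range(20)]

def simple_random_sample(units, n, seed=None):
    """Every unit has the same chance of selection; no unit is picked twice."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def stratified_sample(units, key, fraction, seed=None):
    """Split the population into strata by `key`, then draw a simple random
    sample of the same fraction from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(u[key], []).append(u)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

print([u["id"] for u in simple_random_sample(population, 5, seed=42)])
print(sorted(u["id"] for u in stratified_sample(population, "year", 0.25, seed=42)))
```

Stratifying guarantees that each year group is represented in proportion to its size, whereas a simple random sample of the same size could, by chance, over- or under-represent one group.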
Frequency Distribution Graphs
Frequency distribution graphs visually represent the distribution of data values
in a dataset. These graphs help to understand the patterns, central tendency,
and variability of the data. Here are some common types of graphs used for
displaying frequency distributions:
Histogram: A histogram is a bar graph that represents the frequency of data
values within specific intervals (bins) on the x-axis. The height of each bar
corresponds to the frequency of values within the interval. Histograms are
particularly useful for visualizing the distribution's shape and identifying
potential outliers.
Frequency Polygon: A frequency polygon is created by joining the midpoints of the tops of the bars in a histogram (that is, each class frequency plotted against its class midpoint) with straight lines. It provides a smoother representation of the distribution and is useful for comparing multiple distributions.
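The points of a frequency polygon are the class midpoints paired with the class frequencies. This sketch uses hypothetical grouped data with equal class widths, and it follows the common convention (an assumption, not stated above) of closing the polygon on the x-axis at the midpoints of the empty classes just before the first class and just after the last one.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency), equal class widths
classes = [(50, 55, 4), (55, 60, 5), (60, 65, 3), (65, 70, 3)]

def polygon_points(table):
    """Return (midpoint, frequency) points, closed at zero on both ends."""
    width = table[0][1] - table[0][0]  # assumes every class has this width
    points = [((lo + hi) / 2, f) for lo, hi, f in table]
    first_mid, last_mid = points[0][0], points[-1][0]
    # add zero-frequency points one class width beyond each end
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

for x, y in polygon_points(classes):
    print(x, y)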
Stem-and-Leaf Plot: A stem-and-leaf plot is a graphical representation that displays individual data values. It splits each data value into a stem (the leading digit or digits) and a leaf (usually the last digit) to create a clear visualization of the data's distribution.
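A stem-and-leaf plot for two-digit data can be built by splitting each value into its tens digit (stem) and units digit (leaf). This minimal sketch assumes non-negative integers and made-up scores; unlike a hand-drawn plot, it omits stems that have no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group the last digit (leaf) of each non-negative integer under the
    remaining leading digits (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):       # sorting keeps leaves in ascending order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores
scores = [42, 47, 51, 53, 53, 58, 60, 64, 69, 71]

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the plot keeps every original value recoverable (stem 5 with leaf 3 is the value 53).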
Bar Graph: A bar graph displays the frequency or count of distinct categories or
discrete data points. It uses bars of equal width to represent each category,
and the height of the bars represents the frequency of each category.
Pie Chart: While more commonly used for displaying proportions, a pie chart
can also be used to show the distribution of categorical data. Each category is
represented as a slice of the pie, with the slice's size corresponding to the
category's relative frequency.
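The size of each slice follows from the relative frequency: a category's angle is its relative frequency multiplied by 360 degrees. A minimal sketch with hypothetical counts:

```python
# Hypothetical counts of students by mode of transport
counts = {"Bus": 18, "Walk": 12, "Cycle": 6, "Car": 4}

def pie_angles(counts):
    """Angle of each slice = (count / total) * 360 degrees."""
    total = sum(counts.values())
    return {category: c * 360 / total for category, c in counts.items()}

total = sum(counts.values())
for category, angle in pie_angles(counts).items():
    print(f"{category}: relative frequency {counts[category] / total:.2f}, angle {angle:.0f} degrees")
```

Because the relative frequencies add up to 1, the angles always add up to 360 degrees.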
Box Plot (Box-and-Whisker Plot): A box plot visually summarises the
distribution's central tendency, variability, and potential outliers. It displays the
median, quartiles, and potential outliers using a box and whiskers.
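The numbers a box plot draws can be computed directly. This sketch makes two assumptions not stated in the notes: quartiles are computed with Python's `statistics.quantiles(..., method="inclusive")` (textbooks use several slightly different quartile rules), and potential outliers are flagged with the common 1.5 × IQR rule, where IQR = Q3 - Q1. The data is hypothetical.

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Quartiles, whisker ends and outliers using the k * IQR fence rule."""
    values = sorted(data)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                                   # interquartile range (length of the box)
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": inside[0],    # whiskers reach the most extreme non-outlier values
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

print(box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 10, 30]))
```

Here the value 30 lies well above the upper fence, so it would be drawn as a separate point beyond the whisker.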
Dot Plot: A dot plot places a dot for each data point above its corresponding
value on the x-axis. This creates a clear visual representation of the data's
distribution, highlighting clusters and gaps.
Ogive (Cumulative Frequency Polygon): An ogive is a line graph that represents
the cumulative frequency of data values. It helps visualise how the data
accumulates over the range of values.
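The points of a "less than" ogive pair each upper class limit with the cumulative frequency up to that limit. A minimal sketch with hypothetical grouped data, using `itertools.accumulate` for the running total; in practice the curve is often also started at the lower limit of the first class with a cumulative frequency of 0.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit, frequency)
classes = [(50, 55, 4), (55, 60, 5), (60, 65, 3), (65, 70, 3)]

def less_than_ogive(table):
    """Pair each upper class limit with the number of observations below it."""
    cumulative = accumulate(f for _, _, f in table)  # running total of frequencies
    return [(hi, cf) for (_, hi, _), cf in zip(table, cumulative)]

for limit, cf in less_than_ogive(classes):
    print(f"less than {limit}: {cf}")
```

The last cumulative frequency always equals the total number of observations.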
Pareto Chart: A Pareto chart is a bar graph that displays the frequency of
different categories in descending order. It often highlights the most significant
factors contributing to a problem.
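The table behind a Pareto chart sorts categories by frequency in descending order; a cumulative percentage column, often drawn as a line over the bars, is a common addition and is assumed here. The defect counts are hypothetical.

```python
def pareto_table(counts):
    """Rows of (category, count, cumulative %), largest category first."""
    total = sum(counts.values())
    running = 0
    rows = []
    for category, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical counts of product defects by type
defects = {"Scratch": 12, "Dent": 30, "Misaligned": 8, "Crack": 50}

for category, count, cum_pct in pareto_table(defects):
    print(f"{category:<11}{count:>4}{cum_pct:>7}%")
```

In this made-up example the first two categories already account for 80% of all defects, which is the kind of "most significant factors" a Pareto chart is designed to expose.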
Scatter Plot: While commonly used for showing relationships between two
variables, a scatter plot can also display the distribution of two-dimensional
data points.
The choice of the graph depends on the nature of the data and the specific
insights you want to convey. Selecting a graph that accurately and effectively
represents the frequency distribution while considering the audience's
understanding and the data context is essential.
Measures of Central Tendency (MODULE II)
In statistics, a measure of central tendency is a single value that summarises a dataset by describing the centre of its distribution.
It summarises the dataset as a whole and does not provide information about individual observations. The central tendency of a dataset is usually described using measures such as the mean, median and mode.

Mean
The mean represents the average value of the dataset.
It is calculated as the sum of all the values in the dataset divided by the number of values. Unless stated otherwise, "mean" refers to this arithmetic mean.
Some other measures of mean used to find the central tendency are as
follows:
Geometric Mean (nth root of the product of n numbers)
Harmonic Mean (the reciprocal of the average of the reciprocals)
Weighted Mean (where some values contribute more than others)
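Written as formulas for observations x1, ..., xn (the geometric and harmonic means assume every observation is positive, and the weights w1, ..., wn of the weighted mean are assumed non-negative and not all zero), the means listed above are:

```latex
\begin{aligned}
\text{Arithmetic mean: } & \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Geometric mean: } & G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} \\
\text{Harmonic mean: } & H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \\
\text{Weighted mean: } & \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
\end{aligned}
```

When every weight is equal, the weighted mean reduces to the arithmetic mean.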
If all the values in the dataset are the same, the arithmetic, geometric and harmonic means are all equal. If the values vary (and are all positive), the three means differ, with arithmetic mean > geometric mean > harmonic mean.
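The behaviour described above can be checked numerically. This sketch implements each mean directly from its definition; it assumes positive values for the geometric and harmonic means and uses `math.prod` (Python 3.8+). For very large datasets the product can overflow, and a log-based routine such as `statistics.geometric_mean` is more robust. The example values and weights are hypothetical.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # nth root of the product of n numbers; assumes every value is positive
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # reciprocal of the average of the reciprocals; assumes every value is positive
    return len(xs) / sum(1 / x for x in xs)

def weighted_mean(xs, ws):
    # values with larger weights contribute more to the result
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

for data in ([5, 5, 5], [2, 8]):
    am, gm, hm = arithmetic_mean(data), geometric_mean(data), harmonic_mean(data)
    print(data, round(am, 4), round(gm, 4), round(hm, 4))

# Hypothetical marks in three subjects weighted by credit hours 1, 2 and 3
print(round(weighted_mean([60, 70, 80], [1, 2, 3]), 2))
```

For the constant data all three means come out as 5 (up to floating-point rounding), while for [2, 8] they are 5, 4 and 3.2, in decreasing order.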
Characteristics of a Good Average
An ideal average should possess the following properties:
1. It should be rigidly defined.
2. It should be easy to understand and compute.
3. It should be based on all items in the data.
4. It should be capable of further algebraic treatment.
5. It should have sampling stability.
6. It should be capable of being used in further statistical computations or processing.
