STAT Module I Notes

Statistics is the mathematical science of collecting, analyzing, interpreting, presenting, and organizing data, essential for informed decision-making across various fields. It includes primary and secondary data, discrete and continuous data, and types of statistics such as descriptive and inferential. The document also covers data collection methods, frequency distribution, and measures of central tendency, highlighting the importance of statistical tools in understanding complex information.

Uploaded by

shristipuskar

MODULE:1

Definition of statistics
Statistics can be defined as the mathematical science that involves the
collection, analysis, interpretation, presentation, and organisation of data. It
encompasses methods for summarising and drawing meaningful inferences
from data, enabling individuals to make informed decisions, understand
patterns, and uncover insights within various phenomena. Statistics plays a
crucial role in various fields, from scientific research and business analysis to
social studies and policymaking, by providing tools to handle and extract
meaning from complex and often uncertain information.
Characteristics of Statistics
• Statistics should deal with an aggregate of individuals rather than with
individuals alone.
• Statistics should be expressed as numerical figures.
• Statistics should be obtained for predetermined purposes.
• Statistics collected should allow comparison with other data.
What is Data?
Definition: Facts or figures, numerical or otherwise, collected with a definite
purpose are called data.

Primary Data
Primary data is collected for the first time through personal experiences or
evidence, particularly for research.
It is also described as raw data or first-hand information.
Collecting data in this way is costly.
The data is mainly collected through observations, physical testing, mailed
questionnaires, surveys, personal interviews, telephonic interviews, case
studies, focus groups, etc.
Secondary Data
Secondary data is second-hand data already collected and recorded by
some researchers for their purpose and not for the current research
problem.
It is accessible through data collected from different sources such as
government publications, censuses, internal organisation records, books,
journal articles, websites and reports, etc.
This method of gathering data is affordable, readily available, and saves
cost and time.
However, one disadvantage is that the information was assembled for
some other purpose, so it may not meet the present research purpose or may
be inaccurate.
Discrete Vs continuous data
Discrete data (countable) is information that can only take certain values.
These values don’t have to be whole numbers, but they are fixed values,
such as shoe size, number of teeth, number of kids, etc.
Counts are the most common discrete variables: they are numeric, countable,
and take non-negative integer values (5, 10, 15, and so on).
Continuous data (measurable) is data that can take any value within a range.
Height, weight, temperature and length are all examples of continuous data.
A continuous quantity can also change over time and take different values at
different time intervals, like the weight of a person.
Types of statistics
Descriptive Statistics: This involves summarising and presenting data in a
meaningful way. Measures such as mean (average), median (middle value),
mode (most common value), and measures of dispersion like range and
standard deviation fall under this category.
Inferential Statistics: This involves drawing conclusions about a population based on a sample
of data. It includes techniques such as hypothesis testing, confidence intervals,
and regression analysis.
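The descriptive measures named above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the `marks` list is made-up data for illustration and does not come from these notes.

```python
import statistics

# Hypothetical sample: marks of nine students (made up for illustration).
marks = [42, 55, 55, 61, 64, 70, 70, 70, 89]

summary = {
    "mean": statistics.mean(marks),        # sum of values / number of values
    "median": statistics.median(marks),    # middle value of the sorted data
    "mode": statistics.mode(marks),        # most common value
    "range": max(marks) - min(marks),      # largest value minus smallest value
    "sample_sd": statistics.stdev(marks),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would be the choice if the data were the whole population.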
SCOPE OF STATISTICS :
The scope of statistics is broad and encompasses various aspects related to
data collection, analysis, interpretation, and utilisation. Here are some key
components within the scope of statistics:
Biostatistics: Applies statistical methods to biological and medical data, aiding
in clinical trials, epidemiological studies, and medical research.
Econometrics: Applies statistical techniques to economic data, helping
economists analyse economic relationships and forecast trends.
Social Sciences: Utilizes statistics to study social phenomena, including
demographics, public opinion, and behaviour patterns.
Business and Marketing: Uses statistics to analyse market trends, consumer
preferences, and business performance metrics.
Environmental Statistics: Applies statistical methods to analyse environmental
data, such as pollution levels, climate patterns, and ecological changes.
Quality Control: In industries, statistics is used to monitor and enhance
product and process quality through methods like Six Sigma.
Psychometrics: Applies statistical methods to measure psychological traits and
abilities, commonly used in educational and psychological testing.
Statistical Ethics: Addresses ethical considerations such as data privacy,
avoiding bias, and accurately presenting results.
Educational Statistics: Analyzes educational data to evaluate educational
systems and student performance, and to inform policy decisions.
The scope of statistics continues to evolve with advancements in data science,
machine learning, and technology. It is crucial in providing insights, supporting
decision-making, and advancing knowledge across various disciplines.
Data Presentation
There are two types of statistical presentation of data: graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking
deviations from that pattern. The shape, centre, and spread of the data
usually describe the overall pattern. An individual value that falls outside
the overall pattern is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots and box plots are used for numerical variables.
Histogram
A histogram is a graphical display of data using bars of different heights. In
a histogram, each bar groups numbers into ranges. Taller bars show that
more data falls in that range. A histogram displays the shape and spread of
continuous sample data.

Classification of Data
There are four types of classification. They are:
Geographical classification
When data are classified based on location or area, it is called geographical
classification
Chronological classification
Chronological classification means classification based on time, like months,
years etc.
Qualitative classification
In Qualitative classification, data are classified based on some attributes or
quality such as gender, colour of hair, literacy and religion. In this type of
classification, the attribute under study cannot be measured. It can only be
found whether it is present or absent in the units of study.
Quantitative classification
Quantitative classification refers to the classification of data according to
some characteristics which can be measured, such as height, weight,
income, profits etc.
There are two types of quantitative classification: discrete frequency
distribution and continuous frequency distribution.
In this type of classification, there are two elements: variable and
frequency.
Variable
Variable refers to the characteristic that varies in magnitude or quantity.
E.g. weight of the students. A variable may be discrete or continuous.
Frequency
Frequency refers to the number of times each value of the variable occurs. For
example, if there are 50 students weighing 60 kg, then 50 is the frequency
of the value 60 kg.
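As a small illustration of a discrete frequency distribution, the sketch below counts how often each value of a variable occurs. The `weights_kg` list is made-up data, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical weights (kg) of ten students, made up for illustration.
weights_kg = [58, 60, 60, 62, 60, 58, 65, 60, 62, 60]

# Counter maps each distinct value of the variable to its frequency.
frequency = Counter(weights_kg)

for value in sorted(frequency):
    print(f"{value} kg: {frequency[value]}")
```

Here the variable is the weight and each count is the frequency of one value; the frequencies add up to the number of observations.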
Frequency distribution
The following technical terms are important when a continuous frequency
distribution is formed.
Class limits: Class limits are the lowest and highest values that can be
included in a class. For example, take the class 51-55. The lowest value of
the class is 51 and the highest value is 55. In this class there can be no value
less than 51 or more than 55. 51 is the lower class limit and 55 is the
upper class limit.
Class interval: The difference between the upper and lower limit of a class
is known as the class interval of that class. For the class 51-55, it is
55 - 51 = 4.
Class frequency: The number of observations corresponding to a particular
class is known as the frequency of that class.
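The three terms can be seen together in a short sketch that builds a continuous frequency distribution. The observations and the classes after 51-55 are assumptions made for illustration, and `class_frequencies` is a hypothetical helper name, not something defined in these notes.

```python
# Hypothetical observations (whole numbers), made up for illustration.
observations = [51, 53, 55, 56, 58, 58, 60, 61, 63, 64, 64, 65, 67, 70]

# Classes given as (lower class limit, upper class limit), both inclusive,
# following the 51-55 example in the text.
classes = [(51, 55), (56, 60), (61, 65), (66, 70)]


def class_frequencies(values, class_limits):
    """Return (lower, upper, class interval, class frequency) for each class."""
    table = []
    for lower, upper in class_limits:
        # Class frequency: number of observations within the class limits.
        frequency = sum(1 for v in values if lower <= v <= upper)
        # Class interval: upper limit minus lower limit, as defined above.
        table.append((lower, upper, upper - lower, frequency))
    return table


table = class_frequencies(observations, classes)
for lower, upper, interval, frequency in table:
    print(f"{lower}-{upper}  interval={interval}  frequency={frequency}")
```

Inclusive limits work here because the sample values are whole numbers; data with decimals would need class boundaries that leave no gaps between classes.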
Methods of collecting data
Data collection involves gathering information for analysis and research
purposes. There are several methods for collecting data, each suited to
different types of research questions, study designs, and resources. Here are
some common methods of data collection:
Surveys: Surveys involve asking a set of standardized questions to a sample of
individuals or groups. Surveys can be conducted through interviews (in-person,
phone, or video), questionnaires (paper-based or online), or email. Surveys are
useful for collecting self-reported information and opinions.
Observations: This method involves systematically watching and recording
behaviors, events, or phenomena as they naturally occur. Observations can be
participant (the researcher is involved) or non-participant (the researcher is an
observer), and they're commonly used in fields like anthropology, psychology,
and social sciences.
Experiments: Experiments involve manipulating one or more variables and
observing their effects on other variables in a controlled environment.
Experiments are often used to establish cause-and-effect relationships.
Case Studies: Case studies involve an in-depth examination of a specific
individual, group, event, or situation. Researchers gather extensive data to
understand the complexities and nuances of the subject.
Archival Research: Researchers analyze existing records, documents, and data
sources to gather information. This method is particularly useful for historical
research or when access to participants is challenging.
Secondary Data Analysis: Researchers use existing data collected for other
purposes. This method can save time and resources, but it's crucial to ensure
that the data is relevant and reliable.
Content Analysis: This method involves analyzing the content of texts,
documents, media, or other communication sources to extract patterns,
themes, and insights.
Focus Groups: In a focus group, a small group of participants discuss specific
topics or issues in a facilitated discussion. This method is useful for exploring
opinions, attitudes, and perceptions.
Sampling: Sampling involves selecting a subset of a larger population for data
collection. Common sampling methods include random sampling, stratified
sampling, and convenience sampling.
Census: A census involves collecting data from every member of a population.
While comprehensive, censuses can be resource-intensive and time-
consuming.
The choice of data collection method depends on research objectives, available
resources, the type of data needed, and ethical considerations. Researchers
should carefully select and tailor their data collection methods to ensure the
accuracy, validity, and relevance of the collected information.
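The three sampling methods mentioned in the list above can be sketched in a few lines. Everything here is illustrative: the population, the `year` stratum, the 25% sampling fraction, and the function names are assumptions, and taking the first few units is only a stand-in for "whoever is easiest to reach".

```python
import random

# Hypothetical population of 20 students tagged with a stratum (year of study).
population = [{"id": i, "year": 1 if i < 12 else 2} for i in range(20)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable


def simple_random_sample(units, n):
    """Random sampling: every unit has the same chance of selection."""
    return rng.sample(units, n)


def stratified_sample(units, key, fraction):
    """Stratified sampling: draw the same fraction at random from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen


def convenience_sample(units, n):
    """Convenience sampling: take the units that are easiest to reach."""
    return units[:n]


srs = simple_random_sample(population, 5)
strat = stratified_sample(population, "year", 0.25)
conv = convenience_sample(population, 5)
print(len(srs), len(strat), len(conv))
```

With 12 first-year and 8 second-year students, the stratified draw takes 3 and 2 respectively, so the sample keeps the population's proportions, which a simple random or convenience sample does not guarantee.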
Frequency Distribution Graphs
Frequency distribution graphs visually represent the distribution of data values
in a dataset. These graphs help to understand the patterns, central tendency,
and variability of the data. Here are some common types of graphs used for
displaying frequency distributions:
Histogram: A histogram is a bar graph that represents the frequency of data
values within specific intervals (bins) on the x-axis. The height of each bar
corresponds to the frequency of values within the interval. Histograms are
particularly useful for visualizing the distribution's shape and identifying
potential outliers.
Frequency Polygon: A frequency polygon is created by connecting the
midpoints of the tops of the bars in a histogram with straight lines. It
provides a smoother representation of the distribution and is useful for
comparing multiple distributions.
Stem-and-Leaf Plot: A stem-and-leaf plot is a graphical representation that
displays individual data values. It divides each data value into a stem (the
leading digit or digits) and a leaf (the last digit) to create a clear
visualization of the data's distribution. For example, 47 has stem 4 and leaf 7.
Bar Graph: A bar graph displays the frequency or count of distinct categories or
discrete data points. It uses bars of equal width to represent each category,
and the height of the bars represents the frequency of each category.
Pie Chart: While more commonly used for displaying proportions, a pie chart
can also be used to show the distribution of categorical data. Each category is
represented as a slice of the pie, with the slice's size corresponding to the
category's relative frequency.
Box Plot (Box-and-Whisker Plot): A box plot visually summarises the
distribution's central tendency, variability, and potential outliers. It displays the
median, quartiles, and potential outliers using a box and whiskers.
Dot Plot: A dot plot places a dot for each data point above its corresponding
value on the x-axis. This creates a clear visual representation of the data's
distribution, highlighting clusters and gaps.
Ogive (Cumulative Frequency Polygon): An ogive is a line graph that represents
the cumulative frequency of data values. It helps visualise how the data
accumulates over the range of values.
Pareto Chart: A Pareto chart is a bar graph that displays the frequency of
different categories in descending order. It often highlights the most significant
factors contributing to a problem.
Scatter Plot: While commonly used for showing relationships between two
variables, a scatter plot can also display the distribution of two-dimensional
data points.
The choice of the graph depends on the nature of the data and the specific
insights you want to convey. Selecting a graph that accurately and effectively
represents the frequency distribution while considering the audience's
understanding and the data context is essential.
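Two of the graphs above, the frequency polygon and the ogive, are drawn from quantities that are easy to compute from a grouped frequency table. The sketch below derives those quantities only and does no plotting; the grouped data is made up for illustration.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit, frequency).
grouped = [(51, 55, 3), (56, 60, 4), (61, 65, 5), (66, 70, 2)]

# Frequency polygon: each class frequency is plotted against the class midpoint.
midpoints = [(lower + upper) / 2 for lower, upper, _ in grouped]
frequencies = [f for _, _, f in grouped]

# Ogive: the running total of frequencies is plotted across the classes.
cumulative = list(accumulate(frequencies))

for (lower, upper, f), mid, cum in zip(grouped, midpoints, cumulative):
    print(f"{lower}-{upper}  midpoint={mid}  frequency={f}  cumulative={cum}")
```

The last cumulative value always equals the total number of observations, which is a quick check that the table has been built correctly.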
Measures of Central Tendency (MODULE II)
In statistics, central tendency is a descriptive summary of a dataset: a
single value that reflects the centre of the data distribution.
It does not provide information about the individual values in the dataset;
rather, it summarises the dataset as a whole. The central tendency of a
dataset can be described using several statistical measures.

Mean
The mean represents the average value of the dataset.
It is calculated as the sum of all the values in the dataset divided by
the number of values. Unless stated otherwise, "mean" refers to this
arithmetic mean.
Some other types of mean used to describe the central tendency are as
follows:
Geometric Mean (nth root of the product of n numbers)
Harmonic Mean (the reciprocal of the average of the reciprocals)
Weighted Mean (where some values contribute more than others)
If all the values in the dataset are the same, then the arithmetic,
geometric and harmonic means are all equal to that value. If the values are
positive and vary, the three means differ: the arithmetic mean is the
largest and the harmonic mean is the smallest.
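The different means can be compared on a small dataset. This is a minimal sketch using Python's standard `statistics` module; the values and weights are made up for illustration, and the weighted mean is computed by hand as the sum of weight times value divided by the sum of the weights.

```python
import statistics

values = [2, 4, 8]  # hypothetical positive data

arithmetic = statistics.mean(values)            # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(values)   # cube root of 2 * 4 * 8
harmonic = statistics.harmonic_mean(values)     # 3 / (1/2 + 1/4 + 1/8)

# Weighted mean: some values contribute more than others.
weights = [1, 1, 2]  # hypothetical weights; the last value counts twice
weighted = sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(arithmetic, geometric, harmonic, weighted)

# When every value is the same, the three means coincide.
same = [5, 5, 5]
print(statistics.mean(same),
      statistics.geometric_mean(same),
      statistics.harmonic_mean(same))
```

For the varying data the arithmetic mean is the largest of the three and the harmonic mean the smallest, while the weighted mean is pulled towards the value with the largest weight.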
Characteristics of a Good Average
An ideal average should possess the following properties:
1. It should be rigidly defined.
2. It should be easy to understand and compute.
3. It should be based on all items in the data.
4. It should be capable of further algebraic treatment.
5. It should have sampling stability.
6. It should be capable of being used in further statistical computations
or processing.
