
1. Introduction SMD


1

INTRODUCTION TO
STATISTICS
Dr. Bharath V, MFM, M.Com., Ph.D.
Assistant Professor
Kristu Jayanti College
Bengaluru

2

Course Content
• Unit 1: Introduction to Statistics
• Definition, divisions of statistics, importance, functions, scope, limitations
of statistics; Collection and Classification of data; create and interpret
diagrams and graphs; Construction of frequency distribution table.
• Unit 2: Univariate Data Analysis I – Measures of Central Tendency
• Introduction: measures of central tendency: simple arithmetic mean using
short cut method and step deviation method, Incorrect values, missing
values & missing frequencies problems, combined mean, weighted
arithmetic mean; median: discrete and continuous series problems,
missing frequencies, quartiles; mode: grouping and analysis table
method, interpolation formula.
• Unit 3: Univariate Data Analysis II – Measures of Dispersion
• Introduction: measures of dispersion: range, inter-quartile range, quartile
deviation, standard deviation problems using assumed mean and step
deviation method, coefficient of variation; sampling techniques and their
types.
3

• Unit 4: Bivariate Data Analysis
• Meaning: correlation, Karl Pearson’s coefficient of
correlation, Spearman’s rank correlation; regression
analysis.
• Unit 5: Time Series
• Meaning and components: measurement of trend values
using moving average and least squares method.
• Unit 6: Index Numbers
• Classification: construction of index numbers, methods of
constructing index numbers, simple aggregate method,
simple average of price relatives method; weighted index
method: Laspeyres’ method, Paasche’s method, Fisher’s
ideal method, consumer price index number.
4

References
• Gupta, S.P. (2006). Statistical Methods. New Delhi: Himalaya Publishing
House.
• Sathyaprasad, B.G. & Chikkodi. (2013). Quantitative Methods for Business
– II. New Delhi: Himalaya Publishing House.
• Rajaghatta, Rajesh S. & Gangadharappa, N.H. (2014). Quantitative
Methods for Business – II. Kalyani Publishers.
• Aggarwal, S.L. & Bhardwaj, S.L. (2011). Business Statistics. Kalyani
Publishers.
• Sharma, J.K. (2015). Fundamentals of Business Statistics. Vikas Publishing
House Pvt. Ltd.
• Srivatsava, T.N. & Rego, Shailaja. (2008). Statistics for Management. Tata
McGraw Hill.
5

Introduction
• The word ‘Statistics’ is derived from the Latin word
‘status’, the Italian word ‘statista’, or the German word ‘Statistik’,
each of which relates to the "state" or "politics." Historically, it referred to
the collection of data about the state or government.
• Statistics is a tool in the hands of mankind to translate
complex facts into simple and understandable statements
of facts.
• As a Discipline: Statistics is the science of collecting,
analyzing, interpreting, and presenting data for informed
decision-making.
• As Numerical Data: Statistics refers to quantitative facts
or figures, such as population counts, sales figures, or
survey results.
6

Meaning and Definition of Statistics

• "Statistics is the study of the principles and methods for
the reduction of data." - Sir Ronald A. Fisher

• "Statistics may be defined as the science of collecting,
organizing, presenting, analyzing, and interpreting
numerical data for the purpose of making more effective
decisions." - Croxton and Cowden

• "Statistics is the science that deals with the collection,
classification, and interpretation of numerical facts or
data." - Seligman
7

Key Elements in the Definition


• Collection of Data: Gathering raw data from reliable sources
through surveys, experiments, or observation.
• Organization of Data: Structuring raw data into manageable
formats like tables, charts, or graphs.
• Analysis: Applying statistical techniques to identify patterns,
trends, and relationships in data.
• Interpretation: Drawing meaningful conclusions from the
data analysis.
• Presentation: Communicating findings effectively using
visual aids and summary statistics.
8

Two Aspects of Statistics


• Descriptive Statistics
• Focuses on summarizing and describing data using measures such
as mean, median, mode, and standard deviation.

• Inferential Statistics
• Uses sample data to make generalizations or predictions about a
larger population through hypothesis testing and confidence
intervals.
9

Characteristics of Statistics
• Quantitative Nature: Statistics primarily deals with
numerical data.

• Systematic Approach: Involves structured methods for
data handling and interpretation.

• Affected by Multiplicity of Causes: Statistical data is
influenced by several factors, making it essential to
analyze in context.

• Presents General Trends: Statistics identifies patterns or
averages rather than individual details.
10

Scope of Statistics Across Disciplines

• Business and Economics:


• Market research and consumer behavior analysis.
• Forecasting demand, supply, and pricing.
• Financial risk analysis and stock market predictions.

• Science and Research:


• Designing experiments and analyzing results.
• Validation of scientific hypotheses.
• Studying trends in environmental data, genetics, and physics.
11

Medicine and Healthcare


• Clinical trials to evaluate the effectiveness of drugs and
treatments.
• Healthcare quality improvement using statistical tools.

Social Sciences
• Studying population demographics and social behavior.
• Conducting surveys and public opinion polls.
• Measuring economic inequalities, literacy rates, and
employment statistics.
12

Education
• Analyzing student performance and evaluating educational
policies.
• Developing standardized testing metrics.

Government and Public Policy


• Census data analysis for planning and resource allocation.
• Crime rate statistics for law enforcement strategies.
• Policy evaluation and impact assessment.

Engineering and Quality Control


• Monitoring and improving product quality.
• Reliability testing of machines and equipment.
• Statistical process control in manufacturing.
13

Functions of Statistics
Collection of Data
• Statistics provides methods for collecting reliable and relevant
data.
• Ensures systematic and organized approaches like surveys,
experiments, and observational studies.
Organization of Data
• Helps arrange raw data into meaningful formats, such as tables,
charts, and graphs, for easy understanding and analysis.
Summarization of Data
Simplifies complex datasets using descriptive measures:
• Central Tendency: Mean, median, mode.
• Dispersion: Range, variance, standard deviation.
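The descriptive measures listed above can be sketched in a few lines of Python. This is only an illustration: the marks are hypothetical data, not taken from the slides, and the variance and standard deviation use the population formulas (dividing by N), which is an assumption about the convention intended here.

```python
import statistics

# Hypothetical marks of ten students (illustrative data, not from the slides).
marks = [42, 55, 55, 61, 48, 70, 55, 66, 48, 60]

# Measures of central tendency.
mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)

# Measures of dispersion (population formulas, i.e. dividing by N).
data_range = max(marks) - min(marks)
variance = statistics.pvariance(marks)
std_dev = statistics.pstdev(marks)

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Range:", data_range, "Variance:", variance, "Std. deviation:", round(std_dev, 2))
```

If the data were a sample rather than a whole population, `statistics.variance` and `statistics.stdev` (dividing by N - 1) would be the usual choice.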
14

Data Analysis
• Identifies patterns, trends, and relationships within datasets.
• Facilitates hypothesis testing and decision-making
processes.

Interpretation of Results
• Draws meaningful conclusions from analyzed data.
• Provides insights into the underlying trends and variability.

Prediction and Forecasting


• Uses historical data to predict future trends and outcomes.
• Essential in areas like economics, weather forecasting, and
business planning.
15

Decision-Making
• Offers quantitative evidence to support strategic and
operational decisions in various fields, including business,
government, and healthcare.

Quality Control
• Assists in maintaining and improving the quality of products
and services.
• Applies statistical methods like control charts in
manufacturing and service industries.

Measuring Uncertainty
• Provides tools to quantify and manage uncertainty using
probability and confidence intervals.
• Enables risk assessment and mitigation.
16

Comparison and Ranking


• Enables comparison of different groups, products, or
phenomena using statistical measures.
• Provides a basis for ranking and benchmarking.

Designing Experiments
• Offers methods for creating experiments to test hypotheses
effectively.
• Ensures proper sampling and control of variables to reduce
bias.

Understanding Relationships
• Analyzes correlations and causations between variables.
• Supports the study of how one factor influences another,
such as in regression analysis.
17

Importance of Statistics
Informed Decision-Making
• Statistics provides quantitative evidence for making informed decisions
in business, government, healthcare, and personal life.
• Example: Businesses use sales data to determine product pricing and
marketing strategies.
Understanding Data
• Helps summarize, simplify, and interpret complex datasets into
meaningful information.
• Example: A survey with thousands of responses can be condensed into
averages, percentages, and visual charts for better understanding.

Prediction and Forecasting


• Uses historical data to predict future trends, aiding in strategic planning.
• Example: Weather forecasting, stock market trends, and demand
planning in supply chains.
18

• Supports Research
• Forms the backbone of scientific research by validating
hypotheses and analyzing experimental results.
• Example: In medicine, statistics is used to determine the
efficacy of a new drug in clinical trials.

• Understanding Relationships
• Explores correlations and causations between variables to
guide actions.
• Example: In marketing, understanding how advertising spend
affects sales.

• Evaluation and Monitoring


• Assists in tracking progress and evaluating the effectiveness of
initiatives or programs.
• Example: Monitoring vaccination rates to assess the success of
public health campaigns.
19

Limitations of Statistics
• Statistics does not deal with individual measurements.

• Statistics deals only with quantitative characteristics.

• Statistical results are true only on average.

• Statistics can be misused.

• Statistical relations do not necessarily bring out the
‘cause and effect’ relationship between phenomena.
20

Data Collection
• Data collection involves gathering accurate and relevant
information to achieve specific objectives.

• Types of Data
• Primary Data: Data collected firsthand for a specific
purpose.
• Example: A researcher conducting a survey to understand
customer satisfaction in a supermarket.
• Secondary Data: Data already collected by someone
else, used for analysis.
• Example: Using census data to analyze population growth trends.
21

Methods of Data Collection


• Observation: Observing phenomena, events, or behavior
without interference.
• Example: Counting the number of cars passing through a toll plaza
during peak hours.

• Surveys/Questionnaires: Collecting data by distributing
structured questions.
• Example: A company emailing a feedback form to its customers
after a purchase.

• Interviews: Conducting direct discussions to gather
qualitative data.
• Example: A journalist interviewing residents to understand the
impact of a new law.
22

• Experiments
• Conducting controlled experiments to gather data.
• Example: Testing a new fertilizer on crops and recording the
growth rate.

• Existing Sources
• Utilizing previously published reports, articles, or
databases.
• Example: Using weather data from a government website for
climate change studies.
23

Advantages of Primary Data


• Specific to the Research Objective:
• Data is tailored to meet the specific needs of the study.
• Example: A survey designed to study customer preferences.
• Up-to-Date and Relevant:
• Primary data is collected in real-time, ensuring its relevance and
currency.
• Greater Control:
• The researcher has full control over the data collection process,
including methods, timing, and scope.
• Reliable and Accurate:
• Data is directly collected, reducing the likelihood of errors or biases
present in second-hand sources.
• Originality:
• Data is firsthand, making it unique and not previously used in other
studies.
24

Disadvantages of Primary Data


• Time-Consuming:
• Collecting primary data requires significant time for planning,
execution, and analysis.
• Example: Conducting a nationwide survey.
• Costly:
• It often involves expenses for surveys, travel, staff, and equipment.
• Limited Scope:
• Since it's collected for a specific purpose, it may not be applicable to
other research areas.
• Risk of Bias:
• Errors in data collection methods, interviewer bias, or poorly designed
questionnaires can lead to unreliable results.
• Accessibility Challenges:
• Some sources of primary data, like sensitive information, may be
difficult to obtain.
25

Advantages of Secondary Data


• Time-Saving:
• Secondary data is readily available, saving time on data collection.
• Example: Using government census data for population statistics.
• Cost-Effective:
• No additional resources are required for data collection since it’s
already available.
• Extensive Coverage:
• Secondary data often covers large populations or long time
periods, making it useful for macro-level analysis.
• Convenient for Comparison:
• Facilitates historical analysis and comparisons across studies or
regions.
26

Disadvantages of Secondary Data


• Not Specific to Research Objectives:
• The data may not perfectly align with the research questions or
objectives.
• Possibly Outdated:
• The data may no longer be relevant, especially in rapidly changing
fields.
• Example: Market reports from several years ago.
• Reliability Concerns:
• The credibility of the data depends on the source, and errors or biases
in the original study may persist.
• Inflexibility:
• The researcher cannot modify or expand the data to suit specific
needs.
• Lack of Control Over Quality:
• Errors or limitations in the original data collection process may reduce
its usefulness.
27

Comparison Table
28

Data Classification
• Basis of Classification

• Chronological Classification:
• Grouping data based on time periods.
• Example: Monthly sales data of a store for the year.

• Geographical Classification:
• Grouping data based on location.
• Example: Population distribution by state or region.
29

• Qualitative Classification:
• Grouping data by non-numeric attributes.
• Example: Categorizing employees by job roles (e.g., manager,
developer).

• Quantitative Classification:
• Grouping data by numeric values.
• Example: Classifying families by income brackets.
30

Quantitative Data
• Quantitative data refers to information that can be
measured or expressed numerically. It is often used in
mathematical and statistical analysis.

• Numerical in nature.
• Can be discrete or continuous.
• Suitable for mathematical computations and comparisons.
31

Types of Quantitative Data


• Discrete Data
• Represents countable quantities.
• Values are distinct and separate (integers).
• Examples:
• Number of students in a classroom.
• Total cars in a parking lot.
• Number of goals scored in a match.

• Continuous Data
• Represents measurable quantities.
• Can take any value within a range (includes fractions/decimals).
• Examples:
• Height of individuals (e.g., 5.8 feet).
• Temperature readings (e.g., 23.5°C).
• Time taken to complete a task (e.g., 2.45 hours).
32

Qualitative Data
• Qualitative data refers to non-numerical information that
describes qualities, characteristics, or categories. It is
often used in descriptive analysis.

• Characteristics:

• Descriptive in nature.

• Cannot be directly measured numerically.

• Focuses on categories or attributes.


33

Types of Qualitative Data


Nominal Data
• Represents categories without any inherent order or ranking.
• Used for labeling or identifying differences.
• Examples:
• Gender (male, female).
• Eye color (blue, green, brown).
• Types of vehicles (car, truck, motorcycle).

Ordinal Data
• Represents categories with a specific order or ranking.
• Differences between categories may not be measurable or equal.
• Examples:
• Customer satisfaction levels (very satisfied, satisfied, neutral, dissatisfied,
very dissatisfied).
• Educational qualifications (high school, undergraduate, graduate).
• Star ratings for a product (1 star, 2 stars, etc.).
34

Comparison Between Quantitative and Qualitative Data
35

Types of Classification (variable)


• Univariate Classification: Analysis involving a single variable.
• Example: Heights of students in a class.

• Bivariate Classification: Analysis involving two variables.
• Example: Examining the relationship between hours studied and
exam scores.

• Multivariate Classification: Analysis involving more than two
variables.
• Example: Analyzing the impact of age, income, and education on
purchasing behavior.

• Cross-tabulation: Creating a matrix to show relationships
between two or more variables.
• Example: Gender vs. product preference in a marketing survey.
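A cross-tabulation of the kind described in the last bullet can be sketched as follows. The survey responses are hypothetical and exist only to show the mechanics; the names `responses` and `table` are illustrative choices, not part of the course material.

```python
from collections import Counter

# Hypothetical survey responses as (gender, preferred product) pairs.
responses = [
    ("Female", "Product A"), ("Male", "Product B"), ("Female", "Product A"),
    ("Male", "Product A"), ("Female", "Product B"), ("Male", "Product B"),
    ("Female", "Product A"), ("Male", "Product B"),
]

# Count how often each (gender, product) combination occurs.
counts = Counter(responses)
genders = sorted({g for g, _ in responses})
products = sorted({p for _, p in responses})

# Cross-tabulation as a nested dict: table[gender][product] -> count.
table = {g: {p: counts[(g, p)] for p in products} for g in genders}

# Print the matrix with row totals.
print(f"{'Gender':<8}" + "".join(f"{p:>11}" for p in products) + f"{'Total':>8}")
for g in genders:
    row_total = sum(table[g].values())
    print(f"{g:<8}" + "".join(f"{table[g][p]:>11}" for p in products) + f"{row_total:>8}")
```

Each cell shows how many respondents fall into both categories at once, which is what makes the relationship between the two variables visible.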
36

Mode of Presentation of Data


• 1. Textual presentation.

• 2. Tabular presentation or tabulation.

• 3. Diagrammatic presentation.
37

Textual Presentation
• Textual presentation is a method of presenting statistical
data in the form of written or descriptive text.

• It is used when the dataset is small or when the purpose
is to provide a summary, explanation, or interpretation of
the data.

• This method relies on words rather than tables, graphs, or
charts to explain the findings.

• It is ideal for simple data or when emphasizing key points
without overwhelming the reader with numbers or visuals.
38

Example
• Dataset
• A survey was conducted among 100 women entrepreneurs in a city
regarding challenges in business.
• Results: 30% reported financial difficulties.
• 25% faced administrative hurdles.
• 20% cited lack of access to raw materials.
• 15% mentioned legal issues.
• 10% identified market competition as a challenge.
• Textual Presentation:
• "In a recent survey of 100 women entrepreneurs, 30% of respondents
identified financial difficulties as the most significant challenge.
Administrative hurdles were reported by 25% of participants, while
20% faced difficulties in accessing raw materials. Legal issues were
highlighted by 15% of the respondents, and 10% cited market
competition as a barrier to their business success."
39

Tabular Presentation or Tabulation

• Tabular presentation refers to the systematic arrangement
of data in rows and columns within a table.

• It is used to summarize, organize, and present data
clearly and concisely for easier interpretation and
analysis.

• The tabular format allows for quick comparisons,
identification of patterns, and understanding of trends
within the dataset.
40

Format of a Blank table


41

Example
• In 2014, out of a total of 2000 students in a college, 1400
were enrolled for Graduation and the rest for Post-Graduation (PG).
Out of the 1400 graduate students, 100 were girls. However,
in all there were 600 girls in the college.

• In 2019, the number of graduate students increased to 1700,
out of which 250 were girls, but the number of PG
students fell to 500, of which only 50 were boys.

• In 2024, out of 800 girls, 650 were enrolled for graduation, whereas
the total number of graduate students was 2200. The number of
boys and girls in PG classes was equal.

• Represent the above information in tabular form.
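One possible way to work out the missing cells before drawing the table is sketched below. It assumes every student is counted exactly once as a boy or a girl in either Graduation or PG, so each unknown figure follows by subtraction from the stated ones. The column layout is an illustrative choice, not a prescribed format.

```python
rows = []

# 2014: 2000 students in total, 1400 in graduation (100 of them girls),
# and 600 girls in the whole college.
total, grad, grad_girls, girls = 2000, 1400, 100, 600
pg = total - grad                 # PG students
pg_girls = girls - grad_girls     # girls not in graduation must be in PG
rows.append((2014, grad - grad_girls, grad_girls, pg - pg_girls, pg_girls))

# 2019: 1700 in graduation (250 girls); 500 in PG (50 boys).
rows.append((2019, 1700 - 250, 250, 50, 500 - 50))

# 2024: 800 girls in total, 650 of them in graduation; 2200 in graduation;
# PG boys equal PG girls.
pg_girls_2024 = 800 - 650
rows.append((2024, 2200 - 650, 650, pg_girls_2024, pg_girls_2024))

# Print the completed table with a total column.
print(f"{'Year':<6}{'Grad boys':>10}{'Grad girls':>11}{'PG boys':>8}{'PG girls':>9}{'Total':>7}")
for year, grad_b, grad_g, pg_b, pg_g in rows:
    print(f"{year:<6}{grad_b:>10}{grad_g:>11}{pg_b:>8}{pg_g:>9}{grad_b + grad_g + pg_b + pg_g:>7}")
```

Under these assumptions the college totals come to 2000, 2200 and 2500 students for the three years.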


42

Diagrammatic Presentation
• Diagrammatic presentation refers to the use of visual aids
such as charts, graphs, and diagrams to represent data.

• This method enhances understanding by making data
easier to interpret and analyze through a visually
engaging format.

• It is particularly useful for comparing datasets, identifying
trends, and illustrating relationships between variables.
43

Types of Diagrammatic Presentations


• Line Graph:
• Represents data points connected by a line.
• Ideal for showing trends or changes over time.
• Bar Diagram:
• Represents data using rectangular bars of equal width and varying
heights.
• Used to compare different categories or variables.
• Pie Chart:
• A circular diagram divided into sectors representing proportions of
the total dataset.
• Useful for showing percentage distributions.
44

Line graph
• Ex. A test was administered to the students of Class X to
demonstrate the effect of practice on learning. The data
so obtained are given in the following table.

Trial No.  1  2  3  4  5   6   7   8   9   10  11  12
Score      4  5  8  8  10  13  12  12  14  16  16  16

Draw a line graph to represent and interpret the above data.
45

BAR DIAGRAM
• Simple bar diagram
• Multiple or grouped bar diagram
• Subdivided or component bar diagram
• Percentage subdivided bar diagram
46

Simple bar diagram


• Ex: Draw a bar chart of the procurement of rice (in tons) in
an Indian state.

Year            2017  2018  2019  2020  2021  2022  2023
Rice (in tons)  4500  5700  6100  6500  4300  7800  8500
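As a rough illustration of how a simple bar diagram encodes these figures, the sketch below prints a text-based bar for each year. The scale of one mark per 500 tons is an assumption chosen for readability; a plotting library would normally be used for a proper chart.

```python
# Rice procurement (in tons) from the example table.
procurement = {2017: 4500, 2018: 5700, 2019: 6100, 2020: 6500,
               2021: 4300, 2022: 7800, 2023: 8500}

TONS_PER_MARK = 500  # assumed scale: one '#' stands for 500 tons


def bar(value, unit=TONS_PER_MARK):
    """Return a bar whose length is proportional to the value."""
    return "#" * round(value / unit)


# Bars share the same thickness (one text row each) and differ only in length.
for year, tons in procurement.items():
    print(f"{year} | {bar(tons):<17} {tons}")
```

Read from the bars, procurement dips in 2021 and peaks in 2023.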
47

Multiple or grouped bar diagram

• Ex. Represent the following data by a suitable diagram
showing the difference between profits and costs.

• Table: Profits and costs of a firm (in ‘000)

Year  Profits  Cost
2018  22.0     19.5
2019  27.3     21.7
2020  28.2     30.0
2021  30.3     25.6
2022  32.7     26.1
2023  33.3     34.2
2024  36.8     38.6
48

Subdivided or component bar diagram


• Ex: Represent the following data on the development
expenditure of the government during 2021-22, 2022-23 and 2023-24 by
a bar diagram.

Year     Loans and Advance  Capital  Revenue  Total
2021-22  8,601              3,787    3,477    15,865
2022-23  10,335             4,456    4,036    18,827
2023-24  11,549             4,803    3,709    20,061
49

Percentage sub-divided bar diagram


• Ex: The following are the heads of income of Railways during
2023 and 2024. Represent the data by a bar chart.

Heads      2023 (in crores)  2024 (in crores)
Passenger  26                31
Goods      40                39
Others     4.5               3.5
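Before a percentage sub-divided bar can be drawn, each head has to be converted into a percentage of its year's total. A minimal sketch of that arithmetic, using the figures from the table and rounding to two decimals (the rounding is an assumption), might look like this:

```python
# Income by head (in crores) for 2023 and 2024, from the example table.
income = {"Passenger": (26, 31), "Goods": (40, 39), "Others": (4.5, 3.5)}

# Total income for each year.
totals = [sum(values[i] for values in income.values()) for i in range(2)]

# Each head as a percentage of its year's total.
percentages = {
    head: tuple(round(100 * values[i] / totals[i], 2) for i in range(2))
    for head, values in income.items()
}

print(f"{'Heads':<10}{'2023 %':>8}{'2024 %':>8}")
for head, (p2023, p2024) in percentages.items():
    print(f"{head:<10}{p2023:>8}{p2024:>8}")
```

Each bar is then drawn to a height of 100 and divided in these proportions, so the two years can be compared by share rather than by absolute amount.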
50

Frequency Distribution
• Frequency distribution is a tabular or graphical
representation of data that shows the number of times
each value or group of values (class) occurs in a dataset.

• It helps organize raw data into a structured format,
making it easier to analyze and interpret patterns or
trends.
51

Components of a Frequency Distribution Table

Class Intervals:
• Represents a range of data values.
• Example: For scores, intervals might be 0–10, 11–20, etc.

Frequency (f):
• The number of observations within each class interval.

Cumulative Frequency:
• The running total of frequencies as you move through
class intervals.
52

• Midpoint: The midpoint of each class interval, calculated
as (lower class limit + upper class limit) / 2.

• Percentage Frequency
• The proportion of the total frequency for each class,
expressed as a percentage.
53

Types of Frequency Distribution


Discrete Frequency Distribution:
• Used for discrete data where values are distinct and
countable.
• Example: Number of students scoring specific marks in an
exam.

Continuous Frequency Distribution:


• Used for continuous data, grouped into class intervals.
• Example: Heights of individuals grouped into intervals
(150–160 cm, 160–170 cm, etc.).
54

Cumulative Frequency Distribution:


• Shows cumulative totals up to each class interval.
• Can be of the ‘less than’ type or the ‘more than’ type.

Relative Frequency Distribution:


• Represents frequencies as a proportion of the total.
• Example: Instead of absolute numbers, it shows
percentages or fractions.
55

Steps to Construct a Frequency Distribution Table


1. Collect and Organize Data:
• Gather raw data and arrange it in ascending order.
2. Decide Class Intervals (for grouped data):
• Determine the range (difference between the highest and lowest
values).
• Select an appropriate class size and number of intervals.
3. Tally the Data:
• Count the frequency of data points within each class interval.
4. Calculate Optional Metrics:
• Compute cumulative frequencies, midpoints, or percentage
frequencies if needed.
5. Present the Table:
• Display the data in a structured table format.
56
57

Problem
• A survey was conducted to record the ages
of 30 people. The ages (in years) are as
follows:

• 23, 25, 30, 21, 24, 28, 27, 23, 25, 22,
• 29, 30, 26, 31, 28, 24, 22, 21, 29, 23,
• 25, 26, 28, 30, 27, 22, 31, 29, 28, 24.
58

• Construct the following:

• Discrete Frequency Distribution


• Continuous Frequency Distribution
• Cumulative Frequency Distribution
• Relative Frequency Distribution
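One way to approach the four tables is sketched below in Python. The class width of 3, the starting point of 21 and the exclusive upper limits (21–24 holds ages 21, 22 and 23) are assumptions; other reasonable class choices would give a different continuous distribution.

```python
from collections import Counter
from itertools import accumulate

ages = [23, 25, 30, 21, 24, 28, 27, 23, 25, 22,
        29, 30, 26, 31, 28, 24, 22, 21, 29, 23,
        25, 26, 28, 30, 27, 22, 31, 29, 28, 24]

# 1. Discrete frequency distribution: each distinct age and its count.
discrete = dict(sorted(Counter(ages).items()))

# 2. Continuous frequency distribution (assumed width 3, exclusive upper limit).
WIDTH, START = 3, 21
classes = [(lo, lo + WIDTH) for lo in range(START, max(ages) + 1, WIDTH)]
freqs = [sum(lo <= a < hi for a in ages) for lo, hi in classes]

# 3. Cumulative frequency distribution: 'less than' and 'more than' types.
less_than = list(accumulate(freqs))
more_than = [len(ages) - c + f for c, f in zip(less_than, freqs)]

# 4. Relative frequency distribution: each frequency as a share of the total.
relative = [f / len(ages) for f in freqs]

print("Age  f")
for age, f in discrete.items():
    print(f"{age:<4} {f}")

print("\nClass   f   <cf  >cf  rel")
for (lo, hi), f, lt, mt, r in zip(classes, freqs, less_than, more_than, relative):
    print(f"{lo}-{hi}  {f:>2}  {lt:>3}  {mt:>3}  {r:.2f}")
```

The ‘less than’ column accumulates from the first class downward, while the ‘more than’ column counts the observations at or above each lower class limit.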
