SM Session 1 IPL 2024 Post Session Slides

Uploaded by

ipl04rujulak
Copyright: © All Rights Reserved

Introduction to Data and Statistics & Software for Analysis

Session 1

Dr. Aditya Kumar Sahu


Data and Statistics

• Statistics
• Applications in Business and Economics
• Data
• Data Sources
• Descriptive Statistics
• Statistical Inference

2
Why do we need to study Mathematics (Probability) and Statistics?

3
What is Statistics?

• Both probability and statistics are sciences that help us make better decisions in business and economics as well as in other fields.

• Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improved decisions.

• These decisions help us improve the running of, for example, a department, a company, a court, or the entire economy.

4
What is Statistics?

• The term statistics can refer to numerical facts such as averages, medians, percentages, and maximums that help us understand a variety of business and economic situations.

• Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

5
Applications in Business and Economics
Marketing
• Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.

Law
• Jury selection and courtroom analytics (e.g. verdict prediction)
• Legal research
• Crime data analysis (e.g. crime trends)
• Case preparation (e.g. analyzing evidence, preparing legal arguments)

6
Applications in Business and Economics
Accounting
• Public accounting firms use statistical sampling procedures when conducting audits for their clients. Sampling is also used when checking income tax returns.

Economics
• Economists use statistical information in making forecasts about the future of the economy or some aspect of it.

Production
• A variety of statistical quality control charts are used to monitor the output of a production process.

7
9
Data and Data Sets

• Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

• All the data collected in a particular study are referred to as the data set for the study.

10
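Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earnings per share ($)
Dataram        NQ               73.10               0.86
EnergySouth    N                74.00               1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                17.60               0.13

• Element names: the companies (first column)
• Variables: Stock Exchange, Annual Sales, Earnings per share (column headings)
• Observation: the set of measurements for one element (one row)
• Data set: the whole table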
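
As a rough illustration of the terms on this slide, the table can be held as a list of records in Python. This is only a sketch: the key names (`company`, `exchange`, `annual_sales_musd`, `eps_usd`) and the helper `observation` are made up for the example, and the last company name is read as "Psychemedics".

```python
# The five-company data set from the slide: one record per element.
# Key names are illustrative assumptions, not part of the original slide.
data_set = [
    {"company": "Dataram", "exchange": "NQ", "annual_sales_musd": 73.10, "eps_usd": 0.86},
    {"company": "EnergySouth", "exchange": "N", "annual_sales_musd": 74.00, "eps_usd": 1.67},
    {"company": "Keystone", "exchange": "N", "annual_sales_musd": 365.70, "eps_usd": 0.86},
    {"company": "LandCare", "exchange": "NQ", "annual_sales_musd": 111.40, "eps_usd": 0.33},
    {"company": "Psychemedics", "exchange": "N", "annual_sales_musd": 17.60, "eps_usd": 0.13},
]

# Elements are the entities on which data are collected (the companies).
elements = [row["company"] for row in data_set]

# Variables are the characteristics recorded for every element.
variables = [key for key in data_set[0] if key != "company"]


def observation(name):
    """Return the set of measurements (one value per variable) for one element."""
    for row in data_set:
        if row["company"] == name:
            return {key: row[key] for key in variables}
    raise KeyError(name)


print(elements)
print(variables)
print(observation("Keystone"))
```

With 5 elements and 3 variables, the data set holds 5 observations and 15 data values.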

11
The World is Data Rich

12
“Data is the new oil. It’s valuable, but if unrefined, it cannot really be
used. It has to be changed into gas, plastic, chemicals, etc. to create a
valuable entity that drives profitable activity; so must data be broken
down and analyzed for it to have value.”

- Clive Humby, UK Mathematician and Architect of Tesco’s Clubcard.

13
Categorical and Quantitative Data

• Data can be further classified as being categorical or quantitative.

• The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

• In general, there are more alternatives for statistical analysis when the data are quantitative.

14
Categorical Data

• Labels or names are used to identify an attribute of each element

• Often referred to as qualitative data

• Use either the nominal or ordinal scale of measurement

• Can be either numeric or nonnumeric

• Appropriate statistical analyses are rather limited

15
Quantitative Data

• Quantitative data indicate how many or how much.

• Quantitative data are always numeric.

• Ordinary arithmetic operations are meaningful for quantitative data.

16
Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example
• Data detailing the number of building permits issued in November 2023 in each of the states of India.

17
18
Time Series Data

Time series data are collected over several time periods.

Example
Data detailing the number of building permits issued in Delhi, in each
of the last 36 months.

Graphs of time series data help analysts
• understand what happened in the past,
• identify any trends over time, and
• project future levels for the time series
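
The difference between the two data layouts can be sketched in Python. All permit counts and the state and month labels below are invented placeholders for illustration only; they are not real figures for India or Delhi.

```python
# Hypothetical numbers, invented purely to show the two data layouts.

# Cross-sectional: many elements (states) observed at one point in time.
permits_one_month = {"State A": 120, "State B": 95, "State C": 143}

# Time series: one element (a single city) observed over several periods.
permits_over_time = [("2023-09", 88), ("2023-10", 92), ("2023-11", 97)]


def period_changes(series):
    """Change from each period to the next, a first look at any trend."""
    return [
        (later_label, later_value - earlier_value)
        for (_, earlier_value), (later_label, later_value) in zip(series, series[1:])
    ]


print(period_changes(permits_over_time))
```

Period-to-period changes only make sense for the time series; the cross-sectional values have no time order to difference over.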
19
Time Series Data

Graph of Time Series Data

20
Scales of Measurement

• Scales of measurement include
  - Nominal
  - Ordinal
  - Interval
  - Ratio

• The scale determines the amount of information contained in the data.
• The scale indicates the data summarization and statistical analyses that are most appropriate.

21
Scales of Measurement

Nominal scale
• Data are labels or names used to identify an attribute of the element.
• A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by the school in which they are
enrolled using a nonnumeric label such as Business, Humanities, Education,
and so on.
Alternatively, a numeric code could be used for the school variable (e.g. 1
denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

22
Scales of Measurement

Ordinal scale
• The data have the properties of nominal data and the order or rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by their class standing using a
nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing
variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).
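
The numeric codes in the nominal and ordinal examples can be sketched with Python's standard `enum` module. The class names `School` and `Standing` and the helper `supports_ordering` are hypothetical, and the codes 3 and 4 for Junior and Senior are assumed as the natural continuation of "and so on".

```python
from enum import Enum, IntEnum


class School(Enum):
    """Nominal scale: the codes are only labels, so no order is defined."""
    BUSINESS = 1
    HUMANITIES = 2
    EDUCATION = 3


class Standing(IntEnum):
    """Ordinal scale: the codes carry a meaningful rank order."""
    FRESHMAN = 1
    SOPHOMORE = 2
    JUNIOR = 3   # assumed continuation of the slide's coding
    SENIOR = 4   # assumed continuation of the slide's coding


def supports_ordering(a, b):
    """True if the two values can be compared with '<'."""
    try:
        a < b
        return True
    except TypeError:
        return False


print(supports_ordering(School.BUSINESS, School.EDUCATION))
print(supports_ordering(Standing.FRESHMAN, Standing.SENIOR))
print(sorted([Standing.SENIOR, Standing.FRESHMAN, Standing.JUNIOR]))
```

A plain `Enum` deliberately refuses `<`, which matches the idea that ranking schools by their code numbers is meaningless, while `IntEnum` allows ranking class standings.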

23
Scales of Measurement

Interval scale
• The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
• Interval data are always numeric.

Example
Melissa has an SAT score of 1985, while Kevin has an SAT score of
1880. Melissa scored 105 points more than Kevin.

24
Scales of Measurement

Ratio scale
• Data have all the properties of interval data and the ratio of two values is meaningful.
• Ratio data are always numeric.
• A zero value is included in the scale.

Example:
The price of a book at a retail store is $200, while the price of the same book sold online is $100. The ratio property shows that the retail store charges twice the online price.
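
The arithmetic behind the interval and ratio examples is small enough to check directly. The variable names in this sketch are made up; the numbers are the ones given on the slides.

```python
# Interval scale example: the difference between two SAT scores is meaningful.
melissa_sat = 1985
kevin_sat = 1880
sat_difference = melissa_sat - kevin_sat

# Ratio scale example: the ratio of two prices is meaningful.
retail_price = 200   # dollars
online_price = 100   # dollars
price_ratio = retail_price / online_price

print(f"Melissa scored {sat_difference} points more than Kevin.")
print(f"The retail price is {price_ratio:.0f} times the online price.")
```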

25
Exercise

• Nominal Scale - groups or classes
  - Gender, color, professional classification, etc.
• Ordinal Scale - order matters
  - Ranks (top ten videos, products, etc.)
• Interval Scale - difference or distance matters; has an arbitrary zero value.
  - Temperatures (°F, °C)
• Ratio Scale - ratio matters; has a natural zero value.
  - Salaries, weight, volume, area, length, etc.
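
One way to see why temperature is interval rather than ratio data is to change units and check whether a ratio survives. In this sketch the example values (20 and 10) and the function names are arbitrary choices for illustration; the conversion formulas are the standard ones.

```python
import math


def celsius_to_fahrenheit(celsius):
    """Standard conversion; note the +32 shift, i.e. an arbitrary zero point."""
    return celsius * 9 / 5 + 32


def kilograms_to_pounds(kilograms):
    """Pure rescaling with no shift, so zero stays zero (natural zero)."""
    return kilograms * 2.20462


# Interval scale: the ratio depends on the unit, so it carries no meaning.
ratio_celsius = 20.0 / 10.0
ratio_fahrenheit = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# Ratio scale: the ratio is the same in every unit.
ratio_kilograms = 20.0 / 10.0
ratio_pounds = kilograms_to_pounds(20.0) / kilograms_to_pounds(10.0)

print(ratio_celsius, ratio_fahrenheit)   # the two ratios differ
print(ratio_kilograms, ratio_pounds)     # the two ratios agree
```

20 °C is not "twice as hot" as 10 °C, because the same two temperatures are 68 °F and 50 °F; 20 kg is twice 10 kg in any unit.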

26
Scales of Measurement

Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio
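
The classification diagram above can be written down as a small lookup. The names `SCALE_TO_DATA_TYPE`, `data_type`, and `allowed_codings` are hypothetical helpers for this sketch.

```python
# Which kind of data each scale of measurement produces, following the diagram.
SCALE_TO_DATA_TYPE = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "quantitative",
    "ratio": "quantitative",
}


def data_type(scale):
    """Return 'categorical' or 'quantitative' for a scale of measurement."""
    return SCALE_TO_DATA_TYPE[scale.lower()]


def allowed_codings(scale):
    """Categorical data may be numeric or non-numeric; quantitative data are always numeric."""
    if data_type(scale) == "categorical":
        return ("numeric", "non-numeric")
    return ("numeric",)


for scale in ("Nominal", "Ordinal", "Interval", "Ratio"):
    print(scale, data_type(scale), allowed_codings(scale))
```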

27
PRIMARY DATA AND SECONDARY DATA

28
Secondary Data
• Pre-existing data not gathered for the purposes of the current research
  - Not ‘new’ data: ‘second hand’
  - ‘Back up’ data: secondary in use

• Data gathered by another source (e.g. research study, survey, interview)

• Secondary data is gathered BEFORE primary data. WHY?

• Because you want to find out what is already known about a subject before you dive into your own investigation. WHY?

• Because some of your questions may already have been answered by other investigators or authors.
Primary Data

• Data never gathered before.

• Advantage: find data you need to suit your purpose

• Disadvantage: usually more costly and time consuming than collecting secondary data

• Collected after secondary data is collected
Types of Primary Data

• Demographic/Socioeconomic
  - Age, Gender, Income, Marital Status, Occupation
• Psychological/Lifestyle
  - Activities, Interests, Personality Traits
• Attitudes/Opinions
  - Preferences, Views, Feelings, Inclinations
• Awareness/Knowledge
  - Facts about product, features, price, uses
• Intentions
  - Planned or Anticipated Behavior
• Motivations
  - Why People Buy (Needs, Wants, Wishes, Ideal-Self)
• Behavior
  - Purchase, Use, Timing, Traffic Flow
Primary Data Can Be Gathered By:

• Communication Methods
  - Interacting with respondents
  - Asking for their opinions, attitudes, motivations, characteristics

• Observation Methods
  - No interaction with respondents
  - Letting them behave naturally and drawing conclusions from their actions
…but before we delve deeper

ChatGPT and LLMs

• Google Talk to Books
• Google Alerts
• Wikipedia
• LinkedIn
• Scribd
• Digg it
• WSJ
• Yahoo Answers
• slideshare.net
• Forbes.com
• Facebook
• YouTube
Data

Sources
• Units sold
  - IDC, Nielsen
• Revenue
  - Gartner, Forrester
  - TBRI, SEC filings
• Company data
  - Finance.yahoo/Google/Reuters/D&B Hoovers
  - Standard & Poor's
• Vertical / economic numbers
  - Datamonitor
  - Economist, EIU
  - Consensus Economics

... And it answers
  - What is the EMEA revenue of X in Q2021?
  - In which areas did Google improve its performance over last year?
  - What was the share price performance and why?
  - What was the size of the pharmaceutical market and which segments are supposed to grow?
  - What are the forecasted housing start numbers for US in FY10?
Information

Sources
• Peer comparison
  - Harte Hanks
• Projections / higher analysis
  - One Source
  - Compustat
  - Bloomberg
• Opinions / articles
  - IDC, Gartner, Ovum, Forrester, AMR, PAC, McKinsey Quarterly, Economist, EBSCO, ProQuest, Factiva

... And it answers
  - What is the estimated IT spend/sales revenue?
  - What is the reason behind a fall in revenue?
  - What is the desktop PC share at Shell Canada?
  - How is the cost structure for Dell different from HP?
  - What are the inventory days for Unilever in Japan?
  - What was the size of the pharmaceutical market and which segments are the fastest growing?
  - What are the trends in supply chain outsourcing and logistics?
Data Sources

Data Available From Internal Company Records

Record               Some of the Data Available
Employee records     Name, address, social security number
Production records   Part number, quantity produced, direct labor cost, material cost
Inventory records    Part number, quantity in stock, reorder level, economic order quantity
Sales records        Product number, sales volume, sales volume by region
Credit records       Customer name, credit limit, accounts receivable balance
Customer profile     Age, gender, income, household size
36
Data Acquisition Considerations

Time Requirement
• Searching for information can be time consuming.
• Information may no longer be useful by the time it is available.

Cost of Acquisition
• Organizations often charge for information even when it is not their primary business activity.

Data Errors
• Using any data that happen to be available or were acquired with little care can lead to misleading information.
37
Using Statistics (Two Categories)

• Descriptive Statistics
  - Collect
  - Organize
  - Summarize
  - Display
  - Analyze

• Inferential Statistics
  - Predict and forecast values of population parameters
  - Test hypotheses about values of population parameters
  - Make decisions

38
Descriptive Statistics

• Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
• Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Example
The manager of Hudson Auto would like to have a better understanding of
the cost of parts used in the engine tune-ups performed in her shop. She
examines 50 customer invoices for tune-ups. The costs of parts, rounded to
the nearest dollar, are listed on the next slide.

39
Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups


91 78 93 57 75 52 99 80 97 62

71 69 72 89 66 75 79 75 72 76

104 74 62 68 97 105 77 65 80 109

85 97 88 68 83 68 71 69 67 74

62 82 98 101 79 105 79 69 62 73

40
Tabular Summary: Frequency and Percent Frequency

Parts Cost ($)   Frequency   Percent Frequency
50-59            2           4%
60-69            13          26%
70-79            16          32%
80-89            7           14%
90-99            7           14%
100-109          5           10%
TOTAL            50          100%
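
The frequency table can be reproduced from the 50 Hudson Auto values with a short script. This is a sketch, not the method used to build the slide: the function name `frequency_table` and its parameters (`start`, `width`, `n_classes`) are assumptions chosen to match the classes shown.

```python
from collections import Counter

# Parts cost ($) for the 50 tune-ups, copied from the Hudson Auto slide.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def frequency_table(values, start=50, width=10, n_classes=6):
    """Return (class label, frequency, percent frequency) for equal-width classes."""
    # Integer division maps each value to the index of its class.
    counts = Counter((value - start) // width for value in values)
    rows = []
    for index in range(n_classes):
        low = start + index * width
        high = low + width - 1
        frequency = counts.get(index, 0)
        rows.append((f"{low}-{high}", frequency, 100 * frequency / len(values)))
    return rows


for label, frequency, percent in frequency_table(parts_cost):
    print(f"{label:<10}{frequency:>4}{percent:>6.0f}%")
```

The percent frequency of a class is its frequency divided by the number of observations (50 here), times 100.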
41
Graphical Summary: Histogram

Example: Hudson Auto

[Histogram "Tune-up Parts Cost": x-axis Parts Cost ($), y-axis Frequency]
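
A histogram of this kind can be approximated in plain text from the class frequencies in the tabular summary. The names `class_frequencies` and `text_histogram` are hypothetical, and a text bar chart stands in for the graphic on the slide.

```python
# Class frequencies taken from the tabular summary for Hudson Auto.
class_frequencies = {
    "50-59": 2,
    "60-69": 13,
    "70-79": 16,
    "80-89": 7,
    "90-99": 7,
    "100-109": 5,
}


def text_histogram(frequencies, symbol="#"):
    """One line per class: the bar length equals the class frequency."""
    return [
        f"{label:>7} | {symbol * count} {count}"
        for label, count in frequencies.items()
    ]


for line in text_histogram(class_frequencies):
    print(line)
```

The tallest bar falls in the 70-79 class, which holds 16 of the 50 tune-ups.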

42
Numerical Descriptive Statistics

• The most common numerical descriptive statistic is the mean (or average).

• The mean provides a measure of central tendency, or central location, of the data for a variable.

• Hudson’s mean cost of parts, based on the 50 tune-ups studied, is about $79 (found by summing the 50 cost values and then dividing by 50).
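
The mean calculation can be checked with a few lines of Python. The variable names are arbitrary; the 50 values are the ones listed on the Hudson Auto slide.

```python
# Parts cost ($) for the 50 tune-ups, copied from the Hudson Auto slide.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Mean = sum of the values divided by the number of values.
total_cost = sum(parts_cost)
number_of_tuneups = len(parts_cost)
mean_cost = total_cost / number_of_tuneups

print(f"Sum = {total_cost}, n = {number_of_tuneups}, mean = {mean_cost:.2f}")
```

The sum is 3,949, so the mean is 3,949 / 50 = 78.98, which rounds to the $79 quoted on the slide.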

43
Software

• MS Excel
• R
• IBM SPSS

44
