Lecture 1
IN BUSINESS WITH R
Denis Marinšek
BASIC CONCEPTS
Before the tax changes are introduced, the Ministry of Finance wants to analyze the
wealth of citizens over the age of 18. The data shows that 75% of citizens own stocks
and bonds worth up to EUR 380, that the average net income is EUR 1670 and that 36%
of citizens own real estate.
Examples:
- Gender.
- How much do you agree with the statement "I'm addicted to YouTube"?
- What is your favorite way to prepare your steak?
- Number of household members.
- Revenue of a company.
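The example variables above can be stored in R with types that reflect their measurement scales. The sketch below is only an illustration: the data values, the column names, the 1 to 5 agreement scale, and the revenue unit are assumptions, and treating the steak preference as nominal is one possible reading.

```r
# Hypothetical survey answers, invented for illustration only
survey <- data.frame(
  Gender    = c("F", "M", "F", "M"),
  YouTube   = c(2, 5, 4, 1),                 # agreement on an assumed 1-5 scale
  Steak     = c("rare", "medium", "well done", "medium"),
  Household = c(1, 4, 3, 2),                 # number of household members
  Revenue   = c(120.5, 89.0, 310.2, 45.7)    # assumed unit: thousand EUR
)

# Nominal variables: categories without a natural order
survey$Gender <- factor(survey$Gender)
survey$Steak  <- factor(survey$Steak)

# Ordinal variable: categories with a meaningful order
survey$YouTube <- factor(survey$YouTube, levels = 1:5, ordered = TRUE)

# Household and Revenue stay numeric
str(survey)
```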
Research methods
• Qualitative and quantitative methods.
• Inferential statistics methods attempt to go beyond the insights that follow directly
from the data. For example, they attempt to draw inferences about the parameters
of the population from the sample data.
Parameters
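• Actual (population) value of the parameter: $\Gamma_y$
• Parameter estimate: $g_y$

Population mean and sample mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Population variance and sample variance:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2 \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$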
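To make the difference between a parameter and its estimate concrete, the formulas above can be computed by hand in R. This is a minimal sketch: the six population values and the sample size of four are invented for illustration.

```r
# Hypothetical population of N = 6 values (invented for illustration)
population <- c(4, 8, 6, 5, 3, 10)
N <- length(population)

mu     <- sum(population) / N             # population mean
sigma2 <- sum((population - mu)^2) / N    # population variance: divide by N

# Draw a random sample of n = 4 units from the population
set.seed(1)
y <- sample(population, size = 4)
n <- length(y)

y_bar <- sum(y) / n                       # sample mean, same as mean(y)
s2    <- sum((y - y_bar)^2) / (n - 1)     # sample variance, same as var(y)

c(mu = mu, sigma2 = sigma2, y_bar = y_bar, s2 = s2)
```

Note that R's `var()` divides by n − 1, so it returns the sample variance, not the population variance.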
Before the tax changes are introduced, the Ministry of Finance wants to
analyze the wealth of citizens over the age of 18. The data shows that 75% of
citizens own stocks and bonds worth up to EUR 395, that the average net
income is EUR 1620 and that 34% of citizens own real estate.
[Figure: histograms (Frequency vs. Score) of a symmetric distribution and a skewed distribution.]
Estimates of parameters
- Bimodal distribution
- Multimodal distribution
[Figure: histograms (Frequency) of a bimodal and a multimodal distribution.]
• Central tendency: arithmetic mean
• Variability: range
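A short sketch of both estimates in R, using an invented vector of seven values. Keep in mind that base R's `range()` returns the minimum and maximum, not their difference.

```r
# Hypothetical sample of seven values (invented for illustration)
y <- c(6, 7, 9, 5, 8, 10, 4)

y_mean  <- mean(y)            # arithmetic mean
y_range <- max(y) - min(y)    # range as a single number

range(y)                      # returns c(min, max), here 4 and 10
diff(range(y))                # same value as y_range
```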
• Variability: deviation (from the mean)
Deviation of unit $i$ from the mean: $y_i - \bar{y}$
Sum of squared deviations: $\sum_{i=1}^{n} (y_i - \bar{y})^2$
• Variability: standard deviation
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}$
• Variability: coefficient of variation
$\%\,cv = \frac{s}{\bar{y}} \cdot 100$
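The measures of variability above can be built up step by step in R. The sketch uses an invented vector of seven values; only `mean()`, `sum()`, `sqrt()`, and `sd()` from base R are used.

```r
# Hypothetical sample of seven values (invented for illustration)
y <- c(6, 7, 9, 5, 8, 10, 4)
n <- length(y)

dev <- y - mean(y)           # deviations from the mean; they sum to zero
ss  <- sum(dev^2)            # sum of squared deviations

s  <- sqrt(ss / (n - 1))     # standard deviation, same as sd(y)
cv <- s / mean(y) * 100      # coefficient of variation in percent

round(c(s = s, cv = cv), 2)
```

Because the coefficient of variation is unit-free, it can be used to compare the variability of variables measured on different scales.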
Example: Managerial Economics
We randomly select 31 students and look at their results from the Managerial
Economics course (ME.csv):
Normal distribution
Properties of the normal distribution:
• Unimodal and symmetric
• Mean, mode, and median coincide: $\mu = Mo = Me$
• The interval from $\mu_y - k\sigma_y$ to $\mu_y + k\sigma_y$ contains a known percentage of values
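The percentage of values within k standard deviations of the mean can be checked with the normal distribution function `pnorm()`. The sketch below uses the standard normal distribution; the values of `mu_y` and `sigma_y` in the second part are arbitrary assumptions chosen to show that the result does not depend on them.

```r
k <- 1:3

# Share of values between -k and +k for the standard normal distribution
share <- pnorm(k) - pnorm(-k)
round(100 * share, 1)    # 68.3 95.4 99.7

# The same shares hold for any mean and standard deviation (assumed values)
mu_y    <- 1620
sigma_y <- 400
share_general <- pnorm(mu_y + k * sigma_y, mean = mu_y, sd = sigma_y) -
                 pnorm(mu_y - k * sigma_y, mean = mu_y, sd = sigma_y)
```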
GRAPHICAL ANALYSIS
Example: Nervousness
Example: Movie
DATA CLEANING
Example: Managerial Economics
Example: Trust
33
DATA CLEANING
Example: Trust
34
DATA CLEANING
Example: Trust
35
PRACTICAL EXAMPLE
We collected some data for a group of 35 athletes between the ages of 18 and 25
(Maraton.csv).
a) Import the data into RStudio (read.table) and display it with the head function.
b) Define the unit of study, define the variables, and explain which measurement scales they belong to.
c) Estimate and explain the arithmetic mean and the standard deviation for the variable Height.
d) Change the variable Gender into a factor.
e) Estimate the descriptive statistics for the variable Glucose separately for each gender. Use the
describeBy {psych} function.
f) Use the stat.desc {pastecs} function to describe the variables and check that you know all the
estimated parameters. Which variable has the greatest variability?
g) For the variable Hematocrit, plot and describe the frequency distribution using the hist function.
h) Draw a boxplot for the variable Glucose, separated by gender. Use the function geom_boxplot
{ggplot2}.
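One possible workflow for tasks a) and c) to h) is sketched below. Since Maraton.csv is not reproduced here, the code first simulates a stand-in file: all data values, the ID column, the M/F coding, the comma separator, and the header row are assumptions, and only the variable names Gender, Height, Glucose, and Hematocrit come from the task. The calls to psych, pastecs, and ggplot2 run only if those packages are installed.

```r
# Simulated stand-in for Maraton.csv (the real file layout is an assumption)
set.seed(123)
athletes <- data.frame(
  ID         = 1:35,
  Gender     = rep(c("M", "F"), length.out = 35),
  Height     = round(rnorm(35, mean = 175, sd = 8)),
  Glucose    = round(rnorm(35, mean = 5, sd = 0.5), 1),
  Hematocrit = round(rnorm(35, mean = 43, sd = 3), 1)
)
path <- tempfile(fileext = ".csv")
write.table(athletes, path, sep = ",", row.names = FALSE)

# a) Import the data and display the first rows
mydata <- read.table(path, header = TRUE, sep = ",")
head(mydata)

# c) Arithmetic mean and standard deviation of Height
mean(mydata$Height)
sd(mydata$Height)

# d) Change Gender into a factor
mydata$Gender <- factor(mydata$Gender,
                        levels = c("F", "M"),
                        labels = c("Female", "Male"))

# e) Descriptive statistics for Glucose by gender
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(mydata$Glucose, group = mydata$Gender))
} else {
  print(tapply(mydata$Glucose, mydata$Gender, summary))  # base R fallback
}

# f) Descriptive statistics for the numeric variables
if (requireNamespace("pastecs", quietly = TRUE)) {
  print(round(pastecs::stat.desc(mydata[, c("Height", "Glucose", "Hematocrit")]), 2))
}

# g) Frequency distribution of Hematocrit
hist(mydata$Hematocrit,
     main = "Distribution of Hematocrit",
     xlab = "Hematocrit",
     ylab = "Frequency")

# h) Boxplot of Glucose separated by gender
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mydata, ggplot2::aes(x = Gender, y = Glucose)) +
    ggplot2::geom_boxplot()
  print(p)
}
```

With the real data, only the import step changes: replace `path` with the location of Maraton.csv and adjust `sep` and `header` to match the file.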