Slides Sessions 1-2
Data Analysis
BBA & BBA-BIR
Academic Year 2023-2024
Prof. Maud Pindard-Lejarraga
[email protected]
SESSION 1
Introduction to the course
and to its methodology
AGENDA FOR TODAY
1. Introducing ourselves
2. Syllabus
3. What is statistics?
1. INTRODUCING OURSELVES
Maud Pindard-Lejarraga Office 22.19
https://www.linkedin.com/maud-pindard-lejarraga
What about you?
2. SYLLABUS
Structure and rules
Structure of the course
35 sessions in total:
• 20 plenary sessions
• 15 ‘lab’ sessions (although most sessions are practical)
• Half-groups: I will upload the calendar later today or tomorrow
• If you need to be in a specific group for justified reasons, let me know TODAY (send me an email or a message through Blackboard)
Rules of the IE game
• Be on time → if you are more than 5 minutes late, it counts as an absence
• Do not leave the room in the middle of the session (if you have a genuinely urgent reason, let me know in advance) → absence
• Refrain from side conversations
• Use your laptop only for course-related purposes
• Do not use your phone
• Do not eat in the classroom
Evaluation system
• Final Exam (May 20th): 40% of the final grade. You need a minimum grade of 3.5/10 in the final exam to pass the course.
• Group project: 25% (includes peer grading to prevent free-riding)
• Individual Assignments: 20% (quizzes, exercises, etc.)
• Participation: 15% (ACTIVE participation)
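The weighting above can be checked with a few lines of Python. This is a minimal sketch, assuming every component is graded out of 10; the helper `final_grade` and the example marks are hypothetical. It reports the weighted grade separately from whether the 3.5/10 final-exam minimum is met, because that minimum is a condition for passing on its own.

```python
# Weights from the evaluation system; all grades assumed to be on a 0-10 scale.
WEIGHTS = {
    "final_exam": 0.40,
    "group_project": 0.25,
    "individual_assignments": 0.20,
    "participation": 0.15,
}
MIN_FINAL_EXAM = 3.5  # minimum final-exam grade required to pass the course


def final_grade(grades: dict[str, float]) -> tuple[float, bool]:
    """Return (weighted grade, final-exam minimum met) for a dict of component grades.

    Hypothetical helper for illustration only.
    """
    # Weighted average: each component grade times its weight.
    weighted = sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)
    # The minimum applies to the final exam alone, whatever the weighted grade is.
    meets_minimum = grades["final_exam"] >= MIN_FINAL_EXAM
    return round(weighted, 2), meets_minimum


print(final_grade({"final_exam": 6, "group_project": 8,
                   "individual_assignments": 7, "participation": 9}))
```

For example, marks of 6 (final exam), 8 (project), 7 (assignments) and 9 (participation) give 0.40 × 6 + 0.25 × 8 + 0.20 × 7 + 0.15 × 9 = 7.15. With 3.0 in the final exam and 10 everywhere else, the weighted grade is 7.2, yet the final-exam minimum is not met.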
Group Project
Identification of a real-world problem, taken from business or any other field of interest: solving it will entail selecting a dataset (for example, on Kaggle), analyzing the data statistically, and interpreting the results obtained.
Further instructions will be given around mid-course.
80% of the grade will be given to the project itself and will be the same for all group members; 20% will be based on peer evaluation of your involvement in the group work.
Deadline: May 19th, 7pm
Teamwork
• A team is a small number of people with complementary skills who are committed to a common purpose, set of performance goals, and approach for which they hold themselves mutually accountable.
Participation
Materials
Why Python?
Remember
3. WHAT IS STATISTICS?
An overview of the course
Statistics and Data Analysis
Introduction
Session Goals
Define statistics and its uses
Explain how decisions are often based on
incomplete information
Explain key definitions:
• Population vs. Sample
• Parameter vs. Statistic
• Descriptive vs. Inferential Statistics
Why do we study Statistics?
How do we interpret data?
• We live in an era of post-truth: the more
data we have, the more we distrust them.
Study Design
Results
The table below shows the distribution of patients with
good outcomes at 6-month follow-up.
Note that 7 patients dropped out of the study: 3 from the
treatment and 4 from the control group.
Understanding the results
Key Definitions
Population Sample
Random Sampling
Simple random sampling is a procedure in which:
• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen, and
• every possible sample of n objects is equally likely to be chosen.
The resulting sample is called a random sample.
We often use non-random sampling (for a variety of reasons).
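A simple random sample can be drawn with Python's standard library: `random.sample` picks n distinct members so that every subset of size n is equally likely. The sketch below also contrasts a parameter (the population mean) with a statistic (the sample mean) that estimates it. The population of 10,000 "ages" is simulated purely for illustration, and the seed and sample size are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical population: 10,000 simulated ages between 18 and 80.
population = [rng.randint(18, 80) for _ in range(10_000)]

# Parameter: a number describing the whole population.
population_mean = statistics.mean(population)

# Simple random sample: n members drawn without replacement,
# every subset of size n being equally likely.
n = 200
sample = rng.sample(population, n)

# Statistic: a number computed from the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean, n={n}): {sample_mean:.2f}")
```

Changing the seed gives a different sample and therefore a different sample mean; that sample-to-sample variation is exactly what inferential statistics has to account for.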
Descriptive and Inferential Statistics
From exploratory analysis to inference
Sampling is natural.
Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.
When you taste a spoonful of soup and decide
the spoonful you tasted isn’t salty enough, that’s
exploratory analysis.
If you generalize and conclude that your entire
soup needs salt, that’s an inference.
For your inference to be valid, the spoonful you
tasted (the sample) needs to be representative of
the entire pot (the population).
If your spoonful comes only from the surface and the salt has collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
If you first stir the soup thoroughly before you taste,
your spoonful will more likely be representative of
the whole pot.
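The soup analogy can be turned into a small simulation. This is a sketch under made-up assumptions: the pot is 1,000 hypothetical "spoonfuls", each with a depth between 0 (surface) and 1 (bottom), and saltiness is assumed to grow linearly with depth because the salt has settled. Tasting only from the surface gives a biased estimate; tasting a simple random sample (the "stirred" pot) gives an estimate close to the whole pot.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical pot: 1,000 spoonfuls, each with a depth (0 = surface, 1 = bottom).
depths = [rng.random() for _ in range(1_000)]
# Assumption: salt has settled, so saltiness (arbitrary units) grows with depth.
saltiness = [2.0 + 8.0 * d for d in depths]

true_mean = statistics.mean(saltiness)  # the whole pot (population)

# Biased sample: only the 20 spoonfuls closest to the surface.
surface = sorted(zip(depths, saltiness))[:20]
surface_mean = statistics.mean(s for _, s in surface)

# "Stirred" pot: every spoonful equally likely to be tasted (simple random sample).
stirred_mean = statistics.mean(rng.sample(saltiness, 20))

print(f"Whole pot:    {true_mean:.2f}")
print(f"Surface only: {surface_mean:.2f}")
print(f"After stir:   {stirred_mean:.2f}")
```

Both samples have the same size; only the way they were selected differs, which is why the surface-only estimate is systematically too low.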
Sampling bias
Types of Data
Data
• Categorical (defined categories or groups). Examples: Marital Status; Do you own a car?; Eye Color
• Numerical
  • Discrete (counted items). Examples: Number of Children; Defects per hour
  • Continuous (measured characteristics). Examples: Weight; Voltage
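The type of a variable determines which summaries make sense: counting categories for categorical data, averaging for numerical data. The sketch below uses Python's standard `collections.Counter` and `statistics` modules on small lists of hypothetical values (made up for illustration, not course data).

```python
from collections import Counter
import statistics

# Hypothetical observations, one value per person.
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]  # categorical
num_children = [0, 2, 1, 3, 2, 0]                                 # numerical, discrete (counted)
weight_kg = [61.5, 72.3, 80.1, 55.0, 68.4, 90.2]                  # numerical, continuous (measured)

# Categorical: count how often each category occurs; the most common one is the mode.
color_counts = Counter(eye_color)
most_common_color, _ = color_counts.most_common(1)[0]

# Discrete: counts can be averaged, even if the mean (here 1.33) is not a possible count.
mean_children = statistics.mean(num_children)

# Continuous: means and standard deviations are natural summaries of measurements.
mean_weight = statistics.mean(weight_kg)
sd_weight = statistics.stdev(weight_kg)

print(color_counts)
print(most_common_color, round(mean_children, 2), round(mean_weight, 2), round(sd_weight, 2))
```

Averaging eye colors would be meaningless, which is why the categorical column is summarized by counts instead.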
Explanatory and response variables
To identify the explanatory variable in a pair of variables, identify which of the two is suspected of affecting the other:
explanatory variable → (might affect) → response variable
Observational studies and experiments
Observational study: Researchers collect data in a way that does not
directly interfere with how the data arise, i.e. they merely “observe”, and
can only establish an association between the explanatory and response
variables.
http://xkcd.com/552/
A Study
A Study (continued)
Obtaining good samples
More experimental design methodology:
Placebo: a fake treatment, often given to the control group in medical studies
Placebo effect: experimental units showing improvement
simply because they believe they are receiving a special
treatment
Blinding: when experimental units do not know whether
they are in the control or treatment group
Double-blind: when both the experimental units and the
researchers who interact with the patients do not know
who is in the control and who is in the treatment group
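One common way to put blinding into practice is to randomize participants to treatment or placebo and hand out only coded kits, keeping the code-to-group key with someone who does not interact with participants. The sketch below illustrates that idea; the participant IDs, the `KIT-` code format, and the helper name `assign_double_blind` are hypothetical.

```python
import random


def assign_double_blind(participants, seed=None):
    """Randomly split participants into treatment and placebo groups (hypothetical helper).

    Returns (blinded_list, key): blinded_list pairs each participant with an opaque
    kit code and is what participants and front-line researchers see; key maps each
    kit code to its group and is kept sealed until the analysis.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment: order decides the group

    half = len(shuffled) // 2
    groups = ["treatment"] * half + ["placebo"] * (len(shuffled) - half)

    # Kit codes are drawn at random so they reveal nothing about the group.
    codes = rng.sample(range(1000, 10000), len(shuffled))

    blinded_list = []
    key = {}
    for person, group, code in zip(shuffled, groups, codes):
        kit = f"KIT-{code}"
        blinded_list.append((person, kit))
        key[kit] = group

    # Sort by participant ID so the list order does not reveal the group.
    blinded_list.sort()
    return blinded_list, key


participants = [f"P{i:02d}" for i in range(1, 11)]
blinded, key = assign_double_blind(participants, seed=7)
print(blinded)
```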
Statistics – Topic 1
Descriptive Statistics
Data in raw form…
… vs. tables and graphical presentations
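Turning raw data into a table can be as simple as counting. The sketch below builds a frequency table with counts and percentages from a hypothetical list of answers to "Do you own a car?", using only `collections.Counter` from the standard library.

```python
from collections import Counter

# Hypothetical raw answers, as they might appear in a spreadsheet column.
raw = ["yes", "no", "no", "yes", "no", "no", "yes", "no", "no", "no"]

counts = Counter(raw)
total = len(raw)

# Frequency table: category, absolute frequency, relative frequency (%).
table = [(answer, n, 100 * n / total) for answer, n in counts.most_common()]

print(f"{'Answer':<8}{'Count':>6}{'Percent':>9}")
for answer, n, pct in table:
    print(f"{answer:<8}{n:>6}{pct:>8.1f}%")
```

Ten raw values are still readable, but the same two-row table works unchanged for thousands of responses.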
The “Gen Z” version
Principles of Data Visualization