
Chapter 0. Course Presentation

The Intermediate Econometrics and Data Analysis (IEDA) course aims to equip students with essential skills in data analysis and programming, focusing on Supervised Machine Learning using Python. The course is structured into four phases covering data preparation, statistical tools, classification algorithms, and a final interdisciplinary project. Assessment includes midterms, a final exam, and participation, with an emphasis on practical application through group projects utilizing classification models.

Uploaded by

mpineau2004

INTERMEDIATE ECONOMETRICS & DATA ANALYSIS
CHAPTER 0
ABOUT IEDA
PART I
A. COURSE PRESENTATION

• Given their significant contribution to business success, the demand for data analysts has been increasing and is expected to continue growing in the coming years.

• In this context, the “Intermediate Econometrics and Data Analysis (IEDA)” course is designed to prepare you for future job opportunities in this expanding field.
B. COURSE PLAN
(1/4)

• The course is designed to address real managerial problems using tools from
“Supervised Machine Learning”, which is a subset of “Artificial Intelligence”.

• It consists of thirteen sessions designed to help you master essential skills such as data analysis and programming, with Python as the primary programming language.
B. COURSE PLAN
(2/4)

• The course consists of four main phases:

◦ Phase I (Chapter 1): Data preparation, including handling imbalanced datasets and addressing abnormal or missing values using Python.

◦ Phase II (Chapters 2 and 3): Preparation for data analysis, which includes a review of key statistical tools needed for the course and an overview of the prerequisites for Supervised Machine Learning (such as training/testing sets, cross-validation, and understanding variance and bias errors).
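
To make the Phase II prerequisites more concrete, here is a minimal sketch of a training/testing split followed by cross-validation. It assumes scikit-learn and uses synthetic data from `make_classification`; the sample size, the 25% testing share, the five folds, and the K-NN classifier are illustrative choices, not values taken from the course.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, used only for illustration (not course data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows as a testing set that is never used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)

# 5-fold cross-validation, run on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then evaluate once on the testing set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("Mean cross-validation accuracy:", cv_scores.mean())
print("Testing-set accuracy:", test_accuracy)
```

Comparing the cross-validation accuracy with the testing-set accuracy gives a first indication of how well a model generalizes beyond the data it was trained on.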
B. COURSE PLAN
(3/4)

• The course consists of four main phases:

◦ Phase III (Chapters 4 to 7): In-depth exploration of various classification algorithms, including K-Nearest Neighbors, Decision Trees, Random Forest, and Neural Networks.

◦ Phase IV (Last session): Emphasis on the interdisciplinary project, focusing on applying the skills learned throughout the course to real-world scenarios and presenting the results.
B. COURSE PLAN
(4/4)

CHAPTER 0 • ABOUT IEDA

PHASE I
CHAPTER 1 • DATA PREPARATION

PHASE II
CHAPTER 2 • ECONOMETRICS & DATA ANALYSIS
CHAPTER 3 • SUPERVISED MACHINE LEARNING

PHASE III
CHAPTER 4 • K-NEAREST NEIGHBORS (K-NN)
CHAPTER 5 • DECISION TREES (DT)
CHAPTER 6 • RANDOM FOREST (RF)
CHAPTER 7 • NEURAL NETWORK (NN)

PHASE IV
PROJECT • COACHING
C. COURSE MATERIALS

MYCOURSES ORGANIZATION

• Three main components:
1. Chapter slides;
2. Datasets;
3. Python Scripts (i.e., code).

• Three types of Python scripts:
1. Full script;
2. In-Class Practice;
3. At-Home Practice.

• Before the class:
1. Read the chapter slides;
2. Download the chapter’s datasets;
3. Download the “In-Class Practice” files.

• After the class:
1. Review the slides;
2. Download and review the “Full script”;
3. Complete the “At-Home Practice” as homework for the next session.
D. ASSESSMENT METHODS

MIDTERM
• WEIGHT: 30%
• DESCRIPTION: Exercises and MCQs
• CONTENT: Chapters 0 to 3 included
• DATE: Mid-semester

FINAL EXAM
• WEIGHT: 55%
• DESCRIPTION: Mainly consists of exercises
• CONTENT: All chapters
• DATE: End of the semester

PARTICIPATION
• WEIGHT: 15%
• DESCRIPTION: Participation grades consider attendance, class engagement, attitude, and timely homework completion.
E. HOW IEDA DIFFERS FROM THE AI COURSE

IEDA
1. Focus on a specific subset of AI: “Supervised Machine Learning”;
2. Build classification models: Write the underlying algorithms;
3. Select the appropriate model: Evaluate the strengths and weaknesses of each.

AI
1. All subsets of AI: “Machine Learning” (Supervised and Unsupervised), and “Deep Learning”;
4. Application of AI in business: Utilize the algorithms developed in IEDA;
5. AI implementation and ethical issues in business.

→ MANAGERIAL APPROACH
→ SCIENTIFIC APPROACH
ABOUT THE PROJECT
PART II
A. PROJECT PRESENTATION
(1/3)

• The goal is to use classification models from the IEDA and AI courses to
predict the direction of price movement (up vs. down) of the explained
variable, rather than the actual price.

• To accomplish this, the quantitative explained variable (price) is converted into a qualitative variable with two categories: up and down. Please note that selecting the variables and obtaining the data are also part of the job.
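
The conversion from a quantitative price to a two-category direction could be sketched as follows with pandas. The prices are invented for illustration, and the rule used here (compare each price with the previous period, and count an unchanged price as "down") is an assumption; the project guidelines may define the direction differently.

```python
import pandas as pd

# Hypothetical prices, invented purely for illustration.
prices = pd.Series([100.0, 101.5, 101.0, 103.2, 102.8], name="price")

# Change relative to the previous period; the first value has no predecessor.
change = prices.diff()

# Label a period "up" if the price rose, otherwise "down".
# Assumption: an unchanged price is labeled "down".
direction = change.gt(0).map({True: "up", False: "down"})

# Drop the first period, whose change is undefined.
direction = direction.iloc[1:]

print(direction.tolist())  # ['up', 'down', 'up', 'down']
```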
A. PROJECT PRESENTATION
(2/3)

• Each group (5 students on average) must select only one topic from those
covered in the Financial Markets course.

• For the chosen topic, each group needs to identify one explained variable and
several explanatory variables (refer to the project guidelines for details).

• Your data source will be “Wharton Research Data Services (WRDS)”. Please
register at WRDS Registration to access the data.
A. PROJECT PRESENTATION
(3/3)

• In this regard, a dataset might look like this (just an example):

• Note that you have flexibility regarding the time frame and frequency of the
variables. However, be mindful of the frequencies. For example, do not use
explanatory variables with annual variation (e.g., GDP) to predict an explained
variable that varies daily (e.g., daily stock price) or monthly, and vice versa.
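
One possible way to respect the frequency constraint is to bring every variable to a common frequency before combining them. The sketch below assumes pandas, invented values, and a hypothetical monthly explanatory variable named `rate`; keeping the last daily price of each month is only one of several reasonable aggregation choices.

```python
import pandas as pd

# Hypothetical daily prices over two months of business days (illustration only).
days = pd.date_range("2024-01-01", "2024-02-29", freq="B")
daily_price = pd.Series(range(len(days)), index=days, dtype="float64", name="price")

# Hypothetical explanatory variable that is only observed monthly.
monthly_rate = pd.Series(
    [4.0, 4.1],
    index=pd.PeriodIndex(["2024-01", "2024-02"], freq="M"),
    name="rate",
)

# Bring the daily series down to monthly frequency: keep the last price of each month.
monthly_price = daily_price.groupby(daily_price.index.to_period("M")).last()

# Both series now share a monthly index and can be combined row by row.
dataset = pd.concat([monthly_price, monthly_rate], axis=1)
print(dataset)
```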
B. EXPECTATIONS

1. FOR THE PROJECT


• Create 4 models using Python: K-NN, DT, RF, and NN;

• Compare all the models using performance metrics and evaluate their theoretical
strengths and weaknesses;

• Determine which model is the most effective for your dataset and study context.

2. FOR THE COACHING SESSION


• Ensure that data collection and preparation are completed, and;

• Bring your results, including the four Python models, to the coaching session.
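
As a rough illustration of the project expectations listed above, the sketch below fits the four model types and compares them on a single metric. It relies on scikit-learn's ready-made classifiers rather than the scripts provided in the course, runs on synthetic data, and uses arbitrary hyperparameters; accuracy is only one of the performance metrics you may want to report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for an "up"/"down" dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hyperparameters below are arbitrary illustrative values.
models = {
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(max_depth=3, random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "NN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
}

# Train each model on the same split and record its testing-set accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

best = max(results, key=results.get)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
print("Highest accuracy on this split:", best)
```

A single split and a single metric are rarely enough to declare a winner; the comparison should also weigh each model's theoretical strengths and weaknesses, as the expectations require.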
C. IMPORTANT INFORMATION

• Individual coaching, whether via email or in person, is not possible.

• Collecting data and fixing errors are part of the job.

• You will be provided with the complete Python script for all models.

• The provided code contains no errors. Therefore, adapting it to your own dataset and handling any potential issues are part of the job.

• No additional coaching or error support will be offered outside of the class.


D. CONTACT

• For any questions regarding group composition (e.g., missing members, individuals not assigned to any group, etc.), please contact the project coordinators:
◦ Aljona ZORINA: [email protected]
◦ Marc JOETS: [email protected]

• For any questions related to data, please contact your finance professor or the project coordinators, and for questions about Supervised Machine Learning, please contact your IEDA professor.

• All project information is available on MyCourses:

https://mycourses.ieseg.fr/course/view.php?id=6216
E. SUBMISSION
(1/4)

1. CODE
• Three “.ipynb” files (i.e., Google Colab notebooks) are expected:

◦ One file for K-Nearest Neighbors;

◦ One file for both Decision Trees and Random Forest;

◦ One file for Neural Network.


E. SUBMISSION
(2/4)

2. TEMPLATE

• A Word document should be completed for all courses involved in the AI project. For the IEDA section, you must:
◦ Include and interpret the performance measures of the four models you created.

◦ Analyze the theoretical strengths and weaknesses of each model.

◦ Select the best model for your study and justify your choice.

• In summary, your IEDA professors will evaluate both your code and the IEDA section of the template.


E. SUBMISSION
(3/4)

• Please note that:

◦ If the instructions are not followed (e.g., if you do not submit three .ipynb
files), you will get a zero for the coding part.
◦ Code that has not been adjusted to your dataset (i.e., using the provided
code without modifications) will also result in a zero for the coding part.
◦ Failure to submit the code in the correct format (e.g., submitting a
document instead of an .ipynb file or only providing a link to your code in
the template) will result in a zero for the coding part.
E. SUBMISSION
(4/4)

• A faulty model will result in a zero for the entire IEDA section. For example, if you include both the original prices and the converted qualitative variable (price direction) in your analysis, your results will be biased: using the actual prices to explain variations in the same prices is inherently flawed.
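
A minimal sketch of how this mistake can be avoided when building the model inputs, assuming pandas and a hypothetical dataset: the column names `price`, `direction`, `volume`, and `interest_rate`, as well as all values, are invented for illustration.

```python
import pandas as pd

# Hypothetical dataset, invented for illustration.
data = pd.DataFrame(
    {
        "price": [100.0, 101.5, 101.0, 103.2],
        "direction": ["up", "up", "down", "up"],
        "volume": [1200, 1500, 900, 1800],
        "interest_rate": [4.0, 4.0, 4.1, 4.1],
    }
)

# The explained variable is the qualitative price direction.
target = data["direction"]

# Exclude both the target and the original price it was derived from,
# so that the price is not used to explain its own movement.
features = data.drop(columns=["direction", "price"])

print(list(features.columns))  # ['volume', 'interest_rate']
```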
ABOUT PYTHON
PART III
A. INTRODUCTION TO PYTHON
(1/2)

• Python is one of the programming languages most in demand among companies. Its ease of learning does not diminish its power as a tool.

• A programming language is a means of communicating with computers. It involves writing a set of instructions, which programmers use to develop software.

• Once these instructions are organized, they form an algorithm. This algorithm
represents the “behind-the-scenes” process of the software we use.

• Other programming languages include Java, JavaScript, C++, Ruby, and more.
A. INTRODUCTION TO PYTHON
(2/2)

• So far, we have used software like SPSS to create models.

• This year, however, we will be building our models from scratch, allowing you to
learn the underlying algorithms.

• Don’t worry, the course is not intended to teach coding. Instead, it focuses on
understanding the logic behind the algorithms and the instructions they require.

• To support this, the code will be provided. Your task will be to adapt the code to
your project dataset by making the necessary adjustments and modifications.
B. PYTHON CODE EDITORS
(1/3)

• In general, languages can be either written or spoken. Programming languages, including Python, fall into the written category.

• Like any other written language, Python requires a medium for writing. While
human languages are written using text editors (e.g., MS Word) or physical
media (e.g., paper), programming languages are written using “code editors”.

• Therefore, you will need a code editor to write, edit, and run Python code.
B. PYTHON CODE EDITORS
(2/3)

• To write Python code, you can choose between an “Integrated Development Environment (IDE)” and a simple code editor.

• On the one hand, an IDE (like PyCharm, Jupyter, Spyder, etc.) is a “program
dedicated to software development”*. It includes a wide range of tools and
features that facilitate coding, but it may require more memory and time to
download and install.
B. PYTHON CODE EDITORS
(3/3)

• On the other hand, a code editor is a text “editor designed to handle codes
(with, for example, syntax highlighting and auto-completion).”*

• For the IEDA course, we will use Google Colab, an online code editor that
only requires a Google account, with no need for download or installation.
C. PYTHON LIBRARIES
(1/3)

• The first step in conducting research is usually a literature review. This process saves time by ensuring we do not duplicate existing knowledge (e.g., researching whether smoking causes cardiovascular disease). It also provides valuable information for our study without starting from scratch.

• Coding is similar to research in this regard. Developers write lines of code, which are then stored in “libraries” and shared with the community for reuse. A Python library can therefore be defined as a “reusable chunk of code”*.
C. PYTHON LIBRARIES
(2/3)

• Using libraries in programming avoids the need to write code from scratch.
Instead, you can search for the appropriate library, import it, and use it.

• Just as in research where you visit a library to find the most relevant book for
your topic, in programming you use libraries to find the tools and functions
needed for your specific task.
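
To illustrate the "search, import, use" pattern, the short sketch below computes the same average twice: once written from scratch and once with a single call to NumPy. The list of values is invented for illustration.

```python
import numpy as np

values = [2.0, 4.0, 6.0, 8.0]

# Written from scratch: add up the values and divide by their count.
total = 0.0
for value in values:
    total += value
manual_mean = total / len(values)

# Reusing a library: one call to NumPy's mean function.
library_mean = np.mean(values)

print(manual_mean, library_mean)  # 5.0 5.0
```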
C. PYTHON LIBRARIES
(3/3)

Among the thousands of libraries available, these are the ones we will primarily use:

• pandas (https://pandas.pydata.org/): Data cleaning & analysis: missing values, outliers…
• NumPy (https://numpy.org/): Scientific computing: mathematical operations
• scikit-learn (https://scikit-learn.org/stable/): Machine Learning: PCA, HCA, K-NN, DT, RF…
• matplotlib (https://matplotlib.org): Visualizations: basic graphs
• seaborn (https://seaborn.pydata.org/): Superset of the matplotlib library: applies themes and decorates matplotlib graphs
• imbalanced-learn (https://imbalanced-learn.org/): Imbalanced datasets: Under-sampling, over-sampling…
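
As an example of the kind of data cleaning these libraries support, here is a small sketch with pandas and NumPy. The column name `return` and its values are invented; filling missing values with the median and flagging outliers with the interquartile-range (IQR) rule are common conventions, not necessarily the methods used in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one abnormal value.
df = pd.DataFrame({"return": [0.01, -0.02, np.nan, 0.03, 5.00, 0.00]})

# Count the missing values, then fill them with the column median.
n_missing = int(df["return"].isna().sum())
df["return"] = df["return"].fillna(df["return"].median())

# Flag outliers with the IQR rule: values beyond 1.5 * IQR from the quartiles.
q1, q3 = df["return"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["return"] < q1 - 1.5 * iqr) | (df["return"] > q3 + 1.5 * iqr)

print("Missing values found:", n_missing)
print("Values flagged as outliers:", df.loc[is_outlier, "return"].tolist())
```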
