7-U2C3-AI Project Cycle - Data Exploration
Teacher: ____________________________________________________________________________________
Learning Support Assistant: ____________________________________________________________
Estimated time: 120 min.
Learning Objectives
By the end of this session, students will be able to:
1. Distinguish between structured and unstructured data.
2. Describe the impact of missing values in data exploration.
3. Define the term “feature engineering”.
4. Understand the use of data exploration to clean the data for AI training.
5. Apply different ways to analyse the data.
Entry behaviour
Students have a good understanding of AI fundamentals. They know the problem scoping phase and the
data acquisition phase of the AI project cycle.
Session Conduction
Pedagogical Phase | Teacher Activities | Students Activities | Resources and Method
Pedagogical phase: Engage
Students' activities: Listen, follow, understand and interact.
Resources and method: Interactive conversation.
Teacher activities:
Recall:
Quickly recap the concept of problem scoping and the 4W framework.
Discuss various ways of data acquisition.
Inform:
The main agenda of data exploration is to ensure that the data is suitable for training an AI model.
Data exploration is the process of looking into the acquired data, removing unnecessary details
from it and finding missing values, so that the resulting data sets are suitable for AI learning.
Pedagogical phase: Concept introduction
Students' activities: Listen, follow and interact.
Resources and method: Interactive conversation.
Teacher activities:
Inform:
Exploration of data usually involves visualising the data in various ways to understand it better,
extracting useful information from it, discarding unwanted details and finally structuring the data
so that it can be used for AI training.
Say:
Basically, exploration involves analysing the data acquired from various sources.
Pedagogical phase: Concept explanation
Students' activities: Listen, follow and interact. Do activities.
Resources and method: Interactive conversation.
Teacher activities:
Elaborate on the factor of missing values. In real life, the required data is not always 100%
available. There might be blanks in the data. For example, a student who has not appeared in the
exam will have null values for marks in the database, and an employee who has not been allocated
a project after joining may have no project details.
Explain how to deal with missing values in the various ways listed in the book.
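Teachers who wish to demonstrate missing values on screen could use a short Python sketch such as the one below. The student records, the field names and the two treatments shown (leaving out incomplete records, or filling blanks with the mean of the available marks) are assumptions made for illustration only; the methods listed in the book take precedence.

```python
from statistics import mean

# Hypothetical exam records; None marks a student who did not appear in the exam.
records = [
    {"name": "Asha", "marks": 80},
    {"name": "Ben", "marks": None},
    {"name": "Chen", "marks": 70},
    {"name": "Dev", "marks": 90},
]

# Step 1: find the records with missing marks.
missing = [r["name"] for r in records if r["marks"] is None]

# Option A (assumed treatment): leave out the incomplete records.
complete = [r for r in records if r["marks"] is not None]

# Option B (assumed treatment): fill each blank with the mean of the available marks.
average = mean(r["marks"] for r in complete)
filled = [
    {**r, "marks": average if r["marks"] is None else r["marks"]}
    for r in records
]

print("Missing marks for:", missing)
print("Records kept after dropping:", len(complete))
print("Mean used to fill blanks:", average)
```

A useful discussion point is that the two options change the data in different ways: Option A shrinks the dataset, while Option B keeps every student but introduces a value that was never actually scored.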
Then, explain the concept of feature engineering. It is a process of extracting hidden, useful
information from the data that was not visible at first glance. This needs a lot of analytical
skill and experience. For example, a detailed study of data about customers visiting a shopping
mall revealed that the maximum footfall was mostly on dates that were Saturdays or holidays.
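The shopping mall example can be illustrated with a minimal Python sketch. The dates and visitor counts below are invented for the demonstration, and the variable names are hypothetical. The idea is that the weekday is hidden inside each date, and turning it into a new feature makes the pattern visible.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical daily visitor counts for a shopping mall.
footfall = [
    (date(2024, 1, 5), 1200),   # Friday
    (date(2024, 1, 6), 3100),   # Saturday
    (date(2024, 1, 8), 1100),   # Monday
    (date(2024, 1, 13), 2900),  # Saturday
    (date(2024, 1, 15), 1000),  # Monday
]

# New feature: the weekday name, derived from each date.
visitors_by_weekday = defaultdict(list)
for day, visitors in footfall:
    weekday = DAY_NAMES[day.weekday()]  # weekday() returns 0 for Monday
    visitors_by_weekday[weekday].append(visitors)

# Average footfall per weekday shows which day is busiest.
average_by_weekday = {
    weekday: sum(values) / len(values)
    for weekday, values in visitors_by_weekday.items()
}
busiest = max(average_by_weekday, key=average_by_weekday.get)

print(average_by_weekday)
print("Busiest weekday:", busiest)
```

The raw data contained only dates and counts; the weekday column did not exist until it was derived, which is the essence of feature engineering.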
Elaborate on the well-formedness, validity and
correctness of the data.
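If a demonstration is helpful here, the following Python sketch shows one possible way to check records. It rests on assumptions that should be checked against the book's definitions: "well-formed" is taken to mean that a record has the expected fields, and "valid" is taken to mean that the marks lie between 0 and 100. The records and function names are hypothetical.

```python
# Assumed structure of a record and assumed allowed range of marks (0 to 100).
EXPECTED_FIELDS = {"name", "marks"}

# Hypothetical records for the demonstration.
records = [
    {"name": "Asha", "marks": 80},   # passes both checks
    {"name": "Ben"},                 # not well-formed: the "marks" field is absent
    {"name": "Chen", "marks": 140},  # well-formed but not valid: out of range
]

def is_well_formed(record):
    """Return True if the record has exactly the expected fields."""
    return set(record) == EXPECTED_FIELDS

def is_valid(record):
    """Return True if the marks value is a number between 0 and 100."""
    marks = record.get("marks")
    return isinstance(marks, (int, float)) and 0 <= marks <= 100

report = [(r["name"], is_well_formed(r), is_valid(r)) for r in records]
for name, well_formed, valid in report:
    print(name, "well-formed:", well_formed, "valid:", valid)
```

Correctness is deliberately not checked in the sketch: a value such as 80 can be well-formed and valid and still be wrong, so confirming it requires comparison with a trusted source such as the original answer sheets.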
Prepare some charts of different types on a dataset and use them to explain how different
conclusions are drawn from them, and how the relationships between data values are established,
compared and analysed.
Students must understand the technique of analysing a chart and drawing conclusions from it.
Activity – Data Exploration
Refer to the dataset given in the activity and
complete it according to the instructions given.
Debrief on the solution after the activity.
Activity – Explore More
Explain that charts are created for the following main purposes:
1. Comparisons
2. Establishing relationships
3. Distributions and compositions
Show the various charts and examples given in the book to explain how different types of charts
are used for the three purposes above.
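The purposes listed above can also be shown with numbers before any chart is drawn. The following Python sketch is a hedged illustration: the class data is invented, the variable names are hypothetical, and the chart types named in the comments are common choices rather than the book's prescribed ones. Composition is not covered by the sketch.

```python
from collections import Counter
from statistics import mean

# Hypothetical class data: hours studied, marks scored and section of six students.
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 45, 50, 65, 70, 85]
sections = ["A", "B", "A", "B", "A", "B"]

# Comparison: average marks per section (what a bar chart would compare).
by_section = {
    s: mean(m for m, sec in zip(marks, sections) if sec == s)
    for s in sorted(set(sections))
}

# Relationship: Pearson correlation between hours and marks (what a scatter
# plot would reveal); a value near 1 means the two rise together.
mean_hours, mean_marks = mean(hours), mean(marks)
covariance = sum((h - mean_hours) * (m - mean_marks) for h, m in zip(hours, marks))
spread = (
    sum((h - mean_hours) ** 2 for h in hours)
    * sum((m - mean_marks) ** 2 for m in marks)
) ** 0.5
correlation = covariance / spread

# Distribution: number of students in each band of 20 marks
# (what a histogram would show).
bands = Counter((m // 20) * 20 for m in marks)

print("Average marks per section:", by_section)
print("Correlation between hours and marks:", round(correlation, 2))
for start in sorted(bands):
    print(f"{start}-{start + 19}: " + "#" * bands[start])
```

Students can be asked which of the three printed results answers "which section did better?", "do marks rise with study hours?" and "how are the marks spread out?".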
Pedagogical phase: Closing
Teacher activities:
Explain that AI models must be trained without flaws or biases. To prevent biases, missing values
are treated and the data is given a structure and shape suitable for training the AI model. This is
done by experienced data scientists and data managers. Visualising and analysing the data to
understand it is the best way to explore data.
Reinforcement activities
Students will read the given case study and answer the questions based on it.
Students will visit the given URL and prepare a presentation to showcase features of various data
visualisation tools online.