Class VIII DS Student Handbook
GRADE VIII
Version 1.0
DATA SCIENCE
Student Handbook
ACKNOWLEDGMENT
Patrons
• Sh. Ramesh Pokhriyal 'Nishank', Minister of Human Resource Development,
Government of India
• Sh. Dhotre Sanjay Shamrao, Minister of State for Human Resource
Development, Government of India
• Ms. Anita Karwal, IAS, Secretary, Department of School Education and Literacy,
Ministry of Human Resource Development, Government of India
Advisory
The objective of this curriculum is to lay the foundation for data science: understanding how data is collected and analyzed, and how it can be used to solve problems and make decisions. It will also cover ethical issues with data, including data governance, and build a foundation for AI-based applications of data science.
CBSE is therefore introducing ‘Data Science’ as a skill module of 12 hours' duration in class VIII and as a skill subject in classes IX-XII.
CBSE acknowledges the initiative by Microsoft India in developing this data science
handbook for class VIII students. This handbook introduces the concepts of data
science, data visualizations and applications of data science in AI. The course covers
the theoretical concepts of data science followed by practical examples to develop
critical thinking capabilities among students.
The purpose of the book is to enable the future workforce to acquire data science skills
early in their educational phase and build a solid foundation to be industry ready.
CHAPTER
Introduction to Data
2. Real-world examples of data
Now that we have understood what data is and how it is categorized, an obvious question comes to mind: how is this data applied in the real world?
content. Combined with that, it analyzes the videos that people usually play after watching a video.
These preferences are stored and studied. An algorithm in the background then finds patterns in people's preferences and shows you, in the suggested videos, the content that the majority of people watched after the current clip.
This is how data analysis is applied in the entertainment industry in real life.
Some of the benefits of data in the entertainment industry are:
• Predicting interests of the audience
• Optimized or on-demand scheduling of media streams in digital media distribution platforms
• Getting insights from customer reviews
• Effective targeting of advertisements
Recap
• We are surrounded by data. Every computer and every mobile device generates an immense amount of data.
• Data comes in different types, such as audio, video, and text.
• Data can be qualitative or quantitative, continuous or discrete.
• Discrete data can take only specific, separate values, such as counts.
• Continuous data can take any value within a specific range.
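The "suggested videos" idea described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real platform works: the viewing sessions and video names are hypothetical, and the sketch simply counts which video people played right after each video and suggests the most common ones.

```python
from collections import Counter, defaultdict

# Hypothetical viewing sessions: each list is the order in which one
# person played videos. Real platforms use far richer data.
sessions = [
    ["cricket_highlights", "cooking_pasta", "football_goals"],
    ["cricket_highlights", "football_goals"],
    ["cooking_pasta", "cooking_dosa"],
    ["cricket_highlights", "football_goals", "cooking_dosa"],
]

def build_next_counts(sessions):
    """Count, for every video, which videos were played right after it."""
    next_counts = defaultdict(Counter)
    for session in sessions:
        # Pair each video with the one watched immediately after it.
        for current, following in zip(session, session[1:]):
            next_counts[current][following] += 1
    return next_counts

def suggest(video, next_counts, k=2):
    """Return up to k videos most often watched after `video`."""
    return [name for name, _ in next_counts.get(video, Counter()).most_common(k)]

counts = build_next_counts(sessions)
print(suggest("cricket_highlights", counts))
```

Here "football_goals" is suggested first because two of the three sessions that played "cricket_highlights" continued with it.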
Exercises
Objective Type Questions
Please choose the correct option in the questions below.
1. Discrete data can take any value in a range.
a. True
b. False
2. Continuous data cannot take decimal values.
a. True
b. False
3. Information stored in a PDF is not considered data.
a. True
b. False
4. Quantitative data cannot take numerical values.
a. True
b. False
5. Qualitative data is descriptive in nature.
a. True
b. False
6. A description answering “What is the weather like?” is what kind of data?
a. Quantitative
b. Qualitative
7. Which of the following is considered data?
a. Speech
b. Video
c. Messages
d. All of the above
8. How is data used in the entertainment industry?
a. Predicting interests
b. Targeting ads
c. Both of the above
9. The number of days in a week is an example of:
a. Discrete Data
b. Continuous Data
10. What are the types of quantitative data?
a. Discrete
b. Continuous
c. Both a and b
Standard Questions
Please answer the questions below in no less than 100 words.
1. Explain what data is, with the help of two real-life examples.
2. How is data categorized?
3. What is Discrete Data?
4. What is Continuous Data?
5. Give two examples of real-life applications of data.
Applied Project
Data analytics has many applications in our lives. Discuss how data analytics is applied in the airline industry to predict flight delays. A few factors that influence flight delays are:
CHAPTER
Activity 2.1
Try to find applications you use every day that depend on data science.
2. Careers in data science
As we learn about data, data analysis, and data science, an important question that comes up is: what career options can we take up in data science?
We have learned about the real-life applications of data and data science. Many of us may have found them interesting and may want to pursue a career in this field to explore it further.
To help you make the right choice, let us understand the different careers we can take up in data science. Some common job titles for data scientists include:
1. Data Scientist
2. Business Intelligence Analyst
3. Data Mining Engineer
4. Data Architect
5. Senior Data Scientist
Let us now briefly go through these job titles to get a better understanding:
1. Data Scientist - Data Scientists are data enthusiasts who gather and analyze large sets of structured and unstructured data. A data scientist's role combines computer science, statistics, and mathematics. They analyze, process, and model data and then interpret the results to create actionable plans for companies and organizations. Data Scientists are analytical experts who use their skills in both technology and social science to find trends and manage data. They use their industry knowledge and context-specific understanding to find solutions to business challenges.
2. Business Intelligence Analyst - Business Intelligence Analysts use data to assess the market and find the latest business trends in the industry. This helps to develop a clearer picture of how a company should shape its strategy.
3. Data Mining Engineer - A Data Mining Engineer examines not only the data of their own business but also that of third parties. In addition to mining data, a data mining engineer creates robust algorithms to help analyze the data further.
4. Data Architect - Data Architects work closely with users, system designers, and developers to create a blueprint that data management systems use to centralize, integrate, and maintain the data sources.
5. Senior Data Scientist - Senior Data Scientists anticipate the future needs of the business. Although they might not be involved in gathering data, they play a high-level role in analyzing it. Using their vast experience, they can design and create new standards for analyzing data. They can also create ways to use statistical data and develop tools to further analyze the data.
Is this an outlier?
In some cases, the objective is to find
The algorithms that are used for these types of questions are called anomaly detection algorithms.
What will probably be the value of this variable?
Machine learning can also help us predict numerical values of continuous variables. There are scenarios in which we must predict the numerical value of a variable based on historical data.
Some examples are:
Q: How much rainfall will we receive this year?
A: 100 mm
A: 320
Algorithms that can predict such values are called regression algorithms.
What should be done now?
Questions of this kind usually arise in problems such as those of autonomous robots or self-driving cars, which need to make decisions based on changes in external factors. Machine learning helps to solve such problems with the help of reinforcement learning.
These models are trained through a process of reward every time a correct action is taken and punishment every time a wrong action is taken.
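The rainfall question in the regression discussion above can be sketched in Python. This example assumes made-up yearly rainfall figures (they are not real data) and uses ordinary least squares, one simple regression technique, to fit a straight line y = a + b·x and then predict the next year.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, when x grows by 1.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: makes the line pass through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical historic data: year number (1 = first year recorded)
# and rainfall in mm. Invented for illustration only.
years = [1, 2, 3, 4, 5]
rainfall_mm = [82, 88, 91, 97, 102]

a, b = fit_line(years, rainfall_mm)
prediction = a + b * 6  # predict year 6
print(f"Predicted rainfall for year 6: {prediction:.1f} mm")
```

With these numbers the line rises by about 4.9 mm per year, so the prediction for year 6 is about 106.7 mm. Real rainfall depends on many more factors than the year alone.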
Recap
• Data science is about extracting meaningful interpretations from data.
• There are many careers in data science, such as Data Scientist, Data Engineer, and Data Analyst.
• Data Architect and Senior Data Scientist are two roles for experienced professionals.
• Classification helps us to predict whether a new item belongs to class A or class B.
• Regression helps us to predict the value of a continuous variable.
• Clustering helps us to find patterns in the data.
• Reinforcement learning helps models make decisions based on external factors.
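The reward-and-punishment training described for reinforcement learning earlier in this chapter can be illustrated with a tiny sketch. This is a simplified, hypothetical example and not a full reinforcement learning algorithm such as Q-learning: a robot at a junction can turn "left" or "right", a made-up environment gives +1 for the correct turn and -1 for the wrong one, and the robot nudges a score for each action toward the reward it receives.

```python
# Hypothetical environment: turning right is the correct action.
def reward_for(action):
    """Return +1 (reward) for the correct action, -1 (punishment) otherwise."""
    return 1 if action == "right" else -1

def train(episodes=10, learning_rate=0.5):
    # The robot starts with no preference between the two actions.
    scores = {"left": 0.0, "right": 0.0}
    actions = list(scores)
    for episode in range(episodes):
        # Simplification: try the actions in turn so that both get explored.
        action = actions[episode % len(actions)]
        reward = reward_for(action)
        # Move the action's score a step toward the reward it received.
        scores[action] += learning_rate * (reward - scores[action])
    return scores

scores = train()
best_action = max(scores, key=scores.get)
print(scores, "->", best_action)
```

After training, "right" has a score close to +1 and "left" close to -1, so the robot chooses to turn right. Real systems balance trying new actions against repeating the best-known one, instead of simply alternating.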
Exercises
Objective Type Questions
Please choose the correct option in the questions below.
1. A school named ABC has recorded the total marks of every student in the class.
This is an example of:
a. Qualitative data
b. Quantitative data
c. Both qualitative and quantitative data
d. None of the above
2. A food delivery app has asked for your feedback on the quality of the food. You
have written two paragraphs to describe the food. This is an example of:
a. Qualitative Data
b. Quantitative Data
c. Both qualitative and quantitative data
d. None of the above
3. You need to predict what the temperature will be next Friday.
Which algorithm will you use?
a) Clustering
b) Regression
c) Anomaly detection
d) Binary classification
4. You need to predict if your car tire will last for the next 1000 km. Which algorithm
will you use?
a) Clustering
b) Regression
c) Anomaly detection
d) Binary classification
5. You want to build a way to segregate spam emails from good emails. Which
algorithm will you use?
a) Clustering
b) Regression
c) Anomaly detection
d) Binary classification
Standard Questions
Please answer the questions below in no less than 100 words.
1. What are the common career paths in data science?
2. What does a Data Architect do?
3. What are the differences between classification and regression?
Applied Project
Emails are a part of daily communication. Sometimes we receive unwanted emails, called spam. There are a few techniques that email providers use to identify spam emails:
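One simple idea for this project is a rule-based binary classifier: count how many "spammy" words an email contains and label it spam when the count reaches a threshold. The keyword list and threshold below are hypothetical and chosen only for illustration; real email providers use trained machine learning models and many more signals than this.

```python
import re

# Hypothetical list of words that often appear in spam; chosen for illustration.
SPAM_WORDS = {"winner", "free", "prize", "lottery", "urgent", "click"}

def spam_score(email_text):
    """Count how many words in the email are on the spam word list."""
    words = re.findall(r"[a-z]+", email_text.lower())
    return sum(1 for word in words if word in SPAM_WORDS)

def classify(email_text, threshold=2):
    """Binary classification: 'spam' if the score reaches the threshold."""
    return "spam" if spam_score(email_text) >= threshold else "not spam"

emails = [
    "Congratulations WINNER! Click here to claim your free prize.",
    "Reminder: the science project is due on Friday.",
]
for email in emails:
    print(classify(email), "|", email)
```

The first email contains four listed words and is labeled spam; the second contains none and is labeled not spam.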
CHAPTER
Data Visualization
• Charts
• Graphs
• Tables
• Maps
• Histograms
3. Examples of data visualization
Example 1: Using a pie chart to display the food preferred by the students.
We have the food item preferences of 50 students. Let us visualize the data using a pie chart and find the most preferred and the least preferred food item.
[Pie chart: Food Preference. Pizza 50%, Dosa 30%, Pasta 20%]
The most preferred food item is pizza and the least preferred food item is pasta.
Example 2: Using a line chart to display the number of students present in the class over one week.
Here is the data:
Date      Number of students present
06-Apr    49
07-Apr    42
08-Apr    37
09-Apr    48
10-Apr    43
11-Apr    36
12-Apr    50
[Line chart: Number of Students Present, plotted for each date on a 0 to 60 scale]
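The numbers behind both charts in these examples can be checked with a short Python sketch. It converts the pie chart percentages into numbers of students out of 50 and into slice angles (each percent is 3.6 degrees of the 360-degree circle), and summarizes the attendance data used for the line chart. It prints values instead of drawing, so it needs no plotting library.

```python
# Food preferences of 50 students, as shown in the pie chart.
total_students = 50
food_percent = {"Pizza": 50, "Dosa": 30, "Pasta": 20}

for food, percent in food_percent.items():
    students = total_students * percent // 100  # number of students
    angle = percent * 360 / 100                 # size of the pie slice in degrees
    print(f"{food}: {students} students, slice of {angle:.0f} degrees")

most_preferred = max(food_percent, key=food_percent.get)
least_preferred = min(food_percent, key=food_percent.get)
print("Most preferred:", most_preferred, "| Least preferred:", least_preferred)

# Attendance for one week, as used in the line chart.
attendance = {
    "06-Apr": 49, "07-Apr": 42, "08-Apr": 37, "09-Apr": 48,
    "10-Apr": 43, "11-Apr": 36, "12-Apr": 50,
}
average = sum(attendance.values()) / len(attendance)
lowest_day = min(attendance, key=attendance.get)
highest_day = max(attendance, key=attendance.get)
print(f"Average attendance: {average:.1f}")
print("Lowest:", lowest_day, "| Highest:", highest_day)
```

Pizza works out to 25 students (a 180-degree slice), Dosa to 15 and Pasta to 10. Average attendance is 305 / 7, about 43.6 students, with the lowest attendance on 11-Apr and the highest on 12-Apr.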
We can also visualize the same data using a bar graph:
To make sure that we get the required outcome from the data, we must collect the right and relevant data.
It is essential to have correct, good-quality data to make an analysis or to construct algorithms that can have an impact. Without relevant data, your analyses will not only be irrelevant, but they can also be misleading.
You cannot expect to find perfectly preprocessed raw data that can be used directly for your needs. Hence, you need to understand how the data was gathered and what sources it was collected from.
Therefore, it is essential to understand how to collect relevant data for analysis.
Let us understand what steps we need to take to make sure that we collect the right set of data for analysis.
• Format of data - The data collected for analysis should be in the right format. It should be accessible and readable for analysis. If the collected data is not in the right format, we should convert it to the format required for analysis.
5. Asking the right question
Once we have the required data ready, the next step is to ask the right questions of the data. It is important to understand that if we don't ask the right questions, we will never get the right answers. To make sure we perform the
Below are specific questions that you need to ask of your data set to get the right answer:
• What do you wish to find?
It is essential to consider what your goal is and what decision-making the analysis will facilitate. What outcome from the analysis would you consider a success?
These initial analysis questions are important to guide you through the process and help you focus on valuable insights. You can start by brainstorming and preparing a draft guideline of the specific questions you want the data to answer. This will help you dive deeper into the more specific insights you want to achieve.
• Which statistical techniques are applicable?
There are several statistical analysis techniques that you can use for analyzing data. However, in real-life scenarios, three statistical techniques are most commonly used for analysis:
a. Regression Analysis - a process for finding the relationships and correlations among the different variables in the data.
b. Cohort Analysis - enables you to easily compare how different groups, or cohorts, of customers behave over time.
For example, you can create a cohort of customers based on the date when they made their first purchase. You can then study the spending trends of cohorts from different periods to determine whether the quality of the average acquired customer is increasing or decreasing over time.
c. Predictive Analysis - Predictive analytics involves analyzing historical datasets to predict future possibilities. It can also be used for generating alternative scenarios and risk assessments.
• Who will be using the final results?
An important aspect of your data analytics is the end-users of your analysis. Who are they, and how will they use the reports you create? You must get to know your final users, including:
a. What do they expect to learn from the data?
b. What do they need?
c. How advanced are their technical skills?
d. How much time do they have?
If you know these answers, you can decide how detailed your data visualizations should be and which areas of the data your report should focus on.
be able to understand the insights from them.
It is essential to convince executives and decision-makers that the data you have gathered and analyzed is:
a. Correct
b. Important
c. Urgent to act upon
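The cohort analysis example in this section (grouping customers by the month of their first purchase and comparing their spending) can be sketched in plain Python. The purchase records, customer names, and amounts below are hypothetical, and "average spend per customer" is just one possible measure of customer quality.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, month of purchase, amount spent).
purchases = [
    ("Asha", "2021-01", 500), ("Asha", "2021-02", 300),
    ("Ravi", "2021-01", 200),
    ("Meena", "2021-02", 800), ("Meena", "2021-03", 400),
    ("John", "2021-02", 600),
]

# Step 1: find each customer's first purchase month; this is their cohort.
# "YYYY-MM" strings sort in date order, so < compares months correctly.
first_month = {}
for customer, month, _ in purchases:
    if customer not in first_month or month < first_month[customer]:
        first_month[customer] = month

# Step 2: add up spending and collect customers for each cohort.
cohort_spend = defaultdict(int)
cohort_customers = defaultdict(set)
for customer, _, amount in purchases:
    cohort = first_month[customer]
    cohort_spend[cohort] += amount
    cohort_customers[cohort].add(customer)

# Step 3: compare average spend per customer across cohorts.
for cohort in sorted(cohort_spend):
    average = cohort_spend[cohort] / len(cohort_customers[cohort])
    print(f"Cohort {cohort}: {len(cohort_customers[cohort])} customers, "
          f"average spend {average:.0f}")
```

In this made-up data the February 2021 cohort spends 900 per customer on average against 500 for January, which would suggest newly acquired customers are becoming more valuable.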
Recap
Exercises
Objective Type Questions
Please choose the correct option in the questions below.
4. Which format of data is easiest for analysis?
a. Tabular data
b. Text data in a PDF
c. Data in an image
d. Speech data
5. Which visualization is best for representing a relationship between two variables?
a. Scatter plot
b. Histogram
c. Pie chart
d. Gantt chart
Standard Questions
Please answer the questions below in no less than 100 words.
1. What are the steps to make sure that the correct data is collected for analysis?
2. Write a short note on the statistical techniques which can be used for data
analysis.
3. Is it important to assess the end-users for a visualization? Explain in your own
words.
4. If you find that the data collected has outliers, what steps can you take to ensure
that your analysis is still accurate?
Applied Project
Each student should write down the marks he/she received in the examination for each subject studied in the previous grade. Use these marks to draw on paper:
a. a bar graph displaying the marks of each subject.
b. a line graph displaying the marks of each subject.
c. a pie chart showing the percentage contribution of each subject's marks to the total marks obtained.
CHAPTER
various brands in your window. Have you ever wondered how this new application or website knows that you are looking to buy a handbag? The answer is data science. Data science algorithms help track your searches and learn your preferences from them.
Speech Recognition - Speech recognition is now part of our everyday lives. It has become a part of phones, game consoles, and even smartwatches. Have you heard of Microsoft's Cortana? It uses speech recognition behind the scenes to take input from the user.
Speech recognition can also be found on many devices used to automate our homes.
Speech recognition has been around for more than a decade. However, it is gaining popularity now as machine learning is helping organizations make speech recognition much more accurate.
3. Analytics on text data
Text analytics can be defined as the process of collecting unstructured text from various sources, and analyzing and extracting relevant information from it. It can also be used to transform that text into structured information, which can then be used in various other ways.
There are several ways to analyze unstructured text. Most of these techniques fall under three technical areas: Natural Language Processing (NLP), data mining, and information retrieval.
Typically, we use text analytics technologies for four basic tasks: querying data, mining data, searching data, and analyzing data to get insights.
For example, if we have a database with customer data, an end-user could query the database to find out how many customers started using the company's services in the last quarter and how many stopped using the service. They can do so by just entering a query in plain English instead of a query language like SQL.
Chatbots are also an important area that uses text analytics for both querying and searching data. Chatbots can use it to query a database and give a reply based on the question. They can also use text-analytics-based search to help retrieve a document based on what end users are looking for.
4. Analytics on image data
Image recognition can be described as a process by which we can process images to identify people, patterns, logos, objects, or places.
Many machine learning tools can help users recognize faces and objects in a picture. These tools can also scan the objects in the picture and attempt to identify and name them based on a large database of images.
Mobile phones, for example, use computer vision technologies in combination with a camera to achieve image recognition. This advanced technology has a variety of applications, like accessibility for the visually impaired and interactive advertising.
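The customer-database example in the text analytics section (asking in plain English how many customers started or stopped using the service last quarter, instead of writing SQL) can be sketched with Python's built-in sqlite3 module. This is a deliberately simple, hypothetical keyword-matching approach and not how real natural language interfaces work: the table layout, the customer records, the quarter dates, and the two recognized keywords are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical customer table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Asha", "2021-01-15", None),
        ("Ravi", "2021-02-03", "2021-03-20"),
        ("Meena", "2020-11-10", "2021-04-05"),
        ("John", "2020-06-01", None),
    ],
)

# Assumed "last quarter": January to March 2021.
QUARTER = ("2021-01-01", "2021-03-31")

def answer(question):
    """Map a plain-English question to a SQL query using simple keywords."""
    text = question.lower()
    if "started" in text:
        column = "start_date"
    elif "stopped" in text:
        column = "end_date"
    else:
        return "Sorry, I did not understand the question."
    # The column name comes only from the two fixed choices above;
    # the dates are passed safely as query parameters.
    sql = f"SELECT COUNT(*) FROM customers WHERE {column} BETWEEN ? AND ?"
    (count,) = conn.execute(sql, QUARTER).fetchone()
    return f"Number of customers: {count}"

print(answer("How many customers started using our service last quarter?"))
print(answer("How many customers stopped using the service last quarter?"))
```

The user types an ordinary question, and the program, not the user, builds the SQL query. Customers with no end date are not counted as having stopped.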
c. Planning and Navigation: Making computers capable of traveling from Point X to Point Y. For example, a self-driving robot.
d. Natural Language Processing: Making computers capable of understanding and processing a language. For example, a web translator that translates one language to another.
e. Perception: Making computers capable of interacting with real-world objects through the senses of touch, sound, smell, and sight.
f. Emergent Intelligence: Making computers capable of intelligence that is not explicitly programmed but is derived from AI capabilities. The basic vision for this goal is to enable machines to exhibit emotional intelligence, moral reasoning, and more.
Recap
• Two important applications of data science are digital ads and speech recognition.
• Text analytics can be defined as the process of collecting unstructured text from various sources, and analyzing and extracting relevant information from it.
• Chatbots are also an important area that uses text analytics for both querying and searching data.
• Image recognition can be described as a process by which we can process images to identify people, patterns, logos, objects, or places.
• Artificial Intelligence is defined as the science and engineering of making intelligent machines.
• AI has many sub-goals, such as natural language processing and perception.
Exercises
Objective Type Questions
Please choose the correct option in the questions below.
3. Which of the following is a use case of data science?
a. Facial recognition
b. Text analytics
c. Sentiment analysis
d. All of the above
4. What does natural language processing help us with?
a. Text analytics
b. Video analytics
c. Image analytics
5. What technologies are used by chatbots?
a. Text analytics
b. Speech recognition
c. Both of the above
Standard Questions
Please answer the questions below in no less than 100 words.
Applied Project
Understanding the mood of the speaker can be very useful. Certain keywords can be
associated with different sentiments.
Example 1: “The news continues to be gloomy.” If you read this sentence you will
understand that the sentiment of the speaker is sad.
Example 2: “I was infuriated by his arrogance.” This sentence tells you that the
sentiment of the speaker is angry.
Discuss with your classmates how text analytics can help us identify the sentiment of the speaker, i.e. whether the speaker is happy, angry, or sad. A sentence may have more than one keyword that highlights the sentiment of the speaker. Provide two examples of such scenarios for each of the sentiments discussed above.
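One way to start this discussion is a keyword-based (lexicon) approach: keep a list of words linked to each sentiment and count which sentiment's words appear most often in a sentence. The word lists below are hypothetical and very small; practical sentiment analysis uses much larger lexicons or trained models and has to handle things like negation ("not happy").

```python
import re
from collections import Counter

# Hypothetical keyword lists for three sentiments; far from complete.
SENTIMENT_WORDS = {
    "happy": {"joyful", "delighted", "glad", "wonderful", "celebrate"},
    "angry": {"infuriated", "furious", "annoyed", "arrogance", "outraged"},
    "sad": {"gloomy", "miserable", "upset", "tears", "lonely"},
}

def detect_sentiment(sentence):
    """Return the sentiment whose keywords appear most often, or 'unknown'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    counts = Counter()
    for sentiment, keywords in SENTIMENT_WORDS.items():
        counts[sentiment] = sum(1 for word in words if word in keywords)
    sentiment, hits = counts.most_common(1)[0]
    return sentiment if hits > 0 else "unknown"

print(detect_sentiment("The news continues to be gloomy."))
print(detect_sentiment("I was infuriated by his arrogance."))
```

The second example sentence contains two "angry" keywords, which shows how several keywords in one sentence can point to the same sentiment.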