Class X AI Unit 4: Data Science

The document provides an overview of Data Science, highlighting its integration of statistics, data analysis, and machine learning to analyze real-world phenomena. It discusses various applications of Data Science in fields such as finance, genetics, internet search, targeted advertising, and airline route planning. Additionally, it outlines the importance of data acquisition, collection methods, and tools like Python libraries (NumPy, Pandas, Matplotlib) for data manipulation and visualization.


Class X – Artificial Intelligence
Unit 4: Data Science
DATA SCIENCE?
Data Science is a concept that unifies
statistics, data analysis, machine
learning and their related methods in
order to understand and analyse actual
phenomena with data.

It employs techniques and theories drawn from many fields
within the context of Mathematics, Statistics, Computer
Science, and Information Science.
Applications of Data Science
Fraud and Risk Detection:
• The earliest applications of data science were in Finance.
• Companies were fed up with the bad debts and losses they incurred every year.
• However, they had a lot of data that used to get collected during
the initial paperwork while sanctioning loans.
• They decided to bring in data scientists in order to rescue them
from losses.
• Over the years, banking companies learned to divide and conquer
data via customer profiling, past expenditures, and other
essential variables to analyse the probabilities of risk and default.
• Moreover, it also helped them to push their banking products
based on customer’s purchasing power.
Genetics & Genomics:
• Data Science applications also enable an advanced level of treatment
personalization through research in genetics and genomics.
• The goal is to understand the impact of the DNA on our health and find
individual biological connections between genetics, diseases, and drug
response.
• Data science techniques allow integration of different kinds of data with
genomic data in disease research, which provides a deeper understanding of
genetic issues in reactions to particular drugs and diseases.
• As soon as we acquire reliable personal genome data, we will achieve a deeper
understanding of the human DNA.
• The advanced genetic risk prediction will be a major step towards more
individual care.
Internet Search: When we talk about search
engines, we think ‘Google’. Right? But there
are many other search engines like Yahoo,
Bing, Ask, AOL, and so on.
All these search engines (including Google)
make use of data science algorithms to deliver
the best results for our searched query in
a fraction of a second.
Considering the fact that Google processes
more than 20 petabytes of data every day, had
there been no data science, Google wouldn’t
have been the ‘Google’ we know today.
Targeted Advertising:
If you thought Search would have been the biggest of all
data science applications, here is a challenger – the
entire digital marketing spectrum.
Starting from the display banners on various websites to
the digital billboards at the airports – almost all of them
are decided by using data science algorithms.
This is the reason why digital ads have been able to get
a much higher CTR (Click-Through Rate) than traditional
advertisements: they can be targeted based on a user's
past behaviour.
Website Recommendations:
Aren’t we all used to the suggestions about similar
products on Amazon?
They not only help us find relevant products from
billions of products available with them but also add
a lot to the user experience.
A lot of companies have fervidly used this engine to
promote their products in accordance with the user’s
interest and relevance of information.
Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDB and many more use this
system to improve the user experience.
The recommendations are made based on previous
search results for a user.
Airline Route Planning:
The Airline Industry across the world is
known to bear heavy losses. Except for
a few airline service providers,
companies are struggling to maintain
their occupancy ratio and operating
profits. With the steep rise in air-fuel
prices and the need to offer heavy discounts
to customers, the situation has worsened.
It wasn't long before airline
companies started using Data Science
to identify the strategic areas of
improvements.
Now, while using Data Science, the
airline companies can:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to fly directly to the destination or take a
halt in between
(For example, A flight can have a direct route from New
Delhi to New York. Alternatively, it can also choose to halt
in any country.)
• Effectively drive customer loyalty programs
Getting Started

Data Science is a combination of Python and mathematical
concepts like Statistics, Data Analysis, Probability, etc. Concepts
of Data Science can be used in developing applications around AI,
as it gives a strong base for data analysis in Python.
Revisiting the AI Project Cycle
Humans are social animals. We tend to organise
and/or participate in various kinds of social
gatherings all the time. We love eating out with
friends and family, which is why we can find
restaurants almost everywhere; many of these
restaurants arrange buffets to offer a variety
of food items to their customers. Be it
small shops or big outlets, every restaurant
prepares food in bulk as they expect a good crowd
to come and enjoy their food. But in most cases,
after the day ends, a lot of food is left which
becomes unusable for the restaurant as they do not
wish to serve stale food to their customers the next
day.
So, every day, they prepare food in large
quantities keeping in mind the probable
number of customers walking into their
outlet.
But if the expectations are not met, a
good amount of food gets wasted which
eventually becomes a loss for the
restaurant as they either have to dump it
or give it to hungry people for free.
And if this daily loss is taken into account
for a year, it becomes quite a big amount.
Problem Scoping

Now that we have understood the scenario well, let us take a
deeper look into the problem to find out more about the various
factors around it.

Let us fill up the 4Ws problem canvas to find out.


Data Acquisition:

After finalising the goal of our project, let us now move
towards looking at the various data features which affect the
problem in some way or the other.

Since any AI-based project requires data for testing and
training, we need to understand what kind of data is to be
collected to work towards the goal.
In our scenario, the various factors that would affect the
quantity of food to be prepared for the next day's
consumption in buffets would be:
• Now let us understand how these factors are related to
our problem statement.

• For this, we can use the System Maps tool to figure out
the relationship of elements with the project’s goal.

• Here is the System map for our problem statement.


After looking at the factors affecting our problem statement, now it’s
time to take a look at the data which is to be acquired for the goal.

For this problem, a dataset covering all the elements mentioned
above is made for each dish prepared by the restaurant over a
period of 30 days.

This data is collected offline in the form of a regular survey since
this is a personalised dataset created just for one restaurant's
needs.
Specifically, the data collected comes
under the following categories:

• Name of the dish,


• Price of the dish,
• Quantity of dish produced per day,
• Quantity of dish left unconsumed
per day,
• Total number of customers per day,
• Fixed customers per day, etc.
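As a sketch, such survey records could be held in plain Python before any analysis; every dish name and number below is invented for illustration, not real restaurant data:

```python
# Hypothetical records matching the categories above; all values are invented.
records = [
    {"dish": "Paneer Tikka", "price": 250, "produced": 40,
     "unconsumed": 5, "total_customers": 120, "fixed_customers": 45},
    {"dish": "Dal Makhani", "price": 180, "produced": 60,
     "unconsumed": 12, "total_customers": 120, "fixed_customers": 45},
]

# Fraction of each dish left unconsumed on that day:
for r in records:
    print(r["dish"], r["unconsumed"] / r["produced"])
```

Thirty days of such records, one per dish, would form the dataset described above.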
Data Collection
• Data collection is nothing new; it has been part of our society for ages.
• Even when people did not have much knowledge of calculations, records were still maintained in some way or the other to keep an account of relevant things.
• Data collection is an exercise which does not require even a tiny bit of technological knowledge.
Data Collection
But when it comes to analysing the data, it becomes a tedious
process for humans as it is all about numbers and alpha-
numerical data.
That is where Data Science comes into the picture.
It not only gives us a clearer idea about the dataset, but also
adds value to it by providing deeper and clearer analyses of it.
And as AI gets incorporated in the process, predictions and
suggestions by the machine become possible on the same.
For data domain-based projects, the type of data used is mostly in
numerical or alpha-numerical format, and such datasets are curated
in the form of tables. Such databases are very commonly found in
any institution for record maintenance and other purposes.
Some examples of datasets which you must already be aware of are:
Sources of Data

There exist various sources of data from where we can collect any
type of data required, and the data collection process can be
categorised in two ways: Offline and Online.
Types of Data

For Data Science, the data is usually collected in the form of
tables. These tabular datasets can be stored in different formats.

Some of the commonly used formats are:
• CSV
• Spreadsheet
• SQL
1. CSV:
CSV stands for Comma Separated Values. It is a simple file format
used to store tabular data.

Each line of this file is a data record, and each record consists
of one or more fields which are separated by commas.

Since the values of the records are separated by commas, these
files are known as CSV files.
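A minimal sketch of reading such comma-separated records with Python's built-in csv module; the header and values below are made up for illustration:

```python
import csv
import io

# A small in-memory CSV: each line is one record, fields separated by commas.
raw = "Name,Price,Quantity\nDal Makhani,180,60\nVeg Biryani,220,50\n"

# DictReader uses the first line as the header and maps each record to a dict.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)
print(rows[0]["Name"])   # Dal Makhani
print(len(rows))         # 2
```

In practice the same code works on a file opened with `open("data.csv")` in place of the in-memory string.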
2. Spreadsheet:
A Spreadsheet is a piece of paper or a computer program
which is used for accounting and recording data using
rows and columns into which information can be entered.
Microsoft Excel is a program which helps in creating
spreadsheets.
3. SQL:
SQL stands for Structured Query Language.
It is a domain-specific language used in programming, designed
for managing data held in different kinds of DBMS (Database
Management Systems).
It is particularly useful in handling structured data.
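A minimal sketch of an SQL query, run here through Python's built-in sqlite3 module; the table and values are invented for illustration:

```python
import sqlite3

# An in-memory database so the example needs no file on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dishes (name TEXT, price INTEGER)")
cur.executemany("INSERT INTO dishes VALUES (?, ?)",
                [("Dal Makhani", 180), ("Veg Biryani", 220)])

# A query written in Structured Query Language: dishes priced above 200.
cur.execute("SELECT name FROM dishes WHERE price > 200")
result = cur.fetchall()
print(result)  # [('Veg Biryani',)]
conn.close()
```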
Data Access

After collecting the data, to be able to use it for programming
purposes, we should know how to access it in a Python code.

To make our lives easier, there exist various Python packages
which help us access structured data (in tabular form) inside
the code.

Let us take a look at some of these packages:

NumPy
NumPy, which stands for Numerical Python, is the fundamental
package for Mathematical and logical operations on arrays in
Python.
It is a commonly used package when it comes to working around
numbers.
NumPy also works with arrays, which are homogeneous collections
of data.
An array is a set of multiple values of the same datatype.
They can be numbers, characters, booleans, etc., but an array can
hold only one datatype at a time. In NumPy, the arrays used are
known as ND-arrays (N-Dimensional Arrays), as NumPy comes with
the feature of creating n-dimensional arrays in Python.
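The idea of ND-arrays can be sketched with a 1-D and a 2-D array; the values below are arbitrary:

```python
import numpy as np

# Every element of an array shares one datatype.
a = np.array([10, 20, 30, 40])        # 1-dimensional ND-array
b = np.array([[1, 2, 3], [4, 5, 6]])  # 2-dimensional ND-array

print(a.ndim, b.ndim)   # 1 2
print(a * 2)            # elementwise maths: [20 40 60 80]
```

Mathematical operations like `a * 2` apply to every element at once, which is what makes NumPy convenient for working with numbers.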
Pandas
• Pandas is a software library written for the Python
programming language for data manipulation and analysis.
• In particular, it offers data structures and operations for
manipulating numerical tables and time series.
• The name is derived from the term "panel data", an
econometrics term for data sets that include observations over
multiple time periods for the same individuals.
The two primary data structures of Pandas, Series (1-dimensional)
and DataFrame (2-dimensional), handle the vast majority of typical
use cases in finance, statistics, social science, and many areas
of engineering.

Pandas is built on top of NumPy and is intended to integrate well
within a scientific computing environment with many other
3rd-party libraries.
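A brief sketch of the two structures, with invented restaurant-style column names and values:

```python
import pandas as pd

# A Series is 1-dimensional; a DataFrame is a 2-dimensional table.
prices = pd.Series([250, 180, 220], name="price")
df = pd.DataFrame({
    "dish": ["Paneer Tikka", "Dal Makhani", "Veg Biryani"],
    "price": [250, 180, 220],
    "left_over": [5, 12, 8],
})

print(round(prices.mean(), 2))                   # 216.67
print(df[df["left_over"] > 6]["dish"].tolist())  # ['Dal Makhani', 'Veg Biryani']
```

The last line filters the table to the dishes with more than 6 leftover portions, the kind of question our restaurant scenario asks of its dataset.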
Matplotlib
Matplotlib is an amazing visualization library in Python for 2D plots
of arrays.
Matplotlib is a multiplatform data visualization library built on
NumPy arrays.
One of the greatest benefits of visualization is that it allows us
visual access to huge amounts of data in easily digestible visuals.
Matplotlib comes with a wide variety of plots.
Plots help us understand trends and patterns, and to make
correlations.
They’re typically instruments for reasoning about quantitative
information.
Some types of graphs that we can make with this package are
line plots, bar graphs, histograms, scatter plots, and pie charts.
Basic Statistics with Python
We have already understood that Data Science works around
analysing data and performing tasks on it.

For analysing the numeric & alpha-numeric data used in this
domain, mathematics comes to our rescue. Basic statistical
methods from mathematics come in handy for analysing and
working with such datasets.
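These basic measures are available in Python's built-in statistics module; the customer counts below are invented for illustration:

```python
import statistics

# Invented sample of daily customer counts at the restaurant.
customers = [110, 125, 98, 140, 132, 110]

print(statistics.mean(customers))    # average, approximately 119.17
print(statistics.median(customers))  # middle value: 117.5
print(statistics.mode(customers))    # most frequent value: 110
```

Mean, median, and mode each summarise the dataset in one number, which is often the first step before deeper analysis.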
