AI Class X Notes
Intelligence
AI, ML & DL
Artificial Intelligence (AI): Refers to any technique that enables
computers to mimic human intelligence. It gives machines the ability
to recognise a human face, move and manipulate objects, understand
human voice commands, and do other tasks. AI-enabled machines
think algorithmically and execute what they have been asked to do
intelligently.
Machine Learning (ML): It is a subset of Artificial Intelligence that
enables machines to improve at tasks with experience (data). The
intention of Machine Learning is to enable machines to learn by
themselves using the provided data and make accurate Predictions/
Decisions.
Deep Learning (DL): It enables software to train itself to perform
tasks with vast amounts of data. In Deep Learning, the machine is
trained with huge amounts of data which helps it in training itself
around the data. Such machines are intelligent enough to develop
algorithms for themselves. Deep Learning is the most advanced form
of Artificial Intelligence out of these three.
AI Domains
With respect to the type of data fed in the AI model, AI models can be
broadly categorized into three domains:
• Data Sciences
• Computer Vision
• Natural Language Processing
Data Sciences
Data Sciences is a domain of AI related to data systems and processes,
in which the system collects large amounts of data, maintains datasets
and derives meaning/sense out of them. The information extracted
through data science can then be used to make decisions about it.
Computer Vision
Computer Vision, abbreviated as CV, is a domain of AI that gives a
machine the capability to acquire and analyse visual information and
then make predictions or decisions based on it. The entire process
involves image acquisition, screening, analysing, identifying and
extracting information. This extensive processing helps computers
understand any visual content and act on it accordingly.
AI for kids: As we all can see, kids nowadays are smart enough to
understand technology from a very early age. As their thinking
capabilities increase, they start becoming techno-savvy and eventually
they learn everything more easily than an adult.
Unit 2: AI Project Cycle
AI Project Cycle
The AI Project Cycle provides us with an appropriate framework that
can lead us towards the goal. The AI Project Cycle mainly has 5
stages: Problem Scoping, Data Acquisition, Data Exploration,
Modelling and Evaluation.
• You need to acquire data that will become the base of your
project as it will help you in understanding the parameters that
are related to problem scoping.
• You go for data acquisition by collecting data from various
reliable and authentic sources. Since the data you collect would
be in large quantities, you can represent it visually using
different types of representations like graphs, databases, flow
charts, maps, etc. This makes it easier for you to interpret the
patterns which your acquired data follows.
• After exploring the patterns, you can decide upon the type of
model you would build to achieve the goal. For this, you can
research online and select various models which give a suitable
output.
• You can test the selected models and figure out which is the
most efficient one.
• The most efficient model is now the base of your AI project and
you can develop your algorithm around it.
• Once the modelling is complete, you now need to test your
model on some newly fetched data. The results will help you in
evaluating your model and improve it.
• Finally, after evaluation, the project cycle is now complete and
what you get is your AI project.
Who?
The “Who” block helps in analysing the people getting affected
directly or indirectly by the problem. Under this, we find out who the
‘Stakeholders’ of this problem are and what we know about them.
Stakeholders are the people who face this problem and would benefit
from the solution.
What?
Under the “What” block, you need to look into what you have on
hand. At this stage, you need to determine the nature of the problem.
What is the problem and how do you know that it is a problem?
Under this block, you also gather evidence to prove that the problem
you have selected actually exists. Newspaper articles, Media,
announcements, etc are some examples.
Where?
Now that you know who is associated with the problem and what the
problem actually is, you need to focus on the
context/situation/location of the problem. This block helps you
look into the situation in which the problem arises, its context,
and the locations where it is prominent.
Why?
You have finally listed down all the major elements that affect the
problem directly. Now it becomes convenient to understand who would
benefit from the solution, what is to be solved, and where the
solution will be deployed.
Problem Statement
The Problem Statement Template helps us to summarise all the key
points into one single Template so that in the future, whenever there is
a need to look back at the basis of the problem, we can take a look at
the Problem Statement Template and understand the key elements of
it.
Data Features
Data features refer to the type of data you want to collect. There can
be various ways in which you can collect data. Some of them are:
• Surveys
• Web Scraping
• Sensors
• Cameras
• Observations
• API (Application Program Interface)
Stage 4: Modelling
The ability to mathematically describe the relationship between
parameters is the heart of every AI model. Thus, whenever we talk
about developing AI models, it is the mathematical approach to
analyzing data that we refer to. Generally, AI models can be classified
as follows:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
In a supervised learning model, the dataset which is fed to the
machine is labelled. In other words, we can say that the dataset is
known to the person who is training the machine; only then is he/she
able to label the data.
Unsupervised Learning
An unsupervised learning model works on an unlabelled dataset. This
means that the data which is fed to the machine is random, and there is
a possibility that the person who is training the model does not have
any information regarding it. Unsupervised learning models are
used to identify relationships, patterns and trends in the data
which is fed into them. They help the user understand what the data is
about and what the major features identified by the machine in it are.
Dimensionality Reduction:
We humans are able to visualise up to 3 dimensions only,
but according to many theories and algorithms, various
entities exist beyond 3 dimensions. For example, in Natural
Language Processing, words are considered to be
N-dimensional entities, which means that we cannot
visualise them as they exist beyond our visualisation
ability. Hence, to make sense of them, we need to reduce
their dimensions. This is where a dimensionality reduction
algorithm is used.
Stage 5: Evaluation
Once a model has been made and trained, it needs to go through
proper testing so that one can calculate the efficiency and
performance of the model. Hence, the model is tested with the help of
Testing Data (which was separated out of the acquired dataset at the
Data Acquisition stage) and the efficiency of the model is calculated
on the basis of the parameters mentioned below:
Neural Networks
Neural networks are loosely modelled on how neurons in the human
brain behave. The key advantage of neural networks is that they are
able to extract data features automatically, without needing input
from the programmer.
A neural network is essentially a system of organizing machine
learning algorithms to perform certain tasks. It is a fast and efficient
way to solve problems for which the dataset is very large, such as in
images.
Applications of Python
Python Statements
Instructions written in the source code to execute are known as
statements. These are the lines of code that we write for the computer
to work upon. For example, if we wish to print the addition of two
numbers, say 5 and 10, we would simply write: print(5+10)
Python Comments
Comments are the statements that are incorporated in the code to give
a better understanding of code statements to the user. To write a
comment in Python, one can use # and then write anything after it.
For example:
# This is a comment and will not be read by the machine.
print(5+10)  # This is a statement and the machine will print the summation.
In Python, there exist some words which are pre-defined and carry a
specific meaning for the machine by default. These words are known
as keywords. Keywords cannot be changed at any point in time and
should not be used any other way except the default one, otherwise,
they create confusion and might result in ambiguous outputs.
Keywords in Python
Identifier
An identifier is the name given by the user to an entity in a program,
such as a variable. Identifiers can be declared by the user as per
their convenience of use and can vary according to the way the user
wants. These words are not pre-defined and can be used in any way.
Keywords cannot be used as identifiers. Some examples of identifiers
can be: count, interest, x, ai_learning, Test, etc. Identifiers are
also case-sensitive, hence an identifier named Test would be different
from an identifier named test.
Variable
A variable is a named location used to store data in memory. It is
helpful to think of variables as a container that holds data that can be
changed later throughout programming. Just like in Mathematics, in
Python too we can use variables to store values. The difference is
that in Python, variables can store not only numerical values but
also other types of data.
Variable Examples:
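For example, a few variables holding different kinds of values (the names and values below are purely illustrative):

message = "Hello AI"     # a string value
count = 10               # an integer value
price = 99.5             # a floating-point value
count = count + 5        # the value stored in a variable can be changed later
print(message, count, price)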
Datatype
All variables contain different types of data in them. The type of data
is defined by the term datatype in Python. There can be various types
of data that are used in Python programming. Hence, the machine
identifies the type of variable according to the value which is stored
inside it.
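A minimal sketch showing how Python identifies the datatype from the value stored in a variable; the built-in type() function reports it (the variable names are illustrative):

name = "Asha"        # str (string)
marks = 92           # int (integer)
percentage = 92.5    # float (decimal number)
passed = True        # bool (True/False)
print(type(name), type(marks), type(percentage), type(passed))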
Python inputs
In Python, not only can we display the output to the user, but we can
also collect data from the user and can pass it on to the Python script
for further processing. To collect the data from the user at the time of
execution, input() function is used.
While using the input() function, the received data should be
converted (type-cast) to the expected datatype so that the machine
does not interpret it in an incorrect manner, as the data taken as
input from the user is considered to be a string (sequence of
characters) by default.
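A minimal sketch of input(): it always returns a string, so we convert (type-cast) the value to the datatype we need (the prompts are illustrative):

name = input("Enter your name: ")              # remains a string
age = int(input("Enter your age: "))           # converted to an integer
height = float(input("Enter your height: "))   # converted to a float
print("Hello", name, "- next year you will be", age + 1)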
Python Operators
Operators are special symbols that represent computation. They are
applied to operand(s), which can be values or variables. The same
operators can behave differently on different data types. Operators
when applied to operands form an expression. Operators are
categorized as Arithmetic, Relational, Logical and Assignment. Value
and variables when used with operators are known as operands.
Conditional Operators
Operator   Meaning                     Expression   Result
>          Greater Than                20 > 10      True
                                       15 > 25      False
<          Less Than                   20 < 45      True
                                       20 < 10      False
==         Equal To                    5 == 5       True
                                       5 == 6       False
!=         Not Equal To                67 != 45     True
                                       35 != 35     False
>=         Greater Than or Equal To    45 >= 45     True
                                       23 >= 34     False
<=         Less Than or Equal To       13 <= 24     True
                                       13 <= 12     False
Arithmetic Operators
Operator   Meaning            Expression   Result
+          Addition           10 + 20      30
-          Subtraction        30 - 10      20
*          Multiplication     30 * 100     3000
/          Division           30 / 10      3.0
//         Integer Division   25 // 10     2
%          Remainder          25 % 10      5
**         Raised to power    3 ** 2       9
Logical Operators
Operator   Meaning        Expression       Result
and        And operator   True and True    True
                          True and False   False
or         Or operator    True or False    True
                          False or False   False
not        Not operator   not False        True
                          not True         False
Assignment Operators
Operator   Expression   Equivalent to
=          X = 5        X = 5
+=         X += 5       X = X + 5
-=         X -= 5       X = X - 5
*=         X *= 5       X = X * 5
/=         X /= 5       X = X / 5
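A short sketch combining the four categories of operators shown in the tables above (the values are illustrative):

x = 20
y = 7
print(x + y, x % y, x ** 2)    # arithmetic: 27 6 400
print(x > y, x == y)           # relational: True False
print(x > 10 and y > 10)       # logical: False
x += 5                         # assignment: same as x = x + 5
print(x)                       # 25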
Conditional Statements
Conditional statements help the machine in taking a decision
according to the condition which gets fulfilled. There exist different
types of conditional statements in Python. Some of them are:
• If statement
• If-else statement
• If-else ladder
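For example, a minimal if-elif-else ladder (the marks value and grade boundaries are just illustrative):

marks = 78
if marks >= 90:
    print("Grade A")
elif marks >= 60:
    print("Grade B")
else:
    print("Grade C")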
Looping
Loop statements help in iterating a statement or a group of
statements as many times as required. For example, we can write a
loop that counts from 1 to 10: at every count it prints hello once on
the screen, and as soon as it reaches 10, the loop stops executing.
All this can be done with just one loop statement (a sketch follows
the list below). Various types of looping mechanisms are available in
Python. Some of them are:
• For Loop
• While Loop
(Python does not have a built-in Do-While loop, but the same behaviour
can be achieved with a While loop.)
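A sketch of the counting loop described above, written first with a for loop and then with a while loop:

# for loop: counts from 1 to 10 and prints hello at every count
for count in range(1, 11):
    print(count, "hello")

# while loop: the same behaviour with an explicit counter
count = 1
while count <= 10:
    print(count, "hello")
    count = count + 1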
Python Packages
A package is nothing but a space where we can find codes or
functions or modules of a similar type. There are various packages
readily available to use for free (a perk of Python being an
open-source language) for various purposes.
• NumPy
- A package created to work with numerical arrays in Python.
- Handy when it comes to working with large numerical datasets and
calculations on them.
• OpenCV
- An image processing package that can explicitly work around
images and can be used for image manipulation and processing like
cropping, resizing, editing, etc.
• NLTK
- NLTK stands for Natural Language Tool Kit and it helps in tasks
related to textual data.
- It is one of the most commonly used packages for Natural Language
Processing.
• Pandas
- A package that helps in handling 2-dimensional data tables in
Python.
- It is useful when we need to work with data from Excel sheets and
other databases.
Unit 4: Data Science
Data Sciences
It is a concept to unify statistics, data analysis, machine learning and
their related methods in order to understand and analyse actual
phenomena with data. It employs techniques and theories drawn from
many fields within the context of Mathematics, Statistics, Computer
Science, and Information Science.
Applications of Data Sciences –
Fraud and Risk Detection
Over the years, banking companies learned to divide and conquer data
via customer profiling, past expenditures, and other essential variables
to analyse the probabilities of risk and default. Moreover, it also
helped them to push their banking products based on customers’
purchasing power.
As an example of the data science project cycle, consider a restaurant
that wants to predict the quantity of food to prepare every day so
that there is minimal wastage. For this problem, a dataset covering
all the relevant elements is made for each dish prepared by the
restaurant over a period of 30 days.
Data Exploration
After creating the database, we now need to look at the data collected
and understand what is required out of it. In this case, since the goal
of our project is to be able to predict the quantity of food to be
prepared for the next day, we need to have the following data:
Modelling
Once the dataset is ready, we train our model on it. In this case, a
regression model is chosen in which the dataset is fed as a dataframe
and is trained accordingly. Regression is a Supervised Learning
model which takes in continuous values of data over a period of time.
Evaluation
Once the model has been trained on the training dataset of 20 days, it
is now time to see if the model is working properly or not. Once the
model is able to achieve optimum efficiency, it is ready to be
deployed in the restaurant for real-time usage.
Data Collection
Data collection is nothing new that has come up in our lives. It has
been in our society for ages. Even when people did not have a fair
knowledge of calculations, records were still maintained in some way
or the other to keep an account of relevant things.
Data collection is an exercise that does not require even a tiny bit of
technological knowledge. But when it comes to analysing the data, it
becomes a tedious process for humans as it is all about numbers and
alpha-numerical data. That is where Data Science comes into the
picture.
Data Science not only gives us a clearer idea of the dataset but also
adds value to it by providing deeper and clearer analyses around it.
And as AI gets incorporated in the process, predictions and
suggestions by the machine become possible on the same.
Sources of Data
Offline Data Collection: Sensors, Surveys, Interviews, Observations.
Online Data Collection: Open-sourced Government Portals, Reliable
Websites (e.g. Kaggle), World Organisations’ open-sourced statistical
websites.
While accessing data from any of the data sources, the following
points should be kept in mind:
i. Data that is available for public usage only should be taken up.
ii. Personal datasets should only be used with the consent of the
owner.
iii. One should never breach someone’s privacy to collect data.
iv. Data should only be taken from reliable sources as the data
collected from random sources can be wrong or unusable.
v. Reliable sources of data ensure the authenticity of data which
helps in the proper training of the AI model.
Types of Data
For Data Science, usually, the data is collected in the form of tables.
These tabular datasets can be stored in different formats.
CSV
CSV stands for comma-separated values. It is a simple file format
used to store tabular data. Each line of this file is a data record,
and each record consists of one or more fields that are separated by
commas. Since the values of the records are separated by commas, they
are known as CSV files.
Spreadsheet
A Spreadsheet is a piece of paper or a computer program that is used
for accounting and recording data using rows and columns into which
information can be entered. Microsoft Excel is a program that helps in
creating spreadsheets.
SQL
SQL is a programming language, also known as Structured Query
Language. It is a domain-specific language used in programming and
is designed for managing data held in different kinds of DBMS
(Database Management Systems). It is particularly useful in handling
structured data.
Data Access
After collecting the data, to be able to use it for programming
purposes, we should know how to access the same in Python code. To
make our lives easier, there exist various Python packages which help
us in accessing structured data (in tabular form) inside the code.
NumPy
NumPy, which stands for Numerical Python, is the fundamental
package for mathematical and logical operations on arrays in Python.
It is a commonly used package when it comes to working with
numbers. NumPy provides a wide range of arithmetic operations on
numbers, giving us an easier approach to working with them. NumPy
also works with arrays, which are nothing but a homogeneous
collection of data.
NumPy Arrays vs Lists
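A minimal sketch contrasting a plain Python list with a NumPy array (assumes the numpy package is installed; the values are illustrative):

import numpy as np

scores_list = [10, 20, 30, 40]          # ordinary Python list (can hold mixed types)
scores_array = np.array(scores_list)    # NumPy array (homogeneous, supports element-wise maths)

print(scores_array + 5)      # [15 25 35 45] - addition applied to every element
print(scores_array.mean())   # 25.0
# scores_list + 5 would raise an error, since a plain list has no element-wise arithmetic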
Pandas
Pandas is a software library written for the Python programming
language for data manipulation and analysis. In particular, it offers
data structures and operations for manipulating numerical tables and
time series. The name is derived from the term "panel data", an
econometrics term for data sets that include observations over
multiple time periods for the same individuals.
Pandas is well suited for many different kinds of data, such as
tabular data (like SQL tables or Excel spreadsheets) and time-series
data.
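A minimal sketch of a Pandas DataFrame holding a small 2-dimensional table (the column names and values are made up, loosely following the restaurant example used earlier):

import pandas as pd

data = {"Dish": ["Pasta", "Pizza", "Salad"],
        "Quantity_Prepared": [50, 60, 30],
        "Quantity_Wasted": [5, 8, 2]}
df = pd.DataFrame(data)

print(df)                                # the whole table
print(df["Quantity_Wasted"].mean())      # average wastage across dishes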
Matplotlib
Matplotlib is an amazing visualization library in Python for 2D plots
of arrays. Matplotlib is a multiplatform data visualization library built
on NumPy arrays.
Matplotlib comes with a wide variety of plots. Plots help us
understand trends and patterns, and to make correlations. They are
typically instruments for reasoning about quantitative information.
Some types of graphs that we can make with this package are line
charts, bar charts, scatter plots, histograms and box plots.
Not just plotting, but you can also modify your plots the way you
wish. You can stylise them and make them more descriptive and
communicable.
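A minimal sketch of a simple Matplotlib plot (the values are made up; plt.show() opens the plot window):

import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
dishes_sold = [40, 55, 48, 60, 52]

plt.plot(days, dishes_sold, marker="o")   # line plot of dishes sold per day
plt.title("Dishes sold per day")
plt.xlabel("Day")
plt.ylabel("Dishes sold")
plt.show()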
Basic Statistics with Python
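A hedged sketch of the basic statistical measures this heading points to, computed with NumPy and the statistics module (the data values are made up):

import statistics
import numpy as np

data = [12, 15, 11, 15, 18, 20, 15]

print(np.mean(data))           # mean (average)
print(np.median(data))         # median (middle value)
print(statistics.mode(data))   # mode (most frequent value)
print(np.std(data))            # standard deviation
print(np.var(data))            # variance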
Data Visualisation
Analysing the data collected can be difficult as it is all about tables
and numbers. While machines work efficiently on numbers, humans
need a visual aid to understand and comprehend the information
passed. Hence, data visualisation is used to interpret the data collected
and identify patterns and trends out of it.
Issues we can face with data:
• Erroneous Data: There are two ways in which the data can be
erroneous:
- Incorrect values: The values in the dataset (at random places)
are incorrect.
- Invalid or Null values: In some places, the values get corrupted
and hence they become invalid.
• Missing Data: In some datasets, some cells remain empty.
• Outliers: Data that do not fall in the range of a certain element
are referred to as outliers.
Bar Chart
It is one of the most commonly used graphical methods. From
students to scientists, everyone uses bar charts in some way or the
other. It is a very easy-to-draw yet informative graphical
representation. Various versions of the bar chart exist, like the
single bar chart, double bar chart, etc.
Histograms
Histograms are an accurate representation of continuous data. When
it comes to plotting the variation in just one entity over a period of
time, histograms come into the picture. A histogram represents the
frequency of the variable at different points of time with the help
of bins.
Box Plots
When the data is split according to its percentile throughout the range,
box plots come in handy. Box plots, also known as box-and-whisker
plots, conveniently display the distribution of data throughout the
range with the help of 4 quartiles.
K-Nearest Neighbour
The k-nearest neighbours (KNN) algorithm is a simple, easy-to-
implement supervised machine learning algorithm that can be used to
solve both classification and regression problems. The KNN
algorithm assumes that similar things exist in close proximity. In
other words, similar things are near to each other as the saying goes
“Birds of a feather flock together”.
KNN relies on the labelled data points surrounding an unknown point
(its nearest neighbours) to decide its class or value.
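A minimal, hedged sketch of KNN classification using scikit-learn (the notes do not prescribe a library; the tiny dataset and k = 3 are made up for illustration):

from sklearn.neighbors import KNeighborsClassifier

# each point is [feature1, feature2]; y holds the class label of each point
X = [[1, 1], [1, 2], [2, 1], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 nearest neighbours
knn.fit(X, y)

print(knn.predict([[2, 2]]))   # close to the first group, so it predicts class 0
print(knn.predict([[7, 6]]))   # close to the second group, so it predicts class 1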
Pixel value: Each of the pixels that represent an image stored inside a
computer has a pixel value that describes how bright that pixel is,
and/or what colour it should be. The most common pixel format is the
byte image, where this number is stored as an 8-bit integer giving a
range of possible values from 0 to 255. Typically, zero is to be taken
as no colour or black and 255 is taken to be full colour or white.
RGB Images: All the images that we see around are coloured images.
These images are made up of three primary colours Red, Green and
Blue. All the colours that are present can be made by combining
different intensities of red, green and blue.
What is a Kernel?
A Kernel is a matrix, which is slid across the image and multiplied
with the input such that the output is enhanced in a certain desirable
manner. Each kernel has a different value for the different kinds of
effects that we want to apply to an image.
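A hedged sketch of sliding a 3 x 3 kernel over a small grayscale image using plain NumPy (OpenCV's cv2.filter2D does the same job; the pixel values here are made up):

import numpy as np

image = np.array([[10, 10, 10, 10, 10],
                  [10, 50, 50, 50, 10],
                  [10, 50, 90, 50, 10],
                  [10, 50, 50, 50, 10],
                  [10, 10, 10, 10, 10]])

kernel = np.ones((3, 3)) / 9.0      # a simple 3 x 3 averaging (blur) kernel

output = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i+3, j:j+3]             # the region under the kernel
        output[i, j] = np.sum(patch * kernel)   # multiply element-wise and add up

print(output)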
Convolution
Convolution Layer
It is the first layer of a CNN. The objective of the Convolution
Operation is to extract the high-level features such as edges, from the
input image. A CNN need not be limited to only one Convolutional
Layer. Conventionally, the first Convolution Layer is responsible for
capturing the Low-Level features such as edges, colour, gradient
orientation, etc. With added layers, the architecture adapts to the
High-Level features as well, giving us a network that has a
wholesome understanding of images in the dataset.
Rectified Linear Unit Function
The next layer in the Convolution Neural Network is the Rectified
Linear Unit function or the ReLU layer. After we get the feature map,
it is then passed on to the ReLU layer. This layer simply gets rid of
all the negative numbers in the feature map and lets the positive
numbers stay as they are.
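A tiny sketch of what the ReLU layer does to a feature map (the numbers are made up): every negative value becomes 0 and every positive value is kept:

import numpy as np

feature_map = np.array([[-3, 2, -1],
                        [ 4, -5, 6],
                        [-2, 1, 0]])

relu_output = np.maximum(0, feature_map)   # ReLU: max(0, x) applied to every element
print(relu_output)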
Pooling Layer
Similar to the Convolutional Layer, the Pooling layer is responsible
for reducing the spatial size of the Convolved Feature while still
retaining the important features. There are two types of pooling which
can be performed on an image: Max Pooling and Average Pooling.
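A hedged sketch of 2 x 2 Max Pooling on a small feature map using plain NumPy (deep learning libraries provide this as a built-in layer; the values are made up):

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 5, 6]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = feature_map[2*i:2*i+2, 2*j:2*j+2]   # a 2 x 2 window of the feature map
        pooled[i, j] = window.max()                  # keep only the largest value

print(pooled)   # [[6. 4.] [7. 9.]] - the spatial size shrinks, the strong features remain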
Chatbots
Script-bot:
• Script bots are easy to make.
• Script bots work around a script that is programmed into them.
• Mostly they are free and are easy to integrate into a messaging platform.
• No or little language processing skills.
• Limited functionality.
Smart-bot:
• Smart bots are flexible and powerful.
• Smart bots work on bigger databases and other resources directly.
• Smart bots learn with more data.
• Coding is required to take this up on board.
• Wide functionality.
All the assistants like Google Assistant, Alexa, Cortana, Siri, etc. can
be taken as smart bots as not only can they handle the conversations
but can also manage to do other tasks which makes them smarter.
Human Language VS Computer Language
Use cases:
Data Processing
Stemming: In this step, the remaining words are reduced to their root
words. In other words, stemming is the process in which the affixes of
words are removed and the words are converted to their base form.
Lemmatization: Stemming and lemmatization both are alternative
processes to each other as the role of both processes is the same –
removal of affixes. But the difference between both of them is that in
lemmatization, the word we get after affix removal (also known as
lemma) is a meaningful one. Lemmatization makes sure that a lemma
is a word with meaning and hence it takes a longer time to execute
than stemming.
Bag of Words: Bag of Words is a Natural Language Processing
model which helps in extracting features out of the text which can be
helpful in machine learning algorithms. In bag of words, we get the
occurrences of each word and construct the vocabulary for the corpus.
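A minimal sketch of building a Bag of Words by hand for a tiny corpus (the documents are made up):

documents = ["we love ai", "ai is the future", "we learn ai in class"]

# the vocabulary is every unique word that occurs in the corpus
vocabulary = sorted(set(" ".join(documents).split()))
print(vocabulary)

# for each document, count how many times each vocabulary word occurs
for doc in documents:
    words = doc.split()
    bag = [words.count(term) for term in vocabulary]
    print(doc, "->", bag)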
TFIDF: Term Frequency & Inverse Document Frequency
The bag of words algorithm gives us the frequency of words in each
document we have in our corpus. It gives us an idea that if the word is
occurring more in a document, its value is more for that document.
For example, if I have a document on air pollution, air and pollution
would be the words that occur many times in it. And these words are
valuable too as they give us some context around the document.
Term Frequency: Term frequency is the frequency of a word in one
document. Term frequency can easily be found from the document
vector table as in that table we mention the frequency of each word of
the vocabulary in each document.
Inverse Document Frequency: Now, let us look at the other half of
TFIDF, which is Inverse Document Frequency. For this, let us first
understand what document frequency means. Document Frequency is the
number of documents in which the word occurs, irrespective of how
many times it has occurred in those documents. The Inverse Document
Frequency is then obtained by dividing the total number of documents
in the corpus by the document frequency of the word.
Finally, the formula of TFIDF for any word W becomes:
TFIDF(W) = TF(W) * log( IDF(W) )
Here, the log is to the base of 10.
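A hedged sketch that follows the formula above exactly, i.e. term frequency multiplied by log base 10 of (total documents / document frequency); libraries such as scikit-learn use slightly different TFIDF variants. The tiny corpus is made up:

import math

documents = [["air", "pollution", "is", "harmful", "air"],
             ["water", "pollution", "is", "rising"],
             ["air", "travel", "is", "fast"]]

def tfidf(word, doc, docs):
    tf = doc.count(word)                     # term frequency in this document
    df = sum(1 for d in docs if word in d)   # document frequency across the corpus
    return tf * math.log10(len(docs) / df)   # TFIDF = TF * log(N / DF)

print(tfidf("air", documents[0], documents))         # frequent in the document, rarer in the corpus
print(tfidf("pollution", documents[0], documents))
print(tfidf("is", documents[0], documents))          # occurs in every document, so its value is 0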
Summarising the TFIDF concept, we can say that:
Applications of TFIDF
TFIDF is commonly used in the Natural Language Processing
domain. Some of its applications are:
Consider a model that predicts whether there is a forest fire.
Comparing the model’s Prediction with the Reality gives four possible
cases. In the first case, a forest fire has broken out in the forest
and the model predicts a Yes, which means there is a forest fire. The
Prediction matches the Reality. Hence, this condition is termed True
Positive.
Here there is no fire in the forest hence the reality is No. In this case,
the machine too has predicted it correctly as a No. Therefore, this
condition is termed as True Negative.
Here the reality is that there is no forest fire. But the machine has
incorrectly predicted that there is a forest fire. This case is termed as
False Positive.
Here, a forest fire has broken out in the forest because of which the
Reality is Yes but the machine has incorrectly predicted it as a No
which means the machine predicts that there is no Forest Fire.
Therefore, this case becomes False Negative.
Here, total observations cover all the possible cases of prediction:
True Positive (TP), True Negative (TN), False Positive (FP) and False
Negative (FN). Accuracy is the percentage of correct predictions
(True Positives and True Negatives) out of the total observations.
But this parameter alone is not enough for us, as the actual cases
where the fire broke out are not properly taken into account. Hence,
there is a need to look at other parameters that take account of such
cases as well.
Precision: Precision is defined as the percentage of true positive
cases out of all the cases where the prediction is positive. That is,
it takes into account the True Positives and False Positives.
If Precision is high, the True Positive cases dominate, giving fewer
false alarms.
Recall
Another parameter for evaluating the model’s performance is Recall.
It can be defined as the fraction of positive cases that are correctly
identified. It majorly takes into account the cases where in Reality
there was a fire and the machine either detected it correctly or it
didn’t. That is, it considers True Positives (there was a forest fire
in reality and the model predicted a forest fire) and False Negatives
(there was a forest fire and the model didn’t predict it).
F1 Score
F1 score can be defined as the measure of the balance between
precision and recall.
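To tie the evaluation parameters together, a small sketch computing them from made-up confusion-matrix counts, using the standard formulas Precision = TP / (TP + FP), Recall = TP / (TP + FN) and F1 = 2 * Precision * Recall / (Precision + Recall):

# made-up counts for the forest fire example
TP = 30   # fire in reality, model predicted fire
TN = 60   # no fire in reality, model predicted no fire
FP = 5    # no fire in reality, but model predicted fire (false alarm)
FN = 5    # fire in reality, but model predicted no fire (missed fire)

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1_score = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1_score)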