AI Notes
problem scoping
problem scoping is the stage that begins once a problem has been identified while developing an AI
project. this is where we define the project and set the goals that we want it to achieve. problem
scoping is usually done with the help of the 4Ws canvas: who, what, where and why. all the other
stages of the AI project cycle come after this phase, and without proper problem scoping the rest
of the work may turn out to be useless.
who canvas
the “who” block helps in analysing the people who are affected directly or indirectly by the problem.
under this, we find out who the ‘stakeholders’ of this problem are and what we know about them.
stakeholders are the people who face this problem and would benefit from the solution.
what canvas
under the “what” block, you need to look into what you have on hand. at this stage, you need to
determine the nature of the problem. what is the problem and how do you know that it is a
problem? you also gather evidence to prove that the problem you have selected actually exists,
for example, newspaper articles, media announcements, etc.
where canvas
this focuses on the context/situation/location of the problem. this block will help you look into the
situation in which the problem arises, the context of it, and the location where it is prominent.
why canvas
in the “why” canvas, think about the benefits which the stakeholders would get from the solution
and how it will help them as well as society.
data acquisition
as the name suggests, this stage is about acquiring data for the project. data can be a
piece of information or facts and statistics collected together for reference or analysis. whenever
we want an AI project to be able to predict an output, we need to train it first using data.
data classification
● basic
i) textual
made up of characters and numbers, but we cannot perform calculations on these numbers. for
example: mickey, donald, grapes, ad43et, etc.
ii) numerical - discrete and continuous
involves the use of numbers. this is the data on which we can perform mathematical operations
such as addition, subtraction, multiplication, etc.
discrete data: contains only whole numbers, for example counts
continuous data: contains values with fractions and decimals, for example measurements (see the
sketch after this list)
● structured
i) structured data
this type of data has a predefined data model, and it is organised in a predefined manner. train
schedules and mark sheets are common examples of this form of data.
ii) unstructured data
this type of data does not have any predefined structure. it can take any form. videos, audio files,
emails, documents, etc. are some examples of this form of data.
iii) semi-structured data
this type of data has qualities of both structured and unstructured data: it carries some tags or
markers but is not organised in a fully predefined manner.
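a minimal sketch in Python (all values below are made-up examples, purely for illustration) showing
the difference between these kinds of data:

import json

# textual data: characters and numbers that we cannot do arithmetic on
textual = ["mickey", "donald", "grapes", "ad43et"]

# numerical data: discrete (whole numbers) vs continuous (fractions and decimals)
discrete = [2, 5, 11, 3]            # e.g. number of students per class
continuous = [36.6, 72.85, 19.0]    # e.g. temperatures in degrees celsius
print(sum(discrete) / len(discrete))    # arithmetic only makes sense on numerical data

# structured data: a predefined model, organised like a mark sheet (fixed columns)
mark_sheet = [
    {"roll_no": 1, "name": "asha", "marks": 92},
    {"roll_no": 2, "name": "ravi", "marks": 78},
]

# semi-structured data: some tags/keys, but no rigid predefined model (e.g. a JSON record)
email_record = json.dumps({"from": "a@example.com", "subject": "hello", "attachments": ["notes.pdf"]})
print(mark_sheet[0]["marks"], email_record)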
data features
data features refer to the type of data which needs to be collected for the project, for example
salary amount, increment percentage, increment period, etc. ways in which you can collect data:
surveys, web scraping, sensors, cameras, observations, application programming interface (API).
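as one possible illustration of acquiring data features through an API, here is a small Python
sketch; the URL and field names are hypothetical, not a real service:

import requests

# hypothetical endpoint returning employee records as JSON (for illustration only)
API_URL = "https://example.com/api/employees"

response = requests.get(API_URL, timeout=10)
response.raise_for_status()
records = response.json()    # assume a list of records with the fields we care about

# keep only the data features we planned to collect
features = [
    {
        "salary": r.get("salary"),
        "increment_pct": r.get("increment_pct"),
        "increment_period": r.get("increment_period"),
    }
    for r in records
]
print(len(features), "records collected")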
data exploration
data exploration refers to techniques and tools that are used for identifying important patterns
and trends. we can do it through data visualisation or by adopting sophisticated statistical
methods. to analyse data, you need to visualise it in some user-friendly format so you can:
● quickly get a sense of the trends, relationships, and patterns contained within the data.
● define the strategy for which model to use at a later stage.
● communicate the findings to others effectively.
to visualise data, we can use various types of visual representation.
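as one simple illustration (the numbers below are invented), a line chart drawn with matplotlib
makes a trend easy to spot:

import matplotlib.pyplot as plt

# made-up example data: salary recorded over six months
months = [1, 2, 3, 4, 5, 6]
salary = [30000, 30000, 32000, 32000, 35000, 36000]

plt.plot(months, salary, marker="o")   # the line chart makes the upward trend obvious
plt.xlabel("month")
plt.ylabel("salary")
plt.title("salary trend")
plt.show()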
modelling
based on the trends and patterns identified during data exploration, the next important thing is to
use the data for making predictions or future forecasts. we can do this through different modelling
methods, for example decision trees.
decision tree
for the classification of data, one of the simplest methods used today is the ‘decision tree’. we
can use a tree-like model of decisions along with all the possible consequences.
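a minimal sketch, assuming scikit-learn is installed and using made-up marks and grades, of training
a decision tree classifier:

from sklearn.tree import DecisionTreeClassifier

# tiny invented dataset: marks secured -> grade label
X = [[35], [48], [62], [75], [88], [95]]
y = ["C", "C", "B", "B", "A", "A"]

model = DecisionTreeClassifier(max_depth=2)   # a shallow tree of decisions about the marks
model.fit(X, y)

print(model.predict([[70], [90]]))   # predicts grades for marks the model has not seen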
supervised learning
in a supervised learning model, the dataset fed to the machine is labelled. in other words, the
dataset is known to the person who is training the machine; only then are they able to label the
data. for example, students get grades according to the marks they secure in examinations. these
grades are labels which categorise the students according to their marks. there are two types of
supervised learning models: classification and regression.
classification
here the data is classified according to labels. for example, in the grading system, students are
classified on the basis of the grades they obtain with respect to their marks in the examination.
classification works on a discrete dataset.
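a hand-written rule version of the grading example (the thresholds are invented), just to show that
the output of classification is a discrete label:

def classify_grade(marks):
    # each range of marks maps to one discrete grade label
    if marks >= 85:
        return "A"
    elif marks >= 60:
        return "B"
    else:
        return "C"

print([classify_grade(m) for m in [92, 67, 41]])   # -> ['A', 'B', 'C']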
regression
here, the data fed to the machine is continuous. for example, if you wish to predict your next
salary, you would put in the data for your previous salary, any increments, etc., and train the
model.
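a minimal sketch, assuming scikit-learn and an invented salary history, of fitting a regression
model to predict a continuous value:

from sklearn.linear_model import LinearRegression

# made-up history: years of experience -> salary (a continuous quantity)
X = [[1], [2], [3], [4], [5]]
y = [30000, 34000, 38500, 42000, 47000]

model = LinearRegression()
model.fit(X, y)

print(model.predict([[6]]))   # an estimate of the next salary, as a continuous value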
the main difference between regression and classification algorithms is that regression algorithms
are used to predict continuous values such as price, salary, age, etc., while classification
algorithms are used to predict/classify discrete values such as male or female, true or false, spam
or not spam, etc.
● regression: the output variable must be of a continuous nature or a real value. classification:
the output variable must be a discrete value.
● regression: the task of the algorithm is to map the input value (x) to a continuous output
variable (y). classification: the task of the algorithm is to map the input value (x) to a discrete
output variable (y).
● regression: used to solve problems such as weather prediction, house price prediction, etc.
classification: used to solve problems such as identification of spam emails, speech recognition,
identification of cancer cells, etc.
unsupervised learning
an unsupervised learning model works on an unlabelled dataset. this means that the data fed to the
machine is random, and the person training the model may not have any information about it.
unsupervised learning models are used to identify relationships, patterns and trends in the data
fed into them. unsupervised learning can be further divided into clustering and dimensionality
reduction.
clustering
refers to the unsupervised learning algorithm which can cluster unknown data according to the
patterns or trends identified in it. the patterns observed might be ones already known to the
developer, or the algorithm might even come up with some unique patterns of its own.
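a minimal sketch, assuming scikit-learn and a handful of invented, unlabelled points, of letting
k-means discover the groups on its own:

from sklearn.cluster import KMeans

# unlabelled points: no one tells the algorithm which group each point belongs to
X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)   # cluster assignment for each point, discovered from the data itself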
applications of clustering:
● netflix has used clustering in implementing movie recommendations for its users.
● news summarization can be performed using clustering analysis where articles can be divided
into a group of related topics.
● document clustering is being effectively used in preventing the spread of fake news on social
media.
reinforcement learning
for every right action or decision, the algorithm is rewarded with positive reinforcement. on the
other hand, for every wrong action, it is given negative reinforcement. in this way, it learns
which actions need to be performed and which do not. this type of learning can assist in industrial
automation.
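a toy Python sketch of this idea (the environment, actions and numbers are all invented): the agent
keeps a value per action and gradually prefers the action that keeps earning rewards:

import random

action_values = {"left": 0.0, "right": 0.0}   # the agent's current estimate of each action

def environment(action):
    # invented environment: +1 reward for the "right" action, -1 penalty otherwise
    return 1 if action == "right" else -1

for step in range(100):
    # explore occasionally, otherwise pick the currently best-valued action
    if random.random() < 0.2:
        action = random.choice(list(action_values))
    else:
        action = max(action_values, key=action_values.get)
    reward = environment(action)
    action_values[action] += 0.1 * (reward - action_values[action])   # nudge the estimate

print(action_values)   # the rewarded action ends up with the higher value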
dimensionality reduction
the number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process of reducing these features is called dimensionality reduction. it is
a way of converting a higher-dimensional dataset into a lower-dimensional one while ensuring that it
provides similar information. it is commonly used in fields that deal with high-dimensional
data, such as speech recognition, data visualization, noise reduction, cluster analysis,
bioinformatics, etc.
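a minimal sketch, assuming scikit-learn and made-up 4-dimensional data, of reducing the number of
features with principal component analysis (one common dimensionality reduction technique):

from sklearn.decomposition import PCA

# made-up 4-dimensional data reduced to 2 dimensions
X = [[2.5, 2.4, 0.5, 1.0],
     [0.5, 0.7, 2.2, 2.9],
     [2.2, 2.9, 0.4, 1.1],
     [1.9, 2.2, 0.6, 0.9],
     [0.3, 0.4, 2.5, 3.0]]

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (5, 2): fewer columns than before
print(pca.explained_variance_ratio_)    # how much of the variation each new dimension keeps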
evaluation
once a model has been made and trained, it needs to go through proper testing so that one can
calculate the efficiency and performance of the model. Hence, the model is tested with the help
of testing data and efficiency of the model is calculated on the basis of the following parameters:
accuracy, precision, recall and F1 score.
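a minimal sketch, assuming scikit-learn and invented test results, of computing these four
parameters for a binary (spam / not spam) model:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# hypothetical testing data: what the labels actually were vs what the model predicted
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # 1 = spam, 0 = not spam
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1 score :", f1_score(y_true, y_pred))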
neural networks
neural networks are loosely modelled after how neurons in the human brain behave. the key advantage
of neural networks is that they are able to extract data features automatically, without those
features having to be specified by hand. this makes them a fast and efficient way to solve problems
for which the dataset is very large, such as images.
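a minimal sketch, assuming scikit-learn, of a small neural network trained on its built-in
handwritten-digit images; the raw pixels go in directly and the hidden layer learns its own features:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# small image dataset: 8x8 pixel handwritten digits, fed in as raw pixel values
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))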