Unit 2 AI - AI Project Cycle

The document discusses the AI project cycle and its key components. It notes that the AI project cycle mainly has 5 stages: 1) Problem Scoping, 2) Data Acquisition, 3) Data Exploration, 4) Modelling, and 5) Evaluation. It provides details about each stage, including that problem scoping involves understanding the problem, data acquisition is collecting accurate and reliable data from various sources, and data exploration involves arranging the gathered data uniformly for analysis. It also discusses techniques for problem scoping like the 4Ws canvas and problem statement template, and classifications of data like structured vs. unstructured.


Unit 2

AI Project Cycle
What is a project?
A project is defined as a sequence of tasks that must be completed to attain a certain
outcome.
What makes a project successful?
Successful projects are those that
1) Meet business requirements,
2) Are delivered and maintained on schedule,
3) Are delivered and maintained within budget, and
4) Deliver the expected business value and return on investment.
What is a project cycle?
Project Cycle is a step-by-step process to solve problems using proven scientific methods
and drawing inferences about them.
Difference between IT project and AI Project?
Components of the AI Project Cycle

Problem Scoping: Understanding the problem.
Data Acquisition: Collecting accurate and reliable data.
Data Exploration: Arranging the data uniformly.
Modelling: Creating models from the data.
Evaluation: Evaluating the project.
The AI Project Cycle mainly has 5 stages. They are:
1. Problem Scoping
Problem Scoping refers to understanding a problem, finding out the various factors which affect it, and defining the goal or aim of the project.
2.Data Acquisition
It is the process of collecting accurate and reliable data to work with. Data can be in the form of text, video, images, audio and so on, and it can be collected from various sources like the internet, journals, newspapers and so on.
3.Data Exploration
Data Exploration is the process of arranging the gathered data uniformly for a better understanding. Data can be arranged in the form of a table, plotted as a chart, or made into a database.
4.Modelling
Modelling is the process in which different models are created from the visualized data and checked for their advantages and disadvantages.
5. Evaluation
Evaluation is the method of understanding the reliability of a model, based on the outputs received by feeding test data into the model and comparing them with the actual answers.
What is Problem Scoping?
Problem scoping is the first stage of an AI project. It is the process of understanding a problem, determining the various factors that affect the problem, and defining the project's purpose.

How can you identify the problem scoping from the project?
Problem scoping begins after identifying the problem in developing the AI projects.
The AI project begins with the
❖ Defining of problem,
❖ Followed by brainstorming,
❖ Designing,
❖ Building,
❖ Testing
❖ Concludes with sharing or showcasing the task.
The 4Ws canvas helps in identifying the problem and understanding it in a better and more efficient manner.
The 4Ws include
Who? : The Who canvas tells us who is suffering from the problem and who the stakeholders are.
What? : The What canvas gives us information about the nature of the problem.
Where? : The Where canvas tells us where the problem arises.
Why? : The Why canvas tells us why we need to solve the problem and what benefits the stakeholders would get from the solution.

What is the 4Ws Problem canvas in AI?


Who, What, Where, and Why are the 4Ws of Problem
Scoping. These four techniques aid in identifying and
comprehending the problem.
What is the problem statement template? (PST)
Make a summary of what you’ve learnt once you’ve finished the above 4Ws. This summary is
known as the problem statement template. This template gathers all of the relevant information
in one location.
The PST helps us summarize all the key points into one single template so that in future, whenever there is a need to look back at the basis of the problem, we can take a look at it and understand its key elements.
1.What is AI Project Cycle?
2.How many stages does AI Project cycle have? Explain the
components and function of each stage with a diagram.
3.Which technique helps in identifying and understanding the problem in a better and more efficient manner in the Problem Scoping stage? What are its different parts? Explain.
4.What is the problem statement template?
What is Data Acquisition?
• Data acquisition is the second step in the project cycle,
• Data Acquisition is the process of collecting accurate and reliable data to work
with.
• It refers to collecting data from various sources and through various activities to
train the model.
• The data which is collected as input can be considered as training data and the
prediction data provided by the system or project is known as testing data.

Why is Data Acquisition Important?
• We should ensure the data is collected from authentic and reliable sources for effective decision making.
• Biased data will create biased decisions.
Classification of Data

Basic Classification
Basically, data is classified into two categories:
1. Numeric Data: Mainly used for computation. Numeric data can be classified into the following:
1. Discrete Data: It contains integer numeric data, without any decimal or fractional value. Countable data can be considered discrete data. For example, 132 customers, 126 students, etc.
2. Continuous Data: It represents data within any range. Uncountable (measurable) data falls in this category. For example, 10.5 kg, 100.50 km, etc.
2. Text Data: Mainly used to represent names, collections of words, phrases, textual information, etc.
Structural Classification
The data which is going to be fed into the system to train the model (or is already fed in) can be classified by its structure, i.e., whether it has a specific set of constraints, rules, or a unique pattern.

Structural classification divides data into 3 categories:

1. Structured Data: Structured data follows a specific pattern or set of rules. Such data has a simple structure and is stored in specific forms such as tables.
Example: a cricket scoreboard, a school timetable, an exam datesheet, etc.

2. Unstructured Data: Data that does not have any specific pattern or constraints, and can be stored in any form, is known as unstructured data. Most of the data that exists in the world is unstructured.
Example, YouTube Videos, Facebook Photos, Dashboard data of any reporting tool etc.

3.Semi-Structured Data: It is the combination of both structured and unstructured data. Some data can
have a structure like a database whereas some data can have markers and tags to identify the structure
of data. Example : Emails
Other Classification
This classification is subdivided into the following branches:
1. Time-Stamped Data: This data follows a specific time order that defines a sequence, which helps the system predict the next best action. The time can be the time at which the data was captured, processed, or collected.

2. Machine Data: Machine data is digital information created by the activity of computers, mobile phones, embedded systems and other networked devices. Such data became more prevalent as technologies such as radio frequency identification (RFID) spread.
3.Spatiotemporal Data: Spatiotemporal data are data that relate to both space and time. It
records the location through GPS and time-stamped data where the event is captured or data is
collected.

4. Open Data: It is data freely available to everyone. Anyone can reuse this kind of data; there are no copyrights, patents or controls over it. Eg: www.data.gov.in, www.india.gov.in
5.Real-time Data: The data which is available with the event is considered as real-time data.
6. Big Data: Big data refers to extremely large and diverse collections of structured, unstructured, and semi-structured data that continue to grow exponentially over time. These datasets are so huge and complex in volume, velocity, and variety that traditional data management systems cannot store, process, or analyze them.
Characteristics of Big data
1. Volume: It refers to the size of the data; big data is voluminous.
2. Velocity: It refers to the very high speed at which the data is generated and transferred.
3. Variety: Data is available in different forms (structured, unstructured, and semi-structured) and in different file types such as text, images, videos, and web pages.
Dataset
Dataset is a collection of data in tabular format. Dataset contains numbers or values that are related to
a specific subject. For example, students’ Exam scores in a class is a dataset.
The dataset is divided into two parts
a. Training dataset - The training dataset is the portion of the dataset that teaches a machine learning model. Machine learning algorithms are trained to make judgements or perform a task through training datasets. The largest part of the dataset is used as training data (usually 80%).
b. Test dataset - Data that has been clearly identified for use in tests, usually of a computer program, is known as test data. Usually about 20% of the dataset is used as test data.
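The 80/20 split described above can be sketched in a few lines of pure Python; the exam-score records below are hypothetical:

```python
import random

def split_dataset(records, train_fraction=0.8, seed=42):
    """Shuffle the records, then split them into training and test sets (80/20 by default)."""
    shuffled = records[:]                       # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)       # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset: (student name, exam score)
scores = [(f"student{i}", 50 + i) for i in range(10)]
train, test = split_dataset(scores)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters: if the records are sorted (say, by score), a straight cut would give the model a biased training set.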
Data can be in the form of text, video, images, audio and so on, and it can be collected from various sources like the internet, journals, newspapers and so on.
Concerns that need to be taken care of while collecting data are
• The data should be authentic
• The data should be accurate
• Collect the data from reliable sources
• Data should be open source not someone’s intellectual property
What are the various ways to collect data?
There are six ways to collect data from Reliable Sources.
a. Surveys
A research method for gathering data from a predetermined sample of
respondents in order to gain knowledge and insights into a variety of
issues. For example, a census survey is conducted periodically for
analyzing the population.

b. Cameras
We can collect raw visual data with the help of cameras, this data is
unstructured data that can be analyzed via Machine learning.

c. Web Scraping
Web Scraping is a technique for collecting structured data from the
web; it is used in applications such as news monitoring, market research,
and price tracking. E.g. Beautiful Soup is a Python library that is used
for web scraping, to pull data out of HTML and XML files.
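The text names Beautiful Soup, a third-party library; to keep the example self-contained, here is a sketch of the same idea using only Python's built-in html.parser. The `<span class="price">` markup and the price values are hypothetical:

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text inside every <span class="price"> element (hypothetical markup)."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []
    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True
    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False
    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

# A made-up HTML snippet standing in for a downloaded product page
html = '<ul><li><span class="price">199</span></li><li><span class="price">249</span></li></ul>'
scraper = PriceScraper()
scraper.feed(html)
print(scraper.prices)  # ['199', '249']
```

Beautiful Soup wraps this kind of parsing in a much friendlier API; the sketch only shows the underlying idea of walking tags and pulling out structured values.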
d. Observation
Some information can be gathered through attentive observation and monitoring.
e. Sensors
We can also collect data with the help of sensors. A device that detects or measures a physical property is called a sensor, e.g. a biometric sensor.
f. Application program interface
An API is a software interface that enables two apps to communicate with
one another.
JSON (JavaScript Object Notation) is an open, text-based data interchange format that is both human- and machine-readable. It is used to store and transfer data, and is commonly used for transmitting data in web applications.
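A minimal sketch of JSON in practice, using Python's standard json module; the record itself is hypothetical:

```python
import json

record = {"name": "Asha", "scores": [78, 85], "passed": True}  # hypothetical record

text = json.dumps(record)       # serialize to a JSON string for storage or transfer
restored = json.loads(text)     # parse the string back into a Python object
print(text)
print(restored["scores"][1])  # 85
```

The same `text` string could be sent to any other program (in any language) that has a JSON parser, which is why APIs so often exchange data in this format.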
Data Features
Data features refer to the type of data that needs to be collected for an AI model or project.
1. Training Data: The collected data fed into the system is known as training data. In other words, it is the input given by the user to the system. Training data is also known as a training set, training dataset or learning set.
2.Testing Data: Test Data are the data that is the input given to a software program during test
execution or evaluation.
Big data:
Data whose volume increases day by day is called big data. It exists in such huge amounts that it cannot be processed or analyzed using conventional methods. Some of the popular terms associated with big data processing are
▪ Data Mining
▪ Data Storage
▪ Data Analysis
▪ Data Sharing
▪ Data Visualization
System Map
▪ A system map is a visual representation of a set of things working together. It shows the
components and boundaries of a system and the components of the environment at a specific
point in time.
▪ The specific components of a system map can vary depending on the complexity and purpose of
the system being represented
▪ Systems mapping is an effective tool that we can use for understanding and redesigning systems.
Use of System Map
▪ System Map helps us to find relationships between different elements of the problem which we
have scoped.
▪ System Map helps in strategizing the solution for achieving the goal of our project.
▪ System Map is used to understand complex issues with multiple factors that affect each other.
▪ The main use of a system map is to help structure a system and communicate the result to others.
An animated tool used for drawing and understanding system maps is:
https://ncase.me/loopy/
It is a tool for making interactive simulations.
Example: https://tinyurl.com/Loopysample

Components of System Map


S.No Component Represents
1. Circle Elements of the System
2. Arrows Relationship
3. Longer arrow Longer time for a change to happen. Also called as time delay.
4. Arrow with + sign Both the elements are directly related to each other
5. Arrow with - sign Both the elements are inversely related to each other
System map to show stress management.
System map to show the effect of an increase in the number of vehicles on the road.
1.What is Data Acquisition? Why is it important?
2.What is the use of data in an AI project?
3.What are the different types of data and how are they classified?
4.What are the various ways to collect data? Explain.
5.Is there any problem in extracting private data?
6.What is a system map? List out the components used in a system map.
What is Data Exploration ?
Data exploration is the 3rd stage of the Project Cycle and the initial step in data analysis, where users explore a large data set using techniques and tools to visualize the data.

Why is Data Exploration important ?


Data Exploration helps us to gain a better understanding of a dataset before working with it.
Exploration allows for deeper understanding of a dataset, making it easier to navigate and use the data
later. The better an analyst knows the data they’re working with, the better their analysis will be.

How do we Explore data?


Data exploration is typically conducted using a combination of automated and manual activities.

What is Data visualization?


Data visualization is the graphical representation of information and data. By using visual elements like
charts, graphs, and maps, data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in Big data. Data visualization charts can help people to understand large
volumes of information.
Types of data exploration
There are two main types of data exploration tools and techniques:
1. Manual data exploration
2. Automated data exploration.
A sample of data exploration often shows a combination of manual and automated work: mark entries are made manually and graph generation is automated.
For example, consider that we have data of 30 students in a class: their name, date of birth, mobile number, admission number, etc.
In the process of data exploration, we can make a chart of that data in which all the names are in one place, all the mobile numbers in another, and so on.
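The arrangement just described (grouping each field together) can be sketched in a few lines of Python; the student records below are hypothetical:

```python
# Hypothetical raw student records, one dictionary per student
students = [
    {"name": "Asha",  "dob": "2008-04-12", "mobile": "9000000001"},
    {"name": "Ravi",  "dob": "2008-07-30", "mobile": "9000000002"},
    {"name": "Meena", "dob": "2008-01-05", "mobile": "9000000003"},
]

# Arrange the data uniformly: one column per field, all names together,
# all dates of birth together, all mobile numbers together
columns = {field: [s[field] for s in students] for field in ("name", "dob", "mobile")}
print(columns["name"])  # ['Asha', 'Ravi', 'Meena']
```

This row-to-column rearrangement is exactly what spreadsheet tools like the ones listed below do when they turn raw entries into a table ready for charting.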
1. MS Excel
▪ It’s the most popular tool used for visualizing the information
and data in a graphical representation form.
▪ Excel users can quickly analyze data and represent it using charts,
which makes complex data analysis easier to understand. Excel has a
variety of charts and many built-in formulas.
2.Google Charts
▪ Google Charts is an interactive Web service that creates
graphical charts from user-supplied information.
▪ Google chart tools are powerful, simple to use.
▪ Google Charts provides a perfect way to visualize data on the
website.
▪ Google Chart is a preferred application for collaboration work.
3. Tableau
▪ Tableau software is a popular data visualization and business
intelligence (BI) tool being used across different industries.
▪ With the help of Tableau, the raw data sets can be transformed
into an easily understandable format to ensure an easy
interpretation of the data.
▪ Tableau helps people and organizations be more data-driven.
▪ Tableau has a very large customer base of 57,000+ accounts
across many industries due to its simplicity of use and ability to
produce interactive visualizations .
4. Fusion Charts
This is a very widely-used, JavaScript-based charting and
visualization package that has established itself as one of the
leaders in the paid-for market.
It can produce 90 different chart types and integrates with a large
number of platforms and frameworks giving a great deal of
flexibility.
5. High charts
A simple options structure allows for deep customization, and styling
can be done via JavaScript or CSS. Highcharts is also extendable and
pluggable for experts seeking advanced animations and functionality.

List down 5 new data visualisation techniques which you learnt from the visualisation website: https://datavizcatalogue.com/

1.What is Data Exploration and why is it important in data analysis?
2.What is Data visualization?
3.How can we represent data visually? Can you name a few types of charts or
graphs used for data visualization?
4.Differentiate between the following types of Graphs/Charts with examples:
a) Bar graph and Pie Chart
b) Line Graph and Bar Graph
5. What is the purpose of data exploration in real-world scenarios, such as
business, science, or social studies?
DATA MODELLING
4th Stage of AI Project Cycle -Modelling
What is Modeling?
▪ In AI, modelling refers to developing algorithms, also called models, which can be trained to give intelligent outputs; by writing code we can make a machine artificially intelligent.
▪ An AI model is a program that has been trained to recognize patterns using a set of data.
▪ AI models are developed using various statistical methods. Some of the algorithms used are
➢ Decision trees
➢ Linear regression
➢ Logistic regression
➢ Naïve Bayes
➢ Random Forest
➢ K-Nearest Neighbors
➢ Support Vector Machines
Once the data is visualized and trends are found, we need to work with algorithms to prepare the AI model. This can be done by designing our own models or using existing AI models. Before we go into the details of modelling, let us first understand the following important terms:
Definition for the terms:
1. Artificial Intelligence: AI refers to any technique that enables computers to mimic human intelligence. AI-enabled machines think algorithmically and execute what they have been asked to do, producing output intelligently. Eg: voice assistants, robots.
2. Machine Learning: In ML, the machine learns from its mistakes and takes them into consideration, improving through its own experience. Machine learning enables machines to improve at tasks with experience. Eg: Netflix and YouTube recommendations.
3. Deep Learning: In DL, the machine is trained with huge amounts of data, which helps it train itself on the data. Such machines are intelligent enough to develop algorithms for themselves. Eg: Google Translate, image recognition in Google Lens.
Deep Learning is the most advanced of these three. Then comes Machine Learning, which is intermediately intelligent, while Artificial Intelligence is an umbrella term that holds ML and DL. AI covers all the concepts and algorithms which, in some way or the other, mimic human intelligence.
Difference between AI , Machine Learning and Deep Learning
Artificial Intelligence: the concept of creating smart, intelligent machines. It is the simulation of intelligence in a machine. Eg: AI robots, voice assistants such as Apple's Siri and Cortana.
Machine Learning: a subset of artificial intelligence that helps you build AI-driven applications. It trains a machine to take decisions based on experience. Eg: sales forecasting for different products, fraud analysis in banking, product recommendations, stock price prediction.
Deep Learning: a subset of machine learning that uses vast volumes of data and complex algorithms to train a model. It uses neural networks to solve complex problems. Eg: cancer tumour detection, image colouring, object detection, music generation.
Approaches of AI Model
In general, there are two approaches taken by researchers when building AI models. They are
1. Rule Based Approach
▪ Under the rule-based approach, the developer feeds in data along with some ground rules to the model. The model gets trained with these inputs and gives out answers in the form of predictions.
▪ A rule-based artificial intelligence produces pre-defined outcomes that are based on a set of rules coded by humans.
▪ This approach is commonly used when we have a known or labelled dataset.
▪ Rule-based systems are perfectly suited to projects and applications that require small amounts of data and simple, straightforward rules.
▪ Once trained, a rule-based model cannot improve itself based on feedback; the learning of the machine is static.
▪ Rule-based systems are often used in processes where errors cannot be tolerated.
E.g. finance processing, medical diagnosis.
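A minimal sketch of the rule-based approach: the rules below are hand-coded by the developer, not learned from data. The keyword list and messages are made-up examples:

```python
def is_spam(subject):
    """Classify an email subject using fixed, human-written rules (no learning involved)."""
    rules = ["free", "winner", "prize"]  # ground rules fed in by the developer
    return any(word in subject.lower() for word in rules)

print(is_spam("Congratulations, you are a WINNER"))  # True
print(is_spam("Monthly sales report attached"))      # False
```

Note the limitation the bullet points describe: if spammers start using new words, this filter cannot adapt on its own; a human must edit the rule list.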
2. Learning Based Approach
▪ In the machine learning approach, the developer feeds in data along with the answers. The machine then designs its own algorithms and methodologies to match the data with the answers and gives out the rules.
▪ The machine learning system defines its own set of rules based on the data outputs.
▪ This approach is commonly used when the data is unknown/random or unlabelled dataset.
▪ ML systems are commonly used when large volumes of relevant data records are available for making
more accurate predictions.
▪ For any processes that have multiple factors, situations, numerous potential outcomes, ML systems
are your best fit. E.g. Recommendation systems
Decision Tree
▪ A decision tree is a decision support tool that uses a tree-like
model of decisions and their possible consequences, including
chance event outcomes.
▪ A decision tree is a rule-based AI model which helps the machine
predict what an element is with the help of various rules.
▪ A decision tree follows a set of if-else conditions to visualize
the data and classify it according to the conditions.
▪ Decision trees are extremely useful for data analytics and
machine learning because they break down complex data into
more manageable parts. They're often used in these fields for
prediction analysis, data classification, and regression.
Common terms used in Decision Tree are stated below:
• Root Node - The beginning point of any Decision Tree is
known as its Root. The root node is always the top node
of a decision tree. It represents the entire data sample.
• Splitting - Division of nodes is called splitting or divergence.
• Branches - Divisions of the whole tree are called branches, represented in the form of arrows. Branches diverge based on a condition; each branch either leads to another question or leads to a decision.
• Terminal Node - Node that does not split further is called a terminal node or leaf node. It’s the final
result.
• Decision Node - It is a node that also gets further divided into different sub-nodes based on the
conditions. It is also termed as Interior node. They contain at least two branches.
• Parent and Child Node - When a node gets divided further then that node is termed as parent node
whereas the divided nodes or the sub-nodes are termed as a child node of the parent node.
• Subtree : Subsection of the decision tree is known as subtree
Draw a decision tree to show whether you would accept or reject a job based on the salary and facilities provided.
• While making decision trees, one should take a good look at the dataset given and try to figure out what pattern the output leaf follows. Try selecting any one output and, on its basis, find out the common links which all the similar outputs have.
• Many times, the dataset might contain redundant data which does not hold any value while creating a decision tree. Hence, it is necessary to note down which parameters affect the output directly and use only those while creating the decision tree.
• There might be multiple decision trees which lead to correct predictions for a single dataset. The simplest one should be chosen as the best.
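The job-offer exercise above can be sketched as nested if-else conditions, which is exactly how a decision tree executes; the salary threshold and the facilities check are assumed values chosen for illustration:

```python
def job_decision(salary, has_facilities):
    """Walk a tiny decision tree: root node, one decision node, three leaf nodes."""
    if salary >= 50000:           # root node: is the salary good enough?
        if has_facilities:        # decision node: are facilities provided?
            return "Accept"       # leaf (terminal) node
        return "Think it over"    # leaf (terminal) node
    return "Reject"               # leaf (terminal) node

print(job_decision(60000, True))   # Accept
print(job_decision(40000, True))   # Reject
```

Each `if` corresponds to a split, and each `return` is a terminal node; following the tips above, only the two parameters that directly affect the output appear in the tree.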
What is Supervised Learning?
Supervised learning is a machine learning approach that trains the model with a labelled dataset to find connections between the data and the labels.
Supervised Machine Learning Methods
There are two main areas where supervised machine learning comes in
handy:
1.classification problems
2. regression problems.
Classification
Classification refers to taking an input value and mapping it to a
discrete value. This algorithm is used to predict or classify discrete
values such as Male or Female, True or False, Spam or Not Spam, etc.
Classification works on discrete datasets.

Regression
Regression is related to continuous data (value functions). This algorithm
is used for understanding and predicting continuous values such as
price, salary, age, etc. Regression works with continuous datasets. Eg: if
we want to predict the salary of an employee, we can use their past salaries
as training data and predict the next salary.
Supervised Machine Learning Applications
• Predictive analytics (house prices, stock exchange prices, etc.)
• Text recognition
• Spam detection
• Customer sentiment analysis
• Object detection (e.g. face detection)
What is Unsupervised Learning?
Unsupervised learning is a type of machine learning in which the machine is trained with an unlabelled dataset and predicts the output without any supervision.
Applications of Unsupervised Learning Algorithms
• Fraud detection
• Malware detection
• Identification of human errors during data entry
• Conducting accurate basket analysis, etc.
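A toy sketch of unsupervised grouping: a one-dimensional k-means with two clusters, written in plain Python. The basket sizes are made-up data with two obvious groups; no labels are given, yet the algorithm finds the groups on its own:

```python
def kmeans_1d(values, iters=10):
    """Group numbers into two clusters by repeatedly moving two centre points."""
    c1, c2 = min(values), max(values)  # initial cluster centres
    for _ in range(iters):
        # Assign each value to the nearer centre
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        # Move each centre to the mean of its group
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

purchases = [2, 3, 4, 48, 50, 52]  # hypothetical basket sizes: two natural groups
small, large = kmeans_1d(purchases)
print(small, large)  # [2, 3, 4] [48, 50, 52]
```

This is the core idea behind basket analysis and fraud detection: instead of being told the answer, the system discovers that the data falls into distinct groups (here, small and large baskets).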
Reinforcement Learning: It works on a feedback process. In this model, the AI software gets rewarded for each good action and punished for each bad action; hence the goal of reinforcement learning is to maximize the rewards. E.g. video game points.
Difference between Supervised and Unsupervised Learning
1. What is Data Modelling? How is it important?
2. What makes a machine intelligent?
3. Can Artificial Intelligence be a threat to Human Intelligence? How?
4. What are the different approaches used in AI Modelling? Explain.
5. Differentiate between the Rule Based Approach and the Learning Based Approach with an example.
6. Explain the use of decision trees.
7. What are the common terms used in decision trees?
Pixel
The full form of pixel is "Picture Element". A pixel is the smallest element of an image on a computer display.
Pixel It
Pixel It activity is an example of how computers see images, process them and classify them.
This kind of Machine Learning approach is commonly used in Computer Vision related
applications.
Steps in image processing
▪ Every image which is fed to the computer is divided into pixels (which are
the smallest unit of an image).
▪ The Computer analyses each pixel and if it has to compare 2 pictures to
check if they are similar or not, pixel-wise comparison takes place.
▪ If pixels are identical, this means that the images are the same.
If we zoom into an image, we can see the pixels.
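The pixel-wise comparison described in the steps above can be sketched with tiny 3x3 grayscale grids (toy data standing in for real images, 0 = black, 255 = white):

```python
# Three toy "images": a and b are identical, c differs in the centre pixel
image_a = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
image_b = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
image_c = [[0, 255, 0], [255, 255, 255], [0, 255, 0]]

def same_image(x, y):
    """Pixel-wise comparison: the images match only if every pixel is identical."""
    return all(px == py
               for row_x, row_y in zip(x, y)
               for px, py in zip(row_x, row_y))

print(same_image(image_a, image_b))  # True
print(same_image(image_a, image_c))  # False
```

Real computer vision systems rarely demand exact equality; they compare pixel patterns statistically, but the underlying representation (a grid of numbers) is the same.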
5th Stage of AI Project Cycle -Evaluation
What is Evaluation?
▪ After a model has been created and trained, it must be thoroughly tested in order to determine its efficiency and performance; this is termed evaluation.
▪ Model evaluation is the process of using different evaluation metrics to understand a machine learning model's performance, as well as its strengths and weaknesses.
▪ Evaluation helps us identify the areas where future work is required.
▪ After the evaluation stage, the project will be deployed.
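A minimal sketch of one evaluation metric, accuracy: compare the model's outputs on test data with the actual answers. Both lists below are hypothetical:

```python
# Hypothetical evaluation of a spam filter on a held-out test set
actual      = ["spam", "ham", "spam", "ham", "spam"]   # true answers
predictions = ["spam", "ham", "ham",  "ham", "spam"]   # model outputs on the test data

# Accuracy = fraction of predictions that match the actual answers
correct = sum(a == p for a, p in zip(actual, predictions))
accuracy = correct / len(actual)
print(accuracy)  # 0.8
```

Accuracy is only one metric; as the bullet points note, evaluation usually combines several metrics to expose a model's strengths and weaknesses before deployment.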
