Unit 2 AI - AI Project Cycle
AI Project Cycle
Define project?
A project is defined as a sequence of tasks that must be completed to attain a certain
outcome.
What makes a project successful?
Successful projects are those that
1) Meet business requirements,
2) Are delivered and maintained on schedule,
3) Are delivered and maintained within budget, and
4) Deliver the expected business value and return on investment.
Define project cycle?
A project cycle is a step-by-step process for solving problems using proven scientific methods
and drawing inferences about them.
Difference between IT project and AI Project?
IT PROJECT AI PROJECT
Components of the AI Project Cycle
How can you identify the problem scope of a project?
Problem scoping begins once the problem to be solved by the AI project has been identified.
An AI project
❖ Begins with defining the problem,
❖ Is followed by brainstorming,
❖ Designing,
❖ Building,
❖ Testing, and
❖ Concludes with sharing or showcasing the task.
The 4Ws canvas helps in identifying the problem and understanding it
in a better and more efficient manner.
The 4Ws include
Who? : The Who canvas tells us who is suffering from the problem and who the
stakeholders are.
What? : The What canvas gives us information about the nature of the problem.
Where? : The Where canvas tells us where the problem arises.
Why? : The Why canvas tells us why we need to solve the problem and what
benefits the stakeholders would get from the solution.
Basic Data
Basically, data is classified into two categories:
1.Numeric Data: Mainly used for computation.
Numeric data can be classified into the following:
1. Discrete Data: It contains only whole-number (integer) values, with no
decimal or fractional part. Countable data can be considered discrete data.
For example, 132 customers, 126 students, etc.
2. Continuous Data: It can take any value within a range, including decimal
or fractional values. Measured (uncountable) data falls into this category.
For example, 10.5 kg, 100.50 km, etc.
2.Text Data: Mainly used to represent names, collections of words, phrases,
textual information, etc.
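The categories above can be illustrated with a small Python sketch. The helper name classify_value and the sample name "Asha" are made up for this example, and the rule is deliberately simple: it looks only at the Python type of a value, so a float such as 3.0 is still reported as continuous.

```python
def classify_value(value):
    """Return the basic data category of one value (hypothetical helper)."""
    if isinstance(value, bool):
        return "other"                 # True/False are not treated as numbers here
    if isinstance(value, int):
        return "numeric: discrete"     # whole numbers, e.g. a count of customers
    if isinstance(value, float):
        return "numeric: continuous"   # values with a decimal part, e.g. a weight
    if isinstance(value, str):
        return "text"                  # names, words, phrases
    return "other"


# Sample values taken from the notes above, plus one made-up name.
samples = [132, 126, 10.5, 100.50, "Asha"]
for sample in samples:
    print(sample, "->", classify_value(sample))
```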
Structural Classification
1.Structured Data: Data that is going to be fed into the system to train the model, or has already been fed
in, and that follows a specific set of constraints, rules, or a unique pattern is considered structured data.
2.Unstructured Data: Data that doesn't have any specific pattern or constraints and can be stored
in any form is known as unstructured data. Most of the data that exists in the world is
unstructured data.
Example: YouTube videos, Facebook photos, dashboard data of any reporting tool, etc.
3.Semi-Structured Data: It is a combination of both structured and unstructured data. Some of the data can
have a structure like a database, whereas some of it uses markers and tags to identify the structure
of the data. Example: Emails
Other Classification
This classification is subdivided into the following
branches:
1. Time-Stamped Data: This structure helps the system
to predict the next best action. It follows a
specific time order to define the sequence. This time
can be the time at which the data was captured, processed, or
collected.
4.Open Data: It is data that is freely available to everyone. Anyone can reuse this kind of data. There are
no copyrights, patents, or controls over it. E.g. www.data.gov.in, www.india.gov.in
5.Real-time Data: Data that becomes available as the event happens is considered real-time data.
6.Big Data: Big data refers to extremely large and diverse collections of structured, unstructured,
and semi-structured data that continue to grow exponentially over time. These
datasets are so huge and complex in volume, velocity, and variety that traditional
data management systems cannot store, process, and analyze them.
Characteristics of Big data
1.Volume: It refers to the amount of data in terms of size. Big data is
very large (voluminous).
2.Velocity: It refers to the rate at which data is transferred, which is very high.
3.Variety: Data is available in different forms like structured,
unstructured, and semi-structured. It can be in different files like text,
images, videos, web pages, etc.
Dataset
A dataset is a collection of data in tabular format. A dataset contains numbers or values that are related to
a specific subject. For example, the exam scores of the students in a class form a dataset.
The dataset is divided into two parts:
a. Training dataset - The training dataset is the large part of the data that teaches a machine learning model. Machine
learning algorithms are trained to make judgments or perform a task using training datasets.
The major part of the dataset is used as training data (usually 80%).
b. Test dataset - Data that has been clearly set aside for use in tests, usually of a computer program, is
known as test data. The remaining part of the data (usually 20%) is used as test data.
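The 80% / 20% division described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name split_dataset, the seed, and the exam scores are all made up for the example, and real projects often use a library routine instead.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then split it into (training, test) lists."""
    shuffled = list(rows)                       # keep the original order untouched
    random.Random(seed).shuffle(shuffled)       # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)   # index where the test part starts
    return shuffled[:cut], shuffled[cut:]


# Made-up dataset: (student id, exam score) for ten students.
exam_scores = [("S%02d" % i, 40 + (i * 7) % 60) for i in range(1, 11)]

training, test = split_dataset(exam_scores)
print(len(training), "training rows;", len(test), "test rows")
```

Shuffling before splitting is a design choice: it keeps the test part from being just the last few rows of the table, which might all be similar to each other.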
Data can be in the form of text, video, images, audio, and so on, and it can be collected from various
sources such as the internet, journals, newspapers, and so on.
Concerns that need to be taken care of while collecting data are
• The data should be authentic
• The data should be accurate
• Collect the data from reliable sources
• The data should be openly available (open source), not someone’s intellectual property
What are the various ways to collect data?
There are six ways to collect data from reliable sources.
a. Surveys
A research method for gathering data from a predetermined sample of
respondents in order to gain knowledge and insights into a variety of
issues. For example, a census survey is conducted periodically (every ten
years in India) to analyze the population.
b. Cameras
We can collect raw visual data with the help of cameras. This data is
unstructured data that can be analyzed via machine learning.
c. Web Scraping
Web scraping is a technique for automatically collecting data from web
pages and turning it into structured data. It is used for purposes such
as news monitoring, market research, and price tracking. E.g. Beautiful
Soup is a Python library that is used in web scraping to pull data out
of HTML and XML files.
d. Observation
Some information can be gathered through attentive observation
and monitoring.
e. Sensors
We can also collect data with the help of sensors. A device that
detects or measures a physical property is called a sensor, e.g. a
biometric sensor.
f. Application program interface
An API is a software interface that enables two apps to communicate with
one another.
JavaScript Object Notation, more commonly known by the acronym JSON,
is an open, text-based data interchange format that is both
human- and machine-readable. It is used to store and transfer data
and is commonly used for transmitting data in web applications.
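As an illustration of how JSON text is turned into usable data, here is a small sketch using Python's built-in json module. The text in response_text is a made-up example of what an API might send back; the keys "city", "temperature_c", and "sensors" are assumptions for this sketch and do not come from any real service.

```python
import json

# Made-up example of JSON text that an API might return.
response_text = '{"city": "Delhi", "temperature_c": 31.5, "sensors": ["s1", "s2"]}'

# JSON text -> Python dictionary
record = json.loads(response_text)
print(record["city"], record["temperature_c"])

# Python dictionary -> JSON text (indented so that people can read it easily)
record["checked"] = True
print(json.dumps(record, indent=2))
```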
Data Features
Data features refer to the type of data that needs to be collected for an AI model or project.
1.Training Data: The collected data that is fed into the system is known as training data. In
other words, it is the input given by the user to the system so that it can learn.
Training data is also known as a training set, training dataset, or learning set.
2.Testing Data: Testing data is the input given to a software program during test
execution or evaluation.
Big data :
Data whose volume increases day by day is called big data. Because of its huge size, it
cannot be processed or analyzed using conventional methods. Some of the popular terms associated
with processing big data are
▪ Data Mining
▪ Data Storage
▪ Data Analysis
▪ Data Sharing
▪ Data Visualization
System Map
▪ A system map is a visual representation of a set of things working together. It shows the
components and boundaries of a system and the components of the environment at a specific
point in time.
▪ The specific components of a system map can vary depending on the complexity and purpose of
the system being represented
▪ Systems mapping is an effective tool that we can use for understanding and redesigning systems.
Use of System Map
▪ System Map helps us to find relationships between different elements of the problem which we
have scoped.
▪ System Map helps in strategizing the solution for achieving the goal of our project.
▪ System Map is used to understand complex issues with multiple factors that affect each other.
▪ The main use of a system map is to help structure a system and communicate the result to others.
An animated tool used for drawing and understanding system maps is:
https://ncase.me/loopy/
It is a tool used for making interactive simulations.
Example: https://tinyurl.com/Loopysample
List down 5 new data visualisation techniques which you learnt from
Regression
Regression is related to continuous data (value functions). This algorithm
is used for understanding and predicting continuous values such as
price, salary, age, etc. Regression works with continuous datasets. E.g. if
we want to predict the salary of an employee, we can use their past salaries
as training data and predict their next salary.
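The salary example can be sketched with the simplest kind of regression, a straight line fitted by least squares. This is one possible regression technique chosen for illustration, not necessarily the one intended by the notes; the function name fit_line and the salary figures are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: salary of one employee over five years.
years = [1, 2, 3, 4, 5]
salaries = [30000, 32000, 34000, 36000, 38000]

slope, intercept = fit_line(years, salaries)
predicted = slope * 6 + intercept    # continuous value predicted for year 6
print(round(predicted))              # prints 40000
```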
Supervised Machine Learning Applications
• Predictive analytics (house prices, stock exchange prices, etc.)
• Text recognition
• Spam detection
• Customer sentiment analysis
• Object detection (e.g. face detection)
What is Unsupervised Learning?
Unsupervised Learning is a type of machine learning in which the
machine is trained on an unlabeled dataset and produces its output
(such as groups or patterns in the data) without any supervision.
Applications of Unsupervised Learning Algorithms
• Fraud detection
• Malware detection
• Identification of human errors during data entry
• Conducting accurate basket analysis, etc.
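To show what learning from unlabeled data can look like, here is a minimal sketch that groups numbers into two clusters using a basic k-means loop (one common unsupervised technique, chosen here only as an illustration). The function name two_means and the purchase amounts are made up, and the sketch assumes the list contains at least two different values.

```python
def two_means(values, rounds=10):
    """Group 1-D values into two clusters with a basic k-means loop."""
    low, high = min(values), max(values)    # starting centres of the two groups
    for _ in range(rounds):
        # Put every value in the group whose centre is nearest.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# Made-up, unlabeled purchase amounts; nobody has marked which ones are unusual.
amounts = [120, 150, 130, 900, 950, 1000]
small, large = two_means(amounts)
print("Group 1:", small)
print("Group 2:", large)
```

No labels such as "normal" or "suspicious" are given to the program; it separates the values only by how close they are to each other.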
Difference between Supervised and Unsupervised Learning
Reinforcement Learning ~ It works on a feedback process. In
this model, the AI software gets rewarded for each good action
and punished for each bad action; hence the goal of
reinforcement learning is to maximize the rewards. E.g.
points in video games.
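The reward-and-punishment idea can be sketched as a tiny trial-and-error loop. This is a heavily simplified illustration and not a full reinforcement learning algorithm; the function name train_agent, the action names, and the reward values are all made up for the example.

```python
import random


def train_agent(rewards, episodes=100, explore=0.1, seed=0):
    """Learn by feedback which action collects the most reward."""
    rng = random.Random(seed)
    actions = list(rewards)
    score = {action: 0.0 for action in actions}   # total reward earned per action
    for step in range(episodes):
        if step < len(actions):
            action = actions[step]                 # try every action once first
        elif rng.random() < explore:
            action = rng.choice(actions)           # sometimes explore at random
        else:
            action = max(actions, key=score.get)   # otherwise repeat the best so far
        score[action] += rewards[action]           # feedback: reward (+) or penalty (-)
    return max(actions, key=score.get), score


# Made-up game: collecting a coin gives +1 point, hitting a wall gives -1.
game_rewards = {"collect_coin": 1, "hit_wall": -1}
best_action, score = train_agent(game_rewards)
print("Preferred action:", best_action)
```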
1. What is Data Modelling? How is it important?
2. What makes a machine intelligent?
3. Can Artificial Intelligence be a threat to Human Intelligence? How?
4. What are the different approaches used in AI Modelling? Explain.
5. Differentiate between Rule Based Approach and Learning Based
Approach with an example.
6. Explain the use of decision trees.
7. What are the common terms used in decision trees?
Pixel
The full form of pixel is "Picture Element". A pixel is the smallest element of an image
on a computer display.
Pixel It
The Pixel It activity is an example of how computers see images, process them, and classify them.
This kind of machine learning approach is commonly used in Computer Vision related
applications.
Steps in image processing
▪ Every image that is fed to the computer is divided into pixels (which are
the smallest units of an image).
▪ The computer analyses each pixel, and if it has to compare 2 pictures to
check whether they are similar or not, a pixel-wise comparison takes place.
▪ If all the pixels are identical, this means that the images are the same.
If we zoom into the image, we can see the pixels.
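The pixel-wise comparison described in the steps above can be sketched as follows. Here an image is assumed to be a list of rows, and each row a list of pixel values (0 for black, 255 for white); the function name images_identical and the tiny 2 x 2 images are made up for the example.

```python
def images_identical(image_a, image_b):
    """Compare two images pixel by pixel; each image is a list of rows of pixel values."""
    if len(image_a) != len(image_b):
        return False                     # different number of rows
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            return False                 # different number of pixels in a row
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a != pixel_b:
                return False             # one differing pixel is enough
    return True


# Made-up 2 x 2 grayscale images.
picture_1 = [[0, 255], [255, 0]]
picture_2 = [[0, 255], [255, 0]]
picture_3 = [[0, 255], [255, 255]]

print(images_identical(picture_1, picture_2))   # True
print(images_identical(picture_1, picture_3))   # False
```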
5th Stage of AI Project Cycle - Evaluation
What is Evaluation?
▪ After a model has been created and trained, it must be thoroughly tested in order to
determine its efficiency and performance. This testing is termed evaluation.
▪ Model evaluation is the process of using different evaluation metrics to understand a
machine learning model's performance, as well as its strengths and weaknesses.
▪ Evaluation helps us to identify the areas where future research is required.
▪ After the Evaluation stage, the project will be deployed.
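One simple evaluation metric is accuracy, the fraction of predictions that match the actual answers. It is used here only as an example of a metric; the notes do not name a specific one. The function name accuracy and the spam / ham labels below are made up for this sketch.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Made-up test results: what each message really was vs. what the model said.
actual = ["spam", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "ham"]

print(accuracy(actual, predicted))   # 4 of 5 correct, prints 0.8
```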