Karunadu Technologies Private Limited: (Affiliated To and Approved By)
#17, ATK complex, 4th Floor, Acharya college main road, beside
Karur Vysya bank, Guttebasaveshwaranagar Chikkabanvara,
Bengaluru, Karnataka-560090
Internship Report
On
Artificial Intelligence And Machine Learning
By
Abdull Gaffur
(1HK21CS004)
BACHELOR OF ENGINEERING
In
COMPUTER SCIENCE & ENGINEERING
CHAPTER 1
Company Profile
1.1 Profile
The company offers a broad range of customized software applications backed by sound technology and industry expertise. It also offers end-to-end embedded solutions and services. It undertakes a wide range of product development with customized features, ensuring the utmost customer satisfaction, and also empowers individuals with the knowledge, skills and competencies that help them grow into well-rounded professionals with a sense of commitment and dedication.
1.1.1 Vision
To empower unskilled individuals with the knowledge, skills and technical competencies in the fields of Information Technology and Embedded Engineering that help them grow into well-rounded individuals contributing to the company's and the nation's growth.
1.1.2 Mission
Provide cost-effective and reliable solutions to customers across the latest technologies.
Offer scalable end-to-end application development and management solutions.
Provide cost-effective, highly scalable products for varied verticals.
Focus on creating sustainable value growth through innovative solutions and unique partnerships.
Create, design and deliver business solutions with high value and innovation by leveraging technology expertise and innovative business models to address long-term business objectives.
Keep our products and services updated with the latest innovations in the respective requirements and technologies.
1.1.3 Objectives
To develop software and embedded solutions and services focusing on quality standards and customer values.
To offer end-to-end embedded solutions that ensure the best customer satisfaction.
To build a skilled and talented manpower pool for global industry requirements.
To develop software and embedded products that are globally recognized.
To become a global leader in offering scalable and cost-effective software solutions and services across various domains such as e-commerce, banking, finance, healthcare and more.
To generate employment for the skilled and highly talented youth of our country, India.
1.2.1 Products
Karunadu Enterprise Content Management System (KECMS) is a one-stop solution for all enterprise content management needs relating to digital asset management, document imaging, workflow systems and records management. Increasing digitalization has led to exponential growth in business content, and managing this sea of unstructured data is tedious work. KECMS enables you to create, capture, manage, distribute and archive different forms of content, and offers many more features.
The education management solution manages diversified education-related data on the cloud. Educational data on students and staff is gathered over the years, covering information from admission/appointment until they leave the institution. Statistical reports for the college/school can be generated, along with admission tracking and result analysis, to keep track of the progressive improvement of both students and staff.
A complete one-stop embedded solution for large apartments: a security system which monitors door breakage, window breakage, gas leakage, motion and various other events, operated and maintained through a centralized monitoring system. This embedded solution strengthens the security of the apartment/building and of individuals, whether against unintended intervention or unauthorized access.
1.2.2 Services
Karunadu Technologies is a Bangalore-based IT training and software development centre with exclusive expertise in the area of IT services and solutions. Karunadu Technologies Pvt. Ltd. also has expertise in web designing and consulting services.
Karunadu Technologies Pvt. Ltd. has expertise in the design and development of embedded products and offers solutions and services in the field of electronics.
Academic Projects
Karunadu Technologies Pvt. Ltd. helps students in their academics by bringing industrial experience into projects so that students can strive for excellence. Karunadu Technologies Pvt. Ltd. encourages students to implement their own ideas in projects, keeping in mind that "a small seed sown upfront will be nourished to become a large tree one day", thereby nurturing future entrepreneurs. It offers a wide range of IEEE projects for B.E., M.Tech, MCA, BCA and Diploma students across all branches and domains.
Inplant Training
Karunadu Technologies Pvt. Ltd. provides inplant training for students according to their interests, keeping in mind current technology and the academic benefit gained after completing the training. Students are nurtured and trained throughout with practical experience and are exposed to industrial standards that boost their careers. They become acquainted with various structural divisions such as labs, workshops, assembly units, stores, administrative units and machinery units, and are helped to understand their functions, applications and maintenance. Students are trained from the initial stage, that is, from collection of project requirements through project planning, design, implementation, testing, deployment and maintenance, thereby helping them understand the business model of the industry. The entire project life cycle is demonstrated with hands-on experience. Students are also trained in management skills and team-building activities. By the end of the inplant training, students are assured of enhanced communication skills and of acquiring technical, employability, start-up and management skills, awareness of risks in industry, and many other skills helpful for professional engagement.
Software Courses
Karunadu Technologies Pvt. Ltd. provides courses for students according to their interests, keeping in mind current technology, and assists them with their future employment. The company offers various courses such as C, C++, VB, DBMS, .NET, Core Java and J2EE, along with live projects.
CHAPTER 2
The objective of the internship is to apply theoretical knowledge of “Machine Learning using Python” to solve real-time, complex problems. To achieve this, the following basic concepts were learnt:
Python
Machine Learning
2.1 Python
Extensive support libraries (NumPy for numerical calculations, Pandas for data analytics).
Open source with community-driven development.
Dynamically typed language (no need to declare a variable's data type; it is inferred from the assigned value).
Object-oriented, portable and interactive across operating systems.
Machine Learning, as the name suggests, is the science of programming computers so that they are able to learn from different kinds of data. A more general definition, given by Arthur Samuel, is: “Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.” It is typically used to solve various kinds of real-life problems.
In the past, Machine Learning tasks were performed by manually coding all the algorithms and the mathematical and statistical formulae, which made the process time-consuming, tedious and inefficient. Today the work is far easier and more efficient thanks to various Python libraries, frameworks and modules. The Python libraries used in Machine Learning here are:
NumPy
Pandas
Matplotlib
2.2.1 Numpy
NumPy is the basic package for scientific computing in Python. It provides a powerful N-dimensional array structure, sophisticated functions, tools for integrating C/C++ and Fortran code, and linear algebra, Fourier transform and random-number capabilities. Besides its obvious scientific uses, NumPy can also be used as an efficient multidimensional container of generic data.
The main feature of NumPy is the NumPy array, on which you can perform various operations. The key point is that a NumPy array isn't just a regular array you'd see in a language like Java or C++; it behaves like a mathematical object such as a vector or a matrix. That means you can perform vector and matrix operations like addition, subtraction and multiplication. The most important aspect of NumPy arrays is that they are optimized for speed.
NumPy is a general-purpose array-processing package: it provides a high-performance multidimensional array object and tools for working with these arrays, and it is the fundamental package for scientific computing with Python.
NumPy arrays allow a wide range of operations to be performed on a single array or on a combination of arrays. These include basic mathematical operations as well as unary and binary operations, as demonstrated below.
# Python program to demonstrate
# basic operations on a single array
import numpy as np

a = np.array([[1, 2], [3, 4]])   # Defining array 1
b = np.array([[4, 3], [2, 1]])   # Defining array 2

print("Adding 1 to every element:\n", a + 1)           # Adding 1 to every element
print("\nSubtracting 2 from each element:\n", b - 2)   # Subtracting 2 from each element
print("\nSum of all array elements:", a.sum())         # Performing a unary operation
print("\nArray sum:\n", a + b)                         # Performing a binary operation
2.2.2 Pandas
# Python program using Pandas to arrange a given set of data into a table
import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

data_table = pd.DataFrame(data)   # Build a DataFrame (table) from the dictionary
print(data_table)
2.2.3 Matplotlib
Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly related to Machine Learning, but it comes in handy when a programmer wants to visualize patterns in the data. It is a 2D plotting library used for creating 2D graphs and plots. A module named pyplot makes plotting easy for programmers, as it provides features to control line styles, font properties, axis formatting, etc. It provides various kinds of graphs and plots for data visualization, viz., histograms, error charts, bar charts, etc.
# Python program using Matplotlib to draw a simple linear plot
import matplotlib.pyplot as plt   # Importing the necessary packages and modules
import numpy as np

x = np.linspace(0, 10, 100)       # Prepare the data
plt.plot(x, x, label='linear')    # Plot the data
plt.legend()                      # Add a legend
plt.show()                        # Show the plot
2.4 OpenCV
OpenCV is a huge open-source library for computer vision, machine learning and image processing. It supports a wide variety of programming languages such as Python, C++ and Java, and it can process images and videos to identify objects, faces or even human handwriting. When it is integrated with other libraries, such as NumPy (a highly optimized library for numerical operations), the number of tools in your arsenal increases: whatever operations one can do in NumPy can be combined with OpenCV.
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.
Here we convert the given image into a grayscale (black-and-white) image. The function used is cvtColor, and the conversion code for grayscale is 'COLOR_BGR2GRAY'. Similarly, for the HSV colour space the conversion code is 'COLOR_BGR2HSV'. The following is an example of both conversions.
CODE:
import cv2

path = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\cats.jpg"
img = cv2.imread(path)
cv2.imshow("Original Image", img)

img1 = cv2.resize(img, (512, 512))
cv2.imshow("Resized Image", img1)

grayimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # Convert to grayscale
grayimg = cv2.resize(grayimg, (512, 512))
cv2.imshow("Gray Image", grayimg)

hsvimg = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)     # Convert to HSV colour space
hsvimg = cv2.resize(hsvimg, (512, 512))
cv2.imshow("HSV Image", hsvimg)

cv2.waitKey(0)
cv2.destroyAllWindows()
Resizing Image
Here we can resize the original image to whatever size we require, using the 'resize' function. Later applications require images to be resized so that they match one another; this is particularly useful when cascading two or more images.
CODE:
import cv2
path = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\Image.png"
img = cv2.imread(path)
cv2.imshow("Original Image", img)
img1= cv2.resize(img,(512,512))
cv2.imshow("Resized Image",img1)
cv2.waitKey(0)
cv2.destroyAllWindows()
In order to cascade or add images, they must be of the same size, so we first use the resize function. Adding the images here means pasting them into a single image with the required transparency, which can be adjusted as needed.
CODE:
import cv2

path1 = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\Image.png"
img1 = cv2.imread(path1)
path2 = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\cats.jpg"
img2 = cv2.imread(path2)

img1 = cv2.resize(img1, (512, 512))   # Both images must be the same size before adding
img2 = cv2.resize(img2, (512, 512))

addimages = cv2.add(img1, img2)                             # Plain pixel-wise addition
addtransparency = cv2.addWeighted(img1, 0.8, img2, 0.2, 0)  # Weighted blend (80% / 20%)

cv2.imshow("First Image", img1)
cv2.imshow("Second Image", img2)
cv2.imshow("Added Image", addimages)
cv2.imshow("Added Image with transparency", addtransparency)
cv2.waitKey(0)
cv2.destroyAllWindows()
Rotating Images
Here we rotate images clockwise or anticlockwise by the required amount using the 'rotate' function.
CODE:
import cv2
path = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\cats.jpg"
img = cv2.imread(path)
imgresize = cv2.resize(img, (300, 300))
img90 = cv2.rotate(imgresize, cv2.ROTATE_90_CLOCKWISE)          # Rotate 90 degrees clockwise
img180 = cv2.rotate(imgresize, cv2.ROTATE_180)                  # Rotate 180 degrees
img270 = cv2.rotate(imgresize, cv2.ROTATE_90_COUNTERCLOCKWISE)  # Rotate 90 degrees anticlockwise
cv2.imshow("Rotated 90", img90)
cv2.imshow("Rotated 180", img180)
cv2.imshow("Rotated 270", img270)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here we detect the face and eyes of an individual, whether alone or in a group; more than one face and pair of eyes can be detected at a time. We use Haar cascades, which are pre-trained classifiers, to detect faces and eyes. Faces and eyes can be detected in an image or in a video, and live screening is also possible; different cascade files are needed for the different tasks.
CODE:
import cv2
path = "C:\\Users\\Admin\\Desktop\\ajay\\Images\\rotated_face.jpeg"
img = cv2.imread(path)
# Load the pre-trained Haar cascade for frontal face detection
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(img, 1.1, 4)   # Detect all faces in the image
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 8)   # Draw a box around each face
cv2.imshow("Face detected", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to become more accurate at predicting outcomes without being explicitly
programmed to do so. Machine learning algorithms use historical data as input to predict new
output values.
Definition: Ability of a machine to improve its own performance through the use of software
that employs artificial intelligence techniques to mimic the ways by which humans seem to
learn, such as repetition and experience.
ML is concerned with understanding the learning process (both human learning and other forms of learning), with the computational aspects of learning behaviours, and with building learning capability into computer systems. The goal of ML, in simple words, is to understand the nature of human and other forms of learning and to build learning capability into computers. More specifically, there are three aspects to the goals of ML:
To make the computers smarter, more intelligent. The more direct objective in this aspect
is to develop systems (programs) for specific practical learning tasks in application
domains.
To develop computational models of human learning process and perform computer
simulations. The study in this aspect is also called cognitive modelling.
To explore new learning methods and develop general learning algorithms independent of
applications.
Supervised learning
Unsupervised Learning
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. Unlike supervised learning, no teacher is provided, which means no training is given to the machine; the machine is therefore left to find the hidden structure in the unlabelled data by itself.
For instance, suppose the machine is given images containing both dogs and cats that it has never seen before. The machine has no idea about the features of dogs and cats, so it cannot label them as dogs and cats. But it can group them according to their similarities, patterns and differences, i.e., it can easily split the pictures into two parts: one part may contain all the pictures with dogs in them, and the other all the pictures with cats.
Unsupervised learning is classified into two categories of algorithms (a small clustering sketch follows this list):
Clustering: A clustering problem is one where we want to discover the inherent groupings in the data, as shown in fig 2.11, such as grouping customers by purchasing behaviour.
Association: An association rule learning problem is one where we want to discover rules that describe large portions of the data, such as "people that buy X also tend to buy Y".
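As a small illustration of clustering (not code from the internship itself), the following minimal sketch groups a handful of 2-D points with scikit-learn's KMeans; the toy data and the choice of two clusters are assumptions made purely for demonstration.

# Minimal clustering sketch using scikit-learn's KMeans (toy data, illustrative only)
import numpy as np
from sklearn.cluster import KMeans

# Two loose groups of 2-D points (assumed data)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.5], [8.3, 8.0], [7.8, 8.7]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)   # Ask for two groups
labels = kmeans.fit_predict(X)                             # Assign each point to a cluster

print("Cluster labels:", labels)
print("Cluster centres:\n", kmeans.cluster_centers_)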
Reinforcement learning
Machine learning has been recognized as central to the success of Artificial Intelligence, and
it has applications in various areas of science, engineering and society. Some of them are:
Product recommendations (e.g., Amazon etc.)
Refining search engine results (e.g., Google)
Fighting web spam (e.g., Gmail)
Video surveillance (e.g., crime alerts)
Face recognition and many more.
CHAPTER 3
Machine Learning Algorithms
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
Linear regression is one of the most widely known modelling techniques and is usually among the first topics people pick up while learning predictive modelling. In this technique the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear. Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line), as shown in fig 3.2. It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line and e is the error term. This equation can be used to predict the value of the target variable from given predictor variables.
The fit can be accomplished by the least squares method, the most common method used for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before being added, positive and negative deviations do not cancel out.
[Figure: Input graph]
There must be a linear relationship between the independent and dependent variables. Linear regression is very sensitive to outliers, which can badly distort the regression line and eventually the forecasted values. Simple linear regression is used for finding the relationship between the dependent variable Y and the independent (predictor) variable X, both of which are continuous in nature. While performing simple linear regression we assume that the values of the predictor variable X are controlled and not subject to measurement error, and that the corresponding values of Y are observed.
The equation of a simple linear regression model for calculating the value of the dependent variable Y based on the predictor X is:
𝑦𝑖 = 𝑏0 + 𝑏1𝑥𝑖 + 𝑒𝑖
where 𝑦𝑖 is calculated from the input value 𝑥𝑖 for every ith data point, the regression coefficients are denoted by 𝑏0 and 𝑏1, and 𝑒𝑖 is the measurement error associated with the ith data point.
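To make the least-squares fit concrete, here is a minimal sketch using scikit-learn's LinearRegression; the data values are invented for illustration and are not taken from the report.

# Minimal simple linear regression sketch (toy data, illustrative only)
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data roughly following y = 2x + 1 with a little noise
X = np.array([[1], [2], [3], [4], [5]])      # Predictor (2-D array, as sklearn expects)
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])     # Target

model = LinearRegression().fit(X, y)          # Least-squares fit

print("Intercept (b0):", model.intercept_)
print("Slope (b1):", model.coef_[0])
print("Prediction for x = 6:", model.predict([[6]])[0])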
Regression analysis is implemented to do the following:
Establish a linear relationship between the independent and dependent variables.
Use the input variables x1, x2, …, xn to predict the value of y.
Identify the independent variables carefully so that the dependent variable is explained precisely; this allows a more accurate causal relationship between the variables to be established.
Advantages
Linear regression is an extremely simple method. It is very easy and intuitive to use and
understand. A person with only the knowledge of high school mathematics can understand and
use it. In addition, it works in most of the cases. Even when it doesn’t fit the data exactly, we can
use it to find the nature of the relationship between the two variables.
Disadvantages
By definition, linear regression only models relationships between the dependent and independent variables that are linear. It assumes a straight-line relationship between them, which is sometimes incorrect. Linear regression is also very sensitive to anomalies in the data (outliers).
For example, suppose most of your data lies in the range 0-10. If for any reason a single data item falls outside that range, say at 15, it significantly influences the regression coefficients.
Another disadvantage is that if there are more parameters than samples available, the model starts to model the noise rather than the relationship between the variables.
In many cases more than one predictor variable must be considered when finding the value of the response variable. Simple linear models cannot be used here, so multiple linear regression is needed to analyse the predictor variables. The difference between simple and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. Using two explanatory variables, the equation of multiple linear regression can be written as:
𝑦𝑖 = 𝑏0 + 𝑏1𝑥1 + 𝑏2𝑥2 + 𝑒𝑖
The two explanatory variables 𝑥1 and 𝑥2 determine 𝑦𝑖 for the ith data point. The fitted value is determined by the three parameters 𝑏0, 𝑏1 and 𝑏2 of the model and by the residual 𝑒𝑖 of point i from the fitted surface. A general multiple regression model can be represented as:
𝑦𝑖 = ∑ 𝑏𝑖 𝑥𝑖 + 𝑒𝑖
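As a brief follow-up sketch (again with invented numbers), the same LinearRegression estimator handles more than one predictor; the two explanatory columns below are assumed purely for demonstration.

# Minimal multiple linear regression sketch (toy data, illustrative only)
import numpy as np
from sklearn.linear_model import LinearRegression

# Two predictors per sample, roughly following y = 1 + 2*x1 + 3*x2
X = np.array([[1, 1], [2, 1], [2, 3], [4, 2], [5, 4]])
y = np.array([6.1, 7.9, 14.2, 15.1, 23.0])

model = LinearRegression().fit(X, y)
print("Intercept b0:", model.intercept_)
print("Coefficients b1, b2:", model.coef_)
print("Prediction for x1 = 3, x2 = 2:", model.predict([[3, 2]])[0])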
Advantages
The ability to determine the relative influence of one or more predictor variables to the
criterion value.
The ability to identify outliers, or anomalies.
Disadvantage
Any disadvantage of using a multiple regression model usually comes down to the
data being used. Two examples of this are using incomplete data and falsely
concluding that a correlation is causation.
Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression estimates the parameters of a logistic model (a form of binary regression). Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail, represented by an indicator variable whose two values are labelled "0" and "1". In the logistic model, the log-odds (the logarithm of the odds) of the value labelled "1" is a linear combination of one or more independent variables ("predictors"); each independent variable can be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value). The corresponding probability of the value labelled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labelling; the function that converts log-odds to probability is the logistic function, hence the name.
The unit of measurement for the log-odds scale is called a logit, from "logistic unit", hence the alternative names. Analogous models with a different sigmoid function instead of the logistic function can also be used, such as the probit model; the defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio.
This is where logistic regression comes into play. In logistic regression you get a probability score that reflects the probability of occurrence of the event. An event in this case is each row of the training dataset; it could be something like classifying whether a given email is spam, whether a mass of cells is malignant, or whether a user will buy a product, and so on.
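A minimal sketch of binary logistic regression with scikit-learn follows; the tiny two-feature data set with 0/1 labels is invented purely for illustration.

# Minimal logistic regression sketch (toy data, illustrative only)
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two features per sample; label 1 is the "positive" class (e.g. spam), 0 otherwise
X = np.array([[0.5, 1], [1.0, 0], [1.5, 1], [3.0, 4], [3.5, 5], [4.0, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

print("Predicted class for [2.0, 2]:", clf.predict([[2.0, 2]])[0])
print("Probability of class 1:", clf.predict_proba([[2.0, 2]])[0][1])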
Advantages
Disadvantages
While working with logistic regression, you are not able to handle a large number of categorical features/variables.
It is vulnerable to overfitting.
The logistic regression model cannot solve non-linear problems, which is why it requires a transformation of non-linear features.
Logistic regression will not perform well with independent (X) variables that are not correlated with the target (Y) variable.
3.4 KNN
K nearest neighbors or KNN Algorithm is a simple algorithm which uses the entire dataset in its
training phase. Whenever a prediction is required for an unseen data instance, it searches through
the entire training dataset for k-most similar instances and the data with the most similar instance
is finally returned as the prediction. KNN is often used in search applications where you are
looking for similar items, like find items similar to this one.
KNN is a Supervised Learning algorithm that uses labelled input data set to predict the
output of the data points.
It is one of the simplest Machine learning algorithms and it can be easily implemented for a
varied set of problems.
It is mainly based on feature similarity. KNN checks how similar a data point is to its
neighbor and classifies the data point into the class it is most similar to.
Unlike most algorithms, KNN is a non-parametric model which means that it does not make
any assumptions about the data set. This makes the algorithm more effective since it can
handle realistic data.
KNN is a lazy algorithm; this means that it memorizes the training data set instead of
learning a discriminative function from the training data.
KNN can be used for solving both classification and regression problems.
3.4.2 Working
In KNN, K is the number of nearest neighbours, and this number is the core deciding factor. K is generally an odd number when the number of classes is 2; when K = 1 the algorithm is known as the nearest-neighbour algorithm, which is the simplest case. The number of neighbours (K) is a hyperparameter that needs to be chosen at the time of model building. Research has shown that no single optimal number of neighbours suits all kinds of data sets; each data set has its own requirements. Generally, data scientists choose K as an odd number when the number of classes is even. We can also build the model with different values of K and compare their performance.
Suppose ‘?’ is the point whose label needs to be predicted. First, we find the K points closest to ‘?’ and then classify ‘?’ by a majority vote of its K neighbours: each neighbour votes for its own class, and the class with the most votes is taken as the prediction. For finding the closest points, the distance between points is computed using measures such as Euclidean distance, Hamming distance, Manhattan distance or Minkowski distance. In the simplest case (K = 1), the single nearest point is found and its label is assigned to ‘?’. A small sketch is given below.
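The idea can be sketched with scikit-learn's KNeighborsClassifier; the toy 2-D points and the choice K = 3 below are assumptions made only for illustration.

# Minimal KNN classification sketch (toy data, illustrative only)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points belonging to two classes (0 and 1)
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # K = 3 neighbours, Euclidean distance by default
knn.fit(X, y)                               # "Training" just stores the data (lazy learner)

query = np.array([[2, 2]])                  # The '?' point whose label we want
print("Predicted class:", knn.predict(query)[0])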
Advantages
Disadvantages
3.5 SVM
“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges; however, it is mostly used for classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.
SVM is a Supervised Learning algorithm that uses labelled input data set to predict the output
of the data points.
It is one of the simplest Machine learning algorithms and it can be easily implemented for a
varied set of problems.
SVM can be used for solving both classification and regression problems.
3.5.2 Working
The working of the SVM algorithm can be understood using an example. Suppose we have a dataset with two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue. Since this is a 2-D space, the two classes can be separated by a straight line, but there can be multiple lines that separate them. Consider the below image:
The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of each class to the boundary; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane. A small sketch is given below.
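A minimal sketch of a linear SVM classifier with scikit-learn is given below; the two small clusters of 2-D points are invented just to illustrate fitting and predicting.

# Minimal SVM classification sketch (toy data, illustrative only)
import numpy as np
from sklearn.svm import SVC

# Two linearly separable groups of 2-D points
X = np.array([[1, 2], [2, 3], [2, 1], [7, 8], [8, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear')   # Linear kernel: find the maximum-margin separating line
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)
print("Prediction for [3, 3]:", clf.predict([[3, 3]])[0])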
3.5.3 Advantages and Disadvantages
Advantages
SVM works relatively well when there is a clear margin of separation between classes.
SVM is more effective in high dimensional spaces.
SVM is effective in cases where the number of dimensions is greater than the number of
samples.
SVM is relatively memory efficient.
Disadvantages
Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. It is a graphical representation for obtaining all the possible solutions to a problem/decision based on the given conditions. In a decision tree there are two kinds of nodes, decision nodes and leaf nodes. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand.
The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
It is very easy to understand and implement.
3.6.2 Working
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next node. For the next node, the algorithm again compares the attribute value with those of the other sub-nodes and moves further, continuing this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are called leaf nodes.
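A minimal sketch of training a decision tree classifier with scikit-learn follows; the tiny two-feature data set and the depth limit are assumptions made only for illustration.

# Minimal decision tree classification sketch (toy data, illustrative only)
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy samples: [feature_1, feature_2] with binary class labels
X = np.array([[1, 10], [2, 12], [3, 11], [8, 2], [9, 3], [10, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(criterion='gini', max_depth=2)   # Attribute selection via the Gini index
tree.fit(X, y)

print(export_text(tree, feature_names=['feature_1', 'feature_2']))   # Show the learned decision rules
print("Prediction for [4, 9]:", tree.predict([[4, 9]])[0])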
Advantages
It is simple to understand, as it follows the same process a human follows while making a decision in real life.
It can be very useful for solving decision-related problems.
It helps in thinking through all the possible outcomes of a problem.
It requires less data cleaning compared to other algorithms.
Disadvantages
CHAPTER 4
Project-1 Description
Prediction of different class types of flowers
The dataset contains information on various flowers, including features such as area code,
locality code, region code, height, diameter, and species. Each row represents a distinct flower,
and the target variable, "Class," indicates the flower's specific class type. The dataset is suitable
for a classification task aiming to predict the class of a flower based on its characteristics.
4.2 Dataset
The dataset has columns such as `Area_Code`, `Locality_Code`, `Region_Code`, `Height`, `Diameter`, `Class` and `Species`. Each row in the dataset represents information about a single flower, and the `Class` column holds the class of that flower.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
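The report's full training code is not reproduced here. As a hedged sketch of how these imports would typically be combined, the pipeline might look like the following; the file name 'flowers.csv', the exact column names and K = 5 are assumptions based only on the dataset description above.

# Hypothetical continuation of the imports above (file name, columns and K are assumed)
df = pd.read_csv("flowers.csv")                          # Assumed dataset file

X = df[["Area_Code", "Locality_Code", "Region_Code", "Height", "Diameter"]]
y = df["Class"]                                          # Target: the flower's class type

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()                                # Scale features so distances are comparable
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)                # K = 5 chosen arbitrarily for the sketch
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))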
Output:-
Django Output: -
4.5 CONCLUSION
In conclusion, this project aimed to predict the class type of flowers using a K-Nearest
Neighbors (KNN) classifier based on features such as area code, locality code, region code,
height, diameter, and species. The model demonstrated its capability to make accurate
predictions, achieving [insert evaluation metric] accuracy on the test set. Data preprocessing,
feature engineering, and model training were essential steps in optimizing performance. Future
enhancements could involve exploring alternative classification algorithms and fine-tuning
hyperparameters for further improvements. Overall, the project provides valuable insights into
leveraging machine learning for flower classification tasks.
CHAPTER 5
Project-2 Description
Classify different types of objects
(bottle, chair, mouse, laptop)
5.1 Problem Statement
The provided Python script implements a Convolutional Neural Network (CNN) using
TensorFlow and Keras for the classification of images depicting various objects such as bottles,
chairs, mice, and laptops. The script loads a pre-trained model and processes a test dataset,
resizing images to 30x30 pixels. It then predicts the object type for each image, utilizing one-hot
encoding for categorical labels. The output includes the predicted class for each test image. The
project aims to automate the identification of different objects through machine learning,
offering a practical solution for object recognition and classification in images, with potential
applications in computer vision and automated systems.
5.2 Dataset
The dataset utilized in the project encompasses images of various objects, including bottles,
chairs, mice, and laptops. These images are categorized into distinct classes representing each
object type. Resized to 30x30 pixels, the images are sourced from the 'dataset/train/' directory.
The dataset is partitioned into training and testing sets, accompanied by corresponding labels
specifying the object type, facilitating the construction and assessment of a Convolutional
Neural Network (CNN) model for automated classification. This dataset and experimental setup
aim to develop an effective model for identifying and categorizing different types of objects
through image analysis and machine learning techniques.
CODE:-
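The original script is not reproduced in this excerpt. The following is a hedged, minimal sketch of the kind of Keras CNN described above (30x30 input images, four object classes); the 'dataset/train/' directory layout with one sub-folder per class, the layer sizes and the epoch count are assumptions made for illustration.

# Hypothetical CNN sketch for 4-class object classification (paths and layer sizes assumed)
import tensorflow as tf
from tensorflow.keras import layers, models

# Load 30x30 images from 'dataset/train/', one sub-folder per class (assumed layout)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train/", image_size=(30, 30), batch_size=32)

num_classes = 4   # bottle, chair, mouse, laptop

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(30, 30, 3)),   # Normalize pixel values to [0, 1]
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),         # One output per object class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_ds, epochs=10)          # Train on the assumed directory of images
model.save("object_classifier.h5")      # Save the model for later prediction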
Output:-
5.3 CONCLUSION
The internship is dedicated to harnessing the power of Artificial Intelligence (AI) methodologies
to practically apply theoretical concepts in tackling real-world challenges. The central objective
is to develop an efficient prediction model for categorizing diverse objects, including bottles,
chairs, mice, and laptops, by implementing appropriate machine learning algorithms. This
project not only hones programming skills but also provides an opportunity to apply
foundational knowledge to address tangible issues. Specifically, AI techniques are utilized to
predict the types of objects through the analysis of images. Commencing with a foundation in
Supervised Learning, the internship explores the application of AI to solve problems in the realm
of object recognition and classification.