Skumar

The document is an internship report by Umang Kashyap on 'Machine Learning Using Python' submitted for a Diploma in Engineering in Computer Science & Engineering. It includes an introduction to machine learning, its architecture, various algorithms, and the importance of data preprocessing. The report also discusses different types of machine learning systems such as supervised, unsupervised, semi-supervised, and reinforcement learning.


INTERNSHIP ON MACHINE LEARNING USING PYTHON

In-plant internship report submitted in partial
fulfilment of the requirement for the award of

Diploma in Engineering

in

Computer Science & Engineering

Submitted by: UMANG KASHYAP (511121822030)
Under the guidance of: ER. AMIT KUMAR (T.P.O)

Internship place: National Institute of Electronics & Information
Technology, Patna

Government Polytechnic, Bhagalpur
Barari, Bhagalpur, Bihar 812003
Affiliated to S.B.T.E, Bihar, Patna,
Approved by AICTE New Delhi,
Recognized by DST, Govt. of Bihar

DECLARATION

I, Umang Kashyap, the student bearing Board Roll No. 511121822030,
hereby certify that this report entitled “Machine Learning Using
Python”, carried out under the guidance of ER. AMIT KUMAR (T.P.O), is
submitted to the State Board of Technical Education, Bihar, Patna in
partial fulfilment of the requirements for the award of Diploma in
Engineering in Computer Science and Engineering. This is a record of
bonafide work carried out by me, and the results embodied in this
report have not been reproduced or copied from any source. The results
embodied in this report have not been submitted to any other University
or Institute for the award of any other degree.

Date:-
NAME : Umang Kashyap
SEMESTER : 5TH
CLASS ROLL : 23/CSE-030
BOARD ROLL NO : 511121822030
DEPARTMENT OF C.S.E,
GOVT. POLYTECHNIC BHAGALPUR

GOVERNMENT POLYTECHNIC, BHAGALPUR
Affiliated to S.B.T.E, Bihar, Patna, Approved by AICTE New Delhi,
Recognized by DSTTE, Govt. of Bihar

Department of Computer Science & Engineering

Bonafide Certificate

This is to certify that the Project entitled “MACHINE LEARNING USING
PYTHON”, being submitted by Umang Kashyap, bearing Roll
No. 511121822030, to the State Board of Technical Education, Bihar, Patna in
partial fulfilment of the requirements for the award of Diploma in
Engineering in Computer Science & Engineering, is a record of bonafide
work carried out by him. The results of investigations enclosed in this report
have been verified and found satisfactory. The results embodied in this report
have not been submitted to any other University or Institute for the award of
any other degree or diploma.

Project Guide: Amit Sir and Sandeep Sir
Head of the Department: Er. Sandeep Sir

ACKNOWLEDGEMENT

With great pleasure I take this opportunity to express my heartfelt
gratitude to all the persons who helped me in making this project work a
success.

First of all, I am highly indebted to the Principal, Dr. Kamal Kishore
Pathak, for giving me the permission to carry out this project.

I would like to thank ER. Sandeep Sir, Lecturer & Head of the Department
(CSE), for giving support throughout the period of my study at GP
Bhagalpur. I am grateful for his valuable suggestions and guidance during
the execution of this project work.

My sincere thanks to project guide Amit Sir for patiently explaining the
entire system and clarifying the queries at every stage of the project.

I am also thankful to Govt. Polytechnic Bhagalpur and our Principal
Sir, who provided immense support as well as answers to the queries that
I kept firing at them during the development of this application. My
wholehearted thanks to the staff of GP Bhagalpur, who co-operated with us
for the completion of the project in time.

I also thank my parents and friends who aided me in the completion of the
project/training.

TABLE OF CONTENTS

1. Introduction to Machine Learning
2. Architecture of Machine Learning Model
3. Classification of Machine Learning
4. Types of Machine Learning Algorithm
5. Flow Chart of Machine Learning
6. Basic Python
7. Conditional Statement, Strings, List, Tuples, Indexing and Slicing
8. Python Machine Learning Packages
9. Future of Machine Learning
10. Key Research Papers and Articles using Python

1. Introduction to Machine Learning

Machine learning is the science of getting computers to act without
being explicitly programmed. In the past decade, machine learning has
given us self-driving cars, practical speech recognition, effective web
search, and a vastly improved understanding of the human genome.
Machine learning is so pervasive today that you probably use it dozens
of times a day without knowing it. Many researchers also think it is
the best way to make progress towards human-level AI.

General Definition: The ability of a machine to improve its own
performance through the use of software that employs artificial
intelligence techniques to mimic the ways by which humans seem
to learn, such as repetition and experience.

ML Definition by Tom M. Mitchell: A computer program is said to
learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured
by P, improves with experience E.

Machine Learning:
Machine learning is concerned with developing computational theories of
learning and building learning machines. The goal of machine learning,
closely coupled with the goal of AI, is to achieve a thorough understanding
of the nature of the learning process (both human learning and other forms
of learning) and of the computational aspects of learning behaviours, and to
implant the learning capability in computer systems. Machine learning has
been recognized as central to the success of Artificial Intelligence, and it
has applications in various areas of science, engineering and society.

Learning?
Learning is a phenomenon and process which manifests itself in various
aspects. Roughly speaking, the learning process includes (one or more of) the
following:
1.) Acquisition of new (symbolic) knowledge.
2.) Development of cognitive skills through instruction and practice.
3.) Refinement and organization of knowledge into more effective
representations or more useful forms.
4.) Discovery of new facts and theories through observation and experiment.

The general effect of learning in a system is the improvement of the system's
capability to solve problems. It is hard to imagine that a system capable of
learning cannot improve its problem-solving performance. A system with
learning capability should be able to change itself in order to perform
better in its future problem-solving.

We also note that learning cannot take place in isolation: we typically
learn something (knowledge K) to perform some tasks (T), through some
experience E, and whether we have learned well or not will be judged by
some performance criteria P at the task T.

There are various forms of improvement of a system's problem-solving ability:
1.) To solve a wider range of problems than before and perform generalization.
2.) To solve the same problem more effectively and give better quality.
3.) To solve the same problem more efficiently and faster.

The Goals of Machine Learning

The goal of ML, in simple words, is to understand the nature of (human and
other forms of) learning, and to build learning capability in computers. To be
more specific, there are three aspects of the goals of ML:

1) To make computers smarter and more intelligent. The more direct
objective in this aspect is to develop systems (programs) for specific
practical learning tasks in application domains.
2) To develop computational models of the human learning process and perform
computer simulations. The study in this aspect is also called cognitive
modelling.
3) To explore new learning methods and develop general learning
algorithms independent of applications.

Why are the goals of ML important and desirable?

Present-day computer programs in general (with the exception of some ML
programs) cannot correct their own errors or improve from past mistakes, or
learn to perform a new task by analogy to a previously seen task. In contrast,
human beings are capable of all the above. ML will produce smarter
computers capable of all the above intelligent behaviour.

It is clear that central to our intelligence is our ability to learn. Thus a
thorough understanding of the human learning process is crucial to
understanding human intelligence. ML will give us insight into the underlying
principles of human learning, and that may lead to the discovery of more
effective education techniques. It will also contribute to the design of
machine learning systems.

2. Architecture of Machine Learning Model

If we go into the details of the machine learning process, we first identify,
choose and get the data that we want to work with. The data with which we
start is raw and unstructured; it is never in the correct form needed for
actual processing. It could have duplicate data, or data that is missing, or
else a lot of extra data that is not needed. The data could come from
various sources, which may also end up producing duplicate or redundant
data. In this case, there comes the requirement for pre-processing the data,
so that the learning process can understand it, and the good thing is that
machine learning products usually provide some data pre-processing modules to
process the raw or unstructured data.

So, in order to apply the actual algorithm to the data, we need to turn that
completely unstructured data into structured and shaped data, for which a
process of pre-massaging is required, through which the data is passed.
Finally, we get a candidate copy of the data which can be processed through
the algorithm to get the actual golden copy.
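
The pre-processing step described above can be sketched with pandas. This is only a minimal illustration under stated assumptions: the table, its column names (age, income, note) and the choice of filling a missing value with the median are all hypothetical and not taken from the internship work.

```python
import pandas as pd

# Hypothetical raw records: one duplicated row, one missing age,
# and one column ("note") that the model does not need.
raw = pd.DataFrame({
    "age":    [25, 25, 31, None, 40],
    "income": [30000, 30000, 42000, 38000, 52000],
    "note":   ["a", "a", "b", "c", "d"],
})

clean = (
    raw.drop(columns=["note"])   # drop extra data that is not needed
       .drop_duplicates()        # remove duplicate records
       .reset_index(drop=True)
)

# Fill the missing age with the median of the known ages (an assumed policy).
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```

Other policies, such as dropping rows with missing values, are equally valid; which one is right depends on the data.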

After the data is pre-processed, we get some good structured data, and this
data is now an input for machine learning. But is this a one-time job? Of
course not: the process has to be iterative, and it has to be repeated until
the data is ready. In machine learning the major chunk of time is spent in this
process, that is, working on the data to make it structured, clean, ready and
available. Once the data is available, the algorithms can be applied to the
data. Machine learning products offer not only pre-processing tools but also
a large number of machine learning algorithms. The result of applying an
algorithm to the data is a model, but now the question is whether this is
the final model we needed.

No, it is the candidate model that we got. A candidate model is the first
most appropriate model that we get, but it still needs to be massaged.
But do we get only one candidate model? Of course not: since this is an
iterative process, we do not actually know what the best candidate model is
until we have produced several candidate models again and again through the
iterative process. We do this until we get a model that is good enough to be
deployed. Once the model is deployed, applications start making use of it, so
there is iteration at small levels and at the largest level as well.

We need to repeat the entire process again and again and re-create the model
at regular intervals. The reason for this is very simple: the scenarios and
factors change, and we need to keep our model up to date and realistic all the
time. This could eventually also mean processing new data or applying new
algorithms altogether.
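
One way to picture the search for a good candidate model is to score several algorithms on the same data and keep the best one. The sketch below is an assumption-laden illustration: the three scikit-learn models, the iris dataset and 5-fold cross-validation are choices made here for the example, not the procedure used in the report.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Three candidate models (an arbitrary selection for illustration).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation (mean accuracy).
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Selected candidate:", best_name)
```

In practice the same loop would be re-run whenever new data arrives, which is the "re-create the model at regular intervals" step.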

3. Classification of Machine Learning System

There are some variations in how the types of Machine Learning
Systems are defined, but commonly they can be divided into categories
according to their purpose. The main categories are the following:

Supervised Machine Learning: Supervised learning is a machine learning
technique for learning a function from training data. The training data consist
of pairs of input objects (typically vectors) and desired outputs. The output of
the function can be a continuous value (called regression), or can predict a
class label of the input object (called classification).
The task of the supervised learner is to predict the value of the function for
any valid input object after having seen a number of training examples (i.e.
pairs of input and target output). To achieve this, the learner has to generalize
from the presented data to unseen situations in a "reasonable" way.

"Supervised learning is a machine learning technique whereby the algorithm
is first presented with training data which consists of examples which include
both the inputs and the desired outputs; thus, enabling it to learn a function.
The learner should then be able to generalize from the presented data to
unseen examples." By Tom M. Mitchell
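
As a small sketch of supervised classification, the snippet below trains on labelled examples and then measures how well the learner generalizes to examples it has not seen. The dataset (iris), the k-nearest-neighbours model and the 70/30 split are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Inputs (feature vectors) and desired outputs (class labels).
X, y = load_iris(return_X_y=True)

# Hold back 30% of the pairs so they stay unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)              # learn from the training pairs

accuracy = model.score(X_test, y_test)   # generalization to unseen examples
print(f"Accuracy on unseen examples: {accuracy:.2f}")
```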

Unsupervised Machine Learning: Unsupervised learning is a type of
machine learning where manual labels of inputs are not used. It is
distinguished from supervised learning approaches, which learn how to
perform a task, such as classification or regression, using a set of
human-prepared examples. Unsupervised learning means we are only given the X
(feature vector) and some (ultimate) feedback function on our performance.
We simply have a training set of vectors without function values for them.
The problem in this case, typically, is to partition the training set into subsets
in some appropriate way. Input data is not labelled and does not have a known
result. A model is prepared by deducing structures present in the input data.
This may be to extract general rules. It may be through a mathematical
process to systematically reduce redundancy, or it may be to organize data
by similarity.
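
Partitioning an unlabelled training set into subsets can be sketched with k-means clustering. The two synthetic groups of points and the choice of k-means with two clusters are assumptions made only for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic, unlabelled feature vectors: two well-separated groups.
rng = np.random.RandomState(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
X = np.vstack([group_a, group_b])

# No labels are passed in; the algorithm organizes the data by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster of first group: ", set(labels[:20]))
print("Cluster of second group:", set(labels[20:]))
```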
Semi-Supervised Learning: Semi-supervised learning uses both labelled and
unlabelled data to perform an otherwise supervised learning or unsupervised
learning task. There is a desired prediction problem, but the
model must learn the structures to organize the data as well as make
predictions. The goal is to learn a predictor that predicts future test data better
than the predictor learned from the labelled training data alone.
Semi-supervised learning finds applications in cognitive psychology as a
computational model for human learning. In human categorization and
concept forming, the environment provides unsupervised data (e.g., a child
watching surrounding objects by herself) in addition to labelled data from a
teacher (e.g., Dad points to an object and says "bird!"). There is evidence that
human beings can combine labelled and unlabelled data to facilitate learning.
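
A minimal sketch of combining labelled and unlabelled data uses scikit-learn's LabelPropagation, which treats the label -1 as "unlabelled". Hiding roughly 70% of the iris labels is an assumption made here purely to simulate unlabelled data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Pretend that about 70% of the examples have no label (marked with -1).
rng = np.random.RandomState(0)
hidden = rng.rand(len(y)) < 0.7
y_partial = np.copy(y)
y_partial[hidden] = -1

# Fit on the mix of labelled and unlabelled examples.
model = LabelPropagation()
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training example.
accuracy = (model.transduction_[hidden] == y[hidden]).mean()
print(f"Accuracy on the examples whose labels were hidden: {accuracy:.2f}")
```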
Reinforcement Learning: Reinforcement learning is a type of machine
learning, and thereby also a branch of Artificial Intelligence. It allows
machines and software agents to automatically determine the ideal behaviour
within a specific context, in order to maximize their performance. Simple
reward feedback is required for the agent to learn its behaviour; this is known
as the reinforcement signal. Some applications of reinforcement
learning algorithms are computer-played board games (chess, Go) and
robotic hands.

4. Types of Machine Learning Algorithms

[Figure: Types of Learning, with branches labelled Classification,
Regression, Clustering, Association, Dimensionality Reduction and
Reinforcement Learning]

Machine learning comes in many different flavours, depending on the
algorithm and its objectives. You can divide machine learning algorithms into
three main groups based on their purpose:
1.) Supervised Learning Algorithms
2.) Unsupervised Learning Algorithms
3.) Reinforcement Learning Algorithms

Supervised Learning Algorithms: Supervised learning is where you have
input variables (X) and an output variable (Y) and you use an algorithm to
learn the mapping function from the input to the output.
Y = f(X)
The goal is to approximate the mapping function so well that when you have
new input data (X) you can predict the output variable (Y) for that data.
Because we know the correct answers, the algorithm iteratively makes
predictions on the training data and is corrected by the teacher. Learning
stops when the algorithm achieves an acceptable level of performance.
Supervised learning problems can be further grouped into regression and
classification problems.
Classification: A classification problem is when the output variable is a
category, such as "red" or "blue", or "disease" and "no disease".
Regression: A regression problem is when the output variable is a real
(continuous) value, such as "dollars" or "weight".
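
To make the mapping Y = f(X) concrete for a regression problem, the sketch below generates data from a known linear rule plus a little noise and lets a linear model approximate it. The rule Y = 3X + 2, the noise level and the use of LinearRegression are all assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data: Y = 3*X + 2 plus small random noise (assumed rule).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
Y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=50)

# Learn an approximation of the mapping function f from the (X, Y) pairs.
model = LinearRegression().fit(X, Y)

# Predict the output for new input data that was not in the training set.
prediction = model.predict(np.array([[4.0]]))[0]

print("Learned slope:    ", round(model.coef_[0], 2))
print("Learned intercept:", round(model.intercept_, 2))
print("Prediction at X=4:", round(prediction, 2))
```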
Unsupervised Learning Algorithms: Unsupervised learning is where you
only have input data (X) and no corresponding output variables. The goal for
unsupervised learning is to model the underlying structure or distribution in
the data in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning
above, there are no correct answers and there is no teacher. Algorithms are
left to their own devices to discover and present the interesting structure in
the data.
Unsupervised learning problems can be further grouped into clustering and
association problems.
Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behaviour.
Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people that
buy X also tend to buy Y.
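
The "people that buy X also tend to buy Y" idea can be sketched in plain Python by counting how often items appear together. The shopping baskets and both thresholds below are hypothetical, and this brute-force count is only an illustration, not a full association rule mining algorithm such as Apriori.

```python
from itertools import permutations

# Hypothetical shopping baskets (one set of items per customer).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(x, y):
    """Fraction of all baskets that contain both x and y."""
    return sum(x in b and y in b for b in baskets) / len(baskets)

def confidence(x, y):
    """Among baskets that contain x, the fraction that also contain y."""
    return sum(x in b and y in b for b in baskets) / sum(x in b for b in baskets)

items = sorted(set().union(*baskets))

# Keep rules "x -> y" that are both frequent and reliable (assumed thresholds).
strong_rules = [
    (x, y)
    for x, y in permutations(items, 2)
    if support(x, y) >= 0.4 and confidence(x, y) >= 0.7
]

for x, y in strong_rules:
    print(f"people that buy {x} also tend to buy {y} "
          f"(confidence {confidence(x, y):.2f})")
```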
Reinforcement Learning
In reinforcement learning, the goal is to develop a system (agent) that
improves its performance based on interactions with the environment. Since
the information about the current state of the environment typically also
includes a so-called reward signal, we can think of reinforcement learning as
a field related to supervised learning. However, in reinforcement learning this
feedback is not the correct ground truth label or value, but a measure of how
well the action was measured by a reward function. Through the interaction
with the environment, an agent can then use reinforcement learning to learn
a series of actions that maximizes this reward via an exploratory
trial-and-error approach or deliberative planning. Consider an example of a
child learning to walk.

Let's formalize the above example. The "problem statement" of the example
is to walk, where the child is an agent trying to manipulate the environment
(which is the surface on which it walks) by taking actions (viz. walking), and
he/she tries to go from one state (viz. each step he/she takes) to another. The
child gets a reward (let's say chocolate) when he/she accomplishes a
submodule of the task (viz. taking a couple of steps) and will not receive any
chocolate (a.k.a. a negative reward) when he/she is not able to walk. This is a
simplified description of a reinforcement learning problem.
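
The agent, state, action and reward vocabulary can be made concrete with a tiny tabular Q-learning sketch. Everything here is an assumption for illustration: a corridor of five positions stands in for the child's steps, reaching the last position gives a reward of 1 (the "chocolate"), and the learning rate, discount and exploration values are arbitrary.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 ends an episode
ACTIONS = (-1, +1)    # step backwards or step forwards
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning settings

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-term reward of that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Trial and error: sometimes explore, otherwise act greedily.
            if rng.random() < EPSILON:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Q-learning update towards reward plus discounted future value.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Best action per non-terminal position according to the learned values.
policy = [ACTIONS[0] if q[0] > q[1] else ACTIONS[1] for q in q_table[:-1]]
print(policy)
```

After training, the learned policy is to step forwards from every position, which is the series of actions that maximizes the reward in this toy setting.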

5. Flow Chart

[Figure: flow chart of machine learning]

6. Basic Python
Python is a widely used high-level, general-purpose, interpreted, dynamic
programming language. Its design philosophy emphasizes code readability,
and its syntax allows programmers to express concepts in fewer lines of code
than would be possible in languages such as C++ or Java. The language
provides constructs intended to enable clear programs on both a small and
large scale. Python supports multiple programming paradigms, including
object-oriented, imperative, functional and procedural styles. It
features a dynamic type system and automatic memory management and has
a large and comprehensive standard library. Python interpreters are available
for installation on many operating systems, allowing Python code execution on
a wide variety of systems.

History
Python was conceived in the late 1980s, and its implementation was started
in December 1989 by Guido van Rossum at CWI in the Netherlands as a
successor to the ABC language (itself inspired by SETL) capable of
exception handling and interfacing with the Amoeba operating system. Van
Rossum is Python's principal author, and his continuing central role in deciding
the direction of Python is reflected in the title given to him by the
Python community, benevolent dictator for life (BDFL).

Example 1:
Input : num1 = 5, num2 = 3
Output : 8
Input : num1 = 13, num2 = 6
Output : 19

Example 2:
Input : P = 10000
        R = 5
        T = 5
Output : 2500.0
We need to find simple interest on Rs. 10,000 at the rate of 5% for 5 units
of time.

Example 3:
a = 7
b = 3
print(max(a, b))
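
The first two examples give only inputs and outputs. A minimal sketch of programs that would produce them is shown below; the function names add and simple_interest are chosen here for illustration, and the interest is computed with the usual formula P × R × T / 100.

```python
def add(num1, num2):
    """Return the sum of two numbers (Example 1)."""
    return num1 + num2

def simple_interest(p, r, t):
    """Simple interest on principal p at rate r percent for t units of time."""
    return (p * r * t) / 100

print(add(5, 3))                      # 8
print(add(13, 6))                     # 19
print(simple_interest(10000, 5, 5))   # 2500.0
```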
7. Conditional Statement, Strings, List, Tuples, Indexing and Slicing

Python Conditions and If statements

Python supports the usual logical conditions from mathematics:

Equals: a == b
Not Equals: a != b
Less than: a < b
Less than or equal to: a <= b
Greater than: a > b
Greater than or equal to: a >= b

These conditions can be used in several ways, most commonly in "if
statements" and loops. An "if statement" is written by using the if keyword.

Example
If statement:

a = 33
b = 200
if b > a:
    print("b is greater than a")

Elif
The elif keyword is Python's way of saying "if the previous conditions were
not true, then try this condition".

Example
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")

Else
The else keyword catches anything which isn't caught by the preceding
conditions.

Example
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

Strings
Strings in Python are surrounded by either single quotation marks or double
quotation marks. 'hello' is the same as "hello". You can display a string literal
with the print() function:

Example
print("Hello")
print('Hello')

List
Lists are used to store multiple items in a single variable. Lists are one of 4
built-in data types in Python used to store collections of data; the other 3 are
Tuple, Set, and Dictionary, all with different qualities and usage.

Lists are created using square brackets:

Example
Create a List:
thislist = ["apple", "banana", "cherry"]
print(thislist)

Tuple
Tuples are used to store multiple items in a single variable. Tuple is one of 4
built-in data types in Python used to store collections of data; the other 3 are
List, Set, and Dictionary, all with different qualities and usage. A tuple is a
collection which is ordered and unchangeable. Tuples are written with round
brackets.

Example
Create a Tuple:
thistuple = ("apple", "banana", "cherry")
print(thistuple)
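
Indexing and slicing work the same way on strings, lists and tuples. The short sketch below reuses the list and tuple from the examples above and also shows what "unchangeable" means for a tuple; the replacement item "blueberry" is just a made-up value.

```python
thislist = ["apple", "banana", "cherry"]
thistuple = ("apple", "banana", "cherry")
text = "Hello"

# Indexing: positions start at 0; negative positions count from the end.
print(thislist[0])     # apple
print(thistuple[-1])   # cherry

# Slicing: [start:stop] returns the items from start up to, not including, stop.
print(thislist[1:])    # ['banana', 'cherry']
print(thistuple[:2])   # ('apple', 'banana')
print(text[1:4])       # ell

# A list can be changed in place, but a tuple cannot.
thislist[1] = "blueberry"
try:
    thistuple[1] = "blueberry"
except TypeError as error:
    print("Tuples are unchangeable:", error)
```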

8. Python Machine Learning Packages

Pandas
Pandas is a Python library used for working with data sets. It has functions
for analyzing, cleaning, exploring, and manipulating data. The name
"Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library
was created by Wes McKinney in 2008.

Pandas allows us to analyse big data and draw conclusions based on
statistical theories. Pandas can clean messy data sets and make them
readable and relevant. Relevant data is very important in data science.

Example
Load the CSV into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())

Matplotlib

Matplotlib is a low-level graph plotting library in Python that serves as a
visualization utility. Matplotlib was created by John D. Hunter. Matplotlib is
open source and we can use it freely. Matplotlib is mostly written in Python;
a few segments are written in C, Objective-C and JavaScript for platform
compatibility.

Example
Draw a line in a diagram from position (0, 0) to position (6, 250):
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

Result:
[Figure: a straight line from position (0, 0) to position (6, 250)]

Scikit-learn
Scikit-learn has emerged as a powerful and user-friendly Python library. Its
simplicity and versatility make it a good choice for both beginners and
seasoned data scientists to build and implement machine learning models.
The example that follows loads a sample dataset with scikit-learn (imported
as sklearn).

Example
# load the iris dataset as an example
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# store the feature and target names
feature_names = iris.feature_names
target_names = iris.target_names

# printing features and target names of our dataset
print("Feature names:", feature_names)
print("Target names:", target_names)

# X and y are numpy arrays
print("\nType of X is:", type(X))

# printing first 5 input rows
print("\nFirst 5 rows of X:\n", X[:5])

Output:
Feature names: ['sepal length (cm)', 'sepal width (cm)',
'petal length (cm)', 'petal width (cm)']
Target names: ['setosa' 'versicolor' 'virginica']

Type of X is: <class 'numpy.ndarray'>

First 5 rows of X:
[[ 5.1 3.5 1.4 0.2]
[ 4.9 3. 1.4 0.2]
[ 4.7 3.2 1.3 0.2]
[ 4.6 3.1 1.5 0.2]
[ 5. 3.6 1.4 0.2]]

9. Future of Machine Learning

Research in Machine Learning Theory is a combination of attacking
established fundamental questions and developing new frameworks for
modelling the needs of new machine learning applications. While it is
impossible to know where the next breakthroughs will come from, a few topics
one can expect the future to hold include:

• Better understanding of how auxiliary information, such as unlabelled data,
hints from a user, or previously learned tasks, can best be used by a machine
learning algorithm to improve its ability to learn new things. Traditionally,
Machine Learning Theory has focused on problems of learning a task (say,
identifying spam) from labelled examples (email labelled as spam or not).
However, often there is additional information available. One might have
access to large quantities of unlabelled data (email messages not labelled by
their type, or discussion-group transcripts on the web) that could potentially
provide useful information. One might have other hints from the
user besides just labels, e.g. highlighting relevant portions of the email
message. Or, one might have previously learned similar tasks and want to
transfer some of that experience to the job at hand. These are all issues for
which a solid theory is only beginning to be developed.

• Further developing connections to economic theory. As software agents
based on machine learning are used in competitive settings, "strategic"
issues become increasingly important. Most algorithms and models to date
have focused on the case of a single learning algorithm operating in an
environment that, while it may be changing, does not have its own
motivations and strategies. However, if learning algorithms are to operate in
settings dominated by other adaptive algorithms acting in their own users'
interests, such as bidding on items or performing various kinds of
negotiations, then we have a true merging of computer science and
economic models. In this combination, many of the fundamental issues are
still wide open.

10. Key Research Papers and Articles Using Python

"Scikit-learn: Machine Learning in Python" by Pedregosa et al. (2011)

This is the original paper on scikit-learn, one of the most widely used
libraries for traditional machine learning in Python. The paper describes its
design, functionality, and algorithms.

"Data Science at the Command Line" by Jeroen Janssens (2014)

This book discusses the integration of Python tools for data science,
including Pandas, NumPy, and scikit-learn, and how they can be used
effectively in command-line environments for machine learning.

"A Survey on Machine Learning in Python" by various authors

This survey paper provides a detailed exploration of the various libraries
available in Python for machine learning, including an in-depth discussion of
scikit-learn, Keras, TensorFlow, and PyTorch.