
A

Practical Training Report


on

Python and Machine Learning

Submitted in partial fulfillment for the award of degree of

BACHELOR OF TECHNOLOGY

In

Computer Science & Engineering

Submitted To: Dr. Smita Agarwal
Submitted By: Aakriti Yadav (17EGJCS001)

Department Of Computer Science & Engineering


GLOBAL INSTITUTE OF TECHNOLOGY JAIPUR
(RAJASTHAN)-302022 SESSION: 2019-2021

i
CERTIFICATE

ii
ACKNOWLEDGEMENT

It is our proud privilege and duty to acknowledge the kind help and guidance received from several people in the
preparation of this report. It would not have been possible to prepare this report in this form without their
valuable help, cooperation, and guidance.

We express our sincere gratitude to Dr. Sylvester Fernandes for providing us with an opportunity to undergo this
project as part of the curriculum.

We are thankful to Mr. Yogendra Singh for his support, cooperation, and motivation during the training, and for
his constant inspiration, presence, and blessings.

We would also like to thank our H.O.D., Mr. Girraj Khandelwal, for his valuable suggestions, which helped us a
lot in the completion of this project.

Lastly, we would like to thank the Almighty and our parents for their moral support, and our friends, with whom
we shared our day-to-day experiences and received many suggestions that improved the quality of our work.

iii
ABSTRACT
Present-day computer applications require the representation of huge amounts of complex knowledge and data in
programs and thus require a tremendous amount of work. Our ability to code computers falls short of the demand
for applications. If computers are endowed with the ability to learn, then our burden of coding the machine is
eased (or at least reduced). This is particularly true for developing expert systems, where the "bottleneck" is
extracting the expert's knowledge and feeding it to the computer. Present-day computer programs in general (with
the exception of some Machine Learning programs) cannot correct their own errors, improve from past mistakes, or
learn to perform a new task by analogy to a previously seen task. In contrast, human beings are capable of all of
the above. Machine Learning will produce smarter computers capable of all of this intelligent behavior.

The area of Machine Learning deals with the design of programs that can learn rules from data, adapt to changes,
and improve performance with experience. In addition to being one of the initial dreams of Computer Science,
Machine Learning has become crucial as computers are expected to solve increasingly complex problems and
become more integrated into our daily lives. This is a hard problem, since making a machine learn from its
computational tasks requires work at several levels, and complexities and ambiguities arise at each of those
levels.

So, here we study how machine learning takes place, the methods and remedies associated with it, its
applications, and the present and future status of machine learning.

Index

CERTIFICATE i

ACKNOWLEDGEMENT ii
ABSTRACT iii
Chapter 1 Introduction to Machine Learning 6

1.1 WHY MACHINE LEARNING?

Chapter 2 Learning means? 9

2.1 THE ARCHITECTURE OF A LEARNING AGENT

Chapter 3 History of Machine Learning 12

3.1 The Neural Modeling (Self Organized System)


3.2 The Symbolic Concept Acquisition Paradigm
3.3 The Modern Knowledge-Intensive Paradigm

Chapter 4 Wellsprings of Machine Learning 14

4.1 Statistics
4.2 Brain Models
4.3 Adaptive Control Theory
4.4 Psychological Models
4.5 Artificial Intelligence
4.6 Evolutionary Models

Chapter 5 Machine Learning Overview 16

5.1 The Aim of Machine Learning


5.2 Machine Learning as a Science

Chapter 6 Classification of Machine Learning 18

Chapter 7 Types of Machine Learning Algorithms 21

7.1 Algorithm Types


7.2 Machine Learning Applications
7.3 Examples of Machine Learning Problems

Chapter 8 Project

Chapter 9 Conclusions

Chapter 10 Future Directions

List of figures

Figure No. Figure Name

Fig 1.1 Learning


Fig 2.1 Architecture of learning agent
Fig 8.3.1 New test file
Fig 8.3.2 Scrapp file
Fig 8.3.3 Scrapp file
Fig 8.3.4 Chart file
Fig 8.3.5 App file
Fig 8.4.1 Home page
Fig 8.4.2 Result page

Chapter 1

Introduction to Machine Learning

Machine Learning (ML) is the computerized approach to improving computational work that is based on both a set
of theories and a set of technologies. Being a very active area of research and development, it has no single
agreed-upon definition that would satisfy everyone, but there are some aspects that would be part of any
knowledgeable person's definition. The definition most often offered is:

Definition: the ability of a machine to improve its own performance through the use of software that employs
artificial intelligence techniques to mimic the ways by which humans seem to learn, such as repetition and
experience.

Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) concerned with developing computational
theories of learning and building learning machines. The goal of machine learning, closely coupled with the goal
of AI, is to achieve a thorough understanding of the nature of the learning process (both human learning and
other forms of learning) and of the computational aspects of learning behaviors, and to implant the learning
capability in computer systems. Machine learning has been recognized as central to the success of Artificial
Intelligence, and it has applications in various areas of science, engineering, and society.

1.1 WHY MACHINE LEARNING?

To answer this question, we should look at two issues:

(1) What are the goals of machine learning?

(2) Why are these goals important and desirable?

1.1.1 The Goals of Machine Learning.

The goal of ML, in simple words, is to understand the nature of (human and other forms of) learning, and to
build learning capability into computers. To be more specific, there are three aspects of the goals of ML.

(1) To make computers smarter and more intelligent. The more direct objective in this aspect is to develop
systems (programs) for specific practical learning tasks in application domains.

(2) To develop computational models of human learning process and perform computer simulations. The study in
this aspect is also called cognitive modeling.

(3) To explore new learning methods and develop general learning algorithms independent of applications.

1.1.2 Why the goals of ML are important and desirable.

It is self-evident that the goals of ML are important and desirable. However, we still give some further
supporting arguments for this claim.

First of all, implanting learning ability in computers is practically necessary. Present-day computer
applications require the representation of huge amounts of complex knowledge and data in programs and thus
require a tremendous amount of work. Our ability to code computers falls short of the demand for applications.
If computers are endowed with the ability to learn, then our burden of coding the machine is eased (or at least
reduced). This is particularly true for developing expert systems, where the "bottleneck" is extracting the
expert's knowledge and feeding it to the computer. Present-day computer programs in general (with the exception
of some ML programs) cannot correct their own errors, improve from past mistakes, or learn to perform a new task
by analogy to a previously seen task. In contrast, human beings are capable of all of the above. ML will produce
smarter computers capable of all of this intelligent behavior.

Second, understanding human learning and its computational aspects is a worthy scientific goal. We human beings
have long been fascinated by our capacity for intelligent behavior and have been trying to understand the nature
of intelligence. It is clear that central to our intelligence is our ability to learn. Thus a thorough
understanding of the human learning process is crucial to understanding human intelligence. ML will give us
insight into the underlying principles of human learning, and that may lead to the discovery of more effective
education techniques. It will also contribute to the design of machine learning systems.

Finally, it is desirable to explore alternative learning mechanisms in the space of all possible learning
methods. There is no reason to believe that the way human beings learn is the only possible mechanism of
learning. It is worth exploring other methods of learning, which may be more efficient and effective than human
learning.

We remark that Machine Learning has become feasible in many important applications (hence the popularity of the
field) partly because of recent progress in learning algorithms and theory, the rapid increase in computational
power, the wide availability of huge amounts of data, and interest in commercial ML application development.

Moreover, we note that ML is inherently a multi-disciplinary subject area.


We compare human learning with machine learning along the dimensions of speed, ability to transfer, and others,
which shows that machine learning is both an opportunity and a challenge: we can hope to discover ways for
machines to learn that are better than the ways humans learn (the opportunity), and there is an ample amount of
difficulties to be overcome in order to make machines learn (the challenge).

Fig 1.1: Learning

Chapter 2
Learning means?

Learning is a phenomenon and process with manifestations of various aspects. Roughly speaking, the learning
process includes (one or more of) the following:

(1) Acquisition of new (symbolic) knowledge. For example, learning mathematics is this kind of learning. When
we say someone has learned math, we mean that the learner obtained descriptions of the mathematical concepts,
understood their meaning and their relationship with each other. The effect of learning is that the learner has
acquired knowledge of mathematical systems and their properties, and that the learner can use this knowledge to
solve math problems. Thus this kind of learning is characterized as obtaining new symbolic information plus the
ability to apply that information effectively.

(2) Development of motor or cognitive skills through instruction and practice. Examples of this kind of learning
are learning to ride a bicycle, to swim, to play the piano, etc. This kind of learning is also called skill
refinement. In this case, just acquiring a symbolic description of the rules for performing the task is not
sufficient; repeated practice is needed for the learner to obtain the skill. Skill refinement takes place at the
subconscious level.

(3) Refinement and organization of knowledge into more effective representations or more useful form. One
example of this kind of learning can be reorganization of the rules in a knowledge base such that more important
rules are given higher priorities so that they can be used more easily and conveniently.

(4) Discovery of new facts and theories through observation and experiment. For example, the discovery of
physics and chemistry laws.
The general effect of learning in a system is the improvement of the system's capability to solve problems. It
is hard to imagine that a system capable of learning cannot improve its problem-solving performance. A system
with learning capability should be able to change itself in order to perform better in its future
problem-solving.

We also note that learning cannot take place in isolation: we typically learn something (knowledge K) to
perform some tasks (T), through some experience E, and whether we have learned well or not will be judged by
some performance criterion P on the task T. For example, as Tom Mitchell put it in his ML book, for the
"checkers learning problem", the task T is to play the game of checkers, the performance criterion P could be
the percentage of games won against opponents, and the experience E could be in the form of playing practice
games with a teacher (or with itself). For learning to take place, we also need a learning algorithm A for
self-changing, which allows the learner to gain experience E at the task T and acquire knowledge K (thus
changing the learner's knowledge set) to improve the learner's performance at task T.

Learning = Improving performance P at task T by acquiring knowledge K, using self-changing algorithm A, through
experience E in an environment, for task T.
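The decomposition above can be made concrete in code. The sketch below is a minimal illustration of Mitchell's formulation as quoted in this chapter; the class and field names are our own, chosen only for clarity:

```python
from dataclasses import dataclass

@dataclass
class LearningProblem:
    """A Mitchell-style description of a well-posed learning problem."""
    task: str         # T: what the learner must do
    performance: str  # P: how success at T is measured
    experience: str   # E: what the learner learns from

# The "checkers learning problem" from the text, as a (T, P, E) triple.
checkers = LearningProblem(
    task="play the game of checkers",
    performance="percentage of games won against opponents",
    experience="playing practice games against a teacher or itself",
)

print(checkers.task)
```

Writing a problem down this way forces the three ingredients to be stated explicitly before any algorithm A is chosen.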

There are various forms of improvement of a system’s problem-solving ability:

(1) To solve a wider range of problems than before - perform generalization.

(2) To solve the same problem more effectively - give better quality solutions.

(3) To solve the same problem more efficiently - faster.

There are other viewpoints as to what constitutes the notion of learning. For example,
Minsky gives a more general definition,

"Learning is making useful changes in our minds".

McCarthy suggests,
"Learning is constructing or modifying representations of what is being experienced."
Simon suggests,

“Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same
task or tasks drawn from the same population more effectively the next time”.

From this perspective, the central aspect of learning is the acquisition of certain forms of representation of
some reality, rather than the improvement of performance. However, since it is in general much easier to observe
a system's performance behavior than its internal representation of reality, we usually link learning behavior
with the improvement of the system's performance.

2.1 THE ARCHITECTURE OF A LEARNING AGENT

Fig: 2.1 Architecture of learning agent

Chapter 3
History of Machine Learning

Over the years, research in machine learning has been pursued with varying degrees of intensity, using different
approaches and placing emphasis on different aspects and goals. Within the relatively short history of this
discipline, one may distinguish three major periods, each centered on a different concept:

• neural modeling and decision-theoretic techniques

• symbolic concept-oriented learning

• knowledge-intensive approaches combining various learning strategies.

3.1 The Neural Modeling (Self Organized System)

The distinguishing feature of the first concept was the interest in building general-purpose learning systems
that start with little or no initial structure or task-oriented knowledge. The major thrust of research based on
this approach involved constructing a variety of neural model-based machines with random or partially random
initial structure. These systems were generally referred to as neural networks or self-organizing systems.
Learning in such systems consisted of incremental changes in the probabilities that neuron-like elements would
transmit a signal. Due to the limits of early computer technology, most of the research under this neural
network model was either theoretical or involved the construction of special-purpose experimental hardware
systems. Related research involved the simulation of evolutionary processes that, through random mutation and
"natural" selection, might create a system capable of some intelligent behavior. Experience in the above areas
spawned the new discipline of pattern recognition and led to the development of a decision-theoretic approach to
machine learning. In this approach, learning is equated with the acquisition of linear, polynomial, or related
discriminant functions from a given set of training examples. One of the best-known successful learning systems
utilizing such techniques, as well as some original new ideas involving non-linear transformations, was Samuel's
checkers program. Through repeated training, this program acquired master-level performance. Somewhat different,
but closely related, techniques utilized methods of statistical decision theory for learning pattern recognition
rules.

3.2 The Symbolic Concept Acquisition Paradigm

A second major paradigm started to emerge in the early sixties, stemming from the work of psychologists and
early AI researchers on models of human learning, such as that of Hunt. The paradigm utilized logic or
graph-structure representations rather than numerical or statistical methods. Systems learned symbolic
descriptions representing higher-level knowledge and made strong structural assumptions about the concepts to be
acquired. Examples of work in this paradigm include research on human concept acquisition and various applied
pattern recognition systems.

3.3 The Modern Knowledge-Intensive Paradigm

The third paradigm represents the most recent period of research, starting in the mid-seventies. Researchers
have broadened their interest beyond learning isolated concepts from examples and have begun investigating a
wide spectrum of learning methods, most based upon knowledge-rich systems. Specifically, this paradigm can be
characterized by several new trends, including:

1. Knowledge-Intensive Approaches: Researchers are strongly emphasizing the use of task-oriented knowledge and
the constraints it provides in guiding the learning process. One lesson from the failures of earlier
knowledge-poor learning systems is that to acquire new knowledge, a system must already possess a great deal of
initial knowledge.

2. Exploration of alternative methods of learning: In addition to the earlier research emphasis on learning from
examples, researchers are now investigating a wider variety of learning methods such as learning from
instruction.

In contrast to previous efforts, a number of current systems are incorporating abilities to generate and select
tasks, and also incorporate heuristics to control their focus of attention by generating learning tasks,
proposing experiments to gather training data, and choosing concepts to acquire.

Chapter 4
Wellsprings of Machine Learning

Work in machine learning is now converging from several sources. These different traditions each bring different
methods and different vocabularies, which are now being assimilated into a more unified discipline. Here is a
brief listing of some of the separate disciplines that have contributed to machine learning:

4.1 Statistics

A long-standing problem in statistics is how best to use samples drawn from unknown probability distributions to
help decide from which distribution some new sample is drawn. A related problem is how to estimate the value of
an unknown function at a new point, given the values of this function at a set of sample points. Statistical
methods for dealing with these problems can be considered instances of machine learning because the decision and
estimation rules depend on a corpus of samples drawn from the problem environment.
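The estimation problem described above can be sketched with the simplest possible rule: predict the function value at the nearest sampled point. This is a minimal illustration of our own, not a method from the report:

```python
def nearest_neighbor_estimate(samples, x_new):
    """Estimate f(x_new) from (x, f(x)) pairs by returning the value
    recorded at the sample point closest to x_new."""
    x_near, y_near = min(samples, key=lambda p: abs(p[0] - x_new))
    return y_near

# Samples of an unknown function (here, secretly f(x) = x**2).
samples = [(0, 0), (1, 1), (2, 4), (3, 9)]
print(nearest_neighbor_estimate(samples, 2.2))  # nearest sample is x=2, so 4
```

The "rule depends on a corpus of samples" in exactly the sense the section describes: with different samples, the same code yields different estimates.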

4.2 Brain Models

Non-linear elements with weighted inputs have been suggested as simple models of biological neurons. Brain
modelers are interested in how closely these networks approximate the learning phenomena of living brains.
Several important machine learning techniques are based on networks of nonlinear elements often called neural
networks. Work inspired by this school is sometimes called connectionism, brain-style computation, or sub-
symbolic processing.

4.3 Adaptive Control Theory

Control theorists study the problem of controlling a process having unknown parameters which must be estimated
during operation. Often, the parameters change during operation, and the control process must track these
changes. Some aspects of controlling a robot based on sensory inputs represent instances of this sort of problem.
4.4 Psychological Models

Psychologists have studied the performance of humans in various learning tasks. An early example is the EPAM
network for storing and retrieving one member of a pair of words when given the other. Related work led to a
number of early decision tree and semantic network methods. More recent work of this sort has been influenced by
activities in artificial intelligence.
4.5 Artificial Intelligence

From the beginning, AI research has been concerned with machine learning. Samuel developed a prominent
early program that learned parameters of a function for evaluating board positions in the game of checkers. AI
researchers have also explored the role of analogies in learning and how future actions and decisions can be based
on previous exemplary cases. Recent work has been directed at discovering rules for expert systems using
decision-tree methods and inductive logic programming.
Another theme has been saving and generalizing the results of problem solving using explanation-based learning.

4.6 Evolutionary Models

In nature, not only do individual animals learn to perform better, but species evolve to be better suited to
their individual niches. Since the distinction between evolving and learning can be blurred in computer systems,
techniques that model certain aspects of biological evolution have been proposed as learning methods to improve
the performance of computer programs. Genetic algorithms and genetic programming are the most prominent
computational techniques for evolution.

Chapter 5
Machine Learning Overview

Machine Learning can be defined as learning a theory automatically from data, through a process of inference,
model fitting, or learning from examples:

 Automated extraction of useful information from a body of data by building good probabilistic models.
 Ideally suited for areas with lots of data in the absence of a general theory.

5.1 The Aim of Machine Learning

The field of machine learning can be organized around three primary research areas:
 Task-Oriented Studies: The development and analysis of learning systems oriented toward solving a
predetermined set of tasks (also known as the "engineering approach").
 Cognitive Simulation: The investigation and computer simulation of human learning processes (also
known as the "cognitive modeling approach").
 Theoretical Analysis: The theoretical exploration of the space of possible learning methods and
algorithms, independent of the application domain.

Although many research efforts strive primarily towards one of these objectives, progress in one objective often
leads to progress in another. For example, in order to investigate the space of possible learning methods, a
reasonable starting point may be to consider the only known example of robust learning behavior, namely humans
(and perhaps other biological systems). Similarly, psychological investigations of human learning may be helped
by theoretical analysis that suggests various possible learning models. The need to acquire a particular form of
knowledge in some task-oriented study may itself spawn new theoretical analysis or pose the question: "How do
humans acquire this specific skill (or knowledge)?" The existence of these mutually supportive objectives
reflects the entire field of artificial intelligence, where expert system research, cognitive simulation, and
theoretical studies provide cross-fertilization of problems and ideas.

5.2 Machine Learning as a Science

The clearest contender for a cognitive invariant in humans is the learning mechanism: the ability to acquire
facts, skills, and more abstract concepts. Therefore, understanding human learning well enough to reproduce
aspects of that learning behavior in a computer system is, in itself, a worthy scientific goal. Moreover, the
computer can render substantial assistance to cognitive psychology, in that it may be used to test the
consistency and completeness of learning theories and enforce a commitment to fine-structure, process-level
detail that precludes meaningless, tautological, or untestable theories (Bishop, 2006).
The study of human learning processes is also of considerable practical significance. Gaining insight into the
principles underlying human learning abilities is likely to lead to more effective educational techniques.
Machine learning research is also about developing intelligent computer assistants and computer tutoring
systems, and many of these goals are shared within the machine learning field. According to Jaime et al.,
computer tutoring systems are starting to incorporate abilities to infer models of student competence from
observed performance. Inferring the scope of a student's knowledge and skills in a particular area allows much
more effective and individualized tutoring of the student.

Chapter 6
Classification of Machine Learning

There are several areas of machine learning that could be exploited to solve the problems of email management,
and our approach implemented an unsupervised machine learning method. Unsupervised learning is a method of
machine learning whereby the algorithm is presented with examples from the input space only, and a model is fit
to these observations. In unsupervised learning, a data set of input objects is gathered. Unsupervised learning
then typically treats the input objects as a set of random variables, and a joint density model is built for the
data set. "The problem of unsupervised learning involves learning patterns in the input when no specific output
values are supplied." In the unsupervised learning problem, we observe only the features and have no
measurements of the outcome; our task is rather to describe how the data are organized or clustered. Trevor
Hastie explained that "in unsupervised learning or clustering there is no explicit teacher, and the system forms
clusters or 'natural groupings' of the input patterns." "Natural" is always defined explicitly or implicitly in
the clustering system itself, and given a particular set of patterns or cost function, different clustering
algorithms lead to different clusters. Often the user will set the hypothesized number of different clusters
ahead of time, but how should this be done? According to Richard O. Duda, "How do we avoid inappropriate
representations?"
There are various categories in the field of artificial intelligence. The classifications of machine learning systems
are:

 Supervised Machine Learning: Supervised learning is a machine learning technique for learning a
function from training data. The training data consist of pairs of input objects (typically vectors), and
desired outputs. The output of the function can be a continuous value (called regression), or can predict a
class label of the input object (called classification).

The task of the supervised learner is to predict the value of the function for any valid input object after
having seen a number of training examples (i.e. pairs of inputs and target outputs). To achieve this, the
learner has to generalize from the presented data to unseen situations in a "reasonable" way. Supervised
learning is a machine learning technique whereby the algorithm is first presented with training data consisting
of examples that include both the inputs and the desired outputs, thus enabling it to learn a function. "The
learner should then be able to generalize from the presented data to unseen examples" (Mitchell). Supervised
learning also implies we are given a training set Ξ of (X, Y) pairs by a "teacher". We know (sometimes only
approximately) the values of f for the m samples in the training set Ξ. We assume that if we can find a
hypothesis h that closely agrees with f for the members of Ξ, then this hypothesis will be a good guess for f,
especially if Ξ is large. Curve fitting is a simple example of supervised learning of a function.
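Curve fitting, mentioned above as a simple case of supervised learning, can be sketched with an ordinary least-squares line fit. The closed-form slope and intercept formulas below are the standard ones, not code from this report:

```python
def fit_line(pairs):
    """Least-squares fit of y = a*x + b to a training set of (x, y) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept
    return a, b

# A "teacher" supplies (X, Y) pairs drawn from the unknown function y = 2x + 1.
a, b = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(a, b)  # -> 2.0 1.0
```

The fitted hypothesis h(x) = a*x + b can then be used to predict y for x values never seen in training, which is exactly the generalization the text describes.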

• Unsupervised Machine Learning: Unsupervised learning is a type of machine learning in which manual labels
of inputs are not used. It is distinguished from supervised learning approaches, which learn how to perform
a task, such as classification or regression, using a set of human-prepared examples. Unsupervised learning
means we are given only the Xs and, at most, some (ultimate) feedback function on our performance. We simply
have a training set of vectors without function values for them. The problem in this case, typically, is to
partition the training set into subsets Ξ1, …, ΞR in some appropriate way.
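The partitioning just described can be sketched with a tiny one-dimensional k-means clustering. This is a minimal illustration under our own assumptions (toy data, crude initialisation), not code from the report:

```python
def kmeans_1d(points, k=2, iters=20):
    """Partition 1-D points into k subsets around iteratively refined centroids."""
    centroids = sorted(points)[:k]  # crude initialisation: the k smallest points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

clusters = kmeans_1d([1.0, 1.2, 0.8, 9.9, 10.1, 10.0])
print(clusters)  # two "natural groupings": values near 1 and values near 10
```

No labels are supplied; the grouping emerges from the data alone, which is the defining property of unsupervised learning noted above.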

Chapter 7
Types of Machine Learning Algorithms

Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm.
Common algorithm types include:

 Supervised learning → where the algorithm generates a function that maps inputs to desired outputs.
One standard formulation of the supervised learning task is the classification problem: the learner is
required to learn (to approximate the behavior of) a function which maps a vector into one of several
classes by looking at several input-output examples of the function.
 Unsupervised learning → which models a set of inputs when labeled examples are not available.
 Semi-supervised learning → which combines both labeled and unlabeled examples to generate an
appropriate function or classifier.
 Reinforcement learning → where the algorithm learns a policy of how to act given an observation of the
world. Every action has some impact in the environment, and the environment provides feedback that
guides the learning algorithm.
 Transduction → similar to supervised learning, but does not explicitly construct a function: instead, tries
to predict new outputs based on training inputs, training outputs, and new inputs.
 Learning to learn → where the algorithm learns its own inductive bias based on previous experience.
The performance and computational analysis of machine learning algorithms is a branch of statistics known as
computational learning theory. Machine learning is about designing algorithms that allow a computer to learn.
Learning does not necessarily involve consciousness; rather, learning is a matter of finding statistical
regularities or other patterns in the data. Thus, many machine learning algorithms will barely resemble how a
human might approach a learning task. However, learning algorithms can give insight into the relative difficulty
of learning in different environments.
7.1 Algorithm Types

The area of supervised learning deals largely with classification. These are the algorithm types:
a. Linear Classifiers
1. Fisher’s linear discriminant
2. Naïve Bayes Classifier
3. Perceptron
4. Support Vector Machine
b. Quadratic Classifiers
c. Boosting
d. Neural Networks
e. Bayesian Networks
f. Decision Trees

7.1. a. Linear Classifiers:

In machine learning, the goal of classification is to group items that have similar feature values, into groups.
Timothy et al (Timothy Jason Shepard, 1998) stated that a linear classifier achieves this by making a
classification decision based on the value of the linear combination of the features. If the input feature vector to
the classifier is a real vector x, then the output score is

y = f(w · x),

where w is a real vector of weights and f is a function that converts the dot product of the two vectors into the
desired output.
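A minimal sketch of the score y = f(w · x), where f here is the sign function; the weight values are assumed for illustration, not learned from data:

```python
# Linear classifier sketch: score = f(w . x + b) with f = sign.
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def linear_classify(w, x, b=0.0):
    """Return +1 or -1 depending on the sign of the linear score."""
    score = dot(w, x) + b
    return 1 if score >= 0 else -1

w = [2.0, -1.0]                          # hypothetical weight vector
print(linear_classify(w, [3.0, 1.0]))    # score = 5.0  → 1
print(linear_classify(w, [0.0, 4.0]))    # score = -4.0 → -1
```

Training a linear classifier amounts to choosing w (and the bias b); the subsections below describe several ways of doing so.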

7.1. (a.1) Fisher’s linear discriminant

Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in  machine
learning to find a linear combination of features which characterizes or separates two or more classes of objects
or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality
reduction before later classification.
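Both uses of LDA — classification and dimensionality reduction — can be sketched with scikit-learn (an assumed dependency; the two toy clusters are made-up data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up 2-D data: two well-separated classes.
X = np.array([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5],
              [5.0, 5.0], [5.5, 5.0], [5.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

lda = LinearDiscriminantAnalysis(n_components=1)
Z = lda.fit_transform(X, y)     # dimensionality reduction: 2-D → 1-D
print(Z.shape)                  # (6, 1)

# The same fitted model can act as a linear classifier.
print(lda.predict([[1.2, 1.2], [5.2, 5.2]]))  # → [0 1]
```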
7.1. (a.2) Naïve Bayes Classifier

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong
(naive) independence assumptions. A more descriptive term for the underlying probability model would be
"independent feature model".
In simple terms, a naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated
to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to
be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features
to contribute independently to the probability that this fruit is an apple, regardless of the presence or absence of
the other features.
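The apple example can be sketched directly from Bayes' theorem. All priors and per-feature likelihoods below are invented numbers, chosen only to make the independence assumption concrete:

```python
# Toy naive Bayes: each feature contributes an independent factor
# P(feature | class); all probabilities here are made up.
priors = {"apple": 0.5, "other": 0.5}
likelihood = {
    "apple": {"red": 0.8, "round": 0.9, "small": 0.7},
    "other": {"red": 0.3, "round": 0.4, "small": 0.5},
}

def naive_bayes_posterior(features):
    # score(c) = P(c) * product over features of P(feature | c)
    scores = {}
    for c in priors:
        s = priors[c]
        for f in features:
            s *= likelihood[c][f]
        scores[c] = s
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

posterior = naive_bayes_posterior(["red", "round", "small"])
print(posterior)   # "apple" clearly dominates
```

Note how each feature multiplies in its own factor regardless of the others — that product is the "naive" independence assumption.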

7.1. (a.3) Perceptron
 The perceptron is an algorithm for supervised classification of an input into one of two possible outputs
(multiclass extensions exist). The learning algorithm for perceptrons is an online algorithm, in that it processes
elements of the training set one at a time.
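The online update rule can be sketched in a few lines: process one example at a time and adjust the weights only when a mistake is made. The tiny AND-style data set is made up for illustration:

```python
# Minimal online perceptron (labels +1/-1) on a made-up AND data set.
def perceptron_train(data, epochs=10, lr=1.0):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, label in data:                 # one example at a time
            pred = 1 if w[0]*x[0] + w[1]*x[1] + b >= 0 else -1
            if pred != label:                 # update only on a mistake
                w[0] += lr * label * x[0]
                w[1] += lr * label * x[1]
                b += lr * label
    return w, b

data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]  # AND
w, b = perceptron_train(data)
preds = [1 if w[0]*x[0] + w[1]*x[1] + b >= 0 else -1 for x, _ in data]
print(preds)  # → [-1, -1, -1, 1]
```

Because AND is linearly separable, the mistake-driven updates converge to a separating line within a few passes.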

7.1. (a.4) Support vector machines

In machine learning, support vector machines (SVMs) are supervised learning models with associated
learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. The
basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the
output, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as
belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one
category or the other. An SVM model is a representation of the examples as points in space, mapped so that the
examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then
mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
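The maximum-margin idea can be sketched with scikit-learn's SVC (an assumed dependency; the two clusters are made-up data chosen to be separable by a wide gap):

```python
from sklearn.svm import SVC

# Two made-up, well-separated clusters.
X = [[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# New points are assigned to a category based on which side of the
# learned gap they fall on.
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # → [0 1]
```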

7.1.b. Quadratic classifier 

A quadratic classifier is used in machine learning and statistical classification to separate measurements of two or
more classes of objects or events by a quadric surface. It is a more general version of the linear classifier.
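A quick sketch of why the quadric surface matters, using scikit-learn's quadratic discriminant analysis (an assumed dependency). The made-up 1-D data puts class 0 near zero and class 1 on both flanks, so no single linear threshold can separate them, but a quadratic boundary can:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Made-up data: class 0 clustered at 0, class 1 on both sides.
X = np.array([[-1], [-0.5], [0], [0.5], [1],
              [-5], [-4.5], [-4], [4], [4.5], [5]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

qda = QuadraticDiscriminantAnalysis().fit(X, y)
print(qda.predict([[-4.7], [0.2], [4.7]]))  # → [1 0 1]
```

A linear classifier would have to misclassify one flank; the quadratic decision rule handles both sides at once.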
7.1.c. Boosting

Boosting is a machine learning meta-algorithm for reducing bias in supervised learning. It is based on the
question "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a
classifier which is only slightly correlated with the true classification. In contrast, a strong learner is a classifier
that is arbitrarily well-correlated with the true classification.
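The weak-to-strong idea can be sketched from scratch in the style of AdaBoost (an illustrative sketch, not a production implementation). The weak learners are decision stumps "x < threshold" on a made-up 1-D data set; each round re-weights the examples so mistakes become heavier:

```python
import math

# Made-up 1-D data with labels +1/-1 (note the flipped point at x=5).
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [-1, -1, -1, 1, 1, -1, 1, 1, 1, 1]

def stump_predict(x, threshold, polarity):
    return polarity if x < threshold else -polarity

def best_stump(weights):
    """Pick the (threshold, polarity) with the lowest weighted error."""
    best = None
    for threshold in [x + 0.5 for x in X]:
        for polarity in (1, -1):
            err = sum(w for w, x, label in zip(weights, X, y)
                      if stump_predict(x, threshold, polarity) != label)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(rounds=10):
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []                        # (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(weights)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Re-weight: misclassified examples become heavier.
        weights = [w * math.exp(-alpha * label *
                                stump_predict(x, threshold, polarity))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    vote = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if vote >= 0 else -1

ensemble = adaboost()
accuracy = sum(predict(ensemble, x) == label
               for x, label in zip(X, y)) / len(X)
print(accuracy)
```

No single stump can classify this data perfectly, yet the weighted vote of several stumps — each only slightly better than chance on its re-weighted round — forms a much stronger classifier.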

7.1.d. Neural networks 

Neural networks  are capable of machine learning and pattern recognition. They are usually presented as systems
of interconnected "neurons" that can compute values from inputs by feeding information through the network.
Neural networking is the science of creating computational solutions modeled after the brain. Like the human
brain, neural networks are trainable: once they are taught to solve one complex problem, they can apply their
skills to a new set of problems without having to start the learning process from scratch.
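How values flow through interconnected "neurons" can be shown with a tiny hand-wired network computing XOR. The weights below are chosen by hand, not learned, purely to illustrate the feed-forward computation:

```python
# A hand-wired two-layer network computing XOR with step activations.
def step(v):
    return 1 if v >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit: fires if x1 OR x2
    h2 = step(-x1 - x2 + 1.5)   # hidden unit: fires unless both fire
    return step(h1 + h2 - 1.5)  # output unit: AND of h1 and h2

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0, 1, 1, 0]
```

XOR is a classic case a single neuron cannot compute; the hidden layer is what makes it possible.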

7.1.e. Bayesian network

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical
model is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and
their conditional dependencies via a directed acyclic graph (DAG). For example, suppose that there are two
events which could cause grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a
direct effect on the use of the sprinkler (namely, that when it rains, the sprinkler is usually not turned on). Then the
situation can be modeled with a Bayesian network (shown). All three variables have two possible values, T (for
true) and F (for false).
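The sprinkler network can be queried by summing over the joint distribution. The conditional probability table values below are illustrative assumptions (a commonly used textbook setting), not numbers from this report:

```python
# Enumerating the sprinkler network to get P(Rain = T | Grass wet = T).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | Rain=True)
               False: {True: 0.4, False: 0.6}}    # P(S | Rain=False)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(Wet=T | S, Rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    pw = P_wet[(s, r)] if w else 1 - P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * pw

wet = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
rain_and_wet = sum(joint(True, s, True) for s in (True, False))
print(round(rain_and_wet / wet, 4))   # → 0.3577
```

The DAG structure is what lets the joint probability factor into the three small tables above instead of one table over all eight variable combinations.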

7.1.f. Decision Trees

A decision tree is a hierarchical data structure implementing the divide-and-conquer strategy. It is an efficient
nonparametric method, which can be used for both classification and regression. A decision tree is a hierarchical
model for supervised learning whereby the local region is identified in a sequence of recursive splits in a smaller
number of steps. A decision tree is composed of internal decision nodes and terminal leaves (see figure). Each
decision node m implements a test function fm(x) with discrete outcomes labeling the branches. Given an input, at
each node, a test is applied and one of the branches is taken depending on the outcome. This process starts at the
root and is repeated recursively until a leaf node is hit, at which point the value written in the leaf constitutes the
output.
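A one-node version of this process can be sketched with scikit-learn (an assumed dependency; the single-feature data set is made up): the root applies one test fm(x) — here a threshold on the feature — and each branch ends in a leaf whose stored value is the output.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up data: feature = temperature, label = 0/1.
X = [[15], [18], [21], [30], [33], [36]]
y = [0, 0, 0, 1, 1, 1]

# max_depth=1 gives a single internal decision node (a "stump").
tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(tree.predict([[20], [31]]))  # → [0 1]
```

Deeper trees simply repeat this recursive test-and-branch process until a leaf is reached.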

7.2 Machine Learning Applications

Another aspect for classifying learning systems is their area of application, which gives machine learning a new
dimension. Below are areas to which various existing learning systems have been applied:
1) Computer Programming

2) Game playing (chess, poker, and so on)

3) Image recognition, Speech recognition

4) Medical diagnosis

5) Agriculture, Physics

6) Email management, Robotics

7) Music
8) Mathematics

9) Natural Language Processing and many more.

7.3 Examples of Machine Learning Problems

There are many examples of machine learning problems. Much of machine learning focuses on classification
problems, in which the goal is to categorize objects into a fixed set of categories. Here are several examples:

• Optical character recognition: categorize images of handwritten characters by the letters represented

• Face detection: find faces in images (or indicate if a face is present)

• Spam filtering: identify email messages as spam or non-spam

• Topic spotting: categorize news articles (say) as to whether they are about politics, sports, entertainment, etc.

• Spoken language understanding: within the context of a limited domain, determine the meaning of something
uttered by a speaker to the extent that it can be classified into one of a fixed set of categories

• Medical diagnosis: diagnose a patient as a sufferer or non-sufferer of some disease.

• Customer segmentation: predict, for instance, which customers will respond to a particular promotion.

• Fraud detection: identify credit card transactions (for instance) which may be fraudulent in nature

Chapter 8
PROJECT

8.1 Project Overview

Name: Twitter Influencer Predictor

8.2 Technologies used

HTML

CSS

Python

Machine Learning

Operating System: Windows 7/8/8.1/10

 Development Tool: Anaconda (Spyder IDE)

Team Size: 4

8.3 Screenshots

- New_test.py

Fig 8.3.1 New test file

- Scrapp.py

Fig 8.3.2 Scrapp file

Fig 8.3.3 Scrapp file


- Chart.py

Fig 8.3.4 Chart file

- App.py

Fig 8.3.5 App file

8.4 Project Screenshots

- Homepage

Fig 8.4.1 Home Page

- Result page

Fig 8.4.2 Result page

Chapter 9
Conclusion

Machine Learning Theory is both a fundamental theory with many basic and compelling foundational questions,
and a topic of practical importance that helps to advance the state of the art in software by providing
mathematical frameworks for designing new machine learning algorithms. It is an exciting time for the field, as
connections to many other areas are being discovered and explored, and as new machine learning applications
bring new questions to be modeled and studied. It is safe to say that the potential of Machine Learning and its
theory lie beyond the frontiers of our imagination.

Chapter 10
Future Directions

Research in Machine Learning Theory is a combination of attacking established fundamental questions, and
developing new frameworks for modeling the needs of new machine learning applications. While it is impossible
to know where the next breakthroughs will come, a few topics one can expect the future to hold include:

• Better understanding how auxiliary information, such as unlabeled data, hints from a user, or previously-learned
tasks, can best be used by a machine learning algorithm to improve its ability to learn new things.
Traditionally, Machine Learning Theory has focused on problems of learning a task (say, identifying spam) from
labeled examples (email labeled as spam or not). However, often there is additional information available. One
might have access to large quantities of unlabeled data (email messages not labeled by their type, or discussion-
group transcripts on the web) that could potentially provide useful information. One might have other hints from
the user besides just labels, e.g. highlighting relevant portions of the email message. Or, one might have
previously learned similar tasks and want to transfer some of that experience to the job at hand. These are all
issues for which a solid theory is only beginning to be developed.

• Further developing connections to economic theory. As software agents based on machine learning are used in
competitive settings, “strategic” issues become increasingly important. Most algorithms and models to date have
focused on the case of a single learning algorithm operating in an environment that, while it may be changing,
does not have its own motivations and strategies. However, if learning algorithms are to operate in settings
dominated by other adaptive algorithms acting in their own users’ interests, such as bidding on items or
performing various kinds of negotiations, then we have a true merging of computer science and economic
models. In this combination, many of the fundamental issues are still wide open.

• Development of learning algorithms with an eye towards the use of learning as part of a larger system. Most
machine learning models view learning as a standalone process, focusing on prediction accuracy as the measure
of performance. However, when a learning algorithm is placed in a larger system, other issues may come into
play. For example, one would like algorithms that have more powerful models of their own confidence or that can
optimize multiple objectives. One would like models that capture the process of deciding what to learn, in
addition to how to learn it. There has been some theoretical work on these issues, but there is certainly much
more to be done.

REFERENCES

[1] "Intro to Machine Learning | Udacity." Accessed April 27, 2016.
https://www.udacity.com/course/intro-to-machine-learning--ud120.

[2] "Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Edition.
Datasets: Coronary Heart Disease Dataset." Accessed April 27, 2016.
https://statweb.stanford.edu/~tibs/ElemStatLearn/.
 An introduction to Machine Learning: http://alex.smola.org/drafts/thebook.pdf

 http://robotics.stanford.edu/~nilsson/MLBOOK.pdf

 http://research.microsoft.com/en-us/um/people/cmbishop/prml/

Appendix

- Index.html

{% extends "site1.html" %}
{% block code %}

<form action="{{ url_for('hello') }}" method="post">


<table style="margin-top: 150px">
<tr>
<td>
Enter Twitter Username1:
</td>
<td>
<input type="text" placeholder="@username1" style="height: 25px" name="name1"
size="50px" required>
</td>
</tr>
<tr>
<td>
Enter Twitter Username2:
</td>
<td>
<input type="text" placeholder="@username2" style="height: 25px" name="name2"
size="50px" required>
</td>
</tr>
<tr>
<td >
<input type="submit" size="40px" value="Submit" name="Submit " class="button" >
</td>
</tr>
</table>
</form>
{% endblock %}

- Response.html

{% extends "site2.html" %}
{% block code %}
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
font-family: Arial;
color: white;
}

.split {
height: 100%;
width: 50%;
position: fixed;
z-index: 1;
top: 0;
overflow-x: hidden;
padding-top: 20px;
}
.splits {
height: 100%;
width: 50%;
z-index: 1;
overflow-x: hidden;
padding-top: 20px;
}

.left {

left: 0;
background-color: orange;
}

.right {
right: 0;
background-color: red;
}

.centered {
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
text-align: center;
}

.centered img {
width: 150px;
border-radius: 50%;
}
</style>
</head>
<body>
<div>
<div>
<marquee><h1>{{ "WINNER" + status[0] }}</h1></marquee>
</div>
<div class="split left">

<div class="centered">
<img src="{{status[3]}}">
<h3>User Handler 1 {{name1}}</h3>
<table style="width:100%">
{% for key, value in status[1].items() %}
<tr>
<th> {{ key }} </th>
<td> {{ value }} </td>
</tr>
{% endfor %}
</table>
<p>Some text.</p>
</div>
</div>
<div class="split right">
<div class="centered">
<img src="{{status[4]}}">
<h3>User Handler 2 {{name2}}</h3>
<table style="width:100%">
{% for key, value in status[2].items() %}
<tr>
<th> {{ key }} </th>
<td> {{ value }} </td>
</tr>
{% endfor %}
</table>
<p>{{ status[0]}}</p>

</div>
</div>
</div>
</body>
</html>
{% endblock %}

- Site1.html

<!DOCTYPE html>
<html>
<center>
<head>
<style>
h.intro{
margin: 20px;
font-size:40px;
text-align: center;
color:#FFD700;
font-family:Comic Sans MS;
background-color:transparent;
fill: black;
}
body {
margin: 0;
font-family: Arial, Helvetica, sans-serif;
text-align: center;
margin-top:40px;
background-image: url('https://images.unsplash.com/photo-1499750310107-5fef28a66643?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1950&q=80');
background-position: bottom;
height: 80%;

/* Center and scale the image nicely */


background-position: center;
background-repeat: no-repeat;
background-size: cover;}

.topnav {
overflow: hidden;
background-color: transparent;

}

.topnav a {
float: left;
color: black;
text-align: center;
padding: 14px 16px;
text-decoration: none;
font-size: 17px;
}

.topnav a:hover {
background-color: #ddd;
color: black;
}

.topnav a.active {
background-color: #3d0206;
color: white;
}
table{
font-size: 25px;
padding-top:0px;
color:#3d0206;
font-style: bold;
font-weight: 20;
background-color:white;
border-style: solid;
border-color: #3d0206;
border-radius: 6px;
}
.button {
border-radius: 4px;

border: none;
color: #FFFFFF;
text-align: center;
font-size: 28px;
padding: 15px;
cursor: pointer;
margin: 5px;
float: right;
background:#3d0206;
}
.button:hover {
opacity: 1;
color: rgba(139,60,231,1);
cursor: pointer;
right: 0;
}
h2{
padding-top: 60px;
}
footer {
display: block;
background-size: 100px,20px;
background: #3d0206;
margin-top:80px;
color:white;
text-align: center;
}
</style>
<title>Twitter influencer</title>
</head>
<body>
<div class="topnav">
<a class="active" href="http://localhost:8000/main">Home</a>

<a href="https://www.socialbakers.com/statistics/twitter/profiles/india">News</a>
<a href="http://localhost:8000/contact">Contact Us</a>
<a href="http://localhost:8000/about">About</a>
</div>
<h class="intro">TWITTER INFLUENCER</h>
<h2 style="color:#ffcc99">
<br>
Influencers have been making waves in the marketing industry over the past couple of years
<br>
Twitter has played a crucial role in many influencer marketing campaigns.
<br>
According to a recent study conducted by Twitter, partnering with the right Twitter influencers can help
brands increase the customers’ purchase intent by up to 88%.</h2>
{% block code %}
{% endblock %}
<br>
<br>
</body>
<footer>
<p>Posted by: influencers.com</p>
<p>Contact information: <a href="https://www.google.com/gmail/">
kunalsrivastava365.com</a></p>
</footer>
</center>
</html>

- Site2.html

<!DOCTYPE html>
<html>
<center>
<head>

<style>
h.intro{
margin: 20px;
padding-top: 500px;
font-size:40px;
text-align: center;
color:white;
font-style:italic;
background-color:#003300;
fill: black;
}
table{
font-size: 30px;
padding-top:100px;
}
.imp{
background-color: #003300; /* Green */
border: none;
color: white;
padding: 15px 32px;
text-align: center;
text-decoration: none;
display: inline-block;
font-size: 16px;
margin: 4px 2px;
cursor: pointer;
}
.button {
border-radius: 4px;
background-color: #f4511e;
border: none;
color: #FFFFFF;
text-align: center;

font-size: 28px;
padding: 15px;
width: 150px;
transition: all 0.5s;
cursor: pointer;
margin: 5px;
}
.button:hover {
opacity: 1;
color: rgba(139,60,231,1);
cursor: pointer;
right: 0;
}
body {
margin: 0;
font-family: Arial, Helvetica, sans-serif;
text-align: center;
margin-top:40px;
background-image: url(https://media.sproutsocial.com/uploads/2015/11/Influencer-Marketing.png);
}

.topnav {
overflow: hidden;
background-color: #333;
}

.topnav a {
float: left;
color: #f2f2f2;
text-align: center;
padding: 14px 16px;
text-decoration: none;
font-size: 17px;

}

.topnav a:hover {
background-color: #ddd;
color: black;
}

.topnav a.active {
background-color: #4CAF50;
color: white;
}
ul{
display: inline-block;
*display: inline;
zoom: 1;
}
footer {
display: block;
background-size: 100px,20px;
background: #003300;
margin-top:80px;
color:white;
text-align: center;
}
</style>
<title>Twitter influencer</title>
</head>
<body>
<div class="topnav">
<a class="active" href="https://www.google.com/">Home</a>
<a href="#news">News</a>
<a href="#contact">Contact</a>
<a href="#about">About</a>

</div>
<h class="intro">TWITTER INFLUENCER</h>
{% block code %}
{% endblock %}
<br><br>
<img src="https://tpc.googlesyndication.com/daca_images/simgad/8094133800400691929">
<footer>
<p>Contact information: <a href="https://www.google.com/gmail/">
kunalsrivastava365.com</a>.</p>
</footer>
</center>
</html>
