About Machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the
development and study of statistical algorithms that can learn from data and generalize to
unseen data, and thus perform tasks without explicit instructions.[1] Advances in the field
of deep learning have allowed neural networks to surpass many previous approaches in
performance.[2]
ML finds application in many fields, including natural language processing, computer vision,
speech recognition, email filtering, agriculture, and medicine.[3][4] The application of ML to
business problems is known as predictive analytics.
The term machine learning was coined in 1959 by Arthur Samuel, an IBM employee and
pioneer in the field of computer gaming and artificial intelligence.[8][9] The synonym self-
teaching computers was also used in this time period.[10][11]
Although the earliest machine learning model was introduced in the 1950s when Arthur
Samuel invented a program that calculated the winning chance in checkers for each side, the
history of machine learning reaches back through decades of human desire and effort to study
human cognitive processes.[12] In 1949, Canadian psychologist Donald Hebb published the
book The Organization of Behavior, in which he introduced a theoretical neural structure
formed by certain interactions among nerve cells.[13] Hebb's model of neurons interacting
with one another laid the groundwork for how AI and machine learning algorithms work
with nodes, or artificial neurons, which computers use to communicate data.[12] Other
researchers who have studied human cognitive systems contributed to the modern machine
learning technologies as well, including logician Walter Pitts and Warren McCulloch, who
proposed the early mathematical models of neural networks to come up with algorithms
that mirror human thought processes.[12]
By the early 1960s, an experimental "learning machine" with punched tape memory, called
Cybertron, had been developed by Raytheon Company to analyze sonar signals,
electrocardiograms, and speech patterns using rudimentary reinforcement learning. It was
repetitively "trained" by a human operator/teacher to recognize patterns and equipped
with a "goof" button to cause it to reevaluate incorrect decisions.[14] A representative book
on research into machine learning during the 1960s was Nilsson's book on Learning
Machines, dealing mostly with machine learning for pattern classification.[15] Interest
related to pattern recognition continued into the 1970s, as described by Duda and Hart in
1973.[16] In 1981 a report was given on using teaching strategies so that an artificial neural
network learns to recognize 40 characters (26 letters, 10 digits, and 4 special symbols) from
a computer terminal.[17]
Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied
in the machine learning field: "A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P if its performance at tasks in T,
as measured by P, improves with experience E."[18] This definition of the tasks with which
machine learning is concerned offers a fundamentally operational definition rather than
defining the field in cognitive terms. This follows Alan Turing's proposal in his paper
"Computing Machinery and Intelligence", in which the question "Can machines think?" is
replaced with the question "Can machines do what we (as thinking entities) can do?".[19]
Modern-day machine learning has two objectives. One is to classify data based on models
which have been developed; the other purpose is to make predictions for future outcomes
based on these models. A hypothetical algorithm specific to classifying data may use
computer vision of moles coupled with supervised learning in order to train it to classify the
cancerous moles. A machine learning algorithm for stock trading may provide the trader with
predictions of potential future outcomes.[20]
As a scientific endeavor, machine learning grew out of the quest for artificial intelligence
(AI). In the early days of AI as an academic discipline, some researchers were interested in
having machines learn from data. They attempted to approach the problem with various
symbolic methods, as well as what were then termed "neural networks"; these were mostly
perceptrons and other models that were later found to be reinventions of the generalized
linear models of statistics.[22] Probabilistic reasoning was also employed, especially in
automated medical diagnosis.[23]: 488
Machine learning (ML), reorganized and recognized as its own field, started to flourish in
the 1990s. The field changed its goal from achieving artificial intelligence to tackling
solvable problems of a practical nature. It shifted focus away from the symbolic approaches
it had inherited from AI, and toward methods and models borrowed from statistics, fuzzy
logic, and probability theory.[24]
There is a close connection between machine learning and compression. A system that
predicts the posterior probabilities of a sequence given its entire history can be used for
optimal data compression (by using arithmetic coding on the output distribution).
Conversely, an optimal compressor can be used for prediction (by finding the symbol that
compresses best, given the previous history). This equivalence has been used as a
justification for using data compression as a benchmark for "general intelligence".[25][26]
[27]
An alternative view holds that compression algorithms implicitly map strings into implicit
feature space vectors, and that compression-based similarity measures compute similarity
within these feature spaces. For each compressor C(.), one defines an associated vector space
ℵ, such that C(.) maps an input string x, corresponding to the vector norm ||~x||. An
exhaustive examination of the feature spaces underlying all compression algorithms is
precluded by space; instead, the cited work examines three representative
lossless compression methods: LZW, LZ77, and PPM.[28]
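As a rough illustration of a compression-based similarity measure, the sketch below computes the normalized compression distance (NCD) with Python's zlib module. NCD is one well-known measure of this kind and is chosen here as an assumption; it is not necessarily the measure studied in the cited work, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE, an LZ77-family method) encoding."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower means the inputs share more structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # shared structure makes the concatenation cheap
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 20
    variant = b"the quick brown fox leaps over the lazy dog " * 20
    binary = bytes(range(256)) * 4
    print("english vs variant:", ncd(english, variant))
    print("english vs binary: ", ncd(english, binary))
```

Two inputs that share long substrings compress well together, so their distance should come out smaller than the distance between unrelated inputs.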
According to AIXI theory, a connection more directly explained in Hutter Prize, the best
possible compression of x is the smallest possible software that generates x. For example, in
that model, a zip file's compressed size includes both the zip file and the unzipping
software, since you cannot unzip it without both, but there may be an even smaller
combined form.
Data compression aims to reduce the size of data files, enhancing storage efficiency and
speeding up data transmission. K-means clustering, an unsupervised machine learning
algorithm, is employed to partition a dataset into a specified number of clusters, k, each
represented by the centroid of its points. This process condenses extensive datasets into a
more compact set of representative points. Particularly beneficial in image and signal
processing, k-means clustering aids in data reduction by replacing groups of data points
with their centroids, thereby preserving the core information of the original data while
significantly decreasing the required storage space.[32]
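The data-reduction idea can be sketched with a minimal one-dimensional k-means (Lloyd's algorithm). The function name, the choice of the first k values as initial centroids, and the sample data are assumptions made for illustration only.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (centroids, assignments)."""
    # Assumption: the first k values serve as the initial centroids.
    centroids = list(values[:k])
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every value.
        assignments = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, a in zip(values, assignments) if a == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assignments


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
    centroids, assignments = kmeans_1d(data, k=2)
    # Reduced representation: k centroids plus one small index per point.
    reconstructed = [centroids[a] for a in assignments]
    print(centroids, assignments, reconstructed)
```

Storing the k centroids and one cluster index per point, instead of every original value, is the reduction the paragraph describes; the reconstruction is lossy.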
Machine learning and data mining often employ the same methods and overlap
significantly, but while machine learning focuses on prediction, based on known properties
learned from the training data, data mining focuses on the discovery of (previously)
unknown properties in the data (this is the analysis step of knowledge discovery in
databases). Data mining uses many machine learning methods, but with different goals; on
the other hand, machine learning also employs data mining methods as "unsupervised
learning" or as a preprocessing step to improve learner accuracy. Much of the confusion
between these two research communities (which do often have separate conferences and
separate journals, ECML PKDD being a major exception) comes from the basic assumptions
they work with: in machine learning, performance is usually evaluated with respect to the
ability to reproduce known knowledge, while in knowledge discovery and data mining
(KDD) the key task is the discovery of previously unknown knowledge. Evaluated with
respect to known knowledge, an uninformed (unsupervised) method will easily be
outperformed by other supervised methods, while in a typical KDD task, supervised
methods cannot be used due to the unavailability of training data.
Machine learning also has intimate ties to optimization: Many learning problems are
formulated as minimization of some loss function on a training set of examples. Loss
functions express the discrepancy between the predictions of the model being trained and
the actual problem instances (for example, in classification, one wants to assign a label to
instances, and models are trained to correctly predict the preassigned labels of a set of
examples).[34]
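A minimal sketch of learning as loss minimization: gradient descent on the mean squared error of a straight-line model. The model form, learning rate, and step count are illustrative assumptions, and `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the mean squared error of y ~ w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals: prediction minus target for every training example.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Training set generated from y = 2x + 1.
    print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```

On this noise-free training set the parameters should approach w = 2 and b = 1, the values that drive the loss to zero.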
Machine learning and statistics are closely related fields in terms of methods, but distinct in
their principal goal: statistics draws population inferences from a sample, while machine
learning finds generalizable predictive patterns.[35] According to Michael I. Jordan, the
ideas of machine learning, from methodological principles to theoretical tools, have had a
long pre-history in statistics.[36] He also suggested the term data science as a placeholder
to call the overall field.[36]
Conventional statistical analyses require the a priori selection of a model most suitable for
the study data set. In addition, only significant or theoretically relevant variables based on
previous experience are included for analysis. In contrast, machine learning is not built on a
pre-structured model; rather, the data shape the model by detecting underlying patterns.
The more variables (input) used to train the model, the more accurate the ultimate model
will be.[37]
Leo Breiman distinguished two statistical modeling paradigms: data model and algorithmic
model,[38] wherein "algorithmic model" means more or less the machine learning
algorithms like Random Forest.
Some statisticians have adopted methods from machine learning, leading to a combined
field that they call statistical learning.[39]
For the best performance in the context of generalization, the complexity of the hypothesis
should match the complexity of the function underlying the data. If the hypothesis is less
complex than the function, then the model has underfitted the data. If the complexity of the
model is increased in response, then the training error decreases. But if the hypothesis is
too complex, then the model is subject to overfitting and generalization will be poorer.[43]
In addition to performance bounds, learning theorists study the time complexity and
feasibility of learning. In computational learning theory, a computation is considered
feasible if it can be done in polynomial time. There are two kinds of time complexity results:
Positive results show that a certain class of functions can be learned in polynomial time.
Negative results show that certain classes cannot be learned in polynomial time.
Machine learning approaches are traditionally divided into three broad categories, which
correspond to learning paradigms, depending on the nature of the "signal" or "feedback"
available to the learning system: supervised learning, unsupervised learning, and
reinforcement learning.
Although each algorithm has advantages and limitations, no single algorithm works for all
problems.[44][45][46]
Supervised learning algorithms build a mathematical model of a set of data that contains
both the inputs and the desired outputs.[47] The data, known as training data, consists of a
set of training examples. Each training example has one or more inputs and the desired
output, also known as a supervisory signal. In the mathematical model, each training
example is represented by an array or vector, sometimes called a feature vector, and the
training data is represented by a matrix. Through iterative optimization of an objective
function, supervised learning algorithms learn a function that can be used to predict the
output associated with new inputs.[48] An optimal function allows the algorithm to
correctly determine the output for inputs that were not a part of the training data. An
algorithm that improves the accuracy of its outputs or predictions over time is said to have
learned to perform that task.[18]
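To make the vocabulary concrete, the sketch below stores training data as a matrix of feature vectors with a parallel list of desired outputs, and predicts the output for a new input with a one-nearest-neighbour rule. Nearest-neighbour is chosen here only because it is short; it does not involve the iterative optimization mentioned above, and all names and data are hypothetical.

```python
import math


def predict_nearest(training_inputs, training_outputs, query):
    """Return the desired output of the training example closest to `query`."""
    distances = [math.dist(row, query) for row in training_inputs]
    nearest = distances.index(min(distances))
    return training_outputs[nearest]


if __name__ == "__main__":
    # Each row is one feature vector; together the rows form the training matrix.
    inputs = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
    # Supervisory signal: one desired output per training example.
    outputs = ["small", "small", "large", "large"]
    print(predict_nearest(inputs, outputs, [0.9, 1.1]))
    print(predict_nearest(inputs, outputs, [5.2, 5.1]))
```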
Unsupervised learning algorithms find structures in data that has not been labeled,
classified or categorized. Instead of responding to feedback, unsupervised learning
algorithms identify commonalities in the data and react based on the presence or absence of
such commonalities in each new piece of data. Central applications of unsupervised
machine learning include clustering, dimensionality reduction,[7] and density estimation.
[51]
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so
that observations within the same cluster are similar according to one or more
predesignated criteria, while observations drawn from different clusters are dissimilar.
Different clustering techniques make different assumptions on the structure of the data,
often defined by some similarity metric and evaluated, for example, by internal
compactness, or the similarity between members of the same cluster, and separation, the
difference between clusters. Other methods are based on estimated density and graph
connectivity.
Semi-supervised learning falls between unsupervised learning (without any labeled training
data) and supervised learning (with completely labeled training data). Some of the training
examples are missing training labels, yet many machine-learning researchers have found
that unlabeled data, when used in conjunction with a small amount of labeled data, can
produce a considerable improvement in learning accuracy.
In weakly supervised learning, the training labels are noisy, limited, or imprecise; however,
these labels are often cheaper to obtain, resulting in larger effective training sets.[54]
Reinforcement learning is an area of machine learning concerned with how software agents
ought to take actions in an environment so as to maximize some notion of cumulative
reward. Due to its generality, the field is studied in many other disciplines, such as game
theory, control theory, operations research, information theory, simulation-based
optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In
reinforcement learning, the environment is typically represented as a Markov decision
process (MDP). Many reinforcement learning algorithms use dynamic programming
techniques.[55] Reinforcement learning algorithms do not assume knowledge of an exact
mathematical model of the MDP and are used when exact models are infeasible.
Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a
game against a human opponent.
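The dynamic programming connection can be illustrated with value iteration on a toy MDP. Unlike the model-free reinforcement learning algorithms described above, this sketch assumes the transition model is fully known. The five-state chain, the reward of 1 for reaching the last state, and the discount factor 0.9 are all assumptions made for the example.

```python
N_STATES = 5          # states 0..4 on a line; state 4 is terminal (assumption)
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor (assumption)


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def value_iteration(tolerance=1e-9):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tolerance:
            return values


def greedy_policy(values):
    """In each non-terminal state, pick the action with the best backed-up value."""
    policy = []
    for s in range(N_STATES - 1):
        scores = {}
        for a in ACTIONS:
            s2, r = step(s, a)
            scores[a] = r + GAMMA * values[s2]
        policy.append(max(scores, key=scores.get))
    return policy


if __name__ == "__main__":
    v = value_iteration()
    print(v, greedy_policy(v))
```

In this chain the cumulative discounted reward is maximized by always moving right, so the greedy policy selects +1 in every non-terminal state.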
Other approaches have been developed which do not fit neatly into this three-fold
categorization, and sometimes more than one is used by the same machine learning system.
Examples include topic modeling and meta-learning.[57]
Self-learning, as a machine learning paradigm, was introduced in 1982 along with a neural
network capable of self-learning, named crossbar adaptive array (CAA).[58] It is learning
with no external rewards and no external teacher advice. The CAA self-learning algorithm
computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about
consequence situations. The system is driven by the interaction between cognition and
emotion.[59]
The self-learning algorithm updates a memory matrix W = ||w(a,s)|| such that, in each
iteration, it executes the following machine learning routine:
It is a system with only one input, situation, and only one output, action (or behavior) a.
There is neither a separate reinforcement input nor an advice input from the environment.
The backpropagated value (secondary reinforcement) is the emotion toward the
consequence situation. The CAA exists in two environments, one is the behavioral
environment where it behaves, and the other is the genetic environment, wherefrom it
initially and only once receives initial emotions about situations to be encountered in the
behavioral environment. After receiving the genome (species) vector from the genetic
environment, the CAA learns a goal-seeking behavior, in an environment that contains both
desirable and undesirable situations.[60]
Manifold learning algorithms attempt to learn a representation of their input under the constraint that the learned
representation is low-dimensional. Sparse coding algorithms attempt to do so under the
constraint that the learned representation is sparse, meaning that the mathematical model
has many zeros. Multilinear subspace learning algorithms aim to learn low-dimensional
representations directly from tensor representations for multidimensional data, without
reshaping them into higher-dimensional vectors.[66] Deep learning algorithms discover
multiple levels of representation, or a hierarchy of features, with higher-level, more abstract
features defined in terms of (or generating) lower-level features. It has been argued that an
intelligent machine is one that learns a representation that disentangles the underlying
factors of variation that explain the observed data.[67]
Feature learning is motivated by the fact that machine learning tasks such as classification
often require input that is mathematically and computationally convenient to process.
However, real-world data such as images, video, and sensory data has not yielded to attempts
to algorithmically define specific features. An alternative is to discover such features or
representations through examination, without relying on explicit algorithms.
In data mining, anomaly detection, also known as outlier detection, is the identification of
rare items, events or observations which raise suspicions by differing significantly from the
majority of the data.[70] Typically, the anomalous items represent an issue such as bank
fraud, a structural defect, medical problems or errors in a text. Anomalies are referred to as
outliers, novelties, noise, deviations and exceptions.[71]
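One of the simplest statistical approaches flags values that lie far from the mean in units of standard deviation. The sketch below is a z-score rule; the threshold of 2.0 and the function name are assumptions, and real systems typically use more robust methods.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 50]
    print(zscore_outliers(readings))
```

Because a single extreme value also inflates the mean and standard deviation it is judged against, this rule can miss outliers in small samples.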
In particular, in the context of abuse and network intrusion detection, the interesting
objects are often not rare objects, but unexpected bursts of inactivity. This pattern does not
adhere to the common statistical definition of an outlier as a rare object. Many outlier
detection methods (in particular, unsupervised algorithms) will fail on such data unless
aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the
micro-clusters formed by these patterns.[72]
Rule-based machine learning is a general term for any machine learning method that
identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining
characteristic of a rule-based machine learning algorithm is the identification and
utilization of a set of relational rules that collectively represent the knowledge captured by
the system. This is in contrast to other machine learning algorithms that commonly identify
a singular model that can be universally applied to any instance in order to make a
prediction.[77] Rule-based machine learning approaches include learning classifier
systems, association rule learning, and artificial immune systems.
Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami
introduced association rules for discovering regularities between products in large-scale
transaction data recorded by point-of-sale (POS) systems in supermarkets.[78] For
example, the rule {onions, potatoes} ⇒ {burger}
found in the sales data of a supermarket would indicate that if a customer buys onions and
potatoes together, they are likely to also buy hamburger meat. Such information can be used
as the basis for decisions about marketing activities such as promotional pricing or product
placements. In addition to market basket analysis, association rules are employed today in
application areas including Web usage mining, intrusion detection, continuous production,
and bioinformatics. In contrast with sequence mining, association rule learning typically
does not consider the order of items either within a transaction or across transactions.
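How strongly transaction data backs such a rule is commonly summarized by its support and confidence. The sketch below computes both; these two measures, the function names, and the sample baskets are not taken from the text above and are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


if __name__ == "__main__":
    # Baskets are sets, so the order of items inside a transaction is ignored.
    baskets = [
        {"onions", "potatoes", "burger"},
        {"onions", "potatoes", "burger"},
        {"onions", "potatoes"},
        {"milk", "bread"},
        {"onions", "burger"},
    ]
    print(support(baskets, {"onions", "potatoes"}))
    print(confidence(baskets, {"onions", "potatoes"}, {"burger"}))
```

In this made-up data, three of the five baskets contain both onions and potatoes, and two of those three also contain burger, so the rule's confidence is 2/3.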
Learning classifier systems (LCS) are a family of rule-based machine learning algorithms
that combine a discovery component, typically a genetic algorithm, with a learning
component, performing either supervised learning, reinforcement learning, or
unsupervised learning. They seek to identify a set of context-dependent rules that
collectively store and apply knowledge in a piecewise manner in order to make predictions.
[79]
Inductive logic programming (ILP) is an approach to rule learning using logic programming
as a uniform representation for input examples, background knowledge, and hypotheses.
Given an encoding of the known background knowledge and a set of examples represented
as a logical database of facts, an ILP system will derive a hypothesized logic program that
entails all positive and no negative examples. Inductive programming is a related field that
considers any kind of programming language for representing hypotheses (and not only
logic programming), such as functional programs.
A machine learning model is a type of mathematical model that, once "trained" on a given
dataset, can be used to make predictions or classifications on new data. During training, a
learning algorithm iteratively adjusts the model's internal parameters to minimize errors in
its predictions.[84] By extension, the term "model" can refer to several levels of specificity,
from a general class of models and their associated learning algorithms to a fully trained
model with all its internal parameters tuned.[85]
Various types of models have been used and researched for machine learning systems;
picking the best model for a task is called model selection.
Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely
inspired by the biological neural networks that constitute animal brains. Such systems
"learn" to perform tasks by considering examples, generally without being programmed
with any task-specific rules.
The original goal of the ANN approach was to solve problems in the same way that a human
brain would. However, over time, attention moved to performing specific tasks, leading to
deviations from biology. Artificial neural networks have been used on a variety of tasks,
including computer vision, speech recognition, machine translation, social network filtering,
playing board and video games and medical diagnosis.
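Learning from examples without task-specific rules can be illustrated with a single artificial neuron trained by the classic perceptron rule. The sketch below is far simpler than a modern network; the logical AND task, the learning rate, and the epoch count are assumptions.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=25, learning_rate=1.0):
    """Learn weights and a bias for one threshold neuron from labelled examples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Perceptron rule: the parameters change only when the output is wrong.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Logical AND, presented purely as examples of input and desired output.
    and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    weights, bias = train_perceptron(and_gate)
    print([predict(weights, bias, x) for x, _ in and_gate])
```

A single neuron of this kind can only separate classes with a straight line; networks gain expressive power by stacking many such units with nonlinear activations.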
Deep learning consists of multiple hidden layers in an artificial neural network. This
approach tries to model the way the human brain processes light and sound into vision and
hearing. Some successful applications of deep learning are computer vision and speech
recognition.[86]
Decision tree learning uses a decision tree as a predictive model to go from observations
about an item (represented in the branches) to conclusions about the item's target value
(represented in the leaves). It is one of the predictive modeling approaches used in
statistics, data mining, and machine learning. Tree models where the target variable can
take a discrete set of values are called classification trees; in these tree structures, leaves
represent class labels, and branches represent conjunctions of features that lead to those
class labels. Decision trees where the target variable can take continuous values (typically
real numbers) are called regression trees. In decision analysis, a decision tree can be used to
visually and explicitly represent decisions and decision making. In data mining, a decision
tree describes data, but the resulting classification tree can be an input for decision-making.
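The smallest possible classification tree has one branch and two leaves, sometimes called a decision stump. The sketch below learns one from a single numeric feature by minimizing Gini impurity. The split criterion, function names, and data are assumptions; full tree learners apply the same idea recursively across many features.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent class label in a group (becomes the leaf's prediction)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    ordered = sorted(set(values))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict_stump(stump, value):
    """Follow the single branch to a leaf and return its class label."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


if __name__ == "__main__":
    sizes = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
    classes = ["no", "no", "no", "yes", "yes", "yes"]
    stump = fit_stump(sizes, classes)
    print(stump, predict_stump(stump, 2.2), predict_stump(stump, 7.0))
```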
A Gaussian process is a stochastic process in which every finite collection of the random
variables in the process has a multivariate normal distribution, and it relies on a pre-
defined covariance function, or kernel, that models how pairs of points relate to each other
depending on their locations.
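As an illustration of a covariance function, the sketch below evaluates the squared-exponential (RBF) kernel, one common choice, at every pair of input locations. The kernel choice, the length scale, and the function names are assumptions; the sketch builds only the covariance matrix and does not perform Gaussian process inference.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two scalar locations."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def covariance_matrix(points, length_scale=1.0):
    """Covariance of the process values at every pair of `points`."""
    return [[rbf_kernel(a, b, length_scale) for b in points] for a in points]


if __name__ == "__main__":
    for row in covariance_matrix([0.0, 1.0, 3.0]):
        print(row)
```

Nearby locations receive a covariance close to 1 and distant ones a covariance close to 0, which encodes the assumption that the underlying function varies smoothly.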
A genetic algorithm (GA) is a search algorithm and heuristic technique that mimics the
process of natural selection, using methods such as mutation and crossover to generate new
genotypes in the hope of finding good solutions to a given problem. In machine learning,
genetic algorithms were used in the 1980s and 1990s.[90][91] Conversely, machine
learning techniques have been used to improve the performance of genetic and
evolutionary algorithms.[92]
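The selection, crossover, and mutation loop can be sketched on a toy problem. The objective (maximize the number of 1-bits, often called OneMax), the population size, the rates, and the elitism step are all assumptions chosen to keep the example short.

```python
import random


def fitness(genotype):
    """Toy objective (assumption): the number of 1-bits in the genotype."""
    return sum(genotype)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=0):
    """Run a simple elitist GA; returns (initial_best, final_best) fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]           # elitism: the best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return initial_best, max(map(fitness, population))


if __name__ == "__main__":
    print(evolve())
```

Because the best genotype is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next.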
Typically, machine learning models require a high quantity of reliable data to perform
accurate predictions. When training a machine learning model, machine learning engineers
need to target and collect a large and representative sample of data. Data from the training
set can be as varied as a corpus of text, a collection of images, sensor data, and data
collected from individual users of a service. Overfitting is something to watch out for when
training a machine learning model. Trained models derived from biased or non-evaluated
data can result in skewed or undesired predictions. Biased models may result in
detrimental outcomes, thereby furthering the negative impacts on society or objectives.
Algorithmic bias is a potential result of data not being fully prepared for training. Machine
learning ethics is becoming a field of study and notably, becoming integrated within
machine learning engineering teams.
In 2006, the media-services provider Netflix held the first "Netflix Prize" competition to find
a program to better predict user preferences and improve the accuracy of its existing
Cinematch movie recommendation algorithm by at least 10%. A joint team made up of
researchers from AT&T Labs-Research in collaboration with the teams Big Chaos and
Pragmatic Theory built an ensemble model to win the Grand Prize in 2009 for $1 million.
[96] Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the
best indicators of their viewing patterns ("everything is a recommendation") and they
changed their recommendation engine accordingly.[97] In 2010 The Wall Street Journal
wrote about the firm Rebellion Research and their use of machine learning to predict the
financial crisis.[98] In 2012, co-founder of Sun Microsystems, Vinod Khosla, predicted that
80% of medical doctors' jobs would be lost in the next two decades to automated machine
learning medical diagnostic software.[99] In 2014, it was reported that a machine learning
algorithm had been applied in the field of art history to study fine art paintings and that it
may have revealed previously unrecognized influences among artists.[100] In 2019
Springer Nature published the first research book created using machine learning.[101] In
2020, machine learning technology was used to help make diagnoses and aid researchers in
developing a cure for COVID-19.[102] Machine learning was recently applied to predict the
pro-environmental behavior of travelers.[103] Recently, machine learning technology was
also applied to optimize smartphones' performance and thermal behavior based on the
user's interaction with the phone.[104][105][106] When applied correctly, machine
learning algorithms (MLAs) can utilize a wide range of company characteristics to predict
stock returns without overfitting. By employing effective feature engineering and
combining forecasts, MLAs can generate results that far surpass those obtained from basic
linear techniques like OLS.[107]
Recent advancements in machine learning have extended into the field of quantum
chemistry, where novel algorithms now enable the prediction of solvent effects on chemical
reactions, thereby offering new tools for chemists to tailor experimental conditions for
optimal outcomes.[108]
Machine learning is becoming a useful tool to investigate and predict evacuation decision
making in large-scale and small-scale disasters. Different solutions have been tested to
predict if and when householders decide to evacuate during wildfires and hurricanes.[109]
[110][111] Other applications have focused on pre-evacuation decisions in building
fires.[112][113]
The "black box theory" poses another significant challenge. Black box refers to a
situation where the algorithm or the process of producing an output is entirely opaque,
meaning that even the coders of the algorithm cannot audit the pattern that the machine
extracted out of the data.[118] The House of Lords Select Committee claimed that
such an "intelligence system" that could have a "substantial impact on an individual's life"
would not be considered acceptable unless it provided "a full and satisfactory explanation
for the decisions" it makes.[118]
In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after a
collision.[119] Attempts to use machine learning in healthcare with the IBM Watson system
failed to deliver even after years of time and billions of dollars invested.[120][121]
Microsoft's Bing Chat chatbot has been reported to produce hostile and offensive responses
toward its users.[122]
Machine learning has been used as a strategy to update the evidence related to a systematic
review and to address the increased reviewer burden related to the growth of biomedical
literature. While it has improved with training sets, it has not yet developed sufficiently to
reduce the workload burden without limiting the sensitivity necessary for the research
findings themselves.[123]
Settling on a bad, overly complex theory gerrymandered to fit all the past training data is
known as overfitting. Many systems attempt to reduce overfitting by rewarding a theory in
accordance with how well it fits the data but penalizing the theory in accordance with how
complex the theory is.[126]
Learners can also disappoint by "learning the wrong lesson". A toy example is that an image
classifier trained only on pictures of brown horses and black cats might conclude that all
brown patches are likely to be horses.[127] A real-world example is that, unlike humans,
current image classifiers often do not primarily make judgments from the spatial
relationship between components of the picture, and they learn relationships between
pixels that humans are oblivious to, but that still correlate with images of certain types of
real objects. Modifying these patterns on a legitimate image can result in "adversarial"
images that the system misclassifies.[128][129]
Researchers have demonstrated how backdoors can be placed undetectably into classifying
machine learning models (e.g., models that sort posts into the categories "spam" and
well-visible "not spam") that are often developed and/or trained by third parties. Parties can change the
classification of any input, including in cases for which a type of data/software transparency
is provided, possibly including white-box access.[132][133][134]
Different machine learning approaches can suffer from different data biases. A machine
learning system trained specifically on current customers may not be able to predict the
needs of new customer groups that are not represented in the training data. When trained
on human-made data, machine learning is likely to pick up the constitutional and
unconscious biases already present in society.[138]
Systems that are trained on datasets collected with biases may exhibit these biases upon
use (algorithmic bias), thus digitizing cultural prejudices.[139] For example, in 1988, the
UK's Commission for Racial Equality found that St. George's Medical School had been using a
computer program trained from data of previous admissions staff and that this program
had denied nearly 60 candidates who were found to either be women or have non-
European sounding names.[138] Using job hiring data from a firm with racist hiring policies
may lead to a machine learning system duplicating the bias by scoring job applicants by
similarity to previous successful applicants.[140][141] Another example includes predictive
policing company Geolitica's predictive algorithm that resulted in "disproportionately high
levels of over-policing in low-income and minority communities" after being trained with
historical crime data.[142]
Language models learned from data have been shown to contain human-like biases.[145]
[146] Because human languages contain biases, machines trained on language corpora will
necessarily also learn these biases.[147][148] In 2016, Microsoft tested Tay, a chatbot that
learned from Twitter, and it quickly picked up racist and sexist language.[149]
Because of such challenges, the effective use of machine learning may take longer to be
adopted in other domains.[152] Concern for fairness in machine learning, that is, reducing
bias in machine learning and propelling its use for human good, is increasingly expressed by
artificial intelligence scientists, including Fei-Fei Li, who said that "[t]here's nothing
artificial about AI. It's inspired by people, it's created by people, and—most importantly—it
impacts people. It is a powerful tool we are only just beginning to understand, and that is a
profound responsibility."[153]
There are concerns among health care professionals that these systems might not be
designed in the public's interest but as income-generating machines. This is especially true
in the United States where there is a long-standing ethical dilemma of improving health
care, but also increasing profits. For example, the algorithms could be designed to provide
patients with unnecessary tests or medication in which the algorithm's proprietary owners
hold stakes. There is potential for machine learning in health care to provide professionals
an additional tool to diagnose, medicate, and plan recovery paths for patients, but this
requires these biases to be mitigated.[154]
Since the 2010s, advances in both machine learning algorithms and computer hardware
have led to more efficient methods for training deep neural networks (a particular narrow
subdomain of machine learning) that contain many layers of nonlinear hidden units.[155]
By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had
displaced CPUs as the dominant method of training large-scale commercial cloud AI.[156]
OpenAI estimated the hardware compute used in the largest deep learning projects from
AlexNet (2012) to AlphaZero (2017), and found a 300,000-fold increase in the amount of
compute required, with a doubling-time trendline of 3.4 months.[157][158]
Neuromorphic computing refers to a class of computing systems designed to emulate the
structure and functionality of biological neural networks. These systems may be
implemented through software-based simulations on conventional hardware or through
specialized hardware architectures.[159]
Embedded machine learning is a sub-field of machine learning where models are deployed
on embedded systems with limited computing resources, such as wearable computers, edge
devices and microcontrollers.[162][163][164] Running models directly on these devices
eliminates the need to transfer and store data on cloud servers for further processing,
thereby reducing the risk of data breaches, privacy leaks and theft of intellectual property,
personal data and business secrets. Embedded machine learning can be achieved through
various techniques, such as hardware acceleration,[165][166] approximate computing,
[167] and model optimization.[168][169] Common optimization techniques include
pruning, quantization, knowledge distillation, low-rank factorization, network architecture
search, and parameter sharing.
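Of the techniques listed, quantization is the easiest to show in a few lines. The sketch below uses symmetric linear quantization with a single shared scale, mapping floating-point weights to 8-bit signed integers and back. This is one simple scheme among several, and the function names and sample weights are assumptions.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers using one shared scale factor."""
    q_max = 2 ** (bits - 1) - 1                   # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / q_max if largest else 1.0   # avoid dividing by zero
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in q_weights]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.03, 1.0]
    q_weights, scale = quantize(weights)
    print(q_weights, scale)
    print(dequantize(q_weights, scale))
```

Each weight then needs one byte instead of a four-byte or eight-byte float, at the cost of a rounding error of at most half the scale factor per weight.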
Software suites containing a variety of machine learning algorithms include the following: