AI Unit-5
UNIT V APPLICATIONS
AI applications – Language Models – Information Retrieval – Information Extraction – Natural Language Processing – Machine Translation – Speech Recognition – Robot: Hardware, Perception, Planning, Moving
1. AI APPLICATIONS
AI programs are developed to perform specific tasks and are being utilized for a wide range of activities, including medical diagnosis, electronic trading platforms, robot control, and remote sensing.
AI has been used to develop and advance numerous fields and industries, including finance,
healthcare, education, transportation, and more.
Some of the major application areas of AI include:
Information Retrieval
Natural Language Processing
Machine Translation
Speech Recognition
Robotics
Banking
Finance
Marketing
Agriculture
Healthcare
Gaming
Space Exploration
Autonomous Vehicles
Chatbots
Artificial Creativity
Marketing
When we search for an item on any e-commerce store, we get all possible results related to the item. It is as if these search engines read our minds: in a matter of seconds, we get a list of all relevant items.
With the growing advancement of AI, in the near future it may be possible for consumers on the web to buy products simply by snapping a photo of them.
Banking
A lot of banks have already adopted AI-based systems to provide customer support and to detect anomalies and credit card fraud. An example of this is HDFC Bank.
HDFC Bank has developed an AI-based chatbot called EVA (Electronic Virtual Assistant), built by the Bangalore-based company Senseforth AI Research.
Since its launch, Eva has addressed over 3 million customer queries, interacted with over half a
million unique users, and held over a million conversations.
Eva can collect knowledge from thousands of sources and provide simple answers in less than
0.4 seconds.
AI solutions can be used to enhance security across a number of business sectors, including
retail and finance.
By tracing card usage and endpoint access, security specialists are more effectively preventing
fraud. Organizations rely on AI to trace those steps by analyzing the behaviors of transactions.
Companies such as MasterCard and RBS WorldPay have relied on AI and Deep Learning to detect
fraudulent transaction patterns and prevent card fraud for years now. This has saved millions of
dollars.
Finance
Ventures have been relying on computers and data scientists to determine future patterns in the
market. Trading mainly depends on the ability to predict the future accurately.
Machines are great at this because they can crunch a huge amount of data in a short span.
Machines can also learn to observe patterns in past data and predict how these patterns might
repeat in the future.
Financial organizations are turning to AI to improve their stock trading performance and boost
profit.
Agriculture
AI can help farmers get more from the land while using resources more sustainably.
Issues such as climate change, population growth, and food security concerns have pushed the
industry into seeking more innovative approaches to improve crop yield.
Organizations are using automation and robotics to help farmers find more efficient ways to
protect their crops from weeds.
Blue River Technology has developed a robot called See & Spray, which uses computer vision technologies such as object detection to monitor cotton fields and spray precisely. Precision spraying can help prevent herbicide resistance.
An image recognition app identifies possible crop defects through images captured by the user's smartphone camera. Users are then provided with soil restoration techniques, tips, and other possible solutions. The company claims that its software can achieve pattern detection with an estimated accuracy of up to 95%.
Health Care
When it comes to saving our lives, a lot of organizations and medical care centers are relying on AI. One example is a clinical decision support system for stroke prevention that can warn the physician when a patient is at risk of having a stroke. Other important areas include:
Preventive care
Personalized medicine
Gaming
Artificial Intelligence has become an integral part of the gaming industry; in fact, some of AI's biggest accomplishments have come from games.
DeepMind's AI-based AlphaGo software, which is known for defeating Lee Sedol, the world champion in the game of Go, is considered one of the most significant accomplishments in the field of AI.
The actions taken by the opponent AI are unpredictable because the game is designed in such a
way that the opponents are trained throughout the game and never repeat the same mistakes.
They get better as the game gets harder. This makes the game very challenging and prompts the
players to constantly switch strategies and never sit in the same position.
Space Exploration
Space expeditions and discoveries always require analyzing vast amounts of data.
Artificial Intelligence and machine learning are well suited to handling and processing data on this scale.
After rigorous research, astronomers used Artificial Intelligence to sift through years of data
obtained by the Kepler telescope in order to identify a distant eight-planet solar system.
Artificial Intelligence is also being used for NASA's Mars 2020 rover mission. AEGIS, an AI-based autonomous targeting system, is already in use on the red planet: it is responsible for autonomously targeting the rover's cameras in order to perform investigations on Mars.
Autonomous Vehicles
For the longest time, self-driving cars have been a buzzword in the AI industry. The development of autonomous vehicles will definitely revolutionize the transport system.
Companies like Waymo conducted several test drives in Phoenix before deploying their first AI-based public ride-hailing service.
The AI system collects data from the vehicle's radar, cameras, GPS, and cloud services to produce control signals that operate the vehicle.
Advanced Deep Learning algorithms can accurately predict what objects in the vehicle’s vicinity are
likely to do. This makes Waymo cars more effective and safer.
Another famous example of an autonomous vehicle is Tesla's self-driving car. Tesla applies computer vision, image detection, and deep learning to build cars that can automatically detect objects and drive around without human intervention.
Language Models
A language model is an AI model that has been trained to predict the next word or words in a text based on the preceding words; this style of training, in which the text itself supplies the labels, is referred to as self-supervised learning.
Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the
probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text
data to provide a basis for their word predictions. They are used in natural language processing (NLP)
applications, particularly ones that generate text as an output.
How language modeling works
Language models determine word probability by analyzing text data. They interpret this data by feeding it through an
algorithm that establishes rules for context in natural language. Then, the model applies these rules in language tasks
to accurately predict or produce new sentences. The model essentially learns the features and characteristics of basic
language and uses those features to understand new phrases.
A language model is the core component of modern Natural Language Processing (NLP). It’s a statistical
tool that analyzes the pattern of human language for the prediction of words.
Types of Language Models
There are two types of language models in NLP:
1. Statistical Language Models develop probabilistic models that help predict the next word in a sequence. They use statistical techniques such as N-grams, Hidden Markov Models (HMMs), and certain linguistic rules to learn the probability distribution of words.
2. Neural Language Models refer to language models that are developed using neural networks.
Statistical language processing techniques are used in tasks such as:
Optical character recognition
Spelling correction
Speech recognition
Machine translation
Part-of-speech tagging
Parsing
Statistical models:
Statistical techniques can be used to disambiguate the input.
• They can be used to select the most probable solution.
• Statistical techniques depend on the probability theory.
• To be able to use statistical techniques, we need corpora from which to collect statistics.
Corpora should be big enough to capture the required knowledge.
Bayes' theorem
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning; it determines the probability of an event under uncertain knowledge. The theorem is named after Thomas Bayes and describes the probability of an event based on prior knowledge of conditions related to it.
Bayes' theorem can be derived using the product rule and conditional probability. The probability of events A and B occurring together can be written with event B known:
P(A ∧ B) = P(A|B) P(B)
Similarly, with event A known:
P(A ∧ B) = P(B|A) P(A)
Equating the two expressions and dividing by P(B) gives Bayes' rule:
P(A|B) = P(B|A) P(A) / P(B)
This equation is the basis of most modern AI systems for probabilistic inference.
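As a quick illustration of the rule above, here is a minimal Python sketch of Bayes' rule applied to a hypothetical diagnostic test; the function name and the numbers are invented for illustration only.

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
# Hypothetical diagnostic-test numbers, for illustration only.
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) using Bayes' rule."""
    return p_b_given_a * p_a / p_b

p_disease = 0.01                 # prior P(A): patient has the disease
p_pos_given_disease = 0.95       # likelihood P(B|A): positive test if diseased
p_pos_given_healthy = 0.05       # false-positive rate
# total probability of a positive test, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

print(bayes(p_pos_given_disease, p_disease, p_pos))  # about 0.16 = P(disease | positive test)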
Language models are useful in many NLP tasks, for example:
• Machine Translation:
– P(high winds tonight) > P(large winds tonight)
• Spell Correction:
– Thek office is about ten minutes from here
– P(The Office is) > P(Then office is)
• Speech Recognition:
– P(I saw a van) >> P(eyes awe of an)
N-Gram Language Models
Language models are used to determine the probability of a sequence of words. The sequence can be 2 words, 3 words, 4 words, ..., n words, and so on. An N-gram is simply a sequence of n words. A language model that determines probabilities from the counts of such word sequences is called an N-gram language model. Based on the number of words, an N-gram can be:
Unigram: Sequence of just 1 word
Bigram: Sequence of 2 words
Trigram: Sequence of 3 words
…so on and so forth
Under a unigram model, the probability of a sentence is approximated by the product of the probabilities of its individual words:
P("Which is best car insurance package") ≈ P(which) P(is) P(best) P(car) P(insurance) P(package)
Under a bigram model, each word is conditioned on the previous word (using start-of-sentence and end-of-sentence markers):
P("Which is best car insurance package") ≈ P(which | <s>) P(is | which) P(best | is) ... P(</s> | package)
A longer sentence can likewise be scored with a trigram model, where each word is conditioned on the two previous words:
P("Which company provides best car insurance package") ≈ P(company | <s> which) P(provides | which company) P(best | company provides) ... P(</s> | insurance package)
N-gram model
n-gram model is an insufficient model of a language because languages
have long-distance dependencies.
– “The computer(s) which I had just put into the machine room is (are) crashing.”
– But we can still effectively use N-Gram models to represent languages.
– In practice, we rarely use models of higher order than trigrams (and often no more than bigrams).
– How big are N-Gram tables with 10,000 words?
• Unigram -- 10,000
• Bigram – 10,000*10,000 = 100,000,000
• Trigram – 10,000*10,000*10,000 = 1,000,000,000,000
For example, let's predict the probability of the sentence "There was heavy rain". By the chain rule of probability,
P(There was heavy rain) = P(There) P(was | There) P(heavy | There was) P(rain | There was heavy)
Each of the terms on the right-hand side of this equation is an n-gram probability that we can estimate using the counts of n-grams in our corpus. To calculate the probability of an entire sentence, however, this formula does not scale, since we cannot compute n-grams of every length. For example, consider the case where we have solely bigrams in our model; we then have no way of knowing the probability P(rain | There was) from bigrams.
By using the Markov Assumption, we can simplify our equation by assuming that future states in our model only
depend upon the present state of our model. This assumption means that we can reduce our conditional probabilities
to be approximately equal so that
P('rain'|'There was heavy') ~ P('rain'|'heavy')
More generally, we can estimate the probability of a sentence by the probabilities of each component part. In the
equation that follows, the probability of the sentence is reduced to the probabilities of the sentence’s individual
bigrams.
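To make the estimates above concrete, here is a minimal Python sketch of a bigram model; the toy corpus and function names are invented for illustration, and a real model would also need smoothing for unseen bigrams.

# Minimal bigram language-model sketch (illustrative toy corpus, no smoothing).
from collections import Counter

corpus = [
    "<s> there was heavy rain </s>",
    "<s> there was a storm </s>",
    "<s> heavy rain fell all day </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(sentence):
    """Score a sentence under the Markov (bigram) assumption."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob("there was heavy rain"))  # product of the individual bigram probabilities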
A Markov chain is defined over a finite set of states and is specified by:
Initial probability distribution: an initial probability distribution over states, where πi is the probability that the Markov chain will start in state i. Some states j may have πj = 0, meaning that they cannot be initial states.
Transition probability distribution: a transition probability matrix A, where each aij represents the probability of moving from state i to state j.
The diagram below represents a Markov chain where there are three states representing the weather of the day
(cloudy, rainy, and sunny). And, there are transition probabilities representing the weather of the next day given the
weather of the current day.
There are three states: cloudy, rainy, and sunny. Based on the transition probabilities in the diagram, the probability of a cloudy Wednesday given a sunny Monday can be computed by summing over the possible states on Tuesday:
Sunny – Sunny (Tuesday) – Cloudy (Wednesday): The probability of a cloudy Wednesday can be calculated as
0.5 x 0.4 = 0.2
Sunny – Rainy (Tuesday) – Cloudy (Wednesday): The probability of a cloudy Wednesday can be calculated as
0.1 x 0.3 = 0.03
Sunny – Cloudy (Tuesday) – Cloudy (Wednesday): The probability of a cloudy Wednesday can be calculated as
0.4 x 0.1 = 0.04
The total probability of a cloudy Wednesday = 0.2 + 0.03 + 0.04 = 0.27.
As shown above, the Markov chain is a process with a known finite number of states in which the probability of
being in a particular state is determined only by the previous state.
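The same two-step calculation can be written as a short Python sketch; only the transition probabilities that appear in the example above are filled in, and the remaining entries are not needed for this particular computation.

# Sketch of the two-step Markov chain computation above.
transition = {
    ("sunny", "sunny"): 0.5,
    ("sunny", "rainy"): 0.1,
    ("sunny", "cloudy"): 0.4,
    ("rainy", "cloudy"): 0.3,
    ("cloudy", "cloudy"): 0.1,
}

def two_step_prob(start, end, states=("sunny", "rainy", "cloudy")):
    """P(end on day 2 | start on day 0), summing over the intermediate day."""
    return sum(
        transition.get((start, mid), 0.0) * transition.get((mid, end), 0.0)
        for mid in states
    )

print(two_step_prob("sunny", "cloudy"))  # 0.5*0.4 + 0.1*0.3 + 0.4*0.1 = 0.27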
A hidden Markov model (HMM) extends a Markov chain with observations and is specified by the following components:
Transition probability distribution: a transition probability matrix where each aij represents the probability of moving from state i to state j. The transition matrix describes the hidden-state to hidden-state transition probabilities.
A sequence of observations
Emission probabilities: a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation oi being generated from a state i. The emission probability defines the conditional distribution over the observable outputs for each hidden state. A hidden Markov model representation is shown below:
The hidden Markov model in the above diagram represents the process of predicting whether someone will be found
to be walking, shopping, or cleaning on a particular day depending upon whether the day is rainy or sunny. The
following represents five components of the hidden Markov model in the above diagram:
There are two hidden states: rainy and sunny. These states are hidden because what is observed as the process output is whether the person is shopping, walking, or cleaning.
The sequence of observations is shop, walk, and clean.
An initial probability distribution is represented by start probability
Transition probability represents the transition of one state (rainy or sunny) to another state given the current
state
Emission probability represents the probability of observing the output, shop, clean and walk given the states,
rainy or sunny.
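A minimal Python sketch of this rainy/sunny model is given below; the probability values are illustrative placeholders (the diagram's actual numbers are not reproduced in the text), and the forward algorithm is used to compute the probability of the observation sequence shop, walk, clean.

# Illustrative HMM with hidden states rainy/sunny and observations walk/shop/clean.
states = ("rainy", "sunny")
start_p = {"rainy": 0.6, "sunny": 0.4}
trans_p = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
emit_p = {
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

def forward(observations):
    """Forward algorithm: P(observation sequence), summed over all hidden paths."""
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

print(forward(["shop", "walk", "clean"]))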
Information retrieval (IR)
An information retrieval (IR) system is a set of algorithms that determine the relevance of retrieved documents to a user's query. In simple words, it sorts and ranks documents based on the user's query. Queries and document text are represented in a uniform way so that documents can be matched and accessed.
The following diagram illustrates the process of information retrieval (IR):
As the diagram shows, a user who needs information formulates a request in the form of a query in natural language. The IR system then responds by retrieving the relevant output, in the form of documents, about the required information.
A retrieval model chooses and ranks relevant documents based on a user's query. Since documents and queries are represented in the same way, document selection and ranking can be formalized using matching functions that return retrieval status values (RSVs) for each document in a collection. The majority of IR systems represent document contents using a collection of descriptors, known as terms, drawn from a vocabulary V.
Retrieval status values can be obtained in different ways, for example by estimating the likelihood of user relevance for each document and query from a collection of training documents, or by computing a similarity function between queries and documents in a vector space.
R(q, di) is a ranking function that uses the similarity between the query q and each document di to order the documents so that the most relevant information is displayed.
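As a concrete (if simplified) sketch of vector-space ranking, the snippet below uses scikit-learn's TF-IDF vectorizer and cosine similarity as the ranking function R(q, di); the documents and query are toy examples invented for illustration.

# Vector-space retrieval sketch: TF-IDF vectors + cosine similarity ranking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "AI is used for fraud detection in banking",
    "Machine translation converts text between languages",
    "Speech recognition converts spoken audio into text",
]
query = ["speech to text recognition"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)   # one TF-IDF vector per document
query_vector = vectorizer.transform(query)          # query in the same vector space

scores = cosine_similarity(query_vector, doc_vectors)[0]   # R(q, d_i) for each document
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {doc}")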
There are three types of Information Retrieval (IR) models:
1. Classical IR Model — It is designed upon basic mathematical concepts and is the most widely used type of IR model. Classical information retrieval models can be implemented with ease. Examples include the Vector-Space, Boolean, and Probabilistic IR models. In the Boolean case, the retrieval of information depends on documents containing the defined set of query terms, with no ranking or grading of any kind. The different classical IR models take document representation, query representation, and the retrieval/matching function into account in their modelling.
2. Non-Classical IR Model — These differ from classical models in that they are based on principles other than similarity, probability, or Boolean operations. Examples of non-classical IR models include Information Logic, Situation Theory, and Interaction models.
3. Alternative IR Model — These take the principles of the classical IR model and enhance them to create more functional models, such as the Cluster model, alternative set-theoretic models like the Fuzzy Set model, the Latent Semantic Indexing (LSI) model, and alternative algebraic models like the Generalized Vector Space Model.
The various components of an Information Retrieval Model include:
Step 1
Acquisition
The IR system sources documents and multimedia information from a variety of web
resources. This data is compiled by web crawlers and is sent to database storage systems.
Step 2
Representation
The free-text terms are indexed, and the vocabulary is sorted, both using automated or manual
procedures. For instance, a document abstract will contain a summary, meta description,
bibliography, and details of the authors or co-authors.
Step 3
File Organization
File organization is carried out in one of two ways, sequential or inverted. In sequential organization, records are stored in the order of the documents they describe, while an inverted file stores, for each term, a list of the records (documents) in which that term occurs.
Step 4
Query
An IR system is initiated on entering a query. User queries can either be formal or informal
statements highlighting what information is required. In IR systems, a query is not indicative of a
single object in the database system. It could refer to several objects whichever match the query.
However, their degrees of relevance may vary.
Information Extraction
Information extraction (IE) is the process of automatically converting the unstructured information embedded in texts into more editable and structured data formats.
The following steps are often involved in extracting structured information from unstructured texts:
1. Initial processing.
2. Identification of proper names.
3. Parsing.
4. Extraction of events and relations.
5. Anaphora resolution.
6. Output generation.
1. Initial processing
The first step is to break down a text into fragments such as zones, phrases, segments, and tokens. This function can
be performed by tokenizers, text zoners, segmenters, and splitters, among other components. In the initial processing
stage, part-of-speech tagging, and phrasal unit identification (noun or verb phrases) are usually the next tasks.
2. Identification of proper names
One of the most important stages in the information extraction chain is the identification of various classes of proper names, such as names of people or organizations, dates, monetary amounts, places, addresses, and so on. They may be found in practically any sort of text and are widely used in the extraction process. Regular expressions, which are collections of patterns, are used to recognize these names.
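The sketch below shows the idea with two deliberately simplified regular-expression patterns for monetary amounts and dates; the example text and pattern labels are invented, and production systems use far richer pattern sets.

# Toy regular-expression patterns for recognizing two classes of proper names.
import re

text = "Apple paid $3,000,000 to Acme Corp. on 15 August 2023 in San Francisco."

patterns = {
    "MONEY": r"\$\d[\d,]*(?:\.\d+)?",
    "DATE": r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|"
            r"August|September|October|November|December)\s+\d{4}\b",
}

for label, pattern in patterns.items():
    for match in re.finditer(pattern, text):
        print(label, match.group())   # MONEY $3,000,000 and DATE 15 August 2023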
3. Parsing
The syntactic analysis of the sentences in the texts is done at this step. After recognizing the fundamental entities in
the previous stage, the sentences are processed to find the noun groups that surround some of those entities and verb
groups. At the pattern matching step, the noun and verb groupings are utilized as sections to begin working on.
4. Extraction of events and relations
This stage establishes relations between the extracted entities. This is accomplished by developing and applying extraction rules that describe various patterns. The text is compared against these patterns, and if a match is discovered, the text element is labeled and can be retrieved later.
6. Output generation
This stage entails converting the structures collected during the preceding stages into output templates that follow the format defined by the user. It might comprise a variety of normalization processes.
NLU (Natural Language Understanding) enables machines to understand and interpret human language by extracting metadata from content. NLU is more difficult than NLG (Natural Language Generation) owing to referential, lexical, and syntactic ambiguity, described below.
Lexical ambiguity: This means that one word holds several meanings. For example, "The man is looking for
the match." The sentence is ambiguous as ‘match’ could mean different things such as a partner or a
competition.
Syntactic ambiguity: This refers to a sequence of words with more than one meaning. For example, "The
fish is ready to eat.” The ambiguity here is whether the fish is ready to eat its food or whether the fish is ready
for someone else to eat. This ambiguity can be resolved with the help of the part-of-speech tagging technique.
Referential ambiguity: This involves a word or a phrase that could refer to two or more properties. For
example, Tom met Jerry and John. They went to the movies. Here, the pronoun ‘they’ causes ambiguity as it
isn’t clear who it refers to.
NLP Terminology
Phonology − It is the study of organizing sounds systematically.
Morphology − It is the study of the construction of words from primitive meaningful units.
Morpheme − It is the primitive unit of meaning in a language.
Syntax − It refers to arranging words to make a sentence. It also involves determining the structural role of
words in the sentence and in phrases.
Semantics − It is concerned with the meaning of words and how to combine words into meaningful phrases
and sentences.
Pragmatics − It deals with using and understanding sentences in different situations and how the
interpretation of the sentence is affected.
Discourse − It deals with how the immediately preceding sentence can affect the interpretation of the next
sentence.
World Knowledge − It includes the general knowledge about the world.
Step 1: Sentence Segmentation
Sentence segmentation is the first step in building the NLP pipeline. It breaks the paragraph into separate sentences.
Independence Day is one of the important festivals for every Indian citizen. It is celebrated on the 15th of
August each year ever since India got independence from the British rule. The day celebrates independence in
the true sense.
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the British rule."
3. "This day celebrates independence in the true sense."
Step 2: Tokenization
Tokenization is, generally, an early step in the NLP process; it splits longer strings of text into smaller pieces, or tokens. Larger chunks of text can be tokenized into sentences, sentences can be tokenized into words, and so on.
Example:
JavaTpoint offers Corporate Training, Summer Training, Online Training, and Winter Training.
Word Tokenizer generates the following result:
"JavaTpoint", "offers", "Corporate", "Training", "Summer", "Training", "Online", "Training", "and", "Winter",
"Training", "."
Step 3: Stemming
Stemming is used to normalize words into their base or root form. For example, celebrates, celebrated, and celebrating all originate from the single root word "celebrate." The big problem with stemming is that it sometimes produces a root word that has no meaning.
For example, intelligence, intelligent, and intelligently are all reduced to the single root "intelligen," and in English the word "intelligen" does not have any meaning.
Step 4: Lemmatization
Lemmatization is quite similar to stemming. It is used to group the different inflected forms of a word into a single item, called the lemma. The main difference between stemming and lemmatization is that lemmatization produces a root word that has a meaning.
For example, in lemmatization the words intelligence, intelligent, and intelligently map to the root word intelligent, which has a meaning.
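The difference can be seen in a short NLTK sketch (the WordNet data is assumed to be downloaded): the Porter stemmer chops suffixes mechanically, while the lemmatizer looks words up in a dictionary.

# Stemming versus lemmatization with NLTK.
from nltk.stem import PorterStemmer, WordNetLemmatizer
# One-time setup assumed: nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["celebrates", "celebrated", "celebrating"]:
    print(word, "->", stemmer.stem(word))            # all reduce to the stem "celebr"

print(lemmatizer.lemmatize("celebrating", pos="v"))  # dictionary-based lemma: "celebrate"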
Step 5: Identifying Stop Words
In English, there are many words that appear very frequently, such as "is", "and", "the", and "a". NLP pipelines flag these words as stop words. Stop words might be filtered out before doing any statistical analysis.
Step 6: Dependency Parsing
Dependency parsing is used to find how all the words in the sentence are related to each other.
Step 7: POS Tagging
POS stands for parts of speech, which include noun, verb, adverb, and adjective. A POS tag indicates how a word functions, in meaning as well as grammatically, within the sentence. A word can have one or more parts of speech depending on the context in which it is used.
Step 8: Named Entity Recognition (NER)
Named Entity Recognition (NER) is the process of detecting named entities such as person names, movie names, organization names, or locations.
Example: Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.
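A short NLTK sketch of POS tagging and NER on the example sentence is given below; the tagger and chunker data (averaged_perceptron_tagger, maxent_ne_chunker, words) are assumed to be downloaded.

# POS tagging and named entity recognition with NLTK.
import nltk
# One-time setup assumed:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
# nltk.download("maxent_ne_chunker"); nltk.download("words")

sentence = ("Steve Jobs introduced iPhone at the Macworld Conference "
            "in San Francisco, California.")

tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)    # e.g. ("Steve", "NNP"), ("introduced", "VBD"), ...
tree = nltk.ne_chunk(pos_tags)     # groups tokens into PERSON, ORGANIZATION, GPE, ...

print(pos_tags)
print(tree)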
Corpus
In linguistics and NLP, corpus (literally Latin for "body") refers to a collection of texts. Such collections may be formed of texts in a single language or may span multiple languages; there are numerous reasons for which multilingual corpora (the plural of corpus) may be useful. Corpora may also consist of themed texts (historical, Biblical, etc.). Corpora are generally used for statistical linguistic analysis and hypothesis testing.
Phases of NLP
1. Lexical Analysis
The first phase of NLP is lexical analysis. This phase scans the input text as a stream of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words.
2. Syntactic Analysis (Parsing)
Syntactic Analysis is used to check grammar, word arrangements, and shows the relationship among the words.
For example, the sentence "Agra goes to the Poonam" does not make any sense, so it is rejected by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal meaning of words,
phrases, and sentences.
4. Discourse Integration
Discourse integration depends upon the sentences that precede it and also invokes the meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you discover the intended effect by applying a set of rules that characterize cooperative dialogues.
Machine Translation
Machine Translation (MT) is the automated process computers use to translate text from one natural language to another (e.g., Google Translate).
Human-translated text is based on linguistics and grammatical understanding of a language pair and is
usually quite accurate.
Machine translation is the process of using artificial intelligence to automatically translate text from one language to another without human involvement. Modern machine translation goes beyond simple word-to-word translation to communicate the full meaning of the original text in the target language.
The software parses text and creates a transitional representation from which the text in the target language is
generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and
large sets of rules. The software uses these complex rule sets and then transfers the grammatical structure of the
source language into the target language.
Pattern recognition
A pattern is an entity, vaguely defined, that could be given a name, e.g., a fingerprint image, a handwritten word, a human face, a speech signal, or a DNA sequence.
Activities for designing the Pattern Recognition Systems
Data collection: collecting training and testing data. How can we know when we have an adequately large and representative set of samples?
Feature selection: domain dependence and prior information; computational cost and feasibility; discriminative features (similar values for similar patterns, different values for different patterns); features invariant with respect to translation, rotation, and scale; features robust with respect to occlusion, distortion, deformation, and variations in the environment.
Model selection: Domain dependence and prior information
Definition of design criteria
Parametric vs. non-parametric models
Handling of missing features
Computational complexity
Types of models: templates, decision-theoretic or statistical, syntactic or structural, neural, and hybrid.
Training: How can we learn the rule from data?
Supervised learning: a teacher provides a category label or cost for each pattern in the training set.
Unsupervised learning: the system forms clusters or natural groupings of the input patterns.
Reinforcement learning: no desired category is given, but the teacher provides feedback to the system, such as whether the decision is right or wrong.
Evaluation: How can we estimate the performance with training samples? How can we predict the performance with future
data? Problems of overfitting and generalization
Characteristic (feature): (a) the distinction between good and poor features, and (b) feature properties.
A classifier is used to partition the feature space into class-labeled decision regions, while decision boundaries are the borders between decision regions.
Components in a Pattern Recognition System:
A pattern recognition system can be partitioned into components. There are five typical components in various pattern recognition systems. These are as follows:
A Sensor : A sensor is a device used to measure a property, such as pressure, position, temperature, or
acceleration, and respond with feedback.
A Preprocessing Mechanism : Segmentation is used; it is the process of partitioning data into multiple segments. It can also be defined as the technique of dividing or partitioning data into parts called segments.
A Feature Extraction Mechanism : feature extraction starts from an initial set of measured data and builds
derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and
generalization steps, and in some cases leading to better human interpretations. It can be manual or automated.
A Description Algorithm : Pattern recognition algorithms generally aim to provide a reasonable answer for all
possible inputs and to perform “most likely” matching of the inputs, taking into account their statistical variation
A Training Set : Training data is a certain percentage of an overall dataset along with testing set. As a rule, the
better the training data, the better the algorithm or classifier performs.
Design Principles of Pattern Recognition
In a pattern recognition system, two basic approaches are used for recognizing patterns or structures, and each can be implemented with different techniques. These are:
1. Statistical Approach and
2. Structural Approach
Statistical Approach:
Statistical methods are mathematical formulas, models, and techniques that are used in the statistical
analysis of raw research data. The application of statistical methods extracts information from research data
and provides different ways to assess the robustness of research outputs.
Two main statistical methods are used :
1. Descriptive Statistics: It summarizes data from a sample using indexes such as the mean or standard
deviation.
2. Inferential Statistics: It draws conclusions from data that are subject to random variation.
Structural Approach:
The Structural Approach is a technique wherein the learner masters the pattern of sentence. Structures are
the different arrangements of words in one accepted style or the other.
Types of structures:
Sentence Patterns
Phrase Patterns
Formulas
Idioms
Speech recognition
Speech recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, is a capability that enables a program to identify words spoken aloud and convert them into readable text.
Speech recognition is an AI-enhanced technology converting human speech from an analog form to digital form.
Advanced computer programs then use the digital speech for further processing.
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops
methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also
known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT).
ASR (Automated speech recognition) is a technology that allows users to enter data into information systems by
speaking rather than punching numbers into a keypad. ASR is primarily used for providing information and
forwarding phone calls.
Automatic speech recognition is the process by which a computer maps an acoustic speech signal to text. Automatic
speech understanding is the process by which a computer maps an acoustic speech signal to some form of abstract
meaning of the speech
Fig: System architecture of an automatic speech recognition system
The main focus of the feature extractor is to keep the relevant information and discard the irrelevant. The feature extractor divides the acoustic signal into frames of 10-25 ms. The data in each frame is multiplied by a window function; many types of window functions can be used, such as Hamming, rectangular, Blackman, Welch, or Gaussian.
Common feature extraction methods include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), wavelets, and Independent Component Analysis (ICA).
PCA and LDA are two popular dimensionality reduction methods commonly used on data with too many input features. In many ways the two algorithms are similar, but at the same time they are very dissimilar. Feature extraction can be divided into feature discovery and the extraction of features.
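To illustrate the framing-and-windowing step, here is a small NumPy sketch; the 16 kHz sample rate and the 25 ms frame / 10 ms hop sizes are typical values chosen for this example, not prescribed by the text.

# Framing an audio signal and applying a Hamming window.
import numpy as np

sample_rate = 16000                       # samples per second (illustrative)
signal = np.random.randn(sample_rate)     # stand-in for 1 second of audio
frame_len = int(0.025 * sample_rate)      # 25 ms frame
hop = int(0.010 * sample_rate)            # 10 ms step between frames

window = np.hamming(frame_len)            # Hamming window function
frames = [
    signal[start:start + frame_len] * window
    for start in range(0, len(signal) - frame_len + 1, hop)
]
print(len(frames), "windowed frames of", frame_len, "samples each")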
Why do we use PCA?
Practically PCA is used for two reasons:
Dimensionality Reduction: Information spread over many columns is converted into principal components (PCs) such that the first few PCs can explain a substantial chunk of the total information (variance). In machine learning models, these PCs can be used as explanatory variables.
Visualize classes: It is difficult for data with more than three dimensions (features) to visualize the separation of
classes (or clusters). With the first two PCs alone, a simple distinction can generally be observed.
Linear Discriminant Analysis (LDA)
LDA is a supervised machine learning technique used to distinguish two (or more) classes/groups. The critical principle of linear discriminant analysis (LDA) is to optimize the separability between the classes so that they can be identified as well as possible. LDA is similar to PCA in that it helps reduce dimensionality; however, by constructing a new linear axis and projecting the data points onto that axis, it optimizes the separability between the established categories.
LDA does not focus on finding the directions of maximum variance; instead, it looks for the features or subspace that offer the most discrimination between the classes.
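A brief scikit-learn sketch of the contrast is shown below, using the bundled Iris dataset as an illustrative stand-in; PCA ignores the class labels while LDA uses them.

# PCA (unsupervised) versus LDA (supervised) dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)   # 4 features, 3 classes

# PCA ignores y and maximizes variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses y and maximizes between-class separation
# (at most n_classes - 1 components, here 2).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)     # (150, 2) (150, 2)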
How are LDA models represented?
The representation of LDA is straightforward. The model consists of the statistical properties of your data, estimated for each class. In the case of multiple input variables, the same properties are computed over a multivariate Gaussian, namely the vector of means and the covariance matrix. Predictions are made by plugging these statistical properties into the LDA equation. The properties are estimated from your data, and finally the model values are stored as a file to construct the LDA model.
LDA Real-Life Applications
Some of the practical LDA applications are described below:
Face Recognition - In face recognition, LDA is used to reduce the number of attributes to a more manageable number before the actual classification. The dimensions created are linear combinations of pixels that form a template; these are called Fisher's faces.
Medical - LDA may be used to classify the illness of a patient as mild, moderate, or severe. The classification is carried out using the patient's various parameters and his medical trajectory.
Customer Identification - By conducting a simple question-and-answer survey, you can obtain customers' characteristics. LDA helps to recognize and select the attributes of a group of customers most likely to purchase a specific item in a shopping mall.
Comparison of PCA and LDA:
Type: PCA is unsupervised; LDA is supervised.
Goal: PCA aims at faster training and visualization; LDA is good for classification.
Dimension of the new data: with PCA, less than or equal to the number of original features; with LDA, less than or equal to the number of classes minus 1.
Method: PCA maximizes the variance; LDA maximizes between-class variance and minimizes within-class variance.
Acoustic Model
1. An analog-to-digital converter measures the size of the current – which approximates the amplitude of the sound wave – at discrete intervals called the sampling rate.
3. A phoneme is the smallest unit of sound that has a distinct meaning to speakers of a
particular language.
For example the “t” in “stick” sounds similar enough to the “t” in “tick” that speakers
of English consider them the same phoneme.
model.
Figure: Translating the acoustic signal into a sequence of frames. In this diagram each frame is described by the discretized values of three acoustic features; a real system would have dozens of features.
What does speaker dependent / adaptive / independent mean?
A speaker dependent system is developed to operate for a single speaker. These systems are
usually easier to develop, cheaper to buy and more accurate, but not as flexible as speaker
adaptive or speaker independent systems.
A speaker independent system is developed to operate for any speaker of a particular type (e.g.
American English). These systems are the most difficult to develop, most expensive and
accuracy is lower than speaker dependent systems. However, they are more flexible.
A speaker adaptive system is developed to adapt its operation to the characteristics of new speakers. Its difficulty lies somewhere between speaker independent and speaker dependent systems.
What does small/medium/large/very-large vocabulary mean?
The size of vocabulary of a speech recognition system affects the complexity, processing
requirements and the accuracy of the system. Some applications only require a few words (e.g.
numbers only), others require very large dictionaries (e.g. dictation machines). There are no firmly established definitions of these vocabulary-size categories, however.
For computer systems, which are speaker-independent, phonemes are extracted from the audio
provided, converted into ASCII characters, and then formulated into words to allow applications
using speech recognition to act upon the input. There are mathematical formulas and models
used to identify the most likely word spoken. These models match spoken words against known word models and select the one that has the greatest likelihood of being the correct word. In order to identify the "greatest likelihood", large amounts of training data are used to create the models.
This type of statistical model is known as the Hidden Markov Model (HMM).
A speaker-dependent system is developed to operate for a single speaker. These systems are usually easier to develop, cheaper to buy, and more accurate. The system is trained to understand one user's pronunciations, inflections, and accents, and can run much more efficiently and accurately. It requires the user to participate in training sessions that "teach" the computer to recognize the user's voice. The computer then builds a voice profile that matches the required training.
The speaker-dependent side of the technology, where the system is "trained" through repetition to recognize a certain vocabulary of words and accept no substitute, is fairly well established. This technology is based generally on template, or acoustical, representations of speech.