Unit 3
Basics of Neural networks
A neural network is a method in artificial intelligence that teaches computers to process data in a
way that is inspired by the human brain. It is a type of machine learning process, called deep
learning, that uses interconnected nodes or neurons in a layered structure that resembles the
human brain.
1. Neurons:
Concept: Neurons are the basic building blocks of neural networks. They receive inputs,
perform a computation, and produce an output.
Example: In image recognition, each input neuron might correspond to a pixel. The first layer
receives raw pixel values, and subsequent layers extract increasingly complex features.
2. Input Layer:
Concept: The input layer is where the neural network receives its input data.
Example: In a spam email classifier, the input layer would consist of neurons
representing features like email content, sender information, etc.
3. Hidden Layers:
Concept: Between the input and output layers, neural networks have one or more hidden
layers where complex representations are learned.
Example: In a financial fraud detection system, hidden layers might learn patterns
indicating potential fraudulent transactions.
4. Weights — These values explain the strength (degree of importance) of the connection
between any two neurons.
Bias — a constant value added to the sum of the products of the input values and their respective
weights. It is used to accelerate or delay the activation of a given node.
5. Activation function — a function used to introduce non-linearity into the network. This
property allows the network to learn more complex patterns.
6. Output Layer:
Concept: The output layer produces the final result of the neural network's computation.
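To see how the pieces above (weights, bias, activation function, and layers) fit together, here is a minimal sketch of a forward pass in Python. The input values and layer sizes are made up for illustration and are not taken from the text.

```python
import numpy as np

def relu(z):
    # A common activation function: zeroes out negative values to introduce non-linearity
    return np.maximum(0, z)

# Input layer: 3 made-up feature values (e.g. simple email features)
x = np.array([0.5, -1.2, 3.0])

# Hidden layer with 4 neurons: each neuron has its own weights and bias
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # connection strengths between input and hidden neurons
b_hidden = np.zeros(4)               # bias added to each weighted sum
h = relu(W_hidden @ x + b_hidden)    # weighted sum + bias, then activation

# Output layer with 1 neuron: produces the final result of the computation
W_out = rng.normal(size=(1, 4))
b_out = np.zeros(1)
y = W_out @ h + b_out
print(y)
```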
History and timeline of neural networks
The history of neural networks spans several decades and has seen considerable advancements.
The following examines the important milestones and developments in the history of neural
networks:
1940s. In 1943, mathematicians Warren McCulloch and Walter Pitts built a circuitry
system that ran simple algorithms and was intended to approximate the functioning of the
human brain.
1950s. In 1958, Frank Rosenblatt, an American psychologist who's also considered the father
of deep learning, created the perceptron, a form of artificial neural network capable of
learning and making judgments by modifying its weights. The perceptron featured a single
layer of computing units and could handle problems that were linearly separable.
1970s. Paul Werbos, an American scientist, developed the backpropagation method, which
facilitated the training of multilayer neural networks. It made deep learning possible by
enabling weights to be adjusted across the network based on the error calculated at the output
layer.
1980s. Cognitive psychologist and computer scientist Geoffrey Hinton, along with computer
scientist Yann LeCun, and a group of fellow researchers began investigating the concept of
connectionism, which emphasizes the idea that cognitive processes emerge through
interconnected networks of simple processing units. This period paved the way for modern
neural networks and deep learning.
1990s. Jürgen Schmidhuber and Sepp Hochreiter, both computer scientists from Germany,
proposed the Long Short-Term Memory recurrent neural network framework in 1997.
2000s. Geoffrey Hinton and his colleagues pioneered restricted Boltzmann machines (RBMs), a
type of generative artificial neural network that enables unsupervised learning. RBMs opened the path for deep belief
networks and deep learning algorithms.
An artificial neuron takes input values (there can be several) with weights assigned to them. Inside
the node, the weighted inputs are summed up, and an activation function is applied to get the
result. The output of the node is passed on to the other nodes or, in the case of the last layer of
the network, the output is the overall output of the network.
A single neuron, like the one shown above, performs the following mathematical operation:

y = g(w1·x1 + w2·x2 + … + wn·xn + b) = g(w · x + b)    (Equation 1)

In the equation, four things are happening: each input is multiplied by its respective weight, the
weighted inputs are added together, the bias is added to the result, and then an activation
function g is applied, so that the output of the neuron is g(w · x + b).
For an n-dimensional input, the first layer (also called the input layer) will have n nodes, and the
t-dimensional final/output layer will have t neural units.
Figure 2: A neural network with 3 input features, two hidden layers with 4 nodes each, and a
one-value output.
Simplified Example
Let’s take a simple example of how a single neuron works. In this example, we assume 3 input
values and a bias of 0.
In this example, we will consider a commonly used activation function called sigmoid, which is
defined as

f(x) = 1 / (1 + e^(-x)),    with derivative f'(x) = f(x)(1 - f(x)).

The sigmoid f(x) pushes any real value x into the range (0, 1). At this moment, don't mind too
much about the derivative.
This is a plot of the sigmoid function. Notice that for values of x less than -5 or greater than 5,
f(x) approaches 0 and 1, respectively.
As said before, four things are happening inside the neuron. First, the input values are weighted
by multiplying them with the corresponding weights.
If the given neuron is in the hidden layer, then this output becomes the input of the next
neuron(s). On the other hand, if this value is the output of the last layer, then it can be interpreted
as the final prediction of the model.
Important Note: To simplify the mathematical operation done with the neuron, we can use a
more compact matrix form of the first two operations. In this case, a dot product operation
between the vector of input values and the weights vector will come in handy.
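As a concrete illustration of this note (a sketch with made-up weights; the input values follow the three-input, zero-bias setup above), the dot product collapses the weighting and summing steps into one operation before the sigmoid is applied:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Three input values and their weights (made-up numbers), with a bias of 0 as in the example
x = np.array([2.0, 3.0, 1.0])
w = np.array([0.5, -0.2, 0.1])
b = 0.0

z = np.dot(w, x) + b        # compact form: weighted sum of inputs plus bias
output = sigmoid(z)         # apply the activation function, giving g(w·x + b)
print(output)               # a value between 0 and 1
```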
The nervous system in the biological brain consists of two categories of cells: neurons and glial
cells. Glial cells provide supportive functions to the nervous system. Specifically, the cells are
tasked to maintain homeostasis, form a myelin sheath that insulates the nerves, and participate in
signal transmission.
The dendrites are the projections that act as the input to the neuron. They receive electro-
chemical information from other neurons and propagate it to the cell body.
On the other hand, the axon is a long elongation of the neuron that transports information
from the cell body into the other neurons, glands, and muscles. Axon connects to the cell
body in a conical projection called the axon hillock. The hillock is responsible for
summing the inhibitory and excitatory signals, and if the sum exceeds some threshold, the
neuron fires a signal (called an action potential). Two neurons connect at the synapses.
The synapses are located at the axon terminal of the first neuron and the dendrites of the
second neuron.
Biological (left) and artificial neuron (right).
Artificial neuron
An artificial neuron (also called a unit or a node) mimics the biological neuron in structure
and function.
The artificial neuron takes several input values (synonymous to the dendrites in the
biological neuron) with weights assigned to them (analogous to the role of synapses).
Inside the node, the weighted inputs are summed up, and an activation function is applied
to get the results. This operation matches the role of the cell body and the axon hillock in
the biological neuron.
The output of the node is passed on to the other units — an operation that mimics the
process of electro-chemical information being passed on from one neuron to another or to
other parts of the nervous system.
If the solution requires an application that utilizes the network, then this flow is in
addition to the standard software development cycle.
The first step, Data Sourcing, refers to the collection and “normalization” of data to be
fed into neural networks. The process for this step differs based on data readiness, but in
general involves accessing where the data is stored and converting the data to be in the
same format universally.
Next is the process of Data Labeling, which can be time consuming for certain neural
networks. For example, a network designed to categorize its input will need to have
initial data that has already been categorized manually.
Data Versioning is as it sounds: each data set needs to be properly annotated so that
developers can reference which sets produced the best results.
The next section of steps involves the cycle of creating the actual neural network. The network,
modeled in the image above, goes through three main stages:
Model Architecture: The first stage is Model Architecture. This is where the developer
decides, based on the purpose and the input data, exactly what type of network to create and
what layers the model will consist of.
Model Training: After the model is defined it will need to be trained. Model Training is
the stage where the model will be exposed to most of the data that was labeled in the
most recent set.
The process consists of batches of data points being passed through the model; the output
for each data point is then compared to the label designated during the Data Labeling stage.
Depending on how close the outputs are to their respective labels, the model’s weights
will be updated via one of numerous methods to attempt to close the difference between
output and label. This stage ends when either the model has passed over the entire data
set a specified number of times or when each pass yields no additional benefit.
Model Evaluation: After the training stage, the model enters the Model Evaluation
stage. During this stage the model will pass over the data points not used during training.
This method of splitting the data ensures that the model never explicitly “sees” the data it
is being evaluated with. The only way the model could perform well on this evaluation
set would be if it truly identified the correct patterns within the training data.
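To make the model training and evaluation stages concrete, here is a minimal sketch using scikit-learn. The dataset, network architecture, and parameter values are illustrative assumptions, not details from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Made-up labeled data: 200 samples, 10 features, and a binary label
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out data the model never "sees" during training (used in the Model Evaluation stage)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model Architecture + Model Training: two hidden layers, trained for a capped number of passes
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# Model Evaluation: measure performance on the unseen test split
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```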
Finally, after multiple iterations of the model development sub-cycle, the development team will
have a functioning model that is ready to make predictions. Before the model is utilized in a
production environment, it must be versioned.
Model Versioning is essentially documenting the specifics regarding when the model
was trained and on what data. This step is vital for ensuring model quality in the future. If
a new model is trained, it is important to be able to compare its results with previous
iterations as well as allowing the developers to assess why one model performs better
than the other.
Model Deployment steps differ based on use case. For example, if the network is a
stand-alone entity, this step is mainly just hosting the model somewhere in the cloud or as
a runnable script. But, if the model is to be used within custom software, this is where the
neural network development cycle would return to the software development cycle, most
likely within the “integration” phase.
Sensitivity analysis
Sensitivity analysis involves examining how changes in input variables affect the output of the
neural network. By systematically varying the inputs and observing the corresponding changes in
output, analysts can infer which inputs have the most significant impact on the model's
predictions. This helps in understanding the model's behavior and identifying which features it
relies on most heavily for decision-making.
In the context of illuminating the black box of an ANN, sensitivity analysis can provide valuable
insights into the features or variables that drive the network's decisions, helping users understand
its inner workings and potentially uncovering areas for improvement or refinement.
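A minimal sketch of this idea follows; the model, data, and perturbation size are stand-ins chosen for illustration. Each input feature is varied slightly while the others are held fixed, and the average change in the model's output indicates how sensitive the network is to that feature.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Made-up data: the target depends strongly on feature 0 and only weakly on feature 2
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 0.2 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)

# Sensitivity analysis: perturb one input at a time and observe the change in the output
baseline = model.predict(X)
for feature in range(X.shape[1]):
    X_perturbed = X.copy()
    X_perturbed[:, feature] += 0.1                     # small systematic change to one input
    change = np.mean(np.abs(model.predict(X_perturbed) - baseline))
    print(f"feature {feature}: mean |output change| = {change:.4f}")
```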
Support vector machines (SVMs)
Basic Concept: SVMs operate by identifying the hyperplane that maximizes the margin
between classes in the feature space. The feature space is the multidimensional space
where each data point is represented by its features or attributes.
Linear Separability: SVMs are initially designed for linearly separable data. In a binary
classification problem, SVM finds the hyperplane that separates the classes with the
widest possible margin. This hyperplane is determined by support vectors, which are the
data points closest to the decision boundary.
Kernel Trick: SVMs can be extended to handle non-linearly separable data using a
technique called the kernel trick. This involves mapping the input space into a higher-
dimensional feature space where the data might be linearly separable. Common kernels
include linear, polynomial, radial basis function (RBF), and sigmoid.
Optimization: The goal of SVMs is to find the hyperplane that maximizes the margin
while minimizing classification errors. This is achieved by solving an optimization
problem, typically a quadratic programming problem, where the objective is to minimize
the classification error and maximize the margin.
Classification and Regression: While SVMs are primarily used for classification tasks,
they can also be adapted for regression tasks (Support Vector Regression). In regression,
SVMs aim to fit a hyperplane that predicts continuous output values with minimal error.
Overall, SVMs are powerful and versatile machine learning algorithms that are widely
used in various applications such as text classification, image recognition, bioinformatics,
and financial forecasting. They are particularly effective when dealing with high-
dimensional data and datasets with a clear margin of separation between classes.
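As an illustrative sketch of these ideas (the dataset and parameter values are made up, not from the text), scikit-learn's SVC fits a maximum-margin classifier, and the RBF kernel lets it handle data that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Made-up, non-linearly separable 2-D data (two interleaving half-moons)
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# RBF kernel: the "kernel trick" implicitly maps points into a higher-dimensional space
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# The support vectors are the training points closest to the decision boundary
print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))
```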
Problem Definition: Clearly define the problem you want to solve using SVMs. This
could be a classification or regression task. Identify the nature of the data (e.g., structured
or unstructured) and the desired outcome.
Data Collection and Preprocessing: Gather the data relevant to your problem. This may
involve collecting data from various sources, cleaning, and preprocessing it to ensure it's
suitable for use with SVMs. Preprocessing steps may include feature scaling,
normalization, handling missing values, and encoding categorical variables.
Feature Selection and Engineering: Identify relevant features that may help the SVM
model make accurate predictions. This may involve feature selection techniques to
choose the most informative features or feature engineering to create new features based
on domain knowledge.
Model Selection: Choose an appropriate SVM variant and kernel function based on the
problem at hand and the characteristics of the data. Consider factors such as linearity,
separability, and the presence of noise in the data. Common choices include linear SVMs,
polynomial SVMs.
Model Training: Train the SVM model on a labeled dataset using the chosen kernel
function and parameters. During training, the SVM algorithm learns to find the optimal
hyperplane that separates the different classes or predicts the target variable.
Model Evaluation: Evaluate the performance of the trained SVM model using appropriate
evaluation metrics such as accuracy, precision, recall, F1-score, or mean squared error
(for regression). Use techniques like cross-validation to ensure the model's
generalizability and robustness.
Hyperparameter Tuning: Fine-tune the hyperparameters of the SVM model to improve its
performance further. This may involve grid search, random search, or more advanced
optimization techniques to find the optimal combination of hyperparameters.
Deployment and Monitoring: Once satisfied with the performance of the SVM model,
deploy it into production and monitor its performance over time. Continuously collect
feedback data and retrain or update the model as needed to adapt to changing conditions
or new patterns in the data.
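A minimal sketch tying several of these steps together (preprocessing, model training, hyperparameter tuning, and evaluation); the dataset and parameter grid are illustrative assumptions rather than recommendations from the text:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Data collection: a built-in labeled dataset stands in for real project data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Preprocessing + model selection: feature scaling followed by an SVM classifier
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Hyperparameter tuning: grid search with cross-validation over kernel and C
param_grid = {"svm__kernel": ["linear", "rbf"], "svm__C": [0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Model evaluation on held-out data
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```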
Nearest neighbor method for sentiment analysis
Basic Concept: The nearest neighbor method operates on the principle that similar
instances tend to have similar labels. In sentiment analysis, this means that
texts/documents with similar content tend to have similar sentiments.
Training Phase: During the training phase, the method doesn't actually build a model in
the traditional sense. Instead, it memorizes the labeled instances in the training
dataset. Each instance consists of a text/document and its corresponding sentiment label
(e.g., positive, negative, neutral).
Prediction Phase: When given a new, unlabeled text/document to classify sentiment, the
nearest neighbor method identifies the labeled instances (neighbors) from the training
dataset that are most similar to the input text/document.
Similarity Metric: The similarity between the input text/document and each labeled
instance is typically computed using a distance or similarity metric, such as cosine
similarity, Euclidean distance, or Jaccard similarity, depending on the nature of the data
and the text representation used.
Voting Scheme: Once the nearest neighbors are identified, the sentiment label of the
input text/document is determined using a voting scheme. For example, a simple
approach is to assign the sentiment label that occurs most frequently among the nearest
neighbors.
Parameter Tuning: The performance of the nearest neighbor method can be influenced by
various factors, including the choice of similarity metric and the number of nearest
neighbors considered. These parameters may need to be tuned to optimize performance
on a given dataset.
Limitations: While simple and intuitive, the nearest neighbor method for sentiment
analysis has some limitations. It can be computationally expensive, especially when
dealing with large datasets. Additionally, it may not perform well when the feature space
is high-dimensional or when the data contains noise or irrelevant features.
In summary, the nearest neighbor method for sentiment analysis offers a simple yet
effective approach for classifying sentiment based on the similarity of input
texts/documents to labeled instances in a training dataset.
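A minimal sketch of this approach (the tiny labeled corpus and parameter choices are invented for illustration): texts are turned into TF-IDF vectors, the nearest neighbors are found using cosine similarity, and a majority vote among them decides the sentiment label.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Tiny made-up labeled training set (in practice this would be thousands of documents)
train_texts = [
    "I love this product, it works great",
    "absolutely fantastic experience, highly recommend",
    "terrible quality, very disappointed",
    "worst purchase ever, do not buy",
]
train_labels = ["positive", "positive", "negative", "negative"]

# TF-IDF representation + k-nearest-neighbor classifier using cosine similarity
model = make_pipeline(
    TfidfVectorizer(),
    KNeighborsClassifier(n_neighbors=3, metric="cosine"),  # majority vote among 3 neighbors
)
model.fit(train_texts, train_labels)

# Prediction phase: classify a new, unlabeled review by its nearest labeled neighbors
print(model.predict(["the quality is great, I recommend it"]))
```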
The goal of sentiment analysis is to determine the sentiment expressed in a piece of text, whether
it's positive, negative, or neutral.
Social Media Monitoring: Analyzing sentiment in social media posts, comments, and
reviews to understand public opinion about products, services, events, or brands.
Customer Feedback Analysis: Assessing sentiment in customer reviews, surveys, and
feedback to identify areas for improvement and measure customer satisfaction.
Market Research: Analyzing sentiment in market reports, news articles, and financial
data to gauge market sentiment and make informed investment decisions.
Brand Monitoring and Reputation Management: Monitoring sentiment around a brand or
organization to manage reputation, address customer concerns, and improve brand
perception.
Product Analysis and Recommendation Systems: Analyzing sentiment in product reviews
and user feedback to improve product features, recommend products, and personalize
user experiences.
Political Analysis: Analyzing sentiment in political speeches, news articles, and social
media discussions to understand public opinion, election outcomes, and political trends.
Data Collection: Gather text data from various sources such as social media, customer
reviews, surveys, news articles, or any other relevant sources.
Preprocessing: Clean and preprocess the text data by removing noise, irrelevant
information, special characters, punctuation, and stopwords. Perform tokenization,
stemming, and lemmatization to normalize the text.
Feature Extraction: Represent the text data as numerical features that can be used by
machine learning algorithms. Common techniques include bag-of-words, TF-IDF (Term
Frequency-Inverse Document Frequency), word embeddings (e.g., Word2Vec, GloVe),
or character n-grams.
Evaluation: Evaluate the performance of the sentiment analysis model using appropriate
evaluation metrics such as accuracy, precision, recall, F1-score, or confusion matrix. Use
techniques like cross-validation to ensure the model's generalizability and robustness.
Sentiment Trends Over Time: Analyze how sentiments change over time to identify
patterns, trends, or events that may influence sentiment.
Key Sentiment Drivers: Identify the key topics, themes, or features mentioned in positive
and negative sentiments to understand what aspects are driving sentiment.
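A minimal sketch of the preprocessing and feature-extraction steps described above; the sample documents and cleaning rules are illustrative assumptions:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up raw documents, e.g. scraped reviews containing noise and punctuation
raw_docs = [
    "Great phone!!! Battery lasts ALL day :)",
    "Awful service... never ordering again!!!",
]

def preprocess(text):
    # Lowercase, strip special characters and punctuation, and collapse whitespace
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = [preprocess(d) for d in raw_docs]

# Feature extraction: TF-IDF vectors with English stopwords removed
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(cleaned)

print(vectorizer.get_feature_names_out())  # vocabulary learned from the corpus
print(features.shape)                      # (documents, features) matrix ready for a classifier
```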
Speech analytics
Speech analytics is the process of analyzing spoken language to extract valuable insights,
patterns, and information. It involves the use of various technologies and techniques to
analyze recorded speech data, typically in the form of audio recordings or transcribed text.
Data Collection: Gather speech data from various sources such as call recordings,
voicemails, interviews, focus groups, or speech-to-text transcripts.
Speech Recognition: Convert spoken language into text using automatic speech
recognition (ASR) technology. This step is essential for processing and analyzing the
speech data.
Transcription and Text Processing: Clean and preprocess the transcribed text data by
removing noise, filler words, and irrelevant information. Perform text normalization,
tokenization, and part-of-speech tagging as needed.
Feature Extraction: Extract relevant features from the text data, such as sentiment,
emotion, keywords, topics, speaker characteristics, speech rate, or intonation patterns.
Analysis and Modeling: Apply analytical techniques and machine learning algorithms to
analyze the extracted features and derive actionable insights from the speech data. This
may involve sentiment analysis, topic modeling, clustering, classification, or other NLP
tasks.
Visualization and Reporting: Visualize the results of the analysis using charts, graphs,
dashboards, or reports to communicate key findings and insights effectively to
stakeholders.
Data Privacy and Security: Speech data may contain sensitive information, so it's
essential to ensure compliance with data privacy regulations and implement robust
security measures to protect confidentiality.
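A minimal sketch of the transcription-processing and feature-extraction steps (the transcript, filler-word list, and sentiment lexicon below are invented for illustration; a real pipeline would start from ASR output):

```python
# Made-up ASR output for one customer call, together with the call duration in seconds
transcript = "um I am really unhappy with the billing you know it was wrong twice"
duration_seconds = 12.0

# Transcription and text processing: remove filler words and tokenize
filler_words = {"um", "uh", "like", "you", "know"}
tokens = [w for w in transcript.lower().split() if w not in filler_words]

# Feature extraction: speech rate and a simple lexicon-based sentiment score
speech_rate_wpm = len(transcript.split()) / (duration_seconds / 60.0)
negative_lexicon = {"unhappy", "wrong", "angry", "terrible"}
positive_lexicon = {"happy", "great", "thanks", "helpful"}
sentiment_score = sum(w in positive_lexicon for w in tokens) - sum(w in negative_lexicon for w in tokens)

print("tokens:", tokens)
print("speech rate (words/min):", round(speech_rate_wpm, 1))
print("sentiment score:", sentiment_score)  # a negative value suggests a negative call
```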
In summary, speech analytics is a valuable tool for extracting insights from spoken
language data across various domains and applications. By leveraging advanced
technologies and analytical techniques, organizations can gain valuable insights into
customer behavior, employee sentiments, market trends, and business performance,
leading to improved decision-making and outcomes.
Customer Service Optimization: Analyzing customer interactions with call center agents
to identify areas for improvement, measure customer satisfaction, and enhance the quality
of service.
Quality Assurance: Assessing the quality and effectiveness of training programs, scripts,
and call handling procedures by evaluating interactions between agents and customers.