AUTOMATIC TICKET ASSIGNMENT
AIML Online Capstone Group 6
05-Jul-20
Summary of problem statement, data and findings
Problem Statement
Manual assignment of incidents is time consuming and requires significant human effort. It is prone to human error, and resources are used ineffectively when tickets are misrouted. Manual assignment also increases response and resolution times, which leads to poor customer service and deteriorating user satisfaction.
Abstract
The objective is to build a model that can classify tickets and assign them to the right owner in a timely manner, in order to save effort, increase customer satisfaction, and reduce response and resolution times.
Solution Approach
1. Classical ML models (SVC, LightGBM)
2. LSTM
3. Bi-LSTM
4. GRU
5. Attention Layer
https://github.com/jayapavandeshpande/nlp_project/tree/NLP-Soumendra
Total Records: 8500
Data Fields: Short description, Description, Assignment group
Sample data:
1. High imbalance is seen in the target column of our dataset, with GRP_0 accounting for around 3976 of the 8500 records.
2. Missing (NaN) values per field:
a. Short description: 8
b. Description: 1
c. Assignment group: 0
4. Password reset is one of the most frequently occurring ticket types, which is reflected in the Short description column.
5. The most frequently occurring Description in the dataset is just the text 'the', which carries no meaning; for such rows, the Short description was examined instead.
Load Dataset
Loaded the input CSV file into a pandas DataFrame.
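A minimal sketch of this step (the file name input_data.csv is an assumption; the columns follow the data fields described above):

```python
import pandas as pd

# Load the input CSV into a pandas DataFrame (file name is an assumption).
df = pd.read_csv("input_data.csv")

print(df.shape)           # expected: (8500, 3)
print(df.isnull().sum())  # missing values per field
```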
EDA on dataset
1. Distribution of data
2. Identified the languages
3. Most common words
4. Top-n words
5. Bigrams
6. Trigrams (a sketch of the n-gram extraction is shown after this list)
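A minimal sketch of the top-n word / bigram / trigram counts using scikit-learn's CountVectorizer (the Description column name is taken from the data fields above; the exact EDA code used may differ):

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(texts, ngram_range=(1, 1), n=20):
    # Count n-grams across all documents and return the n most frequent ones.
    vec = CountVectorizer(ngram_range=ngram_range, stop_words="english")
    counts = vec.fit_transform(texts)
    totals = counts.sum(axis=0).A1
    return sorted(zip(vec.get_feature_names_out(), totals),
                  key=lambda x: x[1], reverse=True)[:n]

texts = df["Description"].astype(str)
print(top_ngrams(texts, (1, 1)))   # most common words
print(top_ngrams(texts, (2, 2)))   # bigrams
print(top_ngrams(texts, (3, 3)))   # trigrams
```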
Pre-Processing
1. Treating the missing values using the RAKE algorithm: RAKE extracts key phrases from a body of text by analysing the frequency of word appearance and its co-occurrence with other words in the text. We find similar rows based upon identical RAKE key phrases and use them to fill the missing NaN values.
2. Replace email IDs – email IDs of users are replaced with the common text ‘Email Address’ in Description and Short Description, as the actual address does not hold any significance.
3. Expanded all contractions; this is an important step because it reduces ambiguity between similar phrases.
4. Grouped together the different inflected forms of a word using NLTK Lemmatization
so they can be analysed as a single item
5. Used the Regex library for removal of
a. Trailing spaces,
b. Line breaks and tabs (\r\n\t),
c. Special characters and
d. Extra spaces.
6. Converted data containing garbled text such as mojibake using the Ftfy library. A sketch of these cleaning steps is shown below.
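A minimal sketch of these cleaning steps, assuming the contractions, ftfy and nltk packages are installed and the NLTK punkt and wordnet resources are downloaded; the replacement text and regex patterns are illustrative:

```python
import re
import ftfy            # fixes mojibake / garbled text
import contractions    # expands contractions, e.g. "can't" -> "cannot"
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = ftfy.fix_text(str(text))                    # fix garbled/mojibake text
    text = re.sub(r"\S+@\S+", "email address", text)   # replace email IDs with common text
    text = contractions.fix(text)                      # expand contractions
    text = re.sub(r"[\r\n\t]", " ", text)              # remove line breaks and tabs
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)         # remove special characters
    text = re.sub(r"\s+", " ", text).strip()           # remove extra/trailing spaces
    # Lemmatize so inflected forms of a word are analysed as a single item.
    return " ".join(lemmatizer.lemmatize(w) for w in word_tokenize(text.lower()))

df["clean_description"] = df["Description"].apply(clean_text)
```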
Deterministic Rules
Deterministic rules are a fixed set of rules/conditions that provide an accurate match on the given dataset.
The dataset provided has 74 groups and the data distribution is not balanced. Because of this imbalance, the likelihood of predicting the correct group with an ML model may vary, and the model may overfit.
To mitigate this problem, we selected the classes with fewer than 20 samples and tried to identify similarities that would let us recognise each of these classes accurately.
We queried the data for similarities within each class by loading it into Excel and SQL. We were able to identify rules that predict 12 classes completely and 4 classes partially.
The deterministic rules are applied on the translated dataset, and the predicted samples are assigned a 'predicted group' in a new column of the dataset (an illustrative rule is sketched below).
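An illustrative example of what such a rule could look like in pandas; the keyword and the target group below are assumptions, not the actual rules identified from the dataset:

```python
# Hypothetical rule: tickets whose Short description mentions "password reset"
# are assigned to an (assumed) group. The real rules were derived in Excel/SQL.
mask = df["Short description"].str.contains("password reset", case=False, na=False)
df.loc[mask, "predicted group"] = "GRP_X"   # 'GRP_X' is a placeholder group label
```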
When we evaluated model performance, we found that the dataset without the deterministic rules gave higher accuracy, so this step was excluded from the final project.
Text Summarisation
Summarization aims to highlight important information within a large corpus. We
have used Gensim summarization on our dataset
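A minimal sketch of Gensim's extractive summarization (note that the gensim.summarization module is available only in gensim versions before 4.0, and it requires input text containing several sentences):

```python
from gensim.summarization import summarize   # available in gensim < 4.0 only

# Illustrative: summarize one long ticket description, keeping ~20% of its sentences.
long_text = str(df["Description"].iloc[0])
print(summarize(long_text, ratio=0.2))
```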
Up sampling
Imbalanced datasets are those where there is a severe skew in the class distribution,
such as 1:100 or 1:1000 examples in the minority class to the majority class.
This bias in the dataset can influence many machine learning algorithms, leading some to
ignore the minority class entirely. This is a problem as it is typically the minority class on
which predictions are most important.
One approach to addressing the problem of class imbalance is to randomly resample the dataset. The two main approaches to randomly resampling an imbalanced dataset are
a. Deleting samples from the majority class, known as undersampling, and
b. Duplicating samples from the minority class, known as oversampling.
In our case, we are not considering undersampling as it would result in loss of data. Instead, we have used upsampling with the random oversampling technique.
We pass the cleaned dataset, excluding the majority class (GRP_0), to this oversampler. We were able to upsample the 4524 minority class records (input) to 48253 records (output), as sketched below.
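A minimal sketch using RandomOverSampler from imbalanced-learn (column names follow the data fields above; the sampler requires a 2-D feature input):

```python
from imblearn.over_sampling import RandomOverSampler

# Exclude the majority class GRP_0 before oversampling, as described above.
minority = df[df["Assignment group"] != "GRP_0"]
X = minority[["clean_description"]]          # 2-D input for the sampler
y = minority["Assignment group"]

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)
print(len(X), "->", len(X_res))              # e.g. 4524 -> 48253 in our run
```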
Clustering
During our analysis we noticed that GRP_0 has around 3976 records out of 8500 records
and have different category of issues mapped within the same group. In order to identify the
similar categories and map to the subgroups we explored various clustering techniques and
finalized K-means clustering which is an unsupervised clustering algorithm which determines
the optimal number of clusters using the elbow method. It works iteratively by selecting a
random coordinate of the cluster centre and assign the data points to a cluster. It then
calculates the Euclidean distance of each data point from its centroid and based on this, it
updates the datapoint positions as well as the cluster centres.
For finding the optimum number of centroids, the ‘elbow method’ is used: the SSE (sum of squared errors) is calculated for different values of k (the number of clusters) by clustering the dataset with each value of k. The point on the graph where a ‘hinge’ occurs is considered the optimal value of k; the figure below shows the elbow plot for the K-means algorithm. Looking at the graph, the total number of clusters is taken to be 8.
Since we do not get a clear and prominent elbow, we opted for this relatively high number of clusters, i.e. 8, because our objective is to divide GRP_0 into the maximum possible number of subgroups, so that the individual counts of the subgroups remain small.
Using the cluster numbers obtained from the elbow method, we use the k-means algorithm
to predict the labels. Below is the data clustered as per the labels.
We used PCA to project the data to two dimensions for plotting the clusters, marking the centre of each cluster with an “x”; a sketch of the clustering and plotting steps is shown below.
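A minimal sketch of the elbow method, K-means labelling and PCA plot for GRP_0 (TF-IDF features, the k range and max_features are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

grp0_text = df.loc[df["Assignment group"] == "GRP_0", "clean_description"]
X = TfidfVectorizer(max_features=2000).fit_transform(grp0_text)

# Elbow method: SSE (inertia) for a range of k values.
ks = range(2, 15)
sse = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_ for k in ks]
plt.plot(ks, sse, marker="o"); plt.xlabel("k"); plt.ylabel("SSE"); plt.show()

# Fit the chosen k (8) and plot the clusters in 2-D via PCA, marking centres with 'x'.
km = KMeans(n_clusters=8, random_state=42, n_init=10).fit(X)
pca = PCA(n_components=2).fit(X.toarray())
points = pca.transform(X.toarray())
centres = pca.transform(km.cluster_centers_)
plt.scatter(points[:, 0], points[:, 1], c=km.labels_, s=10)
plt.scatter(centres[:, 0], centres[:, 1], marker="x", c="black")
plt.show()
```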
Based on the identified labels, we categorized the GRP_0 data into subgroups as depicted below:
Below are the group labels identified using clustering:
Label Encoding
Labels in the target (dependent) column are encoded using the sklearn LabelEncoder, which encodes target labels with values between 0 and n_classes-1.
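A minimal sketch (the column name is taken from the data fields above):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df["target"] = le.fit_transform(df["Assignment group"])   # integer labels 0 .. n_classes-1
```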
Vectorization
Vectorization is used to score the relative importance of words. This can be done using TF-IDF, a Bag of Words (BoW) model, or the Keras Tokenizer.
TF-IDF
Term Frequency (TF) is the number of times a word appears in a document divided by the total
number of words in the document.
Inverse Document Frequency (IDF) is the log of the number of documents divided by the number of documents that contain the word w. IDF determines the weight of rare words across all documents in the corpus.
BoW
In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its
words, disregarding grammar and even word order but keeping multiplicity.
Tokenizer
This class allows a text corpus to be vectorized by turning each text into either a sequence of integers (each integer being the index of a token in a dictionary) or into a vector where the coefficient for each token can be binary, based on word count, or based on TF-IDF.
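A minimal sketch of the three vectorization options (max_features, num_words and maxlen are illustrative choices, not the exact values used):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = df["clean_description"].astype(str).tolist()

# TF-IDF and Bag-of-Words matrices for the classical ML models.
X_tfidf = TfidfVectorizer(max_features=5000).fit_transform(texts)
X_bow = CountVectorizer(max_features=5000).fit_transform(texts)

# Keras Tokenizer: padded integer sequences for the deep learning models.
tok = Tokenizer(num_words=5000, oov_token="<OOV>")
tok.fit_on_texts(texts)
X_seq = pad_sequences(tok.texts_to_sequences(texts), maxlen=100, padding="post")
```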
Splitting Datasets
We have used the train_test_split function to split the dataset into training and testing subsets in a 70:30 ratio.
The training subset is used for building the model; the testing subset is used for evaluating the model's performance on unseen data.
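A minimal sketch (the stratify argument is an assumption; the report only states the 70:30 ratio):

```python
from sklearn.model_selection import train_test_split

# 70:30 split; stratify keeps class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, df["target"], test_size=0.3, random_state=42, stratify=df["target"])
```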
SVC
Support Vector Machines (SVMs) are supervised learning models used here for classification. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
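A minimal sketch of training and scoring an SVC on the TF-IDF features (the kernel choice is an assumption):

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

svc = SVC(kernel="linear")        # kernel choice is an assumption
svc.fit(X_train, y_train)
pred = svc.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred, average="weighted"))
```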
Light GBM
LightGBM uses histogram-based algorithms, which speed up training and reduce memory usage. It constructs trees leaf-wise in a best-first order, which tends to achieve lower loss. A sketch is shown below.
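A minimal sketch using the LightGBM scikit-learn interface (hyperparameters are illustrative):

```python
import lightgbm as lgb
from sklearn.metrics import accuracy_score, f1_score

clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=200)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred, average="weighted"))
```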
We tried Random Forest and Naive Bayes algorithms for prediction, but the results were significantly inferior. Hence, we dropped these algorithms to save the project's run time.
Model Summary:
Model Performance:
Bi-Directional LSTM
A bidirectional RNN (BRNN) duplicates the RNN processing chain so that inputs are processed in both forward and reverse time order. This allows a BRNN to look at future context as well as past context. A sketch of the Bi-LSTM classifier is shown below.
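A minimal sketch of a Bi-LSTM classifier in Keras on the padded sequences from the Tokenizer above (layer sizes and dropout are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dropout, Dense

num_classes = df["target"].nunique()

model = Sequential([
    Input(shape=(100,)),                       # padded sequence length from the Tokenizer step
    Embedding(input_dim=5000, output_dim=128),
    Bidirectional(LSTM(64)),                   # processes the sequence in forward and reverse order
    Dropout(0.3),
    Dense(num_classes, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```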
Model Summary:
Model Performance:
GRU
A gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks (RNN)
similar to a long short-term memory (LSTM) unit but without an output gate. GRU’s try to
solve the vanishing gradient problem that can come with standard recurrent neural networks.
Model Summary:
Model Performance:
GRU accuracy is very poor (~10.94%) and hence it is not suitable for our ticket assignment application.
Attention Mechanism
RNNs tend to forget relevant information from earlier steps of the sequence, so useful information encoded there can be lost. In order to keep that information, we could use an average of the encoded states the RNN outputs. However, since not all of these encoded states are equally valuable, we instead use a weighted sum of the encoded states (i.e., our attention mechanism) to make the prediction; a sketch is shown below.
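A minimal sketch of a simple attention layer over the Bi-LSTM's encoded states (it scores each time step, softmaxes the scores, and takes the weighted sum); the exact attention formulation used in the project may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, max_len, num_classes = 5000, 100, df["target"].nunique()

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, 128)(inputs)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # all encoded states

scores = layers.Dense(1, activation="tanh")(h)            # one score per time step
weights = layers.Softmax(axis=1)(scores)                  # attention weights over time
context = layers.Lambda(                                  # weighted sum of encoded states
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([weights, h])

outputs = layers.Dense(num_classes, activation="softmax")(context)
model = Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```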
Model Summary:
Model Performance:
Model evaluation
Describe the final model in detail. What was the objective, what parameters were prominent, and
how did you evaluate the success of your models?
                 SVC                      LightGBM
                 Accuracy    F1-Score     Accuracy    F1-Score
BoW              94.72%      0.9441       96.33%      0.9615
Tf-Idf           92.41%      0.9213       96.20%      0.960
Topic Modelling  92.41%      0.9213       96.19%      0.960
Model            Accuracy
LSTM             94.31%
Bi-LSTM          94.56%
GRU              10.94%
Attention Layer  95.06%
Conclusion:
As per our analysis, the best performing model is BoW + LightGBM, which gives 96.33% accuracy with an F1 score of 0.9615. Hence, this model is recommended for ticket assignment based upon the available dataset.
Remark: We would switch to deep learning RNN models such as the Attention Layer and Bi-LSTM if we had access to a much larger dataset than the current one.
Comparison to benchmark
How does your final solution compare to the benchmark you laid out at the outset? Did you improve
on the benchmark? Why or why not?
Visualizations
In addition to quantifying your model and the solution, please include all relevant visualizations that
support the ideas/insights that you gleaned from the data.
Data Distribution:
Word Cloud for Short Description:
Word Cloud for Description:
Distribution of tickets in different groups:
Ticket Distribution in various groups using matplotlib:
Visual Analysis of Change in Assignment Group Distribution before and After
Upscaling+Subgrouping GRP_0:
Implications
How does your solution affect the problem in the domain or business? What recommendations
would you make, and with what level of confidence?
In the digital era, everyone relies on technology and expects quick resolution of problems. In trying to meet customer expectations, L1/L2 IT staff are overwhelmed with work, and hiring more resources will not solve the problem. As the volume of tickets continues to increase, AI-led solutions are the need of the hour.
Our NLP model can process a request like a human agent, reading the ticket description to make sense of the text before categorizing the ticket into a group.
The Bidirectional LSTM model has around 93% accuracy, and this is expected to improve over time with continuous training.
When integrated with the ticket management workflow, this solution will also help increase customer satisfaction, as tickets will be automatically routed to the right person (the L2 team) instead of being passed from one agent to the next, and responses and resolutions can be completed faster and more accurately while reducing customer effort. Our solution can act as a potential replacement for the L1 team, saving both manpower and time for the organization.
Also, ticket categorization and allocation can work 24/7, so if an urgent ticket is logged, our machine learning models can deal with it as quickly as possible.
This approach will also help reduce human errors, as the learning applied by the models will be used in ticket classification, thereby increasing the accuracy of the process.
Limitations
What are the limitations of your solution? Where does your model fall short in the real world? What
can you do to enhance the solution?
The dataset used for the analysis has only 8500 records. We applied deterministic rules, upsampling and clustering techniques to prepare the data used to train and test the model. If this solution is implemented directly in production, there is a possibility that it will not give the desired outcome.
We would have to work with the business owners to first understand the real-world scenarios in which these tickets are handled and categorized. The deterministic rules would be validated by SMEs and implemented first, to verify the predictions made by the solution.
Our next step would be to gather enough of the right data from the business owners and validate our models. If needed, we would tweak the models to ensure they deliver the desired results and that the outcomes meet the business needs.
Closing Reflections
What have you learned from the process? What you do differently next time?
Key Learnings:
1. Use deterministic rules to eliminate data that will not contribute much to machine learning and will only increase overhead. In our problem statement, however, we got better results without using deterministic rule-based predictions.
3. Focus on the data visualization and pre-processing techniques being applied, and validate the results.
Flow Chart
FlowChart.html.drawio