
AUTOMATIC TICKET ASSIGNMENT
AIML Online Capstone Group 6
05-Jul-20
Summary of problem statement, data and findings

Problem Statement

Manual assignment of incidents is time-consuming and requires human effort. Mistakes occur
because of human error, and resources are used ineffectively when tickets are misaddressed.
Manual assignment also increases response and resolution times, which leads to deteriorating
user satisfaction and poor customer service.

Abstract

We apply traditional machine learning and neural-network-based NLP to automatically
classify tickets and assign them to the right owner in a timely manner, in order to save effort,
increase user satisfaction and improve throughput in an organization's ticketing pipeline.

Solution Approach

We have used separate ipynb files for the following:

1. All machine learning models
2. LSTM
3. Bi-LSTM
4. GRU
5. Attention Layer

https://github.com/jayapavandeshpande/nlp_project/tree/NLP-Soumendra

Data & Findings

1. The dataset comprises 8,500 rows and 4 columns.
2. All columns are of type object and contain textual information.
3. There are 8 null/missing values in the Short description column and 1 null/missing value
   in the Description column.
4. Password reset is one of the most frequently occurring ticket types, as reflected in the
   Short description column.
5. The most frequent Description in the dataset is just the text 'the', which carries no
   meaning; looking at the Short description of such rows reveals that they are also
   Password reset tickets.

Data provided in format: XLSX/CSV

Total Records: 8500

Data Fields:
Short description – A summary of the issue faced by the user
Description – Detailed description of the issue
Assignment group – GRP_0 ~ GRP_73 (74 classes of Assignment group)

Sample data

Short description   | Description                                         | Assignment group
login issue         | -verified user details.(employee# & manager na...   | GRP_0
outlook             | \r\n\r\nreceived from: hmjdrvpb.komuaywn@gmail...   | GRP_0
cant log in to vpn  | \r\n\r\nreceived from: eylqgodm.ybqkwiam@gmail...   | GRP_0

Distribution of classes and observations

1. High imbalance in the target column, with GRP_0 having the highest share of records.
2. Many classes have very little representation.
3. Null values in the data:
   a. Short description: 8
   b. Description: 1
   c. Assignment group: 0
4. A few tickets have non-English descriptions.

Overview of the final process


1. Loaded the given dataset
2. Performed EDA on the dataset to identify the
a. Distribution of data,
b. Languages,
c. Length of each column of dataset.
3. Perform pre-processing
a. Replace email Ids
b. Contractions
c. Lemmatization
d. Replace the gibberish text using FTFY
e. Detect the language
f. Translate the non-English content to English using the Azure API
4. Identify the deterministic rules using MS SQL and MS Excel
5. Apply deterministic rules on the dataset to assign some groups before modelling
6. Perform text summarisation
7. Perform up-sampling on the complete dataset except GRP_0
8. Perform clustering on the GRP_0 records alone
9. Combine the up-sampled and clustered data frames
10. Define independent and dependent features
11. Perform Label Encoding on the dependent feature (target column)
12. Perform Vectorisation on independent features using TF-IDF/Tokenizer
13. Split the data into train and test datasets in a 70:30 ratio
14. Apply Machine Learning models
a. Support Vector Machine (SVM)
b. Naïve Bayes Classifier
c. Random Forest
d. Light GBM
15. Apply Deep learning models
a. LSTM
b. Bi directional LSTM
c. GRU
d. Attention
16. Identify the best model based on the predictions
Step-by-step walk-through of the solution

Load Dataset
Loaded the input CSV file into a pandas data frame

EDA on dataset
1. Distribution of data
2. Identified the languages
3. Most common words
4. Top n words
5. Bigrams
6. Trigrams

Pre-Processing
1. Treating missing values using the RAKE algorithm: RAKE extracts key phrases from a
body of text by analysing the frequency of word appearance and its co-occurrence
with other words in the text.
We find similar rows based on identical RAKE values and use them to fill the
missing NaN values.

2. Replace email IDs – user email IDs in Description and Short Description are replaced
with the common text ‘Email Address’, as they do not hold any significance.
3. Expanded all contractions; this is an important step because it reduces ambiguity
between similar phrases.
4. Grouped together the different inflected forms of a word using NLTK lemmatization
so they can be analysed as a single item.
5. Used the regex library to remove:
   a. Trailing spaces,
   b. Line breaks and tabs (\r\n\t),
   c. Special characters, and
   d. Extra spaces.
6. Converted garbled text such as mojibake using the ftfy library.
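A minimal sketch of this cleaning pipeline, assuming the contractions, ftfy and nltk packages are installed; the column name and exact regex patterns are illustrative rather than our final choices:

```python
import re

import contractions                        # pip install contractions
import ftfy                                 # pip install ftfy
from nltk.stem import WordNetLemmatizer     # needs the nltk 'wordnet' data downloaded

lemmatizer = WordNetLemmatizer()
EMAIL_RE = re.compile(r"\S+@\S+")

def clean_text(text: str) -> str:
    text = ftfy.fix_text(text)                    # repair mojibake / garbled characters
    text = EMAIL_RE.sub("Email Address", text)    # replace email ids with a common token
    text = contractions.fix(text)                 # expand contractions ("can't" -> "cannot")
    text = re.sub(r"[\r\n\t]+", " ", text)        # drop line breaks and tabs
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)    # drop special characters
    text = re.sub(r"\s+", " ", text).strip()      # collapse extra spaces
    return " ".join(lemmatizer.lemmatize(w) for w in text.lower().split())

# df["Complete_Description"] = df["Complete_Description"].astype(str).apply(clean_text)
```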

Foreign language detection and translation

1. Identified around 28 languages. For language identification we used Facebook's
fastText library, which can recognize more than 170 languages and classify
thousands of documents per second.
2. The model returns two tuples: the language label (ISO code) and the confidence level,
e.g. (('__label__de',), array([0.96568173])).
3. For converting the ISO code to a language name we used the pycountry library, e.g.
de --> German.
4. Translated using
   a. Goslate
   b. Google Translate API and
   c. Azure APIs
5. We observed better translation accuracy with the Google/Azure translator (translate 3.5.0),
which uses the Microsoft Translation API, a cloud-based machine translation service that
supports more than 60 languages.
6. To save the number of hits to the API, we saved the already-translated dataframe
into a csv file and reuse it in the project.
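A small sketch of the detection step, assuming the pre-trained lid.176.bin language-identification model has been downloaded from fastText; the sample sentence and printed confidence are illustrative:

```python
import fasttext     # pip install fasttext
import pycountry    # pip install pycountry

# pre-trained language-identification model, downloaded separately from fastText
lid_model = fasttext.load_model("lid.176.bin")

def detect_language(text: str):
    labels, confidences = lid_model.predict(text.replace("\n", " "))
    iso_code = labels[0].replace("__label__", "")        # '__label__de' -> 'de'
    lang = pycountry.languages.get(alpha_2=iso_code)     # 'de' -> German
    return iso_code, (lang.name if lang else iso_code), float(confidences[0])

print(detect_language("passwort kann nicht zurückgesetzt werden"))
# e.g. ('de', 'German', 0.97) -- the confidence value will vary
```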
Deterministic Rules
Our objective in using deterministic rules is to identify and remove some of the
groups that can easily be predicted by such rules, and hence bypassed before
feeding data into any ML/DL model.
Another advantage is that this removes from the Assignment group feature many group labels
that have only one row/record, which is not enough to train models on.

Deterministic rules are a fixed set of rules/conditions that provide an exact match
on the given dataset.

The dataset provided has 74 groups and the data distribution is not balanced. The
likelihood of predicting the correct group with an ML model may vary because of this
imbalance and may cause overfitting.

To mitigate this problem, we selected the classes with fewer than 20 samples and looked
for similarities that would accurately identify each of these classes.

We queried the data for similarities within each class by loading the data into Excel and
SQL. We were able to identify rules that predict 12 classes completely and 4
classes partially.

The deterministic rules are applied on the translated dataset. The predicted samples are
assigned a 'predicted group' as a new column in the dataset.

When we evaluated model performance, we found that the dataset without the deterministic
rules applied gave higher accuracy, so this step is excluded from the final project.
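For illustration only, a hypothetical pandas version of this kind of rule; the phrases and group names below are placeholders, not the actual rules derived in Excel/SQL:

```python
# classes with fewer than 20 samples are the candidates for deterministic rules
counts = df["Assignment group"].value_counts()
rare_groups = counts[counts < 20].index.tolist()

# hypothetical phrase -> group mapping; placeholders, not the rules actually derived
rules = {
    "job failed in scheduler": "GRP_A",
    "badge access request": "GRP_B",
}

def apply_rules(description: str):
    text = str(description).lower()
    for phrase, group in rules.items():
        if phrase in text:
            return group
    return None

df["predicted group"] = df["Complete_Description"].apply(apply_rules)
```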

Text Summarisation
Summarisation aims to highlight the important information within a large corpus. We
used Gensim's summarization module on our dataset.
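A quick sketch using gensim 3.x (the summarization module was removed in gensim 4.0); the ticket text is a made-up example and the function expects several sentences of input:

```python
from gensim.summarization import summarize   # available in gensim 3.x

text = (
    "Unable to connect to the corporate vpn since this morning. "
    "The vpn client shows an authentication error after entering credentials. "
    "The password was reset yesterday through the self service portal. "
    "Restarting the laptop and reinstalling the client did not help."
)

# extractive summary keeping roughly half of the sentences
print(summarize(text, ratio=0.5))
```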

Up sampling
Imbalanced datasets are those with a severe skew in the class distribution, such as
1:100 or 1:1000 examples from the minority class to the majority class.

This bias in the dataset can influence many machine learning algorithms, leading some to
ignore the minority class entirely. This is a problem because it is typically the minority class
on which predictions are most important.

One approach to addressing class imbalance is to randomly resample the dataset.
The two main approaches to randomly resampling an imbalanced dataset are:

a. Deleting samples from the majority class, known as under-sampling, and
b. Duplicating samples from the minority class, known as over-sampling.

In our case, we did not consider under-sampling as it would result in loss of data. Instead,
we used up-sampling with the random oversampling technique.

We pass the cleaned dataset, excluding the majority class (GRP_0), to this over-sampler.
We were able to up-sample 4,524 minority class records (input) to 48,253 records (output).
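A minimal sketch with imbalanced-learn's RandomOverSampler; the dataframe and column names are assumptions based on the steps described above:

```python
from imblearn.over_sampling import RandomOverSampler   # pip install imbalanced-learn

# everything except the majority class GRP_0 goes to the over-sampler
minority_df = df[df["Assignment group"] != "GRP_0"]

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(
    minority_df[["Complete_Description"]],    # 2-D frame of the text feature
    minority_df["Assignment group"],
)
print(len(minority_df), "->", len(X_res))     # roughly 4524 -> 48253 rows after duplication
```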

Clustering
During our analysis we noticed that GRP_0 has around 3,976 of the 8,500 records and has
different categories of issues mapped within the same group. In order to identify the
similar categories and map them to subgroups, we explored various clustering techniques and
settled on K-means, an unsupervised clustering algorithm, determining the optimal number of
clusters with the elbow method. K-means works iteratively: it selects random coordinates for
the cluster centres and assigns the data points to clusters. It then calculates the Euclidean
distance of each data point from its centroid and, based on this, updates the point assignments
as well as the cluster centres.

For finding the optimal number of centroids, the 'elbow method' is used. The SSE value
(sum of squared errors) is calculated for different values of k (i.e. the number of clusters) by
clustering the dataset for each value of k. The point on the graph where a 'hinge' occurs is
considered the optimal value of k. Looking at the elbow plot for our data, the total number of
clusters was chosen as 8.

Since we cannot get a clear and prominent elbow, we opt for a high number of clusters,
i.e. 8, because our objective is to divide GRP_0 into as many subgroups as possible
so that the individual counts of the subgroups stay small.

Using the cluster count obtained from the elbow method, we use the K-means algorithm
to predict the labels. Below is the data clustered as per the labels.

We used PCA to plot the clusters, labelling the centre of each cluster with an "x".

Based on the identified labels we categorised the GRP_0 data into subgroups as depicted below.
The group labels identified using clustering are:

['GRP_0_1', 'GRP_0_0', 'GRP_0_6', 'GRP_0_2', 'GRP_0_5', 'GRP_0_4', 'GRP_0_7', 'GRP_0_3']
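A sketch of the GRP_0 sub-grouping step; the TF-IDF settings, the k range for the elbow plot and the column names are assumptions for illustration:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

grp0_mask = df["Assignment group"] == "GRP_0"
X_grp0 = TfidfVectorizer(max_features=2000).fit_transform(
    df.loc[grp0_mask, "Complete_Description"]
)

# elbow method: plot SSE (inertia) against k and look for the hinge
ks = range(2, 13)
sse = [KMeans(n_clusters=k, random_state=42).fit(X_grp0).inertia_ for k in ks]
plt.plot(ks, sse, marker="o"); plt.xlabel("k"); plt.ylabel("SSE"); plt.show()

# no sharp elbow, so pick k = 8 and relabel the GRP_0 rows as GRP_0_<cluster>
labels = KMeans(n_clusters=8, random_state=42).fit_predict(X_grp0)
df.loc[grp0_mask, "Assignment group"] = ["GRP_0_" + str(c) for c in labels]
```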

Define independent and dependent features

We concatenate the Short description and Description as "Complete_Description".
"Complete_Description" is considered the independent attribute, and the target,
"Assignment group", is the dependent attribute.

Label Encoding
Labels in the target (dependent) column are encoded using the sklearn LabelEncoder, which
encodes target labels with values between 0 and n_classes-1.
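A short sketch of this step (the dataframe name is assumed from the earlier steps):

```python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df["Assignment group"])   # each group label becomes an integer id
print(len(label_encoder.classes_), "classes encoded as 0 ..", len(label_encoder.classes_) - 1)
```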
Vectorization
Vectorization is used to score the relative importance of words. This can be done using either
TF-IDF or the Keras Tokenizer and BoW.

TF-IDF
Term Frequency (TF) is the number of times a word appears in a document divided by the total
number of words in the document.

Inverse Document Frequency (IDF) is the log of the number of documents divided by the number
of documents that contain the word w. IDF determines the weight of rare words across all
documents in the corpus.

TF-IDF is TF multiplied by IDF.

BoW
In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its
words, disregarding grammar and even word order but keeping multiplicity.

Tokenizer
This class allows us to vectorize a text corpus by turning each text into either a sequence of
integers (each integer being the index of a token in a dictionary) or into a vector where the
coefficient for each token can be binary, based on word count, or based on tf-idf.
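A sketch of both vectorisation routes (TF-IDF for the classical models, Keras Tokenizer for the sequence models); the vocabulary sizes, n-gram range and sequence length are illustrative, not our tuned values:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = df["Complete_Description"].astype(str).tolist()

# TF-IDF matrix for the classical models (SVM / LightGBM)
tfidf = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_tfidf = tfidf.fit_transform(texts)

# padded integer sequences for the RNN models
tokenizer = Tokenizer(num_words=20000, oov_token="<oov>")
tokenizer.fit_on_texts(texts)
X_seq = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=100, padding="post")
```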

Splitting Datasets
We used the train_test_split function to split the dataset into training and testing subsets in a
70:30 ratio.

The training subset is used for building the model. The testing subset is used to evaluate the
model's performance on unseen data.
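The split itself, as a quick sketch (the stratify argument is our suggestion here, not necessarily what was used):

```python
from sklearn.model_selection import train_test_split

# 70:30 split; stratify keeps the class proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.30, random_state=42, stratify=y
)
```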

Machine Learning Models


Support Vector Machine (SVM)
The Support Vector Machine (SVM) algorithm is a popular machine learning tool that offers solutions
for both classification and regression problems.

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification
using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature
spaces.
Light GBM
LightGBM uses histogram-based algorithms, which speed up training and reduce memory usage.
The algorithm constructs trees leaf-wise in a best-first order, which tends to achieve lower loss.

We also tried the Random Forest and Naïve Bayes algorithms for prediction, but the results
were significantly inferior. Hence, we dropped these algorithms to save project run time.
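A hedged sketch of the two classifiers on the TF-IDF features; a linear SVM is used here and the hyper-parameters are illustrative defaults, not our tuned values:

```python
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score
from lightgbm import LGBMClassifier

for name, model in [("SVM", LinearSVC()), ("LightGBM", LGBMClassifier(n_estimators=200))]:
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name,
          "accuracy:", round(accuracy_score(y_test, preds), 4),
          "weighted F1:", round(f1_score(y_test, preds, average="weighted"), 4))
```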

Deep Learning Models


LSTM
Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture
used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has
feedback connections. It can not only process single data points (such as images), but also
entire sequences of data (such as speech or video).
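A minimal sketch of a Keras LSTM classifier of the kind described here; the layer sizes are illustrative, num_classes comes from the label encoder, and X_seq_train denotes the training split of the padded sequences:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

num_classes = len(label_encoder.classes_)

model = Sequential([
    Embedding(input_dim=20000, output_dim=128, input_length=100),  # matches the tokenizer setup
    LSTM(128),                                                      # single recurrent layer
    Dropout(0.3),
    Dense(num_classes, activation="softmax"),                       # one probability per group
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
# model.fit(X_seq_train, y_train, validation_split=0.1, epochs=10, batch_size=64)
```

Wrapping the recurrent layer as Bidirectional(LSTM(128)) gives the Bi-LSTM variant in the next section, and swapping in GRU(128) gives the GRU model.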

Model Summary:
Model Performance:

Bi-Directional LSTM
Bidirectional RNN (BRNN) duplicates the RNN processing chain so that inputs are
processed in both forward and reverse time order. This allows a BRNN to look at future
context as well.

Model Summary:
Model Performance:

GRU
A gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks (RNN)
similar to a long short-term memory (LSTM) unit but without an output gate. GRUs try to
solve the vanishing gradient problem that can come with standard recurrent neural networks.

Model Summary:

Model Performance:
GRU accuracy is very poor (~10.94%) and hence it is not suitable for our ticket assignment
application.

Attention Mechanism
Given the tendency of RNNs to forget relevant information from earlier steps of the
sequence, some of the useful information encoded there may be lost. In order to keep that
information, we could use an average of the encoded states the RNN outputs. Since not all of
these encoded states are equally valuable, we instead use a weighted sum of the encoded
states (i.e., our attention mechanism) to make the prediction.
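A compact sketch of such a weighted-sum (attention) pooling layer in Keras; this illustrates the idea and is not necessarily the exact layer used in our notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionPooling(layers.Layer):
    """Learns one weight per time step and returns the weighted sum of the RNN states."""

    def build(self, input_shape):
        # a single scoring vector over the hidden dimension
        self.w = self.add_weight(name="att_w", shape=(input_shape[-1], 1),
                                 initializer="glorot_uniform", trainable=True)
        super().build(input_shape)

    def call(self, hidden_states):
        scores = tf.tensordot(hidden_states, self.w, axes=1)    # (batch, time, 1)
        weights = tf.nn.softmax(scores, axis=1)                 # attention weights over time
        return tf.reduce_sum(weights * hidden_states, axis=1)   # (batch, hidden)

# usage: Embedding -> LSTM(..., return_sequences=True) -> AttentionPooling() -> Dense(softmax)
```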

Model Summary:

Model Performance:

Model evaluation

Describe the final model in detail. What was the objective, what parameters were prominent, and
how did you evaluate the success of your models?

                   SVC                        LightGBM
                   Accuracy     F1-Score      Accuracy     F1-Score
BoW                94.72%       0.9441        96.33%       0.9615
Tf-Idf             92.41%       0.9213        96.20%       0.960
Topic Modelling    92.41%       0.9213        96.19%       0.960

Model              Accuracy
LSTM               94.31%
Bi-LSTM            94.56%
GRU                10.94%
Attention Layer    95.06%

Conclusion:
As per our analysis, the best performing model is BoW + LightGBM, which gives us 96.33%
accuracy with an F1 score of 0.9615. Hence, we choose this model for ticket assignment
based on the available dataset.

Remark: We will switch to deep learning RNN models such as the Attention Layer and Bi-LSTM if
we get access to a much larger dataset than the current one.

Comparison to benchmark

How does your final solution compare to the benchmark you laid out at the outset? Did you improve
on the benchmark? Why or why not?

Visualizations

In addition to quantifying your model and the solution, please include all relevant visualizations that
support the ideas/insights that you gleaned from the data.

Data Distribution:
Word Cloud for Short Description:
Word Cloud Description:
Distribution of tickets in different groups:
Ticket Distribution in various groups using matplotlib:
Visual Analysis of Change in Assignment Group Distribution before and After
Upscaling+Subgrouping GRP_0:

Implications

How does your solution affect the problem in the domain or business? What recommendations
would you make, and with what level of confidence?

In the digital era everyone relies on technology and expects quick resolution of their problems.
Trying to meet customer expectations leaves the L1/L2 IT staff overwhelmed with work, and hiring
more resources will not solve the problem. As the volume of tickets continues to increase, AI-led
solutions are the need of the hour.

Our NLP model can process a request like a human agent, reading the ticket description to make
sense of the text before categorizing the ticket into a group.

The Bidirectional LSTM model reaches around 94.5% accuracy, and its precision should improve
over time with continuous retraining.

This solution, when integrated with the ticket management workflow, will also help increase
customer satisfaction: tickets will be automatically sent to the right person (L2 team) instead of
being passed from one agent to the next, and responses and resolutions can be completed more
accurately and faster while reducing customer effort. Our solution can act as a potential
replacement for the L1 team, saving both manpower and time for the organization.

Also, ticket categorization and allocation can run 24/7, so if an urgent ticket is logged our
machine learning models can deal with it as quickly as possible.

This approach will also help reduce human errors, as the learning applied by the models will be
used in ticket classification, which increases the accuracy of the process.
Limitations

What are the limitations of your solution? Where does your model fall short in the real world? What
can you do to enhance the solution?

The dataset used for the analysis has only 8,500 records. We applied deterministic rules,
up-sampling and clustering techniques to prepare the data used to train and test the model. If
this solution is implemented directly in production, there is a possibility that it will not give
the desired outcome.

We would have to work with the business owners to first understand the real-world scenarios in
which these tickets are handled and categorized. The deterministic rules would be validated by
the SMEs and implemented first to verify the predictions made by the solution.

Our next step would be to gather enough of the right data from the business owners and validate
our models. If needed, we will tweak our models to ensure that they deliver the desired results
and that the outcomes meet the business needs.

Closing Reflections
What have you learned from the process? What would you do differently next time?

Key Learnings:

1. Use deterministic rules to eliminate data that will not contribute much to machine learning
and will only increase overhead. In our problem statement, we got better results without using
deterministic rule-based predictions.

2. When and how to apply clustering and up sampling techniques

3. Focus on data visualization and the pre-processing techniques being applied, and validate
the results.

Flow Chart

FlowChart.html.drawio
