AUTOMATIC TICKET ASSIGNMENT
AIML Online Capstone Group 6
05-Jul-20
Summary of problem statement, data and findings
Problem Statement
Manual assignment of incidents is time consuming and requires significant human effort. It is prone to human error, and resources are used ineffectively when tickets are misrouted. Manual assignment also increases response and resolution times, which leads to poor customer service and deteriorating user satisfaction.
Abstract
The objective is to build a model that can classify tickets and assign them to the right owner in a timely manner, in order to save effort, increase customer satisfaction, and reduce response and resolution times.
Solution Approach
1. Classical ML models (SVC, LightGBM)
2. LSTM
3. Bi-LSTM
4. GRU
5. Attention Layer
https://github.com/jayapavandeshpande/nlp_project/tree/NLP-Soumendra
Total Records: 8500
Data Fields: Short description, Description, Assignment group
Sample data:
1. High imbalance is seen in the target column of our dataset, with GRP_0 accounting for around 3976 of the 8500 records.
2. Missing (NaN) values per field:
a. Short description: 8
b. Description: 1
c. Assignment group: 0
4. Password reset is one of the most frequently occurring ticket types, which is reflected in the Short description column.
5. The most frequently occurring Description in the dataset is just the text 'the', which carries no meaning; for such rows, the Short description was examined instead.
Load Dataset
Loaded the input CSV file into a pandas DataFrame.
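A minimal sketch of this step (the file name input_data.csv is an assumption; the columns follow the data fields described above):

```python
import pandas as pd

# Load the input CSV into a pandas DataFrame (file name is an assumption).
df = pd.read_csv("input_data.csv")

print(df.shape)           # expected: (8500, 3)
print(df.isnull().sum())  # missing values per field
```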
EDA on dataset
1. Distribution of data
2. Identified the languages
3. Most common words
4. Top-n words
5. Bigrams
6. Trigrams (a sketch of the n-gram extraction is shown after this list)
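A minimal sketch of the top-n word / bigram / trigram counts using scikit-learn's CountVectorizer (the Description column name is taken from the data fields above; the exact EDA code used may differ):

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(texts, ngram_range=(1, 1), n=20):
    # Count n-grams across all documents and return the n most frequent ones.
    vec = CountVectorizer(ngram_range=ngram_range, stop_words="english")
    counts = vec.fit_transform(texts)
    totals = counts.sum(axis=0).A1
    return sorted(zip(vec.get_feature_names_out(), totals),
                  key=lambda x: x[1], reverse=True)[:n]

texts = df["Description"].astype(str)
print(top_ngrams(texts, (1, 1)))   # most common words
print(top_ngrams(texts, (2, 2)))   # bigrams
print(top_ngrams(texts, (3, 3)))   # trigrams
```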
Pre-Processing
1. Treating the missing values using the RAKE algorithm: RAKE extracts key phrases from a body of text by analysing the frequency of word appearance and its co-occurrence with other words in the text. We find similar rows based upon identical RAKE key phrases and use them to fill the missing NaN values.
2. Replace email IDs – email IDs of users are replaced with the common text ‘Email Address’ in Description and Short Description, as the actual address does not hold any significance.
3. Expanded all contractions; this is an important step because it reduces ambiguity between similar phrases.
4. Grouped together the different inflected forms of a word using NLTK Lemmatization
so they can be analysed as a single item
5. Used the Regex library for removal of
a. Trailing spaces,
b. Line breaks and tabs (\r\n\t),
c. Special characters and
d. Extra spaces.
6. Converted data containing garbled text such as mojibake using the Ftfy library. A sketch of these cleaning steps is shown below.
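A minimal sketch of these cleaning steps, assuming the contractions, ftfy and nltk packages are installed and the NLTK punkt and wordnet resources are downloaded; the replacement text and regex patterns are illustrative:

```python
import re
import ftfy            # fixes mojibake / garbled text
import contractions    # expands contractions, e.g. "can't" -> "cannot"
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = ftfy.fix_text(str(text))                    # fix garbled/mojibake text
    text = re.sub(r"\S+@\S+", "email address", text)   # replace email IDs with common text
    text = contractions.fix(text)                      # expand contractions
    text = re.sub(r"[\r\n\t]", " ", text)              # remove line breaks and tabs
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)         # remove special characters
    text = re.sub(r"\s+", " ", text).strip()           # remove extra/trailing spaces
    # Lemmatize so inflected forms of a word are analysed as a single item.
    return " ".join(lemmatizer.lemmatize(w) for w in word_tokenize(text.lower()))

df["clean_description"] = df["Description"].apply(clean_text)
```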
Deterministic Rules
Deterministic rules are a fixed set of rules/conditions that provide an accurate match on the given dataset.
The dataset provided has 74 groups and the data distribution is not balanced. Because of this imbalance, the likelihood of predicting the correct group with an ML model may vary, and the model may overfit.
To mitigate this problem, we selected the classes with fewer than 20 samples and tried to identify similarities that would let us recognise each of these classes accurately.
We queried the data for similarities within each class by loading it into Excel and SQL. We were able to identify rules that predict 12 classes completely and 4 classes partially.
The deterministic rules are applied on the translated dataset, and the predicted samples are assigned a 'predicted group' in a new column of the dataset (an illustrative rule is sketched below).
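An illustrative example of what such a rule could look like in pandas; the keyword and the target group below are assumptions, not the actual rules identified from the dataset:

```python
# Hypothetical rule: tickets whose Short description mentions "password reset"
# are assigned to an (assumed) group. The real rules were derived in Excel/SQL.
mask = df["Short description"].str.contains("password reset", case=False, na=False)
df.loc[mask, "predicted group"] = "GRP_X"   # 'GRP_X' is a placeholder group label
```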
When we evaluated model performance, we found that the dataset without the deterministic rules gave higher accuracy, so this step was excluded from the final project.
Text Summarisation
Summarization aims to highlight important information within a large corpus. We
have used Gensim summarization on our dataset
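A minimal sketch of Gensim's extractive summarization (note that the gensim.summarization module is available only in gensim versions before 4.0, and it requires input text containing several sentences):

```python
from gensim.summarization import summarize   # available in gensim < 4.0 only

# Illustrative: summarize one long ticket description, keeping ~20% of its sentences.
long_text = str(df["Description"].iloc[0])
print(summarize(long_text, ratio=0.2))
```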
Up sampling
Imbalanced datasets are those where there is a severe skew in the class distribution,
such as 1:100 or 1:1000 examples in the minority class to the majority class.
This bias in the dataset can influence many machine learning algorithms, leading some to
ignore the minority class entirely. This is a problem as it is typically the minority class on
which predictions are most important.
One approach to addressing the problem of class imbalance is to randomly resample the dataset. The two main approaches to randomly resampling an imbalanced dataset are
a. Deleting samples from the majority class, known as undersampling, and
b. Duplicating samples from the minority class, known as oversampling.
In our case, we are not considering undersampling as it would result in loss of data. Instead, we have used upsampling with the random oversampling technique.
We pass the cleaned dataset, excluding the majority class (GRP_0), to this oversampler. We were able to upsample the 4524 minority class records (input) to 48253 records (output), as sketched below.
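A minimal sketch using RandomOverSampler from imbalanced-learn (column names follow the data fields above; the sampler requires a 2-D feature input):

```python
from imblearn.over_sampling import RandomOverSampler

# Exclude the majority class GRP_0 before oversampling, as described above.
minority = df[df["Assignment group"] != "GRP_0"]
X = minority[["clean_description"]]          # 2-D input for the sampler
y = minority["Assignment group"]

ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)
print(len(X), "->", len(X_res))              # e.g. 4524 -> 48253 in our run
```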
Clustering
During our analysis we noticed that GRP_0 has around 3976 records out of 8500 records
and have different category of issues mapped within the same group. In order to identify the
similar categories and map to the subgroups we explored various clustering techniques and
finalized K-means clustering which is an unsupervised clustering algorithm which determines
the optimal number of clusters using the elbow method. It works iteratively by selecting a
random coordinate of the cluster centre and assign the data points to a cluster. It then
calculates the Euclidean distance of each data point from its centroid and based on this, it
updates the datapoint positions as well as the cluster centres.
For finding the optimum number of centroids, the ‘elbow method’ is used: the SSE (sum of squared errors) is calculated for different values of k (the number of clusters) by clustering the dataset with each value of k. The point on the graph where a ‘hinge’ occurs is considered the optimal value of k; the figure below shows the elbow plot for the K-means algorithm. Looking at the graph, the total number of clusters is taken to be 8.
Since we do not get a clear and prominent elbow, we opted for this relatively high number of clusters, i.e. 8, because our objective is to divide GRP_0 into the maximum possible number of subgroups, so that the individual counts of the subgroups remain small.
Using the cluster numbers obtained from the elbow method, we use the k-means algorithm
to predict the labels. Below is the data clustered as per the labels.
We used PCA to project the data to two dimensions for plotting the clusters, marking the centre of each cluster with an “x”; a sketch of the clustering and plotting steps is shown below.
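A minimal sketch of the elbow method, K-means labelling and PCA plot for GRP_0 (TF-IDF features, the k range and max_features are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

grp0_text = df.loc[df["Assignment group"] == "GRP_0", "clean_description"]
X = TfidfVectorizer(max_features=2000).fit_transform(grp0_text)

# Elbow method: SSE (inertia) for a range of k values.
ks = range(2, 15)
sse = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_ for k in ks]
plt.plot(ks, sse, marker="o"); plt.xlabel("k"); plt.ylabel("SSE"); plt.show()

# Fit the chosen k (8) and plot the clusters in 2-D via PCA, marking centres with 'x'.
km = KMeans(n_clusters=8, random_state=42, n_init=10).fit(X)
pca = PCA(n_components=2).fit(X.toarray())
points = pca.transform(X.toarray())
centres = pca.transform(km.cluster_centers_)
plt.scatter(points[:, 0], points[:, 1], c=km.labels_, s=10)
plt.scatter(centres[:, 0], centres[:, 1], marker="x", c="black")
plt.show()
```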
Based on the identified labels, we categorized the GRP_0 data into subgroups as depicted below:
Below are the group labels identified using clustering:
Label Encoding
Labels in the target (dependent) column are encoded using the sklearn LabelEncoder, which encodes target labels with values between 0 and n_classes-1.
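A minimal sketch (the column name is taken from the data fields above):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df["target"] = le.fit_transform(df["Assignment group"])   # integer labels 0 .. n_classes-1
```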
Vectorization
Vectorization is used to score the relative importance of words. This can be done using TF-IDF, a Bag of Words (BoW) model, or the Keras Tokenizer.
TF-IDF
Term Frequency (TF) is the number of times a word appears in a document divided by the total
number of words in the document.
Inverse Document Frequency (IDF) is the log of the number of documents divided by the number of documents that contain the word w. IDF determines the weight of rare words across all documents in the corpus.
BoW
In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its
words, disregarding grammar and even word order but keeping multiplicity.
Tokenizer
This class allows a text corpus to be vectorized by turning each text into either a sequence of integers (each integer being the index of a token in a dictionary) or into a vector where the coefficient for each token can be binary, based on word count, or based on TF-IDF.
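A minimal sketch of the three vectorization options (max_features, num_words and maxlen are illustrative choices, not the exact values used):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = df["clean_description"].astype(str).tolist()

# TF-IDF and Bag-of-Words matrices for the classical ML models.
X_tfidf = TfidfVectorizer(max_features=5000).fit_transform(texts)
X_bow = CountVectorizer(max_features=5000).fit_transform(texts)

# Keras Tokenizer: padded integer sequences for the deep learning models.
tok = Tokenizer(num_words=5000, oov_token="<OOV>")
tok.fit_on_texts(texts)
X_seq = pad_sequences(tok.texts_to_sequences(texts), maxlen=100, padding="post")
```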
Splitting Datasets
We have used the train_test_split function to split the dataset into training and testing subsets in a 70:30 ratio.
The training subset is used for building the model; the testing subset is used for evaluating the model's performance on unseen data.
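A minimal sketch (the stratify argument is an assumption; the report only states the 70:30 ratio):

```python
from sklearn.model_selection import train_test_split

# 70:30 split; stratify keeps class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, df["target"], test_size=0.3, random_state=42, stratify=df["target"])
```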
SVC
Support Vector Machines (SVMs) are supervised learning models used here for classification. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
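A minimal sketch of training and scoring an SVC on the TF-IDF features (the kernel choice is an assumption):

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

svc = SVC(kernel="linear")        # kernel choice is an assumption
svc.fit(X_train, y_train)
pred = svc.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred, average="weighted"))
```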
Light GBM
LightGBM uses histogram-based algorithms, which speed up training and reduce memory usage. It constructs trees leaf-wise in a best-first order, which tends to achieve lower loss. A sketch is shown below.
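A minimal sketch using the LightGBM scikit-learn interface (hyperparameters are illustrative):

```python
import lightgbm as lgb
from sklearn.metrics import accuracy_score, f1_score

clf = lgb.LGBMClassifier(objective="multiclass", n_estimators=200)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred, average="weighted"))
```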
We tried Random Forest and Naive Bayes algorithms for prediction, but the results were significantly inferior. Hence, we dropped these algorithms to save the project's run time.
Model Summary:
Model Performance:
Bi-Directional LSTM
A bidirectional RNN (BRNN) duplicates the RNN processing chain so that inputs are processed in both forward and reverse time order. This allows a BRNN to look at future context as well as past context. A sketch of the Bi-LSTM classifier is shown below.
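A minimal sketch of a Bi-LSTM classifier in Keras on the padded sequences from the Tokenizer above (layer sizes and dropout are assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dropout, Dense

num_classes = df["target"].nunique()

model = Sequential([
    Input(shape=(100,)),                       # padded sequence length from the Tokenizer step
    Embedding(input_dim=5000, output_dim=128),
    Bidirectional(LSTM(64)),                   # processes the sequence in forward and reverse order
    Dropout(0.3),
    Dense(num_classes, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```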
Model Summary:
Model Performance:
GRU
A gated recurrent unit (GRU) is a gating mechanism in recurrent neural networks (RNN)
similar to a long short-term memory (LSTM) unit but without an output gate. GRU’s try to
solve the vanishing gradient problem that can come with standard recurrent neural networks.
Model Summary:
Model Performance:
GRU accuracy is very poor (~10.94%) and hence it is not suitable for our ticket assignment application.
Attention Mechanism
RNNs tend to forget relevant information from earlier steps of the sequence, so useful information encoded there can be lost. In order to keep that information, we could use an average of the encoded states the RNN outputs. However, since not all of these encoded states are equally valuable, we instead use a weighted sum of the encoded states (i.e., our attention mechanism) to make the prediction; a sketch is shown below.
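A minimal sketch of a simple attention layer over the Bi-LSTM's encoded states (it scores each time step, softmaxes the scores, and takes the weighted sum); the exact attention formulation used in the project may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, max_len, num_classes = 5000, 100, df["target"].nunique()

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, 128)(inputs)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)   # all encoded states

scores = layers.Dense(1, activation="tanh")(h)            # one score per time step
weights = layers.Softmax(axis=1)(scores)                  # attention weights over time
context = layers.Lambda(                                  # weighted sum of encoded states
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([weights, h])

outputs = layers.Dense(num_classes, activation="softmax")(context)
model = Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```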
Model Summary:
Model Performance:
Model evaluation
Describe the final model in detail. What was the objective, what parameters were prominent, and
how did you evaluate the success of your models?
                 SVC                      LightGBM
                 Accuracy    F1-Score     Accuracy    F1-Score
BoW              94.72%      0.9441       96.33%      0.9615
Tf-Idf           92.41%      0.9213       96.20%      0.960
Topic Modelling  92.41%      0.9213       96.19%      0.960
Model            Accuracy
LSTM             94.31%
Bi-LSTM          94.56%
GRU              10.94%
Attention Layer  95.06%
Conclusion:
As per our analysis, the best performing model is BoW + LightGBM, which gives 96.33% accuracy with an F1 score of 0.9615. Hence, this model is recommended for ticket assignment based upon the available dataset.
Remark: We would switch to deep learning RNN models such as the Attention Layer and Bi-LSTM if we had access to a much larger dataset than the current one.
Comparison to benchmark
How does your final solution compare to the benchmark you laid out at the outset? Did you improve
on the benchmark? Why or why not?
Visualizations
In addition to quantifying your model and the solution, please include all relevant visualizations that
support the ideas/insights that you gleaned from the data.
Data Distribution:
Word Cloud for Short Description:
Word Cloud for Description:
Distribution of tickets in different groups:
Ticket Distribution in various groups using matplotlib:
Visual Analysis of Change in Assignment Group Distribution before and After
Upscaling+Subgrouping GRP_0:
Implications
How does your solution affect the problem in the domain or business? What recommendations
would you make, and with what level of confidence?
In the digital era, everyone relies on technology and expects quick resolution of problems. In trying to meet customer expectations, L1/L2 IT staff are overwhelmed with work, and hiring more resources will not solve the problem. As the volume of tickets continues to increase, AI-led solutions are the need of the hour.
Our NLP model can process a request like a human agent, reading the ticket description to make sense of the text before categorizing the ticket into a group.
The Bidirectional LSTM model has around 93% accuracy, and this is expected to improve over time with continuous training.
When integrated with the ticket management workflow, this solution will also help increase customer satisfaction, as tickets will be automatically routed to the right person (the L2 team) instead of being passed from one agent to the next, and responses and resolutions can be completed faster and more accurately while reducing customer effort. Our solution can act as a potential replacement for the L1 team, saving both manpower and time for the organization.
Also, ticket categorization and allocation can work 24/7, so if an urgent ticket is logged, our machine learning models can deal with it as quickly as possible.
This approach will also help reduce human errors, as the learning applied by the models will be used in ticket classification, thereby increasing the accuracy of the process.
Limitations
What are the limitations of your solution? Where does your model fall short in the real world? What
can you do to enhance the solution?
The dataset used for the analysis has only 8500 records. We applied deterministic rules, upsampling and clustering techniques to prepare the data used to train and test the model. If this solution is implemented directly in production, there is a possibility that it will not give the desired outcome.
We would have to work with the business owners to first understand the real-world scenarios in which these tickets are handled and categorized. The deterministic rules would be validated by SMEs and implemented first, to verify the predictions made by the solution.
Our next step would be to gather enough of the right data from the business owners and validate our models. If needed, we would tweak the models to ensure they deliver the desired results and that the outcomes meet the business needs.
Closing Reflections
What have you learned from the process? What you do differently next time?
Key Learnings:
1. Use deterministic rules to eliminate data that will not contribute much to machine learning and will only increase overhead. In our problem statement, however, we got better results without using deterministic rule-based predictions.
3. Focus on the data visualization and pre-processing techniques being applied, and validate the results.
Flow Chart
FlowChart.html.drawio