2023 International Conference on Recent Advances in Science & Engineering Technology (ICRASET)

Analysis of User Behavior Patterns using Machine Learning Algorithms
2023 International Conference on Recent Advances in Science and Engineering Technology (ICRASET) | 979-8-3503-0692-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICRASET59632.2023.10419986

Harshitha Y J
Dept. of ISE, B.M.S. College of Engineering
Bangalore, India

P Jayarekha
Dept. of ISE, BMS College of Engineering
Bangalore, India

Abstract— A weblog is a dynamic record of transactions regularly updated by website visitors. It contains a variety of information, such as IP addresses, status codes, bytes sent, categories, and timestamps. The primary purpose of a weblog is to monitor user behavior and classify users' interests based on different categories and qualities. This study has two main objectives: first, to categorize successful responses, and second, to distinguish between normal and abnormal user behavior. The research process outlined in this paper involves several steps. It begins with data collection, where relevant information is gathered from the weblogs. The next step is data pre-processing, which involves cleaning and organizing the data to make it suitable for analysis. Clustering techniques are then employed to identify different patterns of user activity, enabling a quick assessment and prediction of user behavior. The main focus of the research is user prediction, achieved by analyzing user preferences extracted from the weblogs at various levels. To accomplish this, various machine learning techniques are used. One of the implemented models is the Random Forest Classifier, which predicts user behavior based on specific input parameters.

Keywords— Weblog, Machine learning, clustering

I. INTRODUCTION

In today's digital age, consumers have access to an abundance of online information. However, sifting through this mass of information to find relevant and valuable content has become increasingly challenging. Analysing and modeling web navigation behaviour can be beneficial in understanding user preferences and requests for online services. Web mining, a data mining technique, plays an important role in collecting and analysing significant data from the web. It comprises three primary subfields: web content mining, web structure mining, and web usage mining, each specializing in a different type of data.

Web content mining involves collecting data from different online resources, such as text, images, audio, and videos available on the internet. Web structure mining focuses on studying the link architecture of websites to uncover meaningful patterns and relationships among web pages. Web usage mining, on the other hand, delves into online user activities, including the analysis of weblogs, which offer insight into user behaviour. Weblogs record the actions of each website visitor, enabling the prediction of user behaviour.

However, weblogs are typically unstructured, necessitating data pre-processing before meaningful analysis can be performed. Data pre-processing involves transforming the raw weblog data into a processed format that reveals user navigation patterns. This includes steps such as data cleansing to eliminate errors and inconsistencies, user identification to distinguish individual visitors, session identification to group related actions, content retrieval to extract relevant information, and path completion to understand the flow of user interactions on the website.

By leveraging web mining techniques, particularly web usage mining, web content mining, and web structure mining, researchers gain valuable insights into user behaviour, preferences, and interests. Understanding user navigation patterns and behaviour can significantly improve the design of online services, enhance user experience, and provide more relevant and personalized content to users.

There are several benefits to predicting typical or aberrant behaviour in online applications and websites: it can improve security, improve the user experience, and optimise speed.

Web server activity prediction methods include online and offline elements. Online analysis uses the weblog in real time, utilising user intuition lists, whereas offline analysis uses past data, such as log files or downloaded weblogs. Historical weblog analysis of user behaviour reveals trends in web navigation. Web traversal patterns can be broken down into frequent and semi-frequent sequences using classifiers, which can provide information about client preferences.

For our work we consider this list as data and use it for analysis. We perform a comparative analysis of different machine learning techniques on our dataset, checking how well each model fits the data based on root mean squared error (RMSE) and R2 score. We also implement a Random Forest classifier to identify the behavior of the user.

II. REVIEW OF LITERATURE

[1] Despite the ability to collect vast amounts of logging data from each encounter, security experts face challenges in identifying attacks. The system allows a predefined set of activities, such as system calls in an operating system or actions like "Search Item" and "Filter Results" in online stores. Interaction logs can be leveraged to develop automated security models that protect against intrusions. Empirical evidence demonstrates that informed modelling effectively captures typical behaviour, enabling the identification of abnormal conduct. The dataset spans 31 days and consists of over 15,000 sessions conducted by 1,400 users, involving nearly 300 different actions. To address this, the authors propose a strategy that employs machine learning techniques, specifically LSTM neural networks, to simulate typical system interaction behaviour. The suggested
methodology is evaluated using a dataset that includes interaction logs from a login and security server administrator interface.

[2] The WE dataset is derived from a particular user's anonymous online browsing behaviour using a search engine. It comprises a sequence of activities, including "ActionSearchUser," "ActionDisplayUser," and more. To analyse this dataset, several algorithms were evaluated, such as K-Nearest Neighbour (KNN), Naive Bayes (NB), Support Vector Machine (SVM), and K-means clustering. According to the findings, the Naive Bayes algorithm stands out with a prediction accuracy of around 90.4%.

[3] This study centres on classifying user behaviour by utilizing keystroke dynamics for authentication. The behavioural biometrics of users are recorded, and machine learning principles are employed to categorize them. The study involved gathering anonymized data from 94 users to verify their identities. Classification was based on button-press events and action timestamps. Among the classifiers used, the SVM classifier with an RBF kernel exhibited the highest classification performance. Additionally, grid search optimization was employed to find the optimal parameter values for the RBF kernel.

[4] A model for a behaviour-based anomaly detection system for Android devices has been developed using machine learning. The goal of this system is to identify malware from the actions performed by mobile applications. Three machine learning algorithms were deployed in this system: K-Nearest Neighbour (KNN), Naïve Bayes, and a decision tree method. Among these algorithms, KNN demonstrated the highest accuracy in determining mobile application behaviour within the system.

[5] The researchers propose an ensemble hybrid machine learning strategy to identify additive outliers in behaviour patterns based on their spatial-temporal properties. This approach combines Multi-State Long Short-Term Memory (LSTM) and Convolutional Neural Networks for time series anomaly detection. By experimenting, they found that Multistate LSTM outperforms a single-state basic LSTM model. To evaluate its effectiveness, the model is trained on publicly available datasets for insider threats. The results demonstrate the success of the proposed model with Multistate LSTM in detecting insider threats, achieving high Area Under the Curve scores of 0.9042 on the training data and 0.9047 on the test data, indicating its accuracy in identifying anomalous behaviour patterns related to insider threats.

[6] The paper focuses on evaluating user behaviour in a distributed computing environment using machine learning algorithms. The primary goal is to distinguish closely related user groups that exhibit similar behaviour patterns. The researchers record and store behaviour-related events in a database for analysis. Three machine learning algorithms, namely K-Nearest Neighbour (KNN), Naive Bayes, and a decision tree method, were employed. The evaluation shows that the decision tree method provides higher accuracy than the other two algorithms, making it better at discriminating between closely related user groups based on their behaviour patterns.

[7] This study introduces a concept called multi-dimensional semantic activity space, where user behaviour features are aggregated and represented as vectors. By analysing action data from log files across different subsystems in specific domains, the researchers identify user behaviour patterns. Experimental results strongly support the efficiency of the proposed approach in detecting variations in typical behavioural characteristics of participants across different domains. These variations include patterns related to resource access, operational tasks, and performance evaluation. Overall, the study demonstrates the potential of this approach to gain meaningful insights into user behaviour across various domains.

[8] In this paper, researchers employ Big Data Analytics and Machine Learning algorithms to predict whether users are legitimate or malicious based on the Application Layer logs generated by their browsing patterns. The system processes real-time data sourced from their internet-launched application, involving over 10 million lines of application-layer logs. Among the machine learning algorithms used, the Random Forest algorithm achieves the highest accuracy in distinguishing between legitimate and malicious users based on the browsing patterns recorded in the Application Layer logs. The findings emphasize the effectiveness of using Big Data Analytics and the Random Forest algorithm to detect potential malicious activities and classify user behaviour.

III. METHODOLOGY

[Figure: block diagram with the stages Dataset, Preprocessing, Knowledge Base, Clustering, Classifier, Identify User Behavior]
Fig. 1. A comprehensive framework designed for analysing user behaviour.

Figure 1 shows the comprehensive framework for analysis of user behaviour patterns on the web. It consists of five phases:
• Weblogs
• Pre-processing
• Knowledge base
• Clustering
• Classifier

The data was gathered from the Kaggle website. The dataset consists of web logs: records or files that capture and preserve information about the activities and interactions that take place on a website. Web servers automatically produce web logs, also known as "web server logs" or "access logs," as users access and engage with the pages and resources stored on the server.
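To make the notion of an access log concrete, the following is a minimal sketch of turning one raw log line into named fields (IP address, timestamp, status code, bytes sent). The Common Log Format layout, the sample line, and the helper name parse_log_line are assumptions made for illustration; the Kaggle dataset used in this paper is already tabular and is not in this format.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format
#   host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the size field means that no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

# Invented sample line (documentation IP range), for illustration only.
sample = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["ip"], record["status"], record["size"])
```

Lines that do not match the assumed layout return None, so a cleaning step can count and discard them rather than fail.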

Pre-processing is applied to the dataset. In machine learning, pre-processing refers to the actions taken to prepare and transform raw data into a format that can be used efficiently by machine learning algorithms. The effectiveness of machine learning models is greatly influenced by the quality and relevance of the data; pre-processing is therefore a vital component of the overall data preparation process.

A knowledge base is a centralised database or repository used to hold structured and organised knowledge or data about a certain field. It is a useful tool for gathering, keeping, and disseminating knowledge within a company or for general accessibility. Businesses, educational institutions, customer support teams, or any other entity that deals with a significant volume of information can develop and use a knowledge base.

The clustering technique involves grouping related data points based on their intrinsic patterns or similarities. It is an unsupervised learning technique; therefore, no labels or predetermined classes are necessary.

In machine learning, a classifier is a model or algorithm that discovers patterns and relationships using labelled training data in order to make predictions or assign class labels to unseen or unlabelled data points. Since it uses supervised learning, a labelled dataset with input characteristics and corresponding target labels is necessary.

In this part we discuss the methods we used in each step of our analysis.

A. Collection of datasets
The Weblog dataset was obtained from the Kaggle website. It contains the following fields:
• inter_api_access_duration(sec)
• api_access_uniqueness
• sequence_length(count)
• vsession_duration(min)
• num_sessions
• num_users
• num_unique_apis
• source
• classification

B. Cleaning the dataset
The data was cleaned and prepared for plotting and modelling. First, extra column labels such as “_ID” were removed from the dataset so that the graphs could be plotted. Then, using Python, column labels such as “Sl.no” were removed before training the model. The dataset was split into a training set (70%) and a testing set (30%).

C. Proposed work using different techniques
In this research paper, we employ machine learning techniques to compare different algorithms and find the best fit for our dataset. We use two measures, Root Mean Squared Error (RMSE) and R-squared (R2), to assess each model's performance in fitting the dataset. Additionally, we implement a machine learning model using the Random Forest Classifier. This model predicts whether a behavior is normal or abnormal based on user inputs.

Root Mean Squared Error (RMSE) is a statistical metric that quantifies the average distance between the predicted and observed values in the dataset. A lower RMSE indicates a better fit of the model to the data. RMSE is calculated as the square root of the mean squared error between the actual and predicted values.

R-squared (R2) is another metric, used to evaluate how well the model's predictor variables account for the variance in the response variable. This value typically ranges between 0 and 1, with a higher R2 score indicating a better fit of the model.

The relationship between the predictor variables and the response variable is established through models, and the process of fitting the model involves determining how effectively it can predict the value of the response variable based on the predictor variables.

Throughout the study, we apply various classification algorithms to the dataset to assess their performance. By analyzing the results, we aim to identify the best-performing algorithm for our specific dataset and use it to predict whether a behavior is normal or abnormal based on user inputs.

The models applied in this study are:
1. Logistic Regression
2. Random Forest
3. Decision Tree
4. AdaBoost
5. Gradient Boost
6. KNN
7. Voting Classifier
8. Light GBM

The RMSE and R2 metrics were used with these models to evaluate how well each model fits the dataset.

IV. RESULT AND ANALYSIS

Algorithms Applied:

A. Logistic Regression
Logistic regression is a statistical method used for binary classification tasks, where the goal is to predict the probability that an example belongs to one of two classes (typically labeled as 0 and 1). It employs a logistic function to model the connection between the input features (independent variables or predictors) and the probability of the binary outcome.

[Figure: bar chart of Logistic Regression metrics: R-squared 0.91, RMSE 0.14, accuracy 0.97]
Fig. 2. Graphical representation of metrics of Logistic Regression
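The logistic function and the two fit metrics defined in this section can be sketched in a few lines. This is a minimal illustration using only the standard library, not the authors' code: the scores and labels are invented, and the 0.5 decision threshold is an assumption.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r2_score(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical linear scores from an already fitted logistic model (weights not shown).
scores = [-3.0, -1.5, -0.2, 0.4, 2.0, 3.5, -2.2, 1.1, 0.3, -0.8]
y_true = [0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
# Assumed decision rule: predict class 1 when the probability is at least 0.5.
y_pred = [1 if sigmoid(s) >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2:", r2_score(y_true, y_pred))
```

When RMSE is computed on hard 0/1 predictions, as in this sketch, its square equals the misclassification rate (1 minus accuracy), which gives a quick consistency check between the two figures.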

In Figure 2, the Logistic Regression model achieves an R-squared value of 0.91, indicating high explanatory power: 91% of the variance is explained. Additionally, the model exhibits a low RMSE of 0.14, which reflects accurate predictions, making it a well-performing model.

B. Decision Tree
A decision tree classifier is a specific type of decision tree method used for classification tasks. It falls under the category of supervised learning and is employed to predict categorical class labels for examples based on their characteristics. The main aim is to divide the data into the purest possible subsets, that is, to minimize impurity (e.g., Gini impurity or entropy), by selecting the best feature and corresponding threshold at each node. This process is repeated at each node until certain stopping criteria are met, such as reaching a maximum depth, having a minimum number of samples in a node, or all instances in a node belonging to the same class.

[Figure: bar chart of Decision Tree metrics: R-squared 1, RMSE 0, accuracy 1]
Fig. 3. Graphical representation of metrics of Decision Tree

In Figure 3, the graphical representation of metrics for the Decision Tree indicates that it achieves an RMSE of 0, implying a perfect match between its predictions and the actual values in the dataset. Furthermore, the R2 value of 1 signifies that the Decision Tree model fits the data perfectly, explaining 100% of the variance in the target variable using the provided features, with no unexplained variance.

C. Random Forest
Random Forest is a model that combines multiple decision trees to create a more robust and accurate classifier. It is widely used for both classification and regression tasks in machine learning.

The Random Forest classifier works by training many decision trees and then aggregating their predictions to make the final classification. Each decision tree in the forest is trained on a random subset of the training data. This randomization helps reduce overfitting and improves the generalization of the model.

As shown in Fig. 4, the Random Forest model achieves an R-squared value of 0.99 and an RMSE of 0.44. A model that has high explanatory power (99% of the variance is explained) and makes accurate predictions (low RMSE) has the desirable characteristics of a well-performing model.

[Figure: bar chart of Random Forest metrics: R-squared 0.99, RMSE 0.44, accuracy 0.99]
Fig. 4. Graphical representation of metrics of Random Forest

D. AdaBoost
AdaBoost (Adaptive Boosting) is another prominent ensemble learning method that may be used for classification and regression applications. AdaBoost, like Random Forest, is intended to enhance the performance of weak learners (usually decision trees) by combining them into a stronger, more accurate model.

The core principle behind AdaBoost is to concentrate on instances in the training data that are difficult to classify correctly. It iteratively trains a set of weak learners, putting increasing weight on misclassified examples with each iteration. Weights are also allocated to the weak learners based on their accuracy, and the final model is built by combining their weighted predictions.

[Figure: bar chart of AdaBoost metrics: R-squared 0.86, RMSE 0.18, accuracy 0.96]
Fig. 5. Graphical representation of metrics of AdaBoost

As shown in Fig. 5, an R2 score of 0.86 and an RMSE of 0.18 for the AdaBoost model are positive indicators of a well-performing model, suggesting that the model is both explaining a significant amount of variance in the target variable and making accurate predictions.
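The reweighting idea behind AdaBoost can be illustrated with a single boosting round. The sketch below follows the standard discrete AdaBoost update for labels in {-1, +1}; the labels and the weak learner's predictions are invented, and the paper does not state which AdaBoost variant or library was used.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    `weights` must sum to 1. Returns (alpha, new_weights): alpha is the vote
    weight of the weak learner, new_weights are the re-normalised example
    weights for the next round.
    """
    # Weighted error rate of the weak learner under the current weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction and -1 for a mistake, so mistakes
    # are scaled up by exp(alpha) and correct examples down by exp(-alpha).
    scaled = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]  # hypothetical weak learner with one mistake (index 1)
weights = [1 / len(y_true)] * len(y_true)

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print("learner weight alpha:", alpha)
print("updated example weights:", new_weights)
```

After the update, the single misclassified example carries half of the total weight, so the next weak learner is pushed to get it right.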

E. Gradient Boost
Gradient Boosting, like AdaBoost, combines numerous weak learners (often decision trees) to build a powerful and accurate classifier. The essential distinction is in how it constructs the ensemble of weak learners.

Gradient Boosting constructs the ensemble in a stepwise fashion, with each weak learner trained to correct the errors made by the prior learners. It applies a gradient descent optimization approach to minimize the loss function, which measures the difference between the actual and predicted class labels.

[Figure: bar chart of Gradient Boost metrics: R-squared 0.69, RMSE 0.27, accuracy 0.92]
Fig. 6. Graphical representation of metrics of Gradient Boosting

The model achieves an R-squared score of 0.69, indicating that approximately 69% of the variance in the target variable is accounted for by the model. Additionally, the RMSE of 0.27 suggests that, on average, the model's predictions deviate from the actual values by approximately 0.27 units. In summary, these values suggest that the Gradient Boosting model performs moderately well in explaining the variance and makes reasonably accurate predictions.

F. Light GBM
Light GBM is a specialized implementation of gradient boosting, used here for classification. Its primary objective is to build a predictive model capable of classifying data examples into multiple groups or classes based on their respective attributes.

Light GBM utilizes gradient boosting, which is a widely used ensemble learning technique.

[Figure: bar chart of Light GBM metrics: R-squared 0.205, RMSE 0.43, accuracy 0.8]
Fig. 7. Graphical representation of metrics of Light GBM

In the evaluation of the Light GBM model, the R-squared score of 0.205 indicates that it may not effectively capture the underlying relationships within the data. This suggests that the model's performance is lower than expected for a well-performing model. Additionally, the RMSE value of 0.43 for Light GBM signifies that, on average, its predictions deviate from the actual values (ground truth) by approximately 0.43 units.

G. KNN
K-Nearest Neighbors (KNN) is a straightforward and intuitive classification algorithm suitable for both multi-class and binary classification tasks. It belongs to the category of non-parametric, lazy learning algorithms: it does not fit an explicit model during training. Instead, KNN memorizes the entire training dataset and makes predictions based on the proximity of new data points to their k nearest neighbors in the training data.

[Figure: bar chart of KNN metrics: R-squared 0.82, RMSE 0.2, accuracy 0.95]
Fig. 8. Graphical representation of metrics of KNN

In Figure 8, the graphical representation of metrics for the KNN model reveals an R-squared score of 0.82 and an RMSE of 0.2. These results indicate that the model demonstrates reasonably good predictive performance. The R-squared score signifies that the model captures a substantial portion of the variance in the target variable, while the low RMSE suggests that the model's predictions are accurate and exhibit minimal errors on average.
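The "memorize, then vote among the nearest neighbours" behaviour of KNN can be sketched without any library. The two-feature data points, the class names, and the choice of k = 3 with Euclidean distance are assumptions made for illustration; the paper does not report the value of k or the distance metric it used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance; all work happens at query time.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Invented two-feature training data, for illustration only.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
train_y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.95, 0.85)))
```

Because every prediction scans the whole training set, the cost per query grows with the dataset size, which is the usual trade-off of lazy learners.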

H. Voting Classifier
The Voting Classifier combines predictions from multiple individual classifiers, also known as base or component classifiers, to make the final prediction. The primary concept behind the Voting Classifier is to harness the collective knowledge of multiple classifiers, utilizing their respective strengths and compensating for their individual weaknesses. This approach often leads to improved overall performance and more robust predictions compared with using a single classifier.

The R2 score of 0.205 suggests that the Voting Classifier model is not capturing the underlying relationships in the data well. It is underperforming compared to what is expected from a good model.

[Figure: bar chart of Voting Classifier metrics: R-squared 0.205, RMSE 0.43, accuracy 0.8]
Fig. 9. Graphical representation of metrics of Voting Classifier

I. Comparison of metrics of all Algorithms

TABLE 1. COMPARISON TABLE FOR METRICS OF ALGORITHMS

Algorithm            R2 Score  RMSE  Accuracy
Logistic Regression  0.91      0.14  0.97
Random Forest        0.99      0.44  0.99
Decision Tree        1         0     1
AdaBoost             0.86      0.18  0.96
Gradient Boost       0.69      0.27  0.92
Light GBM            0.205     0.43  0.8
KNN                  0.82      0.2   0.95
Voting Classifier    0.205     0.43  0.8

[Figure: stacked bar chart comparing R2 score, RMSE and accuracy for all eight algorithms; values as in Table 1]
Fig. 10. Graphical representation of metrics of all algorithms

As shown in Table 1 and Fig. 10, Random Forest, Logistic Regression and Decision Tree are the best-fitting models for our dataset, since they have high R2 scores and low RMSE values. When a model has both a high R-squared score and a low RMSE value, it suggests that the model is a good fit to the data, explains a significant proportion of the variance in the dependent variable, and provides accurate predictions.

We have implemented a Random Forest classifier for predicting behavior based on some input parameters; it predicts whether the behavior of the user is malicious or normal. A function is defined for this purpose. The function starts by collecting input values from the user. Each input corresponds to a specific feature for the prediction. The code then reads a dataset from a CSV file and separates the features (X) from the target variable (y). Next, it splits the data into training and testing sets. The training set is used for training the model.

The user input values are converted into a list named x_text and then transformed into an array, to match the shape expected by the classifier.

Based on the predicted result, a message (msg) is assigned either "Normal Behavior" or "Abnormal Behavior".

Fig. 11. Normal Behavior Prediction
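The paper does not show the code of this prediction function, so the following is only a sketch of the flow it describes: train on labelled rows, wrap the user's values in a one-row 2-D list, predict, and map the result to a message. To stay dependency-free it uses a deliberately tiny stand-in for a Random Forest (bootstrap samples, one randomly chosen feature per tree, depth-1 trees). The synthetic data, the 0 = normal / 1 = abnormal encoding, and all function names are assumptions; the CSV reading and the train/test split are omitted.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature):
    """Depth-1 tree on one feature: choose the threshold with the lowest weighted Gini."""
    values = sorted(set(row[feature] for row in X))
    majority = Counter(y).most_common(1)[0][0]
    best = (float("inf"), None, majority, majority)  # impurity, threshold, left, right
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if impurity < best[0]:
            best = (impurity, threshold,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return (feature,) + best[1:]

def fit_forest(X, y, n_trees=15, seed=0):
    """Bootstrap sampling plus a random feature per tree: a minimal Random Forest."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample (with replacement)
        feature = rng.randrange(len(X[0]))         # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_behavior(forest, user_values):
    """Majority vote over the trees, mapped to the message strings used in the paper."""
    x_text = [list(user_values)]  # 2-D shape: one row, one column per feature
    votes = []
    for feature, threshold, left_label, right_label in forest:
        if threshold is None:     # bootstrap sample had a single distinct value
            votes.append(left_label)
        else:
            votes.append(left_label if x_text[0][feature] <= threshold else right_label)
    predicted = Counter(votes).most_common(1)[0][0]
    return "Normal Behavior" if predicted == 0 else "Abnormal Behavior"

# Synthetic, well-separated data: label 0 (assumed normal) and label 1 (assumed abnormal).
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(20)]
X += [[data_rng.uniform(5, 6), data_rng.uniform(5, 6)] for _ in range(20)]
y = [0] * 20 + [1] * 20

forest = fit_forest(X, y)
print(predict_behavior(forest, [0.5, 0.5]))
print(predict_behavior(forest, [5.5, 5.5]))
```

In practice a library implementation with deeper trees would replace fit_forest; the point of the sketch is the input shaping and the mapping from predicted class to message.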

Fig. 12. Abnormal Behavior Prediction

As shown in Fig. 11 and Fig. 12, based on the input weblog values from the user, the Random Forest classifier model predicts whether the user behavior is normal or malicious.

V. CONCLUSION

In summary, the dataset used in this study was sourced from the Kaggle website. The data underwent preparation and preprocessing, and various ML algorithms were applied to generate graphs and calculate R-squared scores and RMSE for comparison. The utilization of machine learning techniques allowed for accurate analysis of the dataset. Additionally, the implementation of the Random Forest classifier model enabled the prediction of whether a user's behavior is normal or abnormal.

REFERENCES

[1] L. Adilova, L. Natious, S. Chen, O. Thonnard and M. Kamp, "System Misuse Detection Via Informed Behavior Clustering and Modeling," 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Portland, OR, USA, 2019, pp. 15-23, doi: 10.1109/DSN-W.2019.00011.
[2] Ashwini and K. Viswavardhan Reddy, "Predicting the User Behavior Analysis using Machine Learning Algorithms," International Research Journal of Engineering and Technology.
[3] S. Krishnamoorthy, L. Rueda, S. Saad and H. Elmiligi, "Identification of User Behavioral Biometrics for Authentication using Keystroke Dynamics and Machine Learning," 2018.
[4] S. Vanjire and M. Lakshmi, "Behavior-Based Malware Detection System Approach For Mobile Security Using Machine Learning," 2021 International Conference on Artificial Intelligence and Machine Vision (AIMV), Gandhinagar, India, 2021, pp. 1-4, doi: 10.1109/AIMV53313.2021.9671009.
[5] M. Singh, B. M. Mehtre and S. Sangeetha, "User Behavior Profiling using Ensemble Approach for Insider Threat," 2019.
[6] M. Callara and P. Wira, "User Behavior Analysis with Machine Learning Techniques in Cloud Computing Architecture," 2018.
[7] Y. Tao, S. Guo, C. Shi and D. Chu, "User Behavior Analysis by Cross-Domain Log Data Fusion," in IEEE Access, vol. 8, pp. 400-406, 2020, doi: 10.1109/ACCESS.2019.2961769.
[8] R. Ranjan and S. S. Kumar, "User behavior analysis using data analytics and machine learning to predict malicious user versus legitimate," vol. 2, no. 1, Mar. 2022, 100034.
[9] D. F. Galletta, R. Henry, S. McCoy and P. Polak, "When the Wait Isn't So Bad: The Interacting Effects of Website Delay, Familiarity, and Breadth," Information Systems Research, vol. 17, no. 1, pp. 20-37, 2006.
[10] J. Palmer, "Web Site Usability, Design, and Performance Metrics," Information Systems Research, vol. 13, no. 2, pp. 151-167, 2002.
[11] Y. Chen and W. Liu, "The Sentiment Attitude of Weibo Users towards Annual Individual Income Tax Return: Based on Natural Language Processing and Machine Learning Methods," 2023 IEEE 6th International Conference on Big Data and Artificial Intelligence (BDAI), Jiaxing, China, 2023, pp. 67-72, doi: 10.1109/BDAI59165.2023.10256913.
[12] A. Saleem Raja, B. Sundarvadivazhagan, R. Vijayarangan and S. Veeramani, "Malicious Webpage Classification Based on Web Content Features using Machine Learning and Deep Learning," 2022 International Conference on Green Energy, Computing and Sustainable Technology (GECOST), Miri Sarawak, Malaysia, 2022, pp. 314-319, doi: 10.1109/GECOST55694.2022.10010386.
[13] S. Dong, Y. Xia and T. Peng, "Network Abnormal Traffic Detection Model Based on Semi-Supervised Deep Reinforcement Learning," in IEEE Transactions on Network and Service Management, vol. 18, no. 4, pp. 4197-4212, Dec. 2021, doi: 10.1109/TNSM.2021.3120804.
[14] M. S. Ashraf, F. Rehman, H. Sharif, M. Aqeel, M. Arslan and A. Rida, "Spam Consumer's Reviews Detection for E-Commerce Website using Linguistic Approach in Deep Learning," 2022 3rd International Conference on Innovations in Computer Science & Software Engineering (ICONICS), Karachi, Pakistan, 2022, pp. 1-7, doi: 10.1109/ICONICS56716.2022.10100351.
[15] C. H. Sumanth, P. P. Kalyan, B. Ravi and S. Balasubramani, "Analysis of Credit Card Fraud Detection using Machine Learning Techniques," 2022 7th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 2022, pp. 1140-1144, doi: 10.1109/ICCES54183.2022.9835751.
