Ieee Edited
Web Analytics
Mr. V. Pavan Kumar
Assistant Professor
Department of Computer Science and Engineering
B.V. Raju Institute of Technology
Affiliated to JNTUH
Vishnupur, Narsapur, Medak, Telangana State, India
[email protected]

Surendra Duvvapu
Department of Computer Science and Engineering
B.V. Raju Institute of Technology
Affiliated to JNTUH
Vishnupur, Narsapur, Medak, Telangana State, India
[email protected]

Sri Hasa Reddy Edula
Department of Computer Science and Engineering
B.V. Raju Institute of Technology
Affiliated to JNTUH
Vishnupur, Narsapur, Medak, Telangana State, India
[email protected]
Abstract— The increase in terrorism around the world is a major concern for global security. Advancements in technology have enabled terrorist organizations to use computer networks to spread their propaganda through videos and speeches. These groups also use the internet to recruit new members and engage in illegal activities. To combat these threats, web mining and data mining techniques are used in combination to extract relevant information from various sources. Web mining employs various techniques, such as citation mining, to extract relevant news from unstructured data. One common technique is idea mining, which is used to locate patterns, keywords, and relevant facts in disorganized texts. Text mining also utilizes data and web mining techniques to extract relevant information for analysis. By leveraging these techniques, security experts can gain insights into potential threats and inform policy development to counteract the spread of terrorism through online channels.

Keywords— web excavating, web mining, data mining, text mining.

I. INTRODUCTION

To lower the prevalence of hazardous websites connected to the internet, we must devise a scheme that can identify phrases on a site. If the targeted terms are discovered on a webpage, its content will be marked as unfit.

In addition to using text mining methods to identify hazardous websites, it is also important to promote digital literacy and cybersecurity education among internet users. Many people are unaware of the risks associated with online activity and are therefore more vulnerable to scams, phishing attacks, and other forms of cybercrime. By providing education and resources on safe internet use, we can help individuals protect themselves and their personal information from malicious actors.

There are various machine learning algorithms that can be used for text mining and classification, including Random Forest, Decision Tree, Naive Bayes, Logistic Regression, k-Nearest Neighbors, and Gradient Boosting. Random Forest is an ensemble learning method that combines multiple decision trees to improve the accuracy of the predictions. It can be used to classify text into different categories based on the extracted features. Decision Tree is a supervised learning algorithm that builds a tree-like model of decisions and their possible consequences. It can be used to classify text based on a set of rules and criteria. Naive Bayes is a probabilistic algorithm that calculates the likelihood of a particular category based on the occurrence of certain features in the text. It can be used to classify text into different categories based on the extracted features. Logistic Regression is a statistical algorithm that models the relationship between a dependent variable and one or more independent variables. It can be used to classify text based on the extracted features.

K-nearest neighbors (KNN) is a non-parametric algorithm used for classification and regression. It compares a new data point to its k nearest neighbors in the training data to determine its label or value. Steps include selecting k, calculating distances, and identifying the nearest neighbors. For classification, the majority vote determines the label; for regression, the average predicts the value. Important considerations include choosing an appropriate k, feature scaling, and the lack of an explicit decision boundary. Despite its simplicity, KNN is effective in certain scenarios, offering a reliable approach to classification and regression tasks.

The decision tree algorithm is a popular supervised learning method that creates a tree-like model to make predictions. It recursively splits the training data based on feature conditions to build a hierarchical structure of decision nodes and leaf nodes. Each decision node represents a feature and a condition, while each leaf node represents a predicted class or value. The splits are chosen to maximize information gain or minimize impurity, aiming to create homogeneous subsets. The resulting tree can be used to classify new instances or make regression predictions based on the learned decision rules. Decision trees are interpretable, versatile, and capable of handling both numerical and categorical data.

The gradient boosting algorithm is a powerful ensemble learning method that combines multiple weak predictive models to create a strong predictive model. It works by iteratively adding new models to correct the errors made by the previous models. At each iteration, a new model is trained to predict the residuals of the previous models. The models are typically decision trees, and the learning process is guided by gradient descent optimization. By combining the predictions of all the models, the algorithm gradually improves the overall accuracy and reduces the bias, providing high predictive performance and handling complex datasets.
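As an illustration of the k-nearest-neighbors steps described in this section (select k, calculate distances, identify the nearest neighbors, take a majority vote), the following is a minimal sketch in plain Python. It is not the implementation used in this work: the two-dimensional feature vectors, the labels "flagged" and "benign", and the function names are hypothetical and chosen only to make the example runnable.

```python
import math
from collections import Counter

# Hypothetical toy training set: each point is a 2-feature vector
# (for example, two scaled keyword frequencies) with an assumed label.
TRAIN_POINTS = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0),
                (0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
TRAIN_LABELS = ["flagged", "flagged", "flagged",
                "benign", "benign", "benign"]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Step 1: compute the distance from the query to every training point.
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Step 2: keep the labels of the k nearest neighbors.
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: majority vote decides the predicted label.
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify(TRAIN_POINTS, TRAIN_LABELS, (0.85, 0.85), k=3))
```

Because KNN relies on raw distances, features on very different scales should be normalized first, which is the feature-scaling consideration noted in the text.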
Another important step towards a safer online environment is to improve the security of websites themselves. Website owners should implement secure protocols such as HTTPS and use strong passwords to prevent unauthorized access. Regular security audits and updates should also be conducted to identify and address any vulnerabilities. Ultimately, creating a safer internet requires a collaborative effort between governments, cybersecurity professionals, website owners, and individual users. By working together and implementing effective measures to combat online threats, we can ensure a more secure and trustworthy digital landscape for everyone.

Anti-terrorism and cybersecurity response agencies should benefit from this system. The technology may assist law enforcement in following terrorists' methods and, within financial means, in identifying websites built on various platforms.

II. LITERATURE REVIEW

[1] Guler, E. R., & Ozdemir, S. emphasize the utilization of big data and streaming data to enhance the analysis and intelligence gathering process for combating terrorism. By leveraging large-scale datasets and real-time information, the aim is to identify patterns and trends associated with terrorist activities, improving counterterrorism effectiveness.

[2] Chung, W., & Tang, W. note that the recent decade has witnessed a rapid growth in domestic terrorism, with the internet playing a significant role in fueling its expansion. Extensive studies have been conducted worldwide to analyze and understand various cases of domestic terrorism and its connection to online platforms.

[3] Naseema Begum et al. classified web pages into various categories and sorted them appropriately. Their system uses two techniques: data mining and web mining.

[4] Thongtae, P., & Srisuk, S. recognized that data mining has emerged as a significant field over the past decade, offering valuable contributions to a variety of tasks, including the identification and control of terrorist activity.

[5] Chen, H. et al. used the features of data mining to extract the words of a web page, classify them, and assign a score to each word in "Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums."

[6] T. Anand, S. Padmapriya, & E. Kirubakaran describe an LSTM model that takes input from the output of a CNN model to capture sequential correlation in a document. The model considers previous data to capture global dependencies of a sentence. The goal is to classify tweets into extremist and non-extremist categories.

[7] Goradia, R., Mohite, S., Jhakhariya, A., & Pinjarkar, V. proposed an efficient web data mining system to detect such web data properties and flag them for further human review. The primary goal of the system is to develop a website where users can check any website for any trace of terrorist activity.

[8] T. Anand et al. used data mining and web mining together for efficient system development. Their system tracks web pages that are more susceptible to terrorism and reports the IP address to the user of the system.

PROPOSED SYSTEM

Counterterrorism Intelligence Gathering Web Analytics addresses the need to effectively identify and track online terrorist activity, as well as to analyze and interpret large amounts of unstructured data found on the internet. This requires the use of advanced web analytics techniques to identify and monitor hazardous websites and other online content, and to extract relevant information and patterns from this data. The challenge is to develop a system that can accurately and efficiently identify potential threats and alert law enforcement and other relevant agencies in a timely manner. Furthermore, the system must also be able to adapt to new and emerging threats and technologies, ensuring that it remains effective and relevant in an ever-changing digital landscape.

The architecture for Counterterrorism Intelligence Gathering Web Analytics includes multiple components such as web scraping, data preprocessing, machine learning algorithms, and data visualization. Data is collected from various sources such as social media, news articles, and government databases. Preprocessing is performed to clean and transform the data, followed by applying machine learning algorithms like Naive Bayes, Random Forest, and Logistic Regression for classification and prediction. The results are then visualized using various charts, graphs, and maps to aid in decision-making and intelligence gathering.

We propose a technique whose primary goal is to create a website where users may search any page or website for any evidence of terrorist activity. To accomplish this, our website will give users the option to enter the URL of the pages they wish to scan. When a URL is entered, our technology will count the words on the entire webpage and compare them to the words in our database. We will assign a score to every word that we keep in our database.

Our technology will retrieve from the database the scores for each word that appears in the user's web page before calculating the website's overall rank. This ranking will assess whether the user's website has any indication of terrorism. Our system searches the disorganized text of a webpage for patterns, keywords, and relevant information using web mining and data mining approaches.

The implementation of the Counterterrorism Intelligence Gathering Web Analytics involves several steps. First, data is collected from various sources, such as social media platforms and other web-based platforms. This data is then pre-processed to remove noise and irrelevant information. Next, machine learning algorithms such as Random Forest, Decision Tree, Naive Bayes, Logistic Regression, and K-Nearest Neighbors are used to analyze the data and identify patterns related to terrorism. Finally, the results of the analysis are visualized using tools such as dashboards and reports, which can be used by analysts to make informed decisions and act against potential terrorist threats.

A. Machine Learning algorithms

Random Forest:

Random forest algorithms can classify and identify relevant data patterns and features in large and complex datasets. By using an ensemble of decision trees, the algorithm can create a robust and accurate model for predicting the likelihood of certain events or
outcomes based on a set of input variables. In the context of Counterterrorism Intelligence Gathering Web Analytics, this could be used to identify and flag potential threats or suspicious activity based on various data sources, such as social media, online forums, or other publicly available information.

Decision Tree:

A decision tree is a graph that resembles a tree, where the nodes represent the points where we select an attribute and ask a question, the edges represent the answers, and the leaves represent the final output or class value. Each split is a simple linear test on a single attribute, yet the combination of splits yields non-linear decisions.

The decision tree algorithm is used in the classification of web pages based on their content, identifying whether they are relevant to counterterrorism intelligence gathering or not. It helps to automate the process of identifying potentially useful web pages and reduces the need for manual screening, saving time and increasing efficiency.

Naïve Bayes:

Naïve Bayes can help to identify suspicious activities or communications by analyzing the language used in online messages, emails, or social media posts. The algorithm can calculate the probability of certain keywords or phrases being associated with terrorist activities, which can aid in the detection of potential threats. The Naïve Bayes algorithm can also be used to filter out irrelevant information and focus on relevant data to improve the accuracy of the analysis.

Logistic Regression:

Logistic regression is a method of prediction. We use logistic regression to describe the data and show the link between a single dependent binary variable and one or more independent nominal, ordinal, interval, or ratio-level variables.

In the context of counterterrorism intelligence gathering, this algorithm can be used to classify data into different categories such as suspicious websites, potential threats, and so on. It can also help in identifying key factors that contribute to the likelihood of an event occurring, which can be useful in developing counterterrorism strategies.

K-nearest Neighbors:

K-nearest neighbors is a straightforward method that stores all of the existing cases and categorizes new cases based on a similarity measure (e.g., a distance function or cosine similarity). KNN is a non-parametric technique that has been utilized in statistical estimation and pattern recognition since the early 1970s.

Gradient Boosting:

The Gradient Boosting algorithm is applied in "Counterterrorism Intelligence Gathering through Web Analytics" to analyze web data, identify patterns, and classify extremist accounts, activities, and propaganda. It handles high-dimensional data and missing values, and it iteratively improves model performance. It aids in predicting and detecting potential threats in online environments for counterterrorism efforts.

[Figure: flowchart with the following nodes, top to bottom: User, Check Unauthorize, Home Page, Open Detect Page, Upload URL link, Scan Url, Detect Terrorism, About Page, End process]

Fig 1 Architecture Diagram for counterterrorism intelligence gathering through web analytics
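The keyword scoring described under PROPOSED SYSTEM (count the words on a page, look up each word's score in a database, and combine the scores into an overall rank) can be illustrated with a minimal sketch. The paper does not specify the score table, the aggregation rule, or the decision threshold, so the dictionary `KEYWORD_SCORES`, the weighted-sum rule, the threshold of 5.0, and all function names below are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical score table standing in for the word database;
# the actual words and scores are not given in the paper.
KEYWORD_SCORES = {"attack": 3.0, "recruit": 2.0, "propaganda": 2.5}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def page_score(text, scores=KEYWORD_SCORES):
    """Sum score * occurrence count over every scored word found in the text."""
    counts = Counter(tokenize(text))
    return sum(scores.get(word, 0.0) * n for word, n in counts.items())


def is_flagged(text, threshold=5.0, scores=KEYWORD_SCORES):
    """Flag the page when its overall score reaches the assumed threshold."""
    return page_score(text, scores) >= threshold


if __name__ == "__main__":
    sample = "Attack! They recruit and attack."
    print(page_score(sample), is_flagged(sample))
```

A plain weighted sum favors long pages, so a real system would likely normalize by page length or combine this score with one of the classifiers described above; that choice is left open here.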
Data Set and Training

Collecting a comprehensive dataset for "Counterterrorism Intelligence Gathering through Web Analytics" involves gathering information from various sources. Firstly, social media platforms such as Twitter, Facebook, and Instagram provide valuable data on extremist accounts, posts, conversations, and trends. Monitoring the dark web and encrypted platforms, known for hosting terrorist content and communication, can unveil hidden networks and activities.

Additionally, capturing data from publicly available websites affiliated with terrorist organizations, including forums and online magazines, is crucial. This includes collecting articles, videos, recruitment materials, and propaganda disseminated through these channels.

Monitoring online news sources, both mainstream and alternative, helps capture reports on terrorist activities, attacks, and emerging threats. Open-source intelligence (OSINT) platforms aggregate publicly available data from different online sources, aiding in data extraction and identification of potential connections.

Official reports from law enforcement and intelligence agencies, as well as academic research papers on terrorism-related topics, contribute to the dataset. Lastly, analyzing user-generated content such as blog posts, comments, and discussion forums can provide insights into public sentiments, recruitment strategies, and extremist narratives.

Care must be taken to ensure legal and ethical compliance, including privacy considerations and protection of sensitive information, throughout the dataset collection process.

Fig 2 Sign In page

MODULES:
• Home Page
• Open Detect Page
• Upload URL link
• Scan URL
• Detect Terrorism
• About Page

Home Page:
The home page is the website's point of entry. Home pages typically greet visitors, describe the website's content, and offer a navigation tool with links to additional websites.

Open Detect Page:
Selecting the detect page opens an application form through which the URL is scanned.

Uploading URL and Scanning the Website:
The URL can be provided either by pasting it or by typing it manually. The entire website is then scanned using the URL, and the system predicts whether it contains any terrorist activity.

Result Prediction:
Once a website has been checked for terrorist activity, we know whether it is spreading terrorism, which helps reduce the spread of terrorism in our region.

About Page:
The about page describes the methods used for detecting websites with terrorist activity and how the result is predicted.
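The "Upload URL link" and "Scan URL" modules imply two steps that the paper does not detail: checking that the submitted URL is well formed, and extracting the visible text of the page so that its words can be analyzed. The following is a minimal sketch of both steps using only the Python standard library. The function and class names are hypothetical, and the network fetch itself is deliberately left out so the example runs offline on an HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_valid_url(url):
    """Accept only http(s) URLs that include a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html):
    """Return the page's visible text as one space-separated string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    print(is_valid_url("https://example.com/page"), extract_visible_text(page))
```

The extracted text would then be passed to whichever detection step the system uses; rejecting non-HTTP schemes up front is a small safeguard for a form that accepts arbitrary user input.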