
Unit 5

Data mining techniques are used in the financial sector to analyze large amounts of financial data and extract hidden patterns to predict future market trends. This allows analysis of factors like profitability, risk assessment, fraud detection, and targeted marketing. Data mining of financial data helps organizations make better capital investment decisions, assess loan risk, detect money laundering, and classify customers for marketing. It provides efficient, effective, accurate and scalable analysis of large datasets.

Uploaded by

priyabmishralove
Copyright
© © All Rights Reserved

UNIT-5

Data Mining For Financial Data Analysis


Data mining is a powerful field for the advanced examination of data. It draws its
techniques and mechanisms from statistics and machine learning, and the information
it generates from analyzing verified data feeds business intelligence and advanced
analytics applications.
Financial data analysis is important for judging whether a business is stable and
profitable enough to justify a capital investment. Financial analysts focus their
analysis on the balance sheet, cash flow statement, and income statement.
Data mining techniques are used to extract hidden patterns and to predict future
trends and behaviors in financial markets. Advanced statistical, mathematical, and
artificial intelligence techniques are typically required for mining such data,
especially high-frequency financial data.

Benefits of Data Mining for financial data analysis:

• Efficient
• Effective
• Accurate
• Scalable
• Economical (affordable)
Data mining techniques in finance can be applied to categories such as:
• Peak sales
• Gross profit and net sales
• Stockpile
Instances/Examples:
• Banks and credit card companies build financial risk models using data mining
tools.
• Data mining also plays an important role in marketing and in other financial
applications such as fraud detection.
Data mining can help in the following fields:
• Detection of money laundering and other financial crimes: Money laundering
is the criminal activity of disguising illegally obtained ("black") money as
legitimate ("white") money. Data mining approaches have been developed that are
well suited to identifying it: they examine the activity of bank clients to flag
patterns that may indicate money laundering, in support of anti-money-laundering
checks.
• Loan payment prediction and customer credit policy analysis: Loan
distribution is a fundamental part of every bank's business. A loan prediction
system automatically weighs the features it uses (the attributes of a loan
application) and tests new data against those weights. Data mining models help
manage the vital data and the large databases that this requires.
• Classification and clustering of customers for targeted marketing: Data
mining and marketing work together to target a specific market and to support
marketing decisions. Data mining helps a company protect its profits and margins
and decide which product best suits each kind of customer.
• Design and construction of data warehouses for multidimensional data
analysis and data mining: Organizations load their data into large data
warehouses so that data mining methods can analyze large volumes of data properly
and accurately, including very large numbers of transactions.

Data Mining for Retail and Telecommunication Industries
Data mining plays a major role in separating useful data from a mass of big data.
By analyzing patterns and peculiarities, it lets us find relationships between data
sets. Once unprocessed raw data has been turned into useful information, that
information can be applied to improve many of the fields we depend on in our
day-to-day life.
This section describes the role of data mining in the retail and telecommunication
industries.
Role of data mining in retail industries
In the dynamic, fast-growing retail industry, the consumption of goods increases
day by day, which in turn increases the amount of data collected and used. The
retail industry covers the sale of goods to customers through retailers, from a
local street stall to the big malls in cities. For example, the owner of a grocery
shop in a given area gets to know their customers after a few months of sales, and
once the owner has noted what those customers need, it becomes easier to increase
sales. The same happens in large retail businesses. They collect customers'
responses to a product, their time zone, their location, shopping cart history, and
so on. Knowing which brands and products customers prefer helps the company create
targeted ads that increase sales and profit.

Knowing the customers:

What is the purpose of sales if retailers do not know who their customers are?
Understanding customers is a definite need, and it starts with analyzing them on
various factors. Finding out how customers learned about the retail platform helps
the retailer improve its advertising and attract a completely new set of people.
Knowing the days on which customers purchase most often helps in planning discount
sales or special promotions on festival days. The time customers spend per order
provides useful statistics for growth. The amount of money spent per order lets the
retailer separate customers into groups of high-value, medium-value, and low-value
orders, which helps in targeting customers or in introducing customized packages
based on price. By knowing customers' language and payment method preferences,
retailers can provide the services needed to satisfy them. Maintaining a good
business relationship with customers earns trust and loyalty, which can quickly
increase the retailer's profit, and retaining customers helps the company withstand
competition from similar companies.

RFM Value:

RFM stands for Recency, Frequency, and Monetary value. Recency is how recently the
customer made a purchase, Frequency is how often the customer purchases, and
Monetary value is the amount the customer spends on purchases. RFM analysis can
increase revenue by helping the retailer hold on to regular and potential customers
and keep them satisfied. It can also help win back lapsing customers whose
purchases are declining. In general, the higher a customer's RFM score, the more
that customer contributes to sales growth. RFM also helps the retailer avoid
sending too many requests to customers who are already engaged, and apply new
marketing techniques to customers who order rarely. In this way RFM helps identify
where new marketing approaches are needed.
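The RFM idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase records, the reference date, and the rank-based scoring (1 = weakest, n = strongest on each dimension) are all hypothetical choices, not something the notes prescribe.

```python
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount)
purchases = [
    ("alice", date(2024, 3, 28), 120.0),
    ("alice", date(2024, 3, 10), 80.0),
    ("bob",   date(2024, 1, 5),  40.0),
    ("carol", date(2024, 3, 30), 300.0),
    ("carol", date(2024, 2, 14), 150.0),
    ("carol", date(2024, 1, 20), 50.0),
]

def rfm_table(records, today):
    """Return {customer: (recency_in_days, frequency, monetary_total)}."""
    table = {}
    for customer, day, amount in records:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the distance to the most recent purchase.
        if recency is None or days_ago < recency:
            recency = days_ago
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table

def rank_scores(values, reverse=False):
    """Give each key a score from 1 (first in sort order) to n (last)."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {key: position + 1 for position, key in enumerate(ordered)}

rfm = rfm_table(purchases, today=date(2024, 4, 1))

# A small recency is good, so sort recency in descending order (oldest buyer scores 1).
r_score = rank_scores({c: v[0] for c, v in rfm.items()}, reverse=True)
f_score = rank_scores({c: v[1] for c, v in rfm.items()})
m_score = rank_scores({c: v[2] for c, v in rfm.items()})
scores = {c: r_score[c] + f_score[c] + m_score[c] for c in rfm}

for customer in sorted(scores, key=scores.get, reverse=True):
    print(customer, rfm[customer], scores[customer])
```

Real RFM models usually bucket customers into quantiles rather than using raw ranks; the ranking here just keeps the example small.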
Market basket analysis:

Market basket analysis is a technique for studying the shopping patterns of
customers in order to increase revenue and sales. It analyzes datasets of customer
purchases: shopping history, frequently bought items, and items that tend to be
bought together as a combination.
A good example is the loyalty card a retailer issues to customers. From the
customer's point of view, the card keeps track of future discounts, incentive
criteria, and transaction history. From the retailer's point of view, the card
collects the transaction details that feed market basket analysis.
This analysis can be carried out with data science techniques and various
algorithms, but it can also be done without specialist technical skills: a
spreadsheet such as Microsoft Excel can be used to analyze customer purchases and
find frequently bought or frequently grouped items, with the rows organized by
transaction ID. The analysis helps in suggesting products that pair well with a
customer's current purchase, which leads to cross-selling and improved profits. It
also helps track the purchase rate per month or year, and shows the retailer the
right time to make offers that attract the right customers for the targeted
products.
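Finding items that are frequently bought together can be sketched as a simple pair count. The transactions below are made up, and the `support` and `confidence` helpers are illustrative names for the two standard association-rule measures; a production system would typically use a dedicated algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)

def confidence(antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print(pair_counts.most_common(3))
print(support(("bread", "butter")), confidence("bread", "butter"))
```

A high confidence for "bread then butter" is the kind of signal a retailer could use to suggest butter to a customer who has bread in the cart.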

Potent sales campaign:

Nearly everything needs advertising nowadays, because advertising tells people that
a product exists, what it is used for, and what its features are. It takes the
product from the warehouse to the real world. To attract the right customers,
however, data must be analyzed, and this is where the retailer's sales or marketing
campaign comes in. Marketing campaigns must start from the right plan; otherwise
the company may lose money by over-investing in untargeted advertisements. A sales
campaign depends on the time, the location, and the preferences of the customer.
The platform on which the campaign runs also plays a major role in drawing the
right customers in. This requires regular analysis of the sales, and the associated
data, generated on a particular platform at a particular time. Traffic on social or
network platforms shows whether the campaigned product is being received favorably.
The retailer can adjust the campaign using the earlier statistics, which increases
sales profit and prevents overspending. Understanding both the customer's benefit
and the company's profit improves the use of campaigns, and the number of sales per
campaign tells the retailer whether the campaign is worth further investment.
Efficient handling of data turns a trial-and-error method into a well-informed one.
A multi-channel sales campaign also helps in analyzing purchases and increases
revenue, profit, and the number of customers.
Role of data mining in telecommunication industries
In a rapidly evolving and competitive environment, the telecommunication industry
plays a major role in handling huge data sets of customer, network, and call data.
To thrive in such an environment, the industry must find a way to handle this data
easily, and data mining is the preferred way to improve the business and solve its
problems. The major functions include identifying fraudulent calls and spotting
defects in a network in order to isolate faults. Data mining can also make
marketing more effective. However, the industry faces challenges in handling the
sequential and temporal aspects of its data, and in predicting rare events, which
is needed to detect network faults or customer fraud in real time.

Call detail data:

Whenever a call is placed on a telecommunication network, the details of the call
are recorded: the date and time at which it starts, its duration, and the time at
which it ends. Since all of this data is collected in real time, it is ready to be
processed with data mining techniques. The data should, however, be summarized at
the customer level rather than at the level of isolated single phone calls. By
extracting the data in this way, one can find each customer's calling pattern.
Some of the data that helps to find the pattern:
• Average duration of calls
• Time at which calls take place (daytime/night-time)
• Average number of calls on weekdays
• Calls made to varied area codes
• Calls made per day, etc.
Studying these customer call details helps the business grow. If a customer makes
most calls during daytime working hours, the customer is likely to be a business.
If the night-time call rate is high, the line is probably used for residential or
domestic purposes. Frequent variation in area codes also helps single out business
customers, because people calling for residential purposes tend to call only a
limited set of area codes in a given period. Calls made in the evening, however, do
not indicate clearly whether the customer is a business or a residence.
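Summarizing call detail records at the customer level can be sketched as below. The record layout, the sample calls, and the 9:00 to 17:00 definition of "daytime" are assumptions made for the example only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call detail records:
# (customer_id, start_time, duration_in_seconds, called_area_code)
calls = [
    ("c1", datetime(2024, 3, 4, 10, 15), 300, "212"),
    ("c1", datetime(2024, 3, 4, 14, 40), 120, "415"),
    ("c1", datetime(2024, 3, 5, 11, 5), 180, "617"),
    ("c2", datetime(2024, 3, 4, 21, 30), 900, "212"),
    ("c2", datetime(2024, 3, 9, 22, 10), 600, "212"),
]

def summarize(records, day_start=9, day_end=17):
    """Collapse individual calls into one feature row per customer."""
    grouped = defaultdict(list)
    for customer, start, duration, area in records:
        grouped[customer].append((start, duration, area))

    summary = {}
    for customer, rows in grouped.items():
        daytime = sum(1 for start, _, _ in rows if day_start <= start.hour < day_end)
        summary[customer] = {
            "calls": len(rows),
            "avg_duration": sum(duration for _, duration, _ in rows) / len(rows),
            "daytime_share": daytime / len(rows),
            # weekday() is 0 for Monday through 4 for Friday
            "weekday_calls": sum(1 for start, _, _ in rows if start.weekday() < 5),
            "area_codes": len({area for _, _, area in rows}),
        }
    return summary

profile = summarize(calls)
for customer, features in profile.items():
    print(customer, features)
```

In this toy data, `c1` calls during the day to several area codes (business-like), while `c2` calls at night to a single area code (residential-like).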

Data of customers:

The telecommunication industry has an enormous number of customers, and their
details are maintained in a customer database that supports further queries in the
data mining process. For example, when a case of customer fraud is encountered,
details in the customer database such as the person's name and address help
identify the person, making it easier to trace them and resolve the issue. Some of
this data can also be obtained from external sources, because much of the
information is commonly available. The database also includes the subscription plan
chosen and the payment history. Using this dataset can help telecommunication
companies grow.

Network Data:

Telecommunication networks use highly developed, complex equipment, and every part
of the system may generate error and status messages. This leads to a large amount
of network data being processed. The data must be separated, grouped, and stored in
order so that it can support network fault isolation when the system develops a
fault. This ensures that an error or status message from any part of the network
reaches a technical specialist who can rectify the problem. Because the database is
enormous, it becomes difficult to handle problems manually when a large number of
status or error messages are generated, so the handling of some sets of errors and
messages can be automated to reduce the load. A methodical data mining approach can
manage the network system efficiently and improve its functions.

Preparing and clustering data:

Although data mining processes raw data, the data must first be put into a
meaningful, properly arranged format. In the telecommunication industry, which
deals with giant databases, this is especially important. First, conflicting and
contradictory data must be identified to avoid inconsistency. Undesired data fields
that only take up space must be removed. The data must then be organized and mapped
by finding the relationships between datasets, to avoid redundancy.
Clustering, or grouping similar data, can be done with data mining algorithms. It
helps in analyzing patterns such as calling patterns or customer behavior patterns:
groups are formed by analyzing the similarities between records. This makes the
data easier to understand, and therefore easier to manipulate and use.
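As an illustration of grouping similar records, here is a minimal k-means sketch in plain Python. The notes do not name a specific clustering algorithm, so k-means is a swapped-in example; the customer feature points (average call minutes, daytime share) and the naive "first k points" initialization are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer features: (average call minutes, share of daytime calls)
customers = [(1.0, 0.9), (8.0, 0.1), (1.5, 0.8), (9.0, 0.2), (1.2, 1.0), (7.5, 0.0)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

With this data the two groups separate cleanly into short daytime callers and long night-time callers; real data would need feature scaling and a better initialization.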

Customer profiling:

The telecommunication industry deals with customer details on a large scale. It
observes patterns in customers' call data in order to profile customers and predict
future trends. Knowing a customer's pattern lets the company decide which
promotions to offer. For example, if a customer's calls fall mostly within one area
code, a promotion aimed at that area can win over a whole group of customers. This
makes promotion spending more efficient: instead of investing in a single
subscriber, the company attracts a group of people with the right plan. Privacy
issues arise, however, when customers' call histories or details are monitored.
One of the significant problems the telecommunication industry faces is customer
churn, also called customer turnover, in which the company loses a client: the
client leaves and switches to another telecommunication company. If a company's
churn rate is high, it suffers a severe loss of revenue and profit, which slows its
growth. Data mining techniques address this by collecting customer patterns and
profiling customers. Incentive offers from one company attract the regular users of
another, so by profiling the data, churn can be forecast from customer behavior
such as subscription history and chosen plan. While collecting data from paying
customers, it is also possible to collect data about call receivers or
non-customers, but only with a set of restrictions.

Fraud detection:

Fraud is a critical problem for the telecommunication industry: it causes loss of
revenue and damages customer relations. The two major types of fraud are
subscription fraud and superimposed fraud. In subscription fraud, the offender
collects customer details, mostly from KYC (Know Your Customer) documents, such as
name, address, and ID proof. These details are used to sign up for telecom services
with apparently valid authentication but with no intention of paying for the
service used on the account. Some offenders do not stop at illegitimate use of
services but also commit bypass fraud by diverting voice traffic from local to
international protocols, which causes heavy losses to the telecommunication
company. Superimposed fraud starts with a legitimate account and legitimate
activity, but someone other than the account holder then uses the services
illegally, superimposing their activity on the account. By collecting the
behavioral pattern of the account holder, suspected superimposed fraud can be
detected and met with immediate action, such as blocking or deactivating the
account, which prevents further damage to the company.
These fraudulent activities can be reduced by using data mining techniques to
collect customer information and model customer behavior, such as the call details
described earlier. When detection is performed in real time, fraud can be
identified quickly. Detection can also be done by comparing the call behavior of a
suspected account with general fraud profiles: if the call pattern matches that of
known frauds, the account can be flagged. Collecting data at the customer level
rather than at the level of individual calls improves this process. Misclassifying
fraud can itself cause losses, so the company must know the relative cost of
letting a fraudulent call go through versus blocking a legitimate account on
suspicion of fraud. Correct use of data mining helps handle this trade-off
accurately.
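The trade-off between missed fraud and wrongly blocked accounts can be made concrete with a small cost calculation. Everything here is hypothetical: the fraud scores, the two cost figures, and the candidate thresholds are invented to show the reasoning, not taken from any real system.

```python
# Hypothetical scored accounts: (fraud_score between 0 and 1, is_actually_fraud)
scored = [
    (0.95, True), (0.80, True), (0.70, False),
    (0.40, True), (0.30, False), (0.10, False),
]

COST_MISSED_FRAUD = 500.0  # assumed loss when a fraudulent account is not blocked
COST_FALSE_BLOCK = 50.0    # assumed loss when a legitimate account is blocked

def total_cost(threshold):
    """Total cost of blocking every account whose score is >= threshold."""
    cost = 0.0
    for score, is_fraud in scored:
        blocked = score >= threshold
        if is_fraud and not blocked:
            cost += COST_MISSED_FRAUD
        elif blocked and not is_fraud:
            cost += COST_FALSE_BLOCK
    return cost

candidates = [0.2, 0.5, 0.75, 0.9]
best = min(candidates, key=total_cost)
for threshold in candidates:
    print(threshold, total_cost(threshold))
print("cheapest threshold:", best)
```

Because a missed fraud is assumed to cost ten times more than a false block, the cheapest threshold here is the most aggressive one; with different cost figures the balance would shift.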

Web Mining
Web mining is the application of data mining techniques to automatically discover
and extract information from Web documents and services. Its main purpose is to
discover useful information from the World Wide Web and its usage patterns.
Applications of Web Mining:
Web mining is the process of discovering patterns, structures, and relationships in
web data. It involves using data mining techniques to analyze web data and extract
valuable insights. The applications of web mining are wide-ranging and include:
Personalized marketing:
Web mining can be used to analyze customer behavior on websites and social media
platforms. This information can be used to create personalized marketing campaigns
that target customers based on their interests and preferences.
E-commerce:
Web mining can be used to analyze customer behavior on e-commerce websites.
This information can be used to improve the user experience and increase sales by
recommending products based on customer preferences.
Search engine optimization:
Web mining can be used to analyze search engine queries and search engine results
pages (SERPs). This information can be used to improve the visibility of websites in
search engine results and increase traffic to the website.
Fraud detection:
Web mining can be used to detect fraudulent activity on websites. This information
can be used to prevent financial fraud, identity theft, and other types of online fraud.
Sentiment analysis:
Web mining can be used to analyze social media data and extract sentiment from
posts, comments, and reviews. This information can be used to understand customer
sentiment towards products and services and make informed business decisions.

Web content analysis:


Web mining can be used to analyze web content and extract valuable information
such as keywords, topics, and themes. This information can be used to improve the
relevance of web content and optimize search engine rankings.
Customer service:
Web mining can be used to analyze customer service interactions on websites and
social media platforms. This information can be used to improve the quality of
customer service and identify areas for improvement.
Healthcare:
Web mining can be used to analyze health-related websites and extract valuable
information about diseases, treatments, and medications. This information can be
used to improve the quality of healthcare and inform medical research.
Process of Web Mining:

[Figure: Web Mining Process]

Web mining can be broadly divided into three types of mining techniques: Web Content
Mining, Web Structure Mining, and Web Usage Mining. These are explained below.

[Figure: Categories of Web Mining]

1. Web Content Mining: Web content mining is the extraction of useful information
from the content of web documents. Web content consists of several types of data:
text, images, audio, video, and so on. Content data is the collection of facts that
a web page is designed to convey. It can reveal effective and interesting patterns
about user needs. Mining text documents draws on text mining, machine learning, and
natural language processing, so this type of mining is also known as text mining.
It scans and mines the text, images, and groups of web pages according to the
content of the input.
2. Web Structure Mining: Web structure mining is the discovery of structure
information from the web. The web graph consists of web pages as nodes and
hyperlinks as edges connecting related pages. Structure mining essentially produces
a structured summary of a particular website. It identifies relationships between
web pages that are linked by information or by a direct link. Web structure mining
can be very useful, for example, for determining the connection between two
commercial websites.
3. Web Usage Mining: Web usage mining is the discovery of interesting usage
patterns from large data sets, patterns that help in understanding user behavior.
As users access data on the web, their activity is collected in the form of logs,
so web usage mining is also called log mining.
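Since web usage mining works on logs, a first step is usually to parse them. The sketch below assumes log lines in the Common Log Format written by many web servers; the sample lines, the regular expression, and the choice to count only successful (2xx) requests are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical access-log lines in Common Log Format
LOG_LINES = [
    '10.0.0.1 - - [10/Mar/2024:10:00:01 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Mar/2024:10:00:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [10/Mar/2024:10:01:30 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Mar/2024:10:02:02 +0000] "GET /missing HTTP/1.1" 404 0',
    '10.0.0.1 - - [10/Mar/2024:10:03:15 +0000] "GET /cart HTTP/1.1" 200 1024',
]

# host, identity, user, [timestamp], "method path protocol", status, size
PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)$')

page_hits = Counter()
paths_by_visitor = {}
for line in LOG_LINES:
    match = PATTERN.match(line)
    if not match:
        continue  # skip lines that do not follow the expected format
    ip, _timestamp, _method, path, status, _size = match.groups()
    if status.startswith("2"):  # count only successful requests
        page_hits[path] += 1
        paths_by_visitor.setdefault(ip, []).append(path)

print(page_hits.most_common())
print(paths_by_visitor)
```

The per-visitor path lists are the raw material for usage patterns such as "visitors who view /products often continue to /cart".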

Comparison Between Data mining and Web mining:


Points | Data Mining | Web Mining
Definition | Data mining is the process that attempts to discover patterns and hidden knowledge in large data sets in any system. | Web mining is the application of data mining techniques to automatically discover and extract information from web documents.
Application | Data mining is very useful for web page analysis. | Web mining is very useful for a particular website and e-service.
Target Users | Data scientists and data engineers. | Data scientists along with data analysts.
Access | Data mining accesses data privately. | Web mining accesses data publicly.
Structure | Data mining gets its information from explicit structure. | Web mining gets its information from structured, unstructured, and semi-structured web pages.
Problem Type | Clustering, classification, regression, prediction, optimization, and control. | Web content mining, Web structure mining.
Tools | It includes tools like machine learning algorithms. | Special tools for web mining are Scrapy, PageRank, and Apache logs.
Skills | It includes approaches for data cleansing, machine learning algorithms, statistics, and probability. | It includes application-level knowledge and data engineering with mathematical modules like statistics and probability.

Data Mining For Intrusion Detection and
Prevention
The security of our computer systems and data is at continual risk. The extensive
growth of the Internet and the increasing availability of tools and tricks for
intruding into and attacking networks have made intrusion detection and prevention
a critical component of networked systems.
Intrusion
An intrusion is unauthorized access by an intruder, who steals valuable resources
and misuses them, e.g. through worms and viruses. Intrusion prevention techniques
such as user authentication and sharing only encrypted information exist, but they
are not enough on their own because systems become more complex day by day, so an
additional layer of security controls is needed.
Intruder
An intruder is an entity that tries to gain unauthorized access to a system or a
network. An intrusion can also corrupt the data in that system and disrupt the
network environment.
Intruders are mainly of two types:
• Masquerader (outside intruder) – has no authority to use the network or system
• Misfeasor (inside intruder) – has authorized access to limited applications

Intrusion detection system

An Intrusion Detection System (IDS) is a device or an application that monitors
traffic, detects unusual activity, and reports its results to an administrator, but
cannot take action to prevent the unusual activity. The system protects the
confidentiality, integrity, and availability of data and information systems from
internet attacks. As networks expand dynamically, the risks and the chances of
malicious intrusion increase as well.
The main types of attacks detected by intrusion detection systems:
• Scanning attacks
• Denial of service (DoS) attacks
• Penetration attacks
Fig.2 Architecture of IDS

Intrusion prevention system


An Intrusion Prevention System (IPS) is essentially an extension of the Intrusion
Detection System that can also protect the system from suspicious activities,
viruses, and threats: once an unwelcome activity is identified, the IPS takes
action against it, such as closing access points and reconfiguring firewalls.
The majority of intrusion detection and prevention systems use
either signature-based detection or anomaly-based detection.
1. Signature-Based – A signature-based system is a pattern-based system that uses a
library of signatures of known attacks. It monitors the packets on the network and
compares them against this database of signatures or attack patterns. If a packet
matches a signature, the system detects the intrusion, alerts the administrator,
and can take preventive action by blocking the IP address or deactivating the user
account from accessing the application. E.g. antivirus software.
Advantage
• Effective at detecting known attacks.
Disadvantage

• Fails to identify new or unknown attacks.
• Requires regular updates with the signatures of new attacks.
2. Anomaly-Based – An anomaly-based system watches for abnormal activity. The
system is first trained on a suitable baseline of normal behavior, and activity is
then compared against that baseline. Activity that deviates from the baseline is
treated as suspicious: the system triggers an alert to the administrator and can
immediately block entry to the target host.
Advantage
• Ability to detect unknown attacks.
Disadvantage
• Higher complexity; anomalies are sometimes difficult to detect, and there is a
chance of false alarms.
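The two detection styles can be contrasted in a short sketch. The signature strings, the baseline of requests per minute, and the three-standard-deviation rule are all assumptions chosen for illustration; real systems use far richer signatures and behavior models.

```python
from statistics import mean, stdev

# Hypothetical signature library: substring pattern -> attack name
SIGNATURES = {
    "' OR '1'='1": "SQL injection",
    "../../": "path traversal",
}

def match_signature(payload):
    """Signature-based check: return the name of the first known pattern found."""
    for pattern, name in SIGNATURES.items():
        if pattern in payload:
            return name
    return None

class AnomalyDetector:
    """Anomaly-based check: flag values far from a learned baseline."""

    def __init__(self, baseline, threshold=3.0):
        self.mu = mean(baseline)
        self.sigma = stdev(baseline)
        self.threshold = threshold  # allowed distance in standard deviations

    def is_anomalous(self, value):
        return abs(value - self.mu) > self.threshold * self.sigma

# Hypothetical baseline: requests per minute observed during normal operation
detector = AnomalyDetector([98, 102, 100, 101, 99, 100])

print(match_signature("GET /../../etc/passwd"))
print(match_signature("GET /index.html"))
print(detector.is_anomalous(103), detector.is_anomalous(250))
```

The signature check can only recognize what is already in its library, while the anomaly check can flag a new kind of traffic spike at the risk of false alarms, which mirrors the advantages and disadvantages listed above.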

Data mining is the process of extracting patterns from huge datasets by combining
techniques from statistics and artificial intelligence with database management.
This section describes how data mining can be applied in intrusion detection
systems (IDS) and intrusion prevention systems (IPS).

How does data mining help in Intrusion detection and prevention

Modern network technologies require a high level of security controls to ensure
safe and trusted communication of information between a user and a client. An
intrusion detection system protects the system when traditional technologies fail.
Data mining is the extraction of appropriate features from a large amount of data,
and it supports various learning algorithms, both supervised and unsupervised.
Intrusion detection is essentially a data-centric process, so with the help of data
mining algorithms an IDS can learn from past intrusions, improve its performance
with experience, and find unusual activities. Data mining helps in exploring
rapidly growing databases and gathering only the valid information by improving
segmentation, which helps organizations plan in real time and saves time. Its
applications include detecting anomalous behavior, detecting fraud and abuse,
detecting terrorist activities, and investigating crimes through lie detection.
Below is a list of areas in which data mining technology can be applied to
intrusion detection.
 Using data mining algorithms for developing a new model for IDS: Data
mining algorithm for the IDS model having a higher efficiency rate and lower
false alarms. Data mining algorithms can be used for both signature-based and
anomaly-based detection. In signature-
based detection, training information is classified as either “normal” or
“intrusion.” A classifier can then be derived to discover acknowledged intrusions.
Research on this place has included the software of
clarification algorithms, association rule mining, and cost-sensitive modeling.
Anomaly-primarily based totally detection
builds models of normal behavior and automatically detects massive deviations
from it. Methods consist of the software of clustering, outlier analysis,
and class algorithms, and statistical approaches. The strategies used have
to be efficient and scalable, and able to dealing
with community information of excessive volume, dimensional, and
heterogeneity.
 Analysis of Stream data: Analysis of stream data means is analyzing the data in
a continuous manner but data mining is basically used on static data rather than
Streaming due to complex calculation and high processing time. Due
to the dynamic nature of intrusions and malicious attacks, it is
more critical to perform intrusion detection withinside
the records stream environment. Moreover, an event can be ordinary on
its own but taken into consideration malicious if regarded as a part of a
series of activities. Thus, it’s far essential to look at what sequences
of activities are regularly encountered together, locate sequential patterns,
and pick out outliers. Other data mining strategies for locating evolving clusters
and constructing dynamic class models in records streams also are essential for
real-time intrusion detection.
 Distributed data mining: It is used to analyze data that is inherently distributed
across various databases, which makes integrated processing of the data difficult.
Intrusions may be launched from several different locations and targeted at many
different destinations. Distributed data mining methods can be used to analyze
network data from several network locations to detect these distributed attacks.
 Visualization tools: These tools represent the data in the form of graphs, which
helps the user get a visual understanding of the data. They are also used for
viewing any anomalous patterns detected. Such tools may include features for
viewing associations, discriminative patterns, clusters, and outliers. Intrusion
detection systems should also have a graphical user interface that allows security
analysts to pose queries regarding the network data or intrusion detection results.
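
The anomaly-based detection idea described above (model normal behavior, then
flag large deviations) can be illustrated with a minimal statistical sketch. The
feature (packets per second), the sample values, and the threshold of three
standard deviations are all assumptions chosen for illustration; real intrusion
detection systems use far richer models.

```python
from statistics import mean, stdev

def fit_normal_profile(training_values):
    """Model 'normal' behavior as the mean and standard deviation of one feature."""
    return mean(training_values), stdev(training_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score exceeds the (assumed) threshold."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical packets-per-second readings observed during normal operation.
normal_traffic = [98, 102, 101, 99, 100, 97, 103, 100]
profile = fit_normal_profile(normal_traffic)

# New observations: only the large spike deviates strongly from the profile.
flags = [is_anomalous(v, profile) for v in [101, 250, 99]]
print(flags)
```

Here the profile plays the role of the "model of normal behavior", and the
threshold controls the trade-off between missed intrusions and false alarms.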

Page Rank Algorithm in Data Mining


The PageRank algorithm applies to web pages. Google Search uses it to rank web
pages in its search engine results. The algorithm was named after Larry Page, one
of the founders of Google. PageRank is a way of measuring the importance of web
pages. The web can be modeled as a directed graph with two components, nodes
and connections: the pages are the nodes and the hyperlinks are the connections.
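
The directed-graph view above can be turned into a small iterative computation. The
following is a minimal sketch, not Google's production method: the damping factor
of 0.85, the fixed iteration count, the handling of pages without outgoing links,
and the three-page toy graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate page importance on a directed link graph.

    `links` maps each page to the list of pages it links to; every
    linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages
            # (an assumed convention for this sketch).
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical toy web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C is linked from both A and B, so it ends up with the highest score, while B,
which receives only half of A's vote, ends up with the lowest.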

Hyperlink Induced Topic Search (HITS) Algorithm


Hyperlink Induced Topic Search (HITS) is a link analysis algorithm that rates
webpages, developed by Jon Kleinberg. It uses the link structure of the web to
discover and rank the webpages relevant to a particular search. HITS uses hubs
and authorities to define a recursive relationship between webpages. Before
understanding the HITS algorithm, we first need to know about hubs and
authorities.

 Given a query to a search engine, the set of highly relevant web pages is
called the Root set. These pages are potential Authorities.
 Pages that are not very relevant but point to pages in the Root set are called Hubs.
Thus, an Authority is a page that many hubs link to, whereas a Hub is a page that
links to many authorities.
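
The recursive relationship between hubs and authorities can be sketched as a pair
of alternating updates. This is a simplified illustration rather than a full
implementation: the page names, the toy link structure, and the fixed number of
iterations are assumptions, and the step that builds the Root set from a query is
omitted.

```python
from math import sqrt

def hits(links, iterations=20):
    """Compute hub and authority scores for a small directed link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link structure: two hub pages pointing at two candidate authorities.
toy_links = {"hub1": ["auth1", "auth2"], "hub2": ["auth1"], "auth1": [], "auth2": []}
hub, auth = hits(toy_links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph, auth1 is linked from both hubs and so becomes the strongest
authority, while hub1 links to both authorities and so becomes the strongest hub.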

What is Text Mining?


Text mining is a component of data mining that deals specifically with unstructured
text data. It involves the use of natural language processing (NLP) techniques to
extract useful information and insights from large amounts of unstructured text data.
Text mining can be used as a preprocessing step for data mining or as a standalone
process for specific tasks.
By using text mining, the unstructured text data can be transformed into structured
data that can be used for data mining tasks such as classification, clustering, and
association rule mining. This allows organizations to gain insights from a wide range
of data sources, such as customer feedback, social media posts, and news articles.
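
One common way to transform unstructured text into structured data is a
bag-of-words table, in which each document becomes a row of word counts. The sketch
below assumes a very simple tokenizer (lowercase alphabetic words) and two made-up
feedback snippets; practical text mining pipelines usually add steps such as
stop-word removal and stemming.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Turn raw documents into a vocabulary and a document-term count matrix."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical customer feedback snippets.
docs = ["Great service, great price", "Poor service"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)
print(matrix)
```

The resulting matrix is structured data: each row is a document and each column a
term, so it can be passed to classification, clustering, or association rule mining.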

What is sentiment analysis (opinion mining)?


Sentiment analysis, also referred to as opinion mining, is an approach to natural
language processing (NLP) that identifies the emotional tone behind a body of text.
This is a popular way for organizations to determine and categorize opinions about
a product, service or idea. Sentiment analysis involves the use of data mining,
machine learning (ML), artificial intelligence and computational linguistics to
mine text for sentiment and subjective information such as whether it is expressing
positive, negative or neutral feelings.

Sentiment analysis systems help organizations gather insights into real-time
customer sentiment, customer experience and brand reputation. Generally, these
tools use text analytics to analyze online sources such as emails, blog posts, online
reviews, customer support tickets, news articles, survey responses, case studies,
web chats, tweets, forums and comments. Algorithms are used to implement rule-
based, automatic or hybrid methods of scoring whether the customer is expressing
positive words, negative words or neutral ones.

In addition to identifying sentiment, sentiment analysis can extract the polarity or
the amount of positivity and negativity, subject and opinion holder within the text.
This approach is used to analyze various parts of text, such as a full document or a
paragraph, sentence or subsentence.
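
A rule-based scorer of the kind mentioned above can be sketched with a small word
list. The positive and negative word sets here are tiny, made-up lexicons used only
for illustration; they are not taken from any particular tool, and this sketch
ignores negation, sarcasm, and context.

```python
import re

# Hypothetical miniature sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Return a (label, polarity score) pair from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1 and each negative word subtracts 1.
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score

print(sentiment("Great phone, love the camera"))
print(sentiment("Terrible battery"))
print(sentiment("It arrived on Tuesday"))
```

The sign of the score gives the sentiment label, and its magnitude gives a rough
measure of polarity. The same function can be applied to a full document, a
paragraph, or a single sentence.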

Vendors that offer sentiment analysis platforms include Brandwatch, Critical
Mention, Hootsuite, Lexalytics, Meltwater, MonkeyLearn, NetBase Quid, Sprout
Social, Talkwalker and Zoho. Businesses that use these tools to analyze sentiment
can review customer feedback more regularly and proactively respond to changes
of opinion within the market.
