
Web Analytics, Web Mining, and Social Analytics

The document discusses web mining, which is the process of discovering relationships from web data through techniques like web content mining, web structure mining, and web usage mining. It also covers related topics like search engines, search engine optimization (SEO), web analytics metrics and dashboards, and the web analytics maturity model. The goal of web mining and analytics is to extract useful knowledge and insights from large amounts of web data.

Uploaded by

Asmita Nagpal
Copyright © All Rights Reserved

Web Analytics, Web Mining, and Social Analytics
Web Mining Overview
• The Web is the largest repository of data
• Data is in HTML, XML, and text formats
• Challenges (of processing Web data)
  - The Web is too big for effective data mining
  - The Web is too complex
  - The Web is too dynamic
  - The Web is not specific to a domain
  - The Web has everything
• Opportunities and challenges are great!
Web Mining
• Web mining (or Web data mining) is the process of discovering intrinsic relationships from Web data (textual, linkage, or usage)
• Is it the same as data mining on data generated on the Internet?
• Web data?
  - Content, Link, Log, …
• Web Mining versus Web Analytics
  - Look at the simple taxonomy on the next slide
Web Mining
[Figure: a simple taxonomy. Data Mining and Text Mining feed into WEB MINING, which has three branches.]
• Web Content Mining
  - Source: unstructured textual content of the Web pages (usually in HTML format)
• Web Structure Mining
  - Source: the uniform resource locator (URL) links contained in the Web pages
• Web Usage Mining
  - Source: the detailed description of a Web site's visits (sequence of clicks by sessions)
• Related areas shown beneath the three branches
  - Search Engines, Sentiment Analysis, Semantic Webs, Web Analytics
  - PageRank, Information Retrieval, Graph Mining, Social Analytics, Clickstream Analysis
  - Search Engine Optimization, Social Network Analysis, Social Media Analytics, Log Analysis
  - Marketing Attribution, Customer Analytics, 360 Customer View
Web Content/Structure Mining
• Mining the textual content on the Web
• Data collection via Web crawlers/spiders
• Web pages include hyperlinks
• The best place to hide a body?
  - The second page of Google search results
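The crawler step on this slide (collect a page, then follow its hyperlinks) can be sketched with Python's standard library. This is only an illustration: the `LinkExtractor` class name, the sample HTML, and the example.com URLs are assumptions, and a real crawler would also fetch pages over the network, respect robots.txt, and schedule revisits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical page content; no network access is needed for this sketch
page = '<html><body><a href="/about">About</a> <a href="https://example.org/x">X</a></body></html>'
extractor = LinkExtractor("https://example.com/index.html")
extractor.feed(page)
print(extractor.links)
```

The extracted links are what a crawler would add to its list of URLs to visit next; the link structure itself is the raw material for Web structure mining.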
Search Engines
• Google, Bing, Yahoo, …
• For what reason do you use search engines?
• A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users have provided about the subject of their inquiry
• They are the workhorses of the Internet
Structure of a Typical Internet Search Engine
[Figure: two cycles built around a Cached/Indexed Documents DB with its Metadata and Index.]
• Development Cycle: Scheduler, Web Crawler, Document Indexer, World Wide Web
  - Flow labels: list of URLs to crawl, crawling the Web, unprocessed Web pages, processed pages
• User Responding Cycle: User, Query Analyzer, Document Matcher/Ranker
  - Flow labels: search query, processed query, list of matched pages, rank-ordered pages
Anatomy of a Search Engine
1. Development Cycle
• Web Crawler
• Document Indexer
  - Steps
  - Step 1: Pre-processing the documents
    - Collecting, organizing, and storing
  - Step 2: Parsing the documents
  - Step 3: Creating the term-by-document matrix
    - How to represent the values (numeric, binary, …)
    - Term frequency / inverse document frequency (TF/IDF)
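Step 3 can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, raw counts for term frequency, and log(N / document frequency) for the inverse document frequency, which is one common variant among several. The three sample documents are made up.

```python
import math
from collections import Counter


def tf_idf_matrix(docs):
    """Return (terms, matrix); matrix[i][j] is the TF-IDF weight of terms[i] in docs[j]."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for tokens in tokenized for t in set(tokens))
    matrix = []
    for term in terms:
        idf = math.log(n_docs / df[term])
        # Term frequency here is the raw count of the term in the document
        matrix.append([tokens.count(term) * idf for tokens in tokenized])
    return terms, matrix


docs = [
    "web mining finds patterns",
    "web analytics measures web traffic",
    "social analytics",
]
terms, matrix = tf_idf_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:10s}", [round(w, 3) for w in row])
```

A term that appears in every document gets an IDF of zero under this variant, which is the intended effect: it does not help tell documents apart.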
Anatomy of a Search Engine
2. Response Cycle
• Query Analyzer
• Document Matcher/Ranker
• How does Google do it?
  - Googlebot
  - Google indexer
  - Google Query Processor
Technology Insights
PageRank Algorithm
• PageRank is a link analysis algorithm
  - Named after Larry Page
• Outcome of a research project at Stanford University in 1996
• The "secret sauce" in Google
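The idea behind link analysis can be sketched with a small power-iteration version of PageRank. This is a textbook-style illustration, not Google's production implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page example graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to (all targets must be keys)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B, and D
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Pages with many incoming links from well-ranked pages end up with higher scores, so page C ranks highest here and page D, which nobody links to, ranks lowest.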
Search Engine Optimization (SEO)
• It is the intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results
• Part of an Internet marketing strategy
• Based on knowing how a search engine works
  - Content, HTML, keywords, external links, …
• Indexing based on …
  - Webmaster submission of URL
  - Proactively and continuously crawling the Web
Top 15 Most Popular Search Engines
Methods for Search Engine Optimization
• Search engine recommended techniques (White-Hat SEO)
  - Producing results based on good site design and accurate content (for users, not engines)
• Search engine disapproved techniques (Black-Hat SEO)
  - Deception (what is shown to a human differs from what is shown to a machine/spider)
  - Spamdexing? (search spam, search engine spam, or search engine poisoning)
Web Usage Mining
➔ Web Analytics!
• Extraction of information from data generated through Web page visits and transactions…
  - Data stored in server access logs, referrer logs, agent logs, and client-side cookies
  - User characteristics and usage profiles
  - Metadata, such as page attributes, content attributes, and usage data
• Clickstream data, clickstream analysis
Web Usage Mining
• Web usage mining applications
  - Determine the lifetime value of clients
  - Design cross-marketing strategies across products
  - Evaluate promotional campaigns
  - Target electronic ads and coupons at user groups based on user access patterns
  - Predict user behavior based on previously learned rules and users' profiles
  - Present dynamic information to users based on their interests and profiles
  - …
Web Usage Mining (Clickstream Analysis)
[Figure: the user/customer interacts with the Website, which produces Weblogs; the Weblogs go through two stages.]
• Pre-Process Data: collecting, merging, cleaning, structuring
  - Identify users
  - Identify sessions
  - Identify page views
  - Identify visits
• Extract Knowledge: usage patterns, user profiles, page profiles, visit profiles, customer value
• Resulting questions
  - How to better the data
  - How to improve the Web site
  - How to increase the customer value
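The "identify users" and "identify sessions" steps of pre-processing can be sketched as below. The 30-minute inactivity cutoff is a widely used convention but an assumption here, and the `(user_id, timestamp_seconds, url)` record layout and sample hits are hypothetical stand-ins for parsed Weblog entries.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that closes a session (assumed)


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp_seconds, url) hits into per-user click sequences."""
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))

    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by time
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > timeout:
                # Gap too long: close the running session and start a new one
                sessions.append((user, [url for _, url in current]))
                current = []
            current.append(event)
        sessions.append((user, [url for _, url in current]))
    return sessions


hits = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 50, "/home"),
    ("u1", 4100, "/cart"),
]
sessions = sessionize(hits)
for user, clicks in sessions:
    print(user, clicks)
```

Each resulting click sequence is one unit of clickstream analysis, from which usage patterns and visit profiles can then be extracted.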
Web Analytics Metrics
• Provides near-real-time data to deliver invaluable information to …
  - Improve site usability
  - Manage marketing efforts
  - Better document ROI, …
• Web analytics metric categories:
  - Web site usability: How were they using my Web site?
  - Traffic sources: Where did they come from?
  - Visitor profiles: What do my visitors look like?
  - Conversion statistics: What does all this mean for the business?
Web Analytics Metrics
- Web Site Usability and Traffic Sources

Web Site Usability        Traffic Source
1. Page views             1. Referral Web sites
2. Time on site           2. Search engines
3. Downloads              3. Direct
4. Click map              4. Offline campaigns
5. Click paths            5. Online campaigns
Web Analytics Metrics
- Visitor Profiles and Conversion Statistics

Visitor Profiles          Conversion Statistics
1. Keywords               1. New visitors
2. Content groupings      2. Returning visitors
3. Geography              3. Leads
4. Time of day            4. Sales/conversions
5. Landing page           5. Abandonment rates
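A few of these metrics can be computed from visit records as in the sketch below. The record fields and sample visits are hypothetical, and the formulas are assumptions chosen for illustration: conversion rate as purchases divided by visits, and cart abandonment rate as carts without a purchase divided by all carts. Analytics tools may define these differently.

```python
from collections import Counter

# Hypothetical visit records, one per visit
visits = [
    {"source": "search", "pages": 5, "added_to_cart": True, "purchased": True},
    {"source": "direct", "pages": 1, "added_to_cart": False, "purchased": False},
    {"source": "search", "pages": 3, "added_to_cart": True, "purchased": False},
    {"source": "referral", "pages": 2, "added_to_cart": False, "purchased": False},
]


def summarize(visits):
    """Compute a handful of usability, traffic-source, and conversion metrics."""
    total = len(visits)
    carts = sum(v["added_to_cart"] for v in visits)
    purchases = sum(v["purchased"] for v in visits)
    return {
        "page_views": sum(v["pages"] for v in visits),
        "traffic_sources": dict(Counter(v["source"] for v in visits)),
        # Assumed definition: share of visits that ended in a purchase
        "conversion_rate": purchases / total,
        # Assumed definition: share of carts that did not end in a purchase
        "cart_abandonment_rate": (carts - purchases) / carts if carts else 0.0,
    }


summary = summarize(visits)
print(summary)
```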
A Web Analytics Dashboard
Web Analytics Maturity Model
• Maturity: degree of proficiency, formality, and optimization of business models
• Business Intelligence Maturity Model (TDWI)
  - Management Reporting ➔ Spreadmarts ➔ Data Marts ➔ Data Warehouse ➔ Enterprise Data Warehouse ➔ BI Services
• Business Analytics Maturity Model (INFORMS)
  - Descriptive Analytics ➔ Predictive Analytics ➔ Prescriptive Analytics
• Web analytics maturity model ➔ next slide…
Web Analytics Maturity Model
Web Analytics Tools
• Plenty of them exist, and numbers are increasing (Web-based versus downloadable)
  - Google Web Analytics (google.com/analytics)
  - Yahoo! Web Analytics (web.analytics.yahoo.com)
  - Open Web Analytics (openwebanalytics.com)
  - Piwik (piwik.org)
  - FireStats (firestats.cc)
  - Site Meter (sitemeter.com)
  - Woopra (woopra.com)
  - AWStats (awstats.org)
  - Snoop (reinvigorate.net) …
Putting It All Together: A Web Site Optimization Ecosystem
[Figure: Two-Dimensional View of the Inputs for Web Site Optimization]

Goal:
• Customer Experience Management (CEM)
• Voice of Customer (VOC)
Web Mining Success Stories
• Amazon.com, Ask.com, Scholastic.com, …
• A Process View of the Web Site Optimization Ecosystem
[Figure: three stages, Customer Interaction on the Web ➔ Analysis of Interactions ➔ Knowledge about the Holistic View of the Customer; the components shown are Web Analytics, Voice of Customer, and Customer Experience Management.]
Voice of the Customer Strategy Framework
Social Analytics
Social Network Analysis
• Social network: a social structure composed of individuals linked to each other
• Analysis of social dynamics
• Interdisciplinary field
  - Social psychology
  - Sociology
  - Statistics
  - Graph theory
Social Analytics
Social Network Analysis
• Social networks help study relationships between individuals, groups, organizations, societies
  - Self-organizing
  - Emergent
  - Complex
• Typical social network types
  - Communication networks, community networks, criminal networks, innovation networks, …
Bridges of Königsberg
There are 2 islands and 7 bridges that connect the islands and the mainland. Find a path that crosses each bridge exactly once.

[Figure: City Map (from Wikipedia)]

• Euler proved (1736) that, except for the starting and ending points of a walk, one has to enter and leave every node, so all other nodes must have an even number of bridges connected to them
  - All four land masses in Königsberg have an odd number of bridges, so no such path exists
• Foundation of graph theory; first true proof in the theory of networks
• Modern-era social media mining, and in fact the design of social networks (SN), depends heavily on this branch of mathematics
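Euler's parity argument translates directly into a degree count. The sketch below assumes the graph is connected and treats the city as a multigraph; the land-mass labels (`north`, `south`, `island1`, `island2`) are illustrative names for the two river banks and the two islands.

```python
from collections import Counter


def euler_walk_exists(edges):
    """True if a connected multigraph has a walk that uses every edge exactly once."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = [node for node, d in degree.items() if d % 2 == 1]
    # Only the start and end of the walk may have an odd number of edges
    return len(odd) in (0, 2)


# The seven bridges: two banks, two islands
konigsberg = [
    ("north", "island1"), ("north", "island1"),
    ("south", "island1"), ("south", "island1"),
    ("north", "island2"), ("south", "island2"),
    ("island1", "island2"),
]
print(euler_walk_exists(konigsberg))       # all four land masses have odd degree
print(euler_walk_exists(konigsberg[:-1]))  # same city with one bridge removed
```

With all seven bridges the check fails because four nodes have odd degree; removing the bridge between the islands leaves exactly two odd nodes, so a walk would then exist.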
• A network is a graph.
• Elements of the network have meanings
• Network problems can usually be represented in terms of graph theory
Visualizing a network as a collection of points connected by lines
• Points are referred to as nodes, actors, or vertices (plural of vertex)
• Connections are referred to as edges or ties
[Figure: two points labeled Node joined by a line labeled Edge]
• A social network
  - A network where elements have a social structure
  - A set of actors (such as individuals or organizations)
  - A set of ties (connections between individuals)
• Social network examples:
  - Your family network, your friend network, your colleagues, etc.
• To analyze these networks we can use Social Network Analysis (SNA)
• Social Network Analysis is an interdisciplinary field drawing on social sciences, statistics, graph theory, complex networks, and now computer science
Centrality measures
• Simplest measure
  - Degree centrality, which is defined as the number of links incident upon a node
  - In-degree/out-degree
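Degree centrality is just a count of ties per node. A minimal sketch for a directed network follows; the "follows" edge list and the user names are made up for illustration.

```python
from collections import Counter


def degree_centrality(edges):
    """edges is a list of directed (source, target) ties."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    nodes = set(out_degree) | set(in_degree)
    return {
        n: {"in": in_degree[n], "out": out_degree[n], "total": in_degree[n] + out_degree[n]}
        for n in nodes
    }


# Hypothetical "who follows whom" ties
follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann"), ("dan", "bob")]
centrality = degree_centrality(follows)
for node in sorted(centrality):
    print(node, centrality[node])
```

Here bob has the highest in-degree (three followers), which makes him the most central node by this measure.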
Centrality measures
• Closeness centrality (or closeness)
  - Based on the average length of the shortest paths between the node and all other nodes in the graph
  - That average measures farness; closeness is its reciprocal
  - For finding the broadcasters!
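Closeness can be sketched with a breadth-first search, assuming an unweighted, undirected, connected graph given as an adjacency dictionary. The value returned is the reciprocal of the average shortest-path length to the other nodes; the four-node path graph is a made-up example.

```python
from collections import deque


def closeness_centrality(graph, node):
    """Reciprocal of the average shortest-path length from node to every other node."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    return (len(distances) - 1) / total


# Path graph a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
for node in path:
    print(node, closeness_centrality(path, node))
```

The middle nodes b and c score higher than the endpoints because they can reach everyone in fewer hops.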
Centrality measures
• Betweenness centrality
  - Betweenness centrality for each vertex is the number of shortest paths between pairs of other vertices that pass through the vertex
  - For finding the individuals who influence the flow around a system
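Betweenness can be computed with a Brandes-style algorithm, sketched below for an unweighted, undirected graph given as an adjacency dictionary. One assumption worth flagging: when several shortest paths tie between a pair of nodes, each path contributes a fraction rather than a full count, which is the usual convention. The path graph is a made-up example.

```python
from collections import deque


def betweenness_centrality(graph):
    """Shortest-path betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma)
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint
    return {v: s / 2 for v, s in score.items()}


# Path graph a - b - c - d
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
betweenness = betweenness_centrality(path_graph)
print(betweenness)
```

Nodes b and c each sit on two shortest paths between other pairs, while the endpoints sit on none, so b and c are the ones that control the flow.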
Practical application
- GS Identification

1. A social network is created using the similarity calculated between the users. To create a baseline model for benchmarking against the existing approach, similarity is calculated through the traditional rating-based method.
2. Degree centrality is calculated for each node.
3. Nodes below a certain threshold value are dropped from the system one by one.
4. The prediction formula of CF is applied to the remaining data.
5. The users removed in step 3 at the optimum threshold value are identified as the GS users.
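Steps 1 to 3 can be sketched as below. Nearly everything specific here is an assumption made for illustration rather than the method behind the slide: cosine similarity over co-rated items as the "rating-based" similarity, a tie wherever similarity reaches `tie_threshold`, a fixed `min_degree` cutoff, and the toy ratings. Step 4 (applying the CF prediction formula to find the optimum threshold) is not shown.

```python
import math
from itertools import combinations


def cosine_similarity(r1, r2):
    """Cosine similarity over the items both users rated (assumed similarity measure)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (norm1 * norm2)


def low_degree_users(ratings, tie_threshold, min_degree):
    """Build the similarity network and return users whose degree is below min_degree."""
    degree = {user: 0 for user in ratings}
    for u, v in combinations(ratings, 2):
        # Step 1: connect two users when their ratings are similar enough
        if cosine_similarity(ratings[u], ratings[v]) >= tie_threshold:
            degree[u] += 1  # Step 2: degree centrality is the tie count
            degree[v] += 1
    # Step 3: users below the degree threshold are candidates for removal
    return sorted(u for u, d in degree.items() if d < min_degree)


# Hypothetical ratings (user -> item -> rating on a 1-5 scale)
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 5, "b": 5, "c": 1},
    "u3": {"a": 4, "b": 5, "c": 2},
    "u4": {"a": 1, "b": 1, "c": 5},
}
removed = low_degree_users(ratings, tie_threshold=0.8, min_degree=1)
print(removed)
```

In this toy data u4 rates items the opposite way from everyone else, ends up with no ties, and is the only user flagged for removal.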

Social Media
Definitions and Concepts
• Enabling technologies of social interactions among people
• Relies on enabling technologies of Web 2.0
• Takes on many different forms
  - Internet forums, Web logs, social blogs, microblogging, wikis, social networks, podcasts, pictures, video, and product reviews
Social Media
• Social media can be defined as any Web- or mobile-based platform that enables an individual or agency to communicate interactively and enables the exchange of user-generated content.
3C of Social Media
Different Types of Social Media
1. Collaborative projects (e.g., Wikipedia)
2. Blogs and microblogs (e.g., Twitter)
3. Content communities (e.g., YouTube)
4. Social networking sites (e.g., Facebook)
5. Virtual game worlds (e.g., World of Warcraft), and
6. Virtual social worlds (e.g., Second Life)
--Kaplan and Haenlein (2010)
Social versus Industrial Media
• Web-based social media are different from traditional/industrial media, such as newspapers, television, and film
• Differentiating characteristics
  - Quality
  - Reach
  - Frequency
  - Accessibility
  - Usability
  - Immediacy
  - Updatability
How Do People Use Social Media?
• Different engagement levels
[Figure: Level of Social Media Engagement (vertical axis) against Time (horizontal axis); levels listed top to bottom: Creators, Critics, Joiners, Collectors, Spectators, Inactives]
Social Media Analytics
• The systematic and scientific way of consuming the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization's competitiveness
• Fastest growing movement in analytics
Social Media Insights
[Figure: sources listed as Twitter, Facebook, LinkedIn, …; outputs listed as Solutions, Course of Actions, …]
Social Media Analytics
• HBR Analytic Services survey
  - 75% of the companies did not know where their customers were talking about them
  - 31% do not measure the effectiveness of social media
  - Only 23% are using social media analytics tools
  - 7% are able to integrate social media into marketing
• Measuring the Social Media Impact
  - Descriptive analytics: simple counts/statistics
  - Social network analysis
  - Advanced analytics: predictive analytics, text mining
Best Practices in Social Media Analytics
• Think of measurement as a guidance system, not a rating system
• Track the elusive sentiment
• Continuously improve the accuracy of text analysis
• Look at the ripple effect
• Look beyond the brand
• Identify your most powerful influencers
• Look closely at the accuracy of your analytic tool
• Incorporate social media intelligence into planning
Social Media Analytics Tools and Vendors
• Radian6/Salesforce Cloud
• Sysomos
• Collective Intellect
• Webtrends
• Crimson Hexagon
• Converseon
• SproutSocial …
Social Media Analytics
Find Answers to Questions that Matter
• Where are my visitors coming from?
• What keywords did they use to get there?
• Am I creating effective content?
• Where are visitors abandoning my shopping cart? Why?
• What devices are they using?
• How can I improve site interaction?

Google Analytics
• Demo
• https://analytics.google.com/analytics/academy/
