Unit - 2 Web Intelligence
2. APIs (Application Programming Interfaces)
● Description: APIs are like doors that some websites open to allow programs
to access their data directly. Using APIs, we can get data in a ready-to-use
format (often JSON or XML).
● Example Tools:
○ API Documentation: Provides details on how to use the API,
including available data, endpoints, and rules.
○ Requests Library (Python): Sends requests to an API and handles the
data it returns (see the sketch at the end of this section).
● Example Use Case: For example, if you’re building a weather app, you
might use a weather API that provides real-time weather updates for
different locations.
● Considerations: Most APIs limit how often you can request data and
require an access key (for security), so follow their usage rules and stay
within the rate limits.
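As a minimal sketch of this pattern: the endpoint URL, the parameter names (q, appid), and the key below are placeholders for illustration, since every real weather API defines its own in its documentation.

```python
import requests

# Hypothetical endpoint and access key, for illustration only; check the
# real API's documentation for its actual URL and parameter names.
API_URL = "https://api.example.com/v1/weather"
API_KEY = "your-access-key-here"

def get_weather(city: str) -> dict:
    """Fetch current weather for a city via a (hypothetical) weather API."""
    response = requests.get(
        API_URL,
        params={"q": city, "appid": API_KEY},  # access key sent per the API's rules
        timeout=10,
    )
    response.raise_for_status()  # surfaces errors such as HTTP 429 (rate limit exceeded)
    return response.json()       # many APIs return ready-to-use JSON

print(get_weather("London"))
```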
4. RSS and Data Feeds
● Description: RSS feeds are regularly updated lists of website content, often
used by blogs or news sites. They let you receive updates without visiting
the website.
● Example Tools:
○ RSS Readers: Tools that show updates from multiple websites, like
Feedly.
○ API Integration: Sometimes data feeds are also available through
APIs.
● Example Use Case: A news app might use RSS feeds to get the latest
headlines from multiple news sources and display them in one place
(sketched below).
● Considerations: Not every site has an RSS feed, so you might not be able to
use this method everywhere.
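A minimal sketch of such a news-aggregator loop, assuming the third-party feedparser library (pip install feedparser; it is not named in these notes) and made-up feed URLs:

```python
import feedparser  # third-party: pip install feedparser

# Hypothetical feed URLs; substitute any site that publishes an RSS feed.
FEEDS = [
    "https://example-news.com/rss",
    "https://example-blog.com/feed.xml",
]

# Gather the latest headlines from several sources in one place,
# the way a simple news aggregator would.
for url in FEEDS:
    feed = feedparser.parse(url)
    print(feed.feed.get("title", url))
    for entry in feed.entries[:3]:  # first three items per feed
        print("  -", entry.title, "->", entry.link)
```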
Web Scraping
1. Purpose:
○ Definition: Web scraping is like using a tool to gather specific
information from a website. This is usually done for research,
competitive analysis, or to compile content from different sources.
○ Focus: It’s about collecting specific data points (like prices, product
details, or reviews) directly from a website.
2. Data Extraction:
○ Method: Tools like BeautifulSoup or Selenium (both Python
libraries) go through a website’s HTML code to pick out needed
information.
○ Data Type: The data collected is usually in raw HTML form, which
may need further processing to be useful.
3. Examples:
○ Competitive Intelligence: A business might scrape prices from a
competitor’s site to adjust its pricing.
○ Research: A researcher could scrape social media or e-commerce
sites to study trends or analyze customer feedback.
4. Legal Considerations:
○ Web scraping should always respect a website’s terms of service and
legal regulations. If a site forbids scraping, doing so could result in
bans or legal action.
5. Real-Life Example:
○ Imagine you’re interested in tracking laptop prices across various
online stores. Web scraping tools can collect the prices from each store
automatically (see the sketch below), so you don’t have to check manually.
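A minimal sketch of that laptop-price example using Requests and BeautifulSoup. The store URL and the CSS selectors (div.product, h2.title, span.price) are made up; a real scraper must be written against the actual page’s HTML and should respect the site’s terms of service:

```python
import requests
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical store page; inspect the real page's HTML (and robots.txt)
# before writing selectors against it.
URL = "https://example-store.com/laptops"

response = requests.get(URL, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Walk the raw HTML and pick out each product's name and price.
for product in soup.select("div.product"):
    name = product.select_one("h2.title").get_text(strip=True)
    price = product.select_one("span.price").get_text(strip=True)
    print(name, price)
```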
Web Analytics
1. Purpose:
○ Definition: Web analytics is about collecting and analyzing data on
how users interact with a website.
○ Focus: It provides insights into user behavior on a site to improve user
experience, optimize marketing, and support business decisions.
2. Data Collection:
○ Method: A small piece of code (called a “tracking code” or “pixel”) is
added to each page of a website. This code captures data on user
activities, such as which pages they visit and how long they stay.
○ Data Type: Analytics tools gather organized data, like total visits,
page views, and conversions (when a visitor completes an action such as
signing up or purchasing); a toy calculation follows this list.
3. Examples:
○ Performance Measurement: Website owners can track how many
people visit, which pages they visit most, and where they drop off in
the sales process.
○ Conversion Optimization: Businesses analyze data to see where
users are abandoning their carts and make changes to improve the
buying experience.
4. Legal Considerations:
○ Since analytics tools collect user data, it’s important to get users'
consent, especially in regions with strict privacy laws like the EU
(GDPR) and California (CCPA). Analytics data is generally
anonymized to respect privacy.
5. Real-Life Example:
○ A blog owner might use Google Analytics to see which articles are
most popular. If they notice that posts on specific topics get more
traffic, they may decide to create more similar content.
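As a toy illustration of the kind of aggregation an analytics tool performs behind the scenes, the sketch below counts page views and computes a conversion rate from a few made-up event records; real tools collect far richer data through the tracking code on each page:

```python
from collections import Counter

# Made-up events: each record is (visitor_id, page, action).
events = [
    ("u1", "/home", "view"), ("u1", "/pricing", "view"), ("u1", "/signup", "convert"),
    ("u2", "/home", "view"), ("u2", "/blog", "view"),
    ("u3", "/pricing", "view"), ("u3", "/signup", "convert"),
]

page_views = Counter(page for _, page, _ in events)
visitors = {visitor for visitor, _, _ in events}
converted = {visitor for visitor, _, action in events if action == "convert"}

print("Page views:", page_views.most_common())
print(f"Conversion rate: {len(converted) / len(visitors):.0%}")  # 2 of 3 visitors -> 67%
```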
Dashboard
A basic dashboard in web scraping is a simple, visual tool that organizes and
displays data collected from websites. It helps users quickly see important trends,
track the progress of scraping tasks, monitor performance, and ensure data quality.
Here’s a breakdown of the main parts of a basic dashboard and why each part is
useful.
1. Data Visualizations:
○ Explanation: These are charts (line, bar, pie) or other visual elements
to help users see trends, comparisons, and patterns in the scraped data.
○ Example: A pie chart might show the proportion of products in each
category, while a line chart shows price trends over time.
2. Status and Logs:
○ Explanation: This section shows tables with detailed information,
such as task statuses (completed, running, failed) and logs that record
the history of each scraping activity.
○ Example: A table could list all completed tasks with information on
how many items were scraped and any errors encountered.
3. Filters and Interactivity:
○ Explanation: Interactive elements like dropdowns and date pickers let
users filter data to view specific information.
○ Example: A date filter could allow users to view only the data
scraped in the last week, while a category filter could display only
certain types of products.
4. Alerts and Notifications:
○ Explanation: Alerts notify users of important events, like task failures
or unexpected changes in data.
○ Example: If a scraping job fails, a notification could alert you to
review and fix the issue.
Tools for Building a Dashboard
1. Python Libraries:
○ Dash: A Python framework that helps create interactive web
dashboards with minimal code.
○ Plotly: Adds interactive and customizable graphs, such as line or bar
charts.
○ Flask or Django: Web frameworks that support building dashboards
and integrating scraped data.
○ Example: A small-scale scraping project could use Dash and Plotly to
create a simple dashboard showing scraped data, task statuses, and
error logs (a minimal sketch follows this list).
2. Business Intelligence (BI) Tools:
○ Tools: Tableau, Power BI, and Qlik are popular BI tools that offer
powerful dashboarding and data visualization features.
○ Example: For a large project where scraped data is combined with
other sources, Tableau might be used to create a more advanced,
sharable dashboard with detailed visuals and interactive elements.
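A minimal sketch of the small-scale Dash option, assuming the scraped results are already in hand (the price records below are invented) and that the dash, plotly, and pandas packages are installed:

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html  # third-party: pip install dash

# Invented scraped data: one product's price over three days.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "price": [999, 949, 979],
})

fig = px.line(df, x="date", y="price", title="Scraped price trend")

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Scraping Dashboard"),
    dcc.Graph(figure=fig),                          # interactive chart
    html.P("Last run: 3 items scraped, 0 errors"),  # simple status line
])

if __name__ == "__main__":
    app.run(debug=True)  # older Dash versions use app.run_server instead
```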
In web scraping, the type of report you create depends on the project’s goals, the
people who will use the data, and the insights you need.
Applications of Web Analytics
Web analytics helps businesses understand how people use their websites. It tracks
visitor actions, such as clicks, views, and purchases, and then uses this data to
make better decisions. Here’s how different industries use web analytics to improve
their operations:
3. Media and Publishing
● Content Performance: Websites track which articles or videos get the most
views, likes, and shares. For example, a news site might find that political
articles get more traffic than entertainment stories, helping them focus on
content people like.
● User Engagement: Websites measure how long visitors stay on a page or
how deep they scroll to see if they're really engaged. For instance, if readers
quickly leave a blog post, it could mean the content isn’t interesting enough.
● Ad Revenue Optimization: By analyzing how people interact with ads,
media companies can improve ad placement to increase clicks and revenue.
For example, moving an ad to a more visible spot could get more attention.
5. Banking and Finance
● User Behavior Analysis: Banks track how customers use their online
services, such as making payments or checking balances. If customers
frequently drop off during the payment process, the website can be
improved.
● Fraud Detection: Unusual behavior, like multiple failed login attempts or
sudden large transactions, can be flagged for possible fraud.
● Customer Experience: Banks use data to make online banking more
user-friendly, like streamlining forms or offering personalized product
recommendations.
6. Travel and Hospitality
● Booking Trends: Travel companies track when people book trips, which
destinations are popular, and when customers are most likely to book. This
helps them offer discounts or promote specific deals at the right time.
● Customer Reviews: Travel companies monitor online reviews to improve
their services. If many customers mention a hotel’s slow check-in process,
they might work on speeding it up.
● Dynamic Pricing: Travel websites use web data to adjust prices based on
demand. For instance, hotel prices might increase if lots of people are
searching for rooms in a certain city.
7. Education