Social Media Analytics - Unit I
DATA IDENTIFICATION:
Data identification involves locating the right datasets required for analysis. This
step is crucial as the quality and relevance of the data directly impact the
insights and conclusions drawn from it. In the context of social media analytics,
data can come from various sources such as:
Social media platforms (e.g., Twitter, Facebook, Instagram)
Forums and discussion boards (e.g., Reddit)
Blogs and news sites
Public datasets and APIs
DATA MEANING:
Data Meaning refers to interpreting and understanding the significance and
implications of the data. This process involves contextualizing the data,
assessing its relevance, analyzing sentiments, and identifying trends. Here’s a
detailed explanation of each aspect:
1. Contextual Analysis
Contextual Analysis involves understanding the circumstances and conditions
in which the data was generated. This helps in interpreting the data accurately
and deriving meaningful insights.
Key Components:
Source: Understanding where the data comes from (e.g., social media
platform, survey, website).
Content: Analyzing the actual content, such as text, images, or videos.
Metadata: Evaluating additional information like timestamps, user
demographics, and location.
Historical Context: Considering past events or trends that might
influence the data.
Example:
Analyzing tweets about a political event requires understanding the event itself,
the timing of the tweets, and the users' locations to accurately interpret the
sentiments and opinions expressed.
2. Relevance
Relevance refers to the importance and applicability of the data to the specific
objectives of the analysis. Relevant data directly impacts the accuracy and
usefulness of the insights derived.
Key Components:
Objective Alignment: Ensuring the data aligns with the goals of the
analysis.
Scope: Ensuring the data covers the necessary aspects and dimensions
required.
Completeness: Ensuring the data is comprehensive and not missing
critical information.
Quality: Ensuring the data is accurate, reliable, and free from errors.
Example:
For a marketing campaign targeting young adults, data on social media usage
patterns and preferences of users aged 18-25 is highly relevant, while data on
older age groups may be less relevant.
3. Sentiment Analysis
Sentiment Analysis involves determining the emotional tone behind a series of
words to gain an understanding of the attitudes, opinions, and emotions
expressed within an online mention.
Key Components:
Polarity: Classifying the sentiment as positive, negative, or neutral.
Intensity: Measuring the strength of the sentiment expressed.
Context: Considering the context in which sentiments are expressed to
avoid misinterpretation.
Language Processing: Using natural language processing (NLP)
techniques to analyze text data.
Example:
Analyzing product reviews on an e-commerce website can reveal customers'
positive or negative sentiments towards specific product features, helping
businesses make informed decisions.
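The polarity and intensity components above can be illustrated with a very small
lexicon-based scorer. This is only a sketch: the word lists and the function names
polarity_score and classify_polarity are hypothetical and chosen for illustration,
and real sentiment analysis would rely on a trained NLP model or a much larger lexicon.

```python
import re

# Hypothetical, hand-picked word lists used only for illustration.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "fast"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "slow"}


def polarity_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    return positives - negatives


def classify_polarity(text: str) -> str:
    """Map the score to a positive, negative, or neutral label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great screen and I love the battery life",
    "Terrible strap, and the app is slow",
    "It arrived on Tuesday",
]
labels = [classify_polarity(review) for review in reviews]
print(labels)
```

The absolute value of the score gives a crude measure of intensity. Because the
scorer only counts words, it ignores context such as negation ("not good"), which is
why the Context component matters in practice.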
4. Trend Analysis
Trend Analysis involves examining data over time to identify patterns, trends,
and changes. This helps in understanding how variables of interest evolve and
predicting future movements.
Key Components:
Temporal Patterns: Identifying trends over specific time periods (daily,
monthly, yearly).
Cyclical Patterns: Recognizing repetitive patterns or cycles in the data.
Anomalies: Detecting unexpected changes or outliers in the data.
Comparative Analysis: Comparing current trends with historical data to
gauge changes.
Example:
Analyzing social media mentions of a brand over several months can reveal
trends in customer engagement, identify peak times of interest, and highlight
any significant changes in public perception.
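As a rough sketch of the temporal-pattern and anomaly components, the code below
smooths a series of daily mention counts with a moving average and flags days that
deviate strongly from the mean. The counts are made-up sample values, and the
function names and the threshold of 2 standard deviations are assumptions chosen
for illustration.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        mean(values[i : i + window])
        for i in range(len(values) - window + 1)
    ]


def find_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        index
        for index, value in enumerate(values)
        if abs(value - average) / spread > threshold
    ]


# Made-up daily mention counts for a brand; day 5 contains a spike.
daily_mentions = [100, 110, 105, 115, 120, 400, 125, 130]

smoothed = moving_average(daily_mentions, window=3)
spike_days = find_anomalies(daily_mentions)
print(smoothed)
print(spike_days)
```

The smoothed series makes the underlying direction easier to see, while the flagged
indexes point to days worth investigating, such as a product launch or a viral post.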
VALUE PYRAMID:
The value pyramid is a conceptual model used to illustrate the progression from
raw data to wisdom. This progression represents increasing value and utility at
each level. Here's a breakdown of each stage in the pyramid:
1. Noisy Data:
o Description: This is raw, unprocessed data that contains errors,
redundancies, and irrelevant information.
o Value: Low. It's difficult to derive meaningful insights from noisy
data due to its unrefined state.
2. Filtered Data:
o Description: Noisy data that has been cleaned and organized,
removing errors, redundancies, and irrelevant information.
o Value: Moderate. While still basic, filtered data is more reliable and
easier to work with than noisy data.
3. Information:
o Description: Filtered data that has been processed, organized, and
structured in a way that gives it context and meaning. This often
involves summarizing or categorizing data.
o Value: Higher. Information is useful for understanding specific
aspects of a situation or phenomenon.
4. Knowledge:
o Description: Information that has been synthesized and analyzed
to provide insights, patterns, and understanding. Knowledge is often
gained through experience or study.
o Value: High. Knowledge allows for informed decision-making and
problem-solving.
5. Wisdom:
o Description: The application of knowledge in practical, judicious,
and ethical ways. Wisdom involves deep understanding and the
ability to make sound judgments and decisions.
o Value: Highest. Wisdom enables individuals and organizations to
navigate complex situations and achieve long-term success.
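Visual Representation: [Figure: the value pyramid, from Noisy Data through Filtered
Data, Information, and Knowledge to Wisdom]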
DATA ATTRIBUTES:
The 5 V's of big data are defined as follows:
1. Velocity is the speed at which data is created and how fast it moves.
2. Volume is the amount of data collected.
3. Value is the benefit the data provides.
4. Variety is the diversity that exists in the types of data.
5. Veracity is the data's quality and accuracy.
Velocity
Velocity refers to the speed at which data is generated and processed. This is
crucial for organizations needing real-time data to make timely decisions.
Example: In healthcare, medical devices like in-hospital monitors and wearable
devices continuously collect patient data. This data needs to be transmitted and
analyzed rapidly to ensure timely medical interventions.
Volume
Volume pertains to the sheer amount of data collected. Big data is characterized
by its large volume, which requires significant storage and processing
capabilities.
Example: A retail company with hundreds of stores generates millions of
transactions daily. The total number of these transactions represents the volume
of data, qualifying it as big data.
Value
Value is the benefit derived from analyzing big data. It's essential for
organizations to extract meaningful insights to enhance their operations and
decision-making processes.
Example: By gathering and analyzing individual customer data, a company can
create detailed customer profiles. This allows for personalized marketing and
sales strategies, improving customer satisfaction and operational efficiency.
Variety
Variety refers to the different types of data collected from various sources. This
includes structured, semi-structured, and unstructured data from both internal
and external sources.
Example: An organization collects data from multiple sources like social media,
transactional databases, and customer feedback forms. The challenge lies in
standardizing and integrating this diverse data for comprehensive analysis.
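One way to picture the standardization step is to map each source's records onto a
single shared structure. The sketch below assumes hypothetical field names ("user",
"text", "customer_name", "comments") and a hypothetical Mention record; real sources
will have their own schemas.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Common schema shared by every source (hypothetical)."""
    source: str
    author: str
    text: str


def from_tweet(raw: dict) -> Mention:
    # Assumed raw shape: {"user": ..., "text": ...}
    return Mention(source="twitter", author=raw["user"], text=raw["text"])


def from_feedback_form(raw: dict) -> Mention:
    # Assumed raw shape: {"customer_name": ..., "comments": ...}
    return Mention(
        source="feedback_form",
        author=raw["customer_name"],
        text=raw["comments"],
    )


mentions = [
    from_tweet({"user": "ana", "text": "Love the new watch"}),
    from_feedback_form({"customer_name": "Ben", "comments": "Strap broke"}),
]
for mention in mentions:
    print(mention.source, mention.author, mention.text)
```

Once every record has the same fields, the same analysis code can run over all
sources together.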
Veracity
Veracity concerns the accuracy, quality, and trustworthiness of data. Ensuring
high data veracity is crucial for deriving reliable and valuable insights.
Example: Collected data may have missing values, inaccuracies, or
inconsistencies. Ensuring high veracity means implementing rigorous data
cleaning and validation processes to maintain the integrity and credibility of the
data.
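A minimal sketch of such a cleaning step is shown below. It drops records with
missing text and removes duplicates from the same author. The record layout (a
dictionary with "author" and "text" keys) and the function name clean_records are
assumptions made for this example.

```python
def clean_records(records):
    """Drop records with missing text and duplicate posts by the same author."""
    seen = set()
    cleaned = []
    for record in records:
        text = (record.get("text") or "").strip()
        if not text:
            continue  # missing or empty value
        key = (record.get("author"), text.lower())
        if key in seen:
            continue  # same author, same text (ignoring case and spacing)
        seen.add(key)
        cleaned.append({**record, "text": text})
    return cleaned


# Made-up sample records containing a duplicate and a missing value.
raw_records = [
    {"author": "ana", "text": "Love the new watch!"},
    {"author": "ana", "text": "  love the new watch!  "},
    {"author": "ben", "text": None},
    {"author": "cy", "text": "Battery drains fast"},
]
cleaned_records = clean_records(raw_records)
print(len(raw_records), "->", len(cleaned_records))
```

In terms of the value pyramid, this is the step that turns noisy data into filtered
data.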
CASTING A NET:
Casting a net in social media analytics is a comprehensive strategy to collect and
analyze data from various sources to gain a thorough understanding of public
sentiment, trends, and interactions. This approach involves using different
methods and tools to ensure no relevant information is missed.
Explanation of Casting a Net:
Casting a net involves gathering a broad and varied set of data from multiple
sources to ensure comprehensive insights. Here’s how each component
contributes to this strategy, with real-time examples and sources of information.
1. Broad Search Query
Definition:
A broad search query uses general and inclusive terms to capture a wide range
of data related to a topic. It aims to gather all possible mentions and discussions.
How It Works:
You create search queries that include broad and general terms. This helps
capture various ways people might discuss the topic, even if they use different
language or terminology.
Real-Time Example:
Scenario: A company is launching a new smartwatch and wants to understand
overall consumer interest.
Implementation: Instead of searching for “SmartWatch XYZ reviews,” they use
broader terms like “best smartwatches 2024” or “latest wearable technology.”
This approach gathers a wide array of related discussions and reviews.
Where We Get This Information:
Search Engines: Google, Bing (for articles, blog posts, and news).
Social Media Platforms: Twitter, Facebook, Instagram (for user-generated content
and comments).
2. Automation Tools
Definition:
Automation tools are software applications that automate the process of data
collection, analysis, and reporting, providing real-time insights and reducing
manual effort.
How It Works:
These tools continuously track relevant data, analyze trends, and generate
reports automatically. This allows for efficient and timely monitoring of large
volumes of information.
Real-Time Example:
Scenario: A company wants to track social media feedback on its new smartwatch
line.
Implementation: Using tools like Hootsuite or Sprout Social, the company sets up
automated tracking for mentions, sentiment analysis, and engagement metrics
related to the smartwatches. These tools provide real-time updates and alerts.
Where We Get This Information:
Automation Tools: Hootsuite, Sprout Social, Buffer (for tracking and reporting).
3. Multiple Platforms of Gathering Data
Definition:
Collecting data from various social media platforms and other sources to ensure
a comprehensive view of user interactions and sentiments.
How It Works:
Data is gathered from different platforms to capture a wide range of discussions
and interactions. This provides a holistic view of the topic or brand.
Real-Time Example:
Scenario: A smartwatch company wants to understand international reactions to
a new product launch.
Implementation: Using tools like Brandwatch, the company collects data from
Twitter, Facebook, Instagram, LinkedIn, and tech forums. This multi-platform
approach ensures they capture diverse discussions from different regions and
audiences. For example, the Twitter API can be used to collect tweets that mention
smartwatch-related keywords.
Where We Get This Information:
Social Media Platforms: Twitter, Facebook, Instagram, LinkedIn (for varied user
interactions).
Data Aggregation Tools: Brandwatch, BuzzSumo (for comprehensive data
collection).
4. Continuous Monitoring
Definition:
Ongoing tracking and analysis of data to capture real-time trends, feedback, and
changes. It involves regularly reviewing data to stay updated.
How It Works:
Continuous monitoring involves setting up alerts and frequently checking data to
track shifts in trends, sentiment, and user engagement.
Real-Time Example:
Scenario: A company is running a promotional campaign and wants to monitor
customer reactions in real time.
Implementation: Using tools like Google Alerts and Mention, the company
continuously tracks mentions and feedback related to the campaign. This
enables quick responses to emerging issues or opportunities.
Where We Get This Information:
Monitoring Tools: Google Alerts, Mention, Sprout Social (for real-time tracking
and notifications).
5. Inclusive Criteria
Definition:
Using broad and varied criteria to ensure all relevant data is captured. This
approach avoids missing significant information by employing wide-ranging
search terms and filters.
How It Works:
Inclusive criteria involve setting up diverse search terms and parameters to
cover all potential mentions and interactions related to the topic.
Real-Time Example:
Scenario: A smartwatch company is assessing the impact of a recent health
tracker watch campaign.
Implementation: The company monitors variations of the campaign’s name,
related hashtags, and general terms like “smart watch” or “health tracker
watch.” This ensures comprehensive data collection.
Where We Get This Information:
Search Engines: Google, Bing (for broad search terms).
Social Media Platforms: Twitter, Facebook, Instagram (for diverse content).
Analytics Tools: Mention, Brandwatch (for broad analysis).
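The idea of inclusive criteria can be sketched with one pattern that accepts several
spellings of the same term, using Python's re module (covered under REGULAR
EXPRESSIONS). The term variations and the names INCLUSIVE_PATTERN and is_relevant
are hypothetical; a real campaign would add its own name and hashtags.

```python
import re

# Hypothetical variations: optional hashtag, optional space/hyphen/underscore,
# optional plural, any letter case.
INCLUSIVE_PATTERN = re.compile(
    r"#?smart[\s_-]?watch(?:es)?|#?health[\s_-]?tracker",
    re.IGNORECASE,
)


def is_relevant(post: str) -> bool:
    """Return True if the post mentions any accepted variation."""
    return INCLUSIVE_PATTERN.search(post) is not None


posts = [
    "Loving my new #SmartWatch",
    "best smart watches 2024",
    "Health-tracker review coming soon",
    "I bought a new phone",
]
relevant_posts = [post for post in posts if is_relevant(post)]
print(relevant_posts)
```

A narrower pattern such as the exact string "smart watch" would miss the hashtag and
plural forms, which is the risk inclusive criteria are meant to avoid.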
REGULAR EXPRESSIONS:
Regular expressions are sequences of characters that define a search pattern, used
to match or find strings within existing data. In Python, they are provided by the
re module.
1. Match function - this function attempts to match a regular expression
pattern at the beginning of a string. The syntax for this function is:
re.match(pattern, string, flags=0)
The parameters are pattern (the regular expression), string (the text to check),
and flags (optional modifiers such as re.IGNORECASE). The re.match function
returns a match object on success and None if the string does not start with a
match.
2. Search function - this function scans the whole string for the first occurrence
of the regular expression pattern. The syntax is:
re.search(pattern, string, flags=0)
The parameters are the same: pattern, string, and flags. The re.search function
returns a match object on success and None if no match is found.
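The short example below contrasts the two functions on the same string. The sample
text is made up for illustration.

```python
import re

text = "New smartwatch launch: smartwatch sales rise"

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"smartwatch", text)
print(at_start)  # None, because the text starts with "New"

# re.search scans the string and returns the first occurrence anywhere.
first_hit = re.search(r"smartwatch", text)
print(first_hit.group(), first_hit.start())  # smartwatch 4

# Flags modify matching; re.IGNORECASE makes "new" match "New".
ignoring_case = re.match(r"new", text, flags=re.IGNORECASE)
print(ignoring_case.group())  # New
```

Always check for None before calling methods such as group() on the result, since
both functions return None when nothing matches.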