Social Network Analytics (SNA)
• Social Network Analytics (SNA) is an essential field that leverages mathematical and
statistical tools to examine relationships within networks, offering insights into their
structure and behavior.
2. Centrality
Centrality measures identify key nodes in a network:
1. Degree Centrality: Number of direct connections a node has.
2. Betweenness Centrality: How often a node lies on the shortest paths between other nodes, i.e., its importance as a bridge.
3. Eigenvector Centrality: Influence of a node based on its connections to other influential nodes.
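As a minimal sketch of the simplest of these measures, degree centrality can be computed directly from an edge list. The helper names (`build_adjacency`, `degree_centrality`) and the toy network are illustrative assumptions, and the normalization by n - 1 is one common convention.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

# Hypothetical toy network: "a" is connected to every other node.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
centrality = degree_centrality(build_adjacency(edges))
```

Here `centrality["a"]` is 1.0 because "a" touches all three other nodes, while "d" scores 1/3.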
3. Clustering Coefficient
•Measures the likelihood that a node's neighbors are also connected to each other.
•A high clustering coefficient indicates tightly knit groups, suggesting cohesive sub-networks.
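The local clustering coefficient can be sketched as the fraction of neighbor pairs that are themselves linked. The function name and the small adjacency map are assumptions made for illustration; the graph is treated as undirected.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbors that are directly connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical undirected network as {node: set of neighbors}.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficient_a = local_clustering(adj, "a")
coefficient_b = local_clustering(adj, "b")
```

Node "a" has three neighbor pairs of which only (b, c) is linked, giving 1/3; node "b" has a single neighbor pair that is linked, giving 1.0.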
4. Community Detection
•Identifies clusters of nodes that are more interconnected with each other than with the rest of the network.
•Useful for uncovering subgroups or communities with shared attributes or frequent interactions.
5. Homophily
•Refers to the tendency of nodes with similar attributes (e.g., interests, behaviors) to connect.
•This similarity-driven connectivity shapes the structure and dynamics of social networks.
6. Small-World Phenomenon
•Describes the property of most nodes being reachable within a few steps, even in large networks.
•Highlights the efficiency of social networks in spreading information or influence.
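One way to make the small-world idea (most nodes reachable within a few steps) concrete is to measure the average shortest-path length with breadth-first search. The sketch below assumes an unweighted, undirected graph; the function names and the six-node ring are illustrative only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical ring of six nodes, then the same ring with one shortcut (0-3).
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {node: set(nbrs) for node, nbrs in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

ring_length = average_path_length(ring)
shortcut_length = average_path_length(shortcut)
```

Adding a single long-range tie lowers the average path length from 1.8 to about 1.67 in this toy case, which is the mechanism behind short paths in large networks.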
Methodologies in Social Network Analytics (SNA)
1. Data Collection
•Gathering data from diverse sources like social media platforms, organizational databases, and
surveys.
•Representing data using adjacency matrices (tabular representation of connections) or edge lists
(pairs of connected nodes).
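The two representations mentioned above carry the same information. A minimal sketch, assuming an undirected, unweighted network and a hypothetical helper name `edge_list_to_matrix`, converts an edge list into an adjacency matrix:

```python
def edge_list_to_matrix(edges):
    """Return (sorted node list, symmetric 0/1 adjacency matrix)."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: mirror the entry
    return nodes, matrix

# Hypothetical edge list: ann-bob and bob-cat are connected.
nodes, matrix = edge_list_to_matrix([("ann", "bob"), ("bob", "cat")])
```

Edge lists are compact for sparse networks, while the matrix form makes it cheap to look up whether any given pair is connected.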
2. Network Visualization
•Using tools like Gephi, Cytoscape, or NetworkX to create visual representations of networks.
•Visualization helps detect patterns, clusters, and key connections within the network.
3. Descriptive Analysis
•Calculating metrics such as degree centrality, clustering coefficients, and identifying
community structures.
•Provides a summary of the network’s properties and highlights influential nodes and patterns.
4. Centrality Analysis
•Identifying central nodes to determine their influence or strategic importance.
•Useful for pinpointing hubs (highly connected nodes) or intermediaries critical for network flow.
5. Community Detection
•Using algorithms such as the Louvain method, which greedily optimizes modularity, to group nodes into
communities.
•Helps understand the internal structure and dynamics of the network.
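Modularity, the quantity that the Louvain method tries to increase, can be computed for any given partition. The sketch below only scores a partition; it is not an implementation of Louvain itself. It assumes an undirected, unweighted graph without self-loops, and the function name and example graph are illustrative.

```python
def modularity(edges, community_of):
    """Modularity Q = sum over communities of (L_c / m) - (d_c / (2m))^2.

    L_c: edges inside community c, d_c: total degree of c, m: total edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Hypothetical graph: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph at the bridge scores 5/14 (about 0.357), while lumping all nodes together scores 0, so the two-triangle partition is the better community structure by this measure.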
Applications of Social Network Analytics
1. Social Media Analysis
•Understanding information flow, identifying influencers, and detecting communities of interest.
•Monitoring sentiment and trends on platforms like Twitter, Facebook, and LinkedIn.
4. Counterterrorism
•Analyzing networks of extremist groups to identify key players and vulnerabilities.
•Enhancing strategies to disrupt harmful networks effectively.
2. Task Types
•Link Prediction: Predict potential connections between nodes (e.g., suggesting friends on social
media).
•Node Classification: Assign labels to nodes based on features and network structure (e.g., spam
detection).
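Link prediction is often introduced with a neighborhood-overlap score. The sketch below ranks unconnected pairs by Jaccard similarity of their neighbor sets; this is one simple heuristic chosen for illustration, not a method named in the source, and the function names and toy graph are assumptions.

```python
def jaccard_score(adj, u, v):
    """Shared neighbors divided by all neighbors of u and v."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)

def rank_candidate_links(adj):
    """Score every currently unconnected pair, highest score first."""
    nodes = sorted(adj)
    scores = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                scores.append((jaccard_score(adj, u, v), u, v))
    return sorted(scores, reverse=True)

# Hypothetical friendship graph as {node: set of neighbors}.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_candidate_links(adj)
best_score, best_u, best_v = ranked[0]
```

The pair ("a", "d") ranks first with a score of 2/3 because both of a's friends are also friends of d, which is the intuition behind friend suggestions.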
4. Temporal Dynamics
•Dynamic Graph Learning: Analyze networks that evolve over time to capture changes.
•Time-Aware Embeddings: Integrate temporal aspects into embeddings for tasks like trend
analysis.
Methods in Social Network Learning
1. Supervised Learning
•Classification and Regression: Train models using labeled data for tasks like link prediction.
•Ensemble Methods: Combine predictions from multiple models for robust results.
2. Unsupervised Learning
•Clustering: Group nodes with similar features or structures (e.g., k-means, spectral clustering).
•Community Detection: Identify densely connected node groups without using labels.
•Recommendation Systems
•Suggesting friends, products, or content based on social interactions and preferences.
•Fraud Detection
•Identifying fraudulent activities by spotting anomalies in network interactions.
•Collaborative Filtering
•Recommending items by analyzing preferences of similar users in social networks.
•Healthcare Analytics
•Studying collaboration among healthcare professionals to identify key influencers and optimize
information flow.
Relational Neighbor Classifier (RNC)
• The Relational Neighbor Classifier (RNC) is a machine learning
algorithm tailored for graph-structured data, leveraging the inherent
relationships between entities to improve classification accuracy.
• It is particularly useful for applications involving interconnected
systems, such as social networks, knowledge graphs, and biological
networks.
Key Components of Relational Neighbor Classifier
1. Relational Representation
•Graph Structure: Data is represented as a graph with nodes (entities) and edges (relationships),
capturing relational dependencies.
2. Relational Features
•Node Features: Attributes associated with individual nodes, such as demographic data or transactional
history.
•Edge Features: Characteristics of edges, such as weights or types, which may represent relationship
strength or type (e.g., "friend," "colleague").
3. Relational Learning
•Neighbor Information: The algorithm assumes that the class of a node is influenced by the classes of
its neighboring nodes.
•Label Propagation: Information (e.g., class labels) flows from neighbors to nodes, leveraging the
graph's structure to infer missing or unknown labels.
4. Classification Model
•Classifier Type: Typically uses standard classifiers (e.g., decision trees, logistic regression, or support
vector machines) but adapts them to incorporate relational features.
•Relational Integration: Extends traditional classifiers by embedding features derived from neighbors
and relationships.
Workflow of Relational Neighbor Classifier
•Graph Representation
•Transform data into a graph format with nodes (entities) and edges (relationships).
•Assign attributes or features to nodes and edges.
•Feature Extraction
•Extract node-specific features and relationship-specific features.
•Aggregate information from neighboring nodes, such as the average class distribution or feature
similarity.
•Learning Relational Features
•Incorporate neighbor-based information into node features, such as weighted averages or majority
voting from neighboring labels.
•Employ methods like label propagation to enhance feature richness.
•Classifier Training
•Use the enriched relational features to train a classification model.
•The model learns both individual and relational characteristics for accurate predictions.
•Prediction
•For a new or unlabeled node, combine its own features with propagated information from neighbors to
predict its class label.
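The neighbor-voting part of this workflow can be sketched in a few lines. The example covers only the weighted majority vote over known neighbor labels; it does not include node features or a trained classifier. The function name, the edge weights, and the labels are all hypothetical.

```python
from collections import defaultdict

def relational_neighbor_predict(node, neighbors, labels):
    """Predict a class from the weighted votes of labeled neighbors.

    neighbors: {node: {neighbor: edge weight}}
    labels:    {node: class} for nodes whose class is known
    Returns (predicted class or None, normalized vote share per class).
    """
    votes = defaultdict(float)
    for nbr, weight in neighbors[node].items():
        if nbr in labels:  # unlabeled neighbors do not vote
            votes[labels[nbr]] += weight
    total = sum(votes.values())
    if total == 0:
        return None, {}
    scores = {cls: w / total for cls, w in votes.items()}
    return max(scores, key=scores.get), scores

# Hypothetical example: node "u" has three labeled neighbors.
neighbors = {"u": {"a": 3.0, "b": 1.0, "c": 1.0}}
labels = {"a": "spam", "b": "ham", "c": "ham"}
predicted, scores = relational_neighbor_predict("u", neighbors, labels)
```

Although two of the three neighbors are "ham", the strong tie to "a" gives "spam" a 0.6 share of the vote, which shows how edge weights change the outcome relative to a plain majority.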
Probabilistic Relational Neighbor Classifier (PRNC)
• The Probabilistic Relational Neighbor Classifier (PRNC) extends the
traditional Relational Neighbor Classifier (RNC) by incorporating
probabilistic models to handle uncertainty in relationships and
attributes within graph-structured data.
• It is particularly useful in scenarios where data is incomplete, noisy, or
inherently uncertain, enabling a richer and more nuanced
understanding of the underlying patterns.
Key Components of PRNC
1. Graph Representation
•Graph Structure: Represents data as a graph where nodes symbolize entities, and edges
represent relationships. This structure captures the relational dependencies critical for accurate
predictions.
5. Probabilistic Classifier
•Bayesian Inference: Computes the posterior distribution of class
labels by integrating node features, edge probabilities, and
relational dependencies.
•Uncertainty Estimation: Provides probabilistic outputs, offering
both predictions and confidence measures.
Workflow of PRNC
1. Graph Representation
•Structure data as a graph, assigning features to nodes and associating probabilities with edges.
2. Probabilistic Modeling
•Construct a probabilistic graphical model to define the joint distribution of nodes, attributes,
and edges.
3. Learning Probabilistic Features
•Use techniques like expectation-maximization (EM) or variational inference to infer latent features
and optimize edge probabilities.
4. Classifier Training
•Train a probabilistic classifier, such as a Bayesian classifier, using the learned features and
relational probabilities.
5. Probabilistic Prediction
•For new or unlabeled nodes, compute the posterior probability distribution over class labels,
capturing both predictions and associated uncertainties.
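The prediction step can be illustrated with a deliberately simplified stand-in: instead of a full graphical model with EM or variational inference, the sketch below repeatedly averages neighbors' class distributions, weighted by edge probabilities, and returns a distribution rather than a single label. The function name, edge probabilities, and class names are assumptions for illustration.

```python
def propagate_class_probabilities(edge_prob, known, classes, iterations=20):
    """Estimate a class distribution for every node.

    edge_prob: {(u, v): probability that the undirected tie exists}
    known:     {node: class} for labeled nodes (kept fixed)
    """
    nbrs = {}
    for (u, v), p in edge_prob.items():
        nbrs.setdefault(u, {})[v] = p
        nbrs.setdefault(v, {})[u] = p
    uniform = {c: 1.0 / len(classes) for c in classes}
    belief = {
        n: ({c: float(c == known[n]) for c in classes} if n in known
            else dict(uniform))
        for n in nbrs
    }
    for _ in range(iterations):
        updated = {}
        for n in nbrs:
            if n in known:
                updated[n] = belief[n]
                continue
            # Weight each neighbor's current belief by the edge probability.
            scores = {
                c: sum(p * belief[m][c] for m, p in nbrs[n].items())
                for c in classes
            }
            total = sum(scores.values())
            updated[n] = (
                {c: s / total for c, s in scores.items()} if total
                else dict(uniform)
            )
        belief = updated
    return belief

# Hypothetical example: "x" is probably tied to a fraud node, less so to a legit one.
edge_prob = {("x", "a"): 0.9, ("x", "b"): 0.3}
known = {"a": "fraud", "b": "legit"}
belief = propagate_class_probabilities(edge_prob, known, ["fraud", "legit"])
```

The result for "x" is a distribution (0.75 fraud, 0.25 legit) rather than a hard label, so the output carries a confidence measure alongside the prediction.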
Egonets
• Egonets, or egocentric networks, focus on the local network surrounding a specific node (the ego) in a larger
network.
• This approach is essential for understanding immediate relationships, dynamics, and structures within a
localized context.
• Egonet analysis helps researchers and practitioners study personalized network interactions, such as social
behavior, influence, and communication patterns.
Concepts in Egonets
1. Ego Node
•The central node of the egonet, representing the individual or entity whose local network is under
analysis.
2. Egonet
•A localized network consisting of the ego node and all its direct neighbors (alters). This network
focuses on direct connections and relationships.
3. Ties
•The relationships between the ego and its alters or between alters themselves. Ties can be:
•Directed: Representing one-way relationships (e.g., follower-following relationships on social
media).
•Undirected: Representing mutual relationships (e.g., friendship).
4. Network Metrics
Various metrics quantify egonet properties:
•Degree Centrality: Number of ties connected to the ego.
•Clustering Coefficient: Measures how interconnected the ego’s neighbors are.
•Reciprocity: Ratio of mutual relationships among the ego’s ties.
•Ego Density: Proportion of possible ties among the ego’s alters that are realized.
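Extracting an egonet and computing its ego density can be sketched as follows, assuming an undirected graph stored as a map from each node to its neighbor set. The function names and the sample network are hypothetical.

```python
from itertools import combinations

def egonet(adj, ego):
    """Subgraph containing the ego, its alters, and the ties among them."""
    members = {ego} | adj[ego]
    return {node: adj[node] & members for node in members}

def ego_density(adj, ego):
    """Proportion of possible alter-alter ties that actually exist."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return ties / (k * (k - 1) / 2)

# Hypothetical network; "x" is two steps from the ego and so not an alter.
adj = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b", "x"},
    "b": {"ego", "a"},
    "c": {"ego"},
    "x": {"a"},
}
local = egonet(adj, "ego")
density = ego_density(adj, "ego")
```

Node "x" is dropped from the egonet, and the density is 1/3 because only one of the three possible alter pairs (a, b) is tied. For an undirected graph this value coincides with the ego's local clustering coefficient.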
Egonet Analysis
• Egonet analysis involves examining the structural and functional properties of an
ego’s local network to uncover patterns, behaviors, or trends.
1. Steps in Egonet Analysis
1. Data Collection: Identify the ego node and gather data on its direct connections and
relationships.
2. Visualization: Map the egonet to visualize the connections and ties.
3. Metric Computation: Calculate metrics like degree centrality, clustering coefficient, and
density.
4. Interpretation: Analyze the metrics and structures to draw insights into the ego’s role in the
broader network.
2. Types of Analysis
1. Descriptive Analysis: Focuses on the structure and metrics of the egonet.
2. Comparative Analysis: Compares egonets of different nodes to identify patterns or
variations.
3. Predictive Analysis: Uses egonet properties to predict behaviors, influence, or network
evolution.
Analysis of Egonets:
1. Degree Distribution:
• The degree of a node in an egonet represents the number of direct connections it has. Analyzing the degree
distribution of an egonet provides insights into the ego’s popularity or connectivity within its immediate network.
2. Clustering Coefficient:
• The clustering coefficient measures the extent to which the neighbors of the ego are connected to each other. A
high clustering coefficient indicates that the ego’s contacts are likely to be interconnected.
3. Reciprocity:
• Reciprocity in an egonet refers to the likelihood that connections are mutual. In social networks, this could
indicate mutual friendships or interactions.
4. Centrality Measures:
• Degree centrality, closeness centrality, and betweenness centrality are examples of centrality measures that can
be calculated for nodes within an egonet. These measures help identify key nodes and their influence within the
local network.
Mobile Analytics
Mobile analytics involves collecting, measuring, and analyzing data
from mobile platforms to understand user behavior, optimize
experiences, and make informed decisions.
Components
1. Data Collection:
•Tracking user interactions, device info, and in-app events through SDKs or
APIs.
2. User Identification:
•Methods like device fingerprinting or authentication to track user journeys.
3. Event Tracking:
•Monitoring specific user actions, e.g., app launches, purchases.
4. User Segmentation:
•Grouping users by demographics or behavior for targeted analysis.
5. Funnel Analysis:
•Mapping the step-by-step user journey to identify drop-off points.
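Funnel analysis can be sketched as counting how many users survive each successive step. The version below is a simplification: it checks whether a user ever triggered each event and ignores the order and timing of events. Step names, user IDs, and the function name are hypothetical.

```python
def funnel_report(steps, user_events):
    """Return (step, users remaining, conversion from previous step) per step.

    user_events: {user id: set of event names the user triggered}
    """
    remaining = set(user_events)
    report = []
    previous = None
    for step in steps:
        remaining = {u for u in remaining if step in user_events[u]}
        count = len(remaining)
        if previous is None:
            rate = None  # first step has no previous step to convert from
        else:
            rate = count / previous if previous else 0.0
        report.append((step, count, rate))
        previous = count
    return report

# Hypothetical event data for four users.
steps = ["launch", "view_item", "add_to_cart", "purchase"]
user_events = {
    "u1": {"launch", "view_item", "add_to_cart", "purchase"},
    "u2": {"launch", "view_item"},
    "u3": {"launch"},
    "u4": {"launch", "view_item", "add_to_cart"},
}
report = funnel_report(steps, user_events)
```

The counts fall from 4 to 3 to 2 to 1, so the largest relative drop-off in this toy data is at the final purchase step, where only half of the users who added to cart convert.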
Metrics
1. User Acquisition:
•Installations and source tracking (organic, paid, referrals).
2. User Engagement:
•Session duration; daily, weekly, and monthly active users (DAU, WAU, MAU).
3. Retention Rates:
•Day 1, Day 7, and Day 30 retention rates.
4. Monetization:
•Average revenue per user (ARPU), conversion rates.
5. User Behavior:
•Event tracking, screen views.
6. Performance Metrics:
•Crash counts, app load times.
7. Geolocation & Device Info:
•Device types, geographic distribution.
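Two of these metrics can be sketched with plain date arithmetic. The example assumes one common definition of Day-N retention (a user counts as retained if active exactly N days after install) and ARPU as revenue divided by active users; analytics tools differ in the exact definitions. All names and figures are hypothetical.

```python
from datetime import date, timedelta

def day_n_retention(install_dates, activity, n):
    """Share of installers who were active exactly n days after installing.

    install_dates: {user: install date}
    activity:      {user: set of dates on which the user opened the app}
    """
    if not install_dates:
        return 0.0
    retained = sum(
        1
        for user, installed in install_dates.items()
        if installed + timedelta(days=n) in activity.get(user, set())
    )
    return retained / len(install_dates)

def arpu(total_revenue, active_users):
    """Average revenue per user over some reporting period."""
    return total_revenue / active_users if active_users else 0.0

# Hypothetical cohort of four users.
install_dates = {
    "u1": date(2024, 1, 1),
    "u2": date(2024, 1, 1),
    "u3": date(2024, 1, 2),
    "u4": date(2024, 1, 2),
}
activity = {
    "u1": {date(2024, 1, 2), date(2024, 1, 8)},
    "u2": {date(2024, 1, 2)},
    "u3": {date(2024, 1, 3)},
}
day1 = day_n_retention(install_dates, activity, 1)
day7 = day_n_retention(install_dates, activity, 7)
revenue_per_user = arpu(500.0, 200)
```

In this toy cohort three of four users return on Day 1 (0.75) but only one returns on Day 7 (0.25), the kind of decay curve retention reports are meant to expose.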
Tools
1. Google Analytics for Mobile:
•User behavior and engagement analysis.
2. Firebase Analytics:
•Real-time analytics with attribution tracking.
3. Flurry Analytics:
•Retention analysis, demographics insights.
4. Mixpanel:
•Focused on user behavior and A/B testing.
5. Amplitude:
•Cohort analysis and predictive analytics.
6. Localytics:
•Engagement and conversion tracking with push notifications.
Practices of Analytics in Google
• Google Analytics (Web & App Analytics)
1. Web Analytics:
Tracks website traffic, user interactions, demographics, and conversion rates.
2. Event Tracking:
Monitors user engagements such as clicks, form submissions, and video views.
3. E-commerce Analytics:
Provides insights into transactions, revenue, and purchase behaviors.
• Google Ads Analytics
1. Ad Performance Metrics:
Tracks CTR, CPC, and conversion rates to measure ad campaign effectiveness.
2. Conversion Tracking:
Measures user actions after ad interactions to calculate ROI.
3. Audience Insights:
Analyzes user demographics and interests for precise targeting.
• Google Search Console
1. Search Performance Analytics:
Insights into search queries, clicks, impressions, and CTR for SEO optimization.
• Firebase Analytics (Mobile App Analytics)
1. App Analytics:
Tracks user behavior, in-app events, and conversion rates for mobile apps.
2. User Attribution:
Identifies user acquisition sources and marketing channel effectiveness.
• Google Cloud Platform (Big Data & Visualization)
1. BigQuery:
Real-time analysis of large datasets for machine learning and data analytics.
2. Data Studio (now Looker Studio):
Interactive dashboards for visualizing data from multiple sources.
• Google Trends
1. Search Trends Analysis:
Tracks the popularity of search queries over time.
2. Geographical Insights:
Analyzes regional interest variations to guide localized strategies.
• Google Cloud AI & Machine Learning
1. Machine Learning Services:
Tools like TensorFlow and AutoML for implementing ML models.
2. Predictive Analytics:
Forecasts trends and identifies patterns using ML models.
6. Gaming Analytics
• Xbox Analytics: Tracks player behavior, preferences, and engagement.
• Game Development Analytics: Optimizes gameplay based on real-time player feedback.
Practices of Analytics on Kaggle
1. Exploratory Data Analysis (EDA)
• Data Exploration:
Kagglers begin by examining datasets to understand distributions, detect missing
values, and explore variable relationships.
• Visualization:
Tools like Matplotlib, Seaborn, and Plotly are used to create visualizations that
reveal patterns, trends, and anomalies in data.
2. Feature Engineering
•Creating New Features:
Participants derive new features by combining or transforming existing variables to
enhance predictive performance.
•Handling Categorical Variables:
Techniques such as one-hot encoding, label encoding, and target encoding are
commonly used to prepare categorical data for models.
3. Model Building
• Algorithm Selection:
Kagglers experiment with a variety of machine learning models, such as random
forests, gradient boosting (e.g., XGBoost, LightGBM, CatBoost), neural networks,
and ensemble methods.
• Hyperparameter Tuning:
Systematic optimization of algorithm parameters (using techniques like grid search
or Bayesian optimization) ensures models achieve their best performance.
4. Ensemble Methods
• Stacking Models:
Kagglers combine predictions from multiple models to improve accuracy, often
stacking diverse models to leverage their strengths.
• Voting Systems:
Weighted averages or majority voting methods are used to combine model
predictions for more robust outcomes.
5. Validation Strategies
• Cross-Validation:
Techniques like k-fold cross-validation help ensure models generalize well
to unseen data, reducing the risk of overfitting.
• Time Series Splitting:
For time-dependent data, Kagglers use forward-chaining cross-validation
to maintain the chronological order of observations.
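Forward-chaining validation can be sketched as a series of expanding training windows, each followed by a test block that lies strictly later in time. The splitting rule below (equal-sized folds, last fold absorbing the remainder) is one reasonable choice made for illustration, and the function name is hypothetical.

```python
def forward_chaining_splits(n_samples, n_splits):
    """Yield (train indices, test indices) with training always before test."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    splits = []
    for i in range(1, n_splits + 1):
        train_end = fold * i
        # The last test block takes any leftover samples.
        test_end = n_samples if i == n_splits else train_end + fold
        splits.append((list(range(train_end)), list(range(train_end, test_end))))
    return splits

# Ten chronologically ordered observations, three splits.
splits = forward_chaining_splits(10, 3)
```

With ten observations and three splits, the training windows are the first 2, 4, and 6 observations, and each test block starts immediately after its training window, so no future data leaks into training.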
4. Privacy Analytics
•Data Access Controls:
Analytics monitor and enforce strict controls on user data access.
•Privacy Impact Assessments:
Before new features are introduced, their impact on user privacy is evaluated.
•User Transparency Analytics:
Usage of privacy tools is analyzed to improve how users interact with privacy settings and to
enhance transparency.
5. Trend and Virality Analysis
• Topic Modeling:
Textual data analysis identifies trending topics to highlight relevant content.
• Virality Metrics:
Metrics like content shares, spread speed, and engagement help prioritize
trending posts in user feeds.