Sna QB
UNIT I INTRODUCTION
Introduction to Semantic Web: Limitations of current Web - Development of Semantic Web -
Emergence of the Social Web - Social Network analysis: Development of Social Network Analysis -
Key concepts and measures in network analysis - Electronic sources for network analysis: Electronic
discussion networks, Blogs and online communities - Web-based networks - Applications of Social
Network Analysis.
Text Book:
Peter Mika, "Social Networks and the Semantic Web", First Edition, Springer 2007
Part A
The semantic web is an extension of the current web in which information is given well-defined meaning,
better enabling computers and people to work in co-operation.
The current Web has its limitations when it comes to:
1. Finding relevant information
2. Extracting relevant information
3. Combining and reusing information
Social network analysis [SNA] is the mapping and measuring of relationships and flows between people,
groups, organizations, computers, URLs, and other connected information/knowledge entities. The nodes
in the network are the people and groups while the links show relationships or flows between the nodes.
Actors and their actions are viewed as interdependent rather than independent, autonomous units.
Relational ties (linkages) between actors are channels for transfer or “flow” of resources.
Network models focusing on individuals view the network structure environment as providing
opportunities for or constraints on individual action.
Network models conceptualize structure as lasting patterns of relations among actors.
Structural intuition
Systematic relational data
Graphic representation and mathematical or computational models.
An online community is a group of people with common interests who use the Internet (web sites, email,
instant messaging, etc.) to communicate, work together and pursue their interests over time.
8. Define the following terms from network analysis: actor, dyad, triad. (K1)
Actor: a discrete individual, corporate, or collective social unit, represented as a node in the network.
Dyad: a pair of actors and the tie(s) between them.
Triad: a triple of actors and the associated ties, i.e. a subset of three actors and the tie(s) among them.
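A minimal Python sketch of how dyads and triads can be enumerated in a small directed network. The function names `dyad_census` and `triads` and the sample actors are hypothetical; each unordered pair of actors is classified as a mutual, asymmetric, or null dyad, and every subset of three actors is one triad.

```python
from itertools import combinations


def dyad_census(actors, arcs):
    """Classify every unordered pair of actors as a mutual, asymmetric or null dyad.

    actors: iterable of actor labels; arcs: iterable of (sender, receiver) ties.
    """
    arcs = set(arcs)
    census = {"mutual": 0, "asymmetric": 0, "null": 0}
    for a, b in combinations(sorted(actors), 2):
        a_to_b, b_to_a = (a, b) in arcs, (b, a) in arcs
        if a_to_b and b_to_a:
            census["mutual"] += 1
        elif a_to_b or b_to_a:
            census["asymmetric"] += 1
        else:
            census["null"] += 1
    return census


def triads(actors):
    """Every subset of three actors; each one, with its ties, is a triad."""
    return list(combinations(sorted(actors), 3))


actors = ["A", "B", "C", "D"]
ties = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")}
print(dyad_census(actors, ties))  # {'mutual': 1, 'asymmetric': 2, 'null': 3}
print(len(triads(actors)))        # 4
```

With n actors there are n(n-1)/2 dyads and n(n-1)(n-2)/6 triads, so the six pairs and four triples above cover the whole four-actor network.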
Modularity maximization. In spite of its known drawbacks, one of the most widely used methods for
community detection is modularity maximization. Modularity is a benefit function that measures the
quality of a particular division of a network into communities.
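As an illustration of the benefit function, here is a minimal Python sketch assuming an undirected, unweighted graph and the common Newman-Girvan form Q = sum over communities c of [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c's nodes. The helper `modularity` and the two-triangle example are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of one division of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / 2m)^2].
    """
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge adds one to the degree total of both endpoint communities.
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
lumped = {n: "a" for n in range(1, 7)}
print(round(modularity(edges, split), 3))  # 0.357
print(modularity(edges, lumped))           # 0.0
```

Modularity maximization searches over divisions for the one with the highest Q; here splitting at the bridge scores higher than putting every node in one community.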
Graph Density is defined as the total number of observed lines in a graph divided by the total number of
possible lines in the same graph. Density ranges from 0 to 1.
$$D = \frac{L}{n(n-1)/2}$$
where L is the number of lines and n is the number of points (n(n-1)/2 is the number of possible lines in an undirected graph without loops).
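The density formula can be checked numerically. This Python sketch assumes an undirected graph without loops; the function name `density` is hypothetical.

```python
def density(points, lines):
    """Density of an undirected graph without loops: L / (n(n - 1) / 2)."""
    if points < 2:
        raise ValueError("density is undefined for fewer than two points")
    possible_lines = points * (points - 1) / 2
    return lines / possible_lines


print(density(5, 4))  # 0.4  (4 of 10 possible lines)
print(density(4, 6))  # 1.0  (complete graph)
```

A graph with no lines has density 0 and a complete graph has density 1, matching the stated range.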
Affiliation networks contain information about the relationships between two sets of nodes: a set of subjects
and a set of affiliations. An affiliation network can be formally represented as a bipartite graph, also known
as a two-mode network.
A distinctive feature of affiliation networks is duality: events can be described as collections of the
individuals affiliated with them, and actors can be described as collections of the events with which they are
affiliated.
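Duality can be made concrete by deriving both one-mode views from the same actor-by-event data. This is a minimal Python sketch assuming affiliations are given as a dict from actor to a set of events; `dual_views` and the sample actors and events are hypothetical.

```python
from itertools import combinations


def dual_views(affiliations):
    """Build both sides of an affiliation (two-mode) network.

    affiliations: dict mapping actor -> set of events the actor is affiliated with.
    Returns (members, actor_ties, event_ties):
      members    - event -> set of actors (the dual description of the input),
      actor_ties - (actor, actor) -> number of shared events,
      event_ties - (event, event) -> number of shared members.
    """
    members = {}
    for actor, events in affiliations.items():
        for event in events:
            members.setdefault(event, set()).add(actor)

    actor_ties = {}
    for a, b in combinations(sorted(affiliations), 2):
        shared = len(affiliations[a] & affiliations[b])
        if shared:
            actor_ties[(a, b)] = shared

    event_ties = {}
    for e, f in combinations(sorted(members), 2):
        shared = len(members[e] & members[f])
        if shared:
            event_ties[(e, f)] = shared
    return members, actor_ties, event_ties


affiliations = {"Ann": {"E1", "E2"}, "Bob": {"E2"}, "Cy": {"E1", "E2", "E3"}}
members, actor_ties, event_ties = dual_views(affiliations)
print(sorted(members["E1"]))  # ['Ann', 'Cy']
print(actor_ties)             # {('Ann', 'Bob'): 1, ('Ann', 'Cy'): 2, ('Bob', 'Cy'): 1}
print(event_ties)             # {('E1', 'E2'): 2, ('E1', 'E3'): 1, ('E2', 'E3'): 1}
```

The actor ties are the linkages between actors through shared events, and the event ties are the linkages between events through common members.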
14. What is an adjacency matrix? (K1)
An adjacency matrix is a square matrix with one row and one column for each vertex in a network. The
content of a cell in the matrix indicates the presence and possibly the sign or value of a tie from the vertex
represented by the row to the vertex represented by the column.
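A minimal Python sketch of building such a matrix from a list of ties. It assumes each tie is a (from, to, value) triple, where the value can carry a sign or weight; `adjacency_matrix` is a hypothetical helper.

```python
def adjacency_matrix(vertices, ties, directed=True):
    """Square matrix with one row and one column per vertex.

    ties: iterable of (from_vertex, to_vertex, value); the value may carry a
    sign or a weight. For an undirected network the matrix is made symmetric.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for source, target, value in ties:
        # Row = sending vertex, column = receiving vertex.
        matrix[index[source]][index[target]] = value
        if not directed:
            matrix[index[target]][index[source]] = value
    return matrix


vertices = ["A", "B", "C"]
ties = [("A", "B", 1), ("B", "C", -1), ("C", "A", 2)]
for row in adjacency_matrix(vertices, ties):
    print(row)
# [0, 1, 0]
# [0, 0, -1]
# [2, 0, 0]
```

A zero cell means no tie; for an undirected network the `directed=False` option mirrors every value across the diagonal.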
The transitivity model applies to an unsigned directed network if it consists of cliques such that cliques
within ranks are not related and cliques between ranks are related by null dyads or asymmetric dyads
pointing towards the higher rank.
In a two-mode network, vertices are divided into two sets and vertices can only be related to vertices in the
other set.
Affiliations of actors with events provide a direct linkage between actors through memberships in
events, or between events through common memberships.
Affiliations provide conditions that facilitate the formation of pairwise ties between actors.
Affiliations enable us to model the relationships between actors and events as a whole system.
Recall is the ratio of the number of relevant records retrieved to the total number of relevant records in the
database. It is usually expressed as a percentage.
Precision is the ratio of the number of relevant records retrieved to the total number of records retrieved
(relevant and irrelevant).
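Both ratios can be computed directly from the set of retrieved records and the set of relevant records. This Python sketch uses a hypothetical helper, `precision_recall`. Returning 0.0 when a set is empty is an assumed convention that is not part of the definitions above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions; multiply by 100 for percentages."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 records retrieved, 5 relevant records in the database, 2 in common.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 4, 5, 6, 7})
print(f"precision = {p:.0%}, recall = {r:.0%}")  # precision = 50%, recall = 40%
```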
A personal network is a set of human contacts known to an individual, with whom that individual would
expect to interact at intervals to support a given set of activities. In other words, a personal network is a
group of caring, dedicated people who are committed to maintaining a relationship with a person in order to
support a given set of activities.
Part B
3. Explain the Resource Description Framework (RDF) and RDF schema. (K2)
Part C
Text Book:
Peter Mika, "Social Networks and the Semantic Web", First Edition, Springer 2007
Part A
11. What is ROLAP? (K1)
ROLAP is a set of user interfaces and applications that give a relational database a dimensional flavour. ROLAP
stands for Relational Online Analytical Processing.
12. What is the need for an end user data access tool? (K1)
An end user data access tool is a client of the data warehouse. In a relational data warehouse, such a client maintains a
session with the presentation server, sending a stream of separate SQL requests to the server. Eventually the end user
data access tool is done with the SQL session and turns around to present a screen of data or a report, a graph, or
some other higher form of analysis to the user. An end user data access tool can be as simple as an ad hoc query
tool or as complex as a sophisticated data mining or modeling application.
Part C
1. Design a method for modelling and aggregating social network data. (K3)
2. Compare and contrast the E/R, UML, XML and RDF/OWL languages. Identify which one is more
efficient. Justify your answer. (K3)
UNIT III Extraction and Mining Communities in Web Social Networks
Extracting evolution of Web Community from a Series of Web Archive - Detecting communities in
social networks - Definition of community - Evaluating communities - Methods for community
detection and mining - Applications of community mining algorithms - Tools for detecting
communities social network infrastructures and communities - Decentralized online social networks
- Multi-Relational characterization of dynamic social network communities.
Text Book:
Peter Mika, "Social Networks and the Semantic Web", First Edition, Springer 2007
Part A
A clique is a subset of nodes, all of which are adjacent to each other, and there are no other nodes that are also adjacent to all of them.
Decompose data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
Discovering groups in a network where individual’s group memberships are not explicitly given.
A web community is a web site (or group of web sites) where specific content or links are only available to
its members. A web community may take the form of a social network service, an Internet forum, a group
of blogs, or another kind of social software web application.
FOAF (Friend of a Friend) is used for describing people's profiles, their relationships and their activities online. FOAF aims to
create a linked information system about people, groups, companies and other kinds of thing. If people
publish information in the FOAF document format, machines will be able to make use of that information.
A decentralized online social network (DOSN) is a distributed system for social networking with no or
limited dependency on any dedicated central infrastructure.
igraph is a free software package for creating and manipulating undirected and directed graphs. It includes
implementations for classic graph theory problems like minimum spanning trees and network flow, and also
implements algorithms for some recent network analysis methods, like community structure search.
Dynamic social networks are social networks that take into account changes over time. They not only model
relations between human beings in terms of interpersonal interactions, but also consider the evolution of these
relations, i.e. the way and the extent to which they change over time.
Mutual awareness refers to a relationship developed through observable interactions between two people. We can
define mutual awareness computationally by the contextual use of links in social media.
12. How is a web community extracted? (K1)
Web communities are extracted based on their dense bipartite patterns. Ranking significantly
improves the relevance of the extracted communities to the search topic. Instead of working on the
whole web graph, we work on a web domain, which we extract based on the topic-specific search results.
Therefore, the resulting communities are highly related to the search topic.
A virtual community is a social network of individuals who interact through specific social media, potentially
crossing geographical and political boundaries in order to pursue mutual interests or goals. Some of the most
pervasive virtual communities are online communities operating under social networking services.
A system of measurement is a collection of units of measurement and rules relating them to each other. Systems of
measurement have historically been important, regulated and defined for the purposes of science and commerce.
Systems of measurement in modern use include the metric system, the imperial system, and United States customary
units.
15. What attributes are used to represent how many URLs the focused community obtains or loses? (K1)
HTML5 defines a navigation element, <nav>, which is intended to contain the primary navigation of a web site, be it a list of
links or a form element such as a search box. This is a good idea, as before HTML5 the
navigation block was usually placed inside something like <div id="navigation">.
16. Justify the statement “The Web is extremely dynamic”. (K1)
To facilitate this task, we would appreciate it if as much metadata as possible were supplied along with the
contents, especially:
the web site address(es). If there are several web sites, please group the contents belonging to each one of
them in a separate directory;
the content addresses (URLs). If you are providing a local copy of a site, please keep the original file
names. If you are supplying contents that you gathered from the web, please provide their original URLs;
the content dates. Supply the date when each content item was published or saved. If you do not know the exact
dates, please supply approximate dates;
the content media type (MIME type). Please keep the original file name extensions of the contents
(e.g. .gif, .html, .jpg). If possible, provide the full HTTP header for each content item. It is particularly important
to provide the media type for dynamically generated contents that do not have file name extensions.
Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically
solve certain differential equations, often involving the use of the Fast Fourier Transform. The idea is to write the
solution of the differential equation as a sum of certain "basis functions" (for example, as a Fourier series which is a
sum of sinusoids) and then to choose the coefficients in the sum in order to satisfy the differential equation as well as
possible.
The Kernighan-Lin algorithm is a heuristic algorithm for the graph partitioning problem: it finds partitions
of a graph, typically a bisection into two parts of equal size, that keep the number of edges between the parts
small. It should not be confused with the Lin-Kernighan heuristic for the travelling salesperson problem.
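A minimal Python sketch of one Kernighan-Lin improvement pass. It assumes an unweighted graph stored as a dict of neighbour sets and an initial bisection that is already given. `kernighan_lin_pass` and the example graph are hypothetical. In each step, the pass swaps the unlocked pair with the largest gain, meaning the largest reduction in cut edges, and then locks both vertices. At the end, only the prefix of swaps with the best cumulative gain is kept.

```python
def kernighan_lin_pass(adj, side_a, side_b):
    """One Kernighan-Lin improvement pass for a bisection of an unweighted graph.

    adj: dict node -> set of neighbours; side_a, side_b: the current two parts.
    Returns the improved parts and the reduction in the number of cut edges.
    """
    def d_value(v, own, other):
        # External minus internal edge count of v (the "D value").
        return sum(w in other for w in adj[v]) - sum(w in own for w in adj[v])

    cur_a, cur_b = set(side_a), set(side_b)
    locked, gains, swaps = set(), [], []
    for _ in range(min(len(cur_a), len(cur_b))):
        best = None
        for x in cur_a - locked:
            for y in cur_b - locked:
                # Gain of swapping x and y; an edge x-y stays cut, hence -2.
                gain = d_value(x, cur_a, cur_b) + d_value(y, cur_b, cur_a) - 2 * (y in adj[x])
                if best is None or gain > best[0]:
                    best = (gain, x, y)
        gain, x, y = best
        # Tentatively swap x and y, then lock both for the rest of the pass.
        cur_a = (cur_a - {x}) | {y}
        cur_b = (cur_b - {y}) | {x}
        locked |= {x, y}
        gains.append(gain)
        swaps.append((x, y))

    # Keep only the prefix of swaps with the largest positive cumulative gain.
    best_k, best_total, total = 0, 0, 0
    for k, gain in enumerate(gains, start=1):
        total += gain
        if total > best_total:
            best_k, best_total = k, total
    new_a, new_b = set(side_a), set(side_b)
    for x, y in swaps[:best_k]:
        new_a = (new_a - {x}) | {y}
        new_b = (new_b - {y}) | {x}
    return new_a, new_b, best_total


# Two triangles joined by edge (3, 4), starting from a poor bisection.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = {n: set() for n in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
a, b, gain = kernighan_lin_pass(adj, {1, 2, 4}, {3, 5, 6})
print(sorted(a), sorted(b), gain)  # [1, 2, 3] [4, 5, 6] 4
```

The starting bisection cuts 5 edges, and swapping 3 and 4 reduces the cut to 1. The full algorithm repeats such passes until a pass yields no positive gain.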
Modular programming is the process of subdividing a computer program into separate sub programs. A
module is a separate software component. It can often be used in a variety of applications and functions
with other components of the system.
A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by
hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of
genes or samples, sometimes on top of heatmaps.
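The merges a dendrogram draws can be produced by agglomerative (bottom-up) hierarchical clustering. This Python sketch uses single linkage, which is only one of several linkage rules (complete and average linkage are common alternatives). The function `single_linkage` and the one-dimensional sample points are hypothetical.

```python
def single_linkage(points, dist):
    """Agglomerative (bottom-up) hierarchical clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters. Returns the merges as (cluster_1, cluster_2, distance) in order;
    a dendrogram draws each merge as a join at height `distance`.
    """
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for left, right, height in single_linkage([1.0, 2.0, 6.0, 7.5], lambda a, b: abs(a - b)):
    print(sorted(left), sorted(right), height)
# [1.0] [2.0] 1.0
# [6.0] [7.5] 1.5
# [1.0, 2.0] [6.0, 7.5] 4.0
```

Cutting the resulting tree at any height between two merge distances yields one level of the nested partitioning.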
The Girvan-Newman algorithm detects communities by progressively removing edges from the original
network. The connected components of the remaining network are the communities. Instead of trying to
construct a measure that tells us which edges are the most central to communities, the Girvan-Newman
algorithm focuses on edges that are most likely “between” communities.
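A minimal Python sketch of the first split in the Girvan-Newman procedure. It assumes an unweighted, undirected graph stored as a dict of neighbour sets. Edge betweenness is computed with a Brandes-style breadth-first accumulation and is recalculated after every removal. The helpers `edge_betweenness`, `components` and `girvan_newman_split` are hypothetical names, not a library API.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).

    adj: dict node -> set of neighbours. Returns dict frozenset({u, v}) -> score.
    """
    scores = {}
    for source in adj:
        # Breadth-first search counting shortest paths (sigma) from `source`.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        depth = dict.fromkeys(adj, -1)
        sigma[source], depth[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit over shortest paths.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                scores[edge] = scores.get(edge, 0.0) + credit
                delta[v] += credit
    # Every pair of nodes was counted once from each endpoint.
    return {edge: s / 2 for edge, s in scores.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the network splits; return components."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    initial = len(components(adj))
    while len(components(adj)) == initial:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)  # edge most likely "between" communities
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = {n: set() for n in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(edge_betweenness(adj)[frozenset((3, 4))])  # 9.0
print(girvan_newman_split(adj))  # [{1, 2, 3}, {4, 5, 6}]
```

The bridge (3, 4) lies on all 9 shortest paths between the two triangles, so it is removed first. Applying the split again to each component would continue the hierarchy.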
The graph partition problem is defined on data represented in the form of a graph G=(V,E), with V vertices
and E edges, such that it is possible to partition G into smaller components with specific properties. For
instance, a k-way partition divides the vertex set into k smaller components. A good partition is defined as
one in which the number of edges running between separated components is small.
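The quality criterion, namely the number of edges running between separated components, can be computed for any k-way partition. This Python sketch assumes the partition is given as a dict from vertex to component label; `cut_size` and the example partitions are hypothetical.

```python
def cut_size(edges, part):
    """Number of edges running between different components of a partition.

    edges: list of (u, v) pairs; part: dict node -> component label.
    Fewer such edges means a better partition under this criterion.
    """
    return sum(1 for u, v in edges if part[u] != part[v])


edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_way = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
three_way = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
print(cut_size(edges, two_way))    # 1
print(cut_size(edges, three_way))  # 4
```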
It is a known fact that solutions to a certain second order parabolic partial differential equation are
represented by means of a diffusion process or a stochastic flow.
Community mining is one of the major directions in social network analysis. However, in reality, there
exist multiple, heterogeneous social networks, each representing a particular kind of relationship, and each
kind of relationship may play a distinct role in a particular task.
Part B
1. Explain how will you extract of web community from a series of web archives? (K2)
2. Explain the various tools for detecting communities, social network infrastructures and communities. (K2)
Understanding and predicting human behaviour for social communities - User data management - Inference
and Distribution - Enabling new human experiences - Reality mining - Context - Awareness - Privacy in
online social networks - Trust in online environment - Trust models based on subjective logic - Trust network
analysis - Trust transitivity analysis - Combining trust and reputation - Trust derivation based on trust
comparisons - Attack spectrum and countermeasures.
Text Book:
Peter Mika, "Social Networks and the Semantic Web", First Edition, Springer 2007
Part A
5. What are the two different threads of research on the analysis of dynamic social networks? (K1)
Social and temporal analysis methods.
10. What are the two risk functions of the non-parametric method? (K1)
Modelling the risk function non-parametrically, estimating it, for example, by a smoothing (thin plate)
spline is attractive as a more explorative approach. For prospective studies this amounts to smoothing
within the framework and distributional assumptions of generalized regression models (for binary
observations). Case-control studies as retrospective studies with exposure to risk factors being observed do
not immediately fit into this setting.
Part B
Part C
1. Develop the four dimensions that are associated with knowledge discovery in social networks and
elaborate on their interplay in the context of evolution. (K3)
2. Identify how communities evolve into the learning process as a smoothly evolving constellation of
interacting entities. (K3)