
NOTES OF SOCIAL NETWORK ANALYSIS

Module-1: Introduction to Semantic Web


Semantic Web
The Semantic Web is the application of advanced knowledge technologies to the Web
and distributed systems in general.
Information that is missing or hard to access for our machines can be made accessible
using ontologies.
Ontologies are formal, which allows a computer to emulate human ways of reasoning
with knowledge.
Ontologies carry a social commitment toward using a set of concepts and relationships in
an agreed way.
The Semantic Web adds another layer on the Web architecture that requires agreements
to ensure interoperability.

Limitations of Current Web


In the case of the Web, users have adapted to the primary interface to the vast information that constitutes the Web: the search engine. The following are questions that search engines at the moment cannot answer satisfactorily, or at all:
1. Who is Frank van Harmelen?
To answer such a question using the Web, one would go to a search engine and enter the most logical keyword: harmelen. The results returned can relate to different people or things: it can be a small town in the Netherlands, or the site of a tragic train accident. Further, English has the natural problem of polysemy, where the same word can have different meanings. Search engines know that users are not likely to look at more than the top ten results, and are therefore programmed in such a way that the first page shows a diversity of the most relevant links related to the keyword. This allows the user to quickly realize the ambiguity of the query and to make it more specific.

2. Show the photo of Paris


The search engine fails to discriminate between two categories of images: (i) those related to the city of Paris and (ii) those showing Paris Hilton. While the search engine does a good job with retrieving documents, the results of image searches in general are disappointing. For the keyword Paris most of us would expect photos of places in Paris or maps of the city. In reality only about half of the photos on the first page, a quarter of the photos on the second page and a fifth on the third page are directly related to our concept of Paris. The rest are about clouds, people, signs, diagrams etc.
Problems:
Associating photos with keywords is a much more difficult task than simply looking for keywords in the texts of documents.
Automatic image recognition is currently a largely unsolved research problem.
Search engines attempt to understand the meaning of the image solely from its context.

3. Find new music that one likes:


This is a difficult query. From the perspective of automation, music retrieval is just as problematic as image search. Search engines for music do not exist, for different reasons: most music on the internet is shared illegally through peer-to-peer systems that are completely out of reach for search engines. Music is also a fast-moving good; search engines typically index the Web once a month and are therefore too slow for the fast-moving world of music releases. On the other hand, our musical taste might change, in which case this query would need to change its form. A description of our musical taste is something that we might list on our homepage, but it is not something that we would like to keep typing in again for accessing different music-related services on the internet.

4. Tell me about music players with a capacity of at least 4GB


This is a typical e-commerce query: looking for a product with certain characteristics. One of the immediate concerns is that translating this query from natural language to the boolean language of search engines is (almost) impossible. The search engine will not know that 4GB is the capacity of the music player. The problem is that general purpose search engines do not know anything about music players or their properties, nor how to compare such properties. Another, bigger problem for our machines is trying to collect and aggregate product information from the Web. The information extraction methods used for this purpose have a very difficult task, and it is easy to see why if we consider how a typical product description page looks to the eyes of the computer. Even if an algorithm can determine that the page describes a music player, information about the product is very difficult to spot. Further, what one vendor calls “capacity” another may call “memory”. In order to compare music players from different shops we need to determine that these two properties are actually the same, so that we can directly compare their values.
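To see why aligning vendor vocabularies matters, here is a minimal sketch (all product names and field names are hypothetical) of mapping vendor-specific property names onto a shared vocabulary before comparing values:

```python
# Hypothetical sketch: aligning vendor-specific property names
# ("capacity" vs. "memory") onto one shared vocabulary before comparing.

# Mapping from each vendor's field name to a canonical property.
CANONICAL = {"capacity": "storage_gb", "memory": "storage_gb"}

shop_a = {"name": "PlayerOne", "capacity": "8GB"}
shop_b = {"name": "SoundMax", "memory": "4GB"}

def normalize(product: dict) -> dict:
    """Translate vendor fields to canonical properties with numeric values."""
    result = {"name": product["name"]}
    for field, value in product.items():
        if field in CANONICAL:
            result[CANONICAL[field]] = float(value.rstrip("GB"))
    return result

# With a shared vocabulary, "capacity of at least 4GB" becomes answerable.
players = [normalize(p) for p in (shop_a, shop_b)]
print([p["name"] for p in players if p["storage_gb"] >= 4])
```

This is essentially what an ontology does on the Semantic Web: it records, once and in machine-readable form, that two vendor terms denote the same property.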

These questions are arbitrary in their specificity, but they illustrate a general problem in accessing the vast amounts of information on the Web. Namely, in all these cases we deal with a knowledge gap: what the computer understands and is able to work with is much more limited than the knowledge of the user. In most cases the knowledge gap is due to the lack of some kind of background knowledge that only the human possesses. This background knowledge is often completely missing from the context of the Web page, and thus our computers do not even stand a fair chance when working on the basis of the web page alone.

Development of Semantic web


Evolution (Research)

The Semantic Web has been actively promoted by the World Wide Web Consortium.
Knowledge representation and reasoning in the context of the Semantic Web involve
structuring data in a way that allows machines to understand and process information
meaningfully. The Semantic Web aims to create a universal framework that links data
across various sources using standardized formats like RDF (Resource Description
Framework) and OWL (Web Ontology Language). These standards enable the
representation of complex relationships between data, facilitating automated reasoning
and inferencing.
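As a small illustrative sketch (using the open-source Python rdflib library; the resource names are made up), an RDF statement is a subject-predicate-object triple:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

# Hypothetical example: describing a person as linked data.
EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.frank, RDF.type, FOAF.Person))                     # "frank is a Person"
g.add((EX.frank, FOAF.name, Literal("Frank van Harmelen")))  # "frank's name is ..."

# Serialize the graph in the human-readable Turtle format.
print(g.serialize(format="turtle"))
```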
Natural Language Processing (NLP) plays a crucial role in the Semantic Web by
enabling the extraction, interpretation, and generation of meaningful information from
human language. NLP techniques are used to convert unstructured text into structured
data that can be understood and processed by machines. This involves tasks such as
entity recognition, relationship extraction, sentiment analysis, and automatic
summarization. By integrating NLP with the Semantic Web, it becomes possible to
enhance search engines, improve data retrieval, and enable more intuitive human-
computer interactions, ultimately making the web more accessible and useful for users.
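For instance, entity recognition, one of the NLP tasks listed above, can be sketched with the spaCy library (assuming its small English model has been downloaded; the sentence is made up):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Frank van Harmelen is a professor in Amsterdam.")

# Each recognized entity turns unstructured text into structured data.
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Frank van Harmelen" PERSON
```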


Information retrieval in the context of the Semantic Web leverages structured data and
ontologies to improve the precision and relevance of search results. Unlike traditional
search engines that rely on keyword matching, Semantic Web technologies use
metadata and linked data principles to understand the context and relationships between
concepts. This allows for more sophisticated query processing, enabling the retrieval of
information based on the meaning and semantics of the data rather than just text
matches.
Semantic Web Layered Cake

The Semantic Web is often described as a layered cake, illustrating its architecture and the
various technologies that build upon one another to enable meaningful data interchange and
automated reasoning on the web. Here’s a brief overview of the layers in the Semantic Web
stack:
1. Unicode and URI (Uniform Resource Identifier):
o Unicode ensures that all characters are represented in a standardized form.
o URI provides a unique identifier for resources on the web.
2. XML (eXtensible Markup Language):
o A flexible text format used to encode documents in a machine-readable form.
3. Namespaces:
o Allow the use of XML elements and attributes from different vocabularies.
4. RDF (Resource Description Framework):
o A framework for representing information about resources in the web, using
triples (subject, predicate, object).
5. RDFS (RDF Schema):
o A semantic extension of RDF that provides mechanisms for describing groups
of related resources and the relationships between them.
6. OWL (Web Ontology Language):
o A language for defining and instantiating web ontologies, providing richer
integration and interoperability of data among diverse applications.
7. SPARQL (SPARQL Protocol and RDF Query Language):
o A query language and protocol used to query RDF datasets (a small example follows this list).


8. RIF (Rule Interchange Format):


o A framework for exchanging rules among different systems and integrating
them with RDF and OWL.
9. Logic and Proof:
o Involves using logical reasoning to derive new information and ensure the
correctness of data.
10. Trust:
o Mechanisms to establish the trustworthiness of data sources and the information
provided.
11. Cryptography:
o Ensures data integrity, authentication, and secure communication.
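To make the RDF and SPARQL layers concrete, here is a minimal sketch with the Python rdflib library, reusing the kind of toy graph shown earlier (the data is hypothetical):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.frank, RDF.type, FOAF.Person))
g.add((EX.frank, FOAF.name, Literal("Frank van Harmelen")))

# SPARQL: select the names of all resources typed as foaf:Person.
query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name
    WHERE { ?person a foaf:Person ; foaf:name ?name . }
"""
for row in g.query(query):
    print(row.name)  # Frank van Harmelen
```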

Innovation and Technology Adoption


The Semantic Web was originally conceptualized as an extension of the current Web, i.e. as the application of metadata for describing Web content. In this vision, metadata would be attached to the content that is already on the Web. However, this vision was soon considered to be less realistic, as it would be too difficult to master for the average person contributing content to the Web.
The alternative view predicted that the Semantic Web would first break through behind the scenes, not with ordinary users but among large providers of data and services. The problem is that, as a technology for developers, users of the Web never experience the Semantic Web directly, which makes it difficult to convey Semantic Web technology to stakeholders. Further, most of the time the gains for developers are achieved over the long term, i.e. when data and services need to be reused and re-purposed.
The Semantic Web exhibited the fax effect, in which adoption was slow: like the fax machine, its usefulness grows with the number of other adopters. It also faces the law of plenitude: only when enough people and web pages begin using it will it succeed.
Consider the number of documents containing the terms basketball, Computer Science, and XML over time. The flat curve for the term basketball validates this strategy: the popularity of basketball is roughly stable over this time period. Computer Science takes less and less share of the Web as the Web shifts from scientific use to everyday use. The share of XML, a popular pre-semantic-web technology, seems to grow and then stabilize as it becomes a regular part of the toolkit of Web developers.

Against this general backdrop, one can look at the share of Semantic Web related terms and formats, in particular the terms RDF and OWL and the number of ontologies (Semantic Web Documents) in RDF or OWL.

Five-stage hype cycle of Gartner Research


The first phase of a Hype Cycle is the “technology trigger”: a breakthrough, product launch or other event that generates significant press and interest. In the next phase, a frenzy of publicity typically generates overenthusiasm and unrealistic expectations.
There may be some successful applications of a technology, but there are typically more
failures. Technologies enter the “trough of disillusionment” because they fail to meet
expectations and quickly become unfashionable. Although the press may have stopped
covering the technology, some businesses continue through the “slope of
enlightenment” and experiment to understand the benefits and practical application of
the technology. A technology reaches the “plateau of productivity” as the benefits of it
become widely demonstrated and accepted. The technology becomes increasingly
stable and evolves in second and third generations. The final height of the plateau varies
according to whether the technology is broadly applicable or benefits only a niche
market.

Emergence of Social web


The Web was a read-only medium for a majority of users. The web of the 1990s was much like
the combination of a phone book and the yellow pages and despite the connecting power of
hyperlinks it instilled little sense of community among its users.

History of web 2.0


This set of innovations in the architecture and usage patterns of the Web led to an entirely different role of the online world as a platform for intense communication and social interaction. A recent major survey based on interviews with 2200 adults shows that the internet significantly improves Americans’ capacity to maintain their social networks, despite early fears about the effects of diminishing real-life contact.

Blogs
The first wave of socialization on the Web was due to the appearance of blogs, wikis and other forms of web-based communication and collaboration. Blogs and wikis attracted mass popularity from around 2003. They lowered the barrier for adding content to the Web: editing blogs and wikis did not require any knowledge of HTML any more. Blogs and wikis allowed individuals and groups to claim their personal space on the Web and fill it with content with relative ease. Even more importantly, although weblogs were first assessed as purely personal publishing (similar to diaries), nowadays the blogosphere is widely recognized as a densely interconnected social network through which news, ideas and influences travel rapidly as bloggers reference and reflect on each other’s postings.

Social networks
The first online social networks, also referred to as social networking services, entered the field at the same time as blogging and wikis started to take off. The earliest of these services attracted over five million registered users and were followed by offerings from Google and Microsoft. These sites allow users to post a profile with basic information, to invite others to register and to link to the profiles of their friends. The system also makes it possible to visualize and browse the resulting network in order to discover friends in common, friends thought to be lost, or potential new friendships based on shared interests. The idea of network-based exchange is based on the sociological observation that social interaction creates similarity and, vice versa, similarity creates interaction: friends are likely to have acquired or to develop similar interests.

User profiles
Explicit user profiles make it possible for these systems to introduce rating mechanisms whereby either the users or their contributions are ranked according to usefulness or trustworthiness. Ratings are explicit forms of social capital that regulate exchanges in online communities, much as reputation moderates exchanges in the real world. In terms of implementation, the new web sites rely on new ways of applying some pre-existing technologies. Asynchronous JavaScript and XML, or AJAX, which drives many of the latest websites, is merely a mix of technologies that have been supported by browsers for years.

Web 2.0 + Semantic Web = Web 3.0?


Web 2.0 is often contrasted to the Semantic Web. However, the ideas of Web 2.0 and the Semantic Web are not exclusive alternatives: while Web 2.0 mostly affects how users interact with the Web, the Semantic Web opens new technological opportunities for web developers in combining data and services from different sources. In Web 2.0 users are willing to provide content as well as metadata. This may take the form of articles and facts organized in tables and categories in Wikipedia, photos organized in sets and according to tags in Flickr, or structured information embedded into homepages and blog postings using microformats.
The Semantic Web was originally also expected to be filled by users annotating Web resources, describing their home pages and multimedia content. Microformats, for example, proved to be more popular due to easier authoring using existing HTML attributes. Further, web pages created automatically from a database (such as blog pages or personal profile pages) can encode metadata in microformats without the user necessarily being aware of it. At the same time, microformats retain all the advantages of RDF in terms of machine accessibility.
Due to the extensive collaborations online, many applications have access to significantly more metadata about the users. Information about the choices, preferences, tastes and social networks of users means that the new breed of applications are able to build on much richer user profiles. Clearly, semantic technology can help in matching users with similar interests as well as matching users with available content.
Standard formats for exchanging data and schema information, support for data integration, and standard query languages and protocols for querying remote data sources: this is the platform for the easy development of mashups that the Semantic Web offers to Web 2.0.

Development of Social Network Analysis


The field of Social Network Analysis today is the result of the convergence of several
streams of applied research in sociology, social psychology and anthropology.
Sociology concepts in the Semantic Web explore the interplay between social
structures, human behavior, and the organization of information in digital spaces. By
applying sociological theories, such as social constructivism and actor-network theory,
the Semantic Web can be analyzed as a platform where knowledge is co-created
through user interactions and relationships among diverse actors, including individuals,
organizations, and technologies. Concepts like social capital and network analysis can
be used to understand how information flows and is shared within communities,
emphasizing the role of social ties in shaping access to information and resources.
Furthermore, the Semantic Web facilitates the study of collective intelligence and
collaborative knowledge production, highlighting how social dynamics influence data
representation, interpretation, and meaning-making in an increasingly interconnected
digital landscape.
Social psychology concepts in the Semantic Web focus on understanding how
individual and group behaviors influence the creation, sharing, and interpretation of
information online. Concepts such as social influence, conformity, and groupthink can
shed light on how users interact with content, make decisions, and establish norms
within digital communities. The Semantic Web's emphasis on structured data and
meaningful connections enhances the potential for social interactions, enabling users to
collaboratively construct knowledge and create shared meanings. Additionally, the
concepts of social identity and self-presentation are relevant, as individuals curate their
online personas through interactions with semantic data, affecting how they are
perceived in digital spaces.
Anthropology concepts in the Semantic Web examine the cultural, social, and
contextual factors that shape the way information is created, shared, and understood in
digital environments. By analyzing how different cultures and communities utilize
semantic technologies, anthropologists can explore issues of representation, meaning,
and power dynamics in knowledge production. Concepts such as cultural relativism and
ethnocentrism are particularly relevant, as they highlight the importance of
understanding diverse perspectives and the impact of cultural contexts on data
interpretation and usage. Additionally, the Semantic Web's focus on linking and
structuring data allows anthropologists to study the relationships between various
cultural artifacts and knowledge systems, facilitating a deeper understanding of how
digital spaces can reflect and influence human behavior, social practices, and cultural
identity in an increasingly interconnected world.

KEY CONCEPTS AND MEASURES IN NETWORK ANALYSIS


The global structure of networks
A social network can be represented as a graph G = (V, E), where V denotes a finite set of vertices and E denotes a finite set of edges.


Each graph can be associated with its characteristic (adjacency) matrix M := (m_ij) of size n × n, where n = |V| and m_ij = 1 if there is an edge between vertices v_i and v_j, and 0 otherwise.

A component is a maximal connected subgraph. Two vertices are in the same (strong) component if and only if there exists a (directed) path between them.
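A short sketch with the networkx library illustrates the graph, its characteristic (adjacency) matrix, and its components (the edge list is made up):

```python
import networkx as nx

# Toy undirected graph with two components.
G = nx.Graph([("a", "b"), ("b", "c"), ("d", "e")])

# Characteristic matrix M = (m_ij): m_ij = 1 iff there is an edge {v_i, v_j}.
print(nx.to_numpy_array(G))

# Components are maximal connected subgraphs.
print(list(nx.connected_components(G)))  # [{'a', 'b', 'c'}, {'d', 'e'}]
```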
The American psychologist Stanley Milgram performed a famous experiment about the structure of social networks. Milgram calculated the average of the length of the chains and concluded that the experiment showed that on average Americans are no more than six steps apart from each other. While this is also the source of the expression six degrees of separation, the actual number is rather dubious.

Formally, what Milgram estimated is the size of the average shortest path of the network, which is also called the characteristic path length. The shortest path between two vertices vs and vt is a path that begins at the vertex vs, ends in the vertex vt, and contains the least possible number of vertices. The shortest path between two vertices is also called a geodesic. The longest geodesic in the graph is called the diameter of the graph: this is the maximum number of steps that is required between any two nodes. The average shortest path is the average of the length of the geodesics between all pairs of vertices in the graph.
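These path measures translate directly into networkx calls (a sketch on a toy chain graph):

```python
import networkx as nx

G = nx.path_graph(5)  # a chain of five nodes: 0-1-2-3-4

print(nx.shortest_path(G, 0, 4))           # a geodesic: [0, 1, 2, 3, 4]
print(nx.diameter(G))                      # longest geodesic: 4
print(nx.average_shortest_path_length(G))  # characteristic path length: 2.0
```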
A practical impact of Milgram’s findings is that they constrain the possible models for social networks; one candidate is the two-dimensional lattice model.

Clustering for a single vertex can be measured by the actual number of edges between the neighbors of the vertex divided by the possible number of edges between the neighbors. When we take the average over all vertices we get the measure known as the clustering coefficient. The clustering coefficient of a tree is zero, which is easy to see if we consider that there are no triangles of edges (triads) in the graph.

For a given node, the clustering coefficient measures the proportion of connections between the nodes in its neighbourhood that actually exist, out of all that could exist. For a vertex with k neighbours and e edges among those neighbours, the local clustering coefficient is:

C = 2e / (k(k − 1))
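The same quantity can be computed with networkx (the toy graph, a triangle with one pendant vertex, is made up):

```python
import networkx as nx

# A triangle (a, b, c) plus a pendant vertex d attached to c.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Local clustering: c has 3 neighbours with 1 edge among them -> 2*1/(3*2) = 1/3.
print(nx.clustering(G))
# Averaging over all vertices gives the clustering coefficient of the graph.
print(nx.average_clustering(G))
```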

The alpha and beta models of graphs are frameworks used to analyze and understand the
structure and dynamics of networks. The alpha model focuses on the concept of connectivity
and clustering within a graph, emphasizing how nodes are interconnected and the extent to
which they form tightly-knit communities or clusters. This model is useful for examining local
structures and the strength of connections among neighbors, often employing metrics like the
clustering coefficient to measure how nodes are grouped. In contrast, the beta model
emphasizes the overall topology of the network, considering global properties such as degree
distribution, centrality, and path lengths between nodes.

The macro-structure of social networks


A clique in graph theory is a subset of vertices in an undirected graph that are all adjacent to
one another, meaning every two distinct vertices within the clique are connected by an edge.
This concept is crucial for understanding the structure of networks, as cliques represent tightly-
knit groups where relationships are strong and mutual. In social networks, for example, a clique
might represent a group of friends who are all connected to each other, indicating a high level
of interaction and trust. The size of a clique is determined by the number of nodes it contains,
and cliques can vary in size from small groups (triangles) to larger formations. The study of cliques has important implications in various fields, including sociology, biology, and computer science, as it helps to identify communities, analyze network robustness, and understand collaborative structures. Furthermore, the maximum clique problem, which involves finding the largest clique within a graph, is a well-known computational challenge with applications in clustering and optimization.
The image that emerges is one of dense clusters or social groups sparsely connected to each other by a few ties. For example, this is the image that appears if we investigate the co-authorship networks of a scientific community. Bounded by limitations of space and resources, scientists mostly co-operate with colleagues from the same institute. Occasional exchanges and projects with researchers from abroad, however, create the kind of shortcut ties that Watts explicitly incorporated within his model. These shortcuts make it possible for scientists to reach each other in a relatively short number of steps.

A k-plex is a type of graph structure used in network theory that allows a relaxed form of connectivity compared to a clique. Formally, a k-plex is a subset S of the vertices of a graph such that every vertex in S is connected to at least |S| − k other vertices in S, where |S| is the total number of vertices in the subset and the parameter k gives the maximum number of fellow members to which any vertex may fail to be connected.

The lambda-set analysis method is a technique used in network theory and social network
analysis to identify and evaluate clusters or communities within a network based on the
relationships among nodes. A lambda set consists of a subset of nodes that share specific
characteristics, such as connectivity or similarity, determined by defined criteria like degree
thresholds. The process begins with constructing a network graph where nodes represent
entities and edges denote relationships. Researchers identify lambda sets by applying the
established criteria, followed by an analysis of the internal structure and dynamics of these sets.

Clustering a graph into subgroups allows us to visualize the connectivity at a group level. A Core-Periphery (C/P) structure is one where nodes can be divided into two distinct subgroups: nodes in the core are densely connected with each other and with the nodes on the periphery, while peripheral nodes are not connected with each other, only to nodes in the core. The result of the optimization is a classification of the nodes as core or periphery and a measure of the error of the solution.


Affiliation networks are a type of bipartite graph that represent relationships between two
distinct sets of entities, typically individuals and their affiliations, such as organizations,
institutions, or events. In these networks, one set of nodes corresponds to individuals, while the
other set represents affiliations, with edges connecting individuals to the affiliations they
belong to or participate in. This structure allows for the analysis of collaboration patterns, social
dynamics, and the influence of affiliations on individual behavior and outcomes. For example,
in academic settings, affiliation networks can be used to study co-authorship patterns, where
researchers are connected to the institutions they work for, revealing insights into collaboration
trends, the impact of institutional networks on research productivity, and the dissemination of
knowledge. By examining these networks, researchers can identify key players, assess
community structures, and understand the flow of information within and across various
affiliations.
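A bipartite affiliation network and its projection onto individuals can be sketched with networkx (the people and events are hypothetical):

```python
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
B.add_nodes_from(["ann", "bob", "eve"], bipartite=0)       # individuals
B.add_nodes_from(["workshop", "conference"], bipartite=1)  # affiliations
B.add_edges_from([("ann", "workshop"), ("bob", "workshop"),
                  ("bob", "conference"), ("eve", "conference")])

# Project onto individuals: two people are tied if they share an affiliation.
P = bipartite.projected_graph(B, ["ann", "bob", "eve"])
print(list(P.edges()))  # [('ann', 'bob'), ('bob', 'eve')]
```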


Social capital refers to the networks, relationships, and norms that facilitate cooperation and collaboration among individuals and groups within a society. It encompasses the resources individuals can access through their social connections, such as trust, reciprocity, and shared values, which enhance social cohesion and collective action. Social capital can be classified into three types: bonding social capital, which occurs within close-knit groups and strengthens ties among similar individuals; bridging social capital, which connects diverse groups and fosters broader networks; and linking social capital, which involves connections between individuals and institutions, enabling access to resources and opportunities.

The structural dimension of social capital refers to patterns of relationships or positions that
provide benefits in terms of accessing large, important parts of the network.
Degree centrality equals the graph theoretic measure of degree, i.e. the number of (incoming,
outgoing or all) links of a node.
Closeness centrality is obtained by calculating the average (geodesic) distance of a node to all other nodes in the network. In larger networks it makes sense to constrain the size of the neighborhood in which to measure closeness centrality. It makes little sense, for example, to talk about the most central node on the level of a society. The resulting measure is called local closeness centrality.
Two other measures of power and influence through networks are broker positions and weak
ties.
Betweenness is defined as the proportion of paths, among the geodesics between all pairs of nodes, that pass through a given actor.
A structural hole occurs in the space that exists between closely clustered communities.
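All three structural measures are available in networkx (sketched here on Krackhardt's classic kite network):

```python
import networkx as nx

G = nx.krackhardt_kite_graph()  # classic 10-node example network

print(nx.degree_centrality(G))       # number of links, normalized
print(nx.closeness_centrality(G))    # inverse of average geodesic distance
print(nx.betweenness_centrality(G))  # share of geodesics through each node
```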

Electronic sources for SNA

Electronic Discussion Networks


Electronic discussion networks are online platforms that facilitate communication and
interaction among individuals through discussions, debates, and exchanges of ideas.
These networks can take various forms, including forums, mailing lists, social media
groups, and chat applications, and they serve as spaces for users to share information,
seek advice, and collaborate on topics of interest. One of the key features of electronic
discussion networks is their ability to connect diverse participants, enabling knowledge
sharing and the formation of communities around common interests, whether they are
professional, educational, or recreational. The dynamics within these networks can be
analyzed using social network analysis (SNA) techniques to understand patterns of
communication, identify influential participants, and assess the flow of information.
One of the foremost studies to illustrate the versatility of electronic data is a series of works from the Information Dynamics Labs of Hewlett-Packard. Tyler, Wilkinson and Huberman analyze communication among employees of their own lab by using the corporate email archive. They recreate the actual discussion networks in the organization by drawing a tie between two individuals if they had exchanged at least a minimum number of total emails in a given period, filtering out one-way relationships. Studies of electronic communication networks based on email data are limited by privacy concerns: for example, in the HP case the content of messages had to be ignored by the researchers and the data set could not be shared with the community.
Public forums and mailing lists can be analyzed without similar concerns. The W3C, which is also the organization responsible for the standardization of Semantic Web technologies, is unique among standardization bodies in its commitment to transparency toward the general public of the Internet, and part of this commitment is the openness of the discussions within the working groups.


Blogging
Content analysis has also been the most commonly used tool in the computer-aided analysis of blogs (web logs), primarily with the intention of trend analysis for the purposes of marketing. While blogs are often considered as purely personal publishing, bloggers themselves know that blogs are much more than that: modern blogging tools make it easy to comment on and react to the comments of other bloggers, resulting in webs of communication among bloggers.
These discussion networks also lead to the establishment of dynamic communities, which often manifest themselves through syndicated blogs (aggregated blogs that collect posts from a set of authors blogging on similar topics), blog rolls (lists of discussion partners on a personal blog) and even result in real-world meetings such as the Blog Walk series of meetings.
The 2004 US election campaign represented a turning point in blog research, as it was the first major electoral contest where blogs were exploited as a method of building networks among individual activists and supporters. Blog analysis suddenly shed its image as relevant only to marketers interested in understanding the product choices of young demographics; following this campaign there has been an explosion in research on the capacity of web logs for creating and maintaining stable, long-distance social networks of different kinds.
Online community spaces and social networking services such as MySpace and LiveJournal cater to socialization even more directly than blogs, with features such as social networking (maintaining lists of friends, joining groups), messaging and photo sharing. As they are typically used by a much younger demographic, they offer an excellent opportunity for studying changes in youth culture.
RSS (Really Simple Syndication) feeds and blogs are complementary technologies that
enhance content distribution and consumption on the internet. A blog serves as a
platform where individuals or organizations publish articles, updates, and multimedia
content on specific topics. RSS feeds, on the other hand, are a standardized format for
delivering regularly updated information from blogs and other websites directly to
subscribers. When a blog publishes new content, an RSS feed automatically updates,
allowing users to receive notifications about the latest posts without needing to visit the
site manually. This streamlining of information delivery enables readers to efficiently
keep up with multiple blogs and websites in one place, using RSS readers or
aggregators. By combining blogs with RSS feeds, content creators can broaden their
reach and engagement, while users benefit from a curated and convenient way to access
fresh content from their favorite sources.

Web based Networks


There are two features of web pages that are considered as the basis for extracting social relations: links and co-occurrence.
The linking structure of the Web is considered a proxy for real-world relationships, as links are chosen by the author of the page and connect to other information sources that are considered authoritative and relevant enough to be mentioned.
The biggest drawback of this approach is that such direct links between personal pages are very sparse: due to the increasing size of the Web, searching has taken over browsing as the primary mode of navigation on the Web.
Co-occurrences of names in web pages can also be taken as evidence of relationships, and are a more frequent phenomenon.
On the other hand, extracting relationships based on co-occurrence of the names of individuals or institutions requires web mining, as names are typically embedded in the natural text of web pages. Web mining is the application of text mining to the content of web pages. The techniques employed here are statistical methods, possibly combined with an analysis of the contents of web pages.
Using the search engine AltaVista, the system collected page counts for the individual names as well as the number of pages where the names co-occurred.

Tie strength was calculated by dividing the number of co-occurrences by the number of pages returned for the two names individually. Also known as the Jaccard-coefficient, this is basically the ratio of the sizes of two sets: the intersection of the sets of pages and their union.
The resulting value of tie strength is a number between zero (no co-occurrences) and one (no separate mentions, only co-occurrences). If this number exceeded a certain fixed threshold it was taken as evidence for the existence of a tie.
Attention must be paid to the number of pages that can be found for the given individuals or combinations of individuals. The reason is that the Jaccard-coefficient is a relative measure of co-occurrence and does not take into account the absolute sizes of the sets. In case the absolute sizes are very low we can easily get spurious results. A disadvantage of the Jaccard-coefficient is that it penalizes ties between an individual whose name often occurs on the Web and less popular individuals.

For this reason we use an asymmetric variant of the coefficient. In particular, we divide the number of pages containing both names by the number of pages for a single individual, and take it as evidence of a directed tie if this ratio reaches a certain threshold.
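A minimal sketch of both measures (the page counts would come from a search engine; the numbers and the 0.2 threshold here are made up):

```python
def jaccard_tie(count_a: int, count_b: int, count_ab: int) -> float:
    """Co-occurrence count divided by the size of the union of the page sets."""
    return count_ab / (count_a + count_b - count_ab)

def directed_tie(count_person: int, count_ab: int) -> float:
    """Asymmetric variant: fraction of a person's pages that mention the other."""
    return count_ab / count_person

# Hypothetical counts: pages for A, pages for B, pages mentioning both.
count_a, count_b, count_ab = 1200, 80, 40
print(jaccard_tie(count_a, count_b, count_ab))  # ~0.03: popular A is penalized
print(directed_tie(count_b, count_ab) >= 0.2)   # True: directed tie from B to A
```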
There have been several approaches to deal with name ambiguity. Instead of a single name, these approaches assume a list of names related to each other. They disambiguate the appearances by clustering the combined results returned by the search engine for the individual names. The clustering can be based on various networks between the returned web pages, e.g. based on hyperlinks between the pages, common links or similarity in content.
The idea is that such key phrases can be added to the search query to reduce the set of results to those related to the given target individual. We consider an ordered list of pages for the first person and a set of pages for the second person (the relevant set).

Recall is defined as: Recall = TP / (TP + FN)

Precision is defined as: Precision = TP / (TP + FP)

We ask the search engine for the top N pages for both persons, but in the case of the second person the order is irrelevant: this set is only used to judge relevance. rel(n) is 1 if the document at position n is in the relevant set and zero otherwise.

The average precision method is more sophisticated in that it takes into account the order in which the search engine returns documents for a person: it assumes that names of other persons that occur closer to the top of the list represent more important contacts than names that occur in pages at the bottom of the list.
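A sketch of average precision as described (the ranked list and relevant set are made up; the standard formula AP = (1/R) * sum over n of P(n) * rel(n) is assumed, where R is the size of the relevant set and P(n) is the precision at cutoff n):

```python
def average_precision(ranked: list, relevant: set) -> float:
    """AP = (1/R) * sum of precision@n * rel(n) over positions n."""
    hits, score = 0, 0.0
    for n, doc in enumerate(ranked, start=1):
        if doc in relevant:    # rel(n) = 1
            hits += 1
            score += hits / n  # precision at cutoff n
    return score / len(relevant)

# Hypothetical ranked results for person A; pages also mentioning person B.
print(average_precision(["p1", "p2", "p3", "p4", "p5"], {"p1", "p4"}))  # 0.75
```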
This strength is determined by taking the number of pages where the name of an interest and the name of a person co-occur, divided by the total number of pages about the person.
The expertise is assigned to an individual if this value is at least one standard deviation higher than the mean of the values obtained for the same concept. The biggest technical challenge in social network mining is the disambiguation of person names. Person names exhibit the same problems of polysemy and synonymy that we have seen in the general case of web search. Queries are problematic, for example, for researchers who commonly use different variations of their name (e.g. Jim Hendler vs. James Hendler).
Polysemy is the association of one word with two or more distinct meanings. A polyseme is a word or phrase with multiple meanings. In contrast, a one-to-one match between a word and a meaning is called monosemy. According to some estimates, more than 40% of English words have more than one meaning. Synonymy is the semantic quality or sense relation that exists between words with closely related meanings.
