Big Data Analysis With Networks (BDAWN) : Name of The Faculty: Affiliation

This course provides a 3 credit introduction to analyzing large networked datasets using techniques from graph theory, linear algebra, and machine learning. It will be taught over 10 weekly sessions by Associate Professor Shankar Venkatagiri at IIM Bangalore. Students will learn to synthesize networks from structured data, detect communities, and apply these skills to domains like finance and healthcare. Pedagogy includes lectures, hands-on exercises using tools like Gephi and Spark, and a group project. Evaluation is based on midterm, final exams, project, and class participation. Prerequisites include basic programming skills, as students will work independently with software on their laptops.
Big Data Analysis with Networks (BDAWN)

Name of the Faculty:


Shankar Venkatagiri
Affiliation:
(Institution / Organisation & Designation)
Associate Professor, IIM Bangalore
Teaching Area:
(such as Finance & Accounting; Marketing; Production
& Operations Management; Strategy)
Information Systems
This course may be offered to:
(PGP, FPM, PGPEM, PGPPM, EPGP)
http://www.iimb.ernet.in/programmes
PGPEM/EPGP/PGP
Credits (No. of hours):
(3 credits=30 classroom hours; 1.5 credits-15
classroom hours; session=90 minutes)
3 credits
Term / Quarter:
(Starting April /June /September/December)
Term 7 of PGPEM
Course Type:
(Regular: staggered across the term;
Workshop1: 3-5 continuous days)
Regular
Additional information required:
There shall be 2 guest sessions, by 3 different guest speakers from industry
Are there any financial implications to this course?
Travel and board for guest speakers, if applicable.

1
Workshop course: Please provide reasons as to why the course is being offered in workshop mode and why it cannot be offered
as a regular course (that is spread over 10 weeks). As an institution, IIMB prefers courses offered in the regular mode, since it
results in better learning experience for the students and avoids overlapping of courses.

Course Summary

The field of big data is grounded in the notion that an analyst can derive insights by
observing the entirety of the population (N = all). Contrast this with the field of
statistics, where a random sample from the data is typically used to drive inferences.
Popular examples of big data could be a repository of employee resumes, or closing stock
prices for a stock exchange. These are examples of “structured” data: methods to query
and study them have existed for decades. Relational databases are commonly used to
store them.

Social networks like Facebook and Twitter have become the tools of choice for marketing
and communication. Linkages among the members of these networks can carry meaning,
and in a digital world, they also signal authority. There is no specification of the
structure that a post or a tweet must possess. It is imperative for business students to
understand the complexities that encompass such “unstructured” datasets, and to
develop a framework-based appreciation for the field. The BDAWN course has been
designed to address this need.

Graph theory and game theory form the main pillars of our discourse: they will help
represent and explain network structure and behaviour. While the instructor will
provide an adequate introduction, prior exposure to the topics would be considered
beneficial. The first half of the course is dedicated to the structural aspects, while the
second half is behavioural in its outlook. Any large dataset warrants a separate set
of techniques to manage and analyse it. Students will learn how to work with distributed
platforms like Spark and GraphX.

Networks of suppliers and buyers power the revenues of giants like Google. We will
show how economic and game theoretic concepts such as auctions and Nash equilibria
play a critical role in optimising value from digital advertising. The 2016 US election and
the Brexit vote were unduly influenced by platforms like Facebook; we will examine the
ethical dilemmas that stem from social preferences being made public.

Pre-requisites, inclusion/exclusion criteria (if any)

If you have not programmed earlier, be forewarned: there is much by way of coding that
you may struggle with, compared to others who have prior experience. The instructor
will not double up as a trouble-shooter in class for technology issues.

Consulting with websites such as StackOverflow will be considered de rigueur. One can
always pick up R using the tutorial available on Datacamp. Students will be required to
independently work with various software packages (e.g. Gephi, Spark, GraphX) on their
laptops.
Learning Objectives
The goal of the course is to give the participant a hands-on feel for key concepts and
issues in the emerging field of networked data. The course is designed with the following
specific objectives and learning outcomes:

a. Understand how to synthesize meaningful networks from structured data
b. Detect communities and homophily in networked datasets and interpret them
c. Learn how networks can play a role in domains such as finance and healthcare

Pedagogy
The course employs a mix of lecture and hands-on exploration in class. Students are
required to bring their laptops to every session. A term project (max. group size of 3)
will help bring all the concepts together. The contents of the course are shaped by a
seminal reference by David Easley & Jon Kleinberg (EK) titled Networks, Crowds and
Markets (2010). In addition, a well-illustrated book on the subject is Network Science
(2016) by Albert-László Barabási. Notably, the authors have made available their books
online2.

Course Evaluation & Grading Pattern

Midterm 35%
Final 35%
Project 25%
Class Participation 5%

Constructing exams is an arduous task. There shall be no make-ups on any component.


Please note that the project shall involve multiple submissions and a face-to-face viva.
Any act of plagiarism – either in code or in written form – shall be subject to strict
disciplinary action as laid down by the norms of IIMB. You know the rules.

Session-wise plan

2
Accessible at: http://www.cs.cornell.edu/home/kleinber/networks-book/ and http://networksciencebook.org/

Session Topic
1&2 An Overview of Big Data

The popular book Big Data, by Viktor Mayer-Schönberger & Kenneth Cukier,
guides our discussion of the shifts in thinking about and processing big data,
facilitated by access to greater computational power and cheap storage.

Video: http://www.youtube.com/watch?v=bYS_4CWu3y8

3&4 Networks and their Visualisation

We dedicate these sessions to introducing a set of graph theory constructs that
are foundational to network analysis. Using the Gephi tool, we shall analyse a
collection of graph datasets.

Chapter 1 of Easley-Kleinberg (EK). Overview


1.1 Aspects of Networks
1.2 Central Themes and Topics

Chapter 2 of EK. Graphs


2.1 Basic Definitions
2.2 Paths and Connectivity
2.3 Distance and Breadth-First Search
2.4 Network Datasets: An Overview
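
As a small illustration of the distance and breadth-first search material in Section 2.3, the sketch below computes hop distances from one node of a toy graph. The graph and the function name bfs_distances are hypothetical and chosen only for illustration; this is not code from the course or from EK.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return the hop distance from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                # First visit: BFS guarantees this is the shortest hop count.
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical undirected toy graph, stored as an adjacency list.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],  # isolated node, unreachable from A
}
distances_from_a = bfs_distances(toy_graph, "A")
print(distances_from_a)
```

Nodes that never appear in the result (here F) lie in a different connected component from the source.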

5 Link Formation

Chapter 3 of EK. Strong and Weak Ties


3.1 Triadic Closure
3.2 The Strength of Weak Ties
3.3 Tie Strength and Network Structure in Large-Scale Data
3.4 Tie Strength, Social Media, and Passive Engagement
3.5 Closure, Structural Holes, and Social Capital

Reading: Borgatti, S. et al. Network Analysis in the Social Sciences.
Science, 13 February 2009 pp. 892-895
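
Triadic closure (Section 3.1) is commonly quantified through the clustering coefficient of a node: the fraction of pairs of its neighbours that are themselves linked. The sketch below is a minimal illustration on a hypothetical friendship network; the names and the helper clustering_coefficient are assumptions made for this example.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    neighbours = adjacency[node]
    if len(neighbours) < 2:
        return 0.0  # no pair of neighbours exists, so no triangle can close
    pairs = list(combinations(sorted(neighbours), 2))
    closed = sum(1 for u, v in pairs if v in adjacency[u])
    return closed / len(pairs)


# Hypothetical undirected friendship network (each link listed in both directions).
friends = {
    "Asha": {"Bala", "Chitra", "Dev"},
    "Bala": {"Asha", "Chitra"},
    "Chitra": {"Asha", "Bala"},
    "Dev": {"Asha"},
}
for person in friends:
    print(person, clustering_coefficient(friends, person))
```

A new link between Dev and Bala would close another triangle around Asha and raise her coefficient.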

6 Community Detection

Chapter 4 of EK. Networks in Their Surrounding Contexts


4.1 Homophily
4.2 Mechanisms Underlying Homophily: Selection and Social Influence
4.3 Affiliation
4.4 Tracking Link Formation in On-Line Data
4.5 A Spatial Model of Segregation

7 Random and Scale-free Networks

A probabilistic analysis of random graphs was undertaken by Erdős and Rényi
during the late 1950s. Their results establish that, for large sparse random
graphs, the degree distribution is approximately Poisson. Four decades later,
Barabási and Albert demonstrated that real networks, which contain hubs, are
not Poisson; instead, many are governed by power laws. In this session, we
explore the scale-free property of some real networks.
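
The contrast between the two regimes can be seen in a small simulation. The sketch below grows one network by preferential attachment (the mechanism of the Barabási-Albert model, which the session text above does not name and is assumed here as one standard way to obtain hubs) and compares it with a random graph of the same average degree. The sizes, the seed, and the function names are illustrative assumptions.

```python
import random


def random_graph_degrees(n, p, rng):
    """Degrees in a G(n, p) graph: each pair is linked independently with probability p."""
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    return degree


def preferential_attachment_degrees(n, m, rng):
    """Degrees in a growing network where each new node links to m distinct
    existing nodes, chosen with probability proportional to their degree."""
    # Start from a complete graph on m + 1 nodes, so every node has degree m.
    degree = [m] * (m + 1)
    # Each node appears in stubs once per link end, so a uniform draw from
    # stubs is a degree-proportional draw over nodes.
    stubs = [u for u in range(m + 1) for _ in range(m)]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        degree.append(m)
        for target in targets:
            degree[target] += 1
            stubs.extend([target, new_node])
    return degree


rng = random.Random(42)  # fixed seed so the run is repeatable
n, m = 1000, 2
ba_degrees = preferential_attachment_degrees(n, m, rng)
mean_degree = sum(ba_degrees) / n
er_degrees = random_graph_degrees(n, mean_degree / (n - 1), rng)
print("largest degree, random graph:", max(er_degrees))
print("largest degree, preferential attachment:", max(ba_degrees))
```

In runs of this kind the random graph has no node far above the average degree, while the preferentially grown network develops a few hubs.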

Readings: Network Science, Chapters 3 and 4

8 Network Analysis with R

A programmatic tool becomes necessary when graphs get large and unwieldy.
In this session, we shall use various graphical packages in R.

9 & 10 Mathematics of Networks

In the first session, we get acquainted with linear algebra – specifically,
matrices, eigenvalues, and how they are used to derive PageRank. In the second
session, we shall examine network structures underlying the Internet, and
derive PageRank using an iterative approach.
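
The iterative approach can be sketched as repeated redistribution of rank along links until the values stop changing. The version below is a minimal sketch, not the course's implementation: the damping factor of 0.85, the handling of pages without out-links, and the four-page toy web are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tolerance=1e-10, max_iterations=1000):
    """Iteratively compute PageRank for a dict mapping each page to its out-links."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            for target in links[node]:
                # Each page passes an equal share of its rank to every out-link.
                new_rank[target] += damping * rank[node] / len(links[node])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

The values the loop converges to form the principal eigenvector of the damped link matrix, which is the connection to the linear algebra of the first session.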

Chapter 13 of EK. The Structure of the Web


13.1 The World Wide Web
13.2 Information Networks, Hypertext, and Associative Memory
13.3 The Web as a Directed Graph
13.4 The Bow-Tie Structure of the Web
13.5 The Emergence of Web 2.0

Chapter 14 of EK. Link Analysis and Web Search


14.1 Searching the Web: The Problem of Ranking
14.2 Link Analysis using Hubs and Authorities
14.3 PageRank
14.4 Applying Link Analysis in Modern Web Search
14.5 Applications beyond the Web

11 Distributed Computing

We shall learn how to configure Spark on a laptop using a container
(Docker), and carry out big data tasks. The fundamental constructs of RDDs
(resilient distributed datasets) and in-memory distributed computing shall be
elaborated with examples in the Scala language.

Video: https://www.youtube.com/watch?v=QaoJNXW6SQo
Tool Demonstration: Spark

12 & 13 Parallelised Graph Processing

GraphX is a powerful library to handle network data. In the first session, we
will cover basic classes and operations that help us build or import graphs.
Next, we learn how to obtain quantities such as PageRank, triangle count,
shortest paths, connected components and so on. The second session is
dedicated to algorithms in supervised, unsupervised and semi-supervised
machine learning using graphs.

Tool Demonstration: GraphX

14 Game Theory

Over the next four sessions, we shall build the theory and constructs to
understand how digital advertising platforms optimise value by facilitating
transactions between buyers and sellers over a network.

Chapter 6 of EK. Games


6.1 What is a Game?
6.2 Reasoning about Behavior in a Game
6.3 Best Responses and Dominant Strategies
6.4 Nash Equilibrium
6.5 Multiple Equilibria: Coordination Games
6.8 Mixed Strategies: Examples and Empirical Analysis
6.9 Pareto-Optimality and Social Optimality
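
To make best responses and Nash equilibrium (Sections 6.3 to 6.5) concrete, the sketch below enumerates the pure-strategy equilibria of a two-player game by checking that neither player can gain by deviating alone. The payoff tables are hypothetical numbers in the style of a prisoner's dilemma and a coordination game; mixed strategies (Section 6.8) are not covered.

```python
def pure_nash_equilibria(payoffs):
    """Return the pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_strategy, column_strategy) to
    (row player's payoff, column player's payoff).
    """
    rows = sorted({row for row, _ in payoffs})
    cols = sorted({col for _, col in payoffs})
    equilibria = []
    for row in rows:
        for col in cols:
            row_payoff, col_payoff = payoffs[(row, col)]
            # A best response: no alternative strategy does strictly better.
            row_is_best = all(payoffs[(other, col)][0] <= row_payoff for other in rows)
            col_is_best = all(payoffs[(row, other)][1] <= col_payoff for other in cols)
            if row_is_best and col_is_best:
                equilibria.append((row, col))
    return equilibria


# Hypothetical payoffs, for illustration only.
prisoners_dilemma = {
    ("Confess", "Confess"): (-4, -4),
    ("Confess", "Silent"): (0, -10),
    ("Silent", "Confess"): (-10, 0),
    ("Silent", "Silent"): (-1, -1),
}
coordination_game = {
    ("Keynote", "Keynote"): (2, 2),
    ("Keynote", "PowerPoint"): (0, 0),
    ("PowerPoint", "Keynote"): (0, 0),
    ("PowerPoint", "PowerPoint"): (1, 1),
}
print(pure_nash_equilibria(prisoners_dilemma))
print(pure_nash_equilibria(coordination_game))
```

The first game has a single equilibrium that is worse for both players than mutual silence; the second has two equilibria, which is the coordination problem of Section 6.5.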

15 First and Second-Price auctions

Chapter 9 of EK. Auctions


9.1 Types of Auctions
9.2 When are Auctions Appropriate?
9.3 Relationships between Different Auction Formats
9.4 Second-Price Auctions
9.5 First-Price Auctions and Other Formats
9.6 Common Values and The Winner's Curse
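
A key result for second-price auctions (Section 9.4) is that bidding one's true value is a dominant strategy. The sketch below checks this numerically for one bidder against fixed rival bids; the bidder names, values and bids are hypothetical, and the check is an illustration rather than a proof.

```python
def second_price_auction(bids):
    """Return (winner, price): the highest bidder wins and pays the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # assumes at least two bidders
    return winner, price


def payoff(bidder, value, bids):
    """Value minus price if the bidder wins, otherwise zero."""
    winner, price = second_price_auction(bids)
    return value - price if winner == bidder else 0


# Hypothetical setting: our bidder values the item at 70; rivals bid 50 and 60.
true_value = 70
rival_bids = {"rival_1": 50, "rival_2": 60}

truthful_payoff = payoff("me", true_value, {"me": true_value, **rival_bids})
best_alternative = max(
    payoff("me", true_value, {"me": bid, **rival_bids}) for bid in range(0, 101)
)
print(truthful_payoff, best_alternative)
```

Because the price depends only on the rivals' bids, changing one's own bid can only change whether one wins, never how much one pays on winning.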

16 Matching Markets

Chapter 10 of EK. Matching Markets


10.1 Bipartite Graphs and Perfect Matchings
10.2 Valuations and Optimal Assignments
10.3 Prices and the Market-Clearing Property
10.4 Constructing a Set of Market-Clearing Prices
10.5 How Does this Relate to Single-Item Auctions?

17 VCG and GSP Auctions

Chapter 15 of EK. Sponsored Search Markets
15.1 Advertising Tied to Search Behaviour
15.2 Advertising as a Matching Market
15.3 Encouraging Truthful Bidding in Matching Markets: The VCG Principle
15.4 Analyzing the VCG Procedure: Truth-Telling as a Dominant Strategy
15.5 The Generalized Second Price Auction
15.6 Equilibria of the Generalized Second Price Auction
15.7 Ad Quality
15.8 Complex Queries and Interactions Among Keywords
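
The Generalized Second Price auction (Section 15.5) can be sketched in a few lines: advertisers are ranked by bid, the i-th highest bidder gets the i-th best slot, and each pays per click the bid of the advertiser ranked just below. The click rates and bids below are hypothetical, and ad quality (Section 15.7) is ignored in this sketch.

```python
def gsp_outcome(bids, click_rates):
    """Assign slots by descending bid under the Generalized Second Price rule.

    bids maps advertiser to bid per click; click_rates lists slots from best
    to worst. Returns (advertiser, clicks, price per click, total payment).
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    outcome = []
    for position, rate in enumerate(click_rates):
        if position >= len(ranked):
            break  # more slots than advertisers
        advertiser = ranked[position][0]
        # Price per click is the next-highest bid, or 0 if nobody bid lower.
        price = ranked[position + 1][1] if position + 1 < len(ranked) else 0
        outcome.append((advertiser, rate, price, rate * price))
    return outcome


# Hypothetical numbers: three slots with 10, 5 and 2 clicks per hour.
slots = [10, 5, 2]
bids = {"x": 3, "y": 2, "z": 1}
for row in gsp_outcome(bids, slots):
    print(row)
```

Unlike the VCG procedure of Sections 15.3 and 15.4, truthful bidding is not in general a dominant strategy under this rule, which is why Section 15.6 studies its equilibria.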

18 Network Robustness

Like power grids, many other networks experience cascading failures. In this
session, we examine how certain graph structures are “robust” to failure. The
Molloy-Reed criterion states the condition for a randomly wired network to
have a giant component. Using simulations, we study network breakdowns
and percolation, which can be considered two sides of the same coin. We conclude
with a discussion of information cascades on social networks like Twitter.
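
The Molloy-Reed criterion says that a randomly wired network has a giant component when the ratio of the mean squared degree to the mean degree exceeds 2. The sketch below computes that ratio from a degree sequence, along with the critical fraction of randomly removed nodes, 1 - 1/(ratio - 1), that follows from it in Chapter 8 of Network Science. The degree sequences and function names are hypothetical.

```python
def molloy_reed_ratio(degrees):
    """Return <k^2> / <k> for a degree sequence (0.0 if there are no links)."""
    mean_k = sum(degrees) / len(degrees)
    if mean_k == 0:
        return 0.0
    mean_k_squared = sum(k * k for k in degrees) / len(degrees)
    return mean_k_squared / mean_k


def has_giant_component(degrees):
    """Molloy-Reed criterion for a randomly wired network."""
    return molloy_reed_ratio(degrees) > 2


def critical_removal_fraction(degrees):
    """Fraction of randomly removed nodes at which the giant component vanishes."""
    ratio = molloy_reed_ratio(degrees)
    if ratio <= 2:
        return 0.0  # no giant component to begin with
    return 1.0 - 1.0 / (ratio - 1.0)


# Hypothetical degree sequences.
isolated_pairs = [1] * 100   # every node has exactly one link
three_regular = [3] * 100    # every node has exactly three links
with_hubs = [2] * 95 + [40] * 5

for name, sequence in [("pairs", isolated_pairs), ("3-regular", three_regular), ("hubs", with_hubs)]:
    print(name, has_giant_component(sequence), critical_removal_fraction(sequence))
```

Hubs inflate the mean squared degree, which pushes the critical fraction towards 1: such networks tolerate random failures well.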

Reading: Network Science, Chapter 8

19 Spreading Phenomena

Network analysis offers a powerful lens to examine processes that involve
spreading, such as infectious diseases, computer viruses, and fake news on
social media. Using the framework of epidemiological modelling, an
individual can be classified into one of three states: susceptible (S), infectious
(I) and recovered (R). We will particularly study the SI, SIS and SIR models.
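
As a first feel for the SIR model, the sketch below steps the three population fractions forward in small time increments. It assumes homogeneous mixing (every individual can meet every other), so it ignores network structure entirely; the rates beta and mu and the initial conditions are hypothetical values chosen for illustration.

```python
def simulate_sir(beta, mu, susceptible0, infectious0, recovered0=0.0, steps=1000, dt=0.1):
    """Simulate the SIR model on population fractions with simple time stepping.

    beta is the transmission rate, mu the recovery rate. Assumes homogeneous
    mixing, so no network structure is modelled.
    """
    s, i, r = susceptible0, infectious0, recovered0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = mu * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical outbreak: 1% initially infectious, beta = 0.5, mu = 0.1.
history = simulate_sir(beta=0.5, mu=0.1, susceptible0=0.99, infectious0=0.01)
peak_infectious = max(i for _, i, _ in history)
final_s, final_i, final_r = history[-1]
print("peak infectious fraction:", round(peak_infectious, 3))
print("final recovered fraction:", round(final_r, 3))
```

Setting mu to zero turns this into the SI model; returning recovered individuals to the susceptible pool instead of the recovered pool gives SIS.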

Reading: Network Science, Chapter 10

Optional: Leskovec, J., Adamic, L.A. & Huberman, B.A. The
Dynamics of Viral Marketing. ACM Transactions on the Web, Vol. 1,
No. 1, pp. 1-39, May 2007

20 Societal implications

The field of mass communications took a sharp turn with what became
popularly known as Web 2.0 around the turn of this century: user generated
content was at the heart of this revolution. The notion of targeted advertising
by the likes of Google was vastly improved upon by Facebook with its
incisive insights into the intimate details of an individual. However, the user’s
privacy was sacrificed along the way, and one’s personal data became a
marketable commodity. An outfit by the name of Cambridge Analytica
capitalised on Facebook’s knowledge of its users, and played a destructive
role in the 2016 US elections as well as the Brexit campaign. This final
session is dedicated to discussing such abuses of big data.

Reading: Grassegger, H. & Krogerus, M. The Data That Turned the
World Upside Down. Motherboard (Jan 2017)

Reading: Watts, D. & Hasker, S. “Marketing in an Unpredictable
World,” Harvard Business Review, 2006
