Big Data Analysis With Networks (BDAWN) : Name of The Faculty: Affiliation
The field of big data is grounded in the notion that an analyst can derive insights by
observing the entirety of the population (N = all). Contrast this with the field of
statistics, where a random sample from the data is typically used to draw inferences.
Popular examples of big data could be a repository of employee resumes, or closing stock
prices for a stock exchange. These are examples of “structured” data: methods to query
and study them have existed for decades. Relational databases are commonly used to store
them.
Social networks like Facebook and Twitter have become the tools of choice for marketing
and communication. Linkages among the members of these networks can carry meaning,
and in a digital world, they also signal authority. There is no specification of the
structure that a post or a tweet must possess. It is imperative for business students to
understand the complexities that encompass such “unstructured” datasets, and to
develop a framework-based appreciation for the field. The BDAWN course has been
designed to address this need.
Graph theory and game theory form the main pillars of our discourse: they will help
represent and explain network structure and behaviour. While the instructor will
provide an adequate introduction, prior exposure to the topics would be considered
beneficial. The first half of the course is dedicated to the structural aspects, while the
second half is behavioural in its outlook. Any large dataset warrants a separate set
of techniques to manage and analyse it. Students will learn how to work with distributed
platforms like Spark and GraphX.
Networks of suppliers and buyers power the revenues of giants like Google. We will
show how economic and game theoretic concepts such as auctions and Nash equilibria
play a critical role in optimising value from digital advertising. The 2016 US election and
the Brexit vote were unduly influenced by platforms like Facebook; we will examine the
ethical dilemmas that stem from social preferences being made public.
Consulting with websites such as StackOverflow will be considered de rigueur. One can
always pick up R using the tutorial available on DataCamp. Students will be required to
independently work with various software packages (e.g. Gephi, Spark, GraphX) on their
laptops.
Learning Objectives
The goal of the course is to give the participant a hands-on feel for key concepts and
issues in the emerging field of networked data. The course is designed with the following
specific objectives and learning outcomes:
Pedagogy
The course employs a mix of lecture and hands-on exploration in class. Students are
required to bring their laptops to every session. A term project (max. group size of 3)
will help bring all the concepts together. The contents of the course are shaped by a
seminal reference by David Easley & Jon Kleinberg (EK) titled Networks, Crowds and
Markets (2010). In addition, a well-illustrated book on the subject is Network Science
(2016) by Albert-László Barabási. Notably, the authors have made both books available
online (see footnote 2).
Midterm 35%
Final 35%
Project 25%
Class Participation 5%
Session-wise plan
2 Accessible at: http://www.cs.cornell.edu/home/kleinber/networks-book/ and http://networksciencebook.org/
The popular book Big Data by Viktor Mayer-Schönberger & Kenneth Cukier
guides our discussion of the shifts in thinking about and processing big data,
facilitated by access to greater computational power and cheap storage.
Video: http://www.youtube.com/watch?v=bYS_4CWu3y8
We dedicate this session to introducing a set of graph theory constructs that are
foundational to network analysis. Using the Gephi tool, we shall analyse a
collection of graph datasets.
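To make these constructs concrete, here is a minimal Python sketch, offered only as an illustration and not as part of the Gephi workflow. It assumes a tiny hypothetical undirected graph with nodes A to D, and the helper names (build_adjacency, degrees, shortest_path_lengths) are made up for this example. It computes two basic constructs: node degree, and shortest path lengths via breadth-first search.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degrees(adjacency):
    """Degree of a node = number of neighbours it has."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def shortest_path_lengths(adjacency, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical toy graph: a triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
adjacency = build_adjacency(edges)
node_degrees = degrees(adjacency)
hops_from_a = shortest_path_lengths(adjacency, "A")
```

In this toy graph, C has the highest degree because it bridges the triangle and the pendant node; tools such as Gephi report the same kind of statistics for much larger datasets.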
5 Link Formation
6 Community Detection
A programmatic tool becomes necessary when graphs get large and unwieldy.
In this session, we shall use various graphical packages in R.
11 Distributed Computing
Video: https://www.youtube.com/watch?v=QaoJNXW6SQo
Tool Demonstration: Spark
14 Game Theory
Over the next four sessions, we shall build the theory and constructs to
understand how digital advertising platforms optimise value by facilitating
transactions between buyers and sellers over a network.
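As one possible illustration of how an auction sets a price among competing buyers, the sketch below implements a sealed-bid second-price auction in Python. The choice of this particular auction format is an assumption made for the example, since the outline does not say which formats the sessions cover, and the advertiser names and bid values are hypothetical.

```python
def second_price_auction(bids):
    """Sealed-bid second-price auction.

    bids: dict mapping bidder name -> bid amount.
    Returns (winner, price): the highest bidder wins but pays the
    second-highest bid. Ties go to the bidder listed first.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    # sorted() is stable, so equal bids keep their original order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical advertisers bidding for a single ad slot.
example_bids = {"advertiser_a": 5, "advertiser_b": 8, "advertiser_c": 3}
winner, price = second_price_auction(example_bids)
```

Because the winner's payment does not depend on the winner's own bid, bidding one's true valuation is a dominant strategy in this format, which makes it a convenient starting point for game-theoretic analysis.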
16 Matching Markets
18 Network Robustness
Like power grids, networks experience cascading failures. In this
session, we examine how certain graph structures are “robust” to failure. The
Molloy-Reed criterion states the condition for a randomly wired network to
have a giant component. Using simulations, we study network breakdowns
and percolation, which can be considered two sides of the same coin. We conclude
with a discussion of information cascades on social networks like Twitter.
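The Molloy-Reed criterion mentioned above says that a randomly wired network has a giant component when kappa = <k^2> / <k> > 2, where <k> is the mean degree and <k^2> is the mean squared degree. The Python sketch below checks that condition for a given degree sequence. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
def molloy_reed_ratio(degree_sequence):
    """Return kappa = <k^2> / <k> for a list of node degrees.

    The ratio of the two means equals sum(k^2) / sum(k), because the
    factor 1/N cancels.
    """
    total_degree = sum(degree_sequence)
    if total_degree == 0:
        raise ValueError("the network has no links, so <k> is zero")
    return sum(k * k for k in degree_sequence) / total_degree


def predicts_giant_component(degree_sequence):
    """Molloy-Reed condition for a randomly wired network: kappa > 2."""
    return molloy_reed_ratio(degree_sequence) > 2


# Hypothetical degree sequences:
isolated_pairs = [1] * 6    # every node has exactly one link
three_regular = [3] * 6     # every node has exactly three links

kappa_pairs = molloy_reed_ratio(isolated_pairs)
kappa_regular = molloy_reed_ratio(three_regular)
```

When every node has a single link, the network is a set of disconnected pairs and kappa falls below the threshold; with three links per node, kappa exceeds 2 and the criterion predicts a giant component. Removing nodes at random lowers kappa, which is why breakdown and percolation can be studied with the same quantity.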
19 Spreading Phenomena
20 Societal implications
The field of mass communications took a sharp turn with what became
popularly known as Web 2.0 around the turn of this century: user generated
content was at the heart of this revolution. The notion of targeted advertising
by the likes of Google was vastly improved upon by Facebook with its
incisive insights into the intimate details of an individual. However, the user’s
privacy was sacrificed along the way, and one’s personal data became a
marketable commodity. An outfit by the name of Cambridge Analytica
capitalised on Facebook’s knowledge of its users, and played a destructive