

Contents

Title page

Copyright page

Introduction: Data/Theory

Outline of this book

1 Beyond Method

Datafication

Data theory

Verstehen and Evidenz

Theories old and new

A bit of anarchy

Data piñata

Breaking things to move forward

A patchwork of solutions

The interpretive interface

Instruments of revelation

2 Decoding Social Forms

The odd places of politics

Social media politics for better or for worse

Virality and memes

The Weber connection: Ambivalence and trolling as ideal types

The Durkheim connection: Society > the sum of its parts

The Simmel connection: Social forms

Social cryptography

3 Unintended Consequences

The thumb-typing leader of the free world

Unpacking ambivalence

Error

L’affaire covfefe

Social media backfire

4 Actor-Networks

A sociology of translation

Finding the actors

Making connections

Who owns theory?

5 Collective Representations

Words and the company they keep

Learning from afar

Reading Reddit

Mapping the language of Reddit

Nodes and chains

6 Symbolic Power

Social fields

Capital and habitus

Adapting Bourdieu’s capital forms to social media data

A social space of political tweets

7 Theoretical I/O

Gaining theoretical sensitivity

Data science as ethnography

Conclusion: Theory/Data

Brutalising theory

Notes on ethics

Final remarks

References

Index

End User License Agreement

Data Theory

Interpretive Sociology and Computational Methods

Simon Lindgren

polity

Copyright page

Copyright © Simon Lindgren 2020


The right of Simon Lindgren to be identified as Author of this Work has been
asserted in accordance with the UK Copyright, Designs and Patents Act 1988.

First published in 2020 by Polity Press

Polity Press

65 Bridge Street

Cambridge CB2 1UR, UK

Polity Press

101 Station Landing

Suite 300

Medford, MA 02155, USA

All rights reserved. Except for the quotation of short passages for the purpose of
criticism and review, no part of this publication may be reproduced, stored in a
retrieval system or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without the prior permission of
the publisher.

ISBN-13: 978-1-5095-3927-7 (hardback)

ISBN-13: 978-1-5095-3928-4 (paperback)

A catalogue record for this book is available from the British Library.

Typeset in 10.5 on 12pt Sabon

by Fakenham Prepress Solutions, Fakenham, Norfolk NR21 8NL

Printed and bound in Great Britain by TJ International

The publisher has used its best endeavours to ensure that the URLs for external
websites referred to in this book are correct and active at the time of going to
press. However, the publisher has no responsibility for the websites and can make
no guarantee that a site will remain live or that the content is or will remain
appropriate.

Every effort has been made to trace all copyright holders, but if any have been
overlooked the publisher will be pleased to include any necessary credits in any
subsequent reprint or edition.

For further information on Polity, visit our website: politybooks.com

Introduction: Data/Theory

This book is a tentative and modest proposal on how to think and operate as a
theoretically sensitive social scientist in the age of datafication, especially
when researching sociality and politics through the internet. The book emphasises
the need to think freely and openly about both theory and method, and goes beyond
some of the ways of doing social research that are dominant today. It does so by
playfully and tentatively combining elements of theories and methods, some of which
are commonly seen as being incompatible.

In the face of the availability of new types of digital research data, and the
contemporary popularity of computational methods in an increasing range of
scholarly fields, the book should be read as an explorative attempt to make
synergetic gains by harnessing the respective powers of interpretive social
analysis and computational methods within one and the same research framework. This
is not to say that the proposed hybrid approach must replace any other existing
approach, but I want to explore potential points of contact between some concepts
and strategies that are not regularly combined.

The book is tentative because it does not enter any deeper discussion about the
ontological, epistemological, or technical appropriateness of combining, for
example, vector models with selectively read parts of Laclau and Mouffe’s discourse
analysis, or social media metrics with elements of Bourdieu’s theory of practice.
Nor does it address, in any conventional way, the discussions about using
‘quantitative’ scores and measures as input for ‘qualitative’ readings, that its
case examples are likely to invite. This conscious choice has been made in order
for the argument not to get stuck with such things. The book as a whole should be
read as a proposal: ‘What if we did it like this?’ This is a way of exploring how
far things can bend.

Furthermore, the proposal that I make in this book is meant to be modest. A
significant amount of valuable and important work has already been done, and
continues to be carried out, along the rough lines that I am suggesting, by
scholars in fields such as science, technology, and society studies (Marres, 2017),
mixed methods research (Hesse-Biber and Griffin, 2013), analytical sociology
(Keuschnigg, Lovsjö, and Hedström, 2018), and computational social science (Lazer
et al., 2009). In developing my contribution to the discussion of how the
data/theory equation can be balanced in contemporary data-driven social research, I
draw on influences from all of these areas. In addition, there is a vast literature
in the area of the philosophy of science, where I am by no means an expert, but
with which the book still sometimes enters into partial dialogue. Therefore, the
book should be read for what it is – that is, an account by an alleged
‘qualitative’ sociologist entering the field of computational methods, with the aim
of tracing the outlines of a hybrid methodological position potentially to be held,
not in particular by data scientists or computational social scientists, nor by
digital ethnographers or anthropologists, but by scholars wanting to maintain an
interpretative sociological framework for analysis, while incorporating
computational methods that follow society’s datafication.

Outline of this book

The first chapter of the book, Beyond Method, is about the need to rethink and
repurpose research methods, as well as the role of social theory. I argue that
social researchers must move beyond prevailing notions of methodology in order to
find new and creative solutions in response to an increasingly complex social
reality.

The second chapter, Decoding Social Forms, turns to the empirical subject area of
the book – social media politics – and continues the discussion about how to
research complex sociality. Social research, and its object of study (society), are
equally messy, in ways that should be embraced rather than avoided. In addressing
how social theory can help in navigating the complexities, the chapter covers a set
of key concepts, drawing on classic sociological theorists such as Weber, Durkheim,
and Simmel.

The third chapter, Unintended Consequences, continues to make the argument that
pre-digital social theory can be repurposed to make sense of ambivalent sociality
in a datafied society. In the chapter, we approach US President Donald Trump’s
infamous ‘covfefe’ tweet from the perspective of the sociology of unanticipated
consequences, in order to disentangle its surrounding twisted web of tweets, talk,
and discourse. This is a case study, presented before we delve deeper into the
territory of computational methods in the chapters that follow, to illustrate how
social theory can aid the disentanglement of ambivalent online social practice. In
this particular case, we will take help from sociologist Robert K. Merton’s
perspective on the sometimes unpredictable, and possibly ambivalent, relationships
between what people do, or intend, and the outcomes of those actions.

Chapter 4, Actor-Networks, provides an example of how computational approaches can
be combined with interpretive theoretical analysis. This is done here, in an area –
science, technology and society studies (Callon et al., 1983; Marres, 2017, pp.
106–8) – where such connections have already been made, and where there is great
potential. The case analysis in the chapter is based on a dataset consisting of 1.1
million tweets, which were collected using search terms relating to climate change
discourse. The chapter uses these data to explore how computational approaches to
text analysis can be brought together with actor-network theory (Callon, 2001;
Latour, 2005; Law, 1999). This is done by combining elements of the theory with
suitable techniques for processing the tweets. First, an analysis based in actor-
network theory needs to identify social actors (human and others) in the social
context that is under analysis. This is done in this research example with the help
of the computational linguistics technique of Named Entity Recognition (Grishman
and Sundheim, 1996), which algorithmically identifies and tags any names of people,
places, organisations, corporations, nationalities, events, and so on, that appear
in the tweets. Second, actor-network theory is interested in how actors connect in
relational systems. It wants to map chains of association between humans, things,
and ideas that play a part in how social reality is constructed, and how ‘truths’
are manifested. In this chapter’s case example, information about such associations
was gained by analysing the network contexts of the mapped actors with the help of
topic modelling through so-called Latent Dirichlet Allocation (Blei, Ng, and
Jordan, 2003). The information gathered through that machine learning model, in
combination with techniques for visualisation from the field of social network
analysis (Bastian, Heymann, and Jacomy, 2009; Shannon, 2003; Wasserman and Faust,
1994), enables the drawing of tangible maps of actor-networks. The chapter
concludes by returning to the general theme of this book, by raising and discussing
the issue of how and why theories and methods can, and must, be adapted and tweaked
in ways that entail simplification but also promote the emergence of new
analytical opportunities. Theories, as well as methods, should be seen as open-
source: free for all to share, alter, and transform.
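
To give a concrete, if simplified, impression of what such a pipeline can look like in practice, the sketch below (written in Python, and not taken from the book’s own analysis) chains together named-entity recognition with spaCy, LDA topic modelling with Gensim, and a graph export for visualisation in Gephi. The tweet texts, entity types, and parameter values are illustrative assumptions only.

# A minimal sketch of the workflow described above: identify actors with NER,
# model their discursive contexts with LDA, and connect actors to topics in a graph.
# Illustrative only; the data and parameters are hypothetical.
import spacy                                   # named-entity recognition
from gensim import corpora, models             # LDA topic modelling
import networkx as nx                          # actor-network graph

nlp = spacy.load("en_core_web_sm")
tweets = [
    "Greta Thunberg told the UN that climate change demands action now",
    "ExxonMobil questioned the IPCC report on global warming",
]  # hypothetical stand-ins for the collected climate change tweets

# Step 1: tag people, organisations, and places appearing in each tweet.
entities_per_tweet = [
    [ent.text for ent in nlp(t).ents if ent.label_ in {"PERSON", "ORG", "GPE"}]
    for t in tweets
]

# Step 2: model the topical contexts of the tweets.
tokens = [t.lower().split() for t in tweets]
dictionary = corpora.Dictionary(tokens)
corpus = [dictionary.doc2bow(doc) for doc in tokens]
lda = models.LdaModel(corpus, num_topics=5, id2word=dictionary, passes=10)

# Step 3: link each actor to the dominant topic of the tweets it occurs in,
# and export the resulting actor-network for visualisation in Gephi.
G = nx.Graph()
for bow, ents in zip(corpus, entities_per_tweet):
    doc_topics = lda.get_document_topics(bow)
    if not doc_topics:
        continue
    topic_id = max(doc_topics, key=lambda pair: pair[1])[0]
    for ent in ents:
        G.add_edge(ent, "topic_%d" % topic_id)
nx.write_gexf(G, "actor_network.gexf")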

Chapter 5, Collective Representations, is focused on introducing early twentieth-century
approaches to the sociology of knowledge to the age of the internet, and
especially to the research context of current data science. The key argument in the
classic, Durkheimian, approach is that language, conceptual thinking, and logic are
shaped by the social contexts out of which they arise. This notion, that
stereotypes, categorisations, and manners of speaking that exert great power over
our reasoning and actions are social products, has formed the basis for a series of
other constructionist perspectives on society and culture over the years. The
chapter discusses some modern developments in the sociology of knowledge, alongside
social constructionism, and poststructural perspectives such as those of Laclau and
Mouffe (1985), and Deleuze and Guattari (1987), where abstract theoretical notions
such as discourse, rhizome, and assemblage are exploratively brought together with
data science methods. The focus is particularly on text mining through machine
learning, and specifically on word embedding models. The chapter aims to show how
one can approach, much as a social anthropologist would, massively networked social
settings online through big data techniques, and draw on sociological theory in
decoding their worldviews. The chapter includes an empirical case study of the
forum website Reddit, based on a comprehensive dataset including more than 1.2
billion posts.
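
As a minimal illustration of the kind of word embedding model used in that chapter (again, a sketch built on assumed data rather than the book’s actual code), a Word2Vec model can be trained on tokenised Reddit comments so that words come to be positioned by the company they keep:

# Minimal word-embedding sketch; the comments and parameters are hypothetical.
from gensim.models import Word2Vec

# Stand-ins for tokenised Reddit comments; in practice these would be streamed
# from a far larger preprocessed corpus.
comments = [
    ["climate", "change", "is", "a", "hoax"],
    ["we", "need", "political", "action", "on", "climate", "change"],
    ["the", "climate", "debate", "never", "ends"],
]

model = Word2Vec(sentences=comments, vector_size=100, window=5, min_count=1, workers=4)

# Terms that appear in similar contexts end up close together in the vector space,
# which can then be read interpretively as a map of collective representations.
print(model.wv.most_similar("climate", topn=5))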

The next section of the book, Chapter 6, Symbolic Power, works through an example
of how a well-established social theory can be transformed and adapted to enable
operationalisations that are fit for social media datasets. The case in focus is
Pierre Bourdieu’s theory of social practice (Bourdieu, 1977, 1984, 1992), by which
he argued that the social status of an individual is the result of how a variety of
resources are converted in a multitude of relational social fields. In his general
theory, Bourdieu imagined society as a multidimensional space, where the resources
of the individual – consciously and unconsciously – become tools for achieving
status to the degree that they are recognised as important by social others. He
conceptualised the resources in terms of different forms of ‘symbolic capital’:
economic capital, social capital, cultural capital. In spite of being an
anthropologist rather than a mathematician, Bourdieu even summarised his grand
theory in terms of an equation: [(habitus) (capital)] + field = practice. In spite
of these spatial and mathematical metaphors, large-scale empirical explorations and
validations of his influential theory have faced serious empirical and
computational challenges. This chapter’s case example makes use of a dataset of 1.7
million tweets matching the main hashtag for the 2018 Swedish general election
(#val2018). Approaching the question of how power and influence are constituted in
political social media discourse, the analysis builds on a conscious and quite far-
reaching modification of Bourdieu’s taxonomy of capital forms, in order to make
them measurable through social media data.
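
Purely as an illustration of what such an operationalisation might look like (the proxy measures below are assumptions made for the sake of example, not those used in the book), one could derive capital indicators from tweet metadata and project users into a two-dimensional social space:

# Illustrative sketch: proxy Bourdieusian capital forms with tweet metadata and
# construct a simple two-dimensional 'social space'. All measures are assumptions.
import pandas as pd
from sklearn.decomposition import PCA

users = pd.DataFrame({
    "user":            ["a", "b", "c", "d"],
    "followers":       [120, 45000, 800, 15],   # proxy for social capital
    "retweets_gained": [3, 900, 25, 0],         # proxy for symbolic recognition
    "verified":        [0, 1, 0, 0],            # proxy for institutionalised capital
    "hashtag_breadth": [2, 14, 5, 1],           # proxy for cultural (issue) capital
})

features = users.drop(columns="user")
standardised = (features - features.mean()) / features.std()
coords = PCA(n_components=2).fit_transform(standardised)
users["dim1"], users["dim2"] = coords[:, 0], coords[:, 1]
print(users[["user", "dim1", "dim2"]])

Bourdieu’s own preferred tool for constructing such spaces was correspondence analysis; PCA stands in here only as the simplest available substitute for building a relational space from the indicators.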

The seventh chapter, Theoretical I/O, gets more hands-on in terms of how a more
generic analytical framework that combines interpretive sociology with data science
can be developed. I revisit sociological methodologist Barney Glaser’s (1978)
writings on theoretical sensitivity, and argue that his vision for the research
process can be translated into the age of data science. I present a model for a
research process that alternates between data and computation on the one side, and
theory and interpretation on the other. The chapter also includes a concrete
example of how to apply the approach. This is in the form of a case study that uses
Marxist critical theory, together with the empirical case of the #deletefacebook
movement on Twitter, in the wake of the Cambridge Analytica scandal in 2018. The
case is used to explore and illustrate how the outlined approach can be realised in
empirical and analytical practice.

The book ends with a concluding section in which I summarise and discuss the data
theory approach at an overarching level.

Beyond Method
In light of the developments towards a datafication of society, there is a need to
reinvent and adapt our research approaches in order to make them more relevant and
useful. This demands a creative and somewhat anarchistic approach to existing
theories and methods.

Sociologist John Law argues, while acknowledging that conventional research methods
are indeed useful in some cases, that there is an urgent need to ‘remake social
science in ways better equipped to deal with mess, confusion and relative disorder’
(Law, 2004, p. 11). The need to go beyond methods as we know them is underpinned by
the fact that social science is not very good at understanding ‘things that are
complex, diffuse and messy’. This is because the simple and clear descriptions that
most conventional research methods aim for ‘don’t work if what they are describing
is not itself very coherent’ (Law, 2004, p. 2). Especially in light of the high
level of complexity of twenty-first-century networked society, it is imperative
that we develop more ambivalent methodologies to account for our increasingly
ambivalent object of study.

Paul Feyerabend, the refreshingly provocative enfant terrible of the philosophy of
science, wrote on the ‘complexity of human change’ and on the ‘unpredictable
character of the ultimate consequences’ of people’s actions. ‘Are we really to
believe’, he asked, ‘that the naive and simple-minded rules which methodologists
take as their guide are capable of accounting for such a maze of interactions?’
(Feyerabend, 1975, p. 9). This question is today more relevant than ever, as
datafication affects not only research practices, but also society as such, and
thereby the very object of study of our research. Social expressions in the age of
the internet are fragmented and entangled, in a system of platforms and relations
enabled through digital technologies.

As argued by Nick Couldry and Andreas Hepp (2017), we now live in an age of deep
mediatisation, where media can no longer be seen as specific channels of
centralised content. Rather, media are now better understood as platforms for
enacting social life (Dijck, Poell, and Waal, 2018). This is symptomatic of a
transition from a mass media system to a social media ecology. The transformation
has been described in terms of a rise of ‘mass self-communication’ (Castells,
2009), ‘networked individualism’ (Rainie and Wellman, 2012), and ‘connective
action’ (Bennett and Segerberg, 2012). In sum, such perspectives argue that
politics, opinions, and ideas, as well as social life in general now function in
accordance with a much more decentralised and democratic logic (Ito, 2008), but
also in more volatile and ‘viral’ ways (Sampson, 2012). This represents something
much more than a mere technological transition. Following ongoing processes of
digitalisation and datafication, our social world is suffused with technological
media of communication that bring about a refiguring of the world in, and on, which
we act. As argued by Couldry and Hepp (2017), social relations today are actualised
through a system of variously connected digital platforms, that bring about a much
more intense embedding of media in social processes than was ever the case before.
Now there is a need to adapt social science theories and methods in hybrid ways to
better account for this situation.

The digital society has been characterised as a ‘wicked system’ (Törnberg, 2017, p.
52), the analysis of which demands a critical methodological pluralism. In fact,
most social systems have this emergent property of wickedness to some degree – a
‘combination of complexity and complicatedness that entails plasticity and deep
ontological uncertainty’ (Törnberg, 2017, p. 25). In the specific case of social
media and politics, internet researcher Helen Margetts and her colleagues argue
that ‘social media are a source of instability and turbulence in political life’,
which creates an uncertain environment (Margetts et al., 2017, p. 74). They suggest
that:

Online platforms exhibit what other people are doing in real time and make other
people aware of what they themselves are doing, creating feedback loops and chain
reactions that draw in more people, whose actions in turn are likely to influence
others. It seems reasonable to claim that mobilizations formed in this way are
vulnerable to the impulses from which they start, which can push them over into
critical mass, or cause them to fade and die almost as soon as they appear, making
them hard to understand or predict.

(Margetts et al., 2017, p. 74)

As these authors argue, there is indeed a complexity (and complicatedness) of
factors, levels, forces, and influences involved, at all levels of the social –
especially in the digital society. And this book, in essence, is about approaching
this complexity analytically, with a theoretical and methodological openness that
can account for this turbulent, wicked, anarchistic, and ambivalent nature.

Datafication

The ongoing development of the internet and social media increasingly transforms
our lives into data. Vast amounts of information about individuals and their
interactions are being generated and recorded – directly and indirectly –
voluntarily and involuntarily – for free and for profit. These volumes of data
offer unforeseen and exciting opportunities for social research. It is because of
this that we have witnessed in recent years the rise of the much-hyped phenomenon
of big data. Alongside this development, computational methods have become
increasingly popular also in scholarly areas where they have not been commonly used
before.

‘Big data’ refers broadly to the handling and analysis of massively large datasets.
According to a popular definition, big data conforms with three Vs. It has volume
(enormous quantities of data), velocity (is generated in real-time), and variety
(can be structured, semi-structured, or unstructured). Various writers and
researchers have suggested a number of other criteria be added to this, such as
exhaustivity, relationality, veracity, and value. Big data has indeed been a mantra
in the fields of commercial marketing and political campaigning throughout the last
decade. High hopes and strong beliefs have been connected with how these new types
of data – enabled by people’s use of the internet, social media, and technological
devices – might be collected and analysed to generate knowledge about how to get
people to click on adverts, or to buy things or ideas. Similar methods are also
becoming more and more used in fields such as healthcare and urban planning.

All of this is a consequence of what can be called the datafication of social life.
This is what happens when ‘we have massive amounts of data about many aspects of
our lives, and, simultaneously, an abundance of inexpensive computing power’
(Schutt and O’Neil, 2013, p. 4). Also beyond the internet and social media, data
have come to exert an increasing influence on most industries and sectors. There has
been huge interest, and many efforts made, to try to extract new forms of insight
and generate new kinds of value in a variety of settings. As explained on Wikipedia
(2018), lately ‘the term “big data” tends to refer to the use of predictive
analytics, user behavior analytics, or certain other advanced data analytics
methods that extract value from data, and seldom to a particular size of data set’.
As underlined by internet researchers Kate Crawford and danah boyd, ‘big data’ is
in fact a poorly chosen term. This is because its alleged power is not mainly about
its size, but about its capacity to compare, connect, aggregate, and cross-
reference many different types of datasets (that also often happen to be big). They
define big data as:

a cultural, technological, and scholarly phenomenon that rests on the interplay of:
(1) Technology: maximizing computation power and algorithmic accuracy to gather,
analyze, link, and compare large data sets. (2) Analysis: drawing on large data
sets to identify patterns in order to make economic, social, technical, and legal
claims. (3) Mythology: the widespread belief that large data sets offer a higher
form of intelligence and knowledge that can generate insights that were previously
impossible, with the aura of truth, objectivity, and accuracy.

(boyd and Crawford, 2012, p. 664)

From a critically sociological perspective, Lupton (2014, p. 101) argues that the
hype that surrounds the new technological possibilities afforded by big data
analytics contributes to the belief that such data are ‘raw materials’ for
information – that they contain the untarnished truth about society and sociality.
In reality, each step of the process in the generation of big data relies on a
number of human decisions relating to selection, judgement, interpretation, and
action. Therefore, the data that we will have at hand are always configured via
beliefs, values, and choices that ‘“cook” the data from the very beginning so that
they are never in a “raw” state’. So, there is no such thing as raw data, even
though the orderliness of neatly harvested and stored big datasets can create an
illusion to the contrary.

Sociologist David Beer (2016, p. 149) argues that we now live in ‘a culture that is
shaped and populated with numbers’, where trust and interest in anything that
cannot be quantified diminish. Furthermore, in the age of big data, there is an
obsession with causation. As boyd and Crawford (2012, p. 665) argue, the mirage and
mythology of big data demand that a number of critical questions are raised with
regard to ‘what all this data means, who gets access to what data, how data
analysis is employed, and to what ends’. There is a risk that the lure of big data
will sideline other forms of analysis, and that other alternative methods with
which to analyse the beliefs, choices, expressions, and strategies of people are
pushed aside by the sheer volume of numbers. ‘Bigger data are not always better
data’, they write, and the analysis of them will not necessarily lead to insights
about society that are more true than what can be achieved through other data and
methods.

Many popular examples illustrate how datafication is intensifying exponentially, the
most famous one being Moore’s Law, according to which computing power, memory, and
storage roughly double in capacity every couple of years (Moore, 1965). Another
telling comparison is this one: The Great Library of Alexandria, established in the
third century BCE, was regarded as the centre of knowledge in the ancient world and
was believed to hold within it the sum total of all human knowledge. Today, however,
the world is estimated to hold around 1,200 million terabytes of data, enough to give
every person alive more than 300 times as much information as historians believe was
contained in the Library’s entire collection (Cukier and Mayer-Schoenberger, 2013).

We are no doubt in the midst of an ongoing data explosion, and along with it the
development of ‘data science’. Data science is an interdisciplinarily oriented
specialisation at the intersection of statistics and computer science, focusing on
machine learning and other forms of algorithmic processing of large datasets to
‘liberate and create meaning from raw data’ rather than on hypothesis testing
(Efron and Hastie, 2016, p. 451). Data science is a successor to the form of ‘data
analysis’ proposed by the statistician John W. Tukey, whose analytical framework
focused on ‘looking at data to see what it seems to say’, making partial
descriptions and trying ‘to look beneath them for new insights’. In his exploratory
vein, Tukey (1977, p. v) also emphasised that this type of analysis was concerned
‘with appearance, not with confirmation’. This focus on mathematical structure and
algorithmic thinking, rather than on inferential statistical justification, is a
precursor to the flourishing of data science in the wake of datafication.

All the things that people do online in the context of social media generate vast
volumes of sociologically interesting data. Such data have been approached in
highly data-driven ways within the field of data science, where the aim is often to
get a general picture of some particular social pattern or process. Being data-
driven is not a bad thing, but there must always be a balance between data and
theory – between information and its interpretation. This is where sociology and
social theory come into the picture, as they offer a wide range of conceptual
frameworks, theories, that can aid in the analysis and understanding of the large
amounts and many forms of social data that are proliferated in today’s world.

But in those cases where we see big data being analysed, there is far too often a
disconnect between the data and the theory. One explanation for this may be that
the popularity and impact of data science makes its data-driven ethos spill over
also into the academic fields that try to learn from it. This means that we risk
forgetting about theoretical analysis, which may fade in the light of sparkling
infographics.

It is my argument that the social research that relies heavily on the computational
amassing and processing of data must also have a theoretical sensitivity to it.
While purely computational methods are extremely helpful when wrangling the units
of information, the meanings behind the messy social data which are generated in
this age of datafication can be better untangled if we also make use of the rich
interpretive toolkit provided by sociological theories and theorising. The data do
not speak for themselves, even though some big data evangelists have claimed that
to be the case (Anderson, 2008).

Big data and data science are partly technological phenomena, which are about using
computing power and algorithms to collect and analyse comparatively large datasets
of, often, unstructured information. But they are also most prominently cultural
and political phenomena that come along with the idea that huge unstructured
datasets, often based on social media interactions and other digital traces left by
people, when paired with methods like machine learning and natural language
processing, can offer a higher form of truth which can be computationally distilled
rather than interpretively achieved.

Such mythological beliefs are not new, however, as there has long been, if not a
hierarchy, at least a strict division of research methods within the cultural and
social sciences, where some methods – those that have come to be labelled
‘quantitative’, and that analyse data tables with statistical tools – have been
vested with an ‘aura of truth, objectivity, and accuracy’ (boyd and Crawford, 2012,
p. 663). Other methods – those commonly named ‘qualitative’, and involving close
readings of textual data from interviews, observations, and documents – are seen as
more interpretive and subjective, rendering richer but also (allegedly) more
problematic results. This book rests on the belief that this distinction is not
only annoying, but also wrong. We can get at approximations of ‘the truth’ by
analysing social and cultural patterns, and those analyses are by definition
interpretive, no matter the chosen methodological strategy. Especially in this day
and age where data, the bigger the better, are fetishised, it is high time to move
on from the unproductive dichotomy of ‘qualitative’ versus ‘quantitative’.
Data theory

Pure data science tends to focus, quite simply, on what is researchable. It
goes for the issues for which there are data, no matter if those issues have any
real-life urgency or not. The last decade has seen parts of the field of data
science and parts of the social sciences become entangled in ways that risk a loss
of theoretical grounding. In a seminal paper outlining the emerging discipline of
‘computational social science’, David Lazer and colleagues wrote that:

We live life in the network. We check our e-mails regularly, make mobile phone
calls from almost any location, swipe transit cards to use public transportation,
and make purchases with credit cards. Our movements in public places may be
captured by video cameras, and our medical records stored as digital files. We may
post blog entries accessible to anyone, or maintain friendships through online
social networks. Each of these transactions leaves digital traces that can be
compiled into comprehensive pictures of both individual and group behavior, with
the potential to transform our understanding of our lives, organizations, and
societies.

(Lazer et al., 2009, p. 721)

Furthermore, they argued that there was an inherent risk in the fact that existing
social theories were ‘built mostly on a foundation of one-time “snapshot” data’ and
that they therefore may not be fit to explain the ‘qualitatively new perspectives’
on human behaviour offered by the ‘vast, emerging data sets on how people interact’
(Lazer et al., 2009, p. 723). While I agree that social analysis must be re-thought
in light of these developments, I am not so sure that it is simply about
‘compiling’ the data, and then being prepared for the possibility that existing theories may no longer
work. Rather, I argue, we should trust a bit more that even though the size and
dynamics of the data may be previously unseen, the social patterns that they can
lay bare – if adequately analysed – can still largely be interpreted with the help
of ‘old’ theories, and with an ‘old’ approach to theorising. After all, theories
are not designed to understand particular forms of data, but instead the sociality
to which they bear witness.

My point is that data need theory, for considering the data, the methods, the
ethics, and the results of the research. Still, of course, theories may always
need to be updated, revised, discarded, or newly invented – but that has always
been true. As such, this book is therefore positioned within the broad field of
‘digital sociology’ as outlined by authors such as Deborah Lupton (2014) and
Noortje Marres (2017). One strand within the debate about what digital sociology
is, and what it entails, relates to the emergence of ‘digital methods’. In general,
there is widespread disagreement about what such methods are, and whether there
should be a focus on continuity with established social research traditions, or on
revolutionary innovation. In a sense, this book can be read as one out of many
possible ventures in the direction pointed out by Noortje Marres when she writes:

The digitization of social life and social research opens up anew long-standing
questions about the relations between different methodological traditions in social
enquiry: what are the defining methods of sociological research? Are some methods
better attuned to digital environments, devices and practices than others? Do
interpretative and quantitative methods present distinct methodological frameworks,
or can these be combined?
(Marres, 2017, p. 105)

With co-author Caroline Gerlitz, Marres suggests that we go beyond previous
divisions of methods by thinking in terms of ‘interface methods’ (Marres and
Gerlitz, 2016). This means highlighting that digital methods are dynamic and under-
determined, and that a multitude of methodologies are intersecting in digital
research. By recognising ‘the unstable identity of digital social research
techniques’, we can ‘activate our methodological imagination’ (Marres, 2017, p.
106). Marres continues to say that:

Rather than seeing the instability of digital data instruments and practices
primarily as a methodological deficiency, i.e. as a threat to the robustness of
sociological data, methods and findings, the dynamic nature of digital social life
may also be understood as an enabling condition for social enquiry.

(Marres, 2017, p. 107)

In this book, I suggest a general stance by which more integrated methodologies can
be developed and propagated. Writing from my own personal position as a social
media researcher and cultural sociologist, I will present an argument that the
data-drivenness of big data science does not in essence need to be conceived as
being different from the data-drivenness of ethnography and anthropology. My end
goal is to outline a framework by which theoretical interpretation and a
‘qualitative’ approach to data is integrated with ‘quantitative’ analysis and data
science techniques.

Verstehen and Evidenz

The book, in the end, is especially focused on what interpretive sociology can
bring to the table here. With this concept I refer to the classic notion of
sociology as ‘a science concerning itself with the interpretive understanding of
social action […] its course and consequences’ (Weber, [1921] 1978, p. 4). This
kind of sociology is about the understanding (Verstehen) of social life and has a
focus on processes of how meaning is created through social activities. In other
words, it is not a positivist and objectivist science. As Max Weber put it,
‘meaning’ never refers:

to an objectively ‘correct’ meaning or one which is ‘true’ in some metaphysical
sense. It is this which distinguishes the empirical sciences of action, such as
sociology and history, from the dogmatic disciplines in that area […] which seek to
ascertain the ‘true’ and ‘valid’ meanings associated with the objects of their
investigation.

(Weber, [1921] 1978, p. 4)

Still, he continued, interpretive sociology ‘like all scientific observations,
strives for clarity and verifiable accuracy of insight and comprehension (Evidenz)’
(Weber, [1921] 1978, p. 4). The interpretive stance should entail moving back and
forth between such evidence – data – and their iterative and cumulative
interpretation – theory.

Empirically speaking, this is a book about social media politics (see Chapter 2).
In a set of different case studies, it will say things about how social media are
used today for various political ends, under which circumstances, and to what
effects. The underlying and driving scholarly aim of the book, however, is more
methodological, and is about developing an analytical approach for bringing
together the Verstehen and the Evidenz in general, and social theory and data
science in particular. This agenda, rather than any one core research question
about social media politics, is the main driving force through the chapters that
follow.

I wrote this book as a reminder that, also (or maybe especially) in the age of
datafication, data (still) need theory, and theory (still) needs data. The book
provides a suggestion as to how one may conceptualise and do research that aligns
with that insight. The chapters in this book include theoretical and methodological
discussions, as well as a number of explorative and experimental case studies,
focused on how social media politics can be analysed based on these premises.
Ultimately, the book presents an approach that, while being data-driven and making
use of social media data, and computational data science techniques, is still
firmly set within a theoretically sensitive and sociologically interpretive
framework of analysis.

Theories old and new

Sociological theory, and often such theories that were developed in the pre-digital
age, can contribute immensely to our understanding of things that we are now in the
process of, maybe unnecessarily, inventing new names for: ‘viral communication’,
‘user-generated content’, ‘the blogosphere’, ‘online hate’, ‘cyber bullying’, and
so on. I do not mean that such words, at least not all of them, are merely
superfluous synonyms for things that we already have names for. Nor do I claim that
any old theory is always better than a new one, or that such old theories can be
applied unproblematically to twenty-first-century phenomena without modification.
But, in many cases, we run the infamous risk of throwing the baby out with the bath
water. When researching the peculiarities and novelties of interaction and
communication in the datafied society, we risk mistaking theories about general
patterns of social life as being obsolete just because they were developed in non-
digital contexts.

The already established theories are useful because, even though settings change,
we may often be dealing with the same underlying social forms as before. Georg
Simmel (1895, p. 54) argued that the most important task for the sociologist is to
separate analytically the form of social life from its content, even though the two
are in reality inseparably united. The aim of the analysis must be to detach the
forms from their contents and to bring them together systematically: ‘For it is
evident that the same form […] can arise in connection with the most varied
elements.’ Simmel continued to explain that:

We find, for example, the same forms of authority and subordination, of
competition, imitation, opposition, division of labor, in social groups which are
the most different possible.
(Simmel, 1895, p. 55)

Let us assume, to take but one example, that we were to establish empirically that
people on social media sometimes find themselves disillusioned by their own social
media use, and that they feel as if they are just like cogs in a bigger machine
beyond their individual control. Let us also assume that our analysis made us think
that this may even be a form of oppression or exploitation, where social media
conglomerates make a profit from what disillusioned and exploited users post
online. We may simply invent a new flashy theoretical concept for this, say:
‘digital brainwash’ or ‘social media disconnect’. But we could also make the effort
of going back to already established social theories. In my present example, a good
option may have been Karl Marx’s 1844 theory about alienation (Marx, 1844, pp. 69–
84). The social form of alienation, in that case, may transcend the contexts of
nineteenth-century industrial capitalism and social life on the twenty-first-
century internet. Once we see that, we also enable other insights such as, for
example, that our present-day society may still be quite similar in some respects
to nineteenth-century industrial capitalism.

I do not mean to say that such theoretical connections are not already made by many
scholars, nor do I mean that anyone who does not do it at every opportunity is lazy
or wrong. I myself am a repeat offender. And, conversely, it may indeed sometimes
be a good idea actually to invent new concepts – how else would theories develop? –
and in most cases there needs to be some sort of updating or modification of the
old theory that is re-employed. On the one hand, this book is an explicit effort to
explore and show how to apply existing, trusty, and well-worn social theory
systematically, through data science, to social media politics with this kind of
ambition and aspiration. On the other hand, the book is just as much an
encouragement to combine and re-invent theories in eclectic ways. I will return,
throughout the book, to issues of theory as universal truth versus theory as
emergent and constantly renegotiated.

A bit of anarchy

Data scientists Rachel Schutt and Cathy O’Neil (2013, p. 9) argue that data
scientists have much to benefit from collaborating with social scientists. This,
they write, is because social scientists ‘do tend to be good question askers and
have other good investigative qualities’. They write about the hyped and still
emerging speciality of data science that ‘it’s not math people ruling the world’.
Rather, they argue that when different ‘domain practices’ intersect with data
science, each such practice is ‘learning differently’ (Schutt and O’Neil, 2013, p.
219). Taking my cue from Schutt and O’Neil, I ask in this book what type of such
different learning – which methodological developments – can follow when sociology
meets data science.

This is obviously a vastly open question with a multitude of potential answers.
Therefore, my suggestion, which draws to a great extent on my personal
methodological and theoretical preferences as an interpretive sociologist, is but
one possibility. The main idea that I am putting forward is that the data-
drivenness of interpretive sociology, as formulated as a hands-on framework by
methodologists such as Barney Glaser and Anselm Strauss (1967), and particularly
Glaser’s (1978) notion of ‘theoretical sensitivity’, can be dusted off and brought
together with the data-drivenness of data science practices.

Many would say that the general views on science and methodology in big data
research and in grounded theory are too divergent, to the point that they are
even incompatible. I do not believe that to be the case. Still, to experiment with
merging methods that are labelled ‘qualitative’ and ‘quantitative’ is not a good
idea if you want everyone to agree with you. In both camps (because sadly, that is
still what they are), it is equally easy to find people who are dogmatic. So, to
find productive ways across, there is definitely a need to think unconventionally.
Feyerabend had some good ideas about how science in general could do well with a
dose of theoretical anarchism, and claimed that research methods must always be
opposed and questioned:

The idea of a method that contains firm, unchanging, and absolutely binding
principles for conducting the business of science meets considerable difficulty
when confronted with the results of historical research. We find, then, that there
is not a single rule, however plausible, and however firmly grounded in
epistemology, that is not violated at some time or other. It becomes evident that
such violations are not accidental events, they are not results of insufficient
knowledge or of inattention which might have been avoided. On the contrary, we see
that they are […] absolutely necessary for the growth of knowledge.

(Feyerabend, 1975, p. 7)

This book does not swear by the entire philosophy of Feyerabend, but it does align
with his idea that it is good for science if we violate some of its rules every now
and then. It might be a way to move forward. This is therefore neither a book about
true data science nor about dogmatic sociology (whatever those might be). It
demands that the reader keep an open mind in relation to the transcending character
of the presented analytical approach.

As argued above, theory needs data. But this book is not about sociology telling
data science how to do things properly. It is just as much the other way around. And maybe not
so much telling as mutual learning. Throughout the central parts of this book, we
shall look at how knowledge about some particular data can be advanced through some
particular social theory. I will also discuss how theory can advance the
formulation of the methodology by which we approach the data. The overarching goal
is the productive meeting of the two.

There are new types of data that demand new types of methods, while there are also
new types of research questions arising that call for developing new theoretical
approaches. This demands that we advance our perspectives on data, theory, and
methods in parallel. In other words, it demands developing a data theory approach. The term
‘data theory’ as such has been used to some extent already in statistics. William
G. Jacoby, a researcher on public opinion and voting behaviour, has used it to
refer to the process by which the researcher, being theoretically driven, chooses
some aspects of the observable reality as the data to be analysed:

Data theory examines how real world observations are transformed into something to
be analyzed – that is, data. Any empirical observation provides the observer with
information. Typically, however, only certain aspects of this information will be
useful for analytic purposes. The researcher takes a vitally important step in his
or her analysis simply by culling out those pieces of information that are used
from those that could be considered, but are not. The information that is used
comprises the data, and it is clearly only a subset of observable reality. Hence,
it is important to distinguish between observations (the information that we can
see in the real world around us) and data (the information that we choose to
analyze). The central concern of data theory is to specify how the latter are
derived from the former.

(Jacoby, 1991, p. 4)
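
A trivial sketch can make the distinction tangible: from a raw observed record (here a hypothetical tweet object, with invented fields and values), the researcher culls only those fields that are to serve as data.

# A trivial illustration of Jacoby's observation/data distinction.
# The record is hypothetical; the researcher's choices decide which of the
# observable fields become data for the analysis.
observation = {
    "id": 1,
    "text": "An example tweet",
    "created_at": "2018-01-01T12:00:00Z",
    "retweet_count": 12345,
    "device": "Twitter for iPhone",
    "profile_colour": "#C0DEED",   # observable, but not culled as data here
}

data = {key: observation[key] for key in ("text", "created_at", "retweet_count")}
print(data)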

Furthermore, there was even a Department of Data Theory in the 1990s at the
University of Leiden in the Netherlands, working to adapt classical statistical
methods to suit ‘the particular characteristics of data obtained in the social and
behavioral sciences’ as they ‘are often data that are non-numerical, with
measurements recorded on scales that have an uncertain unit of measurement’
(Meulman, Hubert, and Heiser, 1998, p. 489). I, however, use the concept of data
theory as a very broad label for the work that this book does in order to bring
social theory and data science closer to one another.

Data piñata

While most data scientists are hired by industry, they also exist within a number
of disciplines in academia where the focus is on computational methods applied to
unconventional or messy data. Rachel Schutt and Cathy O’Neil (2013, p. 15) suggest
that:

an academic data scientist is a scientist, trained in anything from social science
to biology, who works with large amounts of data, and must grapple with
computational problems posed by the structure, size, messiness, and the complexity
and nature of the data, while simultaneously solving a real-world problem.

Social scientists should ideally play an important role for data science as many
problems that data science works with – friending, connections, linking, sharing,
talking – are ‘social science-y problems’ (Schutt and O’Neil, 2013, p. 9). As put
by new media theorist Lev Manovich (2012, p. 461):

The emergence of social media in the middle of the 2000s created opportunities to
study social and cultural processes and dynamics in new ways. For the first time,
we can follow imaginations, opinions, ideas, and feelings of hundreds of millions
of people. We can see the images and the videos they create and comment on, monitor
the conversations they are engaged in, read their blog posts and tweets, navigate
their maps, listen to their track lists, and follow their trajectories in physical
space. And we don’t need to ask their permission to do this, since they themselves
encourage us to do so by making all of this data public.

But even if we sometimes may have actual, real-life, well-motivated questions to
pose to the data, data science notoriously runs the risk of becoming too data-
driven. Indeed, data science is sometimes referred to as ‘data-driven science’ as
its main aim actually is to extract knowledge from data. It is mostly not about
testing hypotheses or theories in the traditional scholarly way. Instead, the work
that is done with the data is driven by the data itself – in terms of the
possibilities for gathering it, and the available tools for probing it.

A related concept is data mining. As the word ‘mining’ hints, this approach is
about working to discover interesting patterns in large amounts of data, for
example from the internet and social media. This approach marks a break with the
established view of the research process – at least within the more objectivist
types of science – where a problem or research question is formulated beforehand.
This problem, formulated following a particular need for a certain type of
knowledge about a specific issue, then guides the researcher in sampling data,
devising the research methods, and choosing the theoretical perspectives – or even
in formulating strict hypotheses to verify or falsify. Such a process is by no
means axiomatic when it comes to data science, which makes no secret about often
being highly explorative, and going fishing with a very wide net. In many cases a
so-called data piñata approach is employed. As defined by the online resource Urban
Dictionary:

data piñata: Big Data method that consists of whacking data with a stick and
hopefully some insights will come out. [Example:] The Big Data Scientist made a
Twitter data piñata and found that Saturdays are the weekdays with the most tweets
linking to kitty pictures.

(Urban Dictionary, 2018)

Such strategies may be seen by some as unscientific, as they do not rely on actual
questions about real problems, but on patterns that one stumbles across more or
less randomly. Indeed, in the type of research that deals with solicited data,
deliberately collected for certain research purposes, a data piñata approach would be
odd. Why should we collect some random data, just to beat it with a stick to see
what pops out? And, what type of data should that be? What methods or informants
should be engaged, and how? In the case of register-based or database research, a
piñata strategy might be closer at hand. And this is most definitely true in the
case of the types of data that are enabled by people’s use of the internet and
social media.

Census and survey researcher Kingsley Purdam and his data scientist colleague Mark
Elliot aptly point out that today data is, to a lesser and lesser degree,
‘something we have’; rather, ‘the reality and scale of the data transformation is
that data is now something we are becoming immersed and embedded in’ (Purdam and
Elliot, 2015, p. 26). Their notion of a data environment underlines that people
today are at the same time generators of, but also generated by, this new
environment. ‘Instead of people being researched’, Purdam and Elliot (2015, p. 26)
write, ‘they are the research’. Their point is that new data types have emerged –
and are constantly emerging – that demand new flexible approaches. Doing digital
social research, therefore, often entails discovering and experimenting with
challenges and possibilities of ever-new types and combinations of information.
Among these are not only social media data, but also data traces that are left,
often unknowingly, through digital encounters. Manovich gives an explanation that
is so to the point that it is worth citing at length:

In the twentieth century, the study of the social and the cultural relied on two
types of data: ‘surface data’ about lots of people and ‘deep data’ about the few
individuals or small groups. The first approach was used in all disciplines that
adapted quantitative methods. The relevant fields include quantitative schools of
sociology, economics, political science, communication studies, and marketing
research. The second approach was used in humanities fields such as literary
studies, art history, film studies, and history. It was also used in qualitative
schools in psychology, sociology, anthropology, and ethnography. […] In between
these two methodologies of surface data and deep data were statistics and the
concept of sampling. By carefully choosing her sample, a researcher could expand
certain types of data about the few into the knowledge about the many. […] The rise
of social media, along with new computational tools that can process massive
amounts of data, makes possible a fundamentally new approach to the study of human
beings and society. We no longer have to choose between data size and data depth.

(Manovich, 2012, pp. 461–3)

Going back to 1978 and Glaser’s book on Theoretical Sensitivity, we can find some
useful pointers on how to see the research process – beyond ‘quantitative’ and
‘qualitative’. The first step, for Glaser (1978, p. 3), is ‘to enter the research
setting with as few predetermined ideas as possible’, to ‘remain open to what is
actually happening’. The goal is then to alternate between having an open mind –
working inductively, allowing an understanding of the research object to emerge
gradually – and testing the emerging ideas as one goes along – working deductively
trying to verify or falsify the developing interpretations. So, we can, quite
mindlessly, beat on the piñata for a little while to see what jumps out. Then try
to make sense of the things that emerged, and then beat some more to see what the
new stuff that is popping out adds to, or takes away from, our present analysis.

Using Glaser’s approach, then, means being truly data-driven. He argues that the
overarching question that must continually be posed in any research is: ‘What is
this data a study of?’ (Glaser, 1978, p. 57). Most of the time, research projects
start off with a clear idea of what to study. It would not make sense to be
completely oblivious as to the aims of one’s work. But still, Glaser argues,
constantly repeating and renewing the question of what the data is actually about,
allows for any other ideas or findings to either take place alongside the initially
intended ones, or even replace them completely. The point of the question is that
it ‘continually reminds the researcher that his original intents on what he thought
he was going to study just might not be; and in our experience it usually is not’
(Glaser, 1978, p. 57). The other important question for empirical research is:
‘What is actually happening in the data?’ The flexibility and inductiveness of the
approach wants to get at ‘what is actually going on’ in the area that is studied
(Glaser and Strauss, 1967, p. 239).

My point here is that being data-driven, as is often the case when working with big
data, is not (only) a new ill, caused by the datafication of society and the
fascination with huge datasets. Used in the right way, a data-driven approach – a
data piñata – can be truly useful in getting to know more about what goes on, what
social and cultural processes may be at work, in contexts and behaviours that are
still largely unknown to us. From that perspective, not really knowing what we are
looking for, and why, can be a means to tread new ground, veering off the well-
trodden paths, to get lost to find our way. If we don’t even know what is going on,
maybe beating that piñata with a stick isn’t such a bad idea? The new data science
opportunities and tools, in combination with social theory, have huge potential to
help decode the deeper meanings of society and sociality today.

Breaking things to move forward

Finding good solutions – rather than adhering to rules – should be the end goal of
any analytical strategy. This draws on Feyerabend’s idea that anarchism in science,
rather than ‘law-and-order science’, is what will help achieve progress. And, as
for the risk that such an approach will lead to an unproductive situation where
anything goes, we must simply trust in our own ability to think in structured ways
even without following rigid rules dogmatically:

There is no need to fear that the diminished concern for law and order in science
and society that characterizes an anarchism of this kind will lead to chaos. The
human nervous system is too well organized for that.

(Feyerabend, 1975, p. 13)

In order to think creatively and freely in relation to existing approaches, we must
allow ourselves not to think so much about which theoretical perspectives have been
conventionally agreed to be compatible with one another, or about whether it is
officially correct to mix certain methods together or not. In that sense, the
approach that I am proposing can be metaphorically understood as a form of hacking.
Because, in spite of its popular reputation to the contrary, hacking is not (only)
about breaking the law through forms of electronic vandalism. As argued by
cryptologist Jon Erickson (2008), hacking can in fact even be more about adhering
to rules than about breaking them. The goal of hacking is to come up with ways of
using, or exploiting, the structures and resources that are in operation in any
given situation in ways that may be overlooked or unintended. Hacking is about
applying existing tools in smart and innovative ways to solve problems. Erickson
writes that:

hacked solutions follow the rules of the system, but they use those rules in
counterintuitive ways. This gives hackers their edge, allowing them to solve
problems in ways unimaginable for those confined to conventional thinking and
methodologies.

(Erickson, 2008, p. 16)

Datafication presents us with a new data environment – with data traces, data
fragments, and unsolicited data – that offers the opportunity to think in new ways
about research in the ‘spirit of hacking’, aiming to surmount ‘conventional
boundaries and restrictions’ for the goal of ‘better understanding the world’
(Erickson, 2008, pp. 16–18). What I describe here as anarchistic, and as hacking,
may sound radical and dangerous – or maybe just plain stupid. But as a matter of
fact, this approach is not very far from how science, as conceived by Bruno Latour,
in general comes into being. Science and research happen in action. They are not
ready made. Interest should not be focused on any alleged intrinsic qualities of
approaches, but on the transformations that they undergo in their practical use.
Methods do not have any ‘special qualities’, as their effects come from the many
ways through which they are ‘gathered, combined, tied together, and sent back’
(Latour, 1987, p. 258). Thus, ‘we are never confronted with science, technology and
society, but with a gamut of weaker and stronger associations’ (Latour, 1987, p.
259). Knowledge about society is produced through more or less messy sets of
practical contingencies.

A patchwork of solutions

It is a common conviction in social research that one cannot do ‘qualitative’ and
‘quantitative’ in the same breath, as they are based on different epistemologies.
But this can in fact be debated. As argued by Bryman (1984), the difference may in
practice lie not so much in different philosophical views on how knowledge about
social reality is achieved, but simply in the path-dependent choices that are made
by individual researchers who get stuck with one paradigm or the other. While it
has become an eternal truth, reiterated by researchers and methods teachers alike,
that ‘the problem under investigation properly dictates the methods of
investigation’ (Trow, 1957, p. 33), very few adhere to this in practice. Bryman
explains that:

it is not so much a problem that determines the use of a particular technique but a
prior intellectual commitment to a philosophical position. The problem is then
presumably formulated within the context of these commitments. This suggestion also
makes some sense in terms of the individual biographies of many social researchers,
most of whom do seem to be wedded to a particular research technique or tradition.
Few researchers traverse the epistemological hiatus which opens up between the
research traditions.

(Bryman, 1984, p. 80)

Doing digital social research, due to the particular challenges it raises, has
‘prompted its researchers to confront, head-on, numerous questions that lurk less
visibly in traditional research contexts’ (Markham and Baym, 2009, pp. vii–viii).
One such issue is definitely the need to address the long-standing dispute in
social science between ‘qualitative’ and ‘quantitative’ methodological approaches,
which has persisted, apparently unresolvably, for more than a century – or since
ancient Greece, depending on who you ask. Among researchers, there are still traces
of a battle between case-oriented interpretative perspectives, on the one hand, and
variable-oriented approaches focused on testing hypotheses on the other. Scholars
who prefer case-oriented methods will argue that in-depth understandings of a
smaller set of observations are crucial for grasping the complexities of reality,
and those who prefer variable-oriented approaches will argue that only the highly
systematised analysis of larger numbers of cases will allow scholars to make
reliable statements about the ‘true’ order of things.

Today, however, there is an increasingly widespread consensus that the employment
of combinations of ‘qualitative’ and ‘quantitative’ methods is a valid and
recommended strategy, which allows researchers to benefit from their various
strengths, and balance their respective weaknesses. The ‘qualitative’ tradition is
seen as the more inductively oriented interpretative study of a small number of
observations, while the ‘quantitative’ tradition is characterised by the
deductively oriented statistical study of large numbers of cases. This has given
rise to the common notion that ‘qualitative’ research produces detailed accounts
through close readings of social processes, while ‘quantitative’ research renders
more limited, but controlled and generalisable, information about causal relations
and regularities of the social and cultural fabric.

As argued above, most researchers would agree in theory with methodological
pragmatism – letting the problem to be researched, and the type of knowledge
sought, decide which method should be used – but few actually do this. This is not
because researchers are liars, but because it is in fact hard to make it happen.
The general direction for the work in this book, in combining the data-drivenness
of interpretive (‘qualitative’) sociology with the data-drivenness of
(‘quantitative’) computational methods, most closely resembles what methodologists
Norman Denzin and Yvonna Lincoln (2005, pp. 4–6) have discussed in terms of
bricolage.

‘Bricolage’ is a French term, popularised by cultural anthropologist Claude
Lévi-Strauss (1966), which refers to the process of improvising and putting pre-existing
things together in new and adaptive ways. From that perspective, our research
method is not fully chosen beforehand, but rather emerges as a patchwork of
solutions, old or new, to problems faced while carrying out the research. As
critical pedagogy researcher Joe Kincheloe (2005, pp. 324–5) observes: ‘We actively
construct our research methods from the tools at hand rather than passively
receiving the “correct”, universally applicable methodologies’, and we ‘steer clear
of pre-existing guidelines and checklists developed outside the specific demands of
the inquiry at hand’. So, developing your method and methodology as a bricolage
means placing your specific research task at the centre of your considerations, and
allowing your particular combination and application of methods to take shape in
relation to the needs that characterise the given task. So this, then, is not about
letting the research problem guide a choice between already existing methods.
Rather, it is about re-inventing your methods in relation to each and every new
challenge.

For the purpose of this book’s ambition to establish an interface between
interpretive sociology and computational methods, the idea of bricolage refers to
the method of piecing these two together in the shape of an emergent construction
‘that changes and takes new forms as the bricoleur adds different tools, methods,
and techniques of representation and interpretation to the puzzle’ (Denzin and
Lincoln, 2005, p. 4). Method must not be dogmatic, but strategic and pragmatic. I
therefore argue in this book that computational techniques, results, and
visualisations can be used as elements in a new form of interpretive enterprise.

The interpretive interface

Computational social scientists have worked to bring disciplines such as sociology
into closer contact with data-intensive approaches. In those cases, the translating
interface between the two paradigms has commonly been that of statistical and
mathematical language. It has been the ‘quantitatively’ oriented social scientists
who have done the bridging. For example, Salganik (2018, p. 379) discusses how big data
can be useful in social research by helping produce faster estimates, and engaging
large numbers of research participants in crowd-coding efforts, especially if one
is using established statistical strategies to increase the validity of the more
messy kinds of online data. In this book, I instead advocate a more interpretive
and ‘qualitative’ interface between social science and data science.

Analysing sociality in the age of deep mediatisation may appear to be something
that should be done in more ‘quantitative’ terms, because of its scale and the
numerical character of much social media data. But there is actually even more
reason to approach such objects of study, as well as the new types of data they
enable and exude, from a more interpretive standpoint. Just because sociality in
the digital age happens in volume and numbers does not mean that its traces are
automatically akin to survey data or other forms of statistical inputs. It is
important to realise that the internet, and its networked social tools and
platforms, in many ways serve up a different research context than what has been
the familiar one to social science. The new context possesses an ‘essential
changeability’ that begs a conscious shift of focus and method (Jones, 1999, p.
xi). It is because of this that researching digital society demands that the
researcher be even more critical and reflective than is already demanded by
scholarship in general.

The data that we face do not equal ‘society’. As explained by Salganik (2018, p.
58), behaviour in big data systems is algorithmically confounded, as ‘it is driven
by the engineering goals of the systems’. This means that when we analyse different
forms of social interaction, social patterns, and activities in the datafied age,
we are analysing a new form of sociality, where automated actors, such as bots, as
well as the algorithmic logic of the systems, become part of the situation. On the
one hand, this is nothing new to sociology, as it has at its core a long-standing
interest in the interplay between structure and agency. It wants to study what
constitutes social action, and which enabling and limiting structures shape such
action. On the other hand, in relation to the digital setting, we are dealing with
new types of agency and new types of structures. Because of the multifaceted
character of sociality as mediated through digitally networked tools and platforms,
there is today more reason than ever to mix methods in social research so that the
discipline can continue to develop. Work towards this can, for example, draw on new
tools for data collection via web scrapers, APIs, or online repositories. It can
also include new devices and strategies for analysing data, in the form of
computerised language processing, the harnessing of geolocative hardware, new
visualisation techniques, and so on.
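
To make this slightly more concrete, a minimal sketch, in Python, of what such data
collection and first-pass language processing might look like follows below. The
endpoint URL and the response format are hypothetical placeholders rather than any
specific platform’s API, and the word counts it produces are exactly the kind of
intermediate output that, in the approach advocated here, still awaits interpretation.

# A minimal sketch of computational data collection and first-pass processing.
# The endpoint URL and response format are hypothetical placeholders, not any
# specific platform's API.
import requests
from collections import Counter

API_URL = "https://example.org/api/posts"  # hypothetical endpoint

def fetch_posts(query, limit=100):
    """Collect raw post texts matching a search term."""
    response = requests.get(API_URL, params={"q": query, "limit": limit})
    response.raise_for_status()
    return [item["text"] for item in response.json()]

def word_frequencies(texts):
    """A crude first pass of computerised language processing: count words."""
    words = []
    for text in texts:
        words.extend(token.lower().strip(".,!?:;\"'") for token in text.split())
    return Counter(words)

posts = fetch_posts("covfefe")
for word, count in word_frequencies(posts).most_common(20):
    print(word, count)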

Law (2004) has written about a need to move into an era After Method in social
research. His argument is that we must realise that there is no ‘general world’ to
be researched, and that there are no ‘general rules’ for how reality should be
analysed (Law, 2004, p. 164). It is not necessarily the case that failing to follow
conventional methodological rules that are imposed on science means that one will
end up with substandard or distorted knowledge. Underlining the inherent messiness
of social research practice, Law argues that we may have to ‘rethink our ideas
about clarity and rigour, and find ways of knowing the indistinct and the slippery
without trying to grasp and hold them tight’ (Law, 2004, p. 12). This is achieved
through being deliberately imprecise, by conventional standards, and conceiving of
social analysis in broader and more generous terms. As researchers, we must stop
desiring, and expecting, security. Method, Law argues, offers no guarantees in
reality, even though we have been taught in academic programmes to believe that it
does. In a way, Law’s approach is more about honesty than actual change, as ‘the
problem is not so much lack of variety in the practice of method, as the hegemonic
and dominatory pretensions of certain versions or accounts of method’ (Law, 2004,
p. 13). It is Law’s argument that we should think of methodology and analysis not
in terms of dogma and rules, but as assemblages, where each set of tools and
approaches – like ‘a radio receiver, a gong, an organ pipe, or a gravity wave
detector’ – will resonate with the analysed reality in its specific ways (Law,
2004, p. 126). The sum of it all is that we must dare to think more openly and less
dogmatically about method and analysis, and especially so in relation to the
messiness of the social.

Instruments of revelation

It is inspiring to try to rethink the explorative, largely theory-less,
data-drivenness of data science research as a form of ethnography in the sense that it
is overarchingly about achieving what Clifford Geertz (1973) described as ‘thick
descriptions’ of social life:

Ethnography is thick description. What the ethnographer is in fact faced with –
except when (as, of course, he must do) he is pursuing the more automatized
routines of data collection – is a multiplicity of complex conceptual structures,
many of them superimposed upon or knotted into one another, which are at once
strange, irregular, and inexplicit, and which he must contrive somehow first to
grasp and then to render.

(Geertz, 1973, pp. 9–10)

Coming full circle, one of Geertz’s influences was Max Weber’s Verstehen method, as
he famously stated:

Believing, with Max Weber, that man is an animal suspended in webs of significance
he himself has spun, I take culture to be those webs, and the analysis of it to be
therefore not an experimental science in search of law but an interpretive one in
search of meaning.

(Geertz, 1973, p. 5)

In this book, I construe the outcomes of computational analysis, not as being
replies to research questions in and by themselves, but as intermediary results to
be interpreted in turn. This view thus understands data-intensive techniques not as
ready-made methods that are simple means of achieving research goals, but even as
part of the actual object of research. The computational techniques – like a
village or street corner – are something for the participant ethnographer to enter
into. Richard Rogers (2013, p. 1) suggests that what he calls ‘digital methods’ are
about identifying and following ‘the methods of the medium’ that are already
embedded in digital society. Rogers’ argument is that the internet is already doing
research-like things by itself, such as collecting, computing, sorting, ranking,
and visualising data. The central idea in Rogers’ approach to the study of the
digital is not to intervene or interfere very much with these existing ‘methods’.
Our analyses may in fact be more accurate if we respect their integrity, follow
them with curiosity, and learn from them. Rogers (2013, p. 1) writes:

For example, crawling, scraping, crowd sourcing, and folksonomy, while of different
genus and species, are all web techniques for data collection and sorting. PageRank
and similar algorithms are means to order and rank. Tag clouds and other common
visualizations display relevance and resonance. How may we learn from and reapply
these and other online methods? The purpose is not so much to contribute to their
fine-tuning and build the better search engine, for that task is best left to
computer science and allied fields. Rather, the purpose is to think along with
them.

The role of the researcher, then, becomes to attempt to ‘follow the medium’ and its
methods as they evolve, and to find ways of exploiting and recombining them in
useful ways. In the context of this book, one can correspondingly think in terms of
following the methods of data science, seeing them as symptomatic of the datafied
society, rather than simply designed to analyse it from the outside. But, in line
with Rogers’ argument, these methods can still be harnessed, interpreted, and
repurposed for interpretive social analysis.
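
To illustrate what ‘thinking along with’ one of the medium’s methods might look like
in practice, the sketch below borrows PageRank – one of the ranking techniques Rogers
mentions – off the shelf and applies it to a small, invented mention network. The user
names and edges are made up for illustration; the point is that the resulting scores
are read not as a better search engine, but as an intermediate pattern for
interpretive analysis.

# A sketch of repurposing a method of the medium: borrowing PageRank to
# rank users in a small, invented who-mentions-whom network.
import networkx as nx

# Hypothetical mention pairs harvested from social media data.
mentions = [
    ("user_a", "user_b"), ("user_a", "user_c"), ("user_b", "user_c"),
    ("user_d", "user_c"), ("user_e", "user_c"), ("user_c", "user_a"),
]

graph = nx.DiGraph()
graph.add_edges_from(mentions)

# The scores are not findings in themselves, but an initial output to be
# interpreted in the light of social theory.
scores = nx.pagerank(graph, alpha=0.85)
for user, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(user, round(score, 3))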

The aim of thinking and working in this way is, Rogers (2013, p. 3) writes, ‘to
build upon the existing, dominant devices themselves, and with them perform a
cultural and societal diagnostics’. This means that the ‘initial outputs’ of the
research – a network graph, a topic model, a set of sentiment scores, a clustering
of users – can be ‘seen or rendered in new light’. The main challenge for digital
research, in that case, is to develop a mindset as well as a methodological outlook
for doing social and cultural research with, rather than about, digital society.
Thinking in this way about digital methods, framing them interpretively to pave the
way for a form of ‘distant reading’ (Moretti, 2013) of the data patterns of
society, offers ‘a new kind of microscope’ (Hargittai and Sandvig, 2015, p. 6).
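
By way of illustration, one such ‘initial output’ – a topic model – could be produced
roughly as follows. This is a minimal sketch using scikit-learn on a handful of
invented stand-in documents; the word lists it prints are precisely the kind of
intermediary result that, in this outlook, has to be read and interpreted rather than
reported as findings.

# A minimal sketch of producing one 'initial output': a topic model whose
# word lists are intermediary results awaiting interpretation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented stand-in documents; in practice, collected social media posts.
documents = [
    "memes and jokes about the election spread quickly",
    "protest hashtags mobilise activists across platforms",
    "ironic memes mock politicians during the campaign",
    "activists organise marches and share protest images",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term_matrix)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[::-1][:5]]
    print("topic", i, ":", ", ".join(top_terms))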

As pointed out by Hargittai and Sandvig (2015, p. 11), the innovations,
experimentations, and renegotiations that we see within the area of digital methods
are ideally examples of what historian of science Derek J. de Solla Price (1986, p.
246) has called instruments of revelation. When discussing the scientific
revolution historically, he argued that its dominant driving force had been ‘the
use of a series of instruments of revelation that expanded the explicandum of
science in many and almost fortuitous directions’. He also wrote of the importance
of ‘the social forces binding the amateurs together’. So, in the case of digital
social research, we are now at that stage: a point where researchers often act like
curiously experimenting enthusiasts – ‘amateurs’ – in testing and devising new
‘instruments of revelation’.

Decoding Social Forms

The empirical case examples in this book are not about ‘politics’ in the narrow
sense, where we take it to refer only to things such as parties, parliaments,
governments, and elections. The book indeed deals with that type of politics, but
not at all exclusively. Politics, in a wider sense, also includes things like social
movements, activism, and resistance. That understanding draws on Antonio Gramsci’s
(1971) notion of hegemony, which provides an account of how the dominant ways of
understanding reality are sustained, and how these hegemonic ways of thinking,
talking, and acting give even more power to the already powerful groups in society
by legitimising their interests. But the most important point of Gramsci’s theory
is that hegemony will only prevail as long as the dominant groups succeed in
disarming any competing definitions of reality. Civil society, then, is not only a
place for the diffusion of hegemony, but also a terrain where counter-hegemonic
ideas, resistance, can be formulated. Generally, when I use words like politics or
political, I take them to have a broad socio-cultural meaning. Political theorist
Chantal Mouffe (2000, p. 101) defines ‘the political’ as:

the dimension of antagonism that is inherent in human relations, antagonism that
can take many forms and emerge in different types of social relations.

In other words, politics is not only about politicians, nor only about activists.
It is also about all of the, often everyday and seemingly ‘unpolitical’, things
that individuals and groups think, say, or do, that relate, with varying degrees of
directness, to hegemonic and counter-hegemonic processes. The examples in this
book, then, are about social traces of things that people do either for political
reasons, or that have political consequences, or both.

The odd places of politics


As I have argued before (Lindgren, 2013), the complex political outcomes of user-
driven social media practice demand to be studied empirically, and with a case-by-
case contextual sensitivity. And it is indeed in studies like these that the data-
theory approach outlined in this book is vital. Theoretical interpretation is
contingent on the specific context in which an empirical pattern is identified. In
relation to the issue of hegemony versus counter-hegemony, I conceptualised the
object of such research as disruptive spaces (Lindgren, 2013, p. 143). These are
emergent online spaces that embody more or less conscious attempts at obstructing,
or providing an alternative to, prevailing discourses. I also argued,
and still do, that the actual effect and significance of such attempts must always
remain an empirical question. In the words of Phillips and Milner (2017, p. 14),
this is about posing questions such as ‘who is speaking, who is listening, and who
is refusing to engage?’ Is the analysed behaviour a matter of ‘punching down’ or
‘punching up’ in the process of challenging structural inequalities between races,
genders, classes?

There is also a need to map critically such processes on social media that we risk
dismissing as simply being weird, odd, or random. This is because they are in fact
traces of how politics function in today’s world. In line with this, Phillips and
Milner argue that:

ambivalent expression that seems like just a ghost story, just a gorilla meme, just
a political remix, is thus revealed to be so much bigger, so much messier, and so
much more intertwined with everything else.

(Phillips and Milner, 2017, p. 211)

I wrote about the importance of not neglecting the power of the ‘non-political’ in
my previous book Digital Media and Society (Lindgren, 2017). When discussing the
relationship between digital media and social change I argued that seemingly
mundane, odd, and quirky expressions of digital sociality – such as memes, selfies,
trolling, cute cats, and so on – may in fact have a much stronger transformatory
power than one may initially assume. Beyond more obvious examples of digitally
transformative types of social organisation and modes of operation such as, for
example, peer-production (Benkler, 2006), participatory culture (Jenkins, 2006),
and citizen journalism (Thorsen and Allan, 2014), we must also take the apparently
random social media practices seriously, as pieces of the puzzle. Perhaps, in fact,
such expressions seem insignificant and random for the very reason that they are
the most transformational. We simply have not mobilised concepts yet that allow us
to see them as anything other than oddities. The exploratory, experimental, and
open framework of data theory can aid in the process of developing a language and
understanding for such phenomena.

Writing about social change, sociologists Gene Shackman, Ya-Lin Liu, and George
Wang (2002) argue that ‘many coincidental, unique or random factors influence the
change process’. In a similar vein, social movement researchers Douglas McAdam and
William Sewell (2001, p. 102) have written about social change, that ‘very brief,
spatially concentrated, and relatively chaotic sequences can have durable,
spatially extended, and profoundly structural effects’. One might argue, then, that
there can exist a politics of the seemingly non-political. Political scientist
Jessica Beyer (2014) has conducted a series of case studies of online groups and
spaces – Anonymous, The Pirate Bay, World of Warcraft, and the IGN.com forum – from
a mobilisation perspective. In these studies, she found that digital tools,
platforms, and spaces that appear to be non-political are in fact crucial for
gaining an understanding of civic engagement.
Beyer argues that environments where no individual owns the content, where there is
a high degree of anonymity, and low levels of formal regulation, give rise to
creativity. Therefore, social interaction in online spaces that are ‘non-political’
is important for an understanding of how people conceive of themselves in relation
to political processes. This is because, no matter the aim or intended function of
the tools and platforms in question, political conversations and negotiations over
norms happen in unlikely places. In the online role-playing game World of Warcraft,
for example, Beyer (2014, p. 128) found that ‘politically significant interactions
permeated micro-interactions’.

There is a common normative assumption that for society to be healthy it should be
characterised by ‘high discourse’, where highly educated people who appear with
their real names interact with each other in a polite manner. Beyer argues,
however, that in places such as 4chan or Reddit, the wide range of content – some
of which can be deeply disturbing – and the wider range of conversation may create
better possibilities and more opportunities for political action as well. Beyer
(2014, p. 132) concludes that ‘there is value not only in the places online that
fit our expectations of civil society but also in the places online that make us
cringe’. In such spaces, Beyer says, there are much better opportunities to foster
activism than on privately owned sites. And looking beyond the narrow scope of
Beyer’s empirical case of activism, unexpected and seemingly random spaces are also
the ones likely to be disruptive (Lindgren, 2013), in a broader sense, and to
contribute more generally to social change.

Social media politics for better or for worse

It was not long ago that the common understanding of what a social media revolution
would entail was based on cyberutopian ideas, according to which progressive and
egalitarian grassroots movements would be able to leverage networked technologies
to make the world a better place. In this view, user-driven and leaderless digital
contention was something good. One could trust, in spite of some odd or degenerate
instances of misogyny and ‘cyberhate’, that, in most cases, ‘good’ people would
harness the political power of social media to do mostly ‘good’ things.

The world had been strengthened in this belief by a series of compelling examples
throughout the last two decades. The United States elected its first black
president in 2008, following a social media fuelled campaign that famously
generated large-scale popular engagement and a record turnout. There were the
largely digitally organised and communicated mass protests in Iran in 2009 at
election results that were allegedly rigged; the Arab Spring and the Indignados
Movement in 2011; the Umbrella Revolution in Hong Kong in 2014. Overall, social
media mobilisations have been strongly connected, especially in the way that things
have been popularly and academically highlighted, to progressive and emancipatory
politics in fields of race (as in #BlackLivesMatter), class (as in #Occupy), and
gender (as more recently in #MeToo).

This was a bit too good to be true, wasn’t it? Surely, critics warned, already at
the time, that one must remember, and understand, how communication technologies,
as such, come with no built-in specific uses, aims, or consequences. The opposite
view – the general assumption or prediction that the use of digitally networked
communications is bound to generate particular forms of (often progressive) social,
cultural, and political change – is what we usually refer to as technological
determinism. Visions of how technologies, such as social media, will by necessity
be bearers of counter-hegemonic emancipation are, said Evgeny Morozov in his 2011
book The Net Delusion, an expression of ‘ethical amnesia’:

Throughout history, new technologies have almost always empowered and disempowered
particular political and social groups, sometimes simultaneously – a fact that is
easy to forget under the sway of technological determinism. Needless to say, such
ethical amnesia is rarely in the interests of the disempowered.

(Morozov, 2011, p. 291)

Anyone who stops for a moment to try to break out of that state of memory loss
realises, alongside Morozov, that social media can both empower and disempower, and
that by empowering some they will unavoidably disempower others. The political
effects of social media are ambivalent. So, while much focus throughout the last
couple of decades has been on the potential of social media mobilisation to counter
racism and sexism, and to give voice to minority groups, the rise of the alt-right
during the Trump campaign was in fact a true social media triumph, not least
because of its successful use of memes as a political tool. This underlines how the
affordances of social media, in the shape of anonymity, decentralised
communication, and so on, can in fact be used to promote any ideology, including
that of the far right.

This is why social media politics can appear to be deceptive. The protocols –
cultural and technological – upon which it rests involve a complex set of forces,
some reactionary, some progressive. As a solution to the seemingly unbridgeable
clash between ‘cyberoptimists’ and ‘cyberpessimists’ (Lindgren, 2017, p. 45), it
has been suggested that we must adopt a cyberrealist perspective (Kahn and Kellner,
2008; Lovink, 2002; Morozov, 2011, pp. 315–20). This means acknowledging that
digital tools and platforms may be used for emancipatory ends, but also that
oppressive forces may just as well appropriate them. As critical social media
scholar Christian Fuchs (2008) argues, there is a constant struggle in digital
culture between cooperation and competition that produces tensions between networks
of domination and networks of liberation.

It is tempting to see this as an either-or situation, where one of the sides comes
out winning in each given situation. From that perspective, the progressive and
emancipatory potentials of social media as tools for social change are sometimes
realised. In other cases, they are not, as reactionary, conservative, or oppressive
forces can instead channel the power of social media to their advantage. I, myself,
made this argument in my 2013 book New Noise when saying things like:

In digital culture, an increasing number of online socio-cultural spaces are
celebrated for being disruptive. Groupings and discourse stemming from these spaces
may have the power to circumvent dominant flows of communication, to subvert
preferred meanings, and to challenge power structures. An important task for
internet research is to evaluate the conditions under which this power is realized
or not.

(Lindgren, 2013, p. 143)

This is still true in the sense that some political uses of social media empower
the powerless, while others empower those in power to render the powerless even
weaker and more marginalised or oppressed. But what is also true, and becoming more
and more apparent in light of developments in the last few years, is that in
between those two extremes – because that is what they are – is an unruly and
diverse field of social practice where motives, strategies, and effects are
strikingly ambiguous. This field of social practice, encompassing all of those
political social media expressions and behaviours that carry this kind of
doubleness or ambiguity, is of interest for this book. I take an interest in such
user-driven, networked, and ambiguous social media practice, and how it can be
analysed through a methodological sensibility for theory and data in conjunction.

Virality and memes

When talking about social media politics, it is also important to note that, while
all media have a social aspect to them, we are concerned here with the narrower
concept of social media: those digitally networked platforms for communication and
interaction that now dominate people’s social uses of the internet: Facebook,
Twitter, YouTube, Instagram, Snapchat, alongside a hybrid ecosystem of blogs and
forums. One of the historically new things with such platforms is how their
networked character and their technological affordances for crafting and sharing
messages allow for, and promote, viral communication and memes.

Virality is a somewhat over-used word for the process whereby content – images,
ideas, slogans, jokes, attitudes, videos – runs rampant through the wicked,
turbulent, ambivalent networks of social media. Sampson (2012, p. 20) draws a
parallel between virality and late-nineteenth-century sociologist Gabriel Tarde’s
writings about ‘imitative repetitions’ in, for example, ‘education, language, legal
codes, crime, fashion, governance, and economic regimes’. This is in line with
Tarde’s nearly chemical perspective on how society comes into being through the
agglomeration of myriad micro-interactions among individuals. With its key concepts
of innovation, imitation, adaptation, and opposition, Tarde’s theory on the laws of
imitation provides a sociological antecedent to what we today call viral
phenomena. He wrote, more than a century ago, about how:

the incessant struggle between minor linguistic inventions which always ends in the
imitation of one of them, and in the abortion of the others, finally comes to
transform a language in such a way as to adapt it, more or less rapidly and
completely, according to the spirit of the community, to external realities and to
the social purposes of language.

(Tarde, 1903, p. 210)

Alongside the democratised access to the tools of communication, the potential for
virality – the possibility that ideas and behaviours spread rapidly, widely, and
exponentially through networks of people – is the most crucial characteristic of
social media politics. A vital notion here is that of the meme, coined by
evolutionary biologist Richard Dawkins in his 1976 book The Selfish Gene. A
shortened form of the Greek word mimeme (‘imitated thing’), it refers to a unit of
cultural transmission:

Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of making
pots or of building arches. Just as genes propagate themselves in the gene pool by
leaping from body to body via sperms or eggs, so memes propagate themselves in the
meme pool by leaping from brain to brain via a process which, in the broad sense,
can be called imitation.
(Dawkins, 1976, p. 193)

In relation to the internet and social media, the word meme has become a popular
name for the viral spread of jokes, rumours, videos, images, hashtags, and so on.
Shifman has suggested the following definition of an internet meme:

(a) a group of digital items sharing common characteristics of content, form,
and/or stance, which (b) were created with awareness of each other, and (c) were
circulated, imitated, and/or transformed via the Internet by many users.

(Shifman, 2014, p. 41)

So, in terms of its empirical case matter, this book is about stuff that people do
on social media that has political consequences, with a special focus on the
turbulence and disorganisation generated through the viral and memetic logic which
is prevalent on such platforms. Social media, and especially their uses, are indeed
messy. People do ambiguous, odd, random, and plain weird things on social media,
and at the same time there seems to be some sort of connection between the most
silly and playful stuff on the one hand, and the most political and consequential
on the other. This is why it is urgent that social theory is revived, repurposed,
deconstructed, reconstructed and put in dialogue with the datafied context of the
twenty-first century.

The discussion above aligns with John Law’s argument that ‘the world is on the move
and social science more or less reluctantly follows’. He argues that we must think
in new and transcending ways about social analysis as the world increasingly cannot
be fully understood as a comprehensive set of determinate processes. Social
research must ‘work differently if it is to understand a networked or fluid world’
(Law, 2004, p. 12). He goes on to write:

This is the crucial point: what is important in the world including its structures
is not simply technically complex. That is, events and processes are not simply
complex in the sense that they are technically difficult to grasp (though this is
certainly often the case). Rather, they are also complex because they necessarily
exceed our capacity to know them.

(Law, 2004, p. 15)

Analysing the complexity of ambiguous political behaviours and practices on social
media, much like analysing social life in general, means facing the challenge of
finding patterns and meanings in a messy multitude.

The Weber connection: Ambivalence and trolling as ideal types

Turning to one of the classic sociologists, we can see how Max Weber argued,
already at the turn of the twentieth century, that society is very difficult to
analyse in all of its complexity. He suggested that the researcher must create a
conceptual tool, called an ‘ideal type’, ‘which has the merit of clear
understandability and lack of ambiguity’ (Weber, [1921] 1978, p. 6). He meant that,
in order to give a precise meaning to terms, one must formulate theoretical
concepts that have the highest possible fit with reality, even though ‘it is
probably seldom if ever that a real phenomenon can be found which corresponds
exactly’ to the ideal type (Weber, [1921] 1978, p. 20). One way of approaching the
complexity of social media politics described above is by ideal types such as, for
example, ‘ambivalence’ or ‘trolling’. In such a scenario one would decide that any
such concept is the closest description of what is going on.

Let us return to Morozov’s important argument: The political effects of social
media are indeed ambivalent. They can empower and disempower simultaneously.
Literary scholar Whitney Phillips and communications scholar Ryan Milner bring this
exact point into sharper focus in their 2017 book The Ambivalent Internet. They
argue that the fact that contemporary social media are overrun with ambivalent
expressions, and ‘the prevalence of silliness, satire, and mischief in online
spaces’ (Phillips and Milner, 2017, p. 9), has actual political consequences. The
constellations of participants, and the range of perspectives, are vast and highly
complex in social media, but the use of common platforms alongside the large-scale
networked connections between people can still generate an ‘unintended collective
purpose’ (Phillips and Milner, 2017, p. 3).

Phillips, who has also authored the book This Is Why We Can’t Have Nice Things
(2015) on internet trolling, and Milner, author of The World Made Meme (2016),
point out that one commonly finds expressions and practices online that prompt the
question ‘is this a joke or are these people serious?’ (Phillips and Milner, 2017,
p. 6). Their examples range from political and commercial hashtag campaigns that
were hijacked and flooded with mockery, to distasteful fan cultures around spree
shooters, and illustrate the tendency in parts of internet culture towards
nihilism, a lack of morality, and the practice of posting extreme things as an end
in itself.

But just like the tendency, discussed above, towards wanting unambiguously to sort
political social media activities into either the counter-hegemonic or the
reactionary category, simply naming something as trolling, or ambivalence, is also
imprecise and unhelpful. If we want to unpack the complexity that lies underneath,
we can’t rely on such catch-all labels that lump together ‘online behaviours with
even the slightest whiff of mischief, oddity, or antagonism’ (Phillips and Milner,
2017, p. 7). This is also the flipside to Weber’s ideal types – they lead to an
illusion of clarity at the cost of a loss of nuance. Just like other simplifying
characterisations, such as ‘online hate’, labels such as ‘trolling’ and
‘ambivalence’, beyond being reductive, can also contribute to understating,
neglecting, or misreading the negative effects of some harmful online behaviours.

Still, it is tempting merely to chalk things up to internet culture’s being weird.
Sociologist Manuel Castells (2001) wrote of the internet as having its own specific
culture, which was thriving with free and open creativity, while being embedded in
virtual networks with the potential to reinvent society. Phillips and Milner (2017,
p. 8) write that the internet is sometimes ‘foregrounded as a discursive space with
its own absurd logics and twisted norms’ where the weird outnumbers the ‘normal’.
Indeed, parts of today’s internet culture rely on an ethos of
reactive irony, cynicism, and nihilism. But, in line with the arguments above,
seeing the internet as just being ‘weird’ is also ultimately unhelpful, as its
weirdness is not only relative and subjective, but also quite complicated. Surely,
contemporary internet publics are often expressing a kind of shrugging
indifference, ¯\_(ツ)_/¯, carelessness, resignation, or nihilism. But the bottom
line is that neither the ideal type of trolling, nor that of the weird internet,
adequately captures the nuance of social practices in the wide and deep zone
between what is obviously good or clearly bad from any given point of view.
Refreshingly, however, Phillips and Milner want to do something more than just
dismiss the ambiguity as trolling, weirdness, and definite messiness. Following
them, this book sees the ideal type of ‘ambivalence’ not as the answer to the
question of what we are dealing with, but instead as an analytic starting point.
The research question at hand then becomes: What, in turn, does the ambivalence
consist of? What are its component parts, and what clues do they give about how
sociality and politics work in this context? Instead of getting lost in the
ambivalence as such, we should focus on analysing – as it was put in the citation
earlier – its ‘unintended collective purpose’ (Phillips and Milner, 2017, p. 3).
Maybe if the ambiguity is unpacked through careful empirical and theoretical
analysis, it does not appear ambiguous at all anymore.

The Durkheim connection: Society > the sum of its parts

Turning to another sociological classic, there is a connection here to Émile
Durkheim’s The Rules of Sociological Method (1895). In this defining text for the
formation of the sociological discipline, he made the critical point that society
and sociality are something more than the sum of the individual parts of which they
are composed. The social component of life is an emergent property, meaning that it
only exists when people interact with each other in one way or another. Following
Durkheim then, the sociological object of study is not the individuals or their
actions in isolation, but the extra dimension which is added when people come
together (Figure 2.1).

Figure 2.1 Sociological object of analysis.

Indeed, neither memes nor other forms of viral communication would be possible
without the social component. These phenomena could not have been brought about by
individuals in isolation. Instead, they emerge out of the social chemistry between
individuals, groups, platforms, and discourses – something akin to what Latour
(2005) and others call the actor-network (see Chapter 4). In line with this,
Durkheim (1895, pp. 52–3) wrote that ‘in a public gathering the great waves of
enthusiasm, indignation and pity that are produced have their seat in no one
individual consciousness’. In the same way, memes and viral phenomena cannot be set
in motion by individual citizens, journalists, or politicians. They are emergent
phenomena blazing across the social media platforms. That is why media are simply
called media, and social media are called social media. So, while an individual
tweet is just an individual tweet, the phenomenon of a meme or other online
campaign is what Durkheim would call a superindividual social fact. Such facts
‘assume a shape, a tangible form peculiar to them and constitute a reality sui
generis vastly distinct from the individual facts which manifest that reality’
(Durkheim, 1895, p. 54). He explains further that:

An outburst of collective emotion in a gathering does not merely express the sum
total of what individual feelings share in common, but is something of a very
different order, as we have demonstrated. It is a product of shared existence, of
actions and reactions called into play between the consciousnesses of individuals.
If it is echoed in each one of them it is precisely by virtue of the special energy
derived from its collective origins. If all hearts beat in unison, this is not a
consequence of a spontaneous, pre-established harmony; it is because one and the
same force is propelling them in the same direction. Each one is borne along by the
rest.

(Durkheim, 1895, p. 54)

As shown in Figure 2.1, society ‘does not equal the sum of its parts’; rather, ‘it
is something different whose properties differ from those displayed by the parts
from which it is formed’ (Durkheim, 1895, p. 128). The object of study of
sociology, and for the type of approach that I advocate in this book, is that
surplus of sociality which makes society into more than the sum of the parts, what
is left of society when all of its individuality is subtracted. As Durkheim calls
it: the collective. Internet researcher and data scientist Sandra González-Bailón
(2017, p. 29) explains that:

the patterns we can see at the societal level (the whole that we can only access
from a bird’s-eye view) do not necessarily disclose much information about the
mechanisms that brought them into existence.

It is these mechanisms, rather than individual peculiarities, that sociology is
interested in. Durkheim explained that the ‘mentality of groups is not that of
individuals: it has its own laws’, and ‘collective ways of acting and thinking
possess a reality existing outside individuals’ (Durkheim, 1895, pp. 40 and 45).
This is why the emergent ambiguity, or ambivalence, of social media politics must
be sociologically unpacked.

The point here is that many political social media behaviours actually can be
‘earnest’ and ‘ironic’ at the same time. The scale according to which such
ambiguous social media practice should be measured is not aligned in that way. It
does not have maximum earnestness, and minimum irony, at one end, and vice versa at
the other.
To assess these practices, one must instead start from the very ambivalence and
look further, dig deeper. Similarly, Phillips and Milner (2017, p. 126) point out
that most, if not all, examples of ambiguous social media practice in their book
‘are funny, or might be considered funny by someone’. Most of them are also
‘offensive, or could be considered offensive by someone’. So, in the end, it is all
about the possibility that both could be true, and at the same time. This is
exactly what ambivalence – which in Latin means roughly ‘strong on both sides’ –
stands for:

Simultaneously antagonistic and social, creative and disruptive, humorous and
barbed, the satirization of products, antagonization of celebrities, and creation
of questionable fan art, along with countless other examples that permeate
contemporary online participation, are too unwieldy, too variable across specific
cases, to be essentialized as this as opposed to that.

(Phillips and Milner, 2017, p. 10)

And, even though ambiguous behaviours are indeed possible in non-digital contexts,
the social media ecology makes it possible for them to be quickly amplified to, and
through, a massive number of people, according to the logic of memes and viral
communication.
The main reason for studying ambiguous social media behaviour in ways that go
beyond simplistic explanations – trolling, weirdness, hate – that black-box the
actual processes that are going on, is that such practices are not singular in a
way that makes their meaning easy to pin down. Instead, they house ‘a full spectrum
of purposes – all depending on who is participating, who is observing, and what set
of assumptions each person brings to a given interaction’ (Phillips and Milner,
2017, p. 10). It is partly about how people can have different intentions that we
need to know more about, but it is also about the unintended or aggregated outcomes
of the practices. People who engage in the creation and circulation of memes may do
so in a spirit of mere playfulness and silliness, but that does not necessarily
mean that their contribution will not serve political ends.

The Simmel connection: Social forms

Thus far in this chapter, I have discussed the case of social media politics and
how it is marked by a striking ambiguity or weirdness. I have also argued that
while we can approximate an understanding of such ambiguous social media practices
by naming them with concepts – ideal types, in Weber’s terms – such as
‘ambivalence’ and ‘trolling’, it is the task of the sociologist to devise
methodological and theoretical strategies to further unpack, contextualise, and
analyse such ideal types. This is in line with Durkheim’s argument that it is the
proper field of sociology to analyse collective and social patterns that are
external to the individual, and the ambiguous social media practices that come into
expression in the age of memes and viral communication are an example of such
collective behaviours.

Even though many of the classic sociological theories were developed long before
the digitalisation of society, they are – maybe because of their proven and generic
character – highly useful for understanding politics, sociality, and interaction
also in the twenty-first century. Similarly, many modern theories that were not
explicitly developed for analysing the internet and social media can still be
applied, for the same reasons, to this domain. The fundamental idea here is Georg
Simmel’s theory of social forms, which he developed in his 1895 essay ‘The Problem
of Sociology’. In this text, Simmel explained what he thought sociology
should be concerned with. First of all, he has a view on ‘society’ that fits well
with today’s age of globally networked interaction and communication, where the
role of geographical place and nation states often becomes backgrounded. Just as is
the case on the internet and in social media today, Simmel (1895, p. 23) said that
‘society, in its broadest sense, is found wherever several individuals enter into
reciprocal relations’. There are no overarching and fixed, universal structures
into which individuals enter. Rather, their interactions as such originate and
reproduce what we perceive to be such structures.

The interaction between people gives rise to what Simmel called ‘social forms’,
which are a set of basic and established social patterns or categories such as
competition, conflict, cooperation, attack, defence, play, gain, aid, instruction,
division of labour, representation, solidarity, and so on. The point is that such
forms, which are of a finite number, will appear in a variety of different social
settings, with different types of content, across time and space. For example, the
social forms of dominance and subordination may occur in settings or contents such
as a family, a workplace, and a football match. And however diverse these contents
may be, Simmel (1950, p. 22) argues, ‘the forms in which the interests are realized
may yet be identical’. It is the task of the sociologist to try to abstract away
the specific setting to be able to describe the general social forms that are in
operation. Sociological analysis isolates the forms ‘from the heterogeneity of its
contents and purposes, which, in themselves, are not societal’. Simmel wrote about
this type of formal analysis that:

It thus proceeds like grammar, which isolates the pure forms of language from their
contents through which these forms, nevertheless, come to life. In a comparable
manner, social groups which are the most diverse imaginable in purpose and general
significance, may nevertheless show identical forms of behavior toward one another
on the part of their individual members.

(Simmel, 1950, p. 22)

This is why social theories, especially if they are good ones, work across settings
and become useful for interpretations that transcend time and space. The theories
are conceptualisations and explanations of social life that help search for
underlying uniformities of social behaviour, rather than focus on the uniqueness of
any social phenomenon. In this book, when approaching the complex political social
media practices in order to unpack them, to provide descriptions and explanations
that go beyond superficial labels, social theory is a crucial tool in helping to
abstract away social media hype and contextual peculiarities in order to demystify
these practices and possibly lay bare their underlying structural relationships.

Social cryptography

Analysing, as I have discussed above, political practices on social media with
social theory also demands access to data about those practices. Returning once
again to Durkheim, he argued that sociology must deal not only with concepts but
also with things – that is, not only with theory, but also with data. And, drawing
a parallel once again to Weber: both with Verstehen and Evidenz. Basically: in
order to know something about social behaviours, we must study them empirically.
Durkheim (1895, p. 69) wrote that: ‘To treat phenomena as things is to treat them
as data, and this constitutes the starting point for science.’ In order to develop
theories about social life, we must observe and analyse it. The theories cannot ‘be
attained directly, but only through the real phenomena that express them’
(Durkheim, 1895, pp. 66–70). He continued by explaining that:

We do not know a priori what ideas give rise to the various currents into which
social life divides, nor whether they exist. It is only after we have traced the
currents back to their source that we will know from where they spring.

(Durkheim, 1895, p. 70)

In other words, the sociologist must decode society in order to understand it. As
explained by González-Bailón (2017, p. 6), social life ‘has a structure that is
often hidden, buried in noise’. She argues that the right analytical tools can
‘hand us a shovel to uncover the patterns that give meaning to the social world’.
The social researcher can use social theory as the key for this act of social
cryptography, while at the same time the key itself can also be transformed by the
findings made through the analysis of the data. The obvious connection between
sociology and computational methods may be through sociology’s ‘quantitative’
specialisations, but there is much to gain from bringing data science methods into
contact with the, maybe less expected, ‘qualitative’ framework. The data-drivenness
of data science can be construed as a form of digital fieldwork, rather than in
terms of positivistic hypothesis testing. This sits better with Max Weber’s notion
of sociology as an enterprise to interpret and understand the inherent ambivalence
of social actions.

Unintended Consequences

In the age of memes and viral communication, at six minutes past midnight on
31 May 2017, US President Donald Trump posted a tweet that read:

Despite the constant negative press covfefe.

The fact that he posted this with no context, and simply left readers with just
that word – ‘covfefe’ – hanging there, quickly sparked widespread confusion. Indeed,
people do sometimes stumble with their thumbs over the tiny keyboards on their
mobile devices (Trump’s tweet was sent from the Twitter app on an iPhone), but the
puzzlement over covfefe was allowed substantial time to grow, diversify, and branch
out as the tweet was not deleted until shortly before 6 am the following morning.

The covfefe tweet was soon to be liked and retweeted hundreds of thousands of
times, which made it one of the most popular tweets of 2017. Responses included
bafflement (what is this?), concern (is the President well?), criticism (this
proves that a president should not tweet!), alongside humour and bemusement (now,
I’d like me a nice cup of covfefe!)

In this chapter, we approach covfefe from the perspective of the sociology of unanticipated consequences to disentangle its surrounding twisted web of tweets,
talk, and discourse. This is a case study, presented before we delve deeper into
data science territory in the chapters that follow, to illustrate how social theory
can aid the disentanglement of ambivalent social practice. In this particular case,
we will take help from sociologist Robert K. Merton’s perspective on the sometimes
unpredictable, and possibly ambivalent, relationships between what people do, or
intend, and the outcomes of those actions.

The thumb-typing leader of the free world

The wider context of the tweet was Trump’s position as probably the first full-
blown reality TV and social media president. Following historical examples such as
Franklin D. Roosevelt’s successful use of radio, John F. Kennedy and television, as
well as Barack Obama’s victorious appropriation of the internet, the populism of
Trump – alongside his loose-cannon character and doubtful public image – truly went
wild on Twitter and Facebook both before and after his election. Smart – algorithmically enhanced – propagandising, digital agitation, and just plain troll-style social media chaos was definitely a key part of Trump's successful presidential campaign in 2016. Consequently, social media have generally been credited with bringing about both the rise of the alt-right into the mainstream and the Trump presidency.

The official Donald Trump Twitter handle, @realDonaldTrump, was created in March 2009. During the first two years of the account's existence, it was largely staffers who wrote the tweets, which were then posted after Trump's approval. During this period, tweets written by Trump himself had the phrase 'from Donald Trump' tagged on to them. Around mid-2011, however, his own personal use of the platform increased, and this tagging practice was gradually terminated. There has been much speculation, and even some research, concerned with when @realDonaldTrump really is Donald Trump. First, there was the widely known secret that the metadata of tweets – whether a tweet had been sent from an iPhone or an Android device – could provide some clues.

A general pattern was that more ‘boring’ tweets – event announcements, polls, press
releases – originated from iPhone, while the more emotionally charged stuff, often
posted late at night in the form of heated comments on politicians and celebrities,
came from an Android device. With the knowledge that Trump personally used Android
at this point in time, data-minded journalists and researchers could easily make
the distinction between true and ghost-authored Trump tweets. As the Android tweets
went quiet when Trump had reportedly switched to an iPhone, machine learning tools
such as @TrumpOrNotBot started to appear. Applications like that one were
developed, using data science techniques such as natural language processing and
machine learning, to estimate the likelihood that a tweet was written by Trump himself (McGill, 2017).
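
While the exact models behind applications such as @TrumpOrNotBot are not documented here, the general approach – training a text classifier on tweets of known provenance and then scoring new tweets – can be sketched roughly as follows. This is a minimal, hypothetical Python illustration using scikit-learn; the training texts, labels, and feature choices are placeholder assumptions, not the actual application.

# Hypothetical sketch: estimating the probability that a tweet was written
# by Trump himself, given a labelled training set (e.g. Android-era tweets
# as 'self-authored', staff-written tweets as 'ghost-authored').
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["placeholder self-authored tweet text ...",
               "placeholder staff-authored tweet text ..."]
train_labels = [1, 0]  # 1 = written by Trump himself, 0 = staff

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

# Probability that a new tweet belongs to class 1 ('self-authored').
print(model.predict_proba(["Despite the constant negative press covfefe."])[0][1])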

Even without such data processing, however, there are ways to tell. Trump has been
said not to know how to attach media to tweets, or even how to use hashtags. It has
also been reported that the President’s own tweets often include spelling errors,
and furthermore that another way to tell is if ‘the tweetstorm consists of a series
of disconnected nonsense’ (Feinberg, 2017).

Even though Trump claimed before his inauguration that he was going to lessen his
social media use after becoming President, this turned out not to be the case. By
March 2018, he had a Twitter audience of close to fifty million people, enabling
him to bypass and sideline mainstream media. This dramatic shift in the way the US
President communicates with the public has created debate about how to view such
personal, often affective, and direct messages from the proverbial leader of the
free world, and not least about the consequences for democracy. In June of
2017, about a week after the covfefe tweet, then White House press secretary Sean
Spicer made it clear that Trump’s tweets should indeed be seen as official
statements by the President of the United States, as he felt that the President,
with 110 million followers across media platforms, ‘is the most effective messenger
on his agenda’ (Landers, 2017).

So, back to the wee hours of 31 May 2017 and covfefe, the garbled midnight posting
that baffled an audience of millions. Surely, one could assume that it was just a
typo or other mishap, disregard it, and move on. But the playfully disruptive,
meme-hungry and trolling-prone networked publics of 2017 wouldn’t have it. Instead,
covfefe set off – on a global scale – just the type of multifaceted and ambiguous
viral response that was discussed in the previous chapter. As we shall see, the
reaction did not simply range from political criticism to lulzy fun-poking. It was both, and a whole spectrum of much more, at the same time: it was prime ambivalent internet. Postmodernist philosopher Jean-François Lyotard (1984) argued that in 'computerized societies', technologies (such as, possibly, today's social media) contribute to a breaking up of narratives that generates a crisis for absolute meaning and truth. From such a perspective, the ambivalent social processes set in motion by memes and viral communication are simply par for the course in a
postmodern world.

But this book is not about accepting such black-boxed ambivalence as it is.
Instead, it is about empirically and theoretically unpacking such black boxes in
search of more nuanced knowledge about what is going on. In this particular
chapter, I approach covfefe from the perspective of the sociology of unanticipated
consequences. I direct my interest towards the ambivalent relationship between what
was done, or intended, and the outcomes.

Unpacking ambivalence

In a 1976 essay on 'Sociological Ambivalence', sociologists Robert K. Merton and Elinor Barber wrote about the permanence of the phenomenon of ambivalence in human attitudes and behaviour. Indeed, long before social media memes and oddball troll humour, and also today but far removed from internet culture, people are 'pulled in psychologically opposed directions' in a wide range of contexts (Merton, 1976, p. 3).

Early psychoanalysts such as Eugen Bleuler (1911), who coined the concept, and
Sigmund Freud used the term ambivalence to name the alternating hostility (hate)
and devotion (love) that one and the same person may feel in relation to one and
the same thing. While they were primarily interested in combinations of opposing
psychic forces, Freud also wrote in 1910 about opposite, or antithetical, meanings
of words (Freud, 1997, pp. 94–100). He cited the philologist Carl Abel, who showed as early as 1884 that the ancient Egyptian language contained a fair number of words with two meanings that were the exact opposite of each other – as if 'dark' also meant 'light', and 'weak' also meant 'strong'. This, it would appear, could mark the absolute dissolution of meaning. But, as Abel wrote in his original paper, 'Egypt was anything but a home of nonsense.' Rather, Freud
concluded, knowledge is always relative and things can have dual meanings – an idea
that he brought with him into his work in interpreting dreams.

Merton and Barber felt that most early writings on ambivalence were of a similar,
psychological, character. Now, they wanted to put social relations at the centre of
a discussion of the concept. A sociology of ambivalence, they argued, would focus
on how ambivalence comes to be a built-in part of the structure of society and its
roles. They stressed the importance of examining ‘the processes in the social
structure that affect the probability of ambivalence turning up in particular kinds
of role-relations’, as well as ‘the social consequences of ambivalence for the
workings of social structures’. So, in light of this, we can approach covfefe, as
it was constructed and appropriated by networked publics, not simply as play and
noise, but as an empirical case of twenty-first-century sociological ambivalence in
action.

Following the general objective of this book, the ambiguity of social media
practice can thus be unpacked drawing on sociological theories in order better to
identify the social origin and effects of such practice. Rather than writing off
the case of covfefe as simply ambivalent, we can bring its component parts into
view by construing the ambivalent with the help of a sociology that takes into
account the unexpected, the dysfunctional, and the serendipitous.

Sociology may have been mostly interested in purposive social action and its
intended consequences, such as when a social actor has a particular motivation.
Weber ([1921] 1978, pp. 24–6), for example, talked about how instrumentality,
values, traditions, and emotions could be drivers, as social actors employ certain
means to try to achieve their goals. Much social research has focused on why people
do things, and whether their goals are reached. As put by sociologist Alejandro
Portes (2000, p. 2):

From our first days in the discipline, we social scientists have been trained in
the formulation of hypotheses about aspects of social reality. Scientific
hypotheses explicitly assume the lawfulness of the real world that makes a number
of regularities predictable and observable. More implicitly, the logic of
hypothesis formulation commonly assumes that consequences follow more or less
linearly or rationally from certain antecedents. Linearity implies a cumulative
process where the presence and growth of given factors lead logically to their
culmination in specific effects. Rationality implies intentionality when these
effects are brought about by the deliberate actions of those involved. Many aspects
of social life are linear and rational in this sense and, hence, lend themselves to
a science of cumulative and predictable consequences.

From that perspective, one may be interested in analysing the role of social media
propaganda for the success of Trump’s presidential campaign. What did it aim to
achieve? What tactics and technological affordances did it leverage? And, to what
degree was it successful because of this? Definitely, such premeditated action and
its causal effects have forged the bedrock for the bulk of empirical research and
theoretical developments in social science. But, Portes (2000, p. 1) argues,
sociology has also since its beginnings 'harbored a “contrarian” vocation based on
examining the unrecognized, unintended, and emergent consequences of goal-oriented
activity’. Add to this that it has also been interested in understanding not only
such goal-oriented actions, but also irrationality and errors (Boudon, 1988;
Douglas, 2001; Mulkay and Gilbert, 1982). This sociology is interested in
alternative goals, means, and outcomes, and argues that we must be sensitive to the
unexpected dialectics and turns of events. As sociologists, we must have an
interest in ‘unearthing the unexpected in social structures and events’ (Portes,
2000, p. 3). Covfefe is an example of a phenomenon that cuts across a range of
social modes, spheres, and contexts, and which defies the kind of formal and logical sense-making that conventional theorising demands.

It was this type of phenomenon that Merton had in mind when he formulated his
theory about unanticipated consequences of social action (Merton, 1936). The
concepts that he developed then are just as relevant today as they were when he
first published them more than eighty years ago. The article where he described his
theory was groundbreaking at the time, and it has consequently paved the way for a
number of modern sociological concepts that highlight ‘the paradoxical nature of
social life’ (Portes, 2000, p. 7).

Error

In his 1936 article, Merton argued that a scientific analysis of unanticipated consequences of social action was needed. Instead of linking such consequences with
the ‘will of God or Providence or Fate’ (Merton, 1936, p. 894), he felt that the
phenomenon demanded systematic treatment. In discussing a set of conceptual
dimensions of unexpected consequences he focused, among other things, on the role
of error for the course of social events. Merton wrote about how error may occur in
any phase of our social activities:

we may err in our appraisal of the present situation, in our inference from this to
the future objective situation, in our selection of a course of action, or finally
in the execution of the action chosen.

(Merton, 1936, p. 901)

So when, at 12:06 am on the last day of May in 2017, the US President tweeted covfefe, he was most likely in error – probably of the last type mentioned above: an error in the execution of the action. This erroneous social act – because let's assume that it was an unintentional typo – immediately led to a chain reaction of unanticipated consequences. In Merton's terms, these are consequences that, while not necessarily undesirable for the actor, were not planned or expected by her or him.

‘Despite the constant negative press covfefe.’ Half a sentence and a likely typo
from the President of the United States. Within mere minutes, this became one of
the biggest social media trends of 2017. This clause without any further context
came to the public as a clear reminder of how different Trump was from past US
presidents in terms of the ways in which he communicated with the public – far removed from the conventionally expected, carefully orchestrated announcements typically associated with the office. CNN tried to contextualise the
covfefe tweet:

The President’s Twitter feed was largely quiet during his recent trip abroad, but
hours after returning home, he unleashed a social media blast that felt like a
pent-up release of fury that had been simmering over the nine days of his debut
foreign tour. As usual, he took aim at the media and decried the ongoing Russia
investigation. The tweets represent Trump at his most authentic and defiant,
lashing out when he feels he is under attack, and appeared to reflect a belief that
only he, and not his staff, is qualified to speak in his own defense […] It seemed
like the President was going to complain about the press – something he’s wont to
do – but it wasn’t part of a Twitter rant. It was his first tweet in 20 hours.

(Berlinger, 2017)

It was reported that, as of 4 am, the covfefe tweet had been retweeted more than
108,000 times, and had received more than 135,000 likes. This was in comparison with many of his other tweets around this time, whose retweet and favourite numbers were in the lower tens of thousands. Covfefe was Trump's most popular tweet in months, as it instantly went viral and became an internet meme, spurring a plethora of jokes, paraphrases, and guesses as to its 'true' meaning, as well as outspoken criticism.

#Covfefe, it means ‘my hands are too small to type Coverage on the keyboard’

#Covfefe means resign in Russian

‘And just before you serve it, you hit it with a dash of #Covfefe’
dear everybody writing the stories, nobody ‘wonders what covfefe really means’

everybody knows he passed out with his thumb on the phone

I think my cat just licked his #covfefe

We’re all paying more attention to Trump’s covfefe typo than worrying about him
pulling out of the Paris climate deal. Part of the problem,

I’m not worth a damn until I have my first cup of #covfefe

Beware the power of #Covfefe The means to achieve global domination.

#Covfefe in Arabic but written different but sounds the same #Khafiif it means
light,

he felt light headed from all the negative news

Weird. When I try to type #covfefe it autocorrects to ‘I should resign because I’m
an incompetent fraud with obvious mental illness.’

Huh.

Aside from tweets like these, the web domains covfefe.com, covfefe.net, and
covfefe.org were quickly snapped up, covfefe t-shirt designs were announced, people
in the US bought COVFEFE licence plates, and a Twitter user registered the account
@CovfefeS as ‘Covfefe The Strong’, and responded to Trump’s tweet: ‘I HAVE BEEN
SUMMONED.’ Some Twitter users posted intertextual references to films, for example
drawing parallels with Citizen Kane where a reporter searches for the meaning of
the enigmatic word ‘rosebud’, or suggesting that covfefe might have been the final
word whispered to Scarlett Johansson by Bill Murray in the film Lost in
Translation. This, still, was just the tip of the iceberg with thousands and
thousands of similar reactions being posted globally, many marked by high levels of
creativity and internetty cross-referencing humour. Amid all of this fun, however,
it is important to note that this was not randomness. It was a specific social
effect following from specific traits in the structure and infrastructure of
society and communication in the 2010s. More precisely, it was an effect of networked publics and spreadable media.

Cultural anthropologist Mimi Ito (2008) describes the emergence of networked publics as a consequence of a set of social, cultural, and technological
transformations that have followed from the transition to a digital society. People
today are largely ‘networked and mobilized with and through media’ (Ito, 2008, p.
2). By using the word ‘public’, rather than ‘audience’ or ‘consumers’, Ito wants to
put the more engaged stance of people interacting with media in digital society in
the foreground. In talking about the publics as ‘networked’, she emphasises how we
– in the age of social network apps and portable gadgets connected to the internet
– communicate increasingly through large and elaborate networks that may go in any
thinkable direction, such as bottom-up, top-down, or side-to-side. Furthermore,
participants – Ito prefers this label ahead of users or consumers or audience
members – are actively (re-)making and (re-)distributing content in emerging
systems of many-to-many communication and interaction. As media theorists Marshall
McLuhan and Barrington Nevitt wrote already in 1972:

we live in an age of simultaneity rather than of sequence. We start with the effects before the product. The consumer becomes producer.

(McLuhan and Nevitt, 1972, p. 27)

What happened when covfefe, within hours, was massively taken up, renegotiated in a
multitude of ways, remixed, and recirculated is a prime example of what can
happen when networked publics have their way with social, often politically
discursive, raw materials. The networked social structure of the public, alongside
an infrastructure of ‘spreadable media’ aids this. This latter concept is defined
by media scholars Henry Jenkins, Sam Ford, and Joshua Green as:

an emerging hybrid model of circulation, where a mix of top-down and bottom-up forces determine how material is shared across and among cultures in far more
participatory (and messier) ways.

(Jenkins, Ford, and Green, 2013, p. 1)

So, within this hybrid system of circulation, populated by networked publics, covfefe (a likely error) was propelled into something else. The most obvious
reason for this was of course that it was posted by the holder of what is probably
the most scrutinised office in world politics, and also by a recently installed,
widely criticised president. From the perspective of social and cultural theory
however, Sara Ahmed’s (2004, p. 11) notion of stickiness can add to the
understanding of the process. She uses this concept to describe how some of the
objects that are shared and circulated socially ‘become sticky, or saturated with
affect, as sites of personal and social tension’. Things can be sticky because they
are loaded with affect. And sticky things can obviously stick to other things.
Online we might find sticky videos, sticky images, sticky hashtags, or sticky
discussion threads.

As things go viral they are stuck together by affect. The stickiness of such things
might be measured by how often people reply or comment, share or like, or dislike,
the content in question. For Ahmed (2004, p. 45), there is a 'rippling' effect of emotions: 'they move sideways (through "sticky" associations between signs, figures, and objects)'. In Ahmed's view, things might be sticky because of both enjoyment and antagonism – both positive and negative affect.

Covfefe, no doubt, was sticky. It was saturated with affect, as there was a
widespread criticism of Trump’s election and office among networked publics. As
covfefe was thrown into the machinery of spreadable media, the reactions, ranging
from humour and mockery to outspoken criticism, were, seemingly automatically,
deployed. But of course, the reaction was far from automated as it comprised and
amplified the myriad micro-actions of thousands of covfefe tweeters.
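
As a purely illustrative aside – not a measure proposed by Ahmed, and not one used in this book – such engagement-based stickiness could be operationalised as a simple weighted score over the counts that platforms record:

# Hypothetical stickiness score from engagement counts; the weights are
# arbitrary assumptions for illustration only.
def stickiness(replies, retweets, likes):
    return 1.5 * replies + 2.0 * retweets + 1.0 * likes

# Figures reported earlier in the chapter (as of around 4 am); the reply
# count was not reported, so it is set to zero here.
print(stickiness(replies=0, retweets=108_000, likes=135_000))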

L’affaire covfefe

Apart from error, Merton also discussed other factors behind unanticipated
consequences, one such being the ‘imperious immediacy of interest’ (Merton, 1936,
p. 901). By this he referred to cases where a social actor is so concerned with the foreseen immediate consequences of the action that he or she does not
care about considering any further, unforeseen, consequences. In situations where
an individual wants somehow to fix a situation in the short term, one may choose to
act in ways that, while rational in some sense, are irrational in the wider
perspective of things. Making such choices, Merton (1936, p. 902) argued, will have consequences, as 'a particular action is not carried out in a psychological or social vacuum', meaning that 'its effects will ramify into other spheres of value and interest'. Let's look here at two such actions: first, what Trump did in the early morning of 31 May 2017 when he realised that he had posted covfefe, and second, what his then press secretary Sean Spicer chose to do.

After having left the world wondering for about six hours, Trump deleted the tweet at around 5:50 am. Had he done this immediately after posting it, he might have chosen to leave it at that. But, having woken up to a social media avalanche of responses, reactions, and spin-offs, he instead chose to post an alternative tweet at 06:09 am:

Who can figure out the true meaning of ‘covfefe’ ??? Enjoy!

This tweet, whether posted by the President himself or by an aide, while having a
somewhat lighthearted tone, could in Merton’s terminology be interpreted as an act
to save face in the short term, but with wider unanticipated consequences. Rather than just staying silent about covfefe – probably leaving a majority of people assuming it to be a late-night typo that had triggered a brief shower of internet weirdness – the President chose to continue playing into it. Was he humbly joking about a typo, or had he in fact intentionally posted covfefe as the start of a game that now continued as he asked for guesses as to its meaning? As suggested by quite a few commentators, harnessing the insight that covfefe had gone viral may well have been a distraction tactic by the President, as:

Every moment that we spend talking about and writing about covfefe is one less
minute that we spend talking about Russia, or the resignation of his White House
communications director, or talking about why on earth he would want to pick a
fight with Germany.

(Borchers, 2017)

Paraphrasing Merton, this act certainly ramified into other spheres of value and
interest. The President actually engaging further with covfefe, beyond the initial
likely typo, fuelled discussions and intensified criticism related to what it
actually means to have a president who is tweeting poorly at midnight about getting
bad press. Political commentator and columnist Chris Cillizza wrote that:

To be clear: This is, on its face, dumb. Trump seemed to be trying to type
‘coverage’ and misspelled it. As he often does. Then he fell asleep and didn’t
correct the mistake until he got up in the morning. We’ve all been there! (OK, not
all of us. But me.)

(Cillizza, 2017)

But the lasting impression of covfefe, Cillizza argued, raises wider questions
about the political situation in the United States. He wrote that ‘what we have had
since the day Trump came into the White House is a deeply isolated President who
spends lots of time, particularly at night and in the early morning, watching TV
and tweeting’ (Cillizza, 2017). Trump’s two covfefe tweets, then, were evidence of
a lack of discipline that in turn bears witness to the fact that no one can say no
to Trump, especially no one that he will listen to. In his CNN.com column,
published the day after the covfefe postings, Cillizza continued to say that
Trump’s ongoing presence on Twitter was a perfect example of how the President does
not think that he needs advice. There had been talk of Trump finally changing his
way of using Twitter into more of a serious political platform and less of a
weapon for exacting revenge on people. A reboot, where a ‘team of lawyers’ was to
be vetting Trump’s tweets before they were posted, had been promised:

And then, ‘covfefe’. What it should prove is that Trump is neither willing nor able
to change his stripes. […] Which means more ‘covfefes.’ Maybe many more.

(Cillizza, 2017)

Drawing on Merton’s theory about unintended consequences, putting the immediate


interest of playing into the networked public response to the likely typo ahead of
long-term humility and conventional political seriousness, backfired by spurring on
a counter-discourse about what was ‘the real problem’ behind covfefe. What could
have happened? Columnist Leonid Bershidsky wrote that rather than guffawing over
covfefe, being Russian he feared the worst. The typo set off a jokefest, but next time it could set off something deadly serious. Bershidsky drew parallels to Ronald Reagan's nuclear joke in 1984:

But what if @realDonaldTrump tweeted something that came out of Ronald Reagan’s
mouth in 1984? ‘My fellow Americans, I am pleased to tell you that I have signed
legislation to outlaw Russia forever. We begin bombing in five minutes.’ Reagan’s
infamous ‘nuclear joke’, which rocked world leaders, was 136 characters long, just
right for a tweet. No military confrontation ensued because it was immediately
obvious Reagan didn’t mean it: He was merely unaware of a live microphone. I
shudder to think what procedures would be launched if a similar Trump tweet were
left out there for hours. […] What are we to think when a tweet by the commander-
in-chief suddenly trails off incoherently? Is he alive, is he in command of his
senses as well as that military strength? Has he been hacked? Or did he merely get
distracted mid-tweet and drop the phone in his pocket without blocking it?

(Bershidsky, 2017)
So, in sum, Trump’s way of responding to the multitude of reactions to covfefe, by
breathing new life into it by asking who could figure out what it meant, enabled
the likely typo to grow into a powerful platform from which to intensify the
criticism of the Trump presidency. At a White House press briefing later in the
afternoon of 31 May came another opportunity for the Trump office to impact on the
future destiny of covfefe. A reporter asked Trump’s then press secretary Sean
Spicer the question that she, and many confused and amused people around the world,
needed an answer to: ‘What is covfefe?’ The simplest, most likely, response to this
– ‘the Occam’s razor answer’ (Garber, 2017) – was that it was a typo. Furthermore,
the tongue-in-cheek tone of Trump’s 06:09 tweet about the ‘true’ meaning of covfefe
seemed to indicate that he admitted to having made such an error. The nightly wave of jokes and memes seemed to have run its course, and the world was now ready to
move on. So how did Spicer respond to the reporter’s question?

Merton, in his sociology of the consequences of social actions, admits that there
are methodological pitfalls in investigations of the purposefulness of what people
do. First, there is the problem of causality: how can one know for sure the extent or degree to which certain outcomes are in fact consequences of certain actions, as there is always a complexity of factors at play in any social situation? Second, there is the problem of ascertaining which were the 'true' purposes of any given action. Rationalisations can be made, he writes, 'where apparently unintended consequences are post facto declared to have been intended' (Merton, 1936, p. 897). Merton explains further:

Rationalizations may occur in connection with nation-wide social planning just as in the classical instance of the horseman who, on being thrown from his steed,
declared that he was ‘simply dismounting’.

(Merton, 1936, p. 897)

When answering the reporter’s question about the meaning of covfefe, Spicer, it
appeared, gave such a rationalising reply:

I think the President and a small group of people knew exactly what he meant.

After which ‘chaos erupted among the press gallery’ (Garber, 2017). Spicer could
have chosen to say that covfefe was a mere typo, and that it was simply an
unfortunate flipside to having a president who is able to communicate personally
and directly to the people. But instead, the press secretary suggested that covfefe
did in fact not only have a specific meaning, but also a meaning that only ‘a small
group of people’ had the privilege of knowing. This set off a new wave of
bafflement, and also some ‘covfspiracy’ theories. Referring to ‘l’affaire covfefe’,
author and columnist Jonah Goldberg (2017) wrote about how ‘all reasonable people
assumed it was a typo’, ‘a lot of people had fun with it’, and ‘some people made a
big deal about how the President is up at midnight tweeting unvetted junk’.
Goldberg’s conclusion was that either covfefe is truly some kind of codeword that
only a few people know, which in that case would be ‘bananas’, and either way the
President should not send such messages over Twitter. Or, Goldberg wrote, Spicer
was lying, and in that case it would mean that the Trump administration was unable
to admit to making even such a minor and insignificant error and that the press
secretary felt ‘compelled to protect the myth of Trumpian infallibility at all
cost’, something that ‘tells us far more about this White House than Trump’s silly
tweet ever did’ (Goldberg, 2017). What are the consequences, Goldberg asked, of
having a government who would rather have the world think that the US President is
sending secret codes over Twitter than admitting to a typing error?

Social media backfire

The covfefe tweet may have been intentional – a purposive act (Merton, 1936, p.
895) – but was most likely an error (Merton, 1936, p. 901). No matter which of
these is true (see Figure 3.1), it garnered a rich ecosystem of ambivalent
reactions expressed through the kind of ambiguous social media practice that was
discussed in the previous chapter. If the tweet were indeed intentional – a purposive guessing game or even a secret internal code – the act of posting it backfired and led to a set of unintended consequences. But if, which is much more likely, the covfefe tweet was a mistake, the ambivalent social media reactions that it set off marked the start of a chain reaction in which what Merton (1936, p. 901) calls 'the immediacy of interest' caused Trump and his press secretary to act on the basis of what appeared to be a short-term rationality, and to make post facto rationalisations. Trump jokingly tried to half cover up and half admit that covfefe was an error. This, as discussed above, played into the internetty logic of boosting memes and randomness as a potential distraction of the public from possibly more pressing political issues at hand. Mostly, however, the President's morning guessing-game tweet added fuel to a rising tide of intensified criticism of his social media practices.

Figure 3.1 Covfefe flowchart.

Spicer’s post facto rationalisation, claiming that covfefe was indeed a thing and a
serious and important one at that, definitely switched the social media backfire
into overdrive. Under political pressures, and affected by the immediacy of
interest, the press secretary somehow decided to give a response that was sure to
spur even more mockery, criticism, and conspiracy theories. In other words, covfefe
became a potential discursive tool for attacking or criticising the Trump
presidency. So, what was the political effect of covfefe in a wider perspective? In
its aftermath, quite a few Twitter users seemed to feel that reacting to covfefe
was a waste of political potential.

Think of all the problems that could have been solved if we channeled the
collective mental power wasted on #Covfefe.

My favorite part of Covfefe is how it’s distracting from the Paris Accord
withdrawal.

#Covfefe #ParisAgreement #WereAllGoingToDie


OK World: We’ve wasted enough fucking time on #covfefe idiocy. Let’s get back to
meaningful stuff, because that ain’t it.

But – even though it, in the short term, distracted from Trump pulling the US out of the Paris Agreement on climate change – covfefe can also be seen from a discursive perspective, as a disruptive space (see Chapter 2 of this book, and also Lindgren, 2013). Covfefe, construed as a disruptive space,
is a form of noise – a disturbance in the orderly sequence. Cultural theorist Dick
Hebdige (1979, p. 90) argues that the power of subculture is ‘an actual mechanism
of semantic disorder: a kind of temporary blockage in the system of
representation’. Subcultures thus operate to destroy existing codes and to
formulate new ones through appropriations, thefts, and subversive transformations
(Hebdige, 1979, p. 129). They are warnings to the ‘straight world’ of the presence
of difference, and they signal a refusal through their power to disfigure. In this
argument, all disruptions are somehow semantic or symbolic. The messy agglomeration
of micro-acts sticking to covfefe then becomes a platform from which to contest
social reality as it stands, whether this is through humour or other means. So,
even if much of the covfefe content may appear as random ramblings, the disturbance as such
still has political consequences and effects. The noise generated through covfefe
becomes a crack in the dominant discourse, a crack that can be deepened through
continued work.

Disruptive spaces are the latent building blocks of a tentative alternative public
sphere, as counterpublics (Warner, 2002) may emerge through the cracks. If and when
the potential of such counterpublics is realised, they may make possible a wider
range of opinions, new modes of making sense of social reality, and new forms for
creativity, as well as direct activism. Under the best of circumstances, if they
are appropriated efficiently, disruptive spaces can ‘make possible a re-configuring
of politics and culture and a refocusing of participatory democratic politics for
everyday life’ (Kahn and Kellner, 2008, p. 33). So, we have come full circle to
Beyer’s (2014, p. 132) point, discussed in the previous chapter, that ‘there is
value not only in the places online that fit our expectations of civil society but
also in the places online that make us cringe’. Such as covfefe. The ordered chaos
emerging around the epicentre of covfefe, the ambivalence unpacked in Figure 3.1,
thus has the potential to function as a catalyst for political change.

The analysis presented in this chapter can be seen as a contribution to a sociology of messiness: an example of how social theory – in this case, prominently that of Merton (1936), formulated long before the internet – and systematic analysis can produce new ways of seeing things that we already have some sort of spontaneous understanding of. Most people living in the twenty-first century would be able to
see, at face-value, what covfefe is about. It’s ‘Trump’, it’s ‘memes’, it’s ‘social
media’, it’s embarrassing or fun, saddening or alarming. From the perspective of
social analysis however, such gut reactions are rarely enough, and they may
sometimes also be wrong. That is why we need the theoretically informed and
empirically based analysis.

This also helps us steer clear of technological determinism, that is, the idea that the internet and social media by necessity lead either to good or to bad. The point here is that just because the internet has this or that architecture, just because social media platforms are owned by so-and-so, and just because Twitter has these types of affordances while Facebook has others, no given social outcomes follow from this. It may appear as though the idea of the internet as being ambivalent, as discussed in the previous chapter, is a solution to this: since we cannot assume any unified and universal consequences, we can simply assume that the internet is ambivalent. There is clearly the risk that this too becomes a form of technological determinism, in the shape of the claim that the internet and social media by necessity create ambivalence. But this is not enough; the apparent ambivalence must in turn be analysed until we see something other than ambivalence, or at least until we can say something about how something is ambivalent, why we perceive it as ambivalent (does it maybe transcend our current categories of thinking?), and what social consequences it can have.

As discussed earlier, studying ambivalence is not new to sociology. Max Weber developed his analytical method of ideal types around the turn of the twentieth century, based on the insight that society is so complex that we must simplify it in order to understand it, and sociologist Zygmunt Bauman (1991, p. 1) wrote in the early 1990s that:

ambivalence is not the product of the pathology of language or speech. It is, rather, a normal aspect of linguistic practice. It arises from one of the main
functions of language: that of naming and classifying. Its volume grows depending
on the effectivity with which that function is performed. Ambivalence is therefore
the alter ego of language, and its permanent companion – indeed, its normal
condition.

Ambivalence, then, is not an internet thing. It’s a modernity thing. It sits deeply
within our society and culture and we see it operate in upscaled, amplified, and
speeded-up ways on social media. A key difference in digital society is that
ambivalent acts, responses, and effects are documented in new ways. They are
datafied, and rendered visible and spreadable in new ways. Sociologist Anton
Törnberg (2017, p. 17) points out that social scientists are often interested in
phenomena that are ‘deeply characterized by nonlinear dynamics’. Today, he writes,
we see such dynamics expressed through digital media:

Personalized memes spread like global wildfires, and a previously unknown mobile
game may quickly become a viral success with millions of users worldwide hunting
artificial monsters on the streets. Sometimes these online phenomena also have
large societal consequences, for instance when the photograph of the three-year-old
Syrian boy, Alan Kurdi, washed up on a shore in Turkey, quickly spread in the media
and dramatically increased attention on the ongoing refugee crisis, initiating
protests all across Europe in favor of more humane immigration politics.

Internet researchers Eszter Hargittai and Christian Sandvig (2015) discuss how
digital media and the internet can offer new tools for answering questions about
society in new ways. Approaching ambiguous social media practice, as it was
discussed in the second chapter of this book, through the lens of social theory –
as illustrated through the case study in this present chapter – holds ‘the
potential to answer basic questions about the networked structure of human
interaction’ (Hargittai and Sandvig, 2015, p. 8). If we realise that the social
world is so complex that ‘there is no universal method that is capable of dealing
with social phenomena in their entirety, we may combine different methods to cast
light on different aspects of them’ (Törnberg, 2017, p. 88). With an open
methodological approach such as the one outlined in Chapter 1, digital social
research offers new tools that can be used to shed light on both new issues that
are specific to digital society, and on basic and long-standing questions about
human social life.

Actor-Networks

Drawing on actor-network theory – a tradition where significant work is already being done in the direction of the hybrid and integrative empirical/theoretical/methodological vision of this book – this chapter will provide an example of a combination of computational techniques for data analysis on the one hand, and interpretive theoretical analysis on the other. In this case, the dataset consists of 1.1 million tweets that have been sampled from Twitter's Streaming API (Russell, 2018; Tweepy, 2009), and that include a set of keywords relating to climate change, global warming, and climate change denial.
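
The collection script itself is not reproduced in the book, but keyword-based sampling from the Streaming API with Tweepy (in its 3.x-era interface) follows a pattern roughly like the sketch below. The credentials and the output file are placeholders, and this is a simplified illustration rather than the workflow actually used; the track terms are those listed in the 'Finding the actors' section later in the chapter.

# Rough sketch of keyword-based sampling from Twitter's Streaming API with
# Tweepy (3.x-style interface). Credentials and output handling are placeholders.
import json
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

class ClimateListener(tweepy.StreamListener):
    def on_status(self, status):
        # Store each incoming tweet as a JSON line for later analysis.
        with open("climate_tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")

stream = tweepy.Stream(auth=auth, listener=ClimateListener())
stream.filter(track=["climate", "climateaction", "climatechange", "climatedenial",
                     "climategate", "climatelies", "globalwarming", "biodiversity",
                     "fakescience"])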

As will be discussed in the chapter, I have chosen a particular theoretical perspective beforehand, namely actor-network theory (Callon, 2001; Latour, 2005; Law, 1999). This interpretive framework has been consciously chosen as it puts the focus on what we want to know about in this case example – climate politics on Twitter – namely how human and non-human actors (policies, technologies, ideas, biological processes, and so on) are associated, based on the data that we have. While this chapter thus starts from theory, rather than data, the following chapter (5) will offer a similar case example, but instead starting more clearly from data. In actual research practice, these two sides will always be more or less entangled, in different proportions and configurations.

As actor-network theory is about seeing how actors (human and non-human) connect
and interrelate, the methodological challenge in relation to our dataset involves
two steps. First, to find the actors, and second, to trace their relations. If we
can achieve that, we will have an empirical foundation upon which the theoretical
analysis can rest.

A sociology of translation

The theory that we are starting from here is actor-network theory (ANT), which has
been developed within the field of science, technology and society studies, but has been applied far beyond its original area of researching scientific and technical practices.
Most generally, it is a theoretical perspective for analysing how people and/or
things of any kind are associated in a given social context. It is, however, not
necessarily about analysing networks (nodes and their connections) in the
conventional sense. As formulated by one of its originators, sociologist Bruno
Latour:

With Actor-Network you may describe something that doesn’t at all look like a
network – an individual state of mind, a piece of machinery, a fictional character;
conversely, you may describe a network – subways, sewages, telephones – which is
not all drawn in an ‘Actor-Networky’ way.

(Latour, 2005, p. 142)

So, ANT is not a shape to be applied to the social, but a way of seeing the social
as ‘complicated, folded, multiple, complex, and entangled’ (Latour, 2005, p. 144).
ANT is about tracing connections and associations between heterogeneous elements.
This is what I will do in this chapter’s case example by using computational
methods in an attempt to unfold the nestedness of climate politics in a Twitter
dataset.

An early version of ANT was developed by Latour and Steve Woolgar as they undertook
an ethnographic analysis of a biological research laboratory. Their study
highlighted the important role of material objects in processes where truths are
constructed. They showed how physical things such as machines, paper, raw data, and
test tubes that were used and produced in the laboratory, played an equally
important role to that of humans and their intellectual contributions in producing
‘facts’ about nature. They therefore saw it as dubious to make any distinctions
between human and non-human agency altogether. They wrote that ‘one cannot take for
granted the difference between “material” equipment and “intellectual” components
of laboratory activity: the same set of intellectual components can be shown to
become incorporated as a piece of furniture a few years later’ (Latour and Woolgar,
1979, p. 238). Furthermore, they theorised the systems in which such actors assume relational positions – and within which they have meanings – as networks: hence, the actor-network. As argued by Latour and Michel Callon, one must also assume that there is
no inherent difference between ‘micro-actors’ and ‘macro-actors’. They explained
that ‘no actor is bigger than another except by means of a transaction (a
translation)’ (Latour and Callon, 1981, p. 280). In sum, ANT assumes no social
structure or other firm framework beforehand, as it analyses the social as emergent
through the connections made between elements.

A key point of ANT is that, in order to be successful, claims to truth must be supported by chains of association that strengthen the claims to make them enduring
and robust. Such chains involve not only humans, but also things that come into
expression in the shape of ‘semiotic actors presented in the text but not present
in the flesh’ (Latour, 1987, p. 64). In the text one can therefore study ‘the
collective action of human and non-human actors tied together’ (Latour, 1987, p.
141), and in doing so ‘we have to be as undecided as the various actors we follow’
as to what the object of analysis is made of (Latour, 1987, p. 258). The general
methodological principle is to follow actors in the network to see how something
becomes accepted and ‘true’.

Truth is established through a process of, what is called, translation. The concept
of translation is vital to ANT, and it refers to something that is similar to the
notion of construction within social constructionism, namely to ‘a complex process
of negotiation during which meanings, claims and interests change and gain ground’
(Waeraas and Nielsen, 2016, p. 237). Latour and Callon (1981, p. 279) explained:

By translation we understand all the negotiations, intrigues, calculations, acts of persuasion and violence, thanks to which an actor or force takes, or causes to be conferred on itself, authority to speak or act on behalf of another actor or force.

Translation, in the sense of ANT, is a political process where interests are pursued, meanings are interpreted, subjects are persuaded, and relationships of
power are acted out. Importantly, once again, it involves the mobilisation and
movement of both human and non-human actors. As put by Latour and Callon (1992, p.
353): ‘whatever term is used for humans, we will use it for nonhumans as well’.
Translation is also a geometrical process, as the mobilisation of human and non-
human resources entails a set of movements in different directions. In turn, these
movements also relate to semiotics, as they are connected to transformations of
meaning. Latour explains that:

Translating interests means at once offering new interpretations of these interests and channelling people in different directions. 'Take your revenge' is made to mean
‘write a letter’; ‘build a new car’ is made to really mean ‘study one pore of an
electrode’. The results of such renderings are a slow movement from one place to
another. The main advantage of such a slow mobilisation is that particular issues
(like that of the science budget or of the one-pore model) are now solidly tied to
much larger ones (the survival of the country, the future of cars), so well tied
indeed that threatening the former is tantamount to threatening the latter. Subtly
woven and carefully thrown, this very fine net can be very useful at keeping groups
in its meshes.

(Latour, 1987, p. 117)

What Latour is explaining here is how the process of translation results in certain
sets of positions and relationships between actors in the network. These positions
and relationships in turn prescribe certain ways of acting or reasoning that could
have been different. For example, ‘taking one’s revenge’ can be done in many other
ways than by ‘writing a letter’. It is a process of translation which has made this
particular strategy into the truth in the given setting.

As described by Callon (1986a), translation happens through four stages. The first
stage is that of problematisation, by which a problem is defined, or a knowledge
claim is made. The problematisation also involves the definition of which actors
are required within the network. It thereby establishes roles and identities, as
well as proposes solutions or routes of action. Through problematisation, some
actors become defined as ‘obligatory passage points’ in the network, that prescribe
‘the movements and detours that must be accepted as well as the alliances that must
be forged’ (Callon, 1986a, pp. 205–6).

The second step is the ‘interessement’ through which other actors in the network
are locked into roles that align with the problematisation, as an actor uses
various actions in its ‘attempts to impose and stabilize the identity of the other
actors it defines through its problematization’ (Callon, 1986a, pp. 207–8). The
case analysis in this chapter will be particularly focused on these initial stages
of the translation. I analyse how the climate issue is problematised through the
Twitter dataset by looking especially at which actors are required, at passage
points, and at how actors are locked into roles.

These first two steps, in turn, may subsequently lead to the third step of
enrolment, where actors join the network by accepting their attributed roles. As
put by Callon:

To describe enrolment is thus to describe the group of multilateral negotiations, trials of strength and tricks that accompany the interessements and enable them to
succeed.

(Callon, 1986a, p. 11)

In the fourth step, ‘spokespersons’ are mobilised to represent the truth generated
through the actor-network, which then becomes stabilised and no longer
controversial. This stabilisation is labelled blackboxing as it renders the
process, which led to the particular form of knowledge being generated, obscure:

When a machine runs efficiently, when a matter of fact is settled, one need focus
only on its inputs and outputs and not on its internal complexity.
(Latour, 1999, p. 304)

The blackboxed forms of knowledge – that is, the translation – can then be
inscribed into ‘intermediaries which come in many forms’, such as discussions,
declarations, texts, objects, skills, or any other material (Callon, 1991, p. 143).
Once inscribed, the network and its knowledge are not easily contested, as they
come across as unproblematic and given.

Finding the actors

A crucial part of analyses in the vein of ANT is to identify the actors that are
central to the relational system. Latour (1987, p. 84) has referred to these as
actants, defining them as ‘whoever and whatever is represented’ in the analysed
figuration. For the purpose of the case analysis in this chapter, I identify these
using a computational technique to recognise names of things in volumes of text that would be immensely more time-consuming for human coders to read. More
specifically, the method used for this is Named Entity Recognition (NER), which has
been developed in the research field of information extraction as a technique that
‘basically involves identifying the names of all the people, organizations, and
geographic locations in a text’ (Grishman and Sundheim, 1996, p. 467). Subsequent
iterations of the technique also help identify the names of other entities such as,
for example, nationalities, infrastructural elements, events, works of art, and so
on (spaCy, 2019).

The dataset used in this analysis initially consisted of 4.4 million tweets,
collected via Twitter’s public Streaming API over a period of forty days in May and
June of 2019. The search terms used when sampling tweets were ‘climate’,
‘climateaction’, ‘climatechange’, ‘climatedenial’, ‘climategate’, ‘climatelies’,
‘globalwarming’, ‘biodiversity’, and ‘fakescience’. The reason for writing the two-
word terms without any space was mainly to grab the use of these terms as hashtags. The hash symbol was still omitted throughout, as some of the terms, such as 'climate' and 'climategate', can occur in both forms. Retweets were removed from the dataset, leaving 1.1 million original tweets. The tweet text was then cleaned up and tagged using the NER implementation of the spaCy library for Python (explosion, 2014). The NER analysis was carried out, referring back to the framework of ANT, to find
the actants, or actors, that are key to the problematisation of climate politics on
Twitter. We want to see which actors are required within the network, by mapping
‘the repertoire of entities that it enlists’ (Callon, 1986b, p. 22). And
importantly, such semiotic actors that are presented in the text can be of
virtually any kind. There is an assumed symmetry among actors where one does not
‘impose a priori some spurious asymmetry among human intentional action and a
material world of causal relations’ (Latour, 2005, p. 76). The types of entities
that are tagged by spaCy’s NER algorithm are shown in Figure 4.1.

Figure 4.1 Named entity types tagged by spaCy (spaCy, 2019).
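
To make the tagging step concrete, a minimal spaCy sketch along the following lines produces entity labels of the kinds listed in Figure 4.1. It is a simplified illustration – the placeholder tweet and the small English model are assumptions – and the complete workflow is available in the repository referred to later in the chapter.

# Minimal illustration of Named Entity Recognition with spaCy. Assumes the small
# English model has been installed: python -m spacy download en_core_web_sm
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

tweets = ["Greta Thunberg urged the UN to act on the climate crisis."]  # placeholder data
entity_counts = Counter()

for doc in nlp.pipe(tweets):
    for ent in doc.ents:
        entity_counts[(ent.text, ent.label_)] += 1

# The most frequent (entity, type) pairs are candidate actants.
print(entity_counts.most_common(10))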


In the analysis of tweets made for this case example, some of the most prominent
actors were EVENTs such as ‘global warming’, ‘the climate crisis’, and ‘Brexit’;
PERSONs like US President Donald Trump, Swedish climate activist Greta Thunberg,
and Canadian environmental economist Ross McKitrick; ORGanisations such as Exxon,
the Green Party (UK), the UN, and the BirthStrike movement; MEDIA like Time
Magazine, CNN, and a number of individual journalists; alongside LAWs such as The
Paris Agreement. Actors such as these can thus be seen as composing the
heterogeneous sociotechnical network of climate politics on Twitter at the particular point in time and space of this dataset. These are the semiotic actors that make up the repertoire of entities that are required within this network.
Notably, aligning with ANT, these actors are of many different kinds, and both
human and non-human, both social and technical. This helps us analyse these tweets
not simply as social media posts, but as proxies for understanding the ways in
which the social must be problematised in terms of its materiality, while at the
same time the technical must be seen in terms of its social heterogeneity.
Intimidating as such an ontological position may seem, it is also quite liberating
for how one can proceed with the analysis. John Law (1999, p. 3) writes:

Truth and falsehood. Large and small. Agency and structure. Human and non-human.
Before and after. Knowledge and power. Context and content. Materiality and
sociality. Activity and passivity. In one way or another all of these divides have
been rubbished in work undertaken in the name of actor-network theory.

This obviously looks like potentially destructive hacking and unproductive anarchy.
It indeed sounds cool when Law (1999, p. 3) writes of ‘tossing sacred divisions and
distinctions into the flames’ but, as he also emphasises, this isn’t as dramatic as
it sounds. He continues to write that:

it is not, in this semiotic world-view, that there are no divisions. It is rather that such divisions or distinctions are understood as effects or outcomes. They are
not given in the order of things.

(Law, 1999, p. 3)

It is from this perspective that the computational method of Named Entity Recognition is used here, because we cannot know beforehand which the actors are and how the network is aligned. In every given research setting, the contents and structure of the sociotechnical network must be identified anew.

Making connections

Callon (1986b, p. 30) argues that ‘behind each entity there hides a set of other
entities which it more or less effectively draws together’. In the context of what
we are doing in this chapter, this can be interpreted as ‘behind each identified
Named Entity, there hides a set of other Named Entities that are somehow connected
to it’. Beyond simply identifying the required actors, the next step of an ANT-
influenced analysis is to move from the actors to the actor-network in which they
are enrolled. Such connections can be made in different ways, but not in just any way, as the shape of what Callon calls the 'actor-world' is always an empirical question.
We must somehow get knowledge about how it is structured. The actor-world is ‘a
list of entities’ that may be either individual or collective, and also a ‘list of
what they do, think, want and experience’. The actor-world is a kind of system, or
engine, where entities ‘act, react and cancel each other out, in just the same way
as any others’ (Callon, 1986b, p. 22). The actor-world decides the repertoire of
entities, as discussed in the previous section, as well as their relative size. For
our purposes here, we can define climate politics on Twitter in relation to an
actor-world – ‘a combination of elements borrowed from […] different registers’ –
that ‘both supports and is being supported by it’ (Callon, 1986b, p. 22).

Mapping such worlds can, as argued above, be done with different approaches, where
one out of many possible ways is to somehow operationalise the relationships
between the identified entities. One has to decide what is required,
methodologically, for a connection to exist between the actants. This can be done,
depending on the research case and framework, through observations, interviews,
surveys, document analyses, and so on. With our Twitter dataset, where we have
derived the actants from how they are used in speech, a suitable next step can be
to find linguistic connections between those actants in the overall semiotic space
of the dataset. In some of his writings, Callon suggests something which is quite
along those lines when introducing ‘co-word’ analysis as a method for seeing how
certain pairs or groups of keywords appear together in scientific publications
(Callon et al., 1983).

By using machine learning in combination with the ontology and interpretive
framework of ANT's sociology of translation, we can achieve such maps of
associations – visualisations of actor-worlds – that draw on very large datasets
and on standardised procedures for deciding which actants are associated. The
contexts of words can be computationally analysed in a variety of ways. The example
in this particular chapter uses topic modelling through LDA (Blei, Ng, and Jordan,
2003), to explore how the named entities are related (this and other strategies
will be further discussed in Chapter 5). An LDA model was created, whereafter a
network graph was generated based on the probability of words appearing in one and
the same topic. Basically, this means that the graph reflects connections between
words that are likely to appear together in the sampled stream of tweets. The
detailed workflow and code used for this, as well as for the NER analysis discussed
above, is available at github.com/simonlindgren/datatheory-twitter-ner-lda.
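
For readers who want a sense of what this looks like in practice, the following is a minimal sketch of fitting an LDA model with gensim and deriving word-to-word connections from shared topic membership. The tokenised tweets, the number of topics, and the edge-weighting scheme are all placeholders rather than the settings used for the case study; the actual workflow is documented in the repository cited above.

    # A minimal sketch, assuming gensim; tokenised tweets and parameters are placeholders.
    from itertools import combinations
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    tokenised_tweets = [
        ["greta", "thunberg", "climate", "crisis"],
        ["exxon", "paris", "agreement", "climate"],
        ["trump", "brexit", "global", "warming"],
    ]

    dictionary = Dictionary(tokenised_tweets)
    corpus = [dictionary.doc2bow(doc) for doc in tokenised_tweets]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, passes=10)

    # Connect words that co-occur among the top terms of the same topic; the
    # topic probabilities are used here as rough edge weights.
    edges = {}
    for topic_id in range(lda.num_topics):
        for (w1, p1), (w2, p2) in combinations(lda.show_topic(topic_id, topn=10), 2):
            edges[(w1, w2)] = edges.get((w1, w2), 0) + p1 * p2

    print(sorted(edges.items(), key=lambda item: -item[1])[:10])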

Using the Gephi software (Bastian, Heymann, and Jacomy, 2009), and its Minimum
Spanning Tree plugin (Schroedl, 2019), which employs Kruskal’s algorithm (Kruskal,
1956), only the strongest connections were kept in order to get a schematic
visualisation of the most prominent relationships. In practice, for Figure 4.2,
this means acquiring a subgraph that includes all nodes (actants) in the network,
with the minimum possible number of edges (connections), prioritising strong
connections and dropping weaker ones. The graph visualisation was laid out using
the Cytoscape software (Shannon, 2003), and the yFiles orthogonal layout algorithm
(yWorks, 2019). Callon writes (1986b, p. 24):

An actor-world associates heterogeneous entities. It defines their identity, the
roles they should play, the nature of the bonds that unite them, their respective
sizes and the history in which they participate.

Figure 4.2 Types of entities and their connections in the context of climate
politics on Twitter (May–June 2019).

Figure 4.2 shows the entities/actants that constitute the actor-world of climate
politics on Twitter, given our search terms and in our particular socio-historical
moment. The names of entities have been removed in the figure, so as to achieve an
even more schematic depiction of the kind of machinery that is at work when
knowledge and power are produced and reproduced through language in this particular
political setting. The network illustration is visually quite reminiscent of the
drawings in Callon’s own publications (Callon, 1986b, 1986a; Callon et al., 1983).
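
The pruning step itself can also be reproduced outside of Gephi. The sketch below uses networkx's implementation of Kruskal's algorithm on a tiny invented edge list; since a minimum spanning tree keeps the edges with the smallest total weight, the association strengths are first converted into distances so that strong connections are the ones retained.

    # A minimal sketch of minimum-spanning-tree pruning with networkx (an
    # alternative to Gephi's plugin); the edge list and weights are invented.
    import networkx as nx

    G = nx.Graph()
    weighted_edges = [
        ("greta thunberg", "climate crisis", 0.9),
        ("climate crisis", "paris agreement", 0.7),
        ("greta thunberg", "paris agreement", 0.4),
        ("exxon", "climate crisis", 0.6),
    ]
    for u, v, strength in weighted_edges:
        # Kruskal keeps the minimum total weight, so strong associations are
        # turned into short 'distances' before pruning.
        G.add_edge(u, v, distance=1 - strength, strength=strength)

    mst = nx.minimum_spanning_tree(G, weight="distance", algorithm="kruskal")
    print(sorted(mst.edges(data="strength")))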

Seeing a map such as the one in Figure 4.2 from the perspective of a sociology of
translation means interpreting it as ‘a definition of roles, a distribution of
roles and the delineation of a scenario', as 'a geography of obligatory points of
passage' (Callon, 1986b, p. 26). In more everyday language, mapping the actor-
world around a specific entity or issue means getting to know the playing-field for
doing politics around that entity or issue. Figure 4.3 shows part of another layout
iteration of the same network, but now with entity labels shown.

Figure 4.3 Detailed view of some entities and their connections in the context of
climate politics on Twitter (May–June 2019).

Figure 4.3 holds a host of exciting information. An interesting aspect of seeing
all actors as equal (social, technical, human, non-human, etc.) is that for some
actors, for example people, organisations, or media, what figures in the network
are their actual names, while in other cases (marked with @ in Figure 4.3) the
network position is held by a digital intermediary such as a Twitter account or a
website. This means that some actors, such as Donald Trump and @realDonaldTrump,
the BBC and @BBCNews, and BirthStrike (the movement), #BirthStrike (the hashtag)
and birthstrike.tumblr.com (the website), all figure as different entities.
Obviously, analyses like the one initiated here will be highly sensitive to which
search terms are used, and to the social context and timeframe chosen.

As the focus of this book is on how analytical possibilities may arise at the
intersection of social theory and data science, and is therefore methodological
rather than empirical, we will not move any further with this particular analysis
of climate politics on Twitter. The possibilities for moving further with an
analysis such as this one are vast, however. Marres (2017, p. 106), for example,
has pointed out that there are methodological affinities between ANT’s mappings of
heterogeneous networks and emerging digital social research techniques. Marres has
also made analyses in which parallels are drawn between Callon’s co-word analysis
and current digital analytics techniques. These resonances, Marres (2017, p. 108)
argues: ‘open up a space of exploration for social enquiry, allowing us to
interrogate the capacities of social methods anew’. This chapter has also
illustrated such resonances, highlighting the interplay between computation and
theorisation. Without the theoretical framework of ANT, in this chapter’s case
study, we would not be able to make any socially anchored analyses of the Named
Entities, which would then be mere lexical tags. Furthermore, the choice of
combining NER with network analysis would not be an obvious one without a
theoretical framework guiding the analysis. On the other hand, without the
computational techniques we would not be able to make an ANT analysis of this type,
with such data volumes, and in such a systematic fashion. This is then a mutually
beneficial situation. Then again, some experts in ANT, NER, or LDA might claim that
this design does not do their perspective justice, and that it is
rather mutually destructive. This gets us back to the question of mixing and
repurposing theories and methods. Can we allow for integrating fragments of
different paradigms into new paradigms, without paying full respects to the various
component parts?

Who owns theory?

It is important to emphasise once again the idea underlying this book, that
scholarly research will be better off with less theoretical dogmatism, and a more
open mindset. Rather than avoiding drawing on existing theories, simply because
their original formulations are not compatible with what we want to achieve, it is
much better to repurpose theories in quite far-reaching ways in order to get the
best possible analytical effect. As was discussed in Chapter 1, the idea of
‘hacking’ theory entails a practice of tweaking and adapting perspectives in ways
that may mean simplification as well as the emergence of new analytical
opportunities. In the particular case example that I draw upon in this chapter,
where I am using ANT, there is a risk that some scholars within that field will
argue that the theory is applied in the wrong way here. But it is vital to the
whole idea within this book that there is no truly wrong way when it comes to how
theories may be used or interpreted.

So, yes, anything goes, but no, it does not go in any way. In the end, the person
responsible will always be the researcher in question, who makes use of the theory.
Because with the great freedom of hacking theory comes the crucial requirement
always to make clear how one interprets any theory that one draws upon, and how one
intends to use it. From that point on, the theory – even if influenced by the work
of others – essentially becomes our own theory, rendering it quite uninteresting,
from a pragmatic perspective, whether the current application is in, for example, a
‘true’ Latourian or other spirit or not. Academic theorising is, and should be, a
constantly ongoing collaborative, iterative, and cumulative enterprise, where
theories, old, new, self-created, and created by others, should be constantly
renegotiated through ever new readings and empirical applications. Theories should
be open-source – their underlying code free for all to alter, rendering them more
efficient, less functional for some purposes while fitting for others, or broken,
to be potentially discarded, returned or passed further.

Theories are not sacred, and theorists are not untouchable. Karl Mannheim argued in
Ideology and Utopia that intellectuals who are unanchored in the realities of
social life and who float freely above it, are in a unique position to make
analyses that are ‘unceasingly sensitive to the dynamic nature of society and to
its wholeness' (Mannheim, [1929] 1954, p. 137). It is a much more appealing idea,
however, that theorising is available to anyone, and that not only 'theorists' are
able to raise important questions or point out important patterns. Cultural
researcher Thomas McLaughlin (1996) discusses the notion of ‘vernacular theory’,
referring to theories formulated by non-elites, and argues that such theory does
not differ in quality from any theories that may have a higher status or more
scholarly style. Rather, involving vernacular elements in processes of theorising
fosters a more ‘questioning, open-ended spirit’ (McLaughlin, 1996, p. 6). In the
end, such a mindset will illustrate that:

Instead of running problems through unquestioned machines of analysis and
interpretation, someone was questioning the machines themselves.

(McLaughlin, 1996, p. 29)


Michel Foucault has argued that, while what he calls ‘global theories’ are very
useful in many respects, they are always modified in some way in those cases where
they have truly contributed to analyses. He said that theories, such as for example
those of Karl Marx or Sigmund Freud, ‘continue to provide in a fairly consistent
fashion useful tools’ for applied research (Foucault, 1980, p. 81). But, he
continued:

these tools have only been provided on the condition that the theoretical unity of
these discourses was in some sense put in abeyance, or at least curtailed, divided,
overthrown, caricatured, theatricalised, or what you will. In each case, the
attempt to think in terms of a totality has in fact proved a hindrance to research.

(Foucault, 1980, p. 81)

In order to be useful for research, theories must be renegotiated so that they fit
the current application. As Foucault argues, this may mean altering existing
theories in quite dramatic ways by for example curtailing, dividing, or over-
simplifying (caricaturing) them. This, once again, is precisely in line with what
has been argued elsewhere in this book about employing an element of anarchism in
the analytical enterprise. The obvious risk here is that we end up in a totally
relativistic position where anything goes. This is a delicate balancing act and, as
Foucault puts it, our position must not be ‘naive or primitive’, nor should it be
based on 'a soggy eclecticism, an opportunism that laps up any and every kind of
theoretical approach'. At the other end, however, neither do we want to stick
dogmatically to a religious view of theories as immaculate and sacred, as this
will ‘reduce to the worst kind of theoretical impoverishment’ (Foucault, 1980, p.
81). What we want is:

an autonomous, non-centralised kind of theoretical production, one whose validity
is not dependent on the approval of the established regimes of thought.

(Foucault, 1980, p. 81)

It is in this light that theories are approached and handled in this book. So, when
approaching a framework such as ANT, we are not interested in what ANT essentially
‘is’, or how it is ‘appropriately’ done. We ask not what we can do for ANT as a
‘global’ phenomenon, but we ask what ANT – or any slice, interpretation, or maybe
even ‘misreading’ of it – can do for our ‘local’ work.

Collective Representations

This chapter will give an illustrative example of how one can draw on perspectives
from the field of the sociology of knowledge in the context of computational text
analysis. Where the previous chapter provided an example of a highly theory-driven
analysis, this chapter begins from the other side by starting quite openly with a
huge text dataset that we somehow want to make sense of. As will be illustrated in
the following, we proceed by choosing some computational techniques to explore the
data further. Throughout this explorative work, we shall see how the fit with the
sociology of knowledge emerges quite gradually. So, while the analysis of tweets in
Chapter 4 was shaped by trying to make the data science play (as) nicely (as
possible) with social and cultural theories, this chapter is an example of the
process of finding social and cultural theories to play nicely with the data
science. This is basically the difference between what we call deduction (starting
with the theory as in the previous chapter) and induction (starting with the data
as in this chapter). In practice, it is always a little bit of both.

The sociology of knowledge is interested in how shared knowledge in social
settings, ranging from smaller groups to entire societies, is expressed and created
through so-called collective representations. Such representations are the dominant
and widespread forms of language-use, and their related world-views and modes of
action, that are characteristic of a given social context. The key argument in the
classic, Durkheimian, approach is that language, conceptual thinking, and logic are
shaped by the social contexts out of which they arise. This notion – that
stereotypes, categorisations, and manners of speaking, which exert great power over
our reasoning and actions, are social products – has formed the basis for a series
of other constructionist perspectives on society and culture over the years. This
chapter aims to illustrate what can be achieved through combining such ideas with
present-day data science approaches, such as text mining through machine learning.
It shows how one can approach, much like a social anthropologist would, massively
networked social settings online through big data techniques, and draw on
sociological theory in decoding the world-view of such a setting. The chapter
centres around an empirical case study of the forum website Reddit. The goal, in
aligning with what has been said in previous chapters, is to approach the
discursive universe of the vast social site while being theoretically sensitised by
the sociology of knowledge.

In undertaking the data theory experiment that is at the centre of this chapter, we
start in a most data-driven way, by applying a popular data science method to an
enormous social media corpus. Corpus is a word from the field of linguistics where
it refers to the mass of text, the collection of documents, that is approached with
one method or another in order to respond to some research question concerning it,
or simply to explore it without any other predefined aim. We shall process our
corpus, which consists of the more than 1.2 billion comments that were posted
across Reddit’s subforums between 1 January 2018 and 31 December 2018, using a text
analysis method called word2vec (Mikolov et al., 2013). In other words, we start
with a big dataset, and then use a computational method to cast it in a different
light, revealing linguistic structure. Once that has been done, we introduce a
theoretical and sociological element in interpreting it. In this particular case,
concepts from the sociology of knowledge will be employed. But, first, let us look
more closely at the method in question – the method that we then want to move
beyond – and in doing this, a number of familiar data science buzzwords will come
into play. Such words include machine learning, neural networks, vector models,
NLP, LDA (see Chapter 4), and AI.

Words and the company they keep

Word2vec is a type of model used in text processing to find words that tend to
appear together frequently in a corpus. As such, it is part of a popular trend
within natural language processing (NLP) in recent years, namely that of analysing
word embeddings. NLP is interested in ‘natural’ languages, that is ordinary
languages that have taken shape ‘naturally’ among humans over time, without any
conscious effort to plan or structure the language in question. These types of
languages differ from languages that have been formally constructed, such as, for
example, those used to program computers.

The method of word2vec is constructed to map word embeddings. Word embeddings, in
turn, are an area of interest within the part of linguistics that is called
distributional semantics. This form of linguistics is interested in how meaningful
units of language are distributed across large samples of text data, such as for
example our Reddit dataset. It is a foundational assumption in distributional
semantics that words that are used together, i.e. that occur in the same contexts,
tend to bear meanings that are similar. American linguist Zellig Harris (1954)
argued that ‘each language can be described in terms of a distributional
structure’, as a ‘network of interrelated statements’ (Harris, 1954, pp. 147–8). He
suggested that the semantic meaning of words should be understood through the words
with which they tend to co-occur in actual language-use. Harris went on to explain
that:

the parts of a language do not occur arbitrarily relative to each other: each
element occurs in certain positions relative to certain other elements. The
perennial man in the street believes that when he speaks he freely puts together
whatever elements have the meanings he intends; but he does so only by choosing
members of those classes that regularly occur together, and in the order in which
these classes occur.

(Harris, 1954, p. 147)

What he meant, in other words, was that languages are not random, and that they
provide the structure by which we think and talk about reality, even if we may have
a tendency to see language as a neutral tool by which we can express ourselves
freely. The task for distributional semantics is to find these structures. So, for
Harris, investigating a language includes not only a mapping of which irreducible
elements the language consists of, but also a ‘mathematical search’ for ‘ordered
statements’ (Harris, 1954, p. 149). This is what we will do with the Reddit dataset
in this chapter, by mathematically finding patterns that describe how its manners
of speaking are ordered at the collective and aggregated level. As similarly
explained by British linguist John R. Firth (1957, p. 11), one ‘shall know a word
by the company it keeps', by seeing words in 'their contexts of situation' and
presenting them ‘in their commonest collocations’.

During the 1960s, methods for analysing word embeddings, that is words in the
context of the company they keep, were advanced through the development of the so-
called vector space model for information retrieval (Salton, Wong, and Yang, 1975).
The vector space model is a way of representing text documents as numbers. In such
a model each token is given a numerical identifier. In this context a ‘token’ is
defined as the smallest unit into which the corpus is divided in the analysis. So,
commonly, tokens will be words, but the set of tokens analysed for a corpus can
also contain common bigrams (co-occurring pairs of words, such as ‘european
union’), trigrams (such as ‘use the force’), and so on. In vector models, each
unique token is given a numerical identifier, to make it possible to represent
documents as vectors, and thereby perform the kinds of mathematical operations that were
mentioned earlier. Vectors are basically a series of numbers. A schematic example
of a vector model would be:

Document 1: ‘Sweep the leg!’


Document 2: ‘The leg is broken.’

Document 3: ‘Why?’

Tokens: (1) sweep; (2) the; (3) leg; (4) is; (5) broken; (6) why

Document vector 1: 1,1,1,0,0,0 [because it contains tokens 1–3]

Document vector 2: 0,1,1,1,1,0 [because it contains tokens 2–5]

Document vector 3: 0,0,0,0,0,1 [because it contains token 6]

In a model like this one, any token which occurs in the document is represented by
a non-zero value in the document vector. In the example above, this value has been
set to 1 but in actual practice this value is dependent on how the so-called term
weights have been computed, as one might want it to reflect different things for
different analysis tasks. Once this vectorisation has been made, it is possible to
do maths on the corpus and calculate things such as similarity scores between
documents. In the space of vectors, the document similarity becomes:

an inverse function of the angle between the corresponding vector pairs; when the
term assignment for two vectors is identical, the angle will be zero, producing a
maximum similarity measure.

(Salton, Wong, and Yang, 1975, p. 613)
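
As a minimal, hedged illustration of the schematic example above, the following sketch builds binary document vectors and computes their cosine-based similarity with scikit-learn (the choice of library is mine, not prescribed by the text):

    # A minimal sketch of the toy vector space model above, assuming scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = ["Sweep the leg!", "The leg is broken.", "Why?"]

    vectoriser = CountVectorizer(binary=True)  # 1 if a token occurs in the document, else 0
    X = vectoriser.fit_transform(documents)

    print(vectoriser.get_feature_names_out())  # the token vocabulary
    print(X.toarray())                         # one binary vector per document
    print(cosine_similarity(X))                # similarity as a function of vector angle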

A common version of this method, popularised in the late 1980s, is Latent Semantic
Analysis (LSA), which uses the vector space model, and a mathematical technique
called singular value decomposition, to learn latent topics (groups of tokens that
are close to each other). As such, the method enables moving from information about
tokens in documents, to information about entire topics across documents. As
‘individual words provide unreliable evidence about the conceptual topic or meaning
of a document’, LSA instead maps the ‘underlying semantic structure in the data’ by
constructing a '"semantic" space wherein terms and documents that are closely
associated are placed near one another’ (Deerwester et al. 1990, p. 391). The role
of singular value decomposition in this case is to reduce the term-document matrix to
focus on major associative patterns in the data, while ignoring less important
ones. So, what LSA does, is that it extracts the conceptual content of a body of
text through mathematically establishing associations between words or concepts
that occur in similar contexts.
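
A minimal sketch of this idea, assuming scikit-learn and a handful of invented documents, might look as follows; the number of latent dimensions is arbitrary here:

    # A minimal LSA sketch: TF-IDF weighting followed by singular value decomposition.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    documents = [
        "the climate crisis demands urgent action",
        "global warming is a climate emergency",
        "the match ended in a goalless draw",
        "the referee stopped the match early",
    ]

    X = TfidfVectorizer().fit_transform(documents)  # term-document matrix
    svd = TruncatedSVD(n_components=2)              # singular value decomposition
    doc_topics = svd.fit_transform(X)               # documents placed in a 'semantic' space

    print(doc_topics)       # nearby rows correspond to documents about similar things
    print(svd.components_)  # loadings of each term on each latent dimension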

The topic modelling method of LDA (Latent Dirichlet Allocation), which has been
popular in recent years in text analysis and the digital humanities (Blei, 2012),
and which was used as part of the design in the previous chapter, builds on a
similar logic. Both LSA and LDA are aimed at finding the underlying, or hidden,
thematic structure in large text collections. Based on the results of such methods,
there is then the possibility of summarising, visualising, exploring, and
theorising about the corpus in question (Blei, 2012). So, for the purpose of what
we are doing in this book, methods like these are very fitting. They are data-
intensive approaches not commonly used in fields like sociology, which at the same
time produce results that can form the basis of theorisation.

By contrast with LSA, which is a count-based model, LDA is what is called a
probabilistic model. This means that it works with probabilities. It takes a text
collection as its input, then discovers a set of recurring themes (groups of terms)
in the collection – these themes are ‘topics’ in the vocabulary of ‘topic
modelling’ – and then specifies the probability by which terms occur in each topic.
Consequently, the researcher can also see to what degree each document exhibits
those topics (Blei, 2012). Technically, a topic model then is an imaginary recipe
for how to construct the corpus under analysis. It specifies post-hoc instructions
for how the specific body of text that was analysed was composed, even if that
composition was not consciously made.

Both of these are mathematical ways of finding the structure in large text data. As
mentioned above, the analysis in this chapter will use a method called word2vec,
which serves similar purposes to LDA. So, we will use word2vec, but for the purposes
of this book, we could just as well have chosen LSA, LDA, or any other
distributional semantics and word embedding method. The point being, from the data
theory perspective, that we have some form of fairly good strategy to detect
structure in masses of text that we would not be able to read and interpret in
their entirety. And even if we were able to read it all, there may be a point to
not doing so, at least initially, as the mathematical models may bring out
structure that we would not otherwise have detected.

Learning from afar

As the case study for this chapter, we employ word2vec to study the use of language
on Reddit in 2018 as a whole. As such, this is an example of the type of text
mining that researchers can engage in to be able to exploratively identify general
patterns in a corpus, which they would be unlikely to find by interacting manually
with the documents one at a time. A strategy such as this one, as opposed to close
reading of the text, can be seen from a more interpretive perspective as a ‘distant
reading’, where ‘unreadably large amounts’ of text are approached by ‘finding
numerical abstractions that can reveal qualities and patterns within those texts’
(Liddle, 2012, p. 230). Literary scholar Franco Moretti (2013) coined the term,
arguing that there is an analytical point in not close reading texts, since doing
so draws attention away from the more general patterns on which he thinks research
should focus:

The trouble with close reading […] is that it necessarily depends on an extremely
small canon […] You invest so much in individual texts only if you think that very
few of them really matter. Otherwise, it doesn’t make sense […] What we really need
is a little pact with the devil: we know how to read texts, now let’s learn how not
to read them. Distant reading: where distance […] is a condition of knowledge: it
allows you to focus on units that are much smaller or much larger than the text:
devices, themes, tropes – or genres and systems. And if, between the very small and
the very large, the text itself disappears, well, it is one of those cases when one
can justifiably say, less is more. If we want to understand the system in its
entirety, we must accept losing something. We always pay a price for theoretical
knowledge: reality is infinitely rich; concepts are abstract, are poor. But it’s
precisely this ‘poverty’ that makes it possible to handle them, and therefore to
know.

(Moretti, 2013, pp. 48–9)

By comparison with more conventionally 'qualitative' approaches, distant reading
demands that the researcher is prepared to move away from conventional close
reading in order to be able to grasp larger sets of data, and also to lose some
degree of ‘qualitative’ detail because of this. But, importantly, a distant reading
does not make the element of ‘qualitative’ interpretation any less relevant, as the
patterns we find still need to be contextualised and understood. However, while it
comes from the field of computational literary studies, the idea of distant reading
is not alien to what data science wants to achieve. As explained by data scientists
John Kelleher and Brendan Tierney:

If a human expert can easily create a pattern in his or her own mind, it is
generally not worth the time and effort of using data science to ‘discover’ it. In
general, data science becomes useful when we have a large number of data examples
and when the patterns are too complex for humans to discover and extract manually.

(Kelleher and Tierney, 2018, p. 4)

So, a word embedding model – even though it is a mathematical product – can be used
to perform the distant reading and discover patterns.

Technically speaking, rather than being a probabilistic model, word2vec is a
prediction-based model. The model architecture for word2vec was developed by Tomas
Mikolov and his colleagues at Google, and has been very popular since its
publication (Mikolov et al., 2013). It uses a neural network model called skip-gram
which takes every word in a large corpus, and for each such word notes which words
surround it within a defined ‘window’, for example five words ahead and five words
behind the word in focus (which is the window-size used for the analysis in this
chapter). This information is fed into a so-called neural network, which, after
some training, will be able to predict the probability of words appearing in the
window around any given focus word. A neural network is a group of artificial
‘neurons’ which use some form of mathematical or computational model to process
information in a multilayered fashion. They are based on a connectionist approach
to data modelling that is influenced by how actual brains and neurons handle
information in a non-linear way.
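
As a rough, illustrative sketch of what the skip-gram set-up feeds into such a network, the snippet below collects (focus word, context word) pairs within a symmetric window of five words, the window-size mentioned above; the sentence is invented:

    # A minimal sketch of collecting skip-gram training pairs with a window of 5 words.
    sentence = "the front page of the internet is full of memes and in jokes".split()
    WINDOW = 5

    training_pairs = []
    for i, focus in enumerate(sentence):
        context = sentence[max(0, i - WINDOW):i] + sentence[i + 1:i + 1 + WINDOW]
        for ctx in context:
            training_pairs.append((focus, ctx))  # (input word, word to be predicted)

    print(training_pairs[:12])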

When talking of how a neural network is 'trained' to be able to predict word
contexts, this refers to training in the context of machine learning, which is a
branch of artificial intelligence (AI). The machine learning approach assumes that
systems can identify and learn patterns, and even make decisions, based on data
with a minimal level of human intervention. In the case of word2vec, the machine is
trained to reconstruct the linguistic contexts of words. Machine learning is also
employed in many other fields apart from text analysis. As explained by Müller and
Guido (2016, p. 1):

Machine learning is about extracting knowledge from data. It is a research field at
the intersection of statistics, artificial intelligence, and computer science and
is also known as predictive analytics or statistical learning. The application of
machine learning methods has in recent years become ubiquitous in everyday life.
From automatic recommendations of which movies to watch, to what food to order or
which products to buy, to personalized online radio and recognizing your friends in
your photos, many modern websites and devices have machine learning algorithms at
their core. When you look at a complex website like Facebook, Amazon, or Netflix,
it is very likely that every part of the site contains multiple machine learning
models. Outside of commercial applications, machine learning has had a tremendous
influence on the way data-driven research is done today.

So, as discussed in previous chapters, this is an example of a method which in
itself is part of the architecture of sociality in today's world, while it at the
same time can be repurposed to be a tool for critical social analysis. Machine
learning models are embedded in the very architecture of society, ranging from
political propaganda and capitalist marketing, to governmental decision-making and
social media friend-recommendations. As such they are prime expressions of
datafication, because they illustrate how flows of data, and the ways in which they
are appropriated, have become a key social force. This is in terms of how data are
registered and owned, as well as of how data form the input for automated systems
with societal repercussions. But nonetheless, we can follow such societal
methodologies and try to harness them as tools for garnering insights that can be
analysed from a sociological standpoint.

Reading Reddit

The object of analysis in this chapter is the language of Reddit, which is an
American website hosting close to 1.5 million user-created subcommunities
(so-called 'subreddits') covering a broad range of interests and topics that include
anything from news, memes, and fan forums, to a very wide variety of niche topics.
Reddit is commonly categorised as being focused on social news aggregation and web
content rating, which has to do with the fact that the prominence of content on the
site is decided through community voting. This rating system works according to a
model where users, commonly referred to as ‘redditors’, post content to the site in
the shape of links, text posts, images, or videos. Posts are made to subreddits,
which are named according to the convention of putting an ‘r/’ ahead of the
subreddit name, as in ‘r/news’ or ‘r/relationship_advice’.

Once a piece of content has been posted, other redditors can upvote or downvote the
post by clicking an upward or downward facing arrow icon respectively. Posts that
receive more upvotes than others will appear closer to the top of the subreddit
page, and posts that are very highly upvoted may reach the front page of Reddit,
which also proclaims itself to be ‘the front page of the internet’. For each
registered user, the front page will show a combination of the highest-rating posts
from all of the subreddits that the user is subscribed to. But while Reddit has
this bulletin board and rating system at its core, large amounts of social activity
are going on through the threaded commenting function that it allows. Depending on
the nature of the subreddit, Reddit therefore often functions more like a
discussion forum than a social news aggregator. With more users than Twitter, and
higher rates of engagement (Hutchinson, 2018), it is fair to say that Reddit is one
of the largest communities on the internet.

The site was founded by two University of Virginia students, Steve Huffman and
Alexis Ohanian, in 2005 but was acquired by American mass media company Condé Nast
in 2006. Since 2011, Reddit has been run as an independent subsidiary of Advance
Publications, which is Condé Nast’s parent company. In early 2019, the site had
close to 550 million monthly visitors (234 million unique users) (Wikipedia,
2019e), and ranked as one of the most popular sites on the web, especially in the
English-speaking parts of the world (Alexa, 2019). The site garnered far-reaching
publicity, and entered the consciousness of the mainstream through events such as then
President Barack Obama taking part in the site’s series ‘Ask Me Anything’ (Reddit,
2012), the vigilante misidentification of a perpetrator of the Boston Marathon
Bombings (Reddit, 2013), and the controversy surrounding leaked nude celebrity
photos being posted on the site, which intensified debates about how the site was
administrated (Massanari, 2017; Reddit, 2015).

As is well-documented in the entry about Reddit on Wikipedia (2019e), and in a
number of academic papers, the site is known to promote a certain kind of culture
and community. It has been referred to as ‘offbeat, quirky, and anti-establishment’
(Spector, 2017), and as being ‘a self-correcting marketplace of ideas that’s nearly
impervious to marketers’ (Silverman, 2012). It has a certain social power as its
architecture allows for disseminating content to a huge audience, while at the same
time serving extremely niche purposes through the large number of specialised
subreddits. Because of such affordances, Reddit has been used to raise attention
and foster discussion in a large number of different areas, and has also been used
for various fundraising and activist initiatives. In 2017, the March for Science in
response to Trump’s climate change denial rhetoric, and specifically to the
deletion from the White House website of all references to climate change, was
initiated through a discussion on Reddit. The dynamics of the site ensured that, as
one report put it, 'the movement took off like subatomic particles in the Large
Hadron Collider' (Foley, 2017). In addition to
this, users of Reddit have engaged repeatedly in defending internet privacy and net
neutrality.

In spite of the wide range of topics, and the large user base, when inspected more
closely Reddit’s community has a prominently male (two thirds), young, urban, and
American bias (Alexa, 2019; Duggan and Smith, 2013). This underlines that even
though Reddit is a very large and popular site, its content cannot in any
unproblematic way be assumed to reflect the interests and views of internet users
in general. It indeed attracts some users more than others. It is, however, a prime
example of the type of internet culture that was discussed in Chapter 2: when
Greenpeace in 2007 invited web users to name a humpback whale that they were
tracking, Reddit users mass-voted to name it 'Mr Splashy Pants' (Reddit, 2014).
This prank was also encouraged by Reddit itself, as administrators changed the logo
of the site from its standard alien to a whale during the voting.

But Reddit culture is not only about community, cooperation, and activism. It is
also home to a range of exclusionary and hateful activities. Anonymity online can
indeed be a double-edged sword, as it disinhibits users. Psychologist John Suler,
writing about what he calls 'the online disinhibition effect', says that:

This disinhibition can work in two seemingly opposing directions. Sometimes people
share very personal things about themselves. They reveal secret emotions, fears,
wishes. They show unusual acts of kindness and generosity, sometimes going out of
their way to help others. We may call this benign disinhibition. However, the
disinhibition is not always so salutary. We witness rude language, harsh
criticisms, anger, hatred, even threats. Or people visit the dark underworld of the
Internet – places of pornography, crime, and violence – territory they would never
explore in the real world. We may call this toxic disinhibition.

(Suler, 2004, p. 321)


Registration to Reddit is free, and only requires the user to specify a username
and a password. No other personal details are required, and the site does not even
require the user to verify themselves via an email address. This kind of anonymity
is what enables the intimacy needed for exchanging various forms of peer-support,
and the sense of safety needed for people to be able to raise their voice and speak
up against prevailing powers. It is also this same anonymity that enables users to
deploy hate campaigns and to engage in criminal acts. It is also worth noting that
one does not need to have a registered user account to access the bulk of Reddit’s
content. While posting, voting, and commenting demand registration, simply
accessing the site as a read-only user does not.

Basically, a site like Reddit makes it possible for users to do all kinds of crazy
things, and such things – because people are what they are – will cut both ways.
According to its content policy, Reddit aspires to be ‘a platform for communities
to discuss, connect, and share in an open environment’, and to be ‘home to some of
the most authentic content anywhere online’. Furthermore, the content policy states
that ‘the nature of this content might be funny, serious, offensive, or anywhere in
between’, but that users should ‘show enough respect to others so that we all may
continue to enjoy Reddit for what it is’ (Reddit, 2019). The content policy
prohibits content that:

Is illegal

Is involuntary pornography

Is sexual or suggestive content involving minors

Encourages or incites violence

Threatens, harasses, or bullies or encourages others to do so

Is personal and confidential information

Impersonates someone in a misleading or deceptive manner

Uses Reddit to solicit or facilitate any transaction or gift involving certain goods and services

Is spam

And prohibits behaviour such as:

Asking for votes or engaging in vote manipulation

Breaking Reddit or doing anything that interferes with normal use of Reddit

Creating multiple accounts to evade punishment or avoid restrictions

As these rules are strict, while the platform's affordances in practice may appear
to encourage people to break at least a few of them, Reddit devotes considerable
resources to policing the rules and to content moderation. This, in turn, has
led to a series of controversies as the site has closed down a number of subreddits
over the years (Wikipedia, 2019a).

It is worth noting that since the introduction of user-founded subreddits in 2008,
an increasing share of content has been posted in these highly specialised spaces,
where users tailor their posts and responses to the eyes of the particular sub-
community that surrounds the subreddit in question. This development towards a
structural diversification of Reddit, where more and more discussion on the site
takes place in compartmentalised sub-settings, has been mapped by Philipp Singer et
al. (2014, p. 519), who show that there has been a huge increase on the site of
‘secluded spaces […] with their own, clear-cut rules’.

Despite this vast heterogeneity of content, however, there is still a tendency for
the site as a whole to express a fairly distinct Reddit culture.
This is most prominently expressed through its particular lingo. For newcomers, the
jumble of memes, irony, made-up words, shorthands, and acronyms takes some time to
learn and some training to master. Most subreddits are marked in one way or another
by ‘Reddit Speak’ (Koerber, 2014). The culture and language are also established
and propagated through fewer and fewer posts being links to content outside of the
site itself, as was most common in the early days, and more and more content
being in the form of ‘self-submissions’. As shown by Singer et al. (2014, p. 521),
‘Reddit has been experiencing an increasing, fundamental shift from “out-reference”
to more “self-reference”.’ Thus, while labelling itself as ‘the front page of the
internet’, the community aspect of the site, as well as its self-referentiality,
makes it interesting to study as a culture, however fragmented, in itself. As
formulated by Choudhury and De (2014, p. 71) it is an ‘online social system’.

Mapping the language of Reddit

The data for the case study in this chapter was retrieved from pushshift.io, which
is an open big data project that stores copies of all Reddit posts and comments.
The analysis to follow is based on all data for 2018, which corresponds to more
than 1.2 billion comments. A word2vec model was created, based on all of these
comments using the gensim package for the Python programming language (Rehurek and
Sojka, 2010). The workflow and code used to create the model is available at
github.com/simonlindgren/datatheory-reddit-w2v. With the model created, it can be
queried in a variety of ways, by posing questions, the responses to which help
reveal the site’s linguistic structure.
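
As a hedged sketch of what such model building and querying can look like with gensim (assuming a recent version of the package), the snippet below uses a tiny placeholder corpus rather than the 1.2 billion Reddit comments, and illustrative rather than actual parameter settings; the real workflow is in the repository cited above. The saved filename is hypothetical and is reused in later sketches in this chapter.

    # A minimal sketch of building and querying a word2vec model with gensim.
    from gensim.models import Word2Vec

    sentences = [
        ["cat", "kitten", "purr", "meow"],
        ["dog", "puppy", "bark", "fetch"],
        ["cat", "dog", "pet", "cute"],
    ]

    model = Word2Vec(
        sentences,
        vector_size=100,  # dimensionality of the word vectors
        window=5,         # context window, as described in the previous section
        min_count=1,      # keep even rare tokens in this toy example
        sg=1,             # use the skip-gram architecture
    )
    model.save("reddit-2018-w2v.model")  # hypothetical filename

    # Which words keep the same 'company' as a given word?
    print(model.wv.most_similar("cat", topn=5))
    print(model.wv.similarity("cat", "kitten"))  # cosine similarity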

So how can we approach, interpret, and try to understand the patterns of words that
we are able to reveal through a word2vec-model of Reddit? From a sociological point
of view, as argued by social psychologists Margaret Wetherell and Jonathan Potter
in their seminal book on Mapping the Language of Racism (Wetherell and Potter,
1992, p. 1), mapping the ways in which language is used more generally can be a
means to ‘charting themes and ideologies, exploring the heterogenous and layered
texture of practices, arguments and representations which make up the taken for
granted in a particular society’. By extension such things:

define culture and society; they specify what political actions will be seen as
legitimate and what will be seen as mere ‘trouble-making’; they lay out what counts
as social progress and how it can be impeded […] All of these things are mediated
through patterns of signification and representations.

(Wetherell and Potter, 1992, p. 1)


One way, then, of uncovering what is taken for granted in the language of Reddit
would be to see which words the model defines as being the most similar to a given
word. For example, looking up the word ‘terrorist’ in the Reddit model created for
this chapter's case study reveals that the word is most similar to a number of
words relating to islamist terrorism, particularly ‘isis’. This is illustrated by
how, in the Reddit model (see Figure 5.1) the words ‘jihadist’, ‘isis’ and
‘islamist’ have very high similarity scores in relation to ‘terrorist’. Comparing
this with the linguistic structure of a different social setting, such as the pre-
trained word2vec model of a large Google News corpus which has been provided by
Google themselves (Google, 2013), we generally find a similar pattern. It may not
be very surprising that the Google News model, which covers a longer period of time
(its exact scope is not well-documented), puts Al Qaeda (following 9/11 in 2001 and
the ensuing 'war on terror') in a more prominent position. The Reddit corpus, on
its part, reflects only 2018 discourse.

Figure 5.1 Words most similar to the word ‘terrorist’ in two social settings
(Reddit and Google News, cosine similarities)

A very preliminary reading of Figure 5.1 also reveals that Reddit possibly
discusses the notion of ‘terrorist’ in a broader sense that includes more names of
different potential terrorist affiliations, while the Google News corpus highlights
fewer such groups. One could definitely go on at length to identify many such
preliminary interpretations to validate further by looking at more words, and by
digging deeper into the texts with other methods. One, out of many possible other
reflections, could be that the Google News language may be more analytical, with
words such as ‘extremist’, ‘insurgent’, ‘antiterrorist’, ‘separatist’, ‘ideology’,
and ‘dissident’, while the Reddit model outputs more descriptive words (names of
groups), and only a couple of potentially more value-laden or analytical ones
(‘extremist’, ‘antigovernment’). The numbers listed in the figure represent the so-
called cosine similarity between the listed word and ‘terrorist’. This measure can
vary from -1 (when word vectors are diametrically opposed) to 1 (when word
vectors are identical) (Wikipedia, 2019b). For comparison, 'cat' and 'kitten' in the
Reddit model have a cosine similarity of 0.89, while 'ghost' and 'budgeting' have a
score of -0.48. It is important to note that these scores have to do with word
embeddings, not with lexical opposition; for example, 'love' and 'hate' have a
positive cosine similarity of 0.62, as they are often mentioned together. In
practice, however, high positive scores can be interpreted as meaning that the two words in
question have much in common in terms of how they are used (their structural
positions) in the corpus.
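
The measure itself is simple to compute; a minimal sketch with numpy (my choice of library, not prescribed by the text) is:

    # Cosine similarity: the cosine of the angle between two vectors.
    import numpy as np

    def cosine_similarity(a, b):
        # 1 = same direction, 0 = orthogonal, -1 = diametrically opposed
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0
    print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0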

But what is the social impact of such linguistic patterns? The analyses above are
still quite preliminary and off the cuff, but let us assume for the sake of
illustration that they make some sense. The analysis then would be that Reddit
covers terrorism (at least as assessed through the word ‘terrorist’) in broader
terms than what is common in news reporting generally (as assessed through the
Google News corpus). Furthermore, we could conclude that news reporting is more
problematising and analytical, while Reddit is more matter-of-fact or descriptive.
If we went on to solidify such interpretations by more detailed analyses, what
would it say? If we look at it through the perspective of the sociology of
knowledge, the main point of which is that ideas and language define society and
culture, while society and culture at the same time define ideas and language, we
can abstract the analysis to higher levels.

The sociology of knowledge has been defined and re-defined several times throughout
its history (Berger and Luckmann, 1966; Mannheim, [1929] 1954; McCarthy, 1996;
Scheler, 1924). Since its inception in German sociology in the 1920s, its core
ideas have, however, remained the same. The main point of the perspective is that
there is a need for a sociology focused on what Marx called the
superstructure of society. He wrote, as early as 1859, that while the material and
economic relations of production constitute ‘the real foundation’ of society, a
superstructure rises on top of it consisting of ‘definite forms of social
consciousness’ (Marx, [1859] 1904, p. 11). By this he referred to the ideas, world-
views, and terms by which reality is understood by people in society. He wrote
further that ‘it is not the consciousness of (people) that determines their
existence but, on the contrary, their social existence determines their
consciousness’ (Marx, [1859] 1904, pp. 11–12). In light of this, we can experiment
with another type of query that a word2vec model allows for, namely presenting the
model with a set of words and asking it to output which of the words fit the least
with the others.

A query for ‘apple’, ‘orange’, ‘windows’, ‘banana’, throws out ‘windows’, and
through cultural competence we know why this is. If we replace ‘banana’ with
‘amazon’, ‘orange’ becomes thrown out instead. This is obvious from a lexical point
of view, as the first example reveals the category of fruit, and the second one
reveals the category of digital brands. But it is important to remember here, that
the model was learned by a machine that was unsupervised in the sense that it had
no information about any natural language. So, the examples above can be seen as
validations of the model that confirm that it works. It has detected the categories
of fruit and digital brands, without knowing anything about those categories either beforehand
or now. The model simply knows that they belong together. Similarly, a query for
‘english’, ‘american’, ‘german’, and ‘french’ throws out ‘american’, revealing the
category of Europe. Replacing 'french' with 'syrian' instead drops 'syrian',
revealing another categorisation – possibly that of 'west' and 'east'. More
interesting results can be achieved, however, when asking the model for information
that we could not guess beforehand. It is an interesting starting point for
sociological analysis to see that ‘norway’ is excluded from ‘sweden’ and ‘denmark’,
or to analyse the differences between the last two examples in Figure 5.2 from the
perspective of musical genres, or gender, race, and politics.

Figure 5.2 Some examples of ‘doesnt_match’ queries (Reddit model).
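
In gensim, queries like those in Figure 5.2 can be posed through the doesnt_match method; a minimal sketch, assuming a previously trained and saved model (the filename is hypothetical, as in the earlier sketch), is:

    # 'Odd one out' queries against a trained word2vec model.
    from gensim.models import Word2Vec

    model = Word2Vec.load("reddit-2018-w2v.model")  # hypothetical filename

    print(model.wv.doesnt_match(["apple", "orange", "windows", "banana"]))
    print(model.wv.doesnt_match(["apple", "orange", "windows", "amazon"]))
    print(model.wv.doesnt_match(["english", "american", "german", "french"]))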

Examples like these illustrate that word embeddings, seen as proxies for how our
ways of thinking and talking are organised, have a social connection. Drawing on
that general idea, early proponents of a sociology of knowledge, such as Émile
Durkheim ([1912] 1995) and Max Scheler (1924), argued that a certain specialisation
within sociology must be focused on studying the origin, structure, and
consequences of various forms of collective consciousness. The key ideas within
such a sociology should be that ways of organising the understanding of the world
are socially founded, and that various belief systems, concepts, and ideas have an
inherent sociality that comes from the settings from which they spring. This view
has perhaps been most famously formulated by Berger and Luckmann in their 1966
book The Social Construction of Reality in which they wrote that ‘specific
agglomerations of “reality” and “knowledge” pertain to specific social contexts’,
and that ‘the sociology of knowledge must concern itself with whatever passes for
“knowledge” in a society, regardless of the ultimate validity or invalidity’
(Berger and Luckmann, 1966, p. 15).

But what, then, has this got to do with data science? Returning to Reddit, we can
consider it, for the purpose of the case study in this chapter, to be an example of
such a social setting where certain forms of knowledge are emergent and
proliferated. Studying such processes, and their resulting forms of collective
representations, was in fact quite difficult to do in the pre-datafied age. Either
one would have to rely on studying language-use that represented a select few, for
example by looking at the texts of authors or journalists, or by gaining access to
reading personal diaries or exchanges of letters between individuals. Or otherwise
one would have to design research that enabled monitoring and analysing
conversations between people in various social settings. But with a dataset such as
the one we have at hand here, alongside the word-embedding method that we have
used, new opportunities for (distant) reading acts of communication that are co-
constitutive of a collective consciousness arise. This gets closer to studying
sociality and language as it 'naturally' occurs, and to doing so based on the
contributions of a vast number of individuals that presumably have a much wider
variety of backgrounds. In the words of Mannheim, the case study of Reddit through
word2vec is a study of a ‘mental structure in its totality as it appears in
different currents of thought and (…) groups’ (Mannheim, [1929] 1954, p. 238).

To provide these data with a theoretical counterpoint, one place to look is
therefore indeed within the sociology of knowledge. Remembering that the sociology
of knowledge argues that ideas, knowledge, ideologies, and other ways of thinking,
develop within the framework of groups and institutions in society, Reddit is a
suitable case for analysis. In the digital and datafied society of the early
twenty-first century, it can certainly be construed as one such setting (group or
institution). By using, in this case, word2vec, we can thus gain an insight into
how Reddit – as a super-individual entity, where we look not at the language-use of
individuals but at language-use as aggregated across the billions of posts –
produces concepts and systems for organising social reality. Figure 5.3, for
example, shows a word network based on our word2vec model of Reddit where the
starting point was the word ‘feminism’. The 250 most similar words to that word
were retrieved, and in turn the 250 most similar words to each of those first 250
words were retrieved as well. Based on those data, a network graph was created
where connections between words were weighted according to their cosine
similarities. This means that in Figure 5.3, the width of the connecting lines
between words reflects their degree of similarity. The graph was laid out, and
filtered to show only the most prominent words and connections, using the Gephi
software (Bastian, Heymann, and Jacomy, 2009).

Figure 5.3 Most similar words to ‘feminism’ on Reddit 2018
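
A similarity network like the one in Figure 5.3 can be assembled in a few lines; the sketch below is a simplified version of the procedure described above, assuming a trained model as before (the filename is hypothetical), using a smaller number of neighbours than the 250 used for the figure, and exporting the graph for layout and filtering in Gephi:

    # A minimal sketch of building a word similarity network around a seed word.
    import networkx as nx
    from gensim.models import Word2Vec

    model = Word2Vec.load("reddit-2018-w2v.model")  # hypothetical filename

    G = nx.Graph()
    seed = "feminism"
    for word, sim in model.wv.most_similar(seed, topn=50):
        G.add_edge(seed, word, weight=sim)  # edge weight = cosine similarity
        for word2, sim2 in model.wv.most_similar(word, topn=50):
            G.add_edge(word, word2, weight=sim2)

    nx.write_gexf(G, "feminism-network.gexf")  # for layout and filtering in Gephi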

As a baseline of public discourse on feminism, it can be noted that in the
Google News model some of the most similar words to 'feminism' are:

‘womanist’, 0.633

‘postfeminist’, 0.629

‘patriarchy’, 0.620
‘postmodernism’, 0.612

‘femininity’, 0.602

‘intersectionality’, 0.595

‘essentialism’, 0.594

‘antifeminist’, 0.5909

‘liberalism’, 0.5906

‘womanhood’, 0.585

‘misogyny’, 0.576

‘motherhood’, 0.574

‘environmentalism’, 0.569

‘sexism’, 0.562

‘masculinities’, 0.560

‘heteronormativity’, 0.559

This reflects a general, and what might be said to be a rather ‘neutral’ position
on feminism, where its various objects and areas of contention and contestation are
revealed. It is about various forms of feminism (postmodern, liberal,
intersectional), about its counterparts (patriarchy, antifeminism, misogyny,
sexism), and about other forms of activism (environmentalism). But looking at
Figure 5.3, one can see that ‘feminism’, in that particular social setting, is
embedded in a different context where a more specialised, non-mainstream, way of
conceptualising it comes to the fore. Feminism, in the Reddit corpus, is connected
to ‘identity politics’ (‘idpol’), ‘identitarianism’, ‘Social Justice Warriors’
(‘sjwism’), and is generally positioned in a more dynamic and polemic linguistic
field among notions of ‘eugenics’, ‘manhating’, ‘antifamily’, ‘equalism’, ‘MRAs’
(Men’s Rights Activists), ‘brocialists’, and ‘misandrists’. Many of the words that
are used draw on a highly specialised vocabulary that can be identified as being
antifeminist. Of course, a conceptual map such as the one presented in Figure 5.3
enables and demands much more far-reaching forms of analysis than what can be done
here. But the illustrative point for now is that ‘feminism’ on Reddit – as compared
with several other settings – is positioned as something to be defined in more
precise and meticulous ways, and also as something to be countered or even
derogatorily questioned. In comparison with the Google News model, the Reddit model
also bears witness to feminism being largely embedded in a polemic setting here
('antimale'/'antifeminists', 'leftists'/'rightists', 'equalism', 'genderist',
‘antiwhite’, and so on). In general, this points in the direction, once more, of
Reddit being both broader and more detailed in scope, while at the same time
leaning towards an antifeminist and antileftist position. More specifically, such
patterns can be made the object of a critical social analysis of how knowledge
(ideas, world-views, ways of thinking and understanding reality) emerges out of, as well as is determined by, the social contexts and positions of those who speak that same knowledge.

This insight, which is of key importance for the sociology of knowledge, might seem
quite self-evident, as it may appear that there is no reason to believe that our
patterns of thinking and talking are universally given rather than contextually
contingent. But, as a matter of fact, when the sociology of knowledge was first
introduced, it pioneered a more pluralistic way of thinking about truth and
ideologies. It argued that what we believe ourselves to know will vary depending on
things such as culture, community, class, generation, and so on. This contextual
dependency of language and ideology has also been discussed by anthropologist Mary Douglas in her classic book Purity and Danger ([1966] 2001). Her argument, which
is particularly focused on notions of dirt versus cleanliness, is that such
categories are relative and contextually dependent. She writes that:

Shoes are not dirty in themselves, but it is dirty to place them on the dining-
table; food is not dirty in itself, but it is dirty to leave cooking utensils in
the bedroom, or food bespattered on clothing; similarly, bathroom equipment in the
drawing room; clothing lying on chairs; out-door things in-doors; upstairs things
downstairs; under-clothing appearing where over-clothing should be, and so on. In
short, our pollution behaviour is the reaction which condemns any object or idea
likely to confuse or contradict cherished classifications. (Douglas, [1966] 2001,
p. 37)

A similar logic goes for any linguistic or categorising social practice: things –
such as ‘feminism’ – may indeed exist as something other than social constructions,
but they do get their meanings through ‘cherished classifications’, which are in
turn socially dependent. And it is obvious from the examples above that Reddit
cherishes classifications other than Google News. Still, the main criticism of the
sociology of knowledge has concerned the fact that it risks being too relativistic.
If there is not any one single universal reality or truth, does that mean that anything goes, and that everything – climate change, rape, the Holocaust, or the colours of our skin – is 'simply' a social imaginary? Are such things open to being thought about in any way by anyone? The obvious response to such questions is that the
point is not about the existence of material reality. It is instead about
emphasising that any such reality only assumes meaning through social and cultural
categories and forms of knowledge. As foundational sociologist of knowledge Karl
Mannheim explained:

No one denies the possibility of empirical research nor does any one maintain that
facts do not exist. (Nothing seems more incorrect to us than an illusionist theory
of knowledge.) We, too appeal to ‘facts’ for our proof, but the question of the
nature of facts is in itself a considerable problem. They exist for the mind always
in an intellectual and social context. That they can be understood and formulated
implies already the existence of a conceptual apparatus. And if this conceptual
apparatus is the same for all the members of a group, the presuppositions (i.e. the
possible social and intellectual values), which underlie the individual concepts,
never become perceptible.

(Mannheim, [1929] 1954, p. 91)

When defining the discipline of sociology, German philosopher Max Scheler wrote
that ‘this science deals not with individual facts and events (in time – history)
but with rules, types (average types and logical-ideal types), and, where possible,
laws’ (Scheler, 1924, p. 33). By this, he meant that sociology should not be
concerned with describing individual phenomena, but with formulating theories about
the more general and emergent social patterns that arise from people’s interaction.
Scheler argued that 'the knowledge that the members of a group have of one another and the possibility of their mutual "understanding" is not something that is added to a social group' (Scheler, 1924, p. 67); rather, such common knowledge is constitutive of social groups as such. Similarly, discourse theorists Ernesto Laclau
and Chantal Mouffe, whose theories we shall return to later in this chapter, argue
that:

An earthquake or the falling of a brick is an event that certainly exists, in the
sense that it occurs here and now, independently of my will. But whether their
specificity as objects is constructed in terms of ‘natural phenomena’ or
‘expressions of the wrath of God’, depends upon the structuring of a discursive
field.

(Laclau and Mouffe, 1985, p. 108)

By this they mean that it is how language-use (the 'discursive') is structured in a given setting that decides what counts as 'true' and 'right' knowledge in that same context. What happens on a site like Reddit can then be understood as the emergence of a negotiated discursive structure, shaped partly consciously and partly unconsciously. It follows from this that a distant reading of a large number of posts
can help reveal the general structures – rules, types, or even laws – of its social
system. Drawing on our word2vec model (see Figure 5.3), one such rule or law could
be that the hamburger emoji has a very strong tendency to occur together with
the french fries emoji, as well as with the one for pizza. As far as emojis go,
however, the burger is relatively unlikely to come along with snow or money. As
would generally be expected based on prior general knowledge, the green apple belongs with the red apple (they are nearly the same), but is also similar to a pear (the shape is just a bit different) and to other popular fruits. Due to a process of emergent social convention, however, the peach (see
Figure 5.4) instead mobilises a set of sexually connoting emoji, those very
connotations relying on a different (internetty) cultural knowledge for their
decipherment. Obvious and banal as it may seem in such examples, the process as
such, when applied to more political dimensions of social life, is anything other
than that.

Figure 5.4 Some examples of similar and dissimilar emoji (Reddit model).

Scheler's idea is that patterns of such meaning-making and larger systems of
knowledge determine ‘the nature of society in all its possible aspects’ (Scheler,
1924, p. 65), while knowledge at the same time is determined by how culture and
society are structured. This is also a foundational thought for the perspective
that has been called social constructionism according to which there is not one
fixed way of construing reality, but rather a number of competing ways:

The sociology of knowledge has to reject flatly this traditional concept of an
absolutely constant natural view of the world. But it must introduce, instead, the
concept of the relative natural view of the world.

(Scheler, 1924, p. 74)

Scheler (1924, p. 74) argued that some versions of reality become seen as the truth within certain social groups 'without question', and that they therefore get vested with 'a givenness which is universally held and felt to be unneedy and incapable of justification'. But, he continued, these same truths, or understandings thereof,
‘can be entirely different for different groups and for the same groups during
various developmental stages’ (Scheler, 1924, p. 74).

Nodes and chains

Social constructionism, as defined by social psychologist Kenneth Gergen, aligns
with the view described above, as it asks us ‘to challenge the objective basis of
conventional knowledge’ (Gergen, 1985, p. 267). Similar to what I assume here when
approaching the language of Reddit, Gergen argues that ‘the terms in which the
world is understood are social artifacts, products of historically situated
interchanges between people’. Aligning with the view proposed in distributional
semantics, about words being defined by the company they keep, he writes that
‘terms acquire their meaning not from real-world referents but from their context
of usage’ (Gergen, 1985, p. 267). This is for example what happens when a concept
like ‘social justice’, in Reddit (and wider) discourse, is disconnected from its
previous positive connotations and turned into a derogatory term (Wikipedia, 2019f;
Urban Dictionary, 2019), or when the words ‘feminist’ and ‘nazi’ are commonly
(cosine similarity 0.67 in our Reddit model) pulled together to form the
politically charged word 'feminazi' – which is widely used, though not coined, on the site (Wikipedia, 2019c). Such semiotic and linguistic practices form
part of the construction of knowledge about social reality, and become component
parts in defining views on the world that are dominating in certain social
settings.
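
For readers who want to check such figures against a model of their own, the similarity between two specified words can be queried directly. A minimal sketch, again assuming a pre-trained Reddit model saved under a hypothetical file name:

from gensim.models import Word2Vec

model = Word2Vec.load('reddit_word2vec.model')  # assumed, hypothetical file name

# Cosine similarity between two specific words (compare the 0.67 reported above).
print(model.wv.similarity('feminist', 'nazi'))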

Following Scheler, the most famous definition of the sociology of knowledge was
formulated by Mannheim in his book Ideology and Utopia ([1929] 1954). He writes of
how the mental structure of society, or of a social group, appears in different
currents of thought that can be analysed. He uses the German term Weltanschauung
(‘world-view’) when describing the style of thought that designates ‘the outlook
inevitably associated with a given historical and social situation’ (Mannheim,
[1929] 1954, p. 111). When we look at Reddit through word2vec, we can thus be said
to look at its Weltanschauung. Mannheim wrote further that we must ‘take account of
the rootedness of knowledge in the social texture’ (Mannheim, [1929] 1954, p. 29).
In this example, then, one can conclude that Reddit discourse when it comes to gender – if not in all subreddits, then at least at the aggregate level – is both constituted by, and constitutive of, a broader, largely American, context of anti-feminist sentiment as expressed and catalysed through phenomena such as Gamergate
(Salter, 2018), anti-#MeToo sentiment, and misogynist troll culture which takes it
as its ‘duty’ to ‘silence the feminized other’ (Phillips, 2015, p. 126). More
generally, this has to do with the rise of the alt-right, as defined by Mike
Wendling:

The alt-right is an incredibly loose set of ideologies held together by what they
oppose: feminism, Islam, the Black Lives Matter movement, political correctness, a
fuzzy idea they call ‘globalism’, and establishment politics of both the left and
the right. It’s a movement that for most of its relatively short history has
existed almost entirely online and one which, despite its lack of organization,
formal political channels, official candidates or party membership, burst into
mainstream consciousness in 2016, in tandem with the Trump candidacy. With amazing
speed after his election victory, the term ‘alt-right’ transformed from an obscure
idea into a commonly used – if sometimes ill-defined – label.

(Wendling, 2018, p. 3)
So, with such added contextual knowledge, what we have measured and mapped with our word2vec model can be interpreted as the influence of alt-right rhetoric on the general language-use on Reddit, notwithstanding of course that many of its subreddits may well be based on quite different rhetoric, even an anti-alt-right
one. The takeaway point here, in keeping with the focus of this book, is that
computational methods can be used alongside critical theories and interpretive
approaches. When querying and exploring our word embedding model, what we are doing
can be construed as an analysis of the rootedness of forms of social and cultural
knowledge through the social texture of Reddit. In sum then, if we approach our
Reddit dataset from this perspective, looking at its linguistic and symbolic
structure, what we are in fact studying is a proxy for a form of collective
consciousness. As Mannheim wrote:

Only in a quite limited sense does the single individual create out of
[her]/himself the mode of speech and thought we attribute to [her]/him. [S]he
speaks the language of [her]/his group; [s]he thinks in the manner in which
[her]/his group thinks.

(Mannheim, [1929] 1954, p. 2)

This is also what Émile Durkheim and Marcel Mauss had written about earlier, saying
that we must study the various connections and divisions between things that ‘the
organization of the collective mind’ gives rise to (Durkheim and Mauss, [1903]
1963, p. 18). This is important to analyse, as influential signifying systems, such
as that of Reddit, contribute to forming social reality, and in turn politics,
economy, identities, and so on. In the case of alt-right discourse, Reddit has
functioned as a bridge from extreme and subcultural media, such as the forums 4chan
and 8chan, the latter self-proclaimed as ‘the Darkest Reaches of the Internet’,
into a more mainstream socio-linguistic space. This point is made by Wendling
(2018, p. 58), who writes that Reddit ‘does serve as one crossover point for the
movement into the mainstream – a bridge from 4chan and the hard-core alt-right to
the wider world’, and once on Reddit, the discourse ‘radiates out onto Twitter and
other social networks’ so that its ‘ideas, phrases and memes’ can ‘worm […] their
way into the public consciousness’. This process, identified by Wendling, can in
other words be studied at volume and in a certain kind of detail through
computational text analysis methods.

While the original sociology of knowledge was rather interested in studying social
structure as a generator of signifying systems, as in the base/superstructure
metaphor, a direction that has been called the 'new sociology of knowledge' has become increasingly interested in analysing language and other symbolic and signifying operations in particular. The 'new' sociologists of knowledge (McCarthy,
1996; Swidler and Arditi, 1994) have been more focused on applying a semiotic
perspective to how meanings are linguistically and symbolically communicated and
produced in society. Within this field, there is also a crossover towards critical
theories about the relationship between knowledge and power (Foucault, 1972), for
example towards discourse theories. Such perspectives can put focus on how the rhetoric that is prominent in influential social settings in a society gets real-life consequences through the reciprocal relationship between language and reality, and between knowledge and power.

One such theory, which can be combined with word2vec, is the discourse theory of Laclau and Mouffe (1985). While it is commonly categorised as a poststructural
theory, rather than as an expression of the sociology of knowledge, its key premise
that discourse (i.e. things that people say and do in social settings) constructs,
and is constructed by, our social world is the same. Being a poststructural theory,
it also emphasises the inherent instability of language, which means that truths
and world-views are never universally fixed. Rather, just like in the sociology of
knowledge, they are dependent on social, historical, and cultural contexts. A key
concept for Laclau and Mouffe comes from Gramsci’s writings about hegemony – the
unstable and contestable power over people’s minds that is held by society’s ruling
groups, or by others who somehow have the power to define what is ‘true’, and which
is the ‘right’ knowledge in a given socio-cultural space and time. In a key passage
of their book Hegemony and Socialist Strategy, Laclau and Mouffe define four key
concepts of their theory:

we will call articulation any practice establishing a relation among elements such
that their identity is modified as a result of the articulatory practice. The
structured totality resulting from the articulatory practice, we will call
discourse. The differential positions, insofar as they appear articulated within a
discourse, we will call moments. By contrast, we will call element any difference
that is not discursively articulated.

(Laclau and Mouffe, 1985, p. 105)

Starting with the concept of discourse, it refers to the general fixation of
meaning within a certain domain. So, approaching Reddit through word2vec and
thereby getting knowledge about how different words cluster together as a
consequence of people’s actual language-use on the site, is a means of mapping that
‘structured totality’. Furthermore, in this case, people’s actual language-use
would be what Laclau and Mouffe call articulation, because it is through this
practice that users of Reddit collaboratively, and at the aggregated meta-level,
establish relations among elements (i.e. words), engaging them as moments (i.e.
words including their relational positions vis-à-vis other words). The structured
totality of relational positions among discursive moments, as described by Laclau
and Mouffe, takes shape around a set of privileged signs around which many other
signs are organised. They name such key signs as nodal points (Laclau and Mouffe,
1985, p. 112). Visualising our Reddit word2vec model and its embeddings using the
TensorFlow framework (tensorflow, 2019), and overlaying the terminology of
discourse theory, the connection between word embeddings and discourse theory that
I have in mind can be illustrated.

In Figure 5.5, words have been plotted based on their cosine similarities using
Principal Component Analysis (PCA), which is an algorithm that ‘tries to capture as
much of the data variability in as few dimensions as possible’ (TensorFlow, 2019) –
basically, it makes the visualisation structured and readable. The figure shows a projection of the Reddit model onto its top three principal components, where words are represented by dots plotted in three dimensions based on similarity.

Figure 5.5 Visualisation of the Reddit model using TensorFlow’s embedding projector
(projector.tensorflow.org).
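
A comparable three-dimensional projection can also be produced outside of the TensorFlow projector. The sketch below is an assumption about how this could be done with scikit-learn, not the code behind Figure 5.5; note that the vocabulary attribute is named index_to_key in gensim 4.x and index2word in older versions.

import numpy as np
from sklearn.decomposition import PCA
from gensim.models import Word2Vec

model = Word2Vec.load('reddit_word2vec.model')  # assumed, hypothetical file name

# A manageable subset of the vocabulary (index_to_key in gensim 4.x).
words = model.wv.index_to_key[:2000]
vectors = np.array([model.wv[word] for word in words])

# Reduce the high-dimensional embeddings to their top three principal components.
coordinates = PCA(n_components=3).fit_transform(vectors)

for word, (x, y, z) in list(zip(words, coordinates))[:10]:
    print(f'{word}: ({x:.2f}, {y:.2f}, {z:.2f})')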

In Figure 5.6, this same model has been filtered and zoomed in with a focus on the
word ‘feminism’. Returning to Laclau and Mouffe’s terminology, the figure shows
what could be conceived as the articulation of ‘feminism’ in Reddit discourse
through a 3D visualisation. In this case, we conceive of ‘feminism’ as a nodal
point, around which a set of other discursive moments are relationally positioned.
This way of reasoning can easily get quite abstract, so it is important to remember
that what we are actually looking at is the aggregated result of myriad utterances
and interactions among users of Reddit, and at a slice of how reality is socially
constructed in this setting. It is notable here that Laclau and Mouffe are not
commonly perceived as sociologists of knowledge as they come into this from a
different direction, nor would they be likely to agree that their ‘elements’ are
equal to mere words, or that their theory would be compatible with machine
learning. But this, then, is the point of the hacking mentality that was discussed
in Chapter 1. In order to bring the advantages of interpretive, critical, social
theory closer to those of large-scale, unsupervised, data science techniques, we must be open to making such creative compromises, risking failure and error from some perspectives and success from others.

Figure 5.6 ‘Feminism’ on Reddit.

Once again, some preliminary reflections can be made about the character of Reddit
discourse on these issues. As we have seen in relation to Figure 5.3 above – which
was just another way of approaching the very same discursive formation – ‘feminism’
is dealt with in this social context in relation to a quite specialised vocabulary
which aligns with dominant forms of language-use in, and about, the alt-right ecosystem and its antifeminist parts, sometimes referred to as the manosphere: 'incel', 'blackpill/redpill', and 'mgtow' (Men Going Their Own Way) (see, for example, Hern, 2018; Wikipedia, 2019d; Williams, 2018). The manosphere, more
broadly, has been described by Alice Marwick and Robyn Caplan (2018) as the online
political and discursive space of the contemporary men’s rights movement, which has
a prominent antifeminist dimension. Marwick and Caplan define the manosphere as a
loosely knit online network, built around the notion of ‘misandry’, a term that
‘encapsulates a theory of feminism as intrinsically prejudicial and threatening
toward men, which provides justification for networked harassment of those
espousing feminist ideas’ (Marwick and Caplan, 2018, p. 544). They explain further
that:

While the manosphere includes a variety of groups, including MRAs (Men’s Rights
Activists), pickup artists, MGOW (men going their own way), incels (involuntary
celibates), father’s rights activists, and so forth, they share a central belief
that feminine values dominate society, that this fact is suppressed by feminists
and ‘political correctness’, and that men must fight back against an overreaching,
misandrist culture to protect their very existence.

(Marwick and Caplan, 2018, p. 546)

So, while the aim here is not to dig any deeper into this analysis, we can conclude
that the word embedding model provides support, richer linguistic context, and a
range of opportunities to pose further questions to the data when it comes to
analysing Reddit as an example of how ‘the internet has been key to the
popularization of men’s rights activism and discourse’ (Marwick and Caplan, 2018,
p. 546). As argued earlier, social settings like Reddit can function as a bridge
between a range of non-mainstream discourses and more legitimate spaces of
political language and practices. Therefore, they are important to study in terms
of how oppressive and hateful ideologies, in a number of areas, may be promoted as
well as counter-hegemonically contested. Such analyses have much to gain from a
combination of computational techniques such as the one used in this chapter to get
empirically validated insights into the data at a range of levels, and by slicing
it and zooming it in different ways, with interpretive techniques such as actually
reading the content of selected key parts of the corpus, and to analyse it
critically and in relation to social and cultural theory.

Another example of a potential connection between discourse theory and word
embeddings is through the theoretical concept of discursive equivalence. In Laclau
and Mouffe’s terminology, things in society (such as identities, or political
categories like ‘democracy’, ‘freedom’, and so on) get their meanings through
practices of ‘equivalential articulation’ with other meanings (Laclau and Mouffe,
1985, p. 116). This refers to the general process of connotation where, to take but
one very basic example, ‘man’ is articulated as equivalent to ‘strong’, and ‘woman’
is articulated as equivalent to ‘nurturing’. The discourse thus simplifies social
space by establishing chains of equivalence by which hegemonic articulations are
achieved (Laclau and Mouffe, 1985, p. 170). Starting from a word embedding model,
with similarity scores calculated between words, one can envision the idea of
grabbing a word in the jumble of the model and pulling it out like a thread, where the word is strung along with its most similar word, and in turn with the most similar word to that word, and so on. In Figure 5.7, the starting point is a potential nodal point in the shape of the word 'prejudice' in the Reddit word2vec model. The model was queried for the top five most similar words to 'prejudice', then for the top five most similar words to each of those five words, and so on, in four rounds. The equivalences and chained
relationships between words were then plotted and filtered using Gephi to show only
the strongest similarities, to get an image of what ‘prejudice’ is ‘equal’ to on
Reddit.

Figure 5.7 ‘Prejudice’ chain of equivalence on Reddit.
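
The recursive querying behind a figure like 5.7 can be sketched as follows – again with an assumed model file, and with the specifics (five neighbours, four rounds, GEXF export for Gephi) mirroring the description above rather than reproducing the exact code used.

import networkx as nx
from gensim.models import Word2Vec

model = Word2Vec.load('reddit_word2vec.model')  # assumed, hypothetical file name

graph = nx.Graph()
frontier = ['prejudice']  # the potential nodal point we start from

for _ in range(4):  # four rounds of expansion
    next_frontier = []
    for word in frontier:
        for neighbour, similarity in model.wv.most_similar(word, topn=5):
            graph.add_edge(word, neighbour, weight=similarity)
            next_frontier.append(neighbour)
    frontier = next_frontier

# Filtering to the strongest similarities and laying out the graph is done in Gephi.
nx.write_gexf(graph, 'prejudice_chain.gexf')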

It must be emphasised that this image, like all others shown in this chapter, must
not be read in the modernist sense as an untangling of the roots and branches of
prejudice as such. This is not about clear and hierarchical structures among
phenomena in the socio-cultural sense. The model is indeed structured, clear, and
hierarchical in the mathematical, or even linguistic sense, but it is not the
structure of society as such that we see revealed in these visualisations. What we
see is a proxy for the structure of discourse – a system of concepts and their
relations which is more fruitfully read as a rhizome. As defined by Gilles Deleuze
and Félix Guattari (1987, p. 6), rhizomes are heterogeneous networks that ‘shatter
the linear unity of the word, even of language’. Furthermore: ‘any point of a
rhizome can be connected to anything other, and must be. This is very different
from the tree or root, which plots a point, fixes an order’ (Deleuze and Guattari,
1987, p. 7). So, from a rhizomatic perspective, and as illustrated with the chain
of equivalence springing from the word ‘prejudice’ in Figure 5.7’s example, it may
always be possible to ‘break a language down into internal structural elements’
(Deleuze and Guattari, 1987, pp. 7–8), as if we were looking for roots and branches
– ‘there is always something genealogical about a tree’, but ‘it is not a method
for the people’ (Deleuze and Guattari, 1987, p. 8). For a linguistic scholar, and
for some data science applications, such as building recommendation systems and
making predictions, the strict tree of a word embedding model is of interest. For
interpretive and critical social and cultural analysis however, it must be read
differently as a fluid system of meanings, where Deleuze and Guattari's notion of 'assemblage' is a fitting metaphor. While that notion is useful in a number of analytical contexts, in our case the word embedding model (as a proxy for a
discourse) can be construed as an assemblage – a heterogeneous set of connections
which is constituted by, and constitutive of, social relations. Once we see the
identities and definitions emerging through language-use on Reddit, we can also
assume that ‘once an assemblage is in place it immediately starts acting as a
source of limitations and opportunities’ (DeLanda, 2016, p. 21). Because from the
perspective of social analysis, ‘a rule of grammar is a power marker before it is a
syntactical marker’ (Deleuze and Guattari, 1987, p. 76).

Figure 5.8 provides a hint towards how a similar approach to the one outlined above
could be used in relation to other nodal points such as ‘terrorism’ or ‘climate’.
The point here is that data-driven, unsupervised, machine learning models and
related methodological designs, that tend to be understood in most cases as being
merely descriptive, can be patched into a wider framework of critical theoretical
analysis in useful ways, if one only allows oneself to hack and experiment with
both data science and discourse analysis so that they can be unified within the
same analytical framework. It is common for discourse analysts to approach ‘texts’
(potentially also including computational models) from a constructionist
perspective as floating and contestable systems of meaning. At the same time it is
common for data scientists to measure things like sentiment, linguistic structure,
word similarities, and so on. It is the key argument of this book that there is
much to gain, in the way of achieving research results that are both theoretically
sensitised, critically analysed, and empirically validated, by bringing the two
together. This is possible as long as one is also prepared that the result will
neither be, in this case, ‘robust’ computational linguistics nor ‘real’ discourse
analysis. My argument is however that if we put such demands aside, in the name of
simply doing the best we can with the tools we have at hand, to get research
results that make sense, a whole new field of possibilities – beyond ‘qualitative’
and ‘quantitative’ – opens up.

Figure 5.8 ‘Terrorism’ and ‘climate’ on Reddit.

So, summing up, we can note the possibility of channelling data-driven,
descriptive, models into a framework of critical theoretical analysis. As discussed
in Chapter 1, and illustrated in this present chapter, this can be done through a
form of metaphorical hacking, in terms of (re)purposing social science theories and
methodologies in ways that will also feed into new questions and new readings of
the results of the computational methods. The empirical examples in this chapter
have only skimmed the surface. In actual research practice, studies could address
questions relating to things such as: Which subreddits? Which users? Which periods of time? What sentiments? Which frequencies and patterns in relation to prominent political events? And so on.

So, while computational text analysis techniques such as the ones discussed in this
chapter do not come bundled with any critical social or cultural theories, they can
still be combined with such perspectives. While a topic model or a word embedding
model simply shows groups of words that travel together in a text, perspectives
from the sociology of knowledge, social constructionism, and discourse theory can provide a framework for how the reasons for, and consequences of, words keeping a certain kind of company can be interpreted and critically analysed at the level of culture, society, and politics. Importantly, this is not only about naming the patterns with certain words, such as seeing a vector model as an 'assemblage' or cosine similarity scores as indicative of 'chains of equivalence'. Rather, it is
about approaching the data that we are subjecting to our distant reading in a way
that establishes the analytical connection between the numbers, vectors, and
metrics on the one hand and socio-cultural phenomena like politics, identities,
antagonisms, knowledge, and power, on the other.

Symbolic Power

While Chapter 4 showed an example of an analysis where a pre-selected theoretical
framework guided the application of computational methods, and while Chapter 5
turned the tables to have the data science techniques paving the way for applying a
certain interpretive framework, this chapter illustrates yet another way in which data/theory can be achieved. This is about deliberately adapting existing taxonomies and terminologies so that they work with the knowledge and measures that can be achieved with the dataset and techniques we have access to. In this chapter's case analysis, I will explore how a long-standing and established sociological theory – Pierre Bourdieu's theory of social fields, social capital, and social practice – can be adapted to help analyse power and influence through the kind of social media information that we have access to in a dataset of Swedish political tweets.

More specifically, this chapter uses a dataset of tweets posted in relation to the
2018 Swedish general election. This was an election where issues of immigration,
integration, and healthcare were high on the agenda, and through which – after a
period of uncertainty as to which candidate would get the parliament’s support in
forming a government – incumbent Prime Minister Stefan Löfven was re-elected after
his Social Democrat party struck an agreement with the Greens, the Liberals, and
the Centre Party. For the purpose of this chapter however, we are not interested in
the particular political and rhetorical content of the tweets posted during this
campaign, but rather in using the Twitter dataset to pose research questions about
influence on social media. For the case example to be presented here, a set of 1.7
million tweets matching the main Swedish election hashtag (#val2018) was collected
via Twitter’s Streaming API (Russell, 2018; Tweepy, 2009), between 9 April and 9
November 2018, which corresponds to five months prior to, and two months after,
election day (9 September). The aim of the following example analysis is to explore
some different dimensions of status and influence among the most prominent user
accounts within the discursive space constituted by this hashtag.
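
As a hedged illustration of what such a collection setup might look like, the sketch below uses the older Tweepy streaming interface (StreamListener), which was current around the time of collection; Tweepy's API has since changed, and the credentials and file name are placeholders rather than the actual configuration used.

import json
import tweepy

# Placeholder credentials, not real keys.
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

class HashtagListener(tweepy.StreamListener):
    """Append every matching tweet, as raw JSON, to a file on disk."""

    def on_status(self, status):
        with open('val2018_tweets.jsonl', 'a') as outfile:
            outfile.write(json.dumps(status._json) + '\n')

    def on_error(self, status_code):
        # Returning False on rate limiting (HTTP 420) disconnects the stream.
        return status_code != 420

stream = tweepy.Stream(auth=auth, listener=HashtagListener())
stream.filter(track=['#val2018'])  # the main Swedish election hashtag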

Analysing processes that relate to the forms of influence and status that are gained and ascribed by social media users, particularly in political settings, is of interest since relatively little is known about the more fine-grained hierarchical patterns within everyday political discussions online. Previous scholarship in this
field has mainly focused on how the internet and social media, in general, have
played a role in relation to delimited and often very specific political movements,
moments, and events. Social media users have also been seen as a relatively
homogenous and egalitarian group. Divides have mostly been analysed on a global or
societal level.

When we ask ourselves what 'people' are saying on social media about a given political issue, we risk forgetting that social media discourse – like that of traditional news media, if often more informally – is also reliant on gatekeepers, agenda setters, and 'editors'. It is of interest to get to know more about such
actors, but throughout the history of internet research, generalising collectivist
abstractions such as ‘the wisdom of crowds’ (Surowiecki, 2005), ‘collective
intelligence' (Lévy, 1999), and 'participatory culture' (Jenkins, 2006), have
contributed to concealing hierarchies and differences among speaking subjects
online. Social media interactions and discourse have been construed by many as
being non-hierarchical and without leaders (Castells, 2015), but in spite of this
it is rather clear that the content produced by different users has different
possibilities for reaching large audiences (Van Dijck and Poell, 2013). This will
inevitably lead to some users being more influential than others. Within the field
of internet research, authors have argued that the ‘crowdsourced elites’
(Papacharissi and de Fatima Oliveira, 2012) who gain recognition and status when
their content is shared, mentioned, and recognised as being important (Huffaker,
2010), must constantly (re)perform their positions to keep them (Bakardjieva, Felt,
and Dumitrica, 2018).

French sociologist Pierre Bourdieu developed his theory of social practice
throughout his life, and most prominently in the books Outline of a Theory of
Practice (1977), Distinction (1984), and The Logic of Practice (1992). At the
centre of this theory sits a formula which he formulated in quasi-mathematical
terms, even though he was mostly seen as an interpretive, ‘qualitative’,
anthropologist and sociologist of culture. The formula reads: ‘[(habitus) –
(capital)] + field = practice’ (Bourdieu, 1984, p. 118). Unravelling this equation
from left to right, the core idea is that theories are only theories, and that it
is only by empirically analysing social practice, the things that people think,
say, and do, that we as researchers can get a glimpse of the underlying components
and building-blocks of society and sociality. This is to say that no matter which
theoretical concepts we invent, we cannot study them directly. We cannot analyse
things like ‘power’, ‘gender’, ‘violence’, or ‘happiness’ as abstract constructs.
But we can use studies of social practice as a sounding board for such concepts,
devising the concepts in more precise ways through knowing things about social
practice, and getting better knowledge about social practice by using the concepts.
As argued by Bourdieu, this is how we can reveal ‘the unity hidden under the
diversity and multiplicity of the set of practices performed in fields governed by
different logics’ (Bourdieu, 1984, p. 118). In this chapter we will approach the
political tweeting that happens in our dataset as one such social practice to be
analysed. Returning to the formula that was mentioned above, ‘practice’ in turn
consists of a logic of social fields, and its interplay through habitus and capital
in conjunction. These key concepts will be discussed in more detail below.

Social fields

As explained in the quote from Bourdieu towards the end of the previous paragraph,
he sees social practice as something which happens in a number of different social
fields. Within the larger relational space of society, people do stuff (carry out
social practices) in a variety of different social arenas. Bourdieu’s notion of the
field designates distinct sub-spaces within the wider confines of society at a
global level. Basically, the social world houses an indefinite number of fields,
where each field is its own space of relations dedicated to some particular type of
activity or slice of the world. While the fields are separate and autonomous to
some extent, one of the key points in Bourdieu’s field theory is that there is also
a structural similarity – homology – between different fields. Fields from ‘the
most dissimilar areas of practice’ are ‘organized in accordance with structures of
opposition that are homologous to one another’, because they will all align with
some meta-level hierarchies of social life, such as those based in class, gender,
race, and other stratifying dimensions (Bourdieu, 1984, p. 175).

As 'field' is an analytical concept, rather than a word referring to any
unambiguous social phenomenon, it is up to the researcher at any given point to
decide what constitutes the field in question. The analysis could focus on fields
at a number of different levels or magnitudes, such as the field of international
politics, the field of vegan subculture in south England, the field of economic
journalism, or the field of a particular local sports club in Sweden. Bourdieusian
fields are defined as social contexts – institutions, discourses, roles,
hierarchies, regulations, and customs – which authorise, generate, and transform
the attitudes and behaviours of those people who engage in the field. It is an
important point here that while such fields define the perimeter as well as the
rules for certain forms of social practice, the fields themselves are at the same
time created and constantly reshaped by the often conflictual social activity that
happens within them. This means that social fields, similar to the actor-networks
discussed in Chapter 4, should be seen as dynamic systems of interactions, rather
than just inventories of static elements. This idea of rule-based but fluid fields
developed by a cultural sociologist fits well with this book’s mission to bring
such theories together with the mathematical logic of data science. As explained by
Hilgers and Mangez (2015, p. 2):

When Bourdieu was developing his theory, the concept of the field was already in
common use in other disciplines. Physics, mathematics and psychology had devised
field theories with various degrees of systematicity. While the theory of fields
developed in sociology by Bourdieu was constructed in a relatively autonomous
fashion, it nonetheless shares a common epistemology with them.

In this chapter, I approach the 2018 dataset of Swedish election tweets as
reflective of a particular social field, within which parts of Swedish political
discourse play out on social media. Drawing on field theory, I assume that it
therefore has a specific legitimacy and functioning that can be approached and
interpreted from Bourdieu’s perspective.

Capital and habitus

Bourdieu developed his notion of habitus as a simultaneous critique of two opposing sociological positions. On the one hand, he wanted to avoid subjectivist perspectives on human activity that saw social actions and events as driven by the
self-sufficient agency of individuals. On the other hand, he also wanted to
critique approaches that were overly objectivist, and that saw the activity of
humans as the product of social structures beyond the control of discrete sole
actors. The solution presented by Bourdieu is that every individual bears within
them a habitus, a guiding principle, compass, scheme, or disposition, that has been
shaped by society.

So, while a person can act somewhat freely, the acts are always filtered through,
or shaped by, that person’s habitus, that expresses how the person in question
engages in practices. In other words, the habitus disposes the individual to
certain manners, activities, and perspectives that have in turn been socially
constituted. In Bourdieu’s words, habitus refers to ‘the durably installed
generative principle of regulated improvisations [which produces] practices’
(Bourdieu, 1977, p. 78). In other words, society has ‘installed’ scripts –
‘generative principles’ – in the individual, and social life is ‘improvised’ but
still in ways that are regulated by the intersection of social fields and habitus.
When people do things in social contexts, their habitus will modulate what they do,
and with which social effect (that is: how their behaviour is recognised by
others). The habitus of an individual is the product of a number of social factors
such as cultural history, class background, education, gender, race, and so on. As
explained by Webb, Schirato, and Danaher (2002, p. 37), we can act in a variety of
ways (improvisations are allowed), but our acts are still ‘always largely
determined – regulated – by where (and who) we have been in a culture’.

Returning to the case example for this chapter, we want to use social media data to
analyse forms of habitus among influential individuals in the social practice of
the social field of the 2018 Swedish election discourse on Twitter. And, looking at
Bourdieu’s formula – [(habitus) (capital)] + field = practice – once again, the
last element that remains is ‘capital’. Simply put, capital is what the habitus
consists of. The habitus reflects the amount, as well as the particular
composition, of capital that an individual has. Bourdieu does not speak only of
capital as in monetary wealth, however, as his focus on so-called symbolic capital
allows for a set of other forms of resources to be included in the concept. In this
theory, the definition of capital is wide and can include material things (that are
valuable in a symbolic sense), as well as other intangible attributes like status,
authority, prestige, and so on. To Bourdieu (1977, p. 178), capital refers to:

all the goods, material and symbolic, without distinction, that present themselves
as rare and worthy of being sought after in a particular social formation – ‘fair
words’ or smiles, handshakes or shrugs, compliments or attention, challenges or
insults, honour or honours, powers or pleasures, gossip or scientific information,
distinction or distinctions, etc.

Most famously, Bourdieu discussed cultural capital and social capital as being of
equal importance to economic capital. From the perspective of his theory, the
amount of power that an individual has within a certain social field will depend on
what position that individual holds in that field. In turn, that position will be
largely decided by the individual’s habitus, which in turn is the result of the
amount and composition of capital forms – symbolic resources – that the individual
has access to, and how those resources are recognised as valuable or important by
others in the field. In turn, those in power will be able to impact future
modifications of the norms or rules in that field.

Bourdieu’s theory is potentially very useful for studying how various social actors
assume different positions of power within the social order. Not only does the
theory state that the more social resources one has, the more powerful the
position. It also emphasises that different forms and combinations of resources
will be effective to varying degrees depending on the social setting. Having the
‘right’ habitus for a particular field gives the individual in question a ‘feel for
the game’ by which she will ‘just know’ how to navigate successfully in that social
environment. Doing an analysis of – as in the case of this chapter – a social field
of political tweeting can therefore not only reveal which individuals are more
influential than others, but will also contribute to mapping out the ‘rules of the
game’. Bourdieu famously said that sociology is a ‘combat sport’ that should work
to critically take on and expose such underlying structures of social life (Carles,
2001).

Adapting Bourdieu's capital forms to social media data

We move on now to the transformation and adaptation of this theory to the case of
studying influence and power in political tweeting. In order to be able to make use
of the Bourdieusian idea of fields, habitus, and capital in analysing social
practice, we need to define as well as operationalise the forms of capital that we
will measure. This process is reliant on a combined consideration of what type of
information we are looking for, and to what degree the data we have are able to
give us that information. In the case of our Twitter dataset, we have access to
data about all individual tweets (such as the date and time that they were posted,
by whom, what words they included, and so on), as well as about all active users
(such as their user names, number of followers, number of favourite markings
received, etc.). For our case analysis, let us now try to emulate something akin to
the three types of symbolic capital that are central to Bourdieu’s writing, namely
economic, social, and cultural capital.

Economic capital, for Bourdieu, refers to material wealth – a resource that is
‘relatively stable’ (Bourdieu, 1977, p. 66). He writes, however, that:

Wealth, the ultimate basis of power, can exert power, and exert it durably, only in
the form of symbolic capital; in other words, economic capital can be accumulated
only in the form of symbolic capital, the unrecognizable, and hence socially
recognizable, form of the other kinds of capital.

(Bourdieu, 1977, p. 195)

This means that the tangible resources become powerful in social fields only in
their symbolic form, such as when the wealth or consumption pattern of someone is
recognised in society as something which increases the status of that individual.
In the context of our dataset of political tweets, we have no systematic
information about the economic and material resources of the users that are
active in the Swedish political setting. There is an opportunity, however, to map
other forms of ‘relatively stable’ and measurable ‘tangible’ resources that may
function akin to a currency under the conditions of social media. For this case
example, I choose to define the number of followers that a particular Twitter
account has as such a resource. Indeed, the follower count bears resemblance to
material wealth. On the internet today, followers can be bought and sold, and
social media influencers get jobs and brand deals based on their follower count,
and so on. So, for this example, our 'economic capital' (in quotation marks) is
operationalised in terms of the follower count of users. Technically speaking, a
variable was defined in the dataset reflecting the maximum number of followers held
by each user during the period of time that the data spans. In Bourdieu’s terms,
this can potentially be seen as a resource that can transmute into symbolic capital
under given circumstances.
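
Deriving such a per-user variable from tweet-level data is a simple aggregation step. A minimal pandas sketch, assuming the tweets have been flattened into a table with hypothetical 'screen_name' and 'followers_count' columns, might look like this:

import pandas as pd

# Assumed: a flattened table of tweets with one row per tweet.
tweets = pd.read_csv('val2018_tweets.csv')

# 'Economic capital': the maximum follower count observed per user
# during the period covered by the dataset.
economic_capital = tweets.groupby('screen_name')['followers_count'].max()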

Social capital refers to the individual's 'possession of a durable network of more
or less institutionalized relationships of mutual acquaintance and recognition’
(Bourdieu, 1986, p. 248). The logic behind this is that social groups possess a
collectively owned capital that entitles all members of the group to credit. In
Bourdieu’s writing, social capital is connected to things such as belonging to a
social class, the name of a family or a school, a corporation or party, and so on.
Bourdieu (1986, p. 249) explains that: ‘The volume of the social capital possessed
by a given agent thus depends on the size of the network of connections he can
effectively mobilize.’ The notion of social capital therefore sits particularly
well with the world of social media, where such connections are explicitly and
continuously made, unmade, enacted, maintained, and reinforced. Social media data
enable the empirical study of what Bourdieu (1986, p. 249) saw as the ‘symbolic
exchanges, the establishment and maintenance of which presuppose re-acknowledgement
of proximity’. The online economy of attention, likes, shares, and retweets – just
like Bourdieu’s fields more generally – presupposes that members of networks
constantly re-acknowledge their homogeneity, which ‘exerts a multiplier effect’ on
the capital that the individual possesses in her own right.

The reproduction of social capital presupposes an unceasing effort of sociability,
a continuous series of exchanges in which recognition is endlessly affirmed and
reaffirmed.

(Bourdieu, 1986, p. 250)

In the example analysis of Swedish political tweets, I have chosen to measure 'social capital' (in quotation marks) in terms of the network metric of betweenness. This is a measure of centrality that is based on the degree to which a
node in a network is placed on the shortest paths between other nodes, and thus has
the potential to control communication (Freeman, 1977). In other words, a node that has a high degree of betweenness is a node that a large number of other nodes have to go through in order to reach each other. Such nodes are hubs of communication. I
created a graph of connections between users based on the network of ‘friendships’
that can be garnered from their lists of followers, and calculated betweenness
scores based on that graph. The betweenness metric will be high for those users
that are followed by many others, who do not necessarily in turn follow each other.
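
A rough sketch of that calculation, assuming the follower relations have been gathered into a list of (follower, followee) pairs, is given below; networkx computes the metric directly, and for a graph of any real size the sampling parameter k is usually needed to keep it tractable.

import networkx as nx

# Assumed input: (follower, followee) pairs garnered from users' follower lists.
follower_edges = [('user_a', 'user_b'), ('user_a', 'user_c'), ('user_c', 'user_b')]

graph = nx.DiGraph()
graph.add_edges_from(follower_edges)

# 'Social capital': betweenness centrality, approximated by sampling k nodes.
social_capital = nx.betweenness_centrality(graph, k=min(500, len(graph)))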

Cultural capital, finally, refers mainly to the knowledge and intellectual skills of the individual that enable the achievement of higher social status. The concept as
described by Bourdieu is quite complex. It has been most commonly used, by him and
others, in discussions of the role of education and social background in relation
to social status. Cultural capital is a measure of how cultivated or ‘civilised’ a
person is, and is the result of a combination of ‘innate’ (i.e. socially inherited)
and acquired properties (Bourdieu, 1986, p. 245). In wider use, however, cultural
capital has to do with an individual’s ability to present and hold themselves in
subtle, partly unconscious, ways to get recognition. When it is the most powerful,
cultural capital is ‘unrecognized as capital and recognized as legitimate
competence’ (Bourdieu, 1986, p. 245).

In the context of social media, such cultural capital can potentially be measured
in a broad range of different ways. Cultural capital could for example be seen in
terms of the language-use in tweets (knowing the right things to say, and how to
say them), or in terms of favourite markings (being part of defining what type of
content ought to be liked), or in terms of being social media savvy (the degree to
which a user is able to employ affordances such as hashtags, links, audiovisual
content, memes, etc., in the appropriate fashion). For the case example in this chapter, I have chosen to operationalise 'cultural capital' (in quotation marks) in terms of the total number of retweets achieved by users included in the dataset.
This is because retweets, even though they far from always reflect pure
endorsements, reflect the importance and attention ascribed by others to the
content that has been posted. We can see the Twitter stream of a given user as a series of performances, and the retweets they get as a measure of the success, recognition, and influence of those performances. Bourdieu writes of how:

the continuum of infinitesimal differences between performances, produces sharp,
absolute, lasting differences, such as that which separates the last successful
candidate from the first unsuccessful one, and institutes an essential difference
between the officially recognized, guaranteed competence and simple cultural
capital, which is constantly required to prove itself. In this case, one sees
clearly the performative magic of the power of instituting, the power to show forth
and secure belief or, in a word, to impose recognition.

(Bourdieu, 1986, p. 248)

Accordingly, the number of retweets can be construed as a measure of how
‘officially recognised’ users are able to ‘impose recognition’.

A social space of political tweets

Having made the above intellectual effort of adapting Bourdieu's theory so that it can be operationalised, we are now in a position to use it to analyse the dataset of tweets. To do so, a matrix of all users was created with columns of data
reflecting ‘economic capital’, ‘social capital’, and ‘cultural capital’. In order
to be able to put these forms of capital together dynamically – as in habitus – the
different scores were normalised to follow a 0–1 scale. So, while for example the
number of followers (‘economic capital’) spanned the range of zero to several
hundred thousand, this was transformed so that the top account had the value of
1.0. The same operation was made for the other measures as well. This enables
visualising the field of political tweeting in relation to the 2018 Swedish
election as a three-dimensional space where user accounts are positioned in
relation to axes that represent the three forms of capital.
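
A minimal sketch of that normalisation step is given below; the account names and numbers are invented for illustration, standing in for the per-user measures of followers, betweenness, and retweets described above.

import pandas as pd

# Invented per-user measures, standing in for max follower count ('economic'),
# betweenness ('social'), and total retweets ('cultural') computed from the data.
capital = pd.DataFrame(
    {
        'economic_capital': [150000, 320, 9800],
        'social_capital': [0.004, 0.051, 0.019],
        'cultural_capital': [12000, 85, 640],
    },
    index=['newspaper_account', 'regular_user', 'politician'],
)

# Normalise each capital form to a 0-1 scale, so that the top account on each
# axis gets the value 1.0 and the three forms become comparable.
normalised = capital / capital.max()
print(normalised.round(2))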

Looking first at Figure 6.1, we see the social space with users plotted as circles,
based on their scores (0–1) on the three scales. The dark grey bar at the back of the space represents a user – an ideal type – that has a very high score on the z-axis ('social capital'), but very low scores on axes x and y, which represent the
other capital forms. From a Bourdieusian perspective, this is a user who can hold a
position of comparatively high status on the field in question, but who has a
habitus that is composed mainly of one form of capital. While such capital, like
monetary currencies, may be converted into other forms of capital, this example
user had not been able to do so at this particular point in time. The user account
represented by the dark grey bar is very well connected in the network of Swedish
political tweeting, meaning that it is strongly embedded in the network of
follower/followee-relationships that have formed in this setting. Let’s call this
connective capital.

Figure 6.1 Connective capital and engagement capital in the social field of Swedish
political tweets.

Looking further at Figure 6.1, the outstretched light grey box along the back left
wall of the space illustrates a user account (the circle in the top left corner of
the box) that stands out through its scores on the z-axis (‘social capital’) and
the x-axis (‘cultural capital’), while still having low scores on the y-axis
(‘economic capital’). So, aside from drawing on significant connective capital,
such a user also achieves status through being recognised and amplified through
retweets. We can call this resource engagement capital.

Moving on to Figure 6.2, the light grey box covering the back right wall of the
space, and also being a bit thicker than the shape of the engagement capital case
in Figure 6.1, represents a user account that has a high degree of ‘social
capital’, but also of ‘economic capital’. This means that this user assumes a
prestigious position in this social space on the basis of holding a high level of
the tangible and explicitly measurable resource of having a large following
(‘economic capital’). Let us call this resource attention capital.

Figure 6.2 Attention capital and symbolic capital in the social field of Swedish
political tweets.

To Bourdieu, 'symbolic capital' is the crucial form of capital, as it is the link
between the other capital forms and the habitus. The symbolic capital held by an
individual is the result of successful conversions of other forms of capital. It is
thus a form of meta capital that represents the parts of the other capitals that
are recognised in a given instance of social practice. As symbolic capital
represents the ‘collectively recognized credit’ ascribed to an individual
(Bourdieu, 1977, p. 41), one might explore the idea that the dark grey bar at the
back corner of the social space in Figure 6.2 is indicative of a user that has
strong symbolic capital – and a fitting habitus – for the field that we are
analysing. The example user account based on which the dark grey shape has been
drawn is one of the few users in the dataset that tend away from any extreme bias
in terms of which capital forms they possess. An imaginary ideal user would be
positioned as far away as possible from the floor level in the back corner of the
space. A user positioned at ceiling level at the point where the maximum levels of
axes y and x intersect would have the maximum amount of all three capital forms. In
reality, however, no existing user account nears this point. But a few users occupy
positions similar to the one illustrated by the dark bar in Figure 6.2, and would
thus in Bourdieusian terms be particularly well-placed to exert power and influence
within the field, as well as being able to take part in defining the rules of the
field as such.

A closer look at which user accounts are hidden behind the forms of habitus shown in Figures 6.1 and 6.2 points in some interesting directions. The ideal typical user with a high degree of connective capital
is neither a politician, nor a journalist or celebrity, but a seemingly regular
user, who is however deeply embedded in a highly specialised social media milieu
consisting of intensely active supporters of Swedish far-right movements and
parties. Furthermore, the ideal typical user that stands out in terms of engagement
capital is yet another individual from this same environment, but whose power to
engage others to share and circulate their posted content makes this person into an
underground informal leader of Swedish far-right tweeting. By contrast, the user
account commanding the standout level of attention capital (see Figure 6.2) is the
account of one of the largest mainstream Swedish newspapers.

Interestingly, the account with the strongest symbolic capital belongs to a person
who has connections both to the mainstream media, and to what might be labelled the
Swedish alternative right. This points towards preliminary research results that tell us something both about how social media influence more generally is constituted through the interplay of a variety of symbolic resources, and more specifically about the processes by which the far right have been able to claim substantial parts of social media discourse on politics in Sweden and elsewhere. The main point in the context of this book is, however, once more that we have been able to combine an interpretive perspective from social science with a data science approach. Other scholars have indeed also begun to explore
potential connections between Bourdieu’s perspective and computational methods. For
example, Roose, Roose, and Daenekindt (2018) have used LDA topic modelling to
analyse contemporary art discourse drawing on Bourdieu’s theories, and Murtagh
(2018) makes the case for a ‘Bourdieu-based science of data’, arguing that social
field structure can be studied using methods such as machine learning. More
generally, this also relates to a broader effort towards ‘quantifying Bourdieu’
(Robson and Sanders, 2009), where scholars have used statistical means such as
regression models, latent class analysis, and structural equation modelling to
respond to research questions raised through Bourdieu’s terminology.

In the conclusion of this book, I will engage in a more elaborate discussion about how and why we should allow ourselves to alter and modify theories in quite far-reaching ways, even though it may come across to some as doing the theory ‘wrong’. It is indeed counter-productive to see theories as fixed, especially in the face of the increased complexity of twenty-first-century sociality alongside the simultaneous datafication of social life and social research. As argued by
sociologist Omar Lizardo (2015, p. 3):

the conditions for doing theory today have changed so dramatically that it is
unlikely that the sort of ‘theory’ that self-identified ‘theory people’ will be
producing today will bear any resemblance to the sort of thing that was considered
theory (and which defined the prototype for the theory category) a generation
before.

Education researchers Costa and Murphy (2015, p. 195) address the need to combine
theories such as the ones formulated by Bourdieu with other, not necessarily self-
evident, perspectives in order to allow for the kinds of operationalisations that
we want to make. Bringing different theoretical approaches together makes novel
adaptations possible. They argue that there are clear benefits in adapting social theory so that it fits the research project, rather than the other way around. In
sum: ‘it pays not to be too concerned with doing Bourdieu “correctly”’ (Costa and
Murphy, 2015, p. 195).

Theoretical I/O

The aim in this final case-oriented chapter is to engage in a more concrete discussion about how a generic analytical strategy that combines interpretive
sociology with data science and computational methods can be developed. I do this
by returning to sociological methodologist Barney Glaser’s notion of theoretical
sensitivity, and make the case that his vision for the research process can be
translated into the age of datafication as a means of securing the presence of an
interpretive sensibility. I present an iterative, alternating model for switching
between machine and human, and between data and theory, in a structured approach
for how data (science) and (social) theory can be incorporated in empirical
research. The chapter also includes a concrete example of how the approach may work
in practice. This is in the form of a case study which involves Marx-oriented
critical theory, computational text analysis, and the empirical case of the
#deletefacebook movement on Twitter. The case example will be presented bit by bit,
interjected between the more abstract discussions of the analytical framework.

Gaining theoretical sensitivity

Sociologist Barney Glaser is first and foremost known for being an originator of
the so-called Grounded Theory research method – an approach that has been widely
adopted during the last half-century within the areas of social research that have
been labelled as ‘qualitative’. In brief, the method builds on an approach where
the researcher starts out quite theory-less (inductively), and then works through
empirical data systematically to discover patterns, and ‘ground’ (validate) them,
for a coherent ‘theory’ to finally emerge. Glaser formulated the method together
with co-author Anselm Strauss in the late 1960s book The Discovery of Grounded
Theory (1967). Their original text has more than 120,000 citations according to
Google Scholar, and has formed the basis for an entire scholarly speciality of
Grounded Theory research, as well as for different simplifications or revisions. In
writings following the initial book, Glaser and Strauss, the latter having passed
away in 1996, did not agree on all aspects of the method, and it has also been
developed in different directions in their own writings as well as in the writings
of others. The position of Grounded Theory in the ‘qualitative’ research community
has been strong, but also widely debated because of its maybe unrealistic belief in
a pure form of induction, and for the ways in which it claims to ‘discover’ theory.

I do not deal with Grounded Theory as a detailed methodological procedure in this book. I do, however, take great inspiration from its ideas about being simultaneously
involved in data work (collection and processing) and analysis (constructing
concepts and contrasting with theory). In particular, I draw on Glaser’s 1978 book
on Theoretical Sensitivity, where he introduces a very useful, and inspirational,
way of thinking about the relationship between data and theory in social research.
He wrote:

What is the relationship between data and theory? – a standard question in sociology. Two less frequent questions, though quite important, are: how do we get
those sociologists, who are afraid of it, into their data; and how do we get those
stuck in it, out of their data?

(Glaser, 1978, p. 15)

In the following, I will present a rebooted version of Glaser’s notion of theoretical sensitivity in developing a strategy for combining social theory with
data science. This involves a renegotiation that substitutes the more
‘qualitatively’ oriented steps of iterative data processing, advocated by the
original Grounded Theory method, with explorative and experimental computational
techniques. Note therefore once again, that even though Glaser is a key influence
for the approach that I am presenting, I do not claim, or even try, to do by-the-
book orthodox Grounded Theory research. Importantly however, revising and adapting
the method for different and particular uses is also encouraged by Glaser (1978, p.
ix), who explains that his approach ‘is not doctrine’, and that he ‘trusts that
readers can see other possibilities’ for doing their research influenced by the
approach. The same idea is expressed in the original book by Glaser and Strauss
(1967, p. 12) where they say that they wish to encourage ‘able sociologists’ to
‘start developing methods of their own for all of us to use’.

The first step in gaining theoretical sensitivity is to begin the research process
by having ‘as few predetermined ideas as possible – especially logically deducted, a priori hypotheses’ (Glaser, 1978, p. 3). The idea here is that the researcher must
try to be open to new discoveries when approaching the data. So, rather than
deciding on a theory, and then starting to look at the data through the particular
lens of that chosen theory, one should enter the field of research being as
unbiased as possible by any interpretive terminology. At the core of theoretical
sensitivity is the ambition to ‘remain open to what is actually happening’ (Glaser,
1978, p. 3). So, even if theories, both those we borrow from others, or repurpose,
and those that we invent ourselves, are key to the analysis as a whole, we should
start out open. And one way to do just that is to allow oneself to be strongly data-driven. It is worth noting here that while I started out in a somewhat more
theory-driven fashion in Chapters 4 and 6, and used a more data-driven approach in
Chapters 3 and 5, these were all for the sake of illustration, as the case-examples
were orchestrated as such. In actual research practice, both sides (data and
theory) are always there. Taking our cue from Glaser, however, the researcher
starting out on a project should be firmly grounded in the data, with any pre-existing theory temporarily placed in brackets.

Assuming that ‘the reality produced in research is more accurate than the theory
whose categories do not fit’ (Glaser, 1978, p. 4), any research process must start
from the general question about what is actually happening in the data. The
research process that follows means moving between theory and data. This can be
done in different ways, Glaser explains. One way is by pure deduction, that is by
using data to validate or falsify a pre-defined theory. Another way is to
unsystematically analyse lots of data, and to simply name the common-sense
impressions garnered along the way in theoretical language. Yet another strategy
for connecting theory to data is to systematically develop a few major categories
early in the analysis, and then proceed to simply describe them at length with the
help of the data. This means quickly deciding on a particular set of headings, and
then seeing everything else in terms of them. None of these ways, however, promotes
theoretical sensitivity, which comes from a ‘detailed grounding by systematically
analyzing data […] until a theory results’ (Glaser, 1978, p. 16). The analysis
should focus not on organising the data, but on organising the ideas that emerge
from it. Once one has arrived at a theoretical understanding, the data can be used
to illustrate the fitting theory – whether developed by others, modified, or newly
formulated.

But still, the idea is not to begin researching from the position of being a
complete novice. Starting out, the theory is within brackets, but not non-existent.
The researcher must be ‘steeped in’ literature, and thereby be familiar with a wide
variety of theories (Glaser, 1978, p. 3). It is by drawing on this capacity that
the researcher can then start to make sense of the empirical findings. Aligning
well with the overall objective, and title, of this present book then, Glaser’s
approach offers:

a perspective on both data and theory. It contends that there is much value in the
conceptualizing and conceptual ordering of research data into a body of theory.
This theoretical grasp of problems and processes within data is […] a very useful
way to understand what is going on in a substantive area and how to explain and
interpret it.

(Glaser, 1978, p. 3)

I suggest combining Glaser’s overarching ideas about theoretical sensitivity with a
range of other methods for achieving the systematic data analysis that he asks for.
In the context of this particular book, the aim of which is to bring computational
methods and social theory closer together, we turn particularly to data science
techniques. So, while the original rendition of Grounded Theory encourages a more
ethnographic data-drivenness, the approach that I am suggesting here goes beyond
the conventionally ethnographic to embrace all forms of data-drivenness – but still
within a theoretically interpretive framework.

Data science as ethnography

I wrote above that we should move beyond the conventionally ethnographic. By conventional ethnography, I am then referring to participant observation, in-depth interviews, and other forms of ‘qualitative’ fieldwork that are usually associated
with the concept. If we think in broader terms, however, and focus on what the general idea and goal of ethnography is – literally meaning the writing of culture/society – we see that society can be ‘written’ in many ways. And, yes, we
can even write (about) our world with the help of data science.

Even though the methods that I use in this book, which come from fields such as
text mining, computational linguistics, machine learning, and network analysis,
could pass as ‘quantitative’ by conventional measures, a stimulating methodological
point of reference here is however that of ‘netnography’ as defined by Kozinets
(2015, p. 3). He argues that beyond traditional categorisations of methods, the
study of sociality online must be about ‘intelligent adaptation’ and ‘considering
all options’. For Kozinets, the root must be in the core explorative, descriptive,
and interpretive principles of conventional ethnography, but netnography also seeks
to selectively and systematically seize ‘the possibilities of incorporating and
blending computational methods of data collection, analysis, word recognition,
coding and visualization’ (2015, p. 79). This is because notions of what
constitutes ‘the field’ or ‘the data’ are shifting in datafied settings, which means that computational techniques, too, can be (re-)construed as new forms of ‘ethnographic’ methods.

From that perspective, the rather episodic and offhand use – by regular measures –
of many of the methods in this book should be seen instead as elements in achieving
a methodological bricolage. As explained by Denzin and Lincoln (2005, p. 6), the
empirical endpoint is ‘a reflexive collage or montage, a set of fluid,
interconnected images and representations […] connecting the parts to the whole’.
This is also in line with Mayer-Schönberger and Cukier’s (2013) argument that
social research in the digital era, and its data environment, demands embracing ‘an ethos of inexactitude’, and a readiness to ‘accept messiness as par for the course, not as something we should try to eliminate’ (Mayer-Schönberger and Cukier, 2013, p. 44).
The different case studies in this book, for example, rely largely on social media
data of variable quality. While this does not sit well with conventional ideas
about sampling, bias, reliability, and robustness, if we accept the inherent
messiness of cross-platform digital social research, we will have ‘sacrificed the
accuracy of each data point for breadth, and in return we received detail that we
otherwise could not have seen’ (Mayer-Schönberger and Cukier, 2013, p. 44). As
argued by social researchers Colin Robson and Kieran McCartan (2016, p. 8), there
is a point in being ‘deliberately promiscuous’ in methodologically combining
techniques to arrive at strategies that are as appropriate as possible to the
particular research questions asked.

Messiness, indeed, is not always bad. Seen from the perspective of Geertz (1973),
it may even be a positive feature. For him, the ultimate goal of research is to
provide a thick description of the patterns, modes, and functions of social life
(see Chapter 1). The basic assumption on which this approach rests is that society
is ‘semiotic’ – it is made out of a complex set of symbols in the form of language,
traits, customs, gestures, attitudes, actions, and so on, which are webbed together
in systems ‘within which they can be intelligibly – that is, thickly – described’
(1973, p. 14). In other words, the project to cross-fertilise data science and social theory may be carried out through an ethnography focused on – in Weber’s ([1921] 1978: 4) terms – ‘the interpretive understanding of social action’, but one that uses computational methods to carry out parts of the ‘reading’.

Another classic anthropologist, Bronislaw Malinowski (1922, p. 7), said in a similar vein that ethnography should lay bare unknown social and cultural
principles that govern what previously seemed ‘chaotic and freakish’, ‘sensational,
wild and unaccountable’. This is also now the promise of data science, but its
computational techniques must be connected to an interpretive stance, and to social
theory. As put by González-Bailón (2017, p. 23), ‘everyday talk still articulates
our world’. Even in the age of big data, the social is still about people and
culture. This means that the research problems for data science can be surprisingly similar to those of anthropology and ethnography.

So, even though we employ computational tools from the field of data science that
were somehow developed in relation to ideals of positivist exactness, predictions,
and measurements, those very tools – in order to be of sociological use – must be
fitted into a wider interpretive framework. Looking back at what has been discussed
this far, I suggest that the data science tools are paired with an ethnographic
mindset (cf. Geertz) re-envisioned as part of a netnography (cf. Kozinets), which
is strongly data-driven, yet interpretive (cf. Glaser), and aims to feed its
findings forward, and interpret them, by maintaining a connection to social
theories – new, renegotiated, or invented.

Figure 7.1 is a visual representation of the data/theory relationship in the process of research and analysis, as I have gradually introduced, exemplified, and
developed it throughout this book. Generally, it shares sociologist Herbert
Blumer’s (1954, p. 3) vision of these relationships:

Theory, inquiry and empirical fact are interwoven in a texture of operation with theory guiding inquiry, inquiry seeking and isolating facts, and facts affecting theory. The fruitfulness of their interplay is the means by which an empirical science develops.

Figure 7.1 Theoretical I/O

As data and theory should be in continuous dialogue with each other, the figure is
illustrative of a relational system that can be explained in different orders and
directions, depending on where in the process one starts. This truly introduces a
pedagogical problem, as a book such as this one must be written in a linear way. I
simply must write about some things before others, and other things after others.
It cannot be stressed enough, however, that Figure 7.1 and the discussions and
examples that are related to it could be presented according to different
chronologies.

In practice, the researcher enters the field being simultaneously:

Theoretically sensitised (knowing about, and thinking through, theories).

Methodologically inclined (having a certain toolset to draw upon).

Empirically interested (wanting to know more about a particular thing).

For the sake of presentation, then, we can start here at the bottom of Figure 7.1,
and with data. The idea is that we have an empirical interest in some phenomenon or
real-life social activity, on which we have, or will gather, some kind of tangible
information. Research about social life and politics is indeed ‘real world
research’. In the end, we are interested in getting to know valuable things about
people’s lives, experiences, and the social systems in which they are set. We want
to understand ‘the lived-in reality of people in society and its consequences’
(Robson and McCartan, 2016, p. 3). Real-world research is research that ‘focuses on
problems and issues of direct relevance to people’s lives, to help find ways of
dealing with the problems or of better understanding the issues’ (Robson and
McCartan, 2016, p. 4). And, of course, in order to be able to carry out such
research, we need information, data, about social patterns in the part of reality
that we want to study. This slice of the world, that we are focusing the research
on at a given moment in time, is what is labelled as the empirical case in Figure
7.1.

Empirical interest: #deletefacebook

The real-life social activity that the empirical interest is focused on in this
case example is the anti-Facebook movement, and especially its increased online
activity and intensity following the data scandal of 2018. So, two of the crucial
research activities entail mapping out the picture of its social context, and
acquiring relevant data.

Context

In March of 2018, it became known that the British political consultancy firm
Cambridge Analytica had used personal information that was harvested from more than
50 million Facebook profiles. According to later reports, the number might have
been as high as 87 million. These data had been collected without permission, with
the aim of developing a system for mapping the psychological profile of users, and
targeting them with personalised political advertisements to try to affect their
voting behaviours. These details became known to the public through a
whistleblower, who claimed to have worked together with a professor at Cambridge
University to collect the data.

The data collection was done through the creation of an app called ‘This is your
digital life’, through which users could take a personality test. When installing
the app, participants had to sign in with their Facebook login, and also click to
agree that their data could be used for academic research. The app, however, went beyond its stated scope and scraped data not only from the test-takers but also from their friends, including private messages. The app thus exploited Facebook to
harvest data from the profiles of millions of people.

It was reported that Facebook had known about the data breach since late 2015, but
had only taken limited steps to counteract it. It was not until the revelations in
March of 2018 that Cambridge Analytica was suspended from the platform.

The whole affair further fuelled a long-standing suspicion among the general public
that social media surveillance could be secretly used for political ends, and gave
rise to intense debate throughout a range of countries and social arenas. Facebook,
through its chief executive Mark Zuckerberg, made apologies – in television interviews, on
social media, and through full-page print adverts – for the situation, and for not
having done more to amend it earlier. The company pledged to change its policy to
prevent similar things from happening in the future. In testimony before the US Congress, Zuckerberg admitted that it was his personal mistake not to have done enough to prevent Facebook from being used for harm. But the damage was already done. Cambridge Analytica was widely believed to have played an important role in the outcomes of the US Presidential Election in 2016, the UK Brexit vote in the
same year, as well as a large number of other elections around the world in recent
times.

The effectiveness of psychometric targeting and the precision with which it can be applied to voter behaviour have been questioned by many experts (Simon, 2018). But
regardless of this, the Cambridge Analytica data scandal left large numbers of
users feeling that Facebook had still messed up by not doing enough to protect the
integrity of their data. There was an increased awareness of the huge dangers to
democracy involved.

Facebook has been criticised and boycotted by both individuals and groups since its
early days in the mid-00s. This has been part of a broader sentiment against the
fact that alongside the participatory boost created by social media, ‘monopoly
power, commercialization, and commodification are on the rise as well, with just a
handful of social media platforms dominating the social web’ (Lovink, 2013, p. 10).
The commercial and monopolistic aspects alongside the question of privacy and
integrity have been the main targets of the criticism. Much of this goes for other
social media platforms, such as Twitter and Instagram (the latter now owned by
Facebook) as well, but due to its widespread use, Facebook has been the main target
of sceptics and agitators. There have been attempts to create decentralised and
idealistic alternatives such as diaspora* (Bielenberg et al., 2012), and the ‘anti-
Facebook’ platform Ello (Kleinman, 2014), and voices have been raised in repeated
calls to leave or boycott Facebook. Research has shown that concerns about privacy
and data misuse are among the top reasons that users choose to leave the platform
(Baumer et al., 2013). There have been waves of anti-Facebook opinion coming and
going. For example, there was a period in 2009 when Facebook refusal became a trend
(Karppi, 2011), and 2010 saw the ‘Quit Facebook Day’ campaign, which urged users to
sign an online pledge in which they committed to deleting their accounts in protest
at the platform’s exploitation of its users (Portwood-Stacer, 2013).

And so, in March of 2018, with the Cambridge Analytica revelations, the anti-
Facebook sentiments were propelled to new heights, especially through the use of
the #deletefacebook hashtag. This case of hashtag activism, while preceded by the
developments outlined above, marked the emergence of a new, broader, political
consciousness around the problems with Facebook.

Data

An empirical interest in the case of #deletefacebook formed the basis for collecting a dataset consisting of around 797,000 tweets, posted between 17 March
(when the news story about the scandal broke) and 17 April of 2018. This was done
through a strategy of querying Twitter’s Search and Streaming APIs for the hashtag in question and combining the results. The data was processed to remove retweets, as well as to keep only those tweets that were posted in the English language.
After those procedures were applied, the dataset consisted of approximately 294,000
tweets.
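As a minimal sketch of what the filtering step could look like in practice – assuming that the collected tweets have already been written to a JSON-lines file in the standard Twitter API format, with ‘text’ and ‘lang’ fields, and noting that the file name here is hypothetical – one might do something like the following.

import json

kept = []
with open("deletefacebook_raw.jsonl", encoding="utf-8") as infile:
    for line in infile:
        tweet = json.loads(line)
        # Retweets are identifiable by a 'retweeted_status' field or an 'RT @' prefix
        is_retweet = "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")
        if not is_retweet and tweet.get("lang") == "en":
            kept.append(tweet)

print(len(kept), "tweets kept after removing retweets and non-English posts")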

In the spirit of Glaser (1978, p. 44), who argues that the early decisions should
not be ‘based on a preconceived framework of concepts and hypotheses’, the next
step, once we have framed an empirical case such as the one above, is to approach
the data with an open and exploratory mindset aiming gradually to discover and
stabilise a frame of interpretation (1978, p. 51). The researcher must patiently
‘accept nothing until something happens’ (1978, p. 18). Making something happen is
the next level in Figure 7.1, the computational method, by which a suitable stick
is used to start beating the data (cf. the data piñata of Chapter 1), in order for
insights to emerge. The framework presented here does not say anything about which such method to choose in a given research context. The answer to the question of which method should be chosen is manifold.

On the one hand, the spirit of this book encourages starting with any method that one feels like trying out on the current data and going from there,
experimenting and exploring. On the other hand, just choosing methods at random is
neither completely viable nor even possible. Just as the researcher has a
theoretical sensitivity, she also has a methodological inclination that will guide
in choosing, or guessing, which approach is likely to bring out something
interesting from the data at hand, given its topic and character.

But then thirdly, there is the issue of the research question. If the research is
about emotions in language-use, some data science methods will be more suitable
than others. If it is about status relationships in a social network, other methods will be more apt, and so on. In the purely explorative vision of Glaser – where
the research question is always the same: What is going on here? – this does not
apply. But in most actual research scenarios, there will exist an empirical
knowledge-interest guiding choices like these.

Methodological inclination: Computational text analysis

For our illustrative case analysis, we shall use an analytical tool called LIWC
(pronounced: luke), to process the dataset of #deletefacebook-tweets. LIWC
(Linguistic Inquiry and Word Count) is a computational text analysis method that
was developed by social psychologist James W. Pennebaker and colleagues as a
strategy for analysing a variety of structural, emotional, and cognitive elements
in verbal and written language (Pennebaker et al., 2015; Tausczik and Pennebaker,
2010). The method was devised based on the idea that, through their everyday
language-use, people express rich information about what they think, their
personalities, social relationships, beliefs, fears, and so on. Coincidentally
then, this method is really not unlike the one that Cambridge Analytica themselves
experimented with. The practice of measuring psychological states through content
analysis goes back to the Gottschalk-Gleser method. Put forth by psychiatrist Louis
Gottschalk and psychologist Goldine Gleser in the 1969 book The Measurement of
Psychological States Through the Content Analysis of Verbal Behavior, the approach
builds on the assumption that studying language-use is the key to knowledge about
the speaker’s, or writer’s, mental state. Gottschalk and Gleser (1969, p. 6) looked
at the process from their psychological standpoint:

The psychologist sees language, spoken or written, as a variety of learned human behavior, like maze-running and lever-pushing. There are sets of factors
influencing the learning process, and, characteristically, the psychologist’s
question is: ‘What factors are operating to cause the speaker to say this at this
time?’

As this fifty-year-old citation clearly shows, they saw people as mere lab rats who
used language to express certain responses triggered by certain stimuli. While this
view of the individual is quite far removed from that which is common in more
constructionist perspectives, such as discourse analysis, the same logic sort of
works. In his programmatic lecture on ‘The Order of Discourse’, Michel Foucault
argued that ‘in every society the production of discourse is at once controlled,
selected, organised and redistributed according to a certain number of procedures’
(Foucault, 1971, p. 8). These procedures, he said, remove large portions of the
free agency of the individual:

We know perfectly well that we are not free to say just anything, that we cannot
simply speak of anything, when we like or where we like; not just anyone, finally,
may speak of just anything.

(Foucault, 1971, p. 8)

All forms of speech, Foucault argued, are linked to power. A society, to hold
together, demands certain narratives that are ‘told, retold and varied; formulae,
texts, ritualised texts to be spoken in well-defined circumstances’ (Foucault,
1971, p. 12). So, whether one would adopt the psychological stance of analysing
speech to diagnose the mental state of an individual, or the discursive stance of
analysing speech to diagnose the state of power and knowledge in a given socio-
cultural setting, the approach of LIWC still holds. It is about using language as
an indicator of underlying structures, whatever they may be.

Technically speaking, LIWC consists of an elaborate dictionary including around 6,400 entries with groups of words that have been designed to tap into different domains, such as positive emotion, time orientation, and drives. This
dictionary can form the basis for a computational text analysis strategy where
documents (tweets, status updates, blog posts, or similar) are automatically
assessed for which of the categories they match. The output, covering about 90
variables, tells which documents are associated with which social psychological
categories.

Within many of the categories, there are also subcategories, such as ‘past focus’,
‘future focus’, and so on (within the time orientation category), or ‘anxiety’,
‘anger’, and so on (in the negative emotion category). Furthermore, each dictionary
word can belong to more than one category. A word such as ‘cried’, for example, is
part of five different categories: ‘sadness’, ‘negative emotion’, ‘overall affect’,
‘verb’, and ‘past focus’. This means that for any document containing ‘cried’, the
document’s score for each of those five categories would be incremented (Pennebaker
et al., 2015, p. 2). The creators of LIWC recommend an approach where one calculates, for each analysed document, what percentage of its total words match each dictionary category. We can then say things like ‘The tweets of this user are 16.2%
angry’, or ‘The Facebook status updates of that user are 4.2% anxious.’
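To make the logic concrete, here is a minimal sketch of this kind of dictionary-based scoring. The tiny dictionary below is invented for illustration only – the actual LIWC dictionary is proprietary, much larger, and also uses word stems with wildcards – but the percentage calculation follows the approach described above.

import re

toy_dictionary = {
    "sadness": {"cried", "sad", "grief"},
    "negative_emotion": {"cried", "sad", "angry", "hate"},
    "past_focus": {"cried", "was", "did"},
}

def category_percentages(document, dictionary):
    words = re.findall(r"[a-z']+", document.lower())
    counts = {category: 0 for category in dictionary}
    for word in words:
        for category, entries in dictionary.items():
            if word in entries:
                counts[category] += 1
    total = max(len(words), 1)
    return {category: 100 * n / total for category, n in counts.items()}

print(category_percentages("I cried because I was so sad", toy_dictionary))
# each category here matches 2 of 7 words, i.e. roughly 28.6 per cent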

The approximately 294,000 processed tweets used for this chapter’s case analysis were sliced
according to the dimension of time, joining all tweets that were posted within one
and the same hour into one document. This is because Pennebaker and colleagues
argue that the larger the document, the more reliable the result, and suggest
steering clear of any text with a word count below fifty. To be able to see where
the mood of the #deletefacebook tweets differed from tweets in general, a reference
dataset of 800,000 random English-language tweets was collected. Both sets of
tweets were tagged using LIWC to get an overview of where #deletefacebook discourse
stood out from Twitter discourse in general. The code and workflow used is
available at github.com/simonlindgren/datatheory-deletefacebook-liwc.
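A minimal sketch of the hourly slicing step could look as follows, assuming the tweets are held in a pandas DataFrame with timestamp and text columns (the column names and example rows are invented here; the actual workflow is in the repository referred to above).

import pandas as pd

tweets = pd.DataFrame({
    "created_at": pd.to_datetime(["2018-03-17 10:05", "2018-03-17 10:40", "2018-03-17 11:15"]),
    "text": ["#deletefacebook now", "done with this platform", "privacy matters"],
})

# Join all tweets posted within one and the same hour into one document
hourly_docs = (
    tweets.set_index("created_at")
          .groupby(pd.Grouper(freq="H"))["text"]
          .apply(" ".join)
          .reset_index(name="document")
)

# Follow the recommendation to steer clear of documents below fifty words
hourly_docs = hourly_docs[hourly_docs["document"].str.split().str.len() >= 50]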

Returning now to the process described in Figure 7.1: once the computational method
has been chosen, whichever it may be, its prescribed steps and measures will define
how to proceed with the analysis. Sometimes, applying one data science method is
enough, sometimes using more than one will bring out more interesting ideas, and
sometimes one has to try, and discard, several of them until finding a viable and
useful approach for the current data. Note that this chapter only uses one
methodological iteration for its example, while in reality there will often be a
need for several iterations of the same or other methods before one could aspire to
having provided a thick description. As explained by Glaser (1978, p. 22):

creativity is cyclical and multi-leveled and […] it feeds back in and upon itself
in order for the generation of ideas from data to occur.

Such ideas, hopefully generated gradually and iteratively from the data, should, as discussed earlier, be met by theoretical sensitivity (drawing on the fact that the researcher is ‘steeped’ in literature), while the new data is constantly interpreted and re-interpreted. This is a process of ‘theoretical input–output’ (Glaser, 1978, p. 18), or as it can be put in more digital nomenclature: theoretical I/O. The researcher, by having theoretical training and knowledge, should have the tools both to understand the emerging patterns in the light of existing theories in the field, and to formulate new ideas.

Theoretical sensitivity: Input

The aspect of the framework (cf. Figure 7.1) which is the most unwieldy to
exemplify is that of theoretical sensitisation. This is because one of its key
elements, as repeatedly underlined earlier, is that the researcher is broadly
oriented in social theory. Such an orientation will provide the researcher with a
theoretical toolkit – a set of conceptual and interpretive resources to make use of
depending on where the analysis is headed. The more elaborate and multifaceted such
a toolkit is, the better. But just for the sake of this case illustration, let us
suppose that a crucial part of the sociological toolkit for a researcher taking on
a study of #deletefacebook would be critical theory in the vein of Marx. Such a
perspective, which covers issues of power, capitalism, and control, is close at hand in relation to this empirical case and how it has been popularly framed.

Karl Marx’s critical theory of the capitalist society is indeed one of the most
classic social theories, and even though it was developed to understand the
emerging industrial society and its developments in the 1800s, it is still highly
relevant and useful today, also in the digital and datafied context (Fuchs and
Mosco, 2016). A key focus for Marx is on critiquing domination and exploitation in
society. His theories aim to lay bare, and question, such thoughts and practices
that contribute to creating and maintaining situations where some groups in society
benefit in various ways at the expense of others. This can be, for example, by
owning property or by having means to force others to work to produce some type of
value for them. Such relations, Marx argued, make people alienated and create an
undemocratic society.

As social media scholar Christian Fuchs (2017, p. 12) argues, Marx’s social theory
has become even more relevant following the capitalist crisis that started in 2008,
and its aftermath and later developments. Sociologist Manuel Castells (2010, p.
502) argues that digitalisation:

does not imply the demise of capitalism. The network society, in its various
institutional expressions, is, for the time being, a capitalist society.
Furthermore, for the first time in history, the capitalist mode of production
shapes social relationships over the entire planet.

Similarly, media scholar Nick Dyer-Witheford (1999, p. 2) contends that ‘the information age, far from transcending the historic conflict between capital and
its labouring subjects, constitutes the latest battleground in their encounter’.
Such arguments, of course, make theories about the capitalist society seem very
useful for analysing political practices on the internet.

The Glaserian theoretical I/O approach advocated through Figure 7.1 lies in between
deduction (just moving from the theory to the observations) and induction (simply
going from the observations to theory). Some methodological literature uses the
concept of ‘abduction’, referring to the writings of Charles Sanders Peirce (1878),
for this type of reasoning. Grounded theory scholar Ian Dey (2007, p. 91) explains
that:

abduction relates an observation to a theory (or vice versa), and results in an interpretation. Unlike induction, theory in the case of abduction is used together
with observation, in order to produce an interpretation of something specific,
rather than to infer a generalization. Unlike deduction, the result does not follow
logically from the premises: abduction offers a plausible interpretation rather
than producing a logical conclusion. Using abductive inference is thus a matter of
interpreting a phenomenon in terms of some theoretical frame of reference. This can
be one of several possible interpretations, depending on the theory we adopt. If it
is any good, this theory will offer new insights that help to explain some aspect
of the phenomenon under investigation.

In other words, abduction is both about using data to test theory (as in
deduction), and about generating theory from data (as in induction), doing so iteratively (as in Figure 7.1). However complicated this may sound, it is
essentially about being aware of existing theories, keeping an open mind about
them, and remaining open to renegotiating or replacing them based on insights
garnered from data. This is illustrated by the two-way arrow at the top of Figure
7.1, indicating that the theoretically sensitised researcher draws on social theory
in old, negotiated, as well as potentially new renditions. This is done through
iterative interpretation (repeating the three-tiered arrow in Figure 7.1).

Theoretical sensitivity: Output

Figure 7.2 gives a general overview of some of the language categories tagged by
LIWC. The bars in the figure show normalised positive and negative scores that
reflect points of difference between the reference dataset of random tweets and the #deletefacebook tweets. Before being normalised to a range between −1 and 1, the scores were calculated based on differences between the mean values for each category in the two respective sets of data. Figure 7.2 has been simplified to show only some of the most interesting patterns among the many categories in LIWC. Obviously, that type of selection is also part of the analytical process.
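Exactly how the normalisation was performed is not detailed here, so the following is only a plausible sketch under assumptions: the difference in mean value per category between the two sets of documents is computed and then divided by the largest absolute difference, so that the scores fall between −1 and 1. The category names and values are invented.

import pandas as pd

# Hypothetical mean LIWC scores per category for the two sets of documents
deletefacebook_means = pd.Series({"anger": 2.1, "posemo": 1.4, "money": 1.8})
reference_means = pd.Series({"anger": 0.9, "posemo": 3.0, "money": 0.5})

diff = deletefacebook_means - reference_means
normalised = diff / diff.abs().max()   # now in the range -1 to 1
print(normalised.sort_values())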

Figure 7.2 Prominent content dimensions, based on LIWC, in #deletefacebook tweets.

First, looking at the bars with negative scores at the top of the figure, we can
see how the tweets in the reference dataset stand out. They do so by representing
the genre of ‘tweets in general’. They have a personal, often affective language
which is informal and socially oriented. They also have a tendency, especially in
comparison with the #deletefacebook tweets, to score higher on the dimensions of
‘assent’ and ‘posemo’, hinting that they are often affirmative, positive, and
consensus oriented.

Second, if we approach the bars that have positive scores, while being
theoretically sensitised by Marxist critical theory, several patterns resonate well
with such a perspective. The #deletefacebook tweets score comparably high on a set
of dimensions (marked with ‘1’ in Figure 7.2) that validate the relevance of a
Marxist reading of the Cambridge Analytica affair and its aftermath. Among such
dimensions are those that are indicative of a strong collective discourse, talking
in terms of power, money, and numbers. Furthermore, the tweets set the stage by scoring comparably high on discourse that uses a language of us and them, agitating through expressions of negative emotion and anger. Looking closer, there are also
indications that the more social psychological dimensions of Marx’s theory of
alienation are present (cf. anxiety and feelings).

We cannot simply confirm the Marxist theory and leave it at that, however. Beyond
using that theory to characterise or explain #deletefacebook on Twitter, closer or
alternative readings of the same structure of categories can lead to insights about
how to move on. One, out of many possible analytical ideas in this case, could be
to explore the affective dimension more deeply. Because even if there is such a
dimension in the small part of Marx’s writing that deals with the notion of
alienation, the data tell us something more here. There are in fact a significant
number of dimensions (marked with ‘2’ in Figure 7.2) that push the interpretation
further towards the insight that affective language, and potentially underlying
emotions among users, is of crucial importance here. In the #deletefacebook tweets,
there is not only the anger, anxiety and negative emotion that we interpreted above
as driving forces for a collective anti-capitalist reaction. There is much more to
it. The discourse obviously scores high on criteria related to feelings more
generally, but also to sadness and to physical and bodily sensations. Preliminary
close readings of individual tweets also confirm this, as users tweet about being
‘sickened’, ‘panicking’ and so on.

Making readings like these of the data science results constitutes the first small
steps toward new discoveries, productive analyses and tentative emergent theories.
Is Marx enough here? Can we continue this research to formulate theory about a new
form of data alienation? Should we turn to theories about affect, rather than to
Marx? And how much of what we see can such theories in turn explain? Are there
other explanations for the patterns that we have identified? Are we witnessing an
effect of social media communication in general being marked by an overly affective
form of language-use? And so on.

Developing and maintaining theoretical sensitivity is about knowing which parts of
the data will be the most important in developing the analysis. For Glaser (1978,
p. 5), this is about identifying what is ‘the most interesting’ but without
becoming lazy and resorting to ‘pet ideas’. While there is never any guarantee that
the researcher can avoid such pitfalls, the vision and goal is to not shy away from
understanding the data in terms that are recognisable, but still to remain open to
constantly modifying the concepts. The guiding principle for this type of
analytical work is to try to develop concepts that are both analytic – ‘sufficiently generalized to designate characteristics’ of something larger than the individual observation – and sensitising, so that
they can help ‘yield a “meaningful” picture, abetted by apt illustrations that
enable one to grasp the reference’ (Glaser and Strauss, 1967, pp. 38–9).

The notion of sensitising concepts was coined in the 1954 paper ‘What is Wrong with
Social Theory?’ by Herbert Blumer, who wanted to discuss the role of social theory
in empirical science. Blumer said that ‘the aim of theory in empirical science is
to develop analytical schemes of the empirical world with which the given science
is concerned’ (Blumer, 1954, p. 3). But, in order to avoid that social theory
simply generates ‘an interpretation which orders the world into its mold’ we must
alternatingly ‘guide research by theory’, and ‘assess […] theoretical propositions
in the light of empirical data’ (Blumer, 1954, p. 4). For Blumer, concepts are the
only means of establishing the necessary connection between the theory and data.
Further, he argued that the concepts to be used in research should never be
definitive, but rather sensitising:

A definitive concept refers precisely to what is common to a class of objects, by the aid of a clear definition in terms of attributes or fixed bench marks. This
definition, or the bench marks, serve as a means of clearly identifying the
individual instance of the class and the make-up of that instance that is covered
by the concept. A sensitizing concept lacks such specification of attributes or
bench marks and consequently it does not enable the user to move directly to the
instance and its relevant content. Instead, it gives the user a general sense of
reference and guidance in approaching empirical instances. Whereas definitive
concepts provide prescriptions of what to see, sensitizing concepts merely suggest
directions along which to look. The hundreds of our concepts – like culture,
institutions, social structure, mores, and personality – are not definitive
concepts but are sensitizing in nature. They lack precise reference and have no
bench marks which allow a clean-cut identification of a specific instance and of
its content. Instead, they rest on a general sense of what is relevant.

(Blumer, 1954, p. 7)

This describes the conceptual openness by which the analysis should be moved
forward. In the case of the data theory approach, we let our computational data
science method help us get the ‘general sense of what is relevant’, and point out
where and how to look when approaching more data and trying other concepts.
Returning to Glaser, this means that we should treat everything as data. Alongside
the conventional empirical data, theories of our own and others are also part of
the input.

Conclusion: Theory/Data

Internet researchers danah boyd and Kate Crawford argued in a seminal 2012 paper
about posing critical questions to big data that working with large-scale social
datasets is still largely subjective: these data are no different from other
research data in the sense that they are most of the time flawed, incomplete,
biased, and – most importantly – data is imagined as such in different ways in
different disciplines and contexts: as evidence, as information, as the truth, and
so on. Add to this that most arguments about the possibilities of big data rely on
the mistaken assumption that it can replace social science altogether by producing
true knowledge about society and social life in its totality through ‘N=all’-
studies (boyd and Crawford, 2012). The possibility of not working with sampling,
but actually gathering data for something that gets close to having ‘everything’,
significantly changes the playing field for social science, but even the evangelistic
big data literature comes full circle to the Weberian insight about not being able
to grasp society in all of its complexity. As argued by Mayer-Schönberger and
Cukier (2013, p. 197):

What we are able to collect and process will always be just a tiny fraction of the
information that exists in the world. It can only be a simulacrum of reality, like
the shadows on the wall of Plato’s cave. Because we can never have perfect
information, our predictions are inherently fallible. This doesn’t mean they’re
wrong, only that they are always incomplete. It doesn’t negate the insights that
big data offers, but it puts big data in its place – as a tool that doesn’t offer
ultimate answers, just good-enough ones to help us now until better methods and
hence better answers come along. It also suggests that we must use this tool with a
generous degree of humility . . . and humanity.

Such humility, I argue, can be achieved through working in the interpretive spirit
that I have advocated in this book, where the computational data science methods
are embedded in the established body of social theory, as well as driven by an
interpretive sensibility pointing forward to new theorising. But it is not only
good for data science to be influenced by social theory; there is also a need to
digitise social theory. As argued by digital sociologist Steffen Roth (2019), big
data can be harnessed to inform dominant theories through abductive reasoning. He
contends that a ‘digitally transformed social theory’ is needed to fully understand
digitally transformed society. In order to understand the digital society, based on
the logic of computers, computation, and computer networks, we must sometimes
reduce and unfold the complexity of analogue social theories – that is, theories
that were formulated in and for a pre-digital society. This, Roth writes, should be
done using a principle of transposition, where old theories are translated into the
new reality, which requires ‘digital theories’ that can function as ‘agile matrices’
(Roth, 2019, p. 90). Data theory must be flexible and adaptive in a way that
‘reduces the risk of logical deadlocks without limiting the scale and scope of
theoretical explorations’ (Roth, 2019, p. 93).

Brutalising theory

Throughout this book, I have presented various anchor points in the writings of
others for the data theory approach. Examples of such points are John Law’s notion
of moving to a point ‘after method’, Richard Rogers’ ideas about following ‘methods
of the medium’, Thomas McLaughlin’s arguments concerning the value of ‘vernacular
theory’, Michel Foucault’s distinction between ‘global theory’ and ‘local theory’,
and Omar Lizardo’s discussion about ‘the end of theorists’. Some of these
perspectives could be misconstrued and caricatured as saying that we should do
research however we like, using any method and cherry-picking bits and pieces of
theories to our hearts’ content. This sounds fun and stimulating, but it also raises the question of how far the renegotiations and eclecticisms can be pushed.

It could be argued, for example, that my treatment of Bourdieu’s concepts in Chapter 6 was in fact rather brutal. Using his theory of field, capital, and
habitus in the way that I did meant taking the liberty of bluntly changing things around so that they aligned with what I was able to do with the data at hand. It may seem disrespectful to the complex and elaborate work of the original theorist simply to decide outright to equate ‘retweets’ with ‘cultural capital’, or ‘followers’ with ‘economic capital’. What is the use, one might ask, of even
speaking in terms of theory if we are to strip it down opportunistically like this?
Indeed, this type of stripping down is not straightforward. It is by no means given
that the operationalisations of Bourdieu’s concepts that were presented in my
example were the best ones to use – even in relation to my empirical case. There is
no doubt a high degree of selectivity and arbitrariness to such exercises. So why
then bother with hybrid approaches that can never be robustly validated?

The reason to me is obvious. Without drawing on the prior research and theorising
of social scientists – Bourdieu in this case – we would not have been able to tease
out such an elaborate picture of social media influence based on the raw data that
we had. Indeed, statisticians would have been able to do useful things based on
causal analysis and the like. Network scientists could have dug into questions of
contagion, transmission, and prestige, using their own techniques and metrics.
Mathematicians and computer scientists could have engaged in advanced modelling,
and so on and so forth. But this book is about the contribution of interpretive
social science in the age of datafication. And, in relation to that agenda, it is
quite striking to see how a seemingly over-simplified – even erroneously simplified
– version of Bourdieu’s theory helped us to pose analytical questions and order our
data in ways that brought out interesting preliminary results, as well as pointing to pertinent further questions to be analysed. This process was illustrated in Chapter 6 through my practice of (1) knowing Bourdieu; (2) putting Bourdieu within quotation marks; and (3) suggesting new concepts in the vein of Bourdieu, based on data work.
Still, there is reason to discuss the issue of how far theories can bend and how
much damage we should allow ourselves to do to them. Don’t we risk losing
interpretive power, as nuance gets lost while the theory is potentially reduced to
a far too simple set of ideas, definitions, or categories?

An interesting contribution to this discussion comes from sociologist Kieran Healy (2017), who argues that this is not the case, going so far as to write that ‘nuance is not a virtue of good sociological theory’. It has generally been popular in social
analysis to focus on providing theoretical and analytical nuance. In relation to
complex and multifaceted research problems it appears virtuous to try to see all
the subtle differences, rather than to boil down and simplify things. Healy’s
argument, however, is that embracing the complexity – simply accepting that things may be either/or – is in fact often easier than trying to cut through it. Therefore, we
must not confuse this type of theoretical ‘simplification’ with laziness or
insensitivity. Isn’t it the point, after all, of science and research to be able to
say something meaningful about social reality more generally, rather than devoting
all of our energy to unravelling particularities and exceptions? Healy argues that,
especially for the types of problems that social analysis is facing today,
demanding too much nuance will often hinder rather than enable researchers to
develop results and theories that are interesting, generative, and useful (Healy,
2017, p. 118). He writes that:

When faced with a problem that is hard to solve, a line of thinking that requires
us to commit to some defeasible claim, or a logical dilemma we must bite the bullet
on, the nuance-promoting theorist says, ‘But isn’t it more complicated than that?’
or ‘Isn’t it really both/and?’ or ‘Aren’t these phenomena mutually constitutive?’
or ‘Aren’t you leaving out [something]?’ or ‘How does the theory deal with agency,
or structure, or culture, or temporality, or power, or [some other abstract noun]?’
This sort of nuance is, I contend, fundamentally antitheoretical. It blocks the
process of abstraction on which theory depends, and it inhibits the creative
process that makes theorizing a useful activity.

(Healy, 2017, p. 119)

From this perspective, then, what we are doing when seemingly brutalising
Bourdieu’s theoretical concepts can be construed as a form of abstraction. We
throw away detail and particulars in the name of being able to produce
operationalisations and analyses that take our interpretive work forward, and that
may ideally be helpful for society at large. Healy reminds us of the fact that
abstraction, as well as theory-building, is quite a risky practice, especially if
those who are experimenting – pushing some definitions, ideas, or arguments, for a
while to see where it ends up – are constantly questioned. Somewhat paradoxically, asking for more nuance may lead to a situation where we end up with less clarity.

Adding to this point of Healy’s, Lizardo (2015) has made the important argument that social theory and theorising are changing today. Similar to McLaughlin’s notion of ‘vernacular theory’, which was discussed in Chapter 4, Lizardo argues that:

we should begin to move away from our obsession with theory as a finished product
or as canon of works and towards a conception of theorizing as a creative activity.
One of the grave dangers that I see today is the continuing survival of an approach
to theory that conceives of the theory field as an ‘aristocracy of theorists’
ruling over mere empirical under-laborers.

(Lizardo, 2015, p. 5)

The point here is, again, that we have left behind the era of sacred theories, formulated by free-floating intellectuals for mere mortals to use and apply according to a read-only logic where the tool cannot be altered. Even so, another argument presented in this book is that we can use existing theories as building blocks or raw materials in our own theorising. As argued by Barney Glaser, the interpretive
social researcher must have done her homework. She should be ‘steeped in the
literature’ (Glaser, 1978, p. 3), as this will increase the openness and
sensitivity of the analysis. This is what Lizardo (2015, p. 6) also argues when
writing that ‘In each generation, prospective theorists first become proficient at
the consumption of dead theoretical labor, so that they may later join the select
group of people in charge of producing living theory.’

Returning to Bourdieu, he himself was in fact a true advocate of the kind of eclecticism demanded by a data theory approach. His theory of practice, elaborated in Outline of a Theory of Practice (1977) and The Logic of Practice (1992), grew out of the inadequacies that he saw in structuralist anthropology when it came to understanding and accounting for the strategic and practical dimensions of social interaction (Webb, Schirato, and Danaher, 2002, p. 2). It was because of this that he designed his theory to make use of both theoretical and empirical methodologies as inseparable parts of one and the same research strategy. For this reason, Webb, Schirato, and Danaher (2002, p. 4) argue, his contribution aligns neither with the positivist Anglo-American social science that is light on theory, nor with the areas of cultural studies, literature, and philosophy, which shy away from data matrices.
Sociologist Loïc Wacquant describes the contribution of Bourdieu’s ‘oeuvre’ by stating that it:

throws a manifold challenge at the current divisions and accepted modes of thinking
of social science by virtue of its utter disregard for disciplinary boundaries […]
and its ability to blend a variety of sociological styles, from painstaking
ethnographic accounts to statistical models, to abstract metatheoretical and
philosophical arguments. More profoundly, though, the unsettling character of
Bourdieu’s enterprise stems from its persistent attempt to straddle some of the
deep-seated antinomies that rend social science asunder, including the seemingly
irresolvable antagonism between subjectivist and objectivist modes of knowledge,
the separation of the analysis of the symbolic from that of materiality, and the
continued divorce of theory from research.

(Wacquant, 1992, p. 3)

Bourdieu’s vision, as described by Wacquant, thus itself urges forward the kind of disobedient applications of which Chapter 6 is an example. This description also summarises the vision that data theory wants to bring into the future of social science: utterly disregarding disciplinary boundaries, blending sociological styles from ethnography and statistics to metatheory, and straddling subjectivism and objectivism.

Notes on ethics

Throughout the book, I have largely sidestepped the important issue of research ethics in the datafied society. This choice was made in order to focus on the analytical logic of the examples without getting lost in ethical complexities. I have, however, made a number of ethical considerations along the way. The covfefe tweets in Chapter 3 – apart
from those by Trump – have been paraphrased to make it more difficult to look up
the users behind them. No individual users are identifiable in the Reddit analysis
(Chapter 5), nor in the Twitter analysis in Chapter 7. A few public figures are
identifiable in the analysis of tweets in Chapters 4 and 6.
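
For readers who want a concrete starting point, the following is a minimal sketch, in Python, of one way such pseudonymisation could be carried out before analysis. The data layout (dicts with ‘user’ and ‘text’ keys), the salt value, and the helper names are assumptions made for illustration, not the procedure actually used for the analyses in this book; verbatim quotes would additionally need to be paraphrased by hand, since hashing handles alone does not prevent looking up the original text.

import hashlib

# Assumed placeholder; a real project would use its own secret, kept out of version control.
SALT = "replace-with-a-project-specific-secret"

def pseudonymise_handle(handle: str) -> str:
    """Turn a screen name into a stable but non-reversible pseudonym."""
    digest = hashlib.sha256((SALT + handle.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]

def strip_mentions(text: str) -> str:
    """Blank out @-mentions so that third parties are not identifiable either."""
    return " ".join("@user" if token.startswith("@") else token for token in text.split())

# Illustrative input only; the keys and content are assumptions, not data from the book.
tweets = [
    {"user": "some_handle", "text": "@friend covfefe means whatever you want it to"},
]

anonymised = [
    {"user": pseudonymise_handle(t["user"]), "text": strip_mentions(t["text"])}
    for t in tweets
]
print(anonymised)

Using a project-specific salt keeps the pseudonyms stable across analyses while making them unlinkable to the original handles for anyone without access to that salt.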

The emergent and volatile character of the field of digital social research, which
is in a perpetual state of taking shape, makes it impossible to escape ethical
dilemmas. It is indeed a good starting point to adhere to general principles about
maximising the benefits and minimising the harm of research – and about respecting
fundamental rights of human autonomy, dignity, and safety. There is no doubt, however, that self-published data, social media data, and other online data traces demand that the researcher constantly navigates the data environment and makes
choices in a critically reflective way. Guidelines for digital research ethics are
emerging and remain subject to healthy debate (Bassett and O’Riordan, 2002; franzke
et al., 2020).

More generally, and beyond the scope of this book, there is not only a need to
develop a theoretical sensitivity but also an ethical one. Carrying out social
research in the age of datafication demands continuous critical reflection. This is
because, just as in the case of previous definitions of ‘qualitative’ versus
‘quantitative’, many other established and agreed principles for research are also
cast in new light. This relates to new relationships between what is defined as
private and public space, and to new expectations for when and how one is expected
to be observed by researchers (Trepte and Reinecke, 2011). It also concerns
questions of when and how informed consent is needed from research subjects
(Williams, Burnap, and Sloan, 2017). Furthermore, using social media data and other
online data traces, awakens debates about research subjects’ right to be forgotten
(Tirosh, 2017), how to adequately anonymise digital social data (Eynon, Fry, and
Schroeder, 2008), and how data should be properly stored (Giglietto and Rossi,
2012).

Pointing out directions for new ethical frameworks in this spirit, scholars such as
Charles Ess have made strong arguments for understanding ethics through its
contextual relations to societal and technological change (Ess, 2015). There is a
need to develop ethical frameworks that can account for the processual and messy
nature of research, and the contingent and incremental ways in which knowledge is
produced (Ess, 2017; Markham, 2013). In digital social research, we must work from
the assumption that ethics should be part of the analysis, rather than something
that gets applied to it as separate or after-the-fact. Working in that way will
contribute to a more interdisciplinary, open, and generative approach to research
ethics in the age of datafication (Pink and Lanzeni, 2018). There is a need for
‘ethical pluralism’ and for ‘doing ethics’ beyond ‘simply picking a set of
principles, values, etc., and then applying these in a largely deductive,
algorithmic manner to a problem at hand’ (Ess, 2013, p. 200). This requires
thinking through ethics in relation to given problems, and making practical
judgements in ‘an ongoing effort to analyze and reflect on both familiar and new
experiences and problems’ (Ess, 2013, p. 200).

Final remarks

It is now time to bring together the different strands of analyses and
insights that I have left scattered throughout the discussions and case studies
presented in the preceding chapters. The general ideas and arguments that I have
introduced thus far are that:

Society is becoming increasingly datafied. This means that today, social researchers encounter an increasingly complex data environment, where conventional research data in the form of interviews, observations, and surveys is supplemented and sometimes replaced with unsolicited data in the form of self-published data, found data, data traces, and social media data.

As a consequence, significant parts of the field of social research are also becoming datafied. Through the popularisation of computational social science and digital humanities, new modes of collecting and analysing data seep into these fields without any extensive ethical or epistemological effort being made.

In addition to this, the digitalisation of social life in present-day society has made the objects of study of sociology – social facts, social actions, and social forms – appear in partly new shapes. Social formations are more complex, interaction is documented in new ways, and the networked character of sociality in general gives rise to new politicised phenomena such as internet memes, online virality, trolling, and so on.

Facing the challenges posed by these transformations (1–3) we, as digital social researchers, must dare to break down existing epistemological divisions in order to find new and workable methodological strategies. I have especially underlined the particular potential that is inherent in a strategy where interpretive sociology, rather than just statistically influenced sociology, provides an interface between social analysis and data science.

In a set of case-analysis chapters, I have offered experimental examples of the approach that I advocate. These examples have used different types of data, different computational techniques, different theoretical points of reference, and different interpretive strategies. Taken together, however, the point has been to illustrate how new social phenomena, new forms of data, and computational analytical approaches that are relatively new to social science can be incorporated with an interpretive stance and ethos.

The book introduces the notion of data theory (hinting at its floating and
dualistic character as data, theory, data/theory, and theory/data) as a broad label
for a hybrid form of digital social research practice that is data-intensive,
computational (‘quantitative’), yet theoretically interpretive (‘qualitative’).

In a piece for WIRED magazine in 2008, Chris Anderson argued that the deluge of
social data would mark the end of theory and make the scientific method as such
obsolete. In a world of ‘massively abundant data’, he wrote, there is no longer a
need to ‘settle for models’ at all (Anderson, 2008). His argument was that when we
have data ‘at the petabyte scale’, we can forget about ‘simple three- and four-
dimensional’ taxonomies. In what was probably a deliberately provocative fashion,
Anderson stated that:

This is a world where massive amounts of data and applied mathematics replace every
other tool that might be brought to bear. Out with every theory of human behavior,
from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows
why people do what they do? The point is they do it, and we can track and measure
it with unprecedented fidelity. With enough data, the numbers speak for themselves.

(Anderson, 2008)

But simply throwing out interpretations, models, and theories to let the data speak
for themselves is a horribly bad idea. In fact, it is just as bad as it would be to
put forward theories that have no root in reality. As data scientists Schutt and
O’Neil (2013, p. 25) argue: ‘It is wrong to believe either that data is objective
or that “data speaks”, and beware of people who say otherwise.’ This is why I have
argued in this book that a viable method for social analysis in the age of
datafication must have the best of both worlds: the gung-ho debauchery of the data
piñata approach, and the critically reflexive ethos and theoretical sensitivity of
sociology. Approaching not theory, not data, but data/theory – with an iterative
I/O setup – can ground data science, contextualise it in a lineage of social
analysis, and counteract the mythological belief that the bigger the data, the less
the need for critical analysis and interpretation.

References
Ahmed, S. (2004). The Cultural Politics of Emotion. Edinburgh: Edinburgh University
Press.

Alexa (2019). Alexa Top 500 Global Sites. https://www.alexa.com/topsites

Anderson, C. (2008, 23 June). The End of Theory: The Data Deluge Makes the
Scientific Method Obsolete. WIRED. https://www.wired.com/2008/06/pb-theory/

Bakardjieva, M., Felt, M., and Dumitrica, D. (2018). The Mediatization of


Leadership: Grassroots Digital Facilitators as Organic Intellectuals, Sociometric
Stars and Caretakers. Information, Communication and Society, 21(6), 899–914.

Bassett, E. H. and O’Riordan, K. (2002). Ethics of Internet Research: Contesting the Human Subjects Research Model. Ethics and Information Technology, 4(3), 233–47.

Bastian, M., Heymann, S., and Jacomy, M. (2009). Gephi: An Open Source Software for
Exploring and Manipulating Networks. In Third International AAAI Conference on
Weblogs and Social Media.
https://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154

Bauman, Z. (1991). Modernity and Ambivalence. Cambridge: Polity.

Baumer, E. P., Adams, P., Khovanskaya, V. D., Liao, T. C., Smith, M. E., Schwanda
Sosik, V., and Williams, K. (2013). Limiting, Leaving, and (Re)Lapsing: An
Exploration of Facebook Non-use Practices and Experiences. In Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems (pp. 3257–66). New York,
NY, USA: ACM.

Beer, D. (2016). Metric Power. London: Palgrave Macmillan.

Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms


Markets and Freedom. New Haven: Yale University Press.

Bennett, W. L. and Segerberg, A. (2012). The Logic of Connective Action: Digital


Media and the Personalization of Contentious Politics. Information, Communication
and Society, 15(5), 739–68.

Berger, P. L. and Luckmann, T. (1966). The Social Construction of Reality: A


Treatise in the Sociology of Knowledge. Harmondsworth: Penguin.

Berlinger, J. (2017). Covfefe: When a Typo Goes Viral. CNN.


https://www.cnn.com/2017/05/31/politics/covfefe-trump-coverage/index.html

Bershidsky, L. (2017, 31 May). After the ‘Covfefe’ Laughs Come the Security
Concerns. Bloomberg View. https://www.bloomberg.com/view/articles/2017-05-31/trump-
s-covfefe-tweet-should-raise-security-concerns

Beyer, J. L. (2014). Expect Us: Online Communities and Political Mobilization. New
York: Oxford University Press.

Bielenberg, A., Helm, L., Gentilucci, A., Stefanescu, D., and Zhang, H. (2012). The
Growth of Diaspora – A Decentralized Online Social Network in the Wild. In 2012
Proceedings IEEE INFOCOM Workshops (pp. 13–18).

Blei, D. M. (2012). Topic Modeling and Digital Humanities. Journal of Digital


Humanities, 2(1), 8–11.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet Allocation.
Journal of Machine Learning Research, 3 (January), 993–1022.

Bleuler, E. (1911). Vortrag von Prof. Bleuler-Zürich über Ambivalenz. Zentralblatt


für Psychoanalyse (1), 266–8.

Blumer, H. (1954). What is Wrong with Social Theory? American Sociological Review,
19(1), 3.

Borchers, C. (2017, 31 May). Is ’Covfefe’ Just Another Distraction? Washington


Post. http://www.washingtonpost.com/video/politics/is-covfefe-just-another-
distraction/2017/05/31/57a5752e-461c-11e7-8de1-cec59a9bf4b1_video.html

Boudon, R. (1988). Will Sociology Ever Be a Normal Science? Theory and Society,
17(5), 747–71.

Bourdieu, P. (1977). Outline of a Theory of Practice. Cambridge: Cambridge University Press.

Bourdieu, P. (1984). Distinction: A Social Critique of the Judgement of Taste.


London: Routledge.

Bourdieu, P. (1986). The Forms of Capital. In J. Richardson (ed.), Handbook of


Theory and Research for the Sociology of Education (pp. 241–58). Westport, CT:
Greenwood Press.

Bourdieu, P. (1992). The Logic of Practice. Cambridge: Polity.

boyd, danah and Crawford, K. (2012). Critical Questions for Big Data. Information,
Communication and Society, 15(5), 662–79.

Bryman, A. (1984). The Debate about Quantitative and Qualitative Research: A


Question of Method or Epistemology? British Journal of Sociology, 35(1), 75–92.

Callon, M. (1986a). Some Elements of a Sociology of Translation: Domestication of


the Scallops and the Fishermen of St Brieuc Bay. In J. Law (ed.), Power, Action and
Belief: A New Sociology of Knowledge (pp. 196–233). London: Routledge and Kegan
Paul.

Callon, M. (1986b). The Sociology of an Actor-Network: The Case of the Electric


Vehicle. In M. Callon, A. Rip, and J. Law (eds), Mapping the Dynamics of Science
and Technology: Sociology of Science in the Real World (pp. 19–34). New York:
Palgrave Macmillan.

Callon, M. (1991). Techno-Economic Networks and Irreversibility. In John Law (ed.),


A Sociology of Monsters: Essays on Power, Technology and Domination (pp. 132–61).
Abingdon: Routledge.

Callon, M. (2001). Actor Network Theory. In N. J. Smelser and P. B. Baltes (eds),


International Encyclopedia of the Social and Behavioral Sciences (pp. 62–6).
Amsterdam ; New York: Elsevier.

Callon, M., Courtial, J.-P., Turner, W. A., and Bauin, S. (1983). From Translations
to Problematic Networks: An Introduction to Co-Word Analysis. Social Science
Information, 22(2), 191–235.

Carles, P. (2001). La Sociologie est un Sport de Combat [Film].


http://www.imdb.com/title/tt0271793/
Castells, M. (2001). The Internet Galaxy. Oxford: Oxford University Press.

Castells, M. (2009). Communication Power. Oxford: Oxford University Press.

Castells, M. (2010). The Rise of the Network Society. Chichester, West Sussex;
Malden, MA: Wiley-Blackwell.

Castells, M. (2015). Networks of Outrage and Hope: Social Movements in the Internet
Age. Cambridge: Polity.

Choudhury, M. D. and De, S. (2014). Mental Health Discourse on Reddit: Self-


Disclosure, Social Support, and Anonymity. In Proceedings of the Eighth
International AAAI Conference on Weblogs and Social Media (pp. 71–80). Association
for the Advancement of Artificial Intelligence.

Cillizza, C. (2017, 1 June). ‘Covfefe’ Tells You All You Need to Know About Trump.
CNN. https://www.cnn.com/2017/05/31/politics/donald-trump-covfefe/index.html

Costa, C. and Murphy, M. (2015). Conclusions: Method as Theory – (re)Exploring the


Intellectual Context of Education Research. In M. Murphy and C. Costa (eds), Theory
as Method in Research: On Bourdieu, Social Theory and Education (pp. 191–9).
Abingdon: Routledge.

Couldry, N. and Hepp, A. (2017). The Mediated Construction of Reality. Cambridge:


Polity.

Cukier, K. and Mayer-Schoenberger, V. (2013). The Rise of Big Data: How It’s
Changing the Way We Think About the World. Foreign Affairs, 92(3), 28–40.

Dawkins, R. (1976). The Selfish Gene. Oxford; New York: Oxford University Press.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. and Harshman, R.


(1990). Indexing by Latent Semantic Analysis. Journal of the American Society for
Information Science, 41(6), 391–407.

DeLanda, M. (2016). Assemblage Theory. Edinburgh: Edinburgh University Press.

Deleuze, G. and Guattari, F. (1987). A Thousand Plateaus: Capitalism and


Schizophrenia. Minneapolis: University of Minnesota Press.

Denzin, N. K. and Lincoln, Y. S. (2005). Introduction: The Discipline and Practice


of Qualitative Research. In N. K. Denzin and Y. S. Lincoln (eds), The Sage Handbook
of Qualitative Research (pp. 1–32). Thousand Oaks, CA: Sage.

Dey, I. (2007). Grounded Theory. In C. Seale, G. Gobo, J. F. Gubrium, D. Silverman


(eds), Qualitative Research Practice (pp. 80–93). London; Thousand Oaks, CA: Sage.

Dijck, J. van, Poell, T., and Waal, M. de (2018). The Platform Society: Public
Values in a Connective World. New York: Oxford University Press.

Douglas, M. (2001). Dealing with Uncertainty. Ethical Perspectives, 8(3), 145–55.

Douglas, M. ([1966] 2001). Purity and Danger: An Analysis of the Concepts of


Pollution and Taboo. London: Routledge.

Duggan, M. and Smith, A. (2013). 6% of Online Adults are Reddit Users. Washington,
DC: Pew Research Center.

Durkheim, E. (1895). The Rules of Sociological Method. New York: Free Press.
Durkheim, E. ([1912] 1995). The Elementary Forms of Religious Life. New York: Free
Press.

Durkheim, É. and Mauss, M. ([1903] 1963). Primitive Classification. Abingdon, Oxon;


New York, NY: Routledge.

Dyer-Witheford, N. (1999). Cyber-Marx: Cycles and Circuits of Struggle in High-


Technology Capitalism. Urbana: University of Illinois Press.

Efron, B. and Hastie, T. (2016). Computer Age Statistical Inference: Algorithms,


Evidence, and Data Science. Cambridge University Press.

Erickson, J. (2008). Hacking: The Art of Exploitation. San Francisco, CA: No Starch
Press.

Ess, C. (2013). Digital Media Ethics. Cambridge; Malden, MA: Polity.

Ess, C. (2015). New Selves, New Research Ethics? In H. Fossheim and H. Ingierd
(eds), Internet Research Ethics (pp. 48–76). Oslo: Cappelen Damm.

Ess, C. (2017). A View from (the) AoIR (Foreword). In M. Zimmer and K. Kinder-
Kurlanda (eds), Internet Research Ethics for the Social Age: New Challenges, Cases,
and Contexts (pp. ix–xv). New York: Peter Lang.

explosion (2014). spaCy. https://github.com/explosion/spaCy

Eynon, R., Fry, J., and Schroeder, R. (2008). The Ethics of Internet Research. In
N. Fielding, R. M. Lee and G. Blank (eds), The SAGE Handbook of Online Research
Methods (pp. 19–37). Los Angeles, CA: Sage.

Feinberg, A. (2017, 6 October). How To Tell When Someone Else Tweets From
@realDonaldTrump. WIRED. https://www.wired.com/story/tell-when-someone-else-tweets-
from-realdonaldtrump/

Feyerabend, P. (1975). Against Method: Outline of an Anarchistic Theory of


Knowledge. London: NLB.

Firth, J. (1957). A Synopsis of Linguistic Theory 1930–1955. In F. Palmer (ed.),


Studies in Linguistic Analysis (pp. 1–32). London: Longman.

Foley, K. E. (2017). The Global March for Science Started with a Single Reddit
Thread. Quartz. https://qz.com/965485/the-global-march-for-science-started-with-a-
single-reddit-thread/

Foucault, M. (1971). Orders of Discourse. Social Science Information, 10(2), 7–30.

Foucault, M. (1972). The Archaeology of Knowledge. London: Routledge.

Foucault, M. (1980). Power/Knowledge: Selected Interviews and Other Writings, 1972–


1977. New York: Pantheon Books.

franzke, aline shakti, Bechmann, A., Zimmer, M., Ess, C., and the Association of
Internet Researchers. (2020). Internet Research: Ethical Guidelines 3.0.
https://aoir.org/reports/ethics3.pdf

Freeman, L. C. (1977). A Set of Measures of Centrality Based on Betweenness.


Sociometry, 40(1), 35–41.
Freud, S. (1997). Writings on Art and Literature. Stanford, CA: Stanford University
Press.

Fuchs, C. (2008). Internet and Society: Social Theory in the Information Age. New
York: Routledge.

Fuchs, C. (2017). Social Media: A Critical Introduction. Los Angeles, CA: Sage.

Fuchs, C. and Mosco, V. (2016). Marx in the Age of Digital Capitalism. Leiden:
Brill.

Garber, M. (2017, 31 May). Spicer’s Razor. The Atlantic.


https://www.theatlantic.com/entertainment/archive/2017/05/spicers-razor/528750/

Geertz, C. (1973). The Interpretation of Cultures: Selected Essays. New York: Basic
Books.

Gergen, K. J. (1985). The Social Constructionist Movement in Modern Psychology.


American Psychologist, 40(3), 266–75.

Giglietto, F. and Rossi, L. (2012). Ethics and Interdisciplinarity in Computational Social Science. Methodological Innovations Online. https://doi.org/10.4256/mio.2012.003

Glaser, B. G. (1978). Theoretical Sensitivity: Advances in the Methodology of Grounded Theory. Mill Valley, CA: The Sociology Press.

Glaser, B. G. and Strauss, A. L. (1967). The Discovery of Grounded Theory:


Strategies for Qualitative Research. New York: Aldine de Gruyter.

Goldberg, J. (2017, 31 May). L’affaire Covfefe. National Review.


https://www.nationalreview.com/corner/covfefe-donald-trump-twitter-habits-white-
house/

González-Bailón, S. (2017). Decoding the Social World: Data Science and the
Unintended Consequences of Communication. Cambridge, MA: MIT Press.

Google (2013). Google Code Archive – Word2vec.


https://code.google.com/archive/p/word2vec/

Gottschalk, L. A. and Gleser, G. C. (1969). The Measurement of Psychological States


Through the Content Analysis of Verbal Behavior. Los Angeles, CA: University of
California Press.

Gramsci, A. (1971). Selections from the Prison Notebooks of Antonio Gramsci.


London: Lawrence and Wishart.

Grishman, R. and Sundheim, B. (1996). Message Understanding Conference-6: A Brief


History. In COLING 1996 Volume 1: The 16th International Conference on
Computational Linguistics (Vol. 1).

Hargittai, E. and Sandvig, C. (eds). (2015). Digital Research Confidential: The


Secrets of Studying Behavior Online. Cambridge, MA: MIT Press.

Harris, Z. S. (1954). Distributional Structure. WORD, 10(2–3), 146–62.

Healy, K. (2017). Fuck Nuance. Sociological Theory, 35(2), 118–27.

Hebdige, D. (1979). Subculture: The Meaning of Style. London; New York: Routledge.
Hern, A. (2018). Who Are the ‘Incels’ and How Do They Relate to Toronto Van Attack?
Guardian. https://www.theguardian.com/technology/2018/apr/25/what-is-incel-
movement-toronto-van-attack-suspect

Hesse-Biber, S. and Griffin, A. J. (2013). Internet-Mediated Technologies and Mixed


Methods Research: Problems and Prospects. Journal of Mixed Methods Research, 7(1),
43–61.

Hilgers, M. and Mangez, É. (2015). Bourdieu’s Theory of Social Fields: Concepts and
Applications. Abingdon, Oxon: Routledge.

Huffaker, D. (2010). Dimensions of Leadership and Social Influence in Online


Communities. Human Communication Research, 36(4), 593–617.

Hutchinson, A. (2018). Reddit Now Has as Many Users as Twitter, and Far Higher
Engagement Rates. Social Media Today. https://www.socialmediatoday.com/news/reddit-
now-has-as-many-users-as-twitter-and-far-higher-engagement-rates/521789/

Ito, M. (2008). Introduction. In K. Varnelis (ed.), Networked Publics (pp. 1–14).


Cambridge, MA: MIT Press.

Jacoby, W. G. (1991). Data Theory and Dimensional Analysis. London: Sage.

Jenkins, H. (2006). Convergence Culture. New York: New York University Press.

Jenkins, H., Ford, S., and Green, J. (2013). Spreadable Media: Creating Value and
Meaning in a Networked Culture. New York: New York University Press.

Jones, S. (1999). Doing Internet Research: Critical Issues and Methods for
Examining the Net. Thousand Oaks, CA: Sage.

Kahn, R. and Kellner, D. (2008). Technopolitics, Blogs, and Emergent Media


Ecologies: A Critical/Reconstructive Approach. In B. Hawk, O. O. Oviedo, and D. M.
Rieder (eds), Small Tech: The Culture of Digital Tools (pp. 22–37). Minneapolis:
University of Minnesota Press.

Karppi, T. (2011). Digital Suicide and the Biopolitics of Leaving Facebook.


Transformations, 20(2), 1–18.

Kelleher, J. D. and Tierney, B. (2018). Data Science. Cambridge, MA: MIT Press.

Keuschnigg, M., Lovsjö, N., and Hedström, P. (2018). Analytical Sociology and
Computational Social Science. Journal of Computational Social Science, 1(1), 3–14.

Kincheloe, J. L. (2005). On to the Next Level: Continuing the Conceptualization of


the Bricolage. Qualitative Inquiry, 11(3), 323–50.

Kleinman, Z. (2014, 30 September). ’Anti-Facebook’ Platform Ello Attracts


Thousands. BBC News. http://www.bbc.com/news/technology-29409541

Koerber, B. (2014). The TL;DR Guide to Reddit Lingo. Mashable.


https://mashable.com/2014/03/10/reddit-lingo-guide/

Kozinets, R. (2015). Netnography: Redefined. Los Angeles: Sage.

Kruskal, J. B. (1956). On the Shortest Spanning Subtree of a Graph and the


Traveling Salesman Problem. Proceedings of the American Mathematical Society, 7(1),
48–50.
Laclau, E. and Mouffe, C. (1985). Hegemony and Socialist Strategy: Towards a
Radical Democratic Politics. London: Verso.

Landers, E. (2017, 6 June). White House: Trump’s Tweets are ’Official Statements’.
CNN. https://www.cnn.com/2017/06/06/politics/trump-tweets-official-statements/
index.html

Latour, B. (1987). Science in Action: How to Follow Scientists and Engineers


through Society. Cambridge, MA: Harvard University Press.

Latour, B. (1999). Pandora’s Hope: Essays on the Reality of Science Studies.


Cambridge, MA: Harvard University Press.

Latour, B. (2005). Reassembling the Social: An Introduction to Actor-Network-


Theory. Oxford: Oxford University Press.

Latour, B. and Callon, M. (1981). Leviathan: How Actors Macro-Structure Reality and
How Sociologists Help Them to Do So. In K. Knorr-Cetina and A. V. Cicourel (eds),
Advances in Social Theory and Methodology: Toward an Integration of Micro- and
Macro-Sociologies (pp. 277–303). London: Routledge.

Latour, B. and Callon, M. (1992). Don’t Throw the Baby Out with the Bath School! A
Reply to Collins and Yearley. In A. Pickering (ed.), Science as Practice and
Culture (pp. 343–68). Chicago: University of Chicago Press.

Latour, B. and Woolgar, S. (1979). Laboratory Life: The Construction of Scientific


Facts. Princeton, NJ: Princeton University Press.

Law, J. (1999). After ANT: Complexity, Naming and Topology. Sociological Review,
47(1_suppl), 1–14.

Law, J. (2004). After Method: Mess in Social Science Research. London: Routledge.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., and
Alstyne, M. V. (2009). Computational Social Science. Science, 323, 721–3.

Lévi-Strauss, C. (1966). The Savage Mind. London: Weidenfeld and Nicolson.

Lévy, P. (1999). Collective Intelligence: Mankind’s Emerging World in Cyberspace.


Cambridge, MA: Perseus Books.

Liddle, D. (2012). Reflections on 20,000 Victorian Newspapers: ‘Distant Reading’


The Times using The Times Digital Archive. Journal of Victorian Culture, 17(2),
230–37.

Lindgren, S. (2013). New Noise: A Cultural Sociology of Digital Disruption. New


York: Peter Lang.

Lindgren, S. (2017). Digital Media and Society. London: Sage.

Lizardo, O. (2015). The End of Theorists: The Relevance, Opportunities, and Pitfalls of Theorizing in Sociology Today (Pamphlet based on the Lewis Coser Memorial Lecture, delivered at the 2014 Annual Meeting of the American Sociological Association). San Francisco, CA.

Lovink, G. (2002). Dark Fiber: Tracking Critical Internet Culture. Cambridge, MA:
MIT Press.
Lovink, G. (2013). A World Beyond Facebook. In G. Lovink and M. Rasch (eds), Unlike
Us Reader: Social Media Monopolies and their Alternatives (pp. 9–15). Amsterdam:
Institute of Network Cultures.

Lupton, D. (2014). Digital Sociology. Abingdon: Routledge.

Lyotard, J. F. (1984). The Postmodern Condition. Manchester: Manchester University


Press.

McAdam, D. and Sewell, W. H. (2001). It’s about Time. In R. R. Aminzade, J. A.


Goldstone, D. McAdam, E. J. Perry, W. H. Sewell, S. G. Tarrow, and C. Tilly (eds),
Silence and Voice in the Study of Contentious Politics (pp. 89–125). Cambridge:
Cambridge University Press.

McCarthy, E. D. (1996). Knowledge as Culture: The New Sociology of Knowledge.


London: Routledge.

McGill, A. (2017). Trump-twitter-classify. https://github.com/arm5077

McLaughlin, T. (1996). Street Smarts and Critical Theory: Listening to the


Vernacular. Madison: University of Wisconsin Press.

McLuhan, M. and Nevitt, B. (1972). Take Today: The Executive as Dropout. New York:
Harcourt Brace Jovanovich.

Malinowski, B. (1922). Argonauts of the Western Pacific: An Account of Native


Enterprise and Adventure in the Archipelagoes of Melanesian New Guinea. London:
Routledge.

Mannheim, K. ([1929] 1954). Ideology and Utopia: An Introduction to the Sociology


of Knowledge. London: Routledge and Kegan Paul.

Manovich, L. (2012). Trending: The Promises and the Challenges of Big Social Data.
In M. K. Gold (ed.), Debates in the Digital Humanities. Minneapolis: University of
Minnesota Press. http://dhdebates.gc.cuny.edu/

Margetts, H., John, P., Hale, S. A., and Yasseri, T. (2017). Political Turbulence:
How Social Media Shape Collective Action. Princeton: Princeton University Press.

Markham, A. N. (2013). Undermining ‘Data’: A Critical Examination of a Core Term in Scientific Inquiry. First Monday, 18(10).

Markham, A. N. and Baym, N. K. (2009). Internet Inquiry: Conversations About


Method. Los Angeles, CA: Sage.

Marres, N. (2017). Digital Sociology: The Reinvention of Social Research.


Cambridge: Polity.

Marres, N. and Gerlitz, C. (2016). Interface Methods: Renegotiating Relations


Between Digital Social Research, STS and Sociology. Sociological Review, 64(1), 21–
46.

Marwick, A. E. and Caplan, R. (2018). Drinking Male Tears: Language, the


Manosphere, and Networked Harassment. Feminist Media Studies, 18(4), 543–59.

Marx, K. (1844). Economic and Philosophic Manuscripts of 1844. Amherst, NY:


Prometheus Books.

Marx, K. ([1859] 1904). A Contribution to the Critique of Political Economy.


Chicago: Charles H. Kerr.

Massanari, A. (2017). #Gamergate and The Fappening: How Reddit’s Algorithm,


Governance, and Culture Support Toxic Technocultures. New Media Society, 19(3),
329–46.

Mayer-Schönberger, V. and Cukier, K. (2013). Big Data: A Revolution That Will


Transform How We Live, Work, and Think. Boston, MA: Eamon Dolan/Houghton Mifflin
Harcourt.

Merton, R. K. (1936). The Unanticipated Consequences of Purposive Social Action.


American Sociological Review, 1(6), 894–904.

Merton, R. K. (1976). Sociological Ambivalence and Other Essays. New York: Free
Press.

Meulman, J. J., Hubert, L. J., and Heiser, W. J. (1998). The Data Theory Scaling
System. In A. Rizzi, M. Vichi, and H.-H. Bock (eds), Advances in Data Science and
Classification (pp. 489–96). Springer, Berlin, Heidelberg.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of
Word Representations in Vector Space.

Milner, R. M. (2016). The World Made Meme: Public Conversations and Participatory
Media. Cambridge, MA: MIT Press.

Moore, G. (1965). Cramming More Components Onto Integrated Circuits. Electronics,


38(8), 114–17.

Moretti, F. (2013). Distant Reading. London: Verso.

Morozov, E. (2011). The Net Delusion: The Dark Side of Internet Freedom. New York:
Public Affairs.

Mouffe, C. (2000). The Democratic Paradox. London; New York: Verso.

Mulkay, M. and Gilbert, G. N. (1982). Accounting for Error: How Scientists


Construct Their Social World When They Account for Correct and Incorrect Belief.
Sociology, 16(2), 165–83.

Müller, A. C. and Guido, S. (2016). Introduction to Machine Learning with Python: A


Guide for Data Scientists. Beijing: O’Reilly Media.

Murtagh, F. (2018). The Geometry and Topology of Data and Information for Analytics
of Processes and Behaviours: Building on Bourdieu and Addressing New Societal
Challenges.

Papacharissi, Z. and de Fatima Oliveira, M. (2012). Affective News and Networked


Publics: The Rhythms of News Storytelling On #Egypt. Journal of Communication,
62(2), 266–82.

Peirce, C. S. (1878). Deduction, Induction, and Hypothesis. Popular Science


Monthly, 13, 470–82.

Pennebaker, J. W., Boyd, R. L., Jordan, K., and Blackburn, K. (2015). The
Development and Psychometric Properties of LIWC2015. University of Texas at Austin.

Phillips, W. (2015). This Is Why We Can’t Have Nice Things: Mapping the
Relationship Between Online Trolling and Mainstream Culture. Cambridge, MA: MIT
Press.

Phillips, W. and Milner, R. M. (2017). The Ambivalent Internet: Mischief, Oddity,


and Antagonism Online. Cambridge: Polity.

Pink, S. and Lanzeni, D. (2018). Future Anthropology Ethics and Datafication: Temporality and Responsibility in Research. Social Media and Society, 4(2).

Portes, A. (2000). The Hidden Abode: Sociology as Analysis of the Unexpected: 1999
Presidential Address. American Sociological Review, 65(1), 1.

Portwood-Stacer, L. (2013). Media Refusal and Conspicuous Non-Consumption: The


Performative and Political Dimensions of Facebook Abstention. New Media Society,
15(7), 1041–57.

Price, D. J. de S. (1986). Little Science, Big Science … and Beyond. New York:
Columbia University Press.

Purdam, K. and Elliot, M. (2015). The Changing Social Science Data Landscape. In P.
Halfpenny and R. Procter (eds), Innovations in Digital Research Methods (pp. 25–
58). London: Sage.

Rainie, H. and Wellman, B. (2012). Networked: The New Social Operating System.
Cambridge, MA: MIT Press.

Reddit (2012). R/IAmA – I Am Barack Obama, President of the United States – AMA.
Reddit.
https://www.reddit.com/r/IAmA/comments/z1c9z/i_am_barack_obama_president_of_the_uni
ted_states/

Reddit (2013). R/MuseumOfReddit – The Boston Bombing Debacle. Reddit.


https://www.reddit.com/r/MuseumOfReddit/comments/1iv343/the_boston_bombing_debacle/

Reddit (2014). R/MuseumOfReddit – The Saga of Mr Splashy Pants. Reddit.


https://www.reddit.com/r/MuseumOfReddit/comments/1xhkzb/the_saga_of_mr_splashy_pant
s/

Reddit (2015). R/MuseumOfReddit – The Fappening. Reddit.


https://www.reddit.com/r/MuseumOfReddit/comments/2pclqw/the_fappening/

Reddit (2019). Content Policy. https://www.redditinc.com/policies/content-policy

Rehurek, R. and Sojka, P. (2010). Software Framework for Topic Modelling with Large
Corpora. In Proceedings of the Lrec 2010 Workshop on New Challenges for Nlp
Frameworks (pp. 45–50).

Robson, C. and McCartan, K. (2016). Real World Research: A Resource for Users of
Social Research Methods in Applied Settings. Chichester: Wiley.

Robson, K. and Sanders, C. (eds). (2009). Quantifying Theory: Pierre Bourdieu.


Dordrecht: Springer.

Rogers, R. (2013). Digital Methods. Cambridge, MA: MIT Press.

Roose, H., Roose, W., and Daenekindt, S. (2018). Trends in Contemporary Art
Discourse: Using Topic Models to Analyze 25 years of Professional Art Criticism.
Cultural Sociology, 12(3), 303–24.

Roth, S. (2019). Digital Transformation of Social Theory. A Research Update.


Technological Forecasting and Social Change, 146, 88–93.

Russell, M. A. (2018). Mining the Social Web. Sebastopol, CA: O’Reilly Media.

Salganik, M. J. (2018). Bit by Bit: Social Research in the Digital Age. Princeton:
Princeton University Press.

Salter, M. (2018). From Geek Masculinity to Gamergate: The Technological


Rationality of Online Abuse. Crime, Media, Culture, 14(2), 247–64.

Salton, G., Wong, A., and Yang, C. S. (1975). A Vector Space Model for Automatic
Indexing. Communications of the ACM, 18(11), 613–20.

Sampson, T. D. (2012). Virality: Contagion Theory in the Age of Networks.


Minneapolis: University of Minnesota Press.

Scheler, M. (1924). Problems of a Sociology of Knowledge. London: Routledge.

Schroedl, C. (2019). Minimum Spanning Tree Plugin for Gephi.


https://gephi.org/plugins/#/plugin/spanning-tree-plugin

Schutt, R. and O’Neil, C. (2013). Doing Data Science. Beijing: O’Reilly Media.

Shackman, G., Wang, G., and Liu, Y.-L. (2002). How Societies Change: A Study of
Changes in Social, Economic and Political Structures.
http://gsociology.icaap.org/report/.

Shannon, P. (2003). Cytoscape: A Software Environment for Integrated Models of


Biomolecular Interaction Networks. Genome Research, 13(11), 2498–504.

Shifman, L. (2014). Memes in Digital Culture. Cambridge, MA: MIT Press.

Silverman, M. (2012). Reddit: A Beginner’s Guide. Mashable.


https://mashable.com/2012/06/06/reddit-for-beginners/

Simmel, G. (1895). The Problem of Sociology. Annals of the American Academy of


Political and Social Science, 6, 52–63.

Simmel, G. (1950). The Sociology of Georg Simmel (ed. Kurt H. Wolff). Glencoe, IL.

Simon, F. (2018). The Big Data Panic. Medium.


https://medium.com/@FelixSimon/cambridge-analytica-and-the-big-data-panic-
5029f12e1bcb

Singer, P., Flöck, F., Meinhart, C., Zeitfogel, E., and Strohmaier, M. (2014).
Evolution of reddit: From the Front Page of the Internet to a Self-Referential
Community? In Proceedings of the 23rd International Conference on World Wide Web –
WWW ’14 Companion (pp. 517–22). Seoul, Korea: ACM Press.

spaCy. (2019). Annotation Specifications · spaCy API Documentation.


https://spacy.io/api/annotation

Spector, N. (2017). Hipster Internet Favorite Reddit May Have to Lose its Edge to
Go Public. NBC News. https://www.nbcnews.com/tech/tech-news/hipster-internet-
favorite-reddit-may-have-lose-its-edge-go-n824866

Suler, J. (2004). The Online Disinhibition Effect. CyberPsychology and Behavior,


7(3), 321–6.
Surowiecki, J. (2005). The Wisdom of Crowds: Why the Many Are Smarter Than the Few.
London: Abacus.

Swidler, A. and Arditi, J. (1994). The New Sociology of Knowledge. Annual Review of
Sociology, 20(1), 305–29.

Tarde, G. (1903). The Laws of Imitation. New York: Henry, Holt and Co.

Tausczik, Y. R. and Pennebaker, J. W. (2010). The Psychological Meaning of Words:


LIWC and Computerized Text Analysis Methods. Journal of Language and Social
Psychology, 29(1), 24–54.

TensorFlow. (2019). Embeddings. https://www.tensorflow.org/guide/embedding

tensorflow, G. (2019). TensorFlow: An Open Source Machine Learning Framework for


Everyone. https://github.com/tensorflow/tensorflow

Thorsen, E. and Allan, S. (2014). Citizen Journalism: Global Perspectives Volume 2.


New York: Peter Lang Publishing, Inc.

Tirosh, N. (2017). Reconsidering the ‘Right to Be Forgotten’: Memory Rights and the
Right to Memory in the New Media Era. Media, Culture and Society, 39(5), 644–60.

Törnberg, A. (2017). The Wicked Nature of Social Systems: A Complexity Approach to


Sociology. Gothenburg: University of Gothenburg.

Trepte, S. and Reinecke, L. (eds). (2011). Privacy Online: Perspectives on Privacy and Self-Disclosure in the Social Web. Berlin, Heidelberg: Springer Berlin Heidelberg.

Trow, M. (1957). Comment on ’Participant Observation and Interviewing: A


Comparison’. Human Organization, 16(3), 33.

Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.

Tweepy. (2009). Streaming With Tweepy – tweepy 3.3.0 documentation.


http://docs.tweepy.org/en/v3.4.0/streaming_how_to.html

Urban Dictionary (2018). Data Piñata. https://www.urbandictionary.com/define.php?


term=data%20pi%C3%B1ata

Urban Dictionary (2019). Social Justice Warrior.


https://www.urbandictionary.com/define.php?term=Social%20Justice%20Warrior

Van Dijck, J. and Poell, T. (2013). Understanding Social Media Logic. Media and
Communication, 1(1), 2–14.

Wacquant, L. J. D. (1992). Towards a Social Praxeology: The Structure and Logic of Bourdieu’s Sociology. In L. J. D. Wacquant and P. Bourdieu, An Invitation to Reflexive Sociology (pp. 1–59). Chicago: University of Chicago Press.

Waeraas, A. and Nielsen, J. A. (2016). Translation Theory ‘Translated’: Three


Perspectives on Translation in Organizational Research: Translation Theory
‘Translated’. International Journal of Management Reviews, 18(3), 236–70.

Warner, M. (2002). Publics and Counterpublics. New York: Zone Books.

Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and


Applications. Cambridge: Cambridge University Press.
Webb, J., Schirato, T., and Danaher, G. (2002). Understanding Bourdieu. London:
Sage.

Weber, M. ([1921] 1978). Economy and Society: An Outline of Interpretive Sociology


(Vol. 1). Berkeley, CA: University of California Press.

Wendling, M. (2018). Alt-Right: From 4chan to the White House. London: Pluto Press.

Wetherell, M. and Potter, J. (1992). Mapping the Language of Racism: Discourse and
the Legitimation of Exploitation. Hemel Hempstead: Harvester Wheatsheaf.

Wikipedia. (2018). Big Data. Wikipedia. https://en.wikipedia.org/wiki/Big_data

Wikipedia. (2019a). Controversial Reddit Communities. Wikipedia.


https://en.wikipedia.org/w/index.php?
title=Controversial_Reddit_communities&oldid=894715266

Wikipedia. (2019b). Cosine Similarity. Wikipedia.


https://en.wikipedia.org/wiki/Cosine_similarity

Wikipedia. (2019c). Feminazi. Wikipedia. https://en.wikipedia.org/w/index.php?


title=Feminazi&oldid=897276161

Wikipedia. (2019d). Men Going Their Own Way. Wikipedia.


https://en.wikipedia.org/w/index.php?title=Men_Going_Their_Own_Way&oldid=896734670

Wikipedia. (2019e). Reddit. Wikipedia. https://en.wikipedia.org/w/index.php?


title=Reddit&oldid=893855505

Wikipedia. (2019f). Social Justice Warrior. Wikipedia.


https://en.wikipedia.org/w/index.php?title=Social_justice_warrior&oldid=897053380

Williams, M. L., Burnap, P., and Sloan, L. (2017). Towards an Ethical Framework for
Publishing Twitter Data in Social Research: Taking into Account Users’ Views,
Online Context and Algorithmic Estimation. Sociology, 51(6), 1149–68.

Williams, Z. (2018). ‘Raw Hatred’: Why the ‘Incel’ Movement Targets and Terrorises
Women. Guardian. https://www.theguardian.com/world/2018/apr/25/raw-hatred-why-
incel-movement-targets-terrorises-women

yWorks (2019). yFiles Layout Algorithms for Cytoscape.


https://www.yworks.com/products/yfiles-layout-algorithms-for-cytoscape

Index

4chan 39, 119

abduction 160

Abel, Carl 58

abstraction 170

actor-network theory (ANT) 3, 47, 75–90


actors/actants 77, 80–2

blackboxing 79–80

chains of association 4, 77

connections 83–8

enrolment 79

interessement 79

machine learning 4, 84

mapping actor-worlds 83–8

mobilisation of spokespersons 79

NER analysis 4, 80–2, 83, 84, 86, 88

problematisation 78–9

translation 77–80

visualisation 84

Ahmed, Sara 64

alienation 18–19

alt-right 55, 117–18, 119, 122

ambiguous social media practice 42, 44, 47, 49–50, 51, 58, 69, 73

ambivalence 3, 49, 57–60

anarchism 20–1, 26, 90

anonymity 39, 41, 102–3, 173

Arab Spring 40

articulation 120, 121, 124

assemblage theory 5, 21, 125–6, 128

attention capital 141, 142

Bauman, Zygmunt 72–3

big data 5, 9, 10–11, 13, 30–1, 166–7

defining 10–11

mythology of 11
#BlackLivesMatter 40

blackboxing 79–80

Boston Marathon Bombings 101

Bourdieu, Pierre 2

capital 131, 134–43

eclecticism 171–2

habitus 131, 133–5, 139, 140, 141, 142

social fields 131, 132–3, 134–5, 143, 168–70

theory of social practice 5, 131, 132, 171–2

bricolage 29–30, 149

Callon, Michael 3, 75, 77–80, 81, 83–5, 86

see also actor-network theory

Cambridge Analytica 6, 152–4, 155–6

capital

attention capital 141, 142

Bourdieu’s notion of 131, 134–43

connective capital 140, 142

cultural capital 134, 138–9, 140, 168

economic capital 134, 136, 139, 140, 141, 168

engagement capital 140, 141, 142

social capital 134, 137, 139, 140, 141

symbolic capital 5, 134, 136–7, 141, 142

Castells, Manuel 8, 46, 130, 159

chains 116–18

of association 4, 77

of equivalence 124, 125, 128


Citizen Kane 62

climate

on Reddit 109, 126, 127

collective representations 4–5, 91–128

see also assemblage theory; discourse theory; Reddit; word embedding

computational methods 1, 2, 9, 13, 92, 118, 154–5, 158

computational social science 2, 14

computational text analysis 3, 91, 96, 99, 155–8

connective action 8

connective capital 140, 142

corpus 92, 93, 94, 96

counterpublics 71

covfefe tweet 3, 54–74, 172

error 60–4, 69–70

flowchart 70

immediacy of interest 64–5, 67, 69, 70

social media backfire 69–74

cultural capital 134, 138–9, 140, 168

Cytoscape software 84

data mining 23

data piñata 22–6, 155, 175

definition 23

data science 12–13, 20, 22–3, 92

as data-driven 23, 26

as ethnography 148–65
data theory 14–16

statistics 21–2

use of term 21

datafication 7, 8, 9–14

Dawkins, Richard 43

deduction 91, 147, 160

deep data 24–5

deep mediatisation 8, 31

#deletefacebook 6, 152–5, 162

Deleuze, Gilles 5, 125, 126

Denzin, Norman 29, 30, 149

digital sociology 15, 167

discourse theory 119–24

articulation 120, 121, 124

chains of equivalence 124, 125, 128

discourse 119, 120–6

elements 120, 122

high discourse 39

moments 120, 121

nodal points 120, 121, 124, 126

word2vec and 119–24

disinhibition online 102

disruptive spaces 37, 39, 41–2, 71

distant reading 97–100

distributional semantics 93–4, 96, 117

Douglas, Mary 59, 113

Durkheim, Émile 47–9, 50–1, 52–3, 109, 118


Durkheimian approach 4, 92

economic capital 134, 136, 139, 140, 141, 168

elements 120, 122

emoji 115, 116

empirical interest 151, 152–4

engagement capital 140, 141, 142

ethical amnesia 40

ethics 172–3

ethnography

conventional 148

data science as 148–65

Evidenz concept 17, 52

Facebook see #deletefacebook

feminism and related terms 110–12, 117, 121–3

Feyerabend, Paul 7–8

anarchism 20–1, 26

field theory

social fields 131, 132–3, 134–5, 143, 168–70

Foucault, Michel 89–90, 119, 156–7, 168

Freud, Sigmund 57–8, 89

Geertz, Clifford 33, 149, 150

Gephi software 84, 110, 124–5

Minimum Spanning Tree plugin 84

Gergen, Kenneth 116–17


Glaser, Barney 6, 20, 150, 170–1

grounded theory 20, 145, 146, 148, 160

theoretical I/O 158–65

theoretical sensitivity 20, 25–6, 144–8, 154

González-Bailón, Sandra 49, 53, 150

Google News 105–6, 107, 111, 112

Gottschalk-Gleser method 156

Gramsci, Antonio 36, 120

grounded theory 20, 145, 146, 148, 160

Guattari, Félix 5, 125, 126

habitus 131, 133–5, 139, 140, 141, 142

hacking 27, 82, 83, 88, 122, 126

Harris, Zellig 93–4

hegemony 36, 37, 120

ideal types 45–7, 72–3

Indignados Movement 40

induction 91, 145, 160

instruments of revelation 34–5

interface methods 15–16

interpretive interface 30–2

Kozinets, Robert 148, 150

Kruskal, Joseph 84

Laclau, Ernesto 2, 5, 114

discourse theory 119–24

Latent Dirichlet Allocation (LDA) 4, 84, 96, 143

Latent Semantic Analysis (LSA) 95–6

Latour, Bruno 3, 27, 47, 75, 76–9, 80, 81

see also actor-network theory


Law, John 3, 7, 32, 44, 75, 82, 167

see also actor-network theory

Lévi-Strauss, Claude 29

Lincoln, Yvonna 29, 30, 149

LIWC (Linguistic Inquiry and Word Count) 155–8, 161, 162

Lizardo, Omar 143, 168, 170–1

Luckmann, Thomas 109

Lupton, Deborah 11, 15

Lyotard, François 57

machine learning 5, 12, 125, 126, 143

neural networks 99

McLaughlin, Thomas 89, 167–8, 170

McLuhan, Marshall 63

Malinowski, Bronislaw 150

Mannheim, Karl 89, 110, 113–14, 117, 118

manosphere 123

March for Science 101

Margetts, Helen 9

Marres, Noortje 2, 3, 15–16, 86

Marx, Karl 89

critical theory 6, 159–60, 162–3

sociology of knowledge 107–8

theory of alienation 18–19

mass self-communication 8

Mauss, Marcel 118


memes 42–4, 47, 50, 73

Merton, Robert K. 3, 55, 57, 58, 72

error 60

immediacy of interest 64–5, 67, 69, 70

unanticipated consequences 60, 64–5, 67, 68

messiness 32, 47, 71–2, 149–50

methodological inclination 151, 155–8

#MeToo 40

Mikolov, Tomas 92, 98

mobilisation 38, 40, 41, 78, 79, 115

moments 120, 121

Moore’s Law 12

Morozov, Evgeny 40–1, 45

Mouffe, Chantal 2, 5, 36, 114

discourse theory 119–24

Named Entity Recognition (NER) 4, 80–2, 83, 84, 86, 88

netnography 148, 150

networked individualism 8

networked publics 57, 58, 63–4, 67

neural networks 98–9

nodal points 120, 121, 124, 126

nodes 116–18

Obama, Barack 40, 55, 101

#Occupy 40

online disinhibition effect 102

open-source theories 4, 89

Papacharissi, Zizi 131

Peirce, Charles Sanders 160


Pennebaker, James W. 155

Phillips, Whitney 37–8, 45–7, 49

philosophy of science 2, 7

platforms 8–9, 42

politics 36–53

see also social media politics

Potter, Jonathan 105

Price, Derek J. de Solla 34–5

Principal Component Analysis (PCA) 121

problematisation 78–9

pushshift.io 104–5

qualitative and quantitative methods 13, 20, 28–9, 173

interpretive interface 30–2

Reddit

alt-right discourse 117–18, 119, 122

anonymity 102–3

climate on 109, 126, 127

content policy 103–4

corpus 106, 112

culture 101–2, 104

discourse theory and 119–24

feminism and related terms 110–12, 117, 121–3

language of 104–16

nodes and chains 116–18

rating system 100

subreddits 100, 104

terrorism on 105–7, 126, 127


rhizomes 5, 125

Rogers, Richard 33–4, 167

semantic meanings 93–4

sensitising concepts 164

Simmel, Georg 18, 50–2

social capital 134, 137, 139, 140, 141

social change 38, 39, 41

social cryptography 52–3

social fields 131, 132–3, 134–5, 143, 168–70

social forms 18, 50–2

social media discourse 130–1

social media ecology 8, 50

social media politics 9, 17, 39–42

ambivalence 37–8, 41, 45

online campaigning 40

social status 5, 138

sociality 8, 31–2, 47, 49, 109

sociology of knowledge 4–5, 91–2, 107–20, 128

definitions 107–9

spaCy library 80–8

spreadable media 63–4

stickiness 64

Strauss, Anselm 20, 26, 164

subcultures 71, 94, 132

surface data 24–5

Swedish general election tweets 129–43

symbolic capital 5, 134, 136–7, 141, 142


synonyms 17–18

Tarde, Gabriel 43

technological determinism 40, 72

text analysis 3, 91, 96, 99, 155–8

text mining 5, 92, 97

theoretical I/O 158–65

theoretical sensitivity 6, 20, 25–6, 144–8

theory

end of theory/theorists 168, 175

Foucault on 89–90

global theories 89–90, 168

McLaughlin on 89

old and new theories 17–19, 72

open-source theories 4, 89

ownership of 88–90

vernacular theory 89, 168, 170

thick descriptions 33

translation 77–80

trolling 45–7

Trump, Donald 3

climate change denial 101

covfefe tweet 3, 54–74, 172

presidential campaign 41, 55, 59

Twitter-use 55–6

Tukey, John W. 12

Tweepy 75, 130

Twitter
actor-network analysis 75–90

covfefe tweet 3, 54–74, 172

#deletefacebook 6, 152–5, 162

social space of political tweets 139–43

Streaming API 75, 80, 130, 154

Swedish general election tweets 129–43

Umbrella Revolution 40

unanticipated consequences

causality and 68

error and 60–4

immediacy of interest 64–5

Merton on 60, 64–5, 67, 68

rationalizations 68

Urban Dictionary 23, 117

vector space model 4, 94–5, 128

Latent Semantic Analysis (LSA) 95–6

vernacular theory 89, 168, 170

Verstehen concept 16–17, 33, 52

virality 42–4

visualisation 4, 84, 121

Weber, Max 53, 58, 149–50

Evidenz 17, 52

ideal types 45–7, 72–3

Verstehen 16–17, 33, 52

weirdness of internet culture 37, 46–7, 50

Wellman, Barry 8
Weltanschauung 117

Wetherell, Margaret 105

wickedness 8–9

Wikipedia 10, 101

word2vec 110, 124, 191

discourse theory and 119–24

Google News model and 105–6, 107, 111, 112

method of 93

neural network model 98–9

queries 108–9

World of Warcraft 39

yFiles orthogonal layout algorithm 84

POLITY END USER LICENSE AGREEMENT

Go to www.politybooks.com/eula to access Polity's ebook EULA.
