Syllabus
Data Wise:
Data Science in Society
Data as evidence
Academic Integrity
Proper Referencing
Contact information
Weekly schedule
Week 1 Introduction to Data as Evidence: Characteristics of Data and Big Data Myths
This course introduces a set of concepts to assess how data shapes science, policy, and politics,
including how data is turned into metrics that are used to make decisions. Using a series of concrete
examples from areas such as sports, health and climate change, it connects data as a highly
technological practice to broad social questions of evidence.
In this course, we will focus specifically on the recent changes in how we use data as evidence. We
will analyze the recent growth in the types and amount of data (datafication) and the different ways data
can be used as evidence. By the end of the course, you will master the following concepts: causation,
correlation, description, features, probability, sampling, model, population. You will also learn to
analyze data cycles and to map out knowledge production systems. A number of current issues will
also be examined from the perspective of data: fake news, bubbles, algorithmic discrimination, and the
prominence of data in anti-pandemic measures. Through the combination of conceptual tools and
assignments, you will be well equipped to assess data sets and to address what you can and cannot do
with them.
In particular, the course will address the conceptual and analytic tools you need to develop good
critical data skills.
The elective course provides a theoretical basis for your Collaborative Data Projects and connects to
other courses through the centrality of data in all the questions addressed.
Learning goals
Upon the successful completion of this course, students will be able to:
1) Understand data production and data journeys and their implications
2) Understand technical, functional and epistemic dimensions of data and their interactions
3) Evaluate data as evidence in relation to knowledge needs
Course structure
The course runs for 6 weeks. There is one class of two hours every week.
The course accounts for 2.5 credits (70 hours in total), distributed across the following elements:
12 hrs lectures and in-class sessions
18 hrs reading and preparation for the lectures
18 hrs assignment 1
18 hrs assignment 2
4 hrs preparation of debate
Brightspace
We use the virtual learning environment “Brightspace” as the main platform for communication and
Blackboard Collaborate for the online classes. Here, you’ll find recommended literature, information
on assignments and your grades. Announcements regarding schedule or content changes will also
be published in Brightspace.
All essential information about the course can be found in this syllabus. However, as we reserve the
right to change the syllabus, please keep track of Brightspace for the most up-to-date information.
Assessment
In this course, two assignments and one in-class activity will be assessed, as well as participation. All
aim to provide you with the opportunity to practice critical analysis skills and to receive feedback on
these. Such skills are essential to data work: they will enable you to perform really valuable work.
Many people can answer data questions; far fewer can make a strong case for why a particular
data set is suitable to answer a particular question, or improve the formulation of questions to
make sure that the important questions, the right questions, are being asked.
In all cases, the data and tasks may seem simple: this is deceptive! The assignment is not meant to
test your data crunching skills but to provide an opportunity to slow down and consider critically, in
detail, the assumptions, erasures, biases and constraints that arise in each step taken. Carefully
analyzing and reflecting on these aspects will enable you to develop basic skills that you will then be
able to apply to more complex situations.
The final grade is based on two assignments, each worth 30%, preparation for and performance in a
debate (20%), and participation (20%). See appendix.
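As an illustration of how these weights combine, here is a minimal sketch. The weights come from this syllabus; the 1-10 grading scale, the function name `final_grade` and the example grades are assumptions made only for this illustration, and the official calculation may differ (for instance in rounding).

```python
# Weights as stated in this syllabus.
WEIGHTS = {
    "assignment_1": 0.30,
    "assignment_2": 0.30,
    "debate": 0.20,
    "participation": 0.20,
}


def final_grade(grades):
    """Return the weighted average of the four assessed components."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


# Hypothetical grades on an assumed 1-10 scale.
example = {
    "assignment_1": 7.0,
    "assignment_2": 8.0,
    "debate": 6.5,
    "participation": 9.0,
}
print(round(final_grade(example), 2))
```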
Academic Integrity
Cheating and plagiarism are academic offences, with severe consequences. They are acts or
omissions by students to partly or wholly hinder accurate assessment. As per the Teaching and
Examination Regulations, cases of cheating and plagiarism are reported to the Exam Board, and the
Board will decide upon the consequences.
Proper Referencing
Use an accepted academic referencing system such as author-date, Vancouver-numeric, etc. Different
fields use different referencing systems because different systems are more suitable for the specific
kind of scientific communication (articles, book) and the particular type of sources in a field (artworks,
legal texts, books, articles, patents, images).
If you directly quote a source (for example, take over a piece of text or passage in an article), you
must indicate this with quotation marks at the beginning and end of the quotation, and refer to the
source, including page number. If you paraphrase a source (write up what it says in your own words),
you must also indicate the source in the text with a reference. If you put forth particular facts or
statements about situations or phenomena that are not ‘common knowledge’, you must also provide
a reference.
It is NEVER suitable to simply add a list of sources at the end of a text. Proper referencing has two
components:
(1) You must indicate IN THE TEXT whenever a source is used
(2) You must provide a list of these sources
If you do not do this, you make it impossible for the reader to evaluate the originality and strength of
your work. If you do not properly reference your sources, this amounts to plagiarism and will be
treated as such. This means your assignment will be referred to the Exam Board. Penalties for
plagiarism are serious and include (but are not limited to) an automatic fail for the assignment,
exclusion from the course, and the impossibility of graduating with distinction.
Objectives
Students will be able to
• Understand recent dynamics in modes of data production (including technological,
epistemic, economic, cultural and institutional dimensions)
• Recognize different discourses on data (big data hype, etc.)
Preparation
Readings must be done BEFORE class. That way, we can have more in-depth discussions during our
sessions. There is a bit more required reading this week! Be sure to take enough time to do all of it
and start the course off in a position of strength.
Required reading
Chapter 3, Characteristics of Data, in Beaulieu, A., & Leonelli, S. (2021). A Critical Introduction to
Data and Society. SAGE Publications Ltd.
Boyd, D., & Crawford, K. (2011). Six Provocations for Big Data (SSRN Scholarly Paper No. ID
1926431). Retrieved from Social Science Research Network website:
https://papers.ssrn.com/abstract=1926431
Alaimo, Cristina, and Jannis Kallinikos. 2017. ‘Computing the Everyday: Social Media as Data
Platforms’. The Information Society 33 (4): 175–91.
https://doi.org/10.1080/01972243.2017.1318327.
Beer, D. (2015). Productive measures: Culture and measurement in the context of everyday
neoliberalism. Big Data & Society, 2(1), 205395171557895.
https://doi.org/10.1177/2053951715578951
In this first class, we will briefly revisit the concept of datafication and the intensification of data in
many spheres of life (from Introduction to Data). We will quickly move on to address data creation and
the characteristics of data. Finally, we will review a number of big data myths and critically assess
them. Assignment 1 will be explained.
Objectives
Students will be able to:
• Analyze the context of data (platforms, creation) and how it shapes the meaning of data
• Apply the concepts of traces versus observations and measurements, and of samples, populations
and bias, to data they encounter
• Distinguish between relational and representational views of data
• Assess the limitations of data sets, including the issue of dark data and
phenomena such as Simpson’s paradox
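To make Simpson’s paradox concrete, here is a minimal sketch. All counts are made up for illustration, and the treatments "A" and "B" and the groups "mild" and "severe" are hypothetical. Treatment A has the higher success rate within each group, yet B looks better once the groups are pooled, because the two treatments were applied to very different mixes of cases.

```python
# Made-up counts (successes, trials) for two hypothetical treatments.
data = {
    "A": {"mild": (18, 20), "severe": (48, 80)},
    "B": {"mild": (64, 80), "severe": (10, 20)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all groups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return rate(successes, trials)


# Compare the treatments within each group, then on the pooled data.
for group in ("mild", "severe"):
    a = rate(*data["A"][group])
    b = rate(*data["B"][group])
    print(f"{group:>7}: A={a:.0%}  B={b:.0%}")
print(f"overall: A={overall_rate(data['A']):.0%}  B={overall_rate(data['B']):.0%}")
```

Which comparison counts as the right evidence depends on the question being asked and on how the data were created, not on the arithmetic alone.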
Preparation
Required reading
There are three required readings for this week. If you are quite new to data science (and actually, this
is a good refresher/overview for everyone) you might want to start by reading this document:
Data Science: A guide for society
https://www.askforevidence.org/articles/data-science-a-guide-for-society
Chapter 4, Data, Evidence and Knowledge in Beaulieu, A., & Leonelli, S. (2021). A Critical
Introduction to Data and Society. SAGE Publications Ltd.
Rieder, Gernot, and Judith Simon. 2016. ‘Datatrust: Or, the Political Quest for Numerical Evidence
and the Epistemologies of Big Data’. Big Data & Society 3 (1): 2053951716649398.
https://doi.org/10.1177/2053951716649398.
Content
We will explore what it means to use data as evidence. What are our criteria for evidence, and how
does data fulfill them? Are there other types of evidence? What are the differences between observation
and interpretation in knowledge production? We will further build on the insights about data creation
and provide concepts to name and navigate some of the ways in which data gets used (features
of data sets) and valued as evidence (objectivity, quantification).
Also in this session, we will address key concepts in using data for knowing and discuss how data
creation relates to what you want to do and can do with data—and vice versa. We will look at a
number of cases where we ‘know’ something on the basis of data and explore where trust in this
knowledge comes from. We will work towards a layered understanding of sampling and populations,
of the relation between technology and knowledge, of correlation and causation, prediction and
pattern detection, and of how black boxes play a role in data ecosystems (as basis for assignment 2).
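The relation between sampling and populations can be illustrated with a small sketch. The population below is entirely hypothetical: ten people, some of whom own a fitness tracker. If we only have data from the trackers, the sample describes tracker owners, not the whole population.

```python
from statistics import mean

# Hypothetical population: (owns_tracker, daily_steps) for ten people.
population = [
    (True, 9000), (True, 11000), (True, 10000), (True, 12000),
    (False, 4000), (False, 5000), (False, 6000), (False, 3000),
    (False, 7000), (False, 5000),
]

# What we would like to know: the mean over everyone.
population_mean = mean(steps for _, steps in population)

# What the data can tell us: only people whose steps were logged by a tracker.
tracker_sample = [steps for owns, steps in population if owns]
sample_mean = mean(tracker_sample)

print(f"population mean: {population_mean}")
print(f"tracker-only sample mean: {sample_mean}")
```

The gap between the two means comes from how the data were created (who ends up in the data set), not from any error in the calculation.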
Objectives
Students will be able to
• Understand how data circulates and is put to work
• Apply concepts to describe and assess how data shapes science, policy, and politics
(causation, correlation, prediction, description, features, probability, sampling, model,
population, black boxes)
• Distinguish the role of models, conventions, infrastructure, visualization and curation in
turning data into evidence
In our third week, we will focus on the importance of circulation for the meaning and value of data. A
number of concepts will be discussed that help us understand this circulation. Data
visualization will also be an important topic, as a way of linking the issues of evidence and knowledge
we discussed in week 2 to the dynamics of circulation of data.
Preparation
Required reading
Chapter 5, Data Circulation in Beaulieu, A., & Leonelli, S. (2021). A Critical Introduction to Data and
Society. SAGE Publications Ltd.
Optional reading
Hand, David. 2019. ‘What Is the Purpose of Statistical Modelling?’ Harvard Data Science Review 1 (1).
https://doi.org/10.1162/99608f92.4a85af74.
Objectives
Students will be able to:
1) Critically assess the way data shapes science, policy, and politics, including how data are turned
into metrics that are used for decisions
3) Assess data sets for what you can and cannot do with them, how they might lead to outcomes that
can be negative (fake news, algorithmic discrimination) or positive (removing barriers, greater
efficiency and fairness)
4) Recognize how guidelines, codes of conduct and regulations apply to different data processes
Preparation
Required reading
Zook, M., Barocas, S., Boyd, D., Crawford, K., Keller, E., Gangadharan, S. P., … Pasquale, F. (2017).
Ten simple rules for responsible big data research. PLOS Computational Biology, 13(3),
e1005399. https://doi.org/10.1371/journal.pcbi.1005399
Recommended reading
Jasanoff, Sheila. 2017. ‘Virtual, Visible, and Actionable: Data Assemblages and the Sightlines of
Justice’. Big Data & Society 4 (2): 2053951717724477.
https://doi.org/10.1177/2053951717724477.
Schellenberg, S. 2020. How biased algorithms perpetuate inequality, The New Statesman, 29-04-2020.
https://www.newstatesman.com/science-tech/2020/04/how-biased-algorithms-perpetuate-inequality
Content
In this session, we will discuss how data is increasingly used to make decisions that have implications
for weighty issues: healthcare, educational and employment opportunities, financial services and
the ability to move across borders and in public and private spaces. Using a number of case studies, we
will discuss the current ways in which data is regulated and what might be needed to develop
responsible and socially just uses of data.
We will also explore why the concepts of opacity and transparency dominate current debates about
algorithms and AI. Several examples will be discussed (data gathering, contact tracing apps, etc.).
Objectives
Students will be able to
1) Apply the models and concepts learned during the course to argue specific positions with
regard to data issues
2) Demonstrate their grasp of the complex dynamics that shape how we value data and the
processes that rely on data
3) Debate an assigned position, listen to dissenting arguments, and counter the points brought
forward in the course of the debate.
Preparation
Required reading
Artificial Intelligence and Gender Bias. (n.d.). Retrieved 24 August 2019, from Catalyst website:
https://www.catalyst.org/research/trend-brief-gender-bias-in-ai/
Hao, K. (n.d.). This is how AI bias really happens—And why it’s so hard to fix. Retrieved 24 June 2019,
from MIT Technology Review website:
https://www.technologyreview.com/s/612876/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/
The Correspondent, 2020. The original Big Tech is working closer than ever with governments to
combat coronavirus – with no scrutiny
https://thecorrespondent.com/621/the-original-big-tech-is-working-closer-than-ever-with-governments-to-combat-coronavirus-with-no-scrutiny/81373317498-76dea099
The Correspondent, 2020. Coronavirus apps show governments can no longer do without Apple or
Google
https://thecorrespondent.com/546/coronavirus-apps-show-governments-can-no-longer-do-without-apple-or-google/19583175612-bc49b1e0
O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens
Democracy. Crown/Archetype. Chapter 1
Neff, G., Tanweer, A., Fiore-Gartland, B., & Osburn, L. (2017). Critique and Contribute: A Practice-Based
Framework for Improving Critical Data Studies and Data Science. Big Data, 5(2), 85–97.
https://doi.org/10.1089/big.2016.0050
In this class, we will hold a debate on topics 1 and 2. You must attend class and take part in the
debate to be assessed on this part of the course.
Preparation
In this class, we will hold a debate on topics 3 and 4. You must attend class and take part in the
debate to be assessed on this part of the course.
For this assignment, you will investigate two elements that go into using data as evidence and assess
which conclusions you can draw. In this report, use concepts from lectures and readings from the first
four weeks of the course.
The case at the heart of this assignment is the calculation of the carbon footprint of your latest
holiday. In this assignment, you will have to
(1) Generate data on the relevant aspects of your latest holiday
(2) Use at least three different carbon calculators to analyse this data
(3) Put forth a conclusion on your carbon footprint based on this data and the carbon calculation
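The three steps above can be sketched as follows. This is only an illustration of why different calculators can disagree: the trip record, the calculator names (`calc_a`, `calc_b`, `calc_c`) and every emission factor are invented for this example and are not taken from any real calculator.

```python
# Step 1: a hypothetical record of a holiday (the data you generate).
trip = {"flight_km": 2400, "train_km": 300, "hotel_nights": 6}

# Step 2: three hypothetical calculators. Every factor (kg CO2e per unit)
# is invented for illustration; real calculators use their own assumptions.
calculators = {
    "calc_a": {"flight_km": 0.15, "train_km": 0.04, "hotel_nights": 10.0},
    "calc_b": {"flight_km": 0.25, "train_km": 0.03, "hotel_nights": 20.0},
    "calc_c": {"flight_km": 0.20, "train_km": 0.05},  # leaves out accommodation
}


def footprint(trip, factors):
    """Total kg CO2e; aspects a calculator does not cover count as zero."""
    return sum(amount * factors.get(aspect, 0.0) for aspect, amount in trip.items())


estimates = {name: footprint(trip, factors) for name, factors in calculators.items()}

# Step 3: the spread between estimates shows how much the conclusion
# depends on each calculator's assumptions.
spread = max(estimates.values()) - min(estimates.values())
for name, kg in estimates.items():
    print(f"{name}: {kg:.0f} kg CO2e")
print(f"spread: {spread:.0f} kg CO2e")
```

Note that `calc_c` silently ignores hotel nights: what a calculator leaves out shapes the result as much as the factors it uses.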
This assignment is the opportunity to develop your skills in assessing data sets for what you can and
cannot do with them, how they might lead to outcomes that can be negative (fake news, algorithmic
discrimination) or positive (removing barriers, greater efficiency and fairness) and what makes them
suitable to answer particular questions and be analyzed in particular ways.
There are many aspects of data that shape how it can and cannot be used. Your task in this
assignment is to discover the strengths and limitations of a data set. This assignment will enable you
to develop critical data analysis skills regarding the production of data and to understand how
decisions about technical and conceptual aspects shape the data and what can be done with it.
To do this assignment:
-Get to know your data set
-Use the cycle of research from Beaulieu and Leonelli, chapter 4
Figure 1 The cycle of research according to the relational view on data.
[Figure: Figure 4.4, The steps of data journeys in data science (O’Neil and Schutt, 2013)
superimposed on the model of the research process. Adapted from Leonelli (2019 [EJPS]).]
-Reflect on each of these steps, to understand how each step implies interaction between the analysis
and the data, between the researcher and her or his object of study.
-Report on the steps above, using the graphic of the cycle to structure your analysis. For each step,
make explicit the assumptions that go into shaping the data and what you can do with it. If your
project is not yet so advanced that you have done all the steps, that is not a problem. Be thorough
and detailed about what you have done with the data so far, and what you have been able to find out
about the data creation (if you didn’t collect/create it yourself).
Be sure to name and make explicit how each of these assumptions shapes how you interpret,
value and use data as evidence.
The report should be about 2000 words long. Please feel free to use annotated screenshots or other
supporting material.
This in-class activity will be assessed and counts for 20% of your grade. Both preparation and participation will
be graded. More details will be given in class.
Participation
Your grade will be based on your preparation for the classes (having done the readings, etc.), your
participation in the in-class activities and discussions, and of course your presence across the
sessions. This is worth 20% of your grade.