Stanley Stuff
by
May 2022
ABSTRACT
Software systems can exacerbate and cause contemporary social inequities. As such,
scholars and activists have scrutinized sociotechnical systems like those used in facial
recognition technology or predictive policing using the frameworks of algorithmic bias and
dataset bias. However, these conversations are incomplete without study of data models: the
structural, epistemological, and technical frameworks that shape data. In Modeling Power: Data
Models and the Production of Social Inequality, I elucidate the connections between relational data models and systems of power, specifically white supremacy and cisgender normativity. This project has three distinct parts.
First, I historicize early publications by E. F. Codd, Peter Chen, and Miles Smith & Diane Smith, demonstrating that these texts were products of their social and technical moments and, as such, reinforced dominant social structures.
Second, I reverse engineer Android applications (with Jadx and apktool) to uncover
the relational data models within. I analyze DAO annotations, create entity-relationship
diagrams, and then examine those resultant models, again linking them back to systems of
race and gender power. I craft a method for performing a reverse engineering investigation of the data models within published commercial software.
Finally, I develop a relational data model that integrates insights from the project’s earlier parts. I argue that the temporality of modern digital computing is incommensurate with the temporality of
modern transgender lives. Following this, I speculate and build a trans-inclusive data model
that demonstrates uses of reification to actively subvert systems of racialized and gendered
power. By promoting aspects of social identity to first-order objects within a data model, I demonstrate that data models can resist, rather than merely reproduce, those systems.
Through each part, I argue that contemporary approaches to the social impacts of software systems are incomplete without attention to data models. Data models structure algorithmic opportunities.
ACKNOWLEDGMENTS
This project, like all major projects, was a community and collaborative effort. I
have been keeping a list of people to mention in my acknowledgements since before I began
this project, afraid that I would forget people’s small generosities, and thus give the
impression that any part of this was a singular endeavor. As a result, these
acknowledgements may be far too long by academic standards; they are also far too short.
I was fortunate to begin graduate school with a cohort of powerful people. My experience would have been much less fun
without the companionship of Sean McCafferty, Noa Bruhis, Taylor Genovese, Shivam
Zaveri, Anna Muldoon, Vanya Bisht, Martín Perez Comisso, and Amanda Arnold. Our
weekly meetups at Casey’s kept me grounded during that first difficult year. While at ASU, I
also had the opportunity to spend time as a Graduate Fellow at the Institute for Humanities
Research. At the IHR, I got to share space with Leah Newsome, Stephanie Silva, Liz
Grumbach, and Dr. Mara Godoy-Lopez. Mara, in particular, has been a gift of support, care
and encouragement. While in Phoenix, I coached and performed at The Torch improv
theater, a delightful distraction from the business of school. During that time, I was also able
to reunite with Rachel Land Nystrom and Eric Nystrom – a surprise time travel reunion with old friends.
Part of this work was completed at Dartmouth College, while working in Jacque’s lab alongside Anna Gallert, Brinker Ferguson, Allie Martin, and so many lovely Dartmouth undergrads,
especially Kirby Phares and Caroline Casey. In Utah, where I landed during the first
lockdown, I thank Cale Carthey, Ruth Armbruster, Lou Schafer, the Iowa House
community, and all of the perfect and beautiful queer and trans baristas at Sugar House. Many mentors and colleagues have been generous in helping me navigate the intricacies of institutions, conferences, budgets, job markets, and all
the hidden curriculums. Sarah Florini and Anna Lauren Hoffmann have been especially
generous here.
The pandemic began shortly after I began writing, and when it started, I began
remotely coworking with some new and old friends. That group, now affectionally called
Good Zoom, is still going after more than 2 years and we have seen each other through
graduations, marriages, house purchases, the additions and losses of pets, big moves and
scary transitions. I am so grateful to my Good Zoom friends: Molly Morin, Liz Dietz,
Madison Weist, Allie Martin, Els Woudstra, Esther Grace Witte, Allie Tova Hirsch, Kirsten
Lee, Cassius Adair, and Jackie Pogue. Also on Zoom, I was greatly supported by writing
groups, including members of Good Zoom plus, variously, Jenny Brian, Dagmar van Engen, and others.
As I began this dissertation, I lost Poodle, the old dog I mention in the introduction.
Poodle The Bulldog was my constant porcine companion from 2009 - 2020, and an eager
accomplice to early research on pet surveillance that informed this project. Since her death, I
gained two dogs: Marty and Chicken, who have been stalwart and delightful companions,
despite their disinterest in my work. Notable associate pets include Donut Pepper and Sydney.
My committee members – among them Marisa Duarte – were steadfast supporters and cheerleaders from our very first
convening. Marisa is wise and kind and, when I was absolutely devastated by the scope of
trans anguish in the archives and in the scholarship, helped me understand the potency of
those archives. Marisa’s work has underscored the importance of continuing to think about
those to whom I’m accountable. Jen has been encouraging, honest, pragmatic, and
supportive. I would not have ended up at ASU (nor stayed during a rough first semester)
were it not for Jen’s candor. Katina’s enthusiasm for her work and her students is
contagious. She is a vibrant and encouraging thinker and leader who has a talent for
connecting people and ideas to each other. Jacque was both my chair and advisor and in
those roles was an absolute force. She was generous with her time and thinking, giving me a
true apprenticeship for learning how to think in a very particular way. Any critical skill I
developed during school is in no small part a credit to her generosity and patience that were
strong enough to counter my stubbornness. She also fed me frequently, cared for me in the
wake of a very sudden surgery, and has made a grave for Poodle in her backyard (as an
incomplete list). It is with immense gratitude that I recount the opportunity to study with her.
Finally, everything is more possible with the support and care of those closest to me.
Mikel Wadewitz has been there for over a decade and is unwaveringly kind and supportive.
Mikel, Sarah Florini, Rowan Q. Buckton, and Liz Dietz have heard so much of this project,
both in and out of context, and have supported me through rounds of despair, enthusiasm,
bravado, and exhaustion. I received extra care from my professional collaborator turned wife
Molly Morin. I strongly recommend falling in love with and moving in with and then getting
engaged to the best artist and one of the kindest people you know during dissertations.
TABLE OF CONTENTS
Page
LIST OF FIGURES.................................................................................................................................. iv
INTRODUCTION.................................................................................................................................... 1
CHAPTER
Lovense............................................................................................................................... 72
Conclusion ......................................................................................................................... 91
Analysis .............................................................................................................................112
CONCLUSION......................................................................................................................................117
REFERENCES ......................................................................................................................................123
APPENDIX Page
LIST OF FIGURES
Figure Page
14. An Image for the Pear Flower, an Internet-Connected Sex Toy Made by Qiui.. 69
25. A One-Table Model Showing the Attributes of a User Entity ............................ 103
26. A Two-Table Model, Promoting the Gender Attribute to an Entity .................. 103
27. A Two-Table Model, with Context as an Attribute to the Gender Entity ......... 104
28. A Four-Table Model, in which Context has been Promoted to an Entity ......... 105
29. A Five-Table Model, Adding Name to the Gender-Context Entity .................... 107
30. An Order of Operations for Entering Data into the Full Data Model ............... 108
INTRODUCTION
A few years ago, I planned to spend my December holiday break watching movies.
The weather was terrible, driving conditions a little treacherous, and my aging dog was
having an increasingly hard time in the snow and the cold. After the end of the semester, the
dog and I settled into the couch together and I logged in to Netflix to find a movie. Among
other holiday recommendations, Netflix suggested I watch A Madea Christmas. It had been
years since I had seen a Tyler Perry movie; and, after watching A Madea Christmas, I spent the
whole break idly watching all the other movies in the Madea franchise.
It is by now a joke that once Netflix “thinks you're Black” (i.e., you watch several
movies with majority-Black casts) it begins surfacing other movies with all-Black casts. For
months after the break, Netflix did just this, recommending an immense amount of Black
content. This was partially a delight—I had forgotten how many movies Queen Latifah has
been in—and partially a curiosity. Where had these movies been? I watched the algorithm try
to reconcile my Madea marathon (and subsequent Queen Latifah binge) with other escapist
mid-semester viewing habits, like Star Trek: The Next Generation. Prior to this, Netflix did not
recommend these movies to me. Was it my whiteness, or the fact that I mostly use Netflix as
background noise filled with childhood science-fiction nostalgia? Netflix asserts that this
recommendation algorithm is not based on the racial or ethnic demographics of its users—it
claims to not track that data. Indeed, I suspect the initial Madea recommendation was based on my other holiday viewing rather than on any racial profile.
On the surface, it is not a problem to recommend movies like the ones people have
already watched. (What exactly makes Set it Off similar to Madea's Family Reunion?) As many
people have pointed out, this behavior is problematic not because it serves Black content to
viewers who have already watched Black content, but because it does not serve that content to
viewers who have not. This algorithmic result reinforces “white movies” as normative and
popular, and “Black movies” as niche interests that only warrant recommendation once
someone has indicated a desire. However, Netflix claims no ill-intent, only that its machine
learning algorithms are designed to foster interaction: to get users to click on thumbnails,
read synopses and ultimately view content. The machine learning algorithm is only
reproducing its inputs—people who have watched Tyler Perry movies have also watched these other movies.
But Netflix, like all software, is more than just the complicated sets of instructions
we call algorithms. Netflix also has immense amounts of data, huge databases, and data
storage, all connected through complex workflows and managed both by automated systems
and individual people. There are many opportunities for biased, unjust, or otherwise
problematic outcomes to manifest. In this project, I focus on what Paul Dourish calls
“algorithmic others” -- the other parts of software that contribute to its form and
functionality (Dourish, 2016). Of these algorithmic others, I focus even more specifically on
data models. Data models are sets of design documents that, among other things, determine
how a database is constructed and how data moves through a software system.
The options available on forms (particularly web forms) are a common way that
individuals may experience a software's data model firsthand. Consider the banal act of
filling out a registration form on a website.1 The ways that website forms request data is
typically linked to the site's underlying data models. In other words, the web forms request
data in a format to match what the data model is expecting. When the data model includes
1 Discussing whether or not a website counts as “software” is out of scope here. For my purposes, it absolutely does.
fields for a first, middle and last name, those fields are echoed in the first, middle and last name inputs of the web form.
The creation of first, middle and last name fields reveals the biases and assumptions of the data model designers. Restrictions on the length of a first or last name (both minimums and maximums) are racist manifestations of US-based expectations for Anglo-Saxon names (Wachter-Boettcher, 2017). When a web form restricts the length of a last name field or provides only an Anglocentric “first, middle, last” configuration, it is typically because there are database fields expecting those three values
(and only those three values). Those database fields are in turn created in response to data
models (i.e., high-level database design documents) designed to track all the information that
the system needs to store. However, unlike other aspects of computing (like memory storage
limitations or even the foundational binary computing language), data models are flexible
and very often a “blank slate” opportunity for software development teams to model the
aspects of the world they wish to monitor and, within racial capitalism, monetize.
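To make this link concrete, consider a small illustrative sketch using Python’s built-in sqlite3 module. The schema here is my own invention, not drawn from any system studied in this project; it shows how an Anglocentric “first, middle, last” data model, with length limits built in as database constraints, becomes the source of the restrictions a user later encounters on a form.

```python
import sqlite3

# A hypothetical "first, middle, last" name model. The CHECK constraints
# are the database-level origin of the length limits a web form later
# enforces; the form merely echoes what the model expects.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL CHECK (length(first_name) BETWEEN 2 AND 20),
        middle_name TEXT,
        last_name   TEXT NOT NULL CHECK (length(last_name) BETWEEN 2 AND 20)
    )
""")

# A name that fits the model's expectations is accepted.
conn.execute("INSERT INTO users (first_name, last_name) VALUES ('Ada', 'Lovelace')")

# A single-character name -- common outside Anglo naming conventions --
# is rejected by the model itself, before any form logic runs.
try:
    conn.execute("INSERT INTO users (first_name, last_name) VALUES ('O', 'E')")
except sqlite3.IntegrityError as err:
    print("rejected by the data model:", err)
```

The point is structural: because the restriction lives in the data model, changing the form’s appearance without changing the model changes nothing about what the system will accept.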
This dissertation takes data models as its focus, and throughout I offer my main
argument: that data models are important social and cultural artifacts worthy of scholarly
study. Furthermore, when we look at data models, we will find that they support existing
systems of power. Data models are also crucial sites for intervention in the unjust or
undesirable outcomes of software systems. I approach data models in several ways. First, I
locate them within the process of software production, demonstrating their foundational
role in the built software product. Second, I critically analyze early relational data modeling
texts, demonstrating how techniques and epistemologies of data modeling support systems
of power. In addition to studying the techniques of data modeling, I use reverse engineering
to peel back the layers of software and demonstrate how to examine data models within
published commercial software. Finally, I use data models as a form of intervention in
software systems, using techniques of relational data modeling to both explore ways of
subverting power and to make transparent the ways that data models reinforce existing systems of power, thereby deepening our understanding of software.
Theoretical Frameworks
Science and Technology studies (STS) is a foundationally interdisciplinary field. My
work leverages that interdisciplinarity and combines scholarship from STS, critical data and
critical algorithm studies, surveillance studies, infrastructure studies, whiteness studies, and critical trans studies.
I hypothesize that within the design of data models, we will find that data structures
match the designers' visions both of the world they are co-creating and the users they are
courting. While there is voluminous scholarship about algorithmic matters, there is relatively
less so on the oppressive potential of data models.2 I pull together elements of the fields
mentioned above to craft an urgently needed project—a technosocial investigation into the
ways that data models can both support structural power and be used to subvert it.
As scholars of technology have long argued, technological systems are political, value-laden affairs (Winner, 1986). The values
represented are those of the creators, funders, and engineers; and, under the social
construction of technology model, society itself (Pinch & Bijker, 1984). While software
2 For more on algorithmic ethics, see Mittelstadt et al. (2016); on algorithmic regulation, Lodge & Mennicken (2017); on algorithmic violence, Owen (2015); on algorithmic auditing, Sandvig et al. (2014).
engineering communities do not always consider the “politics” of their artifacts,
conversations about the human impacts of computing technology are nearly as old as the
technology itself.3 Recently, the software engineering industry has been receptive to
scholarship about the social impacts of their products and the ways that technology can
create injustice and produce disadvantages for groups of people. Work documenting the
unjust outcomes of technologies like predictive policing (Ferguson, 2017), facial recognition
systems (Gates, 2014), and housing algorithms (Eubanks, 2018) has encouraged responses from industry. Often, the proposed solutions are technical, and specifically algorithmic. For example, data scientists can train
models to react “fairly” using a “Variational Fair Autoencoder (VFAE). VFAE characterizes
'fairness' as a representation that is explicitly invariant with respect to some known aspect of
the dataset” (Xu, 2019). The use of algorithmic interventions to correct for “unfairness”
centers the disadvantage experienced by individuals and sidesteps the structures and systems
that produce dis/advantages. As Anna Lauren Hoffmann notes, interventions to correct for disadvantage (to make systems “fair”) “limit us to solutions that are, at best, reactive and superficial” and occlude the underlying structural logics that are foundational to the problem (Hoffmann, 2019). Given the growing automation of daily life, scrutiny of the systems and operations controlling that automation makes sense. Importantly, AI has
received immense focus in part because the algorithms within AI systems seem inscrutable.
But AI does not come from nowhere—systems are trained on large datasets—datasets that
3 See Baldwin & Shultz (1955) and Hoos (1960) for early business examples and Dow (1949) for an early education example.
are simply an extension and high-volume content population of human-made data models.
This project situates data models as a key location in which the infrastructure of white-
cisness, which I discuss later, is maintained. Broadly, I am investigating the ways that we
think about data structures, use data structures in practice, and speculate about data
structures.
Race and gender have not always been central STS concerns. Scholars like Alondra Nelson (2016) and Dorothy Roberts (2009) have brought critical analyses of race and gender to “traditional” STS projects like the production of knowledge and the governance of emerging technologies. Their work goes beyond “adding” race or gender to analyses of sociotechnical concerns; they combine foundational critical race theory with framings of research informed
by issues of justice and social inequality. Building on their example, this project combines the
insights from critical race and critical trans scholarship with the decades of critical work
produced by STS touchstones like Judy Wajcman and James C. Scott. Furthermore, this
work augments ideas of using algorithmic interventions to correct for disadvantage, offering
instead opportunities for data modeling-focused interventions to correct for data models that reinforce existing systems of power.
While this work's first audience is academic STS and critical data studies scholars, a second audience is software practitioners. Drawing on my experience as a software engineer and tech speaker, I envision presenting this work to
engineers both in technical contexts and in more casual “engineering + society” contexts.
With the structure of this project, I am pushing against the split between “engineering” and “social science.” I reject this separation and I offer a project in which engineering approaches can be social science
approaches, and engineering methodologies—legible to industry and academic engineers—
can be used to generate insights more commonly found within social science projects.
Thinking about this division, this work can act as a bridge between social science/humanist
and engineering audiences, engaging in work meaningful and familiar to both. Further, I
believe that the identification of specific technical methods that mark and support white-
cisness within data structures can open space for creative and engineering action.
The role of race in how sociotechnical systems work in the United States is underexplored in conventional STS scholarship and
often within STS departments. Michael Mascarenhas, drawing on the work of sociologist
Elijah Anderson, describes STS as a “white space”: “white spaces are not just about counting
but rather accounting for the ways that the dominant culture adopts a white habitus” (Mascarenhas, 2018). Mascarenhas is not merely indicting STS departments for their overwhelmingly white and (cis) male faculty.4 However,
he also performs some accounting—accounting for the ways in which we must acknowledge
the “persistence of legacies of white male supremacy and Eurocentrism in the work of even
progressive STS scholars” (Mascarenhas, 2018, p. 154). I follow this call for accounting in this project.
4 Arizona State University's “STS Program” is my home department—the Human and Social Dimensions of
Science and Technology (HSD) program. HSD fares well on (binary) gender lines, with 48% female faculty,
but poorly along racial lines with 14% faculty of color, and 5% underrepresented faculty of color (which
Mascarenhas categorizes as “Hispanic, Latino, African American, Native American and Pacific Islander”).
However, HSD has a wide variety of affiliated faculty, and it is unclear how he is determining who “counts” as
faculty within the department.
Critical Data and Algorithm Studies
Within critical studies of data and algorithms, I draw on several important
conversations.5 First, the field does important definitional work for terms like “data” and
“database.” As data becomes an increasing focus of academic study, the term can become
dangerously universalized to the point of being less useful. I heed Angèle Christin's (2020)
call to break the monolithic concept of data into its constituent parts in order to facilitate
more precise study. In this work, I look not at individual data points, nor explicitly at any logics of “Big Data.” Though I do engage, to a limited extent, with the power of data as a representational form, I do so not from a semiotics perspective, but from one grounded within STS approaches. I follow Johanna Drucker's assertion that data is perhaps more accurately called capta, material that has been taken and constructed (Drucker, 2011). Approaching data as
capta is in opposition to any work that assumes that data is an objective “given” to be
gathered (Gitelman, 2013). When we reorient ourselves to viewing data as particular pieces
of information sculpted and crafted to reinforce or support a view of the world, we are then
better prepared to think of the models that demand that data as themselves constructed. In
other words, the data models are the square holes training us to look for square pegs.
Even within academic conversations about databases, the database is not clearly defined. In this work, I
follow Dourish's useful three-part definition. A database is, first, a collection of data. This is
often how the term is used by digital humanists and others thinking capaciously about things
that can be “read” as databases (Thomer & Wickett, 2020). A database is, second, an
organized collection of data items. Importantly, the data is organized according to a set and
5 While I do not engage directly with foundational critical data studies scholarship in this project, my thinking
here has been greatly informed by the works of boyd & Crawford (2012), Kitchin & Lauriault (2014), Kitchin
(2017), Iliadis & Russo (2016), and Richterich (2018).
predefined schema, and all data in the collection conforms to the schema. Finally, a database is, third, a system of software: a “database management system implements (2) a database, which organizes (1) a collection of data” (Dourish, 2014, p. 10). When I use database, I am collapsing all three aspects of this definition.
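Dourish’s second and third senses can be sketched in a few lines. This is an illustration of my own (using Python’s built-in sqlite3 module, not an example of Dourish’s): the database management system enforces a predefined schema, and only data conforming to that schema joins the organized collection.

```python
import sqlite3

# The database management system (here, SQLite) implements the database,
# which organizes the collection of data according to a predefined schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE films (title TEXT NOT NULL, year INTEGER NOT NULL)")

# Data conforming to the schema joins the collection.
db.execute("INSERT INTO films VALUES ('A Madea Christmas', 2013)")

# Data that does not conform is refused: the schema is set in advance,
# and every item in the collection must match it.
try:
    db.execute("INSERT INTO films (title) VALUES ('Set It Off')")  # no year
except sqlite3.IntegrityError as err:
    print("nonconforming data rejected:", err)
```

The schema-first design is the crucial point for this project: the shape of the collection is decided before any data arrives.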
My work diverges from much of software studies and the social study of
technological harms in three important ways. First, I focus not on algorithms but on data models. Much of this scholarship documents the social harms that algorithms can produce. In Algorithms of Oppression, Safiya Noble (2018) explored
the ways that Google's search results—created by search algorithms powered by complex
machine learning and artificial intelligence systems—reinforce systemic, and specifically anti-Black, racism. Similarly, Virginia Eubanks (2018) demonstrated that the automation of social
infrastructures like child protective services calcified existing inequalities into technological
systems. These important works are tone-setting explorations into the human impacts of
algorithmic choices. However, the noisy focus on algorithms often overshadows the
importance of studying the data structures that support the algorithms themselves. I envision
this project acting as a support to broader algorithmic studies works, offering tools for
theorizing, reverse engineering and speculating about data models and I undergird my work
with the important conclusion that data models structure algorithmic opportunities.
Before algorithms can be written, tested, or deployed, there must be data models to support
them and these models determine the shape, content, and fluidity of the data generated,
gathered, and stored by software programs. Rather than engage in algorithmic audits and analyses of outputs, I attend to the underlying data infrastructure.
Second, I focus not on the perpetuation of injustice, but on the maintenance of advantage within sociotechnical systems. Scholarship on injustice is vital work, helping people who do not experience these
harms to begin to understand how software can have widespread, lasting social impacts.
However, the material conditions “brought to light” by this scholarship do not reveal anything that those experiencing them do not already know. From this privileged position, there is a tendency to focus on those marginalized in the US today: those who do not in one or more ways fit within normative whiteness, cisness, classness, etc. A focus solely on disadvantage disregards the structures and systems that
create advantage (Hoffmann, 2019, p. 904). Structures of racial superiority are intertwined
with structures of gender policing. As I have explained, the United States is a system run by
white people for the benefit of white people. A benefit is defined as a profit, gain or advantage (“Benefit, n.,” n.d.). Thus, rather than point to non-whiteness and non-cisness as a problem, I point to whiteness and cisness as sites of unearned advantage. Marginalized life is capacious, and importantly for me, that capaciousness includes experiences of joy and delight. In
my work, and my life, I center the creation and pursuit of joy as a vital resistance to systems of domination.
Rather than contribute to the reification of marginalized life as one of trauma and
systemic violence (though there are many varied and valid reasons why this link is also true) I
want to move toward normalizing the view of white/cisness as one of unearned and
continually reinforced benefit and ease. This shift opens space for the lives of those of us
marginalized in one or more ways to be joyful and resistant and for our critical lens to shift
towards white-cisness.
Infrastructure Studies
Foundationally, I follow Bowker et al.'s description of infrastructure as “a broad category referring to pervasive enabling resources” (Bowker et al., n.d., p. 3, emphasis in original).
The pervasiveness of infrastructure tells us that there is no point at which it ceases to exist or
have influence. A pervasive infrastructure is one that is, in some senses, self-healing:
fractures within it are mended by unfractured parts, maintaining its existence. Enabling has
two important senses here. First, to enable means to cause to exist or to bring forth.
Secondly, to enable means to make possible (i.e., to facilitate rather than originate). In both
senses of the word, infrastructure enables the conditions of its politics. Perhaps most importantly, if infrastructures are value-laden and fundamentally relational (Star, 1999), what does this mean for an infrastructure of white-cis supremacy?
6 In this, I draw on the work of Brock (2020), brown (2017), Lu & Steele (2019), Muñoz (2009) and long legacies of Black, trans and Black transfemme resistance.
In this project, I treat infrastructure from both an infrastructural studies perspective and an engineering perspective; the latter emphasizes the products that emerge from practice. While driven by ideology, politics and economics, I attend to the material building of infrastructures; to the practices that provide pervasive enabling resources. In computing, this includes not only the servers on which software are executed, but pervasive resources that are social as well as material. When we understand infrastructure as encompassing pervasive enabling resources, we are then able to think more expansively about its politics. For many white Americans, the white-cis supremacist infrastructure of the country is largely invisible, and one response is “infrastructural inversion” (G. Bowker, 1994): the act of making visible that which is invisible. However, the infrastructure of white supremacy has always been visible to people
who are not white. Philosopher Jonathan Flowers notes that the idea of “making visible” or “revealing” reflects
7 Thanks to Liz Dietz for surfacing Flowers's work.
what Catherine D’Ignazio and Lauren Klein call the “privilege hazard”: “the phenomenon
that makes those who occupy the most privileged positions among us...so poorly equipped
to recognize instances of oppression in the world” (D’Ignazio & Klein, 2020, p. 29).
There is a co-constitutive relationship between whiteness and the Other -- indeed, whiteness cannot exist without others it can point to as
“not white”. As David Roediger has argued, in the United States, Blackness has served as the
primary boundary for whiteness (Roediger & others, 1999). The US constitution and its amendments encoded the subordinated positions of Black and Indigenous people. Notably, enslaved Black Americans were counted as three-fifths of a person for the purposes of congressional apportionment. Roxanne Dunbar-Ortiz notes that Indigenous people were implicated in the second amendment to the
constitution. The right to bear arms is rooted in the history of male settlers joining militias
“for the purpose of raiding and razing Indigenous communities” and for joining “slave
patrols” (Dunbar-Ortiz, 2014, p. 80). These patrols, a precursor to modern policing, were
designed to surveil and police enslaved (Black) people who were attempting to move
towards their own liberation. From its infancy, the infrastructure of white-cis supremacy was
a system of pervasive resources enabling the control and monitoring of the movement of Black and Indigenous people.
Surveillance
“[w]hiteness wants to see everything but its own operations.” (Fiske, 1998)
Surveillance in the United States has long been a means of racialized and gendered control (Browne, 2015; Lyon et al., 2007; Beauchamp, 2019). Historically,
surveillance in the early United States was part and parcel of chattel slavery. From the design
of ships to transport kidnapped Africans to early “slave patrols,” surveillance in public has
often been directed at non-white people (Benjamin, 2019a; Browne, 2015). Of course, public
surveillance also reinforces gender normativity. In the US in the late nineteenth and early
twentieth century, it was common for state and local legislation to enforce gender norms by
requiring individuals to wear a certain number of items matching their “sex.”8 This focus on
public surveillance does not mean that surveillance's racializing and gendering effects do not
also happen at home. The Ring doorbell serves as a contemporary example of the ways that surveillance is welcomed into domestic space. Much surveillance scholarship attends to sites at society's edges: prisons, border crossings, decisions about who gets WIC (Eubanks, 2018). These locations, rich sites of data
making-gathering, may feel removed from the daily lives of many white Americans.9 These
sites also reinforce the relationships between marginalized people and society's edges. The
contrast between the home and the jail cell is one that privileges perspectives uninformed by
house arrests and case worker visits. It is a perspective that disregards the monitoring of welfare recipients within their own homes. In this project, I trace the monitoring continuity between sites of administrative violence and the home, showing
8 For a history of this law in San Francisco, see Sears (2015). For more on the surveillance of trans people, see Dinh (2003), Fischer (2019), Bender-Baird (2016).
9 A series of surveys on the impacts of incarceration on families reveals that while only about 10% of white people have experienced over-one-year-long incarceration of family members, 31% of Black people and 29% of Indigenous people have (Enns et al., 2019). Broadly, 63% of Black people vs 42% of white people have experienced the incarceration of a relative.
10 See Woodard & Mastin (2005) for an exploration of the racialization of “welfare mothers” in the context of other stereotypes about Black women.
11 In addition to the works cited, my thinking about surveillance has been greatly influenced by Monahan
(2010), Levy (2014), Genosko & Thompson (2016), Rapoport (2012), Green & Zurawski (2015).
“smart objects” like internet-connected sex toys, to be surveillance devices, invited into the
obstruction, and to move through the world without being forced to confront one's race
(Ahmed, 2007). As Charles Mills notes, this ability implies the “existence of a system that not
only privileges whites but is run by whites, for white benefit” (Mills, 2004, p. 31, emphasis added). What Mills is describing here he calls “white supremacy,” though others have
described it as “white privilege” (Sue, 2006) or, simply “whiteness” (Garner, 2007). The
“white benefits” that Mills references can take many forms. Based on studies performed
within the last 10 years, white benefits can be economic: the median white family has
$143,000 more wealth than the median Black family (Institute for Policy Studies, n.d.);
juridical: white arrestees are incarcerated at 20% of the rate of Black arrestees (Rovner, n.d.); or medical. In one example, Black children die from asthma at five times the rate of white children (Lavizzo-Mourey & Williams, 2016). There is, of course, no biological reason that white children would die of asthma less often. There are, however, pervasive enabling resources
directed to keeping white children alive. Factors such as the locations of hospitals, the
affordability of asthma medications, the quality of air, the economic and housing mobility
are all part of the infrastructure of white-cis supremacy that benefits, in this example, white
children. Of course, this is not necessarily an intentional choice by white parents, many of
whom may still experience class or other marginalization. The system of white supremacy is
not one related to the moral character or agency of the individuals experiencing the benefits
or harms; it is a system of social control. Note that I intentionally choose and overload
technology words—system, infrastructure, reification—to make concepts visible to
engineering readers, and as an extension of Wendy H.K. Chun's work identifying race as a
technology (Chun, 2013). Additionally, Tara McPherson (2013) has done a deft job of
demonstrating the ways that spatial and social (read: racial) logics of gentrification influenced
the architecture of modular operating systems like Unix. This is reminiscent of what Junaid
Rana has termed “racial infrastructure”: “a spatial formation in which the social, political,
and economic relationships of racial systems operate through dominance and discursive
To look only at whiteness or only at cisness not only elides the ways that individuals can
experience benefits from both systems, but also ignores their historical connection. A. Finn Enke argues that those legible “within a binary gender system buy a privileged pass to social existence,” especially when,
among other things, one is white (Enke, 2012, p. 64). Gender alignment (cisness) has
historically been the restricted right of white individuals (Green & Bey, 2017), and as C. Riley
Snorton argues, the creation of trans identity was ultimately a racial narrative requiring the
For the purposes of this project, I offer a parallel heuristic for gender, extending
Mills' definition of supremacy to identify masculine supremacy and cis supremacy. Current infrastructure not only privileges masculinity but is run by men, for male benefit. This includes, among myriad other things, gender pay gaps and disparity in disease research. Useful here is
that “masculine supremacy” does not force us to think in terms of binarist gender
frameworks (of men vs women), but instead allows us to conceptualize a world in which
benefits are received by “men” and not received by “everyone else.” In other words, a world
in which cisgender men control, and benefit from the control of, pervasive enabling resources. Being perceived as outside those norms does not automatically mean that one experiences disadvantage, but
That there can be a loss of benefits for even perceived gender misalignment points
to the existence of a system that not only privileges cisgenderness but is run by cisgender
people, for cisgender benefit. This system is cis supremacy.13 To be cisgender (or “cis”) is to
experience alignment between one's gender and the sex one was assigned at birth. Thus,
cisgendered men who conform to gender norms are the express beneficiaries of masculine
supremacy. There are, of course, men who are both cisgendered and gender non-
conforming. This murky space—in which one can experience alignment but be perceived to
be out of alignment—is why some trans scholars reject the use of the word cisgender as
falsely universalizing. In his work on trans people and surveillance states, Toby Beauchamp
declines to use “cisgender” because it “centralizes a form of gender privilege that emerges
through normative race, class, sexuality and ability” (Beauchamp, 2019, p. 12). For
Beauchamp, this centralization makes “cis” inadequate for discussing the repercussions of
12 Within gender studies and queer studies, there is rich scholarship documenting the disparate treatment between masculine (often called butch) and feminine (femme) presenting women in queer communities. See Bauer (2008), Bergman (2009), Halberstam (1998), Levitt & Hiestand (2004), and Rubin (2013).
13 We can extend Mills' definition to include disability, citizenship, reproductive status, and other labels of identity that mark one as non-normative.
gender deviations within state surveillance regimes because gender transgressive cisgendered
individuals still receive harm. However, it is this centralization of gender privilege that is key
here and that makes the term so useful; that cis people can be harmed by cis supremacy does
fields. Cis supremacy quickly becomes evident within data models, and tracing it helps us see the power of data models. Trans and gender-nonconforming people often experience friction when
confronted with forms requiring gender selection. Recently, transgender activists asked platforms to expand the number of gender options on websites, especially popular social media and
online dating sites. In response, in 2014, Facebook increased the gender options that users
can select from 2 (M and F) to 58, and this change was hailed as a win for all trans, gender
transgressive and gender expansive users, finally able to use Facebook without having to lie
about their gender identities.14,15 However, what felt like a victory for trans activists was short lived. Behind the scenes, Facebook mapped all of these 58 genders back into a binary M/F in order
to send data to their advertisers, who were not prepared to receive 58 different values for
users, frankly, seems like a feature and not a bug. I give this example not as a
problematization of this data failure (as Rena Bivens notes in her work), but as an instance of
the ways that data structures (and choices at the physical data level) can impact the ability of
14 To be clear, some trans people feel included by binary M, F options because they do not wish to disclose their trans status and/or feel represented by simply an M or F.
15 I use “trans” to describe the range of individuals who understand their gender to be out of alignment with the gender they were non-consensually assigned at birth. Gender transgressive and gender expansive individuals may share “trans” experiences or experience repercussions from their own (intentional or not) gender performances, but do not explicitly identify as trans.
software systems to perform inclusivity. Recently, the popular dating application Tinder expanded its gender options after transgender users reported experiencing high levels of
harassment when using the app. However, as Anna Lauren Hoffmann notes, the expansion
of data options does not immediately make a platform safer for its users. In Tinder's case, the responsibility for safety shifted onto users themselves: “such 'inclusive' solutions evade accountability and instead 'empower' individuals
to regulate their own safety and visibility” (Hoffmann, 2020, p. 17). While Facebook's
change may have been hailed as a win (and then a failure), and Tinder's as (in part) an
abrogation of responsibility to its users—in both cases, the choices of data structures have had material consequences for users.
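The dynamic in the Facebook example can be made concrete in a few lines of code. The sketch below is entirely hypothetical (Facebook's actual pipeline is not public, and the function and variable names are my own): it shows how a downstream schema that accepts only two values forces a lossy mapping, no matter how many options the interface offers.

```python
# Hypothetical sketch: an interface layer offers many gender options,
# but a downstream advertising schema accepts only "M" or "F".
INTERFACE_GENDERS = ["woman", "man", "non-binary", "genderqueer", "agender"]  # abridged

def to_ad_schema(gender: str) -> str:
    """Collapse an expansive gender value into the binary the ad pipeline expects.

    Every value that is not explicitly "man" or "woman" must still be forced
    into one of two buckets; the schema leaves no third option.
    """
    binary_map = {"man": "M", "woman": "F"}
    # Values outside the binary are coerced; here, arbitrarily, to "F".
    return binary_map.get(gender, "F")

# The interface's expansiveness is erased at the data layer:
assert to_ad_schema("non-binary") == "F"
assert to_ad_schema("man") == "M"
```

However the coercion rule is written, the constraint lives in the data model: as long as the receiving schema allows only two values, no interface change can make the pipeline inclusive.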
Methods
In this dissertation, I use a variety of methods, several of which will be less familiar
to the reader. As a result, I review two key methods at length—reverse engineering and speculative design. I also engage with narrative and autoethnography. All of my methodological practices are
Reverse engineering
Reverse engineering is, broadly, the process of uncovering how something was
made.16 Colloquially, to say “I reverse engineered it” implies that you were not the person
who made “it” and that “it” is a thing whose contents, workings, dimensions, and materials
16 Reverse engineering is not an idea new to computers. The first documented case of reverse engineering might have been Egyptians finding a chariot and trying to make one for themselves (Kahveci & Okutmuş, 2015, cited in Neilson, 2020). During WWII, British and American troops reverse engineered German water cans, called Jerry cans (Srivastava, 2015).
can be known, or at least something akin to understood.17 Within this project, I use reverse
engineering as an umbrella term for activities designed to unpack or tease out the
construction of a technological system (Dang, 2014). These activities are performed to help
those who did not create the software discover “the true purpose and actions of code.” Rekoff (1985) compares the process of reverse engineering to criminal detective work,
gathering clues to infer intent and reconstruct motivations and actions (but not necessarily
recreate the product). When working outside of a reproductive goal, there are no set steps
involved in reverse engineering, nor a universal point at which the process is complete. Instead, the process depends on the layers involved in the software and the specific orientations of the engineer. For example, a security-oriented reverse engineering project might take a penetration approach: looking at all available points of vulnerability and developing tools to insert
malicious code into the system. A performance-oriented reverse engineering project might
take a stress-test approach: overloading various subsystems, watching errors, and decoupling
systems when they fail.18 For this project, reverse engineering is the disassembly of software
with the intention of learning more about one or more aspects of the system and without a goal of reproduction.
17 Whether or not a reverse engineered object can be known and understood is an ontological and epistemological question that I leave to others.
18 Indeed, there is robust scholarship on different approaches to reverse engineering. For reverse engineering
design patterns see Shi & Olsson (2006), Dwivedi, et al. (2018); for deobfuscation of code, see Udupa, et al.
(2005), Yadegari, et al. (2015). Many innovations within reverse engineering processes were developed in the
IEEE Working Conference on Reverse Engineering (1993 – 2013).
This approach is in conversation with work around what Ruha Benjamin calls
“critical race code studies,” a combination of critical race studies and STS insights used to look at
what she describes as “The New Jim Code”—a sociotechnical regime of injustice towards
Black Americans. She relies on the anthropological practice of “thin description” as a way to
embrace the uncertainty and unknowability of systems that are, in part, within anti-Black
black boxes (Benjamin, 2019b). The literature of reverse engineering may not use the same
terms, but it is also one of unknowing. Engineers use reverse engineering techniques
specifically because they cannot know how a system was constructed. Reverse engineering is
a “technical-first” set of practices. This makes it highly complementary to the work done by
STS, critical algorithm, and critical software researchers. Although these works often start
with the social impacts of a technology and work backwards to find a source, reverse
engineering (despite its name) starts with the artifact, working forwards to understand how a system was made. My approach is technically oriented, as I am working to discover the physical data model within the software.19
However, it is also expansive. Rather than beginning with the technical specifications, I
begin the reverse engineering with close reading. Understanding the product and the way it
Designers coming out of the Italian design movement put ironic or satirical spins on
19 Premerlani & Blaha (1993) introduced a reverse engineering methodology specifically for relational databases. Theirs is an important early work, though not accessible for this project because I did not have access to the software’s database.
common objects to defamiliarize them and provoke deeper viewer engagement with the
socio-techno-political factors of the moment in which the design exists (Malpass, 2013, 2017).
Terms like speculative design, critical design, experimental design, and design fictions all
point to the same general practice: designing for worlds that do not yet exist (but perhaps
should). According to James Auger, regardless of the term used, there are three important
characteristics to this speculative practice: “remove the constraints from the commercial
sector that define normative design processes; use models and prototypes at the heart of the
enquiry; and use fiction to present alternative products, systems or worlds” (Auger, 2013, p.
1). Design objects, those artifacts produced by speculative design, can themselves “co-
ways, proposing new connections and distinctions, and embodying design ideas and
processes across time and minds” (Bardzell et al., 2015, p. 1). The boundaries of design
practices have been variously defined to include technological practices like data modeling,
programming, or interface design. For my purposes, creating a false division between design and engineering is unnecessary; I perform critical speculative work using the tools of (software) engineering in addition to
tools of design.
removed from capitalist gains; however, there is no clear method for extracting one's work
or one's mental model of work from capitalist paradigms. According to Sasha Costanza-
Chock, we are all designers; everyone designs. However, the power dynamics arise when
(Costanza-Chock, 2018).20 Corporations and capitalism have had an immeasurable impact on
Rather than ground my work squarely in the literature of design, I take Phil Agre's
critical technical practice and combine it with speculative design ideas to form a critical
choices. Rather than work in a future in which everything is flexible, or a world influenced
by science fiction, I am not working to speculate about the far future, but to speculate about
technology for the very near future—technologies that could be implemented now, given
our current infrastructure. In this way, I am not working outside of capitalist expectations,
but within them. Rather than build for futures that may come, I am thinking about futures that
are likely to appear, and the ways that I can transform them slightly but significantly—to bias
accountable. Having a critical technical practice and a strong near-term temporal orientation
means that my work is in conversation with people living and making in the present with the
tools we all have. I have long experience working within design and technical spaces as a
While I write from the position of a tech insider, I also bring a perspective informed
20 This separation into the professionalization of the work is reminiscent of David Noble's work on the professionalization of engineers in the United States (Noble, 1979). His work highlights the role of corporations in the creation of the white-collar engineer.
identities as a white, queer, trans person in the United States. My mother is a first-generation
American, with both of her parents coming from Italy. She raised me as a single parent in Las Vegas, Nevada, where she worked as a casino dealer. I never met my father, who is now
deceased. His family has been in the United States for centuries and is filled with generations
of doctors and high-ranking military men in the US South. I assume the worst about my
father's family (an extended family that declined further contact with me when they found
out that I am neither cisgender nor heterosexual) and assume that their generational
wealth was accumulated not only by way of being white in the US, but also by participating in systems of racial domination. Whiteness colors my experience of every other aspect of my identity, and I treat it as both a banal
factor of my existence, and a powerful location of advantage that, for many years, I was
femininity or androgyny, and decline to call myself a trans masculine or a butch person. I
am, ultimately, a person who wishes to have no gender at all -- the idea that a person can feel
steady and aligned in their gender is a wildly foreign concept to me. Ultimately, any medico-
Given the tendency for white trans people to center their oppression in their personal
illegibility creates more conditions for livability because of my whiteness (and other
tension, as I experience both benefit and harm from cis-white supremacist infrastructures.
I will likely not see the destruction of white-cis supremacy in my lifetime. I am
grateful to the scholars dreaming abolitionist futures, and also acknowledge that those
futures are not implementable (by me) at this moment. The communities to whom I am
accountable deserve both inspiring visions of futures and near-presents we could be having,
There is, of course, a tension between approaches that are radical and innovative and
those that are implementable given current technical and social limitations. Because I feel accountable to these communities, I find myself continually navigating the tension between radical and visionary theoretical approaches and
those that could be built tomorrow by someone with an appropriate technical skillset. As a
person who came out of software engineering communities and open-source communities
and leaving those communities to get an education and produce knowledge that is not usable
Outline of Chapters
Working through an extended code example, in Chapter 1, I demonstrate the
interrelationship between data models, algorithmic approaches, and written code. I argue
that data models structure algorithmic opportunities. Without certain configurations of data,
algorithms cannot function, and I identify data models and data structures as software
network databases. Because of the rigidly structured nature of these databases, all databases of
a certain type were similar and no particularly advanced levels of visualization were required.
As a result, the introduction of the relational paradigm was a key moment. As the relational
model grew in popularity, so did the techniques of creating and visualizing data models.
cisness. These moves happen in data models, and the chapter concludes with an examination
In the United States, surveillance is a key mechanism for the maintenance of whiteness and
cisness. Government and police surveillance co-produces Blackness while medical and social
continuum of surveillance locations extends into the home, and I argue, the surveilled home
I reverse engineer the data model of a series of home sexual surveillance devices.
I define reverse engineering as the set of activities designed to unpack or tease out the construction of a technological system. I outline four key steps to reverse
engineering Android applications and demonstrate the artifacts that result from each step.
After penetrating the application and examining the code, I demonstrate the inferential
Rather than tidily point to aspects of the data model and speculate about the
that occur within datafied white-cis surveillant infrastructures. Within the application, I
demonstrate the ways that reification, generalization and decontextualization work together
within these devices to promote and maintain world views consonant with white-cisness.
critical technical practice. I discuss the ontological commitments of relational data modeling,
specifically the choice of what concepts and objects are promoted to entities within the
model. Using trans lives as a starting point, I explore the speculative possibilities of using
reification to create more livable and workable data models for gender. I consider the
disconnect between the way that gender is represented in data models and lived experiences
of gender. Speculative engineering is the work of designing for worlds that do not yet exist
(but perhaps should). Informed by the technical moves in Chapter 2 and the result of reverse
engineering, I offer a data ecosystem that neither elides nor cements gender.
CHAPTER 1
MAKING SOFTWARE
In the mid-1960s, before E.F. Codd introduced the relational paradigm that would
shift databases and data models, programming was experiencing a “crisis.” Large software
projects, usually sponsored by the military, were chaotic, behind schedule and over budget.
Programming projects needed more structure, and that structure started in part with a name.
In 1968, the NATO Science Committee sponsored a conference that created the
term “software engineering,” to elevate the creation of software to the status of other white
collar (mechanical, electrical) engineering tasks. One attendee noted that “Programming is
still too much of an artistic endeavour” and advocated for greater structure in the education
and practice of the work. Another decried the language used to talk about software --- words
like “elegant” and “powerful” were too “fuzzy” for their vision of what software engineering
could become (Gehl & Bell, 2012, p. 11).21 As a result of this conference and other meetings like it, software engineering, like other engineering tasks, became a discipline with discrete steps designed to control projects and make them appear predictable and controllable.
In this chapter, I describe the discrete and “repeatable” steps that comprise one version of the software development lifecycle (SDLC). Understanding these steps is a prerequisite to recognizing the importance of data
modeling in the outcomes of software. The SDLC illustrates the connections between data
21 There is still debate over whether software engineering is a proper subset of engineering. See Davis (1991) for one perspective.
models, databases, programming choices, and the resultant software.22 In order to
understand the importance of data models, we must understand the lifecycle that is used to
make software, and where data models sit within that lifecycle. The ways of making software
are invisible to most, and data models in particular can be a difficult thing to conceptualize.
After discussing the SDLC, I move through a technical example of the ways that
data models and data structures impact code. Through small code examples I illustrate the
ways that data structures create algorithmic opportunity, a crucial part of my argument that
data models require far more scholarly inquiry than they are receiving at present.
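As a small preview of that argument, consider a hypothetical illustration (the data and field names are my own): the same information, stored in two different structures, affords entirely different algorithms.

```python
# Hypothetical illustration: the choice of data structure determines
# which algorithms are even possible.

# Structure 1: gender stored as a single boolean column ("is_male").
# The only algorithm this structure affords is a binary split.
users_boolean = [
    {"name": "a", "is_male": True},
    {"name": "b", "is_male": False},
]
men = [u for u in users_boolean if u["is_male"]]  # everyone else is simply "not men"

# Structure 2: gender stored as a set of self-described terms.
# New algorithms become possible: grouping, counting, and matching on
# identities the first structure could not even represent.
users_expansive = [
    {"name": "a", "genders": {"man"}},
    {"name": "b", "genders": {"non-binary", "genderqueer"}},
]
nonbinary = [u for u in users_expansive if "non-binary" in u["genders"]]

assert len(men) == 1 and len(nonbinary) == 1
```

The second query cannot be written against the first structure at all: the data model, not the programmer's skill, sets the limit of what the algorithm can do.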
Software is also, eventually, destroyed: sometimes by the team(s) that created it, sometimes through abandonment or server
shutdowns and data destruction. Thinking about software as part of a process, the SDLC, is
important for several reasons. First, the SDLC is a reminder that software is created by
people in a highly iterative process into which intervention is possible at any moment.23
Second, it directs our attention to the ways that software “products” unfold and are created
over time, an important lever for analysis here. The creation of software depends on linear
and teleological notions of time - perhaps without those, software creation (and even software idealization) could not exist. Third, the SDLC offers an opportunity for the
uninitiated reader to learn the steps involved in the creation of a “software product” and the
ways that each step is itself a process. Finally, the SDLC aligns with descriptions of violence
22 In this work I use the term “software engineering” to refer to all tasks required to produce software, and “programming” to refer specifically to the act of typing code.
23 I explore one form of possible intervention in Chapter 4.
as processes --- not as one-time events, but as experiences and structures continually
The steps of the SDLC are variously conceptualized, but generally break down into
three categories: planning and design; building and testing; and publishing and maintaining.24
Importantly, although these steps are linear (one cannot publish before programming, nor
program before at least a little bit of design) they are not tied to a particular time domain or
iteration cycle.25 This means that a team could complete a full SDLC in a weekend or take
many months for only one step. For my purposes, I use the version of the SDLC that I used during my software engineering career. As I step through the process below, I draw on that experience: the narration happens “from the inside,” informed by the influence of Phil Agre’s critical technical practice.
The six steps of the SDLC are Plan, Analyze, Design, Implement, Test & Integrate,
Maintain (refer to Figure 1).26 For the simplicity of the text below, I will write as though the
software is being engineered by internal teams made of full-time employees and that each team has a sub-team to lead it. Thus the “Software Team” comprises a Planning Team, a
Design Team, an Implementation Team, etc. It is worth noting that on many projects, there
are no individuals who are on more than one team other than high-level project managers
24 See Ruparelia (2010) for a review of other ways of modeling the software development lifecycle.
25 This signals an important separation between the development lifecycle and the development methodology. The lifecycle indicates the steps that one must complete, and the methodology determines how those steps are arranged with respect to time and staff. Readers might be familiar with terms like “agile,” “standup,” “extreme programming,” or “waterfall.”
26 Throughout the project, I use hand-drawn diagrams rather than computer-generated diagrams. This is a
deliberate choice to emphasize the human and individual within the software creation process. Indeed, “the
visual representation [of an engineering concept] is a re-presentation, and not the thing itself” (Frankel &
DePace, 2012, p. 3). Thus my manual diagrams and illustrations are not meant to belie the computational
nature of the content, but to amplify the human involvement. My thinking here is also influenced by creative
computation scholars like Taeyoon Choi and others making computational works out of fugitive materials.
overseeing the entire project. When work passes from one phase to another, it changes
hands entirely. This is an important consideration when thinking about who can
meaningfully intervene in an engineering problem with ethical stakes (which is nearly all
engineering problems). The division of labor implied by the cycle is by design rather than by
necessity. Teams of one create working and successful software products and/or packages
(the open-source ecosystem is an excellent example of this) and in that situation one person
works as designer, implementer, and tester. The larger the product, and the more person-hours required, the larger the team will need to be. As early military-industrial project leaders
communicating the vision of the product itself. The Garmisch conference made an explicit move to separate software engineering from artistic endeavors and to emphasize that
Figure 1. The software development lifecycle.
Plan
The planning phase is the initial phase of a project, designed to facilitate gathering
the project requirements and constraints. In the planning phase, the Planning Team may
meet with both internal and external stakeholders. A stakeholder is anyone with a “stake” in
the final product. Identifying those who have a “stake” is an arbitrary decision, ultimately
determined by those in power in the process. Stakeholders are frequently those with a
financial interest in the product, though stakes could be emotional, geographical, and/or
political. Within technology production, there is a recent move towards thinking broadly
about who a stakeholder might be and including them as early and as frequently in the
process as possible. As Rozilawati Razali and Fares Anwar note, the identification of and
engagement with stakeholders is often unstructured and unclear. Thus, teams should choose
a specific methodology for identifying and engaging with stakeholders (Razali & Anwar,
2011). Deciding who is a stakeholder and which stakeholder voices will be heard is a highly
political process that can have a significant impact on the final product.
The planning phase is often described as “non-technical” and while budgets and
timelines may be involved at this point, they are not examined in detail. During planning, the
Planning Team may work with other departments to make sure there are available resources
(this is usually another name for people), and to ensure that the project has a mostly
appropriate budget with a mostly doable timeline. A successful planning phase will deliver a set of requirements and constraints to the Analysis Team.
Analyze
During this phase, the Analysis Team takes everything delivered from the Planning
Team and begins to make sense of it. Invariably, there will be conflicting requirements. For
example, it is common to have a design stakeholder who insists that the product use flashy
colors and innovative display mechanisms and an accessibility stakeholder who needs simple
high-contrast colors. The Analysis Team turns stakeholder interviews (aka, qualitative data)
into structured requirements. The format of structured requirements varies from team to
team, but it is common to use a format called “user stories.” The user story is a method of
gathering requirements and formatting them so that everyone reading understands the
context behind the requirement. User stories take the following format: “As a [ type of user ]
I should be able to [ task ] so that [ business reason ]”. User stories might include statements
as general as “As a mobile user, the page should load in under 5 seconds so that I don't have to wait,” or as specific as “As a registered user, I should be required to change my password every 60 days but not more frequently than every 24 hours in order to meet corporate
security rules.” For a sense of scale, it's not uncommon for projects to have hundreds of user
stories if that's the format they are using, or to have thousands of requirements.
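User stories can themselves be treated as structured data. The sketch below is illustrative only (the class and field names are my own); it renders the template described above.

```python
from dataclasses import dataclass

@dataclass
class UserStory:
    """One requirement in the 'As a..., I should..., so that...' format."""
    user_type: str
    task: str
    business_reason: str

    def render(self) -> str:
        return (f"As a {self.user_type}, I should be able to "
                f"{self.task} so that {self.business_reason}.")

story = UserStory(
    user_type="mobile user",
    task="load the page in under 5 seconds",
    business_reason="I don't abandon the site",
)
print(story.render())
# -> As a mobile user, I should be able to load the page in under 5 seconds
#    so that I don't abandon the site.
```

Treating each story as a small record is also what makes it possible to track, prioritize, and test hundreds of them at once.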
In the same way that the choice of stakeholders can influence the contents of the
user stories, so too will the user stories influence the designs created in the next phase.
However, the Analysis phase does not simply create requirements, it also prioritizes them.
This process is frequently a part of requirements engineering (RE). RE is the highly political
process of deciding which stakeholders' requirements will be built during the current development cycle.
Design
When people hear the word design, they may think of activities that involve images,
shapes, or colors - designing layouts, graphics, brand images, color schemes. While the visual
design of a software product may be designed during this phase, the data design / database
design also occurs here. If this phase includes visual design (which it often does), a Lead
Designer and their design subordinates will work to put together a series of deliverables for
stakeholder approval. The Design Team might provide artifacts like mood boards,
wireframes, and comps. They will then iterate on these artifacts until they have stakeholder
approval. This set of stakeholders might be internal or external. Frequently, visual design
choices may impact functionality, so it is common for the Implementation Team to work closely with the Design Team during this phase.
Design also includes aspects of the software that are not visual, like infrastructure
and data. Depending on the product’s scale, there might be an Infrastructure Design Team.
That team will design the arrangement of servers, load balancers, firewalls,
backups/redundancy, and other storage and security measures required by the Analysis
Team. There will also be a Data Design Team. This team will create data models, separate
from the physical storage requirements of the database(s) in use. The data models created by a Data Design Team are the focus of this dissertation. Like the Visual Design Team, the Data Design
Team's work is subject to revisions by stakeholders and further refinement through iteration
and conversation. Thus, like all software artifacts, it represents not only the clear vision of
its designers but also the negotiated input of its many stakeholders.
Implement
During the Implementation phase, the implementers (usually called programmers,
developers, or software engineers) build it. They spend their days writing code, meeting
small milestones, and accomplishing small tasks as determined by the Lead Developer
(though the title of the person who assigns tasks varies in each organization). In
implementation, the visual designs become code to
render them, the infrastructure design becomes provisioned servers, and the data design
becomes databases. The implementation phase is usually what most resembles “writing
code” in the popular imaginary. If the project was using user stories, the Implementation
Team will frequently complete one user story (or set of requirements) at a time, building
functionality with the context provided by the Analysis Team. In the next section, I explore
the impact of the Data Design team’s work on the implementation of a software product.
Test
During the testing and integration phase, the work done by the Implementation
Team is tested for conformity to the requirements. In most cases, testing is called Quality
Assurance (QA). There are several common kinds of testing: white box testing, in which the
QA team has access to the code; black box testing, in which the team tests the software
without access to the code or server, treating it like a “black box”; smoke testing, in which it
is assumed “If there’s smoke, there’s fire” and the QA team will test basic functionality or
core user pathways like user registration, or shop checkout. Members from the QA team
may have been involved with the project from nearly the beginning, and they often help to
structure user stories and requirements in ways that make testing easier.
It is important to remember that QA teams are not testing for “quality” but for
conformity to the requirements. If a QA team member finds that when software conforms
to the requirements it has unintended consequences, that team member may be able to
report it to the Implementation Team, but will likely not have any power to ensure that a
change is made. In other words, by the
time software reaches QA, it is often too late to screen for ethical or social consequences of
a software product. Those consequences are best caught much earlier in the SDLC.
Maintain
It is during the maintenance phase that fixes are released (or “bugs” are “patched”), and
new features developed and deployed. In many cases, once a product is released and stable,
the maintenance overhead reduces enough to require very little time and only automated
monitoring. During the maintenance phase, significant new features will be developed using
the SDLC and then merged into the existing software product. For example, when a
platform like Twitter releases a new feature like the downvote button (new as of this writing
in 2022), that feature will have gone through every phase in the SDLC before it was released
into production.
This overview puts data models in their rightful place within the SDLC – that is, they are
created early in the process and have immense power to impact everything that comes after.
Depending on the structure of the software team, there might be no overlap between the
people working on the data models and the people implementing them; modelers cannot
pass their ethical or ideological orientations down through the process, except through the
artifacts they hand off. Considering the SDLC, it becomes clear that there is not one person
who works across all three levels of data models---conceptual, logical, and physical. At the
highest or most abstract level, called the “conceptual level,” the conceptual model contains
entities.27 Sometimes, modelers are modeling the world as they know it to be (e.g., customers
27 This is an oversimplification. For a good introductory text on how data modeling works, see Simsion & Witt (2005).
place orders for products in Figure 2) and other times they may be modeling a speculative
world that does not yet exist (astronauts use their own spaceships to lead space tours). A
logical model focuses on the ways that entities are related to each other; a conceptual model
becomes a logical model when detail is added and when relationships between entities are
made explicit. Beginning with Figure 2, a conceptual model containing customers, orders
and products becomes a logical model when we add further detail about each entity (like
attributes) and how they are related. (See Figure 3). A logical model becomes a physical
model when field-level information is added. That is, when information about the attributes
of an entity become specific data types and fields. The physical model, often regarded as the
lowest level, contains all information necessary to create a database to contain and enact the
model (see Figure 4). At each level of data modeling, database designers are making choices
about what entities and relationships are most important and worth storing. No data exists
yet, but the containers in which data will be stored are being formed. It is important to
remember these three levels are all included when the term “data model” is used (and will be
included here as well). In Chapter 3, when I reverse engineer application data models, I am
working across the levels, and most frequently diagramming the physical level.
Figure 2
Figure 3
Figure 4
At each level, database designers are refining the entities, relationships and fields
contained in the data model. Choices made at the conceptual level (to make order its own
entity, for example) cascade down to the logical and physical levels. The physical model and
the resulting database are a direct result of the choices made at the conceptual level. This
means that when we consider something like dataset bias, our consideration is incomplete
unless we examine the data models shaping the dataset. Thinking about dataset bias through
the lens of data model levels expands the field of analysis and provides us with more
opportunities to think about how data and datasets are shaped. The majority of the contents of a
data model (its entities, fields, and relationships) are determined by the social, cultural and
commercial conditions of its use. However, there are some fields that have become standard
or expected because they provide expected computational tools when included. For example,
it is now commonplace for entities to have fields like created_date because this field allows
for sorting records based on which might be oldest or newest. It also allows developers to
write code to purge old records, and for computer systems to effectively synchronize
distributed storage. This is an obvious yet important point to underscore—field creation and
its attendant constraints are entirely created by the data modeling and development teams.
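The computational convenience of a created_date field can be sketched in a few lines of JavaScript (the records and values here are invented for illustration):

```javascript
// Hypothetical records, each stamped with a created_date at creation time
const records = [
  { id: 2, created_date: new Date("2021-06-01") },
  { id: 1, created_date: new Date("2019-03-15") },
  { id: 3, created_date: new Date("2022-01-20") },
];

// Sorting oldest-to-newest is possible only because every record carries the field
const oldestFirst = [...records].sort((a, b) => a.created_date - b.created_date);

// Purging records older than a cutoff date becomes a one-line filter
const cutoff = new Date("2020-01-01");
const keep = records.filter((r) => r.created_date >= cutoff);
```

Without a field like created_date in the model, neither operation is possible, no matter how clever the algorithm.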
A modeler might choose, for example, to create first name and last name fields instead of
given name, middle name, first surname, and second surname fields. In other words, the
problems created by a database's gender, name, and
other identity fields are not simply software problems, but reflective of assumptions
embedded within the broader sociotechnical systems. It is an easy STS truism that software
systems are also sociotechnical systems and that a system like, for example, Facebook
encompasses not only the code and data used to power it but also the cultural norms
accompanying the creation of the code (Grodzinsky et al., 2012). Within any software
system, designers (programmers, architects) made assumptions about the society and the
individuals who were their concern. In the United States, the included cultural norms and
social assumptions generally include the fundamental centering of normative identities. This
centering happens within the governmental, industrial, and nonprofit contexts of software production.
Data models, of course, are not created for their own sake, but as part of software
engineering, which Sommerville describes as supporting “professional software development
rather than individual programming.” The data models I consider and the processes I
examine are all squarely within this realm of “professional software development.” In this
work, software is not simply another word for a computer program (or set of instructions)
but “also all associated documentation, libraries, support websites, and configuration data
that are needed to make these programs useful” (Sommerville, 2015, p. 19).28
Software is, and has always been, about more than algorithms and their code. In a
germinal computer science text, Niklaus Wirth asserts that software programs are actually
algorithms + data structures (Wirth, 1976). In an extension of Wirth's work, Paul Dourish
supports Wirth's elevation of data structure to an algorithmic peer in terms of the necessity
of analysis, noting that we must analyze algorithms “alongside data, data structure, program,
process, and other analytic entities” (Dourish, 2016, p. 2). Notice that Wirth's focus is on the
data structures rather than the data itself---algorithms are designed to work with “forms and
regularities” rather than specific pieces of content. Algorithms are productive, taking inputs
(data) and generating conclusions, insights, verdicts, and diagnoses. However, the contours
and capabilities of those inputs are directly determined by the design of the data structures.
28
For my purposes, “software” does include all of the artifacts and configuration required to create, use and
maintain a software program. When I am referring to an executable binary, I will use the word “program.”
While it is common to hear phrases like “garbage in, garbage out” when talking about
algorithms and datasets, the data models prefigure the “garbage” data people refer to—they
determine the shape of the containers before they are filled with any data at all.
Data models become data structures, and data structures control what algorithms can
be used. The relationship between data structures and data models can be conceptualized as
a linear one: data structures are data models translated into programming code. Data models
are visualizations and abstractions of the ways that data gets stored and interrelated. Those
data models, when moved into code, become data structures. Consider the following small
programming task: I need to build a simple weather application that displays the hourly
weather and reports the high and low temperature for the day. As the programmer and data
modeler, I need to decide how to store the weather for each of the 24 hours in a day. One
approach to designing the data model might look like Figure 5.29
29 Technically, this is a table and not a data model. Further explication of data models will come in Chapter 2.
Figure 5
A data design like the one in Figure 5 has a direct impact on the code that I would
write. For example, based on this table, I might create variables for each hour's weather. Note
that lines starting with // are comments to guide the reader and not executable code.30
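A sketch of this one-variable-per-hour approach (the variable names and temperature values are my own invention) might look like:

```javascript
// One separate variable for each hour's temperature
let temp0 = 52;  // temperature at midnight
let temp1 = 51;  // temperature at 1:00 AM
let temp2 = 50;  // temperature at 2:00 AM
// ...a real version would continue through all 24 hours...
let temp23 = 54; // temperature at 11:00 PM
```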
Storing the day's weather in separate variables is inefficient in many ways. Additionally, I
need to be able to determine the day's highest and lowest temperatures. To do this, I could
compare each hour to each other hour one at a
time:
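A sketch of those pairwise comparisons (assuming hourly variables like the ones above; the names are my own) could be:

```javascript
// The hourly variables being compared (only three shown for brevity)
let temp0 = 52;
let temp1 = 51;
let temp2 = 50;

// Compare each hour against the running high, one comparison at a time
let highTemp = temp0;
if (temp1 > highTemp) {
  highTemp = temp1;
}
if (temp2 > highTemp) {
  highTemp = temp2;
}
// ...21 more comparisons would be needed to cover every hour,
// plus a parallel set of comparisons to find the low
```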
30 I am using “vanilla” JavaScript for all of the code samples in this chapter because I think it is the easiest to read for those without programming experience. Note that JavaScript is a dynamically typed language, which means that data types for variables do not need to be declared in advance.
However, this is wildly inefficient, will not support more than a few variables, and is
an error-prone approach. This example illustrates that my choice of data storage (a collection
of unrelated variables) has a direct impact on the operations we can perform with our data.
I might instead opt to store the day's worth of weather data in a data structure called
an array, which is a container “that hold[s] a fixed number of items of a particular type.”
(Smith, n.d.). Arrays are data structures that allow data items to be stored together and
referenced through a shared name and a numeric index.31
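In JavaScript, the language of this chapter's samples, a day's worth of temperatures stored in an array (the values are invented) might look like:

```javascript
// All 24 hourly temperatures live together in one array;
// each item is reached through the shared name plus a numeric index
const temps = [52, 51, 50, 49, 49, 50, 53, 56, 59, 62, 64, 66,
               68, 69, 69, 68, 66, 63, 60, 58, 56, 55, 54, 53];

// The temperature at 9:00 AM is simply the item at index 9
const nineAmTemp = temps[9];
```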
The programmer is not required to use an array, but arrays make several
programming tasks easier. With an array, a programmer can use an algorithm to iterate over
items to perform the same operation on each one. In other words, the choice to store data in
an array, versus not in an array, has a direct impact on what a block of code (aka, an
algorithm) can do. If I choose to use an array, I can use a bubble sort (or other sorting
algorithm) to determine the day’s highest and lowest temperatures. For example:
31 Arrays that allow for items to be referenced by a string key are associative arrays, and are not available in all languages.
// Assume temps is the array of the day's 24 hourly temperatures
// Bubble sort: repeatedly swap adjacent out-of-order items until sorted
for (let x = 0; x < temps.length; x++) {
  for (let y = 0; y < temps.length - 1; y++) {
    if (temps[y] > temps[y+1]) {
      let tempVariable = temps[y]
      temps[y] = temps[y+1]
      temps[y+1] = tempVariable
    }
  }
}
// After sorting, the day's low is temps[0] and the high is temps[temps.length - 1]
As we know, the choice of algorithm can have a powerful impact on the outcome.
However, as this very small example demonstrates, those algorithms cannot operate without
the proper data structures in place. In these examples, the data structures are simple
containers (i.e., arrays) to hold data of type number. However, in large systems, there is an
immense volume of data items that will need to be structured and stored and it is insufficient
to think at the data structure level. Thus, we must consider the data models that shape the data
structures.
In this chapter, I reviewed the process of creating software, the SDLC, and
illustrated one way in which data models and data structures create algorithmic possibility.
The steps of the SDLC and the code examples are foundational to my argument that data
models are an important site of analysis for those of us concerned about the social and cultural
impacts of software systems. In the next chapter, I offer a critical analysis of some of the
techniques of data modeling, illustrating the ways that data models are both important sites
of power and potential sites of intervention.
CHAPTER 2
In the United States, the 1970s were an active period for political and social change, and
that change did not skip the computing culture of the time. The 1970s also saw a “database revolution”
(Kerssens, 2018). One important ingredient in this revolution was the introduction of the
relational paradigm of database development, modeling, and querying. Prior to the database
revolution, data was treated as material to be processed rather than collected, and computers
as the tools doing the processing. The affordances of available technology shaped this
assembly-line like perception of data and data processing. Software was written for specific
sets of computer hardware and updating that software was tedious and time-consuming. The
data storage paradigms popular in the late 1960s and early 1970s—hierarchical and network
databases—were not conducive to storing large volumes of data for future unknown
processing tasks.32
The database revolution changed the uses of computers and their data storage. At the time,
proponents of this revolution advocated for a “database approach”: “a view of the database
as not just a technology, but as a completely new managerial philosophy and mindset”
(Kerssens, 2018, p. 14). This new
mindset was facilitated by the introduction of database management systems (DBMS) and
the relational database (Haigh, 2009). DBMSes provided ways to interact with databases that
made databases, data processing and data storage more accessible to computer users.
DBMSes thus helped to reinforce the computer's utility as an administrative tool. Indeed,
32 In other words, there was little room for an approach like “Let’s collect it in case we need it, and figure out what to do with it later.” This is an approach that we might now describe as a “big data” strategy.
databases began to have significant impact on the ways software was developed and “the
materialities of database technologies shape[d] the kinds of ‘databasing’ that [could] be done
and imagined” (Dourish, 2014, p. 39). As “databasing” evolved, so too did the paradigms
used to conceptualize and visualize the databases. The relational database and its
convenience and flexibility required programmers to use new tools to communicate database
design and architecture. In many ways, data modeling—that crucial act of naming the things
to which a software system must attend and a database must store—evolved out of new
communication needs tied to the innovation of the relational database. The relational model,
and the subsequent data modeling techniques built upon it, advanced the claim that data has a
natural structure in which it is best to be both stored and visualized. I interrogate this claim
and link it to data modeling techniques. I demonstrate that each technique is a way that data
modelers exercise power over the things (data) that an application needs to attend to.33 The modeling process requires the
modelers to draw boundaries around the information that is relevant, and to decide what to
include in a data model. As I discussed in Chapter 1, data models are often made before (or
33 Data modeling is considered a type of conceptual modeling, in which parts of the “universe of discourse” are abstracted to be stored. Conceptual modeling as a field was named by John Mylopoulos in 1980. He describes conceptual modeling as the activity of constructing abstract models of knowledge about some world, and uses it synonymously with the terms “knowledge representation” and “semantic data model” as they have been used in artificial intelligence and database discussions respectively (Mylopoulos, 1980, p. 167).
in tight parallel to) algorithmic and code decisions, and they play a significant role in shaping how a
software system ultimately behaves.
While it is not the case that data modeling never happened before relational
databases, it was far less common. Data modeling as a software practice was ushered in by
the relational database paradigm. Prior to relational databases becoming popular, the
structure of the data was connected to the structure of the database. These early databases
were often called “navigational” databases, because they required a skilled navigator to find
and retrieve data. I describe these earlier paradigms here to contextualize the
epistemological shift that relational modeling created. The relational database paradigm was
not the first, or, for a while, even the most popular way of structuring databases. Before the
innovations of the “database revolution,” there were two popular types of databases. These
databases, hierarchical and network, were rigid and highly structured, requiring that data be
entered and connected following preset patterns. Using them required a sophisticated
understanding of the database's existing structure.
Figure 6
Figure 7
In the early 1970s, if you wanted to store data in a database, you essentially had two
choices. In 1968, IBM introduced IMS - a database system that used a hierarchical model of
data storage (Lessons from History et al., 1999, p. 162). IMS stored data in a way that is easy to visualize, like a family tree or
organizational chart (Figure 6). Each item could be the child of another item, a parent of
many others, and a sibling to others. The hierarchical structure was an innovation from no
structure at all but had many major shortcomings for programmers: in order to use the
structure, programmers needed to be familiar with the existing structure (i.e., one could not
simply insert data anywhere); a significant change to how data was related might trigger a
restructuring of the entire database; finding data (aka traversing the tree) was time-
consuming and memory intensive. Importantly, and perhaps obviously, the hierarchical
model required that data fit into a hierarchy. In 1969, the CODASYL consortium published
a standard for an alternative database structure (Lessons from History et al., 1999, p. 162). This standard, often called the network model,
allowed items to be linked to other items without having specific, pre-set rules about how
many items might be linked to others (See Figure 7). In other words, when using the
network model programmers were free from the parent-child constraints. The network
others. The network model was still quite rigid, however: data was required to be linked to
other data, and many of those relations were artificial or inefficient; finding data still required
Using the network or hierarchical models did not require the database
designer/creator to separate the structure of the database from its content (Dourish, 2014, p.
15). In practice, this meant that during the creation of the database there was no speculation
about how the data would be structured. In a hierarchical database, the data would be stored
hierarchically and the choices that were necessary were about which datum would be a
parent and which a child. The content and the structure were the same—to a certain extent,
the content dictated the structure and the structure dictated the content, both shaping where
the next piece of data was stored. For a few years, database designers had the option to store
data either in hierarchical or network configurations, and the data itself might indicate the
best option.34 Family data, for example, might fit the hierarchical
structure fairly seamlessly, with parents naturally falling between grandparents and children.
Data about flight paths might be stored in network databases, with hubs connected to other
hubs.35 In both of these early models, data were interdependent. Regardless of whether the
relations between data felt artificial to programmers, data always had a type of context by
way of these linkages. In both network and hierarchical structures, all data items were linked
to other data, and the way items were stored in memory (aka, the “physical structure” of the
data) matched the way items were visualized (aka, the “logical structure” of the data). As a
result, there was no meaningful separation between the data's physical and logical
representations. This is an important point because when programmers were using the
network or hierarchical databases, there was no real need to model, visualize or diagram the
data—the data and the database shared a structure determined by the technology. Every
abstraction of a network database looked similar to Figure 7 and every hierarchical database
like Figure 6, but relational databases could look like nearly anything at all.
34 Of course, there were a host of technical and financial considerations also at play here because different computer systems supported different models.
35 The fit was not always so seamless. Data that might today seem more “flat,” like bank transaction records, was often shoehorned into these structures.
Data modeling
Edgar F. Codd introduced a new way of thinking about data and its “natural”
structure that would offer new freedoms, and new epistemological and ontological
frameworks to data thinking. Codd, a long-time IBM engineer, introduced the concept for
the relational database model in a paper he published in 1970. “A Relational Model of Data
for Large Shared Data Banks” opens with one of the fundamental objectives of relational
databases—the separation between data's logical and physical structures: “Future users of
large data banks must be protected from having to know how the data is organized in the
machine” (Codd, 1970, p. 377). By their nature, databases of any form already separate data
from their context, and with this move, Codd was working from what Tara McPherson calls
lenticular logics, offering ways to separate data's physical structure from its logical one: the
relational model provides no dictates on the superstructure (i.e., tree, network) in which data
are interrelated. In other words, the developer has complete discretion over how data are
structured. The lenticular logic is a logic of the fragment or the chunk, a way of seeing the
world as discrete modules or nodes, a mode that suppresses relation and context
(McPherson, 2013).
The relational paradigm is built on two pillars: entities and their (eponymous)
relationships. The smallest relational database might contain two entities with one
relationship between them. In the very simplest terms, entities are nouns and relationships
are how those nouns connect: a customer (entity) places (relationship) an order (entity).36
This small database is diagrammed in Figure 8. Note that the entities customer and order are
connected by a relationship.
36 Relationships between entities are not always named in practice. However, in diagramming, it’s useful to label the interaction between two entities.
Figure 8
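In code, the customer-places-order relationship of Figure 8 might be sketched like this (the objects and field names are my own illustration, not a formal model):

```javascript
// Two entities, represented here as plain objects
const customer = { customer_id: 101, name: "Ada" };
const order = { order_id: 5001, customer_id: 101, total: 24.99 };

// The relationship "places" is recorded through the shared customer_id:
// the order carries a reference back to the customer who placed it
const orderPlacedByCustomer = order.customer_id === customer.customer_id;
```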
When introducing his relational model, Codd boasted that it “provides a means of
describing data with its natural structure only---that is, without imposing any additional
structure for machine representation purposes” (Codd, 1970, p. 377). The additional
structures to which Codd is referring here are network or hierarchical models that forced
data to be linked in ways that would be artificial, or to use Codd's terms, unnatural. What
exactly is the “natural structure” of data? According to Codd, the natural structure of data is
as entities in relationship to each other. In terms of how that data is stored, however, it
means that relational models are structured around data collections stored as tables and
related to each other in ways that are determined by the programming team, rather than
required by the database structure itself. In other words, unlike hierarchical or network
systems that controlled how data items were related (as nodes, or in parent-children-sibling
relationships), relational databases allowed the programmers to make relationship decisions, and
this allowance reinforces the separation between the data's “natural” structure and how it is
stored in a system.
This new paradigm for thinking about data structures, and method for designing
them, amplified the power of data modeling within the software development lifecycle. The
paradigm has persisted within the SDLC with little interrogation. Thus, the ER model
deserves close scrutiny.
Because relational databases can take many forms, programmers were experimenting
with new ways to visualize the data and the connections between entities. In 1976, Peter Pin-
Shan Chen created the Entity-Relationship (ER) Diagram to provide a “unified view of data”
(Chen, 1976). This unified view provided a visual vocabulary for several of the conventions
of relational modeling. Chen, too, claimed that data has a
“natural” structure, echoing Codd's use of the word “natural” a few years earlier. Chen
writes “The entity relationship model adopts the more natural view that the real world consists
of entities and relationships” (Chen, 1976). This is an
epistemological and ontological claim that I address in more detail in the next section.
Even though Chen insists that data has a natural state, he acknowledges the inherent
subjectivity of these categories. Chen defines a relationship as “an association
among entities. For instance, ‘father-son’ is a relationship between two ‘person’ entities”
(Chen, 1976, p. 10). However, as Chen notes, what becomes an entity is a “decision which
has to be made by the enterprise administrator. He [sic] should define what are entities and
what are relationships so that the distinction is suitable for his environment” (Chen, 1976, p.
10). With this statement, Chen positions the database designer (who he calls the “enterprise
administrator”), as the individual with the power to determine what “things” deserve
identification and attention, and thus which concepts are important and available. This is the
power to define a system's view of the world. As we will see throughout, the use of the word
“natural” is a powerful rhetorical move, and the claim did not end with
Chen's ER Diagrams. As I have argued elsewhere, “by divorcing data from problems of
storage with the goal of creating generalizable database recommendations, the relational
model subtly reinforced the notion of data as ‘natural’ and corresponding to something prior
to or separate from the ways it is collected, labeled, organized, and stored...Such naturalistic
conceptions of data also separated them from the social, political, or economic contexts.” In
relational data modeling, the first ontological commitment is agreeing to view the world as a
series of objects, properties of those objects, and relationships between objects (Olivé, 2007,
p. 11).37 Thinking in this way for long enough impacts one's perception of the world — this
commitment includes an epistemological agreement that what is represented through those objects and relationships is true (or
perhaps true enough for the system's purposes). It is our epistemological orientations that
allow us to ask questions of databases, assuming that the data they return will be accurate.
The commitment to trusting databases to provide knowable or usable data is also high
stakes.
I argue that there is one especially key moment in terms of both significance and
intervention: when an attribute becomes an entity. That is, when a modeler decides that
something that is a property or descriptive field about an entity, itself becomes an entity. In
data modeling, an entity is an object (person, place, thing, idea, or concept) that has
attributes, or properties. Entities are related to other entities, in keeping with Chen's Entity
Relationship Diagram. In databases which need to keep track of people, there is often a user
37 Note that the word ontology has two different, though interrelated, meanings. In branches of computer science, an ontology is “a specification of a conceptualization” (e.g., deciding how to build a world full of objects and relationships) (Olivé, 2007, p. 11). Ontology is also a branch of philosophy dealing with the nature of reality. In this section, I am referring to the philosophical meaning.
or customer entity. That entity might have attributes like email_address and first_name. The
key moment, then, is when an attribute of an entity, becomes an entity itself. In this section,
I describe the transformation from attribute to entity, and illustrate the ways that these
choices are ones grounded in ideology and power. That data models represent such intense
concentrations of choice is precisely what makes them worth examining.
Figure 9
Figure 10
Figure 11
Deciding which questions, or type of questions, the data model needs to answer is
part of the process of data modeling. Consider the difference between “Is person X
married?”, “To whom is person X married?” and “Where and when did person X marry?”
Each of these questions requires increasing amounts of detail about a person and their
marriage. Figure 9 shows a User entity with an attribute marital_status enabling us to answer
whether a person is married. The marital_status attribute might contain values like “Single,
Divorced, Widowed, Married.” Figure 10 also shows a User entity, but this time the entity
has a partner_id attribute. The line on the left side depicts a relationship, indicating that the
partner is also a user in the table, and answering the question “To whom is person X
married?” Finally, consider Figure 11 where marriage is reified: the relationship between two
User entities becomes an entity itself. When marriage becomes an entity of its own, those of
us who are looking at this model are then free to think about other properties of marriages
which we might want to store. In the example question above, I would opt to store the date,
location, and perhaps officiant of the wedding ceremony. The facts of the marriage are the
same regardless of whether it is modeled with Figure 9, Figure 10, or Figure 11. However, it is
this crucial act of “promoting” marriage to an entity separate from, but related to, the user
that makes these new questions answerable.
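The difference between Figure 9's attribute and Figure 11's entity can be sketched in code (all names and values here are invented for illustration):

```javascript
// Figure 9's approach: marital status is a single attribute of the user
const userWithAttribute = { user_id: 1, name: "Sam", marital_status: "Married" };

// Figure 11's approach: the marriage is reified as an entity of its own,
// free to carry attributes like date and location
const partnerA = { user_id: 2, name: "Alex" };
const partnerB = { user_id: 3, name: "Jo" };
const marriage = {
  marriage_id: 7,
  partner_ids: [2, 3],   // relates the two user entities
  date: "2018-09-22",
  location: "Chicago, IL",
};
```

The facts of the marriage are the same in both sketches, but only the second can answer where and when the marriage happened.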
Imagine, for example, a farmer keeping a list of customers to receive farm share deliveries
during the growing season. In this case, an address might simply be an attribute of the customer
entity (see Figure 12). However, in an e-commerce context, customers might have several
addresses stored and labeled for different transactions. Contrast Figure 12 and Figure 13.
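The contrast can be sketched in code (a hypothetical illustration; the field names are my own): in the second arrangement, each address record points back to its user.

```javascript
// Figure 12's approach: the address is one attribute of the customer
const farmShareCustomer = { name: "Ada", address: "12 Orchard Ln, Detroit" };

// Figure 13's approach: addresses are their own entities; each carries the
// user's primary key (user_uuid) as a foreign key
const user = { user_uuid: "u-123", name: "Ada" };
const addresses = [
  { address_id: 1, user_uuid: "u-123", label: "Home", city: "Detroit" },
  { address_id: 2, user_uuid: "u-123", label: "Work", city: "Ann Arbor" },
  { address_id: 3, user_uuid: "u-999", label: "Home", city: "Elsewhere" },
];

// One user, many addresses: follow the foreign key to collect them all
const adaAddresses = addresses.filter((a) => a.user_uuid === user.user_uuid);
```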
Note that in Figure 13 there is one user with an unlimited number of addresses supported by
this model---a user can have as many addresses as needed. Note the importance of the
markings on the line connecting the User and Address entities. On the left, the vertical line
through the horizontal line means one. The angled lines on the right mean many. Thus, the
diagram is read: Each user can have many addresses. In this configuration, the user's primary key,
user_uuid, appears as the “foreign key” in the address table, linking each address back to a
specific user record. Those with data modeling experience might point out that splitting
address from customer (i.e., making address its own entity) is a standard move because of
the rules of normalization.38 I treat normalization as a given
and as a foundation for my own data modeling, but there is a growing movement advocating
for ontological modeling. Ontological modeling is an emerging practice that “looks at the
data and asks the question ‘What does this represent?’ and then it structures the data around
that” (West, 2011, p. 65). Normalization is the process of looking at patterns and repetitions
in a model and restructuring it to reduce redundancy. When a
38 For a solid overview of normalization, see Date (2019), Simsion & Witt (2005).
Figure 12
Figure 13
data model meets certain criteria, the model is said to be in a “normal form.” In 1970, E.F.
Codd introduced the relational database paradigm, and over the next decade developed
criteria for “normalizing” the database. While the intricacies of data model normalization are
out of scope, a key point here is that through the process of normalization, some attributes
become entities. The criteria by which that happens are based on the modeler's
User from their addresses because there is a common understanding that people can have
many addresses to which they ship packages. In both the marriage and address examples,
promoting a field from an attribute to an entity facilitates more expansive thinking about the
promoted to entities and other attributes remain attributes and that choice is based solely on
modeling, it is important to recognize that data modelers model data according to the
requirements they have been given (as I described in Chapter 1). A modeling team may get a
user requirement like “As a user, I want to have a profile with all of my information so that I
can receive personalized product recommendations.”39 The requirement may specify which
information about a user is stored (“name, email address, mailing address, age, favorite
sports”), but it is at the sole discretion of the data modeling and database teams how that
information is stored. In other words, the models are created to align with the modelers’
views of the “natural structure” of data. Perceptions about that natural structure are
inevitably influenced and determined by the social and cultural context in which the
modelers are working. In the United States, that context is the infrastructure of white-cis
supremacy.
The most common practice of storing gender has been a “state” paradigm---which is to say,
39 Remember that user story requirements are not necessarily “true” things that a user might want. They are
created by the planning team after discussions with stakeholders. User stories about how a user wants
personalized algorithmic recommendations are aspirational, based on product or monetization goals.
that the database records the active “state” of a value or setting in a system. When storing
the state of something, there is a fundamental idea that there can be only one valid state in
any given moment and that the number of states is finite. In the case of gender, people have a
gender that is only one of a set number of possibilities. Initially, valid values for the “state”
of gender would have been M and F. More recently, especially in commercial software
applications and web-based forms, these valid states have been expanded to include values
like “non-binary” and “MtF” or “FtM”.40 The addition of MtF and FtM values further imply
additional values, a free-entry text box, or radio buttons which support the selection of more
than one value (i.e., respondents can choose “non-binary” and “woman”) are still working
within a state paradigm. These data models and interface choices assume that users are able
and willing to fix themselves to a single gender and a single set of pronouns.
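The state paradigm described above can be rendered as a deliberately simplified sketch, here using Python's standard sqlite3 module. All table and column names are hypothetical illustrations, not drawn from any actual application:

```python
import sqlite3

# A minimal sketch of the "state" paradigm: gender is a single-valued column
# on the user record, constrained to a finite list of valid states.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user (
        user_uuid TEXT PRIMARY KEY,
        name      TEXT NOT NULL,
        -- exactly one "state" per user, drawn from a closed list
        gender    TEXT NOT NULL
                  CHECK (gender IN ('M', 'F', 'non-binary', 'MtF', 'FtM'))
    )
""")
conn.execute("INSERT INTO user VALUES ('u1', 'Ash', 'non-binary')")

# The column holds exactly one value, so recording a new gender can only
# overwrite the old one; the model cannot hold two concurrent genders.
conn.execute("UPDATE user SET gender = 'F' WHERE user_uuid = 'u1'")
print(conn.execute("SELECT gender FROM user").fetchone()[0])  # prints: F
```

The CHECK constraint makes the paradigm's two assumptions explicit: the set of states is finite, and only one state is valid at a time.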
Importantly, the act of promoting an attribute to an entity may seem solely like an
epistemological shift. The affordances of an entity allow the data modeler to open new ways
of knowing---to ask new questions about (in our examples) the marriage and the addresses.
However, the act is about more than asking new questions and shifting epistemologies. The
act of promoting an attribute to an entity can signal an ontological shift, especially when that
attribute or entity is about an aspect of a person's identity, in this case, a person's gender.
Starting with our initial ontological commitment---that the world is made of objects and
relations---choosing to store gender as an attribute reflects the idea that gender is simply a
property of a person, not related to anything, nor influenced by anything external. In other
40“MtF” is an abbreviation signaling that a person was coercively assigned male at birth and is a woman: “male
to female.” Similarly, “FtM” is “female to male” for those coercively assigned female at birth who are men.
words, people have only one gender and there is no additional information to be gathered
about it. The relegation of gender to attribute of a person and not its own entity reflects a
collective, historical inability to view gender as something about which to store information
(in the case of the marriage example) or as something that a person might have more than
one of (in the case of the address example). I discuss this example in more detail in Chapter
4.
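To make the contrast concrete, the following is a speculative sketch (hypothetical names throughout, using Python's sqlite3 for illustration) of gender promoted from attribute to entity: a person may hold more than one gender, and the model can store information about each.

```python
import sqlite3

# A hedged sketch of "promoting" gender to an entity: gender becomes its own
# table, and a linking table lets a person hold more than one gender, each
# carrying its own information (here, the date from which it applies).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        user_uuid TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE gender (
        gender_id INTEGER PRIMARY KEY,
        label     TEXT NOT NULL          -- free-form, not a closed list
    );
    CREATE TABLE user_gender (
        user_uuid  TEXT    REFERENCES user(user_uuid),
        gender_id  INTEGER REFERENCES gender(gender_id),
        valid_from TEXT,                 -- information *about* the gender
        PRIMARY KEY (user_uuid, gender_id)
    );
""")
conn.execute("INSERT INTO user VALUES ('u1', 'Ash')")
conn.executemany("INSERT INTO gender (label) VALUES (?)",
                 [("non-binary",), ("woman",)])
conn.executemany("INSERT INTO user_gender VALUES ('u1', ?, ?)",
                 [(1, "2019-03-01"), (2, "2021-06-15")])

# One person, two concurrent genders: impossible in the single-column model.
rows = conn.execute("""
    SELECT g.label FROM user_gender ug
    JOIN gender g ON g.gender_id = ug.gender_id
    WHERE ug.user_uuid = 'u1' ORDER BY g.gender_id
""").fetchall()
print([r[0] for r in rows])  # prints: ['non-binary', 'woman']
```

The design choice mirrors the marriage and address examples: once gender is an entity, the model invites questions about gender itself rather than fixing it as a single property of a person.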
operating system. The illusion of wholeness McPherson describes, perhaps rendered through a “unified view of data,” obscures the ways that the relational paradigm extends the modular
logics, creating increasingly fragmented, yet superficially connected, depictions of the world.
These are, to use McPherson's term, “lenticular logics”— logics of fragmentation and “a way
In this section, I discuss two techniques of data modeling that enable data models to reinforce social power. Both were introduced by John Smith and Diane C.P. Smith in
1977. First, I discuss reification, the act of turning a relationship into a noun---cementing or
fixing the connection between entities into something that has the tenor of fact. This mirrors
the ways that whiteness both reifies and is reified in the United States. Second, I discuss
instances (people, things, events) so that they can be stored together as an entity. For
choose “White” as an ethnicity category. It is a technical application of the way that both
Reification
“[Reification] refers to an abstraction in which a relationship between objects is
regarded as a higher-level object” (Smith & Smith, 1977, p. 106). For example, a relationship
between two people (i.e., being spouses) can get reified into a “marriage” object for storage. Consider Figure 5, a diagram in which two users are linked as spouses.
Contrast that to Figure 6, in which the relationship is reified and a marriage table added.
Antoni Olivé notes that “[t]he act of reification does not add any new knowledge to a
schema, but it is necessary when we need to record additional facts regarding the instances
of a relationship type” (Olivé, 2007, p. 124). For example, if the system needed to describe
not just that two people were married, but the date, location, officiant of that marriage, it
would make sense to reify the marriage into an entity itself. Here Olivé's claim seems to
naturalize and lower the stakes for the act of reification, asserting that it does not add new
knowledge, but simply creates opportunities to store facts. Instead, I would argue that
reification creates opportunities to create facts, and to create objectivity and fixity around a
relationship between entities. Furthermore, as I discuss next, the act of reification is a key
action of white-cisness.
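Before turning to that argument, the mechanics of reification can be sketched in code. Assuming hypothetical table names (an illustration in Python's sqlite3, not any real schema), the spousal relationship is promoted to a marriage entity that carries its own facts:

```python
import sqlite3

# An illustrative sketch of reification as Smith & Smith describe it: the
# relationship between two User rows is "promoted" to a Marriage entity,
# which can then store facts of its own (date, location, officiant).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        user_uuid TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE marriage (
        marriage_id INTEGER PRIMARY KEY,
        spouse_a    TEXT REFERENCES user(user_uuid),
        spouse_b    TEXT REFERENCES user(user_uuid),
        wed_date    TEXT,
        location    TEXT,
        officiant   TEXT
    );
""")
conn.execute("INSERT INTO user VALUES ('u1', 'Sam')")
conn.execute("INSERT INTO user VALUES ('u2', 'Riley')")
conn.execute("""INSERT INTO marriage (spouse_a, spouse_b, wed_date, location, officiant)
                VALUES ('u1', 'u2', '2015-08-01', 'Chicago', 'J. Doe')""")

# Once reified, the marriage can be queried as a thing in itself.
row = conn.execute("SELECT wed_date, location FROM marriage").fetchone()
print(row)  # prints: ('2015-08-01', 'Chicago')
```

Note how the sketch also illustrates the critique that follows: the two fixed spouse columns quietly bake a two-person default into the very structure of the entity.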
which whiteness maintains its power — “expectations of white privilege” are transformed
from living relational negotiations to objects (Harris, 2006, p. 1784). She references Georg Lukács:
[Reification's] basis is that a relation between people takes on the character of a thing
and thus acquires a 'phantom objectivity,' an autonomy that seems so strictly rational
and all-embracing as to conceal every trace of its fundamental nature: the relation
The autonomy and phantom objectivity here are crucial. When, for example, a domestic-legal
relation becomes reified into a two-person “marriage” entity, it becomes the default and
natural way to relate two individuals and, by virtue of the data model, excludes or marks
assertions about data's “natural” state. As Chen notes: “It is possible that some people may
view something (e.g., marriage) as an entity while other people may view it as a relationship.
We think that this is a decision which has to be made by the enterprise administrator. [They]
should define what are entities and what are relationships so that the distinction is suitable
for [their] environment” (Chen, 1976). The “natural” view of data, coupled with the
contradictory, yet unremarkable idea that the administrator gets to decide how to model the
data is reminiscent of a (false) “view from nowhere” that simultaneously treats their view as
“natural” and imbues them with the power to interpret it flexibly. While this “interpretive
flexibility” is a key part of the way that scientific-technical knowledge is constructed (Pinch
& Bijker, 1984), the positioning of the administrator as the unquestioned expert on what
“counts” for modeling is a configuration that feminist science and technology studies
Troy Duster calls this the “fallacy of reification,” which he describes as “the tendency to
assume that categories of thought coincide with the obdurate character of the empirical
world” (Duster, 2005, p. 1050). Similarly, Harris notes that “the law constructed ‘whiteness’
subordination” (Harris, 2006, p. 1730). It is this mechanism by which aspects of data modeling
are able to support white-cisness. Indeed, reification was used “to identify and denounce the
transformation of historical processes into ahistorical entities, human actions into things that
situated actions into “ahistorical entities” that possess “phantom objectivity.” Why does the
person creating the database get to decide what is an entity and what is a relationship,
especially in a context in which this structure is “natural” for data? The answer, of course, is
that there is no natural structure for data, only an ideological fallacy that enables the
Generalization
Generalization is the second of Smith & Smith's abstractions. Generalization is “an
abstraction in which a set of similar objects is regarded as a generic object...In making such
an abstraction, many individual differences between objects may be ignored” (Smith &
Smith, 1977, p. 106). They do not elaborate on the “similarity” in their definition, but questions of similarity begin when developers are defining entities. Contrary to my own simplistic
definition earlier, an entity is not simply a metonym for the grammatical term “noun.”
Instead, entities represent aspects of the world that need to be stored. Modelers must be
Olivé but takes care to distinguish between a concept and an object (which the idea
of “entity” conflates). A concept is “an abstract or generic idea generalized from particular
instances,” those instances being “objects” (Olivé, 2007, p. 12). The distinction between
concepts and objects is an important one, related to the computer science concept of object-oriented programming. In object-oriented programming and broader object-oriented paradigms (of which the relational database is one), classes or concepts exist as “blueprints” for objects.41 The car parked in front of my house is
an instance of a concept that could be called “cars” or could more specifically be called
concepts so that they can recognize the “things” or “objects” belonging to that concept.
If it is true that “[modelers] do not show a practical interest in concepts without instances” (Olivé, 2007, p. 13), what does this mean for instances belonging to concepts which modelers do not recognize? In other words, on what basis do modelers determine which
concepts do not have instances? How do modelers decide how to generalize? Like with
reification, data models are generalized based on the concepts to which modelers have access
approach: an attempt to make many disparate things fit in a higher-level container. Modelers
41 The introduction of programming languages (specifically Simula) that supported objects and classes allowed programmers to develop technologies for generalization and abstraction. For a situated history of Simula, see Holmevik (1994). For a review of the roots of object-oriented programming, see Dahl (2002). With Simula, the
first object-oriented programming language, programmers had hands-on experience working with tools to
support concepts like generalization. This is important for the development and evolution of programming
technology, but has special relevance here.
perform generalization when they decide that an entity named Faculty and another named
Student can be stored in an entity called Person. The combination of faculty and student into
A common objection to this might be something like “how would we store different
objects in tables if we didn't erase some difference between them? If we did not decide
which aspects ‘matter’ and which ‘do not matter’?” We cannot, of course, turn the map into the territory (or, in this case, the model into the thing it models). The differences that we choose to erase (or prioritize)
are based on our specific cultural and epistemological world views and related to our access
to the parent concepts (or superclasses) for the objects we are considering.
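The Faculty and Student example above can be sketched concretely. The following illustration (hypothetical names, Python's sqlite3, following the primary/foreign-key pattern footnote 42 describes) stores both roles under a generic Person entity:

```python
import sqlite3

# A minimal sketch of generalization: Faculty and Student are stored under a
# generic Person entity, with role tables linked back by foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE faculty (
        person_id  INTEGER REFERENCES person(person_id),
        department TEXT
    );
    CREATE TABLE student (
        person_id INTEGER REFERENCES person(person_id),
        year      INTEGER
    );
""")
conn.execute("INSERT INTO person (name) VALUES ('Jordan')")
conn.execute("INSERT INTO faculty VALUES (1, 'English')")
conn.execute("INSERT INTO student VALUES (1, 2)")  # same person, both roles

# The generic entity answers "who is both faculty and student?"; individual
# differences survive only in the role tables the modeler chose to keep.
count = conn.execute("""
    SELECT COUNT(*) FROM person p
    JOIN faculty f ON f.person_id = p.person_id
    JOIN student s ON s.person_id = p.person_id
""").fetchone()[0]
print(count)  # prints: 1
```

What the sketch makes visible is the act of selection: the modeler decided that name belongs to the generic Person while department and year are the differences worth retaining, and every other difference is erased.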
“While ’generalization’ in UNIX has specific meanings, we might also see at work
here the basic contours of a lenticular approach to the world, an approach which separates
object from context, cause from effect.” (McPherson, 2013, p. 27). Generalization—the
removal of difference between objects (subjects) into a meaningful category—is a key move
performed by white-cisness. For example, over time, the generalization (or erasure of
Nell Irvin Painter (2011) describes, the category of “white” expanded several times, first to
include Irish immigrants, later Jews and Italians. Differences that seemed worthy of separate
Throughout this chapter I have argued that data models are an important site of
analysis, one that reflects onto-epistemological commitments. These commitments are not
42 In a relational data model, there is often a Person entity with primary/foreign key relationships to entities
about that person's specific roles. In the above example, someone might have an entry in both Faculty and
Student tables linked back to their person entry.
simply those of the creators, but of the milieu in which the models are created. The
relational paradigm requires modelers to view the world as a series of entities and
relationships, a way of looking at the world and data that is more “natural” than the database
structures that were available in the late 1960s. However, this “natural” view of data is only
maintained through data modeling techniques that reinforce social systems of power. As a
result, looking closely at the techniques of reification and generalization reveals that they are
techniques that mirror some of the “interpretive flexibility” that white-cisness provides.
Reification creates false objectivity in contexts that would be better served to remain messy
and entangled. Generalization erases rich difference and flattens the world to reduce
complexity. Data models, then, support systems of power and demand an extensive amount
CHAPTER 3
When data models become databases powering software applications, the impacts of
data modeling choices become encoded in the software. Unfortunately, it is rarely possible to
obtain the original data model used to develop a particular software application, which is
why, in this chapter, I use reverse engineering to infer the data models used. In this chapter,
I demonstrate a methodology for obtaining the data models and, rather than extensively
draw my own conclusions because that is a project in itself, show how to think about the
that has impacts for individuals in their homes. As I discussed in the introduction,
surveillance in the United States is racialized and gendered. Thus, I open this chapter with an
example of the immediate harms that software and hardware can cause to an individual, and
then proceed to uncover the data models within Lovense software. It is crucial for those of
us who study software to look beyond their external results. Opening software to discover
the data models within is an important way to examine the manifestations of power within
Figure 14
An image for the Pear Flower, an internet-connected sex toy made by Qiui.
“Fill up the empty spaces in your relationship” is the tagline for the “Pear Flower,” a
butt plug made by Chinese company Qiui (Figure 14). However, the Pear Flower is distinct
from other anal play devices. It is controlled by a mobile app and with a simple tap of the
app screen, the device “blooms” in the wearer's rectal cavity, locking it in place. The Pear
Flower is one of many devices designed for adults engaging in dominance and submission
sexual play (a part of the BDSM umbrella).43 Qiui makes several of these devices, including
43 For an introduction to critical literature on BDSM, please see the BDSM special issue of Sexualities (Simula, 2021).
an electric shock collar (the “Little Devil”), and a chastity cage (the “Cellmate”). Similar to
the Pear Flower, Qiui's Cellmate locks a user-wearer’s penis in a chastity cage that the user-
wearer cannot open manually. “QIUI believes that a true chastity experience is one that does
not allow the wearer to have any control over” (Qiui, n.d.). Access to the cage is controlled
The conceit of Qiui’s Cellmate and Pear Flower is that for some people abdication of
control is a key element of sexual pleasure. The idea is not new, of course, but traditional
locking sexual devices are usually un/locked with a common lock and key; the wearer gives
the key, and thus control, to their partner. If a key is lost, its lock can often be picked with a
bobby pin or other common lock-picking device. Qiui products simplify the need to track
and manage keys, allowing one user account to control many devices, locking and unlocking
as the keymaster desires, with no need to even be in the same room as the wearer(s).
What happens when you connect sexual devices to the internet? They get hacked. In
2020, a security testing group released a report revealing that because of a security flaw in
part of the application's code, anyone could seize control of a Cellmate (Pen Test Partners,
n.d.). Without access control, and because of its strong locks, extrication from a locked
Cellmate would require bolt cutters or an angle grinder very close to the user-wearer’s penis.
This is probably not the kind of submission the participants intended when they opted into
this device.
The Cellmate is only the most recent internet-connected sex, or teledildonics, device
to have security flaws.44,45 The smart vibrator Vibratissimo had a similar flaw that let anyone
44 See Wynn (2018) for a review of the privacy and security of several internet-connected sex devices.
45 Teledildonics (sometimes also called “cyberdildonics”) is used to mean “sexual interaction mediated through
remote control sex devices”(Faustino, 2018, p. 2). Although the term “dildonics” was coined in the early 70s
gain control over the device. However, Vibratissimo's device security flaws were matched
with data security flaws – “Usernames, plaintext passwords, chat histories, and explicit image
galleries that users created themselves were sitting in an” unsecured database (Whittaker,
n.d.). In 2018, a woman sued the makers of the Lovense sex toy for collecting data about
her use of the product, mirroring a similar suit against the makers of a sex toy called We-
Vibe in 2017. Both cases were settled out of court (N.P. v Standard Innovation Corp, 2017).46
For users of internet-connected sex toys, these teledildonics products likely feel like
simple pleasure between consenting partners—a fun new way to connect, or a way to stay
sexually connected over distance. Unfortunately, teledildonics devices, like other Internet of
Things products, are not simply entertainment; they are also surveillance. IoT devices live in
the home, which is a “prime data collection node” (McGuirk, 2015). While other works take
up the impact that this data can have in individual lives, I want to take care not to become
fixated on the content of the data itself---that this data has certain contexts or contours is a
factor of the data model and database design, themselves factors of the cultural-business
milieu in which they were designed. The milieu of IoT sex devices is one heavily influenced
In what follows, I describe in detail the process of reverse engineering the Lovense
Android application and how, through looking at the code, I was able to reconstruct the data
(by the same person who coined the word “hypertext”, Ted Nelson), the “tele” was popularized by Howard
Rheingold in 1991 (Faustino, 2018).
46 For information about several large commercial data leaks, see Shaban (2018), Newman (2018), Glaser
(2018). See Duhigg (2012) for an example of consumer behavior prediction. See Eubanks (2018) for
information about how the state uses algorithms to determine the services a citizen may need. For a sampling
of information on algorithmic bias, see O'Neil (2016), Chander (2016), Garcia (2016) Lodge and Mennicken
(2017), Sandvig et al. (2016), Yapo and Weiss (2018).
Lovense
In the next section I outline the “technical” steps included in reverse engineering.
However, in this section I begin with foundational context and information about
Lovense, the product being investigated. While close reading has been described as “the
Tanenbaum, 2011, p. 262). The close reading that follows is part of the reverse engineering
of the device.
Lovense, Inc. was founded in 2009 because “the founder of the company was in a
long-distance relationship. Lack of sexual intimacy was a serious problem in the relationship
and sparked an interest in teledildonics” (Lovense Bluetooth Sex Toys - History of Innovative Sex
Tech, n.d.). Lovense makes “sex tech for everyone”---a line of teledildonics products that
remote control” (Lovense Sex Toys for Long-Distance Love, n.d.). The Lovense product line
contains familiar sex toy forms, like rabbit vibrators, penis masturbators, and anal plugs.
Refer to Figure 15 for a table of their product offerings at the time of this writing.
Figure 15
A table listing Lovense product offerings.
Looking at the Lovense product lines and feature offerings reveals a richly
complicated view of gender, sexuality, disability, and sex work. Like most sex toys, Lovense
products hew to a cissexist, and biologically essentialist gender binary. On their product
table, they conflate particular genital configurations with gender identities. Note the “works
for” category in the figure, which assumes that “men” (♂) are users of devices that stimulate
penises and “women” (♀) are the audience for devices that stimulate “internal.” Two of
their products, the two I obtained for this project, Max and Nora, have traditionally Western
and as a “high-tech male masterbator.” In this language, they reinforce the gender-genital
connection, assuming that everyone with a penis is male and that all men have penises.47
Their descriptions of Max reinforce heteronormative ideas about what people with
penises desire---presumably pink vaginas. Max has a plastic outer shell and a removable and
washable silicone insert. By default, Max is shipped with an insert that is a translucent clear
(nearly grey) material (Figure 16). The product's marketing simultaneously reinforces
heterosexuality and whiteness by offering a “vagina sleeve” that is pictured as a light pink
(Figure 17).48 While it is out of scope to explore how the company's choice of language
might impact the user's relationship to the people-with-vaginas in their own lives, it is worth
noting that instead of advertising that the user is able to control the toy, they say that the
user can “[c]ontrol the vagina contraction and vibration settings” (Lovense, n.d.). Nora's
marketing copy fares a little better: they describe Nora as “[d]esigned to pleasure your G-
spot!” (Nora by Lovense. The Original Bluetooth Rabbit Vibrator!, n.d.) and as the next
evolution of devices with a “Flexible Clit Arm.” The branding page for Nora does not
47 There are masturbators made explicitly for trans men who have experienced “bottom growth” (aka clitoral
enlargement) due to testosterone. They are typically called “FTM strokers.”
48 As some authors note, discussions of the operations of teledildonics are naturally intertwined with
descriptions of sexual activity and gendered bodies (Liberati, 2017). In this work, I operate from an explicitly
transfeminist point of view and divorce all connections between an individual's gender and the configuration of
their genitals or sexual preferences.
Figure 16
The clear liner for Max
Figure 17
The “vagina” liner for Max.
Despite the cisnormative language on the website, Lovense has made a few non-
normative choices in its products. First, in its operations (through code and how the
application works), Lovense lets any two connectable toys “connect” to each other in a way
that is unexpected given the website copy. In other words, two Maxes or two Noras can be
paired and sync via the app—a move that is hardly as heteronormative as the copy might
imply. Second, Lovense has made a long-term investment in making their devices accessible
to cam workers. The device is designed to be easy to operate for individuals who perform
“cam” work—a shorthand for people who charge money to viewers who watch them do
(often, though not exclusively) sexual acts on video. The Lovense ecosystem was designed
with this in mind. Third, the app enables “voice control” over Lovense toys. Commands
like “Alexa, make me moan with Max” or “Alexa, please my partner” are available to
Lovense customers. The integration provides accessibility to those who might have motor
control issues. While it is facile to dismiss this integration as another manifestation of
surveillance culture, looking beneath that we can see this integration as a move toward
inclusion—as one of its many possible interpretations. I note this specifically as a way to signal that this reverse engineering process is one of inferring, and not knowing
explicitly.
Without the following technical steps of reverse engineering, any possible analysis of
the Lovense product is limited to the website copy and the interface of the application itself.
However, in this chapter I demonstrate a toolset that provides deeper layers of analysis. It
also serves as proof that data models, while difficult to obtain, are not universally
inaccessible. With the right steps, data models are available for examination and analysis.
Uncovering the data model
In order to comprehensively analyze software in the context of its use, the data
model must be part of the analytic landscape. Without the data model, analyses like a close reading of code, or assertions about algorithmic bias or dataset bias, are incomplete, because
it is the data model that creates those possibilities. However, obtaining the data model of
commercial software is nearly impossible---the data model is a series of artifacts that are
created in the process of making software. The shape of the data model directly impacts the
shape of the database used in the software, but the model itself remains elusive. Though it is
rarely possible to see the original data models of production software, it is possible to infer
the database schema (aka, the physical data model) and then to make inferences from that
model. In this section, I offer a technical methodology for obtaining the physical data model
and demonstrate how to infer the conceptual data model. As my use case, I continue with
the Lovense product and move stepwise through reverse engineering the Lovense
application.
engineering project might be to draw a set of architectural, plumbing, and electric schematics
of an already-built house. To find support beams, wiring harnesses, and large pipes will
require removing drywall and inspecting the contents between the walls. It might require
digging up some of the front yard to figure out where the house connects to the municipal
water supply. It is possible to do this reverse engineering without destroying the structure of
the home, but it will almost certainly make the home unlivable for a time. To reverse
understanding of how the house has been lived in; it certainly goes beyond comparing one
house to other houses or looking at neighborhood zoning laws. In this way, reverse
doing external analyses.49 In this project, instead of a house, I had the Lovense Android
Lovense has iOS and Android mobile applications. In cases like this, where
applications exist for two operating systems, those applications will use the same database
and data model. I opted to reverse engineer the Android application (aka, I chose the house
with unlocked doors); reverse engineering iOS applications is significantly more difficult
because iOS is closed source while Android is open-source software. Because Apple can
control the hardware on which their software runs, they have fewer concerns about
technological ecosystem. That is, nearly all of their code or hardware specifications are held
securely within their control and oversight, and they make available only what is needed to
Google, the Android operating system has always been open source. This means that the
code that runs the operating system is available for download, contribution, modification,
The Android platform, as well as most Android applications, is built in Java, a
important innovation. Until that time, software was hardware specific. Code was written
and compiled for specific types of machines (recall the proliferation of branded software
49Please do not read this as a discrediting of existing analyses of software or their impacts. Instead, I am trying
to offer ways to develop deeper understandings of software as cultural objects.
packages like IBM-specific office management tools). Java was one of the earliest languages
that decoupled this dependence between software and hardware (code and machine).
Programmers can write Java on their own machine and prepare binaries for distribution.
Then, an operator can run that Java code in a Java virtual machine that adds system-specific
libraries. This means that one Java program can run on Windows, OSX, and Linux because
each of those machines will run the code inside a platform-specific container. This
interoperability made Java a natural choice for a mobile operating system that needed to run
on many different types of hardware. Because of Java's age and ubiquity, there is a robust
reverse engineering toolkit for Java applications. In other words, lots of people have been
working for a long time on easy ways to hack into Java apps.
While there is no perfect set of steps that every reverse engineering process follows,
I divided this reverse engineering process into four steps.50 First, I obtained, unpacked and
investigated the Android software package itself. Next, I deobfuscated code and read the
code for clues. Then, I developed a physical data model. Finally, I created a conceptual data
model based on what I had learned from the previous steps. In this section, I offer a
detailed description of each of these four steps of reverse engineering the Lovense Connect
app. Readers less interested in the technical details can safely skip to the next section.
Breaking In
The first step is to download Lovense Connect to a device that will allow it to be
manipulated—that is, to get the mobile application onto a laptop instead of a phone. Like
all Android apps, Lovense Connect is packaged in an APK. An APK is simply a renamed zip
50 For a variety of approaches to reverse engineering, see Baloch (2017), Canfora, Harman & Di Penta (2007),
Z. Chen et al. (2019), Dang (2014), Dolan-Gavitt et al. (2015), Eilam (2005), Harris et al. (1995), Lanza &
Ducasse (2003), Lin et al. (2010), Rekoff (1985).
file, which is itself a folder “zipped up” or compressed into a smaller file size. Typically,
APKs are obtained via the Google Play store on an Android device. However, with my goal
of putting the APK on my laptop, I used a mirror of the Play store that I could access in a
web browser. Because the APK is essentially a zipped folder, it needs to be uncompressed.
One simple way to do this is with software called apktool. Invoking apktool in a terminal
will unpack the APK and allow inspection (Tumbleson & Wiśniewski, n.d.).
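As a simple demonstration of this point (using Python and a toy archive rather than the real Lovense package), the standard zipfile module can both build and reopen a file that follows the APK convention:

```python
import io
import zipfile

# Build a toy "APK" in memory: a zip archive that simply carries the
# .apk naming convention. A real APK would contain a compiled
# AndroidManifest.xml, classes.dex bytecode, resources, and more.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as apk:
    apk.writestr("AndroidManifest.xml", "<manifest/>")
    apk.writestr("classes.dex", b"placeholder bytecode")
    apk.writestr("res/values/strings.xml", "<resources/>")

# Because an APK is structurally a zip file, the ordinary zipfile
# module can reopen it and list its contents -- the same first move
# apktool makes when it unpacks a package.
buffer.seek(0)
with zipfile.ZipFile(buffer) as apk:
    contents = apk.namelist()

print(contents)
```

apktool does considerably more than unzipping (it also decodes binary resources and disassembles the bytecode), but the shared archive structure is why "a renamed zip" is a fair description.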
With only this first step complete, clues about the application emerge, including
important operating system files, and the still-obfuscated code of the application. The
results of this initial breaking in are artifacts worthy of analysis in their own right: the
operating system files provide an orienting first look into the workings of Lovense Connect.
The AndroidManifest is an xml file that contains information about the application, the
permissions it uses, and its hardware and software requirements (App Manifest Overview, n.d.).
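As a sketch of how the manifest can be read programmatically, the following parses a minimal, hypothetical manifest (the package name and permissions here are illustrative, not taken from Lovense Connect) and lists the permissions it declares:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical AndroidManifest.xml. Android-specific
# attributes live in the android XML namespace declared on <manifest>.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.BLUETOOTH"/>
</manifest>"""

# ElementTree expands namespaced attribute names into {uri}local form.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

root = ET.fromstring(MANIFEST)
permissions = [elem.attrib[ANDROID_NS + "name"]
               for elem in root.findall("uses-permission")]
print(permissions)
```

A list like this is often the first concrete clue about what an application actually does with a device.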
The file is a key guide to navigating the rest of the application. Additionally, all APK folders
contain a resources folder, “additional files and static content that [the] code uses, such as
bitmaps, layout definitions, user interface strings, animation instructions, and more” (App
Resources Overview | Android Developers, n.d.). Browsing the resources folder is like looking at
pieces of the interface—images, text—disconnected from their context and use. The
strings.xml file (pictured in Figure 19) creates a variable for each piece of text (string) used in
the interface and provides the display text. There is typically one strings.xml file for each
language into which the app is translated. Looking carefully through each of these files helps
to get a 10,000-foot view of the application, even though data models are not yet available.
Figure 18
The permissions file in the Lovense APK.
Figure 19
The strings.xml file for the Lovense APK.
The application's code files are also visible. However, the text within each file is unreadable; that is, each file contains a series of
seemingly scrambled alphanumeric characters that are in an assembly language called Smali.51
51Assembly languages are considered to be “closer” to machine languages and “farther away” from the more human-intelligible languages that most of us write code in today.
The code for the application was not written in Smali: recall that code for this application
was written in Java. As part of the packaging process, the Java code is compiled into Dalvik
bytecode (or .dex files)---this is the format that Android OS uses. However, bytecode is even
less human readable. To help, apktool converts this bytecode to an assembly language
called Smali. These many conversions are exemplary of the multiple mediations required to
turn “code” into “software” and the mediations required to turn software back into code.
A folder of Smali code is not ideal for our purposes. The second step is to convert
the Smali code to Java source code. Like all the others, this mediation does not yield a perfect
translation. Using a Java decompiler, .dex files can be imperfectly translated into Java. The
result of this decompilation is a series of folders and Java files. If the code has not been
deliberately obfuscated by the author(s), it will be very close to the state it was in when it was written. If the code has been obfuscated, function and variable names have been replaced with strings like a3f3 that allow the code to execute but not be intelligible. After deobfuscating
the Lovense Connect code, some files were completely readable with intelligible variable
names (like emailaddress or logintime). Others resembled Figure 20, where parts of the code remained obfuscated.
Figure 20
Partially obfuscated code. Note some variable names are random strings.
With mostly readable code in hand, it is time to investigate the code's connection to its data source.52 This is the first move towards uncovering the app's data
model. Java applications like Lovense Connect can use several different techniques to create
and interact with databases. In some cases, connections to the database need to be inferred
from lines of code that directly read or write data. Fortunately, Lovense Connect uses in-
code annotations; that is, the code describes the database tables and fields it is expecting.
Annotations are a way of describing the physical design of a database and specifically the
52My reading of code here is a simple reading. Those interested in more complex and critical readings of code
should refer to Marino (2020).
way that classes (or, instantiated Java objects) should persist into a database. Finding
annotations is simply a matter of searching the codebase for common words used in these
annotations. For example, inside one of the app's files is the following code:
@DatabaseTable(tableName = "tb_back_work")
public class BackWork extends BaseEntity {
    @DatabaseField
    private String owner;
    @DatabaseField
    private String staticParams;
    @DatabaseField
    private String status;
    @DatabaseField
    private String targetEmail;
    @DatabaseField
    private String workId;
}
These annotations follow a standard object-relational mapping format.53 The format requires that the annotation specify the name of the table, and the
name of each field in the table. Reading the above, an excerpt from a file named BackWork,
reveals that the code is expecting/creating a table named tb_back_work with five string
fields: owner, staticParams, status, targetEmail and workId. Discovering that the project uses
annotations and that they seem to be mostly deobfuscated means that the next step is to
locate all of these annotations in the codebase and gather them into one file.
A conceptual data model lays out all of the important things (aka, entities or objects) that an
application and database might track. The logical data model adds detail to the conceptual,
indicating the relationships between those objects and perhaps the flow of information.
The physical data model specifies implementation details such as table names and field types.54 While modeling typically moves from the conceptual toward the physical, reverse engineering inverts that order: I start with the physical.55 In order to do this, I transformed the file of annotations into a
series of SQL CREATE TABLE commands.56 CREATE TABLE statements are commands
in the SQL programming language that do as their name implies -- they create complete tables within the active database. The CREATE TABLE statement for the annotation above, for example, creates the tb_back_work table with its five string fields.57 Executing all of these CREATE TABLE statements on a MySQL database server would yield a working database that, to the best of my inferences from the code, supports Lovense functionality.
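A minimal Python sketch of that annotation-to-SQL transformation, applied to the BackWork excerpt above (my own simplified version of the grep-and-regular-expressions approach described in footnote 56; a real pass would need to handle non-String fields and annotation options):

```python
import re

# The annotation excerpt from BackWork.java (abridged).
SOURCE = """
@DatabaseTable(tableName = "tb_back_work")
public class BackWork extends BaseEntity {
    @DatabaseField
    private String owner;
    @DatabaseField
    private String staticParams;
    @DatabaseField
    private String status;
    @DatabaseField
    private String targetEmail;
    @DatabaseField
    private String workId;
}
"""

# Pull the table name out of the @DatabaseTable annotation.
table = re.search(r'@DatabaseTable\(tableName\s*=\s*"([^"]+)"\)', SOURCE).group(1)

# Each @DatabaseField annotation is followed by the field declaration.
fields = re.findall(r'@DatabaseField\s+private\s+String\s+(\w+);', SOURCE)

# Assemble a CREATE TABLE statement; every field here is a string (VARCHAR).
columns = ",\n  ".join(f"{name} VARCHAR(255)" for name in fields)
create_table = f"CREATE TABLE {table} (\n  {columns}\n);"
print(create_table)
```

Run across the whole codebase, this kind of transformation produces the creation script described in footnote 57.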
However, a visual representation of this database is more useful. I used a data modeling
program to visualize each of the database tables and to begin to uncover how the tables were
connected through primary-key foreign-key relationships based on the field names. In other
words, the annotations alone do not reveal that the owner field above is the same as an identifier appearing in another table.
54 A very detailed physical model would include all primary/foreign key relationships, length and validation
restrictions for fields; in other words, everything required to create the database.
55 There are other ways to extract a database schema. See Andersson (1994) and Lubis & Zamzami (2020) for
two examples.
56 To do this, I used the command-line tool {grep} to search the codebase and pipe the results into one text
file. I then used a set of regular expressions to transform the annotation format into an SQL format. With
these steps, I was often able to get complete table contents.
57 The complete SQL creation script that I used to create the initial physical model is available in Appendix A.
Like data modeling itself, this was a bit of a creative process. In some cases, the relationships
were self-evident. For example, a table named user might have a field named id, and another
table might have a field named user_id. Based on my experience with software engineering,
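That naming heuristic can be sketched in a few lines of Python (the table definitions here are hypothetical stand-ins, not the actual Lovense tables):

```python
# Hypothetical table definitions: table name -> list of field names.
tables = {
    "user": ["id", "email"],
    "toy": ["id", "user_id", "nickname"],
    "tb_alarm": ["id", "user_id", "time"],
}

# Heuristic: a field named <table>_id in one table likely references
# the id field of the table with that name.
links = []
for table, fields in tables.items():
    for field in fields:
        if field.endswith("_id"):
            target = field[:-3]  # strip the trailing "_id"
            if target in tables and "id" in tables[target]:
                links.append((table, field, target, "id"))

print(links)  # each tuple reads, e.g., toy.user_id -> user.id
```

Like any heuristic, this surfaces candidate relationships; confirming them still requires reading the surrounding code.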
I read through the tables, creating links and making a physical model based on both the code and my own inferences.
Constructing the physical data model (Figure 21) was not a straightforward operation
for several reasons. First, while the model reflects the annotations in the codebase, the
Lovense APK's codebase was partially obfuscated and contains what I believe to be some
“legacy” code.59 The partial obfuscation and legacy code make it difficult to determine
which parts of the data model are still in use, and to make sure that all annotations were
captured.
58 Readers with software experience might note that the format of this physical model is an ER Diagram. In
another work, I link the ER diagram to the maintenance of whiteness (Stevens et al., 2021).
59 Code becomes “legacy code” when the functionality it offers is no longer used by the application. This code
is often not deleted for several reasons, including that developers are unsure if deleting it would break other
functionality.
Figure 21
The physical data model of the Lovense Connect application.
Second, some tables do not obviously connect to the rest of the model. For example, a table named BackWork does not seem to connect to anything. In this case, a trip
to the file where that table was defined is not any help. The file BackWork.java simply
contains additional getters and setters.60 Lovense uses the common data access object
(DAO) pattern, that allows developers to separate the functionality of a data object (in this
case, a table) from the fields and field storage information. In the case of mysterious tables,
visiting every file in which their parent classes are mentioned is the only thing that I can do.
It is not uncommon for production databases to contain legacy tables and for production
codebases to contain legacy code. That could be the case here, though the obfuscation makes it difficult to confirm.
Third, some tables, like tb_program_pattern, do not seem to have relationships with
other tables. Some tables have a primary key that seems like it is keyed based on a text field,
often called “owner”, rather than the more common id field. In this case, I speculate that it
is keyed off the userid, based on a field named “ownerEmail” in table tb_alarm. Some field
names also do not make immediate sense, and in some cases, deeper understanding is
complicated by the initial obfuscation of the code---function and variable names are turned into long strings, obscuring their purpose and making the code especially difficult to read.
60In object-oriented programming, getters and setters are methods that get (return) or set (establish) a
particular property of a class.
Making a conceptual model
The final step in reverse engineering Lovense Connect is to develop a conceptual data model capturing the entities of concern in the system. There are many ways to create a conceptual diagram, and Figure 22 represents one way to explain what is happening at a high level (aka, distinct from the physical implementation details).
Figure 22
A conceptual model of the Lovense application.
It is worth pausing on the work that the shapes and lines are doing here. In Figure 22, the squares simultaneously represent
abstract entities and all instances of those entities. That is, the user box is both an entity
describing every human user, and a representation of how one particular user might be
connected to other parts of the application. The lines represent one expression of the
relationships between entities. Parts of Figure 22 could be read aloud as “Users own toys” to
signify that user entities have an owner/owned relationship with toy entities. It could also be
read as “A user owns toys” to signify that the relationship is not necessarily a 1:1
relationship. It could also be read as “Users friend and block other users” to indicate that
individual users can perform actions (becoming friends with, blocking) upon other users.
As these read-aloud examples indicate, the words on the lines are descriptions of the
relationships between entities. The relationships are bi-directional, e.g., users own toys and
toys are owned by users. However, the relationship is depicted in the direction that favors
using active-voice verbs. Within an object-oriented paradigm, the active voice signals to the
programmers which entities might “own” the tasks related to maintaining the relationship. A
conceptual diagram might be considered complete when reading it aloud covers the main
functionality of a product. In the case of the diagram above, more detail is still missing
about the ways that users and toys can interact with each other.
Figure 23 helps to fill the user-toy gap. Here again, squares represent entities and the
lines represent some of the relationships possible between those entities. In this case, the
polysemous line between user and toy in Figure 22 has been expanded to show how two
users, each with their own toys, can interact. This diagram can be read just like the previous
diagram -- “a user messages a user”, “a user controls a toy.” However, this diagram is itself
also an abstraction leaving out the technical details and other entities that might be involved
Figure 23
A conceptual model of the interactions between two users and their toys.
While the steps I have outlined may seem especially technical or tedious, within this
project, they are essential to developing a deeper understanding of the software; making visible the many translations and mediations involved is crucial. With this information, we have immensely more insight into the software's internal workings.
Conclusion
The crucial takeaway from this reverse engineering process is that software and its internal operations do not need to be taken at face value. Once data enters a database, it undergoes both a decontextualization (via the separation of parts of data into different tables) and a generalization (via the elimination of field dependencies). However, the power of decontextualization is not only in the specific
harms that it causes to those using the software, but in the way that decontextualization keeps certain things frictionless, makes them seem like the only possible approach, and supports how things have always been done. In the data model, it is possible that
there is not oppression, but smoothness, ease and the infrastructural invisibility of white-
cisness. As a result, looking through these data models is the opposite of reverse
engineering to discover bias. Instead, the reverse engineering helps us to discover ways that
things have always been, ways that things stay smooth. In some ways, this is reverse
engineering to see how “well” something is made. This is a critical perspectival shift.
By looking at the data models for Lovense, we can see the normalization of
surveillance and the generalization of everyone who uses the product (whether they be cam
workers or lovers) into one entity of Users. We can also see the exceptional lack of security
and attention to privacy in the data model. Though there may be some small security
allowances in the code itself, those allowances are absent in the model, which is, I believe, a
reflection of the social location of the modeling team. In this chapter, I demonstrated a
methodology for recreating the physical data model of one piece of commercial software, connecting that model to the fields of science and technology studies, critical data studies, and critical software studies.
CHAPTER 4
Password must contain at least one capital letter and one number.
Software systems that require you to log in will often present you with messages
about password requirements when you register. Those requirements are designed to
encourage you to choose a “strong password” (aka, one with many different characters in it
or one that does not contain your username). In contrast, a “weak password” is one that is
easy to guess or trivial for a simple computer program to hack. We need strong passwords to
protect our access to our accounts and to keep our information secure. This increased attention to password security in popular culture informed a quote posted by tumblr user “meadovv.” They wrote: “Change your
name and gender at least twice a year for security reasons” (See Figure 24). Meadovv's
comment was the first in a series of comments riffing on software security requirements.
But meadovv and the other users commenting are not simply mocking password requirements; they are writing from a trans perspective, satirizing the limits of technology's capacity for handling trans genders.61
61I am not making any claims as to the gender identities of the people behind these unknown-to-me tumblr
accounts.
Figure 24
An Instagram post from user dbweasel of a tumblr meme.
In this chapter, I explore technology’s limits for handling genders. I begin with data
models, demonstrating how data models can be used to intervene in the way software
handles aspects of people’s lives like gender. I then discuss the limitations of this approach,
acknowledging that the involvement of technology does not improve marginalized peoples' experiences with datafied systems but only makes those experiences less bad.
To return to the tumblr post, meadovv's joke that we should change our name and
gender twice a year for security purposes reveals an understanding of gender as fluid, and
perhaps even capricious and hard to pin down. Tumblr user “goodzillo” riffs on meadovv's
comment: “using a gender manager so I don't have to memorize what gender I'm currently
using for which accounts” (Figure 24). Goodzillo's comment gestures toward the practice of password management. Passwords are contextual: we should not use the same password for different accounts. If
one of those accounts gets hacked, your password could be guessed and then used on
another site. In order to keep track of all of these different passwords, it is helpful to use a password manager, which also reduces the temptation to reuse easy-to-remember passwords. Here goodzillo is joking that they have so many genders (one for each account)
that they need to use a gender manager to keep track. Their comment acknowledges the multiplicity of gender, invites us to think deeply about gender and technology together, and offers space from demands that trans
people have one consistent and known gender. The subsequent comment from user
“horriblehottakes” reinforces this when they say “'Man' and 'Woman' are commonly used
genders. To make yourself more secure, try to come up with a unique gender that only you
know.”
Trans people often must perform “wrong body narratives” in order to access gender
affirming care, making the possibility of changing one's gender twice a year especially absurd.
In this context, as Jay Prosser notes, transsexuality is “always narrative work” (Prosser, 1998,
p. 4). These narratives (which absolutely may feel authentic to some) create a trans teleology. The intervention, then, is a way of making a wrong body into a right body (Hughes, 2018).
Within this narrative, the bodily correction happens only once, and these ideas of transness
are founded on and inseparable from teleology: transness is a directional and goal-oriented
movement from one gender to another. However, this teleological foundation of transness
does not stand alone. As Chen and cárdenas note “If trans implies a movement from one
gender toward a different location, then transness is always imbricated with forward time
and cannot exist without linear, teleological time” (J. N. Chen & cárdenas, 2019). Linear,
teleological time is what Jacob Lau describes as “cis normative time.” Drawing on the
recognition that transness is racialized, we can say then that this is not just “cis normative
time,” but “white-cis normative time.” Thus white-cis time “remains the master code
through which trans folks must translate their racialized, gendered and sexed histories” (Lau,
2016, p. 2). This translation process is founded on ideas of the fixity of gender, and its
inherent knowability.
White-cis time also manifests in infrastructural control over trans life. A major part
of many trans people's lives is working to minimize the friction that happens when a trans person
encounters a datacentric and white-cis centric infrastructure. For example, as trans people
were able to access gender-affirming care and to live in alignment with their gender, they
needed documents to reflect their gender.62 Accurate documentation is, among many things,
an access issue--a trans woman with an id reading “M” can be subjected to “embarrassing
and invasive questions” when trying to simply go about the business of her life (Davis,
2017). Through the efforts of trans activists, the amount of paperwork and gate keeping to
change one’s gender marker on state documents has steadily decreased in most states.
However, for many of us, neither an M nor an F is the correct gender marker. Beginning
with Oregon in 2017, people like me were able to access X gender markers on driver
licenses; and in 2021, the US Government began issuing passports with an X (Hauser, 2021;
Wamsley, 2017). Whether or not there should be gender markers on state documents is a
matter well-covered by other scholars (Davis, 2017). What is important here is that gender
62For an overview of the administration of gender see Spade (2008). Cassius Adair (2019) traces the
development of driver licenses and administrative gender changes through the lens of anti-Blackness in the US.
markers reflect the state's desire to administer, supervise and receive “proof” of an
individual's gender marker change. In other words, once an individual has been proven an
“authentic trans,” the state administers gender (transness) using the logics of white-cis time.
The state expects that a gender marker is changed from F to M or from M to X, and it assumes that once someone has become an M -- legally, medically, socially -- no other change is expected and they are to remain that way.
While this undated exchange between three tumblr accounts is satirical, it is also revealing. In this reading, the idea of changing one's name and gender twice a year is inherently trans.
However, transness is not simply about a change in gender and (for some) name. Transness is bound up with the racialized and gendered logics of colonialism. White supremacy and colonialism created ideas of gender, making transness both product and victim of white supremacist and colonialist logics. In the United States, conceptions of
transness are inextricable from race.63 As Emily Skidmore recounts trans woman Christine Jorgensen's rise to fame in the 1950s, she describes how Jorgensen's embodiment of whiteness shaped her reception in the media of the time. In other words, Jorgensen only counted as a “good transsexual” because she was white. Meanwhile, other trans women---Mexico City native Marta Olmos Ramiro, Pacific Islander Laverne Peterson, and
Black American Delisa Newton---were exoticized and rendered deviant, described mockingly
63A history of trans in any context is out of scope here. For more trans histories, see Meyerowitz (2002),
Skidmore (2011), Snorton (2017),Stone (1992), Stryker & Aizura (2013).
in the press. In an autobiographical essay, Newton tells readers how she struggled to find
gender affirming care: “Because I am a Negro it took me twice as long to get my sex change
operation as it would have a white person. Because I am a Negro many doctors showed me
little sympathy and understanding” (Skidmore, 2011, p. 292). Newton's statement, then,
provides one small example of the way that transness, both then and now, is always
racialized.
One response is to imagine a speculative ecosystem in which gender, and perhaps sexuality and queerness, can exist comfortably within sociotechnical systems. In “QueerOS,” Fiona Barnett et al. build on Kara
Keeling's work to think queerness and digital technologies together. “For Keeling, a
separate from information technologies” (Barnett et al., n.d., p. 1). What would it mean to take this seriously? Their QueerOS is constructed with aspects that mirror computational operating systems, like interfaces, users
and kernels. The intervention in QueerOS is to offer new perspectives on queerness (Barnett et al., n.d., p. 2). In this chapter, I take up the challenge of thinking racialized trans lives, not through an operating system, but through data models.
If we think of transness, and the multiplicity of trans lives, as less teleological and instead “about a multidirectional movement in an open field of possibility, then time and its direction become more fluid” (J. N. Chen, 2019, p. 473). Computing and its objects,
including databases and data models, do not lend themselves to fluidity. However, in this
chapter, I test the limits of data models to build a collection of entities that are fluid,
evolving, and perhaps fundamentally unknowable. This critical engineering intervention will serve both as a way to imagine humane technology and as a “return the possibility of
Modeling (trans)gender
“Instead of speaking of a person’s gender status, we might do well to speak of a person’s gendered
status in a given cultural location, at a given time, and for a given purpose” (Hale, 1997, p. 232).
Because the temporality of my own work and critical technical practice is oriented
towards real situations that need intervention, I begin this section with a vignette. This
vignette is a composite of many people's experiences --- my own, those of my friends and of
people in my communities who have shared their struggles when gender encounters
computational systems. This vignette also mirrors the practice of user stories and user personas in software design.
______
On Monday, Tawny went into their local clinic for an HIV test. They gave their
name and contact information, and River, a clinic employee, entered that information into
their client database. After the test, Tawny and River talked about how the clinic sometimes
needs volunteers to help hand out condoms and HIV literature. Tawny was interested so
River put their same information into their volunteer management database.
On Friday, River called Tawny at home to invite them to volunteer because they had an event coming up.
“Hi, is Tawny there?”
“Oh, no I don't think I do. They're really tall, with brown hair and a heart tattoo on
their forearm.”
The following night, Tawny was performing as a drag queen. River was out with his friends and recognized them making the rounds of the tall cocktail tables in the bar.
“Hi Tawny!”
Tawny shot him a look, “When I'm here and when I'm dressed, you better call me
Miss Anthropy.”64
The next morning, Tawny wears their hair down, puts on a dress and goes to brunch
with friends. On the way home, they stop at the pharmacy to pick up their prescriptions.
The pharmacist tells them that only Marcus can pick up these controlled substances. Tawny
shows the pharmacist their (Marcus's) driver license. The pharmacist spends a long time
looking at Tawny, and looking at the ID, finally agreeing to give them the prescription.
Tawny is relieved until they hand the pharmacist their TrueName MasterCard to pay.65 The
pharmacist, seeing the name difference, refuses to give them their prescriptions.
______
In the current way of storing gender (as a binary attribute of a Person entity), there is
no way to capture Tawny's experience or to ask questions about their gender that give us
meaningful and useful (never mind, true) answers. What Tawny's story illustrates for us is the need for pushing past white-cis transition narratives and for having those understandings reflected in
our administrative/database systems. Places where non-cis people meet white-cis data
structures are sites of administrative violence and data violence (Hoffmann, 2018; Spade,
2011). The modeling that I do next, then, is not a recommendation for the datafication of gender, but an approach to harm reduction for people like the fictionalized Tawny and the very real trans people navigating similar systems. What follows demonstrates the epistemological and data shifts from Figure 25, where gender is an attribute, to its final form
where gender is regarded as temporal, contextual and multiplicitous, stored in a way that
actively subverts white-cis linear time. This demonstration is also important because, for
those who have never made a data model, it shows how many active and conscious choices
need to be made in its creation. Moving stepwise through this process also reinforces that data models are important sites of intervention. As I demonstrate in this section, changing the gender data
model also changes the shape of the software that uses that model.
In keeping with my critical technical practice, I begin this part using a statement (or a proto user story, to use the
terms from Chapter 1) from the opening vignette: Out and about, Tawny uses they/them pronouns
and their gender is woman. Initially, this statement could be represented by two attributes on a
user entity (see Figure 25). However, this is the same representation that has been failing us
and it does not account for the complexity of Tawny's life. Contrast this with Figure 26.
Similar to the address model, we now have support for Tawny to have as many genders as
needed. This lets Tawny have genders like woman, man, drag queen, and femme. In this
simple act—promoting gender to an entity—we are able to open space for gender to
become (within relational onto-epistemologies) an object distinct from a person, and with its
own attributes and relations. Although we are still working within the limiting factors of the relational paradigm, we can use the power of relations to position gender as a subject with its own set of attributes.66
66I am excited about future interpretations of this move through the lens of abjection as framed by Robert
Phillips in Stryker & Currah (2014).
Figure 25
A one-table model showing the attributes of the User entity.
Figure 26
A two-table model, promoting the Gender attribute to an entity.
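As a sketch of what the Figure 26 design looks like in practice, the following builds the two tables in an in-memory SQLite database (table and field names are my own shorthand, and the sample rows are drawn loosely from the vignette) and asks the question the model now supports: what genders does this person have, and what pronouns go with each?

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user (
  id   INTEGER PRIMARY KEY,
  name TEXT
);
-- Gender, promoted from an attribute of user to its own entity:
-- one user row can now relate to any number of gender rows.
CREATE TABLE gender (
  id       INTEGER PRIMARY KEY,
  user_id  INTEGER REFERENCES user(id),
  gender   TEXT,
  pronouns TEXT
);
""")

db.execute("INSERT INTO user (id, name) VALUES (1, 'Tawny')")
db.executemany(
    "INSERT INTO gender (user_id, gender, pronouns) VALUES (1, ?, ?)",
    [("woman", "they/them"), ("drag queen", "she/her"), ("femme", "they/them")],
)

# The question this model supports: what genders does this person
# have, and what pronouns do they use for each?
rows = db.execute(
    "SELECT gender, pronouns FROM gender WHERE user_id = 1"
).fetchall()
print(rows)
```

In the one-table design of Figure 25, by contrast, a second gender has nowhere to go except overwriting the first.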
However, the diagram in Figure 26 does not support the nuance of Tawny's gender,
which we know to be fluid and contextual. Figure 26 supports us asking important questions
like “What genders does [person] have and what pronouns do they use for each?” But it
lacks the flexibility to tell us more than that. As Tawny's story shows us, gender is not just
about having more than one set of pronouns, it is about understanding the contextual nature
of that set. How can we record information about the context in which a gender is active?
As a next step, we can add contextual information to the gender table by making context an
attribute of gender (see Figure 27). Adding context as an attribute of gender is a logical next
step -- it needs to go somewhere and since it's about gender, this seems fitting. However,
this design choice also implies that context is gender dependent, rather than gender being
context dependent. This is a subtle but important difference related to what becomes an
attribute. When context is relegated to an attribute of gender, it can imply that context does
not exist separate from gender and that two genders might not have the same context, or
two contexts the same gender. This is the same logic undergirding the move to make address its own entity.
Figure 27
A two-table model, with context as an attribute to the Gender entity.
This choice assumes that a person can have an unlimited number of genders, but
that each gender exists within its own context and has its own set of pronouns. The design
in Figure 27 replicates the flattening that we are already working to subvert--- the design
assumes one set of pronouns in one context. That is, it requires that when I am at a
particular place and my gender is nonbinary, I use they/them pronouns. If there are several
contexts in which I have this gender/pronoun combination, the design would create duplicate records for each one.
Instead, consider Figure 28. Here again, we have made the crucial onto-
epistemological move of promoting an attribute (the context field) to an entity (the Context
table). As we have seen in the past, this promotion creates opportunities for considering
what kind of information about a given context would be useful to store. Figure 28 is a
minimalistic example, adding only the field “description.” I could imagine a Context table
that contains many more fields about a particular context, as appropriate to the software's
context of use. Note that Figure 28 has two new tables, Context and Gender-Context. This
second table is necessary because it links one context to many genders, and one gender to
many contexts. In other words, this table allows for a gender to exist in more than one
context, and in turn, for a context to contain more than one gender.
Figure 28
A four-table model, in which Context has been promoted to an entity.
This move reflects the position that the pronouns one uses do not need to align with the normative expectations of a given gender. Keeping pronoun as an attribute of the gender entity reinforces this connection in cisnormative ways. Separating pronouns into a linked table reinforces the contextual nature of pronoun usage, and this series of design choices, based on the intentional placing of attributes, opens the possibility of allowing the data model to let gender be as fluid and contextual as we know it to be, and to consider “gender status” rather than a fixed gender. This diagram also does not make an explicit
allowance for multiple sets of pronouns. Are he/they pronouns the same as having two
separate entries, e.g., he/him and they/them entries? It is not for me to decide whether
people experience difference between describing themselves as “he/him or they/them” or
“he/they.” This design could support both, depending, as always, on the questions that the software needs to answer.
However, the model is still not complete. Currently, gender and pronouns are sibling attributes on the gender entity. This assumes that a gender always uses the same pronouns, even in different contexts: it assumes that in every place where my gender is trans, I use they/them pronouns. This is not the case. To store this in our current configuration (that is, in the model represented in Figure 28), I would need two different genders, one with they/them pronouns and one with he/him pronouns, which assumes that the genders are ontologically different when they use different pronouns. The
appropriate next move, then, is to turn pronouns into their own entity as well. Because
there are a finite number of pronouns possible, it makes sense for pronouns to be a simple
lookup table, listing the pronouns, and even an example sentence or pronunciation (for neo-
pronouns).67
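A sketch of such a lookup table, with illustrative column names of my own choosing:

```python
import sqlite3

# A simple pronoun lookup table, as described above: a finite list of
# pronoun sets, with room for an example sentence and a pronunciation
# hint (useful mainly for neopronouns). Columns are assumptions of mine.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pronoun (
    id             INTEGER PRIMARY KEY,
    subject_object TEXT,   -- e.g. 'they/them'
    example        TEXT,   -- a sentence demonstrating usage
    pronunciation  TEXT    -- optional hint, mainly for neopronouns
)""")
conn.executemany(
    "INSERT INTO pronoun (subject_object, example, pronunciation) VALUES (?, ?, ?)",
    [("they/them", "They left their coat here.", None),
     ("ze/zir",    "Ze left zir coat here.",     "zee / zeer")])
row = conn.execute(
    "SELECT example FROM pronoun WHERE subject_object = 'ze/zir'").fetchone()
print(row[0])  # → Ze left zir coat here.
```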
This chapter opened with an anecdote about changing name and gender twice a year
for security reasons. Name is connected to gender and to context, as the vignette with
Tawny illustrates. We need a way to associate name with gender, context, and pronouns.
Figure 29 does exactly this by adding the name to the Gender-Context table which now
contains gender, context, pronouns, name and (by extension) the user record itself.
67 I want to note that this is an arbitrary design choice. I prefer a lookup table when the number of options is
likely finite and knowable in advance (even if we consider adding pronouns in other languages), but there is a
valid case for designing this another way.
Figure 29
A five-table model, adding name to the Gender-Context entity.
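Under this five-table model, the question of what name and pronouns a user uses in a given context becomes a single join. The schema below is an abbreviated sketch with identifiers of my own; only the sample values (Tawny, the club, Miss Anthropy) come from the vignette:

```python
import sqlite3

# Sketch of the five-table model: name, pronouns, gender, and context
# all meet in the gender_context table. Schema abbreviated; the gender
# label used here is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user    (id INTEGER PRIMARY KEY, handle TEXT);
CREATE TABLE gender  (id INTEGER PRIMARY KEY, user_id INTEGER, label TEXT);
CREATE TABLE context (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE pronoun (id INTEGER PRIMARY KEY, subject_object TEXT);
CREATE TABLE gender_context (gender_id INTEGER, context_id INTEGER,
                             pronoun_id INTEGER, name TEXT);
INSERT INTO user    VALUES (1, 'tawny');
INSERT INTO gender  VALUES (1, 1, 'drag queen');  -- label assumed
INSERT INTO context VALUES (1, 'at the club');
INSERT INTO pronoun VALUES (1, 'she/her');
INSERT INTO gender_context VALUES (1, 1, 1, 'Miss Anthropy');
""")
name, pronouns = conn.execute("""
    SELECT gc.name, p.subject_object
    FROM gender_context gc
    JOIN gender g  ON g.id = gc.gender_id
    JOIN context c ON c.id = gc.context_id
    JOIN pronoun p ON p.id = gc.pronoun_id
    WHERE g.user_id = 1 AND c.description = 'at the club'
""").fetchone()
print(name, pronouns)  # → Miss Anthropy she/her
```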
This final iteration allows us to represent gender statuses from the vignette: When
Tawny (user) is at the club (context) her name is Miss Anthropy (name), they are a drag
While this model is abbreviated to show only the attributes of interest to our discussion, note that by design it contains no timestamp fields. The presence of timestamp fields indicates that the order in which data were entered is important, and that fields like date_created provide useful information. Such fields are often proxies for “records that are new” or for answering questions like “Is this user's data stale?” In this model, I advocate for other ways of answering these questions. The model is built for database systems that follow relational paradigms, and it is able to fit into existing relational databases. However, since
models do not exist in a vacuum, it is useful to see how it could be implemented to collect
the data it has been designed to store. Because this model has four new tables of user data,
the order in which data is saved becomes important. Because gender is a separate entity
from user, the user record will need to exist first. This workflow might look something like
Figure 30. In other words, the promotion of gender to an entity allows it to be completely
separated from the creation of a user record, almost as if the user's gender was not central to
Figure 30
An order of operations for entering data into the full data model.
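The order of operations in Figure 30 can be read as a dependency chain: the user row must exist before a gender row can reference it, and gender and context rows must exist before the junction row links them. A sketch, using an assumed schema and SQLite's foreign-key enforcement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the dependency order
conn.executescript("""
CREATE TABLE user    (id INTEGER PRIMARY KEY);
CREATE TABLE gender  (id INTEGER PRIMARY KEY,
                      user_id INTEGER NOT NULL REFERENCES user(id),
                      label TEXT);
CREATE TABLE context (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE gender_context (
    gender_id  INTEGER NOT NULL REFERENCES gender(id),
    context_id INTEGER NOT NULL REFERENCES context(id));
""")
with conn:  # one transaction: all-or-nothing
    conn.execute("INSERT INTO user VALUES (1)")                # 1. user first
    conn.execute("INSERT INTO gender VALUES (1, 1, 'trans')")  # 2. then gender
    conn.execute("INSERT INTO context VALUES (1, 'at work')")  # 3. then context
    conn.execute("INSERT INTO gender_context VALUES (1, 1)")   # 4. finally the link
# Reversing the order violates the foreign-key constraints:
try:
    conn.execute("INSERT INTO gender VALUES (2, 99, 'x')")  # no user 99 yet
except sqlite3.IntegrityError as e:
    print("out-of-order insert rejected:", e)
```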
In order to visualize how different this data model is and to continue to reinforce the
ways that data models shape software affordances, it is helpful to look at a sample
implementation. In software implementing this data model, gender collection might look
like the wireframe in Figure 31. A wireframe is “the basic blueprint that illustrates the core
form and function found on a single screen” of an application (Hamm, 2014, p. 15). Because
of the data model pictured in Figure 29, the software contains affordances that allow the
user to represent their gender(s) in a way that is quite similar to how they might describe
them during a casual greeting, or aloud to a friend: “I’m not out at work, so I use he/him
pronouns and my coworkers call me Trevin.” This form could appear in an “edit profile”
section of a website, as part of a user onboarding process, or other appropriate context. The
wireframe itself contains only minimal user interface elements as a proof-of-concept and a
visual contrast to some of the more typical ways that gender is requested in software
products. Throughout this dissertation, I have argued that data models have immense
power in the ways that software is shaped and built. This example supports that argument:
by changing the data model of gender, the software’s shape will change accordingly.
There is an important question here that I leave deliberately ambiguous both in the
data model and in the wireframe --- while some people have different genders in different
moments or different contexts, others experience significant gender continuity (or, perhaps
the opposite of gender fluidity), but use a different name or other pronouns. The presence
of the gender field doesn't require people to have distinct genders when other factors
change. This is the point of this design choice. It allows for a separation between name,
gender and pronoun that supports both people who have different genders and people who,
for infinite reasons, use different pronouns or names (and similarly, who might use the same
Figure 31
A wireframe showing a proof-of-concept implementation for the gender model.
This wireframe starts with example sentences demonstrating to the user that the
software supports a rendering of themselves that is similar to how they might want to
describe it. The button (8) allows the user to load more example sentences. The form
replicates the structure of the example sentence above, encouraging the user to follow a
familiar “mad libs” style fill-in-the-blank. This implementation invites users to first consider the context (1) when answering questions about their gender. The context field presents several common contexts as a start for the user. The user is not required to choose one of these contexts, but can choose “another place” and enter a context that is appropriate for them. The field is
called “context” and not “place” because context is a more expansive term that may include phrases like “anytime I'm with my family” or “when I'm in drag”---ideas that transcend
geographic notions of place. Next, the user can enter their gender in a text entry field (3).
Advocates of “clean data” prefer set options to text entry because people could enter
“anything at all” into these fields. From my perspective, being able to enter “anything at all”
is exactly the point; clean data is as oxymoronic as raw data (Gitelman, 2013). The pronoun
field (4) mirrors the structure of the context field---users are presented with several common
answers, but users are free to enter their own answer. In this configuration, those answers
do not become part of the list available to other users---it is not a crowd-sourced list of
pronouns. This is in part because for some people, their pronoun is their name, and they
decline any third-person pronoun usage at all. The name field (6) exists within the form
asking about gender. This could imply that the user's name is not requested in other parts of
the onboarding (but again, this is a proof of concept). Finally, the form invites users to add
another statement (7) as needed. The wireframe is missing the mechanisms by which users
In this section I have presented a data model in which aspects of a person's gender(s)
are taken separately and seriously, resulting in a model that, on its face, does not reproduce
some of the harms that sociotechnical systems can cause trans people. The wireframe that
extends from the model supports this -- the data model results in a gender entry form that
can start to capture the contextuality and fluidity of gender. But now that it is possible and
complete, is it actually a thing that should get built? Is the answer to reducing systemic,
infrastructural and data harms to give more data? To be more visible? Frameworks of harm
Analysis
Recall the story of Tawny and their interactions at the health clinic, at their house, at
the drag bar, and at the pharmacy. In one case, the clinic's database's flat approach to gender
led to Tawny being misgendered both at home and in public. In the case of the pharmacist,
Tawny was not able to receive the medications that they needed. One database would not
solve both problems, of course; but if both the clinic and the pharmacy had implemented
versions of the data model and wireframe I described (in Figure 29 and Figure 31), Tawny would not have experienced those harms.
However, in this chapter, I have taken that demand seriously and made a working
data model and functional wireframe that attends to the real harms that trans people experience.
My offering does reduce some of those harms and is thus best considered within a harm reduction framework. Harm reduction begins from the principle of “meeting people where they're at,” typically in the context of drug use and further harm or disease prevention. In that context, harm reduction is based on the belief that drug use harms
people and it works to mitigate that harm, rather than encourage people to stop using drugs.
Harm reduction programs provide interventions that often include needle exchanges and
safe drug use sites. In my work, I think of harm reduction as a reformist approach,
moment for most groups of people. Rather than move to abolish those oppressions, the
approach in this chapter seeks to mitigate the harm caused by whiteness and cisness.
In “Decolonization is Not a Metaphor,” Eve Tuck and Wayne K. Yang describe the
process of settler harm reduction as one “crucial in the resuscitation of practices and intellectual
life outside of settler ontologies” (Tuck & Yang, 2012, p. 21). In that lineage, this chapter is a
form of white-cis harm reduction, rehydrating ideas of gender in data-centric conversations
and circles. However, I heed their caution that forms of harm reduction, regardless of how
important they may be, “[do] not inherently offer any pathways” to fundamental structural
social change.68 Harm reduction does offer material changes to the present world, but those
changes, like this data model and wireframe, are contextual just like the genders they are
capturing.
There are four specific harm reduction benefits. First, the data model creates
affordances separating the user from gender data. Via this separation, the user table could
be queried and user data, absent gender, would be returned. This turns the user's gender into
a sort of NULL value---unknowable until other tables are queried. The NULLness of
gender here renders it unknowable, indeterminate and, in that way, it could be considered
queer (Gaboury, 2018)69. While this may seem to be simply a theoretical benefit, a
the software.
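This NULLness can be made concrete. In the sketch below (an assumed schema, as in my earlier illustrations), a query against the user table alone says nothing about gender, and a LEFT JOIN returns NULL until a gender row exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, handle TEXT);
CREATE TABLE gender (id INTEGER PRIMARY KEY, user_id INTEGER, label TEXT);
INSERT INTO user VALUES (1, 'tawny');
""")
# Querying the user table alone returns no gender column at all...
cols = [d[0] for d in conn.execute("SELECT * FROM user").description]
print(cols)  # → ['id', 'handle']
# ...and joining yields NULL until a gender row is entered.
label = conn.execute("""
    SELECT g.label FROM user u
    LEFT JOIN gender g ON g.user_id = u.id
    WHERE u.id = 1
""").fetchone()[0]
print(label)  # → None
```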
Second, because of the way that the model leverages Codd's relational paradigm, it
offers capacious affordances for the ways that a user might experience their gender(s).
Relational moves (or, even the paradigm of normalization) are often used to advance logics
“scalable gender,” this model allows for the infinite ways that individuals experience their
contextual gender(s). With this allowance, the model responds to critiques of datafied
68 One method of technosocial harm reduction is the series of initiatives designed to store Indigenous place
names for locations in the United States. These toponymic projects often use USGS GIS data as the
foundation, and then overlay Indigenous words for places (Chambers et al., 2004).
69 My thinking here is also influenced by Bopp et al., (2019), Johnson (2018).
systems in which people are forced to commit themselves to a gender which is not
Third, the model pointedly rejects a popular technical move toward conciseness and
brevity. Computing and data modeling best practice is traditionally to store information in
the smallest container possible, abbreviating and curating wherever appropriate---states and
countries are often stored by abbreviation and then hydrated for display to the user;
formatted numbers, like phone numbers and social security numbers, are stored only with
numbers and formatted for familiar display. By promoting so many aspects of gender to the
status of entities, the model reflects an epistemology that there is an expansive amount of
information that is relevant to gender. It signals that brevity is a disservice to the subject and
it follows the admonition to be careful “that we have not lost any data in the translation to
tabular form” (Simsion & Witt, 2005, p. 41). In other words, via its design, it acknowledges
that so much data has been lost and is a move to restore some of it. Within a datafied
context, this expansive epistemology implies that promoting other aspects of one's identity
Finally, this model grapples directly with white-cis temporality and its limitations and
affordances. While data models are designed to support answering questions, their design
can also resist providing answers. Because there are no timestamps or integer-based primary
keys, the model resists answering questions about transition “direction,” treating all entered
documents was to prevent “fraud” and to make the identified person “known” to the
identifier (Currah & Moore, 2009). That “fraud” was a concern reflects the belief in a “truth”
70 A common question that cis people ask when encountering a person who just says that they are “trans” is to ask “from what to what?” expecting an answer like “FtM” or “MtF.”
that the fraud might hide. This data model promotes multiplicity, fluidity, and a
This data model was borne out of my deep concern for the harms that sociotechnical
systems cause marginalized people. However, harm reduction approaches do not reduce
harm equally. Aligned with statist biopolitical interventions, the data model separates gender from other aspects of identity, such as race, class, disability, and sexual orientation. For many cultures, one’s
gender identity and sexual orientation are inseparable and one word would be used to
describe both. Similarly, certain words for gender identity are available only within certain
racial and cultural milieus. The designs of the form and model do not reflect those
imbrications. This is an example of one of the myriad ways that the model exists within
approach, performing the data version of respectability politics. In this way, it is reminiscent
of the ways that the trans people who rose to early notoriety, like Christine Jorgensen, were the most cis- and hetero-normative people. It is for women like Christine Jorgensen, and
subsequent decades of white, cishetero-normative trans people that shifts in the state's administration of gender have been made. Quinan asks,
“Could we see these changes as the individual pushing back against the nation state, pushing
the boundaries of the state, and the state responding and acknowledging self-expression?”
(Quinan, 2017, p. 164). In other words, are more nuanced gender identification and
administration systems good? Do they give us more power when interacting with the state?
I argue that the answer is an unequivocal “No.” As Eric A. Stanley has argued, we should
“resist reading these biopolitical shifts as victories” and resist any “work to translate and in
turn confine the excesses of gendered life into managed categories at the very moment of
radical possibility” (Stanley, 2014, p. 90). The data model in this chapter does just that---it
enacts, within the limits of available technology, one understanding of how gender works
and incorporates many of the critiques that trans people have about gender collection
systems; it turns our gendered lives, in all of their messy excesses, into clearly defined
attributes of entities.
Implementing this data model is a profound act of settling, an act in which “less bad
becomes the only name freedom can take” (Stanley, 2021, p. 115, emphasis in original). That
this data model is less bad does not make it not bad and does not make it better than inviting
moments of radical possibility. In the absence of data models that “accurately” capture a
person’s gender and its expansive constellation of facts, we are free to dream both
abolitionist futures and alternative interventions. Rather than doing a better job of collecting
gender, we can, in nearly all contexts, simply do no job at all (Cifor et al., n.d.).
CONCLUSION
In October 2018, Stacia L. Brown (@slb79 on Twitter) posted the cover art of the movie Like Father, asking “Other Black @netflix users: does your queue do this? Generate posters
with the Black cast members on them to try to compel you to watch? This film stars Kristen
Bell/Kelsey Grammer and these actors had maaaaybe a 10 cumulative minutes of screen
time. 20 lines between them, tops” (stacia l. brown, 2018). Brown's tweet was picked up by
various media outlets like Marie Claire and The Guardian, all reporting that Netflix algorithms
were engaging in racism by showing viewers Black actors after observing viewing activity.
I return to the example of Netflix, with which I opened this dissertation, because
Netflix is an easy target and exemplar of the ways that software companies have frequently
received attention for their algorithmic choices and yet managed to escape equivalent
critique for their data modeling choices. Complaints about the Netflix algorithm showing
too much (or too little) of a type of content, or “customizing” the race of the actors on the
content’s poster are only possible when there are equivalent supporting data models. What
is different about Brown's experience is that the algorithm had access to not only movies
watched, but racial data about the actors and the racial composition of the casts of viewed
movies.71 In this case, while the algorithm may have been engaging in racist behavior, it was the data model that made that behavior possible.
Throughout this project, I have discussed the ways that data models are powerful
structuring forces on software systems. In this conclusion, I offer both a summary of the
project as well as suggestions for how this work could be used by both researchers and
71 I am speculating here about how the Netflix algorithm works because it is, like a lot of technology, a closed-source process.
software creators. Data models are an important part of the software development lifecycle,
borne as they are out of stakeholder requirements. Additionally, data models have a
cascading structural effect, controlling the shape of databases and thus influencing
subsequent data structures used in the written code. The choices that data modelers make
are not made in isolation, but within a social and cultural context.
As programmers and tech companies develop awareness of the social and cultural
contexts of software production, there are opportunities for small reforms even within the current system, such as incorporating community feedback and participatory design at every stage of development, including the data
modeling portion of the design phase. Drawing on works like Design Justice (Costanza-
Chock, 2018), teams can stop operating as though their work is separate from their
communities and work to fight ideas of the objectivity or neutrality of the software
engineering process. In practice, this might look like open-sourcing portions of the
development processes to promote transparency and public conversation about the works.
Within the United States, the infrastructures of software systems are built upon and reproduce white supremacy and cisgender normativity. To demonstrate this, I critically analyzed historical data modeling texts and linked several
techniques of data modeling to the techniques of white-cisness. That data reification mirrors
racial reification is not a coincidence, but a predictable outcome. This project makes an
important contribution to demonstrate that mirroring. The takeaway is not that some
techniques of data modeling uphold white-cisness, but that data models are a site where
power manifests. Chapter 2 is also a useful introduction to the emergence of data modeling,
As other forms of modeling (like MongoDB’s document model) grow in popularity,
there is perhaps a tendency to reject techniques like reification and generalization because of
their connections with white-cis supremacy. I caution against this---moving reactively away
from one set of techniques will not separate data models from their cultural valences.
Instead, engineering teams using other modeling paradigms have the opportunity to think data modeling and race-gender power together. Because many teams will not have the capacity to
do this thinking without support, this is a crucial point of intervention for researchers. Those
of us with the training to do so can help teams develop new paradigms for data modeling. It
is my wish that more researchers develop fluency in both the mechanism of race-gender
power structures and the nuances of data modeling techniques and practices. This
To move past knowing about data models to seeing the data models within software, I reverse engineered the Lovense Android application. The four steps of reverse
engineering that I outlined are available to researchers investigating the social impacts of
software. Through the elucidation of those steps, I demonstrated that physical data models
are an available artifact, even within some commercial software applications. Once the data
While many software engineers are familiar with reverse engineering, it is my strong
desire for more social science and humanities researchers to integrate reverse engineering
into their research practices. I believe that developing the toolset to look within software,
not only theoretically, but materially, will greatly enrich critical software studies in a myriad
of ways. Importantly, scholarship that uses tools of reverse engineering will be more accessible to
software engineers, and will help to move some of these rich critical insights out of the ivory tower. Data models are also an opportunity for intervention for those of us trying to create software for better worlds. I
made a data model and proof-of-concept wireframe that, in its design, could mitigate some
of the harms that trans people experience when encountering data systems. That this model
is possible given our existing tools is evidence that data models are important sites of power
--- they are leveraged to reproduce harms, even when there is no technological basis for
doing so. Working towards the complete data model I made in Chapter 4 involves a lot of choices.
Part of my work is to make those choices visible and, through visibility, gesture towards
opportunities for intervention. By moving stepwise through the creation of the data model,
moving from the common practice of gender as an attribute to gender as one of many linked entities, I have made visible choices that reflect my own understanding of gender. A different architect (or perhaps even me under different
constraints) would produce a different design even based on the same theoretical and
ideological foundation.
opportunity for both researchers and engineers.72 The practice and outcomes of research
can feel divorced from the constraints of engineering; software engineers may find academic
technical practice, can serve as a translational ground at which researchers and engineers can
72 Although I address researchers and engineers separately in this conclusion, many of us are both and it is not
with whom I have worked, trained, and volunteered. Many engineers I know are eager to
make change but are themselves trapped within capitalism and claustrophobic ideas that they
should be able to do more within their constraints. Despite huge attention towards teaching
“ethical engineering,” in order to perform as an ethical engineer, one must be quite “heroic”
and able to withstand nearly super-human pressures (Basart & Serra, 2013).
Throughout this project, I have emphasized the role that data modelers play as the
creators of the model. However, I also want to emphasize that I do not believe that
individual engineers have significant power to change the technology on which they work.
In part, this is because of the software development lifecycle. The cycles of commercial
software production are such that most people on the project (including software engineers,
designers, and testers) do not have enough influence to drastically reimagine the project. In
my own career, I have had the structural and employment privilege to remove myself from a
project with which I had ethical problems. However, I was only able to remove myself
because the agency I worked with had another project for me. Had I stayed on this ethically questionable project, I would not have had any opportunity to impact its outcome. This
reflects both the size of the team involved and the capitalist foundation of software
production. Within our current methods of production, there is simply no room for most
small software companies to push back against problematic customer desires and even less
room for individual engineers. I am not here to critique critiques of the ethics of technology production; however, it is important to right-size the emphasis on what individual engineers can accomplish.
Although individual engineers may not have much power to make significant
changes, in this context, researchers do. Those of us thinking critically about software and
society can expand our focus from algorithms to include data structures and data models.
Amidst conversations about the inscrutability of algorithms and the unreadability of code,
data models remain inherently scrutable. Using the techniques of reverse engineering I
outlined in Chapter 3, we can inspect the physical data models of extant software. Using the
analysis in Chapter 2, we can begin to think critically about the ways data modeling
techniques mirror a variety of systems of power. We can begin to think about using data
models as tools of intervention, as I did in Chapter 4. In other words, we can focus on data models.
REFERENCES
Advantage, n. (n.d.). In OED Online. Oxford University Press. Retrieved February 2, 2022,
from http://www.oed.com/view/Entry/2895
Android Open Source Project. (n.d.). Android Open Source Project. Retrieved February 21,
2020, from https://source.android.com/
App Manifest Overview. (n.d.). Android Developers. Retrieved February 21, 2020, from
https://developer.android.com/guide/topics/manifest/manifest-intro
App resources overview | Android Developers. (n.d.). Retrieved February 21, 2020, from
https://developer.android.com/guide/topics/resources/providing-resources
Auger, J. (2013). Speculative design: Crafting the speculation. Digital Creativity, 24(1), 11–35.
https://doi.org/10.1080/14626268.2013.767276
Baldwin, G. B., & Shultz, G. P. (1955). Automation: A new dimension to old problems.
Monthly Labor Review, 165–169.
Baloch, R. (2017). Ethical Hacking and Penetration Testing Guide. Auerbach Publications.
https://doi.org/10.4324/9781315145891
Bardzell, J., Bardzell, S., & Koefoed Hansen, L. (2015). Immodest Proposals: Research
Through Design and Knowledge. Proceedings of the 33rd Annual ACM Conference on
Human Factors in Computing Systems - CHI ’15, 2093–2102.
https://doi.org/10.1145/2702123.2702400
Barnett, F., Blas, Z., Cárdenas, M., Gaboury, J., Johnson, J. M., & Rhee, M. (n.d.). QueerOS.
Retrieved January 12, 2019, from http://dhdebates.gc.cuny.edu/debates/text/56
Basart, J., & Serra, M. (2013). Engineering Ethics Beyond Engineers’ Ethics. Science and
Engineering Ethics, 19(1), 179–187.
Bauer, R. (2008). Transgressive and transformative gendered sexual practices and white
privileges: The case of the dyke/trans BDSM communities. Women’s Studies Quarterly,
36(3/4), 233–253.
Beauchamp, T. (2019). Going Stealth: Transgender Politics and U.S. Surveillance Practices. Duke
University Press.
Bender-Baird, K. (2016). Peeing under surveillance: Bathrooms, gender policing, and hate
violence. Gender, Place & Culture, 23(7), 983–988.
https://doi.org/10.1080/0966369X.2015.1073699
Benjamin, R. (Ed.). (2019a). Captivating technology: Race, carceral technoscience, and liberatory
imagination in everyday life. Duke University Press.
Benjamin, R. (2019b). Race After Technology: Abolitionist Tools for the New Jim Code (1st edition).
Polity.
Bivens, R. (2017). The gender binary will not be deprogrammed: Ten years of coding gender
on Facebook. New Media & Society, 19(6), 880–898.
https://doi.org/10.1177/1461444815621527
Bizzocchi, J., & Tanenbaum, J. (2011). Well Read: Applying Close Reading Techniques to
Gameplay Experiences. In D. Davidson (Ed.), WellPlayed3.0 (pp. 262–290). ETC
Press. http://dl.acm.org/citation.cfm?id=2031858.2031881
Bopp, C., Benjamin, L. M., & Voida, A. (2019). The Coerciveness of the Primary Key:
Infrastructure Problems in Human Services Work. Proc. ACM Hum.-Comput. Interact.,
3(CSCW), 51:1-51:26. https://doi.org/10.1145/3359153
Bowker, G. C., Baker, K., Millerand, F., & Ribes, D. (n.d.). Towards Information Infrastructure
Studies: Ways of Knowing in a Networked Environment.
Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural,
technological, and scholarly phenomenon. Information, Communication & Society, 15(5),
662-679.
brown, adrienne maree. (2017). Emergent Strategy: Shaping Change, Changing Worlds. AK Press.
Browne, S. (2015). Dark Matters: On the Surveillance of Blackness. Duke University Press Books.
Canfora, G., & Di Penta, M. (2007). New Frontiers of Reverse Engineering. Future of
Software Engineering, 326–341. https://doi.org/10.1109/FOSE.2007.15
Chambers, K., Corbett, J., Keller, C., & Wood, C. (2004). Indigenous Knowledge, Mapping,
and GIS: A Diffusion of Innovation Perspective. Cartographica: The International Journal
for Geographic Information and Geovisualization, 39(3), 19–31.
https://doi.org/10.3138/N752-N693-180T-N843
Chen, J. N. (2019). Trans exploits: Trans of color cultures and technologies in movement. Duke
University Press.
Chen, J. N., & cárdenas, micha. (2019). Times to Come. TSQ: Transgender Studies Quarterly,
6(4), 472–480. https://doi.org/10.1215/23289252-7771639
Chen, P. P.-S. (1976). The entity-relationship model—Toward a unified view of data. ACM
Transactions on Database Systems (TODS), 1(1), 9–36.
Chen, Z., Pan, B., & Sun, Y. (2019). A Survey of Software Reverse Engineering
Applications. In X. Sun, Z. Pan, & E. Bertino (Eds.), Artificial Intelligence and Security
(pp. 235–245). Springer International Publishing.
Chun, W. H. K. (2013). Race and/as Technology, or How to do Things to Race. In Race after
the Internet (pp. 44–66). Routledge.
Cifor, M., Garcia, P., Cowan, T. L., Rault, J., Sutherland, T., Chan, A. S., Rode, J., Hoffman,
A. L., Salehi, N., & Nakamura, L. (n.d.). Feminist Data Manifest-No. Feminist Data
Manifest-No. Retrieved November 6, 2021, from https://www.manifestno.com
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), 377–387.
Crenshaw, K. (1989). Demarginalizing the Intersection of Race and Sex: A Black Feminist
Critique of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics.
University of Chicago Legal Forum, 1989, 139–168.
Crenshaw, K. (1991). Mapping the Margins: Intersectionality, Identity Politics, and Violence
Against Women of Color. Stanford Law Review, 43(6), 1241–1299.
Currah, P., & Moore, L. J. (2009). “We Won’t Know Who You Are”: Contesting Sex
Designations in New York City Birth Certificates. Hypatia, 24(3), 113–135.
https://doi.org/10.1111/j.1527-2001.2009.01048.x
Dahl, O.-J. (2002). The Roots of Object Orientation: The Simula Language. In M. Broy & E.
Denert (Eds.), Software Pioneers: Contributions to Software Engineering (pp. 78–90).
Springer. https://doi.org/10.1007/978-3-642-59412-0_6
Dang, B. (2014). Practical Reverse Engineering: X86, x64, ARM, Windows Kernel, Reversing Tools,
and Obfuscation (1st edition). Wiley.
Date, C. J. (2019). Database design and relational theory: Normal forms and all that jazz. Apress.
Davis, M. (1991). Thinking like an engineer: The place of a code of ethics in the practice of a
profession. Philosophy and Public Affairs, 20(2), 150–167.
Dolan-Gavitt, B., Hodosh, J., Hulin, P., Leek, T., & Whelan, R. (2015). Repeatable Reverse
Engineering with PANDA. Proceedings of the 5th Program Protection and Reverse Engineering
Workshop, 1–11. https://doi.org/10.1145/2843859.2843867
Dourish, P. (2016). Algorithms and their others: Algorithmic culture in context. Big Data &
Society, 3(2), 2053951716665128. https://doi.org/10.1177/2053951716665128
Dunbar-Ortiz, R. (2014). An indigenous peoples’ history of the United States (Vol. 3). Beacon Press.
Dwivedi, A. K., Rath, S. K., Satapathy, S. M., Chakravarthy, L. S., & Rao, P. K. S. (2018,
September). Applying Reverse Engineering Techniques to Analyze Design Patterns
in Source Code. In 2018 International Conference on Advances in Computing,
Communications and Informatics (ICACCI) (pp. 1398–1404). IEEE.
Enke, A. F. (2012). The education of little cis: Cisgender and the discipline of opposing
bodies. In Transfeminist perspectives in and beyond transgender and gender studies (pp. 60–77).
Enns, P. K., Yi, Y., Comfort, M., Goldman, A. W., Lee, H., Muller, C., Wakefield, S., Wang,
E. A., & Wildeman, C. (2019). What Percentage of Americans Have Ever Had a
Family Member Incarcerated?: Evidence from the Family History of Incarceration
Survey (FamHIS). Socius, 5, 2378023119829332.
https://doi.org/10.1177/2378023119829332
Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor.
St. Martin’s Press.
Ferguson, R. A. (2003). Aberrations in Black: Toward a Queer of Color Critique (1st ed.).
University of Minnesota Press.
Fischer, M. (2019). Terrorizing gender: Transgender visibility and the surveillance practices of the US
security state. University of Nebraska Press.
Fiske, J. (1998). Surveilling the city: Whiteness, the black man and democratic totalitarianism.
Theory, Culture & Society, 15(2), 67–88.
Frankel, F., & DePace, A. H. (2012). Visual Strategies: A Practical Guide to Graphics for
Scientists & Engineers. Yale University Press.
Gaboury, J. (2018). Becoming NULL: Queer relations in the excluded middle. Women &
Performance: A Journal of Feminist Theory, 28(2), 143–158.
https://doi.org/10.1080/0740770X.2018.1473986
Gehl, R. (2015). Critical reverse engineering: The case of Twitter and TalkOpen. Compromised
Data: From Social Media to Big Data, 147.
Gehl, R. W., & Bell, S. A. (2012). Heterogeneous Software Engineering: Garmisch 1968,
Microsoft Vista, and a Methodology for Software Studies. Computational Culture, 2.
http://computationalculture.net/heterogeneous-software-engineering-garmisch-
1968-microsoft-vista-and-a-methodology-for-software-studies/
Genosko, G., & Thompson, S. (2006). Tense theory: The temporalities of surveillance. In
Theorizing Surveillance (pp. 137–152). Willan.
Green, K. M., & Bey, M. (2017). Where Black Feminist Thought and Trans* Feminism
Meet: A Conversation. Souls, 19(4), 438–454.
https://doi.org/10.1080/10999949.2018.1434365
Green, N., & Zurawski, N. (2015). Surveillance and Ethnography: Researching Surveillance
as Everyday Life. Surveillance & Society, 13(1), 27–43.
Grodzinsky, F. S., Miller, K., & Wolf, M. J. (2012). Moral Responsibility for Computing
Artifacts: “the Rules” and Issues of Trust. SIGCAS Comput. Soc., 42(2), 15–25.
https://doi.org/10.1145/2422509.2422511
Haigh, T. (2009). How Data Got its Base: Information Storage Software in the 1950s and
1960s. IEEE Annals of the History of Computing, 31(4), 6–25.
https://doi.org/10.1109/MAHC.2009.123
Halberstam, J. (1998). Transgender butch: Butch/FTM border wars and the masculine
continuum. GLQ: A Journal of Lesbian and Gay Studies, 4(2), 287–310.
Hale, C. J. (1997). Leatherdyke Boys and Their Daddies: How to Have Sex without Women
or Men. Social Text, 52/53, 223–236. JSTOR. https://doi.org/10.2307/466741
Harris, C. I. (2006). Whiteness as Property (SSRN Scholarly Paper ID 927850). Social Science
Research Network. https://papers.ssrn.com/abstract=927850
Harris, D. R., Reubenstein, H. B., & Yeh, A. S. (1995). Reverse Engineering to the
Architectural Level. 1995 17th International Conference on Software Engineering, 186–186.
https://doi.org/10.1145/225014.225032
Hauser, C. (2021, October 27). U.S. Issues First Passport With ‘X’ Gender Marker. The New
York Times. https://www.nytimes.com/2021/10/27/us/us-first-passport-gender-
x.html
Hoffmann, A. L. (2018). Data Violence and How Bad Engineering Choices Can Damage
Society. Medium. https://medium.com/s/story/data-violence-and-how-bad-
engineering-choices-can-damage-society-39e44150e1d4
Hoffmann, A. L. (2019). Where fairness fails: Data, algorithms, and the limits of
antidiscrimination discourse. Information, Communication & Society, 22(7), 900–915.
https://doi.org/10.1080/1369118X.2019.1573912
Hoffmann, A. L. (2020). Terms of Inclusion: Data, Discourse, Violence. New Media &
Society, preprint.
Hoos, I. R. (1960). The sociological impact of automation in the office. Management Science, 2,
10–19.
Hughes, L. (2018). Wronging the right-body narrative: On the universality of gender
uncertainty. In Current critical debates in the field of transsexual studies (pp. 181–193).
Routledge.
Hujainah, F., Bakar, R. B. A., Abdulgabber, M. A., & Zamli, K. Z. (2018). Software
Requirements Prioritisation: A Systematic Literature Review on Significance,
Stakeholders, Techniques and Challenges. IEEE Access, 6, 71497–71523.
https://doi.org/10.1109/ACCESS.2018.2881755
Iliadis, A., & Russo, F. (2016). Critical data studies: An introduction. Big Data & Society, 3(2),
2053951716674238.
Institute for Policy Studies. (n.d.). Racial Economic Inequality. Inequality.Org. Retrieved
October 16, 2019, from https://inequality.org/facts/racial-inequality/
Johnson, J. M. (2018). Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the
Digital Crossroads. Social Text, 36(4 (137)), 57–79.
https://doi.org/10.1215/01642472-7145658
Kerssens, N. (2018). The Database ‘Revolution’: The Technological and Cultural Origins of
the Big-data-based Mindset in American Management, 1970s–1980s. TMG Journal for
Media History, 21(2), 7–29. https://doi.org/10.18146/2213-7653.2018.364
Kitchin, R., & Lauriault, T. (2014). Towards critical data studies: Charting and unpacking
data assemblages and their work.
Lanza, M., & Ducasse, S. (2003). Polymetric views—A lightweight visual approach to reverse
engineering. IEEE Transactions on Software Engineering, 29(9), 782–795.
https://doi.org/10.1109/TSE.2003.1232284
Lavizzo-Mourey, R., & Williams, D. (2016, April 14). Being Black Is Bad for Your Health.
US News & World Report. https://www.usnews.com/opinion/blogs/policy-
dose/articles/2016-04-14/theres-a-huge-health-equity-gap-between-whites-and-
minorities
Levitt, H. M., & Hiestand, K. R. (2004). A Quest for Authenticity: Contemporary Butch
Gender. Sex Roles, 50(9/10), 605–621.
https://doi.org/10.1023/B:SERS.0000027565.59109.80
Liberati, N. (2017). Teledildonics and New Ways of “Being in Touch”: A Phenomenological
Analysis of the Use of Haptic Devices for Intimate Relations. Science and Engineering
Ethics, 23(3), 801–823. https://doi.org/10.1007/s11948-016-9827-5
Lin, Z., Zhang, X., & Xu, D. (2010). Reverse Engineering Input Syntactic Structure from
Program Execution and Its Applications. IEEE Transactions on Software Engineering,
36(5), 688–703. https://doi.org/10.1109/TSE.2009.54
Lodge, M., & Mennicken, A. (2017). The Importance of Regulation of and by Algorithm.
Algorithmic Regulation, 2.
Lovense Bluetooth Sex Toys—History of Innovative Sex Tech. (n.d.). Lovense. Retrieved
February 6, 2020, from https://www.lovense.com/sextoys/about-us
Lovense Sex Toys for Long-Distance Love. (n.d.). Lovense. Retrieved April 19, 2021, from
https://www.lovense.com/compare
Lovense. (n.d.). Max 2 by Lovense. The Best Wireless Male Masturbator. Lovense. Retrieved
February 6, 2020, from https://www.lovense.com/male-masturbators
Lu, J. H., & Steele, C. K. (2019). ‘Joy is resistance’: Cross-platform resilience and
(re)invention of Black oral culture online. Information, Communication & Society, 22(6),
823–837. https://doi.org/10.1080/1369118X.2019.1575449
Lubis, J. H., & Zamzami, E. M. (2020, June). Relational database reconstruction from SQL
to Entity Relational Diagrams. In Journal of Physics: Conference Series (Vol. 1566, No. 1,
p. 012072). IOP Publishing.
Lyon, D., Waldo, J., Lin, H. S., & Millett, L. I. (2007). A Short History of Surveillance and
Privacy in the United States. Engaging Privacy and Information Technology in a Digital Age.
Maalsen, S., & Sadowski, J. (2019). The Smart Home on FIRE: Amplifying and Accelerating
Domestic Surveillance. Surveillance & Society, 17(1/2), 118–124.
https://doi.org/10.24908/ss.v17i1/2.12925
Malpass, M. (2013). Between Wit and Reason: Defining Associative, Speculative, and Critical
Design in Practice. Design and Culture, 5(3), 333–356.
https://doi.org/10.2752/175470813X13705953612200
Malpass, M. (2017). Critical design in context: History, theory, and practices. Bloomsbury Publishing.
Mascarenhas, M. (2018). White Space and Dark Matter: Prying Open the Black Box of STS.
Science, Technology, & Human Values, 43(2), 151–170.
https://doi.org/10.1177/0162243918754672
McGuirk, J. (2015). Honeywell, I’m Home! The Internet of Things and the New Domestic
Landscape. E-Flux, 64. https://www.e-flux.com/journal/64/60855/honeywell-i-m-
home-the-internet-of-things-and-the-new-domestic-landscape/
McPherson, T. (2013). U.S. Operating Systems at Mid-Century: The Intertwining of Race
and UNIX. In L. Nakamura & Peter Chow-White (Eds.), Race After the Internet.
Routledge.
McPherson, T. (2018). Feminist in a software lab: Difference+ design. Harvard University Press.
Meyerowitz, J. (2002). How Sex Changed: A History of Transsexuality in the United States. Harvard
University Press. https://doi.org/10.2307/j.ctv1c7zfrv
Mills, C. W. (2004). Racial exploitation and the wages of whiteness. In What white looks like
(pp. 41–70). Routledge.
Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of
algorithms: Mapping the debate. Big Data & Society, 3(2), 2053951716679679.
https://doi.org/10.1177/2053951716679679
Muñoz, J. E. (2009). Introduction. In Cruising Utopia: The Then and There of Queer Futurity. New
York University Press. http://ebookcentral.proquest.com/lib/asulib-
ebooks/detail.action?docID=865693
N.P. v. Standard Innovation Corp., 16 CV 8655 (N.D. Ill. August 15, 2017).
Neilson, B. (2020). The Reverse of Engineering. South Atlantic Quarterly, 119(1), 75–93.
https://doi.org/10.1215/00382876-8007665
Noble, D. F. (1979). America by Design: Science, Technology, and the Rise of Corporate Capitalism.
Oxford University Press.
Nora by Lovense. The Original Bluetooth Rabbit Vibrator! (n.d.). Lovense. Retrieved
February 6, 2020, from https://www.lovense.com/rabbit-vibrator
Pen Test Partners. (n.d.). Smart male chastity lock cock-up | Pen Test Partners. Retrieved
August 18, 2021, from https://www.pentestpartners.com/security-blog/smart-male-
chastity-lock-cock-up/?=october-5-2020
Pinch, T. J., & Bijker, W. E. (1984). The Social Construction of Facts and Artefacts: Or How
the Sociology of Science and the Sociology of Technology might Benefit Each
Other. Social Studies of Science, 14(3), 399–441.
https://doi.org/10.1177/030631284014003004
Premerlani, W. J., & Blaha, M. R. (1993, May). An approach for reverse engineering of
relational databases. In [1993] Proceedings Working Conference on Reverse Engineering (pp.
151–160). IEEE.
Prosser, J. (1998). Second Skins: The Body Narratives of Transsexuality. Columbia University
Press.
Qiui. (n.d.). Cellmate Chastity Cage. QIUI. Retrieved August 18, 2021, from
https://www.qiui.store/product-page/cellmate-chastity-cage
Quinan, C. (2017). Gender (in) securities: Surveillance and transgender bodies in a post-9/11
era of neoliberalism. Security/Mobility, 153.
Rana, J. (2016). The Racial Infrastructure of the Terror-Industrial Complex. Social Text, 34(4
(129)), 111–138. https://doi.org/10.1215/01642472-3680894
Rapoport, M. (2012). The Home Under Surveillance: A Tripartite Assemblage. Surveillance &
Society, 10(3/4), 320–333.
Razali, R., & Anwar, F. (2011). Selecting the right stakeholders for requirements elicitation:
A systematic approach. Journal of Theoretical and Applied Information Technology, 33(2),
250–257.
Richterich, A. (2018). The Big Data Agenda: Data Ethics and Critical Data Studies. University of
Westminster Press.
Roberts, D. E. (2009). Race, gender, and genetic technologies: A new reproductive dystopia?
Signs: Journal of Women in Culture and Society, 34(4), 783–804.
Roediger, D. R. (1999). The wages of whiteness: Race and the making of the American working
class. Verso.
Rovner, J. (n.d.). Racial Disparities in Youth Commitments and Arrests. The Sentencing
Project. Retrieved January 31, 2018, from
http://www.sentencingproject.org/publications/racial-disparities-in-youth-
commitments-and-arrests/
Rubin, G. (2013). Of catamites and kings: Reflections on butch, gender, and boundaries. In
The transgender studies reader (pp. 487–497). Routledge.
Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. (2014). Auditing Algorithms:
Research Methods for Detecting Discrimination on Internet Platforms. International
Communication Association, 1–20.
Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed.
Yale University Press.
Shi, N., & Olsson, R. A. (2006, September). Reverse engineering of design patterns from
java source code. In 21st IEEE/ACM International Conference on Automated Software
Engineering (ASE'06) (pp. 123–134). IEEE.
Simsion, G. C., & Witt, G. C. (2005). Data modeling essentials (3rd ed.). Morgan Kaufmann
Publishers.
Smith, J. M., & Smith, D. C. (1977). Database abstractions: Aggregation. Communications of the
ACM, 20(6), 405–413.
Snorton, C. R. (2017). Black on Both Sides: A Racial History of Trans Identity. University of
Minnesota Press.
Spade, D. (2011). Normal life: Administrative violence, critical trans politics, and the limits of law.
South End Press.
stacia l. brown. (2018, October 18). Other Black @netflix users: Does your queue do this?
Generate posters with the Black cast members on them to try to compel you to
watch? This film stars Kristen Bell/Kelsey Grammer and these actors had maaaaybe
a 10 cumulative minutes of screen time. 20 lines between them, tops.
https://t.co/Sj7rD8wfOS [Tweet]. @slb79.
https://twitter.com/slb79/status/1052776984231718912
Star, S. L. (1999). The Ethnography of Infrastructure. American Behavioral Scientist, 43(3), 377–
391. https://doi.org/10.1177/00027649921955326
Stevens, N. L., Hoffmann, A. L., & Florini, S. (2021). The Unremarked Optimum:
Whiteness, Optimization, and Control in The Database Revolution. Review of
Communication.
Stone, S. (1992). The Empire Strikes Back: A Posttranssexual Manifesto. Camera Obscura:
Feminism, Culture, and Media Studies, 10(2 (29)), 150–176.
https://doi.org/10.1215/02705346-10-2_29-150
Stryker, S., & Aizura, A. (Eds.). (2013). The Transgender Studies Reader 2 (1st ed.). Routledge.
Stryker, S., & Currah, P. (2014). Postposttranssexual: Key concepts for a twenty-first-century
transgender studies. TSQ: Transgender Studies Quarterly, 1(1–2), 1–18.
Sue, D. W. (2006). The Invisible Whiteness of Being: Whiteness, White Supremacy, White
Privilege, and Racism. In Addressing racism: Facilitating cultural competence in mental health
and educational settings (pp. 15–30). John Wiley & Sons Inc.
Thomer, A. K., & Wickett, K. M. (2020). Relational data paradigms: What do we learn by
taking the materiality of databases seriously? Big Data & Society.
https://doi.org/10.1177/2053951720934838
Tran, D. (2003). An Excerpt from Call Me by My Proper Name: Legal Surveillance and
Discipline of Transgender Body and Identity. Women's Rights Law Reporter, 25, 219.
Tuck, E., & Yang, K. W. (2012). Decolonization is not a metaphor. Decolonization: Indigeneity,
Education & Society, 1(1).
Tumbleson, C., & Wiśniewski, R. (n.d.). Apktool—A tool for reverse engineering 3rd party,
closed, binary Android apps. Retrieved December 27, 2021, from
https://ibotpeaches.github.io/Apktool/
Udupa, S. K., Debray, S. K., & Madou, M. (2005, November). Deobfuscation: Reverse
engineering obfuscated code. In 12th Working Conference on Reverse Engineering
(WCRE'05) (10 pp.). IEEE.
Wachter-Boettcher, S. (2017). Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats
of Toxic Tech (1st ed.). W. W. Norton & Company.
Wamsley, L. (2017, June 16). Oregon Adds A New Gender Option To Its Driver’s Licenses:
X. NPR. https://www.npr.org/sections/thetwo-
way/2017/06/16/533207483/oregon-adds-a-new-gender-option-to-its-driver-s-
licenses-x
Whittaker, Z. (n.d.). This smart vibrator can be “easily” hacked and remotely controlled by
anyone. ZDNet. Retrieved August 18, 2021, from
https://www.zdnet.com/article/this-smart-vibrator-can-be-easily-hacked-and-
remotely-controlled-by-anyone/
Winner, L. (1986). Do Artifacts Have Politics. In The Whale and the Reactor: The Search for
Limits in an Age of High Technology (pp. 19–39). Chicago University Press.
Wirth, N. (1976). Algorithms + Data Structures = Programs (1st ed.). Prentice Hall.
Woodard, J. B., & Mastin, T. (2005). Black Womanhood: Essence and its Treatment of
Stereotypical Images of Black Women. Journal of Black Studies, 36(2), 264–281.
https://doi.org/10.1177/0021934704273152
Wynn, M. A. (2018). Categorizing the Security and Privacy of “Internet of Things” Devices
[Thesis]. https://utd-ir.tdl.org/handle/10735.1/5932
Xu, J. (2019, July 1). Algorithmic Solutions to Algorithmic Bias: A Technical Guide.
Medium. https://towardsdatascience.com/algorithmic-solutions-to-algorithmic-bias-
aef59eaf6565
APPENDIX A
The SQL code below is extracted from the annotations that I found in the Lovense codebase.
ProQuest Number: 29162602
This work may be used in accordance with the terms of the Creative Commons license
or other rights statement, as indicated in the copyright statement or in the metadata
associated with this work. Unless otherwise specified in the copyright statement
or the metadata, all rights are reserved by the copyright holder.
ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346 USA