
Edinburgh University Press

Chapter Title: Algorithmic Films as Data Analysis

Book Title: Cinema and Machine Vision


Book Subtitle: Artificial Intelligence, Aesthetics and Spectatorship
Book Author(s): Daniel Chávez Heras
Published by: Edinburgh University Press. (2024)
Stable URL: https://www.jstor.org/stable/10.3366/jj.9941316.13

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Edinburgh University Press is collaborating with JSTOR to digitize, preserve and extend
access to Cinema and Machine Vision

This content downloaded from 146.50.98.82 on Wed, 05 Mar 2025 08:43:31 UTC
All use subject to https://about.jstor.org/terms
Part III

AI and Criticism:
Aesthetics, Formats, and Interactions

One of the classical tenets of artificial intelligence is to model computer
systems to simulate human behaviour. However, in pursuit of many of the
same objectives, there is a parallel and complementary process that consists
of modelling human behaviour to accommodate several types of AI systems.
In this third part of the book the emphasis shifts to this other comple-
mentary impulse, as we move from using computers to analyse films and
towards rethinking film criticism and scholarship practice computationally.
In this line of thinking, I propose we re-examine the role of the critic, from
analytical tool bearer to a critical technologist of moving images, a role
similar to what Simondon calls ‘a mechanologist’ (2017[1958], p. 19).
To explore this change in roles, I stage an encounter between critics
and computers based on a different set of assumptions about the function
each part plays in the co-constitution of machine vision. And from this
encounter I outline a type of critical practice informed by an emerg-
ing kind of techno-aesthetic imagination, one which I provisionally call
macroscopic.
This encounter is staged in three parts. First, I address the prob-
lems of interfacing with cultural objects and traces that are cast in high-
dimensional space, asking how to explore and question visual worlds that
have already been seen for us, but not directly by us. In the next chapter
I consider the current and future roles critics might play in the produc-
tion of and interaction with these worlds, as well as the kinds of questions
that can be asked from moving imagery by this critical-technical avant-
garde. And finally, I introduce the idea of AI as a medium of inscription: a
protean space for expression as well as analysis through which critics can
document and narrate their interactions with inferred visual worlds.

CHAPTER 7

Algorithmic Films as Data Analysis

One of the side effects of a distant reading of films or the cultural analyt-
ics of moving images is that their explicit commitment to large corpora
usually means in practice an implicit alignment with a political economy of
data where scale becomes both technically necessary and epistemically
sufficient to account for meaning-making. We observed this economic
logic at work in distant viewing, where distance becomes a condition
for knowledge no less than an appeal to the authority of induction. The
epistemic strategies of data and computer science, exported to other disci-
plines through their various techniques, devices and practices, pose in this
way a challenge to theoretical frameworks and modes of critical practice
that rely heavily on deductive reasoning, close readings of media texts, and
hermeneutics.
The meta-disciplinary insurrection brought about by the rise of induc-
tive computing is set against a historical backdrop of quantification, data-
fication, and their derived calculating devices and techniques – digital
computers and machine learning included. Numbers themselves, as
Porter reminds us, need to be understood as ‘technologies of trust’ (1996),
which are very often used to exercise or resist power in different contexts.
Quantification can be used, for instance, to constrain individual and per-
sonal authority by redistributing it to several quantification and calcula-
tion processes, and conversely, the more individuals are entrenched in
positions of authority, the more they ‘can afford to be looser with numbers
or even to block the intrusion of quantitative technologies that constrain
their judgment and limit their power’ (Espeland, 1997, p. 1109). Similarly,
an appeal to quantification and calculation in the form of statistics, for
example in ‘points-based’ processes or ‘data-driven’ decision-making, is
often the recourse of those who wish to justify their choices by challenging
the subjective judgements of individual critics. The notions of consensus
and transparency that data and statistics can conjure and mobilise can
be leveraged by the populist against the expert or the bureaucrat against
the despot: while individuals might be biased, unreliable, and decide on
whims and prejudices, opinions in aggregate and the calculation of majori-
ties present themselves as viable counterweights to individual authority.
Viewed as a form of computational statistics, machine learning technol-
ogies have been instrumental in the assault on the authority of individual
experts. Take a common training exercise in machine learning and data
science courses, in which students are given a dataset of physico-chemical
properties of wine samples labelled by their ratings and are tasked to train
a model to predict the quality of out-of-sample wines (KAGGLE, 2019).
With adequate implementation and some adjustments, these models
train well and can be accurate. Students are empowered as they realise
that their models can predict wine quality without ever having to open a
bottle. Why, then, trust an individual sommelier, if the model can perform
just as well? From this perspective, time and resources would be better
spent on improving the computational model: implementing the most
effective learning architecture, optimising training cycles, and, above all,
getting more and higher-quality data. In this way, knowledge about how to
model computational critics appears preferable to the knowledge needed
to become a critic, inevitably bound to individual subjective experience,
whereas a large computational model can be trained to abstract judge-
ments from a multitude of critics distributed in time, space, and across
cultural backgrounds and levels of experience.
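The training exercise described above can be sketched in a few lines. This is a minimal, self-contained stand-in rather than the actual Kaggle task: the dataset here is synthetic (a noisy linear function of eleven invented ‘physico-chemical’ features), and ordinary least squares stands in for whichever learning architecture a student might choose.

```python
import numpy as np

# A sketch of the wine-rating exercise: predict quality ratings from
# physico-chemical properties, without opening a bottle. The data below
# is fabricated so the example runs on its own.
rng = np.random.default_rng(0)

n_samples, n_features = 500, 11          # 11 invented properties per wine
X = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
# Synthetic 'ratings': a noisy linear function of the properties.
y = X @ true_w + rng.normal(scale=0.3, size=n_samples)

# Split into training wines and held-out ('out-of-sample') wines.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Fit an ordinary least-squares model: the computational sommelier.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ w

# Accuracy on wines the model has never 'tasted'.
mse = float(np.mean((pred - y_test) ** 2))
```

With adequate data the held-out error approaches the noise floor, which is precisely the empowering realisation the passage describes.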
It is important to note that there are also internal tensions and epis-
temic struggles within AI research communities, notably between so-
called symbolic and connectionist approaches, and between supervised,
unsupervised, and self-supervised training environments. In the above
exercise, for example, researchers who originally proposed the task of
wine quality prediction and assembled the dataset argued that a type of
algorithms called support vector machines (SVMs) outperform other
methods, including neural networks (Cortez et al, 2009). Meanwhile,
neural network proponents, who were for decades a marginal group in
AI research, argue their methods – now rebranded as deep learning – are
much more effective at dealing with massive quantities of data from the
‘real world’, as opposed to the toy worlds of symbolic AI (Cardon et al,
2018, p. 27).
Deep learning initially challenged what Haugeland (1989) called
‘Good Old-Fashioned Artificial Intelligence’ – a paradigm of symbolic
approaches, rule-based algorithms and expert systems, that were popular
in the 1980s. As we saw in earlier chapters, the success of deep learning
goes hand in hand with the large-scale datafication of cultural production
in the first two decades of the twenty-first century. The complementary
factor in this process is a turn to induction. By passing sizeable portions
of cultural artefacts through inductive machines, the corporate harvesters
of cultural data mobilised the political powers of consensus, effectively
challenging a regime of knowledge-making through large computational
models of correlation and inference. As the effects of corporate AI become
more widely felt and its extractive practices better understood, trust in
these companies, their business and computational models, and their
influence in the governance of self and other, as Aradau and Blanke put it,
is being increasingly questioned (2022).
Hubert Dreyfus, who was one of the most notable critics of the first
wave of connectionist AI in the early 1970s, wrote that while machines
learned from a model of the world, for humans the model was the world
itself (1992, p. 266). His injunction relies largely on an ontological distinc-
tion between the world and its various models; however, as the scale and
complexity of the models exceed our individual ability to fathom their size
and identify their edges, the fantasy of a map of culture that is as large as
culture itself gains epistemic and practical currency. It is no longer just a
matter of measuring the world to analyse it using statistics, but almost the
reverse, a larger world that is already quantified and requires vast powers
of computing to make sense of it.
Let us examine in detail this fantasy of a model that is as large as the
world in terms of images and visual culture. In film studies, it appears to
fit well with Bordwell and Carroll’s own vision of a world beyond (film)
theory (1996). One gets the impression that they would welcome such
a hyper-archive with open arms, for it would mean that films too are
finally allowed to ‘speak for themselves’ against the individual authority
of a handful of theorists who have monopolised critical discourse from
the top down. By the same logic, however, their programme of mid-level
research and their own critical authority would also be undermined from
the bottom up, by computer systems able to replicate the type of formal
analysis they champion. This is the double edge of inductive comput-
ing: it feeds on the cultural production that it later attempts to make
obsolete; it dispenses with individual experts only after it has ingested and
abstracted their expertise. On closer inspection, the idea of a model of
film that is as large as all existing films is as enticing as it is threatening to
non-computational film scholarship.
In his short story ‘The Library of Babel’, Borges describes an infinite
library containing every possible book; the first reaction of those who
had access to it was one of ‘unbounded joy’, for they felt in pos-
session of a treasure (1998 [1941], p. 115). A similar point can be made of
a cinephile who finds themselves in control of a set of all possible films:
every form of narrative structure, camera angle, performance style that
has ever been committed to celluloid or recorded in digital formats being
a part of it, waiting to be discovered or re-discovered in its relations to
all others. Assuming this hyper-archive were all-encompassing, it would
include thousands of films of under-represented cinemas and lesser-
known filmmakers, ‘hidden gems’ as the presenter in MbM puts it, that
would never have seen the light of day under the tyranny of professional
critics and their normative canons. Such a Cinematheque of Babel would
encompass all cinematic representations across all ages and be a paragon
of diversity and inclusion.
But Borges describes a darker side to this library. In his tale, he also
warns about its users going mad trying to make sense of such a place, as
they realise that any systematic approach to find or articulate anything of
value would be impossible. Viewers of the Cinematheque of Babel might
well be stoked by the same anxieties, the vertigo produced by the collapse
of difference and mourning the loss of the hidden. Borges’ parable can be
read today almost as a work of science fiction, including the exhilarating
enthusiasm of vast access followed by the abject panic of what this all-
encompassing collection means for those with finite lives and memories.
How can a critic navigate or make sense of millions of years of video cir-
culating online? What kind of judgement might they be able to formulate
when confronted with the vastness of a hyper-archive?
Before moving images became ubiquitous online, a major part of the
critic’s job was to find and track films; to watch more than the average
viewer by attending screenings, travelling to festivals, and engaging in
various other exchanges with their communities. Television, VCR, and DVD first altered
the circulation and distribution of films, making them increasingly avail-
able beyond the experience at the theatre (Elsaesser, 2005). Later, digital
technologies and the internet radically transformed every other aspect of
cinema, including criticism, which expanded from trade publications and
newspapers to an ever-growing ecosystem of blogs, podcasts, videos, and
dedicated websites (Frey and Sayad, 2015). Today, in the wake of induc-
tive AI, could film critics not be replaced by a computer model, much like
the sommelier? Assuming they resisted being automated out of a job and
organised in large groups to scour the hyper-archive like Borges’ ‘inquisi-
tors’, randomly sampling the endless vaults for years, how would they
configure their findings without resorting to creating new canons? And if
they again made canons, would that not negate the advantages of having
access to a model of the cinematic world that is as large as that world itself?
As disciplines in the arts and humanities adopt AI to study culture at
scale, they inherit a share of these imagined horizons and the hopes and
anxieties that come with them. At the same time, these fantasies predate
deep learning, and indeed the internet, which as we have seen are built
on top of previous technical regimes. Some thirty years before YouTube
and Netflix, and roughly at the same time as Dreyfus was mounting his
critique of connectionist AI, filmmaker Hollis Frampton wrote about
a hypothetical machine comprised of ‘the sum of all projectors and all
cameras in the world’ endlessly running and growing ‘by many millions of
feet of raw stock every day’. He conjectured that such a project would lead
to an ‘infinite film’ that would collapse the world into its model:
The infinite film contains an infinity of passages wherein no frame resembles any
other in the slightest degree, and a further infinity of passages wherein successive
frames are as nearly identical as intelligence can make them. [. . .] If we are indeed
doomed to the comically convergent task of dismantling the universe, and fabricat-
ing from its stuff an artifact called The Universe, it is reasonable to suppose that such
artifact will resemble the vaults of an endless film archive built to house, in eternal
cold storage, the infinite film. (Frampton, 2009, pp. 114–115 [1971])

Unburdened by the strictures of actualised systems, Frampton’s idea of
an infinite film reveals a more subtle point about connectionism and about
induction, as he hints at a transformation in kind as well as scale; an
ontological displacement that is only now being felt in our interactions
with large computational models. In his infinite film, moving images are
disaggregated into their individual frames, and arranged not by author,
context, or theme; there is no genre and no narrative classification to this
collection of frames, only an algorithm that sorts every frame along an
axis of similarity-difference. Frampton has a distinct inductive inclina-
tion coupled with a differential sensibility, meaning that he resists top-
down categories, inherited ‘film grammar’ conventions that structure film
making and viewing, and aims instead for emergent bottom-up properties
that can be observed through the recurrent calculation of difference. To
enact this logic, Frampton organises images along various axes of differ-
ence, eschewing explicit references to the contents of the images in favour
of tuning into a para-linguistic signal that arises between them.
Perhaps his work is most revealing now because for him cinema was
already back then a machine made of images, and making films meant to
interact with this machine as a critical technologist, before this practice
became available through code, data, and digital computers. Many of his
films show this commitment to algorithmic thinking executed through
analogue computing, often as a set of constraints designed to explore pro-
grammatically a collection of moving image clips.
For example, for his film Zorns Lemma (1970), Frampton collected
thousands of film clips of random words in signs, posters, and storefronts
recorded in Manhattan: ‘ACME’, ‘AQUARIUM’, ‘AGENT’, ‘ALARM’
[. . .] ‘BARBECUE’, ‘BARBER’, ‘BARGAIN’ [. . .] ‘SQUARE’, ‘STOP’,
‘STORAGE’, et cetera. He then devised a recurrent structure for the
mid-section of the film, so that forty-five minutes are broken into 2,700
one-second shots, each of a different image corresponding to a letter of
the alphabet. The process repeats 100 times over, and as clips with less
frequent starting letters like ‘Q’, ‘X’ and ‘Z’ start to run out, these seg-
ments are filled with images, some of which allude to the letter but have
no written text in them: a person changing a tyre, for instance, is shown
instead of the letter ‘T’. The result is generative and hypnotic: through
constant repetition an attentive viewer can learn the pattern and anticipate
the substitutions, so that by the end of the film one is still ‘reading the film’
even though there is no text left to read, only images. The first letter to
run out is ‘Q’, which is replaced by a smoking chimney that repeats over
ninety times thereafter; the most common letter and therefore the last one
to run out is ‘C’, which is replaced by a pink ibis flapping its wings, which
we only get to see once (Frampton in Gidal, 1985, pp. 94–97).
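The substitution constraint described above can be simulated directly. The pool sizes below are invented for illustration (the film's actual clip counts differ, and a 26-letter alphabet is assumed for simplicity); what matters is the rule itself: cycle through the alphabet one shot at a time, and when a letter's pool of word-clips is exhausted, substitute its wordless replacement image.

```python
import string

# A sketch of the sequencing algorithm in Zorns Lemma's mid-section.
# Pool sizes and replacement images are hypothetical stand-ins.
alphabet = list(string.ascii_uppercase)
pool = {letter: 100 for letter in alphabet}   # word-clips remaining per letter
pool["Q"] = 5        # rare starting letters run out first ...
pool["C"] = 99       # ... common ones run out last (illustrative counts)
pool["T"] = 97
replacement = {"Q": "smoking chimney", "C": "pink ibis", "T": "tyre change"}

sequence = []
for cycle in range(100):                      # the structure repeats 100 times
    for letter in alphabet:
        if pool[letter] > 0:
            pool[letter] -= 1
            sequence.append(f"word beginning with {letter}")
        else:
            # no text left to read: an image alludes to the letter instead
            sequence.append(replacement.get(letter, f"image for {letter}"))
```

Under these invented counts the chimney recurs for the remaining ninety-five cycles once ‘Q’ is spent, while the ibis, replacing the last letter to run out, appears only in the final cycle, mirroring the pattern the viewer learns to anticipate.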
By prompting the viewer to deconstruct the algorithm that shapes the
film, Frampton trains the viewer to interact with it: to isolate variables
and find patterns, and so to treat moving images as datasets before this
became a necessary condition for their computational production and
analysis. Because of this, his work affords a rare view of an emerging type
of inductive technical practice still untethered from digital devices:
I do most of my work in such a way that I supply a certain amount, I make a con-
tainer, and for the rest of it, the film – the work itself – generates its own set of
demands and its own set of rules [. . .] But it wasn’t simply a question of, say, getting
more ambitious, wanting to order larger and larger amounts of material. There are
ways of doing that. Rather, it was a question of finding some way the material would
order itself that would have something to do with it and that would also seem appro-
priate to my own feeling about it. (Frampton in Gidal, 1985, p. 99)

My contention is that this way of thinking about imagery, as a vast visual
substrate that can be made to self-organise and its patterns be prompted to
emerge through recurrent operations, is enacted today through generative
AI systems. And the main challenge, as Frampton anticipated, is not of
distance or scale but of interaction; of finding creative ways to redistribute
agencies between the various parts of the human-machine continuum
and to design the container that allows a degree of control in the self-
organisation of the material.
One such way is to rethink inputs and outputs and the logical units of
processing and analysis. Specifically, in order to enable self-organisation,
moving images need to be pulverised, which is to say, broken down into
minute particles and stripped of previous structures: ‘data
is not made available to the perception of calculators in a “raw” and
“immediate” form, but rather is subject to atomization and dissociation
in order to transform it into the most elementary possible standardized
digital signs’ (Cardon et al, 2018, p. 29). And this is precisely the epistemic
bargain struck by the group of ‘neural conspirators’: we can model increas-
ingly large parts of the world through computers so long that the world
is formally encoded and logically dissociated from its context, vectorised
and represented into multidimensional number arrays that are amenable
to deep learning, but which by the same token become intractable to direct
human interpretation.
More than models that are as large as the world, current AI systems
depend on this more subtle but crucial epistemic gambit: to apprehend
the world they need this world to change. In other words, to understand
culture through AI requires us to reshape cultural artefacts and their
relations into computable units. This re-encoding of culture goes much
deeper than digitisation or datafication; it requires that all material culture
and traces of our interactions with it are reduced to isomorphic tokens,
so that every token can be set in relation to any other, and their positions
and differences calculated in a coherent representational space. This is the
vectorisation of culture.
Seen in this way, inductive computing and its neural methods are tech-
nically and historically contingent: they do not aim to approximate a human-
like way of apprehending the world, but are instruments of world-making in
their own right. And the worlds created through these AI systems are thus
revealed to be profoundly ideological, presented as totalities when they
are in fact reductions, vectors that only make sense in a self-contained
and self-identical representational space. Such reduction is a structural
feature of computational models, and not in itself a problem, but arguably
precisely what makes them useful in the first place as reductions that reveal.
The problem is, rather, that these reductions are always incomplete, and
that the source of their incompleteness is not often apparent, and some-
times even deliberately concealed. And this is also a structural issue, as the
pre-processes of tokenisation and encoding tend to privilege data signals
that offer the paths of least resistance: words over sentences, clicks over
opinions, pixels over shots, film ratings over critic reviews, etc.
Armed with this understanding, we can begin to dismantle the fantasy
of an infinite film. More than a map that is as large as the territory,
the hyper-archive can be better understood as the result of what Alan
Blackwell calls ‘a sublimated anxiety about a universal interface’ (2010,
p. 396). To process films and television through inductive computing we
need to reshape them into computable units, while at the same time trying
to keep these units under some form of control so that they respond to
us, much like Frampton described. Critical views of current AI systems
have mostly focused on their actual or potential nefarious effects; a more
nuanced theoretical critique complements this by addressing the problems
of what Blackwell calls ‘interacting with an inferred world’ (2015). This is,
in my view, the underlying anxiety outlined by Borges: the empowerment
of scale vis-à-vis the disempowerment of interaction; interfacing with a
world of films that have been seen for us but whose representation we
cannot directly access or explicitly control.
Somewhat paradoxically, the Cinematheque of Babel in the digital age
would not contain any films as such; it would be composed instead of
computable artefacts, number-array representations whose distance
relations can be calculated by reducing the representations to their similarities.
The reader might get some distinctly Deleuzian echoes from this idea,
which implies that to get a distinguishable signal of difference through
calculation requires repetition, or in other words, that reducing moving
images to the same kind of calculable tokens makes differences noticeable.
The upside of this logic is that, if we are successful in such an intervention,
every image will enter into a latent relation with every other image in the
archive – we would have effectively created Vertov’s box, or Frampton’s
container. Being and operating in a container that appears to have no exte-
rior is the fundamental technical and ideological gambit of AI, and is what
enables its relational-analytical powers and creates the rational illusions of
completeness and infinity.
A concrete instantiation of this idea is a film archive processed in a
way similar to how Frampton imagined, in which every film is broken
into its constituent frames, and every frame is arranged according to a
calculated difference. In terms of their visual appearance, frames that are
similar would cluster together, whereas dissimilar frames would be far
apart. Initially, similar frames would be consecutive frames, as they would
be almost identical. But in time and over a larger corpus, similarities
would begin to emerge between frames from different films: clusters of
hands, close-ups of actors, several types of lenses, colour treatments, grain
textures, et cetera. And differences too would become apparent, even in
consecutive frames. In higher dimensions, the archive that self-organises
begins to warp and fold into itself, like Paris in Inception (2010), or the
farm world in Hyperbolica (2022).
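A hedged sketch of this self-organising arrangement, using two invented ‘films’ and deliberately simple hand-picked features (mean brightness and contrast) in place of the high-dimensional learned features a real system would use:

```python
import numpy as np

# Every frame is reduced to a feature vector and arranged purely by
# calculated difference; frames with similar visual appearance cluster.
rng = np.random.default_rng(2)

# Two invented 'films': one dark and low-contrast, one bright and grainy.
dark_film = rng.normal(loc=40, scale=5, size=(50, 16, 16))
bright_film = rng.normal(loc=200, scale=30, size=(50, 16, 16))

def features(frame: np.ndarray) -> np.ndarray:
    """Toy stand-in for learned features: brightness and contrast."""
    return np.array([frame.mean(), frame.std()])

frames = np.concatenate([dark_film, bright_film])
feats = np.stack([features(f) for f in frames])

# Pairwise Euclidean distances between every frame and every other.
diff = feats[:, None, :] - feats[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# With no labels at all, each frame's nearest neighbour comes from the
# same 'film': colour treatments and grain textures cluster on their own.
np.fill_diagonal(dist, np.inf)
nearest = dist.argmin(axis=1)
same_cluster = [(i < 50) == (j < 50) for i, j in enumerate(nearest)]
```

No genre, author or narrative category enters the computation; structure emerges solely from the recurrent calculation of difference, as in Frampton's scheme.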
The problem of course is that we do not experience the world in high-
dimensional space any more than we reason directly through vectors when
we watch films. For these relations to be observed, and patterns of difference
to be recognised, the archive needs to be flattened again and its dimen-
sionality reduced to 2D or 3D space that we human observers can perceive
and interpret. To solve this problem of dimensionality reduction, com-
puter science and engineering have developed a series of projection tech-
niques, often inspired by or directly taken from topology. Techniques such
as Principal Component Analysis (PCA) (Wold et al, 1987), T-distributed
Stochastic Neighbour Embedding (t-SNE) (Maaten and Hinton, 2008),
or Uniform Manifold Approximation and Projection for Dimension
Reduction (UMAP) (McInnes et al, 2020; Narayan et al, 2021) are used
for this purpose, to afford interpretable views of high-dimensional spaces.
Using techniques such as t-SNE or UMAP allows us to visualise and
explore the film archive as a topology of learned features, graphed by
measuring the distance between each frame and every other frame; this
graph in turn shows latent structures in the form of clusters and continui-
ties. This is, at a glance, the route of cultural analytics and distant viewing
as they employ machine vision and AI: the re-inscription of complex time
representations into interpretable 2D and 3D space.
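As a minimal illustration of the projection step, here is the simplest of these techniques, PCA, implemented directly with the singular value decomposition. t-SNE and UMAP are non-linear and preserve local neighbourhoods far better, but the goal is the same: flatten high-dimensional embeddings into coordinates we can see. The 64-dimensional ‘frame embeddings’ below are random stand-ins for features learned by a vision model.

```python
import numpy as np

# Dimensionality reduction in its simplest form: project each point
# onto the two directions of greatest variance.
rng = np.random.default_rng(3)

# 200 hypothetical 'frame embeddings' in 64 dimensions.
embeddings = rng.normal(size=(200, 64))

def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                 # centre the cloud
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                    # 2D coordinates, one per frame

coords = pca_2d(embeddings)                 # shape (200, 2), plottable
```

Each row of `coords` can now be scattered on a plane, which is what an interpretable ‘map’ of the archive amounts to in practice.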
We must take care to understand, however, that the relations plotted
in this space are qualitatively different from the ones described in a regular
chart. Data points in these visualisations are not plotted according to an
external scale, but are instead projected onto a 2D or 3D plane through
the t-SNE or UMAP algorithms, which calculate the positions of data
points at different steps according to their respective hyper-parameters:
in the case of t-SNE, for example, perplexity, while in UMAP users can
control the number of nearest neighbours and the minimum distance. The cluster size and
distance between clusters will depend not only on the underlying data
distribution, but on how the user configures the projection. The spatial
dependencies shown by these dimensionality reduction techniques are
therefore only meaningful in relation to themselves and can show vastly
different versions of the same underlying data depending on how the
technique is applied. Under certain configurations, these dimensionality
reduction algorithms will not show any structure, even if there is one, or
will produce clearly defined structures out of random noise.
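To make this cautionary point concrete without extra dependencies, the sketch below substitutes a hand-rolled k-means for t-SNE or UMAP: the user-chosen `k` plays the role that perplexity or `n_neighbors` plays there, and the algorithm dutifully reports whatever structure it is asked for, even on pure noise.

```python
import numpy as np

# A clustering of pure noise: the configuration, not the data, decides
# how much 'structure' appears. (k-means stands in for the projection
# algorithms discussed above to keep the sketch dependency-free.)
rng = np.random.default_rng(4)
noise = rng.uniform(size=(300, 2))          # no real structure at all

def kmeans(X: np.ndarray, k: int, steps: int = 20) -> np.ndarray:
    """Naive k-means: assign to nearest centre, recompute centres."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(steps):
        labels = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        centres = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
    return labels

# Ask for 3 clusters, get 3 clusters; ask for 7, get 7.
labels_3 = kmeans(noise, k=3)
labels_7 = kmeans(noise, k=7)
```

In both runs the algorithm returns a confident partition of data that contains no clusters at all, which is the epistemic trap the passage warns against.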
The broader point I want to make through this dive into the technical
aspects of interacting with latent spaces is that this type of manipulation
is a significant departure from other forms of quantitative analysis and
their epistemic frameworks. The data practices and techniques of induc-
tive computing are much more performative and interactive than other
types of computational statistics, in that there is a more explicit feedback
loop between the expectations of the user and the renderings produced
by the algorithms. It is common, for instance, to go over many iterations to
test different combinations of data/hyper-parameters/targets, to evaluate
if the latent space allows for the kind of self-organisation that is useful
to solve a specific problem and meaningful to how such a problem was
defined in the first place. In other words, latent space navigation needs
its subjects to perform an iterative minimisation of error of their own in
coordination with their machines. Techniques like t-SNE and UMAP
are necessary precisely because they enable such coordination; they afford
agency over otherwise inaccessible manifolds. Conceptually, these data
embedding activities can and probably ought to be seen as creative prac-
tices, closer in kind to sculpture than to statistics.
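The iterative, user-in-the-loop tuning just described can be sketched as a loop over hyperparameter combinations, again assuming scikit-learn; the synthetic corpus, its labels, and the use of a silhouette score as a stand-in for the human judgement the text emphasises are all hypothetical:

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# A hypothetical corpus of 150 items with 32 features each,
# generated so that a latent three-cluster structure exists.
X, labels = make_blobs(n_samples=150, n_features=32, centers=3,
                       random_state=0)

# Iterate over hyperparameter settings, evaluating each projection;
# in practice this evaluation is a human looking at the map.
results = {}
for perplexity in (5, 30, 50):
    projection = TSNE(n_components=2, perplexity=perplexity,
                      random_state=0).fit_transform(X)
    results[perplexity] = silhouette_score(projection, labels)

for p, score in results.items():
    print(f"perplexity={p}: silhouette={score:.2f}")
```

Each setting yields a differently organised map of the same data; the loop itself is the 'iterative minimisation of error' performed by the user rather than by the model.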
In mathematics, embeddings are a core concept in manifold geometry,
and refer to objects that contain other objects, and to the shared properties
between the container and contained (Nicolaescu, 2007). But the word
‘embedding’ has an older history; it was first applied in the eighteenth
century to describe the condition of fossils in rock. Recovering this geolog-
ical etymology, a useful way to think about embeddings is to consider them
shapes or structures ‘trapped’ between data strata, and simultaneously
to think of embedding as the action of ‘casting’ these shapes into solid
rock. In two commonly quoted aphorisms attributed to Michelangelo,
the Florentine artist is said to have 'seen the angel in the marble and
just set it free', but also to have claimed that a block of raw marble 'can
hold the form of every thought an artist has'. In one version the angel lives in the artist's
mind, in the other it lives in the rock. Embeddings in machine learning
embody this double meaning, and its models are designed as a dialogue
between analytical and creative practices. On the one hand the analytical
impulse asks us to uncover a structure that was there waiting to be set free,
while on the other, a creative impulse demands that we sculpt something
out of layers of noise. The emphasis one chooses will likely depend on the
context, of one’s background and inclinations, yet the broader argument
here is that inductive computing cuts both ways: it folds both the object
and the subject in a concerted interactive process.
Once we think of these as interpretive-sculptural processes, subjectivity
and the need for explanation return in full force. The question is not only
or mainly about ever more sophisticated methods and tools for revealing
patterns in data, but about the ways of interaction with AI models, and the
new forms of cognition and emotion they afford. What is needed from this
perspective is to identify and understand these emerging forms of algo-
rithmic reasoning and emoting enabled by machine vision, and the visual
imaginaries they bring into existence. My intuition is that this cannot be
accomplished through cultural analytics and its distant reading operations
alone, which operate inside the container and therefore cannot explain its
constitution. If we aim to turn co-relations into explanations, we need a
framework that accounts for the set of creative practices that go into the
constitution of AI systems, a system to narrate how the subjects are folded
into the objects.
I submit that certain types of algorithmic films already do some of this
successfully. I mentioned already the films of Dziga Vertov and Hollis
Frampton, and other significant referents are the experimental documen-
taries of Harun Farocki, in particular his three-part installation, Eye/
Machine (2000–2002), in which he coined the term operational images,
images that ‘do not represent an object, but rather are part of an operation’
(2004, p. 17). Farocki’s work ignited debates in media scholarship about
the production and circulation of images increasingly outside the realms
of human viewing and control, visual regimes that ‘require neither human
creators nor human spectators’ (Blumenthal-Barby, 2015, p. 329) as they
are produced through the eyes of an automaton that ‘replaces the human
robotically’ (Foster, 2004, p. 160). This idea of an economy of images that
has left humans outside the loop is an extreme version of Vertov’s original
observation of how the movie camera afforded points of view that were
unavailable to humans, thereby extending vision beyond human faculties.
This in turn sat well with post-humanist media theorists and media eco-
logical approaches, giving currency to their critiques of media essentialism
and anthropocentrism.
I share some of the commitments of post-humanists to expand the
field towards complex and dynamic systems that include an ecosystem of
non-human actors, but I am more sceptical of the leap that posits machine
vision as a form of non-human vision, and by that same token I am less
persuaded that the images produced and regulated by AI systems are
necessarily or mainly operational, at least in the sense that Farocki defined
it. Besides, there are several other works that already cover operational
images at length (Parikka, 2023; Hoel, 2018; Pantenburg, 2016). Instead,
here is where I believe a technical understanding about how the automa-
tion of vision is achieved becomes fundamental. As I show in earlier chap-
ters, dismantling inductive machine vision reveals many human actors and
nested layers of labour. We are certainly just one of many actors operating
in larger environments, but shifting the focus away from the human
subjects that co-constitute AI risks downplaying, and even effacing, key
power relations between designers, owners, users, and all others who are
affected by these systems.
It is not that operational images make humans unaccountable – there
is undoubtedly a political project in Farocki’s use of images of war in his
Eye/Machine – it is, rather, that when we redistribute aspects of vision to
other elements in the system, it is easy to forget that we ourselves are also
part of this system, and by downplaying the human in the loop, the power
relations between seers and the seen become harder to pin down. In this
I am closer to the media ecologists and the notion of culture as a complex
system than I am to the post-humanists, if and when they lean towards
anthropophobia. Machine vision of the kind described in this book is
arguably one of the most overtly anthropocentric assemblages, drawing
from and catering to human knowledge and desires, often at the expense
of the planet and its other inhabitants. Disassembling the technologies
that enable machine vision into its technical and historical constituents
reveals human intervention at every step and reveals automation to be an
intimately human process. The main challenges with AI today lie within,
not beyond, the human.
It follows that my interest in algorithmic films is not that they channel
non-human actors, but that they contain the instructions for their watch-
ing through the familiar logical gateways of difference and repetition, and
how this allows them to thematise and give content to human algorithmic
thinking. And this capacity is not confined to avant-garde or experimental
documentaries either, but is a visuality (a visual rationality) that is
also present in other forms of popular media; for example the supercut,
which I have argued is a computational form of the video essay (2023).
The supercut is an editing technique in which short video clips
with common motifs or salient stylistic characteristics are extracted from
their original context and are sequenced together in a montage. The com-
monalities are highlighted through repetition and interpreted by viewers
as a form of aboutness, meaning the thematic content of the supercut.
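The supercut's database logic can be schematised as a simple data operation, filtering a corpus of clips by a shared motif and sequencing the matches; the corpus, field names, and motifs below are entirely hypothetical:

```python
# A hypothetical corpus of tagged clips; in practice the tags might
# come from manual annotation or from a machine vision model.
clips = [
    {"film": "A", "start": 12.0, "end": 14.5, "motif": "mirror"},
    {"film": "B", "start": 3.2,  "end": 6.0,  "motif": "doorway"},
    {"film": "C", "start": 45.1, "end": 47.8, "motif": "mirror"},
    {"film": "D", "start": 9.9,  "end": 11.2, "motif": "mirror"},
]

def supercut(corpus, motif):
    """Extract all clips sharing a motif and sequence them into a montage."""
    matches = [c for c in corpus if c["motif"] == motif]
    return sorted(matches, key=lambda c: (c["film"], c["start"]))

montage = supercut(clips, "mirror")
print([c["film"] for c in montage])  # ['A', 'C', 'D']
```

The repetition of the shared motif across the resulting sequence is what viewers read as the supercut's 'aboutness'.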
Popular supercuts abound online in numerous non-professional compilation
videos across different video platforms.5 Tohline (2021) traces the
history and aesthetics of the supercut, highlighting the ways in which it
intensifies attention, and theorising it as a visual expression of what he
calls ‘database episteme’:

[. . .] the supercut entails not simply a mode of editing, but a mode of thinking
expressed by a mode of editing. [. . .] Just as capitalism treated workers as machines
as a prelude to workers being replaced by machines, so also supercutters simulate
database thinking in apparent anticipation of a moment, perhaps in the near future,
when neural networks will be able to search the entirety of digitized film history and
create supercuts themselves, automatically. (p. 3)

Although as I have argued, machines do not create by themselves, Tohline
is right in that modes of thinking can be expressed as modes of editing.

Algorithmic films, as a more ecumenical genus, prompt us to acknowledge
our position as viewers engaged in the pleasurable decoding of images
along a temporal axis. And this is a singular affordance of films; they can
conjure the conditions of necessity by which we feel and expect that some
images ought to follow others. By putting images in time, we endow them
with relations of continuity, which is a stronger form of relation than
contiguity or proximity. Perhaps continuous images in time appear to be
more densely related than proximities in space because time cannot be
frozen or experienced in reverse; it can be manipulated to appear revers-
ible, or altered such as in fast, slow, or reverse motion, but this is always a
manipulation of past duration, whereas our conscious perception of time
as we watch these images unfold can never be stopped. Space can come at
us simultaneously, time cannot.
This is why algorithmic films might be the right complement to data
projections like t-SNE or UMAP. The types of relations they can create
between images can arise too from high-dimensional spaces but are even-
tually arranged along a temporal axis that provides a structuring order:
a temporal container in which contingent proximities can solidify into
thematic and even narrative forms. And these forms are needed for the
configuration of explanations, which require stronger forms of causality
than the aggregate of co-relations.
Narrative, whether explicit or not, is a better container for causal
explanations: it allows for counterfactual understanding and the configuration
of subjects who can give meaning to the data projections and are able to
put forward answers to why data looks the way it does, giving the models
created by AI a moral dimension. Algorithmic films satisfy in this way the
creative and sculptural impulse at the heart of machine vision and can be
understood at the same time as an effective analytic practice and a dimen-
sionality reduction technique in their own right.

Notes
1. One significant difference between PCA and the other two is that PCA is a
linear projection and therefore cannot capture non-linear relations in data.
2. Van der Maaten and Hinton describe perplexity as ‘a smooth measure of the
effective number of neighbours’ and indicate that ‘typical values are between
5 and 50’ (Hinton, p. 2582). It can be thought of as a measure of the number
of neighbours for any given data point. One of the purposes of t-SNE is to
avoid what is known as ‘the crowding problem’, which occurs when points
in high dimensional space are forced to ‘fit’ into lower-dimensional spaces.
There simply is not enough room in fewer dimensions so the points end up
overwhelming the space. To prevent this, t-SNE calculates the local Euclidean
distance between points and then ‘spreads it out’ as points are projected.
3. The number of approximate nearest neighbours used to construct the initial
high-dimensional graph.
4. The minimum distance between points in low-dimensional space. Again, to
mitigate the crowding problem.
5. See for example: ‘The Wire – Donut Supercut’ (https://youtu.be/kkMFaWg
5rXY). A compilation of all the scenes of a small-time car thief character from
the television series The Wire, or ‘American Restaurant Chains in Movies and
TV – Supercut’ (https://youtu.be/_Aw8ruEm7Fo). Many more examples are
available in Tohline’s comprehensive study of the format.
