Walter Benjamin’s seminal 1935 essay “The Work of Art in the Age of Mechanical Reproduction” wrestled with the effects of powerful technologies upon culture, and presaged much subsequent writing, e.g. by Martin Heidegger and Italo Calvino. Here I want to consider not the artwork-qua-object, as in Benjamin, but rather the work of art as an active force, in order to explore how the ‘looking-at’ and ‘understanding-of’ art, thought of as activities at once subjective and collective, informed by history and tradition, might change in the world-to-come. Moreover, under the cover of an essay nominally about art criticism, I hope to smuggle in questions about humanity’s project of transcending itself through AGI.
In a 1967 talk, Calvino proposed a combinatorial machine as an aid to, or replacement for, the human author. His comments presaged large language models (LLMs) like GPT-3 or PaLM that can work across natural (human-spoken) and formal (computer) languages and can operate in a wide variety of contexts. LLMs are a qualitative improvement in capability beyond the AIs that came before, in a direction that arguably points towards AGI.[1]
The interest in LLMs mirrors the excitement over machine-made imagery, which started around 2014 and continues today with multimodal models such as CLIP or DALL-E that correlate image and word. Social media are now saturated with computer-produced art, and rivers of digital ink inevitably flow whenever it is suggested that machine artists might supplant human ones.[2]
Regarding criticism
This essay will consider a different side of the art world’s ecosystem: the critic. Namely, I want to ask: what would it mean to have a machine that engages in art criticism? I propose that an adequately powerful system could be trained to look critically at, and usefully discuss, visual art. Call this a ‘cybernetic’ critic, the adjective perhaps recalling the role of culture as ‘governor’ (from the Greek kybernētēs, steersman) of politics – reflecting, defining, and perhaps restraining the human.[3]
Etymology aside, I do not speak here of this or that critic – the everyday viewer, the paid hack, the enthusiastic blogger, or the turgid academic – but rather of that which is common to all: the critical function. More specifically, I consider the variety of criticism developed in art schools and universities, acknowledging that this draws on a tiny sample: the perspective of one artist, educated at a UK art school in the 2010s, supplemented by conversations with a few other artists.
What is criticism?
All work in art (visual, at least) – that of the artist or the critic – starts in the looking.
The late printmaker and educator Klaas Hoek, in tutorial, suggested a 3-step process:
1. Describe the material of the artwork, the stuff that it is made of.
2. Describe the forms, shapes, and arrangements that are visible within the work.
3. Make a reasoned guess at what the work might ‘mean’, augmenting the above steps with the critic’s own experience, the date the work was produced, assumptions about the artist’s intention, and the contemporary and historical context of both the art form and the world-at-large.
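For illustration only, Hoek’s three steps could be mirrored in the skeleton of a hypothetical c-critic pipeline; the class, function names, and example inputs below are my own invention, a toy sketch rather than any real system:

```python
from dataclasses import dataclass, field

@dataclass
class Critique:
    """A hypothetical container mirroring Hoek's three-step process."""
    material: str = ""                            # step 1: the stuff the work is made of
    forms: list = field(default_factory=list)     # step 2: visible shapes and arrangements
    meaning: str = ""                             # step 3: a reasoned, contextual guess

def criticise(artwork: dict, context: dict) -> Critique:
    """Toy pipeline: step 3 builds on the output of steps 1 and 2."""
    c = Critique()
    c.material = artwork.get("material", "unknown")   # step 1: describe the material
    c.forms = artwork.get("forms", [])                # step 2: describe the forms
    # step 3: augment with date and wider context
    c.meaning = (f"A work in {c.material} ({artwork.get('date', 'n.d.')}), "
                 f"read against {context.get('tradition', 'no tradition')}.")
    return c

crit = criticise({"material": "oil on canvas", "forms": ["bare foot"], "date": "1831"},
                 {"tradition": "Romantic painting"})
print(crit.meaning)
```

The point of the sketch is structural: each step consumes the previous ones, so that the speculative third step is always anchored in plain description.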
More formally, the critic and educator Terry Barrett[4] lists some eighteen principles for art interpretation, which, in aggregate, concretise the notion of an artwork being ‘about something’, i.e. what Arthur Danto[5] termed ‘embodied meaning’.
Notably, Danto wasn’t quite talking about artistic quality: along with beauty and aesthetics, he didn’t really see quality as part of the critic’s domain. He preferred to find in successful artworks a certain ‘unity’, the ineffable something that promoted a ‘mood’ in the viewer (Danto, 152-156). Danto was drawing upon Martin Heidegger, who saw Stimmung (moods), like boredom, anxiety, or terror, as waypoints to the sublime, pulling the viewer out of the banality of his/her quotidian world.[6]
Returning to Barrett’s critical framework, a mood, in its common-sense meaning of a feeling within the viewer, is vital to interpretation, but this feeling must be backed up with something that can be communicated to others. In the writing of Barrett, Danto, and others, criticism is importantly a discursive practice. Whilst one’s initial engagement with a work is perhaps personal and feeling-based, there is an immediate conversation, particularly for the professional critic, with an academic discipline that is at once recondite, rigorous, and recursive: the history and theory of art, sedimented over generations dating back perhaps to the Florentine Giorgio Vasari (1550 CE). In writing or in conversation, opinions are articulated and debated, formed and changed (Barrett, 220-227), for as Barrett points out, an over-subjective interpretation tells us more about the interpreter than about the work, and is perhaps inaccessible to the audience-at-large (who do not share the interpreter’s subjectivity). Meanwhile, an interpretation that is entirely art-historical or theoretical may struggle to engage that same audience, because it is too abstruse or presupposes too much theoretical or historical knowledge.
On the cybernetic-critic
So, what would building a cybernetic critic – hereafter also ‘c-critic’ – entail?
At a minimum, our c-critic would need to be trained on an image corpus of art, as well as non-art, taken from a suitably representative and diverse global context. In this sense, it would be a much larger task than, for instance, that recorded in this 2019 paper, which trained on 80,000 images drawn only from Western art history. The corpus would need to include photographs as art, photographs of art (i.e. images of other works such as installations), and photographs of objects in the real world[7], and the system would need the ability to process moving-image versions (including audio) of the above.
Given the fundamentally discursive nature of criticism, our cybernetic critic would also need to digest a very large body of art reviews, theoretical articles, interviews, and so forth. Whilst some of this material may well be already included in existing corpora, such as that used for today’s LLMs, it isn’t clear whether those datasets include specialised art commentary or paywalled general content, or indeed, the non-digitised content of the world’s (art) libraries.[8] For instance, the average essay in October is impenetrable without at least an undergraduate education in art theory and a substantial embedding in what Danto called the Artworld: “To see something as art requires something the eye cannot descry – an atmosphere of artistic theory, a knowledge of the history of art: an artworld.”[9] Although Danto wouldn’t have quite put it in the following way, the ‘artworld’ is keenly attuned to the fashions of the day: for instance, identity is today a dominant frame in the presentation and interpretation of art, in a way it might not have been in 1964. Walter Benjamin made a similar point, about how the reception of culture changes, when he traced the evolution of art from object of cult veneration to historical artefact, i.e. one of many serially arranged and catalogued in a museum.[10]
At the end of this process, one might expect the c-critic to be able to give a précis of a work: its material, organisation of forms that make it up, a list of the references it recalls (in and out of art) – in short, an account of the work that satisfies the criteria of coherence, correspondence, and inclusiveness that Barrett identifies (Barrett, 219-220).
Somewhat more speculatively, the c-critic might be able to take a stab at what the artist’s intention was, and back it up with references to any statements or prior works by the artist, or other criticism of the artist’s work. More ambitiously, perhaps it could propose a ‘meaning’ for the work, reasoning about how theories of art might apply, and by contextualising the work against and within its general knowledge about past or current events in the world.
In this vein, there has certainly been progress in getting machines to extract meaning from images; for instance, this 2021 paper purports to identify the ‘emotional content’ of a painting. There are also tentative machinic moves towards ‘reasoning’; for example, consider this series of exchanges between a human and GPT-3, which were designed to draw out logical inferences from the machine, many of which make sense, and some of which are quite subtle.
To be clear, however, the experiment in c-criticism is not feasible today[11] – although, within the AI research community, there is a view that human-level intelligence, across a wider variety of tasks, may be achievable simply by developing much bigger models and faster computers, with no requirement for consciousness, feelings, or any of the paraphernalia we regard as typically ‘human’.[12]
But is this really art criticism?
The above notwithstanding, could the prospective c-critic be said truly to interpret, criticise, or understand the work in front of it? This is an open question: in the thought experiment above, the c-critic has produced a plausible opinion that is well-grounded in other writing.
Still, something appears to be missing.
For instance: common sense. GPT-3 has nothing like the everyday knowledge about the world that a child has and can easily be tripped up by a badly written prompt. Imbuing agents with common sense, a body of ‘core knowledge’ in Melanie Mitchell’s terminology, is a major imperative in AI. Another researcher, Shannon Vallor, argues that AIs such as GPT-3 don’t have any conception of a world and cannot go through the process of understanding, being too restricted to a static, impersonal, textual representation of the world.
Moreover, much of human knowledge isn’t necessarily written down or encoded: it is tacit or corresponds to know-how that stems from physical, as well as mental, activity and overlaps with ‘common-sense’ knowledge. This is particularly important for art, concerned as it is with technology in the form of craftsmanship, which goes under the rubrics of praxis and techne.[13]
Core knowledge may well require the AI to have something like a body, some awareness of the phenomenological experience of three-dimensional space and time – an obvious point for anyone who has looked at sculpture. As Michael Fried memorably wrote, specifically concerning Minimalist sculpture:
The better new work takes relationships out of the work and makes them a function of space, light, and the viewer’s field of vision. The object is but one of the terms in the newer esthetic. It is in some way more reflexive because one’s awareness of oneself existing in the same space as the work is stronger than in previous work, with its many internal relationships. One is more aware than before that he himself is establishing relationships as he apprehends the object from various positions and under varying conditions of light and spatial context.[14]
That is, the act of walking around a Minimalist sculpture, and I would argue paintings or some installation art, is a big part of art appreciation.[15]
Moreover, the behaviour of other visitors in a gallery often has some bearing upon one’s experience of an artwork, a point suggested by Fried when he described Minimalist sculpture as ‘theatrical’ (there is so little to see in the sculpture, in the sense that it is symmetric, monolithic, or relatively featureless; thus, it presupposes and needs an audience). This would imply that a competent c-critic ought to be a robot – it could then actually move in the gallery and thus have some physical experience of three-dimensional work.
Still, even without such embodiment, the c-critic might actually have an advantage: it is natively virtual, that is, it has unmediated access to a computationally simulated world. In the same way that younger generations are described as ‘digital natives’, the c-critic would encounter art that exists entirely in a rich, digitally-created 3D world, without the apparatus of VR goggles and haptic gloves that we require.[16] If the future unfolds in this way, the c-critic might legitimately have a unique perspective that doesn’t merely ape a human critic but tells us something new about a strange land in which many of us would be, at least initially, strangers.
In any event, the difficulty of imbuing the c-critic with bodily sensation is only part of a bigger problem, namely, how to give it subjectivity, feeling, and emotion, which, if one believes Barrett’s taxonomy of criticism, are vital to interpretation. The c-critic, trained on all the world’s images and writings, presumably still would not possess human-type qualia[17], which, on some views, are intertwined with emotional states. Unless it is explicitly programmed otherwise, it would lack a detailed understanding of the human condition: our passage from childhood to death.[18] Thus, it might miss the point of much art – whether the memento mori of Renaissance painting or a Damien Hirst installation.
Further, Barrett and Vallor (above) approach criticism and understanding, respectively, as communal acts, which might happen in an art-school critique, in the pub, or, less enjoyably, in an online essay’s comment thread. The c-critic, unlike a human embodied and embedded in the ‘artworld’, would presumably have only partial experiential access to this give-and-take of art discussion. To the extent that it updated its internal understanding or opinion of a given work, or of art in general, it would do so on the basis of secondary sources: the textual or spoken art criticism of others. A similar idea is more directly addressed by cognitive scientist Joscha Bach, who sees language as something both ‘indexical’ – referring to things in the world that are directly experienced – and communal – specific social groups develop shared understandings of how strings of symbols correspond to certain shared mental models.[19]
This notion of language as a tool for collective explication also points towards an AI research approach in which two artificial agents debate each other while a human judges the result, the idea being that the agents can explore a space of possible arguments more quickly than humans can, but any final conclusion must be capable of convincing a human.
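The debate setup can be caricatured in a few lines. The debaters and judge below are scripted stand-ins of my own devising; in the actual research proposal, both debaters are powerful learned models and the judge is a human:

```python
# Toy two-agent debate with a simulated judge (all functions are stand-ins).
def debater_a(claim, history):
    return f"Evidence for '{claim}', round {len(history) // 2 + 1}"

def debater_b(claim, history):
    return f"Objection to '{claim}', round {len(history) // 2 + 1}"

def judge(history):
    # Stand-in for the human judge: here, crudely count arguments per side.
    pro = sum(1 for side, _ in history if side == "A")
    con = sum(1 for side, _ in history if side == "B")
    return "A" if pro >= con else "B"

def debate(claim, rounds=3):
    """Agents alternate arguments; the judge rules on the transcript."""
    history = []
    for _ in range(rounds):
        history.append(("A", debater_a(claim, history)))
        history.append(("B", debater_b(claim, history)))
    return judge(history), history

winner, transcript = debate("this work is about mortality")
print(winner, len(transcript))
```

The structural point survives the caricature: the agents generate the argument tree, but the verdict is external to them.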
What surprises might we get?
Simulated affect: While an AI might not have any human-like emotions, it could be programmed to simulate emotion. The c-critic could be trained to ask, “What would a human say about this artwork?”, as a way of getting it to learn to emulate a human critic. As these efforts become more convincing, perhaps we might relax our reluctance to view agents, whether a humble service robot or a c-critic, as viable subjects[20] in their own right. Taken to an extreme, might it eventually become a meaningless, or at least parochial, question, to ask “Does the c-critic ‘really’ feel anything?”
Meta-cognition: Even a c-critic which is little more than a correlation savant may surprise us or be useful. This is because, in a sense, art-making and art-interpretation are correlative activities. An artist’s process, in part, is one of traversing (mentally or by physically trying things out) a space of possible configurations of material (which may include paint, objects, shapes, video clips) and trying, either ex-ante or ex-post, to justify certain aesthetic decisions they have made.
Importantly, the juxtapositions an artist chooses are ‘strange’, or as Alva Noë puts it, “Art disrupts plain looking and it does so on purpose. By doing so it discloses just what plain looking conceals.”[21] Bach puts it into the language of AI: again using the metaphor of a configuration space, he suggests that creativity is the act of bridging or connecting unconnected portions of that space, while art (as distinct from creativity) is when these connections draw upon the agent’s unified model of the world, and thus have some meaning. Bach’s explanation thus echoes elements of Danto’s notions of unity and embodied meaning as being central to art.
Moving from art-making back to criticism: the viewer-as-interpreter tries to break down this hairball of stuff and ideas, in part to divine the artist’s justification or intention. This may happen intuitively or semi-consciously in the human case, or in the high-dimensional space of a neural network. Moreover, this process of mentally unraveling the work (and the visual world generally), which seems to be part of interpretation, also drives a higher-level loop in cognition, wherein we revisit our knowledge and opinions of the world. In theories of mind, this thinking-about-thinking falls broadly under the term ‘meta-cognition’, which has historically been studied as part of human developmental psychology but seems important in reaching AGI.
This internal dialogue is something that Walter Benjamin identified, at least in an art-viewing context, as an act that unfolds in time, one of immersion, contemplation, and association-making (Benjamin, 34). Famously, his near-contemporary Aby Warburg practised this thinking through space and time in the Mnemosyne Atlas: a vast collection of physical photographs that Warburg, for years, arranged and rearranged, walking around them, using them to compose his theory of cultural evolution. For Warburg and the Benjamin of The Arcades Project (1927-1940), theories of art and culture were inherently peripatetic, to be constructed through an ‘exercise in embodied thought’.
In an AI-specific context, Venkatesh Rao has raised similar points regarding the (possible) centrality of time to the structuring of human cognition, as well as the relationship of time to embodiment, the physical world, and phenomenology.
The flow of time thus might inject another difference between human and automaton. The human views art over a timescale of minutes, during which the mind wanders, and environmental sounds and visual stimuli, as well as other visitors to the gallery, interpenetrate the art, spawning further associations. The c-critic, on the other hand, would be free of (at least) two anthropogenic constraints. First, it could view the scene and ‘think’ about the work much faster, thus perhaps missing out on the human’s temporally-stretched experience of looking.[22] Second, the human mind can consciously attend to only one or a few things at a time, yet no such restriction need exist for a panoptic c-critic, something that would fundamentally change how such an agent perceives art and the world.
Ambiguity: The modern world of technology and standardisation prizes precision and certitude, evolving a range of tools to handle contingency, risk, and uncertainty, from insurance and financial derivatives, to the state itself. The best art, on the other hand, is open to multiple interpretations, potentially making it meaningful to multiple audiences and retaining its relevance over centuries. Art is not the only site of productive ambiguity – philosophy is another: for instance, as Wittgenstein[23] described in his notion of ‘family resemblances’ and his appeal to Jastrow’s duck-rabbit. We would expect our c-critic to recognise possible sources of ambiguity in a work or its context, flag them in its interpretation, and perhaps request clarification for things that seem particularly opaque or troubling.
What’s more, it is in relation to teaching machines about ambiguity that a major issue with AI shows up, far exceeding the biases and non sequiturs of GPT-3. It has become evident that specifying precise objectives to artificial agents often leads to pathological behaviours in the face of a real world that is out-of-distribution (OOD). This is not dissimilar to what we see outside AI, for instance, in the financial industry: specifying regulatory targets to banks pre-2008 led to pathologies like the housing bubble. But without some objective, an AI can’t do anything useful. Hence, an area of intense interest is how best to teach norms, values, or other ‘fuzzy’ concepts, and encourage the AI, when it isn’t sure, to come back and ask a human. In fact, the concept of ‘human values’ is a particularly awkward notion. There are three obvious problems: first, there is no ‘we’ – humans don’t all have the same values and never have; second, almost none of us individually can consistently set down what our current values actually are (see Russell 2020, Ch. 9); third, as societies, our values have changed radically over centuries – from the Greeks down to twentieth-century America, slavery was a common practice; similarly, industrial farming is widespread today, but might be abhorrent in the future. So what values exactly are AIs supposed to learn?
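The ‘come back and ask a human’ idea can be sketched as a simple deferral rule. The threshold and scoring inputs below are invented for illustration; the pattern itself (act only when confident, otherwise request clarification) is the core of proposals like Russell’s:

```python
def interpret_or_ask(scores, threshold=0.75):
    """Return the top interpretation if the agent is confident enough,
    otherwise defer to a human. `scores` maps candidate interpretations
    to the agent's (hypothetical) confidence in each."""
    best, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return ("interpretation", best)
    return ("ask_human", f"Unsure between {sorted(scores)}; please clarify.")

# Confident case: one reading dominates, so the agent commits.
print(interpret_or_ask({"memento mori": 0.9, "still life": 0.1}))
# Ambiguous case: no reading clears the threshold, so the agent defers.
print(interpret_or_ask({"duck": 0.5, "rabbit": 0.5}))
```

The design choice worth noting is that deferral is a first-class output, not a failure mode: ambiguity in the work is surfaced to the human rather than silently resolved.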
While I make no pretense that art is the answer, I do make the possibly naïve suggestion that the ambiguity inherent in art might teach the machine something apparently ineradicable about humans in the world.
Conclusion
“An artist is an algorithm falling in love with the shape of the loss function itself.” –Joscha Bach[24]
A c-critic might never come, or by the time it comes, it may present limited interest, as we will already have reached (at least) human-level intelligence. At one level, criticism could be seen as just another AI-complete problem: get a machine to comment on some sensorial input (primarily, but not necessarily, a camera feed). However, I argue that art is slightly different: art is not quite the same as the everyday stuff of the world. Rather, artworks comment on, in the sense of referencing – in a meaningful or intentional way – the world. Hence, art is a representation of the world, and this representation may be recursive (in that art can reference other art). Art can represent objects in the world but also concepts or ideas: hence, it exists in a meta-relationship, much like fiction, or language generally, but with much greater informational content. Art is a particular physically-realised and socially-codified practice adjacent to acts like imagination, mental simulation, or thought experiments, which are things humans do routinely, even if they aren’t always named as such. It seems to be one of the ways we establish knowledge and test out possible future consequences of our actions. Hence, I would argue that both the making of art, and the interpreting of it, may sit in a special relationship to the general search for AGI, and fleshing this out would be the focus of a future essay.
“In a corner of the canvas, as they came nearer, they distinguished a bare foot emerging from the chaos of color, half-tints and vague shadows that made up a dim, formless fog. Its living delicate beauty held them spellbound. This fragment that had escaped an incomprehensible, slow, and gradual destruction seemed to them like the Parian marble torso of some Venus emerging from the ashes of a ruined town.” –Honoré de Balzac, The Unknown Masterpiece[25]
Notes
[1] It is an open question whether deep learning-based architectures, such as GPT-3 type systems, if given enough processing power and data, would eventually add up to AGI, or whether there is some fundamental conceptual hurdle that remains undiscovered. This essay in Noema and this conversation give an idea of the positions various researchers hold.
[2] Artists have always experimented with the latest kit – Caravaggio was arrested in possession of compasses, perhaps a newish invention in 1598. For AI-in-art, see this recent newsletter, article, as well as a 2019 review I wrote of Pierre Huyghe’s show at the Serpentine Gallery, London.
[3] Writing around the social role of art is vast: ranging from Plato (who would famously banish the artists from his Republic) to Hannah Arendt or Hans-Georg Gadamer who in different ways place art very much at the centre of the task of humanity’s collective self-understanding; to Frankfurt School criticism of the Left’s tendency to navel-gaze while safely ensconced in the towers of academia and drawing rooms; to Guy Debord’s pungent riposte to the quiet co-opting of culture by contemporary capitalism. For a brief summary, see Sophie Cloutier, “The Social Role of Art: A Reading of Art and Truth after Plato”, in Existenz, (Vol. 9 No. 1, Spring 2014), 22-25, available here. A less academic take by artist Liam Gillick on Gilles Châtelet is here.
[4] Terry Barrett, Interpreting Art: Reflection, Wondering and Responding (New York: McGraw-Hill, 2003), 197-228, currently out of print. Substantially similar content is available on Barrett’s site.
[5] Arthur Danto, What Art Is (New Haven: Yale University Press, 2013).
[6] See Danto, 153-154; for a more comprehensive summary in the context of AI, see the work of Hubert Dreyfus.
[7] Real-world images are needed because the readymade long ago shattered the boundary between art and non-art: see Danto (2013).
[8] There are significant practical and theoretical obstacles to getting machines to digest unstructured textual data, as Stuart Russell points out (see Stuart Russell, Human Compatible (New York City: Viking, 2019), 79-82).
[9] Arthur Danto, “The Artworld”, in The Journal of Philosophy (Vol. 61: No. 19, 1964), 580.
[10] Walter Benjamin, The Work of Art in the Age of Mechanical Reproduction (originally published 1935) (London: Penguin, 2008), 11-13.
[11] My (unsubstantiated) intuition is that to build a c-critic might be as difficult as building an AGI, i.e. an AI-complete problem. Depending on one’s precise operationalisation of AGI, constructing a highly-capable c-critic might be a more difficult problem (than making AGI), in that artistic creation and interpretation are potentially the most abstract/meta- mental activities humans undertake. As this essay hopefully demonstrates, interpretation in particular is a complex task that draws upon history and philosophy, and operates at individual-subjective and culturally-encoded levels.
[12] See David Chalmers, GPT-3 and General Intelligence, 2020. From the perspective of what more compute might achieve, see this post by Daniel Kokotajlo. For an opposing viewpoint, see comments by Gary Marcus which are addressed here. The more-compute versus ‘secret symbol sauce’ debate is recapitulated in these two Noema essays.
[13] See Giorgio Agamben, The Man Without Content (Italian edition) (Quodlibet, 1994), 37. See 42-47 of the English translation here. He gives an account of how these terms develop through Plato and Aristotle, down to Heidegger.
[14] Michael Fried, Art and Objecthood: Essays and Reviews (Chicago: University of Chicago Press, 1998), 153.
[15] It is even important with paintings, as viewers can walk near and around the edges, to get a better look at the brushwork or see the edge: often a fair sign of the painter’s craftsmanship and intentionality.
[16] The hackneyed analogy is the copious writing around the Metaverse, but for more rigorous and interesting analyses, see Robin Hanson, Nick Bostrom, David Chalmers, and (to take one example) the fiction of Charles Stross.
[17] This is one of the major open questions in AI, part of the ‘hard problem of consciousness’, namely, what is it like to ‘see red’, or to ‘feel water’? We have an idea of visual- or touch-processing in the sense organs and in the brain, but that doesn’t tell us much about our (or others’) subjective sensory experience. And since the senses are largely the foundation of our entire structure of knowledge, we face what is known as the ‘symbol grounding’ problem, namely that we don’t really understand how higher-level cognitive and linguistic concepts are connected to raw sensory inputs. It is not clear whether an AI possesses, or can be made to possess, qualia and therefore phenomenal experience. See Roman L Yampolskiy, Detecting Qualia in Natural and Artificial Agents, 2017.
[18] For example, like the rich memories implanted into replicants, as in Philip K. Dick’s Do Androids Dream of Electric Sheep (1968) which inspired the Blade Runner films. For relevance to AI, see Hubert Dreyfus, “Why Heideggerian AI failed and how fixing it would require making it more Heideggerian”, in Artificial Intelligence (Vol. 171: 18, 2007): doi:10.1016/j.artint.2007.10.012.
[19] For Bach, see this interview, around time-mark 1h40m. The philosopher Reza Negarestani suggests something similar, sketching a picture of intelligence as a function of language, a social thing that is shared within a collective, constantly in the process of being updated through everyday use: see Negarestani as well as Negarestani, Intelligence and Spirit (Falmouth: Urbanomic, 2018).
[20] It is worth noting that attitudes to robots already vary: Japanese attitudes to robots might be related to the Shinto animism prevalent in Japanese culture, whereas Western wariness of blurring the man-machine boundary might be a Judeo-Christian prejudice. Also, see this more academic treatment of Japanese attitudes to automation.
[21] Alva Noë, “What Art Unveils”, in The Stone in The New York Times (2015).
[22] Besides the reference to the virtual world above, there are two caveats here. First, the c-critic’s subjective experience of time may well be different from ours, for instance, a minute of our time might be hours or days of the agent’s time, simply because of the agent’s much faster hardware. Hence, within what seems to us like a very short viewing window, the c-critic might still have a subjectively rich experience, though arguably it would still miss things like shadows changing in a Robert Irwin installation as the sun moves. For subjective time perception by certain types of artificial agents, see Robin Hanson, “What will it be like to be an Emulation?”, in Russell Blackford, Damien Broderick, eds. Intelligence Unbound: The Future of Uploaded and Machine Minds (New York City: Wiley, 2014): http://hanson.gmu.edu/IntelligenceUnbound.pdf. Second, current AI research around curiosity is showing unusual behaviour, such as agents in simulations (which are admittedly set up to incentivise these behaviours as a way of learning) poking around objects much as a child might in an unfamiliar environment. See Ch. 6 of Brian Christian, The Alignment Problem (New York City: W. W. Norton, 2020). Hence it isn’t inconceivable that the c-critic views the Irwin installation then comes back an hour later, because it ‘knows’ that Irwin’s work often is about light in an environment.
[23] Ludwig Wittgenstein, Philosophical Investigations (1953) (New York City, Blackwell Publishing, 2001).
[24] Joscha Bach, who also discusses his views on art briefly near the end of this wide-ranging podcast from the Future of Life Institute.
[25] Honoré de Balzac, The Unknown Masterpiece (1845) (Project Gutenberg, 2007).