This paper explores the social history and legacy of the perceptron. I begin by tracing a genealogy of this technical object through the work of its designer, Frank Rosenblatt, paying particularly close attention to the construction of his Mark I Perceptron at the Cornell Aeronautical Laboratory in 1960. I outline the neurophysiological framework that inspired Rosenblatt’s approach, emphasizing the importance of analogies between the human brain and the workings of electronic computers. Following this, I discuss the controversy that erupted after the critique of connectionism (the approach to artificial intelligence embodied by the perceptron) mounted by Marvin Minsky and Seymour Papert in their 1969 book Perceptrons. For Minsky and Papert, the perceptron had severe limitations, limitations that could only be overcome by their own approach to AI research, known as symbolic AI. The publication of their text is widely considered to have led to the “AI winter,” a decade-and-a-half lull in neural network research. Through the work of Trevor Pinch and Mikel Olazaran in the field of science and technology studies (STS), I analyze the dissonance between the two “modes of articulation” of the controversy precipitated by the book, modes they term the “official-history mode” and the “research-area mode.” I briefly survey the prevalence of the official-history mode in commentary on AI in contemporary critical theory of technology and some of the crucial errors these accounts make. Finally, I review the return of connectionism as the primary AI research paradigm in the late 1980s, and I situate the perceptron in this context as the kernel or finite automaton that developed into the more complex neural networks widely used today in deep learning. I also consider the ways in which the original architecture of the perceptron has mutated into models such as convolutional neural nets.
Connectionism
The perceptron was introduced by the American psychologist Frank Rosenblatt, a professor at Cornell University who theorized the perceptron as part of his wider cognitive-scientific research into the workings of the human brain. In his 1958 paper “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” Rosenblatt poses three questions about the brain’s information-processing capabilities that he hoped his research would clarify:
- How is information about the physical world sensed, or detected, by the biological system?
- In what form is information stored, or remembered?
- How does information contained in storage, or in memory, influence recognition and behavior?[1]
Focusing primarily on the second and third questions, Rosenblatt summarizes then-contemporary neurophysiological research that had garnered evidence for two competing answers to them. The first of these answers suggests that memory is constituted by a one-to-one correlation between environmental stimulus and neurologically stored information. In this view, the wiring diagram of the brain, once understood, would theoretically allow researchers to reconstruct precisely what an organism remembers by correlating environmental sensory inputs to memory traces that are retained as images in the storage-space of the brain. Likewise, these “coded memory theorists”[2] answer the third question by stating that recognition and behavior consist of matching new sensory stimuli with images previously stored in the topology of the brain.
Against the coded memory theorists, Rosenblatt subscribes to the opposing theory, which he views as congruent with the tradition of British empiricism. For this camp of empiricist neurophysiologists, the answers to the second and third questions posed above are more adequately provided by the neurophysiological model of connectionism.
The implications of the connectionist theory of cognition are as follows. Rather than the space of information storage pre-existing its environment, for connectionism memory exists only through the process of enaction. Whereas the coded memory theorists view the brain as pulling images from the world and retaining them as stored pictures, connectionism sees the act of perception as the creation of new neural pathways and the act of memory storage as the creation of sufficiently strong neural pathways. The repeated travel of impulses across a pathway determines its strength: repeated stimulus exposure builds connective consistency. For this reason, information storage is interarticulated with an organism’s perception of environmental cues: sensory impression creates new synaptic connections, and the sum of these connections constitutes the organism’s memory. Information is thus stored as preferences for particular actions, the brain’s storage medium being a vast network of associations.
Synaptic connections, central to the neuroanatomical model posited by connectionism, are formed by the firing of nerve cells called neurons. Crucially, the possibility that scientists might construct a mathematical imitation of the human brain’s neural networks was hypothesized by Warren McCulloch and Walter Pitts in 1943.[3] Referring to these as “nervous nets,” McCulloch and Pitts viewed the neuron as the primary unit underlying all the brain’s cognitive processes, and they sought to define a logical calculus that would capture the passage of impulses along synaptic pathways. If they could define such a calculus, they believed, it would demonstrate that the neural nets of the human brain could be implemented digitally as a generalized Turing machine.[4] Indeed, two fundamental aspects of their theoretical mathematics are hypostatized in Rosenblatt’s perceptron. First, McCulloch and Pitts viewed the firing of a neuron as an “all-or-nothing” event: a neuron either fires or it doesn’t, with no variation in intensity. Second, a neuron must reach a threshold of excitation in order to fire; this threshold is reached when the sum total of a given neuron’s synaptic input attains a sufficient level of energy.
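These two propositions are simple enough to sketch in a few lines of code. The following is a minimal Python illustration rather than McCulloch and Pitts’s original calculus; the function name and the AND example are my own illustrative choices:

```python
# A minimal sketch of a McCulloch-Pitts threshold unit: the neuron fires
# (outputs 1) only when the sum of its inputs reaches its threshold, and
# the output carries no gradation -- the "all-or-nothing" principle.
def mcculloch_pitts_unit(inputs, threshold):
    """Return 1 if total excitation meets the threshold, else 0."""
    return 1 if sum(inputs) >= threshold else 0

# With a threshold of 2, the unit behaves as a logical AND of two inputs.
assert mcculloch_pitts_unit([1, 1], threshold=2) == 1
assert mcculloch_pitts_unit([1, 0], threshold=2) == 0
```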
In addition to McCulloch and Pitts, there are two further precursors to Rosenblatt’s connectionism, one theoretical and the other methodological. The theoretical precursor is the mathematician John von Neumann, who was adamant in his view that the human brain’s cognitive processes could be captured by a digital machine. In The Computer and the Brain, von Neumann offers a depiction of biological neural networks that closely resembles that provided by McCulloch and Pitts.[5] Having offered his description of the brain, von Neumann concludes by boldly proclaiming that his sketch is indistinguishable from the functioning of an electronic computer:
This is clearly the description of the functioning of an organ in a digital machine, and of the way in which the role and function of a digital organ has to be characterized. It therefore justifies the original assertion, that the nervous system has a prima facie digital character.[6]
One final and crucial influence on Rosenblatt was the methodological approach of the British cybernetician W. Ross Ashby. In his 1961 book-length expansion on the inspiration and aims of the perceptron, Rosenblatt acknowledges his appropriation of Ashby’s error-driven approach to systems training as well as Ashby’s emphasis on probability theory.[7] Though Ashby did not adhere to any particular neurophysiological model (he favored the “black box” approach, in which only the inputs and outputs are recognizable to the researcher), Rosenblatt admired Ashby’s research on homeostatic systems, which demonstrated that “an adaptive system must contain variables, step functions, and the ability to randomly reorganize itself to adjust to its environment.”[8] Thus, despite Ashby’s lack of a specific image of the brain, his view that the nervous system is “essentially mechanistic”[9] and his focus on adaptive learning through negative feedback are congruent with Rosenblatt’s more neurophysiologically focused intellectual inheritance from McCulloch, Pitts, and von Neumann.
In sum, this thread of connectionist neurophysiological theory, in which cognitive processes can be modelled both mathematically and mechanically, together with Ashby’s error-driven adaptive systems learning, inspired Rosenblatt’s approach to AI and his belief that the perceptron could be built within this framework. Indeed, in the introduction to his Principles of Neurodynamics, Rosenblatt declares that “there is general agreement that the information-handling capabilities of biological networks do not depend upon any specifically vitalistic powers which could not be duplicated by man-made devices.”[10]
Alexander L. Cicchinelli, Collector. Frank Rosenblatt Publications, #17-1-3370. Division of Rare and Manuscript Collections, Cornell University Library.
Mark I Perceptron
Rosenblatt had initially anticipated that his perceptron would be realized as a piece of machinery; however, its first concrete implementation was as a software program on an early mass-produced computer, the IBM 704, at Cornell University in 1957.[11] It wasn’t until 1960 that Rosenblatt secured the resources to build the perceptron as a custom hardware device. This first material instantiation was named the Mark I Perceptron. Rosenblatt and his team at the Cornell Aeronautical Laboratory employed the Mark I for the purposes of image classification.
Rosenblatt’s team published an operators’ manual for the Mark I Perceptron that outlined its construction and the methods used to train it.[12] The manual stipulates that the machine is composed of three types of sub-units: sensory units (S-units), association units (A-units), and response units (R-units). Each type of sub-unit forms one layer of the perceptron, and the three layers are arranged sequentially. Each sub-unit receives energy, in the form of either light or electricity, and when a sub-unit’s total excitation reaches a particular threshold, it transmits a signal to the sub-units it is connected to in the next layer. Every S-unit is connected to multiple A-units, and every A-unit to multiple R-units.
The S-units are photoresistors: conductive cells that register visible light. There are 400 of them, arranged in a 20 x 20 grid. This S-layer of photocells generates a rudimentary, 400-pixel image of the object that the perceptron will attempt to classify. In the manual, the S-units are collectively referred to as the retina, and they respond to stimulus intensity, like the neuronal model posited by McCulloch and Pitts, on an “all-or-nothing” basis. The output signals an S-unit transmits to its A-units are all of the same 24-volt magnitude. Stimulus presentation is automated by a 35mm slide projector, placed in a light-tight box to emphasize contrast and reduce noise; the projection typically displayed letters of the alphabet or primitive geometric shapes. Based on which photocells are activated, the retinal field is theoretically able to classify plane patterns such as “position in the retinal field of view, geometric form, occurrence frequency, and size.”[13]
The A-units are analogues of the brain’s neurons, and each consists of a transistor amplifier and a relay. Each of the 512 A-units is connected to a maximum of 40 S-units. The input excitation of an A-unit is the sum of all the 24-volt S-unit outputs it receives. The threshold for each A-unit varies between 0 and 100 volts. Crucially, connections between the S-layer and the A-layer are “pseudo-random,” in order to “eliminate any particular intentional bias in the perceptron.”[14] This pseudo-random wiring was imperative for demonstrating the perceptron’s ability to organize itself out of an initially disorganized network; it also distinguished the machine from other contemporaneous work on digital computers, which relied on precise wiring.[15] Further, Margaret Boden states that Rosenblatt’s wiring was directly inspired by the homeostatic systems of Ashby, which emphasized probability theory over the Boolean logic favored by the other cyberneticians.[16] Presumably, Rosenblatt also needed this type of wiring to distance his work from the coded memory theorists’ neurophysiological framework by strictly emphasizing connective associations over mnemonic placement in topological space.
The A-layer sends electrical signals to the R-layer, but, unlike the previous layer-to-layer connection, the voltage is not constant. The voltage transferred from an A-unit to its R-units varies according to a value corresponding to the success that the particular A-unit has had in activating its R-units in the past. Rather than a simple feedforward connection between the layers, each A-unit receives feedback from its R-units. This process renders each connection’s voltage a continuous variable. The success values corresponding to favorable servomechanistic behavior between the A-layer and the R-layer are registered by potentiometers (rotary resistors that measure voltage) driven by a series of DC electric motors. The potentiometers alter the adaptive weights, or “wipers,”[17] according to the perceptron’s learning algorithm, thus training the system. This kind of error-driven self-learning, proceeding by negative feedback, again points to the influence of Ashby’s homeostatic systems on Rosenblatt.
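This division of labor can be loosely sketched in software. The following Python/NumPy sketch is a schematic of the Mark I’s organization, not a simulation of its electronics: the 400-unit retina and 512 A-units follow the manual, but the wiring scheme, thresholds, and learning rate are illustrative assumptions, and the motor-driven potentiometers are reduced to a single trainable weight vector for one R-unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, pseudo-random S->A wiring: each of the 512 A-units samples the
# 400-pixel retina. (The manual caps fan-in at 40 S-units per A-unit;
# this dense -1/0/+1 matrix is a simplification.)
S_TO_A = rng.choice([-1, 0, 1], size=(512, 400))
A_THRESHOLDS = rng.uniform(0.5, 2.0, size=512)  # illustrative values

def a_layer(stimulus):
    """All-or-nothing A-unit activations for a 400-pixel binary stimulus."""
    return (S_TO_A @ stimulus >= A_THRESHOLDS).astype(float)

def train_step(weights, stimulus, target, lr=0.1):
    """Error-driven update standing in for the motor-driven potentiometers:
    each active A-unit's weight is nudged toward the correct response."""
    a = a_layer(stimulus)
    response = 1.0 if weights @ a >= 0 else 0.0
    return weights + lr * (target - response) * a

weights = np.zeros(512)  # one R-unit's "potentiometer" settings
stimulus = rng.integers(0, 2, size=400).astype(float)
weights = train_step(weights, stimulus, target=1.0)
```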
In this way, the sum of the A-layer’s success values constitutes the perceptron’s memory. These connective lines are thus mechanical analogues of the brain’s synapses as posited in the theory of neurophysiological connectionism, in which memory consists of neural pathways and information retrieval relies on associations.
The R-units register a value of 0 or 1 depending on whether the voltage of their total input surpasses the given R-unit’s threshold value. Each R-unit has a switch and a relay; the electrical signal received by the switch must exceed the R-unit’s threshold or the relay will not close. There are 8 such R-units, each of which includes a response-state light that, when illuminated, indicates a value of 1 and, when not, 0. The response-state value is fed back to the A-units (unless the cycle is stopped manually), thus altering the voltage of the connective line between R-unit and A-units and the settings of the weighted potentiometers.
Many researchers in the field were enthusiastic about Rosenblatt’s Mark I, while others were dismissive. Rosenblatt’s work had been a somewhat high-profile endeavor in the scientific community ever since mainstream press outlets in the United States reported his stated goals in the original perceptron paper.[18] In Talking Nets: An Oral History of Neural Networks, published in 1999, Jack Cowan, a prominent neural network researcher of the period, recalls, “[Rosenblatt] made claims that you could tell a circle from a triangle with his early perceptron. It was all wrong; you couldn’t do things like that. […] by and large it was clear that the perceptron wasn’t doing the things that Frank claimed it could do.”[19] James Anderson, a cognitive science professor at Brown University, is much more sympathetic, stating:
The conclusion was that perceptrons indeed have some severe processing power limitations, but those limitations seem to correspond to the strengths and weaknesses shown by humans. Perhaps neural nets are not very good engineering devices, but they are great models for mental function.[20]
Academics propounding both points of view agree, however, on the profound significance that the arguments presented in Marvin Minsky and Seymour Papert’s 1969 book Perceptrons had for the AI discourse of the time. In the book, Minsky and Papert mount a critique of connectionism. For them, the best approach to AI research was not connectionism but their own, symbolic AI.
Mark I Perceptron at the Cornell Aeronautical Laboratory
Division of Rare and Manuscript Collections, Cornell University Library.
The XOR Function
Marvin Minsky and Seymour Papert’s 1969 book Perceptrons: An Introduction to Computational Geometry argued that the perceptron, and the entire connectionist approach to AI along with it, had severe limitations.[21] Minsky and Rosenblatt had attended high school together and had closely followed each other’s work since adolescence.[22] Minsky and Papert’s critique was specifically oriented toward discrediting Rosenblatt’s work due to what they perceived as its lack of mathematical rigor. The three related critiques their book mounted against the perceptron concern complex pattern recognition, the performance of the XOR function, and the limitations of a single-layer network.
First, Minsky and Papert argue that the perceptron had severe shortcomings when it came to registering plane contrast and object connectivity. This problem is represented by an illustration on the cover of the 1972 revised edition of their book: a purple, swirling geometric figure against a red background. Both the shape itself and the color contrast make it difficult to tell at first glance whether the figure is continuous or at any point broken; to decide if it is totally connected, a human eye would have to follow its curves to completion. The inability to tell whether the figure is connected is not only a problem for humans, though: it is also a problem for the perceptron. Minsky and Papert showed that Rosenblatt’s perceptron had difficulty registering object connectedness as well as subtle color and light contrasts.
Additionally, Perceptrons argues, correctly, that a single-layer perceptron is unable to classify non-linear patterns; its classificatory capacities are limited to patterns that are linearly separable. Linear separability refers to whether two sets of data points, A and B, can be divided by a single line or decision surface. This is most easily visualized on a two-dimensional plane: if a line can neatly divide the data into A and B regions, the pattern is linearly separable; if the data distribution is not so clearly delimited, the pattern is non-linear. Rosenblatt’s perceptron only demonstrated recognition of linearly separable patterns, severely limiting the types of stimuli it could classify.
A significant example of the perceptron’s reliance on linearly separable data is its inability to perform the Boolean XOR, or “exclusive-or,” function; this was the critique of the perceptron that carried the most weight in the artificial intelligence community.[23] The XOR function returns true when exactly one of its two inputs is true, and false when both or neither is: it returns true only if its inputs differ. Visualized on a two-dimensional plane, the data pattern of the XOR function is only separable if two decision surfaces are used. Minsky and Papert’s book convincingly demonstrated that a single-layer perceptron could not perform the XOR function, a major and much-discussed limitation.
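The difference between the two cases can be made concrete with a short sketch. The Python function below implements the standard textbook single-layer perceptron learning rule (not the Mark I’s exact procedure) and reports whether training ever reaches zero errors. On AND, which is linearly separable, it converges; on XOR, it never does.

```python
import numpy as np

def train_perceptron(samples, targets, epochs=100, lr=0.1):
    """Classic single-layer perceptron rule; returns weights, bias, and
    whether training reached zero errors (i.e. the data was separable)."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(samples, targets):
            y = 1.0 if w @ x + b >= 0 else 0.0
            w += lr * (t - y) * x
            b += lr * (t - y)
            errors += int(y != t)
        if errors == 0:
            return w, b, True
    return w, b, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(train_perceptron(X, [0, 0, 0, 1])[2])  # AND: True  (one line separates it)
print(train_perceptron(X, [0, 1, 1, 0])[2])  # XOR: False (no single line can)
```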
Importantly, both of these critiques stem from the fact that Rosenblatt’s perceptron had only one layer of trainable connections. A single-layer neural net cannot compute much, and more layers are needed to perform more complex operations, as the sketch below shows. However, Minsky and Papert’s book was often interpreted as implying that, even with multi-layered perceptrons, the problem of linear separability could not be solved. The book was thus read as discrediting connectionist research in its entirety.
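A hand-wired illustration (my own choice of weights and thresholds, in the style of McCulloch-Pitts units rather than any historical design) shows how a second layer dissolves the XOR problem by composing it out of two linearly separable sub-problems, OR and AND:

```python
def step(x, threshold):
    return 1 if x >= threshold else 0

def xor_two_layer(a, b):
    """XOR from two hidden threshold units: OR(a, b) AND NOT AND(a, b)."""
    h_or = step(a + b, 1)         # fires if at least one input is on
    h_and = step(a + b, 2)        # fires only if both inputs are on
    return step(h_or - h_and, 1)  # "or, but not and"

assert [xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```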
Modes of Articulation
Minsky and Papert’s book is often credited today with ushering in the “AI winter,” a marked absence of interest, funding, and research into neural networks that lasted throughout the 1970s and into the mid-1980s.[24] While most researchers in the field accept Minsky and Papert’s relatively uncontroversial assertion that a single-layer perceptron cannot perform the XOR function, the book was cited in its time as stating that even multi-layered perceptrons would be unable to overcome this obstacle, and throughout the 1970s it was consequently seen as having discredited neural networks. This interpretation, that the book demonstrated that even multi-layered perceptrons could not perform the XOR function, is, as I hope to show, nugatory: Perceptrons demonstrated no such thing. The book’s historical role in precipitating the AI winter is thus much more complex than the institutional narrative would have it.
In his essay “What Does A Proof Do if it Does Not Prove?,” Trevor Pinch analyzes communication problems in the sciences by centering upon the ways in which different scientific communities articulate a “disputed cognitive object.”[25] Disputed objects take the form of a specific contribution—a text, proof, or interpretation—which registers as an event and “can be analyzed along more than one dimension.”[26] Communication breaks down when, due to respective practical conditions, different communities adhere to different dimensions of the disputed object. The adherence to a particular dimension is referred to by Pinch as a “mode of articulation.”
Two such modes of articulation are outlined by Pinch: the research-area mode and the official-history mode. The research-area mode analyzes the disputed object as an issue of immediate concern and affords it a degree of interpretive flexibility. For researchers who adopt this mode, the disputed object is, to a certain degree, still open: particular aspects are considered tenuous, and hypotheses may be added and subtracted. For researchers who adopt the official-history mode, no such openness obtains: interpretive flexibility is absent because the disputed object is closed. Pinch argues that the official-history mode is articulated when the disputed object is not within the scientific community’s area of immediate practical concern; it is used discursively to provide scientists with a historical account of an object’s significance.
Mikel Olazaran’s paper “A Sociological Study of the Official History of the Perceptrons Controversy” analyzes Minsky and Papert’s Perceptrons as a disputed cognitive object in Pinch’s sense.[27] Olazaran traces the emergence of the two modes of articulation, outlining the ways in which the institutionalization of the official-history mode was tactically promoted by researchers in symbolic AI, who believed that resources were being diverted from their projects in order to fund connectionist research. The research-area mode agrees with some of the official-history mode’s fundamental assertions about the disputed object: that Minsky and Papert successfully demonstrated the perceptron’s limitations with regard to pattern recognition and performance of the XOR function. But the research-area mode quickly departs from the official-history mode when it comes to what the book actually said versus what it was interpreted to have said about multi-layered perceptrons. Additionally, the research-area mode does not grant the book’s publication the same influence over the funding of connectionist AI research.
The official-history mode of articulation presupposes that Minsky and Papert showed that neural network research should be abandoned due to its intrinsic mathematical limitations, and that funding for futile connectionist projects should therefore be rerouted to research in their opposing field, symbolic AI. Minsky and Papert conjectured that the pattern-recognition limitations of the single-layer perceptron might also be encountered, at the level of training, with multi-layer perceptrons. The official-history mode, however, views this conjecture as closed: Olazaran uses the term “impossibility proofs,” stating that Perceptrons was interpreted as a “‘knock down’ proof of the impossibility of perceptrons.”[28] The official-history mode commits two errors. The first is the view that Minsky and Papert’s book successfully demonstrated that Rosenblatt’s perceptron had insurmountable limitations and should therefore be abandoned. The second is that the publication of Perceptrons in 1969 was such an immediate bombshell that it killed neural network research entirely.
The official-history mode is visible in much recent critical thinking on technology. Matteo Pasquinelli’s much-cited 2017 essay “Machines That Morph Logic” blindly reiterates the official-history narrative, stating that the mere publication of Perceptrons “had a devastating impact” and that it “blocked funds to neural network research for decades.”[29] Michael Castelle similarly correlates the publication of the book with the death of neural networks: in “Deep Learning as an Epistemic Ensemble,” he states that interest in neural networks returned only slightly in the mid-1980s, and never to pre-Perceptrons levels, due to the advent of more complex machine learning systems.[30] Both thinkers are caught in a mode of articulation that allows for no interpretive flexibility. For them, the disputed object in question belongs to a closed narrative.
Minsky and Papert indeed “set out to kill the perceptron.”[31] According to the research-area mode, however, while Minsky and Papert pointed to difficulties that would be encountered in training multi-layered perceptrons, these difficulties had already been acknowledged by connectionist AI researchers and were not considered insoluble. The issue is afforded a degree of interpretive flexibility when addressed in the research-area mode of articulation. Many connectionist AI researchers discussed the learning difficulties of multi-layer perceptrons, most significantly Rosenblatt himself. In Principles of Neurodynamics, published eight years before Perceptrons, Rosenblatt outlines many of the weighting obstacles that would be encountered in a perceptron with multiple A-layers, obstacles also mentioned by Minsky and Papert.[32] Crucially, Rosenblatt does not present these issues as impassable, but merely as hurdles whose overcoming requires further research. Further, Bernard Widrow, another prominent connectionist AI researcher of the time, recalls that research into multi-layered perceptrons had already overcome many of the book’s stated limitations by the time of Perceptrons’ publication.[33]
For the research-area mode, the timing of the book’s effects on connectionist AI is also open to interpretive flexibility. Minsky and Papert’s arguments had been circulating in manuscript form well before the book was formally published in 1969. Jack Cowan recalls that even by 1962, research into neural networks had already been slowing down, and that this was only partially due to Minsky and Papert’s criticisms.[34] His recollection is consistent with Widrow’s statement that, by 1969, connectionist research had already advanced well beyond the model of the perceptron critiqued in Perceptrons; Minsky and Papert’s arguments were based on models from the early 1960s. Additionally, research on neural networks was in fact not entirely terminated during the so-called “AI winter”; it was simply done outside the field of AI.[35] This was more prevalent in Europe than in the United States, but in any case the claim that neural nets were completely abandoned following the book is an exaggeration embedded within the official-history mode of articulation. Thus, in the research-area mode, the book’s publication was a far less profound and dramatic event than in the official-history mode.
The Return of Connectionism
Though the contemporary discourse around deep learning often obscures the fundamentally connectionist nature of complex neural networks, the perceptron’s basic schema, in which neurons receive energy from a retinal field and trigger a response based on learned weights and biases, remains the template for machine learning systems. Today’s deep neural networks are essentially connectionist; these technologies remain within the bounds of Rosenblatt’s foundational architecture.
David Rumelhart and James McClelland published the first of their Parallel Distributed Processing (PDP) books in 1986, taking Rosenblatt’s connectionism as a direct influence.[36] Though their research remained relatively marginal at the time, it inspired a resurgence of interest in neural networks at a moment when, due to the institutionalization of the official-history mode, connectionism was seen as discredited in the AI field.[37] Rumelhart and McClelland continued Rosenblatt’s neurophysiologically grounded research, focusing on the neuron as the fundamental cognitive unit. They viewed the information-processing capacities of the human brain as stemming from its parallelism: its ability to perform multiple tasks simultaneously. They sought to update Rosenblatt’s model to take advantage of developments in computing that allowed for parallel processing. The PDP group grew to include researchers from other fields, such as Geoffrey Hinton and Terrence Sejnowski, and this expanded group introduced the Boltzmann machine and popularized the still widely used backpropagation algorithm.
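Backpropagation is what finally dissolved the XOR objection in practice: it provides a general recipe for training the hidden layers that the single-layer perceptron lacked. The Python/NumPy sketch below trains a tiny two-hidden-unit network on XOR; the architecture, learning rate, and iteration count are illustrative choices, and with an unlucky initialization so small a net can stall in a local minimum.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# A 2-2-1 network trained by backpropagation on XOR -- the very task a
# single-layer perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)

for _ in range(10000):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error through the hidden layer.
    dy = (y - T) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ dy
    b2 -= 0.5 * dy.sum(axis=0)
    W1 -= 0.5 * X.T @ dh
    b1 -= 0.5 * dh.sum(axis=0)

print(np.round(y.ravel(), 2))  # approaches [0, 1, 1, 0]
```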
In addition to acting as a technical precursor to PDP’s contributions, the perceptron can also be seen as a conceptual underpinning for the entire culture of contemporary machine learning. Adrian Mackenzie’s Machine Learners: Archaeology of a Data Practice uses the perceptron to diagram the working practices of machine learning programmers.[38] In Mackenzie’s text, the perceptron encapsulates the affordances provided by a dynamic working relationship with data. The relatively simple process of teaching a perceptron the Boolean NAND operation demonstrates the necessity for programmers to adapt to the grammars of action of datasets that make demands; the data modifies and reworks the operations being carried out upon it. As opposed to the static dataspace of the Linnaean table, programmers treat data as the operations of numerical functions or vectors. For this reason, most machine learning textbooks and how-tos introduce the perceptron early: programmers must familiarize themselves with the affordances of this relatively simple neural net, as those affordances are subsequently transposed onto more complex systems. In this way, the perceptron is a theoretical archetype of machine learning today.
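Mackenzie’s NAND exercise is easy to reproduce by reusing the train_perceptron function and inputs X from the XOR sketch above: NAND, unlike XOR, is linearly separable, so the single-layer rule converges on it.

```python
# Targets follow the NAND truth table: false only when both inputs are on.
w, b, converged = train_perceptron(X, [1, 1, 1, 0])
print(converged)                                    # True
print([1.0 if w @ x + b >= 0 else 0.0 for x in X])  # [1.0, 1.0, 1.0, 0.0]
```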
Convolutional neural nets (CNNs) are a good example of a contemporary image classification model whose architecture is inherited from the perceptron. CNNs have four fundamental features: a convolutional layer encompassing a series of filters, a rectified linear unit (ReLU) layer that acts as the neurons, a pooling layer for dimensionality reduction, and a fully connected (FC) layer that classifies patterns based on a small sample of labelled data.[39] Each filter of the convolutional layer is analogous to the perceptron’s retinal field: a filter is composed of a grid of S-unit-like cells that each focus on a particular area of the plane, and one filter looks for one pattern. The convolutional layer transmits excitation to the ReLU layer based on whether or not a filter’s pattern is found; this layer thus acts as the net’s A-units. Next, a pooling layer reduces the dimensionality of the data (the perceptron, in its simplicity, has no analogue for this layer). The convolutional, ReLU, and pooling layers may repeat in sequence many times before connecting to the final FC layer. The FC layer is similar to Rosenblatt’s R-layer: it learns to classify based on examples and error-driven feedback.
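To make the layer-by-layer correspondence concrete, here is a toy CNN in PyTorch; the library choice and all sizes are my own assumptions, with the 20 x 20 input and 8 output classes chosen to echo the Mark I’s retina and R-units rather than drawn from any cited model.

```python
import torch
from torch import nn

# A toy CNN over a 20x20 single-channel "retina," loosely echoing the
# Mark I's dimensions (20x20 S-grid, 8 response classes).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),  # convolutional layer: 8 pattern filters
    nn.ReLU(),                       # the ReLU "neurons"
    nn.MaxPool2d(2),                 # pooling layer: dimensionality reduction
    nn.Flatten(),
    nn.Linear(8 * 9 * 9, 8),         # fully connected layer -> 8 classes
)

logits = model(torch.randn(1, 1, 20, 20))
print(logits.shape)  # torch.Size([1, 8])
```

Each nn.Conv2d filter plays the role of a retinal field scanning for one pattern, while the nn.Linear head, like the R-layer, is where example-driven classification happens.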
The perceptron can thus be seen as the foundation upon which today’s machine learning systems are built. One significant transformation that has occurred as the perceptron has morphed into complex deep neural networks is the new systems’ changing relation to the indexical. For Rosenblatt’s Mark I Perceptron, the goal was to make an index between a shape and a node, establishing a direct correspondence between a real-world object and the perceptron’s A-units. If Rosenblatt’s team could make an indexical claim, they could demonstrate that their device had some capacity to interact with the world. Today, the stakes have changed. Michael Castelle points to the fact that much of the functional utility of twenty-first-century deep learning rests upon these systems’ ability to perform a transduction from index to symbol.[40] In the case of CNNs, the net takes an image of a real-world object from a database such as ImageNet and performs operations on it in order to generate a textual caption, a symbolic representation that describes the data’s content. This index-to-symbol transduction, a principal application of deep neural nets, could not have occurred had the perceptron not acted as the germ of these operations. The perceptron may be situated as the kernel generating today’s complex deep neural networks, and ubiquitous image classification models such as convolutional neural nets should accordingly be viewed as the result of the perceptron’s ontogeny.
NOTES
[1] F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain,” Psychological Review 65, no. 6 (1958): 386–408, https://doi.org/10.1037/h0042519.
[2] Ibid., 387.
[3] Warren S. McCulloch and Walter Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” The Bulletin of Mathematical Biophysics 5, no. 4 (December 1943): 115–33, https://doi.org/10.1007/BF02478259.
[4] Ibid., 129.
[5] John Von Neumann, The Computer and the Brain (New Haven, Connecticut: Yale University Press, 1958), 39–51.
[6] Ibid., 44.
[7] Frank Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Buffalo, New York: Cornell Aeronautical Laboratory, 1961), 25.
[8] Ronald R. Kline, The Cybernetics Moment: Or Why We Call Our Age the Information Age, New Studies in American Intellectual and Cultural History (Baltimore: Johns Hopkins University Press, 2015), 52.
[9] W. Ross Ashby, Design for a Brain: The Origin of Adaptive Behavior (New York: Springer, 1960), v.
[10] Rosenblatt, Principles of Neurodynamics, 9.
[11] Margaret A. Boden, Mind as Machine: A History of Cognitive Science, Volume 2 (Oxford: Oxford University Press; New York: Clarendon Press, 2006), 903.
[12] John C. Hay, Ben E. Lynch, and David R. Smith, “Mark I Perceptron Operators’ Manual” (Buffalo, New York: Cornell Aeronautical Laboratory, 1960).
[13] Ibid., 1.
[14] Ibid., 48.
[15] Christopher M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics (New York: Springer, 2006), 196.
[16] Boden, Mind as Machine, 905.
[17] Hay, Lynch, and Smith, “Mark I Perceptron Operators’ Manual,” 35.
[18] “New Navy Device Learns by Doing: Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser,” New York Times, 7 July 1958, 25.
[19] James A. Anderson and Edward Rosenfeld, eds., Talking Nets: An Oral History of Neural Networks (Boulder, Colorado: NetLibrary, Inc., 1999), 100.
[20] Ibid., 154–55.
[21] Marvin Minsky and Seymour A. Papert, Perceptrons: An Introduction to Computational Geometry (Cambridge, Massachusetts: The MIT Press, 1969).
[22] Daniel Crevier, AI: The Tumultuous History of the Search for Artificial Intelligence (New York, NY: Basic Books, 1993), 102.
[23] Stuart J. Russell, Peter Norvig, and Ernest Davis, Artificial Intelligence: A Modern Approach, 3rd ed., Prentice Hall Series in Artificial Intelligence (Upper Saddle River: Prentice Hall, 2010), 730–31.
[24] Russell, Norvig, and Davis, Artificial Intelligence, 22.
[25] Trevor J. Pinch, “What Does a Proof Do If It Does Not Prove?,” in The Social Production of Scientific Knowledge (New York: Springer, 1977), 171–215.
[26] Ibid., 173.
[27] Mikel Olazaran, “A Sociological Study of the Official History of the Perceptrons Controversy,” Social Studies of Science 26, no. 3 (August 1996): 611–59, https://doi.org/10.1177/030631296026003005.
[28] Ibid., 629.
[29] Matteo Pasquinelli, “Machines That Morph Logic: Neural Networks and the Distorted Automation of Intelligence as Statistical Inference,” Glass Bead, no. 1 (2017).
[30] Michael Castelle, “Deep Learning as an Epistemic Ensemble,” 15 September 2018, https://castelle.org/pages/deep-learning-as-an-epistemic-ensemble.html.
[31] Jeremy Bernstein, “Profiles: A.I.,” New Yorker, 14 December 1981, https://www.newyorker.com/magazine/1981/12/14/a-i.
[32] Rosenblatt, Principles of Neurodynamics, 577–81.
[33] Olazaran, “A Sociological Study of the Official History of the Perceptrons Controversy,” 634.
[34] Anderson and Rosenfeld, Talking Nets, 108.
[35] Olazaran, “A Sociological Study of the Official History of the Perceptrons Controversy,” 641–42.
[36] David E. Rumelhart and James L. McClelland and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (Cambridge, Massachusetts: The MIT Press, 1999).
[37] Boden, Mind as Machine, 943–48.
[38] Adrian Mackenzie, Machine Learners: Archaeology of a Data Practice (Cambridge, Massachusetts: The MIT Press, 2017), 21–50.
[39] For Yann Lecun’s original CNN paper, see Yann LeCun et al., “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE 86, no. 11 (1998): 2278–324.
[40] Michael Castelle, “Deep Learning as an Epistemic Ensemble.”
Bibliography
Anderson, James A., and Edward Rosenfeld, eds. Talking Nets: An Oral History of Neural Networks. Boulder, Colorado: NetLibrary, Inc., 1999.
Ashby, W. Ross. Design for a Brain: The Origin of Adaptive Behavior. New York: Springer, 1960.
Bishop, Christopher M. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer, 2006.
Boden, Margaret A. Mind as Machine: A History of Cognitive Science. Volume 2. Oxford; New York: Oxford University Press; Clarendon Press, 2006.
Castelle, Michael. “Deep Learning as an Epistemic Ensemble.” 15 September 2018. https://castelle.org/pages/deep-learning-as-an-epistemic-ensemble.html.
Crevier, Daniel. AI: The Tumultuous History of the Search for Artificial Intelligence. New York, NY: Basic Books, 1993.
Hay, John C., Ben E. Lynch, and David R. Smith. “Mark I Perceptron Operators’ Manual.” Buffalo, New York: Cornell Aeronautical Laboratory, 1960.
Kline, Ronald R. The Cybernetics Moment: Or Why We Call Our Age the Information Age. New Studies in American Intellectual and Cultural History. Baltimore: Johns Hopkins University Press, 2015.
LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86, no. 11 (1998): 2278–324.
Mackenzie, Adrian. Machine Learners: Archaeology of a Data Practice. Cambridge, Massachusetts: The MIT Press, 2017.
McCulloch, Warren S., and Walter Pitts. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5, no. 4 (December 1943): 115–33. https://doi.org/10.1007/BF02478259.
Minsky, Marvin, and Seymour A. Papert. Perceptrons: An Introduction to Computational Geometry. Cambridge, Massachusetts: The MIT Press, 1969.
Olazaran, Mikel. “A Sociological Study of the Official History of the Perceptrons Controversy.” Social Studies of Science 26, no. 3 (August 1996): 611–59. https://doi.org/10.1177/030631296026003005.
Pasquinelli, Matteo. “Machines That Morph Logic: Neural Networks and the Distorted Automation of Intelligence as Statistical Inference.” Glass Bead, no. 1 (2017).
Pinch, Trevor J. “What Does a Proof Do If It Does Not Prove?” In The Social Production of Scientific Knowledge, 171–215. New York: Springer, 1977.
Rosenblatt, F. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65, no. 6 (1958): 386–408. https://doi.org/10.1037/h0042519.
Rosenblatt, Frank. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Buffalo, New York: Cornell Aeronautical Laboratory, 1961.
Rumelhart, David E., and James L. McClelland and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations. Cambridge, Massachusetts: The MIT Press, 1999.
Russell, Stuart J., Peter Norvig, and Ernest Davis. Artificial Intelligence: A Modern Approach. 3rd ed. Prentice Hall Series in Artificial Intelligence. Upper Saddle River: Prentice Hall, 2010.
Von Neumann, John. The Computer and the Brain. New Haven, Connecticut: Yale University Press, 1958.