The Bittersweet, Forrest Gump-like World of AI, According to Gary Marcus
Gary Marcus is a singular character in the current AI melodrama: frequently cast as an antagonist, he is also an active protagonist for what he has long held to be the right way to build truly intelligent AI systems. And he is not one to shy away from the dramatic or click-inducing turn of phrase, his most recent being “Sam Altman’s pants are totally on fire,” the headline of an article detailing the increasingly well-documented habit of the OpenAI founder of being persistently “economical with the truth.” He is also known for pointed, often personal takedowns intended to right what he perceives as the deceptions, misattributions and general malfeasance of notable others in the AI field, including Yann LeCun. It can be difficult to know whether the real Gary Marcus is the progressive cognitive scientist with an active interest in how machines should emulate human learning, or the headline-grabbing anti-AI establishmentarian, digitally proclaiming that the current AI “emperor” has no clothes. A conversation with him can sometimes feel like being an unwitting participant in the popular 1960s game show To Tell the Truth, in which, at the end of each episode, the host asks the real individual to distinguish themselves from their impostors by standing up.
There is no denying that Marcus courts controversy, but it is arguably because of the manner in which he says things rather than the value of what he says. Marcus’ backstory provides some insight into the essence of the man. Gary Marcus was a child prodigy in the Johns Hopkins Center for Talented Youth who opted not to finish high school and headed to college at the age of 14, having already firmly established his interest in computing and AI. As he concisely tells it, “I first learned to code when I was 10, long before it was popular. I learned on a paper computer, which is basically a simulation of a computer…and that immediately got me interested in artificial intelligence [which] got me interested in cognitive psychology, because I realized all the AI was kind of fake and not really working. So, I got interested in how humans learn language and how they understand the world. That’s what my PhD was about.” His PhD supervisor was the world-renowned cognitive psychologist and linguist Steven Pinker, one of the foremost pioneers of the theory that language is an innate faculty of the human mind that originated “to solve the specific problem of communication among social hunter-gatherers,” i.e., that it is a specialized adaptive and instinctive behavior, in the same way as a spider’s web-weaving or a beaver’s dam-building. Marcus’ pugnacious attitude was also apparent early on when, during his graduate work with Pinker, they received 35 single-spaced pages of inchoate comments from a reviewer of an academic paper they had submitted for publication. After some reflection, Marcus wrote a combative 25-page rebuttal, and the die was essentially cast for the rest of his career, the majority of which he has spent as a professor of psychology and neural science at New York University.
It is clear that Marcus was heavily influenced by both Pinker and Noam Chomsky, the two bastions of the “nativist” view of language, whose premise is that a unique substrate is required for facile language acquisition, in contrast to the connectionist view that language is learned entirely by experience, using a general cognitive substrate rather than a specialized one. But his interest always extended beyond the human learning paradigm to the machine or artificial intelligence realm, effectively “hybridizing” across disciplines: “[If] I’ve had an interesting voice in artificial intelligence, it’s really from my training in cognitive science and especially in developmental psychology, developmental neuroscience and so forth. So, I always think of myself as like [Joseph] Conrad writing in English. It’s not his native language, but he’s got a different take than other people, and that makes it interesting. And so I have a different take on AI than most of the people in the field. I’m not completely unique, but I’m one of the most public about trying to take cognitive science and reflect on AI and where we are about those questions.” He also credits this hybrid interest as the reason behind his undergraduate and graduate paths, “I chose to go to Hampshire College because they had a program in cognitive science which was interdisciplinary, then I went to an interdisciplinary program at MIT in brain and cognitive science.”
Marcus summarizes his driving ambition by reference to an expression most frequently associated with Malcolm X, “by any means necessary,” saying, “I want to understand minds and machines by any means necessary…that means philosophy, psychology, linguistics, computer science and so forth.” Indeed, it is reasonable to conclude that Marcus approaches the fight against the current “connectionist-centric” AI, epitomized by LLMs, with as much fervor as the civil rights leader, viewing this as a battle for equitable and logical investment in our human-augmented future.

But Marcus’s position is more complicated—both more nuanced and more conflicted—than his media output would suggest. During our conversation, he starts by arguing that “LLMs are amazing, because you can throw any problem at them, and they get at least some of the problems in that domain right. That’s kind of astonishing.” But he goes on to say that “because they’re shallow, you can never really count on them. The core problem of LLMs is they don’t represent world models.” As a result, LLMs exhibit what he describes as “broad, shallow intelligence, in the sense that they can kind of talk about anything, but whatever they do is superficial, and that’s different from old fashioned narrow [artificial] intelligence, where you’d have a chess computer that would do nothing else.”
Surprisingly, given this dichotomy, he claims complete prescience regarding the current success of LLMs, saying, “I’m not surprised by any of it, because I read a paper by Chomsky and Miller in 1965 and what they showed is by having successively longer statistical abstractions, you could mimic the properties of language without any semantics at all. And they argued, this is not what you want. And they were right. The argument that they made in 1965 still stands, and a lot of what I’ve done is really to extend their argument into large language models. So many people are surprised, even astonished, that a large language model can basically replicate human text, but I’m not, because I understand what it means to have a successively deeper approximation without really having a deeper representation.”
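The point Marcus attributes to Chomsky and Miller, that progressively longer statistical windows can mimic the surface of language without any semantics, can be illustrated with a toy Markov-chain sketch. This is a minimal, invented example (the miniature corpus and function names are mine, not from any cited work): an n-gram model counts which token follows each context and samples accordingly, producing locally fluent strings with no representation of meaning at all.

```python
import random
from collections import defaultdict, Counter

def build_ngram_model(tokens, n):
    """Count, for each (n-1)-token context, which tokens follow it."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def generate(model, n, length, seed=0):
    """Sample text by repeatedly drawing a continuation for the latest context."""
    rng = random.Random(seed)
    out = list(rng.choice(list(model.keys())))  # random starting context
    for _ in range(length):
        counter = model.get(tuple(out[-(n - 1):]))
        if not counter:
            break  # context never seen in training data
        tokens, weights = zip(*counter.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return " ".join(out)

# A deliberately tiny, invented corpus: the output will look locally
# grammatical, yet the model represents nothing about cats or dogs.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog and the dog saw the cat").split()
model = build_ngram_model(corpus, 3)
print(generate(model, 3, 10))
```

Raising n deepens the approximation and improves surface fluency, which is exactly the sense in which Marcus says a deeper approximation need not imply a deeper representation.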
Returning to the deficiencies of LLM-based approaches, he posits that “it’s the correlational nature of LLMs that ultimately cripples them,” drawing on what is known about human learning. “We learn correlations with respect to theories,” he says. In other words, we possess a learning substrate – the neocortex – that linguistic nativists argue embeds an understanding that the world consists of enduring objects that travel on connected paths in space and over time, contains a sense of geometry and quantity, and includes the underpinnings of an intuitive human psychology, based in part on the pioneering research of the cognitive psychologist Elizabeth Spelke.
Daniel Kahneman, in his book Thinking, Fast and Slow, observed that “more intelligent people have richer representations of the world,” an observation with which Marcus agrees. “I think Danny is right that richer representations are often extremely helpful and intelligent people often go for richer representations. If you have more superficial representations, you’re not as smart. You could argue that LLMs have very superficial representations, and that limits them,” he says.
In short, the argument goes that we humans use “causal correlations” to understand the world and turn these correlations into concepts that are embedded in theories that we develop as we engage in sensorimotor learning to create rich representations of reality.
As outlined above, central to this is the debate over how language is actually understood and processed in humans. One school of thought in cognitive linguistics, pioneered by Noam Chomsky, holds that the brain has innate language structures that allow language to be generated and spoken or written using a combination of “competence” and “performance” functions. The opposing school holds that the brain has general-purpose cognitive capabilities that produce language through observation of the world. The empirical evidence currently points to language being a mixture of the two, with language centers such as Wernicke’s Area used for speech comprehension and Broca’s Area used for speech production, interconnected with other, less specialized cognitive processing regions.
Marcus refers to the book he wrote in 2004 called The Birth of the Mind, which he describes as “my own effort to understand kind of the relation between biology and environment and how genes help to construct the brain.” He found that “genes are not a blueprint for the brain in a literal sense, like this goes here, this goes there. They’re really more like a guide to how to grow something, and that guide actually takes environmental inputs. It’s this very elaborate and flexible system, but innateness ultimately means that there’s some kind of way of building something, even in the absence of experience. And then innateness actually guides how we learn things,” including language and, by extension, also how we think.
The absence of these not-well-understood innate capabilities leaves LLMs, by default, to rely only on statistical correlation, acting on the massive corpus of written human knowledge across different correlation length scales (via the multiple attention heads of the Transformer architecture). But, as Marcus wittily puts it, recalling a line from the movie Forrest Gump, “LLMs are like a box of chocolates, you never know what you’re going to get.”
This uncertainty arises because, with every query, the user is exploring the LLM’s complex, non-intuitive latent space in an undefined way, generating a probabilistic answer, based on knowledge derived from the training set that is also undefined for the user. But, in part, the issue is also due to the fact that, as Marcus puts it, “Language is a compressed version of reality and that is its utility, but that is also its weakness,” mirroring similar observations made by David Eagleman in this series.
There is also an evolutionary vulnerability that is being unintentionally exploited, explains Marcus: “LLMs are basically impostors mimicking us. We have very specific tools for looking for what we call conspecifics, other members of the species…so we might be pretty sensitive to, for example, how hairy somebody is, to decide whether they’re human or not…and language is a strong positive diagnostic cue [for Homo sapiens] and we just don’t have that machinery for impostors that are trained on the entire internet—that wasn’t a thing in the period where we evolved.”
As a result, we are unable to efficiently detect that these extremely performant language models “create bullshit in the sense that they have no understanding of what the meaning of something is, but they can mimic the general thing…they fake their way through everything,” argues Marcus, in agreement with Rodney Brooks’ observation that LLMs are “bullshitters,” based on philosopher Harry Frankfurt’s canonical definition.
Crossing the Chasm
The conversation then turned to what should be done to advance the field in a way that has more long-term value. In his 2019 book, Rebooting AI: Building Artificial Intelligence We Can Trust, Marcus identifies three gaps that comprise what he calls “the AI Chasm,” which must be crossed in order to minimize the risks of AI systems and maximize their utility and reliability:
- The Gullibility Gap: Our tendency to anthropomorphize machines and project human abilities onto them
- The Illusory Progress Gap: Our tendency to see progress in one narrow area (an easy case) as implying progress in solving the general (hard) case
- The Robustness Gap: Our tendency to believe that because something can be shown to be reliable in some cases, the same performance will apply across all cases
These are very reminiscent of Rodney Brooks’ observations about “magical thinking” and anthropomorphic extrapolation of capabilities.
The question naturally arises as to a good way to test AI systems to expose these gaps. The Turing test was widely held to be the definitive test until recently, but as Marcus says, “The test turns out to be a test of human gullibility, rather than a test of intelligence, which is to say an intelligent system might pass the test, but you don’t have to be intelligent to pass it, you just need to fool people.”
Moreover, the increasing ability of LLMs to pass a variety of standard test benchmarks is not a reliable indicator, due to the “training to the test” that occurs, a phenomenon described by Goodhart’s Law – that “once a measure becomes a target, it ceases to be a good measure.”
Marcus points out that one of the fundamental issues with the robustness of current systems is that they cannot survive what he calls “the distribution shift” – being able to correctly and reliably respond to a case that is outside the training data, which is compounded by the fact that there is no clarity on the training data set or methodology that was used, “So you don’t know what’s out of distribution for the system.”
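The distribution-shift problem can be made concrete with a toy sketch using invented data (nothing here reflects any real system): a model that merely memorizes its training range answers flawlessly in distribution and fails badly outside it, and nothing in its answers tells the user which regime a query falls into.

```python
# Hypothetical training data: the model only ever sees y = x*x
# for x in [0, 10]. The "model" is a nearest-neighbor lookup,
# a stand-in for any purely interpolative learner.
train_x = list(range(11))
train_y = [x * x for x in train_x]

def nn_predict(q):
    """Predict by copying the label of the nearest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - q))
    return train_y[i]

print(nn_predict(7))    # in distribution: returns 49, the correct answer
print(nn_predict(100))  # out of distribution: returns 100, but x*x is 10000
```

The query at x = 100 fails silently, and, as Marcus notes, when the training set is undisclosed the user has no way of knowing that x = 100 was out of distribution in the first place.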
Indeed, a desirable test for intelligence, one that would allow us to counter the facile seduction of language-based models, could be whether a system responds correctly to an outlier. “That would be a good test. It would be a brilliant test, and it might be the best test we have,” says Marcus. But knowing whether a case is indeed an outlier is difficult or impossible, because the absence of published training data means we cannot know what lies outside it. A recent example was what Marcus characteristically dubbed the “Erdosgate” scandal, in which an OpenAI developer erroneously claimed that GPT-5 had solved one of the famous unsolved Erdős conjectures, when in fact the proposed solution could be found in unreviewed work published in the hinterlands of the internet.
Ultimately, any test of human-like intelligence must arguably be that any such system can reliably and predictably replicate the essential set of human capabilities, which Marcus sees as the ability to:
- Understand the world
- Understand language
- Reason and plan
- Adapt to new circumstances (by executing a new plan)
- Learn efficiently using abstraction and generalization
He argues that although an intelligent system could have a different set of attributes, the fundaments will likely be the same: “You could imagine a very smart AI system working differently from people, but I can’t imagine a very smart AI system not understanding causality. I can’t imagine a very smart AI system not knowing that there are objects that exist over time and understand object permanence. So, there are some basics that I think any intelligent system would probably have; if we met space-faring aliens, I would expect them to have causality, to have object permanence, to track records of individuals over time. There’s some things that I would expect any species [or system] of some moderate amount of intelligence to have.”
He also returns to his cognitive science roots in his belief that “you need a framework on which to learn. I think that learning is not a homogeneous thing. Like intelligence…learning is not one thing, but many. So we use different kinds of techniques to learn different things. Our brains, unlike some other brains, are built with many different techniques for learning. Some of those are very data-driven and require a lot of data…and some of them you just need one data point.”
Marcus also agrees that the Kahneman-Klein requirement that we have highlighted previously in this series – that expertise is developed through prolonged practice and observation of regularities – is a strong foundational principle. “I think it is valuable to learn from prolonged practice. [But] sometimes we learn things from very small amounts of data.” He uses a typical Gary Marcus example to illustrate the point: “Let’s suppose that the generative AI bubble collapsed. You would want to immediately take action.” While there is truth to this observation, one could equally argue that the action we would take in such circumstances would be based on similarity to a pre-existing theory or set of experiences born of prolonged exposure to a prior related circumstance.
Clearly, the above neo-nativist arguments are inconsistent with the popular conjecture that scaling current LLM technology will suffice to deliver human-like intelligent solutions. This anti-scaling position has been Marcus’ cause célèbre over the years. He has been a vociferous opponent of the position outlined in computer scientist Richard Sutton’s famous 2019 essay, The Bitter Lesson, in which the bitter lesson of the title is the observation that, in artificial intelligence, approaches that scale with computational power tend to outperform those based on domain-specific understanding, because they take better advantage of Moore’s law. Although this lesson has been widely evangelized on the strength of the recent success of LLMs, which adopt precisely this approach, Marcus pithily notes that “the bitter lesson about the bitter lesson is that it only works for some problems,” typically those that employ brute-force search or learning to discover apparent similarities.
In simple terms, scaling approaches apply best to problems for which there is no known formal computational model that provides a sufficiently complete description of the space, so simple pattern-matching, for example of words, or pixels, combined with a massive amount of compute, is the preferred option. Yet, the entirety of the current LLM-centric industry is betting that scaling will prevail in the quest for human-like intelligent systems, despite the logic – and the arguments outlined above and throughout this series – that this cannot rationally be the case.
To continue the “bitter” theme, and building on the Kahneman System 1 vs System 2 framework that has been a foundation of this series, one could conjecture that the bittersweet lesson will be that scaling delivers System 1-type heuristics, but that System 2 understanding will require different methodologies, based on rich representations of reality.
The Argument for a Neuro-symbolic Future
Marcus believes that the future of AI must be based on a hybrid approach, a position he has held for more than two decades. “In my 2001 book, The Algebraic Mind, I argued that you need to have both connectionism, which is like Kahneman’s System 1, and symbolic systems—algebraic systems—for System 2, and the field resisted this.” He observes that deep learning is great at learning but fails at the construction of cognitive models and at the principle of “compositionality,” which linguists argue is central to human language creation, i.e., that meaning is derived from a set of rules applied to the constituent parts of language in the appropriate context. Conversely, Marcus points out, classical AI incorporates compositionality and the construction of cognitive models but is poor at learning outside these confines and at scale.
Simply put, connectionist approaches such as LLMs try to do everything the same way by learning from a blank slate, but Marcus observes that “if artificial intelligence is to be anything like natural intelligence we will need to learn how to build structured hybrid systems that incorporate innate knowledge and abilities that represent knowledge compositionally and keep track of enduring entities/individuals,” with support for abstract reasoning based on symbolic representations, i.e. neuro-symbolic AI.
He points out that formal symbolic logic alone is, however, also insufficient as it can only describe things that are certain, whereas the mathematician Bertrand Russell famously opined that “all human knowledge is uncertain, inexact and partial.” So, humans have derived our “common sense” by sensorimotor experience of the world that we use to build rich representations or world models, most likely by mapping these experiences to theories that are embedded in our neocortex by nature and then modified by nurture.
As a result, Marcus sees a clear need for a change in research priorities, “We need world models as a centerpiece rather than an afterthought,” and highlights the success of models developed by DeepMind for protein folding (AlphaFold) and materials prediction (GNoME) that incorporate scientific knowledge about structure and interactions as a fundamental component of their construction and operation. Similarly, Yann LeCun’s work on JEPA models, Fei-Fei Li’s startup World Labs and Jeff Hawkins’ company Numenta are also leading examples of efforts in this direction.
However, the difficulty of discovering applicable and robust world models should not be underestimated. As Marcus puts it, “We actually know how to do this [build models] in closed domains where there’s a fixed set of rules. But how do you figure out how the world actually works? For example, in politics, the rules are not that clear, but we figure some stuff out about it. Or transportation systems—how do you figure out how to get around Boston? Or New York? How do you build internal models of how things work, how they connect, what people do?” He conjectures that this thorniest set of problems must be solved before we can approach anything resembling human-like intelligence.
So, What About AGI?
Given the preceding observations, Marcus is understandably skeptical about where we are currently with respect to Artificial General Intelligence or AGI, “We’re climbing a mountain range to get to AGI and we’re up some peak, but we’re going to have to go back down to the valley and climb another peak to actually get there. We have an illusion that because we’re high up in the clouds that we must be close. But there’s a famous saying that I’ve riffed on, which is, ‘building a better ladder doesn’t necessarily get you to the moon’. We have great ladders now that work for some things and not others. We’re not actually that close to an artificial general intelligence that we could trust.”
He also likes to cite aeronautics professor David Akin’s Law 31 of spacecraft design – that “you can’t get to the moon by climbing successively taller trees” – and believes that we should view current AI systems as “nerds and idiot savants focused so tightly on what they do that they are unaware of the larger picture,” and therefore incapable of battling us for supremacy. Referring to the doom scenario outlined in the recent book, If Anyone Builds It, Everyone Dies, he says, “If you look at it in detail, it’s just not plausible. It doesn’t recognize the resilience of humans. It doesn’t recognize the genetic diversity, the geographical diversity… we’re resourceful and have a kind of immune system for the planet.”
In Rebooting AI, Marcus quotes Steven Pinker’s eloquent summary of the AI doomsday scenario, “[This] makes as much sense as the worry that since jet planes have surpassed the flying ability of eagles, someday they will swoop out of the sky and seize our cattle. The…fallacy is a confusion of intelligence with motivation—of beliefs with desires, inferences with goals, thinking with wanting. Even if we did invent superhumanly intelligent robots, why would they want to enslave their masters or take over the world? Intelligence is the ability to deploy novel means to attain a goal. But the goals are extraneous to the intelligence: Being smart is not the same as wanting something.”
The argument is complementary to that outlined by Yann LeCun – that the power of intelligence is overestimated relative to physical or psychological power. Marcus further argues that a super-intelligent human-like system would, by definition, have to understand and abide by human values and goals and compute the consequences of its actions, which is very far from the reality of today’s systems. The more pressing risk, he believes, comes from AI systems that are “idiot savants with power.” In an attempt to define how we should more formally assess a system’s proximity to AGI, he recently contributed to a paper entitled A Definition of AGI, which uses the Cattell-Horn-Carroll theory of cognition as a basis for quantifying progress towards AGI.
When all is said and done, Marcus is surprisingly utopian (or is it dystopian?) about humanity’s destiny, predicting that “eventually AI will replace every job. We will have a different kind of life that will be of leisure and finding meaning through art and things like that…it’ll really change how we live our lives, work, at least paid work. Probably at some point work will not be the centerpiece of most people’s existence. That may not happen soon, but I think it’s inevitable that will happen someday.”
This dichotomy between frustration with the current progress in AI and excitement about its potential would seem to summarize Gary Marcus in a nutshell: on one hand, the self-promoting truth teller fond of provocative takedowns; on the other, the former child prodigy and AI true believer. There is no doubt that, as colorful a character as he is, the AI debate is richer for his contributions and proclamations. They provide a much-needed balance to the increasingly breathless, hyperbolic claims and magical thinking of many of the field’s primary actors, whose veracity and motives merit far greater scrutiny. At the same time, Marcus’ propensity for self-defense in the form of vociferous, personal criticism aimed at those he perceives as having slighted him (or others in his camp) can detract from those substantive contributions, which offer a welcome counterpunch to the hype and cult of personality that too often define the AI space today.