While the quiet revolution was transforming AI's foundations, a series of high-profile showdowns between humans and machines captured the public imagination. These spectacles — a chess match, a quiz show, a Go tournament — became the defining moments of AI for a generation of observers. They were also more complicated than the headlines suggested.

The Match of the Century

On May 11, 1997, world chess champion Garry Kasparov sat across from a tall black cabinet in a conference room at the Equitable Center in New York City. Inside the cabinet was Deep Blue, an IBM supercomputer built specifically to play chess. This was their rematch. The year before, Kasparov had beaten an earlier version of Deep Blue 4-2. This time would be different.

Deep Blue was a brute-force machine. It evaluated 200 million chess positions per second using custom hardware chips designed specifically for chess calculation. Its approach was fundamentally different from how humans play. Kasparov relied on intuition, pattern recognition, and strategic understanding built from decades of study. Deep Blue relied on raw computational power, examining millions of possible futures and selecting the move that led to the best outcomes.
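The search idea behind Deep Blue ran in custom silicon, but its logical skeleton is the classic minimax algorithm with alpha-beta pruning. The sketch below is a toy illustration of that general technique, not IBM's implementation; the game interface it assumes (`moves`, `apply`, `evaluate`) is hypothetical:

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Minimax with alpha-beta pruning over a hypothetical game interface.

    `game.moves(state)` lists legal moves, `game.apply(state, move)`
    returns the resulting position, and `game.evaluate(state)` is a
    static score from the maximizing player's point of view.
    """
    if depth == 0 or not game.moves(state):
        return game.evaluate(state)  # leaf: score the position statically
    if maximizing:
        best = float("-inf")
        for move in game.moves(state):
            best = max(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, False, game))
            alpha = max(alpha, best)
            if alpha >= beta:        # opponent would never allow this line
                break                # so prune the remaining moves
        return best
    else:
        best = float("inf")
        for move in game.moves(state):
            best = min(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, True, game))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

Deep Blue's edge came not from a cleverer algorithm than this but from executing the equivalent search on dedicated chess chips, hundreds of millions of positions per second, with a hand-tuned evaluation function.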

The rematch lasted six games. Kasparov won Game 1, and it seemed like the human champion would prevail again. But Deep Blue won Game 2 — and Kasparov was visibly shaken. In a critical moment of the game, Deep Blue had made a move that seemed deeply strategic, almost human in its subtlety. Kasparov suspected IBM's team had intervened — that a human was guiding the machine's play.

In fact, a bug did play a role, though in Game 1 rather than Game 2. Near the end of that first game, Deep Blue's evaluation function had encountered a position it could not assess, and rather than crashing, it had played a move essentially at random. Kasparov, unable to explain the move, concluded that the machine could calculate far deeper than he had believed. That misreading, compounded by his suspicions about Game 2, rattled him for the rest of the match.

Games 3, 4, and 5 were draws. In Game 6, a psychologically damaged Kasparov made an elementary blunder and resigned after just nineteen moves. Deep Blue had won the match 3.5 to 2.5.

The media treated it as a watershed moment: machine beats man at the game that had long been considered the ultimate test of human intellect. But the AI research community had mixed reactions. Deep Blue was not really "intelligent" in any meaningful sense. It did not understand chess. It did not learn from its games. It could not play any other game, let alone carry on a conversation or navigate a room. It was a triumph of specialized engineering — impressive, but not what most researchers would call artificial intelligence.

Kasparov himself put it well in retrospect: "Deep Blue was intelligent the way your programmable alarm clock is intelligent. Not that losing to a $10 million alarm clock made me feel any better."

The Paradox of Game-Playing AI

Deep Blue crystallized a pattern that would repeat throughout AI history. Playing chess had long been considered a hallmark of intelligence. The assumption was that a machine that could play chess at the grandmaster level must be intelligent in some general sense.

This turned out to be completely wrong. Chess, despite its complexity, is a well-defined problem with clear rules, perfect information, and a finite (if enormous) search space. These are exactly the conditions under which computers excel. The fact that a computer could play chess said nothing about its ability to perform tasks that are easy for humans but hard for computers — like recognizing a face, understanding a joke, or navigating a crowded sidewalk.

This became known as Moravec's paradox, after the roboticist Hans Moravec, who observed in the 1980s: "It is comparatively easy to make computers exhibit adult-level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility."

The tasks that feel effortless to us — seeing, moving, understanding language, recognizing emotions — are actually the hardest for computers, because they require the kind of intuitive, learned knowledge that evolution has spent millions of years building into our brains. The tasks that feel difficult to us — formal logic, chess, mathematics — are often easy for computers, because they have clear rules and can be solved by exhaustive search.

Deep Blue's victory did not mean AI was close to human-level intelligence. It meant that one very specific kind of problem had been solved. Many harder problems remained untouched.

Watson and Jeopardy!

Fourteen years after Deep Blue's triumph, IBM staged another human-vs-machine spectacle. In February 2011, an AI system named Watson competed on the quiz show Jeopardy! against the show's two greatest champions: Ken Jennings and Brad Rutter.

Watson was a fundamentally different kind of system from Deep Blue. Where Deep Blue solved a well-defined mathematical problem through brute-force search, Watson had to understand natural language questions (presented in Jeopardy!'s famously tricky, pun-laden format), search through millions of documents, and produce answers with enough confidence to risk wagering money.

Watson's architecture combined hundreds of different algorithms — statistical parsers, knowledge bases, information retrieval systems, and machine learning models. When given a clue, Watson would generate hundreds of candidate answers using different methods, then use machine learning to evaluate which candidate was most likely correct. If its confidence exceeded a threshold, it would buzz in.
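The shape of that pipeline — pool candidates from many independent generators, score each one with a learned model, and only answer when confidence clears a threshold — can be caricatured in a few lines. This is a hedged sketch of the general pattern, not Watson's DeepQA code; the generator and scoring functions are invented stand-ins:

```python
def answer_clue(clue, generators, score, buzz_threshold=0.5):
    """Toy question-answering pipeline in the Watson style.

    `generators` is a list of functions, each proposing candidate
    answers for the clue; `score(clue, candidate)` stands in for a
    trained confidence model. Returns the top answer only if its
    confidence clears the buzz threshold, else None (don't buzz in).
    """
    candidates = set()
    for generate in generators:          # pool answers from every method
        candidates.update(generate(clue))
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(clue, c))
    confidence = score(clue, best)
    return best if confidence >= buzz_threshold else None
```

The threshold is where the wagering strategy lived: set it too low and the system buzzes in on guesses, too high and it cedes easy money to the human players.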

The system was trained on encyclopedias, dictionaries, news articles, literary works, and other text — about 200 million pages of content stored in its local memory. It did not have access to the internet during the competition.

Watson won decisively, earning $77,147 to Jennings' $24,000 and Rutter's $21,600. Jennings famously wrote on his Final Jeopardy answer: "I for one welcome our new computer overlords."

Watson's victory was more impressive than Deep Blue's in several ways. It required processing messy, ambiguous natural language rather than the clean formalism of chess. It required broad knowledge across thousands of topics. And it required making decisions under uncertainty — Watson had to judge how confident it was before buzzing in.

But Watson also revealed the limits of its approach. It made bizarre errors that no human would make. In one memorable moment, the Final Jeopardy! category was "U.S. Cities," and the clue described a city whose two largest airports are named for a World War II hero and a World War II battle. Watson answered "Toronto" — a Canadian city. (The correct response was Chicago, home to O'Hare and Midway.) The system had no common sense about geography. It could match statistical patterns but had no understanding of what cities, countries, or airports actually are.

Watson's Afterlife

IBM, riding the publicity wave, attempted to commercialize Watson as a general-purpose AI platform. The company invested billions of dollars in Watson Health, Watson for business, and other applications. The marketing was aggressive — IBM positioned Watson as a transformative AI that could revolutionize healthcare, law, finance, and virtually every other industry.

The results were deeply disappointing. Watson Health, which was supposed to revolutionize cancer treatment, struggled to deliver on its promises. Hospitals that partnered with IBM found that the system required enormous amounts of data preparation, produced recommendations that doctors often disagreed with, and could not handle the messy reality of clinical medicine. MD Anderson Cancer Center ended its Watson partnership after spending $62 million with little to show for it.

The problem was not that Watson's technology was bad. It was that the gap between winning a quiz show and solving real-world problems was enormous. Jeopardy! questions have clear, unambiguous answers. Medical diagnoses do not. Jeopardy! clues come in a standardized format. Clinical data comes in dozens of incompatible formats across thousands of different systems. IBM had taken a spectacular demonstration and tried to generalize it far beyond what the technology could support.

Watson's commercial struggles echoed the expert systems collapse of the 1980s. Once again, impressive demonstrations in controlled settings failed to translate into robust real-world systems. The lesson — that the gap between demo and deployment is always larger than it looks — proved stubbornly difficult for the AI industry to learn.

Meanwhile, in the Real World

While Deep Blue and Watson grabbed headlines, AI was making far less publicized but more consequential advances in everyday applications.

Web search was being transformed by machine learning. Google's ranking algorithms incorporated hundreds of learned signals to determine which pages to show for any given query. By the late 2000s, search engines were handling billions of queries per day using sophisticated AI systems that most users never thought of as AI.

Recommendation systems were becoming ubiquitous. Netflix's recommendation engine, powered by collaborative filtering and matrix factorization, influenced what millions of people watched. Amazon's recommendation system drove a significant fraction of the company's revenue. Spotify used machine learning to create personalized playlists. These systems were not general intelligence, but they were genuine AI applications working at enormous scale.
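Matrix factorization of the kind the Netflix system popularized can be sketched with plain stochastic gradient descent: learn a short vector of latent factors for each user and each item so that their dot product approximates each observed rating. A minimal illustration under those assumptions, not any production recommender:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, epochs=3000, seed=0):
    """Learn user/item factor vectors so dot(U[u], V[i]) approximates
    each observed rating. `ratings` is a list of (user, item, rating)
    triples; unobserved cells are simply never trained on."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):           # gradient step on both factors
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * err * vf
                V[i][f] += lr * err * uf
    return U, V

def predict(U, V, u, i):
    """Predicted rating: dot product of the learned factor vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))
```

The point of the factorization is the empty cells: once the factors are fit to the ratings a user did give, the same dot product yields predictions for every movie they have not yet rated.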

Computer vision was improving rapidly. Face detection became standard in digital cameras. Optical character recognition (OCR) became accurate enough for practical document processing. Medical imaging systems began to assist radiologists in detecting tumors and other abnormalities.

Autonomous vehicles moved from science fiction to active research programs. The DARPA Grand Challenges of 2004 and 2005 spurred dramatic progress in self-driving technology. In the 2004 challenge, no vehicle completed the course. In 2005, five vehicles finished. By 2009, Google had launched its self-driving car project, and the race to build autonomous vehicles was underway.

Personal assistants began to appear. Apple's Siri, launched in 2011, brought voice-activated AI to millions of smartphones. It was limited — often frustratingly so — but it demonstrated that AI could be a consumer product, not just an enterprise tool.

The Stage Is Set

By 2012, AI was at an inflection point. The quiet revolution in machine learning had produced robust practical systems. Computing power had increased by orders of magnitude. The internet had generated unprecedented volumes of data. Neural network researchers had developed new techniques for training deeper networks.

All that was needed was a catalyst — a dramatic demonstration that the combination of data, compute, and deep neural networks could achieve something genuinely extraordinary. That catalyst arrived in October 2012, at an image recognition competition held in conjunction with a computer vision conference in Florence, Italy. A team of researchers from the University of Toronto, led by a professor who had spent decades in the wilderness of neural network research, was about to change everything.