The first big bang in 2012
The big bang in artificial intelligence (AI) came in 2012, when a team of researchers led by Geoff Hinton trained an artificial neural network (a deep learning system) to win an image classification competition by a surprising margin. Before that, AI had achieved some remarkable feats, but it didn't make much money. Since 2012, AI has helped the big technology companies amass enormous fortunes, not least from advertising.
A second big bang in 2017?
Has there been a new big bang in AI since transformers arrived in 2017? In episodes 5 and 6 of the London Futurist Podcast, Aleksa Gordić explores this question and explains how today's cutting-edge AI systems work. Aleksa is an AI researcher at DeepMind and previously worked on Microsoft's HoloLens team. Remarkably, his AI expertise is self-taught – so there's still hope for us all!
Transformers are deep learning models that process inputs expressed in natural language and produce outputs such as translations or summaries. Their arrival was announced in 2017, when Google researchers published a paper called "Attention Is All You Need." The title refers to the fact that transformers can "attend" to every part of a long input text simultaneously, whereas their predecessors, recurrent neural networks, could only attend to the symbols on either side of the segment of text being processed.
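The "attention" the paper's title refers to can be sketched in a few lines. The following toy example (dimensions and random inputs are illustrative, not from any real model) shows scaled dot-product self-attention, in which every token's output is a weighted mix of all the tokens in the input at once:

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each position attends to every position."""
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
    return weights @ v                                # weighted mix of values

# Toy input: 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

out = attention(x, x, x)   # self-attention: queries, keys, values from the same tokens
print(out.shape)           # (4, 8) – every token's output depends on all four inputs
```

A recurrent network, by contrast, would have consumed those four tokens one at a time, carrying information forward in a hidden state.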
Transformers work by breaking text into smaller units called tokens and mapping them onto high-dimensional vector spaces – often with thousands of dimensions. We humans cannot visualise this. The space we live in is defined by three numbers – or four, if you include time – and we cannot picture a space with thousands of dimensions. Researchers suggest we shouldn't even try.
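The two steps just described – splitting text into tokens, then mapping each token to a point in a high-dimensional space – look roughly like this. The vocabulary and six-dimensional embeddings below are made up for illustration; real tokenizers learn subword units, and real models use thousands of dimensions:

```python
import numpy as np

# Hypothetical toy vocabulary; real tokenizers learn subword units from data.
vocab = {"the": 0, "king": 1, "wears": 2, "a": 3, "crown": 4}
dim = 6  # real models use hundreds or thousands of dimensions

rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), dim))  # one vector per token

tokens = "the king wears a crown".split()   # crude whitespace tokenization
ids = [vocab[t] for t in tokens]            # text -> token ids
vectors = embedding_table[ids]              # token ids -> points in vector space

print(vectors.shape)  # (5, 6): five tokens, each a point in 6-dimensional space
```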
Dimensions and vectors
In a transformer model, words and tokens are represented along many dimensions, which we can think of as properties or relationships. For example, the relationship between "king" and "queen" mirrors the relationship between "male" and "female." These concepts can be expressed as vectors – arrows in the high-dimensional space. The model assigns a probability to a particular token being associated with a particular vector. For example, "princess" is more likely to be associated with a vector representing "wearing a slipper" than with one representing "wearing a dog."
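The king/queen relationship is the classic word-vector analogy, and it can be demonstrated with hand-made toy embeddings (real embeddings are learned from data, and the three "meaningful" dimensions below are an assumption for illustration only):

```python
import numpy as np

# Hand-made toy embeddings; dimensions loosely mean [royalty, maleness, femaleness].
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Similarity between two vectors, ignoring their lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```

In a trained model the same arithmetic works because "maleness" and "royalty" end up encoded along consistent directions in the space.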
There are different ways that machines find these relationships, or vectors, between tokens. In supervised learning, they are given enough labeled data to represent all the relevant vectors. In self-supervised learning, they are given no labeled data and have to find the relationships for themselves. This means humans cannot necessarily interpret the relationships they discover: the models are black boxes. Researchers are investigating how machines make these determinations, but it is not clear that the most powerful systems can ever be made truly transparent.
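The key point about self-supervised learning is that the labels come from the text itself. One common recipe – sketched here in a toy form, not any particular model's implementation – is to hide one token at a time and ask the model to recover it, turning raw text into (context, target) training pairs with no human labeling:

```python
# Self-supervised learning needs no human labels: the text supplies its own.
# Each training pair hides one word and asks the model to predict it.
sentence = "the queen wears a crown".split()

pairs = []
for i in range(len(sentence)):
    context = sentence[:i] + ["[MASK]"] + sentence[i + 1:]
    pairs.append((" ".join(context), sentence[i]))  # target = the hidden word

print(pairs[1])  # ('the [MASK] wears a crown', 'queen')
```

A supervised dataset would instead pair each input with a human-provided label, such as a sentence and its translation.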
Parameters and synapses
The size of a transformer model is usually measured by the number of parameters it contains. A parameter is loosely analogous to a synapse in the human brain – the point where the tendrils (axons and dendrites) of our neurons meet. The first transformer models had a hundred million or more parameters, and the largest now have trillions. This is still far smaller than the number of synapses in the human brain, and human neurons are far more complex and capable than artificial ones.
Not just text
A surprising discovery, made a few years after transformers arrived, was that they could tokenize not just text but images as well. Google released the first Vision Transformer in late 2020, and since then people around the world have marveled at the output of DALL-E, Midjourney and others.
The first image-generation models were generative adversarial networks, or GANs. These are twin models: one (the generator) produces images designed to trick the other into accepting them as genuine, while the second (the discriminator) rejects insufficiently convincing attempts. GANs have now been superseded by diffusion models, which work by progressively peeling the noise away from the desired signal. The first diffusion model was actually described as long ago as 2015, but the paper was almost completely ignored; the technique was rediscovered in 2020.
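The "peeling noise from the signal" idea can be illustrated with a deliberately simplified sketch. A real diffusion model *learns* to estimate and remove the noise at each reverse step; here we cheat and reuse the known noise, purely to show the forward (noising) and reverse (denoising) mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 50))  # stand-in for an "image"

# Forward process: corrupt the signal with a little noise at each step.
steps = 10
noises = [rng.normal(scale=0.3, size=signal.shape) for _ in range(steps)]
x = signal.copy()
for n in noises:
    x = x + n

# Reverse process: peel the noise away one step at a time.
# (A trained model would *predict* each n from x; we simply reuse it.)
for n in reversed(noises):
    x = x - n

print(np.allclose(x, signal))  # True – the signal is recovered exactly here
```

Image generation runs only the learned reverse process, starting from pure noise and denoising step by step into a new image.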
Transformers are gluttons for compute power and energy, which has raised concerns that they could represent a dead end for AI research. Academic institutions already find it difficult to fund research into the latest models, and even the tech giants fear they may soon become unaffordable. The human brain points a way forward. Not only is it larger than the latest transformer models – its roughly 80 billion neurons, each with around 10,000 synapses, make it about a thousand times bigger – it is also a far more efficient user of energy. That is mainly because we activate only a small fraction of our synapses for any given computation, whereas AI systems activate all of their artificial neurons all the time. Neuromorphic chips, which mimic the brain more closely than classic chips do, may help.
Aleksa is often surprised by what the latest models can do, but that does not trouble him: "If I'm not surprised, that means I can predict the future, which I can't." He takes pleasure in the fact that the research community is like a beehive: you never know where the next idea will come from. The next big thing may well come from a couple of students at a university – after all, Ian Goodfellow created the first GAN by playing around at home, after mulling the idea over a couple of beers.