One of the most important problems in the industrial application of artificial intelligence is how to run programs on small computing devices that have very little processing power, very little memory, and a limited energy supply, often from batteries.
The so-called edge market for AI has been a hot area of late, with startups receiving tens of millions in venture capital to come up with chips and software. The edge effort has also led to specialized development tools for machine-learning forms of AI, such as the TinyML initiative from Google.
Those two paths represent two philosophies: make edge devices more powerful or reduce AI programs to use fewer computations.
A third possible approach is to balance more carefully what work is done on the constrained device. That’s the plan MIT researchers put forward in October in the scholarly journal Science.
Researcher Alexander Sluds and colleagues at MIT’s Research Laboratory of Electronics, Computer Science and Artificial Intelligence Laboratory, and Lincoln Laboratory, in partnership with Nokia and NTT Research, developed a system that uses photonics to beam data to a client device, where it can be processed in the optical domain in a much more energy-efficient manner.
Their network setup, called Netcast, can perform the basic operation on the weights, or parameters, of a deep neural network using about 10 femtojoules of energy, or 10 fJ, “three orders of magnitude less than what’s possible in existing digital CMOS,” that is, in standard semiconductor chips.
A femtojoule, written as a decimal point followed by 14 zeros and a 1, is one-quadrillionth of a joule, the joule being the amount of energy needed to run a 1-watt device for one second.
That tiny fraction of a joule is a major energy saving, and it matters because many edge devices have a total power budget in the milliwatts, or thousandths of a watt, versus typical computing devices that use tens or hundreds of watts, the authors note. Netcast’s femtojoule operation effectively gets under “a stubborn barrier near 1 pJ,” or one picojoule, one-trillionth of a joule.
The key to Netcast is minimizing the work the client must do for the basic operation of the neural net, so that it stays within that 10-femtojoule budget.
A neural net makes predictions by taking some input data and multiplying it by the net’s parameters, or weights. That mathematical operation, the product of the input vector and the parameter matrix, is built from multiply-accumulate, or MAC, operations, and neural net programs perform vast numbers of them every second.
Typically, the biggest power hog for most neural nets is fetching data from RAM chips and accessing the network. This is a problem because neural weights are typically stored in RAM, so each layer of MAC operations requires multiple trips over the PCIe bus to RAM, and possibly even to a network line card for remote memory stores.
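To make the MAC idea concrete, here is a minimal Python sketch of a matrix-vector product written out as individual multiply-accumulate steps. The function name and the tiny example matrix are illustrative assumptions, not anything taken from the MIT paper.

```python
def matvec_with_macs(weights, inputs):
    """Multiply a weight matrix by an input vector, counting MAC operations."""
    outputs = []
    mac_count = 0
    for row in weights:
        accumulator = 0.0
        for weight, value in zip(row, inputs):
            accumulator += weight * value  # one multiply-accumulate (MAC)
            mac_count += 1
        outputs.append(accumulator)
    return outputs, mac_count


# Hypothetical 2x3 weight matrix and 3-element input, chosen only for illustration.
weights = [[1.0, 2.0, 3.0],
           [0.5, 0.0, -1.0]]
inputs = [2.0, 1.0, 4.0]

outputs, mac_count = matvec_with_macs(weights, inputs)
print(outputs, mac_count)
```

Even this toy layer needs one MAC per weight, which is why the energy cost per MAC matters so much at the scale of a real network.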
Therefore, the key to Netcast is how to minimize memory access and network traffic for the client device.
The solution is an existing photonic technology called wavelength division multiplexing. Using WDM, as it is commonly known, multiple pieces of data can be sent simultaneously over a fiber-optic line by assigning each piece of data its own wavelength of light, so that they share the available optical spectrum in the fiber. WDM is a very mature, robust technology that is used in all modern telecom networks to increase fiber-optic transmission capacity; it forms the backbone of the Internet.
Each row of the matrix can be encoded on a wavelength of light and then “transmitted” to the client device, so that a multi-wavelength WDM signal can send the entire weight matrix or even multiple matrices. At the client device, an optical receiver retrieves the encoded data at each wavelength and combines it with the input data to perform matrix multiplication in the optical domain rather than electrically. The output can then be stored electrically in local RAM after being converted from an optical signal.
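The division of labor described above can be sketched as a toy numerical model. This is only an illustration under stated assumptions: it assumes each row of the weight matrix rides on its own wavelength channel, that the elements of a row arrive one per time step, and that the client applies one input value to all channels at each step while a per-wavelength detector accumulates the result. The time-stepping detail and every name in the code are assumptions for the sketch; real optics, noise, and analog conversion are ignored.

```python
def netcast_toy(weights, inputs):
    """Toy model of a WDM-delivered matrix-vector product (not the authors' code).

    Row i of `weights` is assumed to ride on wavelength channel i.
    At time step t, all channels deliver their t-th element at once;
    the client scales every channel by inputs[t], and one integrator
    per wavelength accumulates the products.
    """
    num_channels = len(weights)
    integrators = [0.0] * num_channels
    for t, x_t in enumerate(inputs):
        # Everything arriving on the fiber at time t, one value per wavelength.
        wdm_symbol = [weights[ch][t] for ch in range(num_channels)]
        # A single client-side modulator acts on all wavelengths together.
        modulated = [w * x_t for w in wdm_symbol]
        # Each wavelength's detector adds the new product to its running sum.
        for ch in range(num_channels):
            integrators[ch] += modulated[ch]
    return integrators


# Hypothetical weights and input, for illustration only.
result = netcast_toy([[1.0, 2.0, 3.0],
                      [0.5, 0.0, -1.0]],
                     [2.0, 1.0, 4.0])
print(result)
```

The point of the sketch is that the client never stores or fetches the weights: it only modulates what arrives and accumulates, and the sums it ends up with are the ordinary matrix-vector product.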
According to Sluds and team, this results in a dramatic simplification of the components that must be present in a client device at the edge.
“This architecture minimizes active components at the client, requiring only a single optical transceiver modulator, digital-to-analog converter (DAC), and analog-to-digital converter (ADC).”
The authors built a working version of Netcast using 84 kilometers of fiber and WDM with a capacity of 2.4 terabits per second, running from the main MIT campus to Lincoln Lab and back. Their system test was a classic machine learning task, performing predictions on the MNIST database of handwritten digits. Images of handwritten digits are input to a neural net, and the net must perform an image recognition task of identifying which digit each image represents.
Using 1,000 test images locally, the team reports 98.7% accuracy, “comparable to the model’s baseline accuracy of 98.7%.”
The authors go further. Anticipating deployment on satellites and in other exotic locales, they developed photodetectors called integrating receivers that can work with very small numbers of photons.
“Applications of Netcast, including free-space deployment to drones or spacecraft, can operate in deep photon-starved environments,” they write. Their version of integrating receivers can detect the result of a MAC operation using only a fraction of a femtojoule, an amount measured in attojoules, which requires only about 100 photons per MAC operation.
But the authors go further still. They pushed toward the theoretical limit of Netcast, where each MAC requires detecting only a single photon. Using so-called superconducting nanowire single-photon detectors (SNSPDs), they built a receiver that can measure the result of each MAC with less than one photon.
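A quick back-of-the-envelope check shows why 100 photons lands in attojoule territory. The sketch below uses the standard relation E = hc/λ for the energy of one photon. The 1,550-nanometer wavelength is an assumption (a common telecom wavelength); the article does not state which wavelength the system uses.

```python
PLANCK_J_S = 6.62607015e-34       # Planck constant, joule-seconds
LIGHT_SPEED_M_S = 299_792_458     # speed of light, meters per second
ASSUMED_WAVELENGTH_M = 1550e-9    # assumption: a typical telecom wavelength


def photon_energy_joules(wavelength_m):
    """Energy of a single photon at the given wavelength, E = h * c / wavelength."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m


energy_per_photon = photon_energy_joules(ASSUMED_WAVELENGTH_M)
energy_100_photons_aj = 100 * energy_per_photon / 1e-18  # convert joules to attojoules

# Roughly 13 attojoules, i.e. about a hundredth of a femtojoule.
print(f"{energy_100_photons_aj:.1f} aJ for 100 photons")
```

Under that assumption, 100 photons carry on the order of ten attojoules, consistent with the "fraction of a femtojoule" figure.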
“This result may seem surprising at first, because less than a single photon per MAC is counterintuitive,” Sluds and team wrote. “We can better understand this measurement by noting that in the readout, we performed a vector-vector product with M = 100 MACs. Each MAC has less than a single photon in it, but the measured signal contains many photons.”
The implications for computing could be profound.
“The realization of computing with less than one photon per MAC could enable a new class of computing systems that protect both client input and server weight data from a data privacy perspective,” they write. It could also make computing on spacecraft more reliable. “Weight data from the directional base station can be transmitted to the spacecraft and the results classified on the craft before being transmitted back to Earth.”
All of Netcast’s components can be manufactured in any standard semiconductor chip factory today, Sluds and team noted.
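The authors' explanation can be illustrated with a small simulation. The sketch below assumes photon arrivals follow Poisson statistics and picks a hypothetical average of 0.5 photons per MAC; both are assumptions for illustration, not figures from the paper. Only M = 100 comes from the quote.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def detected_photons(photons_per_mac, num_macs, rng):
    """Total photons at readout when each MAC contributes a tiny Poisson signal."""
    return sum(sample_poisson(photons_per_mac, rng) for _ in range(num_macs))


rng = random.Random(0)          # fixed seed so the run is repeatable
PHOTONS_PER_MAC = 0.5           # hypothetical: less than one photon per MAC
M = 100                         # MACs per vector-vector product, as in the quote

expected_total = PHOTONS_PER_MAC * M
trials = [detected_photons(PHOTONS_PER_MAC, M, rng) for _ in range(200)]
mean_total = sum(trials) / len(trials)
print(expected_total, mean_total)
```

Although any single MAC usually contributes zero or one photon, the detector reads out the sum over all 100 MACs, which averages dozens of photons in this sketch, enough to measure.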
In conclusion, they write, “Our approach removes a fundamental barrier to edge computing, enabling high-speed computing on deployed sensors and drones.”