Ultra-high bandwidth optical neuromorphic processing with Kerr soliton
crystal microcombs: Scaling the network to the PetaOp regime
Abstract
Artificial neural networks (ANNs) can distill hierarchical
features from raw data. They are central to machine learning functions
including speech and pattern recognition, medical diagnosis, playing
board games, computer vision, and many other areas [1-7]. Optical
neural networks (ONNs) in particular can significantly increase the
computing speed of ANNs in order to overcome the intrinsic bandwidth
bottleneck of electronics. Convolutional neural networks (CNNs),
inspired by biological systems such as the visual cortex, are a
powerful approach that greatly reduces parametric network complexity
and thereby enhances the accuracy of the system's predictions. In this
paper, we demonstrate a universal optical convolutional accelerator that
can be used in conjunction with both electronic and optical neural
networks. It operates at more than 10 tera operations per second
(TOPS) and performs convolutions of extremely large-scale images of
250,000 pixels at 8-bit resolution. It generates 10 convolutions
simultaneously, in parallel, with 10 different kernels: enough
processing power for facial image recognition. After
demonstrating this, we then use the same hardware to form a
convolutional neural network consisting of a convolutional front-end
followed by a fully connected deep optical neural network layer,
together forming a CNN with ten output neurons. We successfully
recognize the full set of ten handwritten digits from images of
900 pixels each, achieving an accuracy of 88%, close to the
theoretical accuracy of 90%. We use an
approach that exploits the simultaneous multiplexing, or interleaving,
within the time, space and wavelength dimensions, using an optical
frequency comb supplied by an integrated Kerr micro-comb source. We
compare the performance of different optical neural networks,
explicitly showing that our approach is intrinsically scalable in both
size and speed, up to the peta operations per second (peta-OPS) regime
in speed and to well over 24,000 synapses in size. We theoretically
evaluate the scaled system's performance and show that it can be
trained for much more complex networks, addressing demanding
real-world applications including real-time video recognition and
autonomous vehicle control.
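The time-wavelength interleaving principle underlying the accelerator can be illustrated numerically: each comb line carries a replica of the input waveform scaled by one kernel tap, dispersion delays each line by one symbol period, and the photodetector sums the channels, which is exactly a 1-D convolution. The sketch below is an illustrative model under those assumptions, not the authors' implementation; the function name `comb_convolve` is hypothetical.

```python
import numpy as np

def comb_convolve(signal, kernel):
    """Model of convolution via weighted, delayed wavelength channels.

    Each kernel tap plays the role of one comb line's weight; the
    per-tap shift models the one-symbol delay that dispersion imposes
    on each wavelength. Summing the channels (as a photodetector
    would) yields the full 1-D convolution of signal and kernel.
    """
    n = len(signal) + len(kernel) - 1
    out = np.zeros(n)
    for tap, weight in enumerate(kernel):
        # wavelength channel `tap`: weighted replica delayed by `tap` symbols
        out[tap:tap + len(signal)] += weight * signal
    return out

x = np.array([1.0, 2.0, 3.0, 4.0])   # input waveform (e.g. flattened pixels)
k = np.array([0.5, -1.0, 0.25])      # kernel weights on the comb lines
assert np.allclose(comb_convolve(x, k), np.convolve(x, k))
```

The equivalence to `np.convolve` holds because summing tap-delayed, tap-weighted copies of the input is the definition of discrete convolution; in the optical system the same sum is formed physically across wavelength channels at the photodetector.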