Modern deep neural networks deliver remarkable performance on a variety of complex tasks, such as image recognition, classification, and natural language processing. As workloads grow, however, deep learning algorithms have become extremely compute- and memory-intensive, making them infeasible to deploy on compact, embedded platforms with tight power and cost budgets.
GPUs are commonly used to accelerate deep learning algorithms because they offer high compute density, high memory bandwidth, and ease of programming. Modern FPGA devices, on the other hand, contain several thousand digital signal processing blocks, several thousand distributed on-chip memory blocks, and, more recently, in-package High-Bandwidth Memory (HBM). Compared to GPUs, FPGAs exhibit higher energy efficiency and are better suited to accelerating irregular neural network structures, such as quantized and pruned networks.
This talk will discuss the strengths and weaknesses of FPGAs as a deep learning platform and present two recent deep learning techniques for which FPGAs outperform GPUs. The first is Approximate Nearest Neighbor (ANN) search, the keystone of sparse memory-augmented neural networks. Hierarchical Product Quantization (HPQ) is introduced to accelerate ANN search by 250× compared to GPU-based approaches. The second is FGIE, a Fine-Grained Inference Engine. FGIE uses online Most-Significant-Digit-First (MSDF) arithmetic to terminate computations early, avoiding ineffectual work; this improves energy efficiency by 1.8×, enables layer-wise mixed-precision operation, and reduces memory traffic. Furthermore, Boveda, a simple and effective on-chip lossless memory compression technique, is presented. Boveda reduces memory footprint by 53% by exploiting the value distribution of deep learning applications.
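The product-quantization idea underlying ANN search can be sketched in a few lines: each database vector is split into subvectors, each subvector is replaced by the index of its nearest centroid in a small per-subspace codebook, and queries are scored through per-subvector lookup tables instead of full distance computations. The sketch below is a plain-Python illustration of flat product quantization only, not the hierarchical HPQ design from the talk; all names (`encode`, `adc_search`) and parameters are illustrative assumptions.

```python
# Illustrative sketch of product quantization (PQ) for approximate
# nearest-neighbor search. Flat PQ only -- not the HPQ hardware design.

def split(vec, m):
    """Split a vector into m equal-length subvectors."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def adc_search(query, codes, codebooks):
    """Asymmetric distance computation: precompute query-to-centroid
    distance tables, then score each database code by table lookups."""
    m = len(codebooks)
    tables = [[sq_dist(sub, c) for c in cb]
              for sub, cb in zip(split(query, m), codebooks)]
    return min(range(len(codes)),
               key=lambda i: sum(tables[j][codes[i][j]] for j in range(m)))
```

With trained codebooks (e.g., from per-subspace k-means), each database vector shrinks to a handful of centroid indices, and scoring a candidate costs only `m` table lookups rather than a full distance computation, which is what makes the search approximate but fast.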
The rest of the talk will provide a glimpse into current and future research on hardware-accelerated deep learning, including (1) automation: raising the design abstraction to the software level using high-level synthesis, enabling software developers to exploit FPGAs as a deep learning platform; (2) architecture: in-memory computation and logarithmic number systems, specifically in conjunction with online MSDF arithmetic; and (3) applications: portable biomedical systems for processing high-resolution brain neural activity for the diagnosis, treatment, augmentation, and repair of brain function.
Ameer Abdelhadi is a research scientist at the University of Toronto and a co-founder and CTO of xCELLeration, a multidisciplinary collaboration developing portable systems for processing high-resolution brain neural activity for the diagnosis, treatment, augmentation, and repair of brain functions. Prior to joining the University of Toronto, Ameer was a research fellow at Imperial College London and a lecturer and postdoctoral fellow at Simon Fraser University and the University of British Columbia. He earned a Ph.D. in computer engineering from the University of British Columbia in 2016. Before his graduate studies, he held design and research positions in the semiconductor industry. His research interests lie in reconfigurable computing, application-specific custom-tailored acceleration, hardware-efficient machine learning, and computational neuroscience. His research is published in top-tier computer architecture and reconfigurable computing venues, has won best paper awards, and directly benefits leading technology vendors and the research community.