Hardware for Deep Learning
This section explores hardware for Deep Learning in depth, giving you both advanced knowledge and practical insight into the hardware that powers modern ML systems. If you are just starting out, there is no need to invest in hardware of your own.
Google Colab offers cost-effective Deep Learning and ML with a free NVIDIA T4 GPU. As your projects become more computation-intensive, you can rent GPUs online, or consider Google Colab Pro, which offers access to NVIDIA V100 and A100 GPUs.
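Once a runtime is up, it is worth confirming that a GPU is actually attached. Below is a minimal sketch; the `gpu_visible` helper is a hypothetical name, and it only checks for the `nvidia-smi` tool, which Colab exposes on GPU runtimes (frameworks offer their own checks too, e.g. `torch.cuda.is_available()`, assuming the framework is installed):

```python
import shutil
import subprocess

def gpu_visible() -> bool:
    """True if the nvidia-smi tool exists on PATH and runs successfully."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tooling => no visible GPU
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0

print(gpu_visible())
```

On a CPU-only runtime this prints `False`; after switching the Colab runtime type to GPU, it should print `True`.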
Articles
- A Hitchhiker’s Guide to ML Training Infrastructure: By Ay Palat; discusses the impact of hardware on machine learning and the importance of having good infrastructure to support retraining.
- A Ready Deep Learning Hardware Guide: By Nir Ben-Zvi; a comprehensive guide to the hardware requirements for deep learning, including GPUs, RAM, and storage. The author shares his expertise as a computer vision consultant, recommending specific hardware components and discussing the benefits of building a custom deep learning machine.
Explainers
- GPUs: Explained: An overview of GPUs and their role in accelerating computing tasks, including deep learning. The video explains GPU architecture, parallel processing capabilities, and why GPUs excel at the complex mathematical operations that deep learning models require.
- How are memories stored in neural networks?: Explores memory storage in neural networks through the Hopfield Network, covering associative memory that retrieves stored patterns from incomplete or noisy inputs, illustrated with animations.
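The associative-memory idea behind Hopfield Networks can be sketched in a few lines. This is a toy illustration (the `train` and `recall` helpers are hypothetical names): Hebbian learning stores a bipolar pattern in a weight matrix, and repeated sign updates pull a corrupted cue back toward the stored pattern.

```python
def train(patterns):
    """Hebbian weights: W[i][j] = sum over patterns of p[i]*p[j], zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, steps=5):
    """Synchronous updates: each neuron takes the sign of its weighted input."""
    n = len(state)
    s = list(state)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
             for i in range(n)]
    return s

pattern = [1, -1, 1, -1, 1, -1, 1, -1]   # a bipolar (+1/-1) memory
W = train([pattern])
noisy = list(pattern)
noisy[0] = -noisy[0]                     # corrupt one bit of the cue
print(recall(W, noisy) == pattern)       # → True: the stored pattern is recovered
```

With a single stored pattern, one synchronous update already restores the flipped bit; with several patterns the same dynamics retrieve whichever stored pattern is closest to the cue, up to the network's capacity.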
- CPU vs GPU vs TPU vs DPU vs QPU: By Fireship; compares the performance of CPUs, GPUs, TPUs, DPUs, and QPUs while delving into their respective strengths, weaknesses, and operational mechanisms.
- A Full Hardware Guide to Deep Learning: By Tim Dettmers; a comprehensive guide to building a high-performance deep learning system, ordered by mistake severity with the most common mistakes listed first. The author emphasizes the importance of a GPU for deep learning, recommends specific models, and discusses the role of PCI-Express lanes, concluding that they have almost no effect on performance.
- AI’s Hardware Problem: By Asianometry; explains the Von Neumann architecture, memory scalability, the concept of compute-in-memory and its practical limitations, RAM, circuitry, and related topics.
- Putting the “You” in CPU: By Lexi Mattick & Hack Club; aims to bridge the gap between low-level knowledge and understanding how programs actually execute on CPUs. It covers topics such as syscalls, program execution, and multitasking, with in-depth explanations and interactive visualizations of how CPUs process instructions and data.
Courses
- Hardware for Machine Learning: EE 290 from the University of California, Berkeley, covers current research topics in electrical engineering with a focus on hardware for ML. It is designed as a seminar-style class in which students present, discuss, and interact with research papers, and it also includes readings, labs, projects, and course participation, with extra credit available for contributions to Piazza and the Gemmini/Chipyard code.
Papers
- Efficient Processing of Deep Neural Networks: A Tutorial and Survey (2017): An extensive review of techniques for the energy-efficient processing of deep neural networks (DNNs), covering hardware architectures and dataflows for DNN acceleration alongside approaches such as model compression and quantization.
- DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous ML (2014): Introduces DianNao, an energy-efficient accelerator for ML. Its small footprint and high throughput make it suitable for a range of devices, including smartphones and wearables.
- EIE: Efficient Inference Engine on Compressed Deep Neural Network (2016): Introduces EIE, an efficient inference engine for compressed deep neural networks. By performing inference directly on compressed networks, it reduces the memory and bandwidth required for inference.
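A core idea behind EIE is that a pruned (compressed) network stores and computes with only its nonzero weights. That idea can be sketched with a CSR-style sparse matrix-vector product; the helper names below are hypothetical, and EIE's actual encoding and hardware pipeline are considerably more involved.

```python
def to_csr(dense):
    """Compress a dense matrix: keep only nonzeros plus index metadata."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where each row's nonzeros end
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = M @ x touching only the stored nonzeros, skipping pruned weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for idx in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[idx] * x[col_idx[idx]]
        y.append(acc)
    return y

vals, cols, ptrs = to_csr([[0, 2, 0], [1, 0, 3]])
print(csr_matvec(vals, cols, ptrs, [1, 1, 1]))  # → [2, 4]
```

The multiply-accumulate count scales with the number of nonzeros rather than the full matrix size, which is exactly why compressed inference saves memory traffic and energy.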
- In-Datacenter Performance Analysis of a Tensor Processing Unit (2017): Presents a performance analysis of the Tensor Processing Unit (TPU), a custom ASIC designed by Google for ML workloads, showing that the TPU delivers significant performance improvements over traditional CPUs and GPUs for certain types of ML workloads.
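The heart of the TPU is a systolic matrix-multiply array: operands flow through a grid of multiply-accumulate cells along diagonal wavefronts. The toy emulation below is a conceptual sketch of that wavefront schedule (hypothetical helper name, not Google's actual design): cell (i, j) fires its multiply-accumulate for index k at time step t = i + j + k.

```python
def systolic_matmul(A, B):
    """Toy emulation of a systolic array computing C = A @ B.

    Each processing element (i, j) holds accumulator C[i][j]; the
    multiply-accumulate A[i][k] * B[k][j] fires at time t == i + j + k,
    mimicking operands rippling diagonally through the grid.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    last_wavefront = (n - 1) + (p - 1) + (m - 1)
    for t in range(last_wavefront + 1):
        for i in range(n):
            for j in range(p):
                k = t - i - j
                if 0 <= k < m:               # this cell's MAC is active now
                    C[i][j] += A[i][k] * B[k][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

Every (i, j, k) triple fires exactly once, so the result matches an ordinary matrix multiply; the point of the schedule is that in hardware, all cells on the same wavefront operate in parallel with purely local data movement.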
- Loihi: A Neuromorphic Manycore Processor with On-Chip Learning (2018): Introduces Loihi, a neuromorphic manycore processor with on-chip learning capabilities. The processor mimics the behavior of biological neurons and synapses, making it well suited to certain types of ML tasks.
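Neuromorphic processors like Loihi are built around spiking neurons. A minimal leaky integrate-and-fire sketch conveys the idea (this is a simplified textbook model with hypothetical parameter choices, not Loihi's actual neuron circuit): the membrane potential leaks each step, accumulates input current, and emits a spike and resets when it crosses a threshold.

```python
def lif_spikes(inputs, leak=0.8, threshold=1.0):
    """Leaky integrate-and-fire neuron: returns a 0/1 spike train."""
    v, spikes = 0.0, []
    for current in inputs:
        v = leak * v + current    # potential decays, then integrates input
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # reset after firing
        else:
            spikes.append(0)
    return spikes

print(lif_spikes([0.5, 0.5, 0.5]))  # → [0, 0, 1]
```

Because the neuron only produces output when it spikes, downstream computation is event-driven and sparse, which is the key source of neuromorphic hardware's energy efficiency.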
- Neural Networks in Hardware: A Survey: Provides a survey of hardware implementations of neural networks, covering hardware architectures, memory systems, and design methodologies. It also discusses the trade-offs between different hardware implementations and offers insights into future research directions.
Books
- Computer Architecture: A Quantitative Approach (5th Edn): A widely acclaimed book by John L. Hennessy & David A. Patterson, recommended by top institutions mainly for electrical engineering students. It offers in-depth knowledge of hardware, making it an ideal choice for those seeking a deep understanding of computer architecture.
If you've already explored the preceding resources, this book is optional; it's merely an extra resource for hardware enthusiasts.