Computer Vision

Dive into computer vision, the field of AI that enables machines to interpret visual data from images and videos. It powers applications such as facial recognition, autonomous vehicles, medical imaging, and quality control in manufacturing.

Courses

Introduction to Computer Vision: A Udacity course that provides a thorough introduction to key computer vision concepts such as image formation, feature detection, and motion tracking. Taught by industry experts, it is an ideal starting point for aspiring computer vision practitioners.
Advanced Computer Vision with TensorFlow: A Coursera course covering image classification with CNNs and transfer learning, image segmentation techniques, and object localization and detection with bounding boxes.
Self-Driving Cars Specialization: Offered by the University of Toronto on Coursera, this specialization covers state-of-the-art engineering practice in the self-driving car industry, including key terminology, design considerations, and safety assessment. It explores hardware and software architectures in depth, particularly the components of the software stack, and also covers state estimation, localization and mapping, perception, deep learning, sensor fusion, and motion planning. Each course ends with a project to reinforce knowledge and skills, including one in which students program a self-driving car to navigate the CARLA simulation environment. The specialization comprises four courses: Introduction to Self-Driving Cars, State Estimation and Localization for Self-Driving Cars, Visual Perception for Self-Driving Cars, and Motion Planning for Self-Driving Cars.
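Several of these courses build on one primitive: sliding a small kernel across an image and taking a weighted sum at each position. As a minimal plain-Python sketch (no libraries assumed; the function name `correlate2d` and the toy image are illustrative, not taken from any course), the following computes "valid" 2D cross-correlation, which is what CNN convolution layers compute in practice, using a Sobel kernel as a classic hand-crafted edge detector.

```python
# Minimal sketch: "valid" 2D cross-correlation (no padding, stride 1),
# applied with a Sobel kernel to highlight vertical edges.

def correlate2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists) and return the sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j)
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Sobel kernel that responds to left-to-right intensity changes
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 4x5 grayscale image: dark on the left, bright on the right
image = [
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
]

edges = correlate2d(image, SOBEL_X)
for row in edges:
    print(row)
```

Each output row is [40, 40, 0]: the response is strong wherever the 3x3 window straddles the dark-to-bright boundary and zero over the flat bright region. A CNN learns kernel values like these from data instead of fixing them by hand.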

Explainers

AI for Full-Self Driving at Tesla: Andrej Karpathy discusses how neural networks are applied in Tesla's Autopilot system and the challenges of deploying them at scale.
Computer Vision: How AI Learns to See: Computer scientist Alexei Efros, inspired by personal experience with vision challenges, explains how AI systems are trained to recognize and interpret visual content in images and videos.
How we teach computers to understand pictures: Fei-Fei Li discusses her work on teaching computers to perceive and understand images. She explains how deep learning algorithms are used to train computers to recognize objects in images and highlights potential applications in healthcare, education, and robotics.

Guides

Kaggle's Computer Vision: A beginner-friendly introduction to the field that covers image processing topics and includes deep learning and computer vision projects for hands-on practice. Kaggle's tools, resources, and collaborative competitions make it useful both for beginners and for those expanding their skills.
OpenCV: The OpenCV tutorials are a robust resource for computer vision and image processing. Topics include installation, core functionality, image manipulation, the high-level GUI, and modules for machine learning, object detection, and GPU-accelerated computer vision. The tutorials serve Python, C++, and JavaScript developers and are available for different OpenCV versions, letting learners explore a wide range of computer vision techniques.
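To make the image-manipulation topics concrete, here is a plain-Python sketch of two basic operations of the kind the OpenCV tutorials cover, assuming a tiny image stored as nested lists of (B, G, R) tuples. In OpenCV itself, the equivalent calls would typically be `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` and `cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)`. The helper names `bgr_to_gray` and `threshold_binary` are hypothetical, and OpenCV's fixed-point rounding may differ from Python's `round` by one gray level.

```python
# Plain-Python sketch of BGR -> grayscale conversion and binary thresholding.
# Helper names are hypothetical; they only mimic what OpenCV does.

def bgr_to_gray(image):
    """Convert rows of (B, G, R) tuples to grayscale with the luma weights
    Y = 0.299 R + 0.587 G + 0.114 B, rounded to the nearest integer."""
    return [
        [round(0.114 * b + 0.587 * g + 0.299 * r) for (b, g, r) in row]
        for row in image
    ]


def threshold_binary(gray, thresh, max_value=255):
    """Pixels strictly above `thresh` become `max_value`; all others become 0."""
    return [[max_value if p > thresh else 0 for p in row] for row in gray]


# 2x2 image in BGR order (the channel order cv2.imread uses by default)
image = [
    [(255, 0, 0), (0, 255, 0)],      # pure blue, pure green
    [(0, 0, 255), (255, 255, 255)],  # pure red, white
]

gray = bgr_to_gray(image)
mask = threshold_binary(gray, 127)
print(gray)
print(mask)
```

This prints [[29, 150], [76, 255]] for the grayscale image and [[0, 255], [0, 255]] for the mask: only the green and white pixels exceed the threshold, because green carries the largest luma weight.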

Books

Programming Computer Vision with Python: By Jan Erik Solem, this book takes a hands-on approach to computer vision, teaching techniques such as object recognition and 3D reconstruction through code samples and exercises across diverse topics. It is well suited to readers with basic programming skills and shows how to use OpenCV through its Python interface.
Computer Vision: Algorithms and Applications: By Richard Szeliski, a comprehensive book covering a broad range of computer vision topics and real-world applications, including medical imaging and consumer-level tasks, with complete code samples, explanations, and exercises. Based on courses Szeliski has taught at top universities, it is a valuable resource for computer science and engineering students.
Learning OpenCV 3: Computer Vision in C++: By Adrian Kaehler and Gary Bradski, a practical guide to computer vision with OpenCV. It covers essential tools, tracking, and qualitative analysis, offers insights into 3D reconstruction, and provides hands-on learning through code examples and exercises relevant to fields such as automation and IoT.
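Books like these build 3D reconstruction on the pinhole camera model, which maps a 3D point (X, Y, Z) in camera coordinates to pixel coordinates u = fx·X/Z + cx and v = fy·Y/Z + cy. A minimal sketch, assuming an ideal camera with no lens distortion and hypothetical intrinsics (focal lengths fx, fy in pixels and principal point cx, cy); the function name `project_point` is illustrative:

```python
# Minimal pinhole-camera projection sketch (ideal camera, no lens distortion).
# The intrinsic values used below are hypothetical.

def project_point(point, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z) in camera coordinates to pixel (u, v).

    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical 640x480 camera: focal length 800 px, principal point at center
u, v = project_point((0.5, -0.25, 2.0), fx=800, fy=800, cx=320, cy=240)
print(u, v)  # 520.0 140.0
```

Dividing by Z is what makes distant objects appear smaller: moving the same point to Z = 4.0 gives (420.0, 190.0), half as far from the principal point. 3D reconstruction methods effectively invert this projection by combining observations from multiple views.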

Papers

AlexNet (2012): This convolutional neural network architecture was one of the first large CNNs to significantly outperform traditional computer vision methods on the ImageNet dataset. It demonstrated the power of deep learning for computer vision.
Rich feature hierarchies for accurate object detection and semantic segmentation (2014): Introduced the R-CNN algorithm for object detection. It uses region proposals and CNN features to detect objects in images. R-CNN significantly improved detection accuracy and kicked off the rapid progress in object detection research.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015): Built on R-CNN and its successor Fast R-CNN by introducing a Region Proposal Network (RPN) that shares convolutional features with the detection network, replacing the slow external region proposal step. As a result, Faster R-CNN detects objects significantly faster than R-CNN.
Mask R-CNN (2017): Presented an extension to Faster R-CNN by adding a branch for predicting an object mask in parallel with the bounding box. This Mask R-CNN architecture enables detecting and segmenting objects in one model.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019): Introduced a family of convolutional neural networks built with a compound scaling method that uniformly balances network depth, width, and input resolution. EfficientNet achieved state-of-the-art accuracy on ImageNet with significantly fewer parameters than previous models.
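Detectors in the R-CNN family produce many overlapping scored boxes and prune them with greedy non-maximum suppression (NMS) based on intersection over union (IoU). A minimal plain-Python sketch, assuming boxes given as (x1, y1, x2, y2) corner coordinates and an illustrative IoU threshold of 0.5 (real detectors tune this value); the function names are hypothetical:

```python
# Minimal sketch of IoU and greedy non-maximum suppression (NMS).
# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.75]
print(nms(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.82, so the lower-scoring one is suppressed, while the distant third box survives.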
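EfficientNet's compound scaling can be written as depth d = α^φ, width w = β^φ, and resolution r = γ^φ, with α·β²·γ² ≈ 2 so that FLOPS grow by roughly 2^φ. The sketch uses α = 1.2, β = 1.1, γ = 1.15, the values the paper reports for its EfficientNet-B0 baseline; the function names are hypothetical, and this illustrates the idea rather than reproducing specific released models.

```python
# Sketch of EfficientNet-style compound scaling (illustrative only).
# ALPHA, BETA, GAMMA: per-unit-phi multipliers for depth, width, and
# input resolution, using the values reported for the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """FLOPS scale roughly with depth * width**2 * resolution**2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPS x{flops_multiplier(phi):.2f}")
```

Because α·β²·γ² is about 1.92 rather than exactly 2, FLOPS grow slightly slower than doubling per unit of φ.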

Deep Learning Hardware