Image Generation
AI-driven image generation uses artificial intelligence to create visual content such as images and artwork. These systems can generate images from textual descriptions, enhance existing photos, or create entirely new visuals from diverse inputs and preferences. They can also generate images from other images, further extending their creative range.
Courses
- Diffusers: Hugging Face's Diffusers library simplifies working with state-of-the-art diffusion models. Key features include pre-trained diffusion pipelines for image, audio, and 3D structure generation, customizable noise schedulers, practical guides, support for both inference and training, and a focus on usability. The library covers a wide range of tasks and provides resources on ethical guidelines and safety implementations.
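To build intuition for what a customizable noise scheduler computes, here is a toy DDPM-style forward-noising step in plain Python. This is a conceptual sketch, not the Diffusers API (whose schedulers operate on tensors and expose a similar `add_noise` method); the schedule values are the common linear defaults, assumed here for illustration.

```python
import math

def make_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule, as in the original DDPM formulation."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(t, betas):
    """Cumulative product of (1 - beta) up to and including step t."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def add_noise(x0, noise, t, betas):
    """q(x_t | x_0): mix the clean value with Gaussian noise at step t."""
    a = alpha_bar(t, betas)
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * noise

betas = make_betas()
# Early steps barely perturb the sample; late steps are almost pure noise.
print(add_noise(1.0, 0.0, 0, betas))   # close to 1.0
print(alpha_bar(999, betas))           # close to 0.0
```

Swapping the `make_betas` schedule (e.g. for a cosine schedule) is exactly the kind of customization the library's schedulers expose.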
- Image Prompting by Learn Prompting: an open-source course on image-prompting techniques for both beginners and professionals.
Articles & Papers
- Image GPT (2020): A transformer model that processes images as pixel sequences, enabling it to learn 2-D features such as objects and categories. Its applications span image captioning, classification, and analysis.
- High-Resolution Image Synthesis with Latent Diffusion Models (2021): This paper introduces a method for high-resolution image synthesis that formulates generation as a sequence of denoising autoencoder steps applied in a compressed latent space. It achieves top-tier results on image synthesis benchmarks.
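The key efficiency idea of latent diffusion is that the denoising loop runs on a compressed latent rather than on full-resolution pixels. The toy sketch below illustrates this with a stand-in "encoder" that is just block averaging on a 1-D signal; the paper uses a learned autoencoder, which this deliberately does not reproduce.

```python
def encode(pixels, factor=8):
    """Downsample a 1-D 'image' by averaging non-overlapping blocks.

    A crude stand-in for the paper's learned encoder: the latent has
    `factor` times fewer values, so each denoising step is far cheaper.
    """
    return [sum(pixels[i:i + factor]) / factor
            for i in range(0, len(pixels), factor)]

def decode(latent, factor=8):
    """Nearest-neighbour upsample back to pixel space (stand-in decoder)."""
    return [v for v in latent for _ in range(factor)]

pixels = [float(i) for i in range(64)]   # 64 "pixels"
latent = encode(pixels)                  # 8 latent values to denoise instead of 64
restored = decode(latent)
print(len(latent), len(restored))        # -> 8 64
```

In the real model, the diffusion process operates entirely on the latent, and the decoder maps the final denoised latent back to a high-resolution image.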
- DALL·E: Creating Images from Text (2021): An AI system that creates images from text descriptions. A transformer model trained to generate high-resolution images, it enables a range of creative applications. (paper)
- CLIP: Connecting Text and Images (2021) by OpenAI: bridges computer vision and natural language processing by learning a shared embedding space for text and images. Motivated by limitations of standard deep learning approaches, CLIP enables robust zero-shot transfer. However, it requires careful prompt engineering, and concerns about bias persist. (paper)
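CLIP's zero-shot transfer works by embedding an image and a set of candidate text prompts into the same space, then choosing the prompt most similar to the image. The sketch below shows only that final similarity step; the embedding vectors are made up for illustration, whereas CLIP learns them from hundreds of millions of image-text pairs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, prompt_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(prompt_embs, key=lambda label: cosine(image_emb, prompt_embs[label]))

# Hypothetical embeddings for "a photo of a dog" vs. "a photo of a cat".
prompts = {"dog": [0.9, 0.1, 0.2], "cat": [0.1, 0.9, 0.2]}
image = [0.8, 0.2, 0.1]  # pretend this came from CLIP's image encoder
print(zero_shot_classify(image, prompts))  # -> dog
```

The need for careful prompt engineering noted above corresponds to choosing the text templates (e.g. "a photo of a {label}") that produce the prompt embeddings.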
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (2022) by Google: unveils Imagen, a text-to-image diffusion model known for its remarkable photorealism and deep language comprehension. It generates images by iteratively denoising a noisy representation. Imagen is trained on a substantial text-image dataset and delivers top-tier performance on various image synthesis benchmarks. (website)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (2022) by Google: proposes a technique for fine-tuning a text-to-image diffusion model on a few images of a subject, binding the subject to a unique identifier so it can be synthesized in new scenes, poses, and styles. (website)
- Adding Conditional Control to Text-to-Image Diffusion Models (2023): Introduces a technique (ControlNet) for adding conditional control to text-to-image diffusion models, enabling image generation guided by conditions such as edge maps, depth maps, or human poses.
- DALL·E 3 (2023): OpenAI's text-to-image system creates lifelike images from text descriptions. An evolution of DALL·E 2, it offers improved editing, vivid concept visualization, and enhanced realism, exemplifying AI's rapid advancement in creative tasks and supporting multi-modal AI research and applications. (paper)
- How Stable Diffusion Works: Chris McCormick provides a simple explanation of Stable Diffusion and builds intuition for text-to-image models. For a more accessible introduction, see the comic on r/StableDiffusion.
Explainers
- How AI Image Generators Work (Stable Diffusion / Dall-E) - Computerphile by Dr Mike Pound: explains the core working principles of image generation, focusing on Stable Diffusion and DALL·E, which employ a diffusion process to create high-quality images.
- How Stable Diffusion Works (AI Image Generation) by Gonkee: explains how Stable Diffusion generates high-quality images.
Advancements
- MiniGPT-4: a multimodal model that combines visual information with language tasks. It generates image descriptions, answers questions about image content, and performs diverse tasks such as creating websites from drafts and writing stories inspired by images. MiniGPT-v2, an enhanced version, efficiently processes high-resolution images, making it well suited to vision-language multi-task learning.
- SynthID by DeepMind: a tool that watermarks images and identifies synthetic ones created by Imagen, a text-to-image model. SynthID promotes responsible use of AI-generated content, enabling Vertex AI customers to confidently create and identify such images. Built from two deep learning models, it is expected to expand beyond image identification to audio, video, and text as it evolves alongside other AI models.
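To make the embed-and-detect idea behind image watermarking concrete, here is a toy least-significant-bit scheme on a list of 8-bit pixel values. This is emphatically not SynthID's technique (SynthID uses learned deep models and survives edits that would destroy LSB marks); it only illustrates the two roles a watermarking tool plays: embedding an imperceptible signal and later detecting it.

```python
def embed_watermark(pixels, bits):
    """Write one watermark bit into each pixel's least-significant bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def detect_watermark(pixels, bits):
    """Check whether the expected bit pattern is present."""
    return all((p & 1) == b for p, b in zip(pixels, bits))

pixels = [200, 17, 96, 255]
bits = [1, 0, 1, 1]
marked = embed_watermark(pixels, bits)
# Each pixel value changes by at most 1 -- imperceptible to the eye.
assert all(abs(a - b) <= 1 for a, b in zip(pixels, marked))
print(detect_watermark(marked, bits))  # -> True
```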
Reference
- Camenduru's 3D ML Papers: GitHub page featuring repositories dedicated to 3D machine learning papers.
Multi-dimensional Image Generation
Papers
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (2020): NeRF uses deep neural networks to synthesize novel views of complex scenes. Its 5D function maps a 3D position and 2D viewing direction to color and volume density. NeRF finds applications in robotics, urban mapping, virtual reality, and more. (website)
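The 5D function and the volume rendering that turns it into pixels can be sketched in a few lines. The radiance field below is a fixed toy function (a soft unit sphere with view-dependent color) standing in for NeRF's learned MLP; the compositing loop, however, follows the standard front-to-back alpha-compositing form used in the paper.

```python
import math

def radiance_field(x, y, z, theta, phi):
    """Toy 5D field: (position, viewing direction) -> (color, density).

    NeRF learns this mapping with an MLP; here a fixed function just
    shows the signature. Density is a soft unit sphere around the origin.
    """
    sigma = max(0.0, 1.0 - math.sqrt(x * x + y * y + z * z))
    r, g, b = abs(math.sin(theta)), abs(math.cos(phi)), 0.5  # view-dependent color
    return (r, g, b), sigma

def render_ray(samples, delta=0.1):
    """Alpha-composite (color, density) samples front to back along a ray."""
    color, transmittance = [0.0, 0.0, 0.0], 1.0
    for (rgb, sigma) in samples:
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha             # light surviving past it
    return color

# March a ray through the toy scene along the z-axis.
samples = [radiance_field(0.0, 0.0, z / 10.0, 0.0, 0.0) for z in range(10)]
print(render_ray(samples))
```

Training fits the field so that rays rendered this way reproduce the input photographs; new viewpoints then come for free by marching new rays.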
- DreamFusion: Text-to-3D using 2D Diffusion (2022): presents an approach for creating 3D models from text descriptions using deep learning. It employs a frozen 2D diffusion model to judge the plausibility of rendered views and introduces Score Distillation Sampling (SDS) to optimize a 3D model from those 2D gradients. The method outperforms existing approaches and is valuable for researchers in text-to-3D and image-to-3D synthesis. (website)
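The core of Score Distillation Sampling is simple to state: noise a rendered view, ask the frozen diffusion model to predict that noise, and use the weighted prediction error as a gradient on the rendered image (backpropagated through the renderer, not through the diffusion model). The sketch below assumes a hypothetical `denoiser` callable standing in for the frozen model's noise prediction; the weighting `w = 1 - alpha_bar` is one common choice, assumed here.

```python
import math
import random

def sds_gradient(rendered, denoiser, t, alpha_bar, rng=random):
    """Toy SDS: gradient of the distillation loss w.r.t. a rendered image.

    `rendered` is a flat list of values, `denoiser(noisy, t)` is a
    hypothetical stand-in for the frozen diffusion model.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in rendered]
    noisy = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
             for x, n in zip(rendered, noise)]
    predicted = denoiser(noisy, t)
    w = 1.0 - alpha_bar  # timestep weighting (one common choice)
    # Pushing the render against this gradient moves the 3D scene toward
    # images the diffusion model considers likely for the text prompt.
    return [w * (p - n) for p, n in zip(predicted, noise)]
```

In DreamFusion the "rendered" values come from a NeRF-style volume renderer, and this gradient is accumulated over many random viewpoints and timesteps.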