Deep dive into LLM

Embark on an in-depth exploration of LLMs with our comprehensive resources. This deep dive offers a thorough understanding of LLMs, equipping you with advanced knowledge and practical insights in the field.

Courses

LLMs: Foundation Models from the Ground Up by Databricks provides an in-depth exploration of foundation models, highlighting the key innovations that fueled the rise of transformer-based models like BERT, GPT, and T5. It also covers advanced techniques such as Flash Attention, LoRA, and PEFT, which continue to improve LLM capabilities, including applications like ChatGPT.
Mlabonne's LLM Course is a comprehensive guide to LLMs, featuring a roadmap, notebooks, and articles. It covers topics like LLM training, inference optimization techniques, and building frameworks, and offers a step-by-step path into large language models. (website)
Neural Networks: Zero to Hero by Andrej Karpathy is a course in which you build neural networks from the ground up, entirely in code. It starts with the fundamentals of backpropagation and progresses to modern deep neural networks like GPT. Language models are the primary focus because they are an excellent entry point into deep learning and the skills transfer across domains. (website)

Explainers

How Neural Networks Learned to Talk: A 30 Year History: This video traces the evolution of language models from modest beginnings to OpenAI's GPT models, and hints at Q*. It covers key moments in neural network research on next-word prediction, including early experiments with small language models in the 1980s and later significant contributions.
LLMs Process Explained what makes them tick & how they work: By AemonAlgiz, this explainer covers foundational concepts like softmax, tokenization, embedding, and positional encoding, then moves on to attention and multi-head attention, presented clearly enough for AI enthusiasts at all levels. It also addresses challenges and ongoing efforts in the field, such as accent adaptation and enhancing ASR system quality. For more detail, see Jason Wei's docs on LLMs.
LLM Visualization: Experience a 3D visualization and walkthrough of the LLM algorithm powering OpenAI's ChatGPT. Dive into the intricacies, exploring each addition and multiplication, and witness the entire process in action.
Prompt injection: What’s the worst that can happen? by Simon Willison explains prompt injection, a significant security concern in LLM applications for which there is no flawless remedy. Simon consistently produces exceptional content on AI-related topics.

Articles

ChatGPT Explained: A Normie's Guide To How It Works by Jon Stokes: An overview of ChatGPT focusing on core concepts, including the token window, training data, rules, and interactive token usage for conversation-like interactions. It explains the model's structure without anthropomorphism.
The Scaling Hypothesis: Explores the hypothesis that larger AI models, given ample data and resources, outperform smaller ones. It covers the impact on language models like GPT-3, the controversies and ongoing debates among researchers, the potential for human-level or superhuman AI, and how organizations like EleutherAI are testing its limits through open-source models.
Building LLM applications for production: Chip Huyen explores several significant hurdles encountered in developing LLM applications, offers solutions for tackling them, and highlights the most suitable use cases for these applications.
Chinchilla's wild implications: This post delves into language model scaling laws, particularly those from the DeepMind paper introducing Chinchilla. Chinchilla, with 70B parameters, outperforms much larger models trained on less data, which shows that parameter count alone does not determine performance and that the amount of training data matters just as much.
GPT-4: OpenAI's latest milestone is a versatile multimodal model that accepts text and image inputs and excels at creative and technical writing. It generates, edits, and collaborates with users, and it handles over 25k words, making it suitable for long-form content, conversations, and document analysis. Although advanced, it can still make reasoning errors and be gullible.
What Is ChatGPT Doing … and Why Does It Work? by Stephen Wolfram traces the development of AI from simple neural networks to complex language models like ChatGPT that leverage massive datasets and computing power to produce remarkably natural conversational text, giving insight into the inner workings and capabilities of modern AI.
The Waluigi Effect: Delves into the Waluigi Effect and unusual "semiotic" occurrences in large language models like GPT-3/3.5/4 and their variants (ChatGPT, Sydney), providing mechanistic insights.
New models and developer products announced at DevDay: OpenAI launches GPT-4 Turbo (128K context, lower prices), the Assistants API for agent-like experiences, the DALL-E 3 API, the Whisper v3 ASR model, user-friendly GPT customization, the Custom Models program, and reduced platform prices.
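
The scaling-law arithmetic discussed in the Chinchilla entry above can be illustrated with a short Python sketch. It assumes two commonly quoted rules of thumb that do not come from this page: roughly 20 training tokens per parameter for compute-optimal training, and training compute of about 6 FLOPs per parameter per token. The constant and function names are hypothetical, and the numbers are rough approximations rather than exact results.

```python
# Rules of thumb (assumptions, not taken from this page):
TOKENS_PER_PARAM = 20.0        # approx. compute-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6.0    # approx. training FLOPs per parameter per token


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate number of training tokens for a compute-optimal model."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs (C ~ 6 * N * D)."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    n_params = 1e9  # a hypothetical 1B-parameter model
    n_tokens = compute_optimal_tokens(n_params)
    print(f"tokens: {n_tokens:.2e}")
    print(f"FLOPs:  {training_flops(n_params, n_tokens):.2e}")
```

Under these assumptions, doubling the parameter count doubles the recommended token count and quadruples the training compute, which is why data availability becomes a constraint as models grow.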

State of LLMs

Intro to LLMs by Andrej Karpathy provides a general-audience introduction to Large Language Models, the key technical element in systems like ChatGPT, Claude, and Bard. It covers their nature, future directions, analogies to current operating systems, and touches on security challenges in this emerging computing paradigm.
State of GPT by Andrej Karpathy: Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.

LLM Benchmarks

Chatbot Arena Leaderboard: A benchmark platform for LLMs with a leaderboard based on the Elo rating system used in competitive games like chess. It encourages the entire community to participate by submitting new models, evaluating their performance, and taking part in LLM battles. (paper) (website)
Open LLM Leaderboard: A ranking by Hugging Face, comparing open source LLMs across a collection of standard benchmarks and tasks. It enables transparent comparisons on metrics like accuracy and compute efficiency across models to help guide appropriate model selection and usage for various applications.
Stanford HELM Leaderboard: HELM is a dynamic language model benchmark, providing comprehensive coverage and addressing historical gaps in AI evaluations. It benchmarks models rigorously under standardized conditions, using a top-down approach to facilitate systematic scenario and metric selection.
Multi-task Language Understanding on MMLU: A comparison of LLMs on the Massive Multitask Language Understanding (MMLU) dataset. Covering 57 diverse tasks, from math to law, MMLU measures multitask accuracy, identifies shortcomings, and tracks progress in language understanding. (paper) (code)
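
The Elo rating system behind the Chatbot Arena leaderboard listed above can be illustrated with a minimal Python sketch. It assumes the standard chess formulation (a logistic expected score on a 400-point scale) and a hypothetical K-factor of 32. The function names are illustrative, and the leaderboard's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head battle.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    k is an assumed K-factor controlling how fast ratings move.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Points gained by one side are lost by the other.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start at the same rating; model A wins.
    print(update_elo(1000.0, 1000.0, score_a=1.0))
```

With equal starting ratings the expected score is 0.5, so a win moves each rating by half the K-factor. An upset win against a higher-rated model moves the ratings further.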

Explorations

AutoGPT: An open-source demonstration of GPT-4's capabilities as an autonomous agent. It offers internet access, memory management, and text generation, benchmarks agent performance, and can complete tasks with minimal human intervention by prompting itself. (website)
BabyAGI is an open-source Python script that demonstrates an AI-powered task management system. It uses an LLM together with a vector database to create, prioritize, and execute tasks in a loop, deriving each new task from a predefined objective and the results of previous tasks.
MemGPT expands context in LLMs by managing memory tiers and using interrupts for user interaction. It can analyze large documents, enabling conversational agents to evolve during long-term interactions. It's an OS-inspired system for extended context within LLMs. (paper)
Ollama is a versatile software tool for running LLMs like Llama 2 on your local computer. It streamlines setup, optimizes GPU utilization, and consolidates model components into a single package, allowing for easy customization and model creation. (website)
Open Interpreter enables LLMs to execute code on a user's computer for various tasks. It offers a natural-language interface for tasks like photo and video editing, PDF creation, Chrome browser control, and data analysis. Users interact with Open Interpreter through a ChatGPT-style terminal interface, allowing local execution of Python, JavaScript, Shell, and more.
AutoGen is a versatile framework for LLM applications, facilitating the creation of conversational agents capable of collaborating on tasks. These customizable agents support human interaction and operate in diverse modes that combine LLMs, human inputs, and tools.
GPT-FAST: A lightweight, PyTorch-native transformer text generation implementation by PyTorch Labs, built for efficient text generation and evaluation. It is optimized for on-device LLM inference and for 4-bit quantization performance on various hardware.

Advancements

ReAct: Synergizing Reasoning and Acting in Language Models: A paradigm that combines reasoning and acting in language models. The model interleaves verbal reasoning traces with text actions, which lets it reason dynamically and adapt to external input for improved performance. (paper)
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models: MetaMath is a project for bootstrapping mathematical questions for language models. It builds the MetaMathQA dataset and fine-tunes LLaMA-2 models on it, creating specialized mathematical reasoning models. The resulting models show a significant performance lead over models of the same size on the GSM8K and MATH benchmarks. (model) (code) (paper)
Alpaca: A Strong, Replicable Instruction-Following Model (2023): Alpaca 7B, a model fine-tuned from LLaMA 7B on 52k instruction-following demonstrations, behaves qualitatively similarly to OpenAI's text-davinci-003 on single-turn instruction following in the authors' initial evaluation. Alpaca is also compact and is straightforward and cost-effective to replicate. (code) (paper)

Insights

State of AI Report 2023: Provides an exhaustive overview of AI, encompassing technology breakthroughs, industry developments, politics, safety, economic impacts, and future predictions. It encourages contributions from the AI community, fostering informed discussions about AI's future.
A Survey of Large Language Models (2023): Offers a comprehensive overview of the evolving landscape of LLMs, exploring their capabilities, applications, and challenges in NLP.
2023 State of AI in 14 Charts: A snapshot of what happened this past year in AI research, education, policy, hiring, and more.
The AI Index Report: Measuring trends in AI: Compiled by Stanford's HAI from unbiased, globally sourced AI data. The 2023 report adds extra self-collected data and fresh analysis, focusing on foundation models, geopolitics, training costs, AI's environmental impact, and public opinion trends.
The state of AI in 2023: Generative AI’s breakout year by McKinsey discusses the explosive growth of generative AI tools.

Papers

Language Models are Few-Shot Learners (2020): Scaling language models significantly improves task-agnostic, few-shot performance, sometimes competing with fine-tuning approaches. GPT-3, with 175 billion parameters, performs well across NLP tasks without gradient updates or fine-tuning, with tasks specified solely through text interaction. Strong performance is noted in translation, question-answering, and cloze tasks, with challenges and methodological issues identified in certain datasets.
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023): An initial assessment by Microsoft Research of GPT-4, at the time the most sophisticated LLM available, in comparison to human cognitive abilities.
