Natural Language Processing Papers
Papers with Code NLP papers section: Provides access to research papers along with the corresponding code.
- MemGPT: Towards LLMs as Operating Systems (Oct 2023)
- Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models (Oct 2023)
- Introducing The Foundation Model Transparency Index
- Improving Image Generation with Better Captions
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Oct 2023)
- Improved Baselines with Visual Instruction Tuning (Oct 2023)
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models (Oct 2023)
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation (Oct 2023)
- RealFill: Reference-Driven Generation for Authentic Image Completion (Sept 2023)
- GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models (Aug 2023)
Language Modelling
- Generating Sequences With Recurrent Neural Networks (2013)
- Semi-supervised Sequence Learning (2015)
- End-To-End Memory Networks (2015)
- Listen, Attend and Spell (2015)
- Regularizing and Optimizing LSTM Language Models (2017)
- Deep contextualized word representations (2018)
- Universal Language Model Fine-tuning for Text Classification (2018)
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018)
- DARTS: Differentiable Architecture Search (2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019)
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (2019)
- Language Models are Few-Shot Learners (2020)
Text Generation
- Generating Sequences With Recurrent Neural Networks (2013)
- Show and Tell: A Neural Image Caption Generator (2014)
- Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (2016)
- SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient (2017)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2019)
- Learning Transferable Visual Models From Natural Language Supervision (2021)
Sentiment Analysis
- Bag of Tricks for Efficient Text Classification (2016)
- A Structured Self-attentive Sentence Embedding (2017)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Universal Language Model Fine-tuning for Text Classification (2018)
- Deep contextualized word representations (2018)
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019)
Text Classification
- Semi-supervised Sequence Learning (2015)
- Bag of Tricks for Efficient Text Classification (2016)
- FastText.zip: Compressing text classification models (2016)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Universal Language Model Fine-tuning for Text Classification (2018)
Key Papers
- Efficient Estimation of Word Representations in Vector Space (2013): Introduced two model architectures for word vector representations derived from large datasets, outperforming existing methods in word similarity tasks with enhanced accuracy and lower computational demands. These vectors excel in measuring syntactic and semantic word similarities. A minimal training sketch appears after this list.
- GloVe: Global Vectors for Word Representation (2014): Proposed GloVe, a model that learns word meanings from co-occurrence statistics. It uses a global co-occurrence matrix to derive word vectors, demonstrating superior performance in various word analogy and similarity tasks compared to other methods. Its weighted least-squares objective is reproduced after this list.
- Deep contextualized word representations (2018): Introduces novel word representations capturing both meaning and context in sentences. These representations stem from a deep bidirectional language model trained on extensive text. They excel in various NLP tasks, such as sentiment analysis and named entity recognition, surpassing other techniques in performance. The layer-weighting formula is given after this list.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Presents BERT, a pre-trained deep bidirectional transformer model. Trained on extensive text data with a masked language modeling objective, it excels in NLP tasks like question answering and sentiment analysis, surpassing alternative methods in performance and versatility. A masked-prediction sketch follows this list.
- Attention Is All You Need (2017): Introduced the Transformer, a groundbreaking attention-based architecture for sequence processing. It outperforms other methods in NLP tasks like machine translation and language modeling, demonstrating its effectiveness in capturing contextual information from input sequences such as sentences. Scaled dot-product attention is sketched after this list.
- Sequence to Sequence Learning with Neural Networks (2014): Introduces the sequence-to-sequence model, a neural network for tasks like machine translation and text summarization. Comprising an encoder and a decoder, it effectively processes input sequences and generates output sequences, outperforming other methods in a range of NLP tasks. The underlying factorization is written out after this list.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020): Introduces a versatile fine-tuning method for retrieval-augmented generation (RAG) models, combining parametric and non-parametric memory for language generation. These models employ a pre-trained neural retriever to fetch Wikipedia passages for the input, achieving state-of-the-art results on knowledge-intensive NLP tasks. RAG models offer more precise, diverse, and factual language generation. A toy retrieve-then-generate loop follows this list.
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019): Proposes Transformer-XL, a language model designed for longer contexts. It uses segment-level recurrence, reusing hidden states from previous segments to process extended sequences, and outperforms the original Transformer and other models on language modeling tasks. The recurrence is spelled out after this list.
- Rethinking Attention with Performers (2020): Introduced Transformer architectures that estimate full-rank softmax attention with linear complexity, without relying on sparsity or low-rank assumptions. Leveraging Fast Attention Via positive Orthogonal Random features (FAVOR+), they efficiently approximate attention kernels, extending beyond softmax to other kernelizable attention mechanisms. Performers demonstrate strong accuracy across diverse tasks, showcasing a novel attention-learning paradigm. A random-feature sketch follows this list.
- End-to-End Object Detection with Transformers (2020): Presents DETR, a novel object detection method that views the task as a direct set prediction problem, eliminating hand-designed components like non-maximum suppression and anchor generation. Using a set-based global loss and a transformer encoder-decoder architecture, DETR reasons about object relations and delivers accurate predictions in parallel. It performs on par with Faster R-CNN on COCO, showcasing simplicity and generalizability to panoptic segmentation. The bipartite matching step is sketched after this list.
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (2023): The study reveals that transformer models, pre-trained on a mix of diverse data sources like news, books, and code, exhibit constrained "model selection" abilities. They excel in tasks aligned with their training domains but struggle in mismatched ones. While pretraining on varied data holds potential for enhancing model flexibility, further research is required to broaden transformers' task generalization.
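To make the word2vec idea from Efficient Estimation of Word Representations in Vector Space concrete, here is a minimal skip-gram training sketch. It uses gensim's Word2Vec purely as illustrative tooling (the paper released its own C tool), and the toy corpus and hyperparameter values are arbitrary rather than the paper's settings.

```python
# Minimal sketch: skip-gram word vectors with gensim (illustrative stand-in for the paper's tool).
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

# sg=1 selects the skip-gram architecture; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Cosine-similar words according to the learned vectors.
print(model.wv.most_similar("king", topn=3))
```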
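For GloVe: Global Vectors for Word Representation, the objective referenced above is a weighted least-squares fit to the log co-occurrence counts, using the paper's notation:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\left( w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
(x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
```

Here w_i and \tilde{w}_j are word and context vectors, b_i and \tilde{b}_j are biases, X_{ij} counts co-occurrences, and f down-weights rare and very frequent pairs.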
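For Deep contextualized word representations (ELMo), the task-specific representation of token k is a learned weighted combination of the bidirectional language model's layer outputs:

```latex
\mathrm{ELMo}_k^{\text{task}} = \gamma^{\text{task}} \sum_{j=0}^{L} s_j^{\text{task}} \, h_{k,j}^{LM}
```

where h_{k,j}^{LM} is the layer-j representation of token k, the s_j^{task} are softmax-normalized weights, and gamma^{task} is a scalar, all learned for the downstream task.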
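A quick way to see BERT's masked language modeling objective in action is the Hugging Face fill-mask pipeline; this is illustrative tooling, not the code released with the paper.

```python
# Minimal sketch: BERT fills in a masked token using context from both directions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The movie was absolutely [MASK].", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```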
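For Attention Is All You Need, the core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch for a single head without masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

# Toy example: 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)     # (3, 8)
```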
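For Sequence to Sequence Learning with Neural Networks, the encoder compresses the input into a fixed-dimensional vector v (the final LSTM hidden state in the paper), and the decoder factorizes the output distribution autoregressively:

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T) = \prod_{t=1}^{T'} p\!\left(y_t \mid v, y_1, \dots, y_{t-1}\right)
```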
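For Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, the sketch below shows only the shape of the retrieve-then-generate loop. The paper pairs a dense DPR retriever over Wikipedia with a BART generator; here a TF-IDF retriever over a toy corpus and a plain string template stand in for both, so every passage, name, and value is illustrative.

```python
# Toy retrieve-then-generate loop illustrating the RAG pipeline shape.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Paris is the capital and most populous city of France.",
    "The Transformer architecture relies entirely on attention mechanisms.",
    "GloVe learns word vectors from global co-occurrence statistics.",
]

vectorizer = TfidfVectorizer().fit(passages)
passage_vectors = vectorizer.transform(passages)

def answer(question: str, top_k: int = 1) -> str:
    # Non-parametric memory: retrieve the most relevant passage(s).
    scores = cosine_similarity(vectorizer.transform([question]), passage_vectors)[0]
    retrieved = [passages[i] for i in scores.argsort()[::-1][:top_k]]
    # Parametric memory would be a seq2seq model conditioned on question + passages;
    # here a template simply shows what the generator would be given.
    return f"question: {question} context: {' '.join(retrieved)}"

print(answer("What is the capital of France?"))
```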
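For Transformer-XL, the segment-level recurrence caches the previous segment's hidden states and concatenates them, with gradients stopped, to the current segment, so keys and values attend over an extended context while queries come from the current segment only:

```latex
\tilde{h}_{\tau+1}^{\,n-1} = \left[\mathrm{SG}\!\left(h_{\tau}^{\,n-1}\right) \circ h_{\tau+1}^{\,n-1}\right],
\qquad
h_{\tau+1}^{\,n} = \text{Transformer-Layer}\!\left(q = h_{\tau+1}^{\,n-1},\; k, v = \tilde{h}_{\tau+1}^{\,n-1}\right)
```

where SG denotes stop-gradient and the circle denotes concatenation along the sequence dimension.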
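For Rethinking Attention with Performers, a minimal sketch of the FAVOR+ idea: positive random features phi(x) = exp(omega . x - ||x||^2 / 2) / sqrt(m) satisfy E[phi(q) . phi(k)] = exp(q . k), so softmax attention can be reordered into linear-complexity matrix products. The feature count and seed below are arbitrary, and the real mechanism additionally orthogonalizes the random projections.

```python
# Sketch of FAVOR+ positive random features approximating softmax attention
# without ever forming the N x N attention matrix.
import numpy as np

def favor_plus_features(x, omegas):
    # phi(x) = exp(omega . x - ||x||^2 / 2) / sqrt(m), with strictly positive entries.
    m = omegas.shape[0]
    return np.exp(x @ omegas.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, num_features=256, seed=0):
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    omegas = rng.normal(size=(num_features, d))     # FAVOR+ additionally orthogonalizes these
    q_prime = favor_plus_features(Q / d**0.25, omegas)
    k_prime = favor_plus_features(K / d**0.25, omegas)
    # Linear-complexity reordering: (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V).
    numerator = q_prime @ (k_prime.T @ V)
    denominator = q_prime @ k_prime.sum(axis=0, keepdims=True).T
    return numerator / denominator

rng = np.random.default_rng(1)
Q = K = V = rng.normal(size=(16, 32))
print(performer_attention(Q, K, V).shape)           # (16, 32)
```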
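For End-to-End Object Detection with Transformers, DETR's set-based loss first computes a one-to-one bipartite matching between predictions and ground-truth objects. The sketch below uses SciPy's Hungarian solver with a plain L1 box cost; DETR's actual matching cost also includes class probabilities and generalized IoU, so the boxes and costs here are purely illustrative.

```python
# Sketch of the bipartite (Hungarian) matching behind DETR's set prediction loss.
import numpy as np
from scipy.optimize import linear_sum_assignment

predictions = np.array([[0.52, 0.48, 0.20, 0.30],    # predicted boxes (cx, cy, w, h)
                        [0.10, 0.90, 0.05, 0.05],
                        [0.75, 0.25, 0.40, 0.35]])
ground_truth = np.array([[0.50, 0.50, 0.22, 0.28],   # ground-truth boxes
                         [0.80, 0.20, 0.38, 0.40]])

# Cost matrix: rows are predictions, columns are ground-truth objects (L1 box distance).
cost = np.abs(predictions[:, None, :] - ground_truth[None, :, :]).sum(-1)
pred_idx, gt_idx = linear_sum_assignment(cost)
for p, g in zip(pred_idx, gt_idx):
    print(f"prediction {p} matched to ground-truth {g} (cost {cost[p, g]:.3f})")
# Unmatched predictions are trained to predict the "no object" class.
```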