Techniques for handling context in large language models

This post is related to:

  1. BART configuration parameters overview

Large Language Models (LLMs) often struggle with long inputs because of fixed context-window (token) limits and memory constraints. To address this, researchers and engineers have developed a range of techniques and tools for managing context more effectively. Below is a detailed list of techniques and associated tools.

Techniques

1. Windowing (Sliding Window Technique)

  • Keeps only the most recent tokens of the input in the prompt and slides the window forward as new text arrives, discarding (or overlapping) older tokens.

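As a rough illustration, a sliding window over a token sequence can be sketched in a few lines of plain Python; the `window_size` and `stride` parameters here are illustrative, not tied to any particular model:

```python
def sliding_window(tokens, window_size, stride):
    """Yield successive windows of at most `window_size` tokens,
    advancing `stride` tokens each step so consecutive windows
    overlap by (window_size - stride) tokens."""
    for start in range(0, max(len(tokens) - window_size, 0) + 1, stride):
        yield tokens[start:start + window_size]

windows = list(sliding_window(list(range(10)), window_size=4, stride=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

A smaller stride gives more overlap between windows (better continuity, more redundant computation); a stride equal to the window size gives disjoint windows.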

2. Chunking and Overlapping Contexts

  • Splits long documents into fixed-size chunks that share a small overlap, so information straddling a chunk boundary is preserved intact in at least one chunk.

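A minimal word-based chunker with overlap might look like the following; real pipelines usually split on tokens or sentences rather than whitespace, and the sizes below are arbitrary:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split `text` into word-based chunks of up to `chunk_size` words,
    where each chunk repeats the last `overlap` words of the previous
    one so facts at a boundary appear whole in at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk is then embedded or processed independently, and the overlap keeps boundary-spanning sentences retrievable.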

3. Hierarchical Context Representations

  • Represents text at several levels of granularity (e.g., sentence, paragraph, and document summaries), letting the model work from a compact outline and expand detail only where needed.

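One very simple two-level sketch: keep the first sentence of each paragraph as a compact top level, with the full paragraphs held for expansion. This is a toy stand-in; real systems typically build the upper levels with learned summaries or embeddings:

```python
def hierarchical_context(document):
    """Build a two-level representation of a document: a compact
    outline (first sentence of each paragraph) plus the full
    paragraphs, so a prompt can include the outline and pull in
    full paragraphs only when they are relevant."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    outline = [p.split(". ")[0] for p in paragraphs]
    return {"outline": outline, "paragraphs": paragraphs}
```

The prompt then carries the short outline by default, and individual paragraphs are swapped in on demand.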

4. Memory-Augmented Neural Networks (MANNs)

  • Couple a neural network with an external, addressable memory that can be written to and read from, extending effective context beyond the model's fixed window.


5. Transformer Variants for Long Contexts

Longformer

  • Replaces full self-attention with a sliding-window pattern plus a few global tokens, so attention cost grows linearly with sequence length.

Reformer

  • Approximates attention with locality-sensitive hashing and uses reversible layers to reduce memory use on long sequences.


6. Compression-Based Context Management (Summarization)

  • Condenses older context into a shorter summary, freeing tokens in the window while preserving the gist of the conversation or document.

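A sketch of the idea, with a deliberately naive extractive summarizer (first sentence of each old message) standing in for the LLM summarization call a real system would make:

```python
def compress_context(messages, budget_words, keep_last=2):
    """If the history exceeds the word budget, replace everything
    except the most recent `keep_last` messages with a single
    summary line. The summarizer below is a naive extractive
    stand-in; in practice this would be an LLM call."""
    def summarize(texts):
        # Keep only the first sentence of each old message.
        return "Summary: " + " ".join(t.split(".")[0].strip() + "." for t in texts)

    total = sum(len(m.split()) for m in messages)
    if total <= budget_words or len(messages) <= keep_last:
        return messages
    return [summarize(messages[:-keep_last])] + messages[-keep_last:]
```

Keeping the last few messages verbatim preserves immediate conversational state while the summary carries the long-range gist.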

7. Causal Attention Mechanisms

  • Mask attention so each token can attend only to earlier positions; this is the standard mechanism in autoregressive decoding.

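The causal constraint is usually implemented as a lower-triangular mask over attention scores; a plain-Python sketch of that mask (1 = attention allowed, 0 = future token, blocked):

```python
def causal_mask(seq_len):
    """Lower-triangular causal mask: position i may attend
    only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

mask = causal_mask(4)
# mask[0] == [1, 0, 0, 0]; mask[3] == [1, 1, 1, 1]
```

In a real transformer the zeros are applied as -inf before the softmax, so blocked positions receive zero attention weight.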

8. Vector-Based Semantic Search (Dense Vector Representations)

  • Encodes passages as dense vectors and retrieves the ones most semantically similar to the query for inclusion in the prompt.

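At its core this is nearest-neighbour search by cosine similarity. The toy 3-dimensional vectors below stand in for real embeddings, and the brute-force sort stands in for an approximate-nearest-neighbour index:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, index, top_k=2):
    """Rank stored (text, vector) pairs by cosine similarity to the
    query vector and return the top_k texts."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Production systems replace the brute-force scan with an index such as FAISS or a vector database, but the similarity criterion is the same.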

9. Retrieval-Augmented Generation (RAG)

  • Retrieves relevant documents from an external store at query time and conditions generation on them, rather than packing everything into the context window.

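A minimal end-to-end sketch of the retrieve-then-prompt step, using word overlap as a crude stand-in for embedding similarity and leaving the final LLM call out:

```python
def rag_prompt(question, corpus, top_k=2):
    """Assemble a retrieval-augmented prompt: score each document by
    word overlap with the question (a toy stand-in for embedding
    similarity), then prepend the top matches as context."""
    q_words = set(question.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The assembled prompt is then sent to the model, which answers grounded in the retrieved passages instead of relying on parametric memory alone.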

10. Cache-Augmented Models for Recurrent Usage

  • Reuse previously computed results (e.g., attention key-value states or full responses) across repeated or recurrent calls, avoiding redundant computation.

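Serving stacks typically cache attention key-value states; a simpler response-level version of the same idea can be sketched as memoization keyed by a hash of the prompt. `model_fn` here is any callable mapping a prompt string to a response:

```python
import hashlib

class ResponseCache:
    """Memoize model responses keyed by a hash of the prompt, so
    repeated prompts in a recurrent workflow skip the expensive
    model call."""
    def __init__(self, model_fn):
        self.model_fn = model_fn
        self.store = {}
        self.hits = 0

    def generate(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        response = self.model_fn(prompt)
        self.store[key] = response
        return response
```

This only helps when prompts repeat exactly; fuzzier reuse (semantic caching) keys the cache on embeddings rather than hashes.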

Tools and Frameworks

1. Pinecone

  • Provides vector-based memory and semantic search capabilities.

2. LangChain

  • Handles chunking, memory management, and retrieval tasks for LLMs.

3. FAISS

  • Optimized for efficient similarity search on dense vectors.

4. Weaviate

  • Offers scalable vector search and knowledge graph integrations.

5. Hugging Face’s Transformers

  • Implements state-of-the-art transformer models like Longformer and Reformer.

6. OpenAI Embeddings

  • Provides embeddings for vector search and semantic tasks.

7. Redis Vector Store

  • Lightweight memory storage for vectorized data.

8. GPT Index (LlamaIndex)

  • Automates document splitting, chunking, and embedding management for LLMs.

9. Haystack

  • An open-source framework for retrieval-augmented generation (RAG).

10. MemGPT

  • Enhances memory for multi-session GPT-based interactions.