Large Language Models (LLMs) often struggle with long contexts because of fixed context-window (token) limits and the memory cost of attention. To address this, researchers and engineers have developed a range of techniques and tools for managing context. Below is a list of those techniques, each with a representative paper, followed by associated tools and frameworks.
Techniques
1. Windowing (Sliding Window Technique)
- Title: Training RNN and it’s Variants Using Sliding Window Technique. 2020
- DOI: 10.1109/SCEECS48394.2020.93
- Read paper ‘Training RNN and it’s Variants Using Sliding Window Technique. 2020’ on sci-hub.se
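As a rough illustration of the idea, here is a minimal, dependency-free sketch of windowing: the tokenized document is cut into fixed-size windows that advance by a stride, so the model only ever sees one window at a time. The window size and stride are placeholder values, not recommendations.

```python
def sliding_windows(tokens, window_size=512, stride=256):
    """Yield overlapping token windows; `stride` controls how far each window advances."""
    if len(tokens) <= window_size:
        yield tokens
        return
    for start in range(0, len(tokens) - window_size + 1, stride):
        yield tokens[start:start + window_size]
    # Cover the tail if the last full window did not reach the end of the sequence.
    if start + window_size < len(tokens):
        yield tokens[-window_size:]

tokens = list(range(2000))  # stand-in for a tokenized document
windows = list(sliding_windows(tokens))
print(len(windows), len(windows[0]))  # 7 512
```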
2. Chunking and Overlapping Contexts
- Title: Hierarchical Attention Networks for Document Classification
- DOI: 10.18653/v1/N16-1174
- Read ‘Hierarchical Attention Networks for Document Classification’ on aclanthology.org
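A minimal sketch of overlapping chunking, operating on raw characters for simplicity (real pipelines usually chunk by tokens or sentences); the chunk size and overlap are illustrative values only.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks that share `overlap` characters with the previous chunk."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks overlap
    return chunks

doc = "lorem ipsum " * 600  # stand-in for a long document
pieces = chunk_text(doc)
print(len(pieces), len(pieces[0]))
```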
3. Hierarchical Context Representations
- Title: Hierarchical Learning for Generation with Long Source Sequences. 2021
- DOI: 10.48550/arXiv.2104.07545
- Read ‘Hierarchical Learning for Generation with Long Source Sequences. 2021’ on arxiv.org
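To make the hierarchical idea concrete, here is a small sketch that builds sentence vectors, averages them into paragraph vectors, and averages those into a single document vector. The `embed` function is a random-vector stand-in for a real sentence encoder, and the 384-dimensional size is arbitrary.

```python
import numpy as np

def embed(sentence: str) -> np.ndarray:
    # Stand-in encoder: in practice this would be a real sentence-embedding model.
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(384)

def hierarchical_representation(document: list[list[str]]) -> np.ndarray:
    """Bottom-up: sentence vectors -> paragraph vectors (mean) -> one document vector (mean)."""
    paragraph_vectors = [
        np.mean([embed(sentence) for sentence in paragraph], axis=0)
        for paragraph in document
    ]
    return np.mean(paragraph_vectors, axis=0)

doc = [["First sentence.", "Second sentence."], ["Another paragraph here."]]
print(hierarchical_representation(doc).shape)  # (384,)
```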
4. Memory-Augmented Neural Networks (MANNs)
- Title: One-shot Learning with Memory-Augmented Neural Networks. 2016
- DOI: 10.48550/arXiv.1605.06065
- Read ‘One-shot Learning with Memory-Augmented Neural Networks. 2016’ on arxiv.org
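As a toy illustration of the external-memory idea behind MANNs (not the training procedure from the paper), the sketch below keeps a fixed number of key-value slots, writes them sequentially, and reads with softmax attention over cosine similarities.

```python
import numpy as np

class ExternalMemory:
    """Toy content-addressable memory: write (key, value) slots, read by cosine similarity."""
    def __init__(self, slots=128, dim=64):
        self.keys = np.zeros((slots, dim))
        self.values = np.zeros((slots, dim))
        self.next_slot = 0

    def write(self, key, value):
        i = self.next_slot % len(self.keys)  # overwrite the oldest slot when full
        self.keys[i], self.values[i] = key, value
        self.next_slot += 1

    def read(self, query):
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        sims = self.keys @ query / norms
        weights = np.exp(sims) / np.exp(sims).sum()  # softmax attention over slots
        return weights @ self.values

mem = ExternalMemory()
mem.write(np.ones(64), np.full(64, 0.5))
print(mem.read(np.ones(64)).shape)  # (64,)
```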
5. Transformer Variants for Long Contexts
Longformer
- Title: Longformer: The Long-Document Transformer. 2020
- DOI: 10.48550/arXiv.2004.05150
- Read ‘Longformer: The Long-Document Transformer. 2020’ on arxiv.org
Reformer
- Title: Reformer: The Efficient Transformer. 2020
- DOI: 10.48550/arXiv.2001.04451
- Read ‘Reformer: The Efficient Transformer. 2020’ on arxiv.org
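Longformer, for example, is available through Hugging Face's Transformers. The sketch below assumes the `allenai/longformer-base-4096` checkpoint and a local PyTorch install; it encodes a long input using sliding-window attention plus a single global token.

```python
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, LongformerModel

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = "word " * 3000  # stand-in for a document far beyond a 512-token window
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Longformer combines sliding-window local attention with a few global tokens;
# here only the first token (position 0) attends globally.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```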
6. Compression-Based Context Management (Summarization)
- Title: Efficient Adaptation of Pretrained Transformers for Abstractive Summarization. 2019
- DOI: 10.48550/arXiv.1906.00138
- Read ‘Efficient Adaptation of Pretrained Transformers for Abstractive Summarization. 2019’ on arxiv.org
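One practical way to apply this is to summarize older conversation turns and keep only the most recent turns verbatim. The sketch below assumes a Transformers summarization pipeline with an illustrative checkpoint (`sshleifer/distilbart-cnn-12-6`); the turn counts and length limits are placeholders.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Model choice is illustrative; any seq2seq summarizer works here.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def compress_history(history: list[str], keep_last: int = 2) -> list[str]:
    """Summarize older turns into one entry and keep the most recent turns verbatim."""
    if len(history) <= keep_last:
        return history
    old_text = " ".join(history[:-keep_last])
    summary = summarizer(
        old_text, max_length=80, min_length=20, do_sample=False, truncation=True
    )[0]["summary_text"]
    return [f"Summary of earlier context: {summary}"] + history[-keep_last:]
```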
7. Causal Attention Mechanisms
- Title: Attention Is All You Need. 2017
- DOI: 10.48550/arXiv.1706.03762
- Read ‘Attention Is All You Need. 2017’ on arxiv.org
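Causal (masked) attention simply forbids each position from attending to later positions. Here is a minimal single-head PyTorch sketch, with queries, keys, and values collapsed into one tensor for brevity:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention where each position only sees itself and earlier positions."""
    seq_len, dim = x.shape
    scores = (x @ x.T) / dim ** 0.5                     # (seq_len, seq_len) similarity scores
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))   # block attention to future tokens
    return F.softmax(scores, dim=-1) @ x

x = torch.randn(6, 16)  # 6 tokens with 16-dimensional representations
print(causal_self_attention(x).shape)  # torch.Size([6, 16])
```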
8. Vector-Based Semantic Search (Dense Vector Representations)
- Title: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019
- DOI: 10.48550/arXiv.1908.10084
- Read ‘Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019’ on arxiv.org
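A short sketch of dense semantic search with the `sentence-transformers` library; the `all-MiniLM-L6-v2` model and the example passages are arbitrary choices, not part of the paper.

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

passages = [
    "The sliding window technique limits how much context the model sees at once.",
    "FAISS performs fast nearest-neighbour search over dense vectors.",
    "Paris is the capital of France.",
]
passage_embeddings = model.encode(passages, convert_to_tensor=True)

query_embedding = model.encode("How do I search embeddings efficiently?", convert_to_tensor=True)
scores = util.cos_sim(query_embedding, passage_embeddings)[0]
best = int(scores.argmax())
print(passages[best])  # expected to match the FAISS passage
```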
9. Retrieval-Augmented Generation (RAG)
- Title: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. 2020
- DOI: 10.48550/arXiv.2005.11401
- Read ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. 2020’ on arxiv.org
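A minimal RAG-style sketch, assuming `sentence-transformers` for embeddings and FAISS for retrieval: the most relevant passages are retrieved and prepended to the question, and the resulting prompt would then go to any generative model (the generation step itself is omitted).

```python
# Requires: pip install sentence-transformers faiss-cpu
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Longformer uses sliding-window attention to process long documents.",
    "Reformer uses locality-sensitive hashing to approximate attention.",
    "RAG retrieves supporting passages and conditions generation on them.",
]

# Build the vector index once, offline.
embeddings = encoder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(embeddings)

def build_rag_prompt(question: str, k: int = 2) -> str:
    """Retrieve the k most relevant documents and prepend them to the question."""
    query = encoder.encode([question], normalize_embeddings=True)
    _, ids = index.search(query, k)
    context = "\n".join(documents[i] for i in ids[0])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_rag_prompt("How does Reformer make attention cheaper?"))
# The resulting prompt would then be sent to any generative LLM.
```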
10. Cache-Augmented Models for Recurring Usage
- Title: RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. 2024
- DOI: 10.48550/arXiv.2404.12457
- Read ‘RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation. 2024’ on arxiv.org
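RAGCache itself caches intermediate computation for retrieved documents; as a much simpler illustration of the general caching idea, the sketch below memoizes retrieval results for repeated queries. The `retrieve` function is a hypothetical placeholder for a real vector-search call.

```python
import hashlib

# Simple exact-match cache in front of an expensive retrieval step.
_cache: dict[str, list[str]] = {}

def retrieve(query: str) -> list[str]:
    # Placeholder for the real, expensive retrieval call (e.g. a FAISS or Pinecone lookup).
    return [f"document relevant to: {query}"]

def cached_retrieve(query: str) -> list[str]:
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = retrieve(query)
    return _cache[key]

cached_retrieve("What is RAGCache?")   # misses the cache, runs retrieval
cached_retrieve("what is ragcache?")   # normalized to the same key, served from the cache
print(len(_cache))  # 1
```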
Tools and Frameworks
1. Pinecone
- Provides vector-based memory and semantic search capabilities.
2. LangChain
- Handles chunking, memory management, and retrieval tasks for LLMs (see the code sketch at the end of this post).
3. FAISS (Facebook AI Similarity Search)
- Optimized for efficient similarity search on dense vectors.
4. Weaviate
- Offers scalable vector search and knowledge graph integrations.
5. Hugging Face’s Transformers
- Implements state-of-the-art transformer models like Longformer and Reformer.
6. OpenAI Embeddings
- Provides embeddings for vector search and semantic tasks.
7. Redis Vector Store
- Provides in-memory storage and similarity search for vector data.
8. LlamaIndex (formerly GPT Index)
- Automates document splitting, chunking, and embedding management for LLMs.
9. Haystack
- An open-source framework for retrieval-augmented generation (RAG).
10. MemGPT
- Manages long-term memory for multi-session LLM interactions by paging information in and out of the context window.
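To show one of these tools in action, here is how LangChain's recursive character splitter handles the chunking-with-overlap step described earlier; the import path depends on the installed LangChain version, and the parameter values are illustrative.

```python
# Requires: pip install langchain-text-splitters
# (older LangChain versions expose this as `from langchain.text_splitter import ...`)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text("A long document about context management. " * 200)
print(len(chunks), len(chunks[0]))
```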