Our approach compresses accumulating contexts into a compact memory space, reducing memory requirements during inference and enhancing computational efficiency.
Interactive demo of our context compression approach with a finetuned LLaMA-7B. Check out our GitHub repository to run the demos.
Summary
Our approach dynamically creates a compressed memory of attention keys/values during LLM interactions (see the first sketch below).
Our approach only requires training a conditional LoRA for compression (second sketch below).
We use a fully parallelized training strategy for the recurrent compression procedure (third sketch below).
We conduct evaluations on diverse applications: conversation, multi-task ICL, and personalization, achieving the performance level of a full-context model with a 5x smaller context memory space.
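The memory update itself can be pictured in a few lines of PyTorch. The sketch below is illustrative only: it assumes a hypothetical interface where a layer's attention keys/values and the positions of the <C> compression tokens are available, and it shows a concatenation-style update (other merging variants are possible).

```python
# Minimal sketch of the recurrent memory update at inference time.
# Assumed (not from the released code): keys/values come per layer as
# (batch, heads, seq_len, head_dim) tensors, and `comp_positions` indexes
# where the <C> compression tokens sit in the just-processed context.
import torch


def update_memory(memory, new_keys, new_values, comp_positions):
    """Keep only the keys/values produced at the <C> positions and merge
    them into the running compressed memory (concatenation shown here)."""
    comp_k = new_keys[:, :, comp_positions, :]
    comp_v = new_values[:, :, comp_positions, :]
    if memory is None:
        return comp_k, comp_v
    mem_k, mem_v = memory
    return torch.cat([mem_k, comp_k], dim=2), torch.cat([mem_v, comp_v], dim=2)


# Toy usage: two interaction rounds, 2 heads, head_dim 8,
# 10 context tokens followed by 2 compression tokens per round.
memory = None
for _ in range(2):
    k = torch.randn(1, 2, 12, 8)  # stand-in for one layer's attention keys
    v = torch.randn(1, 2, 12, 8)  # stand-in for one layer's attention values
    memory = update_memory(memory, k, v, comp_positions=torch.tensor([10, 11]))

print(memory[0].shape)  # torch.Size([1, 2, 4, 8]) -> 2 rounds x 2 <C> slots
```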
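"Conditional LoRA" here refers to a low-rank adapter that is active only on the compression tokens, leaving the base LM's behavior on ordinary tokens untouched. A rough PyTorch sketch under that reading follows; the class name, rank, and masking interface are our own illustrations, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class ConditionalLoRALinear(nn.Module):
    """Frozen base projection plus a LoRA update applied only at positions
    marked as <C> compression tokens. Illustrative sketch only."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)       # only the LoRA weights are trained
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as an exact no-op
        self.scaling = alpha / rank

    def forward(self, x, comp_mask):
        # x: (batch, seq_len, in_features); comp_mask: (batch, seq_len) bool,
        # True where the token is a <C> compression token.
        out = self.base(x)
        delta = self.lora_b(self.lora_a(x)) * self.scaling
        return out + delta * comp_mask.unsqueeze(-1).to(delta.dtype)


# Toy usage
layer = ConditionalLoRALinear(nn.Linear(16, 16))
x = torch.randn(2, 5, 16)
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[:, -1] = True                 # last token of each sequence is <C>
print(layer(x, mask).shape)        # torch.Size([2, 5, 16])
```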
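Training the recurrence naively would take one forward pass per compression step. One way to parallelize it, consistent with the summary above though not necessarily the paper's exact scheme, is to lay out every segment together with its <C> tokens in a single sequence and build an attention mask that lets each segment see only the earlier segments' <C> keys/values, mimicking inference where the raw history has already been discarded. The helper below is a hypothetical sketch of such a mask.

```python
import torch


def build_parallel_compression_mask(seg_lens, n_comp):
    """Boolean attention mask (True = may attend) for training all recurrent
    compression steps in one forward pass.
    Sequence layout: [seg_1, seg_1 <C>, seg_2, seg_2 <C>, ...].
    - Raw tokens attend causally within their segment and to earlier
      segments' <C> tokens only (earlier raw tokens stay masked).
    - <C> tokens attend to their whole segment, to earlier <C> tokens,
      and causally among themselves."""
    spans, pos = [], 0
    for t, L in enumerate(seg_lens):
        spans.append((pos, pos + L, False, t)); pos += L
        spans.append((pos, pos + n_comp, True, t)); pos += n_comp
    mask = torch.zeros(pos, pos, dtype=torch.bool)
    for qs, qe, q_comp, qt in spans:
        for ks, ke, k_comp, kt in spans:
            if kt < qt and k_comp:
                mask[qs:qe, ks:ke] = True          # earlier memory slots
            elif kt == qt:
                if q_comp and not k_comp:
                    mask[qs:qe, ks:ke] = True      # <C> reads its segment
                elif q_comp == k_comp:
                    mask[qs:qe, ks:ke] = torch.tril(
                        torch.ones(qe - qs, ke - ks, dtype=torch.bool))
                # raw tokens never look ahead to their own <C> tokens
    return mask


print(build_parallel_compression_mask(seg_lens=[3, 3], n_comp=1).int())
```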
Figure. Illustration of the proposed LM framework with compressed memory Mem(∙), where <C> denotes a compression token.
Results
Figure. Comparison of our approach and the full-context model with LLaMA-7B on multi-task ICL.
Figure. Streaming evaluation on PG19 validation set using sliding window with LLaMA-7B.
Table. An example of our approach with LLaMA-7B on DailyDialog.
Table. Inference throughput analysis of our approach with LLaMA-7B using batch processing.
Citation
@article{kim2023compressed,
title={Compressed Context Memory For Online Language Model Interaction},
author={Kim, Jang-Hyun and Yeom, Junyoung and Yun, Sangdoo and Song, Hyun Oh},
journal={arXiv preprint arXiv:2312.03414},
year={2023}
}