Compressed Context Memory for
Online Language Model Interaction

¹Seoul National University, ²NAVER AI Lab (equal supervision)

Our approach compresses accumulating contexts into a compact memory space, reducing the memory requirement during inference and enhancing computational efficiency.


Interactive demo of our context compression approach with a finetuned LLaMA-7B. Check out our GitHub repository to run the demos.

Summary

  • Our approach dynamically creates a compressed memory of attention keys/values during LLM interactions (see the sketch after the figure below).
  • Our approach only requires training a conditional LoRA for compression.
  • We use a fully parallelized training strategy for the recurrent compression procedure.
  • We conduct evaluations on diverse applications: conversation, multi-task ICL, and personalization, achieving the performance level of a full-context model with a 5x smaller context memory space.
  • Figure. Illustration of the proposed LM framework with compressed memory Mem(∙), where <C> denotes a compression token.
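
For intuition, below is a minimal, hypothetical sketch of the interaction loop the bullets above describe: the key/value states produced at appended compression tokens <C> are kept as a small memory, and each new round attends to [memory; new input] instead of the full accumulated history. The class and function names, the <C> token handling, the Hugging Face-style past_key_values interface, and the concatenate-and-truncate update rule are all assumptions made for illustration; they are not the authors' released code (see the GitHub repository for that).

# A minimal sketch of the compressed-memory loop described above, written
# against a Hugging Face-style causal LM interface. It is NOT the released
# implementation: CompressedKVMemory, interaction_round, the <C> special
# token, and the memory-update policy below are assumptions for illustration.
import torch

class CompressedKVMemory:
    """Holds a small number of compressed key/value slots per layer."""

    def __init__(self, num_layers, max_slots):
        self.num_layers = num_layers
        self.max_slots = max_slots            # fixed memory budget per layer
        self.keys = [None] * num_layers       # each: (batch, heads, slots, head_dim)
        self.values = [None] * num_layers

    def update(self, comp_keys, comp_values):
        # Append the key/value states produced at the compression tokens and
        # truncate to the budget. (A simple concatenate-and-truncate policy is
        # used here for clarity; other update rules are possible.)
        for l in range(self.num_layers):
            if self.keys[l] is None:
                self.keys[l], self.values[l] = comp_keys[l], comp_values[l]
            else:
                self.keys[l] = torch.cat([self.keys[l], comp_keys[l]], dim=2)
                self.values[l] = torch.cat([self.values[l], comp_values[l]], dim=2)
            self.keys[l] = self.keys[l][:, :, -self.max_slots:]
            self.values[l] = self.values[l][:, :, -self.max_slots:]

    def as_past_key_values(self):
        # Expose the memory in the (key, value)-per-layer format expected by
        # Hugging Face causal LMs; None means the memory is still empty.
        if self.keys[0] is None:
            return None
        return tuple((k, v) for k, v in zip(self.keys, self.values))

@torch.no_grad()
def interaction_round(model, tokenizer, memory, new_context, num_comp_tokens=2):
    """One online round: attend to [compressed memory; new context], then
    compress this round by appending <C> tokens and storing their key/value
    states. In the actual method, the compression pass uses a conditional
    LoRA adapter; that detail is omitted here."""
    input_ids = tokenizer(new_context, return_tensors="pt").input_ids
    out = model(input_ids=input_ids,
                past_key_values=memory.as_past_key_values(),
                use_cache=True)
    # ...generate a response from out.logits as usual (omitted)...

    # Forward the compression tokens so that their key/value states summarize
    # the context accumulated so far.
    comp_ids = tokenizer("<C>" * num_comp_tokens, add_special_tokens=False,
                         return_tensors="pt").input_ids
    comp_out = model(input_ids=comp_ids,
                     past_key_values=out.past_key_values,
                     use_cache=True)

    comp_keys = [k[:, :, -num_comp_tokens:] for k, v in comp_out.past_key_values]
    comp_values = [v[:, :, -num_comp_tokens:] for k, v in comp_out.past_key_values]
    memory.update(comp_keys, comp_values)

# Usage (assuming `model` and `tokenizer` are a causal LM and tokenizer with
# a registered <C> special token):
#   memory = CompressedKVMemory(num_layers=model.config.num_hidden_layers, max_slots=32)
#   for turn in dialogue_turns:
#       interaction_round(model, tokenizer, memory, turn)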

    Results

    Figure. Comparison of our approach and the full context model with LLaMA-7B on multi-task ICL.


    Figure. Streaming evaluation on PG19 validation set using sliding window with LLaMA-7B.


    Table. An example interaction of our approach with LLaMA-7B on DailyDialog.


    Table. Inference throughput analysis of our approach with LLaMA-7B using batch processing.

    Related Work

  • Jack W Rae, Anna Potapenko, Siddhant M Jayakumar, and Timothy P Lillicrap. Compressive transformers for long-range sequence modelling. ICLR, 2020.
  • Aydar Bulatov, Yury Kuratov, and Mikhail Burtsev. Recurrent memory transformer. NeurIPS, 2022.
  • Jesse Mu, Xiang Lisa Li, and Noah Goodman. Learning to compress prompts with gist tokens. NeurIPS, 2023.
  • Alexis Chevalier, Alexander Wettig, Anirudh Ajith, and Danqi Chen. Adapting language models to compress contexts. EMNLP, 2023.
  • Tao Ge, Jing Hu, Xun Wang, Si-Qing Chen, and Furu Wei. In-context autoencoder for context compression in a large language model. arXiv preprint arXiv:2307.06945, 2023.
    BibTeX

    @article{kim2023compressed,
      title={Compressed Context Memory For Online Language Model Interaction},
      author={Kim, Jang-Hyun and Yeom, Junyoung and Yun, Sangdoo and Song, Hyun Oh},
      journal={arXiv preprint arXiv:2312.03414},
      year={2023}
    }