SpreadsheetLLM Compression Techniques

Implementation of compression techniques from the SpreadsheetLLM framework described in:

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Yuzhang Tian, Jianbo Zhao, Haoyu Dong, et al.
Microsoft Research, arXiv:2407.09025 (2024)
📄 Paper Link

🎯 Implemented Techniques

1. Structural Anchors for Efficient Layout Understanding

Identifies structural boundaries in spreadsheets
Removes homogeneous regions that don't contribute to layout understanding
Results: Up to 2.33x compression with 57.1% space saved

2. Inverted Index Translation

Maps cell values to address ranges instead of row-by-row encoding
Eliminates redundant encoding of repeated values
Results: Up to 58.2% token reduction

3. Data Format Aggregation

Groups cells by data format/type using rule-based recognition
Recognizes 9 data types: Year, Integer, Float, Percentage, Scientific notation, Date, Time, Currency, Email, Others
Results: Up to 80.4% space saved

📊 Performance Results

Technique	Original Tokens	Compressed	Compression Ratio	Space Saved
Structural Anchors	112	64	1.75x	42.9%
Inverted Index	55	23	2.39x	58.2%
Data Format Aggregation	51	10	5.10x	80.4%

🚀 Usage

Run the Jupyter Notebook:

jupyter notebook Implementation_of_Spreadsheet_compression_techniques.ipynb

The notebook contains detailed implementations, examples, and performance analysis for each compression technique.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Implementation_of_Spreadsheet_compression_techniques.ipynb		Implementation_of_Spreadsheet_compression_techniques.ipynb
LICENSE		LICENSE
README.md		README.md
data_format_aggregation_demo.py		data_format_aggregation_demo.py
inverted_index_demo.py		inverted_index_demo.py
requirements.txt		requirements.txt
structural_anchors_demo.py		structural_anchors_demo.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SpreadsheetLLM Compression Techniques

🎯 Implemented Techniques

1. Structural Anchors for Efficient Layout Understanding

2. Inverted Index Translation

3. Data Format Aggregation

📊 Performance Results

🚀 Usage

About

Uh oh!

Releases

Packages

Languages

License

DUrayev/SpreadsheetLLM-compression

Folders and files

Latest commit

History

Repository files navigation

SpreadsheetLLM Compression Techniques

🎯 Implemented Techniques

1. Structural Anchors for Efficient Layout Understanding

2. Inverted Index Translation

3. Data Format Aggregation

📊 Performance Results

🚀 Usage

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages