Text Classification with Python 3.14's zstd Module

🗜️ Classify text… with compression? Python 3.14's standard library finally makes this technique practical.

Python 3.14 added the compression.zstd module (Facebook’s Zstandard) to the standard library. This opens an elegant and surprising way to classify text without traditional ML models.

🧠 The core idea: If you compress a text together with a corpus from a category, the output grows less the more the text resembles that category. The idea comes from Kolmogorov complexity: a compressor gives a computable approximation of how much new information the text adds, and similar data compresses better together.
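To make the core idea concrete, here is a minimal sketch of the naive version: compress the corpus with and without the new text and compare how many extra bytes the text costs. It uses zlib as a stand-in so that it runs on any Python 3, not zstd. The toy corpora, the sample text, and the helper name `extra_bytes` are made up for illustration.

```python
import zlib


def extra_bytes(corpus: bytes, text: bytes) -> int:
    """Extra compressed bytes needed to encode `text` after `corpus`."""
    return len(zlib.compress(corpus + text)) - len(zlib.compress(corpus))


# Hypothetical toy corpora, for illustration only
corpora = {
    "tacos": b"taco burrito tortilla salsa guacamole cilantro lime " * 20,
    "padel": b"racket court serve volley smash lob match game set " * 20,
}

text = b"a burrito with salsa, guacamole and lime"

# The category that makes the text cheapest to encode is the best guess
costs = {label: extra_bytes(corpus, text) for label, corpus in corpora.items()}
best = min(costs, key=costs.get)
print(costs, best)
```

Note that this version recompresses the whole corpus for every prediction, which is the cost a dictionary-based compressor avoids.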

💡 The practical trick with zstd:

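from compression.zstd import ZstdCompressor, ZstdDict

# For each class, build a raw-content "dictionary" from its corpus (bytes)
zd_tacos = ZstdDict(tacos_corpus, is_raw=True)
comp_tacos = ZstdCompressor(zstd_dict=zd_tacos)

# Compress the new text (bytes) as one complete frame with each class's compressor.
# FLUSH_FRAME matters: the default mode may buffer data and return little or nothing.
# The class whose compressor produces the shortest output wins.
len(comp_tacos.compress(new_text, mode=ZstdCompressor.FLUSH_FRAME))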
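A fuller sketch of the same trick, comparing two classes end to end. The toy corpora, the sample texts, and the helper names `frame_size` and `classify` are assumptions for illustration. On interpreters older than 3.14 the sketch falls back to zlib's preset dictionary (`zdict`), which is a similar mechanism but not zstd.

```python
import zlib

try:  # Python 3.14+
    from compression.zstd import ZstdCompressor, ZstdDict
except ImportError:  # older Python: zlib preset dictionary as a stand-in
    ZstdCompressor = ZstdDict = None


def frame_size(corpus: bytes, text: bytes) -> int:
    """Compressed size of `text` when the compressor is primed with `corpus`."""
    if ZstdCompressor is not None:
        comp = ZstdCompressor(zstd_dict=ZstdDict(corpus, is_raw=True))
        # FLUSH_FRAME emits the complete frame so the length is meaningful
        return len(comp.compress(text, mode=ZstdCompressor.FLUSH_FRAME))
    comp = zlib.compressobj(zdict=corpus)
    return len(comp.compress(text) + comp.flush())


def classify(text: bytes, corpora: dict[str, bytes]) -> str:
    """Return the label whose corpus yields the shortest compressed output."""
    return min(corpora, key=lambda label: frame_size(corpora[label], text))


# Hypothetical toy corpora, for illustration only
corpora = {
    "tacos": b"taco burrito tortilla salsa guacamole cilantro lime " * 20,
    "padel": b"racket court serve volley smash lob match game set " * 20,
}

print(classify(b"a burrito with salsa, guacamole and lime", corpora))
print(classify(b"a smash and a volley to win the match and the set", corpora))
```

On these toy inputs the first call should pick tacos and the second padel. Real corpora would be much larger, and accuracy depends on how well each corpus represents its class.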

✨ Advantages:

Zero external dependencies (Python 3.14 stdlib)
Works in online/streaming mode: new labeled texts are appended to their class's corpus, so no full retraining is needed
Very fast: rebuilding the compressor takes microseconds
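The online mode mentioned above can be sketched as a small class that keeps one byte buffer per label. This is an illustration under stated assumptions, not the original author's code: the class name `CompressionClassifier`, the `window` size, and the sample texts are all hypothetical. For simplicity it rebuilds a compressor on every prediction, and it falls back to zlib's `zdict` on Python older than 3.14.

```python
import zlib
from collections import defaultdict

try:  # Python 3.14+
    from compression.zstd import ZstdCompressor, ZstdDict
except ImportError:  # older Python: zlib preset dictionary as a stand-in
    ZstdCompressor = ZstdDict = None


class CompressionClassifier:
    def __init__(self, window: int = 1_000_000):
        self.window = window  # max bytes of recent text kept per class (assumed value)
        self.buffers = defaultdict(bytearray)

    def learn(self, text: bytes, label: str) -> None:
        """Append a labeled text to its class buffer; nothing is retrained."""
        buf = self.buffers[label]
        buf += text + b" "
        del buf[:-self.window]  # drop the oldest bytes beyond the window

    def _size(self, corpus: bytes, text: bytes) -> int:
        if ZstdCompressor is not None:
            comp = ZstdCompressor(zstd_dict=ZstdDict(corpus, is_raw=True))
            return len(comp.compress(text, mode=ZstdCompressor.FLUSH_FRAME))
        comp = zlib.compressobj(zdict=corpus)
        return len(comp.compress(text) + comp.flush())

    def predict(self, text: bytes) -> str:
        """Pick the label whose buffer compresses the text best."""
        return min(
            self.buffers,
            key=lambda label: self._size(bytes(self.buffers[label]), text),
        )


clf = CompressionClassifier()
clf.learn(b"taco burrito tortilla salsa guacamole cilantro lime", "tacos")
clf.learn(b"racket court serve volley smash lob match game set", "padel")
print(clf.predict(b"extra salsa and guacamole on the burrito"))

# A new class can be added at any time, without touching the others
clf.learn(b"nigiri sashimi wasabi maki tempura edamame", "sushi")
print(clf.predict(b"wasabi with the sashimi and tempura"))
```

A real implementation would likely cache each class's dictionary and rebuild it only after a batch of new examples, trading a little freshness for speed.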

⚠️ Limitations:

Less accurate than modern models like BERT
Best suited to low-latency or resource-constrained use cases rather than accuracy-critical ones

💡 Explanation in a nutshell

The idea is simple: to know whether a text is about “tacos” or “padel,” compress it alongside texts from each category. The text will compress better with the texts it resembles most. It’s a way to measure similarity using compression math, without training any model.

Text classification with Python 3.14

Python 3.14 added Zstandard to stdlib; its incremental API finally makes classify-by-compression practical.

maxhalford.github.io

Also published on LinkedIn.

Author

Juan Pedro Bretti Mandarano
