bc_crunch : Compression Algorithm Documentation

bc_crunch is a tiny, dependency-free C99 library for lossless compression of GPU-compressed texture blocks: BC1, BC4, BC3, and BC5.

You can find it here: https://github.com/Geolm/bc_crunch

And contact me on bsky: https://bsky.app/profile/geolm.bsky.social

1. The Core Engine: Adaptive Arithmetic Coding (AAC)

Arithmetic Coding is a form of entropy encoding that maps an entire sequence of symbols to a single fractional number, often achieving better compression than Huffman coding because it can assign fractional bits per symbol.

Modeling (range_model): The compressor maintains a probability model for every type of data it encounters (e.g., color red delta, index difference, etc.). This model tracks how often each symbol (byte value or small integer) has occurred so far.

Adaptive Learning: The models are adaptive. After encoding a symbol, the model's counts are immediately updated. This allows the compressor to learn the statistics of the specific texture as it is being compressed, making it highly efficient even for unique data patterns.

Renormalization: The range_codec manages a base and length interval. As symbols are encoded, this interval shrinks. When it gets too small, a process called renormalization shifts the interval to output compressed bytes, maintaining precision and efficiency.
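To make the adaptive part concrete, here is a minimal sketch of a frequency-count model of the kind described above. It is not the library's actual range_model or range_codec; the struct layout, the increment of 32, and the rescaling threshold are assumptions chosen purely for illustration.

```c
#include <stdint.h>

#define MODEL_SYMBOLS 256

/* Hypothetical adaptive model: one count per symbol plus a running total.
   The coder narrows its [base, base + length) interval in proportion to
   count[symbol] / total, then calls model_update so the statistics adapt. */
typedef struct
{
    uint32_t count[MODEL_SYMBOLS];
    uint32_t total;
} adaptive_model;

static void model_init(adaptive_model* m)
{
    for (int i = 0; i < MODEL_SYMBOLS; ++i)
        m->count[i] = 1;                 /* every symbol starts equally likely */
    m->total = MODEL_SYMBOLS;
}

static void model_update(adaptive_model* m, int symbol)
{
    m->count[symbol] += 32;              /* boost the symbol we just coded */
    m->total += 32;

    if (m->total > (1u << 16))           /* keep totals within the coder's precision */
    {
        m->total = 0;
        for (int i = 0; i < MODEL_SYMBOLS; ++i)
        {
            m->count[i] = (m->count[i] + 1) / 2;
            m->total += m->count[i];
        }
    }
}
```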

2. BC1 Compression (bc1_crunch)

BC1 blocks (8 bytes) consist of two 16-bit color endpoints and a 32-bit field of sixteen 2-bit indices.
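Viewed as a plain C struct, a BC1 block looks like this (the field names are illustrative, not the library's types):

```c
#include <stdint.h>

/* 8-byte BC1 block: two RGB565 endpoints followed by 16 two-bit indices. */
typedef struct
{
    uint16_t color0;    /* RGB565 endpoint 0 */
    uint16_t color1;    /* RGB565 endpoint 1 */
    uint32_t indices;   /* 16 pixels x 2 bits, row by row */
} bc1_block;
```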

A. Index Data Compression (The 32-bit Indices)

The 32-bit index pattern, which selects the color of each pixel in the 4×4 block, is often repeated across a texture. The algorithm exploits this redundancy with a global dictionary.

Global Dictionary (Top Table) Creation:

The compressor first scans the entire texture and builds a histogram of every unique 32-bit index pattern.

It selects the Top 256 (TABLE_SIZE) most frequently occurring index patterns to form the top_table.
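A rough sketch of that first pass is shown below. The helper name build_top_table is hypothetical, and a real implementation would likely use a hash map instead of the linear scan used here for brevity:

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 256

typedef struct { uint32_t pattern; uint32_t count; } pattern_count;

/* sort by descending frequency */
static int compare_count_desc(const void* a, const void* b)
{
    const pattern_count* pa = (const pattern_count*)a;
    const pattern_count* pb = (const pattern_count*)b;
    return (pa->count < pb->count) - (pa->count > pb->count);
}

/* First pass over the texture: histogram every unique 32-bit index pattern,
   then keep the TABLE_SIZE most frequent ones as the top table. */
static int build_top_table(const uint32_t* block_indices, int num_blocks,
                           uint32_t top_table[TABLE_SIZE])
{
    pattern_count* histogram = calloc((size_t)num_blocks, sizeof(pattern_count));
    if (histogram == NULL)
        return 0;

    int num_unique = 0;
    for (int i = 0; i < num_blocks; ++i)
    {
        uint32_t pattern = block_indices[i];
        int j = 0;
        while (j < num_unique && histogram[j].pattern != pattern)
            ++j;
        if (j == num_unique)
            histogram[num_unique++].pattern = pattern;
        histogram[j].count++;
    }

    qsort(histogram, (size_t)num_unique, sizeof(pattern_count), compare_count_desc);

    int table_count = num_unique < TABLE_SIZE ? num_unique : TABLE_SIZE;
    for (int i = 0; i < table_count; ++i)
        top_table[i] = histogram[i].pattern;

    free(histogram);
    return table_count;
}
```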

Encoding the Top Table:

The list of 256 patterns is itself compressed: since the table is sorted, consecutive patterns are numerically close.

The compressor encodes the difference (delta) between consecutive patterns in the table using Arithmetic Coding, which is very efficient for small, positive differences.
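One plausible reading of this step, as a sketch: encode_byte stands in for the library's arithmetic coder, and the ascending sort is inferred from the "small, positive differences" mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

/* stand-in for coding one byte with a given adaptive model */
static void encode_byte(int model, uint8_t value) { (void)model; (void)value; }

static int compare_u32(const void* a, const void* b)
{
    uint32_t ua = *(const uint32_t*)a;
    uint32_t ub = *(const uint32_t*)b;
    return (ua > ub) - (ua < ub);
}

/* Sort the table numerically, then code the gap between consecutive entries:
   small positive gaps compress very well with an adaptive byte model. */
static void encode_top_table(uint32_t* top_table, int count, int model)
{
    qsort(top_table, (size_t)count, sizeof(uint32_t), compare_u32);

    uint32_t previous = 0;
    for (int i = 0; i < count; ++i)
    {
        uint32_t delta = top_table[i] - previous;
        previous = top_table[i];

        for (int shift = 0; shift < 32; shift += 8)    /* four bytes, low byte first */
            encode_byte(model, (uint8_t)(delta >> shift));
    }
}
```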

Encoding the Block Indices:

For each new block, the algorithm searches the top_table for the index pattern that is closest in binary representation (using Hamming distance, i.e., how many bits are different).

Reference Index: It encodes the index (table_index) of this nearest match in the top_table.

Mode Bit: It encodes a single bit (block_mode):

0 (Match): If the current block indices are exactly the same as the reference.

1 (Difference): If they are different, it computes the XOR difference and encodes the resulting 32-bit value, one byte at a time, using a dedicated AAC model (table_difference).
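Putting the three pieces together, a per-block sketch could look like this. The MODEL_* ids and encode_* helpers are hypothetical stand-ins for the library's models and coder, not its actual API:

```c
#include <stdint.h>

enum { MODEL_TABLE_INDEX, MODEL_BLOCK_MODE, MODEL_TABLE_DIFFERENCE };

/* stand-ins for the arithmetic coder */
static void encode_byte(int model, uint8_t value) { (void)model; (void)value; }
static void encode_bit(int model, int bit)        { (void)model; (void)bit; }

/* Hamming distance = number of differing bits */
static int popcount32(uint32_t v)
{
    int bits = 0;
    while (v) { bits += (int)(v & 1u); v >>= 1; }
    return bits;
}

static void encode_block_indices(uint32_t indices, const uint32_t* top_table, int table_count)
{
    /* find the table entry with the smallest Hamming distance */
    int best_index = 0;
    int best_score = 33;
    for (int i = 0; i < table_count; ++i)
    {
        int score = popcount32(indices ^ top_table[i]);
        if (score < best_score)
        {
            best_score = score;
            best_index = i;
        }
    }

    encode_byte(MODEL_TABLE_INDEX, (uint8_t)best_index);   /* reference index */
    encode_bit(MODEL_BLOCK_MODE, best_score != 0);         /* mode bit */

    if (best_score != 0)
    {
        /* not an exact match: transmit the XOR residual one byte at a time */
        uint32_t difference = indices ^ top_table[best_index];
        for (int shift = 0; shift < 32; shift += 8)
            encode_byte(MODEL_TABLE_DIFFERENCE, (uint8_t)(difference >> shift));
    }
}
```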

B. Color Endpoint Compression (The two 16-bit colors)

The color endpoints are compressed using a sophisticated form of Spatial Prediction and De-correlated Delta Encoding.

Spatial Prediction (Choosing a Reference):

The blocks are processed in a zig-zag pattern across the texture to maximize spatial locality.

For a block's color endpoint, the compressor calculates which of two neighbors offers the best prediction: the previous block in the row or the block immediately above it.

A single bit (color_reference) is encoded to tell the decoder which neighbor (and thus which color) to use as a reference for delta encoding.

De-correlated Delta Encoding:

The difference (delta) in Red, Green, and Blue components between the current color and the chosen reference color is calculated.

Green First: The Green component delta is encoded first.

R/B Prediction: The algorithm then uses the Green delta to predict the Red and Blue deltas (dred -= dgreen/2, dblue -= dgreen/2). This removes common information (e.g., if Green increases, R and B are likely to increase too), making the residuals smaller and easier to compress.

The residual Red and Blue deltas are encoded next. This sequence (Green -> Red residual -> Blue residual) significantly increases compression efficiency by de-correlating the color channels.
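Here is a sketch of the de-correlation step, assuming the reference endpoint has already been chosen as described above. The unpack565 helper and the MODEL_* ids are illustrative, not the library's names:

```c
#include <stdint.h>

enum { MODEL_DELTA_GREEN, MODEL_DELTA_RED, MODEL_DELTA_BLUE };

/* stand-in for the arithmetic coder */
static void encode_byte(int model, uint8_t value) { (void)model; (void)value; }

/* split a 16-bit RGB565 endpoint into its components */
static void unpack565(uint16_t c, int* r, int* g, int* b)
{
    *r = (c >> 11) & 0x1f;
    *g = (c >> 5) & 0x3f;
    *b = c & 0x1f;
}

/* Green is coded first; the red/blue deltas are then predicted from it so
   only small residuals remain. Residuals wrap when cast to a byte. */
static void encode_color_delta(uint16_t current, uint16_t reference)
{
    int r, g, b, ref_r, ref_g, ref_b;
    unpack565(current, &r, &g, &b);
    unpack565(reference, &ref_r, &ref_g, &ref_b);

    int dgreen = g - ref_g;
    int dred   = (r - ref_r) - dgreen / 2;   /* remove the part explained by green */
    int dblue  = (b - ref_b) - dgreen / 2;

    encode_byte(MODEL_DELTA_GREEN, (uint8_t)dgreen);
    encode_byte(MODEL_DELTA_RED,   (uint8_t)dred);
    encode_byte(MODEL_DELTA_BLUE,  (uint8_t)dblue);
}
```

The decoder can reverse this exactly because it decodes the green delta first and re-applies the same dgreen/2 prediction before adding the red and blue residuals.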

3. BC4 Compression (bc4_crunch)

BC4 blocks (8 bytes) consist of two 8-bit endpoints (luminance/alpha) and a 48-bit field of sixteen 3-bit indices.
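As with BC1, the raw block can be pictured as a small struct (illustrative names only):

```c
#include <stdint.h>

/* 8-byte BC4 block: two 8-bit endpoints followed by 48 bits of indices. */
typedef struct
{
    uint8_t color0;        /* endpoint 0 */
    uint8_t color1;        /* endpoint 1 */
    uint8_t indices[6];    /* 16 pixels x 3 bits = 48 bits */
} bc4_block;
```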

A. Color Endpoint Compression (The two 8-bit colors)

Compression relies on strong spatial prediction for Color 0 and local prediction for Color 1.

Color 0 (The First Endpoint):

It uses a Parallelogram Prediction (a common technique in image compression) to predict the current Color 0 value based on surrounding neighbors: Reference = Left + Up - UpLeft.

The difference (delta) between the actual Color 0 and this spatial reference is calculated (with wrapping modulo 256) and encoded (color_delta[0]).

Color 1 (The Second Endpoint):

It's encoded as a delta from the current block's Color 0. This assumes a strong relationship between the two endpoints within the same block.
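Both endpoint predictions fit in a few lines. This sketch uses hypothetical encode_byte / MODEL_* names and wraps all deltas modulo 256 as described:

```c
#include <stdint.h>

enum { MODEL_COLOR_DELTA_0, MODEL_COLOR_DELTA_1 };

/* stand-in for the arithmetic coder */
static void encode_byte(int model, uint8_t value) { (void)model; (void)value; }

/* Color 0 uses the parallelogram predictor (left + up - upleft) built from the
   already-coded neighbour blocks; Color 1 is a delta from this block's Color 0. */
static void encode_bc4_endpoints(uint8_t color0, uint8_t color1,
                                 uint8_t left, uint8_t up, uint8_t upleft)
{
    uint8_t reference = (uint8_t)(left + up - upleft);   /* wraps modulo 256 */
    uint8_t delta0 = (uint8_t)(color0 - reference);
    uint8_t delta1 = (uint8_t)(color1 - color0);

    encode_byte(MODEL_COLOR_DELTA_0, delta0);
    encode_byte(MODEL_COLOR_DELTA_1, delta1);
}
```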

B. Index Data Compression (The 48-bit Indices)

This is the most complex part, combining dictionary-based encoding with powerful contextual modeling for dictionary misses.

Adaptive Dictionary (Move-to-Front):

A small dictionary (DICTIONARY_SIZE = 256) of 48-bit index patterns is maintained.

This dictionary is a Move-to-Front (MTF) list: when an entry is used, it is moved to the front (index 0), so the most recently used patterns sit at the smallest indices and therefore get the shortest codes.

Dictionary Lookup and Mode Bit:

For the current block's 48-bit index pattern (bitfield), the compressor searches for the nearest match in the MTF dictionary based on Hamming distance.

Mode Bit (use_dict):

1 (Hit/Near Match): If the nearest match is very close (score < 4 bits different), it encodes 1. It then encodes the index of the match and the XOR difference (split into 16 separate 3-bit symbols), similar to BC1. The entry is moved to the front.

0 (Miss/New Pattern): If the match is poor, it encodes 0. The current pattern is pushed to the front of the MTF dictionary.
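A compact sketch of this dictionary path is shown below. DICTIONARY_SIZE and the score threshold match the text; the MODEL_* ids, encode_* helpers, and the way the residual is split into 3-bit symbols are written from the description above, not copied from the source:

```c
#include <stdint.h>
#include <string.h>

#define DICTIONARY_SIZE 256

enum { MODEL_USE_DICT, MODEL_DICT_INDEX, MODEL_DICT_DIFFERENCE };

/* stand-ins for the arithmetic coder */
static void encode_bit(int model, int bit)      { (void)model; (void)bit; }
static void encode_symbol(int model, int value) { (void)model; (void)value; }

static int popcount64(uint64_t v)
{
    int bits = 0;
    while (v) { bits += (int)(v & 1u); v >>= 1; }
    return bits;
}

/* bitfield holds the block's 48 index bits in the low bits of a uint64_t */
static void encode_bc4_indices(uint64_t bitfield, uint64_t dictionary[DICTIONARY_SIZE])
{
    /* nearest entry by Hamming distance */
    int best_index = 0;
    int best_score = 64;
    for (int i = 0; i < DICTIONARY_SIZE; ++i)
    {
        int score = popcount64(bitfield ^ dictionary[i]);
        if (score < best_score)
        {
            best_score = score;
            best_index = i;
        }
    }

    if (best_score < 4)
    {
        /* hit: dictionary index + sixteen 3-bit XOR residuals */
        encode_bit(MODEL_USE_DICT, 1);
        encode_symbol(MODEL_DICT_INDEX, best_index);

        uint64_t difference = bitfield ^ dictionary[best_index];
        for (int shift = 0; shift < 48; shift += 3)
            encode_symbol(MODEL_DICT_DIFFERENCE, (int)((difference >> shift) & 0x7));

        /* move the used entry to the front (MTF) */
        uint64_t entry = dictionary[best_index];
        memmove(&dictionary[1], &dictionary[0], (size_t)best_index * sizeof(uint64_t));
        dictionary[0] = entry;
    }
    else
    {
        /* miss: signal the local predictor and push the new pattern to the front */
        encode_bit(MODEL_USE_DICT, 0);
        memmove(&dictionary[1], &dictionary[0], (DICTIONARY_SIZE - 1) * sizeof(uint64_t));
        dictionary[0] = bitfield;
    }
}
```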

Contextual Local Prediction (Dictionary Miss):

If a pattern is not found in the dictionary, it's compressed locally using two levels of context:

Index-Chain XOR: The indices (processed in a zig-zag pattern within the block) are encoded as the XOR difference from the previously encoded index in the chain (block_previous ^ data).

Contextual Model Selection: The key innovation is that the choice of which AAC model to use for this XOR difference is based on two things:

Endpoint Range: The overall difference between the block's two endpoints (int_abs(color[0] - color[1])). The code uses multiple buckets (e.g., range < 8, range < 32, or max range) to select a group of models (indices[24]).

Previous Index Value: Within that group, the specific model is selected by the value of the block_previous index. This means the probability of the next index is modeled based on the current index and the block's general contrast, making the prediction extremely precise.
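The 3 contrast buckets times the 8 possible values of the previous 3-bit index give the 24 models mentioned above. A sketch of that selection (thresholds taken from the description, function names hypothetical):

```c
#include <stdint.h>

/* stand-in: codes a 3-bit symbol with the model at `model_index` out of indices[24] */
static void encode_index_symbol(int model_index, int value) { (void)model_index; (void)value; }

static int int_abs(int v) { return v < 0 ? -v : v; }

static void encode_local_index(uint8_t color0, uint8_t color1,
                               int block_previous, int current_index)
{
    int range = int_abs((int)color0 - (int)color1);

    int bucket;
    if (range < 8)       bucket = 0;    /* low-contrast block */
    else if (range < 32) bucket = 1;    /* medium contrast */
    else                 bucket = 2;    /* high contrast */

    /* 3 buckets x 8 previous-index values = 24 contextual models */
    int model_index = bucket * 8 + block_previous;

    /* the coded symbol is the XOR of the previous and current indices */
    encode_index_symbol(model_index, block_previous ^ current_index);
}
```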
