Module constriction.symbol
Symbol codes. Mainly provided for teaching purpose. You'll probably want to use a stream code instead.
Symbol codes encode messages (= sequences of symbols) by compressing each symbol individually and then concatenating the compressed representations of the symbols (called code words). This has the advantage of being conceptually simple, but the disadvantage of incurring an overhead of up to almost 1 bit per symbol. The overhead is most significant in the regime of low entropy per symbol (often the regime of novel machine-learning based compression methods), since each (nontrivial) symbol in a symbol code contributes at least 1 bit to the total bitrate of the message, even if its information content is only marginally above zero.
This module defines the types QueueEncoder
, QueueDecoder
, and StackCoder
, which
are essentially just efficient containers for of growable bit strings. The interesting
part happens in the so-called code book, which defines how symbols map to code words. We
provide the Huffman Coding algorithm in the submodule huffman
,
which calculates an optimal code book for a given probability distribution.
Examples
The following two examples both encode and then decode a sequence of symbols using the same entropy model for each symbol. We could as well use a different entropy model for each symbol, but it would make the examples more lengthy.
- Example 1: Encoding and decoding with "queue" semantics ("first in first out", i.e., we'll decode symbols in the same order in which we encode them):
import constriction
import numpy as np
# Define an entropy model over the (implied) alphabet {0, 1, 2, 3}:
probabils = np.array([0.3, 0.2, 0.4, 0.1], dtype=np.float32)
# Encode some example message, using the same model for each symbol here:
message = [1, 3, 2, 3, 0, 1, 3, 0, 2, 1, 1, 3, 3, 1, 2, 0, 1, 3, 1]
encoder = constriction.symbol.QueueEncoder()
encoder_codebook = constriction.symbol.huffman.EncoderHuffmanTree(probabils)
for symbol in message:
encoder.encode_symbol(symbol, encoder_codebook)
# Obtain the compressed representation and the bitrate:
compressed, bitrate = encoder.get_compressed()
print(compressed, bitrate) # (prints: [3756389791, 61358], 48)
print(f"(in binary: {[bin(word) for word in compressed]}")
# Decode the message
decoder = constriction.symbol.QueueDecoder(compressed)
decoded = []
decoder_codebook = constriction.symbol.huffman.DecoderHuffmanTree(probabils)
for symbol in range(19):
decoded.append(decoder.decode_symbol(decoder_codebook))
assert decoded == message # (verifies correctness)
- Example 2: Encoding and decoding with "stack" semantics ("last in first out", i.e., we'll encode symbols from back to front and then decode them from front to back):
import constriction
import numpy as np
# Define an entropy model over the (implied) alphabet {0, 1, 2, 3}:
probabils = np.array([0.3, 0.2, 0.4, 0.1], dtype=np.float32)
# Encode some example message, using the same model for each symbol here:
message = [1, 3, 2, 3, 0, 1, 3, 0, 2, 1, 1, 3, 3, 1, 2, 0, 1, 3, 1]
coder = constriction.symbol.StackCoder()
encoder_codebook = constriction.symbol.huffman.EncoderHuffmanTree(probabils)
for symbol in reversed(message): # Note: reversed
coder.encode_symbol(symbol, encoder_codebook)
# Obtain the compressed representation and the bitrate:
compressed, bitrate = coder.get_compressed()
print(compressed, bitrate) # (prints: [[2818274807, 129455] 48)
print(f"(in binary: {[bin(word) for word in compressed]}")
# Decode the message (we could explicitly construct a decoder:
# `decoder = constriction.symbol.StackCoder(compressed)`
# but we can also also reuse our existing `coder` for decoding):
decoded = []
decoder_codebook = constriction.symbol.huffman.DecoderHuffmanTree(probabils)
for symbol in range(19):
decoded.append(coder.decode_symbol(decoder_codebook))
assert decoded == message # (verifies correctness)
Sub-modules
constriction.symbol.huffman
-
Codebooks for Huffman coding [1] …
Classes
class QueueDecoder (compressed)
-
A container of compressed bits that can be read from front to back for decoding.
This is the counterpart to
QueueEncoder
. The constructor takes a single argument, which must be a rank-1 numpy array withdtype=np.uint32
, as in the first return value of is returned by `QueueEncoder.get_compressed()Example:
See first module level example.
Methods
def decode_symbol(self, /, codebook)
-
Reads more bits from the current encapsulated compressed data and tries to match them up with one of the code words in the provided code book, returning the corresponding symbol on success.
Arguments
- codebook — a decoder code book for a symbol code. Currently,
DecoderHuffmanTree
is the only implemented decoder code book.
- codebook — a decoder code book for a symbol code. Currently,
class QueueEncoder
-
Methods
def encode_symbol(self, /, symbol, codebook)
-
Looks up the provided symbol in the provided codebook and appends its bits to the compressed data such that they form a prefix-free code (i.e., a code that can easily be read from front to back).
Arguments
- symbol — an integer in the range
{0, 1, ..., n-1}
wheren
is the size of thecodebook
provided in the second argument. - codebook — an encoder code book for a symbol code. Currently,
EncoderHuffmanTree
is the only implemented encoder code book.
- symbol — an integer in the range
def get_compressed(self, /)
-
Deprecated method. Please use
get_compressed_and_bitrate
instead.(The method was renamed to
get_compressed_and_bitrate
inconstriction
version 0.4.0 to avoid confusion about the return type.) def get_compressed_and_bitrate(self, /)
-
Returns a tuple
(compressed, bitrate)
, wherecompressed
is a copy of the compressed representation that is currently on the coder (filled with zero bits to a multiple of 32 bits), andbitrate
is the number of information-carrying bits.You can write the compressed data to a file (by calling
compressed.tofile("filename")
), read it back in at a later point (withcompressed = np.fromfile("filename")
) and then construct aQueueDecoder
from. def get_decoder(self, /)
-
Shortcut for
QueueDecoder(encoder.get_compressed())
whereencoder
is aQueueEncoder
.Copies the compressed data out of the encoder. Thus, if you continue to encode symbols on the encoder, those won't affect what you will be able to decode on the decoder.
class StackCoder (compressed=None)
-
A container of compressed bits that allows appending and consuming bits from the same end.
When encoding onto a
StackCoder
, the bits that comprise each code word are automatically appended in reverse order so that a prefix-free code becomes a suffix-free code which can easily be decoded from the end. For Huffman Coding, this is actually the natural way to generate code words (by walking the tree from a leaf to the root).A
StackCoder
does not distinguish between an encoder and a decoder. It supports both encoding and decoding with a single data structure and even allows you to arbitrarily switch back and forth between encoding and decoding operations (e.g., for bits-back coding).The constructor takes existing compressed data as an optional argument. If it is provided, it must must be a rank-1 numpy array with, as in the first return value of the method
get_compressed
. If no argument is provided, then theStackCoder
is initialized empty (useful for encoding).Example:
See second module level example.
Methods
def decode_symbol(self, /, codebook)
-
Consumes bits from the end of the encapsulated compressed data and tries to match them up with one of the code words in the provided code book, returning the corresponding symbol on success.
Arguments
- codebook — a decoder code book for a symbol code. Currently,
DecoderHuffmanTree
is the only implemented decoder code book.
- codebook — a decoder code book for a symbol code. Currently,
def encode_symbol(self, /, symbol, codebook)
-
Looks up the provided symbol in the provided codebook and appends its bits to the compressed data such that they form a suffix-free code (i.e., a code that can easily be read from back to front).
Arguments
- symbol — an integer in the range
{0, 1, ..., n-1}
wheren
is the size of thecodebook
provided in the second argument. - codebook — an encoder code book for a symbol code. Currently,
EncoderHuffmanTree
is the only implemented encoder code book.
- symbol — an integer in the range
def get_compressed(self, /)
-
Deprecated method. Please use
get_compressed_and_bitrate
instead.(The method was renamed to
get_compressed_and_bitrate
inconstriction
version 0.4.0 to avoid confusion about the return type.) def get_compressed_and_bitrate(self, /)
-
Returns a tuple
(compressed, bitrate)
, wherecompressed
is a copy of the compressed representation that is currently on the coder (filled with zero bits to a multiple of 32 bits), andbitrate
is the number of information-carrying bits.You can write the compressed data to a file (by calling
compressed.tofile("filename")
), read it back in at a later point (withcompressed = np.fromfile("filename")
) and then reconstruct a coder (for decoding) py passingcompressed
to the constructor ofStackCoder
.