11.3. Working with Binary Data Record Layouts
Working with binary data record layouts in Python often involves the struct module, which packs and unpacks binary data according to specified format strings.
🡆struct.pack(format, v1, v2, ...): Packs Python values into a bytes object according to the format string.
🡆struct.unpack(format, buffer): Unpacks a bytes-like buffer into a tuple of Python values according to the format string. The buffer must be exactly struct.calcsize(format) bytes long.
🡆struct.calcsize(format): Returns the size in bytes of the struct corresponding to the format string.
Example
Consider a binary file whose header contains an unsigned short for the version and an unsigned int for the data length, both stored in little-endian byte order.
import struct

# Simulate the content of a binary file
# Version (2 bytes, H), Length (4 bytes, I)
mock_binary_header = struct.pack('<HI', 101, 512)

# Write the header to disk
with open('mock_file.bin', 'wb') as f:
    f.write(mock_binary_header)

# Read back exactly as many bytes as the format describes, then unpack them
with open('mock_file.bin', 'rb') as f:
    header_data = f.read(struct.calcsize('<HI'))

version, length = struct.unpack('<HI', header_data)
print(f"Version: {version}, Length: {length}")
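A truncated file returns fewer bytes than requested, and struct.unpack raises struct.error when the buffer length does not match the format size. The following is a minimal sketch of a defensive reader under those conditions. The helper name read_header is hypothetical, and io.BytesIO is used only as a stand-in for a file opened in 'rb' mode.

```python
import io
import struct

HEADER_FORMAT = '<HI'  # little-endian: unsigned short version, unsigned int length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes with standard sizes


def read_header(stream):
    """Hypothetical helper: read and unpack one header, or raise ValueError."""
    data = stream.read(HEADER_SIZE)
    if len(data) != HEADER_SIZE:
        # struct.unpack would raise struct.error here; fail with a clearer message
        raise ValueError(f'expected {HEADER_SIZE} header bytes, got {len(data)}')
    return struct.unpack(HEADER_FORMAT, data)


# io.BytesIO stands in for a file opened in 'rb' mode
good = io.BytesIO(struct.pack(HEADER_FORMAT, 101, 512))
version, length = read_header(good)
print(f'Version: {version}, Length: {length}')

truncated = io.BytesIO(b'\x65\x00\x00')  # only 3 of the 6 header bytes
try:
    read_header(truncated)
except ValueError as exc:
    print('Truncated header:', exc)
```

Checking the length before unpacking keeps the error message specific to the file being read, which tends to be easier to diagnose than a bare struct.error.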
Opening Binary Files
Binary files are opened using the 'rb' (read binary), 'wb' (write binary), or 'ab' (append binary) modes in the open() function.
with open('data.bin', 'rb') as f:
    binary_data = f.read()
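The three modes can be compared side by side in a short sketch. The file name and the temporary directory are assumptions chosen so that the example leaves nothing behind on disk.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.bin')  # hypothetical file name

    with open(path, 'wb') as f:   # 'wb' creates the file or truncates it
        f.write(b'\x01\x02')

    with open(path, 'ab') as f:   # 'ab' appends to the existing content
        f.write(b'\x03')

    with open(path, 'rb') as f:   # 'rb' returns raw bytes, with no text decoding
        binary_data = f.read()

print(type(binary_data).__name__, binary_data)
```

In all three modes the file object works with bytes rather than str, which is what struct.pack produces and struct.unpack expects.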
Format strings define the layout of the binary data, specifying data types, byte order, and alignment.
🡆'<': Little-endian, standard sizes, no padding
🡆'>' or '!': Big-endian (network byte order), standard sizes, no padding
🡆'=': Native byte order, standard sizes, no padding
🡆'@': Native byte order, native sizes and alignment (default)
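The effect of the prefix character can be seen by packing the same value with different prefixes. This is a small sketch; the value 0x0102 is an arbitrary choice, and the size reported for '@' depends on the platform, so no fixed result is shown for it.

```python
import struct
import sys

value = 0x0102  # arbitrary two-byte value chosen for illustration

little = struct.pack('<H', value)   # least significant byte first
big = struct.pack('>H', value)      # most significant byte first
network = struct.pack('!H', value)  # network order, same layout as '>'

print(little.hex(), big.hex(), network.hex())  # 0201 0102 0102

# '<', '>', '!' and '=' use standard sizes and insert no padding
print(struct.calcsize('<HI'), struct.calcsize('=HI'))  # 6 6

# '@' (the default) follows the platform's C compiler, so padding may be
# inserted between H and I; the result is platform-dependent
print(struct.calcsize('@HI'), sys.byteorder)
```

For file formats and network protocols, an explicit '<', '>' or '!' prefix is usually the safer choice, because the resulting layout does not change from one machine to another.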
Common format characters include 'b' (signed char), 'H' (unsigned short), 'I' (unsigned int), 'f' (float), 'd' (double), and 's' (bytes, written with a length count such as '10s').
The struct module is particularly useful when interacting with files or network protocols that use fixed-size binary structures.
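The format characters can be combined into a single record layout. The record below is hypothetical and exists only to show how the sizes add up under the '<' prefix, which uses standard sizes and no padding.

```python
import struct

# Hypothetical record: 4-byte tag, signed char, unsigned short, float, double
RECORD_FORMAT = '<4sbHfd'

print(struct.calcsize(RECORD_FORMAT))  # 4 + 1 + 2 + 4 + 8 = 19

packed = struct.pack(RECORD_FORMAT, b'ABCD', -5, 65535, 1.5, 2.25)
tag, small, count, ratio, precise = struct.unpack(RECORD_FORMAT, packed)
print(tag, small, count, ratio, precise)

# With 's', the count is a byte length; shorter values are padded with null bytes
padded = struct.pack('<4s', b'AB')
print(padded)  # b'AB\x00\x00'
```

Note that the count before 's' is a length in bytes, whereas a count before the other characters is a repeat count, so '4H' means four unsigned shorts while '4s' means one 4-byte field.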
