Python binary data processing tutorial: a detailed guide to the struct module

When parsing network protocols, communicating with embedded devices, or reading binary files, Python's built-in bytes and bytearray types can store the raw bytes, but manipulating hexadecimal values by hand is tedious and error-prone. This is where the struct module becomes your Swiss Army knife: it uses "format strings" that are readable at a glance to convert in both directions between native Python types and binary data.


1. First, the basics: binary storage in Python

Python has two main types dedicated to storing binary data:

  • bytes: an immutable byte sequence, created with a b'' literal
  • bytearray: a mutable byte sequence whose individual bytes can be modified in place

# Write the bytes of an ASCII string directly
b1 = b'hello'

# Spell out bytes by hand with hexadecimal escapes
b2 = b'\x00\x9c@c'
print(b2)  # prints b'\x00\x9c@c' (0x40 and 0x63 display as the ASCII characters @ and c)
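
The difference between the two types is easiest to see by trying to modify one byte. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
# bytes is immutable: assigning to an index raises TypeError
data = b'\x00\x9c@c'
try:
    data[0] = 0xFF
    bytes_modified = True
except TypeError:
    bytes_modified = False

# bytearray is mutable: the same assignment succeeds in place
buf = bytearray(data)
buf[0] = 0xFF

# Indexing either type yields an int; .hex() shows every byte as two hex digits
first_byte = data[0]
as_hex = data.hex()
as_ints = list(data)

print(bytes_modified, bytes(buf), first_byte, as_hex, as_ints)
```

Both types support the buffer protocol, so either one can be passed to struct.unpack.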

With these two types in mind, let's look at how struct converts between them and ordinary Python types.


2. struct module core

2.1 The two most commonly used functions

The struct API is small. Remembering the following two core functions is enough for most everyday scenarios:

Function                         Effect
struct.pack(fmt, v1, v2, ...)    Packs Python values into bytes according to the format fmt
struct.unpack(fmt, buffer)       Unpacks bytes/bytearray according to the format fmt into a tuple

Don’t forget to import it before using:

import struct

2.2 The core rule: format strings

The format string is the conductor's baton of struct: it determines how the data is interpreted. A complete format string consists of an optional first character, which selects byte order, size, and alignment, followed by the required type specifiers.

If you omit that first character, struct uses the native rules of the current platform (@), including the native sizes and alignment padding of the platform's C compiler. The same format can then produce different layouts on different operating systems and CPU architectures, so it is best to state the byte order explicitly.

Character   Rule description
<           Little-endian (low byte first; common on personal computers)
>           Big-endian (high byte first; common in network protocols and embedded systems)
!           Network order (equivalent to >; intended for parsing network protocols)
@           Native order + native alignment (platform dependent; use with caution)
=           Native order, but standard sizes and no alignment (rarely used)
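
To make the effect of the prefix concrete, here is a small sketch that packs the same value with different prefixes and compares sizes. The value 1 and the format 'bi' are arbitrary choices for illustration, and the native size is deliberately left unstated because it depends on the platform.

```python
import struct

value = 1

little = struct.pack('<I', value)     # low byte first
big = struct.pack('>I', value)        # high byte first
network = struct.pack('!I', value)    # same bytes as '>'

# With an explicit prefix, sizes are standard and no padding is inserted:
# 1 byte (signed char) + 4 bytes (int)
explicit_size = struct.calcsize('<bi')

# With '@' (or no prefix at all) native alignment applies, so padding may be
# inserted between the char and the int; the result is platform dependent
native_size = struct.calcsize('@bi')
default_size = struct.calcsize('bi')

print(little, big, network, explicit_size, native_size, default_size)
```

On many common platforms the native size comes out larger than the explicit one, because the int is aligned to a 4-byte boundary.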

Common data type specifiers

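The following types cover most use cases. The byte counts are the standard sizes, which apply when the format string starts with <, >, ! or =; in native mode (@) the sizes follow the platform's C compiler instead: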

Format   C type                  Python type   Number of bytes
x        padding byte (skipped)  no value      1
b        signed char             int           1
B        unsigned char           int           1
h        short                   int           2
H        unsigned short          int           2
i        int                     int           4
I        unsigned int            int           4
q        long long               int           8
Q        unsigned long long      int           8
f        float                   float         4
d        double                  float         8
s        char[]                  bytes         length given by a numeric prefix
p        Pascal string           bytes         length given by a numeric prefix (the first byte stores the actual length)
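
The s, x, and p specifiers behave a little differently from the numeric ones, so a short sketch may help. The field contents (b'abcd', 513, and so on) are made-up sample values.

```python
import struct

# '4s' is a fixed 4-byte string field, 'x' one padding byte, 'H' an unsigned short
record = struct.pack('<4sxH', b'abcd', 513)
record_size = struct.calcsize('<4sxH')      # 4 + 1 + 2

# 'x' produces no value when unpacking, so only two items come back
name, number = struct.unpack('<4sxH', record)

# Values shorter than the field are padded with null bytes; longer ones are truncated
padded = struct.pack('<4s', b'ab')
truncated = struct.pack('<2s', b'abcd')

# '5p' occupies 5 bytes in total: one length byte plus up to 4 data bytes
pascal = struct.pack('<5p', b'abc')
(pascal_value,) = struct.unpack('<5p', pascal)

print(record, record_size, name, number, padded, truncated, pascal, pascal_value)
```

Note that for s the numeric prefix is a byte length, not a repeat count: '4s' is one 4-byte field, not four 1-byte strings.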

3. Try it out: basic hands-on practice

3.1 Packing and unpacking a single integer

Let's start with big-endian order, which is common in network protocols and many binary file formats:

import struct

# Pack the integer 10240099 as a big-endian 4-byte unsigned integer
n = 10240099
packed = struct.pack('>I', n)
print(packed)       # prints b'\x00\x9c@c'

# Unpacking must use the same format string
# unpack returns a tuple; its first element is the recovered integer
unpacked = struct.unpack('>I', packed)[0]
print(unpacked)     # prints 10240099

It's that simple: one format string turns a Python number into compact binary data, and the same format string turns it back.
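
To see where those four bytes come from, the conversion can be checked by hand. This sketch uses only the value from the example above; int.to_bytes and int.from_bytes are shown as a cross-check for the single-integer case, not as a replacement for struct.

```python
import struct

n = 10240099
packed = struct.pack('>I', n)

# The same four bytes written as hexadecimal digits
hex_digits = packed.hex()

# Big-endian means the first byte carries the highest weight
by_hand = (0x00 << 24) | (0x9C << 16) | (0x40 << 8) | 0x63

# The built-in int methods perform the same conversion for one integer
same_bytes = n.to_bytes(4, 'big')
round_trip = int.from_bytes(packed, 'big')

# Little-endian stores the same value with the byte order reversed
little = struct.pack('<I', n)

print(hex_digits, by_hand, same_bytes, round_trip, little)
```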

3.2 Mixing multiple data types

pack and unpack accept several type specifiers combined in order in one format string; they map one-to-one onto the arguments:

# Pack: big-endian, 4-byte unsigned int + 2-byte unsigned short
mixed_packed = struct.pack('>IH', 4042322160, 32896)
print(mixed_packed)  # prints b'\xf0\xf0\xf0\xf0\x80\x80'

# Unpacking yields a two-element tuple
mixed_unpacked = struct.unpack('>IH', mixed_packed)
print(mixed_unpacked)  # prints (4042322160, 32896)
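
Because the specifiers and the values must line up one-to-one, it is worth seeing what happens when they do not. The sketch below is illustrative only; the names sequence and port are hypothetical field names, not part of any real protocol.

```python
import struct

# Tuple unpacking gives each field a readable name (hypothetical names)
sequence, port = struct.unpack('>IH', b'\xf0\xf0\xf0\xf0\x80\x80')

# A repeat count is shorthand: '>3H' means the same as '>HHH'
triple = struct.pack('>3H', 1, 2, 3)

# Too few (or too many) values for the format raises struct.error
try:
    struct.pack('>IH', 4042322160)
    count_error = None
except struct.error as exc:
    count_error = exc

# Values outside the range of the target type are rejected as well
try:
    struct.pack('>H', 70000)     # an unsigned short holds 0..65535
    range_error = None
except struct.error as exc:
    range_error = exc

print(sequence, port, triple, count_error, range_error)
```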

4. Hands-on: parsing a BMP file header

Theory only goes so far without practice. Below we use struct to write a utility function that extracts the width, height, bit depth, and other information from the binary header of a BMP image.

The BMP file header is stored in little-endian order (<), the usual byte order on Windows. The first 30 bytes contain key information such as the image dimensions.

import struct
import base64

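def analyze_bmp_header(data: bytes) -> dict | None:
    """Parse the standard header in the first 30 bytes of a BMP file."""
    if len(data) < 30:
        raise ValueError("data is too short to contain a BMP header")

    # Fixed layout of the BMP header, little-endian.
    # Field order: file signature (2s), file size (I), reserved 1 (H), reserved 2 (H),
    # data offset (I), header size (I), width (I), height (I), color planes (H), bit depth (H)
    header = struct.unpack('<2sIHHIIIIHH', data[:30])

    # Check that this is a valid BMP file (the first two bytes must be b'BM')
    if header[0] != b'BM':
        return None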

    return {
        'type': header[0].decode('ascii'),
        'size': header[1],
        'width': header[6],
        'height': header[7],
        'bits_per_pixel': header[9]
    }

def bmp_info(data: bytes) -> dict | None:
    """Simplified interface: return only the commonly used width, height, and bit depth."""
    full_info = analyze_bmp_header(data)
    if not full_info:
        return None
    return {
        'width': full_info['width'],
        'height': full_info['height'],
        'color': full_info['bits_per_pixel']
    }

if __name__ == '__main__':
    # Sample BMP data, base64-encoded to avoid depending on a local file
    bmp_base64 = (
        'Qk1oAgAAAAAAADYAAAAoAAAAHAAAAAoAAAABABAAAAAAADICAAASCwAAEgsAA'
        'AAAAAAAAAAA/3//f/9//3//f/9//3//f/9//3//f/9//3//f/9//3//f/9//3//f/9//3/'
        '/f/9//3//f/9//3//f/9/AHwAfAB8AHwAfAB8AHwAfP9//3//fwB8AHwAfAB8/3//f/9/A'
        'HwAfAB8AHz/f/9//3//f/9//38AfAB8AHwAfAB8AHwAfAB8AHz/f/9//38AfAB8/3//f/9'
        '//3//fwB8AHz/f/9//3//f/9//3//f/9/AHwAfP9//3//f/9/AHwAfP9//3//fwB8AHz/f'
        '/9//3//f/9/AHwAfP9//3//f/9//3//f/9//38AfAB8AHwAfAB8AHwAfP9//3//f/9/AHw'
        'AfP9//3//f/9//38AfAB8/3//f/9//3//f/9//3//fwB8AHwAfAB8AHwAfAB8/3//f/9//'
        '38AfAB8/3//f/9//3//fwB8AHz/f/9//3//f/9//3//f/9/AHwAfP9//3//f/9/AHwAfP9'
        '//3//fwB8AHz/f/9/AHz/f/9/AHwAfP9//38AfP9//3//f/9/AHwAfAB8AHwAfAB8AHwAf'
        'AB8/3//f/9/AHwAfP9//38AfAB8AHwAfAB8AHwAfAB8/3//f/9//38AfAB8AHwAfAB8AHw'
        'AfAB8/3//f/9/AHwAfAB8AHz/fwB8AHwAfAB8AHwAfAB8AHz/f/9//3//f/9//3//f/9//'
        '3//f/9//3//f/9//3//f/9//3//f/9//3//f/9//3//f/9//38AAA=='
    )
    bmp_data = base64.b64decode(bmp_base64)

    info = bmp_info(bmp_data)
    print(f"Sample BMP image info: {info}")

    # Check that the result matches the expected values
    assert info['width'] == 28
    assert info['height'] == 10
    assert info['color'] == 16
    print("✅ BMP parsing test passed!")

Running the script prints the width, height, and bit depth, and every step of the parsing is easy to follow.
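
One refinement to consider: in the common BITMAPINFOHEADER layout the width and height fields are signed 32-bit integers, and a negative height marks a top-down image. The sketch below is not part of the parser above; it builds a synthetic 30-byte header with made-up values and compares reading the height as unsigned (I) and as signed (i).

```python
import struct

UNSIGNED_FMT = '<2sIHHIIIIHH'   # the layout used by the parser above
SIGNED_FMT = '<2sIHHIIiiHH'     # same layout, width and height read as signed

# Synthetic header for illustration: 28 x 10 pixels, 16 bits, stored top-down
header = struct.pack(SIGNED_FMT, b'BM', 616, 0, 0, 54, 40, 28, -10, 1, 16)
header_size = len(header)

# Index 7 is the height field in both formats
height_unsigned = struct.unpack(UNSIGNED_FMT, header)[7]
height_signed = struct.unpack(SIGNED_FMT, header)[7]

# A negative height means the rows are stored top to bottom
top_down = height_signed < 0
rows = abs(height_signed)

print(header_size, height_unsigned, height_signed, top_down, rows)
```

For bottom-up images such as the sample above, both formats give the same result, so the unsigned version works as long as the height is positive.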


5. Pitfalls and best practices

  1. Always specify the byte order
    Never rely on the default @; otherwise the data may be interpreted differently across platforms.
  2. Handle struct.error
    When unpacking, an exception is raised if the length of the supplied bytes does not match the size the format string expects. Get into the habit of wrapping the call in try/except.
  3. Use memoryview for large data
    When working with binary buffers larger than tens of MB, slicing through a memoryview avoids copying memory and improves performance.
  4. Check the official documentation
    For less common formats (such as the Pascal string p, the Boolean ?, and so on), refer directly to the official Python struct documentation.
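
Points 2 and 3 can be sketched together as follows. The helper name safe_unpack is hypothetical, and the 1 MB zero-filled buffer merely stands in for a large file loaded into memory.

```python
import struct

def safe_unpack(fmt, data):
    """Hypothetical helper: return the unpacked tuple, or None on a size mismatch."""
    try:
        return struct.unpack(fmt, data)
    except struct.error:
        return None

ok = safe_unpack('<I', b'\x01\x00\x00\x00')   # 4 bytes, as '<I' requires
bad = safe_unpack('<I', b'\x01\x00')          # only 2 bytes

# Stand-in for a large buffer, with one value written at offset 1000
big_buffer = bytearray(1_000_000)
struct.pack_into('<I', big_buffer, 1000, 123456)

# Slicing a memoryview creates a view of the same memory, not a copy
view = memoryview(big_buffer)
(from_view,) = struct.unpack('<I', view[1000:1004])

# unpack_from reads at an offset without any slicing at all
(from_offset,) = struct.unpack_from('<I', big_buffer, 1000)

print(ok, bad, from_view, from_offset)
```

Whether to return None or re-raise a more descriptive exception is a design choice; returning None is used here only to keep the sketch short.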

6. What else can be done?

Once you are comfortable with struct, the following tasks become much easier:

  • Parse TCP/UDP network headers (IPv4, DNS, etc.)
  • Read the binary format of older versions of Excel
  • Interact with dynamic libraries written in C/C++ and pass binary structures
  • Write communication protocols with embedded devices such as Arduino and Raspberry Pi

If you run into a more complex nested binary structure, you can use struct.calcsize to compute the number of bytes a format string occupies and parse step by step, or use a third-party library such as construct to describe higher-level protocols.
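
As a rough sketch of that step-by-step approach, the example below walks through a stream of variable-length records. The record layout (a 1-byte type followed by a 2-byte payload length) and the name iter_records are invented for illustration and do not correspond to any real protocol.

```python
import struct

HEADER_FMT = '<BH'                          # hypothetical: record type, payload length
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 1 + 2 bytes

def iter_records(buf):
    """Yield (type, payload) pairs by advancing an offset through buf."""
    offset = 0
    while offset + HEADER_SIZE <= len(buf):
        rec_type, length = struct.unpack_from(HEADER_FMT, buf, offset)
        offset += HEADER_SIZE
        payload = bytes(buf[offset:offset + length])
        if len(payload) < length:
            raise ValueError('truncated payload')
        offset += length
        yield rec_type, payload

# Two sample records built with the same header format
stream = (struct.pack(HEADER_FMT, 1, 2) + b'hi'
          + struct.pack(HEADER_FMT, 2, 3) + b'abc')

records = list(iter_records(stream))
print(HEADER_SIZE, records)
```

The fixed-size header is decoded with struct, and its length field then tells the loop how far to advance before the next header begins.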


After this tutorial you should be comfortable using struct to handle common binary data. The next time you face a dense wall of \x bytes, write a one-line format string and let struct do the work for you!