Complete guide to Base64 encoding (2023 edition)

1. What is Base64 encoding?

Base64 is an encoding scheme that represents binary data using 64 printable ASCII characters (A-Z, a-z, 0-9, + and /). Its core mission is to let non-text data "safely survive" in plain-text environments.

You'll see it in these places:

  • Embedding small images or fonts in HTML, CSS, or JavaScript (Data URLs)
  • Email attachments (early SMTP supported only 7-bit ASCII)
  • Packaging the contents of JWTs and Kubernetes Secrets
  • Text-safe transmission of binary data such as digital certificates and signatures

2. Why is Base64 needed?

Try opening a .jpg file directly in Notepad and the screen fills with "garbled characters". There are three key reasons behind this:

  1. Unprintable characters: binary data is full of control bytes (line feed, carriage return, NULL, and so on) that a text editor has no way to display.
  2. Limits of 7-bit protocols: older protocols such as early SMTP can only safely transmit the first 128 characters of the ASCII table.
  3. Special characters get escaped or stripped: in contexts such as URLs, cookies, and JSON, symbols like + and /, spaces, and newlines may be escaped or filtered out, corrupting the data.

Base64 sidesteps these problems by "translating" arbitrary binary data into plain letters, digits, and a few symbols.
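A quick illustration: the sketch below encodes a handful of bytes that include NUL, CR, LF and the first bytes of a JPEG file, then checks that every output character is printable. The sample bytes are arbitrary.

```python
import base64
import string

# Arbitrary binary sample: NUL, CR, LF, then FF D8 FF (the start of a JPEG file)
raw = b"\x00\r\n\xff\xd8\xff"

encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # AA0K/9j/

# Every output character comes from A-Z, a-z, 0-9, "+", "/" (plus "=" padding)
alphabet = set(string.ascii_letters + string.digits + "+/=")
print(all(ch in alphabet for ch in encoded))  # True
```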

3. Detailed explanation of the Base64 principle

3.1 Core encoding process

In essence: take the input 3 bytes (24 bits) at a time, split each group into four 6-bit values, and map each value (0-63) to a character in the standard Base64 alphabet.

The layout looks like this:

Original 3-byte binary layout:
┌───────────────┬───────────────┬───────────────┐
│      b1       │      b2       │      b3       │
├─┬─┬─┬─┬─┬─┬─┬─┼─┬─┬─┬─┬─┬─┬─┬─┼─┬─┬─┬─┬─┬─┬─┬─┤
│1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8│1 2 3 4 5 6 7 8│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘

Regrouped into four 6-bit index blocks:
┌───────┬───────┬───────┬───────┐
│  n1   │  n2   │  n3   │  n4   │
└───────┴───────┴───────┴───────┘
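The regrouping in the diagram is just bit shifting. This minimal sketch packs three bytes into one 24-bit integer and reads it back six bits at a time; split_24_bits is an illustrative name, not a library function, and b"Man" is only a sample input.

```python
def split_24_bits(group: bytes) -> list[int]:
    """Split exactly 3 bytes (b1, b2, b3) into four 6-bit values (n1..n4)."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    b1, b2, b3 = group
    # Concatenate the three bytes into one 24-bit number: b1 b2 b3
    bits = (b1 << 16) | (b2 << 8) | b3
    # Read 6 bits at a time, most significant first (n1, n2, n3, n4)
    return [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]

# "Man" = 0x4D 0x61 0x6E = 01001101 01100001 01101110
print(split_24_bits(b"Man"))  # [19, 22, 5, 46]
```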

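The alphabet order is defined by RFC 4648: uppercase letters (A-Z, values 0-25), then lowercase letters (a-z, 26-51), then digits (0-9, 52-61), and finally + and / (62 and 63) in the standard variant. The URL-safe variant replaces those last two characters with - and _.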
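The alphabet can be written down directly from that ordering. This sketch builds both RFC 4648 alphabets from Python's string constants and maps 6-bit values to characters by index; STANDARD_ALPHABET, URLSAFE_ALPHABET and to_chars are illustrative names. The values 19, 22, 5, 46 are what b"Man" splits into.

```python
import base64
import string

# RFC 4648 order: A-Z (0-25), a-z (26-51), 0-9 (52-61), then two extra symbols (62, 63)
STANDARD_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URLSAFE_ALPHABET = STANDARD_ALPHABET[:62] + "-_"  # only the last two characters differ

def to_chars(values: list[int], alphabet: str = STANDARD_ALPHABET) -> str:
    """Map 6-bit values (0-63) to Base64 characters by index."""
    return "".join(alphabet[v] for v in values)

print(len(STANDARD_ALPHABET))                        # 64
print(to_chars([19, 22, 5, 46]))                     # TWFu
print(to_chars([62, 62, 63, 63]))                    # ++//
print(to_chars([62, 62, 63, 63], URLSAFE_ALPHABET))  # --__
# Cross-check against the standard library
print(base64.b64encode(b"Man") == to_chars([19, 22, 5, 46]).encode())  # True
```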

3.2 Padding rules when fewer than 3 bytes remain

If the input length is not a multiple of 3, the last group is completed with 0x00 bytes (all-zero bits) to form a full 24-bit block, which is then encoded as usual. However, an output position made up entirely of this filler is not mapped to A; it is written as = instead, and the decoder ignores it:

  • 1 byte remaining: add two 0x00 bytes; the encoded output ends with two = (e.g. "M" → "TQ==")
  • 2 bytes remaining: add one 0x00 byte; the encoded output ends with one = (e.g. "Ma" → "TWE=")
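Putting the splitting, the alphabet and the padding rule together gives a complete, if slow, encoder. This is an educational sketch (b64encode_manual is a hypothetical name); in real code use the standard library.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_manual(data: bytes) -> str:
    """Educational Base64 encoder (standard alphabet, with = padding)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)               # 0, 1 or 2 bytes short of a full group
        padded = chunk + b"\x00" * missing     # complete the group with zero bytes
        bits = int.from_bytes(padded, "big")   # one 24-bit integer
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Positions made only of filler become "=" instead of "A"
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)

for sample in (b"Man", b"Ma", b"M"):
    print(sample, b64encode_manual(sample))
# b'Man' TWFu
# b'Ma' TWE=
# b'M' TQ==

# Cross-check against the standard library for lengths 0..49
print(all(b64encode_manual(bytes(range(n))) == base64.b64encode(bytes(range(n))).decode()
          for n in range(50)))  # True
```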

4. Implementation in modern programming languages

Python's built-in base64 module already covers most production needs, with no third-party libraries required:

import base64

# 1. Standard encoding/decoding (input must be bytes, so encode str first)
original_bytes = "2024 tech blog".encode("utf-8")
encoded_std = base64.b64encode(original_bytes)  # returns bytes: b'MjAyNCB0ZWNoIGJsb2c='
decoded_std = base64.b64decode(encoded_std)     # round-trips to: b'2024 tech blog'

# 2. URL-safe encoding/decoding (+ becomes -, / becomes _; padding can be stripped manually)
original_binary = b"\xfb\xef\xff"               # sample binary data
encoded_url = base64.urlsafe_b64encode(original_binary).rstrip(b"=")  # b'--__'
decoded_url = base64.urlsafe_b64decode(encoded_url + b"=" * ((4 - len(encoded_url) % 4) % 4))
# round-trips to: b'\xfb\xef\xff'

💡 Handling unpadded or messily formatted Base64

In practice, Base64 returned by third-party APIs is often missing its = padding, has leading or trailing whitespace, or mixes + and / with - and _. Here is a more tolerant general-purpose decoding function:

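import base64
import binascii
from typing import Union

def robust_base64_decode(s: Union[str, bytes]) -> bytes:
    # 1. Normalize to ASCII bytes; non-ASCII input is rejected instead of silently dropped
    if isinstance(s, str):
        try:
            s = s.encode("ascii")
        except UnicodeEncodeError as e:
            raise ValueError("Invalid Base64 input") from e
    # Remove all whitespace: leading/trailing spaces and any embedded line breaks
    s = b"".join(s.split())

    # 2. Map URL-safe characters back to the standard alphabet (accepts both variants)
    s = s.replace(b"-", b"+").replace(b"_", b"/")

    # 3. Restore any missing padding
    pad_len = (4 - len(s) % 4) % 4
    s += b"=" * pad_len

    # 4. Decode with strict validation
    try:
        return base64.b64decode(s, validate=True)
    except binascii.Error as e:
        raise ValueError("Invalid Base64 input") from e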

5. Advanced topics and best practices

5.1 Performance and storage trade-offs

  • Data expansion: every 3 input bytes become 4 output characters (24 bits → 4 × 8 bits), so encoded data is about 33% larger than the original, plus up to two padding characters. For large files (such as videos or archives over 100 MB), avoid transferring them as Base64; chunked binary uploads or direct CDN links are preferable.
  • Python performance: for large files, call base64.b64encode chunk by chunk (encoding only whole 3-byte groups so that padding appears only at the very end) instead of reading the entire file into memory at once.
  • Alternative: if both client and server support binary transport (HTTP/2, gRPC, WebSocket binary frames, FormData file uploads), sending raw binary is the more efficient choice.
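The size rule and the chunked approach can be sketched as follows. encoded_length and encode_stream are hypothetical helpers; the stream version carries any leftover 1-2 bytes into the next read, so it works even when the read size is not a multiple of 3.

```python
import base64
import io
from typing import BinaryIO

def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)

def encode_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 3 * 256 * 1024) -> None:
    """Base64-encode src into dst chunk by chunk, never holding the whole input in memory."""
    leftover = b""
    while chunk := src.read(chunk_size):
        data = leftover + chunk
        cut = len(data) - len(data) % 3      # encode only complete 3-byte groups
        dst.write(base64.b64encode(data[:cut]))
        leftover = data[cut:]                # carry 0-2 bytes over to the next round
    dst.write(base64.b64encode(leftover))    # final group: "=" padding can appear only here

# Usage with in-memory streams; real code would pass open(path, "rb") / open(path, "wb")
payload = bytes(range(256)) * 10             # 2,560 arbitrary bytes
out = io.BytesIO()
encode_stream(io.BytesIO(payload), out, chunk_size=1000)
print(out.getvalue() == base64.b64encode(payload))       # True
print(len(out.getvalue()), encoded_length(len(payload)))  # 3416 3416
print(encoded_length(3_000_000))                          # 4000000, i.e. 4/3 of the input
```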

5.2 ⚠️ Security pitfalls to avoid

Base64 ≠ encryption! It is merely a fully reversible encoding and must never be used to "protect" sensitive data such as passwords, ID numbers, or API keys.

The correct approach is:

  1. Perform real encryption on sensitive data first (e.g. AES-256-GCM)
  2. Then package the encrypted binary data in Base64 to facilitate transmission.
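How little Base64 hides is easy to demonstrate: reversing it needs no key at all. The API key below is a made-up placeholder.

```python
import base64

api_key = "sk-demo-1234567890"  # hypothetical secret, for illustration only
encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
print(encoded)  # looks opaque, but protects nothing

# No key or password is needed to reverse it
recovered = base64.b64decode(encoded).decode("utf-8")
print(recovered == api_key)  # True
```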

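It is also strongly recommended to enable strict validation when decoding (such as validate=True in Python's base64.b64decode). By default, characters outside the alphabet are silently discarded instead of raising an error, which can hide corrupted or malicious input.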
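The difference is easy to observe with the standard library. In this sketch a stray * is inserted into "SGVsbG8=" (the Base64 form of b"Hello"): the default decoder drops it without complaint, while validate=True raises binascii.Error.

```python
import base64
import binascii

text = b"SGVs*bG8="  # "SGVsbG8=" with a stray "*" injected

# Default (validate=False): characters outside the alphabet are silently discarded
print(base64.b64decode(text))  # b'Hello'

# validate=True: the same input is rejected
try:
    base64.b64decode(text, validate=True)
except binascii.Error as exc:
    print("rejected:", exc)
```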

5.3 Common modern scenarios in practice

Scenario 1: Embedding small images with Data URLs

<!-- Recommended only for images under 10 KB, to avoid blocking HTML parsing -->
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" 
     alt="1x1 transparent placeholder">
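Data URLs like the one above can be generated instead of pasted by hand. This is a minimal sketch; to_data_url is a hypothetical helper, and its max_bytes guard simply mirrors the 10 KB guideline in the HTML comment (an assumption you may want to tune or drop).

```python
import base64

def to_data_url(data: bytes, mime_type: str, max_bytes: int = 10 * 1024) -> str:
    """Build a data: URL (RFC 2397 syntax) with a Base64-encoded body."""
    # Hypothetical guard reflecting the ~10 KB guideline
    if len(data) > max_bytes:
        raise ValueError(f"{len(data)} bytes is too large to inline (limit {max_bytes})")
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"

# Typical use (the file name is a placeholder):
# with open("icon.png", "rb") as f:
#     src = to_data_url(f.read(), "image/png")
print(to_data_url(b"Man", "text/plain"))  # data:text/plain;base64,TWFu
```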

Scenario 2: Parsing a JWT (this only reads the header/payload; the signature must still be verified before the contents are trusted)

import json

# Assumes robust_base64_decode (defined in the previous section) is available in this module

jwt_token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
header, payload, signature = jwt_token.split(".")

# Decode the header and payload (each is a JSON object); the signature is left untouched
print(json.loads(robust_base64_decode(header)))
# Output: {'alg': 'HS256', 'typ': 'JWT'}
print(json.loads(robust_base64_decode(payload)))
# Output: {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
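Decoding is not verification. For HS256 tokens, the signature is an HMAC-SHA256 over the ASCII string header.payload (both segments exactly as they appear in the token), encoded as unpadded Base64url. The sketch below checks that before trusting the payload. verify_hs256, b64url_encode, b64url_decode and the demo secret are hypothetical; the sketch checks only the signature (not claims such as exp), and production code should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode Base64url."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def verify_hs256(token: str, secret: bytes) -> dict:
    """Check an HS256 signature and return the payload only if it matches."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")  # never let the token pick the algorithm
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))

# Build a demo token with a hypothetical secret, then verify it
secret = b"demo-secret"
h = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
p = b64url_encode(json.dumps({"sub": "1234567890"}).encode())
sig = b64url_encode(hmac.new(secret, f"{h}.{p}".encode(), hashlib.sha256).digest())
token = f"{h}.{p}.{sig}"
print(verify_hs256(token, secret))  # {'sub': '1234567890'}
```

A token signed with a different secret, or one whose payload was edited after signing, raises ValueError instead of returning claims.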

Further reading