Complete guide to Base64 encoding (latest version 2023)
1. What is Base64 encoding?
Base64 is a format using 64 printable ASCII characters (A-Z、a-z、0-9、+、/) to represent the encoding scheme of binary data. Its core mission is: Let non-text data "safely survive" in a plain text environment.
You'll see it in these places:
- Embed small-size images or fonts in web pages/CSS/JavaScript (Data URLs)
- Email attachments (the SMTP protocol only supported 7-bit ASCII in its early years)
- Content packaging for JWT tokens and Kubernetes Secrets
- Readable transmission of binary data such as digital certificates and signatures
2. Why is Base64 needed?
try to put a.jpgOpen the file directly with Notepad, and you will see "garbled characters" filling the screen. There are three key reasons behind this:
- Unprintable characters messing
Binary data is filled with control characters (such as line feed, carriage return,
NULL), the text editor doesn't know how to display it. - Limitations of 7-bit transmission protocol Older protocols like SMTP and early HTTP form uploads can only safely transmit the first 128 characters in the ASCII table.
- Risk of truncation of special characters
In scenarios such as URL, Cookie, JSON, etc.
+、/Symbols such as , spaces, and newlines will be escaped or filtered, directly causing data damage.
Base64 "translates" all binary data into pure letters, numbers, and a few symbols, bypassing these problems.
3. Detailed explanation of Base64 principle
3.1 Coding core process
The essence is: split every 3 bytes (24 bits) into four 6-bit blocks, and then map each 6-bit value (0~63) to the default Base64 character table.
The indication is as follows:
Character table order follows RFC 4648:
Uppercase letters → Lowercase letters → Numbers →+/ (standard version), use URL secure version-_Replaces the last two characters.
3.2 Padding rules for less than 3 bytes
If the total data length is not a multiple of 3, you need to add ** at the end.0x00(Binary all zeros)** Make up a complete 24-bit block, and then encode normally.
But it was made up0x00cannot be mapped directly toA, but use=to represent (will be automatically ignored when decoding):
- 1 byte remaining: add 2
0x00, append 2 at the end after encoding= - Remaining 2 bytes: add 1
0x00, append 1 at the end after encoding=
4. Implementation in modern programming languages
Python (3.10+ recommended writing method)
Python built-inbase64The module already covers most production needs, and there is no need to install additional third-party libraries:
💡 handles unpadded/messily formatted Base64
In actual business, it is often encountered that the Base64 ** returned by the third-party interface is missing.=, there are spaces at the beginning and end, or put+/and-_Mixed use of **.
Here is a robust general decoding function:
5. Advanced topics and best practices
5.1 Performance and storage trade-offs
- Data expansion: The size after Base64 encoding will increase by approximately 33% (24 bit → 4×8 bit). For large files (such as videos or compressed packages over 100 MB), it is not recommended to use Base64 to transfer them directly. Blocked binary upload or CDN direct link is preferred.
- Python performance optimization: When processing large files, you can use
base64.b64encodeEncapsulate stream processing to avoid reading into memory at once. - Alternative: If both the front and back ends support binary transmission (HTTP/2, gRPC, WebSocket Binary frames, FormData File), using binary directly is a more efficient choice.
5.2 ⚠️ Guide to safe pit avoidance
**Base64 ≠ Encryption! ** It is only a fully reversible encoding and must not be used to "protect" sensitive data such as passwords, ID numbers, API keys, etc.
The correct approach is:
- Perform real encryption on sensitive data first (e.g. AES-256-GCM)
- Then package the encrypted binary data in Base64 to facilitate transmission.
In addition, it is strongly recommended to turn on verification when decoding (such as in Pythonvalidate=True) to avoid unexpected behavior caused by invalid input.

