💡 Do you often need to save temporary configuration, crawler results, or cached API responses during development? A database is overkill, and parsing plain text by hand is tedious. Today's protagonist is `json` from the Python standard library: together with a few third-party tools, it handles lightweight JSON storage and exchange with ease.

JSON processing tutorial in Python

JSON (JavaScript Object Notation) is a lightweight data exchange format that is easy for humans to read and for machines to parse. It is now a "universal language" supported by virtually every programming language and platform. This tutorial goes from the basics to advanced usage so you can quickly get comfortable with JSON processing in Python.

📌 JSON Basics

JSON data structure

JSON has only six value types, built around two container structures, which keeps it very concise:

Object 🗂️: key-value pairs wrapped in curly braces `{}`; every key must be a double-quoted string
Array 📊: an ordered collection of values wrapped in square brackets `[]`; mixed types are allowed (though keeping one type per array is recommended)
Value: a string (in double quotes), a number, a Boolean (`true`/`false`), `null`, an object, or an array

🔄 Correspondence between JSON and Python types

Python's `json` module maps types automatically in both directions; just keep this table in mind:

JSON type	Python type
object	dict
array	list
string	str
number	int/float
true	True
false	False
null	None
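The table can be checked with a quick round trip. This is a minimal sketch with made-up sample data; it also shows two cases the table does not cover: a tuple is written as an array and comes back as a `list`, and non-string dictionary keys come back as strings.

```python
import json

# Sample values chosen to touch every row of the table above.
original = {
    "text": "hi",
    "count": 3,
    "ratio": 0.5,
    "flag": True,
    "nothing": None,
    "items": (1, 2),   # a tuple has no JSON equivalent; it is written as an array
    7: "int key",      # non-string keys are converted to strings
}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)
print(type(decoded["items"]))  # <class 'list'>
print("7" in decoded)          # True
```

So the mapping is automatic, but a round trip does not always give back exactly the Python object you started with.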

📖 Read JSON data

Load from a string (`json.loads`)

`loads` = load string. Use it for strings returned by an API, multi-line configuration text, and so on.

import json

# A multi-line JSON string can use triple quotes; valid JSON does not require indentation, but it reads better in Python
json_str = '''
{
    "name": "John",
    "age": 30,
    "city": "New York",
    "hobbies": ["reading", "traveling"],
    "married": false
}
'''

# Convert to a Python object
data = json.loads(json_str)

print(type(data))          # <class 'dict'>
print(data["name"])        # John
print(data["hobbies"][0])  # reading
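Strings from the outside world are not always valid JSON, so it may help to see what a failure looks like. The sketch below feeds `json.loads` a deliberately invalid string (the `bad_json` value is made up for illustration) and reads the position attributes that `json.JSONDecodeError` provides.

```python
import json

bad_json = "{'name': 'John'}"  # single quotes are not valid JSON

error_info = None
try:
    json.loads(bad_json)
except json.JSONDecodeError as exc:
    # msg, lineno and colno are attributes of JSONDecodeError
    error_info = (exc.msg, exc.lineno, exc.colno)
    print(f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}")
```

`JSONDecodeError` is a subclass of `ValueError`, so existing `except ValueError` handlers catch it as well.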

Load from a file (`json.load`)

`load` = load from a file object. Remember to use a `with` statement so the file is closed automatically.

import json

# Assumes a data.json file exists in the same directory
try:
    with open('data.json', 'r', encoding='utf-8') as f:
        data = json.load(f)  # pass the file object directly
except FileNotFoundError:
    print("File not found, creating default data...")
    data = {"default": True}

print(data)

✍️ Write JSON data

Convert to a JSON string (`json.dumps`)

`dumps` = dump to a string. Three practical parameters are worth remembering:

`ensure_ascii=False`: keeps Chinese and other non-ASCII characters as they are (without it they are written as `\uXXXX` escape sequences, which are valid JSON but hard to read)
`indent=2` or `4`: sets the indentation level so the output is easier for people to read
`sort_keys=True`: sorts by dictionary key, so the output is more stable (easy to diff)

import json

data = {
    "name": "张三",
    "age": 28,
    "city": "北京",
    "hobbies": ["摄影", "编程"],
    "married": True
}

# Convert to an indented, key-sorted JSON string that keeps the Chinese characters
json_str = json.dumps(data, ensure_ascii=False, indent=4, sort_keys=True)
print(json_str)
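To make the effect of `ensure_ascii` concrete, here is a small comparison on made-up data. It also uses `separators`, a standard `json.dumps` parameter not covered above, to produce compact output when size matters more than readability.

```python
import json

data = {"city": "北京", "tags": ["a", "b"]}

escaped = json.dumps(data)                        # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)   # keep non-ASCII characters
compact = json.dumps(data, ensure_ascii=False, separators=(",", ":"))

print(escaped)   # {"city": "\u5317\u4eac", "tags": ["a", "b"]}
print(readable)  # {"city": "北京", "tags": ["a", "b"]}
print(compact)   # {"city":"北京","tags":["a","b"]}

# The escaped and readable forms decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == data
```

The escaped form is not corrupted data; it is just harder for people to read.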

Write to a JSON file (`json.dump`)

`dump` = dump to a file object. It takes the same parameters as `dumps`.

import json

data = {
    "name": "李四",
    "age": 35,
    "city": "上海",
    "hobbies": ["游泳", "音乐"],
    "married": False
}

with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
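If the program crashes halfway through a write, the file above can be left truncated. One possible safeguard, sketched below, is to write to a temporary file first and then swap it into place with `os.replace`. The helper name `write_json_atomic` is hypothetical and not part of the `json` module.

```python
import json
import os
import tempfile

def write_json_atomic(path, data):
    """Write data as JSON to path, replacing the file only after a full write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so that
    # os.replace stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Remove the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `write_json_atomic('output.json', data)` would then take the place of the `with open(...)` block; readers of the file see either the old version or the new one, never a half-written one.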

🚀 Advanced usage

Handling custom classes and datetime objects

The standard `json` module cannot directly serialize custom classes (such as `User`) or `datetime` objects. There are two ways to handle them:

Method 1: a custom `default` function (simple and flexible)

Pass `default` to `dumps`/`dump` to tell the module how to handle objects it does not recognize:

import json
from datetime import datetime

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.created_at = datetime.now()

def custom_encoder(obj):
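    # Handle special types first
    if isinstance(obj, datetime):
        # Convert to an ISO 8601 string: portable and readable
        return obj.isoformat()
    if isinstance(obj, User):
        # List the fields to serialize explicitly (safer than __dict__, which could include temporary attributes)
        return {
            "name": obj.name,
            "age": obj.age,
            "created_at": obj.created_at  # still a datetime; json passes it to custom_encoder again
        }
    # Any other type: raise TypeError, as the default handler would
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")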

user = User("王五", 40)
json_str = json.dumps(user, default=custom_encoder, ensure_ascii=False, indent=2)
print(json_str)
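One detail worth spelling out: whatever `default` returns is encoded again, so it may itself contain objects that need `default`. The sketch below shows this with a fixed timestamp so the output is reproducible. The `Order` class and `encode_extra` function are hypothetical names for illustration, and converting `Decimal` to a string is one possible choice rather than the only one.

```python
import json
from datetime import datetime
from decimal import Decimal

class Order:
    def __init__(self, price, placed_at):
        self.price = price          # a Decimal
        self.placed_at = placed_at  # a datetime

def encode_extra(obj):
    """Fallback for types the json module does not know."""
    if isinstance(obj, Order):
        # The returned dict still holds a Decimal and a datetime;
        # json calls encode_extra again for each of them.
        return {"price": obj.price, "placed_at": obj.placed_at}
    if isinstance(obj, Decimal):
        return str(obj)  # keep the exact digits instead of converting to float
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

order = Order(Decimal("19.90"), datetime(2023, 1, 15, 10, 30))
encoded = json.dumps(order, default=encode_extra)
print(encoded)  # {"price": "19.90", "placed_at": "2023-01-15T10:30:00"}
```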

Method 2: subclass `json.JSONEncoder` (suitable for reuse)

If the same serialization logic is needed in several places, wrap it in a subclass and pass it via the `cls` parameter:

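import json
from datetime import datetime

# Reuses the User class and the user instance defined in Method 1
class CustomJSONEncoder(json.JSONEncoder):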
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, User):
            return {
                "name": obj.name,
                "age": obj.age,
                "created_at": obj.created_at
            }
        return super().default(obj)

# Pass the encoder class via cls when calling
json_str = json.dumps(user, cls=CustomJSONEncoder, ensure_ascii=False, indent=2)

Automatically restore special types when parsing (`object_hook`)

While reading JSON, the decoder calls the `object_hook` function each time it finishes parsing a JSON object (a Python `dict`). There we can turn an ISO-format time string back into a `datetime`, or turn a plain `dict` back into a `User`:

import json
from datetime import datetime

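def custom_decoder(dct):
    # Called for every dict the decoder produces
    if "created_at" in dct:
        try:
            dct["created_at"] = datetime.fromisoformat(dct["created_at"])
        except ValueError:
            pass  # leave the string as-is if it is not in ISO format
    # If name and age are present too, rebuild a User (optional);
    # User is the class defined in Method 1
    if "name" in dct and "age" in dct and "created_at" in dct:
        user = User(dct["name"], dct["age"])
        user.created_at = dct["created_at"]  # keep the parsed time instead of datetime.now()
        return user
    return dct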

json_str = '''
{
    "name": "赵六",
    "age": 45,
    "created_at": "2023-01-15T10:30:00"
}
'''

data = json.loads(json_str, object_hook=custom_decoder)
print(type(data))  # <class '__main__.User'> (when the User restoration above is enabled)
print(type(data["created_at"]) if isinstance(data, dict) else type(data.created_at))  # <class 'datetime.datetime'>
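Guessing the type from field names works for small examples, but any dict that happens to have the same keys would be converted too. One possible alternative, sketched here, is to write an explicit type tag when encoding and check it in `object_hook`. The `__type__` key is a convention invented for this example, and this `User` variant takes `created_at` as a constructor argument, unlike the class above.

```python
import json
from datetime import datetime

class User:
    def __init__(self, name, age, created_at):
        self.name = name
        self.age = age
        self.created_at = created_at

def encode_tagged(obj):
    """default= function that adds a '__type__' tag (made-up convention)."""
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, User):
        return {"__type__": "User", "name": obj.name,
                "age": obj.age, "created_at": obj.created_at}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

def decode_tagged(dct):
    """object_hook= function that reverses encode_tagged."""
    kind = dct.get("__type__")
    if kind == "datetime":
        return datetime.fromisoformat(dct["value"])
    if kind == "User":
        # Nested objects are decoded first, so created_at is already a datetime.
        return User(dct["name"], dct["age"], dct["created_at"])
    return dct

original = User("赵六", 45, datetime(2023, 1, 15, 10, 30))
text = json.dumps(original, default=encode_tagged, ensure_ascii=False)
restored = json.loads(text, object_hook=decode_tagged)

print(type(restored).__name__)  # User
print(restored.created_at)      # 2023-01-15 10:30:00
```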

Streaming large JSON files (using `ijson`)

If a JSON file is too large to fit in memory (for example, several GB of logs or crawler data), the third-party library `ijson` can read it piece by piece instead of loading the whole file at once:

# Install ijson first (run this in a shell, not in Python)
pip install ijson

import ijson

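# Assumes large_users.json is one array containing many users: [{"name":...}, {"name":...}, ...]
with open('large_users.json', 'rb') as f:
    # Stream and process only the user names, without loading the whole array
    for name in ijson.items(f, 'item.name'):
        print(name)
        # Insert into a database, update statistics, etc. here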
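If you control the file format, a different technique needs no third-party library: store one JSON object per line (often called JSON Lines) and parse line by line with the standard `json` module. This is not `ijson` and does not work on a single large array like the one above. The function name `iter_json_lines` and the sample data are made up for this sketch.

```python
import io
import json

def iter_json_lines(fileobj):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line_number, line in enumerate(fileobj, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc.msg}") from exc

# io.StringIO stands in for open('large_users.jsonl', encoding='utf-8')
sample = io.StringIO('{"name": "Ann"}\n\n{"name": "Bob"}\n')
names = [user["name"] for user in iter_json_lines(sample)]
print(names)  # ['Ann', 'Bob']
```

Only one line is held in memory at a time, so memory use stays flat no matter how large the file is.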

✅ Best Practices

Pay attention to encoding: read and write files with `encoding='utf-8'`, and serialize with `ensure_ascii=False`
Always use `with` for file operations: file handles are managed automatically, which prevents leaks
Handle errors: at minimum catch `FileNotFoundError` (file does not exist), `json.JSONDecodeError` (parse error, Python 3.5+), and `TypeError` (serialization error)
Stream large data: `ijson` and `pandas.read_json(..., lines=True, chunksize=...)` (for JSON Lines files) are both good choices
Keep security in mind: Python's `json` module never executes the data it parses, so it is far safer than `eval`, but be cautious with JSON from untrusted sources: a malicious payload can still consume considerable CPU and memory, so limit the input size and validate the result
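The error-handling and security points can be combined in one small wrapper. This is a sketch under stated assumptions, not a complete defense: the name `parse_untrusted`, the 1 MB limit, and the rule that the top level must be an object are all choices made for this example.

```python
import json

MAX_BYTES = 1_000_000  # hypothetical limit; tune it for your application

def parse_untrusted(raw: bytes, max_bytes: int = MAX_BYTES) -> dict:
    """Parse untrusted JSON bytes; return a dict or raise ValueError."""
    if len(raw) > max_bytes:
        raise ValueError(f"payload too large: {len(raw)} bytes")
    try:
        data = json.loads(raw)  # json.loads accepts bytes as well as str
    except (json.JSONDecodeError, UnicodeDecodeError, RecursionError) as exc:
        # RecursionError is included because extremely deep nesting
        # may exhaust the interpreter's recursion limit.
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data

print(parse_untrusted(b'{"ok": true}'))  # {'ok': True}
```

Callers then only need to handle a single exception type, `ValueError`, whatever went wrong with the input.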

🎉 Summary

Python's standard-library `json` module already covers about 90% of everyday JSON processing needs. With tools such as `ijson` on top, lightweight text storage, API interaction, and temporary data caching are all easy to handle. Next time you run into needs like these, don't rush to set up a database!
