Complete tutorial on operating MySQL database in Python

1. Preface

In modern web development, data crawling, automated reporting and other scenarios, relational-database is still one of the core choices for structured data management. Among them, MySQL has become the most widely used database from entry to production environments due to its advantages of free open source, mature ecology, and stable performance.

This tutorial adopts the idea of from 0 to usable, and then to production optimization**, and uses concise and practical code examples to help you quickly master the core skills of operating MySQL in Python. The content covers environment-setup, secure connections, CRUD, transaction processing and ORM practice, with production-level best practices attached.

2. Environment preparation

2.1 Rapid deployment of MySQL (Docker)

If you don’t want to install the MySQL service directly on your local machine, using Docker to start it with one click is the fastest solution:

docker run --name local-mysql8 \
  -e MYSQL_ROOT_PASSWORD=123456 \
  -p 3306:3306 \
  -d mysql:8.0

After startup, you can passdocker exec -it local-mysql8 mysql -uroot -p123456Enter the container and create a test database in advance:

CREATE DATABASE IF NOT EXISTS spiders DEFAULT CHARACTER SET utf8mb4;

You can also use graphical tools such as Navicat and DBeaver to connect.

2.2 Install Python driver library

Commonly used libraries for Python operations on MySQL include the following:

Library name	Features	Applicable scenarios
`PyMySQL`	Pure Python implementation, strong compatibility, no compilation dependencies, simple installation	entry-level learning, lightweight applications, cross-platform environment
`mysql-connector-python`	The Python driver officially provided by MySQL has slightly better performance	Projects with further performance requirements and official support
`SQLAlchemy`	Python's most popular ORM tool, which can operate multiple databases and shield SQL differences	Complex business logic, team collaboration, multi-database projects

PyMySQL is the first choice at the entry stage, and can be combined with SQLAlchemy later to improve development efficiency. Install basic libraries:

pip install pymysql cryptography

cryptographyUsed to support MySQL 8.0 defaultcaching_sha2_passwordAuthentication plugin.

3. Safe and efficient database connection

3.1 Entry level: single basic connection

This connection method is suitable for quick verification and learning, and should not be used directly in production environments.

import pymysql

try:
    db = pymysql.connect(
        host='localhost',
        user='root',
        password='123456',
        database='spiders',
        port=3306,
        charset='utf8mb4',                 # 必须用 utf8mb4，支持 Emoji 和完整 Unicode
        cursorclass=pymysql.cursors.DictCursor  # 返回字典，方便按列名读取
    )

    with db.cursor() as cursor:
        cursor.execute("SELECT VERSION()")
        result = cursor.fetchone()
        print(f"✅ 数据库连接成功，版本：{result['VERSION()']}")
finally:
    if db.open:
        db.close()
        print("🔌 连接已关闭")

3.2 Production level: using connection pooling

Frequently creating and destroying database connections is extremely expensive, so connection pools must be used in production environments. recommendDBUtils：

pip install dbutils

Connection pool configuration example:

from dbutils.pooled_db import PooledDB
import pymysql

# 全局连接池对象，只初始化一次
_pool = None

def init_pool():
    global _pool
    if _pool is None:
        _pool = PooledDB(
            creator=pymysql,
            maxconnections=10,    # 最大连接数，依据服务器配置调整
            mincached=2,         # 启动时保留的空闲连接数
            host='localhost',
            user='root',
            password='123456',
            database='spiders',
            charset='utf8mb4',
            cursorclass=pymysql.cursors.DictCursor
        )
    return _pool

def get_connection():
    pool = init_pool()
    return pool.connection()

All subsequent database operations passget_connection()Obtain a connection and resources are managed uniformly by the connection pool.

4. Core CRUD operations

We pass astudentsTable to demonstrate addition, deletion, modification and query.

4.1 Initialize table structure

def create_students_table():
    sql = """
    CREATE TABLE IF NOT EXISTS students (
        id VARCHAR(255) NOT NULL COMMENT '学号',
        name VARCHAR(255) NOT NULL COMMENT '姓名',
        age INT NOT NULL COMMENT '年龄',
        gender VARCHAR(10) DEFAULT NULL COMMENT '性别',
        PRIMARY KEY (id)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='学生信息表'
    """
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql)
    print("✅ 学生表创建/检查完成")

with get_connection() as connThe transaction will be automatically processed: if there is no exception, it will be submitted, and if an exception occurs, it will be rolled back.

4.2 Insert data

4.2.1 Single insertion

def insert_single_student():
    sql = "INSERT INTO students(id, name, age) VALUES(%s, %s, %s)"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, ('20240001', '张三', 20))
    print(f"✅ 插入成功，影响行数：{cursor.rowcount}")

⚠️ placeholder must be used%s, Never use string concatenation, otherwise there is a risk of SQL injection!

4.2.2 Batch insert (strongly recommended)

To insert 1,000 pieces of data, a single insertion may require 1,000 network interactions, while a batch insertion only takes about 1 time.

def insert_batch_students():
    sql = "INSERT INTO students(id, name, age) VALUES(%s, %s, %s)"
    data = [
        ('20240002', '李四', 21),
        ('20240003', '王五', 22),
        ('20240004', '赵六', 20)
    ]
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.executemany(sql, data)
    print(f"✅ 批量插入成功，影响行数：{cursor.rowcount}")

4.2.3 Update if it exists, insert if it does not exist (Upsert)

This is a necessary operation for crawler data synchronization and idempotent writing.

def upsert_student():
    sql = """
    INSERT INTO students(id, name, age, gender) 
    VALUES(%s, %s, %s, %s)
    ON DUPLICATE KEY UPDATE 
        name=VALUES(name), age=VALUES(age), gender=VALUES(gender)
    """
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, ('20240001', '张三三', 23, '男'))
    print(f"✅ Upsert 完成，影响行数：{cursor.rowcount}")

Return value rules: If the number of affected rows is 1, it means inserting a new record, and if it is 2, it means updating existing records.

4.3 Update, delete, query

Update data

def update_student():
    sql = "UPDATE students SET age = %s WHERE name = %s"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, (21, '李四'))
    print("✅ 更新完成")

Delete data

def delete_student():
    sql = "DELETE FROM students WHERE age < %s"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, (21,))
    print(f"✅ 删除成功，影响行数：{cursor.rowcount}")

Query data

Single query:

def query_one():
    sql = "SELECT id, name, age FROM students WHERE id = %s"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, ('20240001',))
            row = cursor.fetchone()
            if row:
                print(row)

Multiple queries:

def query_all():
    sql = "SELECT id, name, age FROM students"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            rows = cursor.fetchall()
            for row in rows:
                print(row)

Paging query:

def query_page(page, page_size):
    offset = (page - 1) * page_size
    sql = "SELECT id, name, age FROM students LIMIT %s OFFSET %s"
    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql, (page_size, offset))
            rows = cursor.fetchall()
            return rows

When paging large amounts of data,OFFSETIf it is too large, it will cause performance degradation. You can use cursor paging based on auto-incrementing primary keys instead: WHERE id > last_id LIMIT page_size

Large data volume streaming query:

When the result set may be very large, and to avoid loading all the data into memory at once, a streaming cursor can be used:

def stream_query():
    sql = "SELECT id, name, age FROM students"
    with get_connection() as conn:
        # 使用 SSCursor（服务器端游标），逐条返回结果
        with conn.cursor(pymysql.cursors.SSCursor) as cursor:
            cursor.execute(sql)
            row = cursor.fetchone()
            while row:
                print(row)
                row = cursor.fetchone()

SSCursorThe result set will be left on the server side and read line by line by the client. The memory usage is constant, which is very suitable for exporting large amounts of data.

5. Transaction processing: ensuring data consistency

In key businesses such as transfers and orders, a series of operations are required to either all succeed or all fail (ACID characteristics). The following transfer example demonstrates the use of transactions.

def transfer_money(from_account, to_account, amount):
    # 创建账户表（如果不存在）
    create_sql = """
    CREATE TABLE IF NOT EXISTS accounts (
        id VARCHAR(255) NOT NULL PRIMARY KEY,
        balance DECIMAL(10,2) NOT NULL DEFAULT 0.00
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    """
    # 插入测试数据（幂等）
    insert_sql = """
    INSERT IGNORE INTO accounts(id, balance) VALUES
    ('A1001', 1000.00),
    ('A1002', 500.00);
    """

    with get_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute(create_sql)
            cursor.execute(insert_sql)

            try:
                # 检查余额
                cursor.execute("SELECT balance FROM accounts WHERE id = %s", (from_account,))
                row = cursor.fetchone()
                if not row or row['balance'] < amount:
                    raise ValueError("账户余额不足或账户不存在")

                # 扣款
                cursor.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, from_account)
                )

                # 模拟中途出错（可取消注释测试回滚）
                # raise Exception("网络中断！")

                # 存款
                cursor.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, to_account)
                )

                print("✅ 转账成功")
            except Exception as e:
                # 任何步骤出错，回滚整个事务
                conn.rollback()
                print(f"❌ 转账失败，已回滚：{e}")

because of usingwithstatement, even if it is not submitted explicitly, will automatically exit when there is no exception.commit(); Once an exception occurs, the context manager will executerollback(). explicit callrollback()It can make the intention clearer.

6. Advanced: Use SQLAlchemy ORM to simplify code

Although handwritten SQL is flexible, code readability and maintainability will decrease when the business logic is complex. SQLAlchemy As the most popular ORM for Python, it allows you to operate databases like objects.

Install:

pip install sqlalchemy pymysql

Complete example:

from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# 1. 基类
Base = declarative_base()

# 2. 定义模型（映射到 students 表）
class StudentORM(Base):
    __tablename__ = 'students'

    id = Column(String(255), primary_key=True, comment='学号')
    name = Column(String(255), nullable=False, comment='姓名')
    age = Column(Integer, nullable=False, comment='年龄')
    gender = Column(String(10), comment='性别')

# 3. 创建引擎（内置连接池）
engine = create_engine(
    'mysql+pymysql://root:123456@localhost:3306/spiders',
    pool_size=10,          # 连接池大小
    pool_recycle=3600      # 防止连接被 MySQL 超时断开
)

# 4. 自动创建表（如果不存在）
Base.metadata.create_all(engine)

# 5. 创建会话工厂
SessionLocal = sessionmaker(bind=engine)

def test_orm():
    db = SessionLocal()
    try:
        # 插入
        new_student = StudentORM(id='20240005', name='钱七', age=22, gender='女')
        db.add(new_student)

        # 查询年龄 >= 20 的学生
        students = db.query(StudentORM).filter(StudentORM.age >= 20).all()
        print("✅ 查询结果：")
        for stu in students:
            print(stu.id, stu.name, stu.age)

        db.commit()
    except Exception as e:
        db.rollback()
        print(f"❌ ORM 操作失败：{e}")
    finally:
        db.close()

test_orm()

Through ORM, query, update and delete become intuitive Python object operations, greatly reducing the error rate of handwritten SQL.

7. Production-level best practices and pitfalls to avoid

When using Python + MySQL in a production environment, be sure to adhere to the following principles:

Connection pooling must be used: eitherDBUtilsEither SQLAlchemy's built-in connection pool can significantly reduce connection overhead.
Must use parameterized query: all user input should pass%sPlaceholders are passed in to prevent SQL injection.
Character sets are used uniformlyutf8mb4: Supports Emoji and full Unicode characters to avoid garbled characters.
Proper use of transactions: Key business logic (such as payment, inventory deduction) must be wrapped in transactions to ensure data consistency.
Release resources in a timely manner: Make full use ofwithstatement ortry-finallyClose the connection and cursor.
Batch operation efficiency improvement: Use it first when writing a large amount of dataexecutemany()or batchINSERTstatement.
Configure connection timeout and automatic reconnection: Settingspool_recycle(maximum connection reuse time) andconnect_timeout, to prevent the connection from being disconnected idle.
Add indexes for commonly used query fields: Analyze slow query logs and optimize accordinglyWHERE、JOINConditional fields.
Decoupling of sensitive information: Configurations such as database passwords should be read from environment variables or the configuration center, and hard coding is strictly prohibited.

8. Summary

This tutorial starts with basic connections and gradually explains how to operate MySQL safely and efficiently in Python:

Get started quickly: PyMySQL single connection, suitable for learning and gadgets.
Production solution: PyMySQL + DBUtils connection pool, suitable for lightweight backend services.
Advanced method: SQLAlchemy ORM, suitable for complex business systems, greatly improving development efficiency.

Mastering these contents is enough for you to handle most data interaction scenarios between Python and MySQL. With good habits of transactions, batch operations, and resource management, you can write database code that is both robust and efficient.

#Complete tutorial on operating MySQL database in Python

#1. Preface

#2. Environment preparation

#2.1 Rapid deployment of MySQL (Docker)

#2.2 Install Python driver library

#3. Safe and efficient database connection

#3.1 Entry level: single basic connection

#3.2 Production level: using connection pooling

#4. Core CRUD operations

#4.1 Initialize table structure

#4.2 Insert data

#4.2.1 Single insertion

#4.2.2 Batch insert (strongly recommended)

#4.2.3 Update if it exists, insert if it does not exist (Upsert)

#4.3 Update, delete, query

#Update data

#Delete data

#Query data

#5. Transaction processing: ensuring data consistency

#6. Advanced: Use SQLAlchemy ORM to simplify code

#7. Production-level best practices and pitfalls to avoid

#8. Summary