MySQL Advanced

MySQL high-level features in action

After mastering the addition, deletion, modification and query, if you want to make SQL queries efficient and easy to maintain, you often need to use some high-level features provided by MySQL. They can not only solve the embarrassment of traditional relational-database when facing semi-structured data, but also elegantly implement complex sorting and aggregation analysis.

This article selects two of the most commonly used features to explain in depth: JSON type (flexible storage of semi-structured data) and window function (efficient processing of analytical queries).


JSON type: use document model in relational-database

Many developers have complained: the "table" of relational-database is too rigid. All columns must have their types defined in advance. As soon as business requirements change,ALTER TABLE, the maintenance cost is not small.

Starting from MySQL 5.7, the native JSON data type was officially introduced; with MySQL 8.0, the log performance bottleneck of JSON fields was solved. It can be said that the JSON type is equivalent to a lightweight document database built into MySQL, making data persistence more flexible.

The JSON type mainly contains two structures:

  1. JSON object (similar to Python dictionary)

    {"name": "骆昊", "tel": "13122335566", "QQ": "957658"}
  2. JSON Array

    [1, 2, 3]
    [{"name": "骆昊", "tel": "13122335566"}, {"name": "王大锤", "tel": "13599876543"}]

Let’s look at the practical application of the JSON type through two classic scenarios.

Scenario 1: Multi-channel login information storage

Most current applications support multiple login methods such as mobile phones, WeChat, QQ, and Weibo, but they do not force users to bind to all channels. If you use traditional column design, you will either add a column for each channel (a large number of null values), or frequently modify the table structure, which is not a good solution.

With the JSON type, it can be designed like this:

CREATE TABLE `tb_test`
(
  `user_id`    bigint unsigned,
  `login_info` json,
  PRIMARY KEY (`user_id`)
);

INSERT INTO `tb_test` 
VALUES 
    (1, '{"tel": "13122335566", "QQ": "654321", "wechat": "jackfrued"}'),
    (2, '{"tel": "13599876543", "weibo": "wangdachui123"}');

When querying the user's mobile phone number and WeChat ID, you can useJSON_EXTRACTExtract the value and passJSON_UNQUOTERemove the quotes. MySQL also provides->>Syntactic sugar makes it easier to write:

-- 标准写法
SELECT `user_id`
     , JSON_UNQUOTE(JSON_EXTRACT(`login_info`, '$.tel')) AS 手机号
     , JSON_UNQUOTE(JSON_EXTRACT(`login_info`, '$.wechat')) AS 微信 
FROM `tb_test`;

-- 推荐写法
SELECT `user_id`
     , `login_info` ->> '$.tel' AS 手机号
     , `login_info` ->> '$.wechat' AS 微信
FROM `tb_test`;

The result is as follows:

+---------+-------------+-----------+
| user_id | 手机号      | 微信       |
+---------+-------------+-----------+
|       1 | 13122335566 | jackfrued |
|       2 | 13599876543 | NULL      |
+---------+-------------+-----------+

Scenario 2: User portrait and tag system

Another great application is Profiling. We can store multiple tag IDs as JSON arrays, thereby avoiding the cumbersome "user-tag" intermediate table, and the query is also very flexible.

  1. Create a tag dictionary table:

    CREATE TABLE `tb_tags`
    (
      `tag_id`   int unsigned NOT NULL COMMENT '标签ID',
      `tag_name` varchar(20)  NOT NULL COMMENT '标签名',
      PRIMARY KEY (`tag_id`)
    );
    
    INSERT INTO `tb_tags` (`tag_id`, `tag_name`) 
    VALUES
        (1, '70后'),
        (2, '80后'),
        (3, '90后'),
        (4, '00后'),
        (5, '爱运动'),
        (6, '高学历'),
        (7, '小资'),
        (8, '有房'),
        (9, '有车'),
        (10, '爱看电影'),
        (11, '爱网购'),
        (12, '常点外卖');
  2. User tag association table (using JSON array):

    CREATE TABLE `tb_users_tags`
    (
      `user_id`   bigint unsigned NOT NULL COMMENT '用户ID',
      `user_tags` json            NOT NULL COMMENT '用户标签'
    );
    
    INSERT INTO `tb_users_tags`
    VALUES
        (1, '[2, 6, 8, 10]'),
        (2, '[3, 8, 9, 11]');
  3. Common query techniques:

  • Query users who contain a specific tag (e.g. tag10, loves watching movies):

      ```sql
      SELECT `user_id`
        FROM `tb_users_tags`
       WHERE 10 MEMBER OF (`user_tags`->'$');
      ```
  • Query users who contain multiple tags at the same time (love watching movies10And born in the 1980s2):

      ```sql
      SELECT `user_id`
        FROM `tb_users_tags`
       WHERE JSON_CONTAINS(`user_tags`->'$', '[2, 10]');
      ```
  • Query users containing any tag (like watching movies10Or those born in the 80s/90s2, 3):

      ```sql
      SELECT `user_id`
        FROM `tb_users_tags`
       WHERE JSON_OVERLAPS(`user_tags`->'$', '[2, 3, 10]');
      ```

Tips:MEMBER OFJSON_CONTAINSandJSON_OVERLAPSThey are all built-in functions provided by MySQL 8.0 for JSON arrays, which can greatly simplify tag queries.


Window function: a data analysis tool

MySQL fully supports Window Functions (Window Functions) starting from version 8.0. Prior to this, similar functions were only available in databases such as Oracle and PostgreSQL, and were often referred to as OLAP (online analytical processing) functions.

What is a "window"?

"Window" can be understood as a collection of records that meet specific conditions. Window functions are special functions that execute on this collection. The biggest difference between it and ordinary aggregate functions is that: Aggregation functions will merge multiple records into one, while window functions will retain each record and append the calculation results on this basis.

Basic syntax

In the syntax of window functions,OVERKeywords are used to define the scope of the window:

<窗口函数> OVER (
    PARTITION BY <分组列>
    ORDER BY <排序列>
    ROWS/RANGE BETWEEN <窗口帧范围>
)

Window functions are mainly divided into two categories:

  1. Special window function:ROW_NUMBER()RANK()DENSE_RANK()LEAD()LAG()wait.
  2. Aggregation function as window function:SUM()AVG()MAX()MIN()COUNT()wait.

The following is a classichrsTake the database (employee/department table) as an example to demonstrate two typical scenarios.

Example 1: Global sorting and paging values

Requirement: Query the employees ranked 4th to 6th from highest to lowest in terms of monthly salary.

SELECT * 
  FROM (
      SELECT `ename`
           , `sal`
           , ROW_NUMBER() OVER (ORDER BY `sal` DESC) AS `rk`
        FROM `tb_emp`
  ) AS `temp`
 WHERE `rk` BETWEEN 4 AND 6;

According to business needs, you can flexibly choose the sorting function:

  • ROW_NUMBER(): Continuous and non-repeating sorting (1, 2, 3, 4, ...)
  • RANK(): Parallel jump sorting (1, 2, 2, 4, ...)
  • DENSE_RANK(): Parallel and continuous sorting (1, 2, 2, 3, ...)

Before MySQL 8.0, achieving the same function usually relied on variable simulation, the code was obscure and the performance was average:

-- 旧版本写法(不推荐)
SELECT `rank`, `ename`, `sal` 
FROM (
       SELECT @a:=@a+1 AS `rank`, `ename`, `sal` 
       FROM `tb_emp`, (SELECT @a:=0) AS t1 
       ORDER BY `sal` DESC
) AS `temp` 
WHERE `rank` BETWEEN 4 AND 6;

Example 2: Sorting within groups

Requirement: Query the two employees with the highest monthly salary in each department.

SELECT `ename`, `sal`, `dname` 
FROM (
    SELECT 
        `ename`, `sal`, `dno`,
        RANK() OVER (PARTITION BY `dno` ORDER BY `sal` DESC) AS `rank`
    FROM `tb_emp`
) AS `temp` 
NATURAL JOIN `tb_dept` 
WHERE `rank` <= 2;

The core here isPARTITION BY, it will divide the data into multiple independent windows according to department numbers, and then sort them separately within each window. finally passWHERE rank <= 2Filter out the Top 2 for each department.


Summary

  • JSON type gives MySQL the ability to process semi-structured data. It is especially suitable for scenarios such as storing dynamic attributes, multi-channel information, user tags, etc. It can be efficiently queried when combined with built-in functions.
  • Window function greatly simplifies analytical queries such as ranking, group sorting, and cumulative statistics. The code is clear and the performance is better than traditional variable or subquery solutions.

These two features are essential skills in the MySQL 8.0 era, and making good use of them can significantly improve development efficiency and query performance.