MySQL8.0新特性-CTE,generated column,window function

2020-09-02

MySQL8.0新特性-CTE,generated column,window function

CTE-公用表表达式

CTE包括递归和非递归两种

公用表达式允许使用命令的临时结果集，这是通过允许在SELECT语句和某些其他语句前面使用WITH子句来实现的

为什么需要CTE
不能在同一查询中两次引用派生表，因为那样的话，查询会根据派生表的引用次数计算两次或多次，这会引发严重的性能问题。使用CTE后，子查询只会计算一次

非递归CTE

公用表表达式（ CTE）与派生表类似，但它的声明会放在查询块之前，而不是FROM
子句中。
派生表
SELECT ... FROM (subquery) AS derived, tl
CTE
SELECT ... WI TH derived AS (subquery) SELECT ... FROM deived, tl

派生查询不能引用其他派生查询：
SELECT
FROM (SELECT . . . FROM . . . ) AS dl, (SELECT . . . FROM dl . .. ) AS d2
ERROR: 1146 (42S02): Table 、db. dl’ doesn’ t exist

CTE可以引用其他CTE:
WITH dl AS (SELECT ... FROM . . . ) , d2 AS (SELECT . . . FROM dl . .. )
SELECT
FROM d1,d2...

递归CTE
是一种特殊的CTE，其子查询会引用自己的名字。 WITH子句必须以WITH RECURSIVE开头。递归CTE子查询包括两部分：seed查询和recursive查询，由UNION[ALL]或UNIONDISTINCT分隔

mysql> with recursive cte(n) as ( select 1 union all select n + 1 from cte where n<5) select * from cte;
+------+
| n |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

假设要执行分层数据遍历，以便为每个员工生成一个组织结构图（即从CEO到每个员工的路径），也可以使用递归CTE!
mysql> create table employees_mgr(
-> id int primary key not null,
-> name varchar(100) not null,
-> manager_id int null,
-> index(manager_id),
-> foreign key(manager_id) references employees_mgr(id));
Query OK, 0 rows affected (0.05 sec)

mysql> insert into employees_mgr values
-> (333,"Yasmina",NULL),
-> (198,"john",333),
-> (692,"Tarek",333),
-> (29,"Pedro",198),
-> (4610,"Sarah",29),
-> (72,"Pierre",29),
-> (123,"Adil",692);
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0

mysql> select * from employees_mgr;
+------+---------+------------+
| id | name | manager_id |
+------+---------+------------+
| 29 | Pedro | 198 |
| 72 | Pierre | 29 |
| 123 | Adil | 692 |
| 198 | john | 333 |
| 333 | Yasmina | NULL |
| 692 | Tarek | 333 |
| 4610 | Sarah | 29 |
+------+---------+------------+
7 rows in set (0.00 sec)

mysql> with recursive employee_paths(id,name,path) as
-> (
-> select id,name,cast(id as char(200))
-> from employees_mgr
-> where manager_id is null
-> union all
-> select e.id,e.name,concat(ep.path,',',e.id)
-> from employee_paths as ep join employees_mgr as e
-> on ep.id=e.manager_id) select * from employee_paths order by path;
+------+---------+-----------------+
| id | name | path |
+------+---------+-----------------+
| 333 | Yasmina | 333 |
| 198 | john | 333,198 |
| 29 | Pedro | 333,198,29 |
| 4610 | Sarah | 333,198,29,4610 |
| 72 | Pierre | 333,198,29,72 |
| 692 | Tarek | 333,692 |
| 123 | Adil | 333,692,123 |
+------+---------+-----------------+
7 rows in set (0.00 sec)

mysql> explain with recursive employee_paths(id,name,path) as
-> (
-> select id,name,cast(id as char(200))
-> from employees_mgr
-> where manager_id is null
-> union all
-> select e.id,e.name,concat(ep.path,',',e.id)
-> from employee_paths as ep join employees_mgr as e
-> on ep.id=e.manager_id) select * from employee_paths order by path;
+----+-------------+---------------+------------+------+---------------+------------+---------+-------+------+----------+------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------------+---------+-------+------+----------+------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using filesort |
| 2 | DERIVED | employees_mgr | NULL | ref | manager_id | manager_id | 5 | const | 1 | 100.00 | Using index condition |
| 3 | UNION | ep | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where |
| 3 | UNION | e | NULL | ref | manager_id | manager_id | 5 | ep.id | 1 | 100.00 | NULL |
+----+-------------+---------------+------------+------+---------------+------------+---------+-------+------+----------+------------------------+
4 rows in set, 1 warning (0.00 sec)

生成列(generated column):

应用场景：
1. 创建表时
2. 已有表时

生成列（generated column）的值是根据列定义中包含的表达式计算得出的。生成列包含下面两种类型：

virtual（虚拟）：当从表中读取记录时，将动态计算该列。
stored（存储）：当向表中写入新记录时，将计算该列并将其存储为常规列。
virtual生成列比stored生成列更有用，因为一个虚拟的列不占用任何存储空间。你可以使用触发器模拟stored生成列的行为。

1. 创建表时
假设你的应用程序从t_employees_generated表中检索数据时，使用full_name表示concat(first_name, ' ', last_name)，而不是使用表达式来表示，从而实现虚拟列实时计算full_name。

# 创建测试表
mysql> create table if not exists t_employees_generated (
emp_no int(11) not null,
birth_date date not null,
first_name varchar(14) not null,
last_name varchar(16) not null,
gender enum('M', 'F') not null,
hire_date date not null,
full_name varchar(30) as (concat(first_name, ' ', last_name)),
primary key (emp_no),
key name (first_name, last_name)
) engine=innodb default charset=utf8mb4;
Query OK, 0 rows affected, 1 warning (0.04 sec)

请注意，应该根据虚拟列修改插入语句。你可以这样使用full insert：

# 执行插入
mysql> insert into t_employees_generated (emp_no, birth_date, first_name, last_name, gender, hire_date) values (123456, '1987-10-02', 'ABC', 'XYZ', 'F', '2008-07-28');
Query OK, 1 row affected (0.11 sec)

# 验证数据
mysql> select * from t_employees_generated where emp_no = '123456';
+--------+------------+------------+-----------+--------+------------+-----------+
| emp_no | birth_date | first_name | last_name | gender | hire_date | full_name |
+--------+------------+------------+-----------+--------+------------+-----------+
| 123456 | 1987-10-02 | ABC | XYZ | F | 2008-07-28 | ABC XYZ |
+--------+------------+------------+-----------+--------+------------+-----------+
1 row in set (0.00 sec)

如果要在INSERT语句中包含full_name，就只能将其指定为DEFAULT

# 执行插入
mysql> insert into t_employees_generated (emp_no, birth_date, first_name, last_name, gender, hire_date, full_name) values (123457, '1987-10-02', 'ABC', 'XYZ', 'F', '2008-07-28', DEFAULT);
Query OK, 1 row affected (0.01 sec)

# 验证数据
mysql> select * from t_employees_generated where emp_no = '123457';
+--------+------------+------------+-----------+--------+------------+-----------+
| emp_no | birth_date | first_name | last_name | gender | hire_date | full_name |
+--------+------------+------------+-----------+--------+------------+-----------+
| 123457 | 1987-10-02 | ABC | XYZ | F | 2008-07-28 | ABC XYZ |
+--------+------------+------------+-----------+--------+------------+-----------+
1 row in set (0.00 sec)

其他值都会引发ERROR 3105 (HY000):错误，不允许在t_employees_generated表中为生成的列full_name指定值：

mysql> insert into t_employees_generated (emp_no, birth_date, first_name, last_name, gender, hire_date, full_name) values (123458, '1987-10-02', 'ABC', 'XYZ', 'F', '2008-07-28', 'TEST');
ERROR 3105 (HY000): The value specified for generated column 'full_name' in table 't_employees_generated' is not allowed.

2. 已有表时
如果你已经创建了表并希望添加新的生成列，请执行ALTER TABLE语句
# 修改表结构 - 新增生成列
mysql> alter table t_employees_generated add hire_date_year year as (year(hire_date)) virtual;
Query OK, 0 rows affected (0.12 sec)
Records: 0 Duplicates: 0 Warnings: 0

# 验证数据
mysql> select * from t_employees_generated where emp_no = '123456';
+--------+------------+------------+-----------+--------+------------+-----------+----------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date | full_name | hire_date_year |
+--------+------------+------------+-----------+--------+------------+-----------+----------------+
| 123456 | 1987-10-02 | ABC | XYZ | F | 2008-07-28 | ABC XYZ | 2008 |
+--------+------------+------------+-----------+--------+------------+-----------+----------------+
1 row in set (0.00 sec)

# 查看表结构
mysql> desc employees.t_employees_generated;
+----------------+---------------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------------------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | MUL | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
| full_name | varchar(30) | YES | | NULL | VIRTUAL GENERATED |
| hire_date_year | year(4) | YES | | NULL | VIRTUAL GENERATED |
+----------------+---------------+------+-----+---------+-------------------+
8 rows in set (0.00 sec)

窗口函数:
对于查询中的每一行，可以使用窗口函数，利用与该行相关的行执行计算。这是通过使用OVER和WINDOW子句来完成的。
以下是可以执行计算的函数。

ROW_NUMBER()：分区内当前行的编号。
RANK()：分区中当前行的等级（有间隔）。
DENSE_RANK()：分区内当前行的等级（无间隔）。
PERCENT_RANK()：百分比排名值。
FIRST_VALUE()：窗口帧中第一行的参数值。
LAST_VALUE()：窗口帧中最末行的参数值。
LEAD()：领先于分区内当前行的那一行的参数值。
LAG()：落后于分区内当前行的那一行的参数值。
NTH_VALUE()：窗口帧中的第 n 行的参数值。
NTILE()：分区内当前行的桶的编号。
COME_DIST()：累积分布值。

准备工作
为了使示例有效，请先添加hire_date_year、full_name虚拟列
参考：MySQL - MySQL 8.0进阶操作：生成列（generated column）

# 新增虚拟列
mysql> alter table employees.employees add hire_date_year year as (year(hire_date)) virtual;
Query OK, 0 rows affected (0.21 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table employees.employees add full_name varchar(30) as (concat(first_name, ' ', last_name)) virtual;
Query OK, 0 rows affected (0.12 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> desc employees.employees;
+----------------+---------------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------------------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
| hire_date_year | year(4) | YES | | NULL | VIRTUAL GENERATED |
| full_name | varchar(30) | YES | | NULL | VIRTUAL GENERATED |
+----------------+---------------+------+-----+---------+-------------------+
8 rows in set (0.00 sec)

分割结果
可以在OVER子句中分割结果。假设你想、找出每年的工资排列情况

partition by：分组
order by：排序

mysql> select full_name, gender,row_number()over(partition by gender order by hire_date_year asc) rn from employees.employees limit 3;
+-------------------+--------+----+
| full_name | gender | rn |
+-------------------+--------+----+
| Mizuhito Yamaashi | M | 1 |
| Moon Muhlberg | M | 2 |
| Randi Tagansky | M | 3 |
+-------------------+--------+----+
3 rows in set (0.84 sec)

行号：
mysql> select concat(first_name," ",last_name) as full_name,salary,row_number() over (order by salary desc) as 'Rank' from employees join salaries on salaries.emp_no=employees.emp_no limit 10;
+-------------------+--------+------+
| full_name | salary | Rank |
+-------------------+--------+------+
| Tokuyasu Pesch | 158220 | 1 |
| Tokuyasu Pesch | 157821 | 2 |
| Honesty Mukaidono | 156286 | 3 |
| Xiahua Whitcomb | 155709 | 4 |
| Sanjai Luders | 155513 | 5 |
| Tsutomu Alameldin | 155377 | 6 |
| Tsutomu Alameldin | 155190 | 7 |
| Tsutomu Alameldin | 154888 | 8 |
| Tsutomu Alameldin | 154885 | 9 |
| Willard Baca | 154459 | 10 |
+-------------------+--------+------+
10 rows in set (9.96 sec)

命名窗口
可以根据需要对一个窗口命名，并多次使用它，元须每次都重新定义：
mysql> select hire_date_year,salary,rank() over w as 'Rank' from employees join salaries on salaries.emp_no=employees.emp_no window w as (partition by hire_date_year order by salary desc) order by salary desc limit 10;
+----------------+--------+------+
| hire_date_year | salary | Rank |
+----------------+--------+------+
| 1985 | 158220 | 1 |
| 1985 | 157821 | 2 |
| 1986 | 156286 | 1 |
| 1985 | 155709 | 3 |
| 1987 | 155513 | 1 |
| 1985 | 155377 | 4 |
| 1985 | 155190 | 5 |
| 1985 | 154888 | 6 |
| 1985 | 154885 | 7 |
| 1985 | 154459 | 8 |
+----------------+--------+------+
10 rows in set (10.27 sec)

第一个、最后一个和第n个值
你可以选择窗口结果中的第一个、最后一个和第n个值。如果该行不存在，则返回NULL。
假设你想从窗口中找到第一个、最后一个和第3个值，代码如下：
mysql> select hire_date_year,salary,rank() over w as 'Rank',
-> first_value(salary) over w as 'first',
-> NTH_VALUE(salary,3) over w as 'third',
-> LAST_value(salary) over w as 'last'
-> from employees join salaries on salaries.emp_no=employees.emp_no window w as (partition by hire_date_year order by salary desc) order by salary desc limit 10;
+----------------+--------+------+--------+--------+--------+
| hire_date_year | salary | Rank | first | third | last |
+----------------+--------+------+--------+--------+--------+
| 1985 | 158220 | 1 | 158220 | NULL | 158220 |
| 1985 | 157821 | 2 | 158220 | NULL | 157821 |
| 1986 | 156286 | 1 | 156286 | NULL | 156286 |
| 1985 | 155709 | 3 | 158220 | 155709 | 155709 |
| 1987 | 155513 | 1 | 155513 | NULL | 155513 |
| 1985 | 155377 | 4 | 158220 | 155709 | 155377 |
| 1985 | 155190 | 5 | 158220 | 155709 | 155190 |
| 1985 | 154888 | 6 | 158220 | 155709 | 154888 |
| 1985 | 154885 | 7 | 158220 | 155709 | 154885 |
| 1985 | 154459 | 8 | 158220 | 155709 | 154459 |
+----------------+--------+------+--------+--------+--------+
10 rows in set (19.89 sec)