窗函数 (SQL)在SQL中,窗函数(window function)或分析函数(analytic function)[1]是一个函数,它使用来自一行或多行的值来为每一行返回一个值。 与之形成对比,聚合函数为多行返回单个值。窗口函数有一个 例子1例如,这里有一个查询,它使用一个窗口函数来比较每个员工的工资与其部门的平均工资(来自PostgreSQL文档的示例):[3] SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
输出: depname | empno | salary | avg ----------+-------+--------+---------------------- develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.6666666666666667 sales | 1 | 5000 | 4866.6666666666666667 sales | 4 | 4800 | 4866.6666666666666667 (10 rows)
窗口函数在聚合之后进行评估(例如,在 语法根据PostgreSQL文档,窗函数具有下列之一的语法:[4] function_name ([expression [, expression ... ]]) OVER window_name
function_name ([expression [, expression ... ]]) OVER ( window_definition )
function_name ( * ) OVER window_name
function_name ( * ) OVER ( window_definition )
其中
其中 [ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]
注:
例子2窗函数允许访问恰在当前记录之前和之后的记录的数据。[5][6][7][8] 一个窗函数定义当前行周围具有给定行数的帧或窗,并跨窗对数据集执行计算。[9][10] NAME | ------------ Aaron| <-- Preceding (unbounded) Andrew| Amelia| James| Jill| Johnny| <-- 1st preceding row Michael| <-- Current row Nick| <-- 1st following row Ophelia| Zach| <-- Following (unbounded) 在上表中,下一个查询为每一行提取具有前一行和后一行的窗口的值w: SELECT
LAG(name, 1)
OVER(ORDER BY name) "prev",
name,
LEAD(name, 1)
OVER(ORDER BY name) "next"
FROM people
ORDER BY name
查询结果为: | PREV | NAME | NEXT | |----------|----------|----------| | (null)| Aaron| Andrew| | Aaron| Andrew| Amelia| | Andrew| Amelia| James| | Amelia| James| Jill| | James| Jill| Johnny| | Jill| Johnny| Michael| | Johnny| Michael| Nick| | Michael| Nick| Ophelia| | Nick| Ophelia| Zach| | Ophelia| Zach| (null)| 例子3# 首先创建虚拟的业务员销售数据
CREATE TABLE Sales
(
idate date,
iname char(2),
sales int
);
# 向表中插入数据
INSERT INTO Sales VALUES
('2021/1/1', '丁一', 200),
('2021/2/1', '丁一', 180),
('2021/2/1', '李四', 100),
('2021/3/1', '李四', 150),
('2021/2/1', '刘猛', 180),
('2021/3/1', '刘猛', 150),
('2021/1/1', '王二', 200),
('2021/2/1', '王二', 180),
('2021/3/1', '王二', 300),
('2021/1/1', '张三', 300),
('2021/2/1', '张三', 280),
('2021/3/1', '张三', 280);
# 查询各月中销售业绩最差的业务员
SELECT * FROM
(SELECT month(idate),iname,sales,
ROW_NUMBER() OVER(PARTITION BY month(idate) ORDER BY sales) as sales_order
FROM Sales) as t
WHERE sales_order=1;
例子4求用户连续登录天数
# 首先创建虚拟的用户登录表,并插入数据
create table user_login
(
user_id varchar(100),
login_time datetime
);
insert into user_login values
(1,'2020-11-25 13:21:12'),
(1,'2020-11-24 13:15:22'),
(1,'2020-11-24 10:30:15'),
(1,'2020-11-24 09:18:27'),
(1,'2020-11-23 07:43:54'),
(1,'2020-11-10 09:48:36'),
(1,'2020-11-09 03:30:22'),
(1,'2020-11-01 15:28:29'),
(1,'2020-10-31 09:37:45'),
(2,'2020-11-25 13:54:40'),
(2,'2020-11-24 13:22:32'),
(2,'2020-11-23 10:55:52'),
(2,'2020-11-22 06:30:09'),
(2,'2020-11-21 08:33:15'),
(2,'2020-11-20 05:38:18'),
(2,'2020-11-19 09:21:42'),
(2,'2020-11-02 00:19:38'),
(2,'2020-11-01 09:03:11'),
(2,'2020-10-31 07:44:55'),
(2,'2020-10-30 08:56:33'),
(2,'2020-10-29 09:30:28');
# 第一种情况:查看每位用户连续登陆的情况。包括每位用户连续登录的情况、查看每位用户最大连续登录的天数、查看在某个时间段里连续登录天数超过N天的用户
select user_id,
min(login_date) start_date,
max(login_date) end_date,
count(login_date) days # 计算每位用户连续登录天数
from (select *,date_sub(login_date, interval irank day) idate # 增加辅助列,以判断用户是否连续登录
from (select *,rank() over(partition by user_id order by login_date) irank # 对每个用户的登录日期排序
from (select distinct user_id, date(login_time) login_date from user_login
# 数据预处理:由于统计的窗口期是天数,对登录时间字段格式转换为日期然后去重(去掉用户一天内多次登录的情况)
) as a
) as b
) as c
group by user_id,idate;
# 第二种情况:计算每个用户最大连续登录天数
# 第三种情况:查看给定时间段内连续登录天数≥5天的用户
# 找出相差天数为5的记录
select distinct user_id
from (select *,datediff(idate5,login_date)+1 as days # 计算当前登录日期与之后第4次登陆的日期的差值
from (select *,lead(login_date,4) over(partition by user_id order by login_date) idate5 # 获取当前登录日期之后第4次登陆的日期
from user_logrin_date)
as a)
as b
where days = 5;
历史SQL:2003引入了窗函数,其后的标准扩展了其功能。[11] MySQL从8.0开始引入了窗函数。目前支持的语法为: # 开窗函数语法
func_name(<parameter>) OVER([PARTITION BY <part_by_condition>] [ORDER BY <order_by_list> ASC|DESC])
参考文献
|