需求描述:登录表 tb_user_logs(user_id int, login_time datetime),求3天内连续登录的用户ID。
| user_id | dt |
| 1001 | 2023-06-10 |
| 1001 | 2023-06-11 |
| 1001 | 2023-06-15 |
| 1001 | 2023-06-16 |
| 1001 | 2023-06-17 |
步骤1: 给数据进行排序
| user_id | dt | NO |
| 1001 | 2023-06-10 | 1 |
| 1001 | 2023-06-11 | 2 |
| 1001 | 2023-06-15 | 3 |
| 1001 | 2023-06-16 | 4 |
| 1001 | 2023-06-17 | 5 |
步骤2: 用 “dt-NO”
| user_id | dt | NO | td-NO |
| 1001 | 2023-06-10 | 1 | 2023-06-09 |
| 1001 | 2023-06-11 | 2 | 2023-06-09 |
| 1001 | 2023-06-15 | 3 | 2023-06-12 |
| 1001 | 2023-06-16 | 4 | 2023-06-12 |
| 1001 | 2023-06-17 | 5 | 2023-06-12 |
步骤3:对 “dt-NO” 进行分组,同一组的日期一定是连续的。
整理思路:对每个用户,按照日期从小到大排序,对数据进行连续编号,计算日期与编号的差值,按照差值进行分组操作,同一组内的日期一定是连续的,只要组内数据个数N>=3即可。
编号:窗口函数row_number()支持对行进行编号,编号可以是全局的,也可以分组编号,但一定要提供排序的规则。
语法:
row_number() over (partition by user_id order by dt)
解释:对分组进行编号,artition by user_id:对 user_id 进行分组,order by dt:对 dt 进行排序
实战:
步骤1:ID去重(一个用户1天只统计1次)
select distinct user_id,date(login_time) from tb_user_logs;
步骤2:分组编号
select *,row_number() over (partition by user_id order by dt)
form (select distinct user_id,date(login_time) from tb_user_logs) a;

步骤3:计算差值
select *, date_sub(dt, INTERVAL xrank DAY) as delta
form (
select *,row_number() over (partition by user_id order by dt)
form (
select distinct user_id,date(login_time) from tb_user_logs
) a
)

步骤4:找出>=3的组
select user_id,min(dt) as start_date,count(*) as days
form (
select *, date_sub(dt, INTERVAL xrank DAY) as delta
form (
select *,row_number() over (partition by user_id order by dt)
form (
select distinct user_id,date(login_time) from tb_user_logs
) a
) b
) c
group by
user_id, delta
having count(*) >=3;