2024-07-12
-- Create the user_activity table
DROP TABLE IF EXISTS user_activity;
CREATE TABLE user_activity (
    user_id STRING,
    activity_start TIMESTAMP,
    activity_end TIMESTAMP
);
-- Insert data
INSERT INTO user_activity VALUES
('user1', '2024-07-11 08:00:00', '2024-07-11 09:00:00'),
('user2', '2024-07-11 08:30:00', '2024-07-11 09:30:00'),
('user3', '2024-07-11 09:00:00', '2024-07-11 10:00:00'),
('user4', '2024-07-11 09:15:00', '2024-07-11 09:45:00'),
('user5', '2024-07-11 09:30:00', '2024-07-11 10:30:00'),
('user6', '2024-07-11 10:00:00', '2024-07-11 11:00:00'),
('user7', '2024-07-11 08:05:00', '2024-07-11 08:55:00'),
('user8', '2024-07-11 08:45:00', '2024-07-11 09:15:00'),
('user9', '2024-07-11 09:05:00', '2024-07-11 10:05:00'),
('user10', '2024-07-11 09:25:00', '2024-07-11 10:25:00'),
('user11', '2024-07-11 08:10:00', '2024-07-11 09:10:00'),
('user12', '2024-07-11 08:20:00', '2024-07-11 09:20:00'),
('user13', '2024-07-11 08:35:00', '2024-07-11 09:35:00'),
('user14', '2024-07-11 08:50:00', '2024-07-11 09:50:00'),
('user15', '2024-07-11 09:10:00', '2024-07-11 10:10:00'),
('user16', '2024-07-11 09:20:00', '2024-07-11 10:20:00'),
('user17', '2024-07-11 09:40:00', '2024-07-11 10:40:00'),
('user18', '2024-07-11 10:05:00', '2024-07-11 11:05:00'),
('user19', '2024-07-11 10:15:00', '2024-07-11 11:15:00'),
('user20', '2024-07-11 10:25:00', '2024-07-11 11:25:00');
Calculate the peak number of users online in a system at the same time, for each time point (each hour, in the example).
Example results:
activity_time | max_users |
---|---|
2024-07-11 08 | 8 |
2024-07-11 09 | 9 |
… | … |
Results are sorted by activity_time in ascending order, where activity_time is the time point of the statistics (truncated to the hour) and max_users is the peak number of people online at that time point.
select
    date_format(activity_time, 'yyyy-MM-dd HH') activity_time,
    max(total_users) max_users
from
    (select
        activity_time,
        sum(flag) over(order by activity_time) total_users
    from
        (select
            activity_start activity_time,
            1 flag
        from
            user_activity
        union all
        select
            activity_end activity_time,
            -1 flag
        from
            user_activity) t1
    ) t2
group by
    date_format(activity_time, 'yyyy-MM-dd HH')
-- sort ascending, as the problem statement requires
order by
    activity_time;
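The output is as follows:
activity_time | max_users |
---|---|
2024-07-11 08 | 8 |
2024-07-11 09 | 9 |
2024-07-11 10 | 7 |
2024-07-11 11 | 3 |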
The core of this problem is the logic in subquery t2:
select
    activity_time,
    sum(flag) over(order by activity_time) total_users
from
    (select
        activity_start activity_time,
        1 flag
    from
        user_activity
    union all
    select
        activity_end activity_time,
        -1 flag
    from
        user_activity) t1;
First, subquery t1 turns the two time columns into rows. Why do this? It makes the counting easier.
When a user logs in, the number of people online goes up by 1; conversely, when a user logs out, it goes down by 1.
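To make the unpivot concrete, the sketch below runs the same two-branch union all for a single user. The where filter on user1 is added only for illustration and is not part of the original query; the syntax assumes Hive or Spark SQL, which is what the STRING type and the date_format pattern in this post suggest.

```sql
-- Illustration only: restrict t1 to one user to see its shape.
select
    activity_start as activity_time,  -- login time
    1 as flag                         -- +1: one more person online
from
    user_activity
where
    user_id = 'user1'
union all
select
    activity_end as activity_time,    -- logout time
    -1 as flag                        -- -1: one fewer person online
from
    user_activity
where
    user_id = 'user1';
```

With the sample data, user1 contributes two rows: 2024-07-11 08:00:00 with flag 1 and 2024-07-11 09:00:00 with flag -1.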
Once the login and logout times are in the same column and sorted by time, a running sum of the flags gives the number of people online at each moment. This is what subquery t2 does: it accumulates the flags with a window function.
The result of t2 is as follows:
2024-07-11 08:00:00 1
2024-07-11 08:05:00 2
2024-07-11 08:10:00 3
2024-07-11 08:20:00 4
2024-07-11 08:30:00 5
2024-07-11 08:35:00 6
2024-07-11 08:45:00 7
2024-07-11 08:50:00 8
2024-07-11 08:55:00 7
2024-07-11 09:00:00 7
2024-07-11 09:00:00 7
2024-07-11 09:05:00 8
2024-07-11 09:10:00 8
2024-07-11 09:10:00 8
2024-07-11 09:15:00 8
2024-07-11 09:15:00 8
2024-07-11 09:20:00 8
2024-07-11 09:20:00 8
2024-07-11 09:25:00 9
2024-07-11 09:30:00 9
2024-07-11 09:30:00 9
2024-07-11 09:35:00 8
2024-07-11 09:40:00 9
2024-07-11 09:45:00 8
2024-07-11 09:50:00 7
2024-07-11 10:00:00 7
2024-07-11 10:00:00 7
2024-07-11 10:05:00 7
2024-07-11 10:05:00 7
2024-07-11 10:10:00 6
2024-07-11 10:15:00 7
2024-07-11 10:20:00 6
2024-07-11 10:25:00 6
2024-07-11 10:25:00 6
2024-07-11 10:30:00 5
2024-07-11 10:40:00 4
2024-07-11 11:00:00 3
2024-07-11 11:05:00 2
2024-07-11 11:15:00 1
2024-07-11 11:25:00 0
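Several timestamps in the listing appear twice (09:00:00, 09:10:00, and so on) with the same total on both rows. That follows from the window frame: when a window has an order by but no explicit frame, Hive and Spark SQL default to range between unbounded preceding and current row, so rows that share a timestamp are peers and each one sees the sum over all of them. The sketch below only spells that default out; assuming one of those engines, it should return the same rows as t2.

```sql
-- Same logic as t2, with the default window frame written explicitly.
select
    activity_time,
    sum(flag) over (
        order by activity_time
        -- peers (equal activity_time) are summed together
        range between unbounded preceding and current row
    ) as total_users
from
    (select
        activity_start as activity_time,
        1 as flag
    from
        user_activity
    union all
    select
        activity_end as activity_time,
        -1 as flag
    from
        user_activity) t1;
```

With a rows frame instead, a login and a logout at the same instant would be added one at a time in an unspecified order, so the running total could briefly read one higher than the true count.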
Finally, group by time point and use the max function to find the peak number of people online at each time point. Done~
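The three steps can also be laid out as common table expressions, so that the query reads in the same order as the explanation. This is only a restructuring sketch, assuming an engine with WITH support such as Hive or Spark SQL; the CTE names events and running_totals are illustrative and do not come from the original query.

```sql
with events as (
    -- step 1: one row per login (+1) and one row per logout (-1)
    select activity_start as activity_time, 1 as flag from user_activity
    union all
    select activity_end as activity_time, -1 as flag from user_activity
),
running_totals as (
    -- step 2: running sum of the flags = people online at each event time
    select
        activity_time,
        sum(flag) over (order by activity_time) as total_users
    from
        events
)
-- step 3: peak per hour, sorted ascending
select
    date_format(activity_time, 'yyyy-MM-dd HH') as activity_time,
    max(total_users) as max_users
from
    running_totals
group by
    date_format(activity_time, 'yyyy-MM-dd HH')
order by
    activity_time;
```

One thing to check against real data: an hour shows up in the result only if at least one login or logout falls inside it, so a quiet hour in which users simply stay online would be missing from the output.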