Hive: using row_number and other window (analytic) functions

I. Ranking and deduplication

row_number() over(partition by col1 order by col2) as rn

Result: 1,2,3,4,5 (consecutive numbers, even when rows tie on col2)

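rank() over(partition by col1 order by col2) as rk

Result: 1,2,2,4,5 (tied rows share a rank, and the next rank is skipped)

dense_rank() over(partition by col1 order by col2) as ds_rk

Result: 1,2,2,3,4 (tied rows share a rank, and the next rank is not skipped)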

The HQL statement:

select
    order_id,
    departure_date,
    row_number() over(partition by order_id order by departure_date) as rn,   -- consecutive numbering
    rank() over(partition by order_id order by departure_date) as rk,         -- ties share a rank; the next rank is skipped
    dense_rank() over(partition by order_id order by departure_date) as d_rk  -- ties share a rank; the next rank is not skipped
from ord_test
where order_id=410341346;

[Figure: output of the query above]
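
The heading mentions deduplication, which is usually done by filtering on row_number(). The following is a minimal sketch, assuming the same ord_test table and that "keep the earliest departure_date per order_id" is the rule you want; the alias t and the rn = 1 filter are illustrative choices, not something taken from the original query.

```sql
-- Keep exactly one row per order_id: the one with the earliest departure_date.
-- ord_test, order_id and departure_date come from the query above;
-- the subquery alias t is only needed because Hive requires one.
select
    order_id,
    departure_date
from (
    select
        order_id,
        departure_date,
        row_number() over(partition by order_id order by departure_date) as rn
    from ord_test
) t
where rn = 1;
```

row_number() fits here because it never repeats a number inside a partition, so rn = 1 matches one row per order_id. With rank() or dense_rank(), tied rows would all get 1 and the duplicates would survive. If several rows tie on departure_date, which one is kept is arbitrary unless you add a tie-breaking column to the order by.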

II. Accessing values from other rows

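lag(col1,n,DEFAULT) over(partition by col1 order by col2) as up
Returns the value from the n-th row above the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value, used when there is no n-th row above and the result would otherwise be NULL (if omitted, NULL is returned).

lead(col1,n,DEFAULT) over(partition by col1 order by col2) as down
Returns the value from the n-th row below the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value, used when there is no n-th row below and the result would otherwise be NULL (if omitted, NULL is returned).

first_value(col1) over(partition by col1 order by col2) as fv
After sorting within the partition, returns the first value up to the current row.

last_value(col1) over(partition by col1 order by col2) as lv
After sorting within the partition, returns the last value up to the current row.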

select
    order_id,
    departure_date,
    first_value(departure_date) over(partition by order_id order by add_time) as fv, -- first value in the partition
    last_value(departure_date) over(partition by order_id order by add_time) as lv   -- last value up to the current row, not the last row of the partition
from ord_test
where order_id=410341346;
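
Because an order by without an explicit frame makes the window default to "range between unbounded preceding and current row", last_value() in the query above stops at the current row. If what you actually want is the last value of the whole partition, one option is to spell out the frame. This is a sketch against the same ord_test table; the column aliases lv_partition and lv_via_desc are made up for illustration.

```sql
select
    order_id,
    departure_date,
    -- Explicit frame covering the whole partition, so every row sees
    -- the departure_date of the row with the largest add_time.
    last_value(departure_date) over(
        partition by order_id
        order by add_time
        rows between unbounded preceding and unbounded following
    ) as lv_partition,
    -- Alternative: reverse the sort and take the first value instead.
    first_value(departure_date) over(
        partition by order_id
        order by add_time desc
    ) as lv_via_desc
from ord_test
where order_id=410341346;
```

Both columns should agree as long as add_time is unique within an order; if rows tie on add_time, the row that counts as "last" is not well defined and the two expressions may pick different ones.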

select
    order_id,
    departure_date,
    lead(departure_date,1) over(partition by order_id order by departure_date) as down_1, -- value from the next row
    lag(departure_date,1) over(partition by order_id order by departure_date) as up_1     -- value from the previous row
from ord_test
where order_id=410341346;

[Figure: query output]
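
The lag/lead query above leaves out the third argument, so the boundary rows come back as NULL. To see the offset and the default value in isolation, here is a small self-contained sketch. The demo data (id, amount) is entirely made up, and it assumes a Hive version that accepts select without from and union all inside a CTE.

```sql
-- Hypothetical three-row data set built inline, so no table is needed.
with demo as (
    select 1 as id, 10 as amount
    union all
    select 2 as id, 30 as amount
    union all
    select 3 as id, 60 as amount
)
select
    id,
    amount,
    lag(amount, 1, 0) over(order by id) as prev_amount,   -- 0 when there is no previous row
    lead(amount, 1, 0) over(order by id) as next_amount,  -- 0 when there is no next row
    lag(amount, 2) over(order by id) as prev2_amount      -- no default given: NULL when out of range
from demo;
-- id  amount  prev_amount  next_amount  prev2_amount
-- 1   10      0            30           NULL
-- 2   30      10           60           NULL
-- 3   60      30           0            10
```

Supplying a default such as 0 is convenient when the result feeds straight into arithmetic, for example amount - prev_amount, since a NULL would otherwise make the whole expression NULL.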

mhf
