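A job suddenly started failing today, and the failure occurred when it reached Stage-7. I was sure data skew was the cause, but I could not pin down exactly where. The YARN job logs had nothing useful either.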
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
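A brief overview of Hive execution plans
An execution plan generally has two parts:
stage dependencies  the dependencies between the stages
stage plans  the execution plan of each stage
A stage is not necessarily a MapReduce job; it may also be a Fetch Operator or a Move Operator.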
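The dependency section can be read mechanically. Below is a minimal Python sketch, not part of Hive, that turns "Stage-N depends on stages: ..." lines into a map and walks it backwards from a failing stage. The function names `parse_dependencies` and `upstream` are made up for illustration, the sample lines are copied from the STAGE DEPENDENCIES output in this post, and treating the members of a "consists of" list as children of the conditional stage is my own assumption about how to read that line.

```python
import re

# Lines copied from the STAGE DEPENDENCIES output in this post.
SAMPLE = """\
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
Stage-37 depends on stages: Stage-3
Stage-28 depends on stages: Stage-37
Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
Stage-36 has a backup stage: Stage-5
Stage-26 depends on stages: Stage-36
Stage-35 depends on stages: Stage-5, Stage-26
Stage-7 depends on stages: Stage-35
"""


def parse_dependencies(text):
    """Map each stage to the list of stages it directly depends on."""
    deps = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Stage-\d+)(.*)", line)
        if not m:
            continue
        stage, rest = m.groups()
        deps.setdefault(stage, [])
        before, _, members = rest.partition("consists of")
        if "depends on stages:" in before:
            parents = before.split("depends on stages:", 1)[1]
            deps[stage].extend(re.findall(r"Stage-\d+", parents))
        # Assumption: members of a conditional stage run after that stage.
        for member in re.findall(r"Stage-\d+", members):
            deps.setdefault(member, []).append(stage)
    return deps


def upstream(deps, stage):
    """Return every stage that must run before `stage`, in discovery order."""
    seen, stack = [], [stage]
    while stack:
        for parent in deps.get(stack.pop(), []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(upstream(parse_dependencies(SAMPLE), "Stage-7"))
```

Walking upstream from Stage-7 this way lists the stages whose output feeds it, which narrows down which joins to inspect first.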
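The execution plan of a MapReduce stage has two parts:
Map Operator Tree  the plan for the map side
Reduce Operator Tree  the plan for the reduce side
Some common operators:
TableScan  reads data; common attribute: alias
Select Operator  projects columns and expressions
Group By Operator  grouping and aggregation; common attributes: aggregations, mode. When there is no keys attribute, there is only a single group.
Reduce Output Operator  emits results to the reducers; common attribute: sort order
Fetch Operator  the client fetches the data; common attribute: limit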
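To see this structure in a long plan, it can help to strip the attributes and keep only the operator names per stage and per tree. The following Python sketch does that. It is an illustration only: `operators_by_stage` is a hypothetical helper, the regular expressions are my guess at what the line shapes look like based on the dump in this post, and the sample is a shortened excerpt of the Stage-35 and Stage-7 plans with the attribute lines dropped.

```python
import re

# Shortened excerpt of the plan in this post (attribute lines removed).
SAMPLE_PLAN = """\
Stage: Stage-35
Map Reduce Local Work
Alias -> Map Local Operator Tree:
TableScan
Select Operator
HashTable Sink Operator
Stage: Stage-7
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
Reduce Output Operator
Reduce Operator Tree:
Select Operator
PTF Operator
Filter Operator
Select Operator
File Output Operator
"""

STAGE_RE = re.compile(r"^Stage: (Stage-\d+)$")
TREE_RE = re.compile(r"^(?:Alias -> )?(Map|Reduce|Map Local) Operator Tree:$")
OPERATOR_RE = re.compile(r"^(TableScan|[A-Za-z -]+ Operator)$")


def operators_by_stage(plan_text):
    """Return {stage: {tree_name: [operator, ...]}} for a STAGE PLANS dump."""
    result = {}
    stage = tree = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        m = STAGE_RE.match(line)
        if m:
            stage, tree = m.group(1), None
            result[stage] = {}
            continue
        m = TREE_RE.match(line)
        if m and stage:
            tree = m.group(1)
            result[stage][tree] = []
            continue
        if stage and OPERATOR_RE.match(line):
            # Operators seen before any tree header (Move, Conditional, ...)
            # are filed under the placeholder key "stage".
            result[stage].setdefault(tree or "stage", []).append(line)
    return result


if __name__ == "__main__":
    for name, trees in operators_by_stage(SAMPLE_PLAN).items():
        print(name, trees)
```

For Stage-7 this shows a map-side join followed by a shuffle, then a PTF (windowing) step on the reduce side.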
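Common attribute values and their meanings:
aggregations  used in the Group By Operator
count()  counting
mode  used in the Group By Operator
hash  hash-based partial aggregation, typically on the map side
mergepartial  merges partial aggregation results
final  final aggregation
sort order  used in the Reduce Output Operator
+  ascending
(empty)  no sorting
++  ascending on both columns, when there are two key columns
+-  first column ascending, second column descending, when there are two key columns
-  descending
and so on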
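The sort order string has one symbol per key expression, so it can be decoded by pairing the two up. A small Python sketch of that reading follows; `decode_sort_order` is a hypothetical helper, and the example values are the key expressions and sort order of the Reduce Output Operator in Stage-7 of the plan in this post.

```python
def decode_sort_order(sort_order, key_expressions):
    """Pair each key column with its direction from a `sort order:` string."""
    symbols = {"+": "ASC", "-": "DESC"}
    if len(sort_order) != len(key_expressions):
        raise ValueError("sort order needs one symbol per key expression")
    return [(key, symbols[sym]) for key, sym in zip(key_expressions, sort_order)]


if __name__ == "__main__":
    # Stage-7: key expressions _col1, _col4 with sort order "+-"
    print(decode_sort_order("+-", ["_col1", "_col4"]))
    # prints [('_col1', 'ASC'), ('_col4', 'DESC')]
```

That matches the windowing definition in the same stage, which partitions by _col1 and orders by _col4 DESC.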
In the end I ran EXPLAIN on the query to look at its execution plan:
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
Stage-37 depends on stages: Stage-3
Stage-28 depends on stages: Stage-37
Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
Stage-36 has a backup stage: Stage-5
Stage-26 depends on stages: Stage-36
Stage-35 depends on stages: Stage-5, Stage-26
Stage-7 depends on stages: Stage-35
Stage-13 depends on stages: Stage-7 , consists of Stage-10, Stage-9, Stage-11
Stage-10
Stage-0 depends on stages: Stage-10, Stage-9, Stage-12
Stage-8 depends on stages: Stage-0
Stage-9
Stage-11
Stage-12 depends on stages: Stage-11
Stage-5
Stage-32 is a root stage , consists of Stage-16
Stage-16
Stage-30 depends on stages: Stage-16 , consists of Stage-38, Stage-17
Stage-38 has a backup stage: Stage-17
Stage-29 depends on stages: Stage-38
Stage-17
Stage-34 is a root stage , consists of Stage-21
Stage-21
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: ddm_content_doc_info_logic_day
Statistics: Num rows: 502133848 Data size: 209584131934 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (((substr(docid, 9, 2)) IN ('04', '02', '03', '07', '00') or (substr(docid, 9, 4) = '9001')) and (topicid = substr(docid, 9, 8))) (type: boolean)
Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: topicid (type: string), docid (type: string), title (type: string), digest (type: string), from_unixtime(UDFToInteger((ptime / 1000))) (type: string), source (type: string), url (type: string), search (type: string), finearticle (type: string), category (type: string), quality (type: string), dkeys (type: string), interests (type: string), professional (type: string), body (type: string), buloid (type: string), ispic (type: string), iscomment (type: string), doc_del (type: int), channelid (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19
Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: int), _col19 (type: string)
TableScan
alias: ddm_content_netease_audit_doc_base_day
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: topicid (type: string), docid (type: string), title (type: string), publish_time (type: string), split(source, '&&')[0] (type: string), doc_url (type: string), regexp_replace(category, ',', '\') (type: string), quality (type: float), keywords (type: string), interests (type: string), picnum (type: tinyint), delstatus (type: tinyint), channel (type: string), doctype (type: tinyint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: float), _col8 (type: string), _col9 (type: string), _col10 (type: tinyint), _col11 (type: tinyint), _col12 (type: string), _col13 (type: tinyint)
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
keys:
0 _col1 (type: string)
1 _col1 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33
Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: COALESCE(_col0,_col20) (type: string), COALESCE(_col1,_col21) (type: string), COALESCE(_col2,_col22) (type: string), _col3 (type: string), COALESCE(_col23,_col4) (type: string), COALESCE(_col24,_col5) (type: string), COALESCE(_col25,_col6) (type: string), _col7 (type: string), _col8 (type: string), COALESCE(_col9,_col26) (type: string), COALESCE(_col10,_col27) (type: string), COALESCE(_col11,_col28) (type: string), COALESCE(_col12,_col29) (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col30 (type: tinyint), _col17 (type: string), COALESCE(_col18,_col31) (type: int), COALESCE(_col19,_col32) (type: string), _col33 (type: tinyint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: tinyint)
TableScan
alias: ddm_content_netease_article_day
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: string), title (type: string), description (type: string), publish_time (type: string), source (type: string), url (type: string), '40' (type: string), body (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), COALESCE(_col2,_col23) (type: string), COALESCE(_col3,_col24) (type: string), COALESCE(_col4,_col25) (type: string), COALESCE(_col5,_col26) (type: string), COALESCE(_col6,_col27) (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col21,_col28) (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string)
TableScan
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23
Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col23,_col21) (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-37
Map Reduce Local Work
Alias -> Map Local Tables:
t:b:ddm_content_original_column_d
Fetch Operator
limit: -1
t:d:ddm_content_local_city
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t:b:ddm_content_original_column_d
TableScan
alias: ddm_content_original_column_d
Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: topicid (type: string), colname (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
2 _col0 (type: string)
t:d:ddm_content_local_city
TableScan
alias: ddm_content_local_city
Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: local_city_id (type: string), local_city_name (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
2 _col0 (type: string)
Stage: Stage-28
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Left Outer Join0 to 1
Left Outer Join0 to 2
keys:
0 _col0 (type: string)
1 _col0 (type: string)
2 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25
Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Local Work:
Map Reduce Local Work
Stage: Stage-27
Conditional Operator
Stage: Stage-36
Map Reduce Local Work
Alias -> Map Local Tables:
t:$INTNAME1
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t:$INTNAME1
TableScan
HashTable Sink Operator
keys:
0 _col1 (type: string)
1 _col0 (type: string)
Stage: Stage-26
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Local Work:
Map Reduce Local Work
Stage: Stage-35
Map Reduce Local Work
Alias -> Map Local Tables:
t:e:ddm_content_channel_day
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t:e:ddm_content_channel_day
TableScan
alias: ddm_content_channel_day
Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: channelid (type: string), channelname (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col20 (type: string)
1 _col0 (type: string)
Stage: Stage-7
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col20 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string), _col4 (type: string)
sort order: +-
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), _col33 (type: string), _col34 (type: int), _col35 (type: int), _col36 (type: int), _col37 (type: string), _col38 (type: int), _col40 (type: string)
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Select Operator
expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: string), KEY.reducesinkkey1 (type: string), VALUE._col3 (type: string), VALUE._col4 (type: string), VALUE._col5 (type: string), VALUE._col6 (type: string), VALUE._col7 (type: string), VALUE._col8 (type: string), VALUE._col9 (type: string), VALUE._col10 (type: string), VALUE._col11 (type: string), VALUE._col12 (type: string), VALUE._col13 (type: string), VALUE._col14 (type: string), VALUE._col15 (type: tinyint), VALUE._col16 (type: string), VALUE._col17 (type: int), VALUE._col19 (type: string), VALUE._col20 (type: string), VALUE._col21 (type: string), VALUE._col23 (type: string), VALUE._col25 (type: string), VALUE._col26 (type: string), VALUE._col27 (type: string), VALUE._col28 (type: string), VALUE._col29 (type: string), VALUE._col30 (type: string), VALUE._col31 (type: string), VALUE._col32 (type: int), VALUE._col33 (type: int), VALUE._col34 (type: int), VALUE._col35 (type: string), VALUE._col36 (type: int), VALUE._col38 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
PTF Operator
Function definitions:
Input definition
input alias: ptf_0
output shape: _col0: string, _col1: string, _col2: string, _col3: string, _col4: string, _col5: string, _col6: string, _col7: string, _col8: string, _col9: string, _col10: string, _col11: string, _col12: string, _col13: string, _col14: string, _col15: string, _col16: string, _col17: tinyint, _col18: string, _col19: int, _col21: string, _col22: string, _col23: string, _col25: string, _col27: string, _col28: string, _col29: string, _col30: string, _col31: string, _col32: string, _col33: string, _col34: int, _col35: int, _col36: int, _col37: string, _col38: int, _col40: string
type: WINDOWING
Windowing table definition
input alias: ptf_1
name: windowingtablefunction
order by: _col4 DESC NULLS LAST
partition by: _col1
raw input shape:
window functions:
window function definition
alias: row_number_window_0
name: row_number
window function: GenericUDAFRowNumberEvaluator
window frame: PRECEDING(MAX)~FOLLOWING(MAX)
isPivotResult: true
Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (row_number_window_0 = 1) (type: boolean)
Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), COALESCE(_col3,_col28) (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), UDFToString(_col17) (type: string), _col18 (type: string), UDFToString(_col19) (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), UDFToString(_col34) (type: string), UDFToString(_col35) (type: string), UDFToString(_col36) (type: string), _col37 (type: string), UDFToString(_col38) (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN ('网易号') WHEN (((_col21) IN ('43', '45', '46', '49') or (substr(_col1, 9, 4) = '9001'))) THEN ('抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN ('本地') WHEN (_col22 is not null) THEN ('原创') WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN ('编辑') WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN (_col34) WHEN ((_col21 = '43')) THEN ('灰抓今日头条文章(无版权)') WHEN ((_col21 = '45')) THEN ('灰抓微信(无版权)') WHEN ((_col21 = '46')) THEN ('定向增量(无版权)') WHEN (((substr(_col1, 9, 4) = '9001') or (_col21 = '49'))) THEN ('段子抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN (_col25) WHEN (_col22 is not null) THEN (_col23) WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN (_col40) WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), _col21 (type: string), _col27 (type: string), _col33 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32
Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: portal.ddm_content_doc_day
Stage: Stage-13
Conditional Operator
Stage: Stage-10
Move Operator
files:
hdfs directory: true
destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000
Stage: Stage-0
Move Operator
tables:
partition:
day 20190719
replace: true
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: portal.ddm_content_doc_day
Stage: Stage-8
Stats-Aggr Operator
Stage: Stage-9
Map Reduce
Map Operator Tree:
TableScan
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: portal.ddm_content_doc_day
Stage: Stage-11
Map Reduce
Map Operator Tree:
TableScan
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: portal.ddm_content_doc_day
Stage: Stage-12
Move Operator
files:
hdfs directory: true
destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000
Stage: Stage-5
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string)
TableScan
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: int), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: int)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-32
Conditional Operator
Stage: Stage-16
Map Reduce
Map Operator Tree:
TableScan
alias: ddm_content_netease_article_day
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: string)
outputColumnNames: _col0
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
TableScan
alias: ddm_content_netease_audit_doc_base_day
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: docid (type: string)
outputColumnNames: _col0
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: CASE WHEN (_col1 is null) THEN (_col0) END (type: string)
outputColumnNames: _col0
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-30
Conditional Operator
Stage: Stage-38
Map Reduce Local Work
Alias -> Map Local Tables:
t:a:cc:c3:ddm_content_netease_audit_doc_base_day
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t:a:cc:c3:ddm_content_netease_audit_doc_base_day
TableScan
alias: ddm_content_netease_audit_doc_base_day
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: docid (type: string), doctype (type: tinyint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
Stage: Stage-29
Map Reduce
Map Operator Tree:
TableScan
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Local Work:
Map Reduce Local Work
Stage: Stage-17
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
TableScan
alias: ddm_content_netease_audit_doc_base_day
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: docid (type: string), doctype (type: tinyint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: tinyint)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-34
Conditional Operator
Stage: Stage-21
Map Reduce
Map Operator Tree:
TableScan
alias: ddm_content_netease_article_day
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: id (type: string), wemedia_id (type: string), description (type: string), body (type: string), original (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
TableScan
alias: ddm_content_netease_base_info_d
Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: wemedia_id (type: string), tid (type: string), cname (type: string), tname (type: string), account_quality (type: int), online_state (type: int), account_type (type: int), account_star (type: string), original_flag (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: int)
Reduce Operator Tree:
Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: int), _col12 (type: string), _col13 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
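A dump this long is hard to scan by eye. One possible way to get an overview is to pull the "Statistics: Num rows" figures out per stage, as in the Python sketch below. `max_rows_by_stage` is a hypothetical helper and the sample lines are copied from Stage-28, Stage-26 and Stage-7 above. Note that these numbers are planner estimates (the plan reports "Column stats: NONE"), so they only hint at where the volume is expected to be large; they do not show how rows are distributed across keys and cannot confirm skew on their own.

```python
import re

# Lines copied from the plan above.
SAMPLE = """\
Stage: Stage-28
Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
Stage: Stage-26
Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
Stage: Stage-7
Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
"""

STATS_RE = re.compile(r"Statistics: Num rows: (\d+) Data size: (\d+)")


def max_rows_by_stage(plan_text):
    """Return {stage: largest estimated row count seen in that stage}."""
    peak = {}
    stage = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        if line.startswith("Stage: "):
            stage = line[len("Stage: "):]
            continue
        m = STATS_RE.search(line)
        if m and stage:
            peak[stage] = max(peak.get(stage, 0), int(m.group(1)))
    return peak


if __name__ == "__main__":
    for name, rows in max_rows_by_stage(SAMPLE).items():
        print(f"{name}: ~{rows:,} rows (estimate)")
```

In this plan the estimate grows with each map join and peaks in Stage-7, the stage that shuffles on _col1 and runs row_number over that partition.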