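Today a job suddenly failed: it errored out when it reached Stage-7. I was sure the cause was data skew, but I couldn't pin down where exactly the problem was, and the YARN job logs didn't show anything useful either.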
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

A brief overview of Hive execution plans

An execution plan generally has two parts:
STAGE DEPENDENCIES: the dependencies between the stages
STAGE PLANS: the execution plan of each stage
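When a plan has dozens of stages, it helps to list everything upstream of the failing stage. Below is a minimal Python sketch (the function names are my own, not part of Hive) that parses the STAGE DEPENDENCIES text from the plan in this post and collects every ancestor of Stage-7. It assumes that the stages listed after `consists of` run after their Conditional Operator stage, and it ignores `has a backup stage` lines. Because both branches of each conditional are included, the result over-approximates the stages that actually ran.

```python
import re
from collections import defaultdict

# STAGE DEPENDENCIES section copied from the plan in this post.
PLAN_DEPENDENCIES = """\
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
  Stage-37 depends on stages: Stage-3
  Stage-28 depends on stages: Stage-37
  Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
  Stage-36 has a backup stage: Stage-5
  Stage-26 depends on stages: Stage-36
  Stage-35 depends on stages: Stage-5, Stage-26
  Stage-7 depends on stages: Stage-35
  Stage-13 depends on stages: Stage-7 , consists of Stage-10, Stage-9, Stage-11
  Stage-10
  Stage-0 depends on stages: Stage-10, Stage-9, Stage-12
  Stage-8 depends on stages: Stage-0
  Stage-9
  Stage-11
  Stage-12 depends on stages: Stage-11
  Stage-5
  Stage-32 is a root stage , consists of Stage-16
  Stage-16
  Stage-30 depends on stages: Stage-16 , consists of Stage-38, Stage-17
  Stage-38 has a backup stage: Stage-17
  Stage-29 depends on stages: Stage-38
  Stage-17
  Stage-34 is a root stage , consists of Stage-21
  Stage-21
"""


def parse_dependencies(text):
    """Map each stage name to the set of stages it directly waits for."""
    deps = defaultdict(set)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = re.match(r"Stage-\d+", line)
        if match is None:
            continue
        stage = match.group(0)
        deps.setdefault(stage, set())  # register stages that have no parents
        head, _, children = line.partition("consists of")
        if "depends on stages:" in head:
            parents = head.split("depends on stages:", 1)[1]
            deps[stage].update(re.findall(r"Stage-\d+", parents))
        # Assumption: each branch of a Conditional Operator stage runs after it.
        for child in re.findall(r"Stage-\d+", children):
            deps[child].add(stage)
    return deps


def upstream(deps, target):
    """Return every stage that (transitively) feeds `target`."""
    seen = set()
    stack = list(deps.get(target, ()))
    while stack:
        stage = stack.pop()
        if stage not in seen:
            seen.add(stage)
            stack.extend(deps.get(stage, ()))
    return seen


deps = parse_dependencies(PLAN_DEPENDENCIES)
print(sorted(upstream(deps, "Stage-7"), key=lambda s: int(s.split("-")[1])))
```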

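A stage is not necessarily a MapReduce job; it can also be, for example, a Fetch Operator or a Move Operator. The plan below also contains Map Reduce Local Work, Conditional Operator and Stats-Aggr Operator stages.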

The plan of a MapReduce stage has two parts:
Map Operator Tree: the map-side execution plan
Reduce Operator Tree: the reduce-side execution plan
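How the two trees fit together can be sketched with a toy MapReduce in Python. This is an illustration under simplifying assumptions, not Hive's implementation: `zlib.crc32` stands in for Hive's own hash partitioner, and rows are plain Python values. The point is that every map output row is routed to a reducer by its key, so a single very frequent key sends all of its rows to one reducer, which is what data skew looks like in practice.

```python
import zlib
from collections import Counter, defaultdict


def partition(key, num_reducers):
    # Stand-in for Hive's hash partitioner (the real hash function differs).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapreduce(rows, map_fn, reduce_fn, num_reducers):
    # Map side (Map Operator Tree): turn each row into (key, value) pairs and
    # route each pair to a reducer based on its key.
    buckets = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            buckets[partition(key, num_reducers)].append((key, value))

    # Number of pairs each reducer receives; a skewed key inflates one entry.
    loads = Counter({reducer: len(pairs) for reducer, pairs in buckets.items()})

    # Reduce side (Reduce Operator Tree): sort by key, then reduce each key group.
    output = []
    for reducer in sorted(buckets):
        groups = defaultdict(list)
        for key, value in sorted(buckets[reducer], key=lambda kv: kv[0]):
            groups[key].append(value)
        for key, values in groups.items():
            output.append(reduce_fn(key, values))
    return output, loads


# One "hot" docid with 1000 rows plus 100 docids with one row each.
rows = ["hot_doc"] * 1000 + [f"doc{i}" for i in range(100)]
output, loads = run_mapreduce(
    rows,
    map_fn=lambda docid: [(docid, 1)],
    reduce_fn=lambda key, values: (key, sum(values)),
    num_reducers=4,
)
print(loads.most_common())  # the reducer holding "hot_doc" gets at least 1000 pairs
```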

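Some common operators:
TableScan: reads the table data; common attribute: alias
Filter Operator: filters rows; common attribute: predicate
Select Operator: projection, computing the output columns; common attribute: expressions
Group By Operator: grouping and aggregation; common attributes: aggregations, mode. When there is no keys attribute, there is only a single group.
Reduce Output Operator: emits rows to the reduce side; common attributes: key expressions, sort order, Map-reduce partition columns
Fetch Operator: the client fetches the data; common attribute: limit (-1 means no limit)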
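As a rough mental model, each operator is one step in a row-streaming pipeline. The Python sketch below chains one generator per operator, mimicking the map side of Stage-1 in this post's plan (TableScan, Filter Operator, Select Operator, Reduce Output Operator). The sample rows and the shortened predicate are illustrative assumptions; Hive's real operators are Java classes that push rows to their child operators, not Python generators.

```python
def table_scan(rows):
    # TableScan: stream the rows of the source table.
    yield from rows


def filter_op(rows, predicate):
    # Filter Operator: keep only rows for which the predicate is true.
    return (row for row in rows if predicate(row))


def select_op(rows, expressions):
    # Select Operator: evaluate each expression to build _col0, _col1, ...
    for row in rows:
        yield {f"_col{i}": expr(row) for i, expr in enumerate(expressions)}


def reduce_output_op(rows, key_columns):
    # Reduce Output Operator: split each row into a shuffle key and a value.
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        value = {c: v for c, v in row.items() if c not in key_columns}
        yield key, value


def hive_substr(s, pos, length):
    # Hive's substr() uses 1-based positions.
    return s[pos - 1:pos - 1 + length]


def keep(row):
    # Simplified from Stage-1: substr(docid, 9, 2) IN ('04', '02', '03', '07', '00')
    return hive_substr(row["docid"], 9, 2) in {"04", "02", "03", "07", "00"}


rows = [
    {"topicid": "t1", "docid": "ABCDEFGH04XYZ", "title": "kept"},
    {"topicid": "t2", "docid": "ABCDEFGH99XYZ", "title": "dropped"},
]
pipeline = reduce_output_op(
    select_op(
        filter_op(table_scan(rows), keep),
        [lambda r: r["topicid"], lambda r: r["docid"], lambda r: r["title"]],
    ),
    key_columns=["_col1"],  # like "key expressions: _col1" in Stage-1
)
shuffled = list(pipeline)
print(shuffled)
```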

Common attribute values and their meanings:
aggregations (used in Group By Operator)
count(): counts rows

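mode (used in Group By Operator)
hash: map-side partial aggregation using an in-memory hash table
mergepartial: merges partial aggregation results on the reduce side
final: computes the final aggregation result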
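The hash and mergepartial modes split a count() into two steps: each mapper pre-aggregates its own rows in a hash table (map-side aggregation, which Hive controls with `hive.map.aggr`), and the reducer merges those partial results. A simplified Python sketch of the idea, assuming a single reducer and using `collections.Counter` as the hash table for brevity:

```python
from collections import Counter


def map_side_hash_count(split):
    # (mode: hash) one mapper counts its own rows in an in-memory hash table.
    partial = Counter()
    for key in split:
        partial[key] += 1
    return partial


def reduce_side_merge_partial(partials):
    # (mode: mergepartial) the reducer adds up the partial counts per key.
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


splits = [["a", "b", "a"], ["a", "c"]]  # rows seen by two hypothetical mappers
partials = [map_side_hash_count(s) for s in splits]
totals = reduce_side_merge_partial(partials)
shuffled_pairs = sum(len(p) for p in partials)  # 4 partial results instead of 5 raw rows
print(dict(totals), shuffled_pairs)
```

The benefit is less shuffle traffic: the mappers send one partial count per distinct key instead of one record per input row.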

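sort order (used in Reduce Output Operator)
+: ascending
-: descending
(empty): no sorting
++: two key columns, both ascending
+-: two key columns, the first ascending and the second descending
and so on, with one character per key column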
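To make the sort order notation concrete, here is a small Python helper (hypothetical, not a Hive API) that sorts rows by several key columns given a Hive-style sort order string with one '+' or '-' per key column. The example mirrors Stage-7 in the plan, whose Reduce Output Operator sorts by _col1 ascending and _col4 descending (`sort order: +-`). NULL handling such as NULLS LAST is not modeled.

```python
def sort_by_order(rows, key_columns, sort_order):
    """Sort dict rows by key_columns; sort_order has one '+' (ascending)
    or '-' (descending) per key column, e.g. '+-'. An empty order means no sort."""
    if len(sort_order) != len(key_columns):
        raise ValueError("need exactly one '+' or '-' per key column")
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives the combined order.
    for column, direction in reversed(list(zip(key_columns, sort_order))):
        result.sort(key=lambda row: row[column], reverse=(direction == "-"))
    return result


rows = [
    {"_col1": "d1", "_col4": "2019-07-18"},
    {"_col1": "d2", "_col4": "2019-07-19"},
    {"_col1": "d1", "_col4": "2019-07-19"},
]
ordered = sort_by_order(rows, ["_col1", "_col4"], "+-")
print([(r["_col1"], r["_col4"]) for r in ordered])
```

Within each docid the newest row comes first, which is what the row_number() = 1 filter in Stage-7 relies on.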

Then I ran EXPLAIN on the query to look at the execution plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
  Stage-37 depends on stages: Stage-3
  Stage-28 depends on stages: Stage-37
  Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
  Stage-36 has a backup stage: Stage-5
  Stage-26 depends on stages: Stage-36
  Stage-35 depends on stages: Stage-5, Stage-26
  Stage-7 depends on stages: Stage-35
  Stage-13 depends on stages: Stage-7 , consists of Stage-10, Stage-9, Stage-11
  Stage-10
  Stage-0 depends on stages: Stage-10, Stage-9, Stage-12
  Stage-8 depends on stages: Stage-0
  Stage-9
  Stage-11
  Stage-12 depends on stages: Stage-11
  Stage-5
  Stage-32 is a root stage , consists of Stage-16
  Stage-16
  Stage-30 depends on stages: Stage-16 , consists of Stage-38, Stage-17
  Stage-38 has a backup stage: Stage-17
  Stage-29 depends on stages: Stage-38
  Stage-17
  Stage-34 is a root stage , consists of Stage-21
  Stage-21

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ddm_content_doc_info_logic_day
            Statistics: Num rows: 502133848 Data size: 209584131934 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((substr(docid, 9, 2)) IN ('04', '02', '03', '07', '00') or (substr(docid, 9, 4) = '9001')) and (topicid = substr(docid, 9, 8))) (type: boolean)
              Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: topicid (type: string), docid (type: string), title (type: string), digest (type: string), from_unixtime(UDFToInteger((ptime / 1000))) (type: string), source (type: string), url (type: string), search (type: string), finearticle (type: string), category (type: string), quality (type: string), dkeys (type: string), interests (type: string), professional (type: string), body (type: string), buloid (type: string), ispic (type: string), iscomment (type: string), doc_del (type: int), channelid (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19
                Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col1 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col1 (type: string)
                  Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: int), _col19 (type: string)
          TableScan
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
              Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: topicid (type: string), docid (type: string), title (type: string), publish_time (type: string), split(source, '&&')[0] (type: string), doc_url (type: string), regexp_replace(category, ',', '\') (type: string), quality (type: float), keywords (type: string), interests (type: string), picnum (type: tinyint), delstatus (type: tinyint), channel (type: string), doctype (type: tinyint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
                Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col1 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col1 (type: string)
                  Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: float), _col8 (type: string), _col9 (type: string), _col10 (type: tinyint), _col11 (type: tinyint), _col12 (type: string), _col13 (type: tinyint)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 _col1 (type: string)
            1 _col1 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33
          Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: COALESCE(_col0,_col20) (type: string), COALESCE(_col1,_col21) (type: string), COALESCE(_col2,_col22) (type: string), _col3 (type: string), COALESCE(_col23,_col4) (type: string), COALESCE(_col24,_col5) (type: string), COALESCE(_col25,_col6) (type: string), _col7 (type: string), _col8 (type: string), COALESCE(_col9,_col26) (type: string), COALESCE(_col10,_col27) (type: string), COALESCE(_col11,_col28) (type: string), COALESCE(_col12,_col29) (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col30 (type: tinyint), _col17 (type: string), COALESCE(_col18,_col31) (type: int), COALESCE(_col19,_col32) (type: string), _col33 (type: tinyint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: tinyint)
          TableScan
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string), title (type: string), description (type: string), publish_time (type: string), source (type: string), url (type: string), '40' (type: string), body (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
          Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), COALESCE(_col2,_col23) (type: string), COALESCE(_col3,_col24) (type: string), COALESCE(_col4,_col25) (type: string), COALESCE(_col5,_col26) (type: string), COALESCE(_col6,_col27) (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col21,_col28) (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string)
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
          keys:
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23
          Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col23,_col21) (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-37
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t:b:ddm_content_original_column_d 
          Fetch Operator
            limit: -1
        t:d:ddm_content_local_city 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t:b:ddm_content_original_column_d 
          TableScan
            alias: ddm_content_original_column_d
            Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: topicid (type: string), colname (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                  2 _col0 (type: string)
        t:d:ddm_content_local_city 
          TableScan
            alias: ddm_content_local_city
            Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: local_city_id (type: string), local_city_name (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                  2 _col0 (type: string)

  Stage: Stage-28
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
                   Left Outer Join0 to 2
              keys:
                0 _col0 (type: string)
                1 _col0 (type: string)
                2 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25
              Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-27
    Conditional Operator

  Stage: Stage-36
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t:$INTNAME1 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t:$INTNAME1 
          TableScan
            HashTable Sink Operator
              keys:
                0 _col1 (type: string)
                1 _col0 (type: string)

  Stage: Stage-26
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col1 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
              Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-35
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t:e:ddm_content_channel_day 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t:e:ddm_content_channel_day 
          TableScan
            alias: ddm_content_channel_day
            Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: channelid (type: string), channelname (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col20 (type: string)
                  1 _col0 (type: string)

  Stage: Stage-7
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col20 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
              Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string), _col4 (type: string)
                sort order: +-
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), _col33 (type: string), _col34 (type: int), _col35 (type: int), _col36 (type: int), _col37 (type: string), _col38 (type: int), _col40 (type: string)
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Select Operator
          expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: string), KEY.reducesinkkey1 (type: string), VALUE._col3 (type: string), VALUE._col4 (type: string), VALUE._col5 (type: string), VALUE._col6 (type: string), VALUE._col7 (type: string), VALUE._col8 (type: string), VALUE._col9 (type: string), VALUE._col10 (type: string), VALUE._col11 (type: string), VALUE._col12 (type: string), VALUE._col13 (type: string), VALUE._col14 (type: string), VALUE._col15 (type: tinyint), VALUE._col16 (type: string), VALUE._col17 (type: int), VALUE._col19 (type: string), VALUE._col20 (type: string), VALUE._col21 (type: string), VALUE._col23 (type: string), VALUE._col25 (type: string), VALUE._col26 (type: string), VALUE._col27 (type: string), VALUE._col28 (type: string), VALUE._col29 (type: string), VALUE._col30 (type: string), VALUE._col31 (type: string), VALUE._col32 (type: int), VALUE._col33 (type: int), VALUE._col34 (type: int), VALUE._col35 (type: string), VALUE._col36 (type: int), VALUE._col38 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
          Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string, _col2: string, _col3: string, _col4: string, _col5: string, _col6: string, _col7: string, _col8: string, _col9: string, _col10: string, _col11: string, _col12: string, _col13: string, _col14: string, _col15: string, _col16: string, _col17: tinyint, _col18: string, _col19: int, _col21: string, _col22: string, _col23: string, _col25: string, _col27: string, _col28: string, _col29: string, _col30: string, _col31: string, _col32: string, _col33: string, _col34: int, _col35: int, _col36: int, _col37: string, _col38: int, _col40: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col4 DESC NULLS LAST
                  partition by: _col1
                  raw input shape:
                  window functions:
                      window function definition
                        alias: row_number_window_0
                        name: row_number
                        window function: GenericUDAFRowNumberEvaluator
                        window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (row_number_window_0 = 1) (type: boolean)
              Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), COALESCE(_col3,_col28) (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), UDFToString(_col17) (type: string), _col18 (type: string), UDFToString(_col19) (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), UDFToString(_col34) (type: string), UDFToString(_col35) (type: string), UDFToString(_col36) (type: string), _col37 (type: string), UDFToString(_col38) (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN ('网易号') WHEN (((_col21) IN ('43', '45', '46', '49') or (substr(_col1, 9, 4) = '9001'))) THEN ('抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN ('本地') WHEN (_col22 is not null) THEN ('原创') WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN ('编辑') WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN (_col34) WHEN ((_col21 = '43')) THEN ('灰抓今日头条文章(无版权)') WHEN ((_col21 = '45')) THEN ('灰抓微信(无版权)') WHEN ((_col21 = '46')) THEN ('定向增量(无版权)') WHEN (((substr(_col1, 9, 4) = '9001') or (_col21 = '49'))) THEN ('段子抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN (_col25) WHEN (_col22 is not null) THEN (_col23) WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN (_col40) WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), _col21 (type: string), _col27 (type: string), _col33 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32
                Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: portal.ddm_content_doc_day

  Stage: Stage-13
    Conditional Operator

  Stage: Stage-10
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            day 20190719
          replace: true
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: portal.ddm_content_doc_day

  Stage: Stage-8
    Stats-Aggr Operator

  Stage: Stage-9
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: portal.ddm_content_doc_day

  Stage: Stage-11
    Map Reduce
      Map Operator Tree:
          TableScan
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: portal.ddm_content_doc_day

  Stage: Stage-12
    Move Operator
      files:
          hdfs directory: true
          destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string)
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: int), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
          Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-32
    Conditional Operator

  Stage: Stage-16
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
          TableScan
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
              Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: docid (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: CASE WHEN (_col1 is null) THEN (_col0) END (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-30
    Conditional Operator

  Stage: Stage-38
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t:a:cc:c3:ddm_content_netease_audit_doc_base_day 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t:a:cc:c3:ddm_content_netease_audit_doc_base_day 
          TableScan
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: docid (type: string), doctype (type: tinyint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)

  Stage: Stage-29
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col0 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-17
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
          TableScan
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: docid (type: string), doctype (type: tinyint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: tinyint)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-34
    Conditional Operator

  Stage: Stage-21
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string), wemedia_id (type: string), description (type: string), body (type: string), original (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string)
                sort order: +
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
          TableScan
            alias: ddm_content_netease_base_info_d
            Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: wemedia_id (type: string), tid (type: string), cname (type: string), tname (type: string), account_quality (type: int), online_state (type: int), account_type (type: int), account_star (type: string), original_flag (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
              Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
          Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: int), _col12 (type: string), _col13 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
            Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
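A plan this long is tedious to read by eye when hunting for skew. The useful parts are the `Join Operator` blocks (reduce-side common join) and the `Map Join Operator` blocks. Each one records the join condition, the key column taken from each input (tag `0`, `1`, ...) and the optimizer's estimated output row count. That is enough to list which joins partition rows across reducers, and on which key.

Below is a minimal Python sketch, assuming the EXPLAIN text has been saved to a string. The names `find_joins` and `JoinInfo` are hypothetical, and the regexes only cover the line formats visible in this plan. Other Hive versions may print them slightly differently.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Regexes for the line formats seen in the plan above (assumption: other Hive
# versions may differ slightly, e.g. "Inner Join 0 to 1" with a space).
STAGE_RE = re.compile(r"^\s*Stage: (Stage-\d+)\s*$")
COND_RE = re.compile(r"^(?:Left Outer|Right Outer|Full Outer|Inner|Left Semi) Join ?\d+ to \d+$")
KEY_RE = re.compile(r"^(\d+) (\S+) \(type: \w+\)$")
ROWS_RE = re.compile(r"Statistics: Num rows: (\d+)")

@dataclass
class JoinInfo:
    stage: str
    operator: str                                         # "Join Operator" or "Map Join Operator"
    conditions: List[str] = field(default_factory=list)  # e.g. "Left Outer Join0 to 1"
    keys: List[str] = field(default_factory=list)        # "<input tag> <column>", e.g. "0 _col1"
    est_rows: Optional[int] = None                       # optimizer estimate, not an actual count

def find_joins(explain_text: str) -> List[JoinInfo]:
    """Collect every join in an EXPLAIN plan with its stage, keys and row estimate."""
    joins: List[JoinInfo] = []
    stage = ""
    current: Optional[JoinInfo] = None
    for line in explain_text.splitlines():
        m = STAGE_RE.match(line)
        if m:
            stage, current = m.group(1), None
            continue
        s = line.strip()
        if s in ("Join Operator", "Map Join Operator"):
            current = JoinInfo(stage=stage, operator=s)
            joins.append(current)
        elif current is None:
            continue  # not inside a join header
        elif COND_RE.match(s):
            current.conditions.append(s)
        elif (k := KEY_RE.match(s)) is not None:
            current.keys.append(f"{k.group(1)} {k.group(2)}")
        elif (r := ROWS_RE.search(s)) is not None:
            current.est_rows = int(r.group(1))
            current = None  # the first Statistics line closes this join's header
    return joins

# Two excerpts copied from the plan above: a map join and a common join.
SAMPLE = """\
  Stage: Stage-29
    Map Reduce
      Map Operator Tree:
          TableScan
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
              keys:
                0 _col0 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE

  Stage: Stage-21
    Map Reduce
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
          keys:
            0 _col1 (type: string)
            1 _col0 (type: string)
          Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
"""

for j in find_joins(SAMPLE):
    print(j.stage, j.operator, j.conditions, j.keys, j.est_rows)
```

The plan cannot show skew by itself; it only narrows down which join and which key column feed the reducers. For example, in Stage-21 the key for input 0 is `_col1`, which the Select Operator above it maps to `wemedia_id` of `ddm_content_netease_article_day`. If a few `wemedia_id` values (or NULL) were very common, all of those rows would hash to the same reducer. A `GROUP BY` count on that column is the usual way to confirm it.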