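A job suddenly started failing today: it errors out when it reaches Stage-7. I am sure data skew is the cause, but I cannot pin down exactly where the problem is. The YARN job logs did not turn up anything useful either.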
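One way to test the skew suspicion is to look at how rows are distributed over the shuffle key. Below is a minimal sketch, assuming a sample of key values has already been exported (for example `docid`, which appears to be what Stage-7 partitions on as `_col1`). The function name `heaviest_keys` and the sample data are made up for illustration.

```python
from collections import Counter

def heaviest_keys(keys, top_n=3):
    """Return the top_n most frequent keys with their row count and share of all rows."""
    counts = Counter(keys)
    total = sum(counts.values())
    return [(key, count, count / total) for key, count in counts.most_common(top_n)]

# Made-up sample: one key (here the empty string) dominates the shuffle.
sample = [""] * 90 + ["doc_a"] * 5 + ["doc_b"] * 3 + ["doc_c"] * 2

for key, count, share in heaviest_keys(sample):
    print(f"{key!r}: {count} rows ({share:.0%})")
```

If a single key holds a large share of the rows, the reducer that receives it does most of the work, which would fit a failure in a stage that shuffles on that key.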


STAGE DEPENDENCIES: the dependencies between the stages
STAGE PLANS: the execution plan of each stage

A stage is not necessarily a MapReduce job; it can also be a Fetch Operator or a Move Operator.
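To see what has to run before a given stage, the dependency lines can be turned into a graph. The sketch below copies the dependency lines from the plan further down and walks upstream from Stage-7. The helper names are made up, and treating the `consists of` children as running after their conditional stage is my reading of the output, not something the plan states explicitly.

```python
import re
from collections import defaultdict

# Dependency lines copied from the EXPLAIN output in this post.
DEPENDENCY_LINES = """
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
  Stage-37 depends on stages: Stage-3
  Stage-28 depends on stages: Stage-37
  Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
  Stage-36 has a backup stage: Stage-5
  Stage-26 depends on stages: Stage-36
  Stage-35 depends on stages: Stage-5, Stage-26
  Stage-7 depends on stages: Stage-35
  Stage-13 depends on stages: Stage-7 , consists of Stage-10, Stage-9, Stage-11
  Stage-0 depends on stages: Stage-10, Stage-9, Stage-12
  Stage-8 depends on stages: Stage-0
  Stage-12 depends on stages: Stage-11
  Stage-32 is a root stage , consists of Stage-16
  Stage-30 depends on stages: Stage-16 , consists of Stage-38, Stage-17
  Stage-38 has a backup stage: Stage-17
  Stage-29 depends on stages: Stage-38
  Stage-34 is a root stage , consists of Stage-21
"""

def parse_dependencies(text):
    """Map each stage to the set of stages it directly waits for."""
    deps = defaultdict(set)
    for line in text.splitlines():
        head = re.match(r"\s*(Stage-\d+)\b", line)
        if not head:
            continue
        stage = head.group(1)
        before, _, after = line.partition("consists of")
        if "depends on stages:" in before:
            parents = before.split("depends on stages:", 1)[1]
            deps[stage].update(re.findall(r"Stage-\d+", parents))
        # Assumption: children of a conditional stage run after that stage.
        for child in re.findall(r"Stage-\d+", after):
            deps[child].add(stage)
    return deps

def upstream(stage, deps):
    """Return every stage that must finish before `stage`, transitively."""
    seen = set()
    stack = [stage]
    while stack:
        for parent in deps.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

deps = parse_dependencies(DEPENDENCY_LINES)
print(sorted(upstream("Stage-7", deps), key=lambda s: int(s.split("-")[1])))
```

The result lists every stage whose output feeds Stage-7, which narrows down where a skewed key could have been introduced.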

Map Operator Tree: the execution plan on the map side
Reduce Operator Tree: the execution plan on the reduce side

TableScan: reads the data; common attribute: alias

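Select Operator: selects (projects) columns
Group By Operator: grouped aggregation; common attributes: aggregations, mode. When there is no keys attribute, all rows form a single group.
Reduce Output Operator: sends its output to the reduce side; common attribute: sort order
Fetch Operator: the client fetches the data; common attribute: limit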
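With a plan as long as the one below, it helps to reduce each stage to its chain of operators first. Here is a rough sketch that does this by matching line text only; the excerpt is an abridged copy of the Stage-35 and Stage-7 plans from this post, and `operators_by_stage` is a made-up helper name.

```python
import re

# Abridged excerpt of the Stage-35 and Stage-7 plans shown later in this post.
PLAN_EXCERPT = """\
  Stage: Stage-35
    Map Reduce Local Work
      Alias -> Map Local Tables:
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
            alias: ddm_content_channel_day
            Select Operator
              HashTable Sink Operator

  Stage: Stage-7
    Map Reduce
      Map Operator Tree:
            Map Join Operator
              Reduce Output Operator
                sort order: +-
      Reduce Operator Tree:
        Select Operator
          PTF Operator
            Filter Operator
              Select Operator
                File Output Operator
"""

def operators_by_stage(plan_text):
    """Return {stage name: [operator names in the order they appear]}."""
    stages = {}
    current = None
    for line in plan_text.splitlines():
        text = line.strip()
        header = re.fullmatch(r"Stage: (Stage-\d+)", text)
        if header:
            current = header.group(1)
            stages[current] = []
        elif current and (text.endswith(" Operator") or text == "TableScan"):
            stages[current].append(text)
    return stages

for stage, ops in operators_by_stage(PLAN_EXCERPT).items():
    print(stage, "->", " > ".join(ops))
```

This ignores indentation, so it flattens the map and reduce trees into one list per stage; that is enough to spot which stages contain a join, a PTF (window function), or only a Move.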

aggregations: used in the Group By Operator

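mode: used in the Group By Operator
hash: map-side partial aggregation in an in-memory hash table (my understanding, still to be confirmed)
mergepartial: merges the partial aggregation results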
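The two modes can be pictured as a two-phase count. This is only a conceptual sketch in plain Python, not how Hive implements it: it assumes `hash` means each mapper pre-aggregates into a hash table and `mergepartial` means the reducer merges those partial results. The function names and input rows are made up.

```python
from collections import Counter

def map_side_hash_aggregate(rows):
    """Mode `hash` (as assumed here): a mapper keeps partial counts per key in a hash table."""
    partial = Counter()
    for key in rows:
        partial[key] += 1
    return partial

def reduce_side_merge_partial(partials):
    """Mode `mergepartial`: the reducer adds up the partial counts for each key."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts instead of replacing them
    return merged

# Made-up rows, split across two mappers.
mapper_inputs = [["a", "b", "a"], ["b", "b", "c"]]
partials = [map_side_hash_aggregate(rows) for rows in mapper_inputs]
totals = reduce_side_merge_partial(partials)
print(dict(totals))
```

The point of the first phase is that each mapper ships one partial result per key instead of one row per input record, which reduces the amount of data shuffled to the reducers.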

sort order: used in the Reduce Output Operator
+: ascending
+-: with two key columns, the first is sorted ascending and the second descending
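The `sort order` string carries one flag per key expression, in order. A small sketch that pairs them up, using the values from the Reduce Output Operator of Stage-7 in the plan below (`decode_sort_order` is a made-up helper name):

```python
def decode_sort_order(sort_order, key_expressions):
    """Pair each key expression with the direction encoded in `sort order`."""
    if len(sort_order) != len(key_expressions):
        raise ValueError("expected one +/- flag per key expression")
    directions = {"+": "ASC", "-": "DESC"}
    return [(key, directions[flag]) for key, flag in zip(key_expressions, sort_order)]

# Stage-7: key expressions _col1, _col4 with sort order "+-".
print(decode_sort_order("+-", ["_col1", "_col4"]))
# prints [('_col1', 'ASC'), ('_col4', 'DESC')]
```

This matches the window definition in the same stage, which partitions by `_col1` and orders by `_col4 DESC`.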


  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2, Stage-17, Stage-29
  Stage-37 depends on stages: Stage-3
  Stage-28 depends on stages: Stage-37
  Stage-27 depends on stages: Stage-21, Stage-28 , consists of Stage-36, Stage-5
  Stage-36 has a backup stage: Stage-5
  Stage-26 depends on stages: Stage-36
  Stage-35 depends on stages: Stage-5, Stage-26
  Stage-7 depends on stages: Stage-35
  Stage-13 depends on stages: Stage-7 , consists of Stage-10, Stage-9, Stage-11
  Stage-0 depends on stages: Stage-10, Stage-9, Stage-12
  Stage-8 depends on stages: Stage-0
  Stage-12 depends on stages: Stage-11
  Stage-32 is a root stage , consists of Stage-16
  Stage-30 depends on stages: Stage-16 , consists of Stage-38, Stage-17
  Stage-38 has a backup stage: Stage-17
  Stage-29 depends on stages: Stage-38
  Stage-34 is a root stage , consists of Stage-21

  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
            alias: ddm_content_doc_info_logic_day
            Statistics: Num rows: 502133848 Data size: 209584131934 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((substr(docid, 9, 2)) IN ('04', '02', '03', '07', '00') or (substr(docid, 9, 4) = '9001')) and (topicid = substr(docid, 9, 8))) (type: boolean)
              Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: topicid (type: string), docid (type: string), title (type: string), digest (type: string), from_unixtime(UDFToInteger((ptime / 1000))) (type: string), source (type: string), url (type: string), search (type: string), finearticle (type: string), category (type: string), quality (type: string), dkeys (type: string), interests (type: string), professional (type: string), body (type: string), buloid (type: string), ispic (type: string), iscomment (type: string), doc_del (type: int), channelid (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19
                Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col1 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col1 (type: string)
                  Statistics: Num rows: 251066924 Data size: 104792065967 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: string), _col18 (type: int), _col19 (type: string)
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
              Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: topicid (type: string), docid (type: string), title (type: string), publish_time (type: string), split(source, '&&')[0] (type: string), doc_url (type: string), regexp_replace(category, ',', '\') (type: string), quality (type: float), keywords (type: string), interests (type: string), picnum (type: tinyint), delstatus (type: tinyint), channel (type: string), doctype (type: tinyint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
                Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col1 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col1 (type: string)
                  Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: float), _col8 (type: string), _col9 (type: string), _col10 (type: tinyint), _col11 (type: tinyint), _col12 (type: string), _col13 (type: tinyint)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
            0 _col1 (type: string)
            1 _col1 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32, _col33
          Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: COALESCE(_col0,_col20) (type: string), COALESCE(_col1,_col21) (type: string), COALESCE(_col2,_col22) (type: string), _col3 (type: string), COALESCE(_col23,_col4) (type: string), COALESCE(_col24,_col5) (type: string), COALESCE(_col25,_col6) (type: string), _col7 (type: string), _col8 (type: string), COALESCE(_col9,_col26) (type: string), COALESCE(_col10,_col27) (type: string), COALESCE(_col11,_col28) (type: string), COALESCE(_col12,_col29) (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col30 (type: tinyint), _col17 (type: string), COALESCE(_col18,_col31) (type: int), COALESCE(_col19,_col32) (type: string), _col33 (type: tinyint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 276173622 Data size: 115271275062 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: tinyint)
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string), title (type: string), description (type: string), publish_time (type: string), source (type: string), url (type: string), '40' (type: string), body (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29
          Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), COALESCE(_col2,_col23) (type: string), COALESCE(_col3,_col24) (type: string), COALESCE(_col4,_col25) (type: string), COALESCE(_col5,_col26) (type: string), COALESCE(_col6,_col27) (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col21,_col28) (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 303790990 Data size: 126798405316 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string)
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23
          Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), COALESCE(_col1,_col22) (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), COALESCE(_col23,_col21) (type: string)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21
            Statistics: Num rows: 334170096 Data size: 139478248870 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-37
    Map Reduce Local Work
      Alias -> Map Local Tables:
          Fetch Operator
            limit: -1
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
            alias: ddm_content_original_column_d
            Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: topicid (type: string), colname (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 6 Data size: 1273 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                  2 _col0 (type: string)
            alias: ddm_content_local_city
            Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: local_city_id (type: string), local_city_name (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 1652 Data size: 452648 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                  2 _col0 (type: string)

  Stage: Stage-28
    Map Reduce
      Map Operator Tree:
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
                   Left Outer Join0 to 2
                0 _col0 (type: string)
                1 _col0 (type: string)
                2 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25
              Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-27
    Conditional Operator

  Stage: Stage-36
    Map Reduce Local Work
      Alias -> Map Local Tables:
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
            HashTable Sink Operator
                0 _col1 (type: string)
                1 _col0 (type: string)

  Stage: Stage-26
    Map Reduce
      Map Operator Tree:
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
                0 _col1 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
              Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-35
    Map Reduce Local Work
      Alias -> Map Local Tables:
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
            alias: ddm_content_channel_day
            Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: channelid (type: string), channelname (type: string)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 32 Data size: 6460 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                  0 _col20 (type: string)
                  1 _col0 (type: string)

  Stage: Stage-7
    Map Reduce
      Map Operator Tree:
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
                0 _col20 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
              Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string), _col4 (type: string)
                sort order: +-
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string), _col27 (type: string), _col28 (type: string), _col29 (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), _col33 (type: string), _col34 (type: int), _col35 (type: int), _col36 (type: int), _col37 (type: string), _col38 (type: int), _col40 (type: string)
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Select Operator
          expressions: VALUE._col0 (type: string), KEY.reducesinkkey0 (type: string), VALUE._col1 (type: string), VALUE._col2 (type: string), KEY.reducesinkkey1 (type: string), VALUE._col3 (type: string), VALUE._col4 (type: string), VALUE._col5 (type: string), VALUE._col6 (type: string), VALUE._col7 (type: string), VALUE._col8 (type: string), VALUE._col9 (type: string), VALUE._col10 (type: string), VALUE._col11 (type: string), VALUE._col12 (type: string), VALUE._col13 (type: string), VALUE._col14 (type: string), VALUE._col15 (type: tinyint), VALUE._col16 (type: string), VALUE._col17 (type: int), VALUE._col19 (type: string), VALUE._col20 (type: string), VALUE._col21 (type: string), VALUE._col23 (type: string), VALUE._col25 (type: string), VALUE._col26 (type: string), VALUE._col27 (type: string), VALUE._col28 (type: string), VALUE._col29 (type: string), VALUE._col30 (type: string), VALUE._col31 (type: string), VALUE._col32 (type: int), VALUE._col33 (type: int), VALUE._col34 (type: int), VALUE._col35 (type: string), VALUE._col36 (type: int), VALUE._col38 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38, _col40
          Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string, _col2: string, _col3: string, _col4: string, _col5: string, _col6: string, _col7: string, _col8: string, _col9: string, _col10: string, _col11: string, _col12: string, _col13: string, _col14: string, _col15: string, _col16: string, _col17: tinyint, _col18: string, _col19: int, _col21: string, _col22: string, _col23: string, _col25: string, _col27: string, _col28: string, _col29: string, _col30: string, _col31: string, _col32: string, _col33: string, _col34: int, _col35: int, _col36: int, _col37: string, _col38: int, _col40: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col4 DESC NULLS LAST
                  partition by: _col1
                  raw input shape:
                  window functions:
                      window function definition
                        alias: row_number_window_0
                        name: row_number
                        window function: GenericUDAFRowNumberEvaluator
                        window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 889560852 Data size: 371291122633 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (row_number_window_0 = 1) (type: boolean)
              Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), COALESCE(_col3,_col28) (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), COALESCE(_col14,_col29) (type: string), _col15 (type: string), _col16 (type: string), UDFToString(_col17) (type: string), _col18 (type: string), UDFToString(_col19) (type: string), _col30 (type: string), _col31 (type: string), _col32 (type: string), UDFToString(_col34) (type: string), UDFToString(_col35) (type: string), UDFToString(_col36) (type: string), _col37 (type: string), UDFToString(_col38) (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN ('网易号') WHEN (((_col21) IN ('43', '45', '46', '49') or (substr(_col1, 9, 4) = '9001'))) THEN ('抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN ('本地') WHEN (_col22 is not null) THEN ('原创') WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN ('编辑') WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), CASE WHEN (((_col21 = '40') or (_col21 = '50') or (_col21 = '40-1'))) THEN (_col34) WHEN ((_col21 = '43')) THEN ('灰抓今日头条文章(无版权)') WHEN ((_col21 = '45')) THEN ('灰抓微信(无版权)') WHEN ((_col21 = '46')) THEN ('定向增量(无版权)') WHEN (((substr(_col1, 9, 4) = '9001') or (_col21 = '49'))) THEN ('段子抓取') WHEN (((substr(_col1, 9, 2) = '04') or (_col21 = '10'))) THEN (_col25) WHEN (_col22 is not null) THEN (_col23) WHEN ((substr(_col1, 9, 2)) IN ('02', '03', '07', '00')) THEN (_col40) WHEN ((_col21 = '48')) THEN ('合作媒体') WHEN ((_col21 = '44')) THEN ('有道分享') END (type: string), _col21 (type: string), _col27 (type: string), _col33 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31, _col32
                Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 444780426 Data size: 185645561316 Basic stats: COMPLETE Column stats: NONE
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: portal.ddm_content_doc_day

  Stage: Stage-13
    Conditional Operator

  Stage: Stage-10
    Move Operator
          hdfs directory: true
          destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000

  Stage: Stage-0
    Move Operator
            day 20190719
          replace: true
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: portal.ddm_content_doc_day

  Stage: Stage-8
    Stats-Aggr Operator

  Stage: Stage-9
    Map Reduce
      Map Operator Tree:
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: portal.ddm_content_doc_day

  Stage: Stage-11
    Map Reduce
      Map Operator Tree:
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: portal.ddm_content_doc_day

  Stage: Stage-12
    Move Operator
          hdfs directory: true
          destination: hdfs://hz-cluster9/tmp/hive/public/.hive-staging_hive_2019-07-20_07-47-52_070_1672411049055087804-1/-ext-10000

  Stage: Stage-5
    Map Reduce
      Map Operator Tree:
            Reduce Output Operator
              key expressions: _col1 (type: string)
              sort order: +
              Map-reduce partition columns: _col1 (type: string)
              Statistics: Num rows: 735174227 Data size: 306852154164 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: string), _col10 (type: string), _col11 (type: string), _col12 (type: string), _col13 (type: string), _col14 (type: string), _col15 (type: string), _col16 (type: string), _col17 (type: tinyint), _col18 (type: string), _col19 (type: int), _col20 (type: string), _col21 (type: string), _col22 (type: string), _col23 (type: string), _col25 (type: string)
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: int), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col25, _col27, _col28, _col29, _col30, _col31, _col32, _col33, _col34, _col35, _col36, _col37, _col38
          Statistics: Num rows: 808691667 Data size: 337537376896 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-32
    Conditional Operator

  Stage: Stage-16
    Map Reduce
      Map Operator Tree:
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (doctype) IN ('40', '43', '44', '45', '46', '48', '49', '10', '50') (type: boolean)
              Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: docid (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 125993428 Data size: 294209672067 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: CASE WHEN (_col1 is null) THEN (_col0) END (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-30
    Conditional Operator

  Stage: Stage-38
    Map Reduce Local Work
      Alias -> Map Local Tables:
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: docid (type: string), doctype (type: tinyint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
              HashTable Sink Operator
                  0 _col0 (type: string)
                  1 _col0 (type: string)

  Stage: Stage-29
    Map Reduce
      Map Operator Tree:
            Map Join Operator
              condition map:
                   Left Outer Join0 to 1
                0 _col0 (type: string)
                1 _col0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-17
    Map Reduce
      Map Operator Tree:
            Reduce Output Operator
              key expressions: _col0 (type: string)
              sort order: +
              Map-reduce partition columns: _col0 (type: string)
              Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: docid (type: string), doctype (type: tinyint)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: tinyint)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), CASE WHEN (_col1 is null) THEN ('40-1') WHEN (_col1 is not null) THEN (_col2) END (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 277185546 Data size: 647261290007 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-34
    Conditional Operator

  Stage: Stage-21
    Map Reduce
      Map Operator Tree:
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: string), wemedia_id (type: string), description (type: string), body (type: string), original (type: string)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4
              Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col1 (type: string)
                sort order: +
                Map-reduce partition columns: _col1 (type: string)
                Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
            alias: ddm_content_netease_base_info_d
            Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: wemedia_id (type: string), tid (type: string), cname (type: string), tname (type: string), account_quality (type: int), online_state (type: int), account_type (type: int), account_star (type: string), original_flag (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
              Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: string)
                sort order: +
                Map-reduce partition columns: _col0 (type: string)
                Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Left Outer Join0 to 1
            0 _col1 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13
          Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string), _col6 (type: string), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: int), _col12 (type: string), _col13 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
            Statistics: Num rows: 169798618 Data size: 62844147701 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
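
A plan this long is hard to scan by eye, so a small script can help pick out the stages that handle the most data. The following is a minimal sketch, not part of the original job: it assumes the EXPLAIN output has been saved as plain text in the layout shown above, and the function name `largest_estimate_per_stage` is made up for illustration. It records the largest `Statistics` estimate under each `Stage:` header. These figures are optimizer estimates (the plan reports `Column stats: NONE`), so they are only a hint about where to look, not a measurement of actual skew.

```python
import re

# Matches a stage header such as "  Stage: Stage-38"
STAGE_RE = re.compile(r"^\s*Stage: (Stage-\d+)\s*$")
# Matches the row-count and byte-size estimates on a Statistics line
STATS_RE = re.compile(r"Statistics: Num rows: (\d+) Data size: (\d+)")


def largest_estimate_per_stage(plan_text):
    """Return {stage: (num_rows, data_size_bytes)} using the Statistics
    line with the largest data size seen inside each stage."""
    result = {}
    stage = None
    for line in plan_text.splitlines():
        header = STAGE_RE.match(line)
        if header:
            stage = header.group(1)
            continue
        stats = STATS_RE.search(line)
        if stats and stage is not None:
            rows, size = int(stats.group(1)), int(stats.group(2))
            if stage not in result or size > result[stage][1]:
                result[stage] = (rows, size)
    return result


# A short excerpt copied from the plan above, used as sample input.
sample = """\
  Stage: Stage-38
    Map Reduce Local Work
      Alias -> Map Local Operator Tree:
            alias: ddm_content_netease_audit_doc_base_day
            Statistics: Num rows: 251986855 Data size: 588419341799 Basic stats: COMPLETE Column stats: NONE

  Stage: Stage-21
    Map Reduce
      Map Operator Tree:
            alias: ddm_content_netease_article_day
            Statistics: Num rows: 154362377 Data size: 57131042127 Basic stats: COMPLETE Column stats: NONE
            alias: ddm_content_netease_base_info_d
            Statistics: Num rows: 1234899 Data size: 707579117 Basic stats: COMPLETE Column stats: NONE
"""

for stage, (rows, size) in largest_estimate_per_stage(sample).items():
    print(f"{stage}: {rows} rows, {size / 1e9:.1f} GB")
```

On this excerpt the script reports roughly 588 GB for Stage-38, the local-work stage that feeds the map join in Stage-29. That estimate is far larger than the tables a map join is normally used for, which fits with the plan listing Stage-17 (an ordinary reduce-side join) as its backup stage.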