Problem description / exception stack
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. All datanodes DatanodeInfoWithStorage[***:1004,DS-2b583f8d-bb61-4560-8b4c-4dd6c99ace38,DISK] are bad. Aborting...
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:317)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258)
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:372)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: All datanodes DatanodeInfoWithStorage[***:1004,DS-2b583f8d-bb61-4560-8b4c-4dd6c99ace38,DISK] are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1109)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:871)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401) (state=08S01,code=1)
Solution
# Adjust the following DataNode (DN) parameters to increase the timeout wait (values are in milliseconds)
dfs.client.socket-timeout=180000
dfs.datanode.socket.write.timeout=180000
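
The two timeout settings above are usually applied through hdfs-site.xml. This is an assumption, since the note does not say where they are set. As a minimal sketch, the following renders them as `<property>` entries that could be merged into that file. `TIMEOUTS` and `render_properties` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Values copied from the solution above; both are in milliseconds.
TIMEOUTS = {
    "dfs.client.socket-timeout": "180000",
    "dfs.datanode.socket.write.timeout": "180000",
}


def render_properties(settings):
    """Return hdfs-site.xml style <property> entries for the given settings."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(TIMEOUTS))
```

The generated entries would still need to be merged into the cluster's existing configuration and rolled out according to local procedure; whether a service restart is required depends on the deployment.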
Cause
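This problem is caused by a DataNode service failure, usually the result of batch maintenance on cluster nodes or a network fault. In rare cases it is triggered when all replicas of the data blocks used by the user's task are missing.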