Error message

2023-11-10 10:51:15,665 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:59)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:312)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:302)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:532)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:342)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)

2023-11-10 10:51:15,675 INFO [main] org.apache.hadoop.mapred.Task: Running cleanup for the task
2023-11-10 10:51:15,809 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ReduceTask metrics system...
2023-11-10 10:51:15,810 INFO [file] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: file thread interrupted.
2023-11-10 10:51:15,810 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system stopped.
2023-11-10 10:51:15,811 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system shutdown complete.
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:305)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:295)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:514)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

Solution

-- Fraction of the reducer heap used to buffer shuffle data. Default is 0.70.
set mapreduce.reduce.shuffle.input.buffer.percent=0.6;
-- Memory cap for a single shuffle segment; at 0.15 this is 15% of the shuffle buffer.
-- Segments below the cap may be kept in memory; larger ones go to disk. Default is 0.25.
set mapreduce.reduce.shuffle.memory.limit.percent=0.15;
-- Start merging once shuffled data reaches shuffle buffer * 0.9. Default is 0.66.
set mapreduce.reduce.shuffle.merge.percent=0.9;
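
For jobs submitted through the MapReduce Java API rather than Hive, the same properties can be set on the job configuration. A minimal sketch, assuming a standard Job setup (the class and job names here are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fraction of reducer heap used to buffer map outputs during shuffle (default 0.70).
        conf.set("mapreduce.reduce.shuffle.input.buffer.percent", "0.6");
        // Per-segment cap as a fraction of that buffer; larger segments go to disk (default 0.25).
        conf.set("mapreduce.reduce.shuffle.memory.limit.percent", "0.15");
        // Trigger the in-memory merge at 90% buffer usage (default 0.66).
        conf.set("mapreduce.reduce.shuffle.merge.percent", "0.9");

        Job job = Job.getInstance(conf, "shuffle-tuning-example"); // hypothetical job name
        // ... configure mapper, reducer, input and output paths as usual, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}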

Cause
Once map tasks have progressed past a certain fraction, the reducer starts several fetch threads that pull map outputs into the reducer's memory or onto its disk and then merge them. When data volumes are large, the segments pulled into memory can trigger the OOM above, so the remedy is to reduce the fraction of memory the fetchers may use, sending fetched data straight to disk instead.
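As a rough worked example (the heap size here is assumed, not taken from the log): with a 1 GB reducer heap and the defaults, the shuffle buffer is 1024 MB × 0.70 ≈ 717 MB, and a single fetched segment may claim up to 717 MB × 0.25 ≈ 179 MB of it as one contiguous byte array (the BoundedByteArrayOutputStream allocation in the trace). With the default of five parallel fetchers (mapreduce.reduce.shuffle.parallelcopies), several such allocations can be in flight at once, and together with the rest of the heap they can exhaust the 1 GB. The settings above shrink both the buffer (1024 MB × 0.6 ≈ 614 MB) and the per-segment cap (614 MB × 0.15 ≈ 92 MB), so oversized segments are streamed to disk instead.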
See also: shuffle internals and the causes of shuffle memory overflow.
MergeManager
MergeManager is the key data structure that manages shuffled data. For efficiency it caches as much shuffle data in memory as it can, and spills to disk whatever does not fit.
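
A minimal sketch of that keep-in-memory-or-spill decision, modeled loosely on the MergeManagerImpl.reserve call in the stack trace (toy types and names, not Hadoop's actual API):

// Toy model of the shuffle buffer's reserve() decision; not Hadoop's real classes.
enum Destination { MEMORY, DISK, WAIT }

final class ShuffleBuffer {
    private final long memoryLimit;           // heap * input.buffer.percent
    private final long maxSingleShuffleLimit; // memoryLimit * memory.limit.percent
    private long usedMemory = 0;

    ShuffleBuffer(long heapBytes, double inputBufferPercent, double memoryLimitPercent) {
        this.memoryLimit = (long) (heapBytes * inputBufferPercent);
        this.maxSingleShuffleLimit = (long) (memoryLimit * memoryLimitPercent);
    }

    // Called by a fetcher before copying a map output of requestedSize bytes.
    synchronized Destination reserve(long requestedSize) {
        if (requestedSize > maxSingleShuffleLimit) {
            return Destination.DISK;     // too large for memory: stream straight to disk
        }
        if (usedMemory > memoryLimit) {
            return Destination.WAIT;     // buffer full: the fetcher stalls and retries later
        }
        usedMemory += requestedSize;     // the real code allocates byte[requestedSize] here,
        return Destination.MEMORY;       // which is where the OutOfMemoryError above is thrown
    }

    public static void main(String[] args) {
        ShuffleBuffer buf = new ShuffleBuffer(1L << 30, 0.6, 0.15); // 1 GB heap, tuned values
        System.out.println(buf.reserve(200L << 20)); // 200 MB segment -> DISK (cap is ~92 MB)
        System.out.println(buf.reserve(50L << 20));  // 50 MB segment  -> MEMORY
    }
}

Note that usedMemory is checked before the new segment is counted, so concurrent fetchers can overshoot memoryLimit by up to roughly parallelcopies × maxSingleShuffleLimit; lowering mapreduce.reduce.shuffle.memory.limit.percent also shrinks that overshoot.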