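Overview
YARN log aggregation is disabled by default. Once it is enabled, logs occasionally fail to be aggregated to HDFS. This article describes how to troubleshoot such failures and lists the common causes.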
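Details
First, running yarn logs -applicationId application_XXX prints the message: Log aggregation has not completed or is not enabled
The YARN configuration was checked against the article "YARN Application Log Aggregation and Periodic Cleanup", and all relevant settings were correct.
Running hadoop fs -ls /yarn1/var/log/hadoop-yarn/apps confirmed that the HDFS aggregation directory also exists.
When aggregation works, the YARN NodeManager log on the node should contain entries like the following (search for the keyword logaggregation):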
2024-12-19 18:03:08,004 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_e242_1734509842865_0001_01_000002 for log-aggregation
2024-12-19 18:03:14,121 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_e242_1734509842865_0001_01_000003 for log-aggregation
2024-12-19 18:03:41,781 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Application just finished : application_1734509842865_0001
2024-12-19 18:03:41,917 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_e242_1734509842865_0001_01_000003. Current good log dirs are /vdir/mnt/disk1/hadoop/yarn/logs,/vdir/mnt/disk2/hadoop/yarn/logs
2024-12-19 18:03:41,927 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_e242_1734509842865_0001_01_000002. Current good log dirs are /vdir/mnt/disk1/hadoop/yarn/logs,/vdir/mnt/disk2/hadoop/yarn/logs
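
The keyword search above can be narrowed to a single application. The following is a minimal sketch, not an official tool: the function name filter_aggregation_lines is hypothetical, and it assumes the NodeManager log format shown above, where container IDs embed the same timestamp and sequence number as the application ID.

```shell
# Print the log-aggregation lines that belong to one application.
# Reads NodeManager log text on stdin; $1 is the application ID.
filter_aggregation_lines() {
  local app_id="$1"
  # application_1734509842865_0001 -> 1734509842865_0001, the suffix that
  # also appears inside the container IDs of that application.
  local suffix="${app_id#application_}"
  # Keep only logaggregation entries, then only those whose ID suffix is
  # followed by a non-digit or end of line (avoids matching longer IDs).
  grep -F 'logaggregation' | grep -E "_${suffix}([^0-9]|\$)"
}
```

Example usage, where the log path is a placeholder for the actual NodeManager log file on the node: filter_aggregation_lines application_1734509842865_0001 < /path/to/nodemanager.log. If nothing is printed for an application that has finished, the NodeManager most likely never started aggregating its logs.
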
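The following are known causes of aggregation failure:
1. The yarn user's ACCESS permission on the HDFS component was removed by mistake
Solution: On the Guardian page, grant the yarn user the ACCESS permission on HDFS again.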

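2. Too many aggregated logs have accumulated because no cleanup is configured
Solution: In HDFS, delete the files under /yarn1/var/log/hadoop-yarn/apps/backup/logs, set the yarn.log-aggregation.retain-seconds parameter in the global configuration, then apply the service configuration and restart YARN.
By default, yarn.log-aggregation.retain-seconds never expires aggregated logs, so files keep piling up under the yarn.nodemanager.remote-app-log-dir directory until HDFS rejects new entries with the error: The directory item limit of /yarn1/var/log/hadoop-yarn/apps/backup/logs is exceeded: limit=1048576 items=1048576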
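
As an illustration of the retention setting, the sketch below converts a retention period in days to seconds and prints the matching property entry. The function name retain_property_for_days and the 7-day value are assumptions for illustration only. On a stock Apache Hadoop deployment the entry would go into yarn-site.xml; on a managed platform, the same value would instead be entered through the global configuration page.

```shell
# Print a property entry that keeps aggregated logs for N days.
# $1 is the retention period in days (hypothetical helper, not a YARN tool).
retain_property_for_days() {
  local days="$1"
  # yarn.log-aggregation.retain-seconds is expressed in seconds.
  local seconds=$((days * 86400))
  cat <<EOF
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>${seconds}</value>
</property>
EOF
}
```

For example, retain_property_for_days 7 prints an entry with the value 604800. A suitable retention period depends on how many applications the cluster runs per day and how long their logs are still needed.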

Other causes will be added as they are identified.