
Apache Hive 3.x: HiveServer2 INSERT OVERWRITE on a Large Dataset Fails with "GC overhead limit exceeded" (return code -101)

1. The Statement Executed

Using beeline, an INSERT OVERWRITE statement was run to load a text-format table into an ORC-format table; the dataset was about 10 GB (a sketch of the command is shown below).
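For reference, a minimal sketch of the kind of command involved. The JDBC URL, username, and table names (src_text, dst_orc) are placeholders of my own, not the original objects:

# Sketch only -- host, user, and table names are assumptions:
beeline -u "jdbc:hive2://localhost:10000" -n root -e "
INSERT OVERWRITE TABLE dst_orc
SELECT * FROM src_text;
"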

2. Error Output

The beeline session reports a GC overhead error.

Error message: ERROR : FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded

INFO : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 2
INFO : 2021-03-04 20:29:24,524 Stage-3 map = 0%, reduce = 0%
INFO : 2021-03-04 20:29:31,756 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 7.48 sec
INFO : 2021-03-04 20:29:47,270 Stage-3 map = 100%, reduce = 50%, Cumulative CPU 23.37 sec
INFO : 2021-03-04 20:29:48,302 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 38.9 sec
INFO : MapReduce Total cumulative CPU time: 38 seconds 900 msec
INFO : Ended Job = job_1614859018049_0008
INFO : Starting task [Stage-2:STATS] in serial mode
ERROR : FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 15 Reduce: 16 Cumulative CPU: 1857.62 sec HDFS Read: 3999520714 HDFS Write: 1398843068 SUCCESS
INFO : Stage-Stage-3: Map: 1 Reduce: 2 Cumulative CPU: 38.9 sec HDFS Read: 257422321 HDFS Write: 343036357 SUCCESS
INFO : Total MapReduce CPU Time Spent: 31 minutes 36 seconds 520 msec
INFO : Completed executing command(queryId=root_20210304202612_290e982e-a68e-455b-98de-a5a426b415ab); Time taken: 258.611 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded (state=08S01,code=-101)

The full stack trace can be found in hive.log, or through the HiveServer2 Web UI at http://${your HiveServer2 IP}:10002/ (see the example below for the log file).
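For example, assuming HiveServer2 runs as root and logs to the default location (the path is set in conf/hive-log4j2.properties, so the one below is an assumption):

# Default log location is /tmp/<user running HiveServer2>/hive.log (assumption):
tail -n 200 /tmp/root/hive.log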

return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded

java.lang.OutOfMemoryError: GC overhead limit exceeded

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:308) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:223) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:313) ~[hive-service-3.0.0.jar:3.0.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_262]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:326) ~[hive-service-3.0.0.jar:3.0.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_262]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

3. Resolution

The root cause is that the INSERT OVERWRITE statement, specifically its final StatsTask, which runs inside the HiveServer2 process (note "Starting task [Stage-2:STATS] in serial mode" in the log above), consumes a large amount of HiveServer2's JVM heap. Increasing the HiveServer2 heap size resolves the error.

First, check how much heap the running HiveServer2 process currently has:

ps -ef | grep -i hiveserver2

In the output, look for the value after the -Xmx flag. The default is 256M, which is far too small for loading a 10 GB table and needs to be increased. A one-liner for pulling out just that flag is sketched below.
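A small convenience one-liner of my own (assumes GNU grep), not from the original post:

# The [h] trick keeps grep from matching its own process entry:
ps -ef | grep -i '[h]iveserver2' | grep -oE -- '-Xmx[0-9]+[mMgGkK]?'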

Go to the Hive home directory, enter the bin directory, and open the hive-config.sh script:

vim hive-config.sh

export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-20480}   # heap size in MB; 20480 = 20 GB

This raises the heap from the 256 MB default to 20480 MB (20 GB); HADOOP_HEAPSIZE is given in megabytes.

Then restart HiveServer2 for the new heap size to take effect, as sketched below.
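If HiveServer2 was started by hand, one common way to restart it looks like the following; managed clusters (Ambari, Cloudera Manager, etc.) have their own restart mechanisms, so treat this as a sketch:

# Find the PID, stop the old process, then start a fresh one:
ps -ef | grep -i '[h]iveserver2'
kill <PID>                        # <PID> taken from the ps output above
nohup $HIVE_HOME/bin/hiveserver2 > hiveserver2.out 2>&1 &

Afterwards, re-run the ps command from step 3 to confirm the new -Xmx value is in effect.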
