
Apache Hive 3.x: HiveServer2 INSERT OVERWRITE on a Large Dataset Fails with "GC overhead limit exceeded" (return code -101)

1. The Statement Executed

Using beeline, an INSERT OVERWRITE statement was run to load a text-format table into an ORC-format table; the dataset was about 10 GB (a sketch of the command is shown below).
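For reference, a minimal sketch of the kind of command involved. The JDBC URL, username, and table names (src_text, dst_orc) are placeholders of my own, not the original objects:

# Sketch only -- host, user, and table names are assumptions:
beeline -u "jdbc:hive2://localhost:10000" -n root -e "
INSERT OVERWRITE TABLE dst_orc
SELECT * FROM src_text;
"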

2. Error Output

The beeline session reports a GC overhead error.

Error message: ERROR : FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded

INFO : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 2
INFO : 2021-03-04 20:29:24,524 Stage-3 map = 0%, reduce = 0%
INFO : 2021-03-04 20:29:31,756 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 7.48 sec
INFO : 2021-03-04 20:29:47,270 Stage-3 map = 100%, reduce = 50%, Cumulative CPU 23.37 sec
INFO : 2021-03-04 20:29:48,302 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 38.9 sec
INFO : MapReduce Total cumulative CPU time: 38 seconds 900 msec
INFO : Ended Job = job_1614859018049_0008
INFO : Starting task [Stage-2:STATS] in serial mode
ERROR : FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 15 Reduce: 16 Cumulative CPU: 1857.62 sec HDFS Read: 3999520714 HDFS Write: 1398843068 SUCCESS
INFO : Stage-Stage-3: Map: 1 Reduce: 2 Cumulative CPU: 38.9 sec HDFS Read: 257422321 HDFS Write: 343036357 SUCCESS
INFO : Total MapReduce CPU Time Spent: 31 minutes 36 seconds 520 msec
INFO : Completed executing command(queryId=root_20210304202612_290e982e-a68e-455b-98de-a5a426b415ab); Time taken: 258.611 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded (state=08S01,code=-101)

The full stack trace can be found in hive.log, or through the HiveServer2 Web UI at http://${your HiveServer2 IP}:10002/ (see the example below for the log file).
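For example, assuming HiveServer2 runs as root and logs to the default location (the path is set in conf/hive-log4j2.properties, so the one below is an assumption):

# Default log location is /tmp/<user running HiveServer2>/hive.log (assumption):
tail -n 200 /tmp/root/hive.log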

return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded

java.lang.OutOfMemoryError: GC overhead limit exceeded

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.StatsTask. GC overhead limit exceeded
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:308) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:223) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.0.0.jar:3.0.0]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:313) ~[hive-service-3.0.0.jar:3.0.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_262]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:326) ~[hive-service-3.0.0.jar:3.0.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_262]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

3. Resolution

The root cause is that the INSERT OVERWRITE statement, specifically its final StatsTask, which runs inside the HiveServer2 process (note "Starting task [Stage-2:STATS] in serial mode" in the log above), consumes a large amount of HiveServer2's JVM heap. Increasing the HiveServer2 heap size resolves the error.

First, check how much heap the running HiveServer2 process currently has:

ps -ef | grep -i hiveserver2

In the output, look for the value after the -Xmx flag. The default is 256M, which is far too small for loading a 10 GB table and needs to be increased. A one-liner for pulling out just that flag is sketched below.
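A small convenience one-liner of my own (assumes GNU grep), not from the original post:

# The [h] trick keeps grep from matching its own process entry:
ps -ef | grep -i '[h]iveserver2' | grep -oE -- '-Xmx[0-9]+[mMgGkK]?'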

Go to the Hive home directory, enter the bin directory, and open the hive-config.sh script:

vim hive-config.sh

export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-20480}   # heap size in MB; 20480 = 20 GB

This raises the heap from the 256 MB default to 20480 MB (20 GB); HADOOP_HEAPSIZE is given in megabytes.

Then restart HiveServer2 for the new heap size to take effect, as sketched below.
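If HiveServer2 was started by hand, one common way to restart it looks like the following; managed clusters (Ambari, Cloudera Manager, etc.) have their own restart mechanisms, so treat this as a sketch:

# Find the PID, stop the old process, then start a fresh one:
ps -ef | grep -i '[h]iveserver2'
kill <PID>                        # <PID> taken from the ps output above
nohup $HIVE_HOME/bin/hiveserver2 > hiveserver2.out 2>&1 &

Afterwards, re-run the ps command from step 3 to confirm the new -Xmx value is in effect.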
