本文介紹一些常見的叢集 Hive 作業參數優化,可以根據業務需要來使用。
dfs.client.read.shortcircuit=true //直讀
dfs.client.read.shortcircuit.streams.cache.size=4096 //直讀緩存
dfs.datanode.balance.bandwidthPerSec=30048576 //提高balance帶寬,一般擴容後調整
dfs.datanode.max.transfer.threads=16384 //提高線程數
dfs.namenode.checkpoint.period=21600 //延長checkpoint時間
dfs.namenode.handler.count=100 //并發數,大叢集要提高
dfs.namenode.fslock.fair=false //降低寫性能,但提高讀鎖性能
dfs.namenode.lifeline.handler.count=1 //ha叢集優化,大叢集使用
hive.metastore.server.max.threads=100000
hive.compactor.worker.threads=5
hive.metastore.client.socket.timeout=1800s
hive.metastore.failure.retries=5
hive.exec.max.dynamic.partitions=5000
hive.exec.max.dynamic.partitions.pernode=2000
set hive.execution.engine=tez;
SET hive.tez.auto.reducer.parallelism=true;
SET hive.tez.max.partition.factor=20;
STORED AS ORC tblproperties ("orc.compress" = "SNAPPY")
hive.exec.orc.default.compress=SNAPPY
hive.exec.parallel=true
SET hive.exec.reducers.bytes.per.reducer=128000000;
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
hive.limit.optimize.enable=true
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
查詢前先收集常用表的統計資訊,以及常 join 的列的統計資訊
analyze table tweets compute statistics;
analyze table tweets compute statistics for columns sender, topic;
set hive.enforce.bucketing = true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;