Table of contents
- Preparation
- Install third-party packages required by Hue
- Download the Hue source code
- Build
- Configuration
  - Basic configuration
  - Database configuration
- Integrating Hue with Hadoop 3.1.0
  - Configure the Hadoop cluster
  - Configure Hue
- Integrating Hue with Hive
- Integrating Hue with Spark
  - Install Livy
  - Configure Hue
- MySQL initialization
- Start Hue
- Verification
  - Verify the Spark integration
- Problems encountered
Preparation
1. Install Python
2. Install Maven
3. Application services generally run under a dedicated account, so create a hue user and a hadoop group:
groupadd hadoop
useradd -g hadoop hue
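You can confirm the account with id (the numeric ids below are only illustrative):
id hue
# e.g. uid=1001(hue) gid=1000(hadoop) groups=1000(hadoop)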
Install third-party packages required by Hue
yum -y install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi gcc gcc-c++ krb5-devel libtidy libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel mysql-devel gmp-devel
Download the Hue source code
Option 1: download directly from the official site
http://gethue.com
Option 2: clone with git
git clone -b branch-4.4 https://github.com/cloudera/hue.git branch-4.4
We use the git approach here.
mv branch-4.4 hue
Set the owner of the hue directory and everything under it to user hue and group hadoop, then switch to the hue user:
chown -R hue:hadoop hue
su hue
Build
cd hue
make apps
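The build can take a while. As a quick sanity check that the bundled virtualenv was produced correctly, you can list Hue's management commands afterwards:
build/env/bin/hue help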
Configuration
Edit /usr/local/hue/desktop/conf/pseudo-distributed.ini.
Basic configuration
[desktop]
# Secret key used to encrypt session data
secret_key=dfsahjfhflsajdhfljahl
# Time zone name
time_zone=Asia/Shanghai
# Enable or disable debug mode.
django_debug_mode=false
# Enable or disable backtrace for server error
http_500_debug_mode=false
# This should be the hadoop cluster admin
## default_hdfs_superuser=hdfs
default_hdfs_superuser=root
# Apps to disable
# app_blacklist=impala,security,rdbms,jobsub,pig,hbase,sqoop,zookeeper,metastore,indexer
Database configuration
[[database]]
# Database engine
engine=mysql
# Database host
host=10.62.124.43
# Database port
port=3306
# Database user
user=root
# Database password
password=xhw888
# Database name
name=hue
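Before moving on, it can be worth confirming that the Hue host can actually reach this database with the credentials configured above:
mysql -h 10.62.124.43 -P 3306 -u root -p -e "select 1;"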
Integrating Hue with Hadoop 3.1.0
Configure the Hadoop cluster
Edit etc/hadoop/core-site.xml:
<!-- Hue proxy user. Start -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<!-- Hue proxy user. End -->
Once configured, restart HDFS for the change to take effect.
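A minimal sketch, assuming the standard Hadoop sbin scripts are on the PATH of the user that manages HDFS:
stop-dfs.sh
start-dfs.sh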
Configure Hue
Edit desktop/conf/pseudo-distributed.ini, find the [[hdfs_clusters]] section, and change it as follows:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://hadoopSvr1:8020
# NameNode logical name.
logical_name=hadoopSvr1
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
## webhdfs_url=http://localhost:50070/webhdfs/v1
webhdfs_url=http://hadoopSvr1:9870/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
# Directory of the Hadoop configuration
## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
hadoop_conf_dir=$HADOOP_CONF_DIR
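To confirm that Hue will be able to reach WebHDFS, you can query the NameNode endpoint configured above directly; a JSON FileStatuses response means WebHDFS is reachable and the hue user is accepted:
curl "http://hadoopSvr1:9870/webhdfs/v1/?op=LISTSTATUS&user.name=hue"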
Then find the [[yarn_clusters]] section and change it as follows:
# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
## resourcemanager_host=localhost
resourcemanager_host=hadoopSvr3
# The port where the ResourceManager IPC listens on
## resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
## resourcemanager_api_url=http://localhost:8088
resourcemanager_api_url=http://hadoopSvr3:8088
# URL of the ProxyServer API
## proxy_api_url=http://localhost:8088
proxy_api_url=http://hadoopSvr3:8088
# URL of the HistoryServer API
## history_server_api_url=http://localhost:19888
history_server_api_url=http://hadoopSvr4:19888
# URL of the Spark History Server
## spark_history_server_url=http://localhost:18088
spark_history_server_url=http://hadoopSvr1:18080
# Change this if your Spark History Server is Kerberos-secured
## spark_history_server_security_enabled=false
# In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
# have to be verified against certificate authority
## ssl_cert_ca_verify=True
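The ResourceManager API URL configured above can be smoke-tested with a plain HTTP request; it should return cluster metadata as JSON:
curl http://hadoopSvr3:8088/ws/v1/cluster/info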
Integrating Hue with Hive
Hue is mainly used for interactive Hive queries. On the server where Hue runs, create the directory /usr/local/hue/hive/conf
and copy the hive-site.xml used by HiveServer2 into that directory.
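A sketch of those two steps; the source path /usr/local/hive/conf/hive-site.xml on hadoopSvr3 is an assumption, so adjust it to wherever your HiveServer2 configuration actually lives:
mkdir -p /usr/local/hue/hive/conf
scp hadoopSvr3:/usr/local/hive/conf/hive-site.xml /usr/local/hue/hive/conf/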
Edit desktop/conf/pseudo-distributed.ini, find the [beeswax] section, and change it as follows:
[beeswax]
# HiveServer2 host (use the hostname; Kerberos requires it)
hive_server_host=hadoopSvr3
# HiveServer2 port
hive_server_port=10000
# Directory holding the HiveServer2 hive-site.xml
hive_conf_dir=/usr/local/hue/hive/conf
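Before testing through Hue, it can help to rule out connectivity problems by hitting HiveServer2 directly with beeline (assuming beeline is on the PATH):
beeline -u jdbc:hive2://hadoopSvr3:10000 -n hue -e "show databases;"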
Integrating Hue with Spark
Start the Spark Thrift Server:
cd /usr/local/spark/sbin/
./start-thriftserver.sh --master yarn --deploy-mode client
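In yarn client mode the Thrift Server registers as a YARN application, so one way to confirm it came up is to look for it in the running application list:
yarn application -list 2>/dev/null | grep -i thrift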
Install Livy
- Download the Livy package
Download page: http://livy.incubator.apache.org/download/
- Unpack the zip
unzip apache-livy-0.6.0-incubating-bin.zip
mv apache-livy-0.6.0-incubating-bin livy-0.6.0
- Configure Livy
cd livy-0.6.0/conf/
Configure livy-env.sh:
cp livy-env.sh.template livy-env.sh
Create the Livy log and PID directories (mkdir -p /data/livy/logs /data/livy/pid), then add the following to livy-env.sh:
export HADOOP_CONF_DIR=/usr/local/hadoop-3.1.0/etc/hadoop
export SPARK_HOME=/usr/local/spark
export LIVY_LOG_DIR=/data/livy/logs
export LIVY_PID_DIR=/data/livy/pid
Configure livy.conf:
cp livy.conf.template livy.conf
Add the following to livy.conf:
# What port to start the server on.
livy.server.port = 8998
# What spark master Livy sessions should use.
livy.spark.master = yarn
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = client
- Start Livy
/usr/local/livy-0.6.0/bin/livy-server start
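Livy exposes a REST API, so a quick smoke test is to list sessions; a freshly started server should return an empty list:
curl http://hadoopSvr3:8998/sessions
# e.g. {"from":0,"total":0,"sessions":[]}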
Configure Hue
Edit desktop/conf/pseudo-distributed.ini, find the [spark] section, and change it as follows:
###########################################################################
# Settings to configure the Spark application.
###########################################################################
[spark]
# The Livy Server URL.
## livy_server_url=http://localhost:8998
livy_server_url=http://hadoopSvr3:8998
# Configure Livy to start in local 'process' mode, or 'yarn' workers.
## livy_server_session_kind=yarn
livy_server_session_kind=yarn
# Whether Livy requires client to perform Kerberos authentication.
## security_enabled=false
# Whether Livy requires client to use csrf protection.
## csrf_enabled=false
# Host of the Sql Server
## sql_server_host=localhost
sql_server_host=hadoopSvr1
# Port of the Sql Server
## sql_server_port=10000
sql_server_port=10000
# Choose whether Hue should validate certificates received from the server.
## ssl_cert_ca_verify=true
###########################################################################
MySQL initialization
Create a database named hue on the MySQL server:
# Log in to MySQL
mysql -u root -p
# Create the hue database
create database hue;
# Create the hue user
create user 'hue'@'%' identified by 'xhw888';
# Grant privileges
grant all privileges on hue.* to 'hue'@'%';
flush privileges;
Then run the following commands:
build/env/bin/hue syncdb
build/env/bin/hue migrate
Once these finish, the Hue tables should exist in MySQL.
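For example, a quick check using the hue account created earlier:
mysql -u hue -pxhw888 -e "use hue; show tables;"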
Start Hue
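A source-built Hue is normally started with the supervisor script, run from the hue directory; append & (or use nohup) if you want it in the background:
build/env/bin/supervisor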
Stop Hue
- Normally, just press Ctrl+C to stop the Hue service.
- If Hue is running in the background, use kill:
ps -ef | grep hue | grep -v grep | awk '{print $2}' | xargs kill -9
Verification
Once the service is up, it listens on port 8000 by default:
http://10.62.124.44:8000
Verify the Spark integration
Log in to the Hue web UI, open the Scala editor, and run the following Scala code:
var counter = 0
val data = Array(1, 2, 3, 4, 5)
var rdd = sc.parallelize(data)
rdd.map(x => x + 1).collect()
If the output looks like the following, the integration is working:
counter: Int = 0
data: Array[Int] = Array(1, 2, 3, 4, 5)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:27
res2: Array[Int] = Array(2, 3, 4, 5, 6)
Hue admin account
Username: hue
Password: xhw888
Problems encountered
- MySQL reported an error during initialization
Solution
Log in to MySQL and change the password authentication plugin, as sketched below.
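On MySQL 8 this is commonly caused by the default caching_sha2_password plugin, which older Python MySQL drivers cannot authenticate against. A sketch of the usual fix, switching the hue account created earlier back to mysql_native_password:
ALTER USER 'hue'@'%' IDENTIFIED WITH mysql_native_password BY 'xhw888';
FLUSH PRIVILEGES;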