Hue: A Big Data Collaboration Framework
- Hue: A Big Data Collaboration Framework
- I. Overview
- II. Installing and Deploying Hue
- III. Integrating Hue with Hadoop 2.x
- IV. Integrating Hue with Hive
- V. Integrating Hue with an RDBMS
I. Overview
1. Reference documentation
http://gethue.com/ (official site)
http://github.com/cloudera/hue (source code)
http://archive.cloudera.com/cdh5/cdh//hue--cdh5/ (Hue installation guide)
2. Features:
- Free & open source
- Be productive
- 100% compatible with Hadoop
- Dynamic search dashboards with Solr
- Spark and Hadoop notebooks
3. Architecture diagram:
II. Installing and Deploying Hue
1. Download the source tarball (the CDH 5.3.6 release):
2. Make sure the virtual machine can reach the Internet
3. Install the system packages Hue depends on (they vary across Unix distributions; root privileges are required)
4. Extract the Hue source tarball to the target directory
5. Build the source
6. Edit the configuration file hue.ini
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
secret_key=qpbdxoewsqlkhztybvfidtvwekftusgdlofbcfghaswuicmqp
# Webserver listens on this address and port
http_host=xingyunfei001.com.cn
http_port=
# Time zone name
time_zone=Asia/Shanghai
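The secret_key should be long and random rather than hand-typed. One way to generate such a string is with Python's standard library; this is a convenience sketch, not something Hue ships:

```python
import secrets

# Generate a random string suitable for hue.ini's secret_key.
# 48 random bytes, URL-safe base64 encoded, gives a 64-character string.
key = secrets.token_urlsafe(48)
print(key)
```

Paste the printed value into hue.ini as the secret_key.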
7. Start Hue
[hadoop001@xingyunfei001 hue-.-cdh5.]$ build/env/bin/supervisor
8. Check the web UI in a browser
III. Integrating Hue with Hadoop 2.x
1. Edit Hadoop's hdfs-site.xml to enable WebHDFS:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
2. Edit Hadoop's core-site.xml so the hue user may impersonate other users:
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
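Typos in these property names are easy to miss and fail silently. As a sanity check, the XML fragments above can be validated with a short script; this is a sketch using Python's standard library, with the same property names as set above:

```python
import xml.etree.ElementTree as ET

def get_properties(conf_xml: str) -> dict:
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(conf_xml)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

core_site = """
<configuration>
  <property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
"""

props = get_properties(core_site)
# Hue must be allowed to impersonate users from any host and any group.
assert props["hadoop.proxyuser.hue.hosts"] == "*"
assert props["hadoop.proxyuser.hue.groups"] == "*"
print("core-site.xml proxyuser settings look correct")
```

The same helper works for hdfs-site.xml to confirm dfs.webhdfs.enabled is "true".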
3. Edit Hue's hue.ini:
[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
fs_defaultfs=hdfs://xingyunfei001.com.cn:
# NameNode logical name.
## logical_name=
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://xingyunfei001.com.cn:/webhdfs/v1
# Change this if your HDFS cluster is Kerberos-secured
## security_enabled=false
# Default umask for file and directory creation, specified in an octal value.
## umask=
# Directory of the Hadoop configuration
hadoop_hdfs_home=/opt/app/hadoop_2_cdh
hadoop_bin=/opt/app/hadoop_2_cdh/bin
hadoop_conf_dir=/opt/app/hadoop_2_cdh/etc/hadoop
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=xingyunfei001.com.cn
# The port where the ResourceManager IPC listens on
resourcemanager_port=
# Whether to submit jobs to this cluster
submit_to=True
# Resource Manager logical name (required for HA)
## logical_name=
# Change this if your YARN cluster is Kerberos-secured
## security_enabled=false
# URL of the ResourceManager API
resourcemanager_api_url=http://xingyunfei001.com.cn:
# URL of the ProxyServer API
proxy_api_url=http://xingyunfei001.com.cn:
# URL of the HistoryServer API
history_server_api_url=http://xingyunfei001.com.cn:
# In secure mode (HTTPS), if SSL certificates from Resource Manager's
# Rest Server have to be verified against certificate authority
## ssl_cert_ca_verify=False
# HA support by specifying multiple clusters
# e.g.
# [[[ha]]]
# Resource Manager logical name (required for HA)
## logical_name=my-rm-name
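With webhdfs_url set, Hue talks to HDFS over the WebHDFS REST API. The sketch below shows how such a request URL is composed; the helper is illustrative rather than Hue's actual code, and the hostname and port in the example are placeholders:

```python
from urllib.parse import urlencode

def webhdfs_url(base: str, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL, e.g. for a LISTSTATUS call on a directory."""
    query = urlencode({"op": op, "user.name": user})
    return f"{base}{path}?{query}"

# Example: list /tmp as the hue user (hostname and port are placeholders).
url = webhdfs_url("http://xingyunfei001.com.cn:50070/webhdfs/v1",
                  "/tmp", "LISTSTATUS", "hue")
print(url)
```

Fetching a URL like this with curl against the real cluster is a quick way to confirm that WebHDFS answers before restarting Hue.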
4. Restart HDFS and YARN, then start the JobHistory server
[hadoop001@xingyunfei001 hadoop_2.cdh]$ sbin/start-all.sh
[hadoop001@xingyunfei001 hadoop_2.cdh]$ sbin/mr-jobhistory-daemon.sh start historyserver
5. Restart the Hue server
[hadoop001@xingyunfei001 hue-.-cdh5.]$ build/env/bin/supervisor
6. Check the result
IV. Integrating Hue with Hive
1. Edit hue.ini:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=xingyunfei001.com.cn
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/opt/app/hive_0.13.1_cdh/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
# Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
# If false, Hue will use the FetchResults() thrift call instead.
## use_get_log_api=true
# Set a LIMIT clause when browsing a partitioned table.
# A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
## browse_partitioned_table_limit=
# A limit to the number of rows that can be downloaded from a query.
# A value of -1 means there will be no limit.
# A fixed maximum is applied to XLS downloads.
## download_row_limit=
# Hue will try to close the Hive query when the user leaves the editor page.
# This will free all the query resources in HiveServer2, but also make its results inaccessible.
## close_queries=false
# Thrift version to use when communicating with HiveServer2
## thrift_version=
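Before pointing Hue at HiveServer2, it is worth checking that the Thrift port is actually reachable. A minimal check with Python's standard library; against the real deployment you would pass the hive_server_host and hive_server_port values from the config above:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# localhost is used here only to keep the sketch self-contained; on the
# tutorial cluster this would be xingyunfei001.com.cn and port 10000.
print(port_open("127.0.0.1", 10000))
```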
2. Edit Hive's hive-site.xml to configure the metastore server:
<property>
<name>hive.metastore.uris</name>
<value>thrift://xingyunfei001.com.cn:9083</value>
</property>
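Clients locate the metastore through this URI, so a malformed value breaks Hive and Hue alike. A quick sanity check that it parses as a scheme, host, and port, using only the standard library:

```python
from urllib.parse import urlparse

# The hive.metastore.uris value from hive-site.xml above.
uri = urlparse("thrift://xingyunfei001.com.cn:9083")
print(uri.scheme, uri.hostname, uri.port)
```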
3. Start the metastore server first, then HiveServer2
[hadoop001@xingyunfei001 hive_.cdh]$ bin/hive --service metastore &
[hadoop001@xingyunfei001 hive_.cdh]$ bin/hiveserver2
4. Open up permissions on /tmp in HDFS
[hadoop001@xingyunfei001 hadoop_2.cdh]$ bin/hdfs dfs -chmod -R o+x /tmp
5. Verify that the configuration works
V. Integrating Hue with an RDBMS
1. Edit hue.ini:
[[databases]]
# sqlite configuration.
[[[sqlite]]]
# Name to show in the UI.
nice_name=SQLite
# For SQLite, name defines the path to the database.
name=/opt/app/hue--cdh5/desktop/desktop.db
# Database backend to use.
engine=sqlite
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en//ref/databases/
## options={}
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb
# Database backend to use. This can be:
# mysql
# postgresql
# oracle
engine=mysql
# IP or hostname of the database to connect to.
host=xingyunfei001.com.cn
# Port the database server is listening to. Defaults are:
# MySQL: 3306
# PostgreSQL: 5432
# Oracle Express Edition: 1521
port=
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=root
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en//ref/databases/
## options={}
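The [[databases]] block configures the database backing Hue's own metadata (users, saved queries, and so on), a SQLite file by default. A minimal sketch of working with such a SQLite database using Python's standard library; the in-memory database and the tiny table below are stand-ins for the real desktop.db and its Django-managed tables:

```python
import sqlite3

# ":memory:" stands in for desktop/desktop.db to keep the demo self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hue (via Django) creates tables such as auth_user; we mimic a tiny one here.
cur.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
cur.execute("INSERT INTO auth_user (username) VALUES (?)", ("hadoop001",))
conn.commit()

rows = cur.execute("SELECT username FROM auth_user").fetchall()
print(rows)
conn.close()
```

Pointing the same `sqlite3.connect` call at the real desktop.db path lets you inspect Hue's tables directly when debugging.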
2. Restart Hue
build/env/bin/supervisor
3. Verify that the configuration works