
Hue: A Big Data Collaboration Framework

  • Hue: A Big Data Collaboration Framework
    • I. Overview
    • II. Installing and Deploying Hue
    • III. Integrating Hue with Hadoop 2.x
    • IV. Integrating Hue with Hive
    • V. Integrating Hue with an RDBMS

I. Overview

1. Reference documentation

http://gethue.com/    official site
http://github.com/cloudera/hue   source code
http://archive.cloudera.com/cdh5/cdh//hue--cdh5/   Hue installation guide

2. Features:

  1. free & open source
  2. be productive
  3. 100% compatible
  4. dynamic search dashboards with Solr
  5. Spark and Hadoop notebooks

3. Architecture diagram:

[Figure: Hue architecture diagram]

II. Installing and Deploying Hue

1. Download the Hue source package (CDH 5.3.6 version)

2. Make sure the virtual machine can reach the internet

3. Install the system packages that Hue depends on; they differ across Unix systems, and root privileges are required

4. Extract the Hue source package into the target directory

5. Build the source package

6. Edit the configuration file hue.ini:

# Set this to a random string, the longer the better.
  # This is used for secure hashing in the session store.
  secret_key=qpbdxoewsqlkhztybvfidtvwekftusgdlofbcfghaswuicmqp

  # Webserver listens on this address and port
  http_host=xingyunfei001.com.cn
  http_port=8888

  # Time zone name
  time_zone=Asia/Shanghai
           
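The secret_key shown above is only an example and should be replaced with your own random string. A minimal way to generate one with the Python standard library (Python 3.6+; the 60-character length is a common recommendation, not a Hue requirement):

```python
import secrets
import string

# Generate a random 60-character alphanumeric value for hue.ini's secret_key.
alphabet = string.ascii_letters + string.digits
secret_key = "".join(secrets.choice(alphabet) for _ in range(60))
print(secret_key)
```

Paste the printed value into the secret_key line of hue.ini.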

7. Start Hue:

[hadoop001@xingyunfei001 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor
           

8. Check it in the browser

[Screenshot: Hue web UI in the browser]

III. Integrating Hue with Hadoop 2.x

1. Edit Hadoop's hdfs-site.xml to enable WebHDFS:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
           

2. Edit Hadoop's core-site.xml so the hue user can act as a proxy user:

<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
           
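Hue reaches HDFS through the WebHDFS REST API and impersonates the logged-in user via the doas query parameter; that impersonation is exactly what the hadoop.proxyuser.hue.* entries above authorize. A sketch of the kind of URL involved (the host and port here are this tutorial's assumptions):

```python
from urllib.parse import urlencode

def webhdfs_url(host: str, port: int, path: str, op: str, doas: str) -> str:
    """Build a WebHDFS REST URL where the hue user impersonates `doas`."""
    query = urlencode({"op": op, "user.name": "hue", "doas": doas})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = webhdfs_url("xingyunfei001.com.cn", 50070, "/tmp", "LISTSTATUS", "hadoop001")
print(url)
```

Without the proxyuser entries, the NameNode rejects such impersonated requests.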

3. Edit Hue's hue.ini configuration file:

[hadoop]

  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://xingyunfei001.com.cn:8020

      # NameNode logical name.
      ## logical_name=

      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://xingyunfei001.com.cn:50070/webhdfs/v1

      # Change this if your HDFS cluster is Kerberos-secured
      ## security_enabled=false

      # Default umask for file and directory creation, specified in an octal value.
      ## umask=022

      # Directory of the Hadoop configuration
      hadoop_hdfs_home=/opt/app/hadoop_2_cdh

      hadoop_bin=/opt/app/hadoop_2_cdh/bin

      hadoop_conf_dir=/opt/app/hadoop_2_cdh/etc/hadoop
           
[[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=xingyunfei001.com.cn

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://xingyunfei001.com.cn:8088

      # URL of the ProxyServer API
      proxy_api_url=http://xingyunfei001.com.cn:8088

      # URL of the HistoryServer API
      history_server_api_url=http://xingyunfei001.com.cn:19888

      # In secure mode (HTTPS), if SSL certificates from Resource Manager's
      # Rest Server have to be verified against certificate authority
      ## ssl_cert_ca_verify=False

    # HA support by specifying multiple clusters
    # e.g.

    # [[[ha]]]
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name
           

4. Restart HDFS and YARN, then start the JobHistory server:

[hadoop001@xingyunfei001 hadoop_2.cdh]$ sbin/start-all.sh

[hadoop001@xingyunfei001 hadoop_2.cdh]$ sbin/mr-jobhistory-daemon.sh start historyserver
           

5. Restart the Hue server:

[hadoop001@xingyunfei001 hue-3.7.0-cdh5.3.6]$ build/env/bin/supervisor
           

6. Check the result

[Screenshot: verification in the Hue UI]

IV. Integrating Hue with Hive

1. Edit the hue.ini configuration file:

[beeswax]

  # Host where HiveServer2 is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=xingyunfei001.com.cn

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  hive_conf_dir=/opt/app/hive_0.13.1_cdh/conf

  # Timeout in seconds for thrift calls to Hive service
  server_conn_timeout=120

  # Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
  # If false, Hue will use the FetchResults() thrift call instead.
  ## use_get_log_api=true

  # Set a LIMIT clause when browsing a partitioned table.
  # A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
  ## browse_partitioned_table_limit=

  # A limit to the number of rows that can be downloaded from a query.
  # A value of -1 means there will be no limit.
  # A maximum of 100,000 is applied to XLS downloads.
  ## download_row_limit=

  # Hue will try to close the Hive query when the user leaves the editor page.
  # This will free all the query resources in HiveServer2, but also make its results inaccessible.
  ## close_queries=false

  # Thrift version to use when communicating with HiveServer2
  ## thrift_version=
           

2. Edit Hive's hive-site.xml to configure the metastore server:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xingyunfei001.com.cn:9083</value>
</property>
           
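Clients resolve hive.metastore.uris into a scheme, host, and port. A quick standard-library check of the value above (purely illustrative; Hive's Thrift client does this parsing internally):

```python
from urllib.parse import urlparse

# Split the metastore URI the way a Thrift client would.
uri = urlparse("thrift://xingyunfei001.com.cn:9083")
print(uri.scheme, uri.hostname, uri.port)  # thrift xingyunfei001.com.cn 9083
```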

3. Start the metastore server (start it first), then HiveServer2:

[hadoop001@xingyunfei001 hive_0.13.1_cdh]$ bin/hive --service metastore &
[hadoop001@xingyunfei001 hive_0.13.1_cdh]$ bin/hiveserver2
           
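Before pointing Hue at these services, it can help to confirm that the metastore (9083) and HiveServer2 (10000) ports are actually listening. A small reachability sketch using only the standard library (the hostname is this tutorial's; substitute your own):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for service, port in [("metastore", 9083), ("hiveserver2", 10000)]:
    print(service, "up" if port_open("xingyunfei001.com.cn", port) else "down")
```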

4. Loosen the /tmp permissions on the HDFS filesystem:

[hadoop001@xingyunfei001 hadoop_2.cdh]$ bin/hdfs dfs -chmod -R o+x /tmp
           
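The -R o+x flags add the execute (directory-traversal) bit for "other" users all the way down /tmp, so Hive's scratch directories stay reachable when queries run as a different user. The bit arithmetic, sketched in Python:

```python
import stat

def add_other_execute(mode: int) -> int:
    """Mimic the effect of `chmod o+x` on a single permission value."""
    return mode | stat.S_IXOTH

print(oct(add_other_execute(0o750)))  # 0o751
```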

5. Check that the configuration took effect

[Screenshot: verification in the Hue UI]

V. Integrating Hue with an RDBMS

1. Edit the hue.ini configuration file:

[[databases]]
    # sqlite configuration.
    [[[sqlite]]]
      # Name to show in the UI.
      nice_name=SQLite

      # For SQLite, name defines the path to the database.
      name=/opt/app/hue-3.7.0-cdh5.3.6/desktop/desktop.db

      # Database backend to use.
      engine=sqlite

      # Database options to send to the server when connecting.
      # https://docs.djangoproject.com/en//ref/databases/
      ## options={}

    # mysql, oracle, or postgresql configuration.
    [[[mysql]]]
      # Name to show in the UI.
      nice_name="My SQL DB"

      # For MySQL and PostgreSQL, name is the name of the database.
      # For Oracle, Name is instance of the Oracle server. For express edition
      # this is 'xe' by default.
      ## name=mysqldb

      # Database backend to use. This can be:
      #  mysql
      #  postgresql
      #  oracle
      engine=mysql

      # IP or hostname of the database to connect to.
      host=xingyunfei001.com.cn

      # Port the database server is listening to. Defaults are:
      #  MySQL: 3306
      #  PostgreSQL: 5432
      #  Oracle Express Edition: 1521
      port=3306

      # Username to authenticate with when connecting to the database.
      user=root

      # Password matching the username to authenticate with when
      # connecting to the database.
      password=root

      # Database options to send to the server when connecting.
      # https://docs.djangoproject.com/en//ref/databases/
      ## options={}
           
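The sqlite entry above points Hue's DB Query app at its own desktop.db file. Inspecting such a SQLite file takes only the standard library; the sketch below uses an in-memory database as a stand-in for the real path:

```python
import sqlite3

# List the tables of a SQLite database, as the configured sqlite backend sees it.
# Replace ":memory:" with the real path, e.g. .../desktop/desktop.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, name TEXT)")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['example']
conn.close()
```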

2. Restart Hue:

build/env/bin/supervisor
           

3. Check that the configuration took effect

[Screenshot: verification in the Hue UI]
