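Comparison of Hadoop Versions

Hadoop currently comes in two open-source distributions: the Apache release, and Cloudera's distribution, which is optimized on top of the Apache release and is also known as CDH3.

The two distributions compare as follows: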

(● = included, - = not included)

Component	CDH3	Apache	Description
Hadoop Common	●	●	The common utilities that support the other Hadoop subprojects.
Hadoop Distributed File System (HDFS)	●	●	A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce	●	●	A software framework for distributed processing of large data sets on compute clusters.
Flume	●	-	A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced.
Sqoop	●	-	A tool that imports data from relational databases into Hadoop clusters.
Hue	●	-	A graphical user interface to work with CDH.
Pig	●	●	A high-level data-flow language and execution framework for parallel computation. Enables you to analyze large amounts of data using Pig's query language, called Pig Latin.
Hive	●	●	A data warehouse infrastructure that provides data summarization and ad hoc querying. A data warehousing application built on top of Hadoop that lets you access your data using Hive QL, a language similar to SQL.
HBase	●	●	A scalable, distributed database that supports structured data storage for large tables. Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS).
ZooKeeper	●	●	A high-performance coordination service for distributed applications. A highly reliable and available service that provides coordination between distributed processes.
Oozie	●	-	A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs.
Whirr	●	-	Provides a fast way to run cloud services.
Snappy	●	-	A compression/decompression library.
Avro	-	●	A data serialization system.
Cassandra	-	●	A scalable multi-master database with no single points of failure.
Chukwa	-	●	A data collection system for managing large distributed systems.
Mahout	-	●	A scalable machine learning and data mining library.

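In theory, CDH3 should support every component and subproject of the Apache release.

The similarities and differences between the two distributions are as follows:

System

Starting with CDH3b3, the hadoop.job.ugi parameter is no longer supported; use the UserGroupInformation.doAs() method instead.

For other changes, see: https://ccp.cloudera.com/display/CDHDOC/Incompatible+Changes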

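Installation

Cloudera CDH3 is based on the stable Hadoop 0.20.2 release and integrates many patches.

CDH is available both as RPM packages and as a tarball (Cloudera recommends the RPM route), whereas Hadoop 0.20.2 is distributed only as a tarball.

Cloudera CDH3 sets the JAVA_HOME environment variable automatically; Apache Hadoop requires you to configure it by hand.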
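
For the Apache side, the manual step usually means adding an export line to conf/hadoop-env.sh. The following is a minimal sketch of a helper that checks a candidate JDK directory and prints that line; the function name java_home_line is made up for illustration, and the JDK path you pass in is site-specific.

```shell
# Sketch: print the line to add to $HADOOP_HOME/conf/hadoop-env.sh, after
# checking that the given JDK directory really contains an executable java.
# java_home_line is a hypothetical helper name, not part of Hadoop.

java_home_line() {
    local jdk="$1"
    if [ -x "$jdk/bin/java" ]; then
        # The JDK looks usable: emit the export line for hadoop-env.sh
        echo "export JAVA_HOME=$jdk"
    else
        echo "no executable java under: $jdk" >&2
        return 1
    fi
}
```

The printed line can then be appended to hadoop-env.sh on each node once the path has been verified there.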

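Apache Hadoop maintains the cluster with the start-dfs.sh/stop-dfs.sh and start-all.sh/stop-all.sh scripts. CDH starts and stops services by running the /etc/init.d/hadoop-0.20-* scripts as root. These scripts manage only the local server, so if you want something like start-all.sh/stop-all.sh you have to write your own script.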
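
One way such a do-it-yourself wrapper might look is sketched below. It assumes passwordless ssh as root to every node and takes the host names as arguments; the function name cluster_ctl and the DRY_RUN switch are inventions for this example, not part of CDH. By default it only prints the ssh commands it would run.

```shell
# Sketch: run one action against every CDH3 Hadoop init script on each host.
# Assumptions (not from CDH itself): passwordless root ssh, hosts passed as
# arguments, and the hypothetical DRY_RUN switch (1 = print only, the default).

DRY_RUN="${DRY_RUN:-1}"

cluster_ctl() {
    if [ "$#" -lt 2 ]; then
        echo "usage: cluster_ctl start|stop host..." >&2
        return 2
    fi
    local action="$1"
    shift
    case "$action" in
        start|stop) ;;
        *) echo "usage: cluster_ctl start|stop host..." >&2; return 2 ;;
    esac
    # The glob is expanded by the remote shell, so each host runs only the
    # init scripts that are actually installed on it.
    local cmd='for s in /etc/init.d/hadoop-0.20-*; do "$s" '"$action"'; done'
    local host
    for host in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$host '$cmd'"
        else
            ssh "root@$host" "$cmd"
        fi
    done
}
```

For example, `cluster_ctl start node1 node2` prints two ssh commands, and setting DRY_RUN=0 executes them. Hosts are handled in the order given and the scripts on each host in glob order, so a real script may need to start the NameNode and JobTracker hosts first.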

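After a successful installation, Cloudera CDH adds two users: hdfs (for the HDFS file system) and mapred (for MapReduce), whereas the usual practice with Apache Hadoop is to add a single hadoop user that does everything.

Cloudera CDH switches among multiple configuration directories via alternatives, whereas Apache Hadoop keeps its configuration files only under $HADOOP_HOME/conf.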
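
As a rough illustration of the alternatives mechanism, the sketch below prints the commands that would register a custom configuration directory. The paths under /etc/hadoop-0.20 and the link name hadoop-0.20-conf follow the CDH3 packaging as best I recall and should be checked against your installation; the directory suffix, the priority, and the function name hadoop_conf_cmds are purely illustrative. Nothing is executed, so the output can be reviewed before running it as root.

```shell
# Sketch: print (not run) the commands that register a custom CDH3 config
# directory with the alternatives system.
# Assumed: /etc/hadoop-0.20 layout and the hadoop-0.20-conf link name;
# "my_cluster" and priority 50 are example values only.

hadoop_conf_cmds() {
    local name="${1:-my_cluster}"
    local priority="${2:-50}"
    local base=/etc/hadoop-0.20
    # Start from the empty template, then register the copy at a priority
    echo "cp -r $base/conf.empty $base/conf.$name"
    echo "alternatives --install $base/conf hadoop-0.20-conf $base/conf.$name $priority"
    echo "alternatives --display hadoop-0.20-conf"
}
```

On Debian-based systems the equivalent tool is update-alternatives, which takes the same --install and --display options.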

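Eclipse plugin

Cloudera CDH does not ship an Eclipse plugin by default; you have to build it yourself, and its plugin is not compatible with the Apache Hadoop plugin.

Security

CDH3 supports Kerberos authentication, whereas Apache Hadoop 0.20.2 relies only on rudimentary username matching.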
