Comparison of Hadoop Versions

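Hadoop currently comes in two open-source versions: the Apache release, and Cloudera's release, which is built on and optimized from the Apache code base and is also known as CDH3.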

The two versions compare as follows:

Component	CDH3	Apache	Description
Hadoop Common	●	●	The common utilities that support the other Hadoop subprojects.
Hadoop Distributed File System (HDFS)	●	●	A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce	●	●	A software framework for distributed processing of large data sets on compute clusters.
Flume	●		A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced.
Sqoop	●		A tool that imports data from relational databases into Hadoop clusters.
Hue	●		A graphical user interface to work with CDH.
Pig	●	●	A high-level data-flow language and execution framework for parallel computation. Enables you to analyze large amounts of data using Pig's query language, Pig Latin.
Hive	●	●	A data warehouse infrastructure that provides data summarization and ad hoc querying. A powerful data warehousing application built on top of Hadoop that lets you access your data using Hive QL, a language similar to SQL.
HBase	●	●	A scalable, distributed database that supports structured data storage for large tables. Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS).
ZooKeeper	●	●	A high-performance coordination service for distributed applications. A highly reliable and available service that provides coordination between distributed processes.
Oozie	●		A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs.
Whirr	●		Provides a fast way to run cloud services.
Snappy	●		A compression/decompression library.
Avro		●	A data serialization system.
Cassandra		●	A scalable multi-master database with no single points of failure.
Chukwa		●	A data collection system for managing large distributed systems.
Mahout		●	A scalable machine learning and data mining library.

In theory, CDH3 should support all of the Apache release's components and subprojects.

The similarities and differences between the two Hadoop versions are as follows:

System

Starting with CDH3b3, the hadoop.job.ugi parameter is no longer supported; use the UserGroupInformation.doAs() method instead.
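Because hadoop.job.ugi was just a configuration value, migrating means wrapping the work in an action that runs under a given identity. The block below is a toy, stdlib-only sketch of that doAs pattern, not Hadoop's API: the `DoAsSketch` class, its `ThreadLocal` identity holder and the user name `alice` are hypothetical. In real CDH3 code the identity comes from `UserGroupInformation` (for example `UserGroupInformation.createRemoteUser(...)`), and the work is passed in as a `PrivilegedExceptionAction`, as noted in the comments.

```java
import java.security.PrivilegedExceptionAction;

/**
 * Toy illustration of the doAs pattern (NOT Hadoop's API).
 * Real Hadoop code looks roughly like:
 *   UserGroupInformation ugi = UserGroupInformation.createRemoteUser("alice");
 *   ugi.doAs(action); // action is a PrivilegedExceptionAction<T>
 * Here a ThreadLocal stands in for the caller identity that Hadoop tracks.
 */
public class DoAsSketch {
    // Hypothetical "current user" holder; Hadoop keeps this in a JAAS Subject instead.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> System.getProperty("user.name"));

    public static String currentUser() {
        return CURRENT_USER.get();
    }

    // Runs `action` with `user` as the current identity, then restores the previous one.
    public static <T> T doAs(String user, PrivilegedExceptionAction<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.run();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Old style (unsupported since CDH3b3): set hadoop.job.ugi in the job configuration.
        String inside = doAs("alice", () -> "running as " + currentUser());
        System.out.println(inside);                     // running as alice
        System.out.println("after: " + currentUser());  // back to the OS user
    }
}
```

Restoring the previous identity in `finally` keeps the override scoped to a single action, which is the main difference from a process-wide configuration setting.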

For other incompatible changes, see: https://ccp.cloudera.com/display/CDHDOC/Incompatible+Changes

Installation

Cloudera CDH3 is based on the stable Hadoop 0.20.2 release and bundles many patches on top of it.

CDH is distributed both as RPM packages and as tarballs (Cloudera recommends the RPM packages), whereas Apache Hadoop 0.20.2 is offered only as a tarball.

Cloudera CDH3 sets the JAVA_HOME environment variable automatically; Apache Hadoop requires you to configure it by hand.
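With Apache Hadoop, JAVA_HOME is normally set by hand in conf/hadoop-env.sh, and a wrong value only shows up when a daemon fails to start. A simple sanity check is to confirm that the directory contains an executable bin/java. The sketch below does only that; the `JavaHomeCheck` class and the check itself are illustrative assumptions, not something Hadoop runs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JavaHomeCheck {
    // Returns true if `javaHome` looks like a usable JDK/JRE root:
    // an existing directory containing an executable bin/java (bin/java.exe on Windows).
    public static boolean looksLikeJavaHome(String javaHome) {
        if (javaHome == null || javaHome.isBlank()) {
            return false; // variable unset or empty
        }
        Path root = Paths.get(javaHome);
        Path unixBinary = root.resolve("bin").resolve("java");
        Path windowsBinary = root.resolve("bin").resolve("java.exe");
        return Files.isDirectory(root)
                && (Files.isExecutable(unixBinary) || Files.isExecutable(windowsBinary));
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (looksLikeJavaHome(javaHome)) {
            System.out.println("JAVA_HOME OK: " + javaHome);
        } else {
            System.out.println("JAVA_HOME is unset or invalid; set it in conf/hadoop-env.sh");
        }
    }
}
```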

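Apache Hadoop manages the whole cluster with the start-dfs.sh/stop-dfs.sh and start-all.sh/stop-all.sh scripts. CDH starts and stops services by running the /etc/init.d/hadoop-0.20-* scripts as root; this only manages the server it runs on, so if you want behavior similar to start-all.sh/stop-all.sh you have to write your own script.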
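A hand-written replacement for start-all.sh/stop-all.sh mostly has to visit each host in the right order: start-all.sh brings up HDFS before MapReduce, and stop-all.sh stops MapReduce before HDFS. The Java sketch below only builds the command lines. It assumes passwordless ssh as root, hypothetical host names (`master`, `slave1`, `slave2`) and service names following the `hadoop-0.20-*` pattern (such as `hadoop-0.20-namenode`); check the actual script names under /etc/init.d on your own machines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ClusterControl {
    // One daemon on one host. Host and service names are hypothetical examples.
    public record Daemon(String host, String service) {}

    // Builds one ssh command per daemon. `daemons` must be listed in start order
    // (HDFS daemons first, then MapReduce daemons), as start-all.sh does.
    // For "stop" the list is reversed, so MapReduce is stopped before HDFS,
    // as stop-all.sh does. Nothing is executed here.
    public static List<String> commands(List<Daemon> daemons, String action) {
        List<Daemon> ordered = new ArrayList<>(daemons);
        if ("stop".equals(action)) {
            Collections.reverse(ordered);
        }
        List<String> cmds = new ArrayList<>();
        for (Daemon d : ordered) {
            // The init scripts are run as root on the host that owns the daemon.
            cmds.add("ssh root@" + d.host() + " /etc/init.d/" + d.service() + " " + action);
        }
        return cmds;
    }

    public static void main(String[] args) {
        List<Daemon> cluster = List.of(
                new Daemon("master", "hadoop-0.20-namenode"),
                new Daemon("slave1", "hadoop-0.20-datanode"),
                new Daemon("slave2", "hadoop-0.20-datanode"),
                new Daemon("master", "hadoop-0.20-jobtracker"),
                new Daemon("slave1", "hadoop-0.20-tasktracker"),
                new Daemon("slave2", "hadoop-0.20-tasktracker"));
        // Print the commands for review instead of running them directly.
        commands(cluster, "start").forEach(System.out::println);
    }
}
```

Printing rather than executing keeps the sketch side-effect free; once the generated commands look right, they can be piped into a shell.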

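After a successful installation, Cloudera CDH adds two users: hdfs (for the HDFS file system) and mapred (for MapReduce). With Apache Hadoop, the usual practice is to add a single hadoop user that does everything.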

Cloudera CDH switches between multiple configuration sets using alternatives, whereas Apache Hadoop keeps its configuration files only under $HADOOP_HOME/conf.
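alternatives works through a chain of symbolic links: the fixed configuration path points at an entry under /etc/alternatives, which in turn points at the configuration set currently selected, so switching sets only repoints the middle link. The Java sketch below simulates that chain inside a temporary directory and resolves it with `Path.toRealPath()`. The layout (`conf -> alternatives/hadoop-0.20-conf -> conf.<name>`) is an assumption modeled on CDH3's /etc/hadoop-0.20/conf, and `conf.my_cluster` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Simulates the symlink chain that alternatives maintains, in a temp directory:
 *   conf -> alternatives/hadoop-0.20-conf -> conf.<profile>
 * Switching profiles only repoints the middle link; "conf" itself never changes.
 */
public class AlternativesSketch {
    // Points the alternatives link at the chosen profile (conceptually like `alternatives --set`).
    public static void select(Path altLink, Path profileDir) throws IOException {
        Files.deleteIfExists(altLink); // removes the link itself, not its target
        Files.createSymbolicLink(altLink, profileDir);
    }

    // Follows the whole chain to the real directory the daemons would read.
    public static Path activeConf(Path confLink) throws IOException {
        return confLink.toRealPath();
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("alt");
        Path empty = Files.createDirectory(root.resolve("conf.empty"));
        Path cluster = Files.createDirectory(root.resolve("conf.my_cluster"));
        Path altLink = Files.createDirectory(root.resolve("alternatives")).resolve("hadoop-0.20-conf");
        Path confLink = root.resolve("conf");

        select(altLink, empty);
        Files.createSymbolicLink(confLink, altLink);
        System.out.println(activeConf(confLink).getFileName()); // conf.empty

        select(altLink, cluster);
        System.out.println(activeConf(confLink).getFileName()); // conf.my_cluster
    }
}
```

Because every tool reads the same fixed path, switching configuration sets never requires editing scripts; with a single $HADOOP_HOME/conf directory you would instead copy or edit files in place.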

Eclipse plugin

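Cloudera CDH does not ship an Eclipse plugin by default, so you have to build it yourself, and the resulting plugin is not compatible with the Apache Hadoop plugin.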

Security

CDH3 supports Kerberos authentication, whereas Apache Hadoop relies on crude username matching.
