Prerequisite
Hadoop 2.2 has been installed (the installation steps below should be applied on every Hadoop node).
Step 1. Install R (via yum)
[hadoop@c0046220 yum.repos.d]$ sudo yum update
yum.repos.d]$ yum search r-project
yum.repos.d]$ sudo yum install R
...
Installed:
R.x86_64 0:3.0.2-1.el6
Dependency Installed:
R-core.x86_64 0:3.0.2-1.el6
R-core-devel.x86_64 0:3.0.2-1.el6
R-devel.x86_64 0:3.0.2-1.el6
R-java.x86_64 0:3.0.2-1.el6
R-java-devel.x86_64 0:3.0.2-1.el6
bzip2-devel.x86_64 0:1.0.5-7.el6_0
fontconfig-devel.x86_64 0:2.8.0-3.el6
freetype-devel.x86_64 0:2.3.11-14.el6_3.1
java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4
kpathsea.x86_64 0:2007-57.el6_2
libRmath.x86_64 0:3.0.2-1.el6
libRmath-devel.x86_64 0:3.0.2-1.el6
libXft-devel.x86_64 0:2.3.1-2.el6
libXmu.x86_64 0:1.1.1-2.el6
libXrender-devel.x86_64 0:0.9.7-2.el6
libicu.x86_64 0:4.2.1-9.1.el6_2
netpbm.x86_64 0:10.47.05-11.el6
netpbm-progs.x86_64 0:10.47.05-11.el6
pcre-devel.x86_64 0:7.8-6.el6
psutils.x86_64 0:1.17-34.el6
tcl.x86_64 1:8.5.7-6.el6
tcl-devel.x86_64 1:8.5.7-6.el6
tex-preview.noarch 0:11.85-10.el6
texinfo.x86_64 0:4.13a-8.el6
texinfo-tex.x86_64 0:4.13a-8.el6
texlive.x86_64 0:2007-57.el6_2
texlive-dvips.x86_64 0:2007-57.el6_2
texlive-latex.x86_64 0:2007-57.el6_2
texlive-texmf.noarch 0:2007-38.el6
texlive-texmf-dvips.noarch 0:2007-38.el6
texlive-texmf-errata.noarch 0:2007-7.1.el6
texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6
texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6
texlive-texmf-errata-latex.noarch 0:2007-7.1.el6
texlive-texmf-fonts.noarch 0:2007-38.el6
texlive-texmf-latex.noarch 0:2007-38.el6
texlive-utils.x86_64 0:2007-57.el6_2
tk.x86_64 1:8.5.7-5.el6
tk-devel.x86_64 1:8.5.7-5.el6
zlib-devel.x86_64 0:1.2.3-29.el6
Complete!
Validation:
yum.repos.d]$ R

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>
Step 2. Install RHadoop
2.1 Getting the RHadoop Packages
Download the rhdfs, rmr2 and rhbase packages and then run the commands below.
~]$ cd /tmp
tmp]$ mkdir RHadoop
tmp]$ cd RHadoop
RHadoop]$ wget \
  https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz \
  https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz \
  https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
2.2 Install the R packages that RHadoop depends on.
java]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_05
java]$ sudo -i
[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05
~]# R CMD javareconf
[root@c0046220 ~]# R
> .libPaths()
[1] "/usr/lib64/R/library" "/usr/share/R/library"
install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional",
"stringr", "plyr", "reshape2", "caTools"))
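Before moving on, it may help to verify that these dependencies actually installed; a small base-R sketch (the package names are the ones passed to install.packages() above):

```r
# Check which of the required packages are present in the library paths.
deps <- c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional",
          "stringr", "plyr", "reshape2", "caTools")
missing <- setdiff(deps, rownames(installed.packages()))
missing  # character(0) means everything is in place; otherwise re-run install.packages(missing)
```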
2.3 Install the RHadoop packages
Set the environment variables:
~]$ vi ~/.bashrc
# set HADOOP locations for RHadoop
export HADOOP_CMD=$HADOOP_HOME/bin/hadoop
export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
~]$ source .bashrc
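Note that `sudo -i` in the next step starts a fresh root shell that does not inherit these variables, which is why the transcript sets them again with Sys.setenv() inside R. A quick way to confirm they are visible in any R session (a sketch; the paths are the ones used in this tutorial, adjust to your installation):

```r
# Set (or inherit) the Hadoop locations, then fail early if they are missing.
Sys.setenv(HADOOP_CMD = "/opt/hadoop/hadoop-2.2.0/bin/hadoop",
           HADOOP_STREAMING = "/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar")
stopifnot(nzchar(Sys.getenv("HADOOP_CMD")),
          nzchar(Sys.getenv("HADOOP_STREAMING")))
```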
[hadoop@c0040084 R]$ sudo -i
[root@c0040084 ~]# R
Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");
Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");
Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");
install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);
install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);
Step 3. Validation
Load and initialize the rhdfs package, and execute some simple commands as below:
library(rhdfs)
hdfs.init()
hdfs.ls("/")
[hadoop@c0046220 ~]$ R
> library(rhdfs)
Loading required package: rJava
Be sure to run hdfs.init()
> hdfs.init()
14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
  permission owner  group      size modtime          file
1 drwxr-xr-x hadoop supergroup    0 2014-05-14 03:05 /apps
2 drwxr-xr-x hadoop supergroup    0 2014-05-12 09:40 /data
3 drwxr-xr-x hadoop supergroup    0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup    0 2014-05-15 10:02 /tmp
5 drwxr-xr-x hadoop supergroup    0 2014-05-14 05:48 /user
6 drwxr-xr-x hadoop supergroup    0 2014-05-13 06:43 /usr
Load the rmr2 package, and execute some simple commands as below:
library(rmr2)
from.dfs(to.dfs(1:100))
from.dfs(mapreduce(to.dfs(1:100)))
~]$ R
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> from.dfs(to.dfs(1:100))
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
input <- "/user/hadoop/tmp.txt"
wordcount = function(input, output = NULL, pattern = " "){
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = T)
}
wordcount(input)
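The logic of this job can be checked in plain R without touching the cluster; in the sketch below, strsplit() plays the map step (one record per word) and table() plays the shuffle-and-reduce (one summed count per word). The two input lines are a stand-in for tmp.txt. rmr2 also offers rmr.options(backend = "local") for running the real mapreduce() call without Hadoop.

```r
# Plain-R sketch of the same word count, no Hadoop required.
lines  <- c("hello hadoop world", "hello world")   # stand-in for tmp.txt
words  <- unlist(strsplit(lines, split = " "))     # map: one entry per word
counts <- table(words)                             # reduce: count per word
counts[["hello"]]  # 2
counts[["world"]]  # 2
```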
10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully
10:18:40 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=11018
FILE: Number of bytes written=278566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004
HDFS: Number of bytes written=11583
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=2
Launched reduce tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=23412
Total time spent by all reduces in occupied slots (ms)=13859
Map-Reduce Framework
Map input records=24
Map output records=112
Map output bytes=10522
Map output materialized bytes=11024
Input split bytes=208
Combine input records=112
Combine output records=114
Reduce input groups=105
Reduce shuffle bytes=11024
Reduce input records=114
Reduce output records=112
Spilled Records=228
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=569
CPU time spent (ms)=3700
Physical memory (bytes) snapshot=574214144
Virtual memory (bytes) snapshot=6258499584
Total committed heap usage (bytes)=365953024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=11583
rmr
reduce calls=110
10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35
function ()
{
    fname
}
<environment: 0x37d70d0>
from.dfs("/tmp/file612355aa2e35")
[1] "-"
[2] "of"
[3] "Hong"
[4] "Paul‘s"
[5] "School"
[6] "College"
[7] "Graduate"