
Running Shark Locally, and Problems You May Encounter

[size=medium][b]Installing Shark Locally[/b][/size]

[b]1. Download Scala[/b]

wget [url]http://www.scala-lang.org/files/archive/scala-2.9.3.tgz[/url]

The newest release is scala-2.10.2.tgz, but Shark 0.7.0 is built against Scala 2.9.3, so stick with 2.9.3 here.

tar xvfz scala-2.9.3.tgz

[b]2. Download the Shark and Hive tarballs[/b]

wget [url]http://spark-project.org/download/shark-0.7.0-hadoop1-bin.tgz[/url] (the CDH3-compatible build)

tar xvfz shark-0.7.0-*-bin.tgz

[b]3. Configure environment variables[/b]

cd shark-0.7.0/conf

cp shark-env.sh.template shark-env.sh

vi shark-env.sh

export HIVE_HOME=/path/to/hive-0.9.0-bin

export SCALA_HOME=/path/to/scala-2.9.3
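For reference, the template exposes a few more knobs; a minimal sketch of a fuller shark-env.sh for local use (the HADOOP_HOME path and the 2g heap are illustrative assumptions, not values from this walkthrough):

export SCALA_HOME=/path/to/scala-2.9.3

export HIVE_HOME=/path/to/hive-0.9.0-bin

export HADOOP_HOME=/path/to/hadoop   # needed once tables live on HDFS; assumed path

export SPARK_MEM=2g   # heap per Shark/Spark process; assumed value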

[b]4. Test with sample data[/b]
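With the environment configured, launch the Shark CLI from the distribution root and run the statements below (bin/shark ships with the binary package; bin/shark-withinfo is the same CLI with verbose logging):

cd /path/to/shark-0.7.0

./bin/shark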

CREATE TABLE src(key INT, value STRING);

LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;

SELECT COUNT(1) FROM src;

OK

500

Time taken: 2.149 seconds

With Hive's MapReduce stage gone, this runs noticeably faster than the same query in Hive.

Shark keeps any table whose name ends in _cached in Spark's in-memory column store, so later scans skip HDFS entirely:

CREATE TABLE src_cached AS SELECT * FROM src;

SELECT COUNT(1) FROM src_cached;
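The in-memory copy behaves like any other table; for instance, an aggregation over it (an illustrative query against the two-column schema above) is also served from the cache:

SELECT value, COUNT(1) FROM src_cached GROUP BY value;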

[size=medium][b]Problems You May Hit During Installation, and Their Fixes[/b][/size]

[b]1. CREATE TABLE src(key INT, value STRING); fails[/b]

FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

ERROR exec.Task: FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))

at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544)

at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313)

at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242)

at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)

at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)

at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312)

at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)

at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288)

at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)

at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)

at shark.SharkCliDriver$.main(SharkCliDriver.scala:203)

at shark.SharkCliDriver.main(SharkCliDriver.scala)

[b]Reason[/b]: The Hadoop version running on the cluster does not match the hadoop-core jar that ships with Shark.

[b]Fix[/b]: Copy ${HADOOP_HOME}/hadoop-core-*.jar into ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/ and remove the hadoop-core-*.jar that was there, as sketched below.
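A minimal sketch of the swap (assumes exactly one hadoop-core jar in each location):

rm ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/hadoop-core-*.jar

cp ${HADOOP_HOME}/hadoop-core-*.jar ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/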

Then restart the Shark CLI.

[b]2. java.lang.NoClassDefFoundError[/b]

After swapping in the cluster's hadoop-core jar under /app/hadoop/shark/shark-0.7.0/lib_managed/jars/org.apache.hadoop/hadoop-core/, the next run fails with:

java.lang.NoClassDefFoundError: org/apache/hadoop/thirdparty/guava/common/collect/LinkedListMultimap

at org.apache.hadoop.hdfs.SocketCache.<init>(SocketCache.java:48)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:253)

at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:220)

at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)

at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1611)

at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:68)

at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1645)

at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1627)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)

at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)

at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)

at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)

at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)

at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278)

at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248)

at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114)

at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)

at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)

at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538)

at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313)

at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242)

at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)

at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)

at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312)

at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)

at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288)

at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)

at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)

at shark.SharkCliDriver$.main(SharkCliDriver.scala:203)

at shark.SharkCliDriver.main(SharkCliDriver.scala)

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.guava.common.collect.LinkedListMultimap

at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

at java.lang.ClassLoader.loadClass(ClassLoader.java:307)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

... 36 more

[b]Reason[/b]: The CDH build's third-party guava-*.jar is missing from Shark's classpath.

[b]Fix[/b]: Create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty and copy ${HADOOP_HOME}/lib/guava-r09-jarjar.jar into it, as sketched below.
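As commands (same CDH3 layout as above):

mkdir -p ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty

cp ${HADOOP_HOME}/lib/guava-r09-jarjar.jar ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty/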

Then restart the Shark CLI.

[b]3. show tables fails[/b]

Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!

[b]Reason[/b]: Caused by a missing hadoop-lzo-*.jar.

[b]Fix[/b]: Create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib and copy ${HADOOP_HOME}/lib/hadoop-lzo-*.jar into it, as sketched below.
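The corresponding commands (assumes the jar sits in a CDH3-style ${HADOOP_HOME}/lib):

mkdir -p ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib

cp ${HADOOP_HOME}/lib/hadoop-lzo-*.jar ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib/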

Then restart the Shark CLI.

[b]4. SELECT count(1) FROM src_cached fails[/b]

spark.SparkException: Job failed: ShuffleMapTask(6, 0) failed: ExceptionFailure(java.lang.NoSuchMethodError: sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V)

at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)

at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)

at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)

at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)

at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)

at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)

at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)

FAILED: Execution Error, return code -101 from shark.execution.SparkTask

[b]Reason[/b]: The JDK 1.6 in use is too old; it lacks this overload of sun.misc.Unsafe.copyMemory, so JDK 7 is required.

[b]Fix[/b]: Install JDK 7 and point JAVA_HOME at it:

tar xvfz jdk-7u25-linux-x64.tar.gz -C /usr/java/

export JAVA_HOME=/usr/java/jdk1.7.0_25

export CLASSPATH=/usr/java/jdk1.7.0_25/lib
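Optionally put the new JDK first on PATH as well, and verify which java the shell now resolves (a sketch; adjust the path to your install):

export PATH=${JAVA_HOME}/bin:${PATH}

java -version   # should report java version "1.7.0_25"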

Then restart the Shark CLI.