Error when running the kmeans example on Hadoop

[[email protected] hadoop-0.20.2]$ hadoop jar /home/hadoop/mahout-0.3/mahout-examples-0.3.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -i testdata -o output

10/09/20 14:46:07 INFO kmeans.Job: Preparing Input

10/09/20 14:46:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

10/09/20 14:46:08 INFO mapred.FileInputFormat: Total input paths to process : 1

10/09/20 14:46:10 INFO mapred.JobClient: Running job: job_201009201429_0001

10/09/20 14:46:11 INFO mapred.JobClient: map 0% reduce 0%

10/09/20 14:46:24 INFO mapred.JobClient: map 50% reduce 0%

10/09/20 14:46:27 INFO mapred.JobClient: map 100% reduce 0%

10/09/20 14:46:29 INFO mapred.JobClient: Job complete: job_201009201429_0001

10/09/20 14:46:29 INFO mapred.JobClient: Counters: 9

10/09/20 14:46:29 INFO mapred.JobClient: Job Counters

10/09/20 14:46:29 INFO mapred.JobClient: Rack-local map tasks=1

10/09/20 14:46:29 INFO mapred.JobClient: Launched map tasks=2

10/09/20 14:46:29 INFO mapred.JobClient: Data-local map tasks=1

10/09/20 14:46:29 INFO mapred.JobClient: FileSystemCounters

10/09/20 14:46:29 INFO mapred.JobClient: HDFS_BYTES_READ=291645

10/09/20 14:46:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=483037

10/09/20 14:46:29 INFO mapred.JobClient: Map-Reduce Framework

10/09/20 14:46:29 INFO mapred.JobClient: Map input records=601

10/09/20 14:46:29 INFO mapred.JobClient: Spilled Records=0

10/09/20 14:46:29 INFO mapred.JobClient: Map input bytes=288375

10/09/20 14:46:29 INFO mapred.JobClient: Map output records=601
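
The first job above only converts the 601 plain-text records under testdata into Mahout vectors under output/data; the canopy job that follows reads those vectors (hence the "Total input paths to process : 2" below). A minimal sketch for double-checking what that intermediate directory actually holds, in case one of the part files was written from bad records - the class name and the assumption that a default Configuration resolves to the same HDFS as the job are mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the files the canopy job will read as its input. A stray extra
// file, or a part file built from an empty input record, is what can
// later surface as the cardinality-0 point in the failure below.
public class ListClusterInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // assumes fs.default.name points at the job's HDFS
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("output/data"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}

The same listing is available from the shell with hadoop fs -ls output/data.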

10/09/20 14:46:29 INFO kmeans.Job: Running Canopy to get initial clusters

10/09/20 14:46:29 INFO canopy.CanopyDriver: Input: output/data Out: output/canopies Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure t1: 80.0 t2: 55.0

10/09/20 14:46:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

10/09/20 14:46:31 INFO mapred.FileInputFormat: Total input paths to process : 2

10/09/20 14:46:32 INFO mapred.JobClient: Running job: job_201009201429_0002

10/09/20 14:46:33 INFO mapred.JobClient: map 0% reduce 0%

10/09/20 14:46:42 INFO mapred.JobClient: map 50% reduce 0%

10/09/20 14:46:48 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_0, Status : FAILED

org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60

at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)

at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)

at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)

at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)

at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/09/20 14:46:51 INFO mapred.JobClient: map 50% reduce 16%

10/09/20 14:46:56 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_1, Status : FAILED

org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60

at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)

at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)

at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)

at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)

at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/09/20 14:47:02 INFO mapred.JobClient: Task Id : attempt_201009201429_0002_m_000001_2, Status : FAILED

org.apache.mahout.math.CardinalityException: My cardinality is: 0, but the other is: 60

at org.apache.mahout.math.RandomAccessSparseVector.dot(RandomAccessSparseVector.java:275)

at org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure.distance(SquaredEuclideanDistanceMeasure.java:57)

at org.apache.mahout.common.distance.EuclideanDistanceMeasure.distance(EuclideanDistanceMeasure.java:39)

at org.apache.mahout.clustering.canopy.CanopyClusterer.addPointToCanopies(CanopyClusterer.java:108)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:49)

at org.apache.mahout.clustering.canopy.CanopyMapper.map(CanopyMapper.java:34)

at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)

at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/09/20 14:47:11 INFO mapred.JobClient: Job complete: job_201009201429_0002

10/09/20 14:47:11 INFO mapred.JobClient: Counters: 14

10/09/20 14:47:11 INFO mapred.JobClient: Job Counters

10/09/20 14:47:11 INFO mapred.JobClient: Launched reduce tasks=1

10/09/20 14:47:11 INFO mapred.JobClient: Rack-local map tasks=3

10/09/20 14:47:11 INFO mapred.JobClient: Launched map tasks=5

10/09/20 14:47:11 INFO mapred.JobClient: Data-local map tasks=2

10/09/20 14:47:11 INFO mapred.JobClient: Failed map tasks=1

10/09/20 14:47:11 INFO mapred.JobClient: FileSystemCounters

10/09/20 14:47:11 INFO mapred.JobClient: HDFS_BYTES_READ=242288

10/09/20 14:47:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=12038

10/09/20 14:47:11 INFO mapred.JobClient: Map-Reduce Framework

10/09/20 14:47:11 INFO mapred.JobClient: Combine output records=0

10/09/20 14:47:11 INFO mapred.JobClient: Map input records=301

10/09/20 14:47:11 INFO mapred.JobClient: Spilled Records=15

10/09/20 14:47:11 INFO mapred.JobClient: Map output bytes=11940

10/09/20 14:47:11 INFO mapred.JobClient: Map input bytes=242198

10/09/20 14:47:11 INFO mapred.JobClient: Combine input records=0

10/09/20 14:47:11 INFO mapred.JobClient: Map output records=15

Exception in thread "main" java.io.IOException: Job failed!

at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)

at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:163)

at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:152)

at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:101)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
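
All three failed attempts die at the same point: the canopy mapper asks EuclideanDistanceMeasure for the distance between a 60-dimensional record and a point whose cardinality is 0, and the dot product inside the distance computation rejects the size mismatch. A minimal sketch that reproduces the exception in isolation, using the same Mahout 0.3 classes named in the stack trace (the vectors themselves are made up for illustration):

import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// A point of cardinality 0 (what an empty input record turns into)
// compared against a normal 60-value synthetic-control record: the
// size mismatch makes the distance computation throw the same
// CardinalityException as in the mapper log above.
public class CardinalityMismatchDemo {
    public static void main(String[] args) {
        Vector emptyPoint = new RandomAccessSparseVector(0); // empty record -> cardinality 0
        Vector normalPoint = new DenseVector(60);            // a normal 60-dimensional record
        EuclideanDistanceMeasure measure = new EuclideanDistanceMeasure();
        measure.distance(normalPoint, emptyPoint);           // throws CardinalityException
    }
}

In this example such a zero-cardinality point usually comes from the input itself - for instance a blank or truncated line in synthetic_control.data, or a stray non-data file sitting in the testdata directory - so the input is worth checking before rerunning the job.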

Any pointers would be much appreciated!