Hadoop Distributed Cache

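The Hadoop distributed cache lets every MapReduce task read the same shared configuration file. The file is first placed in HDFS; while the job runs, Hadoop downloads it to local disk according to the job's settings. The setup looks like this: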

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://192.168.1.45:9000");

    // Delete this path recursively if it exists.
    FileSystem fs = FileSystem.get(conf);
    fs.delete(new Path("CASICJNJP/gongda/Test_gd20140104"), true);

    conf.set("mapred.job.tracker", "192.168.1.45:9001");
    conf.set("mapred.jar", "/home/hadoop/workspace/jar/OBDDataSelectWithImeiTxt.jar");

    Job job = new Job(conf, "myTaxiAnalyze");

    // Register the HDFS file as a distributed cache file and ask Hadoop to
    // symlink cached files into each task's working directory.
    DistributedCache.createSymlink(job.getConfiguration());
    try {
        DistributedCache.addCacheFile(new URI("/user/hadoop/CASICJNJP/DistributeFiles/imei.txt"), job.getConfiguration());
    } catch (URISyntaxException e1) {
        e1.printStackTrace();
    }

    job.setMapperClass(OBDDataSelectMaper.class);
    job.setReducerClass(OBDDataSelectReducer.class);
    // job.setNumReduceTasks(10);
    // job.setCombinerClass(IntSumReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path("/user/hadoop/CASICJNJP/SortedData/20140104"));
    FileOutputFormat.setOutputPath(job, new Path("CASICJNJP/gongda/SelectedData"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

In the code above, the DistributedCache.addCacheFile call registers the HDFS file /user/hadoop/CASICJNJP/DistributeFiles/imei.txt as a distributed cache file.
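The driver also calls DistributedCache.createSymlink, which asks Hadoop to link cached files into the task's working directory. The link name can be supplied as a URI fragment (the part after `#`). The following is a minimal sketch using only java.net.URI to show how such a cache URI splits into a path and a link name. `CacheLinkNameSketch` and `linkName` are hypothetical names introduced here, and the fallback to the file name when no fragment is present is an assumption based on newer Hadoop releases, not something the original code demonstrates.

```java
import java.net.URI;

/** Hypothetical helper, not part of Hadoop: illustrates cache URI fragments. */
class CacheLinkNameSketch {

    /**
     * Returns the name a cached file would be linked under:
     * the URI fragment if one is given, otherwise (assumed) the file name.
     */
    static String linkName(String cacheUri) {
        URI uri = URI.create(cacheUri);
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Under these assumptions, registering `/user/hadoop/CASICJNJP/DistributeFiles/imei.txt#ids` would let a task open the file through the relative name `ids`, while the URI used in the driver above has no fragment and would map to `imei.txt`.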

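import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class OBDDataSelectMaper extends Mapper<Object, Text, Text, Text> {

    // IDs loaded from the cached imei.txt, one integer per line.
    private final List<Integer> ImeiList = new ArrayList<Integer>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        try {
            // Local paths of the files registered with DistributedCache.addCacheFile.
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            if (cacheFiles != null && cacheFiles.length > 0) {
                BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toString()));
                try {
                    // The first line is read and discarded here, so it is never added to the list.
                    String line = br.readLine();
                    while ((line = br.readLine()) != null) {
                        ImeiList.add(Integer.parseInt(line));
                    }
                } finally {
                    br.close();
                }
            }
        } catch (IOException e) {
            System.err.println("Exception reading DistributedCache: " + e);
        }
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        try {
            // The first tab-separated field is the record key; its prefix before "_" is the ID.
            String[] strs = value.toString().split("\t");
            String[] ImeiTimes = strs[0].split("_");
            String timei = ImeiTimes[0];
            // Emit only records whose ID appears in the cached list.
            if (ImeiList.contains(Integer.parseInt(timei))) {
                context.write(new Text(strs[0]), value);
            }
        } catch (Exception ex) {
            // As in the original, records that fail to parse are skipped silently.
        }
    }
}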

In the code above, the mapper's setup method loads the distributed cache file.
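The load-then-filter logic of the mapper can be tried without a cluster. The following is a minimal, Hadoop-free sketch of that logic, not the original implementation: `ImeiFilterSketch`, `parseIds`, and `matches` are hypothetical names. It also makes two assumptions that differ from the mapper: IDs are stored as `Long` (a full 15-digit IMEI does not fit in an `int`), and a `HashSet` replaces the list so each lookup does not scan every ID.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, Hadoop-free sketch of the mapper's lookup logic. */
class ImeiFilterSketch {

    /** Parses one numeric ID per line; blank lines are ignored. */
    static Set<Long> parseIds(List<String> lines) {
        Set<Long> ids = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    /**
     * Returns true if the record's first tab-separated field starts with a
     * known ID followed by "_" (or consists of the ID alone).
     */
    static boolean matches(Set<Long> ids, String record) {
        String key = record.split("\t", 2)[0];
        String idPart = key.split("_", 2)[0];
        try {
            return ids.contains(Long.parseLong(idPart));
        } catch (NumberFormatException e) {
            return false; // malformed key: treat the record as not selected
        }
    }
}
```

In a real mapper, `parseIds` would be fed the lines read from the cached file in `setup`, and `matches` would decide inside `map` whether to call `context.write`.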
