[Repost] Reading LZO Files in MapReduce

2023-11-22


1. Reading LZO files

Add the following line to the job setup, and put the LZO-related JARs on the classpath:
    job.setInputFormatClass(LzoTextInputFormat.class);

2. Writing LZO files

LZO files are not splittable by default. An index file has to be added before multiple map tasks can process a single LZO file in parallel.
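To illustrate why the index makes parallel processing possible, here is a minimal sketch in plain Java. It assumes the index is simply a sorted list of byte offsets at which compressed blocks begin, and it cuts the file only at those offsets so that every split starts on a block a reader can decompress independently. The class name `LzoSplitSketch`, the method `alignedSplits`, and the sample offsets are hypothetical; this is a simplified model, not the hadoop-lzo implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of index-based splitting; NOT hadoop-lzo's actual code. */
final class LzoSplitSketch {

    /**
     * Cuts a compressed file into splits of at least targetSize bytes, cutting
     * only where a compressed block begins. Each split is a {start, end} pair
     * describing the byte range [start, end).
     *
     * @param blockOffsets sorted block start offsets (assumed index content)
     * @param fileLength   total length of the compressed file in bytes
     * @param targetSize   desired minimum split size in bytes
     */
    static List<long[]> alignedSplits(long[] blockOffsets, long fileLength, long targetSize) {
        List<long[]> splits = new ArrayList<>();
        long start = 0;
        for (long boundary : blockOffsets) {
            // Close the current split at the first block boundary that makes it large enough.
            if (boundary - start >= targetSize) {
                splits.add(new long[] {start, boundary});
                start = boundary;
            }
        }
        // The remainder (or the whole file when there are no offsets) is the last split.
        splits.add(new long[] {start, fileLength});
        return splits;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 100, 250, 400, 520}; // hypothetical block starts
        for (long[] s : alignedSplits(offsets, 600, 200)) {
            System.out.println(s[0] + " - " + s[1]);
        }
        // Without any offsets the file cannot be cut, so one split covers everything.
        for (long[] s : alignedSplits(new long[0], 600, 200)) {
            System.out.println(s[0] + " - " + s[1]);
        }
    }
}
```

With an empty offset list the method returns a single split for the whole file, which mirrors the unindexed case described above: one map task has to read everything.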

If you want the reducer output to be written in LZO format, add the following statements:
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);
    int result = job.waitForCompletion(true) ? 0 : 1;
    // Once the job has finished, the final output files exist; build the LZO index on top of them.
    LzoIndexer lzoIndexer = new LzoIndexer(conf);
    lzoIndexer.index(new Path(args[1]));

If LZO files already exist but have not been indexed, you can build the index for the files under the input path as follows:

hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.LzoIndexer hdfs://inputpath

or

hadoop jar $HADOOP_HOME/lib/hadoop-lzo-0.4.17.jar com.hadoop.compression.lzo.DistributedLzoIndexer hdfs://inputpath
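Before running either indexer, it can be useful to know which files still lack an index. The sketch below works on a plain list of file names and assumes that the index for `part-r-00000.lzo` is stored beside it as `part-r-00000.lzo.index`; that naming convention should be checked against the hadoop-lzo version in use. The class and method names are hypothetical, and the code only inspects strings rather than calling any Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Finds .lzo files with no sibling index file in a directory listing (names only). */
final class MissingLzoIndexSketch {

    // Assumed naming convention: "<name>.lzo" is indexed by "<name>.lzo.index".
    private static final String LZO_SUFFIX = ".lzo";
    private static final String INDEX_SUFFIX = ".index";

    static List<String> unindexed(List<String> fileNames) {
        Set<String> present = new HashSet<>(fileNames);
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            // Keep only data files whose companion index is absent from the listing.
            if (name.endsWith(LZO_SUFFIX) && !present.contains(name + INDEX_SUFFIX)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                "part-r-00000.lzo", "part-r-00000.lzo.index", "part-r-00001.lzo", "_SUCCESS");
        System.out.println(unindexed(listing));
    }
}
```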


[Reposted from] http://blog.csdn.net/wisgood/article/details/17080361

Original link: https://hbdhgg.com/1/184642.html
