A Sqoop Java API Import Example

Environment:

Linux+JDK1.7

Sqoop 1.4.6-cdh5.5.2

hadoop-core 2.6.0-mr1-cdh5.5.2

hadoop-common 2.6.0-cdh5.5.2

hadoop-mapreduce-client-core 2.6.0-cdh5.5.2

Requirement:

Import a table from Oracle into HDFS.

Implementation:

First, assemble the Sqoop command arguments:

String[] args = new String[] {
        // Oracle database connection info
        "--connect", "jdbc:oracle:thin:@***:1522/**",
        "-username", "***",
        "-password", "***",
        // query SQL
        "--query", "select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date<date'2017-06-01' ",
        "-split-by", "id",
        "--hive-overwrite",
        "--fields-terminated-by", "'\\001'",
        "--hive-drop-import-delims",
        "--null-string", "'\\\\N'",
        "--null-non-string", "'\\\\N'",
        "--verbose",
        "--target-dir", "/user/hive/warehouse/test.db/H_TABLE_NAME"
};

Then run the Sqoop task:

String[] expandArguments = OptionsFileUtil.expandArguments(args);
SqoopTool tool = SqoopTool.getTool("import");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nameservice1"); // set the HDFS service address
Configuration loadPlugins = SqoopTool.loadPlugins(conf);
Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);
int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res == 0) {
    log.info("import succeeded");
}
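For reference, the snippets above are excerpts and omit their imports. A minimal set they rely on would look like the following (assuming the Sqoop 1.4.6 / CDH 5.5 package layout; SqoopImportDemo and the commons-logging log field are hypothetical names for the enclosing class):

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;
import org.apache.sqoop.tool.SqoopTool;
import org.apache.sqoop.util.OptionsFileUtil;

// Hypothetical logger for the enclosing class; any logging facade works:
private static final Log log = LogFactory.getLog(SqoopImportDemo.class);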

After coding was complete and the build was deployed to the test environment, Sqoop's dynamic compilation step failed with a compile error:

2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac
    at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

Unless specially configured, the output of this dynamic compilation is not captured by log4j, so the compile errors have to be looked up in the system log:

/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
                           ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable
public class QueryResult extends SqoopRecord implements DBWritable, Writable {

As the errors above suggest, the classpath used for dynamic compilation was presumably missing the hadoop-common jar. The following turned up in CompilationManager:

private String findHadoopJars() {
  String hadoopMapRedHome = options.getHadoopMapRedHome();
  if (null == hadoopMapRedHome) {
    LOG.info("$HADOOP_MAPRED_HOME is not set");
    return Jars.getJarPathForClass(JobConf.class);
  }
  if (!hadoopMapRedHome.endsWith(File.separator)) {
    hadoopMapRedHome = hadoopMapRedHome + File.separator;
  }
  File hadoopMapRedHomeFile = new File(hadoopMapRedHome);
  LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());
  Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,
      new String[] { "jar" }, true);
  StringBuilder sb = new StringBuilder();
  while (filesIterator.hasNext()) {
    File file = filesIterator.next();
    String name = file.getName();
    if (name.startsWith("hadoop-common")
        || name.startsWith("hadoop-mapreduce-client-core")
        || name.startsWith("hadoop-core")) {
      sb.append(file.getAbsolutePath());
      sb.append(File.pathSeparator);
    }
  }
  if (sb.length() < 1) {
    LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");
    return Jars.getJarPathForClass(JobConf.class);
  }
  String s = sb.substring(0, sb.length() - 1);
  LOG.debug("Returning jar file path " + s);
  return s;
}

The suspicion was that, with no hadoopMapRedHome configured, this method can only fall back to the jar containing JobConf.class, i.e. the hadoop-core jar. Enabling DEBUG logging to verify this produced the following:

2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -sourcepath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -d
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   -classpath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]   :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar
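The first INFO line shows the fallback being taken, which explains the classpath above: JobConf.class lives in the hadoop-core jar, so Jars.getJarPathForClass(JobConf.class) yields that one jar and nothing else. A minimal sketch of that class-to-jar resolution idea (an illustration only, not Sqoop's actual Jars implementation; JarLocator is a hypothetical name):

import java.net.URL;
import org.apache.hadoop.mapred.JobConf;

public class JarLocator {
    public static void main(String[] args) {
        // Ask the ClassLoader where the .class resource physically lives.
        String resource = JobConf.class.getName().replace('.', '/') + ".class";
        URL url = JobConf.class.getClassLoader().getResource(resource);
        // For a jar-loaded class this prints something like:
        // jar:file:/...WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar!/org/apache/hadoop/mapred/JobConf.class
        System.out.println(url);
    }
}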

Sure enough, the jars were missing. CompilationManager assembles the javac classpath as follows:

String curClasspath = System.getProperty("java.class.path");
LOG.debug("Current sqoop classpath = " + curClasspath);
args.add("-sourcepath");
args.add(jarOutDir);
args.add("-d");
args.add(jarOutDir);
args.add("-classpath");
args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);

The missing jars can be added in either of two ways:

1. Modify java.class.path directly:

String curClasspath = System.getProperty("java.class.path");
curClasspath = curClasspath
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"
        + File.pathSeparator
        + "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";
System.setProperty("java.class.path", curClasspath);
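A note on why this works: setting the java.class.path system property does not change what the running JVM's class loaders can see; it only changes the value returned to code that reads the property afterwards. That is exactly what CompilationManager does when assembling the javac arguments (the System.getProperty("java.class.path") call above), so the property just needs to be set before Sqoop.runSqoop is invoked.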

2. Add a configuration option (not tried here):

--hadoop-mapred-home <dir>    specify the $HADOOP_MAPRED_HOME path
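This option was left untried, but going by the findHadoopJars code above it would presumably be passed through the same argument array, pointed at a directory that contains the needed hadoop jars. A sketch under that assumption (the WEB-INF/lib path is specific to this deployment):

String[] args = new String[] {
        "--connect", "jdbc:oracle:thin:@***:1522/**",
        // ... remaining options exactly as in the array above ...
        "--hadoop-mapred-home", "/var/www/webapps/***/WEB-INF/lib"
};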

With the first approach in place, the import ran successfully:

2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   File System Counters
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes read=15015144
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of bytes written=15688984
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     FILE: Number of write operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes read=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of bytes written=1536330810
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of read operations=40
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     HDFS: Number of write operations=36
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]   Map-Reduce Framework
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map input records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Map output records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Input split bytes=455
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Spilled Records=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     CPU time spent (ms)=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Physical memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Virtual memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]     Total committed heap usage (bytes)=4080271360
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.

That completes the Sqoop Java API import demo.

