環境信息:
Linux+JDK1.7
java開發api接口的例子?Sqoop 1.4.6-cdh5.5.2
hadoop-core 2.6.0-mr1-cdh5.5.2
hadoop-common 2.6.0-cdh5.5.2
Java API、hadoop-mapreduce-client-core 2.6.0-cdh5.5.2
需求:
將oracle中的某表導入到hdfs
實現:
首先組織Sqoop命令:
String[] args = new String[] { // Oracle數據庫信息"--connect","jdbc:oracle:thin:@***:1522/**","-username","***","-password","***",
// 查詢sql"--query","select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date<date'2017-06-01' ","-split-by","id","--hive-overwrite","--fields-terminated-by","'\\001'","--hive-drop-import-delims","--null-string","'\\\\N'","--null-non-string","'\\\\N'","--verbose","--target-dir","/user/hive/warehouse/test.db/H_TABLE_NAME"};
執行Sqoop任務:
String[] expandArguments = OptionsFileUtil.expandArguments(args);SqoopTool tool = SqoopTool.getTool("import");Configuration conf = new Configuration();conf.set("fs.default.name", "hdfs://nameservice1");//設置HDFS服務地址Configuration loadPlugins = SqoopTool.loadPlugins(conf);Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);int res = Sqoop.runSqoop(sqoop, expandArguments);if (res == 0)log.info ("成功");
完成編碼后,發到測試環境進行測試,發現Sqoop在進行動態編譯時報編譯錯誤:
2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac
?? ?at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)
?? ?at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
?? ?at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
?? ?at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
?? ?at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
?? ?at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
?? ?at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
動態編譯的日志如果沒有特殊配置的話,是無法通過log4j進行輸出的,因此,編譯錯誤需要到系統日志里查找:
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
?????????????????????????? ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
?????????????????????????? ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
?????????????????????????? ^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable
public class QueryResult extends SqoopRecord? implements DBWritable, Writable {
如上,推測是動態編譯環境的classpath沒有包含hadoop-common包導致的,在CompilationManager里查到了如下內容:
private String findHadoopJars() {String hadoopMapRedHome = options.getHadoopMapRedHome();if (null == hadoopMapRedHome) {LOG.info("$HADOOP_MAPRED_HOME is not set");return Jars.getJarPathForClass(JobConf.class);}if (!hadoopMapRedHome.endsWith(File.separator)) {hadoopMapRedHome = hadoopMapRedHome + File.separator;}File hadoopMapRedHomeFile = new File(hadoopMapRedHome);LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,new String[] { "jar" }, true);StringBuilder sb = new StringBuilder();while (filesIterator.hasNext()) {File file = filesIterator.next();String name = file.getName();if (name.startsWith("hadoop-common")|| name.startsWith("hadoop-mapreduce-client-core")|| name.startsWith("hadoop-core")) {sb.append(file.getAbsolutePath());sb.append(File.pathSeparator);}}if (sb.length() < 1) {LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");return Jars.getJarPathForClass(JobConf.class);}String s = sb.substring(0, sb.length() - 1);LOG.debug("Returning jar file path " + s);return s;}
推測是由于配置里沒有hadoopMapRedHome這個參數,導致這個方法只能取到JobConf.class所在的jar包,即hadoop-core包。打開DEBUG進行驗證,找到如下日志:
2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? -sourcepath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? -d
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? -classpath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)]?? :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar
果然是缺少了jar包。在CompilationManager中查到classpath的組裝方式如下:
String curClasspath = System.getProperty("java.class.path");LOG.debug("Current sqoop classpath = " + curClasspath);args.add("-sourcepath");args.add(jarOutDir);args.add("-d");args.add(jarOutDir);args.add("-classpath");args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);
可以通過兩種方式將缺失的jar添加進去:
1.直接修改java.class.path:
String curClasspath = System.getProperty ("java.class.path");curClasspath = curClasspath+ File.pathSeparator+ "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"+ File.pathSeparator+ "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";System.setProperty ("java.class.path", curClasspath);
2.增加配置項(未嘗試):
--hadoop-mapred-home <dir>
? 指定$HADOOP_MAPRED_HOME路徑
使用第一種方式后,已經能夠正常進行導入操作:
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]?? File System Counters
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? FILE: Number of bytes read=15015144
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? FILE: Number of bytes written=15688984
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? FILE: Number of read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? FILE: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? FILE: Number of write operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? HDFS: Number of bytes read=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? HDFS: Number of bytes written=1536330810
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? HDFS: Number of read operations=40
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? HDFS: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? HDFS: Number of write operations=36
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)]?? Map-Reduce Framework
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Map input records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Map output records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Input split bytes=455
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Spilled Records=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? CPU time spent (ms)=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Physical memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Virtual memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)]???? Total committed heap usage (bytes)=4080271360
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.
?
至此,Sqoop Java API 導入demo完成。
參考文章:
http://shiyanjun.cn/archives/624.html Sqoop-1.4.4工具import和export使用詳解
http://blog.csdn.net/sl1992/article/details/53521819? Java操作Sqoop對象