How to run Sqoop command from oozie

In the Importing data from Sqoop into Hive External Table with Avro encoding updated i blogged about how you can use sqoop to import data from RDBMS into Hadoop. I wanted to test if i can use Oozie for invoking Sqoop command and i followed these steps for doing that.

  1. First i tried executing this command from my command line on Hadoop cluster to make sure that i can actually run sqoop without any problem

sqoop import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile
  • Once the sqoop command was successfully executed i went back and deleted the CUSTOMER directory from HDFS to make sure that i could re-import data using following command
    hdfs dfs -rm -R CUSTOMER
  • Next i went to Hue to create oozie workflow with single sqoop command that i had executed before But if your not using the Hue console you can create workflow.xml manually like this Also make sure to create job.properties file like this Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run oozie job from command line
  • Next when i ran the Oozie workflow, the job failed with following error, which indicates that Oozie does not have the MySQL JDBC driver.
    java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875) at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786) at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289) at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260) at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246) at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327) at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1846) at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1646) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197) at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49)
  • So first thing i did was to check if mysql driver is there in the oozie shared lib by executing following commands
    export OOZIE_URL=http://localhost:11000/oozieoozie admin -shareliblist sqoop

    I noticed that the mysql-connector-java.jar was not there in the list of shared libs for Oozie + sqoop

  • Next step was to find the mysql-connector-java.jar in my sandbox that i could do by finding it like this
    sudo find / -name mysql*

    I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar

  • I wanted to update the Oozie shared lib to include the mysql driver jar. So i executed following command to figure out the directory where the oozie sqoop shared lib is
    oozie admin -sharelibupdate

    From this output i got HDFS directory location for Oozie shared lib which is /user/oozie/share/lib/lib_20160406022812

  • Then i used following two commands to first copy the db driver into the oozie shared lib and making sure it is accessible to other users hdfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/. hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
  • Now the last step was to let Oozie know that it should reload the sharedlib and i did that by executing following two commands
    oozie admin -sharedlibupdateoozie admin -shareliblist sqoop | grep mysql*

    The second command queries oozie to get current list of shared jars and i could see mysql-connector-java.jar listed in it like this

  • When i re-executed the ooize job again this time it ran successfully.