How to run Sqoop command from oozie

In the Importing data from Sqoop into Hive External Table with Avro encoding updated i blogged about how you can use sqoop to import data from RDBMS into Hadoop. I wanted to test if i can use Oozie for invoking Sqoop command and i followed these steps for doing that.

  1. First i tried executing this command from my command line on Hadoop cluster to make sure that i can actually run sqoop without any problem

sqoop import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile
  • Once the sqoop command was successfully executed i went back and deleted the CUSTOMER directory from HDFS to make sure that i could re-import data using following command
    hdfs dfs -rm -R CUSTOMER
  • Next i went to Hue to create oozie workflow with single sqoop command that i had executed before But if your not using the Hue console you can create workflow.xml manually like this Also make sure to create file like this Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run oozie job from command line
  • Next when i ran the Oozie workflow, the job failed with following error, which indicates that Oozie does not have the MySQL JDBC driver.
    java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver at org.apache.sqoop.manager.SqlManager.makeConnection( at org.apache.sqoop.manager.GenericJdbcManager.getConnection( at org.apache.sqoop.manager.SqlManager.execute( at org.apache.sqoop.manager.SqlManager.execute( at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery( at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery( at org.apache.sqoop.manager.SqlManager.getColumnTypes( at org.apache.sqoop.manager.ConnManager.getColumnTypes( at org.apache.sqoop.orm.ClassWriter.getColumnTypes( at org.apache.sqoop.orm.ClassWriter.generate( at org.apache.sqoop.tool.CodeGenTool.generateORM( at org.apache.sqoop.tool.ImportTool.importTable( at at at at org.apache.sqoop.Sqoop.runSqoop( at org.apache.sqoop.Sqoop.runTool( at org.apache.sqoop.Sqoop.runTool( at org.apache.sqoop.Sqoop.main( at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob( at at
  • So first thing i did was to check if mysql driver is there in the oozie shared lib by executing following commands
    export OOZIE_URL=http://localhost:11000/oozieoozie admin -shareliblist sqoop

    I noticed that the mysql-connector-java.jar was not there in the list of shared libs for Oozie + sqoop

  • Next step was to find the mysql-connector-java.jar in my sandbox that i could do by finding it like this
    sudo find / -name mysql*

    I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar

  • I wanted to update the Oozie shared lib to include the mysql driver jar. So i executed following command to figure out the directory where the oozie sqoop shared lib is
    oozie admin -sharelibupdate

    From this output i got HDFS directory location for Oozie shared lib which is /user/oozie/share/lib/lib_20160406022812

  • Then i used following two commands to first copy the db driver into the oozie shared lib and making sure it is accessible to other users hdfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/. hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
  • Now the last step was to let Oozie know that it should reload the sharedlib and i did that by executing following two commands
    oozie admin -sharedlibupdateoozie admin -shareliblist sqoop | grep mysql*

    The second command queries oozie to get current list of shared jars and i could see mysql-connector-java.jar listed in it like this

  • When i re-executed the ooize job again this time it ran successfully.