Importing data from Sqoop into Hive External Table with Avro encoding updated

In the Importing data from Sqoop into Hive External Table with Avro encoding post, I described how to import a table from an RDBMS into Hive in Avro format using Sqoop. In that post I went through a few steps to generate the .avsc schema file, but I have since realized there is an easier way to do it by following these steps:

  1. First, execute the sqoop import command like this, making sure you pass --outdir schema as a parameter. That option makes Sqoop generate CUSTOMER.avsc and CUSTOMER.java in the schema directory on your local machine (a sample of what the generated schema looks like is shown after the command)

sqoop import --connect jdbc:mysql://localhost/test --username root --password cloudera --table CUSTOMER --as-avrodatafile --outdir schema
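For reference, CUSTOMER.avsc is just a JSON Avro schema describing the table's columns. Here is a minimal sketch of what it might look like, assuming a hypothetical CUSTOMER table with only an ID and a NAME column; the real file will list whatever columns and types Sqoop finds in your table:

    {
      "type": "record",
      "name": "CUSTOMER",
      "fields": [
        {"name": "ID",   "type": ["null", "int"],    "default": null},
        {"name": "NAME", "type": ["null", "string"], "default": null}
      ]
    }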
  • You can verify that the CUSTOMER.avsc file was created as expected by executing ls -ltrA schema
  • Next, create a schema directory in HDFS by executing the hdfs dfs -mkdir command like this
    hdfs dfs -mkdir /user/cloudera/schema
  • Copy CUSTOMER.avsc from your local schema directory into the schema directory on HDFS by executing the following command
    hdfs dfs -copyFromLocal schema/CUSTOMER.avsc /user/cloudera/schema/.
  • The last step is to create the Hive external table with CUSTOMER.avsc as its schema using the following command
    CREATE EXTERNAL TABLE CUSTOMER
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/cloudera/CUSTOMER'
    TBLPROPERTIES ('avro.schema.url'='/user/cloudera/schema/CUSTOMER.avsc');
  • Now if you go to Hive and execute the "SELECT * FROM CUSTOMER;" query, you should see the 1 record that was imported
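If anything does not line up, a quick way to check each piece is to list the data files Sqoop wrote and the schema file the table points at. A minimal sketch, assuming Sqoop imported into the /user/cloudera/CUSTOMER directory used as LOCATION above (adjust the paths if your setup differs):

    # list the Avro data files Sqoop wrote for the CUSTOMER table
    hdfs dfs -ls /user/cloudera/CUSTOMER

    # confirm the schema file is where avro.schema.url points
    hdfs dfs -cat /user/cloudera/schema/CUSTOMER.avsc

    # run the same query non-interactively from the shell
    hive -e 'SELECT * FROM CUSTOMER;'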