Spark program to read data from RDBMS

I wanted to figure out how to connect to an RDBMS from Spark and extract data, so I followed these steps. You can download this project from GitHub. First I created an Address table in my local MySQL database like this:

CREATE TABLE `address` (
  `addressid` int(11) NOT NULL AUTO_INCREMENT,
  `contactid` int(11) DEFAULT NULL,
  `line1` varchar(300) NOT NULL,
  `city` varchar(50) NOT NULL,
  `state` varchar(50) NOT NULL,
  `zip` varchar(50) NOT NULL,
  `lastmodified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`addressid`),
  KEY `contactid` (`contactid`),
  CONSTRAINT `address_ibfk_1` FOREIGN KEY (`contactid`) REFERENCES `CONTACT` (`contactid`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
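
Then I added 5 sample records to the address table, with inserts along these lines (the values below are illustrative, not the actual rows; the contactid values assume matching rows already exist in the CONTACT table):

-- illustrative sample data; addressid and lastmodified are filled in automatically
INSERT INTO address (contactid, line1, city, state, zip) VALUES
  (1, '111 Main St', 'San Jose', 'CA', '95111'),
  (2, '222 Oak Ave', 'Santa Clara', 'CA', '95050'),
  (3, '333 Pine Rd', 'Sunnyvale', 'CA', '94086'),
  (4, '444 Elm St', 'Mountain View', 'CA', '94040'),
  (5, '555 Maple Dr', 'Palo Alto', 'CA', '94301');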

When I query the address table on my local machine, this is what I get. After that I created a Spark Scala project that has mysql-connector-java as one of its dependencies.
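
A minimal build.sbt for such a project might look like this sketch (the version numbers here are assumptions; match them to your own Spark and MySQL setup):

name := "spark-jdbc-example"

version := "1.0"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  "mysql" % "mysql-connector-java" % "5.1.49"
)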

The last step was to create a simple Spark program. My program has 4 main sections (a sketch combining them follows the list):

  1. First is the Address case class, with the same schema as the address table minus the lastmodified field
  2. Next is the call that creates a JdbcRDD object, which says to query the address table for rows 1 through 5: new JdbcRDD(sparkContext, getConnection, "select * from address limit ?,?", 0, 5, 1, convertToAddress)
  3. Then I defined the getConnection() method, which creates a JDBC connection to my database and returns it
  4. Last is the convertToAddress() method, which knows how to take a ResultSet and convert it into an Address object
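
Putting the 4 sections together, the whole program might look something like this sketch (the object name, master setting, JDBC URL, and credentials are placeholders for my local setup):

import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.{SparkConf, SparkContext}

// Section 1: case class with the same schema as the address table, minus lastmodified
case class Address(addressId: Int, contactId: Int, line1: String,
                   city: String, state: String, zip: String)

object ReadAddressFromMysql {

  // Section 3: create a JDBC connection to the database and return it
  // (the URL, user, and password are placeholders; adjust for your setup)
  def getConnection(): Connection = {
    Class.forName("com.mysql.jdbc.Driver")
    DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "user", "password")
  }

  // Section 4: take one row of the ResultSet and convert it into an Address object
  def convertToAddress(rs: ResultSet): Address =
    Address(rs.getInt("addressid"), rs.getInt("contactid"), rs.getString("line1"),
            rs.getString("city"), rs.getString("state"), rs.getString("zip"))

  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(
      new SparkConf().setAppName("ReadAddressFromMysql").setMaster("local[*]"))

    // Section 2: the two ?s are bound to the lower and upper bounds (0 and 5),
    // so with 1 partition the query that runs is "select * from address limit 0,5"
    val addressRdd = new JdbcRDD(sparkContext, getConnection,
      "select * from address limit ?,?", 0, 5, 1, convertToAddress)

    addressRdd.collect().foreach(println)
    sparkContext.stop()
  }
}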

When I run this program in the IDE, this is the output I get.