Recently I had a requirement in which I wanted to figure out how to read XML documents stored as messages in IBM WebSphere MQ and post them into Hadoop. I decided to use Apache Flume with the Flume JMS Source and the Flume HDFS Sink for this. I had to use the following steps for this setup. Please note that I am not a WebSphere MQ expert, so there might be a better/easier way to achieve this.
The first step was to generate a .bindings file using the JMSAdmin tool that ships with the WebSphere MQ client. I pointed JMSAdmin.config at a file-system JNDI context with these properties:
INITIAL_CONTEXT_FACTORY=com.sun.jndi.fscontext.RefFSContextFactory
PROVIDER_URL=file:/C:/temp/jmsbinding
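The DEF CF command in the next step is executed inside the JMSAdmin tool. As a rough sketch of launching it (the exact path depends on your MQ client installation; this is the default Windows location, and -cfg points at the config file above):
cd "C:\Program Files (x86)\IBM\WebSphere MQ\java\bin"
JMSAdmin.bat -cfg JMSAdmin.config
At the InitCtx> prompt, define the connection factory for your queue manager: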
DEF CF(myConnectionFactory) QMGR(myQueueManager) HOSTNAME(myHostName) PORT(1426) CHANNEL(myChannelName) TRANSPORT(CLIENT)
Once you execute this command, it will generate a .bindings file in C:/temp/jmsbinding (the folder configured as the value of PROVIDER_URL).
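This next check is not part of the original steps, but before wiring up Flume it can be useful to confirm that the .bindings file is actually usable. A minimal JNDI lookup sketch, assuming the MQ client jars plus fscontext.jar and providerutil.jar are on the classpath (the class name is just for illustration):

import java.util.Hashtable;
import javax.jms.ConnectionFactory;
import javax.naming.Context;
import javax.naming.InitialContext;

public class BindingsSanityCheck {
    public static void main(String[] args) throws Exception {
        // Same JNDI settings as in JMSAdmin.config
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
        env.put(Context.PROVIDER_URL, "file:/C:/temp/jmsbinding");

        // Look up the connection factory defined via DEF CF(myConnectionFactory)
        Context ctx = new InitialContext(env);
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("myConnectionFactory");
        System.out.println("Found connection factory: " + cf);
        ctx.close();
    }
}

If this lookup fails, fix the bindings before touching the Flume side.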
Next, I copied the generated .bindings file to the /etc/flume/conf folder on my Linux box which has Flume running on it. I also copied the MQ client jars from C:\Program Files (x86)\IBM\WebSphere MQ\java\lib to the /usr/hdp/current/flume-server/lib/ folder in my Hadoop installation, but I kept getting ClassNotFoundException, and to deal with that I copied more and more jars from my MQ client into Flume. I ended up with these jars:
jms.jar
fscontext.jar
jndi.jar
providerutil.jar
com.ibm.mq.jar
com.ibm.mqjms.jar
com.ibm.mq.pcf.jar
connector.jar
dhbcore.jar
com.ibm.mq.jmqi.jar
com.ibm.mq.headers.jar
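If Flume still reports ClassNotFoundException after this, it is worth double-checking that the jars really landed on Flume's classpath; something along these lines (the grep pattern is just illustrative):
ls /usr/hdp/current/flume-server/lib/ | grep -iE 'mq|jms|fscontext'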
Next, I created the Flume agent configuration file, mqflume.conf:
# Flume agent config
# List the sources, channels, and sinks for the agent
ggflume.sources = jms
ggflume.channels = memory
ggflume.sinks = hadoop
ggflume.sources.jms.channels=memory
ggflume.sinks.hadoop.channel=memory
ggflume.sources.jms.type = jms
ggflume.sources.jms.providerURL = file:///etc/flume/conf
ggflume.sources.jms.initialContextFactory = com.sun.jndi.fscontext.RefFSContextFactory
ggflume.sources.jms.destinationType=QUEUE
ggflume.sources.jms.destinationName=<channelName>
ggflume.sources.jms.connectionFactory=myConnectionFactory
ggflume.sources.jms.batchSize=1
ggflume.channels.memory.type = memory
ggflume.channels.memory.capacity = 1000
ggflume.channels.memory.transactionCapacity = 100
ggflume.sinks.hadoop.type=hdfs
ggflume.sinks.hadoop.hdfs.path=/data/mq/xml
ggflume.sinks.hadoop.hdfs.filePrefix=sample
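One optional tweak that is not part of my original config: by default the Flume HDFS sink writes SequenceFiles (hdfs.fileType defaults to SequenceFile). If you would rather have the XML payloads stored as plain text files, you can add:
ggflume.sinks.hadoop.hdfs.fileType = DataStream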
Finally, I started the Flume agent like this:
flume-ng agent --conf conf --conf-file mqflume.conf --name ggflume -Dflume.root.logger=DEBUG,console
Now you should see the existing messages from MQ being dumped into HDFS under /data/mq/xml.
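To confirm the data is landing where expected, you can list the sink directory in HDFS (exact file names will vary; the path and prefix come from the sink configuration above):
hdfs dfs -ls /data/mq/xml
hdfs dfs -cat /data/mq/xml/sample*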