The WebSphere Notes

The WebSphere Notes (www.webspherenotes.com) is a blog with my study notes on WebSphere Application Server administration and on the WebSphere Portal Server developer and administration certifications.

 

Sunil has been in the IT industry for 10 years, worked with IBM Software Labs where he was part of the WebSphere Portal Server development team for 4 years, and is now working for Ascendant Technology. Sunil has been working with WebSphere Portal since 2003. He is the author of the book "Java Portlets 101" and more than 25 articles, and has a popular blog about portlet development and administration (http://wpcertification.blogspot.com).

Latest blog posts

  • Enabling Oozie console on Cloudera VM 4.4.0 and executing examples

    Saturday, July 19, 2014

    I am trying to learn about Apache Oozie, so I wanted to figure out how to use it in the Cloudera 4.4.0 VM. When you go to the Oozie web console it shows a message saying that the console is disabled. In order to enable the console I had to follow these steps:

    1. In Cloudera Manager, go to the Oozie configuration screen and check the Enable Oozie Server Web Console option.

  • Where are MapReduce logs when you're using the YARN framework

    Friday, July 18, 2014

    For the last couple of months I have been using the YARN framework to run my MapReduce jobs. Using YARN is normally transparent, so the only thing I had to do differently was change my mapred-site.xml file to set the value of mapreduce.framework.name to yarn. But YARN does affect how the logs and job history get stored.
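    The post changes mapred-site.xml itself; as a rough Java-side illustration (not the snippet from the post), the same property can also be set programmatically in a job driver. The class and job names below are mine, but the property and API are standard Hadoop 2.x; treat this as a sketch only:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class YarnDriverSketch {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Same effect as the mapred-site.xml property: submit the job to
                // YARN instead of running it with the local job runner.
                conf.set("mapreduce.framework.name", "yarn");
                Job job = Job.getInstance(conf, "yarn-framework-demo");
                // ... set mapper/reducer/input/output, then job.waitForCompletion(true) ...
            }
        }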

  • Using counters in MapReduce program

    Friday, July 18, 2014

    While developing MapReduce jobs you might want to keep counters for conditions that you encounter. For example, in a MapReduce job that uses GeoIP to count the number of requests from a particular city, I want to check how many requests came from the US and India versus other countries. There are also cases where you try to look up the location of an IP address and, if the IP is not in the GeoIP database, the lookup throws an error; I wanted to see how many IPs are not found in the database.
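    A minimal sketch of what such counters can look like in a mapper; the enum, counter names, and GeoIP helper are hypothetical, not code from the post:

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class GeoCounterMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            // Hypothetical counter group; the names are illustrative only.
            enum CountryCounter { US, INDIA, OTHER, IP_NOT_FOUND }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String country = lookupCountry(value.toString()); // assumed GeoIP helper
                if (country == null) {
                    context.getCounter(CountryCounter.IP_NOT_FOUND).increment(1);
                } else if ("United States".equals(country)) {
                    context.getCounter(CountryCounter.US).increment(1);
                } else if ("India".equals(country)) {
                    context.getCounter(CountryCounter.INDIA).increment(1);
                } else {
                    context.getCounter(CountryCounter.OTHER).increment(1);
                }
            }

            private String lookupCountry(String logLine) {
                // Placeholder for the GeoIP lookup described in the post.
                return null;
            }
        }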

  • Using DistributedCache with MapReduce job

    Wednesday, July 16, 2014

    In the Using third party jars and files in your MapReduce application (Distributed Cache) entry I blogged about how to use the distributed cache in Hadoop through command-line options. But you also have the option of using the DistributedCache API, as sketched below.
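    A minimal driver sketch of the programmatic route; the file paths are placeholders, and in Hadoop 2.x the Job methods shown here supersede the older DistributedCache class:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.Job;

        public class CacheDriverSketch {
            public static void main(String[] args) throws Exception {
                Job job = Job.getInstance(new Configuration(), "distributed-cache-demo");
                // Ship a data file (e.g. a GeoIP database) to every task node; placeholder path.
                job.addCacheFile(new URI("/user/hadoop/cache/GeoIP.dat"));
                // Add a third-party jar to the task classpath; placeholder path.
                job.addFileToClassPath(new Path("/user/hadoop/lib/geoip-api.jar"));
                // ... set mapper/reducer/input/output, then job.waitForCompletion(true) ...
            }
        }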

  • Killing a badly behaving MapReduce job

    Tuesday, July 15, 2014

    I was working on a MapReduce program, and after submitting it I realized that I had made a mistake and the job was taking a really long time to complete, so I decided to kill it. These are the steps that I followed. First I executed the mapred job -list command to get the list of jobs that were in progress.
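    For reference, the same list-and-kill flow can also be driven from Java; this is a rough sketch against the Hadoop 2.x client classes (the job ID is passed as an argument), not the commands from the post:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Cluster;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.JobID;
        import org.apache.hadoop.mapreduce.JobStatus;

        public class KillJobSketch {
            public static void main(String[] args) throws Exception {
                Cluster cluster = new Cluster(new Configuration());
                // Roughly what "mapred job -list" shows: the jobs known to the cluster.
                for (JobStatus status : cluster.getAllJobStatuses()) {
                    System.out.println(status.getJobID() + "\t" + status.getState());
                }
                // Roughly what "mapred job -kill <job-id>" does.
                Job job = cluster.getJob(JobID.forName(args[0]));
                if (job != null) {
                    job.killJob();
                }
            }
        }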

  • Configure LogStash to read Apache HTTP Server logs and add GeoIP information to them

    Saturday, July 12, 2014

    LogStash is a tool that you can use for managing your logs. The basic idea is that you configure LogStash to read a log file; it enriches the log records and then writes them to ElasticSearch, where you can use Kibana to view them. I wanted to figure out where my web traffic is coming from, so I configured the LogStash server to read the HTTP server log and used its geoip capability to find the location of each request based on its IP address and store it in ElasticSearch.

  • Using third party jars and files in your MapReduce application (Distributed Cache)

    Friday, July 11, 2014

    If you want to use a third-party jar in your MapReduce program you have two options: one is to create a single jar with all dependencies, and the other is to use the Hadoop distributed cache. I wanted to play around with both of these options, so I built a simple application in which I read the standard Apache HTTP Server log and parse it to extract the IP address each request is coming from. I then use that IP address to invoke a GeoIP lookup to find out what city and country the request came from.
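    The command-line route depends on the driver honoring Hadoop's generic options such as -libjars and -files; here is a minimal sketch of a driver wired up that way (the class and job names are mine, not from the post):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.conf.Configured;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.util.Tool;
        import org.apache.hadoop.util.ToolRunner;

        public class GeoIpDriverSketch extends Configured implements Tool {
            @Override
            public int run(String[] args) throws Exception {
                // getConf() already reflects whatever -libjars/-files added.
                Job job = Job.getInstance(getConf(), "geoip-lookup-demo");
                job.setJarByClass(GeoIpDriverSketch.class);
                // ... set mapper, input/output paths from args ...
                return job.waitForCompletion(true) ? 0 : 1;
            }

            public static void main(String[] args) throws Exception {
                // ToolRunner strips the generic options (-libjars, -files) before
                // handing the remaining args to run().
                System.exit(ToolRunner.run(new Configuration(), new GeoIpDriverSketch(), args));
            }
        }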

  • How reading and writing of files in HDFS works

    Thursday, July 3, 2014

    Read Path

    1. The client program starts with the Hadoop library jar and a copy of the cluster configuration data, which specifies the location of the name node.
    2. The client begins by contacting the name node, indicating the file it wants to read.
    3. The name node validates the client's identity, either by simply trusting the client or by using an authentication protocol such as Kerberos.
    4. The client's identity is then verified against the owner and permissions of the file.

  • HDFS Java Client

    Thursday, July 3, 2014

    Hadoop provides a Java API that you can use to perform commonly used file operations such as reading a file, creating a new file, appending to the end of an existing file, or searching for files. I wanted to try these common operations out, so I built this HelloHDFS project, which you can download from here. The main class takes command-line arguments for the operation name and the file path and performs the operation.
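    As a minimal read example with the standard FileSystem API; the class name and argument handling below are illustrative, not the actual HelloHDFS code:

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class HdfsReadSketch {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration(); // picks up core-site.xml/hdfs-site.xml
                FileSystem fs = FileSystem.get(conf);
                Path path = new Path(args[0]);            // e.g. /user/hadoop/input.txt
                try (BufferedReader reader =
                         new BufferedReader(new InputStreamReader(fs.open(path)))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        System.out.println(line);
                    }
                }
            }
        }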

  • HDFS Daemons

    Wednesday, July 2, 2014

    An HDFS cluster has two types of nodes operating in a master-worker pattern:

    • NameNode: Manages the filesystem's directory structure and metadata for all the files. This information is persisted on the local disk in the form of two files:
      1. fsimage: This is the master copy of the metadata for the file system.
      2. edits: This file stores the changes (deltas/modifications) made to the metadata. In newer versions of Hadoop (I am using 2.4) there are multiple edits files, each covering a range of transactions, which store the changes made to the metadata.