Q31. How will you write a custom partitioner for a Hadoop job?
To have Hadoop use a custom partitioner, you need to do at minimum the following three things; a minimal sketch follows the list:
- Create a new class that extends the Partitioner class
- Override the getPartition method
- In the wrapper that runs the MapReduce job, either
- add the custom partitioner to the job programmatically using the setPartitionerClass method, or
- add the custom partitioner to the job via a config file (if your wrapper reads from a config file or Oozie)
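Here is a minimal sketch of such a class; the class name and the first-letter bucketing scheme are hypothetical, and it assumes Text keys and IntWritable values:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical scheme: bucket records by the first character of the key.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0; // send empty keys to the first reducer
        }
        // Mask the sign bit so the partition index is never negative.
        return (Character.toLowerCase(key.charAt(0)) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Registering it programmatically in the driver is then a one-liner: job.setPartitionerClass(FirstLetterPartitioner.class);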
Q32. How did you debug your Hadoop code?
There are several ways of doing this, but the most common are:
- Using counters (see the sketch after this list)
- Using the web interface provided by the Hadoop framework
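As a sketch of the counter approach, a mapper can count suspicious records and keep running; the totals then appear in the job's web UI and console output. The class name, counter group, and record format below are made up for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length < 2) {
            // Counter group/name are hypothetical; totals show up in the job UI.
            context.getCounter("Debug", "MALFORMED_RECORDS").increment(1);
            return; // skip the bad record rather than failing the task
        }
        context.write(new Text(fields[0]), new IntWritable(1));
    }
}
```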
Q33. Did you ever build a production process in Hadoop? If yes, what was the process when your Hadoop job failed?
This is an open-ended question, but most candidates who have written a production job should talk about some type of alert mechanism, such as an email being sent or their monitoring system raising an alert. Since Hadoop works on unstructured data, a good error-alerting system is very important: unexpected data can very easily break the job.
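One common building block for such alerting, sketched below under assumed names, is a driver that propagates job failure through its exit code so that whatever runs the wrapper (Oozie, cron plus monitoring, etc.) can fire the alert:

```java
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: a non-zero exit code lets the scheduler or
// monitoring system detect the failure and send the email/page.
public class JobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("production-job"); // name is illustrative
        // ... mapper/reducer/input/output configuration goes here ...
        boolean succeeded = job.waitForCompletion(true);
        System.exit(succeeded ? 0 : 1);
    }
}
```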
Q34. Did you ever run into a lopsided job that resulted in an out-of-memory error? If yes, how did you handle it?
This is an open-ended question, but a candidate who claims to be an intermediate developer and has worked on a large data set (10-20 GB minimum) should have run into this problem. There are many ways to handle it, but the most common are to alter your algorithm and break the job into more MapReduce phases, or to use a combiner if possible.
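For the combiner approach, a minimal sketch is below; it reuses Hadoop's built-in IntSumReducer as the combiner so partial sums are computed map-side, shrinking the data the overloaded reducer receives (the driver class name is hypothetical):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class SkewedJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // The combiner runs on map output before the shuffle; it must be
        // associative and commutative (summing counts qualifies).
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        // ... mapper, input/output paths, etc. omitted ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```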