101. What does a Mapper do?
Ans: Maps are the individual tasks that transform input records into
intermediate records. The transformed intermediate records do not need
to be of the same type as the input records. A given input pair may map
to zero or many output pairs.
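Example: as a rough illustration of the "zero or many output pairs" point, the plain-Java sketch below models a map function outside Hadoop. The class name WordMapSketch and its map method are hypothetical stand-ins, not part of the Hadoop API. The input pair is assumed to be (byte offset, line of text) and the output pairs are (word, 1), so the input and output types differ.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the logic of a map task; it does not use Hadoop.
final class WordMapSketch {

    // Input pair: (byte offset, line). Output pairs: (word, 1).
    // A blank line yields zero pairs; a line with n words yields n pairs.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "big data big")); // three output pairs
        System.out.println(map(13L, "   "));         // no output pairs
    }
}
```

The offset is unused here; it is kept only to show that the input key type (a number) need not match the output key type (a word).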
102. What is an InputSplit in MapReduce?
Ans: An InputSplit is a logical representation of a unit (a chunk) of
input work for a map task; e.g., a filename and a byte range within that
file to process, or a row set in a text file.
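Example: the "filename and a byte range" idea can be sketched in plain Java as follows. SplitSketch and splitsFor are hypothetical names, and the fixed splitSize parameter is an assumption made for illustration. Hadoop's own split calculation applies additional rules that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a file-based split: a file name plus a byte range.
final class SplitSketch {

    final String fileName;
    final long start;   // first byte of the range (inclusive)
    final long length;  // number of bytes in the range

    SplitSketch(String fileName, long start, long length) {
        this.fileName = fileName;
        this.start = start;
        this.length = length;
    }

    // Cut a file of fileLength bytes into ranges of at most splitSize bytes.
    static List<SplitSketch> splitsFor(String fileName, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<SplitSketch> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new SplitSketch(fileName, start, length));
        }
        return splits;
    }

    @Override
    public String toString() {
        return fileName + "[" + start + ", +" + length + "]";
    }

    public static void main(String[] args) {
        // A 250-byte file with 100-byte splits: the last range is shorter.
        System.out.println(splitsFor("input.txt", 250L, 100L));
    }
}
```

Note that a split only describes the work; no file data is read when the ranges are computed.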
103. What is the InputFormat?
Ans: The InputFormat is responsible for enumerating (itemizing) the
InputSplits and for producing a RecordReader, which turns those logical
work units into actual physical input records.
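Example: the two responsibilities named above can be modeled in a few lines of plain Java. Everything in this sketch (InputFormatSketch, RowSplit, getSplits, recordReader) is a hypothetical stand-in that works on an in-memory list of lines, not the Hadoop API. It only aims to show the division of labor: one step enumerates the splits, another turns a split into concrete records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the InputFormat / RecordReader division of labor.
final class InputFormatSketch {

    // A logical unit of work: rows firstRow (inclusive) to endRow (exclusive).
    static final class RowSplit {
        final int firstRow;
        final int endRow;

        RowSplit(int firstRow, int endRow) {
            this.firstRow = firstRow;
            this.endRow = endRow;
        }
    }

    // Role 1: enumerate the splits without touching the data itself.
    static List<RowSplit> getSplits(int totalRows, int rowsPerSplit) {
        if (rowsPerSplit <= 0) {
            throw new IllegalArgumentException("rowsPerSplit must be positive");
        }
        List<RowSplit> splits = new ArrayList<>();
        for (int first = 0; first < totalRows; first += rowsPerSplit) {
            splits.add(new RowSplit(first, Math.min(first + rowsPerSplit, totalRows)));
        }
        return splits;
    }

    // Role 2: produce a reader that turns one split into actual records.
    static Iterator<String> recordReader(List<String> lines, RowSplit split) {
        return lines.subList(split.firstRow, split.endRow).iterator();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        for (RowSplit split : getSplits(lines.size(), 2)) {
            Iterator<String> reader = recordReader(lines, split);
            while (reader.hasNext()) {
                System.out.println(split.firstRow + ": " + reader.next());
            }
        }
    }
}
```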
104. Where do you specify the Mapper Implementation?
Ans: Generally, the Mapper implementation is specified in the Job
itself, typically by calling setMapperClass() on the Job.
105. How is the Mapper instantiated in a running job?
Ans: The Mapper itself is instantiated in the running job and is passed
a MapContext object, which it can use to configure itself.
106. What are the methods of the Mapper class?
Ans: The Mapper contains the run() method, which calls its own setup()
method only once, then calls the map() method for each input record, and
finally calls its cleanup() method. You can override all of these
methods in your code.
107. What happens if you don't override the Mapper methods and keep
them as they are?
Ans: If you do not override any methods (leaving even map() as-is), the
Mapper acts as the identity function, emitting each input record as a
separate output.
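Example: the call order (setup() once, map() per record, cleanup() once) and the identity default can be modeled in plain Java as below. MapperSketch and UpperCaseMapperSketch are hypothetical, simplified classes written for this illustration; they work on strings instead of key/value pairs and use a list where the real class would use a Context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the Mapper lifecycle; not the Hadoop class.
class MapperSketch {

    final List<String> output = new ArrayList<>(); // stands in for emitted output
    final List<String> calls = new ArrayList<>();  // records the order of calls

    protected void setup() {
        calls.add("setup");
    }

    // Default map(): identity, emits the record unchanged.
    protected void map(String record) {
        output.add(record);
    }

    protected void cleanup() {
        calls.add("cleanup");
    }

    // run(): setup() once, map() for each record, cleanup() once at the end.
    public void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                calls.add("map");
                map(record);
            }
        } finally {
            cleanup();
        }
    }
}

// Overrides only map(); setup() and cleanup() keep their default behavior.
class UpperCaseMapperSketch extends MapperSketch {
    @Override
    protected void map(String record) {
        output.add(record.toUpperCase());
    }

    public static void main(String[] args) {
        MapperSketch identity = new MapperSketch();
        identity.run(Arrays.asList("a", "b"));
        System.out.println(identity.calls + " -> " + identity.output);

        MapperSketch upper = new UpperCaseMapperSketch();
        upper.run(Arrays.asList("a", "b"));
        System.out.println(upper.calls + " -> " + upper.output);
    }
}
```

With no overrides, the output list ends up equal to the input list, which is the identity behavior described in the answer.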
108. What is the use of the Context object?
Ans: The Context object allows the mapper to interact with the rest of
the Hadoop system. It includes configuration data for the job, as well
as interfaces which allow it to emit output.
109. How can you pass arbitrary key-value pairs to your mapper?
Ans: You can set arbitrary (key, value) pairs of configuration data in
your Job, e.g. with job.getConfiguration().set("myKey", "myVal"), and
then retrieve this data in your mapper with
context.getConfiguration().get("myKey"). This lookup is typically done
in the Mapper's setup() method.
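Example: the set-in-the-job, read-in-setup() pattern can be sketched without Hadoop as follows. ConfSketch is a hypothetical stand-in for the job configuration (string keys mapped to string values), and ConfiguredMapperSketch is a hypothetical mapper. The key "myKey" and value "myVal" are taken from the answer above; using the value as a prefix is an assumption made only to give the mapper something to do with it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the job configuration; not the Hadoop class.
final class ConfSketch {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    // Returns null when the key was never set.
    String get(String key) {
        return values.get(key);
    }
}

// Hypothetical mapper that reads its setting once, in setup(), and caches it.
final class ConfiguredMapperSketch {
    private String prefix = "";

    void setup(ConfSketch conf) {
        String value = conf.get("myKey");
        prefix = (value == null) ? "" : value; // fall back when unset
    }

    String map(String record) {
        return prefix + record;
    }

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        conf.set("myKey", "myVal");            // done by the driver

        ConfiguredMapperSketch mapper = new ConfiguredMapperSketch();
        mapper.setup(conf);                    // done once per task
        System.out.println(mapper.map(":row"));
    }
}
```

Reading the value once in setup() and caching it in a field avoids repeating the lookup for every record passed to map().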
110. How does the Mapper's run() method work?
Ans: After calling setup(), the Mapper.run() method calls
map(KeyInType, ValInType, Context) for each key/value pair in the
InputSplit for that task, and finally calls cleanup().