121. Which object can be used to get the progress of a particular job?
Ans: Context
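A minimal sketch of how a task works with its Context for progress (the mapper class and record handling here are illustrative assumptions, not from the source): inside a task, context.progress() and context.setStatus() report liveness and a status string back to the framework, while the submitting client can separately poll Job.mapProgress() and Job.reduceProgress() for the overall fraction complete.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical long-running mapper: the Context passed to each task is the
// handle through which progress is reported back to the framework.
public class LongRunningMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... expensive per-record work would go here ...
        context.setStatus("processed record at offset " + key.get()); // human-readable status
        context.progress();   // tells the framework the task is still making progress
    }
}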
122. What is the next step after the Mapper or MapTask?
Ans: The outputs of the Mapper are sorted, and partitions are created for that output. The number of partitions depends on the number of reducers.
123. How can we control which reducer a particular key goes to?
Ans: Users can control which keys (and hence which records) go to which Reducer by implementing a custom Partitioner, as sketched below.
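A hedged sketch of such a Partitioner (the class name, key/value types, and routing rule are assumptions for illustration); it would be registered in the driver with job.setPartitionerClass(AlphabetPartitioner.class):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: keys starting with 'a'-'m' go to the first half of the
// reducers, all other keys go to the second half. Assumes non-empty text keys.
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        if (numReduceTasks <= 1) {
            return 0;
        }
        int half = numReduceTasks / 2;
        int bucket = key.hashCode() & Integer.MAX_VALUE;   // non-negative hash bucket
        char first = Character.toLowerCase(key.toString().charAt(0));
        return (first >= 'a' && first <= 'm')
                ? bucket % half
                : half + bucket % (numReduceTasks - half);
    }
}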
124. What is the use of a Combiner?
Ans: It is an optional component or class that can be specified via Job.setCombinerClass(ClassName) to perform local aggregation of the intermediate outputs, which helps cut down the amount of data transferred from the Mapper to the Reducer.
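A minimal driver excerpt showing where the combiner is wired in, assuming a word-count style job; TokenizerMapper, IntSumReducer, and WordCountDriver are hypothetical classes. Because a sum is associative and commutative, the same reducer class can safely double as the combiner:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCountDriver.class);        // hypothetical driver class
job.setMapperClass(TokenizerMapper.class);       // emits (word, 1) pairs
job.setCombinerClass(IntSumReducer.class);       // local aggregation on the map side
job.setReducerClass(IntSumReducer.class);        // final aggregation
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);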
125. How many maps are there in a particular job?
Ans: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
Generally it is around 10-100 maps per node. Task setup takes a while, so it is best if the maps take at least a minute to execute.
For example, if you expect 10 TB of input data and have a block size of 128 MB, you'll end up with about 82,000 maps. To influence the number of maps you can use the mapreduce.job.maps parameter (which only provides a hint to the framework).
Ultimately, the number of tasks is controlled by the number of splits returned by the InputFormat.getSplits() method (which you can override).
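A brief sketch of the arithmetic above and of passing the advisory hint (the job name is illustrative):

// Back-of-the-envelope: 10 TB at a 128 MB block size gives
// 10 * 1024 * 1024 MB / 128 MB = 81,920 (~82,000) input splits, hence ~82,000 maps.
Configuration conf = new Configuration();
conf.setInt("mapreduce.job.maps", 82000);   // advisory only; the InputFormat's splits win
Job job = Job.getInstance(conf, "large-input job");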
126. What is the Reducer used for?
Ans: The Reducer reduces a set of intermediate values which share a key to a (usually smaller) set of values.
The number of reduce tasks for the job is set by the user via Job.setNumReduceTasks(int).
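For example, assuming a Job object named job, the reducer count could be set like this:

job.setNumReduceTasks(10);   // ten reduce tasks; 0 would make the job map-only (no shuffle/sort/reduce)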
127. Explain the core methods of the Reducer?
Ans: The API of Reducer is very similar to that of Mapper,
there's a run() method that receives a Context containing the job's configuration
as well as interfacing methods that return data from the reducer itself back to the
framework. The run() method calls setup() once, reduce() once for each key
associated with the
reduce task, and cleanup() once at the end. Each of these
methods can access the job's configuration data by using
Context.getConfiguration(). As in Mapper, any or all of these methods can be overridden
with custom implementations. If none of these methods are overridden,
the default reducer operation is the identity function; values are passed
through without further processing.
The heart of Reducer is its reduce() method. This is called
once per key; the second argument is an Iterable which returns all the values
associated with that key.
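An illustrative reducer overriding the three lifecycle methods that run() drives (the class name, key/value types, and summing logic are assumptions for the sketch):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Called once before any keys are processed; the job configuration
        // is available via context.getConfiguration().
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key; the Iterable yields every value for that key.
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once after the last key has been processed.
    }
}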
128. What are the primary phases of the Reducer?
Ans: Shuffle, Sort and Reduce
129. Explain the Shuffle phase.
Ans: Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
130. Explain the Reducer's Sort phase.
Ans: In this stage the framework groups the Reducer inputs by key (since different mappers may have output the same key). The shuffle and sort phases occur simultaneously; while map outputs are being fetched they are merged (this is similar to a merge sort). Grouping can also be customized, as sketched below.
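When grouping by the full key is not desired (for example in a secondary-sort pattern), the comparators used for sorting and grouping can be replaced; in this sketch CompositeKeyComparator and NaturalKeyGroupingComparator are hypothetical RawComparator implementations:

job.setSortComparatorClass(CompositeKeyComparator.class);           // controls the sort order of keys
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class); // controls which keys share one reduce() call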