Latest Hadoop Interview Questions - Part 7

11.   Hadoop HDFS architecture

img:triangle



12.   Map Reduce in Hadoop

MapReduce:
It is a framework for processing huge datasets in parallel across a large number of computers, collectively referred to as a cluster. It involves two processes, namely Map and Reduce.

img:Hadoop

Map Process:
In this process the master node takes the input, divides it into smaller sub-tasks and distributes them to the worker nodes. The worker nodes process these sub-tasks and pass the results back to the master node.

Reduce Process:
In this process the answers provided by the worker nodes are combined to get the result of the original task. Conceptually the master node does the combining; in a Hadoop cluster the reduce tasks themselves also run on worker nodes. The main advantage of MapReduce is that both map and reduce are performed in distributed mode. Since each map operation is independent of the others, the maps can be performed in parallel, which reduces the total computing time.
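
The split, map and combine steps described above can be sketched on a single machine. This is only a toy illustration, not the Hadoop API: the function names (split_input, map_task, reduce_task, run_job) are made up for this example, and a thread pool stands in for the worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_input(data, n_chunks):
    """Master step: divide the input into roughly equal sub-tasks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_task(chunk):
    """Worker step: process one sub-task independently of the others."""
    return sum(x * x for x in chunk)


def reduce_task(partial_results):
    """Combine step: merge the workers' answers into the final result."""
    return reduce(lambda a, b: a + b, partial_results, 0)


def run_job(data, n_workers=4):
    chunks = split_input(data, n_workers)
    # Each map task is independent, so the tasks can run in parallel.
    # The thread pool is an assumption standing in for worker nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce_task(partials)


if __name__ == "__main__":
    # Sum of squares of 1..10, computed chunk by chunk.
    print(run_job(list(range(1, 11))))  # prints 385
```

Because no map task depends on another, adding more workers shortens the map phase without changing the result.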


13.   What is a heartbeat in HDFS?

A heartbeat is a signal that a node sends periodically to indicate that it is alive. A data node sends its heartbeat to the Name node, and a task tracker sends its heartbeat to the job tracker. If the Name node or job tracker does not receive a heartbeat, it concludes that there is a problem with the data node, or that the task tracker is unable to perform the assigned task.
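
The idea can be sketched as a small monitor on the master side that remembers when each node last reported in. This is a simplified model and not Hadoop code: the class name HeartbeatMonitor, its methods and the timeout value are assumptions chosen for illustration, not Hadoop defaults.

```python
import time


class HeartbeatMonitor:
    """Toy stand-in for the master side (Name node or job tracker)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock      # injectable so the example can be tested
        self.last_seen = {}     # node id -> time of the last heartbeat

    def record_heartbeat(self, node_id):
        """Called whenever a heartbeat arrives from a node."""
        self.last_seen[node_id] = self.clock()

    def dead_nodes(self):
        """Nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=10)
    monitor.record_heartbeat("datanode-1")
    print(monitor.dead_nodes())  # nothing has timed out yet
```

A node that keeps sending heartbeats has its timestamp refreshed and never appears in dead_nodes(); a node that goes silent appears once the timeout has passed.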



14.   What is metadata?

Metadata is the information about the data stored in the data nodes, such as the location of a file, the size of a file and so on. The metadata itself is held by the Name node.
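
As a rough picture of what such metadata might look like, the sketch below models a file record and a block-to-node map with plain Python structures. All names here (FileMetadata, namespace, block_locations, locate_file) and all paths, sizes and node ids are hypothetical; the real Name node uses its own internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    path: str
    size_bytes: int
    block_ids: list = field(default_factory=list)


# Hypothetical namespace: file path -> metadata record.
namespace = {
    "/logs/app.log": FileMetadata("/logs/app.log", 200_000_000,
                                  ["blk_1", "blk_2"]),
}

# Hypothetical block map: block id -> data nodes holding a copy.
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}


def locate_file(path):
    """Return (block id, data nodes) pairs for every block of a file."""
    meta = namespace[path]
    return [(block, block_locations[block]) for block in meta.block_ids]


if __name__ == "__main__":
    for block, nodes in locate_file("/logs/app.log"):
        print(block, "->", nodes)
```

The point of the sketch is that the metadata only describes where the data lives; the file contents themselves stay on the data nodes.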



15.   What is a Data node?

Data nodes are the slaves which are deployed on each machine and provide the actual storage.
They are responsible for serving read and write requests from the clients.



16.   What is a Name node?

The Name node is the master node on which the job tracker runs, and it holds the metadata. It maintains and manages the blocks which are present on the data nodes. It is a high-availability machine and the single point of failure in HDFS.



17.   Is Namenode also commodity hardware?

No.
The Namenode can never be commodity hardware because the entire HDFS relies on it.
It is the single point of failure in HDFS, so the Namenode has to be a high-availability machine.



18.   Can Hadoop be compared to a NoSQL database like Cassandra?

Though NoSQL is the closest technology that can be compared to Hadoop, it has its own pros and cons. There is no DFS in NoSQL. Hadoop is not a database. It’s a filesystem (HDFS) and a distributed programming framework (MapReduce).



19.   What is a key-value pair in HDFS?

A key-value pair is the intermediate data generated by the maps and sent to the reduces for generating the final output.
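
A word count is the usual way to see these pairs. The sketch below is plain Python rather than the Hadoop API, and the function names (map_words, shuffle, reduce_counts, word_count) are made up for illustration: each map emits (word, 1) pairs, the pairs are grouped by key, and each reduce sums the values for one key.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit one (key, value) pair per word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the intermediate values by key before they reach a reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Reduce: collapse all values for one key into a final pair."""
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "hadoop processes data"]))
```

Here the list named intermediate holds exactly the key-value pairs the answer refers to: they exist only between the map and reduce steps.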



20.   What is the difference between the MapReduce engine and the HDFS cluster?

HDFS cluster is the name given to the whole configuration of master and slaves where data is stored. The MapReduce engine is the programming module which is used to retrieve and analyze the data.

