11. Hadoop HDFS architecture
12. MapReduce in Hadoop
MapReduce:
It is a framework for processing huge datasets in parallel across a large number of computers, collectively referred to as a cluster. It involves two processes, namely Map and Reduce.
Map process:
In this process the master node takes the input, divides it into smaller sub-tasks and distributes them to the worker nodes. The worker nodes process these sub-tasks and pass the results back to the master node.
Reduce process:
In this process the master node combines all the answers provided by the worker nodes to produce the result of the original task. The main advantage of MapReduce is that both map and reduce are performed in distributed mode. Since each map operation is independent of the others, the maps can run in parallel, which reduces the overall computing time.
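The divide, process and combine flow described above can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: a thread pool stands in for the worker nodes, and the names `split_input`, `map_task`, `reduce_results` and `run_job` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_tasks):
    """Master side: divide the input into roughly equal sub-tasks."""
    size = max(1, -(-len(records) // num_tasks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_task(chunk):
    """Worker side: process one sub-task (here, sum the squares of a chunk)."""
    return sum(x * x for x in chunk)


def reduce_results(partials):
    """Master side: combine the partial answers into the final result."""
    return sum(partials)


def run_job(records, num_workers=4):
    chunks = split_input(records, num_workers)
    # Each map_task is independent, so the chunks can be processed in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce_results(partials)


total = run_job(list(range(1, 11)))
print(total)  # sum of squares of 1..10 = 385
```

Because no map task depends on another, adding workers shortens the map phase without changing the result.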
13. What is a heartbeat in HDFS?
A heartbeat is a signal that a node sends periodically to indicate that it is alive. A data node
sends its heartbeat to the Name node, and a task tracker sends its heartbeat to the job
tracker. If the Name node or job tracker stops receiving heartbeats, it
concludes that there is a problem with the data node, or that the task tracker is unable
to perform the assigned task.
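The heartbeat check can be pictured with a small sketch. It is not the Hadoop implementation: the class `HeartbeatMonitor`, its methods and the timeout value are assumptions made for illustration. The master records the time of each node's last heartbeat and treats a node as failed once no heartbeat has arrived within the timeout.

```python
import time


class HeartbeatMonitor:
    """Toy master-side tracker for heartbeats (hypothetical, not Hadoop's API)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock      # injectable so the example can use a fake clock
        self.last_seen = {}     # node id -> time of the last heartbeat

    def record_heartbeat(self, node_id):
        """Called whenever a data node (or task tracker) reports in."""
        self.last_seen[node_id] = self.clock()

    def dead_nodes(self):
        """Nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Usage with a fake clock so the outcome does not depend on real time.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=10, clock=lambda: fake_now[0])
monitor.record_heartbeat("datanode-1")
monitor.record_heartbeat("datanode-2")
fake_now[0] = 8.0
monitor.record_heartbeat("datanode-2")   # datanode-1 stays silent
fake_now[0] = 12.0
print(monitor.dead_nodes())  # ['datanode-1']
```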
14. What is metadata?
Metadata is information about the data stored in the data nodes,
such as the location of a file, the size of a file and so on.
15. What is a Data node?
Data nodes are the slaves which are deployed on each machine and
provide the actual storage.
They are responsible for serving read and write requests from the clients.
16. What is a Name node?
Name node is the master node on which the job tracker runs and
which holds the metadata. It maintains and manages the blocks which are
present on the data nodes. It should be a high-availability machine, because it is the single point
of failure in HDFS.
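To make the split between metadata and storage concrete, here is a toy model of what the Name node keeps track of. It is a sketch under stated assumptions, not the real HDFS data structures: `FileMetadata`, `ToyNameNode`, the block ids and the data node names are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """Information about a file, not the file's contents."""
    path: str
    size_bytes: int
    block_ids: list = field(default_factory=list)


class ToyNameNode:
    """Keeps only metadata; the blocks themselves live on the data nodes."""

    def __init__(self):
        self.files = {}            # path -> FileMetadata
        self.block_locations = {}  # block id -> list of data node names

    def add_file(self, path, size_bytes, blocks):
        """blocks maps each block id to the data nodes holding a copy."""
        self.files[path] = FileMetadata(path, size_bytes, list(blocks))
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        """Tell a client which data nodes to contact for each block."""
        meta = self.files[path]
        return [(b, self.block_locations[b]) for b in meta.block_ids]


name_node = ToyNameNode()
name_node.add_file(
    "/logs/app.log",
    200,
    {"blk_1": ["datanode-1", "datanode-2"], "blk_2": ["datanode-2", "datanode-3"]},
)
print(name_node.locate("/logs/app.log"))
```

If this one object is lost, nothing can say which blocks belong to which file, which is why the Name node is described as a single point of failure.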
17. Is Namenode also commodity hardware?
No.
Namenode can never be commodity hardware because the entire HDFS relies on it.
It is the single point of failure in HDFS, so Namenode has to be a high-availability machine.
18. Can Hadoop be compared to a NoSQL
database like Cassandra?
Though NoSQL is the closest technology that can be compared to
Hadoop, it has its own pros and cons. There is no DFS in NoSQL. Hadoop is not a
database. It’s a filesystem (HDFS) and a distributed programming framework
(MapReduce).
19. What is a key-value pair in Hadoop?
A key-value pair is the intermediate data generated by the map tasks and sent
to the reduce tasks for generating the final output.
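A word count is the usual way to picture these intermediate pairs. The sketch below is plain Python, not the Hadoop API, and the function names `map_words`, `shuffle` and `reduce_counts` are assumptions for this example. Each map call emits (word, 1) pairs, the pairs are grouped by key, and each reduce call turns one key and its values into a final count.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit one (key, value) pair per word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the intermediate values by key before they reach the reducers."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Reduce: combine all values for one key into the final output."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]
intermediate = [pair for line in lines for pair in map_words(line)]
result = dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```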
20. What is the difference between
the MapReduce engine and an HDFS cluster?
HDFS cluster is the name given to the whole configuration of
master and slaves where data is stored. The MapReduce engine is the programming
module which is used to retrieve and analyze that data.