Q1. What are the default configuration files that are used in Hadoop?
As of the 0.20 release, Hadoop supports the following read-only default configurations:
- src/core/core-default.xml
- src/hdfs/hdfs-default.xml
- src/mapred/mapred-default.xml
Q2. How will you make changes to the default configuration files?
Hadoop does not recommend changing the default configuration files; instead, it recommends making all site-specific changes in the following files:
- conf/core-site.xml
- conf/hdfs-site.xml
- conf/mapred-site.xml
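As an illustration, a minimal conf/core-site.xml for a small 0.20-era installation might look like the following (the host and port are placeholder values, not prescribed defaults):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Site-specific override: URI of the default filesystem (the NameNode).
       hdfs://localhost:8020 is an example value; adjust for your cluster. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```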
Unless explicitly turned off, Hadoop by default specifies two resources, loaded in order from the classpath:
- core-default.xml: Read-only defaults for Hadoop.
- core-site.xml: Site-specific configuration for a given Hadoop installation.
Hence, if the same property is defined in both core-default.xml and core-site.xml, the value in core-site.xml takes precedence, because it is loaded later (the same is true for the other two file pairs).
Q3. Consider a scenario where you have set the property mapred.output.compress to true to ensure that all output files are compressed for efficient space usage on the cluster. If a cluster user does not want to compress data for a specific job, what will you recommend they do?
Ask them to create their own configuration file, set mapred.output.compress to false in it, and load this file as a resource in their job.
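A sketch of what such a user-supplied resource file could contain (the filename is hypothetical; the file would be loaded via Configuration.addResource() or passed with the -conf command-line option):

```xml
<?xml version="1.0"?>
<!-- myjob-conf.xml (hypothetical name): per-job override of a site-wide default -->
<configuration>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
</configuration>
```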
Q4. In the above scenario, how can you ensure that users cannot override mapred.output.compress to false in any of their jobs?
This can be done by marking the property as final (adding <final>true</final> to its definition) in the core-site.xml file.
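The final marker might look like this in the site configuration file (a sketch; a property marked final cannot be overridden by user-supplied resources):

```xml
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
  <!-- final prevents job-level configuration from overriding this value -->
  <final>true</final>
</property>
```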
Q5. Which of the following is the only required variable that needs to be set in conf/hadoop-env.sh for Hadoop to work?
- HADOOP_LOG_DIR
- JAVA_HOME
- HADOOP_CLASSPATH
The only required variable is JAVA_HOME, which must point to the <java installation> directory.
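In conf/hadoop-env.sh this is a single export line; the JDK path below is only an example and will differ per machine:

```shell
# conf/hadoop-env.sh
# Required: point JAVA_HOME at the root of your Java installation.
# The path below is an example; adjust it to your JDK location.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```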
Q6. List all the daemons required to run a Hadoop cluster.
- NameNode
- SecondaryNameNode
- DataNode
- JobTracker
- TaskTracker
Q7. What is the default port that the JobTracker web UI listens on?
50030
Q8. What is the default port that the HDFS NameNode web UI listens on?
50070