Hadoop Internals



Hadoop Configuration parameters


| Parameter | File | Default | Diagram(s) |
| --- | --- | --- | --- |
| mapreduce.task.io.sort.mb | mapred-site.xml | 100 | MapTask > Shuffle; MapTask > Execution |
| mapreduce.map.sort.spill.percent | mapred-site.xml | 0.80 | MapTask > Shuffle; MapTask > Execution |
| mapreduce.task.io.sort.factor | mapred-site.xml | 100 | MapTask > Shuffle; Merge; ReduceTask > Shuffle |
| mapreduce.map.combine.minspills | mapred-site.xml | 3 | MapTask > Shuffle |
| mapreduce.job.reduces | mapred-site.xml | 1 | MapTask > Shuffle |
| | | 0 | Job > NEW => INITED |
| mapreduce.cluster.local.dir | mapred-site.xml | ${hadoop.tmp.dir}/mapred/local | MapTask > Shuffle |
| mapreduce.reduce.merge.memtomem.enabled | mapred-site.xml | false | ReduceTask > Shuffle |
| mapreduce.framework.name | mapred-site.xml | yarn/local | ReduceTask > Shuffle |
| mapreduce.reduce.shuffle.parallelcopies | mapred-site.xml | 5 | ReduceTask > Shuffle |
| mapreduce.reduce.memory.totalbytes | mapred-site.xml | Runtime.maxMemory() | ReduceTask > Fetcher |
| mapreduce.reduce.shuffle.memory.limit.percent | mapred-site.xml | 0.25 | ReduceTask > Fetcher |
| mapreduce.job.ubertask.enable | mapred-site.xml | false | Job > NEW => INITED |
| mapreduce.job.ubertask.maxmaps | mapred-site.xml | 9 | Job > NEW => INITED |
| mapreduce.job.ubertask.maxreduces | mapred-site.xml | 1 | Job > NEW => INITED |
| mapreduce.job.ubertask.maxbytes | mapred-site.xml | dfs.block.size | Job > NEW => INITED |
| mapreduce.map.failures.maxpercent | mapred-site.xml | 0 | Job > RUNNING => {RUNNING, COMMITTING, FAIL_ABORT} |
| mapreduce.reduce.failures.maxpercent | mapred-site.xml | 0 | Job > RUNNING => {RUNNING, COMMITTING, FAIL_ABORT} |
| mapreduce.map.memory.mb | mapred-site.xml | 1024 | Task Attempt > NEW => UNASSIGNED |
| mapreduce.reduce.memory.mb | mapred-site.xml | 1024 | Task Attempt > NEW => UNASSIGNED |
| yarn.scheduler.maximum-allocation-mb | yarn-site.xml | 8192 | Container Allocator |
| mapreduce.reduce.shuffle.merge.percent | mapred-site.xml | 0.90 | ReduceTask > Shuffle |
| yarn.resourcemanager.scheduler.class | yarn-site.xml | CapacityScheduler | Resource Manager |
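
To make a few of the defaults listed above more concrete, here is a small Python sketch of how two groups of parameters might combine. It is an illustration under stated assumptions, not Hadoop code: the function names `spill_threshold_bytes` and `is_uber_candidate` are hypothetical, the spill threshold is approximated as sort buffer size times spill percent, and the uber-task check covers only the three size limits from the table (any further checks Hadoop may apply, such as memory limits, are left out). Since `mapreduce.job.ubertask.maxbytes` defaults to the HDFS block size, the block size is passed in explicitly.

```python
# Defaults copied from the table above. Everything else here is illustrative.
DEFAULTS = {
    "mapreduce.task.io.sort.mb": 100,
    "mapreduce.map.sort.spill.percent": 0.80,
    "mapreduce.job.ubertask.enable": False,
    "mapreduce.job.ubertask.maxmaps": 9,
    "mapreduce.job.ubertask.maxreduces": 1,
}


def spill_threshold_bytes(conf):
    """Approximate buffer fill level (in bytes) at which a map task spills.

    Assumption: threshold = sort buffer size * spill percent.
    """
    buffer_bytes = conf["mapreduce.task.io.sort.mb"] << 20  # MB to bytes
    return int(buffer_bytes * conf["mapreduce.map.sort.spill.percent"])


def is_uber_candidate(conf, num_maps, num_reduces, input_bytes, block_size_bytes):
    """Simplified uber-task check using only the size limits from the table.

    block_size_bytes stands in for mapreduce.job.ubertask.maxbytes, whose
    default is the HDFS block size.
    """
    if not conf["mapreduce.job.ubertask.enable"]:
        return False
    return (
        num_maps <= conf["mapreduce.job.ubertask.maxmaps"]
        and num_reduces <= conf["mapreduce.job.ubertask.maxreduces"]
        and input_bytes <= block_size_bytes
    )


if __name__ == "__main__":
    print("spill threshold (bytes):", spill_threshold_bytes(DEFAULTS))

    # Uber mode is off by default, so enable it in a copy of the defaults.
    uber_conf = dict(DEFAULTS, **{"mapreduce.job.ubertask.enable": True})
    mb = 1 << 20
    print("small job uber candidate:",
          is_uber_candidate(uber_conf, 4, 1, 64 * mb, 128 * mb))
```

With the defaults from the table, the spill threshold works out to roughly 80 MB of the 100 MB sort buffer, and no job qualifies as an uber task until `mapreduce.job.ubertask.enable` is switched on.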