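When a slot becomes free on a TaskTracker, the scheduler picks, in order, a queue, a job within that queue, and a task within that job, and assigns the slot to that task. The policies for choosing the queue, the job and the task are as follows:
① Choosing a queue: all queues are sorted in ascending order of resource usage ratio (numSlotsOccupied/capacity) and processed in that order until a suitable job is found.
② Choosing a job: within the current queue, jobs are sorted by submission time and job priority (assuming priority scheduling is enabled; it is off by default and has to be turned on in the configuration file). The scheduler considers each job in turn and picks the first one that meets two conditions: [1] the user who owns the job has not reached their resource limit; [2] the node hosting the TaskTracker has enough free memory for a task of that job.
③ Choosing a task: like most schedulers, it takes the task's locality and resource usage into account (that is, it calls the obtainNewMapTask()/obtainNewReduceTask() methods of JobInProgress).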
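The job ordering in step ② can be illustrated with a comparator. This is a minimal sketch, not Hadoop's own code: the `Job` class, its fields, and the convention that a smaller `priority` value means a higher priority are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class JobOrderSketch {
    // Hypothetical job model; this is not Hadoop's JobInProgress class.
    static final class Job {
        final String name;
        final int priority;     // assumption: smaller value = higher priority
        final long submitTime;  // submission time in milliseconds
        Job(String name, int priority, long submitTime) {
            this.name = name;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    // With priority support: order by priority, then by submission time.
    // Without it: order by submission time only (plain FIFO).
    static Comparator<Job> jobOrder(boolean supportsPriority) {
        Comparator<Job> fifo = Comparator.comparingLong(j -> j.submitTime);
        if (!supportsPriority) {
            return fifo;
        }
        return Comparator.<Job>comparingInt(j -> j.priority).thenComparing(fifo);
    }

    // Returns the job names in the order the scheduler would consider them.
    static List<String> sortedNames(List<Job> jobs, boolean supportsPriority) {
        List<Job> copy = new ArrayList<>(jobs);
        copy.sort(jobOrder(supportsPriority));
        List<String> names = new ArrayList<>();
        for (Job j : copy) {
            names.add(j.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
            new Job("a", 2, 100L), new Job("b", 0, 300L), new Job("c", 2, 50L));
        System.out.println(sortedNames(jobs, true));   // prints [b, c, a]
        System.out.println(sortedNames(jobs, false));  // prints [c, a, b]
    }
}
```

With priority support switched off, the late but high-priority job `b` moves to the back, which is the default FIFO behaviour of a queue.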
Putting this together, the Capacity Scheduler can be described by the following pseudocode:
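// CapacityTaskScheduler: a slot became free on a TaskTracker; find a suitable task for it (pseudocode)
List<Task> assignTasks(TaskTrackerStatus taskTracker) {
  sortQueuesByResourceUsage(queues);   // ascending numSlotsOccupied/capacity
  for (queue : queues) {
    sortJobsByTimeAndPriority(queue);
    for (job : queue.getJobs()) {
      // condition [1]: the job's user is below the limit; condition [2]: the node has enough memory
      if (!userOverLimit(job) && matchesMemoryRequirements(job, taskTracker)) {
        task = job.obtainNewTask();
        if (task != null) return singletonList(task);
      }
    }
  }
  return null;   // no suitable task found for this slot
}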
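The pseudocode above can be turned into a small runnable simulation. This is a sketch under stated assumptions, not the CapacityTaskScheduler implementation: `Job`, `JobQueue`, `assignSlot`, `sampleQueues` and the sample numbers are all hypothetical, the user limit is reduced to a set of users who are already at their limit, and task locality is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class CapacityAssignSketch {
    // Hypothetical, simplified models; none of these are Hadoop classes.
    static final class Job {
        final String name;
        final String user;
        final int taskMemoryMb;  // memory one task of this job needs
        int pendingTasks;
        Job(String name, String user, int taskMemoryMb, int pendingTasks) {
            this.name = name;
            this.user = user;
            this.taskMemoryMb = taskMemoryMb;
            this.pendingTasks = pendingTasks;
        }
    }

    static final class JobQueue {
        final String name;
        final int capacity;      // slots guaranteed to the queue
        int slotsOccupied;
        final List<Job> jobs = new ArrayList<>();  // assumed already in scheduling order
        JobQueue(String name, int capacity, int slotsOccupied) {
            this.name = name;
            this.capacity = capacity;
            this.slotsOccupied = slotsOccupied;
        }
        double usage() { return (double) slotsOccupied / capacity; }
    }

    // Returns the name of the job that gets the free slot, or null if no job fits.
    static String assignSlot(List<JobQueue> queues, int freeMemoryMb, Set<String> usersAtLimit) {
        List<JobQueue> ordered = new ArrayList<>(queues);
        ordered.sort(Comparator.comparingDouble(JobQueue::usage));  // least-used queue first
        for (JobQueue queue : ordered) {
            for (Job job : queue.jobs) {
                if (usersAtLimit.contains(job.user)) continue;   // condition [1]
                if (job.taskMemoryMb > freeMemoryMb) continue;   // condition [2]
                if (job.pendingTasks == 0) continue;             // no task left to hand out
                job.pendingTasks--;
                queue.slotsOccupied++;
                return job.name;
            }
        }
        return null;
    }

    // Three queues with made-up occupancy figures.
    static List<JobQueue> sampleQueues() {
        JobQueue def = new JobQueue("default", 40, 30);   // usage 0.75
        JobQueue hadoop = new JobQueue("hadoop", 30, 6);  // usage 0.20
        JobQueue hive = new JobQueue("hive", 30, 15);     // usage 0.50
        def.jobs.add(new Job("j0", "dave", 1024, 3));
        hadoop.jobs.add(new Job("j1", "alice", 1024, 4));
        hadoop.jobs.add(new Job("j2", "bob", 4096, 2));
        hive.jobs.add(new Job("j3", "carol", 1024, 5));
        return List.of(def, hadoop, hive);
    }

    public static void main(String[] args) {
        // alice is at her limit and j2 needs more memory than the node has free,
        // so the slot goes to j3 in the next least-used queue.
        System.out.println(assignSlot(sampleQueues(), 2048, Set.of("alice")));  // prints j3
    }
}
```

The least-used queue is tried first, but the scheduler moves on to the next queue as soon as none of its jobs passes both checks.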
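4. Capacity Scheduler configuration example
①. Copy $HADOOP_HOME/contrib/capacity-scheduler/hadoop-capacity-scheduler.jar into the $HADOOP_HOME/lib directory
②. Edit conf/mapred-site.xml on the NameNode host (the scheduler is loaded by the JobTracker, so this has to be the node on which the JobTracker runs)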
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
<name>mapred.queue.names</name>
<value>default,hadoop,hive</value>
</property>
③. Edit the conf/capacity-scheduler.xml configuration file
<?xml version="1.0"?>
<!-- This is the configuration file for the resource manager in Hadoop. -->
<!-- You can configure various scheduling parameters related to queues. -->
<!-- The properties for a queue follow a naming convention, such as, -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->
<configuration>
<!-- Capacity scheduler Job Initialization configuration parameters -->
<property>
<name>mapred.capacity-scheduler.init-poll-interval</name>
<value>5000</value>
<description>The amount of time in milliseconds which is used to poll the job queues for jobs to initialize.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.init-worker-threads</name>
<value>5</value>
<description>Number of worker threads used by the
initialization poller to initialize jobs in a set of queues.
If the number given here equals the number of job queues,
a single thread initializes the jobs of each queue. If it is smaller,
each thread is assigned a set of queues. If it is
greater, the number of threads equals the number of
job queues.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.maximum-system-jobs</name>
<value>30</value>
<description>Maximum number of jobs in the system which can be initialized,
concurrently, by the Capacity Scheduler.
</description>
</property>
<!--hadoop queue-->
<property>
<name>mapred.capacity-scheduler.queue.hadoop.capacity</name>
<value>30</value>
<description>Percentage of the number of slots in the cluster that are to be available for jobs in this queue.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.maximum-capacity</name>
<value>-1</value>
<description>
</description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.supports-priority</name>
<value>true</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.minimum-user-limit-percent</name>
<value>100</value>
<description> </description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.user-limit-factor</name>
<value>3</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks</name>
<value>200000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hadoop.init-accept-jobs-factor</name>
<value>10</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name>
<value>5</value>
<description>The maximum number of jobs to be pre-initialized for a user
of the job queue.
</description>
</property>
<!-- hive -->
<property>
<name>mapred.capacity-scheduler.queue.hive.capacity</name>
<value>30</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>
<value>-1</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.supports-priority</name>
<value>true</value>
<description>If true, priorities of jobs will be taken into account in scheduling decisions.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.minimum-user-limit-percent</name>
<value>100</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.user-limit-factor</name>
<value>4</value>
<description>The multiple of the queue capacity which can be configured to allow a single user to acquire more slots.
</description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks</name>
<value>200000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.hive.init-accept-jobs-factor</name>
<value>10</value>
<description></description>
</property>
<!-- default -->
<property>
<name>mapred.capacity-scheduler.queue.default.capacity</name>
<value>40</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
<value>-1</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.supports-priority</name>
<value>true</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
<value>100</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
<value>4</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
<value>200000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
<value>100000</value>
<description></description>
</property>
<property>
<name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
<value>10</value>
<description></description>
</property>
</configuration>
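To make the numbers in this configuration concrete, the arithmetic behind a few of the properties can be sketched as follows. This is an illustration based on the property descriptions above, not code from Hadoop: the class, the method names and the cluster size of 100 slots are assumptions, and a maximum-capacity of -1 is treated as "no upper limit for the queue".

```java
public class CapacityConfigSketch {
    // Slots guaranteed to a queue: "capacity" percent of the cluster's slots.
    static int guaranteedSlots(int clusterSlots, int capacityPercent) {
        return clusterSlots * capacityPercent / 100;
    }

    // Most slots a single user can hold: user-limit-factor times the queue
    // capacity, bounded by the cluster size (maximum-capacity is -1 here).
    static int maxSlotsPerUser(int clusterSlots, int capacityPercent, int userLimitFactor) {
        int scaled = guaranteedSlots(clusterSlots, capacityPercent) * userLimitFactor;
        return Math.min(clusterSlots, scaled);
    }

    // Initialization threads actually used: never more than one per queue.
    static int initThreads(int configuredThreads, int queueCount) {
        return Math.min(configuredThreads, queueCount);
    }

    // Queue capacities are percentages, so together they must not exceed 100.
    static boolean capacitiesValid(int... percents) {
        int sum = 0;
        for (int p : percents) {
            sum += p;
        }
        return sum <= 100;
    }

    public static void main(String[] args) {
        int clusterSlots = 100;  // assumed cluster size, not taken from the article
        System.out.println(guaranteedSlots(clusterSlots, 30));     // hadoop queue: prints 30
        System.out.println(maxSlotsPerUser(clusterSlots, 30, 3));  // hadoop queue: prints 90
        System.out.println(maxSlotsPerUser(clusterSlots, 40, 4));  // default queue: prints 100
        System.out.println(initThreads(5, 3));                     // prints 3
        System.out.println(capacitiesValid(30, 30, 40));           // prints true
    }
}
```

With three queues configured, the five init-worker-threads requested above would therefore be reduced to three, one per queue.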