
The Art of Fairness: Balancing Workloads in YARN Queues#

YARN (Yet Another Resource Negotiator) is at the heart of many modern data processing platforms, providing resource management and job scheduling capabilities. One of the critical challenges for administrators is ensuring that jobs enjoy fair access to cluster resources, regardless of size or priority. This blog post walks you through everything you need to know about fairness in YARN queues, from the building blocks of YARN to advanced scheduling configurations. By the end, you will have a robust understanding of how to balance workloads and optimize your cluster usage in a fair manner.

Table of Contents#

  1. Introduction to YARN
  2. Understanding YARN Scheduling Basics
  3. Fair Scheduler Overview
  4. Configuring Fair Schedulers
  5. Queue Structures and Hierarchies
  6. Managing Resource Allocation
  7. Examples and Code Snippets
  8. Advanced Concepts
  9. Monitoring and Troubleshooting
  10. Best Practices
  11. Conclusion

Introduction to YARN#

Before delving into fairness, let’s begin with a high-level overview of YARN, the resource framework built into Hadoop. YARN was introduced in Hadoop 2.0, separating resource management from job execution. This separation allows different frameworks—such as Spark, MapReduce, or Tez—to all leverage YARN as their underlying resource negotiator.

Why YARN Matters#

YARN addresses the scalability limitations of earlier Hadoop versions by providing a centralized resource manager. It supports:

  • Multi-tenancy: Multiple teams or business units can share a single cluster.
  • Elasticity: Easily scale up or down as demand changes.
  • Flexibility: Run various data processing engines (e.g., Spark, Hive, MapReduce).
  • Workload isolation: Allocate resources intelligently so that no single job starves others.

Key Components#

YARN’s architecture includes two core components:

  1. ResourceManager (RM): Manages cluster resources, scheduling, and job lifecycle events.
  2. NodeManager (NM): Monitors resource usage on a specific node and reports to the RM.

Within the ResourceManager, a pluggable scheduler (Fair Scheduler, Capacity Scheduler, or FIFO) decides how resources are distributed among different workloads. For the purposes of this post, we focus on the Fair Scheduler.

Understanding YARN Scheduling Basics#

What Is a Scheduler?#

A YARN scheduler is responsible for allocating resources to running applications. Each scheduler implements a specific policy:

  • FIFO Scheduler: The simplest approach; queues jobs in first-in-first-out order.
  • Capacity Scheduler: Divides cluster resources into partitions (queues) with configurable capacities and hierarchical structures.
  • Fair Scheduler: Dynamically allocates resources so that all running applications receive a fair share.

Fairness vs. Capacity#

While the Capacity Scheduler focuses on guaranteeing minimum capacities for different queues, the Fair Scheduler aims to balance out resource allocations among all jobs. These are not mutually exclusive concepts—Capacity Scheduler can be configured with properties that offer fairness—but the Fair Scheduler is often chosen where static allocations are either too rigid or less desirable.

Fair Scheduler Overview#

What Is the Fair Scheduler?#

The Fair Scheduler is designed to give each running application equal weighting over time. If a user submits multiple applications, the Fair Scheduler spreads available resources among these applications to ensure no single process monopolizes the cluster. When a running job finishes or relinquishes resources, new jobs can use those freed resources immediately.

Fairness Algorithms#

Two commonly referenced concepts when discussing fairness:

  1. Dominant Resource Fairness (DRF): Considers multiple resource dimensions (CPU, memory, etc.) and attempts to balance usage across them.
  2. Weight-based Fairness: Administrators can adjust the “weight” of particular queues or users to prioritize certain types of workloads.
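The DRF idea can be illustrated with a small sketch. This is not YARN's implementation; the cluster totals, usage figures, and function names below are hypothetical, chosen only to show how a "dominant share" is computed and compared.

```python
# Illustrative sketch of Dominant Resource Fairness (DRF), not YARN's
# actual code: each user's "dominant share" is their largest usage
# fraction across resource types, and the scheduler favors the user
# with the smallest dominant share.

CLUSTER = {"memory_mb": 10240, "vcores": 8}  # hypothetical cluster totals

def dominant_share(usage):
    """Largest fraction of any single resource a user consumes."""
    return max(usage[r] / CLUSTER[r] for r in CLUSTER)

def next_to_schedule(usages):
    """Pick the user with the lowest dominant share."""
    return min(usages, key=lambda user: dominant_share(usages[user]))

usages = {
    "alice": {"memory_mb": 4096, "vcores": 1},  # dominant: 4096/10240 = 0.4
    "bob":   {"memory_mb": 1024, "vcores": 4},  # dominant: 4/8 = 0.5
}
print(next_to_schedule(usages))  # alice has the lower dominant share
```

Even though alice uses far more memory than bob, her *dominant* resource (memory) is a smaller fraction of the cluster than bob's dominant resource (CPU), so DRF offers her the next allocation.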

Advantages of the Fair Scheduler#

  • More flexible than FIFO or a simple capacity-based approach.
  • Supports preemption, ensuring that no job remains starved indefinitely.
  • Facilitates multi-tenant environments where different teams compete for resources.
  • Offers advanced configurations to customize fairness (e.g., minimum shares, maximum shares, weights).

Configuring Fair Schedulers#

Typically, the Fair Scheduler is configured in two main files:

  1. yarn-site.xml: Specifies which scheduler (Fair Scheduler) YARN should use.
  2. fair-scheduler.xml: Defines the configuration details, such as pools (queues), weights, min/max resources, and preemption settings.

Step-by-Step Configuration#

  1. Enable the Fair Scheduler:
    In yarn-site.xml, set the following property:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
  2. Configure fair-scheduler.xml:
    You can place fair-scheduler.xml in the Hadoop configuration directory. A basic skeleton might look like this:

    <allocations>
      <queue name="root">
        <queue name="default">
          <minResources>1024 mb,1 vcores</minResources>
        </queue>
        <queue name="production">
          <minResources>2048 mb,2 vcores</minResources>
        </queue>
        <queue name="development">
          <minResources>1024 mb,1 vcores</minResources>
        </queue>
      </queue>
    </allocations>
  3. Set Default Queue:
    Typically, you want a default queue to catch jobs that aren’t submitted to any particular queue; this queue is usually named “default”.

  4. Validate Configuration:
    Restart the ResourceManager. Check the ResourceManager UI to ensure the Fair Scheduler is active and the queues are visible.
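As a sanity check beyond the UI, the ResourceManager also exposes scheduler state over its REST API (typically http://&lt;rm-host&gt;:8088/ws/v1/cluster/scheduler). The exact JSON shape varies by Hadoop version; the sketch below parses a simplified, hypothetical sample payload just to show what to look for.

```python
import json

# Hedged sketch: the payload below is a simplified, made-up sample of
# the RM's /ws/v1/cluster/scheduler response, used only to show the
# idea of verifying the active scheduler type and the visible queues.
sample = json.loads("""
{"scheduler": {"schedulerInfo": {"type": "fairScheduler",
  "rootQueue": {"queueName": "root",
    "childQueues": [{"queueName": "root.default"},
                    {"queueName": "root.production"},
                    {"queueName": "root.development"}]}}}}
""")

info = sample["scheduler"]["schedulerInfo"]
assert info["type"] == "fairScheduler", "Fair Scheduler is not active"
queues = [q["queueName"] for q in info["rootQueue"]["childQueues"]]
print(queues)
```

In practice you would fetch the real payload with curl or Python's urllib and adapt the field names to what your Hadoop version returns.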

Queue Structures and Hierarchies#

Queues in Fair Scheduler are often referred to as “pools.” Administrators can create simple or hierarchical queue structures:

  • Flat Structure: A small number of top-level queues (e.g., default, production, development).
  • Hierarchical Structure: Each queue can have children, grandchildren, etc., allowing for more fine-grained resource distribution.

Elements Within a Queue#

When you define a queue in fair-scheduler.xml, you can specify various attributes:

  • minResources: Guaranteed minimum resources.
  • maxResources: A hard upper bound on the resources a queue may consume, even when the rest of the cluster is idle.
  • weight: A relative priority factor. The higher the weight, the more resources allocated to that queue when there’s competition.
  • aclSubmitApps: An access control list specifying who can submit jobs.
  • aclAdministerApps: Who can administer the queue.

Example of a Hierarchical Setup#

<allocations>
  <queue name="root">
    <minResources>2048 mb,2 vcores</minResources>
    <queue name="analytics">
      <minResources>4096 mb,4 vcores</minResources>
      <weight>2.0</weight>
    </queue>
    <queue name="sla">
      <minResources>1024 mb,1 vcores</minResources>
      <queue name="highpriority">
        <minResources>2048 mb,2 vcores</minResources>
        <weight>3.0</weight>
      </queue>
      <queue name="lowpriority">
        <minResources>512 mb,1 vcores</minResources>
        <weight>0.5</weight>
      </queue>
    </queue>
  </queue>
</allocations>

In this example:

  • The root queue guarantees 2048 MB and 2 vcores collectively.
  • The “analytics” queue offers a higher weight (2.0) relative to others.
  • Under “sla,” you have two child queues: “highpriority” (with weight=3.0) and “lowpriority” (weight=0.5).

Managing Resource Allocation#

Sharing Policy#

By default, the Fair Scheduler tries to share resources evenly across queues. If multiple queues are active, it attempts to allocate resources such that each queue receives a fair share according to its weight. When a queue is idle, other queues can borrow its capacity.

Preemption#

Preemption allows the scheduler to reclaim resources from running applications when the cluster is overcommitted and other queues are not getting their fair share. Some key points:

  • yarn.scheduler.fair.preemption (yarn-site.xml): Enables or disables preemption cluster-wide.
  • Preemption timeouts (fair-scheduler.xml): fairSharePreemptionTimeout and minSharePreemptionTimeout control how long a queue may sit below its fair or minimum share before the scheduler forcibly reclaims resources from over-allocated queues.

You enable preemption in the yarn-site.xml configuration file and set the timeouts (in seconds) in fair-scheduler.xml. For instance:

<!-- yarn-site.xml -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml -->
<allocations>
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
</allocations>

Weight-Based Scheduling#

Instead of using strict resource allocations, you can assign weights to queues. If queue A has a weight of 2.0, and queue B has a weight of 1.0, and both queues are competing for resources, queue A receives approximately two-thirds of the available resources while queue B receives one-third.
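The arithmetic behind this split can be sketched in a few lines of Python (illustrative only; fair_shares is a made-up helper, not a YARN API):

```python
# Sketch of how weights translate into fractional shares when queues
# compete: each active queue gets weight / (sum of active weights).
def fair_shares(weights, total):
    active = sum(weights.values())
    return {q: total * w / active for q, w in weights.items()}

# Queue A (weight 2.0) vs. queue B (weight 1.0) over 90 GB of memory:
shares = fair_shares({"A": 2.0, "B": 1.0}, total=90)
print(shares)  # A gets two-thirds (60), B one-third (30)
```

If a third queue became active, the denominator would grow and every queue's share would shrink proportionally, which is exactly why idle queues' capacity is "borrowable."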

User and Job-Level Fairness#

Within a queue, the Fair Scheduler can also distribute resources among applications from different users. Settings such as userMaxAppsDefault (per user) and queueMaxAppsDefault (per queue) cap the number of running applications, preventing a single user's runaway job submissions from crowding out everyone else.
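A rough sketch of the effect of a per-user application cap (not YARN's code; admit is a hypothetical helper):

```python
# Illustrative sketch of a userMaxAppsDefault-style cap: applications
# beyond a user's limit stay pending instead of running.
def admit(running_by_user, user, user_max_apps):
    """Return True if the user's new app may start running now."""
    return running_by_user.get(user, 0) < user_max_apps

running = {"alice": 5}
print(admit(running, "alice", user_max_apps=5))  # False: cap reached
print(admit(running, "bob", user_max_apps=5))    # True: no apps yet
```

Apps over the cap aren't rejected; they simply wait in the queue until one of the user's running applications completes.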

Examples and Code Snippets#

Example 1: Basic Fair Scheduler Allocation#

<allocations>
  <queue name="root">
    <queue name="default">
      <minResources>1024 mb,1 vcores</minResources>
      <weight>1.0</weight>
    </queue>
    <queue name="production">
      <minResources>2048 mb,2 vcores</minResources>
      <weight>2.0</weight>
    </queue>
  </queue>
</allocations>

In this setup:

  • “default” queue has a weight of 1.0.
  • “production” queue has a weight of 2.0.

If both queues are active, “production” will get roughly twice the resources of “default.”

Example 2: Using Preemption in Fair Scheduler#

<!-- yarn-site.xml -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml -->
<allocations>
  <defaultFairSharePreemptionTimeout>30</defaultFairSharePreemptionTimeout>
</allocations>

  • Preemption is turned on.
  • If a queue has been waiting below its fair share for more than 30 seconds, the scheduler can take resources from a queue that is consuming more than its share.
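The decision the scheduler makes can be sketched roughly as follows (illustrative Python, not YARN's implementation; the field names are made up):

```python
# Illustrative sketch of the preemption decision: once a starved queue
# has waited past the timeout, the scheduler reclaims the shortfall
# from queues running over their fair share.
def resources_to_preempt(queue, now_ms, timeout_ms=30000):
    """Return how much memory (MB) to reclaim for a starved queue."""
    starved = queue["usage"] < queue["fair_share"]
    waited = now_ms - queue["starved_since_ms"]
    if starved and waited > timeout_ms:
        return queue["fair_share"] - queue["usage"]
    return 0

q = {"usage": 1024, "fair_share": 4096, "starved_since_ms": 0}
print(resources_to_preempt(q, now_ms=45000))  # 3072 MB to reclaim
```

Note the two-part condition: being below fair share alone is not enough; the shortfall must also have persisted past the timeout, which is what keeps preemption from thrashing on momentary imbalances.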

Example 3: Fine-Tuning with ACLs#

<allocations>
  <queue name="root">
    <queue name="marketing">
      <aclSubmitApps>marketing_user1,marketing_group1</aclSubmitApps>
      <aclAdministerApps>marketing_admin</aclAdministerApps>
    </queue>
    <queue name="finance">
      <aclSubmitApps>finance_user1,finance_group1</aclSubmitApps>
      <aclAdministerApps>finance_admin</aclAdministerApps>
    </queue>
  </queue>
</allocations>

Users listed under aclSubmitApps can submit jobs to the queue, while aclAdministerApps users can manage (kill, move) those applications.
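The check itself amounts to simple membership testing. The sketch below is illustrative only: YARN's real ACL strings separate users from groups ("user1,user2 group1,group2"), while this simplified version treats them as one combined set.

```python
# Hedged sketch of an aclSubmitApps-style check: a submission is allowed
# if the user, or any of the user's groups, appears in the queue's ACL.
def may_submit(user, groups, acl_entries):
    return user in acl_entries or any(g in acl_entries for g in groups)

acl = {"marketing_user1", "marketing_group1"}
print(may_submit("marketing_user1", [], acl))        # True: listed user
print(may_submit("bob", ["marketing_group1"], acl))  # True: via group
print(may_submit("eve", ["finance_group1"], acl))    # False: no match
```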

Advanced Concepts#

Fair Scheduler Policies#

Fair Scheduler supports multiple scheduling policies at the queue level:

  • Fair (Default): Attempts to balance resource distribution.
  • FIFO: Processes applications in first-come, first-served order within the queue.
  • DRF: Dominant Resource Fairness, considering multiple resource types simultaneously.

To configure a scheduling policy:

<allocations>
  <queue name="root">
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- child queues -->
  </queue>
</allocations>

Queue-Level Policies#

You can override the root queue’s scheduling policy at the child queue level, allowing for a hybrid approach. For example, your root queue might use DRF, but a child queue might use FIFO:

<queue name="development">
  <schedulingPolicy>fifo</schedulingPolicy>
</queue>

Reservation System#

YARN also supports a Reservation System (primarily used with the Capacity Scheduler) that can pre-allocate resources to specific jobs or workflows. Though less common with Fair Scheduler, understanding that reservations can block out a chunk of capacity for critical workloads can help in advanced multi-tenant scenarios. With the Fair Scheduler, you would typically rely on minResources and preemption to protect critical jobs.

Multiple Resource Types#

Modern data processing involves more than just CPU or memory. You might have GPU resources, specialized co-processors, or other constraints. With DRF-based scheduling, you can account for multiple resource dimensions. Ensure you enable YARN node resource capabilities if you’re dealing with GPUs:

<!-- resource-types.xml -->
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu</value>
</property>

Then, each queue can define minimum GPU resources:

<minResources>4096 mb,2 vcores,1 yarn.io/gpu</minResources>

Fairness in Container Allocation#

YARN schedules resources at the container level. Each application can request containers with specific resource requirements. The Fair Scheduler tries to meet these requests without starving other applications or queues. You can also cap container sizes per queue to prevent a single container from hoarding excessive resources.
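The idea of a per-queue container cap can be sketched as follows (illustrative Python; the fits_queue_cap helper is hypothetical, and in recent Hadoop releases the corresponding Fair Scheduler queue element is maxContainerAllocation):

```python
# Sketch of a per-queue container-size cap: a request is admissible
# only if every requested resource fits under the queue's cap.
def fits_queue_cap(request, cap):
    return all(request[r] <= cap[r] for r in cap)

cap = {"memory_mb": 8192, "vcores": 4}
print(fits_queue_cap({"memory_mb": 4096, "vcores": 2}, cap))   # True
print(fits_queue_cap({"memory_mb": 16384, "vcores": 2}, cap))  # False
```

A request over the cap is rejected before placement is even attempted, so one oversized container can never consume a disproportionate slice of the queue.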

Monitoring and Troubleshooting#

ResourceManager UI#

The ResourceManager UI provides an overview of:

  • Active queues.
  • Running applications in each queue.
  • Real-time resource usage (memory, CPU).
  • Preemption events (in certain Hadoop distributions).

Fair Scheduler Logs#

Enabling debug logs for the Fair Scheduler can help diagnose complex issues. You can adjust logging in log4j.properties (or log4j2.properties in newer Hadoop releases). Common log categories:

  • org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

Common Issues#

  1. Jobs stuck in ‘ACCEPTED’ state: Often indicates insufficient resources or misconfigured queue capacities.
  2. Starving queues: Check that the weights or minResources for the queue are adequate. Consider enabling preemption.
  3. Excessive preemption: If preemption events happen too frequently, you might increase the preemption timeouts or reduce the concurrency in certain queues.

Best Practices#

  1. Avoid Over-Provisioning: Resist the temptation to set very high minResources for all queues, as it can lead to wasted capacity.
  2. Use Weights Judiciously: Balancing the weight values can be tricky. Begin with small increments.
  3. Leverage Preemption Carefully: Preemption is powerful but can lead to application instability if not configured with proper timeouts.
  4. Define a Default Queue: Ensure all unspecified applications land somewhere safe.
  5. Monitor Regularly: YARN scheduling is not a set-and-forget proposition. Monitor performance, ephemeral usage spikes, and queue-level metrics.
  6. Test with Representative Workloads: Apply different test scenarios (large, small, interactive, batch) to see how the Fair Scheduler behaves under pressure.

Conclusion#

The Fair Scheduler in YARN is a potent mechanism for managing multi-tenant environments and distributing limited resources equitably across various departments, teams, or applications. By configuring the Fair Scheduler properly—through minResources, maxResources, weights, and preemption policies—you can maintain high cluster utilization while ensuring that no single workload monopolizes resources.

From basic concepts of YARN architecture to advanced scheduling algorithms like DRF, you now have the foundational and advanced knowledge to plan and implement fairness in your cluster. Continue exploring the nuances of YARN’s configuration files, adopt best practices, and, most importantly, monitor your cluster’s performance. By understanding your workloads and meticulously fine-tuning your Fair Scheduler setup, you will master the art of balancing workloads in YARN queues—ultimately maximizing the value of your data infrastructure while keeping peace among your different teams and use cases.

https://science-ai-hub.vercel.app/posts/8581bf23-2ad5-4f94-954a-e33bd83a5bb1/6/
Author
AICore
Published at
2025-01-09
License
CC BY-NC-SA 4.0