Slack scheduling brings 100% resource utilization to safety-critical systems

To guarantee resource availability for time-critical tasks, safety-critical RTOSs utilize time partitioning. Most implementations, however, are extremely inefficient, reserving large amounts of CPU time for critical tasks, even if those tasks utilize only a fraction of the reserved time in practice. Slack scheduling provides an alternative to conventional time-partitioned scheduling that greatly enhances efficiency, enabling software designers to leverage the full power of today’s modern processors without sacrificing the guaranteed resource availability of time partitioning.

Time-partitioned real-time operating systems guarantee that a specific computation will have access to the CPU for a specific amount of time (also known as a “budget”) at a bounded, deterministic location within a timeline or hyperperiod. This guarantee is a key enabler for developing highly integrated systems that allow software of varying degrees of criticality to coexist on the same platform. High-criticality computations are guaranteed access to the CPU even in the face of misbehaving lower-criticality software. In other words, one can always ensure that high-criticality applications have the CPU time they demand by budgeting for their worst-case execution. This approach works well for periodic computations that have a small deviation between their worst-case execution and their nominal-case execution.

However, as Figure 1 shows, this guarantee comes at a price. By allocating worst-case execution times, the scheduler ensures worst-case performance every time. From a CPU bandwidth perspective, it is as if every computation experiences its worst-case execution each time the hyperperiod executes. This leads to the all-too-common situation in which the CPU budget is depleted, with no more time left to allocate in the hyperperiod, while profiling shows that average CPU utilization is actually only 50 percent or less in some scenarios.

Figure 1: Budgeting high-criticality applications for worst-case CPU times comes at a price, as illustrated by the unused CPU time in this traditional ARINC-653 scheduler.

Actual computation time rarely equals the worst-case allocated time. Even more improbable is that multiple computations experience their worst-case execution times during the same period. Thus, for most computations, unused CPU budget shows up as wasted CPU idle time.
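
To make the gap between budgeted and actual CPU use concrete, the following minimal Python sketch adds up worst-case budgets and typical execution times for one hyperperiod. The thread names, the 100 ms hyperperiod, and all timing figures are hypothetical values chosen for illustration; they do not come from any real system.

```python
# Hypothetical thread set: (name, worst-case budget in ms, typical execution in ms).
# All figures are illustrative assumptions, not measurements.
HYPERPERIOD_MS = 100.0

threads = [
    ("flight_control", 40.0, 22.0),
    ("navigation", 35.0, 15.0),
    ("display", 25.0, 12.0),
]

# Time reserved by the scheduler versus time the threads typically consume.
budgeted_ms = sum(budget for _, budget, _ in threads)
typical_ms = sum(typical for _, _, typical in threads)

budgeted_pct = 100.0 * budgeted_ms / HYPERPERIOD_MS
typical_pct = 100.0 * typical_ms / HYPERPERIOD_MS
idle_pct = 100.0 - typical_pct

print(f"budgeted: {budgeted_pct:.0f}%  typical use: {typical_pct:.0f}%  idle: {idle_pct:.0f}%")
# prints "budgeted: 100%  typical use: 49%  idle: 51%"
```

In this made-up case the hyperperiod is fully budgeted, so no new thread can be added, yet the CPU sits idle for roughly half of every hyperperiod.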

Further exacerbating the problem are the following common scenarios:

  • Aperiodic activities such as interrupts and client-server exchanges.
  • The CPU budget needed to guarantee safe execution is insufficient to meet customers’ desired performance needs.
  • The deviation between nominal-case and worst-case execution times is significant.

Slack scheduling technology addresses these common problems and enables developers to leverage the full power of today’s modern processors without sacrificing the safety of time and space partitioning. This innovative technology greatly reduces idle time, making it possible to achieve 100 percent CPU utilization.

Slack scheduling defined

What is slack? When considering slack, it is often helpful to think of a bank account. Deposits of time are made into the slack account. “Threads” or computations can withdraw time until the account balance reaches zero. Where do these deposits come from? There are two sources of slack time:

  • Budgeted CPU time that goes unused during a thread’s execution.
  • Unallocated CPU time (for example, when adding up the total budgeted CPU time, the sum for the hyperperiod is less than 100 percent).

At the beginning of each hyperperiod, the slack account has a balance equal to the total unallocated CPU time; this source of slack is deterministic. As threads execute and complete early (with respect to their worst-case budget), they make implicit deposits of their remaining unused budgeted time to the slack account; this source of slack is nondeterministic. Conversely, as threads execute and wish to use slack time, they make explicit withdrawals. The scheduler manages the deposits and withdrawals to ensure time partitioning and the ability to schedule the system.
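
The bank-account analogy can be sketched as a small Python class. This is a toy model under stated assumptions, not the Deos scheduler or its API: the class name `SlackAccount` and its methods are hypothetical, and the model ignores where in the timeline slack becomes available, which a real scheduler must track in order to preserve time partitioning.

```python
class SlackAccount:
    """Toy model of the slack 'bank account' for one hyperperiod (illustrative only)."""

    def __init__(self, hyperperiod_ms, budgets_ms):
        # Deterministic source of slack: time that no thread has budgeted.
        self.unallocated_ms = hyperperiod_ms - sum(budgets_ms)
        if self.unallocated_ms < 0:
            raise ValueError("budgets exceed the hyperperiod")
        self.balance_ms = 0.0
        self.start_hyperperiod()

    def start_hyperperiod(self):
        # The opening balance equals the total unallocated CPU time.
        self.balance_ms = self.unallocated_ms

    def deposit_unused(self, budget_ms, used_ms):
        # Implicit deposit: a thread that completes early donates the remainder
        # of its budget. This source of slack is nondeterministic.
        unused = max(0.0, budget_ms - used_ms)
        self.balance_ms += unused
        return unused

    def withdraw(self, requested_ms):
        # Explicit withdrawal: never grant more than the current balance.
        granted = min(requested_ms, self.balance_ms)
        self.balance_ms -= granted
        return granted


account = SlackAccount(100.0, [40.0, 30.0, 20.0])  # 10 ms unallocated
account.deposit_unused(40.0, 25.0)                 # thread finished 15 ms early
print(account.withdraw(30.0))                      # prints 25.0
```

The withdrawal asks for 30 ms but receives only the 25 ms on deposit (10 ms unallocated plus 15 ms unused), which mirrors the rule that threads can withdraw time only until the balance reaches zero.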

Using slack

How does one use slack? Programmers explicitly define threads as slack requesters at design time. All threads participate in generating slack (making implicit deposits of time into the account), but only slack requesters are allowed to consume slack (make explicit withdrawals from the slack account). A slack requester must first use all its budgeted CPU time. Once a slack requester (such as “thread Medium” in Figure 2) has depleted all of its budgeted CPU time, it is given immediate access to all available slack time, meaning all of the time in the slack account. The slack requester can use all or a portion of the available slack time at its discretion.

Figure 2: Once a slack requester (such as “thread Medium”) has depleted all its budgeted CPU time, it is given immediate access to all available slack time.

If the slack requester uses all of the available slack time and a subsequent thread generates slack (deposits into the slack account), the slack requester can be scheduled again and given access to the recently generated slack time. Since the scheduler always schedules the highest-priority thread that is ready to run, slack will be consumed by the highest-priority slack requester first. In this way, slack is a form of load shedding. All available CPU time can be used, as threads with nominal execution times less than their worst-case execution times will be generating time to be used by slack-requesting threads in addition to unallocated CPU time.
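
The selection rule described above (highest-priority ready thread first, with slack available only to slack requesters that have depleted their budgets) can be sketched as follows. The `Thread` record, the `pick_next` function, and the convention that a larger number means higher priority are assumptions made for this illustration, not part of any real RTOS interface.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    priority: int            # assumption: larger number = higher priority
    budget_left_ms: float    # remaining fixed budget in the current period
    slack_requester: bool    # declared at design time
    wants_to_run: bool = True


def pick_next(threads, slack_balance_ms):
    """Return (thread, time_source) for the highest-priority runnable thread, or None."""
    runnable = []
    for t in threads:
        if not t.wants_to_run:
            continue
        if t.budget_left_ms > 0:
            # A thread always consumes its own budget before touching slack.
            runnable.append((t, "budget"))
        elif t.slack_requester and slack_balance_ms > 0:
            # Budget depleted: only slack requesters may continue, on slack.
            runnable.append((t, "slack"))
    if not runnable:
        return None
    return max(runnable, key=lambda pair: pair[0].priority)


threads = [
    Thread("High", 3, 0.0, slack_requester=False),
    Thread("Medium", 2, 0.0, slack_requester=True),
    Thread("Low", 1, 0.0, slack_requester=True),
]
chosen, source = pick_next(threads, slack_balance_ms=5.0)
print(chosen.name, source)  # prints "Medium slack"
```

Here thread High has depleted its budget and is not a slack requester, so the available slack goes to Medium, the highest-priority slack requester. If the slack balance were zero, no thread would be runnable until another thread deposited unused time.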

One of the most common uses of slack is to remove the lowest-criticality applications from the high-criticality, fixed-budget timeline. This allows developers to run their low-criticality applications purely on slack. The Deos development environment provides a classic example of this. All of the Ethernet-based applications (the network stack, FTP server, and Telnet server), including the network’s interrupt service routine, execute purely on slack. Without slack scheduling, these applications would demand more than 50 percent of the budgeted CPU time in order to meet performance expectations. With slack, budgeted CPU time for these applications dropped 80 percent, with a 300 percent increase in performance.

Advantages of slack scheduling

One key advantage of slack scheduling that is particularly useful in the client-server arena is the ability for threads to execute multiple times within the same period. This nuance of slack scheduling allows a client thread and its server thread to exchange data, perhaps multiple times, back-to-back within the same period to complete a transaction. In contrast, in most time-partitioned schemes, clients must wait for their server thread to be scheduled. When a transaction requires multiple interactions, the delay can be significant. To reduce this delay, users are forced to provide scheduling functionality within their code in order to craft a hyperperiod timeline that balances the needs of time-critical applications with those of less time-critical client-server applications. Slack-based implementations enable users to deploy slack and let the scheduler take care of this balance for them.

Slack scheduling provides an efficient way of budgeting CPU time that not only ensures safety, but also enables developers to get the most out of their processor. For example, let’s say that a display requires a 10 Hz update rate in order to meet minimum safety requirements. While this update rate might be deemed safe, it could still fall short of customer expectations. In this case, the developer could meet the safety requirement by budgeting for a 10 Hz update rate, while utilizing slack to provide the highest possible average update rate to meet customer expectations.

Additionally, slack scheduling allows developers to address their software requirements in a way commonly used before the advent of time-partitioned operating systems. In particular, it enables them to assign requirements to low-priority threads and then use high-priority threads to monitor their activity and guarantee that the requirements are accomplished. For example, developers could meet their Continuous Built-In Test (CBIT) requirements by assigning the CBIT activity to a low-priority, pure slack thread and monitoring the CBIT thread’s completion rate from a high-priority, fixed CPU budget thread. The slack scheduler would help spread the CPU load across the timeline (instead of the developer having to “play scheduler”) while the high-priority, fixed-budget thread guarantees that the activity occurs as required.
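
The CBIT monitoring pattern might look like the sketch below. It assumes the slack-driven CBIT thread increments a shared counter each time it completes a full test pass, and that the fixed-budget monitor thread samples that counter once per monitoring window. The `CbitMonitor` class and the two-passes-per-window threshold are hypothetical.

```python
class CbitMonitor:
    """Checks that a slack-driven CBIT thread keeps up a minimum completion rate."""

    def __init__(self, min_passes_per_window):
        self.min_passes = min_passes_per_window
        self.last_seen = 0

    def check(self, completed_passes):
        # Called once per monitoring window from the high-priority,
        # fixed-budget thread with the CBIT thread's running pass counter.
        done_this_window = completed_passes - self.last_seen
        self.last_seen = completed_passes
        return done_this_window >= self.min_passes


monitor = CbitMonitor(min_passes_per_window=2)
print(monitor.check(3))  # prints True  (3 passes completed in the first window)
print(monitor.check(4))  # prints False (only 1 new pass, so the rate is too low)
```

What the monitor does on a failed check (raise a fault, or run the test from its own budget, for example) is a design decision the sketch leaves open.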

Slack scheduling meets critical timelines

Advanced safety-critical RTOSs like DDC-I’s Deos give software designers the ability to factor computations into slack scheduling and/or high-criticality timelines. This new technology enables them to leverage all the power of today’s modern processors, without sacrificing the safety of space and time partitioning.

Bill Cronk manages DDC-I’s Deos product line. He has previously worked as a software engineer with Kutta Technologies and as a technical manager with Honeywell Aerospace. For more information, contact sales@ddci.com.

DDC-I, Inc. 602-275-7172 www.ddci.com