Load Average (Linux/SysV)

Many tools (uptime, top, xload) report the system's load average as a measure of how much work the system is doing. In practice, load is not necessarily a good indicator of this. This document details exactly what is meant by the load and how it is calculated. The analysis is based closely on Linux but most likely applies to all System V-type UNIXes.

First, let's look at what the man pages have to say:

Excerpt From: man top

      
Linux                       Feb 1 1993                          1

TOP(1)              Linux Programmer's Manual              TOP(1)

The load averages are the average number of process ready to run
during the last 1, 5 and 15 minutes.

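From this you might expect the load to have something to do with the scheduler keeping track of the number of processes in the ready queue at certain time intervals. This is reasonable but not the whole story. Every 5 seconds the kernel counts the tasks that are runnable, in uninterruptible sleep, or swapping, so processes that are blocked (typically on I/O) are included as well as those ready to run. Each count is then folded into three exponentially damped averages with time constants of 1, 5, and 15 minutes, rather than into a plain arithmetic mean. In this way you get several overviews of the system load: now, recently, and over a longish time.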

The following excerpts are actual code from Linux kernel version 2.0.28, showing how the load is calculated.

From: /usr/src/linux/kernel/info.c

        struct sysinfo val;

        val.loads[0] = avenrun[0] << (SI_LOAD_SHIFT - FSHIFT);
        val.loads[1] = avenrun[1] << (SI_LOAD_SHIFT - FSHIFT);
        val.loads[2] = avenrun[2] << (SI_LOAD_SHIFT - FSHIFT);

From: /usr/src/linux/include/linux/kernel.h

#define SI_LOAD_SHIFT   16
struct sysinfo {
        long uptime;                    /* Seconds since boot */
        unsigned long loads[3];         /* 1, 5, and 15 minute load averages */
        unsigned long totalram;         /* Total usable main memory size */
        unsigned long freeram;          /* Available memory size */
        unsigned long sharedram;        /* Amount of shared memory */
        unsigned long bufferram;        /* Memory used by buffers */
        unsigned long totalswap;        /* Total swap space size */
        unsigned long freeswap;         /* swap space still available */
        unsigned short procs;           /* Number of current processes */
        char _f[22];                    /* Pads structure to 64 bytes */
};

From: /usr/src/linux/include/linux/sched.h

/*
 * These are the constant used to fake the fixed-point load-average
 * counting. Some notes:
 *  - 11 bit fractions expand to 22 bits by the multiplies: this gives
 *    a load-average precision of 10 bits integer + 11 bits fractional
 *  - if you want to count load-averages more often, you need more
 *    precision, or rounding will get you. With 2-second counting freq,
 *    the EXP_n values would be 1981, 2034 and 2043 if still using only
 *    11 bit fractions.
 */
extern unsigned long avenrun[];         /* Load averages */

#define FSHIFT          11              /* nr of bits of precision */
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ)          /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */

#define CALC_LOAD(load,exp,n) \
        load *= exp; \
        load += n*(FIXED_1-exp); \
        load >>= FSHIFT;

From: /usr/src/linux/fs/proc/array.c

#define LOAD_INT(x) ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)

static int get_loadavg(char * buffer)
{
        int a, b, c;

        a = avenrun[0] + (FIXED_1/200);
        b = avenrun[1] + (FIXED_1/200);
        c = avenrun[2] + (FIXED_1/200);
        return sprintf(buffer,"%d.%02d %d.%02d %d.%02d %d/%d %d\n",
                LOAD_INT(a), LOAD_FRAC(a),
                LOAD_INT(b), LOAD_FRAC(b),
                LOAD_INT(c), LOAD_FRAC(c),
                nr_running, nr_tasks, last_pid);
}

From: /usr/src/linux/kernel/sched.c

/*
 * Hmm.. Changed this, as the GNU make sources (load.c) seems to
 * imply that avenrun[] is the standard name for this kind of thing.
 * Nothing else seems to be standardized: the fractional size etc
 * all seem to differ on different machines.
 */
unsigned long avenrun[3] = { 0,0,0 };

/*
 * Nr of active tasks - counted in fixed-point numbers
 */
static unsigned long count_active_tasks(void)
{
        struct task_struct **p;
        unsigned long nr = 0;

        for(p = &LAST_TASK; p > &FIRST_TASK; --p)
                if (*p && ((*p)->state == TASK_RUNNING ||
                           (*p)->state == TASK_UNINTERRUPTIBLE ||
                           (*p)->state == TASK_SWAPPING))
                        nr += FIXED_1;
#ifdef __SMP__

        nr-=(smp_num_cpus-1)*FIXED_1;
#endif                  
        return nr;
}

static inline void calc_load(unsigned long ticks)
{
        unsigned long active_tasks; /* fixed-point */
        static int count = LOAD_FREQ;

        count -= ticks;
        if (count < 0) {
                count += LOAD_FREQ;
                active_tasks = count_active_tasks();
                CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                CALC_LOAD(avenrun[2], EXP_15, active_tasks);
        }
}

See also: (NeXT Tip #35) Load Average, a very informative document that is based on NeXT's load average but is still relevant.


Jamie Marconi
Last modified: Mon Mar 3 23:20:14 PST