If you run a Linux server, you're probably familiar with the term load average (also called system load). Measuring the load average is critical to understanding how your servers are performing; if a server is overloaded, you need to kill or optimize the processes consuming the most resources, or provide more resources to balance the workload.
But how do you determine if your server has sufficient load capacity, and when should you be worried? Let's dive in and find out.
What is a load average?
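The load average is the average system load on a Linux server over a defined period of time. In other words, it is the demand placed on the server's CPU, counted as the sum of the running and the waiting threads. On Linux, the count also includes threads in uninterruptible sleep, which are typically waiting on disk I/O.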
Typically, the top or the uptime command will report the load average of your server with output that looks like:
load average: 0.5, 1.5, 3.0
These three numbers are the averages of the system load over the last one, five, and 15 minutes, in that order.
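As a minimal sketch of where these three values come from, the snippet below parses a line in the format of /proc/loadavg, the file Linux exposes for this purpose. The names `LoadAverage` and `parse_loadavg` and the sample line are illustrative assumptions, not part of any standard tool.

```python
from typing import NamedTuple


class LoadAverage(NamedTuple):
    """The three load averages, in the order Linux reports them."""
    one_min: float
    five_min: float
    fifteen_min: float


def parse_loadavg(line: str) -> LoadAverage:
    """Parse the first three fields of a /proc/loadavg-style line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least three fields, got: {line!r}")
    return LoadAverage(*(float(f) for f in fields[:3]))


# Sample line, made up for illustration. The trailing fields are the
# runnable/total task counts and the most recent PID; they are ignored here.
sample = "0.50 1.50 3.00 2/345 6789"
load = parse_loadavg(sample)
print(load.one_min, load.five_min, load.fifteen_min)  # prints: 0.5 1.5 3.0
```

On a live Unix system, Python's `os.getloadavg()` returns the same three values directly as a tuple, so parsing the file by hand is only needed when working from saved output.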
Before getting into how to interpret the load average output and what each of these values means, let's start with the simplest example: a server with a single core processor.
Breaking down the load
A server with a single core processor is like a single line of customers waiting to get their items billed in a grocery store. During peak hours, there is usually a long line and the waiting time for every individual is also high.
If you're the cashier and want to record the waiting time, one important metric would be the number of people waiting during a particular period of time. If there are no customers waiting, then the wait time is zero. On the other hand, if there is a long line of customers, then the wait time is high.
Applying that to the load average output (0.5, 1.5, 3.0) that we got above:
- 0.5 means the counter is busy only about half the time and nobody has to wait. Between 0.00 and 1.00, there is no need to worry. Your servers are safe!
- 1.5 means the counter is fully occupied and a queue is forming. If the average gets any higher, things are going to slow down noticeably.
- 3.0 means there's a considerably long queue waiting, and an extra resource/counter is required to clear the queue faster.
What you want is a queue/load average value between 0.00 and 1.00. So can we conclude that a load average of 1.00 is the limit, and anything above that is a call to troubleshoot? Well, although it's a safe bet, a more proactive approach is to leave some extra headroom to handle unexpected load.
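The single-core rule of thumb above can be sketched as a small check. The function name, the status labels, and the 0.7 headroom threshold are all assumptions chosen for illustration; the article only says to leave some headroom below 1.00, not how much.

```python
def classify_single_core_load(load: float, headroom_threshold: float = 0.7) -> str:
    """Classify a load average for a single-core server.

    headroom_threshold is a hypothetical value: loads above it but at or
    below 1.00 are flagged early so there is room for unexpected spikes.
    """
    if load < 0:
        raise ValueError("load average cannot be negative")
    if load <= headroom_threshold:
        return "ok"          # the core has spare capacity
    if load <= 1.0:
        return "watch"       # within capacity, but little headroom left
    return "overloaded"      # more work queued than one core can run


# The three sample values used in this article.
for value in (0.5, 1.5, 3.0):
    print(value, classify_single_core_load(value))
```

With the assumed threshold, 0.5 is reported as "ok" while 1.5 and 3.0 are both "overloaded"; the difference between those two is only how long the queue is.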
Multicores and multiprocessors to the rescue
Are a single quad core processor and a server with four processors (with one core each) the same? As far as the load average is concerned, yes. The main difference between multicore and multiprocessor is that the former refers to a single CPU having multiple cores, while the latter refers to multiple CPUs. To sum up: one quad core is equivalent to two dual cores, which is equivalent to four single cores.
The load average is relative to the number of cores available in the server, not to how they are spread out over CPUs. This means full utilization corresponds to a load of 1.00 on a single core, 2.00 on a dual core, 4.00 on a quad core, 8.00 on an octa-core, and so on. The load can climb above these values, which means threads are waiting.
Referring to the cashier example again, a load of 1.00 on a single core processor means the capacity is exactly used up, while on a dual core processor, a load of 1.50 means one line is full and the other is half full. Similarly, a load of 5.00 on a quad core processor is something to worry about, while on an octa-core processor, a load of 5.00 means the lines are only partly filled and there is still plenty of capacity available.
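One way to make loads comparable across machines is to divide the load average by the number of cores, so that 1.00 always means "exactly at capacity". The sketch below does this for the examples in this section; the function name `load_per_core` is an assumption made for illustration.

```python
def load_per_core(load: float, cores: int) -> float:
    """Return the load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("a server needs at least one core")
    return load / cores


# (load average, number of cores) pairs taken from the examples above.
examples = [(1.00, 1), (1.50, 2), (5.00, 4), (5.00, 8)]
for load, cores in examples:
    per_core = load_per_core(load, cores)
    status = "over capacity" if per_core > 1.0 else "within capacity"
    print(f"load {load:.2f} on {cores} core(s): {per_core:.3f} per core ({status})")
```

Only the quad core case exceeds 1.00 per core (5.00 / 4 = 1.25); the octa-core case sits at 0.625. On a live Linux host, `os.getloadavg()` and `os.cpu_count()` could supply the two inputs, with the caveat that `os.cpu_count()` counts logical CPUs, which may include hardware threads rather than physical cores.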