Understanding Device and Pseudo Swap

I thought this was a pretty good explanation of how pseudo swap and device swap are used on an HP-UX system. It was sent to me by HP Support.

When a process is spawned, the kernel checks virtual memory to see whether there is space to accommodate it. First, the kernel checks that enough RAM is available for the new process to run, meaning RAM that is not locked by other processes or used by the kernel. Then the kernel checks that the new process can reserve enough space in the swap area. If either of these tests fails, the process will not spawn or will be terminated, and you see a message such as "cannot fork, not enough space". All running processes have to be able to reserve their space from the swap space configured on the system.

Since Glance might not always work, especially during a memory crunch, I thought I would explain in detail what to look for in the output of the "swapinfo" command so you can monitor when usage spikes.

First, a little introduction to "pseudo swap". It is swap space that the operating system recognizes but that does not exist in reality. Pseudo swap is make-believe swap space. It does not exist in memory; it does not exist on disk; it does not exist anywhere. However, the operating system does recognize it, which means more swap space can be reserved than physically exists. The purpose of pseudo swap is to allow more processes to run in memory than could be supported by the swap device(s). Swap devices refer to both device swap and filesystem swap. Pseudo swap allows the operating system (specifically the kernel variable swap_avail) to recognize more swap space, thereby allowing additional processes to start when all of the physical swap has been reserved. Because the operating system recognizes more swap space than physically exists, customers with large memory systems can operate without having to purchase large amounts of swap space that they will most likely never use.

root@host:/root>swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096     100    3996    2%       0       -    1  /dev/vg00/lvol2
dev        4096       0    4096    0%       0       -    3  /dev/vg00/swap2
dev        4096       0    4096    0%       0       -    4  /dev/vg00/swap3
dev        4096       0    4096    0%       0       -    5  /dev/vg00/swap4   <<<--- device swap configured, 4096*4 = 16384
reserve       -    1158   -1158
memory    75954   75418     536   99%                                        <<<--- pseudo swap available = 75954
total     92338   76676   15662   83%       -       0    -

The "USED" figure for the "memory" line indicates that we have 75418Mb less for pseudo swap *reservation*. Remember that the allocation policy is to always reserve from the real swap devices first (swapspc_cnt) before using pseudo swap (swapmem_cnt). The pseudo swap counter is decremented whenever the kernel allocates dynamic memory for its own purposes or memory pages are locked by processes. Since these memory pages are no longer available to user processes, we adjust our initial estimate of 75% of memory by decrementing swapmem_cnt, which effectively reduces the amount of pseudo swap available for reservation. This is what the 75418Mb means. It has nothing to do with "used" swap disk blocks.

If the real swap counter (swapspc_cnt) is zero, i.e. it has been completely reserved, then we start decrementing the pseudo swap counter (swapmem_cnt). In this case, we are actually *reserving* swap space from pseudo swap for user processes. The "USED" field will increase, but that does not mean that swap disk blocks are allocated or used. It simply means that we have less swap space to reserve from.
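The reservation order described above (device swap first, pseudo swap only once device swap is exhausted) can be sketched as a toy model. This is not kernel code: the class and method names are hypothetical, only the counter names swapspc_cnt and swapmem_cnt come from the explanation, and the starting values are my reading of the swapinfo output (16384 - 100 - 1158 = 15126 Mb of unreserved device swap, and 536 Mb FREE on the "memory" line).

```python
class SwapCounters:
    """Toy model of the two reservation counters (values in Mb). Hypothetical names."""

    def __init__(self, swapspc_cnt, swapmem_cnt):
        self.swapspc_cnt = swapspc_cnt  # unreserved device/filesystem swap
        self.swapmem_cnt = swapmem_cnt  # unreserved pseudo swap

    def reserve(self, mb):
        """Reserve mb, taking device swap first, then pseudo swap.

        Returns True on success and False when the combined counters are too
        small, which is the situation behind "cannot fork, not enough space".
        """
        if mb > self.swapspc_cnt + self.swapmem_cnt:
            return False
        from_dev = min(mb, self.swapspc_cnt)
        self.swapspc_cnt -= from_dev
        self.swapmem_cnt -= mb - from_dev
        return True


# Assumed starting state, derived from the sample output (see lead-in).
counters = SwapCounters(swapspc_cnt=15126, swapmem_cnt=536)

print(counters.reserve(15000), counters.swapspc_cnt, counters.swapmem_cnt)  # True 126 536
print(counters.reserve(500), counters.swapspc_cnt, counters.swapmem_cnt)    # True 0 162
print(counters.reserve(200), counters.swapspc_cnt, counters.swapmem_cnt)    # False 0 162
```

The second call is the case discussed above: device swap runs out part way through, so the remainder comes out of the pseudo swap counter, and no disk block is touched by any of these calls.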

In the context of pseudo swap, what do "used" and "reserved" mean, specifically with regard to physical memory usage?

The column "USED" is really a misnomer for pseudo swap. It does not mean the same thing as "USED" for real device swap. For the latter it means disk blocks allocated. In the context of pseudo swap, used and reserved mean the same thing. Probably a more accurate translation of the "USED" field for pseudo swap is the amount that cannot be used for swap reservation. Let's examine the above swapinfo output further...

Here we have a total of 92338Mb for swap reservation, where real swap (swapspc_cnt) = 16384Mb (the total of the four "dev" lines) and pseudo swap (swapmem_cnt) = 75954Mb (the "memory" line).

Looking at the "dev" lines, a value of 0 in the "USED" column means there has been no physical paging to that device. In other words, a device shows usage only once paging to it has actually occurred. Here only /dev/vg00/lvol2 shows any (100Mb); the other three devices are untouched.

Looking at the "memory" line, the 75418Mb "USED" figure for pseudo swap (memory) indicates that either the kernel has allocated that much dynamic memory, or memory pages have been locked by processes. Consequently this amount is no longer available for swap reservation, and the swapmem_cnt counter is decremented accordingly. It is "used" in that sense ... but the "USED" term here does not mean that memory pages have been paged out! It simply means we have less space for swap reservation. The "FREE" column is simply AVAIL minus USED.

The "reserve" line shows how much swap has actually been reserved. In other words, it shows how much has been decremented from the swap counter. Note that we always reserve from the real swap counter (swapspc_cnt) before the pseudo swap counter (swapmem_cnt) is used. The *reserved* swap simply guarantees that swap space will be available if the system experiences memory pressure. The number does not necessarily equal the number of pages that will be paged out to disk; vhand and swapper decide which pages are candidates and how many of them are paged out to disk, according to a number of internal parameters.
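To see how the "dev", "reserve" and "memory" lines add up to the "total" line, here is a minimal sketch that parses the rows shown earlier and recomputes the totals. The function names are hypothetical, the parser only knows the row types that appear in this sample, and rounding the percentage to the nearest integer is an assumption about how swapinfo displays it.

```python
SAMPLE = """\
dev        4096     100    3996    2%       0       -    1  /dev/vg00/lvol2
dev        4096       0    4096    0%       0       -    3  /dev/vg00/swap2
dev        4096       0    4096    0%       0       -    4  /dev/vg00/swap3
dev        4096       0    4096    0%       0       -    5  /dev/vg00/swap4
reserve       -    1158   -1158
memory    75954   75418     536   99%
total     92338   76676   15662   83%       -       0    -
"""

KNOWN_TYPES = {"dev", "reserve", "memory", "total"}  # row types seen in this sample


def parse_rows(text):
    """Return (type, avail, used, free) tuples in Mb; a '-' field counts as 0."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in KNOWN_TYPES:
            continue  # skip headers, blank lines and anything unrecognized
        avail, used, free = (0 if f == "-" else int(f) for f in fields[1:4])
        rows.append((fields[0], avail, used, free))
    return rows


def recompute_total(rows):
    """Sum every non-total row; FREE is AVAIL minus USED."""
    parts = [r for r in rows if r[0] != "total"]
    avail = sum(r[1] for r in parts)
    used = sum(r[2] for r in parts)
    return avail, used, avail - used, round(100 * used / avail)


rows = parse_rows(SAMPLE)
print(recompute_total(rows))  # (92338, 76676, 15662, 83)
```

The recomputed USED of 76676Mb is 100Mb of real paging plus 1158Mb reserved plus 75418Mb taken out of pseudo swap, so only a tiny part of the 83% represents blocks actually written to disk.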

Here are my recommendations:

In this case Oracle seems to be taking up 75418 MB (out of 95.5 GB) of physical memory under normal load, and total device swap is only 16384 MB. The total swap including pseudo swap is 92338 MB, which is less than the physical memory.

First of all, your application seems to be memory intensive, in that it requires more physical memory. You may want to check with your application support whether it needs any tuning.

It is recommended that a system be configured with at least a 1:1 ratio of swap to physical memory, so you may consider adding more device swap.
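As a rough worked example of that recommendation, the sketch below computes how much swap would be missing for a 1:1 ratio. Two things are assumptions on my part: that the 95.5 GB figure means 1024 MB per GB, and which swap figure the ratio applies to, so both readings are shown. The function name is hypothetical.

```python
MB_PER_GB = 1024  # assumption: the 95.5 GB figure uses binary gigabytes

physical_mb = int(95.5 * MB_PER_GB)  # 97792 MB of physical memory
device_swap_mb = 4 * 4096            # the four "dev" lines, 16384 MB
total_swap_mb = 92338                # "total" AVAIL, device plus pseudo swap


def shortfall(target_mb, have_mb):
    """MB still needed to reach target_mb, never negative."""
    return max(0, target_mb - have_mb)


# Reading 1: device swap alone should match physical memory.
print(shortfall(physical_mb, device_swap_mb))  # 81408

# Reading 2: total swap (device plus pseudo) should match physical memory.
print(shortfall(physical_mb, total_swap_mb))   # 5454
```

Either way the gap has to be closed with device swap, since pseudo swap is derived from memory and cannot be added separately.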