Recipe ID: hsts-r59
Memory Management in Linux Systems
This article describes the Linux memory management features, i.e., virtual memory and the disk buffer cache. The purpose and workings and the things the system administrator needs to take into consideration are described.
1. What is virtual memory?
Linux supports virtual memory, that is, using a disk as an extension of RAM so that the effective size of usable memory grows correspondingly. The kernel will write the contents of a currently unused block of memory to the hard disk so that the memory can be used for another purpose. When the original contents are needed again, they are read back into memory. This is all made completely transparent to the user; programs running under Linux only see the larger amount of memory available and don't notice that parts of them reside on the disk from time to time. Of course, reading and writing the hard disk is slower (on the order of a thousand times slower) than using real memory, so the programs don't run as fast. The part of the hard disk that is used as virtual memory is called the swap space.
Linux can use either a normal file in the filesystem or a separate partition for swap space. A swap partition is faster, but it is easier to change the size of a swap file (there's no need to repartition the whole hard disk, and possibly install everything from scratch). When you know how much swap space you need, you should go for a swap partition, but if you are uncertain, you can use a swap file first, use the system for a while so that you can get a feel for how much swap you need, and then make a swap partition when you're confident about its size.
You should also know that Linux allows one to use several swap partitions and/or swap files at the same time. This means that if you only occasionally need an unusual amount of swap space, you can set up an extra swap file at such times, instead of keeping the whole amount allocated all the time.
A note on operating system terminology: computer science usually distinguishes between swapping (writing the whole process out to swap space) and paging (writing only fixed size parts, usually a few kilobytes, at a time). Paging is usually more efficient, and that's what Linux does, but traditional Linux terminology talks about swapping anyway.
2. Creating a swap space
A swap file is an ordinary file; it is in no way special to the kernel. The only thing that matters to the kernel is that it has no holes, and that it is prepared for use with mkswap. It must reside on a local disk, however; it can't reside in a filesystem that has been mounted over NFS due to implementation reasons.
The bit about holes is important. The swap file reserves the disk space so that the kernel can quickly swap out a page without having to go through all the things that are necessary when allocating a disk sector to a file. The kernel merely uses any sectors that have already been allocated to the file. Because a hole in a file means that there are no disk sectors allocated (for that place in the file), it is not good for the kernel to try to use them.
One good way to create the swap file without holes is through the following command:
$ dd if=/dev/zero of=/extra-swap bs=1024
1024+0 records in
1024+0 records out
where /extra-swap is the name of the swap file and the size of is given after the count=. It is best for the size to be a multiple of 4, because the kernel writes out memory pages, which are 4 kilobytes in size. If the size is not a multiple of 4, the last couple of kilobytes may be unused.
A swap partition is also not special in any way. You create it just like any other partition; the only difference is that it is used as a raw partition, that is, it will not contain any filesystem at all. It is a good idea to mark swap partitions as type 82 (Linux swap); this will the make partition listings clearer, even though it is not strictly necessary to the kernel.
After you have created a swap file or a swap partition, you need to write a signature to its beginning; this contains some administrative information and is used by the kernel. The command to do this is mkswap, used like this:
$ mkswap /extra-swap 1024
Setting up swapspace, size = 1044480
Note that the swap space is still not in use yet: it exists, but the kernel does not use it to provide virtual memory.
You should be very careful when using mkswap, since it does not check that the file or partition isn't used for anything else. You can easily overwrite important files and partitions with mkswap! Fortunately, you should only need to use mkswap when you install your system.
The Linux memory manager limits the size of each swap space to 2 GB. You can, however, use up to 8 swap spaces simultaneously, for a total of 16GB.
3. Using a swap space
An initialized swap space is taken into use with swapon. This command tells the kernel that the swap space can be used. The path to the swap space is given as the argument, so to start swapping on a temporary swap file one might use the following command.
$ swapon /extra-swap
Swap spaces can be used automatically by listing them in the /etc/fstab file.
/dev/hda8 none swap sw 0 0
/swapfile none swap sw 0 0
The startup scripts will run the command swapon -a, which will start swapping on all the swap spaces listed in /etc/fstab. Therefore, the swapon command is usually used only when extra swap is needed.
You can monitor the use of swap spaces with free. It will tell the total amount of swap space used.
total used free shared
Mem: 15152 14896 256 12404 2528
-/+ buffers: 12368 2784
Swap: 32452 6684 25768
The first line of output (Mem:) shows the physical memory. The total column does not show the physical memory used by the kernel, which is usually about a megabyte. The used column shows the amount of memory used (the second line does not count buffers). The free column shows completely unused memory. The shared column shows the amount of memory shared by several processes; the more, the merrier. The buffers column shows the current size of the disk buffer cache.
That last line (Swap:) shows similar information for the swap spaces. If this line is all zeroes, your swap space is not activated.
The same information is available via top, or using the proc filesystem in file /proc/meminfo. It is currently difficult to get information on the use of a specific swap space.
A swap space can be removed from use with swapoff. It is usually not necessary to do it, except for temporary swap spaces. Any pages in use in the swap space are swapped in first; if there is not sufficient physical memory to hold them, they will then be swapped out (to some other swap space). If there is not enough virtual memory to hold all of the pages Linux will start to thrash; after a long while it should recover, but meanwhile the system is unusable. You should check (e.g., with free) that there is enough free memory before removing a swap space from use.
All the swap spaces that are used automatically with swapon -a can be removed from use with swapoff -a; it looks at the file /etc/fstab to find what to remove. Any manually used swap spaces will remain in use.
Sometimes a lot of swap space can be in use even though there is a lot of free physical memory. This can happen for instance if at one point there is need to swap, but later a big process that occupied much of the physical memory terminates and frees the memory. The swapped-out data is not automatically swapped in until it is needed, so the physical memory may remain free for a long time. There is no need to worry about this, but it can be comforting to know what is happening.
4. Sharing swap spaces with other operating systems
Virtual memory is built into many operating systems. Since they each need it only when they are running, i.e., never at the same time, the swap spaces of all but the currently running one are being wasted. It would be more efficient for them to share a single swap space. This is possible, but can require a bit of hacking..
5. Allocating swap space
Some people will tell you that you should allocate twice as much swap space as you have physical memory, but this is a bogus rule. Here's how to do it properly:
For instance, if you want to run X, you should allocate about 8 MB for it, gcc wants several megabytes (some files need an unusually large amount, up to tens of megabytes, but usually about four should do), and so on. The kernel will use about a megabyte by itself, and the usual shells and other small utilities perhaps a few hundred kilobytes (say a megabyte together). There is no need to try to be exact, rough estimates are fine, but you might want to be on the pessimistic side.
Remember that if there are going to be several people using the system at the same time, they are all going to consume memory. However, if two people run the same program at the same time, the total memory consumption is usually not double, since code pages and shared libraries exist only once.
The free and ps commands are useful for estimating the memory needs.
It's a good idea to have at least some swap space, even if your calculations indicate that you need none. Linux uses swap space somewhat aggressively, so that as much physical memory as possible can be kept free. Linux will swap out memory pages that have not been used, even if the memory is not yet needed for anything. This avoids waiting for swapping when it is needed: the swapping can be done earlier, when the disk is otherwise idle.
Swap space can be divided among several disks. This can sometimes improve performance, depending on the relative speeds of the disks and the access patterns of the disks. You might want to experiment with a few schemes, but be aware that doing the experiments properly is quite difficult. You should not believe claims that any one scheme is superior to any other, since it won't always be true.
The buffer cache
Reading from a disk is very slow compared to accessing (real) memory. In addition, it is common to read the same part of a disk several times during relatively short periods of time. For example, one might first read an e-mail message, then read the letter into an editor when replying to it, then make the mail program read it again when copying it to a folder. Or, consider how often the command ls might be run on a system with many users. By reading the information from disk only once and then keeping it in memory until no longer needed, one can speed up all but the first read. This is called disk buffering, and the memory used for the purpose is called the buffer cache.
Since memory is, unfortunately, a finite, nay, scarce resource, the buffer cache usually cannot be big enough (it can't hold all the data one ever wants to use). When the cache fills up, the data that has been unused for the longest time is discarded and the memory thus freed is used for the new data.
Disk buffering works for writes as well. On the one hand, data that is written is often soon read again (e.g., a source code file is saved to a file, then read by the compiler), so putting data that is written in the cache is a good idea. On the other hand, by only putting the data into the cache, not writing it to disk at once, the program that writes runs quicker. The writes can then be done in the background, without slowing down the other programs.
Most operating systems have buffer caches (although they might be called something else), but not all of them work according to the above principles. Some are write-through: the data is written to disk at once (it is kept in the cache as well, of course). The cache is called write-back if the writes are done at a later time. Write-back is more efficient than write-through, but also a bit more prone to errors: if the machine crashes, or the power is cut at a bad moment, or the floppy is removed from the disk drive before the data in the cache waiting to be written gets written, the changes in the cache are usually lost. This might even mean that the filesystem (if there is one) is not in full working order, perhaps because the unwritten data held important changes to the bookkeeping information.
Because of this, you should never turn off the power without using a proper shutdown procedure or remove a floppy from the disk drive until it has been unmounted (if it was mounted) or after whatever program is using it has signaled that it is finished and the floppy drive light doesn't shine anymore. The sync command flushes the buffer, i.e., forces all unwritten data to be written to disk, and can be used when one wants to be sure that everything is safely written. In traditional UNIX systems, there is a program called update running in the background which does a sync every 30 seconds, so it is usually not necessary to use sync. Linux has an additional daemon, bdflush, which does a more imperfect sync more frequently to avoid the sudden freeze due to heavy disk I/O that sync sometimes causes.
Under Linux, bdflush is started by update. There is usually no reason to worry about it, but if bdflush happens to die for some reason, the kernel will warn about this, and you should start it by hand (/sbin/update).
The cache does not actually buffer files, but blocks, which are the smallest units of disk I/O (under Linux, they are usually 1 KB). This way, also directories, super blocks, other filesystem bookkeeping data, and non-filesystem disks are cached.
The effectiveness of a cache is primarily decided by its size. A small cache is next to useless: it will hold so little data that all cached data is flushed from the cache before it is reused. The critical size depends on how much data is read and written, and how often the same data is accessed. The only way to know is to experiment.
If the cache is of a fixed size, it is not very good to have it too big, either, because that might make the free memory too small and cause swapping (which is also slow). To make the most efficient use of real memory, Linux automatically uses all free RAM for buffer cache, but also automatically makes the cache smaller when programs need more memory.
Under Linux, you do not need to do anything to make use of the cache, it happens completely automatically. Except for following the proper procedures for shutdown and removing floppies, you do not need to worry about it.
Resources for Linux Kernel Programmers
Linux File System Dictionary
Comprehensive Review of How Linux File and Directory System Works
Hands-on Linux classes
Linux Operating System Distributions
We provide private tutoring classes online and offline (at our DC site or your preferred location) with custom curriculum for almost all of our classes for $50 per hour online or $75 per hour in DC. Give us a call or submit our private tutoring registration form to discuss your needs.