
Virtual Memory II: the return of objrmap

Andrea Arcangeli not only wants to make the Linux kernel scale to and beyond 32GB of memory on 32-bit processors; he seems to be in a real hurry. There are, it would seem, customers waiting for a 2.6-based distribution which can run in such environments.

For Andrea, the real culprit in the exhaustion of low memory is clear: it's the reverse-mapping virtual memory ("rmap") code. The rmap code was first described on this page in January 2002; its purpose is to make it easier for the kernel to free memory when swapping is required. To that end, rmap maintains, for each physical page in the system, a chain of reverse pointers; each pointer indicates a page table entry which references that page. By following the rmap chains, the kernel can quickly find all mappings for a given page, unmap them, and swap the page out.
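The idea of a per-page chain can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the types, the "present" flag, and the function names below are made up for the example and are not the kernel's actual definitions, which pack the chain far more tightly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the kernel's real types. */
typedef struct { int present; } pte_t;    /* one page-table entry */

struct pte_chain {                        /* one reverse pointer */
    pte_t *ptep;                          /* entry that maps the page */
    struct pte_chain *next;
};

struct page {
    struct pte_chain *chain;              /* every known mapping of this page */
};

/* Record that *ptep now maps page. Returns 0, or -1 if allocation fails. */
int page_add_rmap(struct page *page, pte_t *ptep)
{
    struct pte_chain *pc;

    assert(page != NULL && ptep != NULL);
    pc = malloc(sizeof(*pc));
    if (pc == NULL)
        return -1;
    ptep->present = 1;
    pc->ptep = ptep;
    pc->next = page->chain;               /* push onto the front of the chain */
    page->chain = pc;
    return 0;
}

/* Clear every mapping of page by walking its chain; returns how many
 * page-table entries were cleared. The chain is freed along the way. */
int page_unmap_all(struct page *page)
{
    int unmapped = 0;

    assert(page != NULL);
    while (page->chain != NULL) {
        struct pte_chain *pc = page->chain;

        pc->ptep->present = 0;
        page->chain = pc->next;
        free(pc);
        unmapped++;
    }
    return unmapped;
}
```

Note that every mapping costs one chain element, which is exactly the per-page, per-mapping memory overhead at issue here.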

The rmap code solved some real performance problems in the kernel's virtual memory subsystem, but it, too, has a cost. Every one of those reverse mapping entries consumes memory - low memory in particular. Much effort has gone into reducing the memory cost of the rmap chains, but the simple fact remains: as the amount of memory (and the number of processes using that memory) goes up, the rmap chains will consume larger amounts of low memory. Eliminating the rmap overhead would go a long way toward allowing the kernel to scale to larger systems. Of course, one wants to eliminate this overhead while not losing the benefits that rmap brings.

Andrea's approach is to bring back and extend the object-based reverse mapping patches. The initial object-based patch was created by Dave McCracken; LWN covered this patch a year ago. Essentially, this patch eliminates the rmap chains for memory which maps a file by following pointers "the long way around" and searching candidate virtual memory areas (VMAs). Andrea has updated this patch and fixed some bugs, but the core of the patch remains the same; see last year's description for the details.
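As a rough sketch of what "the long way around" can mean for a file-backed page: start from the page, reach the file object it belongs to, and test each VMA that maps that file to see whether its window covers the page's file offset. The structures and names here are simplified assumptions made for illustration, not the patch's code; in particular, using 0 as a "not covered" return value assumes no VMA starts at address 0.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                  /* assumes 4KB pages */

struct vm_area {                       /* hypothetical stand-in for a VMA */
    unsigned long vm_start;            /* first virtual address covered */
    unsigned long vm_end;              /* one past the last address covered */
    unsigned long vm_pgoff;            /* file offset of vm_start, in pages */
    struct vm_area *next_share;        /* next VMA mapping the same file */
};

struct file_object {                   /* the "object" behind the page */
    struct vm_area *vmas;              /* all VMAs that map this file */
};

struct page {
    struct file_object *mapping;
    unsigned long index;               /* page's offset in the file, in pages */
};

/* Virtual address at which vma would map page, or 0 if the VMA's window
 * onto the file does not include that page. */
unsigned long vma_address(const struct page *page, const struct vm_area *vma)
{
    unsigned long addr;

    assert(page != NULL && vma != NULL);
    if (page->index < vma->vm_pgoff)
        return 0;
    addr = vma->vm_start + ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
    return addr < vma->vm_end ? addr : 0;
}

/* Count the candidate VMAs whose range covers the page. A real unmap
 * would then look up the page table entry at each candidate address. */
int count_candidates(const struct page *page)
{
    int n = 0;

    for (const struct vm_area *v = page->mapping->vmas; v; v = v->next_share)
        if (vma_address(page, v) != 0)
            n++;
    return n;
}
```

No per-page chain is stored at all; the price is that the search has to visit VMAs which turn out not to map the page.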

Last week, we raised the possibility that the virtual memory subsystem could see fundamental changes in the course of the 2.6 "stable" series. This week, Linus confirmed that possibility in response to Andrea's object-based reverse mapping patch:

I certainly prefer this to the 4:4 horrors. So it sounds worth it to put it into -mm if everybody else is ok with it.

Assuming this work goes forward, it has the usual implications for the stable kernel. Even assuming that it stays in the -mm tree for some time, its inclusion into 2.6 is likely to destabilize things for a few releases until all of the obscure bugs are shaken out.

Dave McCracken's original patch, in any case, only solves part of the problem. It gets rid of the rmap chains for file-backed memory, but it does nothing for anonymous memory (basic process data - stacks, memory obtained with malloc(), etc.), which has no "object" behind it. File-backed memory is a large portion of the total, especially on systems which are running large Oracle servers and use big, shared file mappings. But anonymous memory is also a large part of the mix; it would be nice to take care of the rmap overhead for that as well.

To that end, Andrea has posted another patch (in preliminary form) which provides object-based reverse mapping for anonymous memory as well. It works, essentially, by replacing the rmap chain with a pointer to a chain of virtual memory area (VMA) structures.

Anonymous pages are always created in response to a request for memory from a single process; as a result, they are never shared at creation time. Given that, there is no need for a new anonymous page to have a chain of reverse mappings; we know that there can be only a single mapping. Andrea's patch adds a union to struct page which includes the existing mapping pointer (for non-anonymous memory) and adds a couple of new ones. One of those is simply called vma, and it points to the (single) VMA structure pointing to the page. So if a process has several non-shared, anonymous pages in the same virtual memory area, the structure looks somewhat like this:

[Figure: anonymous reverse mapping]

With this structure, the kernel can find the page table which maps a given page by following the pointers through the VMA structure.
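A minimal sketch of that arrangement, in C, might look like the following. Only the union of three pointers and the name vma come from the description above; the "kind" discriminator, the stored virtual address, and the flat per-VMA page table are assumptions added to keep the example small and self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                  /* assumes 4KB pages */

typedef struct { int present; } pte_t; /* simplified page-table entry */

struct address_space;                  /* file-backed object; opaque here */
struct anon_vma;                       /* list of VMAs for shared pages; opaque here */

struct vm_area {                       /* hypothetical stand-in for a VMA */
    unsigned long vm_start;
    unsigned long vm_end;
    pte_t *ptes;                       /* assumption: one entry per page in range */
};

/* Assumption: some tag says which union member is live. */
enum page_kind { PAGE_FILE, PAGE_ANON_DIRECT, PAGE_ANON_SHARED };

struct page {
    enum page_kind kind;
    unsigned long vaddr;               /* assumption: where an anonymous page is mapped */
    union {
        struct address_space *mapping; /* file-backed memory */
        struct vm_area *vma;           /* unshared anonymous page: its single VMA */
        struct anon_vma *anon_vma;     /* anonymous page shared after fork */
    } u;
};

/* Find the one page-table entry mapping an unshared anonymous page by
 * going through its VMA; returns NULL if the page is of another kind
 * or lies outside the VMA's range. */
pte_t *direct_pte(const struct page *page)
{
    const struct vm_area *vma;

    assert(page != NULL);
    if (page->kind != PAGE_ANON_DIRECT)
        return NULL;
    vma = page->u.vma;
    if (page->vaddr < vma->vm_start || page->vaddr >= vma->vm_end)
        return NULL;
    return &vma->ptes[(page->vaddr - vma->vm_start) >> PAGE_SHIFT];
}
```

Because the pointer lives in a union, the page structure does not grow, however many anonymous pages share the VMA.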

Life gets a bit more complicated when the process forks, however. Once that happens, there will be multiple page tables pointing to the same anonymous pages and a single VMA pointer will no longer be adequate. To deal with this case, Andrea has created a new "anon_vma" structure which implements a linked list of VMAs. The third member of the new struct page union is a pointer to this structure which, in turn, points to all VMAs which might contain the page. The structure now looks like:

[Figure: anon_vma]

If the kernel needs to unmap a page in this scenario, it must follow the linked list and examine every VMA it finds. Once the page is unmapped from every page table found, it can be freed.
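The walk over the VMA list can be sketched as follows. Apart from the anon_vma name, everything here is a simplified assumption made for illustration: the flat per-VMA page table, the stored virtual address, and the function name are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                  /* assumes 4KB pages */

struct page;

/* Simplified page-table entry: which page it maps, NULL if none. */
typedef struct { const struct page *page; } pte_t;

struct vm_area {                       /* hypothetical stand-in for a VMA */
    unsigned long vm_start;
    unsigned long vm_end;
    pte_t *ptes;                       /* assumption: one entry per page in range */
    struct vm_area *anon_next;         /* link in the anon_vma list */
};

struct anon_vma {
    struct vm_area *head;              /* all VMAs which might map the page */
};

struct page {
    unsigned long vaddr;               /* assumption: address the page is mapped at */
    struct anon_vma *anon_vma;
};

/* Unmap page from every VMA on its anon_vma list. Returns the number of
 * page-table entries cleared; *scanned receives the number of VMAs
 * examined, including those that turned out not to map the page. */
int unmap_shared_anon(struct page *page, int *scanned)
{
    int unmapped = 0;

    assert(page != NULL && scanned != NULL);
    *scanned = 0;
    for (struct vm_area *v = page->anon_vma->head; v; v = v->anon_next) {
        pte_t *pte;

        (*scanned)++;
        if (page->vaddr < v->vm_start || page->vaddr >= v->vm_end)
            continue;                  /* this VMA does not cover the address */
        pte = &v->ptes[(page->vaddr - v->vm_start) >> PAGE_SHIFT];
        if (pte->page == page) {       /* may map a different page, e.g. a private copy */
            pte->page = NULL;
            unmapped++;
        }
    }
    return unmapped;
}
```

The gap between the VMAs scanned and the entries actually cleared is the extra work an object-based scheme accepts in exchange for dropping the per-page chains.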

There are some memory costs to this scheme: the VMA structure requires a new list_head structure, and the anon_vma structure must be allocated whenever a chain must be formed. One VMA can refer to thousands of pages, however, so a per-VMA cost will be far less than the per-page costs incurred by the existing rmap code.
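A back-of-the-envelope comparison makes the difference in scale concrete. The sizes plugged in below (8 bytes per reverse-mapping entry, 8 bytes of added per-VMA state) are illustrative assumptions, not measurements of either implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes spent on per-page reverse pointers when `sharers` processes each
 * map a region of `mapped_bytes`, at `bytes_per_entry` per mapping. */
uint64_t chain_overhead(uint64_t mapped_bytes, uint64_t page_size,
                        uint64_t sharers, uint64_t bytes_per_entry)
{
    assert(page_size > 0);
    return (mapped_bytes / page_size) * sharers * bytes_per_entry;
}

/* Bytes spent when the cost is paid once per VMA instead of once per
 * page mapping. */
uint64_t vma_overhead(uint64_t vmas, uint64_t bytes_per_vma)
{
    return vmas * bytes_per_vma;
}
```

With those assumed sizes, a 1GB region of 4KB pages mapped by 100 processes gives chain_overhead(1 << 30, 4096, 100, 8) = 209,715,200 bytes (200MB), while vma_overhead(100, 8) is 800 bytes for the same 100 mappings.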

This approach does incur a greater computational cost. Freeing a page requires scanning multiple VMAs which may or may not contain references to the page under consideration. This cost will increase with the number of processes sharing a memory region. Ingo Molnar, who is fond of O(1) solutions, is nervous about object-based schemes for this reason. According to Ingo, losing the possibility of creating an O(1) page unmapping scheme is a heavy cost to pay for the prize of making large amounts of memory work on obsolete hardware.

The solution that Ingo would like to see, instead, is to reduce the per-page memory overhead by reducing the number of pages. The means to that end is page clustering - grouping adjacent hardware pages into larger virtual pages. Page clustering would reduce rmap overhead, and reduce the size of the main kernel memory map as well. The available page clustering patch is even more intrusive than object-based reverse mapping, however; it seems seriously unlikely to be considered for 2.6.
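The arithmetic behind page clustering is simple enough to sketch. The 32-byte page descriptor and the clustering factor of 8 used in the note below are assumptions chosen for illustration, not figures from the clustering patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page descriptors needed to cover mem_bytes of RAM when
 * `cluster` adjacent hardware pages are grouped into one virtual page. */
uint64_t page_count(uint64_t mem_bytes, uint64_t hw_page_size, uint64_t cluster)
{
    assert(hw_page_size > 0 && cluster > 0);
    return mem_bytes / (hw_page_size * cluster);
}

/* Size of the kernel memory map: one descriptor of desc_size bytes per
 * (possibly clustered) page. */
uint64_t mem_map_bytes(uint64_t mem_bytes, uint64_t hw_page_size,
                       uint64_t cluster, uint64_t desc_size)
{
    return page_count(mem_bytes, hw_page_size, cluster) * desc_size;
}
```

For 32GB of RAM with 4KB hardware pages and an assumed 32-byte descriptor, the unclustered memory map covers 8,388,608 pages and takes 256MB; grouping pages by 8 cuts that to 1,048,576 pages and 32MB, and any per-page rmap cost shrinks with the page count in the same way.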

Index entries for this article
Kernel: anon_vma
Kernel: Memory management/Object-based reverse mapping
Kernel: Object-based reverse mapping



Virtual Memory II: the return of objrmap

Posted Mar 11, 2004 18:51 UTC (Thu) by riel (subscriber, #3142)

There is some preview code available that would decrease the CPU overhead of objrmap, probably to acceptable levels. If that code works as well as it's supposed to work (it's a new data structure, not yet well tested, etc...) there's a reasonable chance the pte based rmap can be replaced.

It's good to see that the last reason that caused me to make the pte based rmap code is finally dissolving. A well working object based rmap is much lighter weight...

Virtual Memory II: the return of objrmap

Posted Mar 18, 2004 13:42 UTC (Thu) by leandro (guest, #1460)

I guess even Linus sometimes bows to pressure. After all, all this complication is quite unnecessary; we've had 64-bit processors for a decade now. Nothing but Wintel FUD and proprietary software prevents users from running 64-bit now.

It could be argued that 64-bit vendors haven't been doing the right thing. Now Linus is on a POWER64 machine, but he should have been there for a long time, or on UltraSPARC. A pity Intel killed the Alpha, which was once Linus' platform. Other developers, too, should have been given such systems long ago.

I hope the BSDs and the Hurd stick to sanity.

Virtual Memory II: the return of objrmap

Posted Mar 21, 2004 15:34 UTC (Sun) by alpharomeo (guest, #20341)

Not sure what "Intel killed the Alpha" means. Alphas are available now (e.g., Compaq ES-47) and the Alpha technology is planned to be integrated into the Itanium product line starting in '06. Do you know something different?

Virtual Memory II: the return of objrmap

Posted Apr 16, 2004 18:20 UTC (Fri) by leandro (guest, #1460)

> what "Intel killed the Alpha" means

Alpha ceased to be developed.

> Alphas are available now (e.g., Compaq ES-47)

Expensive, limited, substandard. In other words, developed neither as a technology nor as a platform.

> the Alpha technology is planned to be integrated into the Itanium

Processor architecture is not like Lego, where you can mix and match. The Itanium and Alpha architectures are fundamentally different and philosophically opposed. Some Alpha tricks may be incorporated into Itanium, but it will never realize the potential Alpha had, and which POWER still has, though with a different focus. Some argue that nothing has the potential Alpha had.

Virtual Memory II: the return of objrmap

Posted May 24, 2004 7:19 UTC (Mon) by khim (subscriber, #9252)

Low memory on 32-bit systems is still 2GB! And if that is not enough for some structures on a 32GB system, then obviously something is wrong: 10% for bookkeeping is too much IMO (cache misses and all). So this patch is sane. True, only huge 32-bit systems make it 2.6 rather than 2.7 material, but the patch itself is sane: it is good for huge 64-bit systems as well (not sure about small systems), just not essential there.

Virtual Memory II: the return of objrmap

Posted Mar 21, 2004 15:31 UTC (Sun) by alpharomeo (guest, #20341)

Page clustering seems like the obvious solution. From one point of view, it is the 4K page size that is obsolete, not the 32-bit addressing. When can we anticipate having a page clustering option available?


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds