 <chapter id="address-space">
   <title> Address space management </title>

   <para>
     Every Win32 process in Wine has its own dedicated native process on the host system, and
     therefore its own address space. This chapter explores the layout of the Windows address space
     and how it is emulated.
   </para>

   <para>
     First, a quick recap of how virtual memory works. Physical memory in RAM chips is split
     into <emphasis>frames</emphasis>, and the memory that each process sees is split
     into <emphasis>pages</emphasis>. Each process has its own 4 gigabytes of address space (4
     gigabytes being the maximum space addressable with a 32-bit pointer). Pages can be mapped or
     unmapped: on Windows, an attempt to access an unmapped page raises
     EXCEPTION_ACCESS_VIOLATION, which has the easily recognizable code 0xC0000005. Any page can
     be mapped to any frame, so several addresses can actually "contain" the same memory. Pages
     can also be backed by files or swap space, in which case accessing a page that is not yet in
     memory causes a disk access to read its contents into a free frame.
   </para>
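   <para>
     The claim that several addresses can "contain" the same memory is easy to check on a POSIX
     host. The following is only a sketch, not Wine code: the function name
     <function>same_frame_two_pages</function> is made up for this example, and it assumes that
     <function>tmpfile</function> can create a temporary file. It maps the same page of that file
     twice and checks that a write through one mapping is visible through the other.
   </para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page of a temporary file twice and check that a write made
   through the first mapping is visible through the second.
   Returns 1 if both views show the same contents, 0 on any failure. */
static int same_frame_two_pages(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();                 /* unnamed temporary backing file */
    if (!tmp || page <= 0) return 0;

    size_t len = (size_t)page;
    int fd = fileno(tmp);
    if (ftruncate(fd, (off_t)len) != 0) { fclose(tmp); return 0; }

    /* Two independent mappings of the same file page. */
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    int ok = 0;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        assert(a != b);                    /* two different virtual addresses */
        strcpy(a, "hello");                /* write through the first view */
        ok = strcmp(b, "hello") == 0;      /* read back through the second */
    }
    if (a != MAP_FAILED) munmap(a, len);
    if (b != MAP_FAILED) munmap(b, len);
    fclose(tmp);
    return ok;
}
```
]]></programlisting>

   <para>
     Calling <function>same_frame_two_pages()</function> should return 1 on a system where the
     temporary file can be created, because both pages end up backed by the same frame.
   </para>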

   <sect1>
     <title>Initial layout</title>

     <para>
       When a Win32 process starts, it does not have a clear address space to use as it pleases. Many pages
       are already mapped by the operating system. In particular, the EXE file itself and any DLLs it
       needs are mapped into memory, and space has been reserved for the stack and a couple of heaps
       (regions from which memory is allocated to the application). Some of these things need to be
       at a fixed address, and others can be placed anywhere.
     </para>

     <para>
       The EXE file itself is usually mapped at address 0x400000 and up. Indeed, most EXEs have
       their relocation records stripped, which means they must be loaded at their base address
       and cannot be loaded anywhere else.
     </para>

     <para>
       DLLs are internally much the same as EXE files but they have relocation records, which means
       that they can be mapped at any address in the address space. Remember we are not dealing with
       physical memory here, but rather virtual memory which is different for each
       process. Therefore OLEAUT32.DLL may be loaded at one address in one process, and a totally
       different one in another. Ensuring all the functions loaded into memory can find each other
       is the job of the Windows dynamic linker, which is a part of NTDLL.
     </para>
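     <para>
       To make the role of relocation records concrete, here is a deliberately simplified sketch
       of what relocating an image means. It is not the real PE relocation format and not the
       code in NTDLL: the function <function>apply_relocations</function> and its flat list of
       offsets are assumptions made for this example. Each offset names a 32-bit absolute address
       stored in the image, and each such address is shifted by the difference between the
       address the image was actually loaded at and the base address it was built for.
     </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation pass: every entry in `offsets` is the
   byte offset of a 32-bit absolute address stored inside the image. */
static void apply_relocations(uint8_t *image, size_t image_size,
                              const uint32_t *offsets, size_t count,
                              uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned arithmetic wraps modulo 2^32, so this also works when the
       image is loaded below its preferred base. */
    uint32_t delta = actual_base - preferred_base;

    assert(image_size >= sizeof(uint32_t));
    for (size_t i = 0; i < count; i++) {
        uint32_t value;
        assert(offsets[i] <= image_size - sizeof value);   /* slot lies inside the image */
        memcpy(&value, image + offsets[i], sizeof value);  /* slot may be unaligned */
        value += delta;
        memcpy(image + offsets[i], &value, sizeof value);
    }
}
```
]]></programlisting>

     <para>
       An image without relocation records gives the loader no such list to walk, which is why a
       stripped EXE can only run at its base address.
     </para>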

     <para>
       So, we have the EXE and its DLLs mapped into memory. Two other very important regions also
       exist: the stack and the process heap. The process heap is simply the equivalent of the libc
       malloc arena on UNIX: it's a region of memory managed by the OS which malloc/HeapAlloc
       partitions and hands out to the application. Windows applications can create several heaps but
       the process heap always exists. It's created as part of process initialization in
       dlls/ntdll/thread.c:thread_init().
     </para>

     <para>
       There is another heap created as part of process startup, the so-called shared or system
       heap. This is an undocumented service that exists only on Windows 9x: it is implemented in
       Wine so that native Windows 9x DLLs can be used. The shared heap is unusual in that anything
       allocated from it is visible in every other process. This heap is always created at the
       SYSTEM_HEAP_BASE address (0x80000000) and defaults to 16 megabytes in size.
     </para>

     <para>
       So far we've assumed the entire 4 gigs of address space is available for the application. In
       fact that's not so: only the lower 2 gigs are available. On Windows NT, the upper 2 gigs
       (from 0x80000000) are used by the operating system and hold the kernel. Why is the kernel
       mapped into every address space? Mostly for performance: while it's possible to give the
       kernel its own address space too (this is what Ingo Molnar's 4G/4G VM split patch does for
       Linux), doing so requires every system call into the kernel to switch address space. As
       that is a fairly expensive operation (it requires flushing the translation lookaside
       buffers, among other things) and syscalls are made frequently, it's best avoided by keeping
       the kernel mapped at a constant position in every process's address space.
     </para>

     <para>
       On Windows 9x, in fact, only the upper gigabyte (0xC0000000 and up) is used by the kernel;
       the region from 2 to 3 gigs is a shared area used for loading system DLLs and for file
       mappings. The bottom 2 gigs on both NT and 9x are available for the program's memory
       allocation and stack.
     </para>

     <para>
       There are a few other magic locations. The bottom 64k of memory is deliberately left unmapped
       to catch null pointer dereferences. The region from 64k to 1mb+64k is reserved for DOS
       compatibility and contains various DOS data structures. Finally, the address space also
       contains mappings for the Wine binary itself, any native libraries Wine is using, the glibc
       malloc arena and so on.
     </para>
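     <para>
       The boundaries described in this section can be summarised as a small classifier. This is
       an illustrative sketch only: the enum and function names are invented for this example
       and do not appear in Wine. The constants are the ones given in the text: 64k is 0x10000,
       1mb+64k is 0x110000, 2 gigs is 0x80000000 and 3 gigs is 0xC0000000.
     </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

enum layout { LAYOUT_NT, LAYOUT_9X };

enum region {
    REGION_NULL_GUARD,    /* bottom 64k, left unmapped */
    REGION_DOS,           /* 64k up to 1mb+64k, DOS data structures */
    REGION_APPLICATION,   /* rest of the lower 2 gigs */
    REGION_SHARED_9X,     /* 2 to 3 gigs on Windows 9x: system DLLs, file mappings */
    REGION_KERNEL         /* from 2 gigs on NT, from 3 gigs on 9x */
};

/* Classify a 32-bit virtual address according to the layout in the text. */
static enum region classify_address(uint32_t addr, enum layout layout)
{
    assert(layout == LAYOUT_NT || layout == LAYOUT_9X);

    if (addr < 0x00010000u) return REGION_NULL_GUARD;
    if (addr < 0x00110000u) return REGION_DOS;
    if (addr < 0x80000000u) return REGION_APPLICATION;
    if (layout == LAYOUT_9X && addr < 0xC0000000u) return REGION_SHARED_9X;
    return REGION_KERNEL;
}
```
]]></programlisting>

     <para>
       For instance, the usual EXE base address 0x400000 falls in the application region under
       both layouts, while 0x80000000 is kernel space on NT but part of the shared area on 9x.
     </para>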

   </sect1>

   <sect1>
     <title> Laying out the address space </title>

     <para>
       Up until about the start of 2004, the Linux address space very much resembled the Windows 9x
       layout: the kernel sat in the top gigabyte, the bottom pages were unmapped to catch null
       pointer dereferences, and the rest was free. The kernel's mmap algorithm was predictable: it
       would start by mapping files at low addresses and work up from there.
     </para>

     <para>
       The development of a series of new low-level patches violated many of these assumptions, and
       resulted in Wine needing to force the Win32 address space layout upon the system. This
       section looks at why and how this is done.
     </para>

     <para>
       The exec-shield patch increases security by randomizing the kernel's mmap algorithm. Rather
       than consistently choosing the same addresses given the same sequence of requests, the kernel
       now chooses randomized addresses. Because the Linux dynamic linker (ld-linux.so.2) loads
       DSOs into memory by using mmap, DSOs are no longer loaded at predictable addresses, which
       makes it harder to attack software by using buffer overflows. Exec-shield also attempts to
       relocate certain binaries into a special low area of memory known as the ASCII armor, which
       makes it harder to jump into them when using string-based attacks.
     </para>

     <para>
       Prelink is a technology that improves startup times by precalculating ELF global offset
       tables and saving the results inside the native binaries themselves. Because each DSO is
       grid fitted into the address space ahead of time, the dynamic linker does not have to
       perform as many relocations, so applications that rely heavily on dynamic linkage can be
       loaded into memory much more quickly. Complex C++ applications such as Mozilla, OpenOffice
       and KDE can especially benefit from this technique.
     </para>

     <para>
       The 4G VM split patch was developed by Ingo Molnar. It gives the Linux kernel its own address
       space, thereby allowing processes to access the maximum addressable amount of memory on a
       32-bit machine: 4 gigabytes. It allows people with lots of RAM to fully utilise it in any
       given process at the cost of performance: as mentioned previously, the reason for giving
       the kernel a part of each process's address space was to avoid the overhead of switching
       address space on each syscall.
     </para>

     <para>
       Each of these changes alters the address space in a way incompatible with Windows. Prelink and
       exec-shield mean that the libraries Wine uses can be placed at any point in the address
       space: typically this meant that a library was sitting in the region where the EXE you wanted
       to run had to be loaded (remember that unlike DLLs, EXE files cannot be moved around in
       memory). The 4G VM split means that programs could receive pointers into the top gigabyte of
       address space, which some are not prepared for (they may store extra information in the high
       bits of a pointer, for instance). In combination with exec-shield the 4G VM split is
       especially deadly, as the process heap could be allocated beyond
       ADDRESS_SPACE_LIMIT, which causes Wine initialization to fail.
     </para>
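     <para>
       As an illustration of why high pointers can break such programs, consider an application
       that keeps a flag in the top bit of a 32-bit pointer on the assumption that user addresses
       never reach 0x80000000. The sketch below is hypothetical (the names
       <function>tag_pointer</function>, <function>untag_pointer</function> and
       <function>survives_tagging</function> are invented here), but it shows how an address in
       the upper half of the address space is silently corrupted by the round trip.
     </para>

<programlisting><![CDATA[
```c
#include <stdint.h>

#define TAG_BIT 0x80000000u   /* hypothetical flag kept in the top bit of a pointer */

/* Store the flag in the pointer word. */
static uint32_t tag_pointer(uint32_t addr)
{
    return addr | TAG_BIT;
}

/* Strip the flag again, assuming the original address never had the bit set. */
static uint32_t untag_pointer(uint32_t word)
{
    return word & ~TAG_BIT;
}

/* Returns 1 if the address comes back unchanged after tagging and untagging. */
static int survives_tagging(uint32_t addr)
{
    return untag_pointer(tag_pointer(addr)) == addr;
}
```
]]></programlisting>

     <para>
       An address in the lower 2 gigs survives the round trip, whereas one at or above 0x80000000
       comes back with its top bit cleared and now points somewhere else entirely.
     </para>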

     <para>
       The solution to these problems is for Wine to reserve particular parts of the address space
       so that the system avoids the areas we don't want it to use. Later on we reallocate or
       deallocate those areas as needed. One problem is that some mappings are put in
       place automatically by the dynamic linker: for instance, any libraries that Wine
       is linked to (like libc, libwine, libpthread etc.) will be mapped into memory before Wine even
       gets control. To solve that, Wine overrides the default ELF initialization sequence
       at a low level and reserves the needed areas by using direct syscalls into the kernel (i.e.
       without linking against any other code to do it) before restarting the standard
       initialization and letting the dynamic linker continue. The code that does this is referred
       to as the preloader and is found in loader/preloader.c.
     </para>
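     <para>
       Reserving a range without committing memory to it can be sketched with an ordinary
       <function>mmap</function> call. Note the assumptions: the real preloader issues the system
       call directly because libc is not available yet, whereas this sketch uses the libc
       wrapper for brevity, and the name <function>reserve_range</function> is invented for this
       example. The range is requested with no access rights and with MAP_NORESERVE, and the
       reservation only counts if the kernel placed it exactly where we asked.
     </para>

<programlisting><![CDATA[
```c
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Try to reserve [addr, addr + size) without committing any memory.
   Returns 1 if the kernel placed the mapping exactly at addr, else 0. */
static int reserve_range(void *addr, size_t size)
{
    void *p = mmap(addr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    if (p != addr) {
        /* Without MAP_FIXED the address is only a hint: if the range was
           already in use the kernel picked another spot, so give it back. */
        munmap(p, size);
        return 0;
    }
    return 1;
}
```
]]></programlisting>

     <para>
       MAP_FIXED is deliberately not used here, since it would silently replace whatever is
       already mapped in the range instead of reporting that the range is busy.
     </para>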

     <para>
       Once the usual ELF boot sequence has been completed, some native libraries may well have been
       mapped above the 3 gig limit. This doesn't matter, as 3 gigs is a Windows limit, not a
       Linux limit. We still have to prevent the system from allocating anything else up there
       (like the heap or other DLLs), though, so Wine performs a binary search over the upper gig of
       address space, iteratively filling in the holes with MAP_NORESERVE mappings so that the
       address space is allocated but the memory to actually back it is not. This code can be found
       in libs/wine/mmap.c:reserve_area.
     </para>
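     <para>
       The hole-filling search can be sketched as a recursive halving: try to reserve the whole
       range, and if something is in the way, split the range in two and retry each half. The
       code below is a simulation rather than Wine's reserve_area itself. It works on a small
       array of pretend pages instead of calling <function>mmap</function>, and all of its names
       (<function>sim_try_reserve</function>, <function>sim_reserve_area</function>, SIM_PAGES)
       are made up for this example.
     </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGES 64                        /* size of the simulated range, in pages */

static unsigned char sim_busy[SIM_PAGES];   /* 1 = page already mapped or reserved */
static size_t sim_reserved;                 /* number of pages reserved so far */

/* Stand-in for an mmap() attempt: succeeds only if every page in
   [start, start + count) is free, and then marks those pages as taken. */
static int sim_try_reserve(size_t start, size_t count)
{
    for (size_t i = start; i < start + count; i++)
        if (sim_busy[i]) return 0;
    for (size_t i = start; i < start + count; i++)
        sim_busy[i] = 1;
    sim_reserved += count;
    return 1;
}

/* Reserve every free page in [start, end): try the whole range first and,
   if something is in the way, split it in half and retry each half. */
static void sim_reserve_area(size_t start, size_t end)
{
    assert(start <= end && end <= SIM_PAGES);

    size_t count = end - start;
    if (count == 0) return;
    if (sim_try_reserve(start, count)) return;
    if (count == 1) return;                 /* a single busy page: leave it alone */

    size_t half = count / 2;
    sim_reserve_area(start, start + half);
    sim_reserve_area(start + half, end);
}
```
]]></programlisting>

     <para>
       Large free stretches are claimed with a single attempt, and the recursion only descends to
       individual pages around mappings that already exist, such as the native libraries
       mentioned above.
     </para>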

   </sect1>

 </chapter>