| <chapter id="implementation"> |
| <title>Low-level Implementation</title> |
| <para>Details of Wine's Low-level Implementation...</para> |
| |
| <sect1 id="undoc-func"> |
| <title>Undocumented APIs</title> |
| |
| <para> |
| Some background: On the i386 class of machines, stack entries are |
| usually dword (4 bytes) in size, little-endian. The stack grows |
| downward in memory. The stack pointer, maintained in the |
| <literal>esp</literal> register, points to the last valid entry; |
| thus, the operation of pushing a value onto the stack involves |
| decrementing <literal>esp</literal> and then moving the value into |
| the memory pointed to by <literal>esp</literal> |
| (i.e., <literal>push p</literal> in assembly resembles |
| <literal>*(--esp) = p;</literal> in C). Removing (popping) |
| values off the stack is the reverse (i.e., <literal>pop p</literal> |
| corresponds to <literal>p = *(esp++);</literal> in C). |
| </para> |
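| |
| <para> |
| As a rough illustration, the push and pop semantics described above |
| can be modelled in plain C. This is only a toy sketch: the names |
| <literal>stack_mem</literal>, <literal>esp</literal>, |
| <function>push</function> and <function>pop</function>, and the |
| 16-slot array, are made up for the example, and there is no |
| overflow or underflow checking. |
| </para> |
| <programlisting> |
```c
#include <stdint.h>

/* Toy model of an i386 stack: 16 dword slots. esp is an index into the
 * array; it starts one past the end (empty stack) and moves toward 0
 * as values are pushed, i.e. the stack grows downward. */
#define STACK_SLOTS 16

static uint32_t stack_mem[STACK_SLOTS];
static int esp = STACK_SLOTS;

/* push p  ==  *(--esp) = p;   (no overflow check in this sketch) */
static void push(uint32_t p)
{
    stack_mem[--esp] = p;
}

/* pop p   ==  p = *(esp++);   (no underflow check in this sketch) */
static uint32_t pop(void)
{
    return stack_mem[esp++];
}
```
| </programlisting> |
| <para> |
| Pushing 40 and then 20 leaves <literal>esp</literal> two slots lower; |
| popping twice returns 20 and then 40 and restores the original value. |
| </para> |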
| |
| <para> |
| In the <literal>stdcall</literal> calling convention, arguments are |
| pushed onto the stack right-to-left. For example, the C call |
| <function>myfunction(40, 20, 70, 30);</function> is expressed in |
| Intel assembly as: |
| <screen> |
| push 30 |
| push 70 |
| push 20 |
| push 40 |
| call myfunction |
| </screen> |
| The called function is responsible for removing the arguments |
| from the stack. Thus, before the call to |
| <function>myfunction</function>, the stack would look like: |
| <screen> |
| [local variable or temporary] |
| [local variable or temporary] |
| 30 |
| 70 |
| 20 |
| esp -> 40 |
| </screen> |
| After the call returns, it should look like: |
| <screen> |
| [local variable or temporary] |
| esp -> [local variable or temporary] |
| </screen> |
| </para> |
| |
| <para> |
| To restore the stack to this state, the called function must know how |
| many arguments to remove (which is the number of arguments it takes). |
| This is a problem if the function is undocumented. |
| </para> |
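| |
| <para> |
| The effect of the callee removing the wrong number of bytes can be |
| sketched with a small simulation. Everything here is hypothetical |
| (the helper names, the 32-slot array and the dummy return address); |
| it only models the bookkeeping, assuming that |
| <literal>call</literal> pushes a return address and that |
| <literal>ret n</literal> pops it and then drops |
| <literal>n</literal> further bytes. |
| </para> |
| <programlisting> |
```c
#include <stdint.h>

#define SLOTS 32
static uint32_t mem[SLOTS];   /* toy stack, one dword per slot */
static int sp = SLOTS;        /* index of the last valid entry */

static void push32(uint32_t v)
{
    mem[--sp] = v;
}

/* Model of "ret n": pop the return address, then drop n bytes of arguments. */
static void ret_n(unsigned n_bytes)
{
    sp += 1;                  /* discard the return address */
    sp += (int)(n_bytes / 4); /* callee removes its own arguments */
}

/* Model of calling myfunction(40, 20, 70, 30) under stdcall.
 * Returns how many dwords are left on the stack after the call returns;
 * 0 means the callee removed exactly what the caller pushed. */
static int call_myfunction(unsigned callee_ret_bytes)
{
    int before = sp;

    push32(30);               /* arguments go on right-to-left */
    push32(70);
    push32(20);
    push32(40);
    push32(0xdeadbeef);       /* "call" pushes a return address (dummy) */
    ret_n(callee_ret_bytes);  /* callee returns with "ret n" */
    return before - sp;
}
```
| </programlisting> |
| <para> |
| With four arguments the callee has to return with |
| <literal>ret 16</literal>; a stub that guessed two arguments |
| (<literal>ret 8</literal>) would leave two dwords behind on every call. |
| </para> |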
| |
| <para> |
| One way to attempt to document the number of arguments each function |
| takes is to create a wrapper around that function that detects the |
| stack offset. Essentially, each wrapper assumes that the function will |
| take a large number of arguments. The wrapper copies each of these |
| arguments into its stack, calls the actual function, and then calculates |
| the number of arguments by checking esp before and after the call. |
| </para> |
| |
| <para> |
| The main problem with this scheme is that the function must actually |
| be called from another program. Many of these functions are seldom |
| used. An attempt was made to aggressively query each function in a |
| given library (<filename>ntdll.dll</filename>) by passing 64 arguments, |
| all 0, to each function. Unfortunately, Windows NT quickly goes to a |
| blue screen of death, even if the program is run from a |
| non-administrator account. |
| </para> |
| |
| <para> |
| Another method that has been much more successful is to work out |
| how many bytes each function removes from the stack by locating |
| its return instruction. This instruction, <literal>ret hhll</literal> |
| (where <symbol>hhll</symbol> is the number of bytes to remove, i.e. |
| the number of arguments times 4), is encoded as the bytes |
| <literal>0xc2 ll hh</literal> in memory. It is a reasonable |
| assumption that few, if any, functions take more than 16 arguments |
| (0x40 bytes); therefore, simply searching for |
| <literal>hh == 0 && ll <= 0x40</literal> starting from the |
| address of a function yields the correct number of arguments most |
| of the time. |
| </para> |
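| |
| <para> |
| A minimal sketch of such a scan is shown here. It is not the actual |
| utility; the function name <function>guess_arg_count</function> is |
| invented, and the sketch accepts up to 16 arguments, that is, |
| <literal>ll</literal> up to and including 0x40. Note that a function |
| taking no arguments normally returns with a plain |
| <literal>ret</literal> (byte <literal>0xc3</literal>), which this |
| scan does not look for. |
| </para> |
| <programlisting> |
```c
#include <stddef.h>

/* Scan code[0..len) for the first byte sequence 0xc2 ll hh with hh == 0
 * and ll <= 0x40 ("ret imm16" removing at most 16 dword arguments) and
 * return the implied argument count, ll / 4.
 * Returns -1 if no such sequence is found.
 * The scan ignores instruction boundaries, so it can be fooled. */
static int guess_arg_count(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i + 2 < len; i++) {
        if (code[i] == 0xc2 && code[i + 2] == 0x00 && code[i + 1] <= 0x40)
            return code[i + 1] / 4;
    }
    return -1;
}
```
| </programlisting> |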
| |
| <para> |
| Of course, this is not without errors. <literal>ret 00ll</literal> |
| is not the only instruction that can have the byte sequence |
| <literal>0xc2 ll 0x0</literal>; for example, |
| <literal>push 0x000040c2</literal> has the byte sequence |
| <literal>0x68 0xc2 0x40 0x0 0x0</literal>, which matches |
| the above. Properly, the utility should look for this sequence |
| only on an instruction boundary; unfortunately, finding |
| instruction boundaries on an i386 requires implementing a full |
| disassembler -- quite a daunting task. Besides, the probability |
| of having such a byte sequence that is not the actual return |
| instruction is fairly low. |
| </para> |
| |
| <para> |
| Much more troublesome is the non-linear flow of a function. For |
| example, consider the following two functions: |
| <screen> |
| somefunction1: |
| jmp somefunction1_impl |
| |
| somefunction2: |
| ret 0004 |
| |
| somefunction1_impl: |
| ret 0008 |
| </screen> |
| In this case, a linear scan finds the <literal>ret 0004</literal> |
| of <function>somefunction2</function> first, so we would detect both |
| <function>somefunction1</function> and |
| <function>somefunction2</function> as taking only a single |
| argument, whereas <function>somefunction1</function> really |
| takes two arguments. |
| </para> |
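| |
| <para> |
| The misdetection can be reproduced on a hand-assembled copy of the |
| example. The byte layout below assumes the short form of |
| <literal>jmp</literal> (<literal>0xeb</literal> followed by an 8-bit |
| displacement); the array and function names are hypothetical. |
| </para> |
| <programlisting> |
```c
#include <stddef.h>

/* Hand-assembled layout of the example:
 *   offset 0: eb 03      somefunction1:      jmp somefunction1_impl
 *   offset 2: c2 04 00   somefunction2:      ret 0004
 *   offset 5: c2 08 00   somefunction1_impl: ret 0008
 */
static const unsigned char example_code[] = {
    0xeb, 0x03,
    0xc2, 0x04, 0x00,
    0xc2, 0x08, 0x00
};

/* Linear scan: return the byte count ll of the first 0xc2 ll 0x00
 * found at or after offset start, or -1 if there is none. */
static int first_ret_bytes(const unsigned char *code, size_t len, size_t start)
{
    size_t i;

    for (i = start; i + 2 < len; i++) {
        if (code[i] == 0xc2 && code[i + 2] == 0x00)
            return code[i + 1];
    }
    return -1;
}
```
| </programlisting> |
| <para> |
| Scanning from offset 0 (<function>somefunction1</function>) reports |
| 4 bytes, although control actually reaches the |
| <literal>ret 0008</literal> at offset 5. |
| </para> |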
| |
| <para> |
| With these limitations in mind, it is possible to implement more stubs |
| in Wine and, eventually, the functions themselves. |
| </para> |
| </sect1> |
| |
| <sect1 id="accel-impl"> |
| <title>Accelerators</title> |
| |
| <para> |
| There are <emphasis>three</emphasis> differently sized |
| accelerator structures exposed to the user: |
| </para> |
| <orderedlist> |
| <listitem> |
| <para> |
| Accelerators in NE resources. This is also the internal |
| layout of the global handle <type>HACCEL</type> (16 and |
| 32) in Windows 95 and Wine. Exposed to the user as Win16 |
| global handles <type>HACCEL16</type> and |
| <type>HACCEL32</type> by the Win16/Win32 API. |
| These are 5 bytes long, with no padding: |
| <programlisting> |
| BYTE fVirt; |
| WORD key; |
| WORD cmd; |
| </programlisting> |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Accelerators in PE resources. They are exposed to the user |
| only by directly accessing PE resources. |
| These have a size of 8 bytes: |
| </para> |
| <programlisting> |
| BYTE fVirt; |
| BYTE pad0; |
| WORD key; |
| WORD cmd; |
| WORD pad1; |
| </programlisting> |
| </listitem> |
| <listitem> |
| <para> |
| Accelerators in the Win32 API. These are exposed to the |
| user by the <function>CopyAcceleratorTable</function> |
| and <function>CreateAcceleratorTable</function> functions |
| in the Win32 API. |
| These have a size of 6 bytes: |
| </para> |
| <programlisting> |
| BYTE fVirt; |
| BYTE pad0; |
| WORD key; |
| WORD cmd; |
| </programlisting> |
| </listitem> |
| </orderedlist> |
| |
| <para> |
| Why two types of accelerators in the Win32 API? We can only |
| guess, but my best bet is that the Win32 resource compiler |
| cannot or does not handle struct packing. Win32 <type>ACCEL</type> |
| is defined using <function>#pragma pack(2)</function> for the |
| compiler but without any packing for RC, so RC will assume |
| <function>#pragma pack(4)</function>. |
| </para> |
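| |
| <para> |
| The three sizes can be checked with a compiler. In the sketch below |
| the struct names are hypothetical, <type>uint8_t</type> and |
| <type>uint16_t</type> stand in for <type>BYTE</type> and |
| <type>WORD</type>, and <literal>#pragma pack</literal> is assumed to |
| be available (it is in GCC, Clang and MSVC). The conversion helper is |
| likewise only an illustration of how a PE resource entry could be |
| mapped to the API layout. |
| </para> |
| <programlisting> |
```c
#include <stdint.h>

/* 1. NE resource / HACCEL layout: 5 bytes, no padding. */
#pragma pack(push, 1)
struct accel_ne {
    uint8_t  fVirt;
    uint16_t key;
    uint16_t cmd;
};
#pragma pack(pop)

/* 2. PE resource layout: 8 bytes, padding spelled out. */
struct accel_pe {
    uint8_t  fVirt;
    uint8_t  pad0;
    uint16_t key;
    uint16_t cmd;
    uint16_t pad1;
};

/* 3. Win32 API layout: 6 bytes. */
#pragma pack(push, 2)
struct accel_api {
    uint8_t  fVirt;
    uint8_t  pad0;
    uint16_t key;
    uint16_t cmd;
};
#pragma pack(pop)

_Static_assert(sizeof(struct accel_ne)  == 5, "NE accelerator is 5 bytes");
_Static_assert(sizeof(struct accel_pe)  == 8, "PE accelerator is 8 bytes");
_Static_assert(sizeof(struct accel_api) == 6, "API accelerator is 6 bytes");

/* Copy a PE resource entry into the API layout, dropping the trailing pad. */
static struct accel_api accel_pe_to_api(const struct accel_pe *pe)
{
    struct accel_api a;

    a.fVirt = pe->fVirt;
    a.pad0  = 0;
    a.key   = pe->key;
    a.cmd   = pe->cmd;
    return a;
}
```
| </programlisting> |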
| |
| </sect1> |
| |
| <sect1 id="hardware-trace"> |
| <title>Doing A Hardware Trace</title> |
| |
| <para> |
| The primary reason to do this is to reverse engineer a |
| hardware device for which you don't have documentation, but |
| can get to work under Wine. |
| </para> |
| <para> |
| This section is aimed at parallel port devices, in particular |
| parallel port scanners, which are now so cheap that they are |
| virtually being given away. The problem is that few |
| manufacturers will release any programming information, which |
| prevents drivers from being written for SANE, and the traditional |
| technique of using DOSemu to produce the traces does not work |
| because the scanners invariably only have drivers for Windows. |
| </para> |
| <para> |
| Presuming that you have compiled and installed Wine, the first |
| thing to do is to enable direct hardware access to your |
| parallel port. To do this, edit <filename>config</filename> |
| (usually in <filename>~/.wine/</filename>) and add the following |
| two lines to the ports section: |
| </para> |
| <programlisting> |
| read=0x378,0x379,0x37a,0x37c,0x77a |
| write=0x378,0x379,0x37a,0x37c,0x77a |
| </programlisting> |
| <para> |
| This adds the access required for an SPP/PS2/EPP/ECP |
| parallel port on LPT1. You will need to adjust these numbers |
| accordingly if your parallel port is on LPT2 or LPT0. |
| </para> |
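| <para> |
| One way to derive the lines for another port is to keep the offsets |
| from the LPT1 example and change only the base address. The sketch |
| below assumes that the same five offsets apply to every base; the |
| helper name <function>format_ports</function> is invented, and |
| <literal>0x278</literal> is only a common, not a guaranteed, address |
| for a second parallel port, so check your own hardware. |
| </para> |
| <programlisting> |
```c
#include <stdio.h>

/* Offsets from the base address, taken from the LPT1 example
 * (base 0x378): 0x378, 0x379, 0x37a, 0x37c, 0x77a. */
static const unsigned port_offsets[] = { 0x000, 0x001, 0x002, 0x004, 0x402 };

/* Write "key=0x...,0x...,..." for the given base into buf and return buf.
 * Output is silently cut short if buf is too small. */
static char *format_ports(char *buf, size_t size, const char *key, unsigned base)
{
    size_t count = sizeof port_offsets / sizeof port_offsets[0];
    size_t used, i;
    int n = snprintf(buf, size, "%s=", key);

    if (n < 0 || (size_t)n >= size)
        return buf;                      /* error or no room left */
    used = (size_t)n;
    for (i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, "%s0x%x",
                     i ? "," : "", base + port_offsets[i]);
        if (n < 0 || (size_t)n >= size - used)
            break;                       /* out of room */
        used += (size_t)n;
    }
    return buf;
}
```
| </programlisting> |
| <para> |
| Calling <literal>format_ports(buf, sizeof buf, "read", 0x378)</literal> |
| reproduces the <literal>read</literal> line shown above. |
| </para> |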
| <para> |
| When starting Wine, use the following command line, where |
| <literal>XXXX</literal> is the program you need to run in |
| order to access your scanner, and <literal>YYYY</literal> is |
| the file your trace will be stored in: |
| </para> |
| <programlisting> |
| wine -debugmsg +io XXXX 2> >(sed 's/^[^:]*:io:[^ ]* //' > YYYY) |
| </programlisting> |
| <para> |
| You will need large amounts of hard disk space (hundreds |
| of megabytes if you do a full-page scan) and, for reasonable |
| performance, a really fast processor and lots of RAM. |
| </para> |
| <para> |
| You will need to postprocess the output into a more manageable |
| format, using the <command>shrink</command> program. First |
| you need to compile the source (which is located at the end of |
| this section): |
| <programlisting> |
| cc shrink.c -o shrink |
| </programlisting> |
| </para> |
| <para> |
| Use the <command>shrink</command> program to reduce the |
| physical size of the raw log as follows: |
| </para> |
| <programlisting> |
| cat log | shrink > log2 |
| </programlisting> |
| <para> |
| The trace has the basic form of |
| </para> |
| <programlisting> |
| XXXX > YY @ ZZZZ:ZZZZ |
| </programlisting> |
| <para> |
| where <literal>XXXX</literal> is the port in hexadecimal being |
| accessed, <literal>YY</literal> is the data written to (or read |
| from) the port, and <literal>ZZZZ:ZZZZ</literal> is the address |
| in memory of the instruction that accessed the port. The |
| direction of the arrow indicates whether the data was written |
| to or read from the port. |
| </para> |
| <programlisting> |
| > data was written to the port |
| < data was read from the port |
| </programlisting> |
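| <para> |
| For later processing it helps to turn each trace line into a small |
| record. The following is a sketch only: the struct and function |
| names are hypothetical, and it assumes that every line has exactly |
| the shape shown above, for example |
| <literal>0x378 > 55 @ 0297:01ec</literal>. |
| </para> |
| <programlisting> |
```c
#include <stdio.h>

struct io_event {
    unsigned port;     /* I/O port, e.g. 0x378 */
    char     dir;      /* '>' = written to the port, '<' = read from it */
    unsigned data;     /* byte transferred */
    unsigned seg, off; /* address of the instruction, ZZZZ:ZZZZ */
};

/* Parse one line such as "0x378 > 55 @ 0297:01ec" into *ev.
 * All numbers are hexadecimal. Returns 1 on success and 0 if the
 * line does not have that shape (for example a repeat-count comment). */
static int parse_io_line(const char *line, struct io_event *ev)
{
    if (sscanf(line, "%x %c %x @ %x:%x",
               &ev->port, &ev->dir, &ev->data, &ev->seg, &ev->off) != 5)
        return 0;
    return ev->dir == '>' || ev->dir == '<';
}
```
| </programlisting> |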
| <para> |
| My basic tip for interpreting these logs is to pay close |
| attention to the addresses of the I/O instructions. Their |
| grouping, and sometimes their proximity, should reveal the |
| presence of subroutines in the driver. By studying the different |
| versions you should be able to work them out. For example, consider |
| the following section of trace from my UMAX Astra 600P: |
| </para> |
| <programlisting> |
| 0x378 > 55 @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| 0x378 > aa @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| 0x378 > 00 @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| 0x378 > 00 @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| 0x378 > 00 @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| 0x378 > 00 @ 0297:01ec |
| 0x37a > 05 @ 0297:01f5 |
| 0x379 < 8f @ 0297:01fa |
| 0x37a > 04 @ 0297:0211 |
| </programlisting> |
| <para> |
| As you can see, there is a repeating structure starting at |
| address <literal>0297:01ec</literal> that consists of four I/O |
| accesses on the parallel port. The first access writes a |
| changing byte to the data port; the second always writes the |
| byte <literal>0x05</literal> to the control port; then a value, |
| which always seems to be <literal>0x8f</literal>, is read from the |
| status port, at which point the byte <literal>0x04</literal> is |
| written to the control port. By studying this and other sections |
| of the trace we can write a C routine that emulates this, shown |
| below with some macros that make parallel port accesses easier |
| to read. |
| </para> |
| <programlisting> |
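| #include <sys/io.h>  /* inb()/outb() on Linux/x86; needs ioperm() or iopl() first */ |
| |
| /* Data, status and control registers live at base, base+1 and base+2 */ |
| #define r_dtr(x)   inb(x) |
| #define r_str(x)   inb((x) + 1) |
| #define r_ctr(x)   inb((x) + 2) |
| #define w_dtr(x,y) outb((y), (x)) |
| #define w_str(x,y) outb((y), (x) + 1) |
| #define w_ctr(x,y) outb((y), (x) + 2) |
| |
| /* Seems to be sending a command byte to the scanner */ |
| int udpp_put(int udpp_base, unsigned char command) |
| { |
|     int loop, value = 0; |
| |
|     w_dtr(udpp_base, command);   /* byte to send goes on the data port */ |
|     w_ctr(udpp_base, 0x05); |
| |
|     /* Poll the status port until bit 7 is set, at most 10 times */ |
|     for (loop = 0; loop < 10; loop++) |
|     { |
|         if ((value = r_str(udpp_base)) & 0x80) |
|         { |
|             w_ctr(udpp_base, 0x04); |
|             return value & 0xf8; |
|         } |
|     } |
| |
|     /* Timed out: flag the failure in bit 0 */ |
|     return (value & 0xf8) | 0x01; |
| } |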
| </programlisting> |
| <para> |
| For the UMAX Astra 600P only seven such routines exist (well, |
| 14 really: seven for SPP and seven for EPP). Whether you |
| choose to disassemble the driver at this point to verify the |
| routines is your own choice. If you do, the addresses from the |
| trace should help in locating them in the disassembly. |
| </para> |
| <para> |
| You will probably then find it useful to write a script or a Perl |
| or C program to analyse the log file and decode it further, as this |
| can reveal higher-level grouping of the low-level routines. |
| For example, the logs from my UMAX Astra 600P, when decoded |
| further, reveal the following (this is a small snippet): |
| </para> |
| <programlisting> |
| start: |
| put: 55 8f |
| put: aa 8f |
| put: 00 8f |
| put: 00 8f |
| put: 00 8f |
| put: c2 8f |
| wait: ff |
| get: af,87 |
| wait: ff |
| get: af,87 |
| end: cc |
| start: |
| put: 55 8f |
| put: aa 8f |
| put: 00 8f |
| put: 03 8f |
| put: 05 8f |
| put: 84 8f |
| wait: ff |
| </programlisting> |
| <para> |
| From this it is easy to see that the <varname>put</varname> |
| routine is often grouped together in runs of successive calls |
| (six after each <literal>start</literal> in the snippet above) |
| sending information to the scanner. Once these are understood |
| it should be possible to process the logs further to show the |
| higher-level routines in an easy-to-read format. Once the |
| highest-level format that you can derive from this process is |
| understood, you then need to produce a series of scans, varying |
| only one parameter between them, so that you can discover how to |
| set the various parameters for the scanner. |
| </para> |
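| |
| <para> |
| As an illustration of this kind of decoding, the four-access |
| sequence from the raw trace earlier in this section could be |
| collapsed into a single <literal>put:</literal> line roughly as |
| follows. This is a sketch under assumptions: the struct and function |
| names are invented, the events are taken to be already parsed, and |
| the output format simply imitates the snippet above rather than any |
| real tool. |
| </para> |
| <programlisting> |
```c
#include <stdio.h>
#include <stddef.h>

struct io_event {
    unsigned port;   /* I/O port */
    char     dir;    /* '>' = written to the port, '<' = read from it */
    unsigned data;   /* byte transferred */
};

/* If ev[0..3] is the sequence
 *   write data port, write 0x05 to control port,
 *   read status port, write 0x04 to control port
 * then format "put: DD SS" (data byte, status byte) into out and
 * return 1. Otherwise leave out untouched and return 0. */
static int decode_put(const struct io_event ev[4], unsigned base,
                      char *out, size_t size)
{
    if (ev[0].port != base || ev[0].dir != '>')
        return 0;
    if (ev[1].port != base + 2 || ev[1].dir != '>' || ev[1].data != 0x05)
        return 0;
    if (ev[2].port != base + 1 || ev[2].dir != '<')
        return 0;
    if (ev[3].port != base + 2 || ev[3].dir != '>' || ev[3].data != 0x04)
        return 0;
    snprintf(out, size, "put: %02x %02x", ev[0].data, ev[2].data);
    return 1;
}
```
| </programlisting> |
| <para> |
| A driver loop would slide this over the parsed log four events at a |
| time and fall back to printing the raw line whenever no pattern |
| matches. |
| </para> |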
| |
| <para> |
| The following is the <filename>shrink.c</filename> program: |
| <programlisting> |
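| /* Copyright David Campbell <campbell@torque.net> */ |
| #include <stdio.h> |
| #include <string.h> |
| |
| /* Collapse runs of identical lines read from stdin into a single copy |
|  * followed by a repeat count. */ |
| int |
| main (void) |
| { |
|   char buff[256], lastline[256]; |
|   int count; |
| |
|   count = 0; |
|   lastline[0] = 0; |
| |
|   /* Test fgets() itself: looping on feof() handles the last line twice. */ |
|   while (fgets (buff, sizeof (buff), stdin) != NULL) |
|     { |
|       if (strcmp (buff, lastline) == 0) |
|         { |
|           count++; |
|         } |
|       else |
|         { |
|           if (count > 1) |
|             fprintf (stdout, "# Last line repeated %i times #\n", count); |
|           fprintf (stdout, "%s", buff); |
|           strcpy (lastline, buff); |
|           count = 1; |
|         } |
|     } |
| |
|   /* Report a run that extends to the end of the input. */ |
|   if (count > 1) |
|     fprintf (stdout, "# Last line repeated %i times #\n", count); |
|   return 0; |
| } |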
| </programlisting> |
| </para> |
| </sect1> |
| |
| </chapter> |
| |
| <!-- Keep this comment at the end of the file |
| Local variables: |
| mode: sgml |
| sgml-parent-document:("wine-devel.sgml" "set" "book" "part" "chapter" "") |
| End: |
| --> |