mr-edd.co.uk :: horsing around with the C++ programming language

Introducing nanohook

[3rd April 2010]

One of the problems I've encountered when porting pexl to Windows is that code can behave differently depending on whether its standard streams are connected to a console or a pipe.

Take the following simple C program, for example:

#include <stdio.h>

int main(void)
{
    int age = 0;
    printf("How old are you?: ");
    /* Note: no fflush(stdout) here, so the prompt may still be sitting in a buffer. */
    scanf("%d", &age);

    return 0;
}

You can compile and run this at a command prompt on Windows and it will work just fine. Strictly speaking, however, there should be a call to fflush(stdout) between the printf and the scanf, because stdout is allowed to buffer the prompt rather than write it out immediately.

Indeed, when you run this program with its stdout connected to a named pipe, the question won't appear at the other end before the program starts waiting for input.

If it were possible to replace the WriteFile and/or WriteFileEx[1] functions in the child process with equivalents that also flush their output when writing to stdout or stderr, pexl would be able to reproduce the output of these programs faithfully.

nanohook is a library that allows just such replacement functions to be installed at run-time, at the machine code level.

There are already some existing libraries that do this, namely Microsoft Detours and EasyHook. The Detours library has some very strange licensing requirements, however. EasyHook's licensing is a little nicer (LGPL) but I feel the library is much more complicated than it needs to be, at least for my kinds of use, due to all the managed/unmanaged shenanigans.

Another library I came across was N-CodeHook. This was much smaller and simpler. However, I couldn't get it to compile with MinGW. Also, since I'm in the middle of learning x86 asm, I thought it might be a good exercise to try to understand and re-write the library. nanohook is the (heavily commented!) result.

It did indeed turn out to be quite a good learning experience as I had to rummage through the Intel reference manuals to understand the bits and bytes that make up the relevant machine code instructions.

DLL injection

So I have a reasonably tight hooking library, but to use it in pexl I need to be able to run code in the child processes in order to install the hooks. This can be achieved by DLL injection.

Windows provides a function called CreateRemoteThread, which allows you to create a thread in another process. By passing LoadLibrary as the thread's entry point and the name of a DLL as its argument, we can have Windows load a custom-made DLL into the address space of the child process.

Because Windows calls a DLL's DllMain function whenever that library is loaded, we have the opportunity to run arbitrary code inside that process!

I've glossed over the details somewhat, but by creating a DLL that contains the code to install hooks for WriteFile and/or WriteFileEx, it is possible to have troublesome processes behave as if their output streams were connected to a console.

This is all pretty nasty-sounding stuff, so I'm not sure whether I'll actually use it in pexl, but it's definitely been quite fun getting my hands dirty and learning about it!

Other uses

Two projects that I've found quite interesting recently are sham and fabricate.

sham is a Windows build tool that records the inputs and outputs of each process it invokes. It does this by using the DLL injection technique described above in combination with Microsoft's Detours in order to install a number of hooks.

So by taking note of the files each process opens for reading and writing, sham is able to automagically build up a dependency graph of the build. No special logic is needed to scan source files for #include directives, for example. In fact it will work for anything; there's no reason that you couldn't use it to build Java, whose class-level dependencies are incredibly difficult to work out.

But because sham uses Detours, the legality of its use is something of a grey area due to Detours' strange licensing.

fabricate is a similar project that aims to do what sham does in a cross-platform manner. Many UNIXy operating systems provide the strace program, which can be used to perform a similar task to sham.

fabricate currently doesn't use Detours for the Windows solution and instead resorts to monitoring directories, which as far as I understand can become slow as directory sizes grow.

So perhaps some time soon, I'll be able to donate some code to help with fabricate's Windows support.

Footnotes
  1. These are the lowest-level user-space functions for writing output to HANDLEs; all writes eventually call one of these, as far as I understand.

Comments

Max

[06/04/2010 at 11:02:26]

Hi,
although nanohook surely is a powerful solution that has several other applications, you might have been able to solve the original problem by using line buffering, via the ISO/ANSI function setvbuf. A call like:

setvbuf( stdout, NULL, _IOLBF, SOME_BUFFER_SIZE );

according to the documentation, this call should:

Use line buffering: pass on output to the host system at every
newline, as well as when the buffer is full, or when an input
operation intervenes.

Best Regards

Simon

[07/04/2010 at 21:58:34]

Could you perhaps post a sample solution to your original problem? I have encountered this buffering problem in the past, for example when trying to pipe text output from a running console application into a GUI window, and never found a workable solution. Modifying the console application code isn't necessarily possible in this case, but it seems nanohook may provide the solution.

Thanks.

Edd

[15/04/2010 at 21:49:19]

Max: that's a good thought. I'll give that a try. It looks like I'd have to use _IONBF on Windows, though.

Some programs might bypass the C runtime library altogether, in which case one would perhaps still have to hook WriteFile(Ex) to call FlushFileBuffers (really not sure, guessing).

Simon: I'll see what I can do. Perhaps I'll find time later in the week.
