A Programming Fusion Technique for Windows NT
by Greg Hoglund <hoglund@ieway.com>
Tue Dec 07 1999

Using C/C++ and assembly together under Windows NT to stackguard,
boobytrap, and otherwise get your hands dirty.

Part One

-Greg Hoglund, 1999 ( http://www.rootkit.com )
Copyright Security-Focus.com 1999

Introduction

Assembly language is a useful weapon, but I must admit that it is
complicated for many. So many of us program in C or C++ and yearn for the
leverage of assembly. Think about patching the interrupt table under NT.
Think about writing steel-belted inner loops. Think about stack-guarding
and boobytraps. All of this is granted by assembly. The difficult part comes
with the development environment. Do you have TASM 5 installed? How about
SoftIce? As it turns out, you don't need them for some tasks. Right there in
your MS Dev Studio - with VC++ - you can brandish the power of machine
language. Through a few simple tricks you can begin down a deep and secret
path. All you need is VC++ and some time to forge ahead.

The power of C

Higher level languages give you an incredible productivity tool. Writing and
debugging code in C or C++ can take far less time than doing the same in
assembly. If you're a hard case, you CAN write in 100% assembly. The point is
that you don't *have* to go there. You can still benefit from assembly. Using
VC++, you can exploit the powers of your runtime libraries, your graphics
libraries, and even PHAT codebases like MFC. I personally would rather use
the MFC window classes than re-write my own from scratch. Sure, it's
bigger and slower - but those aren't the parts of my program that *need*
speed. There is a trade-off and it's based on time. Think about this - the
faster you can crank out apps - the more you will make - the faster your
armies will spread - the more people who will download *your* tool instead
of tomorrow's tool.

Using this Fusion technique you can still take advantage of OpenGL macros
and libraries, MS-DevStudio app-wizards, MFC, C++ classes, DCOM, and the
Standard Template Library. You can also stack-guard your
subroutines, patch interrupts, hook system calls, overwrite system tables,
make your own system calls, use undocumented functions, exploit systems, and
write solid inner loops. Use it or lose it.

A simple 'hello fusion'

The following code demonstrates a simple use of __declspec( naked ) and
shows how you can write your own code, unhindered by the VC code generator.
The benefit is total control. Like many engineers - I can't *stand it*
when I can't control my code. Here is the source:

void _func1(void);

void main(void)
{
    _func1();
}

void __declspec(naked) _func1(void)
{
    __asm
    {
        ret
    }
}

As you can see, you must supply your own 'ret' instruction - which means
"return". Since we have declared the function naked, we must manually tell
the processor to return. The normal, invisible code that makes that happen
for us is *not* generated when we declare a function naked. Everything is
manual. Let's look at the assembly language generated for this code
(output as shown by the MSDev debugger - hit ALT-8 when debugging to see this):

12: void main(void)
13: {
0040A440 55 push ebp
0040A441 8B EC mov ebp,esp
14: _func1();
0040A443 E8 C2 6B FF FF call @ILT+10(?_func1@@YAXXZ)(0x0040100a)
15: }
0040A448 5D pop ebp
0040A449 C3 ret
16:
17: void __declspec(naked) _func1(void)
18: {
0040A44A C3 ret
19: __asm
20: {
21: ret
22: }
23: }

First, the program sets up the stack frame for main() - that is the ebp, esp
stuff in the first two lines. Next, we actually call our function - that is
the 'call' instruction. As you can see, our function is exactly one byte long
('C3') and says 'ret' - return from the function. Trace this yourself in
the debugger to witness how this works. The last two instructions of main()
('pop ebp' and 'ret') simply tear down its frame and exit.
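
The 'call' bytes in the listing are worth decoding by hand. The sketch below is an illustration in portable C, not part of the original program: it assumes the standard x86 encoding of a near call (opcode E8 followed by a 32-bit little-endian displacement measured from the next instruction). The helper name call_target is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 5-byte near call (E8 + little-endian rel32) that sits at
   address 'at' and return the absolute address it calls.
   Returns 0 if the first byte is not the E8 opcode. */
uint32_t call_target(uint32_t at, const uint8_t insn[5])
{
    uint32_t rel;

    assert(insn != NULL);
    if (insn[0] != 0xE8)
        return 0;

    rel = (uint32_t)insn[1]
        | ((uint32_t)insn[2] << 8)
        | ((uint32_t)insn[3] << 16)
        | ((uint32_t)insn[4] << 24);

    /* The displacement is relative to the address of the NEXT
       instruction; unsigned wraparound handles backward jumps. */
    return at + 5u + rel;
}
```

Feeding it the bytes E8 C2 6B FF FF at address 0040A443 gives 0040100A, which is exactly the @ILT+10 address the debugger prints next to the call.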

How does this differ from a normal, everyday call? Well, let's look! A
normal call would *not* be declared naked. It would look like this:

void _func3(int a);

void main(void)
{
    _func3(1);
}

void _func3(int a)
{
    a = 0;
}

Note that our function takes one argument. For simplicity, we are simply
going to zero it out. Let's look now at the corresponding assembly language:

12: void main(void)
13: {
0040A440 55 push ebp
0040A441 8B EC mov ebp,esp
14: _func3(1);
0040A443 6A 01 push 1
0040A445 E8 C5 6B FF FF call @ILT+15(?_func3@@YAXH@Z)(0x0040100f)
0040A44A 83 C4 04 add esp,4
15: }
0040A44D 5D pop ebp
0040A44E C3 ret

33: void _func3(int a)
34: {
0040A451 55 push ebp
0040A452 8B EC mov ebp,esp
35: a = 0;
0040A454 C7 45 08 00 00 00 00 mov dword ptr [a],0
36: }
0040A45B 5D pop ebp
0040A45C C3 ret

For the most part it looks the same as our previous example. Because we are
passing an argument, the caller must first 'push' the value on the stack. Note
the 'push' directly before the 'call'. Also, because we aren't using the
'naked' directive, the compiler has created a stack frame for our function:
the base pointer (ebp) is pushed onto the stack and set to the
current stack pointer (esp). The 'mov' instruction corresponds to "a =
0"; the debugger shows the operand as [a], which is really [ebp+8] (the bytes
45 08). When the work is all done, the function pops the base pointer back
off the stack and returns. Finally, back in main(), the compiler corrects the
stack position for us - note the 'add esp,4' - which moves the stack pointer
back to its original position prior to the push. All of this was done
automatically.
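
To make the frame layout concrete, here is a small model of it in portable C. This is only a sketch under stated assumptions: ToyStack and its helpers are invented for illustration, the "stack" is an ordinary array, and the sequence simply replays the instructions from the listing above (push 1, call, push ebp, mov ebp,esp, pop ebp, ret, add esp,4).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* A toy 32-bit stack that grows toward lower addresses, like x86. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;   /* byte offset of the top of the stack */
    uint32_t ebp;   /* byte offset of the current frame base */
} ToyStack;

void toy_init(ToyStack *s)
{
    s->esp = STACK_WORDS * 4;
    s->ebp = s->esp;
}

void toy_push(ToyStack *s, uint32_t v)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

uint32_t toy_pop(ToyStack *s)
{
    uint32_t v;
    assert(s->esp + 4 <= STACK_WORDS * 4);
    v = s->mem[s->esp / 4];
    s->esp += 4;
    return v;
}

/* Replays the call to _func3 and returns what the callee finds at
   [ebp+8]; the word at [ebp+4] is stored through found_ret. */
uint32_t toy_call_func3(ToyStack *s, uint32_t arg, uint32_t ret_addr,
                        uint32_t *found_ret)
{
    uint32_t seen;
    uint32_t esp_before = s->esp;

    toy_push(s, arg);                    /* caller: push 1          */
    toy_push(s, ret_addr);               /* call: pushes return eip */
    toy_push(s, s->ebp);                 /* callee: push ebp        */
    s->ebp = s->esp;                     /* callee: mov ebp,esp     */

    seen = s->mem[(s->ebp + 8) / 4];     /* the argument 'a'        */
    *found_ret = s->mem[(s->ebp + 4) / 4];

    s->ebp = toy_pop(s);                 /* callee: pop ebp         */
    (void)toy_pop(s);                    /* ret: pops return eip    */
    s->esp += 4;                         /* caller: add esp,4       */

    assert(s->esp == esp_before);        /* stack is balanced again */
    return seen;
}
```

The model shows the two offsets that matter for the rest of this article: the first argument lives at [ebp+8], and the return address lives at [ebp+4].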

Now, you ask, why would anyone wish to use 'naked'? Well, let's say you want
to pass all of your arguments in registers. This can be done for
performance reasons - and using a naked function you can do this. For
example:

/* takes argument in eax, adds 6, and returns result in eax */
int __declspec( naked ) _function(void)
{
    __asm
    {
        add eax, 6
        ret
    }
}

void main(void)
{
    int result;
    __asm mov eax, 10       ; the argument goes in eax
    result = _function();   /* 10 + 6 comes back in eax */
}

For more complicated functions, or when calling a series of functions, this
can be very useful. You can track buffer position in one register, while
storing a function pointer in another, and storing a heap pointer in a
third. Flags could be stored in yet another register. If calling several
hundred functions in a row on a single dataset, this removes all of the
function prologue and epilogue code and the per-call stack frame. This can
increase your performance quite a bit.

The following function takes a pointer to a structure in ebx and
dereferences it. It assumes that a pointer to a destination buffer has
already been set up in edi. It's functions like these that can be ganged
together to work on large datasets. Keep it in the registers!

struct _function
{
    void *mRunFunction;
    void *mHeapData;            /* stores state in each instance */
    struct _function *next;
};

void __declspec(naked) f_charseq(void)
{
    __asm
    {
        call dword ptr [ebx]    ; mRunFunction: do some work
        mov esi, [ebx + 4]      ; mHeapData
        mov eax, [esi]          ; fetch a value from the instance state
        mov [edi], eax          ; store it at the destination and
                                ; advance the destination pointer one byte
        inc edi
        ret
    }
}
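
For readers who want to see the shape of such a chain before dropping into registers, here is a rough portable C equivalent. It is a sketch, not the author's implementation: the names stage, emit_char and run_chain are hypothetical, and the write pointer is threaded through return values where the assembly version keeps it in edi.

```c
#include <assert.h>
#include <stddef.h>

struct stage;

/* Each stage appends to dst and returns the advanced write position. */
typedef char *(*stage_fn)(const struct stage *self, char *dst);

struct stage {
    stage_fn run;            /* plays the role of mRunFunction */
    const void *heap_data;   /* plays the role of mHeapData    */
    struct stage *next;
};

/* Emits one character taken from the stage's private data. */
char *emit_char(const struct stage *self, char *dst)
{
    *dst = *(const char *)self->heap_data;
    return dst + 1;
}

/* Runs every stage in order, threading the write pointer through. */
char *run_chain(const struct stage *head, char *dst)
{
    assert(dst != NULL);
    for (; head != NULL; head = head->next)
        dst = head->run(head, dst);
    return dst;
}
```

Two stages holding "o" and "k" leave "ok" in the buffer and return a pointer just past it. The compiler will still build a frame for every call here, which is exactly the overhead the naked version avoids.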



Pulling some neat tricks

Now that you know the technique, let's put it to work. One
trick I recently developed is stack-guarding your function calls. A
stack overflow usually depends on the ability to overflow a locally
defined variable (sometimes called an 'automatic variable'). These
variables are allocated on the stack. If they are overflowed, the stack
can become corrupted: the return address can be overwritten, and the
buffer overflow delivers its payload.

Let's explore a trick to prevent the stack-smash from working. The
following function stores an extra copy of the return address on the
stack. Because of the way VC++ sets up a function's frame (space for locals
is reserved in the prologue, before our code runs), our *manual* 'push'
places the copy of the return address LAST on the stack, at a lower address
than any local buffer. This is important because a buffer local to this
function is filled toward higher addresses - in the *opposite* direction,
away from our saved copy and toward the real return address. What this means
is, no matter how large the overflow of a local buffer, this last value we
pushed cannot be overwritten by it.

The code:

#include <string.h>

void _func2(int a, int b)
{
    // stack guarded function
    // note that this 'push' places a copy of the return
    // address (saved eip) *LAST* on the stack, and therefore
    // safe from any local buffer that may be overflowed
    __asm
    {
        push dword ptr [ebp + 4]    ; saved eip
    }

    char s[10];

    a = 5;
    b = 10;

    strcpy(s, "XXXXXXXXXXXX");      // deliberate overflow of s

    // pop saved EIP and check, throw
    // debug interrupt if things aren't cool.
    __asm
    {
        pop eax
        cmp eax, [ebp + 4]
        je BUFFER_OK
        int 3
    BUFFER_OK:
    }
}

At the end of the call, we pop our saved value and check it against the
real return address. If they do not match we throw a debug-break (interrupt
3) and the program halts without any damage to the system. The hacker will
have only succeeded in a DoS attack. Of course, we could throw an exception
and gracefully exit the program. We could also gracefully restart the
program - or even handle the exception internally. I leave these as
exercises for the engineer, as these are fairly design-specific decisions.
The technique is sound. If you want to make your code look cleaner, use the
following macros:

#define START_GUARD __asm push dword ptr [ebp + 4]
#define END_GUARD __asm \
{ \
    __asm pop eax \
    __asm cmp eax, [ebp + 4] \
    __asm je BOK \
    __asm int 3 \
    __asm BOK: \
}

void _func2(int a, int b)
{
    START_GUARD

    char s[10];

    a = 5;
    b = 10;

    strcpy(s, "XXXXXXXXXXXX");

    END_GUARD
}
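
If you want to convince yourself of the guard logic without a debugger, it can be modeled in portable C. The following is a simulation only, under assumptions of my own choosing: the frame is a plain byte array laid out lowest address first (guard copy, a 12-byte local buffer, saved ebp, return address), and the names ToyFrame, frame_enter, frame_copy and frame_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the toy frame, lowest address first. */
enum {
    OFF_GUARD  = 0,    /* copy left by 'push [ebp + 4]'  */
    OFF_BUF    = 4,    /* the local buffer (12 bytes)    */
    OFF_EBP    = 16,   /* saved ebp                      */
    OFF_RET    = 20,   /* real return address, [ebp + 4] */
    FRAME_SIZE = 24
};

typedef struct { unsigned char bytes[FRAME_SIZE]; } ToyFrame;

static uint32_t load32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Function entry plus START_GUARD: record the return address twice. */
void frame_enter(ToyFrame *f, uint32_t ret_addr)
{
    memset(f->bytes, 0, FRAME_SIZE);
    store32(f->bytes + OFF_RET, ret_addr);
    store32(f->bytes + OFF_GUARD, ret_addr);
}

/* An unchecked copy into the local buffer, like strcpy(). The assert
   only keeps the toy itself inside its own array. */
void frame_copy(ToyFrame *f, const void *src, size_t n)
{
    assert(n <= (size_t)(FRAME_SIZE - OFF_BUF));
    memcpy(f->bytes + OFF_BUF, src, n);
}

/* END_GUARD: returns 1 if the return address is intact, 0 if smashed. */
int frame_check(const ToyFrame *f)
{
    return load32(f->bytes + OFF_GUARD) == load32(f->bytes + OFF_RET);
}
```

In this toy layout a 13-byte copy spills only into the saved ebp, so the check still passes; a 20-byte copy reaches the return address and the check fails. Either way the guard copy, sitting below the buffer, is never touched.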


Let's take another step and overwrite the system service table. Because
kernel memory (by default the upper 2 GB of the address space, from
0x80000000 up) is off-limits to the lowly user process, you must write a
kernel-mode driver. A driver runs in ring-0 and has access to kernel
structures.

As a note, leveraging the SE_DEBUG privilege to write to kernel memory
doesn't work from user mode. As soon as you try to query a remotely
interesting page of memory, you are denied access. VirtualQuery() simply
fails to return anything. Try to query anything up in 0x80000000 and you'll
see what I mean. This doesn't leave us in the dust however - SE_DEBUG
*will* allow us to inject code into other processes - and this can be
interesting.


Hooking the interrupt table under NT

Inline assembly within VC++ turns out not to be so fruitful here. A simple use
of the 'sidt' instruction, in a feeble attempt to read the interrupt table
address, returns the following garbage:

fatal error C1001: INTERNAL COMPILER ERROR
(compiler file 'E:\utc\src\P2\x86\inasm.c', line 471)

Oh dear Microsoft, thank you for your help. It turns out that we actually
*can* use this instruction, of course, but only from the DDK build utility
- not from VC++ itself. TASM will not assemble that instruction
either. It returns the following error:

error** maingui.asm(322) Illegal instruction for currently selected
processor(s)

This complicates things a bit and means you must have the Windows NT DDK
installed. Writing a driver is rather simple, however, so let's explore
patching the interrupt table under NT. The following code is a basic
driver:

#include "ntddk.h"
#include "stdarg.h"
#include "stdio.h"

// print macro that only turns on when checked builds are on;
// call it with double parentheses: DbgPrint(("text\n"));
#if DBG
#define DbgPrint(arg) DbgPrint arg
#else
#define DbgPrint(arg)
#endif

// forward declaration: DriverEntry refers to the dispatch routine
NTSTATUS OnDispatchGeneral(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp );

NTSTATUS DriverEntry(
    IN PDRIVER_OBJECT DriverObject,
    IN PUNICODE_STRING RegistryPath )
{
    DbgPrint(("Entering DriverEntry\n"));

    /* for now all dispatches point to the same place */
    DriverObject->MajorFunction[IRP_MJ_READ] =
    DriverObject->MajorFunction[IRP_MJ_CREATE] =
    DriverObject->MajorFunction[IRP_MJ_CLOSE] =
    DriverObject->MajorFunction[IRP_MJ_FLUSH_BUFFERS] =
    DriverObject->MajorFunction[IRP_MJ_CLEANUP] =
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = OnDispatchGeneral;

    return STATUS_SUCCESS;
}

NTSTATUS OnDispatchGeneral(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp )
{
    PIO_STACK_LOCATION currentIrpStack = IoGetCurrentIrpStackLocation(Irp);
    PIO_STACK_LOCATION nextIrpStack = IoGetNextIrpStackLocation(Irp);
    /* Default to success and complete the request. */
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

This code will compile under the NT DDK. I won't go into detail here on
making a correct 'SOURCES' file, or 'Makefile' - there is plenty of sample
code in the DDK that can help you build an environment. Let's explore
something a little more exciting - patching the interrupt table.

The interrupt table on x86 processors is pointed to by the IDT register.
There are two instructions for dealing with the IDT register, 'lidt' and
'sidt', which load and store the value of the register, respectively. Using
a driver, which runs in ring-0, will allow us to use these
instructions. User-mode (ring-3) does not have the privilege of loading a
value into the IDT register.

The IDT register contains a 6-byte value. The first 16 bits contain the
limit of the IDT (its size in bytes, minus one). The remaining 32 bits
contain the base address of the IDT itself. The IDT under NT contains 256
'entries'. Each entry corresponds directly with an interrupt. Each entry in
the IDT (when running under NT, hence protected-mode) is exactly 8 bytes
long, therefore the IDT under NT is exactly 2KB in length. These entries are
called gate descriptors. Let's talk about these.
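
The arithmetic on the 6-byte IDT register value is easy to check in portable C. This is a sketch for illustration: it assumes the layout the 'sidt' instruction stores in memory (a 16-bit limit first, then the 32-bit base, both little-endian), and the names IdtrValue, idtr_decode and idtr_entry_count are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t limit;   /* size of the table in bytes, minus one */
    uint32_t base;    /* linear address of the first entry     */
} IdtrValue;

/* Decode the 6-byte image that 'sidt' writes to memory. */
IdtrValue idtr_decode(const uint8_t raw[6])
{
    IdtrValue v;

    assert(raw != NULL);
    v.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    v.base  = (uint32_t)raw[2]
            | ((uint32_t)raw[3] << 8)
            | ((uint32_t)raw[4] << 16)
            | ((uint32_t)raw[5] << 24);
    return v;
}

/* Number of 8-byte entries covered by the limit. */
unsigned idtr_entry_count(IdtrValue v)
{
    return ((unsigned)v.limit + 1u) / 8u;
}
```

A limit of 07FFh decodes to 256 entries, which matches the 2KB table described above.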

An IDT entry can be one of three types. The first
and most important is called an 'interrupt-gate'. There is also a
'task-gate' and a 'trap-gate', but I am going to focus only on the
interrupt-gate for now.

When an interrupt occurs, the corresponding interrupt-descriptor is read.
Within the descriptor are two values - a code-segment and an offset. Under
NT, the code-segment is almost always set to 0x08. The processor will look
up the corresponding code-segment in the GDT (Global Descriptor Table) to
find out where it starts in memory. Code-segment 0x08 starts at memory
location 0x00000000 so this is fairly easy. Then, the processor will add
the offset and jump to the corresponding code. So, when debugging around,
you can almost always simply jump to the offset without worrying so much
about the GDT. You can explore all of these structures using SoftIce with
the 'gdt' and 'idt' commands.
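
A selector value such as 0x08 is itself a small bit-field, and pulling it apart shows why it lands on a particular GDT slot. The sketch below assumes the standard x86 selector layout (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0); the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned index;   /* which descriptor in the table      */
    unsigned table;   /* 0 = GDT, 1 = LDT                   */
    unsigned rpl;     /* requested privilege level (0 to 3) */
} SelectorFields;

SelectorFields selector_decode(uint16_t sel)
{
    SelectorFields f;
    f.index = (unsigned)sel >> 3;
    f.table = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}

/* Byte offset of the descriptor inside the GDT (8 bytes per entry). */
unsigned gdt_byte_offset(uint16_t sel)
{
    SelectorFields f = selector_decode(sel);
    assert(f.table == 0u);   /* only meaningful for GDT selectors */
    return f.index * 8u;
}

/* With a segment base of zero the handler address is just the offset. */
uint32_t handler_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

Selector 0x08 decodes to GDT entry 1 at privilege level 0, and because that segment starts at 0x00000000 the handler address equals the offset stored in the IDT entry.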

The structures for the IDT register and the IDT descriptors can be defined
as follows:

/* WORD and BYTE are unsigned 16-bit and 8-bit types
   (the DDK headers call them USHORT and UCHAR) */
#pragma pack(1)
typedef struct
{
    WORD offset_lo;
    WORD selector;
    BYTE reserved_lsb;
    unsigned char reserved_msb:5;
    unsigned char DPL:2;
    unsigned char SEGMENT_PRESENT:1;
    WORD offset_hi;
} IDT_DESCRIPTOR;

typedef struct
{
    WORD size;
    WORD base_lo;
    WORD base_hi;
} IDT_REGISTER;
#pragma pack()
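
Because bit-field layout is compiler-specific, it can help to see the same descriptor handled as eight raw bytes. The following portable C sketch is an illustration only: the gate_* helper names are hypothetical, and the byte positions follow the structure above (offset low in bytes 0-1, selector in bytes 2-3, DPL and present bit in byte 5, offset high in bytes 6-7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassemble the 32-bit handler offset from its two halves. */
uint32_t gate_get_offset(const uint8_t d[8])
{
    assert(d != NULL);
    return (uint32_t)d[0]
         | ((uint32_t)d[1] << 8)
         | ((uint32_t)d[6] << 16)
         | ((uint32_t)d[7] << 24);
}

uint16_t gate_get_selector(const uint8_t d[8])
{
    return (uint16_t)(d[2] | (d[3] << 8));
}

/* DPL sits in bits 6..5 of byte 5, the present flag in bit 7. */
unsigned gate_get_dpl(const uint8_t d[8])
{
    return ((unsigned)d[5] >> 5) & 3u;
}

unsigned gate_is_present(const uint8_t d[8])
{
    return (unsigned)d[5] >> 7;
}

/* Split a new handler offset across the low and high halves,
   leaving the selector and flag bytes untouched. */
void gate_set_offset(uint8_t d[8], uint32_t offset)
{
    assert(d != NULL);
    d[0] = (uint8_t)(offset & 0xFFu);
    d[1] = (uint8_t)((offset >> 8) & 0xFFu);
    d[6] = (uint8_t)((offset >> 16) & 0xFFu);
    d[7] = (uint8_t)((offset >> 24) & 0xFFu);
}
```

Note that the offset is split across the two ends of the entry, so replacing a handler always takes two separate 16-bit writes.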

In our driver, let's go ahead and install our own hook into the IDT. We can
hook any interrupt we choose, but for demonstration purposes I am choosing
interrupt 2Eh. Interrupt 2Eh is the System Service interrupt: the stubs in
NTDLL issue it on behalf of user-mode programs to reach the system services
inside NTOSKRNL. As you can imagine, hooking it can be fairly powerful.

The code to actually hook the IDT is as follows:

IDT_REGISTER gRegister;
IDT_DESCRIPTOR *gHeadDescriptorP = NULL;
IDT_DESCRIPTOR *g_2E_DescriptorP = NULL;
void *gSystemCallPtr = NULL;

__asm lea eax, gRegister
__asm sidt [eax] ; store the IDT register in gRegister

gHeadDescriptorP = (IDT_DESCRIPTOR *)
    MAKELONG( gRegister.base_lo, gRegister.base_hi);

Find the interrupt 2E descriptor:

g_2E_DescriptorP = &gHeadDescriptorP[0x2E];

Get the function address that is stored within the descriptor:

gSystemCallPtr = (void *)
    MAKELONG( g_2E_DescriptorP->offset_lo,
              g_2E_DescriptorP->offset_hi);

Replace the address stored within the descriptor:

__asm
{
    cli                         ; disable interrupts
    lea eax, MyHookFunction
    mov ebx, g_2E_DescriptorP
    mov [ebx], ax               ; offset_lo
    shr eax, 16
    mov [ebx+6], ax             ; offset_hi
    sti                         ; re-enable interrupts
}

Note that we must disable interrupts while messing around with the IDT. We
wouldn't want an interrupt to be serviced while we have half-loaded the
descriptor with a new address! ;-) Also note that we are replacing the
address in the descriptor with that of a function called 'MyHookFunction'.
Let's explore that now.

To hook the interrupt we want to inject our own code that must be run. When
our code is finished, we want to call the original code. To do this we are
going to revisit our old friend __declspec( naked ).

void __declspec(naked) MyHookFunction(void)
{
    __asm
    {
        // call number is in eax
        // do something here...
        jmp gSystemCallPtr
    }
}

Obviously we could pull off a number of tricks here. We could, for
instance, determine the process-ID of the caller. Easily, we could alter
the parameters that are being passed. We could even 'add' functionality to
certain system calls. We could even add our own system calls! (And this
without actually adding them to the system service call table ... rather
stealthy eh?).
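
The save-replace-forward pattern behind the hook is the same one you would use with ordinary function pointers, and it is easier to see there. The sketch below is a user-mode analogy only, not kernel code: g_slot stands in for the 2Eh entry, real_service is a made-up original handler, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*service_fn)(int call_number);

service_fn g_slot = NULL;       /* stands in for the 2Eh IDT entry      */
service_fn g_original = NULL;   /* plays the role of gSystemCallPtr     */
unsigned g_calls_seen = 0;      /* what our injected code keeps track of */

/* Hypothetical original handler, for illustration only. */
int real_service(int call_number)
{
    return call_number * 2;
}

/* Our code runs first, then control passes to the original handler,
   much like the 'jmp gSystemCallPtr' in the naked function. */
int my_hook(int call_number)
{
    g_calls_seen++;
    return g_original(call_number);
}

/* Save the old handler, then swap in the hook. */
void install_hook(void)
{
    assert(g_slot != NULL);
    assert(g_slot != my_hook);   /* never hook twice */
    g_original = g_slot;
    g_slot = my_hook;
}
```

Callers keep going through the slot and get the same results as before; the only difference is that the hook sees every call on the way through.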

Viruses and Internet worms will have a field day with such a technique. We
could hide all sorts of information from user-mode. We could hide
processes, files, and even redirect requests based on process-id. Malicious
yes, but there are many good & legitimate uses for this. For example, when
designing a host-based IDS recently, I was able to profile system-call
usage based on process. This enabled me to build an anomaly detector.
Furthermore, there are several good books on neural networks on the market
(with source code) - you could easily build a neural network
process-profiler with this technology. On a simpler note, you could simply
watch to see what files are being opened or touched with NtCreateFile().
You wouldn't need to have NT auditing turned on - it just works.
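
To give a feel for what per-process profiling might look like, here is a deliberately crude sketch in portable C. It is not the IDS mentioned above: the table sizes, the Profiler type and the "first time this process made this call" signal are all assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROCS 16
#define MAX_CALLS 256

typedef struct {
    int used;
    unsigned pid;
    unsigned counts[MAX_CALLS];   /* calls seen per service number */
} ProcProfile;

typedef struct {
    ProcProfile procs[MAX_PROCS];
} Profiler;

void profiler_init(Profiler *p)
{
    memset(p, 0, sizeof *p);
}

/* Find the profile for pid, claiming a free slot if there is none. */
static ProcProfile *find_or_add(Profiler *p, unsigned pid)
{
    int i;

    for (i = 0; i < MAX_PROCS; i++)
        if (p->procs[i].used && p->procs[i].pid == pid)
            return &p->procs[i];

    for (i = 0; i < MAX_PROCS; i++) {
        if (!p->procs[i].used) {
            p->procs[i].used = 1;
            p->procs[i].pid = pid;
            return &p->procs[i];
        }
    }
    return NULL;
}

/* Record one system call. Returns 1 if this process has never made
   this call before (a crude anomaly signal), 0 if it has, and -1 if
   the process table is full. */
int profiler_record(Profiler *p, unsigned pid, unsigned call)
{
    ProcProfile *pp;

    assert(call < MAX_CALLS);
    pp = find_or_add(p, pid);
    if (pp == NULL)
        return -1;
    return pp->counts[call]++ == 0;
}
```

A real detector would first learn a baseline and then weigh frequencies instead of flagging every first occurrence, but the bookkeeping inside the hook would look much the same.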

Obviously debugging is another use for a technique such as this. Other
interesting ideas include enforcing independent access-controls on files
(think B1 security), user-profiling, and tracking resource usage.

Getting the process ID is a bit more tricky than the hook itself. While
reversing NTOSKRNL, I found that the following location is called when
interrupt 2E occurs (disassembled in SoftIce):

:u 805b3e31 L 40
0008:805b3e31 push 8013cbd0
0008:805b3e36 jmp 8055f2ab
0008:805b3e3b xor [eax],eax
0008:805b3e3d add eax,fs:[eax+66290468]
0008:805b3e44 sub cl,5a
0008:805b3e47 xor bh,al
0008:805b3e49 push dword ptr [ebx]
0008:805b3e4b add [eax+ebx-80],ah
0008:805b3e4f push 80033044
0008:805b3e54 jmp 8053681c
0008:805b3e59 xor al,00
0008:805b3e5b add eax,fs:[eax+66726468]
0008:805b3e62 sub cl,e9
0008:805b3e65 sub eax,edi
0008:805b3e67 invalid
0008:805b3e6a add eax,fs:[eax+13c37868]

As it turns out, this location is within SoftIce itself - so I cannot set a
breakpoint here. SoftIce is up to something fishy - but it's a debugger and
could be up to anything. I followed this a little further and ended up in
the following code (within NTOSKRNL):

---------------------------------------------------------------------------

8013CBD0
8013CBD0 loc_8013CBD0: ; DATA XREF: INIT:801C7A50o
8013CBD0 push 0
8013CBD2 push ebp
8013CBD3 push ebx
8013CBD4 push esi
8013CBD5 push edi
8013CBD6 push fs
8013CBD8 mov ebx, 30h ; in GDT at FFDFF000h
8013CBDD db 66h
8013CBDD mov fs, bx

save old value. Usually -1, but sometimes a pointer:
8013CBE0 push dword ptr ds:0FFDFF000h

put -1 in place:
8013CBE6 mov dword ptr ds:0FFDFF000h, 0FFFFFFFFh

setup pointer to struct:
8013CBF0 mov esi, ds:0FFDFF124h

save the old value of ___ :
8013CBF6 push dword ptr [esi+137h]

make room on the stack:
8013CBFC sub esp, 48h
8013CBFF mov ebx, [esp+6Ch] ;some variable
8013CC03 and ebx, 1

new value into struct:
8013CC06 mov [esi+137h], bl

prep for call:
8013CC0C mov ebp, esp

mov from struct to automatic @ 3Ch:
8013CC0E mov ebx, [esi+128h]
8013CC14 mov [ebp+3Ch], ebx
8013CC17 mov [esi+128h], ebp
8013CC1D cld
8013CC1E test byte ptr [esi+2Ch], 0FFh
8013CC22 jnz loc_8013CB48

It looks like the FS register (a data segment register) is being loaded
with 0x30 - checking the GDT reveals that the data segment 0x30 points to
memory location 0FFDFF000h - just like the code shows - imagine that. A
couple of offsets are being checked from that data-segment so let's dump
them:

at offset 124 there is a pointer to some structure, we store that in esi:
mov esi, ds:0FFDFF124h

Offset into the structure, at 137, is something interesting:
push dword ptr [esi+137h]

also offset into the structure at 128 is something:
mov ebx, [esi+128h]

finally, I note another tidbit at 2Ch
test byte ptr [esi+2Ch], 0FFh

Digging around in this pile isn't too fruitful - but the process id ends up
being at offset 0x01E0 from esi - so it is now possible to learn which
process is responsible for the system call:

__asm
{
    push edx
    push ebx
    push fs
    mov bx, 0x30
    mov fs, bx              ; fs now maps the block at 0FFDFF000h
    mov edx, fs:[0x124]     ; same pointer the kernel loads into esi
    mov edx, [edx + 0x1E0]  ; edx has process id
    // do something with this data
    pop fs
    pop ebx
    pop edx
}


Okay, enough said. Next week we will go back to user-mode and take a look
at leveraging the SE_DEBUG privilege to inject code into other processes.

To Be Continued...

