Tracking file system access of individual processes

2018-02-05

by Martin Messer

In the last article, we showed how to use the Tycho Python API to interrupt a process before it executes specific system calls. The process ran on an unpatched Windows system on top of the Cyberus virtualization platform. This time, we demonstrate how to implement a short but useful script that logs which files are accessed by a process of our choice.

Prerequisites

We need the same setup as in the last article, "Windows system call parameter analysis":

  • The typical Cyberus setup. If you are not familiar with it, please have a look at our blog article "Fun with Python and Tycho".
  • An executable to analyse. Let's use pafish again.

List of accessed files and devices

In the previous blog post, we used the NtCreateFile system call to show how easy it is to extract some Windows system call arguments out of a running process.

This time, we take a closer look at individual arguments in order to extract the actual paths of the files being accessed for reading or writing. Doing this for every NtCreateFile system call lets us collect a list of all files and devices accessed during the execution of our executable pafish.exe.

Please be aware that files can also be mapped into memory using other system calls. For the sake of simplicity, we concentrate only on NtCreateFile in this blog post.

Before we can decide what exactly our script needs to do, we need to research a few Windows data structures:

NtCreateFile data structure research

In order to understand how to get file/device paths out of calls to NtCreateFile, we need to look at its signature. It is documented on MSDN:

NTSTATUS NtCreateFile(
  _Out_    PHANDLE            FileHandle,
  _In_     ACCESS_MASK        DesiredAccess,
  _In_     POBJECT_ATTRIBUTES ObjectAttributes,
  _Out_    PIO_STATUS_BLOCK   IoStatusBlock,
  _In_opt_ PLARGE_INTEGER     AllocationSize,
  _In_     ULONG              FileAttributes,
  _In_     ULONG              ShareAccess,
  _In_     ULONG              CreateDisposition,
  _In_     ULONG              CreateOptions,
  _In_     PVOID              EaBuffer,
  _In_     ULONG              EaLength
);

While studying the types of the input parameters, we find that file and device names are stored in the OBJECT_ATTRIBUTES structure (POBJECT_ATTRIBUTES is a type alias for a pointer to an OBJECT_ATTRIBUTES value). It is also documented on MSDN:

typedef struct _OBJECT_ATTRIBUTES {
  ULONG           Length;
  HANDLE          RootDirectory;
  PUNICODE_STRING ObjectName;
  ULONG           Attributes;
  PVOID           SecurityDescriptor;
  PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;

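We quickly find out that the ObjectName member, a UNICODE_STRING pointer, leads us to the file/device path, which is stored as a wide string in guest memory. If RootDirectory is not NULL, that path is interpreted relative to the directory the handle refers to. This presumably explains why some entries in our results, such as pafish.log, appear as bare file names.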
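To make the layout concrete, here is a minimal sketch that unpacks an OBJECT_ATTRIBUTES structure from raw guest bytes with Python's struct module. It assumes a 64-bit guest: 8-byte handles and pointers with natural alignment, so each ULONG member is followed by 4 bytes of padding. A 32-bit process would use 4-byte pointers and a 24-byte structure instead. The helper parse_object_attributes is hypothetical and not part of pyTycho.

```python
import struct
from collections import namedtuple

# Field order follows the OBJECT_ATTRIBUTES definition above.
ObjectAttributes = namedtuple(
    "ObjectAttributes",
    "Length RootDirectory ObjectName Attributes "
    "SecurityDescriptor SecurityQualityOfService",
)

# Assumed x64 layout: little-endian, 8-byte handles/pointers,
# 4 bytes of padding after each ULONG to keep the pointers aligned.
OBJECT_ATTRIBUTES_X64 = struct.Struct("<I4xQQI4xQQ")  # 48 bytes


def parse_object_attributes(raw):
    """Unpack a 64-bit OBJECT_ATTRIBUTES from raw bytes (hypothetical helper)."""
    return ObjectAttributes._make(OBJECT_ATTRIBUTES_X64.unpack_from(raw))


# Build a sample structure as it might appear in guest memory:
# RootDirectory == 0 means ObjectName is a fully qualified path,
# 0x40 is OBJ_CASE_INSENSITIVE.
sample = OBJECT_ATTRIBUTES_X64.pack(48, 0, 0x1F2A40, 0x40, 0, 0)
attrs = parse_object_attributes(sample)
print(hex(attrs.ObjectName))  # → 0x1f2a40 (pointer to the UNICODE_STRING)
```

The ObjectName pointer obtained this way is what a reader would follow next to reach the UNICODE_STRING.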

Unlike structures, strings have a dynamic size, so they must be handled differently: our custom handler has to know and respect the string size while extracting the string from memory. Luckily, the MSDN documentation tells us that a UNICODE_STRING structure contains both a pointer to the wide string and its actual length. The length is measured in bytes and does not include a terminating null character:

typedef struct _LSA_UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} LSA_UNICODE_STRING, *PLSA_UNICODE_STRING, UNICODE_STRING, *PUNICODE_STRING;
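Because Length counts bytes rather than characters, and the buffer need not be null-terminated, a reader should take exactly Length bytes from Buffer. The following sketch illustrates this against a simulated guest memory. The GuestMemory class and the read_unicode_string helper are hypothetical stand-ins. The sketch also assumes the 64-bit UNICODE_STRING layout: two USHORTs, 4 bytes of padding, and an 8-byte Buffer pointer.

```python
import struct

# Assumed x64 layout: USHORT Length, USHORT MaximumLength,
# 4 bytes of padding, 8-byte Buffer pointer (16 bytes total).
UNICODE_STRING_X64 = struct.Struct("<HH4xQ")


class GuestMemory:
    """Hypothetical flat memory model: maps base addresses to byte blobs."""

    def __init__(self):
        self.regions = {}

    def write(self, address, data):
        self.regions[address] = bytes(data)

    def read(self, address, size):
        for base, data in self.regions.items():
            if base <= address and address + size <= base + len(data):
                offset = address - base
                return data[offset:offset + size]
        raise ValueError("unmapped read at 0x%x" % address)


def read_unicode_string(memory, address):
    """Read the UNICODE_STRING at `address` and return its text."""
    length, _max_length, buffer = UNICODE_STRING_X64.unpack(
        memory.read(address, UNICODE_STRING_X64.size)
    )
    # Length is in bytes and excludes any terminating null character.
    return memory.read(buffer, length).decode("utf-16-le")


mem = GuestMemory()
path = "\\??\\C:\\pafish.log"
encoded = path.encode("utf-16-le")
# MaximumLength may exceed Length; the extra bytes are not part of the string.
mem.write(0x2000, encoded + b"\x00\x00")
mem.write(0x1000, UNICODE_STRING_X64.pack(len(encoded), len(encoded) + 2, 0x2000))
print(read_unicode_string(mem, 0x1000))  # prints \??\C:\pafish.log
```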

Implementation

Now, let's implement what we have learned. The first imports are similar to those in the previous post, so we skip their explanation:

import time
import pprint
from pyTycho.syscall_interpreter import interpret_execute_syscall
from pyTycho import tycho

Instead of running a few lines of Python code in the interactive Python shell as in the previous articles, this time we write a standalone script. The whole script is available here on GitHub.

Additionally, we import two more names from the pyTycho.syscall_interpreter module, which carves additional types and values out of system calls. The carved information can be post-processed by registering our own custom handler functions for the system call interpretation phase:

from pyTycho.syscall_interpreter import pointer_tracking_enabled_by_type
from pyTycho.syscall_interpreter import type_specific_handlers

The first few lines are largely the same as in the last article. We obtain the service object that allows us to talk to the Tycho service, then open a process handle to pafish.exe before the process has actually started. That process handle allows us to interrupt pafish (or any other application whose executable name we put into the variable file_name) at the moment it tries to execute its first instruction. Then we print a wait message until we see our application being scheduled:

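file_name = "pafish.exe"  # executable whose file accesses we track
service = tycho()
process_handle = service.open_process(file_name)
# Interrupt the process before it executes its first instruction
process_handle.set_break_on_start(True)
# Poll until the process has been scheduled
while not process_handle.is_running():
    time.sleep(1)
    print("{} is currently not running".format(file_name))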
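The loop above waits indefinitely. If you want the script to give up eventually, one option is a small polling helper with a timeout. This is a sketch under assumptions. The function wait_until_running and its parameters are hypothetical, and the helper relies only on the is_running() method used above. The FakeHandle class merely stands in for a Tycho process handle so the snippet runs on its own.

```python
import time


def wait_until_running(handle, name, poll_interval=1.0, timeout=60.0):
    """Poll handle.is_running() until it is True or `timeout` seconds pass.

    Returns True if the process started, False on timeout.
    """
    deadline = time.time() + timeout
    while not handle.is_running():
        if time.time() >= deadline:
            return False
        print("{} is currently not running".format(name))
        time.sleep(poll_interval)
    return True


class FakeHandle:
    """Stand-in for a process handle that reports running after a few polls."""

    def __init__(self, polls_until_running):
        self.remaining = polls_until_running

    def is_running(self):
        if self.remaining == 0:
            return True
        self.remaining -= 1
        return False


print(wait_until_running(FakeHandle(2), "pafish.exe", poll_interval=0.01))
```

With a real handle, a False return would be the point to abort the script instead of hanging.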

System call interpretation handler

The Cyberus system call interpretation engine can automatically extract, from the guest system, the values that a system call's pointer arguments refer to. To make use of this feature, we enable system call breakpoints for the NtCreateFile system call and add the pointer type POBJECT_ATTRIBUTES to the pointer tracking list:

breakpoint = process_handle.get_syscall_breakpoint()
# Only stop at the system calls we are interested in
breakpoint.add_syscall_whitelist(tycho.syscalls.NtCreateFile)
breakpoint.add_syscall_whitelist(tycho.syscalls.NtTerminateProcess)
breakpoint.enable()
# Dereference POBJECT_ATTRIBUTES arguments during interpretation
pointer_tracking_enabled_by_type.append("POBJECT_ATTRIBUTES")

We also add NtTerminateProcess to the breakpoint whitelist; we explain why later.
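Conceptually, type-based pointer tracking means the following: for each argument whose declared type appears in the tracking list, dereference it and carve the target structure. We do not know pyTycho's internals, so the following is only a hypothetical model of that selection step, using the NtCreateFile signature from above. The names NT_CREATE_FILE_SIGNATURE and tracked_arguments are illustrative.

```python
# NtCreateFile parameters as (type, name) pairs, in declaration order.
NT_CREATE_FILE_SIGNATURE = [
    ("PHANDLE", "FileHandle"),
    ("ACCESS_MASK", "DesiredAccess"),
    ("POBJECT_ATTRIBUTES", "ObjectAttributes"),
    ("PIO_STATUS_BLOCK", "IoStatusBlock"),
    ("PLARGE_INTEGER", "AllocationSize"),
    ("ULONG", "FileAttributes"),
    ("ULONG", "ShareAccess"),
    ("ULONG", "CreateDisposition"),
    ("ULONG", "CreateOptions"),
    ("PVOID", "EaBuffer"),
    ("ULONG", "EaLength"),
]


def tracked_arguments(signature, tracked_types):
    """Return (index, name) for every argument whose type is tracked."""
    return [
        (index, name)
        for index, (type_name, name) in enumerate(signature)
        if type_name in tracked_types
    ]


tracking_list = []
tracking_list.append("POBJECT_ATTRIBUTES")
# Only ObjectAttributes is selected; e.g. the output pointer FileHandle
# is left alone because PHANDLE is not in the list.
print(tracked_arguments(NT_CREATE_FILE_SIGNATURE, tracking_list))
# → [(2, 'ObjectAttributes')]
```

Keying on the type rather than on an argument position means that, presumably, one list entry covers every system call taking a POBJECT_ATTRIBUTES argument.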

In the second step, we want to carve out only the file name string. This means that we need to access the UNICODE_STRING member (ObjectName) of the OBJECT_ATTRIBUTES structure that the ObjectAttributes argument points to.

At this point, we need to write our own custom handler function, which we then register as a callback in the system call interpretation library. The library expects the following signature:

def our_callback(process_handle, argument_representation):
    # process_handle: the Tycho process handle
    # argument_representation: (type_name, members) tuple
    pass

The first parameter is the Tycho process handle. The second parameter is a tuple that contains the type name of the carved object (in our case UNICODE_STRING) and a Python dictionary with the carved member values of that type instance, each stored as a (type, value) pair:

(
 UNICODE_STRING, {
    "Length"        : (ULONG, <length of string>),
    "MaximumLength" : (ULONG, <maximum length>),
    "Buffer"        : (PVOID, <pointer to buffer>),
 }
)

Our callback function does not need to return anything.

Our implementation simply extracts the file/device path and appends it to a global list:

list_of_files = []

def extract_string(process, object_representation):
    typ, value = object_representation
    # Only handle UNICODE_STRING instances that carry both members we need
    if typ == "UNICODE_STRING" and "Buffer" in value and "Length" in value:
        _, length = value["Length"]    # string length in bytes
        _, address = value["Buffer"]   # guest address of the wide string
        filename = process.read_linear(address, length)
        # Windows wide strings are UTF-16 little-endian without a BOM
        list_of_files.append(filename.decode("utf-16-le"))

First, we check whether we got the right type. Then we obtain both the pointer to the wide string and its length. Finally, we use that information to read the path out of guest memory with the process.read_linear function. Since this is a UTF-16 wide string, we need to decode it before appending it to our global file list.
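Because the handler only calls read_linear(address, length) on the process object, it can be exercised without a running guest. The following sketch assumes that read_linear is the only method the handler needs. FakeProcess is a hypothetical stand-in, and the handler is repeated here with explicit UTF-16LE decoding so the snippet is self-contained.

```python
list_of_files = []


def extract_string(process, object_representation):
    typ, value = object_representation
    if typ == "UNICODE_STRING" and "Buffer" in value and "Length" in value:
        _, length = value["Length"]
        _, address = value["Buffer"]
        filename = process.read_linear(address, length)
        list_of_files.append(filename.decode("utf-16-le"))


class FakeProcess:
    """Hypothetical stand-in exposing only read_linear(address, length)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> bytes stored at that address

    def read_linear(self, address, length):
        return self.memory[address][:length]


name = "pafish.log".encode("utf-16-le")
process = FakeProcess({0x4000: name + b"\x00\x00"})
representation = (
    "UNICODE_STRING",
    {
        "Length": ("ULONG", len(name)),
        "MaximumLength": ("ULONG", len(name) + 2),
        "Buffer": ("PVOID", 0x4000),
    },
)
extract_string(process, representation)
# Objects of other types are ignored by the handler.
extract_string(process, ("OBJECT_ATTRIBUTES", {}))
print(list_of_files)
```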

With the handler implemented, we can now register it in the system call interpretation library:

type_specific_handlers.append(("UNICODE_STRING", extract_string))
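How the library invokes registered handlers is internal to pyTycho. For illustration only, assume a plausible model in which every carved object is passed to each handler registered for its type name. The run_type_handlers function and the toy handler below are hypothetical. Since we cannot be sure the real library filters by type before calling, the type check inside extract_string is a cheap safeguard either way.

```python
def run_type_handlers(handlers, process, carved_objects):
    """Pass each carved (type_name, members) object to matching handlers.

    `handlers` is a list of (type_name, callback) pairs, shaped like the
    type_specific_handlers list above. Return values are ignored.
    """
    for obj in carved_objects:
        type_name = obj[0]
        for handled_type, callback in handlers:
            if handled_type == type_name:
                callback(process, obj)


seen = []


def record_type(process, obj):
    # Toy handler: remember which type it was called with.
    seen.append(obj[0])


handlers = [("UNICODE_STRING", record_type)]
carved = [
    ("OBJECT_ATTRIBUTES", {"Length": ("ULONG", 48)}),
    ("UNICODE_STRING", {"Length": ("ULONG", 20)}),
]
run_type_handlers(handlers, None, carved)
print(seen)  # → ['UNICODE_STRING']
```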

At this point, all necessary preparations are done. We can now write a loop that extracts path information from every NtCreateFile system call:

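while True:
    # Let the process run until it hits the next whitelisted system call
    process_handle.resume()
    thread_handle = process_handle.wait_for_breakpoint()
    # Interpret the system call; this invokes our registered handlers
    syscall = interpret_execute_syscall(process_handle, thread_handle)
    # Stop once the process is about to terminate
    if syscall["num"] == tycho.syscalls.NtTerminateProcess:
        break

pprint.pprint(list_of_files)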

A look at the abort condition reveals why we also whitelisted the NtTerminateProcess system call: we use it to detect the termination of pafish.

After pafish has terminated, we print the list of file and device paths. Running the whole script shows that some files and devices were accessed more than once:

[u'pafish.log',
 u'pafish.log',
 u'pafish.log',
 u'pafish.log',
 u'hi_CPU_VM_rdtsc_force_vm_exit',
 u'pafish.log',
 u'hi_sandbox_mouse_act',
 u'\\??\\PhysicalDrive0',
 u'\\??\\Nsi',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\??\\VBoxMiniRdrDN',
 u'\\??\\pipe\\VBoxMiniRdDN',
 u'\\??\\VBoxTrayIPC',
 u'\\??\\pipe\\VBoxTrayIPC',
 u'\\??\\C:\\Windows\\SysWOW64\\de-DE\\MPR.DLL.mui',
 u'\\??\\C:\\Windows\\Globalization\\Sorting\\sortdefault.nls',
 u'\\??\\C:\\Windows\\SysWOW64\\rsaenh.dll',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}',
 u'\\DEVICE\\NETBT_TCPIP_{846EE342-7039-11DE-9D20-806E6F6E6963}',
 u'\\DEVICE\\NETBT_TCPIP_{D535A6F8-90AF-4DC7-B511-4DD493AD6F6F}',
 u'\\??\\HGFS',
 u'\\??\\vmci',
 u'pafish.log']
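Since the list contains many duplicates, it can be handy to count accesses per path. Here is a short sketch using collections.Counter. The sample list is an abbreviated excerpt of the output above; in the real script you would pass list_of_files instead.

```python
from collections import Counter

# Abbreviated excerpt of the output above.
accessed = [
    "pafish.log",
    "pafish.log",
    "\\??\\PhysicalDrive0",
    "\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}",
    "\\DEVICE\\NETBT_TCPIP_{D0A4D4B8-574B-4FC2-939C-13AE21F36507}",
    "pafish.log",
]

counts = Counter(accessed)
# most_common() sorts by descending count.
for path, count in counts.most_common():
    print("{:3d}  {}".format(count, path))

# Unique paths in order of first access.
unique_in_order = []
seen = set()
for path in accessed:
    if path not in seen:
        seen.add(path)
        unique_in_order.append(path)
```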

Conclusion

The Tycho Python API can extract system call information from a running process. While doing so, it can also extract additional data through user-provided callback functions.

Based on such insights, users can implement their own high-level breakpoints, for example ones that stop a process when it...

  • accesses certain paths
  • reads/writes specific files
  • attempts to communicate with (specific) hosts from the network (for example by interpreting NtDeviceIoControlFile system calls)
  • ...
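As an illustration of the first idea, a path-based condition could be a simple predicate evaluated on every carved path. The script could then pause when the predicate matches. The function should_break and the pattern list are hypothetical. The sketch strips the "\??\" prefix and compares case-insensitively, since Windows paths are usually case-insensitive.

```python
import fnmatch

# Hypothetical watch list; patterns use shell-style wildcards.
WATCHED_PATTERNS = [
    "*vbox*",           # VirtualBox devices and pipes
    "*physicaldrive*",  # raw disk access
]


def normalize(path):
    """Drop the NT '\\??\\' prefix and lower-case the path."""
    prefix = "\\??\\"
    if path.startswith(prefix):
        path = path[len(prefix):]
    return path.lower()


def should_break(path, patterns=WATCHED_PATTERNS):
    """Return True if `path` matches any watched pattern."""
    candidate = normalize(path)
    return any(fnmatch.fnmatchcase(candidate, p.lower()) for p in patterns)


for path in ["\\??\\VBoxMiniRdrDN", "pafish.log", "\\??\\PhysicalDrive0"]:
    print(path, should_break(path))
```

In the loop from the Implementation section, such a check could run after interpret_execute_syscall, and the script would simply stop resuming the process when it returns True.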

Of course, it is possible not only to log system call parameters, but also to fake their return values or even deflect the whole call.

