Reverse engineering software is not an easy task, especially if you are doing it for the first time.
Hi, my name is Sebastian Manns. I study "general and digital forensics". I have been a trainee at Cyberus Technology for one month, and my job is Software/Malware Analysis with Tycho.
In my first blog entry I will show you how easy it is to evaluate and manipulate system calls with Tycho using Pafish as an example.
Motivation
To gain some first experience in software analysis with Tycho, I was looking for a simple project that would still be useful.
I chose Pafish because it is a small tool that, similar to malware, performs various tests to detect whether it is running in a virtual environment. When malware detects virtualization, it may hide its real behavior and act inconspicuously instead. That's why it's important that analysis tools like Tycho are as invisible as possible.
By comparing the two screenshots below, you can see the difference between Pafish's output on a normal Oracle VirtualBox VM and on my Tycho setup. Currently Tycho is not completely invisible to Pafish:
What about the tests that failed? Four tests still find traces of the Tycho VM:
- "Checking the difference between CPU timestamp counters (rdtsc) forcing VM exit"
- "Using mouse activity"
- "Checking if disk size <= 60GB via DeviceIoControl()"
- "Checking if disk size <= 60GB via GetDiskFreeSpaceExA()"
Trace number 1 comes from Pafish taking the difference between two CPU timestamp counter values. Between reading those two values, it forces a VMEXIT event in the VM it is potentially running in. Faking timestamp counters is a possibility, but that is outside the scope of this article.
Trace number 2 comes from the fact that the mouse remains untouched on the analysis system.
I found traces number 3 and 4 most interesting, because they use two specific system calls to test whether the hard disk is unusually small. My analysis hardware really has such a small disk, so I would need to patch this anyway unless I got a larger one.
The idea is now to use Tycho to fake the disk size and make Pafish believe that the disk is much larger. This task is very manageable, and it is an easy way to get familiar with different functions of Tycho. In the end, the Tycho setup passes two more tests, and we get a nice example script for Tycho as a by-product.
Preparation
What exactly are system calls and why are they important for software analysis?
System calls are used to call a kernel service from user land. The goal is to switch from user mode to kernel mode, with the associated privileges. Which system calls are available depends on your operating system.
Tycho gives us full control over the communication from a program to the operating system. We can stop the program at any system call, evaluate and manipulate it and let the program run again. And all this without even looking at a traditional debugger and dealing with assembler code.
What is required:
For this task we only need a typical Tycho setup and Pafish.
First test: check disk size
This test checks if the hard disk is smaller than 60 GiB, because a normal computer rarely has such a small hard disk. To do this, Pafish uses the Windows function DeviceIoControl.
To be able to work with Tycho, we have to go one level deeper and use the system call that is called by the DeviceIoControl function, which is NtDeviceIoControlFile.
Now let's have a look at the system call. All we need to do is to pause Pafish when this system call occurs. How to pause a program with breakpoints on a system call is also described in the previous article Windows system call parameter analysis.
So now we begin to find the system call.
Python Example
First we import the most important libraries, which are needed for the breakpoints and the evaluation of the system calls. Then we connect to the Tycho server, and with service.open_process("pafish.exe") we open a handle to the process. After that, we wait until pafish.exe is started. The pafish.pause() call will make sure that the process is stopped as soon as it is started.
import struct
import time
from binascii import unhexlify
from binascii import hexlify

import pyTycho
from pyTycho.syscall_interpreter import SystemCallInterpretationFactory

# Connect to the Tycho server and open a handle to the process by name.
service = pyTycho.Tycho()
pafish = service.open_process("pafish.exe")

# Make sure the process is stopped as soon as it is started.
pafish.pause()

# Wait until pafish.exe has been started.
while not pafish.is_running():
    time.sleep(1)
    print("pafish.exe is currently not running")
This piece of code creates a system call breakpoint for NtDeviceIoControlFile via our Pafish process handle and activates it.
breakpoint = pafish.get_syscall_breakpoint()
breakpoint.add_syscall_whitelist(pyTycho.Tycho.syscalls.NtDeviceIoControlFile)
breakpoint.enable()
In this snippet of code we create a factory object which retrieves the input and output parameters of a system call. And finally, we're waiting for Pafish to use it.
factory = SystemCallInterpretationFactory(pafish)
pafish.resume()
event = pafish.wait_for_breakpoint()
if event.event_category == event.SYSCALL_BP:
    handle_syscall_event(factory, pafish, event)
If Pafish then uses the system call, the script returns from the wait_for_breakpoint function and gets an event object with information about the system call.
The handle_syscall_event function is invoked when a system call breakpoint is hit, records the name and input/output parameters for a given system call, and then prints all this information for further inspection.
{'after': {'IoStatusBlock': {'type': 'IO_STATUS_BLOCK*', 'value': 0x8EC60},
           'OutputBuffer': {'type': 'VOID*', 'value': 0x28F930}},
 'before': {'ApcContext': {'type': 'VOID*', 'value': 0},
            'ApcRoutine': {'type': 'IO_APC_ROUTINE*', 'value': 0},
            'Event': {'type': 'HANDLE', 'value': 0},
            'FileHandle': {'type': 'HANDLE', 'value': 0xD0},
            'InputBuffer': {'type': 'VOID*', 'value': 0},
            'InputBufferLength': {'type': 'ULONG', 'value': 0},
            'IoControlCode': {'type': 'ULONG', 'value': 0x7405C},
            'OutputBufferLength': {'type': 'ULONG', 'value': 0x8}},
 'name': 'NtDeviceIoControlFile',
 'return_value': 0L}
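As a minimal sketch of how such a record can be used further, the snippet below pulls the output buffer address and its length out of a dictionary shaped like the one printed above. The dictionary is a hand-written, shortened copy for illustration, and output_buffer_region is a hypothetical helper name, not part of pyTycho.

```python
# Hand-written, shortened copy of the record printed above (assumption:
# the interpreter returns a dict with exactly this nesting).
record = {
    "name": "NtDeviceIoControlFile",
    "before": {
        "IoControlCode": {"type": "ULONG", "value": 0x7405C},
        "OutputBufferLength": {"type": "ULONG", "value": 0x8},
    },
    "after": {
        "OutputBuffer": {"type": "VOID*", "value": 0x28F930},
    },
    "return_value": 0,
}


def output_buffer_region(rec):
    """Return (address, length) of the buffer the kernel filled in.

    Hypothetical helper: the length is an input parameter ('before'),
    while the buffer content is only valid after the call ('after').
    """
    address = rec["after"]["OutputBuffer"]["value"]
    length = rec["before"]["OutputBufferLength"]["value"]
    return address, length


address, length = output_buffer_region(record)
print(hex(address), length)
```

The pair (address, length) is exactly what is needed later to read or overwrite the returned data in the guest.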
Syscall interpretation
With the new information we acquired about the system call, the next task is to find out where the size of the hard disk is, and how we can change it.
Looking at NtDeviceIoControlFile in the MSDN docs again, we find the following information:
- OutputBuffer: A pointer to a buffer that is to receive the device-dependent return information from the target device.
- OutputBufferLength: Length of the OutputBuffer in bytes.
- IoControlCode: Code that indicates which device I/O control function is to be executed.
The structure that OutputBuffer points to is 8 bytes in size and consists of a single member that indicates the disk size.
- How do we know these 8 bytes represent the size of the hard drive?
We have the IoControlCode 0x7405C. On ioctls.net we can find it and follow the link to IOCTL_DISK_GET_LENGTH_INFO. At the bottom of the page, there is a link to GET_LENGTH_INFORMATION, on which is written:
Contains disk, volume, or partition length information used by the IOCTL_DISK_GET_LENGTH_INFO control code
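Instead of looking the code up online, it can also be taken apart by hand. The sketch below splits 0x7405C into the four bit fields that the Windows CTL_CODE macro packs into a control code (device type, required access, function number, transfer method). The function name decode_ioctl is only chosen for this illustration.

```python
def decode_ioctl(code):
    """Split a Windows I/O control code into its CTL_CODE bit fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,  # bits 16..31
        "access": (code >> 14) & 0x3,          # bits 14..15
        "function": (code >> 2) & 0xFFF,       # bits 2..13
        "method": code & 0x3,                  # bits 0..1
    }


fields = decode_ioctl(0x7405C)
for name in ("device_type", "access", "function", "method"):
    print(name, hex(fields[name]))
```

For 0x7405C this yields device type 0x7 (FILE_DEVICE_DISK), access 0x1 (FILE_READ_ACCESS), function 0x17 and method 0x0 (METHOD_BUFFERED), which matches the definition of IOCTL_DISK_GET_LENGTH_INFO.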
Return value patching
Now that we have the right system call and know what it does, it is super easy to manipulate it with Tycho. To check the output from above, you can do the following steps.
hex(struct.unpack("<Q",pafish.read_linear(0x28F930, 8))[0])
'0xEE8156000'
If you divide 0xEE8156000 bytes by 0x400³, you get about 59.6 GiB, which is the size of the hard disk.
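The division can be checked with a few lines of plain Python, no Tycho required. This is only a sketch of the arithmetic; the names GIB and bytes_to_gib are chosen for this example.

```python
GIB = 0x400 ** 3  # bytes per GiB (1024 * 1024 * 1024)


def bytes_to_gib(size_in_bytes):
    """Convert a size in bytes to GiB as a floating point number."""
    return size_in_bytes / float(GIB)


disk_bytes = 0xEE8156000  # value read from the output buffer above
disk_gib = bytes_to_gib(disk_bytes)

print("%.2f GiB" % disk_gib)
print("below 60 GiB:", disk_bytes < 60 * GIB)
```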
And with just one more command it is possible with Tycho to change the size to 500 GiB.
pafish.write_linear(0x28F930, struct.pack("<Q", 0x7D00000000))
struct is a Python module which converts binary data into a Python object (or a Python object into binary data), with the layout of the binary data specified by the format string.
With struct.pack() we convert the new size value into binary data, and with pafish.write_linear() we write the value into memory.
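To see what the patch does at the byte level, here is a small offline sketch. A bytearray stands in for the 8 bytes of guest memory at 0x28F930, so nothing here talks to Tycho; the variable names are made up for this example.

```python
import struct

GIB = 0x400 ** 3

# Stand-in for the guest's 8-byte output buffer, holding the original size.
buffer = bytearray(struct.pack("<Q", 0xEE8156000))

# "<Q" means: little-endian, one unsigned 64-bit integer.
new_size = 500 * GIB
buffer[0:8] = struct.pack("<Q", new_size)

# Reading the value back, the same way it was read before the patch.
(patched,) = struct.unpack("<Q", bytes(buffer))

print(hex(new_size))
print(patched // GIB, "GiB")
```

The printed hex value is the constant passed to pafish.write_linear() above, which shows where 0x7D00000000 comes from.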
Second test: check free partition space
The second test is similar to the first. This time it checks how much space is available on the C: partition. If less than 60 GiB are free, the test is not passed.
To check this, the function GetDiskFreeSpaceExA is used. This function calls the system call NtQueryVolumeInformationFile, which we can analyze with Tycho.
The procedure is similar to the first test: all we have to do is look for another system call. Only the interpretation of the returned data is a bit different.
We add NtQueryVolumeInformationFile to the whitelist, and the system call interpretation retrieves the following information after the breakpoint was hit:
{'after': {'FileSystemInformation': {'type': 'VOID*', 'value': 0x28F8A8},
           'IoStatusBlock': {'type': 'IO_STATUS_BLOCK*', 'value': 0x8E328}},
 'before': {'FileHandle': {'type': 'HANDLE', 'value': 0xD0},
            'FileSystemInformationClass': {'type': 'FS_INFORMATION_CLASS',
                                           'value': (0x3,
                                                     'FileFsSizeInformation')},
            'Length': {'type': 'ULONG', 'value': 0x18}},
 'name': 'NtQueryVolumeInformationFile',
 'return_value': 0L}
Through Tycho, we know that the system call parameter FileSystemInformationClass has value 3, which corresponds to FileFsSizeInformation. That means FileSystemInformation points to a FileFsSizeInformation object. This object contains the information that is interesting for us.
We can find this 0x18-byte object at address 0x28F8A8.
With Tycho, we can easily read the raw bytes from this object.
hexlify(pafish.read_linear(0x28F8A8, 0x18))
'ff2f75000000000055421900000000000800000000020000'
But the 0x18 bytes of output aren't really easy to understand yet, because FileFsSizeInformation is a structure like this:
 8 Byte           8 Byte           4 Byte   4 Byte
+----------------+----------------+--------+--------+
|total           |free            |sector/ |byte/   |
|cluster:        |cluster:        |cluster:|sector: |
|0x752FFF        |0x194255        |0x8     |0x200   |
+----------------+----------------+--------+--------+
0x28F8A8                                     0x28F8C0
Let's read the individual parts of it step by step with struct.unpack().
hex(struct.unpack("<Q",pafish.read_linear(0x28F8A8, 8))[0])
'0x752FFF'
hex(struct.unpack("<Q",pafish.read_linear(0x28F8B0, 8))[0])
'0x194255'
hex(struct.unpack("<i",pafish.read_linear(0x28F8B8, 4))[0])
'0x8'
hex(struct.unpack("<i",pafish.read_linear(0x28F8BC, 4))[0])
'0x200'
Using the following formula:
free space = (free cluster * sector per cluster * bytes per sector) / 0x400³
You get the result 6.3148 GiB. That exactly describes the free space on the C: partition.
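The four separate reads can also be reproduced offline in a single step. The sketch below decodes the raw hex dump from above with one format string and applies the formula; it runs without Tycho, and the variable names are chosen for this illustration only.

```python
import struct
from binascii import unhexlify

GIB = 0x400 ** 3

# Raw 0x18 bytes as returned by hexlify(pafish.read_linear(...)) above.
raw = unhexlify("ff2f75000000000055421900000000000800000000020000")

# "<QQII": little-endian, two unsigned 64-bit and two unsigned 32-bit values.
(total_clusters,
 free_clusters,
 sectors_per_cluster,
 bytes_per_sector) = struct.unpack("<QQII", raw)

bytes_per_cluster = sectors_per_cluster * bytes_per_sector
free_gib = free_clusters * bytes_per_cluster / float(GIB)
total_gib = total_clusters * bytes_per_cluster / float(GIB)

print(hex(total_clusters), hex(free_clusters))
print("free:  %.4f GiB" % free_gib)
print("total: %.4f GiB" % total_gib)
```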
Here it is also super easy to change the return value with Tycho. As an example we want to manipulate the system call so that the partition is 300 GiB and still has 150 GiB free space. For this we have to change the total clusters and the free clusters. We need to know how many clusters we need for these values. For this we use the following calculation:
new_cluster = (new size*0x400³) / (byte per sector * sector per cluster)
As a result we get 0x4B00000 total clusters and 0x2580000 free clusters.
pafish.write_linear(0x28F8A8, struct.pack("<Q", 0x4B00000))
pafish.write_linear(0x28F8B0, struct.pack("<Q", 0x2580000))
With this change, the partition appears to be 300 GiB in size with 150 GiB of free space, and Pafish reports success.
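The two cluster counts can be re-derived with a short calculation. This is a sketch of the formula only, using the sector and cluster sizes read from the structure; clusters_for is a name made up for this example.

```python
import struct

GIB = 0x400 ** 3


def clusters_for(size_gib, sectors_per_cluster=0x8, bytes_per_sector=0x200):
    """Number of clusters needed to represent size_gib GiB."""
    bytes_per_cluster = sectors_per_cluster * bytes_per_sector
    return (size_gib * GIB) // bytes_per_cluster


total_clusters = clusters_for(300)
free_clusters = clusters_for(150)

# The first 16 bytes of the structure: total clusters, then free clusters.
patch = struct.pack("<QQ", total_clusters, free_clusters)

print(hex(total_clusters), hex(free_clusters))
print(len(patch), "bytes")
```

Since both fields are adjacent in memory, the 16 packed bytes correspond to the two separate writes at 0x28F8A8 and 0x28F8B0.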
Summary
Tycho is a cool tool for software analysis. In this blog entry I showed you how easy it is to find, stop, analyze and manipulate system calls with Tycho.
Because you can easily get complete information from a system call before and after execution, you can quickly find out a lot about the software. And all this without using a classic debugger.
With Tycho's Python API it is also super easy to script analysis tasks and to integrate them into other tools. Furthermore, Tycho exposes far fewer virtualization artifacts than other off-the-shelf virtualization software, which is a big advantage in malware analysis.