Hasty Scripts: Summarizing Installed Applications on Encrypted VMs

Virtual machines are everywhere, no longer confined to the corporate environment; it is not unheard of for consumers to run virtualization software on their personal devices these days. Examining these VMs does not differ much from a normal host investigation: tools like FTK Imager support common virtual hard disk formats and can be used to preserve and review them. But what happens when a VM is encrypted and password-protected? Compound the issue with an uncooperative custodian and it may be time for more creative solutions. Thankfully, VMware, a popular virtualization program, can create an artifact on the host operating system that gives us insight into which applications are installed on the VM.

This is a by-product of VMware’s Unity mode and requires that VMware Tools be installed on the VM. Unity mode previously supported both Windows and Linux guest operating systems, but now only supports Windows guests. Of interest, a folder called GuestAppsCache is created in the VM’s folder regardless of whether the user ever used Unity mode. Within this folder is a sub-folder called appData containing a number of files with the .appinfo and .appicon extensions. Each .appinfo binary file correlates to an application installed on the VM and includes application name and file path details. Likewise, for each .appinfo file there is a matching .appicon file which, with some modification, yields a PNG icon of said application. An example of this directory can be seen below.

[Figure: GuestAppsCache directory contents]

The names of these files are simply the MD5 checksum of the application’s file path as recorded within the .appinfo file. The create and modify timestamps of these files correspond to when the application was installed within the VM; they do not necessarily indicate when the application was first or last executed. So what do we do with potentially hundreds of binary files containing relevant information? Create a script to do all of the legwork for us, naturally.
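As a quick illustration of that naming scheme, the file name can be reproduced by hashing the path string. This is a minimal sketch in Python 3 (the article's script is Python 2), using a hypothetical application path; the exact encoding of the path as stored in the .appinfo file is an assumption here.

```python
import hashlib

# Hypothetical application path as it might be recorded in an .appinfo file
app_path = 'C:\\Program Files\\CCleaner\\CCleaner.exe'

# The .appinfo/.appicon file names are the MD5 checksum of the path string
file_stem = hashlib.md5(app_path.encode('utf-8')).hexdigest()
appinfo_name = file_stem + '.appinfo'
```

The resulting name is a 32-character hex digest, matching the file names seen in the appData directory.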

Example .appinfo File

Before writing a script, we need to understand the structure of these binary files and decide on the best method to parse them. After examining a few of them, the patterns highlighted in the example below become apparent. We will use these patterns to process the files.

[Figure: example .appinfo file with patterns highlighted]

Below are the patterns observed after reviewing a few of these files. The numbered points match the corresponding elements in the figure, and the underlined portions are the pieces of information we will extract from each file:

  1. The first 11 bytes of the file can be skipped.
  2. The 12th byte is an 8-bit integer giving the length of the application name that immediately follows it.
  3. After the application name, the next non-zero byte is an 8-bit integer giving the length of the application path, which immediately follows it.

The .appicon files will require less logic to process. After viewing a few with a hex editor, it was apparent these were PNG files: the PNG file signature 0x89504E47 appears at the 29th byte (offset 28). For these files, we will simply remove the first 28 bytes so that the PNG file signature is where it belongs (the beginning of the file) and rename the icon file with an appropriate name and extension.
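A minimal sketch of that conversion (Python 3, with a synthetic .appicon buffer standing in for a real file, and a hypothetical helper name):

```python
# Full 8-byte PNG signature; the article checks the first 4 bytes (0x89504E47)
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def extract_icon(appicon_bytes):
    """Strip the 28-byte header so the PNG signature starts the file."""
    payload = appicon_bytes[28:]
    if not payload.startswith(PNG_SIGNATURE):
        raise ValueError('No PNG signature at offset 28')
    return payload

# Synthetic .appicon data: 28 filler bytes followed by PNG content
fake_appicon = b'\x00' * 28 + PNG_SIGNATURE + b'rest-of-image'
png = extract_icon(fake_appicon)
```

Writing `png` out to a file with a .png extension would then yield a viewable icon.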

The vm_summarizer.py Script

Let’s begin discussing how the script works now that we have established its goals and the structural footholds we can rely on. We will mainly focus our attention on three functions of interest: processFiles, processPhotos, and csvWriter. The full code can be found in the GitHub repository of the same name. When running the script, the user supplies an input directory and an output directory. Optionally, they can supply the --photos flag to process the application icons in addition to the application details. By default, the script will only process the .appinfo files and output the data to a CSV file. The input directory should be the “GuestAppsCache” folder of interest (e.g., D:\Virtual Machines\myVM\caches\GuestAppsCache).

appdata = os.path.join(in_dir, 'appData')
if not os.path.exists(appdata):
    print '[-] {} directory does not exist.'.format(appdata)
    sys.exit(1)

apps = [x for x in os.listdir(appdata) if x.lower().endswith('.appinfo')]
print '[+] Processing {} APPINFO file(s) in {}'.format(len(apps), appdata)
data = processFiles(apps, appdata)

The files we are interested in are located within the “appData” sub-folder, which we join to the input directory on line 1. We then check whether the directory exists using the os.path.exists() function. If it does not, we call sys.exit(1) to exit the script with an error. Calling sys.exit() with any non-zero integer signals that the script exited with an error.

On line 6, we use list comprehension to obtain a list of all files in the “appData” directory ending with the .appinfo extension. List comprehensions are a fantastic way of creating lists, though they can appear a little complicated at first. Here’s an example of creating that same list the “long” way.

apps = []
for input_file in os.listdir(appdata):
    lower_file = input_file.lower() # Converting string case to lower-case
    if lower_file.endswith('.appinfo'):
        apps.append(input_file)

List comprehension is a powerful and useful tool to instantiate lists with data, though the same feat can be accomplished with a few more lines of code. However we arrive at our list of .appinfo files, we pass it and the appdata input directory to the processFiles function on line 8 of the previous code block.
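To see the equivalence without touching the file system, the same filter can be run over a hypothetical directory listing (Python 3 sketch; the names are invented for illustration):

```python
# Stand-in for os.listdir(appdata)
listing = ['a1b2.appinfo', 'a1b2.appicon', 'C3D4.APPINFO', 'notes.txt']

# The one-liner from the script
apps = [x for x in listing if x.lower().endswith('.appinfo')]

# The equivalent explicit loop
apps_long = []
for input_file in listing:
    if input_file.lower().endswith('.appinfo'):
        apps_long.append(input_file)
```

Both produce the same list, with the lower() call ensuring the match is case-insensitive.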

vm_summarizer.py – processFiles function

On line 3, we create the data dictionary which will store the processed results, and then begin iterating through each .appinfo file in the list. Remember, each file name is the MD5 checksum of the application file path, so for our purposes we can use it as a unique key into the data dictionary. We create this key on line 5 by splitting off the .appinfo file extension and keeping only the MD5 checksum. We then use that key to create a nested dictionary and set up placeholder values which we will populate shortly.

def processFiles(apps, appdata):
    [snip]
    data = {}
    for app in apps:
        app_key = app.split('.appinfo')[0]
        data[app_key] = {'File Create Date': '', 'File Modify Date': '', 'App Name': '', 'App Path': ''}

        create = datetime.utcfromtimestamp(os.path.getctime(os.path.join(appdata, app))).strftime('%m/%d/%Y %H:%M:%S')
        modify = datetime.utcfromtimestamp(os.path.getmtime(os.path.join(appdata, app))).strftime('%m/%d/%Y %H:%M:%S')

Next, on lines 8 and 9, we chain a few functions to extract the create and modify timestamps and convert them into formatted date strings. First, we join the .appinfo file name to the “appData” directory as we have seen before. The resulting file path string is fed to the built-in os.path.getctime() function, which returns the create date of the file as a UNIX timestamp. We then pass that UNIX timestamp to the datetime.utcfromtimestamp() method, which converts it to a datetime object without applying a timezone. Lastly, we use the datetime strftime() method to create the formatted date string using datetime directives.
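The same inside-out chain can be tried on a known UNIX timestamp (Python 3; in newer releases utcfromtimestamp() is deprecated in favor of timezone-aware calls, but it behaves as described here):

```python
from datetime import datetime

# 0 seconds after the UNIX epoch, formatted the same way as the script does
formatted = datetime.utcfromtimestamp(0).strftime('%m/%d/%Y %H:%M:%S')
print(formatted)  # 01/01/1970 00:00:00
```

In the script, os.path.getctime() or os.path.getmtime() supplies the timestamp instead of the literal 0.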

There’s a lot going on between those parentheses, but working inside out is the best way to understand compact lines like that. Now, let’s introduce the struct library. This is the quintessential library that every Python forensicator should be familiar with: it can be used to interpret binary data as various types (strings, 8-bit integers, etc.).

Before we can use struct, we need to open the file to read. We do this on line 1, opening the file in “rb” (read binary) mode. As we discussed earlier, the first 11 bytes of these files are not necessary for our purposes. We skip them using the seek() method, where the first argument is the offset and the second argument is where to seek from. A value of 0 indicates we wish to seek from the beginning of the file (as opposed to 1, the current position, or 2, the end of the file).
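The three whence values can be demonstrated against an in-memory buffer, since io.BytesIO behaves like a binary file handle (Python 3 sketch):

```python
import io

buf = io.BytesIO(b'0123456789ABCDEF')

buf.seek(11, 0)         # 11 bytes from the start of the "file"
twelfth = buf.read(1)   # the 12th byte

buf.seek(-1, 1)         # back one byte from the current position
again = buf.read(1)     # re-reads the same byte

buf.seek(-1, 2)         # one byte back from the end of the "file"
last = buf.read(1)
```

Relative seeks (whence of 1) are exactly what the script later uses to step back to a byte it has already read.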

Struct can seem somewhat obtuse at first glance. We will often use its unpack() method to extract data types from binary data. The unpack() method requires two arguments: a format string which tells struct how to interpret the binary data, and the binary data itself. It is critical that the number of bytes the format string describes matches the number of bytes you give struct to interpret.

        with open(os.path.join(appdata, app), 'rb') as app_file:
            # Skip the first 11 bytes
            app_file.seek(11, 0)
            name_size = struct.unpack('B', app_file.read(1))[0]
            name = struct.unpack('{}s'.format(str(name_size)), app_file.read(name_size))[0]

For example, on line 4, we use the “B” format character to specify that the binary data should be interpreted as an 8-bit unsigned integer. This requires us to supply struct with exactly one byte of data from the app_file object as the second argument (review the struct documentation for a table of all format characters and their sizes).

The unpack() method always returns a tuple, even when you supply only one format character as we did. In this case, we only want the object at index zero of the tuple, which is our 8-bit integer. Suppose that in the example .appinfo figure we saw previously, the app name is “CCleaner”. This is an eight-character string, so the byte preceding it, which we just interpreted as an 8-bit integer with struct, will reflect that and be the number 8.

On line 5, we again use struct, this time with the “s” format character, and use the format() method to place a number in front of it. The “s” character interprets binary data as a string. Placing a number in front of it (in our example, the integer 8) tells struct to read the next 8 bytes as a string. And just like that, we have successfully extracted the application name from the .appinfo file.
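Putting the two format characters together, the length-prefixed read can be sketched against a synthetic buffer (Python 3, where struct returns bytes for “s”, so a decode step is added that Python 2 does not need):

```python
import io
import struct

# Synthetic .appinfo fragment: one length byte, then the name it describes
buf = io.BytesIO(b'\x08CCleaner')

(name_size,) = struct.unpack('B', buf.read(1))                       # length prefix
(name,) = struct.unpack('{}s'.format(name_size), buf.read(name_size))  # the name itself
name = name.decode('ascii')
```

Unpacking into a one-element tuple `(name_size,)` is just another way of taking index zero of the result.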

            while struct.unpack('B', app_file.read(1))[0] == 0:
                continue

            # Skip back to the non-zero byte
            app_file.seek(-1, 1)
            path_size = struct.unpack('B', app_file.read(1))[0]
            path = struct.unpack('{}s'.format(str(path_size)), app_file.read(path_size))[0]

We need to skip to the next non-zero byte to extract the length of the application path string. We do this with a while loop that reads each successive byte and checks whether it is equal to 0. If it is, we continue reading; when it is not, we break out of the while loop and execute line 5. There, we use seek() again (this time seeking from our current location) to step back one byte to the non-zero byte we just read when the loop ended.
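That skip-then-rewind dance can be exercised on a synthetic buffer (Python 3 sketch with invented padding and payload):

```python
import io
import struct

# Three zero padding bytes, then a length byte (5) and the data it describes
buf = io.BytesIO(b'\x00\x00\x00\x05hello')

# Read until the first non-zero byte...
while struct.unpack('B', buf.read(1))[0] == 0:
    continue

# ...then step back so that byte can be re-read as the length prefix
buf.seek(-1, 1)
(size,) = struct.unpack('B', buf.read(1))
value = buf.read(size)
```

The seek(-1, 1) is necessary because the loop has already consumed the length byte by the time it terminates.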

Lines 6 and 7 use the same technique as previously discussed to first read the size of the path string and then read the path string itself. All that is left for us to do now is to store the extracted data in the data dictionary, using the app_key to specify the nested dictionary specific to this file. Notice that we pass the path variable through the urllib unquote() method prior to adding it to the dictionary. This method replaces percent-encoded characters with their single-character equivalents. For example, you have likely seen “%20” in URLs substituted for spaces; unquote() converts those “%20” sequences back into spaces.

            data[app_key]['File Create Date'] = create
            data[app_key]['File Modify Date'] = modify
            data[app_key]['App Name'] = name
            data[app_key]['App Path'] = urllib.unquote(path)

    return data

After each .appinfo file is handled in this manner we return the data dictionary back to the main function.
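Note that urllib.unquote is a Python 2 call; in Python 3 the same function lives at urllib.parse.unquote. Its effect on a percent-encoded path (hypothetical example) looks like this:

```python
from urllib.parse import unquote  # urllib.unquote in Python 2

# Hypothetical percent-encoded path as it might be stored in an .appinfo file
encoded = 'C:%5CProgram%20Files%5CCCleaner%5CCCleaner.exe'
decoded = unquote(encoded)
```

Here %5C decodes to a backslash and %20 to a space, restoring a readable Windows path.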

vm_summarizer.py – processPhotos function

If the user invokes the --photos flag when running the script, the processPhotos function executes. This function takes the data dictionary we just created, a list of .appicon files, and the input and output directories. We first use the os.path.join() function to append a “Photos” sub-folder to the output directory string. If this “Photos” sub-directory does not exist, we use the os.makedirs() method on line 5 to create it, along with any other directories in the path that do not yet exist.

def processPhotos(data, icons, appdata, out_dir):
    [snip]
    out_photo_dir = os.path.join(out_dir, 'Photos')
    if not os.path.exists(out_photo_dir):
        os.makedirs(out_photo_dir)

Next, we begin iterating through each of the .appicon files in the icons list. This icons list was created, by the way, using the same one-liner list comprehension we saw before, modified for the “.appicon” file extension rather than “.appinfo”. Wrapping the enumerate() function around an iterable object like a list gives us, in the first loop variable “a”, an integer that increments with each iteration of the for loop. You’ll see why this comes in handy shortly.
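enumerate() in isolation, over a few hypothetical icon names (Python 3 sketch):

```python
icons = ['a1b2.appicon', 'c3d4.appicon', 'e5f6.appicon']  # invented names

numbered = []
for a, icon in enumerate(icons):
    # "a" counts up from 0 alongside each item
    numbered.append((a, icon))
```

Each item is paired with a counter starting at 0, which is what later makes the output file names unique.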

Like before, we need to obtain the app_key for this file, since it shares the same MD5 checksum as its .appinfo counterpart. With the key extracted on line 2, we can pull out the name of the application associated with this icon; when we create the new icon file, we will rename it to match the application name. On line 4, we create the out_photo variable, which holds the new name of the application icon file. Notice we append the loop counter “a” to the end of the file name along with the appropriate PNG file extension. This avoids overwriting files that share the same application name.

    for a, icon in enumerate(icons):
        app_key = icon.split('.appicon')[0]
        app_name = data[app_key]['App Name']
        out_photo = os.path.join(out_photo_dir, app_name + '_' + str(a) + '.png')
        in_photo = os.path.join(appdata, icon)
        with open(out_photo, 'wb') as outfile:
            with open(in_photo, 'rb') as infile:
                infile.seek(28, 0)
                outfile.write(infile.read())

Finally, we are ready to write the output application icon and open a “wb” file handle for it. We also open a file handle in “rb” mode for the original application icon file and skip its first 28 bytes. To finish this function, we write byte 29 onward from the original application icon file to the output application icon file. The “Photos” output directory should contain thumbnails like those captured in the figure below.

[Figure: extracted application icon thumbnails in the Photos output directory]

Let’s now turn our attention back to the data dictionary we created and write that out to a CSV file.

vm_summarizer.py – csvWriter function

Like most csvWriter functions we create, this is a fairly simple one. If you have stored your parsed data in a logical manner, then getting it out to another format should be straightforward; that’s the goal, at least. Last time we used the regular csv writer. This time, since we have relied so heavily on dictionaries, we will use the csv DictWriter. It operates in substantially the same way but comes with a few extra features and rules that make writing dictionaries to a CSV file easier.

def csvWriter(data, output_dir):
    [snip]
    with open(os.path.join(output_dir, 'VMware_GuestAppsCache.csv'), 'wb') as csv_file:

        writer = csv.DictWriter(csv_file, fieldnames=['App Name', 'App Path', 'File Create Date', 'File Modify Date'])
        writer.writeheader()
        for app in data.keys():
            writer.writerow(data[app])

First, we open the CSV file in the output directory in “wb” mode. Recall that in Python 2.X, CSV files must be opened in “wb” mode to avoid writing intervening blank rows in the output. Next, on line 5, we create the DictWriter object. The fieldnames keyword argument must list all of the keys in the dictionaries we will give it to write; otherwise, it will throw an error when it encounters a key not specified in the fieldnames list. Also note that this list dictates the column order in the output.

Since the fieldnames are specified, we can use the writeheader() method to write the column headers to the CSV. Then we iterate through each app in the data dictionary and supply its nested dictionary to the writerow() method. The generated CSV should look something like the one shown in the figure below.

[Figure: example CSV output]
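The DictWriter flow can be tried in memory (Python 3, where the file would be opened in text mode with newline='' rather than Python 2's "wb"; the row data below is invented for illustration):

```python
import csv
import io

# One hypothetical parsed entry, keyed by its MD5 stem
data = {'a1b2c3': {'App Name': 'CCleaner',
                   'App Path': 'C:\\Program Files\\CCleaner\\CCleaner.exe',
                   'File Create Date': '01/01/1970 00:00:00',
                   'File Modify Date': '01/01/1970 00:00:00'}}

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=['App Name', 'App Path', 'File Create Date', 'File Modify Date'])
writer.writeheader()
for app in data:
    writer.writerow(data[app])  # keys are matched to columns by name

rows = out.getvalue().splitlines()
```

Because DictWriter matches keys to columns by name, the nested dictionaries can be handed over as-is.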

That completes our discussion of this script. Much more could be said about the struct library. It is one that I find myself frequently using to interpret binary data and one I recommend familiarizing yourself with due to its immense utility in our field. Having issues with the script or questions about how a particular bit works? Feel free to ask below.
