Forensic artifacts: Dropbox

While researching and documenting the forensic artifacts that Dropbox leaves lying around, I got a bit waylaid by how Dropbox performs host-level authentication, but I’ve finally had the chance to come back and finish the research and documentation. Here’s a summary of my observations:

  • Dropbox binaries are installed into %AppData%\Dropbox\bin instead of the standard %PROGRAMFILES%. During installation, 13 registry keys were added, although none of them contained forensically useful data.
  • The Dropbox configuration and state are stored in .db files found in %AppData%\Dropbox (most, but not all, are SQLite databases):
    • config.db: contains, in a table named config, the baseline configuration settings that the Dropbox client references in order to run. Records of interest include:
      • host_id: the authentication hash used by the Dropbox client to authenticate into the Dropbox “cloud.”  This hash is assigned upon initial install/authentication and does not change unless revoked from the Dropbox web interface.
      • email: the account holder’s email address, set at install/authentication. It can be changed to any value without consequence.
      • dropbox_path: actual path to the user’s Dropbox on the local system.
      • recently_changed3: lists the path/filename of the five most recently changed files, including files removed/deleted from the Dropbox. This is probably the only truly useful forensic artifact produced by Dropbox (other than the usual filesystem-related artifacts). The BLOB for this record is text-based and consistently formatted:
        • text begins with “lp1” and ends with “a.”
        • entries are ordered from most recent to least recent; in each entry, the filename/path is followed by “I00” and “tp#” (where # is the entry’s position plus 1, e.g. the first entry is followed by “tp2”), and the parts are separated by line breaks.
        • if the file has been removed/deleted from the Dropbox, the “I00” text is dropped and an “N” is placed in front of the “tp#”. So a removed/deleted file would appear as: (V41725479:/new file.txt
      • root_ns: appears to be used throughout the Dropbox DBs to reference the base Dropbox path/location.
    • filecache.db: contains a number of tables whose primary purpose is to describe all files currently in the Dropbox (entries for deleted/removed files are dropped upon deletion/removal). Tables and records of interest:
      • file_journal: includes the filename, path, size (in bytes), mtime (file modified time, in Unix/POSIX format), ctime (file created time, in Unix/POSIX format), local_dir (flag indicating whether the entry is a directory), and more (mainly unpopulated).
      • block_ref: maps file IDs (fj_id) to file hashes (hash_id) found in the block_cache table.
      • block_cache: contains the hash ID (id) and the hash itself. The hash is of an unknown format and did not match anything I could generate using standard tools.
      • mount_table: appears to list folders that are shared with other Dropbox users.
    • host.db: not actually a SQLite database; it contains what looks like a hash of some sort (possibly SHA-1?) and the Dropbox path (dropbox_path in config.db) encoded in Base64. The entire file may be Base64-encoded (based on a few Dropbox forum postings I read), but the first part of the file does not decode into anything human-readable or match any other field I observed in the other DBs.
    • sigstore.db: stores hash values which correspond to the values found in the block_cache table in filecache.db.
    • unlink.db: appears to be a binary file and is not a SQLite database. Its format and purpose are unknown.
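The recently_changed3 markers described above (“lp1”, “I00”, “N”, “tp#”, “a.”) look a lot like the opcodes of Python’s pickle protocol 0, where I00 encodes False, N encodes None, t builds a tuple, a appends to a list and . stops. Assuming that is what the BLOB really is, the minimal Python sketch below reads the record from a working copy of config.db and walks the opcodes with pickletools.genops, which tokenizes the data without executing it (calling pickle.loads on evidence could run arbitrary code). The config table’s column names (key and value) are an assumption; confirm them with PRAGMA table_info(config) on your copy.

```python
import pickletools
import sqlite3
from pathlib import Path

# Opcodes that push a text string (the "ns:/path" value) onto the pickle stack.
STRING_OPS = {"UNICODE", "STRING", "BINUNICODE", "SHORT_BINUNICODE"}


def parse_recently_changed(blob):
    """Return (path, removed) pairs from a recently_changed3 BLOB.

    Assumes the BLOB is a protocol-0 pickle of a list of 2-tuples whose second
    item is False ("I00") for files still present and None ("N") for files
    removed/deleted from the Dropbox. pickletools.genops only tokenizes the
    stream; unlike pickle.loads, it never executes anything.
    """
    entries = []
    path, value = None, None
    for opcode, arg, _pos in pickletools.genops(blob):
        name = opcode.name
        if name in STRING_OPS:
            path = arg
        elif name == "INT":  # "I00" decodes to False, "I01" to True
            value = arg
        elif name == "NONE":  # "N" marks a removed/deleted file
            value = None
        elif name == "TUPLE":  # "t" closes one entry
            entries.append((path, value is None))
            path, value = None, None
    return entries


def read_recently_changed(db_path):
    """Read recently_changed3 from a working copy of config.db, read-only.

    Assumes (hypothetically) a table config(key, value); verify the column
    names with PRAGMA table_info(config) first.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.text_factory = bytes  # return raw bytes even if stored as TEXT
    try:
        row = conn.execute(
            "SELECT value FROM config WHERE key = ?", ("recently_changed3",)
        ).fetchone()
    finally:
        conn.close()
    return parse_recently_changed(row[0]) if row else []
```

Each result is a (path, removed) pair, so running read_recently_changed("config.db") against an examination copy lists the recently changed entries along with whether each one was removed/deleted.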

Honestly, apart from the recently_changed3 record in config.db, Dropbox generates few useful forensic artifacts. Because Dropbox writes to the local filesystem, your standard filesystem analysis steps will already cover files stored/synced into a subject’s Dropbox; still, under certain circumstances, the recently_changed3 record and/or the ctime/mtime entries in file_journal could come in handy…
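If the ctime/mtime entries do turn out to matter, here is a minimal Python sketch that dumps file_journal from a working copy of filecache.db with those Unix/POSIX timestamps rendered in UTC. Because the exact column names aren’t listed above, it discovers them at run time with PRAGMA table_info and treats any numeric column whose name contains “mtime” or “ctime” as seconds since the epoch; that naming heuristic is an assumption to check against your copy.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def dump_file_journal(db_path):
    """Yield file_journal rows as dicts, with *mtime*/*ctime* columns in UTC.

    Column names are discovered at run time rather than hard-coded; any
    column whose name contains "mtime" or "ctime" is assumed to hold Unix
    epoch seconds and is converted to an ISO 8601 UTC string.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # open read-only
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    try:
        columns = [r["name"] for r in conn.execute("PRAGMA table_info(file_journal)")]
        time_cols = [c for c in columns if "mtime" in c or "ctime" in c]
        for row in conn.execute("SELECT * FROM file_journal"):
            record = dict(row)
            for col in time_cols:
                value = record[col]
                if isinstance(value, (int, float)):  # leave NULL/text untouched
                    record[col] = datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
            yield record
    finally:
        conn.close()
```

For example, iterating over dump_file_journal("filecache.db") and printing each row gives a quick, timezone-unambiguous listing to line up against the filesystem timestamps of the synced files.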

Happy Forensicating.

Searching and extracting data from PST files

Keyword searches can be a significant part of an investigation, and given the prevalence of Microsoft Outlook, you’ll most likely find yourself needing to search through PST files, whether for a simple keyword or a more complex pattern. Even though you can open a PST file in Outlook, I prefer not to do the search within Outlook itself, for two primary reasons:

  1. Outlook will change data within the PST file. You are, of course, working on a copy, but I prefer not to have data changing dynamically (e.g. read/unread status) while I’m doing my analysis.
  2. If you want to find data matching a certain pattern (e.g. a regular expression) or data outside the message body (e.g. message header data), Outlook doesn’t really have the facilities to support those kinds of searches.

Of course, there are several commercial investigative tools that will parse and search PST files (FTK and EnCase come to mind), but in this post I’m going to focus on performing the extraction and search using only free tools in a Linux environment.

What you’ll need:

  • A relatively up-to-date Linux system (be it physical or VM).
  • readpst compiled/installed (on Ubuntu: apt-get install pst-utils); readpst is a utility included with libpst.

I’m also going to assume that you’ve acquired the PST file in a forensically sound fashion and that a copy of it is accessible on your Linux system. Let’s get started…

Extracting data from a PST file using readpst

Run readpst on the PST file to extract all objects within it (e.g. messages/attachments, calendar entries, contacts). By default, readpst exports data in mbox format, which places all of the extracted objects into a set of mbox files (one per folder) and can make pulling out the objects that match a search criterion a bit tedious. Instead, we’ll tell readpst to write each object into its own file; the command looks like:

readpst -S -o out/ outlook.pst

Here, out/ is the directory where you’d like readpst to write the files and outlook.pst is the PST file you’re extracting data from. The -S flag tells readpst to write each object to a separate file rather than in mbox format.

Once readpst has finished, your output directory will contain a directory structure that mirrors the folder structure of the PST (generally starting with a base directory named Outlook). Within each of these folders you’ll find numerically named files containing plain text that represents the exported object (e.g. for an email message, you’ll find the message body, headers, etc.).
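Because the email objects in that output are plain text with their headers intact, header data (one of the things Outlook’s own search struggles with) can be indexed with Python’s standard email package. A minimal sketch, assuming the out/ layout described above; the field list is only an example, and any field a file doesn’t have comes back as an empty string.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path

# Example header fields to index; adjust to whatever the case calls for.
DEFAULT_FIELDS = ("From", "To", "Cc", "Date", "Subject", "Message-ID")


def index_headers(out_dir, fields=DEFAULT_FIELDS):
    """Yield (relative_path, {field: value}) for every file under out_dir.

    Only the header block of each file is parsed (headersonly=True), so large
    bodies and attachments are not decoded. Missing fields map to "".
    """
    parser = BytesParser(policy=policy.default)
    root = Path(out_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        with path.open("rb") as fh:
            msg = parser.parse(fh, headersonly=True)
        # policy.default decodes RFC 2047 encoded words (e.g. =?utf-8?...?=).
        yield str(path.relative_to(root)), {f: str(msg.get(f, "")) for f in fields}
```

For instance, looping over index_headers("out/") and printing each path with its Date and Subject gives a quick overview of the mailbox, and the same dictionaries can be filtered on any header value.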

Working with the extracted data

Thanks to readpst, it’s trivial to extract all data within a PST file into a nicely organized (and basically human-readable) set of files, and at this point you can process these files as you would any other text file. For example, a common forensic task is to search all objects within a PST for certain keywords or a pattern. As an example of pattern matching, let’s say you were investigating a PII incident and wanted to see whether a subject had used email to send or receive messages that appear to contain Social Security numbers. You could use grep to search the files within the directory structure that readpst created with the following command:

grep -R -P '\b(?!000)(?!666)([0-6]\d{2}|7([0-6]\d|7[012]))([ -])?(?!00)\d\d([ -])?(?!0000)\d{4}\b' out/

This tells grep to run a recursive (-R) search of the readpst output directory using a Perl-compatible (-P) regular expression that matches numbers that look like SSNs. From there, you could even automate the process with a script that moves matching messages to a target folder for manual validation (or whatever the next step of your workflow is).
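Here is a minimal Python sketch of that automation. It uses essentially the same SSN pattern as the grep command, with both optional separators restricted to a space or hyphen, and it copies (rather than moves) matching files into a review folder so the readpst output stays intact; the copy-instead-of-move choice and the function and folder names are illustrative assumptions, not part of any tool.

```python
import re
import shutil
from pathlib import Path

# Same structure as the grep -P pattern: area (not 000/666, at most 772),
# optional space/hyphen, group (not 00), optional space/hyphen, serial (not 0000).
SSN_RE = re.compile(
    rb"\b(?!000)(?!666)([0-6]\d{2}|7([0-6]\d|7[012]))([ -])?(?!00)\d\d([ -])?(?!0000)\d{4}\b"
)


def copy_ssn_hits(out_dir, review_dir):
    """Copy files under out_dir whose bytes match SSN_RE into review_dir.

    The relative folder structure is preserved so each hit can be traced back
    to its PST folder. Returns the relative paths of the matching files.
    """
    src_root, dst_root = Path(out_dir), Path(review_dir)
    hits = []
    for path in sorted(p for p in src_root.rglob("*") if p.is_file()):
        if SSN_RE.search(path.read_bytes()):
            rel = path.relative_to(src_root)
            dest = dst_root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # copy2 also preserves file timestamps
            hits.append(str(rel))
    return hits
```

Calling copy_ssn_hits("out/", "review/") returns the relative paths of the matching files, which doubles as a simple hit list for your case notes.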

As you can see, forensically analyzing PST files using freely available software is quite easy and can be a very powerful method for efficiently extracting case-pertinent data.  Give it a try sometime…

On a side note, I’ve added a new Resources section to my blog and one of the pages contained within this section is dedicated to listing useful regular expressions (such as the SSN matching regular expression I used above).  Right now, that is the only one I have up there, but I’ll keep adding to this page as I think of other useful regular expressions, so check back regularly.