Hello Reader,
In our last post we discussed at a high level the relationship between the $MFT, $LOGFILE and $USNJRNL. In this post we will go into detail on the structures we can recover from each of the three and how they link, allowing us to determine the historical changes made to a file or directory.
$MFT - The Master File Table is a pretty well understood artifact. MFT structures are fully documented and there are a variety of tools out there for parsing it. With that said, I'm not going to go into any depth on how the MFT works and will instead just highlight the two structures we are interested in.
(Thanks to Mike Wilkinson for making these MFT data structure diagrams I am referencing below. You can find the full version of his NTFS cheat sheet here http://www.writeblocked.org/resources/ntfs_cheat_sheets.pdf)
The first is the File Record shown below:
When a file is created, modified, or deleted this is the structure that gets added, changed, or updated. The field in the upper right at offset 0x08 labeled $Logfile Sequence Number, or LSN, is how the MFT refers to the most recent change recorded in the $logfile. Each $logfile record has its own LSN, but the file record only holds the LSN of the most recent change and that value is overwritten with every new change. There is no record that I'm aware of that shows what LSNs a file record previously had. The MFT Record Number is a unique identifier for this file record, so if we have a way to link a recovered change to it, it becomes easy to say which MFT file record that historical change refers to.
The $USNJrnl keeps the MFT Record Number to indicate which file it is operating on and the Parent Record number to reflect what directory that MFT file record resided in. A change recorded in a $logfile entry can be easily linked back to an MFT file record through the LSN, but only if it is the last change made to that file record.
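To make the offsets concrete, here is a minimal Python sketch that reads the LSN and the record number out of a raw file record. It is an illustration only, not code from any released parser: the function name and the synthetic sample bytes are made up, and it assumes the NTFS 3.1 layout where the record number is stored at offset 0x2C.

```python
import struct

def parse_file_record_header(record: bytes) -> dict:
    """Read the fields discussed above from a raw MFT FILE record."""
    if record[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    # 0x08: $LogFile Sequence Number of the most recent change (8 bytes, little-endian)
    (lsn,) = struct.unpack_from("<Q", record, 0x08)
    # 0x10: sequence number, incremented when the record is reused (2 bytes)
    (sequence,) = struct.unpack_from("<H", record, 0x10)
    # 0x2C: MFT record number (4 bytes; assumes the NTFS 3.1 record layout)
    (record_number,) = struct.unpack_from("<I", record, 0x2C)
    return {"lsn": lsn, "sequence": sequence, "record_number": record_number}

# Synthetic 1024-byte record, built only so the example runs on its own.
sample = bytearray(1024)
sample[0:4] = b"FILE"
struct.pack_into("<Q", sample, 0x08, 0x1A2B3C)
struct.pack_into("<H", sample, 0x10, 3)
struct.pack_into("<I", sample, 0x2C, 4242)

header = parse_file_record_header(bytes(sample))
print(header)
```

Note that only the current LSN comes back; nothing in the record tells you what the previous LSNs were.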
The file record however is not the only record/attribute we care about in the MFT for our triforce historical analysis powers, we also care a lot about the Standard Information record shown below:
If a time stamp, owner ID or SID of a file changes then it's the standard information block/attribute that gets written to the $logfile and not the entire file record with all its attributes. This was a problem before we found the triforce linkage because, as you can see, the standard information block does not refer to the file record number. We had two ways to determine which MFT entry a $logfile record was pointing to. We could match on the LSN (which is captured in the $logfile header for each recorded change) and hope the file record hadn't been updated again. Alternatively, we could determine the location of the MFT entry by doing some math using the VCN (virtual cluster number) and the MFT cluster index recorded in the $logfile. Relying on the physical location in the MFT is also problematic because a defrag can remove deleted entries and change the VCN where the entry resides, leading to false positives about which $logfile record points to which MFT record.
The good news here is that, as you can see at offset 0x40, the standard information attribute does record the update sequence number (USN)! The USN in turn points to a $USNJrnl record, which holds the file record number and parent file record number as discussed above. This means that through the link between the $USNJrnl and the $MFT we can associate a change made to the standard information attribute in the $logfile with a $USNJrnl record, which links back to a specific $MFT file record number. This is a reliable identifier as the file record number's value does not change based on system activity! This then leads us to the $logfile structures.
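As a rough sketch of what pulling that USN out looks like, the Python below parses the content of a standard information attribute. The function names and sample values are hypothetical, and I am assuming the offsets are relative to the start of the attribute content (after the attribute header) and that the record is the longer 72-byte form that carries the owner ID, security ID and USN fields.

```python
import struct
from datetime import datetime, timedelta, timezone

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)

def parse_standard_information(content: bytes) -> dict:
    """Parse $STANDARD_INFORMATION content (offsets relative to the content)."""
    if len(content) < 0x48:
        raise ValueError("short form: no owner ID, security ID or USN present")
    # 0x00-0x1F: created, modified, MFT modified, accessed
    created, modified, mft_modified, accessed = struct.unpack_from("<4Q", content, 0x00)
    # 0x30: owner ID, 0x34: security ID
    owner_id, security_id = struct.unpack_from("<II", content, 0x30)
    # 0x40: update sequence number, the key into the $USNJrnl
    (usn,) = struct.unpack_from("<Q", content, 0x40)
    return {
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "mft_modified": filetime_to_datetime(mft_modified),
        "accessed": filetime_to_datetime(accessed),
        "owner_id": owner_id,
        "security_id": security_id,
        "usn": usn,
    }

# Synthetic attribute content; 130014720000000000 is 2013-01-01 00:00:00 UTC.
sample = bytearray(0x48)
struct.pack_into("<4Q", sample, 0x00, *([130014720000000000] * 4))
struct.pack_into("<II", sample, 0x30, 0, 261)
struct.pack_into("<Q", sample, 0x40, 1200)

info = parse_standard_information(bytes(sample))
print(info["usn"], info["security_id"], info["created"])
```

The returned `usn` is the value you would then look up in the $USNJrnl to get the file record number.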
$Logfile - Every change recorded in the $logfile starts with a header as shown below:
The LSN here relates back to the file record entry inside the MFT for the change that is being recorded. The LSN for a file record in the MFT is updated to reflect the most current $logfile entry for that file, meaning the LSN for a file changes with every change recorded. That means that any $logfile entry whose recorded change does not reference either the USN or the MFT record number can only have its corresponding MFT record determined by doing a calculation using the recorded VCN seen at offset 0x48 above.
Why does the $logfile record the VCN? The process of repairing the file system using the $logfile is to overlay the data stored in the $logfile over the areas where a transaction failed to complete successfully. This allows the file system to be rolled back (using the undo records) or have a change reapplied (using the redo records) by just overwriting what was previously located at those VCNs.
What comes after the LSN record header will vary depending on what change took place. The $logfile stores the raw MFT record/attribute that has been modified, so any MFT entry could exist in the $logfile. We focus on the File records and the Standard Information attribute records as they reveal the most about changes occurring to a file. There are other MFT records/attributes that could be of interest to you and they also exist in the $logfile. Any change made to an MFT record/attribute will be recorded in the $logfile. The hard part is tying that logged change back to the actual MFT record being modified so you know which file record it relates to. So you can imagine that after every LSN header you have a copy of the MFT record/attribute being changed, reflecting its before (undo) and after (redo) states.
Since there are no $logfile structures other than the LSN header, RCRD header and Restart areas, we are reliant on what is recorded in the MFT record being changed to know exactly which file is being modified. When we are lucky (like we are with file records and $standard_information records) we get a link back to a unique file reference number. When we are unlucky (resident data found in $DATA attribute records) we have to rely on some math using the VCN and MFT cluster index stored in the LSN header to determine what location within the MFT the record is pointing to. It's this possibility for false positives that keeps these records out of the public version of our $logfile parser.
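The "some math" can be sketched roughly as below. Treat this as an assumption-heavy illustration and not the calculation any particular parser uses: the function name is made up, the cluster and record sizes are common defaults that should really be read from the volume boot sector, and I am assuming the MFT cluster index counts 512-byte units inside the cluster and that the VCN is relative to the start of the $MFT data.

```python
def mft_record_from_vcn(target_vcn: int,
                        mft_cluster_index: int,
                        bytes_per_cluster: int = 4096,
                        bytes_per_record: int = 1024,
                        index_unit: int = 512) -> int:
    """Estimate which MFT record a $logfile entry targets.

    Assumptions (not confirmed by the post): the VCN is relative to the
    start of the $MFT data, and the cluster index is in 512-byte units.
    """
    byte_offset = target_vcn * bytes_per_cluster + mft_cluster_index * index_unit
    if byte_offset % bytes_per_record:
        raise ValueError("offset does not fall on an MFT record boundary")
    return byte_offset // bytes_per_record

# VCN 10 with cluster index 2: 10 * 4096 + 2 * 512 = 41984 bytes into the MFT,
# which is record 41 when records are 1024 bytes.
record_number = mft_record_from_vcn(target_vcn=10, mft_cluster_index=2)
print(record_number)
```

The result is a position, not an identity. If the MFT has been rearranged since the change was logged, the record now sitting at that position may belong to a different file, which is exactly the false positive risk described above.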
Note: We will go into even more detail of the $logfile structures when we do the big $logfile post which is coming with the tool release I promise.
$USNJrnl - The USNJrnl or Update Sequence Number Journal has a pretty simple structure compared to the rest we've talked about and is fully documented as shown below:
Sorry, no fancy hex offset data structure for this yet, just the record structure as taken from Microsoft's documentation of the USNJrnl. As you can see, for purposes of linking $standard_information structures stored in the $logfile back to the MFT we have the matching USN stored here as the sixth element down. Since each USNJrnl entry, and thus each open/close of a file, has a unique USN assigned, we have a great lasting artifact to look for when trying to match $standard_information records back to MFT records. The fourth and fifth items in the record entry link back to the MFT, giving not only the file record number but also the directory the file was located in, as seen in the parent record number.
Taken just on its own the USNJrnl is a fantastic source of historical information that more examiners are beginning to utilize, but you can get even more information out of it by taking it a step further. If you were to mine out all the unique USN records into a database table you could group them by file reference number to see all the changes, including the renaming of a file or its movement between directories. This is because the MFT file record number (shown in the MSDN screenshot above as a reference number) does not change no matter how many times the file record or attributes change. Renaming a file, moving a file, editing its time stamps, filling it with random data, etc... none of these actions will change the file record number. What utilizing the triforce gets us is more granular detail about those attribute changes, detail that only exists in the $logfile, extrapolated out through the $USNJrnl to an MFT file record.
Putting it all together - So that was a lot of words up there. If you read the last post you got the same information at a very high level, but now you can see at a much deeper level how these things sync up. I don't believe that the developers actually intended this relationship to exist, or else I would expect more syncing for more record types stored in the $logfile. We just again get a happy overlap between what a developer made and what analysis can reveal to us.
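For anyone who wants the layout in code form, here is a short Python sketch that unpacks a version 2 USN record following Microsoft's documented USN_RECORD_V2 field order. The function name and the sample record are made up for illustration. The one detail the list of fields does not make obvious is that a file reference number packs the MFT record number into its low 48 bits and the sequence number into its high 16 bits.

```python
import struct

# Fixed part of USN_RECORD_V2 (60 bytes): RecordLength, MajorVersion,
# MinorVersion, FileReferenceNumber, ParentFileReferenceNumber, Usn,
# TimeStamp, Reason, SourceInfo, SecurityId, FileAttributes,
# FileNameLength, FileNameOffset
USN_V2_FIXED = struct.Struct("<IHHQQqqIIIIHH")

def parse_usn_record_v2(buf: bytes, offset: int = 0) -> dict:
    (length, major, minor, file_ref, parent_ref, usn, timestamp, reason,
     source_info, security_id, attributes, name_len, name_off) = \
        USN_V2_FIXED.unpack_from(buf, offset)
    if major != 2:
        raise ValueError("not a version 2 USN record")
    start = offset + name_off
    name = buf[start:start + name_len].decode("utf-16-le")
    return {
        "usn": usn,
        "record_number": file_ref & 0xFFFFFFFFFFFF,       # low 48 bits
        "sequence": file_ref >> 48,                        # high 16 bits
        "parent_record_number": parent_ref & 0xFFFFFFFFFFFF,
        "reason": reason,
        "name": name,
    }

# Synthetic record: file record 4242 (sequence 3) in directory record 5,
# reason 0x2000 (USN_REASON_RENAME_NEW_NAME). Records are 8-byte aligned,
# so the 84 bytes of data are padded out to 88.
name = "evidence.txt".encode("utf-16-le")
fixed = USN_V2_FIXED.pack(88, 2, 0, (3 << 48) | 4242, (5 << 48) | 5,
                          1024, 0, 0x2000, 0, 0, 0x20, len(name), 60)
sample = (fixed + name).ljust(88, b"\x00")

record = parse_usn_record_v2(sample)
print(record)
```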
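The database idea can be sketched with Python's built-in sqlite3 module. The table layout, column names and sample rows below are all hypothetical and just show the grouping; they are not the schema of any released tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usn ("
    " usn INTEGER PRIMARY KEY,"
    " record_number INTEGER,"
    " parent_record_number INTEGER,"
    " reason INTEGER,"
    " name TEXT)"
)

# Made-up journal entries. Record 4242 is created, renamed, then moved.
rows = [
    (1000, 4242, 5, 0x0100, "report.docx"),   # USN_REASON_FILE_CREATE
    (1100, 4242, 5, 0x1000, "report.docx"),   # USN_REASON_RENAME_OLD_NAME
    (1200, 4242, 5, 0x2000, "a.tmp"),         # USN_REASON_RENAME_NEW_NAME
    (1150, 9001, 5, 0x0100, "other.txt"),     # unrelated file
    (1300, 4242, 77, 0x2000, "a.tmp"),        # parent changed: file moved
]
conn.executemany("INSERT INTO usn VALUES (?, ?, ?, ?, ?)", rows)

def history(conn: sqlite3.Connection, record_number: int) -> list:
    """Return every journal entry for one MFT record, oldest first."""
    cur = conn.execute(
        "SELECT usn, parent_record_number, name FROM usn"
        " WHERE record_number = ? ORDER BY usn",
        (record_number,),
    )
    return cur.fetchall()

timeline = history(conn, 4242)
for entry in timeline:
    print(entry)
```

Because the record number stays constant, the rename (report.docx to a.tmp) and the move (parent 5 to parent 77) both show up in a single ordered timeline for the file.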
If you followed everything I wrote above you will see that using the power of the NTFS triforce we can recover and identify:
1. The change of ownership of a file ($logfile)
2. The change of a file's SID (if that were to happen) ($logfile)
3. The changing of timestamps ($logfile)
4. The movement of files between directories ($logfile and $USNJrnl)
5. The renaming of files (common during wiping) ($logfile and $USNJrnl)
6. The summary of actions taken against a file ($USNJrnl)
7. The changing of attributes to a file, important for things like tracking hard links to determine CD Burns ($logfile)
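The linkage behind this list boils down to a join on the USN. Below is a minimal sketch using Python's sqlite3 module, assuming you have already parsed $standard_information records out of the $logfile and entries out of the $USNJrnl into two tables. The table names, columns and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logfile_si (lsn INTEGER, usn INTEGER,
                         owner_id INTEGER, security_id INTEGER);
CREATE TABLE usnjrnl (usn INTEGER PRIMARY KEY, record_number INTEGER,
                      parent_record_number INTEGER, name TEXT);
""")

# Hypothetical $standard_information changes recovered from the $logfile.
conn.executemany("INSERT INTO logfile_si VALUES (?, ?, ?, ?)", [
    (5000, 1200, 0, 261),
    (5100, 1300, 0, 262),
])
# Hypothetical $USNJrnl entries.
conn.executemany("INSERT INTO usnjrnl VALUES (?, ?, ?, ?)", [
    (1200, 4242, 5, "a.tmp"),
    (1250, 9001, 5, "other.txt"),
    (1300, 4242, 77, "a.tmp"),
])

# $logfile change -> USN -> $USNJrnl entry -> MFT file record number
linked = conn.execute("""
    SELECT l.lsn, u.record_number, u.parent_record_number,
           u.name, l.security_id
    FROM logfile_si AS l
    JOIN usnjrnl AS u ON u.usn = l.usn
    ORDER BY l.lsn
""").fetchall()

for row in linked:
    print(row)
```

Each $logfile change comes back tagged with the MFT file record number and parent directory it belongs to, without relying on the LSN still being current or on the VCN math.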
We can do all of these with little chance of error thanks to the combination of these three data sources. Additionally we can recover granular historical changes to files. Depending on your location in the DFIR spectrum (from digital forensics analyst, incident responder to malware analyst or all of the above) you will have different uses for this information. We are very excited about the triforce and we are extending our $logfile parser to include these sources, of which the $MFT integration was already on our roadmap. Getting the full use out of all of this information will require a database and we're not sure if SQLite is up to the task. We hope to have something workable out there soon.
In the next blog post I'll talk about how to get access to the $USNJrnl, $MFT and $Logfile from volume shadow copies as not all access methods are equal. After that I'll likely move into updating some old 'what did they take' posts to reflect new artifact sources and post the results of our forensic tool tests.