Hola Reader,
There are always bigger hard drives, at least for the near future, and more devices available to individuals every day. While an individual's everyday workflow is (hopefully) drastically improved by this additional storage and processing power, it only compounds the amount of evidence a computer forensic examiner has to go through. It's not uncommon now for a single person to have over a terabyte of data between their laptop, desktop, mobile phone, tablet, hosted email, file sharing and backup services. The question then becomes: how do we keep up with the larger volumes of data without letting each individual's data take a month to process?
There are several options out there, depending on your choice of tools (EnCase, SIFT, FTK, ProDiscover, SMART, etc.), to really speed up the processing of the evidence so you can get full text indexes built and artifacts parsed. In most cases the answer comes down to:
1. Faster CPUs to process the evidence
2. More RAM to process the evidence
3. Distributing the workload across multiple systems (if your software permits this)
4. Faster storage to hold the evidence
In my lab we've approached this from all of these points.
1. Faster CPUs to process the evidence
We have multi-core, multi-processor systems. What we've found is that typically, unless we are password cracking, the I/O from the disks can't keep up with the resources available, meaning our CPUs are never maxed out. So the CPUs are not the bottleneck for getting more speed.
2. More RAM to process the evidence
We have machines with up to 49 GB of RAM. Under large loads we sometimes see usage into the 30 GB range, but again, because of the I/O from the disks, more than enough is available. So the RAM is not the bottleneck for getting more speed.
3. Distributing the workload across multiple systems (if your software permits this)
We've tried using FTK's feature that allows multiple systems to distribute processing across them. The problem is that the evidence is still being stored on and accessed from one system. Again, the I/O from the disks can't keep up with the number of requests being made by the additional systems now requesting data to process over the network. So distributed processing alone did not speed up processing.
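To make the reasoning above concrete, here is a minimal sketch of the shared-storage ceiling. It is a toy model, not a description of how FTK schedules work: the function name is made up for illustration, the 80 MB/s per-worker rate is an assumed figure, and the 150 MB/s storage rate is simply a value inside the 100-200 MB/s range we see from our RAIDs.

```python
def aggregate_throughput_mb_s(workers: int, per_worker_mb_s: float, storage_mb_s: float) -> float:
    """Toy model: total processing rate when every worker reads from one shared store.

    Workers scale linearly until their combined demand reaches what the
    shared storage can deliver; past that point extra workers add nothing.
    """
    return min(workers * per_worker_mb_s, storage_mb_s)


# Hypothetical numbers: 80 MB/s per worker is an assumption for illustration.
for workers in (1, 2, 4, 8):
    rate = aggregate_throughput_mb_s(workers, per_worker_mb_s=80.0, storage_mb_s=150.0)
    print(f"{workers} worker(s): {rate:.0f} MB/s")
```

Under these assumed numbers the total stops growing at the second worker, which matches what we observed in spirit: adding machines does not help once the evidence store is saturated.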
4. Faster storage to hold the evidence
Here we find the greatest benefit. Initially we bought large direct attached storage RAIDs to store and process our evidence. While these systems get us I/O speeds of around 100-200 MB/s depending on the type of disks, that still wasn't enough speed to max out the CPUs and RAM. So we started looking around at other options. When you start looking at larger/faster storage systems, things can get very expensive, very quickly.
If we had a very large budget we could have gone for a RAMSAN, http://www.ramsan.com/, and gotten a couple of terabytes of storage at 4 GB/s read and write speeds. That kind of I/O would clearly max out most systems attached to it and easily keep up with the demands of a distributed processing system.
Unfortunately we don't have that large a budget for storage, especially not with the amount of data that we are working with. A RAMSAN 810, according to this article, starts at $45,000 as of 8/23/11. So we looked at the midline prosumer range of solid state storage, as these drives are faster than comparable 15k SAS drives but priced around the same in most cases for larger sizes. Most prosumer SSDs can go up to 300-400 MB/s and are connected via SATA, meaning you could easily turn them into a small RAID and load your images there. However, again, the cost of doing this can quickly scale up depending on the amount of storage you need to hold the evidence.
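If you want a rough number for your own storage before spending money, a quick sequential read test is one way to get it. The sketch below is only an approximation under stated assumptions: the function name and chunk size are arbitrary choices of mine, it measures a single sequential pass rather than the mixed read/write load of real processing, and the operating system's cache will inflate the result for any file that was read or written recently (as the small scratch file in the demo is).

```python
import os
import tempfile
import time


def sequential_read_mb_s(path: str, chunk_size: int = 4 * 1024 * 1024) -> float:
    """Read a file start to finish and return the observed rate in MB/s (10**6 bytes)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1_000_000 / max(elapsed, 1e-9)


# Demo on a 16 MiB scratch file so the example is self-contained.
# For a meaningful figure, point the function at a large file on the volume under test.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    scratch_path = tmp.name
try:
    print(f"{sequential_read_mb_s(scratch_path):.0f} MB/s")
finally:
    os.remove(scratch_path)
```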
What we have been implementing instead is a middle-of-the-road approach. Rather than loading the entire evidence set onto faster storage, we purchased a PCI-E based SSD card from Amazon with read speeds of 1.5 GB/s and write speeds of 1.2 GB/s. It has a smaller amount of storage (we opted for 240 GB) but allows the heaviest part of the processing (dealing with all the data extracted from the forensic images) to be done on the fastest storage. To accomplish this in FTK we pointed the ADTemp directory to the PCI-E SSD card, and our processing speeds improved dramatically. We were able to complete a full text index of a 149 GB full forensic image in 1 hour. This test wasn't even optimal, as the forensic image wasn't copied onto faster RAID storage but was instead just attached via USB3.
We've since ordered PCI-E SSD cards for all of our evidence processing servers and will post more benchmarks as we move forward in our testing and processing. I would also like to expand our SSD storage to include the evidence storage media and database media, but I'm doing this one step at a time to find out which parts are getting me the largest increases in speed for the dollar.
Paul Henry (@phenrycissp on Twitter) has already taken this a step further by putting all his pieces onto SSD SATA storage and integrating the PCI-E SSD card for his temporary directory. His new issue is finding test images large enough to show meaningful results, since it's processing so quickly. So if you have a test image 100 GB or larger, please let him know!
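As a back-of-the-envelope check on that result, the arithmetic can be sketched as follows. The helper names are mine, the math assumes decimal units (1 GB = 1000 MB), and the rates are just the figures quoted in this post. The comparison covers one sequential pass over the image only; real indexing reads and writes the extracted data many times over, which is presumably why the speed of the temp directory matters so much.

```python
def effective_mb_s(image_gb: float, hours: float) -> float:
    """End-to-end processing rate in MB/s, assuming 1 GB = 1000 MB."""
    return image_gb * 1000 / (hours * 3600)


def minutes_to_read(image_gb: float, rate_mb_s: float) -> float:
    """Minutes for one sequential pass over the image at a given rate."""
    return image_gb * 1000 / rate_mb_s / 60


# The 149 GB image indexed in 1 hour works out to roughly 41 MB/s end to end.
print(f"end-to-end rate: {effective_mb_s(149, 1):.1f} MB/s")

# One pass over the same image at the storage speeds mentioned in this post.
for label, rate in (("RAID, low end", 100), ("RAID, high end", 200),
                    ("prosumer SSD", 400), ("PCI-E SSD read", 1500)):
    print(f"{label}: {minutes_to_read(149, rate):.1f} min per pass")
```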