Howdy Reader,
I'm in Austin for the DFIR Summit, but the daily blogs must continue! Yesterday we had a particularly challenging Sunday Funday regarding detecting web server log tampering. We had a couple of contenders, and the winner this week is Jacob Williams! Here is Jacob's winning answer:
Wow, I wish I had access to the server. Text based log files are one of the few places where slack space analysis can be a benefit. There's plenty of room to potentially find evidence of log tampering (especially if the tampered log is smaller than the original). Over three years of web server logs, I'd hope to find SOMETHING in slack space if logs were manipulated.
The first thing to check is the time series within the logs. By time series, I mean: does every log entry come after the one before it? This is one place where people totally screw up when modifying logs. I've actually written scripts to check this in various formats, and again, it's a place where inexperienced forgers get caught.
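To make that concrete, here is a minimal sketch of such a check in Python. It assumes IIS-style W3C extended logs with the date and time as the first two fields; the field positions are an assumption and would need to match your own logs.

```python
import sys
from datetime import datetime

def find_out_of_order(path):
    """Flag log entries whose timestamp precedes the previous entry."""
    last_ts = None
    with open(path) as log:
        for lineno, line in enumerate(log, 1):
            if line.startswith('#') or not line.strip():
                continue  # skip W3C directives and blank lines
            fields = line.split()
            # Assumes the default IIS layout: date first, time second.
            ts = datetime.strptime(fields[0] + ' ' + fields[1],
                                   '%Y-%m-%d %H:%M:%S')
            if last_ts is not None and ts < last_ts:
                print(f'{path}:{lineno}: {ts} comes before {last_ts}')
            last_ts = ts

if __name__ == '__main__':
    for log_file in sys.argv[1:]:
        find_out_of_order(log_file)
```

Anything this prints is worth a hard look: a correctly appending web server essentially never writes backwards in time.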
Timestamps on the logs might also be useful, though less reliable depending on how you were provided the logs. W3C formatted logs begin anew each day. Obviously you want to check that the timestamps are consistent with the dates of the logs. Again, depending on how you were provided the logs (a FAT formatted thumb drive, for instance), the file timestamps may not be usable.
A piece of the case that isn't specified is whether the suspect has a static IP address. Obviously we'll want to correlate log entries to that static IP if one exists. If the user has a dynamic IP, check the range to make sure it is consistent with his ISP. Three years is probably too far back to subpoena DHCP logs from the ISP, but get as much as you can.
GoGo InFlight Internet service has sort of screwed up this next one, but I want to get the suspect's travel records to identify time periods when he couldn't have had access to the Internet to make the illicit logins. Times when the suspect is in the air, etc., are great. Is the suspect a public speaker? Check the logs for times when he was speaking. I like to think I'm talented, but I have a hard time hacking websites and teaching SANS FOR610 at the same time (even if I do know the material like the back of my hand). Find as many instances of these time issues as possible. It might be conceivable that the suspect violated the laws of time and space once, but thirty times? Fifty times? Come on, this isn't an episode of Fringe. Of course the attacker could have been creative with his Internet access or set up an automated, timed attack, but let the plaintiff prove this. Just as with the possibility of tampering with forensic data, the simple possibility isn't sufficient to say it happened.
One of the more technical approaches I'd take would be analyzing actual usage patterns. Are the suspect's usage patterns (i.e., pages accessed) consistent with what the plaintiff is alleging? Does the defendant magically skip the login screen and go directly to authenticated access when everyone else must log in through login.aspx? Stuff like this can be an indicator that the logs have been tampered with.
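As an illustration only, here is a rough sketch of that kind of pattern check in Python. The login page name (login.aspx) comes from the example above, but the '/members/' prefix for the authenticated area is a made-up placeholder you'd replace with the real application's layout.

```python
from collections import defaultdict

def clients_skipping_login(entries, login_page='/login.aspx',
                           private_prefix='/members/'):
    """Report client IPs that request pages in the authenticated area
    without ever having requested the login page first.
    `entries` is an iterable of (client_ip, uri) tuples in log order."""
    logged_in = set()
    offenders = defaultdict(list)
    for ip, uri in entries:
        uri = uri.lower()
        if uri == login_page:
            logged_in.add(ip)
        elif uri.startswith(private_prefix) and ip not in logged_in:
            offenders[ip].append(uri)
    return offenders
```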
Some of the rest depends on the style of the web application and the verbosity of logging. If the logs contain some sort of session ID (in the URL perhaps), we should analyze how this session ID is generated. If it is completely random, do the logs ever show our suspect using the same ID? If so, the odds of hitting the same random session ID are slim to none. Go buy a lottery ticket. Another thing we often see is time-based session IDs, where the IDs increase over time. Again, make sure that the IDs are increasing over time for the user's login. If the web application places a timestamp in the URL to prevent replay attacks, make sure that the URL timestamps are consistent with the log timestamps. Also check that they are increasing.
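A minimal sketch of the session ID reuse check might look like this in Python; it assumes you have already parsed the session ID and client IP out of each entry, however your particular application happens to record them.

```python
from collections import defaultdict

def session_id_reuse(entries):
    """`entries` is an iterable of (session_id, client_ip) pairs.
    Returns any session ID handed out to more than one client
    address -- effectively impossible if IDs are truly random."""
    owners = defaultdict(set)
    for sid, ip in entries:
        owners[sid].add(ip)
    return {sid: ips for sid, ips in owners.items() if len(ips) > 1}
```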
We also want to check the user agents being recorded. Is your suspect a total techno-tard but his user agent indicates Linux? Mac user agent, but the user doesn't own a Mac? Look for accepted languages in the HTTP requests. If the user is in America and doesn't speak Chinese, then the accepted language in the HTTP headers probably won't be Chinese.
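A simple tally, like the following Python sketch, can surface those mismatches; it assumes you've extracted the client IP, user agent, and Accept-Language value from each entry, which depends on the logging configuration.

```python
from collections import Counter

def profile_client(entries, suspect_ip):
    """Tally user agents and Accept-Language values seen for one
    client address so they can be compared against the hardware the
    suspect actually owns and the languages he actually speaks.
    `entries` is an iterable of (ip, user_agent, accept_language)."""
    agents, languages = Counter(), Counter()
    for ip, user_agent, accept_language in entries:
        if ip == suspect_ip:
            agents[user_agent] += 1
            languages[accept_language] += 1
    return agents, languages
```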
Anomalies in the logs are also something to check for. Did the suspect's log entries happen at a particular time? One of the ways people screw up forging logs is to change a legitimate log entry to cover up illicit activity. In this case, check other users' patterns of behavior. Does user X always log in between 0900 and 1100, but fail to on days when our suspect logs in at the same time (and coincidentally performs the same actions)?
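One way to hunt for that coincidence, assuming you can reduce each log entry to a date and a user, is a sketch like this:

```python
def absent_when_suspect_active(entries, regular_user, suspect):
    """`entries` is an iterable of (date, user) pairs. Returns the
    dates on which the suspect appears but a normally punctual user
    is missing -- one coincidence too many if it keeps happening."""
    users_by_date = {}
    for date, user in entries:
        users_by_date.setdefault(date, set()).add(user)
    return sorted(d for d, users in users_by_date.items()
                  if suspect in users and regular_user not in users)
```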
One of the final things I'd check would be whether the web application logs both a text-based user ID and a numeric user ID. We want to make sure that these are always consistent with one another and never reused in the logs.
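A consistency check along those lines could be sketched like so, assuming the logs give you a (username, numeric ID) pair per entry:

```python
from collections import defaultdict

def inconsistent_id_pairs(entries):
    """`entries` is an iterable of (username, numeric_id) pairs.
    Returns usernames tied to more than one numeric ID, and numeric
    IDs tied to more than one username -- either is a red flag."""
    ids_by_name = defaultdict(set)
    names_by_id = defaultdict(set)
    for name, num in entries:
        ids_by_name[name].add(num)
        names_by_id[num].add(name)
    return ({n: ids for n, ids in ids_by_name.items() if len(ids) > 1},
            {i: names for i, names in names_by_id.items() if len(names) > 1})
```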
As a side note, I'd also want to subpoena the web server configuration and web application to audit the code. If this is a high profile case, it's worth performing tests on the web application to ensure that the expected logging matches the actual logging.
To extend my answer a little, I'd like to add to check the Cookie and Referrer fields in the W3C formatted logs (if those attributes are being logged).
Either attribute could highlight an anomaly in forged records. For instance, cookie values could point to session IDs inconsistent with those in the GET request (or assigned simultaneously to another user).
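Here is one illustrative way to automate that comparison; the cookie name and URL parameter patterns below are hypothetical stand-ins for whatever the application actually uses.

```python
import re

# Hypothetical patterns -- the real cookie name and URL parameter
# would have to come from the application under examination.
COOKIE_SID = re.compile(r'sessionid=(\w+)', re.IGNORECASE)
URL_SID = re.compile(r'[?&]sessionid=(\w+)', re.IGNORECASE)

def cookie_url_mismatches(entries):
    """`entries` is an iterable of (uri, cookie_field) pairs. Returns
    requests where the session ID in the URL disagrees with the
    session ID in the logged cookie."""
    mismatches = []
    for uri, cookie in entries:
        u, c = URL_SID.search(uri), COOKIE_SID.search(cookie)
        if u and c and u.group(1) != c.group(1):
            mismatches.append((uri, cookie))
    return mismatches
```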
Referrer fields are another issue entirely. Web applications usually have a fairly static content flow. If the referrer fields for our suspected forged records are inconsistent with those of legitimate users, we may have found the smoking gun. This again underscores the need to subpoena the web application for testing. If the plaintiff claims the logs are damning "because that's how the custom web app works" we should have cause to examine the web application to determine logging fringe cases/inconsistencies.
This was a great answer! Jake certainly showed a mastery of web log analysis in his response and I hope you will get some good pointers here for your own log analysis.
This Sunday Funday was based on a real case, ILS v Partsbase, that I worked on for three years back in 2003-2006. The case went to a jury trial, where we were able to successfully show that three years of web logs had been altered to make it appear as though our client had made 1.6 million unauthorized accesses.
The case came together in a series of steps that Jacob touched on in his answer but that I want to call out and explain.
- Compare session states. In my case I was lucky: the developer decided to store the assigned cookies in the web logs, and they contained an environment variable holding the user's IP address. The IP stored in the cookie never matched the IP recorded in the log (a sketch of this check follows the list).
- Review the user agents. Using the user agents, we were able to pull out 1,100 different devices suddenly associated with my client's outgoing IP address.
- Review the user agents for browser customization. This was the most critical aspect in getting the jury to understand what occurred. There was a public and a private web site with two different sets of logs. If you have a standard image rolled out to your systems, you may see a message in the browser title bar (IE specifically) that says something like 'Provided by HECFBlog'. This information is passed along in the user agent.
- Put all your logs into a database for better analysis and cross-queries. With distinct customized browser user agents located, for example 'University of Some State', I was able to reconstruct sessions between the two logs and show that when a visit to the public side was made, it contained an IP address belonging to the University, but when an image request was made to the private side, my client's IP address would suddenly appear in the logs as the requester.
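To illustrate the first of those steps, here is a minimal Python sketch of the cookie-versus-logged-IP comparison. The REMOTE_ADDR marker inside the cookie is an assumption, standing in for however the developer actually embedded the address.

```python
import re

# Assumes the cookie embeds the client address the way this custom
# app did, e.g. 'REMOTE_ADDR=10.1.2.3' somewhere in cs(Cookie).
COOKIE_IP = re.compile(r'REMOTE_ADDR=(\d{1,3}(?:\.\d{1,3}){3})')

def cookie_ip_mismatches(entries):
    """`entries` is an iterable of (client_ip, cookie_field) pairs.
    Returns entries where the IP baked into the cookie disagrees
    with the IP the server recorded for the request."""
    mismatches = []
    for ip, cookie in entries:
        match = COOKIE_IP.search(cookie)
        if match and match.group(1) != ip:
            mismatches.append((ip, cookie))
    return mismatches
```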
The milestone series resumes tomorrow!