Unmasking the Dreaded Traffic Logfile
Web Marketing Today, Issue 65, December 1, 1999
Hidden deep beneath the calm exterior of your website is a writhing, pulsating information instrument, ever lengthening, ever growing, lusting for liberation from the nether worlds of its cyber existence, longing to be recognized as a full partner in your marketing strategy. Its yearnings have been frustrated until now. Dare you unlock the tightly locked and release ... the tormented logfile?
I had to get your attention, because this subject is dreadfully boring otherwise. And scary. I'll try to reassure you when you're afraid.
Nevertheless, you need to understand and learn how to use a logfile. Every time someone visits a page on your site, a record is made. Though logfile formats vary, these elements are pretty common.
Here are the contents of a single line of the 1.7 MB logfile from my Christian Articles Archive site (http://www.joyfulheart.com). It uses the NCSA combined logfile format. (Take a look at a sample snippet to see what it's like.) We'll look at each piece:
206.102.195.149 - - [30/Oct/1999:23:07:08 -0700] "GET /misc/newton.htm HTTP/1.1" 200 12794 "http://www.looksmart.com/r_search?key="Amazing+Grace"" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
User Address
This is the IP address of the visitor to my site. When you log on to the Internet, your ISP typically assigns you a unique IP number or address to use during your online session. This is what tells a website where to send the HTML and graphics requested by your Web browser. When you log off from the Internet, that IP number goes back into a pool of available numbers to be assigned to another of the ISP's customers. IP stands for Internet Protocol. In the example above the IP number is 206.102.195.149. If you do a reverse DNS look-up on this IP number at OSI-Lab in Zurich (http://www.osilab.ch/services/dns_e.htm), the result is a37.pm3-30.theriver.com which belongs to "Arizona's premier Internet Service Provider." You really can't go further than that to identify a particular person, though you might be able to track your competitor scoping out your site. :-)
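If you'd rather do the reverse look-up yourself than use a web form, here is a minimal sketch in Python. The function name reverse_dns is my own invention for illustration. It only works if your machine can reach a DNS server, and since dial-up addresses get reassigned, a look-up today won't necessarily return the name I saw in 1999.

```python
import socket

def reverse_dns(ip_address):
    """Return the hostname registered for an IP address, or None if none is found."""
    try:
        # gethostbyaddr returns (hostname, alias list, address list).
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # No reverse record exists, or the DNS server can't be reached.
        return None
```

You would call it as reverse_dns("206.102.195.149") and get back either a hostname or None.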
Date/Time
The exact time of each logfile entry, combined with the IP address, enables you to follow a particular visitor sequentially from page to page on your site.
GMT offset
This is the number of hours the server's local time differs from Greenwich Mean Time (GMT). In our example the offset is -0700, so the time shown is 7 hours behind GMT.
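To see what the offset does in practice, here is a small Python sketch that reads the date/time from our example and converts it to GMT. The name parse_log_time is hypothetical, and the format string assumes English month abbreviations like "Oct".

```python
from datetime import datetime, timezone

# Day/month/year:hour:minute:second, then the GMT offset.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def parse_log_time(stamp):
    """Turn a logfile timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)

local = parse_log_time("30/Oct/1999:23:07:08 -0700")
gmt = local.astimezone(timezone.utc)

print(local.isoformat())  # 1999-10-30T23:07:08-07:00
print(gmt.isoformat())    # 1999-10-31T06:07:08+00:00
```

Notice that a late-evening visit in Arizona already falls on the next day in GMT, which matters if you compare logfiles from servers in different time zones.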
Action
This is the request method, usually GET or POST. Except for a few CGI programs, this will typically be GET. That is, get a web page or an image that goes on that page. Take a look at our example:
"GET /misc/newton.htm HTTP/1.1"
This records a command from our Arizona visitor's browser to GET a web page at the URL http://www.joyfulheart.com/misc/newton.htm using a protocol named HTTP/1.1. This is the basic HTML page for an article about the song "Amazing Grace" by John Newton, America's favorite hymn.
Return Code
The next item tells whether the action was successful or not. Our example is a return code of 200, which means "Success. Okay." You've probably seen the dreaded 404 "Failed. Not found" error code when the web page you were trying to find wasn't at that URL, so these return codes aren't entirely new to you.
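If you don't want to memorize return codes, Python's standard library already knows the official name of each one. This is only a sketch, and describe_status is a name I made up for the example.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a return code with its standard name, e.g. '404 Not Found'."""
    try:
        status = HTTPStatus(int(code))
    except ValueError:
        # Not a number, or not a code the standard library recognizes.
        return f"{code} (unrecognized)"
    return f"{status.value} {status.phrase}"

print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```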
Size
This is the size of the file sent. In our example, that's 12794 bytes.
Referrer
This tells us which webpage our visitor came from. In our example:
"http://www.looksmart.com/r_search?key="Amazing+Grace""
Someone found this page using the LookSmart search engine, using the keywords "Amazing Grace." Some logfile formats separate the access from the referrer logs, but I think it's valuable to have this information together in the combined format.
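Pulling the keywords out of a referrer by eye gets old fast, so here is a sketch of how to do it in Python. The function name search_terms is hypothetical. The parameter name "key" comes from the LookSmart example above, and other search engines use different parameter names, so treat that default as an assumption you'll need to adjust.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="key"):
    """Return (site, keywords) from a search-engine referrer URL."""
    parts = urlsplit(referrer)
    # parse_qs decodes the query string, turning "+" into a space.
    values = parse_qs(parts.query).get(param, [])
    # Drop the quotation marks the searcher typed around the phrase.
    keywords = values[0].strip('"') if values else None
    return parts.hostname, keywords

site, keywords = search_terms('http://www.looksmart.com/r_search?key="Amazing+Grace"')
print(site)      # www.looksmart.com
print(keywords)  # Amazing Grace
```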
Browser/Platform
The final field in the logfile tells us what web browser and operating system our visitor is using.
"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
Mozilla is a code name that indicates this browser is Netscape-compatible. Our visitor was using Microsoft Internet Explorer version 4.01 on a Windows 95 operating system.
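Now that we've looked at every piece, here is a minimal Python sketch that splits one combined-format line into those same pieces. The regular expression and the name parse_line are my own illustration, not part of any logfile tool, and real logfiles will contain odd lines that this pattern doesn't handle.

```python
import re

# One pattern for one line of NCSA combined format (an illustrative sketch).
COMBINED_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>.*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Split one combined-format line into a dict, or return None if it doesn't match."""
    match = COMBINED_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The request holds three parts: action, path, and protocol.
    parts = fields.pop("request").split()
    fields["action"], fields["path"], fields["protocol"] = (parts + [None] * 3)[:3]
    fields["status"] = int(fields["status"])
    # A size of "-" means no bytes were sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

sample = ('206.102.195.149 - - [30/Oct/1999:23:07:08 -0700] '
          '"GET /misc/newton.htm HTTP/1.1" 200 12794 '
          '"http://www.looksmart.com/r_search?key="Amazing+Grace"" '
          '"Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"')

entry = parse_line(sample)
for name in ("host", "time", "action", "path", "status", "size", "referrer", "agent"):
    print(f"{name:9} {entry[name]}")
```

The referrer part of the pattern is deliberately loose, because our example referrer has quotation marks inside it.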
Placing All the Pictures
First, the visitor's browser GETs the webpage she asked for, newton.htm, and my webserver sends it. Then the browser GETs the style sheet and each of the GIF images that go on that page, downloads them, and displays them at their appropriate spots. Take a look at the process -- and don't be afraid, I'm right here. I'll abbreviate each line to make it simple:
GET /misc/newton.htm
GET /css/caa.css
GET /joypic/celtic-x.gif
GET /joypic/caa-bak.gif
GET /joypic/caa-go.gif
GET /joypic/caa-spacer.gif
GET /joypic/caa-top.gif
GET /joypic/caa-line.gif
GET /caa-images/slavship.gif
GET /caa-images/slavdeck.gif
GET /joypic/caa-bot.gif
There in the logfile is each image displayed. The next time the visitor looks at a page on this site, however, she won't have to download each of these images again, since her web browser has cached them.
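Since one page can produce a dozen lines in the logfile, it helps to separate the pages from everything else. Here is a rough Python sketch using the requests listed above. The rule it applies (a path ending in .htm, .html, or a slash counts as a page) is my assumption for this site, not a universal one.

```python
import posixpath

# Assumption: on this site, pages end in .htm or .html, or have no extension (a directory).
PAGE_EXTENSIONS = {"", ".htm", ".html"}

def is_page(path):
    """Return True if the requested path looks like a web page rather than an image or style sheet."""
    extension = posixpath.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS

requests = [
    "/misc/newton.htm", "/css/caa.css", "/joypic/celtic-x.gif",
    "/joypic/caa-bak.gif", "/joypic/caa-go.gif", "/joypic/caa-spacer.gif",
    "/joypic/caa-top.gif", "/joypic/caa-line.gif", "/caa-images/slavship.gif",
    "/caa-images/slavdeck.gif", "/joypic/caa-bot.gif",
]

pages = [path for path in requests if is_page(path)]
print(len(requests), "requests,", len(pages), "page")  # 11 requests, 1 page
```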
"Daddy, daddy, I can't stand this logfile terror!"
"It's okay, honey, it'll only hurt a little longer."
Following a Visitor
That's it. The terror goes on: line after line, day after day, the lowly logfile trudges along, growing in bulk, waiting to be noticed and appreciated. So let's take a look at our logfile and see how a visitor passes through our site. I will be abbreviating the logfile to simplify this for you. Here's the path of a visitor who came from a search on Yahoo for the keywords "Christmas stories." First, she went to my main Christmas linking page (http://www.joyfulheart.com/xmas/) and then poked around some:
14:06:07 GET /xmas/
14:06:20 GET /xmas/joseph.htm
14:06:48 GET /xmas/cradle.htm
14:08:01 GET /xmas/burlap.htm
I've skipped all the intervening images. But notice what you see here. The visitor was on the site for about two minutes, looked at the linking page, and then an article about St. Joseph, another about a Cradle, and a third about "Burlap, Boys, and Christmas."
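To put numbers on that visit, here is a short Python sketch that works out how long the visitor stayed on each page, using the four lines above. The name time_on_each_page is hypothetical, and the sketch assumes all the requests fall on the same day. The last page gets no time at all, because the logfile has no later request to measure against.

```python
from datetime import datetime

visit = [
    ("14:06:07", "/xmas/"),
    ("14:06:20", "/xmas/joseph.htm"),
    ("14:06:48", "/xmas/cradle.htm"),
    ("14:08:01", "/xmas/burlap.htm"),
]

def time_on_each_page(visit):
    """Return (path, seconds) for every page that is followed by another request."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _path in visit]
    steps = []
    # Pair each page with its own time and the time of the next request.
    for (_stamp, path), start, end in zip(visit, times, times[1:]):
        steps.append((path, int((end - start).total_seconds())))
    return steps

steps = time_on_each_page(visit)
for path, seconds in steps:
    print(f"{seconds:3d} seconds on {path}")

total = sum(seconds for _path, seconds in steps)
print(total, "seconds from first request to last")  # 114 seconds from first request to last
```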
Why follow a visitor? Because only when you do that do you begin to discover how a visitor uses your site -- what door she comes in, what interests her, and where she leaves. Lots of small scientific observations will add up to an accurate picture of what a visitor actually does on your site -- and that information is priceless if your goal is to optimize the experience and lead your visitor to the most important -- from a monetary standpoint -- parts of your site.
Now that wasn't so bad, was it? I want you to relax, take several deep breaths, and then smile. You've been through logfile hell and lived to tell about it.

