Measuring Traffic

December 9, 2004

As a website runs, the web server generates a log file. This file contains one line for every item sent to visitors of the site: pages, images, and so on. In most cases, each line also records what kind of browser made the request (“User Agent”), what operating system it ran on (Windows XP, Mac OS X, etc.), and the address of the user (“Host Address”). If the user clicked on a link to get to the page that was sent, the line will also have the address of the linking page (the “referer”, and yes, it’s routinely misspelled this way).
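To make the idea of “one line per item” concrete, here is a minimal sketch of pulling those fields out of a single log line. It assumes the Apache-style “combined” log layout, which is common but is not necessarily what your server writes, and the sample line is made up for illustration.

```python
import re

# Assumption: the log uses the Apache-style "combined" layout.
# Your server's format may differ, in which case this pattern needs adjusting.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # A size of "-" means no body was sent; treat it as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 - - [09/Dec/2004:10:00:00 -0500] '
    '"GET /index.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
entry = parse_line(sample)
print(entry["host"], entry["path"], entry["referer"])
```

Note that the browser and the operating system both come out of the same “User Agent” text, so a statistics package has to interpret that string to tell them apart.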

As you can imagine, this quickly makes for a large and unwieldy file. The file is typically processed to generate one or more “statistics” pages. Your web host may offer one or several varieties of statistics pages; some may be free, and others may cost money. Alternatively, you might install (or have installed) a statistics package of your own choice, or you might download your log files and use any of several popular programs on your own computer to generate statistical information. (There is another alternative, which involves placing counting code on each page, but we’ll ignore that for the sake of this discussion.)

What all of these mechanisms have in common is that they have to make some “guesses” as to how to interpret the logs. As far as the web server is concerned, if a user viewed one page at 10:00 AM and another at 10:01 AM, those two events have nothing to do with each other. It is the statistics package that decides whether they belong to the same visit, based on factors such as the time between page requests.

For this reason, two different statistics programs looking at the same set of logs will often produce somewhat different results. In general, it’s bad practice to compare numbers from different programs; watching how traffic changes over time within a single statistics program will yield much more usable information.

Years ago, it was common to speak of web traffic in terms of “hits”. Because each image on a page is a “hit”, as are things like style sheets, external JavaScript files, etc., a “hit” isn’t a terribly useful measure. One page might generate fifty “hits” per view and another five, so it’s usually a lot more convenient to speak of “page views”. (Someone once suggested that “HITS” is an acronym for “How Idiots Track Statistics”, which is crude, but probably near the mark.)
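The difference between the two counts can be sketched in a few lines. The rule used here, treating a request as a page view only if its file extension looks like a page, is an assumption for illustration; real statistics packages apply their own (usually configurable) rules, and the request list is hypothetical.

```python
import os

# Assumption: requests with these extensions (or none at all) count as pages.
PAGE_EXTENSIONS = {"", ".html", ".htm", ".php"}

def is_page_view(path):
    """Guess whether a requested path is a page rather than a supporting file."""
    path = path.split("?", 1)[0]  # ignore any query string
    extension = os.path.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS

# Hypothetical requests generated by one visitor loading one page.
requests = ["/index.html", "/style.css", "/menu.js", "/logo.gif", "/photo.jpg"]

hits = len(requests)
page_views = sum(1 for path in requests if is_page_view(path))
print(hits, "hits,", page_views, "page view(s)")
```

Here a single page view accounts for five hits, which is why hit counts say more about how a page is built than about how many people looked at it.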

Another useful figure to watch is “User Sessions”. This approximates (due to the limitations mentioned above) how many visits the site received during a period. If a given user showed up at the site at 9:00 AM, looked at pages until 9:45 AM, and returned again for 10 minutes at 4:00 PM, this should be counted as two “User Sessions”.
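One common way to make this guess is to start a new session whenever a visitor has been quiet for longer than some cutoff. The sketch below assumes a 30-minute cutoff, which is a typical choice but not a standard; each package picks its own, which is one reason their numbers differ. The page view times are invented to match the example above.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends a session.
SESSION_TIMEOUT = timedelta(minutes=30)

def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count sessions for one visitor, given the times of their page views."""
    sessions = 0
    previous = None
    for current in sorted(timestamps):
        # A new session starts on the first view or after a long enough gap.
        if previous is None or current - previous > timeout:
            sessions += 1
        previous = current
    return sessions

day = datetime(2004, 12, 9)
views = [
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=20),
    day.replace(hour=9, minute=45),
    day.replace(hour=16, minute=0),
    day.replace(hour=16, minute=10),
]
print(count_sessions(views))  # prints 2
```

With a much longer cutoff, say 12 hours, the same five page views would be counted as a single session.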

In general, things your statistics program should be telling you include:

  • Page Views
  • User Sessions
  • Number of views per page
  • Number of times users entered the site on a page
  • Number of times users exited the site on a page
  • Referring pages, and how many users they sent
  • Number of visits sent by each search engine
  • Keywords and phrases used at search engines
  • Amount of data transferred (this is particularly important if you pay for your hosting by transfer volume)
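Several of these figures fall out of simple counting once page views have been grouped into sessions. As a rough sketch, assuming the grouping has already been done and using made-up session data:

```python
from collections import Counter

# Hypothetical page views, already grouped into sessions, in the order viewed.
sessions = [
    ["/index.html", "/products.html", "/contact.html"],
    ["/products.html", "/index.html"],
    ["/index.html"],
]

views_per_page = Counter(page for session in sessions for page in session)
entry_pages = Counter(session[0] for session in sessions)   # first page of each session
exit_pages = Counter(session[-1] for session in sessions)   # last page of each session

print("Page views:", sum(views_per_page.values()))
print("User sessions:", len(sessions))
print("Views per page:", dict(views_per_page))
print("Entry pages:", dict(entry_pages))
print("Exit pages:", dict(exit_pages))
```

Referrer, search engine, and transfer figures can be tallied the same way from the corresponding fields of each log line.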

Regardless of the statistics package you use, whether it’s provided by your web host or one you download and run on your own computer, it’s vitally important that you learn how to read your site statistics, and do it regularly. Otherwise, you have no tools to gauge how well your site is performing, and how well you are promoting your site — whether links from other sites are actually sending you traffic, whether search engines are finding your most important pages, etc.

