Comparing Site Statistics

December 13, 2005

I recently had a site owner who was puzzled over the different numbers he was receiving from two different statistics packages. It’s no wonder he was confused — the fact is that no stats software can be compared with any other to any significant degree of accuracy.

He was looking at numbers for both “hits” and “sessions”.

Hits are the number of elements the web server served. If a page has a style sheet and four graphic images on it, someone loading that page will generate 1 (HTML) + 1 (style sheet) + 4 (images) = 6 hits. If it includes a JavaScript file, it will generate an additional hit, but only if the user has JavaScript enabled.
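To make the arithmetic concrete, here is a minimal Python sketch of the hit count for a single page load. The function name and its parameters are made up for illustration; real stats packages count hits from the server log rather than from a formula like this.

```python
def hits_for_page(stylesheets: int, images: int, scripts: int = 0,
                  javascript_enabled: bool = True) -> int:
    """Count the requests ("hits") one page load generates.

    Hypothetical helper for illustration only.
    """
    html = 1  # the page itself
    # Script files are only requested when the browser runs JavaScript.
    script_hits = scripts if javascript_enabled else 0
    return html + stylesheets + images + script_hits


# The page from the example: one style sheet and four images.
print(hits_for_page(stylesheets=1, images=4))  # 6

# Same page with one script file, for users with and without JavaScript.
print(hits_for_page(stylesheets=1, images=4, scripts=1))  # 7
print(hits_for_page(stylesheets=1, images=4, scripts=1,
                    javascript_enabled=False))  # 6
```

One visitor, one page, and the hit count already depends on the visitor's browser settings.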

Hits are meaningless for most intents and purposes, except for comparing against themselves, one day to the next, etc.

Sessions are more meaningful, but they are also much closer to wild guesses. Web servers by themselves have no concept of a session: one user loads 4 items here, another user loads 6 items there, and the web server has no idea whether they’re the same user or not.

Stats software tries to get around this by saying “if the same IP address loads two items within X number of minutes of each other, it’s the same session.” Unfortunately, X varies from stats package to stats package.
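The rule above can be sketched in a few lines of Python. This is only an assumption about how a package might apply it; the function name, the sample addresses and the timestamps (in minutes) are all invented for the example.

```python
from collections import defaultdict


def count_sessions(requests, timeout_minutes):
    """Count sessions in a list of (ip, minute) requests.

    Two requests from the same IP belong to the same session when they
    are no more than `timeout_minutes` apart. Hypothetical helper.
    """
    by_ip = defaultdict(list)
    for ip, minute in requests:
        by_ip[ip].append(minute)

    sessions = 0
    for minutes in by_ip.values():
        minutes.sort()
        sessions += 1  # the first request from an IP always opens a session
        for previous, current in zip(minutes, minutes[1:]):
            if current - previous > timeout_minutes:
                sessions += 1  # gap too long: count a new session
    return sessions


# Invented sample log: (IP address, minutes since midnight).
sample = [
    ("10.0.0.1", 0), ("10.0.0.1", 5), ("10.0.0.1", 25),
    ("10.0.0.2", 3), ("10.0.0.2", 50),
]

print(count_sessions(sample, timeout_minutes=10))  # 4
print(count_sessions(sample, timeout_minutes=30))  # 3
```

The same five requests come out as four sessions with X = 10 and three sessions with X = 30, which is exactly the kind of disagreement two packages will show on the same log.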

Worse, the “same IP address” may not be meaningful. A user with an ISP that hands out dynamic IPs may have two different addresses from one page to the next. Larger or faster ISPs may be loading from a proxy cache or group of proxy caches, each with its own IP address; not only may a user’s address change from one page to the next, but one address may represent 10 or 100 or 1000 users, and there’s no way of knowing.

Some stats packages try to get around this by issuing cookies and checking them (this usually requires a special item to be loaded on a page, or a special filter on the server). Some users will reject all cookies; other users will be running “internet protection software” that will reject certain ones, often particularly those used in stats packages. Special code included on the page usually works only if the user’s JavaScript settings allow it.

Stats that come from log files (such as the stats server) will often include activity by non-users: search engine spiders, malware looking for e-mail addresses to spam, bots that are looking for copyright violations, etc. Some of these will be identified correctly, but since many of them are constantly changing to try to keep from being identified, many of them won’t.
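As a rough illustration of why this identification is hit and miss, here is a sketch of the simplest approach a log analyzer might take: matching the User-Agent string against known markers. The marker list and the sample user agents are assumptions for the example, not what any particular package uses.

```python
# Hypothetical marker list; real packages ship much longer, frequently
# updated lists.
KNOWN_BOT_MARKERS = ("bot", "spider", "crawler")


def is_known_bot(user_agent: str) -> bool:
    """Return True when the User-Agent contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


agents = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    # An address harvester can send a browser-like string and slip through.
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

human_looking = [ua for ua in agents if not is_known_bot(ua)]
print(len(human_looking))  # 2
```

A bot that announces itself is filtered out; one that pretends to be a browser is counted as a visitor, and nothing in the log line says otherwise.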

Stats that come from cookies or other scripting (for example, Google Analytics or Mint) usually won’t show bots, but they also won’t show users that don’t enable that scripting language or reject the cookie, etc.

For these reasons, no stats software can be compared with any other to any significant degree of accuracy. I have four or five different measures of statistics on virtually all of my sites. None of them ever agree.

What you can do, however, is try to get used to the limitations of each method and watch for trends in each package: compare data from stats package A with data from stats package A from previous days, weeks and months. Likewise, compare data from package B with itself in the past, etc.
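A small sketch of what comparing a package with itself looks like in practice. The session counts below are invented; the point is only that two packages can disagree on the absolute numbers and still tell the same story about the trend.

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from one period to the next."""
    return (current - previous) * 100.0 / previous


# Invented monthly session counts: (last month, this month).
package_a = (1200, 1320)
package_b = (800, 880)

# The absolute numbers differ by 50%, so comparing A with B says little.
# Comparing each package with its own past is more useful.
print(percent_change(*package_a))  # 10.0
print(percent_change(*package_b))  # 10.0
```

Here both packages report a 10% rise, even though neither agrees with the other on how many sessions there actually were.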

Some stats packages are better at identifying some things than others are; learning how to interpret each of them and how to make meaningful plans for site development and marketing accordingly is largely the art of being a successful web site owner.
