14 Statistics
Statistics help in understanding how visitors use a site, but only if they are suited to that role.
Most hosting providers include several statistics packages, of which one of the most comprehensive is AWStats. It reports useful information such as pages visited, hours viewed, country of visitors, robot visits, and viewing session times. Unfortunately, these are only superficially useful. They would be far more useful if they could be combined, for example as viewing times for individual pages, which would indicate whether those pages had actually been read.
Originally, I thought that it might be useful to have a map of the sequence of pages visited and their durations. In practical terms, that meant having, for each page, a list of the most popular pages visited immediately after it. To contain the visual complexity, I limited that list to three pages. That still looked too complex for the expected site owners, so I ditched the idea.
The single most useful statistic is whether a page has actually been read. Packages like AWStats get their data from the site's visitor logs. Since the logs include the visitor's IP address and the time each page was opened, the time between one page opening and the same visitor opening the next page would indicate the viewing duration. However, the duration of the last page would not be available, because there is no next page to supply a closing timestamp. Unfortunately, AWStats does not show this page-level duration.
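The calculation described above can be sketched as follows. This is only an illustration in JavaScript, not how any statistics package does it: the entry shape (`ip`, `time`, `page`) and the function name `pageDurations` are assumptions made for the example, and it treats all requests from one IP address as a single visitor with no session timeout.

```javascript
// Hypothetical log entries: { ip, time (seconds since epoch), page }.
// Returns one record per page view with its duration in seconds, or
// null for each visitor's last page, because no later timestamp exists.
function pageDurations(entries) {
  // Group the entries by visitor IP address.
  const byVisitor = new Map();
  for (const entry of entries) {
    if (!byVisitor.has(entry.ip)) byVisitor.set(entry.ip, []);
    byVisitor.get(entry.ip).push(entry);
  }

  const results = [];
  for (const views of byVisitor.values()) {
    // Order each visitor's views by the time the page was opened.
    views.sort((a, b) => a.time - b.time);
    views.forEach((view, i) => {
      const next = views[i + 1];
      results.push({
        ip: view.ip,
        page: view.page,
        duration: next ? next.time - view.time : null,
      });
    });
  }
  return results;
}

// Made-up sample data, using documentation-only IP addresses.
const sampleEntries = [
  { ip: '203.0.113.5', time: 1000, page: '/home' },
  { ip: '203.0.113.5', time: 1045, page: '/article-1' },
  { ip: '198.51.100.7', time: 1010, page: '/home' },
];
console.log(pageDurations(sampleEntries));
```

In the sample, only the first visitor's `/home` view gets a duration (45 seconds); both last pages come back as `null`, which is the gap the text describes.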
Logs are not necessarily retained, as they can be disabled for processing on a per-package basis, so I experimented with creating a record for each page viewed as a file with the details encoded in its name. With the list of times now available, I worked out a way to do the calculations, but it was a lot of work just to determine whether a page had been read. The actual duration may not be that critical, so I scrapped all that and decided to use JavaScript to send the time, locale and article ID to the site so that it could create a record encoded in a file name.
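As a rough sketch of what encoding a record in a file name could look like, the following JavaScript packs the three details into a name and reads them back. The text does not give the actual layout, so the `read_<time>_<locale>_<article ID>.txt` pattern and the function names `encodeReadRecord` and `decodeReadRecord` are purely hypothetical.

```javascript
// Hypothetical layout: read_<epoch seconds>_<locale>_<article ID>.txt
// The real layout is not given in the text; this is for illustration only.
function encodeReadRecord({ time, locale, articleId }) {
  // Keep only letters, digits and hyphens so the underscore separator
  // stays unambiguous and the name is safe on common file systems.
  const safe = (value) => String(value).replace(/[^A-Za-z0-9-]/g, '');
  return `read_${safe(time)}_${safe(locale)}_${safe(articleId)}.txt`;
}

function decodeReadRecord(fileName) {
  const match = /^read_(\d+)_([A-Za-z0-9-]+)_([A-Za-z0-9-]+)\.txt$/.exec(fileName);
  if (!match) return null; // Not a read record in the assumed layout.
  return { time: Number(match[1]), locale: match[2], articleId: match[3] };
}

const exampleName = encodeReadRecord({ time: 1700000000, locale: 'en-AU', articleId: '42' });
console.log(exampleName); // read_1700000000_en-AU_42.txt
console.log(decodeReadRecord(exampleName));
```

Because the details live in the name itself, listing a directory is enough to recover every record without opening a single file.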
Originally, I set a delay of 10 seconds before the JavaScript sent the details, but after evaluating actual read times, I changed it to 30 seconds. The read processing creates a file that includes all the information sent in its file name. After that, it tries to add such files to a summary file before deleting them. This helps to keep long-term stats without having lots of files.
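The delayed send described above might look something like this on the page. It is a sketch under assumptions rather than the code actually used: the `/read` endpoint, the parameter names, and the function names are all hypothetical.

```javascript
// 30 seconds, the delay settled on in the text.
const READ_DELAY_MS = 30 * 1000;

// Build the details to send. Parameter names (id, locale, time) are assumed.
// Kept separate from the timer so it can be checked without a browser.
function buildReadDetails(articleId, locale, now = new Date()) {
  return new URLSearchParams({
    id: String(articleId),
    locale,
    time: String(Math.floor(now.getTime() / 1000)),
  });
}

// Schedule the report and return the timer handle, so that a caller
// can cancel it with clearTimeout if needed.
function scheduleReadReport(articleId, send, delayMs = READ_DELAY_MS) {
  return setTimeout(() => {
    // 'und' (undetermined) is a fallback for environments without navigator.
    const locale = typeof navigator !== 'undefined' ? navigator.language : 'und';
    send(buildReadDetails(articleId, locale));
  }, delayMs);
}

// In a browser, with a hypothetical /read endpoint:
// scheduleReadReport('42', (details) => navigator.sendBeacon('/read', details));
```

If the visitor leaves before the delay has passed, the timer never fires and nothing is sent, so a record exists only for pages that stayed open long enough to count as read.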
When I first added the weekly statistics, they were available only when an article ID was clicked. When I realised that they would also be useful at the site level, I decided that was the better level for them. With that, the weekly statistics for individual articles seemed of limited value, so I removed them. Statistics are very useful, especially to those used to studying them, but the intended audience for the product was not likely to want to go too deep into them, so I opted for keeping them simple and conveying only the most useful information.
One of the special pages is Test. It allows for up to 20 multiple-choice questions, but the results are shown only to the visitor who answered the questions. It is meant to give visitors feedback on how much they have understood from reading an article or using a procedure. I thought that a useful article type would be a poll, and I realised that it would be almost identical to a test, though the results shown upon completion would be different. To be useful, a poll would require accumulating the results, which also raised the possibility of doing the same for the tests, so that site owners might get some more direct feedback from their visitors.
I discussed others' experiences of polls, including the rampant trivial questions often posed. I also understood that getting stats from tests might skew site owners towards asking questions that are not really of interest to their site visitors. Determining what questions are worth asking is a skill, and when people are given open-ended tools like tests and polls, these can become a source of speculative entertainment.
In determining what facilities to provide with a product, a decision needs to be made as to whether each is really useful or a gimmick. Trying to get meaningful accumulated poll and test results can be a serious distraction from creating content, with limited ongoing use to a site owner, so I abandoned these plans. Utilities are not a means of creating content, so they must play second fiddle to those tools that are. While one hopes that people find a product enjoyable to use, providing them with misleading entertainment is likely to be counter-productive.