11. Statistics
Statistics can help in understanding how visitors use a site, but only if they measure something useful.
Most hosting providers include several statistics packages, one of the most comprehensive being Awstats. It reports useful information such as pages visited, viewing hours, visitor countries, robot visits, and visit durations. Unfortunately, while these figures are superficially useful, they would be far more useful if they could be combined, for example to give viewing times for individual pages, which would indicate whether those pages had actually been read.
Originally, I thought it might be useful to have a map of the sequence of pages visited and how long each was viewed. In practical terms, that meant listing, for each page, the most popular pages visited immediately after it. To contain the visual complexity, I limited that list to three pages. Even so, it looked too complex for the site owners I expected to use it, so I dropped the idea.
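Although the idea was dropped, the "top three next pages" map itself is simple to compute. The following is a minimal Python sketch, assuming each visitor's page views have already been grouped into an ordered list of page IDs (a hypothetical input format; the function name `next_page_map` is also hypothetical):

```python
from collections import Counter, defaultdict


def next_page_map(visits, top_n=3):
    """Map each page to its most common immediately following pages.

    visits: iterable of per-visitor page sequences, each a list of page IDs
    in viewing order (a hypothetical, pre-grouped input format).
    """
    followers = defaultdict(Counter)
    for sequence in visits:
        # Pair each page with the page viewed straight after it.
        for current, following in zip(sequence, sequence[1:]):
            followers[current][following] += 1
    # Keep only the top_n most common successors of each page.
    return {
        page: [p for p, _ in counts.most_common(top_n)]
        for page, counts in followers.items()
    }


if __name__ == "__main__":
    sample = [["home", "about", "contact"], ["home", "about"], ["home", "blog"]]
    print(next_page_map(sample))
```

Limiting each page to `top_n` successors is what keeps the resulting map readable; the underlying counts still cover every transition.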
The single most useful statistic is whether a page has actually been read. Packages like Awstats get their data from site visitor logs. Since the logs record the visitor's IP address and the time each page was opened, the interval between opening a page and opening the next page indicates how long the page was viewed. However, the duration of the last page in a visit cannot be determined, because there is no following page to supply a timestamp. Unfortunately, Awstats doesn't report this page-level duration.
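The duration calculation described above can be sketched in Python. This is an illustration under stated assumptions, not how Awstats or this product works: log lines are assumed to be already parsed into hypothetical `(ip, timestamp, page)` tuples with `datetime` timestamps, and `page_durations` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def page_durations(log_entries):
    """Estimate how long each page view lasted from access-log entries.

    log_entries: iterable of (ip, timestamp, page) tuples, a simplified,
    hypothetical form of parsed access-log lines.
    Returns (ip, page, seconds) tuples; seconds is None for the last page a
    visitor opened, because no later request marks the end of that view.
    """
    by_ip = defaultdict(list)
    for ip, ts, page in log_entries:
        by_ip[ip].append((ts, page))
    results = []
    for ip, views in by_ip.items():
        views.sort()  # order each visitor's requests by time
        # Pair each view with the next one; the final view pairs with None.
        for (ts, page), nxt in zip(views, views[1:] + [None]):
            seconds = (nxt[0] - ts).total_seconds() if nxt else None
            results.append((ip, page, seconds))
    return results


if __name__ == "__main__":
    log = [
        ("203.0.113.5", datetime(2024, 1, 1, 12, 0, 0), "/intro"),
        ("203.0.113.5", datetime(2024, 1, 1, 12, 2, 30), "/setup"),
    ]
    for row in page_durations(log):
        print(row)
```

Grouping by IP address alone is a simplification: several visitors can share one address, and a long gap usually means the visitor left rather than kept reading, so a fuller version would also split visits on a timeout.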
Logs are not necessarily retained, because their processing can be disabled on a per-package basis, so I experimented with recording each page view as a file whose name encoded the details. With the list of times available, I worked out a way to do the calculations, but it was a lot of work just to determine whether a page had been read. Since the exact duration is less important than whether the page was read at all, I scrapped that approach and decided to use JavaScript to send the time, locale and article ID to the site so that it could create a record encoded in a file name.
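A record "encoded in a file name" could look something like the following Python sketch. The name layout `<epoch-ms>_<locale>_<article_id>_<unique>` and the functions `visit_filename`, `record_visit` and `parse_visit` are all hypothetical; the text does not say which format or server language the product actually uses.

```python
import time
import uuid
from pathlib import Path


def visit_filename(article_id, locale, when=None):
    """Build a record name holding the view time, locale and article ID.

    Hypothetical layout: <epoch-ms>_<locale>_<article_id>_<unique>.
    The random suffix stops two views in the same millisecond colliding.
    """
    ms = int((time.time() if when is None else when) * 1000)
    return f"{ms}_{locale}_{article_id}_{uuid.uuid4().hex[:8]}"


def record_visit(folder, article_id, locale):
    """Create an empty file whose name is the whole record."""
    path = Path(folder) / visit_filename(article_id, locale)
    path.touch()
    return path


def parse_visit(name):
    """Recover (epoch_ms, locale, article_id) from a record name.

    The locale may contain underscores (e.g. en_AU); the article ID is
    assumed not to, so the last two fields are split off from the right.
    """
    ms, rest = name.split("_", 1)
    locale, article_id, _unique = rest.rsplit("_", 2)
    return int(ms), locale, article_id
```

Keeping the data in the name means a directory listing alone is enough to rebuild the statistics, with no file contents to read or lock.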
Originally, I set a delay of 10 seconds before the JavaScript sent the details, but after evaluating actual reading times, I changed it to 30 seconds. The statistics themselves are calculated when the master manager opens the Statistics page, which lists the pages, most popular first, and, if there are multiple locales, the page counts for each locale. Only the most recent 10,000 visits are kept, hence the need to record the time of each visit. For a popular site, the statistics therefore reflect only the most recent activity. For example, if a link to an article had been posted on social media, examining the statistics would reveal how much site activity resulted.
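The calculation done when the Statistics page is opened might be sketched as follows in Python. It assumes visit records have already been decoded into hypothetical `(epoch_ms, locale, article_id)` tuples; the names `prune` and `page_statistics` are illustrative, and in a file-based store the "dropped" records would be the files to delete.

```python
from collections import Counter, defaultdict

MAX_VISITS = 10_000  # the retention limit given in the text


def prune(records, limit=MAX_VISITS):
    """Keep only the most recent `limit` records; return (kept, dropped).

    records: (epoch_ms, locale, article_id) tuples, a hypothetical decoded
    form of the stored visit records.
    """
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    return ordered[:limit], ordered[limit:]


def page_statistics(records):
    """Count views per article, most popular first, with per-locale counts."""
    totals = Counter()
    by_locale = defaultdict(Counter)
    for _ms, locale, article_id in records:
        totals[article_id] += 1
        by_locale[article_id][locale] += 1
    return [
        (article_id, count, dict(by_locale[article_id]))
        for article_id, count in totals.most_common()
    ]
```

Because pruning sorts by time, the view time is what decides which visits survive, which is why each record needs it even though the per-page counts do not.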
One of the special pages is Test. It allows up to 20 multiple-choice questions, but the results are shown only to the visitor who answered them. It is intended to give visitors feedback on how well they understood an article or a procedure. I thought a poll would be a useful article type, and I realised that it would be almost identical to a test, though the results shown on completion would be different. To be useful, a poll would need to accumulate results, which also raised the possibility of doing the same for tests, so that site owners could get more direct feedback from their visitors.
I discussed other people's experience of polls, including the rampant trivial questions often posed. I also recognised that collecting statistics from tests might push site owners towards asking questions that are not really of interest to their site visitors. Deciding which questions are worth asking is a skill, and when people are given open-ended tools like tests and polls, those tools can become a source of speculative entertainment.
When deciding what facilities to provide with a product, one needs to judge whether each is really useful or just a gimmick. Trying to obtain meaningful accumulated poll and test results can be a serious distraction from creating content, with limited ongoing use to a site owner, so I abandoned these plans. Utilities are not a means of creating content, so they must play second fiddle to the tools that are. While one hopes that people find a product enjoyable to use, providing them with misleading entertainment is likely to be counter-productive.