
Smallsite Design

Technology

5. Data structure

Web sites consist of pages and media, like images. How and where they are stored, and how they are accessed, is critical to smooth site operation.

Any design relying upon stored information raises the issue of how and when it is accessed, especially across multiple users. In the database world, this usually brings record-locking mechanisms into the mix, to prevent one user's edits to a piece of data being clobbered straight afterwards by another user. In going with XML, I also looked at how the types of site-owners I was targeting would use the product.

Most sites would be run by one person, which makes simultaneous updates moot, but that person may want to get others to translate and review articles, which brings the problem back. The key point I settled on was that only articles really needed to be flexible in their data arrangements, while the rest could be handled exclusively by one person. Restricting edits so that only one person at a time can make them eased access contention considerably.

This makes the product unsuitable for highly collaborative environments, but that is not what the target audiences generally want or are used to. The product is not for enterprise-level use, though teams could use it to expose some of their information to the public or internally.

The data is distributed across several files, to reduce how much has to be loaded for any particular purpose. For example, the full data for an article is not needed when a file is requested, but some checks still need to be done to ensure the request is legitimate. Consequently, there is a small XML file that contains the core data used by all requests, and other files are loaded as required.
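
The idea of a small core file plus on-demand extras can be sketched roughly as below. This is an illustration in Python rather than the product's own code, and the file names (`core.xml`, `meta.xml`), the element and attribute names (`f`, `p`) and the `SiteData` class are all hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import cached_property


class SiteData:
    """Loads the small core file for every request and other files only on demand."""

    def __init__(self, read_file):
        self._read = read_file  # callable: file name -> XML text (hypothetical storage layer)
        self.loaded = []        # records which files were actually parsed

    def _load(self, name):
        self.loaded.append(name)
        return ET.fromstring(self._read(name))

    @cached_property
    def core(self):
        # Small file needed by every request (assumed name).
        return self._load("core.xml")

    @cached_property
    def metadata(self):
        # Larger file, parsed only if something asks for it (assumed name).
        return self._load("meta.xml")

    def is_legitimate(self, path):
        # Check a requested file against the core data only.
        return self.core.find(f"./f[@p='{path}']") is not None
```

A request for a media file would then touch only the core file, while a page request would go on to read `metadata` as well.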

Except for individual article files, files can only be updated by the master manager. Rather than locking those files for updates, each change produces a new file, which is then used for subsequent reads, and old versions are eventually deleted. Article files are managed in the same way: any change produces a new file, which makes undoing very easy, simply by deleting the latest version. Keeping the last couple of versions of the management files also allows some operations to be undone.
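
The write-a-new-file approach described above might look something like the following minimal Python sketch. The naming scheme (`stem.000001.xml`) and all function names are assumptions made for illustration, not the product's actual conventions.

```python
from pathlib import Path


def version_no(path: Path) -> int:
    # "article.000003.xml" -> 3
    return int(path.name.split(".")[-2])


def list_versions(folder: Path, stem: str) -> list:
    # All saved versions of one file, oldest first.
    return sorted(folder.glob(f"{stem}.*.xml"), key=version_no)


def write_version(folder: Path, stem: str, data: str) -> Path:
    # Never modify an existing file: every change becomes a new numbered file.
    versions = list_versions(folder, stem)
    next_no = version_no(versions[-1]) + 1 if versions else 1
    path = folder / f"{stem}.{next_no:06d}.xml"
    path.write_text(data, encoding="utf-8")
    return path


def read_latest(folder: Path, stem: str) -> str:
    # Readers always use the highest-numbered version.
    return list_versions(folder, stem)[-1].read_text(encoding="utf-8")


def undo(folder: Path, stem: str) -> None:
    # Undo is just deleting the latest version, as long as an older one exists.
    versions = list_versions(folder, stem)
    if len(versions) > 1:
        versions[-1].unlink()


def prune(folder: Path, stem: str, keep: int = 3) -> None:
    # Eventually delete old versions, keeping the most recent few for undo.
    if keep < 1:
        raise ValueError("keep must be at least 1")
    for old in list_versions(folder, stem)[:-keep]:
        old.unlink()
```

Because readers only ever open complete, already-written files, no lock is needed for reads. With a single editor at a time, there is also no race over who gets the next version number.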

From the outset, I wanted a completely virtual environment where, other than the files in the root folder, all files would sit in a folder whose name includes a 256-bit cryptographically secure random number. Their contents are then easy to get to from the index.php file, but their real URLs are virtually impossible to guess from the internet side. Initially, I did not know how to do this, so I had a clunky means of providing real paths for articles and files while still virtualising their storage. When I did clue into how to do it in the .htaccess file, I promptly dispensed with that rubbish!
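
As a rough illustration of the unguessable folder name, the Python sketch below generates one from 256 bits of cryptographically secure randomness and maps a public (virtual) path onto it. The `data-` prefix and both function names are hypothetical, and the .htaccess side of the arrangement is not shown.

```python
import secrets


def new_storage_folder_name(prefix: str = "data-") -> str:
    # 32 random bytes = 256 bits, rendered as 64 lowercase hex characters.
    return prefix + secrets.token_hex(32)


def resolve(virtual_path: str, storage_folder: str) -> str:
    # Map a path as seen in a URL to its real location inside the hidden folder.
    return f"{storage_folder}/{virtual_path.lstrip('/')}"
```

The name would be generated once at install time and kept where only the server-side code can read it, so a visitor only ever sees the virtual path.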

Article headings and introductions are used in several places besides the top of the article, such as in the page header, browser tab headings, and feeds. They are also needed for category listings. None of these uses requires the rest of the article, but they do require this information from multiple articles at the same time. For the first few years, the headings and introductions were kept in a separate file from the rest of the site's metadata. As loading times fell with the rapidly increasing use of SSDs in servers, and XML parsing became very efficient, the separate file was merged into the metadata file.

After a little while, I added an XML file containing common data that is not locale-specific. It grew over time to include valid parent-child element combinations, attribute options, new element templates, and myriad other rules that I originally had in PHP arrays but which were getting too hard to manipulate. I also abstracted more rules out of PHP and XSL to centralise their management. All element and attribute names in the XML file are as short as I could make them. While that did not make much difference to parsing times, it leaves the plentiful XPath filtering with a lot less work to do.
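
To give a feel for rules held in XML with very short names and queried by XPath, here is a small Python sketch. The rule layout (`r`, `e`, `c`, `n`) and the element names in it are invented for the example and are not the product's real schema. Python's ElementTree supports only a subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules file: e = element, c = allowed child, n = name.
RULES_XML = """
<r>
  <e n="article"><c n="section"/><c n="p"/></e>
  <e n="section"><c n="p"/><c n="list"/></e>
  <e n="list"><c n="item"/></e>
</r>
"""

RULES = ET.fromstring(RULES_XML)


def allowed_children(parent):
    # Names of every child element the rules permit under `parent`.
    return {c.get("n") for c in RULES.findall(f"./e[@n='{parent}']/c")}


def is_valid_child(parent, child):
    # A single XPath filter answers whether the combination is allowed.
    return RULES.find(f"./e[@n='{parent}']/c[@n='{child}']") is not None
```

Keeping such rules as data means that adding a new allowed combination is an edit to the XML file rather than to the code that consults it.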

All the user interface text is in an Excel file. This extensively uses formulae to provide a fall-through language hierarchy while avoiding having to repetitively translate words or phrases that must be used in multiple places.
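
The fall-through idea can be illustrated outside Excel. The Python sketch below swaps the spreadsheet formulae for a plain lookup loop, and the locales, keys, strings and fallback chain are all made up for the example.

```python
# Hypothetical fallback chain: each locale names the one to try next.
FALLBACK = {"en-au": "en", "fr-ca": "fr", "fr": "en", "en": None}

# Hypothetical UI text: a locale only lists what differs from its fallback.
TEXT = {
    "en": {"save": "Save", "colour": "Color"},
    "en-au": {"colour": "Colour"},
    "fr": {"save": "Enregistrer"},
}


def ui_text(key, locale):
    # Walk down the hierarchy until some locale supplies the text.
    while locale is not None:
        strings = TEXT.get(locale, {})
        if key in strings:
            return strings[key]
        locale = FALLBACK.get(locale)
    raise KeyError(key)
```

For example, `ui_text("save", "en-au")` falls through to the `en` entry, so a phrase is only translated where a locale actually differs from its fallback.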

All files are stored with ASCII lowercase names to avoid any incompatibilities or problems when reading or rendering their names. Where paths are included in the names, they are encoded as lowercase hex characters.
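
Hex-encoding a path into a safe file name can be done in a couple of lines. The following Python sketch assumes UTF-8 as the byte encoding, which the text above does not specify, and the function names are illustrative.

```python
def encode_path(path: str) -> str:
    # bytes.hex() always yields lowercase hex, so the result is ASCII lowercase.
    return path.encode("utf-8").hex()


def decode_path(name: str) -> str:
    # Reverse the encoding to recover the original path.
    return bytes.fromhex(name).decode("utf-8")
```

Since the encoded name contains only the characters 0-9 and a-f, slashes, spaces, non-ASCII characters and upper-case letters in the original path cannot cause trouble on any file system.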

Powered by: Smallsite Design © Patanjali Sokaris