Web sites consist of pages and media, like images. How and where they are stored, and how they are accessed, is critical to smooth site operation.
Any design relying upon stored information raises the issue of how and when it is accessed, especially across multiple users. In the database world, this usually brings record-locking mechanisms into the mix, to prevent one user's edits to a piece of data being immediately clobbered by another's. In going with XML, I also decided to look at how the types of site-owners I was targeting would actually use the product.
Most sites would be run by one person, which made simultaneous updates moot, but such owners may want others to translate and review articles, which puts them back into the problem area. The key insight I centred on was that only articles really needed flexible data arrangements, while the rest could be handled by one person exclusively. Restricting edits to one person at a time eased the access contention considerably.
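One-at-a-time editing can be enforced with something as simple as an advisory lock per article. This is a minimal sketch in Python for illustration (the product itself is PHP), and the file name, TTL, and JSON layout are all hypothetical, not the product's actual mechanism:

```python
import json
import os
import time

LOCK_PATH = "article-42.lock"   # hypothetical per-article lock file
LOCK_TTL = 15 * 60              # assume an editing session expires after 15 minutes

def acquire_edit_lock(user, path=LOCK_PATH):
    """Grant the edit lock to `user` unless another user holds a live lock."""
    if os.path.exists(path):
        with open(path) as f:
            lock = json.load(f)
        # Refuse if someone else took the lock and it hasn't expired yet.
        if lock["user"] != user and time.time() - lock["taken"] < LOCK_TTL:
            return False
    # Take (or refresh) the lock for this user.
    with open(path, "w") as f:
        json.dump({"user": user, "taken": time.time()}, f)
    return True

print(acquire_edit_lock("alice"))   # → True
print(acquire_edit_lock("bob"))     # → False: alice still holds the lock
```

The expiry avoids an abandoned session locking an article out forever, at the cost of a stale edit occasionally being refused a save.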
This makes the product unsuitable for highly collaborative environments, but that is not what the target audiences generally want, nor are used to. The product is not for enterprise-level use, though teams could use it to expose some of their information to the public.
The data is distributed across several files, to reduce how much must be loaded for any particular purpose. For example, none of an article's data is needed when a file is requested, but some checks must still be done to ensure the request is legitimate. Consequently, there is a small XML file containing core data used by all requests, and other files are loaded as required.
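The core-file check might look something like the sketch below, in Python for illustration (the product itself is PHP). The element names and the idea of validating a request against a registry of known articles are my assumptions, not the product's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical core file: the small, always-loaded slice of site data.
CORE_XML = """
<core>
  <site name="Example Site" locale="en"/>
  <articles>
    <article id="42" file="a42-3.xml"/>
  </articles>
</core>
"""

core = ET.fromstring(CORE_XML)

def is_legitimate(article_id):
    """Serve a request only if the core file knows the article."""
    return core.find(f"articles/article[@id='{article_id}']") is not None

print(is_legitimate("42"))   # → True
print(is_legitimate("99"))   # → False
```

Only when a request passes this gate would the article's own, much larger, file be parsed.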
Except for individual article files, all files can only be updated by the master manager. Rather than lock a file for updates, each change produces a new file, which is then used for subsequent reads, with old versions eventually deleted. Article files are managed the same way: any change produces a new file, making undo very easy, simply by deleting the latest version. Keeping the last couple of versions of the management files also allows some operations to be undone.
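That write-a-new-file scheme can be sketched in a few functions. This is a Python illustration (the product itself is PHP); the zero-padded version suffix is my own assumption, chosen so a plain lexicographic sort orders versions correctly:

```python
import glob
import os
import tempfile

def save_version(slug, content, folder):
    """Write a new numbered file instead of overwriting the current one."""
    versions = sorted(glob.glob(os.path.join(folder, f"{slug}-*.xml")))
    path = os.path.join(folder, f"{slug}-{len(versions) + 1:04d}.xml")
    with open(path, "w") as f:
        f.write(content)
    return path

def latest(slug, folder):
    """Reads always use the newest version."""
    versions = sorted(glob.glob(os.path.join(folder, f"{slug}-*.xml")))
    return versions[-1] if versions else None

def undo(slug, folder):
    """Undo the most recent change by deleting the newest version."""
    path = latest(slug, folder)
    if path:
        os.remove(path)

d = tempfile.mkdtemp()
save_version("article", "<article>v1</article>", d)
save_version("article", "<article>v2</article>", d)
print(os.path.basename(latest("article", d)))  # → article-0002.xml
undo("article", d)
print(os.path.basename(latest("article", d)))  # → article-0001.xml
```

Because readers only ever open the newest file and writers only ever create files, no read is ever blocked by a write.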
From the outset, I wanted a completely virtual environment where, apart from the files that must be in the root folder, everything else sits in a folder whose name includes a 256-bit cryptographically-secure random number. While it is easy to reach its contents from the index.php file, it is virtually impossible to guess their URLs. Initially, I hadn't come across how to do this, so I had a clunky means of providing real paths for articles and files while still virtualising their storage. When I did come across the solution, I promptly dispensed with that rubbish!
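Generating such a folder name is straightforward with a cryptographically-secure source of randomness. A minimal sketch in Python for illustration (the product itself is PHP, where `random_bytes()` serves the same role); the `content-` prefix is a hypothetical naming choice:

```python
import os
import secrets
import tempfile

def make_private_folder(root):
    """Create a content folder whose name embeds a 256-bit CSPRNG value,
    so its URL is unguessable even though the code can link to it."""
    name = "content-" + secrets.token_hex(32)   # 32 bytes = 256 bits = 64 hex digits
    path = os.path.join(root, name)
    os.makedirs(path)
    return path

folder = make_private_folder(tempfile.mkdtemp())
print(len(os.path.basename(folder)))   # → 72 (8-character prefix + 64 hex digits)
```

With 2^256 possibilities, brute-forcing the folder name over HTTP is not a practical attack, though directory listing must be disabled so the name never leaks.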
Article headings and introductions are used in several places other than just topping the article, such as the page header, browser tab headings, and feeds. They were also needed for category listings. None of these required the rest of the article, but all required this information from multiple articles at once. For the first few years, they were in a separate file from the rest of the site's metadata. As loading times fell with the rapidly increasing use of SSDs in servers, and XML parsing became very efficient, the separate file was merged into the metadata file.
After a little while, I added an XML file containing common data that wasn't locale-specific. Over time this grew to include valid parent-child element combinations, attribute options, and myriad other rules that I originally had in PHP arrays but that were getting too hard to manipulate there. I abstracted more rules out of PHP and XSL to centralise their management.
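A rules file like that lets validation code stay generic: it just consults the data. The fragment and element names below are entirely hypothetical, and the sketch is in Python for illustration (the product itself is PHP):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of the shared rules file (names are illustrative).
RULES_XML = """
<rules>
  <element name="article">
    <child>heading</child>
    <child>introduction</child>
    <child>section</child>
  </element>
  <element name="section">
    <child>paragraph</child>
  </element>
</rules>
"""

rules = ET.fromstring(RULES_XML)

def valid_child(parent, child):
    """True if the rules file permits `child` directly under `parent`."""
    elem = rules.find(f"element[@name='{parent}']")
    return elem is not None and any(c.text == child for c in elem.findall("child"))

print(valid_child("article", "section"))   # → True
print(valid_child("section", "heading"))   # → False
```

Adding a new structural rule then means editing one XML file rather than hunting down parallel arrays in PHP and XSL.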
All the user interface text is in an Excel file. This extensively uses formulae to provide a fall-through language hierarchy while avoiding having to repeatedly translate words or phrases used in multiple places.
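The fall-through idea can be shown without the spreadsheet. This is a sketch in Python for illustration (the real mechanism lives in Excel formulae), with invented locales, keys, and fallback chain: a phrase missing in a locale falls back to its parent locale, ultimately to English, so shared words need translating only once.

```python
# Hypothetical UI text tables and parent-locale chain.
UI_TEXT = {
    "en":    {"save": "Save", "undo": "Undo"},
    "fr":    {"save": "Enregistrer"},
    "fr-CA": {},                       # nothing overridden yet
}
FALLBACK = {"fr-CA": "fr", "fr": "en"}

def ui(key, locale):
    """Walk up the locale chain until the key is found."""
    while locale is not None:
        if key in UI_TEXT.get(locale, {}):
            return UI_TEXT[locale][key]
        locale = FALLBACK.get(locale)
    return key   # last resort: show the key itself

print(ui("save", "fr-CA"))   # → Enregistrer (falls through to fr)
print(ui("undo", "fr-CA"))   # → Undo (falls through to en)
```

A translator then only fills in the cells where a locale genuinely differs from its parent.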