6 Program structure

The program structure shapes how the code is constructed and how the data is used.

The code for the product is spread across files in several languages: CSS for the look and feel, JavaScript for the dynamic operations, XML for the business rules, content structure and data, XSLT for the translation from the XML to the HTML for viewing, Excel for the user-interface strings (later rendered as XML), and PHP to glue the backend together. Any functionality has to rely upon all of these to work effectively, and thus any program structure is a hybrid of all the contributing code files. However, the balance between them changed as the project proceeded.

Birth of the rules XML

The rules XML only came into existence later in the project.

The parameters for the data-capture forms originally existed as arrays within PHP, with a form class holding the methods and the field definitions. The main issue was getting the form parameters into the XSLT so that the HTML was defined properly. The XSLT is the central meta-structure that brings all the product functionality together. It works from a runtime XML structure, into which PHP adds the various XML repositories. Incidental values could be added to the runtime XML easily enough, but adding the complex form parameter structures from PHP was becoming increasingly awkward.

So the form structures were transferred to a rules XML structure that could be directly accessed by the XSLT, replacing a lot of PHP code that basically converted between two structures. One structure is easier to maintain than two, especially when the second only existed so that the XSLT could use the first. Refactoring to use only one XML-based structure made sense. It did not make sense originally because the initial PHP structure was rather rudimentary; it only became more complex as the project proceeded and more use-case variations had to be catered for.
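The idea of a runtime XML that gathers the separate repositories can be pictured with a minimal sketch. It is written in Python purely for illustration (the product does this in PHP), and the `build_runtime_xml` helper, the element names and the sample repositories are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_runtime_xml(repositories, incidental):
    """Gather several XML repositories under one root for the transform to read."""
    root = ET.Element("runtime")
    # Incidental values are simple enough to travel as attributes on the root.
    for key, value in incidental.items():
        root.set(key, value)
    # Each repository is attached whole, so nothing needs converting between
    # two parallel structures.
    for xml_text in repositories:
        root.append(ET.fromstring(xml_text))
    return root


runtime = build_runtime_xml(
    ["<rules><tag name='p'/></rules>",
     "<strings><s id='save'>Save</s></strings>"],
    {"page": "contact"},
)
print([child.tag for child in runtime])
```

With the rules held as XML from the start, they can be attached as they are, which is the saving the refactoring was after.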

For example, each tag originally had only a few possible parameters that could be applied to it, such as whether it was deletable, movable, or a rich-text element. Each of these had dependencies in every other file: the XSLT that added the HTML that provided the structure, the CSS that controlled how they looked, the user-interface strings used, and also the PHP code that transferred those parameters to the runtime XML and checked the validity of the actions requested and the field values returned.

When more parameters were added, and became more complex, the rules XML file was created. Into this went the form field definitions, and even the proforma tag structures that were previously text within the PHP. Then a whole lot more parameters were added that were basically like shortcut flags for the XSLT, such as to indicate whether a tag had attributes. The options for those attributes are defined in the file, but rather than check for that deep in XSLT templates, which would require the whole structure be available to them, just the flag attribute value is passed to them.

An "a" flag was added so that a template could easily check for attributes and cater for them without having to know their structure and options. There are now 40+ such flags, each only one lowercase or uppercase letter, symbol or number. The rules file came to control which tags could be included in another as children, including how they are grouped and which other tags could be substituted for them. Even then, I originally allowed only one of the tags to be substituted, but later expanded that to allow more. As with the attributes, other lists with their options were added to the file.
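A small sketch may help show how single-character flags keep such checks cheap. It is in Python for illustration only, since the product reads the flags from XSLT and PHP. The rules fragment, the attribute names and the meanings given to the letters (a = has attributes, d = deletable, m = movable, r = rich text) are assumptions, not the product's actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules fragment; names and flag letters are invented.
RULES_XML = """
<rules>
  <tag name="figure" flags="adm" children="img figcaption">
    <attribute name="align" options="left right centre"/>
  </tag>
  <tag name="p" flags="dmr" children=""/>
</rules>
"""


def load_rules(xml_text):
    """Index the <tag> rules by tag name."""
    root = ET.fromstring(xml_text)
    return {tag.get("name"): tag for tag in root.iter("tag")}


def has_flag(rule, flag):
    """True if the single-character flag is set for this tag rule."""
    return flag in rule.get("flags", "")


def allowed_children(rule):
    """Names of the tags this tag may contain."""
    return rule.get("children", "").split()


rules = load_rules(RULES_XML)
print(has_flag(rules["figure"], "a"))      # True
print(has_flag(rules["p"], "a"))           # False
print(allowed_children(rules["figure"]))   # ['img', 'figcaption']
```

The caller only needs the flag string, not the attribute definitions nested under the tag, which mirrors the reason given for passing just the flag value to templates.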

Validating form data

Maintaining tag structure directly meant that form validation needed a round-trip means of managing it.

The contact page has a form, and it converts the supplied values into an email. That is rather basic and does not require any complexity, at least on the surface. However, it has to have minimum and maximum numbers of characters applied, with the minimum ensuring that stupidly short emails with abuse do not get through, and the maximum blocking voluminous treatises. Both the size of paragraphs and the number of them are limited to prevent impenetrable stream-of-consciousness ramblings.

Every field has similar restrictions, so a universal scheme to round-trip the definition, rendering and checking of their values was needed. Each form field on each page has a corresponding structure in the rules XML, with an attribute for each limit. This is used by the XSLT to build the HTML of the field with whatever attributes are needed, such as the required attribute, or the pattern attribute to provide a regex that checks the field formatting in the browser.
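As a rough illustration of a rule driving the rendered field, the sketch below turns one field definition into an HTML input. The product does this in XSLT from the rules XML; Python is used here only to keep the example short, and the `field_rule` keys and the `input_html` helper are hypothetical. The `minlength`, `maxlength`, `pattern` and `required` attributes are standard HTML.

```python
from html import escape

# Hypothetical field rule, as it might be read from the rules XML.
field_rule = {"name": "subject", "min": 10, "max": 120,
              "required": True, "pattern": r"[^<>]+"}


def input_html(rule):
    """Build an <input> tag whose constraints mirror the rule's limits."""
    attrs = {"type": "text", "name": rule["name"]}
    if "min" in rule:
        attrs["minlength"] = str(rule["min"])
    if "max" in rule:
        attrs["maxlength"] = str(rule["max"])
    if "pattern" in rule:
        attrs["pattern"] = rule["pattern"]
    # Escape every value so the rule text cannot break out of the attribute.
    parts = [f'{key}="{escape(value, quote=True)}"' for key, value in attrs.items()]
    if rule.get("required"):
        parts.append("required")  # boolean attribute, no value needed
    return "<input " + " ".join(parts) + ">"


print(input_html(field_rule))
```

These browser-side constraints are a convenience for the user only; the same limits still have to be enforced again on the server.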

But it is at the server that the complexity is needed to validate all the values submitted. The big problem with HTML is that any user can access the developer tools in their browser and alter any part of the page, including the fields and even their names and values. That means that anyone could potentially submit values that could be incorporated into an application's data and possibly damage its operation. Therefore, every form submitted must be checked for whether the action requested and values supplied are valid for the current context.

This includes whether they are valid for the current user, if restrictions apply depending upon their status. Then any errors have to be communicated to the user so they can correct any mistakes. Those messages have to be meaningful, so the checking code must test for how the values or actions are incorrect so that the most helpful message is displayed. But that is just for well-meaning users. Malicious submissions must also be checked for, though useful feedback for malicious users can be dispensed with, so they get no clue as to how to better try to break the application.
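The two kinds of checking, helpful messages for honest mistakes and silence for tampering, could be sketched as follows. This is an assumption-laden Python illustration rather than the product's PHP class: the limits in `CONTACT_RULES`, the message wording and the `validate` function are all invented.

```python
# Hypothetical limits for the contact form; in the product these would be
# attributes in the rules XML and the checks would run in PHP.
CONTACT_RULES = {
    "subject": {"min": 10, "max": 120},
    "message": {"min": 40, "max": 4000,
                "max_paragraphs": 8, "max_paragraph_chars": 1200},
}


def validate(submitted, rules):
    """Return (errors, tampered) for a dict of submitted field values."""
    # A field name the form never offered suggests the page was altered,
    # so stop without explaining what was wrong.
    if set(submitted) - set(rules):
        return {}, True

    errors = {}
    for name, rule in rules.items():
        value = submitted.get(name, "").strip()
        paragraphs = [p for p in value.split("\n\n") if p.strip()]
        if len(value) < rule["min"]:
            errors[name] = f"Please enter at least {rule['min']} characters."
        elif len(value) > rule["max"]:
            errors[name] = f"Please use no more than {rule['max']} characters."
        elif len(paragraphs) > rule.get("max_paragraphs", len(paragraphs)):
            errors[name] = f"Please use no more than {rule['max_paragraphs']} paragraphs."
        elif any(len(p) > rule.get("max_paragraph_chars", len(p)) for p in paragraphs):
            errors[name] = "Please break up the longest paragraph."
    return errors, False


errors, tampered = validate({"subject": "Hi", "message": "x" * 50}, CONTACT_RULES)
print(errors, tampered)
```

A caller would show `errors` next to the fields when `tampered` is false, and return only a generic failure when it is true.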

The last group of errors to check for is where the application has inadvertently broken something or the data has been damaged. These checks range from confirming that the PHP version used is recent enough to confirming that elements in the XML files exist before trying to modify them. Without such checks, some operations may cause the application to crash. With them, the application can gracefully skip the action and stay operational, though the user may be perplexed as to why nothing happened. The test could also attempt to fix the broken element if it can.
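A minimal sketch of such a defensive check, again in Python for illustration and with hypothetical element names, might look like this. The optional `repair` argument stands in for the idea of fixing a broken element where that is possible.

```python
import xml.etree.ElementTree as ET


def set_title(article, new_title, repair=False):
    """Update <title>, returning False instead of failing if it is missing."""
    title = article.find("title")
    if title is None:
        if not repair:
            return False  # skip the action and leave the data untouched
        title = ET.SubElement(article, "title")  # recreate the missing element
    title.text = new_title
    return True


damaged = ET.fromstring("<article><body/></article>")
print(set_title(damaged, "New title"))               # False
print(set_title(damaged, "New title", repair=True))  # True
```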

A key part of validation is sanitising each input so that no characters that could create problems are entered. These are also controlled by a flag value that contains characters that each indicate the type of sanitisation to be applied, whether that be trim off spaces or convert --- to an em-dash —. The testing is controlled by a PHP array that specifies the type of tests conducted upon values, whether any special element-specific tests are done, and what error message is used. All these are done as part of a PHP class dedicated to form values checking.
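The flag-driven sanitising described above could be sketched like this. It is a Python illustration, not the product's PHP, and the flag letters (t = trim, d = dashes, s = collapse spaces) and function names are assumptions.

```python
import re


def _trim(value):
    return value.strip()


def _dashes(value):
    return value.replace("---", "\u2014")  # three hyphens become an em-dash


def _collapse(value):
    return re.sub(r"[ \t]+", " ", value)


# Each character in a field's flag string selects one sanitising step.
SANITISERS = {"t": _trim, "d": _dashes, "s": _collapse}


def sanitise(value, flags):
    """Apply the steps named by the flags, in the order they are listed."""
    for flag in flags:
        # An unknown flag raises KeyError, so a typo in the rules is noticed.
        value = SANITISERS[flag](value)
    return value


print(sanitise("  Wait --- what?  ", "td"))
```

Keeping the steps in a lookup table means a new kind of sanitisation needs only one new entry and one new flag character.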

If the tests are passed, then special processing is undertaken to add the values into the structure where they belong. This is done by another PHP class whose code is loaded on a per page basis. There are checks to make sure each data item is updated before proceeding, with a fail generating an error and bypassing any more processing. The PHP code for some pages can be quite complex, commensurate with both the complexity of the applicable XML structure being altered, and how critical it is.

Editing articles

Of all the XML structures, that for articles can have the most variations.

A page is rendered as a hierarchy of HTML tags encoded as specially formatted characters in its text. Most HTML editors treat that text as just one big block. Although they provide hints about what attributes and values could be added to tags, that still leaves the whole structure vulnerable to inadvertently damaging keystrokes, and puts the onus on the writer to know the technical intricacies of how it all hangs together.

The product takes a different approach to how pages are constructed, focussing on the semantic structure of articles, and keeping the integrity of that structure while editing. In this way, the focus is upon the semantic structure and content, rather than how they are encoded into the HTML. That does make what happens behind the scenes more complex, but the complexity is offset by making the way elements are manipulated more consistent. Once familiar with the element block editing structure, any element can be manipulated using it.

Once an element is created, it cannot be broken by what is typed into it or its children. There are none of the intermediate states during typing where the element or its attributes become invalid. Attributes are always only selectable from valid options, and no invalid children can be added by just typing them in. Even the inline insertion of elements into rich-text elements will not add invalid elements, even though they are typed in. A valid element will be created, and can only be removed by deleting it explicitly, or altered only in valid ways, always maintaining its integrity.

Article editing has its own large PHP class, and a large section of XSLT code devoted to it. And there is a special PHP class that provides functions that can be called from the XSLT. Article editing uses the most resources, both in the server and the browser because of the amount of complexity that can occur in their structure. This is why there are article types, principally to restrict what can be added to them so that their purpose is not undermined, but also to limit what resources are loaded to edit them.

The editing code has to combine text content management, attribute modification, displaying the page as is while allowing access to each element's editing block, managing spike items, including ensuring they can only be added to the structure where allowed, and allowing access to other articles to get relevant content to use in the current article. Each time an article page is displayed, PHP creates a list of valid children and actions that can be performed on the current element, and a list of spike items and actions that can be used with the current element.

With each editing action taken, PHP has to check that what results will still be valid, even though it already presented that information in the lists it provided to the current page's XSLT. This is just another part of maintaining article integrity with each editing step, again because what arrives at the browser can be modified by malicious, or just adventurous, users to make the application do what it is not meant to do. Some of that processing is to make sure that indicators of what may now be invalid, or important status information, are rippled up the element hierarchy to aid visibility.
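The pattern of offering only valid choices and then checking the request again when it arrives can be sketched as below. This is a simplified Python illustration with an invented `ALLOWED_CHILDREN` table and a flat dictionary standing in for the article structure; the product's PHP and rules XML are considerably richer.

```python
# Hypothetical extract of the rules: which children each tag may contain.
ALLOWED_CHILDREN = {
    "article": {"section", "p", "figure"},
    "section": {"p", "figure", "list"},
    "figure": {"img", "figcaption"},
}


def valid_children(parent):
    """List sent to the page so the editor only offers valid choices."""
    return sorted(ALLOWED_CHILDREN.get(parent, set()))


def add_child(structure, parent, child):
    """Re-check the request on arrival, since the page may have been altered."""
    if child not in ALLOWED_CHILDREN.get(parent, set()):
        return False  # refuse and leave the structure unchanged
    structure.setdefault(parent, []).append(child)
    return True


doc = {}
print(valid_children("figure"))             # ['figcaption', 'img']
print(add_child(doc, "figure", "img"))      # True
print(add_child(doc, "figure", "section"))  # False
```

The same table feeds both functions, so the list shown to the user and the check applied to the request cannot drift apart.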

Program structure

It is essential to have ways of structuring programs, and that means architecting them for maximum utility.

The structures presented here are just some of what had to be incorporated into the product, and they evolved over time. With each step in that evolution, some refactoring often had to be done to cater for the expanded functionality required, and that in turn often opened up opportunities that triggered further refactoring. One of the issues was that, with functionality spread across so many source files with their attendant technologies, I often caught myself out trying to remember the end-to-end process I was trying to add to or change.

The form processing has so many touchpoints in its interrelated code-base that I often forgot to add changes to some code when adding new tags, triggering a cycle of debugging to find what ended up being very simple corrections. A lot of swearing happened due to my frustration that I could not remember such little things. I have a documentation file that I put a lot of design information into, as well as program and structure details. But it is hard to put into it all the details that really define the product, as Philomatics describes in their The Value of Source Code video.

That video tries to explain concepts from Peter Naur's Programming as Theory Building article. It helped me understand why I was having difficulty actually getting to describe the core of the product in my document, but it also helped to explain why I wanted the product to be as complete and self-contained as possible. I am not young, and I did not want the product to be extensible because that would require all the interfaces being well-explained, and having programmers familiar enough with the product and its code that they could handle all the complexities that could arise from that.

Being complete and self-contained meant that someone could maintain it, but would have difficulty making it into a more interfaced product because pretty well every bit of code was added from first principles and thus made to order for what I wanted it to do. Many web products use frameworks and libraries to take care of a lot of functionality, giving newly added programmers who are familiar with them a head-start in their chances of understanding the whole program.

Completely custom code thwarts that, but I used it to ensure that users did not need to maintain any third-party software, not to frustrate future maintainers. The only way to make this all work was to make sure that the product was self-contained, and thus less likely to need maintaining. This was done by focussing solely upon its primary function, and not trying to be all things to all people.