Adding search
After it turned out that search engines ignore most pages of less popular sites, a built-in site search facility was required.
Google was the worst, covering only 12% of pages at the time, and it is now down to 4%. This is a disgraceful disregard for its users and probably illegal in many jurisdictions, yet it is hard to get anyone to simply test how much of their site is missing, such is the blind acceptance of Google site tools in many tech circles. At the time, Bing covered only 39% of pages; while that is much better than Google, it is not 100%, which is the baseline requirement for anything used as a site's search facility.
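The article does not say how the coverage figures were measured, so the following is only a minimal sketch of one way to compute such a percentage. It assumes you already have a list of the site's page URLs and a list of URLs a search engine returned for the site; the function name and the example.com URLs are hypothetical.

```python
def coverage_percent(site_urls, indexed_urls):
    """Return the percentage of a site's pages found among a search engine's results."""
    # Ignore trailing slashes so "https://example.com/a/" matches "https://example.com/a".
    site = {url.rstrip("/") for url in site_urls}
    if not site:
        raise ValueError("site_urls must not be empty")
    found = site & {url.rstrip("/") for url in indexed_urls}
    return 100.0 * len(found) / len(site)


# Hypothetical data: four pages on the site, one of which the engine has indexed.
pages = [
    "https://example.com/",
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]
indexed = ["https://example.com/a", "https://example.com/unrelated"]
print(coverage_percent(pages, indexed))  # 25.0
```

Anything below 100 means some pages cannot be found through that engine, which is the threshold the text uses.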
Since XML is being used for storage anyway, it was the obvious candidate for holding the search data. The search file stores all the words of the latest releases of all enabled articles, by article, then locale. While I originally had all sequences of non-alphanumeric characters converted to single spaces, I changed to just including all article text, less embedded character sequences, which make better sense for use with
When a search is initiated, the file is included in the load-up of the
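As a rough illustration of a search file organised by article, then locale, here is a minimal sketch using Python's standard XML library. The element and attribute names (`search`, `article`, `text`, `id`, `locale`) and the case-insensitive substring match are assumptions made for this example, not the site's actual schema or matching rules.

```python
import xml.etree.ElementTree as ET


def build_index(articles):
    """Build an XML tree of article text, grouped by article, then locale.

    `articles` maps an article id to a dict of {locale: text}.
    """
    root = ET.Element("search")
    for article_id, locales in articles.items():
        article = ET.SubElement(root, "article", {"id": article_id})
        for locale, text in locales.items():
            entry = ET.SubElement(article, "text", {"locale": locale})
            entry.text = text  # the full article text is stored unchanged
    return root


def search(root, term, locale):
    """Return ids of articles whose text for `locale` contains `term`, ignoring case."""
    needle = term.casefold()
    hits = []
    for article in root.iter("article"):
        for entry in article.findall("text"):
            if entry.get("locale") == locale and needle in (entry.text or "").casefold():
                hits.append(article.get("id"))
    return hits


# Hypothetical articles, for illustration only.
articles = {
    "adding-search": {
        "en": "Adding search to the site.",
        "fr": "Ajouter la recherche au site.",
    },
    "storage": {"en": "XML is used for storage."},
}
index = build_index(articles)
print(search(index, "search", "en"))  # ['adding-search']
```

Keeping one entry per locale inside each article means a query only has to scan the text for the visitor's locale.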
External providers
After Bing began covering 100% of every site's pages in August 2023, allowing external providers became feasible.
Bing reaching 100% coverage, without Google's 200-page limit, was a game-changer. However, a site search needs a way to restrict the results to the site. For a manual search on a search engine's home page, site: followed by the domain name can be typed after the search term, but site visitors cannot be relied on to do that. It could be done using JavaScript, but plain HTML would be better. Some search engines provide another parameter that can be used for this, but Bing does not.
Fortunately, DuckDuckGo does, and because it uses Bing results it also gets 100% coverage, though new articles can take a few days to be included. Its other advantage over Bing is that it queries Bing anonymously, so visitors' details are not leaked to Bing. The only tasks left were to implement specifying the provider-specific details in the
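To make the site: approach concrete, here is a small sketch of what a script would have to do: append the operator to the visitor's term and build the search URL. It is written in Python purely for illustration; the function name is hypothetical, and the `q` parameter on Bing's search URL is the only provider detail assumed.

```python
from urllib.parse import urlencode


def site_search_url(base_url, term, domain):
    """Build a search URL whose query is the term followed by a site: restriction."""
    query = f"{term} site:{domain}"
    # urlencode percent-encodes the colon and turns spaces into "+".
    return f"{base_url}?{urlencode({'q': query})}"


print(site_search_url("https://www.bing.com/search", "xml storage", "yourdomain.com"))
# https://www.bing.com/search?q=xml+storage+site%3Ayourdomain.com
```

Note that this places the search terms in the URL itself.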
Normally, the query and site domain are included in the URL sent to the search provider, but sending them as POST form fields is better, as it keeps the search terms out of browser histories and server logs.
<form method="post" action="https://duckduckgo.com/" target="_blank">
  <input type="search" name="q" required="required"/>
  <button type="submit">Search</button>
  <input type="hidden" name="sites" value="yourdomain.com"/>
</form>
where yourdomain.com is replaced by your site's domain name. For Google, if they ever get to do their job, replace sites with sitesearch.
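Since only the action URL, the name of the site-restriction field and the domain differ between providers, the form can be generated from those three values. The following is a minimal sketch of that idea; the function name is hypothetical, and it is not a description of how the site itself stores provider details.

```python
from html import escape


def render_search_form(action, site_param, domain):
    """Return the HTML search form for one provider.

    `site_param` is the provider's field name for restricting results to a site,
    for example "sites" for DuckDuckGo.
    """
    return (
        f'<form method="post" action="{escape(action)}" target="_blank">\n'
        '  <input type="search" name="q" required="required"/>\n'
        '  <button type="submit">Search</button>\n'
        f'  <input type="hidden" name="{escape(site_param)}" value="{escape(domain)}"/>\n'
        '</form>'
    )


form = render_search_form("https://duckduckgo.com/", "sites", "yourdomain.com")
print(form)
```

Escaping the values keeps the markup well-formed even if a configured value contains quotes or ampersands.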
Unfortunately, reliable external search is not with us yet. While Bing, and thus DuckDuckGo, index 100% of pages, a Bing search using site: and a term returns many pages that do not include the term, while DuckDuckGo returns too few. So the internal search is still the best option for now, but at least people using those search engines have a better chance of coming across the vast majority of our site's pages.