Google search is lying to you!

Google is considered the leader in search and supposedly gives more results than all the others, but they are ignoring a large part of the web.

While Google may be singled out here, expect all the other search engines to follow similar strategies. There are not that many search engines, and while many are popping up these days claiming privacy advantages over Google, they usually use Google or Bing by proxy, in that their servers perform the search on your behalf, so the source search engines do not know the search is yours and cannot build up data on your web activity.

Use DuckDuckGo

As of August 2023, Bing provides 100% coverage.
DuckDuckGo includes Bing by proxy, retaining privacy.

Except for very popular sites, avoid Google and any that use it, like StartPage or Brave.

Most of the web is ignored

Much is made of Google gathering a lot of information about pages, but a simple search for all the pages on a particular site will only show a fraction of them.

Google, like other search engines, runs several web crawlers to scan all sites on the web, often several times a day. These crawlers or bots read all the content of the pages of each site, store it, and then process it several times over the next few days or weeks to build up a rating for each page. Well, that's the theory!

Now, try running a site: search, with your domain name after it, which could reasonably be expected to list all pages for the given domain. In 2022, for a site that had been around for years, and with over 150 pages, fewer than 8% were listed, though at least they were the home pages and article listings, giving some visibility. Now, even if Google did not bother to give the pages a favourable rank, they should still all be listed, because there are no other pages to push them to the bottom of the list. It is like they never existed. Note that as of August 2023, Bing seems to list 100% of pages, whereas earlier it was only 38%.
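
To check a site for yourself, the number of pages a domain declares can be read from its sitemap.xml and compared with how many results the site: query returns. The following is a minimal sketch in Python, assuming a standard sitemap at https://<domain>/sitemap.xml; example.com is a placeholder for your own domain, and the site: result count still has to be read manually from the search results page.

  # Count the pages a site declares in its sitemap.xml, for manual comparison
  # with the number of results a site: query returns. Sitemap index files
  # (nested sitemaps) are followed one level deep.
  import urllib.request
  import xml.etree.ElementTree as ET

  NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

  def fetch_xml(url: str) -> ET.Element:
      with urllib.request.urlopen(url) as response:
          return ET.fromstring(response.read())

  def count_sitemap_urls(domain: str) -> int:
      root = fetch_xml(f"https://{domain}/sitemap.xml")
      if root.tag.endswith("sitemapindex"):
          # A sitemap index lists child sitemaps rather than pages directly.
          total = 0
          for child in root.findall("sm:sitemap/sm:loc", NS):
              total += len(fetch_xml(child.text.strip()).findall("sm:url", NS))
          return total
      return len(root.findall("sm:url", NS))

  if __name__ == "__main__":
      domain = "example.com"  # placeholder: substitute your own domain
      print(f"{domain}: {count_sitemap_urls(domain)} pages in sitemap.xml")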

As John Mueller of Google explains in his Why does a site: query not show all my pages? video, The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes. … This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website. In unashamed self-promotion of their proprietary tools, Mueller says at the end, Don't worry about the counts shown in the site: query. Use Search Console instead.

The relevant questions to be asked are: Why can't the site: query be used for diagnostic purposes? Why does a site owner have to give privileged access to their site to Google via Search Console just to verify what Google knows about it? And why can't any researcher, or even any consumer, know what pages Google has indexed for a site? This is all very disingenuous of Google, because they used to show more but now show less, and persistent use of the site: query will result in persistent reCAPTCHA verifications. This is Google trying to avoid public scrutiny of the efficacy of their search query results.

While Mueller says that the site: query will not list all pages and implies that Search Console would do that (if there are no technical issues), if a page is not listed in a site: query, it is never returned in any other query, even if prefixed with unique text from the page. That refutes his whole explanation and counters his advice. It is all lies. Here we are, trying to make content worth reading, just for a huge chunk of it to be ignored!

Now, this appears rather strange, given that Google has massive amounts of storage devoted to each page, let alone the also massive amounts of computer processing it can apply to each. Why then can they not at least store the minimum about each of the pages they are currently ignoring? It is possible that they have been so focused on ad revenue that they do not really care about being a proper search engine, but instead do whatever improves their ad revenue streams. When most go for the big sites, it seems the myriad little sites become irrelevant. Indexing them all is certainly possible, as Bing now does it.

Coverage for sitemap.xml vs site: for some so-called SEO expert sites in 2022:
#  Domain                     sitemap  site:   %
a  safaridigital.com.au           200    132  66
b  digitaldarts.com.au             81     48  59
c  adimpact.com.au                308     85  28
d  clearwateragency.com.au        145     60  41
e  reloadmedia.com.au             675     65  10
–  Total                        1,566    402  26

These SEO experts do not seem to want to explain why all their pages are not listed. Perhaps they do not know, or do not want to let on, as that would mean that most of their advice is ineffective because Google is largely ignoring their customers' pages as well.

Coverage for sitemap.xml vs site: for some smaller sites in 2022:
#  Domain                      sitemap  site:   %
a  smallbusinessaustralia.org      345    100  28
b  patanjalisokaris.com            157     12   8

The first site represents small business, but it is not serving them if it does not advocate to Google or government for its members' sites' pages to all be listed. The second site is the one with the 8% mentioned earlier; it has been around for several years but is nowhere near as popular as the others. Its low coverage is indicative of what most newer small sites would get, which means they are almost invisible on the web. Those results were returned in 2022. Now the site: query for the second site returns 8 out of 227 pages (5 via StartPage), or under 4%. Google is getting worse by the year.

What this all means is that if a visitor reads a page on one of these web sites after following a link from a non-search page, and later wants to read it again, searching for it using Google would most likely fail. Google is feigning to give you effective tools to help you improve the visibility of your site's content, only to totally ignore most of it!

Big sites show only around 200 pages for a site: query:
#  Domain                 sitemap  site:   %
a  dyson.com                1,858    204  11
b  hubspot.com              4,406    200   5
c  ford.com                   650    201  31
d  neilpatel.com            7,811    200   3
e  searchengineland.com    29,838    200   1

Searching for where this limit is specified showed nothing. Some SEO experts are telling their readers to use site: to find out how many pages Google has registered, but are not letting them know about this important limitation. Some seem all too willing to trust Google.

Now, if Google is doing this to your own site, how much can you trust their searches if they are just ignoring almost all of the pages of smaller sites? They have stated that they favour results from the bigger sites because they say people are more likely to trust them, but that might be reasonable if they at least included all pages. Are there sites that they judge are not even worth bothering with? What gems in the corners of the web are we missing out on because they have been ignored completely?

This seems most egregious given the poor real support they provide for those who have signed up for Google Search Console, considering the simplest, most basic thing they can do for a site is to list all the damn pages and show them in search results! What it seems to imply is that they are not doing this to help the sites themselves, but to gather all the deep usage data about those sites' visitors through their privileged access to a site's data, while offering just a few stats from that vast treasure trove in return.

And a suggestion to the Bing team: now that you actually list all web pages, you could legitimately hit Google users with a clearly articulated advertising campaign pointing out how Google is not serving their interests at all, and so dissuade millions from using it. A suggestion for those search engines, like StartPage or Brave, that use Google results: perhaps it is time to diversify your sources, because Google is short-changing your users as well. Also, you are returning less than half of the pitiful amount Google returns.

This is why Smallsite Design, while originally using the site: prefix to search using Google, had to have its own search facility. It may not be as sophisticated as Google's, but at least it lists all pages on the site that match the text. That means that if someone does get to your site, they have a means to properly search your site and discover the full breadth of what you offer. Smallsite Design now does offer the ability to select which external search engine to use.
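
As an illustration of the principle only, and not of the actual Smallsite Design implementation, a site search that never drops pages can be as simple as checking every page for the query text and listing every match, as in this hypothetical Python sketch with made-up page URLs and text:

  # A minimal sketch of the principle behind a full-coverage site search:
  # every page is checked and every match is listed, with none silently dropped.
  def search_site(pages: dict[str, str], query: str) -> list[str]:
      """Return the URL of every page whose text contains the query, case-insensitively."""
      needle = query.lower()
      return [url for url, text in pages.items() if needle in text.lower()]

  pages = {  # hypothetical page texts keyed by URL
      "/articles/search-lies": "Google search is lying to you",
      "/articles/seo": "SEO - useful or obsession?",
  }
  print(search_site(pages, "search"))  # lists both matching pages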

The real-world takeaway from all this is that if a so-called SEO expert cannot get all their own site's pages listed by Google, how can they be expected to do so for your site? Save your money and time and focus on writing content worth reading. It may take time for Google to get around to listing it, but you will have a worthwhile site without wasting time on distracting SEO blind alleys. Meanwhile, you will get some attention from Bing users, which includes users of search engines that use Bing's API.

All this highlights something very strange about attitudes toward the internet. Why is no one publishing figures for the coverage provided by the various search engines? All we see is how popular they are. Why are consumer organisations not informing us of the significant shortcomings of these resources that we rely upon for finding what we want? Why do so many think that search engines are so much more accurate than they actually are?

Damping factor

To avoid searches only focusing upon links within the same group of pages, a damping factor is applied to make sure other pages are considered.

Search engines apply what is called a page ranking algorithm to determine how likely a random visitor is to want to go to each page on the web. This will take into account the search words used along with previous history of the searcher and all other searchers' previous searches. However, the problem that can occur is that the web is not all connected, but really consists of largely connected sites and a whole lot of smaller groups of sites that are isolated from each other.

So how do search engines find out about such isolated sites? Google signed up to be a domain registrar, but not to provide domain names. Instead, doing so allowed it to have access to all the new domain names, giving it the opportunity to discover those that had never had links to them. That is why any new site gets crawled pretty quickly after creation. If Google did that, all the other search providers probably did the same thing.

However, since page ranking is basically a popularity contest, none of the new sites will ever show up in searches until some popular site links to them. This was exploited by some through link farms that provided popularity, but because they did so without any real merit, Google et al have tracked such farms and downgraded or banned those who use them. But to address the real issue of less popular sites being invisible, a damping factor is included to make sure a percentage of sites outside the popular ones are included. The percentage used is typically 15%.
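
To see where that 15% comes in, the following is a minimal Python sketch of a PageRank-style iteration under the usual formulation: with a damping factor of 0.85, 15% of every page's score comes from a uniform random jump to any page, which is what gives unlinked or isolated pages a small but non-zero presence. The tiny link graph is purely illustrative.

  # A minimal sketch of a PageRank-style iteration showing where the damping
  # factor fits: with d = 0.85, 15% of each page's score comes from a uniform
  # random jump, so even pages with no incoming links retain a small score.
  def pagerank(links: dict[str, list[str]], d: float = 0.85, iterations: int = 50) -> dict[str, float]:
      pages = list(links)
      n = len(pages)
      rank = {p: 1.0 / n for p in pages}
      for _ in range(iterations):
          new_rank = {p: (1.0 - d) / n for p in pages}  # the 15% uniform share
          for page, outgoing in links.items():
              targets = outgoing or pages  # treat dead ends as linking everywhere
              share = d * rank[page] / len(targets)
              for target in targets:
                  new_rank[target] += share
          rank = new_rank
      return rank

  links = {"a": ["b"], "b": ["a"], "c": []}  # "c" has no links in or out
  print(pagerank(links))  # "c" still gets a non-zero rank from the damping term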

The lie

The lie here is one of omission, because Google et al are not being honest about what they leave out, and thus that the results are severely skewed for many searches.

So if search providers know about all the pages on the web and have incorporated a damping factor of 15% to make sure less popular sites have a chance, why are they totally ignoring most pages of those sites? Well, that is the multi-billion dollar question that Google et al should be providing answers to, as that is what is reasonably expected of the major search providers with their world-wide reach.

That Google listed such pages a lot more in the past, but has never listed all of them, suggests that there must be a significant enough cost in keeping them under consideration as search result candidates that Google continually decides to ignore them after each crawl of a site. That is, search providers have criteria to eliminate pages from consideration early in the mapping process.

Google still lists home pages and listings pages, but basically ignores those pages that may be what people really want to get to. Technically, Google has provided a means for people to find a site that may have the information they seek, but not directly. A person would have to click on a link on one of those home or listing pages to get to any useful information, which is unlikely given that such useful pages are listed directly for more popular sites. This is how Google et al lie to us while supposedly fulfilling what they promise.

Google et al do not tell anyone why they are really ignoring most of the pages of non-large sites. They cite some technical reasons, but those rarely apply to working sites where all the pages can be seen in full in browsers. Ignoring pages significantly reduces the pool of non-popular pages that would be considered part of the 15% included by the damping factor, and thus skews results from that pool towards not-so-unpopular pages. The lie is in not indicating that their so-called organic results are far from organic, especially for searches that would have included some excluded pages high in the results, if not at the top.

Wants your content

As of 2024, Google is shifting to significantly rip off more site content to display on their search results pages.

They are not so interested in actually sending anyone to the sites they extract content from. In a way, this is neoliberalism for websites, where they are fully exploited, but the site owner's interests are ignored. In the ruthless web, no one wants to have anyone leave their site. We see this in social media, where links to sites are tolerated, but the layout, facilities and algorithms work to keep people on the site rather than leaving. However, when the major search engine now works to keep searchers on the search site, it is another brick in the wall keeping everyone in their world.

That this is on offer sort of makes it obvious why Google has been increasingly hiding smaller sites over the last few years. For a website to take advantage of the new content emphasis, it needs what is called schema markup embedded into all significant elements on the page. The only site owners likely to devote resources to such data markup are major corporations, or SEO devotees. If small site owners are unlikely to go to such trouble after using their time and core abilities to create the visible content, Google does not need them in its new world search order.
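
For reference, schema markup is typically embedded as a JSON-LD block using the schema.org vocabulary. The following is a minimal sketch, using Python only to emit the snippet that would sit in a page's HTML; all property values are placeholders, and a real page would need similar blocks kept in sync for every significant element.

  # A minimal sketch of what schema markup involves: a schema.org "Article"
  # object serialised as JSON-LD and wrapped in the script tag that would be
  # embedded in the page's HTML. All property values are placeholders.
  import json

  article = {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Google search is lying to you!",
      "author": {"@type": "Person", "name": "Example Author"},
      "datePublished": "2024-01-01",
  }

  snippet = ('<script type="application/ld+json">\n'
             + json.dumps(article, indent=2)
             + "\n</script>")
  print(snippet)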

With metadata being treated as of higher importance than the creativity used in making the visible content, with all its nuances and human appeal, people are being relegated to feeding the Google machine, and if a site does not offer the new metadata, it is ignored. Content is purely transactional for Google, and only of worth if it keeps searchers looking at page metadata, dressed up in a few words from the page, only on Google's pages.

This whole scenario assumes that people would prefer to get content in this way, as if no one would be interested in listening to recorded music, but only its keywords and significant riffs, or instead of reading a book, the synopsis will do. This regime applied to a YouTube video would show people an AI version of the presenter rattling off a connective monologue using snippets of the uploaded video as illustration. Google could save a lot of bandwidth that way, as the videos would be much shorter.

While such a focussed and condensed format would suit some who are only after key bits of information, it will fail to engage many others, just because they want the longer experience of the expressive form of the content. Even for those who want to kill a bit of time, the experience leaves out what attracts them in the first place.

While many sites will not be doing all that markup, SEO experts, who are not known for being experts on creating the part of the content that people actually want, will be out in force extolling the markup's virtues and how important it is for the supposed success of any site. This will lead many site owners to bias their site designs towards being more suitable for containing markup, and that will lead to a significant overall degradation of site experience quality, just because resource constraints will force reprioritisation.

In essence, schema markup is just a more sophisticated keyword facility, but when site owners filled their pages with excess keywords, quality dived, and eventually Google had to ignore the keywords meta tag and downgrade sites that did so much stuffing that it detracted from the usefulness of their sites. It seems that all that Google has learned about what makes a quality site for people over the last decade is being discarded in favour of what makes it easier for their algorithms. This will backfire, though it may work for enough people that Google does not care.

Making the best of it

While Google may be ignoring your startup efforts, there are some ways that may make them take notice, and not in a punitive way.

At one time, the holy grail of getting ranked higher by Google was to get some site that Google ranked highly to provide links to your site. However, that fell out of favour with many of those favoured sites because their staff were abusing that privilege, which led to a dramatic drop in the numbers of those links, but also a down-ranking of that statistic's importance by Google, just because the rorting made any ranking unreliable.

That leaves the only worthwhile links on a page as those going to:
  a. Related pages on your own site.
  b. Related pages on other sites.

Linking to other pages on your own site shows that your content is comprehensive and that your pages support each other. Linking to authoritative sites – in Google's eyes – means that you are saying that there is validity in your content. Of course, your links have to be relevant and the content on the target must support the page's content, otherwise the Google algorithms will down-rank the page.

Now, that support from the link target pages needs to be more than just mentioning similar things; they must obviously extend or justify what you write, or the other way round. Search engines are trying very hard to make the rankings correlate with quality content, so we would be mad to think that we can simply fool them, despite what SEO pundits reckon, when they are spending billions on AI to counter fraudulent up-ranking.

Google has over 200 criteria that it uses in determining page rank, so tweaking the content to cater for one or two of them is not really going to make a difference. What really makes the difference is having content that people actually stay to read after clicking on a link in search results. Search engines know when a user returns to the results page soon after clicking on a link, and that results in the page being ranked lower for the query words used.

You will see that popular sites from the past, which seem to break all the SEO rules, still rank high in results, and that is because they know people are reading their pages. All the other ranking criteria mean nothing in that case, because they are only an indicator of probabilities. Reality trumps theory every time in the search world!

Google is being investigated for several monopolistic tendencies, but this failure to perform its advertised and expected functionality should be legally challenged.

Many countries have consumer protection laws, which often include that companies must not use misleading advertising or misrepresent their services. Clearly, this discriminative choice to ignore most of the web is a misrepresentation of services. Any reasonable person would expect that a company the size of Google, with a Google Search button prominently displayed on its home page, would return any web page satisfying the search request when that button is clicked. They would rightly expect less-popular pages to be lower on the returned list, but not to be ignored altogether.

Furthermore, a reasonable person would expect that if a so-called search giant is not going to include all web pages, a prominent warning should be displayed so that they would know to use another search engine to get answers to their query. Consequently, Google, and any other search engines ignoring large swathes of the web, should be forced to clearly display such warnings. A company whose home page portrays it primarily as a search engine must be forced to live up to that implied promise, or warn otherwise.

Of course, many would abandon Google if they decided to continue to ignore most of the web and display such a warning, but then that is what should happen. Egregious failure to meet projected capability should not be rewarded by being allowed to continue unabated. In the end, we want companies to deliver what they present themselves as, or be forced to change, either by stopping the deceptive behaviour or by being made to state what they actually do.

Make consumer affairs offices sit up and take notice by individually reporting Google and any other search engines as the scams they really are. Report to them how many pages are listed in your site's sitemap.xml versus how many show up in a site: search. If enough people do this, maybe we will get some action to get these companies to actually do what they purport to do or face legal consequences. A lot of jurisdictions have per-incident fines, which would add up to quite a lot for a global search engine in any one jurisdiction, as each search is an incident.

Reporting ads

Google, together with Facebook, dominates online ads, but its ad reporting facility is almost useless.

In many places, Google ads include a reporting facility enabling an ad to be reported to Google, but because ads are rotated, an ad that has been clicked on will likely not be there when returning from its target to report it. Thus the reporting facility is next to useless for reporting any ads, especially if they are clickbait.

Given the extensive experience Google has at determining the relevance of page content to a search query, it is natural to expect Google to be able to determine if an ad's target page is related to the ad's content. Google totally fails in this too. Microsoft News has a Google ad prominently displayed at the top of the page. Over many months, many ads have featured a picture of Kevin Rudd (ex-Australian prime minister) with some supposedly relevant text, only to go to a page about interior decorating, with nothing at all related to the ad.

This shows that Google has no interest at all in dealing with bogus ads, as even poor algorithms would have no difficulty in determining that the target pages included nothing about Kevin Rudd at all. Google is lying with their search results and doing nothing to prevent ads being totally misleading. Google is thus totally undeserving of any respect as an organisation in the two major areas in which it professes competency.

All this highlights that big tech companies like Google are accomplices in disseminating lies and misinformation, and along with media companies are failing to be good world citizens, unworthy of all the trust and money thrown at them. It is time to realise that big business is abusing us for their own profit and that they must be held to account for what they are doing to undermine our governments and economies. We need independent non-commercial ways of disseminating news and information that cannot be undermined by nefarious politicians and organisations.

Such independent organisations need to be funded independently of nations. While the United Nations would seem to be the appropriate organisation for that, it is beholden in its Security Council to countries whose governments are actively trying to sabotage democracy and the freedoms of their own and other countries' citizens, so we need to establish alternatives that are truly free of political and ideological influence: a true fourth estate.
