Google is considered the leader in search and supposedly gives more results than all the others, but they are ignoring a large part of the web.
While Google may be singled out here, expect all the other search engines to follow similar strategies. There are not that many search engines. Many new ones are popping up these days claiming privacy advantages over Google, but they usually use Google or Bing by proxy: their servers run the search on your behalf, so the source search engine doesn't know it is for you and cannot build up data on your web activity.
Much is made of Google gathering a lot of information about pages, but a simple search for all the pages on a particular site will only show a fraction of them.
Google, like other search engines, runs several web crawlers to scan all sites on the web, often several times a day. These crawlers, or bots, read all the content of the pages of each site, store it, and then process it several times over the next few days or weeks to build up a rating for each page. Well, that's the theory!
Now, try running the search site: followed by your domain name (for example, site:example.com), which is supposed to list all pages for the given domain. On a site that has been around for years and has over 150 pages, fewer than 8% are listed, though at least those were the home page and article listings, giving some visibility. Even if Google didn't bother to give the pages a favourable rank, they should still all be listed, because there are no other pages to push them to the bottom of the list. It is as if they never existed. Note that as of August 2023, Bing seems to list 100% of pages, whereas earlier it listed only 38%.
As John Mueller of Google explains in his "Why does a site:query not show all my pages?" video:
The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes. … This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website.
In unashamed self-promotion of Google's proprietary tools, Mueller says at the end:
Don't worry about the counts shown in the site: query. Use Search Console instead.
The relevant questions to ask are:
Why can't the site: query be used for diagnostic purposes?
Why does a site owner have to give Google privileged access to their site via Search Console just to verify what Google knows about it?
Why can't any researcher, or even any consumer, know what pages Google has indexed for a site?
This is all very disingenuous of Google, because they used to show more but now show less, and persistent use of the site: query results in persistent reCAPTCHA verifications. This is Google trying to avoid public scrutiny of the efficacy of their search results.
While Mueller says that the site: query will not list all pages and implies that Search Console would do so (if there are no technical issues), a page that is not listed in a site: query is never returned by any other query either, even one using text that is unique to that page. That refutes his whole explanation and counters his advice. It is all lies. Here we are, trying to make content worth reading, only for a huge chunk of it to be ignored!
Now, this appears rather strange, given the massive amounts of storage and computer processing Google can devote to each page. Why then can't they at least store the minimum about each of the pages they are currently ignoring? It is possible that they have been so focused on ad revenue that they don't really care about being a proper search engine, only about doing whatever improves their ad revenue streams. When most attention goes to the big sites, the myriad little sites seem to become irrelevant. Listing every page is clearly possible, as Bing now does it.
These SEO experts don't seem to want to explain why not all of their own pages are listed. Perhaps they don't know, or don't want to let on, as that would mean that most of their advice is ineffective because Google is largely ignoring their customers' pages as well.
The first site represents small business, but it is not serving small businesses if it doesn't advocate to Google or government to get all of its members' sites' pages listed. The last site is the one with the 8% mentioned earlier; it has been around for several years but is nowhere near as popular as the others. Its low coverage is indicative of what most newer small sites would get, which means they are almost invisible on the web. Those results were returned in 2022. Now the site: query for the second site returns 8 out of 227 pages (5 via StartPage), or under 4%. Google is getting worse by the year.
What this all means is that any visitor who reads a page on one of these websites and later wants to read it again would most likely fail to find it by searching Google. Google pretends to give you effective tools to help improve the visibility of your site's content, only to totally ignore most of it!
Searching for where this limit is specified turned up nothing. Some SEO experts tell their readers to use site: to find out how many pages Google has registered, but don't let them know about this important limitation. Some seem all too willing to trust Google.
Now, if Google is doing this to your own site, how much can you trust their searches if they are ignoring almost all of the pages of smaller sites? They have stated that they favour results from bigger sites because people are more likely to trust them, which might be reasonable if they at least included all pages. Are there sites that they judge are not even worth bothering with? What gems in the corners of the web are we missing out on because they have been ignored completely?
This seems most egregious given the poor support they provide to those who have signed up for Google Search Central, considering that the simplest, most basic thing they can do for a site is to list all the damn pages and show them in search results! It seems to imply that they are not doing this to help the sites themselves, but to gather deep usage data about those sites' visitors through their privileged access to each site's data, while offering in return just a few stats from their vast treasure trove.
A suggestion for the Bing team: if you actually listed all web pages, you could legitimately hit Google users with a clearly articulated advertising campaign pointing out how Google is not serving their interests at all, and so dissuade millions from using it. A suggestion for search engines, like StartPage, that use Google results: perhaps it is time to diversify your sources, because Google is short-changing your users as well.
This is why Smallsite Design, which originally used the site: prefix to search using Google, had to have its own search facility. It may not be as sophisticated as Google's, but at least it lists all pages on the site that match the text. That means that if someone does get to your site, they have a means to properly search it and discover the full breadth of what you offer. If the complete coverage that Bing currently offers endures, Smallsite Design might again offer the ability to select which search engine to use, as they are usually more comprehensive.
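To illustrate the idea of a site search that lists every matching page, here is a minimal sketch, not Smallsite Design's actual implementation: an inverted index that maps each word to the pages containing it, and a search that returns every page containing all the query words, with no cap on the number of results. The tokenizer, the function names and the page texts are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split text into lower-case word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: word -> set of page URLs containing that word."""
    index: dict[str, set[str]] = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return every page containing all query words; no page is dropped for being unpopular."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

With hypothetical pages such as `{"/articles/search": "Why Google ignores small site pages", "/articles/seo": "SEO advice for small sites"}`, searching for `small` returns both URLs. Ranking could be layered on top, but the point of the design is that relevance only orders the matches; it never hides them.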
The real-world takeaway from all this is that if a so-called SEO expert cannot get all their own site's pages listed by Google, how can they be expected to do so for your site? Save your money and time and focus on writing content worth reading. It may take time for Google to get around to listing it, but you will have a worthwhile site without wasting time on distracting SEO blind alleys. Meanwhile, you will get some attention from Bing users, including users of search engines that use Bing's API.
To avoid searches only focusing upon links within the same group of pages, a damping factor is applied to make sure other pages are considered.
Search engines apply what is called a page ranking algorithm to determine how likely a random visitor is to want to go to each page on the web. Search results also take into account the search words used, along with the searcher's own history and all other searchers' previous searches. However, a problem can occur because the web isn't all connected: it really consists of a large core of well-connected sites and a whole lot of smaller groups of sites that are isolated from each other.
So how do search engines find out about such isolated sites? Google signed up to be a domain registrar, but not to provide domain names. Instead, doing so allowed it to have access to all the new domain names, giving it the opportunity to discover those that had never had links to them. That is why any new site gets crawled pretty quickly after creation. If Google did that, all the other search providers probably did the same thing.
However, since page ranking is basically a popularity contest, none of the new sites will ever show up in searches until some popular site links to them. Some exploited this with link farms that manufactured popularity without any real merit, so Google et al have tracked such farms and downgraded or banned the sites that use them. To address the real issue of less popular sites being invisible, a damping factor is included so that a percentage of the ranking goes to pages outside the popular ones. That percentage is typically 15% (a damping factor of 0.85): 15% of the time, the model's random visitor jumps to any page at random instead of following a link.
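The damping factor is easiest to see in the textbook PageRank formula from Brin and Page's original paper, sketched below with power iteration. This is a minimal illustration under that assumption only; real search engines combine such a score with many other signals, and their current implementations are not public. The link graph used with it is hypothetical.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank computed by power iteration.

    links maps each page to the pages it links to. With probability d the
    random visitor follows a link; with probability 1 - d (15% when d = 0.85)
    it jumps to any page at random.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its equal share of the random jumps.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += d * rank[page] / n
        rank = new_rank
    return rank
```

For example, with `pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})`, page c has no inbound links, so its rank comes only from random jumps: 0.15 / 3 = 0.05, while a and b end up at roughly 0.49 and 0.46. That small baseline is all the damping factor gives an unlinked page, and it only helps if the page is still in the pool being ranked.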
The lie here is one of omission, because Google et al are not being honest about what they leave out, and thus about how severely skewed the results are for many searches.
So if search providers know about all the pages on the web and have incorporated a damping factor that spreads 15% of the ranking across all pages to make sure less popular sites have a chance, why are they totally ignoring most pages of those sites? Well, that is the multi-billion dollar question that Google et al should be answering, as that is what is reasonably expected of the major search providers with their worldwide reach.
That Google listed many more such pages in the past, but has never listed all of them, suggests that there must be such a significant cost to keeping them as search result candidates that Google repeatedly decides to ignore them after each crawl of a site. That is, search providers have criteria to eliminate pages from consideration early in the mapping process.
Google still lists home pages and listing pages, but basically ignores the pages that may be what people really want to get to. Technically, Google has provided a means for people to find a site that may have the information they seek, but not directly. A person would have to click on a link on one of those home or listing pages to get to any useful information, which is unlikely given that such useful pages are listed directly for more popular sites. This is how Google et al lie to us while supposedly fulfilling what they promise.
Google et al do not tell anyone why they are really ignoring most of the pages of smaller sites. They cite some technical reasons, but these rarely apply to working sites whose pages can all be viewed in full in browsers. Ignoring pages significantly reduces the pool of non-popular pages that would be considered part of the 15% included by the damping factor, and thus skews results from that pool towards pages that are only slightly less popular. The lie is in not disclosing that their so-called organic results are far from organic, especially for searches that would have placed some of the excluded pages high in the results, if not at the top.
While Google may be ignoring your startup efforts, there are some ways that may make them take notice, and not in a punitive way.
At one time, the holy grail of getting ranked higher by Google was to get a site that Google ranked highly to link to your site. However, that fell out of favour with many of those favoured sites because their staff were abusing that privilege. This led to a dramatic drop in the number of such links, and also to Google down-weighting the importance of that statistic, because the rorting made it unreliable.
That leaves the only worthwhile links on a page as those going to:
other pages on your own site, and
authoritative sites.
Linking to other pages on your own site shows that your content is comprehensive and that your pages support each other. Linking to authoritative sites – in Google's eyes – means that you are saying that there is validity in your content. Of course, your links have to be relevant and the content on the target must support the page's content, otherwise the Google algorithms will down-rank the page.
Now, support from the link target pages needs to be more than just mentioning similar things; the targets must clearly extend or justify what you write, or the other way round. Google is trying very hard to make rankings correlate with quality content, and it is spending billions on AI to counter fraudulent up-ranking, so we would be mad to think that we can simply fool it, despite what SEO pundits reckon.
Google has over 200 criteria that it uses in determining page rank, so tweaking content to cater for one or two of them is not really going to make a difference. What really makes the difference is having content that people actually stay to read after clicking on a link in search results. Search engines know when a user returns to the results page soon after clicking on a link, and that results in the page being ranked lower for the query words used.
You will see that popular sites from the past, which seem to break all the SEO rules, still rank high in results, and that is because they know people are reading their pages. All the other ranking criteria mean nothing in that case, because they are only an indicator of probabilities. Reality trumps theory every time in the search world!
Google is being investigated for several monopolistic practices, but this failure to perform its advertised and expected function should also be legally challenged.
Many countries have consumer protection laws, which often require that companies not use misleading advertising or misrepresent their services. Clearly, this discriminatory choice to ignore most of the web is a misrepresentation of services. Any reasonable person using a company the size of Google, whose home page prominently displays a Google Search button, would expect that clicking that button will return any web page satisfying the search request. They would rightly expect less popular pages to be lower in the returned list, but not to be ignored altogether.
Furthermore, a reasonable person would expect that if a so-called search giant is not going to include all web pages, a prominent warning would be displayed so that they would know to use another search engine to get answers to their query. Consequently, Google, and any other search engines ignoring large swathes of the web, should be forced to display such warnings clearly. A company whose home page portrays it primarily as a search engine must be forced to live up to its implied promise, or warn otherwise.
Of course, many would abandon Google if they decided to continue ignoring most of the web and displayed such a warning, but then that is what should happen. Egregious failure to meet projected capability should not be rewarded by being allowed to continue unabated. In the end, we want companies to deliver what they present themselves as, or be forced to change, either by stopping deceptive behaviour or by stating what they actually do.
Make consumer affairs offices sit up and take notice by individually reporting Google and any other search engines as scams, as they really are. Report to them how many pages are listed in your site's sitemap.xml versus how many show up in a site: search. If enough people do this, maybe we will get some action to make these companies actually do what they purport to do or face legal consequences. A lot of jurisdictions have per-incident fines, which would add up to quite a lot for a global search engine in any one jurisdiction, as each search is an incident.
Google, together with Facebook, dominates online ads, but their reporting facilities are almost useless.
In many places, Google ads include a reporting facility enabling an ad to be reported to Google, but because ads are rotated, an ad that has been clicked on will likely not be there when returning from its target to report it. Thus the reporting facility is next to useless for reporting any ads, especially if they are clickbait.
Given the extensive experience Google has at determining the relevance of page content to a search query, it is natural to expect Google to be able to determine whether an ad's target page is related to the ad's content. Google totally fails at this too. Microsoft News has a Google ad prominently displayed at the top of the page. Over many months, many ads have featured a picture of Kevin Rudd (former Australian prime minister) with some supposedly relevant text, only to lead to a page about interior decorating with nothing at all related to the ad.
This shows that Google has no interest at all in dealing with bogus ads, as even poor algorithms would have no difficulty in determining that the target pages included nothing about Kevin Rudd at all. Google is lying with their search results and doing nothing to prevent ads being totally misleading. Google is thus totally undeserving of any respect as an organisation in the two major areas that it professes competency in.
All this highlights that big tech companies like Google are accomplices in disseminating lies and misinformation and, along with media companies, are failing to be good world citizens, unworthy of all the trust and money thrown at them. It is time to realise that big business is abusing us for its own profit and that it must be held to account for what it is doing to undermine our governments and economies. We need independent non-commercial ways of disseminating news and information that cannot be undermined by nefarious politicians and organisations.
Such independent organisations need to be funded independently of nations. While the United Nations would seem to be the appropriate organisation for that, its Security Council is beholden to countries whose governments are actively trying to sabotage democracy and the freedoms of their own and other countries' citizens, so we need to establish alternatives that are truly free of political and ideological influence: a true fourth estate.