Learn about Google’s Crawling, Indexing and Serving
It is an amazing experience to search the web with Google and find pages matching our query at the click of a mouse. Sitting comfortably at home or in the office, we get the valuable information we are looking for, gathered from sites all over the world, in a matter of seconds.
It is interesting to know how Google finds the matching pages and determines the order of the search results.
The three processes that deliver the search results are:
Crawling
Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.
Google uses a huge set of computers to fetch (crawl) billions of pages on the web. The program that does the fetching is called Googlebot, also known as a ‘robot’ or ‘spider’. Googlebot determines which sites to crawl, how often to crawl them and how many pages to fetch from each site.
Crawling begins with a list of web page URLs from previous crawls, augmented with Sitemap data provided by webmasters. As Googlebot visits these pages, it detects the links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites and dead links are noted and used to update the Google index.
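To make the idea of a growing crawl list concrete, here is a minimal breadth-first crawler sketch. It is not how Googlebot works internally; it assumes a tiny hypothetical in-memory link graph, `WEB`, in place of real HTTP fetching, and the names `fetch_links` and `crawl` are illustrative only. Crawl scheduling (how often to revisit a site, how many pages to take from it) is not modelled beyond a simple `max_pages` cap.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A URL missing from this dict stands in for a dead link (e.g. an HTTP 404).
WEB = {
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
    "https://a.example/about": ["https://a.example/"],
    "https://b.example/": ["https://b.example/new", "https://c.example/gone"],
    "https://b.example/new": [],
}

def fetch_links(url):
    """Stand-in for fetching a page and extracting its links; None if dead."""
    return WEB.get(url)

def crawl(seed_urls, sitemap_urls, max_pages=100):
    """Breadth-first crawl starting from known URLs plus sitemap URLs."""
    frontier, seen = deque(), set()
    for url in seed_urls + sitemap_urls:
        if url not in seen:  # avoid queueing the same URL twice
            seen.add(url)
            frontier.append(url)
    crawled, dead = [], []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        links = fetch_links(url)
        if links is None:
            dead.append(url)  # record dead links so the index can be updated
            continue
        crawled.append(url)
        for link in links:  # newly discovered links join the crawl list
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled, dead

crawled, dead = crawl(["https://a.example/"], ["https://b.example/new"])
print(crawled)
print(dead)
```

Here the sitemap entry `https://b.example/new` is crawled early even though it would also have been found by following links, and `https://c.example/gone` ends up in the dead-link list.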
Indexing
Googlebot processes each page it crawls and compiles an index of all the words it sees and their location on each page. In addition, Google processes information in key content tags and attributes, such as Title tags and ALT attributes. Googlebot can process many content types, but it cannot process the content of most Flash files or dynamic pages.
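An index of “words and their location on each page” is usually called a positional inverted index. The sketch below builds a toy one, recording for every word the pages it appears on and, for each page, the field (title, ALT text or body) and word position. It is a textbook illustration, not Google’s actual index format; the page data and the names `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and keep simple alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build a positional inverted index: word -> {url: [(field, position), ...]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, fields in pages.items():
        for field, text in fields.items():  # e.g. "title", "alt", "body"
            for position, word in enumerate(tokenize(text)):
                index[word][url].append((field, position))
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(postings) for word, postings in index.items()}

# Hypothetical crawled pages, split into the fields an indexer might keep.
pages = {
    "https://a.example/": {"title": "Google crawling basics",
                           "body": "Googlebot crawls pages and follows links"},
    "https://b.example/": {"title": "Indexing",
                           "alt": "diagram of an index",
                           "body": "An index maps words to pages"},
}
index = build_index(pages)
print(index["pages"])
print(index["index"])
```

Keeping the field alongside the position lets a ranking step treat a word in a title tag differently from the same word deep in the body text.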
Serving results
When a user enters a query, Google’s machines search the index for matching pages and return the results they judge most relevant to the user. Relevance is determined by more than 200 factors, one of which is the PageRank of a given page. PageRank measures the importance of a page based on the incoming links it receives from other pages.
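The link-based idea behind PageRank can be illustrated with its classic published formulation, PR(p) = (1 − d)/N + d × Σ PR(q)/L(q), where the sum runs over pages q linking to p, L(q) is the number of outgoing links on q, N is the number of pages and d is a damping factor, commonly 0.85. The sketch below computes it by power iteration on a small hypothetical link graph. It is only an illustration under these assumptions; Google’s production ranking combines many other signals and its details are not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outgoing links: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C comes out on top because it receives links from both A and B, while B, with a single incoming link that carries only half of A’s rank, ranks lowest.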
Google’s Webmaster Guidelines describe best practices that help webmasters avoid common pitfalls and improve their site’s ranking.