
Old 10-25-2012, 06:33 PM   #1
sinicline
Registered User
 
Join Date: Oct 2012
Posts: 190
Google Crawling and Indexation 101

An insufficient Google crawl rate and incomplete indexation are the scourge of many websites, especially large and new ones. In this forum and others, many members report indexation issues and ask how to solve them. The same is true of SEO clients. My advice here is focused on Google, but the same general principles apply to crawling and indexation by Bing and Yahoo! as well.

First of all, Google indexation is hard to measure for a large site. False alarms often come from relying on Google's site: operator, which is supposed to report the site's indexation count. It works well for small sites but is wildly unreliable for large ones and tends to severely underreport the count. Webmaster Tools is better for this, but possibly also unreliable. If your site is enormous, there is simply no certain way of knowing how many pages Google has indexed. For additional helpful data, check Google Analytics to see the total number of pages that have received visits. I also recommend that you manually run cache: checks of all your most important pages and of various random secondary pages to get a further idea of how your site is doing on the indexation front.

The Google crawl rate cannot be reliably controlled, but it can be influenced by positive factors (listed here roughly in descending order of importance).
• Domain importance. Google’s Matt Cutts recently acknowledged, in an interview with Eric Enge, that your site's crawl rate and depth of crawling are roughly proportional to PageRank (PR). SEOs have long known this.
• Backlinks. PR is computed based on backlinks, which are absolutely central to indexation. If a site's page count is growing fast but the site is not earning enough new links, this may suggest to Google that the content is of low quality (which is guaranteed to reduce your crawl and indexation rates).
• Deep Linking. Backlinks to individual pages (so-called "deep linking") are an effective way to ensure the indexation of those pages and to keep them in the main Google index (as distinct from the supplementary index). Internal links to the same pages also help. Make sure that at least your most important pages get enough of both kinds of links. These need to be followed links (i.e. they should not contain the rel="nofollow" attribute).
• Site navigation and hierarchy. To the extent possible, a flat site hierarchy should be used. (An exemplary illustration is http://www.fanbase.com, with all the main categories appearing in the top-level navigation, enabling quick drilldown to individual pages.) This means (a) as few subdomains, subfolders and subdirectories as possible and (b) that all important pages must be reachable via the fewest clicks possible from the home page (more than 3-4 clicks is problematic).
• XML sitemaps. This is a must. Here is one good tool -- http://www.xml-sitemaps.com -- for generating sitemaps; there are others too. Submit your sitemaps to the search engines via webmaster tools. Further notes:
o Sitemaps generally support <changefreq> and <priority> elements, whose use may influence the crawl, although the impact is likely to be minor.
o Check Webmaster Tools (WMT) for sitemap errors and fix them.
o Just recently, Michael Gray recommended creating small sitemaps (100 pages or fewer) to supplement your regular sitemaps, which can help get new content indexed faster. He has found using a dedicated sitemap for fresh content to be highly effective. I have not tested this personally yet, but it makes sense and Michael's mileage counts for a great deal.
• Duplicate content reduction. In general, duplicate content on a site is not a significant problem and does not entail "Google penalties." However, on very large sites high-volume duplicated content (identical pages sitting under different URLs) can confuse Google and impede proper indexing. One classic example of duplication occurs under different forms of site URLs: those that include the www. subdomain and those that don't (e.g. http://example.com/file1.html and http://www.example.com/file1.html typically have the same content). The way to handle this and other kinds of duplication is via some form of URL canonicalization (see next item).
• URL canonicalization means creating a single SEO-friendly and user-friendly URL for each page and letting Google know that that URL is canonical. The SEO reasons for canonicalization are various and go beyond indexation issues: (1) Google, in spite of occasional denials, may assign less importance to pages whose URLs contain extra slashes (subdirectories); (2) Google may sometimes have difficulties with URLs that are parameter-laden; (3) long ugly URLs are a turnoff for site visitors; (4) a clear, well-structured, consistent URL convention is best for the user, for branding and for SEO; (5) canonicalization consolidates PageRank and link equity to the canonical version of the page, giving it a better chance to rank. Depending on your platform, various rewrite engines (see http://en.wikipedia.org/wiki/Rewrite_engine) can be used to automate the rewriting of URLs from "ugly" into friendly ones. URL canonicalization can be performed in any of 3 different ways:
o 301-redirect ("moved permanently") of all duplicate URLs to the canonical. IMHO this is the most reliable method of canonicalization, but it may have certain overheads.
o rel="canonical": Place a link of the form <link rel="canonical" href="http://example.com/canonical-url-example.html"> at the end of the <head> of each duplicate page. (Yes, it's OK for the canonical version to include this link to itself; and no, there is no limit on how many canonical links you can have.)
o "Display URLs as": the effect of this setting in the Google Webmaster Tools is similar to that of rel="canonical" and is the easiest option if you prefer not to write any code.
• URL stability and page uniqueness. While the issues surrounding duplicate content are fairly well known, one potential problem that is rarely discussed is the opposite. The term I have coined for it is multitasking URLs. Some applications may display different dynamically generated content under the same URL (for example, content specific to the user's geographical location). Additionally, the title tags for such pages may also be generated on the fly and contradict one another. I have seen this lead to a variety of indexation and search issues. For best results, the content of each page, whether dynamic or static, must be unique and must appear under its proper, unique and stable URL and title tag.
• Unique title tags. If you use the same title tags across multiple pages, Google may assume that those pages are duplicate and be reluctant to index them. Make your titles unique.
• Manual crawl rate setting. Google's Webmaster Tools offer a choice between letting Google determine the crawl rate automatically and setting it manually via a slide bar. Although setting it manually to max is unlikely to boost the crawl rate dramatically, it may bring about a marginal improvement.
• Original content. It's good for all your important pages to have significant and unique original content.
• Updates, feeds, pinging. Frequent content updates both site-wide and on individual pages can significantly improve the crawl rate. Further, exporting RSS feeds and implementing automated search engine pinging have a beneficial effect. Pinging resources include http://pingomatic.com/ and http://pingler.com/.
• Social Media. Links from social media, although they are nofollow, help Google discover and index new content. Including sharing buttons on your pages and promoting them on social media sites can help get your pages into the index faster.
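
To make the XML sitemaps item above a bit more concrete, here is a minimal Python sketch that writes a small "fresh content" sitemap. The `build_sitemap` helper, the example URLs and the default `changefreq`/`priority` values are assumptions made up for this illustration; the 100-URL cap simply mirrors the suggestion quoted above and is not a limit of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
FRESH_LIMIT = 100  # cap taken from the "small sitemap" suggestion, not a protocol limit


def build_sitemap(urls, changefreq="daily", priority="0.8", limit=FRESH_LIMIT):
    """Return sitemap XML (as a string) for at most `limit` URLs.

    `changefreq` and `priority` are hints only; the values here are
    illustrative assumptions, not recommendations.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls[:limit]:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical list of recently published pages.
    fresh = ["http://www.example.com/new-%d.html" % i for i in range(3)]
    print(build_sitemap(fresh))
```

The resulting file would still need to be uploaded and submitted through webmaster tools like any other sitemap.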
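
For the www vs. non-www duplication and the 301-redirect option above, the decision logic can be sketched roughly as follows. This is only an illustration in Python: the `canonical_redirect` helper and the choice of `www.example.com` as the preferred host are assumptions, and on a real site this would normally be done in the web server or rewrite engine rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumed preferred host for this sketch
DUPLICATE_HOSTS = {"example.com", "www.example.com"}  # host forms serving the same content


def canonical_redirect(url):
    """Return (301, canonical_url) if `url` is a duplicate form, else None.

    Only the host is normalized here; the fragment and any explicit port
    are dropped, and other kinds of duplication are out of scope.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in DUPLICATE_HOSTS:
        return None  # not one of our hosts; leave untouched
    canonical = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    if canonical == url:
        return None  # already canonical, no redirect needed
    return 301, canonical


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/file1.html"))
    print(canonical_redirect("http://www.example.com/file1.html"))
```

The same canonical URL is what you would put in the href of a rel="canonical" link if you choose that method instead of redirecting.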
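
The "fewest clicks from the home page" rule in the site navigation item can be checked with a breadth-first search over your internal links. The sketch below assumes you already have the link graph as a Python dict (for example from your own crawl); the `click_depths` and `too_deep` names, the sample graph and the default threshold of 4 clicks are all assumptions for illustration.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search over an internal-link graph.

    `links` maps each page to the pages it links to. Returns the minimum
    number of clicks needed to reach each reachable page from `home`.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_clicks=4):
    """Pages more than `max_clicks` from home (unreachable pages are not listed)."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


if __name__ == "__main__":
    # Hypothetical site structure.
    site = {
        "/": ["/a", "/b"],
        "/a": ["/a/1"],
        "/a/1": ["/a/1/x"],
        "/a/1/x": ["/a/1/x/y"],
        "/a/1/x/y": ["/deep"],
    }
    print(too_deep(site, "/"))
```

Pages that never show up in the `click_depths` result are not linked from anywhere reachable, which is a separate problem worth checking.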
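
Duplicate title tags are also easy to audit yourself. Here is a rough Python sketch using only the standard library; it assumes you have already fetched the HTML of your pages into a dict keyed by URL, and the `TitleParser` and `duplicate_titles` names are made up for this example. Titles are compared case-insensitively after trimming whitespace, which is my own simplification.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title>...</title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def duplicate_titles(pages):
    """Map each title shared by more than one URL to the sorted list of URLs.

    `pages` maps URL -> HTML source.
    """
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        by_title[parser.title.strip().lower()].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical pages.
    pages = {
        "/a": "<html><head><title>Widgets</title></head></html>",
        "/b": "<html><head><title>widgets </title></head></html>",
        "/c": "<html><head><title>Gadgets</title></head></html>",
    }
    print(duplicate_titles(pages))
```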

Old 10-25-2012, 11:04 PM   #2
martinjohn885
Registered User
 
Join Date: Oct 2012
Posts: 52
Great, informative post about Google crawling and indexation.
Old 10-25-2012, 11:15 PM   #3
chitra19
Registered User
 
Join Date: Feb 2012
Location: In my World
Posts: 54
This is pretty good content for anyone doing SEO. The tips are impressive and useful to follow.
Old 01-05-2016, 03:38 AM   #4
ssconlineexam
Registered User
 
Join Date: Jan 2016
Posts: 1
Hello sir, pretty nice post. Thanks for sharing it.
Old 01-05-2016, 04:07 AM   #5
James More
Registered User
 
Join Date: Jan 2016
Posts: 31
Wow, very useful information about Google crawling. We should all know about the process of indexing.
Old 01-05-2016, 11:03 PM   #6
way2practice
Registered User
 
Join Date: Jan 2016
Posts: 1
Excellent post, thanks for sharing. Very nice information, admin. Keep blogging more.
Old 01-06-2016, 03:44 AM   #7
sbiclerkpo
Registered User
 
Join Date: Jan 2016
Posts: 1
Great information admin thanks for sharing it
Old 01-07-2016, 02:16 AM   #8
ibpsadda
Registered User
 
Join Date: Jan 2016
Posts: 1
Wow, really awesome post thanks admin keep posting it....
Old 01-08-2016, 12:54 AM   #9
currentaffairs
Registered User
 
Join Date: Jan 2016
Posts: 1
Very nice post, sir. Keep blogging more. Very happy to see your post, thanks for sharing it.
Old 01-08-2016, 03:46 AM   #10
Sangeetakashyap
Registered User
 
Join Date: Sep 2015
Location: Delhi
Posts: 168
A nice info shared about crawling and indexing. It will be helpful for all who are still searching for it.
Old 01-08-2016, 03:48 AM   #11
governmentjobs
Registered User
 
Join Date: Jan 2016
Posts: 1
Amazing post, super sharing. Thanks, the site is looking nice. Thanks admin, keep on posting.
Old 01-11-2016, 12:04 AM   #12
fewmiles
Registered User
 
Join Date: Jan 2016
Posts: 1
Super sharing very nice post thanks for sharing it cool information keep it up....
Old 01-11-2016, 02:36 AM   #13
gr8wishes
Registered User
 
Join Date: Jan 2016
Posts: 1
Really awesome website thanks for sharing it very cool post
Old 01-11-2016, 09:57 PM   #14
quantumleap
Registered User
 
Join Date: Jan 2016
Location: Hyderabad
Posts: 728
Good Info Shared Thanks ...
Old 01-11-2016, 11:04 PM   #15
cybnetics
Registered User
 
Join Date: Nov 2015
Location: Delhi
Posts: 70
Thank you for sharing useful information.
All times are GMT -7.