I have often modified the rules in robots.txt to keep duplicate URLs out of Google's search results.
Disallow: /directory-name/ also applies to every child directory and URL under that directory. Query strings that commonly create duplicate URLs include no_redirect=true, utm_source, action, utm_campaign, wptouch_switch, redirect, post_id, no_redirect and replytocom. Monthly archive pages can be blocked with robots meta tags instead. Be careful: blocking these parameters may remove some important mobile URLs that are already indexed and ranking. Even with rel="canonical" set to the desktop version, Google may still index the country-level domains too.
You can monitor the results with a site:yourwebsite.com search: compare the number of posts you have with the number of results Google shows. After making the necessary changes, resubmit your sitemap to Google.
Google's URL Parameters tool in Search Console was also a good way to find duplicate-content issues (note that Google retired this tool in 2022).
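One way to spot these duplicates yourself is to take a list of your indexed or crawled URLs and group the ones that become identical once the parameters listed above are stripped. The Python sketch below assumes you already have such a list; the parameter set, the function names `strip_duplicate_params` and `find_duplicates`, and the sample URLs are illustrative assumptions, not part of any Google tool.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the duplicate-causing parameters named in this article, lower-cased.
DUPLICATE_PARAMS = {
    "no_redirect", "utm_source", "utm_medium", "utm_campaign", "action",
    "wptouch_switch", "redirect", "post_id", "replytocom",
}


def strip_duplicate_params(url: str) -> str:
    """Remove the listed parameters (and any #fragment) from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        # keep_blank_values=True so bare flags such as "&redirect" are seen too
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DUPLICATE_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def find_duplicates(urls):
    """Group URLs that become identical once the listed parameters are stripped."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_duplicate_params(url)].append(url)
    # Only groups with more than one member are duplicates.
    return {base: members for base, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sample = [  # hypothetical URLs, e.g. copied from a site: search
        "https://yourwebsite.com/my-post/",
        "https://yourwebsite.com/my-post/?replytocom=42",
        "https://yourwebsite.com/my-post/?utm_source=twitter&utm_medium=social",
        "https://yourwebsite.com/other-post/",
    ]
    for base, members in find_duplicates(sample).items():
        print(base, "<-", members)
```

Each group printed is a set of URLs Google could treat as duplicates of one page, which tells you which parameters are worth blocking.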
How to create a robots.txt file
- Create a text document in Notepad and rename it to robots.txt. The .txt extension is hidden on most computers, so make sure the file is not actually named robots.txt.txt.
- Upload robots.txt to your website's root directory and make sure you can open it at yourwebsite.com/robots.txt.
Scripts such as WordPress, Joomla and Blogger usually provide a default robots.txt when installed.
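Because crawlers only look for robots.txt at the root of the host, a quick check is to derive that root URL from any page and fetch it. A minimal Python sketch using only the standard library; `robots_url` and `check_robots` are hypothetical helper names:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt is only read from the root of the host, never from a subfolder.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_robots(page_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt; urlopen raises HTTPError (for example a 404) if it is missing."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `print(check_robots("https://yourwebsite.com/some/post/"))` should print the contents of yourwebsite.com/robots.txt, while a missing file raises `urllib.error.HTTPError`.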
What is the use of a robots.txt file
- The main use of robots.txt is to block search engines from crawling either the entire website or specific pages.
- It can block dynamic URLs that cause duplicate and low-quality content issues (for example, blocking the login page from Google).
- In WordPress, replytocom and /?s= search strings create duplicate content between search results pages and posts.
Robots.txt file example
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Allow: /
- User-agent: * means the rules apply to all search engines, such as Google, Bing, Yandex and others (except crawlers that have their own, more specific group).
- Allow permits crawling; Disallow tells search engines not to crawl.
- Mediapartners-Google is the Google AdSense crawler. An empty Disallow: line blocks nothing, so here AdSense is allowed to crawl all content.
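Python's standard `urllib.robotparser` module can show how a crawler picks its group. The sketch below uses a hypothetical variation of the example above, with `Disallow: /` in the `*` group instead of `Allow: /`, so that the effect of the empty `Disallow:` for Mediapartners-Google becomes visible:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variation of the example: every crawler except AdSense is blocked.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://yourwebsite.com/some-post/"
# The AdSense crawler matches its own group, where an empty Disallow blocks nothing.
print("Mediapartners-Google:", parser.can_fetch("Mediapartners-Google", url))  # True
# Any other crawler falls back to the * group and is blocked.
print("Bingbot:", parser.can_fetch("Bingbot", url))  # False
```

A crawler follows only the group that names it; the `*` group is just the fallback for everyone else.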
To block a directory with robots.txt, list its path, for example Disallow: /directory-name/.
Disallow: /*? blocks every URL that contains a ? anywhere in it.
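The `*` in a rule such as `Disallow: /*?` is a wildcard. Google documents `*` as matching any sequence of characters and a trailing `$` as marking the end of the URL; everything else is matched as a prefix of the URL path. Python's `urllib.robotparser` only does plain prefix matching and does not understand these wildcards, so the sketch below translates a pattern into a regular expression by hand. It is a simplified illustration of that matching, not Google's own parser, and the function names are hypothetical:

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (simplified Google-style rules).

    '*' matches any run of characters and a trailing '$' anchors the end of the URL;
    everything else is literal. Without '$', the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """path is the URL path plus query string, e.g. '/my-post/?replytocom=5'."""
    # re.match only anchors at the start, which gives the prefix behaviour.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ["/?m=1", "/my-post/?replytocom=5", "/my-post/"]:
        print("Disallow: /*?", path, rule_matches("/*?", path))
    # A directory rule also covers everything below it.
    print(rule_matches("/directory-name/", "/directory-name/child/page.html"))
```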
A customized robots.txt for WordPress
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Allow: /
Disallow: /cgi-bin/
Disallow: /page/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /*?wptheme
Disallow: /*?comments=
Disallow: /*?replytocom
Disallow: /wp-content/plugins/
Disallow: /20
Disallow: /*feed
Disallow: /*?no_redirect=true
Disallow: /?
Disallow: /search/
Disallow: /?s=
Disallow: /*?wptouch_switch

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
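When several rules in the same group match a URL, Google applies the most specific one: the rule with the longest path wins, and if an Allow and a Disallow are equally long, Allow wins. That is why `Allow: /` does not cancel the more specific Disallow lines for `User-agent: *`. The Python sketch below checks a few sample URLs against a subset of the WordPress rules under those precedence rules. Pattern length is used as a simple stand-in for specificity, and the rule subset, sample paths and function names are illustrative assumptions; note that Python's `urllib.robotparser` uses first-match order instead, so it can give different answers.

```python
import re


def _to_regex(pattern: str) -> "re.Pattern[str]":
    # Simplified Google-style syntax: '*' = any characters, trailing '$' = end of URL.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    return re.compile(".*".join(map(re.escape, core.split("*"))) + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Decide one URL path (with query) against the rules of a single user-agent group.

    The longest matching pattern wins; on a tie, allow beats disallow.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow: blocks nothing
            continue
        if _to_regex(pattern).match(path):
            # Tuples compare length first, then True (allow) > False (disallow).
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# A subset of the WordPress rules above, for the * group.
WP_RULES = [
    ("allow", "/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/*?replytocom"),
    ("disallow", "/20"),
    ("disallow", "/?s="),
    ("disallow", "/*feed"),
]

if __name__ == "__main__":
    for path in ["/my-post/", "/my-post/?replytocom=12", "/2015/06/", "/?s=robots", "/feed/"]:
        print(path, "allowed" if is_allowed(WP_RULES, path) else "blocked")
```

Because `/20` is a prefix match, it blocks date archives such as /2015/06/ but also any post whose slug starts with 20, such as /2020-review/.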
Below are optional strings, depending on your plugins and mobile URLs. If you get traffic from Twitter, Facebook, FeedBurner and RSS, Googlebot may also crawl and cache those redirected URLs. To prevent that, block the utm_source and utm_medium tracking parameters as well:
Disallow: /*?wptouch_switch=desktop&redirect
Disallow: /*?utm_source
Disallow: /*utm_medium
Disallow: /?from_index
Best robots.txt for Blogger
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /
Sitemap: http://www.theonlineking.com/sitemap.xml
In Blogger, Disallow: /search stops Google from crawling label pages and search (max-results) pages.
Disallow: /*? blocks Blogger's mobile redirect URLs such as yourwebsite.com/?m=1.
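Python's `urllib.robotparser` can also read the Blogger file, including its Sitemap line (`site_maps()` needs Python 3.8 or later). The sketch below is a quick local check; the post URL is hypothetical. Keep in mind that this parser only does prefix matching, so it ignores the `/*?` wildcard rule that Google applies to URLs such as /?m=1.

```python
from urllib.robotparser import RobotFileParser

BLOGGER_ROBOTS = """\
User-agent: *
Disallow: /search
Disallow: /*?
Allow: /
Sitemap: http://www.theonlineking.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(BLOGGER_ROBOTS.splitlines())

base = "http://www.theonlineking.com"
print(parser.site_maps())
# Label and search pages start with /search, so the prefix rule blocks them.
print(parser.can_fetch("Googlebot", base + "/search/label/SEO"))
# An ordinary post URL (hypothetical) only matches Allow: /.
print(parser.can_fetch("Googlebot", base + "/2015/06/my-post.html"))
# robotparser treats "/*?" literally, so it reports the mobile URL as allowed,
# even though Google's wildcard matching would block it.
print(parser.can_fetch("Googlebot", base + "/?m=1"))
```

For wildcard rules, test with a Google-compatible checker rather than relying on this parser's answer.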
Also see our .htaccess tutorials (you can do much more with .htaccess).