I often modify the rules in robots.txt to prevent duplicate pages from appearing in Google search results.
You can monitor this with a site:yourwebsite.com search and compare the number of posts against the number of indexed results; after making the necessary changes, submit your sitemap to Google.
You can also use Google's URL parameter tool in Search Console, which is one of the best ways to find duplicate content issues.
How to create a robots.txt file
- Create a text document using Notepad and rename it to robots.txt. Note that the .txt extension may be hidden on most computers.
- Upload robots.txt to your website's root directory.
- Make sure you can access it at yourwebsite.com/robots.txt.
- A robots.txt generally comes with the default installation of scripts like WordPress and Joomla, and with platforms like Blogger.
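As a starting point, the file content can be as simple as the sketch below. The blocked paths and the sitemap URL here are placeholder examples, not rules from this article; adjust them to your own site:

```
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-admin/

Sitemap: https://yourwebsite.com/sitemap.xml
```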
Use robots.txt with caution: if misused, it can prevent Google from crawling your site and you may lose traffic. Always test with Google Search Console before applying any rule.
Before blocking anything, compare your actual valid pages against the pages indexed in Google. Low-quality or auto-generated pages indexed in Google can cause a thin-content penalty, because they drag down overall site quality.
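Besides testing in Google Search Console, you can sanity-check rules locally before uploading the file. Here is a minimal sketch using Python's standard urllib.robotparser; the rules and URLs are hypothetical examples, not the article's recommended configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules you are considering for yourwebsite.com.
# parse() lets you test them locally before uploading robots.txt.
rules = """
User-agent: *
Disallow: /wp-login.php
Disallow: /?s=
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A normal post should remain crawlable; the login page should not.
print(rp.can_fetch("*", "https://yourwebsite.com/my-post/"))      # True: not blocked
print(rp.can_fetch("*", "https://yourwebsite.com/wp-login.php"))  # False: blocked
```

This only checks how the standard parser reads your rules; Google's own tester remains the authoritative check before going live.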
What is the use of a robots.txt file
- The main use of robots.txt is to block search engines from crawling either the entire website or specific pages.
- It can block dynamic URLs that cause duplicate and low-quality content issues (for example, blocking the login page from Google).
- In WordPress, the ?replytocom and /?s= query strings cause duplicate content between search results pages and posts.
- User-agent: * means the rules apply to all search engines: Google, Bing, Yandex, and others.
- Allow permits crawling; Disallow tells search engines not to crawl.
- In the example below, the user agent Mediapartners-Google (the AdSense crawler) has an empty Disallow, which allows it to crawl all content.
A commonly customized robots.txt for WordPress

User-agent: Mediapartners-Google
Disallow:
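Putting the points above together, a WordPress robots.txt might look like the following sketch. The exact paths and the wildcard pattern are illustrative assumptions (Google supports * wildcards in rules, but other crawlers may not):

```
# Block URL patterns that create duplicate or thin content
User-agent: *
Disallow: /?s=
Disallow: /*?replytocom=
Disallow: /wp-login.php

# Let the AdSense crawler fetch everything
User-agent: Mediapartners-Google
Disallow:
```

Test any rules like these in Google Search Console before deploying them, since a single overly broad Disallow can deindex pages you want to keep.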