How to make Newspolitan crawl your site:
Newspolitan.com is built on vertical search engine technology. Rather than tackling the generalist search problem, it addresses a more specific one: the needs of the news market. It visits websites that contain news and collects the details of each news item to build a searchable index, the Newspolitan database, which aims to cover the news published on the web.
People now use search engines every day to find news online. News websites that want more traffic and more visibility in search can list their site on Newspolitan.com. Whenever a news seeker searches with keywords that match the listed news, the news from that website is shown in the results. Many pieces go into achieving this, but one foundational step is to make sure your site is submitted to Newspolitan.com.
Please contact us to list your website and news on Newspolitan.com.
Newspolitan robot:
How to prevent Newspolitan from crawling your site:
If you wish to exclude your website from the Newspolitan index, you can place a file called robots.txt at the root of your server. This is the standard protocol that most web crawlers observe for excluding a web server or directory from an index. Please note that Newsbot does not interpret a 401 ("Unauthorized") or 403 ("Forbidden") response to a robots.txt fetch as a request not to crawl any pages on the site.
To remove your site from Newspolitan and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /
To remove your site from Newspolitan only and prevent just Newsbot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Newsbot
Disallow: /
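Before uploading either file, you may want to check how a standards-following parser reads it. The sketch below uses Python's standard-library urllib.robotparser as a stand-in; it is an assumption that Newsbot matches rules the same way, and the page URL, the "OtherBot" name, and the is_allowed helper are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two files from the text above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_NEWSBOT = "User-agent: Newsbot\nDisallow: /\n"

# Hypothetical page on your site.
page = "http://yourserver.com/news/story.html"

print(is_allowed(BLOCK_ALL, "Newsbot", page))       # False: every robot is blocked
print(is_allowed(BLOCK_ALL, "OtherBot", page))      # False
print(is_allowed(BLOCK_NEWSBOT, "Newsbot", page))   # False: only Newsbot is blocked
print(is_allowed(BLOCK_NEWSBOT, "OtherBot", page))  # True: other robots are unaffected
```

This only confirms that the file says what you intend; it does not show what Newsbot has already indexed.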
Each port must have its own robots.txt file. In particular, if you serve content over both http and https, you need a separate robots.txt file for each of these protocols. For example, to allow robots such as Newsbot to index all http pages but no https pages, use the robots.txt files below.
For the http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
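To see why the two files act independently, note that the robots.txt that governs a page is found from the page's own protocol, host, and port. The following is a minimal sketch of that lookup, again using Python's standard library rather than Newsbot's actual code; the ROBOTS_FILES table stands in for the two files above, and robots_url and may_crawl are hypothetical helper names.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Stand-in for the two files above, keyed by where each one is served.
ROBOTS_FILES = {
    "http://yourserver.com/robots.txt": "User-agent: *\nAllow: /\n",
    "https://yourserver.com/robots.txt": "User-agent: *\nDisallow: /\n",
}


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for a page: same scheme, host and port."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(user_agent: str, page_url: str) -> bool:
    """Apply the robots.txt that belongs to the page's own scheme and port."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_FILES[robots_url(page_url)].splitlines())
    return parser.can_fetch(user_agent, page_url)


print(robots_url("https://yourserver.com:8443/news/story.html"))
# https://yourserver.com:8443/robots.txt

print(may_crawl("Newsbot", "http://yourserver.com/news/story.html"))   # True
print(may_crawl("Newsbot", "https://yourserver.com/news/story.html"))  # False
```

The same story is crawlable over http and blocked over https because each URL resolves to a different robots.txt.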
Newspolitan will continue to exclude your site or directories from successive crawls as long as the robots.txt file exists in the web server root. If you do not have access to the root level of your server, you may place a robots.txt file at the same level as the files you want to remove.