Re-crawling Persistently Fails
Written by Dan Macarie
Updated over a week ago

Re-crawling fails when Nosto’s crawler bot can’t access the site from the Internet. A re-crawl is a simple GET request that simulates a page load. It is performed only once, when Nosto receives changed or conflicting details, so it doesn’t cause increased load on servers.
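To check how a page responds to this kind of request, the following minimal sketch sends a plain GET request with the crawler's User-Agent string (quoted further down in this article). The function names and the example URL are hypothetical, and the sketch only approximates the crawler: it does not reproduce Nosto's IP addresses or any other internal behaviour.

```python
import urllib.error
import urllib.request
from typing import Optional

# User-Agent string as quoted in this article.
NOSTO_USER_AGENT = (
    "Mozilla/5.0 (compatible; NostoCrawlerBot/1.0; +http://my.nosto.com/tagging)"
)


def build_crawler_like_request(url: str) -> urllib.request.Request:
    """Build a plain GET request carrying the crawler's User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": NOSTO_USER_AGENT}, method="GET"
    )


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if the site could not be reached."""
    request = build_crawler_like_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 401, 403 or 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout: the site is unreachable.
        return None
```

Run it from a machine outside your own network, for example `fetch_status("https://shop.example.com/product/1")` with one of your own product URLs. A 404 points to a removed page, a 401 or 403 to authentication or bot blocking, and `None` to an environment that is unreachable from the Internet.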

If re-crawling persistently fails, review the list of typical issues below and the tips on how to fix them.

Page Does Not Exist

If a product or page has been removed from the catalogue and is therefore inaccessible, Nosto’s crawler can’t access the page. Such products are automatically removed from Nosto’s index as out-of-stock/discontinued.

Environment Is Inaccessible From The Internet

Local Installations

If Nosto is installed on a local environment, or on an environment that doesn’t have a public address, such as localhost or http://sitename:9000, Nosto’s crawler can’t follow the mapped URL and re-crawl the page. This affects only test installations. Read the detailed guide on how to work with Nosto using development environments.
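As a rough illustration, the sketch below flags URLs that are unlikely to be reachable from the Internet. It is only a heuristic with a hypothetical function name: it looks at the hostname alone and performs no DNS lookup, so a dotted hostname that resolves to a private address would still pass.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_addressable(url: str) -> bool:
    """Heuristically decide whether a URL could be reached from the Internet."""
    host = urlsplit(url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A bare name without a dot (e.g. "sitename")
        # is assumed to be an internal hostname.
        return "." in host
    # Loopback, private and link-local addresses are not globally routable.
    return address.is_global
```

With this check, `http://localhost/product` and `http://sitename:9000` are reported as not publicly addressable, while a URL such as `https://shop.example.com/p/1` passes.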

User:Password Authentication

Nosto’s bot supports simple HTTP authentication, such as password protection configured through .htaccess, if one is applied. Send the credentials to support and include your account ID in the email.
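Before sending credentials, it can help to confirm that they work. The sketch below assumes the protection is standard HTTP Basic authentication, where the client sends the user name and password Base64-encoded in an Authorization header. The function name is hypothetical, and this shows the general mechanism, not Nosto's implementation.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"
```

For example, `basic_auth_header("user", "pass")` yields `Basic dXNlcjpwYXNz`. Sending a request with this header should return the page instead of a 401 response if the credentials are valid.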

Firewall

Nosto’s crawler bot is launched from a range of IP addresses that change randomly. Whitelisting a single IP address or a range of addresses is therefore not practical, as the addresses may have changed by the time you have whitelisted them. Read the detailed guide on how to work with Nosto using development environments.

Site Has A Landing Page Based On Geo-location

Nosto’s crawler bot is currently launched from an IP address in the US. Sites that use geo-location based landing pages, or a template page in general, may block the crawler from accessing the site directly. To fix this, allow Nosto’s crawler to bypass the landing page by adding a rule based on the User-Agent header.

Nosto’s crawler uses the following User-Agent header:

Mozilla/5.0 (compatible; NostoCrawlerBot/1.0; +http://my.nosto.com/tagging)

A similar issue may occur on sites that have a template for region and/or language selection.
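A bypass rule of this kind can be sketched as a simple check on the User-Agent header. The function name is hypothetical, and matching on the "NostoCrawlerBot" token rather than the full string is an assumption made so the rule keeps working if the version number changes. How the check is wired into your landing-page or region-selection logic depends on your platform.

```python
from typing import Optional

# Token taken from the User-Agent string quoted in this article.
NOSTO_BOT_TOKEN = "NostoCrawlerBot"


def should_skip_landing_page(user_agent: Optional[str]) -> bool:
    """Return True if the request appears to come from Nosto's crawler."""
    if not user_agent:
        return False
    return NOSTO_BOT_TOKEN.lower() in user_agent.lower()
```

Because any client can send this header, the rule should only skip the landing page or selection template and serve the normal product page. It should not grant any access that regular visitors don't have.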

Other typical errors occur when the bot reads different details depending on its location. For example, if customers from the US can shop tax-free, the bot reads prices excluding VAT. Conversely, if non-US customers can shop tax-free, the bot consistently reads prices including VAT.

Bot Traffic Not Accepted

As in the previous case, make sure that Nosto’s bot can access the site normally and without limitations.
