Allowing WeAssist to crawl your website
WeAssist indexes your website by crawling its pages, similar to how search engines work. If your website uses bot protection, a firewall, or security plugins, the crawler may be blocked from accessing your content. This prevents WeAssist from building or updating its knowledge base.
This article explains what to share with your web developer or system administrator so they can allow the WeAssist crawler through.
What the WeAssist crawler looks like
Your web developer will need one or both of the following to create an exception:
User-agent string:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0
IP address:
52.48.151.12
Whitelisting by IP is usually the quickest fix. The user-agent string is useful as a fallback, or if your security setup filters by user-agent rather than by IP address.
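To check whether the user-agent is being filtered, you or your developer can send a test request that presents the crawler's user-agent. A sketch using curl, assuming your site is at yourdomain.com (note this only simulates the user-agent; the request comes from your own IP, so it will not catch IP-based blocks):
curl -I -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0" https://yourdomain.com/
A 403, 429, or CAPTCHA challenge in the response suggests the user-agent is being blocked.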
Where to make the change
The right place to add the exception depends on how your website is protected. Common places to look:
- Firewall or CDN settings: if your site is behind a service like Cloudflare, the exception is typically added as a custom WAF rule allowing the IP address or user-agent (a sketch of such a rule follows this list)
- WordPress security plugins: plugins like Wordfence, iThemes Security, or All-In-One Security often have their own bot protection or IP blocking settings, separate from any CDN in front of the site
- Hosting provider firewall: some hosting providers include firewall rules at the server level, managed through their control panel (see the server-level sketch below)
- robots.txt: if your robots.txt contains a blanket Disallow: / rule, add an exception for the WeAssist user-agent above it (see below)
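For the Cloudflare case, the exception is typically a custom WAF rule with its action set to Skip (or Allow). A sketch of what the rule expression might look like, using the IP address and user-agent above (the exact field names assume Cloudflare's rule expression language):
(ip.src eq 52.48.151.12) or (http.user_agent eq "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0")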
In most cases, your web developer or whoever manages your website's security settings will know where to look. Share the user-agent string and IP address with them and ask them to add an allow rule for both.
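As an illustration of a server-level rule: if your site runs on nginx and access is restricted with nginx's allow/deny directives, the exception would be a single allow line placed before the existing rules, since nginx evaluates them in order. A sketch, assuming the relevant server or location block:
location / {
    allow 52.48.151.12;    # WeAssist crawler: must come before any deny rules
    # existing allow/deny rules continue here
}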
robots.txt exception
You can view your robots.txt file by visiting https://yourdomain.com/robots.txt. If it contains a rule blocking all crawlers, add the following above any existing rules:
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0
Allow: /
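For example, if your existing robots.txt blocks all crawlers, the combined file would look something like this, with the WeAssist entry first:
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0
Allow: /

User-agent: *
Disallow: /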
Note that robots.txt is a convention: it signals to well-behaved crawlers which parts of your site they should not access, but it does not actively block requests the way a firewall does. If WeAssist is not indexing your content, a firewall or security plugin is the more likely cause.
