The website data source feature lets you collect and store content from websites, which can then be used to answer your customers' questions. This guide explains how to use the Website Ingestion feature, focusing on its configuration settings and giving examples.
The following parameters configure the ingestion process:
The seed URLs are the starting points for your crawl: the crawler begins exploring links from them. The crawler only visits URLs that match a seed URL or fall under one of its sub-directories.
The excluded URLs are the ones you want the crawler to ignore. You specify them in the same way as seed URLs. The crawler does not visit URLs that match an excluded URL or fall under one of its sub-directories.
For example, if an excluded URL is https://example.com/blog/, the crawler does not explore this page or any pages under this directory.
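The seed and excluded rules above can be sketched as a small filter. This is only an illustration of the described behavior, not the crawler's actual implementation: the function names `should_visit` and `_covers` are hypothetical, and the sketch assumes that "match" means same scheme and host, with the path equal to or nested under the configured URL's path.

```python
from urllib.parse import urlsplit


def _covers(base, url):
    """Return True if `url` is `base` itself or sits under its directory.

    Assumption: matching is a scheme + host comparison followed by a
    directory-prefix check on the path.
    """
    b, u = urlsplit(base), urlsplit(url)
    if (b.scheme, b.netloc) != (u.scheme, u.netloc):
        return False
    base_path = b.path.rstrip("/")
    url_path = u.path.rstrip("/")
    # Appending "/" keeps /blogging from matching /blog.
    return url_path == base_path or url_path.startswith(base_path + "/")


def should_visit(url, seed_urls, excluded_urls):
    """Apply the excluded rule first, then require a matching seed."""
    if any(_covers(excluded, url) for excluded in excluded_urls):
        return False
    return any(_covers(seed, url) for seed in seed_urls)


seeds = ["https://example.com/"]
excluded = ["https://example.com/blog/"]

print(should_visit("https://example.com/docs/intro", seeds, excluded))
print(should_visit("https://example.com/blog/post-1", seeds, excluded))
```

With these settings, the first call allows the docs page and the second rejects the blog post, because it sits under the excluded directory.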
The sitemap URLs point to sitemaps from which the crawler can fetch a list of URLs to visit, for example https://example.com/sitemap.xml.
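To show what a list of URLs fetched from a sitemap looks like, here is a minimal sketch that reads the `<loc>` entries from a sitemap in the standard sitemaps.org XML format. The function name `extract_sitemap_urls` and the sample document are hypothetical, and the sketch does not claim to reflect how the crawler itself handles sitemaps (for instance, sitemap index files are not covered).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, inlined so the example needs no network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""


def extract_sitemap_urls(xml_text):
    """Return the page URLs listed in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


for page_url in extract_sitemap_urls(SAMPLE_SITEMAP):
    print(page_url)
```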
Moveo crawls websites every 24 hours.