
[Self-Host] call to playwright is failing #902

Open
rostwal95 opened this issue Nov 15, 2024 · 21 comments

@rostwal95

Describe the Issue
The call to Playwright fails when trying to scrape a page with the Playwright engine.

To Reproduce
Steps to reproduce the issue:

  1. Configure the environment or settings with '...'
  2. Run the command '...'
  3. Observe the error or unexpected output at '...'
  4. Log output/error message

Expected Behavior
The call to Playwright should succeed, and dynamic JavaScript should be rendered and cleaned up.

Screenshots
If applicable, add screenshots or copies of the command line output to help explain the self-hosting issue.

Environment (please complete the following information):

  • OS: [e.g. macOS, Linux, Windows]
  • Firecrawl Version: [e.g. 1.2.3]
  • Node.js Version: [e.g. 14.x]
  • Docker Version (if applicable): [e.g. 20.10.14]
  • Database Type and Version: [e.g. PostgreSQL 13.4]

Logs
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:]: Engine docx meets feature priority threshold
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: Scraping via playwright...
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:scrapeURLWithPlaywright]: Sending request...
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:scrapeURLWithPlaywright]: Request sent failure status
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: Scraping via fetch...

here are the logs

Configuration
Provide relevant parts of your configuration files (with sensitive information redacted).

Additional Context
Add any other context about the self-hosting issue here, such as specific infrastructure details, network setup, or any modifications made to the original Firecrawl setup.

@mogery
Member

mogery commented Nov 15, 2024

Can you share the logs of the playwright microservice as well?

@mogery mogery self-assigned this Nov 15, 2024
@mkaskov

mkaskov commented Nov 15, 2024

Same problem here, with apps/playwright-service-ts:

playwright-service-1 | SyntaxError: Unexpected token " in JSON at position 0
playwright-service-1 | at JSON.parse ()
playwright-service-1 | at createStrictSyntaxError (/usr/src/app/node_modules/body-parser/lib/types/json.js:169:10)
playwright-service-1 | at parse (/usr/src/app/node_modules/body-parser/lib/types/json.js:86:15)
playwright-service-1 | at /usr/src/app/node_modules/body-parser/lib/read.js:128:18
playwright-service-1 | at AsyncResource.runInAsyncScope (node:async_hooks:203:9)
playwright-service-1 | at invokeCallback (/usr/src/app/node_modules/raw-body/index.js:238:16)
playwright-service-1 | at done (/usr/src/app/node_modules/raw-body/index.js:227:7)
playwright-service-1 | at IncomingMessage.onEnd (/usr/src/app/node_modules/raw-body/index.js:287:7)
playwright-service-1 | at IncomingMessage.emit (node:events:517:28)
playwright-service-1 | at endReadableNT (node:internal/streams/readable:1400:12)
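The `Unexpected token " in JSON at position 0` above is body-parser's strict JSON parser rejecting a body whose first non-whitespace character is not `{` or `[` — i.e. the worker was sending something that is not a JSON object (for example a double-stringified payload, which would start with exactly the `"` token in the stack trace). A minimal sketch of the failure mode (the parse logic is a simplification of body-parser's strict mode; the `wait_after_load` field name is an assumption about the service's expected body):

```typescript
// Simplified model of body-parser's strict JSON handling: any body whose
// first non-whitespace character is not "{" or "[" is rejected with the
// same "Unexpected token ... in JSON at position 0" seen in the logs.
function parseLikeBodyParser(raw: string): unknown {
  const first = raw.trim()[0];
  if (first !== "{" && first !== "[") {
    throw new SyntaxError(`Unexpected token ${first} in JSON at position 0`);
  }
  return JSON.parse(raw);
}

// What the playwright service expects (field names are assumptions):
const goodBody = JSON.stringify({ url: "https://example.com", wait_after_load: 0 });

// A double-stringified payload starts with a quote -- exactly the token
// reported in the stack trace above -- and strict mode rejects it.
const badBody = JSON.stringify(goodBody);
```

So the client-side fix is to `JSON.stringify` the body exactly once and send it with `Content-Type: application/json`, which matches mogery's note below that the request to the microservice was being sent wrong.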

@mogery
Member

mogery commented Nov 15, 2024

I just made a change; I think the way we sent the request to the microservice was wrong. Can you rebuild Firecrawl (no need to rebuild playwright-service) and try again?

@rostwal95
Author

I am getting errors while building the Docker container as well:

=> ERROR [playwright-service 2/6] RUN apt-get update && apt-get install -y --no-install-recommends gcc libstdc++6 0.9s

[playwright-service 2/6] RUN apt-get update && apt-get install -y --no-install-recommends gcc libstdc++6:
0.539 Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
0.645 Err:1 http://deb.debian.org/debian bookworm InRelease
0.645 At least one invalid signature was encountered.
0.648 Get:2 http://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
0.677 Err:2 http://deb.debian.org/debian bookworm-updates InRelease
0.677 At least one invalid signature was encountered.
0.693 Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
0.717 Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
0.717 At least one invalid signature was encountered.
0.722 Reading package lists...
0.728 W: GPG error: http://deb.debian.org/debian bookworm InRelease: At least one invalid signature was encountered.
0.728 E: The repository 'http://deb.debian.org/debian bookworm InRelease' is not signed.
0.728 W: GPG error: http://deb.debian.org/debian bookworm-updates InRelease: At least one invalid signature was encountered.
0.728 E: The repository 'http://deb.debian.org/debian bookworm-updates InRelease' is not signed.
0.728 W: GPG error: http://deb.debian.org/debian-security bookworm-security InRelease: At least one invalid signature was encountered.
0.728 E: The repository 'http://deb.debian.org/debian-security bookworm-security InRelease' is not signed.


failed to solve: process "/bin/sh -c apt-get update && apt-get install -y --no-install-recommends gcc libstdc++6" did not complete successfully: exit code: 100

@rostwal95
Author

rostwal95 commented Nov 15, 2024

I still see the issue. I'm also not sure why the log level isn't marked as error:

worker-1 | 2024-11-15 16:23:58 info [:]: 🐂 Worker taking job b2c3e207-55ca-4abb-8be1-57a0b1b88cd2
worker-1 | 2024-11-15 16:23:58 info [ScrapeURL:]: Scraping URL "https://www.britishairways.com/travel/home/public/en_us/"...
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine scrapingbee meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine scrapingbeeLoad meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine playwright meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine fetch meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine pdf meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 debug [ScrapeURL:]: Engine docx meets feature priority threshold
worker-1 | 2024-11-15 16:23:58 info [ScrapeURL:]: Scraping via scrapingbee...
worker-1 | 2024-11-15 16:23:59 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"b2c3e207-55ca-4abb-8be1-57a0b1b88cd2","method":"","engine":"scrapingbee","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: Engine scrapingbee could not scrape the page.
worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: Scraping via scrapingbeeLoad...
worker-1 | 2024-11-15 16:23:59 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"b2c3e207-55ca-4abb-8be1-57a0b1b88cd2","method":"","engine":"scrapingbeeLoad","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: Engine scrapingbeeLoad could not scrape the page.
worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: Scraping via playwright...
worker-1 | 2024-11-15 16:23:59 debug [ScrapeURL:scrapeURLWithPlaywright]: Sending request...
worker-1 | 2024-11-15 16:23:59 debug [ScrapeURL:scrapeURLWithPlaywright]: Request failed
worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.

worker-1 | 2024-11-15 16:23:59 info [ScrapeURL:]: Scraping via fetch...
worker-1 | 2024-11-15 16:24:01 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer deriveHTMLFromRawHTML...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer deriveHTMLFromRawHTML (7ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer deriveMarkdownFromHTML...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer deriveMarkdownFromHTML (1ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer deriveLinksFromHTML...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer deriveLinksFromHTML (0ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer deriveMetadataFromRawHTML...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer deriveMetadataFromRawHTML (4ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer uploadScreenshot...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer uploadScreenshot (0ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer performLLMExtract...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer performLLMExtract (0ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer coerceFieldsToFormats...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer coerceFieldsToFormats (0ms)
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Executing transformer removeBase64Images...
worker-1 | 2024-11-15 16:24:01 debug [ScrapeURL:]: Finished executing transformer removeBase64Images (0ms)
worker-1 | 2024-11-15 16:24:01 info [:]: 🐂 Job done b2c3e207-55ca-4abb-8be1-57a0b1b88cd2

The response has empty markdown:

{
  "success": true,
  "data": {
    "markdown": "",
    "metadata": {
      "title": "British Airways | Book Flights, Holidays, City Breaks & Check In Online",
      "description": "Save on worldwide flights and holidays when you book directly with British Airways. Browse our guides, find great deals, manage your booking and check in online.",
      "language": "en",
      "robots": "all",
      "ogLocaleAlternate": [],
      "theme-color": "#ffffff",
      "viewport": "width=device-width, initial-scale=1",
      "sourceURL": "https://www.britishairways.com/travel/home/public/en_us/",
      "url": "https://www.britishairways.com/travel/home/public/en_us/",
      "statusCode": 200
    }
  }
}

@mkaskov

mkaskov commented Nov 20, 2024

Another error. After it happens, Firecrawl no longer works correctly:

worker-1 | 2024-11-20 06:40:56 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-20 06:40:56 info [ScrapeURL:]: Scraping via fetch...
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-11-20 06:40:57 info [:]: 🐂 Job done 79431bc8-736d-4379-bf0d-ddae76e0dabe
api-1 | 2024-11-20 06:40:57 warn [:]: You're bypassing authentication {}
playwright-service-1 | ✅ Scrape successful!
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: Scraping via fetch...
playwright-service-1 | ✅ Scrape successful!
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: Scraping via fetch...
playwright-service-1 | ✅ Scrape successful!
worker-1 | 2024-11-20 06:40:57 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-11-20 06:40:58 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-20 06:40:58 info [ScrapeURL:]: Scraping via fetch...
worker-1 | 2024-11-20 06:40:58 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-11-20 06:40:58 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-11-20 06:40:58 info [:]: 🐂 Job done 27e3c51f-1b4a-45a9-8b4f-abe61f67ac8a
worker-1 | 2024-11-20 06:40:58 info [:]: 🐂 Job done e2ec67bb-740b-40fa-803d-4918ced6006c
worker-1 | 2024-11-20 06:40:58 info [:]: 🐂 Job done a252a1c1-aa4c-4f0c-960b-c379292cb997
worker-1 | 2024-11-20 06:40:59 info [:]: 🐂 Worker taking job 0ad857e1-70f6-4c2d-9255-2c890f207c5a
worker-1 | 2024-11-20 06:40:59 error [:]: 🐂 Job errored 0ad857e1-70f6-4c2d-9255-2c890f207c5a - TypeError: Cannot read properties of undefined (reading 'timeout') {}
worker-1 | 2024-11-20 06:40:59 error [:]: undefined {}
worker-1 | 2024-11-20 06:40:59 error [:]: TypeError: Cannot read properties of undefined (reading 'timeout')
worker-1 | at processJob (/app/dist/src/services/queue-worker.js:249:40)
worker-1 | at processJobInternal (/app/dist/src/services/queue-worker.js:65:30)
worker-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {}
worker-1 | /app/dist/src/main/runWebScraper.js:18
worker-1 | formats: job.data.scrapeOptions.formats.concat(["rawHtml"]),
worker-1 | ^
worker-1 |
worker-1 | TypeError: Cannot read properties of undefined (reading 'formats')
worker-1 | at startWebScraperPipeline (/app/dist/src/main/runWebScraper.js:18:49)
worker-1 | at processJob (/app/dist/src/services/queue-worker.js:245:57)
worker-1 | at processJobInternal (/app/dist/src/services/queue-worker.js:65:30)
worker-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
worker-1 |
worker-1 | Node.js v20.18.0
worker-1 exited with code 1
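The `Cannot read properties of undefined (reading 'timeout')` crash above happens because `processJob` dereferences `job.data.scrapeOptions` without checking it exists, and `runWebScraper.js:18` then does the same with `.formats`, taking down the worker process. A hypothetical defensive sketch — names mirror the stack trace, and the actual fix may well differ:

```typescript
// Guarding the dereference from runWebScraper.js:18 with optional
// chaining / defaults, so a job enqueued without scrapeOptions degrades
// gracefully instead of killing the worker with a TypeError.
interface ScrapeOptions { formats?: string[]; timeout?: number }
interface JobData { scrapeOptions?: ScrapeOptions }

function resolveFormats(data: JobData): string[] {
  // Fall back to an empty options object instead of crashing on undefined.
  const opts = data.scrapeOptions ?? {};
  return (opts.formats ?? []).concat(["rawHtml"]);
}
```

Whether the right fix is defaulting like this or rejecting malformed job data at enqueue time is a design call for the maintainers; the sketch only shows where the undefined access occurs.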

@lauridskern

same issue for me

@fatwang2

same issue

@shingoxray

same issue +1

@zhucan

zhucan commented Dec 3, 2024

same issue +1

@Hanxiao-Adam-Qi

Same issue ("info [ScrapeURL:]: An unexpected error happened while scraping with playwright."), with both the original playwright service and playwright-service-ts.

@riddlegit

riddlegit commented Dec 9, 2024

Cannot read properties of undefined (reading 'timeout')

I guess this kind of error is caused by missing job-config properties; maybe try adding a "timeout" property to the JSON job data or scrapeOptions.
https://docs.firecrawl.dev/v1-welcome
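As a sketch of that suggestion (field names are assumptions based on the v1 docs linked above, not verified against this deployment):

```typescript
// Hypothetical scrape request with an explicit "timeout", so that
// job.data.scrapeOptions.timeout is defined when the worker reads it.
const scrapeRequest = {
  url: "https://www.britishairways.com/travel/home/public/en_us/",
  formats: ["markdown"],
  timeout: 30000, // ms; the property the worker fails to read when missing
};

// Serialized once for a POST body with Content-Type: application/json.
const requestBody = JSON.stringify(scrapeRequest);
```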

@lune-sta

same issue

@mogery
Member

mogery commented Dec 15, 2024

Hey y'all! This should be fixed by #977 which we just merged. Can you re-test?

@rostwal95
Author

playwright-service-1 | [2024-12-15 15:17:25 +0000] [10] [INFO] Running on http://[::]:3000 (CTRL + C to quit)
api-1 | 2024-12-15 15:18:09 warn [:]: You're bypassing authentication {}
api-1 | 2024-12-15 15:18:09 warn [:]: You're bypassing authentication {}
worker-1 | 2024-12-15 15:18:09 info [queue-worker:processJob]: 🐂 Worker taking job b4219ad0-71ee-4551-9a4a-923eaa71d301
worker-1 | 2024-12-15 15:18:09 info [ScrapeURL:]: Scraping URL "https://www.britishairways.com/travel/home/public/en_us/"...
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine scrapingbee meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine scrapingbeeLoad meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine playwright meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine fetch meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine pdf meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 debug [ScrapeURL:]: Engine docx meets feature priority threshold
worker-1 | 2024-12-15 15:18:09 info [ScrapeURL:]: Scraping via scrapingbee...
worker-1 | 2024-12-15 15:18:10 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"b4219ad0-71ee-4551-9a4a-923eaa71d301","scrapeURL":"https://www.britishairways.com/travel/home/public/en_us/","method":"","engine":"scrapingbee","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: Engine scrapingbee could not scrape the page.
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: Scraping via scrapingbeeLoad...
worker-1 | 2024-12-15 15:18:10 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"b4219ad0-71ee-4551-9a4a-923eaa71d301","scrapeURL":"https://www.britishairways.com/travel/home/public/en_us/","method":"","engine":"scrapingbeeLoad","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: Engine scrapingbeeLoad could not scrape the page.
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: Scraping via playwright...
worker-1 | 2024-12-15 15:18:10 debug [ScrapeURL:scrapeURLWithPlaywright]: Request failed
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-12-15 15:18:10 info [ScrapeURL:]: Scraping via fetch...
worker-1 | 2024-12-15 15:18:11 info [ScrapeURL:]: Scrape via fetch deemed successful.
worker-1 | 2024-12-15 15:18:11 debug [ScrapeURL:]: Executed transformers.
worker-1 | 2024-12-15 15:18:11 info [queue-worker:processJob]: 🐂 Job done b4219ad0-71ee-4551-9a4a-923eaa71d301
worker-1 | 2024-12-15 15:18:11 debug [queue-worker:processJobInternal]: Job succeeded -- putting result in Redis
api-1 | 2024-12-15 15:18:11 warn [:]: You're bypassing authentication {}

Pulled the latest code; still the same.

@rostwal95
Author

api-1 | 2024-12-15 18:10:43 warn [:]: You're bypassing authentication {}
api-1 | 2024-12-15 18:10:43 warn [:]: You're bypassing authentication {}
api-1 | 2024-12-15 18:10:43 debug [api/v1:crawlController]: Crawl 130c2cee-6bf8-417b-a25f-7cfcb7152680 starting
api-1 | 2024-12-15 18:10:43 debug [api/v1:crawlController]: Determined limit: 10000
api-1 | 2024-12-15 18:10:48 debug [api/v1:crawlController]: Failed to get robots.txt (this is probably fine!)
api-1 | 2024-12-15 18:10:48 debug [crawl-redis:saveCrawl]: Saving crawl 130c2cee-6bf8-417b-a25f-7cfcb7152680 to Redis...
api-1 | 2024-12-15 18:10:48 debug [WebCrawler:tryGetSitemap]: Fetching sitemap links from https://www.britishairways.com/travel/home/public/en_us/
api-1 | 2024-12-15 18:10:53 debug [WebCrawler:tryFetchSitemapLinks]: Failed to fetch sitemap with axios from https://www.britishairways.com/travel/home/public/en_us//sitemap.xml
api-1 | 2024-12-15 18:10:53 info [ScrapeURL:]: Scraping URL "https://www.britishairways.com/travel/home/public/en_us//sitemap.xml"...
api-1 | 2024-12-15 18:10:53 debug [ScrapeURL:]: Engine fire-engine;tlsclient meets feature priority threshold
api-1 | 2024-12-15 18:10:53 info [ScrapeURL:]: Scraping via fire-engine;tlsclient...
api-1 | 2024-12-15 18:10:53 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 2 more times
api-1 | 2024-12-15 18:10:53 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 1 more times
api-1 | 2024-12-15 18:10:53 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed
api-1 | 2024-12-15 18:10:53 info [ScrapeURL:]: An unexpected error happened while scraping with fire-engine;tlsclient.
api-1 | 2024-12-15 18:10:53 warn [ScrapeURL:]: scrapeURL: All scraping engines failed! {"module":"ScrapeURL","scrapeId":"sitemap","scrapeURL":"https://www.britishairways.com/travel/home/public/en_us//sitemap.xml","error":{"fallbackList":["fire-engine;tlsclient"],"results":{"fire-engine;tlsclient":{"state":"error","error":{"name":"Error","message":"Request failed","stack":"Error: Request failed\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:81:23)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"params":{"url":"undefined/scrape","logger":{},"method":"POST","body":{"url":"https://www.britishairways.com/travel/home/public/en_us//sitemap.xml","engine":"tlsclient","instantReturn":true,"disableJsDom":true,"timeout":30000},"headers":{},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"}},"typeName":"ZodObject"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"requestId":"43b53df2-a334-4c6e-8f7a-fcf31d8af173","error":{"name":"TypeError","message":"Failed to parse URL from undefined/scrape","stack":"TypeError: Failed to parse URL from undefined/scrape\n at node:internal/deps/undici/undici:13392:13\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:19)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"code":"ERR_INVALID_URL","input":"undefined/scrape","name":"TypeError","message":"Invalid URL","stack":"TypeError: Invalid URL\n at new URL (node:internal/url:806:29)\n at new Request (node:internal/deps/undici/undici:9474:25)\n at fetch (node:internal/deps/undici/undici:10203:25)\n at fetch (node:internal/deps/undici/undici:13390:10)\n at fetch (node:internal/bootstrap/web/exposed-window-or-worker:72:12)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:25)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:30)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)"}}}},"unexpected":true,"startedAt":1734286253679,"finishedAt":1734286253690}},"name":"Error","message":"All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected].","stack":"Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. 
If the issue persists, contact us at [email protected].\n at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:211:15)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)"}}
api-1 | 2024-12-15 18:10:53 error [WebCrawler:getLinksFromSitemap]: Request failed for https://www.britishairways.com/travel/home/public/en_us//sitemap.xml {"crawlId":"130c2cee-6bf8-417b-a25f-7cfcb7152680","module":"WebCrawler","method":"getLinksFromSitemap","mode":"fire-engine","sitemapUrl":"https://www.britishairways.com/travel/home/public/en_us//sitemap.xml","error":{"fallbackList":["fire-engine;tlsclient"],"results":{"fire-engine;tlsclient":{"state":"error","error":{"name":"Error","message":"Request failed","stack":"Error: Request failed\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:81:23)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"params":{"url":"undefined/scrape","logger":{},"method":"POST","body":{"url":"https://www.britishairways.com/travel/home/public/en_us//sitemap.xml","engine":"tlsclient","instantReturn":true,"disableJsDom":true,"timeout":30000},"headers":{},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"}},"typeName":"ZodObject"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"requestId":"43b53df2-a334-4c6e-8f7a-fcf31d8af173","error":{"name":"TypeError","message":"Failed to parse URL from undefined/scrape","stack":"TypeError: Failed to parse URL from undefined/scrape\n at node:internal/deps/undici/undici:13392:13\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:19)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"code":"ERR_INVALID_URL","input":"undefined/scrape","name":"TypeError","message":"Invalid URL","stack":"TypeError: Invalid URL\n at new URL (node:internal/url:806:29)\n at new Request (node:internal/deps/undici/undici:9474:25)\n at fetch (node:internal/deps/undici/undici:10203:25)\n at fetch (node:internal/deps/undici/undici:13390:10)\n at fetch (node:internal/bootstrap/web/exposed-window-or-worker:72:12)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:25)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:30)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)"}}}},"unexpected":true,"startedAt":1734286253679,"finishedAt":1734286253690}},"name":"Error","message":"All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected].","stack":"Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. 
If the issue persists, contact us at [email protected].\n at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:211:15)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:390:34)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)"}}
api-1 | 2024-12-15 18:10:58 debug [WebCrawler:tryFetchSitemapLinks]: Failed to fetch sitemap from https://www.britishairways.com/sitemap.xml
api-1 | 2024-12-15 18:10:58 info [ScrapeURL:]: Scraping URL "https://www.britishairways.com/sitemap.xml"...
api-1 | 2024-12-15 18:10:58 debug [ScrapeURL:]: Engine fire-engine;tlsclient meets feature priority threshold
api-1 | 2024-12-15 18:10:58 info [ScrapeURL:]: Scraping via fire-engine;tlsclient...
api-1 | 2024-12-15 18:10:58 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 2 more times
api-1 | 2024-12-15 18:10:58 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed, trying 1 more times
api-1 | 2024-12-15 18:10:58 debug [ScrapeURL:fireEngineScrape/robustFetch]: Request failed
api-1 | 2024-12-15 18:10:58 info [ScrapeURL:]: An unexpected error happened while scraping with fire-engine;tlsclient.
api-1 | 2024-12-15 18:10:58 warn [ScrapeURL:]: scrapeURL: All scraping engines failed! {"module":"ScrapeURL","scrapeId":"sitemap","scrapeURL":"https://www.britishairways.com/sitemap.xml","error":{"fallbackList":["fire-engine;tlsclient"],"results":{"fire-engine;tlsclient":{"state":"error","error":{"name":"Error","message":"Request failed","stack":"Error: Request failed\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:81:23)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"params":{"url":"undefined/scrape","logger":{},"method":"POST","body":{"url":"https://www.britishairways.com/sitemap.xml","engine":"tlsclient","instantReturn":true,"disableJsDom":true,"timeout":30000},"headers":{},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"}},"typeName":"ZodObject"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"requestId":"d3539c0b-ce44-4843-b532-5894a8d9ffb1","error":{"name":"TypeError","message":"Failed to parse URL from undefined/scrape","stack":"TypeError: Failed to parse URL from undefined/scrape\n at node:internal/deps/undici/undici:13392:13\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:19)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"code":"ERR_INVALID_URL","input":"undefined/scrape","name":"TypeError","message":"Invalid 
URL","stack":"TypeError: Invalid URL\n at new URL (node:internal/url:806:29)\n at new Request (node:internal/deps/undici/undici:9474:25)\n at fetch (node:internal/deps/undici/undici:10203:25)\n at fetch (node:internal/deps/undici/undici:13390:10)\n at fetch (node:internal/bootstrap/web/exposed-window-or-worker:72:12)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:25)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:30)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)"}}}},"unexpected":true,"startedAt":1734286258707,"finishedAt":1734286258717}},"name":"Error","message":"All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected].","stack":"Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. 
If the issue persists, contact us at [email protected].\n at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:211:15)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)"}}
api-1 | 2024-12-15 18:10:58 error [WebCrawler:getLinksFromSitemap]: Request failed for https://www.britishairways.com/sitemap.xml {"crawlId":"130c2cee-6bf8-417b-a25f-7cfcb7152680","module":"WebCrawler","method":"getLinksFromSitemap","mode":"fire-engine","sitemapUrl":"https://www.britishairways.com/sitemap.xml","error":{"fallbackList":["fire-engine;tlsclient"],"results":{"fire-engine;tlsclient":{"state":"error","error":{"name":"Error","message":"Request failed","stack":"Error: Request failed\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:81:23)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController 
(/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"params":{"url":"undefined/scrape","logger":{},"method":"POST","body":{"url":"https://www.britishairways.com/sitemap.xml","engine":"tlsclient","instantReturn":true,"disableJsDom":true,"timeout":30000},"headers":{},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"}},"typeName":"ZodObject"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"requestId":"d3539c0b-ce44-4843-b532-5894a8d9ffb1","error":{"name":"TypeError","message":"Failed to parse URL from undefined/scrape","stack":"TypeError: Failed to parse URL from undefined/scrape\n at node:internal/deps/undici/undici:13392:13\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:19)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)","cause":{"code":"ERR_INVALID_URL","input":"undefined/scrape","name":"TypeError","message":"Invalid 
URL","stack":"TypeError: Invalid URL\n at new URL (node:internal/url:806:29)\n at new Request (node:internal/deps/undici/undici:9474:25)\n at fetch (node:internal/deps/undici/undici:10203:25)\n at fetch (node:internal/deps/undici/undici:13390:10)\n at fetch (node:internal/bootstrap/web/exposed-window-or-worker:72:12)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:45:25)\n at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:30)\n at async robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:73:24)\n at async /app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:43:16\n at async fireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/scrape.js:37:27)\n at async performFireEngineScrape (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:37:20)\n at async scrapeURLWithFireEngineTLSClient (/app/dist/src/scraper/scrapeURL/engines/fire-engine/index.js:214:20)\n at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:294:12)\n at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:121:35)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)"}}}},"unexpected":true,"startedAt":1734286258707,"finishedAt":1734286258717}},"name":"Error","message":"All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [email protected].","stack":"Error: All scraping engines failed! -- Double check the URL to make sure it's not broken. 
If the issue persists, contact us at [email protected].\n at scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:211:15)\n at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:249:24)\n at async getLinksFromSitemap (/app/dist/src/scraper/WebScraper/sitemap.js:22:34)\n at async WebCrawler.tryFetchSitemapLinks (/app/dist/src/scraper/WebScraper/crawler.js:416:36)\n at async WebCrawler.tryGetSitemap (/app/dist/src/scraper/WebScraper/crawler.js:173:30)\n at async crawlController (/app/dist/src/controllers/v1/crawl.js:91:11)"}}

Tried a crawl on the same URL.
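The stack trace above bottoms out in TypeError: Failed to parse URL from undefined/scrape, i.e. the fire-engine base URL was undefined at the moment the request URL was built, which suggests a missing configuration value rather than a network failure. A minimal sketch of that failure mode (the variable below is illustrative, not Firecrawl's actual config name):

```typescript
// Reproducing the failure mode from the stack trace: when a base-URL
// setting is missing, template concatenation yields the literal string
// "undefined/scrape", which the URL parser (and therefore fetch) rejects.
const baseUrl: string | undefined = undefined; // stands in for an unset env var
const requestUrl = `${baseUrl}/scrape`;

let parseError: Error | null = null;
try {
  new URL(requestUrl); // the same validation fetch() performs internally
} catch (err) {
  parseError = err as Error;
}

console.log(requestUrl); // "undefined/scrape"
console.log(parseError === null ? "parsed ok" : "URL parse failed");
```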

@rostwal95
Author

@mogery Is there any update on this one?

worker-1 | 2025-01-02 11:26:56 info [ScrapeURL:]: Scraping via scrapingbee...
worker-1 | 2025-01-02 11:26:57 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"54e64e5e-d859-4b06-890f-2fac04054b1e","scrapeURL":"https://www.britishairways.com/travel/home/public/en_us/","method":"","engine":"scrapingbee","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: Engine scrapingbee could not scrape the page.
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: Scraping via scrapingbeeLoad...
worker-1 | 2025-01-02 11:26:57 error [ScrapeURL:]: ScrapingBee threw an error {"module":"ScrapeURL","scrapeId":"54e64e5e-d859-4b06-890f-2fac04054b1e","scrapeURL":"https://www.britishairways.com/travel/home/public/en_us/","method":"","engine":"scrapingbeeLoad","body":{"message":"Invalid api key: # use if you'd like to use as a fallback scraper"}}
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: Engine scrapingbeeLoad could not scrape the page.
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: Scraping via playwright...
worker-1 | 2025-01-02 11:26:57 debug [ScrapeURL:scrapeURLWithPlaywright]: Request failed
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2025-01-02 11:26:57 info [ScrapeURL:]: Scraping via fetch...
worker-1 | 2025-01-02 11:26:58 info [ScrapeURL:]: Scrape via fetch deemed successful.

@Wanli063

Wanli063 commented Jan 7, 2025

Describe the Issue
Call to playwright fails when trying to scrape with playwright.

To Reproduce
Steps to reproduce the issue:

  1. Configure the environment or settings with '...'
  2. Run the command '...'
  3. Observe the error or unexpected output at '...'
  4. Log output/error message

Expected Behavior
The call to playwright should be successful and dynamic js should be rendered and cleaned up.

Screenshots
If applicable, add screenshots or copies of the command line output to help explain the self-hosting issue.

Environment (please complete the following information):

  • OS: [e.g. macOS, Linux, Windows]
  • Firecrawl Version: [e.g. 1.2.3]
  • Node.js Version: [e.g. 14.x]
  • Docker Version (if applicable): [e.g. 20.10.14]
  • Database Type and Version: [e.g. PostgreSQL 13.4]

Logs
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:]: Engine docx meets feature priority threshold
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: Scraping via playwright...
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:scrapeURLWithPlaywright]: Sending request...
worker-1 | 2024-11-15 05:13:48 debug [ScrapeURL:scrapeURLWithPlaywright]: Request sent failure status
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: An unexpected error happened while scraping with playwright.
worker-1 | 2024-11-15 05:13:48 info [ScrapeURL:]: Scraping via fetch...

here are the logs

Configuration
Provide relevant parts of your configuration files (with sensitive information redacted).

Additional Context
Add any other context about the self-hosting issue here, such as specific infrastructure details, network setup, or any modifications made to the original Firecrawl setup.

@Wanli063

Wanli063 commented Jan 7, 2025

Same issue here, +1. Is there any solution?

@wesselhuising

I think there is a problem with the Zod validation of the response. When setting the log level to DEBUG, the following error pops up:

worker-1 | 2025-01-11 11:13:21 debug [ScrapeURL:scrapeURLWithPlaywright]: Response does not match provided schema

The /html POST endpoint is working, and the response looks "good" to me; it appears to match the Zod response schema. I might be missing something here, as I am new to this project, but my feeling is that the problem lies in the Zod schema validation of the response from the POST call to the /html route.
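The schema in question (posted in a follow-up comment) requires content to be a string and pageStatusCode to be a number. A dependency-free sketch of the same strictness shows how a response can "look good" in the logs yet still fail validation; one plausible mismatch, used for illustration here, is a status code serialized as a string:

```typescript
// A strict validator mimicking the engine's Zod schema: content must be a
// string and pageStatusCode must be a number. A numeric string such as
// "200" is rejected, even though it looks fine when the body is printed.
function validate(body: unknown): { ok: boolean; reason?: string } {
  const o = body as Record<string, unknown> | null;
  if (o === null || typeof o !== "object") return { ok: false, reason: "body is not an object" };
  if (typeof o.content !== "string") return { ok: false, reason: "content is not a string" };
  if (typeof o.pageStatusCode !== "number") return { ok: false, reason: "pageStatusCode is not a number" };
  if (o.pageError !== undefined && typeof o.pageError !== "string") {
    return { ok: false, reason: "pageError is not a string" };
  }
  return { ok: true };
}

const stringCode = validate({ content: "<html></html>", pageStatusCode: "200" });
const numberCode = validate({ content: "<html></html>", pageStatusCode: 200 });

console.log(stringCode); // rejected: pageStatusCode is not a number
console.log(numberCode); // accepted
```

Logging the actual parse error (e.g. via Zod's safeParse) would confirm which field is the culprit in a given setup.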

@wesselhuising

Update: removing the Zod response validation in the engine file (scraper/scrapeURL/engines/playwright/index.ts) fixes the issue.

This is the failing code:

      schema: z.object({
        content: z.string(),
        pageStatusCode: z.number(),
        pageError: z.string().optional()
      }),
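Removing the validation works as a workaround, but it also drops the safety the schema provided. If the mismatch is a type-level one, for example a numeric field arriving as a string, a middle ground is to normalize the body before the strict check. This is a sketch under that assumption, not Firecrawl's actual code (Zod itself offers z.coerce.number() for the same purpose):

```typescript
// Coerce pageStatusCode to a number before strict validation, so a
// service that serializes it as "200" still produces a valid object.
function normalize(body: Record<string, unknown>): Record<string, unknown> {
  const code = body.pageStatusCode;
  if (typeof code === "string" && code.trim() !== "" && !Number.isNaN(Number(code))) {
    return { ...body, pageStatusCode: Number(code) };
  }
  return body; // leave anything else untouched for the schema to judge
}

const normalized = normalize({ content: "<html></html>", pageStatusCode: "200" });
console.log(typeof normalized.pageStatusCode); // "number"
```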
