[Self-Host] Does self-host support scraping web pages written in Vue.js or React.js? #1027

Open
wangping886 opened this issue Dec 30, 2024 · 13 comments

Comments

@wangping886

When I call http://localhost:3002/v1/scrape, no content is returned. What should the request body be set to in order to get content?
Screenshots
image

Environment (please complete the following information):

  • Docker Version (if applicable): [e.g. 20.10.14]
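
For reference, a minimal sketch of a request to the self-hosted scrape endpoint. It assumes the v1 API accepts a JSON body with a `url` field and an optional `formats` list; the helper names `build_scrape_request` and `send` are made up for this example, and the base URL is the one from the question above.

```python
import json
import urllib.request


def build_scrape_request(page_url, base_url="http://localhost:3002"):
    """Build (but do not send) a POST request for the /v1/scrape endpoint.

    Assumption: the body is JSON with "url" and an optional "formats" list.
    """
    body = {"url": page_url, "formats": ["markdown"]}
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage against a running instance:
#   result = send(build_scrape_request("https://example.com"))
```

A well-formed body alone does not guarantee content for a Vue.js or React.js page: if the server only fetches the raw HTML, the JavaScript-rendered parts will be missing.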
@mogery
Member

mogery commented Dec 30, 2024

Hi there, you need to set up the Playwright microservice to be able to scrape sites that use JavaScript.

@mogery
Member

mogery commented Dec 30, 2024

Scraping via fetch deemed successful.

This means that Firecrawl is using the fetch engine, not the playwright engine.

An unexpected error happened while scraping with playwright.

Are the playwright engine and the Firecrawl .env variables configured correctly?

@wangping886
Author

I'm trying to follow these steps. Do I need to scrape with my own Playwright service?

(Optional) Running with TypeScript Playwright Service

Update the docker-compose.yml file to change the Playwright service:

    build: apps/playwright-service
TO

    build: apps/playwright-service-ts
Set the PLAYWRIGHT_MICROSERVICE_URL in your .env file:

PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/scrape
Don't forget to set the proxy server in your .env file as needed.

@mogery
Member

mogery commented Dec 30, 2024

What is your PLAYWRIGHT_MICROSERVICE_URL set to in your .env file?

@wangping886
Author

PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html

@mogery
Member

mogery commented Dec 30, 2024

PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html

Can you try with PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape ?
Are you using playwright-service or playwright-service-ts?
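
A quick way to see which path the variable currently points at is to parse the `.env` text. This is only a sketch: `read_env_value` and `playwright_url_path` are hypothetical helpers, and the parsing handles plain `KEY=value` lines only, not every `.env` syntax.

```python
from urllib.parse import urlparse


def read_env_value(env_text, key):
    """Return the value of `key` from .env-style text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def playwright_url_path(env_text):
    """Return the URL path of PLAYWRIGHT_MICROSERVICE_URL, or None if unset."""
    value = read_env_value(env_text, "PLAYWRIGHT_MICROSERVICE_URL")
    return urlparse(value).path if value else None


# Usage:
#   with open(".env") as f:
#       print(playwright_url_path(f.read()))
```

With the value reported in this thread, the function returns `/html`; after the suggested change it would return `/scrape`.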

@wangping886
Author

wangping886 commented Dec 30, 2024

I'm running docker build for playwright-service-ts now, and it's very slow. Is the step "(Optional) Running with TypeScript Playwright Service" not required?

With PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html I can't get any content, and the server shows no logs.

I tried /scrape as well, but I still can't retrieve any content.
image

@wangping886
Author

I'm using playwright-service and have set PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape in .env, but I still don't get any content.

@Wanli063

Wanli063 commented Jan 7, 2025

Hello, have you solved the problem? I had the same problem.

@wangping886
Author

Hello, have you solved the problem? I had the same problem.

No, the maintainer hasn't told me how to solve it.

@namhnz

namhnz commented Jan 10, 2025

It has an error:
image
My Ubuntu machine is:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

@eliaozi

eliaozi commented Jan 14, 2025

Try this:

  • Modify fetch.ts to add debug output for analyzing the logs (optional).

WechatIMG3374
WechatIMG3375

  • Modify docker-compose.yaml: change playwright-service to playwright-service-ts, and change PLAYWRIGHT_MICROSERVICE_URL so that it ends with /scrape.
    WechatIMG3373

  • Run docker compose build and docker compose up.
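
The compose change can also be applied with a small script. This is a sketch under assumptions: it assumes the service's build line reads `build: apps/playwright-service`, as in the self-hosting instructions quoted earlier in the thread, and `use_ts_playwright_service` is a hypothetical helper name.

```python
import re


def use_ts_playwright_service(compose_text):
    """Point the Playwright build path at the TypeScript service.

    Only lines ending in "apps/playwright-service" are changed, so running
    this twice does not append "-ts" a second time.
    """
    return re.sub(
        r"(build:[ \t]*apps/playwright-service)([ \t]*)$",
        r"\1-ts\2",
        compose_text,
        flags=re.MULTILINE,
    )


# Usage:
#   with open("docker-compose.yaml") as f:
#       updated = use_ts_playwright_service(f.read())
#   with open("docker-compose.yaml", "w") as f:
#       f.write(updated)
```

The script only touches the build path. PLAYWRIGHT_MICROSERVICE_URL still has to be changed by hand, and the images need to be rebuilt afterwards.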

@ocampoje17

Try this:

  • Modify fetch.ts to add debug output for analyzing the logs (optional).

WechatIMG3374 WechatIMG3375

  • Modify docker-compose.yaml: change playwright-service to playwright-service-ts, and change PLAYWRIGHT_MICROSERVICE_URL so that it ends with /scrape.
    WechatIMG3373
  • Run docker compose build and docker compose up.

I tried this with version 1.2.1, and I think the solution works well.
I also needed to change the .env file to this:
image
