This works insanely good #10

Open
loeffel-io opened this issue Nov 28, 2024 · 13 comments

Comments

@loeffel-io

loeffel-io commented Nov 28, 2024

Really good job! This works so much better than wkhtmltopdf.

I would love to migrate to this with https://github.com/loeffel-io/mail-downloader.

  1. For integration it would be great to create the file myself, for instance:
	orcgen.Generate(
		htmlBody,
		orcgen.PDFConfig{
			Landscape:         false,
			PrintBackground:   true,
			PreferCSSPageSize: true,
		},
	)

This should return the file bytes and an error, so that I can create the file myself with the permissions I want, etc.

  2. With wkhtmltopdf I was able to disable JavaScript. Is this possible here?

  3. I would love to download the browser before I start converting PDFs, because downloading during conversion results in:

322 / 322 [-------------------->] 100.00% 47 p/sProcessing messages...
5 / 322 [--->_________________] 1.55% 78 p/s[launcher.Browser]2024/11/28 19:12:09 Download: https://storage.googleapis.com/chromium-browser-snapshots/Mac_Arm/1294824/chrome-mac.zip
5 / 322 [--->_________________] 1.55% 78 p/s[launcher.Browser]2024/11/28 19:12:09 Progress: 00%
5 / 322 [--->_________________] 1.55% 68 p/s[launcher.Browser]2024/11/28 19:12:10 Progress: 07%
  4. German umlauts print as "Sie möcht". It would be great to have UTF-8 support (?)

Thank you!

@luabagg
Owner

luabagg commented Nov 28, 2024

Thanks for the feedback!

  1. It's already possible:
	fileinfo, err = orcgen.NewHandler(orcgen.PDFConfig{
		PrintBackground: true,
		PageRanges:      "1,2",
	}).GenerateFile(page)

	if err == nil {
		filename := "google.pdf"
		fileinfo.Output(getName(filename))
		fmt.Printf("%s generated successfully\n", filename)
	}

Fileinfo struct:

type Fileinfo struct {
	File     []byte
	Filesize int
}
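
For the permissions part of the original request, here is a minimal sketch of writing the `File` bytes yourself. It assumes only the `Fileinfo` shape shown above, redeclared locally so the example compiles without orcgen. `writeWithPerm`, the placeholder bytes, and the file name are hypothetical and not part of the orcgen API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Fileinfo mirrors the struct shown above. It is redeclared here only so the
// sketch is self-contained; in real code you would use orcgen's own type.
type Fileinfo struct {
	File     []byte
	Filesize int
}

// writeWithPerm is a hypothetical helper that writes the generated bytes to
// path with the caller's permission bits instead of a library default.
func writeWithPerm(info *Fileinfo, path string, perm os.FileMode) error {
	if info == nil {
		return fmt.Errorf("write %s: no file info", path)
	}
	return os.WriteFile(path, info.File, perm)
}

func main() {
	dir, err := os.MkdirTemp("", "orcgen-sketch")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	// Placeholder bytes standing in for a generated PDF.
	data := []byte("%PDF-1.4 placeholder bytes")
	info := &Fileinfo{File: data, Filesize: len(data)}

	path := filepath.Join(dir, "mail.pdf")
	if err := writeWithPerm(info, path, 0o600); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("wrote", info.Filesize, "bytes to", path)
}
```

Note that `os.WriteFile` applies the permission bits only when it creates the file, and the process umask can still mask them.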

2, 3: I think you can disable JavaScript and configure your browser path, but I would have to take a look at the go-rod API.

For this, I would have to make some changes in the webdriver file. Maybe I can create an interface and make it extensible, but I'll have to take a more detailed look.

@loeffel-io
Author

loeffel-io commented Nov 28, 2024

Thanks for your answer!

I saw the new API before you answered; that's much better.

But what is the page? 😅

func (mail *mail) generatePdf(pdfGen handlers.FileHandler[orcgen.PDFConfig]) (*fileinfo.Fileinfo, error) {
	count := counter.CreateCounter()

	var htmlBody []byte
	for _, body := range mail.Body {
		if mime := mimetype.Detect(body); !mime.Is("text/html") {
			continue
		}

		htmlBody = append(htmlBody, body...)
		count.Next()
	}

	if count.Current() == 0 {
		return nil, nil
	}

	return pdfGen.SetFullPage(true).GenerateFile(&rod.Page{}) // ???
}

@luabagg
Owner

luabagg commented Nov 28, 2024

You can take a look at the examples_test.go

	page := wd.UrlToPage("https://google.com")
	wd.WaitLoad(page)
	page.MustInsertText("github orcgen package golang").Keyboard.Type(input.Enter)
	wd.WaitLoad(page)

This uses the webdriver directly. If you need to change JS and browser settings, I think it's the way to go. I'll probably add these options to the webdriver config later.

But for now, there's a simpler way to get the file bytes:

	fileinfo, err = orcgen.ConvertWebpage(
		pdf.New().SetFullPage(true), "https://www.x.com",
	)

or

	fileinfo, err := orcgen.ConvertHTML(
		screenshot.New().SetConfig(orcgen.ScreenshotConfig{
			Format: "jpeg",
		}),
		getHTML(),
	)

@loeffel-io
Author

No, I already have the file bytes. I just want to convert my HTML bytes to PDF bytes.

@loeffel-io
Author

I just want to fill the page param: return pdfGen.SetFullPage(true).GenerateFile(page somehow from my html bytes)

@loeffel-io
Author

(And I want a multi-page PDF like here: #2)

@loeffel-io
Author

loeffel-io commented Nov 28, 2024

Ok sorry, got it: orcgen.ConvertHTML(pdfGen.SetFullPage(true), htmlBody)

The API could be a bit easier, in my opinion, but it's just amazing.

Also, German umlauts somehow work. Maybe the problem only affects some emails.

Thank you!

@loeffel-io
Author

loeffel-io commented Nov 28, 2024

Looks like fileInfo is nil in some edge cases:

49 / 322 [--->_________________] 15.22% 4 p/s
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x100bcc700]

goroutine 1 [running]:
main.main()
/Users/loeffel/go/src/github.com/loeffel-io/mail-downloader/main.go:180 +0xf50

@loeffel-io
Author

This happens even though the file bytes are not empty. Are you sure you capture every error in the GenerateHTML function? Maybe this happens when the webdriver timeout is reached.

Sorry for all the spam.

@luabagg
Owner

luabagg commented Nov 28, 2024

It's ok

I need more info about where the memory error happened:

image

The handler is pretty simple, but you should check for errors.
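
As an illustration of that advice, here is a minimal sketch of one way a nil `*Fileinfo` can reach the caller and how to guard against it. It assumes the cause is a `return nil, nil` path like the one in the `generatePdf` snippet earlier in the thread, which has not been confirmed here. `generate` and `errNoHTML` are hypothetical stand-ins, and `Fileinfo` is redeclared locally.

```go
package main

import (
	"errors"
	"fmt"
)

// Fileinfo is a local stand-in for the struct shown earlier in the thread.
type Fileinfo struct {
	File     []byte
	Filesize int
}

// errNoHTML is a hypothetical sentinel returned instead of (nil, nil), so the
// caller cannot mistake "nothing to convert" for success.
var errNoHTML = errors.New("no HTML body to convert")

// generate stands in for a generatePdf-style function. The conversion itself
// is faked: it just wraps the concatenated input bytes.
func generate(parts [][]byte) (*Fileinfo, error) {
	var html []byte
	for _, p := range parts {
		html = append(html, p...)
	}
	if len(html) == 0 {
		return nil, errNoHTML
	}
	return &Fileinfo{File: html, Filesize: len(html)}, nil
}

func main() {
	inputs := [][][]byte{
		{[]byte("<p>hi</p>")},
		nil, // a mail without any HTML part
	}
	for _, parts := range inputs {
		info, err := generate(parts)
		if errors.Is(err, errNoHTML) {
			fmt.Println("skipping:", err)
			continue
		}
		if err != nil {
			fmt.Println("conversion failed:", err)
			continue
		}
		// info is only dereferenced after both checks have passed.
		fmt.Println("generated", info.Filesize, "bytes")
	}
}
```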

@loeffel-io
Author

loeffel-io commented Nov 29, 2024

All good, found the issue - was my fault.
Now it would be really great to pre-download the browser and disable JS.
Then the migration would be done ❤️
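
Until there is an option for this, the pre-download part could be sketched on the caller's side as a one-time warm-up step that runs before the progress bar starts. The `fetch` function below is a stand-in. In real code it might wrap go-rod's browser launcher, but that wiring is an assumption and is not shown. `warmUp` and `ensure` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// warmUp runs a one-time setup step (for example a browser download) exactly
// once and remembers its result for every later caller.
type warmUp struct {
	once sync.Once
	path string
	err  error
}

// ensure calls fetch on first use only. Later calls return the cached result.
func (w *warmUp) ensure(fetch func() (string, error)) (string, error) {
	w.once.Do(func() { w.path, w.err = fetch() })
	return w.path, w.err
}

func main() {
	calls := 0
	// Stand-in for the real download. The returned path is made up.
	fetch := func() (string, error) {
		calls++
		return "/tmp/browser", nil
	}

	var w warmUp

	// Warm up before any conversion or progress output starts.
	if _, err := w.ensure(fetch); err != nil {
		fmt.Println("browser setup failed:", err)
		return
	}

	// Conversions can call ensure again without triggering another download.
	for i := 0; i < 3; i++ {
		if _, err := w.ensure(fetch); err != nil {
			fmt.Println("browser setup failed:", err)
			return
		}
	}
	fmt.Println("fetch calls:", calls)
}
```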

@loeffel-io
Author

Sometimes this panics, by the way. It would be good to handle this error:

panic: context deadline exceeded

goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.init.func2({0x1016cc3a0?, 0x101ca3980?})
/Users/loeffel/go/pkg/mod/github.com/go-rod/[email protected]/lib/utils/utils.go:69 +0x24
github.com/luabagg/orcgen/v2/pkg/webdriver.(*WebDriver).Connect.New.(*Browser).WithPanic.genE.func1({0x14000321790?, 0x140021e0570?, 0x502f?})
/Users/loeffel/go/pkg/mod/github.com/go-rod/[email protected]/must.go:36 +0x70
github.com/go-rod/rod.(*Page).MustWaitLoad(0x14002114370)
/Users/loeffel/go/pkg/mod/github.com/go-rod/[email protected]/must.go:451 +0x88
github.com/luabagg/orcgen/v2/pkg/webdriver.(*WebDriver).WaitLoad(0x14001080600, 0x14000af0a00?)
/Users/loeffel/go/pkg/mod/github.com/luabagg/orcgen/[email protected]/pkg/webdriver/webdriver.go:86 +0x30
github.com/luabagg/orcgen/v2.ConvertHTML[...]({0x101768e00?, 0x1400020ca90}, {0x14000af0a00, 0x502f, 0x5500})
/Users/loeffel/go/pkg/mod/github.com/luabagg/orcgen/[email protected]/orcgen.go:73 +0xbc
main.(*mail).generatePdf(0x14000218b40, {0x101768e00, 0x1400020ca90})
/Users/loeffel/go/src/github.com/loeffel-io/mail-downloader/mail.go:131 +0x158
main.main()
/Users/loeffel/go/src/github.com/loeffel-io/mail-downloader/main.go:178 +0x101c
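
Until the library returns this as an error, a caller-side workaround could be to recover the panic and turn it into an ordinary error. The sketch below uses only the standard library. `try` is a hypothetical helper, and the panicking function stands in for the `ConvertHTML` call in the trace above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// try runs fn and converts a panic into an error. If the panic value is
// itself an error, it is wrapped so errors.Is and errors.As still work.
func try(fn func()) (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		if e, ok := r.(error); ok {
			err = fmt.Errorf("recovered: %w", e)
			return
		}
		err = fmt.Errorf("recovered: %v", r)
	}()
	fn()
	return nil
}

func main() {
	// Stand-in for a conversion call that panics on timeout.
	err := try(func() { panic(context.DeadlineExceeded) })

	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("conversion timed out, skipping this mail:", err)
		return
	}
	if err != nil {
		fmt.Println("conversion failed:", err)
	}
}
```

Recovering like this keeps a batch run alive, but the page or browser may be left in an unknown state after a timeout, so skipping or retrying with a fresh page is probably safer than reusing it.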

@luabagg
Owner

luabagg commented Dec 5, 2024

Hey, I will be really busy for the next few weeks. Feel free to contribute if you need something sooner.
