Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Caddy JSON log parser #730

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

DavidVentura
Copy link
Contributor

@DavidVentura DavidVentura commented Mar 30, 2024

This should be based on top of the split-regexparser branch but I'm not entirely sure how to do that.

I understand that this does not allow for custom formats, only the base/default Caddy formatting. I'm not sure if a custom query language is desired or not.

@DavidVentura DavidVentura changed the title Implement caddy Implement Caddy JSON log parser Mar 30, 2024
@DavidVentura DavidVentura marked this pull request as ready for review March 30, 2024 17:54
@arp242
Copy link
Owner

arp242 commented Mar 31, 2024

I understand that this does not allow for custom formats, only the base/default Caddy formatting. I'm not sure if a custom query language is desired or not.

Do you know of any other software that uses JSON for logs?

In principle this could be done by scanning to a map[string]any, and then you can map JSON fields to things to import with something like --format=json:..

But if nothing else uses JSON then that's probably not worth it.

@DavidVentura
Copy link
Contributor Author

I only know of Traefik but have not used it.

@arp242
Copy link
Owner

arp242 commented Apr 8, 2024

Okay, it'll be fine to just add a Caddy-specific method. I don't really feel like coming up and implementing a generic approach, especially when it's unclear that's needed. Looking at goaccess, it doesn't support JSON at all, and when people ask for it it's mostly for Caddy. A more generic approach can always be added later if it's needed – just accepting the "caddy" format doesn't prevent that.

@arp242
Copy link
Owner

arp242 commented Apr 8, 2024

I merged your other PR – if you rebase on master I'll have a detailed look at this one.

@arp242
Copy link
Owner

arp242 commented Apr 8, 2024

The test file only has a single request – it should have a few requests, to test that reading more than one request works.

I also wonder if/how goatcounter import -follow will work with Caddy? If Caddy outputs one JSON object per line then that wouldn't be a problem, but if it outputs human-readable JSON per your test example then getting that to work would require some additional hackery.

func (l CaddyLogEntry) Timing() time.Duration {
// TODO: `Second` should depend on the log format
// {seconds, nano, string} where string in {1m32.05s, 6.31ms}
return time.Duration(l.Duration * float64(time.Second))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't clear on the best way of implementing this

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can probably be auto-detected based on the type? Otherwise probably need to add a new flag for it or something. Need to look a bit at what the options are here – if you add a (failing) testcase for all possible options then I would have a better idea.

@DavidVentura
Copy link
Contributor Author

Caddy does indeed print one request per line, so that wouldn't be necessary. I've been using this branch with my Caddy instance since last week 😁

I've rebased from master and will add a test in a bit

@arp242
Copy link
Owner

arp242 commented Apr 8, 2024

Right so – would be good to have logfiles as Caddy outputs them. Manually twiddling compacted JSON is a bit of a pain, but otherwise you're not really testing the real-world scenario.

@maruel
Copy link

maruel commented Sep 30, 2024

My use case is also caddy. What can unblock this PR?

@arp242
Copy link
Owner

arp242 commented Sep 30, 2024

I think it's basically fine; IIRC I just wanted to write a bit of docs, but never really got around it. Basically, I just forgot 😅 I'll try and find some time for things this week or so.

@DavidVentura
Copy link
Contributor Author

I forgot about this PR 😬 I have some time this week, let me know if there's anything you'd like me to do here

@arp242
Copy link
Owner

arp242 commented Sep 30, 2024

The only open comment is: #730 (comment)

I do remember now: what I wanted to do (but didn't) was to run Caddy and play around with possible options and see what does and doesn't work. I never really used Caddy and don't really have a good insight on how "complete" this PR is and what options Caddy does or doesn't have.

@DavidVentura
Copy link
Contributor Author

I am having some problems with this PR:

  1. Importing the same log file repeatedly will cause repeated views after some hours (waiting hours/overnight is necessary for the issue to manifest). I assume this is due to the pageview/visit calculation, but it's not really useful for me, as I run Goatcounter on a separate server
  2. "Sizes" remains "100% Unknown": is this supposed to be parsed from user-agent?
  3. "Locations" is "99.99% Unknown", but there are some results; is this supposed to be parsed from user-agent?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants