Function to parse URIs #2989

Closed
philrz opened this issue Sep 1, 2021 · 1 comment · Fixed by #3080 or #3092

philrz commented Sep 1, 2021

Repro is with Zed commit 59dc184.

A community user recently asked:

Is there any way to manipulate the data within a field?  For instance, I was looking to break apart the different parameters that are in a URI. /test/?first=sometext&second=moretext turning that into first=sometext and second=moretext

Regexp group support (#2093) has the potential to help here when we implement it. For now, I pointed out the existence of the split() function and slice operator that can accomplish much of this. For example:

$ zq -version
Version: v0.29.0-487-g59dc1843

$ echo '{"uri": "/test/?first=sometext&second=moretext"}' | zq -Z 'put qstring:=split(uri,"?")[1] | put pairs:=split(qstring,"&")' -
{
    uri: "/test/?first=sometext&second=moretext",
    qstring: "first=sometext&second=moretext",
    pairs: [
        "first=sometext",
        "second=moretext"
    ]
}

No doubt this could be taken even further by using the sequence operator to break out the entries in the pairs array into separate fields.

However, having shown this, I was also aware that escaping or other complexity could make this approach break with some URIs. That led me to search for and find Go libraries like https://pkg.go.dev/net/url#example-URL.Query that purport to do this heavy lifting for us. Since URI manipulation may come up frequently for our users, it might be beneficial for Zed to provide a URI-specific parsing function that returns a nested record containing the fully-parsed data.

@nwt nwt added this to the Data MVP0 milestone Sep 9, 2021
@philrz philrz modified the milestones: Data MVP0, v0.26.0 Sep 14, 2021
@nwt nwt closed this as completed in #3080 Sep 16, 2021
brim-bot pushed a commit to brimdata/brimcap that referenced this issue Sep 16, 2021
This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#3080, authored by @nwt,
has been merged.

add Zed url_parse function

Closes brimdata/super#2989.
brim-bot pushed a commit to brimdata/zui that referenced this issue Sep 16, 2021
This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#3080, authored by @nwt,
has been merged.

add Zed url_parse function

Closes brimdata/super#2989.

philrz commented Sep 17, 2021

Verified in Zed commit fdca08a.

Let's take an example Grafana URL that has lots of stuff to parse.

$ cat data.ndjson 
{"url": "https://johndoe:easy2guess@play.grafana.org:443/d/000000010/annotations?orgId=1&from=now-3h&to=now"}

Calling the new function to parse it:

$ zq -version
Version: v0.30.0-24-g11d4a7cc

$ zq -Z 'goodies := parse_uri(url)' data.ndjson
{
    url: "https://johndoe:easy2guess@play.grafana.org:443/d/000000010/annotations?orgId=1&from=now-3h&to=now",
    goodies: {
        scheme: "https",
        opaque: null (string),
        user: "johndoe",
        password: "easy2guess",
        host: "play.grafana.org",
        port: 443 (uint16),
        path: "/d/000000010/annotations",
        query: |{
            {"to",[
                    "now"
                ]},
            {"from",[
                    "now-3h"
                ]},
            {"orgId",[
                    "1"
                ]}
        }|,
        fragment: null (string)
    }
}

Note that this is one of the first Zed functions to return a map-type value. As a crash course in accessing a map's values, here we create a top-level field that contains the value of one of the query parameters:

$ zq -Z 'goodies := parse_uri(url) | put q_from := goodies.query["from"] | drop goodies' data.ndjson
{
    url: "https://johndoe:easy2guess@play.grafana.org:443/d/000000010/annotations?orgId=1&from=now-3h&to=now"
    q_from: [
        "now-3h"
    ]
}

If you try to access a key that isn't in the map, you get a warning, but the put is otherwise a no-op: the record passes through without the new field.

$ zq -Z 'goodies := parse_uri(url) | put q_from := goodies.query["absentparam"] | drop goodies' data.ndjson
put: a referenced field is missing
{
    url: "https://johndoe:easy2guess@play.grafana.org:443/d/000000010/annotations?orgId=1&from=now-3h&to=now"
}

Note also that the Brim app currently presents map values pretty crudely (brimdata/zui#1245).

However, as something in Zed, this is definitely ready to show off to the user in the community that requested the functionality.

Thanks @nwt!
