Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conflicting remotes set for packages installed from r-universe via url #727

Open
kevinushey opened this issue Dec 17, 2024 · 18 comments
Open

Comments

@kevinushey
Copy link

Originally from rstudio/renv#2060.

Try installing cmdstanr with:

pak::pak(pkg = 'cmdstanr=url::https://stan-dev.r-universe.dev/src/contrib/cmdstanr_0.8.1.tar.gz')

The installed package will have the following remotes set:

RemoteUrl: https://github.com/stan-dev/cmdstanr
RemoteRef: v0.8.1
RemoteSha: 02259ef7aa2a8b1c8de2fa3fc42a9feafd789288
< ... >
RemotePkgRef: cmdstanr=url::https://stan-dev.r-universe.dev/src/contrib/cmdstanr_0.8.1.tar.gz
RemoteType: url
RemoteEtag: 6b0a172a497ddd0c
RemotePackaged: TRUE

The top set of remotes are set on the source package by r-universe; the bottom set are added by pak.

I'm not sure what the best way to resolve this ambiguity is. Should pak update RemoteUrl to point at the specific URL requested? (I realize this is duplicating what is set in RemotePkgRef.)

@gaborcsardi
Copy link
Member

I suppose there are some bad consequences of this?

@kevinushey
Copy link
Author

kevinushey commented Dec 17, 2024

Mainly this confuses renv, since it sees RemoteType: url and so assumes the RemoteUrl field encodes the URL where archive can be retrieved, but pak does not set that field for URL remotes. (The URL can, of course, be recovered from RemotePkgRef in this case.)

For what it's worth, remotes does update the RemoteUrl field, so you end up with:

RemoteUrl: https://stan-dev.r-universe.dev/src/contrib/cmdstanr_0.8.1.tar.gz
RemoteRef: v0.8.1
RemoteSha: 02259ef7aa2a8b1c8de2fa3fc42a9feafd789288
< ... >
RemoteType: url

The leftover RemoteRef and RemoteSha fields are perhaps awkward, but at least the RemoteUrl field can be used here.

@gaborcsardi
Copy link
Member

So you want me to add a RemoteUrl field for url:: remotes, overwriting the one that is already there?

@gaborcsardi
Copy link
Member

Btw. it is somewhat weird to use url:: for R-universe, no? It is a CRAN-like repo with pre-built binaries.

@kevinushey
Copy link
Author

So you want me to add a RemoteUrl field for url:: remotes, overwriting the one that is already there?

At least, I thought that's what the implicit contract was. If RemoteType: url is set, then RemoteUrl should point at the URL where the package can be retrieved. But I'm also okay if you'd prefer to do nothing and ask renv to rely on RemotePkgRef.

Btw. it is somewhat weird to use url:: for R-universe, no? It is a CRAN-like repo with pre-built binaries.

Yeah, I definitely agree. At least in this case it seems like it was done to work around some other issue, but (to me) that issue wasn't well articulated.

@gaborcsardi
Copy link
Member

IDK why pak does not add RemoteUrl, probably an oversight, so I can certainly add that. I don't feel so good about overwriting fields, but since we kind of own the Remote* fields, it is OK, I think.

Probably R-universe should not serve packages with these fields, especially since they are incorrect. But anyway, we can work it here.

@kevinushey
Copy link
Author

Probably R-universe should not serve packages with these fields, especially since they are incorrect. But anyway, we can work it here.

I think the intention here is to allow package managers to recover the original package sources if the referenced binary is no longer available on r-universe. That said, might be better to use the typical "git" remote fields for this scenario, or to have r-universe use its own DESCRIPTION fields that it "owns" here -- I'm not sure.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

What is incorrect about them? We specifically add those packages can be rebuilt from the upstream source. The idea is that for renv it should not matter if a package source code was pulled from git directly, or this same exact source was downloaded from a tarball from r-universe. What matters is the upstream git source and commit id.

@gaborcsardi
Copy link
Member

I think the intention here is to allow package managers to recover the original package sources if the referenced binary is no longer available on r-universe.

I think there is a good chance that it is not intentional, but a side effect of installing/building dependencies from Remotes fields on R-universe.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

It is intentional, see https://ropensci.org/blog/2022/01/06/runiverse-renv/

@gaborcsardi
Copy link
Member

gaborcsardi commented Dec 17, 2024

What is incorrect about them?

If I install a package from R-universe via install.packages() then it should not have a RemoteUrl (or other) field, because that means that it was installed from a URL, and not from a CRAN-like repo.

E.g. remotes::update_packages() will try to update the package from the Remote* location instead of the R-universe repo. pak will do the same, once it will have an update function.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

This is the way it is intended. R-universe is a proxy for the upstream git repository. It does not matter if a user downloaded the package source from github or a source tarball, as long as the sha is the same. Therefore the RemoteUrl and RemoteSha declare the upstream source, so renv or others can find it if they need it.

I think we need to disambiguate the spec for RemoteUrl because I assumed it is always a git repository url (iirc it always used to be the case), but from the above I understand RemoteUrl can also be a file.

@gaborcsardi
Copy link
Member

That does not sound good to me. I imagine that a big reason for people to use R-universe is the convenience of the binary builds. If you then send package managers like renv and pak to the source instead of using the CRAN-like repo then you lose this. Then people will get slow builds or even failing builds if system packages are not installed.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

renv will first test if the binary is available from the cranlike repo, and fall back on the source if it is no longer available.

We have different use cases in mind. It is not about fast builds. From a reproducibility standpoint it is important that we formally declare the upstream git repository and commit sha, such that renv can lookup the source, even years later when the cranlike-repo may not be there. That's the way it was designed and has worked for 3 years now.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

So currently all packages in R-universe declare these fields in the DESCRIPTION:

Repository: https://jeroen.r-universe.dev
RemoteUrl: https://github.com/jeroen/jsonlite
RemoteRef: HEAD
RemoteSha: 86f481d34eb4e7893e32a88608ce220421f59c41

This was modeled after what remotes does and hence allows renv to lookup the source code of a package from git if it is not available from the cran-like repo. Is some RemoteType field we can add to let pak know it does not need to auto update this from git?

@gaborcsardi
Copy link
Member

renv will first test if the binary is available from the cranlike repo

How can you even know that without e.g. the sha in the PACKAGES file? You can check that some binary is available, but not that a specific build is. E.g. how could pak check if it can use the binary to update a package?

From a reproducibility standpoint it is important that we formally declare the upstream git repository and commit sha,

Maybe it would be better to use a different set of fields then? These fields were meant to be for internal bookkeeping, they have never been standardized, and they are also changing over time.

This was modeled after what remotes does

This is what remotes::install_github() add currently:

RemoteType: github
RemoteHost: api.github.com
RemoteRepo: filelock
RemoteUsername: r-lib
RemoteRef: HEAD
RemoteSha: 1c2cda5d28d0c436c4ed4fec5477ffdefb83da0e

Anyway, pak is going to also set RemoteUrl for url:: packages, because remotes apparently does that as well.

@jeroen
Copy link
Member

jeroen commented Dec 17, 2024

Maybe it would be better to use a different set of fields then? These fields were meant to be for internal bookkeeping, they have never been standardized, and they are also changing over time.

When this was added this (3 years ago) we wanted something to declare the upstream source in the DESCRIPTION files such that any client that can lookup or verify the upstream source code from which a package installation or tarball was originally built.

Yes we could have reinvented the wheel and make up new field names, but for renv and other clients it is helpful if it can use some basic convention. We assumed that the convention set by remotes to use RemoteUrl + RemoteSha was pretty unambiguous and simple method to identify a git source. So we followed that and it has worked well. I did not expect these to conflict with pak.

Would it help anything if I add RemoteType: xgit to DESCRIPTION files to emphasise that the RemoteUrl is a git repo? Or some other RemoteType value?

@gaborcsardi
Copy link
Member

Would it help anything if I add RemoteType: xgit to DESCRIPTION files to emphasise that the RemoteUrl is a git repo? Or some other RemoteType value?

With what? With this issue, no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants