-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for 'git::' force token on local filepaths, both absolute and relative #269
base: main
Are you sure you want to change the base?
Conversation
In the wake of the discussion from earlier today, I've improved my testing setup for this issue and have one that demonstrates the main brokenness of the Here's the repo structure in which I've produced the problem:
The directory The Git repo reference on line 15 fails because the specified relative path is correct relative to the location of the The Git repo reference on line 21 works because the specified path happens to be correct relative to the For reference, the specific error emitted by (my locally modified)
|
eeb7134
to
d790d28
Compare
The go-getters library now has support for local file system paths to Git repositories, specified with the 'git::' forcing token. The feature works for both absolute and relative filepaths, and supports all the usual go-getter goodies including '//' delimited subdirs and URI-style query parameters.[0][1] We incorporate that capability into Terraform, which allows users to specify paths to locally present Git repositories from which to clone other Terrform modules on which they are dependent. When coupled with Git submodules, this creates a powerful way to manage Terraform modules at specific versions without requiring those modules to be available on the network (e.g., on GitHub): module "my_module" { source = "git::../git-submodules/tf-modules/some-tf-module?ref=v0.1.0" // ... } From the perspective of Terraform, such Git repositories are "remote" in the same way that repositories on GitHub are. Note that within a Terraform module "call" block, the filepaths specified are relative to the directory in which the *.tf file lives, not relative to the current working directory of the Terraform process. In order to support this feature, Terraform needs to supply that contextual information to go-getter to allow relative filepath resolution to work. In order to do so, we needed to switch over to using go-getter's new "Contextual Detector" API. It works in the same basic way as the traditional Detector API, but allows us to provide this additional information. In keeping with the "keep things simple" comment in the commit message of 2b2ac1f, we are here maintaining our custom go-getter detectors in two places. Only now each is called FooCtxDetector rather than FooDetector. Nevertheless, all except the GitCtxDetector do little more than "pass through" delegation to its analogous FooDetector counterpart. Fixes hashicorp#25488 Fixes hashicorp#21107 [0] hashicorp/go-getter#268 [1] hashicorp/go-getter#269
Any chance we could get a review of this soon? |
I love this idea! |
d790d28
to
9a40be6
Compare
…nd relative This series of changesets introduces a feature that allows the 'git::' forcing token to be used on local file system paths to reference Git repositories. Both absolute paths and relative paths are supported. For example: git::./some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3 or: git::../../some/relative/path/to/a/git-repo//some-subdir?ref=v1.2.3 or: git::/some/absolute/path/to/a/git-repo//some-subdir?ref=v4.5.6 Only filepaths that are prefixed with the 'git::' forcing token are considered for processing. Internally, go-getter transforms the provided string into a 'file://' URI with an absolute filepath, with query string params and subdirectory retained. The rationale for using a 'file://' URI internally is that the Git clone operation can already work with 'file://' URIs, and using them for this feature allows us to leverage the existing go-getter URI-handling machinery. That gets us support for query params (to clone a specific git ref (tag, commit hash, ...)) "for free". The rationale for using an absolute filepath (even when the provided string is a relative filepath) is that (per RFC 1738 and RFC 8089) only absolute filepaths are legitimate in 'file://' URIs. But more importantly here, the Git clone operation only supports 'file://' URIs with absolute paths. Q: Why support this functionality at all? Why not just require that a source location use an absolute path in a 'file://' URI explicitly if that's what is needed? A: The primary reason is to allow support for relative filepaths to Git repos. There are use cases in which the absolute path cannot be known in advance, but a relative path to a Git repo is known. For example, when a Terraform project (or any Git-based project) uses Git submodules, it will know the relative location of the Git submodule repos, but cannot know the absolute path in advance because it will vary based on where the "superproject" repo is cloned. Nevertheless, those relative paths should be usable as clonable Git repos, and this mechanism would allow for that. Support for filepaths that are already absolute is provided mainly for symmetry. It would be surprising for the feature to work with relative file paths, but not for absolute filepaths. For projects using Terraform, in particular, this feature (along with a small change in the Terraform code to leverage it) enables the non-fragile use of relative paths in a module "call" block, when combined with Git submodules: module "my_module" { source = "git::../git-submodules/tf-modules/some-tf-module?ref=v0.1.0" // ... } In the above example "superproject" Git repo (the one "calling" the terraform module) knows the relative path to its own Git submodules because they are embedded in a subdirectory beneath the top-level of the "superproject" repo. Two downstream Terraform issues that would require go-getter support for this feature (or something like it) are at [0] and [1]. This first changeset in the series updates the README.md documentation to note the new feature and provide examples. [0] "Unable to use relative path to local Git module" hashicorp/terraform#25488 [1] "In 0.12, modules can no longer be installed from local git repositories at relative paths" hashicorp/terraform#21107 Design Notes ------------ In order for this feature to work, additional contextual information is needed by the Git detector than can be provided using the existing Detector API. Internally, the Detector's Detect method does not pass along to the Detector implementations all of the contextual information that it has available. In particular, the forcing token and go-getter subdir component are stripped out of the source string before invoking the implementation's Detect method. In the particular case of the Git detector, that means it cannot know that a 'git::' forcing token was provided on an input string that otherwise looks like a file system path. And /that/ means that it is not correct or safe for it to identify any filepath string value as a Git repository. Externally, callers (such as Terraform) already provide a value for the 'pwd' parameter of Detect, but it is not (necessarily) the location from which a relative path in a 'git::' string should be resolved. In a Terraform module (which may be in an arbitrary subdirectory from the process current working directory), module "source" references that contain relative paths must be interpreted relative to the location of the module source file. Terraform has that information available, but in the existing Detect API there is no way to convey it to go-getter. Constraints ----------- Additional Detector methods cannot be added without burdening all existing detectors (both internal and in the wild) with the need to support them. Additional Detect method params cannot be added without breaking all existing Detector implementations (internal, wild). Additional parameters cannot be added to the Detect dispatching function without affecting all callers. Approach -------- The goal is to provide the feature in a way that is as minimally invasive as possible. But above all else it needs to avoid breaking backward compatibility in any way. Given that, the approach taken by this changeset series is to introduce the concept of a "Contextual Detector". It is structured in the same way as the current Detector framework, but works through a new CtxDetector interface that is not constrained by the existing API. The only callers affected by this change would be those that wish to take advantage of the additional capabilities. And for those, the migration path straight-forward because the new API is structured like the existing one. In particular, this changeset series introduces four new elements: 1. CtxDetector interface 2. CtxDetect dispatching function 3. CtxDetect method on the CtxDetector interface 4. Full suite of CtxDetector implementations that are analogues of the existing detectors (most of which (currently) just delegate to the existing Detector implementations). There is also a global 'ContextualDetectors' list that serves a function analogous to the existing 'Detectors' list. Signed-off-by: Alan D. Salewski <[email protected]>
Once the Detect(...) function finds a detector for a source string it combines the obtained result string with other data bits to produce the result returned to the caller. That processing is here extracted into a new handleDetected(...) function, which will be called from an additional context in an upcoming commit. The new handleDetected(...) function lives in a new file: detect_common.go The move emphasizes the function's slightly wider use now by both the Detect(...) and (soon-to-be-introduced) CtxDetect(...) dispatch functions. With the introduction of CtxDetect, the logic in handleDetected is no longer used exclusively by Detect. This changeset provides an implementation modification, but no behavioral change. Signed-off-by: Alan D. Salewski <[email protected]>
The existing Detector interface cannot be extended in a backward compatible way to support new features desired for the GitDetector implementation[0]. The new CtxDetector interface is introduced to allow such extension without breaking backward compatibility with existing users. The new interface also avoids adding new methods to the existing Detector interface that would then need to be supported by all Detector implementations, both in-library and in the wild. A CtxDetector is slightly more cumbersome to use than Detector. Callers can (and should) continue to use Detector unless the enhanced capabilities of one or more of the CtxDetector implementation modules is needed. At the time of writing (2020-08), the only CtxDetector with such extra mojo is the forthcoming GitCtxDetector. Existing Detector implementations can easily be wrapped by CtxDetector implementations. The information available to a CtxDetector impl. is a strict superset of the information provided to a Detector. Where there is no need for the additional context info provided by the CtxDetect dispatch function, impls. can simply pass through the common subset to the Detect method of the analogous Detector impl. CAVEAT: In this changeset the list of ContextualDetectors is commented- out. This is intended to make a clear introduction of the interface type prior to introducing any implementations of it. A forthcoming change will provide such wrapping for all in-tree Detector impls, followed by the introduction of specialization for the GitCtxDetector impl. [0] C.f., hashicorp#268 Signed-off-by: Alan D. Salewski <[email protected]>
...and uncomment the 'ContextualDetectors' defult list of them in CtxDetector. The CtxDetector implementations satisfy the CtxDetector interface for all of the built-in detectors (bitbucket, github, gitlab, file, git, GCS, and S3), but do not (yet) take advantage of the the additional contextual information available. These implementations all just pass-through a subset of their arguments to the existing Detector impl. analogue. These are intended to make it comfortable to swap-in use of the new "Contractual Detector" without having to change much code, and also provide a place to hang enhancement code that can take advantage of the contextual information made available by the CtxDetect dispatch function. Signed-off-by: Alan D. Salewski <[email protected]>
...except for GitCtxDetector, which will be getting special treatment since it is the module behind the motivation for this series of changes. The unit tests committed here are all just minimal adaptations of the existing detector unit tests to satisfy the CtxDetect method (which takes three more params than Detect). These tests basically test the "pass through" behavior of their respective implementations, but that's useful to demonstrate that they do not interfere with the traditional behaviors. Signed-off-by: Alan D. Salewski <[email protected]>
GitCtxDetector is able to process 'git::' forced filepath strings, when specified with both absolute and relative filepaths. The following all work now: Absolute: git::/path/to/some/git/repo git::/path/to/some/git/repo//some/subdir git::/path/to/some/git/repo//some/subdir?ref=v1.2.3 Relative (subdir): git::./path/to/some/git/repo//some/subdir?ref=v1.2.3 Relative (parent dir): git::../../path/to/some/git/repo//some/subdir?ref=v1.2.3 GitCtxDetector provides a superset of the functionality of GitDetector, with which it is mainly backward compatible. In fact, GitCtxDetector still delegates Git SSH URL detection to GitDetector. However, because CitCtxDetector has contextual information available to it, it provides the following advantages: * It can safely process 'git::<filepath>' strings as Git repositories; * It can avoid attempting detection on strings that have force tokens intended for other detectors; and * It can provide error messages when it was unable to process a 'git::' forced string (it provides this last capability "around" the wrapped delegation to GitDetector, too). Signed-off-by: Alan D. Salewski <[email protected]>
Signed-off-by: Alan D. Salewski <[email protected]>
9a40be6
to
a19ad2a
Compare
any progress here?? |
New feature: allow
git::
forced filepaths, both absolute and relativeFixes #268
This series of changesets introduces a feature that allows the
git::
forcing token to be used on local file system paths to reference Git repositories. Both absolute paths and relative paths are supported. For example:or:
or:
Only filepaths that are prefixed with the
git::
forcing token are considered for processing.Internally, go-getter transforms the provided string into a
file://
URI with an absolute filepath, with query string params and subdirectory retained.The rationale for using a
file://
URI internally is that the Git clone operation can already work withfile://
URIs, and using them for this feature allows us to leverage the existing go-getter URI-handling machinery. That gets us support for query params (to clone a specific git ref (tag, commit hash, ...)) "for free".The rationale for using an absolute filepath (even when the provided string is a relative filepath) is that (per RFC 1738 and RFC 8089) only absolute filepaths are legitimate in
file://
URIs. But more importantly here, the Git clone operation only supportsfile://
URIs with absolute paths.Q: Why support this functionality at all?
Why not just require that a source location use an absolute path in a
file://
URI explicitly if that's what is needed?A: The primary reason is to allow support for relative filepaths to Git repos.
There are use cases in which the absolute path cannot be known in advance, but a relative path to a Git repo is known.
For example, when a Terraform project (or any Git-based project) uses Git submodules, it will know the relative location of the Git submodule repos, but cannot know the absolute path in advance because it will vary based on where the "superproject" repo is cloned. Nevertheless, those relative paths should be usable as clonable Git repos, and this mechanism would allow for that.
Support for filepaths that are already absolute is provided mainly for symmetry. It would be surprising for the feature to work with relative file paths, but not for absolute filepaths.
For projects using Terraform, in particular, this feature (along with a small change in the Terraform code to leverage it) enables the non-fragile use of relative paths in a module "call" block, when combined with Git submodules:
In the above example "superproject" Git repo (the one "calling" the terraform module) knows the relative path to its own Git submodules because they are embedded in a subdirectory beneath the top-level of the "superproject" repo.
Two downstream Terraform issues that would require go-getter support for this feature (or something like it) are:
Unable to use relative path to local Git module terraform#25488: "Unable to use relative path to local Git module"
In 0.12, modules can no longer be installed from local git repositories at relative paths terraform#21107: "In 0.12, modules can no longer be installed from local git repositories at relative paths"
Design Notes
In order for this feature to work, additional contextual information is needed by the Git detector than can be provided using the existing Detector API.
Internally, the Detector's Detect method does not pass along to the Detector implementations all of the contextual information that it has available. In particular, the forcing token and go-getter subdir component are stripped out of the source string before invoking the implementation's Detect method. In the particular case of the Git detector, that means it cannot know that a
git::
forcing token was provided on an input string that otherwise looks like a file system path. And that means that it is not correct or safe for it to identify any filepath string value as a Git repository.Externally, callers (such as Terraform) already provide a value for the
pwd
parameter ofDetect
, but it is not (necessarily) the location from which a relative path in agit::
string should be resolved. In a Terraform module (which may be in an arbitrary subdirectory from the process current working directory), module "source" references that contain relative paths must be interpreted relative to the location of the module source file. Terraform has that information available, but in the existing Detect API there is no way to convey it to go-getter.Constraints
Additional
Detector
methods cannot be added without burdening all existing detectors (both internal and in the wild) with the need to support them.Additional
Detect
method params cannot be added without breaking all existingDetector
implementations (internal, wild).Additional parameters cannot be added to the
Detect
dispatching function without affecting all callers.Approach
The goal is to provide the feature in a way that is as minimally invasive as possible. But above all else it needs to avoid breaking backward compatibility in any way.
Given that, the approach taken by this changeset series is to introduce the concept of a "Contextual Detector". It is structured in the same way as the current
Detector
framework, but works through a newCtxDetector
interface that is not constrained by the existing API.The only callers affected by this change would be those that wish to take advantage of the additional capabilities. And for those, the migration path straight-forward because the new API is structured like the existing one.
In particular, this changeset series introduces four new elements:
CtxDetector
interfaceCtxDetect
dispatching functionCtxDetect
method on theCtxDetector
interfaceFull suite of
CtxDetector
implementations that are analogues of the existing detectors (most of which (currently) just delegate to the existingDetector
implementations).There is also a global
ContextualDetectors
list that serves a function analogous to the existingDetectors
list.The centerpiece of the seris is
GitCtxDetector
, which provides a superset of the functionality ofGitDetector
, with which it is mainly backward compatible. In fact,GitCtxDetector
still delegates Git SSH URL detection toGitDetector
.However, because
CitCtxDetector
has contextual information available to it, it provides the following advantages:It can safely process
git::<filepath>
strings as Git repositories;It can avoid attempting detection on strings that have force tokens intended for other detectors; and
It can provide error messages when it was unable to process a
git::
forced string (it provides this last capability "around" the wrapped delegation toGitDetector
, too).Tentative changes that show how this change can be incorporated into Terraform are in the branch
ads/github-issue-25488
here:There is a related draft PR for those changes here:
While there is a fair amount of new code added in
go-getter
, the changes required interraform
are quite modest.