feat!: break out CLI for more specific commands/more reasonable defaults #344
@jsstevenson did you see this old issue #210?
one possible additional todo: add `-o` to things that produce file outputs to specify output location
src/metakb/cli.py
Outdated
    for name, aws_env_var_name in update_params:
        if aws_env_var_name in environ:
            msg = (
                f"Updating the {name.value} AWS database from the MetaKB CLI is "
                f"prohibited. Unset the environment variable:`{aws_env_var_name}` to "
                "proceed."
            )
            _logger.error(msg)
            click.echo(msg)
            success = False
            continue
Once we're fully set up to update normalizers in the cloud, I wonder if there's some configuration we could add to the Dynamo table templates that only allows writes/deletes to come from the identity that's granted to the cloud update services. It'd be nice if we didn't have to write all of these checks into the code itself.
@korikuzma having some second thoughts about what commands/combinations of commands should call
@jsstevenson I haven't looked at the changes, but what if we just separate this out and place it on the user to decide and make a note in the documentation?
    def __repr__(self) -> str:
        """Print as simple string rather than enum wrapper, e.g. 'civic' instead of
        <NormalizerName.CIVIC: 'civic'>.

        Makes Click error messages prettier.

        :return: formatted enum value
        """
        return f"'{self.value}'"
Bigger explanation of this:
For the most part, when you give enum values to `click.Choice`, it'll format them correctly. For example:
[ cli-refactor ⚙ .venv] ~/code/metakb % metakb update --help
Usage: metakb update [OPTIONS] [[civic|moa]]...
However, if you pass an invalid arg, it prints them using `__repr__`, so it's ugly:
Error: Invalid value for '[[civic|moa]]...': 'sdflkd' is not one of <NormalizerName.CIVIC: 'civic'>, <NormalizerName.MOA: 'moa'>.
With this fix it instead looks like this:
Error: Invalid value for '[[civic|moa]]...': 'sdflkd' is not one of 'civic', 'moa'.
I don't think there are any major consequences to overwriting `__repr__` like this, but when debugging you have to be a little careful because it'll print without the enum wrapping. That said, not totally married to this change. I don't think it's a big deal that the validation error prints that way in the first place.
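To make the trade-off concrete, here is a minimal sketch of the effect, independent of Click. The class names `PlainName` and `PrettyName` and the helper `choices_message` are hypothetical stand-ins, not code from this PR; the helper only mimics a message that joins the `repr` of each allowed choice.

```python
from enum import Enum


class PlainName(Enum):
    """Hypothetical enum with the default Enum __repr__."""

    CIVIC = "civic"
    MOA = "moa"


class PrettyName(Enum):
    """Hypothetical enum with __repr__ overridden as in the diff above."""

    CIVIC = "civic"
    MOA = "moa"

    def __repr__(self) -> str:
        # Show only the quoted value, without the enum wrapper.
        return f"'{self.value}'"


def choices_message(enum_cls: type[Enum]) -> str:
    """Join the repr of every member, roughly how an error message lists choices."""
    return ", ".join(repr(member) for member in enum_cls)


if __name__ == "__main__":
    print(choices_message(PlainName))
    print(choices_message(PrettyName))
```

The debugging caveat shows up here too: `repr(PrettyName.CIVIC)` is indistinguishable from the repr of the plain string `'civic'`, so a log line no longer reveals that the value is an enum member.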
    return newest_version


if __name__ == "__main__":
    update_metakb_db(_anyio_backend="asyncio")
not totally sure why the asyncio thing was needed? maybe an old click version? I've been able to run the async `transform` command without problems.
```

For more information on the different CLI arguments, see the [CLI README](docs/cli/README.md).
The `--help` flag can be provided to any CLI command to bring up additional documentation.
There's also a way to embed Click docs into Sphinx, so I'll do that in the docs branch.
@cli.command()
@click.option("--normalizer_db_url", "-n", help=_normalizer_db_url_description)
Since Dynamo takes a single URL but Postgres requires different libpq-style URLs that include normalizer-specific database names, I don't think it's really feasible to support Postgres connections this way yet. Thought about doing something that could take a base Postgres URL and add database names on top, but it's probably not worth the work right now.
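For reference, the "base URL plus database name" idea mentioned above could look roughly like the following. This is only a sketch of the deferred approach, not something this PR implements; the function name `with_database` and the database names in `EXAMPLE_DB_NAMES` are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical per-normalizer database names; the real names may differ.
EXAMPLE_DB_NAMES = {
    "gene": "gene_normalizer",
    "disease": "disease_normalizer",
    "therapy": "therapy_normalizer",
}


def with_database(base_url: str, db_name: str) -> str:
    """Return base_url with its path replaced by the given database name."""
    parts = urlsplit(base_url)
    # In a libpq-style URL the database name is the path component.
    return urlunsplit(parts._replace(path=f"/{db_name}"))


def normalizer_urls(base_url: str) -> dict[str, str]:
    """Build one URL per normalizer from a single base URL."""
    return {
        name: with_database(base_url, db_name)
        for name, db_name in EXAMPLE_DB_NAMES.items()
    }


if __name__ == "__main__":
    for name, url in normalizer_urls("postgresql://user:pw@localhost:5432").items():
        print(name, url)
```

Even in this form, the user would have to accept a fixed naming scheme for the databases, which is part of why it may not be worth the work yet.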
nice 🧼
Last small thoughts
There are some places where a singular name is used for arguments (source, normalizer). It might make sense to make these plural since they can take more than one value.
@korikuzma yeah agreed that this becomes awkward in code. My sense, FWIW, is that it's more common on the help message itself to make the argument name singular and then represent plurality with ellipses, e.g. `[FILE ...]` or `[FILE] ...`. (See e.g. here or here or `man ls`). We can use the `metavar` param to specify how the arg is named in help messages while using a more appropriate variable name in the CLI code itself.
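The same split between the variable name and the displayed name can be sketched with the standard library's `argparse`, which also has a `metavar` parameter. This is an analog for illustration only, not the Click code in this PR; the program name `example` and the argument name `sources` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose variadic argument is plural in code, singular in help."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "sources",  # plural name used in code: args.sources
        nargs="*",  # zero or more values
        metavar="SOURCE",  # singular name shown in usage and help text
        help="source(s) to process",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args(["civic", "moa"])
    print(args.sources)
    print(parser.format_usage())
```

The parsed values land in `args.sources` as a list, while the usage line refers to `SOURCE`, which matches the convention described above.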
Small requests
close #210
Miscellaneous quality of life improvements and feature additions to CLI:
- `metakb` as a console command.
- `metakb update` to run a complete harvest/transform/load step
- `metakb check-normalizers` and `metakb update-normalizers` to check and force refreshing of normalizer data. This supports a simple workflow like `metakb check-normalizers || metakb load-normalizers` to load if unavailable, rather than requiring the user to force normalizer reload while loading the MetaKB graph.
- `metakb harvest` to just perform harvest of source(s)
- `metakb transform` to just perform transform of source(s), or `metakb transform-file` to transform a specific harvested file
- `metakb load-cdm` to skip harvest/transform and directly load a CDM file, either from local (default location), a specific file, or from S3
- `metakb clear-graph` to wipe the graph. No other CLI command will wipe the graph. I thought about calling it when `update` is used without any source qualifiers, but it seemed a little odd to include additional behavior such that `metakb update <source> && metakb update <other source>` is different from `metakb update`. Also thought about including it as an option flag in some other commands, but at that point, you can just do `metakb clear-graph && <other command>`.
- (`--output_directory`, `-o`) where it makes sense. Unfortunately, most of these commands all produce n output files so I don't think there's a simple way to specify the name of the output file.
- `username:password` option, since you need to provide both at once. (Not sure why Neo4j requires a password).
- `cli.py` act as gateways/interfaces to them. There's probably a little bit more of this that we could do but nothing else stuck out to me.
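The `check-normalizers || ...` workflow in the list above depends on the check command exiting with a nonzero status when data is missing, since the shell's `||` only runs the second command after a failure. A minimal sketch of that convention follows; the function name `exit_code_for` and the normalizer names in the example are assumptions, not the PR's actual implementation.

```python
import sys


def exit_code_for(status: dict[str, bool]) -> int:
    """Map per-normalizer readiness to a process exit code.

    Returns 0 when every normalizer is ready and 1 otherwise, so that a shell
    `||` chain runs the follow-up command only when something is missing.
    """
    missing = [name for name, ready in status.items() if not ready]
    for name in missing:
        # Report problems on stderr so stdout stays clean for scripting.
        print(f"{name} normalizer is not ready", file=sys.stderr)
    return 1 if missing else 0


if __name__ == "__main__":
    # Hypothetical readiness results; a real check would query each database.
    example_status = {"gene": True, "disease": False, "therapy": True}
    sys.exit(exit_code_for(example_status))
```

The `&&` chains in the list rely on the same convention in reverse: `metakb clear-graph && <other command>` only proceeds if the first command exits with 0.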