**MAJOR CHANGES** Simplify plugin & description redesign #64

Merged
merged 127 commits into from
Feb 28, 2024
127 commits
060d5da
Fixing hash id extraction method.
rhysrevans3 Jul 27, 2022
fd1ed70
Initial commit.
rhysrevans3 Jul 27, 2022
9f65f37
Moving bulk outputs to own directory.
rhysrevans3 Jul 28, 2022
8a70a54
Removing deduplication for rabbitmq output.
rhysrevans3 Jul 28, 2022
855a1a0
Reconfiguring bulk outputs.
rhysrevans3 Jul 29, 2022
ce87f0b
Adding path parts extraction method.
rhysrevans3 Sep 1, 2022
b7ef2e9
Moving categories too extraction method.
rhysrevans3 Sep 2, 2022
a92623e
Flake fix
rhysrevans3 Sep 5, 2022
b71b797
Flake fix
rhysrevans3 Sep 5, 2022
623ac2c
Removing unused import.
rhysrevans3 Sep 5, 2022
ff0e177
undoing last commit.
rhysrevans3 Sep 5, 2022
5fd4eda
Combining the file stats extractors.
rhysrevans3 Sep 6, 2022
55db5f8
Spelling mistake fix.
rhysrevans3 Sep 6, 2022
f821368
Updating hash command.
rhysrevans3 Sep 6, 2022
2bd4828
Change from agregated to status.
rhysrevans3 Sep 14, 2022
28a10ce
Allowing terms to be kept in string join method.
rhysrevans3 Sep 22, 2022
791969d
Merge branch 'unify_file_stats_extraction' into issue/172/atod
rhysrevans3 Sep 22, 2022
7344e76
Fixing spelling mistake.
rhysrevans3 Sep 22, 2022
1a1f2cb
Spelling mistake.
rhysrevans3 Sep 22, 2022
c9c4914
Merge branch 'catagories_extraction' into issue/172/atod
rhysrevans3 Sep 26, 2022
23d9b41
Fixing flake8 warnings.
rhysrevans3 Sep 26, 2022
34e3981
Merge branch 'master' of github.com:cedadev/stac-generator into issue…
rhysrevans3 Sep 26, 2022
d7bfe7e
Ensuring uri in asset.
rhysrevans3 Sep 27, 2022
a386f3b
Adding missing requirements.
rhysrevans3 Sep 27, 2022
0317373
updating rabbitmq input logic.
rhysrevans3 Sep 27, 2022
1dd6eb3
Removing destination exchange from rabbit.
rhysrevans3 Sep 27, 2022
ab0d213
Removing unnessary nesting.
rhysrevans3 Sep 27, 2022
bb18e5b
Reverting message variable to uri.
rhysrevans3 Sep 28, 2022
8508b35
Adding check for filepath instead of uri.
rhysrevans3 Sep 28, 2022
510ab88
Fixing file stats error.
rhysrevans3 Sep 28, 2022
8ea1217
Adding more logging.
rhysrevans3 Sep 28, 2022
9504dbb
Extracting uri before running process.
rhysrevans3 Sep 28, 2022
e7fdd59
Updating rabbit and file stats
rhysrevans3 Sep 29, 2022
950af1c
Updating path parts skip default
rhysrevans3 Sep 29, 2022
b54d0b1
Upadting mod_time format.
rhysrevans3 Sep 29, 2022
36d9ac9
Removing unneeded quotes
rhysrevans3 Sep 29, 2022
27992a1
Removing Z.
rhysrevans3 Sep 29, 2022
6e28d8f
Wrong regex
rhysrevans3 Sep 29, 2022
b5523d4
Adding status to collection.
rhysrevans3 Sep 29, 2022
324872a
Moving file stats to properties.
rhysrevans3 Sep 30, 2022
42a6523
Adding ast eval if json fails.
rhysrevans3 Oct 3, 2022
ffb2821
Adding kwargs to rabbit exchange.
rhysrevans3 Oct 4, 2022
57f0884
Rabbitmq logic change.
rhysrevans3 Oct 5, 2022
8f7fce3
Fixing logic error.
rhysrevans3 Oct 5, 2022
084b0ef
Adding the description path to object body.
rhysrevans3 Oct 5, 2022
24d8bf3
Adding description path for item and collection.
rhysrevans3 Oct 5, 2022
5b15bbb
Adding extra logging.
rhysrevans3 Oct 5, 2022
70e604b
Lazy load logging.
rhysrevans3 Oct 5, 2022
0a6862c
Updating boto filestat backend.
rhysrevans3 Oct 6, 2022
2c3da08
Adding logging for description tree creation.
rhysrevans3 Oct 6, 2022
783509d
Elasticsearch change from |= to update.
rhysrevans3 Oct 6, 2022
f7fb7ea
Switching to connection kwargs.
rhysrevans3 Oct 6, 2022
bf673fa
Wrong variable name.
rhysrevans3 Oct 6, 2022
a4a1519
Debugging.
rhysrevans3 Oct 7, 2022
e9d4167
Updating ceda-directory-tree requirement.
rhysrevans3 Oct 7, 2022
baa15b5
Renaming esgf solr file.
rhysrevans3 Oct 7, 2022
eddc8b0
Having check for collection_id
rhysrevans3 Oct 7, 2022
e36c7fe
Updating collection description merging.
rhysrevans3 Oct 10, 2022
94cafd3
Removing encoding from open for 3.7
rhysrevans3 Oct 10, 2022
cb489a0
Readding encoding with correct spelling.
rhysrevans3 Oct 10, 2022
cd193e1
Correcting file title
rhysrevans3 Oct 10, 2022
eeb5913
Fixing collection description errors.
rhysrevans3 Oct 10, 2022
4efd280
Remvoing unused import.
rhysrevans3 Oct 10, 2022
903404a
Adding typing to new methods.
rhysrevans3 Oct 10, 2022
7acaa9c
Adding id_term for elasticsearch method.
rhysrevans3 Oct 10, 2022
e4c1a7e
Extracting file stats extraction to seperate
rhysrevans3 Nov 1, 2022
3bc1526
Fixing format errors.
rhysrevans3 Nov 1, 2022
619c973
Removing debug timing.
rhysrevans3 Nov 2, 2022
1a2daa5
Removing unnessary logging.
rhysrevans3 Nov 2, 2022
b2be77c
Updating bulk logic.
rhysrevans3 Nov 3, 2022
395057e
Merge branch 'master' of github.com:cedadev/stac-generator into issue…
rhysrevans3 Nov 3, 2022
144d1f9
Removing unnessary variable.
rhysrevans3 Nov 3, 2022
c26ebf0
Merge branch 'issue/50/bulk_rabbitmq' into issue/172/atod
rhysrevans3 Nov 3, 2022
db4f29f
Fixing hash bug.
rhysrevans3 Nov 3, 2022
e2f1460
Removing depricated type from bulk es.
rhysrevans3 Nov 3, 2022
0f2c5ab
Updating requirements.
rhysrevans3 Nov 9, 2022
569d61d
Switching too streaming_bulk.
rhysrevans3 Nov 23, 2022
903bd6d
Allowing for base description.
rhysrevans3 Dec 2, 2022
b022857
Adding basic qos for rabbit.
rhysrevans3 Jan 19, 2023
4c2188d
Adding keyword for basic_qos.
rhysrevans3 Jan 25, 2023
045a56d
Updating item id creation.
rhysrevans3 Feb 10, 2023
be9dc38
Adding request timeout too elasticsearch extrator.
rhysrevans3 Feb 14, 2023
f0da6bf
Fixing incorrect text file output.
rhysrevans3 Feb 20, 2023
0e80c20
Fix for json_file extration method.
rhysrevans3 Feb 20, 2023
84bca96
Fix for last fix.
rhysrevans3 Feb 20, 2023
7513c30
Another fix for json_file extractor.
rhysrevans3 Feb 20, 2023
be134f7
Allow elasticsearch to remove old records.
rhysrevans3 May 11, 2023
6967587
Adding elasticsearch input.
rhysrevans3 Jul 7, 2023
01f9477
Improving text file input performance.
rhysrevans3 Jul 7, 2023
b75ee77
Updating script to use click.
rhysrevans3 Jul 7, 2023
c1bdf32
Flake8 fixes.
rhysrevans3 Jul 7, 2023
3001875
Merge branch 'master' of github.com:cedadev/stac-generator into issue…
rhysrevans3 Jul 7, 2023
77fb0c9
Move all extractions to extraction_methods.
rhysrevans3 Jul 13, 2023
a9325fe
Seperating dot seperated string and hash methods.
rhysrevans3 Jul 20, 2023
a9efa2b
Adding CEDA mapping.
rhysrevans3 Jul 21, 2023
0c3955a
Moving URI into body for extraction methods.
rhysrevans3 Jul 21, 2023
d2ee759
Adding baker recipes.
rhysrevans3 Aug 8, 2023
1eadc04
Removing old baker and collection describer.
rhysrevans3 Aug 8, 2023
11898ee
Simplifying baker get.
rhysrevans3 Aug 8, 2023
1d4f1a5
Adding mappings.
rhysrevans3 Aug 24, 2023
9bf3888
Add remove, regex asset and
rhysrevans3 Sep 18, 2023
7850540
Updating JSON output to create multiple files.
rhysrevans3 Sep 21, 2023
14a3cfe
Adding xarray requirements.
rhysrevans3 Oct 5, 2023
a325fb5
Checking type for recipe in baker.
rhysrevans3 Oct 5, 2023
e7150f6
Testing with sentinel and cmip6.
rhysrevans3 Oct 10, 2023
2ba1e0c
Fixing extraction method bugs.
rhysrevans3 Jan 9, 2024
c8825da
black formatting.
rhysrevans3 Jan 9, 2024
1dd8154
Merge branch 'master' of github.com:cedadev/stac-generator into simpl…
rhysrevans3 Jan 9, 2024
b93227d
Fixing flate8 errors.
rhysrevans3 Jan 9, 2024
faebe8f
isort
rhysrevans3 Jan 9, 2024
adc1070
Moving extraction_methods to seperate repo.
rhysrevans3 Jan 24, 2024
c628c13
Updating fast_api output.
rhysrevans3 Feb 5, 2024
92af0b8
Merge branch 'simplify_plugins' of https://github.com/cedadev/stac-ge…
rhysrevans3 Feb 5, 2024
00a96aa
Adding example code.
rhysrevans3 Feb 7, 2024
3399e22
Updating STAC mapping.
rhysrevans3 Feb 20, 2024
b28c8e8
Updating docs.
rhysrevans3 Feb 27, 2024
fe50e7e
Run black.
rhysrevans3 Feb 27, 2024
18f2399
i-sort.
rhysrevans3 Feb 27, 2024
0ff3c35
Flake8
rhysrevans3 Feb 27, 2024
cf0bc31
python version strings for gh tests.
rhysrevans3 Feb 27, 2024
b50a74f
Updating tests.
rhysrevans3 Feb 27, 2024
6d9437e
Updating requirements.
rhysrevans3 Feb 27, 2024
92fffb1
requirements.
rhysrevans3 Feb 27, 2024
05358df
requriements.txt
rhysrevans3 Feb 27, 2024
a6ce7d0
test requirements
rhysrevans3 Feb 27, 2024
6830431
3.11 requirements.
rhysrevans3 Feb 28, 2024
17f8269
Fixing broken baker tests.
rhysrevans3 Feb 28, 2024
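Commit 42a6523 ("Adding ast eval if json fails.") describes a fallback: a message body that is not valid JSON is retried as a Python literal. The page does not show the implementation. A minimal sketch of that pattern, assuming the body arrives as a string and using a hypothetical `parse_body` helper (not a stac-generator function), might look like:

```python
import ast
import json
from typing import Any


def parse_body(raw: str) -> Any:
    """Hypothetical helper: decode a message body as JSON, falling back
    to a Python literal (e.g. a dict repr using single quotes)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # ast.literal_eval only accepts literals (strings, numbers, tuples,
        # lists, dicts, sets, booleans, None), so it will not run arbitrary code.
        return ast.literal_eval(raw)


# Valid JSON parses directly.
print(parse_body('{"uri": "/data/file.nc"}'))  # {'uri': '/data/file.nc'}
# A Python dict repr is not valid JSON, but it is a valid literal.
print(parse_body("{'uri': '/data/file.nc', 'action': None}"))
```

Using `ast.literal_eval` rather than `eval` keeps the fallback safe for untrusted queue messages; anything that is neither JSON nor a literal still raises.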
4 changes: 2 additions & 2 deletions .github/workflows/formatting.yml
@@ -8,10 +8,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
- name: Set up Python 3.9
uses: actions/setup-python@v2
with:
python-version: 3.8
python-version: 3.9
- name: Python Blacken
uses: psf/black@stable
with:
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.7, 3.8]
python-version: ['3.9', '3.10', '3.11']

steps:
- uses: actions/checkout@v1
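The test matrix now lists its Python versions as quoted strings (commit cf0bc31, "python version strings for gh tests."). The PR does not state the reason; a common one is that YAML reads an unquoted `3.10` as the number 3.1, so `actions/setup-python` would be asked for Python 3.1. The snippet below only illustrates that numeric collapse, using Python's own float parsing as a stand-in; it does not parse YAML.

```python
# Versions as they would be read if left unquoted: numbers, not strings.
unquoted = [float(v) for v in ["3.9", "3.10", "3.11"]]

# Turning the numbers back into text shows what a consumer expecting
# a version string would receive: the trailing zero of 3.10 is gone.
as_text = [str(v) for v in unquoted]
print(as_text)  # ['3.9', '3.1', '3.11']

# Quoted entries stay strings, so "3.10" survives intact.
quoted = ["3.9", "3.10", "3.11"]
print(quoted)  # ['3.9', '3.10', '3.11']
```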
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,7 +1,7 @@
.idea
*.egg-info
*venv
conf
.vscode
conf.yml
__pycache__
*.pyc
@@ -11,3 +11,4 @@ docs/build/
stac_generator-1.0.2.zip
.lock
GLOB-0.log
cprofile
4 changes: 2 additions & 2 deletions docs/source/api/stac_generator/stac_generator.rst
@@ -4,10 +4,10 @@ STAC Generator

:fa:`github` `View on Github <https://github.com/cedadev/stac-generator>`_

.. automodule:: stac_generator.core.processor
.. automodule:: stac_generator.core.extraction_method
:members:

.. autoclass:: stac_generator.core.generator.BaseGenerator

.. automodule:: stac_generator.core.collection_describer
.. automodule:: stac_generator.core.baker
:members:
110 changes: 0 additions & 110 deletions docs/source/collection_descriptions/building_a_workflow.rst

This file was deleted.

139 changes: 0 additions & 139 deletions docs/source/collection_descriptions/collection_descriptions.rst

This file was deleted.

20 changes: 7 additions & 13 deletions docs/source/index.rst
@@ -10,12 +10,11 @@ change the source of the files, the output of the metadata and the processing ch
which extracts the metadata. The framework leverages a modular, plugin architecture
to allow users to modify the workflow to fit their needs.

The process expects a stream of "assets" (an asset being a file, zarr object, etc.).
The process expects a stream of "messages" for which the recipes can be run against.
The source of this stream is configured with `input plugins <stac_generator/inputs>`_
which could be as simple as listing directories on a file system or using message
queues as part of a complex ingest system. The `generators <generators>`_ operate on this stream and
pass to `output plugins <stac_generator/outputs>`_. The output is at the level
of an "asset" so higher level aggregated objects may require an aggregation step.
pass to `output plugins <stac_generator/outputs>`_.

These outputs are also configurable so could dump to the terminal (for debugging), file,
a data store (postgres, elasticsearch, etc.) or even a message queue for onward processing.
@@ -36,22 +35,17 @@ in a certain space and time.
Generators
==========

The different generators are designed to extract different levels of metadata to build the assets, items, and collections of the STAC Catalog.
The different generators are designed to extract different levels of metadata to build the items, and collections of the STAC Catalog.

.. list-table::
:header-rows: 1

* - Name
- Description
* - :ref:`Asset Generator <stac_generator/generators:asset>`
- Generates STAC Assets via extraction methods specified in the :ref:`colelction descriptions <collection_descriptions/collection_descriptions:collection descriptions>`
focusing on file metadata (name, location, size, etc.)
* - :ref:`Item Generator <item_generator/generators:item>`
- Generates STAC Items via extraction methods specified in the :ref:`colelction descriptions <collection_descriptions/collection_descriptions:collection descriptions>`
focusing on aggregation from asset metadata.
* - :ref:`Collection Generator <stac_generator/generators:collection>`
- Generates STAC Collections via extraction methods specified in the :ref:`colelction descriptions <collection_descriptions/collection_descriptions:collection descriptions>`
focusing on aggregation from item metadata.
* - :ref:`Item Generator <item_generator/plugins/generators/item>`
- Generates STAC Items via extraction methods specified in the :ref:`colelction descriptions <recipe/recipes>`.
* - :ref:`Collection Generator <stac_generator/plugins/generators/collection>`
- Generates STAC Collections via extraction methods specified in the relivant :ref:`recipe <recipe/recipes>`.



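The updated index.rst describes a stream of messages, produced by input plugins, that generators run recipes against before handing the results to output plugins. The sketch below illustrates that flow only. `Recipe`, `ListOutput`, `ItemGenerator`, the `collection` message key and the extraction-method signature are all hypothetical and are not stac-generator's actual classes or recipe format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical: an extraction method takes the body built so far and returns an updated body.
ExtractionMethod = Callable[[dict], dict]


@dataclass
class Recipe:
    """Hypothetical recipe: an ordered list of extraction methods."""
    extraction_methods: List[ExtractionMethod]


@dataclass
class ListOutput:
    """Hypothetical output plugin that collects results in memory."""
    results: List[dict] = field(default_factory=list)

    def export(self, data: dict) -> None:
        self.results.append(data)


class ItemGenerator:
    """Runs the recipe matching each message, then passes the result to every output."""

    def __init__(self, recipes: Dict[str, Recipe], outputs: List[ListOutput]):
        self.recipes = recipes
        self.outputs = outputs

    def process(self, message: dict) -> None:
        recipe = self.recipes[message["collection"]]
        body = {"uri": message["uri"]}
        # Apply the extraction methods in the order the recipe lists them.
        for method in recipe.extraction_methods:
            body = method(body)
        for output in self.outputs:
            output.export(body)


def filename(body: dict) -> dict:
    # Example extraction method: record the last path component of the URI.
    return {**body, "filename": body["uri"].rsplit("/", 1)[-1]}


def input_stream() -> Iterable[dict]:
    # Stand-in for an input plugin (directory listing, message queue, ...).
    yield {"collection": "demo", "uri": "/data/demo/a.nc"}
    yield {"collection": "demo", "uri": "/data/demo/b.nc"}


output = ListOutput()
generator = ItemGenerator({"demo": Recipe([filename])}, [output])
for msg in input_stream():
    generator.process(msg)
print(output.results)
```

Keeping inputs, generators and outputs behind small interfaces like these is what lets each stage be swapped independently, which is the modular, plugin-based design the index page describes.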