- Added pool_pre_ping=True to fix connection pool issues.
- Fixed crash in `schema permissions apply` for tables that don't have a sequence in the database.
- Added SQL SEQUENCE permissions in `schema permissions apply` to make `pg_dump` access easier.
- Added `has_field_filter_access()` and `filter_auth` (`filterAuth` in schema) to add permissions for list filtering.
- Added `DatasetFieldSchema.is_relation` for convenience.
- Fixed `schema permissions apply` to set the correct database grants for nested/through tables (only affects `brk.kadastraleobjecten.soortCultuurBebouwd` for now).
- Fixed accessing `DatasetFieldSchema.unit` if the unit is missing.
- Fixed typo in unused property `discription_with_unit` -> `description_with_unit`.
- Fixed clearing cached properties when `auth` or `filterAuth` are updated.
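The cached-property fix above can be illustrated with a minimal sketch. It assumes the cache is built with `functools.cached_property`; the `FieldSketch` class and its `update()` method are hypothetical stand-ins, not the schematools implementation.

```python
from functools import cached_property


class FieldSketch:
    """Hypothetical stand-in for a schema field (not the schematools class)."""

    def __init__(self, data: dict):
        self._data = data

    @cached_property
    def auth(self) -> frozenset:
        # Computed on first access, then stored in the instance __dict__.
        value = self._data.get("auth", "OPENBAAR")
        return frozenset([value] if isinstance(value, str) else value)

    def update(self, key: str, value) -> None:
        self._data[key] = value
        # Drop the cached value so the next access recomputes it.
        # pop() with a default is safe when nothing was cached yet.
        self.__dict__.pop("auth", None)


field = FieldSketch({"auth": "OPENBAAR"})
before = field.auth
field.update("auth", ["BRK/RSN"])
after = field.auth  # recomputed from the updated data
```

Without the `pop()` call, `after` would still hold the stale value computed before the update.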
- Fix Django<4.2 pinning.
- Improved performance of auth checks (especially `has_field_access()`).
- Changed method signature of `UserScopes.has_all_scopes()` and `UserScopes.has_any_scopes()` (most callers should use `has_field_access()` anyway).
- Block `deepcopy()` of schema fields, as it's very slow.
- Removed `DatasetType` base class.
- Removed `schema import events` code, as it's no longer used.
- Removed deprecated APIs (`schematools.utils` and `is_relation_temporal`).
- Removed wirerope dependency.
- Removed Python 3.8 style annotations.
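One way to block `deepcopy()` on an object is to define the `__deepcopy__` hook and raise from it. The sketch below shows that general technique; the class name and error message are hypothetical, and the actual schematools code may differ.

```python
import copy


class FieldSketch:
    """Hypothetical schema field that refuses to be deep-copied."""

    def __init__(self, name: str):
        self.name = name

    def __deepcopy__(self, memo):
        # copy.deepcopy() calls this hook; raising makes accidental copies loud.
        raise RuntimeError(f"deepcopy() of field {self.name!r} is blocked; it is too slow")


def try_deepcopy(obj) -> str:
    """Return 'copied' on success, or the error message when copying is blocked."""
    try:
        copy.deepcopy(obj)
    except RuntimeError as exc:
        return str(exc)
    return "copied"
```

The hook also fires when the field is nested inside a container that is being deep-copied, so the slow path cannot be reached indirectly.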
- Add `unit` and `description_with_unit` properties to fields.
- Change `id_field` to snake_case. It is directly used in sql queries.
- Added extra `id_field` to Datasettable. Needs to be configurable for geosearch.
- Added temporal indexes
- Fix bug in exporter where a loop through dataset tables was prematurely left when a table has no export data.
- Do not write an empty export file if no columns are selected. Also fix the export to only use active records for jsonlines.
- Bugfix in _is_valid_sql
- Fix in _is_valid_sql to fix materialized view problem.
- Fix to the _get_scopes to return the correct scopes for both dataset, table and table fields.
- Fix the storage of datasettables.display_field (an old copy/paste error in the codebase)
- Modified create_views functions to support materialized views
- Add `enable_export` column to the `dataset` model to be able to configure the exports per dataset.
- Remove the Django >= 4.2 pinning, because DSO is still on Django 3.x. Later on, we can migrate both schematools and DSO simultaneously to >= 4.2
- Fix auth property for subfields. The subfields do not have scopes, however, a scope can be defined on the parent field.
- Added an extra helper method to user-scopes to determine if one of the fields has a scope that is blocking access.
- Some old definitions (gebieden.stadsdelen) are using temporal relations defined as a plain string instead of an object. The exporter needs to take this into account.
- Change export to only use active records for csv and jsonlines, so, no historical records. Also brought the export more in line with the csv export of the DSO-API:
  - headers using capitalize()
  - date-time in iso notation
  - foreign keys only with an `identificatie` (no `volgnummer`)
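The header and date-time conventions listed above can be sketched with the standard library. This is only an illustration: `export_rows()` and the column names are hypothetical, and the real exporter is not shown in these notes.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime


def export_rows(rows: list[dict], columns: list[str]) -> str:
    """Write rows as CSV with capitalized headers and ISO date-times."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Headers follow str.capitalize(): first character upper, rest lower.
    writer.writerow([col.capitalize() for col in columns])
    for row in rows:
        values = (row[col] for col in columns)
        writer.writerow(
            [v.isoformat() if isinstance(v, datetime) else v for v in values]
        )
    return buffer.getvalue()
```

Note that `capitalize()` lowercases everything after the first character, so a camelCase column name would lose its inner capitals.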
- Updated Django version > 4.2
- Updated the github workflow to use postgres14 image
- Improve possibility to use git commit hashes when creating SQL migrations from amsterdam schema table definitions. Now also supports schemas with table definitions in separate files.
- Add possibility to use git commit hashes when creating SQL migrations from amsterdam schema table definitions.
- Bugfix: Update nested table when nested field name has underscore.
- Bugfix: Update parent table when parent table has shortname for update events.
- Bugfix: Only check for row existence when table exists.
- Bugfix: Ignore id when copying data from temp table to main table for nested tables.
- Bugfix: Snake case temp table schema name in EventProcessor.
- Bugfix: Don't try to create schema if schema already exists. Fails on 'create schema' permissions.
- Bugfix: Fixed issue where duplicate indexes were created
- Bugfix: Cache nested tables in EventProcessor.
- Bugfix: Reset last eventid after a manually aborted full load sequence.
- Bugfix: Fix full event loads for relation tables referencing tables with shortname.
- Bugfix: Fixed view_data insertion into datasets.dataset
- Bugfix: Fixed case where nested table has a parent table that uses shortname.
- Bugfix: Fixed bug in _is_valid_sql.
- Bugfix: Assigned create and usage rights to write_user for creating views.
- Bugfix: Fix error in permissions script, introduced a `view_owner` role that owns all views.
- Bugfix: Fix error when nested object in event is null.
- Bugfix: Fix error when relation table is not present during a relation full load.
- Bugfix: Fix error when trying to update relation from None value.
- Bugfix: update nested tables in EventProcessor.
- Bugfix: check for required permissions was not taking the `OPENBAAR` scope into account in the correct way.
- Fix: Cast datetime type to string, because of an out-of-range year in bag_panden.
- Bugfix: Fix error when invalid table is entered in derivedFrom parameter
- Bugfix: Fixed error in detecting if write user exists
- Feature: Added create-views command to django management commands to facilitate creating views.
- Bugfix: Ignore empty input lines in NDJSONImporter.
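Skipping empty input lines in newline-delimited JSON can be sketched as follows. `parse_ndjson()` is a hypothetical helper written for illustration; it is not the `NDJSONImporter` code.

```python
import io
import json
from typing import Iterator, TextIO


def parse_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line."""
    for line in stream:
        line = line.strip()
        if not line:
            # Blank or whitespace-only lines would make json.loads() fail.
            continue
        yield json.loads(line)


records = list(parse_ndjson(io.StringIO('{"id": 1}\n\n   \n{"id": 2}\n')))
```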
- Feature: Use dataset-specific schema to store temporary full load tables.
- Bugfix: Update main table relations after full load of relation table.
- Bugfix: Fix case of updating parent table where two relations exist where the name of one relation is a prefix of the other relation.
- Feature: Added the option `--additional-grants` to the `schema permissions apply` script to be able to set grants for non-amsterdam-schema tables. This is needed for the `datasets_*` tables, because on Azure these tables are accessed in PostgreSQL from a user (or the anonymous) account and the `scope_openbaar` scope has to be granted for these tables.
- Bugfix: For the edge case that the dataset has the id `datasets` the validator was not behaving correctly. That has now been fixed.
- Bugfix: Fix missing fields in through table (second try).
- Feature: EventProcessor: Process events for which no relation table exists; the parent table is still updated.
- Bugfix: Fix missing fields in through table. If a relation has extra properties defined on the relation, these properties should also be available on the through table that is created for this relation.
- Bugfix: Altered UnlimitedCharField to not throw an exception when max_length is found in kwargs
- Bugfix: nullable_int faker did not play well with enums, is now fixed.
- Added cli option to mocker to limit the tables.
- Feature: EventProcessor: Track processed event ids now for full load sequences as well.
- Feature: EventProcessor: Track processed event ids to avoid duplicate processing and key collisions.
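A minimal sketch of tracking processed event ids follows. It assumes event ids increase monotonically per table, which these notes do not state; `EventTracker` and its methods are hypothetical and not part of the `EventProcessor` API.

```python
from __future__ import annotations


class EventTracker:
    """Hypothetical tracker remembering the last processed event id per table."""

    def __init__(self) -> None:
        self._last_event_id: dict[str, int] = {}

    def should_process(self, table: str, event_id: int) -> bool:
        last = self._last_event_id.get(table)
        if last is not None and event_id <= last:
            # Already seen: skipping avoids duplicate inserts and key collisions.
            return False
        self._last_event_id[table] = event_id
        return True

    def reset(self, table: str) -> None:
        # E.g. after an aborted full load, forget the last id for this table.
        self._last_event_id.pop(table, None)
```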
- Bugfix: Fix constructing id's for tables where the id keys contain underscores.
- Bugfix: Removed a check for datasets with status beschikbaar in schematools/permissions/db.py set_dataset_read_permissions.
- Bugfix: Changed tests/test_export.py test_jsonlines_export to account for precision differences
- Bugfix: Use engine.connect() instead of engine.execute() directly. Not supported anymore in SQLAlchemy 1.4.
- Bugfix: Use column names in INSERT INTO statement instead of column positions.
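Why explicit column names matter in `INSERT INTO ... SELECT` can be shown with a small `sqlite3` sketch. The table and column names are hypothetical; the project itself targets PostgreSQL through SQLAlchemy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER PRIMARY KEY, naam TEXT, code TEXT)")
# The temp table has the same columns, but in a different physical order.
conn.execute("CREATE TABLE temp_table (code TEXT, id INTEGER, naam TEXT)")
conn.execute("INSERT INTO temp_table (id, naam, code) VALUES (1, 'Centrum', 'A')")

# Naming the columns on both sides makes the copy independent of column order.
# Identifiers are interpolated here, so they must come from trusted schema data.
columns = ["id", "naam", "code"]
column_list = ", ".join(columns)
conn.execute(
    f"INSERT INTO main_table ({column_list}) SELECT {column_list} FROM temp_table"
)
row = conn.execute("SELECT id, naam, code FROM main_table").fetchone()
```

A positional `INSERT INTO main_table SELECT * FROM temp_table` would have tried to store `'A'` in the `id` column here.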
- Fix bug in event processor. Use shortname attribute when updating parent table.
- Fix bug in event processor. Don't try to update parent tables for relation tables of n:m relations.
- Implement logic to recover from failed event messages
- Two small fixes to make `sqlmigrate_schema` work:
  - requires_system_checks needs to be a list (from Django 1.4)
  - list of datasets needs to be a set when calling Django schema migrate API
- Patch to fix custom implementation of UnlimitedCharField.max_length
- Recognize more than 2 consecutive capital letters as word boundaries
- Fix database column naming in model mocker class construction
- Fix handling of geometry fields containing underscores in the attribute name.
- Add utility cli commands for case-changes (snake, camel).
- Make export to csv/jsonlines less memory hungry.
- Add serialization of Decimal for orjson.dump() in exporter.
- Add option `ind_create_pk_lookup` to `EventsProcessor`, to skip expensive index creation.
- Add UUID column type for introspection of PostgreSQL db.
- Add a `--to-snake-case` option to the `schema show dataset[table]` cli functions.
- Add support for loading events in batches. Extract initialisation and finalisation into separate methods to improve performance. Cache initialised tables.
- Disable the versioning that creates postgresql schemas for new tables. This functionality is not fully completed and accepted and is now blocking the event processing code.
- Skip index creation on temporary full load table from event importer.
- Fix truncate bug that truncated all associated tables when updating a relation table.
- Add support for `first_` and `last_of_sequence` headers for event importer.
- Simplification of the events importer. Relations are now imported as separate objects.
- Apply some small fixes to cli commands and update template used to generate schema by introspection.
- Exclude all array-type fields during exports.
- Add cli commands to list schemas and tables.
- Workaround for DSO-API docs not loading.
- Fix condition for through tables for a 1-N relation.
- Pin SQLAlchemy to >= 1.4, < 2.0 to make schematools usable from Airflow 2.4.1.
- Add export cli commands to export geopackages, csv and jsonlines.
- Whether a through table is created for a 1-N relation now depends on whether the object field definition in the schema has additional attributes that are not part of the relation key.
- Security fix: authorisation on fields with subfields was incorrectly handled.
- The `schema validate` command was fixed to work with v2 publishers.
- Validation errors are reported in a hopefully more readable format.
- `enum` values in schemas are now type-checked during validation.
- Require SQLAlchemy <= 1.12.5
- Fix structural validation of publisher references by not inlining them in the json held against the metaschema.
- Pin pg-grant to 0.3.2 to stay compatible with SQLAlchemy
- Bugfix Dataset.json not properly dereferencing publisher property
- Fix names for the subfields of an objectfield. These names need a prefix, because they are exposed externally in the DSO API.
- Print error path as is from batch-validate.
- Bugfix for loader methods `get_publisher` and `get_all_publishers`.
- Dataset.publisher returns publisher object irrespective of schema version.
- Add whitelist to exclude certain datasets from the path-id validator.
- Pin SQLAlchemy to a version smaller than 1.4.0, because `pg_grant` breaks on a higher version.
- Bugfix for name clashes that occur in Django ORM relation fields when two versions of the same dataset are deployed next to each other.
- Bugfix for regression which caused dataset id to be matched with the path of a table when the validated schemafile is a table.
- Feature added to enable use of object fields in amsterdam schema. Those fields are flattened in the relational schema (added to the parent table). Furthermore, a second type of object field with `"format": "json"` has been added. For those fields an opaque json blob will be added in the relational database.
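The two kinds of object field described above can be sketched roughly as follows. The `parent_child` column naming and the `flatten_record()` helper are assumptions made for illustration; the exact naming rules used by schematools are not given in these notes.

```python
import json


def flatten_record(record: dict, json_fields: frozenset = frozenset()) -> dict:
    """Flatten object fields into the parent row; keep 'json' fields as blobs."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict) and key in json_fields:
            # "format": "json" fields are stored as one opaque blob.
            flat[key] = json.dumps(value, sort_keys=True)
        elif isinstance(value, dict):
            # Regular object fields become prefixed columns on the parent table.
            for sub_key, sub_value in value.items():
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat
```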
- Correctly resolve the publisher URL, regardless of whether there is a trailing slash
- `schema batch-validate` now produces more readable error messages.
- Bugfix in CLI batch_validate that caused validation to stop at the first invalid schema
- Bugfix in CLI batch_validate that caused dataset.json files in nested directories to be unresolvable
SUPPORTED METASCHEMAS: 1 2
- The `schema ckan` command was changed to generate unique (we hope) titles
- Bugfix for getting publishers from an online index
- Bugfix in publisher validation logging
SUPPORTED METASCHEMAS: 1 2
- Bugfix in batch_validate that treats extra_meta_schema_url as an argument instead of an option.
- Add pre-commit hook for validating publishers
SUPPORTED METASCHEMAS: 1 2
Note that support is not guaranteed yet; for now this is a declaration of intention. Any bugs should be reported.
- Support loading and validating publishers from the schema-server.
- Make schematools aware of the metaschema major versions it can work with.
- Support for attempting validation against multiple metaschemas.
SUPPORTED METASCHEMAS: 1 2
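Checking a metaschema reference against the supported major versions could look like the sketch below. The `@v<major>.<minor>.<patch>` reference format, the constant, and both helpers are assumptions for illustration, not the schematools implementation.

```python
import re

# Assumed to mirror the "SUPPORTED METASCHEMAS: 1 2" declaration above.
SUPPORTED_MAJOR_VERSIONS = frozenset({1, 2})


def metaschema_major(ref: str) -> int:
    """Extract the major version from a reference such as '.../schema@v2.0.1#/...'."""
    match = re.search(r"@v(\d+)\.", ref)
    if match is None:
        raise ValueError(f"No metaschema version found in {ref!r}")
    return int(match.group(1))


def is_supported(ref: str) -> bool:
    return metaschema_major(ref) in SUPPORTED_MAJOR_VERSIONS
```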
- Several minor fixes to tests.
- Removal of unused DatasetSchema.identifier property
- Add `neuronId` is mapping needed for through table identifiers
- Mocked schemas now use properly camel-cased field names.
- Relations can be primary keys.
- The command `schema batch-validate` now works on table files as well as `dataset.json` files.
- Fix importing schema files by using a relative path.
- Fix `related_dataset_schema_ids` to also detect changes in nested objects.
- Fix `DatasetTableSchema.get_fields()` to return cached instances too.
- Fix `verbose_name` of `GeometryField` in Django ORM, which reused globally defined data.
- Fix performance of iterating over subfields, no longer needs to load related tables.
- Added `DatasetFieldSchema.is_nested_object` property.
- Normalized exceptions for missing datasets/tables/fields:
  - The `DatasetNotFound` exception extends from `SchemaObjectNotFound`.
  - Added `DatasetTableNotFound` and `DatasetFieldNotFound`.
  - There is no need for `except (DatasetNotFound, SchemaObjectNotFound)` code, it can all be `except SchemaObjectNotFound:`.
- Cleanup Django model field creation logic.
- Cleanup SQLAlchemy column creation logic.
- The schema validator now rejects tables with both an 'id' field and a composite primary key.
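The normalized exceptions for missing datasets, tables and fields mentioned above allow a single `except` clause. The sketch below defines local stand-in classes with the same names to show the pattern. It assumes that `DatasetTableNotFound` and `DatasetFieldNotFound` also extend `SchemaObjectNotFound`, and `get_table()` is a hypothetical helper.

```python
class SchemaObjectNotFound(Exception):
    """Stand-in for the common base class."""


class DatasetNotFound(SchemaObjectNotFound):
    pass


class DatasetTableNotFound(SchemaObjectNotFound):  # assumed to share the base
    pass


class DatasetFieldNotFound(SchemaObjectNotFound):  # assumed to share the base
    pass


def get_table(datasets: dict, dataset_id: str, table_id: str) -> dict:
    """Hypothetical lookup that raises the most specific exception."""
    if dataset_id not in datasets:
        raise DatasetNotFound(dataset_id)
    if table_id not in datasets[dataset_id]:
        raise DatasetTableNotFound(f"{dataset_id}.{table_id}")
    return datasets[dataset_id][table_id]


def describe(datasets: dict, dataset_id: str, table_id: str) -> str:
    try:
        get_table(datasets, dataset_id, table_id)
    except SchemaObjectNotFound as exc:  # one clause covers every subclass
        return type(exc).__name__
    return "found"
```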
- Fix `limit_tables_to` issue with crash in index creation for skipped tables.
- Fix `limit_tables_to` issue for M2M relations, now reports the table is not available.
- Fix SRID value for SQLAlchemy geometry columns (were always RD/NEW).
- Fix CKAN upload to skip datasets that are marked as "not available".
- Improved 3D coordinate system detection, and added more common SRID values.
- Improved naming of geometry column index to be consistent with other generated indices.
- Fix `BaseImporter.generate_db_objects()` to properly handle snake-cased table identifier values for table creation.
- Improve the underlying `tables_factory()` logic to support snake-cased table identifiers for all remaining parameters.
- Improve `limit_tables_to` to accept snake-cased table identifiers, which broke Airflow jobs. This addresses an inconsistency between parameters, where `BaseImporter.generate_db_objects()` allowed snake-cased identifiers for `table_id`, but needed exact-cased values for `limit_tables_to`.
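Accepting both camelCase and snake-cased identifiers usually comes down to normalizing both sides before comparing. The sketch below shows that idea. `simple_snake_case()` is a simplified stand-in rather than the schematools conversion, and `select_tables()` is hypothetical.

```python
from __future__ import annotations

import re


def simple_snake_case(name: str) -> str:
    """Simplified stand-in: insert '_' before each capital that follows a
    lowercase letter or digit, then lowercase everything."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def select_tables(table_ids: list[str], limit_tables_to: list[str] | None = None) -> list[str]:
    """Keep only the tables named in limit_tables_to, in either casing style."""
    if limit_tables_to is None:
        return list(table_ids)
    wanted = {simple_snake_case(t) for t in limit_tables_to}
    return [t for t in table_ids if simple_snake_case(t) in wanted]
```

With this normalization, `["ggw_gebieden"]` and `["ggwGebieden"]` select the same table.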
A big change in schema loading.
This mostly affects unit tests in other projects, or files that do custom schema loading.
Unit test code should preferably use a `schema_loader` instance per test run, as all datasets are only cached within the same loader instance now.
- Added `schematools.loaders.get_schema_loader()` that provides a single object instance for loading.
- Added `DatasetSchema.table_versions` mapping to access other table versions by name.
- Added `Record.source` attribute to `BaseImporter.load_file()` and `parse_records()` return values. This allows callers to inspect the source record, e.g. for cursor handling.
- Removed `TableVersions` injection in dataset schema data. Tables are now loaded on demand.
- Removed internal global dataset cache, datasets are only cached per loader.
- Removed unused functions in `schematools.utils`.
- Deprecated loading functions in `schematools.utils`, use `schematools.loaders` instead.
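The effect of per-loader caching on tests can be sketched with a toy loader. `SketchLoader` is hypothetical and only mimics the caching behaviour described above; it is not the schematools loader API.

```python
class SketchLoader:
    """Hypothetical loader that caches datasets per instance, not globally."""

    def __init__(self, source: dict):
        self._source = source
        self._cache: dict = {}
        self.load_count = 0

    def get_dataset(self, dataset_id: str) -> dict:
        if dataset_id not in self._cache:
            self.load_count += 1
            # Copy so that each loader owns its own dataset object.
            self._cache[dataset_id] = dict(self._source[dataset_id])
        return self._cache[dataset_id]


source = {"gebieden": {"id": "gebieden"}}
loader_a = SketchLoader(source)
loader_b = SketchLoader(source)
first = loader_a.get_dataset("gebieden")
second = loader_a.get_dataset("gebieden")  # served from loader_a's cache
other = loader_b.get_dataset("gebieden")   # loaded separately by loader_b
```

Because nothing is shared between instances, a test that creates its own loader cannot see datasets mutated by another test.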
- Using `BigAutoField` for all identifier fields now by default.
- Fixed Django system check warnings for `AutoField`/`BigAutoField` migration changes.
- Fixed CKAN metadata upload to https://data.overheid.nl/ for datasets without a description or title.
- Added validation check to prevent field names from being prefixed with their table or dataset name.
- Fixed Django `db_column` for subfields that use a shortname (regression by 5.0).
- Fixed dependency pinning of shapely to 1.8.0
A major new release that cleans up various internal API's.
- Added many improvements to creating mock data.
- Changed CLI arguments for mocking to be more intuitive.
- Changed schema loaders to return relative paths instead of dataset ID's.
- Changed test runner to skip tests that require the database.
- Completely rewrote the NDJSON importer for simplicity.
- Completely rewrote database index creation for simplicity.
- Fixed shortname leaking via `Dataset{Table,Field}Schema.name` attributes (also see PR #332 and #344).
- Fixed display/geometry field notation as exposed via `dataset_field` table.
- Fixed importing datasets from the filesystem that are namespaced inside a subfolder.
- Fixed using schemaloader in Django management commands.
- Fixed `saloger` fixture leaking to every other test, flooding the console.
- New API's:
  - `DatasetSchema`:
    - `python_name` (formats as ClassName)
    - `db_name` (formats in snake_case)
  - `DatasetTableSchema`:
    - `python_name` (formats as ClassName)
    - `short_name`
    - `through_fields` (for through tables)
    - `temporal.identifier_field`
    - `main_geometry_field`
    - `identifier_fields`
  - `DatasetFieldSchema`:
    - `python_name`
    - `is_identifier_part`
    - `is_subfield`
    - `srid`
    - `related_fields`
    - `nested_table`
    - `through_table`
- Changed API's:
  - `DatasetTableSchema`:
    - `display_field` returns actual field now.
    - `temporal.dimensions` returns actual fields now.
    - `db_name()` => `db_name` became a property for the typical common usage.
    - `db_name_variant()` provides the versioned-table support
  - `DatasetFieldSchema`:
    - `db_name()` => `db_name` - became a property for consistency
    - `is_temporal` => `is_temporal_range`
    - `get_subfields()` => `subfields` - no longer needs prefixes.
- Moved `to_snake_case()` / `toCamelCase()` imports to `schematools.naming`
- Deleted obsolete / unused functions:
  - `DatasetTableSchema.name` (use the `id`, `db_name`, or `python_name` instead).
  - `get_dimension_fieldnames()`
  - `get_through_tables_by_id()`
  - `get_fields_by_id()`
  - `shorten_name()`
  - `_get_fk_fields()`
- Removed `DatasetTableSchema.get_subfields(add_prefixes=True)` logic as the new naming attributes address that.
- Removed unused Docker stuff in `consumer/` folder.
- Removed `more-itertools` dependency.