gtfs-validator is composed of a number of modules, as shown in the following dependency diagram:
```mermaid
graph BT;
  model
  core-->model;
  processor-->core;
  main-->model;
  main-->core;
  main-->processor;
  cli-->main;
  app:gui-->main;
  app:pkg-->app:gui;
```
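Read as a build graph, the diagram constrains the order in which modules can be built: each module comes after everything it depends on. The sketch below derives one such order with Kahn's topological sort. It is illustrative only: `ModuleBuildOrder` is a hypothetical helper, and its edge list is transcribed from the diagram rather than read from the project's build files.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: orders modules so that each one follows all of its dependencies. */
final class ModuleBuildOrder {

  /** Kahn's algorithm over a "module -> modules it depends on" map. */
  static List<String> buildOrder(Map<String, List<String>> dependsOn) {
    Map<String, Integer> unresolved = new LinkedHashMap<>(); // dependencies not yet placed
    Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
    for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
      unresolved.put(entry.getKey(), entry.getValue().size());
      for (String dependency : entry.getValue()) {
        dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(entry.getKey());
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> entry : unresolved.entrySet()) {
      if (entry.getValue() == 0) {
        ready.add(entry.getKey());
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String module = ready.poll();
      order.add(module);
      // Placing this module resolves one dependency of every module that uses it.
      for (String dependent : dependents.getOrDefault(module, List.of())) {
        if (unresolved.merge(dependent, -1, Integer::sum) == 0) {
          ready.add(dependent);
        }
      }
    }
    if (order.size() != unresolved.size()) {
      throw new IllegalStateException("Dependency cycle among modules");
    }
    return order;
  }

  /** Edges transcribed from the dependency diagram. */
  static Map<String, List<String>> gtfsValidatorModules() {
    Map<String, List<String>> deps = new LinkedHashMap<>();
    deps.put("model", List.of());
    deps.put("core", List.of("model"));
    deps.put("processor", List.of("core"));
    deps.put("main", List.of("model", "core", "processor"));
    deps.put("cli", List.of("main"));
    deps.put("app:gui", List.of("main"));
    deps.put("app:pkg", List.of("app:gui"));
    return deps;
  }
}
```

With these edges the sketch yields `model`, `core`, `processor`, `main`, `cli`, `app:gui`, `app:pkg`; a cyclic edge list raises an `IllegalStateException` instead.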
The architecture leverages AutoValue and annotations to auto-generate the following classes used for loading and validation:
- all classes used to internally represent GTFS data (such as `GtfsStopTime.java`)
- `*Schema.java` (such as `GtfsAgencySchema.java`)
- `*Enum.java` (such as `GtfsFrequencyExactTimeEnum.java`)
- `*Container.java` (such as `GtfsAgencyTableContainer.java`)
- `*Loader.java` (such as `GtfsAgencyTableLoader.java`)
- `*ForeignKeyValidator.java` (such as `GtfsAttributionAgencyIdForeignKeyValidator.java`)
`main` depends on `processor`, `core`, and `model`.
If you're looking to add new GTFS fields or rules, this is the module to look at.
Contains:
- The command-line (CLI) app: the main application that uses the `processor` and `core` modules to read and validate a GTFS feed.
- GTFS table schemas: define how GTFS files (e.g., `trips.txt`) and the fields they contain (e.g., `trip_id`) are represented in the validator. You can add new GTFS files and fields here.
- Business logic validation rules: code that validates GTFS field values. You can add new validation rules here.
- Error notices: containers for information about errors discovered during validation. You can add new notices here when implementing new validation rules.
`processor` depends on `core`.
Contains:
- A file analyser that analyses annotations on the Java interfaces defining GTFS schemas and translates them into descriptors
- Descriptors of annotation fields (`ForeignKey`, `GtfsEnum`, `GtfsField`, `GtfsFile`)
- A processor that auto-generates data classes, loaders, and validators based on the annotations on GTFS schema interfaces
- GTFS entity classes that generate class names for a given GTFS table
- Code generators that generate code from the annotations found by the file analyser (e.g., `EnumGenerator`)
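To make the file analyser's role concrete, here is a minimal sketch of turning schema annotations into descriptors. It assumes hypothetical stand-in annotations (`Table`, `Required`, `PrimaryKey`) with runtime retention and reads them through reflection. The real `processor` works at compile time as a Java annotation processor and uses its own descriptor classes, so every name below is illustrative.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the real annotations, kept at runtime so reflection can read them.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Table {
  String value();
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Required {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface PrimaryKey {}

/** Hypothetical descriptors produced from an annotated schema interface. */
record FieldDescriptor(String column, String javaType, boolean required, boolean primaryKey) {}

record TableDescriptor(String filename, List<FieldDescriptor> fields) {}

final class SchemaAnalyser {
  static TableDescriptor analyse(Class<?> schema) {
    Table table = schema.getAnnotation(Table.class);
    if (!schema.isInterface() || table == null) {
      throw new IllegalArgumentException(schema.getName() + " is not an annotated schema interface");
    }
    Method[] methods = schema.getDeclaredMethods();
    // Reflection returns methods in no particular order; sort for deterministic output.
    Arrays.sort(methods, Comparator.comparing(Method::getName));
    List<FieldDescriptor> fields = new ArrayList<>();
    for (Method method : methods) {
      fields.add(new FieldDescriptor(
          toSnakeCase(method.getName()),
          method.getReturnType().getSimpleName(),
          method.isAnnotationPresent(Required.class),
          method.isAnnotationPresent(PrimaryKey.class)));
    }
    return new TableDescriptor(table.value(), List.copyOf(fields));
  }

  /** Maps accessor names to GTFS column names, e.g. serviceId -> service_id. */
  static String toSnakeCase(String camelCase) {
    return camelCase.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
  }
}

/** Example schema written with the stand-in annotations. */
@Table("calendar.txt")
interface MiniCalendarSchema {
  @PrimaryKey
  @Required
  String serviceId();

  @Required
  LocalDate startDate();
}
```

Analysing `MiniCalendarSchema` produces a `calendar.txt` descriptor with a required primary-key column `service_id` and a required column `start_date`.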
`core` depends on `model`.
Contains:
- Code to read zipped and unzipped file input
- CSV file and row parsers
- Notices generated when checking data type validation rules, such as `EmptyFileNotice`
- A notice container (`NoticeContainer`)
- GTFS data type definitions such as `GtfsTime`, `GtfsDate`, or `GtfsColor`
- `GtfsFeedLoader`, to load a whole GTFS feed with all its CSV files
- GTFS feed's name
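As an example of what a GTFS data type has to handle: the GTFS specification writes times as `HH:MM:SS` (`H:MM:SS` is also accepted), and hours may exceed 23 for trips that run past midnight of the service day. The hypothetical `MiniGtfsTime` below sketches such a type as seconds since midnight. It is not the `GtfsTime` API, and it throws on malformed input, where a validator would more likely record a notice.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical simplified GTFS time value; not the validator's GtfsTime API. */
final class MiniGtfsTime {
  // HH:MM:SS or H:MM:SS; hours may exceed 23 for service past midnight.
  private static final Pattern FORMAT = Pattern.compile("(\\d{1,2}):([0-5]\\d):([0-5]\\d)");

  private final int secondsSinceMidnight;

  private MiniGtfsTime(int secondsSinceMidnight) {
    this.secondsSinceMidnight = secondsSinceMidnight;
  }

  /** Parses a GTFS time string; throws IllegalArgumentException if it is malformed. */
  static MiniGtfsTime parse(String text) {
    Matcher m = FORMAT.matcher(text.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Invalid GTFS time: " + text);
    }
    int hours = Integer.parseInt(m.group(1));
    int minutes = Integer.parseInt(m.group(2));
    int seconds = Integer.parseInt(m.group(3));
    return new MiniGtfsTime(hours * 3600 + minutes * 60 + seconds);
  }

  int secondsSinceMidnight() {
    return secondsSinceMidnight;
  }

  /** Formats back to zero-padded HH:MM:SS. */
  String toHHMMSS() {
    return String.format(Locale.ROOT, "%02d:%02d:%02d",
        secondsSinceMidnight / 3600, (secondsSinceMidnight % 3600) / 60, secondsSinceMidnight % 60);
  }
}
```

For instance, `25:35:00` parses to 92100 seconds, i.e. 1:35 a.m. on the following calendar day but still within the same service day.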
`model` does not depend on any other module.
Contains:
- Root interfaces and annotations for modeling a GTFS schema table
Business logic should generally not be added to this module.
`cli` depends on `main`.
A command-line application for running the validator.
`app:gui` depends on `main`.
A GUI-based application for running the validator as a desktop application.
`app:pkg` depends on `app:gui`.
A minimal wrapper around `app:gui`, designed to facilitate packaging the GUI application as a Java module and producing standalone executables and installers for various platforms.
1️⃣ Inputs
- A local GTFS archive (zip file) or a fully qualified URL from which to download a GTFS archive
- Command-line arguments
2️⃣ Validator loading
- Locate all validators annotated with `@GtfsValidator` and load them
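Validator discovery can be pictured as filtering candidate classes by annotation and instantiating the matches. In this sketch, `GtfsValidator` is a hypothetical runtime-retained stand-in for the real annotation, `Validator` is a hypothetical interface, and the candidate classes are passed in explicitly; how the real validator finds candidates on the classpath is not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the real @GtfsValidator annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface GtfsValidator {}

/** Hypothetical minimal validator contract. */
interface Validator {
  String name();
}

@GtfsValidator
final class CalendarRangeValidator implements Validator {
  @Override
  public String name() {
    return "CalendarRangeValidator";
  }
}

/** Implements Validator but lacks the annotation, so discovery skips it. */
final class UnannotatedValidator implements Validator {
  @Override
  public String name() {
    return "UnannotatedValidator";
  }
}

final class ValidatorLoader {
  /** Instantiates every candidate that is annotated with @GtfsValidator and implements Validator. */
  static List<Validator> load(List<Class<?>> candidates) {
    List<Validator> loaded = new ArrayList<>();
    for (Class<?> candidate : candidates) {
      if (candidate.isAnnotationPresent(GtfsValidator.class)
          && Validator.class.isAssignableFrom(candidate)) {
        try {
          loaded.add((Validator) candidate.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Cannot instantiate " + candidate.getName(), e);
        }
      }
    }
    return loaded;
  }
}
```

Requiring both the annotation and the interface keeps helper classes that merely implement the interface out of the run.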
3️⃣ Feed loading
- Read GTFS files
- Create `GtfsTableContainer` from data
- Invoke and execute all `SingleEntityValidators` to validate data types, etc.
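The feed-loading step can be sketched as parsing each file into a table container, then running every single-entity check against every row. Everything here is hypothetical and simplified: `MiniTableContainer` stands in for `GtfsTableContainer`, the CSV parsing ignores quoting, and notices are plain strings rather than notice objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-entity check: inspects one row and may append notices. */
interface EntityCheck {
  void validate(Map<String, String> row, List<String> notices);
}

/** Hypothetical, simplified stand-in for a table container: rows keyed by column name. */
final class MiniTableContainer {
  final String filename;
  final List<Map<String, String>> rows = new ArrayList<>();

  MiniTableContainer(String filename) {
    this.filename = filename;
  }
}

final class MiniFeedLoader {
  /** Parses naive CSV (no quoted fields or embedded commas) into a container. */
  static MiniTableContainer load(String filename, String csv) {
    MiniTableContainer table = new MiniTableContainer(filename);
    String[] lines = csv.strip().split("\\R");
    String[] header = lines[0].split(",", -1);
    for (int i = 1; i < lines.length; i++) {
      String[] values = lines[i].split(",", -1);
      Map<String, String> row = new LinkedHashMap<>();
      for (int c = 0; c < header.length; c++) {
        row.put(header[c], c < values.length ? values[c] : ""); // missing trailing values become ""
      }
      table.rows.add(row);
    }
    return table;
  }

  /** Runs every single-entity check on every row and returns the collected notices. */
  static List<String> validateEntities(MiniTableContainer table, EntityCheck... checks) {
    List<String> notices = new ArrayList<>();
    for (Map<String, String> row : table.rows) {
      for (EntityCheck check : checks) {
        check.validate(row, notices);
      }
    }
    return notices;
  }

  /** Example check: stop_lat must be a number within [-90, 90]. */
  static void latitudeInRange(Map<String, String> row, List<String> notices) {
    String raw = row.get("stop_lat");
    if (raw == null || raw.isEmpty()) {
      notices.add("stop_lat is missing");
      return;
    }
    try {
      double lat = Double.parseDouble(raw);
      if (lat < -90 || lat > 90) {
        notices.add("stop_lat out of range: " + raw);
      }
    } catch (NumberFormatException e) {
      notices.add("stop_lat is not a number: " + raw);
    }
  }
}
```

Because each check only sees one row, checks of this kind can run while the file is loaded, before any cross-file rule is evaluated.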
4️⃣ Validators execution
- Invoke and execute all `FileValidators` in parallel to validate GTFS semantic rules
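Running whole-file validators in parallel amounts to submitting each one to a thread pool and merging the results. A minimal sketch, assuming a hypothetical `WholeFileCheck` interface and string notices. A check that throws is recorded as a system error rather than aborting the run, mirroring the split between validation results and software errors in the notice export step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical whole-file validator: inspects complete tables and returns its notices. */
interface WholeFileCheck {
  List<String> validate();
}

final class ParallelRunner {
  /** Runs all checks on a fixed-size pool and collects notices in submission order. */
  static List<String> runAll(int threads, WholeFileCheck... checks) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (WholeFileCheck check : checks) {
        Callable<List<String>> task = check::validate;
        futures.add(pool.submit(task));
      }
      List<String> notices = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        try {
          notices.addAll(future.get()); // blocks until that check finishes
        } catch (ExecutionException e) {
          // A crashing check is recorded as a system error instead of aborting the run.
          notices.add("system_error: " + e.getCause());
        }
      }
      return notices;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while validating", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Collecting futures in submission order keeps the merged output deterministic even though the checks themselves finish in any order.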
5️⃣ Notice export
- Create the path to export notices to, as specified by the command-line option `--output` (or `-o`).
- Export notices from `NoticeContainer` to two JSON files in the specified directory: `report.json` for validator results and `system_errors.json` for any software errors that occurred during validation. Notices are sorted alphabetically in the `.json` files.
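The export step can be sketched as splitting notices into validation results and system errors, sorting each group alphabetically by code, and writing the two files into the output directory. `SimpleNotice` and `NoticeExporter` are hypothetical, and the JSON shape is invented for illustration; it does not reproduce the actual `report.json` schema.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical notice: a code plus a flag separating software errors from validation results. */
record SimpleNotice(String code, boolean systemError) {}

final class NoticeExporter {
  /** Writes report.json and system_errors.json into outputDir, creating the directory if needed. */
  static void export(List<SimpleNotice> notices, Path outputDir) {
    List<SimpleNotice> report = new ArrayList<>();
    List<SimpleNotice> systemErrors = new ArrayList<>();
    for (SimpleNotice notice : notices) {
      if (notice.systemError()) {
        systemErrors.add(notice);
      } else {
        report.add(notice);
      }
    }
    try {
      Files.createDirectories(outputDir);
      Files.writeString(outputDir.resolve("report.json"), toJson(report));
      Files.writeString(outputDir.resolve("system_errors.json"), toJson(systemErrors));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Illustrative JSON only: notice codes sorted alphabetically, with quotes and backslashes escaped. */
  static String toJson(List<SimpleNotice> notices) {
    return notices.stream()
        .map(SimpleNotice::code)
        .sorted()
        .map(code -> "{\"code\":\"" + code.replace("\\", "\\\\").replace("\"", "\\\"") + "\"}")
        .collect(Collectors.joining(",", "{\"notices\":[", "]}"));
  }
}
```

Sorting before serialization makes two runs over the same feed produce byte-identical files, which keeps reports easy to diff.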
Let's say that you are an agency that, for some reason, uses `other_file.txt` as an additional table to represent GTFS information, and your goal is to implement a validation rule related to this new table.
To do so, you would have to:
- add the new table to the validator;
- implement the new validation rules.
This section details how existing tables are defined and explains how the annotations are used. You can then apply these explanations to add a new table or field. Let's take a look at `GtfsCalendarSchema`:
```java
package org.mobilitydata.gtfsvalidator.table;

import org.mobilitydata.gtfsvalidator.annotation.ConditionallyRequired;
import org.mobilitydata.gtfsvalidator.annotation.EndRange;
import org.mobilitydata.gtfsvalidator.annotation.FieldType;
import org.mobilitydata.gtfsvalidator.annotation.FieldTypeEnum;
import org.mobilitydata.gtfsvalidator.annotation.GtfsTable;
import org.mobilitydata.gtfsvalidator.annotation.PrimaryKey;
import org.mobilitydata.gtfsvalidator.annotation.Required;
import org.mobilitydata.gtfsvalidator.type.GtfsDate;

// Schema for calendar.txt: the processor generates data classes, loaders and validators from it.
@GtfsTable("calendar.txt")
@ConditionallyRequired
public interface GtfsCalendarSchema extends GtfsEntity {
  // service_id: an ID, the table's primary key, and required.
  @FieldType(FieldTypeEnum.ID)
  @PrimaryKey
  @Required
  String serviceId();

  @Required
  GtfsCalendarService monday();

  @Required
  GtfsCalendarService tuesday();

  @Required
  GtfsCalendarService wednesday();

  @Required
  GtfsCalendarService thursday();

  @Required
  GtfsCalendarService friday();

  @Required
  GtfsCalendarService saturday();

  @Required
  GtfsCalendarService sunday();

  // start_date must be before or equal to end_date.
  @Required
  @EndRange(field = "end_date", allowEqual = true)
  GtfsDate startDate();

  @Required
  GtfsDate endDate();
}
```
By order of appearance in the interface definition:
- `@GtfsTable`: annotates the interface that defines the schema for `calendar.txt`. The `processor` will generate data classes, loaders, and validators based on the annotations on this GTFS schema interface.
- `@ConditionallyRequired`: hints that this file is conditionally required.
- `@FieldType`: specifies that `calendar.service_id` is defined as an ID by the GTFS specification.
- `@PrimaryKey`: specifies that `calendar.service_id` is the primary key of this table.
- `@Required`: specifies that a value for `calendar.service_id` is required. If it is missing, a notice will be issued at the parsing stage.
- `@EndRange`: specifies that `calendar.end_date` is the end point of the date range that starts at `calendar.start_date`. A validator will be generated to check that `calendar.start_date` is before or equal to `calendar.end_date`.
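The validator implied by `@EndRange(field = "end_date", allowEqual = true)` boils down to a per-row comparison. This is a hand-written sketch, not the generated code: `CalendarRow` and `CalendarDateRangeCheck` are hypothetical, `java.time.LocalDate` stands in for `GtfsDate`, and notices are plain strings. GTFS dates use the `YYYYMMDD` format, which `DateTimeFormatter.BASIC_ISO_DATE` parses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical calendar.txt row reduced to the fields the range check needs. */
record CalendarRow(String serviceId, LocalDate startDate, LocalDate endDate) {
  /** GTFS dates are YYYYMMDD, the format DateTimeFormatter.BASIC_ISO_DATE parses. */
  static CalendarRow of(String serviceId, String startDate, String endDate) {
    return new CalendarRow(
        serviceId,
        LocalDate.parse(startDate, DateTimeFormatter.BASIC_ISO_DATE),
        LocalDate.parse(endDate, DateTimeFormatter.BASIC_ISO_DATE));
  }
}

/** Hand-written sketch of the check implied by @EndRange(field = "end_date", allowEqual = ...). */
final class CalendarDateRangeCheck {
  static List<String> validate(List<CalendarRow> rows, boolean allowEqual) {
    List<String> notices = new ArrayList<>();
    for (CalendarRow row : rows) {
      int cmp = row.startDate().compareTo(row.endDate());
      // allowEqual = true accepts a one-day range where start_date equals end_date.
      boolean valid = allowEqual ? cmp <= 0 : cmp < 0;
      if (!valid) {
        notices.add("invalid date range for service_id " + row.serviceId()
            + ": start_date=" + row.startDate() + ", end_date=" + row.endDate());
      }
    }
    return notices;
  }
}
```

With `allowEqual = true`, as in `GtfsCalendarSchema`, a service that runs on a single day (equal start and end dates) passes; only a start date after the end date is reported.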
Annotation | Definition |
---|---|
CachedField | Enables caching of values for a given field to optimize memory usage. |
ConditionallyRequired | A hint that a field or a file is conditionally required. |
DefaultValue | Specifies a default value for a particular GTFS field. |
EndRange | Specifies a field for the end point of a date or time range. |
FieldType | Specifies the type of a GTFS field, e.g., COLOR or LATITUDE. |
FirstKey | Specifies the first part of a composite key in tables like stop_times.txt (trip_id). |
ForeignKey | Specifies a reference to a foreign key. |
Generated | Marker for all classes generated by the annotation processor. |
GtfsEnumValue | Specifies a value for a GTFS enum. |
GtfsEnumValues | Container annotation that makes the GtfsEnumValue annotation repeatable. |
GtfsTable | Annotates an interface that defines the schema for a single GTFS table, such as stops.txt. |
GtfsValidator | Annotates both custom and automatically generated validators to make them discoverable on the fly. |
Index | Asks the annotation processor to create an index for quick search on a given field. The field does not need to have unique values. |
NonNegative | Generates a validation that an integer or a double (float) field is not negative. |
NonZero | Generates a validation that an integer or a double (float) field is not zero. |
Positive | Generates a validation that an integer or a double (float) field is positive. |
PrimaryKey | Specifies the primary key in a GTFS table. This also adds a validation that all values are unique. |
Required | Adds a validation that a required field or file is present. |
SequenceKey | Specifies the second part of a composite key in tables like stop_times.txt (stop_sequence). This annotation must be used in combination with @FirstKey. |
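`@FirstKey` and `@SequenceKey` together describe the composite key of `stop_times.txt`: rows are grouped by `trip_id` and ordered by `stop_sequence`. A minimal sketch, assuming a two-level sorted map as the index and treating a repeated (`trip_id`, `stop_sequence`) pair as a duplicate. `StopTimeKey` and `CompositeKeyIndex` are hypothetical, and the generated container may index rows differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stop_times.txt row reduced to its composite key. */
record StopTimeKey(String tripId, int stopSequence) {}

/** Two-level index: trip_id (first key) -> stop_sequence (sequence key) -> row. */
final class CompositeKeyIndex {
  private final Map<String, TreeMap<Integer, StopTimeKey>> byTrip = new TreeMap<>();
  private final List<String> duplicateNotices = new ArrayList<>();

  CompositeKeyIndex(List<StopTimeKey> rows) {
    for (StopTimeKey row : rows) {
      TreeMap<Integer, StopTimeKey> trip = byTrip.computeIfAbsent(row.tripId(), k -> new TreeMap<>());
      // putIfAbsent keeps the first row and returns it when the same key appears again.
      if (trip.putIfAbsent(row.stopSequence(), row) != null) {
        duplicateNotices.add("duplicate key: trip_id=" + row.tripId()
            + ", stop_sequence=" + row.stopSequence());
      }
    }
  }

  /** Stop sequences of one trip in ascending order; empty for an unknown trip. */
  List<Integer> sequencesOf(String tripId) {
    TreeMap<Integer, StopTimeKey> trip = byTrip.get(tripId);
    if (trip == null) {
      return List.of();
    }
    return new ArrayList<>(trip.keySet());
  }

  List<String> duplicateNotices() {
    return List.copyOf(duplicateNotices);
  }
}
```

The sorted inner map is what makes the sequence key useful: iterating one trip's entries visits its stop times in `stop_sequence` order regardless of row order in the file.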