Validation
How to Validate a SEIS-PROV Document
With a format as complex as SEIS-PROV, validation, i.e. ensuring that any given
file is a valid SEIS-PROV file, is crucial. As SEIS-PROV is W3C PROV with some
additional constraints, this is a two-step procedure:
- Make sure the file is a valid W3C PROV document.
- Assert that the additional constraints are followed.
Not all of the constraints can be enforced with schema-based validation, so a
custom validator is necessary. The constraints for the various record types are
defined in a JSON file which acts as a schema for SEIS-PROV records. It can be
found here:
A reference implementation of such a validator is also part of the SEIS-PROV
definition. In case you want to develop your own validator, please read the
definition. The reference validator has largely been developed around two sets
of test files: a large number of valid files and a number of invalid files with
comments denoting why they are not valid. You can use these alongside the
schema as a starting point.
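If you develop your own validator, the two sets of test files can drive a simple regression check. The following is a minimal sketch under several assumptions: the directory names valid_files and invalid_files follow the paths used in the library examples in this document, the *.xml pattern is a guess at the file layout, and validate_fn is a hypothetical stand-in for your own validator that returns a true value for a valid file.

```python
import glob
import os


def check_against_test_files(validate_fn, root="."):
    """Run validate_fn over both test sets and collect disagreements.

    Returns a list of (path, expected_valid) tuples for every file where
    the validator's verdict differs from what the directory name implies.
    """
    mismatches = []
    # Assumed layout: <root>/valid_files/*.xml and <root>/invalid_files/*.xml
    for dirname, expected in (("valid_files", True), ("invalid_files", False)):
        pattern = os.path.join(root, dirname, "*.xml")
        for path in sorted(glob.glob(pattern)):
            if bool(validate_fn(path)) != expected:
                mismatches.append((path, expected))
    return mismatches
```

An empty return value means the validator agrees with both test sets; any entry points at a file worth inspecting together with its comment.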
Official Validator
Installation
The validator is written in Python and currently supports Python 2.7, 3.3, and 3.4. Additionally, it requires the following Python modules:
jsonschema>=2.4.0
lxml
prov
pytest
six
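To see which of these modules are already available, a quick check with the standard library can help. This is only a sketch: the helper name missing_modules is hypothetical and not part of the validator, it requires Python 3.4 or newer because it uses importlib.util.find_spec, and it tests importability only, not version constraints such as jsonschema>=2.4.0.

```python
import importlib.util

# Import names of the dependencies listed above.
REQUIRED = ["jsonschema", "lxml", "prov", "pytest", "six"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```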
As it is not yet released, you will have to install from GitHub:
$ git clone https://github.com/SeismicData/SEIS-PROV.git
$ cd SEIS-PROV/validator
$ pip install -v -e .
You can test your installation with
$ python -m seis_prov_validate.test_validator
................................................................
................................................................
...........................................................
================== 542 passed in 0.91 seconds ==================
Command Line Usage
The module installs a single command, seis-prov-validate:
$ seis-prov-validate prov.xml
VALID SEIS-PROV FILE!
Any other output means your file is not valid. The error messages should give hints on how to fix it.
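For batch checks it can be convenient to call the command from a script. The sketch below relies only on what is shown above, namely that the tool prints VALID SEIS-PROV FILE! for a valid file. The helper name is_valid_seis_prov and its command parameter are hypothetical. Whether the tool also signals failure through its exit code is not stated here, so the sketch inspects the output only.

```python
import subprocess

SUCCESS_LINE = "VALID SEIS-PROV FILE!"


def is_valid_seis_prov(filename, command=("seis-prov-validate",)):
    """Run the validator command on a file and inspect its output.

    Returns a (valid, output) tuple. `command` is configurable so the
    helper can be pointed at a different executable.
    """
    proc = subprocess.Popen(
        list(command) + [filename],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    # Compare whole lines so that a longer message which merely contains
    # the success text is not mistaken for success.
    valid = any(line.strip() == SUCCESS_LINE for line in output.splitlines())
    return valid, output
```

The returned output holds whatever the tool printed, so error messages can be logged next to the file name.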
Library Usage
Usage as a library is also possible. The module exports a single relevant
function: seis_prov_validate.validate()
>>> from seis_prov_validate import validate
>>> result = validate("./valid_files/waveform_trace_min.xml")
>>> result.is_valid
True
>>> result.warnings
[]
>>> result.errors
[]
>>> result = validate("./invalid_files/waveform_with_extra_attribute.xml")
>>> result.is_valid
False
>>> result.warnings
[]
>>> result.errors
["Record 'seis_prov:sp001_wf_c17dd1f' has an additional attribute in "
"the SEIS-PROV namespace: 'something'. This is not allowed for this record type."]
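Building on the three attributes shown above (is_valid, warnings, and errors), a result can be turned into a short text report. This is a minimal sketch: format_report is a hypothetical helper, not part of seis_prov_validate, and it relies on nothing beyond those three attributes.

```python
def format_report(filename, result):
    """Build a plain-text summary from a validation result.

    `result` is any object with `is_valid`, `warnings`, and `errors`
    attributes, such as the one returned by validate() above.
    """
    status = "valid" if result.is_valid else "INVALID"
    lines = ["%s: %s" % (filename, status)]
    lines.extend("  warning: %s" % w for w in result.warnings)
    lines.extend("  error: %s" % e for e in result.errors)
    return "\n".join(lines)


# Usage, assuming the validator is installed:
#   from seis_prov_validate import validate
#   print(format_report("prov.xml", validate("prov.xml")))
```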
Actions the Validator Performs
The following is the sequence of actions the official validator performs. A valid
SEIS-PROV document must not fail any of these.
- Check whether the file is a JSON or an XML file.
- Parse the document with the Python prov package. It can currently read PROV-JSON and PROV-XML serialized documents.
- Write the document as PROV-XML and validate it against the PROV-XML XSD schema. This largely ensures that the document is valid according to the W3C PROV specification.
- Make sure it has a SEIS-PROV namespace. Otherwise it is a valid W3C PROV document but does not contain anything from SEIS-PROV.
- For the root document and each bundle in the document, find each provenance record and assert the following things.
  - If the record has an id or a prov:type in the SEIS-PROV namespace, it must be one of the four SEIS-PROV record types.
  - If not, skip this record. It is not part of SEIS-PROV but still valid W3C PROV as we already validated against the W3C PROV schema.
  - Make sure the record has exactly one prov:type attribute.
  - Make sure the given SEIS-PROV type is valid and exists.
  - Validate each SEIS-PROV id against the regular expression.
  - Make sure the prov:label is correct.
  - Validate the attributes against the definitions in the JSON schema.
- If the record has an id or a prov:type in the