For our RSEPedia showcase this week, let’s celebrate a library that has been empowering our json and yaml associated with Python projects for years to be parsed and verified as correct… jsonschema!
If you are already familiar with this software, we encourage you to contribute to the research software encyclopedia and annotate the respository:
otherwise, keep reading!
Python projects often have data files. These data files might be actual data for an analysis, or moreso something like a configuration file to define a service or standard. For either of these use cases, the formatting is important, meaning correct or required fields are provided, the type and structure of those fields are correct, and the overall structure of the data is what the program expects. How can we ensure or check that a data file looks okay before we hand it off to our program? This is a job for jsonschema! And did you know that it has over 133K reported users? What it basically does is provide a Python implementation of the general json-schema.org which defines itself as:
JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
And that’s really exactly what it does. It will allow you to define how you want your json to be structured, and then validate it. And by the way - yaml also loads as json, so you can use jsonschema to validate yaml too. Let’s walk through a slightly modified example that is provided in the jsonschema README.
We start with a schema. It’s going to say that we want an object (dictionary) to describe a dinosaur with properties “name”, “age”, and “color”:
schema = {
"type" : "object",
"properties" : {
"name" : {"type" : "string"},
"age" : {"type" : "number"},
"color" : {"type" : "string"},
},
}
Now we want to validate some new instance of a dinosaur to see if it’s valid. Let’s first create a valid dinosaur:
dinosaur = {"name": "Pusheenadino", "age": 53, "color": "green"}
Now let’s see if our dinosaur validates. If no exception is raised, the instance is valid.
from jsonschema import validate
validate(instance=dinosaur, schema=schema)
Now let’s add an error (oops, our data file that we loaded has defined a string instead of a number for age!) and see if it validates:
dinosaur = {"name": "Pusheenadino", "age": "53", "color": "green"}
validate(instance=dinosaur, schema=schema)
ValidationError: '53' is not of type 'number'
Failed validating 'type' in schema['properties']['age']:
{'type': 'number'}
On instance['age']:
'53'
And that’s basically it. The trick with this library is getting familiar with how to write more complex schemas. I like to use the examples provided by json-schema.org, or look at examples in the wild (here is a set that I helped design for buildtest).
@software{julian_berman_2021_5539942,
author = {Julian Berman and
Chase Sterling and
Romain Taprest and
Harald Nezbeda and
wilson chen and
Opemipo and
DavidKorczynski and
Glenn Maynard and
Ben Smithers and
Martin Zugnoni and
Colin Dunklau and
Hillel Arnold and
Daniel Nephin and
Bouke Haarsma and
John Anderson and
Lennart and
ApamNapat and
Alexander Bayandin and
Gabriel Le Breton and
joepvandijken and
Nicolás Aimetti and
apiraino and
johnthagen and
Michael Droettboom and
Omar Ryhan and
Vlad Stefan Munteanu and
Adam Dobrawy and
Jacob D. Moorman and
Zac Hatfield-Dodds},
title = {Julian/jsonschema: v4.0.0},
month = sep,
year = 2021,
publisher = {Zenodo},
version = {v4.0.0},
doi = {10.5281/zenodo.5539942},
url = {https://doi.org/10.5281/zenodo.5539942}
}
You can also support the author on Patreon or Tidelift.
You might want to check out:
or read more about annotation here. You can clone the software repository to do bulk annotation, or annotation any repository in the software database, We want annotation to be fun, straight-forward, and easy, so we will be showcasing one repository to annotate per week. If you’d like to request annotation of a particular repository (or addition to the software database) please don’t hesitate to open an issue or even a pull request.
You might find these other resources useful:
For any resource, you are encouraged to give feedback and contribute!