Feb 21, 2021. | By: @vsoch

For our RSEPedia showcase this week, let’s celebrate a library that has been empowering our json and yaml associated with Python projects for years to be parsed and verified as correct… jsonschema!


If you are already familiar with this software, we encourage you to contribute to the research software encyclopedia and annotate the respository:

otherwise, keep reading!

What is jsonschema?

Python projects often have data files. These data files might be actual data for an analysis, or moreso something like a configuration file to define a service or standard. For either of these use cases, the formatting is important, meaning correct or required fields are provided, the type and structure of those fields are correct, and the overall structure of the data is what the program expects. How can we ensure or check that a data file looks okay before we hand it off to our program? This is a job for jsonschema! And did you know that it has over 133K reported users? What it basically does is provide a Python implementation of the general json-schema.org which defines itself as:

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.

And that’s really exactly what it does. It will allow you to define how you want your json to be structured, and then validate it. And by the way - yaml also loads as json, so you can use jsonschema to validate yaml too. Let’s walk through a slightly modified example that is provided in the jsonschema README.

A Quick Example

We start with a schema. It’s going to say that we want an object (dictionary) to describe a dinosaur with properties “name”, “age”, and “color”:

schema = {
    "type" : "object",
    "properties" : {
        "name" : {"type" : "string"},
        "age" : {"type" : "number"},
        "color" : {"type" : "string"},

Now we want to validate some new instance of a dinosaur to see if it’s valid. Let’s first create a valid dinosaur:

dinosaur = {"name": "Pusheenadino", "age": 53, "color": "green"}

Now let’s see if our dinosaur validates. If no exception is raised, the instance is valid.

from jsonschema import validate
validate(instance=dinosaur, schema=schema)

Now let’s add an error (oops, our data file that we loaded has defined a string instead of a number for age!) and see if it validates:

dinosaur = {"name": "Pusheenadino", "age": "53", "color": "green"}
validate(instance=dinosaur, schema=schema)
ValidationError: '53' is not of type 'number'

Failed validating 'type' in schema['properties']['age']:
    {'type': 'number'}

On instance['age']:

And that’s basically it. The trick with this library is getting familiar with how to write more complex schemas. I like to use the examples provided by json-schema.org, or look at examples in the wild (here is a set that I helped design for buildtest).

How do I cite it?

The author doesn’t appear to have a citation, but you can support him on Patreon or Tidelift.

How do I get started?

You might want to check out:

How do I contribute to the software survey?

or read more about annotation here. You can clone the software repository to do bulk annotation, or annotation any repository in the software database, We want annotation to be fun, straight-forward, and easy, so we will be showcasing one repository to annotate per week. If you’d like to request annotation of a particular repository (or addition to the software database) please don’t hesitate to open an issue or even a pull request.

Where can I learn more?

You might find these other resources useful:

For any resource, you are encouraged to give feedback and contribute!


News 2

Tutorials 2

Software 33

Recent Posts