Parsl

Sep 14, 2020. | By: @vsoch

Have you ever had a workflow, or even an analysis script, and wanted to speed it up? Running something in parallel is probably an option that you considered. For this week’s software showcase we highlight Parsl, a parallel programming library for Python. If you know how to write a function in Python, there is a good chance you can extend it to be run with parsl, in parallel, either sequentially, with multicore (CPU, GPU, or accelerators) or multi-node MPI.


/rseng/assets/img/posts/showcase/parsl.png


If you already know and love Parsl, we encourage you to contribute to the research software encyclopedia and annotate the respository:

otherwise, keep reading!

What is Parsl?

Parsl works very simply to execute parallel tasks via Python. Let’s say that we have a function to run in parallel - call this a task, and when we run it in parallel, we will have many tasks. You first define a simple Python-based configuration that outlines where and how to execute your tasks. This might include settings for a cloud provider, a local cluster (SLURM, Torque/PBS, HTCondor, or Cobalt), or even a container orchestration framework (Kubernetes anyone?). Then Parsl will do it’s magic to scale your script. It’s even been used in the ballpark of running hundreds of thousands of cores across many thousands of nodes on high performance computing. Specifically, the following use cases are strong for this software (these come directly from the Parsl README:

  • Concurrent execution of a set of tasks in a bag-of-tasks program
  • Procedural workflows in which tasks are executed following control logic
  • Parallel dataflow in which tasks are executed when their data dependencies are met
  • Heterogeneous many-task applications in which many different computing resources are used together to execute different types of computational tasks
  • Dynamic workflows in which the workflow is determined during execution
  • Interactive parallel programming through notebooks or another interactive mechanism

And guess what? The second community meeting, ParslFest 2020 is being held in October 2020! So if you’ve never given it a look, now is the time! There are also a set of resources and tutorials for you to learn from.

How do I cite it?

Parsl can be cited via its ACM digital object identifier:

@inproceedings{10.1145/3307681.3325400,
author = {Babuji, Yadu and Woodard, Anna and Li, Zhuozhao and Katz, Daniel S. and Clifford, Ben and Kumar, Rohan and Lacinski, Lukasz and Chard, Ryan and Wozniak, Justin M. and Foster, Ian and Wilde, Michael and Chard, Kyle},
title = {Parsl: Pervasive Parallel Programming in Python},
year = {2019},
isbn = {9781450366700},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3307681.3325400},
doi = {10.1145/3307681.3325400},
abstract = {High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore's law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs allow Parsl to construct a dynamic dependency graph of components that it can then execute efficiently on one or many processors. Parsl is designed for scalability, with an extensible set of executors tailored to different use cases, such as low-latency, high-throughput, or extreme-scale execution. We show, via experiments on the Blue Waters supercomputer, that Parsl executors can allow Python scripts to execute components with as little as 5 ms of overhead, scale to more than 250000 workers across more than 8000 nodes, and process upward of 1200 tasks per second. Other Parsl features simplify the construction and execution of composite programs by supporting elastic provisioning and scaling of infrastructure, fault-tolerant execution, and integrated wide-area data management. We show that these capabilities satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science.},
booktitle = {Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing},
pages = {25–36},
numpages = {12},
keywords = {parallel programming, parsl, python},
location = {Phoenix, AZ, USA},
series = {HPDC '19}
}

How do I get started?

The following resources are good for getting started:

You can also post questions on the GitHub issues board.

How do I contribute to the software survey?

or read more about annotation here. You can clone the software repository to do bulk annotation, or annotation any repository in the software database, We want annotation to be fun, straight-forward, and easy, so we will be showcasing one repository to annotate per week. If you’d like to request annotation of a particular repository (or addition to the software database) please don’t hesitate to open an issue or even a pull request.

Where can I learn more?

You might find these other resources useful:

For any resource, you are encouraged to give feedback and contribute!

Categories

News 2

Tutorials 2

Software 33

Recent Posts