WFCOMMONS DATASETS

Workflow Datasets

WfCommons provides curated datasets of production workflow specifications and executions, and a tool to explore these datasets.

  WFINSTANCES

A repository of workflow instances in JSON, and a browser application to filter, select, download, visualize, and simulate the execution of the workflow instances in the repository.

WFINSTANCES Repo      
WFINSTANCES Browser

  WFFORMAT

A JSON schema used to describe workflow instances

WFFORMAT Schema  
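
As a rough illustration of how a JSON workflow instance might be consumed, the sketch below parses a small, made-up instance, orders its tasks by dependency, and computes the length of its longest dependency chain. The field names used here (`name`, `tasks`, `parents`, `runtimeInSeconds`) are assumptions chosen for this example only. They are not taken from the WfFormat schema, which should be consulted for the actual structure.

```python
import json
from collections import deque

# Hypothetical, simplified workflow instance. The real WfFormat schema
# defines its own field names and nesting; these are illustrative only.
INSTANCE_JSON = """
{
  "name": "toy-workflow",
  "tasks": [
    {"name": "split",   "parents": [],                     "runtimeInSeconds": 2.0},
    {"name": "align_1", "parents": ["split"],              "runtimeInSeconds": 5.0},
    {"name": "align_2", "parents": ["split"],              "runtimeInSeconds": 4.0},
    {"name": "merge",   "parents": ["align_1", "align_2"], "runtimeInSeconds": 1.5}
  ]
}
"""


def topological_order(tasks):
    """Return task names in an order that respects parent dependencies (Kahn's algorithm)."""
    remaining = {t["name"]: len(t["parents"]) for t in tasks}
    children = {t["name"]: [] for t in tasks}
    for t in tasks:
        for parent in t["parents"]:
            children[parent].append(t["name"])
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in children[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("workflow contains a dependency cycle")
    return order


def critical_path_length(tasks):
    """Return the total runtime, in seconds, of the longest dependency chain."""
    by_name = {t["name"]: t for t in tasks}
    finish = {}
    for name in topological_order(tasks):
        task = by_name[name]
        start = max((finish[p] for p in task["parents"]), default=0.0)
        finish[name] = start + task["runtimeInSeconds"]
    return max(finish.values(), default=0.0)


instance = json.loads(INSTANCE_JSON)
order = topological_order(instance["tasks"])
makespan_lower_bound = critical_path_length(instance["tasks"])
```

The critical path length is a lower bound on the execution time of the instance regardless of how many machines are available, which makes it a convenient first sanity check when exploring a workflow.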

WFCOMMONS FRAMEWORK

Workflow Analysis and Development

The WfCommons Framework is a Python package that provides tools to analyze and use workflow instances.

DOCUMENTATION   
CODE ON GITHUB   

  WFCHEF

Automatically create synthetic workflow generators, called recipes, by analyzing real workflows to uncover task patterns and performance characteristics

  WFGEN

Use recipes to automatically generate realistic synthetic workflow instances of arbitrary scales
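
The idea of learning a recipe from real workflows and then scaling it can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the algorithm used by WfChef or WfGen and not the WfCommons API: it only preserves the proportion of each task type and ignores task dependencies and performance characteristics. The function names `learn_recipe` and `generate_task_counts`, and the representation of a workflow as a list of task-type names, are hypothetical.

```python
from collections import Counter


def learn_recipe(real_workflows):
    """Estimate the fraction of tasks of each type across the observed workflows."""
    counts = Counter()
    for workflow in real_workflows:
        counts.update(workflow)
    total = sum(counts.values())
    return {task_type: n / total for task_type, n in counts.items()}


def generate_task_counts(recipe, num_tasks):
    """Scale the learned fractions to num_tasks using largest-remainder rounding."""
    exact = {t: fraction * num_tasks for t, fraction in recipe.items()}
    counts = {t: int(x) for t, x in exact.items()}
    # Hand out the tasks lost to rounding down, largest fractional part first.
    leftover = num_tasks - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


# Two made-up "real" workflows, each given as a list of task-type names.
real_workflows = [
    ["split"] + ["align"] * 4 + ["merge"],
    ["split"] + ["align"] * 8 + ["merge"],
]

recipe = learn_recipe(real_workflows)
small = generate_task_counts(recipe, 10)
large = generate_task_counts(recipe, 1000)
```

Largest-remainder rounding is used so that the generated counts always sum exactly to the requested number of tasks, whatever scale is chosen.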

  WFBENCH

Produce workflow benchmarks based on synthetic workflow instances, with configurable performance characteristics, that can be executed using production workflow runtime systems

RESEARCH PUBLICATIONS

Research Papers, Journal Articles, and Technical Reports

Citing WfCommons

When citing WfCommons, please use the following paper, which provides a general overview of the framework.

T. Coleman, H. Casanova, L. Pottier, M. Kaushik, E. Deelman, and R. Ferreira da Silva, "WfCommons: A Framework for Enabling Scientific Workflow Research and Development," Future Generation Computer Systems, vol. 128, pp. 16-27, 2022. DOI: 10.1016/j.future.2021.09.043
@article{wfcommons,
    title   = { {WfCommons: A Framework for Enabling Scientific Workflow Research and Development} },
    author  = {Coleman, Tain\~a and Casanova, Henri and Pottier, Lo\"ic and Kaushik, Manav and Deelman, Ewa and Ferreira da Silva, Rafael},
    journal = {Future Generation Computer Systems},
    volume  = {128},
    number  = {},
    pages   = {16--27},
    doi     = {10.1016/j.future.2021.09.043},
    year    = {2022},
}

WfCommons: Data Collection and Runtime Experiments using Multiple Workflow Systems, H. Casanova, K. Berney, S. Chastel, R. Ferreira da Silva, 1st IEEE International Workshop on Workflows in Distributed Environments (WiDE 2023), 2023, doi: 10.1109/COMPSAC57700.2023.00290

WfBench: Automated Generation of Scientific Workflow Benchmarks, T. Coleman, H. Casanova, K. Maheshwari, L. Pottier, S. R. Wilkinson, J. Wozniak, F. Suter, M. Shankar, R. Ferreira da Silva, 2022 IEEE/ACM International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2022, doi: 10.1109/PMBS56514.2022.00014

WfCommons: A Framework for Enabling Scientific Workflow Research and Development, T. Coleman, H. Casanova, L. Pottier, M. Kaushik, E. Deelman, and R. Ferreira da Silva, Future Generation Computer Systems, vol. 128, 2022, doi: 10.1016/j.future.2021.09.043

WfChef: Automated Generation of Accurate Scientific Workflow Generators, T. Coleman, H. Casanova, and R. Ferreira da Silva, 17th IEEE Conference on eScience, 2021, doi: 10.1109/eScience51609.2021.00026

THEY USE WFCOMMONS

Research Outcomes Enabled by WfCommons

WfCommons has enabled the research reported in 43 articles. These include work by our own team as well as by other researchers in the workflows community.

Y. Semenov, O. Sukhoroslov, Bi-objective Workflow Scheduling in the Cloud: What is the Real State-of-the-Art?, Supercomputing, RuSCDays 2024, 2025

F. Lehmann, J. Bader, F. Tschirpke, N. de Mecquenem, A. Loser, S. Becker, K. E. Lewinska, L. Thamsen, U. Leser, WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows, 25th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

M. Wilhelm, T. Pionteck, Static task mapping for heterogeneous systems based on series-parallel decompositions, 34th Heterogeneity in Computing Workshop (HCW 2025), 2025

L. C. R. Alvarenga, Y. Frota, D. de Oliveira, R. Coutinho, Optimizing Resource Estimation for Scientific Workflows in HPC Environments: A Layered-Bucket Heuristic Approach, Concurrency and Computation: Practice and Experience, 2025

S. Kulagina, A. Benoit, H. Meyerhenke, Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures, 25th IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

M. Fan, L. Ye, X. Zuo, X. Zhao, A bidirectional workflow scheduling approach with feedback mechanism in clouds, Expert Systems with Applications, 2024

K. Karmakar, A. Tarafdar, R. K. Das, S. Khatua, Cost-efficient Workflow as a Service using Containers, Journal of Grid Computing, 2024

P. Barredo, J. Puente, Cooperative Multi-fitness Evolutionary Algorithm for Scientific Workflows Scheduling, International Work-Conference on the Interplay Between Natural and Artificial Computation, 2024

J. McDonald, J. Dobbs, Y.C. Wong, R. Ferreira da Silva, H. Casanova, An exploration of online-simulation-driven portfolio scheduling in workflow management systems, Future Generation Computer Systems, 2024

S. Kulagina, H. Meyerhenke, A. Benoit, Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms, 53rd International Conference on Parallel Processing (ICPP '24), 2024

J. R. Coleman, Dispersed Computing in Dynamic Environments, PhD Thesis, 2024

Y. Su, V. Anand, J. Yu, J. Tan, A. Wierman, Learning-Augmented Energy-Aware List Scheduling for Precedence-Constrained Tasks, ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2024

B. Lin, C. Lin, X. Chen, M. Lin, G. Huang, Z. Xu, Cost-Driven Scheduling for Workflow Decision Making Systems in Fuzzy Edge-Cloud Environments, IEEE Transactions on Automation Science and Engineering, 2024

M. Fan, X. Zhao, X. Zuo, L. Ye, A Budget-Constrained Workflow Scheduling Approach With Priority Adjustment and Critical Task Optimizing in Clouds, IEEE Transactions on Automation Science and Engineering, 2024

J. Shin, D. Arroyo, A. Tantawi, C. Wang, A. Youssef, R. Nagi, Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition, ACM Symposium on Cloud Computing (SoCC'24), 2024

A. A. Da Silva, R. P. Hong Enriquez, G. Rattihalli, V. Thurimella, R. Ferreira da Silva, D. Milojicic, Enabling HPC Scientific Workflows for Serverless, 2024

B. Qin, Q. Lei, X. Wang, DGCQN: a RL and GCN combined method for DAG scheduling in edge computing, The Journal of Supercomputing, 2024

K. Alam, B. Roy, A. Serebrenik, Reusability Challenges of Scientific Workflows: A Case Study for Galaxy, arXiv preprint, 2023

L. Yang, L. Ye, Y. Xia, Y. Zhan, Look-ahead workflow scheduling with width changing trend in clouds, Future Generation Computer Systems, 2023

T. Coleman, H. Casanova, R. Ferreira da Silva, Automated generation of scientific workflow generators with WfChef, Future Generation Computer Systems, 2023

T. Coleman, Scientific Workflow Generation and Benchmarking, PhD Thesis, 2023

J. Zhang, X. Li, L. Chen, R. Ruiz, Scheduling Workflows with Limited Budget to Cloud Server and Serverless Resources, IEEE Transactions on Services Computing, 2023

Q. Zhang, Q. Wu, M. Zhou, J. Wen, S. Yao, A Communication Contention-Cognizant Scheduling Approach for Workflow Execution Across Public and Private Clouds, IEEE Transactions on Automation Science and Engineering, 2023

O. Sukhoroslov, Scheduling of Workflows with Task Resource Requirements in Cluster Environments, International Conference on Parallel Computing Technologies, 2023

H. Casanova, K. Berney, S. Chastel, R. Ferreira da Silva, WfCommons: Data Collection and Runtime Experiments using Multiple Workflow Systems, 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), 2023

O. Sukhoroslov, M. Gorokhovskii, Benchmarking DAG Scheduling Algorithms on Scientific Workflow Instances, Russian Supercomputing Days, 2023

A. Benoit, L. Perotin, Y. Robert, H. Sun, Checkpointing Workflows à la Young/Daly Is Not Good Enough, ACM Transactions on Parallel Computing, 2022

Z. Li, Y. Liu, L. Guo, Q. Chen, J. Cheng, W. Zheng, M. Guo, FaaSFlow: enable efficient workflow execution for function-as-a-service, 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022

L. Xing, M. Zhang, H. Li, M. Gong, J. Yang, K. Wang, Local search driven periodic scheduling for workflows with random task runtime in clouds, Computers & Industrial Engineering, 2022

H. Casanova, Y. C. Wong, L. Pottier, R. Ferreira da Silva, On the Feasibility of Simulation-driven Portfolio Scheduling for Cyberinfrastructure Runtime Systems, Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), 2022

Z. Zhang, Q-S. Hua, X. Zhang, H. Jin, X. Liao, DAG Scheduling with Communication Delays Based on Graph Convolutional Neural Network, Wireless Communications and Mobile Computing, 2022

T. Coleman, H. Casanova, L. Pottier, M. Kaushik, E. Deelman, R. Ferreira da Silva, WfCommons: A Framework for Enabling Scientific Workflow Research and Development, Future Generation Computer Systems, 2022

M. Kiamari, B. Krishnamachari, GCNScheduler: Scheduling Distributed Computing Applications using Graph Convolutional Networks, GNNet '22: Proceedings of the 1st International Workshop on Graph Neural Networking, 2022

P. Barredo, J. Puente, Robust Makespan Optimization via Genetic Algorithms on the Scientific Workflow Scheduling Problem, International Work-Conference on the Interplay Between Natural and Artificial Computation, 2022

T. Coleman, H. Casanova, T. Gwartney, R. Ferreira da Silva, Evaluating Energy-Aware Scheduling Algorithms for I/O-Intensive Scientific Workflows, International Conference on Computational Science (ICCS), 2021

M. Orr, O. Sinnen, Optimal task scheduling for partially heterogeneous systems, Parallel Computing, 2021

S. Tuli, G. Casale, N. R. Jennings, MCDS: AI Augmented Workflow Scheduling in Mobile Edge Cloud Computing Systems, IEEE Transactions on Parallel and Distributed Systems, 2021

T. Coleman, H. Casanova, R. Ferreira da Silva, WfChef: Automated Generation of Accurate Scientific Workflow Generators, 2021 IEEE 17th International Conference on eScience (eScience), 2021