ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance

Published in Findings of EMNLP 2025

Abstract

ProcVQA is a novel benchmark that exposes systematic failures of vision-language models under multi-hop reasoning and high information density in visual question answering tasks over process mining visualizations.

Key Findings

  • Vision-language models struggle with complex process visualizations
  • Performance degrades significantly with multi-hop reasoning requirements
  • High information density in visualizations poses major challenges for current VLMs
  • Models show systematic failure patterns across different structural properties

Significance

This work extends the broader research agenda on auditing AI systems, moving from traditional predictive models to vision-language models in the context of process mining and data visualization.

Note

* indicates equal contribution

Recommended citation: Zinat, K. T.*, Abrar, S. M.*, Saha, S., Sakhamuri, S., Duppala, S., & Liu, Z. (2025). ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance. In Findings of the Association for Computational Linguistics: EMNLP 2025. (* equal contribution)
Download Paper