ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance

Published in Findings of the Association for Computational Linguistics: EMNLP 2025

Abstract

A novel benchmark that exposes systematic failures of vision-language models (VLMs) under multi-hop reasoning and high information density in visual question answering tasks involving process mining visualizations.

Key Findings

Vision-language models struggle with complex process visualizations
Performance degrades significantly with multi-hop reasoning requirements
High information density in visualizations poses major challenges for current VLMs
Models show systematic failure patterns across different structural properties

Significance

This work extends the broader research agenda on auditing AI systems, moving from traditional predictive models to vision-language models in the context of process mining and data visualization.

Note

* indicates equal contribution

Recommended Citation

Zinat, K. T.*, Abrar, S. M.*, Saha, S., Sakhamuri, S., Duppala, S., & Liu, Z. (2025). ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance. In Findings of the Association for Computational Linguistics: EMNLP 2025. (* equal contribution)

Download paper here


Saad Mohammad Abrar