ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance
Published in Findings of EMNLP, 2025
Abstract
ProcVQA is a novel benchmark that exposes systematic failures of vision-language models under multi-hop reasoning and high information density in visual question answering tasks involving process mining visualizations.
Key Findings
- Vision-language models struggle with complex process visualizations
- Performance degrades significantly with multi-hop reasoning requirements
- High information density in visualizations poses major challenges for current VLMs
- Models show systematic failure patterns across different structural properties
Significance
This work extends the broader research agenda on auditing AI systems, moving from traditional predictive models to vision-language models in the context of process mining and data visualization.
Note
* indicates equal contribution
Recommended Citation
Zinat, K. T.*, Abrar, S. M.*, Saha, S., Sakhamuri, S., Duppala, S., & Liu, Z. (2025). ProcVQA: Benchmarking the Effects of Structural Properties in Mined Process Visualizations on Vision–Language Model Performance. In Findings of the Association for Computational Linguistics: EMNLP 2025. (* equal contribution)
