Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends


In recent years, interest in vision-language tasks has grown, especially in those involving chart interactions. These tasks are inherently multimodal, requiring models to process chart images, accompanying text, underlying data tables, and often user queries. Traditionally, Chart Understanding (CU) relied on heuristics and rule-based systems. Recent work that integrates transformer architectures, however, has significantly improved performance. This paper reviews prominent research in CU, focusing on State-of-the-Art (SoTA) frameworks that employ transformers within End-to-End (E2E) solutions. It analyzes the relevant benchmarking datasets and evaluation techniques, identifies key challenges, and outlines promising future directions for advancing CU solutions. Following the PRISMA guidelines, a comprehensive literature search was conducted on Google Scholar, covering publications from January 2020 to June 2024. After rigorous screening and quality assessment, 32 studies were selected for in-depth analysis. The CU tasks are categorized into a three-layered paradigm based on the cognitive task required. Frameworks addressing the various CU tasks are categorized as single-task or multi-task, according to the number of tasks the E2E solution can solve. Within multi-task frameworks, pre-trained and prompt-engineering-based techniques are explored. The review also surveys leading architectures, datasets, and pre-training tasks. Despite significant progress, challenges remain in OCR dependency, handling low-resolution images, and enhancing visual reasoning. Future directions include addressing these challenges, developing robust benchmarks, and optimizing model efficiency. Integrating explainable AI techniques and exploring the balance between real and synthetic data are also crucial for advancing CU research.

Paper: Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends, by Mirna Al-Shetairy and 3 other authors.
