Vision Language Models (VLMs), which extend Large Language Models (LLMs) with visual understanding
capabilities, have demonstrated significant advances in addressing open-ended visual question-answering (VQA) tasks.