Large multimodal models (LMMs) have recently shown
encouraging progress with visual instruction tuning. In
this note, we show that the fully-connected vision-language
cross-modal connector in LLaVA is surprisingly powerful
and data-efficient.