LLaVA-1.6-Mistral-7B
Vision-language model for image understanding and captioning

LLaVA-1.6-Mistral-7B is a multimodal vision-language model that processes images alongside text to generate descriptive and reasoning-based responses. It enables image captioning and visual understanding by combining a vision encoder with a Mistral 7B language backbone.