[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Updated Aug 12, 2024 - Python
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language models proposed by Alibaba Cloud.
Align Anything: Training All-modality Model with Feedback
Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Collection of AWESOME vision-language models for vision tasks
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Get clean data from tricky documents, powered by vision-language models.
Overview of Japanese LLMs (日本語LLMまとめ)