Top 100 · Vision Language Models
100 repositories sorted by vision language models
| # | Repository | Stars | Forks | Language | Issues | Description | Last Commit |
|---|---|---|---|---|---|---|---|
| 1 | transformers huggingface | 160.3k | 33.1k | Python | 1049 | 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training. | 2026-05-05 |
| 2 | LlamaFactory hiyouga | 70.9k | 8.7k | Python | 953 | Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) | 2026-05-03 |
| 3 | UI-TARS-desktop bytedance | 29.6k | 2.9k | TypeScript | 315 | The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra | 2026-04-29 |
| 4 | sglang sgl-project | 27.1k | 5.7k | Python | 637 | SGLang is a high-performance serving framework for large language models and multimodal models. | 2026-05-06 |
| 5 | runanywhere-sdks RunanywhereAI | 10.4k | 356 | C++ | 32 | Production ready toolkit to run AI locally | 2026-05-05 |
| 6 | OpenRLHF OpenRLHF | 9.4k | 934 | Python | 295 | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL) | 2026-05-05 |
| 7 | notebooks roboflow | 9.4k | 1.4k | Jupyter Notebook | 64 | A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL. | 2026-03-27 |
| 8 | anomaly-detection-resources yzhao062 | 9.3k | 1.8k | Python | 11 | Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM and VLM works! | 2026-03-02 |
| 9 | oumi oumi-ai | 9.2k | 760 | Python | 0 | Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! | 2026-05-05 |
| 10 | vlmcsd Wind4 | 8.8k | 2.5k | C | 1 | KMS Emulator in C (currently runs on Linux including Android, FreeBSD, Solaris, Minix, Mac OS, iOS, Windows with or without Cygwin) | 2024-01-10 |
| 11 | nexa-sdk qualcomm | 8.0k | 996 | Kotlin | 44 | Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more. | 2026-04-14 |
| 12 | minimind-v jingyaogong | 7.8k | 843 | Python | 15 | 🚀 Train a 65M-parameter VLM from scratch in just 2h! 🌏 | 2026-05-01 |
| 13 | ERNIE PaddlePaddle | 7.7k | 1.4k | Python | 31 | The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle. | 2026-01-04 |
| 14 | CogVLM zai-org | 6.7k | 454 | Python | 67 | A state-of-the-art-level open visual language model (multimodal pre-trained model) | 2024-05-29 |
| 15 | VLM-R1 om-ai-lab | 6.0k | 377 | Python | 164 | Solve Visual Understanding with Reinforced VLMs | 2026-03-12 |
| 16 | UltraRAG OpenBMB | 5.5k | 413 | Python | 6 | [GitHub Trending #2] A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines | 2026-05-05 |
| 17 | Awesome-LLM-Inference xlite-dev | 5.2k | 375 | Python | 0 | 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 | 2026-04-20 |
| 18 | nanoVLM huggingface | 4.9k | 488 | Python | 36 | The simplest, fastest repository for training/finetuning small-sized VLMs. | 2025-10-27 |
| 19 | mlx-vlm Blaizzy | 4.6k | 512 | Python | 105 | MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. | 2026-05-05 |
| 20 | star-vector joanrod | 4.4k | 246 | Python | 48 | StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision. | 2025-11-07 |
| 21 | LLM-RL-Visualized changyeyu | 4.2k | 400 | Python | 3 | 🌟 100+ original LLM/RL algorithm diagrams 📚 by the author of the book "Large Model Algorithms" (《大模型算法》)! 💥 (100+ LLM/RL Algorithm Maps) | 2026-04-21 |
| 22 | VLMEvalKit open-compass | 4.1k | 689 | Python | 206 | Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks | 2026-04-29 |
| 23 | lmms-eval EvolvingLMMs-Lab | 4.1k | 579 | Python | 26 | One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks | 2026-04-29 |
| 24 | R1-V StarsfieldAI | 4.1k | 286 | Python | 91 | Witness the aha moment of VLM with less than $3. | 2025-05-19 |
| 25 | VILA NVlabs | 3.8k | 319 | Python | 67 | VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud. | 2026-03-12 |
| 26 | FastDeploy PaddlePaddle | 3.7k | 744 | Python | 285 | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle | 2026-05-05 |
| 27 | PromptEnhancer Hunyuan-PromptEnhancer | 3.7k | 320 | Python | 13 | [CVPR 2026] PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation. | 2026-01-26 |
| 28 | MiniMax-01 MiniMax-AI | 3.4k | 328 | Python | 7 | The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention | 2025-07-07 |
| 29 | Local-File-Organizer QiuYannnn | 3.2k | 311 | Python | 26 | An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval. | 2024-10-21 |
| 30 | Skywork-R1V SkyworkAI | 3.2k | 280 | Python | 28 | Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning. | 2025-12-15 |
| 31 | VLM_survey jingyi0000 | 3.1k | 234 | N/A | 2 | Collection of AWESOME vision-language models for vision tasks | 2025-10-14 |
| 32 | OSWorld xlang-ai | 2.8k | 447 | Python | 147 | [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | 2026-05-01 |
| 33 | evalscope modelscope | 2.8k | 322 | Python | 86 | A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. | 2026-05-04 |
| 34 | DeepCamera SharpAI | 2.7k | 442 | JavaScript | 3 | Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent: watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC. | 2026-04-21 |
| 35 | OmAgent om-ai-lab | 2.6k | 288 | Python | 7 | [EMNLP-2024] Build multimodal language agents for fast prototype and production | 2025-03-19 |
| 36 | Cradle BAAI-Agents | 2.5k | 266 | Python | 19 | The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements. | 2024-11-07 |
| 37 | OmniSVG OmniSVG | 2.5k | 94 | Python | 36 | [NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to intricate anime characters. | 2026-03-01 |
| 38 | CogVLM2 zai-org | 2.4k | 163 | Python | 58 | GPT4V-level open-source multi-modal model based on Llama3-8B | 2025-03-03 |
| 39 | GLM-V zai-org | 2.3k | 167 | Python | 11 | GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | 2026-04-06 |
| 40 | comfyui_LLM_party heshengtao | 2.2k | 182 | Python | 75 | LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.3 and Janus-Pro, with linkage to graphRAG | 2026-03-08 |
| 41 | gowall Achno | 2.2k | 37 | Go | 7 | A tool to convert a Wallpaper's color scheme / palette, OCR with VLM's Traditional & Hybrid, Image Compression, color palette extraction, image upscaling with Adversarial Networks and more image processing features. | 2026-04-16 |
| 42 | starVLA starVLA | 2.2k | 263 | Python | 8 | StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing | 2026-05-05 |
| 43 | deepseek-ocr.rs TimmyOVO | 2.2k | 168 | Rust | 14 | Rust multi‑backend OCR/VLM engine (DeepSeek‑OCR-1/2, PaddleOCR‑VL, DotsOCR) with DSQ quantization and an OpenAI‑compatible server & CLI – run locally without Python. | 2026-02-21 |
| 44 | Tutorial InternLM | 2.0k | 1.5k | Python | 55 | LLM&VLM Tutorial | 2026-04-22 |
| 45 | Awesome-LM-SSP CryptoAILab | 1.9k | 137 | N/A | 0 | A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.). | 2026-05-02 |
| 46 | verl-agent langfengQ | 1.9k | 177 | Python | 55 | verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training" | 2026-02-27 |
| 47 | ComfyUI-Prompt-Assistant yawiii | 1.9k | 89 | JavaScript | 15 | The Prompt Assistant enables one-click access to LLMs/VLMs for prompt translation, expansion, and image captioning. It also supports one-click preset insertion and historical prompt search. | 2026-04-25 |
| 48 | Qwen-VL-Series-Finetune 2U1 | 1.8k | 211 | Python | 58 | An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud. | 2026-04-10 |
| 49 | Awesome-LLM4AD Thinklab-SJTU | 1.8k | 105 | N/A | 1 | A curated list of awesome LLM/VLM/VLA/World Model for Autonomous Driving (LLM4AD) resources (continually updated) | 2026-05-01 |
| 50 | awesome-yolo-object-detection coderonion | 1.7k | 234 | N/A | 0 | 🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets. | 2025-05-31 |
| 51 | ComfyUI-Florence2 kijai | 1.7k | 141 | Python | 113 | Inference Microsoft Florence2 VLM | 2026-04-18 |
| 52 | OpenAdapt OpenAdaptAI | 1.6k | 233 | Python | 1 | Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models | 2026-03-04 |
| 53 | Pai-Megatron-Patch alibaba | 1.6k | 229 | Python | 109 | The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. | 2025-12-15 |
| 54 | paddler intentee | 1.5k | 85 | Rust | 26 | Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale 🏓🦙 Alternative to projects like llm-d, Docker Model Runner, etc., but with fewer moving parts and simple deployments built around the ggml ecosystem. Runs on CPU and GPU. | 2026-05-04 |
| 55 | react-native-executorch software-mansion | 1.5k | 73 | C++ | 54 | Declarative way to run AI models in React Native on device, powered by ExecuTorch. | 2026-05-05 |
| 56 | paperbanana llmsresearch | 1.4k | 215 | Python | 18 | Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation. | 2026-04-22 |
| 57 | unblink zapdos-labs | 1.4k | 163 | Go | 2 | Camera monitoring with VLM | 2026-03-09 |
| 58 | MobileVLM Meituan-AutoML | 1.3k | 87 | Python | 33 | Strong and Open Vision Language Assistant for Mobile Devices | 2024-04-15 |
| 59 | Awesome-Jailbreak-on-LLMs yueliu1999 | 1.3k | 108 | N/A | 0 | Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses. | 2026-03-30 |
| 60 | xllm jd-opensource | 1.3k | 194 | C++ | 77 | A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. | 2026-04-30 |
| 61 | awesome-vlm-architectures gokayfem | 1.2k | 55 | Markdown | 0 | Famous Vision Language Models and Their Architectures | 2026-01-11 |
| 62 | miles radixark | 1.2k | 182 | Python | 73 | Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. | 2026-05-06 |
| 63 | AeroSandbox peterdsharpe | 1.2k | 193 | Jupyter Notebook | 9 | Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more. | 2026-04-14 |
| 64 | kubeai kubeai-project | 1.2k | 126 | Go | 82 | AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text. | 2026-03-31 |
| 65 | vlm_arm TommyZihao | 1.2k | 195 | Jupyter Notebook | 6 | Robotic arm + large model + multimodal = human-robot collaborative embodied agent | 2026-02-28 |
| 66 | CogAgent zai-org | 1.2k | 99 | Python | 27 | An open-sourced end-to-end VLM-based GUI Agent | 2025-04-04 |
| 67 | vlms-zero-to-hero SkalskiP | 1.2k | 102 | Jupyter Notebook | 1 | This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. | 2025-01-23 |
| 68 | joycaption fpgaminer | 1.1k | 68 | Jupyter Notebook | 38 | JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models. | 2026-02-24 |
| 69 | vlmcsd kkkgo | 1.1k | 310 | C | 0 | 🔑Portable open-source KMS Emulator in C | 2024-01-06 |
| 70 | Bunny BAAI-DCAI | 1.1k | 76 | Python | 24 | A family of lightweight multimodal models. | 2024-11-18 |
| 71 | Gpt-Agreement-Payment DanOps-1 | 1.0k | 456 | Python | 6 | End-to-end protocol replay toolkit for ChatGPT Plus/Team/Pro subscription with from-scratch hCaptcha solver and empirical anti-fraud research | 2026-05-05 |
| 72 | AngelSlim Tencent | 981 | 102 | Python | 46 | Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency. | 2026-04-29 |
| 73 | prismatic-vlms TRI-ML | 975 | 1.1k | Python | 19 | A flexible and efficient codebase for training visually-conditioned language models (VLMs) | 2024-07-04 |
| 74 | streaming-vlm mit-han-lab | 969 | 62 | Python | 27 | StreamingVLM: Real-Time Understanding for Infinite Video Streams | 2025-10-15 |
| 75 | VisRAG OpenBMB | 950 | 72 | Python | 0 | Parsing-free RAG supported by VLMs | 2025-12-07 |
| 76 | GamingAgent lmgame-org | 926 | 99 | Python | 7 | [ICLR 2026] LLM/VLM gaming agents and model evaluation through games. | 2025-11-16 |
| 77 | mindnlp mindspore-lab | 918 | 271 | Python | 60 | MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless compatibility and acceleration. | 2026-03-08 |
| 78 | Awesome-Token-Compress daixiangzi | 891 | 42 | N/A | 0 | A paper list of recent works on token compression for ViT and VLM | 2026-04-14 |
| 79 | gpt-assistant-android Skythinker616 | 877 | 123 | Java | 22 | [New: agent mode] GPT assistant for Android, activated via volume keys for voice interaction, supporting features such as networking, taking photos, templates, parsing PDF and Office documents, and agent mode. | 2026-05-05 |
| 80 | UniWorld PKU-YuanGroup | 876 | 29 | Python | 14 | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | 2025-12-23 |
| 81 | UniPic SkyworkAI | 869 | 45 | Python | 25 | Open-source SOTA multi-image editing model | 2026-01-24 |
| 82 | InternNav InternRobotics | 838 | 114 | Jupyter Notebook | 5 | InternRobotics' open platform for building generalized navigation foundation models. | 2026-03-10 |
| 83 | Awesome-Robotics-3D zubair-irshad | 808 | 41 | N/A | 3 | A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites | 2025-12-17 |
| 84 | awesome-llm-and-aigc coderonion | 807 | 74 | N/A | 5 | 🚀🚀🚀A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applications. | 2025-08-01 |
| 85 | Awesome-Prompt-Adapter-Learning-for-VLMs-CLIP zhengli97 | 771 | 40 | N/A | 0 | A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP. | 2026-04-16 |
| 86 | awesome-data-llm OpenDataBox | 770 | 68 | N/A | 1 | Official Repository of "LLM × DATA" Survey Paper | 2026-03-24 |
| 87 | OpenWorldLib OpenDCAI | 738 | 40 | Python | 5 | Unified Codebase for Advanced World Models. | 2026-05-02 |
| 88 | Awesome-Spatial-Intelligence-in-VLM mll-lab-nu | 736 | 40 | N/A | 3 | A paper list for spatial reasoning | 2026-01-19 |
| 89 | NEO EvolvingLMMs-Lab | 731 | 27 | Python | 0 | NEO Series: Native Vision-Language Models from First Principles | 2026-04-26 |
| 90 | GeoChat mbzuai-oryx | 713 | 62 | Python | 46 | [CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing | 2024-11-28 |
| 91 | LightCompress ModelTC | 712 | 79 | Python | 42 | [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. | 2026-04-01 |
| 92 | PytorchNetHub bobo0810 | 708 | 156 | Jupyter Notebook | 0 | Project annotations + paper reproductions + algorithm competitions + PyTorch practice + LeetCode + VLM pre-training | 2025-05-12 |
| 93 | dingo MigoXLab | 693 | 71 | Python | 3 | Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool | 2026-04-30 |
| 94 | vlmaps vlmaps | 676 | 79 | Python | 14 | [ICRA2023] Implementation of Visual Language Maps for Robot Navigation | 2024-07-09 |
| 95 | OmniInfer omnimind-ai | 676 | 4 | Python | 2 | Easy, fast, and private LLM & VLM inference for every device | 2026-05-01 |
| 96 | OmniLottie OpenVGLab | 658 | 38 | Python | 5 | [CVPR 2026🔥] 🧑🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator that produces Lottie JSONs. | 2026-04-06 |
| 97 | MiMo-VL XiaomiMiMo | 640 | 31 | N/A | 6 | MiMo-VL | 2025-08-21 |
| 98 | VLM2Vec TIGER-AI-Lab | 639 | 60 | Python | 27 | This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] | 2026-04-28 |
| 99 | localGPT-Vision PromtEngineer | 630 | 130 | Python | 27 | Chat with your documents using Vision Language Models. This repo implements an End to End RAG pipeline with both local and proprietary VLMs | 2025-07-26 |
| 100 | video-search-and-summarization NVIDIA-AI-Blueprints | 628 | 230 | Python | 5 | Suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications. | 2026-05-06 |