GitHub Ranking
2026-05-06

👀 Top 100 · Vision Language Models

100 repositories sorted by vision language models

📦 100 repos 🕐 2026-05-06
# Repository Stars Forks Language Issues Description Last Commit
1 transformers huggingface 160.3k 33.1k Python 1049 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training. 2026-05-05
2 LlamaFactory hiyouga 70.9k 8.7k Python 953 Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) 2026-05-03
3 UI-TARS-desktop bytedance 29.6k 2.9k TypeScript 315 The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra 2026-04-29
4 sglang sgl-project 27.1k 5.7k Python 637 SGLang is a high-performance serving framework for large language models and multimodal models. 2026-05-06
5 runanywhere-sdks RunanywhereAI 10.4k 356 C++ 32 Production ready toolkit to run AI locally 2026-05-05
6 OpenRLHF OpenRLHF 9.4k 934 Python 295 An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL) 2026-05-05
7 notebooks roboflow 9.4k 1.4k Jupyter Notebook 64 A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL. 2026-03-27
8 anomaly-detection-resources yzhao062 9.3k 1.8k Python 11 Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM and VLM works! 2026-03-02
9 oumi oumi-ai 9.2k 760 Python 0 Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! 2026-05-05
10 vlmcsd Wind4 8.8k 2.5k C 1 KMS Emulator in C (currently runs on Linux including Android, FreeBSD, Solaris, Minix, Mac OS, iOS, Windows with or without Cygwin) 2024-01-10
11 nexa-sdk qualcomm 8.0k 996 Kotlin 44 Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more. 2026-04-14
12 minimind-v jingyaogong 7.8k 843 Python 15 🚀 Train a 65M-parameter VLM from scratch in just 2h! 2026-05-01
13 ERNIE PaddlePaddle 7.7k 1.4k Python 31 The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle. 2026-01-04
14 CogVLM zai-org 6.7k 454 Python 67 A state-of-the-art-level open visual language model | multimodal pre-trained model 2024-05-29
15 VLM-R1 om-ai-lab 6.0k 377 Python 164 Solve Visual Understanding with Reinforced VLMs 2026-03-12
16 UltraRAG OpenBMB 5.5k 413 Python 6 [GitHub Trending #2] A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines 2026-05-05
17 Awesome-LLM-Inference xlite-dev 5.2k 375 Python 0 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉 2026-04-20
18 nanoVLM huggingface 4.9k 488 Python 36 The simplest, fastest repository for training/finetuning small-sized VLMs. 2025-10-27
19 mlx-vlm Blaizzy 4.6k 512 Python 105 MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. 2026-05-05
20 star-vector joanrod 4.4k 246 Python 48 StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision. 2025-11-07
21 LLM-RL-Visualized changyeyu 4.2k 400 Python 3 🌟 100+ original LLM / RL schematic diagrams 📚 from the author of 《大模型算法》 ("Large Model Algorithms") 💥 (100+ LLM/RL Algorithm Maps) 2026-04-21
22 VLMEvalKit open-compass 4.1k 689 Python 206 Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks 2026-04-29
23 lmms-eval EvolvingLMMs-Lab 4.1k 579 Python 26 One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks 2026-04-29
24 R1-V StarsfieldAI 4.1k 286 Python 91 Witness the aha moment of VLM with less than $3. 2025-05-19
25 VILA NVlabs 3.8k 319 Python 67 VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud. 2026-03-12
26 FastDeploy PaddlePaddle 3.7k 744 Python 285 High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle 2026-05-05
27 PromptEnhancer Hunyuan-PromptEnhancer 3.7k 320 Python 13 [CVPR 2026] PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation. 2026-01-26
28 MiniMax-01 MiniMax-AI 3.4k 328 Python 7 The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention 2025-07-07
29 Local-File-Organizer QiuYannnn 3.2k 311 Python 26 An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval. 2024-10-21
30 Skywork-R1V SkyworkAI 3.2k 280 Python 28 Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning. 2025-12-15
31 VLM_survey jingyi0000 3.1k 234 N/A 2 Collection of AWESOME vision-language models for vision tasks 2025-10-14
32 OSWorld xlang-ai 2.8k 447 Python 147 [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments 2026-05-01
33 evalscope modelscope 2.8k 322 Python 86 A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. 2026-05-04
34 DeepCamera SharpAI 2.7k 442 JavaScript 3 Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent: watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC. 2026-04-21
35 OmAgent om-ai-lab 2.6k 288 Python 7 [EMNLP-2024] Build multimodal language agents for fast prototype and production 2025-03-19
36 Cradle BAAI-Agents 2.5k 266 Python 19 The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements. 2024-11-07
37 OmniSVG OmniSVG 2.5k 94 Python 36 [NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to intricate anime characters. 2026-03-01
38 CogVLM2 zai-org 2.4k 163 Python 58 GPT4V-level open-source multi-modal model based on Llama3-8B 2025-03-03
39 GLM-V zai-org 2.3k 167 Python 11 GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning 2026-04-06
40 comfyui_LLM_party heshengtao 2.2k 182 Python 75 LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu, discord, and adapts to all llms with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local llms, vlm, gguf such as llama-3.3 Janus-Pro, Linkage graphRAG 2026-03-08
41 gowall Achno 2.2k 37 Go 7 A tool to convert a Wallpaper's color scheme / palette, OCR with VLM's Traditional & Hybrid, Image Compression, color palette extraction, image upscaling with Adversarial Networks and more image processing features. 2026-04-16
42 starVLA starVLA 2.2k 263 Python 8 StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing 2026-05-05
43 deepseek-ocr.rs TimmyOVO 2.2k 168 Rust 14 Rust multi‑backend OCR/VLM engine (DeepSeek‑OCR-1/2, PaddleOCR‑VL, DotsOCR) with DSQ quantization and an OpenAI‑compatible server & CLI – run locally without Python. 2026-02-21
44 Tutorial InternLM 2.0k 1.5k Python 55 LLM&VLM Tutorial 2026-04-22
45 Awesome-LM-SSP CryptoAILab 1.9k 137 N/A 0 A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.). 2026-05-02
46 verl-agent langfengQ 1.9k 177 Python 55 verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training" 2026-02-27
47 ComfyUI-Prompt-Assistant yawiii 1.9k 89 JavaScript 15 The Prompt Assistant enables one-click access to LLMs/VLMs for prompt translation, expansion, and image captioning. It also supports one-click preset insertion and historical prompt search. 2026-04-25
48 Qwen-VL-Series-Finetune 2U1 1.8k 211 Python 58 An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud. 2026-04-10
49 Awesome-LLM4AD Thinklab-SJTU 1.8k 105 N/A 1 A curated list of awesome LLM/VLM/VLA/World Model for Autonomous Driving(LLM4AD) resources (continually updated) 2026-05-01
50 awesome-yolo-object-detection coderonion 1.7k 234 N/A 0 🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets. 2025-05-31
51 ComfyUI-Florence2 kijai 1.7k 141 Python 113 Inference Microsoft Florence2 VLM 2026-04-18
52 OpenAdapt OpenAdaptAI 1.6k 233 Python 1 Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models 2026-03-04
53 Pai-Megatron-Patch alibaba 1.6k 229 Python 109 The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. 2025-12-15
54 paddler intentee 1.5k 85 Rust 26 Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale 🏓🦙 Alternative to projects like llm-d, Docker Model Runner, etc but with fewer moving parts and simple deployments built around ggml ecosystem. Runs on CPU and GPU. 2026-05-04
55 react-native-executorch software-mansion 1.5k 73 C++ 54 Declarative way to run AI models in React Native on device, powered by ExecuTorch. 2026-05-05
56 paperbanana llmsresearch 1.4k 215 Python 18 Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation. 2026-04-22
57 unblink zapdos-labs 1.4k 163 Go 2 Camera monitoring with VLM 2026-03-09
58 MobileVLM Meituan-AutoML 1.3k 87 Python 33 Strong and Open Vision Language Assistant for Mobile Devices 2024-04-15
59 Awesome-Jailbreak-on-LLMs yueliu1999 1.3k 108 N/A 0 Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses. 2026-03-30
60 xllm jd-opensource 1.3k 194 C++ 77 A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. 2026-04-30
61 awesome-vlm-architectures gokayfem 1.2k 55 Markdown 0 Famous Vision Language Models and Their Architectures 2026-01-11
62 miles radixark 1.2k 182 Python 73 Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. 2026-05-06
63 AeroSandbox peterdsharpe 1.2k 193 Jupyter Notebook 9 Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more. 2026-04-14
64 kubeai kubeai-project 1.2k 126 Go 82 AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text. 2026-03-31
65 vlm_arm TommyZihao 1.2k 195 Jupyter Notebook 6 Robotic arm + large model + multimodal = human-robot collaborative embodied agent 2026-02-28
66 CogAgent zai-org 1.2k 99 Python 27 An open-sourced end-to-end VLM-based GUI Agent 2025-04-04
67 vlms-zero-to-hero SkalskiP 1.2k 102 Jupyter Notebook 1 This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models. 2025-01-23
68 joycaption fpgaminer 1.1k 68 Jupyter Notebook 38 JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models. 2026-02-24
69 vlmcsd kkkgo 1.1k 310 C 0 🔑Portable open-source KMS Emulator in C 2024-01-06
70 Bunny BAAI-DCAI 1.1k 76 Python 24 A family of lightweight multimodal models. 2024-11-18
71 Gpt-Agreement-Payment DanOps-1 1.0k 456 Python 6 End-to-end protocol replay toolkit for ChatGPT Plus/Team/Pro subscription with from-scratch hCaptcha solver and empirical anti-fraud research 2026-05-05
72 AngelSlim Tencent 981 102 Python 46 Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency. 2026-04-29
73 prismatic-vlms TRI-ML 975 1.1k Python 19 A flexible and efficient codebase for training visually-conditioned language models (VLMs) 2024-07-04
74 streaming-vlm mit-han-lab 969 62 Python 27 StreamingVLM: Real-Time Understanding for Infinite Video Streams 2025-10-15
75 VisRAG OpenBMB 950 72 Python 0 Parsing-free RAG supported by VLMs 2025-12-07
76 GamingAgent lmgame-org 926 99 Python 7 [ICLR 2026] LLM/VLM gaming agents and model evaluation through games. 2025-11-16
77 mindnlp mindspore-lab 918 271 Python 60 MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless compatibility and acceleration. 2026-03-08
78 Awesome-Token-Compress daixiangzi 891 42 N/A 0 A paper list of some recent works about Token Compress for ViT and VLM 2026-04-14
79 gpt-assistant-android Skythinker616 877 123 Java 22 [New: agent mode] GPT assistant for Android, activated via volume keys for voice interaction, supporting features such as networking, taking photos, templates, parsing PDF and Office documents, and agent mode. 2026-05-05
80 UniWorld PKU-YuanGroup 876 29 Python 14 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation 2025-12-23
81 UniPic SkyworkAI 869 45 Python 25 Open-source SOTA multi-image editing model 2026-01-24
82 InternNav InternRobotics 838 114 Jupyter Notebook 5 InternRobotics' open platform for building generalized navigation foundation models. 2026-03-10
83 Awesome-Robotics-3D zubair-irshad 808 41 N/A 3 A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites 2025-12-17
84 awesome-llm-and-aigc coderonion 807 74 N/A 5 🚀🚀🚀A collection of some awesome public projects about Large Language Model(LLM), Vision Language Model(VLM), Vision Language Action(VLA), AI Generated Content(AIGC), the related Datasets and Applications. 2025-08-01
85 Awesome-Prompt-Adapter-Learning-for-VLMs-CLIP zhengli97 771 40 N/A 0 A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP. 2026-04-16
86 awesome-data-llm OpenDataBox 770 68 N/A 1 Official Repository of "LLM × DATA" Survey Paper 2026-03-24
87 OpenWorldLib OpenDCAI 738 40 Python 5 Unified Codebase for Advanced World Models. 2026-05-02
88 Awesome-Spatial-Intelligence-in-VLM mll-lab-nu 736 40 N/A 3 A paper list for spatial reasoning 2026-01-19
89 NEO EvolvingLMMs-Lab 731 27 Python 0 NEO Series: Native Vision-Language Models from First Principles 2026-04-26
90 GeoChat mbzuai-oryx 713 62 Python 46 [CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing 2024-11-28
91 LightCompress ModelTC 712 79 Python 42 [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. 2026-04-01
92 PytorchNetHub bobo0810 708 156 Jupyter Notebook 0 Project annotations + paper reproduction + algorithm competitions + PyTorch practice + LeetCode + VLM pre-training 2025-05-12
93 dingo MigoXLab 693 71 Python 3 Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool 2026-04-30
94 vlmaps vlmaps 676 79 Python 14 [ICRA2023] Implementation of Visual Language Maps for Robot Navigation 2024-07-09
95 OmniInfer omnimind-ai 676 4 Python 2 Easy, fast, and private LLM & VLM inference for every device 2026-05-01
96 OmniLottie OpenVGLab 658 38 Python 5 [CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator that produces Lottie JSONs. 2026-04-06
97 MiMo-VL XiaomiMiMo 640 31 N/A 6 MiMo-VL 2025-08-21
98 VLM2Vec TIGER-AI-Lab 639 60 Python 27 This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] 2026-04-28
99 localGPT-Vision PromtEngineer 630 130 Python 27 Chat with your documents using Vision Language Models. This repo implements an End to End RAG pipeline with both local and proprietary VLMs 2025-07-26
100 video-search-and-summarization NVIDIA-AI-Blueprints 628 230 Python 5 Suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications. 2026-05-06