👀 Top 100 · Vision Language Models

100 repositories sorted by vision language models


📦 100 repos 🕐 2026-05-06

#	Repository	Stars	Forks	Language	Issues	Description	Last Commit
1	transformers huggingface	160.3k	33.1k	Python	1049	🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.	2026-05-05
2	LlamaFactory hiyouga	70.9k	8.7k	Python	953	Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)	2026-05-03
3	UI-TARS-desktop bytedance	29.6k	2.9k	TypeScript	315	The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra	2026-04-29
4	sglang sgl-project	27.1k	5.7k	Python	637	SGLang is a high-performance serving framework for large language models and multimodal models.	2026-05-06
5	runanywhere-sdks RunanywhereAI	10.4k	356	C++	32	Production ready toolkit to run AI locally	2026-05-05
6	OpenRLHF OpenRLHF	9.4k	934	Python	295	An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)	2026-05-05
7	notebooks roboflow	9.4k	1.4k	Jupyter Notebook	64	A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.	2026-03-27
8	anomaly-detection-resources yzhao062	9.3k	1.8k	Python	11	Anomaly detection related books, papers, videos, and toolboxes. Last update late 2025 for LLM and VLM works!	2026-03-02
9	oumi oumi-ai	9.2k	760	Python	0	Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!	2026-05-05
10	vlmcsd Wind4	8.8k	2.5k	C	1	KMS Emulator in C (currently runs on Linux including Android, FreeBSD, Solaris, Minix, Mac OS, iOS, Windows with or without Cygwin)	2024-01-10
11	nexa-sdk qualcomm	8.0k	996	Kotlin	44	Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.	2026-04-14
12	minimind-v jingyaogong	7.8k	843	Python	15	🚀 Train a 65M-parameter VLM from scratch in just 2h!	2026-05-01
13	ERNIE PaddlePaddle	7.7k	1.4k	Python	31	The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.	2026-01-04
14	CogVLM zai-org	6.7k	454	Python	67	A state-of-the-art open visual language model (multimodal pre-trained model)	2024-05-29
15	VLM-R1 om-ai-lab	6.0k	377	Python	164	Solve Visual Understanding with Reinforced VLMs	2026-03-12
16	UltraRAG OpenBMB	5.5k	413	Python	6	[GitHub Trending #2] A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines	2026-05-05
17	Awesome-LLM-Inference xlite-dev	5.2k	375	Python	0	📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉	2026-04-20
18	nanoVLM huggingface	4.9k	488	Python	36	The simplest, fastest repository for training/finetuning small-sized VLMs.	2025-10-27
19	mlx-vlm Blaizzy	4.6k	512	Python	105	MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.	2026-05-05
20	star-vector joanrod	4.4k	246	Python	48	StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.	2025-11-07
21	LLM-RL-Visualized changyeyu	4.2k	400	Python	3	🌟 100+ original LLM / RL principle diagrams 📚 from the author of the book 《大模型算法》 (Large Model Algorithms) 💥 (100+ LLM/RL Algorithm Maps)	2026-04-21
22	VLMEvalKit open-compass	4.1k	689	Python	206	Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks	2026-04-29
23	lmms-eval EvolvingLMMs-Lab	4.1k	579	Python	26	One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks	2026-04-29
24	R1-V StarsfieldAI	4.1k	286	Python	91	Witness the aha moment of VLM with less than $3.	2025-05-19
25	VILA NVlabs	3.8k	319	Python	67	VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.	2026-03-12
26	FastDeploy PaddlePaddle	3.7k	744	Python	285	High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle	2026-05-05
27	PromptEnhancer Hunyuan-PromptEnhancer	3.7k	320	Python	13	[CVPR 2026] PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation.	2026-01-26
28	MiniMax-01 MiniMax-AI	3.4k	328	Python	7	The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention	2025-07-07
29	Local-File-Organizer QiuYannnn	3.2k	311	Python	26	An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.	2024-10-21
30	Skywork-R1V SkyworkAI	3.2k	280	Python	28	Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.	2025-12-15
31	VLM_survey jingyi0000	3.1k	234	N/A	2	Collection of AWESOME vision-language models for vision tasks	2025-10-14
32	OSWorld xlang-ai	2.8k	447	Python	147	[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments	2026-05-01
33	evalscope modelscope	2.8k	322	Python	86	A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.	2026-05-04
34	DeepCamera SharpAI	2.7k	442	JavaScript	3	Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent that watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC.	2026-04-21
35	OmAgent om-ai-lab	2.6k	288	Python	7	[EMNLP-2024] Build multimodal language agents for fast prototype and production	2025-03-19
36	Cradle BAAI-Agents	2.5k	266	Python	19	The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.	2024-11-07
37	OmniSVG OmniSVG	2.5k	94	Python	36	[NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from simple icons to intricate anime characters.	2026-03-01
38	CogVLM2 zai-org	2.4k	163	Python	58	GPT4V-level open-source multi-modal model based on Llama3-8B	2025-03-03
39	GLM-V zai-org	2.3k	167	Python	11	GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning	2026-04-06
40	comfyui_LLM_party heshengtao	2.2k	182	Python	75	LLM agent framework in ComfyUI. Includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.3 and Janus-Pro; linkage with graphRAG.	2026-03-08
41	gowall Achno	2.2k	37	Go	7	A tool to convert a wallpaper's color scheme / palette, with OCR (VLM, traditional, and hybrid), image compression, color palette extraction, image upscaling with adversarial networks, and more image processing features.	2026-04-16
42	starVLA starVLA	2.2k	263	Python	8	StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing	2026-05-05
43	deepseek-ocr.rs TimmyOVO	2.2k	168	Rust	14	Rust multi‑backend OCR/VLM engine (DeepSeek‑OCR-1/2, PaddleOCR‑VL, DotsOCR) with DSQ quantization and an OpenAI‑compatible server & CLI – run locally without Python.	2026-02-21
44	Tutorial InternLM	2.0k	1.5k	Python	55	LLM&VLM Tutorial	2026-04-22
45	Awesome-LM-SSP CryptoAILab	1.9k	137	N/A	0	A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).	2026-05-02
46	verl-agent langfengQ	1.9k	177	Python	55	verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"	2026-02-27
47	ComfyUI-Prompt-Assistant yawiii	1.9k	89	JavaScript	15	The Prompt Assistant enables one-click access to LLM/VLM services (Zhipu, SiliconFlow, Gemini, local Ollama, Baidu, and others) for prompt translation, polishing and expansion, and image captioning. It also supports one-click preset insertion and historical prompt search. An all-in-one prompt plugin.	2026-04-25
48	Qwen-VL-Series-Finetune 2U1	1.8k	211	Python	58	An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.	2026-04-10
49	Awesome-LLM4AD Thinklab-SJTU	1.8k	105	N/A	1	A curated list of awesome LLM/VLM/VLA/World Model for Autonomous Driving(LLM4AD) resources (continually updated)	2026-05-01
50	awesome-yolo-object-detection coderonion	1.7k	234	N/A	0	🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.	2025-05-31
51	ComfyUI-Florence2 kijai	1.7k	141	Python	113	Inference Microsoft Florence2 VLM	2026-04-18
52	OpenAdapt OpenAdaptAI	1.6k	233	Python	1	Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models	2026-03-04
53	Pai-Megatron-Patch alibaba	1.6k	229	Python	109	The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.	2025-12-15
54	paddler intentee	1.5k	85	Rust	26	Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale 🏓🦙 Alternative to projects like llm-d, Docker Model Runner, etc., but with fewer moving parts and simple deployments built around the ggml ecosystem. Runs on CPU and GPU.	2026-05-04
55	react-native-executorch software-mansion	1.5k	73	C++	54	Declarative way to run AI models in React Native on device, powered by ExecuTorch.	2026-05-05
56	paperbanana llmsresearch	1.4k	215	Python	18	Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.	2026-04-22
57	unblink zapdos-labs	1.4k	163	Go	2	Camera monitoring with VLM	2026-03-09
58	MobileVLM Meituan-AutoML	1.3k	87	Python	33	Strong and Open Vision Language Assistant for Mobile Devices	2024-04-15
59	Awesome-Jailbreak-on-LLMs yueliu1999	1.3k	108	N/A	0	Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.	2026-03-30
60	xllm jd-opensource	1.3k	194	C++	77	A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators.	2026-04-30
61	awesome-vlm-architectures gokayfem	1.2k	55	Markdown	0	Famous Vision Language Models and Their Architectures	2026-01-11
62	miles radixark	1.2k	182	Python	73	Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.	2026-05-06
63	AeroSandbox peterdsharpe	1.2k	193	Jupyter Notebook	9	Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.	2026-04-14
64	kubeai kubeai-project	1.2k	126	Go	82	AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.	2026-03-31
65	vlm_arm TommyZihao	1.2k	195	Jupyter Notebook	6	Robotic arm + large model + multimodality = an embodied agent for human-robot collaboration	2026-02-28
66	CogAgent zai-org	1.2k	99	Python	27	An open-sourced end-to-end VLM-based GUI Agent	2025-04-04
67	vlms-zero-to-hero SkalskiP	1.2k	102	Jupyter Notebook	1	This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.	2025-01-23
68	joycaption fpgaminer	1.1k	68	Jupyter Notebook	38	JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.	2026-02-24
69	vlmcsd kkkgo	1.1k	310	C	0	🔑Portable open-source KMS Emulator in C	2024-01-06
70	Bunny BAAI-DCAI	1.1k	76	Python	24	A family of lightweight multimodal models.	2024-11-18
71	Gpt-Agreement-Payment DanOps-1	1.0k	456	Python	6	End-to-end protocol replay toolkit for ChatGPT Plus/Team/Pro subscription with from-scratch hCaptcha solver and empirical anti-fraud research	2026-05-05
72	AngelSlim Tencent	981	102	Python	46	Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.	2026-04-29
73	prismatic-vlms TRI-ML	975	1.1k	Python	19	A flexible and efficient codebase for training visually-conditioned language models (VLMs)	2024-07-04
74	streaming-vlm mit-han-lab	969	62	Python	27	StreamingVLM: Real-Time Understanding for Infinite Video Streams	2025-10-15
75	VisRAG OpenBMB	950	72	Python	0	Parsing-free RAG supported by VLMs	2025-12-07
76	GamingAgent lmgame-org	926	99	Python	7	[ICLR 2026] LLM/VLM gaming agents and model evaluation through games.	2025-11-16
77	mindnlp mindspore-lab	918	271	Python	60	MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless compatibility and acceleration.	2026-03-08
78	Awesome-Token-Compress daixiangzi	891	42	N/A	0	A paper list of some recent works about Token Compress for Vit and VLM	2026-04-14
79	gpt-assistant-android Skythinker616	877	123	Java	22	[New: agent mode] GPT assistant for Android, activated via volume keys for voice interaction, supporting features such as networking, taking photos, templates, parsing PDF and Office documents, and agent mode.	2026-05-05
80	UniWorld PKU-YuanGroup	876	29	Python	14	UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	2025-12-23
81	UniPic SkyworkAI	869	45	Python	25	Open-source SOTA multi-image editing model	2026-01-24
82	InternNav InternRobotics	838	114	Jupyter Notebook	5	InternRobotics' open platform for building generalized navigation foundation models.	2026-03-10
83	Awesome-Robotics-3D zubair-irshad	808	41	N/A	3	A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites	2025-12-17
84	awesome-llm-and-aigc coderonion	807	74	N/A	5	🚀🚀🚀A collection of some awesome public projects about Large Language Model(LLM), Vision Language Model(VLM), Vision Language Action(VLA), AI Generated Content(AIGC), the related Datasets and Applications.	2025-08-01
85	Awesome-Prompt-Adapter-Learning-for-VLMs-CLIP zhengli97	771	40	N/A	0	A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.	2026-04-16
86	awesome-data-llm OpenDataBox	770	68	N/A	1	Official Repository of "LLM × DATA" Survey Paper	2026-03-24
87	OpenWorldLib OpenDCAI	738	40	Python	5	Unified Codebase for Advanced World Models.	2026-05-02
88	Awesome-Spatial-Intelligence-in-VLM mll-lab-nu	736	40	N/A	3	A paper list for spatial reasoning	2026-01-19
89	NEO EvolvingLMMs-Lab	731	27	Python	0	NEO Series: Native Vision-Language Models from First Principles	2026-04-26
90	GeoChat mbzuai-oryx	713	62	Python	46	[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing	2024-11-28
91	LightCompress ModelTC	712	79	Python	42	[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.	2026-04-01
92	PytorchNetHub bobo0810	708	156	Jupyter Notebook	0	Project annotations + paper reproduction + algorithm competitions + PyTorch practice + LeetCode + VLM pre-training	2025-05-12
93	dingo MigoXLab	693	71	Python	3	Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool	2026-04-30
94	vlmaps vlmaps	676	79	Python	14	[ICRA2023] Implementation of Visual Language Maps for Robot Navigation	2024-07-09
95	OmniInfer omnimind-ai	676	4	Python	2	Easy, fast, and private LLM & VLM inference for every device	2026-05-01
96	OmniLottie OpenVGLab	658	38	Python	5	[CVPR 2026🔥] 🧑‍🎨 OmniLottie, an open-sourced multi-modal instructed vector animation generator that produces Lottie JSONs.	2026-04-06
97	MiMo-VL XiaomiMiMo	640	31	N/A	6	MiMo-VL	2025-08-21
98	VLM2Vec TIGER-AI-Lab	639	60	Python	27	This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]	2026-04-28
99	localGPT-Vision PromtEngineer	630	130	Python	27	Chat with your documents using Vision Language Models. This repo implements an End to End RAG pipeline with both local and proprietary VLMs	2025-07-26
100	video-search-and-summarization NVIDIA-AI-Blueprints	628	230	Python	5	Suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications.	2026-05-06
