GitHub Ranking
2026-05-06

⚖️ Top 100 · RLHF / alignment

100 repositories sorted by RLHF / alignment

📦 100 repos 🕐 2026-05-06
# Repository Stars Forks Language Issues Description Last Commit
1 LlamaFactory hiyouga 70.9k 8.7k Python 953 Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) 2026-05-03
2 Open-Assistant LAION-AI 37.4k 3.3k Python 227 OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. 2024-08-17
3 LLMSurvey RUCAIBox 12.2k 941 Python 24 The official GitHub page for the survey paper "A Survey of Large Language Models". 2025-03-11
4 OpenRLHF OpenRLHF 9.4k 934 Python 295 An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL) 2026-05-05
5 PaLM-rlhf-pytorch lucidrains 7.9k 679 Python 17 Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM 2025-10-11
6 InternLM InternLM 7.2k 508 Python 8 Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). 2025-10-30
7 Chinese-LLaMA-Alpaca-2 ymcui 7.1k 567 Python 1 Chinese LLaMA-2 & Alpaca-2 LLMs (phase-two project) with 64K long-context models 2026-04-19
8 alignment-handbook huggingface 5.6k 488 Python 92 Robust recipes to align language models with human and AI preferences 2026-04-08
9 MedicalGPT shibing624 5.3k 742 Python 21 MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Implements incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO. 2026-04-28
10 OpenClaw-RL Gen-Verse 5.2k 561 Python 47 OpenClaw-RL: Train any agent simply by talking 2026-04-30
11 argilla argilla-io 5.0k 484 Python 2 Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets 2026-04-27
12 transformerlab-app transformerlab 4.9k 510 Python 23 The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters. 2026-05-05
13 Kiln Kiln-AI 4.8k 361 Python 22 Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more. 2026-05-06
14 trlx CarperAI 4.7k 484 Python 86 A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) 2024-01-08
15 align-anything PKU-Alignment 4.7k 506 Python 29 Align Anything: Training All-modality Model with Feedback 2025-11-27
16 awesome-RLHF opendilab 4.4k 252 N/A 0 A curated list of reinforcement learning with human feedback resources (continually updated) 2025-12-09
17 ChatGLM-Efficient-Tuning hiyouga 3.7k 464 Python 0 Fine-tuning ChatGLM-6B with PEFT 2023-10-12
18 docta Docta-ai 3.5k 256 Python 0 A Doctor for your data 2025-01-14
19 distilabel argilla-io 3.2k 243 Python 79 Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers. 2026-04-27
20 ROLL alibaba 3.1k 275 Python 87 An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models 2026-05-05
21 Cortex qibin0506 2.6k 209 Python 8 Building an LLM from scratch: a complete walkthrough from pretraining to RLHF 2026-03-19
22 transformers_tasks HarderThenHarder 2.4k 401 Jupyter Notebook 59 ⭐️ NLP algorithms with the transformers lib, supporting text classification, text generation, information extraction, text matching, RLHF, SFT, etc. 2023-09-29
23 alpaca_eval tatsu-lab 2.0k 308 Jupyter Notebook 19 An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. 2025-08-09
24 rlhf-book natolambert 1.9k 187 Python 4 Textbook on reinforcement learning from human feedback 2026-05-05
25 hh-rlhf anthropics 1.8k 158 N/A 0 Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" 2025-06-17
26 ChatLM-mini-Chinese charent 1.7k 195 Python 9 ChatLM-Chinese-0.2B: a 0.2B-parameter Chinese chat model, open-sourcing the full pipeline, including dataset sourcing, data cleaning, tokenizer training, pretraining, SFT instruction tuning, and RLHF optimization. Supports downstream SFT fine-tuning, with a triple-extraction fine-tuning example. 2024-04-20
27 ImageReward zai-org 1.7k 92 Python 58 [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation 2025-10-29
28 WebGLM THUDM 1.6k 134 Python 51 WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023) 2025-03-25
29 safe-rlhf PKU-Alignment 1.6k 132 Python 16 Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback 2025-11-24
30 RLHF-Reward-Modeling RLHFlow 1.5k 109 Python 19 Recipes to train reward model for RLHF. 2025-04-24
31 MOSS-RLHF OpenLMLab 1.4k 105 Python 39 Secrets of RLHF in Large Language Models Part I: PPO 2024-03-03
32 AgentsMeetRL thinkwee 1.2k 48 HTML 0 Awesome list for agentic RL 2026-04-28
33 xtreme1 xtreme1-io 1.2k 204 TypeScript 41 Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM. 2025-07-15
34 pyre-code whwangovo 972 84 Python 0 A self-hosted ML coding practice platform. 68 problems from ReLU to flow matching: attention, training, RLHF, diffusion, and more. Instant feedback in the browser. 2026-05-04
35 SimPO princeton-nlp 954 76 Python 25 [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward 2025-02-16
36 HALOs ContextualAI 904 51 Python 7 A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). 2025-09-30
37 alpaca_farm tatsu-lab 845 64 Python 6 A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. 2024-07-01
38 AlignLLMHumanSurvey GaryYufei 741 30 N/A 0 Aligning Large Language Models with Human: A Survey 2023-09-11
39 reward-bench allenai 713 96 Python 1 RewardBench: the first evaluation tool for reward models. 2026-02-16
40 AI-Compass tingaicompass 713 88 Python 1 AI-Compass: a guide for navigating the sea of AI technology, helping both beginners and advanced developers find paths into every major AI direction, systematically covering core concepts, mainstream technologies, and frontier trends, from theory to deployment. 2026-04-30
41 diy-llm datawhalechina 699 78 Jupyter Notebook 2 🎓 A systematic LLM-building course | 🛠️ Covers pretraining data engineering, tokenizers, Transformers, MoE, GPU programming (CUDA/Triton), distributed training, scaling laws, inference optimization, and alignment (SFT/RLHF/GRPO) | 🚀 Six progressive, code-driven assignments building full-stack LLM understanding 2026-04-24
42 Cornucopia-LLaMA-Fin-Chinese jerry1993-tech 657 68 Python 17 Cornucopia: open-source, commercially usable Chinese financial LLMs, with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.) 2023-06-30
43 oat sail-sg 652 64 Python 6 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. 2026-01-29
44 LLamaTuner jianzhnie 621 64 Python 18 Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA2, LLaMA3, Qwen, Baichuan, GLM, Falcon), with efficient quantized training and deployment. 2025-01-24
45 Trinity-RFT agentscope-ai 619 66 Python 31 Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM). 2026-04-28
46 hands-on-modern-rl walkinglabs 590 32 Python 1 🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems. 2026-05-05
47 SPPO uclaml 587 48 Python 14 The official implementation of Self-Play Preference Optimization (SPPO) 2025-01-23
48 OpenJudge agentscope-ai 587 48 Python 9 OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards 2026-04-30
49 TextRL voidful 564 61 Python 3 Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL) 2026-04-23
50 Online-RLHF RLHFlow 544 48 Python 12 A recipe for online RLHF and online iterative DPO. 2024-12-28
51 dLLM-RL Gen-Verse 499 40 Python 24 [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series. 2026-01-28
52 Open-AgentRL Gen-Verse 483 50 Python 6 [ICML 2026] RLAnything & DemyAgent: General and scalable agentic RL algorithms across terminal, GUI, SWE, and tool-call settings 2026-02-27
53 step_into_llm mindspore-lab 478 127 Jupyter Notebook 27 MindSpore online courses: Step into LLM 2025-12-22
54 LLM-Algorithm-Intern-Guide Junvate 474 12 N/A 0 🚀 LLM algorithm internship interview notes for the class of 2026 | Includes DeepSeek/Qwen technical report walkthroughs, from-scratch PPO/RoPE/Transformer implementations, RLHF fundamentals and standard interview questions | Continuously updated... 2026-03-28
55 LaMDA-rlhf-pytorch conceptofmind 469 73 Python 6 Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT. 2024-02-24
56 LLM-RLHF-Tuning Joyce94 454 24 Python 3 LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) 2023-10-11
57 ABC-GRPO chi2liu 438 43 Python 0 Code for Adaptive-Boundary-Clipping GRPO. arxiv.org/pdf/2601.03895 2026-03-26
58 pykoi CambioML 411 45 Jupyter Notebook 2 pykoi: Active learning in one unified interface 2025-09-24
59 VisionReward zai-org 400 13 Python 19 [AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation 2025-03-26
60 LLaVA-RLHF llava-rlhf 395 31 Python 4 Aligning LMMs with Factually Augmented RLHF 2023-11-01
61 awesome-llm-human-preference-datasets glgh 390 18 N/A 0 A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. 2023-10-04
62 quick-start-guide-to-llms sinanuozdemir 380 209 Jupyter Notebook 1 The Official Repo for "Quick Start Guide to Large Language Models" 2025-10-07
63 mLoRA TUDB-Labs 376 66 Python 12 An Efficient "Factory" to Build Multiple LoRA Adapters 2025-02-13
64 mlx-lm-lora Goekdeniz-Guelmez 368 44 Python 1 Train Large Language Models on MLX. 2026-04-23
65 Stable-Alignment agi-templar 355 18 Python 4 Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society". 2023-06-18
66 MedQA-ChatGLM WangRongsheng 338 50 Python 3 🛰️ LoRA, P-Tuning V2, Freeze, and RLHF fine-tuning of ChatGLM on real medical dialogue data; our sights are set beyond medical Q&A 2023-09-02
67 ReaLHF openpsi-project 336 22 Python 0 Super-Efficient RLHF Training of LLMs with Parameter Reallocation 2025-04-24
68 VADER mihirp1998 314 15 Python 11 Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc. 2025-03-12
69 RLHF-V RLHF-V 307 9 Python 2 [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback 2024-09-11
70 RLLoggingBoard HarderThenHarder 295 9 Python 0 A visualization tool for deeper understanding and easier debugging of RLHF training. 2025-02-20
71 JarvisEvo LYL1015 294 8 Python 1 [CVPR'26] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization 2026-02-22
72 RLHF sunzeyeah 287 35 Python 3 Implementation of Chinese ChatGPT 2023-11-20
73 FineGrainedRLHF allenai 284 24 Python 2 2025-01-06
74 Open-R1 jianzhnie 277 54 Python 0 The open source implementation of DeepSeek-R1. 2025-03-10
75 RLHF-Label-Tool SupritYoung 255 21 Python 2 A tool for manual response data annotation and ranking in the RLHF stage. 2023-08-01
76 RLHF_in_notebooks ash80 245 29 Jupyter Notebook 0 RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks 2025-06-20
77 llama-trl jasonvanf 239 24 Python 7 LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA 2025-08-17
78 unsloth-buddy TYH-labs 234 13 Python 0 Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform. 2026-05-05
79 chain-of-hindsight haoliuhl 229 17 Python 3 Simple next-token-prediction for RLHF 2023-09-30
80 RLHF HumanSignal 226 45 Jupyter Notebook 3 Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI models 2023-07-24
81 minChatGPT ethanyanjiali 226 35 Python 3 A minimum example of aligning language models with RLHF similar to ChatGPT 2023-09-26
82 LLaVA-MoD shufangxun 227 16 Python 3 [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation 2025-03-31
83 Vicuna-LoRA-RLHF-PyTorch jackaduma 221 18 Python 15 A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna 2024-05-20
84 ecoalign-forge dengxianghua888-ops 206 12 Python 0 Multi-Agent DPO Data Synthesis Factory: a framework for automatically synthesizing multi-agent preference training data | red-team attack → multi-persona review → final adjudication → DPO preference pairs 2026-04-11
85 IterComp YangLing0818 205 11 Python 5 [ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation 2025-02-19
86 ReMax liziniu 202 15 Python 1 Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" 2023-12-16
87 awesome-RLAIF mengdi-li 202 7 N/A 0 A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) 2025-08-06
88 MM-RLHF Kwai-YuanQi 200 9 Python 5 The Next Step Forward in Multimodal LLM Alignment 2025-05-01
89 VL-RLHF TideDra 199 8 Python 11 An RLHF Infrastructure for Vision-Language Models 2024-11-15
90 ChatGLM-RLHF Miraclemarvel55 198 26 Python 3 Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs (modify ChatGLM output with RLHF only) 2023-05-23
91 lm-human-preference-details vwxyzjn 197 12 Python 0 RLHF implementation details of OAI's 2019 codebase 2024-01-14
92 aligner PKU-Alignment 193 10 Python 0 [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct 2025-01-16
93 LLM-RLHF-Tuning-with-PPO-and-DPO raghavc 190 19 Python 2 Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models. 2026-02-24
94 AIDoctor Jerry-XDL 188 16 Python 0 AIDoctor training medical GPT model with ChatGPT training pipeline, implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preferenc… 2025-03-11
95 pretraining-with-human-feedback tomekkorbak 181 14 Python 6 Code accompanying the paper "Pretraining Language Models with Human Preferences" 2024-02-13
96 beavertails PKU-Alignment 181 6 Makefile 3 BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). 2023-10-27
97 nanoRLHF hyunwoongko 180 16 Python 0 nanoRLHF: from-scratch journey into how LLMs and RLHF really work. 2026-01-23
98 instructGOOSE xrsrke 174 21 Jupyter Notebook 3 Implementation of Reinforcement Learning from Human Feedback (RLHF) 2023-04-07
99 llama-qrlhf lucidrains 170 8 Python 0 Implementation of the Llama architecture with RLHF + Q-learning 2025-02-01
100 notus argilla-io 168 14 Python 1 Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach 2024-01-15