Top 100 · RLHF / alignment
100 repositories ranked under the RLHF / alignment topic
| # | Repository | Stars | Forks | Language | Issues | Description | Last Commit |
|---|---|---|---|---|---|---|---|
| 1 | hiyouga/LlamaFactory | 70.9k | 8.7k | Python | 953 | Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) | 2026-05-03 |
| 2 | LAION-AI/Open-Assistant | 37.4k | 3.3k | Python | 227 | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | 2024-08-17 |
| 3 | RUCAIBox/LLMSurvey | 12.2k | 941 | Python | 24 | The official GitHub page for the survey paper "A Survey of Large Language Models". | 2025-03-11 |
| 4 | OpenRLHF/OpenRLHF | 9.4k | 934 | Python | 295 | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL) | 2026-05-05 |
| 5 | lucidrains/PaLM-rlhf-pytorch | 7.9k | 679 | Python | 17 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM | 2025-10-11 |
| 6 | InternLM/InternLM | 7.2k | 508 | Python | 8 | Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). | 2025-10-30 |
| 7 | ymcui/Chinese-LLaMA-Alpaca-2 | 7.1k | 567 | Python | 1 | Chinese LLaMA-2 & Alpaca-2 LLMs (phase-two project) with 64K long-context models | 2026-04-19 |
| 8 | huggingface/alignment-handbook | 5.6k | 488 | Python | 92 | Robust recipes to align language models with human and AI preferences | 2026-04-08 |
| 9 | shibing624/MedicalGPT | 5.3k | 742 | Python | 21 | MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline, covering incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO. | 2026-04-28 |
| 10 | Gen-Verse/OpenClaw-RL | 5.2k | 561 | Python | 47 | OpenClaw-RL: Train any agent simply by talking | 2026-04-30 |
| 11 | argilla-io/argilla | 5.0k | 484 | Python | 2 | Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets | 2026-04-27 |
| 12 | transformerlab/transformerlab-app | 4.9k | 510 | Python | 23 | The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters. | 2026-05-05 |
| 13 | Kiln-AI/Kiln | 4.8k | 361 | Python | 22 | Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more. | 2026-05-06 |
| 14 | CarperAI/trlx | 4.7k | 484 | Python | 86 | A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) | 2024-01-08 |
| 15 | PKU-Alignment/align-anything | 4.7k | 506 | Python | 29 | Align Anything: Training All-modality Model with Feedback | 2025-11-27 |
| 16 | opendilab/awesome-RLHF | 4.4k | 252 | N/A | 0 | A curated list of reinforcement learning with human feedback resources (continually updated) | 2025-12-09 |
| 17 | hiyouga/ChatGLM-Efficient-Tuning | 3.7k | 464 | Python | 0 | Fine-tuning ChatGLM-6B with PEFT | 2023-10-12 |
| 18 | Docta-ai/docta | 3.5k | 256 | Python | 0 | A Doctor for your data | 2025-01-14 |
| 19 | argilla-io/distilabel | 3.2k | 243 | Python | 79 | Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers. | 2026-04-27 |
| 20 | alibaba/ROLL | 3.1k | 275 | Python | 87 | An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models | 2026-05-05 |
| 21 | qibin0506/Cortex | 2.6k | 209 | Python | 8 | Building an LLM from scratch: a complete hands-on walkthrough from pretraining to RLHF | 2026-03-19 |
| 22 | HarderThenHarder/transformers_tasks | 2.4k | 401 | Jupyter Notebook | 59 | ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc. | 2023-09-29 |
| 23 | tatsu-lab/alpaca_eval | 2.0k | 308 | Jupyter Notebook | 19 | An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. | 2025-08-09 |
| 24 | natolambert/rlhf-book | 1.9k | 187 | Python | 4 | Textbook on reinforcement learning from human feedback | 2026-05-05 |
| 25 | anthropics/hh-rlhf | 1.8k | 158 | N/A | 0 | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" | 2025-06-17 |
| 26 | charent/ChatLM-mini-Chinese | 1.7k | 195 | Python | 9 | A 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B), open-sourcing the complete pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports downstream SFT fine-tuning, with a triple-extraction fine-tuning example included. | 2024-04-20 |
| 27 | zai-org/ImageReward | 1.7k | 92 | Python | 58 | [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation | 2025-10-29 |
| 28 | THUDM/WebGLM | 1.6k | 134 | Python | 51 | WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023) | 2025-03-25 |
| 29 | PKU-Alignment/safe-rlhf | 1.6k | 132 | Python | 16 | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | 2025-11-24 |
| 30 | RLHFlow/RLHF-Reward-Modeling | 1.5k | 109 | Python | 19 | Recipes to train reward model for RLHF. | 2025-04-24 |
| 31 | OpenLMLab/MOSS-RLHF | 1.4k | 105 | Python | 39 | Secrets of RLHF in Large Language Models Part I: PPO | 2024-03-03 |
| 32 | thinkwee/AgentsMeetRL | 1.2k | 48 | HTML | 0 | Awesome List for Agentic RL | 2026-04-28 |
| 33 | xtreme1-io/xtreme1 | 1.2k | 204 | TypeScript | 41 | Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM. | 2025-07-15 |
| 34 | whwangovo/pyre-code | 972 | 84 | Python | 0 | A self-hosted ML coding practice platform. 68 problems from ReLU to flow matching — attention, training, RLHF, diffusion, and more. Instant feedback in the browser. | 2026-05-04 |
| 35 | princeton-nlp/SimPO | 954 | 76 | Python | 25 | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | 2025-02-16 |
| 36 | ContextualAI/HALOs | 904 | 51 | Python | 7 | A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). | 2025-09-30 |
| 37 | tatsu-lab/alpaca_farm | 845 | 64 | Python | 6 | A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. | 2024-07-01 |
| 38 | GaryYufei/AlignLLMHumanSurvey | 741 | 30 | N/A | 0 | Aligning Large Language Models with Human: A Survey | 2023-09-11 |
| 39 | allenai/reward-bench | 713 | 96 | Python | 1 | RewardBench: the first evaluation tool for reward models. | 2026-02-16 |
| 40 | tingaicompass/AI-Compass | 713 | 88 | Python | 1 | AI-Compass charts a course for the community through the ocean of AI technology: whether you are a beginner or an advanced developer, you will find paths into every major AI direction here. It aims to help developers systematically grasp core AI concepts, mainstream techniques, and frontier trends, and to master the full journey from theory to deployment through practice. | 2026-04-30 |
| 41 | datawhalechina/diy-llm | 699 | 78 | Jupyter Notebook | 2 | 🎓 A systematic course on building LLMs; 🛠️ covers pretraining data engineering, tokenizers, Transformers, MoE, GPU programming (CUDA/Triton), distributed training, scaling laws, inference optimization, and alignment (SFT/RLHF/GRPO); 🚀 six progressive assignments plus code-driven lessons to build a full-stack understanding of LLMs | 2026-04-24 |
| 42 | jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese | 657 | 68 | Python | 17 | Cornucopia: open-source, commercially usable Chinese financial LLMs, plus an efficient, lightweight training framework for this vertical domain (pretraining, SFT, RLHF, quantization, etc.) | 2023-06-30 |
| 43 | sail-sg/oat | 652 | 64 | Python | 6 | 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. | 2026-01-29 |
| 44 | jianzhnie/LLamaTuner | 621 | 64 | Python | 18 | Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon) Efficient quantized training and deployment of large models. | 2025-01-24 |
| 45 | agentscope-ai/Trinity-RFT | 619 | 66 | Python | 31 | Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM). | 2026-04-28 |
| 46 | walkinglabs/hands-on-modern-rl | 590 | 32 | Python | 1 | 🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems. | 2026-05-05 |
| 47 | uclaml/SPPO | 587 | 48 | Python | 14 | The official implementation of Self-Play Preference Optimization (SPPO) | 2025-01-23 |
| 48 | agentscope-ai/OpenJudge | 587 | 48 | Python | 9 | OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards | 2026-04-30 |
| 49 | voidful/TextRL | 564 | 61 | Python | 3 | Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL) | 2026-04-23 |
| 50 | RLHFlow/Online-RLHF | 544 | 48 | Python | 12 | A recipe for online RLHF and online iterative DPO. | 2024-12-28 |
| 51 | Gen-Verse/dLLM-RL | 499 | 40 | Python | 24 | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series. | 2026-01-28 |
| 52 | Gen-Verse/Open-AgentRL | 483 | 50 | Python | 6 | [ICML 2026] RLAnything & DemyAgent: General and scalable agentic RL algorithms across terminal, GUI, SWE, and tool-call settings | 2026-02-27 |
| 53 | mindspore-lab/step_into_llm | 478 | 127 | Jupyter Notebook | 27 | MindSpore online courses: Step into LLM | 2025-12-22 |
| 54 | Junvate/LLM-Algorithm-Intern-Guide | 474 | 12 | N/A | 0 | 🚀 Interview notes for 2026 LLM algorithm internships: DeepSeek/Qwen technical report breakdowns, hand-implementing PPO/RoPE/Transformer, RLHF fundamentals, and standard interview questions; continuously updated | 2026-03-28 |
| 55 | conceptofmind/LaMDA-rlhf-pytorch | 469 | 73 | Python | 6 | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT. | 2024-02-24 |
| 56 | Joyce94/LLM-RLHF-Tuning | 454 | 24 | Python | 3 | LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) | 2023-10-11 |
| 57 | chi2liu/ABC-GRPO | 438 | 43 | Python | 0 | Code for Adaptive-Boundary-Clipping GRPO. arxiv.org/pdf/2601.03895 | 2026-03-26 |
| 58 | CambioML/pykoi | 411 | 45 | Jupyter Notebook | 2 | pykoi: Active learning in one unified interface | 2025-09-24 |
| 59 | zai-org/VisionReward | 400 | 13 | Python | 19 | [AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | 2025-03-26 |
| 60 | llava-rlhf/LLaVA-RLHF | 395 | 31 | Python | 4 | Aligning LMMs with Factually Augmented RLHF | 2023-11-01 |
| 61 | glgh/awesome-llm-human-preference-datasets | 390 | 18 | N/A | 0 | A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. | 2023-10-04 |
| 62 | sinanuozdemir/quick-start-guide-to-llms | 380 | 209 | Jupyter Notebook | 1 | The Official Repo for "Quick Start Guide to Large Language Models" | 2025-10-07 |
| 63 | TUDB-Labs/mLoRA | 376 | 66 | Python | 12 | An Efficient "Factory" to Build Multiple LoRA Adapters | 2025-02-13 |
| 64 | Goekdeniz-Guelmez/mlx-lm-lora | 368 | 44 | Python | 1 | Train Large Language Models on MLX. | 2026-04-23 |
| 65 | agi-templar/Stable-Alignment | 355 | 18 | Python | 4 | Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society". | 2023-06-18 |
| 66 | WangRongsheng/MedQA-ChatGLM | 338 | 50 | Python | 3 | 🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our sights are set beyond medical QA | 2023-09-02 |
| 67 | openpsi-project/ReaLHF | 336 | 22 | Python | 0 | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | 2025-04-24 |
| 68 | mihirp1998/VADER | 314 | 15 | Python | 11 | Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc. | 2025-03-12 |
| 69 | RLHF-V/RLHF-V | 307 | 9 | Python | 2 | [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | 2024-09-11 |
| 70 | HarderThenHarder/RLLoggingBoard | 295 | 9 | Python | 0 | A visualization tool for deeper understanding and easier debugging of RLHF training. | 2025-02-20 |
| 71 | LYL1015/JarvisEvo | 294 | 8 | Python | 1 | [CVPR 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization | 2026-02-22 |
| 72 | sunzeyeah/RLHF | 287 | 35 | Python | 3 | Implementation of Chinese ChatGPT | 2023-11-20 |
| 73 | allenai/FineGrainedRLHF | 284 | 24 | Python | 2 |  | 2025-01-06 |
| 74 | jianzhnie/Open-R1 | 277 | 54 | Python | 0 | The open source implementation of DeepSeek-R1. | 2025-03-10 |
| 75 | SupritYoung/RLHF-Label-Tool | 255 | 21 | Python | 2 | A tool for manually annotating and ranking response data in the RLHF stage. | 2023-08-01 |
| 76 | ash80/RLHF_in_notebooks | 245 | 29 | Jupyter Notebook | 0 | RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks | 2025-06-20 |
| 77 | jasonvanf/llama-trl | 239 | 24 | Python | 7 | LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA | 2025-08-17 |
| 78 | TYH-labs/unsloth-buddy | 234 | 13 | Python | 0 | Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform. | 2026-05-05 |
| 79 | haoliuhl/chain-of-hindsight | 229 | 17 | Python | 3 | Simple next-token-prediction for RLHF | 2023-09-30 |
| 80 | HumanSignal/RLHF | 226 | 45 | Jupyter Notebook | 3 | Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI models | 2023-07-24 |
| 81 | ethanyanjiali/minChatGPT | 226 | 35 | Python | 3 | A minimum example of aligning language models with RLHF similar to ChatGPT | 2023-09-26 |
| 82 | shufangxun/LLaVA-MoD | 227 | 16 | Python | 3 | [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation | 2025-03-31 |
| 83 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | 221 | 18 | Python | 15 | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna | 2024-05-20 |
| 84 | dengxianghua888-ops/ecoalign-forge | 206 | 12 | Python | 0 | Multi-Agent DPO Data Synthesis Factory: an automated framework for synthesizing multi-agent preference training data (red-team attack → multi-persona review → final adjudication → DPO preference pairs) | 2026-04-11 |
| 85 | YangLing0818/IterComp | 205 | 11 | Python | 5 | [ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | 2025-02-19 |
| 86 | liziniu/ReMax | 202 | 15 | Python | 1 | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" | 2023-12-16 |
| 87 | mengdi-li/awesome-RLAIF | 202 | 7 | N/A | 0 | A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) | 2025-08-06 |
| 88 | Kwai-YuanQi/MM-RLHF | 200 | 9 | Python | 5 | The Next Step Forward in Multimodal LLM Alignment | 2025-05-01 |
| 89 | TideDra/VL-RLHF | 199 | 8 | Python | 11 | An RLHF Infrastructure for Vision-Language Models | 2024-11-15 |
| 90 | Miraclemarvel55/ChatGLM-RLHF | 198 | 26 | Python | 3 | Modify ChatGLM output with RLHF alone, raising or lowering the probability of target outputs | 2023-05-23 |
| 91 | vwxyzjn/lm-human-preference-details | 197 | 12 | Python | 0 | RLHF implementation details of OAI's 2019 codebase | 2024-01-14 |
| 92 | PKU-Alignment/aligner | 193 | 10 | Python | 0 | [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct | 2025-01-16 |
| 93 | raghavc/LLM-RLHF-Tuning-with-PPO-and-DPO | 190 | 19 | Python | 2 | Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models. | 2026-02-24 |
| 94 | Jerry-XDL/AIDoctor | 188 | 16 | Python | 0 | AIDoctor: training a medical GPT model with the ChatGPT training pipeline; implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preference Optimization) | 2025-03-11 |
| 95 | tomekkorbak/pretraining-with-human-feedback | 181 | 14 | Python | 6 | Code accompanying the paper "Pretraining Language Models with Human Preferences" | 2024-02-13 |
| 96 | PKU-Alignment/beavertails | 181 | 6 | Makefile | 3 | BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). | 2023-10-27 |
| 97 | hyunwoongko/nanoRLHF | 180 | 16 | Python | 0 | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | 2026-01-23 |
| 98 | xrsrke/instructGOOSE | 174 | 21 | Jupyter Notebook | 3 | Implementation of Reinforcement Learning from Human Feedback (RLHF) | 2023-04-07 |
| 99 | lucidrains/llama-qrlhf | 170 | 8 | Python | 0 | Implementation of the Llama architecture with RLHF + Q-learning | 2025-02-01 |
| 100 | argilla-io/notus | 168 | 14 | Python | 1 | Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach | 2024-01-15 |
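
A table like this can be regenerated from GitHub's public search API, which exposes every column above (`stargazers_count`, `forks_count`, `language`, `open_issues_count`, `description`, `pushed_at`). Below is a minimal sketch under stated assumptions: the site's actual query terms and ranking pipeline are not published, so the search string `rlhf OR alignment` and the star-based sort are stand-ins, not the page's real implementation.

```python
# Minimal sketch: approximate an RLHF/alignment repo ranking via the
# GitHub search API. The query string is an assumption about how this
# page might be built, not its actual pipeline.
import requests

def top_rlhf_repos(n: int = 100) -> list[dict]:
    """Fetch up to n repositories matching RLHF/alignment, sorted by stars."""
    rows: list[dict] = []
    page = 1
    while len(rows) < n:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={
                # Assumed query; match keywords in name/description/topics.
                "q": "rlhf OR alignment in:name,description,topics",
                "sort": "stars",   # rank by stargazer count, descending
                "order": "desc",
                "per_page": min(100, n - len(rows)),
                "page": page,
            },
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["items"]
        if not items:
            break
        for repo in items:
            rows.append({
                "repo": repo["full_name"],
                "stars": repo["stargazers_count"],
                "forks": repo["forks_count"],
                "language": repo["language"] or "N/A",
                "issues": repo["open_issues_count"],
                "description": repo["description"] or "",
                "last_commit": repo["pushed_at"][:10],  # YYYY-MM-DD
            })
        page += 1
    return rows[:n]

if __name__ == "__main__":
    for i, r in enumerate(top_rlhf_repos(10), start=1):
        print(f"| {i} | {r['repo']} | {r['stars']} | {r['description'][:60]} |")
```

Two caveats if you run this: GitHub's search endpoint returns at most 1,000 results per query and rate-limits unauthenticated requests, and `open_issues_count` includes pull requests, so the numbers may differ slightly from the Issues column above.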