Top 100 · RLHF / alignment
100 repositories ranked under the RLHF / alignment topic
| # | Repository | Stars | Forks | Language | Issues | Description | Last Commit |
|---|---|---|---|---|---|---|---|
| 1 | hiyouga/LlamaFactory | 70.9k | 8.7k | Python | 953 | Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) | 2026-05-03 |
| 2 | LAION-AI/Open-Assistant | 37.4k | 3.3k | Python | 227 | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | 2024-08-17 |
| 3 | RUCAIBox/LLMSurvey | 12.2k | 941 | Python | 24 | The official GitHub page for the survey paper "A Survey of Large Language Models". | 2025-03-11 |
| 4 | OpenRLHF/OpenRLHF | 9.4k | 934 | Python | 295 | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL) | 2026-05-05 |
| 5 | lucidrains/PaLM-rlhf-pytorch | 7.9k | 679 | Python | 17 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM | 2025-10-11 |
| 6 | InternLM/InternLM | 7.2k | 508 | Python | 8 | Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). | 2025-10-30 |
| 7 | ymcui/Chinese-LLaMA-Alpaca-2 | 7.1k | 567 | Python | 1 | Chinese LLaMA-2 & Alpaca-2 LLMs (phase-two project) with 64K long-context models | 2026-04-19 |
| 8 | huggingface/alignment-handbook | 5.6k | 488 | Python | 92 | Robust recipes to align language models with human and AI preferences | 2026-04-08 |
| 9 | shibing624/MedicalGPT | 5.3k | 742 | Python | 21 | MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline, covering incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO. | 2026-04-28 |
| 10 | Gen-Verse/OpenClaw-RL | 5.2k | 561 | Python | 47 | OpenClaw-RL: Train any agent simply by talking | 2026-04-30 |
| 11 | argilla-io/argilla | 5.0k | 484 | Python | 2 | Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets | 2026-04-27 |
| 12 | transformerlab/transformerlab-app | 4.9k | 510 | Python | 23 | The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters. | 2026-05-05 |
| 13 | Kiln-AI/Kiln | 4.8k | 361 | Python | 22 | Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more. | 2026-05-06 |
| 14 | CarperAI/trlx | 4.7k | 484 | Python | 86 | A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) | 2024-01-08 |
| 15 | PKU-Alignment/align-anything | 4.7k | 506 | Python | 29 | Align Anything: Training All-modality Model with Feedback | 2025-11-27 |
| 16 | opendilab/awesome-RLHF | 4.4k | 252 | N/A | 0 | A curated list of reinforcement learning with human feedback resources (continually updated) | 2025-12-09 |
| 17 | hiyouga/ChatGLM-Efficient-Tuning | 3.7k | 464 | Python | 0 | Fine-tuning ChatGLM-6B with PEFT | 2023-10-12 |
| 18 | Docta-ai/docta | 3.5k | 256 | Python | 0 | A Doctor for your data | 2025-01-14 |
| 19 | argilla-io/distilabel | 3.2k | 243 | Python | 79 | Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers. | 2026-04-27 |
| 20 | alibaba/ROLL | 3.1k | 275 | Python | 87 | An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models | 2026-05-05 |
| 21 | qibin0506/Cortex | 2.6k | 209 | Python | 8 | Building an LLM from scratch: a complete hands-on walkthrough from pretraining to RLHF | 2026-03-19 |
| 22 | HarderThenHarder/transformers_tasks | 2.4k | 401 | Jupyter Notebook | 59 | ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc. | 2023-09-29 |
| 23 | tatsu-lab/alpaca_eval | 2.0k | 308 | Jupyter Notebook | 19 | An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. | 2025-08-09 |
| 24 | natolambert/rlhf-book | 1.9k | 187 | Python | 4 | Textbook on reinforcement learning from human feedback | 2026-05-05 |
| 25 | anthropics/hh-rlhf | 1.8k | 158 | N/A | 0 | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" | 2025-06-17 |
| 26 | charent/ChatLM-mini-Chinese | 1.7k | 195 | Python | 9 | A 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B), open-sourcing the complete pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports downstream SFT fine-tuning, with a triple-extraction fine-tuning example included. | 2024-04-20 |
| 27 | zai-org/ImageReward | 1.7k | 92 | Python | 58 | [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation | 2025-10-29 |
| 28 | THUDM/WebGLM | 1.6k | 134 | Python | 51 | WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023) | 2025-03-25 |
| 29 | PKU-Alignment/safe-rlhf | 1.6k | 132 | Python | 16 | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | 2025-11-24 |
| 30 | RLHFlow/RLHF-Reward-Modeling | 1.5k | 109 | Python | 19 | Recipes to train reward model for RLHF. | 2025-04-24 |
| 31 | OpenLMLab/MOSS-RLHF | 1.4k | 105 | Python | 39 | Secrets of RLHF in Large Language Models Part I: PPO | 2024-03-03 |
| 32 | thinkwee/AgentsMeetRL | 1.2k | 48 | HTML | 0 | Awesome List for Agentic RL | 2026-04-28 |
| 33 | xtreme1-io/xtreme1 | 1.2k | 204 | TypeScript | 41 | Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM. | 2025-07-15 |
| 34 | whwangovo/pyre-code | 972 | 84 | Python | 0 | A self-hosted ML coding practice platform. 68 problems from ReLU to flow matching — attention, training, RLHF, diffusion, and more. Instant feedback in the browser. | 2026-05-04 |
| 35 | princeton-nlp/SimPO | 954 | 76 | Python | 25 | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | 2025-02-16 |
| 36 | ContextualAI/HALOs | 904 | 51 | Python | 7 | A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). | 2025-09-30 |
| 37 | tatsu-lab/alpaca_farm | 845 | 64 | Python | 6 | A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. | 2024-07-01 |
| 38 | GaryYufei/AlignLLMHumanSurvey | 741 | 30 | N/A | 0 | Aligning Large Language Models with Human: A Survey | 2023-09-11 |
| 39 | allenai/reward-bench | 713 | 96 | Python | 1 | RewardBench: the first evaluation tool for reward models. | 2026-02-16 |
| 40 | tingaicompass/AI-Compass | 713 | 88 | Python | 1 | AI-Compass charts a course for the community through the ocean of AI technology: whether you are a beginner or an advanced developer, you will find paths into every major AI direction here. It aims to help developers systematically grasp core AI concepts, mainstream techniques, and frontier trends, and to master the full journey from theory to deployment through practice. | 2026-04-30 |
| 41 | datawhalechina/diy-llm | 699 | 78 | Jupyter Notebook | 2 | 🎓 A systematic course on building LLMs; 🛠️ covers pretraining data engineering, tokenizers, Transformers, MoE, GPU programming (CUDA/Triton), distributed training, scaling laws, inference optimization, and alignment (SFT/RLHF/GRPO); 🚀 six progressive assignments plus code-driven lessons to build a full-stack understanding of LLMs | 2026-04-24 |
| 42 | jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese | 657 | 68 | Python | 17 | Cornucopia: open-source, commercially usable Chinese financial LLMs, plus an efficient, lightweight training framework for this vertical domain (pretraining, SFT, RLHF, quantization, etc.) | 2023-06-30 |
| 43 | sail-sg/oat | 652 | 64 | Python | 6 | 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. | 2026-01-29 |
| 44 | jianzhnie/LLamaTuner | 621 | 64 | Python | 18 | Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon) Efficient quantized training and deployment of large models. | 2025-01-24 |
| 45 | agentscope-ai/Trinity-RFT | 619 | 66 | Python | 31 | Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM). | 2026-04-28 |
| 46 | walkinglabs/hands-on-modern-rl | 590 | 32 | Python | 1 | 🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems. | 2026-05-05 |
| 47 | uclaml/SPPO | 587 | 48 | Python | 14 | The official implementation of Self-Play Preference Optimization (SPPO) | 2025-01-23 |
| 48 | agentscope-ai/OpenJudge | 587 | 48 | Python | 9 | OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards | 2026-04-30 |
| 49 | voidful/TextRL | 564 | 61 | Python | 3 | Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL) | 2026-04-23 |
| 50 | RLHFlow/Online-RLHF | 544 | 48 | Python | 12 | A recipe for online RLHF and online iterative DPO. | 2024-12-28 |
| 51 | Gen-Verse/dLLM-RL | 499 | 40 | Python | 24 | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series. | 2026-01-28 |
| 52 | Gen-Verse/Open-AgentRL | 483 | 50 | Python | 6 | [ICML 2026] RLAnything & DemyAgent: General and scalable agentic RL algorithms across terminal, GUI, SWE, and tool-call settings | 2026-02-27 |
| 53 | mindspore-lab/step_into_llm | 478 | 127 | Jupyter Notebook | 27 | MindSpore online courses: Step into LLM | 2025-12-22 |
| 54 | Junvate/LLM-Algorithm-Intern-Guide | 474 | 12 | N/A | 0 | 🚀 Interview notes for 2026 LLM algorithm internships: DeepSeek/Qwen technical report breakdowns, hand-implementing PPO/RoPE/Transformer, RLHF fundamentals, and standard interview questions; continuously updated | 2026-03-28 |
| 55 | conceptofmind/LaMDA-rlhf-pytorch | 469 | 73 | Python | 6 | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT. | 2024-02-24 |
| 56 | Joyce94/LLM-RLHF-Tuning | 454 | 24 | Python | 3 | LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) | 2023-10-11 |
| 57 | chi2liu/ABC-GRPO | 438 | 43 | Python | 0 | Code for Adaptive-Boundary-Clipping GRPO. arxiv.org/pdf/2601.03895 | 2026-03-26 |
| 58 | CambioML/pykoi | 411 | 45 | Jupyter Notebook | 2 | pykoi: Active learning in one unified interface | 2025-09-24 |
| 59 | zai-org/VisionReward | 400 | 13 | Python | 19 | [AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | 2025-03-26 |
| 60 | llava-rlhf/LLaVA-RLHF | 395 | 31 | Python | 4 | Aligning LMMs with Factually Augmented RLHF | 2023-11-01 |
| 61 | glgh/awesome-llm-human-preference-datasets | 390 | 18 | N/A | 0 | A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. | 2023-10-04 |
| 62 | sinanuozdemir/quick-start-guide-to-llms | 380 | 209 | Jupyter Notebook | 1 | The Official Repo for "Quick Start Guide to Large Language Models" | 2025-10-07 |
| 63 | TUDB-Labs/mLoRA | 376 | 66 | Python | 12 | An Efficient "Factory" to Build Multiple LoRA Adapters | 2025-02-13 |
| 64 | Goekdeniz-Guelmez/mlx-lm-lora | 368 | 44 | Python | 1 | Train Large Language Models on MLX. | 2026-04-23 |
| 65 | agi-templar/Stable-Alignment | 355 | 18 | Python | 4 | Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society". | 2023-06-18 |
| 66 | WangRongsheng/MedQA-ChatGLM | 338 | 50 | Python | 3 | 🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our sights are set beyond medical QA | 2023-09-02 |
| 67 | openpsi-project/ReaLHF | 336 | 22 | Python | 0 | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | 2025-04-24 |
| 68 | mihirp1998/VADER | 314 | 15 | Python | 11 | Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc. | 2025-03-12 |
| 69 | RLHF-V/RLHF-V | 307 | 9 | Python | 2 | [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | 2024-09-11 |
| 70 | HarderThenHarder/RLLoggingBoard | 295 | 9 | Python | 0 | A visualization tool for deeper understanding and easier debugging of RLHF training. | 2025-02-20 |
| 71 | LYL1015/JarvisEvo | 294 | 8 | Python | 1 | [CVPR 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization | 2026-02-22 |
| 72 | sunzeyeah/RLHF | 287 | 35 | Python | 3 | Implementation of Chinese ChatGPT | 2023-11-20 |
| 73 | allenai/FineGrainedRLHF | 284 | 24 | Python | 2 |  | 2025-01-06 |
| 74 | jianzhnie/Open-R1 | 277 | 54 | Python | 0 | The open source implementation of DeepSeek-R1. | 2025-03-10 |
| 75 | SupritYoung/RLHF-Label-Tool | 255 | 21 | Python | 2 | A tool for manually annotating and ranking response data in the RLHF stage. | 2023-08-01 |
| 76 | ash80/RLHF_in_notebooks | 245 | 29 | Jupyter Notebook | 0 | RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks | 2025-06-20 |
| 77 | jasonvanf/llama-trl | 239 | 24 | Python | 7 | LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA | 2025-08-17 |
| 78 | TYH-labs/unsloth-buddy | 234 | 13 | Python | 0 | Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform. | 2026-05-05 |
| 79 | haoliuhl/chain-of-hindsight | 229 | 17 | Python | 3 | Simple next-token-prediction for RLHF | 2023-09-30 |
| 80 | HumanSignal/RLHF | 226 | 45 | Jupyter Notebook | 3 | Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI models | 2023-07-24 |
| 81 | ethanyanjiali/minChatGPT | 226 | 35 | Python | 3 | A minimum example of aligning language models with RLHF similar to ChatGPT | 2023-09-26 |
| 82 | shufangxun/LLaVA-MoD | 227 | 16 | Python | 3 | [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation | 2025-03-31 |
| 83 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | 221 | 18 | Python | 15 | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna | 2024-05-20 |
| 84 | dengxianghua888-ops/ecoalign-forge | 206 | 12 | Python | 0 | Multi-Agent DPO Data Synthesis Factory: an automated framework for synthesizing multi-agent preference training data (red-team attack → multi-persona review → final adjudication → DPO preference pairs) | 2026-04-11 |
| 85 | YangLing0818/IterComp | 205 | 11 | Python | 5 | [ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | 2025-02-19 |
| 86 | liziniu/ReMax | 202 | 15 | Python | 1 | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" | 2023-12-16 |
| 87 | mengdi-li/awesome-RLAIF | 202 | 7 | N/A | 0 | A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) | 2025-08-06 |
| 88 | Kwai-YuanQi/MM-RLHF | 200 | 9 | Python | 5 | The Next Step Forward in Multimodal LLM Alignment | 2025-05-01 |
| 89 | TideDra/VL-RLHF | 199 | 8 | Python | 11 | An RLHF Infrastructure for Vision-Language Models | 2024-11-15 |
| 90 | Miraclemarvel55/ChatGLM-RLHF | 198 | 26 | Python | 3 | Modify ChatGLM output with RLHF alone, raising or lowering the probability of target outputs | 2023-05-23 |
| 91 | vwxyzjn/lm-human-preference-details | 197 | 12 | Python | 0 | RLHF implementation details of OAI's 2019 codebase | 2024-01-14 |
| 92 | PKU-Alignment/aligner | 193 | 10 | Python | 0 | [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct | 2025-01-16 |
| 93 | raghavc/LLM-RLHF-Tuning-with-PPO-and-DPO | 190 | 19 | Python | 2 | Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models. | 2026-02-24 |
| 94 | Jerry-XDL/AIDoctor | 188 | 16 | Python | 0 | AIDoctor: training a medical GPT model with the ChatGPT training pipeline; implementation of Pretraining, Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning) and DPO (Direct Preference Optimization) | 2025-03-11 |
| 95 | tomekkorbak/pretraining-with-human-feedback | 181 | 14 | Python | 6 | Code accompanying the paper "Pretraining Language Models with Human Preferences" | 2024-02-13 |
| 96 | PKU-Alignment/beavertails | 181 | 6 | Makefile | 3 | BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). | 2023-10-27 |
| 97 | hyunwoongko/nanoRLHF | 180 | 16 | Python | 0 | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | 2026-01-23 |
| 98 | xrsrke/instructGOOSE | 174 | 21 | Jupyter Notebook | 3 | Implementation of Reinforcement Learning from Human Feedback (RLHF) | 2023-04-07 |
| 99 | lucidrains/llama-qrlhf | 170 | 8 | Python | 0 | Implementation of the Llama architecture with RLHF + Q-learning | 2025-02-01 |
| 100 | argilla-io/notus | 168 | 14 | Python | 1 | Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach | 2024-01-15 |
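
A table like this can be regenerated from GitHub's public search API, which exposes every column above (`stargazers_count`, `forks_count`, `language`, `open_issues_count`, `description`, `pushed_at`). Below is a minimal sketch under stated assumptions: the site's actual query terms and ranking pipeline are not published, so the search string `rlhf OR alignment` and the star-based sort are stand-ins, not the page's real implementation.

```python
# Minimal sketch: approximate an RLHF/alignment repo ranking via the
# GitHub search API. The query string is an assumption about how this
# page might be built, not its actual pipeline.
import requests

def top_rlhf_repos(n: int = 100) -> list[dict]:
    """Fetch up to n repositories matching RLHF/alignment, sorted by stars."""
    rows: list[dict] = []
    page = 1
    while len(rows) < n:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={
                # Assumed query; match keywords in name/description/topics.
                "q": "rlhf OR alignment in:name,description,topics",
                "sort": "stars",   # rank by stargazer count, descending
                "order": "desc",
                "per_page": min(100, n - len(rows)),
                "page": page,
            },
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["items"]
        if not items:
            break
        for repo in items:
            rows.append({
                "repo": repo["full_name"],
                "stars": repo["stargazers_count"],
                "forks": repo["forks_count"],
                "language": repo["language"] or "N/A",
                "issues": repo["open_issues_count"],
                "description": repo["description"] or "",
                "last_commit": repo["pushed_at"][:10],  # YYYY-MM-DD
            })
        page += 1
    return rows[:n]

if __name__ == "__main__":
    for i, r in enumerate(top_rlhf_repos(10), start=1):
        print(f"| {i} | {r['repo']} | {r['stars']} | {r['description'][:60]} |")
```

Two caveats if you run this: GitHub's search endpoint returns at most 1,000 results per query and rate-limits unauthenticated requests, and `open_issues_count` includes pull requests, so the numbers may differ slightly from the Issues column above.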