# RL-LLM-NLP

This repository collects libraries and papers on Reinforcement Learning (RL) for Large Language Models (LLMs) and Natural Language Processing (NLP).

I consider RL a pivotal technology in AI, and NLP (particularly LLMs) a direction well worth exploring.

## Library

| GitHub | From | Year | Description |
| --- | --- | --- | --- |
| PRIME | PRIME-RL | 2025 | Scalable RL solution for the advanced reasoning of language models |
| rStar | Microsoft | 2025 | |
| veRL | ByteDance | 2024 | Volcano Engine Reinforcement Learning for LLM |
| trl | HuggingFace | 2024 | Train LMs with RL |
| RL4LMs | Allen | 2023 | RL library to fine-tune LMs to human preferences |
| alignment-handbook | HuggingFace | 2023 | Robust recipes to align language models with human and AI preferences |
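All of the libraries above build on policy-gradient training of a language model. As a self-contained illustration of the core idea they share, here is a minimal REINFORCE loop on a two-armed bandit (pure Python; the function name, hyperparameters, and baseline choice are illustrative, not taken from any of these libraries):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_bandit(rewards=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """REINFORCE on a 2-armed Bernoulli bandit.

    Update rule per step: logits += lr * (r - baseline) * grad log pi(a),
    with a running-mean reward as the baseline (variance reduction).
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(logits)
        a = 0 if rng.random() < probs[0] else 1          # sample an action
        r = 1.0 if rng.random() < rewards[a] else 0.0    # Bernoulli reward
        baseline += (r - baseline) / t                   # running mean
        adv = r - baseline
        for i in range(len(logits)):
            # grad of log softmax w.r.t. logit i: 1{i==a} - probs[i]
            grad_log = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * adv * grad_log
    return softmax(logits)

probs = reinforce_bandit()  # policy shifts toward the higher-reward arm
```

In an LLM setting, the "action" is a sampled token sequence and the logits come from the model, but the gradient estimator is the same.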

## Paper

| Cate | Abbr | Title | From | Year | Link |
| --- | --- | --- | --- | --- | --- |
| RL | MRT | Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | Carnegie Mellon | 2025 | paper, GitHub |
| RL | L1, LCPO | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | Carnegie Mellon | 2025 | paper, GitHub |
| RL | Online-DPO-R1 | Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead | Salesforce AI Research | 2025 | paper, GitHub |
| RL | ORZ | Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | StepFun | 2025 | paper, GitHub |
| RL | OREAL | Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | InternLM | 2025 | paper, GitHub |
| RL | R1 | DeepSeek-R1 | DeepSeek | 2025 | paper, ① |
| o1 | Sky-T1 | Sky-T1: Train your own O1 preview model within $450 | NovaSky-AI | 2025 | GitHub |
| o1 | STILL | A series of technical reports on Slow Thinking with LLM | RUCAIBox | 2025 | GitHub |
| RL Scaling | LIMR | LIMR: Less is More for RL Scaling | GAIR-NLP | 2025 | paper, GitHub |
| RL Scaling | DeepScaleR | DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL | Agentica | 2025 | paper, GitHub |
| RL Scaling | ScalingLaw | Value-Based Deep RL Scales Predictably | Berkeley | 2025 | paper |
| SLM | PRIME | Process Reinforcement through Implicit Rewards | PRIME-RL | 2025 | paper, GitHub |
| SLM | rStar-Math | rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Microsoft | 2025 | paper, GitHub |
| SLM | rStar | rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers | Microsoft | 2024 | paper, GitHub |
| Unlearn | | A Closer Look at Machine Unlearning for Large Language Models | Sea AI | 2024 | paper, GitHub |
| Unlearn | Quark | Quark: Controllable Text Generation with Reinforced [Un]learning | Allen | 2022 | paper, GitHub |
| Align | ReMax | ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models | CUHK | 2024 | paper |
| Align | | A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More | Salesforce | 2024 | paper |
| Align | | Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Allen | 2024 | paper, GitHub |
| Align | | Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey | Capital One | 2024 | paper |
| Align | RLHF | Training language models to follow instructions with human feedback | OpenAI | 2022 | paper |
| Align | NLPO | Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Allen | 2022 | paper, GitHub |
| Align | | Fine-Tuning Language Models from Human Preferences | OpenAI | 2020 | paper, GitHub |
| Align | RLOO | Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | Cohere | 2024 | paper |
| Policy | Dr. GRPO | Understanding R1-Zero-Like Training: A Critical Perspective | Sea AI Lab | 2025 | paper, GitHub |
| Policy | DAPO | DAPO: An Open-Source LLM Reinforcement Learning System at Scale | ByteDance Seed | 2025 | paper, GitHub |
| Policy | GRPO | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | DeepSeek | 2024 | paper |
| Policy | DPO | Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Stanford | 2024 | paper |
| Policy | | Decision Transformer: Reinforcement Learning via Sequence Modeling | Berkeley | 2021 | paper, GitHub |
| Policy | PPO | Proximal Policy Optimization Algorithms | OpenAI | 2017 | paper |
| Policy | REINFORCE multi-sample | Buy 4 Reinforce Samples, Get a Baseline for Free! | University of Amsterdam | 2019 | paper |
| Policy | REINFORCE | Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning | Northeastern University | 1992 | paper |
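Several of the Policy-row objectives are compact enough to write down directly. Below is a hedged pure-Python sketch of the PPO clipped surrogate, the DPO loss, and GRPO's group-normalized advantage, following the formulas in the respective papers; the function names and default hyperparameters are mine, not from the papers:

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO (2017) clipped surrogate for one action:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = pi_new / pi_old."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(logp_c - ref_c) - (logp_r - ref_r)])."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def grpo_advantages(rewards):
    """GRPO (DeepSeekMath) group-relative advantage: normalize each reward
    by the mean and std of its sampled group, with no learned value model."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

For example, with a probability ratio of 2 and `eps=0.2`, the positive-advantage surrogate is capped at `1 + eps` times the advantage, while a negative advantage is left unclipped on the pessimistic side, which is exactly the asymmetry the PPO paper motivates.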
