Awesome Agents Papers Collection

A comprehensive collection of papers and presentation slides on LLM agents, reasoning, and AI systems.

Sources:

arvindcr4/awesome-agents

redhat-et/agentic-reasoning-reinforcement-fine-tuning

Quick Stats

Papers: 88 PDFs (organized in 12 folders)
Slides: 93 presentation decks (~504 MB)
Topics: 16 categories
Audio Overviews: See NOTEBOOKLM_LINKS.md for AI-generated podcast summaries
Resources: See DEEP_RL_RESOURCES.md for comprehensive RL learning materials

Folder Structure

papers/
├── agent-frameworks/    # 10 papers - ReAct, AutoGen, DSPy, etc.
├── benchmarks/          #  6 papers - SWE-bench, WorkArena, evals
├── computer-use/        #  5 papers - OSWorld, DigiRL, SWE-agent
├── memory-rag/          #  3 papers - HippoRAG, retrieval systems
├── multi-agent/         #  2 papers - AgentNet, MasRouter
├── planning/            #  5 papers - Tree search, optimization
├── reasoning/           #  9 papers - Chain-of-thought, reasoning
├── rl-finetuning/       # 16 papers - DeepSeek R1, GRPO, DPO
├── robotics/            #  6 papers - Eureka, Voyager, GR00T
├── security/            # 10 papers - Prompt injection, red-teaming
├── theorem-proving/     #  9 papers - LeanDojo, AlphaGeometry
└── web-agents/          #  7 papers - WebArena, Mind2Web

slides/                  # 92 presentation decks (504 MB)

Table of Contents

Inference-Time Techniques
Post-Training & Alignment
Memory & Planning
Agent Frameworks
Code Generation & Software Agents
Web & Multimodal Agents
Enterprise & Workflow Agents
Mathematics & Theorem Proving
Robotics & Embodied Agents
Scientific Discovery
Safety & Security
Evaluation & Benchmarking
Neural & Symbolic Reasoning
Agentic Reasoning & RL Fine-Tuning
Agentic Architectures & Coordination
Deep Reinforcement Learning

Inference-Time Techniques

Paper	Slides	Code	Media
Large Language Models as Optimizers	CS839 Prompting II	GitHub	🖼️
Large Language Models Cannot Self-Correct Reasoning Yet	-	-	🎨 🖼️
Teaching Large Language Models to Self-Debug	-	-	🎨 🖼️ 🎧
Chain-of-Thought Reasoning Without Prompting	CoT Princeton Lecture, CoT Toronto, CoT SJTU, CoT Interpretable ML, Concise CoT	GitHub (unofficial)	🎨 🖼️
Premise Order Matters in Reasoning with LLMs	-	-	🎨 🖼️
Chain-of-Thought Empowers Transformers	CoT Slides	-	🎨 🖼️

Post-Training & Alignment

Paper	Slides	Code	Media
Direct Preference Optimization (DPO)	DPO CMU, DPO UT Austin, DPO Toronto, DPO Jinen	GitHub	🎨 🖼️
Iterative Reasoning Preference Optimization	-	-	🎨 🖼️
Chain-of-Verification Reduces Hallucination	-	GitHub (unofficial)	🎨 🖼️
Unpacking DPO and PPO	DPO Slides	GitHub	🖼️
RLHF Background	RLHF UT Austin	-	-
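
The DPO objective for a single preference pair is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). A minimal plain-Python sketch follows. It is not the authors' implementation: it assumes you already have the summed log-probabilities of the chosen (y_w) and rejected (y_l) responses under the trained policy and the frozen reference model, and the function name `dpo_loss` and default `beta=0.1` are illustrative choices.

```python
import math


def softplus(x):
    # log(1 + e^x), written to stay stable for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (chosen response preferred).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin between the chosen and rejected responses
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference: log 2
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))   # chosen favoured more: lower loss
```

When the policy still matches the reference, the margin is zero and the loss equals log 2 (about 0.693); increasing the chosen response's log-ratio relative to the rejected one pushes the loss lower.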

Memory & Planning

Paper	Slides	Code	Media
Grokked Transformers are Implicit Reasoners	-	GitHub	🎨 🖼️
HippoRAG: Neurobiologically Inspired Long-Term Memory	HippoRAG NeurIPS	GitHub	🎨 🖼️
Is Your LLM Secretly a World Model of the Internet?	-	GitHub	🖼️
Tree Search for Language Model Agents	-	GitHub	🖼️
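
HippoRAG retrieves by running Personalized PageRank (PPR) over a knowledge graph, restarting the random walk at entities extracted from the query. The sketch below is a generic power-iteration PPR in plain Python under that assumption; the toy graph, its node names, and the `personalized_pagerank` helper are hypothetical and leave out HippoRAG's entity extraction and passage scoring.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=100):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of neighbour nodes (directed
           edges; list both directions for an undirected graph).
    seeds: set of nodes the random walk restarts from.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes, zero elsewhere
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iters):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                share = damping * scores[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its mass via the restart distribution
                for m in nodes:
                    new[m] += damping * scores[n] * restart[m]
        scores = new
    return scores


# Hypothetical knowledge graph (undirected, so each edge is listed both ways)
kg = {
    "aspirin": ["headache", "bayer"],
    "headache": ["aspirin", "migraine"],
    "migraine": ["headache"],
    "bayer": ["aspirin"],
    "paris": ["france"],
    "france": ["paris"],
}
scores = personalized_pagerank(kg, seeds={"aspirin"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node:10s} {score:.3f}")
```

Nodes unreachable from the seeds (here `paris` and `france`) keep a score of zero, which is what makes PPR suited to query-focused retrieval.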

Agent Frameworks

Paper	Slides	Code	Media
ReAct: Synergizing Reasoning and Acting	ReAct UVA Lecture	GitHub	🎨 🖼️ 🎧
AutoGen: Multi-Agent Conversation	-	GitHub	🎨 🖼️ 🎧
StateFlow: Enhancing LLM Task-Solving	-	GitHub	🖼️ 🎧
DSPy: Compiling Declarative Language Model Calls	-	GitHub	🎨 🖼️ 🎧
LLM Agents Tutorials	EMNLP 2024 Tutorial, WWW 2024 Tutorial, Berkeley Training Agents	-	-
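
ReAct interleaves free-text reasoning (Thought) with tool calls (Action) and their results (Observation). The loop below is a minimal sketch of that control flow only. The `Action: tool[argument]` format mirrors the paper's `search[...]` / `finish[...]` style, while the scripted stand-in model, the toy `add` tool, and all function names are hypothetical placeholders for a real LLM and real tools.

```python
import re
from typing import Callable, Dict, List

# Matches lines such as "Action: add[2+3]" or "Action: finish[5]"
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str,
               max_steps: int = 5) -> str:
    """Alternate model steps with tool observations until finish[...]."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model writes a Thought and an Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no valid action found.\n"
            continue
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg                    # final answer
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"


def scripted_llm(steps: List[str]) -> Callable[[str], str]:
    # Hypothetical stand-in for a real model: replays fixed outputs in order
    outputs = iter(steps)
    return lambda transcript: next(outputs)


def add(arg: str) -> str:
    # Toy tool: adds two integers written as "a+b"
    a, b = arg.split("+")
    return str(int(a) + int(b))


llm = scripted_llm([
    "Thought: I need to add the numbers.\nAction: add[2+3]",
    "Thought: The tool returned 5.\nAction: finish[5]",
])
print(react_loop(llm, {"add": add}, "What is 2+3?"))  # prints 5
```

In practice `scripted_llm` would be replaced by a call to an actual model that reads the growing transcript; the `max_steps` budget guards against a model that never emits `finish[...]`.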

Code Generation & Software Agents

Paper	Slides	Code	Media
SWE-agent: Agent-Computer Interfaces	Software Agents (Neubig)	GitHub	🎨 🖼️
OpenHands: AI Software Developers	Software Agents (Neubig)	GitHub	🖼️ 🎧
Interactive Tools Assist LM Agents in Finding Security Vulnerabilities	Code Agents & Vulnerability Detection	GitHub	-
Big Sleep: LLMs Catching Vulnerabilities in Real-World Code	Code Agents & Vulnerability Detection	-	-
SWE-bench Verified	-	GitHub	🖼️ 🎧

Web & Multimodal Agents

Paper	Slides	Code	Media
WebShop: Scalable Real-World Web Interaction	Multimodal Agents Berkeley	GitHub	-
Mind2Web: Generalist Agent for the Web	Multimodal Agents Berkeley	GitHub	-
WebArena: Realistic Web Environment	Multimodal Agents Berkeley, Web Agent Evaluation	GitHub	-
VisualWebArena	Multimodal Agents Berkeley	GitHub	-
AGUVIS: Unified Pure Vision Agents for GUI Interaction	-	GitHub	-
BrowseComp: Web Browsing Benchmark	-	GitHub	-

Enterprise & Workflow Agents

Paper	Slides	Code	Media
WorkArena: Common Knowledge Work Tasks	-	GitHub	🖼️ 🎧
WorkArena++: Compositional Planning	-	GitHub	🖼️ 🎧
TapeAgents: A Holistic Framework for Agent Development	TapeAgents Slides	GitHub	🖼️ 🎧

Mathematics & Theorem Proving

Paper	Slides	Code	Media
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models	LeanDojo AITP, LeanDojo NeurIPS, Theorem Proving ML	GitHub	🎨
Autoformalization with Large Language Models	-	-	🎨
Autoformalizing Euclidean Geometry	-	GitHub	🎨
Draft, Sketch, and Prove: Guiding Formal Theorem Provers	Theorem Proving ML	GitHub	🎨
miniCTX: Neural Theorem Proving with (Long-)Contexts	-	GitHub	🎨
Lean-STaR: Learning to Interleave Thinking and Proving	Berkeley Slides	GitHub, Website	🎨
ImProver: Agent-Based Automated Proof Optimization	-	GitHub	🎨
An In-Context Learning Agent for Formal Theorem-Proving	-	GitHub	-
Symbolic Regression with a Learned Concept Library	-	GitHub	🖼️
AlphaGeometry: Solving Olympiad Geometry	-	GitHub	-

Robotics & Embodied Agents

Paper	Slides	Code	Media
Voyager: Open-Ended Embodied Agent	Voyager UT Austin	GitHub	🎨
Eureka: Human-Level Reward Design	Eureka Paper/Slides	GitHub	🎨 🖼️
DrEureka: Language Model Guided Sim-To-Real	-	GitHub	🎨 🖼️
Gran Turismo: Deep Reinforcement Learning	-	-	🖼️
GR00T N1: Foundation Model for Humanoid Robots	-	GitHub	🎨 🖼️
SLAC: Simulation-Pretrained Latent Action	-	-	-

Scientific Discovery

Paper	Slides	Code	Media
Paper2Agent: Research Papers as AI Agents	-	GitHub	🖼️ 🎧
OpenScholar: Synthesizing Scientific Literature	-	GitHub	🖼️

Safety & Security

Paper	Slides	Code	Media
DataSentinel: Game-Theoretic Detection of Prompt Injection	Prompt Injection Duke	GitHub	-
AgentPoison: Red-teaming LLM Agents	Prompt Injection Duke	GitHub	🎨
Progent: Programmable Privilege Control	-	-	-
DecodingTrust: Trustworthiness in GPT Models	-	GitHub	-
Representation Engineering: AI Transparency	-	GitHub	-
Extracting Training Data from LLMs	-	-	-
The Secret Sharer: Unintended Memorization	-	-	-
Privtrans: Privilege Separation	-	-	-

Evaluation & Benchmarking

Paper	Slides	Code	Media
Survey on Evaluation of LLM-based Agents	AgentBench Multi-Turn NeurIPS	-	🖼️ 🎧
Adding Error Bars to Evals	-	GitHub	🖼️ 🎧
Tau2-Bench: Conversational Agents in a Dual-Control Environment	-	GitHub	🖼️ 🎧
Data Science Agents	Data Science Agents Benchmark	-	-
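
The core recommendation of "Adding Error Bars to Evals" is to report a standard error alongside an eval score. Below is a minimal sketch of the Central Limit Theorem version for independent questions; the helper name, the 1.96 multiplier for a roughly 95% interval, and the example scores are illustrative assumptions, and the paper discusses further refinements not shown here.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Mean eval score, its standard error, and a normal-approximation interval.

    Assumes each score comes from an independent question.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)   # sample std / sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)


# 100 hypothetical questions, 80 answered correctly (1 = correct, 0 = wrong)
scores = [1] * 80 + [0] * 20
mean, sem, (low, high) = mean_with_ci(scores)
print(f"accuracy {mean:.2f} +/- {1.96 * sem:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only 100 questions, an 80% score carries an interval of roughly plus or minus 8 points, which is often larger than the gap between the models being compared.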

Neural & Symbolic Reasoning

Paper	Slides	Code	Media
Beyond A-Star: Better Planning with Transformers	-	GitHub	🖼️
Dualformer: Controllable Fast and Slow Thinking	-	GitHub	🖼️
Composing Global Optimizers via Algebraic Objects	-	-	🖼️
SurCo: Learning Linear Surrogates	-	-	🖼️

Agentic Reasoning & RL Fine-Tuning

Source: redhat-et/agentic-reasoning-reinforcement-fine-tuning

DeepSeek R1 & Reasoning Models

Paper	Slides	Code	Media
DeepSeek-R1: Reasoning via RL	DeepSeek R1 Intro, DeepSeek R1 Toronto, DeepSeek R1 CMU, DeepSeek R1 Seoul	GitHub	🎨 🖼️
DeepSeek R1: Implications for AI	DeepSeek R1 Intro	-	🎨 🖼️
DeepSeek R1: Are Reasoning Models Faithful?	-	-	🎨 🖼️
OpenAI O1 Replication Journey	-	GitHub	🎨 🖼️
Qwen QwQ Reasoning Model	-	HuggingFace	🎨 🖼️
Sky-T1: Training Small Reasoning LLMs	-	GitHub	🖼️
s1: Simple Test-Time Scaling	-	GitHub	🖼️

GRPO & RL Fine-Tuning

Paper	Slides	Code	Media
DeepSeekMath: GRPO Algorithm	Stanford RL for Reasoning	GitHub	🎨 🖼️
Guided GRPO: Adaptive Guidance	PTA-GRPO Planning	GitHub	🖼️
R-Search: Multi-Step Reasoning	Stanford RL for Reasoning	GitHub	🖼️
RL Fine-tuning: Instruction Following	-	-	🖼️
RFT Powers Multimodal Reasoning	-	-	🖼️
STILL-2: Distilling Reasoning	-	-	🖼️
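
GRPO, introduced in DeepSeekMath, drops PPO's learned critic and instead scores each sampled completion relative to the other completions for the same prompt. The sketch below shows only that group-normalised advantage (outcome-reward variant) in plain Python; the clipped policy-ratio objective and KL penalty that GRPO also uses are omitted, and the function name, the `eps` value, and the choice of population standard deviation are assumptions (implementations differ on the last point).

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for G completions sampled from one prompt.

    Each reward is standardised against the group's own mean and standard
    deviation, so no learned value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)   # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions scored by a correctness reward (1 = correct)
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# Identical rewards give zero advantage, so such a group yields no learning signal
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Correct completions receive positive advantages and incorrect ones negative, with the group's advantages summing to zero.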

Agentic RL

Paper	Slides	Code	Media
WebAgent-R1: Multi-Turn RL for Web Agents	-	GitHub	-
ARTIST: Agentic Reasoning & Tool Integration	ARTIST Microsoft	GitHub	🎨 🖼️ 🎧

Agentic Architectures & Coordination

Papers on multi-agent systems, decentralized coordination, and agentic frameworks.

Decentralized Multi-Agent Systems

Paper	Slides	Code	Media
AgentNet: Decentralized Multi-Agent Coordination	-	GitHub	🎨 🖼️
MasRouter: Multi-Agent Routing	MasRouter ACL 2025	GitHub	🎨 🖼️
Multi-Agent RL Overview	Edinburgh MARL Intro	-	-

Device & Computer Control

Paper	Slides	Code	Media
DigiRL: Device Control Agents	DigiRL NeurIPS 2024	GitHub	🖼️ 🎧
OSWorld: Multimodal Agents Benchmark	-	GitHub	🖼️
OS-Harm: Computer Use Safety	OS-Harm Benchmark	GitHub	🖼️ 🎧

Agent Fine-Tuning & Tool Use

Paper	Slides	Code	Media
FireAct: Language Agent Fine-tuning	LLM Agents Tool Learning	GitHub	🖼️ 🎧
DeepSeek Janus Pro: Multimodal	-	GitHub	🎨 🖼️
PTA-GRPO: High-Level Planning	PTA-GRPO Planning	-	-
Stanford RL for Agents	Stanford RL Agents 2025	-	-
CMU LM Agents	CMU Language Models as Agents	-	-
Mannheim Tool Use	Mannheim LLM Agents Tool Use	-	-

Enterprise & Industry Guides

Resource	Description	Code
Intel AI Agents Architecture	AI agents resource guide	-
Cisco Agentic Frameworks	Overview of agentic frameworks	-

Deep Reinforcement Learning

See the Full Deep RL Resources Guide for a comprehensive collection of 100+ resources and 92 slide decks.

Value-Based Methods (DQN Family)

Paper	arXiv	Slides	Code	Media
Playing Atari with Deep RL (DQN)	1312.5602	CMU, CVUT, NTHU, Waterloo	OpenAI Baselines	-
Deep RL with Double Q-learning	1509.06461	CMU DQN	OpenAI Baselines	-
Dueling Network Architectures	1511.06581	Buffalo	OpenAI Baselines	-
Prioritized Experience Replay	1511.05952	Buffalo, Julien Vitay, ICML 2020	OpenAI Baselines	-
Rainbow: Combining Improvements	1710.02298	Prague, Berkeley, Wisconsin	Dopamine	-
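
The difference between the DQN and Double DQN bootstrap targets fits in a few lines. This sketch follows the formulation in the Double Q-learning paper, where both variants use a periodically updated target network: standard DQN takes the max over the target network's Q-values, while Double DQN lets the online network pick the action and the target network evaluate it. Q-values are plain lists and the function names are illustrative.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """DQN target: the target network both selects and evaluates a'."""
    if done:
        return reward                      # no bootstrap past a terminal state
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: online net selects a', target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


q_online_next = [1.0, 3.0]   # online network prefers action 1
q_target_next = [2.0, 0.5]   # target network rates action 1 poorly
print(dqn_target(1.0, False, q_target_next, gamma=0.5))                        # 2.0
print(double_dqn_target(1.0, False, q_online_next, q_target_next, gamma=0.5))  # 1.25
```

Decoupling action selection from evaluation is what reduces the overestimation bias that the max operator introduces in the plain DQN target.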

Policy Gradient Methods

Paper	arXiv	Slides	Code	Media
Policy Gradient Methods	-	Toronto, Berkeley CS285, REINFORCE Stanford	Stable-Baselines3	-
Proximal Policy Optimization (PPO)	1707.06347	Waterloo, NTU Taiwan	OpenAI Baselines	-
Trust Region Policy Optimization (TRPO)	1502.05477	FAU, UT Austin, CMU Natural PG, Toronto PAIR	OpenAI Baselines	-
High-Dimensional Continuous Control (GAE)	1506.02438	Berkeley CS285	OpenAI Baselines	-
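
Two formulas from this table are compact enough to sketch directly: the GAE recursion (advantage_t = δ_t + γλ·advantage_{t+1}, where δ_t is the one-step TD error) and PPO's clipped surrogate for a single sample. Plain Python lists stand in for tensors, and the function names and default hyperparameters (γ=0.99, λ=0.95, ε=0.2) are illustrative assumptions rather than values taken from the listed code.

```python
def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] is V(s_t); last_value is V of the state after the rollout.
    dones[t] is True if the episode ended at step t.
    """
    advantages = [0.0] * len(rewards)
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap across an episode boundary
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


# With lam=1 and zero value estimates, GAE reduces to the discounted return
print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [False, False, True],
          last_value=0.0, gamma=0.5, lam=1.0))   # [1.75, 1.5, 1.0]
print(ppo_clip_objective(1.5, 1.0))    # 1.2: gain capped at 1 + eps
print(ppo_clip_objective(0.5, -1.0))   # -0.8: pessimistic bound kept
```

Setting λ=0 instead turns each advantage into the single TD error, which is the low-variance, high-bias end of the trade-off GAE interpolates.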

Actor-Critic Methods

Paper	arXiv	Slides	Code	Media
Asynchronous Methods (A3C)	1602.01783	WPI, Buffalo, NTU, UIUC, Julien Vitay	OpenAI Baselines	-
Continuous Control (DDPG)	1509.02971	Paderborn, FAU, Julien Vitay, Buffalo	Stable-Baselines3	-
Addressing Function Approximation Error (TD3)	1802.09477	Prague	Stable-Baselines3	-
Soft Actor-Critic (SAC)	1801.01290	Toronto PAIR, Purdue, Stanford CS231n, Prague	Stable-Baselines3	-
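
DDPG, TD3, and SAC share two small building blocks: Polyak (soft) target-network updates and, in TD3 and SAC, a bootstrap target that takes the minimum of two critics (SAC additionally subtracts α·log π). The sketch below uses plain lists of floats as stand-in parameters; the names and defaults (τ=0.005, γ=0.99) are illustrative assumptions, and TD3's target-policy smoothing noise is left out.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99,
                            alpha=0.0, next_log_prob=0.0):
    """Bootstrap target from the smaller of two target critics.

    alpha=0 gives a TD3-style target; alpha>0 adds SAC's entropy term,
    -alpha * log pi(a'|s'), where a' is sampled from the current policy.
    """
    if done:
        return reward
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * soft_value


print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))             # [0.5, 1.0]
print(clipped_double_q_target(1.0, False, 3.0, 2.0, gamma=0.5))  # 2.0
print(clipped_double_q_target(1.0, False, 3.0, 2.0, gamma=0.5,
                              alpha=0.5, next_log_prob=-1.0))    # 2.25
```

Taking the smaller critic estimate counters the same overestimation problem that Double DQN addresses in the discrete-action setting.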

Temporal Difference & Q-Learning

Paper	arXiv	Slides	Code	Media
TD Learning Fundamentals	-	CMU, Michigan, Sutton & Barto	-	-
Q-Learning	-	Northeastern, CMU TD	-	-
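
The tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') - Q(s,a)], can be checked on a tiny deterministic chain. To keep the example deterministic, the sketch below sweeps every state-action pair instead of running an ε-greedy agent; the four-state chain, its reward, and the hyperparameters are hypothetical. Because Q-learning is off-policy, the values still converge to the optimal ones (1, γ, γ², and so on back from the goal).

```python
def chain_step(state, action, n_states=4):
    """Deterministic chain: action 1 moves right, action 0 moves left.

    Reaching the last state gives reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning_sweeps(n_states=4, n_actions=2, alpha=0.5, gamma=0.9, sweeps=500):
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s in range(n_states - 1):              # terminal state is never updated
            for a in range(n_actions):
                s2, r, done = chain_step(s, a, n_states)
                target = r if done else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])  # Q-learning update
    return q


q = q_learning_sweeps()
print([round(max(row), 3) for row in q])   # [0.81, 0.9, 1.0, 0.0]
```

Replacing `max(q[s2])` with the value of the action the behaviour policy actually takes next would turn this into SARSA, the on-policy counterpart.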

Model-Based RL

Paper	arXiv	Slides	Code	Media
Model-Based RL	-	FAU, Toronto, Berkeley, CMU	MBRL-Lib	-

Imitation & Inverse RL

Paper	arXiv	Slides	Code	Media
Imitation Learning	-	WPI, EPFL	imitation	-
Inverse Reinforcement Learning	-	TU Darmstadt, Berkeley CS285	imitation	-

Introductory Lectures

Topic	Slides
Deep RL Introduction	Berkeley CS294, Berkeley 2017

Frameworks & Tools

Tool	Link	Description
OpenAI Gym	GitHub	RL environments
Gymnasium	GitHub	Maintained fork of Gym
Stable-Baselines3	GitHub	RL algorithms in PyTorch
Unity ML-Agents	GitHub	3D environments
PyTorch	pytorch.org	Deep learning framework
Google Dopamine	GitHub	RL research framework
CleanRL	GitHub	Single-file RL implementations
RLlib	GitHub	Scalable RL library

View all 100+ resources in DEEP_RL_RESOURCES.md

Recommended Study Path

Beginner

Start with the WWW 2024 LLM Agents Tutorial for a comprehensive overview
Read the ReAct paper + slides + code
Study Chain-of-Thought with the CoT Princeton Lecture

Intermediate

Software Agents (Neubig) for code agents + SWE-agent code
DPO CMU Lecture for alignment + DPO code
Multimodal Agents Berkeley for web agents + WebArena code

Advanced

LeanDojo slides for theorem proving + code
HippoRAG NeurIPS for memory systems + code
Prompt Injection Duke for security

Reasoning & RL Fine-Tuning Path

DeepSeek-R1 paper + DeepSeek R1 CMU slides + code
DeepSeekMath GRPO + Stanford RL for Reasoning + code
ARTIST paper for agentic reasoning with tools

License

Papers are property of their respective authors. This collection is for educational purposes.

Repository: arvindcr4/awesome_agents_papers