A comprehensive collection of papers and presentation slides on LLM agents, reasoning, and AI systems.
Sources:
- Papers: 88 PDFs (organized in 12 folders)
- Slides: 92 presentation decks (~504 MB)
- Topics: 16 categories
- Audio Overviews: See NOTEBOOKLM_LINKS.md for AI-generated podcast summaries
- Resources: See DEEP_RL_RESOURCES.md for comprehensive RL learning materials
papers/
├── agent-frameworks/   # 10 papers - ReAct, AutoGen, DSPy, etc.
├── benchmarks/         # 6 papers - SWE-bench, WorkArena, evals
├── computer-use/       # 5 papers - OSWorld, DigiRL, SWE-agent
├── memory-rag/         # 3 papers - HippoRAG, retrieval systems
├── multi-agent/        # 2 papers - AgentNet, MasRouter
├── planning/           # 5 papers - Tree search, optimization
├── reasoning/          # 9 papers - Chain-of-thought, reasoning
├── rl-finetuning/      # 16 papers - DeepSeek R1, GRPO, DPO
├── robotics/           # 6 papers - Eureka, Voyager, GR00T
├── security/           # 10 papers - Prompt injection, red-teaming
├── theorem-proving/    # 9 papers - LeanDojo, AlphaGeometry
└── web-agents/         # 7 papers - WebArena, Mind2Web
slides/ # 92 presentation decks (504 MB)
- Inference-Time Techniques
- Post-Training & Alignment
- Memory & Planning
- Agent Frameworks
- Code Generation & Software Agents
- Web & Multimodal Agents
- Enterprise & Workflow Agents
- Mathematics & Theorem Proving
- Robotics & Embodied Agents
- Scientific Discovery
- Safety & Security
- Evaluation & Benchmarking
- Neural & Symbolic Reasoning
- Agentic Reasoning & RL Fine-Tuning
- Agentic Architectures & Coordination
- Deep Reinforcement Learning
| Paper | Slides | Code | Media |
|---|---|---|---|
| Direct Preference Optimization (DPO) | DPO CMU, DPO UT Austin, DPO Toronto, DPO Jinen | GitHub | 🎨 🖼️ |
| Iterative Reasoning Preference Optimization | - | - | 🎨 🖼️ |
| Chain-of-Verification Reduces Hallucination | - | GitHub (unofficial) | 🎨 🖼️ |
| Unpacking DPO and PPO | DPO Slides | GitHub | 🖼️ |
| RLHF Background | RLHF UT Austin | - | - |
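The DPO entries above all center on one closed-form preference loss. A stdlib-only sketch, assuming summed per-response log-probabilities as inputs (the numeric values below are hypothetical placeholders, not tied to any model or to the linked implementations):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    """
    # Log-ratio of policy to reference, for each response
    chosen_ratio = policy_chosen_lp - ref_chosen_lp
    rejected_ratio = policy_rejected_lp - ref_rejected_lp
    # -log sigmoid(beta * margin): small when the policy prefers the
    # chosen response more strongly than the reference does
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that already prefers the chosen response incurs a lower
# loss than one that is indifferent (which gives exactly log 2):
print(dpo_loss(-5.0, -9.0, -6.0, -6.0) < dpo_loss(-6.0, -6.0, -6.0, -6.0))
```

Note that no reward model appears anywhere: the reference log-probabilities play that role implicitly, which is the point the "Unpacking DPO and PPO" paper examines.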
| Paper | Slides | Code | Media |
|---|---|---|---|
| ReAct: Synergizing Reasoning and Acting | ReAct UVA Lecture | GitHub | 🎨 🖼️ 🎧 |
| AutoGen: Multi-Agent Conversation | - | GitHub | 🎨 🖼️ 🎧 |
| StateFlow: Enhancing LLM Task-Solving | - | GitHub | 🖼️ 🎧 |
| DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines | - | GitHub | 🎨 🖼️ 🎧 |
| LLM Agents Tutorials | EMNLP 2024 Tutorial, WWW 2024 Tutorial, Berkeley Training Agents | - | - |
| Paper | Slides | Code | Media |
|---|---|---|---|
| Paper2Agent: Research Papers as AI Agents | - | GitHub | 🖼️ 🎧 |
| OpenScholar: Synthesizing Scientific Literature | - | GitHub | 🖼️ |
| Paper | Slides | Code | Media |
|---|---|---|---|
| Survey: Evaluation of LLM-based Agents | AgentBench Multi-Turn NeurIPS | - | 🖼️ 🎧 |
| Adding Error Bars to Evals | - | GitHub | 🖼️ 🎧 |
| Tau2-Bench: Conversational Agents Dual-Control | - | GitHub | 🖼️ 🎧 |
| Data Science Agents | Data Science Agents Benchmark | - | - |
Source: redhat-et/agentic-reasoning-reinforcement-fine-tuning
| Paper | Slides | Code | Media |
|---|---|---|---|
| WebAgent-R1: Multi-Turn RL for Web Agents | - | GitHub | - |
| ARTIST: Agentic Reasoning & Tool Integration | ARTIST Microsoft | GitHub | 🎨 🖼️ 🎧 |
Papers on multi-agent systems, decentralized coordination, and agentic frameworks
| Paper | Slides | Code | Media |
|---|---|---|---|
| AgentNet: Decentralized Multi-Agent Coordination | - | GitHub | 🎨 🖼️ |
| MasRouter: Multi-Agent Routing | MasRouter ACL 2025 | GitHub | 🎨 🖼️ |
| Multi-Agent RL Overview | Edinburgh MARL Intro | - | - |
| Paper | Slides | Code | Media |
|---|---|---|---|
| FireAct: Language Agent Fine-tuning | LLM Agents Tool Learning | GitHub | 🖼️ 🎧 |
| DeepSeek Janus Pro: Multimodal | - | GitHub | 🎨 🖼️ |
| PTA-GRPO: High-Level Planning | PTA-GRPO Planning | - | - |
| Stanford RL for Agents | Stanford RL Agents 2025 | - | - |
| CMU LM Agents | CMU Language Models as Agents | - | - |
| Mannheim Tool Use | Mannheim LLM Agents Tool Use | - | - |
| Resource | Description | Code |
|---|---|---|
| Intel AI Agents Architecture | AI agents resource guide | - |
| Cisco Agentic Frameworks | Overview of agentic frameworks | - |
See the Full Deep RL Resources Guide - a comprehensive collection with 100+ resources and 92 slides
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Playing Atari with Deep RL (DQN) | 1312.5602 | CMU, CVUT, NTHU, Waterloo | OpenAI Baselines | - |
| Deep RL with Double Q-learning | 1509.06461 | CMU DQN | OpenAI Baselines | - |
| Dueling Network Architectures | 1511.06581 | Buffalo | OpenAI Baselines | - |
| Prioritized Experience Replay | 1511.05952 | Buffalo, Julien Vitay, ICML 2020 | OpenAI Baselines | - |
| Rainbow: Combining Improvements | 1710.02298 | Prague, Berkeley, Wisconsin | Dopamine | - |
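Double Q-learning, listed above, differs from vanilla DQN only in how the bootstrap target is formed. A stdlib-only sketch of the two targets (the Q-value lists are toy placeholders, not taken from any of the linked codebases):

```python
def dqn_target(next_q_online, next_q_target, reward, gamma=0.99, done=False):
    """Standard DQN target: max over the *target* network's values.
    (next_q_online is unused here; kept for a parallel signature.)"""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(next_q_online, next_q_target, reward, gamma=0.99, done=False):
    """Double DQN target: the online network *selects* the action,
    the target network *evaluates* it, reducing overestimation bias."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]

# With disagreeing networks, Double DQN evaluates the online argmax
# (action 0, value 0.5) instead of taking the target net's own max (3.0):
online, target = [2.0, 1.0], [0.5, 3.0]
print(dqn_target(online, target, 0.0))
print(double_dqn_target(online, target, 0.0))
```

The max-then-evaluate decoupling is the entire change; Rainbow later folds this in alongside dueling heads and prioritized replay.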
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Policy Gradient Methods | - | Toronto, Berkeley CS285, REINFORCE Stanford | Stable-Baselines3 | - |
| Proximal Policy Optimization (PPO) | 1707.06347 | Waterloo, NTU Taiwan | OpenAI Baselines | - |
| Trust Region Policy Optimization (TRPO) | 1502.05477 | FAU, UT Austin, CMU Natural PG, Toronto PAIR | OpenAI Baselines | - |
| High-Dimensional Continuous Control (GAE) | 1506.02438 | Berkeley CS285 | OpenAI Baselines | - |
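The GAE paper above reduces, in practice, to a short backward recursion over TD residuals. A stdlib-only sketch assuming a single finite trajectory (the function name and toy inputs are illustrative, not from the paper's code):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one finite trajectory.

    `values` must hold one extra entry: the value of the state after
    the last reward (0.0 for a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards, accumulating discounted TD residuals:
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# lam=0 collapses GAE to one-step TD residuals (low variance, high bias);
# lam=1 recovers full Monte Carlo returns minus the baseline.
print(gae_advantages([1.0, 1.0], [0.5, 0.5, 0.0], lam=0.0))
```

PPO implementations (e.g. the Baselines code linked above) use exactly this recursion to build the advantages fed into the clipped objective.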
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Asynchronous Methods (A3C) | 1602.01783 | WPI, Buffalo, NTU, UIUC, Julien Vitay | OpenAI Baselines | - |
| Continuous Control (DDPG) | 1509.02971 | Paderborn, FAU, Julien Vitay, Buffalo | Stable-Baselines3 | - |
| Addressing Function Approximation (TD3) | 1802.09477 | Prague | Stable-Baselines3 | - |
| Soft Actor-Critic (SAC) | 1801.01290 | Toronto PAIR, Purdue, Stanford CS231n, Prague | Stable-Baselines3 | - |
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| TD Learning Fundamentals | - | CMU, Michigan, Sutton & Barto | - | - |
| Q-Learning | - | Northeastern, CMU TD | - | - |
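The Q-learning entry above can be grounded with a tabular sketch on a toy chain environment (the environment and hyperparameters are illustrative assumptions, not drawn from the linked slides):

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy chain: move left/right from state 0,
    reward 1 only on reaching the rightmost (terminal) state."""
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    rng = random.Random(0)
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # off-policy TD update toward the greedy bootstrap target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
# After training, "right" (action 1) dominates in every non-terminal state
print(all(q[s][1] > q[s][0] for s in range(4)))
```

The same update rule, with the table replaced by a neural network and the samples drawn from a replay buffer, is what the DQN line of papers above builds on.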
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Model-Based RL | - | FAU, Toronto, Berkeley, CMU | MBRL-Lib | - |
| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Imitation Learning | - | WPI, EPFL | imitation | - |
| Inverse Reinforcement Learning | - | TU Darmstadt, Berkeley CS285 | imitation | - |
| Topic | Slides |
|---|---|
| Deep RL Introduction | Berkeley CS294, Berkeley 2017 |
| Tool | Link | Description |
|---|---|---|
| OpenAI Gym | GitHub | RL environments |
| Gymnasium | GitHub | Maintained fork of Gym |
| Stable-Baselines3 | GitHub | RL algorithms in PyTorch |
| Unity ML-Agents | GitHub | 3D environments |
| PyTorch | pytorch.org | Deep learning framework |
| Google Dopamine | GitHub | RL research framework |
| CleanRL | GitHub | Single-file RL implementations |
| RLlib | GitHub | Scalable RL library |
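The environment libraries above all expose the same reset()/step() interaction loop. A stdlib-only sketch of that loop with a hypothetical toy environment (CoinFlipEnv is a stand-in written to mimic the Gymnasium 5-tuple API, not a real Gymnasium env):

```python
import random

class CoinFlipEnv:
    """Toy env mimicking the Gymnasium reset()/step() API: guess a coin
    flip each step, reward 1 for a correct guess, episode length 10."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == self.rng.randrange(2) else 0.0
        terminated = self.t >= 10
        # (obs, reward, terminated, truncated, info)
        return 0, reward, terminated, False, {}

def run_episode(env, policy):
    """The canonical interaction loop shared by Gym/Gymnasium-style envs."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, r, terminated, truncated, _ = env.step(policy(obs))
        total += r
        done = terminated or truncated
    return total

total = run_episode(CoinFlipEnv(), policy=lambda obs: 0)
print(0.0 <= total <= 10.0)
```

Swapping CoinFlipEnv for `gymnasium.make("CartPole-v1")` leaves run_episode unchanged, which is why the algorithm libraries listed above (Stable-Baselines3, CleanRL, RLlib) can all target the same environments.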
View all 100+ resources in DEEP_RL_RESOURCES.md
- Start with WWW 2024 LLM Agents Tutorial - comprehensive overview
- Read ReAct paper + slides + code
- Study Chain-of-Thought with CoT Princeton Lecture
- Software Agents (Neubig) for code agents + SWE-agent code
- DPO CMU Lecture for alignment + DPO code
- Multimodal Agents Berkeley for web agents + WebArena code
- LeanDojo slides for theorem proving + code
- HippoRAG NeurIPS for memory systems + code
- Prompt Injection Duke for security
- DeepSeek-R1 paper + DeepSeek R1 CMU slides + code
- DeepSeekMath GRPO + Stanford RL for Reasoning + code
- ARTIST paper for agentic reasoning with tools
Papers are the property of their respective authors. This collection is for educational purposes only.