
arvindcr4/awesome_agents_papers


Awesome Agents Papers Collection

A comprehensive collection of papers and presentation slides on LLM agents, reasoning, and AI systems.

Sources:

Quick Stats

  • Papers: 88 PDFs (organized in 12 folders)
  • Slides: 93 presentation decks (~504 MB)
  • Topics: 15 categories
  • Audio Overviews: See NOTEBOOKLM_LINKS.md for AI-generated podcast summaries
  • Resources: See DEEP_RL_RESOURCES.md for comprehensive RL learning materials

Folder Structure

papers/
├── agent-frameworks/    # 10 papers - ReAct, AutoGen, DSPy, etc.
├── benchmarks/          #  6 papers - SWE-bench, WorkArena, evals
├── computer-use/        #  5 papers - OSWorld, DigiRL, SWE-agent
├── memory-rag/          #  3 papers - HippoRAG, retrieval systems
├── multi-agent/         #  2 papers - AgentNet, MasRouter
├── planning/            #  5 papers - Tree search, optimization
├── reasoning/           #  9 papers - Chain-of-thought, reasoning
├── rl-finetuning/       # 16 papers - DeepSeek R1, GRPO, DPO
├── robotics/            #  6 papers - Eureka, Voyager, GR00T
├── security/            # 10 papers - Prompt injection, red-teaming
├── theorem-proving/     #  9 papers - LeanDojo, AlphaGeometry
└── web-agents/          #  7 papers - WebArena, Mind2Web

slides/                  # 92 presentation decks (504 MB)

Table of Contents


Inference-Time Techniques

| Paper | Slides | Code | Media |
|---|---|---|---|
| Large Language Models as Optimizers | CS839 Prompting II | GitHub | 🖼️ |
| Large Language Models Cannot Self-Correct Reasoning Yet | - | - | 🎨 🖼️ |
| Teaching Large Language Models to Self-Debug | - | - | 🎨 🖼️ 🎧 |
| Chain-of-Thought Reasoning Without Prompting | CoT Princeton Lecture, CoT Toronto, CoT SJTU, CoT Interpretable ML, Concise CoT | GitHub (unofficial) | 🎨 🖼️ |
| Premise Order Matters in Reasoning with LLMs | - | - | 🎨 🖼️ |
| Chain-of-Thought Empowers Transformers | CoT Slides | - | 🎨 🖼️ |

Post-Training & Alignment

| Paper | Slides | Code | Media |
|---|---|---|---|
| Direct Preference Optimization (DPO) | DPO CMU, DPO UT Austin, DPO Toronto, DPO Jinen | GitHub | 🎨 🖼️ |
| Iterative Reasoning Preference Optimization | - | - | 🎨 🖼️ |
| Chain-of-Verification Reduces Hallucination | - | GitHub (unofficial) | 🎨 🖼️ |
| Unpacking DPO and PPO | DPO Slides | GitHub | 🖼️ |
| RLHF Background | RLHF UT Austin | - | - |
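As a rough illustration of the DPO objective listed above, the sketch below computes the per-pair loss -log σ(β·[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). It assumes you already have summed token log-probabilities for the chosen (y_w) and rejected (y_l) responses under the trained policy and a frozen reference model. The function names and the default `beta=0.1` are hypothetical choices for illustration, not taken from the paper's code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (hypothetical helper; inputs are summed sequence log-probs)."""
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Shifting probability mass toward the chosen response lowers the loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In training, this value is averaged over a batch of preference pairs and gradients flow only through the policy log-probabilities; the reference model stays fixed.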

Memory & Planning

| Paper | Slides | Code | Media |
|---|---|---|---|
| Grokked Transformers are Implicit Reasoners | - | GitHub | 🎨 🖼️ |
| HippoRAG: Neurobiologically Inspired Long-Term Memory | HippoRAG NeurIPS | GitHub | 🎨 🖼️ |
| Is Your LLM Secretly a World Model of the Internet? | - | GitHub | 🖼️ |
| Tree Search for Language Model Agents | - | GitHub | 🖼️ |

Agent Frameworks

| Paper | Slides | Code | Media |
|---|---|---|---|
| ReAct: Synergizing Reasoning and Acting | ReAct UVA Lecture | GitHub | 🎨 🖼️ 🎧 |
| AutoGen: Multi-Agent Conversation | - | GitHub | 🎨 🖼️ 🎧 |
| StateFlow: Enhancing LLM Task-Solving | - | GitHub | 🖼️ 🎧 |
| DSPy: Compiling Declarative Language Model | - | GitHub | 🎨 🖼️ 🎧 |
| LLM Agents Tutorials | EMNLP 2024 Tutorial, WWW 2024 Tutorial, Berkeley Training Agents | - | - |

Code Generation & Software Agents

| Paper | Slides | Code | Media |
|---|---|---|---|
| SWE-agent: Agent-Computer Interfaces | Software Agents (Neubig) | GitHub | 🎨 🖼️ |
| OpenHands: AI Software Developers | Software Agents (Neubig) | GitHub | 🖼️ 🎧 |
| Interactive Tools Assist LM Agents Security Vulnerabilities | Code Agents & Vulnerability Detection | GitHub | - |
| Big Sleep: LLM Vulnerabilities Real-World | Code Agents & Vulnerability Detection | - | - |
| SWE-bench Verified | - | GitHub | 🖼️ 🎧 |

Web & Multimodal Agents

| Paper | Slides | Code | Media |
|---|---|---|---|
| WebShop: Scalable Real-World Web Interaction | Multimodal Agents Berkeley | GitHub | - |
| Mind2Web: Generalist Agent for the Web | Multimodal Agents Berkeley | GitHub | - |
| WebArena: Realistic Web Environment | Multimodal Agents Berkeley, Web Agent Evaluation | GitHub | - |
| VisualWebArena | Multimodal Agents Berkeley | GitHub | - |
| AGUVIS: Unified Pure Vision Agents GUI | - | GitHub | - |
| BrowseComp: Web Browsing Benchmark | - | GitHub | - |

Enterprise & Workflow Agents

| Paper | Slides | Code | Media |
|---|---|---|---|
| WorkArena: Common Knowledge Work Tasks | - | GitHub | 🖼️ 🎧 |
| WorkArena++: Compositional Planning | - | GitHub | 🖼️ 🎧 |
| TapeAgents: Holistic Framework Agent Development | TapeAgents Slides | GitHub | 🖼️ 🎧 |

Mathematics & Theorem Proving

| Paper | Slides | Code | Media |
|---|---|---|---|
| LeanDojo: Theorem Proving Retrieval-Augmented | LeanDojo AITP, LeanDojo NeurIPS, Theorem Proving ML | GitHub | 🎨 |
| Autoformalization with Large Language Models | - | - | 🎨 |
| Autoformalizing Euclidean Geometry | - | GitHub | 🎨 |
| Draft, Sketch and Prove: Formal Theorem Provers | Theorem Proving ML | GitHub | 🎨 |
| miniCTX: Neural Theorem Proving Long-Contexts | - | GitHub | 🎨 |
| Lean-STaR: Interleave Thinking and Proving | Berkeley Slides | GitHub, Website | 🎨 |
| ImProver: Agent-Based Automated Proof Optimization | - | GitHub | 🎨 |
| In-Context Learning Agent Formal Theorem-Proving | - | GitHub | - |
| Symbolic Regression: Learned Concept Library | - | GitHub | 🖼️ |
| AlphaGeometry: Solving Olympiad Geometry | - | GitHub | - |

Robotics & Embodied Agents

| Paper | Slides | Code | Media |
|---|---|---|---|
| Voyager: Open-Ended Embodied Agent | Voyager UT Austin | GitHub | 🎨 |
| Eureka: Human-Level Reward Design | Eureka Paper/Slides | GitHub | 🎨 🖼️ |
| DrEureka: Language Model Guided Sim-To-Real | - | GitHub | 🎨 🖼️ |
| Gran Turismo: Deep Reinforcement Learning | - | - | 🖼️ |
| GR00T N1: Foundation Model Humanoid | - | GitHub | 🎨 🖼️ |
| SLAC: Simulation-Pretrained Latent Action | - | - | - |

Scientific Discovery

| Paper | Slides | Code | Media |
|---|---|---|---|
| Paper2Agent: Research Papers as AI Agents | - | GitHub | 🖼️ 🎧 |
| OpenScholar: Synthesizing Scientific Literature | - | GitHub | 🖼️ |

Safety & Security

| Paper | Slides | Code | Media |
|---|---|---|---|
| DataSentinel: Game-Theoretic Detection Prompt Injection | Prompt Injection Duke | GitHub | - |
| AgentPoison: Red-teaming LLM Agents | Prompt Injection Duke | GitHub | 🎨 |
| Progent: Programmable Privilege Control | - | - | - |
| DecodingTrust: Trustworthiness GPT Models | - | GitHub | - |
| Representation Engineering: AI Transparency | - | GitHub | - |
| Extracting Training Data from LLMs | - | - | - |
| The Secret Sharer: Unintended Memorization | - | - | - |
| Privtrans: Privilege Separation | - | - | - |

Evaluation & Benchmarking

| Paper | Slides | Code | Media |
|---|---|---|---|
| Survey: Evaluation LLM-based Agents | AgentBench Multi-Turn NeurIPS | - | 🖼️ 🎧 |
| Adding Error Bars to Evals | - | GitHub | 🖼️ 🎧 |
| Tau2-Bench: Conversational Agents Dual-Control | - | GitHub | 🖼️ 🎧 |
| Data Science Agents | Data Science Agents Benchmark | - | - |
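The "Adding Error Bars to Evals" entry is about reporting uncertainty alongside benchmark scores. A minimal sketch, assuming independent per-question scores and a normal (central-limit) approximation, computes the mean score, its standard error, and a 95% confidence interval. The function name and the 1.96 critical value are illustrative assumptions, not the paper's own code.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Mean, standard error, and normal-approximation CI for per-question scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate variance")
    mean = statistics.fmean(scores)
    # Sample standard deviation divided by sqrt(n) is the standard error of the mean.
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)


# 1 = question answered correctly, 0 = incorrect (toy data).
mean, sem, (lo, hi) = mean_with_ci([1, 0, 1, 1])
print(f"accuracy = {mean:.2f} ± {sem:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With only four questions the interval extends past 1.0, the upper limit of an accuracy score, which illustrates how little a small eval can tell you on its own.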

Neural & Symbolic Reasoning

| Paper | Slides | Code | Media |
|---|---|---|---|
| Beyond A-Star: Better Planning Transformers | - | GitHub | 🖼️ |
| Dualformer: Controllable Fast and Slow Thinking | - | GitHub | 🖼️ |
| Composing Global Optimizers: Algebraic Objects | - | - | 🖼️ |
| SurCo: Learning Linear Surrogates | - | - | 🖼️ |

Agentic Reasoning & RL Fine-Tuning

Source: redhat-et/agentic-reasoning-reinforcement-fine-tuning

DeepSeek R1 & Reasoning Models

| Paper | Slides | Code | Media |
|---|---|---|---|
| DeepSeek-R1: Reasoning via RL | DeepSeek R1 Intro, DeepSeek R1 Toronto, DeepSeek R1 CMU, DeepSeek R1 Seoul | GitHub | 🎨 🖼️ |
| DeepSeek R1: Implications for AI | DeepSeek R1 Intro | - | 🎨 🖼️ |
| DeepSeek R1: Are Reasoning Models Faithful? | - | - | 🎨 🖼️ |
| OpenAI O1 Replication Journey | - | GitHub | 🎨 🖼️ |
| Qwen QwQ Reasoning Model | - | HuggingFace | 🎨 🖼️ |
| Sky-T1: Training Small Reasoning LLMs | - | GitHub | 🖼️ |
| s1: Simple Test-Time Scaling | - | GitHub | 🖼️ |

GRPO & RL Fine-Tuning

| Paper | Slides | Code | Media |
|---|---|---|---|
| DeepSeekMath: GRPO Algorithm | Stanford RL for Reasoning | GitHub | 🎨 🖼️ |
| Guided GRPO: Adaptive Guidance | PTA-GRPO Planning | GitHub | 🖼️ |
| R-Search: Multi-Step Reasoning | Stanford RL for Reasoning | GitHub | 🖼️ |
| RL Fine-tuning: Instruction Following | - | - | 🖼️ |
| RFT Powers Multimodal Reasoning | - | - | 🖼️ |
| STILL-2: Distilling Reasoning | - | - | 🖼️ |
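The core idea behind GRPO (introduced in DeepSeekMath, listed above) is to drop the learned value function and instead score each sampled completion relative to the other completions for the same prompt. The sketch below shows only that group-relative advantage, assuming one scalar outcome reward per completion and normalization by the group's mean and population standard deviation. The epsilon and the function name are illustrative assumptions; the full GRPO objective also uses a PPO-style clipped ratio and a KL penalty, which are omitted here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    # Under outcome supervision, every token of completion i shares advantage A_i.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt; 1.0 = correct final answer, 0.0 = wrong.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# If every completion gets the same reward, there is no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Because the baseline comes from the group itself, no critic network has to be trained, which is the main memory saving GRPO is usually credited with.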

Agentic RL

| Paper | Slides | Code | Media |
|---|---|---|---|
| WebAgent-R1: Multi-Turn RL for Web Agents | - | GitHub | - |
| ARTIST: Agentic Reasoning & Tool Integration | ARTIST Microsoft | GitHub | 🎨 🖼️ 🎧 |

Agentic Architectures & Coordination

Papers on multi-agent systems, decentralized coordination, and agentic frameworks

Decentralized Multi-Agent Systems

| Paper | Slides | Code | Media |
|---|---|---|---|
| AgentNet: Decentralized Multi-Agent Coordination | - | GitHub | 🎨 🖼️ |
| MasRouter: Multi-Agent Routing | MasRouter ACL 2025 | GitHub | 🎨 🖼️ |
| Multi-Agent RL Overview | Edinburgh MARL Intro | - | - |

Device & Computer Control

| Paper | Slides | Code | Media |
|---|---|---|---|
| DigiRL: Device Control Agents | DigiRL NeurIPS 2024 | GitHub | 🖼️ 🎧 |
| OSWorld: Multimodal Agents Benchmark | - | GitHub | 🖼️ |
| OS-Harm: Computer Use Safety | OS-Harm Benchmark | GitHub | 🖼️ 🎧 |

Agent Fine-Tuning & Tool Use

| Paper | Slides | Code | Media |
|---|---|---|---|
| FireAct: Language Agent Fine-tuning | LLM Agents Tool Learning | GitHub | 🖼️ 🎧 |
| DeepSeek Janus Pro: Multimodal | - | GitHub | 🎨 🖼️ |
| PTA-GRPO: High-Level Planning | PTA-GRPO Planning | - | - |
| Stanford RL for Agents | Stanford RL Agents 2025 | - | - |
| CMU LM Agents | CMU Language Models as Agents | - | - |
| Mannheim Tool Use | Mannheim LLM Agents Tool Use | - | - |

Enterprise & Industry Guides

| Resource | Description | Code |
|---|---|---|
| Intel AI Agents Architecture | AI agents resource guide | - |
| Cisco Agentic Frameworks | Overview of agentic frameworks | - |

Deep Reinforcement Learning

See Full Deep RL Resources Guide - Comprehensive collection with 100+ resources and 92 slides

Value-Based Methods (DQN Family)

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Playing Atari with Deep RL (DQN) | 1312.5602 | CMU, CVUT, NTHU, Waterloo | OpenAI Baselines | - |
| Deep RL with Double Q-learning | 1509.06461 | CMU DQN | OpenAI Baselines | - |
| Dueling Network Architectures | 1511.06581 | Buffalo | OpenAI Baselines | - |
| Prioritized Experience Replay | 1511.05952 | Buffalo, Julien Vitay, ICML 2020 | OpenAI Baselines | - |
| Rainbow: Combining Improvements | 1710.02298 | Prague, Berkeley, Wisconsin | Dopamine | - |
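The Double Q-learning entry changes only how the bootstrap target is formed. The online network selects the next action and the target network evaluates it, which reduces the overestimation bias of the plain DQN max. A minimal sketch, assuming the next state's Q-values from both networks are already available as plain lists (the function names are hypothetical):

```python
def dqn_target(reward, done, gamma, q_target_next):
    """Standard DQN target: max over the target network's own estimates."""
    return reward if done else reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward  # no bootstrapping past a terminal transition
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


q_online_next = [1.0, 3.0, 2.0]  # online network prefers action 1
q_target_next = [5.0, 0.5, 4.0]  # target network's estimates for the same state
print(dqn_target(1.0, False, 0.9, q_target_next))
print(double_dqn_target(1.0, False, 0.9, q_online_next, q_target_next))
```

Here the plain DQN target bootstraps from the target network's largest (possibly overestimated) value, while Double DQN uses the target network's value for the action the online network actually chose.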

Policy Gradient Methods

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Policy Gradient Methods | - | Toronto, Berkeley CS285, REINFORCE Stanford | Stable-Baselines3 | - |
| Proximal Policy Optimization (PPO) | 1707.06347 | Waterloo, NTU Taiwan | OpenAI Baselines | - |
| Trust Region Policy Optimization (TRPO) | 1502.05477 | FAU, UT Austin, CMU Natural PG, Toronto PAIR | OpenAI Baselines | - |
| High-Dimensional Continuous Control (GAE) | 1506.02438 | Berkeley CS285 | OpenAI Baselines | - |
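The central piece of PPO is the clipped surrogate objective min(r·A, clip(r, 1 - ε, 1 + ε)·A), where r is the probability ratio between the new and old policy for the sampled action and A is its advantage estimate (often computed with GAE, also listed above). A per-sample sketch, assuming log-probabilities and advantages have already been computed; the function name and the `eps=0.2` default are illustrative:

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample PPO-Clip surrogate (to be maximized; negate it for a loss)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum gives a pessimistic bound: the policy earns no extra
    # credit for pushing the ratio past the clip range in the direction the
    # advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage, ratio 1.5: the gain is capped at 1.2 * A.
print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=2.0))
# Negative advantage, ratio 0.5: the clipped term (0.8 * A) is the smaller one.
print(ppo_clipped_objective(math.log(0.5), 0.0, advantage=-2.0))
```

Batch implementations compute the same expression over tensors and average it; value-function and entropy terms are usually added to the loss separately.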

Actor-Critic Methods

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Asynchronous Methods (A3C) | 1602.01783 | WPI, Buffalo, NTU, UIUC, Julien Vitay | OpenAI Baselines | - |
| Continuous Control (DDPG) | 1509.02971 | Paderborn, FAU, Julien Vitay, Buffalo | Stable-Baselines3 | - |
| Addressing Function Approximation (TD3) | 1802.09477 | Prague | Stable-Baselines3 | - |
| Soft Actor-Critic (SAC) | 1801.01290 | Toronto PAIR, Purdue, Stanford CS231n, Prague | Stable-Baselines3 | - |
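TD3 and SAC (both listed above) share a trick for reducing overestimation: they train two critics and bootstrap from the smaller of the two target estimates. SAC additionally subtracts an entropy term α·log π(a′|s′). A minimal sketch of the target computation, assuming the two target-critic values and the next action's log-probability have already been computed for a sampled next action a′. The function name and arguments are hypothetical, and TD3's target-policy smoothing noise is assumed to have been applied when a′ was chosen.

```python
def clipped_double_q_target(reward, done, gamma, q1_next, q2_next,
                            alpha=0.0, next_logp=0.0):
    """Bootstrap target from two target critics evaluated at the next action.

    alpha=0.0 gives a TD3-style target; alpha > 0 adds SAC's entropy bonus.
    """
    if done:
        return reward  # terminal transition: no bootstrapped value
    soft_value = min(q1_next, q2_next) - alpha * next_logp
    return reward + gamma * soft_value


print(clipped_double_q_target(1.0, False, 0.99, q1_next=10.0, q2_next=8.0))
print(clipped_double_q_target(1.0, False, 0.99, 10.0, 8.0, alpha=0.2, next_logp=-1.0))
```

A negative log-probability (a fairly spread-out policy) raises the SAC target, which is how the entropy bonus rewards exploration.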

Temporal Difference & Q-Learning

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| TD Learning Fundamentals | - | CMU, Michigan, Sutton & Barto | - | - |
| Q-Learning | - | Northeastern, CMU TD | - | - |
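The TD-learning and Q-learning rows above center on one update rule, Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) - Q(s, a)]. A tabular sketch, assuming a dictionary-backed Q-table that defaults to 0 and a caller-supplied list of actions available in the next state (names are hypothetical):

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step; returns the TD error."""
    if done or not next_actions:
        best_next = 0.0  # no bootstrapping from a terminal state
    else:
        best_next = max(q[(next_state, a)] for a in next_actions)
    td_target = reward + gamma * best_next
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q[("s1", "right")] = 2.0
err = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                        next_actions=["left", "right"], alpha=0.5, gamma=0.9)
print(err, q[("s0", "right")])
```

Because the target uses the max over next actions rather than the action the behavior policy will actually take, Q-learning is off-policy; replacing the max with the next action's value gives SARSA.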

Model-Based RL

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Model-Based RL | - | FAU, Toronto, Berkeley, CMU | MBRL-Lib | - |

Imitation & Inverse RL

| Paper | arXiv | Slides | Code | Media |
|---|---|---|---|---|
| Imitation Learning | - | WPI, EPFL | imitation | - |
| Inverse Reinforcement Learning | - | TU Darmstadt, Berkeley CS285 | imitation | - |

Introductory Lectures

| Topic | Slides |
|---|---|
| Deep RL Introduction | Berkeley CS294, Berkeley 2017 |

Frameworks & Tools

| Tool | Link | Description |
|---|---|---|
| OpenAI Gym | GitHub | RL environments |
| Gymnasium | GitHub | Maintained fork of Gym |
| Stable-Baselines3 | GitHub | RL algorithms in PyTorch |
| Unity ML-Agents | GitHub | 3D environments |
| PyTorch | pytorch.org | Deep learning framework |
| Google Dopamine | GitHub | RL research framework |
| CleanRL | GitHub | Single-file RL implementations |
| RLlib | GitHub | Scalable RL library |

View all 100+ resources in DEEP_RL_RESOURCES.md


Recommended Study Path

Beginner

  1. Start with the WWW 2024 LLM Agents Tutorial for a comprehensive overview
  2. Read the ReAct paper + slides + code
  3. Study Chain-of-Thought with the CoT Princeton Lecture

Intermediate

  1. Software Agents (Neubig) for code agents + SWE-agent code
  2. DPO CMU Lecture for alignment + DPO code
  3. Multimodal Agents Berkeley for web agents + WebArena code

Advanced

  1. LeanDojo slides for theorem proving + code
  2. HippoRAG NeurIPS for memory systems + code
  3. Prompt Injection Duke for security

Reasoning & RL Fine-Tuning Path

  1. DeepSeek-R1 paper + DeepSeek R1 CMU slides + code
  2. DeepSeekMath GRPO + Stanford RL for Reasoning + code
  3. ARTIST paper for agentic reasoning with tools

License

Papers are property of their respective authors. This collection is for educational purposes.

About

Collection of papers and slide decks on LLM agents, reasoning, and AI systems
