
Hi there 👋

I am Dengyun Peng, a first-year Master's student at HIT and a member of the SCIR Lab. I am currently supervised by Professor Wanxiang Che and Ph.D. candidate Qiguang Chen. My research interests focus on RL4LLM and LLM reasoning, and I also have research experience in Safe RL and Offline RL.

Internships:

  • iFLYTEK (Hefei)

    • Research Intern, September 2025 – Present
  • Du Xiaoman Financial (Beijing)

    • Research Intern, January 2025 – February 2025
  • Westlake University (Hangzhou)

    • Research Intern, December 2023 – September 2024

Publications:

(Preprint, co-first author) Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems (https://arxiv.org/abs/2512.01661)

(EMNLP 2025 Findings, co-first author) DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective (https://arxiv.org/abs/2503.13413)

(NeurIPS 2025, co-first author) Boundary-to-Region Supervision for Offline Safe Reinforcement Learning (https://nips.cc/virtual/2025/poster/115428)

(AAAI 2026, co-first author) Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models (https://arxiv.org/abs/2508.11582)

(ICML 2024, second author) Reinformer: Max-Return Sequence Modeling for Offline RL (https://proceedings.mlr.press/v235/zhuang24b.html)

(SCIENCE CHINA Information Sciences, fourth author) Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models (https://arxiv.org/abs/2503.09567)

(Preprint, fourth author) ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model (https://arxiv.org/abs/2502.03325)

Email:

[email protected]

[email protected]

Google Scholar:

https://scholar.google.com.hk/citations?user=XtG_SxwAAAAJ&hl=zh-CN

Popular repositories:

  1. DR_SAF — Official code for AAAI 2026: Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models (Python · 13)

  2. DLPO — Official code for EMNLP 2025 Findings: DLPO: Towards a Robust, Efficient, and Generalizable Prompt Optimization Framework from a Deep-Learning Perspective (Python · 10)

  3. unsolvableQA — Official code and data for Learning the Boundary of Solvability: Aligning LLMs to Detect Unsolvable Problems (Python · 3)

  4. SystemAnalysisAndDesign (Java · 1)

  5. National_Mathematical_Modeling_Competition_2023_fall (Python · 1)

  6. sfasfaffa — Profile repository (1)