Generated: 2026-06-01 21:39:54 (UTC+8); arXiv release time: 2026-06-01 20:00 EDT (2026-06-02 08:00 UTC+8)

69 relevant papers today

Keyword: reinforcement learning

Delayed Repression and Emergent Instability in Adaptive Multi-Agent Systems

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

Improving Small Language Models for Code Generation with Reinforcement Learning from Verification Feedback

Physics-informed Goal-Conditioned Reinforcement Learning under Hybrid Contact Dynamics

Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future

Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

Constrained Flow Optimization via Sequential Fine Tuning for Molecular Design

ZAPS-DA: Zero-Phase Action Policy Smoothing with Decoupled Actor for Continuous Control in Reinforcement Learning

Temporally Encoded Double DQN for Proactive PRB Allocation in O-RAN Enabled Industrial Networks

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

Learning to Perceive the World Through Control: Empowerment-Based Representation Learning

Reinforcement Learning for Special Education: Aligning LLM Tutors to Diverse Learners through Disability-Adaptive Training

Universal Decision Learners

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

MosaicLeaks: Privacy Risks in Querying-in-the-Open for Deep Research Agents

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

FLAG: Flow Policy MaxEnt-RL by Latent Augmented Guidance

Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Feat2Go: Visual Feature-Grounded Value Estimation for Embodied Reinforcement Learning

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models

Safe Equilibrium Policy Optimization for Strategic Agent Policies

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning

Distilling LLM Feedback for Lean Theorem Proving

GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning

Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments

Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach

Attend to Evidence: Evidence-Anchored Spatial Attention Supervision for Multimodal RLVR

Automating Formal Verification with Reinforcement Learning and Recursive Inference

De-attribute to Forget for LLM Unlearning

Enhancing Human-Likeness in Reinforcement Learning Agents via Hierarchical Macro Action Quantization

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

Graph-GRPO: Dependency-Aware Credit Assignment for Generative E-commerce Search Relevance

SDM-Q: Cost-Aware Staged Decision-Making for Multi-Omics Classification with Deep Q-Learning

HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

Annealed Softmax Greedy in Many-Armed Bayesian Bandits

The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering

iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

FOCUS: Forcing In-Context Object Localization through Visual Support Constraints and Policy Optimization

Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning

The Regularizing Power of Language-Training Deepfake Detectors

Multivariate Distributional Reinforcement Learning Using Sliced Divergences

EchoRL: Reinforcement Learning via Rollout Echoing

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

DriveMA: Driving Vision-Language-Action Models with verifiable Meta-Actions

Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

The Terminal Representation in Reinforcement Learning

Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework

Generalized Intention Modeling in Multi-Agent Reinforcement Learning

Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards

Dreaming Of Others: Latent Teammate Modeling In World Models For Multi-Agent Reinforcement Learning

Unlocking Fine-Grained Translation Quality Estimation in LRMs through Synergistically Evolving Implicit and Explicit Reasoning

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion

Astra: a generalizable report generation foundation model for 3D computed tomography

Answer-Set-Programming-based Abstractions for Reinforcement Learning

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

Batched Differentiable Rigid Body Dynamics in PyTorch for GPU-Accelerated Robot Learning

Learning Controlled Separation of Small Objects Between Two Fingers with a Tactile Skin

Are Full Rollouts Necessary for On-Policy Distillation?

Skill Reuse as Compression in Agentic RL

Value Functions as Supermartingale Certificates

Preference-Aware Rubric Learning for Personalized Evaluation

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Keyword: diffusion policy

No results