Generated: 2026-06-17 20:18:21 (UTC+8); arXiv release time: 2026-06-17 20:00 EDT (2026-06-18 08:00 UTC+8)

36 related papers today

Keyword: reinforcement learning

GeoDisaster: Benchmarking Orchestrated Agents for Operational Disaster Geo-Intelligence

GeoDisaster:面向业务化灾害地理情报的编排式智能体基准测试

Rethinking Groups in Critic-Free RLVR

重新思考无评论家(Critic-Free)RLVR中的分组

Training LLMs with Reinforcement Learning over Digital Twin Representations for Reasoning-Intensive Surgical VideoQA

基于数字孪生表示的强化学习训练大型语言模型,用于推理密集型手术视频问答

Decision-Driven Geosteering Under Uncertainty: A Unified Framework for Sequential Decision Optimization

不确定性下的决策驱动地质导向:序贯决策优化的统一框架

Performance-Driven Environment Abstraction with Multi-Timescale Learning

多时间尺度学习的绩效驱动环境抽象

Treatment Response Optimized Clinical Decision Support AI System via Digital Twin Simulation

通过数字孪生模拟优化治疗反应的临床决策支持人工智能系统

Enhancing Pathological VLMs with Cross-scale Reasoning

通过跨尺度推理增强病理学VLM

Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

在对抗性航天器近距离操作中实现自适应安全关键控制的内存高效元强化学习

Embodiment Shapes Rolling Behavior in a Multimodal Infant Model

具身塑造了多模态婴儿模型中的滚动行为

Multi-Adapter PPO: A Cross-Attention Enhanced Wavelength Selection Framework for LIBS Quantitative Analysis

多适配器PPO:一种用于LIBS定量分析的交叉注意力增强波长选择框架

Theoretical Grounding of Out-Of-Distribution Detection With Reinforcement Learning Optimizer

基于强化学习优化器的分布外检测的理论基础

When Robots Sleep: Offline Skill Consolidation for Shared-Policy Robot Learning

当机器人沉睡时:共享策略机器人学习的离线技能巩固

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

在空间视觉语言模型中强化双路径推理

Continuous-time Optimal Stopping through Deep Reinforcement Learning

通过深度强化学习实现连续时间最优停止

Reversal Q-Learning

反转Q-学习

Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning

闭合反馈循环:从经验提取到言语强化学习中的洞察治理

Using Cognitive Models to Improve Language Model Simulation of Human Persuasion Games

利用认知模型提升人类说服游戏的语言模型模拟

See First, Answer Later: Visual Evidence Pre-Alignment via Sufficiency-Driven RL

先看后答:通过充分性驱动强化学习实现视觉证据预对齐

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

EnvRL:在智能体强化学习中从环境动力学中学习

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

从受训者到训练者:由LLM设计、结合多智能体推理的强化学习训练环境

SuCo: Sufficiency-guided Continuous Adaptive Reasoning

SuCo:充分性引导的连续自适应推理

Shattering the Autoregressive Curse: Dynamic Epistemic Entropy Orchestrated Erasable Reinforcement Learning for LLMs

打破自回归的诅咒:面向大型语言模型的动态认知熵编排的可擦除强化学习

Continual Self-Improvement with Lightweight Experiential Latent Memories

基于轻量级经验潜在记忆的持续自我提升

StepGuard: Guarding Web Navigation via Single-Step Calibration

StepGuard:通过单步校准保护网页导航

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

动态Rollout编辑:减少强化学习训练的推理模型中的过度思考

WAM-RL: World-Action Model Reinforcement Learning with Reconstruction Rewards and Online Video SFT

WAM-RL:世界-动作模型强化学习,含重建奖励和在线视频SFT

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

从推理轨迹到可重用模块:理解语言模型推理中的组合泛化

WireCraft: A Simulation Benchmark for Industrial DLO Manipulation

WireCraft:工业DLO操作的仿真基准

OmniPlan: An Adaptive Framework for Timely and Near-Optimal Network Planning Optimization

OmniPlan:一个用于及时且近乎最优网络规划优化的自适应框架

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

针对最小零强迫集的深度强化学习

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

多目标强化学习中的公平帕累托最优策略学习

Knowledge Reutilization in Meta-Reinforcement Learning

元强化学习中的知识再利用

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

近端策略优化区:提示中的教师,而非梯度

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

从观测中学习红方智能体策略,用于神经符号自主网络智能体

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

采用共享上下文-视觉分词器的统一多模态自回归建模是实现统一的关键

Keyword: diffusion policy

LAGO Policy: Latency-Aware Asynchronous Diffusion Policies with Goal-Directed Collision-Free Planning for Smooth Manipulation

LAGO 策略:延迟感知异步扩散策略,配合目标导向的无碰撞规划,实现平滑操作