Generated: 2026-06-08 20:33:07 (UTC+8); arXiv release time: 2026-06-08 20:00 EDT (2026-06-09 08:00 UTC+8)

28 related papers today

Keyword: reinforcement learning

MacArena: Benchmarking Computer Use Agents on an Online macOS Environment

Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

What Do People Actually Want From AI? Mapping Preference Plurality

Performance Variation in Deep Reinforcement Learning

Learning All-Terrain Locomotion for a Planetary Rover with Actively Articulated Suspension

Exploring Reinforcement Learning for Fluid Transitions Between Clinical Mental Healthcare and Everyday Wellness Support

VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

AdaGRPO: A Capability-Aware Adaptive Enhancement for Flow-based GRPO

T-GMP: Terrain-conditioned Generative Motion Priors for Versatile and Natural Humanoid Locomotion

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

Exploring Agentic Tool-Calling Decisions via Uncertainty-Aligned Reinforcement Learning

Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

On the Geometry of On-Policy Distillation

Predictive Style Matching: Natural and Robust Humanoid Locomotion

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

Shield-Loco: Shielding Locomotion Policies with Predictive Safety Filtering

Learning Multi-Agent Communication Protocol: Study on Information Entropy Efficiency in MARL

KIT's Submission to Cross-Lingual Voice Cloning in IWSLT 2026

Self-evolving LLM agents with in-distribution Optimization

Rapid co-design of Buoyancy-assisted robots for Challenging Locomotion using Gaussian Evolutionary Specialists

Modelling Opinion Dynamics at Scale with Deep MARL

Affordance-Based Hierarchical Reinforcement Learning for Quadruped Pedipulation

Keyword: diffusion policy

Beyond Waypoints: A Trajectory-Centric Waypointing Paradigm for Vision-Language Navigation
