Generated: 2026-07-02 18:43:23 (UTC+8); arXiv release time: 2026-07-02 20:00 EDT (2026-07-03 08:00 UTC+8)

42 relevant papers today

Keyword: reinforcement learning

Trajectory Learning with Graph Representations for Social Robot Navigation

Learning Dexterous Manipulation Using Contact Wrench Guidance From Human Demonstration

Bayesian updates from coalgebraic determinisation

Active Sensing for RIS-Aided Tracking and Power Control: A Hybrid Neuroevolution and Supervised Learning Approach

Learning Expert Strategy for Autonomous Robotic Endovascular Intervention via Decoupled Procedural Execution

RareDxR1: Autonomous Medical Reasoning for Rare Disease Diagnosis Beyond Human Annotation

A Contextual-Bandit Oversight Game with Two-Sided Informational Asymmetry

Distributed Multi Robot Lunar Cargo Transportation via Phase Decomposed Reinforcement Learning

Verifiable Rewards for Calibrated Probabilistic Forecasting

Play Like Champions: Counterfactual Feedback Generation in Latent Space

SLIM-RL: Risk-Budgeted Random-Masking RL for Diffusion LLMs Without Trajectory Slicing

Learning Generalizable Skill Policy with Data-Efficient Unsupervised RL

Personalization as Inverse Planning: Learning Latent Design Intents for Agentic Slide Generation via Structural Denoising

Selective Test-Time Debiasing for CLIP via Reward Gating

Learning Gait-Aware Quadruped Locomotion with Temporal Logic Specifications

Gauging, Measuring, and Controlling Critic Complexity in Actor-Critic Reinforcement Learning

VLM-AR3L: Vision-Language Models for Absolute and Relative Rewards in Reinforcement Learning

Efficient Multilingual Reasoning Transfer via Progressive Code-Switching

PAPA: Online Personalized Active Preference Alignment

Active-GRPO: Adaptive Imitation and Self-Improving Reasoning for Molecular Optimization

Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition

Loss Smoothing for Stable Adaptation Under Distribution Shift

Learning-based control of a single-DOF Aero system

Coachable agents for interactive gameplay

M2Note: Continual Evolution of Vision Language Models via Mistake Notebook Learning

Task-Relevant Representation Decoupling for Visual Reinforcement Learning Generalization

Local Motion Matters: A Deconstruct-Recompose Paradigm for Reinforcement Learning Pre-training from Videos

From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training

EFlow: Learning Evidence Flow for Long-Video Reasoning with Adaptive Reflection

Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination

Human-Machine Collaboration on Generative Meta-Learning: Model and Algorithm

DRL-Based Joint Beamforming and Surface Shape Optimization for Flexible Intelligent Metasurface-Aided ISAC Systems

AMBUSH: Collaborative Capture in Complex Environments with Neural Acceleration

AutoRestTest at the SBFT 2026 Tool Competition

Can Agents Generalize to the Open World? Unveiling the Fragility of Static Training in Tool Use

Next-Generation Agentic Reinforcement Learning Systems Enable Self-Evolving Agents

QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling

Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

Quantum vs. Classical Machine Learning: A Unified Empirical Comparison

Language-Critique Imitation Learning from Suboptimal Demonstrations

Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training

Keyword: diffusion policy

FAR: Failure-Aware Retry for Test-Time Recovery and Continual Policy Improvement
