Generated: 2026-03-05 16:45:55 (UTC+8); arXiv release time: 2026-03-05 20:00 EST (2026-03-06 09:00 UTC+8)

38 relevant papers today

Keyword: reinforcement learning

SE-Search: Self-Evolving Search Agent via Memory and Dense Reward

HumanLM: Simulating Users with State Alignment Beats Response Imitation

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning

[Re] FairDICE: A Gap Between Theory And Practice

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

Optimal trajectory-guided stochastic co-optimization for e-fuel system design and real-time operation

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration

Freezing of Gait Prediction using Proactive Agent that Learns from Selected Experience and DDQN Algorithm

Principled Learning-to-Communicate with Quasi-Classical Information Structures

MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services

HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration

Interaction-Aware Whole-Body Control for Compliant Object Transport

Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning

Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation

Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion

Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control

RVN-Bench: A Benchmark for Reactive Visual Navigation

GIPO: Gaussian Importance Sampling Policy Optimization

Discriminative Perception via Anchored Description for Reasoning Segmentation

Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation

Self-adapting Robotic Agents through Online Continual Reinforcement Learning with World Model Feedback

SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning

What Does Flow Matching Bring To TD Learning?

Tendon Force Modeling for Sim2Real Transfer of Reinforcement Learning Policies for Tendon-Driven Robots

A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Keyword: diffusion policy

No results found.