Generated: 2026-04-13 17:58:06 (UTC+8); arXiv release time: 2026-04-13 20:00 EDT (2026-04-14 08:00 UTC+8)

34 relevant papers today

Keyword: reinforcement learning

Distributionally Robust Token Optimization in RLHF

StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning

RAMP: Hybrid DRL for Online Learning of Numeric Action Models

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

Artifacts as Memory Beyond the Agent Boundary

Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning

Building Better Environments for Autonomous Cyber Defence

Simulation of Adaptive Running with Flexible Sports Prosthesis using Reinforcement Learning of Hybrid-link System

HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Bridging SFT and RL: Dynamic Policy Optimization for Robust Reasoning

WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

Multi-agent Reinforcement Learning for Low-Carbon P2P Energy Trading among Self-Interested Microgrids

PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning

Hypergraph Neural Networks Accelerate MUS Enumeration

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

Learning Vision-Language-Action World Models for Autonomous Driving

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

On the Role of DAG topology in Energy-Aware Cloud Scheduling: A GNN-Based Deep Reinforcement Learning Approach

Online Intention Prediction via Control-Informed Learning

Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym

Visually-Guided Policy Optimization for Multimodal Reasoning

Musculoskeletal Motion Imitation for Learning Personalized Exoskeleton Control Policy in Impaired Gait

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models

Physics-Informed Reinforcement Learning of Spatial Density Velocity Potentials for Map-Free Racing

VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning

RIRF: Reasoning Image Restoration Framework

Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL

VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

Keyword: diffusion policy

No results found.