Generated: 2026-04-15 17:23:10 (UTC+8); arXiv release time: 2026-04-15 20:00 EDT (2026-04-16 08:00 UTC+8)

31 related papers today

Keyword: reinforcement learning

Self-Monitoring Benefits from Structural Integration: Lessons from Metacognition in Continuous-Time Multi-Timescale Agents

Offline-Online Reinforcement Learning for Linear Mixture MDPs

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration

Robust Optimization for Mitigating Reward Hacking with Correlated Proxies

PubSwap: Public-Data Off-Policy Coordination for Federated RLVR

Nucleus-Image: Sparse MoE for Image Generation

Hybrid Adaptive Tuning for Tiered Memory Systems

MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

ReasonXL: Shifting LLM Reasoning Language Without Sacrificing Performance

Traffic-Aware Domain Partitioning and Load-Balanced Inter-Domain Routing for LEO Satellite Networks

From Kinematics to Dynamics: Learning to Refine Hybrid Plans for Physically Feasible Execution

KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning

Safety Training Modulates Harmful Misalignment Under On-Policy RL, But Direction Depends on Environment Design

A Heterogeneous Dual-Network Framework for Emergency Delivery UAVs: Communication Assurance and Path Planning Coordination

Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance

Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning

Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning

FastGrasp: Learning-based Whole-body Control method for Fast Dexterous Grasping with Mobile Manipulators

Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots

E2E-Fly: An Integrated Training-to-Deployment System for End-to-End Quadrotor Autonomy

Graph-based Hierarchical Deep Reinforcement Learning for Deliverable Block Propagation with Optimal Hybrid Cost in Web 3.0

Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

Keyword: diffusion policy

No results found.