Generated: 2026-04-09 17:11:44 (UTC+8); arXiv release time: 2026-04-09 20:00 EDT (2026-04-10 08:00 UTC+8)

30 relevant articles today

Keyword: reinforcement learning

Application-Driven Pedagogical Knowledge Optimization of Open-Source LLMs via Reinforcement Learning and Supervised Fine-Tuning

A Control Barrier Function-Constrained Model Predictive Control Framework for Safe Reinforcement Learning

Discrete Flow Matching Policy Optimization

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning

TwinLoop: Simulation-in-the-Loop Digital Twins for Online Multi-Agent Reinforcement Learning

The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

KD-MARL: Resource-Aware Knowledge Distillation in Multi-Agent Reinforcement Learning

Walk the Talk: Bridging the Reasoning-Action Gap for Thinking with Images via Multimodal Agentic Policy Optimization

Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP

A First Guess is Rarely the Final Answer: Learning to Search in the Travelling Salesperson Problem

Sustainable Transfer Learning for Adaptive Robot Skills

Learning-Based Strategy for Composite Robot Assembly Skill Adaptation

MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation

EmoMAS: Emotion-Aware Multi-Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Orchestration

Predictive Representations for Skill Transfer in Reinforcement Learning

Production-Ready Automated ECU Calibration using Residual Reinforcement Learning

Epistemic Robust Offline Reinforcement Learning

STRIDE-ED: A Strategy-Grounded Stepwise Reasoning Framework for Empathetic Dialogue Systems

Energy Saving for Cell-Free Massive MIMO Networks: A Multi-Agent Deep Reinforcement Learning Approach

Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization

BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment

Robust Quadruped Locomotion via Evolutionary Reinforcement Learning

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Keyword: diffusion policy

RichMap: A Reachability Map Balancing Precision, Efficiency, and Flexibility for Rich Robot Manipulation Tasks
