Generated: 2026-06-15 21:57:54 (UTC+8); arXiv release time: 2026-06-15 20:00 EDT (2026-06-16 08:00 UTC+8)

30 relevant papers today

Keyword: reinforcement learning

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

Orchestra-o1: Omnimodal Agent Orchestration

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

Temporally Consistent Graph Q-Networks for Intelligent Network Control

TetraRL: A Self-Adaptive Runtime for On-Device Deep Reinforcement Learning Systems

Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

Explainable and Trustworthy Speech Emotion Recognition Using Confidence Score and Reinforcement Learning Rectified Speech Emotion Descriptors

Contract-Based Compositional Shielding for Safe Multi-Agent Reinforcement Learning

Aidos: A Hybrid Optimization Algorithm for Beam Hopping Scheduling in NGSO Mega-Constellations

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward

DRIVE: Distributional and Retrieval-Augmented Bidding with Value Evaluation

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

Robust Fall Recovery for Armless Bipedal-Wheeled Robots Via Force-Guided Learning

Retrospective Progress-Aware Self-Refinement for LLM Agent Training

ForceForget: Reinforcement Concept Removal for Enhancing Safety in Text-to-Image Models

Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models

CSPO: Constraint-Sensitive Policy Optimization for Safe Reinforcement Learning

Causal Object-Centric Models for Planning with Monte Carlo Tree Search

Kine2Go: Kinematic dataset for the Unitree Go2 robot with diverse gaits and motions

From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

Provably Safe, Yet Scalable Reinforcement Learning

VISTA: View-Consistent Self-Verified Training for GUI Grounding

A Statistical and Machine Learning Framework for Operational Threshold Detection and Deployable Dispatch Controller Development in Hydrogen Multi-Energy Systems

Safe Reinforcement Learning of Autonomous Highway Driving: A Unified Framework for Safety and Efficiency

HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities

CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment

Learning Coordinated Preference for Multi-Objective Multi-Agent Reinforcement Learning

Keyword: diffusion policy

Diffusion Policy Optimization without Drifting Apart

Spatially Conditioned Diffusion Policy: Learning Precise and Robust Manipulation with a Single RGB Camera
