Generated: 2026-04-30 18:08:15 (UTC+8); arXiv release time: 2026-04-30 20:00 EDT (2026-05-01 08:00 UTC+8)

18 relevant articles today

Keyword: reinforcement learning

Digital Twin-assisted belief-state reinforcement learning for latency-robust ISAC in 6G networks

A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems

AI Observability for Large Language Model Systems: A Multi-Layer Analysis of Monitoring Approaches from Confidence Calibration to Infrastructure Tracing

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking

Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

Learning to Route Electric Trucks Under Operational Uncertainty

PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners

ATLAS: An Annotation Tool for Long-horizon Robotic Action Segmentation

FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

Factorized Latent Reasoning for LLM-based Recommendation

Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training

Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics

ClawGym: A Scalable Framework for Building Effective Claw Agents

Keyword: diffusion policy

No results.