Generated: 2025-11-19 16:30:31 (UTC+8); arXiv release time: 2025-11-19 20:00 EST (2025-11-20 09:00 UTC+8)

25 relevant papers today

Keyword: reinforcement learning

Deep reinforcement learning-based spacecraft attitude control with pointing keep-out constraint

Quantifying Distribution Shift in Traffic Signal Control with Histogram-Based GEH Distance

Beat the long tail: Distribution-Aware Speculative Decoding for RL Training

TaoSearchEmb: A Multi-Objective Reinforcement Learning Framework for Dense Retrieval in Taobao Search

Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding

GRPO Privacy Is at Risk: A Membership Inference Attack Against Reinforcement Learning With Verifiable Rewards

Text-Driven Reasoning Video Editing via Reinforcement Learning on Digital Twin Representations

Fair-GNE: Generalized Nash Equilibrium-Seeking Fairness in Multiagent Healthcare Automation

A Receding Horizon Reinforcement Learning Framework for Campus Chiller Energy Management - A case study from an Australian University

FreeMusco: Motion-Free Learning of Latent Control for Morphology-Adaptive Locomotion in Musculoskeletal Characters

Parallelizing Tree Search with Twice Sequential Monte Carlo

Object-Centric World Models for Causality-Aware Reinforcement Learning

Don't Miss the Forest for the Trees: In-Depth Confidence Estimation for LLMs via Reasoning over the Answer Space

MA-SLAM: Active SLAM in Large-Scale Unknown Environment using Map Aware Deep Reinforcement Learning

Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Achieving Safe Control Online through Integration of Harmonic Control Lyapunov-Barrier Functions with Unsafe Object-Centric Action Policies

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

ReflexGrad: Three-Way Synergistic Architecture for Zero-Shot Generalization in LLM Agents

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

Failure to Mix: Large language models struggle to answer according to desired probability distributions

Heterogeneous Multi-Agent Proximal Policy Optimization for Power Distribution System Restoration

$π^{*}_{0.6}$: a VLA That Learns From Experience

UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning

Keyword: diffusion policy

Coffee: Controllable Diffusion Fine-tuning
