Generated: 2026-05-15 18:27:49 (UTC+8); arXiv release time: 2026-05-15 20:00 EDT (2026-05-16 08:00 UTC+8)

58 relevant papers today

Keyword: reinforcement learning

CA2: Code-Aware Agent for Automated Game Testing

Rethinking Molecular OOD Generalization via Target-Aware Source Selection

WarmPrior: Straightening Flow-Matching Policies with Temporal Priors

R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning

Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertainty

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards

Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)

Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

GenCircuit-RL: Reinforcement Learning from Hierarchical Verification for Genetic Circuit Design

PreFT: Prefill-only finetuning for efficient inference

Quantum Advantage in Multi Agent Reinforcement Learning

Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

Towards Real-Time Autonomous Navigation: Transformer-Based Catheter Tip Tracking in Fluoroscopy

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry

Sub-Band Full Duplex Resource Allocation: A Predictive Deep Reinforcement Learning Approach

CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

Energy-Efficient Quadruped Locomotion with Compliant Feet

A Unified Knowledge Embedded Reinforcement Learning-based Framework for Generalized Capacitated Vehicle Routing Problems

Efficient Generative Retrieval for E-commerce Search with Semantic Cluster IDs and Expert-Guided RL

Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience

LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization

Fully Dynamic Rebalancing in Dockless Bike-Sharing Systems via Deep Reinforcement Learning

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning

Fast Rates for Inverse Reinforcement Learning

DRL-STAF: A Deep Reinforcement Learning Framework for State-Aware Forecasting of Complex Multivariate Hidden Markov Processes

Multi-objective application placement in fog computing using graph neural network-based reinforcement learning

Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model

Addressing Terminal Constraints in Data-Driven Demand Response Scheduling

EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding

Probabilistic Verification of Recurrent Neural Networks for Single and Multi-Agent Reinforcement Learning

Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning

CaMeRL: Collision-Aware and Memory-Enhanced Reinforcement Learning for UAV Navigation in Multi-Scale Obstacle Environments

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models

Chrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Framework

Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations

Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication

Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition

Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

Learning from Language Feedback via Variational Policy Distillation

Self-Distilled Agentic Reinforcement Learning

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Keyword: diffusion policy

No results found