Generated: 2026-05-05 18:07:03 (UTC+8); arXiv release time: 2026-05-05 20:00 EDT (2026-05-06 08:00 UTC+8)

79 related papers today

Keyword: reinforcement learning

RA-CMF: Region-Adaptive Conditional MeanFlow for CT Image Reconstruction

Interpretable experiential learning based on state history and global feedback

PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation

Your Loss is My Gain: Low Stake Attacks on Liquid Staking Pools

Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning

Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines

Learning to Race in Minutes: Infoprop Dyna on the Mini Wheelbot

PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs

Forager: a lightweight testbed for continual learning with partial observability in RL

The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining

Zero-Shot Signal Temporal Logic Planning with Disjunctive Branch Selection in Dynamic Semantic Maps

Breaking the Computational Barrier: Provably Efficient Actor-Critic for Low-Rank MDPs

S^3-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data

Bi-Level Reinforcement Learning Control for an Underactuated Blimp via Center-of-Mass Reconfiguration

Beyond Perceptual Shortcuts: Causal-Inspired Debiasing Optimization for Generalizable Video Reasoning in Lightweight MLLMs

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

LLM Output Detectability and Task Performance Can be Jointly Optimized

Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data

PACE: Parameter Change for Unsupervised Environment Design

Coordination Architecture Shapes Continuous Demand Response Outcomes in Building Districts

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging

An Intelligent eUPF for Time-Sensitive Path Selection in B5G Edge Networks

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning

Protein-Conditioned Multi-Objective Reinforcement Learning for Full-Length mRNA Design

Dynamics Distillation for Efficient and Transferable Control Learning

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models

Hybrid Quantum Reinforcement Learning with QAOA for Improved Vehicle Routing Optimization

TRIMMER: A New Paradigm for Video Summarization through Self-Supervised Reinforcement Learning

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

Zero-Shot, Safe and Time-Efficient UAV Navigation via Potential-Based Reward Shaping, Control Lyapunov and Barrier Functions

MAGIC: Multi-Step Advantage-Gated Causal Influence for Multi-agent Reinforcement Learning

Selector-Guided Autonomous Curriculum for One-Shot Reinforcement Learning from Verifiable Rewards

RMGAP: Benchmarking the Generalization of Reward Models across Diverse Preferences

Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning

Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts

Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

AdamO: A Collapse-Suppressed Optimizer for Offline RL

Stability of Control Lyapunov Function Guided Reinforcement Learning

Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization

Optimization of CV-QKD Under Practical Constraints

Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks

Reinforcement Learning Trained Observer Control for Bearings-Only Tracking

Hierarchical Cooperative MARL for Joint Downlink PRB and Power Allocation in a 5G System

Combining Trained Models in Reinforcement Learning

Experience Constrained Hierarchical Federated Reinforcement Learning for Large-scale UAV Teams in Hazardous Environments

Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning

T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

Do We Really Need Immediate Resets? Rethinking Collision Handling for Efficient Robot Navigation

ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring

Break the Block: Dynamic-size Reasoning Blocks for Diffusion Large Language Models via Monotonic Entropy Descent with Reinforcement Learning

Compositional Multi-hop Factual Error Correction via Decomposition-and-Injection

A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

Differentiable Kernel Ridge Regression for Deep Learning Pipelines

Binary Rewards and Reinforcement Learning: Fundamental Challenges

Enhancing Multimodal In-Context Learning via Inductive-Deductive Reasoning

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent

Efficient Preference Poisoning Attack on Offline RLHF

Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability

Gradient-Gated DPO: Stabilizing Preference Optimization in Language Models

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE

AcademiClaw: When Students Set Challenges for AI Agents

Federated Reinforcement Learning for Efficient Mobile Crowdsensing under Incomplete Information

Perceptual Flow Network for Visually Grounded Reasoning

A decoupled diffusion planner that adapts to changing cost limits by using cost-conditioned generation for safety and reward gradients for performance

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters

Keyword: diffusion policy

Hydra-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control
