Generated: 2026-03-10 16:52:10 (UTC+8); arXiv release time: 2026-03-10 20:00 EDT (2026-03-11 08:00 UTC+8)

93 relevant papers today

Keyword: reinforcement learning

Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning

Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection

Multi-Agent DRL for V2X Resource Allocation: Disentangling Challenges and Benchmarking Solutions

Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible Reinforcement Learning Research

Not all tokens are needed (NAT): token efficient reinforcement learning

Advances in GRPO for Generation Models: A Survey

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

Digital Twin-Enabled Mobility-Aware Cooperative Caching in Vehicular Edge Computing

GameVerse: Can Vision-Language Models Learn from Video-based Reflection?

Hybrid Orchestration of Edge AI and Microservices via Graph-based Self-Imitation Learning

Don't Freeze, Don't Crash: Extending the Safe Operating Range of Neural Navigation in Dense Crowds

HybridMimic: Hybrid RL-Centroidal Control for Humanoid Motion Mimicking

HGT-Scheduler: Deep Reinforcement Learning for the Job Shop Scheduling Problem via Heterogeneous Graph Transformers

Optimistic Policy Regularization

Multi-Agent Reinforcement Learning with Submodular Reward

Reinforcing the World's Edge: A Continual Learning Problem in the Multi-Agent-World Boundary

Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration

Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards

Topology-Aware Reinforcement Learning over Graphs for Resilient Power Distribution Networks

NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning

Diffusion Controller: Framework, Algorithms and Parameterization

AdaGen: Learning Adaptive Policy for Image Synthesis

AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge

RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States

SSP: Safety-guaranteed Surgical Policy via Joint Optimization of Behavioral and Spatial Constraints

Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR

Facial Expression Generation Aligned with Human Preference for Natural Dyadic Interaction

Learning From Failures: Efficient Reinforcement Learning Control with Episodic Memory

$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving

Reinforcement Learning for Vehicle-to-Grid Voltage Regulation: Single-Hub to Multi-Hub Coordination with Battery-Aware Constraints

Learning When to Cooperate Under Heterogeneous Goals

Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

Adaptive Double-Booking Strategy for Outpatient Scheduling Using Multi-Objective Reinforcement Learning

AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing

Underwater Embodied Intelligence for Autonomous Robots: A Constraint-Coupled Perspective on Planning, Control, and Deployment

Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests

Generalization in Online Reinforcement Learning for Mobile Agents

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models

EvolveReason: Self-Evolving Reasoning Paradigm for Explainable Deepfake Facial Image Identification

InterReal: A Unified Physics-Based Imitation Framework for Learning Human-Object Interaction Skills

Reinforcement learning-based dynamic cleaning scheduling framework for solar energy system

TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

COOL-MC: Verifying and Explaining RL Policies for Multi-bridge Network Maintenance

Constraints Matrix Diffusion based Generative Neural Solver for Vehicle Routing Problems

GeoLoco: Leveraging 3D Geometric Priors from Visual Foundation Model for Robust RGB-Only Humanoid Locomotion

Exoskeleton Control through Learning to Reduce Biological Joint Moments in Simulations

Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving

Numerical Approach for On-the-Fly Active Flow Control via Flow Map Learning Method

Mitigating the Memory Bottleneck with Machine Learning-Driven and Data-Aware Microarchitectural Techniques

Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

Residual Control for Fast Recovery from Dynamics Shifts

ProgAgent: A Continual RL Agent with Progress-Aware Rewards

Toward Global Intent Inference for Human Motion by Inverse Reinforcement Learning

Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing

Relating Reinforcement Learning to Dynamic Programming-Based Planning

SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

SMGI: A Structural Theory of General Artificial Intelligence

SGG-R$^{\rm 3}$: From Next-Token Prediction to End-to-End Unbiased Scene Graph Generation

Model-Free DRL Control for Power Inverters: From Policy Learning to Real-Time Implementation via Knowledge Distillation

VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

On the Feasibility and Opportunity of Autoregressive 3D Object Detection

MJ1: Multimodal Judgment via Grounded Verification

ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning

In-Context Reinforcement Learning for Tool Use in Large Language Models

Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

DeReCo: Decoupling Representation and Coordination Learning for Object-Adaptive Decentralized Multi-Robot Cooperative Transport

Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting

Towards Human-Like Manipulation through RL-Augmented Teleoperation and Mixture-of-Dexterous-Experts VLA

RexDrug: Reliable Multi-Drug Combination Extraction through Reasoning-Enhanced LLMs

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective

A Recipe for Stable Offline Multi-agent Reinforcement Learning

Aligning to Illusions: Choice Blindness in Human and AI Feedback

Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems

Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Integrating Lagrangian Neural Networks into the Dyna Framework for Reinforcement Learning

Oracle-Guided Soft Shielding for Safe Move Prediction in Chess

Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation

Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

Diff-Muscle: Efficient Learning for Musculoskeletal Robotic Table Tennis

Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery

How Far Can Unsupervised RLVR Scale LLM Training?

Agentic Critical Training

Keyword: diffusion policy

DexKnot: Generalizable Visuomotor Policy Learning for Dexterous Bag-Knotting Manipulation
