Generated: 2026-06-25 18:46:02 (UTC+8); arXiv release time: 2026-06-25 20:00 EDT (2026-06-26 08:00 UTC+8)

54 relevant papers today

Keyword: reinforcement learning

ReviewGuard: Aligning LLM-Assisted Peer Review with Long-Term Scientific Impact

LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning

The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

Supervised Reinforcement Learning for the Coordination of Distributed Energy Resources

Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity

Towards Scalable Multi-Task Reinforcement Learning with Large Decision Models

Uncertainty-aware reinforcement learning for chemical language models

Solving Markov Decision Processes with Future Information via MPC

ExTra: Exploratory Trajectory Optimization for Language Model Reinforcement Learning

A Zeroth-Order Deep Learning Method for Fully Nonlinear Parabolic Partial Differential Equations with Unknown Coefficients

Geo-Strat-RL: Learning Geological Event Reasoning from Verifiable Tasks

Bias-Controlled Primal-Dual Natural Actor-Critic: Optimal Rates for Constrained Multi-Objective Average-Reward RL

GCT-MARL: Graph-Based Contrastive Transfer for Sample-Efficient Cooperative Multi-Agent Reinforcement Learning

Energy Efficient Scheduling of AI/ML Workloads on Multi Instance GPUs with Dynamic Repartitioning

RGB: RL Guided Whole-Body MPPI for Humanoid Control

Reward-Conditioned Attention: How Reward Design Shapes What Autonomous Driving Agents See

TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory

Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Learning Perceptive Platform Adaptive Locomotion Controllers for Quadrupedal Robots

SoK: AI Secure Code Generation: Progress, Pitfalls, and Paths Forward

Inverse Reinforcement Learning for Interpretable Keystroke Biomarkers in Parkinson's Disease

DynaMOMA: Instantaneous Prediction of Grasp Poses for Mobile Manipulation of Dynamic Objects

V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

Omni-Perception Policy Optimization for Multimodal Emotion Reasoning

Stagnant Neuron: Towards Understanding the Plasticity Loss in Multi-Agent Reinforcement Learning Value Factorization Methods

AI Coaching for Accelerating Human Skill Development with Reinforcement Learning

Efficient and Trainable Language Model Test-Time Scaling via Local Branch Routing

Compositional Behavioral Semantics for State Abstraction in Reinforcement Learning

FactorLibrary: From Polynomials to Circuits via Recursive Subgoals

MAPL: Multi-Objective Preference Learning for Robot Locomotion

Learning with a Single Rollout via Monte Carlo Pass@k Critic

Rate-Aware Quantum-Inspired Trajectory Learning for Interference-Limited Multi-UAV Networks

Low Variance Trust Region Optimization with Independent Actors and Sequential Updates in Cooperative Multi-agent Reinforcement Learning

Beyond One-Size-Fits-All: Diagnosis-Driven Online Reinforcement Learning with Offline Priors

Latency-Aware Service Placement using Neural Combinatorial Optimisers for Edge-Cloud Systems

FeVOS: Foresight Expression Video Object Segmentation

Low-Complexity Policy Tessellations in Structured Markov Decision Processes

Power-Budgeted Underwater Vehicle Control via Constrained Reinforcement Learning

Memory-Efficient Policy Libraries with Low-Rank Adaptation in Reinforcement Learning

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

StairMaster: Learning to Conquer Risky Hollow Stairs for Agile Quadrupedal Robots

MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources

Semantic Consistency Policy Optimization for Reinforcement Learning of LLM Agents

Enhancing Brain MRI Anomaly Detection and Reasoning with ROI Rethink and Synthetic Data

WinDOM: Self-Family Distillation for Small-Model GUI Grounding

Mixture-of-Experts RL for Fault-Tolerant Legged Locomotion

Hierarchical Reinforcement Learning for Neural Network Compression (HiReLC): Pruning and Quantization

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation

Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It

Learning Robot Visual Navigation in Crowds via Intention-Aware Scene Representations

Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents

On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity

Keyword: diffusion policy

One Body, Two Minds: Variable Autonomy Approach for a Co-embodied Robotic Hand

Stage-Aware and Roughness-Constrained Diffusion Policy for Multi-Stage Robotic Polishing