Generated: 2026-06-26 18:54:37 (UTC+8); arXiv release time: 2026-06-26 20:00 EDT (2026-06-27 08:00 UTC+8)

44 relevant papers today

Keyword: reinforcement learning

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

DocArena: Turning Raw Documents into Controllable Training Environments for Document Search Agents

Privacy-Aware Agent Collaboration for Dynamic VR Slice Management in 6G SD-RAN

Reinforcement Learning Enables Autonomous Microrobot Navigation and Intervention in Simulated Blood Capillaries

Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

RMTL: Reinforced Micro-task Learning for Long-Horizon Manipulation with VLM Rewards

HALO: Hierarchical Auction-assisted Learning for Offloading in SAGIN

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

Racing a Wheeled Quadruped: Active Load Transfer Mitigation via Model Predictive Control

EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning

Mesh-RL: Coupled subgrid reinforcement learning

Scaling Nonlinear Optimization: Many Problems One GPU

MPC-Injection: Biasing Off-Policy Locomotion RL Toward Controller-Induced Behavior Basins

Deterministic Pareto-Optimal Policy Synthesis for Multi-Objective Reinforcement Learning

Geometry-Aware MCTS for Extremal Problems in Combinatorial Geometry

Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?

AXLE: A Cloud Infrastructure for Lean 4 Theorem Proving Utilities

Finding the Time to Think: Learning Planning Budgets in Real-Time RL

Sample-efficient Transfer Reinforcement Learning via Adaptive Reward Shaping and Policy-Ratio Reweighting Strategy

VoiceTTA: Enhancing Zero-Shot Text-to-Speech via Reinforcement Learning-Based Test-Time Adaptation

Revisiting Action Factorization for Complex Action Spaces

EvoOptiGraph: Weakness-Driven Coevolution via Graph-Based Structural Generation for Optimization Modeling

NebulaExp-8B: An Empirical Post-Training Pipeline via Full-Scale Ablation Research

PressMimic: Pressure-Guided Motion Capture and Control for Humanoid Robot Imitation

AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

Humanoid-DART: Humanoid Loco-Manipulation using Diffusion-guided Augmentation through Relabeling and Tracking

PlanRL: A Trajectory Planning Architecture for Reinforcement Learning-based Driving Experts

SpatialFlow-GRPO: Where Spatial Credit Drives Image Editing

GEOALIGN: Geometric Rollout Curation for Robust LLM Reinforcement Learning

PortraitGen: Exemplar-Driven GRPO with Dual-Reward Guidance for Photorealistic Portrait Generation

RobOralScan: Learning Active Intraoral Scanning for Robotic Dental Reconstruction

RolloutPipe: Overlapping Pipelined Rollout and Training in Disaggregated On-Policy LLM Reinforcement Learning

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

State Representation Matters in Deep Reinforcement Learning: Application to Energy Trading

Heavy-Ball Q-Learning with Residual Weighting Correction

Automating Potential-based Reward Shaping with Vision Language Model Guidance

Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

Designing Reward Signals for Portable Query Generation: A Case Study in Industrial Semantic Job Search

Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity

Bridging Performance and Generalization in Reinforcement Learning for Agile Flight

Reinforcement Learning without Ground-Truth Solutions can Improve LLMs

Keyword: diffusion policy

Bridging Handheld and Teleoperated Supervision for Contact-Rich Manipulation via State-Gated Experts
