Generated: 2026-03-25 16:55:47 (UTC+8); arXiv release time: 2026-03-25 20:00 EDT (2026-03-26 08:00 UTC+8)

33 relevant papers today

Keyword: reinforcement learning

TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis

WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement

Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning

CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

Q-Tacit: Image Quality Assessment via Latent Visual Reasoning

Improving Safety Alignment via Balanced Direct Preference Optimization

CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Quality Over Clicks: Intrinsic Quality-Driven Iterative Reinforcement Learning for Cold-Start E-Commerce Query Suggestion

From Morality Installation in LLMs to LLMs in Morality-as-a-System

MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards

SpecXMaster Technical Report

Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots

ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment

GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL

Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning

A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling

Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots

Off-Policy Value-Based Reinforcement Learning for Large Language Models

A Joint Reinforcement Learning Scheduling and Compression Framework for Teleoperated Driving

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Keyword: diffusion policy

DiSCo: Diffusion Sequence Copilots for Shared Autonomy

Efficient Hybrid SE(3)-Equivariant Visuomotor Flow Policy via Spherical Harmonics for Robot Manipulation
