Generated: 2026-02-26 16:52:05 (UTC+8); arXiv release time: 2026-02-26 20:00 EST (2026-02-27 09:00 UTC+8)

36 relevant papers today

Keyword: reinforcement learning

ImpRIF: Stronger Implicit Reasoning Leads to Better Complex Instruction Following

Cross domain Persistent Monitoring for Hybrid Aerial Underwater Vehicles

Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation

Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking

GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning

See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs

Training Generalizable Collaborative Agents via Strategic Risk Aversion

Which Tool Response Should I Trust? Tool-Expertise-Aware Chest X-ray Agent with Multimodal Agentic Learning

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

Learning Agile and Robust Omnidirectional Aerial Motion on Overactuated Tiltable-Quadrotors

Tacmap: Bridging the Tactile Sim-to-Real Gap via Geometry-Consistent Penetration Depth Map

RuCL: Stratified Rubric-Based Curriculum Learning for Multimodal Large Language Model Reasoning

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning

Hierarchical Lead Critic based Multi-Agent Reinforcement Learning

Two-Stage Active Distribution Network Voltage Control via LLM-RL Collaboration: A Hybrid Knowledge-Data-Driven Approach

Evaluating the relationship between regularity and learnability in recursive numeral systems using Reinforcement Learning

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations

Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling

Enhancing Multi-Modal LLMs Reasoning via Difficulty-Aware Group Normalization

Generalisation of RLHF under Reward Shift and Clipped KL Regularisation

DexRepNet++: Learning Dexterous Robotic Manipulation with Geometric and Spatial Hand-Object Representations

Self-Curriculum Model-based Reinforcement Learning for Shape Control of Deformable Linear Objects

LightSim: A Lightweight Cell Transmission Model Simulator for Traffic Signal Control Research

Distill and Align Decomposition for Enhanced Claim Verification

ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection

RADAR: Reasoning as Discrimination with Aligned Representations for LLM-based Knowledge Graph Reasoning

PanoEnv: Exploring 3D Spatial Intelligence in Panoramic Environments with Reinforcement Learning

System Design of the Ultra Mobility Vehicle: A Driving, Balancing, and Jumping Bicycle Robot

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual

Improving Parametric Knowledge Access in Reasoning Language Models

Keyword: diffusion policy

ADM-DP: Adaptive Dynamic Modality Diffusion Policy through Vision-Tactile-Graph Fusion for Multi-Agent Manipulation