Generated: 2026-06-11 20:12:55 (UTC+8); arXiv release time: 2026-06-11 20:00 EDT (2026-06-12 08:00 UTC+8)

44 relevant papers today

Keyword: reinforcement learning

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

CFCamo: A Counterfactual Detect-or-Abstain Framework for Camouflaged Object Detection

Multi-agent rendezvous in fluid flows via reinforcement learning

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

PLUME: Probabilistic Latent Unified World Modeling and Parameter Estimation for Multi-Finger Manipulation

Dynamic Execution Horizon Prediction for Chunk-based Robot Policies

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

Agent Skill Evaluation and Evolution: Frameworks and Benchmarks

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

Learning Object Manipulation from Scratch via Contrastive Interaction

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning

Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

UniReason-Med: A Shared Grounded Reasoning Interface for 2D-to-3D Transfer in Medical VQA

TacCoRL: Integrating Tactile Feedback into VLA via Simulation

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning

RePAIR: Predictive Self-Supervised Representation Learning in Chess

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

PAWS: Preference Learning with Advantage-Weighted Segments

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

KinematicRL: A Sim-to-Real Reinforcement Learning Framework For Social Navigation With Kinodynamic Feasibility

World Model Self-Distillation: Training World Models to Solve General Tasks

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

DrivingAgent: Design and Scheduling Agents for Autonomous Driving Systems

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization

Beyond Fully Random Masking: Attention-Guided Denoising and Optimization for Diffusion Language Models

Mathematical perspective on genetic algorithms with optimization guided operators

CCKS: Consensus-based Communication and Knowledge Sharing

Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

APPO: Agentic Procedural Policy Optimization

ATLAS: Active Theory Learning for Automated Science

Keyword: diffusion policy

Adversarial Attacks on Learned Policies for Surgical Robotic Tasks

Blind Dexterous Grasping via Real2Sim2Real Tactile Policy Learning

Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics
