Generated: 2026-02-25 16:53:58 (UTC+8); arXiv release time: 2026-02-25 20:00 EST (2026-02-26 09:00 UTC+8)

31 relevant papers today

Keyword: reinforcement learning

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning

Sample-Efficient Learning with Online Expert Correction for Autonomous Catheter Steering in Endovascular Bifurcation Navigation

What Matters for Simulation to Online Reinforcement Learning on Real Robots

Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field

Generalizing from References using a Multi-Task Reference and Goal-Driven RL Framework

Diffusion Modulation via Environment Mechanism Modeling for Planning

KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning

A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies

Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services

From Pairs to Sequences: Track-Aware Policy Gradients for Keypoint Detection

TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer

CAMEL: Confidence-Gated Reflection for Reward Modeling

IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation

Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning

Deep Reinforcement Learning Based Block Coordinate Descent for Downlink Weighted Sum-rate Maximization on AI-Native Wireless Networks

Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty

PyVision-RL: Forging Open Agentic Vision Models via RL

Overton Pluralistic Reinforcement Learning for Large Language Models

Probing Dec-POMDP Reasoning in Cooperative MARL

Regret-Guided Search Control for Efficient Learning in AlphaZero

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

Task-oriented grasping for dexterous robots using postural synergies and reinforcement learning

The Art of Efficient Reasoning: Data, Reward, and Optimization

Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning

Cooperative-Competitive Team Play of Real-World Craft Robots

SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards

Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics

Keyword: diffusion policy

Recursive Belief Vision Language Model
