Generated: 2025-11-28 16:30:16 (UTC+8); arXiv release time: 2025-11-27 20:00 EST (2025-11-28 09:00 UTC+8)

22 relevant papers today

Keyword: reinforcement learning

Video Object Recognition in Mobile Edge Networks: Local Tracking or Edge Detection?

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems

Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

ICPO: Intrinsic Confidence-Driven Group Relative Preference Optimization for Efficient Reinforcement Learning

Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

Dual-Agent Reinforcement Learning for Adaptive and Cost-Aware Visual-Inertial Odometry

SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation

Maglev-Pentabot: Magnetic Levitation System for Non-Contact Manipulation using Deep Reinforcement Learning

Kinematics-Aware Multi-Policy Reinforcement Learning for Force-Capable Humanoid Loco-Manipulation

Sparse shepherding control of large-scale multi-agent systems via Reinforcement Learning

Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Decentralized Shepherding of Non-Cohesive Swarms Through Cluttered Environments via Deep Reinforcement Learning

Predictive Safety Shield for Dyna-Q Reinforcement Learning

BAMAS: Structuring Budget-Aware Multi-Agent Systems

Escaping the Verifier: Learning to Reason via Demonstrations

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Keyword: diffusion policy

No results found.