Generated: 2026-02-18 16:49:53 (UTC+8); arXiv release time: 2026-02-18 20:00 EST (2026-02-19 09:00 UTC+8)

14 related papers today

Keyword: reinforcement learning

CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation

Near-Optimal Sample Complexity for Online Constrained MDPs

MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning

EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use

CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies

Fairness over Equality: Correcting Social Incentives in Asymmetric Sequential Social Dilemmas

Efficient Knowledge Transfer for Jump-Starting Control Policy Learning of Multirotors through Physics-Aware Neural Architectures

Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

Recursive Concept Evolution for Compositional Reasoning in Large Language Models

MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction

GLM-5: from Vibe Coding to Agentic Engineering

Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Keyword: diffusion policy

No results