Generated: 2026-01-20 16:36:37 (UTC+8); arXiv announcement time: 2026-01-19 20:00 EST (2026-01-20 09:00 UTC+8)

24 related papers today

Keyword: reinforcement learning

Energy-Efficient Omnidirectional Locomotion for Wheeled Quadrupeds via Predictive Energy-Aware Nominal Gait Selection

Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration

Reasoning Models Generate Societies of Thought

Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning

Realistic Curriculum Reinforcement Learning for Autonomous and Sustainable Marine Vessel Navigation

Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation

MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement

Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration

BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model

Deep GraphRAG: A Balanced Approach to Hierarchical Retrieval and Adaptive Integration

TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech

Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems

Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency

The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning

Factored Value Functions for Graph-Based Multi-Agent Reinforcement Learning

Generative Scenario Rollouts for End-to-End Autonomous Driving

Do explanations generalize across large reasoning models?

Keyword: diffusion policy

Multi-Agent Formation Navigation Using Diffusion-Based Trajectory Generation

X-Distill: Cross-Architecture Vision Distillation for Visuomotor Learning
