Generated: 2026-01-26 16:35:47 (UTC+8); arXiv release time: 2026-01-26 20:00 EST (2026-01-27 09:00 UTC+8)

16 relevant papers today

Keyword: reinforcement learning

A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning

Clarify or Answer: Reinforcement Learning for Agentic VQA with Context Under-specification

Towards a Theoretical Understanding to the Generalization of RLHF

Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture

Endless Terminals: Scaling RL Environments for Terminal Agents

Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic

UAV-Assisted Joint Data Collection and Wireless Power Transfer for Batteryless Sensor Networks

Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab

A Cognitive Framework for Autonomous Agents: Toward Human-Inspired Design

Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation

Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation

LongCat-Flash-Thinking-2601 Technical Report

Reasoning Promotes Robustness in Theory of Mind Tasks

Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators

The Trajectory Alignment Coefficient in Two Acts: From Reward Tuning to Reward Learning

Keyword: diffusion policy

No results.