Generated: 2025-12-04 16:32:30 (UTC+8); arXiv release time: 2025-12-04 20:00 EST (2025-12-05 09:00 UTC+8)

42 related papers today

Keyword: reinforcement learning

Safe and Sustainable Electric Bus Charging Scheduling with Constrained Hierarchical DRL

Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration

Hierarchical Process Reward Models are Symbolic Vision Learners

Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments

GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding

A Multi-Agent, Policy-Gradient approach to Network Routing

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

SpatialReasoner: Active Perception for Large-Scale 3D Scene Understanding

Better World Models Can Lead to Better Post-Training Performance

World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations

Multimodal Reinforcement Learning with Agentic Verifier for AI Agents

PretrainZero: Reinforcement Active Pretraining

Variable-Impedance Muscle Coordination under Slow-Rate Control Frequencies and Limited Observation Conditions Evaluated through Legged Locomotion

Adaptive sampling using variational autoencoder and reinforcement learning

Multi-Agent Reinforcement Learning with Communication-Constrained Priors

A Learning-based Control Methodology for Transitioning VTOL UAVs

RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL

Accelerating Detailed Routing Convergence through Offline Reinforcement Learning

A Descriptive Model for Modelling Attacker Decision-Making in Cyber-Deception

ContactRL: Safe Reinforcement Learning based Motion Planning for Contact based Human Robot Collaboration

Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks

Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing

Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control

Thinking with Programming Vision: Towards a Unified View for Thinking with Images

Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective

Sample-Efficient Model-Free Policy Gradient Methods for Stochastic LQR via Robust Linear Regression

Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving

Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning

AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

MPCFormer: A physics-informed data-driven approach for explainable socially-aware autonomous driving

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

Multi-Agent Deep Reinforcement Learning for UAV-Assisted 5G Network Slicing: A Comparative Study of MAPPO, MADDPG, and MADQN

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning

Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware

Hierarchical Vision Language Action Model Using Success and Failure Demonstrations

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning

SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

SkillFactory: Self-Distillation For Learning Cognitive Behaviors

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

Keyword: diffusion policy

No results found