Generated: 2026-01-05 16:36:36 (UTC+8); arXiv release time: 2026-01-05 20:00 EST (2026-01-06 09:00 UTC+8)

21 relevant papers today

Keyword: reinforcement learning

Reinforcement learning with timed constraints for robotics motion planning

Universal Adaptive Constraint Propagation: Scaling Structured Inference for Large Language Models via Meta-Reinforcement Learning

GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments

Reinforcement Learning with Function Approximation for Non-Markov Processes

Online Finetuning Decision Transformers with Pure RL Gradients

Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing

Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective

Can Optimal Transport Improve Federated Inverse Reinforcement Learning?

Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions

Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models

CPPO: Contrastive Perception for Vision Language Policy Optimization

Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning

Vision-based Goal-Reaching Control for Mobile Robots Using a Hierarchical Learning Framework

RoboReward: General-Purpose Vision-Language Reward Models for Robotics

IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning

ARISE: Adaptive Reinforcement Integrated with Swarm Exploration

Precision Autotuning for Linear Solvers via Contextual Bandit-Based RL

Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Keyword: diffusion policy

No results