Generated: 2025-12-12 16:33:45 (UTC+8); arXiv release: 2025-12-12 20:00 EST (2025-12-13 09:00 UTC+8)

32 relevant papers today

Keyword: reinforcement learning

TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0

Latent Action World Models for Control with Unlabeled Trajectories

Diffusion Is Your Friend in Show, Suggest and Tell

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation

Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation

Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation

An exploration for higher efficiency in multi objective optimisation with reinforcement learning

Latent Chain-of-Thought World Modeling for End-to-End Driving

Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine

Multi-dimensional Preference Alignment by Conditioning Reward Itself

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

HypeR Adaptivity: Joint $hr$-Adaptive Meshing via Hypergraph Multi-Agent Deep Reinforcement Learning

UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning

Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Grounding Everything in Tokens for Multimodal Large Language Models

Multi-Objective Reward and Preference Optimization: Theory and Algorithms

AgriGPT-Omni: A Unified Speech-Vision-Text Framework for Multilingual Agricultural Intelligence

Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning

How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Learning to Split: A Reinforcement-Learning-Guided Splitting Heuristic for Neural Network Verification

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments

Iterative Compositional Data Generation for Robot Control

Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation

Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Keyword: diffusion policy

Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation

ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning
