Generated: 2026-06-24 18:52:28 (UTC+8); arXiv publication time: 2026-06-24 20:00 EDT (2026-06-25 08:00 UTC+8)

27 relevant articles today

Keyword: reinforcement learning

EXPO-SQL: Execution-based Clause-level Policy Optimization for Text-to-SQL

Enforcing Human-like Kinematics in Dexterous Piano Playing via Adversarial Posture Regularization

KLip-PPO: A per-sample KL perspective on PPO-Clip

Offline Reinforcement Learning for Warehouse SLAM Throughput Control

Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

TurboMPC: Fast, Scalable, and Differentiable Model Predictive Control on the GPU

Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation

An LMM for Precisely Grounding Elements in Documents

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

AsyncOPD: How Stale Can On-Policy Distillation Be?

An Introduction to Causal Reinforcement Learning

SkyChain Intelligence: A Blockchain-Secured Multi-Agent DRL Framework for Low-Altitude Embodied Artificial Intelligence

Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment

Managing Task Execution for Unknown Workloads in Batteryless IoT: A Hardware-Agnostic Evaluation

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation

BRAVR: An AP-Assisted Online DRL Mechanism for Interactive VR Bitrate Adaptation over Wi-Fi

NoContactNoWorries: Estimating Contact through Vision and Proprioception for In-Hand Dexterous Manipulation

video-SALMONN-R$^3$: Learning to ReWatch, ReAsk, and ReAnswer for Efficient Video Understanding

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

PointVG-R: Internalizing Geometric Reasoning in MLLMs for Precise Pointing Localization via Visual Chain of Thought

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

ViTexQA: A Multi-Frame Temporal Perception Dataset for Video Text Question Answering

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

LaGO: Latent Action Guidance for Online Reinforcement Learning

Keyword: diffusion policy

No results