Generated: 2025-12-17 16:33:56 (UTC+8); arXiv release time: 2025-12-17 20:00 EST (2025-12-18 09:00 UTC+8)

22 relevant papers today

Keyword: reinforcement learning

AI-Powered Annotation Pipelines for Stabilizing Large Language Models: A Human-AI Synergy Approach

Meta Hierarchical Reinforcement Learning for Scalable Resource Management in O-RAN

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing

Explainable reinforcement learning from human feedback to improve alignment

Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics

Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model

OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving

Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning

RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees

A First-Order Logic-Based Alternative to Reward Models in RLHF

Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis

Understanding and Improving Hyperbolic Deep Reinforcement Learning

GLM-TTS Technical Report

A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks

Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations

A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data

Context-Picker: Dynamic context selection using multi-stage reinforcement learning

RecGPT-V2 Technical Report

Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Keyword: diffusion policy

No results.