Generated: 2026-04-16 17:23:31 (UTC+8); arXiv release time: 2026-04-16 20:00 EDT (2026-04-17 08:00 UTC+8)

33 relevant papers today

Keyword: reinforcement learning

Integration of Deep Reinforcement Learning and Agent-based Simulation to Explore Strategies Counteracting Information Disorder

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation

C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic--Vehicle Coordination

Automated co-design of high-performance thermodynamic cycles via graph-based hierarchical reinforcement learning

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning

Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence

Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning

Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection

Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt

Jump-Start Reinforcement Learning with Vision-Language-Action Regularization

Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces

DUET: Joint Exploration of User Item Profiles in Recommendation System

Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning

AlphaCNOT: Learning CNOT Minimization with Model-Based Planning

RPS: Information Elicitation with Reinforcement Prompt Selection

MUSE: Multi-Domain Chinese User Simulation via Self-Evolving Profiles and Rubric-Guided Alignment

Drowsiness-Aware Adaptive Autonomous Braking System based on Deep Reinforcement Learning for Enhanced Road Safety

Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning

DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off

Provably Efficient Offline-to-Online Value Adaptation with General Function Approximation

Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation

Enhancing Local Life Service Recommendation with Agentic Reasoning in Large Language Model

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Keyword: diffusion policy

No results