Generated: 2025-11-13 16:30:02 (UTC+8); arXiv release time: 2025-11-13 20:00 EST (2025-11-14 09:00 UTC+8)

25 relevant papers today

Keyword: reinforcement learning

Interpretable by Design: Query-Specific Neural Modules for Explainable Reinforcement Learning

Structured Uncertainty guided Clarification for LLM Agents

TIGER-MARL: Enhancing Multi-Agent Reinforcement Learning with Temporal Information through Graph-based Embeddings and Representations

UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

A Shared Control Framework for Mobile Robots with Planning-Level Intention Prediction

Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning

Achieving Equilibrium under Utility Heterogeneity: An Agent-Attention Framework for Multi-Agent Multi-Objective Reinforcement Learning

SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving

Advancing Autonomous Emergency Response Systems: A Generative AI Perspective

APEX: Action Priors Enable Efficient Exploration for Robust Motion Tracking on Legged Robots

Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks

History-Aware Reasoning for GUI Agents

Efficient Reasoning via Reward Model

Learning Efficient Communication Protocols for Multi-Agent Reinforcement Learning

Iterated Population Based Training with Task-Agnostic Restarts

Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization

Stabilizing Reinforcement Learning for Honesty Alignment in Language Models on Deductive Reasoning

A Distributed Training Architecture For Combinatorial Optimization

CoRL-MPPI: Enhancing MPPI With Learnable Behaviours For Efficient And Provably-Safe Multi-Robot Collision Avoidance

Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm

AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting

SPIDER: Scalable Physics-Informed Dexterous Retargeting

Quasi-Newton Compatible Actor-Critic for Deterministic Policies

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Keyword: diffusion policy

No results.