Generated: 2025-10-15 16:29:59 (UTC+8); arXiv release time: 2025-10-15 20:00 EDT (2025-10-16 08:00 UTC+8)

35 relevant papers today

Keyword: reinforcement learning

AI Agents for the Dhumbal Card Game: A Comparative Study

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

Don't Walk the Line: Boundary Guidance for Filtered Generation

Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling

ADARL: Adaptive Low-Rank Structures for Robust Policy Learning under Uncertainty

Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning

Scaling Long-Horizon LLM Agent via Context-Folding

Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning

Self-Verifying Reflection Helps Transformers with CoT Reasoning

Reinforced Preference Optimization for Recommendation

PromptFlow: Training Prompts Like Neural Networks

Diffusion Models for Reinforcement Learning: Foundations, Taxonomy, and Development

$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning

Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication

Heterogeneous RBCs via deep multi-agent reinforcement learning

Deep SPI: Safe Policy Improvement via World Models

Finite-time Convergence Analysis of Actor-Critic with Evolving Reward

Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints

Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control

Robot Learning: A Tutorial

Biased-Attention Guided Risk Prediction for Safe Decision-Making at Unsignalized Intersections

Bayesian Optimization for Dynamic Pricing and Learning

A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation

Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Laminar: A Scalable Asynchronous RL Post-Training Framework

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

Expert or not? assessing data quality in offline reinforcement learning

Reasoning Pattern Matters: Learning to Reason without Human Rationales

Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Reflection-Based Task Adaptation for Self-Improving VLA

Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

Keyword: diffusion policy

No results found.