Generated: 2025-11-20 16:31:02 (UTC+8); arXiv release time: 2025-11-20 20:00 EST (2025-11-21 09:00 UTC+8)

33 related papers today

Keyword: reinforcement learning

Causally-Informed Reinforcement Learning for Adaptive Emotion-Aware Social Media Recommendation

Learning Interestingness in Automated Mathematical Theory Formation

Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization

Transformer-Guided Deep Reinforcement Learning for Optimal Takeoff Trajectory Design of an eVTOL Drone

Skin-R1: Toward Trustworthy Clinical Reasoning for Dermatological Diagnosis

Z-Merge: Multi-Agent Reinforcement Learning for On-Ramp Merging with Zone-Specific V2X Traffic Information

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning

Simulated Human Learning in a Dynamic, Partially-Observed, Time-Series Environment

Distributed primal-dual algorithm for constrained multi-agent reinforcement learning under coupled policies

Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs

Vehicle Routing Problems via Quantum Graph Attention Network Deep Reinforcement Learning

Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning

Learning Where, What and How to Transfer: A Multi-Role Reinforcement Learning Approach for Evolutionary Multitasking

Reasoning in Diffusion Large Language Models is Concentrated in Dynamic Confusion Zones

Symmetry-Breaking in Multi-Agent Navigation: Winding Number-Aware MPC with a Learned Topological Strategy

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning

ChartEditor: A Reinforcement Learning Framework for Robust Chart Editing

Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception

Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments

Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention

Terra Nova: A Comprehensive Challenge Environment for Intelligent Agents

Communication-Pipelined Split Federated Learning for Foundation Model Fine-Tuning in UAV Networks

Meta-Black-Box Optimization with Bi-Space Landscape Analysis and Dual-Control Mechanism for SAEA

SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Continual Reinforcement Learning for Cyber-Physical Systems: Lessons Learned and Open Challenges

VisPlay: Self-Evolving Vision-Language Models from Images

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

The Impact of Quantization on Large Reasoning Model Reinforcement Learning

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Keyword: diffusion policy

Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies
