Generated: 2025-11-05 18:05:23 (UTC+8); arXiv release time: 2025-11-05 20:00 EST (2025-11-06 09:00 UTC+8)

26 relevant papers today

Keyword: reinforcement learning

Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

Automated Reward Design for Gran Turismo

Second-Order Policy Gradient Methods for the Linear Quadratic Regulator

A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms

Training Proactive and Personalized LLM Agents

Adaptive Cooperative Transmission Design for Ultra-Reliable Low-Latency Communications via Deep Reinforcement Learning

Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments

Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control

SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

Reinforcement learning based data assimilation for unknown state model

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning

ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

Auditable-choice reframing unlocks RL-based verification for open-ended tasks

Dexterous Robotic Piano Playing at Scale

An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Directional-Clamp PPO

Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning

Natural-gas storage modelling by deep reinforcement learning

Curriculum Design for Trajectory-Constrained Agent: Compressing Chain-of-Thought Tokens in LLMs

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning

From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos

Keyword: diffusion policy

No results.