Generated: 2025-12-15 16:34:43 (UTC+8); arXiv release time: 2025-12-15 20:00 EST (2025-12-16 09:00 UTC+8)

20 relevant articles today

Keyword: reinforcement learning

KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering


In-Context Multi-Objective Optimization


Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts


CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound


Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning


Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control


A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation


When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents


RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training


DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning


Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits


Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization


Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance


Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes


Three methods, one problem: Classical and AI approaches to no-three-in-line


Rethinking Expert Trajectory Utilization in LLM Post-training


DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry


UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations


SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support


Agile Flight Emerges from Multi-Agent Competitive Racing


Keyword: diffusion policy

No results found.