Generated: 2025-12-22 16:34:35 (UTC+8); arXiv release time: 2025-12-22 20:00 EST (2025-12-23 09:00 UTC+8)

31 relevant papers today

Keyword: reinforcement learning

Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning

UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering

Value Under Ignorance in Universal Artificial Intelligence

Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making

Reinforcement Learning for Self-Improving Agent with Skill Library

Towards Senior-Robot Interaction: Reactive Robot Dog Gestures

Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System

Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors

MAPPO-LCR: Multi-Agent Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games

MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency

Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning

Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

A Theoretical Analysis of State Similarity Between Markov Decision Processes

Understanding Generalization in Role-Playing Models via Information Theory

Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation

Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks

Xiaomi MiMo-VL-Miloco Technical Report

Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning

Learning Safe Autonomous Driving Policies Using Predictive Safety Representations

SCOPE: Sequential Causal Optimization of Process Interventions

Trust-Region Adaptive Policy Optimization

About Time: Model-free Reinforcement Learning with Timed Reward Machines

Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes

AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning

Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy

Keyword: diffusion policy

Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation
