Generated: 2026-04-08 17:05:07 (UTC+8); arXiv release time: 2026-04-08 20:00 EDT (2026-04-09 08:00 UTC+8)

32 relevant papers today

Keyword: reinforcement learning

Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO

Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

SenseAI: A Human-in-the-Loop Dataset for RLHF-Aligned Financial Sentiment Reasoning

Bypassing the CSI Bottleneck: MARL-Driven Spatial Control for Reflector Arrays

Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

Cross-fitted Proximal Learning for Model-Based Reinforcement Learning

Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem

Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification

Curr-RLCER: Curriculum Reinforcement Learning For Coherence Explainable Recommendation

Neural Assistive Impulses: Synthesizing Exaggerated Motions for Physics-based Characters

Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning

ActivityEditor: Learning to Synthesize Physically Valid Human Mobility

SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills

COSMO-Agent: Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration

An Iterative Test-and-Repair Framework for Competitive Code Generation

Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming

CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control

Can Large Language Models Reinvent Foundational Algorithms?

Emergent social transmission of model-based representations without inference

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

Reinforcement Learning with Negative Tests as Completeness Signal for Formal Specification Synthesis

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies

AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning

Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning

MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

Keyword: diffusion policy

There are no results.