Generated: 2026-03-13 16:44:26 (UTC+8); arXiv announcement time: 2026-03-13 20:00 EDT (2026-03-14 08:00 UTC+8)

48 relevant papers today

Keyword: reinforcement learning

ResWM: Residual-Action World Model for Visual RL

Learning Tree-Based Models with Gradient Descent

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

DeReason: A Difficulty-Aware Curriculum Improves Decoupled SFT-then-RL Training for General Reasoning

Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning

Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings

Meta-Reinforcement Learning with Self-Reflection for Agentic Search

Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning

abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance

Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification

SliceFed: Federated Constrained Multi-Agent DRL for Dynamic Spectrum Slicing in 6G

ARROW: Augmented Replay for RObust World models

Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing

NFPO: Stabilized Policy Optimization of Normalizing Flow for Robotic Policy Learning

SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning

Multi-Agent Reinforcement Learning for UAV-Based Chemical Plume Source Localization

WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models

Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge

STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning

Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding

Hybrid Human-Agent Social Dilemmas in Energy Markets

Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

The price of decentralization in managing engineering systems through multi-agent reinforcement learning

Learning Visuomotor Policy for Multi-Robot Laser Tag Game

Sim-to-reality adaptation for Deep Reinforcement Learning applied to an underwater docking application

AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics

A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

Increasing intelligence in AI agents can worsen collective outcomes

Automatic Generation of High-Performance RL Environments

Linking Perception, Confidence and Accuracy in MLLMs

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

LatentGeo: Learnable Auxiliary Constructions in Latent Space for Multimodal Geometric Reasoning

Integrated Online Monitoring and Adaption of Process Model Predictive Controllers

HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies

Separable neural architectures as a primitive for unified predictive and generative intelligence

Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

Keyword: diffusion policy

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation

Concurrent Prehensile and Nonprehensile Manipulation: A Practical Approach to Multi-Stage Dexterous Tasks
