Generated: 2026-03-27 16:56:12 (UTC+8); arXiv release time: 2026-03-27 20:00 EDT (2026-03-28 08:00 UTC+8)

31 relevant papers today

Keyword: reinforcement learning

Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards

Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

Gaze patterns predict preference and confidence in pairwise AI image evaluation

Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization

COIN: Collaborative Interaction-Aware Multi-Agent Reinforcement Learning for Self-Driving Systems

Unbiased Multimodal Reranking for Long-Tail Short-Video Search

MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models

Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Distributed Real-Time Vehicle Control for Emergency Vehicle Transit: A Scalable Cooperative Method

VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Bridging Perception and Reasoning: Token Reweighting for RLVR in Multimodal LLMs

MSRL: Scaling Generative Multimodal Reward Modeling via Multi-Stage Reinforcement Learning

AnyDoc: Enhancing Document Generation via Large-Scale HTML/CSS Data Synthesis and Height-Aware Reinforcement Optimization

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References

Offline Decision Transformers for Neural Combinatorial Optimization: Surpassing Heuristics on the Traveling Salesman Problem

Macroscopic Characteristics of Mixed Traffic Flow with Deep Reinforcement Learning Based Automated and Human-Driven Vehicles

DRL-Based Spectrum Sharing for RIS-Aided Local High-Quality Wireless Networks

Integrating Deep RL and Bayesian Inference for ObjectNav in Mobile Robotics

TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

Cooperative Deep Reinforcement Learning for Fair RIS Allocation

LanteRn: Latent Visual Structured Reasoning

Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Keyword: diffusion policy

FODMP: Fast One-Step Diffusion of Movement Primitives Generation for Time-Dependent Robot Actions
