Generated: 2026-03-19 16:45:58 (UTC+8); arXiv release time: 2026-03-19 20:00 EDT (2026-03-20 08:00 UTC+8)

44 relevant papers today

Keyword: reinforcement learning

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation: Radiologist-Like Workflow with Clinically Verifiable Rewards

Federated Multi Agent Deep Learning and Neural Networks for Advanced Distributed Sensing in Wireless Networks

Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability, Stability and Fairness

Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns

Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models

Efficient and Reliable Teleoperation through Real-to-Sim-to-Real Shared Autonomy

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning

SLowRL: Safe Low-Rank Adaptation Reinforcement Learning for Locomotion

REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

Shielded Reinforcement Learning Under Dynamic Temporal Logic Constraints

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Adaptive Anchor Policies for Efficient 4D Gaussian Streaming

Network- and Device-Level Cyber Deception for Contested Environments Using RL and LLMs

WINFlowNets: Warm-up Integrated Networks Training of Generative Flow Networks for Robotics and Machine Fault Adaptation

Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations

ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

Ruyi2.5 Technical Report

Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress

Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing

ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling

A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication

EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection

Efficient Exploration at Scale

CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval

AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization

Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control

Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation

From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation

Complementary Reinforcement Learning

Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies

Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards

Flow Matching Policy with Entropy Regularization

Machine Learning for Network Attacks Classification and Statistical Evaluation of Machine Learning for Network Attacks Classification and Adversarial Learning Methodologies for Synthetic Data Generation

CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Federated Distributional Reinforcement Learning with Distributional Critic Regularization

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents

Procedural Generation of Algorithm Discovery Tasks in Machine Learning

Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs

Training Diffusion Language Models for Black-Box Optimization

Unified Policy Value Decomposition for Rapid Adaptation

Keyword: diffusion policy

No results.