Generated: 2025-12-11 16:32:16 (UTC+8); arXiv release time: 2025-12-11 20:00 EST (2025-12-12 09:00 UTC+8)

22 relevant papers today

Keyword: reinforcement learning

Enhancing Reliability across Short and Long-Form QA via Reinforcement Learning

Optimizing Algorithms for Mobile Health Interventions with Active Querying Optimization

Financial Instruction Following Evaluation (FIFE)

Training Multi-Image Vision Agents via End2End Reinforcement Learning

Learning Unmasking Policies for Diffusion Language Models

Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning

Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping

COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning

CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning

Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation

RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning

Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search

Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing

SynthPix: A lightspeed PIV images generator

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies

MOA: Multi-Objective Alignment for Role-Playing Agents

RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning

ChronusOmni: Improving Time Awareness of Omni Large Language Models

FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning

STACHE: Local Black-Box Explanations for Reinforcement Learning Policies

Keyword: diffusion policy

Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
