Generated: 2026-04-20 17:59:54 (UTC+8); arXiv release time: 2026-04-20 20:00 EDT (2026-04-21 08:00 UTC+8)

23 relevant papers today

Keyword: reinforcement learning

InfoChess: A Game of Adversarial Inference and a Laboratory for Quantifiable Information Control

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models

"Excuse me, may I say something..." CoLabScience, A Proactive AI Assistant for Biomedical Discovery and LLM-Expert Collaborations

CSLE: A Reinforcement Learning Platform for Autonomous Security Management

Flexible Empowerment at Reasoning with Extended Best-of-N Sampling

Majority Voting for Code Generation

Hierarchical Active Inference using Successor Representations

Multi-objective Reinforcement Learning With Augmented States Requires Rewards After Deployment

Zero-Shot Scalable Resilience in UAV Swarms: A Decentralized Imitation Learning Framework with Physics-Informed Graph Interactions

Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)

Scattered Hypothesis Generation for Open-Ended Event Forecasting

Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning

CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

AgentV-RL: Scaling Reward Modeling with Agentic Verifier

Safe Deep Reinforcement Learning for Building Heating Control and Demand-side Flexibility

Beyond One-Size-Fits-All: Adaptive Test-Time Augmentation for Sequential Recommendation

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

Detecting and Suppressing Reward Hacking with Gradient Fingerprints

Find, Fix, Reason: Context Repair for Video Reasoning

Beyond Distribution Sharpening: The Importance of Task Rewards

Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design

Keyword: diffusion policy

VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation
