Generated: 2026-03-16 17:00:27 (UTC+8); arXiv release time: 2026-03-16 20:00 EDT (2026-03-17 08:00 UTC+8)

28 relevant papers today

Keyword: reinforcement learning

Thermodynamics of Reinforcement Learning Curricula

Maximum Entropy Exploration Without the Rollouts

Beyond Motion Imitation: Is Human Motion Data Alone Sufficient to Explain Gait Control and Biomechanics?

CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning

Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages

A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control

Collaborative Multi-Agent Optimization for Personalized Memory System

RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction

EvolveCoder: Evolving Test Cases via Adversarial Verification for Code Reinforcement Learning

Think and Answer ME: Benchmarking and Exploring Multi-Entity Reasoning Grounding in Remote Sensing

FLUX: Accelerating Cross-Embodiment Generative Navigation Policies via Rectified Flow and Static-to-Dynamic Learning

Reinforcement Learning for Elliptical Cylinder Motion Control Tasks

A Multi-task Large Reasoning Model for Molecular Science

Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design

Beyond Imitation: Reinforcement Learning Fine-Tuning for Adaptive Diffusion Navigation Policies

Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks

Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration

Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models

Thinking in Streaming Video

Efficient Real-World Autonomous Racing via Attenuated Residual Policy Optimization

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Topo-R1: Detecting Topological Anomalies via Vision-Language Models

Visual-ERM: Reward Modeling for Visual Equivalence

Keyword: diffusion policy

No results found.