Generated: 2026-04-22 17:23:58 (UTC+8); arXiv release time: 2026-04-22 20:00 EDT (2026-04-23 08:00 UTC+8)

35 relevant papers today

Keyword: reinforcement learning

ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants

Discrete Tilt Matching

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

Prioritizing the Best: Incentivizing Reliable Multimodal Reasoning by Rewarding Beyond Answer Correctness

From Particles to Perils: SVGD-Based Hazardous Scenario Generation for Autonomous Driving Systems Testing

Fine-Tuning Small Reasoning Models for Quantum Field Theory

Reasoning Structure Matters for Safety Alignment of Reasoning Models

Self-Improving Tabular Language Models via Iterative Group Alignment

Toward Clinically Acceptable Chest X-ray Report Generation: A Qualitative Retrospective Pilot Study of CXRMate-2

Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback

Intentional Updates for Streaming Reinforcement Learning

TRN-R1-Zero: Text-rich Network Reasoning via LLMs with Reinforcement Learning Only

OLLM: Options-based Large Language Models

Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

Reinforcement Learning Enabled Adaptive Multi-Task Control for Bipedal Soccer Robots

GraphRAG-IRL: Personalized Recommendation with Graph-Grounded Inverse Reinforcement Learning and LLM Re-ranking

The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

RL-ABC: Reinforcement Learning for Accelerator Beamline Control

Reasoning-Aware AIGC Detection via Alignment and Reinforcement

Thinking Before Matching: A Reinforcement Reasoning Paradigm Towards General Person Re-Identification

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

LASER: Learning Active Sensing for Continuum Field Reconstruction

HP-Edit: A Human-Preference Post-Training Framework for Image Editing

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

Lyapunov-Certified Direct Switching Theory for Q-Learning

SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

Pause or Fabricate? Training Language Models for Grounded Reasoning

Learning Hybrid-Control Policies for High-Precision In-Contact Manipulation Under Uncertainty

FASTER: Value-Guided Sampling for Fast RL

Safe Continual Reinforcement Learning in Non-stationary Environments

Keyword: diffusion policy

No results