Generated: 2025-10-16 16:30:26 (UTC+8); arXiv release time: 2025-10-16 20:00 EDT (2025-10-17 08:00 UTC+8)

Total relevant papers today: 35

Keyword: reinforcement learning

Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement Learning-based Recommendation

Maximum In-Support Return Modeling for Dynamic Recommendation with Language Model Prior

Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning

Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation

DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

Escaping Local Optima in the Waddington Landscape: A Multi-Stage TRPO-PPO Approach for Single-Cell Perturbation Analysis

Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking

Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games

DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models

EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems

Altruistic Ride Sharing: A Community-Driven Approach to Short-Distance Mobility

Beyond Static LLM Policies: Imitation-Enhanced Reinforcement Learning for Recommendation

SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning

Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation

ChatR1: Reinforcement Learning for Conversational Reasoning and Retrieval Augmented Question Answering

AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents' order of action decisions

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control

Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation

Bridge the Gap: Enhancing Quadruped Locomotion with Vertical Ground Perturbations

Offline and Online KL-Regularized RLHF under Differential Privacy

Tandem Training for Language Models

Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

What is the objective of reasoning with reinforcement learning?

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

Asymptotically optimal reinforcement learning in Block Markov Decision Processes

The Art of Scaling Reinforcement Learning Compute for LLMs

Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach

MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning

Keyword: diffusion policy

Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement Learning-based Recommendation

Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation
