Generated: 2026-05-04 18:15:45 (UTC+8); arXiv release time: 2026-05-04 20:00 EDT (2026-05-05 08:00 UTC+8)

30 relevant papers today

Keyword: reinforcement learning

Exploring LLM biases to manipulate AI search overview

Dynamic-TD3: A Novel Algorithm for UAV Path Planning with Dynamic Obstacle Trajectory Prediction

XekRung Technical Report

World Model for Robot Learning: A Comprehensive Survey

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization

Bayesian Optimization in Linear Time

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization

Data Deletion Can Help in Adaptive RL

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity

AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models

ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning

PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Beyond Heuristics: Learnable Density Control for 3D Gaussian Splatting

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

Learning while Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence

Recovering Hidden Reward in Diffusion-Based Policies

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Keyword: diffusion policy

MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation
