Generated: 2026-03-02 16:50:04 (UTC+8); arXiv release time: 2026-03-02 20:00 EST (2026-03-03 09:00 UTC+8)

36 related papers today

Keyword: reinforcement learning

Pacing Opinion Polarization via Graph Reinforcement Learning

Learning to Generate Secure Code via Token-Level Rewards

Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning

Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning

Component Centric Placement Using Deep Reinforcement Learning

Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

Learning to Reflect and Correct: Towards Better Decoding Trajectories for Large-Scale Generative Recommendation

The Auton Agentic AI Framework

Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning

MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning

EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models

Actor-Critic Pretraining for Proximal Policy Optimization

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parameteric Policies

Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective

RUMAD: Reinforcement-Unifying Multi-Agent Debate

SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale

Hybrid Offline-Online Reinforcement Learning for Sensorless, High-Precision Force Regulation in Surgical Robotic Grasping

TSC: Topology-Conditioned Stackelberg Coordination for Multi-Agent Reinforcement Learning in Interactive Driving

Learning to Build: Autonomous Robotic Assembly of Stable Structures Without Predefined Plans

Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing

Thinking with Images as Continuous Actions: Numerical Visual Chain-of-Thought

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Foundation World Models for Agents that Learn, Verify, and Adapt Reliably Beyond Static Environments

Curriculum Reinforcement Learning for Quadrotor Racing with Random Obstacles

Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis

Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance

Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance

Planning from Observation and Interaction

Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints

Multi-Objective Reinforcement Learning for Large-Scale Tote Allocation in Human-Robot Collaborative Fulfillment Centers

Enhancing Spatial Understanding in Image Generation via Reward Modeling

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Keyword: diffusion policy

No results.