Generated: 2026-07-01 19:26:04 (UTC+8); arXiv release time: 2026-07-01 20:00 EDT (2026-07-02 08:00 UTC+8)

42 relevant articles today

Keyword: reinforcement learning

Locker-based Truck-Drone Routing with Integrated Considerations of Pickups, Deliveries, and No-Fly Zones

An AI-Based Solution for Secure Service Provisioning in IoT

From Search to Synthesis: Training LLMs as Zero-Shot Workflow Generators

Sampling-Based Coordination-Informed Multi-Objective Multi-Robot Reinforcement Learning

HyPOLE: Hyperproperty-Guided Multi-Agent Reinforcement Learning under Partial Observation

A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management

Offline Reinforcement Learning for Fluid Controls: Data-based Multi-observational Policy Extraction

GenPage: Towards End-to-End Generative Homepage Construction at Netflix

Warp RL: Reshaping Base Policy Distributions for Dynamics Adaptation

What Probing Reveals about Autonomous Driving: Linking Internal Prediction Errors to Ego Planning

ELASTIC: Efficiently Learning to Adaptively Scale Test-Time Compute for Generative Control Policies

AETDICE: Unified Framework and Offline Optimization for Nonlinear Multi-Objective RL

Deep Reinforcement Learning for Spacecraft Attitude Control During Atmospheric Re-Entry

Safe Online Learning via Smooth Safety-Structured Policy Composition

Smart charging of large fleets of Electric Vehicles: Independent Multi-Agent Reinforcement Learning approaches

Failure-Based Testing for Deep Reinforcement Learning Agents

Stage-Transition Dense Reward Modeling for Reinforcement Learning

Xiaomi-GUI-0 Technical Report

Learning to Select, Not Relearn: Hard-Routed Mixtures of Reasoning LoRAs

Stabilization Learning: A Paradigm Transition Bridging Control Theory and Machine Learning

Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

Token-Sparse Medical Multimodal Reasoning via Dual-Stream Reinforcement Learning

What Memory Do GUI Agents Really Need? From Passive Records to Active Task-Driving States

Robust Autonomous UAV Landing on Maritime Platforms via Multimodal Agentic AI and Active Wave Compensation

Think in English, Answer in Korean: Efficient Adaptation of Multilingual Tool-Using Agents

FastDSAC: Enhancing Policy Plasticity via Constrained Exploration for Scalable Humanoid Locomotion

Diffusing Blame: Task-Dependent Credit Assignment in Biologically Plausible Dual-Stream Networks

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

Addressing Over-Refusal in LLMs with Competing Rewards

Reinforcement Learning-Based Control for an Inline Skating Humanoid Robot

Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR

Z-1: Efficient Reinforcement Learning for Vision-Language-Action Models

CoDex: Learning Compositional Dexterous Functional Manipulation without Demonstrations

Learning Locomotion on Discrete Terrain via Minimal Proximity Sensing

LeCropFollow: Latent Space Planning for Navigation in Unstructured Crop Fields

Adapting Generalist Robot Policies with Semantic Reinforcement Learning

GR2 Technical Report

OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation

On the Comparison of Reinforcement Learning and Adaptive Control for Linear Systems under Packet Loss and Uncertainty

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Keyword: diffusion policy

From Grasps to Dexterity: Large-Scale Grasp Pretraining for Dexterous Manipulation
