Generated: 2026-03-31 17:05:37 (UTC+8); arXiv release time: 2026-03-31 20:00 EDT (2026-04-01 08:00 UTC+8)

68 related papers today

Keyword: reinforcement learning

Learning Energy-Efficient Air-Ground Actuation for Hybrid Robots on Stair-Like Terrain

Physicochemical-Neural Fusion for Semi-Closed-Circuit Respiratory Autonomy in Extreme Environments

SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space

Evolutionary Warm-Starts for Reinforcement Learning in Industrial Continuous Control

Bitboard version of Tetris AI

LogicDiff: Logic-Guided Denoising Improves Reasoning in Masked Diffusion Language Models

Learning to Select Visual In-Context Demonstrations

PiCSRL: Physics-Informed Contextual Spectral Reinforcement Learning

Stable Reasoning, Unstable Responses: Mitigating LLM Deception via Stability Asymmetry

Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching

Dynamic resource matching in manufacturing using deep reinforcement learning

Semantic Interaction Information mediates compositional generalization in latent space

Reasoning-Driven Anomaly Detection and Localization with Image-Level Supervision

Incentivizing Temporal-Awareness in Egocentric Video Understanding Models

Autonomous overtaking trajectory optimization using reinforcement learning and opponent pose estimation

Rethinking Easy-to-Hard: Limits of Curriculum Learning in Post-Training for Deductive Reasoning

Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

D-SPEAR: Dual-Stream Prioritized Experience Adaptive Replay for Stable Reinforcement Learning in Robotic Manipulation

DRASTIC: A Dynamic Resource Allocation Framework over 6G Network Slicing in Task-aware Closed-Loop Tactile Internet Applications

Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models

Diagnosing Non-Markovian Observations in Reinforcement Learning via Prediction-Based Violation Scoring

Rainbow-DemoRL: Combining Improvements in Demonstration-Augmented Reinforcement Learning

Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

FlowRL: A Taxonomy and Modular Framework for Reinforcement Learning with Diffusion Policies

Driving Condition-Aware Multi-Agent Integrated Power and Thermal Management for Hybrid Electric Vehicles

Learning to Focus and Precise Cropping: A Reinforcement Learning Framework with Information Gaps and Grounding Loss for MLLMs

Match or Replay: Self Imitating Proximal Policy Optimization

Secure Reinforcement Learning: On Model-Free Detection of Man in the Middle Attacks

DSevolve: Enabling Real-Time Adaptive Scheduling on Dynamic Shop Floor with LLM-Evolved Heuristic Portfolios

RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning

Optimizing Coverage and Difficulty in Reinforcement Learning for Quiz Composition

KAT-Coder-V2 Technical Report

TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration

SkyNet: Belief-Aware Planning for Partially-Observable Stochastic Games

Wan-R1: Verifiable-Reinforcement Learning for Video Reasoning

Energy Efficient Orchestration in Multiple-Access Vehicular Aerial-Terrestrial 6G Networks

Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards

Flip Stunts on Bicycle Robots using Iterative Motion Imitation

Principal Prototype Analysis on Manifold for Interpretable Reinforcement Learning

SARL: Label-Free Reinforcement Learning by Rewarding Reasoning Topology

Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL

Koopman-based surrogate modeling for reinforcement-learning-control of Rayleigh-Benard convection

Heddle: A Distributed Orchestration System for Agentic RL Rollout

$AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning

MedLoc-R1: Performance-Aware Curriculum Reward Scheduling for GRPO-Based Medical Visual Grounding

A Deep Reinforcement Learning Framework for Closed-loop Guidance of Fish Schools via Virtual Agents

ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models

Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

Competitor-aware Race Management for Electric Endurance Racing

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models

Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids

Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models

$R_{dm}$: Re-conceptualizing Distribution Matching as a Reward for Diffusion Distillation

CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains

Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment

Intelligent Radio Resource Slicing for 6G In-Body Subnetworks

GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum

Learning Partial Action Replacement in Offline MARL

Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

Dynamic Dual-Granularity Skill Bank for Agentic RL

Stepwise Credit Assignment for GRPO on Flow-Matching Models

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

Gen-Searcher: Reinforcing Agentic Search for Image Generation

Keyword: diffusion policy

UMI-Underwater: Learning Underwater Manipulation without Underwater Teleoperation

Tele-Catch: Adaptive Teleoperation for Dexterous Dynamic 3D Object Catching
