Generated: 2026-03-17 17:00:01 (UTC+8); arXiv release time: 2026-03-17 20:00 EDT (2026-03-18 08:00 UTC+8)

88 relevant papers today

Keyword: reinforcement learning

Agentic AI, Retrieval-Augmented Generation, and the Institutional Turn: Legal Architectures and Financial Governance in the Age of Distributional AGI

Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework

Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota

Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs

ICPRL: Acquiring Physical Intuition from Interactive Control

Evidence-based Distributional Alignment for Large Language Models

LightningRL: Breaking the Accuracy-Parallelism Trade-off of Block-wise dLLMs via Reinforcement Learning

AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints

Learning When to Trust in Contextual Bandits

Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models

Scalable Machines with Intrinsic Higher Mental-State Dynamics

REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Implicit Maximum Likelihood Estimation for Real-time Generative Model Predictive Control

Knowledge Distillation for Large Language Models

Retrieve, Schedule, Reflect: LLM Agents for Chip QoR Optimization

Your Vision-Language-Action Model Already Has Attention Heads For Path Deviation Detection

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution

Path-conditioned Reinforcement Learning-based Local Planning for Long-Range Navigation

ATCC: Adaptive Concurrency Control for Unforeseen Agentic Transactions

SmoothVLA: Aligning Vision-Language-Action Models with Physical Constraints via Intrinsic Smoothness Optimization

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement

Chunk-Guided Q-Learning

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

LLM-Guided Safe Reinforcement Learning for Energy System Topology Reconfiguration

GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models

Amortizing Trajectory Diffusion with Keyed Drift Fields

Improving Visual Reasoning with Iterative Evidence Refinement

Diffusion Reinforcement Learning via Centered Reward Distillation

Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model

GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies

MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos

Load-Aware Locomotion Control for Humanoid Robots in Industrial Transportation Tasks

Data-Driven Physics Embedded Dynamics with Predictive Control and Reinforcement Learning for Quadrupeds

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models

VIP-Loco: A Visually Guided Infinite Horizon Planning Framework for Legged Locomotion

Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling

SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI

From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Physics-Informed Policy Optimization via Analytic Dynamics Regularization

AI Can Learn Scientific Taste

VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning

Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms

MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer

Machine Learning-Driven Intelligent Memory System Design: From On-Chip Caches to Storage

Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning

Ego to World: Collaborative Spatial Reasoning in Embodied Systems via Reinforcement Learning

Shopping Companion: A Memory-Augmented LLM Agent for Real-World E-Commerce Tasks

Sample-Efficient Hypergradient Estimation for Decentralized Bi-Level Reinforcement Learning

ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning

PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning

EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing

CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing

CycleRL: Sim-to-Real Deep Reinforcement Learning for Robust Autonomous Bicycle Control

Interference-Aware K-Step Reachable Communication in Multi-Agent Reinforcement Learning

Writer-R1: Enhancing Generative Writing in LLMs via Memory-augmented Replay Policy Optimization

HALO: Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation

Sampling-guided exploration of active feature selection policies

MMKU-Bench: A Multimodal Update Benchmark for Diverse Visual Knowledge

Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

Multi-Scale Control of Large Agent Populations: From Density Dynamics to Individual Actuation

KiRAS: Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning in Quadruped Robots

Iterative Learning Control-Informed Reinforcement Learning for Batch Process Control

Towards Foundation Models for Consensus Rank Aggregation

SAGE: Multi-Agent Self-Evolution for LLM Reasoning

Probe-then-Plan: Environment-Aware Planning for Industrial E-commerce Search

Evaluating the Robustness of Reinforcement Learning based Adaptive Traffic Signal Control

NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Gym-V: A Unified Vision Environment System for Agentic Vision Research

Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning

Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions

From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

Keyword: diffusion policy

REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

OCRA: Object-Centric Learning with 3D and Tactile Priors for Human-to-Robot Action Transfer

ReMAP-DP: Reprojected Multi-view Aligned PointMaps for Diffusion Policy

Master Micro Residual Correction with Adaptive Tactile Fusion and Force-Mixed Control for Contact-Rich Manipulation