Generated: 2026-03-18 16:55:28 (UTC+8); arXiv release time: 2026-03-18 20:00 EDT (2026-03-19 08:00 UTC+8)

52 related papers today

Keyword: reinforcement learning

SAC-NeRF: Adaptive Ray Sampling for Neural Radiance Fields via Soft Actor-Critic Reinforcement Learning

Alternating Reinforcement Learning with Contextual Rubric Rewards

Beyond Reward Suppression: Reshaping Steganographic Communication Protocols in MARL via Dynamic Representational Circuit Breaking

BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator

Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models

Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation

CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning

Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning

Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning

Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition

ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning

Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding

Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards

Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment

HIPO: Instruction Hierarchy via Constrained Reinforcement Learning

DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay

Execution-Grounded Credit Assignment for GRPO in Code Generation

SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation

Enforcing Task-Specified Compliance Bounds for Humanoids via Anisotropic Lipschitz-Constrained Policies

Reliable Reasoning in SVG-LLMs via Multi-Task Multi-Reward Reinforcement Learning

Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning

Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism

VIGOR: VIdeo Geometry-Oriented Reward for Temporal Generative Alignment

Agile Interception of a Flying Target using Competitive Reinforcement Learning

Controlling Fish Schools via Reinforcement Learning of Virtual Fish Movement

Deep Reinforcement Learning-Assisted Automated Operator Portfolio for Constrained Multi-objective Optimization

Onboard MuJoCo-based Model Predictive Control for Shipboard Crane with Double-Pendulum Sway Suppression

Via Negativa for AI Alignment: Why Negative Constraints Are Structurally Superior to Positive Preferences

Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization

Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

From the Inside Out: Progressive Distribution Refinement for Confidence Calibration

Kamino: GPU-based Massively Parallel Simulation of Multi-Body Systems with Challenging Topologies

EmoLLM: Appraisal-Grounded Cognitive-Emotional Co-Reasoning in Large Language Models

When and Why Does Unsupervised RL Succeed in Mathematical Reasoning? A Manifold Envelopment Perspective

Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models

What if Pinocchio Were a Reinforcement Learning Agent: A Normative End-to-End Pipeline

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Learning Whole-Body Control for a Salamander Robot

GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution

Anticipatory Planning for Multimodal AI Agents

Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines

Learning to Present: Inverse Specification Rewards for Agentic Slide Generation

Stochastic Resetting Accelerates Policy Convergence in Reinforcement Learning

DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models

Efficient Reasoning on the Edge

Keyword: diffusion policy

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy
