Generated: 2026-04-03 16:58:40 (UTC+8); arXiv release time: 2026-04-03 20:00 EDT (2026-04-04 08:00 UTC+8)

36 relevant articles today

Keyword: reinforcement learning

Trustworthy AI-Driven Dynamic Hybrid RIS: Joint Optimization and Reward Poisoning-Resilient Control in Cognitive MISO Networks

Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning

RIFT: A RubrIc Failure Mode Taxonomy and Automated Diagnostics

Residuals-based Offline Reinforcement Learning

Improving Latent Generalization Using Test-time Compute

Reinforcing Consistency in Video MLLMs with Structured Rewards

When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals

Soft MPCritic: Amortized Model Predictive Value Iteration

DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data

Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training

DeltaMem: Towards Agentic Memory Management via Reinforcement Learning

Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling

MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction

Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error

ContextBudget: Budget-Aware Context Management for Long-Horizon Search Agents

DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment

TestDecision: Sequential Test Suite Generation via Greedy Optimization and Reinforcement Learning

STRIVE: Structured Spatiotemporal Exploration for Reinforcement Learning in Video Question Answering

Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

From Guessing to Placeholding: A Cost-Theoretic Framework for Uncertainty-Aware Code Completion

The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning

Captioning Daily Activity Images in Early Childhood Education: Benchmark and Algorithm

ProCeedRL: Process Critic with Exploratory Demonstration Reinforcement Learning for LLM Agentic Reasoning

Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning

Bridging Discrete Planning and Continuous Execution for Redundant Robot

Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning

Auction-Based Online Policy Adaptation for Evolving Objectives

Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges

When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning

Model-Based Reinforcement Learning for Control under Time-Varying Dynamics

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

CIVIC: Cooperative Immersion Via Intelligent Credit-sharing in DRL-Powered Metaverse

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

Beyond Referring Expressions: Scenario Comprehension Visual Grounding

Keyword: diffusion policy

No results.