Generated: 2026-01-28 16:37:07 (UTC+8); arXiv release time: 2026-01-28 20:00 EST (2026-01-29 09:00 UTC+8)

37 related papers today

Keyword: reinforcement learning

Variational Quantum Circuit-Based Reinforcement Learning for Dynamic Portfolio Optimization

Differential Voting: Loss Functions For Axiomatically Diverse Aggregation of Heterogeneous Preferences

Analysis of Control Bellman Residual Minimization for Markov Decision Problem

Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

A Unifying View of Coverage in Linear Off-Policy Evaluation

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback

m2sv: A Scalable Benchmark for Map-to-Street-View Spatial Reasoning

Reward Engineering for Reinforcement Learning in Software Tasks

Glance and Focus Reinforcement for Pan-cancer Screening

Exploring Weaknesses in Function Call Models via Reinforcement Learning: An Adversarial Data Augmentation Approach

Towards Pixel-Level VLM Perception via Simple Points Prediction

Structure-based RNA Design by Step-wise Optimization of Latent Diffusion Model

iFAN Ecosystem: A Unified AI, Digital Twin, Cyber-Physical Security, and Robotics Environment for Advanced Nuclear Simulation and Operations

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

Output Feedback Stabilization of Linear Systems via Policy Gradient Methods

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

From Observations to Events: Event-Aware World Model for Reinforcement Learning

CHEHAB RL: Learning to Optimize Fully Homomorphic Encryption Computations

Task-Centric Policy Optimization from Misaligned Motion Priors

OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation

APC-RL: Exceeding Data-Driven Behavior Priors with Adaptive Policy Composition

Reinforcement Learning Goal-Reaching Control with Guaranteed Lyapunov-Like Stabilizer for Mobile Robots

Bridging Information Asymmetry: A Hierarchical Framework for Deterministic Blind Face Restoration

LLM-Enhanced Reinforcement Learning for Long-Term User Satisfaction in Interactive Recommendation

Safe Exploration via Policy Priors

R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning

Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

Video-KTR: Reinforcing Video Reasoning via Key Token Attribution

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow

Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action

Reimagining Social Robots as Recommender Systems: Foundations, Framework, and Applications

Reimagining Peer Review Process Through Multi-Agent Mechanism Design

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

A Latent Space Framework for Modeling Transient Engine Emissions Using Joint Embedding Predictive Architectures

Self-Distillation Enables Continual Learning

Keyword: diffusion policy

No results.