Generated: 2026-05-01 17:48:41 (UTC+8); arXiv release time: 2026-05-01 20:00 EDT (2026-05-02 08:00 UTC+8)

35 related papers today

Keyword: reinforcement learning

Learning-to-Explain through 20Q Gaming: An Explainable Recommender for Cybersecurity Education

PALCAS: A Priority-Aware Intelligent Lane Change Advisory System for Autonomous Vehicles using Federated Reinforcement Learning

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies

AutoREC: A software platform for developing reinforcement learning agents for equivalent circuit model generation from electrochemical impedance spectroscopy data

VeraRetouch: A Lightweight Fully Differentiable Framework for Multi-Task Reasoning Photo Retouching

Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift

RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC

From Coarse to Fine: Benchmarking and Reward Modeling for Writing-Centric Generation Tasks

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Leveraging Verifier-Based Reinforcement Learning in Image Editing

Bayesian policy gradient and actor-critic algorithms

WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning

Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making

CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting

Rethinking Agentic Reinforcement Learning In Large Language Models

Generate Your Talking Avatar from Video Reference

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

Cost-Aware Learning

Exponential families from a single KL identity

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Intelligent Self-tuning Active EMI Filtering for Electrified Automotive Power Systems Using Reinforcement Learning

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

FiLMMeD: Feature-wise Linear Modulation for Cross-Problem Multi-Depot Vehicle Routing

GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation

Global Optimality for Constrained Exploration via Penalty Regularization

Synthetic Computers at Scale for Long-Horizon Productivity Simulation

Exploration Hacking: Can LLMs Learn to Resist RL Training?

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

Keyword: diffusion policy

No results