Generated: 2026-03-26 17:00:25 (UTC+8); arXiv release time: 2026-03-26 20:00 EDT (2026-03-27 08:00 UTC+8)

33 relevant papers today

Keyword: reinforcement learning

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

Safe Reinforcement Learning with Preference-based Constraint Inference

Utilizing Adversarial Training for Robust Voltage Control: An Adaptive Deep Reinforcement Learning Method

Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL

BXRL: Behavior-Explainable Reinforcement Learning

Self Paced Gaussian Contextual Reinforcement Learning

Human, AI, and Hybrid Ensembles for Detection of Adaptive, RL-based Social Bots

Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation

The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search

ProcureGym: A Multi-Agent Markov Game Framework for Modeling National Volume-based Drug Procurement

Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration

Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs

PointRFT: Explicit Reinforcement Fine-tuning for Point Cloud Few-shot Learning

From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Likelihood hacking in probabilistic program synthesis

Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection

A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula

SumRank: Aligning Summarization Models for Long-Document Listwise Reranking

Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning

C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents

Heuristic Self-Paced Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions

LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control

CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control

Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning

Composer 2 Technical Report

Completeness of Unbounded Best-First Minimax and Descent Minimax

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving

Keyword: diffusion policy

No results