Generated: 2026-04-01 17:08:34 (UTC+8); arXiv release time: 2026-04-01 20:00 EDT (2026-04-02 08:00 UTC+8)

29 related papers today

Keyword: reinforcement learning

Mitigating Temporal Blindness in Kubernetes Autoscaling: An Attention-Double-LSTM Framework

Robust Multi-Agent Reinforcement Learning for Small UAS Separation Assurance under GPS Degradation and Spoofing

Optimistic Online LQR via Intrinsic Rewards

A Pontryagin Method of Model-based Reinforcement Learning via Hamiltonian Actor-Critic

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Realistic Market Impact Modeling for Reinforcement Learning Trading Environments

MemRerank: Preference Memory for Personalized Product Reranking

Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry

Scaling Whole-Body Human Musculoskeletal Behavior Emulation for Specificity and Diversity

AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP

Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning

Calibrated Confidence Expression for Radiology Report Generation

MemFactory: Unified Inference & Training Framework for Agent Memory

Learning to Generate Formally Verifiable Step-by-Step Logic Reasoning via Structured Formal Intermediaries

Target-Aligned Reinforcement Learning

Learning Diagnostic Reasoning for Decision Support in Toxicology

ASI-Evolve: AI Accelerates AI

6GAgentGym: Tool Use, Data Synthesis, and Agentic Learning for Network Management

Reinforced Reasoning for End-to-End Retrosynthetic Planning

Friends, Foes, and First Authors: A Game Theory Model of How Power Plays Rewrite Academic Co-Authorship Networks

VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training

UniRank: End-to-End Domain-Specific Reranking of Hybrid Text-Image Candidates

GreenFLag: A Green Agentic Approach for Energy-Efficient Federated Learning

Phyelds: A Pythonic Framework for Aggregate Computing

Hybrid Framework for Robotic Manipulation: Integrating Reinforcement Learning and Large Language Models

Keyword: diffusion policy

Enhancing Policy Learning with World-Action Model

CLaD: Planning with Grounded Foresight via Cross-Modal Latent Dynamics