Generated: 2025-12-05 16:31:26 (UTC+8); arXiv release time: 2025-12-05 20:00 EST (2025-12-06 09:00 UTC+8)

39 relevant papers today

Keyword: reinforcement learning

Quantum-Embedded Dynamic Security Control using Hybrid Deep Reinforcement Learning

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Toward Virtuous Reinforcement Learning

The Geometry of Benchmarks: A New Path Toward AGI

Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order

Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies

Towards better dense rewards in Reinforcement Learning Applications

Data-regularized Reinforcement Learning for Diffusion Models at Scale

Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism

Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

AutoGuard: A Self-Healing Proactive Security Layer for DevSecOps Pipelines Using Reinforcement Learning

LangSAT: A Novel Framework Combining NLP and Reinforcement Learning for SAT Solving

Learning to Orchestrate Agents in Natural Language with the Conductor

Quantum-Accelerated Deep Reinforcement Learning for Frequency Regulation Enhancement

MARL Warehouse Robots

GTM: Simulating the World of Tools for AI Agents

RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Omniscient Attacker in Stochastic Security Games with Interdependent Nodes

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Gauss-Newton accelerated MPPI Control

Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control

TRINITY: An Evolved LLM Coordinator

RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting

Using Machine Learning to Take Stay-or-Go Decisions in Data-driven Drone Missions

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases

Safe model-based Reinforcement Learning via Model Predictive Control and Control Barrier Functions

Multi-Agent Reinforcement Learning for Intraday Operating Rooms Scheduling under Uncertainty

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent

Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning

From Generated Human Videos to Physically Plausible Robot Trajectories

SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

Structured Document Translation via Format Reinforcement Learning

Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning

STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Keyword: diffusion policy

Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting
