Generated: 2025-11-25 16:33:28 (UTC+8); arXiv release time: 2025-11-25 20:00 EST (2025-11-26 09:00 UTC+8)

67 relevant papers today

Keyword: reinforcement learning

AURA: Adaptive Unified Reasoning and Automation with LLM-Guided MARL for NextG Cellular Networks

AURA:基于LLM引导的MARL的自适应统一推理与自动化,面向NextG蜂窝网络

Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization

通过锐度感知最小化,增强离线强化学习在数据损坏下的鲁棒性

Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation

通过价值去相关和外推实现LLM的多价值对齐

Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design

通过以人为本的课程设计,提升三维视觉空间任务中的强化学习

Non-stationary and Varying-discounting Markov Decision Processes for Reinforcement Learning

非平稳和变折现的马尔可夫决策过程用于强化学习

Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change

我们能用大型语言模型来启动强化学习吗?——数字健康行为改变的案例研究

Smart Manufacturing: MLOps-Enabled Event-Driven Architecture for Enhanced Control in Steel Production

智能制造:支持MLOps的事件驱动架构,增强钢铁生产控制

Dialogue Diplomats: An End-to-End Multi-Agent Reinforcement Learning System for Automated Conflict Resolution and Consensus Building

对话外交官:一个端到端的多智能体强化学习系统,用于自动化冲突解决与共识构建

LEARN: Learning End-to-End Aerial Resource-Constrained Multi-Robot Navigation

LEARN:学习端到端的资源受限多机器人空中导航

Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch

跨张量并行规模的确定性推理,消除训练-推理不匹配

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

采用 RL 或 SFT 的 Transformer 均可证明地学习稀疏布尔函数,但方式不同

Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models

培养涌现的联合联想:语言模型中创造性思维的强化学习方法

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

生成对抗式后训练缓解了现场人机音乐交互中的奖励黑客行为

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

MobileVLA-R1:增强移动机器人的视觉-语言-行动

DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents

DISPATCH——为合作异构代理提供分散式知情的空间规划与任务分配

PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning

PA-FAS:通过路径增强强化学习实现可解释且可推广的多模态人脸反欺骗

A Reinforcement Learning Framework for Resource Allocation in Uplink Carrier Aggregation in the Presence of Self Interference

存在自干扰时,上行载波聚合资源分配的强化学习框架

SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization

SPINE:带熵带正则化的词元选择性测试时强化学习

Hybrid LSTM and PPO Networks for Dynamic Portfolio Optimization

用于动态投资组合优化的混合LSTM和PPO网络

Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning

空间流行病模拟的奖励工程:个体行为学习的强化学习平台

IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment

IE-Critic-R1:推进文本驱动图像编辑在人类感知对齐中的解释性测量

Anti-Jamming based on Null-Steering Antennas and Intelligent UAV Swarm Behavior

基于零陷导向天线和智能无人机群行为的抗干扰

A New Error Temporal Difference Algorithm for Deep Reinforcement Learning in Microgrid Optimization

一种用于微电网优化中深度强化学习的新型误差时序差分算法

MOMA-AC: A preference-driven actor-critic framework for continuous multi-objective multi-agent reinforcement learning

MOMA-AC:一种偏好驱动的演员-评论家框架,用于连续多目标多智能体强化学习

Deep Gaussian Process Proximal Policy Optimization

深度高斯过程近端策略优化

A Novel and Practical Universal Adversarial Perturbations against Deep Reinforcement Learning based Intrusion Detection Systems

针对基于深度强化学习的入侵检测系统的新颖且实用的通用对抗扰动

Carbon-Aware Intrusion Detection: A Comparative Study of Supervised and Unsupervised DRL for Sustainable IoT Edge Gateways

碳感知入侵检测:可持续物联网边缘网关监督与非监督DRL的比较研究

EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning

EgoVITA:学习如何规划并验证以应对以自我为中心的视频推理

Dreaming Falcon: Physics-Informed Model-Based Reinforcement Learning for Quadcopters

Dreaming Falcon:物理信息驱动的基于模型的强化学习,适用于四旋翼飞行器

Tail Distribution of Regret in Optimistic Reinforcement Learning

乐观强化学习中的遗憾尾分布

LLM Reasoning for Cold-Start Item Recommendation

冷启动项目推荐的LLM推理

MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation

MammothModa2:一个统一的AR扩散框架,用于多模态理解与生成

DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition

DiVE-k:细粒度图像识别的差分视觉推理

Synthetic Curriculum Reinforces Compositional Text-to-Image Generation

合成课程强化组合式文本到图像生成

General Agentic Memory Via Deep Research

通过深度研究实现通用智能体记忆

Perceptual-Evidence Anchored Reinforced Learning for Multimodal Reasoning

感知证据锚定的强化学习用于多模态推理

Energy-Efficient Task Computation at the Edge for Vehicular Services

面向车载服务的边缘节能任务计算

ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints

ORIGAMISPACE:在带数学约束的多步空间推理中对多模态大型语言模型进行基准测试

SafeFall: Learning Protective Control for Humanoid Robots

SafeFall:学习人形机器人的保护性控制

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

从代码基础模型到代理与应用:代码智能实用指南

How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints

如何训练你的潜在控制屏障函数:在难以建模的约束下实现平滑的安全过滤

Multi-Agent Cross-Entropy Method with Monotonic Nonlinear Critic Decomposition

多智能体交叉熵方法,采用单调非线性评论家分解

Seeing What Matters: Visual Preference Policy Optimization for Visual Generation

看清什么重要:视觉偏好策略优化,用于视觉生成

Reinforcement Learning for Self-Healing Material Systems

自我修复材料系统的强化学习

ProxT2I: Efficient Reward-Guided Text-to-Image Generation via Proximal Diffusion

ProxT2I:通过近端扩散实现高效的奖励引导文本转图像生成

VideoPerceiver: Enhancing Fine-Grained Temporal Perception in Video Multimodal Large Language Models

VideoPerceiver:增强视频多模态大型语言模型中的细粒度时间感知

PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation

PrismAudio:分解的思维链条与视频转音频生成的多维奖励

Periodic Asynchrony: An Effective Method for Accelerating On-Policy Reinforcement Learning

周期性异步:加速同策略强化学习的有效方法

Accelerating Reinforcement Learning via Error-Related Human Brain Signals

通过错误相关的人脑信号加速强化学习

Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation

学习通过对偶代理压缩图以实现一致的拓扑鲁棒性评估

FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning

FastForward Pruning:通过单步强化学习实现高效的大型语言模型剪枝

Dynamic Mixture of Experts Against Severe Distribution Shifts

动态混合专家以应对严重的分布偏移

Energy-Efficient Routing Protocol in Vehicular Opportunistic Networks: A Dynamic Cluster-based Routing Using Deep Reinforcement Learning

车载机会网络中的节能路由协议:基于深度强化学习的动态集群路由

ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay

ReEXplore:通过情境化回顾式经验回放改进MLLM的具身探索

DeCoRL: Decoupling Reasoning Chains via Parallel Sub-Step Generation and Cascaded Reinforcement for Interpretable and Scalable RLHF

DeCoRL:通过并行子步生成和级联强化实现可解释和可扩展的RLHF的推理链解耦

VIL2C: Value-of-Information Aware Low-Latency Communication for Multi-Agent Reinforcement Learning

VIL2C:信息价值感知的低延迟通信,用于多智能体强化学习

RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning

RAVEN++:通过主动强化推理精准定位广告视频中的细粒度违规

Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization

通过树群双重感知搜索与优化实现LLM安全对齐的对抗性攻防共演

MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization

MAESTRO:通过任务和奖励优化塑造多智能体环境

Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning

Syn-GRPO:用于MLLM感知推理的自我演化数据合成

Leveraging LLMs for reward function design in reinforcement learning control tasks

在强化学习控制任务中利用LLM进行奖励函数设计

Growing with the Generator: Self-paced GRPO for Video Generation

与生成器共成长:视频生成的自节奏GRPO

LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems

面向移动系统多智能体强化学习的LLM驱动、平稳性感知的专家演示

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

DR Tulu:面向深度研究的、采用演化评分标准的强化学习

Learning Robust Social Strategies with Large Language Models

利用大型语言模型学习稳健的社会策略

SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning

SLMFix:利用小语言模型结合强化学习进行错误修复

Keyword: diffusion policy

Learning Diffusion Policies for Robotic Manipulation of Timber Joinery under Fabrication Uncertainty

学习在制造不确定性下用于木材榫接机器人操作的扩散策略