Generated: 2025-11-18 16:33:33 (UTC+8); arXiv release time: 2025-11-18 20:00 EST (2025-11-19 09:00 UTC+8)

74 relevant articles today

Keyword: reinforcement learning

Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL

Machine learning-based cloud resource allocation algorithms: a comprehensive comparative review

Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning

Environment-Aware Transfer Reinforcement Learning for Sustainable Beam Selection

Convergence of Multiagent Learning Systems for Traffic control

OSGym: Super-Scalable Distributed Data Engine for Generalizable Computer Agents

Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom

How Machine Learning-Data Driven Replication Strategies Enhance Fault Tolerance in Large-Scale Distributed Systems

Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction

Image-POSER: Reflective RL for Multi-Expert Image Generation and Editing

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

Better LLM Reasoning via Dual-Play

Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support

VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression

Goal-Oriented Multi-Agent Reinforcement Learning for Decentralized Agent Teams

Look As You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation

Intelligent Collaborative Optimization for Rubber Tyre Film Production Based on Multi-path Differentiated Clipping Proximal Policy Optimization

Treatment Stitching with Schrödinger Bridge for Enhancing Offline Reinforcement Learning in Adaptive Treatment Strategies

HCPO: Hierarchical Conductor-Based Policy Optimization in Multi-Agent Reinforcement Learning

AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing

CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic

SocialNav-Map: Dynamic Mapping with Human Trajectory Prediction for Zero-Shot Social Navigation

Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning

Dynamic Reward Scaling for Multivariate Time Series Anomaly Detection: A VAE-Enhanced Reinforcement Learning Approach

Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning

Learning Adaptive Neural Teleoperation for Humanoid Robots: From Inverse Kinematics to End-to-End Control

Integrating Neural Differential Forecasting with Safe Reinforcement Learning for Blood Glucose Regulation

Tailored Primitive Initialization is the Secret Key to Reinforcement Learning

ClutterNav: Gradient-Guided Search for Efficient 3D Clutter Removal with Learned Costmaps

Designed to Spread: Generative Approaches to Enhance Information Diffusion

TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction

ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding

Mitigating Length Bias in RLHF through a Causal Lens

NFQ2.0: The CartPole Benchmark Revisited

Task-Aware Morphology Optimization of Planar Manipulators via Reinforcement Learning

Beyond Fixed Tasks: Unsupervised Environment Design for Task-Level Pairs

Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL

Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation

Multi-Agent Reinforcement Learning for Heterogeneous Satellite Cluster Resources Optimization

Maximizing the efficiency of human feedback in AI alignment: a comparative analysis

Expressive Temporal Specifications for Reward Monitoring

Mapping fNIRS Signals to Agent Performance: Toward Reinforcement Learning from Neural Feedback

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

Green Emergency Communications in RIS- and MA-Assisted Multi-UAV SAGINs: A Partially Observable Reinforcement Learning Approach

DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning

Wide-Area Feedback Control for Renewables-Heavy Power Systems: A Comparative Study of Reinforcement Learning and Lyapunov-Based Design

Learning Branching Policies for MILPs with Proximal Policy Optimization

The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

An Online Multiobjective Policy Gradient for Long-run Average-reward Markov Decision Process

One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow

ViSS-R1: Self-Supervised Reinforcement Video Reasoning

STEP: Success-Rate-Aware Trajectory-Efficient Policy Optimization

Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions

Soft Conflict-Resolution Decision Transformer for Offline Multi-Task Reinforcement Learning

Conditional Diffusion Model for Multi-Agent Dynamic Task Decomposition

DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play

Video Spatial Reasoning with Object-Centric 3D Rollout

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection

Learning to Solve Resource-Constrained Project Scheduling Problems with Duration Uncertainty using Graph Neural Networks

MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection

Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning

Finding Kissing Numbers with Game-theoretic Reinforcement Learning

Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness

Artificial Intelligence-driven Intelligent Wearable Systems: A full-stack Integration from Material Design to Personalized Interaction

P1: Mastering Physics Olympiads with Reinforcement Learning

Distribution Matching Distillation Meets Reinforcement Learning

Keyword: diffusion policy

MATT-Diff: Multimodal Active Target Tracking by Diffusion Policy

Decoupled Action Head: Confining Task Knowledge to Conditioning Layers

DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play
