Generated: 2026-01-02 16:32:43 (UTC+8); arXiv release time: 2026-01-01 20:00 EST (2026-01-02 09:00 UTC+8)

41 relevant papers today

Keyword: reinforcement learning

A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation

Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory

Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions

FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading

Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark

Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Distributed Beamforming in Massive MIMO Communication for a Constellation of Airborne Platform Stations

Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias

CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards

RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations

Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising

ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment

HY-MT1.5 Technical Report

GARDO: Reinforcing Diffusion Models without Reward Hacking

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem

DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments

Real-world Reinforcement Learning from Suboptimal Interventions

Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking

MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models

Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics

Networked Markets, Fragmented Data: Adaptive Graph Learning for Customer Risk Analytics and Policy Design

From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning

From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme

Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation

RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Hierarchical Online Optimization Approach for IRS-enabled Low-altitude MEC in Vehicular Networks

Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer

Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Throughput Optimization in UAV-Mounted RIS under Jittering and Imperfect CSI via DRL

Iterative Deployment Improves Planning Skills in LLMs

MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control

ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning

Many Minds from One Model: Bayesian Transformers for Population Intelligence

Scaling Open-Ended Reasoning to Predict the Future

Keyword: diffusion policy

No results