Generated: 2025-11-04 16:33:19 (UTC+8); arXiv release time: 2025-11-04 20:00 EST (2025-11-05 09:00 UTC+8)

68 relevant papers today

Keyword: reinforcement learning

On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning

Graph-Attentive MAPPO for Dynamic Retail Pricing

SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation

World Simulation with Video Foundation Models for Physical AI

Token-Regulated Group Relative Policy Optimization for Stable Reinforcement Learning in Large Language Models

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

Real-DRL: Teach and Learn in Reality

End-to-End Framework Integrating Generative AI and Deep Reinforcement Learning for Autonomous Ultrasound Scanning

LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers

DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads

A Dual Large Language Models Architecture with Herald Guided Prompts for Parallel Fine Grained Traffic Signal Control

Study on Supply Chain Finance Decision-Making Model and Enterprise Economic Performance Prediction Based on Deep Reinforcement Learning

Iterative Foundation Model Fine-Tuning on Multiple Rewards

Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Improving the Robustness of Control of Chaotic Convective Flows with Domain-Informed Reinforcement Learning

Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing

Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict

Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond

VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning

CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks

UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

Bootstrap Off-policy with World Model

GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining

ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation

Robust Single-Agent Reinforcement Learning for Regional Traffic Signal Control Under Demand Fluctuations

Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control

OpenSIR: Open-Ended Self-Improving Reasoner

PreferThinker: Reasoning-based Personalized Image Preference Assessment

Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

Power Control Based on Multi-Agent Deep Q Network for D2D Communication

Efficient Reinforcement Learning for Large Language Models with Intrinsic Exploration

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

Logic-informed reinforcement learning for cross-domain optimization of large-scale cyber-physical systems

Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization

Optimizing Energy and Latency in 6G Smart Cities with Edge CyberTwins

MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL

IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation

Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

Quantum Reinforcement Learning for 6G and Beyond Wireless Networks

Predictive Auxiliary Learning for Belief-based Multi-Agent Systems

Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment

SLAP: Shortcut Learning for Abstract Planning

DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models

Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning

DEER: Disentangled Mixture of Experts with Instance-Adaptive Routing for Generalizable Machine-Generated Text Detection

Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering

Optimizing Electric Vehicle Charging Station Placement Using Reinforcement Learning and Agent-Based Simulations

From Pixels to Cooperation: Multi Agent Reinforcement Learning based on Multimodal World Models

RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models

Diffusion-Based Solver for CNF Placement on the Cloud-Continuum

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series

Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization

Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm

Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis

BARD: budget-aware reasoning distillation

TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks

Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning

L2T-Tune: LLM-Guided Hybrid Database Tuning with LHS and TD3

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward

Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding

RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks

MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll

GenDexHand: Generative Simulation for Dexterous Hands

Keyword: diffusion policy

Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy
