Generated: 2025-12-23 16:34:42 (UTC+8); arXiv release time: 2025-12-23 20:00 EST (2025-12-24 09:00 UTC+8)

58 relevant papers today

Keyword: reinforcement learning

Graph-O1 : Monte Carlo Tree Search with Reinforcement Learning for Text-Attributed Graph Reasoning

QAISim: A Toolkit for Modeling and Simulation of AI in Quantum Cloud Computing Environments

NystagmusNet: Explainable Deep Learning for Photosensitivity Risk Prediction

SuperFlow: Training Flow Matching Models with RL on the Fly

Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis

ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India

Towards Autonomous Navigation in Endovascular Interventions

Unifying Causal Reinforcement Learning: Survey, Taxonomy, Algorithms and Applications

On Swarm Leader Identification using Probing Policies

NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Embedded Safety-Aligned Intelligence via Differentiable Internal Alignment Embeddings

Monitoring Monitorability

Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems

Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)

Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism

On the Universality of Transformer Architectures; How Much Attention Is Enough?

When Robots Say No: The Empathic Ethical Disobedience Benchmark

Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control

Vox Deorum: A Hybrid LLM Architecture for 4X / Grand Strategy Game AI -- Lessons from Civilization V

Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning

A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback

LLM-CAS: Dynamic Neuron Perturbation for Real-Time Hallucination Correction

Offline Reinforcement Learning for End-to-End Autonomous Driving

Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments

A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning

MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation

From Word to World: Can Large Language Models be Implicit Text-based World Models?

InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Remedy-R: Generative Reasoning for Machine Translation Evaluation without Error Annotations

A Framework for Deploying Learning-based Quadruped Loco-Manipulation

Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection

Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation

ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management

CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models

Tool-Augmented Hybrid Ensemble Reasoning with Distillation for Bilingual Mathematical Problem Solving

AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards

WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

RMLer: Synthesizing Novel Objects across Diverse Categories via Reinforcement Mixing Learning

Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing

Learning-Assisted Multi-Operator Variable Neighborhood Search for Urban Cable Routing

Enhancing PLS of Indoor IRS-VLC Systems for Colluding and Non-Colluding Eavesdroppers

First-Order Representation Languages for Goal-Conditioned RL

Interpretable Hybrid Deep Q-Learning Framework for IoT-Based Food Spoilage Prediction with Synthetic Data Generation and Hardware Validation

Learning General Policies with Policy Gradient Methods

CodeSimpleQA: Scaling Factuality in Code Large Language Models

LacaDM: A Latent Causal Diffusion Model for Multiobjective Reinforcement Learning

CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal

LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller

Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Scalably Enhancing the Clinical Validity of a Task Benchmark with Physician Oversight

Keyword: diffusion policy

A Flexible Field-Based Policy Learning Framework for Diverse Robotic Systems and Sensors