Generated: 2026-01-14 16:34:51 (UTC+8); arXiv release time: 2026-01-14 20:00 EST (2026-01-15 09:00 UTC+8)

40 relevant papers today

Keyword: reinforcement learning

Reinforcement Learning Methods for Neighborhood Selection in Local Search

Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety

FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures

Formalizing the Relationship between Hamilton-Jacobi Reachability and Reinforcement Learning

Forecast Aware Deep Reinforcement Learning for Efficient Electricity Load Scheduling in Dairy Farms

DRL-based Power Allocation in LiDAL-Assisted RLNC-NOMA OWC Systems

STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order

Structure Detection for Contextual Reinforcement Learning

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

ZeroDVFS: Zero-Shot LLM-Guided Core and Frequency Allocation for Embedded Platforms

Scalable Multiagent Reinforcement Learning with Collective Influence Estimation

The End of Reward Engineering: How LLMs Are Redefining Multi-Agent Coordination

Incorporating Cognitive Biases into Reinforcement Learning for Financial Decision-Making

Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks

Unleashing Tool Engineering and Intelligence for Agentic AI in Next-Generation Communication Networks

Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees

D$^2$Plan: Dual-Agent Dynamic Global Planning for Complex Retrieval-Augmented Reasoning

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

AtomMem : Learnable Dynamic Agentic Memory with Atomic Memory Operation

Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition

Owen-Shapley Policy Optimization (OSPO): A Principled RL Algorithm for Generative Search LLMs

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?

Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management

Incentivizing Cardiologist-Like Reasoning in MLLMs for Interpretable Echocardiographic Diagnosis

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures

AME-2: Agile and Generalized Legged Locomotion via Attention-Based Neural Map Encoding

AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization

Your Group-Relative Advantage Is Biased

Provably Safe Reinforcement Learning using Entropy Regularizer

From Classical to Quantum Reinforcement Learning and Its Applications in Quantum Control: A Beginner's Tutorial

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory

PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning

QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models

Model-Agnostic Solutions for Deep Reinforcement Learning in Non-Ergodic Contexts

TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Keyword: diffusion policy

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies