Generated: 2025-12-26 16:30:45 (UTC+8); arXiv release time: 2025-12-25 20:00 EST (2025-12-26 09:00 UTC+8)

20 relevant papers today

Keyword: reinforcement learning

BitRL-Light: 1-bit LLM Agents with Deep Reinforcement Learning for Energy-Efficient Smart Home Lighting Optimization

Quantum-Inspired Multi Agent Reinforcement Learning for Exploration Exploitation Optimization in UAV-Assisted 6G Network Deployment

Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems

AI-Driven Green Cognitive Radio Networks for Sustainable 6G Communication

AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent

Generalization of RLVR Using Causal Reasoning as a Testbed

Safety Alignment of LMs via Non-cooperative Games

Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions

NVIDIA Nemotron 3: Efficient and Open Intelligence

The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents

Embodied AI-Enhanced IoMT Edge Computing: UAV Trajectory Optimization and Task Offloading with Mobility Prediction

One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents

ReACT-Drug: Reaction-Template Guided Reinforcement Learning for de novo Drug Design

Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions

LLM-Empowered Agentic AI for QoE-Aware Network Slicing Management in Industrial IoT

Policy-Conditioned Policies for Multi-Agent Task Solving

LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter

Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics

Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

Keyword: diffusion policy

No results found