Generated: 2025-12-24 16:33:13 (UTC+8); arXiv release time: 2025-12-24 20:00 EST (2025-12-25 09:00 UTC+8)

There are 32 relevant articles today

Keyword: reinforcement learning

QoS-Aware Dynamic CU Selection in O-RAN with Graph-Based Reinforcement Learning

Holographic MIMO Empowered NOMA-ISAC for 6G: Rate-Splitting Enhanced Near-Field Modeling, Multi-Objective Optimization, and Statistical Performance Validation

Thermodynamic Focusing for Inference-Time Search: Practical Methods for Target-Conditioned Sampling and Prompted Inference

Tiny, On-Device Decision Makers with the MiniConv Library

Hard Negative Sample-Augmented DPO Post-Training for Small Language Models

OpComm: A Reinforcement Learning Framework for Adaptive Buffer Control in Warehouse Volume Forecasting

Learning to Design City-scale Transit Routes

Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning

An Optimal Policy for Learning Controllable Dynamics by Exploration

Scaling Reinforcement Learning for Content Moderation with Large Language Models

Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches

Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

Information-directed sampling for bandits: a primer

ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language

Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering

MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization

Multi-hop Reasoning via Early Knowledge Alignment

Offline Safe Policy Optimization From Heterogeneous Feedback

RESPOND: Risk-Enhanced Structured Pattern for LLM-driven Online Node-level Decision-making

FaithLens: Detecting and Explaining Faithfulness Hallucination

Edge-Served Congestion Control for Wireless Multipath Transmission with a Transformer Agent

Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks

TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

Identifying Appropriately-Sized Services with Deep Reinforcement Learning

Resilient Packet Forwarding: A Reinforcement Learning Approach to Routing in Gaussian Interconnected Networks with Clustered Faults

Recurrent Off-Policy Deep Reinforcement Learning Doesn't Have to be Slow

Performative Policy Gradient: Optimality in Performative Reinforcement Learning

Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

LongVideoAgent: Multi-Agent Reasoning with Long Videos

Keyword: diffusion policy

No results found