Generated: 2026-01-21 16:37:43 (UTC+8); arXiv release time: 2026-01-21 20:00 EST (2026-01-22 09:00 UTC+8)

62 relevant papers today

Keyword: reinforcement learning

GRADE: Replacing Policy Gradients with Backpropagation for LLM Alignment

Bielik 11B v3: Multilingual Large Language Model for European Languages

Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning

Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines

Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration

R$^2$PO: Decoupling Training Trajectories from Inference Responses for LLM Reasoning

Extreme Value Policy Optimization for Safe Reinforcement Learning

Profit Maximization for Electric Vehicle Charging Stations Using Multiagent Reinforcement Learning

UniMo: Unified Motion Generation and Understanding with Chain of Thought

Aletheia: What Makes RLVR For Code Verifiers Tick?

Speculative Sampling with Reinforcement Learning

Optimal Power Allocation and Sub-Optimal Channel Assignment for Downlink NOMA Systems Using Deep Reinforcement Learning

Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation

RLMiner: Finding the Most Frequent k-sized Subgraph via Reinforcement Learning

ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models

Incentivizing In-depth Reasoning over Long Contexts with Process Advantage Shaping

Agentic Reasoning for Large Language Models

STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

Multiagent Reinforcement Learning in Enhancing Resilience of Microgrids under Extreme Weather Events

Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks

Resource-Conscious RL Algorithms for Deep Brain Stimulation

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

Teaching Large Reasoning Models Effective Reflection

Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off

Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction

Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination

FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

Communication Methods in Multi-Agent Reinforcement Learning

PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient

Graph Reasoning Paradigm: Structured and Symbolic Reasoning with Topology-Aware Reinforcement Learning for Large Language Models

Think3D: Thinking with Space for Spatial Reasoning

Feedforward-Feedback Integration in Flight Control: Reinforcement Learning with Sliding Mode Control

Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning

Training instability in deep learning follows low-dimensional dynamical principles

Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

Balancing Classification and Calibration Performance in Decision-Making LLMs via Calibration Aware Reinforcement Learning

Group Relative Policy Optimization for Robust Blind Interference Alignment with Fluid Antennas

Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models

Behavior Knowledge Merge in Reinforced Agentic Models

Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning

Reinforcement Learning for Opportunistic Routing in Software-Defined LEO-Terrestrial Systems

Finding RELIEF: Shaping Reasoning Behavior without Reasoning Supervision via Belief Engineering

TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography

HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs

Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning

RL-BioAug: Label-Efficient Reinforcement Learning for Self-Supervised EEG Representation Learning

RM-Distiller: Exploiting Generative LLM for Reward Model Distillation

Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning

Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning

CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems

Toward Efficient Agents: Memory, Tool learning, and Planning

Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment

KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Q-learning with Adjoint Matching

Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression

Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow

Keyword: diffusion policy

Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning
