Generated: 2026-01-30 16:45:03 (UTC+8); arXiv release time: 2026-01-30 20:00 EST (2026-01-31 09:00 UTC+8)

66 related papers today

Keyword: reinforcement learning

Distributional Active Inference

分布主动推断

SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model

SIGMA-PPG:PPG基础模型的统计先验知情生成掩蔽架构

Log2Motion: Biomechanical Motion Synthesis from Touch Logs

Log2Motion:基于触摸日志的生物力学运动合成

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B 技术报告

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

OpenSec:在对抗性证据下衡量事件响应代理校准

Deep Reinforcement Learning for Fault-Adaptive Routing in Eisenstein-Jacobi Interconnection Topologies

Eisenstein-Jacobi互连拓扑中故障自适应路由的深度强化学习

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

安全强化学习中的分布转移下的安全泛化:糖尿病测试平台

Do Reasoning Models Enhance Embedding Models?

推理模型能增强嵌入模型吗?

When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

我应该在什么时候搜索更多:基于强化学习的自适应复杂查询优化

Intelli-Planner: Towards Customized Urban Planning via Large Language Model Empowered Reinforcement Learning

Intelli-Planner:通过大型语言模型赋能强化学习实现定制城市规划

Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification

减少噪音,多声音:通过指令净化实现推理的强化学习

Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels

基于元评估的强化学习:无需真实标签的语言模型对齐

EGAM: Extended Graph Attention Model for Solving Routing Problems

EGAM:用于求解路径问题的扩展图注意力模型

The Surprising Difficulty of Search in Model-Based Reinforcement Learning

基于模型的强化学习中搜索的惊人难度

Few-Shot Learning for Dynamic Operations of Automated Electric Taxi Fleets under Evolving Charging Infrastructure: A Meta-Deep Reinforcement Learning Approach

在不断演变的充电基础设施下,自动驾驶电动出租车车队动态运营的小样本学习:一种元深度强化学习方法

Heterogeneous Vertiport Selection Optimization for On-Demand Air Taxi Services: A Deep Reinforcement Learning Approach

按需空中出租车服务的异构垂直机场选择优化:深度强化学习方法

Self-Improving Pretraining: using post-trained models to pretrain better models

自我改进预训练:利用后训练模型预训练更好的模型

Factored Causal Representation Learning for Robust Reward Modeling in RLHF

面向RLHF中鲁棒奖励建模的因子化因果表征学习

Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control

迈向弥合大规模预训练与高效微调之间的差距:面向人形机器人控制

Intrinsic Reward Policy Optimization for Sparse-Reward Environments

稀疏奖励环境下的内在奖励策略优化

Towards Space-Based Environmentally-Adaptive Grasping

迈向天基环境自适应抓取

Mitigating Overthinking in Large Reasoning Models via Difficulty-aware Reinforcement Learning

通过难度感知强化学习缓解大型推理模型中的过度思考

HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

HER:面向大型语言模型角色扮演的类人推理与强化学习

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

MemOCR:布局感知视觉记忆,用于高效的长视野推理

SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models

SOUP:大型语言模型的令牌级单样本混合策略强化学习

Mean-Field Control on Sparse Graphs: From Local Limits to GNNs via Neighborhood Distributions

稀疏图的均值场控制:从局部极限到GNNs,通过邻域分布

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

ETS:能量引导的测试时扩展,用于免训练的强化学习对齐

Explicit Credit Assignment through Local Rewards and Dependence Graphs in Multi-Agent Reinforcement Learning

通过多智能体强化学习中的局部奖励和依赖图实现显式的信用分配

Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning

利用脉冲强化学习训练慢速硅神经元以控制极快的机器人

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

ASTRA:自动合成能动轨迹与强化场域

Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks

信号自适应信赖域:用于循环脉冲神经网络的无梯度优化

Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

可扩展幂采样:通过分布锐化解锁高效、免训练的大型语言模型推理

Beyond Imitation: Reinforcement Learning for Active Latent Planning

超越模仿:主动潜在规划的强化学习

RecNet: Self-Evolving Preference Propagation for Agentic Recommender Systems

RecNet:代理推荐系统中的自我演进偏好传播

PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization

PathReasoner-R1:通过知识引导策略优化,将结构化推理融入病理学视觉语言模型

Expected Return Causes Outcome-Level Mode Collapse in Reinforcement Learning and How to Fix It with Inverse Probability Scaling

期望回报导致强化学习中结果层面的模式崩溃,以及如何通过逆概率缩放加以解决

BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections

BAP-SRL:混合交通路口车辆运动规划的贝叶斯自适应优先安全强化学习

Can David Beat Goliath? On Multi-Hop Reasoning with Resource-Constrained Agents

大卫能打败歌利亚吗?关于资源受限代理的多跳推理

TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

TACLer:量身定制的课程强化学习,助力高效推理

Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations

解耦感知与推理,以提升无演示布料操作学习的数据效率

Mixed-Precision Training and Compilation for RRAM-based Computing-in-Memory Accelerators

基于RRAM的存内计算加速器的混合精度训练与编译

Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems

认知情境学习:在基于LLM的多智能体系统中正确建立信任

Language-based Trial and Error Falls Behind in the Era of Experience

基于语言的试错在经验时代落后

Influence Guided Sampling for Domain Adaptation of Text Retrievers

影响引导采样用于文本检索器的域适应

OneMall: One Model, More Scenarios -- End-to-End Generative Recommender Family at Kuaishou E-Commerce

OneMall:一个模型,更多场景——快手电商端到端生成推荐器家族

Error Amplification Limits ANN-to-SNN Conversion in Continuous Control

误差放大限制了连续控制中人工神经网络到脉冲神经网络的转换

Distribution-Aware Reward Estimation for Test-Time Reinforcement Learning

分布感知奖励估计用于测试时间强化学习

Constrained Meta Reinforcement Learning with Provable Test-Time Safety

具备可证明测试时间安全性的受限元强化学习

READY: Reward Discovery for Meta-Black-Box Optimization

READY:面向元黑盒优化的奖励发现

Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting

移动边缘无人机网络的时空持续学习:缓解灾难性遗忘

WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents

WebArbiter:基于原则引导的推理过程奖励模型,适用于网络代理

From Meta-Thought to Execution: Cognitively Aligned Post-Training for Generalizable and Reliable LLM Reasoning

从元思维到执行:认知对齐的后训练,实现可推广且可靠的大型语言模型推理

ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation

ProRAG:用于检索增强生成的过程监督强化学习

Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning

通过多智能体强化学习实现思维链的自我压缩

Optimistic Transfer under Task Shift via Bellman Alignment

通过贝尔曼对齐实现任务偏移下的乐观迁移

OVD: On-policy Verbal Distillation

OVD:同策略言语蒸馏

Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding

Token-Guard:通过自我检查解码实现令牌级幻觉控制

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

基于多智能体演员-评论家方法学习去中心化的大型语言模型协作

Elign: Equivariant Diffusion Model Alignment from Foundational Machine Learning Force Fields

Elign:基于基础机器学习力场的等变扩散模型对齐

Geometry of Drifting MDPs with Path-Integral Stability Certificates

带有路径积分稳定性证书的漂移MDP几何形状

SymbXRL: Symbolic Explainable Deep Reinforcement Learning for Mobile Networks

SymbXRL:移动网络的符号可解释深度强化学习

SIA: Symbolic Interpretability for Anticipatory Deep Reinforcement Learning in Network Control

SIA:网络控制中预期性深度强化学习的符号可解释性

Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem

学习预约乘车:一种深度图强化学习方法解决电动预约乘车问题

DynaWeb: Model-Based Reinforcement Learning of Web Agents

DynaWeb:基于模型的网络代理强化学习

Exploring Reasoning Reward Model for Agents

探索智能体的推理奖励模型

Keyword: diffusion policy

PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy

PocketDP3:高效的袖珍级三维视觉运动策略