Generated: 2026-02-16 16:53:25 (UTC+8); arXiv release time: 2026-02-16 20:00 EST (2026-02-17 09:00 UTC+8)

There are 42 relevant papers today.

Keyword: reinforcement learning

Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

Abstractive Red-Teaming of Language Model Character

Wireless TokenCom: RL-Based Tokenizer Agreement for Multi-User Wireless Token Communications

Intrinsic Credit Assignment for Long Horizon Interaction

LongNav-R1: Horizon-Adaptive Multi-Turn RL for Long-Horizon VLA Navigation

Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning

Provably Convergent Actor-Critic in Risk-averse MARL

Synthetic Interaction Data for Scalable Personalization in Large Language Models

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

AstRL: Analog and Mixed-Signal Circuit Synthesis with Deep Reinforcement Learning

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models

Theory of Mind Guided Strategy Adaptation for Zero-Shot Coordination

Designing RNAs with Language Models

Composable Model-Free RL for Navigation with Input-Affine Systems

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games

Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings

Constraint-Rectified Training for Efficient Chain-of-Thought

Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models

Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation

To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models

VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction

RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Dual-Granularity Contrastive Reward via Generated Episodic Guidance for Efficient Embodied RL

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics

PMG: Parameterized Motion Generator for Human-like Locomotion Control

$\mathcal{X}$-KD: General Experiential Knowledge Distillation for Large Language Models

ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

TRANS: Terrain-aware Reinforcement Learning for Agile Navigation of Quadruped Robots under Social Interactions

FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching

EARL: Energy-Aware Adaptive Antenna Control with Reinforcement Learning in O-RAN Cell-Free Massive MIMO Networks

Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models

DPUConfig: Optimizing ML Inference in FPGAs Using Reinforcement Learning

Hierarchical Reinforcement Learning for Cooperative Air-Ground Delivery in Urban System

Look Inward to Explore Outward: Learning Temperature Policy from LLM Internal States via Hierarchical RL

TCRL: Temporal-Coupled Adversarial Training for Robust Constrained Reinforcement Learning in Worst-Case Scenarios

Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation

Peaceful Anarcho-Accelerationism: Decentralized Full Automation for a Society of Universal Care

Learning to Approximate Uniform Facility Location via Graph Neural Networks

In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach

Keyword: diffusion policy

There are no results.