Generated: 2026-02-09 16:58:26 (UTC+8); arXiv release time: 2026-02-09 20:00 EST (2026-02-10 09:00 UTC+8)

46 relevant papers today

Keyword: reinforcement learning

Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments

Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning

Self-Improving World Modelling with Latent Actions

Flow Matching for Offline Reinforcement Learning with Discrete Actions

Learning Rate Scaling across LoRA Ranks and Transfer to Full Finetuning

VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation

Online Adaptive Reinforcement Learning with Echo State Networks for Non-Stationary Dynamics

HiWET: Hierarchical World-Frame End-Effector Tracking for Long-Horizon Humanoid Loco-Manipulation

Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation

FMBench: Adaptive Large Language Model Output Formatting

POINTS-GUI-G: GUI-Grounding Journey

Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization

MeDocVL: A Visual Language Model for Medical Document Understanding and Parsing

Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking

Evaluating an evidence-guided reinforcement learning framework in aligning light-parameter large language models with decision-making cognition in psychiatric clinical reasoning

Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning

AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry Design

Adaptive Uncertainty-Aware Tree Search for Robust Reasoning

DreamHome-Pano: Design-Aware and Conflict-Free Panoramic Interior Generation

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy

Progress Constraints for Reinforcement Learning in Behavior Trees

Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion

SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees

Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs

Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response

The hidden risks of temporal resampling in clinical reinforcement learning

Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations

compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data

Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions

R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging

Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities

UnifSrv: AP Selection for Achieving Uniformly Good Performance of CF-MIMO in Realistic Urban Networks

Generating Data-Driven Reasoning Rubrics for Domain-Adaptive Reward Modeling

AEGPO: Adaptive Entropy-Guided Policy Optimization for Diffusion Models

SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks

A first realization of reinforcement learning-based closed-loop EEG-TMS

Continuous-time reinforcement learning: ellipticity enables model-free value function approximation

Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics

Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches

InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images

Keyword: diffusion policy

No results.