Generated: 2026-06-19 20:08:11 (UTC+8); arXiv release time: 2026-06-19 20:00 EDT (2026-06-20 08:00 UTC+8)

32 relevant papers today

Keyword: reinforcement learning

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

Human-like autonomy emerges from self-play and a pinch of human data

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

CTS-MoE: Implicit Terrain Adaptation via Mixture-of-Experts for Perceptive Locomotion

DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

OnDeFog: Online Decision Transformer under Frame Dropping

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Temporal Self-Imitation Learning

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

Uncertainty-Aware Reward Modeling for Stable RLHF

MetaResearcher: Scaling Deep Research via Self-Reflective Reinforcement Learning in Adversarial Virtual Environments

CARE: Competence-Aware Reward Shaping for Adaptive Reasoning Length in Video-MLLMs

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

VIMPO: Value-Implicit Policy Optimization for LLMs

Hierarchical Control in Multi-Agent Games: LLM-based Planning and RL Execution

A Neuromorphic Reinforcement Learning Framework for Efficient Pathfinding in Robotic Mobile Fulfillment Systems

Process-Verified Reinforcement Learning for Theorem Proving via Lean

Multi-Head Attention-Based Feature Extractor Integration with Soft Actor-Critic for Porosity Prediction and Process Parameter Optimization in Additive Manufacturing

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

Augmenting Game AI with Deep Reinforcement Learning

A Multi-Agent system for Multi-Objective constrained optimization

ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

CRAX: Fast Safe Reinforcement Learning Benchmarking

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

TaCauchy: An Extensible FEM Framework for Vision-Based Tactile Simulation

Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation

Keyword: diffusion policy

DiffusionVS: A Generative Framework for Robust Visual Servoing Based on Diffusion Policy

MirrorDuo: Reflection-Consistent Visuomotor Learning from Mirrored Demonstration Pairs

Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation
