Generated: 2026-01-06 16:36:01 (UTC+8); arXiv release time: 2026-01-06 20:00 EST (2026-01-07 09:00 UTC+8)

42 related papers today

Keyword: reinforcement learning

Horizon Reduction as Information Loss in Offline Reinforcement Learning

SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation

VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition

Dichotomous Diffusion Policy Optimization

DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models

Performance and Security Aware Distributed Service Placement in Fog Computing

Latent Space Reinforcement Learning for Multi-Robot Exploration

ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation

Reinforcement Learning Based Whittle Index Policy for Scheduling Wireless Sensors

SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards

OrchestrRL: Dynamic Compute and Network Orchestration for Disaggregated RL

PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS

dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs

DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks

Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization

Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller

DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos

Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives

SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines

Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving

PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor

Moments Matter: Stabilizing Policy Optimization using Return Distributions

DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning

Distorted Distributional Policy Evaluation for Offline Reinforcement Learning

Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation

GDRO: Group-level Reward Post-training Suitable for Diffusion Models

Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense

CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Enabling Deep Reinforcement Learning Research for Energy Saving in Open RAN

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Keyword: diffusion policy

Dichotomous Diffusion Policy Optimization

Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
