Generated: 2025-11-11 16:31:34 (UTC+8); arXiv release time: 2025-11-11 20:00 EST (2025-11-12 09:00 UTC+8)

67 related papers today

Keyword: reinforcement learning

Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models

前瞻去掩码促成扩散语言模型的准确解码

CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling

CoPRIS:通过并发控制的部分推出和重要性采样实现高效稳定的强化学习

Distributionally Robust Self Paced Curriculum Reinforcement Learning

分布稳健自定进度课程强化学习

STAIR: Stability criterion for Time-windowed Assignment and Internal adversarial influence in Routing and decision-making

STAIR:时间窗口分配的稳定性准则以及路由和决策中的内部对抗影响

SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control

SymLight:探索交通信号控制的可解释和可部署符号策略

Evader-Agnostic Team-Based Pursuit Strategies in Partially-Observable Environments

与回避者无关的基于团队的部分可观察环境中的追捕策略

WAR-Re: Web API Recommendation with Semantic Reasoning

WAR-Re:具有语义推理的 Web API 推荐

Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions

基于策略梯度的EMT在环学习,缓解次同步控制交互

Learning-Based Multi-Stage Strategy for a Fixed-Wing Aircraft to Evade a Missile Detected at a Short Distance

基于学习的固定翼飞机躲避近距离发现导弹的多阶段策略

EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph

EGG-SR:通过相等图将符号等价嵌入符号回归

Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills

通过 VLM 规划的原子技能演示进行温和操纵策略学习

MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?

MCP-RiskCue:LLM 能否从 MCP 服务器系统日志中推断出风险信息?

Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs

强化学习改进了 LLM 中层次知识的遍历

Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling

Klear-AgentForge:通过训练后扩展锻造代理智能

Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration

用于图像到点云配准的自适应智能体选择与交互网络

DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets

DWM-RO:支持SWIPT的星地HetNet的具有推理卸载的去中心化世界模型

Revisiting Entropy in Reinforcement Learning for Large Reasoning Models

重温大型推理模型强化学习中的熵

Probe-and-Release Coordination of Platoons at Highway Bottlenecks with Unknown Parameters

参数未知的高速公路瓶颈处车队的探测-释放协调

ScRPO: From Errors to Insights

ScRPO:从错误到洞察

Approximating Shapley Explanations in Reinforcement Learning

强化学习中的近似 Shapley 解释

Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices

用于机械循环装置智能脱机的 Guardian 正则化安全离线强化学习

A Deep Learning Model for Predicting Transformation Legality

一种用于预测转换合法性的深度学习模型

Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs

Maestro:通过条件列表式策略优化学习多智能体 LLM 的协作

When Object-Centric World Models Meet Policy Learning: From Pixels to Policies, and Where It Breaks

当以对象为中心的世界模型遇上策略学习:从像素到策略,以及它在何处失效

MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

MALinZero:掌握复杂多智能体规划的高效低维搜索

Elastic Data Transfer Optimization with Hybrid Reinforcement Learning

使用混合强化学习进行弹性数据传输优化

OpenVLN: Open-world aerial Vision-Language Navigation

OpenVLN:开放世界空中视觉语言导航

Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

考虑信用分配的微观交通模拟中动态起点-目的地矩阵估计的深度强化学习

MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios

MrCoM:跨多场景泛化的元正则化世界模型

VideoSSR: Video Self-Supervised Reinforcement Learning

VideoSSR:视频自监督强化学习

What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models

是什么让推理无效:大型语言模型的回声反射缓解

Dynamic Electric Vehicle Charging Pricing for Load Balancing in Power Distribution Networks based on Collaborative DDPG Agents

基于协同DDPG代理的配电网负载均衡动态电动汽车充电定价

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization

SofT-GRPO:通过 Gumbel 重参数化软思维策略优化超越离散 token 的 LLM 强化学习

CG-TTRL: Context-Guided Test-Time Reinforcement Learning for On-Device Large Language Models

CG-TTRL:面向端侧大型语言模型的上下文引导测试时强化学习

Sim-to-Real Transfer in Deep Reinforcement Learning for Bipedal Locomotion

双足运动深度强化学习中的模拟到现实迁移

Brain-Inspired Planning for Better Generalization in Reinforcement Learning

在强化学习中实现更好泛化的类脑规划

Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models

放大漫画:区域感知 RL 提高了视觉语言模型中细粒度的漫画理解

SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports

SportR:体育领域多模态大语言模型推理的基准

Adaptive PID Control for Robotic Systems via Hierarchical Meta-Learning and Reinforcement Learning with Physics-Based Data Augmentation

基于物理的数据增强,通过分层元学习和强化学习对机器人系统进行自适应 PID 控制

Practical Policy Distillation for Reinforcement Learning in Radio Access Networks

无线接入网强化学习的实用策略提炼

Underactuated Biomimetic Autonomous Underwater Vehicle for Ecosystem Monitoring

用于生态系统监测的欠驱动仿生自主水下航行器

GRAPH-GRPO-LEX: Contract Graph Modeling and Reinforcement Learning with Group Relative Policy Optimization

GRAPH-GRPO-LEX:具有组相对策略优化的契约图建模和强化学习

Secure Low-altitude Maritime Communications via Intelligent Jamming

通过智能干扰保护低空海上通信

Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View

从难度区分视角重新审视多模态后训练中的数据采样

Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-Supervised Reinforcement Learning

物理基础目标想象:用于自监督强化学习的物理知情变分自动编码器

OntoTune: Ontology-Driven Learning for Query Optimization with Convolutional Models

OntoTune:本体驱动学习,使用卷积模型进行查询优化

Controllable Flow Matching for Online Reinforcement Learning

在线强化学习的可控流匹配

On The Presence of Double-Descent in Deep Reinforcement Learning

关于深度强化学习中双下降现象的存在

Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

通过奖励函数优化强化学习微调基于扩散的推荐系统

Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning

学习专注:在部分可观察强化学习中优先考虑具有结构化注意力机制的信息历史

Learning Quantized Continuous Controllers for Integer Hardware

学习整数硬件的量化连续控制器

Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation

通过基于强化学习的自适应数据增强改进深度伪造检测

Multi-Agent Reinforcement Learning for Deadlock Handling among Autonomous Mobile Robots

用于自主移动机器人死锁处理的多智能体强化学习

Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture

两个头比一个好:将大型语言模型特征提炼成具有特征分解和混合的小模型

Dynamics-Decoupled Trajectory Alignment for Sim-to-Real Transfer in Reinforcement Learning for Autonomous Driving

自动驾驶强化学习中用于模拟到现实迁移的动力学解耦轨迹对齐

Guiding Generative Models to Uncover Diverse and Novel Crystals via Reinforcement Learning

引导生成模型通过强化学习发现多样化和新颖的晶体

Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization

通过深度 Actor-Critic 稳定化实现离策略模仿学习

Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

使用自我博弈强化学习和测试时搜索的 Stratego 超人类 AI

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

RLVE:使用自适应可验证环境扩展语言模型的强化学习

FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation

FinRpt:用于股票研究报告生成的数据集、评估系统和基于 LLM 的多智能体框架

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

IterResearch:通过马尔可夫状态重建重新思考长视野代理

Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Q-RAG:通过基于值的嵌入器训练进行长上下文多步骤检索

Grounding Computer Use Agents on Human Demonstrations

基于人类演示对计算机使用智能体进行 grounding

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

课程学习在 Transformer 树推理后训练中的可证明益处

Unified Humanoid Fall-Safety Policy from a Few Demonstrations

基于少量演示的统一人形机器人跌倒安全策略

Robot Learning from a Physical World Model

基于物理世界模型的机器人学习

Keyword: diffusion policy

Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills

通过 VLM 规划的原子技能演示进行温和操纵策略学习