Generated: 2025-11-10 16:31:38 (UTC+8); arXiv announcement time: 2025-11-10 20:00 EST (2025-11-11 09:00 UTC+8)

25 related papers today

Keyword: reinforcement learning

Reasoning Up the Instruction Ladder for Controllable Language Models

NCSAC: Effective Neural Community Search via Attribute-augmented Conductance

SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory

Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models

Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning

FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting

Self-Interest and Systemic Benefits: Emergence of Collective Rationality in Mixed Autonomy Traffic Through Deep Reinforcement Learning

You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models

Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale

DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning

Multi-agent Coordination via Flow Matching

FM4Com: Foundation Model for Scene-Adaptive Communication Strategy Optimization

Real-World Adverse Weather Image Restoration via Dual-Level Reinforcement Learning with High-Quality Cold Start

Emergence from Emergence: Financial Market Simulation via Learning with Heterogeneous Preferences

An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones

DeepEyesV2: Toward Agentic Multimodal Model

Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models

QUESTER: Query Specification for Generative Retrieval

TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework

PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization

Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning

TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

Visual Spatial Tuning

Keyword: diffusion policy

MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery
