Generated: 2025-10-24 16:28:09 (UTC+8); arXiv release time: 2025-10-24 20:00 EDT (2025-10-25 08:00 UTC+8)

41 related papers today

Keyword: reinforcement learning

An Integrated Approach to Neural Architecture Search for Deep Q-Networks

深度 Q 网络的神经架构搜索集成方法

FairGRPO: Fair Reinforcement Learning for Equitable Clinical Reasoning

FairGRPO:面向公平临床推理的公平强化学习

Large Language Model enabled Mathematical Modeling

大语言模型赋能的数学建模

Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets

金融中的鲁棒强化学习:使用椭圆不确定性集对市场影响进行建模

Simultaneous learning of state-to-state minimum-time planning and control

同时学习状态到状态的最小时间规划和控制

SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph

SALT:通过轨迹图为长时程智能体进行步骤级优势分配

Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards

在延迟奖励下通过情境强化学习学习个性化广告影响

Enhancing Reasoning Skills in Small Persian Medical Language Models Can Outperform Large-Scale Data Training

增强小型波斯语医学语言模型的推理能力可以优于大规模数据训练

StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

StableSketcher:通过视觉问答反馈增强基于像素的草图生成的扩散模型

Competition is the key: A Game Theoretic Causal Discovery Approach

竞争是关键:博弈论因果发现方法

BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation

BoundRL:通过强化边界生成实现高效的结构化文本分割

Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters

用于控制参数不确定系统的软切换专家策略

Reinforcement Learning-based Robust Wall Climbing Locomotion Controller in Ferromagnetic Environment

基于强化学习的铁磁环境下鲁棒爬壁运动控制器

Mixture-of-Minds: Multi-Agent Reinforcement Learning for Table Understanding

Mixture-of-Minds:用于表格理解的多智能体强化学习

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

每个问题都有自己的价值:具有明确人类价值观的强化学习

Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents

基于优化确定性等价的风险规避约束强化学习

High-order Interactions Modeling for Interpretable Multi-Agent Q-Learning

用于可解释多智能体 Q 学习的高阶交互建模

Multi-Objective Reinforcement Learning with Max-Min Criterion: A Game-Theoretic Approach

基于Max-Min准则的多目标强化学习:博弈论方法

Optimistic Task Inference for Behavior Foundation Models

行为基础模型的乐观任务推理

ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows

ResearchGPT:面向端到端计算机科学研究工作流程的大语言模型基准测试与训练

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

UI-Ins:通过多视角指令即推理增强 GUI 定位

Moving or Predicting? RoleAware-MAPP: A Role-Aware Transformer Framework for Movable Antenna Position Prediction to Secure Wireless Communications

移动还是预测?RoleAware-MAPP:用于可移动天线位置预测的角色感知 Transformer 框架,以保障无线通信安全

Enhancing Security in Deep Reinforcement Learning: A Comprehensive Survey on Adversarial Attacks and Defenses

增强深度强化学习的安全性:对抗性攻击与防御综述

Teaching Language Models to Reason with Tools

教语言模型借助工具进行推理

Multi-Modal Decentralized Reinforcement Learning for Modular Reconfigurable Lunar Robots

模块化可重构月球机器人的多模态去中心化强化学习

Ask a Strong LLM Judge when Your Reward Model is Uncertain

当奖励模型不确定时,请求助于强大的 LLM 评判者

NeuralTouch: Neural Descriptors for Precise Sim-to-Real Tactile Robot Control

NeuralTouch:用于精确模拟到真实触觉机器人控制的神经描述符

Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control

平衡专业化与集中化:面向顺序工业控制的多智能体强化学习基准

Why DPO is a Misspecified Estimator and How to Fix It

为什么 DPO 是一个设定错误的估计量以及如何修复它

LM-mixup: Text Data Augmentation via Language Model based Mixup

LM-mixup:通过基于语言模型的混合进行文本数据增强

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

Conan:在多尺度视觉证据上像侦探一样逐步学习推理

A Unified Framework for Zero-Shot Reinforcement Learning

零样本强化学习的统一框架

GlobalRAG: Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning

GlobalRAG:通过强化学习增强多跳问答中的全局推理

AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN

AdaDoS:SDN中深度对抗强化学习的自适应DoS攻击

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Open-o3 Video:基于显式时空证据的有据视频推理

The Shape of Reasoning: Topological Analysis of Reasoning Traces in Large Language Models

推理的形状:大语言模型中推理痕迹的拓扑分析

Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

计划然后检索:强化学习引导的知识图谱复杂推理

Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning

基于模型预测控制和强化学习的四足机器人实时步态适应

No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

使用高斯过程的有限时域马尔可夫决策过程的无悔汤普森采样

GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation

GSWorld:用于机器人操作的闭环逼真仿真套件

KL-Regularized Reinforcement Learning is Designed to Mode Collapse

KL 正则化强化学习天生会导致模式坍塌

Keyword: diffusion policy

No results found