Generated: 2026-01-22 16:36:41 (UTC+8); arXiv release time: 2026-01-22 20:00 EST (2026-01-23 09:00 UTC+8)

28 related articles today

Keyword: reinforcement learning

Beyond Affinity: A Benchmark of 1D, 2D, and 3D Methods Reveals Critical Trade-offs in Structure-Based Drug Design

Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree

Towards Execution-Grounded Automated AI Research

Report for NSF Workshop on AI for Electronic Design Automation

Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Learning Consistent Taxonomic Classification through Hierarchical Reasoning

SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

FARE: Fast-Slow Agentic Robotic Exploration

Beyond Error-Based Optimization: Experience-Driven Symbolic Regression with Goal-Conditioned Reinforcement Learning

CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation

DARL: Encouraging Diverse Answers for General Reasoning without Verifiers

Proximal Policy Optimization with Evolutionary Mutations

Case-Guided Sequential Assay Planning in Drug Discovery

DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs

PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning

CI4A: Semantic Component Interfaces for Agents Empowering Web Automation

What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study

Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation

Improving Regret Approximation for Unsupervised Dynamic Environment Generation

Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control

A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem

Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning

Vehicle Routing with Finite Time Horizon using Deep Reinforcement Learning with Improved Network Embedding

CLEANER: Self-Purified Trajectories Boost Agentic Reinforcement Learning

Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data

Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Keyword: diffusion policy

No results