Generated: 2025-10-14 16:32:01 (UTC+8); arXiv release time: 2025-10-14 20:00 EDT (2025-10-15 08:00 UTC+8)

94 relevant papers today

Keyword: reinforcement learning

Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

大语言模型时代的表格问答:任务、方法与评估的全面综述

A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation

一种具有策略梯度的多分量奖励函数,用于具有动态正则化和偏差缓解的自动特征选择

ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting

ARROW:一种用于全球天气预报的自适应推出和路由方法

WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

WARC-Bench:基于 Web 存档的 GUI 子任务执行基准测试

Abductive Preference Learning

溯因偏好学习

Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective

结构化协同多智能体强化学习:贝叶斯网络视角

Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models

视觉-语言-行动模型流匹配策略的强化微调

ATRos: Learning Energy-Efficient Agile Locomotion for Wheeled-legged Robots

ATRos:为轮腿机器人学习节能敏捷运动

RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning

RIPRAG:利用强化学习攻破黑盒检索增强生成问答系统

Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

超越单个查询的限制:使用强化学习训练 LLM 以进行查询扩展

Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

通过LLM增强优化在无人机支持的低空经济网络中实现高效的机载视觉语言推理

Experience-Efficient Model-Free Deep Reinforcement Learning Using Pre-Training

使用预训练实现经验高效的无模型深度强化学习

Think Twice to See More: Iterative Visual Reasoning in Medical VLMs

再想一次,看到更多:医疗 VLM 中的迭代视觉推理

One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

One4Many-StablePacker:一种针对3D箱包装问题的高效深度强化学习框架

Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference

Unilaw-R1:一种基于强化学习和迭代推理的法律推理大语言模型

Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models

多模态大语言模型的答案一致思维链强化学习

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

重新思考 RLVR 中的熵干预:熵变化视角

Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback

Dejavu:通过经验反馈对具身代理进行部署后学习

Don't Just Fine-tune the Agent, Tune the Environment

不要只是微调代理,而是调整环境

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

RLFR:使用流环境扩展面向 LLM 的强化学习

Performance Index Shaping for Closed-loop Optimal Control

用于闭环优化控制的性能指标塑造

Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning

自适应双推理器:大型推理模型可以通过混合推理进行高效思考

SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification

SGM:用于风险控制递归自修改的统计 Godel 机器

Reasoning-Enhanced Large Language Models for Molecular Property Prediction

推理增强型大语言模型用于分子性质预测

Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting

通过事后轨迹重写在 LM 智能体中进行样本高效的在线学习

Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework

基于 Soft Actor-Critic 框架的双阿克曼转向机器人安全操纵

RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation

RECON:用于高效检索增强生成的浓缩推理

Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning

通过离线和在线强化学习对动态未知的多动物行为进行数据驱动模拟器

Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds

迈向动态四足步态:对称引导的 RL 层次结构可在不同速度下实现自由步态转换

MARS-Sep: Multimodal-Aligned Reinforced Sound Separation

MARS-Sep:多模态对齐的增强声音分离

A Hybrid Machine Learning Approach for Synthetic Data Generation with Post Hoc Calibration for Clinical Tabular Datasets

一种混合机器学习方法,用于合成数据生成,对临床表格数据集进行事后校准

Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control

用于高维机器人控制的群体编码脉冲神经网络

Reinforced Domain Selection for Continuous Domain Adaptation

用于连续域适应的强化域选择

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

重新思考RL评估:基准测试能否真正揭示RL方法的失败?

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

PAC-贝叶斯强化学习训练可推广的策略

Reinforcement Learning-based Dynamic Adaptation for Sampling-Based Motion Planning in Agile Autonomous Driving

基于强化学习的敏捷自动驾驶中基于采样的运动规划动态自适应

AQORA: A Learned Adaptive Query Optimizer for Spark SQL

AQORA:用于 Spark SQL 的学习自适应查询优化器

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

ViSurf:大型视觉和语言模型的视觉监督和强化微调

OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment

OmniQuality-R:通过全方位的质量评估推进奖励模型

Assessing Policy Updates: Toward Trust-Preserving Intelligent User Interfaces

评估策略更新:实现可信任的智能用户界面

Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion

通过多智能体强化学习和语义融合进行协作文本到图像生成

Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems

通过 LLM 引导的按需移动系统的目标演化进行分层优化

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

解锁 RLVR 中的探索:不确定性感知优势塑造以进行更深入的推理

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

RePro:训练语言模型以忠实地回收 Web 进行预训练

Digital Twin-enabled Multi-generation Control Co-Design with Deep Reinforcement Learning

支持数字孪生的多代控制协同设计与深度强化学习

Understanding Sampler Stochasticity in Training Diffusion Models for RLHF

了解 RLHF 训练扩散模型中的采样器随机性

LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach

LLM 赋能的代理 MAC 协议:动态 Stackelberg 博弈方法

PoU: Proof-of-Use to Counter Tool-Call Hacking in DeepResearch Agents

PoU:用于对抗 DeepResearch 代理中工具调用黑客攻击的使用证明

Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems

多方开放系统中基于中立智能体、针对深度强化学习的对抗性策略学习

Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning

重新发现熵正则化:自适应系数释放其在 LLM 强化学习中的潜力

Game-Theoretic Risk-Shaped Reinforcement Learning for Safe Autonomous Driving

面向安全自动驾驶的博弈论风险塑形强化学习

APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport

APLOT:通过基于最优传输的自适应偏好学习进行鲁棒奖励建模

RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection

RV-HATE:用于隐性仇恨言论检测的强化多模块投票

Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning

通过选择性关键标记微调增强大型语言模型推理

Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph

Video-STR:通过关系图强化视频时空推理中的MLLM

GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation

GeoVLMath:通过辅助线创建的跨模态奖励增强视觉语言模型中的几何推理

Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy

揭示基于不确定性的自主合作学习规划策略

Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs

携手更强:面向协作 LLM 的同策略(on-policy)强化学习

A Primer on SO(3) Action Representations in Deep Reinforcement Learning

深度强化学习中的SO(3)动作表示入门

Graph Neural Network-Based Multicast Routing for On-Demand Streaming Services in 6G Networks

基于图神经网络的组播路由,用于6G网络中点播流服务

Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

通过强化学习微调LLM完善CVRP的混合遗传搜索

Emergence of hybrid computational dynamics through reinforcement learning

通过强化学习出现混合计算动力学

Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?

工具集成强化学习能否在不同领域推广?

Aligning Deep Implicit Preferences by Learning to Reason Defensively

通过学习防御性推理来调整深层内隐偏好

Vision-LLMs for Spatiotemporal Traffic Forecasting

用于时空交通预测的视觉 LLM

Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Gym-TORAX:用于将 RL 与等离子体控制模拟器集成的开源软件

FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks

FOSSIL:利用对次优样本的反馈,通过模仿学习实现具身视觉和语言任务的数据高效泛化

Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony

第二部分:ROLL Flash -- 通过异步加速 RLVR 和代理训练

Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment

推理作为表征:重新思考图像质量评估中的视觉强化学习

Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers

通过对齐训练和推理路由器来稳定 MoE 强化学习

KnowRL: Teaching Language Models to Know What They Know

KnowRL:教语言模型知道他们知道什么

Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games

自动驾驶汽车需要社会意识才能在多智能体强化学习路由游戏中找到最优

From to : Multidimensional Supervision of Reasoning Process for LLM Optimization

From to : LLM优化推理过程的多维度监督

Unifying Deductive and Abductive Reasoning in Knowledge Graphs with Masked Diffusion Model

使用掩码扩散模型统一知识图谱中的演绎和溯因推理

Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning

基于分层多智能体强化学习的现实空战中的协调策略

Constraint-Aware Reinforcement Learning via Adaptive Action Scaling

通过自适应动作缩放的约束感知强化学习

How Reinforcement Learning After Next-Token Prediction Facilitates Learning

下一个标记预测后的强化学习如何促进学习

ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding

ReLook:面向智能体式 Web 编码、带多模态 LLM 评论器的视觉驱动 RL

Offline Reinforcement Learning with Generative Trajectory Policies

使用生成轨迹策略的离线强化学习

Context-Aware Model-Based Reinforcement Learning for Autonomous Racing

基于情境感知模型的自动驾驶赛车强化学习

A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries

一种基于物理的强化学习方法,用于电池退化感知的长期充电优化

A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services

一种灵活的多智能体深度强化学习框架,用于延迟关键型服务的动态路由和调度

NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning

NaviGait:使用深度强化学习导航动态可行的步态库

MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

MATH-Beyond:检验 RL 能否超越基础模型的基准

SR-Scientist: Scientific Equation Discovery With Agentic AI

SR-Scientist:使用代理人工智能发现科学方程

Ego-Vision World Model for Humanoid Contact Planning

人形接触规划的自我视觉世界模型

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

扩散大型语言模型内存高效RL的边界引导策略优化

Representation-Based Exploration for Language Models: From Test-Time to Post-Training

基于表示的语言模型探索:从测试时到后训练

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Phys2Real:将 VLM 先验与交互式在线适应相融合,以实现不确定性感知的模拟到真实操作

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

QeRL:超越效率——面向 LLM 的量化增强强化学习

Demystifying Reinforcement Learning in Agentic Reasoning

揭秘智能体推理中的强化学习

Reinforced sequential Monte Carlo for amortised sampling

用于摊销采样的强化序贯蒙特卡洛

Keyword: diffusion policy

Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks

通过无分类器指导增强扩散策略,用于时态机器人任务

Understanding Sampler Stochasticity in Training Diffusion Models for RLHF

了解 RLHF 训练扩散模型中的采样器随机性