Generated: 2025-12-18 16:32:07 (UTC+8); arXiv release time: 2025-12-18 20:00 EST (2025-12-19 09:00 UTC+8)

21 relevant papers today

Keyword: reinforcement learning

SEMO: A Socio-Evolutionary Adaptive Optimization Framework for Dynamic Social Network Tie Management

A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour

Quantum Decision Transformers (QDT): Synergistic Entanglement and Interference for Offline Reinforcement Learning

Entropy-Reservoir Bregman Projection: An Information-Geometric Unification of Model Collapse

Puzzle Curriculum GRPO for Vision-Centric Reasoning

Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes

Spectral Representation-based Reinforcement Learning

Beyond Fast and Slow: Cognitive-Inspired Elastic Reasoning for Large Language Models

Automatic Reward Shaping from Multi-Objective Human Heuristics

Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning

EagleVision: A Dual-Stage Framework with BEV-grounding-based Chain-of-Thought for Spatial Intelligence

Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning

Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis

EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning

Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods

FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments

Double Horizon Model-Based Policy Optimization

Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

Stepwise Think-Critique: A Unified Framework for Robust and Interpretable LLM Reasoning

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Keyword: diffusion policy

ISS Policy : Scalable Diffusion Policy with Implicit Scene Supervision