Generated: 2025-12-29 16:34:30 (UTC+8); arXiv release time: 2025-12-29 20:00 EST (2025-12-30 09:00 UTC+8)

18 relevant papers today

Keyword: reinforcement learning

CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation

A Reinforcement Learning Approach to Synthetic Data Generation

A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning

dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO

Generative Actor Critic

Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model

Towards Learning-Based Formula 1 Race Strategies

Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations

Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards

Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities

Q-A3C2: Quantum Reinforcement Learning with Time-Series Dynamic Clustering for Adaptive ETF Stock Selection

A Comedy of Estimators: On KL Regularization in RL Training of LLMs

SWE-RM: Execution-free Feedback For Software Engineering Agents

Latency-Optimal Cache-aided Multicast Streaming via Forward-Backward Reinforcement Learning

Meta-Learning-Based Handover Management in NextG O-RAN

Keyword: diffusion policy

Flexible Multitask Learning with Factorized Diffusion Policy
