Generated: 2026-04-02 17:00:51 (UTC+8); arXiv release time: 2026-04-02 20:00 EDT (2026-04-03 08:00 UTC+8)

36 relevant papers today

Keyword: reinforcement learning

MSA-Thinker: Discrimination-Calibration Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis

Generalizable Dense Reward for Long-Horizon Robotic Tasks

Evolution Strategies for Deep RL pretraining

Learning to Play Blackjack: A Curriculum Learning Perspective

Finite-Time Analysis of Projected Two-Time-Scale Stochastic Approximation

Offline Constrained RLHF with Multiple Preference Oracles

Scalable machine learning-based approaches for energy saving in densely deployed Open RAN

Autonomous Adaptive Solver Selection for Chemistry Integration via Reinforcement Learning

Certified Set Convergence for Piecewise Affine Systems via Neural Lyapunov Functions

Agent Q-Mix: Selecting the Right Action for LLM Multi-Agent Systems through Reinforcement Learning

GUIDE: Reinforcement Learning for Behavioral Action Support in Type 1 Diabetes

Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games

TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

Execution-Verified Reinforcement Learning for Optimization Modeling

All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation

MOON3.0: Reasoning-aware Multimodal Representation Learning for E-commerce Product Understanding

AceTone: Bridging Words and Colors for Conditional Image Grading

Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

Toward Efficient Deployment and Synchronization in Digital Twins-Empowered Networks

A Physical Imitation Learning Pipeline for Energy-Efficient Quadruped Locomotion Assisted by Parallel Elastic Joint

Full-Gradient Successor Feature Representations

TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning

Learning to Hint for Reinforcement Learning

LangMARL: Natural Language Multi-Agent Reinforcement Learning

RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

Bridging RL and MPC for mixed-integer optimal control with application to Formula 1 race strategies

Disentangling to Re-couple: Resolving the Similarity-Controllability Paradox in Subject-Driven Text-to-Image Generation

Policy Improvement Reinforcement Learning

Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization

Query-Conditioned Evidential Keyframe Sampling for MLLM-Based Long-Form Video Understanding

Adversarial Attacks in AI-Driven RAN Slicing: SLA Violations and Recovery

BAT: Balancing Agility and Stability via Online Policy Switching for Long-Horizon Whole-Body Humanoid Control

Multi-Agent LLM Governance for Safe Two-Timescale Reinforcement Learning in SDN-IoT Defense

Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking

Embarrassingly Simple Self-Distillation Improves Code Generation

Keyword: diffusion policy

No results