Self-Improving AI Systems Using Reinforcement Learning from Model Disagreement
Authors: DHANUSHKODI KALIMUTHU L, MADHUARAVIND S, SAKTHIVEL A, SARAN KUMAR R, VARUNVEL R

Abstract: Reinforcement learning has been widely adopted to refine machine learning models beyond supervised training. In recent years, reinforcement learning from human feedback (RLHF) has demonstrated strong performance improvements in natural language processing and decision systems. However, RLHF depends heavily on manual annotations, which are costly, time-intensive, and potentially inconsistent. This paper introduces a framework termed Reinforcement Learning from Model Disagreement (RLMD), in which predictive divergence among independently trained peer models is used as an intrinsic reward signal. Instead of relying on external supervision, the proposed approach interprets disagreement as an indicator of epistemic uncertainty. A regularized reward function is formulated to balance exploration and prediction stability. The framework is implemented for a text classification task and evaluated against a supervised baseline. Experimental results indicate that RLMD improves calibration and robustness while maintaining competitive predictive accuracy. The findings suggest that structured model disagreement can serve as a viable alternative to human-driven reinforcement in constrained research settings.

Index Terms: Reinforcement learning, model disagreement, uncertainty estimation, ensemble learning, intrinsic reward, policy gradient optimization, epistemic uncertainty, model calibration, autonomous learning systems, text classification

Published On: 2026-03-04
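The abstract describes disagreement among independently trained peer models as an intrinsic reward, with a regularizer balancing exploration against prediction stability. The sketch below illustrates one plausible reading of such a reward: mean pairwise symmetric KL divergence as the disagreement measure, with an entropy penalty on the ensemble mean as the stability term. The function names, the choice of divergence, the penalty, and the weight lam are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a disagreement-based intrinsic reward (assumed reading of
# the abstract, not the authors' code). K peer models each produce a predictive
# distribution over C classes; their divergence is treated as the reward.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_kl_disagreement(probs):
    """Mean symmetric KL divergence across all pairs of peer predictions.

    probs: array of shape (K, C), one predictive distribution per peer model
    for a single input. Larger values indicate epistemic uncertainty: the
    peers genuinely disagree about the label.
    """
    K = probs.shape[0]
    eps = 1e-12
    total, pairs = 0.0, 0
    for i in range(K):
        for j in range(i + 1, K):
            p, q = probs[i] + eps, probs[j] + eps
            kl_pq = np.sum(p * np.log(p / q))
            kl_qp = np.sum(q * np.log(q / p))
            total += 0.5 * (kl_pq + kl_qp)
            pairs += 1
    return total / pairs

def regularized_reward(probs, lam=0.5):
    """Disagreement reward minus a stability penalty (assumed form).

    The penalty is the entropy of the mean prediction, so the reward favors
    inputs where peers conflict with each other over inputs where every peer
    is merely flat and uninformative. lam trades exploration (seeking
    disagreement) against prediction stability.
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0) + eps
    entropy = -np.sum(mean_p * np.log(mean_p))
    return pairwise_kl_disagreement(probs) - lam * entropy

# Toy demo: 3 peer "models" as random linear classifiers (8 features, 4 classes).
K, D, C = 3, 8, 4
peer_weights = [rng.normal(size=(D, C)) for _ in range(K)]
x = rng.normal(size=D)
probs = np.stack([softmax(x @ W) for W in peer_weights])
print("disagreement:", pairwise_kl_disagreement(probs))
print("regularized reward:", regularized_reward(probs, lam=0.5))
```

In a policy-gradient setup of the kind the index terms suggest, this scalar would serve as the per-step reward for a REINFORCE-style update of the learner, while the peer models stay frozen or are refreshed periodically; those training details are not specified in the abstract.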



