TEST-TIME TRAINING FOR MONOCULAR DEPTH ESTIMATION

Authors

DOI:

https://doi.org/10.31891/2307-5732-2025-359-94

Keywords:

deep learning, test-time training, monocular depth estimation

Abstract

Despite significant advances in deep learning over the past few years, current models still cannot be applied reliably to challenging tasks, such as monocular depth estimation, on real-world data. One key bottleneck is the difficulty of obtaining the high-quality ground truth that these models depend on. Supervised objectives are only as reliable as their targets: imprecise or biased labels distort the loss landscape, encouraging models to overfit spurious cues and hurting generalization. For depth estimation in particular, ground truth comes from LiDAR/RGB-D projections, MVS reconstructions, ordinal labels, or synthetic renderings, each with characteristic failures: LiDAR is sparse and prone to motion/occlusion ghosting; RGB-D has holes and edge artifacts on shiny, transparent, or distant surfaces. To compensate for the lack of high-quality labeled real-world data, the most successful contemporary models rely on artificially generated datasets, which introduce distribution shifts. Research on test-time training methods, which can be applied to the problem of adapting to new data, has surged recently. In test-time training (TTT), the model updates a subset of its parameters (e.g., normalization statistics or lightweight heads) at inference using self-supervised auxiliary losses that do not require labels, such as consistency or reconstruction objectives. By aligning internal representations to the target distribution on the fly, TTT can reduce distribution shift and improve robustness without additional annotated data. This work evaluates whether test-time training methods can improve the results of an already pretrained monocular depth estimation model. The results suggest that improvements are indeed possible, but at the cost of a significant increase in inference time.
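As a rough illustration of the simplest ingredient mentioned above, adapting normalization statistics at inference, the sketch below re-estimates the mean and variance of a normalization layer from an unlabeled test batch while the prediction head stays frozen. It is a toy under stated assumptions and not the method evaluated in this work: the class `NormalizedDepthHead`, its parameters, and the feature values are hypothetical, and no self-supervised auxiliary loss is used.

```python
from statistics import fmean, pvariance


class NormalizedDepthHead:
    """Hypothetical toy model: normalize a scalar feature, then apply a frozen linear head."""

    def __init__(self, mean, var, weight, bias, eps=1e-5):
        self.mean, self.var = mean, var          # normalization statistics (adapted at test time)
        self.weight, self.bias = weight, bias    # head parameters (kept frozen)
        self.eps = eps

    def predict(self, features):
        scale = (self.var + self.eps) ** 0.5
        return [self.weight * (f - self.mean) / scale + self.bias for f in features]

    def adapt(self, features, momentum=1.0):
        """Blend the stored statistics with those of an unlabeled test batch."""
        batch_mean = fmean(features)
        batch_var = pvariance(features, mu=batch_mean)
        self.mean = (1 - momentum) * self.mean + momentum * batch_mean
        self.var = (1 - momentum) * self.var + momentum * batch_var


# Assumed source-domain statistics: mean 0, variance 1.
model = NormalizedDepthHead(mean=0.0, var=1.0, weight=2.0, bias=5.0)

# Simulated distribution shift: test features are offset and rescaled.
test_features = [10.0, 12.0, 14.0, 16.0]

before = model.predict(test_features)   # uses stale source statistics
model.adapt(test_features)              # no labels are needed for this step
after = model.predict(test_features)    # uses statistics aligned to the test batch

print("mean prediction before:", fmean(before))
print("mean prediction after: ", fmean(after))
```

With `momentum=1.0` the stored statistics are replaced outright by the batch statistics; smaller values keep part of the source statistics, which may be preferable when test batches are small.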
Although results improve in aggregate, an improvement on any particular sample is not guaranteed. In individual cases, the fraction of correctly placed points can rise from 0.1 to 0.9; on other samples, however, the score can drop from 0.9 to 0.5.
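The abstract does not define how the fraction of correctly placed points is computed. A common choice in depth estimation is threshold accuracy, where a pixel counts as correct if the ratio between predicted and true depth is below 1.25 in either direction; the sketch below assumes that definition. The function name `threshold_accuracy` and the sample values are hypothetical and are not results from this work.

```python
def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels with ground truth where max(pred/gt, gt/pred) < threshold.

    Pixels with non-positive ground truth are treated as missing and skipped.
    Non-positive predictions on valid pixels count as misses.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no pixels with valid ground truth")
    hits = sum(1 for p, g in valid if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(valid)


# Made-up depths in meters for a single sample (illustration only).
gt = [1.0, 2.0, 4.0, 8.0, 10.0]
pred_before = [3.0, 5.0, 9.0, 20.0, 10.5]   # hypothetical output before adaptation
pred_after = [1.1, 2.1, 4.2, 8.5, 14.0]     # hypothetical output after adaptation

score_before = threshold_accuracy(pred_before, gt)
score_after = threshold_accuracy(pred_after, gt)
print(score_before, score_after)
```

Computing the score per sample, before and after adaptation, is what exposes the kind of per-sample variation described above, which an aggregate average would hide.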


Published

2025-12-19

How to Cite

DARMOHRAI, S. (2025). TEST-TIME TRAINING FOR MONOCULAR DEPTH ESTIMATION. Herald of Khmelnytskyi National University. Technical Sciences, 359(6.2), 167-173. https://doi.org/10.31891/2307-5732-2025-359-94