To remedy these issues, this paper proposes HoCLIP, a high-order feature and attention-assisted CLIP model for monocular depth estimation. Specifically, with the CLIP model as the backbone, Matrix ...