Kharazmi Journal of Earth Sciences

Depth estimation and 3D reconstruction from a single image based on the MiDaS deep learning model

Authors

Mahdi Farhangi 1
Asghar Milan 1
Gholamreza Fallahi 1
Ehsan Khankeshi-Zadeh 2
1 Shahid Beheshti University
2 K. N. Toosi University of Technology

Abstract

3D reconstruction plays an important role in surveying and close-range photogrammetry, enabling accurate extraction of geometric information from objects and their surroundings. Conventional methods in this field typically require multi-view images together with the position and orientation of the camera, which limits their use in some practical applications. This study introduces a novel approach based on the MiDaS deep learning model, one of the most accurate architectures for monocular depth estimation, which generates a relative depth map from a single 2D image. The final 3D model is then produced with the Poisson Surface Reconstruction algorithm, without spatial information or camera orientation data. To evaluate the proposed method, the resulting 3D model was compared against a reference model produced by conventional photogrammetry. The Root Mean Square Error (RMSE) was 0.775 cm, indicating that the proposed approach achieves suitable accuracy and reliability when multi-view data are unavailable. The study demonstrates the high potential of deep learning models such as MiDaS for 3D reconstruction in surveying and photogrammetry, and suggests that more advanced versions such as DPT could further improve accuracy in future research and applications.
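The step between the depth map and surface reconstruction can be illustrated with a minimal sketch. It assumes a pinhole camera whose intrinsics (`fx`, `fy`, `cx`, `cy`) are chosen by hand, since the abstract states that no camera information is used, and it assumes the network output has already been converted to depth values. Because the depth is only relative, the resulting point cloud is defined up to an unknown scale. The function name `depth_to_points` and the toy values are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map into 3D points with a pinhole model.

    The intrinsics are assumed values, not calibrated ones, so the
    geometry is only correct up to scale (relative depth).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0:                     # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one invalid pixel (hypothetical values).
depth = [[1.0, 2.0],
         [0.0, 4.0]]
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

A point cloud of this kind, once normals are estimated, is the usual input to a Poisson surface reconstruction step.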

Keywords

Depth Estimation
3D Reconstruction
Single Image
Deep Learning
Photogrammetry
Machine Vision
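The RMSE reported in the abstract compares the reconstructed model with a photogrammetric reference. The abstract does not say how point correspondences were established, so the sketch below assumes one common choice: each reconstructed point is matched to its nearest reference point, and RMSE = sqrt((1/N) Σ d_i²) is taken over those distances. It also assumes both models are already aligned and expressed in the same unit. The function names and sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distance(p: Point, reference: Sequence[Point]) -> float:
    """Euclidean distance from p to the closest reference point (brute force)."""
    return min(math.dist(p, q) for q in reference)


def rmse_to_reference(model: Sequence[Point],
                      reference: Sequence[Point]) -> float:
    """RMSE of nearest-neighbour distances from model points to the reference."""
    if not model or not reference:
        raise ValueError("both point sets must be non-empty")
    squared = [nearest_distance(p, reference) ** 2 for p in model]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical, pre-aligned point sets in the same unit.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = rmse_to_reference(model, reference)  # distances 3 and 4 -> sqrt(12.5)
```

The brute-force search is quadratic in the number of points; for real point clouds a spatial index such as a KD-tree would normally replace it.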