Siberian Journal of Clinical and Experimental Medicine

Study of images in open datasets of the ocular fundus in diabetic retinopathy designed for training neural network algorithms

https://doi.org/10.29001/2073-8552-2025-40-1-218-225

Abstract

Diabetes mellitus is a common disabling disease that, without proper treatment, can lead to visual impairment and blindness. This paper analyzes duplicate and modified images in open datasets (datasets that can be freely downloaded from the Internet) containing ocular fundus images with manifestations of diabetic retinopathy.

Aim. To assess the quality and suitability of open datasets returned by the query "diabetic retinopathy" on the Kaggle.com platform for training machine learning models.

Material and Methods. More than 100 open data sources were analyzed, containing almost 2 million ocular fundus images with diabetic retinopathy in total. The images were examined by comparing the SHA-3 hash sums of the files and by comparing file names between original and resized images.
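The hash-based part of this procedure can be illustrated with a minimal sketch. It is not the authors' code: the abstract names only "SHA-3", so the 256-bit variant, the list of file extensions, and the function names `sha3_of_file`, `group_by_hash`, and `find_duplicates` are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha3_of_file(path, chunk_size=1 << 20):
    """Return the SHA3-256 hex digest of a file, read in chunks.

    SHA3-256 is an assumed variant; the source only says "SHA-3".
    """
    digest = hashlib.sha3_256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(root, extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Map each digest to the list of image files under root that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            groups[sha3_of_file(path)].append(path)
    return groups


def find_duplicates(root):
    """Keep only digests shared by two or more files (byte-identical copies)."""
    return {h: paths for h, paths in group_by_hash(root).items() if len(paths) > 1}
```

A cryptographic hash matches only byte-identical files, so a resized or re-encoded copy produces a different digest. This is consistent with the abstract's use of a separate file-name comparison for resized images.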

Results. Duplicate images were common, with a single image repeated up to 14 times across different datasets. Overall, 56% of all images occur at least twice in different datasets. The authors also searched for modified images, i.e., resized images. The analysis found 9 datasets with such images, accounting for 24% of the total number of images in the database.
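Summary figures of this kind (maximum number of repetitions, share of repeated images) could be derived from per-file hashes roughly as follows. This is a sketch under assumptions: the function name `duplicate_stats` is hypothetical, and "repeated" is taken here to mean that a file's hash occurs in at least two files, which may differ from the exact definition used in the article.

```python
from collections import Counter


def duplicate_stats(file_hashes):
    """Summarize duplication from an iterable of hash strings, one per image file."""
    counts = Counter(file_hashes)
    total = sum(counts.values())
    if total == 0:
        return {"total_files": 0, "max_repetitions": 0, "duplicated_share": 0.0}
    # Files whose hash is shared with at least one other file.
    duplicated = sum(n for n in counts.values() if n > 1)
    return {
        "total_files": total,
        "max_repetitions": max(counts.values()),
        "duplicated_share": duplicated / total,
    }
```

For example, with the hashes `["a", "a", "a", "b", "c", "c", "d", "e"]` the function reports 8 files, a maximum of 3 repetitions, and a duplicated share of 5/8.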

Conclusion. The results can be used to optimize the training process and improve the quality of computer vision algorithms in ophthalmology. The authors also point out the need for measures that prevent duplication and modification of images in datasets, in order to ensure dataset quality and reliable results when training neural network models: creating datasets without standardization and verification will not improve machine learning results.

About the Authors

A. I. Bursov
Ivannikov Institute for System Programming of the Russian Academy of Sciences (ISP RAS); Patrice Lumumba Peoples Friendship University of Russia (RUDN)
Russian Federation

Andrey I. Bursov, Digital Medicine Advisor, ISP RAS; Assistant Professor, Department of Medical Informatics and Telemedicine, RUDN

25, Aleksandr Solzhenitsyn str., Moscow, 109004

7, Miklukho-Maklaya str., Moscow, 117198



D. M. Safonova
M.M. Krasnov Research Institute of Eye Diseases (Krasnov Research Institute of Eye Diseases)
Russian Federation

Daria M. Safonova, Cand. Sci. (Med.), Research Scientist, Department of Modern Methods of Treatment in Ophthalmology

11A, B, Rossolimo str., Moscow, 119435



For citations:


Bursov A.I., Safonova D.M. Study of images in open datasets of the ocular fundus in diabetic retinopathy designed for training neural network algorithms. Siberian Journal of Clinical and Experimental Medicine. 2025;40(1):218-225. (In Russ.) https://doi.org/10.29001/2073-8552-2025-40-1-218-225

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2713-2927 (Print)
ISSN 2713-265X (Online)