Methodology and tools for creating training samples for artificial intelligence systems for recognizing lung cancer on CT images

Abstract

Introduction. Medical imaging techniques can detect many diseases at early stages of their development, improving patient survival. Artificial intelligence (AI) systems are a promising means of improving diagnostic quality, but they require high-quality annotated and marked-up sets of medical images.

The purpose of the study was to develop a methodology and software for creating training sets for AI systems.

Material and methods. We compared the performance and accuracy of the main annotation methods and built the information system on the method that proved most efficient on both criteria. To mark up objects of interest, we used the cluster model of lesion localization previously developed by the authors. The software was written in C++ and Kotlin.

Results. A structured annotation template with an accompanying glossary of terms became the basis of the information system. The system consists of three interacting modules: two run on a remote server, and one runs on the end-user's personal computer or mobile device. The first module is a web service responsible for the workflow logic. The second module, a web server, handles interaction with client applications: it identifies users and manages the connections to the database and to the Picture Archiving and Communication System (PACS). The front-end module is a web application with a graphical interface that assists the end-user in marking up and annotating images.
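The idea of a structured annotation template backed by a glossary can be illustrated with a minimal C++ sketch. It assumes the template is a fixed set of fields and that each field accepts only terms from a controlled list. The field names, the terms, and the functions `sampleGlossary` and `validate` are hypothetical and are not taken from the authors' software.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical glossary: each template field maps to the terms it permits.
using Glossary = std::map<std::string, std::set<std::string>>;

// One filled-in annotation: field name -> chosen term.
using Annotation = std::map<std::string, std::string>;

// Illustrative glossary with two fields; the real template is not described
// in the abstract, so these entries are assumptions.
Glossary sampleGlossary() {
    Glossary g;
    g["lesion_type"] = {"solid", "part-solid", "ground-glass"};
    g["location"] = {"right upper lobe", "right middle lobe", "right lower lobe",
                     "left upper lobe", "left lower lobe"};
    return g;
}

// Returns a list of problems; an empty list means the annotation
// conforms to the template.
std::vector<std::string> validate(const Glossary& glossary,
                                  const Annotation& annotation) {
    std::vector<std::string> problems;
    // Every template field must be present and hold a glossary term.
    for (const auto& entry : glossary) {
        const std::string& field = entry.first;
        auto it = annotation.find(field);
        if (it == annotation.end()) {
            problems.push_back("missing field: " + field);
        } else if (entry.second.count(it->second) == 0) {
            problems.push_back("unknown term '" + it->second +
                               "' in field: " + field);
        }
    }
    // Fields that are not part of the template are reported as well.
    for (const auto& entry : annotation) {
        if (glossary.count(entry.first) == 0) {
            problems.push_back("unexpected field: " + entry.first);
        }
    }
    return problems;
}
```

Under these assumptions, a client would call `validate(sampleGlossary(), annotation)` before submitting an annotation and would show the returned messages to the annotator. Restricting entries to glossary terms is one plausible way to keep annotations from different readers comparable.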

Conclusions. An algorithmic basis and a software package have been created for annotation and markup of CT images. The resulting information system was used in a large-scale lung cancer screening project for the creation of medical imaging datasets.

About the authors

Nikolay S. Kulberg

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department; Federal Research Center «Computer Science and Control» of Russian Academy of Sciences

Author for correspondence.
Email: kulberg@npcmr.ru
ORCID iD: 0000-0001-7046-7157

MD, Ph.D., head of the Department, Scientific and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow, 109029, Russia.

Russian Federation

Maxim A. Gusev

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department; Moscow Polytechnic University

Email: noemail@neicon.ru
ORCID iD: 0000-0001-8864-8722
Russian Federation

Roman V. Reshetnikov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department; Institute of Molecular Medicine, Sechenov First Moscow State Medical University

Email: noemail@neicon.ru
ORCID iD: 0000-0002-9661-0254
Russian Federation

Alexey B. Elizarov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0003-3786-4171
Russian Federation

Vladimir P. Novik

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0002-6752-1375
Russian Federation

Sergey B. Prokudaylo

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0003-0970-3645
Russian Federation

Yuriy N. Philippovich

Moscow Polytechnic University

Email: noemail@neicon.ru
ORCID iD: 0000-0001-9419-2282
Russian Federation

Victor A. Gombolevsky

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0003-1816-1315
Russian Federation

Anton V. Vladzymyrskyy

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0002-2990-7736
Russian Federation

Natalya N. Kamynina

Research Institute for Healthcare Organization and Medical Management of Moscow Healthcare Department

Email: noemail@neicon.ru
ORCID iD: 0000-0002-0925-5822
Russian Federation

Sergey P. Morozov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department

Email: noemail@neicon.ru
ORCID iD: 0000-0001-6545-6170
Russian Federation

Copyright (c) 2021 Kulberg N.S., Gusev M.A., Reshetnikov R.V., Elizarov A.B., Novik V.P., Prokudaylo S.B., Philippovich Y.N., Gombolevsky V.A., Vladzymyrskyy A.V., Kamynina N.N., Morozov S.P.

This work is licensed under a Creative Commons Attribution 4.0 International License.

This mass media outlet is registered with the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: ПИ № ФС77-50668, 13 July 2012.