Two-Stage Classification of Breast Tumor Biomarkers for Iraqi Women
Objective: Breast cancer is a deadly disease that causes many deaths among women. Early diagnosis of breast cancer with appropriate tumor biomarkers may facilitate early treatment of the disease, thus reducing the mortality rate. The purpose of the current study is to improve early diagnosis of breast tumors by proposing a two-stage classification of breast tumor biomarkers for a sample of Iraqi women.
Methods: In this study, a two-stage classification system is proposed and tested with four machine learning classifiers. In the first stage, breast features (demographic, blood and salivary-based attributes) are classified into normal or abnormal cases, while in the second stage the abnormal breast cases are further classified as either malignant or benign. The 20 collected breast cancer features are used to test the performance of the proposed classification system with Leave-One-Out (LOO) cross-validation, and the Synthetic Minority Over-Sampling Technique (SMOTE) is used to balance the classes. Furthermore, correlation-based feature selection (CFS) is employed in an exploratory analysis to find the best features for the two-stage classification system.
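The two-stage scheme described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the three features and group sizes are hypothetical, a hand-written Gaussian Naïve Bayes stands in for the classifier, and the SMOTE step is a simplified interpolation applied only to each training fold (an assumption; the abstract does not say where balancing was applied).

```python
import math
import random
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: per-class prior, feature means, feature variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-6 for c in cols],  # small floor avoids zero variance
        )
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_post(prior, mus, variances):
        score = math.log(prior)
        for v, mu, var in zip(x, mus, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_post(*model[label]))

def smote(minority, n_new, rng, k=3):
    """Simplified SMOTE: interpolate between a minority sample and a near neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def balance(X, y, rng):
    """Oversample every smaller class up to the size of the largest class."""
    labels = sorted(set(y))
    largest = max(y.count(label) for label in labels)
    X_bal, y_bal = list(X), list(y)
    for label in labels:
        rows = [x for x, t in zip(X, y) if t == label]
        if 2 <= len(rows) < largest:
            X_bal += smote(rows, largest - len(rows), rng)
            y_bal += [label] * (largest - len(rows))
    return X_bal, y_bal

def loo_accuracy(X, y, rng):
    """Leave-one-out accuracy; balancing touches the training fold only."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = balance(X[:i] + X[i + 1:], y[:i] + y[i + 1:], rng)
        hits += predict_gnb(fit_gnb(X_train, y_train), X[i]) == y[i]
    return hits / len(X)

def make_group(center, n, rng):
    """Draw n synthetic samples around a hypothetical feature centre."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]

rng = random.Random(0)
normal = make_group([0.0, 0.0, 0.0], 20, rng)
benign = make_group([6.0, 0.0, 6.0], 14, rng)
malignant = make_group([6.0, 6.0, 0.0], 10, rng)

# Stage 1: all cases, normal vs. abnormal.
X1 = normal + benign + malignant
y1 = ["normal"] * 20 + ["abnormal"] * 24
# Stage 2: abnormal cases only, benign vs. malignant.
X2 = benign + malignant
y2 = ["benign"] * 14 + ["malignant"] * 10

stage1_acc = loo_accuracy(X1, y1, rng)
stage2_acc = loo_accuracy(X2, y2, rng)
print(f"stage-1 LOO accuracy: {stage1_acc:.2f}")
print(f"stage-2 LOO accuracy: {stage2_acc:.2f}")
```

Balancing inside each fold keeps the held-out case from influencing the synthetic samples, which is one common way to avoid optimistic accuracy estimates when oversampling is combined with cross-validation.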
Results: A classification accuracy of 94% for stage-1 and 100% for stage-2 was achieved with a Naïve Bayes classifier, which outperformed the other three methods. In addition, CFS selected a subset of five features out of the 20 as the best features for both stage-1 and stage-2.
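For context, CFS in its standard formulation (Hall, 1999) ranks a candidate subset S of k features by a merit score that rewards correlation with the class and penalizes redundancy among the features. The notation below is the usual textbook one and is not taken from this study:

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

Here \overline{r_{cf}} is the mean feature-class correlation and \overline{r_{ff}} is the mean feature-feature correlation within S, so a small subset of informative, mutually uncorrelated features can score higher than the full feature set.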
Conclusion: We achieved a high classification accuracy, which is promising for improving the early diagnosis of breast tumors. The outcome of this study also shows the importance of CA15-3 protein in saliva and blood, carcinoembryonic antigen level and total protein in blood, and estrogen hormone level in saliva for predicting breast tumors.
(Received 18 November 2019; accepted 26 April 2020)
Copyright: Open Access authors retain the copyrights of their papers, and all open access articles are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided that the original work is properly cited.