Two-Stage Classification of Breast Tumor Biomarkers for Iraqi Women

  • Iyden Kamil Mohammed, Department of Biomedical Engineering, Alkhwarizmi College of Engineering, University of Baghdad, Baghdad, Iraq
  • Ali Hussein Al-Timemy, Department of Biomedical Engineering, Alkhwarizmi College of Engineering, University of Baghdad, Baghdad, Iraq
  • Javier Escudero, School of Engineering, Institute for Digital Communications, The University of Edinburgh, Alexander Graham Bell Building, EH9 3FG, UK


Objective: Breast cancer is a deadly disease that causes many deaths among women. Early diagnosis of breast cancer with appropriate tumor biomarkers may facilitate early treatment of the disease, thus reducing the mortality rate. The purpose of the current study is to improve the early diagnosis of breast tumors by proposing a two-stage classification of breast tumor biomarkers for a sample of Iraqi women.

Methods: In this study, a two-stage classification system is proposed and tested with four machine learning classifiers. In the first stage, cases are classified as normal or abnormal from their breast features (demographic, blood, and salivary-based attributes), while in the second stage the abnormal cases are further classified as either malignant or benign. The 20 collected breast cancer features are used to test the performance of the proposed classification system with Leave-One-Out (LOO) cross-validation, and the Synthetic Minority Over-Sampling Technique (SMOTE) is used to balance the classes. Furthermore, correlation-based feature selection (CFS) is employed in an exploratory analysis to find the best features for the two-stage classification system.
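The two-stage scheme with LOO cross-validation and SMOTE can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy data, the two made-up features, the helper names (`fit_gnb`, `predict_gnb`, `smote`, `loo_accuracy`, `make_toy_data`), the choice of `k=3` neighbours, and the decision to apply SMOTE inside each training fold are all assumptions made for illustration only.

```python
import math
import random
from statistics import mean, pvariance

def fit_gnb(X, y):
    """Gaussian Naive Bayes: store (prior, feature means, feature variances) per class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (len(rows) / len(X),
                        [mean(c) for c in cols],
                        [pvariance(c) + 1e-6 for c in cols])  # floor avoids zero variance
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_post(label):
        prior, mus, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, mu, var in zip(x, mus, variances))
    return max(model, key=log_post)

def smote(X, y, rng, k=3):
    """Balance two classes by interpolating each chosen minority sample
    towards one of its k nearest minority neighbours."""
    counts = {label: y.count(label) for label in sorted(set(y))}
    minority = min(counts, key=counts.get)
    pool = [x for x, t in zip(X, y) if t == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(pool)
        neighbours = sorted((p for p in pool if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

def loo_accuracy(X, y, seed=0):
    """Leave-one-out accuracy; SMOTE is applied to the training fold only (an assumption)."""
    rng = random.Random(seed)
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        X_bal, y_bal = smote(X_tr, y_tr, rng)
        hits += predict_gnb(fit_gnb(X_bal, y_bal), X[i]) == y[i]
    return hits / len(X)

def make_toy_data(seed=0):
    """Hypothetical, well-separated 2-feature data; NOT the study's dataset."""
    rng = random.Random(seed)
    centres = {"normal": (0.0, 0.0), "benign": (10.0, 0.0), "malignant": (10.0, 10.0)}
    sizes = {"normal": 12, "benign": 8, "malignant": 5}
    X, y = [], []
    for label, (cx, cy) in centres.items():
        for _ in range(sizes[label]):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y

X, y = make_toy_data()
# Stage 1: normal vs. abnormal on all samples.
y_stage1 = ["normal" if t == "normal" else "abnormal" for t in y]
acc_stage1 = loo_accuracy(X, y_stage1)
# Stage 2: benign vs. malignant on the abnormal samples only.
X_abn = [x for x, t in zip(X, y) if t != "normal"]
y_abn = [t for t in y if t != "normal"]
acc_stage2 = loo_accuracy(X_abn, y_abn)
print(f"stage-1 LOO accuracy: {acc_stage1:.2f}, stage-2 LOO accuracy: {acc_stage2:.2f}")
```

Oversampling inside each training fold keeps synthetic points derived from the held-out sample out of the training data. Whether the study balanced the classes before or within cross-validation is not stated in the abstract.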

Results: A classification accuracy of 94% for stage-1 and 100% for stage-2 was achieved with a Naïve Bayes classifier, which outperformed the other three methods. In addition, CFS selected a small subset of five features as the best out of all 20 features for both stage-1 and stage-2.
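For context, CFS as formulated by Hall (1999) ranks a candidate subset S of k features by a merit heuristic that rewards correlation with the class and penalizes redundancy among the features. The abstract does not state the correlation measure or search strategy used, so the expression below is the general form rather than this study's exact configuration:

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

Here \overline{r_{cf}} is the mean feature-class correlation and \overline{r_{ff}} is the mean feature-feature correlation within S. A subset scores highly when its features are individually predictive of the class and weakly correlated with each other.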

Conclusion: We achieved a high classification accuracy, which is promising for improving the early diagnosis of breast tumors. The outcome of this study also shows the importance of CA15-3 protein in saliva and blood, carcinoembryonic antigen level and total protein in blood, and estrogen hormone level in saliva for predicting breast tumors.





How to Cite
Mohammed, I., Al-Timemy, A., & Escudero, J. (2020). Two-Stage Classification of Breast Tumor Biomarkers for Iraqi Women. Al-Khwarizmi Engineering Journal, 16(3), 1-10. Retrieved from