

E-mail Spam Classification by Machine Learning

2023-02-01


by: Hadeel M Saleh

   E-mail is generally considered a reliable channel of communication and, as such, has recently become the target of numerous attacks. A common form of attack is junk or spam email, which is deliberately delivered to the target using protocols such as SMTP [1], [2]. Spam emails are sent in large numbers and therefore occupy a significant portion of network bandwidth. They are also a nuisance that can deprive legitimate users of network resources, since they compete for the available storage space on the server; they waste valuable communication effort and time, and they pose a threat to official establishments [3], [4]. Spam detection is generally performed by classifying incoming emails into spam and non-spam classes. Most modern spam detection systems are based on machine learning (ML) [5], [6], but a common problem is how to select the optimal subset of input features for the chosen classifier. This is normally done via a feature selection (FS) process, which is hampered by the high dimensionality of the data, since high dimensionality degrades the performance of classifiers such as the support vector machine (SVM), artificial neural networks (ANN), and the naive Bayes classifier (NBC) [7]–[15]. High dimensionality can be mitigated by reducing the feature space, that is, by minimizing the number of features present in the data, while ensuring that the FS process returns features that still represent the problems found in the documents. Irrelevant features can reduce classification accuracy, increase the time needed to train the classifier, raise feature-related costs, and increase the number of instances required for learning [16], [17].
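As a toy illustration of the pipeline described above, the sketch below selects a small word subset by class-frequency difference (a crude stand-in for measures such as mutual information or chi-square) and trains a naive Bayes classifier on a handful of invented emails; all data, labels, and the cutoff `k` are illustrative assumptions, not part of the cited work.

```python
import math
from collections import Counter

# Invented toy corpus: (tokenized email, label) pairs.
emails = [
    ("win cash prize now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("cash offer win free".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project report attached".split(), "ham"),
    ("lunch meeting on monday".split(), "ham"),
]

def select_features(data, k):
    """Crude feature selection: keep the k words whose class document-frequency
    difference is largest (a stand-in for mutual information / chi-square)."""
    spam, ham = Counter(), Counter()
    for words, label in data:
        (spam if label == "spam" else ham).update(set(words))
    vocab = set(spam) | set(ham)
    # Secondary alphabetical key makes tie-breaking deterministic.
    return set(sorted(vocab, key=lambda w: (-abs(spam[w] - ham[w]), w))[:k])

def train(data, features):
    """Multinomial naive Bayes with Laplace smoothing, restricted to `features`."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter(label for _, label in data)
    model = {}
    for words, label in data:
        counts[label].update(w for w in words if w in features)
    for label in counts:
        total = sum(counts[label].values())
        model[label] = (
            math.log(docs[label] / len(data)),           # log prior
            {w: math.log((counts[label][w] + 1) / (total + len(features)))
             for w in features},                          # smoothed log likelihoods
        )
    return model

def classify(model, words, features):
    def score(label):
        prior, likelihood = model[label]
        return prior + sum(likelihood[w] for w in words if w in features)
    return max(model, key=score)

features = select_features(emails, k=6)
model = train(emails, features)
print(classify(model, "claim your free cash prize".split(), features))  # spam
print(classify(model, "monday project meeting".split(), features))      # ham
```

Restricting the vocabulary before training mirrors the dimensionality-reduction goal discussed above: the classifier only ever sees the k retained features.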

Evolutionary and swarm-based techniques, such as ant colony optimization (ACO) [18], [19], genetic algorithms (GA) [20], [21], artificial bee colony (ABC) [21], [22], particle swarm optimization (PSO) [23], [24], and the harmony search algorithm (HSA) [26], [27], have been the most commonly used methods for addressing FS-related problems. PSO [24], [28], [29] is a nature-inspired framework developed from observations of the collective behavior of fish schools and bird flocks; it has been used to solve a variety of complex optimization tasks. Since its introduction in [28], PSO has been modified many times, giving rise to several versions of the algorithm.
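A minimal sketch of binary PSO applied to feature selection, under stated assumptions: the fitness function below is a toy stand-in for real classifier accuracy, rewarding a hypothetical set of "relevant" features and penalizing subset size; the relevant-feature indices, swarm size, and coefficients are all invented for illustration. Velocities are mapped to bit-flip probabilities through a sigmoid, as in the common binary PSO variant.

```python
import math
import random

random.seed(42)

N_FEATURES = 10
RELEVANT = {0, 3, 7}  # hypothetical "truly useful" features the toy fitness rewards

def fitness(mask):
    """Stand-in for classifier accuracy: reward selecting relevant features,
    penalize subset size (mirroring the dimensionality-reduction goal)."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return len(selected & RELEVANT) - 0.1 * len(selected)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_pso(n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(n_particles)]
    vel = [[0.0] * N_FEATURES for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(N_FEATURES):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Binary PSO: velocity sets the probability that the bit is 1.
                pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

best_mask, best_fit = binary_pso()
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

Each particle encodes a candidate feature subset as a bit mask, so the swarm searches the space of subsets directly; swapping in a real classifier's cross-validated accuracy for `fitness` gives the usual wrapper-style FS setup.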

 

References

[1] D. Debarr and H. Wechsler, “Spam detection using Random Boost,” Pattern Recognit. Lett., vol. 33, no. 10, pp. 1237–1244, 2012.

[2] Q. Wu, S. Wu, and J. Liu, “Hybrid model based on SVM with Gaussian loss function and adaptive Gaussian PSO,” Eng. Appl. Artif. Intell., vol. 23, no. 4, pp. 487–494, 2010.

[3] S. M. Lee, D. S. Kim, J. H. Kim, and J. S. Park, “Spam Detection Using Feature Selection and Parameters Optimization,” in 2010 International Conference on Complex, Intelligent and Software Intensive Systems, 2010, pp. 883–888.

[4] N. Jindal and B. Liu, “Analyzing and detecting review spam,” in Proceedings - IEEE International Conference on Data Mining, ICDM, 2007, pp. 547–552.

[5] M. A. Hall, “Correlation-based Feature Selection for Machine Learning,” Ph.D. thesis, University of Waikato, 1999.

[6] M. Chang and C. K. Poon, “Using phrases as features in email classification,” J. Syst. Softw., vol. 82, no. 6, pp. 1036–1045, 2009.

[7] H. Liu, X. Shi, D. Guo, Z. Zhao, and Yimin, “Feature selection combined with neural network structure optimization for HIV-1 protease cleavage site prediction,” Biomed Res. Int., vol. 2015, 2015.

[8] S. Li, P. Wang, and L. Goel, “Wind Power Forecasting Using Neural Network Ensembles with Feature Selection,” IEEE Trans. Sustain. Energy, vol. 6, no. 4, pp. 1447–1456, 2015.

[9] M. Zhao, C. Fu, L. Ji, K. Tang, and M. Zhou, “Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes,” Expert Syst. Appl., vol. 38, no. 5, pp. 5197–5204, 2011.

[10] J. Chen, H. Huang, S. Tian, and Y. Qu, “Feature selection for text classification with Naïve Bayes,” Expert Syst. Appl., vol. 36, no. 3 PART 1, pp. 5432–5435, 2009.

[11] M. L. Zhang, J. M. Peña, and V. Robles, “Feature selection for multi-label naive Bayes classification,” Inf. Sci. (Ny)., vol. 179, no. 19, pp. 3218–3229, 2009.

[12] G. Feng, J. Guo, B.-Y. Jing, and T. Sun, “Feature subset selection using naive Bayes for text classification,” Pattern Recognit. Lett., vol. 65, pp. 109–115, 2015.

[13] S. Q. Salih, “A New Training Method Based on Black Hole Algorithm for Convolutional Neural Network,” J. Southwest Jiaotong Univ., vol. 54, no. 3, pp. 1–10, 2019.

[14] S. I. Abba, S. J. Hadi, S. S. Sammen, S. Q. Salih, R. A. Abdulkadir, Q. B. Pham, and Z. M. Yaseen, “Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination,” J. Hydrol., vol. 587, p. 124974, Aug. 2020.

[15] Z. M. Yaseen, Z. H. Ali, S. Q. Salih, and N. Al-Ansari, “Prediction of Risk Delay in Construction Projects Using a Hybrid Artificial Intelligence Model,” Sustainability, vol. 12, no. 4, p. 1514, Feb. 2020.

[16] A. Unler, A. Murat, and R. B. Chinnam, “Mr2PSO: A maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification,” Inf. Sci. (Ny)., vol. 181, no. 20, pp. 4625–4641, 2011.

[17] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: Criteria of MaxDependency, Max-Relevance, and Min-Redundancy,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 8, pp. 1226–1238, 2005.

[18] P. Moradi and M. Rostami, “Integration of graph clustering with ant colony optimization for feature selection,” Knowledge-Based Syst., vol. 84, pp. 144–161, 2015.

[19] S. M. Vieira, J. M. C. Sousa, and T. A. Runkler, “Two cooperative ant colonies for feature selection using fuzzy models,” Expert Syst. Appl., vol. 37, no. 4, pp. 2714–2723, 2010.

[20] M. M. Kabir, M. Shahjahan, and K. Murase, “A new local search based hybrid genetic algorithm for feature selection,” Neurocomputing, vol. 74, no. 17, pp. 2914–2928, 2011.

[21] C.-H. Lin, H.-Y. Chen, and Y.-S. Wu, “Study of image retrieval and classification based on adaptive features using genetic algorithm feature selection,” Expert Syst. Appl., vol. 41, no. 15, pp. 6611–6621, 2014.

[22] M. Schiezaro and H. Pedrini, “Data feature selection based on Artificial Bee Colony algorithm,” J. Image Video Process., vol. 1, no. 47, pp. 1–8, 2013.

[23] V. Agrawal and S. Chandra, “Feature Selection using Artificial Bee Colony Algorithm for Medical Image Classification,” 2015 Eighth Int. Conf. Contemp. Comput., vol. 1, pp. 2–7, 2015.

[24] B. Xue, M. Zhang, and W. N. Browne, “Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms,” Appl. Soft Comput. J., vol. 18, pp. 261– 276, 2014.

[25] B. Xue, M. Zhang, S. Member, and W. N. Browne, “Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach,” pp. 1–16, 2012.

[26] C. C. O. Ramos, A. N. Souza, G. Chiachia, A. X. Falcão, and J. P. Papa, “A novel algorithm for feature selection using Harmony Search and its application for non-technical losses detection,” Comput. Electr. Eng., vol. 37, no. 6, pp. 886–894, 2011.

[27] H. H. Inbarani, M. Bagyamathi, and A. T. Azar, “A novel hybrid feature selection method based on rough set and improved harmony search,” Neural Comput. Appl., vol. 26, no. 8, pp. 1859–1880, 2015.

[28] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Micro Machine and Human Science, 1995. MHS’95., Proceedings of the 6th International Symposium, 1995, pp. 39–43.
