Estimate and Calculating Classifiers of Diabetes Diseases

2021-07-15


    Diabetes is one of the most serious health conditions, and according to the World Health Organization (WHO) report of 2016 the number of people with the disease is growing. It takes different forms and can affect anyone: children, women, men, young and old. One of the most important steps in fighting the disease is early, fast diagnosis. A set of precise tests exists for diagnosing diabetes, and when many patient records are available, classification algorithms can play a major role in determining whether a person has diabetes.

    KNN is one of these algorithms. Given a training dataset, it predicts the class of a new instance by measuring its distance to the existing instances and looking at the K nearest ones.

    ANN is a supervised classification algorithm. First, a network architecture is built, consisting of (1) an input layer, a set of input neurons; (2) one or more hidden layers that approximate the solution; and (3) an output layer of one or more neurons. Second, the network is trained to find suitable weights. Finally, the weights are tested on a held-out portion of the data.
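The three steps above (build the architecture, train the weights, check the result) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the network used in the study: the class name `TinyANN`, the single hidden layer, the sigmoid activation, the learning rate, and the toy OR data are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """Hypothetical network: input layer, one hidden layer, one output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden neuron holds n_inputs weights plus a bias (last entry).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # The output neuron holds one weight per hidden neuron plus a bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out[:-1], hidden))
                      + self.w_out[-1])
        return hidden, out

    def train(self, samples, labels, epochs=2000, lr=0.5):
        """Per-sample gradient descent on the squared error."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self.forward(x)
                delta_out = (out - y) * out * (1.0 - out)
                # Hidden deltas use the output weights before they are updated.
                delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                                for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden):
                    self.w_out[j] -= lr * delta_out * h
                self.w_out[-1] -= lr * delta_out
                for j, ws in enumerate(self.w_hidden):
                    for i, xi in enumerate(x):
                        ws[i] -= lr * delta_hidden[j] * xi
                    ws[-1] -= lr * delta_hidden[j]

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def mean_squared_error(net, samples, labels):
    return sum((net.forward(x)[1] - y) ** 2
               for x, y in zip(samples, labels)) / len(samples)


# Toy data (logical OR), used only to show the training loop running.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 1, 1, 1]

net = TinyANN(n_inputs=2, n_hidden=3)
loss_before = mean_squared_error(net, samples, labels)
net.train(samples, labels)
loss_after = mean_squared_error(net, samples, labels)
print(loss_before, loss_after)
```

For real data the final step would use a held-out portion of the records rather than the training samples themselves; the toy set here is too small to split.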

    Each classification method needs to be evaluated by computing measures such as TP (true positive), TN (true negative), FP (false positive), FN (false negative), specificity, and accuracy, so that it can be compared with other methods; this is the objective of the study.

Accuracy, defined in (1), is the most widely used evaluation metric for comparing disease-diagnosis classification algorithms and deciding which one classifies better.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
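Equation (1) is the number of correct predictions divided by the number of all predictions. A minimal sketch of how the four counts and the two derived measures could be computed from lists of true and predicted labels is given below. The function names are hypothetical, labels are assumed to be coded as 1 (positive) and 0 (negative), and specificity uses its standard definition TN / (TN + FP), which the text mentions but does not spell out.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 1 and 0."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:  # actual == 1 and predicted == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Equation (1): correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(tn, fp):
    """Share of actual negatives that were predicted negative."""
    return tn / (tn + fp)


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)              # 3 3 1 1
print(accuracy(tp, tn, fp, fn))    # 0.75
print(specificity(tn, fp))         # 0.75
```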

 

    The aim of this study is to classify the Pima Indian Diabetes (PID) dataset using two classification algorithms, ANN and KNN, in different configurations, and to evaluate them to find the best one. This paper uses an evaluation technique that we could not find in earlier work on disease classification: the K value in KNN and the number of hidden layers in ANN are changed at the same rate, and the statistical significance of the relation between the resulting accuracies is tested. The objective is to compare these two classification algorithms.

This study runs ANN and KNN on PID, varying the K value in KNN and the number of hidden layers in ANN from 1 to 50, and records the accuracy of both methods at each iteration in a table. The accuracy values are then analyzed to determine whether the difference is statistically significant. Other studies have compared classification methods in different ways.
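The procedure described above can be sketched as a sweep that fills a table of paired accuracies, followed by a significance check. The text does not name the statistical test, so the paired t statistic below is an assumption chosen for illustration, and `sweep`, `paired_t_statistic`, and the two dummy evaluators are hypothetical names. The dummy evaluators return placeholder numbers and do not represent any result from the study.

```python
import math
import statistics


def sweep(evaluate_knn, evaluate_ann, settings=range(1, 51)):
    """Return rows of (setting, KNN accuracy, ANN accuracy).

    evaluate_knn(k) and evaluate_ann(h) are caller-supplied functions that
    train and test one model for the given setting and return its accuracy.
    """
    return [(s, evaluate_knn(s), evaluate_ann(s)) for s in settings]


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i] (assumed test)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(n))


# Dummy stand-ins so the sketch runs; replace with real train/test routines.
def dummy_knn_accuracy(k):
    return 0.70 + 0.01 * (k % 3)


def dummy_ann_accuracy(h):
    return 0.70 + 0.01 * (h % 2)


table = sweep(dummy_knn_accuracy, dummy_ann_accuracy)
knn_column = [row[1] for row in table]
ann_column = [row[2] for row in table]
t_value = paired_t_statistic(knn_column, ann_column)
print(len(table), t_value)
```

Under this assumed test, the t value would be compared against a t distribution with n - 1 degrees of freedom (49 for a sweep from 1 to 50) to decide significance.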

A. Data set

    In this study the Pima Indian Diabetes (PID) dataset is used for classification. The dataset was collected from women of Pima Indian heritage who were tested for diabetes, and it is distributed through the University of California, Irvine (UCI) Machine Learning Repository.

The PID dataset is a real-world dataset of 768 instances. Each instance has 9 attributes, including the class attribute (the last one):

Number of times pregnant.

Plasma glucose concentration.

Diastolic blood pressure (mmHg).

Triceps skin fold thickness (mm).

2-hour serum insulin (mu U/ml).

Body mass index (weight in kg / (height in m)^2).

Diabetes pedigree function.

Age (years).

Class attribute (1 = diabetic, 0 = not diabetic).

Of the 768 instances, 268 belong to class 1 (diabetic) and 500 to class 0 (not diabetic).
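As a sketch of how one record with these nine attributes might be represented, the snippet below parses a comma-separated row into a named structure and works out the class balance from the counts given above. The field names, the comma-separated layout, and the attribute order are assumptions for illustration, and the sample row contains made-up values rather than a record from the dataset.

```python
from typing import NamedTuple


class PidRecord(NamedTuple):
    """Hypothetical field names for the nine attributes, in the listed order."""
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    pedigree: float
    age: int
    outcome: int  # class attribute: 1 or 0


def parse_row(line):
    """Parse one comma-separated row (assumed layout) into a PidRecord."""
    parts = line.strip().split(",")
    if len(parts) != 9:
        raise ValueError(f"expected 9 attributes, got {len(parts)}")
    return PidRecord(
        int(parts[0]), float(parts[1]), float(parts[2]), float(parts[3]),
        float(parts[4]), float(parts[5]), float(parts[6]),
        int(parts[7]), int(parts[8]),
    )


# Made-up row, for illustration only.
record = parse_row("2,120,70,30,80,28.5,0.45,35,0")
print(record.bmi, record.outcome)

# Class balance from the counts stated in the text.
positives, negatives = 268, 500
total = positives + negatives
positive_share = positives / total      # about 0.349
majority_baseline = negatives / total   # about 0.651
print(total, round(positive_share, 3), round(majority_baseline, 3))
```

The last figure is the accuracy of a classifier that always predicts class 0, which gives a useful floor when reading accuracy values on this dataset.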

B. K-nearest neighbors (KNN)

    The K-nearest neighbors (KNN) algorithm is a supervised classification algorithm based on the distances between a test instance and the instances of the training dataset. It finds the K training instances closest to the test instance and assigns the majority class among them.
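A minimal sketch of this majority vote among the K closest training instances follows. Euclidean distance is an assumption, since the text does not name the distance measure, and the function names and toy points are invented for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Return the majority class among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature points, for illustration only.
train_X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
           [6.0, 6.0], [7.0, 5.5], [6.5, 7.0]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [2.0, 2.0], k=3))  # 0
print(knn_predict(train_X, train_y, [6.0, 6.5], k=3))  # 1
```

With two classes, an odd K avoids tied votes; with an even K this sketch resolves a tie in favor of the class whose neighbor is nearest.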

 

 
