Journal of Infectious Diseases & Travel Medicine (JIDTM)

ISSN: 2640-2653

Review Article

Statistical analysis on Alzheimer's disease

Authors: Bin Zhao1* and Xia Jiang2

DOI: 10.23880/jidtm-16000177

Abstract

Alzheimer's disease is a progressive neurodegenerative disease that occurs mostly in the elderly, with memory impairment as its main clinical symptom. There is no ideal treatment for Alzheimer's disease, so early prevention is important. In this paper, we use brain structural features and cognitive-behavioral characteristics to diagnose Alzheimer's disease, which is important for the early and accurate diagnosis of mild cognitive impairment. To investigate the factors influencing Alzheimer's disease, a correlation analysis model was developed after the missing values in the data were preprocessed. First, the data features were inspected and the missing values analyzed; useless features were removed, and the missing values of the remaining features were filled with the feature mean. To verify the accuracy of the subsequent intelligent diagnosis model and clustering model, the data were divided into a training set and a test set according to PTID. Finally, the ten most important features were selected, and the Spearman coefficient was chosen for the correlation analysis based on the distribution of the features. Machine learning methods were then used to build a classification model for the intelligent diagnosis of Alzheimer's disease. Five methods (logistic regression, support vector machine, KNN classification, decision tree classification and XGB) were trained on the preprocessed dataset described above, and the accuracy, recall and F1 value of each model were visualized and compared. The XGB model reached an accuracy of 83%, which is reasonable for the intelligent diagnosis of the disease.
A clustering model for disease types was established using the K-Means algorithm, first clustering the data into the three major classes CN, MCI and AD, and then refining MCI into three subclasses. The optimal K value and random seed were first found using the elbow method, the cluster analysis was then performed on the features and dataset selected after preprocessing, and finally the MCI cases were extracted and sub-clustered into the three subclasses SMC, EMCI and LMCI. To investigate how the different disease categories evolve over time, patients in each of the three categories were screened and analyzed separately. First, by combining the results above and reviewing the data, features irrelevant to this task and columns containing a large number of missing values were removed. Probability density plots were drawn for the remaining features, and all discrete features and all features that were essentially zero were then also screened out. After that, the 15 remaining features were plotted over time separately for CN, MCI and AD to reveal their evolution patterns. We also reviewed the relevant literature, summarized the existing domestic and international studies, and summarized the criteria for determining the five stages of Alzheimer's disease and the early intervention of the disease.
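
The PTID-based split and mean filling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the column name `score`, the toy records, the 20% test fraction and the seed are all hypothetical, and the abstract does not say whether the mean was computed before or after the split. The sketch assumes it is computed on training rows only, so that no test-set information reaches the training data, and it keeps every visit of a given PTID in the same set.

```python
import random
from statistics import mean

def split_by_patient(rows, test_fraction=0.2, seed=0):
    """Split visit records so that each PTID lands entirely in one set."""
    patient_ids = sorted({row["PTID"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in rows if r["PTID"] not in test_ids]
    test = [r for r in rows if r["PTID"] in test_ids]
    return train, test

def fill_missing_with_mean(train, test, feature):
    """Replace None in `feature` with the mean of the observed training values."""
    observed = [r[feature] for r in train if r[feature] is not None]
    fill_value = mean(observed)  # assumption: mean taken from training rows only

    def fill(rows):
        return [
            {**r, feature: fill_value if r[feature] is None else r[feature]}
            for r in rows
        ]

    return fill(train), fill(test)

# Made-up toy records for illustration only: five patients, two visits each.
rows = [
    {"PTID": "P1", "score": 29.0}, {"PTID": "P1", "score": 28.0},
    {"PTID": "P2", "score": 25.0}, {"PTID": "P2", "score": None},
    {"PTID": "P3", "score": 21.0}, {"PTID": "P3", "score": 19.0},
    {"PTID": "P4", "score": 27.0}, {"PTID": "P4", "score": 26.0},
    {"PTID": "P5", "score": 23.0}, {"PTID": "P5", "score": 22.0},
]

train, test = split_by_patient(rows, test_fraction=0.2, seed=0)
train_filled, test_filled = fill_missing_with_mean(train, test, "score")
```

Splitting on patient identifiers rather than on individual rows matters for longitudinal data, because repeated visits from the same patient are highly correlated and would otherwise inflate the measured test accuracy.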

Keywords: Alzheimer's Disease; Machine Learning; XGB Algorithm; K-Means Clustering
