Journal of Infectious Diseases & Travel Medicine (JIDTM)

ISSN: 2640-2653

Research Article

Biostatistical Analysis of the Novel Coronavirus

Authors: Zhao B* and Cao J

Abstract

Background: Since the novel coronavirus (COVID-19) first appeared in Wuhan in December 2019, it has quickly swept the world and become a major threat to human safety. Besides threatening people’s lives, the epidemic has severely damaged the economies of many countries: large numbers of companies have closed, employment has become more difficult, and people’s lives have been greatly affected. For Hubei Province, where COVID-19 first broke out, and the United States, the most severely affected area, we establish time series models to analyze the spread of the new coronavirus and make short-term forecasts. This will help countries better understand the development trend of the epidemic, prepare better, and intervene and treat in time to prevent the further spread of the virus. Methods: The data collected from Hubei Province from 20 January 2020 to 28 April 2020 include the cumulative confirmed diagnoses, deaths and cures. We first organize the data in Excel and then use SPSS to establish time series models and perform statistical analysis. Because there are no missing data, we define the day as the time variable, draw time series graphs and observe the overall pattern of change. We remove the outliers, then use the SPSS Expert Modeler to automatically find the best-fitting model for each dependent series, and forecast by designating the independent variable and setting the confidence interval to 95%. ACF and PACF graphs of the residuals and the Q test are used to determine whether the residuals are a white noise sequence and whether the model is appropriate. The Holt model is used for the cumulative confirmed diagnoses in Hubei Province, and the ARIMA (1,2,0) model is used for cumulative cures and deaths in Hubei Province.
Because the outbreak in the United States began later than in China, we collect data from 29 February 2020 to 28 April 2020, which also include the cumulative confirmed diagnoses, deaths and cures. The ARIMA (2,2,6) model is used for cumulative diagnoses in the U.S., the ARIMA (0,2,0) model for cumulative deaths in the U.S., and the ARIMA (0,2,1) model for cumulative cures in the U.S. Findings: In our modeling, the time series plots of the real and the fitted data almost overlap, so the Holt model and the ARIMA models fit the data very well. We compare the predicted values with the real values of the same period and find that the epidemic in Hubei Province had basically ended after May, whereas the epidemic in the United States became more severe after May, so the Holt model and the ARIMA models are also well suited to short-term prediction of the epidemic. Interpretation: Because the Chinese government has always put the safety of people’s lives first, when the epidemic broke out it decisively locked down Hubei Province. When one place is in trouble, help comes from all sides: the resources of the whole country were concentrated on Hubei Province, at the expense of the economy, in order to save more people. We can now clearly see that the epidemic has been controlled in China and that the whole country is developing in a good direction. In contrast, in the United States, the government’s lack of control, unwillingness to sacrifice the economy, premature return to work, and failure to call on people to wear masks have led the epidemic in a bad direction.
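
The Methods above name two techniques that are easy to illustrate outside SPSS: Holt's linear-trend exponential smoothing and the Q test on the residuals. The following is a minimal Python sketch, not the authors' SPSS workflow. The function names `holt_linear` and `ljung_box_q`, the fixed smoothing parameters, and the demonstration counts are all assumptions made for illustration; the Expert Modeler estimates the smoothing parameters itself, and the counts below are made-up numbers, not the Hubei or U.S. data.

```python
from typing import List, Sequence, Tuple


def holt_linear(series: Sequence[float], alpha: float, beta: float,
                horizon: int) -> Tuple[List[float], List[float]]:
    """Holt's linear-trend exponential smoothing.

    Returns the one-step-ahead fitted values for series[1:] and the
    forecasts for the next `horizon` periods. alpha and beta are fixed
    by the caller here (an assumption; SPSS would estimate them).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[0])
    trend = float(series[1] - series[0])  # simple initial trend estimate
    fitted = []
    for y in series[1:]:
        fitted.append(level + trend)  # forecast made before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    forecasts = [level + h * trend for h in range(1, horizon + 1)]
    return fitted, forecasts


def ljung_box_q(residuals: Sequence[float], max_lag: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n-k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(e * e for e in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    # Hypothetical cumulative counts, for illustration only.
    counts = [100, 180, 290, 430, 600, 790, 1000, 1220]
    fitted, forecasts = holt_linear(counts, alpha=0.8, beta=0.2, horizon=3)
    residuals = [y - f for y, f in zip(counts[1:], fitted)]
    print("3-day forecast:", forecasts)
    print("Ljung-Box Q (3 lags):", ljung_box_q(residuals, max_lag=3))
```

A small Q relative to the appropriate chi-square quantile is consistent with white-noise residuals, which is the adequacy check the abstract describes. The degrees of freedom for that comparison depend on the number of fitted parameters and are left out of this sketch.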

Keywords: COVID-19; Time Series Analysis; Holt Model; ARIMA Model
