§ Thesis Bibliographic Record
  
System ID U0002-2601202202500900
DOI 10.6846/TKU.2022.00730
Thesis title (Chinese) 以機器學習方法預測新冠肺炎發展趨勢
Thesis title (English) Using Machine Learning Methods to Predict the Spread Trend of COVID-19
University Tamkang University (淡江大學)
Department (Chinese) 資訊管理學系碩士班
Department (English) Department of Information Management (Master's Program)
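Academic year 110 (ROC calendar; 2021-2022)
Semester 1
Year of publication 111 (ROC calendar; 2022)
Student (Chinese) 蔣蕙娟
Student (English) Hui-Juan Jiang
Student ID 607630299
Degree Master's
Language Traditional Chinese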
Oral defense date 2022-01-13
Number of pages 38
Oral defense committee Advisor - 張昭憲 (090557@mail.tku.edu.tw)
Committee member - 魏世杰 (sekewei@mail.tku.edu.tw)
Committee member - 壽大衛 (shoutw@gmail.com)
Keywords COVID-19
Epidemic Trend Prediction
Machine Learning
Linear Regression
Abstract
The global spread of COVID-19 has inflicted heavy losses on countries worldwide and has seriously affected people's lives and economic development. Governments have adopted a range of containment measures to limit its spread. To help mitigate the impact of COVID-19, researchers have invested considerable effort in developing cost-effective methods that can support the relevant authorities in setting policy. Although many methods have been proposed, their practicality and accuracy still leave room for improvement. This study therefore combines official historical COVID-19 data with social media posts and applies machine learning methods to predict the trend of COVID-19. First, we collect COVID-19 data published by the disease control authorities (CDC) of the United States and the United Kingdom, together with Twitter posts from the same period, to form a hybrid data source. When analyzing the posts, we use officially published symptom terms as keywords. Second, we take the incubation period of the disease into account and build a prediction model with a built-in delay, with the aim of improving prediction accuracy. Finally, we build models with Linear Regression, MLP, and LSTM to predict future numbers of deaths and confirmed cases. The experimental results show that the proposed method does help predict the epidemic trend of COVID-19 and can serve as a basis for decision-making by the relevant authorities.
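The delayed-prediction idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the `lookback` and `delay` parameters, and the synthetic data are all hypothetical, and ordinary least squares via NumPy stands in for whatever regression tooling the author actually used. Each training row flattens `lookback` consecutive days of keyword counts, and the label is the case count `delay` days after the last day of that window.

```python
import numpy as np


def make_lagged_dataset(features, target, lookback, delay):
    """Flatten `lookback` days of features per row; the label is the target
    `delay` days after the last day of the window (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_days = features.shape[0]
    rows, labels = [], []
    for start in range(n_days - lookback - delay + 1):
        end = start + lookback  # window covers days [start, end)
        rows.append(features[start:end].ravel())
        labels.append(target[end - 1 + delay])
    return np.array(rows), np.array(labels)


def fit_linear_regression(X, y):
    """Ordinary least squares; the last coefficient is the intercept."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (including the intercept) to new rows."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ coef


if __name__ == "__main__":
    # Synthetic stand-in data: 60 days of counts for 3 symptom keywords.
    rng = np.random.default_rng(0)
    keyword_counts = rng.integers(50, 500, size=(60, 3)).astype(float)
    # Assumed relationship: cases follow the first keyword with a 5-day lag.
    new_cases = np.zeros(60)
    new_cases[5:] = 2.0 * keyword_counts[:-5, 0] + rng.normal(0, 5, size=55)

    X, y = make_lagged_dataset(keyword_counts, new_cases, lookback=3, delay=5)
    coef = fit_linear_regression(X[:40], y[:40])   # fit on the earlier rows
    errors = np.abs(predict(coef, X[40:]) - y[40:])  # evaluate on later rows
    print("mean absolute error on held-out rows:", errors.mean())
```

The same flattened rows could in principle be fed to an MLP, while an LSTM would typically consume the unflattened windows of shape (`lookback`, number of features); which arrangement the thesis used is not stated in this record.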
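Table of Contents
Chinese Abstract
English Abstract
Contents
List of Tables
List of Figures
 Chapter 1 Introduction
1.1	Research Background
1.2	Research Motivation
1.3	Research Contributions
 Chapter 2 Background and Literature Review
2.1	Monitoring and Control of COVID-19
2.2	Literature Review
2.3	Linear Regression
2.4	Deep Learning Methods
 Chapter 3 Research Method
3.1	Data Collection
3.2	Building a COVID-19 Prediction Model from Twitter Posts
3.3	Building a Prediction Model That Accounts for the Incubation Period
3.4	Building Prediction Models with Machine Learning Methods
 Chapter 4 Experimental Results
4.1	Performance Evaluation of the Prediction Model Built with Linear Regression on the Twitter Dataset
4.2	Prediction Performance Evaluation of the Delay Model That Accounts for the Incubation Period
4.3	Prediction Performance Evaluation of Models Built with Deep Learning Methods
 Chapter 5 Conclusions and Future Work
References
Appendix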

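Table 2-1: Impact of using social network analysis and CDC historical data to predict COVID-19
Table 3-1: Number of records (yellow area) required to predict new US cases on 2020/8/20 (b=5)
Table 3-2: Example dataset with m records, n independent variables, and 1 dependent variable
Table 4-1: Prediction results of regression modeling by number of lookback days (US, 25 keywords)
Table 4-2: Prediction results of regression modeling by number of lookback days (UK, 25 keywords)
Table 4-3: Prediction results of regression modeling by number of lookback days (US, 11 keywords)
Table 4-4: Prediction results of regression modeling by number of lookback days (UK, 11 keywords)
Table 4-5: Prediction results of regression modeling by number of lookback days (US, 14 keywords)
Table 4-6: Prediction results of regression modeling by number of lookback days (UK, 14 keywords)
Table 4-7: Results of the delay prediction model built with US Twitter post data
Table 4-8: Results of the delay prediction model built with UK Twitter post data
Table 4-9: Performance comparison between the delay prediction model built with Twitter post data (US) and the naive forecasting method
Table 4-10: Performance comparison between the delay prediction model built with Twitter post data (UK) and the naive forecasting method
Table 4-11: Collection period and fields of the dataset used for MLP and LSTM modeling (source: US CDC)
Table 4-12: Prediction performance of the MLP model built on COVID-19 data published by the US CDC
Table 4-13: Prediction performance of the LSTM model built on COVID-19 data published by the US CDC
Table 4-14: Prediction performance of the MLP model built on COVID-19 data published by the UK CDC
Table 4-15: Prediction performance of the LSTM model built on COVID-19 data published by the UK CDC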

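Figure 2-1: Hidden-layer components of an LSTM
Figure 3-1: Twitter user data
Figure 3-2: Tweets collected daily for the US and UK regions
Figure 3-3: US CDC statistics
Figure 3-4: UK PHE case report
Figure 3-5: UK PHE death report
Figure 3-6: Keyword statistics for the US region, 2020 to 2021
Figure 3-7: Dataset after flattening, used for MLP modeling and testing (b=3, d=1, n=3)
Figure 3-8: Architecture of the MLP prediction model used in this study
Figure 3-9: Architecture of the LSTM model used in this study
Figure 3-10: Parameters of the MLP (left) and LSTM (right, b=5, n=2) models used in this study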
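Full-text access permissions
National Central Library
Royalty-free license granted to the National Central Library; the bibliographic record and full-text electronic file are made publicly available on the Internet immediately after the authorization form is submitted
On campus
Print thesis available on campus immediately
Worldwide public access to the full-text electronic thesis is authorized
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately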
