A novel method of literature mining to identify candidate COVID-19 drugs

Authors

Tomonari Muramatsu*, Masaru Tanokura*

Abstract

COVID-19 is a serious infectious disease that has recently emerged and continues to spread worldwide. It is spreading too quickly for new, specific drugs to be developed in time. As an alternative, drugs already developed for other diseases have been tested for use in the treatment of COVID-19 (drug repositioning). However, selecting candidate drugs from a large number of compounds requires numerous inhibition assays involving viral infection of cultured cells. For efficiency, it would be useful to narrow down the list of candidates by logical considerations before performing these assays. We have developed a powerful tool to predict candidate drugs for the treatment of COVID-19 and other diseases. This tool is based on the concatenation of events/substances, each of which is linked to a KEGG (Kyoto Encyclopedia of Genes and Genomes) code, using relationships obtained by text mining the vast literature in the PubMed database. By analyzing 21,589,326 records with abstracts from PubMed, 98,556 KEGG codes with NAME/DEFINITION fields were connected. Among them, 9,799 KEGG drug codes were connected to COVID-19, of which 7,492 codes had no direct connection to COVID-19. Although this report focuses on COVID-19, the program developed here can be applied to other infectious diseases and used to quickly identify drug candidates when new infectious diseases appear in the future.
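
The idea of concatenating literature-derived relationships can be illustrated with a minimal sketch. It is not the authors' program. It assumes that two codes are "related" when their names co-occur in the same abstract, and it uses plain case-insensitive substring matching. The codes, names, and abstracts below are toy placeholders, not real KEGG entries or PubMed records. A drug whose shortest chain to the disease has more than two nodes is connected only indirectly.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical code -> name table (toy placeholders, not real KEGG entries).
NAMES = {
    "X:disease": "disease x",
    "X:protein": "protein p",
    "X:drug_a": "drug a",
    "X:drug_b": "drug b",
}

# Toy abstracts standing in for PubMed records.
ABSTRACTS = [
    "Disease X infection depends on protein P.",
    "Drug A inhibits protein P in vitro.",
    "Drug B was tested in patients with disease X.",
]


def build_graph(abstracts, names):
    """Link two codes whenever both names occur in the same abstract.

    Assumption: co-occurrence stands in for a "relationship"; the real
    tool may define relationships differently.
    """
    graph = defaultdict(set)
    for text in abstracts:
        lowered = text.lower()
        found = sorted(code for code, name in names.items() if name in lowered)
        for a, b in combinations(found, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search for the shortest chain of codes from start to goal.

    Returns the chain as a list of codes, or None if no chain exists.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    graph = build_graph(ABSTRACTS, NAMES)
    for drug in ("X:drug_a", "X:drug_b"):
        chain = shortest_chain(graph, drug, "X:disease")
        kind = "indirect" if chain and len(chain) > 2 else "direct"
        print(drug, kind, chain)
```

In this toy data, drug A never appears in an abstract with the disease, yet it is reachable through the shared protein. That is the kind of candidate the abstract describes as having no direct connection to COVID-19.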

Paper Information

Journal: Bioinformatics Advances
DOI: 10.1093/bioadv/vbab013; https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbab013/6325500