Effective drugs against COVID-19 are urgently required, so research on in silico drug repositioning is being carried out at Chuo University
COVID-19 is a pandemic that threatens human life worldwide. Although vaccination has emerged as a useful tool against COVID-19, it has limitations of its own, such as severe side reactions and a limited period of effectiveness. The development of effective drugs against COVID-19 is therefore urgently required.
PCA and TD based unsupervised FE
Our group works mainly in bioinformatics, that is, tackling genomic and biological problems with computational strategies. In particular, we aim to develop computational methods for the comprehensive analysis of massive data sets, e.g., those measured by so-called high-throughput sequencing (HTS).
Over the last decade, measurement techniques of this kind have advanced greatly. Although HTS has mainly been used to measure genomic features, e.g., the amount of transcripts (RNA), genomic sequence (DNA), histone modification, DNA methylation and DNA accessibility, other omics layers, e.g., the metabolome and proteome, can now also be measured comprehensively.
Analyzing such heterogeneous data is far from straightforward, and new techniques are awaited. To tackle this difficult problem, we developed an unsupervised machine learning method: principal component analysis (PCA) and tensor decomposition (TD) based unsupervised feature extraction (FE). This method enables us to integrate heterogeneous measurements, also known as multiomics data, in a fully data-driven manner. We have published a monograph that describes this method [1].
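As a rough illustration of the PCA-based variant, the following minimal sketch selects features from a features-by-samples matrix. It is written under simplifying assumptions and is not our published implementation: each sample is standardized across features, the principal component of interest is chosen by the user after inspecting the sample loadings, the component scores attributed to features are assumed to be Gaussian under the null hypothesis, and P-values are corrected with the Benjamini-Hochberg procedure. The function names, the synthetic data and the 0.01 threshold are hypothetical choices made only for this example; the monograph [1] gives the full description of the method.

```python
import math
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def pca_unsupervised_fe(x, component=0, alpha=0.01):
    """Select features (rows of x) that are outliers along one principal component.

    x is a features-by-samples matrix. Returns the indices of the selected
    features, the adjusted P-values, and the sample loadings of the component.
    """
    x = np.asarray(x, dtype=float)
    # Assumption: standardize each sample (column) across features.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Columns of u hold component scores attributed to features;
    # rows of vt hold the corresponding loadings attributed to samples.
    u, _, vt = np.linalg.svd(z, full_matrices=False)
    score = u[:, component]
    # Assumption: under the null, scores are Gaussian, so the squared
    # standardized score follows a chi-squared distribution with 1 degree
    # of freedom; its upper tail equals erfc(|z| / sqrt(2)).
    standardized = score / score.std()
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in standardized])
    adjusted = benjamini_hochberg(p)
    return np.flatnonzero(adjusted < alpha), adjusted, vt[component]


# Synthetic example: 1000 features, 10 samples (5 control, 5 treated).
# The first 20 features are shifted upward in the treated samples.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 10))
data[:20, 5:] += 8.0
selected, adjusted, loadings = pca_unsupervised_fe(data, component=0)
print(len(selected), "features selected")
print("sample loadings:", np.round(loadings, 2))
```

In practice the component is not known in advance: one inspects the sample loadings of each component and picks the one that reflects the condition of interest (here, control versus treated) before selecting features. No class labels are passed to the function, which is what makes the procedure unsupervised.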
Application to COVID-19
Recently, we have applied PCA/TD based unsupervised FE to drug repositioning for COVID-19. We analyzed gene expression profiles of various SARS-CoV-2 infected cell lines and of patients, and identified critical genes [2,3,4]. These genes turned out to be deeply related to SARS-CoV-2 infection, which supports the reliability of the analysis. We then searched for drugs known to target the genes identified by our methodology; the drugs identified in this way are also promising candidates.
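The step from identified genes to candidate drugs can be pictured as an overlap test between the selected genes and the known target genes of each drug. The sketch below is only an assumption about how such a lookup could be organized, not the pipeline used in [2,3,4]: the gene and drug names are placeholders, the drug-to-target table is hypothetical, and the significance of an overlap is scored with a plain hypergeometric tail probability.

```python
from math import comb


def hypergeom_tail(k, n_total, n_targets, n_selected):
    """P(overlap >= k) when n_selected genes are drawn at random from
    n_total genes, n_targets of which are targets of the drug."""
    upper = min(n_targets, n_selected)
    numerator = sum(
        comb(n_targets, i) * comb(n_total - n_targets, n_selected - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_selected)


def rank_drugs(selected_genes, drug_targets, n_total):
    """Rank drugs by how strongly their targets overlap the selected genes."""
    selected = set(selected_genes)
    results = []
    for drug, targets in drug_targets.items():
        targets = set(targets)
        overlap = len(selected & targets)
        p = hypergeom_tail(overlap, n_total, len(targets), len(selected))
        results.append((drug, overlap, p))
    # Smallest P-value first.
    return sorted(results, key=lambda item: item[2])


# Placeholder data, for illustration only.
selected_genes = {"G1", "G2", "G3", "G4", "G5"}
drug_targets = {
    "drug_X": {"G1", "G2", "G3", "G900"},
    "drug_Y": {"G7", "G8"},
}
ranked = rank_drugs(selected_genes, drug_targets, n_total=20000)
for drug, overlap, p in ranked:
    print(drug, overlap, p)
```

With many drugs in the table, the resulting P-values would also need a multiple-testing correction before any drug is called a candidate.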
Other Applications
Our methodology, PCA/TD based unsupervised FE, is applicable not only to drug repositioning but also to other purposes, since it was originally designed to address a more general problem, the so-called “large p small n problem”: the situation where a limited number of samples (n) is associated with a large number of features (p). Analysis is very difficult in such situations, because the number of features needed to describe the samples is no larger than the number of samples. When the number of features is much larger than the number of samples, a huge number of feature combinations can fully describe the limited number of samples; this prevents us from understanding the causal relationship between samples and features. Since PCA/TD based unsupervised FE can, at least partially, address this problem, it is applicable to many problems.
The problems to which PCA/TD based unsupervised FE has been applied include identifying disease-causing genes, identifying disease biomarkers, and understanding various genetic relationships underlying biology. The biological topics we have addressed include cancers, social insects, PTSD, transepigenetics, epitranscriptomics and so on. We also expect to apply our methodology to various other fields.
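The difficulty described above can be seen in a small numerical experiment. The sketch below uses purely synthetic data and hypothetical variable names: it draws 10 samples with 1000 random features and a random outcome that has nothing to do with the features, and then shows that different sets of features reproduce the outcome exactly with a linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 1000

x = rng.normal(size=(n_samples, n_features))  # samples-by-features matrix
y = rng.normal(size=n_samples)                # outcome unrelated to x

# The data matrix cannot have rank larger than the number of samples.
rank = np.linalg.matrix_rank(x)

# Solution 1: minimum-norm least squares using all 1000 features.
w1, *_ = np.linalg.lstsq(x, y, rcond=None)

# Solution 2: use only the first 10 features.
w2 = np.zeros(n_features)
w2[:n_samples] = np.linalg.solve(x[:, :n_samples], y)

# Solution 3: use only the last 10 features.
w3 = np.zeros(n_features)
w3[-n_samples:] = np.linalg.solve(x[:, -n_samples:], y)

for name, w in (("all features", w1), ("first 10", w2), ("last 10", w3)):
    print(name, "max abs residual:", np.max(np.abs(x @ w - y)))
```

All three weight vectors fit the outcome essentially perfectly even though the outcome is random, and the second and third use disjoint feature sets. A perfect fit therefore says nothing about which features actually matter, which is why feature selection in this regime needs a different strategy.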
Call for Collaborations
We are willing to collaborate with various research institutes and companies. Our methods are suited to the “large p small n problem”, which is difficult to address with other machine learning methods such as deep learning, which requires a huge number of samples. We look forward to hearing from you.
References
[1] Taguchi Y-h (2020) Unsupervised Feature Extraction Applied to Bioinformatics: A PCA Based and TD Based Approach. Springer International. https://rd.springer.com/book/10.1007/978-3-030-22456-1
[2] Taguchi Y-h, Turki T (2020) A new advanced in silico drug discovery method for novel coronavirus (SARS-CoV-2) with tensor decomposition-based unsupervised feature extraction. PLoS ONE 15(9): e0238907. https://doi.org/10.1371/journal.pone.0238907
[3] Taguchi Y-h, Turki T (2021) Application of Tensor Decomposition to Gene Expression of Infection of Mouse Hepatitis Virus Can Identify Critical Human Genes and Effective Drugs for SARS-CoV-2 Infection. IEEE Journal of Selected Topics in Signal Processing 15(3): 746-758. https://doi.org/10.1109/JSTSP.2021.3061251
[4] Fujisawa K, Shimo M, Taguchi Y-h, et al. (2021) PCA-based unsupervised feature extraction for gene expression analysis of COVID-19 patients. Sci Rep 11: 17351. https://doi.org/10.1038/s41598-021-95698-w
Other resources:
Researchmap: https://researchmap.jp/Yh_Taguchi/
ResearchGate: https://www.researchgate.net/profile/Y-H-Taguchi
Google Scholar: https://scholar.google.co.jp/citations?user=w7C9bR4AAAAJ&hl=ja
Speaker Deck: https://speakerdeck.com/tagtag
SlideShare: https://www.slideshare.net/yhtaguchi
YouTube: https://www.youtube.com/playlist?list=PL-JiREStAuYK0hKXtw4YuIzA1xNI-hjgd
GitHub: https://github.com/tagtag