Lecturer Scientific Research


A Novel Binning Algorithm Using Topic Modelling and k-mer Frequency on Groups of Non-Overlapping Short Reads

Hoang D. Quach, Hoang T. Lam, Dang H. N. Nguyen, Phuong V. D. Van, Van Hoai Chan (2020), “A Novel Binning Algorithm Using Topic Modelling and k-mer Frequency on Groups of Non-Overlapping Short Reads”, International Conference on Green Technology and Sustainable Development (GTSD), virtual conference, 27th-28th November 2020, Ho Chi Minh City, page 380

Abstract:

Metagenomics is the study of microorganisms sampled directly from their environment rather than through traditional culturing methods. In this paper, we focus on the binning problem, which is to group reads into clusters that each closely represent a taxonomic group. The result of this step serves as a crucial input for subsequent steps of a metagenomic project, such as assembly and annotation. Because metagenomic reads do not have explicit features, it is not easy to divide them into distinct groups. Solutions to the binning problem can be categorized as supervised or unsupervised approaches. Supervised ones need a reference database, which unfortunately covers only about 1% of the microorganisms in nature. This prevents these approaches from working well with datasets that contain unknown species. In this paper, we follow an unsupervised approach. Our proposed method, LDABiMeta, combines the result of another technique named BiMeta, which is based on the biological signature assumption that reads with the same taxonomic label have the same k-mer distribution, with topic modelling as a way of reducing the dimensionality of the dataset. Our method shows better results (in precision, recall, and F-measure) than BiMeta on most datasets. Although it builds on BiMeta, LDABiMeta outperforms it thanks to the newly proposed ideas. Moreover, our method is equivalent to MetaProb, which is the most successful method at present, on the short-read datasets.
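
To illustrate the k-mer signature assumption mentioned in the abstract, the following is a minimal sketch of how a read can be turned into a normalized k-mer frequency vector and compared with another read. It is not the authors' implementation: the function names `kmer_frequencies` and `l1_distance`, the choice of k = 4, and the use of an L1 distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read, k=4):
    """Return the normalized k-mer frequency vector of a read.

    The vector has 4**k entries, ordered lexicographically over the
    alphabet ACGT. k-mers containing other symbols (e.g. N) are ignored.
    The default k=4 is an illustrative assumption, not taken from the paper.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every window of length k in the read.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Only windows made of A, C, G, T contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def l1_distance(u, v):
    """L1 distance between two frequency vectors (hypothetical choice of metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))
```

Under the signature assumption, reads from the same taxonomic group would tend to have a small distance between their vectors, for example `l1_distance(kmer_frequencies(read_a), kmer_frequencies(read_b))`.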
