Kim, Sungchul,Sael, Lee,Yu, Hwanjo Oxford University Press 2015 Bioinformatics Vol.31 No.22
<P><B>Motivation</B>: As the quantity of genomic mutation data increases, so does the likelihood of finding patients with similar genomic profiles for various disease inferences. However, so does the difficulty of identifying them. Similarity search based on patient mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy prediction, for better clinical decision making from large volumes of data. However, this is a challenging problem because mutation data are heterogeneous, sparse, and high-dimensional.</P><P><B>Results</B>: To solve this problem, we introduce a compact representation and search strategy based on Gene Ontology and orthogonal non-negative matrix factorization. The statistical significance of the association between the identified cancer subtypes and their clinical features is computed for validation; results show that our method can identify and characterize clinically meaningful tumor subtypes that are, in most datasets, comparable to or better than those found by the recently introduced Network-Based Stratification method, while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutational data for efficient search purposes.</P><P><B>Availability</B>: The implementations are available at: https://sites.google.com/site/postechdm/research/implementation/orgos.</P><P><B>Contact</B>: sael@cs.stonybrook.edu or hwanjoyu@postech.ac.kr</P><P><B>Supplementary information:</B> Supplementary data are available at <I>Bioinformatics</I> online.</P>
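The search step over compact patient representations can be illustrated with a minimal sketch. It assumes each patient has already been reduced to a short non-negative vector (for example, by a matrix factorization) and ranks patients by cosine similarity. The similarity measure, the function names `cosine` and `top_k_similar`, and the toy vectors are all hypothetical; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_similar(query, profiles, k=2):
    """Return the k (patient_id, score) pairs most similar to the query vector."""
    scored = [(cosine(query, vec), pid) for pid, vec in profiles.items()]
    # Highest score first; ties are broken by patient id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(pid, score) for score, pid in scored[:k]]

# Hypothetical compact profiles: one low-dimensional vector per patient.
profiles = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.0, 0.8, 0.2],
    "patient_c": [0.7, 0.3, 0.0],
}
hits = top_k_similar([1.0, 0.2, 0.0], profiles, k=2)
print(hits)
```

Because the vectors are short and dense, a linear scan like this stays cheap for a modest number of patients; a larger collection would probably call for an index structure, which this sketch does not attempt.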
Mass Spectrometry Coupled Experiments and Protein Structure Modeling Methods
Pi, Jaewoo,Sael, Lee Molecular Diversity Preservation International (MD 2013 INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES Vol.14 No.10
<P>With the accumulation of next-generation sequencing data, there is increasing interest in the study of intra-species differences in molecular biology, especially in relation to disease analysis. Furthermore, protein dynamics are being identified as a critical factor in protein function. Although the accuracy of protein structure prediction methods is high when structural templates are available, most methods are still insensitive to amino-acid differences at critical points that may change the overall structure. Also, predicted structures are inherently static and do not provide information about structural change over time. It is challenging to address sensitivity and dynamics by computational structure prediction alone. However, with the fast development of diverse mass spectrometry coupled experiments, low-resolution but fast and sensitive structural information can be obtained. This information can then be integrated into the structure prediction process to further improve the sensitivity and address the dynamics of protein structures. For this purpose, this article reviews two aspects: the types of mass spectrometry coupled experiments and the structural data obtainable through them; and the structure prediction methods that can use these data as constraints. A short review of current efforts to integrate experimental data into structural modeling is also provided.</P>
Fully Scalable Methods for Distributed Tensor Factorization
Shin, Kijung,Sael, Lee,Kang, U IEEE 2017 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERIN Vol.29 No.1
<P>Given a high-order, large-scale tensor, how can we decompose it into latent factors? Can we process it on commodity computers with limited memory? These questions are closely related to recommender systems, which have modeled rating data not as a matrix but as a tensor to utilize contextual information such as time and location. This increase in the order requires tensor factorization methods that scale with both the order and the size of a tensor. In this paper, we propose two distributed tensor factorization methods, CDTF and SALS. Both methods scale with all aspects of the data and show a trade-off between convergence speed and memory requirements. CDTF, based on coordinate descent, updates one parameter at a time, while SALS generalizes the number of parameters updated at a time. In our experiments, only our methods factorized a five-order tensor with 1 billion observable entries, a mode length of 10M, and a rank of 1K, while all other state-of-the-art methods failed. Moreover, our methods required several orders of magnitude less memory than their competitors. We implemented our methods on MAPREDUCE with two widely applicable optimization techniques: local disk caching and greedy row assignment. These techniques sped up our methods by up to 98.2x and the competitors by up to 5.9x.</P>
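The idea of updating one parameter at a time can be sketched on a single machine. The code below fits a CP-style factorization to the observed entries of a sparse tensor: each factor entry is set in turn to the closed-form minimizer of a regularized squared-error objective while all other parameters are held fixed. It is only an illustration of coordinate descent for tensor factorization. It is not distributed, it does not reproduce the update ordering, caching, or row assignment of CDTF or SALS, and the names `cp_predict`, `objective`, and `coordinate_descent_cp` are hypothetical.

```python
import random

def cp_predict(factors, idx, rank):
    """Predicted value at index tuple idx: sum over k of the product of factor entries."""
    total = 0.0
    for k in range(rank):
        prod = 1.0
        for mode, i in enumerate(idx):
            prod *= factors[mode][i][k]
        total += prod
    return total

def objective(factors, entries, rank, reg):
    """Squared error on observed entries plus L2 regularization on all parameters."""
    sse = sum((val - cp_predict(factors, idx, rank)) ** 2 for idx, val in entries)
    l2 = sum(p * p for mode in factors for row in mode for p in row)
    return sse + reg * l2

def coordinate_descent_cp(entries, dims, rank, reg=0.01, iters=20, seed=0):
    """entries: list of (index_tuple, value); dims: length of each mode; reg must be > 0."""
    rng = random.Random(seed)
    factors = [[[rng.random() for _ in range(rank)] for _ in range(d)] for d in dims]
    # Group observed entries by their index in each mode for fast lookup.
    by_index = [[[] for _ in range(d)] for d in dims]
    for idx, val in entries:
        for mode, i in enumerate(idx):
            by_index[mode][i].append((idx, val))
    for _ in range(iters):
        for k in range(rank):
            for mode in range(len(dims)):
                for i in range(dims[mode]):
                    num, den = 0.0, reg
                    for idx, val in by_index[mode][i]:
                        # Product of the k-th column entries of all other modes.
                        others = 1.0
                        for m, j in enumerate(idx):
                            if m != mode:
                                others *= factors[m][j][k]
                        # Residual with the contribution of this parameter removed.
                        resid = (val - cp_predict(factors, idx, rank)
                                 + others * factors[mode][i][k])
                        num += resid * others
                        den += others * others
                    # Exact minimizer for this single parameter.
                    factors[mode][i][k] = num / den
    return factors

# Toy 3-order tensor with every entry observed (a rank-1 tensor by construction).
entries = [((i, j, k), float((i + 1) * (j + 1) * (k + 1)))
           for i in range(3) for j in range(3) for k in range(2)]
dims = (3, 3, 2)
start = coordinate_descent_cp(entries, dims, rank=2, iters=0)
fitted = coordinate_descent_cp(entries, dims, rank=2, iters=20)
print(objective(start, entries, 2, 0.01), objective(fitted, entries, 2, 0.01))
```

Since every update exactly minimizes the objective with respect to one parameter, the objective cannot increase from one update to the next. Each update needs only the entries sharing one index, which hints at why this style of update can be made memory-light.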
Lee, Jungwoo,Oh, Sejoon,Sael, Lee Oxford University Press 2018 Bioinformatics Vol.34 No.24
<P><B>Abstract</B></P><P><B>Motivation</B></P><P>Given multi-platform genome data with prior knowledge of functional gene sets, how can we extract interpretable latent relationships between patients and genes? More specifically, how can we devise a tensor factorization method which produces an interpretable gene factor matrix based on functional gene set information while maintaining the decomposition quality and speed?</P><P><B>Results</B></P><P>We propose GIFT, a <B>G</B>uided and <B>I</B>nterpretable <B>F</B>actorization for <B>T</B>ensors. GIFT provides interpretable factor matrices by encoding prior knowledge as a regularization term in its objective function. We apply GIFT to the PanCan12 dataset (TCGA multi-platform genome data) and compare the performance with P-Tucker, our baseline method without prior knowledge constraint, and Silenced-TF, our naive interpretable method. Results show that GIFT produces interpretable factorizations with high scalability and accuracy. Furthermore, we demonstrate how results of GIFT can be used to reveal significant relations between (cancer, gene sets, genes) and validate the findings based on literature evidence.</P><P><B>Availability and implementation</B></P><P>The code and datasets used in the paper are available at https://github.com/leesael/GIFT.</P><P><B>Supplementary information</B></P><P>Supplementary data are available at <I>Bioinformatics</I> online.</P>
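One way prior knowledge could enter an objective function as a regularization term is sketched below. It assumes a binary mask with one row per gene and one column per latent component, where 1 means the gene belongs to the gene set tied to that component, and it penalizes factor entries that fall outside the mask. The masked L2 form, the function names, and the toy numbers are assumptions made for illustration; the abstract does not give the exact regularizer GIFT uses.

```python
def masked_penalty(gene_factor, mask, lam):
    """lam * sum of squared factor entries that lie outside the gene-set mask."""
    return lam * sum(
        (1 - m) * a * a
        for row_a, row_m in zip(gene_factor, mask)
        for a, m in zip(row_a, row_m)
    )

def penalty_gradient_step(gene_factor, mask, lam, lr):
    """One gradient step on the penalty alone; entries inside the mask are untouched."""
    return [
        [a - lr * 2.0 * lam * (1 - m) * a for a, m in zip(row_a, row_m)]
        for row_a, row_m in zip(gene_factor, mask)
    ]

# Hypothetical 2-gene, 2-component example.
gene_factor = [[1.0, 0.5],
               [0.2, 2.0]]
mask = [[1, 0],
        [0, 1]]
penalty = masked_penalty(gene_factor, mask, lam=0.5)
updated = penalty_gradient_step(gene_factor, mask, lam=0.5, lr=0.1)
print(penalty, updated)
```

In a full method this term would be added to the reconstruction error, so that entries outside a gene set shrink toward zero and each component can be read in terms of its gene set.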
Review on Graph Clustering and Subgraph Similarity Based Analysis of Neurological Disorders
Thomas, Jaya,Seo, Dongmin,Sael, Lee MDPI 2016 INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES Vol.17 No.6
<P>How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to the question, no matter the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of neurological disorders at both the genomic level and the phenotypic level. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of neurological disorders. Furthermore, with the increasing amount of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although there are still few integrative analyses of graphs because of the complexity of the analysis, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.</P>