RISS (Academic Research Information Service)

      • Tree-Structured Regression for a Loglinear Model with an Extra-Poisson Variation

        최윤희, State University of New York at Stony Brook, 2002, Doctoral dissertation (overseas)

        RANK : 232271

        This dissertation proposes a loglinear regression tree for analyzing overdispersed Poisson data more effectively. After a single loglinear model is fitted, statistical tests for differences in the mean and variance of the residuals are used to detect residual patterns, and the data are split recursively until the residuals look random. Adjusted Anscombe residuals are used. SUPPORT (Chaudhuri et al., 1994, Statistica Sinica, 4, 143-167) and GUIDE (Loh, 2002, Statistica Sinica, 12, 361-386) are used to split the data at each node. To determine an appropriate tree size, a procedure that applies bootstrap resampling at each node was developed. Quasi-likelihood is used to model the overdispersed Poisson data, and a test statistic for whether the data are in fact overdispersed was developed. Monte Carlo simulations were carried out to assess the power of this test statistic and to compare, under several model equations, the proposed method (look-ahead procedure) with the existing method (backward elimination method). Finally, the causes of lung cancer mortality in the state of Missouri, USA, are analyzed with the proposed method.

        Existing Poisson regression tree methods do not consider over-dispersion within a cluster. In this dissertation, algorithms for an over-dispersed Poisson regression tree for the analysis of count data are presented and implemented in a Fortran 90 program. The performance of the extra-Poisson regression tree is compared with that of existing Poisson regression trees. Quasi-likelihood is used for extra-Poisson regression, while the log-likelihood is used to fit the linear model in the existing Poisson regression model. For constructing a regression tree, the look-ahead and backward elimination procedures are applied and compared. The former is based on SUPPORT (Smoothed and Unsmoothed Piecewise Polynomial Regression Tree by Chaudhuri et al., 1994, Statistica Sinica, 5, 641-666), and the latter is based on CART (Classification and Regression Tree by Breiman et al., 1984). The look-ahead procedure employs a multi-step stopping rule with bootstrap resampling, while the backward elimination method uses a pruning procedure. To obtain a tree of optimal size, a new bootstrap approach, conducted at each node of the tree, is developed. The splitting variable is chosen using residual distributions, and two different tests are employed and compared. One is based on SUPPORT and the other on GUIDE (Generalized, Unbiased Interaction Detection and Estimation by Loh, 2002, Statistica Sinica, 13, to appear). SUPPORT employs Levene's test for variance and the two-sample t-test for the mean, and GUIDE employs the chi-squared test in the analysis of the residuals. GUIDE has a desirable strategy of limiting each predictor's role to regressor, split variable, or both. This strategy is useful when the design matrix has categorical predictors. To illustrate the performance of the proposed tree-structured methods, real data from an epidemiological investigation (Marienfeld et al., 1980) of the effect of public drinking water on cancer mortality in the state of Missouri, as well as motor insurance claims data, are analyzed. Simulated data are also used to evaluate the methods.
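        As a rough illustration of what "extra-Poisson variation" means in practice, the sketch below computes the Pearson dispersion estimate commonly used with quasi-likelihood models, where Var(Y) = phi * mu and a value of phi near 1 is consistent with a plain Poisson model. This is a generic textbook diagnostic, not the test statistic developed in the dissertation; the function name and the toy counts are assumptions made for this example.

```python
from typing import Sequence


def pearson_dispersion(y: Sequence[float], mu: Sequence[float], n_params: int) -> float:
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    y        observed counts
    mu       fitted Poisson means (all must be positive)
    n_params number of parameters estimated to obtain mu
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / dof


# Hypothetical counts, fitted with an intercept-only model (one parameter).
counts = [0, 1, 9, 2, 14, 4]
mean = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
print(f"estimated dispersion: {phi:.2f}")
```

        A value well above 1, as for these toy counts, suggests that the variance exceeds the mean and that a quasi-Poisson fit may be more appropriate than an ordinary Poisson fit.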

      • A Space-Efficient Positional Encoding Method in Transformers for Tree Structured Data

        김소정, Sookmyung Women's University Graduate School, 2025, Master's thesis (Korea)

        RANK : 232268

        The traditional positional encoding methods used in Transformers are primarily designed for sequential data, which limits their ability to fully capture the hierarchical structure of tree-structured data. To address this limitation, this study proposes a novel positional encoding method, AMPE (Adjacency Matrix-based Positional Encoding), tailored specifically to tree-structured data. Existing BFS- and DFS-based positional encoding approaches capture the hierarchical information within trees only partially, which makes it hard to learn complex tree structures effectively. By leveraging a directional adjacency matrix, AMPE incorporates both parent-child relationships and the global structure of the tree into the encoding vector, enabling a unique representation of each node’s position within the tree. Experimental results show that AMPE achieves higher accuracy than BFS- and DFS-based models and maintains performance comparable to existing methods such as 2D-PE and HTT, while offering superior memory efficiency by fixing the size of the encoding vector regardless of tree size. This allows Transformer models to learn hierarchical features efficiently when handling large-scale tree data.

      • Bivariate Logistic Classification Trees Using Model-Adaptive Pruning (original title: 모형 적응성 가지치기를 이용한 이변량 로지스틱 분류 나무)

        김철성, Yonsei University Graduate School, 2016, Master's thesis (Korea)

        RANK : 232238

        Logistic regression and tree models are two popular methods for classification. Combining them, this thesis introduces a new classification tree that fits a logistic model to the data in each node. Because multivariate analysis of variance (MANOVA) is used to choose only two significant predictor variables at a time, the distribution of the data and the classification result in each node are easily visualized with two-dimensional plots. Fitting a model also improves classification accuracy on data that are hard to classify with univariate splits alone. In addition, we propose a new pruning algorithm for model-fitted trees. The algorithm accounts for the complexity of the models in the terminal nodes, which existing pruning methods do not reflect, and automatically decides whether a model is fitted in each node. Bivariate logistic classification trees pruned with this algorithm can be smaller while maintaining prediction accuracy. Finally, we compare the classification accuracy and tree size of our algorithm with those of the existing method on 19 benchmark datasets.

      • Filling Gaps in the Research on IRTree Theoretically and Practically: A Comprehensive Taxonomy and a User-Friendly R Package

        Li, Zhaojun, The Ohio State University, ProQuest Dissertations & Theses, 2022, Doctoral dissertation (overseas, DDOD)

        RANK : 232221

        Item response tree (IRTree) models (Bockenholt, 2012; De Boeck & Partchev, 2012) are a special type of item response models for analyzing response processes by decomposing categorical items using tree structures. Specifically, the tree structures divide categorical items into several nodes (i.e., sub-items), and the nodes can be indicators of different latent variables (e.g., the focal trait, response styles, the tendency to omit responses). Compared to traditional item response models, IRTree models relax the assumption of ordinality within a response scale, allowing response categories to measure different latent variables. Despite the advantages, applications of IRTree models are still limited for two major reasons: (1) the lack of a comprehensive taxonomy which hampers people from realizing the full range of potential applications, and (2) the difficulty of data preparation and model formulation for users who are less familiar with the models and/or the ad hoc coding that would be needed depending on the specific application. To alleviate these limitations, this dissertation has three main objectives. The first objective is to theoretically develop an IRTree taxonomy that categorizes IRTree models from the perspective of psychometrics. The second objective is to practically facilitate the use of IRTree models for substantive researchers through the development of an R package. The third objective is to encourage researchers to consider IRTree models for applications that seem less evident, by reporting on two of such applications (a multilevel IRTree model application and an unfolding IRTree model application). A relatively comprehensive taxonomy of IRTree models is established. The taxonomy lists various classes of features of IRTree models (e.g., type of link function, number of node parameters, number of node categories, modeling framework, type of IRTree structure). 
The taxonomy allows us to choose IRTree models by combining different model features that align with specific research interests. The existing IRTree applications thus far are mapped onto the taxonomy, and possible new applications of IRTree models are discussed. Furthermore, a user-friendly R package, the second version of the irtrees package (irtrees 1.0.0; Li, Partchev, & De Boeck, 2021), is developed to facilitate data preparation and model formulation in IRTree applications. Building on the first version of the irtrees package (irtrees 0.1.0; De Boeck & Partchev, 2012), I add eight functions that can accommodate more complex data structures and can recode datasets into IRTree datasets using either a wide format or a long format. I also provide brief tutorials for the eight functions and present examples of using commonly available software programs and packages to analyze IRTree datasets generated with the functions. A detailed manual and vignettes for the irtrees 1.0.0 package can be found on the CRAN website: https://CRAN.R-project.org/package=irtrees. To further illustrate (part of) the taxonomy and (part of) the package, I demonstrate two applications of IRTree models, one about multilevel IRTree models and the other about unfolding IRTree models. Neither type of IRTree model has been used in previous studies. For each of the two applications, I present the model specification, an empirical application, and a simulation study. Good estimation performance and parameter recovery are found for both IRTree models. Tutorials on how to conduct data preparation and model estimation in the two applications, using the irtrees 1.0.0 package and commonly used software programs and packages for item response analysis, are presented in the appendices.
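        The recoding step described above, in which a categorical item is decomposed into node-level sub-items, can be sketched in a few lines. The example below is plain Python and does not use or reproduce the API of the irtrees package; the mapping for a three-category item with a linear tree, and the names `MAPPING`, `to_wide`, and `to_long`, are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping for a 3-category item with a linear (sequential) tree:
# node 0 separates category 1 from categories 2-3; node 1, reached only when
# node 0 equals 1, separates category 2 from category 3. None marks a node
# that is not reached and is therefore treated as missing.
MAPPING: Dict[int, List[Optional[int]]] = {
    1: [0, None],
    2: [1, 0],
    3: [1, 1],
}


def to_wide(responses: List[int]) -> List[List[Optional[int]]]:
    """One row per item response, one column per node (wide format)."""
    return [list(MAPPING[r]) for r in responses]


def to_long(responses: List[int]) -> List[Tuple[int, int, int]]:
    """One (item, node, value) row per reached node (long format)."""
    rows = []
    for item, r in enumerate(responses):
        for node, value in enumerate(MAPPING[r]):
            if value is not None:  # skip nodes that were never reached
                rows.append((item, node, value))
    return rows


answers = [1, 3, 2]  # hypothetical responses of one person to three items
wide = to_wide(answers)
long_rows = to_long(answers)
print(wide)
print(long_rows)
```

        The binary sub-items produced this way can then be analyzed with ordinary item response software, which is the general idea behind treating tree nodes as indicators of different latent variables.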

      • Machine Learning for Queue Prioritization: Applications to the Emergency Department

        Yilmaz, Gizem, ProQuest Dissertations & Theses, The University of, 2022, Doctoral dissertation (overseas, DDOD)

        RANK : 232202

        Queue prioritization is a common practice that allocates limited resources to heterogeneous customers to improve operational outcomes and customer satisfaction in service systems such as call centers and emergency departments. The first part of the dissertation studies a queue prioritization problem with two customer types under imperfect information. The service provider uses a binary classification model to estimate the probability of being a high-importance customer upon a customer’s arrival. If the likelihood is above a certain threshold, the customer is provided with priority service, which is faster and not too much more variable than non-priority service. The service provider wants to minimize the average waiting costs by selecting the optimal threshold. The ROC curve shows the performance of the binary classification algorithm in terms of sensitivity and specificity at various thresholds. Changing the threshold usually impacts the classification algorithm's sensitivity and specificity in opposite directions. The traditional threshold selection method tends to optimize a ROC curve-based metric and does not consider the operational externalities. This dissertation analyzes the optimal threshold policy in terms of the ROC curve, i.e., sensitivity and specificity, by considering the operational nature of the service systems. We find that the optimal policy trades a loss in specificity for a higher gain in sensitivity. The second part of the dissertation is an empirical study on queue prioritization where customer priority depends on other customers' characteristics and system attributes, focusing on its application to the Emergency Department at the University of Chicago Medicine (UCM). We model the patient prioritization problem using a discrete choice framework and implement a tree-based segmentation algorithm that generates ED system clusters where a similar patient prioritization rule is observed. 
We find that room type, waiting room census, and time of the day are the most important system-level attributes; acuity and waiting time are the most important patient-level attributes for patient prioritization. High acuity patients are prioritized for the primary service area, while low acuity patients are prioritized for the fast-track area in the ED. The First-Come-First-Served principle is generally followed within the same acuity class. As the waiting room gets crowded and resource utilization increases, the adherence to acuity-based prioritization increases. Finally, we develop a tree-based segmentation algorithm that creates patient clusters and incorporates the cluster membership in the discrete choice model to capture the patient-level nonlinear and interaction effects.
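        The trade-off between sensitivity and specificity as the priority threshold moves can be made concrete with a small sketch. The scores, labels, and thresholds below are hypothetical, and the code only evaluates the two ROC quantities at a given threshold; it does not implement the dissertation's waiting-cost model or its optimal policy.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when customers scoring above the
    threshold are given priority. Labels: 1 = high importance, 0 = otherwise.
    Assumes both classes are present in `labels`."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        prioritized = score > threshold
        if label == 1:
            if prioritized:
                tp += 1
            else:
                fn += 1
        else:
            if prioritized:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier outputs for six arriving customers.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

strict = sensitivity_specificity(scores, labels, threshold=0.7)
lenient = sensitivity_specificity(scores, labels, threshold=0.35)
print("threshold 0.70:", strict)
print("threshold 0.35:", lenient)
```

        Lowering the threshold in this toy data raises sensitivity and lowers specificity, which is the opposite-direction movement the abstract refers to.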

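      • A Study on Non-Sequential Border Extraction Using an Image Tree Structure and Its Applications (original title: 畵像트리構造를 利用한 非直列境界抽出과 應用에 關한 硏究)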

        김윤중, Chungnam National University, 1989, Doctoral dissertation (Korea)

        RANK : 232027

        Scene analysis consists of several steps: converting the real scene into image data on a discrete plane, extracting formative regions from the image data, extracting features from the formative regions, and analyzing those features. It is desirable to convert the image data into a special form in order to reduce the processing time and the required memory and to extract features such as the inclusive relation effectively. Contour extraction and the maximal square moving method (MSM) have been proposed as such effective conversion methods. Contour extraction is further classified into two categories. One is the sequential border following method (SBF), which uses an image buffer of the same size as the input image and searches for the starting pixel of a contour, then traces the contour under a set of predefined rules. The other is the non-sequential border following method (NBF), which relies on a special data structure rather than on an image buffer and finds the contour points in one pass of the raster scan input. NBF is therefore considered better than SBF in terms of processing time and memory. The MSM, originally developed by Wakayama, extracts the maximal squares included in the formative regions using a linked list structure and represents the image as chains of those squares. He proposed some image processing techniques on those chains. Analyses of simple objects usually use information about the contour or the skeleton of the objects. For complex objects, however, the inclusive relation is also an important element to be analyzed. In particular, an automatic recognition system for complex objects requires real-time contour extraction and interpretation of the inclusive relationship during the raster scan. However, it is impossible for SBF to extract the contours during the raster scan input. 
In NBF, the interpretation of the inclusive relationship and operations on its structure were not considered. In MSM, no method for interpreting the inclusive relationship was given, and an extra procedure is required to extract contours. In this dissertation, an algorithm that constructs a tree structure of an image is proposed, and some image processing operations on the tree structure are applied. The algorithm extracts the contours and interprets the inclusive relation of the contours in one pass of the raster scan input, so it belongs to the NBF category. In each node of the tree, an extracted contour is represented by the coordinates of its starting pixel and a chain of 8-directional codes. The structure of the tree represents the inclusive relation of the contours. The proposed image processing operations are as follows: contour handling, inclusive relation checks, image tree regeneration, pixel value checks, thinning, image tree input/output, translation, enlargement, rotation, and fast polygonal approximation of the contour. The proposed algorithm was implemented in Pascal on an IBM PC. Experiments with the CCITT test document of 1728 dots per line and 2128 lines show that the algorithm correctly extracts contours and interprets the inclusive relation.
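        The contour representation mentioned above, a starting pixel plus a chain of 8-directional codes, can be sketched as follows. The direction numbering is an assumption (the abstract does not state the convention used in the dissertation), and the function only decodes a chain back into pixel coordinates; it is not the proposed border-following algorithm.

```python
from typing import List, Sequence, Tuple

# Assumed convention: code 0 points east and codes increase counter-clockwise,
# with y growing downward as in raster images (so "north" is dy = -1).
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def decode_chain(start: Tuple[int, int], codes: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn a starting pixel and a chain of 8-directional codes into the
    list of pixel coordinates visited along the contour."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = OFFSETS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# Hypothetical contour: the border of a 3x3 pixel square, traced from (0, 0).
square = decode_chain((0, 0), [0, 0, 6, 6, 4, 4, 2, 2])
print(square)
```

        Storing only the start pixel and one small code per step is what makes this representation compact compared with keeping a full image buffer.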

      • Joint Latent Class Tree : An Application to Adolescents Risk Behaviors

        A Yeon Yun, Korea University Graduate School, 2023, Master's thesis (Korea)

        RANK : 231996

        Joint latent class analysis (JLCA) is a statistical method for handling multiple latent attributes. JLCA defines latent class membership for each latent class variable from individuals' categorical response patterns and also examines the joint structure of the latent class memberships. However, because JLCA considers all latent class variables simultaneously to identify joint class patterns, it can be difficult to obtain a meaningful solution. As an alternative, we propose a hierarchical clustering method called the joint latent class tree (JLCT), in which each node of the tree represents a joint latent class. A JLCT is constructed as a tree by performing a series of JLCAs. At each joint latent class, JLCT uses the BIC to select the optimal set of latent class variables and then determines whether the joint latent class is to be split. This sequential variable selection lets us include only the necessary variables in the model. Moreover, the hierarchical structure of JLCT gives a more straightforward interpretation of the joint class patterns. We apply the JLCT method to explore the joint pattern of adolescent risk behaviors using the Youth Risk Behavior Surveillance System 2015 (YRBSS 2015) data. In particular, we consider four risk behaviors (violent behavior, drug use, sexual behavior, and depression) and examine which behaviors are informative for identifying joint patterns.
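        A BIC-based split decision of the kind described above can be sketched with the standard definition BIC = -2 ln L + k ln n. The log-likelihoods, parameter counts, and the simple "split if the BIC decreases" rule below are assumptions for illustration; the thesis may use a different comparison, and the sketch does not fit any latent class model.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower values indicate a better
    balance of fit and complexity."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def should_split(ll_unsplit: float, k_unsplit: int,
                 ll_split: float, k_split: int, n_obs: int) -> bool:
    """Assumed rule: split a node only if the split model has a lower BIC."""
    return bic(ll_split, k_split, n_obs) < bic(ll_unsplit, k_unsplit, n_obs)


# Hypothetical fits for one node containing 500 respondents.
clear_gain = should_split(-1200.0, 10, -1150.0, 21, n_obs=500)
small_gain = should_split(-1200.0, 10, -1190.0, 21, n_obs=500)
print(clear_gain, small_gain)
```

        In the first case the improvement in log-likelihood outweighs the penalty for the extra parameters; in the second it does not, so the node would be left as a leaf under this assumed rule.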

      • Automatic Tree Structuring of Neurons and a Neuron Segmentation Method (original title: 신경세포 자동 트리 구조화 및 신경세포 분할 방법)

        송예슬, Soongsil University Graduate School, 2010, Master's thesis (Korea)

        RANK : 231980

        Advances in biomedical imaging now make it possible to record, as digital images, the diverse biological phenomena that occur in biomolecules and cells. Researchers use such images to identify the structure and shape of cells or to measure their size, and thereby study cell growth and division. Unlike typical round cells, neurons have a complex shape, which makes existing cell segmentation and measurement methods difficult to apply. An objective measure based on the morphological and structural characteristics of neurons is therefore needed for measuring or segmenting them, along with an automated program for processing large volumes of data efficiently. Noting that neurons extend and branch like tree limbs, this thesis studies a method for converting neuron information into a tree structure. It also studies a method that uses the neuron tree structure to separate individual neurons when several neurons in an image are connected by neurites or overlap. The proposed method takes a neuron image as input, builds the neuron tree structure automatically, and uses the resulting tree to measure neuron size and determine neuron structure.

      • Dynamic Path Method Using Tree Structure

        간처지, Chungnam National University Graduate School, 2011, Master's thesis (Korea)

        RANK : 231980

        Much prior work on collision avoidance has been introduced, but most of it aims to find the shortest path or the safest collision-free path from a start point to a goal point. Our approach instead focuses on optimizing how a vehicle drives along a given fixed path, choosing speeds based on information about time and gasoline consumption. All collision positions are first estimated from time calculations. A cost function is then derived from two partial costs for each speed: total time and gasoline consumption. The total time is the sum of two terms: the time between the initial car state and the collision area, and the time spent waiting until the collision area becomes safe. The gasoline consumption is the amount of fuel used up to the collision area. A tree structure is created according to the number of collisions, and the optimum is the root-to-leaf path with the minimum sum of path values. The result is a sequence of actions that leads from the initial state to a goal state. An overview of the algorithm and some experimental results are presented.

        Recent advances in algorithms that minimize the cost of moving an object, namely travel time, fuel consumption, travel distance, and speed, have led to research on predicting the minimum cost of movement in the real world. An object traveling to a destination may collide with another moving object along the way, and stopping or slowing down to avoid the collision incurs inefficient costs in travel time and fuel consumption. To minimize the cost of movement (time plus fuel consumption), we propose an algorithm that finds the best-performing travel speed and fuel consumption. For each segment up to an area at risk of collision with another moving object, the method combines every feasible travel speed, the fuel consumption, and the waiting time needed to avoid the collision through a weighted computation, and arranges the results in a tree. From this tree it infers the minimum cost of each segment and the travel speed and fuel consumption that are optimal for the accumulated cost, with the aim of guaranteeing an efficient cost for the moving object.
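        The final step described above, picking the root-to-leaf path with the smallest accumulated cost, can be sketched as a short recursive search. The `Node` class, the speed labels, and the cost values are hypothetical; in the thesis each cost would come from the weighted time and fuel terms, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                      # action taken on this segment, e.g. a speed
    cost: float                     # assumed combined time + fuel cost of the segment
    children: List["Node"] = field(default_factory=list)


def cheapest_path(node: Node) -> Tuple[float, List[str]]:
    """Return the minimum accumulated cost from this node to any leaf,
    together with the sequence of labels along that path."""
    if not node.children:
        return node.cost, [node.label]
    best_cost, best_path = min(
        (cheapest_path(child) for child in node.children),
        key=lambda result: result[0],
    )
    return node.cost + best_cost, [node.label] + best_path


# Hypothetical tree: two collision areas, two candidate speeds per segment.
root = Node("start", 0, [
    Node("30 km/h", 5, [Node("30 km/h", 4), Node("60 km/h", 7)]),
    Node("60 km/h", 3, [Node("30 km/h", 8), Node("60 km/h", 5)]),
])

total_cost, actions = cheapest_path(root)
print(total_cost, actions)
```

        The returned list of labels corresponds to the "sequence of actions" in the abstract: one speed choice per segment between successive collision areas.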

      • Predicting disease predisposition patterns of the personal genome based on disease hierarchy

        나영지, Seoul National University Graduate School, 2013, Doctoral dissertation (Korea)

        RANK : 231980

        The advent of next-generation sequencing (NGS) technologies has had a huge impact upon functional genomics. The NGS technologies generate millions of short sequence reads per run, making it possible to sequence entire human genomes in a matter of weeks. These NGS technologies have already been employed to sequence the constitutional genomes of several individuals. Ambitious efforts like the 1000 Genomes Project and the Personal Genomes Project hope to add thousands more. The first five cancer genomes revealed thousands of novel somatic mutations and implicated new genes in tumor development and progression. Current knowledge of the genetic variants that underlie disease susceptibility, treatment response and other phenotypes will continually improve as these studies expand the catalog of DNA sequence variation in humans. As the cost of sequencing continues to freefall, the challenge of solving the data analysis and storage problems becomes more pressing. But those issues are nothing compared to the challenge facing the clinical community who are seeking to mine the genome for clinically actionable information. However, present analytical methods are insufficient to make genetic data accessible in a clinical context, and the clinical usefulness of these data for individual patients has not been formally assessed. Here, I focus on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. In this dissertation, I present a computational method for associating variants in the personal genome sequencing data with predispositions to disease. The method works by ranking all variants in the personal genome as potential disease risks, and reporting MeSH terms that are significantly associated with highly ranked genes. 
To identify genetic variants associated with diseases, I obtained high-throughput sequencing data for several cancer types (acute myeloid leukemia, bladder cancer, breast cancer, colon cancer, glioblastoma multiforme, kidney cancer, lung adenocarcinoma, lung squamous cell carcinoma, malignant melanoma, ovarian serous cystadenocarcinoma and prostate cancer) and non-cancer conditions (Crohn’s disease, focal segmental glomerulosclerosis, and retinitis pigmentosa). From the disease-gene associations in OMIM, I reconstructed the relations between diseases and genes within the MeSH tree structures in order to take the hierarchical structure of the human disease ontology into account. The results showed that, among healthy people, the distribution of mutual information in the MeSH disease category differs by population. This suggests that population information should be considered when interpreting a personal genome. In addition, MeSH disease terms are ranked higher in patients than in healthy people. Disease-enrichment analysis showed that the Cancer, Neurological, Endocrine, and Immunological categories were over-represented in patients as well as in healthy people, which makes it possible to speculate about a systemic response pattern to disease: a neuro-endocrine-immune circuitry. In conclusion, although this study could not provide an accurate disease risk assessment, it offers a data analysis scheme for personal genome sequencing data. The scheme can be extended to other genome-based knowledge, such as drug-gene and environmental factor-gene associations.
