Designing Summary Tables for Mining Web Log Data
Ahn, Jeong-Yong Korean Data and Information Science Society 2005 한국데이터정보과학회지 Vol.16 No.1
On the Web, data are generally gathered automatically by Web servers and collected in server or access logs. However, as users access larger and larger amounts of data, the response times of queries that extract information inevitably grow. One way to resolve this issue is to use summary tables. In this short note, we design a prototype of summary tables that can efficiently extract information from Web log data. We also compare the performance of the summary tables with that of a sampling technique and of a method that uses the raw data.
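The idea of a summary table can be illustrated with a minimal sketch. The table names (`access_log`, `daily_page_summary`), the columns, and the sample rows below are hypothetical and are not taken from the paper; the sketch only shows the general technique of aggregating raw log rows once so that later queries read a smaller table.

```python
import sqlite3


def build_demo(conn):
    """Create a tiny raw access-log table and a per-day, per-page summary table.

    All names and rows here are illustrative assumptions.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE access_log (day TEXT, url TEXT, bytes INTEGER)")
    rows = [
        ("2005-01-01", "/index.html", 1200),
        ("2005-01-01", "/index.html", 1180),
        ("2005-01-01", "/about.html", 800),
        ("2005-01-02", "/index.html", 1210),
    ]
    cur.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)
    # Pre-aggregate once; later queries can read this smaller table.
    cur.execute(
        "CREATE TABLE daily_page_summary AS "
        "SELECT day, url, COUNT(*) AS hits, SUM(bytes) AS total_bytes "
        "FROM access_log GROUP BY day, url"
    )
    conn.commit()


def hits_from_raw(conn, url):
    """Count hits for a page by scanning the raw log."""
    query = "SELECT COUNT(*) FROM access_log WHERE url = ?"
    return conn.execute(query, (url,)).fetchone()[0]


def hits_from_summary(conn, url):
    """Count hits for a page by summing the pre-aggregated rows."""
    query = "SELECT COALESCE(SUM(hits), 0) FROM daily_page_summary WHERE url = ?"
    return conn.execute(query, (url,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
build_demo(conn)
```

Both functions return the same count, but the summary query touches one row per day and page rather than one row per request, which is where the saving would come from on a large log.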
Analysis of Incomplete Data with Nonignorable Missing Values
김현정,Kim, Hyun-Jeong The Korean Data and Information Science Society 2002 한국데이터정보과학회지 Vol.13 No.2
With "nonignorable missing data", a model for the missing-data mechanism must be assumed for each situation. In this article we consider, as an example, a data set of income amounts from a survey of individuals and assume a model in which the larger the value, the higher the probability that it is missing. The method maximizes the likelihood using the EM (Expectation-Maximization) algorithm, based on a missing-data mechanism for exponentially distributed data. Started from arbitrary initial values, the method converged in a few iterations. We varied the missing-data probability and the size of the artificial data to show the accuracy of the estimates. We then discuss the properties of the estimates.
A Comparison of NLSY and CPS Data
Jo, Yoon-Ae Korean Data and Information Science Society 2006 한국데이터정보과학회지 Vol.17 No.3
The family income distributions of NLSY97 and CPS youth data are compared by using the generalized beta distribution of the second kind. The null hypothesis that the two data sets represent the same underlying population is rejected. The ML estimation suggests that NLSY97 data are oversampled in an income group of $11,308 or less, by about 15.7% compared to CPS data.
Comprehensive comparison of normality tests: Empirical study using many different types of data
Lee, Chanmi,Park, Suhwi,Jeong, Jaesik The Korean Data and Information Science Society 2016 한국데이터정보과학회지 Vol.27 No.5
We compare many normality tests that draw on different sources of information extracted from the given data: the Anderson-Darling test, Kolmogorov-Smirnov test, Cramér-von Mises test, Shapiro-Wilk test, Shapiro-Francia test, Lilliefors test, Jarque-Bera test, D'Agostino's D, Doornik-Hansen test, Energy test, and Martinez-Iglewicz test. For the purpose of comparison, these tests are applied to various types of data generated from skewed distributions, asymmetric distributions, and distributions with different lengths of support. We then summarize the comparison results in terms of two criteria: type I error control and power. The best test depends on the shape of the distribution of the data, implying that no test is the most powerful for all distributions.
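As an illustration of one test from this list, the Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below is a plain-Python rendering of that textbook formula, not the authors' code; the function name `jarque_bera` and the sample data are assumptions, and the p-value is the asymptotic one, which can be inaccurate for small samples.

```python
import math


def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic p-value.

    Uses biased (divide-by-n) central moments, as in the textbook definition.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical samples: one symmetric, one heavily right-skewed.
jb_sym, p_sym = jarque_bera([-2, -1, 0, 1, 2])
jb_skew, p_skew = jarque_bera([0, 0, 0, 0, 0, 0, 0, 0, 0, 100])
```

The skewed sample yields a much larger statistic than the symmetric one, which is the kind of sensitivity to distribution shape that a power comparison of this sort examines.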
Incremental Eigenspace Model Applied To Kernel Principal Component Analysis
Kim, Byung-Joo The Korean Data and Information Science Society 2003 한국데이터정보과학회지 Vol.14 No.2
An incremental kernel principal component analysis (IKPCA) is proposed for nonlinear feature extraction from data. One problem with batch kernel principal component analysis (KPCA) is that the computation becomes prohibitive when the data set is large. Another is that, in order to update the eigenvectors with new data, all of the eigenvectors must be recomputed. IKPCA overcomes these problems by incrementally updating the eigenspace model. IKPCA requires less memory than batch KPCA and can easily be improved by re-learning the data. In our experiments, we show that IKPCA is comparable in performance to batch KPCA for a classification problem on a nonlinear data set.
Can a securities law improve investor rationality in processing earnings information?
Kwag, Seung Woog The Korean Data and Information Science Society 2014 한국데이터정보과학회지 Vol.25 No.6
In this paper, I propose the general hypothesis that, after the enactment of the Sarbanes-Oxley Act (SOA), financial statements convey more accurate and reliable corporate information to investors, who in turn reflect such improvements in stock prices. I test four practical hypotheses that simultaneously feature the degree of information asymmetry, forecast bias, and investor reaction to biased earnings information. The empirical results consistently suggest that post-SOA investors take advantage of the improvement in informational efficiency and accuracy and actively adjust for analyst forecast bias in earnings forecasts. The SOA indeed appears to achieve its primary goal of investor protection.
A note on Box-Cox transformation and application in microarray data
Rahman, Mezbahur,Lee, Nam-Yong The Korean Data and Information Science Society 2011 한국데이터정보과학회지 Vol.22 No.5
The Box-Cox transformation is a well-known family of power transformations that brings a set of data into agreement with the normality assumption on the residuals, and hence on the response variable, of a postulated model in regression analysis. Normalization (studentization) of the regressors is a common practice in analyzing microarray data. Here, we apply the Box-Cox transformation to normalize the regressors in microarray data. The predictability of the model can be improved by using this data transformation rather than studentization.
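For reference, the Box-Cox family maps a positive value y to (y^λ − 1)/λ for λ ≠ 0 and to log y for λ = 0. The sketch below implements that standard definition together with the usual profile log-likelihood for choosing λ under a normal model. The function names, the grid search, and the sample values are assumptions for illustration; the abstract does not state how the authors select λ for the regressors.

```python
import math


def box_cox(y, lam):
    """Apply the Box-Cox power transformation to strictly positive data."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def box_cox_loglik(y, lam):
    """Profile log-likelihood of lam under normality, up to an additive constant."""
    n = len(y)
    z = box_cox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log-Jacobian of the transformation.
    return -n / 2.0 * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(y, grid):
    """Pick the grid value of lam with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(y, lam))


# Hypothetical right-skewed positive measurements.
sample = [0.8, 1.1, 1.5, 2.3, 3.9, 7.4, 15.2]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
chosen = best_lambda(sample, grid)
transformed = box_cox(sample, chosen)
```

A coarse grid is used here only for readability; in practice λ is often estimated by numerical optimization of the same likelihood.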
Nonparametric estimation for interval censored competing risk data
Kim, Yang-Jin,Kwon, Do young The Korean Data and Information Science Society 2017 한국데이터정보과학회지 Vol.28 No.4
Competing risk analysis is applied when subjects may experience more than one type of end point. Geskus (2011) showed that three types of estimators of the cumulative incidence function (CIF) are equivalent under left-truncated and right-censored data. We extend his approach to interval-censored competing risk data by using a modified risk set and evaluate the estimators' performance under several sample sizes. These estimators show very similar results. We also suggest a test statistic combining Sun's test for interval-censored data and Gray's test for right-censored data. The test sizes and powers are compared under several cases. As a real data application, the suggested method is applied to a data set in which the feasibility of an HIV vaccine was assessed among injecting drug users.
Lee, So-Yoon,Huh, Myung-Hoe,Park, Mira The Korean Data and Information Science Society 2014 한국데이터정보과학회지 Vol.25 No.5
In DNA microarray studies, the number of genes far exceeds the number of samples, and the gene expression measures are highly correlated. Partial least squares regression (PLSR) is a popular method for dimension reduction and has been shown by several studies to be useful for classifying microarray data. In this study, we suggest a modified version of partial least squares regression for analyzing gene expression data with survival information. The method is designed as a new gene selection method that uses PLSR with an iterative procedure for imputing censored survival times. The mean square error of prediction criterion is used to determine the dimension of the model. To visualize the data, plots of variables superimposed with samples are used. The method is applied to two microarray data sets, both containing survival times. The results show that the proposed method works well for interpreting gene expression microarray data.
Lee, Kyeongjun,Cho, Youngseuk The Korean Data and Information Science Society 2015 한국데이터정보과학회지 Vol.26 No.6
In lifetime data analysis, it is generally known that the lifetimes of test items may not be recorded exactly. There are also situations in which the withdrawal of items prior to failure is prearranged in order to decrease the time or cost associated with the experiment. Moreover, more than one cause or risk factor may be present at the same time. Therefore, analysis of censored competing risks data is needed. In this article, we derive Bayes estimators of the entropy function of the exponential distribution with an unknown scale parameter, based on multiply Type II censored competing risks data. The Bayes estimators are provided under the squared error loss function (SELF), the precautionary loss function (PLF), and the DeGroot loss function (DLF). Lindley's approximation method is used to compute these estimators. We compare the proposed Bayes estimators in terms of mean squared error (MSE) for various multiply Type II censored competing risks data. Finally, a real data set is analyzed for illustrative purposes.
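For orientation, the standard forms of the quantities named in this abstract can be written out as follows. The notation is an assumption made here: σ is the scale parameter, H the entropy, d a candidate estimate, and E[· | x] a posterior expectation. The censoring-specific posterior and the Lindley approximation derived in the paper are not reproduced. Each estimator follows from minimizing the posterior expected loss with respect to d.

```latex
% Exponential density with scale \sigma and its differential entropy
f(x;\sigma) = \frac{1}{\sigma}\, e^{-x/\sigma}, \quad x > 0,
\qquad
H(\sigma) = 1 + \ln \sigma .

% Squared error loss (SELF): Bayes estimator is the posterior mean
L_{S}(H, d) = (d - H)^{2}
\;\Longrightarrow\;
\hat{H}_{S} = E[H \mid x] .

% Precautionary loss (PLF)
L_{P}(H, d) = \frac{(d - H)^{2}}{d}
\;\Longrightarrow\;
\hat{H}_{P} = \sqrt{E[H^{2} \mid x]} .

% DeGroot loss (DLF)
L_{D}(H, d) = \left( \frac{H - d}{d} \right)^{2}
\;\Longrightarrow\;
\hat{H}_{D} = \frac{E[H^{2} \mid x]}{E[H \mid x]} .
```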