Stream Data Mining: Platforms, Algorithms, Performance Evaluators and Research Trends
Bakshi Rohit Prasad, Sonali Agarwal. Security Engineering Research Support Center, 2016. International Journal of Database Theory and Appli, Vol.9 No.9
Streaming data are potentially infinite sequences of data that arrive at very high speed and may evolve over time. This poses several challenges for mining large-scale, high-speed data streams in real time, and the field has therefore attracted considerable attention from researchers in recent years. This paper discusses the challenges associated with mining such data streams. It describes several available stream data mining algorithms for classification and clustering, along with their key features and significance. It also explains the performance evaluation measures relevant to streaming data classification and clustering and discusses their comparative significance. The paper presents the streaming data computation platforms that have been developed, in chronological order, along with their major capabilities. It identifies the research directions open in high-speed, large-scale data stream mining from the points of view of algorithms, the evolving nature of the data, and performance evaluation. Finally, the Massive Online Analysis (MOA) framework is used as a use case to show the results of key streaming data classification and clustering algorithms on a sample benchmark dataset, and their performance is critically compared and analyzed using evaluation parameters specific to streaming data mining.
Comparative Study of Big Data Computing and Storage Tools: A Review
Bakshi Rohit Prasad, Sonali Agarwal. Security Engineering Research Support Center, 2016. International Journal of Database Theory and Appli, Vol.9 No.1
As a result of the tremendous rise in internet usage, including social media and forums, mail systems, scholarly and research articles, and daily online transactions from sources such as health care systems and meteorological and environmental organizations, the volume of data collected has shot up exponentially. This vast collection of data, called Big Data, has rendered traditional tools inadequate for managing it from a storage, computing, or analytical perspective. There is an immense need for architectures, platforms, tools, techniques, and algorithms to handle Big Data. The available technologies address two broad aspects of Big Data, namely Big Data Storage Management and Big Data Computing, and aim to overcome challenges such as scalability, processing speed, multi-format data processing, availability, response time, and analytics. This paper reviews recent trends in storage and computing tools, along with their relative capabilities, their limitations, and the environments they are suited to.
An Ensemble Approach for Efficient Churn Prediction in Telecom Industry
Pretam Jayaswal, Bakshi Rohit Prasad, Divya Tomar, Sonali Agarwal. Security Engineering Research Support Center, 2016. International Journal of Database Theory and Appli, Vol.9 No.8
The rise of globalization and market liberalization is changing the face of market competitiveness significantly. The appearance of modern technology in business processes has intensified competition and created new challenges for service providers. To cope with these changing scenarios, companies are shifting their attention to retaining existing customers rather than acquiring new ones, which is more cost-effective and requires fewer resources. A customer abandoning the company is known as churn, and anticipating a customer's intention to churn is called churn prediction. Data mining and machine learning techniques, applied to customer behavior and usage information, can assist churn management processes. This paper uses customer usage and related information from a telecom service provider to analyze churn in the telecom industry. Decision trees and their ensembles, Random Forest and Gradient Boosted Trees, are used as the underlying statistical machine learning models for building the binary churn classifier. The implementation uses Apache Spark, a state-of-the-art unified data analysis framework for machine learning and data mining. To achieve better and more efficient results, grid-based hyper-parameter optimization is applied.
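The grid-based hyper-parameter optimization mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' Spark implementation: the parameter names (`max_depth`, `num_trees`), the grid values, and the scoring function `toy_validation_score` are all hypothetical stand-ins chosen for illustration. In practice the score would come from training a tree ensemble and measuring its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    # Enumerate the full Cartesian product of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Hypothetical stand-in for the validation accuracy of a tree ensemble."""
    depth_penalty = 0.01 * abs(params["max_depth"] - 6)
    trees_penalty = 0.001 * abs(params["num_trees"] - 50)
    return 1.0 - depth_penalty - trees_penalty


# Assumed grid; real grids depend on the model and the dataset.
param_grid = {"max_depth": [2, 4, 6, 8], "num_trees": [10, 50, 100]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

The search is exhaustive, so its cost grows with the product of the grid sizes (12 evaluations here), which is why grids are usually kept coarse when each evaluation means training a model.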
Comparative Study of Recent Trends on Cancer Disease Prediction using Data Mining Techniques
Satyam Shukla, Dharmendra Lal Gupta, Bakshi Rohit Prasad. Security Engineering Research Support Center, 2016. International Journal of Database Theory and Appli, Vol.9 No.9
Technological advancements have been applied across several application domains to solve various problems. One such technological area is data mining. It has shown its significance and potential in the health care industry as a guiding and decision-making component. Its potential for unveiling new trends in health care organizations has proved its importance for everyone associated with this area. It is an important and encouraging area of research whose aim is to extract information from large data sets. Advanced research in data mining has made it a key player in the health care field. Good analytical techniques are essential for detecting valuable information hidden in health industry data. This survey paper presents the importance and usefulness of different data mining techniques, such as classification, clustering, decision trees, and Naive Bayes, in the health domain. It studies and compares different data mining techniques used to predict cancer from clinical datasets, which achieve different levels of accuracy.