In recent years, Hadoop has become a basic infrastructure for Big data processing. Because of Hadoop's design limitations, however, it is difficult to classify data efficiently for real-time processing and to identify the information the data contains.
Files stored in HDFS cannot be modified, because HDFS follows a write-once, read-many (WORM) model. In addition, the very large block size is inefficient for processing small files. Finally, MapReduce, which performs parallel analysis on a cluster, is not suitable for real-time processing because its analysis is built around batch processing.
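To make the small-file problem concrete, the following Python sketch estimates how many HDFS blocks, map tasks, and name node namespace objects a set of files produces. It rests on illustrative assumptions, not on measurements from this paper: a 128 MB block size, one map task per block (default FileInputFormat behavior when the split size equals the block size, ignoring the split-slop detail), and the commonly cited rule of thumb of roughly 150 bytes of name node heap per file or block object. The names `blocks_for` and `estimate` are hypothetical.

```python
import math
from typing import Dict, Iterable

BLOCK_SIZE_MB = 128               # assumed HDFS block size
BYTES_PER_NAMESPACE_OBJECT = 150  # rule-of-thumb estimate, not a measured value


def blocks_for(size_mb: float, block_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of HDFS blocks occupied by a non-empty file."""
    return max(1, math.ceil(size_mb / block_mb))


def estimate(file_sizes_mb: Iterable[float], block_mb: int = BLOCK_SIZE_MB) -> Dict[str, int]:
    """Rough block, mapper, and name node memory estimate for a set of files."""
    sizes = list(file_sizes_mb)
    blocks = sum(blocks_for(s, block_mb) for s in sizes)
    objects = len(sizes) + blocks  # one namespace object per file plus one per block
    return {
        "files": len(sizes),
        "blocks": blocks,
        "mappers": blocks,  # assumes one input split (one map task) per block
        "namenode_bytes": objects * BYTES_PER_NAMESPACE_OBJECT,
    }


small = estimate([1.0] * 10_000)  # 10,000 files of 1 MB each
packed = estimate([10_000.0])     # the same 10,000 MB stored as one file
print(small["mappers"], small["namenode_bytes"])    # 10000 3000000
print(packed["mappers"], packed["namenode_bytes"])  # 79 12000
```

Under these assumptions, 10,000 one-megabyte files need 10,000 map tasks, while the same data in a single file needs 79. This only illustrates why fewer, larger inputs lower the mapper count and name node load; it does not reproduce the tag-based method proposed in this paper.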
In this paper, we propose a method for real-time processing and dynamic classification of Big data using tags. Tagged data can be used directly for real-time processing, and the tags can also assist MapReduce batch processing.
The proposed method lowered the memory usage of the Hadoop name node and, when MapReduce jobs were run, effectively reduced the number of mappers generated. We also confirmed that the tags are useful for real-time processing and dynamic classification.
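The tagging scheme itself is not detailed here, so the following Python sketch only illustrates the general idea of tag-based classification under stated assumptions. Each record receives a set of string tags from rule predicates at ingest time. Subscribers to a tag are notified immediately, which serves as the real-time path, and the tag index can later select input for a batch job. New rules can be added at runtime, which serves as dynamic classification. The class `TagIndex`, its methods, and the tag names are all hypothetical and are not taken from the proposed method.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical rule type: a predicate deciding whether a payload gets a tag.
Rule = Callable[[str], bool]


@dataclass
class Record:
    payload: str
    tags: Set[str] = field(default_factory=set)


class TagIndex:
    """Hypothetical tag index: assigns tags at ingest time and groups records by tag."""

    def __init__(self, rules: Dict[str, Rule]) -> None:
        self.rules: Dict[str, Rule] = dict(rules)
        self.index: Dict[str, List[Record]] = defaultdict(list)
        self.subscribers: Dict[str, List[Callable[[Record], None]]] = defaultdict(list)

    def add_rule(self, tag: str, rule: Rule) -> None:
        # Dynamic classification: a new rule applies to records ingested afterwards.
        self.rules[tag] = rule

    def subscribe(self, tag: str, callback: Callable[[Record], None]) -> None:
        self.subscribers[tag].append(callback)

    def ingest(self, payload: str) -> Record:
        tags = {tag for tag, rule in self.rules.items() if rule(payload)}
        record = Record(payload, tags)
        for tag in tags:
            self.index[tag].append(record)
            for callback in self.subscribers[tag]:
                callback(record)  # real-time path: no batch job involved
        return record

    def select(self, tag: str) -> List[str]:
        # Batch path: only records carrying the tag become job input.
        return [r.payload for r in self.index[tag]]


idx = TagIndex({"error": lambda p: "ERROR" in p})
alerts: List[str] = []
idx.subscribe("error", lambda r: alerts.append(r.payload))
idx.ingest("INFO login ok")
idx.add_rule("payment", lambda p: "payment" in p.lower())
idx.ingest("ERROR payment timeout")
idx.ingest("INFO payment settled")
print(alerts)                 # ['ERROR payment timeout']
print(idx.select("payment"))  # ['ERROR payment timeout', 'INFO payment settled']
```

Keeping the index in memory keeps the sketch short. A real system would persist tags alongside the data in HDFS or an external store.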