In this paper, we present the features of the HDFS, which stands for Hadoop Distributed File System. HDFS is a distributed user-level file system to manage storage resources across a cluster, and is designed for high availability and portability acros...
In this paper, we present the features of the HDFS, which stands for Hadoop Distributed File System. HDFS is a distributed user-level file system to manage storage resources across a cluster, and is designed for high availability and portability across heterogeneous hardware and software platforms [1]. HDFS consists of a single Namenode and multiple Datanodes. This paper points a bottleneck of HDFS in the Namenode while introducing the architecture of HDFS. This problem is from the Master-Worker structure of HDFS nodes and the Namenode is suffered from heavy loads from many HDFS clients and Datanodes. As a solution of reducing loads in the Namenode, we propose a new approach to managing metadata in HDFS: it is to provide HDFS clients a cache to save the mapping information of requested files and their blocks.