
About Hadoop Distributed File System

If there is one area that has seen tremendous development in recent years, it is database management. Databases have evolved considerably, giving both companies and individuals better ways to manage their data. For instance, many businesses now prefer the more evolved columnar databases over conventional row-based databases, primarily because of the increased flexibility that the columnar layout offers.

HDFS


Turning to the application side of this growth, Hadoop is one of the platforms that has evolved dramatically over time, giving businesses and individuals better options for large-scale file storage. Specifically, the Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications, and it is known for providing high-throughput data access across Hadoop clusters. Like the rest of the Hadoop ecosystem, HDFS has become a fundamental tool for managing big data and for supporting massive data analytics applications.

Advantages and disadvantages of HDFS

The Hadoop Distributed File System is typically deployed on cheap commodity hardware, so server failures are a common occurrence. The file system was, however, designed from the start to be highly fault tolerant, which is made possible by two things: first, data transfer between compute nodes is fast, and second, Hadoop systems are built to keep running when a compute node fails.
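Much of this fault tolerance comes from block replication, which can be tuned per cluster or per file. The snippet below is a minimal sketch of how a Java client might adjust the replication factor through Hadoop's FileSystem API; the cluster URI and file path are made-up placeholders, not values from this article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class SetReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication factor for files created by this client
        // (the cluster-wide default normally lives in hdfs-site.xml).
        conf.setInt("dfs.replication", 3);

        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Raise the replication factor of an existing file so it can
        // survive the loss of more DataNodes.
        Path path = new Path("/data/events/part-00000");
        fs.setReplication(path, (short) 4);

        fs.close();
    }
}
```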

How does HDFS work?

The aim is to enable parallel processing. When HDFS ingests data, it splits the information into blocks and distributes them across the different nodes in the cluster. It also stores several copies of each block, spreading the replicas across individual nodes, with at least one copy usually placed on a different server rack from the rest. As a result, if a node holding some of the data fails, the same data can still be read from elsewhere in the cluster, so processing does not stop while the failure is being resolved.
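To make this block placement concrete, the following sketch uses Hadoop's FileSystem API to list which hosts hold the blocks of a given file; the file path and cluster URI are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.util.Arrays;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; point this at your own cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path path = new Path("/data/logs/access.log");
        FileStatus status = fs.getFileStatus(path);

        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    Arrays.toString(block.getHosts()));
        }

        fs.close();
    }
}
```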

Why was HDFS designed?

HDFS was written specifically to support applications that work with very large data sets, including hulking individual files that can reach terabytes in size. The file system uses a master/worker architecture in which each cluster has a single NameNode that manages the file system's namespace and operations, supported by DataNodes that manage the storage of data on the individual compute nodes.
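In practice this split between the NameNode and the DataNodes is invisible to client code: a program simply opens a path, the NameNode resolves which DataNodes hold the blocks, and the bytes are streamed from those nodes. A minimal read sketch, assuming a hypothetical file path and cluster address:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ReadFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Opening the path contacts the NameNode for block metadata;
        // the actual bytes are then read from the DataNodes holding them.
        Path path = new Path("/data/logs/access.log");
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        fs.close();
    }
}
```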
