What is a good big data tutorial
Apache Hadoop: distributed storage architecture for large amounts of data
Hadoop Distributed File System (HDFS)
HDFS is a highly available file system for storing large amounts of data across a computer cluster; it is the component responsible for data management within the framework. Files are split into data blocks, which are distributed redundantly across different nodes without any classification scheme. According to the developers, HDFS can manage several hundred million files. Both the size of the data blocks and the degree of redundancy (the replication factor) can be configured individually.
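To make the effect of the two settings concrete, the following minimal sketch computes how many blocks a file is split into and how much raw cluster storage its replicas occupy. The 128 MiB block size is an assumption (a common default in Hadoop 2 and later, not a value stated above), and the function names are hypothetical helpers, not part of any Hadoop API.

```python
MIB = 1024 * 1024

# Assumed illustrative defaults; both values are configurable in a real cluster.
BLOCK_SIZE = 128 * MIB   # assumption: common default block size in Hadoop 2+
REPLICATION = 3          # each block is stored three times by default


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed for a file (the last block may be partly filled)."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    # Integer ceiling division; an empty file needs no blocks.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Bytes occupied across the cluster once every block is replicated."""
    return file_size * replication


size = 300 * MIB
print(block_count(size))          # blocks for a 300 MiB file
print(raw_storage(size) // MIB)   # raw storage in MiB
```

With these assumed values, a 300 MiB file is split into three blocks (two full ones and a 44 MiB remainder) and takes up 900 MiB of raw storage across the cluster.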
The Hadoop cluster works on the master-slave principle: the architecture of the framework consists of one master node to which a large number of nodes are subordinate as slaves. This principle is also reflected in the structure of HDFS, which is based on a NameNode and various subordinate DataNodes. The NameNode manages all file system metadata, including directory structures and files. The actual data is stored on the subordinate DataNodes. To minimize data loss, files are broken down into individual blocks, and each block is stored several times on different nodes. In the default configuration, each data block is stored in triplicate.
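The division of labor between NameNode and DataNodes can be sketched as a small toy model. The example below only illustrates the idea that the master keeps a block-to-node mapping while each block lives on several distinct nodes; the round-robin placement is a simplification chosen for readability and is not the placement policy HDFS actually uses. The function name and the node names are hypothetical.

```python
def place_blocks(num_blocks, data_nodes, replication=3):
    """Return the metadata a toy NameNode would keep: block id -> list of DataNodes.

    Simplified round-robin placement (an assumption for illustration only):
    replica r of block b goes to node (b + r) modulo the number of nodes.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
for block_id, holders in place_blocks(4, nodes).items():
    print(block_id, holders)
```

Because every block ends up on three different nodes, the loss of any single DataNode in this model still leaves two copies of each block it held.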
Each DataNode regularly sends the NameNode a sign of life, the so-called heartbeat. If this signal fails to arrive, the NameNode declares the respective slave "dead" and uses the copies on other nodes to ensure that enough replicas of the affected data blocks remain available in the cluster despite the failure. The NameNode thus plays a central role within the framework. So that it does not become a "single point of failure", it is common practice to place a SecondaryNameNode alongside this master node; it records all changes to the metadata and thus makes it possible to restore the central control instance. The SecondaryNameNode is not a hot standby, however: it does not take over automatically if the NameNode fails.
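The heartbeat mechanism and the subsequent re-replication can be sketched as follows. This is a toy model under stated assumptions: the class name, the 30-second timeout, and the choice of replacement nodes (alphabetical order) are all invented for illustration and do not reflect the real HDFS implementation or its timing defaults.

```python
class ToyNameNode:
    """Tracks heartbeats and block locations; restores replicas after a failure."""

    def __init__(self, timeout, replication=3):
        self.timeout = timeout        # seconds without a heartbeat before a node counts as dead
        self.replication = replication
        self.last_heartbeat = {}      # DataNode name -> time of last heartbeat
        self.block_locations = {}     # block id -> set of DataNodes holding a copy

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def add_block(self, block_id, nodes):
        self.block_locations[block_id] = set(nodes)

    def live_nodes(self, now):
        return {n for n, t in self.last_heartbeat.items() if now - t <= self.timeout}

    def check(self, now):
        """Drop replicas on dead nodes and copy blocks to live nodes until the target is met."""
        live = self.live_nodes(now)
        for block_id in list(self.block_locations):
            surviving = self.block_locations[block_id] & live
            if surviving:
                # At least one copy is left, so it can serve as the source for new replicas.
                for candidate in sorted(live - surviving):
                    if len(surviving) >= self.replication:
                        break
                    surviving.add(candidate)
            self.block_locations[block_id] = surviving


nn = ToyNameNode(timeout=30)
for dn in ("dn1", "dn2", "dn3", "dn4"):
    nn.heartbeat(dn, now=0)
nn.add_block("b1", ["dn1", "dn2", "dn3"])

for dn in ("dn2", "dn3", "dn4"):      # dn1 stops sending heartbeats
    nn.heartbeat(dn, now=20)

nn.check(now=40)
print(sorted(nn.block_locations["b1"]))
```

In this run, dn1 is considered dead at time 40 because its last heartbeat is older than the timeout, and block b1 is copied to dn4 so that three replicas exist again.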
In the transition from Hadoop 1 to Hadoop 2, HDFS was extended with additional safeguards: NameNode HA (High Availability) adds automatic failover, which activates a replacement component (a standby NameNode) if the active NameNode fails. A snapshot function also allows the system to be restored to an earlier state. In addition, the Federation extension allows multiple NameNodes to be operated within a single cluster.