📜 ⬆️ ⬇️

Hadoop 3.0: a brief overview of new features

Apache Software Foundation announced the release of a new version of an open framework for developing and executing distributed programs - Hadoop 3.0. This is the first major release since the release of Hadoop 2 in 2013. In more detail about some new features of Hadoop 3.0 and about what the next versions will offer, we will tell further.


/ photo by Chris Feser CC

What's new


Redundancy Codes (Erasure Coding) for HDFS

This is a data protection method that was mainly used in object stores. Now Hadoop will not store three copies of data on different clusters. Instead, a data fragmentation method is used that is similar to RAID 5 or 6. Thanks to more efficient replication, storage efficiency is increased by 50%.
')
Vinod Kumar Vavilapalli (Vinod Kumar Vavilapalli), one of the update developers and CTO of Hortonworks, says that the reason for adding Erasure Coding was the growth of data stored by companies in Big Data clusters. The new feature will allow you to more effectively manage the storage and use the capabilities of the Hadoop cluster to the maximum.

YARN Federation

According to Vavilapalli, initially YARN scaled only to 10 thousand nodes. Therefore, developers from Microsoft have added the YARN Federation function, which allows YARN to work with 40 or even 100 thousand nodes.

New Resource Types

Hadoop 3.0 adds a new framework to YARN to work with additional types of resources, in addition to memory and CPU. The update also gives more disk control in Big Data clusters. In the future, support for GPU and FPGA will be added.

Java 8

Hadoop 2 runs on Java Developers Kit 7. In Apache Hadoop 3.0, the transition to JDK 8 was made. In addition to Hadoop 3.0, support for JDK 8 was announced in HBase 2.0, Hive 3.0 and Phoenix 3.0.

Roadmap: Hadoop 3.1 and 3.2


The developers told what functions will be added in future versions of the framework. Let's look at the main features.

Hadoop 3.1


Hadoop 3.2


The developers plan to release updates of Hadoop 3.1 and Hadoop 3.2 with a difference of three months.



PS Other materials from the first corporate IaaS blog:


Source: https://habr.com/ru/post/344596/


All Articles