
Disks, Controllers, OS and Advanced Format

It would seem that everything about Advanced Format disks has been learned over the past four years. There are indeed plenty of publications, but it is time to review all the technical details and pitfalls in one big article. The focus here is on using AF drives in servers: I have noticed that for most administrators, even in large companies, knowledge of the subject usually comes down to “this is somehow related to modern drives, but everything works for me”.



What is Advanced Format



Advanced Format is a newer sector format used in some hard drives: instead of the traditional 512-byte sector, a 4096-byte sector is used. Some SCSI/SAS/FC disks use 520-byte or 528-byte sectors for additional data-integrity protection, but that is not the topic of this article.



The eightfold increase in sector size is driven by the need to store data more efficiently on modern disks: the overhead of the 512-byte layout had started to hold back further growth in HDD capacity. Besides other service fields, each 512-byte sector carries a 50-byte error correction code (ECC) field; in a 4096-byte sector the ECC field is 100 bytes. Overall storage efficiency improved by about 10%.
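To put the ECC figures in perspective, here is a small Python sketch that counts only the ECC field for the same 4 KiB of user data. It deliberately ignores gaps, sync marks and the other service fields, so it accounts for only part of the roughly 10% gain; the constant and function names are illustrative.

```python
# ECC-only comparison for 4 KiB of user data, using the per-sector ECC
# sizes quoted above. Gaps, sync and address marks are NOT modelled.
DATA_BYTES = 4096  # one Advanced Format sector worth of user data


def ecc_bytes(sector_size: int, ecc_per_sector: int) -> int:
    """ECC bytes needed to store DATA_BYTES of user data."""
    return (DATA_BYTES // sector_size) * ecc_per_sector


def ecc_efficiency(sector_size: int, ecc_per_sector: int) -> float:
    """Share of (data + ECC) bytes that is user data."""
    return DATA_BYTES / (DATA_BYTES + ecc_bytes(sector_size, ecc_per_sector))


legacy = ecc_bytes(512, 50)      # 8 sectors x 50 bytes
advanced = ecc_bytes(4096, 100)  # 1 sector x 100 bytes
print(f"ECC per 4 KiB: {legacy} B (512-byte sectors) vs {advanced} B (4 KiB sector)")
print(f"ECC-only efficiency: {ecc_efficiency(512, 50):.1%} vs {ecc_efficiency(4096, 100):.1%}")
```

With these inputs the 512-byte layout spends 400 bytes of ECC per 4 KiB of data versus 100 bytes for a single 4 KiB sector.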






Naturally, non-standard sectors must be supported by disk controllers and operating systems. To solve compatibility problems, an additional standard, 512E, was introduced: it designates disks with a 4096-byte physical sector that emulate the traditional 512-byte sector. Advanced Format disks without emulation are designated 4KN. So there are now three layout options:

Format | Logical sector size | Physical sector size
512N | 512 bytes | 512 bytes
512E | 512 bytes | 4096 bytes (4KiB)
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB)
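The three layouts differ only in the pair of sector sizes a disk reports. A minimal Python sketch of that mapping (the function name is hypothetical):

```python
def classify_format(logical: int, physical: int) -> str:
    """Map (logical, physical) sector sizes in bytes to 512N/512E/4KN."""
    if logical == 512 and physical == 512:
        return "512N"
    if logical == 512 and physical == 4096:
        return "512E"
    if logical == 4096 and physical == 4096:
        return "4KN"
    return "other"  # e.g. 520/528-byte sectors or unusual combinations


for sizes in [(512, 512), (512, 4096), (4096, 4096)]:
    print(sizes, "->", classify_format(*sizes))
```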

Compatibility



Operating Systems



At first glance, emulating the 512-byte sector seems to remove all compatibility problems, but it does not. First, a performance problem arises immediately. What happens when a 512-byte block is written to a disk with 4096-byte sectors (even if it emulates 512-byte sectors)? The classic read-modify-write cycle kicks in, so one operation turns into two: the drive reads the 4096-byte sector, changes the 512 bytes being written, and writes all 4096 bytes back. A similar problem appears when writes are not aligned: the block being written may be large, even a multiple of 4096 bytes, but if it is shifted relative to the physical sector boundaries, the first and last sectors it touches still require read-modify-write.
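The effect of size and alignment can be sketched in a few lines of Python. The helper below (a hypothetical name, assuming 4096-byte physical sectors) returns the physical sectors that a write covers only partially; each of them needs a read-modify-write, while fully covered sectors are simply overwritten.

```python
from __future__ import annotations


def rmw_sectors(offset: int, length: int, physical: int = 4096) -> list[int]:
    """Indexes of physical sectors only partially covered by a write.

    offset and length are in bytes. Partially covered sectors force the
    drive to read the whole sector, patch it and write it back.
    """
    if length <= 0:
        return []
    end = offset + length              # exclusive end of the write
    first = offset // physical         # first physical sector touched
    last = (end - 1) // physical       # last physical sector touched
    partial = []
    if offset % physical != 0:         # write starts mid-sector
        partial.append(first)
    if end % physical != 0 and last not in partial:  # write ends mid-sector
        partial.append(last)
    return partial


print(rmw_sectors(0, 512))            # [0]: small write, one RMW
print(rmw_sectors(0, 4096))           # []: aligned full sector
print(rmw_sectors(63 * 512, 8192))    # [7, 9]: 8 KiB write, misaligned
```

The last case shows that even a write that is a multiple of 4096 bytes touches three physical sectors, two of them partially, when it starts at a 63-sector offset.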







Writes smaller than 4096 bytes are extremely rare nowadays, but the alignment problem remains. For example, older Windows versions (before Windows Vista/Windows Server 2008) create the boot partition at an offset of 63 sectors during installation. This is a legacy of the time when the BIOS used real disk geometry instead of LBA. An offset of 63 × 512 bytes is, of course, not a multiple of 4096, which breaks alignment for all subsequent partitions and reduces performance. The problem was first noticed with RAID controllers, where partitions need to be aligned on stripe boundaries. It was solved in Windows Vista/Windows Server 2008 (and at about the same time in other operating systems) by aligning partitions on 1024KiB (1MiB) boundaries, i.e. the first partition is created at an offset of 2048 512-byte sectors.
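A quick Python check of the two historical start offsets against a 4096-byte physical sector (the helper name is illustrative):

```python
LOGICAL = 512     # bytes per logical sector
PHYSICAL = 4096   # bytes per physical sector on an AF disk


def is_aligned(start_lba: int, logical: int = LOGICAL, physical: int = PHYSICAL) -> bool:
    """True if a partition starting at start_lba begins on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


for lba in (63, 2048):
    offset = lba * LOGICAL
    print(f"start LBA {lba}: offset {offset} bytes, "
          f"remainder {offset % PHYSICAL}, aligned={is_aligned(lba)}")
```

The 63-sector start (32256 bytes) leaves a remainder of 3584 bytes, while the 2048-sector start (1048576 bytes) divides evenly.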



Why exactly 1MiB, when a smaller offset would do (all that matters is that it is a multiple of 4096 bytes)? Simply to leave a margin: besides a physical disk, the block device may be a volume on a RAID controller (with a default stripe size of, for example, 256KiB on Adaptec), an SSD (with a large page size) or a disk image; the recommended NTFS cluster size for SQL Server or Exchange is 64KiB; and so on.
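One way to see the margin: the smallest offset that satisfies every granularity is their least common multiple, and 1MiB is a multiple of it. A Python sketch using only the sizes named above (4KiB sector, 64KiB NTFS cluster, 256KiB Adaptec default stripe):

```python
from functools import reduce
from math import gcd

KIB = 1024
MIB = 1024 * KIB

# Granularities mentioned in the text: physical sector, NTFS cluster, RAID stripe.
granularities = [4 * KIB, 64 * KIB, 256 * KIB]


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


minimum = reduce(lcm, granularities)
print(f"smallest offset aligned to all: {minimum // KIB} KiB")
print("1 MiB aligned to all:", all(MIB % g == 0 for g in granularities))
```

Here the minimum is 256KiB; 1MiB covers it four times over and leaves room for larger units.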



The second problem is possible data loss in scenarios that rely on synchronous writes. If a write is smaller than 4096 bytes or is not aligned, synchronous writing effectively stops working as intended: the drive must read-modify-write a whole physical sector, and an interrupted cycle can corrupt neighboring data in that sector. What remains is to “teach” the OS not to issue writes smaller than 4096 bytes to 512E disks, and there are certain problems with that.

Microsoft



Microsoft's official compatibility data is as follows:

Format | Logical sector size | Physical sector size | Compatible OS
512E | 512 bytes | 4096 bytes (4KiB) | Windows 8, 8.1; Windows Server 2012, 2012 R2; Windows 7 with MS KB 982018; Windows 7 SP1; Windows Server 2008 R2 with MS KB 982018; Windows Server 2008 R2 SP1; Windows Vista with MS KB 2553708; Windows Server 2008 with MS KB 2553708
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | Windows 8, 8.1; Windows Server 2012, 2012 R2

Microsoft also explicitly notes that Windows Server 2003 R2, Windows XP and other operating systems based on the XP code base (for example, Windows Home Server 1.0, Windows Small Business Server 2003 and 2003 R2) can work with 512E disks, but it advises against using such disks with them: the alignment problem can still be solved, but the performance penalty and the potential data loss on power failure during read-modify-write cannot be avoided.



In Windows, you can check the alignment of existing partitions and set the offset for new ones with diskpart. Example (a partition on disk 0 at an offset of 1024KiB, i.e. 2048 512-byte sectors):



	 select disk 0
	 create partition primary align=1024
	
The easiest way to check is through WMI (example):



	 wmic partition get BlockSize, StartingOffset, Name
	 BlockSize  Name                   StartingOffset
	 512        Disk #0, Partition #0  1048576
	 512        Disk #0, Partition #1  368050176
	 512        Disk #2, Partition #0  135266304
	 512        Disk #1, Partition #0  1048576


For the first partition, StartingOffset should be 1048576 (1024KiB); for the others it should be a multiple of 1024KiB. That guarantees the offsets are multiples of 4096 bytes and of the other “good numbers” (common stripe and NTFS cluster sizes).
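If you have many servers to check, the wmic output can be verified in a script. A minimal Python sketch, assuming the column layout shown above (StartingOffset last, partition names as printed by an English-language Windows); the sample text and function name are illustrative:

```python
from __future__ import annotations

MIB = 1024 * 1024

# Sample based on the wmic output above.
WMIC_OUTPUT = """\
BlockSize  Name                   StartingOffset
512        Disk #0, Partition #0  1048576
512        Disk #0, Partition #1  368050176
512        Disk #2, Partition #0  135266304
512        Disk #1, Partition #0  1048576
"""


def check_offsets(text: str) -> dict[str, bool]:
    """Return {partition name: StartingOffset is a multiple of 1 MiB}."""
    result = {}
    for line in text.splitlines()[1:]:    # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        offset = int(fields[-1])          # StartingOffset is the last column
        name = " ".join(fields[1:-1])     # Name contains spaces
        result[name] = offset % MIB == 0
    return result


for name, ok in check_offsets(WMIC_OUTPUT).items():
    print(f"{name}: {'aligned' if ok else 'NOT aligned'}")
```

All four sample offsets (1MiB, 351MiB, 129MiB) pass; a partition created at 63 sectors (32256 bytes) would be reported as not aligned.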



Note that modern Windows already uses a 1024KiB offset by default, so you only need to check or set it manually for operating systems from the “63-sector” era. When Disk Management automatically creates a GPT layout on a 512N or 512E disk, you will see the first partition at an offset of 17KiB. This is no cause for alarm: it is the Microsoft Reserved Partition (MSR), a service partition. The first regular partition is created at an offset of 135266304 bytes (129MiB), which is evenly divisible by all of our “good numbers”.

Linux



Linux compatibility table (only common server distributions are listed):

Format | Logical sector size | Physical sector size | Compatible OS
512E | 512 bytes | 4096 bytes (4KiB) | RHEL 6.1; SLES 11 SP2; Ubuntu 13.10; Ubuntu 12.04.4
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | RHEL 6.1; SLES 11 SP2; Ubuntu 13.10; Ubuntu 12.04.4


For other distributions, go by the kernel version (> 2.6.31) and the versions of the partitioning tools: GNU Fdisk > 1.2.3 or GNU Parted > 2.1.



The physical and logical block sizes can be found in /sys/block/sdX/queue/physical_block_size and /sys/block/sdX/queue/logical_block_size, respectively.
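Reading these two sysfs files is easy to script. A Python sketch (the function name is hypothetical; the sysfs root is a parameter so the function can be tried against a copy of the directory tree) that lists block devices and flags those reporting 512/4096, i.e. 512E:

```python
from __future__ import annotations

from pathlib import Path


def block_sizes(dev: str, sysfs: Path = Path("/sys/block")) -> tuple[int, int]:
    """Return (logical, physical) block sizes in bytes for a device such as 'sda'."""
    queue = sysfs / dev / "queue"
    logical = int((queue / "logical_block_size").read_text().strip())
    physical = int((queue / "physical_block_size").read_text().strip())
    return logical, physical


if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():  # only present on Linux
        for entry in sorted(root.iterdir()):
            try:
                logical, physical = block_sizes(entry.name, root)
            except (OSError, ValueError):
                continue  # device without a readable queue directory
            note = "  <- 512E" if (logical, physical) == (512, 4096) else ""
            print(f"{entry.name}: logical={logical} physical={physical}{note}")
```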

GNU Fdisk automatically uses an offset of 1MiB when started with the -c and -u options (disable DOS compatibility mode and use sectors as units). Plain fdisk cannot handle GPT, so it is useless for disks larger than 2TiB; for those you need Parted or GPT fdisk (gdisk). On 512N/512E disks, the latter aligns partitions to the 2048-sector boundaries we need by default:



	 Disk /dev/sde: 7814037168 sectors, 3.6 TiB
	 Logical sector size: 512 bytes
	 Disk identifier (GUID): BE7D7D71-F6ED-4371-ACFE-B04819A4DDC2
	 Partition table holds up to 128 entries
	 First usable sector is 34, last usable sector is 7814037134
	 Partitions will be aligned on 2048-sector boundaries
	 Total free space is 7814037101 sectors (3.6 TiB)


Example for GNU Parted (for 512N/512E disks):



	 # create a new GPT label
	 (parted) mklabel gpt
	 # create a partition spanning all free space, starting at sector 2048
	 (parted) mkpart part1 2048s 100%
	 (parted) print
	 Model: ATA WDC WD40EFRX-68W (scsi)
	 Disk /dev/sde: 7814037168s
	 Sector size (logical/physical): 512B/4096B
	 Partition Table: gpt

	 Number  Start  End          Size         File system  Name   Flags
	 1       2048s  7814035455s  7814033408s               part1
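The printed numbers can be double-checked by hand. A short Python sketch (illustrative helper name) that treats parted's End sector as inclusive, which the Size column confirms (End − Start + 1 = Size), and checks that both the start and the sector after the end fall on 2048-sector (1MiB) boundaries:

```python
from __future__ import annotations

ALIGN = 2048  # sectors; 2048 x 512 bytes = 1 MiB


def check_partition(start: int, end: int, align: int = ALIGN) -> tuple[bool, bool]:
    """Return (start aligned, end+1 aligned); parted's End sector is inclusive."""
    return start % align == 0, (end + 1) % align == 0


# Start, End and Size copied from the parted print output above.
start, end, size = 2048, 7814035455, 7814033408
assert end - start + 1 == size
print(check_partition(start, end))  # (True, True)
```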


In LVM, everything is fine: the default offset is 1MiB and the size of PE (physical extent) is a multiple of 1MiB.



	 # check the offset
	 # pvs /dev/sde -o +pe_start
	 PV         VG     Fmt  Attr PSize PFree 1st PE
	 /dev/sde   VolRed lvm2 a--  3.64t 3.64t   1.00m

	 # check the PE size
	 # pvdisplay /dev/sde
	 --- Physical volume ---
	 PV Name               /dev/sde
	 VG Name               VolRed
	 PV Size               3.64 TiB / not usable 3.84 MiB
	 Allocatable           yes
	 PE Size               4.00 MiB
	 Total PE              953861
	 Free PE               953861
	 Allocated PE          0
	 PV UUID               9AfJr9-OOtC-PB34-dUnq-kCDK-L1fN-aTAxus
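Why these two values are enough: every extent starts at pe_start plus a whole number of PE sizes. A Python sketch using the values from the output above, assuming (as in the example) that the PV sits directly on /dev/sde, so offsets are relative to the start of the disk; the names are illustrative:

```python
MIB = 1024 * 1024

PE_START = 1 * MIB   # "1st PE" from pvs: 1.00m
PE_SIZE = 4 * MIB    # "PE Size" from pvdisplay: 4.00 MiB
TOTAL_PE = 953861    # "Total PE" from pvdisplay


def extent_offset(n: int) -> int:
    """Byte offset of physical extent n from the start of the PV."""
    return PE_START + n * PE_SIZE


# Every extent, including the last one, starts on a 1 MiB (hence 4 KiB) boundary.
print(all(extent_offset(n) % MIB == 0 for n in (0, 1, TOTAL_PE - 1)))
```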




VMware



A VMware Knowledge Base article states that neither 512E nor 4KN disks are supported. Support for 4KN disks is claimed for vSphere 6.0.



VMFS-5 brought a single 1MiB block size and a correct 1MiB offset for the first partition; previously a 64KiB offset was used, which was not always suitable. Still, none of this changes VMware's position that 512E drives are not supported. This is apparently because the VMDK format stores data with 512-byte granularity.



Other OS



Mac OS X has supported Advanced Format since Tiger. That leaves FreeBSD and the other *BSDs, Oracle Solaris and many other operating systems, but a detailed discussion of how they handle Advanced Format disks is beyond the scope of this article.



Microsoft services



Hyper-V



Although 512E drives are supported in Windows Server 2008 and 2008 R2 (see the table above for the required KBs), Hyper-V has the following problem: the VHD virtual disk format uses 512-byte structures for dynamic (“thin”) and differencing VHDs, which naturally leads to constant read-modify-write. Making matters worse, the guest OS sees the virtual disk as having 512-byte physical sectors. Use fixed VHDs and, if possible, avoid placing VHD files on 512E drives at all.

Windows Server 2012 introduced the VHDX format, which does not have these problems (a VHDX can be created as 512N, 512E or 4KN).

Exchange Server



There are some specifics related to DAG replication:



SQL Server



The situation is the same as with Exchange Server: in fault-tolerant configurations, the disks holding databases and logs on all nodes should have the same physical sector size.

Storage Spaces produce an interesting situation: the reported physical sector size is 4KiB regardless of which disks the Storage Spaces volume is built from (a volume can combine different disks, 512N and 512E, but naturally these cannot be mixed with 4KN, except when tiering with SSDs). The VHDX (virtual disk) format is created as 512E by default. You can verify this by running fsutil fsinfo ntfsinfo <drive letter>:



	 Bytes Per Sector: 512
	 Bytes Per Physical Sector: 4096 


When using VHDX on a Storage Spaces volume (or a hardware RAID volume) made of 4KN disks, it is advisable to make the VHDX itself 4KN as well:

       New-VHD -Path D:\image4kn.vhdx -Fixed -SizeBytes 500GB -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096
       


Is a 512E presentation on top of 4KiB physical sectors safe for SQL Server and other applications that rely on synchronous writes? Yes: the coarser granularity of the underlying storage does not compromise data integrity, and it does not hurt performance either, since 4096 is a multiple of 512.



Services using ESENT



This problem, now of limited relevance, affects Windows Server 2008: services that use the Extensible Storage Engine API (AD, WINS, DHCP) may crash when the physical sector size changes (for example, when migrating from a 512N disk to a 512E disk). For a detailed description and a hotfix, see the corresponding Microsoft Knowledge Base article.



Other software



Obviously, software designed to manage partitions (cloning, moving, resizing) and to automate backups must take into account the peculiarities of working with Advanced Format disks. Here is the situation:





Controllers



Universal rules for all controllers:



Adaptec by PMC





Compatibility lists for Adaptec controllers.

LSI / Avago



Controllers based on LSI chips are used by Dell, IBM, Lenovo, Fujitsu, Intel and Supermicro. The correspondence between LSI models and OEM variants can be determined from the chip.



Update 03/25/2015: The latest LSI MegaRAID RAID controllers based on the 3108 chip have a poorly documented virtual drive (VD) property called Emulation Type. In the controller BIOS its possible values are Default, Disabled and Forced. It can also be changed via MSM or StorCLI:

  storcli /cx/vx set emulationType=0|1|2


This property determines the block sizes presented to the host:

Default (0): if the volume contains 512E disks, it is presented as 512E; if all disks are 512N, the volume is presented as 512N.

Disabled (1): the volume is always presented as 512N, even if it contains 512E disks.

Forced (2): the volume is always presented as 512E, even if it contains no 512E disks.
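The three rules can be written down as a small lookup. This Python sketch only models the behavior described above for volumes built from 512N and 512E disks; it is not controller firmware logic, and the names are hypothetical:

```python
from __future__ import annotations

from enum import IntEnum


class EmulationType(IntEnum):
    DEFAULT = 0
    DISABLED = 1
    FORCED = 2


def presented_format(mode: int, member_formats: list[str]) -> str:
    """Model of how a VD is presented to the host (512N/512E members only)."""
    mode = EmulationType(mode)  # accepts 0/1/2 or an EmulationType member
    if mode == EmulationType.DISABLED:
        return "512N"
    if mode == EmulationType.FORCED:
        return "512E"
    # DEFAULT: 512E if any member disk is 512E, otherwise 512N
    return "512E" if "512E" in member_formats else "512N"


print(presented_format(EmulationType.DEFAULT, ["512N", "512E"]))  # 512E
print(presented_format(1, ["512E", "512E"]))                      # 512N
print(presented_format(2, ["512N", "512N"]))                      # 512E
```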



The Emulation Type property was also ported to SAS2 controllers (LSI 2108/2208), but without the Forced (2) value.



Software RAID in Intel Chipsets (RST / RSTe)



4KN is not supported at all; Intel RST with 512E drives requires recent drivers.



Advanced Format in enterprise-class drives. What awaits us?



This section covers the latest enterprise-class drive series. Desktop HDDs and drives positioned for NAS or video surveillance are not included.

Vendor | Series | Form factor | Interfaces | Spindle speed, rpm | 512N/512E/4KN | Notes
Seagate | Enterprise Performance 10K HDD (10k.8) | 2.5" | SAS | 10,000 | YYY | 512N capacities limited to 600/1200GB
Seagate | Enterprise Performance 15K HDD (15k.5) | 2.5" | SAS | 15,000 | YYY | 32GB of integrated SSD cache
Seagate | Enterprise Capacity 2.5 HDD (V.3) | 2.5" | SAS, SATA | 7200 | YY |
Seagate | Enterprise Capacity 3.5 HDD (V.4) | 3.5" | SAS, SATA | 7200 | YY |
Seagate | Archive HDD | 3.5" | SATA | 7200 | Y | Positioned for archival use; lower MTBF and worse BER
Seagate | Terascale HDD | 3.5" | SATA | 5900/7200 | Y | Positioned for cloud use; lower MTBF and worse BER
HGST | Ultrastar C10K1800 | 2.5" | SAS | 10,000 | YYY | 512N capacities limited to 300/600/900/1200GB
HGST | Ultrastar C15K600 | 2.5" | SAS | 15,000 | YYY |
HGST | Ultrastar C7K1000 | 2.5" | SAS | 7200 | Y |
HGST | Ultrastar He8 | 3.5" | SAS, SATA | 7200 | YY |
HGST | Ultrastar He6 | 3.5" | SAS, SATA | 7200 | Y |
HGST | Ultrastar 7K6000 | 3.5" | SAS, SATA | 7200 | YY |
HGST | MegaScale DC 4000.B | 3.5" | SATA | 5400 | Y | Positioned for cloud use; lower MTBF and worse BER
WD | Xe | 2.5"/3.5" | SAS | 10,000 | Y |
WD | Re | 3.5" | SATA | 7200 | Y |
WD | Se | 3.5" | SATA | 7200 | Y | Positioned for cloud use; lower MTBF and worse BER
WD | Ae | 3.5" | SATA | 5760 | Y? | Positioned for archival use; lower MTBF and worse BER
Toshiba | AL13SE | 2.5" | SAS | 10,000 | Y |
Toshiba | AL13SX | 2.5" | SAS | 15,000 | Y |
Toshiba | AL13SEL | 3.5" | SAS | 10,000 | Y |
Toshiba | MG03ACA/MG03SCA | 3.5" | SAS, SATA | 7200 | Y |
Toshiba | MG04ACA | 3.5" | SATA | 7200 | YY |
Toshiba | MG04SCA | 3.5" | SAS | 7200 | YY |
Toshiba | MC04ACA | 3.5" | SATA | 7200 | Y | Positioned for cloud use; lower MTBF and worse BER
The trend is clear: Advanced Format has finally made its way from the desktop segment into the enterprise one. Fast 10K/15K rpm SAS drives are still available in 512N versions, but growing density is forcing manufacturers to adopt 4KiB sectors: the 1800GB Seagate 10k.8 and HGST Ultrastar C10K1800 are available only as 512E and 4KN. All drives larger than 5TB, except the HGST Ultrastar He6, are Advanced Format only.



SSD



SSDs have their own specifics. Data is read and written in pages of 2, 4, 8 or 16KiB, depending on the SSD architecture. Before cells can be written they must be erased, and erasing is done not page by page but in blocks of several hundred pages. For example, the Samsung 840 EVO has 2MiB blocks, each made up of 256 pages of 8KiB. So any block size presented to the host, 512 or 4096 bytes, is an abstraction anyway.



Some current SAS/SATA SSDs present themselves as 512E drives, but most present as 512N for compatibility reasons. No special measures are needed here, since enterprise-class SSDs always protect their cache contents against power loss. It is enough to align partitions to the page size.
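Using the Samsung 840 EVO figures from above (8KiB pages, 256 pages per erase block), a Python sketch of the erase-block arithmetic and of a page-alignment check for two partition offsets; the helper name is illustrative:

```python
KIB = 1024
MIB = 1024 * KIB

PAGE = 8 * KIB          # page size quoted for the Samsung 840 EVO
PAGES_PER_BLOCK = 256   # pages per erase block, as quoted above

erase_block = PAGE * PAGES_PER_BLOCK
print("erase block:", erase_block // MIB, "MiB")  # 2 MiB


def page_aligned(offset: int, page: int = PAGE) -> bool:
    """True if a byte offset falls on an SSD page boundary."""
    return offset % page == 0


print("1 MiB partition offset page-aligned:", page_aligned(1 * MIB))
print("63-sector offset page-aligned:", page_aligned(63 * 512))
```

The default 1MiB offset is page-aligned, while the old 63-sector offset is not.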

Some PCIe SSDs, for example those made by Fusion-io, let you change the logical sector size during formatting with proprietary utilities, i.e. switch between 512E and 4KN modes. Some SAS SSDs allow this too: for example, the Seagate 1200 supports changing the sector size with the standard sg_format. Switching to 4KiB sectors can significantly improve performance in some scenarios.



Conclusions



  1. 512E drives are not suitable for servers running outdated operating systems that ignore the physical sector size. On desktops this matters less, since synchronous writes are rarely used there.
  2. Study your infrastructure carefully: operating systems, services in use, controllers, storage systems, and the caching modes of controllers and storage systems. If there are potential problems with performance and/or data integrity, take appropriate action.
  3. Problems with outdated operating systems can be worked around with virtualization, but partition alignment still needs attention.


Links



Source: https://habr.com/ru/post/245085/


