
It might seem that everything about Advanced Format disks has been learned over the past 4 years. There are indeed many publications, but it is time to gather all the technical details and pitfalls in one large article. The focus is on using AF drives in servers: I have noticed that for most administrators, even in large companies, knowledge of the subject usually comes down to “this is somehow related to modern drives, but everything works for me”.
What is Advanced Format
Advanced Format is a new sector layout used in some hard drives. Instead of the traditional 512-byte sector, a 4096-byte sector is used. Some SCSI / SAS / FC disks may use enlarged 520-byte and 528-byte sectors for additional data integrity control, but that is outside the scope of this article.
The eightfold increase in sector size comes from the need to place data more efficiently on modern disks. The overhead associated with the 512-byte layout begins to hinder further growth in HDD capacity. In addition to service fields, each 512-byte sector carries an error correction code (ECC) field of 50 bytes. In a 4096-byte sector the ECC field is 100 bytes long. Overall storage efficiency improves by about 10%.
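The ECC part of that gain can be estimated with simple arithmetic. The sketch below is only an illustration: it uses the ECC sizes quoted above and deliberately ignores all other per-sector service fields, so it accounts for only part of the overall improvement. The function name is hypothetical.

```python
# Format efficiency when ONLY the ECC field is counted as overhead.
# ECC sizes are the figures quoted in the text; gaps, sync marks and
# other service fields are ignored (assumption of this sketch).

def ecc_efficiency(data_bytes: int, ecc_bytes: int) -> float:
    """Share of the on-disk footprint that holds user data."""
    return data_bytes / (data_bytes + ecc_bytes)

eff_512 = ecc_efficiency(512, 50)    # eight such sectors hold 4096 bytes of data
eff_4k = ecc_efficiency(4096, 100)   # one Advanced Format sector
gain = eff_4k / eff_512 - 1          # relative improvement from ECC alone

print(f"512-byte sectors:  {eff_512:.1%} of the space is user data")
print(f"4096-byte sectors: {eff_4k:.1%} of the space is user data")
print(f"relative gain from the ECC field alone: {gain:.1%}")
```

The rest of the improvement comes from the service fields that are no longer repeated eight times per 4096 bytes of data.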
Naturally, non-512-byte sectors require support from disk controllers and operating systems. To solve compatibility problems, an additional standard, 512E, was introduced; it designates disks with a physical sector size of 4096 bytes that emulate the usual 512-byte sector size. Advanced Format disks without emulation are designated 4KN. Thus, there are now three layout options:
Format | Logical sector size | Physical sector size |
512N | 512 bytes | 512 bytes |
512E | 512 bytes | 4096 bytes (4KiB) |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) |
Compatibility
Operating Systems
At first glance, emulating the 512-byte sector seems to remove all compatibility problems, but it does not. First, a performance problem arises immediately. What happens when a 512-byte block is written to a disk with 4096-byte sectors (even if it emulates 512-byte sectors)? A classic read-modify-write takes place: instead of one operation, the drive has to read the 4096-byte sector, change 512 bytes in it (the block being written), and write the 4096 bytes back. A similar problem appears when alignment is missing: the block being written may be quite large and even a multiple of 4096 bytes, but it is shifted relative to the boundaries of the real sectors.
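Both cases reduce to the same check: does the write cover every physical sector it touches completely? A minimal sketch, assuming a 4096-byte physical sector and byte offsets counted from the start of the disk (the function names are hypothetical):

```python
PHYSICAL = 4096  # physical sector size of a 512E/4KN drive, in bytes

def needs_rmw(offset: int, length: int, physical: int = PHYSICAL) -> bool:
    """True if the write covers at least one physical sector only partially."""
    return offset % physical != 0 or (offset + length) % physical != 0

def sectors_touched(offset: int, length: int, physical: int = PHYSICAL) -> int:
    """Number of physical sectors the write overlaps (length must be > 0)."""
    first = offset // physical
    last = (offset + length - 1) // physical
    return last - first + 1

# A single 512-byte block: one physical sector, partially covered.
print(needs_rmw(0, 512), sectors_touched(0, 512))
# An aligned 8KiB write: two full physical sectors, no read needed.
print(needs_rmw(0, 8192), sectors_touched(0, 8192))
# The same 8KiB write shifted by 63 logical sectors: three sectors, two of them partial.
print(needs_rmw(63 * 512, 8192), sectors_touched(63 * 512, 8192))
```

The last case shows why a misaligned partition hurts even when the file system only issues writes that are multiples of 4096 bytes.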
Nowadays, writes in blocks smaller than 4096 bytes are extremely rare, but the alignment problem remains. For example, older Windows versions (before Windows Server 2008) create the boot partition during installation at an offset of 63 sectors. This goes back to the time when the BIOS used real disk geometry instead of LBA. Of course, an offset of 63 x 512 bytes is not divisible by 4096, which breaks alignment for all subsequent partitions and reduces performance. The problem was first noticed with RAID controllers, where partitions have to be aligned on stripe boundaries, and it was solved in Windows Vista / Windows Server 2008 (and at about the same time in other operating systems) by aligning on 1024KiB (1MiB) boundaries, i.e. the first partition is created at an offset of 2048 512-byte sectors.
Why exactly 1MiB, if a smaller offset would do (the main thing is that it is divisible by 4096 bytes)? Simply because a reserve is needed: besides a physical disk, the block device may be a volume on a RAID controller (with a default stripe size of, for example, 256KiB on Adaptec), an SSD (with a large page size), or a disk image; the recommended NTFS cluster size for SQL Server or Exchange is 64KiB, and so on.
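The arithmetic behind the two offsets is easy to verify. The sketch below checks the legacy 63-sector offset and the 2048-sector offset against the sizes mentioned in the text; the dictionary of sizes and the function name are illustrative, not part of any tool.

```python
KIB = 1024

# Sizes mentioned in the text that a partition offset should be divisible by.
GOOD_SIZES = {
    "4KiB physical sector": 4 * KIB,
    "64KiB NTFS cluster": 64 * KIB,
    "256KiB RAID stripe": 256 * KIB,
    "1MiB boundary": 1024 * KIB,
}

def misaligned_for(offset_bytes: int) -> list:
    """Names of the sizes that the given byte offset is NOT divisible by."""
    return [name for name, size in GOOD_SIZES.items() if offset_bytes % size != 0]

legacy_offset = 63 * 512      # 32256 bytes, the "63-sector" era
modern_offset = 2048 * 512    # 1048576 bytes, i.e. 1MiB

print(legacy_offset, "misaligned for:", misaligned_for(legacy_offset))
print(modern_offset, "misaligned for:", misaligned_for(modern_offset))
```

The legacy offset fails every check, while the 1MiB offset passes all of them, which is the "reserve" the paragraph above describes.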
Problem number two is possible data loss in scenarios with synchronous writes. When a block smaller than 4096 bytes, or an unaligned block, is written, the synchronous-write guarantee effectively does not hold: a power failure in the middle of read-modify-write can damage data in the same physical sector that was never the target of the write. What remains is to “teach” the OS not to write blocks smaller than 4096 bytes to 512E disks, but there are certain problems with that.
Microsoft
For Microsoft operating systems, the following official (primary source) data is available:
Format | Logical sector size | Physical sector size | Compatible OS |
512E | 512 bytes | 4096 bytes (4KiB) | Windows 8, 8.1; Windows Server 2012, 2012 R2; Windows 7 with MS KB 982018; Windows 7 SP1; Windows Server 2008 R2 with MS KB 982018; Windows Server 2008 R2 SP1; Windows Vista with MS KB 2553708; Windows Server 2008 with MS KB 2553708 |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | Windows 8, 8.1; Windows Server 2012, 2012 R2 |
Note the additional reminder: Windows Server 2003 R2, Windows XP, and other operating systems based on the XP code base (for example, Windows Home Server 1.0, Windows Small Business Server 2003 and 2003 R2) can function with 512E disks, but Microsoft warns against using such disks. While the alignment problem can still be solved, the performance problems and the potential data loss if power fails during read-modify-write cannot be worked around.
You can check the alignment of existing partitions and set the offset for new partitions in Windows using diskpart. Example (a partition on disk 0 with an offset of 1024KiB, i.e. 2048 512-byte sectors):
select disk 0
create partition primary align=1024
The easiest way to check is through WMI (example):
wmic partition get BlockSize, StartingOffset, Name
BlockSize  Name                   StartingOffset
512        Disk #0, Partition #0  1048576
512        Disk #0, Partition #1  368050176
512        Disk #2, Partition #0  135266304
512        Disk #1, Partition #0  1048576
In the StartingOffset column, the first partition should show 1024KiB (1048576 bytes), and the remaining offsets should be divisible by 1024KiB. Then everything is also divisible by 4096 bytes and by all the other “good numbers” (stripe sizes and NTFS cluster sizes).
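On a server with many disks, this check is easier to script than to do by eye. The following is a rough sketch that parses text in the three-column layout shown above and reports partitions whose offset is not a multiple of 1MiB. It assumes the output has already been saved as plain text with the columns in the order BlockSize, Name, StartingOffset; the function names and the embedded sample are illustrative.

```python
MIB = 1024 * 1024

SAMPLE = """\
BlockSize  Name                   StartingOffset
512        Disk #0, Partition #0  1048576
512        Disk #0, Partition #1  368050176
512        Disk #2, Partition #0  135266304
512        Disk #1, Partition #0  1048576
"""

def parse_partitions(text: str) -> list:
    """Return (name, starting_offset) pairs from the tabular output."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header line
        parts = line.split()
        if len(parts) < 3:                  # blank or truncated line
            continue
        name = " ".join(parts[1:-1])        # the name itself contains spaces
        rows.append((name, int(parts[-1])))
    return rows

def misaligned(rows: list, boundary: int = MIB) -> list:
    """Names of partitions whose offset is not a multiple of `boundary`."""
    return [name for name, offset in rows if offset % boundary != 0]

rows = parse_partitions(SAMPLE)
for name, offset in rows:
    print(f"{name}: {offset} bytes = {offset / MIB:g} MiB")
print("misaligned:", misaligned(rows))
```

All four sample partitions pass; a partition starting at the legacy 63-sector offset (32256 bytes) would be reported.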
Recall that modern Windows already uses a 1024KiB offset by default, so you only need to check or set it manually for operating systems from the “63-sector” era. When GPT partitioning is created automatically (via Disk Management) on a 512N or 512E disk, you will see the first partition at an offset of 17KiB. This is no cause for alarm: it is the MSR service partition. The first regular partition is created at an offset of 135266304 bytes (129MiB), which is divisible by every one of our “good numbers”.
Linux
Linux compatibility table (only common server distributions are listed):
Format | Logical sector size | Physical sector size | Compatible OS |
512E | 512 bytes | 4096 bytes (4KiB) | RHEL 6.1; SLES 11 SP2; Ubuntu 13.10; Ubuntu 12.04.4 |
4KN | 4096 bytes (4KiB) | 4096 bytes (4KiB) | RHEL 6.1; SLES 11 SP2; Ubuntu 13.10; Ubuntu 12.04.4 |
For other distributions, you can go by the kernel version (> 2.6.31) and the version of the partitioning tools: GNU Fdisk > 1.2.3 or GNU Parted > 2.1.
The physical and logical block sizes can be read from /sys/block/sdX/queue/physical_block_size and /sys/block/sdX/queue/logical_block_size, respectively.
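These two values are enough to tell which of the three formats a drive presents. A minimal sketch, assuming the sysfs layout described above; the function names are hypothetical, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

def classify(logical: int, physical: int) -> str:
    """Map a (logical, physical) sector size pair to the format name."""
    if (logical, physical) == (512, 512):
        return "512N"
    if (logical, physical) == (512, 4096):
        return "512E"
    if (logical, physical) == (4096, 4096):
        return "4KN"
    return "unknown"

def read_block_sizes(device: str, sysfs_root: str = "/sys/block") -> tuple:
    """Read (logical, physical) block sizes for a device name such as 'sda'."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

For example, `classify(*read_block_sizes("sde"))` would report the format of the 4TB drive used in the listings below. Keep in mind that RAID controllers and storage systems may report values that differ from those of the underlying disks.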
GNU Fdisk automatically uses a 1MiB offset when started with the -c and -u options (disable DOS compatibility mode and use sectors as the unit). Regular Fdisk cannot work with GPT, so it is useless for disks > 2TiB, where you need Parted or GPT Fdisk. The latter defaults to the 2048-sector offset we need on 512N / 512E disks:
Disk /dev/sde: 7814037168 sectors, 3.6 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): BE7D7D71-F6ED-4371-ACFE-B04819A4DDC2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 7814037134
Partitions will be aligned on 2048-sector boundaries
Total free space is 7814037101 sectors (3.6 TiB)
Example for GNU Parted (for 512N / 512E disks):
# create a new GPT label
(parted) mklabel gpt
# create a partition over all free space with an offset of 2048 sectors
(parted) mkpart part1 2048s 100%
(parted) print
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sde: 7814037168s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number  Start  End          Size         File system  Name   Flags
 1      2048s  7814035455s  7814033408s               part1
LVM needs no special care: the default offset is 1MiB, and the PE (physical extent) size is a multiple of 1MiB.
# check offset
# pvs /dev/sde -o +pe_start
PV        VG     Fmt  Attr PSize PFree 1st PE
/dev/sde  VolRed lvm2 a--  3.64t 3.64t 1.00m
# check PE size
# pvdisplay /dev/sde
--- Physical volume ---
PV Name /dev/sde
VG Name VolRed
PV Size 3.64 TiB / not usable 3.84 MiB
Allocateable yes
PE Size 4.00 MiB
Total PE 953861
Free PE 953861
Allocated PE 0
PV UUID 9AfJr9-OOtC-PB34-dUnq-kCDK-L1fN-aTAxus
VMware
An article in the VMware Knowledge Base states that neither 512E nor 4KN disks are supported. Support for 4KN disks is claimed for vSphere 6.0.
With VMFS-5 we got a single block size of 1MiB and a correct 1MiB offset for the first partition. Previously a 64KiB offset was used, which was not always suitable. None of this changes VMware's statement that 512E drives are not supported. Apparently, this is because the VMDK format stores data with a granularity of 512 bytes.
Other OS
Mac OS X has supported Advanced Format since Tiger. That leaves FreeBSD and the other *BSD systems, Oracle Solaris, and many other operating systems, but a detailed discussion of Advanced Format disks in them is beyond the scope of this article.
Microsoft services
Hyper-V
Although 512E drives are supported in Windows Server 2008 and 2008 R2 (see the table above for the required KB updates), Hyper-V has the following problem: the VHD virtual disk format uses 512-byte structures for dynamic (“thin”) and differencing VHDs, which naturally leads to regular read-modify-write. The situation is aggravated by the fact that a virtual disk appears to the guest OS as having 512-byte physical sectors. Use fixed VHDs, and if possible do not place VHD files on 512E drives.
Windows Server 2012 introduced the VHDX format, which does not have the problems described above (it can be created in any variant: 512N / 512E / 4KN).
Exchange Server
There are specifics related to replication in a DAG:
- All disks used in an Exchange Database Availability Group (DAG) for storing databases and logs must have the same physical sector size.
- 4KN disks are not supported.
- 512E disks are supported starting with Exchange 2010 Service Pack 2.
SQL Server
The situation is the same as for Exchange Server: in fault-tolerant configurations, the disks used for databases and logs must have the same physical sector size on all nodes.
With Storage Spaces an interesting situation arises: the reported physical sector size is 4KiB regardless of which disks the Storage Spaces pool is built from. (A Storage Spaces volume can be created from different disks, 512N and 512E; these naturally cannot be mixed with 4KN disks, except when tiering with SSDs.) A VHDX (virtual disk) is created as 512E by default. You can verify this by running fsutil fsinfo ntfsinfo <disk name>:
Bytes Per Sector: 512
Bytes Per Physical Sector: 4096
When a VHDX is placed on a Storage Spaces volume (or hardware RAID) consisting of 4KN disks, it is advisable to make the VHDX itself 4KN as well:
New-VHD -Path D:\image4kn.vhdx -Fixed -SizeBytes 500GB -LogicalSectorSizeBytes 4096 -PhysicalSectorSizeBytes 4096
Is this safe for SQL Server and other applications that use synchronous writes? Yes: a coarser storage granularity does not violate data integrity, and it does not hurt performance either, since 4096 is a multiple of 512.
Services using ESENT
This problem, specific to Windows Server 2008, is no longer very current. Services that use the Extensible Storage Engine API (AD, WINS, DHCP) can fail when the physical sector size changes (for example, when migrating from a 512N disk to a 512E one). A detailed description and a hotfix are available from Microsoft.
Other software
Obviously, software designed to manage partitions (cloning, moving, resizing) and to automate backups must take into account the peculiarities of working with Advanced Format disks. Here is the situation:
- Acronis products.
- Symantec Backup Exec has supported Advanced Format disks (512E and 4KN) since version 2012 revision 1798 Service Pack 2. Earlier releases may work with 512E disks, but Symantec states that this combination is not officially supported.
- Symantec Norton Ghost does not support 4KN disks.
Controllers
Universal rules for all controllers:
- 4KN disks and 512N / 512E disks cannot be mixed in one array.
- Adaptec and LSI controllers place their metadata at the end of the disk, so user space starts at LBA 0. This means there will be no alignment problems with 512E disks.
- An array of 4KN disks will also have 4KiB physical and logical sectors, i.e. GPT and UEFI are needed to boot from it.
- Do not forget to update the management utilities and drivers along with the firmware.
- How is a LUN created on 512E disks presented: as 512N or 512E? From what has been verified, LSI 9260 controllers, Adaptec series 6 controllers, and Infortrend ESDS storage systems report 512N (512-byte logical and physical blocks), i.e. the problem with synchronous writes remains. Be sure to use a write-back cache (with protection, of course) and a UPS. Moreover, after a firmware update the storage system or controller may suddenly start behaving “correctly”, and LUNs will turn into 512E with all the ensuing consequences for compatibility.
Adaptec by PMC
- SAS HBA series 5 and 6: 512E supported, 4KN not supported.
- SAS HBA series 6H and 7H: 512E supported, 4KN supported starting with firmware 10467.
- RAID controllers of series 7 and 8: 512E supported, 4KN supported starting with firmware 30862.
Compatibility lists for Adaptec controllers.
LSI / Avago
Controllers based on LSI chips are used by Dell, IBM, Lenovo, Fujitsu, Intel, and Supermicro. LSI models can be matched to their OEM variants by the chip.
- Older controllers based on LSI1078: do not support Advanced Format disks at all.
- LSI 3ware series 9750 based on LSI2108 and earlier 3ware: do not support Advanced Format disks at all.
- LSISAS2108 (LSI 9260/61/80): 512E supported since firmware MR4.8, 4KN not supported. Compatibility list (4KN disks are present in it, but apparently refer to LSI 2208, see below).
- LSISAS2208 (LSI 9265/66/71/85/86): 512E supported since firmware MR5.5, 4KN supported since firmware MR5.8. Compatibility list.
- LSISAS3108 (LSI 9361/80): 512E and 4KN supported. Compatibility list.
- SAS HBA based on LSISAS2008 and LSISAS2308 (LSI 9211/9200/9207): 512E and 4KN supported. Compatibility list.
- SAS HBA based on LSISAS3008 (LSI 9311/9300): 512E and 4KN supported. Compatibility list.
- RAID based on LSISAS2008 (LSI 9240, iMR firmware): 512E supported, 4KN not supported. Compatibility list.
- RAID based on LSISAS3008 (LSI 9340, iMR firmware): 512E supported, 4KN not supported. Compatibility list.
Update 03/25/2015: The latest 3108-based LSI MegaRAID controllers have a poorly documented volume (VD) property called Emulation Type. In the controller BIOS, the possible values are Default, Disabled, and Forced. You can also switch it via MSM or StorCLI:
storcli /cx/vx set emulationType=0|1|2
This property determines the block sizes presented to the host:
Default (0): if the volume contains 512E disks, it is presented as 512E. If all the disks are 512N, the volume is presented as 512N.
Disabled (1): the volume is always presented as 512N, even if it contains 512E disks.
Forced (2): the volume is always presented as 512E, even if it contains no 512E disks.
The emulation type was also ported to SAS2 controllers (LSI 2108/2208), but without the Forced value (2).
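The three values amount to a small decision table. The sketch below only models the behavior as described above, to make it easy to reason about what the host will see; it does not talk to a controller, and the function name is hypothetical.

```python
def presented_format(emulation_type: int, has_512e_disks: bool) -> str:
    """Format a volume of 512N/512E disks presents to the host,
    following the Default/Disabled/Forced description in the text."""
    if emulation_type == 0:      # Default: follow the member disks
        return "512E" if has_512e_disks else "512N"
    if emulation_type == 1:      # Disabled: always hide the 4KiB physical sector
        return "512N"
    if emulation_type == 2:      # Forced: always report a 4KiB physical sector
        return "512E"
    raise ValueError(f"unknown emulation type: {emulation_type}")

for etype in (0, 1, 2):
    for has_512e in (False, True):
        print(etype, has_512e, presented_format(etype, has_512e))
```

Note the Disabled case with 512E disks: the host is told the volume is 512N, so the synchronous write concerns discussed earlier still apply.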
Software RAID in Intel Chipsets (RST / RSTe)
4KN is not supported at all. Intel RST requires recent drivers to work with 512E drives.
Advanced Format in enterprise-class drives. What awaits us?
The table covers the latest enterprise-class series. Desktop HDDs and drives positioned for NAS or video surveillance are not included.
Vendor | Series | Form factor | Interfaces | Spindle speed, rpm | 512N | 512E | 4KN | Notes |
Seagate | Enterprise Performance 10K HDD (10k.8) | 2.5" | SAS | 10,000 | Y | Y | Y | 512N capacity is limited: 600/1200GB |
Seagate | Enterprise Performance 15K HDD (15k.5) | 2.5" | SAS | 15,000 | Y | Y | Y | 32GB of integrated SSD cache |
Seagate | Enterprise Capacity 2.5 HDD (V.3) | 2.5" | SAS, SATA | 7200 | | Y | Y | |
Seagate | Enterprise Capacity 3.5 HDD (V.4) | 3.5" | SAS, SATA | 7200 | | Y | Y | |
Seagate | Archive HDD | 3.5" | SATA | 7200 | | | Y | Positioned for archival use, lower MTBF and worse BER |
Seagate | Terascale HDD | 3.5" | SATA | 5900/7200 | | Y | | Positioned for cloud use, lower MTBF and worse BER |
HGST | Ultrastar C10K1800 | 2.5" | SAS | 10,000 | Y | Y | Y | 512N capacity is limited: 300/600/900/1200GB |
HGST | Ultrastar C15K600 | 2.5" | SAS | 15,000 | Y | Y | Y | |
HGST | Ultrastar C7K1000 | 2.5" | SAS | 7200 | Y | | | |
HGST | Ultrastar He8 | 3.5" | SAS, SATA | 7200 | | Y | Y | |
HGST | Ultrastar He6 | 3.5" | SAS, SATA | 7200 | Y | | | |
HGST | Ultrastar 7K6000 | 3.5" | SAS, SATA | 7200 | | Y | Y | |
HGST | MegaScale DC 4000.B | 3.5" | SATA | 5400 | | Y | | Positioned for cloud use, lower MTBF and worse BER |
WD | Xe | 2.5"/3.5" | SAS | 10,000 | Y | | | |
WD | Re | 3.5" | SATA | 7200 | Y | | | |
WD | Se | 3.5" | SATA | 7200 | | Y | | Positioned for cloud use, lower MTBF and worse BER |
WD | Ae | 3.5" | SATA | 5760 | | Y | ? | Positioned for archival use, lower MTBF and worse BER |
Toshiba | AL13SE | 2.5" | SAS | 10,000 | Y | | | |
Toshiba | AL13SX | 2.5" | SAS | 15,000 | Y | | | |
Toshiba | AL13SEL | 3.5" | SAS | 10,000 | Y | | | |
Toshiba | MG03ACA / MG03SCA | 3.5" | SAS, SATA | 7200 | Y | | | |
Toshiba | MG04ACA | 3.5" | SATA | 7200 | | Y | Y | |
Toshiba | MG04SCA | 3.5" | SAS | 7200 | | Y | Y | |
Toshiba | MC04ACA | 3.5" | SATA | 7200 | | Y | | Positioned for cloud use, lower MTBF and worse BER |
You can see the trend yourself: Advanced Format has finally moved from the desktop segment into the enterprise one. Fast 10K/15K rpm SAS drives are still available as 512N, but growing density forces manufacturers to use 4KiB sectors: the 1800GB Seagate 10k.8 and HGST Ultrastar C10K1800 are available only as 512E and 4KN. All drives larger than 5TB, with the exception of the HGST Ultrastar He6, are Advanced Format only.
SSD
SSDs have their own specifics. Data is read and written in pages of 2, 4, 8, or 16KiB, depending on the SSD architecture. Before writing, however, cells have to be erased, and erasing is done not page by page but in blocks of several hundred pages. For example, the Samsung 840 EVO has 2MiB blocks, each consisting of 256 pages of 8KiB. In this case, of course, any block size presented to the host, 512 or 4096 bytes, is an abstraction.
Some current SAS / SATA SSDs emulate a 512E drive, but most present themselves as 512N for compatibility reasons. No special measures are needed here, since in enterprise-class SSDs the contents of the cache are necessarily protected against power loss. It is enough to ensure alignment to the page size.
Some PCI-E SSDs, for example those made by Fusion IO, let you change the logical sector size during formatting with proprietary utilities, i.e. switch between 512E and 4KN modes. This is also possible for some SSDs with a SAS interface: for example, the Seagate 1200 supports changing the sector size with the usual sg_format. Switching to 4KiB sectors can significantly improve performance in some scenarios.
Conclusions
- 512E drives are not suitable for servers running outdated operating systems that ignore the physical sector size. In desktop use this matters little, since synchronous writes are rarely used there.
- Study your infrastructure carefully: operating systems, services in use, controllers, storage systems, and the caching modes on controllers and storage systems. If there are potential problems with performance and / or data integrity, take appropriate action.
- Problems with outdated operating systems can be worked around with virtualization, but you still need to pay attention to partition alignment.