Windows Offloaded Data Transfer (ODX) is a powerful feature designed to significantly accelerate data copy and move operations within Windows Server environments. Introduced in Windows Server 2012 and fully supported on NTFS volumes, ODX streamlines data handling at the storage level, reducing server load and speeding up processes. This article delves into ODX from a storage device perspective, offering insights for IT professionals and storage administrators looking to optimize their data transfer efficiency. For those interested in the file system and minifilter aspects, refer to the comprehensive guide on Offloaded Data Transfers.
ODX: Revolutionizing Data Transfer
Traditional file transfers involve moving large volumes of data through the server, consuming valuable CPU and network resources. Windows ODX changes this paradigm by introducing a token-based system. Instead of the server handling the entire data stream, ODX offloads the actual data movement to the storage array itself. This “offload copy” operation dramatically reduces server overhead and accelerates transfer speeds, particularly for large files and virtual machine operations.
ODX functionality extends across various storage configurations, providing flexibility and broad applicability:
- Intra-volume Operations: Copying or moving files within the same volume.
- Inter-volume Operations (Same Host): Transfers between different volumes hosted on the same server.
- Local to Remote (SMB): Copying data between a local volume and a remote volume accessed via Server Message Block (SMB2 or SMB3).
- Server-to-Server (SMB): Data transfers between volumes on two distinct servers utilizing SMB2 or SMB3.
The following diagram illustrates the streamlined process of an offload copy operation on ODX-enabled storage devices.
- Offload Read Request: The initiating application sends an offload read request to the source storage device’s copy manager.
- Token Generation (ROD): The source copy manager responds by generating and returning a token. This token, known as a Representation of Data (ROD), acts as a pointer to the data to be copied.
- Offload Write Request with Token: The application then sends an offload write request, including the ROD token, to the destination storage device’s copy manager.
- Data Movement and Result: The storage array’s copy manager takes over, directly moving the data from the source to the destination device. Finally, it returns the offload write result to the application, confirming the operation’s completion.
Identifying ODX-Capable Storage
For ODX to function, storage arrays must adhere to the T10 standard specifications specifically designed for ODX-compatible devices. This includes implementing offload read and write operations that utilize tokens. Windows automatically detects and assesses the ODX capabilities of storage target devices during the Logical Unit Number (LUN) device enumeration process, which occurs at system boot or during plug-and-play events.
This detection process involves:
- Capability Query: Windows queries the storage device to determine its copy offload capabilities.
- Parameter Acquisition: Windows gathers essential parameters and limitations relevant to copy offload operations.
By default, Windows prioritizes the ODX path for copy operations if both the source and destination LUNs are identified as ODX-capable. Should the initial ODX request fail for any reason, Windows marks that specific source-destination LUN combination as “not ODX capable” and reverts to the traditional, legacy copy file code path, ensuring the data transfer still completes.
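Administrators and tool writers can inspect the same capability information from user mode. The snippet below is a minimal sketch, assuming the StorageDeviceCopyOffloadProperty property ID and the DEVICE_COPY_OFFLOAD_DESCRIPTOR structure declared in winioctl.h, and a hypothetical target of \\.\PhysicalDrive1; a failed query is usually a sign that the LUN does not advertise copy offload support.

```c
// Minimal sketch: query a disk's copy offload parameters.
// Assumes the StorageDeviceCopyOffloadProperty property ID and the
// DEVICE_COPY_OFFLOAD_DESCRIPTOR structure declared in winioctl.h.
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    // Hypothetical target disk; 0 access is enough for a property query.
    HANDLE hDisk = CreateFileW(L"\\\\.\\PhysicalDrive1", 0,
                               FILE_SHARE_READ | FILE_SHARE_WRITE,
                               NULL, OPEN_EXISTING, 0, NULL);
    if (hDisk == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    STORAGE_PROPERTY_QUERY query = {0};
    query.PropertyId = StorageDeviceCopyOffloadProperty;
    query.QueryType  = PropertyStandardQuery;

    DEVICE_COPY_OFFLOAD_DESCRIPTOR desc = {0};
    DWORD bytes = 0;

    if (DeviceIoControl(hDisk, IOCTL_STORAGE_QUERY_PROPERTY,
                        &query, sizeof(query),
                        &desc, sizeof(desc), &bytes, NULL)) {
        // Values are reported by the storage device; see the descriptor's
        // documentation for exact units.
        printf("Maximum token lifetime: %lu\n", desc.MaximumTokenLifetime);
        printf("Default token lifetime: %lu\n", desc.DefaultTokenLifetime);
        printf("Maximum transfer size:  %llu\n",
               (unsigned long long)desc.MaximumTransferSize);
        printf("Optimal transfer count: %llu\n",
               (unsigned long long)desc.OptimalTransferCount);
    } else {
        // A failure here typically means the LUN does not advertise ODX support.
        fprintf(stderr, "Copy offload query failed: %lu\n", GetLastError());
    }

    CloseHandle(hDisk);
    return 0;
}
```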
Deep Dive into ODX Read/Write Operations
Synchronous Command Structure and APIs
To ensure the robustness of synchronous offload writes, especially for large requests, Windows employs a splitting algorithm, sketched in code after this list:
- Optimal Transfer Size Determination: If the target storage device doesn’t specify an optimal transfer size, a default of 64 MB is set.
- Maximum Transfer Size Cap: If the device’s optimal transfer size exceeds 256 MB, it’s capped at 256 MB for efficient handling.
- Device-Specified Optimal Size: If the device specifies an optimal size between 0 and 256 MB, that value is used.
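The selection rules above reduce to a few lines of logic. The following is an illustrative sketch, not Windows source code; the constants mirror the 64-MB default and 256-MB cap described in the list.

```c
// Illustrative sketch of the transfer-size selection rules described above.
// deviceOptimalSize is the optimal transfer size reported by the storage
// device, or 0 if the device does not report one.
#define ODX_DEFAULT_TRANSFER_SIZE  (64ULL  * 1024 * 1024)  /* 64 MB  */
#define ODX_MAX_TRANSFER_SIZE      (256ULL * 1024 * 1024)  /* 256 MB */

unsigned long long SelectOffloadWriteSize(unsigned long long deviceOptimalSize)
{
    if (deviceOptimalSize == 0)                      /* nothing reported */
        return ODX_DEFAULT_TRANSFER_SIZE;            /* default to 64 MB */
    if (deviceOptimalSize > ODX_MAX_TRANSFER_SIZE)   /* too large        */
        return ODX_MAX_TRANSFER_SIZE;                /* cap at 256 MB    */
    return deviceOptimalSize;                        /* use device value */
}
```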
The use of synchronous offload read and write SCSI commands simplifies management, particularly in complex MPIO (Multipath I/O) and cluster failover scenarios. Windows expects the storage array’s copy manager to complete these synchronous commands within a 4-second timeframe.
Applications can leverage various APIs to interact with storage arrays and execute copy offload operations, including the following (see the sketch after this list):
- FSCTL (File System Control Codes)
- DSM IOCTL (Data Set Management I/O Control Codes)
- SCSI_PASS_THROUGH
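The FSCTL route is the most common one for user-mode applications. The following is a minimal, hedged sketch assuming the FSCTL_OFFLOAD_READ and FSCTL_OFFLOAD_WRITE control codes and their input/output structures as declared in winioctl.h; handles are assumed to be opened with the required access for unbuffered I/O, offsets and lengths must be aligned to the volume's logical sector size, and error handling plus the legacy-copy fallback are omitted for brevity.

```c
// Minimal sketch of a token-based copy using the FSCTL path.
// Assumes FSCTL_OFFLOAD_READ / FSCTL_OFFLOAD_WRITE and their input/output
// structures as declared in winioctl.h.
#include <windows.h>
#include <winioctl.h>
#include <string.h>

BOOL OffloadCopyRange(HANDLE hSrc, HANDLE hDst,
                      ULONGLONG srcOffset, ULONGLONG dstOffset,
                      ULONGLONG length)
{
    DWORD bytes;

    // 1. Offload read: ask the source copy manager for a ROD token.
    FSCTL_OFFLOAD_READ_INPUT  readIn  = {0};
    FSCTL_OFFLOAD_READ_OUTPUT readOut = {0};
    readIn.Size            = sizeof(readIn);
    readIn.TokenTimeToLive = 0;            // 0 = use the default inactivity timer
    readIn.FileOffset      = srcOffset;
    readIn.CopyLength      = length;

    if (!DeviceIoControl(hSrc, FSCTL_OFFLOAD_READ,
                         &readIn, sizeof(readIn),
                         &readOut, sizeof(readOut), &bytes, NULL))
        return FALSE;                      // caller should fall back to legacy copy

    // 2. Offload write: hand the ROD token to the destination copy manager.
    FSCTL_OFFLOAD_WRITE_INPUT  writeIn  = {0};
    FSCTL_OFFLOAD_WRITE_OUTPUT writeOut = {0};
    writeIn.Size           = sizeof(writeIn);
    writeIn.FileOffset     = dstOffset;
    writeIn.CopyLength     = readOut.TransferLength;  // what the token covers
    writeIn.TransferOffset = 0;
    memcpy(writeIn.Token, readOut.Token, sizeof(writeIn.Token));

    if (!DeviceIoControl(hDst, FSCTL_OFFLOAD_WRITE,
                         &writeIn, sizeof(writeIn),
                         &writeOut, sizeof(writeOut), &bytes, NULL))
        return FALSE;                      // caller should fall back to legacy copy

    // 3. The storage array moved the data; LengthWritten reports progress.
    //    A production copy engine would loop and resume on partial completion.
    return writeOut.LengthWritten == readOut.TransferLength;
}
```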
To safeguard against data corruption and system instability, Windows enforces a restriction preventing applications from directly writing to a file system-mounted volume without acquiring exclusive access. This is crucial because direct writes could conflict with file system operations, potentially leading to data inconsistencies.
Understanding Offload Read Operations
When an application initiates an offload read request, it can specify a token lifetime, defining the token’s inactivity timeout period. Setting the token lifetime to zero instructs the system to use the default inactivity timer. The storage array’s copy manager is responsible for managing and validating the token based on its inactivity timeout and associated credentials. Windows also imposes a limit of 64 file fragments per offload read request. Exceeding this limit will cause Windows to revert to traditional copy operations.
Upon successful completion of an offload read request, the copy manager generates a Representation of Data (ROD) token. This token represents a point-in-time snapshot of the user data and any associated protection information. The ROD token’s format is opaque and exclusively managed by the storage array, ensuring security and uniqueness. It can represent user data in either “open exclusively” or “open with share” formats.
The copy manager’s ROD policy dictates token invalidation. For “open exclusively” RODs, the token may be invalidated if the represented data is modified or moved. However, for “open with share” RODs, the token remains valid even if the data is modified.
A ROD token is a 512-byte structure with the following format:
| Size in Bytes | Token Contents |
| --- | --- |
| 4 | ROD Token Type |
| 508 | ROD Token ID |
The ROD token’s opaque nature and array-managed lifecycle ensure its security. If a token is modified, fails validation, or expires, the copy manager can invalidate it during the subsequent offload write operation. The ROD token returned from an offload read operation includes an inactivity timeout value, indicating the duration (in seconds) for which the copy manager guarantees the token’s validity for subsequent Write Using Token operations.
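Expressed as a C structure, the layout in the table above looks roughly like the sketch below. This mirrors the table only; the contents of the ID field are opaque and meaningful only to the storage array's copy manager.

```c
// Illustrative layout only, mirroring the ROD token table above; the ID bytes
// are opaque and managed entirely by the storage array's copy manager.
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t RodTokenType[4];   // 4 bytes: ROD token type
    uint8_t RodTokenId[508];   // 508 bytes: array-managed, opaque token ID
} ROD_TOKEN;                   // 512 bytes total
#pragma pack(pop)
```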
Delving into Offload Write Operations
After obtaining the ROD token from the copy manager, the application initiates the offload write request, including the ROD token, and sends it to the destination storage array’s copy manager. Similar to read operations, synchronous offload write commands are expected to complete within 4 seconds. Timeouts or errors will lead to command failure, prompting the application to fall back to legacy copy methods based on the returned status code.
Offload write operations can be completed through one or more Receive Offload Write Result commands. In cases of partial completion, the copy manager provides an estimated delay and a transfer count, indicating copy progress. The transfer count reflects the number of contiguous logical blocks successfully written from source to destination. Copy managers can perform offload writes sequentially or using scatter/gather patterns.
In the event of a write failure, the progress count reflects contiguous logical blocks written up to the point of failure. Client applications or copy engines can then resume the offload write from the point of failure. Upon successful completion of the offload write, the copy manager responds to the Receive ROD Token Information command with:
- An estimated status update delay set to zero.
- A data transfer progress count of 100 percent.
If the receive offload write result consistently returns the same data transfer progress count across four retries, Windows will revert the copy operation back to the application.
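Taken together, the resume-and-retry behavior described above amounts to a loop like the hedged sketch below. IssueOffloadWrite is a hypothetical helper (not a Windows API) that issues one offload write at the given block offset and reports the number of contiguous logical blocks written; the four-retry rule mirrors the paragraph above.

```c
// Hedged sketch of the resume logic described above.
#define MAX_STALLED_RETRIES 4

// Hypothetical helper: issues one offload write starting at startBlock and
// returns the number of contiguous logical blocks written, or -1 on failure.
extern long long IssueOffloadWrite(void *rodToken, long long startBlock,
                                   long long blockCount);

long long OffloadWriteWithResume(void *rodToken, long long totalBlocks)
{
    long long written = 0;      // contiguous blocks copied so far
    int stalled = 0;            // consecutive attempts with no forward progress

    while (written < totalBlocks) {
        long long n = IssueOffloadWrite(rodToken, written, totalBlocks - written);
        if (n < 0)
            return -1;          // hard failure: caller falls back to legacy copy
        if (n == 0) {
            if (++stalled >= MAX_STALLED_RETRIES)
                return written; // same progress count four times: give up on ODX
        } else {
            stalled = 0;
            written += n;       // resume from the point of partial completion
        }
    }
    return written;             // fully copied via offload writes
}
```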
ODX also supports the concept of a “well-known ROD token,” a predefined token with a known data pattern and format, most commonly a “zero token.” Applications can use zero tokens to efficiently fill logical block ranges with zeros. If a well-known token is not supported or recognized, the copy manager will reject the offload write request with an “Invalid Token” error.
A well-known ROD token also adheres to a 512-byte format:
| Size in Bytes | Token Contents |
| --- | --- |
| 4 | ROD Token Type |
| 2 | Well Known Pattern |
| 506 | ROD Token ID |
It’s important to note that applications cannot request a well-known token using an offload read operation. The copy manager is responsible for verifying and managing well-known ROD tokens according to its specific policies.
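As with the general ROD token, the well-known token layout can be sketched as a packed structure. This mirrors the table above only; the concrete type and pattern values (for example, those used by the zero token) are defined by the T10 specification and the array's copy manager, so the field contents here are placeholders.

```c
// Illustrative layout only, mirroring the well-known ROD token table above.
// The actual type and pattern values come from the T10 specification and the
// array's copy manager, not from this sketch.
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t RodTokenType[4];      // 4 bytes: ROD token type
    uint8_t WellKnownPattern[2];  // 2 bytes: well-known pattern (e.g., zeros)
    uint8_t RodTokenId[506];      // 506 bytes: token ID
} WELL_KNOWN_ROD_TOKEN;           // 512 bytes total
#pragma pack(pop)
```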
Performance Tuning for Optimal ODX Implementation
ODX performance is largely independent of client-server network or storage area network (SAN) transport link speeds. The data movement is handled directly by the storage array’s copy manager and device servers.
However, not all copy operations benefit equally from ODX. For smaller file transfers, the overhead of setting up the offload can outweigh the benefits, while large transfers see the biggest gains: with ODX, a 1-Gbit iSCSI storage array can complete a 3-GB file copy in under 10 seconds, a rate of more than 300 MB per second that already exceeds the theoretical throughput of a 1-Gbit Ethernet link.
To optimize ODX performance, it’s often beneficial to restrict its use to files exceeding a certain minimum size and to manage maximum copy lengths. Key performance parameters include:
- Minimum File Size: Windows sets a minimum file size of 256 KB for copy offload operations. Files smaller than this threshold will automatically fall back to legacy copy processes.
- Maximum Token Transfer Size and Optimal Transfer Count: Windows uses these parameters to determine the optimal transfer size for offload read and write SCSI commands. The total transfer size in blocks must not exceed the maximum token transfer size. If the storage array doesn’t report an optimal transfer count, Windows defaults to 64 MB.
The optimal and maximum transfer length parameters define the ideal and maximum number of blocks within a single range descriptor. Applications adhering to these parameters can achieve the most efficient file transfer performance when utilizing ODX.
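A copy engine that honors these parameters might gate and size its requests along the lines of the sketch below. This is illustrative only; the 256-KB floor and 64-MB default come from the list above, and maxTokenTransferSize and optimalTransferCount would be read from the device's copy offload descriptor.

```c
// Illustrative sketch of the sizing rules above. maxTokenTransferSize and
// optimalTransferCount are values reported by the storage device (in bytes);
// pass 0 for optimalTransferCount if the device does not report one.
#define ODX_MIN_FILE_SIZE        (256ULL * 1024)         /* 256 KB floor  */
#define ODX_DEFAULT_OPTIMAL_SIZE (64ULL * 1024 * 1024)   /* 64 MB default */

int ShouldUseOdx(unsigned long long fileSize)
{
    // Files below 256 KB fall back to the legacy copy path.
    return fileSize >= ODX_MIN_FILE_SIZE;
}

unsigned long long NextRequestLength(unsigned long long remaining,
                                     unsigned long long maxTokenTransferSize,
                                     unsigned long long optimalTransferCount)
{
    unsigned long long len = optimalTransferCount ? optimalTransferCount
                                                  : ODX_DEFAULT_OPTIMAL_SIZE;
    if (len > maxTokenTransferSize)  // never exceed the maximum token transfer size
        len = maxTokenTransferSize;
    if (len > remaining)
        len = remaining;
    return len;
}
```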
ODX Error Handling and High Availability
When an ODX operation encounters a failure during a file copy request, the copy engine and the Windows file system (NTFS) are designed to gracefully fall back to the traditional, legacy copy operation. If the failure occurs partway through an offload write, the system resumes with legacy copy methods from the point at which the offload write failed.
Robust ODX Error Handling
ODX incorporates a robust error handling algorithm that adapts to the storage array’s capabilities. In the event of a copy offload failure within an ODX-capable path, Windows expects applications to revert to legacy copy operations. The Windows copy engine already includes this automatic fallback mechanism. After an ODX failure, NTFS temporarily marks the source and destination LUN as ODX-incapable for a period of three minutes. After this timeout, Windows will automatically retry ODX operations. This mechanism allows storage arrays to temporarily disable ODX support on specific paths during periods of high stress or load.
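A user-mode copy engine that implements its own fallback could track similar state with something like the hedged sketch below. The three-minute window mirrors the NTFS behavior described above; the structure and function names are illustrative, not a Windows API.

```c
// Illustrative sketch of a per-LUN-pair ODX fallback timer, mirroring the
// three-minute NTFS behavior described above. Not a Windows API.
#include <time.h>

#define ODX_RETRY_DELAY_SECONDS (3 * 60)

typedef struct {
    time_t notCapableUntil;   // 0 means the pair is currently considered ODX-capable
} OdxPathState;

int OdxAllowed(const OdxPathState *state)
{
    return time(NULL) >= state->notCapableUntil;
}

void OdxMarkFailed(OdxPathState *state)
{
    // After a copy offload failure, skip ODX on this source/destination pair
    // for three minutes, then allow it to be retried.
    state->notCapableUntil = time(NULL) + ODX_RETRY_DELAY_SECONDS;
}
```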
ODX in MPIO and Cluster Server Failover Scenarios
Offload read and write operations are designed to be atomic and must be completed or canceled from the same storage link (I_T nexus).
In MPIO or cluster server configurations, failovers during synchronous ODX operations are handled as follows:
- MPIO Path Failover: If an MPIO path failover occurs, Windows retries the failed ODX command. If the command fails again:
  - In a cluster server environment, Windows initiates a cluster server node failover.
  - If cluster server failover is not an option, Windows issues a LUN reset to the storage device and returns an I/O failure status to the application.
- Cluster Server Failover: In a cluster server configuration, the cluster storage service fails over to the next preferred cluster node, and the cluster storage service is resumed. Cluster-aware applications are expected to retry the offload read/write command after a cluster storage service failover.
If an offload read or write command fails even after MPIO path and cluster node failovers, Windows issues a LUN reset to the storage device following the failover. This LUN reset terminates all outstanding commands and pending operations on the LUN.
Currently, Windows does not support asynchronous offload read or write SCSI commands within the storage stack.
ODX Usage Scenarios and Models
ODX Across Diverse Storage Targets
To leverage ODX, the application server must have read/write access to both the source and destination LUNs. The copy offload application initiates an offload read request to the source LUN, receives a token from the source LUN’s copy manager, and then uses this token to issue an offload write request to the destination LUN. The copy manager then orchestrates the data movement directly between the source and destination LUNs via the storage network.
The following diagram illustrates common ODX usage scenarios across different storage targets.
ODX Operation within a Single Server
In a single-server setup, the copy offload application executes both offload read and write requests from the same server system.
The server (or VM) hosting the application has access to both the source LUN (which can be a VHD or physical disk) and the destination LUN (also VHD or physical disk). The application obtains a token from the source LUN via an offload read request and then uses this token to initiate an offload write request to the destination LUN. The storage array’s copy manager handles the data transfer between the source and destination LUNs within the same storage array.
ODX Operation Across Two Servers
In a two-server configuration, two servers and potentially multiple storage arrays managed by a unified copy manager are involved.
- One server (or VM) hosts the source LUN, while the second server (or VM) hosts the destination LUN. The source server shares the source LUN with the application client via SMB, and similarly, the destination server shares the destination LUN via SMB. This grants the application client access to both LUNs.
- The source and destination storage arrays are managed by the same copy manager within a SAN environment.
- From the application client, the copy offload application sends an offload read request to the source LUN and receives a token. It then uses this token to issue an offload write request to the destination LUN. The copy manager handles the data transfer between the source and destination LUNs, potentially spanning different storage arrays in different physical locations.
ODX for Massive Data Migration
Massive data migration, involving the transfer of large datasets like databases, spreadsheets, and documents to a new system, is a prime use case for ODX. This migration often occurs during storage system upgrades, database engine changes, or application/business process evolutions. ODX streamlines data migration from legacy to new storage systems, provided the new system’s copy manager can also manage the legacy storage.
- In this scenario, one server hosts the legacy storage, and another hosts the new storage. Both servers share their respective LUNs with a data migration application client via SMB.
- Both the legacy and new storage systems are managed by a common copy manager within a SAN.
- The data migration application client initiates offload read requests to the source (legacy) LUN to obtain tokens and then uses these tokens to issue offload write requests to the destination (new) LUN. The copy manager efficiently migrates the data between the two storage systems, even if they are geographically separated.
- Massive data migration using ODX can also be performed within a single server environment.
Host-Controlled Data Transfer in Tiered Storage
Tiered storage architectures place data on different types of storage media to optimize cost, performance, and capacity utilization. Data is categorized based on protection needs, performance requirements, usage frequency, and other factors.
ODX plays a crucial role in enabling host-controlled data migration within tiered storage environments. Consider a two-tiered storage example:
- The server hosts the tiered storage system. Tier 1 storage acts as the source LUN, and Tier 2 storage serves as the destination LUN.
- A single copy manager manages all storage tiers.
- The data migration application on the server issues offload read requests to the Tier 1 LUN, obtains tokens, and then issues offload write requests with these tokens to the Tier 2 LUN. The copy manager manages the data movement between the different storage tiers.
- After migration, the application can delete the data from the Tier 1 storage, reclaiming valuable high-performance storage space.
By understanding and implementing Windows ODX, organizations can significantly enhance their data management efficiency, reduce server load, and accelerate critical data operations across diverse storage environments.