From time to time, customers will call in to report “performance problems” that they are having when copying large files from one location to another. By “performance problems”, they mean that the file isn’t copying as fast as they expect. The most common scenario is copying large SQL databases from server to server, but this could just as easily occur with other file types. More often than not, the customer has tried different methods of copying the file including Windows Explorer, Copy, XCopy & Robocopy – with the same results. So … what’s going on here?
Assuming that you aren’t experiencing network issues (and for the purposes of this article, we’ll assume a healthy network), the problem lies in the way the copy is performed, specifically buffered versus unbuffered input/output (I/O). So let’s quickly define these terms. With buffered I/O, the file system buffers reads from and writes to the disk in the file system cache. Buffered I/O is intended to speed up future reads and writes to the same file, but it has an associated overhead cost. It is effective for files that change periodically or are accessed frequently. Two buffered I/O functions are commonly used in Windows applications such as Explorer, Copy, Robocopy and XCopy:
- CopyFile() – Copies an existing file to a new file
- CopyFileEx() – Also copies an existing file to a new file, but it can call a specified callback function each time a portion of the copy operation is completed, notifying the application of its progress. Additionally, a CopyFileEx copy can be canceled while it is in progress.
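To make the callback idea concrete, here is a minimal Python sketch of a chunked copy that reports progress and can be canceled. It is only a model of the behaviour described above, not the Windows API itself: the names copy_with_progress and print_progress, the 64 KB chunk size, and the choice to delete the partial destination file on cancel are all assumptions made for illustration.

```python
import os


def copy_with_progress(src_path, dst_path, on_progress, chunk_size=64 * 1024):
    """Copy src_path to dst_path, calling on_progress(copied, total) per chunk.

    Hypothetical helper that models a callback-driven copy.
    Returns True if the copy completed, False if the callback canceled it.
    """
    total = os.path.getsize(src_path)
    copied = 0
    canceled = False
    # Default (buffered) file objects are used here on purpose.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of file reached
            dst.write(chunk)
            copied += len(chunk)
            # The callback returns True to continue, False to cancel.
            if not on_progress(copied, total):
                canceled = True
                break
    if canceled:
        # Assumption for this sketch: a canceled copy leaves no partial file.
        os.remove(dst_path)
        return False
    return True


def print_progress(copied, total):
    """Example callback: print a percentage and always continue."""
    percent = 100.0 * copied / total if total else 100.0
    print(f"{percent:5.1f}% ({copied} of {total} bytes)")
    return True


# Example call (paths are placeholders):
# copy_with_progress("source.bak", "dest.bak", print_progress)
```

The application stays in control through the callback's return value, which is the same division of labour the bullet above describes: the copy routine moves the data, the caller decides whether to keep going.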
So looking at the definition of buffered I/O above, we can see where the perceived performance problems lie: in the file system cache overhead. Unbuffered I/O (or a raw file copy) is preferred when copying a large file from one location to another if we do not intend to access the source file after the copy is complete. This avoids the file system cache overhead and prevents the large file’s data from effectively flushing everything else out of the file system cache. Many applications accomplish this by calling CreateFile() with the FILE_FLAG_NO_BUFFERING flag to open the source and create an empty destination file, then using the ReadFile() and WriteFile() functions to transfer the data.
- CreateFile() – The CreateFile function creates or opens a file, file stream, directory, physical disk, volume, console buffer, tape drive, communications resource, mailslot, or named pipe. The function returns a handle that can be used to access an object.
- ReadFile() – The ReadFile function reads data from a file, and starts at the position that the file pointer indicates. You can use this function for both synchronous and asynchronous operations.
- WriteFile() – The WriteFile function writes data to a file at the position specified by the file pointer. This function is designed for both synchronous and asynchronous operation.
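The create, read, write sequence above boils down to a simple loop. The following Python sketch shows only the shape of that loop, under some loud assumptions: raw_copy and the 1 MB transfer size are made up for illustration, and buffering=0 only turns off Python's own user-space buffer. It does not bypass the operating system's file system cache; on Windows that is requested when the file is opened with CreateFile(), which this portable sketch does not do.

```python
CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MB transfer size


def raw_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy a file with an explicit read/write loop; returns bytes copied.

    Hypothetical helper: mirrors the create/read/write pattern, but does
    NOT bypass the OS file system cache.
    """
    buffer = bytearray(chunk_size)  # one reusable transfer buffer
    view = memoryview(buffer)
    total = 0
    # "wb" creates an empty destination file (or truncates an existing one).
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            # Read from the current file position into the buffer.
            read_count = src.readinto(buffer)
            if not read_count:
                break  # 0 bytes means end of file
            # A raw write may accept fewer bytes than offered, so loop
            # until the whole chunk has been written.
            written = 0
            while written < read_count:
                written += dst.write(view[written:read_count])
            total += read_count
    return total
```

Reusing a single buffer keeps memory use flat no matter how large the file is, which is the property that matters for multi-gigabyte database files.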
For copying very large files across the network, my copy utility of choice is ESEUTIL, one of the database utilities provided with Exchange. To get ESEUTIL working on a non-Exchange server, you just need to copy ESEUTIL.EXE and ESE.DLL from your Exchange server to a folder on your client machine. It’s that easy. There are x86 and x64 versions of ESEUTIL, so make sure you use the right version for your operating system. ESEUTIL also depends on the Visual C++ Runtime Library, which is available as a redistributable package. The syntax is very simple: eseutil /y <srcfile> /d <destfile>. And since this is a command line tool, we can use ESEUTIL in batch files or scripts.
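As one way of scripting it, here is a small Python sketch that wraps the eseutil /y <srcfile> /d <destfile> syntax shown above. The function names, the default eseutil.exe location (assumed to be on the PATH or in the working directory), and the reading of a zero exit code as success are assumptions of this sketch, not documented behaviour.

```python
import subprocess


def build_eseutil_copy_command(src_file, dest_file, eseutil_path="eseutil.exe"):
    """Build the argument list for: eseutil /y <srcfile> /d <destfile>.

    Hypothetical helper; eseutil_path is an assumed location.
    """
    return [eseutil_path, "/y", src_file, "/d", dest_file]


def eseutil_copy(src_file, dest_file, eseutil_path="eseutil.exe"):
    """Run the copy and return the process exit code.

    Assumption: an exit code of 0 is treated by the caller as success.
    """
    command = build_eseutil_copy_command(src_file, dest_file, eseutil_path)
    # Passing a list avoids shell quoting problems with paths that
    # contain spaces.
    completed = subprocess.run(command)
    return completed.returncode


# Example call (paths are placeholders):
# exit_code = eseutil_copy(r"D:\Backups\big.bak", r"\\server2\share\big.bak")
```

Keeping the command construction separate from the call that runs it makes the script easy to check on a machine where ESEUTIL is not installed.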
Addendum: The XCOPY /J switch was added in Windows 7 and Windows Server 2008 R2. Its documentation reads:
“Copies files without buffering. Recommended for very large files. This parameter was introduced in Windows Server® 2008 R2.”