2013-08-05

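Why is parsing the data slower after reading it with Windows ReadFile?

I'm running performance tests on several ways of doing file IO in C# .NET. Each test does two things: it reads an entire text file into a buffer, then parses that buffer. The data consists of tab-delimited records with a header record, but that is not important here.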

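The only thing that differs between the tests is the method used to read the data from the file. The parsing code is identical. Yet one file-reading method (a direct call to the Windows API ReadFile) is 1.4 times faster than another (File.ReadAllBytes), while parsing afterwards takes 2.5 times longer on the data read with ReadFile than on the data read with File.ReadAllBytes!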

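Calling ReadFile involves unsafe code and pinning and unpinning memory. Does that explain the performance hit? Is there a way around it? Is there a better way to call ReadFile? Here is the test code loop. I won't show the parsing code, but all it does is scan the IO buffer, find the tab and CRLF delimiters, and create strings. It contains no unsafe code of its own and makes no calls to external methods other than those of the String class, the Encoding class, and generic lists.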

The calls to the Windows ReadFile function use some sample code written by Robert G. Bryan. Here is his WinFileIO class. I don't believe I have modified it in any way.

using System; 
using System.ComponentModel; 
using System.Runtime.InteropServices; 

namespace Win32FileIO 
{ 
    unsafe public class WinFileIO : IDisposable 
    { 
     // This class provides the capability to utilize the ReadFile and Writefile windows IO functions. These functions 
     // are the most efficient way to perform file I/O from C# or even C++. The constructor with the buffer and buffer 
     // size should usually be called to init this class. PinBuffer is provided as an alternative. The reason for this 
     // is because a pointer needs to be obtained before the ReadFile or WriteFile functions are called. 
     // 
     // Error handling - In each public function of this class where an error can occur, an ApplicationException is 
     // thrown with the Win32Exception message info if an error is detected. If no exception is thrown, then a normal 
     // return is considered success. 
     // 
     // This code is not thread safe. Thread control primitives need to be added if running this in a multi-threaded 
     // environment. 
     // 
     // The recommended and fastest function for reading from a file is to call the ReadBlocks method. 
     // The recommended and fastest function for writing to a file is to call the WriteBlocks method. 
     // 
     // License and disclaimer: 
     // This software is free to use by any individual or entity for any endeavor for profit or not. 
     // Even though this code has been tested and automated unit tests are provided, there is no guarantee that 
     // it will run correctly with your system or environment. I am not responsible for any failure and you agree 
     // that you accept any and all risk for using this software. 
     // 
     // 
     // Written by Robert G. Bryan in Feb, 2011. 
     // 
     // Constants required to handle file I/O: 
     private const uint GENERIC_READ = 0x80000000; 
     private const uint GENERIC_WRITE = 0x40000000; 
     private const uint OPEN_EXISTING = 3; 
     private const uint CREATE_ALWAYS = 2; 
     private const int BlockSize = 65536; 
     // 
     private GCHandle gchBuf;   // Handle to GCHandle object used to pin the I/O buffer in memory. 
     private System.IntPtr pHandle;  // Handle to the file to be read from or written to. 
     private void* pBuffer;    // Pointer to the buffer used to perform I/O. 

     // Define the Windows system functions that are called by this class via P/Invoke: 
     [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)] 
     static extern unsafe System.IntPtr CreateFile 
     (
      string FileName,   // file name 
      uint DesiredAccess,  // access mode 
      uint ShareMode,   // share mode 
      uint SecurityAttributes, // Security Attributes 
      uint CreationDisposition, // how to create 
      uint FlagsAndAttributes, // file attributes 
      int hTemplateFile   // handle to template file 
     ); 

     [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)] 
     static extern unsafe bool ReadFile 
     (
      System.IntPtr hFile,  // handle to file 
      void* pBuffer,   // data buffer 
      int NumberOfBytesToRead, // number of bytes to read 
      int* pNumberOfBytesRead, // number of bytes read 
      int Overlapped   // overlapped buffer which is used for async I/O. Not used here. 
     ); 

     [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)] 
     static extern unsafe bool WriteFile 
     (
      IntPtr handle,      // handle to file 
      void* pBuffer,    // data buffer 
      int NumberOfBytesToWrite, // Number of bytes to write. 
      int* pNumberOfBytesWritten,// Number of bytes that were written.. 
      int Overlapped      // Overlapped buffer which is used for async I/O. Not used here. 
     ); 

     [System.Runtime.InteropServices.DllImport("kernel32", SetLastError = true)] 
     static extern unsafe bool CloseHandle 
     (
      System.IntPtr hObject  // handle to object 
     ); 

     public WinFileIO() 
     { 
      pHandle = IntPtr.Zero; 
     } 

     public WinFileIO(Array Buffer) 
     { 
      // This constructor is provided so that the buffer can be pinned in memory. 
      // Cleanup must be called in order to unpin the buffer. 
      PinBuffer(Buffer); 
      pHandle = IntPtr.Zero; 
     } 

     protected void Dispose(bool disposing) 
     { 
      // This function frees up the unmanaged resources of this class. 
      Close(); 
      UnpinBuffer(); 
     } 

     public void Dispose() 
     { 
      // This method should be called to clean everything up. 
      Dispose(true); 
      // Tell the GC not to finalize since clean up has already been done. 
      GC.SuppressFinalize(this); 
     } 

     ~WinFileIO() 
     { 
      // Finalizer gets called by the garbage collector if the user did not call Dispose. 
      Dispose(false); 
     } 

     public void PinBuffer(Array Buffer) 
     { 
      // This function must be called to pin the buffer in memory before any file I/O is done. 
      // This shows how to pin a buffer in memory for an extended period of time without using 
      // the "Fixed" statement. Pinning a buffer in memory can take some cycles, so this technique 
      // is helpful when doing quite a bit of file I/O. 
      // 
      // Make sure we don't leak memory if this function was called before and the UnPinBuffer was not called. 
      UnpinBuffer(); 
      gchBuf = GCHandle.Alloc(Buffer, GCHandleType.Pinned); 
      IntPtr pAddr = Marshal.UnsafeAddrOfPinnedArrayElement(Buffer, 0); 
      // pBuffer is the pointer used for all of the I/O functions in this class. 
      pBuffer = (void*)pAddr.ToPointer(); 
     } 

     public void UnpinBuffer() 
     { 
      // This function unpins the buffer and needs to be called before a new buffer is pinned or 
      // when disposing of this object. It does not need to be called directly since the code in Dispose 
      // or PinBuffer will automatically call this function. 
      if (gchBuf.IsAllocated) 
       gchBuf.Free(); 
     } 

     public void OpenForReading(string FileName) 
     { 
      // This function uses the Windows API CreateFile function to open an existing file. 
      // An ApplicationException is thrown if the file cannot be opened. 
      Close(); 
      pHandle = CreateFile(FileName, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0); 
      // CreateFile returns INVALID_HANDLE_VALUE (-1) on failure, not zero. 
      if (pHandle == new IntPtr(-1)) 
      { 
       Win32Exception WE = new Win32Exception(); 
       pHandle = IntPtr.Zero; // Reset so a later Close() does not pass the invalid handle to CloseHandle. 
       ApplicationException AE = new ApplicationException("WinFileIO:OpenForReading - Could not open file " + 
        FileName + " - " + WE.Message); 
       throw AE; 
      } 
     } 

     public void OpenForWriting(string FileName) 
     { 
      // This function uses the Windows API CreateFile function to create a file for writing. 
      // If the file exists, it will be overwritten. 
      Close(); 
      pHandle = CreateFile(FileName, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, 0, 0); 
      // CreateFile returns INVALID_HANDLE_VALUE (-1) on failure, not zero. 
      if (pHandle == new IntPtr(-1)) 
      { 
       Win32Exception WE = new Win32Exception(); 
       pHandle = IntPtr.Zero; // Reset so a later Close() does not pass the invalid handle to CloseHandle. 
       ApplicationException AE = new ApplicationException("WinFileIO:OpenForWriting - Could not open file " + 
        FileName + " - " + WE.Message); 
       throw AE; 
      } 
     } 

     public int Read(int BytesToRead) 
     { 
      // This function reads in a file up to BytesToRead using the Windows API function ReadFile. The return value 
      // is the number of bytes read. 
      int BytesRead = 0; 
      if (!ReadFile(pHandle, pBuffer, BytesToRead, &BytesRead, 0)) 
      { 
       Win32Exception WE = new Win32Exception(); 
       ApplicationException AE = new ApplicationException("WinFileIO:Read - Error occurred reading a file. - " + 
        WE.Message); 
       throw AE; 
      } 
      return BytesRead; 
     } 

     public int ReadUntilEOF() 
     { 
      // This function reads in chunks at a time instead of the entire file. Make sure the file is <= 2GB. 
      // Also, if the buffer is not large enough to read the file, then an ApplicationException will be thrown. 
      // No check is made to see if the buffer is large enough to hold the file. If this is needed, then 
      // use the ReadBlocks function below. 
      int BytesReadInBlock = 0, BytesRead = 0; 
      byte* pBuf = (byte*)pBuffer; 
      // Do until there are no more bytes to read or the buffer is full. 
      for (; ;) 
      { 
       if (!ReadFile(pHandle, pBuf, BlockSize, &BytesReadInBlock, 0)) 
       { 
        // This is an error condition. The error msg can be obtained by creating a Win32Exception and 
        // using the Message property to obtain a description of the error that was encountered. 
        Win32Exception WE = new Win32Exception(); 
        ApplicationException AE = new ApplicationException("WinFileIO:ReadUntilEOF - Error occurred reading a file. - " 
         + WE.Message); 
        throw AE; 
       } 
       if (BytesReadInBlock == 0) 
        break; 
       BytesRead += BytesReadInBlock; 
       pBuf += BytesReadInBlock; 
      } 
      return BytesRead; 
     } 

     public int ReadBlocks(int BytesToRead) 
     { 
      // This function reads a total of BytesToRead at a time. There is a limit of 2gb per call. 
      int BytesReadInBlock = 0, BytesRead = 0, BlockByteSize; 
      byte* pBuf = (byte*)pBuffer; 
      // Do until there are no more bytes to read or the buffer is full. 
      do 
      { 
       BlockByteSize = Math.Min(BlockSize, BytesToRead - BytesRead); 
       if (!ReadFile(pHandle, pBuf, BlockByteSize, &BytesReadInBlock, 0)) 
       { 
        Win32Exception WE = new Win32Exception(); 
        ApplicationException AE = new ApplicationException("WinFileIO:ReadBytes - Error occurred reading a file. - " 
         + WE.Message); 
        throw AE; 
       } 
       if (BytesReadInBlock == 0) 
        break; 
       BytesRead += BytesReadInBlock; 
       pBuf += BytesReadInBlock; 
      } while (BytesRead < BytesToRead); 
      return BytesRead; 
     } 

     public int Write(int BytesToWrite) 
     { 
      // Writes out the file in one swoop using the Windows WriteFile function. 
      int NumberOfBytesWritten; 
      if (!WriteFile(pHandle, pBuffer, BytesToWrite, &NumberOfBytesWritten, 0)) 
      { 
       Win32Exception WE = new Win32Exception(); 
       ApplicationException AE = new ApplicationException("WinFileIO:Write - Error occurred writing a file. - " + 
        WE.Message); 
       throw AE; 
      } 
      return NumberOfBytesWritten; 
     } 

     public int WriteBlocks(int NumBytesToWrite) 
     { 
      // This function writes out chunks at a time instead of the entire file. This is the fastest write function, 
      // perhaps because the block size is an even multiple of the sector size. 
      int BytesWritten = 0, BytesToWrite, RemainingBytes, BytesOutput = 0; 
      byte* pBuf = (byte*)pBuffer; 
      RemainingBytes = NumBytesToWrite; 
      // Do until there are no more bytes to write. 
      do 
      { 
       BytesToWrite = Math.Min(RemainingBytes, BlockSize); 
       if (!WriteFile(pHandle, pBuf, BytesToWrite, &BytesWritten, 0)) 
       { 
        // This is an error condition. The error msg can be obtained by creating a Win32Exception and 
        // using the Message property to obtain a description of the error that was encountered. 
        Win32Exception WE = new Win32Exception(); 
        ApplicationException AE = new ApplicationException("WinFileIO:WriteBlocks - Error occurred writing a file. - " 
         + WE.Message); 
        throw AE; 
       } 
       pBuf += BytesToWrite; 
       BytesOutput += BytesToWrite; 
       RemainingBytes -= BytesToWrite; 
      } while (RemainingBytes > 0); 
      return BytesOutput; 
     } 

     public bool Close() 
     { 
      // This function closes the file handle. 
      bool Success = true; 
      if (pHandle != IntPtr.Zero) 
      { 
       Success = CloseHandle(pHandle); 
       pHandle = IntPtr.Zero; 
      } 
      return Success; 
     } 
    } 
} 

Note: I tried moving WFIO.Dispose to immediately after WFIO.Close, but it made no difference. I also tried reusing the same WFIO object across iterations, with the same result.

Answer

File.ReadAllBytes reads the file and returns an array that is exactly the size of the file. That is, if the file is 94,864 bytes long, the buffer will be a byte[94864].

Your code that reads the file using Windows I/O has to pass in a buffer large enough to hold the file. You are passing a 128K buffer, so if the file is smaller than that, there will be unused space at the end.

You don't pass the length of the data that was read to your parsing method.

My best guess is that your parsing code is trying to parse the garbage at the end of the buffer, and that is what takes so long.
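Since the parsing code isn't shown, here is only a minimal sketch of what I mean. `ParseTabDelimited` is a hypothetical stand-in for your parser, and the `Array.Copy` stands in for a ReadFile call that fills only part of a 128K buffer. The point is that the parser should take the number of bytes actually read (the value `ReadBlocks` or `ReadUntilEOF` returns) rather than rely on the buffer's length.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class BufferLengthDemo
{
    // Hypothetical parser (not the asker's code): splits the first `length`
    // bytes of `buffer` into tab-delimited fields, one string[] per record.
    public static List<string[]> ParseTabDelimited(byte[] buffer, int length)
    {
        var records = new List<string[]>();
        var fields = new List<string>();
        int fieldStart = 0;
        for (int i = 0; i < length; i++)
        {
            byte b = buffer[i];
            if (b == (byte)'\t')
            {
                fields.Add(Encoding.ASCII.GetString(buffer, fieldStart, i - fieldStart));
                fieldStart = i + 1;
            }
            else if (b == (byte)'\n')
            {
                // Drop the '\r' of a CRLF pair, if present.
                int end = (i > fieldStart && buffer[i - 1] == (byte)'\r') ? i - 1 : i;
                fields.Add(Encoding.ASCII.GetString(buffer, fieldStart, end - fieldStart));
                records.Add(fields.ToArray());
                fields.Clear();
                fieldStart = i + 1;
            }
        }
        // Anything left after the last line break becomes a final record.
        if (fieldStart < length || fields.Count > 0)
        {
            fields.Add(Encoding.ASCII.GetString(buffer, fieldStart, length - fieldStart));
            records.Add(fields.ToArray());
        }
        return records;
    }

    public static void Main()
    {
        byte[] fileData = Encoding.ASCII.GetBytes("id\tname\r\n1\talpha\r\n");

        // Oversized I/O buffer, as in the question; only the first
        // fileData.Length bytes hold real data, the rest stay zero.
        byte[] buffer = new byte[128 * 1024];
        Array.Copy(fileData, buffer, fileData.Length);
        int bytesRead = fileData.Length;

        // Parsing only the bytes that were read yields the two real records.
        Console.WriteLine(ParseTabDelimited(buffer, bytesRead).Count);     // 2

        // Parsing the whole buffer also scans the unused tail and turns it
        // into an extra junk record.
        Console.WriteLine(ParseTabDelimited(buffer, buffer.Length).Count); // 3
    }
}
```

With File.ReadAllBytes the two lengths happen to be equal, which would explain why only the ReadFile path is slow.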

<sheepishly slinks away> That sounds right. I'll check.

Of course, you were right. That accounted for most of the time difference. Thanks.