대용량 텍스트 파일의 반전 계산하기

코드의 목적은 전체 텍스트 파일을 찾는 것입니다. 배열에 대한 반전의 수. 내 코드가 성공적으로 작동합니다. 반전 카운트가 15 인 6 개 요소 (가장 높은 것으로부터 시작하여 역순으로 모든 요소)에 대해 성공적으로 테스트되었습니다. 또한 반전 횟수 = 45 인 10 개 요소 (모든 요소가 가장 높은 것으로부터 시작하여 역순으로)에 대해 성공적으로 테스트되었습니다. 그러나 100k 정수가 들어있는 큰 파일인데 거의 25 초가 걸립니다. 예상 되나요? 친절하게 제안하거나 실행 시간을 줄 일 수 있습니까? 기존의 병합 정렬 알고리즘 (즉, 역수의 총 개수를 계산하는 행)에서 약간의 조정을했습니다. 전체 실행 시간을 더 줄이려면 어떻게해야합니까?대용량 텍스트 파일의 반전 계산하기

def mergeSort(final_list): 
    global total_count 
    if len(final_list)>1: 
     mid_no=len(final_list)//2 
     left_half=final_list[:mid_no] 
     right_half=final_list[mid_no:] 

     mergeSort(left_half) 
     mergeSort(right_half)  

     '''Below code is for merging the lists''' 
     i=j=k=0 #i is index for left half, j for the right half and k for the resultant list 
     while i<len(left_half) and j<len(right_half): 
      if left_half[i] < right_half[j]: 
       final_list[k]=left_half[i] 
       i+=1 
       k+=1 

      else: 
       final_list[k]=right_half[j] 

       print 'total count is' 
       print total_count 
       #total_count+=len(left_half)-i 
       total_count+=len(left_half[i:]) 
       print 'total_count is ' 
       print total_count 

       print 'pairs are ' 
       print str(left_half[i:])+' with '+str(right_half[j]) 
       j+=1 
       k+=1 




     while i<len(left_half): 
      final_list[k]=left_half[i] 
      k+=1 
      i+=1 
     while j<len(right_half): 
      final_list[k]=right_half[j] 
      j+=1 
      k+=1 

     '''Code for list merge ends''' 

#temp_list=[45,21,23,4,65] 
#temp_list=[1,5,2,3,4,6] 
#temp_list=[6,5,4,3,2,1] 
#temp_list=[1,2,3,4,5,6] 
#temp_list=[10,9,8,7,6,5,4,3,2,1] 
#temp_list=[1,22,3,4,66,7] 
temp_list=[] 
f=open('temp_list.txt','r') 
for line in f: 
    temp_list.append(int(line.strip())) 

print 'list is ' 
print temp_list 
print 'list ends' 
print temp_list[0] 
print temp_list[-1] 
'''import time 
time.sleep(1000) 
print 'hhhhhhhhhh' 
''' 



total_count=0 
mergeSort(temp_list) 

print temp_list

출처

2016-08-03 fsociety

내가 재현하여 결과와 프로파일 러에 의해 i

는 그것을 발견 (및 프로파일 검증)

 #total_count+=len(left_half[i:]) 
     total_count += len(left_half) - i

left_half는 [내가 :]가 재귀 함수의 메인 루프의 여러 요소를 여러 번 복사하여 새로운리스트를 작성 . 그것은 접합의 영리한 사용이었다, 그러나 부작용은 당신의 성과를 죽이고있다. 나는 그것을 프로파일 기능을 고장 방법은 다음과

는 다음과 같습니다

def so_merge (final_list, left_half, right_half): 
    global total_count 
    i=j=k=0 #i is index for left half, j for the right half and k for the resultant list 
    while i<len(left_half) and j<len(right_half): 
     if left_half[i] < right_half[j]: 
      final_list[k]=left_half[i] 
      i+=1 
      k+=1 
     else: 
      final_list[k]=right_half[j] 
      count1 = get_incriment_bad(left_half, i) 
      count2 = get_incriment_good(left_half, i) 
      if count1 != count2: 
       raise ValueError 
      total_count += count1 
      j+=1 
      k+=1 
    finish_left(final_list, left_half, i, k) 
    finish_right(final_list, right_half, j, k)

을하고 ([내가 :] left_half) 결과는 렌지고 19.574 초 동안 보여

ncalls tottime percall cumtime percall filename:lineno(function) 
199999/1 0.805 0.000 29.562 29.562 week1.py:124(so_mergesort) 
99999 7.496 0.000 28.735 0.000 week1.py:104(so_merge) 
776644 19.512 0.000 19.574 0.000 week1.py:101(get_incriment_bad) 
776644 0.839 0.000 0.895 0.000 week1.py:98(get_incriment_good) 
5403164 0.382 0.000 0.382 0.000 {len} 
99999 0.273 0.000 0.286 0.000 week1.py:92(finish_right) 
99999 0.255 0.000 0.266 0.000 week1.py:86(finish_left) 
    1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

출처

2016-08-03 22:19:05

우연히 leftlen = len (left_half) 및 rightlen = len (right_half)를 저장하면 프로파일 러가 len을 가져 오는 것을 보여 주지만 코드가 수정 된 후 7 초에서 .382 –

명확하게 설명해 주셔서 감사합니다. – fsociety

대용량 텍스트 파일의 반전 계산하기

답변

관련 문제