2017-09-18

Extract the paragraph between two identical titles that contains specific words

My text file contains paragraphs like the following.

summary 

A result oriented and dedicated professional with three years’ experience in Software Development. A proactive individual with a logical approach to challenges, performs effectively even within a highly pressurised working environment. 

summary 

Oct 28th, 2010 – Till date Cognizant Technology Solutions  


Project #1 

Title   Wealth Passport – R7.3 
Client     Northern Trust 
Operating System Windows XP 
Technologies  J2EE, JSP, Struts, Oracle, PL/SQL 
Team Size  3 
Role   Team Member 
Period     22nd Aug’ 2013 - Till Date  
Project Description 
Wealth Passport R7.3 release aims at enhancements in four projects SGY, PMM, WPA and WPX. This primarily involves analysing existing issues in the four applications and enhancements to some of the functionalities. 
Role and Responsibilities 
Handled dockets in SGY and PMM applications. 
Done root cause analysis to existing issues in a short span of time. 
Designed and developed enhancements in PMM application. 
Preparing Unit Test cases for the developed Java modules and executing them. 


Project #2 
Title   PFS Development – WP Filecabinet and R7.2 
Client     Northern Trust 
Operating System Windows XP 
Technologies  J2EE, JSP, Struts, Weblogic Portal, Oracle, PL/SQL, UNIX, Hibernate, Spring, DOJO 
Team Size  1 
Role   Team Member – JavaEE Developer 
Period     18th June’ 2013 – 21st Aug’ 2013 
Project Description 
PFS Development project is to provide the development services for PFS capital projects: Wealth Passport, Private Passport 6.0 and Private Passport 7.0 
Wealth Passport Filecabinet provides functionality for users to store their files on our system. This enables users to create folders, upload files and view the uploaded files. Batch upload/delete option is also available. Deleted files will be moved to Waste Bucket, from where users can restore should they wish. This project aims at improving the performance of Filecabinet which was mandated by increasing customer base and files handled by the system. 

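Now I want to extract only the summary section that contains words such as "Project" and "Team Size", without extracting the other summary sections. I tried the code below, expecting the result to be saved in a text file such as d.txt, but it extracts the content of every summary section.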

import re 
import os 
with open ('9.txt', encoding='latin-1') as infile, open ('d.txt','w',encoding='latin-1') as outfile : 
    copy = False
    for line in infile:
        if line.strip() == 'summary':
            re.compile('\r\nproject*\r\n')
            copy = True
        elif line.strip() == "summary":
            copy =False
        elif copy:
            outfile.write(line)
        #fh = open("d.txt",'r')
        contents = fh.read()
        len(contents)

Here is the content I want to extract:

summary 

    Oct 28th, 2010 – Till date Cognizant Technology Solutions  


    Project #1 

    Title   Wealth Passport – R7.3 
    Client     Northern Trust 
    Operating System Windows XP 
    Technologies  J2EE, JSP, Struts, Oracle, PL/SQL 
    Team Size  3 
    Role   Team Member 
    Period     22nd Aug’ 2013 - Till Date  
    Project Description 
    Wealth Passport R7.3 release aims at enhancements in four projects SGY, PMM, WPA and WPX. This primarily involves analysing existing issues in the four applications and enhancements to some of the functionalities. 
    Role and Responsibilities 
    Handled dockets in SGY and PMM applications. 
    Done root cause analysis to existing issues in a short span of time. 
    Designed and developed enhancements in PMM application. 
    Preparing Unit Test cases for the developed Java modules and executing them. 


    Project #2 
    Title   PFS Development – WP Filecabinet and R7.2 
    Client     Northern Trust 
    Operating System Windows XP 
    Technologies  J2EE, JSP, Struts, Weblogic Portal, Oracle, PL/SQL, UNIX, Hibernate, Spring, DOJO 
    Team Size  1 
    Role   Team Member – JavaEE Developer 
    Period     18th June’ 2013 – 21st Aug’ 2013 
    Project Description 
    PFS Development project is to provide the development services for PFS capital projects: Wealth Passport, Private Passport 6.0 and Private Passport 7.0 
    Wealth Passport Filecabinet provides functionality for users to store their files on our system. This enables users to create folders, upload files and view the uploaded files. Batch upload/delete option is also available. Deleted files will be moved to Waste Bucket, from where users can restore should they wish. This project aims at improving the performance of Filecabinet which was mandated by increasing customer base and files handled by the system. 
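Can you control the format of the text file? If so, declaring it in a file format such as 'json', 'txt' or 'csv' (to name a few) would make it much easier to parse.

What is the expected result in 'd.txt'?

The summary section that contains the word "Project".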

Answers

To extract every summary section that contains all of the words you are interested in, you can split the file on the summary heading and keep only the matching parts:

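split_on = 'summary\n\n'                   # heading line followed by a blank line
must_contain = ['Project', 'Team Size']    # every word must appear in a section

with open('9.txt') as f_input, open('d.txt', 'w') as f_output:
    # Split the whole file into the parts that follow each heading
    for part in f_input.read().split(split_on):
        # Keep only the parts that contain all of the required words
        if all(text in part for text in must_contain):
            f_output.write(split_on + part)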
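If the heading lines are not always exactly `summary` followed by one blank line (for example, they carry trailing spaces or Windows line endings), a plain `split` may miss them. The following is a minimal sketch of a more tolerant variant, assuming a section starts at a line that contains only the heading and runs until the next such line or the end of the text. The function name `extract_sections` and its parameters are hypothetical, not part of the original answer.

```python
import re


def extract_sections(text, must_contain, heading="summary"):
    """Return the sections of `text` that contain every string in `must_contain`.

    Assumption: a section starts at a line holding only `heading`
    (surrounding spaces ignored) and ends at the next such line or at the
    end of the text.
    """
    # Match the heading on a line of its own, tolerating trailing spaces and \r\n.
    pattern = re.compile(
        r"^[ \t]*" + re.escape(heading) + r"[ \t]*\r?$", re.MULTILINE
    )
    starts = [m.start() for m in pattern.finditer(text)]
    ends = starts[1:] + [len(text)]
    sections = [text[s:e] for s, e in zip(starts, ends)]
    # Keep only the sections in which all required words occur.
    return [sec for sec in sections if all(word in sec for word in must_contain)]


if __name__ == "__main__":
    sample = (
        "summary \n\nA result oriented professional.\n\n"
        "summary \n\nProject #1\nTeam Size  3\n"
    )
    for section in extract_sections(sample, ["Project", "Team Size"]):
        print(section)
```

For many input files, the same function could be called in a loop, for instance over the paths returned by `glob.glob('*.txt')`, with a different `must_contain` list per run.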
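I have many files to process, checking each one for particular words and extracting the matching section, and not all of the files look like the one above.

I wanted to extract an arbitrary section from an arbitrary input text file, given a set of words such as {project, teamsize, ...}.

I have updated the script to filter all sections that contain the required list of words.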

The second conditional statement never executes, because it has the same condition as the first one. This means copy will always be True after the first instance of summary:

if line.strip() == 'summary': 
    re.compile('\r\nproject*\r\n') 
    copy = True 
elif line.strip() == "summary": 
    copy =False 

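What I would recommend is to have a single statement that picks up the "summary" tag (I assume it is meant to mark the start/end of a block) and toggles copy. To flip a boolean to its inverse, you can simply write: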

a = True 
a = not a 
# a is now False 

For example:

if line.strip() == 'summary': 
    copy = not copy 
elif copy: 
    outfile.write(line)
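The toggle above writes the text between each pair of summary lines, but on its own it does not check for the required words. Below is a minimal sketch of a line-by-line variant that buffers each section and emits it only when it contains every required word. It assumes each summary line opens a new section that runs to the next summary line or the end of the input; the names `iter_matching_sections` and `_matches` are hypothetical.

```python
def _matches(section_lines, must_contain):
    """Return True if every required word occurs somewhere in the section."""
    text = "".join(section_lines)
    return all(word in text for word in must_contain)


def iter_matching_sections(lines, must_contain, heading="summary"):
    """Yield each heading-delimited section that contains all required words."""
    current = None  # lines of the section being collected; None before the first heading
    for line in lines:
        if line.strip() == heading:
            # A heading closes the previous section and opens a new one.
            if current is not None and _matches(current, must_contain):
                yield "".join(current)
            current = [line]
        elif current is not None:
            current.append(line)
    # The last section has no closing heading, so check it at end of input.
    if current is not None and _matches(current, must_contain):
        yield "".join(current)


if __name__ == "__main__":
    sample_lines = [
        "summary \n", "\n", "A proactive individual.\n", "\n",
        "summary \n", "\n", "Project #1\n", "Team Size  3\n",
    ]
    for section in iter_matching_sections(sample_lines, ["Project", "Team Size"]):
        print(section)
```

Because the function accepts any iterable of lines, an open file object such as `infile` from the question could be passed in directly, and each yielded section written to `outfile`.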