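How do I extract the XML from an XFA PDF document in Java using iText 7 (or another library)?

I'm trying to extract the XML data from XFA PDF forms accurately, using Java and iText 7, so that I can parse (and modify) it. However, every XFA file I try gives me the same generic data instead of the form's actual contents.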
I know this must be possible, because the iText RUPS tool can do it, but I've been going around in circles for days now.
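```java
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.xfa.XfaForm;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.TransformerException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Parse {
    private PdfDocument pdf;
    private PdfAcroForm form;
    private XfaForm xfa;
    private Document domDocument;
    private Map<Integer, String> data;
    private int numberOfPages;
    private String pdfText = ""; // start empty so "+=" does not prepend "null"

    public void openPdf(String src, String dest) throws IOException, TransformerException {
        PdfReader reader = new PdfReader(src);
        reader.setUnethicalReading(true); // open even if owner-password permissions forbid it
        pdf = new PdfDocument(reader, new PdfWriter(dest));
        form = PdfAcroForm.getAcroForm(pdf, true);
        data = new HashMap<Integer, String>();
        numberOfPages = pdf.getNumberOfPages();

        // Extract the visible text of every page
        for (int page = 1; page <= numberOfPages; page++) {
            System.out.println("Reading page: " + page + " -----------------");
            PdfPage currentPage = pdf.getPage(page);
            String textFromPage = PdfTextExtractor.getTextFromPage(currentPage);
            data.put(page, textFromPage);
            pdfText += currentPage + ":" + "\n" + textFromPage + "\n";
        }

        // Inspect the XFA packets (template, datasets, config, ...)
        xfa = form.getXfaForm();
        domDocument = xfa.getDomDocument();
        Map<String, Node> map = xfa.extractXFANodes(domDocument);
        System.out.println("The template node = " + map.get("template").toString() + "\n");
        System.out.println("Dom document = " + domDocument.toString() + "\n");
        System.out.println("In map form = " + map.toString() + "\n");
        System.out.println("pdfText = " + pdfText + "\n");

        // List the children of the datasets packet
        Node node = xfa.getDatasetsNode();
        NodeList list = node.getChildNodes();
        for (int i = 0; i < list.getLength(); i++) {
            System.out.println("Get Child Nodes Output = " + list.item(i) + "\n");
        }

        pdf.close(); // flush and release the reader/writer
    }
}
```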
This is the typical output:
Reading page: 1 -----------------
The template node = [template: null]
Dom document = [#document: null]
In map form = {template=[template: null], form=[form: null], xfdf=[xfdf: null], xmpmeta=[x:xmpmeta: null], datasets=[xfa:datasets: null], config=[config: null], PDFSecurity=[PDFSecurity: null]}
pdfText = [email protected]:
> Please wait...
>
> If this message is not eventually replaced by the proper contents of
> the document, your PDF viewer may not be able to display this type of
> document. You can upgrade to the latest version of Adobe Reader
> for Windows®, Mac, or Linux® by visiting
> http://www.adobe.com/go/reader_download. For more assistance with
> Adobe Reader visit http://www.adobe.com/go/acrreader. Windows is
> either a registered trademark or a trademark of Microsoft Corporation
> in the United States and/or other countries. Mac is a trademark of
> Apple Inc., registered in the United States and other countries. Linux
> is the registered trademark of Linus Torvalds in the U.S. and other
> countries.
Get Child Nodes Output = [xfa:data: null]
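The `[name: null]` lines are what the JDK's built-in DOM returns from `Node.toString()`: it prints `[nodeName: nodeValue]`, and an element's node value is always `null`. They do not mean the packets are empty. The "Please wait..." text is the placeholder page that dynamic XFA forms carry for non-XFA viewers, which is why page text extraction finds nothing else. A minimal sketch, using only the JDK, of serializing a DOM node to its actual XML with an identity `javax.xml.transform.Transformer`. The class and helpers (`XfaXml`, `parse`, `findDataElement`, `toXmlString`) are hypothetical names, and the sample `datasets` string is an illustrative stand-in. In the program above, the node would instead come from `xfa.getDatasetsNode()`. The lookup assumes a namespace-aware DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class: JDK-only DOM utilities for XFA packets.
class XfaXml {
    // Namespace bound to the "xfa" prefix on xfa:datasets / xfa:data
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Parses an XML string into a namespace-aware DOM (assumption: input is well-formed). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes a node and its whole subtree to XML via an identity transform. */
    static String toXmlString(Node node) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    /** Returns the first xfa:data child element of a datasets node, or null if absent. */
    static Element findDataElement(Node datasets) {
        for (Node c = datasets.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && XFA_DATA_NS.equals(c.getNamespaceURI())
                    && "data".equals(c.getLocalName())) {
                return (Element) c;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative stand-in for the datasets packet of a real XFA form
        String sample = "<xfa:datasets xmlns:xfa=\"" + XFA_DATA_NS + "\">"
                + "<xfa:data><form1><name>Bryan</name></form1></xfa:data>"
                + "</xfa:datasets>";
        Node datasets = parse(sample).getDocumentElement();
        System.out.println(datasets); // DOM toString(): name and (null) value, not the XML
        System.out.println(toXmlString(findDataElement(datasets))); // the actual XML subtree
    }
}
```

Because `toXmlString` accepts any `Node`, the same call works on `domDocument` or on any entry of the map returned by `extractXFANodes` to dump that whole packet. Changes made through the DOM (for example, setting an element's text content) are visible in the next serialization.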
Exactly what I did! It works perfectly! Thank you! – Bryan