Finding a specific link in an HTML document using C# and HTML Agility Pack

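I am trying to parse an HTML document to retrieve a specific link within the page. This may not be the best approach, but I am trying to find the node I need by its inner text. However, that text occurs in two places in the HTML: the footer and the navigation bar. I need the link from the navigation bar, and the footer comes first in the HTML. Here is my code: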

public string findCollegeURL(string catalog, string college) 
    { 
     //Find college 
     HtmlDocument doc = new HtmlDocument(); 
     doc.LoadHtml(catalog); 
     var root = doc.DocumentNode; 
     var htmlNodes = root.DescendantsAndSelf(); 

     // Search through fetched html nodes for relevant information 
     int counter = 0; 
     foreach (HtmlNode node in htmlNodes) { 
      string linkName = node.InnerText; 
      if (linkName == colleges[college] && counter == 0) 
      { 
       counter++; 
       continue; 
      } 
      else if(linkName == colleges[college] && counter == 1) 
      { 
       string targetURL = node.Attributes["href"].Value; //"found it!"; // 
       return targetURL; 
      }/* */ 
     } 

     return "DID NOT WORK"; 
    }

The program enters the else-if branch, but when it tries to retrieve the link I get a NullReferenceException. Why is that? How can I retrieve the link I need? This is the HTML in the document that I am trying to access:

<tr class> 
     <td id="acalog-navigation"> 
      <div class="n2_links" id="gateway-nav-current">...</div> 
      <div class="n2_links">...</div> 
      <div class="n2_links">...</div> 
      <div class="n2_links">...</div> 
      <div class="n2_links">...</div> 
       <a href="/content.php?catoid=10&navoid=1210" class="navbar" tabindex="119">College of Science</a> 
      </div>

This is the link I want: /content.php?catoid=10&navoid=1210

Instead of writing a lot of code, I would use XPath to find the link:

var link = doc.DocumentNode.SelectSingleNode("//a[text()='College of Science']") 
       .Attributes["href"].Value;

Source

2016-11-17 Andrea S.

If several links have the same text and you want the second one:

var link = doc.DocumentNode.SelectSingleNode("(//a[text()='College of Science'])[2]") 
       .Attributes["href"].Value;

The LINQ version:

var links = doc.DocumentNode.Descendants("a") 
       .Where(a => a.InnerText == "College of Science") 
       .Select(a => a.Attributes["href"].Value) 
       .ToList();
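
One possible reason for the NullReferenceException in the question, offered as an assumption rather than something confirmed in this thread: DescendantsAndSelf() returns every node, including text nodes and wrapping elements, so a node whose InnerText matches is not necessarily an <a> element. For such a node Attributes["href"] is null, and reading .Value throws. The sketch below restricts the search to <a> elements inside a container with a given id and reads href defensively. The LinkFinder class, the FindLinkHref method, and its parameters are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LinkFinder
{
    // Hypothetical helper: returns the href of the first <a> inside the
    // element with the given id whose text equals linkText, or null if
    // no such link (or no such container) exists.
    public static string FindLinkHref(string html, string containerId, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the container, e.g. the <td id="acalog-navigation"> from the question.
        var container = doc.DocumentNode
            .Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == containerId);
        if (container == null)
        {
            return null;
        }

        return container
            .Descendants("a")                                  // anchors only, no text nodes
            .Where(a => a.InnerText.Trim() == linkText)
            .Select(a => a.Attributes["href"]?.Value)          // null when href is missing
            .FirstOrDefault(href => href != null);
    }
}
```

With the HTML from the question, a call might look like LinkFinder.FindLinkHref(catalog, "acalog-navigation", "College of Science"). Scoping to the navigation container avoids relying on whether the footer or the navigation bar comes first in the document, and the ?. operator requires C# 6, which Visual Studio 2015 supports.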

Source

2016-11-17 21:17:37

I get an error with that: 'HtmlNode' does not contain a definition for 'SelectSingleNode'. –

@AndreaS. Which version are you using? What is your environment? –

Visual Studio 2015 –
