2010-02-01 6 views

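How do I get the inputs of a specific form using HtmlAgilityPack? (C#, .NET)

The code explains this problem much better than I can. I have also included the other approaches I tried. If possible, please explain why those approaches do not work. Sadly, there are not many examples for HtmlAgilityPack; I am currently going through the documentation looking for more ideas.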

One thing I did notice was the .NextSibling property. I was thinking I could walk through the form with a while loop until it reaches the next form or the end of the current one.

Anyway, here is the code:

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using HtmlAgilityPack; 
using System.Collections; 

namespace ConsoleApplication1 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      string source = @" 
       <form name='form1' action='action1' method='method1' id='id1'> 
       <input type='text1.1' name='name1.1' value='value1.1' /> 
       <input type='text1.2' name='name1.2' value='value1.2' /> 
      </form> 
      <form name='form2' action='action2' method='method2' id='id2'> 
       <input type='text2.1' name='name2.1' value='value2.1' /> 
       <input type='text2.2' name='name2.2' value='value2.2' /> 
      </form> 
        "; 
      List<HtmlAttribute> formAttributes = new List<HtmlAttribute>(); // this is what I want to fill for the current form
      /** 
      * I want to end up with a list that has 
      * Name: type Value: text1.1 
      * Name: name Value: name1.1 
      * Name: value Value: value1.1 
      * Name: type Value: text1.2 
      * Name: name Value: name1.2 
      * Name: value Value: value1.2 
      * but I am ending up with the second form's values as well 
      * */ 
      HtmlDocument htmlDoc = new HtmlDocument(); 
      htmlDoc.LoadHtml(source); 

      var forms = htmlDoc.DocumentNode.Descendants("form"); 
      foreach (var form in forms) 
      { 
       Console.WriteLine(form.Attributes[0].Value); // simply writes the form name to the console to keep track of things 

       HtmlNodeCollection inputs = form.SelectNodes("/input"); // should get all the inputs in the selected form, or so I thought. This is where the problem lies. Result: shows both forms' inputs. 
       //HtmlNodeCollection inputs = form.SelectNodes("//input"); // I'm not the best at XPath, but perhaps this could make a difference? Result: no difference. 
       //var inputs = form.Elements("input"); // Maybe the inputs are referred to as elements? Result: shows no input OuterHtml at all. 
       foreach (var input in inputs) // this has all 4 inputs from both forms. I only want the 2 inputs from the selected form. 
       { 
        Console.WriteLine(input.OuterHtml); 
        // copy the input's attributes into a list 
        List<HtmlAttribute> attributes = input.Attributes.ToList<HtmlAttribute>(); 
        foreach (var att in attributes) 
        { 
         // TODO: add each attribute to formAttributes once only the selected form's inputs are returned 
        } 
       } 
       // Here comes an alternate method! Edit: didn't work :'( 
       //var inputs = form.Descendants("input"); // perhaps using the Descendants method will make a difference. Result: nope, it didn't have any items at all! 
       //IEnumerator e = inputs.GetEnumerator(); 
       //while (e.MoveNext()) 
       //{ 
       // Console.WriteLine("input: " + e.Current); 

       //} 
       Console.WriteLine(); // Simply making everything look pretty with a newline after each form name/input outerhtml display. 
      } 
      Console.Read(); 
     } 

    } 
} 

Answers

I found the answer! See the code below; the solution and the explanation are both included. :)

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using HtmlAgilityPack; 
using System.Collections; 

namespace ConsoleApplication1 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      string source = @" 
       <form name='form1' action='action1' method='method1' id='id1'> 
       <input type='text1.1' name='name1.1' value='value1.1' /> 
       <input type='text1.2' name='name1.2' value='value1.2' /> 
      </form> 
      <form name='form2' action='action2' method='method2' id='id2'> 
       <input type='text2.1' name='name2.1' value='value2.1' /> 
       <input type='text2.2' name='name2.2' value='value2.2' /> 
      </form> 
        "; 
      List<HtmlAttribute> formAttributes = new List<HtmlAttribute>(); 
      IEnumerable<HtmlNode> inputs; 
      /* 
      * The line below is the major reason that this solution "worked" and the other didn't 
      * */ 
      HtmlNode.ElementsFlags.Remove("form"); 
      /* 
      * I was going through the HtmlAgilityPack forum, and stumbled upon this little tidbit of info: 
      * 
      * "This is because by default, Forms are parsed as empty nodes - this is because forms are allowed to 
      overlap other elements in the HTML spec. 

       In other words, the following is technically legal HTML, even though it gives us developer hives: 

       <table> 
       <form> 
       <some input elements> 
       </table> 
       </form> 

       Here, the form overlaps the closing of the table and when properly rendered, will be contained inside the table. 
       Since HtmlDocument attempts to allow this as valid without automatically correcting the HTML, HtmlDocument by default 
       makes no attempt to populate the child nodes of the form. 
       Ok. All that is merely an introduction. You can get around this default behavior by adding the following line: 
       HtmlNode.ElementsFlags.Remove("form"); 
       before you make ANY use of HtmlDocument. This will allow it to parse the nodes of the form, but it sacrifices 
       the ability of the form to overlap other nodes. It will force the form to be closed properly." 
      * 
      * HtmlAgilityPack didn't put the inputs as the childnode of each form because of "technically legal HTML" that could mess things up a bit, 
      * so the only thing I had to do is remove the element flag! Enjoy the code below, it should be pretty self explanatory. 
      * */ 

      HtmlDocument htmlDoc = new HtmlDocument(); 
      htmlDoc.OptionOutputAsXml = true; 
      htmlDoc.OptionAutoCloseOnEnd = true; 
      htmlDoc.LoadHtml(source); 

      var forms = htmlDoc.DocumentNode.Descendants("form"); 
      foreach (var form in forms) 
      { 
       // Match on the exact element name; Contains("input") would also match any other tag whose name merely contains "input". 
       inputs = form.ChildNodes 
        .Where(a => a.Name == "input"); // woo hoo, finally figuring out what LINQ is. Sort of like MySQL when I was coding PHP! 

       Console.WriteLine(form.Attributes[0].Value + " attributes:" + Environment.NewLine + "------------------"); 
       foreach (var input in inputs) 
       { 
        IEnumerable<HtmlAttribute> attributes; 
        attributes = input.Attributes; 
        foreach (var att in attributes) 
        { 
         Console.WriteLine("Name: " + att.Name + Environment.NewLine 
           + "Value: " + att.Value + Environment.NewLine); 
         formAttributes.Add(att); 
        } 
       } 
       Console.WriteLine(); // Simply making everything look pretty with a newline after each form name/input outerhtml display. 
      } 
      Console.Read(); 
     } 

    } 
} 
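
As for why the attempts in the question returned the wrong nodes: an XPath expression that starts with "/" or "//" is evaluated from the document root, no matter which node SelectNodes is called on, so form.SelectNodes("/input") and form.SelectNodes("//input") both search the whole document. And, per the forum explanation quoted above, the default parsing leaves each form without child nodes, which is why Elements("input") and Descendants("input") came back empty. Below is a minimal sketch of the same fix using a relative XPath (".//input") instead of LINQ. The class name RelativeXPathSketch and the choice of form1 are only for illustration, and whether the ElementsFlags line is still required may depend on the HtmlAgilityPack version in use.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical class name, used only for this sketch.
class RelativeXPathSketch
{
    static void Main()
    {
        const string source = @"
<form name='form1' action='action1' method='method1' id='id1'>
  <input type='text1.1' name='name1.1' value='value1.1' />
  <input type='text1.2' name='name1.2' value='value1.2' />
</form>
<form name='form2' action='action2' method='method2' id='id2'>
  <input type='text2.1' name='name2.1' value='value2.1' />
  <input type='text2.2' name='name2.2' value='value2.2' />
</form>";

        // Same workaround as in the answer: let <form> keep its child nodes.
        // This has to run before the document is loaded.
        HtmlNode.ElementsFlags.Remove("form");

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(source);

        // Pick a single form by its name attribute.
        HtmlNode form = htmlDoc.DocumentNode.SelectSingleNode("//form[@name='form1']");
        if (form == null)
        {
            Console.WriteLine("form1 not found");
            return;
        }

        // ".//input" is relative to this form; "//input" would search the whole document.
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection inputs = form.SelectNodes(".//input");
        if (inputs == null)
        {
            Console.WriteLine("no inputs found under form1");
            return;
        }

        foreach (HtmlNode input in inputs)
        {
            foreach (HtmlAttribute att in input.Attributes)
            {
                Console.WriteLine("Name: " + att.Name + " Value: " + att.Value);
            }
        }
    }
}
```

The null checks matter here: with the default form parsing, ".//input" would match nothing under the form and SelectNodes would return null, so a bare foreach over the result would throw.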

I can't vote up my own answer? I spent 8 hours on this problem, if not more. LOL, I was so ecstatic. – Codygman

Good to know. Thanks for posting the answer. – Dror