Skip to content Skip to sidebar Skip to footer

How To Extract Innermost Table From Html File With The Help Of The Html Agility Pack?

I am parsing the tabular information from the html file with the help of the html agility pack. Now I can do it and it works. But when the table what I want to extract is inner mo

Solution 1:

Load the document as a HtmlDocument. Then use an XPath query to find a table that contains no other tables and which has a td in the first row containing "Name".

The XPath implementation is the standard .NET one from System.Xml.XPath, so any documentation about using XPath with XmlDocument will be applicable.

HtmlDocumentdoc=newHtmlDocument();
doc.Load("file.html");
HtmlNodeel= (HtmlNode) doc.DocumentNode.SelectSingleNode("//table[not(descendant::table) and tr[1]/td['NAME' = normalize-space()]]");

If the "Name" column was fixed, you could use something like 'Name' = normalize-space(tr[1]/td[2]).

To find a table based on several column names, but not the inner most table condition.

HtmlNodeel= (HtmlNode) doc.DocumentNode.SelectSingleNode("//table[tr[1]/td['NAME' = normalize-space()] and tr[1]/td['ADDRESS' = normalize-space()]]");

Solution 2:

var table = doc.DocumentNode.SelectSingleNode("//table [not(descendant::table) and tr[1]/td[normalize-space()='ADDRESS'] ]");

Post a Comment for "How To Extract Innermost Table From Html File With The Help Of The Html Agility Pack?"