
Extracting Anchor Values Hidden In Div Tags

From an HTML page I need to extract the value of the v parameter from every anchor link. Each anchor link is nested inside some five div tags.

Solution 2:

I would rather load the HTML with DOMDocument and then query it with SimpleXML and XPath:

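<?php
// Get your page HTML string
$html = file_get_contents('xx.html');

// As per comment by Gordon: suppress warnings caused by invalid markup
libxml_use_internal_errors(true);

// Parse the HTML with DOMDocument, then convert it to a SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// Find all <a> nodes whose href contains "v=", however deeply they are nested
$anchors = $xml->xpath('//a[contains(@href, "v=")]');

$vd = []; // collected values of v
foreach ($anchors as $a)
{
    $href = (string)$a['href'];
    $url = parse_url($href);

    // Skip links without a query string ("v=" may appear only in the path or fragment)
    if (!isset($url['query']))
    {
        continue;
    }
    parse_str($url['query'], $params);

    // contains() also matches names such as "rev=", so check that v itself is present
    if (isset($params['v']))
    {
        $vd[] = $params['v']; // push into array
    }
}

// Clear invalid markup error buffer
libxml_clear_errors();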
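For comparison, here is a self-contained sketch of the same idea that queries the DOM with DOMXPath directly instead of converting it to SimpleXML. The helper names vFromHref and extractVValues and the sample markup are hypothetical, made up for illustration. Because //a searches the descendant axis, it finds anchors no matter how many div tags wrap them, and checking the parsed query string for a real v key filters out false matches such as rev=.

```php
<?php
// Sketch (hypothetical helpers): collect the "v" query parameter from every <a>
// element, however deeply nested, using DOMXPath instead of SimpleXML.

// Return the value of the "v" query parameter in $href, or null if it is absent.
function vFromHref(string $href): ?string
{
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or a URL parse_url() cannot handle
    }
    parse_str($query, $params);
    // Ignore array-style parameters such as v[]=...
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

// Return every "v" value found in anchor hrefs within $html.
function extractVValues(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings from sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($doc);
    $values = [];
    // "//a" uses the descendant axis, so the number of wrapping <div>s is irrelevant
    foreach ($xpath->query('//a[@href]') as $a) {
        // getAttribute() returns the decoded value, so "&amp;" arrives as "&"
        $v = vFromHref($a->getAttribute('href'));
        if ($v !== null) {
            $values[] = $v;
        }
    }
    return $values;
}

// Hypothetical sample markup: one anchor buried five <div>s deep, plus two non-matches
$sample = '<div><div><div><div><div>'
    . '<a href="watch?v=abc123&amp;list=x">one</a>'
    . '</div></div></div></div></div>'
    . '<a href="page?rev=2">no v</a>'
    . '<a href="/about">no query</a>';

print_r(extractVValues($sample));
```

Querying //a[@href] and filtering in PHP keeps the XPath expression simple. The contains(@href, "v=") predicate works as a pre-filter too, but on its own it also matches names such as rev=, so the parsed query string still has to be checked for v.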
