Viewing 2 posts - 1 through 2 (of 2 total)
March 22, 2024 at 11:45 am #10656, tvanc (Participant)
I was writing a tool to convert HTML tables to CSV and noticed some bizarre behavior. Given this code:
```php
<?php
$html = <<<HTML
<table>
  <tr><td>A</td><td>Rose</td></tr>
</table>
<h1>Leave me behind</h1>
<table>
  <tr><td>By</td><td>Any</td></tr>
</table>
<table>
  <tr><td>Other</td><td>Name</td></tr>
</table>
HTML;

$dom = new \DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

foreach ($dom->getElementsByTagName('table') as $table) {
    foreach ($table->getElementsByTagName('tr') as $row) {
        echo trim($row->nodeValue) . PHP_EOL;
    }
}
```
I would expect output like this:
```
ARose
ByAny
OtherName
```
But what I get is this:
```
ARose
ByAny
OtherName
ByAny
OtherName
```
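I get the same result if I omit the first closing tag. It appears DOMDocument is nesting the second and third `<table>` inside the first: `getElementsByTagName()` returns all descendant elements, so the first table also reports the rows of the two nested tables. Indeed, if I use XPath to get only the immediate `<tr>` children of each table, I get the correct output: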
```php
$xpath = new \DOMXPath($dom);
foreach ($dom->getElementsByTagName('table') as $table) {
    // './tr' matches only the <tr> elements that are direct children of $table
    foreach ($xpath->query('./tr', $table) as $row) {
        echo trim($row->nodeValue) . PHP_EOL;
    }
}
```
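For the CSV use case, the same direct-children idea can be pushed down to the cells. A minimal sketch follows: `tableCells()` is a hypothetical helper, and the `./thead/tr`, `./tbody/tr` and `./tfoot/tr` branches are an added assumption for markup that uses those wrappers. It returns one array of rows per table and does not attribute rows of a nested table to the outer one.

```php
<?php
// Hypothetical helper: returns one array per <table>, each holding that
// table's rows as arrays of trimmed cell texts. Only <tr> elements that are
// direct children of the table (or of its thead/tbody/tfoot) are visited, so
// rows belonging to a nested table are not attributed to the outer one.
function tableCells(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $tables = [];
    foreach ($xpath->query('//table') as $table) {
        $rows = [];
        // XPath returns the union in document order.
        $rowPath = './tr | ./thead/tr | ./tbody/tr | ./tfoot/tr';
        foreach ($xpath->query($rowPath, $table) as $row) {
            $cells = [];
            foreach ($xpath->query('./td | ./th', $row) as $cell) {
                $cells[] = trim($cell->textContent);
            }
            $rows[] = $cells;
        }
        $tables[] = $rows;
    }
    return $tables;
}

// Usage: a table legitimately nested inside a cell of another table.
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>A</td><td><table><tr><td>x</td></tr></table></td></tr></table>');
echo json_encode(tableCells($dom)), PHP_EOL;
```

Here the outer table reports a single row, whereas calling `getElementsByTagName('tr')` on it would return two, because the inner table's row is also a descendant.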
March 22, 2024 at 11:58 am #10657, ken-lee (Participant)

Enclose your `$html` markup in `<body>` and `</body>` so that the fragment has a single root element.
Revised code (note: I commented out the `$stream` lines):

```php
<?php
$html = <<<HTML
<body>
<table>
  <tr><td>A</td><td>Rose</td></tr>
</table>
<h1>Leave me behind</h1>
<table>
  <tr><td>By</td><td>Any</td></tr>
</table>
<table>
  <tr><td>Other</td><td>Name</td></tr>
</table>
</body>
HTML;

$dom = new \DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

$tables = $dom->getElementsByTagName('table');
// $stream = \fopen('php://output', 'w+');
for ($i = 0; $i < $tables->length; ++$i) {
    $rows = $tables->item($i)->getElementsByTagName('tr');
    for ($j = 0; $j < $rows->length; ++$j) {
        echo trim($rows->item($j)->nodeValue) . "<br><br>";
    }
}
// fclose($stream);
?>
```
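Since the original goal was CSV output, here is a minimal sketch that applies the `<body>` wrapper inside a helper and writes one CSV line per row with PHP's `fputcsv()`. The helper name `htmlTablesToCsv()` is hypothetical, and writing to `php://memory` is only for demonstration.

```php
<?php
// Hypothetical helper: wraps an HTML fragment in a single <body> root before
// parsing with LIBXML_HTML_NOIMPLIED, then writes each <tr> as a CSV line.
function htmlTablesToCsv(string $fragment, $stream): void
{
    $dom = new DOMDocument();
    // A single root element, as suggested above, keeps sibling tables apart.
    $dom->loadHTML('<body>' . $fragment . '</body>', LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

    foreach ($dom->getElementsByTagName('table') as $table) {
        foreach ($table->getElementsByTagName('tr') as $row) {
            $cells = [];
            foreach ($row->getElementsByTagName('td') as $cell) {
                $cells[] = trim($cell->textContent);
            }
            // Separator, enclosure and escape passed explicitly; an empty
            // escape string (PHP >= 7.4) disables backslash escaping.
            fputcsv($stream, $cells, ',', '"', '');
        }
    }
}

$html = <<<HTML
<table>
  <tr><td>A</td><td>Rose</td></tr>
</table>
<h1>Leave me behind</h1>
<table>
  <tr><td>By</td><td>Any</td></tr>
</table>
<table>
  <tr><td>Other</td><td>Name</td></tr>
</table>
HTML;

$out = fopen('php://memory', 'w+');
htmlTablesToCsv($html, $out);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

Note that `getElementsByTagName()` still visits the rows of tables that are genuinely nested in the markup; if real input can contain nested tables, the direct-child `./tr` XPath query from the first post is the safer row selector.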
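Alternatively, drop the `LIBXML_HTML_NOIMPLIED` flag so that libxml adds the implied `<html>` and `<body>` elements itself, i.e. change

```php
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
```

to

```php
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
```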
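To compare the two flag combinations on a particular PHP/libxml build, a small diagnostic sketch can count the `<tr>` descendants of each `<table>`. `rowsPerTable()` is a hypothetical helper; if the tables are parsed as siblings, each one reports a single row.

```php
<?php
// Hypothetical diagnostic: number of <tr> descendants per <table> for a
// given set of loadHTML() flags. A table that swallowed its later siblings
// would report more rows than it declares itself.
function rowsPerTable(string $html, int $flags = 0): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, $flags);
    $counts = [];
    foreach ($dom->getElementsByTagName('table') as $table) {
        $counts[] = $table->getElementsByTagName('tr')->length;
    }
    return $counts;
}

$html = '<table><tr><td>A</td><td>Rose</td></tr></table>'
    . '<h1>Leave me behind</h1>'
    . '<table><tr><td>By</td><td>Any</td></tr></table>'
    . '<table><tr><td>Other</td><td>Name</td></tr></table>';

// Implied <html>/<body> added by libxml (the alternative suggested above).
echo implode(',', rowsPerTable($html, LIBXML_HTML_NODEFDTD)), PHP_EOL;
// Original flags, for comparison on your own build.
echo implode(',', rowsPerTable($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED)), PHP_EOL;
```

With the implied elements added, each table should report one row. Going by the output in the first post, the `NOIMPLIED` line would read `3,1,1` on the affected setup; the tree built for multiple top-level elements may differ between libxml2 versions.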