The other day, I was tasked with building a data scraper. Having never built such a contraption, I naturally turned to the Internets for preexisting code. I was horrified with what I found.
The “free” PHP scripts (that’s “free” as in “free baby vomit”) were all infested with the worst sorts of newfangled regex, and PHP 4 era DOM traversing.
Making matters worse, the scripts didn’t offer much of an API, or interface for data mining – rather they provided a rigid, and worthless example – leaving their hapless users to mutilate whatever useful lines they could find, and create an even more horrid fraken-script.*
It didn’t take me long to realize that PHP 5’s simpleXML was the answer. And indeed, after an hour of practice, simpleXML turned me into a scraping Ninja.
Below, is a very simple example [for drupal 6] that parses the drupal planet blogroll, and makes this neat little table out of it. Hopefully, you’ll find this method as easy, and useful as I did.
*Disclosure: I am not among the sadistic few that think Perl’s regular expressions are the greatest invention since sex. So you call simpleXML a crutch, and I’ll call you sick.

Recent comments
1 day 4 hours ago
3 days 3 min ago
3 days 1 hour ago
3 days 6 hours ago
3 days 6 hours ago
3 days 8 hours ago
3 days 8 hours ago
3 days 10 hours ago
3 days 13 hours ago
3 days 13 hours ago