I figured out how to get all the pages when screen-scraping an Amazon wish list. Basically, look for the "Next" button (it's in a <li class="a-last">
element). If that element is present, fetch the next page as well and merge its items into the result.
function getwishlistitems ($listID, $page=1){
    $wishlistdom = new DOMDocument();
    // ignore parsing warnings
    @$wishlistdom->loadHTMLFile("http://www.amazon.com/gp/registry/wishlist/$listID?disableNav=1&page=$page");
    $wishlistxpath = new DOMXPath ($wishlistdom);
    $items = iterator_to_array($wishlistxpath->query("//div[starts-with(@id,'item_')]"));
    if ($wishlistxpath->evaluate("count(//li[@class='a-last'])")) { // this is the "Next->" button
        // recurse into the next page and append its items
        $items = array_merge($items, $this->getwishlistitems($listID, $page+1));
    }
    return $items;
}
Note that this creates a complication: the array of items now includes nodes from different documents, so you can't use one saved DOMXPath for all of them. Instead, where the original code has $wishlistxpath->evaluate($xpath, $node), use
(new DOMXPath($node->ownerDocument))->evaluate($xpath, $node);
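To make the per-node DOMXPath idea concrete, here is a minimal offline sketch. It does not touch Amazon at all: the two HTML strings, the "title" class and the evaluateOnNode() helper are made up for illustration and are not part of the original code. It only shows that items collected from separate documents can each be queried through a DOMXPath built from their own ownerDocument.

```php
<?php
// Two tiny stand-in "pages"; this markup is invented for the example
// and is not what Amazon actually serves.
$pages = [
    '<html><body><div id="item_1"><a class="title">First</a></div></body></html>',
    '<html><body><div id="item_2"><a class="title">Second</a></div></body></html>',
];

// Collect item nodes from each page, the same way the function above does.
$items = [];
foreach ($pages as $html) {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // ignore parsing warnings
    $xpath = new DOMXPath($dom);
    $found = iterator_to_array($xpath->query("//div[starts-with(@id,'item_')]"));
    $items = array_merge($items, $found);
}

// Hypothetical helper: evaluate an XPath expression relative to $node,
// using a DOMXPath bound to that node's own document.
function evaluateOnNode(DOMNode $node, string $expression) {
    return (new DOMXPath($node->ownerDocument))->evaluate($expression, $node);
}

// Each item is queried through its own document, whichever page it came from.
$titles = [];
foreach ($items as $item) {
    $titles[] = evaluateOnNode($item, "string(.//a[@class='title'])");
}
```

Building a new DOMXPath for every call is a little wasteful. If that ever matters, you could cache one DOMXPath per document, but for a wish list the simple version should be fine.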
Hope this helps someone.