Hacking at 0300

Getting multiple pages in the Amazon Wish List

I figured out how to get all the pages when screen-scraping the Amazon wish list. Basically, look for the "Next" button (it's in a <li class="a-last"> element). If that element is present, fetch the next page and append its items to the ones you already have.

function getwishlistitems ($listID, $page=1){
	$wishlistdom = new DOMDocument();
	// the @ suppresses the warnings libxml raises while parsing the page
	@$wishlistdom->loadHTMLFile("http://www.amazon.com/gp/registry/wishlist/$listID?disableNav=1&page=$page");
	$wishlistxpath = new DOMXPath ($wishlistdom);
	// each wish list entry is a <div> whose id starts with "item_"
	$items = iterator_to_array($wishlistxpath->query("//div[starts-with(@id,'item_')]"));
	if ($wishlistxpath->evaluate("count(//li[@class='a-last'])")) { // this is the "Next->" button
		// this is a class method, hence $this->; the arguments match the signature above
		$items = array_merge($items, $this->getwishlistitems($listID, $page+1));
	}
	return $items;
}
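
To see what the "Next" test is actually checking, here is a small offline sketch. The HTML strings are made up for illustration (they are not copied from Amazon), and hasNextPage is a hypothetical helper name, not part of the original code. It shows that @class='a-last' compares the whole class attribute, so an element carrying any extra class would not count as a "Next" button.

```php
<?php
// Hypothetical helper: true when the markup contains <li class="a-last">.
function hasNextPage(string $html): bool {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings about the partial markup
    $xpath = new DOMXPath($dom);
    // @class='a-last' is an exact string comparison on the attribute value
    return $xpath->evaluate("count(//li[@class='a-last'])") > 0;
}

// Made-up sample markup, for illustration only
$withNext   = '<ul><li class="a-last"><a href="?page=2">Next</a></li></ul>';
$extraClass = '<ul><li class="a-disabled a-last">Next</li></ul>';
$noPager    = '<div id="item_1">Only item</div>';

var_dump(hasNextPage($withNext));   // bool(true)
var_dump(hasNextPage($extraClass)); // bool(false)
var_dump(hasNextPage($noPager));    // bool(false)
```

If the page markup ever changes so that the live button carries additional classes, the exact match would stop finding it and the recursion would stop after the first page, so this is the first place to look if only one page comes back.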

Note that this creates a complication: the array of items now includes nodes from different documents, so you can't use one saved DOMXPath. Instead, where the original code has $wishlistxpath->evaluate($xpath, $node), use

(new DOMXPath($node->ownerDocument))->evaluate($xpath, $node);
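
Here is a minimal sketch of that pattern that runs without touching the network. The two HTML strings stand in for two wish list pages, and the span class "title" and the loadItems helper are assumptions made up for the example.

```php
<?php
// Hypothetical helper: parse one "page" and return its item nodes.
function loadItems(string $html): array {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings about the partial markup
    $xpath = new DOMXPath($dom);
    return iterator_to_array($xpath->query("//div[starts-with(@id,'item_')]"));
}

// Made-up sample markup standing in for two pages
$page1 = '<div id="item_1"><span class="title">First book</span></div>';
$page2 = '<div id="item_2"><span class="title">Second book</span></div>';

// The merged array now holds nodes that belong to two different documents
$items = array_merge(loadItems($page1), loadItems($page2));

$titles = [];
foreach ($items as $node) {
    // Build the DOMXPath from the node's own document, then query relative to the node
    $xpath = new DOMXPath($node->ownerDocument);
    $titles[] = $xpath->evaluate("string(.//span[@class='title'])", $node);
}

print_r($titles); // First book, Second book
```

Creating a DOMXPath per node is wasteful if the list is long; keeping one per document would work too, but the one-liner above is the smallest change to the original code.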

Hope this helps someone.

This entry was posted by Danny on April 6, 2016 at 9:31 am under Uncategorized.
