Hacking at 0300

Hacking My Way Again to an Amazon Wishlist Widget

Amazon long ago eliminated its API for getting wishlists. Four years ago I made a screen-scraping WordPress widget to display my wishlist. Unfortunately, as happens with screen-scraping, Amazon changed their format and URLs. And now I can't seem to get the ItemLookup API to work either.

doitlikejustin has a vanilla PHP wishlist scraper, but PHP 5 now has its own HTML parser in DOMDocument, so I implemented my own.

The wishlist page has a simple structure. All links to Amazon products have "dp/{ASIN}" as part of the URL, where {ASIN} is the Amazon ID number. Each individual item is contained in a <div> whose id starts with "item_", and its title is in a link whose id starts with "itemName". The image and author list are in consistent positions relative to those. Other advertisements for Amazon products that you see on the page are added with JavaScript, so they won't show up when we grab the page with PHP.
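To make that structure concrete, here is a minimal sketch that runs the same kind of XPath queries against a small inline HTML string instead of the live page. The markup, the ASIN values, and the function name extract_items are all made up for illustration; only the "item_" / "itemName" / "dp/{ASIN}" conventions come from the description above.

```php
<?php
// Hypothetical markup that mimics the structure described above.
// It is not copied from Amazon; ids and ASINs are invented.
$html = <<<HTML
<html><body>
<div id="item_1"><a id="itemName_1" href="/dp/B000000001/ref=wl_it">First Book</a></div>
<div id="item_2"><a id="itemName_2" href="/gp/product/other">No ASIN Here</a></div>
<div id="sidebar"><a href="/dp/B000000099">Not a wishlist item</a></div>
</body></html>
HTML;

// extract_items is an illustrative helper, not part of the original widget.
function extract_items(string $html): array {
    $dom = new DOMDocument();
    // Suppress parser warnings, since real-world HTML is rarely clean
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $items = array();
    // Only <div>s whose id starts with "item_" are wishlist entries
    foreach ($xpath->query("//div[starts-with(@id,'item_')]") as $div) {
        // The title link is the <a> whose id starts with "itemName"
        $link = $xpath->query(".//a[starts-with(@id,'itemName')]", $div)->item(0);
        if ($link === null) {
            continue; // no title link, so skip this entry
        }
        $href = $link->getAttribute('href');
        // The ASIN is the run of word characters right after "/dp/"
        $asin = preg_match('|/dp/(\w+)|', $href, $m) ? $m[1] : null;
        $items[] = array('title' => trim($link->textContent), 'asin' => $asin);
    }
    return $items;
}

print_r(extract_items($html));
```

The "sidebar" <div> is skipped because its id does not start with "item_", and the second item comes back with a null ASIN because its link has no "dp/" segment.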

Image URLs have the format "http://ecx.images-amazon.com/images/I/{idcode}._SL{size}.jpg" (with possibly some extra parameters before the "SL"). I just pull the relevant idcode out and create my own URL with the desired size.
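As a standalone illustration of that rewrite, the sketch below pulls the idcode out of an image URL and rebuilds the URL at a chosen size. The helper name resize_image_url and the sample idcode are hypothetical, and the sketch assumes the idcode never contains a period.

```php
<?php
// resize_image_url is an illustrative helper; the name is not from the original widget.
// Returns the rebuilt URL, or null if $src does not look like a product image URL.
function resize_image_url(string $src, int $size): ?string {
    // Capture everything up to the first "." after /images/I/ (the idcode part)
    if (preg_match('|^(https?://ecx\.images-amazon\.com/images/I/[^.]+)\.|', $src, $m)) {
        return $m[1] . "._SL{$size}.jpg";
    }
    return null;
}

// The idcode "51AbCdEfGhL" is invented for this example.
echo resize_image_url('http://ecx.images-amazon.com/images/I/51AbCdEfGhL._SL500_SS135_.jpg', 100), "\n";
```

Whatever extra parameters sat between the idcode and ".jpg" in the source URL are dropped, which is the point: only the idcode and the new size survive.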

function wishlist($listID){
	$size = 100;
	$ret = array();
	$wishlistdom = new DOMDocument();
	// ignore parsing warnings
	@$wishlistdom->loadHTMLFile("http://www.amazon.com/gp/registry/wishlist/$listID?disableNav=1");
	$wishlistxpath = new DOMXPath ($wishlistdom);
	// I want to be able to limit and rearrange the list, so I turn it into an array
	$items = iterator_to_array($wishlistxpath->query("//div[starts-with(@id,'item_')]"));
	// filter $items as desired, then pull out the data
	foreach ($items as $item){
		$link = $wishlistxpath->query(".//a[starts-with(@id, 'itemName')]", $item)->item(0);
		if ($link === null) continue; // no title link in this item, so skip it
		$href = $link->getAttribute('href');
		if (preg_match ('|/dp/\w+|', $href, $matches)){
			$href = "http://amazon.com$matches[0]"; // simplify the URL
		}else{
			$href = "http://amazon.com$href";
		}
		$title = $link->textContent;
		$author = $link->parentNode->nextSibling->textContent;
		$image = $wishlistxpath->query(".//img", $item)->item(0)->attributes->getNamedItem('src')->nodeValue;
		// keep only the part up to the idcode, then append the size we want
		if (preg_match ('|http://ecx\.images-amazon\.com/images/I/[^.]+|', $image, $matches)){
			$image = $matches[0]."._SL$size.jpg";
		}else{
			$image = "http://ecx.images-amazon.com/images/G/01/x-site/icons/no-img-sm._SL{$size}_.jpg";
		}
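		// escape the title so quotes or ampersands in it can't break the markup
		$safeTitle = htmlspecialchars($title, ENT_QUOTES);
		$image = "<img src='$image' alt='$safeTitle'><br/>";
		$ret[] = "<a href='$href'>$image$safeTitle<br/>$author</a>";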
	}
	return $ret;
}

Now this only gets the first page (25 items) of a wish list. I modified it to allow finding all the items on a wish list.
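The post does not show how the later pages are requested, so the sketch below leaves that part abstract: it only illustrates the loop, assuming some per-page fetcher exists. The function name all_wishlist_items, the $fetchPage callable, and the $maxPages cap are all assumptions made for this example, not the code actually used.

```php
<?php
// $fetchPage is a hypothetical callable: given a 1-based page number it
// returns that page's items as an array, or an empty array past the end.
function all_wishlist_items(callable $fetchPage, int $maxPages = 50): array {
    $all = array();
    for ($page = 1; $page <= $maxPages; $page++) {
        $items = $fetchPage($page);
        if (count($items) === 0) {
            break; // ran past the last page
        }
        $all = array_merge($all, $items);
    }
    return $all;
}

// Stand-in fetcher over fake data, so the loop can be tried without network access.
$fake = array_chunk(range(1, 60), 25); // three pages: 25, 25, and 10 items
$fetch = function (int $page) use ($fake): array {
    return $fake[$page - 1] ?? array();
};
echo count(all_wishlist_items($fetch)), "\n";
```

The $maxPages cap is there so that a fetcher that never returns an empty page cannot loop forever.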

This entry was posted by Danny on April 5, 2016 at 4:15 pm under Wordpress.

2 Comments

Max says:

Very cool. Do you know of any example where Amazon allowed a third party to write via an inbound API to someone’s wishlist? Eg a brand page allowing you to drop its products into your own wishlist?
June 28, 2017, 2:28 pm

Danny says:

@Max:
As far as I can tell, Amazon does not want to open up their wishlist to any API (even their own). So, no third-party stuff either.
–Danny
June 29, 2017, 7:44 am
