Skip to content

Getting multiple pages in the Amazon Wish List

I figured out how to get all the pages from screen-scraping the Amazon wish list. Basically, look for the "Next" button (it's in a <li class=a-last> element). If that element is present, look for the next page.

function getwishlistitems ($listID, $page=1){
	// ignore parsing warnings
	$wishlistdom = new DOMDocument();
	@$wishlistdom->loadHTMLFile("http://www.amazon.com/gp/registry/wishlist/$listID?disableNav=1&page=$page");
	$wishlistxpath = new DOMXPath ($wishlistdom);
	$items = iterator_to_array($wishlistxpath->query("//div[starts-with(@id,'item_')]"));
	if ($wishlistxpath->evaluate("count(//li[@class='a-last'])")) { // this is the "Next->" button
		$items = array_merge($items, $this->getwishlistitems($listID, $filter, $page+1));
	}
	return $items;
}

Note that this creates a complication: the array of items now includes nodes from different documents, so you can't use one saved DOMXPath. Instead, where the original code has $wishlistxpath->evaluate($xpath, $node), use

(new DOMXPath($node->ownerDocument))->evaluate($xpath, $node);

Hope this helps someone.

Hacking My Way Again to an Amazon Wishlist Widget

Amazon long ago elminated its API for getting wishlists. 4 years ago I made a screen-scraping WordPress widget to display my wishlist. Unfortunately, as happens with screen-scraping, Amazon changed their format and URL's. And now I can't seem to get the ItemLookup API to work either.

doitlikejustin has a vanilla PHP wishlist scraper, but PHP 5 now has it's own HTML parser in DOMDocument, so I implemented my own.

The wishlist page has a simple structure, and all links to Amazon products have as part of the URL "dp/{ASIN}", where {ASIN} is the Amazon ID number, and all the individual items are contained in <div>s that have an id that starts with "item_", and the title is in a link that has an id that starts with "itemName". The image and author list are in consistent positions relative to those. Other advertisements for Amazon products that you see on the page are added with Javascript, so they won't show up when we grab the page with PHP.

Images URL images have the format "http://ecx.images-amazon.com/images/I/{idcode}._SL{size}.jpg" (with possibly some extra parameters before the "SL"). I just
pull the relevant idcode out and create my own URL with the desired size.

function wishlist($listID){
	$size = 100;
	$ret = array();
	$wishlistdom = new DOMDocument();
	// ignore parsing warnings
	@$wishlistdom->loadHTMLFile("http://www.amazon.com/gp/registry/wishlist/$listID?disableNav=1");
	$wishlistxpath = new DOMXPath ($wishlistdom);
	// I want to be able to limit and rearrange the list, so I turn it into an array
	$items = iterator_to_array($wishlistxpath->query("//div[starts-with(@id,'item_')]"));
	// filter $items as desired, then pull out the data
	foreach ($items as $item){
		$link = $wishlistxpath->evaluate(".//a[starts-with(@id, 'itemName')]", $item)->item(0);
		$href = $link->attributes->getNamedItem('href')->nodeValue;
		if (preg_match ('|/dp/\w+|', $href, $matches)){
			$href = "http://amazon.com$matches[0]"; // simplify the URL
		}else{
			$href = "http://amazon.com$href";
		}
		$title = $link->textContent;
		$author = $link->parentNode->nextSibling->textContent;
		$image = $wishlistxpath->query(".//img", $item)->item(0)->attributes->getNamedItem('src')->nodeValue;
		if (preg_match ('|http://ecx.images-amazon.com/images/I/[^.]+|', $image, $matches)){
			$image = $matches[0]."._SL$size.jpg";
		}else{
			$image = "http://ecx.images-amazon.com/images/G/01/x-site/icons/no-img-sm._SL${size}_.jpg";
		}
		$image = "<img src='$image' alt='$title'><br/>";
		$ret[] = "<a href='$href'>$image$title<br/>$author</a>";
	}
	return ret;
}

Now this only gets the first page (25 items) of a wish list. I modified it to allow finding all the items on a wish list.

Odd bug with Date’s

It's been almost a year since I last posted. I'm still programming, but it's mostly visible on github, especially my trying to help with jquery/globalize by implementing nongregorian calendars.

I finally solved a bug that was, um, bugging me. In order to test my Julian Day routines, I needed to create a javascript Date at midnight UTC rather than local time. I thought I was clever when I did:

d = new Date();
d.setUTCFullYear( year );
d.setUTCMonth( month );
d.setUTCDate( date );
d.setUTCHours( 0 );
d.setUTCMinutes( 0 );
d.setUTCSeconds( 0 );

And everything worked fine, until last night when I would set month=1 (February) but the month would end up as 2 (March). This had never happened before, and the code hadn't changed.

I finally realized that the date was 2016-01-28 St. Louis time at 2300, or 2016-01-29 UTC. So setting the year to a non-leap year like 2015, then setting the month to 1 with setUTCMonth() meant I was trying to set it to 2015-02-29, which Date helpfully corrected to 2015-03-01, then the date was set correctly. The only way that bug would turn up is if my code was run on the last day of the month and tested with a shorter month.

Turns out there's a much better way to set UTC time:

d = new Date( Date.UTC( year, month, date ) )

And now I am wiser.

New flexcal

After some prompting by those actually using it, I paid more attention to my flexcal date picker plugin, adding features like buttons, drop-down menus and formatted dates. The documentation is on my github pages at github.bililite.com/flexcal, and the code is on github at github.com/dwachss/flexcal. All other posts about it are obsolete.

The current stable version is 3.4.

Animating arbitrary values with jQuery

jQuery's animate is useful for animating CSS properties, but there are times that you want to take advantage of the bookkeeping that jQuery provides to animate, with easing functions etc., but not to change an animatable property. The flip plugin for Mike Alsup's cycle2 has a clever hack to animate some otherwise unused property instead:


<input type=button value="Click Me" id=rotatebutton />
<div id=rotatetest style="height: 100px; width: 100px; background: #abcdef" >Whee!</div>

$('#rotatebutton').click(function(){
  $('<div>'). // create an unused element (could use a preexisting one to save some time and memory)
    animate({lineHeight: 360}, // we aren't really using lineHeight; we just want something numeric.
      // We want it to go from 0 (the default for anything that has a value of 'auto') to 360
    {
      duration: 2000,
      easing: 'easeOutElastic', // use jQuery and jQuery UI to manage the animation timing
      step: function (now){
        $('#rotatetest').css('transform', 'rotate('+now+'deg)'); // use the number that animate gives us
      }
    });
});

Running pages locally

I want to start using github pages for documentation, which would allow me to host them on github but still edit them locally and just push the edits. The problem is debugging. Anything that relies on AJAX is a security risk if I'm trying to get local files, so browsers reject any $.get('localfile.json'). I understand the restriction, but it makes development very annoying. There are proposals to allow some kind of package, with only descendents of the current file, but everyone is too scared that users will download something to their Documents folder and expose themselves.

So the only solution seems to be to set up a local http server and use that. The simplest I've found (not very fast, but I don't need that) is to use python's http.server. First, install python (choco install python works), and then in my PowerShell profile I have a line:

function svr { Start-Process "C:\tools\python\python.exe" "-m http.server" }

So I navigate to my desired folder, run svr, and it starts a python window running the server. localhost:8000 is the URL of the folder the server is running on.

New jquery.ui.subclass.js

I finally updated my jQuery widget subclassing code to use the newest version of jQuery UI, which incorporated a lot of the original ideas I outlined back in 2010. The new documentation is now on my github pages, and I've updated the flexcal posts to reflect it.

It is a breaking change; instead of $.namespace.widgetname.subclass('namespace.newwidgetname', {methods...}) you use the real jQuery UI way: $.widget('namespace.newwidgetname', $.namespace.widgetname, {methods...}).

I've also changed all my flexcal-related widgets to the bililite namespace, per jQuery UI guidelines. It's now $.bililite.flexcal instead of $.ui.flexcal, and so on for all the fields in that (like $.bililite.flexcal.tol10n).

Hope not too many people are inconvenienced.

New jQuery plugins, $.repo and $.getScripts

See the code.

jQuery plugin to allow using cdn.rawgit.com to get the latest commit of a github repo

github won't let you hotlink to their site directly; raw.githubusercontent.com sends its content with a X-Content-Type-Options:nosniff header, so modern browsers won't accept it as javascript.

http://rawgit.com gets around that by pulling the raw file and re-serving it with more lenient headers, but the rate is throttled so you can't use it on public sites. http://cdn.rawgit.com isn't throttled but is cached, permanently. Once a given URL is fetched, it stays in the cache and if the file is updated on github, it won't be on cdn.rawgit.com . So having a script tag <script src="http://cdn.rawgit.com/user/repo/master/file.js"> lets you get the script from github, but even when the master branch is updated, the script retrieved will remain the same.

The answer is to use a specific tag or commit in the script tag: <script src="http://cdn.rawgit.com/user/repo/abc1234/file.js"> and change that when the underlying repo is updated. But that is terribly inconvenient.

For stable libraries, that's not a problem, since they should be tagged with version numbers: http://cdn.rawgit.com/user/repo/v1.0/file.js and that's probably what you want. However, if you always want the latest version, that won't work.

$.repo uses the github API to get the SHA for the latest commit to the master, and returns a $.Deferred that resolved to the appropriate URL (with no trailing slash):

$.repo('user/repo').then(function (repo){
	$.getScript(repo+'/file.js');
});

The github api is also rate-limited (to 60 requests an hour from a given IP address), so the repo address is cached for fixed period of time (default 1 hour), with the value saved in localStorage.

$.repo('user/repo', time); // if the cached value is more than time msec old, get a new one
$.repo('user/repo', 0); // force a refresh from github's server

$.getScripts

$.getScript is useful, but it is asynchronous, which means that you can't load scripts that depend on one another with:

$.getScript('first.js');
$.getScript('second.js');
$.getScript('third.js');

You have to do:

$.getScript('first.js').then(function(){
	return $.getScript('second.js');
}).then(function(){
	return $.getScript('third.js');
}).then(function(){
	// use the scripts
});

$.getScripts(Array) abstracts this out, so you can do:

$.getScripts(['first.js', 'second.js', 'third.js']).then(function(){
	// use the scripts
});

It's basically a very simple script loader.

Improved flexcal

Two new additions to flexcal, which is now at version 2.2. See the code and a demo, that uses my new github pages.

Mouse Wheel

The calendar now responds to wheel events, changing the month with scroll up/down and changing the calendar tab with scroll left/right. I initially tried to throttle it since my trackpad was too sensitive, but that made it too slow. I'm not sure waht to do about that. It works well with an actual mouse wheel, thoug.

Keith Wood's Calendars

Keith Wood is maintaining the jQuery plugin that eventually became the official jQuery UI datepicker. His version, however, handles multiple calendar systems, much like flexcal (though I like my plugin, of course). His code is on github, and the documentation is on his personal site. He has support for many calendar systems, and I wanted to let flexcal use that. I don't use his datepicker code, just the calendar systems.

I'm using a naming convention of language-calendar, like he-jewish for a calendar localized to hebrew, using a Jewish lunar calendar. It's the opposite of Woods, who uses calendar-language, like islamic-ar for a calendar localized to arabic, using the Islamic lunar calendar. For my names, the default language is en and the default calendar system in gregorian. Thus, islamic would be an English language Islamic calendar, and zh-TW would be a Taiwanese Chinese Gregorian calendar (my parser is smart enough to not be confused by the extra hyphen). The language codes are ISO 639 two-letter codes.

There is a new function, $.ui.flexcal.tol10n(name) that creates a localization object with appropriate language and calendar system. It is designed to be transparent; doing

$('input').flexcal({
  calendars: ['ar-islamic']
});

will look in the existing $.ui.flexcal.l10n object for 'ar-islamic', then try to find it in the jQuery UI datepicker localizations (if those are loaded) (note that jQuery UI only uses Gregorian calendars), then Woods's calendars if they exist.

You can use the bridge directly if you want; $.ui.flexcal.tol10n(name) returns a localization object that you can modify as desired and then pass to flexcal.

To use this, you need to include the necessary script files: jquery.calendars.js for the basic code, jquery.calendars.name.js (like jquery.calendars.islamic.js) for the calendar-system-specific routines (these are all in English), and jquery.calendars.name-language.js (like jquery.calendars.islamic-ar.js)for the language-specific localization, or jquery.calendars.language.js for language-specific localization using the Gregorian calendar.

Of note, the text that is used for the "Next Month" and "Previous Month" is localized for the datepicker, not the underlying calendar system, so if you want that included automatically, also include the jquery.calendars.picker-name.js. I have a little hack in flexcal so you do not have to include the entire jquery.calendars.picker.js package; just include the jquery.calendars.picker-mock.js before including the localization code.
It should be clear which files to include if you look at the list.

TODO

His calendar code also has some nice formatting options, which flexcal does not have out of the box, though it is possible with some work. I'd like to get a bridge to work with my code as well.
Some other options that would be nice include: setting the first day of the week (now is fixed at Sunday) and having the option of showing or selecting days from other months.

Creating a PowerShell shortcut

I'm sure there must be an easier way to run a program from PowerShell, but I haven't found anything simpler than

& "C:\Program Files (x86)\Notepad++\notepad++.exe"

with the ampersand and the full path. I could add "C:\Program Files (x86)\Notepad++" to $env:PATH, but I'd still have to type notepad++.exe file. I wanted some way to make a shortcut to a program name without having to create a new file, either a .LNK or .BAT file.

Turns out you can do this with PowerShell functions:

function npp { & "C:\Program Files (x86)\Notepad++\notepad++.exe" $args }

lets me run Notepad++ from PowerShell without nearly so much typing. Stuck that in my Profile.ps1 and now I'm happy.
Hope this turns out to be useful to someone.