Well, that took a bit longer than expected. I used an RLE (run-length encoding) optimization. It could use some more work, but it already brings the calculations down to well under a millisecond for my 700×180 map, and even that cost is only paid when the water is disturbed. On top of this, I actually run 10 simulation steps per frame, rather than one per game tick as in, say, Boulder Dash. This way my water flows nicely and quickly through corridors and such. Other techniques either teleport the water, so you can't see how it flowed, or move it too slowly, making it look like sand or lava.
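For readers who haven't built one of these, here is a minimal sketch of the basic cellular-automaton step, not the author's code. It assumes a grid of integers where the names EMPTY, WALL and WATER, and the functions `step` and `run_frame`, are all hypothetical. Water falls if it can, otherwise slides sideways, and `run_frame` calls the step several times per rendered frame.

```python
EMPTY, WALL, WATER = 0, 1, 2


def step(grid):
    """Advance a simple falling-water cellular automaton by one step.

    Hypothetical sketch: water falls if the cell below is empty, otherwise
    slides right, otherwise left. Rows are scanned bottom to top, and a
    `moved` mask stops a cell from being updated twice in the same step.
    Returns how many cells moved (0 means nothing moved this step).
    """
    h, w = len(grid), len(grid[0])
    moved = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h - 1, -1, -1):
        for x in range(w):
            if grid[y][x] != WATER or moved[y][x]:
                continue
            if y + 1 < h and grid[y + 1][x] == EMPTY:       # fall down
                tx, ty = x, y + 1
            elif x + 1 < w and grid[y][x + 1] == EMPTY:     # slide right
                tx, ty = x + 1, y
            elif x - 1 >= 0 and grid[y][x - 1] == EMPTY:    # slide left
                tx, ty = x - 1, y
            else:
                continue                                    # stuck this step
            grid[ty][tx], grid[y][x] = WATER, EMPTY
            moved[ty][tx] = True
            count += 1
    return count


def run_frame(grid, steps_per_frame=10):
    """Run several simulation steps per rendered frame; returns total moves."""
    return sum(step(grid) for _ in range(steps_per_frame))
```

Note that the fixed right-before-left preference gives the flow a directional bias; a real implementation would want to break that symmetry somehow.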
Another thing I do is scatter the water. Some algorithms produce nasty vertical shafts when there's a leak. This happens when falling down is always prioritized, and it results in an 'edge of the world' wall of water. I occasionally jitter the edges a little so the water forms nice, splashy piles.
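One way the scattering idea could look, as a hypothetical sketch rather than the author's method: falling is normally tried first, but with a small probability `scatter` the sideways moves are tried first instead, and the left/right order is shuffled each time. The function name `choose_move`, the `scatter` parameter and its default value are all assumptions.

```python
import random

EMPTY, WALL, WATER = 0, 1, 2


def choose_move(grid, x, y, rng, scatter=0.05):
    """Pick where one water cell moves this step, or None if it stays.

    Hypothetical sketch of scattering: falling is normally tried first, but
    with probability `scatter` the sideways moves are tried first instead,
    so a stream pouring through a leak spreads out rather than forming a
    one-cell-wide shaft. Left/right order is shuffled to avoid a bias.
    """
    h, w = len(grid), len(grid[0])

    def free(cx, cy):
        return 0 <= cx < w and 0 <= cy < h and grid[cy][cx] == EMPTY

    sides = [(x - 1, y), (x + 1, y)]
    rng.shuffle(sides)                      # no fixed left/right preference
    down = (x, y + 1)
    order = sides + [down] if rng.random() < scatter else [down] + sides
    for target in order:
        if free(*target):
            return target
    return None
```

Passing in a `random.Random` instance keeps runs reproducible for debugging; the `scatter` default is a tuning guess, not a value from the post.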

The RLE bottom-to-top optimization, plotted for clarity and debugging. It took some head-scratching to get it working. Static water cells are removed from the simulation until they (or their neighbors) are disturbed.
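The post doesn't show how the RLE structure works, so here is a hypothetical sketch of the general idea: each row's "active" cells are stored as (start, length) runs, rows are visited bottom to top, and a changed cell wakes itself and its neighbours. The names `active_runs`, `wake` and `scan_order` are made up. For clarity this re-encodes a row from a flag array every time; an actual optimization would keep the runs updated incrementally instead of rescanning.

```python
def active_runs(row_flags):
    """Run-length encode one row of 'active' flags into (start, length) runs."""
    runs, start = [], None
    for x, flag in enumerate(row_flags):
        if flag and start is None:
            start = x                        # a run begins
        elif not flag and start is not None:
            runs.append((start, x - start))  # a run ends
            start = None
    if start is not None:                    # run reaches the end of the row
        runs.append((start, len(row_flags) - start))
    return runs


def wake(active, x, y):
    """Mark a changed cell and its 8 neighbours as active again."""
    h, w = len(active), len(active[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                active[ny][nx] = True


def scan_order(active):
    """Yield (x, y) of active cells, bottom row first, visiting only the runs."""
    for y in range(len(active) - 1, -1, -1):
        for start, length in active_runs(active[y]):
            for x in range(start, start + length):
                yield x, y
```

A cell that doesn't move during a step would have its flag cleared, so settled water drops out of the scan until `wake` is called near it.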
This is the idea mentioned in an earlier post. It will be a purely visual effect, not part of the water simulation: I can look at each cell's neighbors and try to pick a suitable water tile. I'm worried that, since the water jitters around a lot, it will look kind of crappy. It will also probably be hard to decide whether the waves should be toned up or down. I might be able to build a lookup table, like the HQ3x algorithm does.
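A table-driven approach could start as simply as this hypothetical sketch: build a 4-bit mask from which orthogonal neighbours are also water, then look the mask up in a table of tile names. The bit layout, `TILE_TABLE` and all tile names are invented for illustration; HQ3x itself works on pixel colour differences, so this only borrows the lookup-table idea.

```python
EMPTY, WALL, WATER = 0, 1, 2

# Bit values for the four neighbours (hypothetical layout).
N, E, S, W = 1, 2, 4, 8


def neighbour_mask(grid, x, y):
    """4-bit mask of which orthogonal neighbours are also water."""
    h, w = len(grid), len(grid[0])
    mask = 0
    for bit, dx, dy in ((N, 0, -1), (E, 1, 0), (S, 0, 1), (W, -1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == WATER:
            mask |= bit
    return mask


# Hypothetical lookup table: mask -> tile name. Only a few of the 16
# entries are spelled out; everything else falls back to a plain body tile.
TILE_TABLE = {
    E | S | W: "surface",             # water below and on both sides, air above
    S | W: "surface_right_edge",
    E | S: "surface_left_edge",
    S: "column_top",
}


def water_tile(grid, x, y):
    """Pick a display tile for a water cell; purely cosmetic, not simulation."""
    return TILE_TABLE.get(neighbour_mask(grid, x, y), "body")
```

Because the choice depends only on the current neighbours, tiles will flicker as the water jitters; smoothing over a few frames would be one (untested) way to address that worry.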

Edit: I'm an idiot! My millisecond counter code was wrong, and when I corrected it the ms count jumped 20x! The sim still runs in under 18 ms (about one frame), so I hadn't noticed. Luckily, I also discovered that I was compiling in debug mode, which adds bounds checks and the like on arrays, and those are VERY slow. With a normal build the simulation was pretty fast again, but not fast enough. Then I realized that, well, I'm dealing with large masses of water here. A pixel (which represents a tile) is probably about a cubic meter of water. Running the simulation 10 times per frame only looks good when viewing the whole map from a distance; zoomed in, a tile could travel up to 10 px per frame. So I now run the sim just once per frame, and I'm back to 0.5 ms or so. This way the waves jitter around less, too.
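Since a broken millisecond counter hid the real cost here, a small timing harness is worth having. This is a sketch, with `time_frame_ms` as a hypothetical name: it uses Python's `time.perf_counter()`, which returns seconds as a float, so the difference has to be multiplied by 1000 to get milliseconds. The `steps_per_frame` parameter makes it easy to compare one step per frame against ten.

```python
import time


def time_frame_ms(step, grid, steps_per_frame=1):
    """Run `steps_per_frame` simulation steps and return elapsed time in ms.

    time.perf_counter() returns seconds as a float, so the difference is
    multiplied by 1000.0 to convert it to milliseconds.
    """
    start = time.perf_counter()
    for _ in range(steps_per_frame):
        step(grid)
    return (time.perf_counter() - start) * 1000.0
```

Timing a release build rather than a debug build matters just as much as getting the units right, as the edit above found out.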