Friday, September 21, 2007

Software performance with large sets of data

I have seen this many, many times: something runs fast during development, and maybe even during testing, because the test data was too small and didn't match real-world conditions. If you are working on a small set of data, everything is fast, even slow things.

Your application is useful and popular. Your users love it. Your users love you. But over the next week, something curious happens. As people use the application, it gets progressively slower and slower. Soon, the complaints start filtering in. Within a few weeks, the app is well-nigh unusable due to all the insufferable delays it subjects users to, and your users turn on you.

Raise your hand if this has ever happened to a project you've worked on. If I had a buck for every time I've personally seen this, I'd have enough for a nice lunch date. Developers test with tiny toy data sets, assume all is well, and then find out the hard way that everything is fast for small n.
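To put rough numbers on "everything is fast for small n", here is a minimal sketch that counts comparisons instead of timing anything. It is an illustration only: `bubbleSortWithCount`, `builtinSortWithCount`, and `makeData` are hypothetical helpers I made up for this example, and bubble sort is simply my stand-in for a naive quadratic algorithm, not a claim about any particular project's code.

```javascript
// Illustration only: a deliberately naive O(n^2) sort that counts its comparisons.
// All helper names here are hypothetical, invented for this example.
function bubbleSortWithCount(input) {
  const items = input.slice(); // copy so the caller's array is untouched
  let comparisons = 0;
  for (let i = 0; i < items.length - 1; i++) {
    for (let j = 0; j < items.length - 1 - i; j++) {
      comparisons++;
      if (items[j] > items[j + 1]) {
        const tmp = items[j];
        items[j] = items[j + 1];
        items[j + 1] = tmp;
      }
    }
  }
  return { sorted: items, comparisons };
}

// Same idea for the built-in sort: count how often the comparator is called.
function builtinSortWithCount(input) {
  let comparisons = 0;
  const sorted = input.slice().sort((a, b) => {
    comparisons++;
    return a - b;
  });
  return { sorted, comparisons };
}

// Deterministic pseudo-random data (a simple LCG) so runs are repeatable.
function makeData(n, seed = 42) {
  let state = seed;
  const data = [];
  for (let i = 0; i < n; i++) {
    state = (state * 1664525 + 1013904223) % 4294967296;
    data.push(state);
  }
  return data;
}

for (const n of [10, 100, 1000, 10000]) {
  const data = makeData(n);
  const naive = bubbleSortWithCount(data);
  const builtin = builtinSortWithCount(data);
  console.log(`n=${n}: naive=${naive.comparisons}, built-in=${builtin.comparisons}`);
}
```

The naive version always makes n(n-1)/2 comparisons: 45 at n=10, 4,950 at n=100, and 49,995,000 at n=10,000. The built-in count depends on the JavaScript engine, but it typically grows far more slowly. At toy sizes the two are hard to tell apart, which is exactly why tiny test data sets hide the problem.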

I remember a client-side JavaScript sort routine we implemented in a rich intranet web app circa 2002. It worked great on our small test datasets, but when we deployed it to production, we were astonished to find that sorting a measly hundred items could take upwards of 5 seconds on a user's desktop machine. JavaScript isn't known for its speed, but what the heck?

Well, guess which sort algorithm we used?
