Perilously Inefficient
03 Dec 2015
If computer science has taught me anything, it is that there is always a more efficient way to do something. The fact that not all algorithms are created equal supports this idea.
Ever since I started learning CS, my mind has been running two parallel processes:
- A stream of consciousness (like every other human on Earth)
- A stream of algorithms that checks whether I’m doing something as efficiently as possible
Generally I’m only happy when the second process gives the green light to the first process.
Today, that didn’t happen. For an assignment in HONORS 240, I am responsible for making a visual argument about a dataset. The dataset I chose covers America’s top philanthropists, as compiled by The Chronicle of Philanthropy.
In the Chronicle’s infinite wisdom, there are no options for easily exporting the dataset. Therefore, I had to figure out a way to do it myself.
The programmer in me wanted to use BeautifulSoup, a Python package for web scraping. This, I thought, would be most efficient.
Lo and behold, I was wrong. Some of those measly efficiency algorithms of mine forgot to take into account the [self-proclaimed] steep learning curve associated with BeautifulSoup. With nothing more than a cursory knowledge of HTML, I was doomed to fail with such an approach.
Thus, I was left to try something else. The most efficient way of doing it? Copy-and-pasting.
Yes, it sounds deplorable. But some beautiful behind-the-scenes technology helped me out big time. All I had to do was:
- Copy the complete table from the Chronicle’s web page
- Paste it into a Microsoft Excel workbook
- Remove superfluous headers
- Clear formatting
Excel didn’t do a wonderful job of saving the data as a CSV, and so I had to manually change a handful of entries with a text editor. The results of the work, however, aren’t all too bad.
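The kind of hand-editing described above could, in principle, be scripted. What follows is only a rough sketch using Python's standard `csv` module. The specific problems it fixes (stray whitespace, non-breaking spaces, and empty rows) are assumptions about what a copy-and-paste export might leave behind, not a record of what actually needed changing. The function names `clean_rows` and `to_csv_text` are made up for illustration.

```python
import csv
import io


def clean_rows(raw_csv_text):
    """Return the rows of a CSV string with cells trimmed and empty rows dropped.

    Assumption: the problems are whitespace-related. The real export
    may have had different issues.
    """
    reader = csv.reader(io.StringIO(raw_csv_text))
    cleaned = []
    for row in reader:
        # Swap non-breaking spaces (common in pasted web tables) for plain ones,
        # then strip leading and trailing whitespace from every cell.
        cells = [cell.replace("\xa0", " ").strip() for cell in row]
        # Keep the row only if at least one cell still has content.
        if any(cells):
            cleaned.append(cells)
    return cleaned


def to_csv_text(rows):
    """Serialize rows back to CSV text, quoting cells only where needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Reading the exported file into a string, passing it through `clean_rows`, and writing the result of `to_csv_text` back out would replace the text-editor step, at least for the whitespace cases assumed here.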
Though I probably saved myself a ton of time going the stone-age way, I’m still rather upset that I didn’t solve my problem in a programmatic fashion. BeautifulSoup seems hard to learn, and so I backed down from a challenge for the sake of time.
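For what it's worth, a programmatic version does not have to be long. Below is a minimal sketch that pulls the rows out of an HTML table. It uses Python's built-in `html.parser` rather than BeautifulSoup, so it runs without installing anything. The class name `TableExtractor`, the helper `extract_table`, and the sample markup are all hypothetical. The Chronicle's actual page structure is not known here and would likely need extra handling.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table(html):
    """Return all table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Fed a made-up snippet such as `<table><tr><th>Donor</th><th>Gift</th></tr></table>`, `extract_table` returns a list of rows that can be handed straight to `csv.writer`. Fetching the page, and coping with nested tables or paginated results, is left out of this sketch.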
I was perilously inefficient in terms of scalability, yes. But after my copy-and-paste breakthrough, the job took no more than 15 minutes.
Time and scale matter. But with a simple and small task, time matters more.