Blog: Pete Loshin

Opening Up the Internet: Craigslist + Yahoo! Pipes = Better Data Searching

We've really come a long way with the web and the Internet over the past dozen years or so. Back then, it was kind of a big deal to run screen-scraping software that could pull data off websites, or to access corporate legacy mainframe systems through a webified front end. Now, more and more of the web lives in some seriously big data stores, and more and more of the owners of those data stores are making data processing tools and APIs available to anyone who wants them. That gives us some nice little mashup applications combining, for example, maps with data that has a geographical component.

But here's something sort of new: a way to combine an already popular, useful and generally great website (in this case, Craigslist) with another popular, useful and great website, Yahoo! Pipes. The result is even better than either one alone. Yahoo! Pipes is kind of like a web version of UNIX piping: a way to take the results of one command (its output) and "pipe" them into another command as input. What you get is a very handy way to create very specific and powerful searches, and to turn the results into useful information.

So, here's the article that got me hooked: How to Actually Search Craigslist. As great as Craigslist is, it has some drawbacks. James Aaron, who wrote the article, is a student at San Jose State's School of Library and Information Science, and is currently looking for a job. He likes Craigslist but, as he explains, it would be even more helpful if it offered better ways to search:

There is no way to truncate searches, such as "librar*" to include librarian, library, libraries, etc.
There is no way to perform Boolean AND, OR, NOT searches.
There is no way to remove frequently occurring irrelevant items.
There is no way to search two sub-regions at once.
So, unless I want to perform 20 searches a day and receive MANY completely irrelevant hits, I basically have to browse.

The answer, he tells us, is Yahoo! Pipes, and he explains just how to use Pipes with Craigslist to make Craigslist that much more useful. In other words, it's more evidence of just how much the entire web is evolving into the world's biggest-ever data store, with the most powerful set of tools ever for extracting business intelligence. How could you use this kind of capability to extract actionable knowledge from the web?
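
To make the piping idea a little more concrete, here is a minimal Python sketch of the kind of pipeline described above: merge two sub-region feeds, then filter them with truncated and Boolean search terms. This is not Yahoo! Pipes' own interface, and it is not the pipe James Aaron built; the sample listings, the example.com links, and every function name (`union`, `term`, `all_of`, `any_of`, `none_of`, `keep`) are made up for illustration. A real version would presumably start from the site's feeds rather than hard-coded data.

```python
import re

# Hypothetical sample data standing in for two Craigslist sub-region feeds.
# The titles and links are invented for illustration only.
SOUTH_BAY = [
    {"title": "Reference Librarian, part time", "link": "http://example.com/1"},
    {"title": "Library Assistant (volunteer)", "link": "http://example.com/2"},
    {"title": "Warehouse associate", "link": "http://example.com/3"},
]
PENINSULA = [
    {"title": "Library Assistant (volunteer)", "link": "http://example.com/2"},
    {"title": "Digital libraries project archivist", "link": "http://example.com/4"},
]


def union(*feeds):
    """Merge several feeds into one stream, skipping links already seen."""
    seen = set()
    for feed in feeds:
        for item in feed:
            if item["link"] not in seen:
                seen.add(item["link"])
                yield item


def term(pattern):
    """Build a title matcher; a trailing * truncates, as in librar*."""
    if pattern.endswith("*"):
        regex = r"\b" + re.escape(pattern[:-1]) + r"\w*"
    else:
        regex = r"\b" + re.escape(pattern) + r"\b"
    compiled = re.compile(regex, re.IGNORECASE)
    return lambda item: compiled.search(item["title"]) is not None


def all_of(*preds):
    """Boolean AND over matchers."""
    return lambda item: all(p(item) for p in preds)


def any_of(*preds):
    """Boolean OR over matchers."""
    return lambda item: any(p(item) for p in preds)


def none_of(*preds):
    """Boolean NOT: reject an item if any matcher hits."""
    return lambda item: not any(p(item) for p in preds)


def keep(items, pred):
    """Filter stage: pass along only the items the matcher accepts."""
    return (item for item in items if pred(item))


# "librar*" AND NOT "volunteer", searched across both sub-regions at once.
wanted = all_of(term("librar*"), none_of(term("volunteer")))
results = list(keep(union(SOUTH_BAY, PENINSULA), wanted))
titles = [item["title"] for item in results]
print(titles)
```

Each stage takes the previous stage's output as its input, which is the same idea as a UNIX pipe: `union` covers the two-sub-regions problem, `term` covers truncation, and the `all_of`, `any_of` and `none_of` helpers cover the Boolean searches and the removal of irrelevant items.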