Business Intelligence Network

Blog: Pete Loshin

Resource for Searching Open Source Code

The C/C++ Source Code Search Engine (csourcesearch.net) is quite something: some guy (I think) who goes by the nom de code "Sembiance" decided it would be a good idea to build a searchable database of open C/C++ source code. So he did it.

It's an interesting open source story for a lot of reasons:

  1. You can only do this kind of thing with open source code.
  2. It's actually pretty useful for anyone planning to use open source code in their enterprise, or for anyone who wants to make sure that their non-open code doesn't actually come from the open source world (though for that kind of audit you might want to try one of the commercial services, like those from Black Duck Software or Palamida, Inc.).
  3. It's an intriguing example of a database application that was created by an individual using some very powerful open source software.

Of course, you can't build a huge searchable code base just from the raw source code; you've got to have the right tools to do all the parsing, formatting, database loading, indexing and so on. What might have been a multi-year, multimillion-dollar project if done from scratch apparently turned out to be a hobby for Sembiance, using open source tools. The ones cited on csourcesearch.net include:

  • MySQL for the database engine
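  • Apache Lucene, a text search engine library written entirely in Java. The "Apache" in the name refers to the Apache Software Foundation, which hosts the project; Lucene is a library you build into your own application, not something that runs on top of the Apache web server.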
  • CodeWorker, a universal parsing tool & a source code generator. CodeWorker allowed Sembiance to parse out different parts of the source code he was working with and let his code understand them in context.
  • GeSHi - Generic Syntax Highlighter, a tool for highlighting your code based on the appropriate syntax, so C/C++ code looks like it should.
  • Gentoo Linux, cited as a significant contributor to Sembiance's project, probably in large part because of its Portage software management tool. By "software management" I mean the task of keeping track of which software packages are installed on the system, at which revision, and whether they've been patched.
  • PHP, the general-purpose scripting language that makes csourcesearch.net a full-fledged member of the LAMP community (if LAMP is new to you, O'Reilly's ONLamp.com is as good a place to start as any).
  • Also mentioned in the acknowledgment section of csourcesearch.net is freenode, a service for providing "interactive services to peer-directed project communities." In other words, a good place to get help, implemented on Internet Relay Chat (IRC).
  • Then, there's Flooble.com. I can't really tell what exactly it is, other than a showcase site for Animus Pactum Consulting that also happens to include a pretty decent resource for webmasters looking for design information and free Java/JavaScript code.
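
To make the "indexing" part of that toolchain a little more concrete, here is a toy sketch of an inverted index, the general kind of structure a text search library such as Lucene builds. It is not Sembiance's code and it does not use Lucene's API; the file names, the sample snippets and the build_index and search functions are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample "files"; a real system would read whole source trees.
SOURCES = {
    "pkg_a/list.c": "struct node *list_append(struct node *head, int value);",
    "pkg_b/util.c": "int parse_value(const char *text); /* uses list_append */",
    "pkg_c/main.cpp": "int main() { return parse_value(\"42\"); }",
}

# Rough approximation of a C/C++ identifier (an assumption, not a real lexer).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(sources):
    """Map each lowercased identifier to the set of files that contain it."""
    index = defaultdict(set)
    for name, text in sources.items():
        for token in IDENTIFIER.findall(text):
            index[token.lower()].add(name)
    return index


def search(index, query):
    """Return the files containing every identifier in the query, sorted by name."""
    terms = [t.lower() for t in IDENTIFIER.findall(query)]
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


INDEX = build_index(SOURCES)
print(search(INDEX, "list_append"))
print(search(INDEX, "parse_value main"))
```

Because the lookup is a dictionary hit plus a set intersection, a query never has to rescan the source text, which is the basic reason an index makes a large code base searchable at all. A real setup would add the pieces this sketch skips: language-aware parsing (the job CodeWorker is described as doing), persistent storage, and ranking of results.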

You can browse and search by what kind of license is used, by individual packages, and by software categories. All in all, csourcesearch.net is an intriguing tool for exploring the world of open source C/C++ software, whether you're doing due diligence on your own code base or just want to learn more about how to build your own applications.

  Posted by Pete Loshin on December 6, 2005 8:37 AM
