If fifteen years ago you would have told us that some day, deep in the bowels of Carnegie Mellon University, a supercomputer cluster would scan hundreds of millions of Web pages, examine text patterns, and teach itself about the Ramones, we might have believed you -- we were into some far-out stuff back then. But this project is about more than the make of Johnny's guitar (Mosrite) or the name of the original drummer (Tommy). NELL, or Never-Ending Language Learning system, constantly surfs the Web and classifies everything it scans into specific categories (such as cities, universities, and musicians) and relations. One example The New York Times cites:
Peyton Manning is a football player (category). The Indianapolis Colts is a football team (category). By scanning text patterns, NELL can infer with a high probability that Peyton Manning plays for the Indianapolis Colts - even if it has never read that Mr. Manning plays for the Colts.
But sports and music factoids aside, the system is not without its flaws. For instance, when Internet cookies were categorized as baked goods, "[i]t started this whole avalanche of mistakes," according to researcher Tom M. Mitchell. Apparently, NELL soon "learned" that one could delete pastries (the mere thought of which is sure to give us night terrors for quite some time). Luckily, human operators stepped in and corrected the thing, and now it's back on course, accumulating data and giving researchers insights that might someday lead to a true semantic web.