Aug. 7, 2006
 
TECHNOLOGY: An Engine That ‘Does Search Right’
 
By Mark Roth
Pittsburgh Post-Gazette
 
Raul Valdes-Perez is chief executive officer and co-founder of Vivisimo, Inc., a search-engine company. (SHNS photo by Tony Tye / Pittsburgh Post-Gazette)
Pittsburgh, PA (SHNS) -- If Raul Valdes-Perez were organizing this article, it might start out this way:
 
Raul Valdes-Perez and Vivisimo (33)
Clustering (6)
Business (3)
Google (3)
Carnegie Mellon University (3)
Serendipity (7)
Future (6)
Other topics (5)
 
That's a rough approximation of how Valdes-Perez's company, Vivisimo, assembles information with its Internet search engine.
 
Like its more famous cousins -- Google, Yahoo and Microsoft -- Vivisimo spits out a list of Web pages based on the search terms that people enter. But on the left side of the search page, Vivisimo also clusters the top results by major subthemes, using an algorithm developed by Valdes-Perez and company co-founders Jerome Pesenti and Christopher Palmer.
 
So if Vivisimo's clustering software were used to parse this article, it might find the themes cited above, plus the number of paragraphs covered by each theme. Then you could click on a theme and look at just those paragraphs.
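For readers curious about the mechanics, here is a deliberately simple Python sketch of grouping texts by theme and counting how many fall under each, in the spirit of the list above. It is not Vivisimo's algorithm, which the article does not describe. It assumes a crude rule: treat the word shared by the most remaining texts as a theme, then repeat. The names `cluster_by_term` and `STOPWORDS`, and the sample sentences, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on",
             "is", "by", "with", "at", "its", "it"}

def terms(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def cluster_by_term(docs, max_themes=5, min_size=2):
    """Greedy label-first grouping (an illustrative assumption, not Vivisimo's method).

    Repeatedly pick the word that appears in the most remaining documents,
    make it a theme, and assign every remaining document containing it.
    Leftovers go into an "other topics" bucket.
    """
    doc_terms = [terms(d) for d in docs]
    remaining = list(range(len(docs)))
    themes = []
    while remaining and len(themes) < max_themes:
        # Document frequency of each term among the unassigned documents.
        df = Counter(t for i in remaining for t in doc_terms[i])
        if not df:
            break
        # Most widespread term; ties broken alphabetically so output is stable.
        label = min(df, key=lambda t: (-df[t], t))
        if df[label] < min_size:
            break
        members = [i for i in remaining if label in doc_terms[i]]
        themes.append((label, members))
        remaining = [i for i in remaining if label not in doc_terms[i]]
    if remaining:
        themes.append(("other topics", remaining))
    return themes

docs = [
    "Clustering groups search results",
    "Clustering shows themes",
    "Google ranks pages by links",
    "Google dominates search",
    "Simon taught at Carnegie Mellon",
]
for label, members in cluster_by_term(docs):
    print(f"{label} ({len(members)})")
```

With the sample sentences this prints "clustering (2)", "google (2)" and "other topics (1)". Real clustering engines use far richer signals, such as multi-word phrases and document similarity, than a single shared word.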
 
Valdes-Perez said clustering allows a computer user to go quickly to the Web pages that are most relevant, instead of having to rely on a search engine's software to put the pages he wants at the top of the heap.
 
Google's overwhelmingly dominant search engine ranks a Web page based largely on how many other Web pages are linked to it, much as a scientist is sometimes ranked by how often his research is cited by other scientists. The philosophy behind that approach, Valdes-Perez said, is that "if you have high-value information, it will work its way to the top."
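The citation-style idea can be illustrated with a minimal Python sketch that simply counts how many other pages link to each page and sorts by that count. This is a simplification, not Google's actual method: Google's PageRank also weights each link by the importance of the page it comes from. The function name `rank_by_inlinks` and the sample link map are hypothetical.

```python
from collections import Counter

def rank_by_inlinks(links):
    """Rank pages by how many distinct other pages link to them.

    `links` maps each page to the pages it links out to. Self-links and
    repeated links from the same page are ignored, much as a scientist's
    own citations of his work would not count.
    """
    counts = Counter()
    pages = set(links)
    for source, targets in links.items():
        for target in set(targets):  # count each linking page once
            pages.add(target)
            if target != source:
                counts[target] += 1
    # Highest in-link count first; ties broken alphabetically for stable output.
    return sorted(pages, key=lambda p: (-counts[p], p))

links = {"a": ["c", "b"], "b": ["c"], "d": ["c", "b", "b"], "c": ["c"]}
print(rank_by_inlinks(links))  # ['c', 'b', 'a', 'd']
```

Page "c" comes first because three other pages point to it. Its link to itself adds nothing.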
 
But for any individual Web user, that may not be true, he said. The Web pages that would be most relevant may not be on the first page or the second page or even the third page of Google's or Yahoo's listings, and that's where Vivisimo's method of categorizing Web pages by major themes can be a great help, he said.
 
"We're trying to move search away from this idea that ranking (Web pages) is the solution to everything. Instead, our basic philosophy is, don't just try to show the best 10 or the best five pages, but instead dredge up a larger amount of stuff, the top 200 or 500, organize that quickly in half a second or so, and show the major themes to the user."
 
Valdes-Perez believes that Vivisimo "does search right" -- a brash statement for a company that is just now reaching 40 employees, compared with Google's 5,680.
 
Unlike Google, Vivisimo makes most of its money not from everyday consumers using its search engine but from selling its software and services to companies and government agencies. Its customers include FirstGov.gov, the federal government's official Web portal.
 
Valdes-Perez arrived in Pittsburgh by way of Cuba, Chicago, Brazil and Boston.
 
His father, an electrical engineer, and his mother, a math teacher, fled Castro's Cuba in 1961, when Raul was 5. Valdes-Perez got his bachelor's and master's degrees in information engineering at the University of Illinois at Chicago, and then spent a couple of years writing software in Brazil before landing at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology.
 
It was there that he began hoping to study one day under Carnegie Mellon's Herbert Simon, the school's first Nobel Prize laureate and a founder of the field of artificial intelligence.
 
Valdes-Perez was accepted into Carnegie Mellon's Ph.D. program in computer science in 1986, and spent the next five years with Simon as his doctoral adviser. Even after getting his Ph.D., Valdes-Perez continued to meet with Simon every other week.
 
"I went mainly to listen," he said. "I would go to him and say 'Gee, I've been reading about this stuff lately,' and then I would sit back and he would share his wisdom."
 
As Vivisimo has evolved, it has launched its clustering search engine under the name Clusty.com and has developed its own "crawler" to go out and find Web pages before grouping them by theme.
 
The clustering technology may even preserve one feature of the Web that is rapidly disappearing -- serendipity, or the chance you'll accidentally bump into a Web site you wouldn't have found otherwise.
 
As search engines and other digital technology have become more powerful, they have made it easier for people to find exactly what they want and dispense with all the rest.
 
But Valdes-Perez said that by clustering Web pages into themes, Vivisimo can sometimes reveal connections that people wouldn't have seen otherwise. To demonstrate that, he recently used the search terms "Osama bin Laden" and "Madonna" for a group in Washington, D.C.
 
One of the themes that was generated was "niece," he said, and when he opened that folder, it revealed Web sites about a niece of the terrorist "who actually hates him but has aspirations to be a pop singer like Madonna," Valdes-Perez said.
 
Where will search go in the future?
 
Many of the major companies are now investing in "personalized search," he said, using a record of which sites a user has visited to help tailor their future searches.
 
A more intriguing possibility is research aimed at generating direct answers to people's questions when they type them into a Web search engine.
 
A big challenge there, he said, is to design a system that will understand what kind of information a person wants about a particular subject.
 
In general, an ongoing goal of search is to make it "smarter" so we can be "dumber" -- to develop software that will do the hard thinking for us.
 
"There's a quote I used to have on my home page at CMU that says 'Society advances by reducing the number of tasks that require thought.'
 
"You can accomplish much more by applying your thinking time to higher level stuff."
 
E-mail Mark Roth at mroth@post-gazette.com.
 
Distributed by Scripps Howard News Service, www.scrippsnews.com.