Aug. 7, 2006
TECHNOLOGY: An Engine That ‘Does Search Right’
By Mark Roth
Pittsburgh Post-Gazette
Raul Valdes-Perez is chief executive officer and co-founder of Vivisimo, Inc., a search-engine company. (SHNS photo by Tony Tye / Pittsburgh Post-Gazette)
Pittsburgh, PA (SHNS) -- If Raul Valdes-Perez were organizing this article,
it might start out this way:
Raul Valdes-Perez and Vivisimo (33)
Clustering (6)
Business (3)
Google (3)
Carnegie Mellon University (3)
Serendipity (7)
Future (6)
Other topics (5)
That's a rough approximation of how Valdes-Perez's company, Vivisimo,
assembles information with its Internet search engine.
Like its more famous cousins -- Google, Yahoo and Microsoft -- Vivisimo
spits out a list of Web pages based on the search terms that people enter.
But on the left side of the search page, Vivisimo also clusters the top
results by major subthemes, using an algorithm developed by Valdes-Perez and
company co-founders Jerome Pesenti and Christopher Palmer.
So if Vivisimo's clustering software were used to parse this article, it
might find the themes cited above, plus the number of paragraphs covered by
each theme. Then you could click on a theme and look at just those
paragraphs.
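The article does not describe how Vivisimo's algorithm works, so the
following is only a toy sketch of the general idea of grouping results under
shared themes and showing a count beside each. It assumes that a "theme" is
simply a word that two or more result titles have in common; the function
name, the stop-word list and the sample titles are all invented for
illustration.

```python
from collections import defaultdict

# Hypothetical stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "with"}


def group_by_theme(titles, min_size=2):
    """Group result titles under each non-stop word they share.

    This is a naive shared-word grouping, not Vivisimo's algorithm.
    """
    themes = defaultdict(list)
    for title in titles:
        # Normalize each word and use a set so a title counts once per word.
        words = {w.strip(".,").lower() for w in title.split()}
        for word in sorted(words - STOP_WORDS):
            themes[word].append(title)
    # Keep only themes shared by at least min_size results, largest first.
    kept = {t: pages for t, pages in themes.items() if len(pages) >= min_size}
    return sorted(kept.items(), key=lambda item: (-len(item[1]), item[0]))


# Made-up sample results.
results = [
    "Jaguar car reviews",
    "Jaguar car dealers",
    "Jaguar habitat in the rainforest",
    "Rainforest animals",
]

for theme, pages in group_by_theme(results):
    print(f"{theme} ({len(pages)})")
```

Run on the sample titles, the sketch lists "jaguar (3)", "car (2)" and
"rainforest (2)", the same theme-and-count layout as the list at the top of
this article. A title can appear under more than one theme, which is why the
counts can add up to more than the number of results.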
Valdes-Perez said clustering allows a computer user to go quickly to the Web
pages that are most relevant, instead of having to rely on a search engine's
software to put the pages he wants at the top of the heap.
Google's overwhelmingly dominant search engine ranks a Web page based
largely on how many other Web pages link to it, much as a scientist is
sometimes ranked by how often his research is cited by other scientists.
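As a rough illustration of ranking by links, the sketch below orders pages
by a plain count of how many other pages point to them. It is a
simplification for illustration only and is not Google's actual method,
which the article does not detail; the function name and the tiny link map
are made up.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Order pages by how many distinct other pages link to them.

    `links` maps each page to the list of pages it links to.
    A plain inbound-link count, used here only as an illustration.
    """
    inbound = Counter()
    for source, targets in links.items():
        # Count each source once per target and ignore self-links.
        inbound.update(set(targets) - {source})
    pages = set(links) | set(inbound)
    # Most-linked pages first; ties broken alphabetically.
    return sorted(pages, key=lambda page: (-inbound[page], page))


# Hypothetical link map: page "c" is linked to by three other pages.
sample_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

print(rank_by_inbound_links(sample_links))
```

Here page "c" comes out on top because three pages link to it, and page "d"
comes last because none do, whatever a particular reader happens to be
looking for.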
The philosophy behind that approach, Valdes-Perez said, is that "if you have
high-value information, it will work its way to the top."
But for any individual Web user, that may not be true, he said. The Web
pages that would be most relevant may not be on the first page or the second
page or even the third page of Google's or Yahoo's listings, and that's
where Vivisimo's method of categorizing Web pages by major themes can be a
great help, he said.
"We're trying to move search away from this idea that ranking (Web pages) is
the solution to everything. Instead, our basic philosophy is, don't just try
to show the best 10 or the best five pages, but instead dredge up a larger
amount of stuff, the top 200 or 500, organize that quickly in half a second
or so, and show the major themes to the user."
Valdes-Perez believes that Vivisimo "does search right" -- a brash statement
for a company that is just now reaching 40 employees, compared with Google's
5,680.
Unlike Google, Vivisimo makes most of its money not from everyday consumers
using its search engine but from selling its software and services to
companies and government agencies, such as FirstGov.gov, the federal
government's official Web portal.
Valdes-Perez arrived in Pittsburgh by way of Cuba, Chicago, Brazil and
Boston.
His father, an electrical engineer, and his mother, a math teacher, fled
Castro's Cuba in 1961, when Raul was 5. Valdes-Perez got his bachelor's and
master's degrees in information engineering at the University of Illinois at
Chicago, and then spent a couple of years writing software in Brazil before
landing at the Artificial Intelligence Laboratory at the Massachusetts
Institute of Technology.
It was there that he conceived the hope of one day studying under Carnegie
Mellon's Herbert Simon, the school's first Nobel Prize laureate and a
founder of the field of artificial intelligence.
Valdes-Perez was accepted into Carnegie Mellon's Ph.D. program in computer
science in 1986, and spent the next five years with Simon as his doctoral
adviser. Even after getting his Ph.D., Valdes-Perez continued to meet with
Simon every other week.
"I went mainly to listen," he said. "I would go to him and say 'Gee, I've
been reading about this stuff lately,' and then I would sit back and he
would share his wisdom."
As Vivisimo has evolved, it has named its clustering search engine
Clusty.com, and has developed its own "crawler" to go out and find Web pages
before grouping them by theme.
The clustering technology may even preserve one feature of the Web that is
rapidly disappearing -- serendipity, or the chance you'll accidentally bump
into a Web site you wouldn't have found otherwise.
As search engines and other digital technology have become more powerful,
they have made it easier for people to find exactly what they want and
dispense with all the rest.
But Valdes-Perez said that by clustering Web pages into themes, Vivisimo can
sometimes reveal connections that people wouldn't have seen otherwise.
To demonstrate that, he recently used the search terms "Osama bin Laden" and
"Madonna" for a group in Washington, D.C.
One of the themes that was generated was "niece," he said, and when he
opened that folder, it revealed Web sites about a niece of the terrorist
"who actually hates him but has aspirations to be a pop singer like
Madonna," Valdes-Perez said.
Where will search go in the future?
Many of the major companies are now investing in "personalized search," he
said, using a record of which sites a user has visited to help tailor their
future searches.
A more intriguing possibility is research aimed at generating direct answers
to people's questions when they type them into a Web search engine.
A big challenge there, he said, is to design a system that will understand
what kind of information a person wants about a particular subject.
In general, an ongoing goal of search is to make it "smarter" so we can be
"dumber" -- to develop software that will do the hard thinking for us.
"There's a quote I used to have on my home page at CMU that says 'Society
advances by reducing the number of tasks that require thought.'
"You can accomplish much more by applying your thinking time to higher level
stuff."
E-mail Mark Roth at mroth@post-gazette.com.
Distributed by Scripps Howard News Service, www.scrippsnews.com.