Largest Online News Website Launches
By Dana Greenlee, WebTalk Radio
Monday, June 7, 2004; 12:00pm EST
A conversation with CEO Rich Skrenta...
News and information online is so vast that it can be unmanageable in
its volume. This is precisely why news aggregation search Websites
like Yahoo News and Google News are growing like crazy. Last month
marked the launch of a new entry - Topix.net, said to be the world's
largest online news Website with over 150,000 news categories and news
sources from around the world.
Rich Skrenta, CEO of Topix.net and co-founder of Netscape's Open
Directory Project, took a few minutes out of his news day to explain
his huge online news creation.
Q: What does your site do and how does it work?
Skrenta: Topix.net is reading all of the news published online
constantly and categorizing stories, both geographically as well as by
subject. We've got a basic news roll-up for every zip code in the
United States - for 30,000 towns across the U.S. We also have 150,000
subject categories. We have a page about every sports team, about
every celebrity, every music style. We even have a page about mobile
home manufacturing, which has a surprisingly large amount of news on
it.
Q: You can't fit 30,000 zip codes on your front page. Does the
cream of the crop rise to the top so that when you go to your home
page you see some of the major hot topics?
Skrenta: We have links to the major cities in the country and users
can type in zip codes and go to a page just for news about their town.
We've got some of the categories surfaced on our front page: U.S. and
world news, journalism news, health, technology - basically a random
assortment of categories from deep within our system. But to get the
full experience of Topix, you have to click around and experience the
full breadth by viewing the internal parts of the site.
Q: There are a lot of people saying you're the largest news Website
that's ever been created. Would that be a true statement?
Skrenta: Based on the number of categories, yes. If you look at Google
News, they've got eight categories: health, politics, entertainment,
sports and so on, basically corresponding to the standard Associated
Press taxonomy. Yahoo News has 100 full coverage sections. We have
150,000 pages. Our goal is to have a page constantly updated from the
broadest variety of sources about every person, place and thing in the
world. We haven't done every person, place or thing yet, but we've
done the first 150,000. We've got a page about every public company.
There are 5,500 public companies. We're tracking references to every
disease and drug - both brand and generic - 21,000 sports
personalities, 45,000 celebrities, anyone who's ever been in a movie,
anyone who's ever put out a music album.
Q: Building out your keyword database - you must have spent a lot
of long nights working.
Skrenta: Yeah - it's a massive knowledge base which drives our system
in conjunction with some artificial intelligence that we developed.
The knowledge base knows the name of every street in the country,
every bridge, tunnel, hospital, school, body of water, baseball
stadium, park - in addition to the other subjects and keywords it's
looking for. It's about 10 million lines of text that are constantly
being looked for in every story that comes through our system.
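As a rough illustration of looking for a large list of known phrases in each story, here is a minimal Python sketch. It is not Topix's implementation, which the interview does not describe; the function name, the sample phrases and the four-word limit are all assumptions made for the example.

```python
def find_known_phrases(text, phrases, max_words=4):
    """Return the known phrases that occur in text.

    Every run of 1..max_words consecutive words is looked up in a set,
    so each lookup stays cheap even if the phrase list is very large.
    Punctuation handling is deliberately left out of this sketch.
    """
    tokens = text.lower().split()
    found = set()
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            candidate = " ".join(tokens[start:start + length])
            if candidate in phrases:
                found.add(candidate)
    return found


# Hypothetical knowledge-base entries, stored in lowercase.
KNOWN_PHRASES = {"golden gate bridge", "main street", "lake tahoe"}

story = "Traffic on the Golden Gate Bridge slowed after a crash near Main Street"
print(sorted(find_known_phrases(story, KNOWN_PHRASES)))
```

Keeping the phrases in a set means the cost per story depends on the story's length rather than on the size of the phrase list.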
Q: What's the difference between what Topix.net is doing and what
Google is doing in terms of the broad picture of what you're actually
indexing?
Skrenta: If you go to Google News and want to get information about,
say, IBM, you'd find a lot of stories that contained the three letters
"IBM," but it might not be a good relevant overview of IBM's current
business. When we look at a story, we're trying to determine not just
if a story contains certain keywords, but actually if it's about the
concept that our topic is about.
A story we recently saw said, "Dot-com survivors have aged like fine
bordeaux." Now this is a reference to a style of wine and we have a
wine page in Topix, but it's not a wine story. It's a business story.
It's a stray reference to something else. If you search for bordeaux
on Google News, you would get this story. But it's not what you're
looking for. Our system can tell the difference between stray
references to concepts and stories that are actually about the
concept.
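The difference between a stray reference and a story that is actually about a topic can be sketched with a very simple heuristic: require several distinct topic-related terms, not just one keyword hit. This is only an assumed stand-in for illustration; the interview does not say how Topix's system decides, and the term list, threshold and sample texts below are made up.

```python
import re

# Hypothetical vocabulary for a "wine" topic page.
WINE_TERMS = {"bordeaux", "wine", "vineyard", "vintage", "winery", "merlot"}


def distinct_topic_terms(text, topic_terms):
    """Return the topic terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & topic_terms


def is_about(text, topic_terms, min_distinct=2):
    """Treat a story as about the topic only if several distinct terms appear."""
    return len(distinct_topic_terms(text, topic_terms)) >= min_distinct


# Invented sample texts; the first reuses the quoted line as a stray reference.
business_story = (
    "Dot-com survivors have aged like fine bordeaux. "
    "Shares of the online retailers rose as profits beat forecasts."
)
wine_story = (
    "The latest Bordeaux vintage impressed critics, and vineyard owners "
    "expect wine prices to rise."
)

print(is_about(business_story, WINE_TERMS))  # False: only one wine term appears
print(is_about(wine_story, WINE_TERMS))      # True: four distinct wine terms appear
```

A plain keyword search would return both stories for "bordeaux"; the distinct-term threshold keeps the business story off the wine page.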
Q: You come from an incredible background of creating things we now
take for granted on the Web. Have you now created this artificial
intelligence (AI) that's just proprietary to Topix that no one else
has duplicated?
Skrenta: Yeah, it's pretty unique in the industry. I haven't seen
anything else like it. We looked at what had been done in academic AI.
If you're going to develop an AI technique, 85 percent accuracy is
pretty good. But for our purposes, we had to get far above 85 percent
to make the stories look good on a page. If our AI was only 85
percent, that would mean that on every one of our pages, 2-3 stories
would be bad. We had to get way above 99 percent.
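The arithmetic behind those figures can be checked with a short Python sketch. The interview does not state how many stories appear on a page; the 15 to 20 range used here is an assumption chosen because it reproduces the "2-3 stories" figure.

```python
def expected_bad_stories(stories_per_page, accuracy):
    """Expected number of misclassified stories on one page."""
    return stories_per_page * (1.0 - accuracy)


# Assumed page sizes; not taken from the interview.
for stories in (15, 20):
    at_85 = expected_bad_stories(stories, 0.85)
    at_99 = expected_bad_stories(stories, 0.99)
    print(f"{stories} stories: {at_85:.2f} bad at 85%, {at_99:.2f} bad at 99%")
```

Under that assumption, 85 percent accuracy leaves roughly two to three bad stories on every page, while 99 percent leaves about one bad story for every five or so pages.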
Q: It's a pleasing site. You don't get just a list of headlines;
it's formatted beautifully and very well organized. It's a pleasure to
read and it's exciting.
Skrenta: I'm glad to hear you say that. When we looked at creating a
look and feel for the site, we looked at a bunch of newspaper sites
out there. I didn't really feel like a lot of them looked like a
newspaper. The Wall Street Journal Online is the one that was closest
to a newspaper look and feel.
We did some research and found that newspaper layout design is
actually a rich field with a 150-year history and there are books and
guidelines about rules to follow to make things visually appealing in
a print newspaper. Things like if you have a photo to a story and the
photo is a picture of a person in profile, the person should be facing
the text of the story; very subtle, not obvious rules about how to do
newspaper layout properly. When we looked at online newspapers, many
didn't follow any of these rules at all. I couldn't figure out why -
maybe the separation between the print and online divisions at the
company. We thought we'd bring some of these rules to bear on a
Website design, adapt it to the Web and come up with something a
little more reminiscent of a newspaper.
Q: You were one of the founders of the Open Directory Project. What
did you learn from building the Open Directory that you've applied to
this new site?
Skrenta: The Open Directory was built with 60,000 volunteer editors.
We built a giant Web directory similar to Yahoo, but 3-4 times bigger
than Yahoo Directory. It's actually the directory tab on Google.com,
in addition to being used by AOL and Netscape. It was a very
successful project, but we sort of took the opposite tack with
Topix.net. We have zero human editors at Topix - it's all done with
AI. Humans are really good at some tasks, but the scale of what we're
doing here is just so vast that we couldn't have humans manually
editing stories or selecting topics or categorizing them. It's too big
a project for even 60,000 people to undertake.
Q: What's fascinating is that what you're building seems to be
somewhat of a dynamically populated directory. It looks like you still
have that focus on categorizations of content, which is different than
a regular search engine. Was your vision to create a directory-type of
search engine?
Skrenta: What we're trying to do is classify text by concept instead
of keyword. When you go to Google and type in a name like Scott
Peterson or Janet Jackson, these are actually relatively common names.
There are thousands of people in the country with those names. Some of
them make it into the media, besides the ones we commonly think of. We
wanted a system that could be intelligent enough to decide a document
was actually about that concept as opposed to being a strict keyword
match.
About Source of Article
Dana Greenlee is producer and co-host of the WebTalkGuys Radio
Show, a Seattle-based talk show featuring technology news and
interviews. It is broadcast on WebTalkGuys Radio, Sonic Box, via
Pocket PC at Mazingo Networks and by telephone via the Mobile
Broadcast Network. It's on the radio in Seattle at KLAY 1180 AM.
Past shows and interviews are also webcast via the Internet at
http://www.webtalkguys.com/. Greenlee is also a member of the
International Academy of Digital Arts & Sciences.