 Largest Online News Website Launches
 By Dana Greenlee, WebTalk Radio
 Monday, June 7, 2004; 12:00pm EST
 
 A conversation with CEO Rich Skrenta...
 
 News and information online is so vast that its sheer volume can be
          unmanageable. This is precisely why news aggregation search Websites
          like Yahoo News and Google News are growing like crazy. Last month
          marked the launch of a new entry - Topix.net, said to be the world's
          largest online news Website, with over 150,000 news categories and news
          sources from around the world.
 
 Rich Skrenta, CEO of Topix.net and co-founder of Netscape's Open
          Directory Project, took a few minutes out of his news day to explain
          his huge online news creation.
 
 Q: What does your site do and how does it work?
 Skrenta: Topix.net is constantly reading all of the news published
          online and categorizing stories, both geographically and by
          subject. We've got a basic news roll-up for every zip code in the
          United States - for 30,000 towns across the U.S. We also have 150,000
          subject categories. We have a page about every sports team, about
          every celebrity, every music style. We even have a page about mobile
          home manufacturing, which has a surprisingly large amount of news on
          it.
 
 
 Q: You can't fit 30,000 zip codes on your front page. Does the
          cream of the crop rise to the top so that when you go to your home
          page you see some of the major hot topics?
 Skrenta: We have links to the major cities in the country and users
          can type in zip codes and go to a page just for news about their town.
          We've got some of the categories surfaced on our front page: U.S. and
          world news, journalism news, health, technology - basically a random
          assortment of categories from deep within our system. But to get the
          full experience of Topix, you have to click around and experience the
          full breadth by viewing the internal parts of the site.
 
 Q: There are a lot of people saying you're the largest news Website
          that's ever been created. Would that be a true statement?
 Skrenta: Based on the number of categories, yes. If you look at Google
          News, they've got eight categories: health, politics, entertainment,
          sports and so on, basically corresponding to the standard Associated
          Press taxonomy. Yahoo News has 100 full coverage sections. We have
          150,000 pages. Our goal is to have a page constantly updated from the
          broadest variety of sources about every person, place and thing in the
          world. We haven't done every person, place or thing yet, but we've
          done the first 150,000. We've got a page about every public company.
          There are 5,500 public companies. We're tracking references to every
          disease and drug - both brand and generic - 21,000 sports
          personalities, 45,000 celebrities, anyone who's ever been in a movie,
          anyone who's ever put out a music album.
 
 Q: Building out your keyword database - you must have spent a lot
          of long nights working.
 Skrenta: Yeah - it's a massive knowledge base which drives our system
          in conjunction with some artificial intelligence that we developed.
          The knowledge base knows the name of every street in the country,
          every bridge, tunnel, hospital, school, body of water, baseball
          stadium, park - in addition to the other subjects and keywords it's
          looking for. It's about 10 million lines of text that are constantly
          being looked for in every story that comes through our system.
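
 The interview does not say how Topix matches its knowledge base against
 incoming stories. As a rough illustration only, the Python sketch below
 checks every short word sequence in a story against a dictionary of known
 phrases. The three-entry KNOWLEDGE_BASE, the category labels, the sample
 story and the find_mentions function are all hypothetical stand-ins for
 the 10 million lines Skrenta describes, not Topix's actual data or code.

```python
import re

# Hypothetical miniature knowledge base: phrase -> category page.
# The real one is described as roughly 10 million lines of text.
KNOWLEDGE_BASE = {
    "golden gate bridge": "San Francisco, CA",
    "safeco field": "Seattle, WA",
    "ibm": "IBM",
}

# Longest phrase length in words, so the scan knows when to stop growing.
MAX_WORDS = max(len(phrase.split()) for phrase in KNOWLEDGE_BASE)


def find_mentions(story):
    """Return (phrase, category) pairs for every known phrase in the story."""
    words = re.findall(r"[a-z0-9]+", story.lower())
    found = []
    for start in range(len(words)):
        for length in range(1, MAX_WORDS + 1):
            if start + length > len(words):
                break  # ran off the end of the story
            phrase = " ".join(words[start:start + length])
            if phrase in KNOWLEDGE_BASE:
                found.append((phrase, KNOWLEDGE_BASE[phrase]))
    return found


story = "Traffic on the Golden Gate Bridge slowed as IBM opened a new office."
print(find_mentions(story))
```

 A dictionary lookup keeps the cost per story roughly proportional to the
 story's length rather than to the size of the knowledge base, which is
 one plausible reason to structure such a scan this way.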
 
 Q: What's the difference between what Topix.net is doing and what
          Google is doing in terms of the broad picture of what you're actually
          indexing?
 Skrenta: If you go to Google News and want to get information about,
          say, IBM, you'd find a lot of stories that contained the three letters
          "IBM," but it might not be a good relevant overview of IBM's current
          business. When we look at a story, we're trying to determine not just
          if a story contains certain keywords, but actually if it's about the
          concept that our topic is about.
 
 A story we recently saw said, "Dot-com survivors have aged like fine
          bordeaux." Now this is a reference to a style of wine and we have a
          wine page in Topix, but it's not a wine story. It's a business story.
          It's a stray reference to something else. If you search for bordeaux
          on Google News, you would get this story. But it's not what you're
          looking for. Our system can tell the difference between stray
          references to concepts and stories that are actually about the
          concept.
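
 To make the distinction concrete, here is a toy Python sketch contrasting
 a plain keyword match with a crude "aboutness" test based on how much of a
 story's vocabulary belongs to a topic. This is not Topix's technique, which
 the interview does not describe. The WINE_TERMS list, the 10 percent
 threshold and both sample stories (apart from the quoted bordeaux phrase)
 are assumptions made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_match(text, keyword):
    """True if the keyword appears anywhere in the text."""
    return keyword.lower() in tokenize(text)


def is_about(text, topic_terms, min_share=0.10):
    """True if at least min_share of the words belong to the topic.

    A deliberately crude stand-in for concept classification.
    """
    tokens = tokenize(text)
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in topic_terms)
    return hits / len(tokens) >= min_share


# Hypothetical vocabulary for a wine topic page.
WINE_TERMS = {"wine", "bordeaux", "vineyard", "vintage", "grape", "grapes", "merlot"}

business_story = ("Dot-com survivors have aged like fine bordeaux as "
                  "profits and revenue return to the sector.")
wine_story = ("The Bordeaux vintage impressed critics; the vineyard "
              "harvested merlot grapes early for this wine.")

for story in (business_story, wine_story):
    print(keyword_match(story, "bordeaux"), is_about(story, WINE_TERMS))
```

 Both stories pass the keyword test, but only the second clears the
 topic-share threshold, which mirrors the stray-reference problem
 described above.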
 
 Q: You come from an incredible background of creating things we now
          take for granted on the Web. Have you now created this artificial
          intelligence (AI) that's just proprietary to Topix that no one else
          has duplicated?
 Skrenta: Yeah, it's pretty unique in the industry. I haven't seen
          anything else like it. We looked at what had been done in academic AI.
          If you're going to develop an AI technique, 85 percent accuracy is
          pretty good. But for our purposes, we had to get far above 85 percent
          to make the stories look good on a page. If our AI was only 85
          percent accurate, that would mean that on every one of our pages, 2-3
          stories would be bad. We had to get way above 99 percent.
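
 The arithmetic behind that claim can be sketched in a few lines of Python.
 The interview gives the accuracy figures but not the number of stories on
 a page; the value of 17 below is an assumption, chosen because an 85
 percent accuracy rate then yields the "2-3 bad stories" Skrenta mentions.

```python
def expected_bad_stories(accuracy, stories_per_page):
    """Expected number of miscategorized stories on one page."""
    return (1.0 - accuracy) * stories_per_page


# Assumption: the interview does not state how many stories a page holds.
STORIES_PER_PAGE = 17

for accuracy in (0.85, 0.99):
    bad = expected_bad_stories(accuracy, STORIES_PER_PAGE)
    print(f"{accuracy:.0%} accurate: about {bad:.2f} bad stories per page")
```

 Under the same assumption, 99 percent accuracy works out to well under
 one bad story per page, which is consistent with the target he describes.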
 
 Q: It's a pleasing site. You don't get just a list of headlines;
          it's formatted beautifully and very well organized. It's a pleasure to
          read and it's exciting.
 Skrenta: I'm glad to hear you say that. When we looked at creating a
          look and feel for the site, we looked at a bunch of newspaper sites
          out there. I didn't really feel like a lot of them looked like a
          newspaper. The Wall Street Journal Online is the one that was closest
          to a newspaper look and feel.
 
 We did some research and found that newspaper layout design is
          actually a rich field with a 150-year history and there are books and
          guidelines about rules to follow to make things visually appealing in
          a print newspaper. Things like if you have a photo with a story and the
          photo is a picture of a person in profile, the person should be facing
          the text of the story; very subtle, not obvious rules about how to do
          newspaper layout properly. When we looked at online newspapers, many
          didn't follow any of these rules at all. I couldn't figure out why -
          maybe the separation between the print and online divisions at the
          company. We thought we'd bring some of these rules to bear on a
          Website design, adapt them to the Web and come up with something a
          little more reminiscent of a newspaper.
 
 Q: You were one of the founders of the Open Directory Project. What
          did you learn from building the Open Directory that you've applied to
          this new site?
 Skrenta: The Open Directory was built with 60,000 volunteer editors.
          We built a giant Web directory similar to Yahoo, but 3-4 times bigger
          than Yahoo Directory. It's actually the directory tab on Google.com,
          in addition to being used by AOL and Netscape. It was a very
          successful project, but we sort of took the opposite tack with
          Topix.net. We have zero human editors at Topix - it's all done with
          AI. Humans are really good at some tasks, but the scale of what we're
          doing here is just so vast that we couldn't have humans manually
          editing stories or selecting topics or categorizing them. It's too big
          a project for even 60,000 people to undertake.
 
 Q: What's fascinating is that what you're building seems to be
          somewhat of a dynamically populated directory. It looks like you still
          have that focus on categorizations of content, which is different than
          a regular search engine. Was your vision to create a directory-type of
          search engine?
 Skrenta: What we're trying to do is classify text by concept instead
          of keyword. When you go to Google and type in a name like Scott
          Peterson or Janet Jackson, these are actually relatively common names.
          There are thousands of people in the country with those names. Some of
          them make it into the media, besides the ones we commonly think of. We
          wanted a system that could be intelligent enough to decide a document
          was actually about that concept as opposed to being a strict keyword
          match.
 
 About Source of Article
 Dana Greenlee is producer and co-host of the WebTalkGuys Radio
          Show. WebTalkGuys is a Seattle-based talk show featuring
          technology news and interviews. It is broadcast on WebTalkGuys Radio,
          Sonic Box, via Pocket PC at Mazingo Networks and by telephone via the
          Mobile Broadcast Network. It's on the radio in Seattle at KLAY
          1180 AM. Past shows and interviews are also webcast via the
          Internet at
          http://www.webtalkguys.com/. Greenlee is also a member of The
          International Academy of Digital Arts & Sciences.
 