San Francisco Chronicle

Internet Archive, repository of modern culture, turns 20

By Benny Evangelista
Jason Scott emcees the 20th anniversary celebration of the Internet Archive in San Francisco, Calif., on Wednesday, October 26, 2016. Carlos Avila Gonzalez/The Chronicle

When the Internet Archive was created 20 years ago, few envisioned how a small galaxy of about 500,000 websites would evolve into the center of human communication and culture.

Now, the nonprofit San Francisco organization — which celebrated the milestone with a party Wednesday night — curates a vast digital archive that includes more than 370 million websites and 273 billion pages, many captured before they disappeared forever.

It’s more than an archive of Internet sites. The organization, founded by computer scientist and entrepreneur Brewster Kahle, now has a virtual storehouse ranging from digitally converted books and historic film to funny memes and audio recordings of Grateful Dead concerts.

Future scholars will be able to search through an archive of news talk shows and political advertising to better understand the twists and turns of this year’s presidential election season.

“When Brewster started this, a lot of people thought he was crazy or irrelevant,” said Rick Prelinger, a film archivist and associate professor of film and digital media at UC Santa Cruz.

“First off, who thought the Web was anything that needed to be saved back in 1996, ’97 or ’98?” he said. “It was just screens you looked at. I don’t think anybody anticipated that our culture would move online so rapidly. He did see that. He’s got a good instinct for that kind of thing.”

Larry Dieterich, left, checks out the tabletop scribe used to digitize books, demonstrated by Tim Bigelow, right, during a 20th anniversary celebration of the Internet Archive in San Francisco, Calif., on Wednesday, October 26, 2016. Carlos Avila Gonzalez/The Chronicle

About 600 people turned out for the party in the Internet Archive’s neoclassical, Greek-columned home, a former Christian Science church on Funston Avenue in the Richmond District. Guests included early tech entrepreneur Marc Canter, co-founder of what would become Macromedia; early Apple employee Dan Kottke; and Washington journalist Kathy Kiely.

The crowd included past and present Internet Archive employees, and others who volunteered their time or money to help the organization over the years.

Kahle’s goal was to create the digital version of the Great Library of Alexandria, the lost repository of the ancient world. He believed that preserving the particularly ephemeral World Wide Web, as it was then called, would be key for future historians to be able to understand the contexts of this time.

“The Web is a sharing extravaganza of people trusting each other with who they are, and making it public,” Kahle said. “We wanted to make that permanent.”

Jason Scott, right, calls up a vintage Pac-Man game for Mouse Reeve, left, during a 20th anniversary celebration of the Internet Archive in San Francisco, Calif., on Wednesday, October 26, 2016. Carlos Avila Gonzalez/The Chronicle

About the same time, Kahle co-founded Alexa Internet, a Web research and information company that also derived its name from that ancient library. Amazon bought Alexa in 1999 in a deal worth about $250 million.

Kahle “had a reasonable vision of the future and a path to getting there,” said Ronna Tanenbaum, Alexa’s former head of design. “He was trying to protect, preserve and create universal access to knowledge.”

The Internet Archive has survived through community donations and by working with about 1,000 libraries around the world that pay the group to help digitize books and other material. But the site itself remains free.

“It is an organization that gives things away,” Kahle said. “Who does that? The interesting thing is that ‘free’ works so well on the Web.”

The archive is best known for the Wayback Machine, which uses computer algorithms to crawl the Web and constantly save snapshots of sites. Online visitors use it to compare how websites, like SFGate.com, have changed over the years. It’s also a repository of Internet firms that went belly-up long ago, like Pets.com.

The site’s 3 million to 4 million visitors a day show that “people want old stuff, they want to remember,” Kahle said.

Last week, the archive released an easier way to search the Wayback Machine, which has also helped repair 1 million broken citation links on Wikipedia.

In another section is an archive of about 3 million hours of TV news broadcasts. That includes a searchable database of political ads captured during the current election season, which political scientists 100 years from now might use to figure out what we were thinking during this “craziest, most disruptive election since the Civil War,” said Roger Macdonald, the TV archive’s director.

Or future scholars might find something else in the archive more indicative of our moment in time.

“The most important value of the archive in the future inherently can’t be anticipated now,” Macdonald said.

Kahle said the archive has digitized about 2.5 million books, although it’s still a long way from its goal of 10 million books by 2020, and from the scale of the Library of Congress. But archive visitors can now search through the books it has digitized, a feat not possible until recently.

“Today, our whole world is the Internet and digital material,” said San Francisco historian Woody LaBounty. “And they’re taking on this ridiculous, massive job trying to capture just a small part of it and saving it for posterity. It’s a daunting task.”

Benny Evangelista is a San Francisco Chronicle staff writer. Email: bevangelista@sfchronicle.com Twitter: @ChronicleBenny
