DNAinfo, Gothamist, and What We Lose in the Disappearing Digital Archive

Photo illustration by Elena Scotti/Splinter/GMG, photos via Shutterstock

On Monday afternoon, as a few hundred people gathered at New York’s City Hall to protest the shuttering of local news sites DNAinfo and Gothamist, a carousel of speakers took turns proclaiming their love for journalism—including frequent targets of the press. Tragedies make strange bedfellows.

The sites wrote negative articles about him, New York City Councilmember Jumaane Williams said, but “at least they’d spell my name right.” Other local Democrats, taking turns on a megaphone a day before many of them were up for re-election, similarly joked about how various reporters had at times been pains in the ass. “The absence of DNAinfo and Gothamist leaves a large hole in local media coverage that isn’t easily filled,” Councilmember Antonio Reynoso said.

Yet a week after the publications were shut down, what happens to that journalism—the first drafts of local history DNAinfo, Gothamist, and its companion sites in other cities had given the world—remains an open question. The sites were temporarily blacked out last week, spurring a fierce social media backlash. They soon came back online. And former Gothamist publisher and co-founder Jake Dobkin promised that some version of the archives would be preserved.

But the details of that effort—including whether it’s tied to a broader negotiation between the former staffers’ union and their employer, billionaire Trump supporter Joe Ricketts—are unclear. Hal Danziger, DNAinfo’s chief technical officer, told Quartz last Friday that such “details are among the issues the company will address in the coming weeks.”

“We are having conversations about maintaining availability [of the archives] in the future,” added Lowell Peterson, executive director of Writers Guild of America-East, which represents both sites in addition to Gizmodo Media Group. “The union committee has made it clear that this is important and I believe management understands.”

There could be a lot to flesh out there, particularly given how Gothamist has in the past deleted articles critical of Ricketts. But the outcome of the fight to archive the sites may be a precursor of what’s to come as the disruption of the media industry continues. It’s likely that additional existing publications will close in the face of economic upheaval, leaving their sites vulnerable to technical failure without consistent upkeep. Mounting political and legal pressure, meanwhile, may push the owners of publications to erase more controversial work entirely.

One prescient example of the latter threat: Gawker.com, whose domain, social media accounts, and nearly 200,000 posts are up for auction. Splinter’s parent company, Univision, passed on the site when it purchased Gawker Media’s other properties last year. The law firm overseeing Gawker.com’s sale has received more than a dozen inquiries about the site, The Wall Street Journal reported last month (emphasis mine):

As it stands, [Gawker.com] is potentially more valuable to a person who wants articles removed than it is to a person who wants the archive preserved, a person familiar with the matter said, pointing to venture capitalist Peter Thiel who secretly financed Hogan’s case, the ruling on which drove it out of business. A new owner of Gawker could also face legal pressure to remove articles, people familiar with the matter said.

A complete shutdown of the site by an affluent Gawker hater—or even the removal of individual posts—would create a gaping hole in the public record. Case in point: On Thursday, when The New York Times published accusations by several women of comedian Louis C.K.’s sexual misconduct, the paper originally included a broken link to a 2015 Defamer post that was among the first published accounts of the allegations. The Times eventually added a working link to the piece by the Gawker sub-site, which provided valuable context by describing previous efforts to confirm such rumors.

Louis C.K. wasn’t an outlier for the late gossip site, either: Gawker reported on director James Toback’s sleaze years before many other larger outlets, reignited public awareness of Bill Cosby’s history of rape allegations, and called for information about the “open secret” of Harvey Weinstein’s misogyny way back in 2015. Taken together, the work represents important early steps toward the current conversation around sexual harassment and assault.

The Gawker archives hold other important pieces of 21st century media history as well: the first exposé on Silk Road, the crowdsourcing campaign to buy a video of former Toronto Mayor Rob Ford smoking crack, the defining takedown of smarm as a cultural force, an investigation into Donald Trump’s alleged hairpiece, the evolution of blogging as a distinct writing style on the internet—all potentially gone.

It’d be a decisive win for those in power whom Gawker journalists tried to hold accountable, and who are increasingly exercising their domination over journalists elsewhere. Whereas libraries are chock-full of decades’ worth of microfilm copies of newspapers and magazines, no similar record-keeping exists for the vast majority of digital publications. That means whoever owns a site can theoretically euthanize its body of work in one fell swoop.

The other, potentially broader, threat to digital preservation is more mundane: technical limitations. A 2014 study by the Reynolds Journalism Institute at the University of Missouri suggested about one in six online news organizations had lost significant amounts of news content from server crashes, evolutions in software, human error, and media storage failures. That number was even higher for outlets that also published in print, many of which maintain sprawling archives of back issues, known as “morgues,” that haven’t been fully digitized.

Those newspapers and magazines are often owned by larger media companies that now find themselves in a financial tailspin amid tech giants’ dominance of the advertising market. And it seems likely that outlets that eventually go out of business, where no one is paying attention to renewing domain registrations, among other basic upkeep, are at greater risk.

The more dystopian possibility is that new owners will try to squeeze a few final pennies out of shuttered publications by auctioning off archives to the highest bidder. Imagine what may happen to Gawker, multiplied across numerous local and regional publications around the country.

There are some organizations that provide a buffer to digital erasure. Informal groups like Archive Team have backed up a burgeoning mountain of digital history, and there’s a growing array of tools for DIY archiving and awareness of the topic in general.

Perhaps the largest such effort is the Wayback Machine, a project of the San Francisco-based nonprofit Internet Archive, where about 15 full-time, part-time, and volunteer engineers attempt to catalog as much of the public web as possible. They’ve logged about 599 billion URLs so far, adding about 2,500 more every second, the Wayback Machine’s director, Mark Graham, told Splinter.

Graham said his engineers try to prioritize web pages that may be at risk, including news sites under threat of shutdown by repressive regimes, like in Turkey or Egypt. “Most libraries throughout history have burned down,” he said, “and the principal cause or actor behind that has been governments.” In the U.S., the Wayback Machine has already captured more than 336,000 URLs from DNAinfo alone, though they’re not searchable for, say, news about specific neighborhoods or individual politicians.

“We built a site index and site search for the Wayback Machine, and we are exploring other search capabilities,” Graham said. “[An indexed archive] would take a lot of work.”

The Library of Congress also has extensive records, though Abbie Grotke, its web archiving team lead, told Splinter that collections are largely focused around specific themes, like a 2003 Iraq War archive. “We’re a little behind getting our collections online,” she said. Unlike some other national libraries, the Library of Congress doesn’t have a legal mandate to archive general news en masse.

What’s more, copyright law essentially requires permission from publications’ owners to make their archives publicly accessible. The tools used to capture web pages have their own limitations as well. “For very large sites that have a lot of content, sometimes crawlers can’t get as deep as we’d like,” Grotke said. “We can’t always get very interactive content or things behind search engines.”

In lieu of such a centralized effort, a third-party benefactor or academic institution would be a better bet for maintaining these slices of history in perpetuity. But the same questions of coordination and permission still apply. After the alternative weekly Boston Phoenix closed in 2013, its digital archives sat in limbo for two years before its owner finally donated them to the Northeastern University library. A public campaign to save 34 years’ worth of reporting by Philadelphia City Paper, however, has come up empty since its closure in 2015.

It’s a foreboding sign for Gothamist and DNAinfo, despite the turnout at Monday afternoon’s protest in Manhattan. The publications were founded in 2003 and 2009, respectively, and together they catalogued New York at a more granular level than larger local newspapers, covering overlooked school board meetings, low-level campaign finance, and minor development projects. Their archives represent the life’s work of the journalists who toiled in those newsrooms, but also a collective memory for the rest of us who live here.