Library of Congress is still trying to archive all tweets from 2006-2012

Two years ago, the communications director of the Library of Congress announced a plan to archive all of Twitter, starting from March 2006. Even at that time (March 2010) the volume was very large: 55 million messages a day were being posted on Twitter, and the total size of the database since the site's founding was measured in terabytes.

But that was only the beginning. By the summer of 2012, Twitter traffic had grown to 400 million messages per day, and the Library of Congress still had not launched the promised archive with full-text search. Some people began to doubt that the librarians could do the job, and last week there were rumors that they had quietly abandoned the ambitious project. In fact, that is not the case.

Journalists from the Nieman Journalism Lab interviewed Jennifer Gavin, who heads up the Twitter archiving project at the Library of Congress. She says the plans remain in force; it is just that “a good librarian is never in a hurry”, meaning the library is not going to roll out its service at the same pace at which Twitter operates.

Of course, the task turned out to be much more difficult technically than it seemed at first. “The process of developing technical specifications is still ongoing, but we are much closer to its completion,” Gavin said. “I cannot give you a specific date when we will be ready to announce it officially.” The criteria for sorting the source data are still being determined: by keyword, by time, and so on. The developers have not yet decided what the system's user interface should look like.

“Last year, we began to partially receive material from Twitter. Now we get it almost daily. This is a very large amount of data,” says Gavin. At the same time, there is a six-month embargo on archiving fresh tweets. Under the terms of the agreement with the company, the resulting database may be used only for non-commercial purposes within the library and for preservation. The system will be available only to registered library visitors.

Source: https://habr.com/ru/post/148233/

