It is 2031 and a researcher wants to study what London’s bloggers were saying about the riots taking place in their city in 2011. Many of the relevant websites have long since disappeared, so she turns to the archives to find out what has been preserved. But she comes up against a brick wall: much of the material was never stored or has been only partially archived. It will be impossible to get the full picture.
This scenario highlights an important issue for future research - and one that has received scant attention. How can the massive number of websites on the internet - which exist for just 100 days on average before being changed or deleted - be safeguarded for future scholars to explore?
The extent to which content disappears without trace from the web is worrying, says Kath Woodward, head of the department of sociology at The Open University and a participant in the British Library’s Researchers and the UK Web Archive project, which aims to involve researchers in building special collections.
Not enough academics, she believes, are engaging with the topic. “We are taking it for granted that such material will be there, but we need to be attentive. We have a responsibility to future generations of researchers.”
Eric Meyer, a research Fellow at the University of Oxford’s Oxford Internet Institute, studies web archiving. He says that “because the internet is so integral to so much that goes on in the world today, we have to be serious about keeping track of it”.
And the past can erode very quickly, observes Mia Consalvo, associate professor in communication studies at Concordia University in Montreal, Canada, and president of the Association of Internet Researchers. “These issues are long term and worthy of investment,” she says.
Of the web archives in existence, the not-for-profit Internet Archive’s Wayback Machine is the oldest and most comprehensive. Established by Californian internet pioneer Brewster Kahle in 1996, five years after the World Wide Web began, it “crawls” the entire web, taking regular snapshots of all websites that are not hidden behind passwords or paywalls.
The archive, which now contains more than 150 billion pages from more than 100 million sites, is free to access. Anyone who visits the site can retrieve material by typing in a web address of interest. The aim is to copy the entire World Wide Web every two months, Kahle says. From shopping to porn sites, the undertaking is meant to capture the “whole breadth” of who we are.
The Internet Archive does not seek permission from website owners before it archives their sites, although material can be removed if an owner requests it. But the archive has a limitation that future researchers may well lament: the sheer size of the web means that its regular crawls are shallow. Although many websites are captured, the Wayback Machine may record only their home page. It is a record with breadth but not depth.
“We do what we can, but we are not doing enough,” Kahle says. He had hoped that other organisations would see the “obvious need” for his project and come to its aid, but this has not really happened, he says. “Other organisations are doing (web archiving), but basically for their own purposes.”
Since the early 2000s, many national libraries have been attempting to preserve the web. Their focus is on archiving websites that fall within their national domains (in the UK, for example, those with .uk addresses, and .fr in France).
Libraries in different countries take different approaches depending on the legislative framework. Some, such as the national libraries of France, Denmark and Norway, harvest their entire national domain. Such efforts, like that of the Wayback Machine, achieve only shallow capture. Like the Internet Archive, they do not ask permission, which is an approach that is possible where “legal deposit” legislation for online publications has been enacted. This legislation is equivalent to the long-established statutory obligation for publishers to deposit copies of printed material in national libraries. It allows libraries to crawl, collect and republish the freely available websites in their country’s domain automatically without breaching copyright law.
But other countries, such as the UK and the US, rely on smaller-scale selective archiving. Websites are collected around topics, themes or events chosen by library curators, with sites harvested only when the copyright holder’s permission has been obtained. The approach lacks breadth, but as the operation is smaller, individual websites can be captured more comprehensively.
The British Library began permission-based selective web archiving in 2004, four years after the US Library of Congress initiated its own programme. Today, its UK Web Archive contains material from more than 10,000 websites. But it is still only a tiny fraction of the estimated 4.5 million sites that are either part of the freely accessible content in the UK’s web domain or relevant to it.
A disappointingly poor response rate for permissions also means that the resulting collection has holes.
“It is like Swiss cheese,” acknowledges Helen Hockx-Yu, head of web archiving at the British Library. “We get only about 30 per cent of the people we ask giving us permission. Most we just don’t hear from; and without the resources to chase them, we end up with a patchy collection.”
It is not that the UK lacks appropriate legal deposit legislation - the Legal Deposit Libraries Act was extended in 2003 to cover online publications. Rather it is that the regulations necessary to put the legislation into effect have not been forthcoming, eight years down the line. While the reasons for the delay are multifaceted, commercial publishers are among those to have raised concerns about the legislation, fearing that web archives could undermine their business models.
Both the British Library and the Library of Congress steer clear of trying to collect material, such as the content of news websites, that could impinge on commercial publishers’ business models. Thus there is no archived copy of the now-defunct News of the World website, even though researchers might one day wish to study online comments by readers of the tabloid newspaper.
Indeed, most news content that is published only online is simply falling through the cracks. In the US, a national working group has been set up to look at content deemed to be highly “at risk”, including news content, notes Abigail Grotke, leader of the Library of Congress’ web archiving team.
Hockx-Yu says the British Library is doing all it can, but she argues that it is “thoroughly about time” that the measures needed to implement rules for the legal deposit of web publications are put into place.
“There are websites that we haven’t been able to collect that have disappeared,” she says.
William Kilbride, executive director of the Digital Preservation Coalition, a membership organisation for UK bodies with an interest in digital preservation, agrees: “It really is a matter of urgency to have the regulations finalised.”
But even if the legal deposit regulations come into effect, they are unlikely to satisfy UK researchers. To take the concerns of copyright holders into account, the rules are likely to contain a requirement - thus far common to all countries that require the legal deposit of online publications - that access to the websites archived under the legislation will be restricted.
To view the material, researchers might have to go in person to one of the UK’s legal deposit libraries, in the same way that they often do to examine print publications.
Researchers describe this potential stipulation as nonsensical. “The whole point about the internet is that you can access it from wherever you are,” Woodward says.
While libraries are currently digitising 19th-century documents and making them available via the web, it is “deeply ironic” that websites from two years ago are being made less accessible, Meyer notes.
It is not only a question of which sites are kept and how they are accessed. Preserving the material can be a major technical challenge, too. New web formats - for example rich interactive pages built using Flash or JavaScript - and new technologies for displaying video and audio content are evolving all the time. This means that it is a constant battle to make sure the websites can be crawled, copied and then displayed in the archives in such a way that they look just as they did to their first online viewers.
For example, Hockx-Yu says, it was assumed that the British Library would be able to preserve 2,400 hours of video footage from UK artist Antony Gormley’s Fourth Plinth commission, in which members of the public were given a platform in Trafalgar Square. However, she says, “the content was streamed over a different protocol that our crawler didn’t understand”. Fortunately, the British Library’s web archiving team cracked the challenge in the end.
Similarly, although the Library of Congress has recently been given Twitter’s archive (see box right), earlier attempts to preserve segments of Twitter have met with difficulty, says Grotke.
Thomas Risse, senior researcher at the L3S Research Centre, a web science research centre at Leibniz University in Hanover, Germany, knows the problems only too well. He was the lead researcher on the European Living Web Archives Project, which was set up to improve crawling technologies and ran from 2008 until 2011. “We have made big steps, but constant development is necessary,” he explains.
In August, the Oxford Internet Institute’s Meyer published a report on researcher engagement with web archives for the International Internet Preservation Consortium (IIPC), an international body that brings together national libraries and other organisations involved in web archiving.
That report, Web Archives: The Future(s), sets out a number of possible scenarios. “Nirvana” would be a future in which usable and useful web archives form part of researchers’ standard toolkits. At the other end of the scale, “apocalypse”, web archiving technology has been so far outpaced by new formats that the archive is as unreadable as 1960s-era computer punch cards.
But the web archiving community’s current practices, the report continues, are producing something that is in danger of ending up as a “dusty archive”. In this scenario, archiving technology keeps pace with the latest developments and archives are well curated and maintained, but they sit largely unused, gathering “digital dust”.
Meyer asks a probing set of questions of today’s efforts. Who is going to want to travel to multiple fragmented archives to find material? It would make far more sense if it could all be accessed from one point. Who is going to want to study only specifically selected sites? History suggests that often it is the material that does not make it into official collections that is the most fascinating. Who is going to want to study only sites from one country’s domain? The web, after all, is global and interconnected. And in a world where we increasingly work remotely, who will be content with on-site, restricted-access archives?
Furthermore, Meyer points out, future researchers will want archived material that can be searched and analysed in the same way as current web content. The material stored should allow researchers to examine and elucidate patterns and trends.
The heart of the problem, Meyer believes, is that the web archiving community is stuck in a “preservation mindset”.
“As is too often the case with those who build resources, they are preserving websites without giving any real thought to how they might be used in the future,” he argues.
But according to Sean Martin, current chair of the IIPC and head of architecture and development at the British Library, things are changing. Libraries, he believes, are increasingly thinking about future use and the kinds of services that can be built on top of their archives to assist researchers.
“The objective in the early days had to simply be to collect the material, because if it wasn’t collected, there would be no possibility of future research. But we do now see an evolution.”
He points to one encouraging recent example. Memento, a tool developed by the Los Alamos National Research Library, pulls together web pages from different archives accessible over the web to show how a particular site has changed over time.
In other promising developments, the British Library has added new functions to make it possible to produce “word clouds” and N-grams (graphs showing how frequently specific words or phrases are used over time) from the data in the UK Web Archive, while the UK Government Web Archive, which archives UK central government websites under Crown copyright, has introduced web continuity software that automatically redirects visitors arriving at old government websites to the relevant page in the archive.
Last year, a European project, the Longitudinal Analytics of Web Archive Data, began to look at how large-scale data analysis could be applied to archives.
Organisations are also trying to work together to cover broader territory, says Martha Anderson, director of program management for the US National Digital Information Infrastructure and Preservation Program run from the Library of Congress.
For Meyer, however, all this is only a start on the job at hand. “Maybe I have unrealistic expectations, but we are behind where I would like us to be,” he says.
But his hope is that in 20 years, decisions made today on how to preserve the content of the World Wide Web will be, in the words of his report, “lauded by the researchers of the future who have come to rely on the information and evidence of human endeavour embodied in the internet”.
Private lives, public benefits
Social media spaces such as Twitter and Facebook are bound to be of interest to researchers of the future, but their content changes by the second.
So how are they being preserved?
In April last year, Twitter donated its archive to the US Library of Congress. Every public tweet made since the inception of the website will be archived digitally, although the rules on how researchers will be able to access the material are still being drawn up.
But the case of Facebook is rather different because much of its content is password protected. However, researchers hope that in the future its archive will also be donated to a library. Privacy concerns could be addressed by means of a proviso that material would be made available only many years hence, suggests Eric Meyer of the Oxford Internet Institute.
Key online archives of web content