There are myths that we tell each other about the internet. These internet myths aren’t helpful. They can adversely affect our planning and our online experiences.
So onward to the internet myths:
What’s online, stays online forever…. – Tell that to archiveteam.org, whose efforts to save Yahoo! Groups for posterity were frustrated by Verizon. This obstructive, unhelpful behaviour is entirely in keeping with Verizon’s core values.
Or the 700,000 Tumblr blogs, containing over 800 terabytes of content, that were deleted from the web. Machine learning software designated any image that contained round shapes or beige for deletion. This was part of a Verizon effort to purge adult content from Tumblr.
Finally, keen students of digital media services like Amazon Prime, Apple iTunes, Apple Music, Spotify etc. will know how content appears and disappears in their digital libraries. A case in point would be my collection of James A. Michener books, which I bought on Apple Books and which disappeared less than a fortnight later. No refund, and no mindless reading material to keep me occupied on a long-haul flight.
Everything is online…. – no it isn’t. There are large rafts of content that aren’t digital, let alone online. Google and IBM have worked on large-scale digitisation projects. But there is content that never made the jump from analogue to digital: master tapes decaying in record company vaults, for example. If you go through Discogs on a regular basis like I do, you can see a huge body of recordings that have never made it online via legal or illegal means.
You can find anything you want online…. – I remember the first time I found a web ring. These were connected ‘walkthroughs’ of pages by different authors, all linked together in a giant ring. They catered to people who liked different subject areas. Education had double the number of subject-area web rings that sports had. Cats hadn’t conquered the web yet: there were just 17 rings for animals and pets.
And you got an arcane level of detail in discussions about the subject area. Over time, the web became too vast for ‘surfers’ and search engines came to pass. Prior to the rise of the modern social web, I saw estimates that Google indexed only 15% of the available web. So 85% of the content that hasn’t disappeared isn’t searchable.
Secondly, a lot of content is being created on platforms like Instagram, where search is essentially broken. There is a similar curation of search in TikTok, where the focus is on the ‘now’.
Then there is the concept of link rot, where deleted content, or links broken by SEO (search engine optimisation) or technology platform transitions, mean that content disappears. The phenomenon has been studied since at least the mid-1990s, and library studies academics have put serious effort into documenting it. They set out to measure how ‘unstable’ the worldwide web is as a research resource. The quote below by Sarah Rhodes of the Georgetown University Law Center sums up the problem quite elegantly:
In the context of web archiving and digital preservation, one often hears that the average life span of a web page is forty-four days. This statistic has been repeated among those in the digital preservation community for years, but it never seems to be accompanied by a citation. In a 2002 article by Peter Lyman, a footnote briefly explains why the source of this figure is so elusive: “These data sources were originally published on the Web, but are no longer available, illustrating the problem of Web archiving.” Ironically, the very source of a statistic often used to support the cause of web preservation has itself become a victim of “link rot.”
Breaking Down Link Rot: The Chesapeake Project Legal Information Archive’s Examination of URL Stability by Sarah Rhodes, Georgetown University Law Center (2010)
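For anyone curious how link rot might be measured in practice, here is a minimal Python sketch. It is my own illustration rather than the method used in the Chesapeake Project study, and the function names (classify_status, fetch_status, rot_rate) and the status groupings are assumptions made for the example. The idea is simply to request each saved URL, note the HTTP status that comes back, and work out what share of links no longer resolve.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a coarse label.

    The groupings are an assumption for this sketch, not a standard.
    """
    if status is None:
        return "unreachable"  # DNS failure, timeout, refused connection
    if 200 <= status < 400:
        return "alive"
    if status in (404, 410):
        return "gone"  # the page has been removed
    return "error"  # other client or server errors


def fetch_status(url, timeout=10.0):
    """Return the HTTP status for url, or None if no response arrived."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-rot-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError):
        # Network failure, timeout, or a malformed URL.
        return None


def rot_rate(statuses):
    """Share of links whose label is not 'alive' (0.0 for an empty list)."""
    labels = [classify_status(status) for status in statuses]
    if not labels:
        return 0.0
    return sum(label != "alive" for label in labels) / len(labels)
```

To try it on a list of bookmarks, something like rot_rate([fetch_status(u) for u in urls]) gives the fraction that appear dead. Treat the result as a rough lower bound: some servers reject HEAD requests, and a page that still returns a 200 status may have had its original content replaced, which a status check alone cannot detect.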