Email is NOT for storage
Email is not a storage mechanism.
I’ll say that again… Email is not a storage mechanism.
By this, I mean that the purpose of electronic mail is not to store important files, information, or future reference material. It was never intended for that purpose, and even by today’s standards it still falls short of that use. Of course, there is GMail today. Of course, there are GMail extensions (like gDisk and GMail Drive Shell Extension) that allow you to store your MP3 collection, photos, etc. That is a good example of what I’m referring to. I’ll explain…
GMail, as most of us are aware, is not a typical electronic mail system. It does not operate within the paradigm of traditional email systems. Google Mail’s primary interface is via the web page in which email messages are only sorted by “threads” (“conversations” in GMail-speak). But, more on GMail later. Back to the point…
Email began as a way for users of a time-share system to communicate with one another, coordinating within the same closed system. Soon thereafter, it became a method of communicating with users of other time-share systems, yet with a serious limitation: the sender of a message was required to know the path the message would take to get to the intended recipient. Instead of having addresses such as
poe@deadpoets.org
there were “addresses” such as
localhost!nextdoor!nextnextdoor!poe@deadpoets.org
which meant that the message had to travel from localhost to nextdoor, then to nextnextdoor, and finally to deadpoets.org in order to reach user ‘poe’. The machines did this in an automated way, as long as the route specified was correct. If one of the machines along the message route was offline, or otherwise not accepting incoming mail, the sending machine held the message for a certain period of time until either the message was accepted by the receiving host or the sending machine effectively gave up, at which point the message was lost forever.
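To make the routing idea concrete, here is a minimal sketch in Python that splits the example address above into its hops. The function names (`parse_bang_path`, `route`, `first_unreachable`) are made up for illustration, and the parsing only covers the mixed "hops, then user@host" form used in this post, not every historical address format.

```python
def parse_bang_path(address):
    """Split a bang-path address into relay hops, user, and destination host."""
    # Everything before the last "!" is a relay; the last piece is user@host.
    *hops, mailbox = address.split("!")
    user, _, host = mailbox.partition("@")
    return hops, user, host


def route(address):
    """List every machine the message passes through, in order."""
    hops, _user, host = parse_bang_path(address)
    return hops + [host]


def first_unreachable(address, online):
    """Return the first machine on the route that is not online, or None."""
    for machine in route(address):
        if machine not in online:
            return machine
    return None


address = "localhost!nextdoor!nextnextdoor!poe@deadpoets.org"
print(route(address))
# Pretend only the first two machines are up: the message stalls at the third.
print(first_unreachable(address, online={"localhost", "nextdoor"}))
```

The point of the sketch is that the sender, not the network, carries the whole route. One wrong or offline hop and the message goes nowhere.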
Eventually, the machines connected to the network grew in number and a machine’s knowledge of other networked machines needed to scale as well. Email needed to change with the new networking methodology, which is why we have user@somesystem.com today. The sender of a message needs to know only the address(es) of the recipient(s), the subject of the message, and the message itself.
Notice that all of the above explanation reads “the message”, and not “the file”? There is a reason for that.
Consider this example:
Alice wants to send Bob some files. The total size of the files is 9.5MB. The contents don’t matter for the purpose of this example, so let’s just say the email contains a few large photos and a large document. To send these files in an email to Bob, she first needs to specify Bob’s email address as the intended recipient. Next, she will likely describe the contents in a few words in the Subject: field of the message: “The stuff I wanted to give you”. Then she sets about attaching each file she wants to send to Bob. Each of these files is encoded as a very long string of letters and numbers, completely unreadable by any human, and inserted into the email message “envelope”. This way, each email system that handles the message is aware that it is a message with a Subject: and multiple attached files of differing size and type, so that the files do not get intermingled with each other, nor this message’s contents with any other message being handled.
Next, Alice presses “Send”. It takes a moment for her computer to actually send it, because most email systems aren’t expecting (or designed) to handle messages of that size, but it gets sent. The message is then copied into Alice’s “Sent Mail” mail store (sometimes called “outbox”). Bob does not see this message right away; this is neither file sharing nor Instant Messaging (IM). Alice’s message leaves Alice’s computer and is copied onto Alice’s email server, which then needs to determine which machine handles Bob’s email. Once that is determined, the message is sent again, this time to the machine responsible for the domain listed after the ‘@’ in Bob’s email address.
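The “very long string of letters and numbers” is typically base64, which turns every 3 bytes of a file into 4 text characters, so the message on the wire ends up roughly a third bigger than the files themselves. Here is a rough sketch using Python’s standard `email` library. The addresses, file names, and zero-filled stand-in data are all made up for illustration, and the sizes are scaled down so it runs quickly.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, attachments):
    """Build one email with a short text body and binary attachments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Files attached.")
    for filename, maintype, subtype, data in attachments:
        # Binary data is base64-encoded by default when attached.
        msg.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return msg


# Hypothetical stand-ins for Alice's photo and document (scaled-down sizes).
attachments = [
    ("photo.jpg", "image", "jpeg", bytes(200_000)),
    ("report.doc", "application", "msword", bytes(100_000)),
]
msg = build_message("alice@example.org", "bob@example.org",
                    "The stuff I wanted to give you", attachments)

raw_size = sum(len(data) for _, _, _, data in attachments)
wire_size = len(msg.as_bytes())
ratio = wire_size / raw_size

print([part.get_filename() for part in msg.iter_attachments()])
print(f"files: {raw_size} bytes, message: {wire_size} bytes, ratio: {ratio:.2f}")
```

So Alice’s 9.5MB of files is already noticeably more than 9.5MB by the time it leaves her computer, before a single copy has been made.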
The receiving machine typically attempts to verify that the message comes from an actual person (like Alice, and not a spam robot), that it is destined for a person it handles mail for (like Bob, and not Boob), and that the size of the message is within the system’s constraints for reasonable handling (typically 10MB). After the message is accepted, it is written to Bob’s email server (this is the 3rd copy of the message) for delivery handling. Assuming that Bob has not forwarded his mail elsewhere (which would mean sending and copying the message yet again), the message is then stored in a holding area on the server’s hard drive to await Bob’s email client. Once Bob’s email client connects to the email server, the message is copied yet another (4th) time to Bob’s computer. The message will reside on Bob’s email server, Bob’s computer, Alice’s email server, and Alice’s computer (in her “Sent Mail”, remember?) until Alice and Bob delete their respective copies. For a single message of roughly 10MB, delivery has involved multiple computers copying it, at a total cost of at least 40MB of storage space. This does not take into account the various spam/anti-virus systems, which also typically store each message for a short time. More importantly, it also does not take into account that, had the message been addressed to more than one person (say Bob and Charles), it would be stored 6 times (server and user’s computer, for each user), which would amount to a total of 60MB for the sender and two recipients.
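The arithmetic above is simple enough to write down. This is only a sketch of the counting used in this example (one copy on each person’s computer and one on each person’s server); the function names are made up, and it ignores spam filters, forwarding, and backups, which would only add to the total.

```python
def copies_stored(recipients):
    """Count stored copies: computer + server for the sender and each recipient."""
    sender_copies = 2                 # Alice's "Sent Mail" and Alice's server
    recipient_copies = 2 * recipients  # each recipient's server and computer
    return sender_copies + recipient_copies


def total_storage_mb(message_mb, recipients):
    """Total space consumed across all machines, in MB."""
    return message_mb * copies_stored(recipients)


print(total_storage_mb(10, 1))  # Alice -> Bob: 40
print(total_storage_mb(10, 2))  # Alice -> Bob and Charles: 60
```

Every extra recipient adds two more full copies of the same attachment.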
Email systems treat messages as messages. Sure, each message is a file, but a message to be delivered to user1 cannot/should not/will not be considered the same message as the one to be delivered to user2, even though it has the same file attached and may contain the exact same contents. Electronic mail is designed this way for privacy, not file sharing.
“But, Mr. Linux Ninja Geek… storage is cheap!”
Yes. Storage is cheap. However, transmission is not. It takes a relatively small amount of effort for your computer to generate data, or even, say, copy data from your camera and store it on your computer. It is much more effort to transmit that same data across the Internet to another computer and have it stored there indefinitely. Enter email into that equation and the effort is multiplied by each computer the message travels through to get to the final destination.
“Ok. So, I shouldn’t send file attachments in email at all??”
That is not what we’re talking about. We’re talking about storing email messages indefinitely. Consider that information in a typical message has a given lifetime. Normally, this lifetime is very short, on the order of days or even weeks, possibly even months. After this time passes, is the information in the message of the same importance, or has it become much less important?
To demonstrate this, let us employ an analogy…
In the old days, before email, people wrote correspondence: stone tablets, papyrus, handwritten, typewritten. The message itself was carried, by another human, to its intended recipient, and either read aloud or delivered into their hands. Once the information within the message was received, what happened to the message itself? In the case of stone tablets, it was likely destroyed, or made into some type of monolith, depending on what the actual message was. In the case of papyrus, the message was read aloud, retrieved, and kept for further use; this is why the message was stored on a scroll, because it contained more than a single message for more than a single recipient. In the case of handwritten or typewritten correspondence, either the letter was kept in a folder in a file, or it was discarded sometime after the message was understood.
That last part, concerning handwritten/typewritten letters, is probably the closest analogy to email. After the letter was filed away, what was its disposition? More often than not, the letter sat in the file for a long time, until someone either tossed it out with the trash, or it was framed for historical purposes. Point: a letter was hardly ever kept “in case I need it again”. The physical letter’s disposition was certain, upon the moment of receipt, similarly to stone tablets and papyrus scrolls. Why? Because physical objects need space to be stored indefinitely. The more physical objects that need to be stored, the more space required, of course.
Hard drive space is required to store electronic mail messages as well. In all cases of message storage, the information contained within does not change after delivery.
Enter GMail. GMail’s initial claim to fame was that its storage allowance was enormous compared to other offerings like Hotmail. Leveraging Google’s search abilities, you could supposedly find any email you ever received in the GMail system instantaneously. This goal is in line with having conversational correspondence with other people connected to the Internet, only in a different way. GMail does not sort messages by date, subject, or even sender like typical email client software. The only sorting mechanism available is by “thread”, which makes GMail seem more like Usenet or an online discussion forum. This design does not seem to lend itself to file storage at all, much less attachment storage. Sure, you can save a message (or entire conversations) indefinitely for later review. How easy and practical is it to do that? How important is that email from years ago? More importantly, how many other email systems are similar to GMail? GMail does not seem to be a good gauge of what an email system can or cannot do, since the consensus seems to be that GMail is different from the rest, and since GMail, among its other technical limitations, can fail without warning.
Given that a message’s information/content/meaning does not change after it is delivered, why is email kept for so long?
Not just that, but if a message is noticed to be lost (presumably a while after it was actually lost), why is it so important to have the message restored? What could possibly be contained in a message, that wasn’t noticed to be missing, that has become critical this very moment? Could the information not be resent from the sender?
More to the point: Why are people storing information in email?
You’re currently reading “Email is NOT for storage”, an entry on Paranoid Linux Ninja Geek
- Published:
- 03.01.11 / 6pm
- Category:
- life, philosophy, rant, tech