
How much physical space would an exabyte of data require?


Recommended Posts

I've recently been thinking about how humans would survive on long-term space voyages. While we could certainly survive even without books, which I consider the bare minimum for a human being not to be completely bored, it would still be best for travelers to have as much content at their disposal as possible. The speed of light makes easy access to the existing internet impractical once you're more than half a light-minute or so from Earth, so it would be necessary to bring along a diverse and near-inexhaustible store of information for entertainment and education.

So what if a spaceship were to have, stored in its data archives, every movie ever made and every episode of every series ever made (excluding ones that are lost or illegal, of course) in the highest available resolutions? Add to that every work of written literature and the sum total of human scientific knowledge, plus, thrown in for good measure, a few photorealistic virtual-reality games and environments for the crew to enjoy through either a conventional VR headset or some form of real-life holodeck.

I'd imagine such a massive archive would require hundreds of petabytes at least, so I'm thinking an exabyte of capacity would be enough for it all, or at least close to it.

 

So my question is: how much physical space would such an archive require? How compact could it be made with present-day tech, or with future tech based on a reasonable extrapolation? I'd hope that, with effort, it could be made small enough to fit aboard a craft like SpaceX's Starship, or at least a larger successor.
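For a rough sense of scale, here is a back-of-the-envelope sketch in Python. Every input in it (film count, episode count, book count, file sizes, the science and VR placeholders) is an illustrative assumption, not a figure from this thread:

# Very rough, assumption-laden sizing of the proposed archive (all inputs are guesses).
film_count, film_size_tb  = 600_000, 0.05        # ~600k feature films at ~50 GB each (4K)
episode_count, ep_size_tb = 10_000_000, 0.01     # ~10M TV episodes at ~10 GB each
book_count, book_size_tb  = 100_000_000, 5e-6    # ~100M books at ~5 MB each
science_archive_tb        = 100_000              # placeholder: ~100 PB of papers and datasets
vr_assets_tb              = 50_000               # placeholder: ~50 PB of VR environments

total_tb = (film_count * film_size_tb + episode_count * ep_size_tb
            + book_count * book_size_tb + science_archive_tb + vr_assets_tb)
print(f"total ≈ {total_tb/1e3:.0f} PB ≈ {total_tb/1e6:.2f} EB")

With these guesses the total lands in the few-hundred-petabyte range, so an exabyte of capacity would leave comfortable headroom.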

Link to comment
Share on other sites

Even less if you use microSD cards. Currently the biggest ones hold 512 GB while measuring only 11 × 15 × 0.7 mm (= 115.5 mm³). You'd need about 2 million of them, which works out to roughly 0.23 m³.
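A quick check of those numbers (card capacity and dimensions as above; decimal units, 1 EB = 10^18 bytes):

card_bytes = 512e9                    # 512 GB per microSD card
card_mm3   = 11 * 15 * 0.7            # 115.5 mm^3 per card
exabyte    = 1e18
cards  = exabyte / card_bytes         # ≈ 1.95 million cards
volume = cards * card_mm3 / 1e9       # mm^3 -> m^3
print(f"{cards/1e6:.2f} million cards, {volume:.3f} m^3")   # ≈ 1.95 million cards, ≈ 0.23 m^3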

Link to comment
Share on other sites

I wouldn't want to build an exabyte storage cluster on microSD, for obvious reasons.

M.2 sticks go up to about 2 TB, and I've seen quad-M.2-to-PCIe x16 adapter cards, so that's 8 TB per board. I found server boards with seven x16 slots, which gives you 56 TB per server. Of course the server eats all the space you save with the M.2 drives, and on top of that, for large NAS clusters you'd probably want mechanical drives anyway; SSDs just aren't cut out for sustained server loads. 14 TB drives are available, and you can get a 48-bay NAS server for about $20k, which gets you 672 TB. Roughly 1,500 of those and there's your exabyte. Congratulations, you have invented the datacenter. You might be able to cram three of those per rack, so you'd need about 500 racks, and that's not counting routers, switches, and everything else the setup would need. So it would be about the size of a fairly large room if you want practical storage. If speed is not an issue you could go with a tape library with robotic tape retrieval; I think 10 TB tapes are the norm these days, and you'd only need 100,000 of them.
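The same numbers, sketched out (server capacity, price, rack packing, and tape size are taken from the post above; the rest is arithmetic):

exabyte_tb = 1_000_000                  # 1 EB in TB (decimal)
server_tb  = 48 * 14                    # 48-bay NAS with 14 TB drives = 672 TB
servers    = exabyte_tb / server_tb     # ≈ 1,488 servers
racks      = servers / 3                # ≈ 496 racks at 3 servers per rack
cost_musd  = servers * 20_000 / 1e6     # ≈ $30M at ~$20k per server
tapes      = exabyte_tb / 10            # ≈ 100,000 tapes at 10 TB each
print(f"{servers:.0f} servers, {racks:.0f} racks, ~${cost_musd:.0f}M, or {tapes:.0f} tapes")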

Edited by Nuke
Link to comment
Share on other sites

By the time we need that much storage for such a voyage, building it will have become "trivial". The real issue will be digitizing everything and getting it there....

Link to comment
Share on other sites

Samsung's modern flash memory (V-NAND) is somewhere in the 2-4 terabits per square inch ballpark (oh no, imperial units, run away!), and will probably be made on thinned wafers stacked to 200-300 µm thick, so if we could do the packaging to stack them at the die level we'd be looking at about 300 mm³/TB (byte). For a real device we could probably happily triple that to account for cooling, data lines and such. Let's call it 1 cm³/TB. So if sufficiently motivated, I'm getting about the same values as everyone else: we should be able to cram the whole exabyte into about a 1 m cube. It would cost a bloody fortune both for development and manufacturing, and the data rates out would not be particularly impressive, but it could be done!
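The same estimate, step by step (areal density, stack thickness, and the 3x packaging penalty are the assumptions from the paragraph above):

IN2_TO_MM2 = 25.4**2                                   # 645.16 mm^2 per square inch
areal_tbit = 4                                         # assume 4 Tbit per square inch
stack_mm   = 0.25                                      # ~250 µm thinned die stack
tb_per_in2 = areal_tbit / 8                            # 0.5 TB per square inch
mm3_per_tb = (IN2_TO_MM2 / tb_per_in2) * stack_mm      # ≈ 320 mm^3 per TB, bare silicon
mm3_per_tb_packaged = 3 * mm3_per_tb                   # ~1000 mm^3 ≈ 1 cm^3 per TB packaged
exabyte_m3 = 1e6 * mm3_per_tb_packaged / 1e9           # 1 EB = 1e6 TB, mm^3 -> m^3
print(f"{mm3_per_tb:.0f} mm^3/TB bare, {exabyte_m3:.2f} m^3 per exabyte")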

 

If we want to go near future though, single-atom storage is starting to peek over the horizon! Ars Technica article

Edit: Got a better source for the stack thickness: here.

Edited by Cunjo Carl
Link to comment
Share on other sites

This might be a good application for holographic storage, assuming it could be developed beyond proof of concept. For this application we don't need read-write capability, so a dense write-once-read-many approach would work. Theoretically you can get 1 bit per cube of the laser's wavelength, which means ~30 TB/cm³ with a fluorine excimer laser (157 nm). With a blu-ray laser (405 nm) the theoretical maximum would be ~2 TB/cm³. So you're looking at somewhere between 0.03 m³ and 0.5 m³ for an exabyte, assuming we can get close to the theoretical limit (I'd assume something closer to 0.1 m³ once you account for error correction, redundancy, etc.). Obviously this wouldn't beat single-atom storage for density, but it could probably handle the occasional cosmic ray.
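Working the one-bit-per-wavelength-cube limit out explicitly (157 nm excimer and 405 nm blu-ray wavelengths; this is the theoretical ceiling, not a practical figure):

def tb_per_cm3(wavelength_nm):
    voxel_m3 = (wavelength_nm * 1e-9) ** 3      # one bit per wavelength-sized cube
    bits_per_cm3 = 1e-6 / voxel_m3              # 1 cm^3 = 1e-6 m^3
    return bits_per_cm3 / 8 / 1e12              # bits -> terabytes

for nm in (157, 405):
    density = tb_per_cm3(nm)
    m3_per_eb = 1e6 / density / 1e6             # 1 EB = 1e6 TB, cm^3 -> m^3
    print(f"{nm} nm: {density:.1f} TB/cm^3, {m3_per_eb:.2f} m^3 per exabyte")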

You probably would also want an archive of knowledge and teaching materials, particularly for skills that would be hard to maintain on a generation ship (assuming that is what we're talking about). For example geology would be a little hard to maintain on a ship.

Link to comment
Share on other sites

7 hours ago, Mad Rocket Scientist said:

Magnetic tape is probably a better solution in terms of cost. The latest advances are around 25 GB/in^2. https://newatlas.com/sony-ibm-magnetic-tape-density-record/50743/ Really not a ton more space, maybe 20m^3, but also cheaper and more resistant to radiation.

Judging by what the uncool kids over at https://www.reddit.com/r/DataHoarder/ have to say, retail hard drives (on a good sale*) make it hard to justify tape.  Tape might still be surviving in "enterprise grade disk" vs. "enterprise grade tape" terms (*all* tape these days is for servers; hard drives appear to be heading either for servers or for the cheapest computers, and I expect the cheapest computers will simply switch to even smaller amounts of flash), but it is getting close.  Flash vs. hard drive seems to have about a factor-of-5 difference in cost, but that is coming down quickly.  I wouldn't be surprised if the first manned flight to Mars doesn't face this question at all: flash will have won outright (or been replaced by something more high-tech).

Apparently Google is already storing exabytes of data (I expect their answer would be "x% of a datacenter" if they were talking.  They aren't.)

* This involves waiting for a holiday sale at Best Buy, buying the 10 TB WD external drives, and removing ("shucking") the hard drive (USA only). That gets you to around $16/TB. LTO-6 tape is on Amazon for less than $10/TB, but good luck finding a cheap price on an LTO-6 drive.
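For what it's worth, those per-terabyte prices scale to an exabyte like this (media cost only; drives, servers, power, and replacements excluded; the flash figure just applies the factor-of-5 premium mentioned above):

exabyte_tb = 1_000_000
for name, usd_per_tb in [("shucked retail HDD", 16), ("LTO-6 tape media", 10), ("flash (~5x HDD)", 80)]:
    print(f"{name:>20}: ${exabyte_tb * usd_per_tb / 1e6:.0f}M per exabyte")
# ≈ $16M for hard drives, $10M for tape media, ~$80M for flash at a 5x premium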

Link to comment
Share on other sites

On 5/4/2019 at 11:16 AM, Elthy said:

Even less if you use microSD cards. Currently the biggest ones hold 512 GB while measuring only 11 × 15 × 0.7 mm (= 115.5 mm³). You'd need about 2 million of them, which works out to roughly 0.23 m³.

I can imagine it now, somewhere in the spacecraft there's a bathtub-sized chest filled to the brim with micro SD cards, each carefully labelled with a small letter code, maybe colour coded too. Strapped to its side, a phone book-sized index detailing what is found on each card. Finding the right card for what you want to watch would be like finding a particular piece of LEGO in a chock-full ball pit. 

Link to comment
Share on other sites

23 minutes ago, Codraroll said:

I can imagine it now, somewhere in the spacecraft there's a bathtub-sized chest filled to the brim with micro SD cards, each carefully labelled with a small letter code, maybe colour coded too. Strapped to its side, a phone book-sized index detailing what is found on each card. Finding the right card for what you want to watch would be like finding a particular piece of LEGO in a chock-full ball pit. 

Well, obviously you need a large robotic arm mechanism to automatically select the correct microSD card for you. ;)

Link to comment
Share on other sites

SD cards come with all the wear-leveling machinery taken out; they're essentially dumb flash that fails when it fails, completely unmanaged unlike, say, an SSD. I'm not too thrilled about flash's data retention rates either, or rather the lack of hard data about those retention rates. I'm not sure we've been using SSDs long enough to characterize their failure rates accurately when used for archival storage.

Edited by Nuke
Link to comment
Share on other sites

This is a question that has a lot of relevance right now - for example when the Square Kilometer Array telescope in Australia & South Africa starts operating it will generate & store about an exabyte of processed data per day. There are more than 250,000 radio telescopes in the array & each one will generate 160 Gbits of raw, unprocessed data per second - which I think works out to be about 400 exabytes of raw data per day in total before processing.

For data storage density nature currently has us well beaten - 1 gram of perfectly encoded DNA could theoretically store 455 exabytes of data, if you could keep it in a state that was both stable & somehow readable.
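The arithmetic behind those figures, as a quick sketch (receiver count and per-receiver data rate from the post above; decimal exabytes):

receivers     = 250_000
gbit_per_sec  = 160
raw_bytes_day = receivers * gbit_per_sec * 1e9 / 8 * 86_400
print(f"raw: {raw_bytes_day/1e18:.0f} EB/day")   # ≈ 432 EB/day, i.e. "about 400"
# Only a small fraction of that survives into the processed archive, which is
# consistent with the "about an exabyte of processed data per day" figure.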

Link to comment
Share on other sites

7 hours ago, Lukaszenko said:

"DNA Fountain" method can store 215 petabytes of data in a single gram of DNA. So, looking at the density of DNA, you could easily fit an exabyte into < 3 cm^3. 

DNA is not optimal for space applications, though.

Link to comment
Share on other sites

7 hours ago, Listy said:

This is a question that has a lot of relevance right now - for example when the Square Kilometer Array telescope in Australia & South Africa starts operating it will generate & store about an exabyte of processed data per day. There are more than 250,000 radio telescopes in the array & each one will generate 160 Gbits of raw, unprocessed data per second - which I think works out to be about 400 exabytes of raw data per day in total before processing.

For data storage density nature currently has us well beaten - 1 gram of perfectly encoded DNA could theoretically store 455 exabytes of data, if you could keep it in a state that was both stable & somehow readable.

I really doubt that anyone will be interested in the raw data (except those interested in developing such systems, but I suspect they will have to go to the site for such data), the point is to process the array to mimic a single radio telescope.  This cuts down the data needed by a factor of a quarter of a million, and the "denoising" effects by merely averaging those individual telescopes would be significant (and I expect significantly more sophisticated telescopic tricks plus more well known DSP filtering as well).

But that still means that you are getting upwards of 160 Gb/s of highly useful data.  Better start hoarding LTO tape.

Modern storage devices use a great deal of their bits for error correction (and detection).  CDs and DVDs have 1 bit of ECC for each data bit.  I've heard that hard drives wouldn't work at all without nearly all their error correction bits (but don't expect to find out how many are needed/on the drive).  So I'd expect that with enough ECC (and hints of where the data goes; think of the ancient trick of numbering your Hollerith deck) you could get DNA storage to work. I just doubt that anyone expects a CRISPR unit to be cost-effective for storing data.
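As a sketch of how ECC overhead eats into raw density: the 50% figure below comes from the CD/DVD claim above, the 25% figure is purely illustrative, and the 455 EB/g starting point is the theoretical DNA density cited earlier in the thread.

def usable_density(raw_density, ecc_fraction):
    """Usable capacity after reserving a fraction of all stored bits for ECC/addressing."""
    return raw_density * (1 - ecc_fraction)

raw_dna_pb_per_gram = 455_000                        # theoretical 455 EB/g, in PB/g
for overhead in (0.50, 0.25):
    print(f"{overhead:.0%} overhead -> {usable_density(raw_dna_pb_per_gram, overhead):,.0f} PB/g usable")
# Even with half the nucleotides spent on error correction, the usable density
# stays orders of magnitude above the demonstrated 215 PB/g.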

Link to comment
Share on other sites

What if we take optical media up the spectrum, using high-frequency UV and X-rays? Could we get more information density than what's currently available with hard drives?

Edited by Nuke
Link to comment
Share on other sites

22 hours ago, DAL59 said:

DNA is not optimal for space applications, though.

Why not?

Quote

For data storage density nature currently has us well beaten - 1 gram of perfectly encoded DNA could theoretically store 455 exabytes of data, if you could keep it in a state that was both stable & somehow readable.

All sources I checked say that "215 petabytes of data in a single gram of DNA" is 85% of the theoretical limit. Where did you find 455 exabytes :o?

Edited by Lukaszenko
Link to comment
Share on other sites

19 hours ago, wumpus said:

I really doubt that anyone will be interested in the raw data (except those interested in developing such systems, but I suspect they will have to go to the site for such data), the point is to process the array to mimic a single radio telescope.  This cuts down the data needed by a factor of a quarter of a million, and the "denoising" effects by merely averaging those individual telescopes would be significant (and I expect significantly more sophisticated telescopic tricks plus more well known DSP filtering as well).

But that still means that you are getting upwards of 160 Gb/s of highly useful data.  Better start hoarding LTO tape.

Modern storage devices use a great deal of their bits for error correction (and detection).  CDs and DVDs have 1 bit of ECC for each data bit.  I've heard that hard drives wouldn't work at all without nearly all their error correction bits (but don't expect to find out how many are needed/on the drive).  So I'd expect that with enough ECC (and hints of where the data goes; think of the ancient trick of numbering your Hollerith deck) you could get DNA storage to work. I just doubt that anyone expects a CRISPR unit to be cost-effective for storing data.

Very often old data is very valuable, and in this setting cost matters far more than physical space. For that you would use tape.

In a space probe, on the other hand, you go or die, like the Israeli lander. Manual, T-30 s, Jeb: "Major computer failure, going in manually using only MechJeb while assuming it to be bugged. Val, you are abort commander; on my call you are in command until we're back in orbit."

Link to comment
Share on other sites

On 5/4/2019 at 11:26 AM, daniel l. said:

I've recently been thinking about how humans would survive on long-term space voyages. While we could certainly survive even without books, which I consider the bare minimum for a human being not to be completely bored, it would still be best for travelers to have as much content at their disposal as possible. The speed of light makes easy access to the existing internet impractical once you're more than half a light-minute or so from Earth, so it would be necessary to bring along a diverse and near-inexhaustible store of information for entertainment and education.

So what if a spaceship were to have, stored in its data archives, every movie ever made and every episode of every series ever made (excluding ones that are lost or illegal, of course) in the highest available resolutions? Add to that every work of written literature and the sum total of human scientific knowledge, plus, thrown in for good measure, a few photorealistic virtual-reality games and environments for the crew to enjoy through either a conventional VR headset or some form of real-life holodeck.

I'd imagine such a massive archive would require hundreds of petabytes at least, so I'm thinking an exabyte of capacity would be enough for it all, or at least close to it.

So my question is: how much physical space would such an archive require? How compact could it be made with present-day tech, or with future tech based on a reasonable extrapolation? I'd hope that, with effort, it could be made small enough to fit aboard a craft like SpaceX's Starship, or at least a larger successor.

Well, Wikipedia says that according to a 2007 study there were 281 exabytes of digital information on Earth, so I think even if 90% of it is just copies you'd still need way more than 1 exabyte to store it all. However, Google processes ~20-30 petabytes per day, AT&T handles a data stream of about 20 petabytes per week, and archive.org holds about 50 petabytes, so I think 1-2 exabytes would be enough to entertain a crew during a year of flight or so. But if you want something like a copy of the whole internet for, let's say, a decades-long interstellar trip, or a portable source of scientific knowledge, 50-100 exabytes may be required (although the Google cache for a 9-month Mars trip may still fit on Starship ;) ). Also, the amount of digital data on Earth is growing exponentially, so you'd better hurry: in 5 years you'll need something like a couple of zettabytes, especially for "scientific data"™ and VR games with 16k textures, lol.

Link to comment
Share on other sites

3 minutes ago, NiL said:

Well, Wikipedia says that according to a 2007 study there were 281 exabytes of digital information on Earth, so I think even if 90% of it is just copies you'd still need way more than 1 exabyte to store it all. However, Google processes ~20-30 petabytes per day, AT&T handles a data stream of about 20 petabytes per week, and archive.org holds about 50 petabytes, so I think 1-2 exabytes would be enough to entertain a crew during a year of flight or so. But if you want something like a copy of the whole internet for, let's say, a decades-long interstellar trip, or a portable source of scientific knowledge, 50-100 exabytes may be required (although the Google cache for a 9-month Mars trip may still fit on Starship ;) ). Also, the amount of digital data on Earth is growing exponentially, so you'd better hurry: in 5 years you'll need something like a couple of zettabytes, especially for "scientific data"™ and VR games with 16k textures, lol.

Wait, here [ https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf ] they say that the total amount of digital data on Earth was already 33 zettabytes in 2018 (!!!) and is predicted to reach 175 ZB in 2025. Oh shi... I think an extra SLS Block II launch with 200 t of DNA drives is needed.
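For what it's worth, the two IDC numbers cited above imply roughly this growth rate (a quick calculation, nothing more):

import math
zb_2018, zb_2025 = 33, 175
cagr = (zb_2025 / zb_2018) ** (1 / (2025 - 2018)) - 1
doubling = math.log(2) / math.log(1 + cagr)
print(f"implied growth ≈ {cagr:.0%} per year, doubling roughly every {doubling:.1f} years")
# ≈ 27% per year, doubling roughly every 2.9 years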

Link to comment
Share on other sites

On 5/10/2019 at 4:47 PM, DAL59 said:

radiation

It did cross my mind, but it can't be that hard to shield 3 cm³. Besides, since the first sentence of this discussion is about "how humans would survive on long-term space voyages", we'll be using DNA storage in one way or another whether we like it or not.

Link to comment
Share on other sites

On 5/10/2019 at 11:29 PM, Lukaszenko said:

Why not?

All sources I checked say that "215 petabytes of data in a single gram of DNA" is 85% of the theoretical limit. Where did you find 455 exabytes :o?

The claim seems to date to a paper published in Science in 2012: Next-Generation Digital Information Storage in DNA 

From the methods: "Theoretical DNA density was calculated by using 2 bits per nucleotide of single stranded DNA. The molecular weight of DNA we used was based on an average of 330.95 g/mol/nucleotide of anhydrous weight for the sodium salt of an ATGC balanced library. This results in a weight density of 1 bit per 2.75 × 10^-22 g, and thus 4.5 × 10^20 bytes per gram. Of course, practical maximums would be several orders of magnitude less dense depending on the types of redundancy, barcoding, and encoding schemes desired."

The 215 petabytes / gram figure is certainly much closer to the realistic 'practical maximum' the authors mention & it's still a huge number!
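The paper's arithmetic checks out; here it is spelled out (Avogadro's number plus the 330.95 g/mol/nucleotide and 2 bits/nucleotide figures quoted above):

AVOGADRO = 6.022e23
g_per_nt  = 330.95 / AVOGADRO          # ≈ 5.5e-22 g per nucleotide
g_per_bit = g_per_nt / 2               # 2 bits per nucleotide -> ≈ 2.75e-22 g per bit
bytes_per_gram = 1 / (g_per_bit * 8)   # ≈ 4.5e20 bytes per gram
print(f"{bytes_per_gram:.2e} bytes/g ≈ {bytes_per_gram/1e18:.0f} EB per gram")   # ≈ 455 EB/g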

DNA can be encapsulated in silica for long-term error-free storage (demonstrated error-free for 1 week at +70 °C, which extrapolates to ~2,000 years at +10 °C, or up to 1 million years at -20 °C). 1 million years is about the age of the oldest fossil remains from which DNA fragments have been successfully extracted & sequenced, so it would seem that, on Earth at least, for very long term storage, controlling temperature & the chemistry of the surrounding environment is more important than worrying about radiation damage. In deep space, maintaining your DNA data bank at an ultra-cold temperature might be fairly easy & radiation might become a much bigger concern.

Link to comment
Share on other sites

On 5/9/2019 at 8:32 PM, Nuke said:

What if we take optical media up the spectrum, using high-frequency UV and X-rays? Could we get more information density than what's currently available with hard drives?

Magnetic media has resolved tighter than optical (blue lasers at least) can resolve.  I doubt you will be focusing UV very well (the semiconductor industry has been dragging its feet going from normal optical to UV for years; expect any company that can't make the transition in the next year or so to drop out of the race for producing "bleeding edge" chips) and I've never heard of an X-ray lens big enough to be functional (perhaps mirrors would work).  In any event, modern NAND flash approaches optical resolution limits (for cost reasons; modern CPUs push into "really weird optical tricks") and then stacks the transistors 48-96 layers deep.  You aren't getting a tape to do that unless you go 3D (holographic tape might work, but still won't touch the density; the holograms would effectively act as an ECC/frequency-expanding trick).

On 5/10/2019 at 10:47 AM, DAL59 said:

radiation

The entire problem assumed humans would be on board.  And remember when I mentioned that optical media uses half its bits to correct the other half?  You would have to do that with DNA as well, though presumably not as heavily.  In other news, I'd like my own DNA re-encoded with these tricks so it can "scrub" out any transcription errors and mutations (but not with today's technology; that will have to wait).

Link to comment
Share on other sites
