
Software engineers and the rest of the world.


PB666


What was the point of all that? If you strip away all the unnecessary jargon, you were just admitting that the actual datasets aren't too large or too complex by today's standards.

It is rather innocent to believe that all data sets are published. The published data sets are only the starting point, the reference set; each genome referenced against it adds SNPs. In general I will not make those data sets public until after I publish, or, more likely in the US, HIPAA may prevent authors from publishing them at all, so they will remain private to the study. After the results are digested, they might publish the digest in the supplementary literature.

Rest of the anecdotes snipped as impertinent.

- - - Updated - - -

The amount of data that Google crunches would surely boggle the mind... provided anyone actually knew how much that really is. This XKCD What-If has an interesting estimate of the capacity of Google and puts it at around 15 exabytes. Which is a lot. That's about the same order of magnitude as the total information content of the human genome... of every living human combined. I'm not a biologist, so the extent of my knowledge of encoding the human genome comes down to "2 bits per base pair", and I'm not sure whether to count both copies of each chromosome. But if you do, I think it works out to about 1.5 GB of information per human, so counting all ~7.3 billion of us, it hits around 11 exabytes.
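For the curious, the arithmetic is only a few lines of Python (the 3.1 billion base pairs per haploid genome is the usual round figure; everything else follows the assumptions above):

```python
# Back-of-the-envelope: storing every living human's genome at
# 2 bits per base pair, counting both copies of each chromosome.
BASE_PAIRS_HAPLOID = 3.1e9  # approximate haploid human genome size
BITS_PER_BASE = 2           # four bases, so 2 bits each
POPULATION = 7.3e9          # rough world population

bytes_per_person = BASE_PAIRS_HAPLOID * 2 * BITS_PER_BASE / 8
total_exabytes = bytes_per_person * POPULATION / 1e18

print(f"{bytes_per_person / 1e9:.2f} GB per person")  # ~1.55 GB
print(f"{total_exabytes:.1f} EB for everyone")        # ~11.3 EB
```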

Of course this is just a comparison against the static storage capacity of Google, which doesn't really tell us the amount of transient data they process. I don't really feel like spending the time trying to research a good estimate for that, so I'll do the lazy thing and just point out that global internet traffic is estimated to be on the order of 70 exabytes per month. The fraction of that which passes through Google's servers is anyone's guess, but it should at least give a bit of a reference for just how much data our civilization routinely tosses around.

That's the starting point; then you might want to look for epistasis, and expand to the environmental variables and look for interactions there as well.

https://en.m.wikipedia.org/wiki/Epistasis
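To give a flavor of what that search involves, here is a toy sketch in Python; the data is random and the scoring is deliberately crude (a real study would fit a proper model and correct for multiple testing), but it shows where the pairwise blow-up comes from:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_snps = 200, 50  # illustrative sizes only
genotypes = rng.integers(0, 3, size=(n_samples, n_snps))  # 0/1/2 allele counts
phenotype = rng.integers(0, 2, size=n_samples)            # binary trait

# Brute-force pairwise epistasis scan: score each SNP pair by how
# strongly the product (interaction) term tracks the phenotype.
scores = {}
for i, j in itertools.combinations(range(n_snps), 2):
    interaction = genotypes[:, i] * genotypes[:, j]
    if interaction.std() > 0:
        scores[(i, j)] = abs(np.corrcoef(interaction, phenotype)[0, 1])

top_pairs = sorted(scores, key=scores.get, reverse=True)[:5]
print("top candidate pairs:", top_pairs)
```

With 50 SNPs that is 1,225 pairs; with a million SNPs it is about half a trillion, which is why nobody scans pairs naively at scale.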


It is rather innocent to believe that all data sets are published. The published data sets are only the starting point, the reference set; each genome referenced against it adds SNPs. In general I will not make those data sets public until after I publish, or, more likely in the US, HIPAA may prevent authors from publishing them at all, so they will remain private to the study. After the results are digested, they might publish the digest in the supplementary literature.

Of course your datasets can be public or private, but it's irrelevant to this discussion. The amount of data is rather small. You're working with gigabytes or terabytes, not with petabytes or exabytes. Analyzing such amounts of data isn't too challenging computationally. There may be more combinations of variables than particles in the visible universe, but it doesn't matter, because data miners learned to deal with such complexity a long time ago.
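For instance, here is a sketch of one standard trick, L1-regularized (lasso) regression with scikit-learn: it pulls the handful of informative variables out of ten thousand candidates without ever enumerating variable combinations (the sizes and coefficients are made up for the demo):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10_000))  # far more variables than samples
true_coef = np.zeros(10_000)
true_coef[:5] = [2.0, -1.5, 1.0, -0.5, 0.75]  # only 5 variables matter
y = X @ true_coef + 0.1 * rng.standard_normal(500)

# The L1 penalty drives all but the informative coefficients to zero.
model = Lasso(alpha=0.1).fit(X, y)
print("variables kept:", np.flatnonzero(model.coef_))  # ~ [0 1 2 3 4]
```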


Of course your datasets can be public or private, but it's irrelevant to this discussion. The amount of data is rather small. You're working with gigabytes or terabytes, not with petabytes or exabytes. Analyzing such amounts of data isn't too challenging computationally. There may be more combinations of variables than particles in the visible universe, but it doesn't matter, because data miners learned to deal with such complexity a long time ago.

The last time I checked, Google was not sending their supercomputers to small laboratories so that they could have mainframe processing. BTW, if you check your Google results, how accurate are they?


The last time I checked, Google was not sending their supercomputers to small laboratories so that they could have mainframe processing. BTW, if you check your Google results, how accurate are they?

Almost nobody uses mainframes anymore. A supercomputer is typically just a bunch of ordinary servers, the same kind of hardware that runs most web services. You can buy individual servers yourself or rent them on demand from your favorite cloud computing provider. If you work in an academic setting, you can probably get access to a computing cluster with tens or even hundreds of servers.


Almost nobody uses mainframes anymore. A supercomputer is typically just a bunch of ordinary servers, the same kind of hardware that runs most web services. You can buy individual servers yourself or rent them on demand from your favorite cloud computing provider. If you work in an academic setting, you can probably get access to a computing cluster with tens or even hundreds of servers.

Not true. Mainframes are alive and well. The 'academic setting' has nothing on big industry... if anything, they rely on access to what big industry provides.

http://www.fool.com/investing/general/2015/01/24/heres-why-ibm-is-still-building-mainframes.aspx


Not true. Mainframes are alive and well. The 'academic setting' has nothing on big industry... if anything, they rely on access to what big industry provides.

http://www.fool.com/investing/general/2015/01/24/heres-why-ibm-is-still-building-mainframes.aspx

Mainframes are rare in industry too these days. They're mostly used in situations where redundancy and reliability are more important than performance. The biggest and most powerful mainframe on sale today has only about 2x more performance and 67% more memory than the biggest server you can build from commodity hardware. The price tag, on the other hand, is much larger than 2x.


Mainframes are rare in industry too these days. They're mostly used in situations where redundancy and reliability are more important than performance. The biggest and most powerful mainframe on sale today has only about 2x more performance and 67% more memory than the biggest server you can build from commodity hardware. The price tag, on the other hand, is much larger than 2x.

I don't know where you're getting your information from, but I can tell you I've been out there (in major industry) for the last 35+ years and big boxes are still alive and well. So let's just agree to disagree.


I don't know where you're getting your information from, but I can tell you I've been out there (in major industry) for the last 35+ years and big boxes are still alive and well. So let's just agree to disagree.

Most of that 35+ years is ancient history.

Google has used commodity hardware from the beginning, and most other big internet companies have made similar choices. Around 10 years ago, Intel and AMD basically took the entire supercomputer market in a few years. These days, if you want performance and scalability, you choose x64. Even IBM is too small to develop cost-effective alternatives to the hardware everyone else is using.

There are basically two reasons for using mainframes. You may have decades worth of legacy software that's not going to be replaced very soon. Alternatively, you may prefer reliability over performance, and hence choose mainframe hardware designed for that particular niche. The niche exists and appears to be stable, but it's not very large compared to the rest of the server market.


Most of that 35+ years is ancient history.

Google has used commodity hardware from the beginning, and most other big internet companies have made similar choices. Around 10 years ago, Intel and AMD basically took the entire supercomputer market in a few years. These days, if you want performance and scalability, you choose x64. Even IBM is too small to develop cost-effective alternatives to the hardware everyone else is using.

There are basically two reasons for using mainframes. You may have decades worth of legacy software that's not going to be replaced very soon. Alternatively, you may prefer reliability over performance, and hence choose mainframe hardware designed for that particular niche. The niche exists and appears to be stable, but it's not very large compared to the rest of the server market.

Also note that, for those who run mainframes for legacy software rather than reliability, it is largely an internal IBM business decision to sell [actually I'm pretty sure it's still only lease] a mainframe rather than supply the software stack to run under emulation. They were selling "AS/400" [not exactly a mainframe, but a similar market] hardware that was a Power chip emulating the AS/400 back when PowerPC was new; I'm pretty sure every AS/400 sold since has been emulated.

But that niche makes IBM a nice chunk of change.


Well, in the spirit of the original update, I thought I would summarize my progress as it runs into some of the problems. Basically, after I tried to fix several problems with my Ubuntu Linux install, it finally messed up the MBR of the boot drive so badly that I literally had to reprogram GRUB2 or the BIOS every time the computer started.

The whole thing started out badly. The reason is that a fully vetted installation guide for a dual-boot system is not given on the Ask Ubuntu site (the folks that give answers there make a lot of assumptions about how familiar people are with the OS, which is in the spirit of the original post). The critical ingredient was how to set up the Linux drive. Simply partitioning the drive with the ext4 format is not sufficient; one small partition of about 650 MB needs to be set up as an EFI system partition. This was not explained. The second thing is that the home directory should be on a separate partition, which allows enhanced data protection. Putting that together gives roughly the layout sketched below.
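For concreteness, the scheme comes out to something like this (device names are illustrative, and only the ~650 MB EFI figure comes from my setup):

```
/dev/sda1   ~650 MB     EFI system partition (FAT32, boot/esp flags)
/dev/sda2   remainder   ext4, mounted at /
/dev/sda3   as needed   ext4, mounted at /home
```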

BTW, the correct instructions were found on one of those click-baity sites.

To do this right I had to wipe both Windows and Ubuntu, :^(. The whole problem started because the ISO that was recommended was recommended on a mistaken assumption, also the fault of a bad explanation. The amd64 Ubuntu is for amd64-compatible x86-64 machines, not solely for AMD... the i386 image is for roughly any 32-bit machine. These names exist for historical reasons: Intel had the first 32-bit architecture of the genre and AMD had the first fully 64-bit one. Many of the commands were failing; lots of things didn't work. I thought, gee, did the installer not recognize my machine and install the wrong version? Nope, the installer was the wrong version to begin with.
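If you're not sure which image your machine needs, a quick check (Python's platform module here; `uname -m` at a shell tells you the same thing):

```python
import platform

# Prints 'x86_64' on any 64-bit Intel or AMD box, which is exactly
# what the 'amd64' Ubuntu image targets; 'i686'/'i386' means 32-bit.
print(platform.machine())
```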

Things are definitely improved. I found an ISO for 15.04; having that, and having Ubuntu Unleashed, I worked out how to create a bootable image with my Linux box and put that version of Ubuntu on the memory pen that had the bad Linux, thereby wiping it to bit-bucket land. The Windows install went well. The Ubuntu 15.04 install ("Something else" option) went well up until the partitioning, where it was not showing the values required in the list. I reset the table; that didn't work. I cleared the values and tried again, to no avail, and the computer froze. So I rebooted and attempted the install again, and this time the right partition type appeared in the list and everything proceeded according to the instructions. The lesson here is that bad advice not only wastes time, it can make the damage difficult to reverse... be concise but also complete. The other thing is that 15.04 has better cooperation with UEFI, so only a few changes are needed to make the two OSes compatible on the same machine.

With 15.04 x86-64 running stably, many things just work better. This version of Ubuntu has more features, like a more personalizable desktop and more choices of programs that autoload and compile, though honestly most are junk. MonoDevelop has an install link, thank god, because the install instructions on the Xamarin site are (in the spirit of this thread) a waste of time. One exception: on the previous version of Ubuntu my sound card had a problem, but Ubuntu offered several solutions and one of them was the fix; in 15.04 two of those solutions were left off the list, so the Sound Blaster now makes freaky noises like a child's machine gun.

So basically the trend here is that rather than provide good, well-thought-out instructions, the Linux folks have opted for push-button solutions. We need a course for the GNU community on how to communicate with humans. :^)

BTW, my Ubuntu desktop is starting to look like my XP desktop. :^)

