
Software engineers and the rest of the world.


PB666


Pretty much a wasted morning so a bit of a rant.

As many of you know, I am in search of a modern 64-bit software platform, either an IDE or something of the sort, to take a process-intensive project (or projects) and speed it up.

Having pretty much run into a brick wall with Ubuntu and its various OS-specific idiosyncrasies, I thought I was done with the reading, but then I was directed on the net to other resources.

The first is the Intel 64 and IA-32 architectures software developer's manual (the 3,603-page guide).

Now they start out with a general description of the software specifications (note that each mnemonic gets converted into hexadecimal machine code prior to making the .exe, so the manual is basically a compendium of how to use the nomenclature with the architecture to get the expected results), which is great, but then they sort of push the reader off the cliff with extremely technical descriptions that really, seriously, lose most users. Most users need the basics up front; after they have the basic stuff, you can put the details in an appendix or supplementary materials. What the user needs to know, for instance, is how an r register can be accessed, how it is properly referenced in code, and what the limitations are. Simple:

Can you access an r register in the same way as a legacy register, and specifically, what are the differences between the r registers and the legacy registers when accessing them?

The basic problem is that you would already have to know the old specifications and the old methods to understand this stuff; they basically ignore the nomenclature argument.

You have just added, for example, 30 new registers to your CPU; you might actually tell folks what the naming conventions are . . .

In that 3,603-page document, they couldn't have added this:

[Attached figure: the manual's diagram of the x64 register set, instruction pointer, and flags]

and a reasonably verbose description thereof, subdividing the picture into parts and explaining the parts (graduate student seminar 101). Even so, this is a poorly drawn picture, because the legends are randomly interspersed with the image objects (in case you missed it, the legend is in the top right corner, and a sort of legend is below the instruction pointer/flags image). So . . . I still do not know how R8 to R15 can be accessed, which may explain why some IDEs don't interact with these terms yet. Kate (Ubuntu) seems to ignore all the x64 registers.
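For reference, the answer that eventually turns up is that R8 through R15 are used just like the legacy registers, with r8d, r8w, and r8b naming the low 32, 16, and 8 bits the same way eax, ax, and al do for rax. A minimal sketch using GCC/Clang extended inline asm on Linux (my own toy example, not taken from the manual):

#include <cstdint>
#include <cstdio>

int main() {
    std::uint64_t a = 7, b = 5, sum;
    // r8 is written exactly like rax/rbx/...; the assembler emits the REX
    // prefix that selects the new registers for you.
    asm("movq %1, %%r8\n\t"
        "addq %2, %%r8\n\t"
        "movq %%r8, %0"
        : "=r"(sum)
        : "r"(a), "r"(b)
        : "r8");
    std::printf("%llu\n", static_cast<unsigned long long>(sum));
}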

Second thing is Intel. Guys, seriously, you have built a CPU that has, I dunno, 1 to 200 subvariants for each instruction type (looking at the manual, you could completely describe all the Z80 instruction variants on a single x64 instruction-type page), so why can't you write an assembler . . . Most computers can boot from a pen drive, and just about any fool can write an assembler for core Linux or MS. So anyway, they dust themselves off and send the user to the NASM, YASM, whatever pages. Ah, at last, an IDE. Well, no.

Microsoft decided after 2005 that they don't want to support inline assembly for x64, so now you must use a separate assembler alongside C++. Not sure why, but it may have to do with their moving away from old DOS compatibility. In either case, I expected to see some description of how to use this as part of C++, or something in the C++ drop-down that gives access to the assembler. Nope, not to be found. No description, nothing; they just say it can be done, but there is no description of how it can be done. So now it's likely I will have to pore through an online C++ programming guide (or buy one; even though I already have three, none were written after 2010) to find a rare and obscure reference. Wonderful. So, stumbling around trying to download what appear to be 64-bit files for FASM, Windows 10 decided, instead of copying the files like I instructed it to, to load them into the C++-derived assembly editor. More wonderful: you don't give me direct access to the assembler, but you give the file-copying routine access.
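For anyone stuck at the same point: as far as I can tell, the route Microsoft expects for x64 is a separate .asm file assembled with MASM (ml64.exe) and linked against the C++ code. A minimal sketch under that assumption; the file names and the add64 routine are made up for illustration:

// main.cpp -- hypothetical caller; the assembly routine lives in add64.asm,
// assembled separately (e.g. ml64 /c add64.asm) and linked in as an .obj.
//
// add64.asm would contain something like:
//   .code
//   add64 PROC            ; Microsoft x64 convention: args arrive in rcx, rdx
//       mov rax, rcx      ; return value goes back in rax
//       add rax, rdx
//       ret
//   add64 ENDP
//   END
#include <cstdint>
#include <cstdio>

extern "C" std::uint64_t add64(std::uint64_t a, std::uint64_t b);

int main() {
    std::printf("%llu\n", static_cast<unsigned long long>(add64(40, 2)));
}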

Fine, so I moved on to YASM. Now, I applaud the open-source stuff, I really do, but for god's sake, could they please explain how to install the software. If this were a publication and a referee caught wind of your methods section, you would be immediately rejected; I guess that is why science has referees and companies do not. So basically, users have to move the download to the directory where they are supposed to work, then substitute the actual version of YASM for the name in the instructions. You might think the *.exe installs the software; check the install list: nope, it is the software, so if you use command-line code to access it, it basically needs to be named right on the command line. The best thing to do when you get the software is to rename it so that the actual instructions work on it, or better yet rename it y.exe (and let them try to figure out why they gave confusing instructions).

Intel also mentions two other pieces of software, but both are grossly out of date.

The assembler that actually works, the flat assembler (FASM), has a rather primitive IDE, but, as I said, even without a readme on the install, anyone who's installed a zip into KSP can handle the install on this one (which means either we need to give better instructions, or installing from an unzip is information we inherit at birth as part of the geek gene). They do provide a PDF, but I overlooked it while looking for the readme. Anyway, they actually might have made it through a thorough graduate school. Spoke too quickly: this is not a full x64 IDE, it only handles x64 up to the MMX standard (about 15 years old now); it apparently does not handle the r sets of registers. Hmmpf. SOL.

I've been picking on MS and Intel, but I see many Linux sites where the moderator basically kills a thread in which a user is asking a question I want answered. Why? Ubuntu, why have a site where users are only allowed to give a third of an answer before you kill the thread, because it raises the potential for conflict? The other annoying thing (very, in fact) is that the moderator will direct the discussion to a link and the link will be dead; this has happened so many times, it's rather annoying. Jeez, if they think the link is so valuable that they are killing future discussion in a thread because a link exists, give a synopsis of the site's information, because that site may disappear ten seconds after you link it. (Me being hypocritical, but this is a game group, not something people may otherwise use for important stuff.)

The other annoying thing: I know everyone wants money, I know advertising is important, but throwing click-bait willy-nilly into a help site is not helpful, it's harmful; it's actually the easiest way to get OS-controlling malware (adware) onto your computer. The other thing I have found is that the click-bait is killing bandwidth, major big-time killing. I was having trouble using my iPad when I had Firefox loaded on my Linux box, just sitting there. I added an add-on for Firefox that only permits scripts after it asks the user, and on many of these help sites two-thirds of the visible content disappears, and the bandwidth reappears. On some pages it looks like the content is basically an add-on sitting on top of a big advertising script page (e.g. BBC News, :^( ). I think we need to start referring to this stuff by its old name, "bouncing bologna".

Web page hosting is not that expensive; I used to get it for free from my previous ISP, and it had no advertising whatsoever on it. So why do these help gurus now treat web-page space like it's Times Square?

Edited by PB666

You're not reading a "Teach yourself Intel x64 assembly in 21 days/weeks/months/years" guide. You're reading a reference manual intended for the small minority of experts who are already skilled in assembly programming but need a reference for technical details. If you're not already familiar with a modern assembly language, you should spend a couple of years learning and practicing one first. The websites of most decent universities should point to suitable learning resources.


This. Assembly is expert-level stuff. There is no reason to work with assembly unless you are a C++ guru in the first place.

Thirty years ago I programmed in assembly while trying to make a game; it was the only way to work fast with the graphics memory in the DOS days.

My demo only worked on a few computers.


You're not reading a "Teach yourself Intel x64 assembly in 21 days/weeks/months/years" guide. You're reading a reference manual intended for the small minority of experts who are already skilled in assembly programming but need a reference for technical details. If you're not already familiar with a modern assembly language, you should spend a couple of years learning and practicing one first. The websites of most decent universities should point to suitable learning resources.

Not really, but anyway you start with an elitist bias, so I'm not wasting time. This misinfo would not have lasted long in the era of the Apple II or the 8088. Such elitism generally meant your customer base moved on to other products, as happened to the Mac and OS/2.

- - - Updated - - -

This. Assembly is expert-level stuff. There is no reason to work with assembly unless you are a C++ guru in the first place.

Thirty years ago I programmed in assembly while trying to make a game; it was the only way to work fast with the graphics memory in the DOS days.

My demo only worked on a few computers.

The folks at Linux don't think so; they built assembler support right into the OS. So basically what you are arguing is that if you program IA-32 then it's OK to write stand-alones, but if you program x64, there's no sense targeting anything lower than a C++ DLL. C++ for me is nothing more than a convenient interface; I'll probably use it to interface with VB. It's making the same argument that K^2 has about VB, that people use it for the GUI shells it produces. I don't mind using it, but C++ is completely independent of the x64 instruction set, and so the two should not be confused.


Not really, but anyway you start with an elitist bias, so I'm not wasting time. This misinfo would not have lasted long in the era of the Apple II or the 8088. Such elitism generally meant your customer base moved on to other products, as happened to the Mac and OS/2.

Well, it's no longer the 8088 era. Most programmers don't write assembly by hand, because it's almost always a waste of time. Unless you're really good at writing assembly and devote a significant fraction of your time to keeping up with hardware developments, your compiler probably writes better assembly code than you do. Those of us who have specialized in something else just write our performance-critical code in our favorite language, and let the compiler sort out the low-level issues.


Well, it's no longer the 8088 era. Most programmers don't write assembly by hand, because it's almost always a waste of time. Unless you're really good at writing assembly and devote a significant fraction of your time to keeping up with hardware developments, your compiler probably writes better assembly code than you do. Those of us who have specialized in something else just write our performance-critical code in our favorite language, and let the compiler sort out the low-level issues.

The key phrase is "almost always", but even there it's a logical flaw, because unless you specifically know what version of C++ you are using and the default specs, it's not likely to be optimal for that CPU. Or to put it like this: I bought a specific setup to perform a specific function; should I just assume that some unspecified off-the-shelf compiler is optimal, while it uses those math.h functions that rely on x87 or the latest version of SSE? If you are writing a game to be used across many platforms, you may not care, but if you are writing a program, say, to analyze certain characteristics of every star in a thirty-trillion-star map of the local universe, you might care.
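A quick way to check the "default specs" point above: the compiler only emits what its target flags allow. A minimal sketch assuming GCC or Clang (the macros are theirs; MSVC uses /arch switches and different macros):

#include <cstdio>

int main() {
    // These macros are predefined by GCC/Clang depending on flags such as
    // -march=native, -msse2, or -mavx2; with no flags you get the baseline.
#if defined(__AVX2__)
    std::puts("built with AVX2 enabled");
#elif defined(__SSE2__)
    std::puts("built with SSE2 enabled (the x86-64 baseline)");
#else
    std::puts("no SIMD extensions enabled; floating point may fall back to x87");
#endif
}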

The other thing, really, without patronizing Intel: it would be so easy to take a nearly retired engineer and have him sit with a software engineer and write an assembler with on-the-fly documentation, if for no other reason than to have a quick platform to test things like whether your 80586 processor will divide properly in all instances of integer divide. Had they done that, they would have avoided a reputation hit. Seriously, nowadays a pen drive, a simple OS, and an IDE, and you have a sweet way of getting people interested in your machine. Arrogance is its own tax.


If you are writing a game to be used across many platforms, you may not care, but if you are writing a program, say, to analyze certain characteristics of every star in a thirty-trillion-star map of the local universe, you might care.

It's actually the other way around. If you're writing a game that thousands of people will play for a long time, it pays off to optimize the game as far as possible. If you're just processing and analyzing data, writing the code can easily take longer and be more expensive than running it. In the latter case, you have to know when the code is good enough, so you can stop optimizing it and start using your time for more productive purposes.

The other thing, really, without patronizing Intel: it would be so easy to take a nearly retired engineer and have him sit with a software engineer and write an assembler with on-the-fly documentation, if for no other reason than to have a quick platform to test things like whether your 80586 processor will divide properly in all instances of integer divide.

If you're referring to that old division bug, Intel did way more than that. They started hiring the world's top experts in automatic software verification, made huge leaps in the theory of derivation and verification of provably correct hardware, and put that theory to good use.


We are touching on the same thread that you and K^2 are arguing in, the one where you are arguing that efficient tracing is a waste of time. I kind of think we are not going to agree. I would be willing to bet that one of those world-class engineers probably has his own assembler that he uses to verify. In the real world, in silico analysis does not bounce very far.


PB, I personally have had to optimize stuff in assembler for micro controllers just a few weeks ago. I was running on the internal oscillator at x1 multiplier for power consumption reasons, and the compiler's solution was just not adequate. By manually using lots of registers (PICs have 16 which is a lot for a microcontroller) I was able to boost performance considerably.

Even then, I used it very sparingly. I had an operation I needed to happen on a hard timing edge, so I pre-loaded registers with the states, and I had a loop that was acquiring data at a high sampling rate that needed to work smoothly.

What I'm saying is, before you even touch assembler, have you

1. Optimized your algorithm?

2. Used all the multithreading you can?

3. Identified a core, tiny main loop where all the magic is happening where you think you can boost performance further?

Have you tried GCC or Intel's compiler? It often beats Microsoft's compiler on the programming shootouts and it may support ASM.

What I did in order to learn assembler quick and dirty was to find the loops I needed to speed up in the assembler listing file. I looked at how the compiler did it, looking up each instruction. I then copied the compiler's solution into a section of inline C code, and got it to run using the compiler's working solution for that step. I had to email the compiler authors, as their inline assembler had errors and could not read its own listing file format.

Once I got the code working again inline, I made tweaks a little bit at a time, checking the results for correctness, and eventually I was able to pare it down to about 1/4 of the instructions, although I used a lot more registers to do it. You can do the same, possibly.
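To make that workflow concrete, here is roughly what the starting point looks like with GCC or Clang (my own toy loop, not the code described above): write the hot loop in plain C++ first, then read the compiler's listing before rewriting anything by hand.

// Generate the listing with, for example:
//   g++ -O2 -S -fverbose-asm hot_loop.cpp -o hot_loop.s
// and study what the compiler already did before touching it.
#include <cstddef>

double dot(const double* a, const double* b, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];    // the core, tiny main loop to inspect
    return acc;
}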

Be aware that this means your solution will be machine specific. Those special registers you are having a tough time digging up? They won't be on every chip.

Also, if you are doing something complex, with complex logic, you just don't have time to optimize it all in assembler. You would probably be better served by redesigning your algorithm to be more cache friendly, or making it multithread better and just rent access to more cores if you need it. I was using a janky, poorly written compiler to get the speedups I was getting - I bet you can't do better than 50% faster hand optimizing the output of GCC.

For that matter, if your algorithm is N-body gravity, which it sounded like above, you should be using a GPU. Orders of magnitude more power. Check the Nvidia example for a basic implementation.

Edited by SomeGuy12

We are touching on the same thread that you and K^2 are arguing in, the one where you are arguing that efficient tracing is a waste of time. I kind of think we are not going to agree.

This is mostly about a different issue. I'm arguing that if you're interested in efficiency, you should make sure that you're measuring the real costs vs. the real benefits.

Assume that you want to analyze a dataset. You know which methods you want to use, but you have to implement them yourself. There are two scenarios. Either you spend one week writing the code and four weeks running it, or you spend four weeks writing and optimizing the code in order to make it run in a week. Which one is more efficient?

In both cases, you get the results in five weeks. The difference is whether you lose three weeks of working time or three weeks of computing time. If your employer is running a large-scale datacenter, your time is probably worth around 1000 CPU cores running 24/7. If you buy the CPU time from Amazon or your employer only has a few computing servers, your time can be worth as little as 100-200 CPU cores. Unless the analysis requires a lot of hardware, it's probably more efficient to spend just one week writing the code and use those three weeks on another project.

I would be willing to bet that one of those world-class engineers probably has his own assembler that he uses to verify. In the real world, in silico analysis does not bounce very far.

I wasn't talking about engineers. I was talking about world-class researchers making major scientific breakthroughs and putting the results to good use. Even though modern CPUs are orders of magnitude more complex than those we had in the 90s, we haven't seen a similar increase in hardware bugs. That's because Intel not only tests its products but also formally proves that many of their subsystems work correctly. As any scientist knows, experiments can only prove that you're wrong, while no amount of experiments will prove you right.


This is mostly about a different issue. I'm arguing that if you're interested in efficiency, you should make sure that you're measuring the real costs vs. the real benefits.

Assume that you want to analyze a dataset. You know which methods you want to use, but you have to implement them yourself. There are two scenarios. Either you spend one week writing the code and four weeks running it, or you spend four weeks writing and optimizing the code in order to make it run in a week. Which one is more efficient?

In both cases, you get the results in five weeks. The difference is whether you lose three weeks of working time or three weeks of computing time. If your employer is running a large-scale datacenter, your time is probably worth around 1000 CPU cores running 24/7. If you buy the CPU time from Amazon or your employer only has a few computing servers, your time can be worth as little as 100-200 CPU cores. Unless the analysis requires a lot of hardware, it's probably more efficient to spend just one week writing the code and use those three weeks on another project.

I wasn't talking about engineers. I was talking about world-class researchers making major scientific breakthroughs and putting the results to good use. Even though modern CPUs are orders of magnitude more complex than those we had in the 90s, we haven't seen a similar increase in hardware bugs. That's because Intel not only tests its products but also formally proves that many of their subsystems work correctly. As any scientist knows, experiments can only prove that you're wrong, while no amount of experiments will prove you right.

You mean like cigarette companies hiring scientists, soda companies hiring scientists, oil companies hiring scientists to study global warming, lol. In silico analysis . . . If you are testing processors for errors, you'd better be sure you are actually testing them; a scientist can't tell you whether a microchip with twenty billion transistors on it has a circuit just a little too close because of a little too much force applied in some obscure manufacturing step. I think we are again heading directly into the same logic as the other thread, no thanks.


You mean like cigarette companies hiring scientists, soda companies hiring scientists, oil companies hiring scientists to study global warming, lol. In silico analysis . . . If you are testing processors for errors, you'd better be sure you are actually testing them; a scientist can't tell you whether a microchip with twenty billion transistors on it has a circuit just a little too close because of a little too much force applied in some obscure manufacturing step. I think we are again heading directly into the same logic as the other thread, no thanks.

A few points:

1) Large technology companies have always invested in scientific research. The most famous example was Bell Labs, whose employees received several Nobel prizes for the work done there.

2) "In silico" is biology/bioinformatics-specific jargon. It doesn't mean anything in this context.

3) Testing can only reveal that the processor works correctly with a negligibly small fraction of possible inputs. Testing for manufacturing defects is no different from other testing. I was talking about formal verification, where integrated circuits (or computer programs) are treated as mathematical proofs, and the proofs are verified to be correct. If you've had computer science education, you should be familiar with the basic techniques, such as preconditions, postconditions, and invariants.
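As a toy illustration of that last point (runtime asserts standing in for a real proof tool; the example is mine, not anything Intel uses): integer division by repeated subtraction, annotated with its precondition, loop invariant, and postcondition.

#include <cassert>

// Returns a / b and writes a % b to *rem, checking Hoare-style conditions
// along the way.
unsigned div_mod(unsigned a, unsigned b, unsigned* rem) {
    assert(b != 0);                    // precondition
    unsigned q = 0, r = a;
    while (r >= b) {
        assert(a == q * b + r);        // loop invariant
        r -= b;
        ++q;
    }
    assert(a == q * b + r && r < b);   // postcondition
    *rem = r;
    return q;
}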


It's actually the other way around. If you're writing a game that thousands of people will play for a long time, it pays off to optimize the game as far as possible. If you're just processing and analyzing data, writing the code can easily take longer and be more expensive than running it. In the latter case, you have to know when the code is good enough, so you can stop optimizing it and start using your time for more productive purposes.

If you're referring to that old division bug, Intel did way more than that. They started hiring the world's top experts in automatic software verification, made huge leaps in the theory of derivation and verification of provably correct hardware, and put that theory to good use.

Eh? The old division bug happened because the original hardware design and the verification proof both had mistakes. From memory, the verification engineer manually proved half the lookup table correct (must have been doing an inversion before multiplying), then proved the rest by comparison to the previously proven numbers. Unfortunately, the second part wasn't quite correct.

Considering that they were designing the most complex processors in the world (they started the Pentium [the one with the division bug] and the Pentium Pro [that could keep up with most outrageously expensive RISC chips] at the same time), I'd say they would have already started building a team of the best and brightest in automatic software verification (verification takes *way* more engineer-hours than design) with or without said bug.

- - - Updated - - -

Not really, but anyway you start with an elitist bias, so I'm not wasting time. This misinfo would not have lasted long in the era of the Apple II or the 8088. Such elitism generally meant your customer base moved on to other products, as happened to the Mac and OS/2.

Why are you still whining about "elitist bias"? If you are so much a better coder that you can produce your great work while ignoring the last 30-40 years of software design, just read the 3603 pages and write the code.

Or you could actually listen to those who tell you that managing megabytes in assembler was an absolute disaster and that doing the same for gigabytes might not be the best way to proceed. Why not write it in Z-80 code while you are at it, if you prefer that scheme? People don't buy AMD64 machines to run programs written in assembler, so why in the world would Intel care about assembly (and bother to make CPUs that are easy to write assembler for)? They've only shipped a few billion CPUs since the last major assembler work was written (no clue what it was, probably an MS-BASIC-derived thing kept alive because it contained the last code Bill Gates wrote); assembly is strictly an afterthought (small exception: the Linux 1.x kernel was "written in C that may as well have been assembly . . .").

And just out of curiosity, just how in the world do you expect assembler to help your memory locality problems? Because that [and hard-drive locality problems . . .] is more or less everything that slows down a modern computer. Cycles (and cutting a square root down) are free. You can take 100 times longer (for "small programs", expanding to infinity for longer ones) to write your great work in assembler than in Python, but if the program has to keep accessing memory instead of cache, the thing will grind to a stop.
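A minimal sketch of the locality point (my own example, not tied to anyone's workload): the same O(n^2) sum over a row-major array, walked in cache-friendly and cache-hostile order. On most machines the second walk is several times slower even though it executes the same instructions.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 8192;
    std::vector<float> m(n * n, 1.0f);   // row-major n x n matrix

    auto time_sum = [&](bool row_major) {
        const auto t0 = std::chrono::steady_clock::now();
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                sum += row_major ? m[i * n + j]   // sequential walk
                                 : m[j * n + i];  // strided walk, cache hostile
        const std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - t0;
        std::printf("%-12s sum=%g  %.3f s\n",
                    row_major ? "row-major" : "column-major", sum, dt.count());
    };
    time_sum(true);
    time_sum(false);
}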


Eh? The old division bug happened because the original hardware design and the verification proof both had mistakes. From memory, the verification engineer manually proved half the lookup table correct (must have been doing an inversion before multiplying), then proved the rest by comparison to the previously proven numbers. Unfortunately, the second part wasn't quite correct.

Since then, Intel has developed better verification methods that don't depend as much on people not making mistakes.

Considering that they were designing the most complex processors in the world (they started the Pentium [the one with the division bug] and the Pentium Pro [that could keep up with most outrageously expensive RISC chips] at the same time), I'd say they would have already started building a team of the best and brightest in automatic software verification (verification takes *way* more engineer-hours than design) with or without said bug.

The Pentium FDIV bug was the main catalyst for Intel investing in formal verification, because it cost them real money. (Or at least that's what the people hired in the late 90s always say.) Obviously they would have done that eventually without the bug, but the way they harvested formal verification talent from the entire world sped up the development by at least 5 years.


It's actually the other way around. If you're writing a game that thousands of people will play for a long time, it pays off to optimize the game as far as possible. If you're just processing and analyzing data, writing the code can easily take longer and be more expensive than running it. In the latter case, you have to know when the code is good enough, so you can stop optimizing it and start using your time for more productive purposes.

It's even worse than that. I was working on a cluster that had 500 nodes with dual Titans on them. So I could have run two layers of parallel computing. I already had the data split into coarse, almost independent blocks that were perfect for processing on individual nodes in an MPI environment. And that brought computation time down from several months to 1-2 days per run. By running all of my numerical integration steps on graphics hardware, I would have been able to bring it down to mere hours of execution time. I had run the tests and was happy with the results. But it would still take easily a week of my time to set it up and make sure it works correctly at full scale. And despite the fact that I ended up running the code with full data sets at least a dozen times, I never bothered finishing that optimization. Because time taken by the cluster to do computations while I'm asleep or doing something else is way less valuable than my time actually digging in code trying to speed it up.

But yeah, this was a clear example of a case where a bottleneck was obvious as well. If you're running complex-valued integrals in 7 dimensions, that's always a safe bet. Life isn't always so simple, though. Especially when you've grown bored of academia, and it is now your day job to make sure that hundreds of thousands of users are getting a good framerate in their favorite (hopefully) game.

Edit: There is one thing on which I have to categorically disagree, though. This goes both for scientific software and game dev. There are cases where the optimizer won't do jack for you. Write your best version of C code to multiply two quaternions, for example. Then write the same code with SSE intrinsics and watch performance go up by a factor of 2-3. Same goes for a lot of the heavy lifting, and if you write your code this way from the start, it's virtually no extra effort on your part.
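For the curious, here is roughly what that looks like (a minimal sketch, not K^2's actual code; quaternions stored as [w, x, y, z] in a single __m128, a layout chosen just for this example):

#include <xmmintrin.h>   // SSE intrinsics
#include <cstdio>

static inline __m128 quat_mul(__m128 a, __m128 b) {
    // Broadcast each component of a across a whole register.
    __m128 aw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 ax = _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 ay = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 az = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 3, 3));

    // Permutations of b required by the Hamilton product, plus per-lane signs.
    __m128 bx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 3, 0, 1)); // [x, w, z, y]
    __m128 by = _mm_shuffle_ps(b, b, _MM_SHUFFLE(1, 0, 3, 2)); // [y, z, w, x]
    __m128 bz = _mm_shuffle_ps(b, b, _MM_SHUFFLE(0, 1, 2, 3)); // [z, y, x, w]

    __m128 r = _mm_mul_ps(aw, b);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(ax, bx), _mm_set_ps( 1.f, -1.f,  1.f, -1.f)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(ay, by), _mm_set_ps(-1.f,  1.f,  1.f, -1.f)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(az, bz), _mm_set_ps( 1.f,  1.f, -1.f, -1.f)));
    return r;
}

int main() {
    // Sanity check: i * j should give k, i.e. (0, 0, 0, 1) in [w, x, y, z] order.
    __m128 qi = _mm_set_ps(0.f, 0.f, 1.f, 0.f);  // x = 1
    __m128 qj = _mm_set_ps(0.f, 1.f, 0.f, 0.f);  // y = 1
    float out[4];
    _mm_storeu_ps(out, quat_mul(qi, qj));
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
}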

Edited by K^2

But yeah, this was a clear example of a case where a bottleneck was obvious as well. If you're running complex-valued integrals in 7 dimensions, that's always a safe bet. Life isn't always so simple, though. Especially when you've grown bored of academia, and it is now your day job to make sure that hundreds of thousands of users are getting a good framerate in their favorite (hopefully) game.

Edit: There is one thing on which I have to categorically disagree, though. This goes both for scientific software and game dev. There are cases where the optimizer won't do jack for you. Write your best version of C code to multiply two quaternions, for example. Then write the same code with SSE intrinsics and watch performance go up by a factor of 2-3. Same goes for a lot of the heavy lifting, and if you write your code this way from the start, it's virtually no extra effort on your part.

Yep; while 7 dimensions is pretty far up there, I have had analyses going as far as 10, where you need to be able to access most of the variables directly, and it really builds up the code. The problem in science that Jouri doesn't realize is that there is theoretically no limit on the number of dimensions one might have or on the size of any one dimension. If you go out to 4 dimensions on a modern computer it does not take that much processor power; add a couple more and it could take weeks. How do you know you are going to have problems? If your nested formulas in Excel break the machine, you are already in VBA land and moving toward stand-alone programs. I had an i386 assembly program that ran on a 40 MHz DX with 64 MB of memory for a month to complete.

One of the common things done nowadays is to data-mine clinical, demographic, and environmental parameters from computerized submissions. Some of these studies in Europe, for example the DAISY study of type 1 diabetes, have tens of thousands of patients, and you are looking for correlations among potentially hundreds of variables. This is particularly important because the identical-twin studies pretty much divide environmental and genetic influences, and currently only a small fraction of both have been identified (genetics is rather further progressed). So you can have 200,000 SNPs, millions if they employ disease-specific genome sequencing coupled with typing, correlated potentially with hundreds of environmental variables, and the computer needs to be able to pick out that a gastrointestinal virus outbreak in some province in 1972 is associated with a spike of new cases that appear a year later, or 40 years later. The basic problem here is not that you are looking for unknowns, you are, but just like dark matter and dark energy, you know the unknowns are there, and so the equipment and the analysis need to be sensitive enough to detect them. He's basically wrong: if you know your algorithm is going to have to run either once or repeatedly for several months, then it's better to optimize; in this case it will allow you to include more comparisons.


Well, I hate to throw yet another language at you, but if straight-up integration with minimal branching is your task, and you have no access to a cluster, then CUDA is your friend. The GPU is way better than the CPU at that sort of thing. Way, way better if you are happy with single precision. The upside is that it's just like writing in C. The downside is that it's just like trying to program for a thousand computers. In C. And it is annoying to set up. (But there are video tutorials on YouTube.) Holy transistors, it is fast, though. Like, "I can't believe you just did four hours of math in a minute" fast. Well, depending on your graphics card. But even on the low end it is unbelievably fast. It is like having your own cluster. One that is only good for this kind of task, but if that is all you need to do, that is good enough.


Well, I hate to throw yet another language at you, but if straight-up integration with minimal branching is your task, and you have no access to a cluster, then CUDA is your friend. The GPU is way better than the CPU at that sort of thing. Way, way better if you are happy with single precision. The upside is that it's just like writing in C. The downside is that it's just like trying to program for a thousand computers. In C. And it is annoying to set up. (But there are video tutorials on YouTube.) Holy transistors, it is fast, though. Like, "I can't believe you just did four hours of math in a minute" fast. Well, depending on your graphics card. But even on the low end it is unbelievably fast. It is like having your own cluster. One that is only good for this kind of task, but if that is all you need to do, that is good enough.

I've heard GPUs can be used this way; there was a professor on NPR talking about how he increased the speed of his computer fivefold by using the GPU to do some of the math. Right now I would be very happy if I could get the Ubuntu debugger to work the way everyone says it works. The problem is that I never considered GPU capabilities when I built my current machine; it's just an off-the-shelf board with an Intel GPU (which is now not being used) and another off-the-shelf video card. I think the i7 has the high-performance on-board graphics; I'm not willing to pay the extra $200.

First, on branching: I have calculated the minimum number of branches to fully utilize the i5's x64 instruction set at 40 per complete cycle, hopefully at least 32 cycles per call; it's about 1:1 conditional to unconditional, with about a 40% jump-instruction frequency (could be off by 10%).

Is anyone familiar with using gdb and kgdb on Ubuntu 14.0? It seems like it's broken.


Yep; while 7 dimensions is pretty far up there, I have had analyses going as far as 10, where you need to be able to access most of the variables directly, and it really builds up the code. The problem in science that Jouri doesn't realize is that there is theoretically no limit on the number of dimensions one might have or on the size of any one dimension.

What's that supposed to mean? Using 10 dimensions just means that you're handling objects that consist of 10 numerical values. In the things I'm working on, objects typically have from billions to hundreds of billions of "dimensions". Processing such data isn't that hard, because running time depends on algorithms, not on dimensions.

What you described in the next paragraph sounds like a fairly standard data mining problem with datasets that aren't too large by today's standards. I know a lot of people working on similar problems, and exactly none of them writes any code in assembly.


What's that supposed to mean? Using 10 dimensions just means that you're handling objects that consist of 10 numerical values. In the things I'm working on, objects typically have from billions to hundreds of billions of "dimensions". Processing such data isn't that hard, because running time depends on algorithms, not on dimensions.

What you described in the next paragraph sounds like a fairly standard data mining problem with datasets that aren't too large by today's standards. I know a lot of people working on similar problems, and exactly none of them writes any code in assembly.

Then I seriously doubt you know what you are talking about. One of the questions asked in one of our workshops a while back was why one particular researcher could not resolve more interactions. For example, there might be 5,000 factors in a cell that can be identified, but he could only display interactions for a limited number simply because he lacked CPU power; as I understand it, the Chinese built him a multimillion-dollar supercomputer.

You seem to be easily blowing off problems that you have absolutely no idea about; I saw the same behavior in the tracing discussion with K^2. Science, I can tell you without hesitation, has processing needs that blow the doors off of game computers. Just submit a reasonably long query to BLAST and see how long it takes for them to get you back a response. As to the specific critique you raise, there are fields like name, address, and PID that combined can create a search. There are other things that do not. When you have, say, 10,000 base units, and each base unit has up to 1,000,000 quaternary markers, and each of those markers may be in condition-related epistasis, but you have no a prioris, and in addition there can be environment-gene epistasis, again with maybe one a priori. The rest is carving out dimensions. You can optimize it starting with the most linked alleles or suspect environmental links, but in reality you will burn through those rapidly; in the case of low-penetrance diseases you are looking for a large environmental contributor that is virtually invisible and 90% without a prioris. You can have factor X in epistasis with an associated gene, but because one or two of the epistatic genes are so uncommon in folks without the gene-environment combination, it will be virtually invisible. That is why some groups advocate sequencing the entire genome of people affected by low-penetrance disease, in the sheer hope they can find a marker with better linkage to a nearby marker with poor linkage.

Think about all the variables in your life

The month you were born in, the year, whether you were breast fed and for how long, whether you got adequate vitamins, what types of meat you ate, how old you were when you started eating cereals, whether you work indoors or out, whether you are exposed to a lot of microbes or not, when you started having relationships with others, whether you have siblings with similar conditions, what diseases you have had, what vaccinations you have had, what you eat, what types of fat, sugars, and protein, whether you exercise, indoors or outdoors . . . now multiply all that against all the rare markers you have. A common man would say no, but this is not common stuff; genes and environment are in dynamic equilibrium, selection acts on the gene via the individual, and the individual interacts with the gene in every way he can vary his environment. Even moving across a bridge has been demonstrated to have an effect on genetic penetrance. This is why I make the statement about quantum states being a small-scale behavior; penetrance in humans is the equivalent of quantum uncertainty.


Then I seriously doubt you know what you are talking about. One of the questions asked in one of our workshops a while back was why one particular researcher could not resolve more interactions. For example, there might be 5,000 factors in a cell that can be identified, but he could only display interactions for a limited number simply because he lacked CPU power; as I understand it, the Chinese built him a multimillion-dollar supercomputer.

If you doubt whether I know what I'm talking about, you can trivially check what kind of expertise and what qualifications I have. I haven't made any effort to hide my real identity on the forums.

Science, I can tell you without hesitation, has processing needs that blow the doors off of game computers. Just submit a reasonably long query to BLAST and see how long it takes for them to get you back a response. As to the specific critique you raise, there are fields like name, address, and PID that combined can create a search. There are other things that do not. When you have, say, 10,000 base units, and each base unit has up to 1,000,000 quaternary markers, and each of those markers may be in condition-related epistasis, but you have no a prioris, and in addition there can be environment-gene epistasis, again with maybe one a priori.

What you describe is not a very large dataset by today's standards. You have maybe a few hundred gigabytes, while businesses routinely mine terabytes or even petabytes of similar data. The methods are well-known, and the entire world from programming languages to software and from hardware to infrastructure has been optimized for processing that kind of data.

Biologists tend to have trouble with smaller datasets than people from many other fields. In part this happens because biologists (especially those who chose biology already in college) often lack the culture, the infrastructure, and the expertise for processing large amounts of data. Another reason is that biological data is often sequence data, which requires quite different methods, hardware, and infrastructure to process than mainstream numerical/categorical data.


Well, I hate to throw yet another language at you, but if straight-up integration with minimal branching is your task, and you have no access to a cluster, then CUDA is your friend. The GPU is way better than the CPU at that sort of thing. Way, way better if you are happy with single precision. The upside is that it's just like writing in C. The downside is that it's just like trying to program for a thousand computers. In C. And it is annoying to set up. (But there are video tutorials on YouTube.) Holy transistors, it is fast, though. Like, "I can't believe you just did four hours of math in a minute" fast. Well, depending on your graphics card. But even on the low end it is unbelievably fast. It is like having your own cluster. One that is only good for this kind of task, but if that is all you need to do, that is good enough.

It really is remarkable what consumer-grade hardware is actually capable of. And it just keeps getting better.


If you doubt whether I know what I'm talking about, you can trivially check what kind of expertise and what qualifications I have. I haven't made any effort to hide my real identity on the forums.

What you describe is not a very large dataset by today's standards. You have maybe a few hundred gigabytes, while businesses routinely mine terabytes or even petabytes of similar data. The methods are well-known, and the entire world from programming languages to software and from hardware to infrastructure has been optimized for processing that kind of data.

Biologists tend to have trouble with smaller datasets than people from many other fields. In part this happens because biologists (especially those who chose biology already in college) often lack the culture, the infrastructure, and the expertise for processing large amounts of data. Another reason is that biological data is often sequence data, which requires quite different methods, hardware, and infrastructure to process than mainstream numerical/categorical data.

No, because the published data sets are the starting points. You believe that gigabytes of information is the data set; in reality that is just the starting point. Once you have your initial results, you then have to sequence your patient populations and family trios; in that case you could have 100 genomes of data, and then that could go through its own genome-wide association study. So, for example, if we are to talk about a shortish region, mtDNA, the last time I checked there were 10,000 or so published variants, probably more. The Y-chromosomal variants are probably equal in number to the men who have walked the earth; for the most part we don't care. In the HLA region there are something like 3,000 DRB1 variants or B variants. This is the thing you fail to understand; there is a reason for the cliché that the biology always gets you.

We can add to this: probably, for a non-geneticist, he looks at a book and sees a pretty picture of a gene in a box that comes from some 'model' organism, and he is inclined to believe that a gene is something that is 3,000 or so nt long and fits in a quadripartite structure. QTLs for genes can be found hundreds of thousands of nt away from the gene. The human genome variation project reveals that there can be thousands of variants within the range of the QTLs for a gene.

You might argue that these are trivialities, but the QTLs for low-penetrance diseases are the lingua franca of genetics. Most of the newly discovered links are best described as QTLs. One particular example resides in a region that would otherwise be considered junk DNA between three other genes, at a spread of 0.5 million nt. It turned out to be the master control locus of all three genes. If you can imagine that your first dimension is the chromosome, the second dimension has its origin at the centromere, and the third dimension is all the possible variants at each locus, with the cross product of those being all the epistatic loci, you haven't even finished crossing. Then there is logical processing: individual genomes have two superposed haplotype structures that are generally not known but can be inferred from population data; these structures are almost as important as the variation itself, since most control elements are cis-acting.


No, because the published data sets are the starting points. You believe that gigabytes of information is the data set; in reality that is just the starting point. Once you have your initial results, you then have to sequence your patient populations and family trios; in that case you could have 100 genomes of data, and then that could go through its own genome-wide association study.

[..]

What was the point of all that? If you strip away all the unnecessary jargon, you were just admitting that the actual datasets aren't too large or too complex by today's standards.

When I was a student, we had one of the best data mining / machine learning / data analysis groups in the world at the university. I took a few classes from them, and while I ultimately chose to specialize in another area, I'm quite familiar with the basic techniques for handling the kind of complexity you describe.

These days I'm working in a major bioinformatics / genetics research institute. While my own research is mostly computer science, I get a fair bit of exposure to the work done in genetics. Based on what I see, computation is rarely the real bottleneck. If something is computationally infeasible, it's probably so infeasible that a 10x or 100x performance improvement won't make it feasible. The real bottleneck is almost always the lack of relevant expertise.

People with a background in methods (computer science, statistics, mathematics, physics, or something similar) are generally familiar with one or more relevant methodological areas, such as algorithms, statistics, or high-performance computing. People with a background in biology understand the data better and ask better questions about it. People with a mixed/bioinformatics background are familiar with the best practices and tools in bioinformatics. It's rare to see a person who's familiar with most of the relevant aspects of a research project. Or in other words, understands what the group is really doing. Most of us don't have what it takes to become world-class experts in multiple unrelated fields, and to stay up-to-date in all of them.

I'm less familiar with the data analysis large companies like Google, Amazon, and Walmart are doing. Based on what's publicly available, they work with orders of magnitude more data than we do, and they're facing at least as complex questions. While biology is a complex subject, language, economy, and human behavior are also quite complex.


I'm less familiar with the data analysis large companies like Google, Amazon, and Walmart are doing. Based on what's publicly available, they work with orders of magnitude more data than we do, and they're facing at least as complex questions. While biology is a complex subject, language, economy, and human behavior are also quite complex.

The amount of data that Google crunches would surely boggle the mind . . . provided anyone actually knew how much that really is. This XKCD What-If has an interesting estimate of the capacity of Google and puts it at around 15 exabytes. Which is a lot. That's about the same order of magnitude as the total information content of the human genome . . . of every living human combined. I'm not a biologist, so the extent of my knowledge of encoding the human genome comes down to "2 bits per base pair", and I'm not sure whether to count both copies of all chromosomes. But if you do, I think it works out to about 1.5 GB of information per human, so counting all ~7.3 billion of us, it hits around 11 exabytes.

Of course this is just sort of a comparison of the static storage capacity of Google, which doesn't really tell us the amount of transient data they process. I don't really feel like spending the time trying to research a good estimate for that, so I'll do the lazy thing and just point out that the estimate for global internet traffic per month is on the order of 70 exabytes/month. The fraction of that that passes through Google's servers is anyone's guess, but it should at least give a bit of a reference for just how much data that our civilization routinely tosses around.
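For what it's worth, the arithmetic holds up as a back-of-the-envelope figure (the numbers below, e.g. ~3.2 billion base pairs per haploid genome, are round assumptions rather than measured values):

#include <cstdio>

int main() {
    // ~3.2e9 base pairs per haploid genome, doubled to count both chromosome
    // copies, at 2 bits per base.
    const double bases_per_person = 2.0 * 3.2e9;
    const double bytes_per_person = bases_per_person * 2.0 / 8.0;     // ~1.6e9
    const double people           = 7.3e9;
    const double total_exabytes   = bytes_per_person * people / 1e18; // ~11.7
    std::printf("~%.1f GB per person, ~%.1f EB for everyone\n",
                bytes_per_person / 1e9, total_exabytes);
}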

