
Computers of the future: more like embedded systems or FPGA combinatorial logic?


SomeGuy12


So I have been working on several moderately complex software systems recently (a control system, a game, things like that).

I've noticed that there is a huuuuggggeee difference in reliability between a system where:

1. Some outsourced programmers slapped together a mess in a high-level language that depends on a bunch of desktop libraries and interpreters
2. Somebody wrote a rigid, explicitly defined architecture with documentation (a Visio flowchart) for each module and a test script for each module

The system in #2 is like 100 times faster and more responsive and rarely fails. I've been able to add features to it that worked the first time I clicked the build+run button.

I've also started on a fault-tolerant design where the client needs a piece of equipment that works at high temperatures. I've been using a CPLD. It's more work, but not as bad as you'd expect...

So this got me thinking. The basic microprocessor architecture is an ALU, a set of control registers, and a block of memory. The processor loads instructions, the opcode selects which circuits get used to determine the output, the output is stored in a register, and the memory controller saves it back to memory later. For the 1970s, 1980s, etc., this basic architecture was one of the most efficient ways to use limited numbers of transistors. You use magnetic tapes and platters so the amount of memory needed is minimized.

Well, it doesn't have to be that way. In my CPLD design, when the system is powered up, each module begins operating. At one end, SPI modules begin polling sensor ADCs for data. They hand the data to modules that act like dual-ported memory, which pass the data to dedicated comparator modules that keep running integration counts of total signal magnitude, and then to a module that implements a ring buffer in parallel RAM.

It's only about 5 times harder than doing the same thing in C. And it has occurred to me that you could design vastly more complex systems using this method. You would do it with "bottom-up" design, implementing well-tested building blocks that you hook together to build large and complex software systems.
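For reference, here is a minimal sketch of roughly what "the same thing in C" looks like: poll an ADC, keep a running integral of the signal magnitude, and push each sample into a ring buffer. The spi_read_adc() function is a hypothetical stub standing in for a real SPI driver, just so the sketch compiles and runs.

[code]
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RING_SIZE 256  /* power of two so the index mask works */

static int16_t ring[RING_SIZE];
static uint32_t ring_head = 0;
static uint64_t running_integral = 0;   /* total signal magnitude so far */

/* Hypothetical driver stub so the sketch runs; a real version would
 * clock a conversion out of the sensor ADC over SPI. */
static int16_t spi_read_adc(void)
{
    return (int16_t)(rand() % 2048 - 1024);
}

static void push_sample(int16_t sample)
{
    ring[ring_head & (RING_SIZE - 1)] = sample;           /* ring buffer write */
    ring_head++;
    running_integral += (sample < 0) ? -sample : sample;  /* accumulate |sample| */
}

int main(void)
{
    for (int i = 0; i < 10000; i++)
        push_sample(spi_read_adc());

    printf("samples: %lu, integral of |signal|: %llu\n",
           (unsigned long)ring_head, (unsigned long long)running_integral);
    return 0;
}
[/code]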

You could have network stacks, web browsers, email clients, the whole 9 yards.

The difference is that all the running software on a computer would occupy actual chip regions of specific FPGAs in the computer system. There would be a set of low level interfaces that very strictly enforce who gets to access which parts of storage and the I/O.

Why would you do this?

1. I think a rigid, careful design of the software in a computer system would result in stuff that is a lot more reliable.
2. Applications that are loaded into a chip would start up truly instantaneously, within the next video frame.
3. It would be a lot more secure. It would literally be impossible to do most forms of computer hacking today. Data used internally by one application cannot be accessed by [I]anybody[/I] - not the OS, not other applications, nothing, because the internal data is literally not connected by LUT routing to anything else. (well, not directly - applications could still leak information due to faulty design but they get a chance to process the data before outputting it)

TLDR, a megacorp with enough resources could make a family of computers that start instantly, work every time, are almost perfectly reliable, and hackers can rarely do anything with them. I think they might have a big enough advantage in the marketplace to overcome their drawbacks - the drawbacks being that you must have a large number of expensive FPGA chips in each computer, and they need to be specific models. Newer computers would need to keep inside them a whole circuit-board full of "legacy" chips in order to run older software. It would cost about 5 to 10 times as much money to develop software of the same complexity.

But it's not just desktop pcs/tablets/laptops that would benefit.

Automated cars basically have to be built this way. Maybe "have to" is a bit strong, but it's the correct solution. Microprocessor/microcontroller architectures take too long to restart and have a lot more failure modes. The recent round of car hacks would never have been possible had the manufacturers used automotive CPLDs instead.

The problem with this design is that it would be impossible to add new software.
It also does not scale up well to complex programs.
It looks like things are moving in the opposite direction: GPUs are getting more and more flexible, while earlier they were designed far more for real-time rendering only.

You are, however, right about the benefits.

[quote name='SomeGuy12']
1. I think a rigid, careful design of the software in a computer system would result in stuff that is a lot more reliable. [/quote]

Well, that's pretty much true regardless of the underlying hardware. The problem is that it isn't cost-effective or practical. Software engineering has deadlines and budgets, and that's what cuts into our ability to create reliable and robust software. Another thing to keep in mind is that the cost of testing software does not grow linearly with the complexity of that software. It's a combinatorial problem: if I stick two modules together, the number of states everything can be in doesn't double, it squares.
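A toy example of that squaring, just to make the arithmetic concrete (the per-module state counts are made up):

[code]
#include <stdio.h>

int main(void)
{
    /* Made-up numbers: suppose each module has 10 distinct internal states. */
    const unsigned long module_states[] = { 10, 10, 10, 10 };
    const unsigned long n = sizeof module_states / sizeof module_states[0];

    unsigned long sum = 0, product = 1;
    for (unsigned long i = 0; i < n; i++) {
        sum += module_states[i];      /* roughly what unit tests cover, module by module */
        product *= module_states[i];  /* the states the integrated system can actually be in */
    }
    printf("states covered module-by-module: %lu\n", sum);      /* 40 */
    printf("states of the integrated system: %lu\n", product);  /* 10000 */
    return 0;
}
[/code]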

[quote]2. Applications that are loaded into a chip would start up truly instantaneously, within the next video frame[/quote]

I'm working on an application right now at work that takes about 20 seconds to start up. We haven't really optimized it, so we could probably cut that time down quite a bit, maybe into the realm of 10-15 seconds. I don't imagine it'll get much faster than that, though, because there's stuff it just [b]must[/b] do before starting, and that includes reading through about 1GB worth of data, processing it into appropriate searchable data structures, and building geometric primitives for displaying it (on a map). I don't see how your solution can have enough of an impact on execution time or I/O to improve that necessary load time.

[quote]3. It would be a lot more secure. It would literally be impossible to do most forms of computer hacking today. Data used internally by one application cannot be accessed by [I]anybody[/I] - not the OS, not other applications, nothing, because the internal data is literally not connected by LUT routing to anything else. (well, not directly - applications could still leak information due to faulty design but they get a chance to process the data before outputting it)[/QUOTE]

If it takes input and produces output, it is not automatically secure. Security is hard. Like, really hard. It's borderline impossible to do it right because of the combinatorial problem of test coverage. The effort required to test software increases faster than the complexity of that software. Even with separate modules tested to 100% with unit tests, there are still integration tests to consider, because unit tests don't tell you how modules can interact with each other.

We have that exact sort of design at work, most of our new software architectures are highly modular, because it's the best you can do to simplify complicated software. It is definitely a more maintainable approach, but it's not a magic bullet. You can be absolutely sure that each module works 100% correctly in isolation, but as soon as you integrate them into a larger system, they can start interacting with each other in unforeseen ways.

magnemoe: you can add new software so long as you have remaining area on one of the FPGAs in your computer, or are willing to let the system shuffle existing applications around automatically. I don't see any technical reason that would make complex programs "impossible", just more expensive to create.

"Yourself", what I meant by instant start is that the application would basically be "running" all the time because the computer system gives physical chip area to each installed application. So you aren't really "starting" the application, you are actually just making it active again.


[quote name='Yourself']
I'm working on an application right now at work that takes about 20 seconds to start up. We haven't really optimized it, so we could probably cut that time down quite a bit, maybe into the realm of 10-15 seconds. I don't imagine it'll get much faster than that, though, because there's stuff it just [b]must[/b] do before starting, and that includes reading through about 1GB worth of data, processing it into appropriate searchable data structures, and building geometric primitives for displaying it (on a map). I don't see how your solution can have enough of an impact on execution time or I/O to improve that necessary load time.[/QUOTE]

You could in principle pause your application after starting up and copy the data in memory to disk directly. Startup would occur the opposite way. This would take 5-10 seconds for users on conventional hard drives if there is a gigabyte of memory in use (100 megs a second, sequential read) and possibly 2-3 seconds for SSD users. I don't know of any applications that do this, although Windows itself does.
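A minimal C sketch of that idea, assuming the processed data is a flat array of values and using a made-up file name: write the warm state out as one binary blob, then restore it with a single sequential read instead of re-parsing the raw input.

[code]
#include <stdio.h>
#include <stdlib.h>

/* Write the already-processed array as one flat binary blob
 * (a length header followed by the raw values). */
static int save_snapshot(const char *path, const double *data, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&n, sizeof n, 1, f) == 1 &&
             fwrite(data, sizeof *data, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}

/* Restore it with a single sequential read: no parsing, no rebuilding. */
static double *load_snapshot(const char *path, size_t *n_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    double *data = NULL;
    if (fread(n_out, sizeof *n_out, 1, f) == 1) {
        data = malloc(*n_out * sizeof *data);
        if (data && fread(data, sizeof *data, *n_out, f) != *n_out) {
            free(data);
            data = NULL;
        }
    }
    fclose(f);
    return data;
}

int main(void)
{
    double processed[3] = { 1.0, 2.0, 3.0 };  /* stand-in for the real data set */
    save_snapshot("app_state.bin", processed, 3);

    size_t n = 0;
    double *restored = load_snapshot("app_state.bin", &n);
    if (restored)
        printf("restored %lu values, first = %g\n", (unsigned long)n, restored[0]);
    free(restored);
    return 0;
}
[/code]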

New computer architectures will be widely adopted only if they make programming easier. If they only improve security, performance, or some other less important aspect, they will remain niche solutions for specific problems.

GPUs are a good example of this. Five years ago, they were the future. Then we equipped our supercomputers and computer clusters with them and learned their limitations. They were essentially computers with a large number of slow cores and a small amount of fast memory. Programming them was slow and difficult, and the hardware kept changing all the time. In the end, ordinary computers with a smaller number of faster cores and a larger amount of memory were better for most tasks.

[quote name='SomeGuy12']You could in principle pause your application after starting up and copy the data in memory to disk directly. Startup would occur the opposite way. This would take 5-10 seconds for users on conventional hard drives if there is a gigabyte of memory in use (100 megs a second, sequential read) and possibly 2-3 seconds for SSD users. I don't know of any applications that do this, although Windows itself does.[/QUOTE]

In fact that's one of the things we're going to try doing. At least in a sense. It won't be an exact memory dump (which isn't practical for various reasons), but the idea of caching the data in a form closer to what's needed by the application is something we've thought of. There is some trickiness here because the original data can change between runs of the application, so we have to build in some kind of timestamp that allows us to determine when we should read the processed data and when we should read the unprocessed data. But now we're starting to make things more complicated.
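One way to sketch that timestamp check, assuming plain files and POSIX stat() (the file names here are hypothetical): rebuild the cache only when the raw source data is newer than the cached, preprocessed copy.

[code]
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if the cached file is missing or older than the raw data. */
static int cache_is_stale(const char *raw_path, const char *cache_path)
{
    struct stat raw, cache;
    if (stat(cache_path, &cache) != 0)
        return 1;                           /* no cache yet: rebuild it */
    if (stat(raw_path, &raw) != 0)
        return 0;                           /* no raw data visible: trust the cache */
    return raw.st_mtime > cache.st_mtime;   /* raw data is newer: rebuild */
}

int main(void)
{
    if (cache_is_stale("map_raw.dat", "map_cache.bin"))
        puts("reprocess the raw data and rewrite the cache");
    else
        puts("load the preprocessed cache directly");
    return 0;
}
[/code]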

Of course, then you might wonder why we don't just pre-process all the data and read it that way. That comes with its own costs, because now we have to develop a new tool chain to support it. And if we decide to change our minds later about what the processed data looks like, it's not so easy to do that, because we'd have to bake that change into our tool chain and processes.

I can tell you that anything that is more difficult or time-consuming to develop is definitely not going to catch on. I don't think I've met an engineer who didn't want to spend all the time they could making something perfect, but at some point you have to finish something. Not to mention that an engineer's time is expensive. Really, really expensive. So the most important feature of just about any system is engineering efficiency. It ultimately all boils down to minimizing development and maintenance time (especially maintenance, because it is, by a wide margin, the largest time sink).

[quote name='Jouni']GPUs are a good example of this. Five years ago, they were the future. Then we equipped our supercomputers and computer clusters with them and learned their limitations. They were essentially computers with a large number of slow cores and a small amount of fast memory. Programming them was slow and difficult, and the hardware kept changing all the time. In the end, ordinary computers with a smaller number of faster cores and a larger amount of memory were better for most tasks.[/QUOTE]

I'd say that future came true. GPUs are the go-to hardware for massively parallel computation because that's exactly what they're good at; that's why we have compute shaders now: GPUs have gone general purpose. I don't recall there being any particular notion that they'd replace CPUs entirely, but they have taken on a lot of heavy computation that CPUs used to do. Modern GPUs are a pretty critical component of a computer. Even the OS UI is hardware accelerated nowadays. You'd be hard pressed to find a software renderer in anything anymore.

In fact, one of the things I'd like to try is developing a world-wide weather model for our flight simulators that runs on the GPU. The GPU is probably the only practical option we have. It may also be worthwhile to move our weather radar model over to the GPU as well. Right now we have to dedicate a whole CPU core to it, and even then we're barely meeting our required frame times.

[quote name='Yourself']I'd say that future came true. GPUs are the go-to hardware for massively parallel computation because that's exactly what they're good at; that's why we have compute shaders now: GPUs have gone general purpose.[/QUOTE]

Their niche is much smaller than that. If you feel that Fortran or Matlab would be an appropriate tool for your task, then you should consider using a GPU. Otherwise it's probably not cost-effective. GPUs are good for processing numerical data, but much worse with combinatorial data. In the latter case, computing units with 10x fewer cores and 10x more memory would be much more useful.

I work in a genomics/bioinformatics institute with several local computing clusters with a total of almost 20000 cores. As far as I know, the only GPUs there are the ones in our laptops and desktops. Some tools can benefit from GPUs, but they're rare enough that it's better to rent the capacity from Amazon, instead of having dedicated systems of our own.

[QUOTE]Even the OS UI is hardware accelerated nowadays. You'd be hard pressed to find a software renderer in anything anymore.[/QUOTE]

Hardware-accelerated UIs were already common in the 90s. The processors were so slow back then that you couldn't get acceptable performance otherwise.

Intel recently bought out Altera, one of the major FPGA producers; the plan is to make Xeons with on-die FPGAs that can be used to accelerate certain algorithms for server-side applications. An FPGA's real benefit is versatility: it can be a robust single-core processor one minute and a vector processor the next. You need to emulate an out-of-production legacy chip? You can. You need instructions for non-standard data formats? It can do that too. You pretty much get to decide what kind of chip it is. The downside is that an FPGA is significantly slower than a processor or an ASIC. Not so much slow as unable to support clock rates higher than hundreds of MHz (because of the increased propagation delays caused by internal routing of signals), but you can still do a lot in parallel. For some applications, having a bank of GPUs is better (ultimate solutions might stick a CPU, FPGA, GPU, and neural net all on one die). I should also point out that FPGAs aren't any more expensive than CPUs or GPUs. Low-end dev boards exist for $50 or less, not much more than an Arduino or Raspberry Pi, and at the high end they can rival a high-end CPU in terms of cost.

It's always going to be flexibility vs. performance. You can be versatile with a bunch of virtual machines, but it's a serious performance hit compared to less convenient cross-compiled code. Typical users have more CPU than they need, so giving them interpreted software that is easier to develop certainly cuts down the cost. However, in a server farm you are always going to want to run as close to the metal as possible, to do the same job with fewer servers and cut costs. The lines are blurred in web applications, where you need server-side scripting environments, but that's more of an exception to the rule.
