2017-06-04
You asked whether C/C++ will make a comeback if speed becomes more important for ML. The short answer is no, because programmer time is more expensive than CPU time, so programmer-friendly languages will always win as a best practice. I think I covered this in my last essay, but I failed to fully convince you, so here is a different angle, with a conclusion at the end.
Note that the ideas below are not some shared view of the IT community, simply my own view of the future. While I see it as inevitable, it is speculation nevertheless.
We have reached the limits of what individual computers can do. Or have we? What are FPGAs? What's the untapped opportunity in them for the mass market? What has prevented us from unlocking it already? How can we finally unlock this power? And finally, why AI needs FPGAs so badly: the artificial brain.
FPGA stands for Field-Programmable Gate Array. Yeah, I know, that’s meaningless, so let’s try an analogy.
I assume you know what 3D printing is. It is rapid prototyping for objects. FPGAs are rapid prototyping for processors.
Oversimplified, a processor contains logic gates and wires between them. While there are many kinds of logic gates, there is a rule that from one specific type (the NAND gate, for example) and lots of wires, you can build any kind of processor, or any logic circuit at all. From this it follows that it's not the logic gate itself that matters, but how you compose them, that is, the wires.
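To make that concrete, here is a tiny sketch in plain Haskell (used purely as pen-and-paper notation here, it is not FPGA code): starting from nothing but NAND, you can build NOT, AND and OR, and from those, any circuit you like.

    -- A toy illustration: every gate below is built purely out of NAND.
    -- In software we have to define the primitive somehow; on a chip it is a physical gate.
    nand :: Bool -> Bool -> Bool
    nand a b = not (a && b)

    notG :: Bool -> Bool
    notG a = nand a a

    andG :: Bool -> Bool -> Bool
    andG a b = notG (nand a b)

    orG :: Bool -> Bool -> Bool
    orG a b = nand (notG a) (notG b)

    -- And once you have those, anything follows, for example XOR:
    xorG :: Bool -> Bool -> Bool
    xorG a b = orG (andG a (notG b)) (andG (notG a) b)

    main :: IO ()
    main = print [ xorG a b | a <- [False, True], b <- [False, True] ]
    -- prints [False,True,True,False], the XOR truth table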
An FPGA contains loads of this one logic gate. Think of a gate array. Remember, it's called Field-Programmable Gate Array! As for the wires, FPGAs don't contain fixed wiring, but a mechanism to dynamically define how those logic gates are connected. That configuration can be loaded onto the FPGA in the form of a configuration file. You can even change it later. That's called programming the FPGA. So by now we have covered 75% of the name: programmable gate arrays. Field-Programmable simply means that the wiring is not defined (programmed) in the factory when the chip is manufactured; rather, it is defined in the field.
Let it sink into your brain, because these few words will soon change the world: Field-Programmable Gate Arrays. Now, they kind of make sense, don't they? :)
If you want to know more, drop by Wikipedia: Field-programmable gate array.
To understand the future of computing, let's start with its past.
Single core processors calculate results in a sequence. Instructions are executed one after the other. These instructions are extremely simple, like adding up 2 numbers, or multiplying them, etc.
The interesting part to emphasize is that at any given moment, only a tiny part of the processor is actually working! It is hugely inefficient. To add up two numbers, it has a dedicated hardware component in it that can do just that: add up two numbers. Each processor supports a list of such instructions, of which only one is executing at any given moment.
For a while, processors were made more efficient by making them faster. But as we reached the physical limits, we needed an alternative route forward.
The most straightforward solution is: if we can't make them run faster, let's pack more of them into a single computer! Hence we have 2-core, 4-core, etc. processors. At the time of this writing, as far as I know, top-of-the-line is in the 32-core range.
We will quickly reach the limits here as well. Partly because processors are not the bottleneck for many applications. Partly because we simply can't keep doubling the size of the processor forever due to physical constraints.
And partly because, for some applications, double the performance just doesn't cut it. They need far greater performance advantages, in the 1000000x (million times) range. Doubling the number of cores will never get you there.
If you really want to squeeze every drop of performance out of a silicon chip, you don't buy a generic one. In fact, you don't buy one at all. You build it yourself.
I'm sure you have heard of Bitcoin. In bitcoin mining, computational efficiency is king. If you can solve Bitcoin's equations faster, you get rich. It turns out, a purpose-built processor can solve these equations about 1M times faster. One. Million. Times. Holy crap! =o
Too bad designing and manufacturing processors from scratch is so hard/expensive. If only...
FPGAs give you most of the benefits of special-purpose processors, for a fraction of the cost. They are about 10x slower than a custom-built chip, but that still means an FPGA-based bitcoin miner is roughly 100,000 times faster than a CPU-based one.
That's impressive! Well, assuming you want to do just that. For the rest of us, that's totally irrelevant. The real questions are...
How will it make Excel run faster? How will it render my videos faster? What can I do with this that I couldn't do before?
Today, FPGAs are still special-purpose systems, not used by the mainstream. You have a processor, you plug in an FPGA, you program the FPGA in a special language (or hire super expensive developers to do it for you), you configure its wiring, then you give it a task and it gives you the results much faster than any generic processor could.
In other words, FPGAs are an optimization, not the best practice.
In 20-30 years, every computer could have an FPGA inside it, and they could easily make our computers many orders of magnitude faster. Some applications could run, as said, 100k times faster. Or more.
Generic processors offer many generic, super simple instructions. Each can be executed in one or a few clock cycles. FPGAs can be configured to perform complex, special-purpose instructions in the same amount of time: a few clock cycles.
So, if a program that runs on a generic CPU requires 1 million clock cycles to give you the result, an FPGA equivalent may give you the same result in, say, 10 clock cycles.
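Here is a minimal Haskell sketch of what such a super-instruction buys you; the step counts in the comments are illustrative assumptions, not measurements. The same dot product is written once as a sequential loop (what a CPU does) and once as a parallel multiply stage feeding a log-depth adder tree (roughly what an FPGA configuration would be wired up to do):

    -- On a CPU: one multiply-accumulate per step, so 8 elements take ~8+ sequential steps.
    dotSequential :: [Int] -> [Int] -> Int
    dotSequential xs ys = foldl (+) 0 (zipWith (*) xs ys)

    -- On an FPGA: all multiplies happen at once, then the sums form a tree of
    -- depth log2(n), so 8 elements finish in roughly 1 + 3 "steps" instead of ~15.
    treeSum :: [Int] -> Int
    treeSum []  = 0
    treeSum [x] = x
    treeSum xs  = treeSum (pairUp xs)
      where
        pairUp (a:b:rest) = (a + b) : pairUp rest
        pairUp rest       = rest

    dotParallel :: [Int] -> [Int] -> Int
    dotParallel xs ys = treeSum (zipWith (*) xs ys)

    main :: IO ()
    main = print ( dotSequential [1..8] [1..8]
                 , dotParallel   [1..8] [1..8] )  -- both print 204

Both functions compute the same value; what changes is the shape of the computation, a chain versus a tree, and that shape is exactly what determines how many clock cycles the hardware needs.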
That's impressive, but not today's reality, so what's missing?
The programming languages we have traditionally been using are simply not up to the task. They were designed for sequential computation, and FPGAs run parallel computation. You simply can't take a traditional program, written in, say, C/C++ or Java, and translate it to run on an FPGA. Interestingly, multi-core processors run into a lighter version of the same difficulty, which is why a new breed of programming languages, called functional programming languages, is gaining a foothold. (Sounds familiar, right?) Programs written in them can be executed both on sequential CPUs and on FPGAs. One such example is Haskell.
Programming FPGAs in the languages traditionally available for them has been astronomically more complex than programming a sequential CPU. Thus, talent was simply not available.
As functional programming languages like Haskell gain popularity, the pool of programmers who can think in this style is growing, while the complexity of writing programs that run on FPGAs is falling dramatically.
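To give a feel for why functional code maps so naturally onto hardware, here is a plain-Haskell sketch. (Compilers that turn a subset of Haskell into FPGA configurations do exist, Clash being one example, but the snippet below is ordinary Haskell, not tied to any particular tool.) A pure function has no hidden state and no sequencing, so its body reads directly as a wiring diagram:

    -- A one-bit full adder: three input wires in, two output wires out.
    -- Each equation below is literally the description of a wire.
    fullAdder :: Bool -> Bool -> Bool -> (Bool, Bool)   -- (sum, carry out)
    fullAdder a b cin = (s, cout)
      where
        axb  = a /= b                     -- XOR of the two inputs
        s    = axb /= cin                 -- sum bit
        cout = (a && b) || (cin && axb)   -- carry out

    main :: IO ()
    main = mapM_ print
      [ (a, b, c, fullAdder a b c)
      | a <- [False, True], b <- [False, True], c <- [False, True] ]

The very same function can be run by a regular compiler on a sequential CPU today and, with a hardware-oriented compiler and some restrictions, laid out as gates; that is exactly the property described above.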
It's one thing to run your program 100000x faster, but speed is not always the bottleneck. Development time and cost are often more important, especially in the beginning. But if you started out programming a sequential CPU and the project really takes off, when do you switch? Do you throw out all your people and code and start from scratch? Nope. You don't switch. You go with the flow and stick to generic CPUs.
Every program in a computer uses the memory, but FPGAs were not designed to be a shared resource. An FPGA is a special-purpose device, used for a singular purpose, at least today. But this could change...
With a little modification, the FPGA could be used by many programs on a computer. Just like a program can allocate some memory for its own use, a program could allocate part of the FPGA for its own use. That way, it could load super-instructions onto the FPGA and use it to speed up its execution.
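To make the idea tangible, here is a purely hypothetical sketch in Haskell. No such API exists today; every name below is invented for illustration, and the "FPGA" is just simulated as a pool of free logic cells, the same way malloc hands out chunks of memory:

    import Data.IORef

    -- Made-up handle to a slice of FPGA fabric, analogous to a pointer from malloc.
    data FpgaRegion = FpgaRegion { regionCells :: Int }

    -- Simulate the FPGA as a pool of free logic cells.
    newFpga :: Int -> IO (IORef Int)
    newFpga totalCells = newIORef totalCells

    -- "Allocate" part of the fabric for one program, like allocating memory.
    allocRegion :: IORef Int -> Int -> IO (Maybe FpgaRegion)
    allocRegion pool cells = do
      free <- readIORef pool
      if cells <= free
        then writeIORef pool (free - cells) >> return (Just (FpgaRegion cells))
        else return Nothing

    -- Give the fabric back when the program is done with its super-instruction.
    freeRegion :: IORef Int -> FpgaRegion -> IO ()
    freeRegion pool (FpgaRegion cells) = modifyIORef pool (+ cells)

    main :: IO ()
    main = do
      fpga <- newFpga 10000
      Just r <- allocRegion fpga 2500   -- our program claims a quarter of the chip
      readIORef fpga >>= print          -- 7500 cells left for everyone else
      freeRegion fpga r
      readIORef fpga >>= print          -- back to 10000

The point is not the code but the shape of the contract: claim a slice of the fabric, load your configuration into it, release it when done, exactly the life cycle that memory allocation already has.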
I say FPGAs could be the new memory because they would find themselves in similar market conditions. 1) Writing memory takes time, so memory size matters; writing the FPGA's configuration takes time, so FPGA size matters. 2) Memory response time matters; FPGA response time matters.
Effectively, the FPGA industry could find itself replaying the history of the memory industry: the race to build faster & bigger units.
For such a product, the biggest barrier to entry would be the chicken-and-egg problem: for developers to write programs that utilize FPGAs, a critical mass of customers must have FPGAs first; for customers to buy FPGAs, their pet programs must utilize them first.
Fortunately, a single program is enough to break the cycle. Software vendors that write computation-heavy programs could offer their customers an upgrade: get an FPGA to make our program run in 5 seconds instead of 5 hours. Some special-case customers will buy an FPGA. At that point, they will start asking their other software vendors to offer FPGA-accelerated versions of their programs too. And the snowball effect begins.
This all sounds good, except for one problem. The FPGA still can't be used as a shared resource. And integrating all the pieces together is a super complex problem. So, all the other problems being effectively sorted out, which vendor will be the one to provide a solution?
Intel comes to mind as a natural candidate. Yet Intel is a player in processors, not FPGAs, so I researched it a bit, and it seems the biggest FPGA players are Xilinx and Altera. As far as I can tell, Xilinx has about 20% bigger market share, roughly stable between 2010 and 2014. And now the interesting part: Intel acquired Altera in 2015.
Will Intel make a transition again? Originally, they were a memory company. Then, they became a processor company. Now, could they become the FPGA-processor company? (Now it sounds funny to say that FPGAs are the new memory.)
Think of our brains. We have one thought at a time, so our thinking runs on a "sequential CPU". Yet at the same time, we have lots of systems running in the background. Because we don't think about them consciously, they are like hardware support for us. We also know that those "background systems" change over time. Think of muscle memory. When you learn to drive a car, with practice, you are effectively programming a subsystem of your brain to handle driving the car for you. Then you give it higher-level instructions by thinking about where you want to drive, but it takes much less effort than the first time you tried to control the car.
FPGAs could provide a platform for similar subsystems for AI controlled machines. Nature was way more successful than us at creating intelligence. Maybe we should borrow some of the big ideas.
So, to conclude: if the above materializes, C/C++ won't have a comeback, because extending our computers with FPGAs will make the very requirements C/C++ were designed for obsolete. There is far more leverage in utilizing FPGAs to drive performance. Even better, it can become a best practice, as opposed to using C/C++ for optimizations only.