I’m back from the 44th International Symposium on Computer Architecture (ISCA), and this is a perfect time for me to summarise my thoughts on the conference.
The conference was in Toronto, which was refreshing: I got to see correct spelling and sensible units for a change. Beyond that, the conference had a lot of interesting developments, and some that were not quite as interesting.
First, let me address the 15-month elephant in the room: Google’s Tensor Processing Unit (TPU) paper. I wasn’t impressed with the paper (although I am impressed by the engineering), but a lot of people at the conference seemed to be; indeed, only one other person I spoke with shared my views on the matter. My criticism of the TPU paper is that it gives very little information. An application-specific integrated circuit (ASIC) will obviously consume far less power and energy and deliver much higher performance than a general-purpose processor. The really interesting parts of the TPU would have been the TensorFlow-to-control-instructions compiler and the driver. Unfortunately, those details remain elusive. In fact, the whole paper describes (or fails to describe) technology that is over four years old and has already been replaced. In my opinion, there is more information in Google’s Project Zero blog than there was in the TPU paper and the associated talk.
Which brings me to the next bit. ISCA had a ‘Trends in Machine Learning’ workshop, which I found as interesting as, if not more interesting than, the main conference. There were some really cool demos: real-time neural networks running on embedded devices such as an iPhone and a Raspberry Pi; ‘Clinc’, which can process natural speech and respond to queries about your finances; and Baidu’s DeepSpeech, amongst others. The trend towards machine learning was apparent in the main program too, with multiple papers on accelerating neural networks.
Just like at any other conference, there were also some presentations that had me completely zoned out. Some felt like a rehash of old ideas, and others left me scratching my head.
The overarching theme of the keynotes and the panel discussion, however, was the inevitable end of Moore’s law. Mark Bohr from Intel claimed that the law was merely tired and shagged out after a long squawk, with plots showing that the number of transistors is indeed doubling as Moore’s law predicts. Partha Ranganathan from Google, on the other hand, pointed out that the law had joined the choir invisible. Partha argued that this makes it a fun time to be an architect: a time to talk to those annoying people working on software and to co-design hardware and software, with the aim of unlocking more potential.
In fact, if there is one message I would take away from the conference, it is that we computer architects have to fundamentally change the way we look at our job. For years, computer architects were perfectly happy using the extra transistors that the devices folks gave us to make faster computers, and the evil people working in software would take away this performance through ever more bloated software. Now the pipeline has dried up: the devices people can no longer give us faster and smaller transistors, and they definitely cannot give us more power-efficient transistors, because Dennard scaling is almost certainly dead. As a result, we architects have to find ways to use the transistors we have more efficiently. This means talking to the software gremlins, understanding their evil algorithms, and implementing them in beautiful silicon. The future is almost certainly in going green by reducing our power requirements, and in grudgingly enabling the software people to unlock greater functionality, not by relying on faster computers but on bespoke hardware that can run their algorithms efficiently.
How such bespoke hardware should be deployed remains a challenge. We almost certainly cannot sell mobile-phone chips with the area of a football field, packed with billions of custom accelerators that are almost always turned off except for the few running a particular app. My opinion is that the best way to deploy these accelerators at the moment is in datacentres, providing them to users as a service. Google is already making great headway by allowing people to rent cloud machines with TPUs (I still dislike the TPU paper 😄) and to use TensorFlow to accelerate their workloads. I could envision cloud providers allowing people to time-multiplex multiple accelerators in some sort of mutual-fund or Massdrop-like cloud service.
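To make that concrete, here is a minimal sketch of what renting an accelerator looks like from the software side, using TensorFlow's TPU APIs. This is purely illustrative and not something presented at the conference: the TPU name is made up, and the `TPUStrategy` interface shown here is from recent TensorFlow releases that postdate this post.

```python
import tensorflow as tf

# Hypothetical example: attach to a rented Cloud TPU and build a small model on it.
# The TPU name ("my-rented-tpu") is illustrative; a real deployment would pass the
# name or address of the TPU the cloud provider allocated to you.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-rented-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy places the model's variables and computation on the accelerator;
# the model code below is ordinary TensorFlow, unchanged from what would run on a CPU.
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(...) would then run the training steps on the TPU.
```

The appeal of this model is exactly the point above: the user never sees the accelerator hardware, only a framework-level API, while the cloud provider decides how the silicon underneath is shared.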
Or maybe the future lies in taking a step back, rethinking our obsession with von Neumann machines built around variants of the five-stage pipeline, and redrawing our computers from scratch. I think it’s an exciting time to be an architect, and also a scary one. As a PhD candidate, I have to try really hard to look at the exciting bits and not the scary ones. 😄