[–]ryuukyuuimo 11 points12 points  (7 children)

Would also like to know the power requirements.

[–]RoboTeddy4k[S] 31 points32 points  (6 children)

Hah, yeah. Lee Sedol's brain is running on ~20 watts...

[–]rawrnnn 13 points14 points  (4 children)

For 32 years

[–]RoboTeddy4k[S] 6 points7 points  (2 children)

Just 20 watts while playing, though! If we want to compare lifetime use, we gotta include the energy spent training AlphaGo (which is likely immense)

[–]SexyIsMyMiddleName 3 points4 points  (0 children)

I want both to run on 10 watts and see who wins!

[–]j_heg 1 point2 points  (0 children)

The results of the training are reproducible by pure data copying, which uses only negligible energy, though. And if you create a population of divergent bots from the same initial training, you amortize the training energy across the whole population.
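
(A back-of-the-envelope illustration of the amortization; both energy figures below are made-up placeholders, not real numbers:)

    # Amortizing a one-off training cost across N copied bots.
    # E_TRAIN and E_PLAY are hypothetical figures, purely illustrative.
    E_TRAIN = 1_000_000                  # kWh spent training once
    E_PLAY = 300                         # kWh per bot to actually play
    for n in (1, 100, 10_000):
        print(n, E_TRAIN / n + E_PLAY)   # per-bot energy falls with N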

[–]disgr4ce 1 point2 points  (0 children)

^ All of my upvotes to you. People need to understand this.

[–]Moo3 2 points3 points  (0 children)

When Lee left the room during match 2, someone on a Chinese forum said "he's not out smoking a cigarette. He's looking for the power mains!" lol

[–]sanxiyn 20 points21 points  (15 children)

DeepMind stated that it's exactly the same hardware configuration.

[–]s-mores1k 28 points29 points  (13 children)

The Economist has an update on the 2nd match, where they state 1,920 CPUs and 280 GPUs.

[–]FrellPumpkin 6 points7 points  (6 children)

You're sure they're running it on that rack? The thing with neural networks is that it's really demanding to build and train them; it takes tons of resources and computing time. But once that's done, running the finished network is comparatively lightweight. For example, all these semi-self-driving cars like Tesla's don't have dozens of GPUs in their trunks. That's the charm of neural networks.

[–]ScuttleCrab14k 13 points14 points  (1 child)

Don't forget that AlphaGo does Monte Carlo tree search, too. And because it uses a second neural network to evaluate positions generated by MCTS, it probably needs to run that value network many times in parallel.
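
(A minimal sketch of that combination: MCTS where leaf positions are scored by a value network instead of random rollouts. Every name here is a toy stand-in, not DeepMind's code; the real AlphaGo search is asynchronous and batches the network evaluations across many GPUs, which is where the hardware goes.)

    import math, random

    def legal_moves(pos):            # stub: pretend 3 moves everywhere
        return range(3)

    def play(pos, mv):               # stub: a "position" is the move history
        return pos + (mv,)

    def value_network(pos):          # stub for the trained value net
        return random.random()

    class Node:
        def __init__(self, pos):
            self.pos, self.children = pos, {}
            self.visits, self.value_sum = 0, 0.0

    def ucb(parent, child, c=1.4):   # classic UCT selection rule
        if child.visits == 0:
            return float("inf")
        return (child.value_sum / child.visits
                + c * math.sqrt(math.log(parent.visits) / child.visits))

    def simulate(root):
        path, node = [root], root
        while node.children:                     # 1. select a path down the tree
            node = max(node.children.values(),
                       key=lambda ch: ucb(node, ch))
            path.append(node)
        for mv in legal_moves(node.pos):         # 2. expand the leaf
            node.children[mv] = Node(play(node.pos, mv))
        v = value_network(node.pos)              # 3. score leaf with the value net
        for n in path:                           # 4. back up the result
            n.visits += 1                        #    (side-to-move sign flips
            n.value_sum += v                     #    omitted for brevity)

    root = Node(())
    for _ in range(1000):
        simulate(root)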

[–]FrellPumpkin 1 point2 points  (0 children)

good point!

[–]s-mores1k 2 points3 points  (3 children)

Yes, I'm sure. Somewhere it was mentioned that just running AlphaGo at this level costs upwards of $1000/hour.

Sure, any single evaluation is cheap, but they're running it as the evaluation part of an MCTS engine, so they're burning a lot of resources. That said, hats off to the engineer who designed the load distribution algorithms, because bumping up the capacity by ~50% since the Fan Hui match is impressive.

[–]notlogic 13 points14 points  (1 child)

Fun note... It's five times more expensive per hour to run a Lee Sedol, as he gets $20,000 per loss.

[–]bittered 2 points3 points  (0 children)

No, he got a flat $150,000 appearance fee and $20,000 per win (nothing for a loss). That's $170,000 total. AlphaGo got the overall winnings of $1,000,000 (which was donated to charity).

[–]pixl_graphix 4 points5 points  (0 children)

That said, hats off to the engineer who designed the load distribution algorithms

This is something that Google excels at. 15 years ago they were hiring every data scientist they could find and building one of the largest distributed computing networks on earth (their search platform). Google is also one of the leaders in power optimization and control in the datacenter.

As for the cost per hour, at this point Google cares very little about it. YouTube, for example, was costing Google millions of dollars per day that until recently it was not making back in ad revenue. If this entire operation were costing Google a million a day, they would see it as a good investment, since if they create a workable AGI, it will earn them billions if not more.

[–]darkmighty18k 1 point2 points  (0 children)

Ah, a number finally! Thank you, I was very curious. This makes the learning very impressive.

[–]KillerDucky 0 points1 point  (2 children)

Maybe they edited the article, because I don't see that quote there now.

[–]s-mores1k 0 points1 point  (1 child)

Really? It's there for me.

The version playing against Mr Lee uses 1,920 standard processor chips and 280 special ones developed originally to produce graphics for video games—a particularly demanding task.

[–]KillerDucky 0 points1 point  (0 children)

Ah oops, I searched for 1920 (no comma!), CPU, and hardware... :)

[–]funkiestj -4 points-3 points  (1 child)

From the Economist article:

“Lee is a genius who is constantly creating new moves; what machine can replicate that?” asked one. At a pre-match press conference Mr Lee said he was confident he would win 5-0, or perhaps 4-1.

Ha ha ha. Don't give AlphaGo access to a 3-d printer -- it might start building terminators.

[–]ultimatt42 6 points7 points  (0 children)

Actually it prefers Gobots.

[–]xcombelle2k 7 points8 points  (0 children)

I think they said "similar", not "exactly the same".

[–]pookpooi 13 points14 points  (17 children)

It's 1,920 CPUs and 280 GPUs per the Economist article

[–]kenyal30k 3 points4 points  (16 children)

I'm sorry for this newbie question, but why are GPUs important here? Is it for graphics?

[–]MrPapillon 13 points14 points  (0 children)

GPUs aren't graphics-only anymore. They are specialized hardware that can compute things massively in parallel. Imagine thousands of things running in parallel instead of just a few, as on CPUs.

[–]siblbombs 4 points5 points  (7 children)

Almost all of the things we do in deep learning boil down to mathematical operations: matrix multiplications, etc.

AlphaGo relies on convolutional networks to 'see' the board. This architecture has been pretty popular for the past several years and has been heavily optimized to work on graphics cards. Nvidia has even started developing the code to do these operations themselves, and most people just use Nvidia's implementation these days (idk if the article says it, but there's pretty much a 100% chance Google is using Nvidia GPUs).
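
(A toy illustration of the point, not DeepMind's code: one 3x3 filter slid over a 19x19 board plane. Every output cell is independent of every other, which is exactly the kind of work cuDNN-style GPU kernels parallelize.)

    import numpy as np

    # One 3x3 convolution filter applied across a 19x19 Go board plane.
    # Every out[i, j] is independent, so a GPU can compute thousands of
    # them simultaneously; AlphaGo's real nets stack many such layers.
    board = np.random.rand(19, 19)       # one input feature plane
    kernel = np.random.rand(3, 3)        # one learned 3x3 filter
    out = np.zeros((17, 17))             # "valid" convolution output
    for i in range(17):
        for j in range(17):
            out[i, j] = np.sum(board[i:i+3, j:j+3] * kernel)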

[–]ergzay 2 points3 points  (6 children)

This is a much better answer than /u/MrPapillon's and /u/KapteeniJ's.

[–]MrPapillon 2 points3 points  (5 children)

I beg to differ. CUDA is vastly superior to OpenCL in terms of GPU general computing, so NVidia GPUs are often the default choice for this stuff, as CUDA is still NVidia-specific. Also, if you manage to heavily parallelize some algorithm, GPUs offer cheap, efficient power through their massive multithreading ability. I don't know the specifics of convolutional networks and their current implementations, but GPUs can quickly prove vastly superior to CPUs if you are not affected by their constraints. For example, even path tracing, which somehow goes against GPUs' philosophy of avoiding branching, still outperforms CPUs by far because of the big advantage of the efficient, heavily multithreaded hardware. So with or without NVidia's implementation of a convolutional network, GPUs are often a winning choice for scientific or engineering computing if possible.

[–]ergzay 3 points4 points  (1 child)

Neither your post nor his post mentions CUDA or OpenCL so I'm not sure why you're talking about those.

[–]MrPapillon 1 point2 points  (0 children)

I was answering:

  • why GPUs.
  • why NVidia.

The initial question was "why GPUs, aren't they for graphical stuff only?". My answer is a general answer to that question; siblbombs's answer is interesting, but too specific. It provides interesting info on AlphaGo, but not on why GPUs are used nowadays.

[–]j_heg 2 points3 points  (2 children)

CUDA is vastly superior to OpenCL in terms of GPU general computing

At least if you're an nVidia salesman for sure.

[–]MrPapillon 0 points1 point  (1 child)

Before OpenCL 2.0, that was a strong case: poor flexibility for OpenCL, bad performance on both ATI and NVidia cards, a poor debugging environment, and a lot of other details. With OpenCL 2.0 and probably newer improvements, OpenCL might have removed some of its weaknesses, but that would be a new situation; CUDA has had many large advantages for years while OpenCL has had only a few.

[–]j_heg 1 point2 points  (0 children)

Even better, APIs such as HSA or Vulkan remove the "use our weird language" limitation and should allow for writing compiler backends. That's even without considering the proprietary nature of CUDA, and nVidia hardware's (at least until recently) poor performance on integers and some other things.

[–]KapteeniJ3d 4 points5 points  (0 children)

GPUs are capable of a hundred times as many calculations per second as CPUs. The problem is that the computation they can do is very restricted, usually only applicable to things like rendering graphics. However, plenty of programmers have found ways to utilize that ridiculously high computational power to boost the programs they write.

[–]Creative-Name -1 points0 points  (2 children)

GPUs can also be used as standard processors, and can be quite a lot faster than a standard CPU. The issue is they're harder to program for general computing, as they are intended for graphics.

[–]MrMoenty 17 points18 points  (0 children)

GPUs are good for highly parallel computations, as they feature a high number of processing cores. For instance, nVidia's GTX Titan contains no fewer than 2,688 of them!

The downside is that, unlike general purpose CPUs, a GPU's cores can't all be programmed individually and have to be doing roughly the same thing in order to be effective. However, they work very well for highly parallel computations with a lot of structure to them, like computer graphics (duh), matrix multiplications or evaluating neural networks.
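
(A small sketch of why that structure suits them: every cell of a matrix product is an independent dot product, so each of those thousands of cores can take one, or a tile of them. Shapes here are arbitrary.)

    import numpy as np

    # C = A @ B decomposes into rows-times-columns: no output cell
    # depends on any other, which is why thousands of GPU cores doing
    # "roughly the same thing" chew through it so quickly.
    A = np.random.rand(256, 64)
    B = np.random.rand(64, 128)
    C = np.empty((256, 128))
    for i in range(256):
        for j in range(128):
            C[i, j] = A[i, :] @ B[:, j]      # one independent dot product
    assert np.allclose(C, A @ B)             # matches the library matmul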

[–]LichenSymbiont 1 point2 points  (0 children)

Furthermore, they are like extensions: mini motherboards (good that I caught the autocorrect here, as it turned "motherboards" into "another birds") with just a processor to crunch calculations.

[–]gabjuasfijwee 12 points13 points  (0 children)

They have a gymnasium full of teenagers with abacuses

[–]mungedexpress 8 points9 points  (4 children)

To build the neural network weights, it took an enormous amount of computing power for AlphaGo to play against itself millions of times. This was after the initial training. This is what truly set DeepMind apart from others: it has the resources to pursue otherwise impossible approaches to developing their AI(s).

They likely had to go through several iterations. My guess is it cost several million dollars of compute resources (they likely got a huge discount). For anyone outside of Google, training an AI with self-reinforcement like that can easily run over millions of dollars.

Then based on their calculations of the boundaries for a 2 hour game, they set it at ~1200 CPUs and ~170 GPUs.

Of course, that is all hearsay until it can be verified by others.

edit: As an estimate of the cost of the most expensive part of developing the core neural net: if they had to run one million training games in one week using the same setup as in the matches, it would require:

28,569,600 CPUs

4,047,360 GPUs

That's about 53.5 MW of power for the CPUs alone, assuming 16-core processors at 30 W per processor, for one million games.

They ran several million games. I wouldn't even be surprised if their power requirement was close to requiring a power plant.
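
(For what it's worth, the 53.5 MW figure is at least internally consistent if "CPUs" is read as cores:)

    # Reproducing the estimate above, reading "CPUs" as cores:
    cores = 28_569_600
    chips = cores / 16               # 16-core processors
    megawatts = chips * 30 / 1e6     # 30 W per processor
    print(megawatts)                 # -> 53.568, i.e. the ~53.5 MW claimed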

[–]L0K3N 7 points8 points  (1 child)

LOL, Tianhe-2 doesn't even have 3 million cores. You are overestimating by a couple orders of magnitude at least.

[–]j_heg 0 points1 point  (0 children)

Not sure how Tianhe-2 entered the picture, but Google absolutely has to have millions of CPU sockets in operation by now.

[–]icydocking 7 points8 points  (0 children)

28,569,600 CPUs

4,047,360 GPUs

That's about 53.5 MW of power for the CPUs alone, assuming 16-core processors at 30 W per processor, for one million games. They ran several million games. I wouldn't even be surprised if their power requirement was close to requiring a power plant.

What? How do you arrive at these numbers? Are you saying that you think it takes 2 hours for AlphaGo to learn from one game using the given CPUs and GPUs?

[–]hivemind_downvote 4 points5 points  (0 children)

In one of the articles linked from the megathread, it said they used cheaper variants of their networks when playing the training games. It's also likely they didn't give themselves as much time.

That said, it's still likely a staggering amount of resources.

[–]cloudone 1 point2 points  (0 children)

Yes. But a lot of resources were used for training since the October match.

[–]tolesto 2 points3 points  (8 children)

I think that's what they used to train it, not what they used to play against Fan Hui; I think that used about 10 times less hardware.

[–]KaitoIris 7 points8 points  (7 children)

The paper says they used 1,202 CPUs and 176 GPUs to evaluate the already-computed neural networks (in particular, they state that evaluating these networks is much more expensive than traditional searches like plain MCTS). They also state that they used the 1,202-CPU version to play against Fan Hui.

[–]heptara 4 points5 points  (4 children)

1202 x 100 W + 176 x 210 W ≈ 157 kW. Is that about USD $25 an hour for electricity?

Go players wishing to improve their game would certainly consider this hourly rate (after suitable markup) affordable to gain access to the system.
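
(Spelled out; the 100 W and 210 W per-unit draws are guesses, not published figures:)

    # heptara's electricity estimate made explicit.
    watts = 1202 * 100 + 176 * 210   # -> 157,160 W, about 157 kW
    kwh_per_hour = watts / 1000
    print(kwh_per_hour * 0.16)       # ~$25/hour at ~$0.16 per kWh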

[–]KaitoIris 7 points8 points  (3 children)

You also need the hardware. I had a quick look at Amazon cloud prices (assuming their servers are actually suited to running AlphaGo; you might need other hardware) and it seems like an hour of running AlphaGo would cost about $80. While it is affordable, I guess the average player is better off just taking lessons from a good teacher for that price. Might be interesting for high-level pros, though.

[–]siblbombs 2 points3 points  (2 children)

The Amazon GPU instances are really not that great at this point. I think you could get a scaled-down version of AlphaGo running on about $6k worth of hardware (as many Titan X graphics cards as you can fit in a beefy desktop) and it would be competitive at the amateur level.

[–]KaitoIris 2 points3 points  (1 child)

But then you could just buy CrazyStone for $80, run it on a decent PC, and you already have a ~6d bot. No need for AlphaGo if you just want to play at a high amateur level.

[–]siblbombs 1 point2 points  (0 children)

True, but I'm not very qualified to try and assess the rank of AlphaGo since I don't play :)

DM could ditch the search step entirely and just go with the policy network; that approach beat Pachi 85% of the time. They could also drastically reduce the amount of search the program does, in which case you could run it on a current desktop with an Nvidia 970, but idk how strong it would be.
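
(A sketch of that search-free mode: ask the policy network for a distribution over the 361 points and just play its favourite legal one. `policy_network` below is a random stub standing in for the trained model.)

    import numpy as np

    def policy_network(position):        # stub for the trained policy net
        logits = np.random.rand(19 * 19)
        return logits / logits.sum()     # probability per board point

    def choose_move(position, legal):    # no tree search at all
        probs = policy_network(position) * legal
        return int(np.argmax(probs))     # flat index into the 19x19 grid

    legal = np.ones(19 * 19)             # toy: every point is playable
    print(divmod(choose_move(None, legal), 19))   # (row, col) to play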

[–]tolesto 2 points3 points  (1 child)

I guess there must have been some confusion when I was reading people's thoughts on it, but it appears that each individual machine has 48 CPUs and 8 GPUs, which would make it around 22 machines.

[–]KaitoIris 4 points5 points  (0 children)

Maybe the confusion stems from the fact that there are two versions of AlphaGo: an asynchronous one, which uses 48 CPUs and 8 GPUs (probably in a single computer), and a distributed one, which uses 1,202 CPUs and 176 GPUs. Google used the former to play against bots, while the latter was used against Fan Hui.

[–]qunow -3 points-2 points  (0 children)

I suppose Google isn't using the distributed version in this match?