Training Hours
Llama 2 (7b): ~180,000 GPU hours
~10b model: ~100,000 GPU hours
Llama 2 (70b): ~1,700,000 GPU hours
~100b model: ~1,000,000 GPU hours
Renting
Nvidia A100: $1-2 per GPU per hour
10b model: $150,000
100b model: $1,500,000
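
The rental figures above are GPU hours multiplied by an hourly rate. A minimal sketch of that arithmetic, assuming the quoted totals use the midpoint of the $1-2 range ($1.50 per GPU hour). The function name `rental_cost` and the `GPU_HOURS` table are illustrative, not from the notes:

```python
def rental_cost(gpu_hours: float, rate_per_gpu_hour: float = 1.50) -> float:
    """Estimated cost in dollars of renting GPUs for one training run.

    The default rate is an assumption: the midpoint of the $1-2 range.
    """
    return gpu_hours * rate_per_gpu_hour


# Rough GPU-hour estimates taken from the notes.
GPU_HOURS = {"10b": 100_000, "100b": 1_000_000}

for name, hours in GPU_HOURS.items():
    low = rental_cost(hours, 1.0)
    mid = rental_cost(hours)
    high = rental_cost(hours, 2.0)
    print(f"{name} model: ${low:,.0f} to ${high:,.0f} (midpoint ${mid:,.0f})")
```

Changing the rate shows how sensitive the total is to pricing: at $2 per GPU hour the 100b estimate doubles relative to $1.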
Buying
Nvidia A100: ~$10,000
-> GPU Cluster: ~$10,000 x 1,000 = $10,000,000
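
The purchase estimate can be sketched the same way, together with a rough wall-clock time for a run on such a cluster. This assumes perfect linear scaling across GPUs and no downtime, which real clusters will not achieve, and it counts GPU purchase price only, as the notes do. The function names are illustrative:

```python
GPU_UNIT_PRICE = 10_000  # dollars per A100, from the notes
CLUSTER_SIZE = 1_000     # number of GPUs in the cluster, from the notes


def cluster_cost(n_gpus: int = CLUSTER_SIZE, unit_price: int = GPU_UNIT_PRICE) -> int:
    """Purchase price in dollars of the GPUs alone."""
    return n_gpus * unit_price


def wall_clock_days(gpu_hours: float, n_gpus: int = CLUSTER_SIZE) -> float:
    """Idealized training time in days.

    Assumption: GPU hours divide evenly across all GPUs (perfect scaling).
    """
    return gpu_hours / n_gpus / 24


print(f"Cluster cost: ${cluster_cost():,}")
print(f"100b model on {CLUSTER_SIZE:,} GPUs: {wall_clock_days(1_000_000):.1f} days")
```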
Training Energy Cost (100b model): ~1,000 megawatt hours
Energy Price: ~$100 per megawatt hour
Marginal Energy Cost (100b model):
1,000 x $100 = $100,000
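
The marginal energy cost is energy used multiplied by price per unit. A small sketch using the two estimates above, with an illustrative function name `energy_cost`:

```python
def energy_cost(energy_mwh: float, price_per_mwh: float) -> float:
    """Energy cost in dollars for a training run."""
    return energy_mwh * price_per_mwh


# Estimates from the notes for a ~100b-parameter model.
TRAINING_ENERGY_MWH = 1_000
ENERGY_PRICE_PER_MWH = 100

cost = energy_cost(TRAINING_ENERGY_MWH, ENERGY_PRICE_PER_MWH)
print(f"Marginal energy cost (100b model): ${cost:,.0f}")
```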
GPT-3 (175b): 0.5T tokens
Llama 2 (70b): 2T tokens
Falcon (180b): 3.5T tokens
1 trillion words = 1,000,000 novels = 1,000,000,000 news articles