Update README.md
README.md (changed):
```diff
@@ -10,7 +10,8 @@ datasets:
 
 ---
 
-- [Version …
+- [Version v1beta2](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2): June 6th, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2-half)*)
+- [Version v1beta1](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta1-half): April 28th, 2022 (*half-precision weights only*)
 
 # BERTIN GPT-J-6B
 
```
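Both entries added above are Git tags on the model repo, so either snapshot can be pinned at load time. A minimal sketch, assuming only the tag names from the list (standard `transformers` usage, not part of the card itself):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bertin-project/bertin-gpt-j-6B"

# `revision` pins the download to a specific tag from the version list;
# "v1beta2-half" or "v1beta1-half" would fetch the half-precision weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="v1beta2")
model = AutoModelForCausalLM.from_pretrained(model_id, revision="v1beta2")
```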
```diff
@@ -52,7 +53,7 @@ BERTIN-GPT-J-6B was finetuned on [mC4-es-sampled (gaussian)](https://huggingface
 
 ## Training procedure
 
-This model was finetuned for …
+This model was finetuned for 40 billion tokens (40,384,790,528) over 616,000 steps on a single TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
 
 ## Intended Use and Limitations
 
```
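The loss named in the new paragraph is the standard next-token cross-entropy for an autoregressive model. A minimal sketch of that objective in JAX (illustrative only; the actual finetuning ran as a sharded job on the TPU v3-8, and `next_token_loss` is a hypothetical helper, not code from the project):

```python
import jax
import jax.numpy as jnp

def next_token_loss(logits, tokens):
    """Mean cross-entropy of predicting the token at t+1 from position t.

    logits: [seq_len, vocab_size] model outputs for one sequence.
    tokens: [seq_len] integer ids of that same sequence.
    """
    # Score each position t against the token that actually follows it.
    log_probs = jax.nn.log_softmax(logits[:-1], axis=-1)  # [seq_len-1, vocab]
    targets = tokens[1:]                                  # [seq_len-1]
    # Log-probability the model assigned to each true next token.
    picked = jnp.take_along_axis(log_probs, targets[:, None], axis=-1)[:, 0]
    return -picked.mean()
```

Minimizing this quantity is exactly the "maximize the likelihood of predicting the next token correctly" phrasing in the added sentence.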