mpasila's Collections
Pre-training dataset prep

updated Oct 26, 2024

Some datasets I should probably use.

  • JeanKaddour/minipile

    Updated Jun 20, 2023 • 1.01M rows • 1.75k downloads • 135 likes

  • wikimedia/wikipedia

    Updated Jan 9, 2024 • 61.6M rows • 70.3k downloads • 1.05k likes

  • neuralwork/arxiver

    Updated Nov 1, 2024 • 63.4k rows • 711 downloads • 365 likes

  • ohsuz/tiny-textbooks-edu

    Updated Jun 11, 2024 • 3.31M rows • 158 downloads • 2 likes

  • ohsuz/tiny-code-textbooks-edu

    Updated Jun 11, 2024 • 1.84M rows • 75 downloads • 2 likes