Finetune Nomic-ai-embedding using SentenceTransformersFinetuneEngine

#19

by Miheer29 - opened May 8, 2024


I'm trying to fine-tune Nomic-ai-embedding using SentenceTransformersFinetuneEngine and am running into an issue:

from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,  # Dataset to be trained on
    model_id="nomic-ai/nomic-embed-text-v1",  # HuggingFace reference to base embeddings model
    model_output_path="llama_model_v1",  # Output directory for fine-tuned embeddings model
    val_dataset=test_dataset,  # Dataset to validate on
    epochs=2,  # Number of epochs to train for
)

Error:

I tried these steps, but they didn't work:

You can first download the model to a local directory. Then, you can download these two files and place them in that same directory:

https://huggingface.co/nomic-ai/nomic-embed-text-v1/blob/main/modeling_hf_nomic_bert.py
https://huggingface.co/nomic-ai/nomic-embed-text-v1/blob/main/configuration_hf_nomic_bert.py
Then, you must update your local config.json to no longer say:

"auto_map": {
    "AutoConfig": "nomic-ai/nomic-embed-text-v1--configuration_hf_nomic_bert.NomicBertConfig",
    "AutoModel": "nomic-ai/nomic-embed-text-v1--modeling_hf_nomic_bert.NomicBertModel",
    "AutoModelForMaskedLM": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertForPreTraining"
},
but instead to say:

"auto_map": {
    "AutoConfig": "configuration_hf_nomic_bert.NomicBertConfig",
    "AutoModel": "modeling_hf_nomic_bert.NomicBertModel"
},
Now these files are local, and we don't need to download them from Hugging Face. As a result, you should now be able to initialize the SentenceTransformersFinetuneEngine with the path to your local directory. It should then no longer complain about the lack of trust_remote_code=True.
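For anyone who would rather script the config edit than do it by hand, here is a minimal sketch of the steps above. The helper names (`localize_auto_map`, `rewrite_local_config`, `download_model`) are hypothetical and not part of any library; the sketch assumes the `auto_map` entries use the `<repo_id>--<module>.<Class>` form shown above, and it keeps only the `AutoConfig` and `AutoModel` entries, as the suggested config does. The download helper assumes `huggingface_hub` is installed and relies on its `snapshot_download` function.

```python
import json
from pathlib import Path

# Keys kept in the rewritten auto_map, mirroring the suggested config above.
KEPT_KEYS = ("AutoConfig", "AutoModel")


def localize_auto_map(auto_map: dict) -> dict:
    """Strip the "<repo_id>--" prefix so each kept entry names a local module."""
    return {
        key: auto_map[key].split("--")[-1]
        for key in KEPT_KEYS
        if key in auto_map
    }


def rewrite_local_config(model_dir: str) -> dict:
    """Rewrite auto_map in <model_dir>/config.json in place; return the new map."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["auto_map"] = localize_auto_map(config.get("auto_map", {}))
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config["auto_map"]


def download_model(model_dir: str, repo_id: str = "nomic-ai/nomic-embed-text-v1") -> str:
    """Download the model repo, including its .py files, into model_dir."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=model_dir)
```

A possible usage, with `nomic-embed-local` as a made-up directory name: call `download_model("nomic-embed-local")`, then `rewrite_local_config("nomic-embed-local")`, and pass `model_id="nomic-embed-local"` to SentenceTransformersFinetuneEngine. Whether this avoids the trust_remote_code complaint in a given setup is untested here.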
