Finetune Nomic-ai-embedding using SentenceTransformersFinetuneEngine
Hi
I'm trying to finetune Nomic-ai-embedding using SentenceTransformersFinetuneEngine and am running into an issue:
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,  # Dataset to be trained on
    model_id="nomic-ai/nomic-embed-text-v1",  # HuggingFace reference to base embeddings model
    model_output_path="llama_model_v1",  # Output directory for fine-tuned embeddings model
    val_dataset=test_dataset,  # Dataset to validate on
    epochs=2,  # Number of epochs to train for
)
Error:
I tried these steps, but they didn't work:
You can first download the model to a local directory. Then, you can download these two files and place them in that same directory:
https://huggingface.co/nomic-ai/nomic-embed-text-v1/blob/main/modeling_hf_nomic_bert.py
https://huggingface.co/nomic-ai/nomic-embed-text-v1/blob/main/configuration_hf_nomic_bert.py
Then, you must update your local config.json to no longer say:
"auto_map": {
    "AutoConfig": "nomic-ai/nomic-embed-text-v1--configuration_hf_nomic_bert.NomicBertConfig",
    "AutoModel": "nomic-ai/nomic-embed-text-v1--modeling_hf_nomic_bert.NomicBertModel",
    "AutoModelForMaskedLM": "nomic-ai/nomic-bert-2048--modeling_hf_nomic_bert.NomicBertForPreTraining"
},
but instead to say:
"auto_map": {
    "AutoConfig": "configuration_hf_nomic_bert.NomicBertConfig",
    "AutoModel": "modeling_hf_nomic_bert.NomicBertModel"
},
Now these files are local, and we don't need to download them from Hugging Face. As a result, you should now be able to initialize the SentenceTransformersFinetuneEngine with the path to your local directory. It should then no longer complain about the lack of trust_remote_code=True.
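For anyone who would rather script the config.json change than edit it by hand, here is a minimal sketch of the steps above. The helper names (localize_auto_map, patch_local_config, download_model) are hypothetical and not part of any library. It assumes the huggingface_hub package is installed and that snapshot_download also fetches the two .py files from the repository; if it does not, download them manually from the links above.

```python
import json
from pathlib import Path

# auto_map keys to keep, matching the two entries in the steps above.
KEPT_AUTO_CLASSES = ("AutoConfig", "AutoModel")


def localize_auto_map(config: dict) -> dict:
    """Return a copy of `config` whose auto_map points at local modules.

    Remote entries look like "<repo_id>--<module>.<ClassName>". Everything up
    to and including "--" is dropped, so the class is resolved from the .py
    files sitting next to config.json. Other auto_map keys are removed.
    """
    auto_map = config.get("auto_map", {})
    updated = dict(config)
    updated["auto_map"] = {
        name: auto_map[name].split("--", 1)[-1]
        for name in KEPT_AUTO_CLASSES
        if name in auto_map
    }
    return updated


def patch_local_config(model_dir: str) -> None:
    """Rewrite config.json inside `model_dir` in place."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config_path.write_text(
        json.dumps(localize_auto_map(config), indent=2), encoding="utf-8"
    )


def download_model(local_dir: str, repo_id: str = "nomic-ai/nomic-embed-text-v1") -> str:
    """Download the model repository into `local_dir` (needs network access)."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A possible usage, with a made-up directory name: call patch_local_config(download_model("nomic-embed-text-v1-local")), then pass "nomic-embed-text-v1-local" as model_id to SentenceTransformersFinetuneEngine.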
