
Ocean-Buoy-Log-Captioner

Overview

Ocean-Buoy-Log-Captioner is a vision-encoder-decoder-style captioning model designed for multimodal log analysis. It is specifically trained to generate descriptive text summaries (captions) of environmental events by jointly processing structured sensor readings (input features) and a short event code/description.

This model adapts the image-captioning architecture: the "visual" input is a feature vector derived from time-series sensor readings (temperature, salinity, pH, etc.), and the "caption" is the long-form visual_observation text plus the event_description from the log.

Model Architecture

  • Architecture: Encoder-decoder, following the VisionEncoderDecoderModel captioning pattern with a text encoder standing in for the vision backbone; the checkpoint loads as a standard Hugging Face sequence-to-sequence model.
  • Encoder (Input Processor): DistilBERT. This is adapted to process the concatenated and tokenized string representation of the structured data (timestamp, temperature_c, event_code, etc.) into a contextual vector.
  • Decoder (Text Generator): GPT-2 (6-layer variant). This generates the fluent natural language description.
  • Input Modality: Structured log data represented as a single, tokenized string (e.g., [T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]); a serialization sketch follows this list.
  • Output: A natural language sentence summarizing the status and visual observation (e.g., "Surface appears calm, no visual anomalies. Normal Operation.").
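
To make the expected input concrete, here is a minimal serialization sketch. The serialize_reading helper, its field order, and the one-decimal formatting are illustrative assumptions, not part of any released preprocessing code; only the bracketed [KEY:VALUE] layout comes from this card.

def serialize_reading(temperature_c: float, salinity: float, ph: float, event_code: str) -> str:
    """Flatten one sensor reading into the bracketed [KEY:VALUE] string format (assumed helper)."""
    return f"[T:{temperature_c:.1f}] [S:{salinity:.1f}] [pH:{ph:.1f}] [CODE:{event_code}]"

print(serialize_reading(14.5, 35.2, 8.1, "P_OK"))
# [T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]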

Intended Use

  • Automated Log Summarization: Generating human-readable summaries for millions of sensor log entries, making environmental monitoring data accessible to non-experts.
  • Anomaly Description: Translating numeric shifts (e.g., pH < 7.9) and error codes (e.g., E_ERR) into descriptive, actionable text.
  • Data Retrieval: Enabling natural language querying against structured time-series data using the generated captions as an intermediate representation.
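
As an illustration of the retrieval use case, the sketch below filters log entries by searching their generated captions. The generate_caption wrapper is a hypothetical helper around the generation code shown later in this card, and the captions listed are the example outputs quoted elsewhere on this page.

# Hypothetical retrieval sketch: captions act as a searchable text layer over the structured logs.
logs = [
    "[T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]",
    "[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW] [EVENT:Low pH reading]",
]

# In practice: captions = [generate_caption(log) for log in logs]  # assumed wrapper around model.generate
captions = [
    "Surface appears calm, no visual anomalies. Normal Operation.",
    "Water color is slightly turbid, suggesting sediment runoff or upwelling. Low pH reading detected.",
]

query = "turbid"
for log, caption in zip(logs, captions):
    if query.lower() in caption.lower():
        print(log, "->", caption)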

Limitations

  • Structured Input Format: The model is highly sensitive to the exact tokenization format of the structured input features (e.g., [T:VALUE]). Any deviation from this format will lead to poor generation quality; a format-check sketch follows this list.
  • Causality: The model generates descriptive text but does not perform true causal reasoning; it can describe a low pH alongside a high temperature, but cannot definitively state that the low pH was caused by the high temperature.
  • Max Length Constraint: The output is limited to short, abstractive descriptions by the 128-token maximum generation length.
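
The following is a minimal format-check sketch for the input-format limitation above. The regex mirrors the bracketed [T:...] [S:...] [pH:...] [CODE:...] layout used in this card's examples; it is an illustrative assumption, not a validator shipped with the model.

import re

# Assumed pattern reflecting this card's example strings; adjust if the real training format differs.
LOG_PATTERN = re.compile(
    r"^\[T:-?\d+(\.\d+)?\] \[S:\d+(\.\d+)?\] \[pH:\d+(\.\d+)?\] \[CODE:[A-Z_]+\]( \[EVENT:[^\]]+\])?$"
)

def is_valid_log(structured_log: str) -> bool:
    """Return True if the string matches the expected [KEY:VALUE] layout."""
    return bool(LOG_PATTERN.match(structured_log))

print(is_valid_log("[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW] [EVENT:Low pH reading]"))  # True
print(is_valid_log("T=14.0, S=35.5, pH=7.7"))                                          # False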

Example Code (Python)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Multimodal/Ocean-Buoy-Log-Captioner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
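
# NOTE: this assumes the checkpoint loads as a standard sequence-to-sequence model and ships a
# single tokenizer covering both the encoder input and the decoder output; if the encoder
# (DistilBERT) and decoder (GPT-2) use separate tokenizers, load and apply them individually.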

# Structured data represented as a tokenizable string following the training format
# T=Temperature, S=Salinity, pH=pH Level, CODE=Event Code
structured_log = "[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW] [EVENT:Low pH reading]"

# Encode the structured log using the encoder tokenizer
input_ids = tokenizer.encode(structured_log, return_tensors="pt")

# Generate the descriptive text summary
output_ids = model.generate(
    input_ids,
    max_length=50,
    num_beams=4,
    early_stopping=True
)

# Decode the generated text
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(f"Structured Log Input: {structured_log}")
print(f"Generated Log Caption: {caption}")
# Expected output (approx): Water color is slightly turbid, suggesting sediment runoff or upwelling. Low pH reading detected.