# Ocean-Buoy-Log-Captioner

## Overview
Ocean-Buoy-Log-Captioner is a Vision-Encoder-Decoder model designed for multimodal log analysis. It is trained to generate descriptive text summaries (captions) of environmental events by jointly processing structured sensor readings (input features) and a short event code/description.
This model adapts an image-captioning architecture: the "visual" input is a feature vector derived from time-series sensor data ($\text{Temp}$, $\text{Salinity}$, $\text{pH}$, etc.), and the "caption" is the long-form `visual_observation` text and `event_description` from the log.
## Model Architecture
- Architecture: Vision Encoder Decoder Model (`VisionEncoderDecoderModel`).
- Encoder (Input Processor): DistilBERT, adapted to process the concatenated and tokenized string representation of the structured data (`timestamp`, `temperature_c`, `event_code`, etc.) into a contextual vector.
- Decoder (Text Generator): GPT-2 (small variant, 6 layers), which generates the fluent natural language description.
- Input Modality: Structured log data represented as a single, tokenized string (e.g., `[T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]`).
- Output: A natural language sentence summarizing the status and visual observation (e.g., "Surface appears calm, no visual anomalies. Normal Operation.").
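The bracketed input string can be produced from raw sensor readings with a small serializer. The sketch below illustrates the format shown above; the parameter names, field order, and one-decimal rounding are assumptions for illustration, not part of the released model:

```python
def serialize_log(temp_c, salinity_psu, ph, code, event=None):
    """Flatten one buoy log entry into the bracketed token format the
    encoder expects, e.g. "[T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]".
    Field order and rounding here are assumptions; match whatever
    format the model was actually trained on."""
    parts = [
        f"[T:{temp_c:.1f}]",
        f"[S:{salinity_psu:.1f}]",
        f"[pH:{ph:.1f}]",
        f"[CODE:{code}]",
    ]
    if event is not None:
        parts.append(f"[EVENT:{event}]")
    return " ".join(parts)

print(serialize_log(14.5, 35.2, 8.1, "P_OK"))
# [T:14.5] [S:35.2] [pH:8.1] [CODE:P_OK]
```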
## Intended Use
- Automated Log Summarization: Generating human-readable summaries for millions of sensor log entries, making environmental monitoring data accessible to non-experts.
- Anomaly Description: Translating complex numeric shifts ($\text{pH} < 7.9$) and error codes ($\text{E_ERR}$) into descriptive, actionable text.
- Data Retrieval: Enabling natural language querying against structured time-series data using the generated captions as an intermediate representation.
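For the retrieval use case, the generated captions can serve as a plain-text index over the numeric logs. The sketch below uses simple word overlap as a deliberately minimal stand-in for embedding-based search; the function name and scoring are illustrative assumptions:

```python
import re

def tokenize(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search_captions(captions, query):
    """Rank log entries by word overlap between a natural language
    query and each generated caption. Returns (index, overlap) pairs,
    best match first; entries with no overlap are dropped."""
    query_terms = tokenize(query)
    scored = [(i, len(query_terms & tokenize(c))) for i, c in enumerate(captions)]
    return sorted([(i, s) for i, s in scored if s > 0], key=lambda p: -p[1])

captions = [
    "Surface appears calm, no visual anomalies. Normal Operation.",
    "Water color is slightly turbid, suggesting sediment runoff. Low pH reading detected.",
]
print(search_captions(captions, "turbid water low pH"))
# [(1, 4)]
```

In practice an embedding model over the captions would give much better recall, but the pipeline shape is the same: generate captions once, then query against them instead of the raw sensor values.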
## Limitations
- Structured Input Format: The model is highly sensitive to the exact tokenization format of the structured input features (e.g., `[T:VALUE]`). Any deviation from this format will degrade generation quality.
- Causality: The model generates descriptive text but does not perform true causal reasoning (i.e., it can describe a low $\text{pH}$ and a high $\text{T}$, but cannot definitively state that the low $\text{pH}$ was caused by the high $\text{T}$).
- Max Length Constraint: The output is limited to short, abstractive descriptions due to the 128 token max length.
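Given the format sensitivity noted above, it can be worth validating inputs before generation. A minimal sketch, where the accepted field keys are an assumption based on the examples in this card:

```python
import re

# One bracketed field: a key seen in this card's examples, a colon, and a
# bracket-free value. The key set is an assumption; extend it as needed.
LOG_RE = re.compile(r"(\[(T|S|pH|CODE|EVENT):[^\[\]]+\] ?)+")

def is_valid_log(structured_log):
    """Check that the string is a space-separated sequence of bracketed
    fields like "[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW]"."""
    return LOG_RE.fullmatch(structured_log) is not None

print(is_valid_log("[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW] [EVENT:Low pH reading]"))
# True
print(is_valid_log("T:14.0 S:35.5"))
# False
```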
## Example Code (Python)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Multimodal/Ocean-Buoy-Log-Captioner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Structured data represented as a tokenizable string following the training format
# T=Temperature, S=Salinity, pH=pH Level, CODE=Event Code
structured_log = "[T:14.0] [S:35.5] [pH:7.7] [CODE:A_LOW] [EVENT:Low pH reading]"

# Encode the structured log using the encoder tokenizer
input_ids = tokenizer.encode(structured_log, return_tensors="pt")

# Generate the descriptive text summary
output_ids = model.generate(
    input_ids,
    max_length=50,
    num_beams=4,
    early_stopping=True,
)

# Decode the generated text
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(f"Structured Log Input: {structured_log}")
print(f"Generated Log Caption: {caption}")
# Expected output (approx): Water color is slightly turbid, suggesting
# sediment runoff or upwelling. Low pH reading detected.
```