# assets/large_language_models/rag/components/update_pinecone_index/spec.yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
tags:
  Preview: ""
version: 0.0.45
name: llm_rag_update_pinecone_index
display_name: LLM - Update Pinecone Index
is_deterministic: true
description: |
  Uploads `embeddings` into the Pinecone index specified in `pinecone_config`. The Index will be created if it doesn't exist.
  Each record in the Index will have the following metadata populated:
  - "id", String
  - "content", String
  - "url", String
  - "filepath", String
  - "title", String
  - "metadata_json_string", String
  "metadata_json_string" contains all metadata for a document/record serialized as a JSON string.
inputs:
  embeddings:
    type: uri_folder
    mode: direct
    description: "Embeddings output produced from parallel_create_embeddings."
  pinecone_config:
    type: string
    description: 'JSON string containing the Pinecone index configuration, e.g. {"index_name": "my-index"}'
  connection_id:
    type: string
    optional: true
    description: "The ID of the connection to the Pinecone project where the index lives."
outputs:
  index:
    type: uri_folder
    description: "URI folder containing the MLIndex YAML describing the newly created/updated Pinecone index."
environment: azureml:llm-rag-embeddings@latest
code: '../src/'
command: >-
  python -m azureml.rag.tasks.update_pinecone
  --embeddings '${{inputs.embeddings}}'
  --pinecone_config '${{inputs.pinecone_config}}'
  --output ${{outputs.index}}
  $[[--connection_id '${{inputs.connection_id}}']]
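
# Usage sketch (not part of the component definition). A minimal, commented-out
# pipeline job showing how the inputs and output declared above might be wired
# up. Assumptions, none of which come from this spec: the component is
# registered in the workspace under the name and version above, and
# `my_embeddings`, `cpu-cluster`, and `<connection-id>` are hypothetical
# placeholders.
#
# $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
# type: pipeline
# settings:
#   default_compute: azureml:cpu-cluster  # hypothetical compute target
# jobs:
#   update_pinecone_index:
#     type: command
#     component: azureml:llm_rag_update_pinecone_index:0.0.45
#     inputs:
#       embeddings:
#         type: uri_folder
#         path: azureml:my_embeddings:1  # hypothetical data asset
#         mode: direct
#       pinecone_config: '{"index_name": "my-index"}'
#       # connection_id is optional. When it is omitted, the $[[...]] segment
#       # of the command is dropped and --connection_id is not passed.
#       connection_id: <connection-id>  # hypothetical placeholder
#     outputs:
#       index:
#         type: uri_folder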