Copyright 2024 Google LLC
  
Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
limitations under the License.

This section shows you how to upload vectors into a new Qdrant collection and run simple search queries using the official Qdrant client.

In this example, you use a dataset from a CSV file that contains a list of books in different genres. Qdrant will serve as a search engine.

Install kubectl and the Google Cloud SDK with the necessary authentication plugin for Google Kubernetes Engine (GKE).

In [None]:
%%bash

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install -y google-cloud-cli-gke-gcloud-auth-plugin

Install a Qdrant client:

In [None]:
! pip install qdrant-client[fastembed] python-dotenv -U

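Replace \<CLUSTER_NAME> with your cluster name, for example qdrant-cluster, then retrieve the GKE cluster's credentials using the gcloud command. The command expects the GOOGLE_CLOUD_REGION environment variable to contain the cluster's region.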

In [None]:
%%bash

export KUBERNETES_CLUSTER_NAME=<CLUSTER_NAME>
gcloud container clusters get-credentials "$KUBERNETES_CLUSTER_NAME" --region "$GOOGLE_CLOUD_REGION"

Download the dataset from GitHub.

In [2]:
%%bash

export DATASET_PATH=https://raw.githubusercontent.com/epam/kubernetes-engine-samples/qdrant-installation/databases/qdrant/manifests/04-notebook/dataset.csv
curl -s -LO $DATASET_PATH

Run the next command to check whether the Qdrant internal load balancer has been assigned an IP address. If the output shows an IP address, proceed to the next step. If the output is blank, repeat the command after a few minutes or check the status of the qdrant-ilb service in the console. Proceed only once an IP address appears.

In [None]:
%%bash
kubectl get svc qdrant-ilb -n qdrant --output jsonpath="{.status.loadBalancer.ingress[0].ip}"
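The same check can be expressed in Python. The following is a minimal sketch rather than part of the original walkthrough: the helper name `looks_like_ip` is hypothetical, the sample address is made up, and the function only validates the text that the kubectl command printed.

```python
import ipaddress


def looks_like_ip(kubectl_output: str) -> bool:
    """Return True if the text printed by kubectl is a valid IP address."""
    candidate = kubectl_output.strip()
    if not candidate:
        # Empty output means the load balancer has no address yet.
        return False
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return True


print(looks_like_ip("10.128.0.5"))  # illustrative address, not a real endpoint
print(looks_like_ip(""))            # blank output from kubectl
```

If the function returns False, wait a few minutes and run the kubectl command again before continuing.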

Create an .env file with environment variables required for connecting to Qdrant in a Kubernetes cluster.

In [None]:
%%bash
echo QDRANT_ENDPOINT="http://$(kubectl get svc qdrant-ilb -n qdrant --output jsonpath="{.status.loadBalancer.ingress[0].ip}"):6333" > .env
echo APIKEY=$(kubectl get secret qdrant-database-apikey -n qdrant --template='{{index  .data "api-key"}}'| base64 -d) >> .env
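Before connecting, it can help to confirm that the .env file contains both values. The sketch below uses only the standard library and a deliberately simplified parser; the helper name `missing_env_keys` and the sample content are assumptions for illustration, and python-dotenv itself handles more syntax (quoting, for example) than this does.

```python
REQUIRED_KEYS = ("QDRANT_ENDPOINT", "APIKEY")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return [key for key in required if not values.get(key)]


# Illustrative content only; the real values come from your cluster.
sample = "QDRANT_ENDPOINT=http://10.128.0.5:6333\nAPIKEY=\n"
print(missing_env_keys(sample))
```

To check the real file, pass the text of `.env` to the function, for example `missing_env_keys(open(".env").read())`. An empty list means both keys are present and non-empty.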


Import the required Python and Qdrant libraries:

In [25]:
from dotenv import load_dotenv
from qdrant_client import QdrantClient
from qdrant_client.http import models
import os
import csv

Load data from a CSV file for inserting data into a Qdrant collection:

In [8]:
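# Open the file in a context manager so it is closed after reading.
with open('/content/dataset.csv', newline='') as csv_file:
    books = list(csv.DictReader(csv_file))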

Prepare data for uploading:

In [9]:
documents: list[str] = []  # book descriptions, used as the text to embed
metadata: list[dict[str, str]] = []  # payload stored alongside each document
ids: list[int] = []

for idx, doc in enumerate(books):
    ids.append(idx)
    documents.append(doc["description"])
    metadata.append(
        {
            "title": doc["title"],
            "author": doc["author"],
            "publishDate": doc["publishDate"],
        }
    )
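To make the shape of the three lists concrete, here is a small sketch that applies the same mapping to an in-memory CSV. The two rows are made up for illustration; only the column names `title`, `author`, `publishDate`, and `description` are taken from the code above. The `sample_` prefixed names are used so that the real `books`, `documents`, `metadata`, and `ids` are not overwritten.

```python
import csv
import io

# Hypothetical two-row sample; the real dataset.csv has its own rows.
sample_csv = io.StringIO(
    "title,author,publishDate,description\n"
    "Book A,Author A,2001,A story about a family.\n"
    "Book B,Author B,1999,A guide to gardening.\n"
)
sample_books = list(csv.DictReader(sample_csv))

# One integer ID, one description, and one metadata dict per row.
sample_ids = list(range(len(sample_books)))
sample_documents = [row["description"] for row in sample_books]
sample_metadata = [
    {key: row[key] for key in ("title", "author", "publishDate")}
    for row in sample_books
]

print(sample_ids)
print(sample_documents[0])
print(sample_metadata[0])
```

The three lists stay aligned by position: entry `i` of each list describes the same book.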

Define a Qdrant connection. It requires an API key for authentication:

In [None]:
load_dotenv()
qdrant = QdrantClient(
    url=os.getenv("QDRANT_ENDPOINT"), api_key=os.getenv("APIKEY"))

Create a Qdrant collection and insert data. This method creates a new collection named `my_books` if it does not already exist, computes embeddings for the documents with FastEmbed, and uploads the book data to `my_books`.

In [27]:
qdrant.add(collection_name="my_books", documents=documents, metadata=metadata, ids=ids, parallel=2)

[]

Query the Qdrant database. This cell runs the search query `drama about people and unhappy love` and displays the results.

It prints each result, separated by a line of dashes, in the following format:

- Title: Title of the book
- Author: Author of the book
- Description: As stored in your document's description metadata field
- Published: Book publication date
- Score: Qdrant's relevancy score

In [None]:
results = qdrant.query(
    collection_name="my_books",
    query_text="drama about people and unhappy love",
    limit=2,
)
for result in results:
    print("Title:", result.metadata["title"], "\nAuthor:", result.metadata["author"])
    print("Description:", result.metadata["document"], "\nPublished:", result.metadata["publishDate"], "\nScore:", result.score)
    print("-----")
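The output format can also be isolated in a small helper, which makes it easy to try without a running cluster. This is a sketch under assumptions: `format_result` is a hypothetical name, and the stand-in object below carries made-up values shaped like the results used in the loop above (a `metadata` dict with a `document` key, plus a `score`).

```python
from types import SimpleNamespace


def format_result(result) -> str:
    """Build the display block for one query result, one field per line."""
    meta = result.metadata
    return "\n".join(
        [
            f"Title: {meta['title']}",
            f"Author: {meta['author']}",
            f"Description: {meta['document']}",
            f"Published: {meta['publishDate']}",
            f"Score: {result.score}",
        ]
    )


# Stand-in object with made-up values; real results come from qdrant.query.
fake = SimpleNamespace(
    metadata={
        "title": "Book A",
        "author": "Author A",
        "document": "A story.",
        "publishDate": "2001",
    },
    score=0.5,
)
print(format_result(fake))
print("-----")
```

With a live connection, the same helper could be applied to each item in `results` in place of the two print calls.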