
LLM: Alibaba Cloud DashVector + ModelScope Multimodal Vectorization for Real-time Text-to-Image Search, a Practice Summary

2024-07-12


Preface

This article combines Alibaba Cloud's vector search service (DashVector) with the ONE-PEACE multimodal model to build a real-time "text-to-image" multimodal retrieval capability. The overall process is as follows:
image.png

  1. Multimodal data embedding and storage: the Embedding interface of the ONE-PEACE model service converts datasets of multiple modalities into high-dimensional vectors, which are then stored in DashVector.
  2. Multimodal query retrieval: using the multimodal embedding capability of the ONE-PEACE model, inputs of different modalities can be freely combined (for example, text alone, text + audio, or audio + image) to obtain an embedding vector, which DashVector then uses to retrieve similar results across modalities.
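To make "freely combine inputs of different modalities" concrete, the hypothetical helper below only assembles the input list in the same shape this article passes to MultiModalEmbedding.call: one dict per modality, such as {'text': ...} or {'image': url}. It makes no network calls. The optional 'factor' weight is an assumption based on DashScope's ONE-PEACE examples; verify it against the current DashScope documentation before relying on it.

```python
from typing import Dict, List, Optional, Union

# Hypothetical helper: assemble a ONE-PEACE input list from optional parts.
# Each part becomes one dict keyed by its modality ('text', 'image', 'audio'),
# matching the [{'text': ...}] / [{'image': url}] shape used in this article.
# The optional 'factor' weight is an assumption; check the DashScope docs.

def build_multimodal_input(
    text: Optional[str] = None,
    image_url: Optional[str] = None,
    audio_url: Optional[str] = None,
    factors: Optional[Dict[str, float]] = None,
) -> List[Dict[str, Union[str, float]]]:
    parts = {"text": text, "image": image_url, "audio": audio_url}
    inputs: List[Dict[str, Union[str, float]]] = []
    for modality, value in parts.items():
        if value is None:
            continue  # this modality is not part of the query
        item: Dict[str, Union[str, float]] = {modality: value}
        if factors and modality in factors:
            item["factor"] = factors[modality]  # assumed optional weight
        inputs.append(item)
    if not inputs:
        raise ValueError("at least one modality must be provided")
    return inputs


if __name__ == "__main__":
    # Text-only query, the same shape text_search() builds
    print(build_multimodal_input(text="cat"))
    # Text + image query (the URL is a placeholder)
    print(build_multimodal_input(text="cat", image_url="https://example.com/cat.jpg"))
```

The resulting list is what would be passed as input= to MultiModalEmbedding.call, just as index_image() and text_search() do with their single-modality lists.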

Prerequisites

Environment Preparation

# Install the dashscope and dashvector SDKs
pip3 install dashscope dashvector

# Used to display images
pip3 install Pillow

Data Preparation

Note
Because DashScope's ONE-PEACE model service currently accepts image and audio input only as URLs, you need to upload the dataset to public network storage (such as OSS or S3) in advance and collect the URLs of the corresponding images and audio files.

Steps

Image data embedding storage

I used Alibaba Cloud OSS to store the images and obtained each image's publicly accessible URL through the OSS Browser interface:
image.png
image.png
These URLs should also be obtainable through an API, which I have not looked into yet; interested readers can try fetching them in batches that way. The URLs are needed so that Alibaba Cloud's DashScope service can read each image and generate its embedding, which is then stored in the DashVector vector database.
After obtaining the URLs, write them into the imagenet1k-urls.txt file, one URL per line; the code reads this file later to generate the embeddings:
image.png
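Since the URL list can be generated instead of copied by hand, here is a minimal sketch that builds public URLs from object keys and writes them one per line into imagenet1k-urls.txt. It assumes the bucket allows public read and uses the https://<bucket>.<endpoint>/<key> address format OSS uses for public objects; the bucket name, endpoint, and keys are placeholders. Private buckets would instead need signed URLs (for example via the oss2 SDK's Bucket.sign_url), which this sketch does not cover.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.parse import quote

# Minimal sketch: build public-read OSS URLs from object keys and write them
# one per line, the format index_image() expects in imagenet1k-urls.txt.
# Assumes a public-read bucket; bucket, endpoint and keys are placeholders.

def build_public_url(bucket: str, endpoint: str, key: str) -> str:
    # OSS public address format: https://<bucket>.<endpoint>/<object key>
    # quote() escapes spaces and non-ASCII characters but keeps '/' separators
    return f"https://{bucket}.{endpoint}/{quote(key)}"


def write_url_file(path: str, bucket: str, endpoint: str, keys: Iterable[str]) -> List[str]:
    urls = [build_public_url(bucket, endpoint, k) for k in keys]
    # One URL per line, with a trailing newline after the last entry
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls


if __name__ == "__main__":
    urls = write_url_file(
        "imagenet1k-urls.txt",
        bucket="my-bucket",                       # placeholder bucket name
        endpoint="oss-cn-hangzhou.aliyuncs.com",  # placeholder region endpoint
        keys=["images/cat_1.jpg", "images/dog 1.jpg"],
    )
    print(urls)
```

The object keys themselves could come from listing the bucket with the OSS SDK; that part is left out here because it depends on your credentials and bucket layout.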
The embedding code is as follows (the complete code and directory structure are given later; only the embedding part is shown here):

    def index_image(self):

        # Create the collection: specify its name and vector dimension
        # (ONE-PEACE embeddings are always 1536-dimensional)
        collection = self.vector_client.get(self.vector_collection_name)
        if not collection:
            rsp = self.vector_client.create(self.vector_collection_name, 1536)
            if not rsp:
                raise DashVectorException(rsp.code, reason=rsp.message)
            collection = self.vector_client.get(self.vector_collection_name)

        # Call the DashScope ONE-PEACE model to embed each image, then insert it into DashVector
        with open(self.IMAGENET1K_URLS_FILE_PATH, 'r') as file:
            for i, line in enumerate(file):
                url = line.strip()  # remove the trailing newline and surrounding spaces
                if not url:
                    continue  # skip blank lines
                input = [{'image': url}]
                result = MultiModalEmbedding.call(model=MultiModalEmbedding.Models.multimodal_embedding_one_peace_v1,
                                                  input=input,
                                                  api_key=os.environ["DASHSCOPE_API_KEY"],
                                                  auto_truncation=True)
                if result.status_code != 200:
                    print(f"ONE-PEACE failed to generate embedding of {url}, result: {result}")
                    continue
                embedding = result.output["embedding"]
                # Store the image URL as a field so search results can be displayed later
                collection.insert(
                    Doc(
                        id=str(i),
                        vector=embedding,
                        fields={'image_url': url}
                    )
                )
                if (i + 1) % 100 == 0:
                    print(f"---- Succeeded to insert {i + 1} image embeddings")

  • The code reads each image URL from the file at IMAGENET1K_URLS_FILE_PATH, calls DashScope to vectorize the image, and stores the resulting vector in DashVector.
  • When a vector is inserted into the vector database, the image URL is stored with it as a field (the image_url attribute).
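Two details of this loop are worth isolating: each line read from the file ends with a newline that must be removed before the URL is sent to DashScope, and inserting documents one at a time costs one network call per image. The sketch below is plain Python with no SDK calls; load_urls and batched are hypothetical helper names. It reads the URL file while skipping blank lines and keeping the line index as the document id (as index_image() does), and groups items into fixed-size batches. Whether a whole batch can be passed to a single collection.insert call should be checked in the DashVector SDK documentation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical helpers, plain Python only (no DashScope/DashVector calls).

def load_urls(path: str) -> List[Tuple[str, str]]:
    """Return (doc_id, url) pairs, one per non-blank line of the URL file."""
    pairs: List[Tuple[str, str]] = []
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            url = line.strip()  # drops '\n' and surrounding spaces
            if not url:
                continue  # skip blank lines, e.g. an empty last line
            pairs.append((str(i), url))  # line index doubles as the doc id
    return pairs


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

On Python 3.12 and later, itertools.batched offers the same grouping (yielding tuples instead of lists).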

After the code runs, you can view the stored vector data in the DashVector (vector search service) console:
image.png
image.png

Text Retrieval

To search the vector database by text, I entered cat, and three images were returned (topk=3 in the code). As you can see, two of them are cats, but one is a dog:
image.png
This is because a dog image is also somewhat similar to the query cat. Next, we set topk to 2; in principle, the dog should then no longer be returned. Sure enough, there is no dog this time:
image.png
The dog appears because I stored only 4 animal images in the vector database: 2 cats and 2 dogs. With topk set to 3, the search has to return a third image, and the only remaining candidates are dogs.
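The topk behaviour above can be reproduced without any cloud service. The sketch below ranks stored vectors by cosine similarity to a query vector and keeps the k best, which is conceptually what a vector search does. The 3-dimensional vectors are made-up toy values standing in for real 1536-dimensional ONE-PEACE embeddings, and cosine is an assumed metric here (DashVector lets you choose the metric when creating a collection).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy illustration of top-k retrieval by cosine similarity.
# The vectors are made-up 3-d values, not real ONE-PEACE embeddings.

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    # Score every stored vector, then keep the k most similar ones
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# 4 stored images: 2 cats and 2 dogs, mirroring the example in the text
library = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "dog_1": [0.5, 0.8, 0.0],
    "dog_2": [0.1, 0.9, 0.2],
}
query_cat = [1.0, 0.0, 0.0]  # stand-in for the embedding of the text "cat"

print([doc_id for doc_id, _ in top_k(query_cat, library, k=3)])  # ['cat_1', 'cat_2', 'dog_1']
print([doc_id for doc_id, _ in top_k(query_cat, library, k=2)])  # ['cat_1', 'cat_2']
```

With only two cat vectors stored, any k above 2 must return at least one dog, however good the embeddings are.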

Complete code

The multi_model.py file is as follows:

import os

import dashscope
from dashscope import MultiModalEmbedding
from dashvector import Client, Doc, DashVectorException
from urllib.request import urlopen
from PIL import Image


class DashVectorMultiModel:
    def __init__(self):
        # Both DASHSCOPE_API_KEY and DASHVECTOR_API_KEY must be activated;
        # replace the empty strings with your own keys and endpoint
        os.environ["DASHSCOPE_API_KEY"] = ""
        os.environ["DASHVECTOR_API_KEY"] = ""
        os.environ["DASHVECTOR_ENDPOINT"] = ""

        dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]

        # The ONE-PEACE model service currently accepts image and audio input only as URLs,
        # so the dataset must first be uploaded to public network storage (e.g. OSS/S3)
        # and the URLs of the images and audio files collected.
        # Each line of this file holds the public URL of one image in the dataset;
        # the file sits in the same directory as this Python script.
        self.IMAGENET1K_URLS_FILE_PATH = "imagenet1k-urls.txt"

        self.vector_client = self.init_vector_client()
        self.vector_collection_name = 'imagenet1k_val_embedding'

    def init_vector_client(self):
        return Client(
            api_key=os.environ["DASHVECTOR_API_KEY"],
            endpoint=os.environ["DASHVECTOR_ENDPOINT"]
        )

    def index_image(self):

        # Create the collection: specify its name and vector dimension
        # (ONE-PEACE embeddings are always 1536-dimensional)
        collection = self.vector_client.get(self.vector_collection_name)
        if not collection:
            rsp = self.vector_client.create(self.vector_collection_name, 1536)
            if not rsp:
                raise DashVectorException(rsp.code, reason=rsp.message)
            collection = self.vector_client.get(self.vector_collection_name)

        # Call the DashScope ONE-PEACE model to embed each image, then insert it into DashVector
        with open(self.IMAGENET1K_URLS_FILE_PATH, 'r') as file:
            for i, line in enumerate(file):
                url = line.strip()  # remove the trailing newline and surrounding spaces
                if not url:
                    continue  # skip blank lines
                input = [{'image': url}]
                result = MultiModalEmbedding.call(model=MultiModalEmbedding.Models.multimodal_embedding_one_peace_v1,
                                                  input=input,
                                                  api_key=os.environ["DASHSCOPE_API_KEY"],
                                                  auto_truncation=True)
                if result.status_code != 200:
                    print(f"ONE-PEACE failed to generate embedding of {url}, result: {result}")
                    continue
                embedding = result.output["embedding"]
                collection.insert(
                    Doc(
                        id=str(i),
                        vector=embedding,
                        fields={'image_url': url}
                    )
                )
                if (i + 1) % 100 == 0:
                    print(f"---- Succeeded to insert {i + 1} image embeddings")

    def show_image(self, image_list):
        for img in image_list:
            # Note: on a Linux server, show() may require an image viewer to be installed to work;
            # running this code on a server that supports Jupyter Notebook is recommended
            img.show()

    def text_search(self, input_text):
        # Get the collection populated by index_image()
        collection = self.vector_client.get(self.vector_collection_name)

        # Get the embedding vector of the text query
        input = [{'text': input_text}]
        result = MultiModalEmbedding.call(model=MultiModalEmbedding.Models.multimodal_embedding_one_peace_v1,
                                          input=input,
                                          api_key=os.environ["DASHSCOPE_API_KEY"],
                                          auto_truncation=True)
        if result.status_code != 200:
            raise Exception(f"ONE-PEACE failed to generate embedding of {input}, result: {result}")
        text_vector = result.output["embedding"]

        # DashVector vector search: return the topk most similar images
        rsp = collection.query(text_vector, topk=2)
        image_list = list()
        for doc in rsp:
            img_url = doc.fields['image_url']
            img = Image.open(urlopen(img_url))
            image_list.append(img)
        return image_list


if __name__ == '__main__':
    a = DashVectorMultiModel()
    # Run the embedding step
    a.index_image()
    # Text search
    text_query = "Traffic light"
    a.show_image(a.text_search(text_query))
  • After activating DashScope and DashVector and obtaining their API keys, fill in the DASHSCOPE_API_KEY, DASHVECTOR_API_KEY, and DASHVECTOR_ENDPOINT values above.
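As an alternative to writing the keys into the source file, they can be exported in the shell and checked up front. The sketch below is a hypothetical load_config helper using only the standard library; it expects the same three variable names the code above uses and fails early with a readable message if any is missing or empty.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("DASHSCOPE_API_KEY", "DASHVECTOR_API_KEY", "DASHVECTOR_ENDPOINT")


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return the three settings, or raise if any is unset or empty."""
    source = os.environ if env is None else env
    # Collect every missing name so the error lists them all at once
    missing = [name for name in REQUIRED_VARS if not source.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name].strip() for name in REQUIRED_VARS}
```

With this in place, the three os.environ[...] = "" assignments in __init__ could be dropped and the values taken from load_config() instead.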

The directory structure is as follows; put the txt file and the py file in the same directory:
image.png


Additional Notes

  • Using local images: I uploaded the images to OSS, but you can also use local image files by replacing the URLs in the txt file with local image paths, as follows:

    image.png

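  • If you use local images, the image-loading line in text_search must also be changed as follows (keep in mind that, as noted under Data Preparation, the ONE-PEACE service currently accepts image input only as URLs):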

    # Replace img = Image.open(urlopen(img_url)) with the following line
    img = Image.open(img_url)
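Rather than editing that line whenever you switch between OSS URLs and local files, the loading step could choose the source automatically. The sketch below is a hypothetical helper, open_image_source, that returns an in-memory binary stream for either an http(s) URL or a local path; since Pillow's Image.open accepts a file-like object, Image.open(open_image_source(img_url)) would cover both cases. It only affects the display step in text_search, not what the embedding service accepts.

```python
import io
from urllib.parse import urlparse
from urllib.request import urlopen


def open_image_source(ref: str) -> io.BytesIO:
    """Hypothetical helper: open an image reference as a seekable binary stream.

    http(s) URLs are downloaded with urlopen; anything else is treated as a
    local file path. The bytes are buffered in memory so the caller (for
    example PIL's Image.open) gets a seekable object in both cases.
    """
    scheme = urlparse(ref).scheme.lower()
    if scheme in ("http", "https"):
        with urlopen(ref) as resp:
            return io.BytesIO(resp.read())
    # Local path (a Windows drive letter such as "C:" also ends up here)
    with open(ref, "rb") as f:
        return io.BytesIO(f.read())
```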