StarRocks is a High-Performance Analytical Database.
StarRocks is a next-gen, sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
StarRocks is usually categorized as an OLAP database, and it has shown excellent performance in ClickBench, a benchmark for analytical DBMS. Because it has a very fast vectorized execution engine, it can also be used as a fast vectordb.
Here we’ll show how to use the StarRocks Vector Store.
Load all Markdown files under the docs directory. For the StarRocks documents, you can clone the repo from github.com/StarRocks/starrocks; it contains a docs directory.
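The loading code itself is not shown here, so the following is only a minimal standard-library sketch of the step, not the loader the notebook uses. The helper name `load_markdown_files` is hypothetical, and it returns plain `(path, text)` pairs rather than LangChain `Document` objects.

```python
from pathlib import Path


def load_markdown_files(docs_dir):
    """Return (path, text) pairs for every .md file under docs_dir, recursively.

    Hypothetical helper for illustration; it is not part of LangChain or StarRocks.
    """
    root = Path(docs_dir)
    pairs = []
    # rglob walks all subdirectories; sorting makes the order deterministic
    for path in sorted(root.rglob("*.md")):
        # errors="replace" keeps the walk going if a file is not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        pairs.append((str(path), text))
    return pairs
```

Assuming the repo was cloned into the current directory, calling `load_markdown_files("./docs")` would collect the raw Markdown text. To feed the splitter used later, each pair would still need to be wrapped in a document object carrying the path as its `source` metadata.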
Split docs into tokens, and set update_vectordb = True because there are new docs/tokens.
# load text splitter and split docs into snippets of text
text_splitter = TokenTextSplitter(chunk_size=400, chunk_overlap=50)
split_docs = text_splitter.split_documents(documents)

# tell vectordb to update text embeddings
update_vectordb = True
split_docs[-20]
Document(page_content='Compile StarRocks with Docker\n\nThis topic describes how to compile StarRocks using Docker.\n\nOverview\n\nStarRocks provides development environment images for both Ubuntu 22.04 and CentOS 7.9. With the image, you can launch a Docker container and compile StarRocks in the container.\n\nStarRocks version and DEV ENV image\n\nDifferent branches of StarRocks correspond to different development environment images provided on StarRocks Docker Hub.\n\nFor Ubuntu 22.04:\n\n| Branch name | Image name |\n | --------------- | ----------------------------------- |\n | main | starrocks/dev-env-ubuntu:latest |\n | branch-3.0 | starrocks/dev-env-ubuntu:3.0-latest |\n | branch-2.5 | starrocks/dev-env-ubuntu:2.5-latest |\n\nFor CentOS 7.9:\n\n| Branch name | Image name |\n | --------------- | ------------------------------------ |\n | main | starrocks/dev-env-centos7:latest |\n | branch-3.0 | starrocks/dev-env-centos7:3.0-latest |\n | branch-2.5 | starrocks/dev-env-centos7:2.5-latest |\n\nPrerequisites\n\nBefore compiling StarRocks, make sure the following requirements are satisfied:\n\nHardware\n\n', metadata={'source': 'docs/developers/build-starrocks/Build_in_docker.md'})
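To make the `chunk_size=400, chunk_overlap=50` settings used above more concrete, here is a minimal sketch of sliding-window chunking with overlap. It is an illustration under assumptions, not the library's implementation. The function name `split_with_overlap` is hypothetical, and it works on any list of tokens, whereas `TokenTextSplitter` counts tokens produced by a real tokenizer.

```python
def split_with_overlap(tokens, chunk_size=400, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # stop once the current window already reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


# Small example: windows of 4 tokens, each sharing 1 token with the previous one
example = split_with_overlap(list(range(10)), chunk_size=4, chunk_overlap=1)
print(example)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

The overlap means text that straddles a chunk boundary still appears whole in at least one snippet, at the cost of storing some tokens twice.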
Convert tokens into embeddings and put them into vectordb
Here we use StarRocks as the vectordb. You can configure the StarRocks instance via StarRocksSettings. Configuring a StarRocks instance is much like configuring a MySQL instance. You need to specify: