Create a healthcare search data store

To search clinical data in Vertex AI Search, you can follow one of these workflows:

  • Create a healthcare data store, import FHIR R4 data into the data store, connect it to a healthcare search app, and query the clinical data.
  • Create a healthcare search app, create a healthcare data store and import FHIR R4 data into the data store during the app creation process, and query the clinical data. For more information, see Create a healthcare search app.

This page describes the first method.

About data import frequency

You can import FHIR R4 data into a data store in the following ways:

  • Batch import: a one-time import of data into a data store. For subsequent incremental imports, see Refresh healthcare data.

  • Streaming import (Preview): a near real-time streaming import. Incremental changes in the source FHIR store are synchronized to the Vertex AI Search data store. Streaming requires a data connector, which is a type of data store that contains an entity; the entity is itself a data store instance.

    The data streaming rate for a given Google Cloud project depends on your project's quotas. If you exceed a quota, you might experience streaming delays.

You select the data import frequency when you create the data store; you can't change this configuration later.

Streaming import is available for all the resources that Vertex AI Search supports. For more information, see Healthcare FHIR R4 data schema reference.

Before you begin

Before you create the healthcare data store and import data into it, complete these requirements:

  • Understand the relationship between apps and data stores for healthcare search. For more information, see About apps and data stores.

  • Prepare your FHIR data for ingestion.

  • Vertex AI Search for healthcare provides search services only in the US multi-region (us). Therefore, your healthcare search app and data stores must reside in the us multi-region.

Create a data store

You can create a data store either in the Google Cloud console or by using the API. The two approaches differ in the following ways:

  • In the Google Cloud console: Select the source FHIR store in the Cloud Healthcare API and import FHIR data as part of the healthcare search data store creation process. To stream FHIR data (Preview), your source FHIR store must be in the same Google Cloud project as the destination Vertex AI Search data store.
  • Through the REST API: You can import FHIR data from a Cloud Healthcare API FHIR store that's in the same Google Cloud project or a different one.
    1. Use the dataStores.create method to create a healthcare data store.
    2. Use the documents.import method to specify the FHIR store in Cloud Healthcare API and import FHIR R4 data.

To create a healthcare data store, complete the following steps.

Console

  1. In the Google Cloud console, go to the Agent Builder page.

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. In the Select a data source pane, select Healthcare API (FHIR) as your data source.
  5. To import data from your FHIR store, do one of the following:
    • Select the FHIR store from the list of available FHIR stores:
      1. Expand the FHIR store field.
      2. In this list, select a dataset that resides in a permitted location and then select a FHIR store that uses FHIR version R4.

        To stream FHIR data (Preview), your source FHIR store must be in the same Google Cloud project as the destination Vertex AI Search data store.

    • Enter the FHIR store manually:
      1. Expand the FHIR store field.
      2. Click Enter FHIR store manually.
      3. In the FHIR store name dialog, enter the full name of the FHIR store in the following format:

        projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID/fhirStores/FHIR_STORE_ID

        For example: projects/example-project/locations/us-central1/datasets/example-dataset/fhirStores/example-fhir-store

      4. Click Save.
  6. In the Synchronization section, select one of the following options. This selection cannot be changed after the data store is created.
    • One time: to perform a one-time batch data import. For further incremental imports, see Refresh healthcare data.
    • Streaming: to perform a near real-time streaming data import. To stream data, you must create a data connector, which is a type of data store. This is a Preview feature. To set up a streaming data store using the REST API, contact your customer engineer.
  7. In the What is the schema for this data? section, select one of these options:
    • Google predefined schema: to retain the Google-defined schema configurations, such as indexability, searchability, and retrievability, for the supported FHIR resources and elements. If you select this option, you can't update the schema after you create the data store. To be able to change the schema later, select the Custom schema (Preview) option instead.
      1. Click Continue.
      2. In the Your data store name field, enter a name for your data store.
      3. Click Create.
      4. The data store you created is listed on the Data Stores page.

    • Custom schema (Preview): to define your own schema configurations, such as indexability, searchability, and retrievability, for the supported FHIR resources and elements. This is a Preview feature. To set up a configurable schema, contact your customer engineer.
      1. Click Continue.
      2. Review the schema, expand each field, and edit the field settings.
      3. Click Add new fields to add fields to the supported FHIR resources. You cannot remove the fields provided in the Google-defined schema.
      4. Click Continue.
      5. In the Your data connector name field, enter a name for your data connector.
      6. Click Create.
      7. The data connector you created is listed on the Data Stores page. The source FHIR store is added as an entity within the data connector.

  8. Click Continue.

REST

  1. Create a data store.

    curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -H "X-Goog-User-Project: PROJECT_ID" \
    "https://us-discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/us/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
     -d '{
        "displayName": "DATA_STORE_DISPLAY_NAME",
        "industryVertical": "HEALTHCARE_FHIR",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "searchTier": "STANDARD",
        "searchAddOns": ["LLM"]
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
  2. If the source FHIR store and the target Vertex AI Search data store are in the same Google Cloud project, call the following method to perform a one-time batch import. If they're not in the same project, go to the next step.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://us-discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/us/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
       "reconciliation_mode": "FULL",
       "fhir_store_source": {"fhir_store": "projects/PROJECT_ID/locations/CLOUD_HEALTHCARE_DATASET_LOCATION/datasets/CLOUD_HEALTHCARE_DATASET_ID/fhirStores/FHIR_STORE_ID"}
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • CLOUD_HEALTHCARE_DATASET_ID: the ID of the Cloud Healthcare API dataset that contains the source FHIR store.
    • CLOUD_HEALTHCARE_DATASET_LOCATION: the location of the Cloud Healthcare API dataset that contains the source FHIR store.
    • FHIR_STORE_ID: the ID of the Cloud Healthcare API FHIR R4 store.
  3. If the source FHIR store and the target Vertex AI Search data store are in different Google Cloud projects, call the following method to perform a one-time batch import. If they're in the same project, use the previous step instead. Both import calls return a long-running operation, as described after these steps.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "X-Goog-User-Project: TARGET_PROJECT_ID" \
    "https://us-discoveryengine.googleapis.com/v1alpha/projects/TARGET_PROJECT_ID/locations/us/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
       "reconciliation_mode": "FULL",
       "fhir_store_source": {"fhir_store": "projects/SOURCE_PROJECT_ID/locations/CLOUD_HEALTHCARE_DATASET_LOCATION/datasets/CLOUD_HEALTHCARE_DATASET_ID/fhirStores/FHIR_STORE_ID"}
    }'
    

    Replace the following:

    • TARGET_PROJECT_ID: the ID of the Google Cloud project that contains the Vertex AI Search data store.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • SOURCE_PROJECT_ID: the ID of the Google Cloud project that contains the Cloud Healthcare API dataset and FHIR store.
    • CLOUD_HEALTHCARE_DATASET_ID: the ID of the Cloud Healthcare API dataset that contains the source FHIR store.
    • CLOUD_HEALTHCARE_DATASET_LOCATION: the location of the Cloud Healthcare API dataset that contains the source FHIR store.
    • FHIR_STORE_ID: the ID of the Cloud Healthcare API FHIR R4 store.
  4. Optional: To set up a streaming data import (Preview) using the REST API, contact your customer engineer.
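
The batch import calls in steps 2 and 3 return a long-running operation. Its name field contains the operation ID that you use later to check the import status (see Verify data store creation and FHIR data import). The response is similar to the following sketch; the exact operation ID is generated by the service:

    {
      "name": "projects/PROJECT_ID/locations/us/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/operations/IMPORT_OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.discoveryengine.v1alpha.ImportDocumentsMetadata"
      }
    }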

Python

For more information, see the Vertex AI Agent Builder Python API reference documentation.

To authenticate to Vertex AI Agent Builder, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
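        # For the healthcare data store described on this page, use HEALTHCARE_FHIR.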
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
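
For example, to create the data store for healthcare search, you might call the sample as follows. This is a minimal sketch: the IDs are placeholders, and healthcare search requires the us multi-region (and the HEALTHCARE_FHIR industry vertical noted in the sample above).

operation_name = create_data_store_sample(
    project_id="example-project",  # placeholder Google Cloud project ID
    location="us",  # healthcare search data stores must reside in the us multi-region
    data_store_id="example-healthcare-data-store",  # placeholder data store ID
)
print(operation_name)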

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "us"
# data_store_id = "YOUR_DATA_STORE_ID"
# healthcare_project_id = "YOUR_HEALTHCARE_PROJECT_ID"
# healthcare_location = "YOUR_HEALTHCARE_LOCATION"
# healthcare_dataset_id = "YOUR_HEALTHCARE_DATASET_ID"
# healthcare_fhir_store_id = "YOUR_HEALTHCARE_FHIR_STORE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    fhir_store_source=discoveryengine.FhirStoreSource(
        fhir_store=client.fhir_store_path(
            healthcare_project_id,
            healthcare_location,
            healthcare_dataset_id,
            healthcare_fhir_store_id,
        ),
    ),
    # Options: `FULL`, `INCREMENTAL`
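    # FULL replaces the data store contents and can delete documents that are
    # missing from the source; INCREMENTAL inserts or updates documents. The
    # REST examples on this page use FULL for the one-time batch import.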
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
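
If you don't wait on operation.result(), you can check the import operation later by its name. The following is a minimal sketch; it assumes the operation name printed by the sample above and the standard get_operation mixin on the generated client.

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Set this to the operation name printed by the import sample.
# operation_name = "projects/PROJECT_ID/locations/us/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/operations/IMPORT_OPERATION_ID"

client = discoveryengine.DocumentServiceClient(
    client_options=ClientOptions(api_endpoint="us-discoveryengine.googleapis.com")
)

# Retrieve the long-running operation by name and check whether it's complete.
operation = client.get_operation(request={"name": operation_name})
print(f"Operation done: {operation.done}")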

Verify data store creation and FHIR data import

This task shows you how to verify that the data store was created and that the FHIR data was imported successfully.

  • In the Google Cloud console: Select the data store and verify its details.
  • Through the REST API:
    1. Use the dataStores.get method to get the healthcare data store details.
    2. Use the operations.get method to get the details of the import operation.

To verify data store creation and data import, complete the following steps.

Console

  1. In the Google Cloud console, go to the Agent Builder page.

  2. In the navigation menu, click Data Stores.

    The Data Stores page displays a list of data stores in your Google Cloud project with their details.

  3. Verify that the data store or the data connector that you created appears in the data stores list.

  4. Select the data store or the data connector and verify its details.

    • For a data store:
      • The summary table lists the following details:
        • The data store ID, type, and region.
        • The number of documents, which indicates the number of imported FHIR resources.
        • The timestamp when the last document was imported.
        • Optionally, click View details to see the document import details, such as the details about a successful, partial, or failed import.
      • The Documents tab lists the resource IDs of the imported FHIR resources and their resource types in a paginated table. You can filter this table to verify whether a particular resource was imported.
      • The Activity tab lists the document import details, such as the details about a successful, partial, or failed import.
    • For a data connector:
      • The summary table lists the following details:
        • The collection ID, type, and region.
        • The name of the connected app.
        • The state of the connector, which is either active or paused.
      • The Entities table shows the entity within the data connector. The entity's name is the source FHIR store name. The entity's ID is the data connector's ID appended with the source FHIR store name.
        • Click the entity name to see its details. Because an entity is a data store instance within a data connector, the entity details are the same as the data store details.
  5. In the Schema tab, view the properties for the supported FHIR resources and elements. Click Edit to configure the schema. This is a Private preview feature. To set up a configurable schema, contact your customer engineer.

REST

  1. Verify the data store creation.

    curl -X GET -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json"\
     -H "X-Goog-User-Project: PROJECT_ID" \
     "https://us-discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/us/collections/default_collection/dataStores/DATA_STORE_ID"
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
  2. Verify whether the FHIR data import operation is complete.

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://us-discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/us/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/operations/IMPORT_OPERATION_ID"
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • IMPORT_OPERATION_ID: the operation ID of the long-running operation that's returned when you call the documents.import method.
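
When the import operation finishes, its done field is set to true, and the operation metadata includes the counts of successfully imported and failed documents.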

What's next