Error when Trying to Load Data Into Azure Cognitive Search Index (AttributeError: 'str' object has no attribute 'get')

by MR

October 23, 2023

I am trying to load data (with embeddings) into my Azure Cognitive Search index. This is my process after adding the embedding fields to my Pandas dataframe:

input_data = df.to_json() # Where df is the Pandas dataframe with embedding fields

# Use SearchIndexingBufferedSender to upload the documents in batches optimized for indexing  
with SearchIndexingBufferedSender(  
    endpoint=service_endpoint,  
    index_name=index_name,  
    credential=credential,  
) as batch_client:  
    # Add upload actions for all documents  
    batch_client.upload_documents(documents=input_data)  
print(f"Uploaded {len(input_data)} documents in total")

I am getting the following error:

File /packages/azure/search/documents/_search_indexing_buffered_sender.py:322, in SearchIndexingBufferedSender._retry_action(self, action)
    320     self._callback_fail(action)
    321     return
--> 322 key = action.additional_properties.get(self._index_key)
    323 counter = self._retry_counter.get(key)
    324 if not counter:
    325     # first time that fails

AttributeError: 'str' object has no attribute 'get'

Since my input data is relatively small, I have also tried loading the data without batches:

search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(input_data, timeout = 50)

And this gives me a different error:

File /packages/azure/search/documents/_generated/operations/_documents_operations.py:1251, in DocumentsOperations.index(self, batch, request_options, **kwargs)
   1249     map_error(status_code=response.status_code, response=response, error_map=error_map)
   1250     error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)
-> 1251     raise HttpResponseError(response=response, model=error)
   1253 if response.status_code == 200:
   1254     deserialized = self._deserialize("IndexDocumentsResult", pipeline_response)

HttpResponseError: () The request is invalid. Details: A null value was found with the expected type 'search.documentFields[Nullable=False]'. The expected type 'search.documentFields[Nullable=False]' does not allow null values.
Code: 
Message: The request is invalid. Details: A null value was found with the expected type 'search.documentFields[Nullable=False]'. The expected type 'search.documentFields[Nullable=False]' does not allow null values.

But my dataframe does not have any empty values, so that makes me think there is something wrong with the format of the file I am sending. I have tried both of these with no success:

input_data = df.to_json()
input_data = df.to_json(orient="records")

Here is my index definition:


index_client = SearchIndexClient(
    endpoint=service_endpoint, credential=credential)

fields = [
    SimpleField(name="Id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
    SearchableField(name="Field1", type=SearchFieldDataType.String),
    SearchableField(name="Field2", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field3", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field4", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field5", type=SearchFieldDataType.String, filterable=True),
    SearchField(name="Field4_vec", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=384, vector_search_profile="myHnswProfile"),
    SearchField(name="Field5_vec", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=384, vector_search_profile="myHnswProfile")
]

# Configure the vector search configuration  
vector_search = VectorSearch(
    algorithms=[
        HnswVectorSearchAlgorithmConfiguration(
            name="myHnsw",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric="cosine"
            )
        ),
        ExhaustiveKnnVectorSearchAlgorithmConfiguration(
            name="myExhaustiveKnn",
            kind=VectorSearchAlgorithmKind.EXHAUSTIVE_KNN,
            parameters=ExhaustiveKnnParameters(
                metric="cosine"
            )
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm="myHnsw",
        ),
        VectorSearchProfile(
            name="myExhaustiveKnnProfile",
            algorithm="myExhaustiveKnn",
        )
    ]
)

# Create the search index 
index = SearchIndex(name=index_name, fields=fields,
                    vector_search=vector_search)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')

I am unable to post a sample of the data, but it is a Pandas dataframe with the same fields as the index:

Id (string)
Field1 (string)
Field2 (string)
Field3 (string)
Field4 (string)
Field5 (string)
Field4_vec (contents in the shape of [-0.01168345008045435, -0.0396871380507946, -0...]) with dimension 384
Field5_vec (contents in the shape of [-0.01168345008045435, -0.0396871380507946, -0...]) with dimension 384

Any advice is appreciated. Thanks!

Solution:

For the first error: the documents parameter of batch_client.upload_documents expects a list of dictionaries, not a JSON string. Try converting your dataframe to a list of dictionaries with input_data = df.to_dict(orient="records").
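The difference between the two payload types can be checked without a search service. The following is a minimal sketch, assuming a records-oriented JSON string like the one df.to_json(orient="records") returns; the sample values are invented for illustration and are not the asker's data. It also shows why len(input_data) in the question would count characters rather than documents.

```python
import json

# Stand-in for the string returned by df.to_json(orient="records").
# The field values are made up for illustration.
json_payload = (
    '[{"Id": "1", "Field1": "alpha", "Field4_vec": [0.1, 0.2]}, '
    '{"Id": "2", "Field1": "beta", "Field4_vec": [0.3, 0.4]}]'
)

# Iterating over a string yields single characters, not documents.
first_item_of_string = next(iter(json_payload))

# Parsing the string gives a list of dictionaries, one per row.
documents = json.loads(json_payload)

print(type(first_item_of_string).__name__)  # the element type of the string
print(type(documents[0]).__name__)          # the element type of the parsed list
print(len(json_payload), len(documents))    # characters vs. documents
```

With a dataframe at hand, df.to_dict(orient="records") produces the list of dictionaries directly, so the round trip through a JSON string is not needed.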

For the second error, you may be right that the null values are being detected because of the format of the data you are uploading. Note that your vector fields can be an empty array [], but they can't be null.
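To rule out nulls before uploading, the list of dictionaries can be scanned for missing values. This is a rough sketch, not part of the Azure SDK: find_null_fields and fill_empty_vectors are hypothetical helper names, and treating NaN as a null is an assumption based on how pandas usually represents missing values.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (NaN handling is an assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def find_null_fields(documents):
    """Return (document index, field name) pairs whose value is missing."""
    problems = []
    for position, doc in enumerate(documents):
        for name, value in doc.items():
            if is_missing(value):
                problems.append((position, name))
    return problems


def fill_empty_vectors(documents, vector_fields):
    """Return copies of the documents with missing vector fields set to []."""
    cleaned = []
    for doc in documents:
        fixed = dict(doc)  # shallow copy so the input list is left unchanged
        for name in vector_fields:
            if is_missing(fixed.get(name)):
                fixed[name] = []
        cleaned.append(fixed)
    return cleaned


# Invented sample rows using the field names from the question.
sample = [
    {"Id": "1", "Field4_vec": [0.1, 0.2], "Field5_vec": None},
    {"Id": "2", "Field4_vec": float("nan"), "Field5_vec": [0.3, 0.4]},
]

print(find_null_fields(sample))
cleaned = fill_empty_vectors(sample, ["Field4_vec", "Field5_vec"])
print(find_null_fields(cleaned))
```

If the first call reports nothing for your real data, the nulls in the error message are more likely caused by the payload format than by the dataframe contents.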