Error when Trying to Load Data Into Azure Cognitive Search Index (AttributeError: 'str' object has no attribute 'get')

I am trying to load data (with embeddings) into my Azure Cognitive Search index. This is my process after adding the embedding fields to my Pandas dataframe:

input_data = df.to_json()  # where df is the Pandas dataframe with the embedding fields

# Use SearchIndexingBufferedSender to upload the documents in batches optimized for indexing  
with SearchIndexingBufferedSender(  
    endpoint=service_endpoint,  
    index_name=index_name,  
    credential=credential,  
) as batch_client:  
    # Add upload actions for all documents  
    batch_client.upload_documents(documents=input_data)  
print(f"Uploaded {len(input_data)} documents in total")

I am getting the following error:

File /packages/azure/search/documents/_search_indexing_buffered_sender.py:322, in SearchIndexingBufferedSender._retry_action(self, action)
    320     self._callback_fail(action)
    321     return
--> 322 key = action.additional_properties.get(self._index_key)
    323 counter = self._retry_counter.get(key)
    324 if not counter:
    325     # first time that fails

AttributeError: 'str' object has no attribute 'get'

Since my input data is relatively small, I have also tried loading the data without batches:


search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(input_data, timeout=50)

And this gives me a different error:

File /packages/azure/search/documents/_generated/operations/_documents_operations.py:1251, in DocumentsOperations.index(self, batch, request_options, **kwargs)
   1249     map_error(status_code=response.status_code, response=response, error_map=error_map)
   1250     error = self._deserialize.failsafe_deserialize(_models.SearchError, pipeline_response)
-> 1251     raise HttpResponseError(response=response, model=error)
   1253 if response.status_code == 200:
   1254     deserialized = self._deserialize("IndexDocumentsResult", pipeline_response)

HttpResponseError: () The request is invalid. Details: A null value was found with the expected type 'search.documentFields[Nullable=False]'. The expected type 'search.documentFields[Nullable=False]' does not allow null values.
Code: 
Message: The request is invalid. Details: A null value was found with the expected type 'search.documentFields[Nullable=False]'. The expected type 'search.documentFields[Nullable=False]' does not allow null values.

But my dataframe does not have any empty values, so that makes me think there is something wrong with the format of the file I am sending. I have tried both of these with no success:

input_data = df.to_json()
input_data = df.to_json(orient="records")

Here is my index definition:


index_client = SearchIndexClient(
    endpoint=service_endpoint, credential=credential)

fields = [
    SimpleField(name="Id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),
    SearchableField(name="Field1", type=SearchFieldDataType.String),
    SearchableField(name="Field2", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field3", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field4", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="Field5", type=SearchFieldDataType.String, filterable=True),
    SearchField(name="Field4_vec", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=384, vector_search_profile="myHnswProfile"),
    SearchField(name="Field5_vec", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=384, vector_search_profile="myHnswProfile")
]

# Configure the vector search configuration  
vector_search = VectorSearch(
    algorithms=[
        HnswVectorSearchAlgorithmConfiguration(
            name="myHnsw",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric="cosine"
            )
        ),
        ExhaustiveKnnVectorSearchAlgorithmConfiguration(
            name="myExhaustiveKnn",
            kind=VectorSearchAlgorithmKind.EXHAUSTIVE_KNN,
            parameters=ExhaustiveKnnParameters(
                metric="cosine"
            )
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm="myHnsw",
        ),
        VectorSearchProfile(
            name="myExhaustiveKnnProfile",
            algorithm="myExhaustiveKnn",
        )
    ]
)

# Create the search index 
index = SearchIndex(name=index_name, fields=fields,
                    vector_search=vector_search)
result = index_client.create_or_update_index(index)
print(f' {result.name} created')

I am unable to post a sample of the data, but it is a Pandas dataframe with the same fields as the index:

Id (string)
Field1 (string)
Field2 (string)
Field3 (string)
Field4 (string)
Field5 (string)
Field4_vec (contents in the shape of [-0.01168345008045435, -0.0396871380507946, -0...]) with dimension 384
Field5_vec (contents in the shape of [-0.01168345008045435, -0.0396871380507946, -0...]) with dimension 384

Any advice is appreciated. Thanks!

Solution:

For the first error: the documents parameter of batch_client.upload_documents expects a list of dictionaries, one per document, not the JSON string that df.to_json() returns. Convert your dataframe with input_data = df.to_dict(orient="records") instead.
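As a minimal sketch (field names are placeholders standing in for your real columns), the conversion looks like this:

```python
import pandas as pd

# Placeholder dataframe standing in for the real one with embedding fields
df = pd.DataFrame({
    "Id": ["1", "2"],
    "Field1": ["a", "b"],
    "Field4_vec": [[-0.011, -0.039], [0.021, 0.007]],
})

# to_dict(orient="records") yields a list of dicts, one per row, which is
# the shape upload_documents expects; to_json() yields a single string.
input_data = df.to_dict(orient="records")
print(type(input_data).__name__, type(input_data[0]).__name__)  # list dict
```

Each element of input_data can then be passed to batch_client.upload_documents(documents=input_data) unchanged.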

For the second error, you may be correct that null values are being introduced by the serialization format rather than by your data. Note that vector fields may be an empty array ([]) but cannot be null.
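One way to confirm whether nulls are sneaking in, sketched here with a hypothetical missing vector, is to check the vector columns for None/NaN before uploading and replace any missing vectors with an empty array:

```python
import pandas as pd

# Hypothetical frame where one vector is missing; pandas stores it as
# None/NaN, which serializes to JSON null and is rejected by the service.
df = pd.DataFrame({
    "Id": ["1", "2"],
    "Field4_vec": [[-0.011, -0.039], None],
})

# Count rows whose vector field is null
bad_rows = df[df["Field4_vec"].isna()]
print(len(bad_rows))

# Replace nulls with an empty array, which the index does accept
df["Field4_vec"] = df["Field4_vec"].apply(
    lambda v: v if isinstance(v, list) else []
)
```

The same check applies to every non-nullable field in the index, not only the vector columns.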
