I am trying to generate a SQL statement with python. Please check the script below:
import re
json_file_object = open("sample_json_paths.txt", "r")
sql = list()
for sample_text in json_file_object:
# sample_text = "$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Relationship.OrganizationContact.OriginalSystemReference"
sql.append("SELECT\n")
sql.append(" CONVERT(NVARCHAR(32), HashBytes('MD5', concat(rt.id,'_', @now)), 2) AS docid\n")
#Append parentnodeid row
remove_dollar_sign = sample_text.replace("$.","")
json_string_list = remove_dollar_sign.split(".")
nodeid = json_string_list.pop(-1)
parent_node_id = ".".join(json_string_list)
sql.append(" ,'" + parent_node_id + "' " + "AS parentnodeid\n")
#Append nodeid row
sql.append(" ,'" + nodeid + "' " + "AS nodeid\n")
sql.append(" ,SyncCustomerRequestABM_layer.RequestHeader AS RequestHeader\n")
sql.append(" ,final_layer.[key] AS final_layer_key\n")
sql.append(" ,final_layer.[value] AS Ofinal_layer_value\n")
sql.append("FROM dbo.cdm_json_dataset_backup rt")
sql.append("""
OUTER APPLY OPENJSON ( rt.cdm_json) AS layer_root
OUTER APPLY OPENJSON ( layer_root.value )
WITH (
[RequestHeader] NVARCHAR(MAX) AS json,
[SyncCustomerRequest] NVARCHAR(MAX) AS json
) AS SyncCustomerRequestABM_layer\n""")
#append OUTER APPLY with json sub path
remove_leading_json = sample_text.replace(".SyncCustomerRequestABM.SyncCustomerRequest","")
json_string_list = remove_leading_json.split(".")
json_string_list.pop(-1)
json_sub_path = ".".join(json_string_list)
sql.append(" OUTER APPLY OPENJSON ( SyncCustomerRequestABM_layer.SyncCustomerRequest, " + "'" + json_sub_path + "'" + ") AS final_layer\n")
sql.append(" WHERE final_layer.[key] = '" + nodeid + "'\n")
sql.append("UNION ALL\n")
# print(json_sub_path)
sql_output = "".join(sql)
f = open("sql_statements.txt", "a")
f.write(sql_output)
f.close()
print(sql_output)
The scripts gave good result when I tested for the sample_text. However, the statement has unusual line breaks when I read the input from a file and put it to the loop. For example:
SELECT
CONVERT(NVARCHAR(32), HashBytes('MD5', concat(rt.id,'_', @now)), 2) AS docid
,'SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer' AS parentnodeid
,'OriginalSystemReference
' AS nodeid
,SyncCustomerRequestABM_layer.RequestHeader AS RequestHeader
,final_layer.[key] AS final_layer_key
,final_layer.[value] AS Ofinal_layer_value
FROM dbo.cdm_json_dataset_backup rt
OUTER APPLY OPENJSON ( rt.cdm_json) AS layer_root
OUTER APPLY OPENJSON ( layer_root.value )
WITH (
[RequestHeader] NVARCHAR(MAX) AS json,
[SyncCustomerRequest] NVARCHAR(MAX) AS json
) AS SyncCustomerRequestABM_layer
OUTER APPLY OPENJSON ( SyncCustomerRequestABM_layer.SyncCustomerRequest, '$.Customers.Customer') AS final_layer
WHERE final_layer.[key] = 'OriginalSystemReference
'
UNION ALL
The line break occurs at ' AS nodeid and 'OriginalSystemReference – before the UNION ALL
The issue does not occur at the last loop.
You can also check the sample text from the input file as below:
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Accounts.Account.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Accounts.Account.PartySite.PartySiteUse.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.PartySite.PartySiteUse.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Relationship.PartySite.PartySiteUse.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Accounts.Account.ChildRelationship.PartySite.PartySiteUse.OriginalSystemReference
$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.Accounts.Account.ParentRelationship.PartySite.PartySiteUse.OriginalSystemReference
How can I fix this issue ?
Thanks
>Solution :
When iterating over the file contents sample_text contains newline characters at the end,
e.g. '$.SyncCustomerRequestABM.SyncCustomerRequest.Customers.Customer.OriginalSystemReference\n'
The issue is that the nodeid by splitting the line and taking the last element of the split.
You can fix the problem by stripping the sample_text at the beginning of each iteration:
sample_text = sample_text.strip()
The reason why it worked for the last line is that it does not contain the newline character in your file.
This will help you with your existing code but I also strongly suggest looking into better ways of generating these string:
- database libraries allow you pass parameters to SQL queries e.g. psycopg https://www.psycopg.org/docs/usage.html#passing-parameters-to-sql-queries
- using f-strings to avoid boilerplate code https://realpython.com/python-f-strings/