Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pub/Sub: How to retrieve messages sent to a dead-letter-topic and store them into a BigQuery table?

For context

I currently have a data mining script which collects a massive throughput of event data and publishes it to a Pub/Sub topic with a schema enforcement (with all fields being optional). A Pub/Sub-to-BigQuery subscription then sends the data to a destination table.

My problem

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

At present the script fetching the data sometimes also picks up events that do not conform to the schema. In order to avoid downtimes, the script has been configured to ingest erroneous messages and send them to the destination table in the form of a single field with a JSON string containing message contents while the rest of the fields for that row are nulls. This obviously poses a data quality issue, since end-users shouldn’t have to filter through so many null values. However, since the schema failure could have resulted on a variety of fields, I need to somehow retain the raw JSON string and monitor which events are causing trouble.

Potential Solution

To move toward a cleaner approach, I have been exploring the use of a dead-letter topic to which these erroneous messages are sent. The idea is to have two separate BigQuery tables:

  • One corresponding a stricter policy that filters out errors before they reach destination table.
  • One which retains only the messages from the dead-letter topic (storing the unacknowledged messages in a GCS bucket would also be an option).

This would avoid the data-quality issue, while also retain information regarding errors in a more cost-efficient way.

Question

Is what I am trying to achieve possible? I have explored Google’s documentation on dead-letter topics, but there is no details on how to retrieve unacknowledged messages.

>Solution :

A dead-letter topic is just like any other topic. Once messages are published to it, you can use a subscription that is attached to the topic to fetch the messages and do whatever you want with them. If all you want to do is write the messages to a BigQuery table, consider attaching a BigQuery subscription to your dead-letter topic.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading