Add Gaussian noise to a TensorFlow Dataset

I have a CSVDataset with around 6 million rows. For the purposes of this question, I am creating a TensorSliceDataset as follows:

import tensorflow as tf
import numpy as np

datasetz = tf.data.Dataset.from_tensor_slices((np.random.randn(10, 5), np.random.randn(10,1)))
datasetz = datasetz.map(lambda x, y: (x, x))
datasetz

# <MapDataset element_spec=(TensorSpec(shape=(5,), dtype=tf.float64, name=None), TensorSpec(shape=(5,), dtype=tf.float64, name=None))>

I am trying to build a denoising autoencoder, so I need to add some noise to my dataset. If the dataset were a numpy.ndarray, I could add the noise like this:

corruption_level = 0.3
datasetz = datasetz + (np.random.randn(10, 5) * corruption_level)

But I don’t know how to do it with a CSVDataset object.

>Solution :

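This adds zero-mean Gaussian noise, scaled by corruption_level, to each row:

corruption_level = 0.3
datasetz = tf.data.Dataset.from_tensor_slices((np.random.randn(10, 5), np.random.randn(10,1)))
# tf.random.normal draws Gaussian samples; the noise is re-drawn for every element
datasetz = datasetz.map(lambda x, y: (x + corruption_level * tf.random.normal(shape=(5,), dtype=tf.float64), y))
datasetz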
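
For a denoising autoencoder, the target is usually the clean input rather than a separate label, so the mapped element would be (noisy x, clean x). The following is a minimal sketch of that idea, not a definitive implementation. The helper name add_gaussian_noise and the batch size of 4 are assumptions made for illustration; corruption_level = 0.3 is taken from the question. Using tf.shape(x) and x.dtype avoids hard-coding the feature width of 5.

```python
import numpy as np
import tensorflow as tf

corruption_level = 0.3  # noise scale, as in the question

def add_gaussian_noise(x, y):
    # Hypothetical helper: draw standard normal noise matching x's shape and dtype.
    noise = tf.random.normal(shape=tf.shape(x), dtype=x.dtype)
    # Noisy features become the input, clean features become the target.
    # The original label y is dropped because the autoencoder does not use it.
    return x + corruption_level * noise, x

features = np.random.randn(10, 5)
labels = np.random.randn(10, 1)

noisy_ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .map(add_gaussian_noise)
    .batch(4)
)

for noisy_x, clean_x in noisy_ds.take(1):
    print(noisy_x.shape, clean_x.shape)
```

Because the noise is generated inside map, it is sampled again each time the dataset is iterated, so every epoch sees a different corruption of the same rows. Placing .cache() after the map call would freeze one particular noise sample instead.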
