Shaping a tf.data.Dataset as input for an LSTM layer fails with incompatible dimensions

I’m trying to build a neural network that predicts the next number in a simple sequence of numbers. I take input sequences of length 3 and put them in a tf.data.Dataset, but when I feed this to an LSTM layer I get the following error:

ValueError: Input 0 of layer "lstm" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 3)

After building the simplest tf.data.Dataset I can imagine, with just 4 samples, I try to feed it into an LSTM with 64 hidden units. Since each sequence has 3 steps, I’m shaping the input as (2, 3, 1) (batch size = 2, steps = 3, features = 1). From the data I constructed, every dataset element should be ((3, 1), (1,)), so after batching the first layer should receive (2, 3, 1). Why is that not happening?
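The element shape the dataset actually produces can be checked directly with `element_spec`. This minimal sketch, assuming TensorFlow 2.x, rebuilds the same dataset: `from_tensor_slices` splits along the first axis only, so each element is `(3,)` rather than `(3, 1)`, and `.batch(2)` turns that into the rank-2 `(None, 3)` reported in the error.

```python
import tensorflow as tf

inputs = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
outputs = [[4], [5], [6], [7]]

# from_tensor_slices slices along axis 0 only; it does not add a feature axis.
unbatched = tf.data.Dataset.from_tensor_slices((inputs, outputs))
x_spec, y_spec = unbatched.element_spec
print(x_spec.shape, y_spec.shape)  # (3,) (1,)

# batch() prepends an unknown batch dimension, so the features are rank 2.
batched = unbatched.batch(2)
bx_spec, by_spec = batched.element_spec
print(bx_spec.shape, by_spec.shape)  # (None, 3) (None, 1)
print(bx_spec.dtype)  # <dtype: 'int32'>
```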

But I cannot see why this would happen for such a simple setup:

import tensorflow as tf
from keras.layers import LSTM, Dense

inputs = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
outputs = [[4], [5], [6], [7]]

dataset = tf.data.Dataset.from_tensor_slices((inputs, outputs)).batch(2)


class Model(tf.keras.Model):
    def __init__(self, input_size, hidden_size, num_classes, steps, batch_size):
        super(Model, self, ).__init__()
        self.lstm = LSTM(hidden_size, input_shape=(batch_size,steps, input_size))
        self.fc = Dense(num_classes)

    def call(self, input, training=False):
        out = self.lstm(input)
        out = self.fc(out)
        return out


model = Model(input_size=1, hidden_size=64, num_classes=4, steps=3, batch_size=2)
model.build(input_shape=(2, 3, 1,))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics='accuracy')
model.summary()
model.fit(dataset, epochs=2)

tf.data.Dataset doesn’t have a reshape method, so I cannot follow most of the other answers on SO. What is missing to make the input fit the LSTM?

Solution:

Try adding the missing feature dimension to your data: an LSTM layer expects 3-D input of shape (batch, timesteps, features), and its input_shape argument is (timesteps, features), without the batch size. The layer also requires floating-point data, not integers. Also, if you use categorical_crossentropy, your labels need to be one-hot encoded. Otherwise, use sparse_categorical_crossentropy and make sure your labels are class indices starting at 0, not 4:
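These shape, dtype and label requirements can be seen on a single batch. This is a minimal sketch assuming TensorFlow 2.x; the tensor `x` is a hypothetical batch of two of the question’s sequences, and shifting the original targets down by 4 is just one illustrative way to get labels that start at 0.

```python
import tensorflow as tf

# Hypothetical batch: two of the question's sequences, shape (2, 3), dtype int32.
x = tf.constant([[1, 2, 3], [2, 3, 4]])

# Add a trailing feature axis and cast to float32: (2, 3) -> (2, 3, 1).
x3 = tf.cast(x[..., tf.newaxis], tf.float32)

# The LSTM maps (batch, timesteps, features) to (batch, units).
lstm = tf.keras.layers.LSTM(64)
h = lstm(x3)
print(x3.shape, h.shape)  # (2, 3, 1) (2, 64)

# For sparse_categorical_crossentropy, targets must be indices in [0, num_classes).
# Shifting the original targets 4..7 down by 4 gives 0..3 (an illustrative choice).
y = tf.constant([[4], [5], [6], [7]]) - 4
print(y.numpy().ravel().tolist())  # [0, 1, 2, 3]
```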

import tensorflow as tf
from keras.layers import LSTM, Dense

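# LSTM inputs must be floating point
inputs = tf.constant([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]], dtype=tf.float32)
# Sparse class indices in [0, num_classes)
outputs = tf.constant([[0], [1], [2], [3]])

# inputs[..., None] adds the feature axis: (4, 3) -> (4, 3, 1), so each batch is (2, 3, 1)
dataset = tf.data.Dataset.from_tensor_slices((inputs[..., None], outputs)).batch(2)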

class Model(tf.keras.Model):
    def __init__(self, input_size, hidden_size, num_classes, steps):
        super(Model, self, ).__init__()
        self.lstm = LSTM(hidden_size, input_shape=(steps, input_size))
        self.fc = Dense(num_classes)

    def call(self, input, training=False):
        out = self.lstm(input)
        out = self.fc(out)
        return out


model = Model(input_size=1, hidden_size=64, num_classes=4, steps=3)
model.build(input_shape=(2, 3, 1,))
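# Dense(num_classes) has no softmax, so the loss should treat its outputs as logits
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])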
model.summary()
model.fit(dataset, epochs=2)
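Although tf.data.Dataset has no reshape method, its elements can be reshaped inside `Dataset.map`. The sketch below assumes TensorFlow 2.x and uses a hypothetical helper `add_feature_axis`; it yields the same float32 `(batch, 3, 1)` batches as `inputs[..., None]`, but also works on a dataset that has already been created from integer lists.

```python
import tensorflow as tf

inputs = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
outputs = [[0], [1], [2], [3]]

def add_feature_axis(x, y):
    # Hypothetical helper: x arrives as (3,) int32 and leaves as (3, 1) float32.
    return tf.expand_dims(tf.cast(x, tf.float32), axis=-1), y

dataset = (
    tf.data.Dataset.from_tensor_slices((inputs, outputs))
    .map(add_feature_axis)
    .batch(2)
)
x_spec, y_spec = dataset.element_spec
print(x_spec.shape, x_spec.dtype)  # (None, 3, 1) <dtype: 'float32'>
```

Mapping before `batch` keeps the helper simple, since it only ever sees one unbatched sequence.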

If you want to use categorical_crossentropy as your loss function instead, one-hot encode the labels when building the dataset (and switch the loss in model.compile accordingly):

dataset = tf.data.Dataset.from_tensor_slices((inputs[..., None], tf.keras.utils.to_categorical(outputs, 4))).batch(2)
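With one-hot targets, the loss passed to `compile` has to change as well. This is a minimal end-to-end sketch assuming TensorFlow 2.x; for brevity it uses a `Sequential` model with the same layers as the subclassed one, and `from_logits=True` is an assumption based on the final `Dense` layer having no softmax activation.

```python
import numpy as np
import tensorflow as tf

inputs = tf.constant([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]], dtype=tf.float32)
outputs = tf.constant([[0], [1], [2], [3]])

# One-hot encode the class indices: (4, 1) -> (4, 4).
one_hot = tf.keras.utils.to_categorical(outputs, 4)
dataset = tf.data.Dataset.from_tensor_slices((inputs[..., None], one_hot)).batch(2)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 1)),   # (timesteps, features), batch size omitted
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(4),       # raw logits, no softmax
])
# One-hot targets pair with categorical (not sparse) crossentropy.
model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
history = model.fit(dataset, epochs=2, verbose=0)
print(np.asarray(one_hot).shape, len(history.history["loss"]))  # (4, 4) 2
```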
