How do you convert a pandas DataFrame to a tensorflow.python.data.ops.dataset_ops.PrefetchDataset?

Suppose I have the following TensorFlow dataset:

import tensorflow_datasets as tfds
(raw_train_ds, raw_val_ds, raw_test_ds), info = tfds.load('ag_news_subset',
                                                          split=['train[:90%]',
                                                                 'train[-90%:]',
                                                                 'test'],
                                                          with_info=True)

The type of raw_train_ds is tensorflow.python.data.ops.dataset_ops.PrefetchDataset

I need to apply the remove_stop_words() function shown below to the description feature of the dataset. To do that, I first convert the dataset to a DataFrame with the following code:

train_sample_df = \
    tfds.as_dataframe(raw_train_ds.shuffle(batch_size),
                      ds_info=info)[['description', 'label']]

Then I apply remove_stop_words() to this DataFrame as follows:

def remove_stop_words(tweet):
    tweet = tweet.decode("utf-8")
    #print(tweet," ",type(tweet))
    stopwords = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at",
                 "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did",
                 "do", "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have",
                 "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself",
                 "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "it", "it's",
                 "its", "itself", "let's", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only",
                 "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "she", "she'd",
                 "she'll", "she's", "should", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs",
                 "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're",
                 "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "we",
                 "we'd", "we'll", "we're", "we've", "were", "what", "what's", "when", "when's", "where", "where's",
                 "which", "while", "who", "who's", "whom", "why", "why's", "with", "would", "you", "you'd", "you'll",
                 "you're", "you've", "your", "yours", "yourself", "yourselves"]
    tweet = tweet.lower()
    words = tweet.split(' ')
    non_stop_words = [w for w in words if w not in stopwords]
    return (" ").join(non_stop_words)

import numpy as np  # needed for the np.nan check below

train_sample_df['description'] = train_sample_df['description'].apply(
    lambda tweet: remove_stop_words(tweet) if tweet is not np.nan else tweet)
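
For reference, here is a minimal, self-contained sketch of the same filtering step that can be tried without downloading the dataset. The name remove_stop_words_sketch and the abbreviated STOPWORDS set are hypothetical stand-ins, not the full list above. It assumes the input may be either bytes (as returned by tfds.as_dataframe) or str, and it uses split() instead of split(' '), so runs of whitespace are collapsed.

```python
# Abbreviated, hypothetical stop-word set; the question uses a much longer list.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def remove_stop_words_sketch(text, stopwords=STOPWORDS):
    """Lowercase the text and drop any word found in `stopwords`."""
    # tfds.as_dataframe yields bytes for string features, so decode when needed.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    words = text.lower().split()
    return " ".join(w for w in words if w not in stopwords)


print(remove_stop_words_sketch(b"The price of oil is rising"))  # price oil rising
```

Using a set rather than a list makes each membership check constant time, which matters when the function is applied to every row.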

Finally, I need to convert train_sample_df back to a tensorflow.python.data.ops.dataset_ops.PrefetchDataset, but I don't know how to do it.

Any ideas?

Solution:

Try tf.data.Dataset.from_tensor_slices, then chain whatever transformations you need. Calling prefetch() last is what gives you a prefetching dataset again:

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((train_sample_df['description'], train_sample_df['label'])).prefetch(10) # call batch, shuffle etc.
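
As a rough end-to-end illustration of this approach, the sketch below builds a small DataFrame and converts it the same way. The two sample rows are hypothetical stand-ins for train_sample_df after stop-word removal, and the batch size and shuffle buffer are arbitrary choices, not values from the question.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for train_sample_df after stop-word removal.
train_sample_df = pd.DataFrame({
    "description": ["stocks rally markets close higher",
                    "team wins final overtime"],
    "label": [2, 1],
})

# Slice the two columns into (description, label) pairs.
dataset = tf.data.Dataset.from_tensor_slices(
    (train_sample_df["description"].tolist(),
     train_sample_df["label"].to_numpy())
)

# Shuffle, batch, and prefetch; prefetch() is the last step in the chain.
dataset = (dataset
           .shuffle(buffer_size=len(train_sample_df))
           .batch(2)
           .prefetch(tf.data.AUTOTUNE))

# Inspect the structure and one batch.
print(dataset.element_spec)
for descriptions, labels in dataset.take(1):
    print(descriptions.numpy(), labels.numpy())
```

Note that this yields (description, label) tuples, whereas the dataset returned by tfds.load yields dictionaries keyed by feature name, so downstream code that indexes by key would need adjusting. The exact class name printed by type(dataset) may also differ between TensorFlow versions.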