Keras Pad_sequences Throwing Invalid Literal For Int () With Base 10

May 08, 2024

Traceback (most recent call last):
  File '.\keras_test.py', line 62, in
    X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
  File 'C:\P

Solution 1:

The pad_sequences function uses 'int32' as its default data type (dtype):

keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32', 
                                           padding='pre', truncating='pre', value=0.)
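To make the padding, truncating and value parameters concrete, here is a hypothetical pure-Python approximation (pad_like_keras is an illustrative name, not part of Keras). It assumes maxlen > 0 and integer input, and it casts every element with int() to mirror the default dtype='int32'. It is a sketch of the behaviour, not Keras's actual implementation.

```python
def pad_like_keras(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical approximation of pad_sequences' padding/truncation options.

    Assumes maxlen > 0 and sequences of integers.
    """
    result = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' drops items from the start, 'post' drops them from the end
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # 'pre' puts the fill before the sequence, 'post' puts it after
        row = fill + seq if padding == "pre" else seq + fill
        # int() stands in for the cast to the default int32 dtype
        result.append([int(x) for x in row])
    return result


data = [[1, 2], [3, 4, 5, 6]]
print(pad_like_keras(data, maxlen=3))
# [[0, 1, 2], [4, 5, 6]]
print(pad_like_keras(data, maxlen=3, padding="post", truncating="post"))
# [[1, 2, 0], [3, 4, 5]]
```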

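In your case, the sequences contain strings rather than integers. pad_sequences casts every element to that int32 dtype, and casting a word such as 'great' to an integer fails with "invalid literal for int() with base 10".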
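A minimal pure-Python illustration of the failing conversion (cast_to_ints is a hypothetical helper, not Keras's own code path): Python's int() raises exactly the message from the title when it is handed a word.

```python
def cast_to_ints(sequences):
    """Hypothetical helper: cast every token to int, as an int32 dtype requires."""
    return [[int(token) for token in seq] for seq in sequences]


print(cast_to_ints([["1", "2"]]))  # [[1, 2]]  (digit strings happen to convert)

try:
    cast_to_ints([["this", "movie", "was", "great"]])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'this'
```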

On top of that, you can't feed raw strings into a Keras model.

You must "tokenize" those strings first. Even if pad_sequences could pad strings, you would then have to decide which character to pad them with:

A space? But spaces may be meaningful characters.
A null character? The best idea, but how do you increase the length of a string with null characters?
What if you're working with words instead of characters, where each token has a different string length?
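The problems in the list above can be seen with plain Python's str.rjust, used here only as a stand-in for "pre" padding of raw strings (Keras does nothing like this): space padding makes different strings identical, and null-character padding glues onto the first word.

```python
# 'pre' padding to length 4 with spaces: the original leading space is lost
print("hi".rjust(4) == " hi".rjust(4))  # True, both are '  hi'

# A null character keeps those two strings apart...
print("hi".rjust(4, "\0") == " hi".rjust(4, "\0"))  # False

# ...but at word level the padding sticks to the first word
words = "a cat".rjust(7, "\0").split()  # null is not whitespace, so no split there
print(words[0] == "a")  # False, the first "word" is now '\x00\x00a'
```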

That's why you must create a dictionary that maps each character or word in your data to an integer id, and then transform all your strings into lists of ids.
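A minimal word-level sketch of that dictionary, under these assumptions: lower-cased whitespace splitting, id 0 reserved for padding (matching pad_sequences' default value=0) and a hypothetical id 1 for words never seen while building the vocabulary. build_vocab and texts_to_ids are illustrative names; Keras 2 also provides keras.preprocessing.text.Tokenizer (fit_on_texts, texts_to_sequences) for the same job.

```python
UNK_ID = 1  # hypothetical id for unknown words; 0 is left free for padding


def build_vocab(texts):
    """Give each distinct lower-cased word an id, starting at 2."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 2
    return vocab


def texts_to_ids(texts, vocab):
    """Replace every word with its id, falling back to UNK_ID."""
    return [[vocab.get(word, UNK_ID) for word in text.lower().split()] for text in texts]


train = ["I am happy", "I am not happy at all"]
vocab = build_vocab(train)
print(vocab)  # {'i': 2, 'am': 3, 'happy': 4, 'not': 5, 'at': 6, 'all': 7}
print(texts_to_ids(train, vocab))  # [[2, 3, 4], [2, 3, 5, 4, 6, 7]]
print(texts_to_ids(["I am sad"], vocab))  # [[2, 3, 1]]
```

Because the result is a list of integer lists, it is the kind of input pad_sequences expects.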


Then you'd probably benefit from starting the model with an Embedding layer.
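Conceptually, an Embedding layer (in Keras, Embedding(input_dim=vocab_size, output_dim=dim)) is a trainable lookup table: each integer id selects one row of weights, so every id must be smaller than input_dim. The pure-Python sketch below shows only the lookup; the random initial values and the helper names are assumptions, and a real layer learns these weights during training.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """One row of `dim` small random floats per id (a real layer trains these)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(batch_ids, table):
    """Look up each integer id; the output has shape (batch, seq_len, dim)."""
    return [[table[i] for i in seq] for seq in batch_ids]


table = make_embedding_table(vocab_size=8, dim=4)
vectors = embed([[0, 0, 2, 3, 4]], table)  # a padded sentence of ids
print(len(vectors), len(vectors[0]), len(vectors[0][0]))  # 1 5 4
```

This also shows why raw strings cannot go in: a word cannot index a row of the table, but its integer id can.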

For example, if you're working with word ids:

Word 0: null word
Word 1: end of sentence
Word 2: space character (maybe not important to some languages)
Word 3: a
Word 4: added
Word 5: am
Word 6: and
....
Word 520: plus
Word 2014: 've
etc.....

Then a sentence would become a list of ids such as [520, 2014, 4, ....]
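A small worked example using only the ids from the vocabulary list (the key names "<null>" and "<eos>" and the token sequence are hypothetical). Because word 0 is the null word, it doubles as the padding value, which matches pad_sequences' default value=0.

```python
# Partial vocabulary: only the ids listed in the example above
vocab = {
    "<null>": 0, "<eos>": 1, " ": 2, "a": 3, "added": 4,
    "am": 5, "and": 6, "plus": 520, "'ve": 2014,
}

tokens = ["plus", "'ve", "added"]  # hypothetical tokenized sentence
ids = [vocab[t] for t in tokens]
print(ids)  # [520, 2014, 4]

# 'pre' padding to a fixed length with the null word's id
maxlen = 5
padded = [vocab["<null>"]] * (maxlen - len(ids)) + ids
print(padded)  # [0, 0, 520, 2014, 4]
```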
