Keras Pad_sequences Throwing Invalid Literal For Int () With Base 10
Traceback (most recent call last): File '.\keras_test.py', line 62, in X_train = sequence.pad_sequences(X_train, maxlen=max_review_length) File 'C:\P
Solution 1:
The pad_sequences function uses 'int32' as its default data type:
keras.preprocessing.sequence.pad_sequences(sequences, maxlen=None, dtype='int32',
padding='pre', truncating='pre', value=0.)
The data you're passing contains strings instead, and those cannot be converted to that integer type.
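The message in the title is the one Python itself raises when a non-numeric string is converted to an integer. A minimal sketch in plain Python (no Keras needed) that reproduces the same message; the helper name and the sample strings are made up for illustration:

```python
def to_int_or_error(token):
    """Try to convert a token to int; return the error message on failure."""
    try:
        return int(token)
    except ValueError as exc:
        # Same ValueError family as the one in the traceback above.
        return str(exc)


# A numeric string converts cleanly.
print(to_int_or_error("42"))

# A word does not: the message starts with
# "invalid literal for int() with base 10".
print(to_int_or_error("great"))
```

If your training data prints messages like the second one, it still contains raw text rather than integer ids.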
Beyond that, you can't feed strings to a Keras model.
You must "tokenize" those strings first. Even if padding strings seems possible, you would then have to decide which character to pad with:
- A space? But spaces may be meaningful characters
- A Null character? The best idea, but how to increase the length of a string with null characters?
- What if you're working with words instead of chars, where each token/id has a different string length?
That's why you must create a dictionary of integer id values representing each char or word in your existing data, and transform all your strings into lists of ids.
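A rough sketch of such a dictionary in plain Python, assuming word-level tokens split on whitespace. The function names, the reserved ids (0 for padding, 1 for unknown words) and the sample sentences are all assumptions made for this illustration, not part of Keras:

```python
def build_vocab(sentences):
    """Map each distinct word to an integer id, in order of first appearance."""
    # Hypothetical convention: 0 is reserved for padding, 1 for unknown words.
    vocab = {"<pad>": 0, "<unk>": 1}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of ids, using the <unk> id for unseen words."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]


reviews = ["I am happy", "I am sad and tired"]
vocab = build_vocab(reviews)
encoded = [encode(review, vocab) for review in reviews]
print(encoded)  # [[2, 3, 4], [2, 3, 5, 6, 7]]
```

Lists of integers like `encoded` are the kind of input pad_sequences expects.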
Then you'd probably benefit from starting the model with an Embedding layer.
For example, if you're working with word ids:
Word 0: null word
Word 1: end of sentence
Word 2: space character (maybe not important to some languages)
Word 3: a
Word 4: added
Word 5: am
Word 6: and
....
Word 520: plus
Word 2014: 've
etc.....
Then your sentence would be a list with: [520, 2014, 4, ....]
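Once every sentence is a list of integer ids, padding is well defined. The following is a rough plain-Python sketch of what the defaults in the signature above (padding='pre', truncating='pre', value=0) amount to. It is an illustration with a hypothetical helper name, not the library's implementation, and the real function returns a NumPy array rather than nested lists:

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (and left-truncate) lists of ids to a fixed length.

    Assumes maxlen >= 1.
    """
    padded = []
    for seq in sequences:
        # 'pre' truncation drops ids from the start, keeping the last maxlen.
        kept = list(seq)[-maxlen:]
        # 'pre' padding puts the fill value in front of the kept ids.
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad_pre([[520, 2014, 4], [3, 5]], maxlen=4))
# [[0, 520, 2014, 4], [0, 0, 3, 5]]
```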