Tensorflow Training Model On Image And Text Features, With Multi Class Outputs
I have a dataset that includes both images and text features. The labels for the training data are a 2-dimensional array of 1s/0s, the same shape as the input images. So basically,
Solution 1:
One way is to define two independent sub-models for processing text and image data and then merge the output of those sub-models to create the final model:
    ---------------         ---------------
    - Input Image -         - Input Text  -
    ---------------         ---------------
           |                       |
           |                       |
           |                       |
    ---------------     ---------------------
    - Image Model -     -    Text Model     -
    - (e.g. CNNs) -     - (e.g. Embeddings, -
    ---------------     -   LSTM, Conv1D)   -
            \           ---------------------
             \                 /
              \               /
               \             /
          ----------------------
          -        Merge       -
          - (e.g. concatenate) -
          ----------------------
                    |
                    |
                    |
          ----------------------
          -      Upsample      -
          - (e.g. Dense layer, -
          -   transpose-conv)  -
          ----------------------
                    |
                    |
                    |
               -----------
               -  Output -
               -----------
Each of those boxes corresponds to one or several layers, and there are different ways of implementing them and setting their parameters, though I have noted some suggestions in each box.
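As one possible realization of the diagram above, here is a minimal sketch using the Keras functional API. All shapes (64x64 RGB images, 100-token text sequences, a vocabulary of 5000) are illustrative assumptions, not values from the question; the key point is the two sub-models, the merge, and the transpose-convolution upsampling back to the input resolution.

```python
# Sketch of the two-branch architecture; shapes/hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Image branch: a small CNN that downsamples 64x64x3 -> 16x16x32.
image_in = layers.Input(shape=(64, 64, 3), name="image")
x = layers.Conv2D(16, 3, activation="relu", padding="same")(image_in)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)  # -> (16, 16, 32)

# Text branch: embedding + LSTM producing a single feature vector.
text_in = layers.Input(shape=(100,), name="text")
t = layers.Embedding(input_dim=5000, output_dim=32)(text_in)
t = layers.LSTM(64)(t)  # -> (64,)

# Merge: tile the text vector over the image feature map, then concatenate.
t = layers.RepeatVector(16 * 16)(t)
t = layers.Reshape((16, 16, 64))(t)
merged = layers.Concatenate()([x, t])  # -> (16, 16, 96)

# Upsample back to input resolution with transpose convolutions.
u = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(merged)
u = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(u)

# Per-pixel output: a 1s/0s mask with the same spatial shape as the input image.
out = layers.Conv2D(1, 1, activation="sigmoid", name="mask")(u)

model = models.Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Sanity check on random data.
imgs = np.random.rand(2, 64, 64, 3).astype("float32")
txts = np.random.randint(0, 5000, size=(2, 100))
pred = model.predict([imgs, txts], verbose=0)
print(pred.shape)  # (2, 64, 64, 1)
```

The `RepeatVector`/`Reshape` trick is just one way to merge a vector with a spatial feature map; alternatives include flattening the image features and concatenating before a `Dense` layer, then reshaping for the upsampling stage.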