
TensorFlow Training Model on Image and Text Features, with Multi-Class Outputs

I have a dataset that includes both images and text features. The labels for the training data are a 2-dimensional array of 1s/0s, the same shape as the input images. So basically,

Solution 1:

One way is to define two independent sub-models for processing text and image data and then merge the output of those sub-models to create the final model:

---------------        ---------------
- Input Image -        - Input Text  -
---------------        ---------------
       |                       |
       |                       |
       |                       |
---------------        ---------------------  
- Image Model -        -     Text Model    -
- (e.g. CNNs) -        - (e.g. Embeddings, -
---------------        -  LSTM, Conv1D)    -
       \               ---------------------
        \                     /
         \                   /
          \                 /
           \               /
            \             /
             \           /
              \         /
               \       /
           ----------------------
           -      Merge         -
           - (e.g. concatenate) -
           ----------------------
                     |
                     |
                     |
           ----------------------
           -      Upsample      -
           - (e.g. Dense layer, -
           -   transpose-conv)  -
           ----------------------
                     |
                     |
                     |
                -----------
                -  Output -
                -----------

Each of those boxes corresponds to one or more layers, and there are different ways of implementing them and setting their parameters; I have noted some suggestions in each box.
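
Below is a minimal sketch of that architecture using the Keras functional API. The concrete shapes are assumptions chosen only for illustration: 128x128 grayscale images, text already tokenized to integer sequences of length 100 with a 10,000-word vocabulary, and a per-pixel 0/1 label map of the same 128x128 shape. Adjust the shapes, layer sizes, and merge strategy to your own data.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# --- Image sub-model (CNN encoder); 128x128x1 input is an assumption ---
image_in = layers.Input(shape=(128, 128, 1), name="image")
x = layers.Conv2D(32, 3, activation="relu", padding="same")(image_in)
x = layers.MaxPooling2D()(x)                        # 64x64
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)                        # 32x32
image_features = x                                  # shape (32, 32, 64)

# --- Text sub-model (Embedding + LSTM encoder); sequence length 100 is an assumption ---
text_in = layers.Input(shape=(100,), name="text")
t = layers.Embedding(input_dim=10_000, output_dim=64)(text_in)
t = layers.LSTM(64)(t)                              # shape (64,)

# --- Merge: tile the text vector over the spatial grid, then concatenate ---
t = layers.RepeatVector(32 * 32)(t)                 # (1024, 64)
t = layers.Reshape((32, 32, 64))(t)                 # (32, 32, 64)
merged = layers.Concatenate()([image_features, t])  # (32, 32, 128)

# --- Upsample back to the input resolution with transpose convolutions ---
u = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(merged)  # 64x64
u = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(u)       # 128x128

# --- Output: one sigmoid unit per pixel for the 0/1 label map ---
out = layers.Conv2D(1, 1, activation="sigmoid", name="mask")(u)

model = Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Training then just passes both inputs together, e.g. `model.fit([images, text_sequences], label_maps, ...)`. The tile-and-concatenate merge is only one option; you could instead flatten the image features, concatenate with the text vector, and upsample from a Dense layer, as the diagram's "Dense layer" suggestion indicates.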
