I divided the problem into 7 steps:
- Process the data
- Create a tf.data dataset
- Build the model
- Train the model
- Convert the model to TensorFlow Lite and ONNX
- Run inference
- Build an endpoint using Flask
We have 2 types of data: images and text.
- The images have a shape of (80, 500, 3). A large part of each image is background that carries no information, so I cropped it, which reduces the shape to (80, 250, 3).
- The text has a sequence length of 10, so no preprocessing is needed.
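A minimal NumPy sketch of the crop, assuming each image is loaded as a (height, width, channels) array. The source does not say which 250 columns were kept, so `x_start` (default 0, the left half) is a hypothetical parameter to set from your own images.

```python
import numpy as np

RAW_SHAPE = (80, 500, 3)   # (height, width, channels), from the text
CROP_WIDTH = 250           # target width, from the text

def crop_image(image: np.ndarray, x_start: int = 0, width: int = CROP_WIDTH) -> np.ndarray:
    """Keep every row and channel and `width` columns starting at `x_start`.

    ASSUMPTION: x_start=0 keeps the left half; the source does not say
    which part of the image holds the date, so adjust x_start to your data.
    """
    if image.ndim != 3:
        raise ValueError(f"expected an (H, W, C) array, got shape {image.shape}")
    if x_start < 0 or x_start + width > image.shape[1]:
        raise ValueError("crop window falls outside the image")
    return image[:, x_start:x_start + width, :]

raw = np.zeros(RAW_SHAPE, dtype=np.uint8)
print(crop_image(raw).shape)  # (80, 250, 3)
```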
First, I split the data into training, test, and validation sets.
Each sample has two elements: the image path and the encoded text.
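A plain-Python sketch of the split and the text encoding. Everything here is an assumption for illustration: dates written as DD/MM/YYYY (10 characters, consistent with the stated sequence length), a character vocabulary that also holds the G and E tokens, an 80/10/10 split, and the image paths.

```python
import random

# ASSUMPTIONS (not from the source): dates are written DD/MM/YYYY, the
# vocabulary is the G/E tokens plus these characters, and the split is 80/10/10.
VOCAB = ["G", "E"] + list("0123456789/")
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}

def encode(text):
    """Map each character of a date string to its integer id."""
    return [CHAR_TO_ID[ch] for ch in text]

def decode(ids):
    """Inverse of encode: integer ids back to a string."""
    return "".join(VOCAB[i] for i in ids)

def split_samples(samples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle (image_path, encoded_text) pairs and cut them into train/val/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical paths and a single repeated date, for illustration only.
samples = [(f"images/{i:04d}.png", encode("12/05/2021")) for i in range(100)]
train, val, test = split_samples(samples)
print(len(train), len(val), len(test))  # 80 10 10
```

A fixed seed keeps the split reproducible between runs, so the test set never leaks into training when the script is re-run.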
After that, I created a tf.data.Dataset that returns a dictionary of model inputs together with the labels:
- 'enc_inputs': the image tensor, of shape (None, 80, 250, 3)
- 'dec_inputs': the decoder input, in the form 'G' + the date
- labels: the decoder target, in the form the date + 'E'
G and E stand for GO and END; they mark the start and the end of generation.
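The G/E convention is teacher forcing: the decoder input is the target shifted one step to the right, so at each step the decoder reads the previous character and must predict the next one. A plain-Python sketch, again assuming the hypothetical DD/MM/YYYY character vocabulary; in a real pipeline these lists would be passed, together with the image paths, to something like `tf.data.Dataset.from_tensor_slices`.

```python
# ASSUMPTION: the same hypothetical DD/MM/YYYY character vocabulary.
VOCAB = ["G", "E"] + list("0123456789/")
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
GO, END = CHAR_TO_ID["G"], CHAR_TO_ID["E"]

def make_decoder_pair(date):
    """Build (dec_inputs, labels) for one date string."""
    ids = [CHAR_TO_ID[ch] for ch in date]
    dec_inputs = [GO] + ids  # 'G' + date: what the decoder reads at each step
    labels = ids + [END]     # date + 'E': what it must predict at each step
    return dec_inputs, labels

dec_inputs, labels = make_decoder_pair("12/05/2021")
# At every step t the target is the next input token: labels[t] == dec_inputs[t + 1].
print(len(dec_inputs), len(labels))  # 11 11
```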
I divided the model into 3 parts:
- encoder
- decoder
- attention mechanism
- For the encoder, I chose MobileNetV3 Small because it has a low number of parameters while still giving high accuracy.
- For the decoder, the data is sequential and the sequence is short, so there is no need for a Transformer decoder; an LSTM works well here, and even better.
- For the attention, I used a MultiHeadAttention layer with 1 head and a key_dim of 128.
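A minimal Keras sketch of how the three parts can be wired together. The pieces named in the text are MobileNetV3 Small, an LSTM decoder, and MultiHeadAttention with 1 head and key_dim 128; everything else is an assumption: a 13-token vocabulary, 256 LSTM units, `weights=None` (the source does not say whether ImageNet weights were used), flattening the feature map into a sequence, and concatenating the LSTM states with the attention output before the classifier. The input names match the dataset keys, so a dataset yielding that dictionary can be passed straight to `fit`.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# ASSUMPTIONS (not in the source): 13-token vocabulary (G, E, digits, "/"),
# 256 LSTM units, weights=None (no ImageNet download), and concatenating the
# LSTM states with the attention output before the classifier.
VOCAB_SIZE = 13
SEQ_LEN = 11   # 'G' + 10 date characters
UNITS = 256

def build_model():
    # Input names match the dictionary keys produced by the dataset.
    enc_inputs = layers.Input(shape=(80, 250, 3), name="enc_inputs")
    dec_inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32", name="dec_inputs")

    # Encoder: MobileNetV3 Small feature map, flattened to a sequence of positions.
    backbone = tf.keras.applications.MobileNetV3Small(
        input_shape=(80, 250, 3), include_top=False, weights=None)
    feats = backbone(enc_inputs)                              # (batch, h, w, c)
    enc_seq = layers.Reshape((-1, feats.shape[-1]))(feats)    # (batch, h*w, c)

    # Decoder: character embeddings fed to an LSTM, one output per step.
    x = layers.Embedding(VOCAB_SIZE, UNITS)(dec_inputs)
    dec_seq = layers.LSTM(UNITS, return_sequences=True)(x)   # (batch, 11, UNITS)

    # Attention: each decoder step (query) attends over the encoder positions (key/value).
    context = layers.MultiHeadAttention(num_heads=1, key_dim=128)(dec_seq, enc_seq)

    x = layers.Concatenate()([dec_seq, context])
    outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)
    return tf.keras.Model([enc_inputs, dec_inputs], outputs)

model = build_model()
probs = model([np.zeros((2, 80, 250, 3), "float32"), np.zeros((2, SEQ_LEN), "int32")])
print(tuple(probs.shape))  # (2, 11, 13)
```

The decoder only attends to the image, never to later characters, so the output at step t depends on the tokens up to t; this is what makes step-by-step decoding at inference time possible.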
For training I used:
- the Adam optimizer with a learning rate of 0.001
- sparse categorical cross-entropy as the loss function
- accuracy as the metric
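With integer labels and softmax outputs, sparse categorical cross-entropy is the mean of -log p(true character) over all positions, and the accuracy Keras reports alongside this loss is computed per character rather than per whole date. A small NumPy sketch of both quantities on made-up probabilities; the epsilon clipping imitates what Keras does to avoid log(0).

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log p(true class) over every position.

    labels: int array (batch, seq_len); probs: softmax outputs (batch, seq_len, vocab).
    Probabilities are clipped to [eps, 1 - eps] so log(0) cannot occur.
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    true_probs = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return float(np.mean(-np.log(true_probs)))

def per_character_accuracy(labels, probs):
    """Fraction of positions where the most likely class equals the label."""
    return float(np.mean(np.argmax(probs, axis=-1) == labels))

# Toy example: one sequence, three positions, three classes (made-up numbers).
labels = np.array([[0, 2, 1]])
probs = np.array([[[0.7, 0.2, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1]]])
print(f"{sparse_categorical_crossentropy(labels, probs):.4f}")  # 0.5946
print(f"{per_character_accuracy(labels, probs):.4f}")           # 0.6667
```

In Keras the same setup is expressed as `model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="sparse_categorical_crossentropy", metrics=["accuracy"])`.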
After training, I converted the model to TensorFlow Lite format. The model size dropped
from 14 MB to 1.5 MB and the inference time from 2.1 s to 135 ms.
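For scale, the reported numbers correspond to the following ratios (computed only from the figures above; the source does not say on which hardware the latency was measured).

```python
# Figures reported above: original model vs. TensorFlow Lite model.
size_before_mb, size_after_mb = 14.0, 1.5
latency_before_ms, latency_after_ms = 2100.0, 135.0

size_ratio = size_before_mb / size_after_mb        # how many times smaller
speedup = latency_before_ms / latency_after_ms     # how many times faster
print(f"{size_ratio:.1f}x smaller, {speedup:.1f}x faster")  # 9.3x smaller, 15.6x faster
```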
Write Inference Code
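At inference time there is no target to shift, so the decoder runs autoregressively: start from G, append the most likely character at each step, and stop at E or after 10 characters. A plain-Python sketch of this greedy loop; `predict_next` is a hypothetical callable standing in for one forward pass of the Keras or TFLite model on the already cropped image, and the vocabulary is the same assumed DD/MM/YYYY one.

```python
from typing import Callable, List, Sequence

# ASSUMPTIONS: the hypothetical DD/MM/YYYY vocabulary, and `predict_next`,
# a stand-in for one forward pass of the Keras or TFLite model on one image.
VOCAB = ["G", "E"] + list("0123456789/")
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
GO, END = CHAR_TO_ID["G"], CHAR_TO_ID["E"]
MAX_LEN = 10  # a date has at most 10 characters

def greedy_decode(predict_next: Callable[[List[int]], Sequence[float]],
                  max_len: int = MAX_LEN) -> str:
    """Start from 'G', append the most likely next character, stop at 'E' or max_len."""
    tokens = [GO]
    while len(tokens) - 1 < max_len:
        probs = predict_next(tokens)
        next_id = max(range(len(probs)), key=lambda i: probs[i])  # argmax
        if next_id == END:
            break
        tokens.append(next_id)
    return "".join(VOCAB[i] for i in tokens[1:])  # drop the leading 'G'

def fake_predict_next(tokens, target="12/05/2021"):
    """Toy predictor (for illustration only) that 'reads' a fixed date, then emits 'E'."""
    step = len(tokens) - 1
    probs = [0.0] * len(VOCAB)
    probs[CHAR_TO_ID[target[step]] if step < len(target) else END] = 1.0
    return probs

print(greedy_decode(fake_predict_next))  # 12/05/2021
```

If the exported model expects a fixed-length decoder input of 11 tokens, one option (an assumption, not necessarily what the original code does) is to pad the current tokens to length 11 and read the prediction at position `len(tokens) - 1`; this is valid only when the decoder is causal, as an LSTM is, and attends only to the encoder features.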
Run the script with `python app.py`, then send a POST request to the endpoint
http://127.0.0.1:5000/extract_date
with `image` as the form key and an image file as its value.