Compare commits

..

53 Commits

Author SHA1 Message Date
jomjol
bce99da6e5 v11.2.0 2022-08-28 20:04:36 +02:00
jomjol
234925c850 Merge branch 'rolling' 2022-08-28 19:54:15 +02:00
jomjol
c9b7a5f84c v11.2.0 2022-08-28 19:52:51 +02:00
jomjol
338184712d Delete dig-cont_0570_s3_q.tflite 2022-08-27 20:41:14 +02:00
jomjol
b2c510d73e Rolling 20220827 2022-08-27 20:21:53 +02:00
jomjol
6a5d5511f1 Merge pull request #949 from caco3/beautify-restart
Beautify restart
2022-08-27 18:03:03 +02:00
jomjol
ec99ac3a3b Merge pull request #951 from haverland/rolling
Testcases for #942
2022-08-27 18:01:43 +02:00
Frank Haverland
f676f12e02 Testcases for https://github.com/jomjol/AI-on-the-edge-device/issues/942 2022-08-27 16:48:42 +02:00
George Ruinelli
296a50a6d2 . 2022-08-26 23:51:24 +02:00
George Ruinelli
b1ee3d8793 Show progress on reboot and reload page automatically 2022-08-26 23:45:25 +02:00
jomjol
993fbfe5a8 Rolling 2022-08-26 2022-08-26 21:20:26 +02:00
jomjol
2b60e81a52 Rolling 2022-08-24 2022-08-24 17:52:21 +02:00
jomjol
11418459b8 Merge pull request #939 from caco3/rolling
Extend Config Page
2022-08-24 17:46:34 +02:00
George Ruinelli
3b8b8e47da Added link to wiki. Added filter for CNNs: on digital selection show all files except those which contain '/ana' in their name and vice versa for the analog selection. 2022-08-22 23:48:48 +02:00
jomjol
ae302d49ef Rolling 2022-08-22 2022-08-22 22:17:32 +02:00
jomjol
0153229d3c Merge pull request #936 from haverland/rolling
add test case to reproduce
2022-08-22 21:07:02 +02:00
jomjol
eeb74dd6fd Merge pull request #931 from caco3/master
added favicon and adjusted window title to show hostname first
2022-08-22 21:01:46 +02:00
Frank Haverland
ecc62a3ba9 add test case to reproduce 2022-08-22 21:01:44 +02:00
jomjol
aca60465f0 v11.1.1 2022-08-22 18:20:00 +02:00
jomjol
57bdca37fc v11.1.1 2022-08-22 18:12:08 +02:00
jomjol
6409397770 Merge pull request #933 from haverland/rolling
Fix for #921, #919
2022-08-22 17:58:30 +02:00
Frank Haverland
974044adf0 last analog result_float as Readout parameter, testcases for #921, #919 2022-08-22 16:10:07 +02:00
Frank Haverland
59aeeda786 Merge branch 'rolling' into analogtodig_as_float 2022-08-22 10:33:10 +02:00
Frank Haverland
b6bf8d992f Test case for postprocessing 2022-08-22 10:31:25 +02:00
jomjol
9d31edc67a Rolling 2022-08-21 2022-08-21 21:33:56 +02:00
George Ruinelli
1b4f4bdd6d added favicon and adjusted window title to show hostname first 2022-08-21 21:05:59 +02:00
jomjol
c9a879d329 v11.1.0 2022-08-21 17:56:46 +02:00
jomjol
ea69b1be00 v11.1.0 2022-08-21 17:44:34 +02:00
jomjol
2a8b3a87ea Merge pull request #922 from haverland/rolling
Rolling
2022-08-21 16:47:07 +02:00
jomjol
52783734ce Merge pull request #925 from jochenchrist/master
Remove .DS_Store
2022-08-21 16:44:12 +02:00
jomjol
0e1b390ec6 Merge pull request #918 from ppisljar/patch-1
Update FeatureRequest.md
2022-08-21 16:41:30 +02:00
jochen
ab49bdf82f .DS_Store removed 2022-08-20 19:25:52 +02:00
jochenchrist
25e7051271 Delete .DS_Store 2022-08-20 19:23:08 +02:00
Frank Haverland
7315f9adfc Merge branch 'jomjol:rolling' into rolling 2022-08-19 21:33:08 +02:00
Frank Haverland
af1aee4ac3 add testcase for #921 2022-08-19 21:30:11 +02:00
Frank Haverland
d6ff7eef88 fix problems with early transition of digits if analog pointers. #921 2022-08-19 21:05:23 +02:00
Frank Haverland
7706b4dbc3 fix for #919 the prev is int, so <9.0 instead of <9.5 2022-08-18 19:28:13 +02:00
Peter Pisljar
3561ecd2b7 Update FeatureRequest.md 2022-08-18 15:16:57 +02:00
jomjol
74c7ff7fdf v11.0.1 2022-08-15 22:48:42 +02:00
jomjol
a68ce353ad Merge pull request #910 from haverland/rolling
Fix naming of models and new version
2022-08-13 15:58:57 +02:00
Frank Haverland
0d168f3445 Merge branch 'jomjol:rolling' into rolling 2022-08-13 15:38:57 +02:00
Frank Haverland
073e04a3cc fix naming of models and new versions 2022-08-13 15:37:04 +02:00
jomjol
591dc048d4 v11.0.0 2022-08-13 14:26:04 +02:00
jomjol
bfe8d3b37a v11.0.0 2022-08-13 14:20:40 +02:00
jomjol
9695dba415 Merge branch 'master' into rolling 2022-08-07 21:20:28 +02:00
jomjol
6a48f0502e Merge pull request #885 from haverland/rolling
CNNThreshold removed for Analog100 and Digital100
2022-08-07 21:19:37 +02:00
Frank Haverland
4a8d6592ab CNNThreshold removed for Analog100 and Digital100 2022-07-28 19:43:45 +02:00
jomjol
434aebd641 Merge pull request #881 from haverland/rolling
Ignore hidden files in configuration->model selection
2022-07-25 19:11:29 +02:00
Frank Haverland
c124c38e70 ignore hidden files in configuration->model selection 2022-07-25 16:30:11 +02:00
Frank Haverland
e6d60bb124 Merge branch 'jomjol:rolling' into rolling 2022-07-24 20:20:53 +02:00
Frank Haverland
3f3532defe Revert "Fix for #712 "Incorrect rollover digital numbers""
This reverts commit 11bfaf0e91.
2022-07-20 19:12:19 +02:00
Frank Haverland
a0ffc88e47 Merge branch 'rolling' of https://github.com/haverland/AI-on-the-edge-device into rolling 2022-07-20 18:37:05 +02:00
Frank Haverland
11bfaf0e91 Fix for #712 "Incorrect rollover digital numbers" 2022-07-20 18:35:42 +02:00
275 changed files with 16262 additions and 2679 deletions

BIN
.DS_Store vendored

Binary file not shown.

3
.gitignore vendored
View File

@@ -4,6 +4,7 @@
.code-workspace
/sd-card/htm./.vscode/
/code/build
/sd-card/html/debug/
CMakeLists.txt.user
CMakeCache.txt
@@ -15,3 +16,5 @@ install_manifest.txt
compile_commands.json
CTestTestfile.cmake
_deps
code/edgeAI.code-workspace
.DS_Store

View File

@@ -1,5 +1,151 @@
# Versions
##### 10.6.2 - Stability Increase (2022-07-24)
- **NEW 10.6.2**: ignore hidden files in model selection (configuration page)
- **NEW 10.6.1**: Revoke esp32cam & tflite update
- **NEW 10.6.1**: Bug Fix: tflite-filename with ".", HTML spelling error
- InfluxDB: direct injection into InfluxDB - thanks to **[wetneb](https://github.com/wetneb)**
- MQTT: implemented "Retain Flag" and extend with absolute Change (in addition to rate)
- `config.ini`: removal of modelsize (readout from tflite)
- Updated analog neural network file (`ana1000s2.tflite`) & digital neural network file (`dig1400s2q.tflite`)
- TFMicro/Lite: Update (espressif Version 20220716)
- Updated esp32cam (v20220716)
- ESP-IDF: Update to 4.4
- Internal update (CNN algorithm optimizations, preparation for new neural network type)
- Bug Fix: no time with fixed IP, Postprocessing, MQTT
##### 10.5.2 - Stability Increase (2022-02-22)
- NEW 10.5.2: Bug Fix: wrong `firmware.bin` (no rate update)
- NEW 10.5.1: Bug Fix: wrong return value, rate value & PreValue status, HTML: SSID & IP were not displayed
- MQTT: changed wifi naming to "wifiRSSI"
- HTML: check selectable values for consistency
- Refactoring of postprocessing consistency checks (e.g. max rate, negative rate, ...)
- Bug Fix: corrected error in "Check Consistency Increase"
##### 10.4.0 - Stability Increase (2022-02-12)
- Graphical configuration: select available neural network files (*.tfl, *.tflite) from drop down menu
- OTA-update: add option to upload tfl / tflite files to the correct location (`/config/`)
- In the future the new files will also be copied to the `firmware` directory of the repository
- Added Wifi RSSI to MQTT information
- Updated analog neural network file (`ana-s3-q-20220105.tflite`)
- Updated digital neural network file (`dig-s1-q-20220102.tflite`)
- Updated build environment to `Espressif 3.5.0`
##### 10.3.0 - Stability Increase (2022-01-29)
- Implemented LED flash dimming (`LEDIntensity`).
Remark: as auto illumination in the camera is used, this is mainly for energy saving; it will not help to reduce reflections
- Additional camera parameters: saturation, contrast (although not too much impact yet)
- Some readings contain "N"s that cannot be resolved automatically; these are handled as an "error" --> no return value in the field "value" anymore (still reported back via the field "raw value")
- Updated esp32 camera hardware driver
- Bug fix: MQTT, HTML improvements
**ATTENTION: The new ESP32 camera hardware driver is much more stable on newer OV2640 versions (no or far fewer reboots) but seems not to be fully compatible with older versions.**
* If you have problems with stalled systems, you can try the following
- Update the parameter `ImageQuality` to `12` instead of the current value `5` (manually in the `config.ini`)
- If this does not help, you might need to update your hardware or stay on version 9.2
##### 10.2.0 - Stability Increase (2022-01-14)
- Due to the updated camera driver, the image looks different and a new setup might be needed
- Update reference image
- Update Alignment marks
- Reduce reboot due to camera problems
- Update esp32-camera to new version (master as of 2022-01-09)
##### 10.1.1 - Stability Increase (2022-01-12)
- Bug Fix MQTT problem
- Issue:
- When changing from v9.x to 10.x, the MQTT parameter "Topic" was renamed to "MainTopic" to support multiple meters. This renaming should have happened automatically in the background within the graphical configuration, but did not work. Instead, the parameter "Topic" was deleted and "MainTopic" was set to disabled and "undefined".
- ToDo
- Update the `html.zip`
- If the old `config.ini` is available: copy it to `/config`, open the graphical configuration and save it again.
- If the old `config.ini` is not available: reset the parameter "MainTopic" within the `config.ini` manually
- Reboot
##### 10.1.0 - Stability Increase (2022-01-09)
- Reduce ESP32 frequency to 160MHz
- Update tflite (new source: https://github.com/espressif/tflite-micro-esp-examples)
- Update analog neural network (ana-s3-q-20220105.tflite)
- Update digital neural network (dig-s1-q-20220102.tflite)
- Increased web-server buffers
- bug fix: compiler compatibility
##### 10.0.2 - Stability Increase (2022-01-01)
- NEW v10.0.2: Corrected JSON error
- Updated compiler toolchain to ESP-IDF 4.3
- Removal of memory leak
- Improved error handling during startup (check PSRAM and camera with remark in logfile)
- MQTT: implemented raw value additionally, removal of regex constraint
- Normalized Parameter ``MaxRateValue`` to "change per minute"
- HTML: improved input handling
- Corrected error handling: in case of error the old value, rate, timestamp are not transmitted any more
##### 9.2.0 - External Illumination (2021-12-02)
- Direct JSON access: ``http://IP-ADDRESS/json``
- Error message in log file in case camera error during startup
- Upgrade analog CNN to v9.1.0
- Upgrade digital CNN to v13.3.0 (added new images)
- html: support of different ports
##### 9.1.1 - External Illumination (2021-11-16)
- NEW 9.1.1 bug fix: LED implementation
- External LEDs: change control mode (resolve bug with more than 2 LEDs)
- Additional info into log file
- Bug fix: decimal shift, html, log file
##### 9.0.0 - External Illumination (2021-10-23)
* Implementation of external illumination; positioning, brightness and color of the illumination can now be set individually
* Technical details can be found in the wiki: https://github.com/jomjol/AI-on-the-edge-device/wiki/External-LED
<img src="https://raw.githubusercontent.com/jomjol/ai-on-the-edge-device/master/images/intern_vs_external.jpg" width="500">
* New housing published for external LEDs and small clearing: https://www.thingiverse.com/thing:5028229
##### 8.5.0 - Multi Meter Support (2021-10-07)

View File

@@ -11,6 +11,15 @@
____
#### #29 Add favicon and use the hostname for the website
* https://github.com/jomjol/AI-on-the-edge-device/issues/927
#### #28 Improved error handling for ROIs
* In case a ROI is outside the image, there is no error message; instead a nonsense image is used
* Implement an error message for a wrong ROI configuration
#### #27 Use Homie Spec for Mqtt binding
* Use the standardized Home Protocol for the Mqtt binding
@@ -55,7 +64,8 @@ ____
#### #20 Deep sleep and push mode
* Let the device be normally in deep sleep state, and wake it up periodically to collect data and push it via MQTT or HTTP post.
* Support ESP-NOW to reduce the overhead of connecting to wifi and mqtt
* The above should enable battery-powered applications
#### #19 Extended log information

169
README.md
View File

@@ -33,171 +33,52 @@ If you have any technical topics, you can file an issue in this repository.
In other cases you can contact the developer via email: <img src="https://raw.githubusercontent.com/jomjol/AI-on-the-edge-device/master/images/mail.jpg" height="25">
------
## Coming next
* Automated update of the neural network file (tflite) to make the learning of additional pictures much easier and automated (GitHub action)
* New "hybrid" neural network for digital numbers --> allowing the detection of intermediate states ("ring between two numbers") as a subdigit
------
## Change log
### Known Issues
* Slow response of web server during picture analysis
**General remark:** Besides the file `firmware.bin`, typically the content of `/html` will need to be updated!
------
##### 11.2.0 - Intermediate Digits (2022-08-28)
- Updated Tensorflow / TFlite to newest tflite (version as of 2022-07-27)
- Updated analog neural network file (`ana-cont_11.3.0_s2.tflite` - default, `ana-class100_0120_s1_q.tflite`)
- Updated digital neural network file (`dig-cont_0570_s3.tflite` - default, `dig-class100_0120_s2_q.tflite`)
- Added automated filtering of tflite-file in the graphical configuration (thanks to @**[caco3](https://github.com/caco3)**)
- Updated consistency algorithm & test cases
- HTML: added favicon and system name, Improved reboot dialog (thanks to @**[caco3](https://github.com/caco3)**)
##### 10.6.1 - Stability Increase (2022-07-24)
##### 11.1.1 - Intermediate Digits (2022-08-22)
- **NEW 10.6.1**: Revoke esp32cam & tflite update
- New and improved consistency check (especially with analog and digital counters mixed)
- Bug Fix: digital counter algorithm
- **NEW 10.6.1**: Bug Fix: tflite-filename with ".", HTML spelling error
##### 11.0.1 - Intermediate Digits (2022-08-18)
- InfluxDB: direct injection into InfluxDB - thanks to **[wetneb](https://github.com/wetneb)**
- **NEW v11.0.1**: Bug Fix InfluxDB configuration (only update of html.zip necessary)
- MQTT: implemented "Retain Flag" and extend with absolute Change (in addition to rate)
- Implementation of new CNN types to detect intermediate values of digits with rolling numbers
- `config.ini`: removal of modelsize (readout from tflite)
- By default the old algo (0, 1, ..., 9, "N") is active (due to the limited types of digits trained so far)
- Activation can be done by selecting a tflite file with the new trained model in the `config.ini`
- **Details can be found in the [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Neural-Network-Types)** (different types, trained image types, naming convention)
- Updated analog neural network file (`ana1000s2.tflite`) & digital neural network file (`dig1400s2q.tflite`)
- Updated neural network files (and adaption to new naming convention)
- TFMicro/Lite: Update (espressif Version 20220716)
- Published a tool to download and combine log files - **Thanks to [reserve85](https://github.com/reserve85)**
- Updated esp32cam (v20220716)
- Files see ['/tools/logfile-tool'](tbd), How-to see [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Gasmeter-Log-Downloader)
- ESP-IDF: Update to 4.4
- Internal update (CNN algorithm optimizations, preparation for new neural network type)
- Bug Fix: no time with fixed IP, Postprocessing, MQTT
- Bug Fix: InfluxDB enabling in graphical configuration
##### 10.5.2 - Stability Increase (2022-02-22)
- NEW 10.5.2: Bug Fix: wrong `firmware.bin` (no rate update)
- NEW 10.5.1: Bug Fix: wrong return value, rate value & PreValue status, HTML: SSID & IP were not displayed
- MQTT: changed wifi naming to "wifiRSSI"
- HTML: check selectable values for consistency
- Refactoring of postprocessing consistency checks (e.g. max rate, negative rate, ...)
- Bug Fix: corrected error in "Check Consistency Increase"
##### 10.4.0 - Stability Increase (2022-02-12)
- Graphical configuration: select available neural network files (*.tfl, *.tflite) from drop down menu
- OTA-update: add option to upload tfl / tflite files to the correct location (`/config/`)
- In the future the new files will also be copied to the `firmware` directory of the repository
- Added Wifi RSSI to MQTT information
- Updated analog neural network file (`ana-s3-q-20220105.tflite`)
- Updated digital neural network file (`dig-s1-q-20220102.tflite`)
- Updated build environment to `Espressif 3.5.0`
##### 10.3.0 - Stability Increase (2022-01-29)
- Implemented LED flash dimming (`LEDIntensity`).
Remark: as auto illumination in the camera is used, this is mainly for energy saving; it will not help to reduce reflections
- Additional camera parameters: saturation, contrast (although not too much impact yet)
- Some readings contain "N"s that cannot be resolved automatically; these are handled as an "error" --> no return value in the field "value" anymore (still reported back via the field "raw value")
- Updated esp32 camera hardware driver
- Bug fix: MQTT, HTML improvements
**ATTENTION: The new ESP32 camera hardware driver is much more stable on newer OV2640 versions (no or far fewer reboots) but seems not to be fully compatible with older versions.**
* If you have problems with stalled systems, you can try the following
- Update the parameter `ImageQuality` to `12` instead of the current value `5` (manually in the `config.ini`)
- If this does not help, you might need to update your hardware or stay on version 9.2
##### 10.2.0 - Stability Increase (2022-01-14)
- Due to the updated camera driver, the image looks different and a new setup might be needed
- Update reference image
- Update Alignment marks
- Reduce reboot due to camera problems
- Update esp32-camera to new version (master as of 2022-01-09)
##### 10.1.1 - Stability Increase (2022-01-12)
- Bug Fix MQTT problem
- Issue:
- When changing from v9.x to 10.x, the MQTT parameter "Topic" was renamed to "MainTopic" to support multiple meters. This renaming should have happened automatically in the background within the graphical configuration, but did not work. Instead, the parameter "Topic" was deleted and "MainTopic" was set to disabled and "undefined".
- ToDo
- Update the `html.zip`
- If the old `config.ini` is available: copy it to `/config`, open the graphical configuration and save it again.
- If the old `config.ini` is not available: reset the parameter "MainTopic" within the `config.ini` manually
- Reboot
##### 10.1.0 - Stability Increase (2022-01-09)
- Reduce ESP32 frequency to 160MHz
- Update tflite (new source: https://github.com/espressif/tflite-micro-esp-examples)
- Update analog neural network (ana-s3-q-20220105.tflite)
- Update digital neural network (dig-s1-q-20220102.tflite)
- Increased web-server buffers
- bug fix: compiler compatibility
##### 10.0.2 - Stability Increase (2022-01-01)
- NEW v10.0.2: Corrected JSON error
- Updated compiler toolchain to ESP-IDF 4.3
- Removal of memory leak
- Improved error handling during startup (check PSRAM and camera with remark in logfile)
- MQTT: implemented raw value additionally, removal of regex constraint
- Normalized Parameter ``MaxRateValue`` to "change per minute"
- HTML: improved input handling
- Corrected error handling: in case of error the old value, rate, timestamp are not transmitted any more
##### 9.2.0 - External Illumination (2021-12-02)
- Direct JSON access: ``http://IP-ADDRESS/json``
- Error message in log file in case camera error during startup
- Upgrade analog CNN to v9.1.0
- Upgrade digital CNN to v13.3.0 (added new images)
- html: support of different ports
##### 9.1.1 - External Illumination (2021-11-16)
- NEW 9.1.1 bug fix: LED implementation
- External LEDs: change control mode (resolve bug with more than 2 LEDs)
- Additional info into log file
- Bug fix: decimal shift, html, log file
##### 9.0.0 - External Illumination (2021-10-23)
* Implementation of external illumination; positioning, brightness and color of the illumination can now be set individually
* Technical details can be found in the wiki: https://github.com/jomjol/AI-on-the-edge-device/wiki/External-LED
<img src="https://raw.githubusercontent.com/jomjol/ai-on-the-edge-device/master/images/intern_vs_external.jpg" width="500">
* New housing published for external LEDs and small clearing: https://www.thingiverse.com/thing:5028229
## Tools
* Logfile downloader and combiner (Thx to [reserve85](https://github.com/reserve85))
* Files see ['/tools/logfile-tool'](tbd), How-to see [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Gasmeter-Log-Downloader)
@@ -211,6 +92,10 @@ There are some ideas and feature requests which are not followed currently - mai
## History
##### 10.6.2 - Stability Increase (2022-07-24)
##### 9.2.0 - External Illumination (2021-12-02)
##### 8.5.0 Multi Meter Support (2021-10-07)
##### 7.1.2 MQTT-Update - (2021-06-17)

View File

@@ -5,7 +5,9 @@ set(c_srcs
"src/basic_math/esp_nn_add_ansi.c"
"src/basic_math/esp_nn_mul_ansi.c"
"src/convolution/esp_nn_conv_ansi.c"
"src/convolution/esp_nn_conv_opt.c"
"src/convolution/esp_nn_depthwise_conv_ansi.c"
"src/convolution/esp_nn_depthwise_conv_opt.c"
"src/fully_connected/esp_nn_fully_connected_ansi.c"
"src/softmax/esp_nn_softmax_ansi.c"
"src/softmax/esp_nn_softmax_opt.c"
@@ -23,7 +25,7 @@ if(CONFIG_IDF_TARGET_ESP32S3)
"src/convolution/esp_nn_conv_esp32s3.c"
"src/convolution/esp_nn_depthwise_conv_s8_esp32s3.c"
"src/convolution/esp_nn_conv_s16_mult8_esp32s3.S"
"src/convolution/esp_nn_conv_s16_mult8_1x1_esp32s3.S"
"src/convolution/esp_nn_conv_s8_mult8_1x1_esp32s3.S"
"src/convolution/esp_nn_conv_s16_mult4_1x1_esp32s3.S"
"src/convolution/esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3.S"
"src/convolution/esp_nn_depthwise_conv_s16_mult1_esp32s3.S"

View File

@@ -6,8 +6,8 @@ choice NN_OPTIMIZATIONS
help
Use ANSI-C versions for verification and debug purpose.
Optimisations are automatically picked up for a chipset.
For ESP32-S3, assembly Optimisations are selected.
For ESP32, just the ANSI C versions are selected for now.
For ESP32-S3, assembly optimisations are selected.
For other platforms (viz., ESP32, ESP32-C3), generic optimisations are used.
config NN_ANSI_C
bool "ANSI C"
@@ -17,8 +17,8 @@ config NN_OPTIMIZED
bool "Optimized versions"
help
Optimisations are automatically picked up for a chipset.
For ESP32-S3, assembly Optimisations are selected.
For ESP32, just the ANSI C versions are selected for now.
For ESP32-S3, assembly optimisations are selected.
For other platforms (viz., ESP32, ESP32-C3), generic optimisations are used.
endchoice
config NN_OPTIMIZATIONS

View File

@@ -7,7 +7,8 @@ The library contains optimised NN (Neural Network) functions for various Espress
* Supported ESP chipsets include:
* ESP32-S3 (Assembly versions optimised to benefit from vector instructions of ESP32-S3)
* ESP32 (ANSI C versions)
* ESP32 (Generic optimisations)
* ESP32-C3 (Generic optimisations)
## Performance
@@ -39,8 +40,8 @@ The library contains optimised NN (Neural Network) functions for various Espress
* Optimized versions
* ANSI C
* Default selection is for `Optimized versions`. For ESP32-S3, assembly versions are automatically selected, whereas for ESP32, ANSI-C versions are selected by default.
* For debugging purposes, you may want to select `ANSI C`
* Default selection is for `Optimized versions`. For ESP32-S3, assembly versions are automatically selected, whereas for other chipsets (viz., ESP32, ESP32-C3), generic optimisations are selected.
* For debugging purposes, you may want to select `ANSI C` reference versions.
## Contributing

View File

@@ -15,6 +15,7 @@
#pragma once
#if defined(CONFIG_NN_OPTIMIZED)
// select apt optimisations
#ifdef CONFIG_IDF_TARGET_ESP32S3
#define ARCH_ESP32_S3 1
#endif
@@ -31,12 +32,11 @@ extern "C" {
#include "esp_nn_ansi_headers.h"
#if defined(CONFIG_NN_OPTIMIZED)
#ifdef ARCH_ESP32_S3
#if defined(ARCH_ESP32_S3)
#include "esp_nn_esp32s3.h"
#endif
#ifdef ARCH_ESP32
#include "esp_nn_esp32.h"
#endif
#else // for other platforms use generic optimisations
#include "esp_nn_generic_opt.h"
#endif // #if defined(ARCH_ESP32_S3)
#else
#include "esp_nn_ansi_c.h"
#endif

View File

@@ -19,6 +19,7 @@
#pragma once
#include "esp_nn_defs.h"
#include "esp_nn_ansi_headers.h"
#define esp_nn_add_elementwise_s8 esp_nn_add_elementwise_s8_ansi

View File

@@ -18,8 +18,7 @@
* @file Header definitions to include for esp_nn reference functions
*/
#include <stdint.h>
#include "esp_nn_defs.h"
/************************** Basic math functions ****************************/
/**
@@ -81,28 +80,15 @@ void esp_nn_mul_elementwise_s8_ansi(const int8_t *input1_data,
* optimization notes: Though input_offset is int32 type,
* offset values are contained in 8 bits [-128, 127]
*/
void esp_nn_depthwise_conv_s8_ansi(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
const uint16_t ch_mult,
void esp_nn_depthwise_conv_s8_ansi(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max);
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data);
/**
* @brief 2d-convolution channelwise
@@ -112,43 +98,26 @@ void esp_nn_depthwise_conv_s8_ansi(const int8_t *input_data,
* inputs type: int8_t, output: int8_t
* input offsets: although int32_t, they are contained in 8 bits [-128, 127]
*/
void esp_nn_conv_s8_ansi(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
void esp_nn_conv_s8_ansi(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max);
const conv_params_t *conv_params,
const quant_data_t *quant_data);
int esp_nn_get_conv_scratch_size_ansi(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_ch,
const uint16_t out_ch,
const uint16_t filter_wd,
const uint16_t filter_ht);
int esp_nn_get_conv_scratch_size_ansi(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params);
void esp_nn_set_conv_scratch_buf_ansi(const void *buf);
int esp_nn_get_depthwise_conv_scratch_size_ansi(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const uint16_t ch_mult,
const uint16_t filter_wd,
const uint16_t filter_ht);
int esp_nn_get_depthwise_conv_scratch_size_ansi(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params);
void esp_nn_set_depthwise_conv_scratch_buf_ansi(const void *buf);
/************************** Activation functions *****************************/
@@ -252,9 +221,6 @@ int32_t esp_nn_get_softmax_scratch_size_opt(const int32_t width, const int32_t h
*/
void esp_nn_set_softmax_scratch_buf_ansi(void *buffer);
/* ANSI C function to be hooked up when optimised version needed */
void esp_nn_set_softmax_scratch_buf_opt(void *buffer);
/**
* @brief reference softmax function
*
@@ -268,6 +234,66 @@ void esp_nn_softmax_s8_ansi(const int8_t *input_data,
const int32_t diff_min,
int8_t *output_data);
//////////////////////////// Generic optimisations /////////////////////////////
/************************** Convolution functions *****************************/
/**
* @brief 2d-convolution channelwise optimized version
*
* @note operation: result += (input + offset) * filter
*
* inputs type: int8_t, output: int8_t
* input offsets: although int32_t, they are contained in 8 bits [-128, 127]
*/
void esp_nn_conv_s8_opt(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const conv_params_t *conv_params,
const quant_data_t *quant_data);
/**
* @brief depthwise convolution per channel optimized version
*
* @note inputs type: int8_t, output: int8_t
* Version used in tflite is per channel.
* This version follows the same footprints.
* Meaning, it has per out_channel shift and multiplier for
* requantization
*
* optimization notes: Though input_offset is int32 type,
* offset values are contained in 8 bits [-128, 127]
*/
void esp_nn_depthwise_conv_s8_opt(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data);
int esp_nn_get_conv_scratch_size_opt(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params);
void esp_nn_set_conv_scratch_buf_opt(const void *buf);
int esp_nn_get_depthwise_conv_scratch_size_opt(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params);
void esp_nn_set_depthwise_conv_scratch_buf_opt(const void *buf);
/* ANSI C function to be hooked up when optimised version needed */
void esp_nn_set_softmax_scratch_buf_opt(void *buffer);
/**
* @brief optimised version of softmax function
*

View File

@@ -0,0 +1,83 @@
// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <stdint.h>
/**
* @brief structure to club data dims
* this structure can be used for input, output and filter
*/
typedef struct data_dims {
int32_t width;
int32_t height;
int32_t channels;
int32_t extra; // can be used as batch or any other param
} data_dims_t;
/**
* @brief 2d data structure (width, height)
*
*/
typedef struct data_2d {
int32_t width;
int32_t height;
} data_2d_t;
/**
* @brief min/max activation
*/
typedef struct act_params {
int32_t min;
int32_t max;
} act_params_t;
/**
* @brief per channel quant data
*
* @note number of shift and mult elements are equal to output channels
*/
typedef struct quant_data {
int32_t *shift;
int32_t *mult;
} quant_data_t;
/**
* @brief params specific to convolution 2d
*
*/
typedef struct conv_params {
int32_t in_offset;
int32_t out_offset;
data_2d_t stride;
data_2d_t padding;
data_2d_t dilation;
act_params_t activation;
} conv_params_t;
/**
* @brief params specific to depthwise convolution 2d
*
*/
typedef struct dw_conv_params {
int32_t in_offset;
int32_t out_offset;
int32_t ch_mult; // channel multiplier. (in_ch * ch_mult = out_ch)
data_2d_t stride;
data_2d_t padding;
data_2d_t dilation;
act_params_t activation;
} dw_conv_params_t;
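
The structs above replace the long scalar argument lists of the previous esp-nn API: geometry goes into `data_dims_t`, stride/padding/dilation/activation into `conv_params_t` (or `dw_conv_params_t` with `ch_mult` for depthwise), and per-channel requantisation data into `quant_data_t`. A minimal caller sketch follows, with hypothetical names and buffer shapes, and the assumption that an umbrella header `esp_nn.h` provides the `esp_nn_conv_s8` dispatch macro shown elsewhere in this diff:

```c
#include <stdint.h>
#include "esp_nn.h"  /* assumption: umbrella header exposing esp_nn_conv_s8 and the structs above */

/* Hypothetical shapes: 8x8 input with 4 channels, 1x1 kernel, 8 output channels. */
static int8_t  input[8 * 8 * 4];
static int8_t  filter[1 * 1 * 4 * 8];
static int32_t bias[8];
static int8_t  output[8 * 8 * 8];
static int32_t out_mult[8];   /* per-output-channel requantisation multipliers */
static int32_t out_shift[8];  /* per-output-channel requantisation shifts */

void conv_example(void)
{
    const data_dims_t input_dims  = { .width = 8, .height = 8, .channels = 4, .extra = 0 };
    const data_dims_t filter_dims = { .width = 1, .height = 1, .channels = 4, .extra = 0 };
    const data_dims_t output_dims = { .width = 8, .height = 8, .channels = 8, .extra = 0 };

    const conv_params_t conv_params = {
        .in_offset  = 0,
        .out_offset = 0,
        .stride     = { .width = 1, .height = 1 },
        .padding    = { .width = 0, .height = 0 },
        .dilation   = { .width = 1, .height = 1 },
        .activation = { .min = INT8_MIN, .max = INT8_MAX },
    };
    const quant_data_t quant_data = { .shift = out_shift, .mult = out_mult };

    /* Depending on the build, a scratch buffer may have to be set first via
       esp_nn_set_conv_scratch_buf(); the ANSI/generic versions in this diff
       report a required scratch size of 0. */
    esp_nn_conv_s8(&input_dims, input, &filter_dims, filter, bias,
                   &output_dims, output, &conv_params, &quant_data);
}
```

Bundling the parameters this way keeps the individual `esp_nn_*` function signatures stable if further fields (such as dilation) become relevant later.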

View File

@@ -19,7 +19,7 @@
#pragma once
#include <stdint.h>
#include "esp_nn_defs.h"
#include "esp_nn_ansi_headers.h"
/************************** Basic math functions *****************************/
@@ -85,28 +85,15 @@ void esp_nn_mul_elementwise_s8_esp32s3(const int8_t *input1_data,
* optimization notes: Though input_offset is int32 type,
* offset values are contained in 8 bits [-128, 127]
*/
void esp_nn_depthwise_conv_s8_esp32s3(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
const uint16_t ch_mult,
void esp_nn_depthwise_conv_s8_esp32s3(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max);
const data_dims_t *output_dims,
int8_t *output_data,
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data);
/**
* @brief 2d - convolution channelwise
@@ -116,43 +103,26 @@ void esp_nn_depthwise_conv_s8_esp32s3(const int8_t *input_data,
* inputs type: int8_t, output: int8_t
* input offsets: although int32_t, they are contained in 8 bits [-128, 127]
*/
void esp_nn_conv_s8_esp32s3(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
void esp_nn_conv_s8_esp32s3(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max);
const data_dims_t *output_dims,
int8_t *output_data,
const conv_params_t *conv_params,
const quant_data_t *quant_data);
int esp_nn_get_conv_scratch_size_esp32s3(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_ch,
const uint16_t out_ch,
const uint16_t filter_wd,
const uint16_t filter_ht);
int esp_nn_get_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params);
void esp_nn_set_conv_scratch_buf_esp32s3(const void *buf);
int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const uint16_t ch_mult,
const uint16_t filter_wd,
const uint16_t filter_ht);
int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params);
void esp_nn_set_depthwise_conv_scratch_buf_esp32s3(const void *buf);
/************************** Pooling functions *****************************/

View File

@@ -13,28 +13,27 @@
// limitations under the License.
/**
* @file Header definitions to include for esp_nn optimized functions for
* the ESP32 platform.
* We are hooking up just the C versions for now.
* The file hence is exactly same as `esp_nn_ansi_c.h`
* @file Header definitions to include for esp_nn generic optimisations
* For functions which do not have optimisations, the _ansi versions are picked.
*/
#pragma once
#include "esp_nn_defs.h"
#include "esp_nn_ansi_headers.h"
#define esp_nn_add_elementwise_s8 esp_nn_add_elementwise_s8_ansi
#define esp_nn_mul_elementwise_s8 esp_nn_mul_elementwise_s8_ansi
#define esp_nn_depthwise_conv_s8 esp_nn_depthwise_conv_s8_ansi
#define esp_nn_depthwise_conv_s8 esp_nn_depthwise_conv_s8_opt
#define esp_nn_conv_s8 esp_nn_conv_s8_ansi
#define esp_nn_conv_s8 esp_nn_conv_s8_opt
#define esp_nn_get_conv_scratch_size esp_nn_get_conv_scratch_size_ansi
#define esp_nn_set_conv_scratch_buf esp_nn_set_conv_scratch_buf_ansi
#define esp_nn_get_conv_scratch_size esp_nn_get_conv_scratch_size_opt
#define esp_nn_set_conv_scratch_buf esp_nn_set_conv_scratch_buf_opt
#define esp_nn_get_depthwise_conv_scratch_size esp_nn_get_depthwise_conv_scratch_size_ansi
#define esp_nn_set_depthwise_conv_scratch_buf esp_nn_set_depthwise_conv_scratch_buf_ansi
#define esp_nn_get_depthwise_conv_scratch_size esp_nn_get_depthwise_conv_scratch_size_opt
#define esp_nn_set_depthwise_conv_scratch_buf esp_nn_set_depthwise_conv_scratch_buf_opt
#define esp_nn_relu6_s8 esp_nn_relu6_s8_ansi

View File

@@ -41,15 +41,39 @@
__NN_FORCE_INLINE__ int32_t esp_nn_clz32(uint32_t in)
{
#if CONFIG_IDF_TARGET_ARCH_XTENSA
__asm__ volatile("nsau %0, %0" : "+r" (in));
return in;
}
__NN_FORCE_INLINE__ int32_t esp_nn_pick_sat_high32_of64(int64_t val64)
{
int32_t sign = (int32_t) (val64 >> 63);
int32_t to_add = sign & ((1ul << 31) - 1);
return (int32_t) ((int64_t) (val64 + to_add) >> 31);
#elif defined(__GNUC__)
return __builtin_clz(in);
#else
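/* Portable fallback: locate the highest set bit by repeatedly halving the
   search range (16, 8, 4, 2, 1 bit steps) and subtract from 32. */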
int32_t count = 32;
uint32_t x = in, y = in >> 16;
if (y != 0) {
count -= 16;
x = y;
}
y = x >> 8;
if (y != 0) {
count -= 8;
x = y;
}
y = x >> 4;
if (y != 0) {
count -= 4;
x = y;
}
y = x >> 2;
if (y != 0) {
count -= 2;
x = y;
}
y = x >> 1;
if (y != 0) {
return count - 2;
}
return count - x;
#endif
}
/**
@@ -57,8 +81,19 @@ __NN_FORCE_INLINE__ int32_t esp_nn_pick_sat_high32_of64(int64_t val64)
*/
__NN_FORCE_INLINE__ int32_t esp_nn_saturate8(int32_t in)
{
#if CONFIG_IDF_TARGET_ARCH_XTENSA
__asm__ volatile("clamps %0, %0, 7" : "+a"(in));
return in;
#else
return max(INT8_MIN, min(in, INT8_MAX));
#endif
}
__NN_FORCE_INLINE__ int32_t esp_nn_pick_sat_high32_of64(int64_t val64)
{
int32_t sign = (int32_t) (val64 >> 63);
int32_t to_add = sign & ((1ul << 31) - 1);
return (int32_t) ((int64_t) (val64 + to_add) >> 31);
}
__NN_FORCE_INLINE__ int32_t esp_nn_sat_round_doubling_high_mul(int32_t in0, int32_t in1)
@@ -144,7 +179,7 @@ static void esp_nn_aligned_s8_pad_with_value(const int8_t *src, int8_t *dst,
const uint16_t pad_ht)
{
/* memset with pad_val */
memset(dst, pad_val, ((input_wd + 2 * pad_wd) * (input_ht + 2 * pad_ht)) * channels * 2);
memset(dst, pad_val, ((input_wd + 2 * pad_wd) * (input_ht + 2 * pad_ht)) * channels);
dst += (pad_wd + input_wd + pad_wd) * channels;
for (int i = 0; i < input_ht; i++) {
@@ -156,7 +191,6 @@ static void esp_nn_aligned_s8_pad_with_value(const int8_t *src, int8_t *dst,
}
}
#if 0
static void esp_nn_aligned_s8_pad_end_with_value(const int8_t *src, int8_t *dst,
const uint16_t input_wd,
const uint16_t input_ht,
@@ -169,13 +203,16 @@ static void esp_nn_aligned_s8_pad_end_with_value(const int8_t *src, int8_t *dst,
for (int j = 0; j < input_wd * channels; j++) {
*dst++ = *src++;
}
memset(dst, pad_val, pad_wd * channels);
dst += pad_wd * channels;
if (pad_wd) {
memset(dst, pad_val, pad_wd * channels);
dst += pad_wd * channels;
}
}
/* pad end `pad_ht` lines at end */
memset(dst, pad_val, (input_wd + pad_wd) * pad_ht * channels);
if (pad_ht) {
memset(dst, pad_val, (input_wd + pad_wd) * pad_ht * channels);
}
}
#endif
/**
* @brief convert 8 bit input data to 16 bit

View File

@@ -12,16 +12,14 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <stdint.h>
#include <esp_nn_defs.h>
#include <common_functions.h>
int esp_nn_get_conv_scratch_size_ansi(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_ch,
const uint16_t out_ch,
const uint16_t filter_wd,
const uint16_t filter_ht)
int esp_nn_get_conv_scratch_size_ansi(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params)
{
return 0;
}
@@ -108,29 +106,35 @@ void esp_nn_conv_u8_ansi(const uint8_t *input_data,
* Assumption 2: Pointers are valid
* Assumption 3: dilation width = 1
*/
void esp_nn_conv_s8_ansi(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
void esp_nn_conv_s8_ansi(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max)
const conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t in_channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t out_channels = output_dims->channels;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
for (out_y = 0; out_y < out_ht; out_y++) {

View File

@@ -12,30 +12,30 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <stdint.h>
#include <stdio.h>
#include <esp_nn_defs.h>
#include <common_functions.h>
static int16_t *scratch_buffer = NULL;
extern void esp_nn_conv_s16_mult8_1x1_esp32s3(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const int16_t *filter_data,
const int32_t *bias,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max,
void *buffer /* scratch buffer */);
extern void esp_nn_conv_s8_mult8_1x1_esp32s3(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const int8_t *filter_aligned,
const int32_t *bias,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max,
void *buffer /* scratch buffer */);
extern void esp_nn_conv_s16_mult4_1x1_esp32s3(const int16_t *input_data,
const uint16_t input_wd,
@@ -81,34 +81,40 @@ extern void esp_nn_aligned_s8_to_s16_with_offset_esp32s3(const int8_t *src, int1
extern void esp_nn_s8_to_s16_esp32s3(const int8_t *src, int16_t *dst, const int size);
static void esp_nn_conv_s8_unrolled(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
static void esp_nn_conv_s8_unrolled(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max)
const conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t in_ch = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t out_ch = output_dims->channels;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
for (out_y = 0; out_y < out_ht; out_y++) {
for (out_x = 0; out_x < out_wd; out_x++) {
for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
for (out_ch_idx = 0; out_ch_idx < out_ch; out_ch_idx++) {
int32_t conv_out = 0;
const int32_t base_y = stride_ht * out_y - pad_ht;
@@ -124,10 +130,10 @@ static void esp_nn_conv_s8_unrolled(const int8_t *input_data,
for (filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t in_row = base_y + filter_y_idx;
const int32_t in_col = base_x + filter_x_idx;
int32_t input_base_offset = (in_row * input_wd + in_col) * in_channels;
int32_t filter_base_offset = out_ch_idx * in_channels * filter_ht * filter_wd +
(filter_y_idx * filter_wd + filter_x_idx) * in_channels;
for (in_ch_idx = 0; in_ch_idx < in_channels; in_ch_idx++) {
int32_t input_base_offset = (in_row * input_wd + in_col) * in_ch;
int32_t filter_base_offset = out_ch_idx * in_ch * filter_ht * filter_wd +
(filter_y_idx * filter_wd + filter_x_idx) * in_ch;
for (in_ch_idx = 0; in_ch_idx < in_ch; in_ch_idx++) {
conv_out +=
(input_data[input_base_offset + in_ch_idx] + input_offset) *
filter_data[filter_base_offset + in_ch_idx];
@@ -332,18 +338,35 @@ static void esp_nn_conv_s8_pad_valid_ch3_3x3(const int8_t *input_data,
}
}
int esp_nn_get_conv_scratch_size_esp32s3(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t in_ch,
const uint16_t out_ch,
const uint16_t filter_wd,
const uint16_t filter_ht)
int esp_nn_get_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t in_ch = input_dims->channels;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_ch = output_dims->channels;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
int filter_size = filter_wd * filter_ht * in_ch * out_ch;
int input_size = input_wd * input_ht * in_ch;
int transpose_buf_size = 8 * in_ch; /* to store intermediate data */
int transpose_buf_size = 2 * (8 * in_ch); /* to store intermediate data */
if (input_wd * input_ht < 8) {
transpose_buf_size = 0; // not using this for leftover
}
int align_buf_size = 32; /* extra buffer for alignment */
return 2 * (filter_size + input_size + transpose_buf_size) + align_buf_size;
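/* Fast path below: 1x1, stride-1, unpadded convolutions whose input channels are a
   multiple of 8 only need room for an aligned copy of the filter (plus the transpose buffer). */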
if (in_ch % 8 == 0 && filter_wd == 1 && filter_ht == 1 &&
pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
return filter_size + transpose_buf_size + align_buf_size;
}
return 2 * (filter_size + input_size) + transpose_buf_size + align_buf_size;
}
void esp_nn_set_conv_scratch_buf_esp32s3(void *buf)
@@ -351,29 +374,35 @@ void esp_nn_set_conv_scratch_buf_esp32s3(void *buf)
scratch_buffer = (int16_t *) buf;
}
void esp_nn_conv_s8_esp32s3(const int8_t *input,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
void esp_nn_conv_s8_esp32s3(const data_dims_t *input_dims,
const int8_t *input,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const uint16_t out_channels,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max)
const conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t out_channels = output_dims->channels;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int filter_size = filter_wd * filter_ht * channels * out_channels;
int input_size = input_wd * input_ht * channels;
int align_len = 16 - (filter_size & 15);
@@ -387,15 +416,16 @@ void esp_nn_conv_s8_esp32s3(const int8_t *input,
if (channels % 8 == 0 && filter_wd == 1 && filter_ht == 1 &&
pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
int scratch_offset = (int) (filter_data16 + filter_size);
int8_t *filter_aligned = (int8_t *) scratch_buffer;
int scratch_offset = (int) (filter_aligned + filter_size);
void *scratch_buf = (void *) (scratch_offset + 16 - (scratch_offset & 15));
esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
esp_nn_conv_s16_mult8_1x1_esp32s3(
input, input_wd, input_ht, channels, input_offset, filter_data16,
memcpy(filter_aligned, filter_data, filter_size); // copy to aligned address
esp_nn_conv_s8_mult8_1x1_esp32s3(
input, input_wd, input_ht, channels, input_offset, filter_aligned,
bias, out_data, out_wd, out_ht, out_channels, out_offset,
out_shift, out_mult, activation_min, activation_max, scratch_buf);
} else if (channels % 4 == 0 && filter_wd == 1 && filter_ht == 1 &&
(input_wd * input_ht) % 16 == 0 && /* TODO: remove this check */
(input_wd * input_ht) % 4 == 0 && /* TODO: remove this check */
pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
int scratch_offset = (int) (input_data16 + input_size);
void *scratch_buf = (void *) (scratch_offset + 16 - (scratch_offset & 15));
@@ -427,10 +457,7 @@ void esp_nn_conv_s8_esp32s3(const int8_t *input,
}
} else {
/* Basic unrolled version */
esp_nn_conv_s8_unrolled(input, input_wd, input_ht, channels, input_offset,
pad_wd, pad_ht, stride_wd, stride_ht,
filter_data, filter_wd, filter_ht, bias,
out_data, out_wd, out_ht, out_channels, out_offset, out_shift,
out_mult, activation_min, activation_max);
esp_nn_conv_s8_unrolled(input_dims, input, filter_dims, filter_data,
bias, output_dims, out_data, conv_params, quant_data);
}
}

View File

@@ -0,0 +1,179 @@
// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <esp_nn_defs.h>
#include <common_functions.h>
int esp_nn_get_conv_scratch_size_opt(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const conv_params_t *conv_params)
{
return 0;
}
void esp_nn_set_conv_scratch_buf_opt(const void *buf)
{
}
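/* Pointwise (1x1 kernel) convolution: each output pixel is a per-channel dot product
   of the input pixel with the filter, with the inner channel loop unrolled by 4. */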
__attribute__ ((noinline))
static void esp_nn_conv_s8_1x1(const data_dims_t *input_dims,
const int8_t *input_data,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t in_channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t out_channels = output_dims->channels;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
for (int32_t in_row = 0; in_row < out_ht * stride_ht; in_row += stride_ht) {
for (int32_t in_col = 0; in_col < out_wd * stride_wd; in_col += stride_wd) {
const int32_t *out_mult = quant_data->mult;
const int32_t *out_shift = quant_data->shift;
const int8_t *filter_ptr = filter_data;
const int8_t *input_base_ptr = input_data + (in_row * input_wd + in_col) * in_channels;
int32_t out_ch_idx = 0;
for (; out_ch_idx < out_channels; out_ch_idx++) {
int32_t conv_out = 0;
const int8_t *input_ptr = input_base_ptr;
int32_t in_ch_idx = 0;
for (; in_ch_idx < in_channels - 3; in_ch_idx += 4) {
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
}
for (; in_ch_idx < in_channels; in_ch_idx ++) {
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
}
if (bias) {
conv_out += bias[out_ch_idx];
}
conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, *out_mult++, *out_shift++);
conv_out += out_offset;
conv_out = max(conv_out, activation_min);
conv_out = min(conv_out, activation_max);
*out_data++ = (int8_t) conv_out;
}
}
}
}
/**
* Assumption 1: i/p channels == o/p channels
* Assumption 2: Pointers are valid
* Assumption 3: dilation width = 1
*/
void esp_nn_conv_s8_opt(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
if (filter_wd == 1 && filter_ht == 1) {
esp_nn_conv_s8_1x1(input_dims, input_data, filter_data, bias,
output_dims, out_data, conv_params, quant_data);
return;
}
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t in_channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t out_channels = output_dims->channels;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int32_t out_ch_idx, out_y, out_x, filter_y_idx, filter_x_idx;
for (out_y = 0; out_y < out_ht; out_y++) {
for (out_x = 0; out_x < out_wd; out_x++) {
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
int32_t conv_out = 0;
const int32_t base_y = stride_ht * out_y - pad_ht;
const int32_t base_x = stride_wd * out_x - pad_wd;
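/* Clip the filter window to the valid image area: taps that would fall into the
   zero-contribution padding region are skipped instead of reading a padded input copy. */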
const int32_t filter_y_start = max(0, -base_y);
const int32_t filter_x_start = max(0, -base_x);
const int32_t filter_y_end = min(filter_ht, input_ht - base_y);
const int32_t filter_x_end = min(filter_wd, input_wd - base_x);
for (filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
for (filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t in_row = base_y + filter_y_idx;
const int32_t in_col = base_x + filter_x_idx;
const int8_t *input_ptr = input_data +
(in_row * input_wd + in_col) * in_channels;
const int8_t *filter_ptr = filter_data +
out_ch_idx * in_channels * filter_ht * filter_wd +
(filter_y_idx * filter_wd + filter_x_idx) * in_channels;
int32_t in_ch_idx = 0;
for (; in_ch_idx < in_channels - 3; in_ch_idx += 4) {
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
}
for (; in_ch_idx < in_channels; in_ch_idx ++) {
conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
}
}
}
if (bias) {
conv_out += bias[out_ch_idx];
}
conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, *out_mult++, *out_shift++);
conv_out += out_offset;
conv_out = max(conv_out, activation_min);
conv_out = min(conv_out, activation_max);
*out_data++ = (int8_t) conv_out;
}
}
}
}
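Both the 1x1 fast path and the generic loop above push every accumulator through `esp_nn_multiply_by_quantized_mult_fast` before the output offset and activation clamp are applied. That helper is not part of this diff; the fragment below is only a rough scalar sketch of the per-channel requantisation it performs, assuming the usual TFLite convention of a Q31 fixed-point multiplier plus a power-of-two shift. The name `requant_ref` and the rounding details are illustrative, not the esp-nn implementation.

#include <stdint.h>

/* Hypothetical scalar reference for the requantisation step used above:
 * acc is scaled by a Q31 fixed-point multiplier, then shifted. A negative
 * shift divides with rounding, a positive shift multiplies. */
static inline int32_t requant_ref(int32_t acc, int32_t mult, int32_t shift)
{
    int64_t prod = (int64_t) acc * (int64_t) mult;          /* 64-bit product      */
    int32_t high = (int32_t) ((prod + (1LL << 30)) >> 31);  /* rounded Q31 result  */
    if (shift >= 0) {
        return high << shift;                                /* scale up            */
    }
    int32_t round = 1 << (-shift - 1);                       /* round half up       */
    return (high + round) >> -shift;                         /* scale down          */
}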

View File

@@ -12,16 +12,13 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <stdint.h>
#include <esp_nn_defs.h>
#include <common_functions.h>
int esp_nn_get_depthwise_conv_scratch_size_ansi(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const uint16_t ch_mult,
const uint16_t filter_wd,
const uint16_t filter_ht)
int esp_nn_get_depthwise_conv_scratch_size_ansi(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params)
{
return 0;
}
@@ -31,29 +28,35 @@ void esp_nn_set_depthwise_conv_scratch_buf_ansi(const void *buf)
}
void esp_nn_depthwise_conv_s8_ansi(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
const uint16_t ch_mult,
void esp_nn_depthwise_conv_s8_ansi(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max)
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
const uint16_t ch_mult = conv_params->ch_mult;
int out_idx = 0;
for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
const int16_t base_y = (out_y * stride_ht) - pad_ht;
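The hunk above swaps the long flat parameter list for the struct-based signature. A caller now fills `data_dims_t`, `dw_conv_params_t` and `quant_data_t` once and passes pointers, exactly as the updated test code further down in this compare does. The fragment below is a minimal sketch of such a call, assuming a 3x3 depthwise filter with channel multiplier 1; all concrete numbers are illustrative and the declaration of the kernel is assumed to be visible via the esp-nn headers.

#include <stdint.h>
#include <esp_nn_defs.h>   /* data_dims_t, dw_conv_params_t, quant_data_t */

/* Sketch only: shapes and quantisation values are made up for illustration. */
static void depthwise_call_sketch(const int8_t *input, const int8_t *filter,
                                  const int32_t *bias, int8_t *out_data,
                                  int32_t *out_mult, int32_t *out_shift)
{
    data_dims_t input_dims  = {.width = 16, .height = 16, .channels = 8, 1};
    data_dims_t output_dims = {.width = 14, .height = 14, .channels = 8, 1};
    data_dims_t filter_dims = {.width = 3, .height = 3, 0, 0};
    dw_conv_params_t conv_params = {.in_offset = 128, .out_offset = -128, .ch_mult = 1,
                                    .stride = {1, 1}, .padding = {0, 0},
                                    .dilation = {0, 0}, .activation = {-128, 127}};
    quant_data_t quant_data = {.shift = out_shift, .mult = out_mult};

    esp_nn_depthwise_conv_s8_ansi(&input_dims, input, &filter_dims, filter,
                                  bias, &output_dims, out_data,
                                  &conv_params, &quant_data);
}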

View File

@@ -0,0 +1,291 @@
// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <esp_nn_defs.h>
#include <common_functions.h>
int esp_nn_get_depthwise_conv_scratch_size_opt(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params)
{
return 0;
}
void esp_nn_set_depthwise_conv_scratch_buf_opt(const void *buf)
{
}
/* common channel multiplier == 1 case */
__attribute__ ((noinline))
static void esp_nn_depthwise_conv_s8_ch_mult_1(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int out_idx = 0;
for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
const int16_t base_y = (out_y * stride_ht) - pad_ht;
for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
const int16_t base_x = (out_x * stride_wd) - pad_wd;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
/* Clamp the filter window so the taps don't sample outside the input block */
int filter_y_start = max(0, -base_y);
int filter_x_start = max(0, -base_x);
int filter_y_end = min(filter_ht, input_ht - base_y);
int filter_x_end = min(filter_wd, input_wd - base_x);
int ch_idx = 0;
for (; ch_idx < channels - 3; ch_idx += 4) {//channel_loop
int32_t result0 = 0;
int32_t result1 = 0;
int32_t result2 = 0;
int32_t result3 = 0;
for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
const int32_t idx_y = base_y + filter_y_idx;
for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t idx_x = base_x + filter_x_idx;
int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels) + ch_idx;
int32_t input_val0 = input_data[input_index + 0] + input_offset;
int32_t input_val1 = input_data[input_index + 1] + input_offset;
int32_t input_val2 = input_data[input_index + 2] + input_offset;
int32_t input_val3 = input_data[input_index + 3] + input_offset;
int32_t filter_val0 = filter_data[filter_index + 0];
int32_t filter_val1 = filter_data[filter_index + 1];
int32_t filter_val2 = filter_data[filter_index + 2];
int32_t filter_val3 = filter_data[filter_index + 3];
result0 += input_val0 * filter_val0;
result1 += input_val1 * filter_val1;
result2 += input_val2 * filter_val2;
result3 += input_val3 * filter_val3;
}
}
if (bias) {
result0 += bias[ch_idx + 0];
result1 += bias[ch_idx + 1];
result2 += bias[ch_idx + 2];
result3 += bias[ch_idx + 3];
}
result0 = esp_nn_multiply_by_quantized_mult_fast(result0, *out_mult++, *out_shift++);
result1 = esp_nn_multiply_by_quantized_mult_fast(result1, *out_mult++, *out_shift++);
result2 = esp_nn_multiply_by_quantized_mult_fast(result2, *out_mult++, *out_shift++);
result3 = esp_nn_multiply_by_quantized_mult_fast(result3, *out_mult++, *out_shift++);
result0 += out_offset;
result1 += out_offset;
result2 += out_offset;
result3 += out_offset;
result0 = max(result0, activation_min);
result1 = max(result1, activation_min);
result2 = max(result2, activation_min);
result3 = max(result3, activation_min);
result0 = min(result0, activation_max);
result1 = min(result1, activation_max);
result2 = min(result2, activation_max);
result3 = min(result3, activation_max);
out_data[out_idx++] = result0;
out_data[out_idx++] = result1;
out_data[out_idx++] = result2;
out_data[out_idx++] = result3;
}
for (; ch_idx < channels; ch_idx++) {//channel_loop
int32_t result = 0;
for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
const int32_t idx_y = base_y + filter_y_idx;
for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t idx_x = base_x + filter_x_idx;
int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels) + ch_idx;
int32_t input_val = input_data[input_index] + input_offset;
int32_t filter_val = filter_data[filter_index];
result += input_val * filter_val;
}
}
if (bias) {
result += bias[ch_idx];
}
result = esp_nn_multiply_by_quantized_mult_fast(result, *out_mult++, *out_shift++);
result += out_offset;
result = max(result, activation_min);
result = min(result, activation_max);
out_data[out_idx++] = result;
}
}
}
}
void esp_nn_depthwise_conv_s8_opt(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t ch_mult = conv_params->ch_mult;
if (ch_mult == 1) {
esp_nn_depthwise_conv_s8_ch_mult_1(input_dims, input_data, filter_dims, filter_data,
bias, output_dims, out_data, conv_params, quant_data);
return;
}
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
int out_idx = 0;
for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
const int16_t base_y = (out_y * stride_ht) - pad_ht;
for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
const int16_t base_x = (out_x * stride_wd) - pad_wd;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
/* Clamp the filter window so the taps don't sample outside the input block */
int filter_y_start = max(0, -base_y);
int filter_x_start = max(0, -base_x);
int filter_y_end = min(filter_ht, input_ht - base_y);
int filter_x_end = min(filter_wd, input_wd - base_x);
for (int ch_idx = 0; ch_idx < channels; ch_idx++) {//channel_loop
int ch_mult_idx = 0;
for (; ch_mult_idx < ch_mult - 3; ch_mult_idx += 4) {
int32_t result0 = 0;
int32_t result1 = 0;
int32_t result2 = 0;
int32_t result3 = 0;
const int out_ch_idx = ch_idx * ch_mult + ch_mult_idx;
for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
const int32_t idx_y = base_y + filter_y_idx;
for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t idx_x = base_x + filter_x_idx;
int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
int32_t input_val = input_data[input_index] + input_offset;
int32_t filter_val0 = filter_data[filter_index + 0];
int32_t filter_val1 = filter_data[filter_index + 1];
int32_t filter_val2 = filter_data[filter_index + 2];
int32_t filter_val3 = filter_data[filter_index + 3];
result0 += input_val * filter_val0;
result1 += input_val * filter_val1;
result2 += input_val * filter_val2;
result3 += input_val * filter_val3;
}
}
if (bias) {
result0 += bias[out_ch_idx + 0];
result1 += bias[out_ch_idx + 1];
result2 += bias[out_ch_idx + 2];
result3 += bias[out_ch_idx + 3];
}
result0 = esp_nn_multiply_by_quantized_mult_fast(result0, *out_mult++, *out_shift++);
result1 = esp_nn_multiply_by_quantized_mult_fast(result1, *out_mult++, *out_shift++);
result2 = esp_nn_multiply_by_quantized_mult_fast(result2, *out_mult++, *out_shift++);
result3 = esp_nn_multiply_by_quantized_mult_fast(result3, *out_mult++, *out_shift++);
result0 += out_offset;
result1 += out_offset;
result2 += out_offset;
result3 += out_offset;
result0 = max(result0, activation_min);
result1 = max(result1, activation_min);
result2 = max(result2, activation_min);
result3 = max(result3, activation_min);
result0 = min(result0, activation_max);
result1 = min(result1, activation_max);
result2 = min(result2, activation_max);
result3 = min(result3, activation_max);
out_data[out_idx++] = result0;
out_data[out_idx++] = result1;
out_data[out_idx++] = result2;
out_data[out_idx++] = result3;
}
for (; ch_mult_idx < ch_mult; ch_mult_idx++) {
int32_t result = 0;
const int out_ch_idx = ch_idx * ch_mult + ch_mult_idx;
for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
const int32_t idx_y = base_y + filter_y_idx;
for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
const int32_t idx_x = base_x + filter_x_idx;
int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
int32_t input_val = input_data[input_index] + input_offset;
int32_t filter_val = filter_data[filter_index];
result += input_val * filter_val;
}
}
if (bias) {
result += bias[out_ch_idx];
}
result = esp_nn_multiply_by_quantized_mult_fast(result, *out_mult++, *out_shift++);
result += out_offset;
result = max(result, activation_min);
result = min(result, activation_max);
out_data[out_idx++] = result;
}
}
}
}
}
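With a channel multiplier above one, each input channel feeds a block of `ch_mult` consecutive output channels; the inner loop above walks that block four entries at a time and falls back to a scalar tail for the remainder. The helpers below merely restate the index arithmetic the kernel uses; they are illustrative and not part of the library.

/* Illustrative restatement of the depthwise index math used above. */
static int dw_out_channel(int ch_idx, int ch_mult_idx, int ch_mult)
{
    /* input channel ch_idx owns output channels [ch_idx*ch_mult, ch_idx*ch_mult + ch_mult) */
    return ch_idx * ch_mult + ch_mult_idx;
}

static int dw_filter_index(int filter_y, int filter_x, int filter_wd,
                           int channels, int ch_mult, int out_ch)
{
    /* filters are laid out HWC-style with channels * ch_mult values per tap */
    return (filter_y * filter_wd + filter_x) * (channels * ch_mult) + out_ch;
}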

View File

@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#include <stdint.h>
#include <stdio.h>
#include <esp_nn_defs.h>
#include <common_functions.h>
@@ -353,17 +353,59 @@ void esp_nn_depthwise_conv_s8_ch_mult1(const int8_t *input_data,
}
}
int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const uint16_t ch_mult,
const uint16_t filter_wd,
const uint16_t filter_ht)
int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
const data_dims_t *filter_dims,
const data_dims_t *output_dims,
const dw_conv_params_t *conv_params)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t ch_mult = conv_params->ch_mult;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
int filter_size = filter_wd * filter_ht * channels * ch_mult;
int padding_used = ((filter_wd == 3) && (filter_ht == 3)) * 2;
int input_size = (input_wd + padding_used) * (input_ht + padding_used) * channels;
return 2 * (filter_size + input_size) + 16; //16 for alignment
int pad_width = 0, pad_height = 0;
if ((ch_mult == 1) && (channels % 8 == 0) && (filter_wd == 3) && (filter_ht == 3)) {
if (channels % 16 == 0) {
if (pad_wd || pad_ht) {
pad_width = pad_wd * 2;
pad_height = pad_ht * 2;
} else {
// check if we need to pad additionally
pad_width = (out_wd * stride_wd + filter_wd - 1) - input_wd;
pad_height = (out_ht * stride_ht + filter_ht - 1) - input_ht;
// printf("in(%d %d %d), out(%d %d), filter (%d %d) stride (%d %d), pad (%d %d)",
// input_wd, input_ht, channels, out_wd, out_ht, filter_wd, filter_ht,
// stride_wd, stride_ht, pad_wd, pad_ht);
}
if (pad_width || pad_height) {
int input_size = (input_wd + pad_width) * (input_ht + pad_height) * channels;
// printf("ask1 %d\n", filter_size + input_size + 16);
return filter_size + input_size + 16; // 16 for alignment
} else {
// printf("ask2 %d\n", filter_size + 16);
return filter_size + 16; // 16 for alignment
}
} else {
int input_size = input_wd * input_ht * channels;
// printf("ask3 %d\n", 2 * (filter_size + input_size) + 16);
return 2 * (filter_size + input_size) + 16; // 16 for alignment
}
} else if (ch_mult % 4 == 0) {
int input_size = input_wd * input_ht * channels;
// printf("ask4 %d\n", 2 * (filter_size + input_size) + 16);
return 2 * (filter_size + input_size) + 16; // 16 for alignment
}
return 32; // just a few bytes
}
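/* Worked example of the branch above (illustrative, not library code), using
 * test case 8 from the updated unit test in this compare: a 58x58x128 input,
 * 3x3 filter, stride 2, ch_mult 1 and no explicit padding. */
static int scratch_size_example(void)
{
    int filter_size = 3 * 3 * 128 * 1;            /* 1152 bytes of filter data  */
    int out_wd      = (58 - 3 + 1) / 2;           /* 28, as in the test case    */
    int pad_width   = (out_wd * 2 + 3 - 1) - 58;  /* 0 -> no extra end padding  */
    (void) pad_width;                             /* pad_height is 0 as well    */
    return filter_size + 16;                      /* 1168 bytes incl. alignment */
}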
void esp_nn_set_depthwise_conv_scratch_buf_esp32s3(void *buf)
@@ -376,29 +418,38 @@ void esp_nn_set_depthwise_conv_scratch_buf_esp32s3(void *buf)
* Assumption 2: Pointers are valid
* Assumption 3: dilation width = 1
*/
void esp_nn_depthwise_conv_s8_esp32s3(const int8_t *input_data,
const uint16_t input_wd,
const uint16_t input_ht,
const uint16_t channels,
const int32_t input_offset,
const uint16_t pad_wd,
const uint16_t pad_ht,
const uint16_t stride_wd,
const uint16_t stride_ht,
const uint16_t ch_mult,
void esp_nn_depthwise_conv_s8_esp32s3(const data_dims_t *input_dims,
const int8_t *input_data,
const data_dims_t *filter_dims,
const int8_t *filter_data,
const uint16_t filter_wd,
const uint16_t filter_ht,
const int32_t *bias,
const data_dims_t *output_dims,
int8_t *out_data,
const uint16_t out_wd,
const uint16_t out_ht,
const int32_t out_offset,
const int32_t *out_shift,
const int32_t *out_mult,
const int32_t activation_min,
const int32_t activation_max)
const dw_conv_params_t *conv_params,
const quant_data_t *quant_data)
{
const uint16_t input_wd = input_dims->width;
const uint16_t input_ht = input_dims->height;
const uint16_t channels = input_dims->channels;
const int32_t input_offset = conv_params->in_offset;
const int32_t out_offset = conv_params->out_offset;
const uint16_t pad_wd = conv_params->padding.width;
const uint16_t pad_ht = conv_params->padding.height;
const uint16_t stride_wd = conv_params->stride.width;
const uint16_t stride_ht = conv_params->stride.height;
const uint16_t filter_wd = filter_dims->width;
const uint16_t filter_ht = filter_dims->height;
const uint16_t out_wd = output_dims->width;
const uint16_t out_ht = output_dims->height;
const int32_t *out_shift = quant_data->shift;
const int32_t *out_mult = quant_data->mult;
const int32_t activation_min = conv_params->activation.min;
const int32_t activation_max = conv_params->activation.max;
const uint16_t ch_mult = conv_params->ch_mult;
int filter_size = filter_wd * filter_ht * channels * ch_mult;
int align_len = 16 - (filter_size & 15);
int input_size = input_wd * input_ht * channels;
@@ -423,18 +474,27 @@ void esp_nn_depthwise_conv_s8_esp32s3(const int8_t *input_data,
stride_wd, stride_ht, filter_aligned, bias,
out_data, out_wd, out_ht, out_offset, out_shift,
out_mult, activation_min, activation_max);
} else if ((pad_wd == 0) && (pad_ht == 0) &&
// because this does not handle padding offset cases yet, run just for stride (1, 1).
// end padding of input with `-input_offset` should solve this
(stride_wd == 1) && (stride_ht == 1)) {
} else if ((channels % 16 == 0) && (pad_wd == 0) && (pad_ht == 0)) {
/* process in 8 bits */
int8_t *filter_aligned = (int8_t *) scratch_buffer;
int8_t *input_padded = (int8_t *) scratch_buffer + filter_size + align_len;
// check if we need to pad additionally
int pad_right = (out_wd * stride_wd + filter_wd - 1) - input_wd;
int pad_bottom = (out_ht * stride_ht + filter_ht - 1) - input_ht;
if (pad_right || pad_bottom) { // pad right and bottom
esp_nn_aligned_s8_pad_end_with_value(input_data, input_padded, input_wd, input_ht,
channels, -input_offset, pad_right, pad_bottom);
} else {
input_padded = (int8_t *) input_data;
}
memcpy(filter_aligned, filter_data, filter_size);
esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3(input_data, input_wd, input_ht, channels, input_offset,
stride_wd, stride_ht, filter_aligned,
bias, out_data, out_wd, out_ht, out_offset, out_shift,
esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3(input_padded, input_wd + pad_right,
input_ht + pad_bottom, channels, input_offset,
stride_wd, stride_ht, filter_aligned, bias,
out_data, out_wd, out_ht, out_offset, out_shift,
out_mult, activation_min, activation_max);
} else { /* (channels % 8) == 0 && pad_wd == 1 && pad_ht == 1 */
} else { /* (channels % 8) == 0 */
esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input_data, input_data16, input_size, input_offset);
esp_nn_depthwise_conv_s16_mult1_3x3_esp32s3(input_data16, input_wd, input_ht, channels,
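The new branch above pads the right and bottom edges of the input with the value `-input_offset` before calling the padded 3x3 kernel. Because every kernel adds `input_offset` to each sample before multiplying, such a padded sample contributes exactly zero to the accumulator, which mirrors zero padding in the dequantised domain. A tiny self-contained illustration, not library code:

#include <stdint.h>

/* A padded sample of value -input_offset cancels the per-sample offset,
 * so its contribution to the accumulator is always zero. */
static int32_t padded_contribution(int32_t input_offset, int32_t filter_val)
{
    int32_t pad_sample = -input_offset;
    return (pad_sample + input_offset) * filter_val;   /* == 0 */
}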

View File

@@ -0,0 +1,8 @@
# Default configurations for ESP32-S3
CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240=y
CONFIG_ESP32S3_SPIRAM_SUPPORT=y
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_8WAYS=y
CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y

View File

@@ -23,7 +23,9 @@
#include "test_utils.h"
#if CONFIG_IDF_CMAKE
#if (CONFIG_SPIRAM_SUPPORT && (CONFIG_SPIRAM_USE_CAPS_ALLOC || CONFIG_SPIRAM_USE_MALLOC))
#define IDF_HEAP_CAPS 1
#endif
#if IDF_HEAP_CAPS
#include "esp_heap_caps.h"
@@ -138,6 +140,11 @@ void esp_nn_add_elementwise_s8_test()
out_c_orig = out_data_c;
out_opt_orig = out_data_opt;
#endif
if (input1_orig == NULL || input2_orig == NULL || out_c_orig == NULL ||
out_opt_orig == NULL) {
printf(ANSI_COLOR_RED"%s error allocating buffers\n"ANSI_COLOR_RESET, __FUNCTION__);
goto elementwise_add_test_cleanup;
}
for (int i = 0; i < size; ++i) {
input1[i] = rand() % 256 - 128;
@@ -194,10 +201,10 @@ elementwise_add_test_cleanup:
if (input2_orig) {
free(input2_orig);
}
if (out_data_c) {
if (out_c_orig) {
free(out_c_orig);
}
if (out_data_opt) {
if (out_opt_orig) {
free(out_opt_orig);
}
}
@@ -282,6 +289,11 @@ void esp_nn_mul_elementwise_s8_test()
out_c_orig = out_data_c;
out_opt_orig = out_data_opt;
#endif
if (input1_orig == NULL || input2_orig == NULL || out_c_orig == NULL ||
out_opt_orig == NULL) {
printf(ANSI_COLOR_RED"%s error allocating buffers\n"ANSI_COLOR_RESET, __FUNCTION__);
goto elementwise_mult_test_cleanup;
}
for (int i = 0; i < size; ++i) {
input1[i] = rand() % 256 - 128;
@@ -333,10 +345,10 @@ elementwise_mult_test_cleanup:
if (input2_orig) {
free(input2_orig);
}
if (out_data_c) {
if (out_c_orig) {
free(out_c_orig);
}
if (out_data_opt) {
if (out_opt_orig) {
free(out_opt_orig);
}
}
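The cleanup change above frees `out_c_orig` and `out_opt_orig` instead of the working pointers. The test keeps the raw addresses returned by the allocator in the `*_orig` variables while the working pointers may be advanced (presumably to also exercise unaligned cases), so only the `*_orig` pointers may be handed back to `free()`. A minimal sketch of that pattern with hypothetical names:

#include <stdint.h>
#include <stdlib.h>

/* Minimal sketch of the keep-the-original-pointer pattern (hypothetical names). */
static void alloc_sketch(void)
{
    int8_t *buf_orig = malloc(128 + 1);
    if (buf_orig == NULL) {
        return;                     /* bail out before using the buffer */
    }
    int8_t *buf = buf_orig + 1;     /* deliberately offset working pointer */
    buf[0] = 0;                     /* ... use buf ... */
    free(buf_orig);                 /* always free the pointer malloc returned */
}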

View File

@@ -22,8 +22,9 @@
#include "test_utils.h"
#if CONFIG_IDF_CMAKE
#if (CONFIG_SPIRAM_SUPPORT && (CONFIG_SPIRAM_USE_CAPS_ALLOC || CONFIG_SPIRAM_USE_MALLOC))
#define IDF_HEAP_CAPS 1
#endif
#if IDF_HEAP_CAPS
#include "esp_heap_caps.h"
#endif
@@ -44,8 +45,8 @@ void esp_nn_depthwise_conv_s8_test()
uint16_t filter_ht, filter_wd, ch_mult;
uint16_t pad_wd, pad_ht, stride_wd, stride_ht;
// run for 10 iterations
for (int itr = 0; itr < 10; itr++) {
// run for 15 iterations
for (int itr = 0; itr < 15; itr++) {
/* prepare data */
switch (itr) {
case 0: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (0,0)
@@ -144,22 +145,52 @@ void esp_nn_depthwise_conv_s8_test()
stride_wd = 2;
stride_ht = 2;
break;
default:
input_wd = 4;
input_ht = 4;
case 8: // same as case 7, with large parameters
input_wd = 58;
input_ht = 58;
filter_ht = 3;
filter_wd = 3;
ch_mult = 4;
channels = 4;
pad_wd = 1;
pad_ht = 1;
stride_wd = 1;
stride_ht = 1;
ch_mult = 1;
channels = 128;
pad_wd = 0;
pad_ht = 0;
stride_wd = 2;
stride_ht = 2;
break;
case 9: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (0,0) stride (2,2)
input_wd = 6;
input_ht = 6;
filter_ht = 3;
filter_wd = 3;
ch_mult = 1;
channels = 16;
pad_wd = 0;
pad_ht = 0;
stride_wd = 2;
stride_ht = 2;
break;
default:
input_wd = 6;
input_ht = 6;
filter_ht = 3;
filter_wd = 3;
ch_mult = 1;
channels = 16;
stride_wd = rand() % 2 + 1;
stride_ht = stride_wd;
pad_wd = stride_wd == 1 ? 0 : rand() % 2;
pad_ht = pad_wd;
printf("stride(%d), pad (%d)\t", stride_wd, pad_wd);
break;
}
uint16_t out_wd = (input_wd - filter_wd + 1) / stride_wd;
uint16_t out_ht = (input_ht - filter_ht + 1) / stride_ht;
if (itr == 9) {
// expect the function to handle this gracefully
out_wd += 1;
out_ht += 1;
}
int in_size = input_wd * input_ht * channels;
int out_size = out_wd * out_ht * channels * ch_mult;
int filter_size = filter_wd * filter_ht * channels * ch_mult + 4;
@@ -210,9 +241,16 @@ void esp_nn_depthwise_conv_s8_test()
out_mult[i] = 0x7eb0e200 + rand() % 50;
}
int scratch_buf_size = esp_nn_get_depthwise_conv_scratch_size(input_wd, input_ht,
channels, ch_mult,
filter_wd, filter_ht);
data_dims_t input_dims = {.width = input_wd, .height = input_ht, .channels = channels, 1};
data_dims_t output_dims = {.width = out_wd, .height = out_ht, .channels = channels * ch_mult, 1};
data_dims_t filter_dims = {.width = filter_wd, .height = filter_ht, 0, 0};
dw_conv_params_t conv_params = {.in_offset = input_offset, .out_offset = out_offset, .ch_mult = ch_mult,
.stride = {stride_wd, stride_ht}, .padding = {pad_wd, pad_ht},
.dilation = {0, 0}, .activation = {activation_min, activation_max}};
quant_data_t quant_data = {.shift = out_shift, .mult = out_mult};
int scratch_buf_size = esp_nn_get_depthwise_conv_scratch_size(&input_dims, &filter_dims,
&output_dims, &conv_params);
if (scratch_buf_size > 0) {
#if IDF_HEAP_CAPS
scratch_buf = heap_caps_malloc(scratch_buf_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
@@ -234,11 +272,8 @@ void esp_nn_depthwise_conv_s8_test()
}
/* C function */
esp_nn_depthwise_conv_s8_ansi(input, input_wd, input_ht, channels, input_offset,
pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
filter_data + 4, filter_wd, filter_ht,
bias + 1, out_data_c, out_wd, out_ht, out_offset, out_shift,
out_mult, activation_min, activation_max);
esp_nn_depthwise_conv_s8_ansi(&input_dims, input, &filter_dims, filter_data + 4,
bias + 1, &output_dims, out_data_c, &conv_params, &quant_data);
if (itr == 0) {
profile_c_end();
@@ -246,11 +281,8 @@ void esp_nn_depthwise_conv_s8_test()
}
/* Optimized function */
esp_nn_depthwise_conv_s8(input, input_wd, input_ht, channels, input_offset,
pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
filter_data + 4, filter_wd, filter_ht,
bias + 1, out_data_opt, out_wd, out_ht, out_offset, out_shift,
out_mult, activation_min, activation_max);
esp_nn_depthwise_conv_s8(&input_dims, input, &filter_dims, filter_data + 4,
bias + 1, &output_dims, out_data_opt, &conv_params, &quant_data);
if (itr == 0) {
/* disable profiler */
@@ -479,8 +511,16 @@ void esp_nn_conv_s8_test()
out_mult[i] = 0x7f67f4f8 + rand() % 50;
}
int scratch_buf_size = esp_nn_get_conv_scratch_size(in_wd, in_ht, in_channels,
out_channels, filter_wd, filter_ht);
data_dims_t input_dims = {.width = in_wd, .height = in_ht, .channels = in_channels, 1};
data_dims_t output_dims = {.width = out_wd, .height = out_ht, .channels = out_channels, 1};
data_dims_t filter_dims = {.width = filter_wd, .height = filter_ht, 0, 0};
conv_params_t conv_params = {.in_offset = input_offset, .out_offset = out_offset,
.stride = {stride_wd, stride_ht}, .padding = {pad_wd, pad_ht},
.dilation = {0, 0}, .activation = {activation_min, activation_max}};
quant_data_t quant_data = {.shift = out_shift, .mult = out_mult};
int scratch_buf_size = esp_nn_get_conv_scratch_size(&input_dims, &filter_dims,
&output_dims, &conv_params);
if (scratch_buf_size > 0) {
#if IDF_HEAP_CAPS
void *scratch_buf = heap_caps_malloc(scratch_buf_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
@@ -502,11 +542,8 @@ void esp_nn_conv_s8_test()
}
/* C function */
esp_nn_conv_s8_ansi(input, in_wd, in_ht, in_channels, input_offset,
pad_wd, pad_ht, stride_wd, stride_ht,
filter_data + 2, filter_wd, filter_ht, bias,
out_data_c, out_wd, out_ht, out_channels, out_offset, out_shift,
out_mult, activation_min, activation_max);
esp_nn_conv_s8_ansi(&input_dims, input, &filter_dims, filter_data + 2,
bias, &output_dims, out_data_c, &conv_params, &quant_data);
if (itr == 0) {
profile_c_end();
@@ -514,11 +551,8 @@ void esp_nn_conv_s8_test()
}
/* Optimized function */
esp_nn_conv_s8(input, in_wd, in_ht, in_channels, input_offset,
pad_wd, pad_ht, stride_wd, stride_ht,
filter_data + 2, filter_wd, filter_ht, bias,
out_data_opt, out_wd, out_ht, out_channels, out_offset, out_shift,
out_mult, activation_min, activation_max);
esp_nn_conv_s8(&input_dims, input, &filter_dims, filter_data + 2,
bias, &output_dims, out_data_opt, &conv_params, &quant_data);
if (itr == 0) {
/* disable profiler */

Binary file not shown.

View File

@@ -263,6 +263,9 @@ void CCamera::EnableAutoExposure(int flashdauer)
ESP_LOGE(TAGCAMERACLASS, "Camera Capture Failed");
LEDOnOff(false);
LightOnOff(false);
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Camera Capture Failed (Procedure 'EnableAutoExposure') --> Reboot"
"Check that your camera module is working and connected properly.");
doReboot();
}
esp_camera_fb_return(fb);
@@ -313,7 +316,7 @@ esp_err_t CCamera::CaptureToBasisImage(CImageBasis *_Image, int delay)
LightOnOff(false);
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Camera is not working anymore - most propably hardware problem (instablility, ...). "
LogFile.WriteToFile("Camera is not working anymore (CCamera::CaptureToBasisImage) - most propably hardware problem (instablility, ...). "
"System will reboot.");
doReboot();
@@ -410,6 +413,9 @@ esp_err_t CCamera::CaptureToFile(std::string nm, int delay)
ESP_LOGE(TAGCAMERACLASS, "CaptureToFile: Camera Capture Failed");
LEDOnOff(false);
LightOnOff(false);
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Camera Capture Failed (CCamera::CaptureToFile) --> Reboot"
"Check that your camera module is working and connected properly.");
doReboot();
return ESP_FAIL;

View File

@@ -95,6 +95,11 @@ esp_err_t get_tflite_file_handler(httpd_req_t *req)
_filename = std::string(entry->d_name);
printf("File: %s\t", _filename.c_str());
// ignore all files with starting dot (hidden files)
if (_filename.rfind(".", 0) == 0) {
continue;
}
_fileext = _filename;
pos = _fileext.find_last_of(".");
if (pos != std::string::npos)

View File

@@ -416,6 +416,8 @@ void task_reboot(void *pvParameter)
}
void doReboot(){
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Reboot triggert by Software (5s).");
ESP_LOGI(TAGPARTOTA, "Reboot in 5sec");
LogFile.WriteToFile("Reboot in 5sec");
xTaskCreate(&task_reboot, "reboot", configMINIMAL_STACK_SIZE * 64, NULL, 10, NULL);
@@ -435,7 +437,7 @@ esp_err_t handler_reboot(httpd_req_t *req)
LogFile.WriteToFile("handler_reboot");
ESP_LOGI(TAGPARTOTA, "!!! System will restart within 5 sec!!!");
const char* resp_str = "!!! System will restart within 5 sec!!!";
const char* resp_str = "<body style='font-family: arial'> <h3 id=t></h3></body><script>var h='Rebooting!<br>The page will automatically reload.<br>'; document.getElementById('t').innerHTML=h; setInterval(function (){h +='.'; document.getElementById('t').innerHTML=h; fetch(window.location.hostname,{mode: 'no-cors'}).then(r=>{window.location.replace('/wasserzaehler_roi.html');})}, 1000);</script>";
httpd_resp_send(req, resp_str, strlen(resp_str));
doReboot();

View File

@@ -28,7 +28,7 @@ ClassFlowCNNGeneral::ClassFlowCNNGeneral(ClassFlowAlignment *_flowalign, t_CNNTy
flowpostalignment = _flowalign;
}
string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution, int prev)
string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution, int prev, float _vorgaengerAnalog)
{
string result = "";
@@ -41,8 +41,8 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
float zahl = GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float;
int ergebnis_nachkomma = ((int) floor(zahl * 10) + 10) % 10;
prev = ZeigerEval(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(analog) zahl=" + std::to_string(zahl) + ", ergebnis_nachkomma=" + std::to_string(ergebnis_nachkomma) + ", prev=" + std::to_string(prev));
prev = ZeigerEvalAnalogNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
// if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(analog) zahl=" + std::to_string(zahl) + ", ergebnis_nachkomma=" + std::to_string(ergebnis_nachkomma) + ", prev=" + std::to_string(prev));
result = std::to_string(prev);
if (_extendedResolution && (CNNType != Digital))
@@ -50,7 +50,7 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
for (int i = GENERAL[_analog]->ROI.size() - 2; i >= 0; --i)
{
prev = ZeigerEval(GENERAL[_analog]->ROI[i]->result_float, prev);
prev = ZeigerEvalAnalogNeu(GENERAL[_analog]->ROI[i]->result_float, prev);
result = std::to_string(prev) + result;
}
return result;
@@ -82,13 +82,14 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
result = std::to_string(ergebnis_vorkomma) + std::to_string(ergebnis_nachkomma);
prev = ergebnis_vorkomma;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(dig100-ext) ergebnis_vorkomma=" + std::to_string(ergebnis_vorkomma) + ", ergebnis_nachkomma=" + std::to_string(ergebnis_nachkomma) + ", prev=" + std::to_string(prev));
}
else
{
// prev = ZeigerEval(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
prev = ZeigerEvalHybrid(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev, prev);
if (_vorgaengerAnalog >= 0)
prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, _vorgaengerAnalog, prev, true);
else
prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev, prev);
result = std::to_string(prev);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(dig100) prev=" + std::to_string(prev));
@@ -105,9 +106,11 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
{
if (GENERAL[_analog]->ROI[i]->result_float >= 0)
{
prev = ZeigerEvalHybrid(GENERAL[_analog]->ROI[i]->result_float, GENERAL[_analog]->ROI[i+1]->result_float, prev);
prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[i]->result_float, GENERAL[_analog]->ROI[i+1]->result_float, prev);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout#ZeigerEvalHybridNeu()= " + std::to_string(prev));
result = std::to_string(prev) + result;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout#result= " + result);
}
else
{
@@ -120,57 +123,15 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
return result;
}
/*
if (CNNType == Digital100)
{
int zif_akt = -1;
float zahl = GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float;
if (zahl >= 0) // NaN?
{
if (_extendedResolution)
{
int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
int ergebnis_vorkomma = ((int) floor(zahl)) % 10;
result = std::to_string(ergebnis_vorkomma) + std::to_string(ergebnis_nachkomma);
zif_akt = ergebnis_vorkomma;
}
else
{
zif_akt = ZeigerEvalHybrid(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, -1, -1);
result = std::to_string(zif_akt);
}
}
else
{
result = "N";
if (_extendedResolution && (CNNType != Digital))
result = "NN";
}
for (int i = GENERAL[_analog]->ROI.size() - 2; i >= 0; --i)
{
if (GENERAL[_analog]->ROI[i]->result_float >= 0)
{
zif_akt = ZeigerEvalHybrid(GENERAL[_analog]->ROI[i]->result_float, GENERAL[_analog]->ROI[i+1]->result_float, zif_akt);
result = std::to_string(zif_akt) + result;
}
else
{
zif_akt = -1;
result = "N" + result;
}
}
return result;
}
*/
return result;
}
/*
int ClassFlowCNNGeneral::ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger)
{
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybrid( " + std::to_string(zahl) + ", " + std::to_string(zahl_vorgaenger) + ", " + std::to_string(eval_vorgaenger) + ")");
int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
@@ -183,10 +144,15 @@ int ClassFlowCNNGeneral::ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int
return ((int) trunc(zahl) + 10) % 10;
}
// 9.0, da bei getReadout() prev als int übergeben wird (9 statt 9.5)
// tritt bei der ersten ziffer von digit auf, wenn analog davor (2. Aufruf von getReadout)
if ((zahl_vorgaenger >= 0.5 ) && (zahl_vorgaenger < 9.5))
{
// kein Ziffernwechsel, da Vorkomma weit genug weg ist (0+/-0.5) --> zahl wird gerundet
return ((int) round(zahl) + 10) % 10;
if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8)) // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
return ((int) round(zahl) + 10) % 10;
else
return ((int) trunc(zahl) + 10) % 10;
}
else
{
@@ -211,38 +177,169 @@ int ClassFlowCNNGeneral::ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int
+ ", zahl_vorgaenger=" + std::to_string(zahl_vorgaenger) + ", eval_vorgaenger=" + std::to_string(eval_vorgaenger));
return -1;
/*
if (zahl_vorgaenger > 9.2) // Ziffernwechsel beginnt
}
*/
int ClassFlowCNNGeneral::ZeigerEvalHybridNeu(float zahl, float zahl_vorgaenger, int eval_vorgaenger, bool AnalogerVorgaenger)
{
int result;
int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
if (eval_vorgaenger < 0)
{
if (eval_vorgaenger == 0) // Wechsel hat schon stattgefunden
{
return ((int) round(zahl) + 10) % 10; // Annahme, dass die neue Zahl schon in der Nähe des Ziels ist
}
if ((ergebnis_nachkomma <= DigitalUnschaerfe * 10) || (ergebnis_nachkomma >= (10 - DigitalUnschaerfe * 10))) // band around the digit --> round, since the digit was reached within the tolerance
result = (int) (round(zahl) + 10) % 10;
else
{
if (zahl_vorgaenger <= 9.5) // Wechsel startet gerade, aber beginnt erst
{
if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8)) // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
return ((int) round(zahl) + 10) % 10;
else
return ((int) trunc(zahl) + 10) % 10;
}
else
{
return ((int) trunc(zahl) + 10) % 10; // Wechsel schon weiter fortgeschritten, d.h. über 2 als Nachkomma
}
}
result = (int) ((int) trunc(zahl) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8)) // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
return ((int) round(zahl) + 10) % 10;
if (AnalogerVorgaenger)
{
// result = ZeigerEvalAnalogToDigitNeu(zahl, eval_vorgaenger);
result = ZeigerEvalAnalogToDigitNeu(zahl, zahl_vorgaenger);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - Analoger Vorgänger, Bewertung über ZeigerEvalAnalogNeu = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
return ((int) trunc(zahl) + 10) % 10;
*/
if ((zahl_vorgaenger >= DigitalUebergangsbereichVorgaenger ) && (zahl_vorgaenger <= (10.0 - DigitalUebergangsbereichVorgaenger)))
{
// no digit transition, since the predecessor is far enough away (0 +/- DigitalUebergangsbereichVorgaenger) --> the value is rounded
if ((ergebnis_nachkomma <= DigitalBand) || (ergebnis_nachkomma >= (10-DigitalBand))) // band around the digit --> round, since the digit was reached within the tolerance
result = ((int) round(zahl) + 10) % 10;
else
result = ((int) trunc(zahl) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, kein Ziffernwechsel, da Vorkomma weit genug weg = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
if (eval_vorgaenger <= 1) // zero crossing has already happened (judged via the previous value, not the raw reading!) --> round up here (2.8 --> 3, but also 3.1 --> 3)
{
if (ergebnis_nachkomma > 5)
result = (ergebnis_vorkomma + 1) % 10;
else
result = ergebnis_vorkomma;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, Nulldurchgang hat stattgefunden = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
// only >= 9.5 remains --> no zero crossing yet --> 2.8 --> 2, and 3.1 --> 2
// reduced to 4 here, since the transition only starts once the predecessor reaches 9. With a predecessor of 9.5 the current
// reading can still be x.4 - x.5.
if (ergebnis_nachkomma >= 4)
result = ergebnis_vorkomma;
else
result = (ergebnis_vorkomma - 1 + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, >= 9.5 --> noch kein Nulldurchgang = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
int ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu(float zahl, float ziffer_vorgaenger)
{
int result;
int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
if (ziffer_vorgaenger < 0)
{
result = (int) floor(zahl);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " + std::to_string(AnalogFehler));
return result;
}
if ((ziffer_vorgaenger >= DigitalUebergangsbereichVorgaengerAnalogToDigit ) && (ziffer_vorgaenger <= (10.0 - DigitalUebergangsbereichVorgaengerAnalogToDigit)))
{
// no digit transition, since the predecessor is far enough away (0 +/- DigitalUebergangsbereichVorgaenger) --> the value is rounded
if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8)) // band around the digit --> round, since the digit was reached within the tolerance
result = ((int) round(zahl) + 10) % 10;
else
result = ((int) trunc(zahl) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - kein Ziffernwechsel, da Vorkomma weit genug weg = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
if (ziffer_vorgaenger <= 1) // zero crossing has already happened (judged via the previous value, not the raw reading!) --> round up here (2.8 --> 3, but also 3.1 --> 3)
{
if (ergebnis_nachkomma > 5)
result = (ergebnis_vorkomma + 1) % 10;
else
result = ergebnis_vorkomma;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - Nulldurchgang hat stattgefunden = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
// only >= 9.5 remains --> no zero crossing yet --> 2.8 --> 2, and 3.1 --> 2
// reduced to 4 here, since the transition only starts once the predecessor reaches 9. With a predecessor of 9.5 the current
// reading can still be x.4 - x.5.
if (ergebnis_nachkomma >= 4)
result = ergebnis_vorkomma;
else
result = (ergebnis_vorkomma - 1 + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - 9.0 --> noch kein Nulldurchgang = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " + std::to_string(DigitalUnschaerfe));
return result;
}
int ClassFlowCNNGeneral::ZeigerEvalAnalogNeu(float zahl, int ziffer_vorgaenger)
{
float zahl_min, zahl_max;
int result;
if (ziffer_vorgaenger == -1)
{
result = (int) floor(zahl);
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " + std::to_string(AnalogFehler));
return result;
}
zahl_min = zahl - AnalogFehler / 10.0;
zahl_max = zahl + AnalogFehler / 10.0;
if ((int) floor(zahl_max) - (int) floor(zahl_min) != 0)
{
if (ziffer_vorgaenger <= AnalogFehler)
{
result = ((int) floor(zahl_max) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl uneindeutig, Korrektur nach oben - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " + std::to_string(AnalogFehler));
return result;
}
if (ziffer_vorgaenger >= 10 - AnalogFehler)
{
result = ((int) floor(zahl_min) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl uneindeutig, Korrektur nach unten - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " + std::to_string(AnalogFehler));
return result;
}
}
result = ((int) floor(zahl) + 10) % 10;
if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl eindeutig, keine Korrektur notwendig - Ergebnis = " + std::to_string(result) +
" zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " + std::to_string(AnalogFehler));
return result;
}
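/* Worked example for ZeigerEvalAnalogNeu with the default AnalogFehler = 3.0
 * introduced in the header diff below; the numbers are purely illustrative.
 * A raw pointer reading of 3.95 gives zahl_min = 3.65 and zahl_max = 4.25,
 * whose floors differ, so the reading is ambiguous and the less significant
 * pointer decides the direction:
 *   - previous digit 0..3 (<= AnalogFehler)      -> floor(zahl_max) = 4 (corrected upwards)
 *   - previous digit 7..9 (>= 10 - AnalogFehler) -> floor(zahl_min) = 3 (corrected downwards)
 *   - otherwise, or no ambiguity at all          -> floor(zahl)     = 3 (left as-is)
 */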
/*
int ClassFlowCNNGeneral::ZeigerEval(float zahl, int ziffer_vorgaenger)
{
int ergebnis_nachkomma = ((int) floor(zahl * 10) + 10) % 10;
@@ -273,6 +370,7 @@ int ClassFlowCNNGeneral::ZeigerEval(float zahl, int ziffer_vorgaenger)
ergebnis = (ergebnis + 10) % 10;
return ergebnis;
}
*/
bool ClassFlowCNNGeneral::ReadParameter(FILE* pfile, string& aktparamgraph)
{
@@ -760,7 +858,7 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
_fit = _val + _valminus;
}
if (result >= 10)
if (result > 10)
result = result - 10;
if (result < 0)
result = result + 10;
@@ -811,34 +909,21 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
case Analogue100:
{
int _num;
float _fit;
float _result_save_file;
tflite->LoadInputImageBasis(GENERAL[_ana]->ROI[i]->image);
tflite->Invoke();
_num = tflite->GetOutClassification();
_fit = tflite->GetOutputValue(_num);
GENERAL[_ana]->ROI[i]->result_float = (float)_num / 10.0;
_result_save_file = GENERAL[_ana]->ROI[i]->result_float;
if (_fit < CNNGoodThreshold)
{
GENERAL[_ana]->ROI[i]->isReject = true;
GENERAL[_ana]->ROI[i]->result_float = -1;
_result_save_file+= 100; // Für den Fall, dass fit nicht ausreichend, soll trotzdem das Ergebnis mit "-10x.y" abgespeichert werden.
string zw = "Value Rejected due to Threshold (Fit: " + to_string(_fit) + "Threshold: " + to_string(CNNGoodThreshold);
printf("Value Rejected due to Threshold (Fit: %f, Threshold: %f\n", _fit, CNNGoodThreshold);
LogFile.WriteToFile(zw);
}
else
{
GENERAL[_ana]->ROI[i]->isReject = false;
}
GENERAL[_ana]->ROI[i]->isReject = false;
printf("Result General(Analog)%i: %f\n", i, GENERAL[_ana]->ROI[i]->result_float);
if (isLogImage)
@@ -885,11 +970,14 @@ std::vector<HTMLInfo*> ClassFlowCNNGeneral::GetHTMLInfo()
for (int _ana = 0; _ana < GENERAL.size(); ++_ana)
for (int i = 0; i < GENERAL[_ana]->ROI.size(); ++i)
{
printf("Image: %d\n", (int) GENERAL[_ana]->ROI[i]->image);
if (GENERAL[_ana]->ROI[i]->image)
{
if (GENERAL[_ana]->name == "default")
GENERAL[_ana]->ROI[i]->image->SaveToFile(FormatFileName("/sdcard/img_tmp/" + GENERAL[_ana]->ROI[i]->name + ".bmp"));
else
GENERAL[_ana]->ROI[i]->image->SaveToFile(FormatFileName("/sdcard/img_tmp/" + GENERAL[_ana]->name + "_" + GENERAL[_ana]->ROI[i]->name + ".bmp"));
}
HTMLInfo *zw = new HTMLInfo;
if (GENERAL[_ana]->name == "default")

View File

@@ -24,6 +24,13 @@ protected:
t_CNNType CNNType;
std::vector<general*> GENERAL;
float CNNGoodThreshold;
float AnalogFehler = 3.0;
float AnalogToDigtalFehler = 0.8;
float DigitalUnschaerfe = 0.2;
int DigitalBand = 3;
float DigitalAnalogerVorgaengerUebergangsbereich = 2;
float DigitalUebergangsbereichVorgaengerAnalogToDigit = 1; // war vorher 2
float DigitalUebergangsbereichVorgaenger = 0.9;
string cnnmodelfile;
int modelxsize, modelysize, modelchannel;
@@ -34,8 +41,12 @@ protected:
bool SaveAllFiles;
// bool extendedResolution;
int ZeigerEval(float zahl, int ziffer_vorgaenger);
int ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger);
// int ZeigerEval(float zahl, int ziffer_vorgaenger);
// int ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger);
int ZeigerEvalAnalogNeu(float zahl, int ziffer_vorgaenger);
int ZeigerEvalAnalogToDigitNeu(float zahl, float ziffer_vorgaenger);
int ZeigerEvalHybridNeu(float zahl, float zahl_vorgaenger, int eval_vorgaenger, bool AnalogerVorgaenger = false);
bool doNeuralNetwork(string time);
@@ -50,7 +61,7 @@ public:
bool doFlow(string time);
string getHTMLSingleStep(string host);
string getReadout(int _analog, bool _extendedResolution = false, int prev = -1);
string getReadout(int _analog, bool _extendedResolution = false, int prev = -1, float _vorgaengerAnalog = -1);
void DrawROI(CImageBasis *_zw);

View File

@@ -305,6 +305,7 @@ bool ClassFlowControll::doFlow(string time)
if (i) i -= 1; // the previous step has to be repeated (presumably the image capture)
result = false;
if (repeat > 5) {
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Wiederholung 5x nicht erfolgreich --> reboot");
doReboot();
// the step was repeated 5 times --> reboot
@@ -493,6 +494,8 @@ bool ClassFlowControll::ReadParameter(FILE* pfile, string& aktparamgraph)
// reboot required so that the new wlan.ini is actually used !!!
fclose(pfile);
printf("do reboot\n");
LogFile.SwitchOnOff(true);
LogFile.WriteToFile("Reboot to activate new HOSTNAME.");
esp_restart();
hard_restart();
doReboot();
@@ -586,6 +589,8 @@ esp_err_t ClassFlowControll::GetJPGStream(std::string _fn, httpd_req_t *req)
{
std::vector<HTMLInfo*> htmlinfo;
htmlinfo = GetAllDigital();
printf("After getClassFlowControll::GetAllDigital\n");
for (int i = 0; i < htmlinfo.size(); ++i)
{
if (_fn == htmlinfo[i]->filename)

View File

@@ -238,8 +238,9 @@ void ClassFlowPostProcessing::SavePreValue()
_zw = NUMBERS[j]->name + "\t" + NUMBERS[j]->timeStamp + "\t" + RundeOutput(NUMBERS[j]->PreValue, NUMBERS[j]->Nachkomma) + "\n";
printf("Write PreValue Zeile: %s\n", _zw.c_str());
fputs(_zw.c_str(), pFile);
if (pFile) {
fputs(_zw.c_str(), pFile);
}
}
UpdatePreValueINI = false;
@@ -568,8 +569,10 @@ void ClassFlowPostProcessing::InitNUMBERS()
NUMBERS.push_back(_number);
}
for (int i = 0; i < NUMBERS.size(); ++i)
for (int i = 0; i < NUMBERS.size(); ++i) {
printf("Number %s, Anz DIG: %d, Anz ANA %d\n", NUMBERS[i]->name.c_str(), NUMBERS[i]->AnzahlDigital, NUMBERS[i]->AnzahlAnalog);
}
}
string ClassFlowPostProcessing::ShiftDecimal(string in, int _decShift){
@@ -667,7 +670,7 @@ bool ClassFlowPostProcessing::doFlow(string zwtime)
if (NUMBERS[j]->digit_roi)
{
if (NUMBERS[j]->analog_roi)
NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, false, previous_value) + NUMBERS[j]->ReturnRawValue;
NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, false, previous_value, NUMBERS[j]->analog_roi->ROI[0]->result_float) + NUMBERS[j]->ReturnRawValue;
else
NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, NUMBERS[j]->isExtendedResolution, previous_value); // extended resolution only if there are no analog digits
}

View File

@@ -73,7 +73,7 @@ void ClassLogFile::WriteToDedicatedFile(std::string _fn, std::string info, bool
// pFile = OpenFileAndWait(_fn.c_str(), "a");
pFile = fopen(_fn.c_str(), "a+");
printf("Logfile opened: %s\n", _fn.c_str());
// printf("Logfile opened: %s\n", _fn.c_str());
if (pFile!=NULL) {
if (_time)

View File

@@ -314,6 +314,7 @@ esp_err_t handler_wasserzaehler(httpd_req_t *req)
std::vector<HTMLInfo*> htmlinfodig;
htmlinfodig = tfliteflow.GetAllDigital();
for (int i = 0; i < htmlinfodig.size(); ++i)
{
if (tfliteflow.GetTypeDigital() == Digital)

View File

@@ -25,7 +25,8 @@ list(REMOVE_ITEM srcs_kernels
"${tfmicro_kernels_dir}/depthwise_conv.cc"
"${tfmicro_kernels_dir}/fully_connected.cc"
"${tfmicro_kernels_dir}/mul.cc"
"${tfmicro_kernels_dir}/pooling.cc")
"${tfmicro_kernels_dir}/pooling.cc"
"${tfmicro_kernels_dir}/softmax.cc")
FILE(GLOB esp_nn_kernels
"${tfmicro_kernels_dir}/esp_nn/*.cc")
@@ -38,6 +39,10 @@ set(lib_srcs
"${tflite_dir}/kernels/kernel_util.cc"
"${tflite_dir}/micro/memory_planner/greedy_memory_planner.cc"
"${tflite_dir}/micro/memory_planner/linear_memory_planner.cc"
"${tflite_dir}/micro/arena_allocator/non_persistent_arena_buffer_allocator.cc"
"${tflite_dir}/micro/arena_allocator/persistent_arena_buffer_allocator.cc"
"${tflite_dir}/micro/arena_allocator/recording_single_arena_buffer_allocator.cc"
"${tflite_dir}/micro/arena_allocator/single_arena_buffer_allocator.cc"
"${tflite_dir}/c/common.cc"
"${tflite_dir}/core/api/error_reporter.cc"
"${tflite_dir}/core/api/flatbuffer_conversions.cc"

View File

@@ -179,6 +179,12 @@ typedef enum {
kTfLiteBuiltinMultinomial = 149,
kTfLiteBuiltinGelu = 150,
kTfLiteBuiltinDynamicUpdateSlice = 151,
kTfLiteBuiltinRelu0To1 = 152,
kTfLiteBuiltinUnsortedSegmentProd = 153,
kTfLiteBuiltinUnsortedSegmentMax = 154,
kTfLiteBuiltinUnsortedSegmentSum = 155,
kTfLiteBuiltinAtan2 = 156,
kTfLiteBuiltinUnsortedSegmentMin = 157,
} TfLiteBuiltinOperator;
#ifdef __cplusplus

View File

@@ -113,7 +113,13 @@ typedef struct TfLiteQuantizationParams {
} TfLiteQuantizationParams;
// --------------------------------------------------------------------------
// Opaque types used by c_api_opaque.h.
// Opaque types used by c_api.h, c_api_opaque.h and common.h.
// TfLiteOpaqueContext is an opaque version of TfLiteContext;
typedef struct TfLiteOpaqueContext TfLiteOpaqueContext;
// TfLiteOpaqueNode is an opaque version of TfLiteNode;
typedef struct TfLiteOpaqueNode TfLiteOpaqueNode;
// TfLiteOpaqueTensor is an opaque version of TfLiteTensor;
typedef struct TfLiteOpaqueTensor TfLiteOpaqueTensor;

View File

@@ -14,7 +14,11 @@ limitations under the License.
==============================================================================*/
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/c/c_api_types.h"
#ifdef TF_LITE_TENSORFLOW_PROFILER
#include "tensorflow/lite/tensorflow_profiler_logger.h"
#endif
#ifndef TF_LITE_STATIC_MEMORY
#include <stdlib.h>
@@ -99,7 +103,12 @@ void TfLiteFloatArrayFree(TfLiteFloatArray* a) { free(a); }
void TfLiteTensorDataFree(TfLiteTensor* t) {
if (t->allocation_type == kTfLiteDynamic ||
t->allocation_type == kTfLitePersistentRo) {
free(t->data.raw);
if (t->data.raw) {
#ifdef TF_LITE_TENSORFLOW_PROFILER
tflite::OnTfLiteTensorDealloc(t);
#endif
free(t->data.raw);
}
}
t->data.raw = nullptr;
}
@@ -161,7 +170,7 @@ void TfLiteTensorFree(TfLiteTensor* t) {
t->dims = nullptr;
if (t->dims_signature) {
TfLiteIntArrayFree((TfLiteIntArray *) t->dims_signature);
TfLiteIntArrayFree((TfLiteIntArray*)t->dims_signature);
}
t->dims_signature = nullptr;
@@ -191,16 +200,12 @@ void TfLiteTensorReset(TfLiteType type, const char* name, TfLiteIntArray* dims,
}
TfLiteStatus TfLiteTensorCopy(const TfLiteTensor* src, TfLiteTensor* dst) {
if (!src || !dst)
return kTfLiteOk;
if (src->bytes != dst->bytes)
return kTfLiteError;
if (src == dst)
return kTfLiteOk;
if (!src || !dst) return kTfLiteOk;
if (src->bytes != dst->bytes) return kTfLiteError;
if (src == dst) return kTfLiteOk;
dst->type = src->type;
if (dst->dims)
TfLiteIntArrayFree(dst->dims);
if (dst->dims) TfLiteIntArrayFree(dst->dims);
dst->dims = TfLiteIntArrayCopy(src->dims);
memcpy(dst->data.raw, src->data.raw, src->bytes);
dst->buffer_handle = src->buffer_handle;
@@ -218,8 +223,17 @@ void TfLiteTensorRealloc(size_t num_bytes, TfLiteTensor* tensor) {
// TODO(b/145340303): Tensor data should be aligned.
if (!tensor->data.raw) {
tensor->data.raw = (char*)malloc(num_bytes);
#ifdef TF_LITE_TENSORFLOW_PROFILER
tflite::OnTfLiteTensorAlloc(tensor, num_bytes);
#endif
} else if (num_bytes > tensor->bytes) {
#ifdef TF_LITE_TENSORFLOW_PROFILER
tflite::OnTfLiteTensorDealloc(tensor);
#endif
tensor->data.raw = (char*)realloc(tensor->data.raw, num_bytes);
#ifdef TF_LITE_TENSORFLOW_PROFILER
tflite::OnTfLiteTensorAlloc(tensor, num_bytes);
#endif
}
tensor->bytes = num_bytes;
}
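A rough, self-contained sketch of the allocation-hook pattern introduced above, with hypothetical OnAlloc/OnDealloc functions standing in for tflite::OnTfLiteTensorAlloc and tflite::OnTfLiteTensorDealloc:
#include <cstddef>
#include <cstdio>
#include <cstdlib>
// Hypothetical stand-ins for the profiler hooks used upstream.
static void OnAlloc(void* p, size_t bytes) { std::printf("alloc %zu bytes at %p\n", bytes, p); }
static void OnDealloc(void* p) { std::printf("dealloc %p\n", p); }
// Grow-only reallocation that reports every transition to the profiler,
// mirroring the TF_LITE_TENSORFLOW_PROFILER branches in TfLiteTensorRealloc.
static void* GrowBuffer(void* data, size_t old_bytes, size_t new_bytes) {
  if (data == nullptr) {
    data = std::malloc(new_bytes);
    OnAlloc(data, new_bytes);
  } else if (new_bytes > old_bytes) {
    OnDealloc(data);                       // the old block is about to go away
    data = std::realloc(data, new_bytes);  // may move the allocation
    OnAlloc(data, new_bytes);              // report the new block and size
  }
  return data;
}
int main() {
  void* buf = GrowBuffer(nullptr, 0, 16);
  buf = GrowBuffer(buf, 16, 64);
  std::free(buf);
}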

View File

@@ -173,9 +173,9 @@ void TfLiteFloatArrayFree(TfLiteFloatArray* a);
} \
} while (false)
#else // TF_LITE_STRIP_ERROR_STRINGS
#define UNUSED(...) (void)sizeof(#__VA_ARGS__)
#define TF_LITE_KERNEL_LOG(context, ...) UNUSED(__VA_ARGS__)
#define TF_LITE_MAYBE_KERNEL_LOG(context, ...) UNUSED(__VA_ARGS__)
#define ARGS_UNUSED(...) (void)sizeof(#__VA_ARGS__)
#define TF_LITE_KERNEL_LOG(context, ...) ARGS_UNUSED(__VA_ARGS__)
#define TF_LITE_MAYBE_KERNEL_LOG(context, ...) ARGS_UNUSED(__VA_ARGS__)
#endif // TF_LITE_STRIP_ERROR_STRINGS
// Check whether value is true, and if not return kTfLiteError from
@@ -842,6 +842,12 @@ typedef struct TfLiteContext {
size_t* bytes);
} TfLiteContext;
// `TfLiteRegistrationExternal` is an external version of `TfLiteRegistration`
// for C API which doesn't use internal types (such as `TfLiteContext`) but only
// uses stable API types (such as `TfLiteOpaqueContext`). The purpose of each
// field is exactly the same as with `TfLiteRegistration`.
typedef struct TfLiteRegistrationExternal TfLiteRegistrationExternal;
typedef struct TfLiteRegistration {
// Initializes the op from serialized data.
// Called only *once* for the lifetime of the op, so any one-time allocations
@@ -903,8 +909,31 @@ typedef struct TfLiteRegistration {
// Note: It is the responsibility of the registration binder to set this
// properly.
int version;
// The external version of `TfLiteRegistration`. Since we can't use internal
// types (such as `TfLiteContext`) in the C API while keeping the ABI stable,
// C API users provide a `TfLiteRegistrationExternal` to implement custom
// ops. We keep it inside `TfLiteRegistration` and use it to route
// callbacks properly.
TfLiteRegistrationExternal* registration_external;
} TfLiteRegistration;
// Old version of `TfLiteRegistration` to maintain binary backward
// compatibility.
// WARNING: This structure is deprecated / not an official part of the API.
// It should be only used for binary backward compatibility.
typedef struct TfLiteRegistration_V1 {
void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
void (*free)(TfLiteContext* context, void* buffer);
TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
const char* (*profiling_string)(const TfLiteContext* context,
const TfLiteNode* node);
int32_t builtin_code;
const char* custom_name;
int version;
} TfLiteRegistration_V1;
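To make the routing comment above concrete, a minimal, self-contained sketch with hypothetical stand-in types (the real wiring between TfLiteRegistration and TfLiteRegistrationExternal lives inside the runtime and is not shown here):
#include <cstdio>
// Hypothetical stand-ins: an "external" registration that only sees opaque
// types, wrapped by the internal registration used by the interpreter.
struct ExternalRegistration {
  void (*invoke)(void* opaque_ctx, void* opaque_node);
};
struct Registration {
  void (*invoke)(void* ctx, void* node);
  ExternalRegistration* registration_external;  // set for C-API custom ops
};
// The runtime prefers the external callback when one is attached.
void Invoke(const Registration& reg, void* ctx, void* node) {
  if (reg.registration_external && reg.registration_external->invoke) {
    reg.registration_external->invoke(ctx, node);
  } else if (reg.invoke) {
    reg.invoke(ctx, node);
  }
}
int main() {
  ExternalRegistration ext{[](void*, void*) { std::puts("external op invoked"); }};
  Registration reg{nullptr, &ext};
  Invoke(reg, nullptr, nullptr);
}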
// The flags used in `TfLiteDelegate`. Note that this is a bitmask, so the
// values should be 1, 2, 4, 8, ...etc.
typedef enum TfLiteDelegateFlags {

View File

@@ -493,6 +493,11 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
return ParseSquare(op, error_reporter, allocator, builtin_data);
}
case BuiltinOperator_SQUARED_DIFFERENCE: {
return ParseSquaredDifference(op, error_reporter, allocator,
builtin_data);
}
case BuiltinOperator_SQUEEZE: {
return ParseSqueeze(op, error_reporter, allocator, builtin_data);
}
@@ -840,14 +845,25 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
// TODO(aselle): Implement call in BuiltinOptions, but nullptrs are
// ok for now, since there is no call implementation either.
case BuiltinOperator_CALL:
case BuiltinOperator_COMPLEX_ABS:
case BuiltinOperator_CONCAT_EMBEDDINGS:
case BuiltinOperator_COS:
case BuiltinOperator_CUSTOM:
case BuiltinOperator_DENSIFY:
case BuiltinOperator_DYNAMIC_UPDATE_SLICE:
case BuiltinOperator_EMBEDDING_LOOKUP:
case BuiltinOperator_EQUAL:
case BuiltinOperator_HASHTABLE_FIND:
case BuiltinOperator_HASHTABLE_IMPORT:
case BuiltinOperator_HASHTABLE_SIZE:
case BuiltinOperator_IMAG:
case BuiltinOperator_MATRIX_DIAG:
case BuiltinOperator_MATRIX_SET_DIAG:
case BuiltinOperator_NON_MAX_SUPPRESSION_V4:
case BuiltinOperator_NON_MAX_SUPPRESSION_V5:
case BuiltinOperator_RELU_N1_TO_1:
case BuiltinOperator_RELU_0_TO_1:
case BuiltinOperator_SCATTER_ND:
case BuiltinOperator_SELECT:
case BuiltinOperator_SELECT_V2:
case BuiltinOperator_SLICE:
@@ -855,23 +871,17 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
case BuiltinOperator_TOPK_V2:
case BuiltinOperator_TRANSPOSE:
case BuiltinOperator_RANGE:
case BuiltinOperator_SQUARED_DIFFERENCE:
case BuiltinOperator_REVERSE_V2:
case BuiltinOperator_WHERE:
case BuiltinOperator_RANK:
case BuiltinOperator_NON_MAX_SUPPRESSION_V4:
case BuiltinOperator_NON_MAX_SUPPRESSION_V5:
case BuiltinOperator_SCATTER_ND:
case BuiltinOperator_DENSIFY:
case BuiltinOperator_SEGMENT_SUM:
case BuiltinOperator_RFFT2D:
case BuiltinOperator_IMAG:
case BuiltinOperator_REAL:
case BuiltinOperator_COMPLEX_ABS:
case BuiltinOperator_HASHTABLE_FIND:
case BuiltinOperator_HASHTABLE_IMPORT:
case BuiltinOperator_HASHTABLE_SIZE:
case BuiltinOperator_DYNAMIC_UPDATE_SLICE:
case BuiltinOperator_RFFT2D:
case BuiltinOperator_SEGMENT_SUM:
case BuiltinOperator_REVERSE_V2:
case BuiltinOperator_UNSORTED_SEGMENT_MAX:
case BuiltinOperator_UNSORTED_SEGMENT_MIN:
case BuiltinOperator_UNSORTED_SEGMENT_PROD:
case BuiltinOperator_UNSORTED_SEGMENT_SUM:
case BuiltinOperator_ATAN2:
case BuiltinOperator_WHERE:
return kTfLiteOk;
case BuiltinOperator_PLACEHOLDER_FOR_GREATER_OP_CODES:
return kTfLiteError;
@@ -2189,6 +2199,14 @@ TfLiteStatus ParseSquare(const Operator*, ErrorReporter*, BuiltinDataAllocator*,
return kTfLiteOk;
}
// We have this parse function instead of directly returning kTfLiteOk from the
// switch-case in ParseOpData because this function is used as part of the
// selective registration for the OpResolver implementation in micro.
TfLiteStatus ParseSquaredDifference(const Operator*, ErrorReporter*,
BuiltinDataAllocator*, void**) {
return kTfLiteOk;
}
TfLiteStatus ParseStridedSlice(const Operator* op,
ErrorReporter* error_reporter,
BuiltinDataAllocator* allocator,

View File

@@ -356,6 +356,11 @@ TfLiteStatus ParseSqrt(const Operator* op, ErrorReporter* error_reporter,
TfLiteStatus ParseSquare(const Operator* op, ErrorReporter* error_reporter,
BuiltinDataAllocator* allocator, void** builtin_data);
TfLiteStatus ParseSquaredDifference(const Operator* op,
ErrorReporter* error_reporter,
BuiltinDataAllocator* allocator,
void** builtin_data);
TfLiteStatus ParseStridedSlice(const Operator* op,
ErrorReporter* error_reporter,
BuiltinDataAllocator* allocator,

View File

@@ -23,6 +23,16 @@ limitations under the License.
#include "tensorflow/lite/core/api/error_reporter.h"
#include "tensorflow/lite/schema/schema_generated.h"
// Opaque type similar to TfLiteDelegate / TfLiteOpaqueDelegate.
// This is used for cases (e.g. when using "TF Lite with Google Play Services")
// where the TF Lite runtime might be built using a newer (or older)
// version of the TF Lite sources than the app, and hence might have a
// different definition of the TfLiteDelegate type. TF Lite APIs use
// TfLiteOpaqueDelegate rather than TfLiteDelegate when they want to
// refer to a delegate defined with that potentially different version
// of the TfLiteDelegate type.
struct TfLiteOpaqueDelegateStruct;
namespace tflite {
/// Abstract interface that returns TfLiteRegistrations given op codes or custom
@@ -37,8 +47,10 @@ class OpResolver {
virtual const TfLiteRegistration* FindOp(const char* op,
int version) const = 0;
// Represents a sequence of delegates.
using TfLiteDelegatePtrVector =
std::vector<std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>>;
// Returns optional delegates for resolving and handling ops in the flatbuffer
// model. This may be used in addition to the standard TfLiteRegistration
// lookup for graph resolution.
@@ -47,16 +59,55 @@ class OpResolver {
return {};
}
// Represent a function that creates a TfLite delegate instance.
// Represents a function that creates a TfLite delegate instance.
using TfLiteDelegateCreator =
std::function<std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>(
int /*num_threads*/)>;
// Represents a sequence of delegate creator functions.
using TfLiteDelegateCreators = std::vector<TfLiteDelegateCreator>;
// Returns a vector of delegate creators to create optional delegates for
// resolving and handling ops in the flatbuffer model. This may be used in
// addition to the standard TfLiteRegistration lookup for graph resolution.
//
// Note that this method is not used (will not be called) if you are using
// TF Lite in Google Play Services; the GetOpaqueDelegateCreators method
// (see below) is used for that case.
virtual TfLiteDelegateCreators GetDelegateCreators() const { return {}; }
// TODO(b/202712825): it would be nice if we could avoid the need for separate
// "opaque" types & methods for use only with TF Lite in Google Play Services.
// Represents an opaque delegate instance.
// WARNING: Experimental interface, subject to change.
using TfLiteOpaqueDelegatePtr =
std::unique_ptr<TfLiteOpaqueDelegateStruct,
void (*)(TfLiteOpaqueDelegateStruct*)>;
// Represents a function that creates an opaque delegate instance.
// WARNING: Experimental interface, subject to change.
using TfLiteOpaqueDelegateCreator =
std::function<TfLiteOpaqueDelegatePtr(int /*num_threads*/)>;
// Represents a sequence of opaque delegate creator functions.
// WARNING: Experimental interface, subject to change.
using TfLiteOpaqueDelegateCreators = std::vector<TfLiteOpaqueDelegateCreator>;
// Returns a vector of opaque delegate creators to create optional opaque
// delegates for resolving and handling ops in the flatbuffer model. This may
// be used in addition to the standard TfLiteRegistration lookup for graph
// resolution.
//
// Note that this method will be called only if you are using TF Lite in
// Google Play Services; if you are using regular TF Lite, GetDelegateCreators
// (see above) is used instead.
//
// WARNING: Experimental interface, subject to change.
virtual TfLiteOpaqueDelegateCreators GetOpaqueDelegateCreators() const {
return {};
}
virtual ~OpResolver() {}
private:
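Reading the two creator APIs together, a hedged sketch of a delegate creator as it might be returned from GetDelegateCreators(); the aliases below simply mirror the typedefs in the class, and the returned delegate is a null placeholder rather than a real one:
#include <functional>
#include <memory>
#include <vector>
#include "tensorflow/lite/c/common.h"  // for TfLiteDelegate
// Local aliases mirroring the OpResolver typedefs shown above.
using DelegatePtr = std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>;
using DelegateCreator = std::function<DelegatePtr(int /*num_threads*/)>;
using DelegateCreators = std::vector<DelegateCreator>;
// A creator that would normally build a real delegate; here it returns an
// empty pointer with a no-op deleter purely as a placeholder.
DelegateCreators MakeCreators() {
  DelegateCreators creators;
  creators.push_back([](int num_threads) -> DelegatePtr {
    (void)num_threads;  // a real creator would size thread pools etc.
    return DelegatePtr(nullptr, [](TfLiteDelegate*) {});
  });
  return creators;
}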

View File

@@ -13,10 +13,10 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/experimental/microfrontend/lib/fft.h"
#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"
#include <string.h>
#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"
void FftCompute(struct FftState* state, const int16_t* input,
int input_scale_shift) {
@@ -37,9 +37,9 @@ void FftCompute(struct FftState* state, const int16_t* input,
// Apply the FFT.
kissfft_fixed16::kiss_fftr(
reinterpret_cast<kissfft_fixed16::kiss_fftr_cfg>(state->scratch),
state->input,
reinterpret_cast<kissfft_fixed16::kiss_fft_cpx*>(state->output));
reinterpret_cast<kissfft_fixed16::kiss_fftr_cfg>(state->scratch),
state->input,
reinterpret_cast<kissfft_fixed16::kiss_fft_cpx*>(state->output));
}
void FftInit(struct FftState* state) {

View File

@@ -13,10 +13,11 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/experimental/microfrontend/lib/fft_util.h"
#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"
#include <stdio.h>
#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"
int FftPopulateState(struct FftState* state, size_t input_size) {
state->input_size = input_size;
state->fft_size = 1;

View File

@@ -31,4 +31,3 @@ namespace kissfft_fixed16 {
#undef KISS_FFT_H
#endif // TENSORFLOW_LITE_EXPERIMENTAL_MICROFRONTEND_LIB_KISS_FFT_INT16_H_

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_COMMON_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_COMMON_H_
#include <algorithm>
#ifndef ALLOW_SLOW_GENERIC_DEPTHWISECONV_FALLBACK
#ifdef GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
#define ALLOW_SLOW_GENERIC_DEPTHWISECONV_FALLBACK

View File

@@ -86,6 +86,16 @@ using int32 = std::int32_t;
using uint32 = std::uint32_t;
#endif // !defined(TF_LITE_STATIC_MEMORY)
// Allow for cross-compiler usage of function signatures - currently used for
// specifying named RUY profiler regions in templated methods.
#if defined(_MSC_VER)
#define TFLITE_PRETTY_FUNCTION __FUNCSIG__
#elif defined(__GNUC__)
#define TFLITE_PRETTY_FUNCTION __PRETTY_FUNCTION__
#else
#define TFLITE_PRETTY_FUNCTION __func__
#endif
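A small, self-contained illustration of the compiler-specific macro above; the three branches are copied from the definition, and printing it inside a template shows why the full signature is useful for naming profiler regions:
#include <cstdio>
#if defined(_MSC_VER)
#define PRETTY_FUNCTION __FUNCSIG__
#elif defined(__GNUC__)
#define PRETTY_FUNCTION __PRETTY_FUNCTION__
#else
#define PRETTY_FUNCTION __func__
#endif
template <typename T>
void Kernel(T /*value*/) {
  // With GCC/Clang this prints e.g. "void Kernel(T) [with T = int]",
  // which distinguishes template instantiations in profiler traces.
  std::printf("%s\n", PRETTY_FUNCTION);
}
int main() {
  Kernel(1);     // instantiated for int
  Kernel(1.0f);  // instantiated for float
}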
// TFLITE_DEPRECATED()
//
// Duplicated from absl/base/macros.h to avoid pulling in that library.

View File

@@ -324,7 +324,7 @@ void ApplySigmoidFloat(const int16_t* input, int32_t n_batch, int32_t n_input,
// - n_input: the size for input and output.
// - output: the 16 bit output
// The input is in Qm.15-m format and the output is in Q0.15 format.
void ApplyTanh(int32_t integer_bits, const int16_t* input, int32_t n_batch,
void ApplyTanh(int32_t intger_bits, const int16_t* input, int32_t n_batch,
int32_t n_input, int16_t* output);
// Apply Tanh to a quantized vector. The internal calculation is in float.

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_ADD_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_ADD_H_
#include <algorithm>
#include <type_traits>
#include "fixedpoint/fixedpoint.h"

View File

@@ -16,6 +16,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_CONCATENATION_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_CONCATENATION_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/compatibility.h"
#include "tensorflow/lite/kernels/internal/cppmath.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_CONV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_CONV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/types.h"

View File

@@ -0,0 +1,247 @@
/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_DIV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_DIV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {
namespace reference_ops {
template <typename T>
inline void DivCheckArithmeticParams(const ArithmeticParams& params) {
TFLITE_DCHECK_LE(params.quantized_activation_min,
params.quantized_activation_max);
// Input offset is negative input zero point. Activation tensors are
// asymmetric quantized so they span the full int8 range.
constexpr int32_t max_value =
static_cast<int32_t>(std::numeric_limits<T>::max());
TFLITE_DCHECK_GE(params.input1_offset, -max_value);
TFLITE_DCHECK_LE(params.input1_offset, max_value);
TFLITE_DCHECK_GE(params.input2_offset, -max_value);
TFLITE_DCHECK_LE(params.input2_offset, max_value);
TFLITE_DCHECK_GE(params.output_offset, -max_value);
TFLITE_DCHECK_LE(params.output_offset, max_value);
}
// Element-wise div that can often be used for inner loop of broadcast Div as
// well as the non-broadcast Div.
template <typename T>
inline void DivElementwise(int size, const ArithmeticParams& params,
const T* input1_data, const T* input2_data,
T* output_data) {
DivCheckArithmeticParams<T>(params);
for (int i = 0; i < size; ++i) {
int32_t input1_val = params.input1_offset + input1_data[i];
int32_t input2_val = params.input2_offset + input2_data[i];
TFLITE_DCHECK_NE(input2_val, 0);
if (input2_val < 0) {
// Invert signs to avoid a negative input2_val as input2_inv needs to be
// positive to be used as multiplier of MultiplyByQuantizedMultiplier.
input1_val = -input1_val;
input2_val = -input2_val;
}
int recip_shift;
const int32_t input2_inv = GetReciprocal(input2_val, 31, &recip_shift);
const int headroom = CountLeadingSignBits(input1_val);
const int32_t unscaled_quotient =
MultiplyByQuantizedMultiplierGreaterThanOne(input1_val, input2_inv,
headroom);
const int total_shift = params.output_shift - recip_shift - headroom;
const int32_t unclamped_result =
params.output_offset +
MultiplyByQuantizedMultiplierSmallerThanOneExp(
unscaled_quotient, params.output_multiplier, total_shift);
const int32_t clamped_output =
std::min(params.quantized_activation_max,
std::max(params.quantized_activation_min, unclamped_result));
output_data[i] = static_cast<T>(clamped_output);
}
}
inline void Div(const ArithmeticParams& params,
const RuntimeShape& input1_shape, const uint8_t* input1_data,
const RuntimeShape& input2_shape, const uint8_t* input2_data,
const RuntimeShape& output_shape, uint8_t* output_data) {
TFLITE_DCHECK_LE(params.quantized_activation_min,
params.quantized_activation_max);
const int flat_size =
MatchingElementsSize(input1_shape, input2_shape, output_shape);
DivElementwise(flat_size, params, input1_data, input2_data, output_data);
}
inline void Div(const ArithmeticParams& params,
const RuntimeShape& input1_shape, const int8_t* input1_data,
const RuntimeShape& input2_shape, const int8_t* input2_data,
const RuntimeShape& output_shape, int8_t* output_data) {
TFLITE_DCHECK_LE(params.quantized_activation_min,
params.quantized_activation_max);
const int flat_size =
MatchingElementsSize(input1_shape, input2_shape, output_shape);
DivElementwise(flat_size, params, input1_data, input2_data, output_data);
}
template <typename T, int N = 5>
inline void BroadcastDivSlowQuantized(
const ArithmeticParams& params, const RuntimeShape& unextended_input1_shape,
const T* input1_data, const RuntimeShape& unextended_input2_shape,
const T* input2_data, const RuntimeShape& unextended_output_shape,
T* output_data) {
TFLITE_DCHECK_LE(unextended_input1_shape.DimensionsCount(), N);
TFLITE_DCHECK_LE(unextended_input2_shape.DimensionsCount(), N);
TFLITE_DCHECK_LE(unextended_output_shape.DimensionsCount(), N);
NdArrayDesc<N> desc1;
NdArrayDesc<N> desc2;
NdArrayDesc<N> output_desc;
NdArrayDescsForElementwiseBroadcast(unextended_input1_shape,
unextended_input2_shape, &desc1, &desc2);
CopyDimsToDesc(RuntimeShape::ExtendedShape(N, unextended_output_shape),
&output_desc);
DivCheckArithmeticParams<T>(params);
auto div_func = [&](int indexes[N]) {
int32_t input1_val =
params.input1_offset + input1_data[SubscriptToIndex(desc1, indexes)];
int32_t input2_val =
params.input2_offset + input2_data[SubscriptToIndex(desc2, indexes)];
TFLITE_DCHECK_NE(input2_val, 0);
if (input2_val < 0) {
// Invert signs to avoid a negative input2_val as input2_inv needs to be
// positive to be used as multiplier of MultiplyByQuantizedMultiplier.
input1_val = -input1_val;
input2_val = -input2_val;
}
int recip_shift;
const int32_t input2_inv = GetReciprocal(input2_val, 31, &recip_shift);
const int headroom = CountLeadingSignBits(input1_val);
const int32_t unscaled_quotient =
MultiplyByQuantizedMultiplierGreaterThanOne(input1_val, input2_inv,
headroom);
const int total_shift = params.output_shift - recip_shift - headroom;
const int32_t unclamped_result =
params.output_offset +
MultiplyByQuantizedMultiplierSmallerThanOneExp(
unscaled_quotient, params.output_multiplier, total_shift);
const int32_t clamped_output =
std::min(params.quantized_activation_max,
std::max(params.quantized_activation_min, unclamped_result));
output_data[SubscriptToIndex(output_desc, indexes)] =
static_cast<T>(clamped_output);
};
NDOpsHelper<N>(output_desc, div_func);
}
template <int N = 5>
inline void BroadcastDivSlow(const ArithmeticParams& params,
const RuntimeShape& unextended_input1_shape,
const uint8_t* input1_data,
const RuntimeShape& unextended_input2_shape,
const uint8_t* input2_data,
const RuntimeShape& unextended_output_shape,
uint8_t* output_data) {
BroadcastDivSlowQuantized<uint8_t, N>(
params, unextended_input1_shape, input1_data, unextended_input2_shape,
input2_data, unextended_output_shape, output_data);
}
template <int N = 5>
inline void BroadcastDivSlow(const ArithmeticParams& params,
const RuntimeShape& unextended_input1_shape,
const int8_t* input1_data,
const RuntimeShape& unextended_input2_shape,
const int8_t* input2_data,
const RuntimeShape& unextended_output_shape,
int8_t* output_data) {
BroadcastDivSlowQuantized<int8_t, N>(
params, unextended_input1_shape, input1_data, unextended_input2_shape,
input2_data, unextended_output_shape, output_data);
}
// TODO(jiawen): We can implement BroadcastDiv on buffers of arbitrary
// dimensionality if the runtime code does a single loop over one dimension
// that handles broadcasting as the base case. The code generator would then
// generate max(D1, D2) nested for loops.
template <typename T, int N = 5>
void BroadcastDivSlow(const ArithmeticParams& params,
const RuntimeShape& unextended_input1_shape,
const T* input1_data,
const RuntimeShape& unextended_input2_shape,
const T* input2_data,
const RuntimeShape& unextended_output_shape,
T* output_data) {
T output_activation_min;
T output_activation_max;
GetActivationParams(params, &output_activation_min, &output_activation_max);
TFLITE_DCHECK_LE(unextended_input1_shape.DimensionsCount(), N);
TFLITE_DCHECK_LE(unextended_input2_shape.DimensionsCount(), N);
TFLITE_DCHECK_LE(unextended_output_shape.DimensionsCount(), N);
NdArrayDesc<N> desc1;
NdArrayDesc<N> desc2;
NdArrayDesc<N> output_desc;
NdArrayDescsForElementwiseBroadcast(unextended_input1_shape,
unextended_input2_shape, &desc1, &desc2);
CopyDimsToDesc(RuntimeShape::ExtendedShape(N, unextended_output_shape),
&output_desc);
// In Tensorflow, the dimensions are canonically named (batch_number, row,
// col, channel), with extents (batches, height, width, depth), with the
// trailing dimension changing most rapidly (channels has the smallest
// stride, typically 1 element).
//
// In generated C code, we store arrays with the dimensions reversed. The
// first dimension has smallest stride.
auto div_func = [&](int indexes[N]) {
output_data[SubscriptToIndex(output_desc, indexes)] =
ActivationFunctionWithMinMax(
input1_data[SubscriptToIndex(desc1, indexes)] /
input2_data[SubscriptToIndex(desc2, indexes)],
output_activation_min, output_activation_max);
};
NDOpsHelper<N>(output_desc, div_func);
}
template <typename T>
inline void Div(const ArithmeticParams& params,
const RuntimeShape& input1_shape, const T* input1_data,
const RuntimeShape& input2_shape, const T* input2_data,
const RuntimeShape& output_shape, T* output_data) {
T output_activation_min;
T output_activation_max;
GetActivationParams(params, &output_activation_min, &output_activation_max);
const int flat_size =
MatchingElementsSize(input1_shape, input2_shape, output_shape);
for (int i = 0; i < flat_size; ++i) {
output_data[i] = ActivationFunctionWithMinMax(
input1_data[i] / input2_data[i], output_activation_min,
output_activation_max);
}
}
} // namespace reference_ops
} // namespace tflite
#endif // TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_DIV_H_
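For the float overloads at the end of the header, the kernel reduces to an elementwise divide followed by the activation clamp; a toy, self-contained version (std::clamp stands in for ActivationFunctionWithMinMax) could look like this:
#include <algorithm>
#include <cstdio>
// Simplified stand-in for the flat (non-broadcast) float Div above.
void DivFloat(const float* in1, const float* in2, float* out, int flat_size,
              float activation_min, float activation_max) {
  for (int i = 0; i < flat_size; ++i) {
    out[i] = std::clamp(in1[i] / in2[i], activation_min, activation_max);
  }
}
int main() {
  const float a[3] = {1.0f, -4.0f, 9.0f};
  const float b[3] = {2.0f, 2.0f, 3.0f};
  float out[3];
  DivFloat(a, b, out, 3, -1.0f, 2.0f);         // results clamped to [-1, 2]
  for (float v : out) std::printf("%g\n", v);  // 0.5 -1 2
}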

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_FULLY_CONNECTED_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_FULLY_CONNECTED_H_
#include <algorithm>
#include "ruy/profiler/instrumentation.h" // from @ruy
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/cppmath.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_ACTIVATIONS_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_ACTIVATIONS_H_
#include <algorithm>
#include "ruy/profiler/instrumentation.h" // from @ruy
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/types.h"
@@ -23,9 +25,9 @@ namespace tflite {
namespace reference_ops {
inline int16_t SaturatingLeftShift(int16_t value, int amount) {
int32_t result = static_cast<int32_t>(value) * (1 << amount);
result = std::min<int32_t>(result, std::numeric_limits<int16_t>::max());
result = std::max<int32_t>(result, std::numeric_limits<int16_t>::min());
int64_t result = static_cast<int64_t>(value) * (1 << amount);
result = std::min<int64_t>(result, std::numeric_limits<int16_t>::max());
result = std::max<int64_t>(result, std::numeric_limits<int16_t>::min());
return result;
}
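The widening from int32_t to int64_t matters once value * (1 << amount) exceeds 32 bits; a self-contained check of the fixed arithmetic (same logic, as a free function):
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <limits>
// Same logic as the updated SaturatingLeftShift: widen first, clamp to int16.
int16_t SaturatingLeftShift(int16_t value, int amount) {
  int64_t result = static_cast<int64_t>(value) * (1LL << amount);
  result = std::min<int64_t>(result, std::numeric_limits<int16_t>::max());
  result = std::max<int64_t>(result, std::numeric_limits<int16_t>::min());
  return static_cast<int16_t>(result);
}
int main() {
  // 32767 << 20 is roughly 3.4e10, which would overflow a 32-bit intermediate
  // but saturates cleanly to 32767 with the 64-bit version.
  std::printf("%d\n", SaturatingLeftShift(32767, 20));
  std::printf("%d\n", SaturatingLeftShift(-32768, 20));  // saturates to -32768
}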

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_ADD_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_ADD_H_
#include <algorithm>
#include <limits>
#include "tensorflow/lite/kernels/internal/common.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_CONV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_CONV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_DEPTHWISE_CONV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_DEPTHWISE_CONV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,11 +15,101 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_FULLY_CONNECTED_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_FULLY_CONNECTED_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {
namespace reference_integer_ops {
// For per-channel functions, since the quantization spec defines that
// weights are symmetric
// (https://www.tensorflow.org/lite/performance/quantization_spec#symmetric_vs_asymmetric),
// zero_point (params.weights_offset) is always 0.
// However, for per-tensor functions, params.weights_offset is still applied for
// backward compatibility.
inline void FullyConnectedPerChannel(
const FullyConnectedParams& params, const int32_t* output_multiplier,
const int* output_shift, const RuntimeShape& input_shape,
const int8_t* input_data, const RuntimeShape& filter_shape,
const int8_t* filter_data, const RuntimeShape& bias_shape,
const int32_t* bias_data, const RuntimeShape& output_shape,
int8_t* output_data) {
const int32_t input_offset = params.input_offset;
const int32_t output_offset = params.output_offset;
const int32_t output_activation_min = params.quantized_activation_min;
const int32_t output_activation_max = params.quantized_activation_max;
TFLITE_DCHECK_GE(filter_shape.DimensionsCount(), 2);
TFLITE_DCHECK_EQ(output_shape.DimensionsCount(), 2);
TFLITE_DCHECK_LE(output_activation_min, output_activation_max);
const int filter_dim_count = filter_shape.DimensionsCount();
const int batches = output_shape.Dims(0);
const int output_depth = output_shape.Dims(1);
TFLITE_DCHECK_LE(output_depth, filter_shape.Dims(filter_dim_count - 2));
const int accum_depth = filter_shape.Dims(filter_dim_count - 1);
for (int b = 0; b < batches; ++b) {
for (int out_c = 0; out_c < output_depth; ++out_c) {
int32_t acc = 0;
for (int d = 0; d < accum_depth; ++d) {
int32_t input_val = input_data[b * accum_depth + d];
int32_t filter_val = filter_data[out_c * accum_depth + d];
acc += filter_val * (input_val + input_offset);
}
if (bias_data) {
acc += bias_data[out_c];
}
acc = MultiplyByQuantizedMultiplier(acc, output_multiplier[out_c],
output_shift[out_c]);
acc += output_offset;
acc = std::max(acc, output_activation_min);
acc = std::min(acc, output_activation_max);
output_data[out_c + output_depth * b] = static_cast<int8_t>(acc);
}
}
}
template <typename AccumScalar>
inline void FullyConnectedPerChannel(
const FullyConnectedParams& params, const int32_t* output_multiplier,
const int* output_shift, const RuntimeShape& input_shape,
const int16_t* input_data, const RuntimeShape& filter_shape,
const int8_t* filter_data, const RuntimeShape& bias_shape,
const AccumScalar* bias_data, const RuntimeShape& output_shape,
int16_t* output_data) {
const int32_t output_activation_min = params.quantized_activation_min;
const int32_t output_activation_max = params.quantized_activation_max;
TFLITE_DCHECK_GE(filter_shape.DimensionsCount(), 2);
TFLITE_DCHECK_GE(output_shape.DimensionsCount(), 1);
TFLITE_DCHECK_LE(output_activation_min, output_activation_max);
const int filter_dim_count = filter_shape.DimensionsCount();
const int output_dim_count = output_shape.DimensionsCount();
const int batches = FlatSizeSkipDim(output_shape, output_dim_count - 1);
const int output_depth = output_shape.Dims(output_dim_count - 1);
TFLITE_DCHECK_LE(output_depth, filter_shape.Dims(filter_dim_count - 2));
const int accum_depth = filter_shape.Dims(filter_dim_count - 1);
for (int b = 0; b < batches; ++b) {
for (int out_c = 0; out_c < output_depth; ++out_c) {
AccumScalar acc = 0;
for (int d = 0; d < accum_depth; ++d) {
int32_t input_val = input_data[b * accum_depth + d];
int32_t filter_val = filter_data[out_c * accum_depth + d];
acc += filter_val * input_val;
}
if (bias_data) {
acc += bias_data[out_c];
}
int32_t acc_scaled = MultiplyByQuantizedMultiplier(
acc, output_multiplier[out_c], output_shift[out_c]);
acc_scaled = std::max(acc_scaled, output_activation_min);
acc_scaled = std::min(acc_scaled, output_activation_max);
output_data[out_c + output_depth * b] = static_cast<int16_t>(acc_scaled);
}
}
}
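A deliberately simplified, self-contained illustration of the per-channel idea described in the comment above: weights carry no offset, only the input does, and each output channel has its own rescale factor (a plain float stands in for MultiplyByQuantizedMultiplier):
#include <algorithm>
#include <cstdint>
#include <cstdio>
int main() {
  // One batch, accum_depth = 2, output_depth = 2.
  const int8_t input[2] = {10, -3};
  const int32_t input_offset = 5;                 // -input_zero_point
  const int8_t filter[2][2] = {{1, 2}, {-4, 3}};  // one row per output channel
  const int32_t bias[2] = {100, -50};
  const float per_channel_scale[2] = {0.05f, 0.01f};  // stand-in rescale
  for (int out_c = 0; out_c < 2; ++out_c) {
    int32_t acc = 0;
    for (int d = 0; d < 2; ++d) {
      acc += filter[out_c][d] * (input[d] + input_offset);  // symmetric weights
    }
    acc += bias[out_c];
    int32_t scaled = static_cast<int32_t>(acc * per_channel_scale[out_c]);
    scaled = std::max(-128, std::min(127, scaled));  // int8 activation clamp
    std::printf("channel %d -> %d\n", out_c, scaled);
  }
}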
inline void FullyConnected(
const FullyConnectedParams& params, const RuntimeShape& input_shape,
const int8_t* input_data, const RuntimeShape& filter_shape,

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_L2NORMALIZATION_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_L2NORMALIZATION_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,7 +15,9 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_LOGISTIC_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_LOGISTIC_H_
#include <algorithm>
#include <limits>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_MEAN_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_MEAN_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_MUL_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_MUL_H_
#include <algorithm>
#include "fixedpoint/fixedpoint.h"
#include "ruy/profiler/instrumentation.h" // from @ruy
#include "tensorflow/lite/kernels/internal/common.h"

View File

@@ -15,7 +15,9 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_POOLING_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_POOLING_H_
#include <algorithm>
#include <limits>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_TANH_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_TANH_H_
#include <algorithm>
#include <limits>
#include "fixedpoint/fixedpoint.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_TRANSPOSE_CONV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_TRANSPOSE_CONV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_MUL_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_MUL_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_POOLING_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_POOLING_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/cppmath.h"
#include "tensorflow/lite/kernels/internal/quantization_util.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_PRELU_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_PRELU_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/compatibility.h"
#include "tensorflow/lite/kernels/internal/types.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_PROCESS_BROADCAST_SHAPES_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_PROCESS_BROADCAST_SHAPES_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/types.h"
namespace tflite {

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_REDUCE_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_REDUCE_H_
#include <algorithm>
#include "ruy/profiler/instrumentation.h" // from @ruy
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/cppmath.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_REQUANTIZE_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_REQUANTIZE_H_
#include <algorithm>
#include "ruy/profiler/instrumentation.h" // from @ruy
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/types.h"

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_RESIZE_NEAREST_NEIGHBOR_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_RESIZE_NEAREST_NEIGHBOR_H_
#include <algorithm>
#include <cmath>
#include "tensorflow/lite/kernels/internal/cppmath.h"

View File

@@ -15,6 +15,7 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_SOFTMAX_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_SOFTMAX_H_
#include <algorithm>
#include <limits>
#include "fixedpoint/fixedpoint.h"

View File

@@ -15,6 +15,8 @@ limitations under the License.
#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_TRANSPOSE_CONV_H_
#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_TRANSPOSE_CONV_H_
#include <algorithm>
#include "tensorflow/lite/kernels/internal/common.h"
#include "tensorflow/lite/kernels/internal/types.h"

View File

@@ -27,6 +27,11 @@ class RuntimeShape {
public:
RuntimeShape& operator=(RuntimeShape const&) = delete;
// RuntimeShape in TFLM supports up to 5 dimensions.
// The name kMaxSmallSize comes from the same file in the upstream
// tensorflow lite repo and needs to be kept the same for maximum reuse.
static constexpr int kMaxSmallSize = 5;
RuntimeShape() : size_(0) {}
explicit RuntimeShape(int dimensions_count) : size_(dimensions_count) {}
@@ -104,11 +109,9 @@ class RuntimeShape {
sizeof(int32_t) * shape.DimensionsCount());
}
// A maximum of 4 dimensions are supported on TFLM.
static constexpr int kMaxSize = 5;
int32_t size_;
union {
int32_t dims_[kMaxSize];
int32_t dims_[kMaxSmallSize];
};
};

View File

@@ -974,11 +974,11 @@ struct StridedSliceParams {
int8_t strides_count;
int32_t strides[5];
int16_t begin_mask;
int16_t ellipsis_mask;
int16_t end_mask;
int16_t new_axis_mask;
int16_t shrink_axis_mask;
uint16_t begin_mask;
uint16_t ellipsis_mask;
uint16_t end_mask;
uint16_t new_axis_mask;
uint16_t shrink_axis_mask;
};
struct TanhParams {

View File

@@ -177,6 +177,14 @@ inline int64_t NumElements(const TfLiteTensor* t) {
return NumElements(t->dims);
}
inline int64_t NumElements(const int* dims, int num_dims) {
int64_t count = 1;
for (int i = 0; i < num_dims; ++i) {
count *= dims[i];
}
return count;
}
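The new overload is a straight product over a raw dims array; a hedged usage sketch (assuming the header above is included):
// Hedged usage sketch, assuming the header above is available.
const int dims[3] = {2, 3, 4};
const int64_t count = NumElements(dims, 3);  // 2 * 3 * 4 = 24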
// Determines whether tensor is constant.
// TODO(b/138199592): Introduce new query which checks for constant OR
// persistent-read-only, which would be useful for most tensor kernels that
@@ -308,7 +316,7 @@ TfLiteStatus CalculateShapeForBroadcast(TfLiteContext* context,
const TfLiteTensor* input3,
TfLiteIntArray** output_shape);
// Return the size of given type in bytes. Return 0 in in case of string.
// Return the size of given type in bytes. Return 0 in case of string.
int TfLiteTypeGetSize(TfLiteType type);
// Whether the current platform is mobile (Android or iOS).

View File

@@ -43,6 +43,7 @@ AllOpsResolver::AllOpsResolver() {
AddDepthwiseConv2D();
AddDequantize();
AddDetectionPostprocess();
AddDiv();
AddElu();
AddEqual();
AddEthosU();
@@ -104,6 +105,7 @@ AllOpsResolver::AllOpsResolver() {
AddSqueeze();
AddStridedSlice();
AddSub();
AddSum();
AddSvdf();
AddTanh();
AddTranspose();

View File

@@ -12,8 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_MICRO_IBUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_IBUFFER_ALLOCATOR_H_
#ifndef TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_IBUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_IBUFFER_ALLOCATOR_H_
#include <cstddef>
#include <cstdint>
@@ -97,4 +97,4 @@ class INonPersistentBufferAllocator {
} // namespace tflite
#endif // TENSORFLOW_LITE_MICRO_IBUFFER_ALLOCATOR_H_
#endif // TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_IBUFFER_ALLOCATOR_H_

View File

@@ -0,0 +1,170 @@
/* Copyright 2022 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/micro/arena_allocator/non_persistent_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/memory_helpers.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
namespace tflite {
NonPersistentArenaBufferAllocator::NonPersistentArenaBufferAllocator(
uint8_t* buffer, size_t buffer_size)
: buffer_head_(buffer),
buffer_tail_(buffer + buffer_size),
head_temp_(buffer),
next_temp_(buffer) {}
NonPersistentArenaBufferAllocator::~NonPersistentArenaBufferAllocator() {}
// Allocates a temporary buffer. This buffer is not resizable.
uint8_t* NonPersistentArenaBufferAllocator::AllocateTemp(size_t size,
size_t alignment) {
uint8_t* const aligned_result = AlignPointerUp(next_temp_, alignment);
const size_t available_memory = buffer_tail_ - aligned_result;
if (available_memory < size) {
MicroPrintf(
"Failed to allocate temp memory. Requested: %u, "
"available %u, missing: %u",
size, available_memory, size - available_memory);
return nullptr;
}
next_temp_ = aligned_result + size;
temp_buffer_ptr_check_sum_ ^= reinterpret_cast<intptr_t>(aligned_result);
temp_buffer_count_++;
return aligned_result;
}
// Signals that a temporary buffer is no longer needed.
void NonPersistentArenaBufferAllocator::DeallocateTemp(uint8_t* temp_buf) {
temp_buffer_ptr_check_sum_ ^= reinterpret_cast<intptr_t>(temp_buf);
temp_buffer_count_--;
}
// Returns true if all temporary buffers are already deallocated.
bool NonPersistentArenaBufferAllocator::IsAllTempDeallocated() {
if (temp_buffer_count_ != 0 || temp_buffer_ptr_check_sum_ != 0) {
MicroPrintf(
"Number of allocated temp buffers: %d. Checksum passing status: %d",
temp_buffer_count_, !temp_buffer_ptr_check_sum_);
return false;
}
return true;
}
// Signals that all temporary allocations can be reclaimed. TFLM calls this
// API when it knows that all temporary buffers that it requested have been
// deallocated. The goal of this API is to allow implementations of
// INonPersistentBufferAllocator to reuse the buffer with reasonable
// complexity.
TfLiteStatus NonPersistentArenaBufferAllocator::ResetTempAllocations() {
if (!IsAllTempDeallocated()) {
MicroPrintf(
"All temp buffers must be freed before calling ResetTempAllocations()");
return kTfLiteError;
}
next_temp_ = head_temp_;
return kTfLiteOk;
}
// Returns a buffer that is resizable via ResizeBuffer().
uint8_t* NonPersistentArenaBufferAllocator::AllocateResizableBuffer(
size_t size, size_t alignment) {
// Only supports one resizable buffer, which starts at the buffer head.
uint8_t* expected_resizable_buf = AlignPointerUp(buffer_head_, alignment);
if (resizable_buffer_allocated_) {
MicroPrintf(
"Cannot allocate a new resizable buffer when one is already allocated");
return nullptr;
}
if (ResizeBuffer(expected_resizable_buf, size, alignment) == kTfLiteOk) {
resizable_buffer_allocated_ = true;
return expected_resizable_buf;
}
return nullptr;
}
// Resizes a buffer that is previously returned by the AllocateResizableBuffer.
// Note that ResizeBuffer(old_resizable_buf, 0, 1) effectively deallocates
// a previous allocated resizable buffer.
TfLiteStatus NonPersistentArenaBufferAllocator::ResizeBuffer(
uint8_t* resizable_buf, size_t size, size_t alignment) {
// Only supports one resizable buffer, which starts at the buffer head.
uint8_t* expect_resizable_buf = AlignPointerUp(buffer_head_, alignment);
if (resizable_buf != expect_resizable_buf) {
MicroPrintf("Internal error: buffer is not resizable");
return kTfLiteError;
}
if (head_temp_ != next_temp_) {
MicroPrintf("ResetTempAllocations() is not called before ResizeBuffer().");
return kTfLiteError;
}
const size_t available_memory = buffer_tail_ - expect_resizable_buf;
if (available_memory < size) {
MicroPrintf(
"Failed to resize buffer. Requested: %u, available %u, missing: %u",
size, available_memory, size - available_memory);
return kTfLiteError;
}
head_temp_ = expect_resizable_buf + size;
next_temp_ = head_temp_;
return kTfLiteOk;
}
// Frees up the memory occupied by the resizable buffer.
TfLiteStatus NonPersistentArenaBufferAllocator::DeallocateResizableBuffer(
uint8_t* resizable_buf) {
TfLiteStatus status = ResizeBuffer(resizable_buf, 0, 1);
if (status == kTfLiteOk) {
resizable_buffer_allocated_ = false;
}
return status;
}
// Returns a pointer pointing to the start of the overlay memory, which is
// used for activation tensors and scratch buffers by kernels at Invoke stage.
uint8_t* NonPersistentArenaBufferAllocator::GetOverlayMemoryAddress() const {
return buffer_head_;
}
// Reserves the size of the overlay memory. This overlay is reserved for the
// kernels at Invoke stage. This is referred to as the overlay because before
// the Invoke stage, the same memory can be used for temp buffers. The layout of
// the memory is planned by the memory planner separately at Invoke stage.
TfLiteStatus
NonPersistentArenaBufferAllocator::ReserveNonPersistentOverlayMemory(
size_t size, size_t alignment) {
uint8_t* expect_resizable_buf = AlignPointerUp(buffer_head_, alignment);
return ResizeBuffer(expect_resizable_buf, size, alignment);
}
// Returns the size of non-persistent buffer in use.
size_t NonPersistentArenaBufferAllocator::GetNonPersistentUsedBytes() const {
return (next_temp_ - buffer_head_);
}
// Returns the number of bytes available with a given alignment. This number
// takes into account any temporary allocations.
size_t NonPersistentArenaBufferAllocator::GetAvailableMemory(
size_t alignment) const {
uint8_t* const aligned_temp = AlignPointerUp(next_temp_, alignment);
uint8_t* const aligned_tail = AlignPointerDown(buffer_tail_, alignment);
return aligned_tail - aligned_temp;
}
} // namespace tflite
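A hedged usage sketch of the new non-persistent allocator, assuming a caller-provided static arena as TFLM normally uses; every call below is declared in the accompanying header:
#include "tensorflow/lite/micro/arena_allocator/non_persistent_arena_buffer_allocator.h"
constexpr size_t kArenaSize = 1024;
alignas(16) static uint8_t arena[kArenaSize];
void Example() {
  tflite::NonPersistentArenaBufferAllocator allocator(arena, kArenaSize);
  // One resizable buffer lives at the head of the arena...
  uint8_t* resizable =
      allocator.AllocateResizableBuffer(/*size=*/256, /*alignment=*/16);
  // ...while short-lived temp buffers are carved from what remains.
  uint8_t* temp = allocator.AllocateTemp(/*size=*/64, /*alignment=*/16);
  allocator.DeallocateTemp(temp);
  allocator.ResetTempAllocations();  // only legal once all temps are freed
  // Growing the resizable buffer is allowed as long as no temps are live.
  allocator.ResizeBuffer(resizable, /*size=*/512, /*alignment=*/16);
  (void)allocator.GetNonPersistentUsedBytes();
}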

View File

@@ -0,0 +1,105 @@
/* Copyright 2022 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_NON_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_NON_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_
#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/core/api/error_reporter.h"
#include "tensorflow/lite/micro/arena_allocator/ibuffer_allocator.h"
#include "tensorflow/lite/micro/compatibility.h"
namespace tflite {
// Implement INonPersistentBufferAllocator on an arena that is dedicated for
// non-persistent buffers.
class NonPersistentArenaBufferAllocator : public INonPersistentBufferAllocator {
public:
NonPersistentArenaBufferAllocator(uint8_t* buffer, size_t buffer_size);
virtual ~NonPersistentArenaBufferAllocator();
// Allocates a temporary buffer. This buffer is not resizable.
uint8_t* AllocateTemp(size_t size, size_t alignment) override;
// Signals that a temporary buffer is no longer needed.
void DeallocateTemp(uint8_t* buf) override;
// Returns true if all temporary buffers are already deallocated.
bool IsAllTempDeallocated() override;
// Signals that all temporary allocations can be reclaimed. TFLM calls this
// API when it knows that all temporary buffers that it requested have been
// deallocated.
TfLiteStatus ResetTempAllocations() override;
// Returns a buffer that is resizable via ResizeBuffer().
uint8_t* AllocateResizableBuffer(size_t size, size_t alignment) override;
// Resizes a buffer that is previously returned by the
// AllocateResizableBuffer.
TfLiteStatus ResizeBuffer(uint8_t* resizable_buf, size_t size,
size_t alignment) override;
// Frees up the memory occupied by the resizable buffer.
TfLiteStatus DeallocateResizableBuffer(uint8_t* resizable_buf) override;
// Returns a pointer pointing to the start of the overlay memory, which is
// used for activation tensors and scratch buffers by kernels at Invoke stage.
uint8_t* GetOverlayMemoryAddress() const override;
// Reserves the size of the overlay memory. This overlay is reserved for the
// kernels at Invoke stage. This is referred to as the overlay because before
// the Invoke stage, the same memory can be used for temp buffers. The layout of
// the memory is planned by the memory planner separately at Invoke stage.
TfLiteStatus ReserveNonPersistentOverlayMemory(size_t size,
size_t alignment) override;
// Returns the size of non-persistent buffer in use.
size_t GetNonPersistentUsedBytes() const override;
// Returns the number of bytes available with a given alignment. This number
// takes into account any temporary allocations.
size_t GetAvailableMemory(size_t alignment) const override;
TF_LITE_REMOVE_VIRTUAL_DELETE
private:
// The memory arena that this allocator manages.
uint8_t* const buffer_head_;
uint8_t* const buffer_tail_;
// The whole region is split into two parts:
// buffer_head_ to head_temp_ - 1 belongs to the only resizable buffer.
// head_temp_ to buffer_tail_ can be used for (non-resizable) temp buffers.
uint8_t* head_temp_;
// next_temp_ points to the next available temp buffer allocation address and
// its range is between head_temp_ and buffer_tail_
uint8_t* next_temp_;
// XOR Check sum for outstanding temp buffers.
// If all temp buffers are deallocated OR no temp buffers are allocated,
// temp_buffer_ptr_check_sum_ == nullptr.
intptr_t temp_buffer_ptr_check_sum_ = 0;
// Count of outstanding temp buffers.
int temp_buffer_count_ = 0;
bool resizable_buffer_allocated_ = false;
};
} // namespace tflite
#endif // TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_NON_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_

View File

@@ -0,0 +1,52 @@
/* Copyright 2022 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/micro/arena_allocator/persistent_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/memory_helpers.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
namespace tflite {
PersistentArenaBufferAllocator::PersistentArenaBufferAllocator(
uint8_t* buffer, size_t buffer_size)
: buffer_head_(buffer),
buffer_tail_(buffer + buffer_size),
tail_temp_(buffer_tail_) {}
PersistentArenaBufferAllocator::~PersistentArenaBufferAllocator() {}
uint8_t* PersistentArenaBufferAllocator::AllocatePersistentBuffer(
size_t size, size_t alignment) {
uint8_t* const aligned_result =
AlignPointerDown(tail_temp_ - size, alignment);
if (aligned_result < buffer_head_) {
#ifndef TF_LITE_STRIP_ERROR_STRINGS
const size_t missing_memory = buffer_head_ - aligned_result;
MicroPrintf(
"Failed to allocate tail memory. Requested: %u, "
"available %u, missing: %u",
size, size - missing_memory, missing_memory);
#endif
return nullptr;
}
tail_temp_ = aligned_result;
return aligned_result;
}
size_t PersistentArenaBufferAllocator::GetPersistentUsedBytes() const {
return buffer_tail_ - tail_temp_;
}
} // namespace tflite
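A matching sketch for the persistent-side allocator, which only grows from the tail of the arena and never frees:
#include "tensorflow/lite/micro/arena_allocator/persistent_arena_buffer_allocator.h"
constexpr size_t kPersistentArenaSize = 512;
alignas(16) static uint8_t persistent_arena[kPersistentArenaSize];
void PersistentExample() {
  tflite::PersistentArenaBufferAllocator allocator(persistent_arena,
                                                   kPersistentArenaSize);
  // Each allocation moves tail_temp_ down towards buffer_head_.
  uint8_t* a = allocator.AllocatePersistentBuffer(/*size=*/64, /*alignment=*/16);
  uint8_t* b = allocator.AllocatePersistentBuffer(/*size=*/32, /*alignment=*/16);
  (void)a;
  (void)b;
  // Reports how much of the arena the persistent buffers occupy so far.
  size_t used = allocator.GetPersistentUsedBytes();
  (void)used;
}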

View File

@@ -0,0 +1,59 @@
/* Copyright 2022 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_
#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/core/api/error_reporter.h"
#include "tensorflow/lite/micro/arena_allocator/ibuffer_allocator.h"
#include "tensorflow/lite/micro/compatibility.h"
namespace tflite {
// PersistentArenaBufferAllocator is an implementation of
// IPersistentBufferAllocator interface on an arena that is dedicated for
// persistent buffers.
class PersistentArenaBufferAllocator : public IPersistentBufferAllocator {
public:
PersistentArenaBufferAllocator(uint8_t* buffer, size_t buffer_size);
virtual ~PersistentArenaBufferAllocator();
// Allocates persistent memory. The persistent buffer is never freed.
// Returns nullptr if errors occurred.
uint8_t* AllocatePersistentBuffer(size_t size, size_t alignment) override;
// Returns the size of all persistent allocations in bytes.
size_t GetPersistentUsedBytes() const override;
TF_LITE_REMOVE_VIRTUAL_DELETE
private:
// The memory arena that this allocator manages.
uint8_t* const buffer_head_;
uint8_t* const buffer_tail_;
// The whole region is split into two parts:
// tail_temp_ to buffer_tail_ contains allocated buffers;
// buffer_head_ to tail_temp_ - 1 belongs to still available spaces.
// So in essence, the allocated region grows from the bottom and emulates
// SingleArenaBufferAllocator's persistent part.
uint8_t* tail_temp_;
};
} // namespace tflite
#endif // TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_PERSISTENT_ARENA_BUFFER_ALLOCATOR_H_

View File

@@ -13,7 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/micro/recording_simple_memory_allocator.h"
#include "tensorflow/lite/micro/arena_allocator/recording_single_arena_buffer_allocator.h"
#include <new>
@@ -21,47 +21,49 @@ limitations under the License.
namespace tflite {
RecordingSimpleMemoryAllocator::RecordingSimpleMemoryAllocator(
RecordingSingleArenaBufferAllocator::RecordingSingleArenaBufferAllocator(
ErrorReporter* error_reporter, uint8_t* buffer_head, size_t buffer_size)
: SimpleMemoryAllocator(error_reporter, buffer_head, buffer_size),
: SingleArenaBufferAllocator(error_reporter, buffer_head, buffer_size),
requested_head_bytes_(0),
requested_tail_bytes_(0),
used_bytes_(0),
alloc_count_(0) {}
RecordingSimpleMemoryAllocator::~RecordingSimpleMemoryAllocator() {}
RecordingSingleArenaBufferAllocator::~RecordingSingleArenaBufferAllocator() {}
RecordingSimpleMemoryAllocator* RecordingSimpleMemoryAllocator::Create(
ErrorReporter* error_reporter, uint8_t* buffer_head, size_t buffer_size) {
RecordingSingleArenaBufferAllocator*
RecordingSingleArenaBufferAllocator::Create(ErrorReporter* error_reporter,
uint8_t* buffer_head,
size_t buffer_size) {
TFLITE_DCHECK(error_reporter != nullptr);
TFLITE_DCHECK(buffer_head != nullptr);
RecordingSimpleMemoryAllocator tmp =
RecordingSimpleMemoryAllocator(error_reporter, buffer_head, buffer_size);
RecordingSingleArenaBufferAllocator tmp = RecordingSingleArenaBufferAllocator(
error_reporter, buffer_head, buffer_size);
uint8_t* allocator_buffer =
tmp.AllocatePersistentBuffer(sizeof(RecordingSimpleMemoryAllocator),
alignof(RecordingSimpleMemoryAllocator));
uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(
sizeof(RecordingSingleArenaBufferAllocator),
alignof(RecordingSingleArenaBufferAllocator));
// Use the default copy constructor to populate internal states.
return new (allocator_buffer) RecordingSimpleMemoryAllocator(tmp);
return new (allocator_buffer) RecordingSingleArenaBufferAllocator(tmp);
}
size_t RecordingSimpleMemoryAllocator::GetRequestedBytes() const {
size_t RecordingSingleArenaBufferAllocator::GetRequestedBytes() const {
return requested_head_bytes_ + requested_tail_bytes_;
}
size_t RecordingSimpleMemoryAllocator::GetUsedBytes() const {
size_t RecordingSingleArenaBufferAllocator::GetUsedBytes() const {
return used_bytes_;
}
size_t RecordingSimpleMemoryAllocator::GetAllocatedCount() const {
size_t RecordingSingleArenaBufferAllocator::GetAllocatedCount() const {
return alloc_count_;
}
TfLiteStatus RecordingSimpleMemoryAllocator::ResizeBuffer(
TfLiteStatus RecordingSingleArenaBufferAllocator::ResizeBuffer(
uint8_t* resizable_buf, size_t size, size_t alignment) {
const uint8_t* previous_head = head();
TfLiteStatus status =
SimpleMemoryAllocator::ResizeBuffer(resizable_buf, size, alignment);
SingleArenaBufferAllocator::ResizeBuffer(resizable_buf, size, alignment);
if (status == kTfLiteOk) {
used_bytes_ += head() - previous_head;
requested_head_bytes_ = size;
@@ -69,11 +71,11 @@ TfLiteStatus RecordingSimpleMemoryAllocator::ResizeBuffer(
return status;
}
uint8_t* RecordingSimpleMemoryAllocator::AllocatePersistentBuffer(
uint8_t* RecordingSingleArenaBufferAllocator::AllocatePersistentBuffer(
size_t size, size_t alignment) {
const uint8_t* previous_tail = tail();
uint8_t* result =
SimpleMemoryAllocator::AllocatePersistentBuffer(size, alignment);
SingleArenaBufferAllocator::AllocatePersistentBuffer(size, alignment);
if (result != nullptr) {
used_bytes_ += previous_tail - tail();
requested_tail_bytes_ += size;
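The rename above leaves the byte accounting untouched: ResizeBuffer and AllocatePersistentBuffer still add the head/tail deltas to used_bytes_ and track the requested sizes separately. A minimal usage sketch, assuming the renamed header path and the Create/AllocatePersistentBuffer/Get* methods shown in this hunk; the arena size, alignment and MicroErrorReporter are illustrative choices, not part of this diff:

#include <cstddef>
#include <cstdint>

#include "tensorflow/lite/micro/arena_allocator/recording_single_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"

namespace {
constexpr size_t kArenaSize = 1024;
alignas(16) uint8_t g_arena[kArenaSize];
}  // namespace

void SketchRecordingAllocator() {
  tflite::MicroErrorReporter error_reporter;
  // Create() carves the allocator object itself out of the tail of the arena.
  tflite::RecordingSingleArenaBufferAllocator* allocator =
      tflite::RecordingSingleArenaBufferAllocator::Create(&error_reporter, g_arena,
                                                          kArenaSize);
  // One persistent allocation: requested bytes grow by 64, used bytes by the
  // aligned tail delta, and the recorded allocation count by one.
  allocator->AllocatePersistentBuffer(/*size=*/64, /*alignment=*/16);
  size_t requested = allocator->GetRequestedBytes();
  size_t used = allocator->GetUsedBytes();
  size_t count = allocator->GetAllocatedCount();
  (void)requested; (void)used; (void)count;
}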

View File

@@ -13,28 +13,27 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_MICRO_RECORDING_SIMPLE_MEMORY_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_RECORDING_SIMPLE_MEMORY_ALLOCATOR_H_
#ifndef TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_RECORDING_SINGLE_ARENA_BUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_RECORDING_SINGLE_ARENA_BUFFER_ALLOCATOR_H_
#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/compatibility.h"
#include "tensorflow/lite/micro/simple_memory_allocator.h"
namespace tflite {
// Utility class used to log allocations of a SimpleMemoryAllocator. Should only
// be used in debug/evaluation settings or unit tests to evaluate allocation
// usage.
class RecordingSimpleMemoryAllocator : public SimpleMemoryAllocator {
// Utility class used to log allocations of a SingleArenaBufferAllocator. Should
// only be used in debug/evaluation settings or unit tests to evaluate
// allocation usage.
class RecordingSingleArenaBufferAllocator : public SingleArenaBufferAllocator {
public:
RecordingSimpleMemoryAllocator(ErrorReporter* error_reporter,
uint8_t* buffer_head, size_t buffer_size);
RecordingSingleArenaBufferAllocator(ErrorReporter* error_reporter,
uint8_t* buffer_head, size_t buffer_size);
// TODO(b/157615197): Cleanup constructors/destructor and use factory
// functions.
~RecordingSimpleMemoryAllocator() override;
~RecordingSingleArenaBufferAllocator() override;
static RecordingSimpleMemoryAllocator* Create(ErrorReporter* error_reporter,
uint8_t* buffer_head,
size_t buffer_size);
static RecordingSingleArenaBufferAllocator* Create(
ErrorReporter* error_reporter, uint8_t* buffer_head, size_t buffer_size);
// Returns the number of bytes requested from the head or tail.
size_t GetRequestedBytes() const;
@@ -62,4 +61,4 @@ class RecordingSimpleMemoryAllocator : public SimpleMemoryAllocator {
} // namespace tflite
#endif // TENSORFLOW_LITE_MICRO_RECORDING_SIMPLE_MEMORY_ALLOCATOR_H_
#endif // TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_RECORDING_SINGLE_ARENA_BUFFER_ALLOCATOR_H_

View File

@@ -13,7 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/lite/micro/simple_memory_allocator.h"
#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"
#include <cstddef>
#include <cstdint>
@@ -29,42 +29,45 @@ limitations under the License.
namespace tflite {
SimpleMemoryAllocator::SimpleMemoryAllocator(ErrorReporter* error_reporter,
uint8_t* buffer_head,
uint8_t* buffer_tail)
: error_reporter_(error_reporter),
SingleArenaBufferAllocator::SingleArenaBufferAllocator(
ErrorReporter* error_reporter, uint8_t* buffer_head, uint8_t* buffer_tail)
:
#if !defined(TF_LITE_STRIP_ERROR_STRINGS)
error_reporter_(error_reporter),
#endif
buffer_head_(buffer_head),
buffer_tail_(buffer_tail),
head_(buffer_head),
tail_(buffer_tail),
temp_(buffer_head_) {}
temp_(buffer_head_) {
}
SimpleMemoryAllocator::SimpleMemoryAllocator(ErrorReporter* error_reporter,
uint8_t* buffer,
size_t buffer_size)
: SimpleMemoryAllocator(error_reporter, buffer, buffer + buffer_size) {}
SingleArenaBufferAllocator::SingleArenaBufferAllocator(
ErrorReporter* error_reporter, uint8_t* buffer, size_t buffer_size)
: SingleArenaBufferAllocator(error_reporter, buffer, buffer + buffer_size) {
}
/* static */
SimpleMemoryAllocator* SimpleMemoryAllocator::Create(
SingleArenaBufferAllocator* SingleArenaBufferAllocator::Create(
ErrorReporter* error_reporter, uint8_t* buffer_head, size_t buffer_size) {
TFLITE_DCHECK(error_reporter != nullptr);
TFLITE_DCHECK(buffer_head != nullptr);
SimpleMemoryAllocator tmp =
SimpleMemoryAllocator(error_reporter, buffer_head, buffer_size);
SingleArenaBufferAllocator tmp =
SingleArenaBufferAllocator(error_reporter, buffer_head, buffer_size);
// Allocate enough bytes from the buffer to create a SimpleMemoryAllocator.
// The new instance will use the current adjusted tail buffer from the tmp
// allocator instance.
// Allocate enough bytes from the buffer to create a
// SingleArenaBufferAllocator. The new instance will use the current adjusted
// tail buffer from the tmp allocator instance.
uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(
sizeof(SimpleMemoryAllocator), alignof(SimpleMemoryAllocator));
sizeof(SingleArenaBufferAllocator), alignof(SingleArenaBufferAllocator));
// Use the default copy constructor to populate internal states.
return new (allocator_buffer) SimpleMemoryAllocator(tmp);
return new (allocator_buffer) SingleArenaBufferAllocator(tmp);
}
SimpleMemoryAllocator::~SimpleMemoryAllocator() {}
SingleArenaBufferAllocator::~SingleArenaBufferAllocator() {}
uint8_t* SimpleMemoryAllocator::AllocateResizableBuffer(size_t size,
size_t alignment) {
uint8_t* SingleArenaBufferAllocator::AllocateResizableBuffer(size_t size,
size_t alignment) {
// Only supports one resizable buffer, which starts at the buffer head.
uint8_t* expect_resizable_buf = AlignPointerUp(buffer_head_, alignment);
if (ResizeBuffer(expect_resizable_buf, size, alignment) == kTfLiteOk) {
@@ -73,20 +76,20 @@ uint8_t* SimpleMemoryAllocator::AllocateResizableBuffer(size_t size,
return nullptr;
}
TfLiteStatus SimpleMemoryAllocator::DeallocateResizableBuffer(
TfLiteStatus SingleArenaBufferAllocator::DeallocateResizableBuffer(
uint8_t* resizable_buf) {
return ResizeBuffer(resizable_buf, 0, 1);
}
TfLiteStatus SimpleMemoryAllocator::ReserveNonPersistentOverlayMemory(
TfLiteStatus SingleArenaBufferAllocator::ReserveNonPersistentOverlayMemory(
size_t size, size_t alignment) {
uint8_t* expect_resizable_buf = AlignPointerUp(buffer_head_, alignment);
return ResizeBuffer(expect_resizable_buf, size, alignment);
}
TfLiteStatus SimpleMemoryAllocator::ResizeBuffer(uint8_t* resizable_buf,
size_t size,
size_t alignment) {
TfLiteStatus SingleArenaBufferAllocator::ResizeBuffer(uint8_t* resizable_buf,
size_t size,
size_t alignment) {
// Only supports one resizable buffer, which starts at the buffer head.
uint8_t* expect_resizable_buf = AlignPointerUp(buffer_head_, alignment);
if (head_ != temp_ || resizable_buf != expect_resizable_buf) {
@@ -112,8 +115,8 @@ TfLiteStatus SimpleMemoryAllocator::ResizeBuffer(uint8_t* resizable_buf,
return kTfLiteOk;
}
uint8_t* SimpleMemoryAllocator::AllocatePersistentBuffer(size_t size,
size_t alignment) {
uint8_t* SingleArenaBufferAllocator::AllocatePersistentBuffer(
size_t size, size_t alignment) {
uint8_t* const aligned_result = AlignPointerDown(tail_ - size, alignment);
if (aligned_result < head_) {
#ifndef TF_LITE_STRIP_ERROR_STRINGS
@@ -129,7 +132,8 @@ uint8_t* SimpleMemoryAllocator::AllocatePersistentBuffer(size_t size,
return aligned_result;
}
uint8_t* SimpleMemoryAllocator::AllocateTemp(size_t size, size_t alignment) {
uint8_t* SingleArenaBufferAllocator::AllocateTemp(size_t size,
size_t alignment) {
uint8_t* const aligned_result = AlignPointerUp(temp_, alignment);
const size_t available_memory = tail_ - aligned_result;
if (available_memory < size) {
@@ -145,12 +149,12 @@ uint8_t* SimpleMemoryAllocator::AllocateTemp(size_t size, size_t alignment) {
return aligned_result;
}
void SimpleMemoryAllocator::DeallocateTemp(uint8_t* temp_buf) {
void SingleArenaBufferAllocator::DeallocateTemp(uint8_t* temp_buf) {
temp_buffer_ptr_check_sum_ ^= (reinterpret_cast<intptr_t>(temp_buf));
temp_buffer_count_--;
}
bool SimpleMemoryAllocator::IsAllTempDeallocated() {
bool SingleArenaBufferAllocator::IsAllTempDeallocated() {
if (temp_buffer_count_ != 0 || temp_buffer_ptr_check_sum_ != 0) {
MicroPrintf(
"Number of allocated temp buffers: %d. Checksum passing status: %d",
@@ -160,7 +164,7 @@ bool SimpleMemoryAllocator::IsAllTempDeallocated() {
return true;
}
TfLiteStatus SimpleMemoryAllocator::ResetTempAllocations() {
TfLiteStatus SingleArenaBufferAllocator::ResetTempAllocations() {
// TODO(b/209453859): enable error check based on IsAllTempDeallocated after
// all AllocateTemp calls have been paired with DeallocateTemp
if (!IsAllTempDeallocated()) {
@@ -172,34 +176,34 @@ TfLiteStatus SimpleMemoryAllocator::ResetTempAllocations() {
return kTfLiteOk;
}
uint8_t* SimpleMemoryAllocator::GetOverlayMemoryAddress() const {
uint8_t* SingleArenaBufferAllocator::GetOverlayMemoryAddress() const {
return buffer_head_;
}
size_t SimpleMemoryAllocator::GetNonPersistentUsedBytes() const {
size_t SingleArenaBufferAllocator::GetNonPersistentUsedBytes() const {
return std::max(head_ - buffer_head_, temp_ - buffer_head_);
}
size_t SimpleMemoryAllocator::GetPersistentUsedBytes() const {
size_t SingleArenaBufferAllocator::GetPersistentUsedBytes() const {
return buffer_tail_ - tail_;
}
size_t SimpleMemoryAllocator::GetAvailableMemory(size_t alignment) const {
size_t SingleArenaBufferAllocator::GetAvailableMemory(size_t alignment) const {
uint8_t* const aligned_temp = AlignPointerUp(temp_, alignment);
uint8_t* const aligned_tail = AlignPointerDown(tail_, alignment);
return aligned_tail - aligned_temp;
}
size_t SimpleMemoryAllocator::GetUsedBytes() const {
size_t SingleArenaBufferAllocator::GetUsedBytes() const {
return GetPersistentUsedBytes() + GetNonPersistentUsedBytes();
}
size_t SimpleMemoryAllocator::GetBufferSize() const {
size_t SingleArenaBufferAllocator::GetBufferSize() const {
return buffer_tail_ - buffer_head_;
}
uint8_t* SimpleMemoryAllocator::head() const { return head_; }
uint8_t* SingleArenaBufferAllocator::head() const { return head_; }
uint8_t* SimpleMemoryAllocator::tail() const { return tail_; }
uint8_t* SingleArenaBufferAllocator::tail() const { return tail_; }
} // namespace tflite
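AllocateTemp and DeallocateTemp above track outstanding temp buffers with a counter plus an XOR checksum of the returned pointers, and ResetTempAllocations only rewinds the temp mark once everything has been released. A short sketch of that contract, assuming the renamed header and the methods shown in this file; the arena and the sizes are illustrative:

#include <cstddef>
#include <cstdint>

#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"

void SketchTempLifecycle() {
  static uint8_t arena[512];
  tflite::MicroErrorReporter error_reporter;
  tflite::SingleArenaBufferAllocator* allocator =
      tflite::SingleArenaBufferAllocator::Create(&error_reporter, arena, sizeof(arena));

  // Temp buffers grow upward from the head of the arena.
  uint8_t* scratch = allocator->AllocateTemp(/*size=*/32, /*alignment=*/4);
  // ... use scratch ...
  allocator->DeallocateTemp(scratch);

  // Succeeds only once every AllocateTemp has been paired with DeallocateTemp;
  // afterwards the non-persistent space can be handed out again.
  if (allocator->ResetTempAllocations() == kTfLiteOk) {
    // The temp mark is back at the head.
  }
}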

View File

@@ -13,37 +13,37 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_LITE_MICRO_SIMPLE_MEMORY_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_SIMPLE_MEMORY_ALLOCATOR_H_
#ifndef TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_SINGLE_ARENA_BUFFER_ALLOCATOR_H_
#define TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_SINGLE_ARENA_BUFFER_ALLOCATOR_H_
#include <cstddef>
#include <cstdint>
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/core/api/error_reporter.h"
#include "tensorflow/lite/micro/arena_allocator/ibuffer_allocator.h"
#include "tensorflow/lite/micro/compatibility.h"
#include "tensorflow/lite/micro/ibuffer_allocator.h"
namespace tflite {
// TODO(petewarden): This allocator never frees up or reuses any memory, even
// though we have enough information about lifetimes of the tensors to do so.
// This makes it pretty wasteful, so we should use a more intelligent method.
class SimpleMemoryAllocator : public INonPersistentBufferAllocator,
public IPersistentBufferAllocator {
class SingleArenaBufferAllocator : public INonPersistentBufferAllocator,
public IPersistentBufferAllocator {
public:
// TODO(b/157615197): Cleanup constructors/destructor and use factory
// functions.
SimpleMemoryAllocator(ErrorReporter* error_reporter, uint8_t* buffer_head,
uint8_t* buffer_tail);
SimpleMemoryAllocator(ErrorReporter* error_reporter, uint8_t* buffer,
size_t buffer_size);
virtual ~SimpleMemoryAllocator();
SingleArenaBufferAllocator(ErrorReporter* error_reporter,
uint8_t* buffer_head, uint8_t* buffer_tail);
SingleArenaBufferAllocator(ErrorReporter* error_reporter, uint8_t* buffer,
size_t buffer_size);
virtual ~SingleArenaBufferAllocator();
// Creates a new SimpleMemoryAllocator from a given buffer head and size.
static SimpleMemoryAllocator* Create(ErrorReporter* error_reporter,
uint8_t* buffer_head,
size_t buffer_size);
// Creates a new SingleArenaBufferAllocator from a given buffer head and size.
static SingleArenaBufferAllocator* Create(ErrorReporter* error_reporter,
uint8_t* buffer_head,
size_t buffer_size);
// Resizes a buffer that was previously returned by
// AllocateResizableBuffer. In the current implementation, it adjusts the head
@@ -126,7 +126,9 @@ class SimpleMemoryAllocator : public INonPersistentBufferAllocator,
private:
size_t GetBufferSize() const;
#if !defined(TF_LITE_STRIP_ERROR_STRINGS)
ErrorReporter* error_reporter_;
#endif
uint8_t* buffer_head_;
uint8_t* buffer_tail_;
uint8_t* head_;
@@ -147,4 +149,4 @@ class SimpleMemoryAllocator : public INonPersistentBufferAllocator,
} // namespace tflite
#endif // TENSORFLOW_LITE_MICRO_SIMPLE_MEMORY_ALLOCATOR_H_
#endif // TENSORFLOW_LITE_MICRO_ARENA_ALLOCATOR_SINGLE_ARENA_BUFFER_ALLOCATOR_H_
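For code that consumes this allocator, the change is mechanical: the include path moves under arena_allocator/ and SimpleMemoryAllocator becomes SingleArenaBufferAllocator, while the constructor and Create() signatures declared in this header stay the same. A hedged before/after sketch (the wrapper function is made up for illustration):

// Before (removed header and class name):
//   #include "tensorflow/lite/micro/simple_memory_allocator.h"
//   tflite::SimpleMemoryAllocator* allocator =
//       tflite::SimpleMemoryAllocator::Create(reporter, arena, arena_size);
//
// After (this diff):
#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"

tflite::SingleArenaBufferAllocator* MakeArenaAllocator(tflite::ErrorReporter* reporter,
                                                       uint8_t* arena, size_t arena_size) {
  return tflite::SingleArenaBufferAllocator::Create(reporter, arena, arena_size);
}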

View File

@@ -16,10 +16,10 @@ limitations under the License.
#include "tensorflow/lite/micro/fake_micro_context.h"
#include "tensorflow/lite/kernels/internal/compatibility.h"
#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/micro_allocator.h"
#include "tensorflow/lite/micro/micro_arena_constants.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/simple_memory_allocator.h"
namespace tflite {
namespace {
@@ -30,7 +30,7 @@ static uint8_t dummy_tensor_arena[KDummyTensorArenaSize];
} // namespace
FakeMicroContext::FakeMicroContext(TfLiteTensor* tensors,
SimpleMemoryAllocator* allocator,
SingleArenaBufferAllocator* allocator,
MicroGraph* micro_graph)
: MicroContext(
MicroAllocator::Create(dummy_tensor_arena, KDummyTensorArenaSize,
@@ -67,10 +67,10 @@ TfLiteEvalTensor* FakeMicroContext::GetEvalTensor(int tensor_index) {
}
void* FakeMicroContext::AllocatePersistentBuffer(size_t bytes) {
// FakeMicroContext use SimpleMemoryAllocator, which does not automatically
// apply the buffer alignment like MicroAllocator.
// The buffer alignment is potentially wasteful but allows the
// fake_micro_context to work correctly with optimized kernels.
// FakeMicroContext uses SingleArenaBufferAllocator, which does not
// automatically apply the buffer alignment like MicroAllocator. The buffer
// alignment is potentially wasteful but allows the fake_micro_context to work
// correctly with optimized kernels.
return allocator_->AllocatePersistentBuffer(bytes,
MicroArenaBufferAlignment());
}
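The rewritten comment above is the behavioural point of this hunk: FakeMicroContext now routes every persistent allocation through MicroArenaBufferAlignment(), so the returned pointers carry the same alignment MicroAllocator would apply. A small sketch of that guarantee; the helper name and the assert are illustrative, the called functions come from the diff:

#include <cassert>
#include <cstddef>
#include <cstdint>

#include "tensorflow/lite/micro/arena_allocator/single_arena_buffer_allocator.h"
#include "tensorflow/lite/micro/micro_arena_constants.h"

void* SketchAlignedPersistentAlloc(tflite::SingleArenaBufferAllocator* allocator,
                                   size_t bytes) {
  // Same call the fake context makes: the forced alignment can waste a few
  // bytes but keeps optimized kernels working.
  uint8_t* buffer = allocator->AllocatePersistentBuffer(
      bytes, tflite::MicroArenaBufferAlignment());
  assert(reinterpret_cast<uintptr_t>(buffer) % tflite::MicroArenaBufferAlignment() == 0);
  return buffer;
}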

View File

@@ -23,7 +23,7 @@ namespace tflite {
// A fake of MicroContext for kernel util tests.
class FakeMicroContext : public MicroContext {
public:
FakeMicroContext(TfLiteTensor* tensors, SimpleMemoryAllocator* allocator,
FakeMicroContext(TfLiteTensor* tensors, SingleArenaBufferAllocator* allocator,
MicroGraph* micro_graph);
void* AllocatePersistentBuffer(size_t bytes) override;
@@ -46,7 +46,7 @@ class FakeMicroContext : public MicroContext {
TfLiteTensor* tensors_;
int allocated_tensor_count_ = 0;
SimpleMemoryAllocator* allocator_;
SingleArenaBufferAllocator* allocator_;
TF_LITE_REMOVE_VIRTUAL_DELETE
};

View File

@@ -24,6 +24,7 @@ limitations under the License.
#include "tensorflow/lite/kernels/kernel_util.h"
#include "tensorflow/lite/kernels/op_macros.h"
#include "tensorflow/lite/micro/kernels/kernel_util.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_utils.h"
namespace tflite {
@@ -60,8 +61,8 @@ TfLiteStatus ReluEval(TfLiteContext* context, TfLiteNode* node) {
return kTfLiteOk;
}
default: {
TF_LITE_KERNEL_LOG(context, "Only float32 is supported currently, got %s",
TfLiteTypeGetName(input->type));
MicroPrintf("Only float32 is supported currently, got %s",
TfLiteTypeGetName(input->type));
return kTfLiteError;
}
}
@@ -99,8 +100,8 @@ TfLiteStatus Relu6Eval(TfLiteContext* context, TfLiteNode* node) {
return kTfLiteOk;
}
default: {
TF_LITE_KERNEL_LOG(context, "Only float32 is supported currently, got %s",
TfLiteTypeGetName(input->type));
MicroPrintf("Only float32 is supported currently, got %s",
TfLiteTypeGetName(input->type));
return kTfLiteError;
}
}
@@ -109,25 +110,11 @@ TfLiteStatus Relu6Eval(TfLiteContext* context, TfLiteNode* node) {
} // namespace
TfLiteRegistration Register_RELU() {
return {/*init=*/ReluInit,
/*free=*/nullptr,
/*prepare=*/ReluPrepare,
/*invoke=*/ReluEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(ReluInit, ReluPrepare, ReluEval);
}
TfLiteRegistration Register_RELU6() {
return {/*init=*/Relu6Init,
/*free=*/nullptr,
/*prepare=*/Relu6Prepare,
/*invoke=*/Relu6Eval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(Relu6Init, Relu6Prepare, Relu6Eval);
}
} // namespace tflite
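The removed brace initializations spell out exactly which TfLiteRegistration fields the shared helper has to populate. Purely to illustrate that equivalence (this is not the library's actual implementation of tflite::micro::RegisterOp, and the helper name below is made up), a registration built the old way from three callbacks would look like:

#include <cstddef>

#include "tensorflow/lite/c/common.h"

TfLiteRegistration SketchRegisterOp(
    void* (*init)(TfLiteContext* context, const char* buffer, size_t length),
    TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node),
    TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node)) {
  // Mirrors the removed brace initializer: unset callbacks are nulled and the
  // numeric fields are zeroed.
  return {/*init=*/init,
          /*free=*/nullptr,
          /*prepare=*/prepare,
          /*invoke=*/invoke,
          /*profiling_string=*/nullptr,
          /*builtin_code=*/0,
          /*custom_name=*/nullptr,
          /*version=*/0};
}

With a helper of that shape, each Register_* function collapses to the single RegisterOp(...) calls seen in this hunk and in the kernel files below.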

View File

@@ -159,14 +159,7 @@ TfLiteStatus AddEval(TfLiteContext* context, TfLiteNode* node) {
}
TfLiteRegistration Register_ADD() {
return {/*init=*/AddInit,
/*free=*/nullptr,
/*prepare=*/AddPrepare,
/*invoke=*/AddEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(AddInit, AddPrepare, AddEval);
}
} // namespace tflite

View File

@@ -208,14 +208,7 @@ TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
} // namespace
TfLiteRegistration Register_ADD_N() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/Prepare,
/*invoke=*/Eval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, Prepare, Eval);
}
} // namespace tflite

View File

@@ -104,25 +104,11 @@ TfLiteStatus ArgMaxEval(TfLiteContext* context, TfLiteNode* node) {
} // namespace arg_min_max
TfLiteRegistration Register_ARG_MAX() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/nullptr,
/*invoke=*/arg_min_max::ArgMaxEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, nullptr, arg_min_max::ArgMaxEval);
}
TfLiteRegistration Register_ARG_MIN() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/nullptr,
/*invoke=*/arg_min_max::ArgMinEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, nullptr, arg_min_max::ArgMinEval);
}
} // namespace micro

View File

@@ -95,14 +95,7 @@ TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
} // namespace.
TfLiteRegistration Register_ASSIGN_VARIABLE() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/Prepare,
/*invoke=*/Eval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, Prepare, Eval);
}
} // namespace tflite

View File

@@ -105,14 +105,7 @@ TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
} // namespace.
TfLiteRegistration Register_BATCH_TO_SPACE_ND() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/Prepare,
/*invoke=*/Eval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, Prepare, Eval);
}
} // namespace tflite

View File

@@ -84,14 +84,8 @@ TfLiteStatus BroadcastArgsEval(TfLiteContext* context, TfLiteNode* node) {
} // namespace
TfLiteRegistration Register_BROADCAST_ARGS() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/BroadcastArgsPrepare,
/*invoke=*/BroadcastArgsEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, BroadcastArgsPrepare,
BroadcastArgsEval);
}
} // namespace tflite
} // namespace tflite

View File

@@ -116,14 +116,8 @@ TfLiteStatus BroadcastToEval(TfLiteContext* context, TfLiteNode* node) {
} // namespace
TfLiteRegistration Register_BROADCAST_TO() {
return {/*init=*/nullptr,
/*free=*/nullptr,
/*prepare=*/BroadcastToPrepare,
/*invoke=*/BroadcastToEval,
/*profiling_string=*/nullptr,
/*builtin_code=*/0,
/*custom_name=*/nullptr,
/*version=*/0};
return tflite::micro::RegisterOp(nullptr, BroadcastToPrepare,
BroadcastToEval);
}
} // namespace tflite
} // namespace tflite

Some files were not shown because too many files have changed in this diff.