v11.2.0

Merge branch 'rolling'
v11.2.0
2025-12-09 13:06:54 +03:00 · 2022-08-28 20:04:36 +02:00 · 2022-08-28 19:54:15 +02:00 · 2022-08-28 19:52:51 +02:00 · 2022-08-27 20:41:14 +02:00 · 2022-08-27 20:21:53 +02:00
368 changed files with 29092 additions and 3533 deletions
--- a/.DS_Store
+++ b/.DS_Store
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@
 .code-workspace
 /sd-card/htm./.vscode/
 /code/build
+/sd-card/html/debug/

 CMakeLists.txt.user
 CMakeCache.txt
@@ -15,3 +16,5 @@ install_manifest.txt
 compile_commands.json
 CTestTestfile.cmake
 _deps
+code/edgeAI.code-workspace
+.DS_Store
--- a/Changelog.md
+++ b/Changelog.md
@@ -1,5 +1,151 @@
 # Versions

+##### 10.6.2 - Stability Increase (2022-07-24)
+
+- **NEW 10.6.2**: ignore hidden files in model selection (configuration page) 
+
+- **NEW 10.6.1**: Revoke esp32cam & tflite update
+
+- **NEW 10.6.1**: Bug Fix: tflite-filename with ".", HTML spelling error
+
+- IndluxDB: direct injection into InfluxDB - thanks to **[wetneb](https://github.com/wetneb)**
+
+- MQTT: implemented "Retain Flag" and extend with absolute Change (in addition to rate)
+
+- `config.ini`: removal of modelsize (readout from tflite)
+
+- Updated analog neural network file (`ana1000s2.tflite`) & digital neural network file (`dig1400s2q.tflite`)
+
+- TFMicro/Lite: Update (espressif Version 20220716)
+
+- Updated esp32cam (v20220716)
+
+- ESP-IDF: Update to 4.4
+
+- Internal update (CNN algorithm optimizations, reparation for new neural network type)
+
+- Bug Fix: no time with fixed IP, Postprocessing, MQTT
+
+  
+
+##### 10.5.2 - Stability Increase (2022-02-22)
+
+- NEW 10.5.2: Bug Fix: wrong `firmware.bin` (no rate update)
+- NEW 10.5.1: Bug Fix: wrong return value, rate value & PreValue status, HTML: SSID & IP were not displayed 
+- MQTT: changed wifi naming to "wifiRSSI"
+- HTML: check selectable values for consistency
+- Refactoring of check postprocessing consistency (e.g. max rate, negative rate, ...)
+- Bug Fix: corrected error in "Check Consistency Increase"
+
+
+
+##### 10.4.0 - Stability Increase (2022-02-12)
+
+- Graphical configuration: select available neural network files (*.tfl, *.tflite) from drop down menu
+- OTA-update: add option to upload tfl / tflite files to the correct location (`/config/`)
+  - In the future the new files will also be copied to the `firmware` directory of the repository
+- Added Wifi RSSI to MQTT information
+- Updated analog neural network file (`ana-s3-q-20220105.tflite`)
+- Updated digital neural network file (`dig-s1-q-20220102.tflite`)
+- Updated build environment to `Espressif 3.5.0`
+
+
+
+##### 10.3.0 - Stability Increase (2022-01-29)
+
+- Implemented LED flash dimming (`LEDIntensity`). 
+  Remark: as auto illumination in the camera is used, this is rather for energy saving. It will not help reducing reflections
+- Additional camera parameters: saturation, contrast (although not too much impact yet)
+- Some readings will have removable "N"s that can not be removed automatically and are handled with an "error" --> no return value in the field "value" anymore (still reported back via field "raw value")
+- Updated esp32 camera hardware driver
+- Bug fix: MQTT, HTML improvements
+
+**ATTENTION:  The new ESP32 camera hardware driver is much more stable on newer OV2640 versions (no or much less reboots) but seems to be not fully compatible with older versions.**
+
+* If you have problem with stalled systems you can try the following
+  - Update the parameter `ImageQuality` to `12` instead of current value `5` (manually in the `config.ini`)
+
+  - If this is not helping, you might need to update your hardware or stay with version 9.2
+
+##### 10.2.0 - Stability Increase (2022-01-14)
+
+- Due to the updated camera driver, the image looks different and a new setup might be needed
+
+  - Update reference image
+  - Update Alignment marks
+
+- Reduce reboot due to camera problems
+
+- Update esp32-camera to new version (master as of 2022-01-09)
+
+  
+
+##### 10.1.1 - Stability Increase (2022-01-12)
+
+- Bug Fix MQTT problem
+- Issue:
+  - Changing from v9.x to 10.x the MQTT-parameter "Topic" was renamed into "MainTopic" to address multiple number meters. This renaming should have been done automatically in the background within the graphical configuration, but was not working. Instead the parameter "Topic" was deleted and "MainTopic" was set to disabled and "undefined".
+- ToDo
+  - Update the `html.zip`
+  - If old `config.ini` available: copy it to `/config`, open the graphical configuration and save it again.
+  - If old `config.ini` not available: reset the parameter "MainTopic" within the `config.ini` manually
+  - Reboot
+
+##### 10.1.0 - Stability Increase (2022-01-09)
+
+- Reduce ESP32 frequency to 160MHz
+
+- Update tflite (new source: https://github.com/espressif/tflite-micro-esp-examples)
+
+- Update analog neural network (ana-s3-q-20220105.tflite)
+
+- Update digital neural network (dig-s1-q-20220102.tflite)
+
+- Increased web-server buffers
+- bug fix: compiler compatibility
+
+##### 10.0.2 - Stability Increase (2022-01-01)
+
+- NEW v10.0.2: Corrected JSON error
+
+- Updated compiler toolchain to ESP-IDF 4.3
+
+- Removal of memory leak
+
+- Improved error handling during startup (check PSRAM and camera with remark in logfile)
+
+- MQTT: implemented raw value additionally, removal of regex contrain
+
+- Normalized Parameter ``MaxRateValue``  to "change per minute" 
+
+- HTML: improved input handling
+
+- Corrected error handling: in case of error the old value, rate, timestamp are not transmitted any more
+
+  
+
+##### 9.2.0 - External Illumination (2021-12-02)
+
+- Direct JSON access: ``http://IP-ADRESS/json`` 
+- Error message in log file in case camera error during startup
+- Upgrade analog CNN to v9.1.0
+- Upgrade digital CNN to v13.3.0 (added new images)
+- html: support of different ports
+
+##### 9.1.1 - External Illumination (2021-11-16)
+
+- NEW 9.1.1 bug fix: LED implemenetation
+- External LEDs: change control mode (resolve bug with more than 2 LEDs)
+- Additional info into log file
+- Bug fix: decimal shift, html, log file
+
+##### 9.0.0 - External Illumination (2021-10-23)
+
+* Implementation of external illumination to adjust positioning, brightness and color of the illumination now set individually
+  * Technical details can be found in the wiki: https://github.com/jomjol/AI-on-the-edge-device/wiki/External-LED
+    <img src="https://raw.githubusercontent.com/jomjol/ai-on-the-edge-device/master/images/intern_vs_external.jpg" width="500">
+* New housing published for external LEDs and small clearing: https://www.thingiverse.com/thing:5028229
+


 ##### 8.5.0 - Multi Meter Support (2021-10-07)
--- a/FeatureRequest.md
+++ b/FeatureRequest.md
@@ -11,6 +11,31 @@

 ____

+#### #29 Add favicon and use the hostname for the website
+
+* https://github.com/jomjol/AI-on-the-edge-device/issues/927
+
+#### #28 Improved error handling for ROIs
+
+* In case a ROI is out of the image, there is no error message, but a non sense image is used
+* Implement a error message for wrong configuratioin of ROI
+
+#### #27 Use Homie Spec for Mqtt binding
+
+* Use the standardized Home Protocol for the Mqtt binding 
+* https://homieiot.github.io/
+
+#### #26 Changes behaviour for "N" replacement
+
+* in case the higher digits has already increased by minium 1 - don't set the "N" to the last value, but to "0"
+* https://github.com/jomjol/AI-on-the-edge-device/issues/792
+
+
+#### #25 Trigger Measurement via MQTT
+
+* https://github.com/jomjol/AI-on-the-edge-device/issues/727
+
+
 #### #24 Show Mqtt state directly in Webserver

 * Show MQTT log in Web page. E.g. connection established or failed to connect...
@@ -39,7 +64,8 @@ ____
 #### #20 Deep sleep and push mode

 * Let the device be normally in deep sleep state, and wake it up periodically to collect data and push it via MQTT or HTTP post.
-
+* Support ESP-NOW to reduce the overhead of connecting to wifi and mqtt 
+* the above should enable battery powered applications
  

 #### #19 Extended log informations
@@ -48,18 +74,15 @@ ____

  

-#### #18 Document WLAN-strength in web page
+#### ~~#18 Document WLAN-strength in web page~~

-* https://github.com/jomjol/AI-on-the-edge-device/issues/563
+* ~~https://github.com/jomjol/AI-on-the-edge-device/issues/563~~



-#### #17 Direct InfluxDB connection
+#### ~~#17 Direct InfluxDB connection~~

-* https://github.com/jomjol/AI-on-the-edge-device/issues/534
-* Direct interface to a InfluxDB data base
-* Integrate InfluxDB interface in firmware
-* Adapt html web page for configuration
+* ~~Done in v10.6.0~~


 #### #16 Serial Communication
@@ -101,9 +124,9 @@ ____

  

-#### #12 Less reboots due to memory leakage
+#### ~~#12 Less reboots due to memory leakage~~

-* Issue: #414 & #425  #430
+* ~~Issue: #414 & #425  #430~~

  

@@ -222,4 +245,4 @@ ____

 * ~~Implementation of a software module for external light source (e.g. WS8132 LED controller, ...)~~
 * ~~Update of the camera module to use the external light instead of the internal flash light~~
-* ~~Adopt the configuration algorithm with a configurable light source~~
+* ~~Adopt the configuration algorithm with a configurable light source~~
--- a/README.md
+++ b/README.md
@@ -4,11 +4,10 @@ This is an example of Artificial Intelligence (AI) calculations on a very cheap

 ### Details on **function**, **installation** and **configuration** can be found on the **[Wiki Page](https://github.com/jomjol/AI-on-the-edge-device/wiki)**

-A 3d-printable housing can be found here: https://www.thingiverse.com/thing:4573481
-
-or here https://www.thingiverse.com/thing:5028229
-
-respectively ESP32-Cam housing only: https://www.thingiverse.com/thing:4571627
+A 3d-printable housing can be found here:
+  - https://www.thingiverse.com/thing:4573481 (Water Meter)
+  - https://www.thingiverse.com/thing:5028229 (Power Meter)
+  - https://www.thingiverse.com/thing:4571627 (ESP32-Cam housing only)

 <img src="https://raw.githubusercontent.com/jomjol/AI-on-the-edge-device/master/images/watermeter_all.jpg" width="200"><img src="https://raw.githubusercontent.com/jomjol/AI-on-the-edge-device/master/images/main.jpg" width="200"><img src="https://raw.githubusercontent.com/jomjol/AI-on-the-edge-device/master/images/size.png" width="200"> 

@@ -34,152 +33,58 @@ If you have any technical topics, you can file a issue in this repository.

 In other cases you can contact the developer via email: <img src="https://raw.githubusercontent.com/jomjol/AI-on-the-edge-device/master/images/mail.jpg" height="25"> 

------
-## Coming next
-
-* Automated update of the neural network file (tflite) to make the learing of additional pictures much easier and automated (GitHub action)
-* New "hyprid" neural network for digital numbers --> allowing the detection of intermediate states ("ring between two numbers") as a subdigit
-

 ------
 ## Change log
-### Known Issues
-
-* slow response of web server during picture analysis
-
-**General remark:** Beside the `firmware.bin`, typically also the content of `/html` needs to be updated!
+**General remark:** Besides the file `firmware.bin`, typically the content of `/html` will need to be updated!

 ------

+##### 11.2.0 - Intermediate Digits (2022-08-28)

+- Updated Tensorflow / TFlite to newest tflite (version as of 2022-07-27)
+- Updated analog neural network file (`ana-cont_11.3.0_s2.tflite` - default, `ana-class100_0120_s1_q.tflite`)
+- Updated digital neural network file (`dig-cont_0570_s3.tflite` - default, `dig-class100_0120_s2_q.tflite`)

-##### 10.5.2 - Stability Increase (2022-02-22)
+- Added automated filtering of tflite-file in the graphical configuration (thanks to @**[caco3](https://github.com/caco3)**)
+- Updated consistency algorithm & test cases
+- HTML: added favicon and system name, Improved reboot dialog  (thanks to @**[caco3](https://github.com/caco3)**)

- **NEW 10.5.2:** Bug Fix: wrong `firmware.bin` (no rate update)
- NEW 10.5.1: Bug Fix: wrong return value, rate value & PreValue status, HTML: SSID & IP were not displayed 
- MQTT: changed wifi naming to "wifiRSSI"
- HTML: check select able values for consistency
- Refactoring of check postprocessing consistency (e.g. max rate, negative rate, ...)
- Bug Fix: corrected error in "Check Consistency Increase"
+##### 11.1.1 - Intermediate Digits (2022-08-22)

+- New and improved consistency check (especially with analog and digital counters mixed)
+- Bug Fix: digital counter algorithm

+##### 11.0.1 - Intermediate Digits (2022-08-18)

-##### 10.4.0 - Stability Increase (2022-02-12)
+- **NEW v11.0.1**: Bug Fix InfluxDB configuration (only update of html.zip necessary)

- Graphical configuration: select available neural network files (*.tfl, *.tflite) from drop down menu
- OTA-update: add option to upload tfl / tflite files to the correct locatioin (`/config/`)
-  - in future the new files will also be copied to the `firmware` directory of the repository
- Added Wifi RSSI to MQTT information
- Updated analog neural network file (`ana-s3-q-20220105.tflite`)
- Updated digital neural network file (`dig-s1-q-20220102.tflite`)
- Updated build environment to `Espressif 3.5.0`
+- Implementation of new CNN types to detect intermediate values of digits with rolling numbers

+  - By default the old algo (0, 1, ..., 9, "N") is active (due to the limited types of digits trained so far)
+  - Activation can be done by selection a tflite file with the new trained model in the 'config.ini'
+  - **Details can be found in the [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Neural-Network-Types)** (different types, trained image types, naming convention)

+- Updated  neural network files (and adaption to new naming convention)

-##### 10.3.0 - Stability Increase (2022-01-29)
+- Published a tool to download and combine log files - **Thanks to **

- Implemented LED flash dimming (`LEDIntensity`). 
-  Remark: as auto illumination in the camera is used, this is rather for energy saving. It will not help reducing reflections
- Additional camera parameters: saturation, contrast (although not too much impact yet)
- Readings with not automatically removable "N"s are handled like "error" --> no return value in the field "value" anymore 
-  (still reported back via field "raw value")
- Updated esp32 camera hardware driver
- Bug fix: MQTT, html improvements
+  - Files see ['/tools/logfile-tool'](tbd), How-to see [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Gasmeter-Log-Downloader)

-**ATTENTION:  The new ESP32 camera hardware driver is much more stable on newer OV2640  versions (no or much less reboots) but seems to be not fully compatible with older versions.**
-
-* If you have problem with stalled systems you can try the following
-  - Update the parameter `ImageQuality` to `12` instead of current value `5` (manually in the `config.ini`)
-
-  - If this is not helping, you might need to update your hardware or stay with version 9.2
-
-##### 10.2.0 - Stability Increase (2022-01-14)
-
- Due to the update camera driver, the image looks different and a new setup might be needed
-
-  - Update reference image
-  - Update Alignment marks
-
- Reduce reboot due to camera problems
-
- Update esp32-camera to new version (master as of 2022-01-09)
+- Bug Fix: InfluxDB enabling in grahic configuration

  

-##### 10.1.1 - Stability Increase (2022-01-12)
+## Tools

- Bug Fix MQTT problem
- Issue:
-  - Changing from v9.x to 10.x the MQTT-paramter "Topic" was renamed  into "MainTopic" to address multiple number meters This renaming should have been done automatically in the background  within the graphical configuration, but was not working. Instead the  parameter "Topic" was deleted and "MainTopic" was set to disabled and  "undefined".
- ToDo
-  - Update the `html.zip`
-  - If old `config.ini` available: copy it to `/config`, open the graphical configuration and save it again.
-  - If old `config.ini` not available: reset the parameter "MainTopic" within the `config.ini` manually
-  - Reboot
-
-##### 10.1.0 - Stability Increase (2022-01-09)
-
- Reduce ESP32 frequency to 160MHz
-
- Update tflite (new source: https://github.com/espressif/tflite-micro-esp-examples)
-
- Update analog neural network (ana-s3-q-20220105.tflite)
-
- Update digital neural network (dig-s1-q-20220102.tflite)
-
- Increased web-server buffers
- bug fix: compiler compatibility
-
-##### 10.0.2 - Stability Increase (2022-01-01)
-
- NEW v10.0.2: Corrected JSON error
-
- Updated compiler toolchain to ESP-IDF 4.3
-
- Removal of memory leak
-
- Improved error handling during startup (check PSRAM and camera with remark in logfile)
-
- MQTT: implemented raw value additionally, removal of regex contrain
-
- Normalized Parameter ``MaxRateValue``  to "change per minute" 
-
- HTML: improved input handling
-
- Corrected error handling: in case of error the old value, rate, timestamp are not transmitted any more
-
-  
-
-##### 9.2.0 - External Illumination (2021-12-02)
-
- Direct JSON access: ``http://IP-ADRESS/json`` 
- Error message in log file in case camera error during startup
- Upgrade analog CNN to v9.1.0
- Upgrade digital CNN to v13.3.0 (added new images)
- html: support of different ports
-
-##### 9.1.1 - External Illumination (2021-11-16)
-
- NEW 9.1.1 bug fix: LED implemenetation
- External LEDs: change control mode (resolve bug with more than 2 LEDs)
- Additional info into log file
- Bug fix: decimal shift, html, log file
-
-##### 9.0.0 - External Illumination (2021-10-23)
-
-* Implementation of external illumination to adjust positioning, brightness and color of the illumination now individually
-  * Technical details can be found in the wiki: https://github.com/jomjol/AI-on-the-edge-device/wiki/External-LED
-  <img src="https://raw.githubusercontent.com/jomjol/ai-on-the-edge-device/master/images/intern_vs_external.jpg" width="500">
-* New housing published for external LEDs and small clearing: https://www.thingiverse.com/thing:5028229
+* Logfile downloader and combiner (Thx to [reserve85](https://github.com/reserve85))
+  * Files see ['/tools/logfile-tool'](tbd), How-to see [wiki](https://github.com/jomjol/AI-on-the-edge-device/wiki/Gasmeter-Log-Downloader)



+## Additional Ideas

-
-
-## Additional ideas
-
-There are some ideas and feature request, which are not followed currently - mainly due to capacity reasons on side of the developer. They are collected here: [FeatureRequest.md](FeatureRequest.md)
+There are some ideas and feature requests which are not followed currently - mainly due to capacity reasons on side of the developer. They are collected here: [FeatureRequest.md](FeatureRequest.md)



@@ -187,7 +92,11 @@ There are some ideas and feature request, which are not followed currently - mai

 ## History

-##### 8.5.0 - Multi Meter Support (2021-10-07)
+##### 10.6.2 - Stability Increase (2022-07-24)
+
+##### 9.2.0 - External Illumination (2021-12-02)
+
+##### 8.5.0 Multi Meter Support (2021-10-07)

 ##### 7.1.2 MQTT-Update - (2021-06-17)

--- a/code/components/esp-nn/.gitignore
+++ b/code/components/esp-nn/.gitignore
@@ -0,0 +1,57 @@
+.config
+*.o
+*.i
+*.s
+*.orig
+*.pyc
+
+# gtags
+GTAGS
+GRTAGS
+GPATH
+
+# emacs
+.dir-locals.el
+
+# emacs temp file suffixes
+*~
+.#*
+\#*#
+
+# eclipse setting
+.settings
+
+# MacOS directory files
+.DS_Store
+
+# Example project files
+examples/**/sdkconfig
+examples/**/sdkconfig.old
+examples/**/build
+
+# Test app files
+test_app/build
+test_app/sdkconfig
+test_app/sdkconfig.old
+
+# Doc build artifacts
+docs/_build/
+docs/doxygen-warning-log.txt
+docs/sphinx-warning-log.txt
+docs/sphinx-warning-log-sanitized.txt
+docs/xml/
+docs/xml_in/
+docs/man/
+docs/doxygen_sqlite3.db
+
+TEST_LOGS
+
+
+# gcov coverage reports
+*.gcda
+*.gcno
+coverage.info
+coverage_report/
+
+# VS Code Settings
+.vscode/
--- a/code/components/esp-nn/.gitlab-ci.yml
+++ b/code/components/esp-nn/.gitlab-ci.yml
@@ -0,0 +1,55 @@
+stages:
+  - build
+
+variables:
+  BATCH_BUILD: "1"
+  V: "0"
+  MAKEFLAGS: "-j8 --no-keep-going"
+  IDF_PATH: "$CI_PROJECT_DIR/esp-idf"
+  LOG_PATH: "$CI_PROJECT_DIR"
+
+.set_git_config: &set_git_config
+  # Set git config
+  - git config user.email "test@espressif.com"
+  - git config user.name "Espressif"
+
+.add_ssh_key: &add_ssh_key
+  # Add gitlab ssh key
+  - mkdir -p ~/.ssh
+  - chmod 700 ~/.ssh
+  - echo -n $GITLAB_KEY > ~/.ssh/id_rsa_base64
+  - base64 --decode --ignore-garbage ~/.ssh/id_rsa_base64 > ~/.ssh/id_rsa
+  - chmod 600 ~/.ssh/id_rsa
+  - echo -e "Host gitlab.espressif.cn\n\tStrictHostKeyChecking no\n" >> ~/.ssh/config
+
+before_script:
+  # Add gitlab ssh key
+  - *add_ssh_key
+  # Set git config
+  - *set_git_config
+
+.build_esp32s3: &build_esp32s3
+  - idf.py set-target esp32s3 build
+
+.build_esp32: &build_esp32
+  - idf.py set-target esp32 build
+
+build_demo:
+  stage: build
+  image: $CI_DOCKER_REGISTRY/esp32-ci-env:esp-nn
+  tags:
+    - build
+  script:
+    # Clone IDF
+    - git clone --recursive --single-branch -b release/v4.4 --reference-if-able /local_references/gitlab/ https://gitlab-ci-token:${BOT_TOKEN}@gitlab.espressif.cn:6688/espressif/esp-idf.git
+    - cd esp-idf
+    - ./install.sh
+    - . ./export.sh
+    - cd ..
+    # Build examples now
+    - cd test_app
+    # Build esp32s3
+    - *build_esp32s3
+    # Build esp32
+    - *build_esp32
+    - cd -
--- a/code/components/esp-nn/CMakeLists.txt
+++ b/code/components/esp-nn/CMakeLists.txt
@@ -0,0 +1,50 @@
+idf_build_get_property(idf_target IDF_TARGET)
+
+set(c_srcs
+    "src/activation_functions/esp_nn_relu_ansi.c"
+    "src/basic_math/esp_nn_add_ansi.c"
+    "src/basic_math/esp_nn_mul_ansi.c"
+    "src/convolution/esp_nn_conv_ansi.c"
+    "src/convolution/esp_nn_conv_opt.c"
+    "src/convolution/esp_nn_depthwise_conv_ansi.c"
+    "src/convolution/esp_nn_depthwise_conv_opt.c"
+    "src/fully_connected/esp_nn_fully_connected_ansi.c"
+    "src/softmax/esp_nn_softmax_ansi.c"
+    "src/softmax/esp_nn_softmax_opt.c"
+    "src/pooling/esp_nn_avg_pool_ansi.c"
+    "src/pooling/esp_nn_max_pool_ansi.c")
+
+if(CONFIG_IDF_TARGET_ESP32S3)
+    set(s3_srcs
+        "src/common/esp_nn_common_functions_esp32s3.S"
+        "src/common/esp_nn_multiply_by_quantized_mult_esp32s3.S"
+        "src/common/esp_nn_multiply_by_quantized_mult_ver1_esp32s3.S"
+        "src/activation_functions/esp_nn_relu_s8_esp32s3.S"
+        "src/basic_math/esp_nn_add_s8_esp32s3.S"
+        "src/basic_math/esp_nn_mul_s8_esp32s3.S"
+        "src/convolution/esp_nn_conv_esp32s3.c"
+        "src/convolution/esp_nn_depthwise_conv_s8_esp32s3.c"
+        "src/convolution/esp_nn_conv_s16_mult8_esp32s3.S"
+        "src/convolution/esp_nn_conv_s8_mult8_1x1_esp32s3.S"
+        "src/convolution/esp_nn_conv_s16_mult4_1x1_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult1_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult1_3x3_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult1_3x3_no_pad_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult8_3x3_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult4_esp32s3.S"
+        "src/convolution/esp_nn_depthwise_conv_s16_mult8_esp32s3.S"
+        "src/fully_connected/esp_nn_fully_connected_s8_esp32s3.S"
+        "src/pooling/esp_nn_max_pool_s8_esp32s3.S"
+        "src/pooling/esp_nn_avg_pool_s8_esp32s3.S")
+endif()
+
+idf_component_register(SRCS "${c_srcs}"
+                            "${s3_srcs}"
+                       INCLUDE_DIRS "include" "src/common")
+
+if(CONFIG_IDF_TARGET_ESP32S3)
+    target_compile_options(${COMPONENT_LIB} PRIVATE -mlongcalls -fno-unroll-loops -O2 -Wno-unused-function)
+else()
+    target_compile_options(${COMPONENT_LIB} PRIVATE -Wno-unused-function)
+endif()
--- a/code/components/esp-nn/Kconfig.projbuild
+++ b/code/components/esp-nn/Kconfig.projbuild
@@ -0,0 +1,29 @@
+menu "ESP-NN"
+
+choice NN_OPTIMIZATIONS
+   bool "Optimization for nn functions"
+   default NN_OPTIMIZED
+   help
+      Use ANSI-C versions for verification and debug purpose.
+      Optimisations are automatically picked up for a chipset.
+      For ESP32-S3, assembly optimisations are selected.
+      For other platforms(viz., ESP32, ESP32-C3), generic optimisations are used.
+
+config NN_ANSI_C
+   bool "ANSI C"
+   help
+      ANSI C versions for verification and debug purposes.
+config NN_OPTIMIZED
+   bool "Optimized versions"
+   help
+      Optimisations are automatically picked up for a chipset.
+      For ESP32-S3, assembly optimisations are selected.
+      For other platforms(viz., ESP32, ESP32-C3), generic optimisations are used.
+endchoice
+
+config NN_OPTIMIZATIONS
+   int
+   default 0 if NN_ANSI_C
+   default 1 if NN_OPTIMIZED
+
+endmenu
--- a/code/components/esp-nn/LICENSE
+++ b/code/components/esp-nn/LICENSE
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/code/components/esp-nn/README.md
+++ b/code/components/esp-nn/README.md
@@ -0,0 +1,55 @@
+# ESP-NN
+
+The library contains optimised NN (Neural Network) functions for various Espressif chipsets.
+
+* Supported platforms:
+   * TensorFlow Lite Micro (TFLite Micro). Repo can be found [here](https://github.com/espressif/tflite-micro-esp-examples)
+
+* Supported ESP chipsets include:
+   * ESP32-S3 (Assembly versions optimised to benefit from vector instructions of ESP32-S3)
+   * ESP32 (Generic optimisations)
+   * ESP32-C3 (Generic optimisations)
+
+## Performance
+
+### Kernelwise performance for s8 versions:
+
+  * Kernelwise performance on ESP32-S3 chip
+    * Numbers are ticks taken for kernel to execute
+    * Chip config: 240MHz, SPI: QPI 80MHz, Data cache: 64KB
+
+    | Function        | ANSI C  | ESP32-S3 Opt  | Opt Ratio | Data info   | Memory    |
+    | ----------------| --------|---------|---------|-------------|-----------|
+    | elementwise_add | 320397  | 87119   | 3.68    | size = 1615 | External  |
+    | elementwise_mul | 125958  | 44239   | 2.85    | size = 1615 | External  |
+    | convolution     | 4663012 | 428675  | 10.88   | input(10,10), filter(64x1x1x64) | External |
+    | convolution     | 301014  | 32433   | 9.28    | input(8,8), filter(16x1x1x16) | External |
+    | convolution     | 2115418 | 1020923 | 2.07    | input(10,10), filter(64x3x3x3) | External |
+    | depthwise conv  | 1190062 | 203278  | 5.85    | input (18, 18), pad(0,0), stride(1,1) filter: 1x3x3x16 | External |
+    | depthwise conv  | 837072  | 182335  | 4.59    | input (12, 12), pad(1,1), stride(1,1)  filter: 8x5x5x4 | External |
+    | max pool        | 485714  | 76747   | 6.33    | input(16,16), filter (1x3x3x16) | Internal |
+    | avg pool        | 541462  | 160580  | 3.37    | input(16,16), filter (1x3x3x16) | Internal |
+    | fully connected | 15853   | 9547    | 1.66    | len: 265, ch = 3 | Internal |
+    | prelu (relu6)   | 19472   | 2734    | 7.12    | size, 1615  | Internal  |
+
+
+## Configuration
+
+  * To configure, please use `idf.py menuconfig` and under `ESP-NN` select `NN_OPTIMIZATIONS`
+  * There are two options presented:
+     * Optimized versions
+     * ANSI C
+
+  * Default selection is for `Optimized versions`. For ESP32-S3, assembly versions are automatically selected, whereas for other chipsets (viz., ESP32, ESP32-C3), generic optimisations are selected.
+  * For debugging purposes, you may want to select `ANSI C` reference versions.
+
+
+## Contributing
+
+If you encounter an issue with ESP-NN, or wish to submit a feature request, please use the Issues section on the Github.
+
+For general questions related to this library, please use the esp32.com forum.
+
+## Copyrights and License
+
+All original source code in this repository is Copyright (C) 2020-2021 Espressif Systems. This source code is licensed under the Apache License 2.0 as described in the file LICENSE.
--- a/code/components/esp-nn/include/esp_nn.h
+++ b/code/components/esp-nn/include/esp_nn.h
@@ -0,0 +1,46 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#if defined(CONFIG_NN_OPTIMIZED)
+// select apt optimisations
+#ifdef CONFIG_IDF_TARGET_ESP32S3
+#define ARCH_ESP32_S3 1
+#endif
+#ifdef CONFIG_IDF_TARGET_ESP32
+#define ARCH_ESP32 1
+#endif
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* reference kernels included by default */
+#include "esp_nn_ansi_headers.h"
+
+#if defined(CONFIG_NN_OPTIMIZED)
+#if defined(ARCH_ESP32_S3)
+#include "esp_nn_esp32s3.h"
+#else // for other platforms use generic optimisations
+#include "esp_nn_generic_opt.h"
+#endif // #if defined(ARCH_ESP32_S3)
+#else
+#include "esp_nn_ansi_c.h"
+#endif
+
+#ifdef __cplusplus
+}
+#endif
--- a/code/components/esp-nn/include/esp_nn_ansi_c.h
+++ b/code/components/esp-nn/include/esp_nn_ansi_c.h
@@ -0,0 +1,47 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/**
+ * @file        Header definitions to include for ANSI C versions.
+ *              These are just typedefs to pick up ANSI versions.
+ */
+
+#pragma once
+
+#include "esp_nn_defs.h"
+#include "esp_nn_ansi_headers.h"
+
+#define esp_nn_add_elementwise_s8 esp_nn_add_elementwise_s8_ansi
+#define esp_nn_mul_elementwise_s8 esp_nn_mul_elementwise_s8_ansi
+
+#define esp_nn_depthwise_conv_s8 esp_nn_depthwise_conv_s8_ansi
+
+#define esp_nn_conv_s8 esp_nn_conv_s8_ansi
+
+#define esp_nn_get_conv_scratch_size esp_nn_get_conv_scratch_size_ansi
+#define esp_nn_set_conv_scratch_buf esp_nn_set_conv_scratch_buf_ansi
+
+#define esp_nn_get_depthwise_conv_scratch_size esp_nn_get_depthwise_conv_scratch_size_ansi
+#define esp_nn_set_depthwise_conv_scratch_buf esp_nn_set_depthwise_conv_scratch_buf_ansi
+
+#define esp_nn_relu6_s8 esp_nn_relu6_s8_ansi
+
+#define esp_nn_avg_pool_s8 esp_nn_avg_pool_s8_ansi
+#define esp_nn_max_pool_s8 esp_nn_max_pool_s8_ansi
+
+#define esp_nn_fully_connected_s8 esp_nn_fully_connected_s8_ansi
+
+#define esp_nn_get_softmax_scratch_size esp_nn_get_softmax_scratch_size_ansi
+#define esp_nn_set_softmax_scratch_buf esp_nn_set_softmax_scratch_buf_ansi
+#define esp_nn_softmax_s8 esp_nn_softmax_s8_ansi
--- a/code/components/esp-nn/include/esp_nn_ansi_headers.h
+++ b/code/components/esp-nn/include/esp_nn_ansi_headers.h
@@ -0,0 +1,309 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+/**
+ * @file        Header definitions to include for esp_nn reference functions
+ */
+
+#include "esp_nn_defs.h"
+/************************** Basic math functions ****************************/
+
+/**
+ * @brief       elementwise addition
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ *
+ *              shift values are expected to be <= 0
+ */
+void esp_nn_add_elementwise_s8_ansi(const int8_t *input1_data,
+                                    const int8_t *input2_data,
+                                    const int32_t input1_offset,
+                                    const int32_t input2_offset,
+                                    const int32_t input1_mult,
+                                    const int32_t input2_mult,
+                                    const int32_t input1_shift,
+                                    const int32_t input2_shift,
+                                    const int32_t left_shift,
+                                    int8_t *output,
+                                    const int32_t out_offset,
+                                    const int32_t out_mult,
+                                    const int32_t out_shift,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max,
+                                    const int32_t size);
+/**
+ * @brief       elementwise multiplication
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ *
+ *              output shift is expected to be <= 0
+ */
+void esp_nn_mul_elementwise_s8_ansi(const int8_t *input1_data,
+                                    const int8_t *input2_data,
+                                    const int32_t input1_offset,
+                                    const int32_t input2_offset,
+                                    int8_t *output,
+                                    const int32_t out_offset,
+                                    const int32_t out_mult,
+                                    const int32_t out_shift,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max,
+                                    const int32_t size);
+
+
+/************************** Convolution functions *****************************/
+
+/**
+ * @brief       depthwise convolution per channel
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              Version used in tflite is per channel.
+ *              This version follows the same footsprints.
+ *              Meaning, it has per out_channel shift and multiplier for
+ *              requantization
+ *
+ *              optimization notes: Though input_offset is int32 type,
+ *              offset values are contained in 8 bits [-128, 127]
+ */
+void esp_nn_depthwise_conv_s8_ansi(const data_dims_t *input_dims,
+                                   const int8_t *input_data,
+                                   const data_dims_t *filter_dims,
+                                   const int8_t *filter_data,
+                                   const int32_t *bias,
+                                   const data_dims_t *output_dims,
+                                   int8_t *out_data,
+                                   const dw_conv_params_t *conv_params,
+                                   const quant_data_t *quant_data);
+
+/**
+ * @brief       2d-convolution channelwise
+ *
+ * @note        operation: result += (input + offset) * filter
+ *
+ *              inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_conv_s8_ansi(const data_dims_t *input_dims,
+                         const int8_t *input_data,
+                         const data_dims_t *filter_dims,
+                         const int8_t *filter_data,
+                         const int32_t *bias,
+                         const data_dims_t *output_dims,
+                         int8_t *out_data,
+                         const conv_params_t *conv_params,
+                         const quant_data_t *quant_data);
+
+int esp_nn_get_conv_scratch_size_ansi(const data_dims_t *input_dims,
+                                      const data_dims_t *filter_dims,
+                                      const data_dims_t *output_dims,
+                                      const conv_params_t *conv_params);
+void esp_nn_set_conv_scratch_buf_ansi(const void *buf);
+
+int esp_nn_get_depthwise_conv_scratch_size_ansi(const data_dims_t *input_dims,
+                                                const data_dims_t *filter_dims,
+                                                const data_dims_t *output_dims,
+                                                const dw_conv_params_t *conv_params);
+void esp_nn_set_depthwise_conv_scratch_buf_ansi(const void *buf);
+
+/************************** Activation functions *****************************/
+
+/**
+ * @brief       relu6
+ *
+ * @note        inout: int8_t
+ */
+void esp_nn_relu6_s8_ansi(int8_t *data, uint16_t size);
+
+/************************** Pooling functions *****************************/
+
+
+/**
+ * @brief       max_pool
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_max_pool_s8_ansi(const int8_t *input,
+                             const uint16_t input_wd,
+                             const uint16_t input_ht,
+                             int8_t *output,
+                             const uint16_t output_wd,
+                             const uint16_t output_ht,
+                             const uint16_t stride_wd,
+                             const uint16_t stride_ht,
+                             const uint16_t filter_wd,
+                             const uint16_t filter_ht,
+                             const uint16_t pad_wd,
+                             const uint16_t pad_ht,
+                             const int32_t activation_min,
+                             const int32_t activation_max,
+                             const uint16_t channels);
+
+/**
+ * @brief       avg_pool
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_avg_pool_s8_ansi(const int8_t *input,
+                             const uint16_t input_wd,
+                             const uint16_t input_ht,
+                             int8_t *output,
+                             const uint16_t output_wd,
+                             const uint16_t output_ht,
+                             const uint16_t stride_wd,
+                             const uint16_t stride_ht,
+                             const uint16_t filter_wd,
+                             const uint16_t filter_ht,
+                             const uint16_t pad_wd,
+                             const uint16_t pad_ht,
+                             const int32_t activation_min,
+                             const int32_t activation_max,
+                             const uint16_t channels);
+
+
+/************************** Fully connected functions ***********************/
+
+/**
+ * @brief       fully connected
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_fully_connected_s8_ansi(const int8_t *input_data,
+                                    const int32_t input_offset,
+                                    const uint16_t row_len,
+                                    const int8_t *filter_data,
+                                    const int32_t filter_offset,
+                                    const int32_t *bias,
+                                    int8_t *out_data,
+                                    const uint16_t out_channels,
+                                    const int32_t out_offset,
+                                    const int32_t out_shift,
+                                    const int32_t out_mult,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max);
+
+/**
+ * @brief   Get scratch buffer size needed by softmax function
+ *
+ * @param   width
+ * @param   height
+ * @return  size in bytes
+ *
+ * @note    buffer must be 4 byte aligned
+ */
+int32_t esp_nn_get_softmax_scratch_size_ansi(const int32_t width, const int32_t height);
+
+/* ANSI C function to be hooked up when optimised version needed */
+int32_t esp_nn_get_softmax_scratch_size_opt(const int32_t width, const int32_t height);
+
+/**
+ * @brief   Set scratch buffer to be used by softmax function
+ *
+ * @param   buffer  this can be NULL if one needs to unset it
+ *                  must be aligned to 4 bytes
+ */
+void esp_nn_set_softmax_scratch_buf_ansi(void *buffer);
+
+/**
+ * @brief       reference softmax function
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ */
+void esp_nn_softmax_s8_ansi(const int8_t *input_data,
+                            const int32_t height,
+                            const int32_t width,
+                            const int32_t mult,
+                            const int32_t shift,
+                            const int32_t diff_min,
+                            int8_t *output_data);
+
+
+//////////////////////////// Generic optimisations /////////////////////////////
+
+/************************** Convolution functions *****************************/
+
+/**
+ * @brief       2d-convolution channelwise optimized version
+ *
+ * @note        operation: result += (input + offset) * filter
+ *
+ *              inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_conv_s8_opt(const data_dims_t *input_dims,
+                        const int8_t *input_data,
+                        const data_dims_t *filter_dims,
+                        const int8_t *filter_data,
+                        const int32_t *bias,
+                        const data_dims_t *output_dims,
+                        int8_t *out_data,
+                        const conv_params_t *conv_params,
+                        const quant_data_t *quant_data);
+
+/**
+ * @brief       depthwise convolution per channel optimized version
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              Version used in tflite is per channel.
+ *              This version follows the same footsprints.
+ *              Meaning, it has per out_channel shift and multiplier for
+ *              requantization
+ *
+ *              optimization notes: Though input_offset is int32 type,
+ *              offset values are contained in 8 bits [-128, 127]
+ */
+void esp_nn_depthwise_conv_s8_opt(const data_dims_t *input_dims,
+                                  const int8_t *input_data,
+                                  const data_dims_t *filter_dims,
+                                  const int8_t *filter_data,
+                                  const int32_t *bias,
+                                  const data_dims_t *output_dims,
+                                  int8_t *out_data,
+                                  const dw_conv_params_t *conv_params,
+                                  const quant_data_t *quant_data);
+
+int esp_nn_get_conv_scratch_size_opt(const data_dims_t *input_dims,
+                                     const data_dims_t *filter_dims,
+                                     const data_dims_t *output_dims,
+                                     const conv_params_t *conv_params);
+void esp_nn_set_conv_scratch_buf_opt(const void *buf);
+
+int esp_nn_get_depthwise_conv_scratch_size_opt(const data_dims_t *input_dims,
+                                               const data_dims_t *filter_dims,
+                                               const data_dims_t *output_dims,
+                                               const dw_conv_params_t *conv_params);
+void esp_nn_set_depthwise_conv_scratch_buf_opt(const void *buf);
+
+/* ANSI C function to be hooked up when optimised version needed */
+void esp_nn_set_softmax_scratch_buf_opt(void *buffer);
+
+/**
+ * @brief       optimised version of softmax function
+ *
+ * @note        the function uses extra buffer (4 * width bytes)
+ *              hence, scratch buffers must be set before calling this.
+ */
+void esp_nn_softmax_s8_opt(const int8_t *input_data,
+                           const int32_t height,
+                           const int32_t width,
+                           const int32_t mult,
+                           const int32_t shift,
+                           const int32_t diff_min,
+                           int8_t *output_data);
--- a/code/components/esp-nn/include/esp_nn_defs.h
+++ b/code/components/esp-nn/include/esp_nn_defs.h
@@ -0,0 +1,83 @@
+// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <stdint.h>
+
+/**
+ * @brief structure to club data dims
+ * this structure can be used for input, output and filter
+ */
+typedef struct data_dims {
+    int32_t width;
+    int32_t height;
+    int32_t channels;
+
+    int32_t extra; // can be used as batch or any other param
+} data_dims_t;
+
+/**
+ * @brief 2d data structure (width, height)
+ *
+ */
+typedef struct data_2d {
+    int32_t width;
+    int32_t height;
+} data_2d_t;
+
+/**
+ * @brief min/max activation
+ */
+typedef struct act_params {
+    int32_t min;
+    int32_t max;
+} act_params_t;
+
+/**
+ * @brief per channel quant data
+ *
+ * @note number of shift and mult elements are equal to output channels
+ */
+typedef struct quant_data {
+    int32_t *shift;
+    int32_t *mult;
+} quant_data_t;
+
+/**
+ * @brief params specific to convolution 2d
+ *
+ */
+typedef struct conv_params {
+    int32_t in_offset;
+    int32_t out_offset;
+    data_2d_t stride;
+    data_2d_t padding;
+    data_2d_t dilation;
+    act_params_t activation;
+} conv_params_t;
+
+/**
+ * @brief params specific to depthwise convolution 2d
+ *
+ */
+typedef struct dw_conv_params {
+    int32_t in_offset;
+    int32_t out_offset;
+    int32_t ch_mult; // channel multiplier. (in_ch * ch_mult = out_ch)
+    data_2d_t stride;
+    data_2d_t padding;
+    data_2d_t dilation;
+    act_params_t activation;
+} dw_conv_params_t;
--- a/code/components/esp-nn/include/esp_nn_esp32s3.h
+++ b/code/components/esp-nn/include/esp_nn_esp32s3.h
@@ -0,0 +1,231 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/**
+ * @file        Header definitions to include for esp_nn optimized functions for
+ *              the ESP32-S3 platform
+ */
+
+#pragma once
+
+#include "esp_nn_defs.h"
+#include "esp_nn_ansi_headers.h"
+
+/************************** Basic math functions *****************************/
+
+
+/**
+ * @brief       elementwise addition
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ *
+ *              shift values are expected to be <= 0
+ */
+void esp_nn_add_elementwise_s8_esp32s3(const int8_t *input1_data,
+                                       const int8_t *input2_data,
+                                       const int32_t input1_offset,
+                                       const int32_t input2_offset,
+                                       const int32_t input1_mult,
+                                       const int32_t input2_mult,
+                                       const int32_t input1_shift,
+                                       const int32_t input2_shift,
+                                       const int32_t left_shift,
+                                       int8_t *output,
+                                       const int32_t out_offset,
+                                       const int32_t out_mult,
+                                       const int32_t out_shift,
+                                       const int32_t activation_min,
+                                       const int32_t activation_max,
+                                       const int32_t size);
+
+/**
+ * @brief       elementwise multiplication
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ *
+ *              output shift is expected to be <= 0
+ */
+void esp_nn_mul_elementwise_s8_esp32s3(const int8_t *input1_data,
+                                       const int8_t *input2_data,
+                                       const int32_t input1_offset,
+                                       const int32_t input2_offset,
+                                       int8_t *output,
+                                       const int32_t out_offset,
+                                       const int32_t out_mult,
+                                       const int32_t out_shift,
+                                       const int32_t activation_min,
+                                       const int32_t activation_max,
+                                       const int32_t size);
+
+
+/************************** Convolution functions *****************************/
+
+/**
+ * @brief       depthwise convolution per channel
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              Version used in tflite is per channel.
+ *              This version follows the same footsprints.
+ *              Meaning, it has per out_channel shift and multiplier for
+ *              requantization
+ *
+ *              optimization notes: Though input_offset is int32 type,
+ *              offset values are contained in 8 bits [-128, 127]
+ */
+void esp_nn_depthwise_conv_s8_esp32s3(const data_dims_t *input_dims,
+                                      const int8_t *input_data,
+                                      const data_dims_t *filter_dims,
+                                      const int8_t *filter_data,
+                                      const int32_t *bias,
+                                      const data_dims_t *output_dims,
+                                      int8_t *output_data,
+                                      const dw_conv_params_t *conv_params,
+                                      const quant_data_t *quant_data);
+
+/**
+ * @brief       2d - convolution channelwise
+ *
+ * @note        operation: result += (input + offset) * filter
+ *
+ *              inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_conv_s8_esp32s3(const data_dims_t *input_dims,
+                            const int8_t *input_data,
+                            const data_dims_t *filter_dims,
+                            const int8_t *filter_data,
+                            const int32_t *bias,
+                            const data_dims_t *output_dims,
+                            int8_t *output_data,
+                            const conv_params_t *conv_params,
+                            const quant_data_t *quant_data);
+
+int esp_nn_get_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
+                                         const data_dims_t *filter_dims,
+                                         const data_dims_t *output_dims,
+                                         const conv_params_t *conv_params);
+void esp_nn_set_conv_scratch_buf_esp32s3(const void *buf);
+
+int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
+                                                   const data_dims_t *filter_dims,
+                                                   const data_dims_t *output_dims,
+                                                   const dw_conv_params_t *conv_params);
+void esp_nn_set_depthwise_conv_scratch_buf_esp32s3(const void *buf);
+
+/************************** Pooling functions *****************************/
+
+/**
+ * @brief       max_pool
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_max_pool_s8_esp32s3(const int8_t *input,
+                                const uint16_t input_wd,
+                                const uint16_t input_ht,
+                                int8_t *output,
+                                const uint16_t output_wd,
+                                const uint16_t output_ht,
+                                const uint16_t stride_wd,
+                                const uint16_t stride_ht,
+                                const uint16_t filter_wd,
+                                const uint16_t filter_ht,
+                                const uint16_t pad_wd,
+                                const uint16_t pad_ht,
+                                const int32_t activation_min,
+                                const int32_t activation_max,
+                                const uint16_t channels);
+
+/**
+ * @brief       avg_pool
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ */
+void esp_nn_avg_pool_s8_esp32s3(const int8_t *input,
+                                const uint16_t input_wd,
+                                const uint16_t input_ht,
+                                int8_t *output,
+                                const uint16_t output_wd,
+                                const uint16_t output_ht,
+                                const uint16_t stride_wd,
+                                const uint16_t stride_ht,
+                                const uint16_t filter_wd,
+                                const uint16_t filter_ht,
+                                const uint16_t pad_wd,
+                                const uint16_t pad_ht,
+                                const int32_t activation_min,
+                                const int32_t activation_max,
+                                const uint16_t channels);
+
+
+/************************** Fully connected functions *****************************/
+
+/**
+ * @brief       fully connected
+ *
+ * @note        inputs type: int8_t, output: int8_t
+ *              input offsets: although int32_t, they are contained in 8 bits [-128, 127]
+ *
+ *              Current version works only on aligned input.
+ *              row_len and channels should both be multiple of 8.
+ */
+void esp_nn_fully_connected_s8_esp32s3(const int8_t *input_data,
+                                       const int32_t input_offset,
+                                       const uint16_t row_len,
+                                       const int8_t *filter_data,
+                                       const int32_t filter_offset,
+                                       const int32_t *bias,
+                                       int8_t *out_data,
+                                       const uint16_t out_channels,
+                                       const int32_t out_offset,
+                                       const int32_t out_shift,
+                                       const int32_t out_mult,
+                                       const int32_t activation_min,
+                                       const int32_t activation_max);
+
+/**
+ * @brief       relu6
+ *
+ * @note        inout: int8_t
+ */
+void esp_nn_relu6_s8_esp32s3(int8_t *data, uint16_t size);
+
+/********************** function defines ***************************/
+
+#define esp_nn_add_elementwise_s8 esp_nn_add_elementwise_s8_esp32s3
+#define esp_nn_mul_elementwise_s8 esp_nn_mul_elementwise_s8_esp32s3
+
+#define esp_nn_depthwise_conv_s8 esp_nn_depthwise_conv_s8_esp32s3
+
+#define esp_nn_get_conv_scratch_size esp_nn_get_conv_scratch_size_esp32s3
+#define esp_nn_set_conv_scratch_buf esp_nn_set_conv_scratch_buf_esp32s3
+
+#define esp_nn_get_depthwise_conv_scratch_size esp_nn_get_depthwise_conv_scratch_size_esp32s3
+#define esp_nn_set_depthwise_conv_scratch_buf esp_nn_set_depthwise_conv_scratch_buf_esp32s3
+
+#define esp_nn_conv_s8 esp_nn_conv_s8_esp32s3
+
+#define esp_nn_relu6_s8 esp_nn_relu6_s8_esp32s3
+
+#define esp_nn_avg_pool_s8 esp_nn_avg_pool_s8_esp32s3
+#define esp_nn_max_pool_s8 esp_nn_max_pool_s8_esp32s3
+
+#define esp_nn_fully_connected_s8 esp_nn_fully_connected_s8_esp32s3
+
+#define esp_nn_get_softmax_scratch_size esp_nn_get_softmax_scratch_size_opt
+#define esp_nn_set_softmax_scratch_buf esp_nn_set_softmax_scratch_buf_opt
+#define esp_nn_softmax_s8 esp_nn_softmax_s8_opt
--- a/code/components/esp-nn/include/esp_nn_generic_opt.h
+++ b/code/components/esp-nn/include/esp_nn_generic_opt.h
@@ -0,0 +1,47 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+/**
+ * @file        Header definitions to include for esp_nn generic optimisations
+ *              For functions which not having optimisations, _ansi versions are picked.
+ */
+
+#pragma once
+
+#include "esp_nn_defs.h"
+#include "esp_nn_ansi_headers.h"
+
+#define esp_nn_add_elementwise_s8 esp_nn_add_elementwise_s8_ansi
+#define esp_nn_mul_elementwise_s8 esp_nn_mul_elementwise_s8_ansi
+
+#define esp_nn_depthwise_conv_s8 esp_nn_depthwise_conv_s8_opt
+
+#define esp_nn_conv_s8 esp_nn_conv_s8_opt
+
+#define esp_nn_get_conv_scratch_size esp_nn_get_conv_scratch_size_opt
+#define esp_nn_set_conv_scratch_buf esp_nn_set_conv_scratch_buf_opt
+
+#define esp_nn_get_depthwise_conv_scratch_size esp_nn_get_depthwise_conv_scratch_size_opt
+#define esp_nn_set_depthwise_conv_scratch_buf esp_nn_set_depthwise_conv_scratch_buf_opt
+
+#define esp_nn_relu6_s8 esp_nn_relu6_s8_ansi
+
+#define esp_nn_avg_pool_s8 esp_nn_avg_pool_s8_ansi
+#define esp_nn_max_pool_s8 esp_nn_max_pool_s8_ansi
+
+#define esp_nn_fully_connected_s8 esp_nn_fully_connected_s8_ansi
+
+#define esp_nn_get_softmax_scratch_size esp_nn_get_softmax_scratch_size_opt
+#define esp_nn_set_softmax_scratch_buf esp_nn_set_softmax_scratch_buf_opt
+#define esp_nn_softmax_s8 esp_nn_softmax_s8_opt
--- a/code/components/esp-nn/src/activation_functions/esp_nn_relu_ansi.c
+++ b/code/components/esp-nn/src/activation_functions/esp_nn_relu_ansi.c
@@ -0,0 +1,30 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdlib.h>
+
+#include <common_functions.h>
+
+void esp_nn_relu6_s8_ansi(int8_t *data, uint16_t size)
+{
+    int32_t i;
+
+    for (i = 0; i < size; i++) {
+        int32_t ip = data[i];
+
+        ip = max(ip, 0);
+        data[i] = min(ip, 6);
+    }
+}
--- a/code/components/esp-nn/src/basic_math/esp_nn_add_ansi.c
+++ b/code/components/esp-nn/src/basic_math/esp_nn_add_ansi.c
@@ -0,0 +1,97 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+
+#include <common_functions.h>
+
+void esp_nn_add_elementwise_u8_ansi(const uint8_t *input1_data,
+                                    const uint8_t *input2_data,
+                                    const int32_t input1_offset,
+                                    const int32_t input2_offset,
+                                    const int32_t input1_mult,
+                                    const int32_t input2_mult,
+                                    const int32_t input1_shift,
+                                    const int32_t input2_shift,
+                                    const int32_t left_shift,
+                                    uint8_t *output,
+                                    const int32_t out_offset,
+                                    const int32_t out_mult,
+                                    const int32_t out_shift,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max,
+                                    const int32_t size)
+{
+    for (int i = 0; i < size; i++) {
+        int32_t tmp1 = input1_data[i] + input1_offset;
+        int32_t tmp2 = input2_data[i] + input2_offset;
+
+        tmp1 <<= left_shift;
+        tmp2 <<= left_shift;
+
+        tmp1 = esp_nn_sat_round_doubling_high_mul(tmp1, input1_mult);
+        tmp2 = esp_nn_sat_round_doubling_high_mul(tmp2, input2_mult);
+
+        tmp1 = esp_nn_div_by_power_of_two(tmp1, -input1_shift);
+        tmp2 = esp_nn_div_by_power_of_two(tmp2, -input2_shift);
+
+        int32_t out = tmp1 + tmp2;
+        out = esp_nn_sat_round_doubling_high_mul(out, out_mult);
+        out = esp_nn_div_by_power_of_two(out, -out_shift);
+        out = out + out_offset;
+
+        out = max(activation_min, min(out, activation_max));
+        output[i] = (uint8_t) out;
+    }
+}
+
+void esp_nn_add_elementwise_s8_ansi(const int8_t *input1_data,
+                                    const int8_t *input2_data,
+                                    const int32_t input1_offset,
+                                    const int32_t input2_offset,
+                                    const int32_t input1_mult,
+                                    const int32_t input2_mult,
+                                    const int32_t input1_shift,
+                                    const int32_t input2_shift,
+                                    const int32_t left_shift,
+                                    int8_t *output,
+                                    const int32_t out_offset,
+                                    const int32_t out_mult,
+                                    const int32_t out_shift,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max,
+                                    const int32_t size)
+{
+    for (int i = 0; i < size; i++) {
+        int32_t tmp1 = input1_data[i] + input1_offset;
+        int32_t tmp2 = input2_data[i] + input2_offset;
+
+        tmp1 <<= left_shift;
+        tmp2 <<= left_shift;
+
+        tmp1 = esp_nn_sat_round_doubling_high_mul(tmp1, input1_mult);
+        tmp2 = esp_nn_sat_round_doubling_high_mul(tmp2, input2_mult);
+
+        tmp1 = esp_nn_div_by_power_of_two(tmp1, -input1_shift);
+        tmp2 = esp_nn_div_by_power_of_two(tmp2, -input2_shift);
+
+        int32_t out = tmp1 + tmp2;
+        out = esp_nn_sat_round_doubling_high_mul(out, out_mult);
+        out = esp_nn_div_by_power_of_two(out, -out_shift);
+        out = out + out_offset;
+
+        out = max(activation_min, min(out, activation_max));
+        output[i] = (int8_t) out;
+    }
+}
--- a/code/components/esp-nn/src/basic_math/esp_nn_mul_ansi.c
+++ b/code/components/esp-nn/src/basic_math/esp_nn_mul_ansi.c
@@ -0,0 +1,42 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+
+#include <common_functions.h>
+
+void esp_nn_mul_elementwise_s8_ansi(const int8_t *input1_data,
+                                    const int8_t *input2_data,
+                                    const int32_t input1_offset,
+                                    const int32_t input2_offset,
+                                    int8_t *output,
+                                    const int32_t out_offset,
+                                    const int32_t out_mult,
+                                    const int32_t out_shift,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max,
+                                    const int32_t size)
+{
+    for (int i = 0; i < size; i++) {
+        int32_t tmp1 = input1_data[i] + input1_offset;
+        int32_t tmp2 = input2_data[i] + input2_offset;
+
+        int32_t out = tmp1 * tmp2;
+        out = esp_nn_multiply_by_quantized_mult(out, out_mult, out_shift);
+        out = out + out_offset;
+
+        out = max(activation_min, min(out, activation_max));
+        output[i] = (int8_t) out;
+    }
+}
--- a/code/components/esp-nn/src/common/common_functions.h
+++ b/code/components/esp-nn/src/common/common_functions.h
@@ -0,0 +1,255 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+
+/**
+ * c99 standard still doesn't strictly inline functions
+ * We need to use attribute as well to do this.
+ */
+#define __NN_FORCE_INLINE__ __attribute((always_inline)) static inline
+
+/* min/max macros */
+#ifndef max
+#define max(a, b) ({            \
+    __typeof__ (a) _a = (a);    \
+    __typeof__ (b) _b = (b);    \
+    _a > _b ? _a : _b;          \
+})
+
+#define min(a, b) ({            \
+    __typeof__ (a) _a = (a);    \
+    __typeof__ (b) _b = (b);    \
+    _a < _b ? _a : _b;          \
+})
+#endif
+
+__NN_FORCE_INLINE__ int32_t esp_nn_clz32(uint32_t in)
+{
+#if CONFIG_IDF_TARGET_ARCH_XTENSA
+    __asm__ volatile("nsau %0, %0" : "+r" (in));
+    return in;
+#elif defined(__GNUC__)
+    return __builtin_clz(in);
+#else
+    int32_t count = 32;
+    uint32_t x = in, y = in >> 16;
+    if (y != 0) {
+        count -= 16;
+        x = y;
+    }
+    y = x >> 8;
+    if (y != 0) {
+        count -= 8;
+        x = y;
+    }
+    y = x >> 4;
+    if (y != 0) {
+        count -= 4;
+        x = y;
+    }
+    y = x >> 2;
+    if (y != 0) {
+        count -= 2;
+        x = y;
+    }
+    y = x >> 1;
+    if (y != 0) {
+        return count - 2;
+    }
+    return count - x;
+#endif
+}
+
+/**
+ * Signed saturate a 32 bit value to 8 bits keeping output in 32 bit variable.
+ */
+__NN_FORCE_INLINE__ int32_t esp_nn_saturate8(int32_t in)
+{
+#if CONFIG_IDF_TARGET_ARCH_XTENSA
+    __asm__ volatile("clamps %0, %0, 7" : "+a"(in));
+    return in;
+#else
+    return max(INT8_MIN, min(in, INT8_MAX));
+#endif
+}
+
+__NN_FORCE_INLINE__ int32_t esp_nn_pick_sat_high32_of64(int64_t val64)
+{
+    int32_t sign = (int32_t) (val64 >> 63);
+    int32_t to_add = sign & ((1ul << 31) - 1);
+    return (int32_t) ((int64_t) (val64 + to_add) >> 31);
+}
+
+__NN_FORCE_INLINE__ int32_t esp_nn_sat_round_doubling_high_mul(int32_t in0, int32_t in1)
+{
+    int32_t result;
+    int64_t in0_64 = (int64_t) in0;
+    bool overflow = (in0 == in1) && (in0 == (int32_t) INT32_MIN);
+
+    /* Nudge value */
+    int64_t nudge_val = 1 << 30;
+    if ((in0 < 0) ^ (in1 < 0)) {
+        nudge_val = 1 - nudge_val;
+    }
+
+    /* Multiply and add nudge */
+    int64_t mult = in0_64 * in1 + nudge_val;
+
+    /* Round and pickup 32 bits */
+    result = esp_nn_pick_sat_high32_of64(mult);
+
+    return overflow ? INT32_MAX : result;
+}
+
+/**
+ * fast version
+ * this will fail for values closer to INT32_MAX and INT32_MIN by `1 << (exponent - 1)`.
+ * We can afford to do this because we are at the very last stage of filter.
+ * Also it is pretty rare condition as our output is going to be 8 bit.
+ */
+__NN_FORCE_INLINE__ int32_t esp_nn_div_by_power_of_two_fast(int32_t val, int32_t exponent)
+{
+    int32_t to_add = (1 << (exponent - 1)) - (val < 0);
+    return (int32_t) ((val + to_add) >> exponent);
+}
+
+__NN_FORCE_INLINE__ int32_t esp_nn_div_by_power_of_two(int32_t val, int32_t exponent)
+{
+    int32_t result;
+
+    const int32_t mask = (1 << exponent) - 1;
+    const int32_t remainder = val & mask;
+
+    result = val >> exponent;
+    int32_t threshold = (mask >> 1) + (result < 0);
+
+    if (remainder > threshold) {
+        result += 1;
+    }
+    return result;
+}
+
+__NN_FORCE_INLINE__ int32_t esp_nn_multiply_by_quantized_mult(int32_t x, int32_t mult, int32_t shift)
+{
+    int32_t left_shift = shift > 0 ? shift : 0;
+    int32_t right_shift = shift > 0 ? 0 : -shift;
+    int32_t result = esp_nn_sat_round_doubling_high_mul(x * (1 << left_shift), mult);
+    return esp_nn_div_by_power_of_two(result, right_shift);
+}
+
+__NN_FORCE_INLINE__ int32_t esp_nn_multiply_by_quantized_mult_fast(int32_t x, int32_t mult, int32_t shift)
+{
+    int32_t left_shift = max(shift, 0);
+    int32_t right_shift = left_shift - shift;
+
+    int64_t nudge_val = 1 << 30;
+    int64_t in0_64 = (int64_t) (x << left_shift);
+
+    /* Multiply and add nudge */
+    int64_t mult_64 = in0_64 * mult + nudge_val;
+    int32_t result = (int32_t) (mult_64 >> 31);
+    if (right_shift) {
+        result = esp_nn_div_by_power_of_two_fast(result, right_shift);
+    }
+    return result;
+}
+
+static void esp_nn_aligned_s8_pad_with_value(const int8_t *src, int8_t *dst,
+                                             const uint16_t input_wd,
+                                             const uint16_t input_ht,
+                                             const uint16_t channels,
+                                             const int32_t pad_val,
+                                             const uint16_t pad_wd,
+                                             const uint16_t pad_ht)
+{
+    /* memset with pad_val */
+    memset(dst, pad_val, ((input_wd + 2 * pad_wd) * (input_ht + 2 * pad_ht)) * channels);
+    dst += (pad_wd + input_wd + pad_wd) * channels;
+
+    for (int i = 0; i < input_ht; i++) {
+        dst += pad_wd * channels;
+        for (int j = 0; j < input_wd * channels; j++) {
+            *dst++ = *src++;
+        }
+        dst += pad_wd * channels;
+    }
+}
+
+static void esp_nn_aligned_s8_pad_end_with_value(const int8_t *src, int8_t *dst,
+                                                 const uint16_t input_wd,
+                                                 const uint16_t input_ht,
+                                                 const uint16_t channels,
+                                                 const int32_t pad_val,
+                                                 const uint16_t pad_wd,
+                                                 const uint16_t pad_ht)
+{
+    for (int i = 0; i < input_ht; i++) {
+        for (int j = 0; j < input_wd * channels; j++) {
+            *dst++ = *src++;
+        }
+        if (pad_wd) {
+            memset(dst, pad_val, pad_wd * channels);
+            dst += pad_wd * channels;
+        }
+    }
+    /* pad end `pad_ht` lines at end */
+    if (pad_ht) {
+        memset(dst, pad_val, (input_wd + pad_wd) * pad_ht * channels);
+    }
+}
+
+/**
+ * @brief       convert 8 bit input data to 16 bit
+ *
+ * @param       src int8_t source data
+ * @param       dst int16_t dst data
+ * @param       size length of data
+ * @param       offset  offset to be added to src data. Range: [-128, 127]
+ */
+__NN_FORCE_INLINE__ void esp_nn_s8_to_s16_with_offset(const int8_t *src, int16_t *dst,
+                                                      const int size, const int32_t offset)
+{
+    int i = 0;
+    for (; i < size; i += 2) {
+        dst[i + 0] = src[i + 0] + offset;
+        dst[i + 1] = src[i + 1] + offset;
+    }
+    if(i < size) {
+        dst[i] = src[i] + offset;
+    }
+}
+
+/**
+ * @brief       convert 8 bit input data to 16 bit
+ *
+ * @param       src int8_t source data
+ * @param       dst int16_t dst data
+ * @param       size length of data
+ */
+__NN_FORCE_INLINE__ void esp_nn_s8_to_s16(const int8_t *src, int16_t *dst, const int size)
+{
+    int i = 0;
+    for (; i < size; i += 2) {
+        dst[i + 0] = src[i + 0];
+        dst[i + 1] = src[i + 1];
+    }
+    if(i < size) {
+        dst[i] = src[i];
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_conv_ansi.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_conv_ansi.c
@@ -0,0 +1,179 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <esp_nn_defs.h>
+
+#include <common_functions.h>
+
+int esp_nn_get_conv_scratch_size_ansi(const data_dims_t *input_dims,
+                                      const data_dims_t *filter_dims,
+                                      const data_dims_t *output_dims,
+                                      const conv_params_t *conv_params)
+{
+    return 0;
+}
+
+void esp_nn_set_conv_scratch_buf_ansi(const void *buf)
+{
+
+}
+
+/**
+ * Assumption 1: i/p channels == o/p channels
+ * Assumption 2: Pointers are valid
+ * Assumption 3: dialation width = 1
+ */
+void esp_nn_conv_u8_ansi(const uint8_t *input_data,
+                         const uint16_t input_wd,
+                         const uint16_t input_ht,
+                         const uint16_t in_channels,
+                         const int32_t input_offset,
+                         const uint16_t pad_wd,
+                         const uint16_t pad_ht,
+                         const uint16_t stride_wd,
+                         const uint16_t stride_ht,
+                         const uint8_t *filter_data,
+                         const uint16_t filter_wd,
+                         const uint16_t filter_ht,
+                         const int32_t filter_offset,
+                         const int32_t *bias,
+                         uint8_t *out_data,
+                         const uint16_t out_wd,
+                         const uint16_t out_ht,
+                         const uint16_t out_channels,
+                         const int32_t out_offset,
+                         const int32_t out_shift,
+                         const int32_t out_mult,
+                         const int32_t activation_min,
+                         const int32_t activation_max)
+{
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+            for (int out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {//channel_loop
+                int32_t result = 0;
+
+                /* Select filter so as the point doesn't lie outside block */
+                int filter_y_start = max(0, -base_y);
+                int filter_x_start = max(0, -base_x);
+                int filter_y_end = min(filter_ht, input_ht - base_y);
+                int filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    const int32_t idx_y = base_y + filter_y_idx;
+                    for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t idx_x = base_x + filter_x_idx;
+                        for (int in_ch_idx = 0; in_ch_idx < in_channels; in_ch_idx++) {
+                            int32_t input_index = (idx_y * input_wd + idx_x) * in_channels + in_ch_idx;
+                            int32_t filter_index = ((out_ch_idx * filter_ht + filter_y_idx)
+                                                    * filter_wd + filter_x_idx) * in_channels
+                                                   + in_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val = filter_data[filter_index] + filter_offset;
+                            result += input_val * filter_val;
+                        }
+                    }
+                }
+                if (bias) {
+                    result += bias[out_ch_idx];
+                }
+                result = esp_nn_multiply_by_quantized_mult(result, out_mult, out_shift);
+                result += out_offset;
+                result = max(result, activation_min);
+                result = min(result, activation_max);
+
+                int out_index = (out_y * out_wd + out_x) * out_channels + out_ch_idx;
+                out_data[out_index] = (uint8_t) result;
+            }
+        }
+    }
+}
+
+/**
+ * Assumption 1: i/p channels == o/p channels
+ * Assumption 2: Pointers are valid
+ * Assumption 3: dialation width = 1
+ */
+void esp_nn_conv_s8_ansi(const data_dims_t *input_dims,
+                         const int8_t *input_data,
+                         const data_dims_t *filter_dims,
+                         const int8_t *filter_data,
+                         const int32_t *bias,
+                         const data_dims_t *output_dims,
+                         int8_t *out_data,
+                         const conv_params_t *conv_params,
+                         const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t in_channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t out_channels = output_dims->channels;
+    const int32_t *out_shift = quant_data->shift;
+    const int32_t *out_mult = quant_data->mult;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                const int32_t base_y = stride_ht * out_y - pad_ht;
+                const int32_t base_x = stride_wd * out_x - pad_wd;
+
+                const int32_t filter_y_start = max(0, -base_y);
+                const int32_t filter_x_start = max(0, -base_x);
+
+                const int32_t filter_y_end = min(filter_ht, input_ht - base_y);
+                const int32_t filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    for (filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t in_row = base_y + filter_y_idx;
+                        const int32_t in_col = base_x + filter_x_idx;
+                        int32_t input_base_offset = (in_row * input_wd + in_col) * in_channels;
+                        int32_t filter_base_offset = out_ch_idx * in_channels * filter_ht * filter_wd +
+                                                       (filter_y_idx * filter_wd + filter_x_idx) * in_channels;
+                        for (in_ch_idx = 0; in_ch_idx < in_channels; in_ch_idx++) {
+                            conv_out +=
+                                (input_data[input_base_offset + in_ch_idx] + input_offset) *
+                                filter_data[filter_base_offset + in_ch_idx];
+                        }
+                    }
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult(conv_out, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_conv_esp32s3.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_conv_esp32s3.c
@@ -0,0 +1,463 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdio.h>
+#include <esp_nn_defs.h>
+
+#include <common_functions.h>
+
+static int16_t *scratch_buffer = NULL;
+
+extern void esp_nn_conv_s8_mult8_1x1_esp32s3(const int8_t *input_data,
+                                             const uint16_t input_wd,
+                                             const uint16_t input_ht,
+                                             const uint16_t in_channels,
+                                             const int32_t input_offset,
+                                             const int8_t *filter_aligned,
+                                             const int32_t *bias,
+                                             int8_t *out_data,
+                                             const uint16_t out_wd,
+                                             const uint16_t out_ht,
+                                             const uint16_t out_channels,
+                                             const int32_t out_offset,
+                                             const int32_t *out_shift,
+                                             const int32_t *out_mult,
+                                             const int32_t activation_min,
+                                             const int32_t activation_max,
+                                             void *buffer /* scratch buffer */);
+
+extern void esp_nn_conv_s16_mult4_1x1_esp32s3(const int16_t *input_data,
+                                              const uint16_t input_wd,
+                                              const uint16_t input_ht,
+                                              const uint16_t in_channels,
+                                              const int16_t *filter_data,
+                                              const int32_t *bias,
+                                              int8_t *out_data,
+                                              const uint16_t out_wd,
+                                              const uint16_t out_ht,
+                                              const uint16_t out_channels,
+                                              const int32_t out_offset,
+                                              const int32_t *out_shift,
+                                              const int32_t *out_mult,
+                                              const int32_t activation_min,
+                                              const int32_t activation_max,
+                                              void *buffer /* scratch buffer */);
+
+extern void esp_nn_conv_s16_mult8_esp32s3(const int16_t *input_data,
+                                          const uint16_t input_wd,
+                                          const uint16_t input_ht,
+                                          const uint16_t in_channels,
+                                          const uint16_t pad_wd,
+                                          const uint16_t pad_ht,
+                                          const uint16_t stride_wd,
+                                          const uint16_t stride_ht,
+                                          const int16_t *filter_data,
+                                          const uint16_t filter_wd,
+                                          const uint16_t filter_ht,
+                                          const int32_t *bias,
+                                          int8_t *out_data,
+                                          const uint16_t out_wd,
+                                          const uint16_t out_ht,
+                                          const uint16_t out_channels,
+                                          const int32_t out_offset,
+                                          const int32_t *out_shift,
+                                          const int32_t *out_mult,
+                                          const int32_t activation_min,
+                                          const int32_t activation_max);
+
+extern void esp_nn_aligned_s8_to_s16_with_offset_esp32s3(const int8_t *src, int16_t *dst,
+                                                         const int size, const int32_t offset);
+
+extern void esp_nn_s8_to_s16_esp32s3(const int8_t *src, int16_t *dst, const int size);
+
+static void esp_nn_conv_s8_unrolled(const data_dims_t *input_dims,
+                                    const int8_t *input_data,
+                                    const data_dims_t *filter_dims,
+                                    const int8_t *filter_data,
+                                    const int32_t *bias,
+                                    const data_dims_t *output_dims,
+                                    int8_t *out_data,
+                                    const conv_params_t *conv_params,
+                                    const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t in_ch = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t out_ch = output_dims->channels;
+    const int32_t *out_shift = quant_data->shift;
+    const int32_t *out_mult = quant_data->mult;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            for (out_ch_idx = 0; out_ch_idx < out_ch; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                const int32_t base_y = stride_ht * out_y - pad_ht;
+                const int32_t base_x = stride_wd * out_x - pad_wd;
+
+                const int32_t filter_y_start = max(0, -base_y);
+                const int32_t filter_x_start = max(0, -base_x);
+
+                const int32_t filter_y_end = min(filter_ht, input_ht - base_y);
+                const int32_t filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    for (filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t in_row = base_y + filter_y_idx;
+                        const int32_t in_col = base_x + filter_x_idx;
+                        int32_t input_base_offset = (in_row * input_wd + in_col) * in_ch;
+                        int32_t filter_base_offset = out_ch_idx * in_ch * filter_ht * filter_wd +
+                                                       (filter_y_idx * filter_wd + filter_x_idx) * in_ch;
+                        for (in_ch_idx = 0; in_ch_idx < in_ch; in_ch_idx++) {
+                            conv_out +=
+                                (input_data[input_base_offset + in_ch_idx] + input_offset) *
+                                filter_data[filter_base_offset + in_ch_idx];
+                        }
+                    }
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
+
+static void esp_nn_conv_s8_pad_valid(const int8_t *input_data,
+                                     const uint16_t input_wd,
+                                     const uint16_t input_ht,
+                                     const uint16_t in_channels,
+                                     const int32_t input_offset,
+                                     const uint16_t stride_wd,
+                                     const uint16_t stride_ht,
+                                     const int8_t *filter_data,
+                                     const uint16_t filter_wd,
+                                     const uint16_t filter_ht,
+                                     const int32_t *bias,
+                                     int8_t *out_data,
+                                     const uint16_t out_wd,
+                                     const uint16_t out_ht,
+                                     const uint16_t out_channels,
+                                     const int32_t out_offset,
+                                     const int32_t *out_shift,
+                                     const int32_t *out_mult,
+                                     const int32_t activation_min,
+                                     const int32_t activation_max)
+{
+    int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                const int32_t base_y = stride_ht * out_y;
+                const int32_t base_x = stride_wd * out_x;
+
+                for (filter_y_idx = 0; filter_y_idx < filter_ht; filter_y_idx++) {
+                    for (filter_x_idx = 0; filter_x_idx < filter_wd; filter_x_idx++) {
+                        const int32_t in_row = base_y + filter_y_idx;
+                        const int32_t in_col = base_x + filter_x_idx;
+                        int32_t input_base_offset = (in_row * input_wd + in_col) * in_channels;
+                        int32_t filter_base_offset = out_ch_idx * in_channels * filter_ht * filter_wd +
+                                                       (filter_y_idx * filter_wd + filter_x_idx) * in_channels;
+                        const int8_t *input_data_ptr = input_data + input_base_offset;
+                        const int8_t *filter_data_ptr = filter_data + filter_base_offset;
+                        for (in_ch_idx = 0; in_ch_idx < in_channels; in_ch_idx++) {
+                            conv_out += (*input_data_ptr++ + input_offset) * *filter_data_ptr++;
+                        }
+                    }
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
+
+static void esp_nn_conv_s8_pad_valid_3x3(const int8_t *input_data,
+                                         const uint16_t input_wd,
+                                         const uint16_t input_ht,
+                                         const uint16_t in_channels,
+                                         const int32_t input_offset,
+                                         const uint16_t stride_wd,
+                                         const uint16_t stride_ht,
+                                         const int8_t *filter_data,
+                                         const int32_t *bias,
+                                         int8_t *out_data,
+                                         const uint16_t out_wd,
+                                         const uint16_t out_ht,
+                                         const uint16_t out_channels,
+                                         const int32_t out_offset,
+                                         const int32_t *out_shift,
+                                         const int32_t *out_mult,
+                                         const int32_t activation_min,
+                                         const int32_t activation_max)
+{
+    int32_t out_ch_idx, out_y, out_x, in_ch_idx, filter_y_idx, filter_x_idx;
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            const int32_t base_y = stride_ht * out_y;
+            const int32_t base_x = stride_wd * out_x;
+            for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+                for (filter_y_idx = 0; filter_y_idx < 3; filter_y_idx++) {
+                    for (filter_x_idx = 0; filter_x_idx < 3; filter_x_idx++) {
+                        const int32_t in_row = base_y + filter_y_idx;
+                        const int32_t in_col = base_x + filter_x_idx;
+                        int32_t input_base_offset = (in_row * input_wd + in_col) * in_channels;
+                        int32_t filter_base_offset = out_ch_idx * in_channels * 3 * 3 +
+                                                       (filter_y_idx * 3 + filter_x_idx) * in_channels;
+                        const int8_t *input_data_ptr = input_data + input_base_offset;
+                        const int8_t *filter_data_ptr = filter_data + filter_base_offset;
+                        for (in_ch_idx = 0; in_ch_idx < in_channels; in_ch_idx++) {
+                            conv_out += (*input_data_ptr++ + input_offset) * *filter_data_ptr++;
+                        }
+                    }
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
+
+static void esp_nn_conv_s8_pad_valid_ch3_3x3(const int8_t *input_data,
+                                             const uint16_t input_wd,
+                                             const uint16_t input_ht,
+                                             const int32_t input_offset,
+                                             const uint16_t stride_wd,
+                                             const uint16_t stride_ht,
+                                             const int8_t *filter_data,
+                                             const int32_t *bias,
+                                             int8_t *out_data,
+                                             const uint16_t out_wd,
+                                             const uint16_t out_ht,
+                                             const uint16_t out_channels,
+                                             const int32_t out_offset,
+                                             const int32_t *out_shift,
+                                             const int32_t *out_mult,
+                                             const int32_t activation_min,
+                                             const int32_t activation_max)
+{
+    int32_t out_ch_idx, out_y, out_x, filter_y_idx;
+
+    /* use scratch_buffer to pre-compute offset factor */
+    int16_t *filter_sum = (int16_t *) scratch_buffer;
+    const int8_t *filter_ptr = filter_data;
+    for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+        int16_t sum_val = 0;
+        for (int i = 0; i < 9; i++) {
+            sum_val += *filter_ptr++;
+            sum_val += *filter_ptr++;
+            sum_val += *filter_ptr++;
+        }
+        *filter_sum++ = sum_val;
+    }
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            const int8_t *filter_data_ptr = filter_data;
+            const int32_t base_y = stride_ht * out_y;
+            const int32_t base_x = stride_wd * out_x;
+            const int8_t *input_base_ptr = input_data + (base_y * input_wd + base_x) * 3;
+            int16_t *filter_sum = (int16_t *) scratch_buffer;
+            for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                for (filter_y_idx = 0; filter_y_idx < 3; filter_y_idx++) {
+                    const int8_t *input_data_ptr = input_base_ptr + (filter_y_idx * input_wd) * 3;
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                    conv_out += (*input_data_ptr++) * (*filter_data_ptr++);
+                }
+
+                conv_out += *filter_sum++ * input_offset;
+
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
+
+int esp_nn_get_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
+                                         const data_dims_t *filter_dims,
+                                         const data_dims_t *output_dims,
+                                         const conv_params_t *conv_params)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t in_ch = input_dims->channels;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_ch = output_dims->channels;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+
+    int filter_size = filter_wd * filter_ht * in_ch * out_ch;
+    int input_size = input_wd * input_ht * in_ch;
+
+    int transpose_buf_size = 2 * (8 * in_ch); /* to store intermediate data */
+    if (input_wd * input_ht < 8) {
+        transpose_buf_size = 0; // not using this for leftover
+    }
+    int align_buf_size = 32; /* extra buffer for alignment */
+    if (in_ch % 8 == 0 && filter_wd == 1 && filter_ht == 1 &&
+            pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
+        return filter_size + transpose_buf_size + align_buf_size;
+    }
+    return 2 * (filter_size + input_size) +  transpose_buf_size + align_buf_size;
+}
+
+void esp_nn_set_conv_scratch_buf_esp32s3(void *buf)
+{
+    scratch_buffer = (int16_t *) buf;
+}
+
+void esp_nn_conv_s8_esp32s3(const data_dims_t *input_dims,
+                            const int8_t *input,
+                            const data_dims_t *filter_dims,
+                            const int8_t *filter_data,
+                            const int32_t *bias,
+                            const data_dims_t *output_dims,
+                            int8_t *out_data,
+                            const conv_params_t *conv_params,
+                            const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t out_channels = output_dims->channels;
+    const int32_t *out_shift = quant_data->shift;
+    const int32_t *out_mult = quant_data->mult;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int filter_size = filter_wd * filter_ht * channels * out_channels;
+    int input_size = input_wd * input_ht * channels;
+    int align_len = 16 - (filter_size & 15);
+    int16_t *filter_data16 = scratch_buffer;
+    int16_t *input_data16 = scratch_buffer + filter_size + align_len;
+
+    if (scratch_buffer == NULL) {
+        printf("esp_nn_conv error! scratch_buffer not set!\n");
+        return;
+    }
+
+    if (channels % 8 == 0 && filter_wd == 1 && filter_ht == 1 &&
+            pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
+        int8_t *filter_aligned = (int8_t *) scratch_buffer;
+        int scratch_offset = (int) (filter_aligned + filter_size);
+        void *scratch_buf = (void *) (scratch_offset + 16 - (scratch_offset & 15));
+        memcpy(filter_aligned, filter_data, filter_size); // copy to aligned address
+        esp_nn_conv_s8_mult8_1x1_esp32s3(
+            input, input_wd, input_ht, channels, input_offset, filter_aligned,
+            bias, out_data, out_wd, out_ht, out_channels, out_offset,
+            out_shift, out_mult, activation_min, activation_max, scratch_buf);
+    } else if (channels % 4 == 0 && filter_wd == 1 && filter_ht == 1 &&
+            (input_wd * input_ht) % 4 == 0 && /* TODO: remove this check */
+            pad_wd == 0 && pad_ht == 0 && stride_wd == 1 && stride_ht == 1) {
+        int scratch_offset = (int) (input_data16 + input_size);
+        void *scratch_buf = (void *) (scratch_offset + 16 - (scratch_offset & 15));
+        esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
+        esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input, input_data16, input_size, input_offset);
+        esp_nn_conv_s16_mult4_1x1_esp32s3(
+            input_data16, input_wd, input_ht, channels, filter_data16,
+            bias, out_data, out_wd, out_ht, out_channels, out_offset,
+            out_shift, out_mult, activation_min, activation_max, scratch_buf);
+    } else if (channels % 8 == 0) {
+        esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
+        esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input, input_data16, input_size, input_offset);
+        esp_nn_conv_s16_mult8_esp32s3(
+            input_data16, input_wd, input_ht, channels, pad_wd, pad_ht,
+            stride_wd, stride_ht, filter_data16, filter_wd, filter_ht, bias,
+            out_data, out_wd, out_ht, out_channels, out_offset, out_shift,
+            out_mult, activation_min, activation_max);
+    } else if (pad_wd == 0 && pad_ht == 0) {
+        if (filter_wd == 3 && filter_ht == 3 && channels == 3) {
+            esp_nn_conv_s8_pad_valid_ch3_3x3(input, input_wd, input_ht, input_offset,
+                                             stride_wd, stride_ht, filter_data, bias,
+                                             out_data, out_wd, out_ht, out_channels, out_offset,
+                                             out_shift, out_mult, activation_min, activation_max);
+        } else {
+            esp_nn_conv_s8_pad_valid(input, input_wd, input_ht, channels, input_offset,
+                                     stride_wd, stride_ht, filter_data, filter_wd, filter_ht, bias,
+                                     out_data, out_wd, out_ht, out_channels, out_offset, out_shift,
+                                     out_mult, activation_min, activation_max);
+        }
+    } else {
+        /* Basic unrolled version */
+        esp_nn_conv_s8_unrolled(input_dims, input, filter_dims, filter_data,
+                                bias, output_dims, out_data, conv_params, quant_data);
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_conv_opt.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_conv_opt.c
@@ -0,0 +1,179 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <esp_nn_defs.h>
+
+#include <common_functions.h>
+
+int esp_nn_get_conv_scratch_size_opt(const data_dims_t *input_dims,
+                                     const data_dims_t *filter_dims,
+                                     const data_dims_t *output_dims,
+                                     const conv_params_t *conv_params)
+{
+    return 0;
+}
+
+void esp_nn_set_conv_scratch_buf_opt(const void *buf)
+{
+
+}
+
+__attribute__ ((noinline))
+static void esp_nn_conv_s8_1x1(const data_dims_t *input_dims,
+                               const int8_t *input_data,
+                               const int8_t *filter_data,
+                               const int32_t *bias,
+                               const data_dims_t *output_dims,
+                               int8_t *out_data,
+                               const conv_params_t *conv_params,
+                               const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t in_channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t out_channels = output_dims->channels;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    for (int32_t in_row = 0; in_row < out_ht * stride_ht; in_row += stride_ht) {
+        for (int32_t in_col = 0; in_col < out_wd * stride_wd; in_col += stride_wd) {
+            const int32_t *out_mult = quant_data->mult;
+            const int32_t *out_shift = quant_data->shift;
+            const int8_t *filter_ptr = filter_data;
+            const int8_t *input_base_ptr = input_data + (in_row * input_wd + in_col) * in_channels;
+            int32_t out_ch_idx = 0;
+            for (; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                const int8_t *input_ptr = input_base_ptr;
+
+                int32_t in_ch_idx = 0;
+                for (; in_ch_idx < in_channels - 3; in_ch_idx += 4) {
+                    conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                    conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                    conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                    conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                }
+                for (; in_ch_idx < in_channels; in_ch_idx ++) {
+                    conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, *out_mult++, *out_shift++);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
+
+/**
+ * Assumption 1: i/p channels == o/p channels
+ * Assumption 2: Pointers are valid
+ * Assumption 3: dialation width = 1
+ */
+void esp_nn_conv_s8_opt(const data_dims_t *input_dims,
+                        const int8_t *input_data,
+                        const data_dims_t *filter_dims,
+                        const int8_t *filter_data,
+                        const int32_t *bias,
+                        const data_dims_t *output_dims,
+                        int8_t *out_data,
+                        const conv_params_t *conv_params,
+                        const quant_data_t *quant_data)
+{
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+
+    if (filter_wd == 1 && filter_ht == 1) {
+        esp_nn_conv_s8_1x1(input_dims, input_data, filter_data, bias,
+                           output_dims, out_data, conv_params, quant_data);
+        return;
+    }
+
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t in_channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t out_channels = output_dims->channels;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int32_t out_ch_idx, out_y, out_x, filter_y_idx, filter_x_idx;
+
+    for (out_y = 0; out_y < out_ht; out_y++) {
+        for (out_x = 0; out_x < out_wd; out_x++) {
+            const int32_t *out_shift = quant_data->shift;
+            const int32_t *out_mult = quant_data->mult;
+            for (out_ch_idx = 0; out_ch_idx < out_channels; out_ch_idx++) {
+                int32_t conv_out = 0;
+
+                const int32_t base_y = stride_ht * out_y - pad_ht;
+                const int32_t base_x = stride_wd * out_x - pad_wd;
+
+                const int32_t filter_y_start = max(0, -base_y);
+                const int32_t filter_x_start = max(0, -base_x);
+
+                const int32_t filter_y_end = min(filter_ht, input_ht - base_y);
+                const int32_t filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    for (filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t in_row = base_y + filter_y_idx;
+                        const int32_t in_col = base_x + filter_x_idx;
+
+                        const int8_t *input_ptr = input_data +
+                                        (in_row * input_wd + in_col) * in_channels;
+                        const int8_t *filter_ptr = filter_data +
+                                        out_ch_idx * in_channels * filter_ht * filter_wd +
+                                        (filter_y_idx * filter_wd + filter_x_idx) * in_channels;
+                        int32_t in_ch_idx = 0;
+                        for (; in_ch_idx < in_channels - 3; in_ch_idx += 4) {
+                            conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                            conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                            conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                            conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                        }
+                        for (; in_ch_idx < in_channels; in_ch_idx ++) {
+                            conv_out += (*input_ptr++ + input_offset) * *filter_ptr++;
+                        }
+                    }
+                }
+                if (bias) {
+                    conv_out += bias[out_ch_idx];
+                }
+                conv_out = esp_nn_multiply_by_quantized_mult_fast(conv_out, *out_mult++, *out_shift++);
+                conv_out += out_offset;
+                conv_out = max(conv_out, activation_min);
+                conv_out = min(conv_out, activation_max);
+                *out_data++ = (int8_t) conv_out;
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_ansi.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_ansi.c
@@ -0,0 +1,100 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <esp_nn_defs.h>
+#include <common_functions.h>
+
+int esp_nn_get_depthwise_conv_scratch_size_ansi(const data_dims_t *input_dims,
+                                                const data_dims_t *filter_dims,
+                                                const data_dims_t *output_dims,
+                                                const dw_conv_params_t *conv_params)
+{
+    return 0;
+}
+
+void esp_nn_set_depthwise_conv_scratch_buf_ansi(const void *buf)
+{
+
+}
+
+void esp_nn_depthwise_conv_s8_ansi(const data_dims_t *input_dims,
+                                   const int8_t *input_data,
+                                   const data_dims_t *filter_dims,
+                                   const int8_t *filter_data,
+                                   const int32_t *bias,
+                                   const data_dims_t *output_dims,
+                                   int8_t *out_data,
+                                   const dw_conv_params_t *conv_params,
+                                   const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const int32_t *out_shift = quant_data->shift;
+    const int32_t *out_mult = quant_data->mult;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+    const uint16_t ch_mult = conv_params->ch_mult;
+
+    int out_idx = 0;
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+            for (int ch_idx = 0; ch_idx < channels; ch_idx++) {//channel_loop
+                for (int ch_mult_idx = 0; ch_mult_idx < ch_mult; ch_mult_idx++) {
+                    int32_t result = 0;
+                    const int out_ch_idx = ch_mult_idx + ch_idx * ch_mult;
+
+                    /* Select filter so as the point doesn't lie outside block */
+                    int filter_y_start = max(0, -base_y);
+                    int filter_x_start = max(0, -base_x);
+                    int filter_y_end = min(filter_ht, input_ht - base_y);
+                    int filter_x_end = min(filter_wd, input_wd - base_x);
+
+                    for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                        const int32_t idx_y = base_y + filter_y_idx;
+                        for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                            const int32_t idx_x = base_x + filter_x_idx;
+                            int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                            int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val = filter_data[filter_index];
+                            result += input_val * filter_val;
+                        }
+                    }
+                    if (bias) {
+                        result += bias[out_ch_idx];
+                    }
+                    result = esp_nn_multiply_by_quantized_mult(result, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                    result += out_offset;
+                    result = max(result, activation_min);
+                    result = min(result, activation_max);
+
+                    out_data[out_idx++] = result;
+                }
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_opt.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_opt.c
@@ -0,0 +1,291 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <esp_nn_defs.h>
+#include <common_functions.h>
+
+int esp_nn_get_depthwise_conv_scratch_size_opt(const data_dims_t *input_dims,
+                                               const data_dims_t *filter_dims,
+                                               const data_dims_t *output_dims,
+                                               const dw_conv_params_t *conv_params)
+{
+    return 0;
+}
+
+void esp_nn_set_depthwise_conv_scratch_buf_opt(const void *buf)
+{
+
+}
+
+/* common channel multiplier == 1 case */
+__attribute__ ((noinline))
+static void esp_nn_depthwise_conv_s8_ch_mult_1(const data_dims_t *input_dims,
+                                               const int8_t *input_data,
+                                               const data_dims_t *filter_dims,
+                                               const int8_t *filter_data,
+                                               const int32_t *bias,
+                                               const data_dims_t *output_dims,
+                                               int8_t *out_data,
+                                               const dw_conv_params_t *conv_params,
+                                               const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int out_idx = 0;
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+
+            const int32_t *out_shift = quant_data->shift;
+            const int32_t *out_mult = quant_data->mult;
+
+            /* Select filter so as the point doesn't lie outside block */
+            int filter_y_start = max(0, -base_y);
+            int filter_x_start = max(0, -base_x);
+            int filter_y_end = min(filter_ht, input_ht - base_y);
+            int filter_x_end = min(filter_wd, input_wd - base_x);
+
+            int ch_idx = 0;
+            for (; ch_idx < channels - 3; ch_idx += 4) {//channel_loop
+                int32_t result0 = 0;
+                int32_t result1 = 0;
+                int32_t result2 = 0;
+                int32_t result3 = 0;
+
+                for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    const int32_t idx_y = base_y + filter_y_idx;
+                    for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t idx_x = base_x + filter_x_idx;
+                        int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                        int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels) + ch_idx;
+                        int32_t input_val0 = input_data[input_index + 0] + input_offset;
+                        int32_t input_val1 = input_data[input_index + 1] + input_offset;
+                        int32_t input_val2 = input_data[input_index + 2] + input_offset;
+                        int32_t input_val3 = input_data[input_index + 3] + input_offset;
+                        int32_t filter_val0 = filter_data[filter_index + 0];
+                        int32_t filter_val1 = filter_data[filter_index + 1];
+                        int32_t filter_val2 = filter_data[filter_index + 2];
+                        int32_t filter_val3 = filter_data[filter_index + 3];
+                        result0 += input_val0 * filter_val0;
+                        result1 += input_val1 * filter_val1;
+                        result2 += input_val2 * filter_val2;
+                        result3 += input_val3 * filter_val3;
+                    }
+                }
+                if (bias) {
+                    result0 += bias[ch_idx + 0];
+                    result1 += bias[ch_idx + 1];
+                    result2 += bias[ch_idx + 2];
+                    result3 += bias[ch_idx + 3];
+                }
+                result0 = esp_nn_multiply_by_quantized_mult_fast(result0, *out_mult++, *out_shift++);
+                result1 = esp_nn_multiply_by_quantized_mult_fast(result1, *out_mult++, *out_shift++);
+                result2 = esp_nn_multiply_by_quantized_mult_fast(result2, *out_mult++, *out_shift++);
+                result3 = esp_nn_multiply_by_quantized_mult_fast(result3, *out_mult++, *out_shift++);
+
+                result0 += out_offset;
+                result1 += out_offset;
+                result2 += out_offset;
+                result3 += out_offset;
+
+                result0 = max(result0, activation_min);
+                result1 = max(result1, activation_min);
+                result2 = max(result2, activation_min);
+                result3 = max(result3, activation_min);
+
+                result0 = min(result0, activation_max);
+                result1 = min(result1, activation_max);
+                result2 = min(result2, activation_max);
+                result3 = min(result3, activation_max);
+
+                out_data[out_idx++] = result0;
+                out_data[out_idx++] = result1;
+                out_data[out_idx++] = result2;
+                out_data[out_idx++] = result3;
+            }
+            for (; ch_idx < channels; ch_idx++) {//channel_loop
+                int32_t result = 0;
+
+                for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    const int32_t idx_y = base_y + filter_y_idx;
+                    for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t idx_x = base_x + filter_x_idx;
+                        int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                        int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels) + ch_idx;
+                        int32_t input_val = input_data[input_index] + input_offset;
+                        int32_t filter_val = filter_data[filter_index];
+                        result += input_val * filter_val;
+                    }
+                }
+                if (bias) {
+                    result += bias[ch_idx];
+                }
+                result = esp_nn_multiply_by_quantized_mult_fast(result, *out_mult++, *out_shift++);
+                result += out_offset;
+                result = max(result, activation_min);
+                result = min(result, activation_max);
+
+                out_data[out_idx++] = result;
+            }
+        }
+    }
+}
+
+void esp_nn_depthwise_conv_s8_opt(const data_dims_t *input_dims,
+                                  const int8_t *input_data,
+                                  const data_dims_t *filter_dims,
+                                  const int8_t *filter_data,
+                                  const int32_t *bias,
+                                  const data_dims_t *output_dims,
+                                  int8_t *out_data,
+                                  const dw_conv_params_t *conv_params,
+                                  const quant_data_t *quant_data)
+{
+    const uint16_t ch_mult = conv_params->ch_mult;
+    if (ch_mult == 1) {
+        esp_nn_depthwise_conv_s8_ch_mult_1(input_dims, input_data, filter_dims, filter_data,
+                                           bias, output_dims, out_data, conv_params, quant_data);
+        return;
+    }
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+
+    int out_idx = 0;
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+
+            const int32_t *out_shift = quant_data->shift;
+            const int32_t *out_mult = quant_data->mult;
+
+            /* Select filter so as the point doesn't lie outside block */
+            int filter_y_start = max(0, -base_y);
+            int filter_x_start = max(0, -base_x);
+            int filter_y_end = min(filter_ht, input_ht - base_y);
+            int filter_x_end = min(filter_wd, input_wd - base_x);
+
+            for (int ch_idx = 0; ch_idx < channels; ch_idx++) {//channel_loop
+                int ch_mult_idx = 0;
+                for (; ch_mult_idx < ch_mult - 3; ch_mult_idx += 4) {
+                    int32_t result0 = 0;
+                    int32_t result1 = 0;
+                    int32_t result2 = 0;
+                    int32_t result3 = 0;
+                    const int out_ch_idx =  ch_idx * ch_mult + ch_mult_idx;
+
+                    for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                        const int32_t idx_y = base_y + filter_y_idx;
+                        for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                            const int32_t idx_x = base_x + filter_x_idx;
+                            int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                            int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val0 = filter_data[filter_index + 0];
+                            int32_t filter_val1 = filter_data[filter_index + 1];
+                            int32_t filter_val2 = filter_data[filter_index + 2];
+                            int32_t filter_val3 = filter_data[filter_index + 3];
+                            result0 += input_val * filter_val0;
+                            result1 += input_val * filter_val1;
+                            result2 += input_val * filter_val2;
+                            result3 += input_val * filter_val3;
+                        }
+                    }
+                    if (bias) {
+                        result0 += bias[out_ch_idx + 0];
+                        result1 += bias[out_ch_idx + 1];
+                        result2 += bias[out_ch_idx + 2];
+                        result3 += bias[out_ch_idx + 3];
+                    }
+                    result0 = esp_nn_multiply_by_quantized_mult_fast(result0, *out_mult++, *out_shift++);
+                    result1 = esp_nn_multiply_by_quantized_mult_fast(result1, *out_mult++, *out_shift++);
+                    result2 = esp_nn_multiply_by_quantized_mult_fast(result2, *out_mult++, *out_shift++);
+                    result3 = esp_nn_multiply_by_quantized_mult_fast(result3, *out_mult++, *out_shift++);
+
+                    result0 += out_offset;
+                    result1 += out_offset;
+                    result2 += out_offset;
+                    result3 += out_offset;
+
+                    result0 = max(result0, activation_min);
+                    result1 = max(result1, activation_min);
+                    result2 = max(result2, activation_min);
+                    result3 = max(result3, activation_min);
+                    result0 = min(result0, activation_max);
+                    result1 = min(result1, activation_max);
+                    result2 = min(result2, activation_max);
+                    result3 = min(result3, activation_max);
+
+                    out_data[out_idx++] = result0;
+                    out_data[out_idx++] = result1;
+                    out_data[out_idx++] = result2;
+                    out_data[out_idx++] = result3;
+                }
+                for (; ch_mult_idx < ch_mult; ch_mult_idx++) {
+                    int32_t result = 0;
+                    const int out_ch_idx =  ch_idx * ch_mult + ch_mult_idx;
+
+                    for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                        const int32_t idx_y = base_y + filter_y_idx;
+                        for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                            const int32_t idx_x = base_x + filter_x_idx;
+                            int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                            int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val = filter_data[filter_index];
+                            result += input_val * filter_val;
+                        }
+                    }
+                    if (bias) {
+                        result += bias[out_ch_idx];
+                    }
+                    result = esp_nn_multiply_by_quantized_mult_fast(result, *out_mult++, *out_shift++);
+                    result += out_offset;
+                    result = max(result, activation_min);
+                    result = min(result, activation_max);
+
+                    out_data[out_idx++] = result;
+                }
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_s8_esp32s3.c
+++ b/code/components/esp-nn/src/convolution/esp_nn_depthwise_conv_s8_esp32s3.c
@@ -0,0 +1,543 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdio.h>
+#include <esp_nn_defs.h>
+
+#include <common_functions.h>
+
+static int16_t *scratch_buffer = NULL;
+
+extern void esp_nn_depthwise_conv_s16_mult8_3x3_esp32s3(const int16_t *input_data,
+                                                        const uint16_t input_wd,
+                                                        const uint16_t input_ht,
+                                                        const uint16_t channels,
+                                                        const uint16_t pad_wd,
+                                                        const uint16_t pad_ht,
+                                                        const uint16_t stride_wd,
+                                                        const uint16_t stride_ht,
+                                                        const uint16_t ch_mult,
+                                                        const int16_t *filter_data,
+                                                        const int32_t *bias,
+                                                        int8_t *out_data,
+                                                        const uint16_t out_wd,
+                                                        const uint16_t out_ht,
+                                                        const int32_t out_offset,
+                                                        const int32_t *out_shift,
+                                                        const int32_t *out_mult,
+                                                        const int32_t activation_min,
+                                                        const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3(const int8_t *input_data,
+                                                              const uint16_t input_wd,
+                                                              const uint16_t input_ht,
+                                                              const uint16_t channels,
+                                                              const int32_t input_offset,
+                                                              const uint16_t stride_wd,
+                                                              const uint16_t stride_ht,
+                                                              const int8_t *filter_data,
+                                                              const int32_t *bias,
+                                                              int8_t *out_data,
+                                                              const uint16_t out_wd,
+                                                              const uint16_t out_ht,
+                                                              const int32_t out_offset,
+                                                              const int32_t *out_shift,
+                                                              const int32_t *out_mult,
+                                                              const int32_t activation_min,
+                                                              const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s16_mult1_3x3_no_pad_esp32s3(const int16_t *input_data,
+                                                               const uint16_t input_wd,
+                                                               const uint16_t input_ht,
+                                                               const uint16_t channels,
+                                                               const uint16_t stride_wd,
+                                                               const uint16_t stride_ht,
+                                                               const int16_t *filter_data,
+                                                               const int32_t *bias,
+                                                               int8_t *out_data,
+                                                               const uint16_t out_wd,
+                                                               const uint16_t out_ht,
+                                                               const int32_t out_offset,
+                                                               const int32_t *out_shift,
+                                                               const int32_t *out_mult,
+                                                               const int32_t activation_min,
+                                                               const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s16_mult8_esp32s3(const int16_t *input_data,
+                                                    const uint16_t input_wd,
+                                                    const uint16_t input_ht,
+                                                    const uint16_t channels,
+                                                    const uint16_t pad_wd,
+                                                    const uint16_t pad_ht,
+                                                    const uint16_t stride_wd,
+                                                    const uint16_t stride_ht,
+                                                    const uint16_t ch_mult,
+                                                    const int16_t *filter_data,
+                                                    const uint16_t filter_wd,
+                                                    const uint16_t filter_ht,
+                                                    const int32_t *bias,
+                                                    int8_t *out_data,
+                                                    const uint16_t out_wd,
+                                                    const uint16_t out_ht,
+                                                    const int32_t out_offset,
+                                                    const int32_t *out_shift,
+                                                    const int32_t *out_mult,
+                                                    const int32_t activation_min,
+                                                    const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s16_mult4_esp32s3(const int16_t *input_data,
+                                                    const uint16_t input_wd,
+                                                    const uint16_t input_ht,
+                                                    const uint16_t channels,
+                                                    const uint16_t pad_wd,
+                                                    const uint16_t pad_ht,
+                                                    const uint16_t stride_wd,
+                                                    const uint16_t stride_ht,
+                                                    const uint16_t ch_mult,
+                                                    const int16_t *filter_data,
+                                                    const uint16_t filter_wd,
+                                                    const uint16_t filter_ht,
+                                                    const int32_t *bias,
+                                                    int8_t *out_data,
+                                                    const uint16_t out_wd,
+                                                    const uint16_t out_ht,
+                                                    const int32_t out_offset,
+                                                    const int32_t *out_shift,
+                                                    const int32_t *out_mult,
+                                                    const int32_t activation_min,
+                                                    const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s16_mult1_3x3_esp32s3(const int16_t *input_data,
+                                                        const uint16_t input_wd,
+                                                        const uint16_t input_ht,
+                                                        const uint16_t channels,
+                                                        const uint16_t pad_wd,
+                                                        const uint16_t pad_ht,
+                                                        const uint16_t stride_wd,
+                                                        const uint16_t stride_ht,
+                                                        const int16_t *filter_data,
+                                                        const int32_t *bias,
+                                                        int8_t *out_data,
+                                                        const uint16_t out_wd,
+                                                        const uint16_t out_ht,
+                                                        const int32_t out_offset,
+                                                        const int32_t *out_shift,
+                                                        const int32_t *out_mult,
+                                                        const int32_t activation_min,
+                                                        const int32_t activation_max);
+
+extern void esp_nn_depthwise_conv_s16_mult1_esp32s3(const int16_t *input_data,
+                                                    const uint16_t input_wd,
+                                                    const uint16_t input_ht,
+                                                    const uint16_t channels,
+                                                    const uint16_t pad_wd,
+                                                    const uint16_t pad_ht,
+                                                    const uint16_t stride_wd,
+                                                    const uint16_t stride_ht,
+                                                    const int16_t *filter_data,
+                                                    const uint16_t filter_wd,
+                                                    const uint16_t filter_ht,
+                                                    const int32_t *bias,
+                                                    int8_t *out_data,
+                                                    const uint16_t out_wd,
+                                                    const uint16_t out_ht,
+                                                    const int32_t out_offset,
+                                                    const int32_t *out_shift,
+                                                    const int32_t *out_mult,
+                                                    const int32_t activation_min,
+                                                    const int32_t activation_max);
+
+extern void esp_nn_s8_to_s16_esp32s3(const int8_t *src, int16_t *dst, const int size);
+
+extern void esp_nn_aligned_s8_to_s16_with_offset_esp32s3(const int8_t *src, int16_t *dst,
+                                                         const int size, const int32_t offset);
+
+static void esp_nn_depthwise_conv_s8_unrolled(const int8_t *input_data,
+                                              const uint16_t input_wd,
+                                              const uint16_t input_ht,
+                                              const uint16_t channels,
+                                              const int32_t input_offset,
+                                              const uint16_t pad_wd,
+                                              const uint16_t pad_ht,
+                                              const uint16_t stride_wd,
+                                              const uint16_t stride_ht,
+                                              const uint16_t ch_mult,
+                                              const int8_t *filter_data,
+                                              const uint16_t filter_wd,
+                                              const uint16_t filter_ht,
+                                              const int32_t *bias,
+                                              int8_t *out_data,
+                                              const uint16_t out_wd,
+                                              const uint16_t out_ht,
+                                              const int32_t out_offset,
+                                              const int32_t *out_shift,
+                                              const int32_t *out_mult,
+                                              const int32_t activation_min,
+                                              const int32_t activation_max)
+{
+    int out_idx = 0;
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+            for (int ch_idx = 0; ch_idx < channels; ch_idx++) {//channel_loop
+                int ch_mult_idx = 0;
+                for (; ch_mult_idx < ch_mult - 3; ch_mult_idx += 4) {
+                    int32_t result0 = 0, result1 = 0, result2 = 0, result3 = 0;
+                    const int out_ch_idx = ch_mult_idx + ch_idx * ch_mult;
+
+                    /* Select filter so as the point doesn't lie outside block */
+                    int filter_y_start = max(0, -base_y);
+                    int filter_x_start = max(0, -base_x);
+                    int filter_y_end = min(filter_ht, input_ht - base_y);
+                    int filter_x_end = min(filter_wd, input_wd - base_x);
+
+                    for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                        const int32_t idx_y = base_y + filter_y_idx;
+                        for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                            const int32_t idx_x = base_x + filter_x_idx;
+                            int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                            int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val0 = filter_data[filter_index + 0];
+                            int32_t filter_val1 = filter_data[filter_index + 1];
+                            int32_t filter_val2 = filter_data[filter_index + 2];
+                            int32_t filter_val3 = filter_data[filter_index + 3];
+                            result0 += input_val * filter_val0;
+                            result1 += input_val * filter_val1;
+                            result2 += input_val * filter_val2;
+                            result3 += input_val * filter_val3;
+                        }
+                    }
+                    if (bias) {
+                        result0 += bias[out_ch_idx + 0];
+                        result1 += bias[out_ch_idx + 1];
+                        result2 += bias[out_ch_idx + 2];
+                        result3 += bias[out_ch_idx + 3];
+                    }
+                    result0 = esp_nn_multiply_by_quantized_mult(result0,
+                                out_mult[out_ch_idx + 0], out_shift[out_ch_idx + 0]);
+                    result1 = esp_nn_multiply_by_quantized_mult(result1,
+                                out_mult[out_ch_idx + 1], out_shift[out_ch_idx + 1]);
+                    result2 = esp_nn_multiply_by_quantized_mult(result2,
+                                out_mult[out_ch_idx + 2], out_shift[out_ch_idx + 2]);
+                    result3 = esp_nn_multiply_by_quantized_mult(result3,
+                                out_mult[out_ch_idx + 3], out_shift[out_ch_idx + 3]);
+
+                    result0 += out_offset;
+                    result1 += out_offset;
+                    result2 += out_offset;
+                    result3 += out_offset;
+
+                    result0 = max(result0, activation_min);
+                    result1 = max(result1, activation_min);
+                    result2 = max(result2, activation_min);
+                    result3 = max(result3, activation_min);
+
+                    result0 = min(result0, activation_max);
+                    result1 = min(result1, activation_max);
+                    result2 = min(result2, activation_max);
+                    result3 = min(result3, activation_max);
+
+                    out_data[out_idx++] = result0;
+                    out_data[out_idx++] = result1;
+                    out_data[out_idx++] = result2;
+                    out_data[out_idx++] = result3;
+                }
+
+                /* left-over */
+                for (; ch_mult_idx < ch_mult; ch_mult_idx++) {
+                    int32_t result = 0;
+                    const int out_ch_idx = ch_mult_idx + ch_idx * ch_mult;
+
+                    /* Select filter so as the point doesn't lie outside block */
+                    int filter_y_start = max(0, -base_y);
+                    int filter_x_start = max(0, -base_x);
+                    int filter_y_end = min(filter_ht, input_ht - base_y);
+                    int filter_x_end = min(filter_wd, input_wd - base_x);
+
+                    for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                        const int32_t idx_y = base_y + filter_y_idx;
+                        for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                            const int32_t idx_x = base_x + filter_x_idx;
+                            int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                            int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * (channels * ch_mult) + out_ch_idx;
+                            int32_t input_val = input_data[input_index] + input_offset;
+                            int32_t filter_val = filter_data[filter_index];
+                            result += input_val * filter_val;
+                        }
+                    }
+                    if (bias) {
+                        result += bias[out_ch_idx];
+                    }
+                    result = esp_nn_multiply_by_quantized_mult(result, out_mult[out_ch_idx], out_shift[out_ch_idx]);
+                    result += out_offset;
+                    result = max(result, activation_min);
+                    result = min(result, activation_max);
+
+                    out_data[out_idx++] = result;
+                }
+            }
+        }
+    }
+}
+
+void esp_nn_depthwise_conv_s8_ch_mult1(const int8_t *input_data,
+                                       const uint16_t input_wd,
+                                       const uint16_t input_ht,
+                                       const uint16_t channels,
+                                       const int32_t input_offset,
+                                       const uint16_t pad_wd,
+                                       const uint16_t pad_ht,
+                                       const uint16_t stride_wd,
+                                       const uint16_t stride_ht,
+                                       const int8_t *filter_data,
+                                       const uint16_t filter_wd,
+                                       const uint16_t filter_ht,
+                                       const int32_t *bias,
+                                       int8_t *out_data,
+                                       const uint16_t out_wd,
+                                       const uint16_t out_ht,
+                                       const int32_t out_offset,
+                                       const int32_t *out_shift,
+                                       const int32_t *out_mult,
+                                       const int32_t activation_min,
+                                       const int32_t activation_max)
+{
+    int out_idx = 0;
+    for (int out_y = 0; out_y < out_ht; out_y++) { //height loop
+        const int16_t base_y = (out_y * stride_ht) - pad_ht;
+        for (int out_x = 0; out_x < out_wd; out_x++) { //width_loop
+            const int16_t base_x = (out_x * stride_wd) - pad_wd;
+            for (int ch_idx = 0; ch_idx < channels; ch_idx++) {//channel_loop
+                int32_t result = 0;
+                /* Select filter so as the point doesn't lie outside block */
+                int filter_y_start = max(0, -base_y);
+                int filter_x_start = max(0, -base_x);
+                int filter_y_end = min(filter_ht, input_ht - base_y);
+                int filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (int filter_y_idx = filter_y_start; filter_y_idx < filter_y_end; filter_y_idx++) {
+                    const int32_t idx_y = base_y + filter_y_idx;
+                    for (int filter_x_idx = filter_x_start; filter_x_idx < filter_x_end; filter_x_idx++) {
+                        const int32_t idx_x = base_x + filter_x_idx;
+                        int32_t input_index = (idx_y * input_wd + idx_x) * channels + ch_idx;
+                        int32_t filter_index = (filter_y_idx * filter_wd + filter_x_idx) * channels + ch_idx;
+                        int32_t input_val = input_data[input_index] + input_offset;
+                        int32_t filter_val = filter_data[filter_index];
+                        result += input_val * filter_val;
+                    }
+                }
+                if (bias) {
+                    result += bias[ch_idx];
+                }
+                result = esp_nn_multiply_by_quantized_mult(result, out_mult[ch_idx], out_shift[ch_idx]);
+                result += out_offset;
+                result = max(result, activation_min);
+                result = min(result, activation_max);
+
+                out_data[out_idx++] = result;
+            }
+        }
+    }
+}
+
+int esp_nn_get_depthwise_conv_scratch_size_esp32s3(const data_dims_t *input_dims,
+                                                   const data_dims_t *filter_dims,
+                                                   const data_dims_t *output_dims,
+                                                   const dw_conv_params_t *conv_params)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t ch_mult = conv_params->ch_mult;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+
+    int filter_size = filter_wd * filter_ht * channels * ch_mult;
+    int pad_width = 0, pad_height = 0;
+
+    if ((ch_mult == 1) && (channels % 8 == 0) && (filter_wd == 3) && (filter_ht == 3)) {
+        if (channels % 16 == 0) {
+            if (pad_wd || pad_ht) {
+                pad_width = pad_wd * 2;
+                pad_height = pad_ht * 2;
+            } else {
+                // check if we need to pad additionally
+                pad_width = (out_wd * stride_wd + filter_wd - 1) - input_wd;
+                pad_height = (out_ht * stride_ht + filter_ht - 1) - input_ht;
+                // printf("in(%d %d %d), out(%d %d), filter (%d %d) stride (%d %d), pad (%d %d)",
+                //         input_wd, input_ht, channels, out_wd, out_ht, filter_wd, filter_ht,
+                //         stride_wd, stride_ht, pad_wd, pad_ht);
+            }
+            if (pad_width || pad_height) {
+                int input_size = (input_wd + pad_width) * (input_ht + pad_height) * channels;
+                // printf("ask1 %d\n", filter_size + input_size + 16);
+                return filter_size + input_size + 16;  // 16 for alignment
+            } else {
+                // printf("ask2 %d\n", filter_size + 16);
+                return filter_size + 16;  // 16 for alignment
+            }
+        } else {
+            int input_size = input_wd * input_ht * channels;
+            // printf("ask3 %d\n", 2 * (filter_size + input_size) + 16);
+            return  2 * (filter_size + input_size) + 16; // 16 for alignment
+        }
+    } else if (ch_mult % 4 == 0) {
+        int input_size = input_wd * input_ht * channels;
+        // printf("ask4 %d\n", 2 * (filter_size + input_size) + 16);
+        return  2 * (filter_size + input_size) + 16; // 16 for alignment
+    }
+    return 32; // just few bytes
+}
+
+void esp_nn_set_depthwise_conv_scratch_buf_esp32s3(void *buf)
+{
+    scratch_buffer = (int16_t *) buf;
+}
+
+/**
+ * Assumption 1: i/p channels == o/p channels
+ * Assumption 2: Pointers are valid
+ * Assumption 3: dialation width = 1
+ */
+
+
+
+void esp_nn_depthwise_conv_s8_esp32s3(const data_dims_t *input_dims,
+                                      const int8_t *input_data,
+                                      const data_dims_t *filter_dims,
+                                      const int8_t *filter_data,
+                                      const int32_t *bias,
+                                      const data_dims_t *output_dims,
+                                      int8_t *out_data,
+                                      const dw_conv_params_t *conv_params,
+                                      const quant_data_t *quant_data)
+{
+    const uint16_t input_wd = input_dims->width;
+    const uint16_t input_ht = input_dims->height;
+    const uint16_t channels = input_dims->channels;
+    const int32_t input_offset = conv_params->in_offset;
+    const int32_t out_offset = conv_params->out_offset;
+    const uint16_t pad_wd = conv_params->padding.width;
+    const uint16_t pad_ht = conv_params->padding.height;
+    const uint16_t stride_wd = conv_params->stride.width;
+    const uint16_t stride_ht = conv_params->stride.height;
+    const uint16_t filter_wd = filter_dims->width;
+    const uint16_t filter_ht = filter_dims->height;
+    const uint16_t out_wd = output_dims->width;
+    const uint16_t out_ht = output_dims->height;
+    const int32_t *out_shift = quant_data->shift;
+    const int32_t *out_mult = quant_data->mult;
+    const int32_t activation_min = conv_params->activation.min;
+    const int32_t activation_max = conv_params->activation.max;
+    const uint16_t ch_mult = conv_params->ch_mult;
+
+    int filter_size = filter_wd * filter_ht * channels * ch_mult;
+    int align_len = 16 - (filter_size & 15);
+    int input_size = input_wd * input_ht * channels;
+    int16_t *filter_data16 = scratch_buffer;
+    int16_t *input_data16 = scratch_buffer + filter_size + align_len;
+    if (scratch_buffer == NULL) {
+        printf("esp_nn_depthwise_conv error! scratch_buffer not set!\n");
+        return;
+    }
+
+    if ((ch_mult == 1) && (channels % 8 == 0)) {
+        if ((filter_wd == 3) && (filter_ht == 3)) {
+            if ((channels % 16 == 0) && (pad_wd == 1) && (pad_ht == 1)) {
+                /* process in 8 bits */
+                int8_t *filter_aligned = (int8_t *) scratch_buffer;
+                int8_t *input_padded = (int8_t *) scratch_buffer + filter_size + align_len;
+                memcpy(filter_aligned, filter_data, filter_size);
+                esp_nn_aligned_s8_pad_with_value(input_data, input_padded, input_wd, input_ht, channels,
+                                                 -input_offset, pad_wd, pad_ht);
+                esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3(input_padded, input_wd + 2 * pad_wd,
+                                                                  input_ht + 2 * pad_ht, channels, input_offset,
+                                                                  stride_wd, stride_ht, filter_aligned, bias,
+                                                                  out_data, out_wd, out_ht, out_offset, out_shift,
+                                                                  out_mult, activation_min, activation_max);
+            } else if ((channels % 16 == 0) && (pad_wd == 0) && (pad_ht == 0)) {
+                /* process in 8 bits */
+                int8_t *filter_aligned = (int8_t *) scratch_buffer;
+                int8_t *input_padded = (int8_t *) scratch_buffer + filter_size + align_len;
+
+                // check if we need to pad additionally
+                int pad_right = (out_wd * stride_wd + filter_wd - 1) - input_wd;
+                int pad_bottom = (out_ht * stride_ht + filter_ht - 1) - input_ht;
+                if (pad_right || pad_bottom) { // pad right and bottom
+                    esp_nn_aligned_s8_pad_end_with_value(input_data, input_padded, input_wd, input_ht,
+                                                         channels, -input_offset, pad_right, pad_bottom);
+                } else {
+                    input_padded = (int8_t *) input_data;
+                }
+                memcpy(filter_aligned, filter_data, filter_size);
+                esp_nn_depthwise_conv_s8_mult1_3x3_padded_esp32s3(input_padded, input_wd + pad_right,
+                                                                  input_ht + pad_bottom, channels, input_offset,
+                                                                  stride_wd, stride_ht, filter_aligned, bias,
+                                                                  out_data, out_wd, out_ht, out_offset, out_shift,
+                                                                  out_mult, activation_min, activation_max);
+            } else { /* (channels % 8) == 0 */
+                esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
+                esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input_data, input_data16, input_size, input_offset);
+                esp_nn_depthwise_conv_s16_mult1_3x3_esp32s3(input_data16, input_wd, input_ht, channels,
+                                                            pad_wd, pad_ht, stride_wd, stride_ht, filter_data16,
+                                                            bias, out_data, out_wd, out_ht, out_offset, out_shift,
+                                                            out_mult, activation_min, activation_max);
+            }
+        } else { // all other ch_mult == 1, `channels % 8 == 0`
+            esp_nn_depthwise_conv_s8_ch_mult1(input_data, input_wd, input_ht, channels, input_offset,
+                                              pad_wd, pad_ht, stride_wd, stride_ht,
+                                              filter_data, filter_wd, filter_ht,
+                                              bias, out_data, out_wd, out_ht, out_offset, out_shift,
+                                              out_mult, activation_min, activation_max);
+        }
+    } else if (ch_mult % 8 == 0) {
+        esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
+        esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input_data, input_data16, input_size, input_offset);
+        if (filter_wd == 3 && filter_ht == 3) {
+            esp_nn_depthwise_conv_s16_mult8_3x3_esp32s3(input_data16, input_wd, input_ht, channels,
+                                                        pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
+                                                        filter_data16, bias,
+                                                        out_data, out_wd, out_ht, out_offset, out_shift,
+                                                        out_mult, activation_min, activation_max);
+        } else {
+            esp_nn_depthwise_conv_s16_mult8_esp32s3(input_data16, input_wd, input_ht, channels,
+                                                    pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
+                                                    filter_data16, filter_wd, filter_ht, bias,
+                                                    out_data, out_wd, out_ht, out_offset, out_shift,
+                                                    out_mult, activation_min, activation_max);
+        }
+    } else if (ch_mult % 4 == 0) {
+        esp_nn_s8_to_s16_esp32s3(filter_data, filter_data16, filter_size);
+        esp_nn_aligned_s8_to_s16_with_offset_esp32s3(input_data, input_data16, input_size, input_offset);
+        esp_nn_depthwise_conv_s16_mult4_esp32s3(input_data16, input_wd, input_ht, channels,
+                                                pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
+                                                filter_data16, filter_wd, filter_ht, bias,
+                                                out_data, out_wd, out_ht, out_offset, out_shift,
+                                                out_mult, activation_min, activation_max);
+    } else {
+        esp_nn_depthwise_conv_s8_unrolled(input_data, input_wd, input_ht, channels, input_offset,
+                                          pad_wd, pad_ht, stride_wd, stride_ht, ch_mult,
+                                          filter_data, filter_wd, filter_ht,
+                                          bias, out_data, out_wd, out_ht, out_offset, out_shift,
+                                          out_mult, activation_min, activation_max);
+    }
+}
--- a/code/components/esp-nn/src/fully_connected/esp_nn_fully_connected_ansi.c
+++ b/code/components/esp-nn/src/fully_connected/esp_nn_fully_connected_ansi.c
@@ -0,0 +1,50 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+
+#include <common_functions.h>
+
+void esp_nn_fully_connected_s8_ansi(const int8_t *input_data,
+                                    const int32_t input_offset,
+                                    const uint16_t row_len,
+                                    const int8_t *filter_data,
+                                    const int32_t filter_offset,
+                                    const int32_t *bias,
+                                    int8_t *out_data,
+                                    const uint16_t out_channels,
+                                    const int32_t out_offset,
+                                    const int32_t out_shift,
+                                    const int32_t out_mult,
+                                    const int32_t activation_min,
+                                    const int32_t activation_max)
+{
+    for (int32_t out_c = 0; out_c < out_channels; ++out_c) {
+        int32_t result = 0;
+        for (int32_t data_idx = 0; data_idx < row_len; data_idx++) {
+            int32_t filter_index = row_len * out_c + data_idx;
+            int32_t input_val = input_data[data_idx];
+            int32_t filter_val = filter_data[filter_index];
+            result += (filter_val + filter_offset) * (input_val + input_offset);
+        }
+        if (bias) {
+            result += bias[out_c];
+        }
+        result = esp_nn_multiply_by_quantized_mult(result, out_mult, out_shift);
+        result += out_offset;
+        result = max(result, activation_min);
+        result = min(result, activation_max);
+        out_data[out_c] = (int8_t) result;
+    }
+}
--- a/code/components/esp-nn/src/pooling/esp_nn_avg_pool_ansi.c
+++ b/code/components/esp-nn/src/pooling/esp_nn_avg_pool_ansi.c
@@ -0,0 +1,72 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+
+#include <common_functions.h>
+
+void esp_nn_avg_pool_s8_ansi(const int8_t *input,
+                             const uint16_t input_wd,
+                             const uint16_t input_ht,
+                             int8_t *output,
+                             const uint16_t output_wd,
+                             const uint16_t output_ht,
+                             const uint16_t stride_wd,
+                             const uint16_t stride_ht,
+                             const uint16_t filter_wd,
+                             const uint16_t filter_ht,
+                             const uint16_t pad_wd,
+                             const uint16_t pad_ht,
+                             const int32_t activation_min,
+                             const int32_t activation_max,
+                             const uint16_t channels)
+{
+    int32_t base_y = -pad_ht;
+    for (int32_t out_y = 0; out_y < output_ht; out_y++, base_y += stride_ht) {
+        int32_t base_x = -pad_wd;
+        for (int32_t out_x = 0; out_x < output_wd; out_x++, base_x += stride_wd) {
+            for (int32_t ch_idx = 0; ch_idx < channels; ch_idx++) {
+                int32_t result = 0;
+                int32_t filter_cnt = 0;
+                /* Make sure filter does not cross the input box */
+                int32_t filter_y_start = max(0, -base_y);
+                int32_t filter_x_start = max(0, -base_x);
+
+                int32_t filter_y_end = min(filter_ht, input_ht - base_y);
+                int32_t filter_x_end = min(filter_wd, input_wd - base_x);
+
+                for (int32_t filter_y = filter_y_start; filter_y < filter_y_end; filter_y++) {
+                    for (int32_t filter_x = filter_x_start; filter_x < filter_x_end; filter_x++) {
+                        int32_t in_x_idx = base_x + filter_x;
+                        int32_t in_y_idx = base_y + filter_y;
+                        int32_t input_index = (in_y_idx * input_wd + in_x_idx) * channels + ch_idx;
+                        result += input[input_index];
+                        filter_cnt++;
+                    }
+                }
+
+                /* Rounded average */
+                result = result > 0 ? (result + filter_cnt / 2) / filter_cnt
+                                    : (result - filter_cnt / 2) / filter_cnt;
+
+                /* Activation function */
+                result = max(result, activation_min);
+                result = min(result, activation_max);
+
+                int32_t output_index = (out_y * output_wd + out_x) * channels + ch_idx;
+                output[output_index] = (int8_t) result;
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/pooling/esp_nn_max_pool_ansi.c
+++ b/code/components/esp-nn/src/pooling/esp_nn_max_pool_ansi.c
@@ -0,0 +1,66 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+
+#include <common_functions.h>
+
+void esp_nn_max_pool_s8_ansi(const int8_t *input,
+                             const uint16_t input_wd,
+                             const uint16_t input_ht,
+                             int8_t *output,
+                             const uint16_t output_wd,
+                             const uint16_t output_ht,
+                             const uint16_t stride_wd,
+                             const uint16_t stride_ht,
+                             const uint16_t filter_wd,
+                             const uint16_t filter_ht,
+                             const uint16_t pad_wd,
+                             const uint16_t pad_ht,
+                             const int32_t activation_min,
+                             const int32_t activation_max,
+                             const uint16_t channels)
+{
+    int32_t base_y = -pad_ht;
+    for (int32_t out_y = 0; out_y < output_ht; out_y++, base_y += stride_ht) {
+        int32_t base_x = -pad_wd;
+        for (int32_t out_x = 0; out_x < output_wd; out_x++, base_x += stride_wd) {
+            /* Make sure filter does not cross the input box */
+            int32_t filter_y_start = max(0, -base_y);
+            int32_t filter_x_start = max(0, -base_x);
+            int32_t filter_y_end = min(filter_ht, input_ht - base_y);
+            int32_t filter_x_end = min(filter_wd, input_wd - base_x);
+
+            for (int32_t ch_idx = 0; ch_idx < channels; ch_idx++) {
+                int8_t result = INT8_MIN;
+
+                for (int32_t filter_y = filter_y_start; filter_y < filter_y_end; filter_y++) {
+                    for (int32_t filter_x = filter_x_start; filter_x < filter_x_end; filter_x++) {
+                        int32_t in_x_idx = base_x + filter_x;
+                        int32_t in_y_idx = base_y + filter_y;
+                        int32_t input_index = (in_y_idx * input_wd + in_x_idx) * channels + ch_idx;
+                        result = max(input[input_index], result);
+                    }
+                }
+
+                /* Activation function */
+                result = max(result, activation_min);
+                result = min(result, activation_max);
+
+                int32_t output_index = (out_y * output_wd + out_x) * channels + ch_idx;
+                output[output_index] = result;
+            }
+        }
+    }
+}
--- a/code/components/esp-nn/src/softmax/esp_nn_softmax_ansi.c
+++ b/code/components/esp-nn/src/softmax/esp_nn_softmax_ansi.c
@@ -0,0 +1,88 @@
+// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "softmax_common.h"
+
+int32_t esp_nn_get_softmax_scratch_size_ansi(const int32_t width, const int32_t height)
+{
+    (void) width;
+    (void) height;
+    return 0;
+}
+
+void esp_nn_set_softmax_scratch_buf_ansi(void *buffer)
+{
+    (void) buffer;
+    return;
+}
+
+void esp_nn_softmax_s8_ansi(const int8_t *input_data,
+                            const int32_t height,
+                            const int32_t width,
+                            const int32_t mult,
+                            const int32_t shift,
+                            const int32_t diff_min,
+                            int8_t *output_data)
+{
+    // The representation chosen for the input to the exp() function is Q5.26.
+    // We need to leave extra space since values that we skip might be as large as
+    // -32 before multiplying by input mult, and therefore as large as
+    // -16 afterwards.  Note that exp(-8) is definitely not insignificant to
+    // accumulation, but exp(-16) definitely is.
+#define ACCUM_BITS  12
+#define DIFF_BITS   5
+
+    const int32_t mask = (1 << shift);
+    int32_t col = 0;
+    const int8_t *in_ptr = input_data;
+    int8_t *out_ptr = output_data;
+
+    for (int row_idx = 0; row_idx < height; row_idx++) {
+        int8_t max_in_row = in_ptr[0];
+        for (col = 1; col < width; col++) {
+            max_in_row = max(max_in_row, in_ptr[col]);
+        }
+
+        int32_t input_diff = 0;
+        int32_t sum_of_exps = 0;
+
+        for (col = 0; col < width; col++) {
+            input_diff = in_ptr[col] - max_in_row;
+            if (input_diff >= diff_min) {
+                const int32_t input_diff_rescaled = SAT_HIGH_MUL(input_diff * mask, mult);
+                const int32_t exp_raw = esp_nn_exp_on_negative_values(input_diff_rescaled);
+                sum_of_exps += DIV_POW2(exp_raw, ACCUM_BITS);
+            }
+        }
+
+        const int32_t headroom_plus1 = esp_nn_clz32((uint32_t) sum_of_exps);
+        const int32_t shifted_scale = ONE_OVER_ONE_X((sum_of_exps << headroom_plus1) - (1 << 31));
+        const int32_t bits_over_unit = ACCUM_BITS - headroom_plus1 + 31 - sizeof(int8_t) * 8;
+
+        for (col = 0; col < width; col++) {
+            input_diff = in_ptr[col] - max_in_row;
+            if (input_diff >= diff_min) {
+                const int32_t input_diff_rescaled = SAT_HIGH_MUL(input_diff * mask, mult);
+                const int32_t exp_raw = esp_nn_exp_on_negative_values(input_diff_rescaled);
+                const int32_t shifted_output = SAT_HIGH_MUL(shifted_scale, exp_raw);
+                const int32_t result = DIV_POW2(shifted_output, bits_over_unit) - 128;
+                out_ptr[col] = (int8_t) esp_nn_saturate8(result);
+            } else {
+                out_ptr[col] = -128;
+            }
+        }
+        in_ptr  += width;
+        out_ptr += width;
+    }
+}
--- a/code/components/esp-nn/src/softmax/esp_nn_softmax_opt.c
+++ b/code/components/esp-nn/src/softmax/esp_nn_softmax_opt.c
@@ -0,0 +1,108 @@
+// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "softmax_common.h"
+#include <stdio.h>
+
+static int32_t *scratch_buf = NULL;
+
+/**
+ * @brief   Get scratch buffer size needed by softmax function
+ *
+ * @param   width
+ * @param   height
+ * @return  size in bytes
+ *
+ * @note    buffer must be 4 byte aligned
+ */
+int32_t esp_nn_get_softmax_scratch_size_opt(const int32_t width, const int32_t height)
+{
+    (void) height;
+    return width * 4;
+}
+
+/**
+ * @brief   Set scratch buffer to be used by softmax function
+ *
+ * @param   buffer  this can be NULL if one needs to unset it
+ *                  must be aligned to 4 bytes
+ */
+void esp_nn_set_softmax_scratch_buf_opt(void *buffer)
+{
+    scratch_buf = (int32_t *) buffer;
+}
+
+void esp_nn_softmax_s8_opt(const int8_t *input_data,
+                           const int32_t height,
+                           const int32_t width,
+                           const int32_t mult,
+                           const int32_t shift,
+                           const int32_t diff_min,
+                           int8_t *output_data)
+{
+    if (scratch_buf == NULL) {
+        printf("%s error! scratch buffer not set\n", __FUNCTION__);
+        return;
+    }
+    // The representation chosen for the input to the exp() function is Q5.26.
+    // We need to leave extra space since values that we skip might be as large as
+    // -32 before multiplying by input mult, and therefore as large as
+    // -16 afterwards.  Note that exp(-8) is definitely not insignificant to
+    // accumulation, but exp(-16) definitely is.
+#define ACCUM_BITS  12
+#define DIFF_BITS   5
+
+    const int32_t mask = (1 << shift);
+    int32_t col = 0;
+    const int8_t *in_ptr = input_data;
+    int8_t *out_ptr = output_data;
+
+    for (int row_idx = 0; row_idx < height; row_idx++) {
+        int8_t max_in_row = in_ptr[0];
+        for (col = 1; col < width; col++) {
+            max_in_row = max(max_in_row, in_ptr[col]);
+        }
+
+        int32_t input_diff = 0;
+        int32_t sum_of_exps = 0;
+
+        for (col = 0; col < width; col++) {
+            input_diff = in_ptr[col] - max_in_row;
+            if (input_diff >= diff_min) {
+                const int32_t input_diff_rescaled = SAT_HIGH_MUL(input_diff * mask, mult);
+                const int32_t exp_raw = esp_nn_exp_on_negative_values(input_diff_rescaled);
+                scratch_buf[col] = exp_raw; // store to avoid duplicate calculation later
+                sum_of_exps += DIV_POW2(exp_raw, ACCUM_BITS);
+            }
+        }
+
+        const int32_t headroom_plus1 = esp_nn_clz32((uint32_t) sum_of_exps);
+        const int32_t shifted_scale = ONE_OVER_ONE_X((sum_of_exps << headroom_plus1) - (1 << 31));
+        const int32_t bits_over_unit = ACCUM_BITS - headroom_plus1 + 31 - sizeof(int8_t) * 8;
+
+        for (col = 0; col < width; col++) {
+            input_diff = in_ptr[col] - max_in_row;
+            if (input_diff >= diff_min) {
+                int32_t exp_raw = scratch_buf[col];
+                const int32_t shifted_output = SAT_HIGH_MUL(shifted_scale, exp_raw);
+                const int32_t result = DIV_POW2(shifted_output, bits_over_unit) - 128;
+                out_ptr[col] = (int8_t) esp_nn_saturate8(result);
+            } else {
+                out_ptr[col] = -128;
+            }
+        }
+        in_ptr  += width;
+        out_ptr += width;
+    }
+}
--- a/code/components/esp-nn/src/softmax/softmax_common.h
+++ b/code/components/esp-nn/src/softmax/softmax_common.h
@@ -0,0 +1,104 @@
+// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <common_functions.h>
+
+#define MASK_IF_ZERO(x)                 (x) == 0 ? ~0 : 0
+#define MASK_IF_NON_ZERO(x)             (x) != 0 ? ~0 : 0
+#define SELECT_USING_MASK(mask, a, b)   ((mask) & (a)) ^ (~(mask) & (b))
+#define SAT_HIGH_MUL(x, y)              esp_nn_sat_round_doubling_high_mul((x), (y))
+#define DIV_POW2(x,y)                   esp_nn_div_by_power_of_two((x), (y))
+
+__NN_FORCE_INLINE__ int32_t mul_power_of_2(int val, int exp)
+{
+    const int32_t thresh = ((1 << (31 - exp)) - 1);
+    int32_t result = val << exp;
+    result = SELECT_USING_MASK(MASK_IF_NON_ZERO(val > thresh), INT32_MAX, result);
+    result = SELECT_USING_MASK(MASK_IF_NON_ZERO(val < -thresh), INT32_MIN, result);
+    return result;
+}
+
+/**
+ * @brief   Calculate `1 / (1 + x)` for x in [0, 1]
+ *
+ * @param   val     input value to calculate `1/(1+x)` for
+ * @return  `int32_t` result
+ * @note    Newton-Raphson division
+ *
+ *          https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division
+ *          Refer to that page for the logic behind the 48/17 and 32/17 constants.
+ *          Pseudocode: https://en.wikipedia.org/wiki/Division_algorithm#Pseudocode
+ */
+__NN_FORCE_INLINE__ int32_t esp_nn_one_over_one_plus_x_for_x_in_0_1(int32_t val)
+{
+    const int64_t sum = (int64_t) val + INT32_MAX;
+    const int32_t half_denominator = (int32_t) ((sum + (sum >= 0 ? 1 : -1)) / 2L);
+    int32_t constant_48_over_17 = 1515870810;
+    int32_t constant_neg_32_over_17 = -1010580540;
+    int32_t x = constant_48_over_17 + SAT_HIGH_MUL(half_denominator, constant_neg_32_over_17);
+    const int32_t fixed_2_one = (1 << 29);
+
+    x += mul_power_of_2(SAT_HIGH_MUL(x, fixed_2_one - SAT_HIGH_MUL(half_denominator, x)), 2);
+    x += mul_power_of_2(SAT_HIGH_MUL(x, fixed_2_one - SAT_HIGH_MUL(half_denominator, x)), 2);
+    x += mul_power_of_2(SAT_HIGH_MUL(x, fixed_2_one - SAT_HIGH_MUL(half_denominator, x)), 2);
+
+    return mul_power_of_2(x, 1);
+}
+
+#define ONE_OVER_ONE_X(x)   esp_nn_one_over_one_plus_x_for_x_in_0_1((x))
+
+/**
+ * @brief   Return exp(x) for x < 0.
+ *
+ */
+__NN_FORCE_INLINE__ int32_t esp_nn_exp_on_negative_values(int32_t val)
+{
+    int32_t shift = 24;
+
+    const int32_t one_quarter = (1 << shift);
+    int32_t mask = one_quarter - 1;
+    const int32_t val_mod_minus_quarter = (val & mask) - one_quarter;
+    const int32_t remainder             = val_mod_minus_quarter - val;
+
+    // calculate exponent for x in [-1/4, 0) in `result`
+    const int32_t x                     = (val_mod_minus_quarter << 5) + (1 << 28);
+    const int32_t x2                    = SAT_HIGH_MUL(x, x);
+    const int32_t x3                    = SAT_HIGH_MUL(x2, x);
+    const int32_t x4                    = SAT_HIGH_MUL(x2, x2);
+    const int32_t one_over_3            = 715827883;
+    const int32_t one_over_8            = 1895147668;
+
+    const int32_t x4_over_4 = DIV_POW2(x4, 2);
+    const int32_t x4_over_4_plus_x3_over_6_plus_x2_over_2 = DIV_POW2(SAT_HIGH_MUL(x4_over_4 + x3, one_over_3) + x2, 1);
+    int32_t result = one_over_8 + SAT_HIGH_MUL(one_over_8, x + x4_over_4_plus_x3_over_6_plus_x2_over_2);
+
+#define SELECT_IF_NON_ZERO(x) {                                   \
+    mask   = MASK_IF_NON_ZERO(remainder & (1 << shift++));        \
+    result = SELECT_USING_MASK(mask, SAT_HIGH_MUL(result, x), result); \
+}
+
+    SELECT_IF_NON_ZERO(1672461947)
+    SELECT_IF_NON_ZERO(1302514674)
+    SELECT_IF_NON_ZERO(790015084)
+    SELECT_IF_NON_ZERO(290630308)
+    SELECT_IF_NON_ZERO(39332535)
+    SELECT_IF_NON_ZERO(720401)
+    SELECT_IF_NON_ZERO(242)
+
+#undef SELECT_IF_NON_ZERO
+
+    mask = MASK_IF_ZERO(val);
+    return SELECT_USING_MASK(mask, INT32_MAX, result);
+}
--- a/code/components/esp-nn/test_app/CMakeLists.txt
+++ b/code/components/esp-nn/test_app/CMakeLists.txt
@@ -0,0 +1,9 @@
+# The following lines of boilerplate have to be in your project's
+# CMakeLists in this exact order for cmake to work correctly
+cmake_minimum_required(VERSION 3.5)
+
+set(EXTRA_COMPONENT_DIRS "../" "../tests/")
+set(IDF_EXCLUDE_COMPONENTS test test_app)
+
+include($ENV{IDF_PATH}/tools/cmake/project.cmake)
+project(test_app)
--- a/code/components/esp-nn/test_app/main/CMakeLists.txt
+++ b/code/components/esp-nn/test_app/main/CMakeLists.txt
@@ -0,0 +1,7 @@
+
+set(COMPONENT_SRCS "main.c")
+set(COMPONENT_ADD_INCLUDEDIRS "")
+
+set(COMPONENT_PRIV_REQUIRES tests)
+
+register_component()
--- a/code/components/esp-nn/test_app/main/component.mk
+++ b/code/components/esp-nn/test_app/main/component.mk
@@ -0,0 +1,8 @@
+#
+# Main component makefile.
+#
+# This Makefile can be left empty. By default, it will take the sources in the 
+# src/ directory, compile them and link them into lib(subdirectory_name).a
+# in the build directory. This behaviour is entirely configurable,
+# please read the ESP-IDF documents if you need to do this.
+# 
--- a/code/components/esp-nn/test_app/main/main.c
+++ b/code/components/esp-nn/test_app/main/main.c
@@ -0,0 +1,87 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <freertos/FreeRTOS.h>
+#include <freertos/task.h>
+#include <esp_log.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <test_functions.h>
+#include <esp_timer.h>
+
+static const char *TAG = "test_app";
+static uint32_t start_c, start_opt, total_c, total_opt;
+
+void profile_c_start()
+{
+    /* initiate profiling */
+    start_c = esp_cpu_get_ccount();
+}
+
+void profile_c_end()
+{
+    /* record profile number */
+    total_c = esp_cpu_get_ccount() - start_c;
+}
+
+void profile_opt_start()
+{
+    /* initiate profiling */
+    start_opt = esp_cpu_get_ccount();
+}
+
+void profile_opt_end()
+{
+    /* record profile number */
+    total_opt = esp_cpu_get_ccount() - start_opt;
+}
+
+void app_main()
+{
+    /* s8 tests */
+    ESP_LOGI(TAG, "Running s8 tests...");
+    esp_nn_add_elementwise_s8_test();
+    printf("add, c %u opt %u\n", total_c, total_opt);
+    esp_nn_mul_elementwise_s8_test();
+    printf("mul, c %u opt %u\n", total_c, total_opt);
+    esp_nn_depthwise_conv_s8_test();
+    printf("depthwise, c %u opt %u\n", total_c, total_opt);
+    esp_nn_conv_s8_test();
+    printf("conv2d, c %u opt %u\n", total_c, total_opt);
+
+    esp_nn_relu6_s8_test();
+    printf("relu, c %u opt %u\n", total_c, total_opt);
+    esp_nn_avg_pool_s8_test();
+    printf("avg_pool, c %u opt %u\n", total_c, total_opt);
+    esp_nn_max_pool_s8_test();
+    printf("max_pool, c %u opt %u\n", total_c, total_opt);
+    esp_nn_fully_connected_s8_test();
+    printf("fully_connected, c %u opt %u\n", total_c, total_opt);
+    esp_nn_softmax_s8_test();
+    printf("softmax, c %u opt %u\n", total_c, total_opt);
+    ESP_LOGI(TAG, "s8 tests done!\n");
+
+    /* u8 tests */
+    //ESP_LOGI(TAG, "Running u8 tests...");
+    //esp_nn_add_elementwise_u8_test();
+    //esp_nn_depthwise_conv_u8_test();
+    //esp_nn_conv_u8_test();
+    //esp_nn_avg_pool_u8_test();
+    //esp_nn_max_pool_u8_test();
+    //esp_nn_fully_connected_u8_test();
+    //ESP_LOGI(TAG, "u8 tests done!\n");
+}
--- a/code/components/esp-nn/test_app/sdkconfig.defaults
+++ b/code/components/esp-nn/test_app/sdkconfig.defaults
@@ -0,0 +1,5 @@
+
+#
+# esp-nn
+#
+CONFIG_NN_ESP32=y
--- a/code/components/esp-nn/test_app/sdkconfig.defaults.esp32s3
+++ b/code/components/esp-nn/test_app/sdkconfig.defaults.esp32s3
@@ -0,0 +1,8 @@
+# Default configurations for ESP32-S3
+
+CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240=y
+CONFIG_ESP32S3_SPIRAM_SUPPORT=y
+
+CONFIG_ESP32S3_DATA_CACHE_64KB=y
+CONFIG_ESP32S3_DATA_CACHE_8WAYS=y
+CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y
--- a/code/components/esp-nn/tests/CMakeLists.txt
+++ b/code/components/esp-nn/tests/CMakeLists.txt
@@ -0,0 +1,15 @@
+
+set(COMPONENT_ADD_INCLUDEDIRS ./include/)
+set(COMPONENT_SRCS "src/basic_math_test.c"
+                   "src/convolution_test.c"
+                   "src/fully_connected_test.c"
+                   "src/pooling_test.c"
+                   "src/relu_test.c"
+                   "src/softmax_test.c")
+
+set(COMPONENT_REQUIRES )
+set(COMPONENT_PRIV_REQUIRES esp-nn)
+
+register_component()
+
+target_compile_options(${COMPONENT_LIB} PRIVATE -Wno-unused-function)
--- a/code/components/esp-nn/tests/README.md
+++ b/code/components/esp-nn/tests/README.md
@@ -0,0 +1,4 @@
+# Tests for esp_nn library
+
+- Include these in your test framework and run the framework.
+- For IDF test please refer `test_app`
--- a/code/components/esp-nn/tests/component.mk
+++ b/code/components/esp-nn/tests/component.mk
@@ -0,0 +1,5 @@
+#FIXME
+
+COMPONENT_ADD_INCLUDEDIRS := include/
+
+COMPONENT_SRCDIRS :=  src/
--- a/code/components/esp-nn/tests/include/test_functions.h
+++ b/code/components/esp-nn/tests/include/test_functions.h
@@ -0,0 +1,48 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+
+/* int8_t ops tests */
+void esp_nn_add_elementwise_s8_test();
+void esp_nn_mul_elementwise_s8_test();
+
+void esp_nn_depthwise_conv_s8_test();
+void esp_nn_conv_s8_test();
+
+void esp_nn_avg_pool_s8_test();
+void esp_nn_max_pool_s8_test();
+
+void esp_nn_fully_connected_s8_test();
+
+void esp_nn_relu6_s8_test();
+
+void esp_nn_softmax_s8_test();
+
+/* uint8_t ops tests */
+void esp_nn_add_elementwise_u8_test();
+
+void esp_nn_depthwise_conv_u8_test();
+void esp_nn_conv_u8_test();
+
+void esp_nn_avg_pool_u8_test();
+void esp_nn_max_pool_u8_test();
+
+void esp_nn_fully_connected_u8_test();
+
+/* instructions test functions */
+void compare_instructions_test();
+void arith_instructions_test();
+void min_max_instructions_test();
+void bitwise_instructions_test();
+void load_store_instructions_test();
--- a/code/components/esp-nn/tests/include/test_utils.h
+++ b/code/components/esp-nn/tests/include/test_utils.h
@@ -0,0 +1,87 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <common_functions.h>
+#include <stdio.h>
+
+/* mult value range */
+#define MULT_MAX    INT32_MAX
+#define MULT_MIN    0
+
+/* shift value range */
+#define SHIFT_MIN   -31
+#define SHIFT_MAX   30
+
+/**
+ * @brief callback function to run before C function
+ */
+void profile_c_start();
+
+/**
+ * @brief callback function to run after C function
+ */
+void profile_c_end();
+
+/**
+ * @brief callback function to run before optimized function
+ */
+void profile_opt_start();
+
+/**
+ * @brief callback function to run after optimized function
+ */
+void profile_opt_end();
+
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_GREEN   "\x1b[32m"
+#define ANSI_COLOR_YELLOW  "\x1b[33m"
+#define ANSI_COLOR_BLUE    "\x1b[34m"
+#define ANSI_COLOR_MAGENTA "\x1b[35m"
+#define ANSI_COLOR_CYAN    "\x1b[36m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
+#define CHECK_EQUAL(ARRAY1, ARRAY2, size) ({    \
+    bool res = true;                            \
+    for (int _i = 0; _i < size; _i++) {         \
+        if (ARRAY1[_i] != ARRAY2[_i]) {         \
+            res = false;                        \
+            break;                              \
+        }                                       \
+    }                                           \
+    res;                                        \
+})
+
+#define PRINT_ARRAY_INT(ARRAY, width, height) ({        \
+    int *_array = (int *) ARRAY;                        \
+    for (int _j = 0; _j < height; _j++) {               \
+        for (int _i = 0; _i < width; _i++) {            \
+            printf("%d\t", _array[width * _j + _i]);    \
+        }                                               \
+        printf("\n");                                   \
+    }                                                   \
+    printf("\n");                                       \
+})
+
+#define PRINT_ARRAY_HEX(ARRAY, width, height) ({        \
+    uint8_t *_array = (uint8_t *) ARRAY;                \
+    for (int _j = 0; _j < height; _j++) {               \
+        for (int _i = 0; _i < width; _i++) {            \
+            printf("%02x\t", _array[width * _j + _i]);  \
+        }                                               \
+        printf("\n");                                   \
+    }                                                   \
+    printf("\n");                                       \
+})
--- a/code/components/esp-nn/tests/src/basic_math_test.c
+++ b/code/components/esp-nn/tests/src/basic_math_test.c
@@ -0,0 +1,355 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+
+#include <common_functions.h>
+#include <esp_nn.h>
+#include "test_utils.h"
+
+#if CONFIG_IDF_CMAKE
+#if (CONFIG_SPIRAM_SUPPORT && (CONFIG_SPIRAM_USE_CAPS_ALLOC || CONFIG_SPIRAM_USE_MALLOC))
+#define IDF_HEAP_CAPS 1
+#endif
+
+#if IDF_HEAP_CAPS
+#include "esp_heap_caps.h"
+#endif
+#endif
+
+void esp_nn_add_elementwise_s8_test()
+{
+    /* prepare data */
+    const int size = 1600 + 8 + 7; /* odd len to test leftover */
+    int8_t *input1;
+    int8_t *input2;
+    int8_t *out_data_c;
+    int8_t *out_data_opt;
+    int8_t *input1_orig = NULL;
+    int8_t *input2_orig = NULL;
+    int8_t *out_c_orig = NULL;
+    int8_t *out_opt_orig = NULL;
+    int32_t input1_offset = 34;
+    int32_t input2_offset = 35;
+    int32_t output_offset = 36;
+    int32_t input1_shift = -8; // right_shift amt always <= 0
+    int32_t input2_shift = -8; // right_shift amt always <= 0
+    int32_t output_shift = -9; // right_shift amt always <= 0
+    int32_t left_shift = 15; // always +ve
+    int32_t input1_mult = INT32_MAX;
+    int32_t input2_mult = INT32_MAX;
+    int32_t output_mult = INT32_MAX;
+    int32_t activation_min = -128;
+    int32_t activation_max = 127;
+
+    for (int itr = 0; itr < 10; itr++) {
+        switch (itr) {
+        case 0: // all zeros
+            input1_offset = 0;
+            input2_offset = 0;
+            output_offset = 0;
+            input1_mult = 0;
+            input2_mult = 0;
+            output_mult = 0;
+            input1_shift = 0;
+            input2_shift = 0;
+            output_shift = 0;
+            left_shift = 0;
+        break;
+        case 1: // hit min
+            input1_offset = -127;
+            input2_offset = -127;
+            output_offset = -128;
+            input1_mult = MULT_MIN;
+            input2_mult = MULT_MIN;
+            output_mult = MULT_MIN;
+            input1_shift = 0;
+            input2_shift = 0;
+            output_shift = 0;
+            left_shift = 0;
+        break;
+        case 2: // hit max
+            input1_offset = 128;
+            input2_offset = 128;
+            output_offset = -127;
+            input1_mult = MULT_MAX;
+            input2_mult = MULT_MAX;
+            output_mult = MULT_MAX;
+            input1_shift = SHIFT_MIN;
+            input2_shift = SHIFT_MIN;
+            output_shift = SHIFT_MIN;
+            left_shift = 30 - 8; // since input is 8 bits
+        break;
+        case 3: // hit extreme max
+            input1_offset = 128;
+            input2_offset = 128;
+            output_offset = -127;
+            input1_mult = MULT_MAX;
+            input2_mult = MULT_MAX;
+            output_mult = MULT_MAX;
+            input1_shift = 0;
+            input2_shift = 0;
+            output_shift = 0;
+            left_shift = 30 - 8; // -8 since input is 8 bit
+        break;
+        default:  // practical random input
+            input1_offset = rand() % 256 - 127; // range [-127, 128]
+            input2_offset = rand() % 256 - 127; // range [-127, 128]
+            output_offset = rand() % 256 - 128; // range [-128, 127]
+            input1_mult = MULT_MAX / 2 + rand() % INT16_MAX;
+            input2_mult = MULT_MAX / 2 + rand() % INT16_MAX;
+            output_mult = MULT_MAX / 2 + rand() % INT16_MAX;
+            input1_shift = -8 + rand() % 4;
+            input2_shift = -8 + rand() % 4;
+            output_shift = -8 + rand() % 4;
+            left_shift = rand() % 15;
+        }
+#if IDF_HEAP_CAPS
+        input1_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        input2_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_c_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_opt_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+
+        input1 = 16 + input1_orig - ((uint32_t) input1_orig & 0xf);
+        input2 = 16 + input2_orig - ((uint32_t) input2_orig & 0xf);
+        out_data_c = 16 + out_c_orig - ((uint32_t) out_c_orig & 0xf);
+        out_data_opt = 16 + out_opt_orig - ((uint32_t) out_opt_orig & 0xf);
+#else
+        input1 = memalign(16, size);
+        input2 = memalign(16, size);
+        out_data_c = memalign(16, size);
+        out_data_opt = memalign(16, size);
+
+        input1_orig = input1;
+        input2_orig = input2;
+        out_c_orig = out_data_c;
+        out_opt_orig = out_data_opt;
+#endif
+        if (input1_orig == NULL || input2_orig == NULL || out_c_orig == NULL ||
+                out_opt_orig == NULL) {
+            printf(ANSI_COLOR_RED"%s error allocating buffers\n"ANSI_COLOR_RESET, __FUNCTION__);
+            goto elementwise_add_test_cleanup;
+        }
+
+        for (int i = 0; i < size; ++i) {
+            input1[i] = rand() % 256 - 128;
+            input2[i] = rand() % 256 - 128;
+        }
+
+        if (itr == 0) {
+            /* enable profiler */
+            profile_c_start();
+        }
+        /* C function */
+        esp_nn_add_elementwise_s8_ansi(input1, input2, input1_offset, input2_offset,
+                                       input1_mult, input2_mult, input1_shift, input2_shift,
+                                       left_shift, out_data_c, output_offset, output_mult,
+                                       output_shift, activation_min, activation_max, size);
+
+        if (itr == 0) {
+            profile_c_end();
+            profile_opt_start();
+        }
+
+        /* Optimized function */
+        esp_nn_add_elementwise_s8(input1, input2, input1_offset, input2_offset,
+                                  input1_mult, input2_mult, input1_shift, input2_shift,
+                                  left_shift, out_data_opt, output_offset, output_mult,
+                                  output_shift, activation_min, activation_max, size);
+        if (itr == 0) {
+            /* disable profiler */
+            profile_opt_end();
+        }
+
+        bool ret = CHECK_EQUAL(out_data_c, out_data_opt, size);
+        if (ret == false) {
+            printf(ANSI_COLOR_RED"%s[%d] failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            printf("Output: \n");
+            PRINT_ARRAY_HEX(out_data_opt, size, 1);
+            printf("Expected: \n");
+            PRINT_ARRAY_HEX(out_data_c, size, 1);
+            printf("Input1:\n");
+            PRINT_ARRAY_HEX(input1, size, 1);
+            printf("Input2:\n");
+            PRINT_ARRAY_HEX(input2, size, 1);
+            printf("in1_shift %d, in2_shift %d, left_shift %d, out_shift %d\n",
+                   input1_shift, input2_shift, left_shift, output_shift);
+            printf("in1_mult %d, in2_mult %d, out_mult %d\n", input1_mult, input2_mult, output_mult);
+            goto elementwise_add_test_cleanup;
+        }
+        printf(ANSI_COLOR_GREEN"%s[%d] passed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+
+elementwise_add_test_cleanup:
+        if (input1_orig) {
+            free(input1_orig);
+        }
+        if (input2_orig) {
+            free(input2_orig);
+        }
+        if (out_c_orig) {
+            free(out_c_orig);
+        }
+        if (out_opt_orig) {
+            free(out_opt_orig);
+        }
+    }
+}
+
+void esp_nn_mul_elementwise_s8_test()
+{
+    /* prepare data */
+    const int size = 1600 + 8 + 7; /* odd len to test leftover */
+    int8_t *input1;
+    int8_t *input2;
+    int8_t *out_data_c;
+    int8_t *out_data_opt;
+    int32_t input1_offset = 34;
+    int32_t input2_offset = 35;
+    int32_t output_offset = 36;
+    int32_t output_shift = -7;
+    int32_t output_mult = MULT_MAX; // max out_mult
+    int32_t activation_min = -128;
+    int32_t activation_max = 127;
+    int8_t *input1_orig = NULL;
+    int8_t *input2_orig = NULL;
+    int8_t *out_c_orig = NULL;
+    int8_t *out_opt_orig = NULL;
+
+    for (int itr = 0; itr < 10; itr++) {
+        switch (itr) {
+        case 0: // all zeros
+            input1_offset = 0;
+            input2_offset = 0;
+            output_offset = 0;
+            output_mult = 0;
+            output_shift = 0;
+        break;
+        case 1: // hit min
+            input1_offset = -127;
+            input2_offset = -127;
+            output_offset = -128;
+            output_mult = MULT_MIN;
+            output_shift = 0;
+        break;
+        case 2: // hit max
+            input1_offset = 128;
+            input2_offset = 128;
+            output_offset = -127;
+            output_mult = MULT_MAX;
+            output_shift = SHIFT_MIN;
+        break;
+        case 3: // hit extreme max
+            input1_offset = 128;
+            input2_offset = 128;
+            output_offset = -127;
+            output_mult = MULT_MAX;
+            output_shift = 0;
+        break;
+        default:  // practical random input
+            input1_offset = rand() % 256 - 127; // range [-127, 128]
+            input2_offset = rand() % 256 - 127; // range [-127, 128]
+            output_offset = rand() % 256 - 128; // range [-128, 127]
+            output_mult = MULT_MAX / 2 + rand() % INT16_MAX;
+            output_shift = -8 + rand() % 4;
+        }
+
+#if IDF_HEAP_CAPS
+        input1_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        input2_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_c_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_opt_orig = (int8_t *) heap_caps_malloc(size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+
+        input1 = 16 + input1_orig - ((uint32_t) input1_orig & 0xf);
+        input2 = 16 + input2_orig - ((uint32_t) input2_orig & 0xf);
+        out_data_c = 16 + out_c_orig - ((uint32_t) out_c_orig & 0xf);
+        out_data_opt = 16 + out_opt_orig - ((uint32_t) out_opt_orig & 0xf);
+#else
+        input1 = memalign(16, size);
+        input2 = memalign(16, size);
+        out_data_c = memalign(16, size);
+        out_data_opt = memalign(16, size);
+
+        input1_orig = input1;
+        input2_orig = input2;
+        out_c_orig = out_data_c;
+        out_opt_orig = out_data_opt;
+#endif
+        if (input1_orig == NULL || input2_orig == NULL || out_c_orig == NULL ||
+                out_opt_orig == NULL) {
+            printf(ANSI_COLOR_RED"%s error allocating buffers\n"ANSI_COLOR_RESET, __FUNCTION__);
+            goto elementwise_mult_test_cleanup;
+        }
+
+        for (int i = 0; i < size; ++i) {
+            input1[i] = rand() % 256 - 128;
+            input2[i] = rand() % 256 - 128;
+        }
+
+        if (itr == 0) {
+            /* enable profiler */
+            profile_c_start();
+        }
+        /* C function */
+        esp_nn_mul_elementwise_s8_ansi(input1, input2, input1_offset, input2_offset,
+                                       out_data_c, output_offset, output_mult, output_shift,
+                                       activation_min, activation_max, size);
+
+        if (itr == 0) {
+            profile_c_end();
+            profile_opt_start();
+        }
+        /* Optimized function */
+        esp_nn_mul_elementwise_s8(input1, input2, input1_offset, input2_offset,
+                                  out_data_opt, output_offset, output_mult, output_shift,
+                                  activation_min, activation_max, size);
+
+        if (itr == 0) {
+            /* disable profiler */
+            profile_opt_end();
+        }
+
+        bool ret = CHECK_EQUAL(out_data_c, out_data_opt, size);
+        if (ret == false) {
+            printf(ANSI_COLOR_RED"%s[%d] failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            printf("Output: \n");
+            PRINT_ARRAY_HEX(out_data_opt, size, 1);
+            printf("Expected: \n");
+            PRINT_ARRAY_HEX(out_data_c, size, 1);
+            printf("Input1:\n");
+            PRINT_ARRAY_HEX(input1, size, 1);
+            printf("Input2:\n");
+            PRINT_ARRAY_HEX(input2, size, 1);
+            goto elementwise_mult_test_cleanup;
+        }
+        printf(ANSI_COLOR_GREEN"%s[%d] passed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+
+elementwise_mult_test_cleanup:
+        if (input1_orig) {
+            free(input1_orig);
+        }
+        if (input2_orig) {
+            free(input2_orig);
+        }
+        if (out_c_orig) {
+            free(out_c_orig);
+        }
+        if (out_opt_orig) {
+            free(out_opt_orig);
+        }
+    }
+}
--- a/code/components/esp-nn/tests/src/convolution_test.c
+++ b/code/components/esp-nn/tests/src/convolution_test.c
@@ -0,0 +1,605 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+
+#include <esp_nn.h>
+#include "test_utils.h"
+
+#if CONFIG_IDF_CMAKE
+#if (CONFIG_SPIRAM_SUPPORT && (CONFIG_SPIRAM_USE_CAPS_ALLOC || CONFIG_SPIRAM_USE_MALLOC))
+#define IDF_HEAP_CAPS 1
+#endif
+#if IDF_HEAP_CAPS
+#include "esp_heap_caps.h"
+#endif
+#endif
+
+void esp_nn_depthwise_conv_s8_test()
+{
+    int8_t *input = NULL, *filter_data = NULL, *out_data_c = NULL, *out_data_opt = NULL;
+    int32_t *bias = NULL;
+    int32_t input_offset = 5; /* some number in [-128, 127] */
+    int32_t out_offset = 7;
+    int32_t activation_min = -125;
+    int32_t activation_max = 120;
+    void *scratch_buf = NULL;
+
+    /* independent variables */
+    int input_wd, input_ht, channels;
+    uint16_t filter_ht, filter_wd, ch_mult;
+    uint16_t pad_wd, pad_ht, stride_wd, stride_ht;
+
+    // run for 15 iterations
+    for (int itr = 0; itr < 15; itr++) {
+        /* prepare data */
+        switch (itr) {
+        case 0: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (0,0)
+            input_wd = 18;
+            input_ht = 18;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 16;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 1: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (1,1)
+            input_wd = 10;
+            input_ht = 10;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 16;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 2: // (ch_mult 1, (channels % 8) = 0), filter (3,3), pad (1,1)
+            input_wd = 10;
+            input_ht = 10;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 24;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 3: // other filter sizes (ch_mult 1, (channels % 8) = 0)
+            input_wd = 10;
+            input_ht = 10;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 24;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 4: // other filter sizes (ch_mult 8 = 0)
+            input_wd = 6;
+            input_ht = 6;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 8;
+            channels = 4;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 5: // other filter sizes (ch_mult 8 = 0)
+            input_wd = 12;
+            input_ht = 12;
+            filter_ht = 5;
+            filter_wd = 5;
+            ch_mult = 8;
+            channels = 4;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 6: // other filter sizes (ch_mult 4 = 0)
+            input_wd = 6;
+            input_ht = 6;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 4;
+            channels = 4;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 7: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (0,0)  stride (2,2)
+            input_wd = 6;
+            input_ht = 6;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 16;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 2;
+            stride_ht = 2;
+            break;
+        case 8: // same as case 7, with large parameters
+            input_wd = 58;
+            input_ht = 58;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 128;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 2;
+            stride_ht = 2;
+            break;
+        case 9: // (ch_mult 1, (channels % 16) = 0), filter (3,3), pad (0,0)  stride (2,2)
+            input_wd = 6;
+            input_ht = 6;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 16;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 2;
+            stride_ht = 2;
+            break;
+        default:
+            input_wd = 6;
+            input_ht = 6;
+            filter_ht = 3;
+            filter_wd = 3;
+            ch_mult = 1;
+            channels = 16;
+            stride_wd = rand() % 2 + 1;
+            stride_ht = stride_wd;
+            pad_wd = stride_wd == 1 ? 0 : rand() % 2;
+            pad_ht = pad_wd;
+            printf("stride(%d), pad (%d)\t", stride_wd, pad_wd);
+            break;
+        }
+
+        uint16_t out_wd = (input_wd - filter_wd + 1) / stride_wd;
+        uint16_t out_ht = (input_ht - filter_ht + 1) / stride_ht;
+        if (itr == 9) {
+            // expect the function to handle this gracefully
+            out_wd += 1;
+            out_ht += 1;
+        }
+        int in_size = input_wd * input_ht * channels;
+        int out_size = out_wd * out_ht * channels * ch_mult;
+        int filter_size = filter_wd * filter_ht * channels * ch_mult + 4;
+        int bias_size = channels * ch_mult + 1;
+        int32_t out_shift[channels * ch_mult];
+        int32_t out_mult[channels * ch_mult];
+
+#if IDF_HEAP_CAPS
+        int8_t *input_orig = (int8_t *) heap_caps_malloc(in_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        int8_t *out_c_orig = (int8_t *) heap_caps_malloc(out_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        int8_t *out_opt_orig = (int8_t *) heap_caps_malloc(out_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        filter_data = (int8_t *) heap_caps_malloc(filter_size, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        bias = (int32_t *) heap_caps_malloc(bias_size * 4, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+
+        input = 16 + input_orig - ((uint32_t) input_orig & 0xf);
+        out_data_c = 16 + out_c_orig - ((uint32_t) out_c_orig & 0xf);
+        out_data_opt = 16 + out_opt_orig - ((uint32_t) out_opt_orig & 0xf);
+#else
+        input = memalign(16, in_size + 16);
+        filter_data = memalign(16, filter_size);
+        out_data_c = memalign(16, out_size + 16);
+        out_data_opt = memalign(16, out_size + 16);
+        bias = memalign(16, bias_size * 4);
+        int8_t *input_orig = input;
+        int8_t *out_c_orig = out_data_c;
+        int8_t *out_opt_orig = out_data_opt;
+#endif
+        if (bias == NULL || input == NULL || filter_data == NULL ||
+                out_data_c == NULL || out_data_opt == NULL || bias == NULL) {
+            printf(ANSI_COLOR_RED"%s[%d] allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            goto dc_s8_cleanup;
+        }
+
+        /* Generate input data */
+        for (int i = 0; i < in_size; ++i) {
+            input[i] = rand() % 128;
+        }
+
+        /* Generate filter data */
+        for (int i = 0; i < filter_size; ++i) {
+            filter_data[i] = rand() % 256 - 128;
+        }
+
+        /* Generate bias data */
+        for (int i = 0; i < channels * ch_mult; ++i) {
+            bias[i + 1] = rand() % INT16_MAX; //0th index left for unalignment
+            out_shift[i] = -8 + rand() % 3;
+            out_mult[i] = 0x7eb0e200 + rand() % 50;
+        }
+
+        data_dims_t input_dims = {.width = input_wd, .height = input_ht, .channels = channels, 1};
+        data_dims_t output_dims = {.width = out_wd, .height = out_ht, .channels = channels * ch_mult, 1};
+        data_dims_t filter_dims = {.width = filter_wd, .height = filter_ht, 0, 0};
+        dw_conv_params_t conv_params = {.in_offset = input_offset, .out_offset = out_offset, .ch_mult = ch_mult,
+                                        .stride = {stride_wd, stride_ht}, .padding = {pad_wd, pad_ht},
+                                        .dilation = {0, 0}, .activation = {activation_min, activation_max}};
+        quant_data_t quant_data = {.shift = out_shift, .mult = out_mult};
+
+        int scratch_buf_size = esp_nn_get_depthwise_conv_scratch_size(&input_dims, &filter_dims,
+                                                                      &output_dims, &conv_params);
+        if (scratch_buf_size > 0) {
+#if IDF_HEAP_CAPS
+            scratch_buf = heap_caps_malloc(scratch_buf_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+            int align_sz = 16 - (((int32_t) scratch_buf) & 0xf);
+#else
+            scratch_buf = memalign(16, scratch_buf_size);
+            int align_sz = 0;
+#endif
+            if (scratch_buf == NULL) {
+                printf(ANSI_COLOR_RED"%s[%d] scratch_buf alloc failed size %d\n"ANSI_COLOR_RESET,
+                       __FUNCTION__, itr, scratch_buf_size);
+                goto dc_s8_cleanup;
+            }
+            esp_nn_set_depthwise_conv_scratch_buf(scratch_buf + align_sz);
+        }
+        if (itr == 0) {
+            /* enable profiler */
+            profile_c_start();
+        }
+
+        /* C function */
+        esp_nn_depthwise_conv_s8_ansi(&input_dims, input, &filter_dims, filter_data + 4,
+                                      bias + 1, &output_dims, out_data_c, &conv_params, &quant_data);
+
+        if (itr == 0) {
+            profile_c_end();
+            profile_opt_start();
+        }
+
+        /* Optimized function */
+        esp_nn_depthwise_conv_s8(&input_dims, input, &filter_dims, filter_data + 4,
+                                 bias + 1, &output_dims, out_data_opt, &conv_params, &quant_data);
+
+        if (itr == 0) {
+            /* disable profiler */
+            profile_opt_end();
+        }
+
+        bool ret = CHECK_EQUAL(out_data_c, out_data_opt, out_size);
+        if (ret == false) {
+            printf(ANSI_COLOR_RED"%s[%d] failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            printf("Output: \n");
+            PRINT_ARRAY_HEX(out_data_opt, out_size / out_ht, out_ht);
+            printf("Expected: \n");
+            PRINT_ARRAY_HEX(out_data_c, out_size / out_ht, out_ht);
+            printf("Input:\n");
+            PRINT_ARRAY_HEX(input, in_size / input_ht, input_ht);
+            printf("Filter data:\n");
+            PRINT_ARRAY_HEX(filter_data + 4, (filter_size - 4) / filter_ht, filter_ht);
+            printf("bias data:\n");
+            PRINT_ARRAY_INT(bias + 1, ch_mult * channels, 1);
+            goto dc_s8_cleanup;
+        }
+        printf(ANSI_COLOR_GREEN"%s[%d] passed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+
+    dc_s8_cleanup:
+        if (input) {
+            free(input_orig);
+        }
+        if (filter_data) {
+            free(filter_data);
+        }
+        if (out_data_c) {
+            free(out_c_orig);
+        }
+        if (out_data_opt) {
+            free(out_opt_orig);
+        }
+        if (bias) {
+            free(bias);
+        }
+        if (scratch_buf) {
+            free(scratch_buf);
+        }
+    }
+}
+
+void esp_nn_conv_s8_test()
+{
+    const int32_t input_offset = 5; /* some number in [-128, 127] */
+    const int32_t activation_min = -125;
+    const int32_t activation_max = 122;
+    const int32_t out_offset = 3;
+
+    void *scratch_buf = NULL;
+    int8_t *input_orig;
+    int8_t *out_c_orig;
+    int8_t *out_opt_orig;
+    int8_t *filter_data;
+    int32_t *bias;
+
+    /* independent variable */
+    int in_wd, in_ht, in_channels, out_channels;
+    uint16_t filter_ht, filter_wd;
+    uint16_t pad_wd, pad_ht, stride_wd, stride_ht;
+
+    // run for 10 iterations
+    for (int itr = 0; itr < 10; itr++) {
+        switch (itr) {
+        case 0: // ch % 8 == 0 && filter (1,1), padding (0,0)
+            in_wd = 10;
+            in_ht = 10;
+            in_channels = 64;
+            out_channels = 64;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 1: // ch % 4 == 0 && (in_wd * in_ht) % 16 == 0
+            in_wd = 4;
+            in_ht = 4;
+            in_channels = 20;
+            out_channels = 8;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 2: // ch, filter (3x3x3)
+            in_wd = 10;
+            in_ht = 10;
+            in_channels = 3;
+            out_channels = 64;
+            filter_ht = 3;
+            filter_wd = 3;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 3: // remaining pad (0, 0)
+            in_wd = 10;
+            in_ht = 10;
+            in_channels = 3;
+            out_channels = 64;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 4: // unopt case
+            in_wd = 10;
+            in_ht = 10;
+            in_channels = 12;
+            out_channels = 64;
+            filter_ht = 3;
+            filter_wd = 3;
+            pad_wd = 1;
+            pad_ht = 1;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        case 5: // ch % 8 == 0 & stride (2,2)
+            in_wd = 16;
+            in_ht = 16;
+            in_channels = 16;
+            out_channels = 16;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 2;
+            stride_ht = 2;
+            break;
+        case 6: // ch % 8 == 0 && filter (1,1), padding (0,0)
+            in_wd = 2;
+            in_ht = 2;
+            in_channels = 8;
+            out_channels = 8;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        default: // ch % 8 == 0
+            in_wd = 8;
+            in_ht = 8;
+            in_channels = 16;
+            out_channels = 16;
+            filter_ht = 1;
+            filter_wd = 1;
+            pad_wd = 0;
+            pad_ht = 0;
+            stride_wd = 1;
+            stride_ht = 1;
+            break;
+        }
+
+        /* prepare data */
+        uint16_t out_wd = (in_wd - filter_wd + 1) / stride_wd;
+        uint16_t out_ht = (in_ht - filter_ht + 1) / stride_ht;
+
+        int in_size = in_wd * in_ht * in_channels;
+        int filter_size = filter_wd * filter_ht * in_channels * out_channels + 2;
+        int out_size = out_wd * out_ht * out_channels;
+
+#if IDF_HEAP_CAPS
+        input_orig = (int8_t *) heap_caps_malloc(in_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_c_orig = (int8_t *) heap_caps_malloc(out_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        out_opt_orig = (int8_t *) heap_caps_malloc(out_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        filter_data = (int8_t *) heap_caps_malloc(filter_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+        bias = (int32_t *) heap_caps_malloc(128 + sizeof (int32_t) * out_channels, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+
+        int8_t *input = 16 + input_orig - ((uint32_t) input_orig & 0xf);
+        int8_t *out_data_c = 16 + out_c_orig - ((uint32_t) out_c_orig & 0xf);
+        int8_t *out_data_opt = 16 + out_opt_orig - ((uint32_t) out_opt_orig & 0xf);
+#else
+        int8_t *input = memalign(16, in_size);
+        int8_t *out_data_c = memalign(16, out_size);
+        int8_t *out_data_opt = memalign(16, out_size);
+        filter_data = memalign(16, filter_size);
+        bias = calloc(1, 128 + sizeof (int32_t) * out_channels);
+        input_orig = input;
+        out_c_orig = out_data_c;
+        out_opt_orig = out_data_opt;
+#endif
+        int32_t *out_shift = calloc(1, 128 + sizeof (int32_t) * out_channels);
+        int32_t *out_mult = calloc(1, 128 + sizeof (int32_t) * out_channels);
+
+        if (input == NULL || filter_data == NULL ||
+                out_data_c == NULL || out_data_opt == NULL) {
+            printf(ANSI_COLOR_RED"%s allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+            goto conv_s8_cleanup;
+        }
+
+        if (bias == NULL || out_shift == NULL || out_mult == NULL) {
+            printf(ANSI_COLOR_RED"%s allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+            goto conv_s8_cleanup;
+        }
+
+        /* Generate input data between -128 -> +127 */
+        for (int i = 0; i < in_size; ++i) {
+            input[i] = rand() % 255 - 128;
+        }
+
+        /* Generate filter data between -128 -> +127 */
+        for (int i = 0; i < filter_size; ++i) {
+            filter_data[i] = rand() % 256 - 128;
+        }
+
+        /* Generate bias data */
+        for (int i = 0; i < out_channels; ++i) {
+            bias[i] = (int32_t)rand() % UINT16_MAX + UINT8_MAX;
+        }
+
+        /* Shift and multiplier */
+        for (int i = 0; i < out_channels; ++i) {
+            out_shift[i] = -10 + rand() % 2;
+            out_mult[i] = 0x7f67f4f8 + rand() % 50;
+        }
+
+        data_dims_t input_dims = {.width = in_wd, .height = in_ht, .channels = in_channels, 1};
+        data_dims_t output_dims = {.width = out_wd, .height = out_ht, .channels = out_channels, 1};
+        data_dims_t filter_dims = {.width = filter_wd, .height = filter_ht, 0, 0};
+        conv_params_t conv_params = {.in_offset = input_offset, .out_offset = out_offset,
+                                    .stride = {stride_wd, stride_ht}, .padding = {pad_wd, pad_ht},
+                                    .dilation = {0, 0}, .activation = {activation_min, activation_max}};
+        quant_data_t quant_data = {.shift = out_shift, .mult = out_mult};
+
+        int scratch_buf_size = esp_nn_get_conv_scratch_size(&input_dims, &filter_dims,
+                                                            &output_dims, &conv_params);
+        if (scratch_buf_size > 0) {
+#if IDF_HEAP_CAPS
+            void *scratch_buf = heap_caps_malloc(scratch_buf_size + 32, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
+            int align_sz = 16 - (((int32_t) scratch_buf) & 0xf);
+#else
+            void *scratch_buf = memalign(16, scratch_buf_size);
+            int align_sz = 0;
+#endif
+            if (scratch_buf == NULL) {
+                printf(ANSI_COLOR_RED"%s scratch_buf alloc failed size %d\n"ANSI_COLOR_RESET, __FUNCTION__, scratch_buf_size);
+                goto conv_s8_cleanup;
+            }
+            esp_nn_set_conv_scratch_buf(scratch_buf + align_sz);
+        }
+
+        if (itr == 0) {
+            /* enable profiler */
+            profile_c_start();
+        }
+
+        /* C function */
+        esp_nn_conv_s8_ansi(&input_dims, input, &filter_dims, filter_data + 2,
+                            bias, &output_dims, out_data_c, &conv_params, &quant_data);
+
+        if (itr == 0) {
+            profile_c_end();
+            profile_opt_start();
+        }
+
+        /* Optimized function */
+        esp_nn_conv_s8(&input_dims, input, &filter_dims, filter_data + 2,
+                       bias, &output_dims, out_data_opt, &conv_params, &quant_data);
+
+        if (itr == 0) {
+            /* disable profiler */
+            profile_opt_end();
+        }
+
+        bool ret = CHECK_EQUAL(out_data_c, out_data_opt, out_size);
+        if (ret == false) {
+            printf(ANSI_COLOR_RED"%s[%d] failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            printf("Output: \n");
+            PRINT_ARRAY_HEX(out_data_opt, out_size / out_ht, out_ht);
+            printf("Expected: \n");
+            PRINT_ARRAY_HEX(out_data_c, out_size / out_ht, out_ht);
+            printf("Input:\n");
+            PRINT_ARRAY_HEX(input, in_size / in_ht, in_ht);
+            printf("Filter data:\n");
+            PRINT_ARRAY_HEX(filter_data + 2, (filter_size - 2) / filter_ht, filter_ht);
+            printf("bias data:\n");
+            PRINT_ARRAY_INT(bias, out_channels, 1);
+            goto conv_s8_cleanup;
+        }
+        printf(ANSI_COLOR_GREEN"%s[%d] passed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+
+    conv_s8_cleanup:
+        if (input) {
+            free(input_orig);
+        }
+        if (filter_data) {
+            free(filter_data);
+        }
+        if (out_data_c) {
+            free(out_c_orig);
+        }
+        if (out_data_opt) {
+            free(out_opt_orig);
+        }
+        if (bias) {
+            free(bias);
+        }
+        if (out_shift) {
+            free(out_shift);
+        }
+        if (out_mult) {
+            free(out_mult);
+        }
+        if (scratch_buf) {
+            free(scratch_buf);
+        }
+    }
+}
--- a/code/components/esp-nn/tests/src/fully_connected_test.c
+++ b/code/components/esp-nn/tests/src/fully_connected_test.c
@@ -0,0 +1,111 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <esp_nn.h>
+#include "test_utils.h"
+
+
+void esp_nn_fully_connected_s8_test()
+{
+    /* prepare data */
+    static uint16_t row_len = 256 + 8 + 7; /* odd len to test unaligned+left-over */
+    static uint16_t out_channels = 3;
+    int8_t input[row_len];
+    int8_t filter_data[row_len * out_channels];
+    int8_t output_c[out_channels], output_opt[out_channels];
+    static int32_t activation_min = -128;
+    static int32_t activation_max = 127;
+    static int32_t input_offset = 0;
+    static int32_t filter_offset = 0;
+    int32_t out_shift = -10;
+    static int32_t out_offset = 127;
+    int32_t out_mult = 0x59e492c4;
+    for (int itr = 0; itr < 5; itr++) {
+        out_mult = INT32_MAX / row_len + rand() % INT16_MAX;
+        switch (itr) {
+        case 0:
+            out_shift = -10;
+            break;
+        case 1:
+            out_shift = SHIFT_MIN;
+            break;
+        case 2:
+            out_shift = SHIFT_MAX;
+            break;
+        case 3:
+            out_shift = 0;
+            break;
+        default:
+            out_shift = -10 + rand() % 5;
+            break;
+        }
+        if (itr == 0) {
+            out_shift = SHIFT_MAX;
+        }
+        /* Generate input and filter data */
+        for (int i = 0; i < row_len; ++i) {
+            input[i] = rand() % 256 - 128;
+        }
+        for (int i = 0; i < row_len * out_channels; ++i) {
+            filter_data[i] = rand() % 256 - 128;
+        }
+
+        if (itr == 0) {
+            /* enable profiler */
+            profile_c_start();
+        }
+
+        /* C function */
+        esp_nn_fully_connected_s8_ansi(input, input_offset, row_len, filter_data, filter_offset,
+                                    NULL, output_c, out_channels, out_offset, out_shift, out_mult,
+                                    activation_min, activation_max);
+
+        if (itr == 0) {
+            profile_c_end();
+            profile_opt_start();
+        }
+
+        /* Optimized function */
+        esp_nn_fully_connected_s8(input, input_offset, row_len, filter_data, filter_offset,
+                                NULL, output_opt, out_channels, out_offset, out_shift, out_mult,
+                                activation_min, activation_max);
+
+        if (itr == 0) {
+            /* disable profiler */
+            profile_opt_end();
+        }
+
+        bool ret = CHECK_EQUAL(output_c, output_opt, out_channels);
+        if (ret == false) {
+            printf(ANSI_COLOR_RED"%s[%d] failed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+            printf("Output: \n");
+            PRINT_ARRAY_HEX(output_opt, out_channels, 1);
+            printf("Expected: \n");
+            PRINT_ARRAY_HEX(output_c, out_channels, 1);
+            printf("Input:\n");
+            PRINT_ARRAY_HEX(input, row_len, 1);
+            printf("Filter data:\n");
+            PRINT_ARRAY_HEX(filter_data, row_len, out_channels);
+            printf("Out shift: %d\n", out_shift);
+            printf("Out mult: %x\n", out_mult);
+            return;
+        }
+        printf(ANSI_COLOR_GREEN"%s[%d] passed\n"ANSI_COLOR_RESET, __FUNCTION__, itr);
+    }
+}
--- a/code/components/esp-nn/tests/src/pooling_test.c
+++ b/code/components/esp-nn/tests/src/pooling_test.c
@@ -0,0 +1,184 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+
+#include <esp_nn.h>
+#include "test_utils.h"
+
+
+void esp_nn_avg_pool_s8_test()
+{
+    /* prepare data */
+    const uint16_t input_wd = 16;
+    const uint16_t input_ht = 16;
+    const uint16_t channels = 16; /* With TFLite example, I have seen it 256 */
+    const int size = input_wd * input_ht * channels;
+    int8_t *input, *output_c, *output_opt;
+    const int32_t activation_min = -128;
+    const int32_t activation_max = 127;
+    const uint16_t pad_wd = 1;
+    const uint16_t pad_ht = 1;
+    const uint16_t stride_wd = 1;
+    const uint16_t stride_ht = 1;
+    const uint16_t filter_ht = 3;
+    const uint16_t filter_wd = 3;
+    const uint16_t out_wd = input_wd / stride_wd;
+    const uint16_t out_ht = input_ht / stride_ht;
+    const int out_size = out_wd * out_ht * channels;
+
+    input = memalign(16, size);
+    output_c = memalign(16, out_size);
+    output_opt = memalign(16, out_size);
+
+    if (input == NULL || output_c == NULL || output_opt == NULL) {
+        printf(ANSI_COLOR_RED"%s allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        goto avg_pool_s8_cleanup;
+    }
+    /**
+     * width/height, channels etc look suspicious but it it true.
+     * It actually depends upon where in model this is actually placed.
+     * If at the end wd/ht tends to be smaller and depth larger.
+     */
+
+    for (int i = 0; i < size; ++i) {
+        input[i] = rand() % 256 - 128;
+    }
+
+    /* enable profiler */
+    profile_c_start();
+
+    /* C function */
+    esp_nn_avg_pool_s8_ansi(input, input_wd, input_ht, output_c, out_wd, out_ht,
+                              stride_wd, stride_ht, filter_wd, filter_ht, pad_wd, pad_ht,
+                              activation_min, activation_max, channels);
+
+    profile_c_end();
+    profile_opt_start();
+
+    /* Optimized function */
+    esp_nn_avg_pool_s8(input, input_wd, input_ht, output_opt, out_wd, out_ht,
+                         stride_wd, stride_ht, filter_wd, filter_ht, pad_wd, pad_ht,
+                         activation_min, activation_max, channels);
+
+    /* disable profiler */
+    profile_opt_end();
+
+
+    bool ret = CHECK_EQUAL(output_c, output_opt, out_size);
+    if (ret == false) {
+        printf(ANSI_COLOR_RED"%s failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        printf("Output: \n");
+        PRINT_ARRAY_HEX(output_opt, out_wd * channels, out_ht);
+        printf("Expected: \n");
+        PRINT_ARRAY_HEX(output_c, out_wd * channels, out_ht);
+        printf("Input:\n");
+        PRINT_ARRAY_HEX(input, input_wd * channels, input_ht);
+        goto avg_pool_s8_cleanup;
+    }
+    printf(ANSI_COLOR_GREEN"%s passed\n"ANSI_COLOR_RESET, __FUNCTION__);
+
+avg_pool_s8_cleanup:
+    if (input) {
+        free(input);
+    }
+    if (output_c) {
+        free(output_c);
+    }
+    if (output_opt) {
+        free(output_opt);
+    }
+}
+
+void esp_nn_max_pool_s8_test()
+{
+    /* prepare data */
+    const uint16_t input_wd = 16;
+    const uint16_t input_ht = 16;
+    const uint16_t channels = 16; /* With TFLite example, I have seen it 256 */
+    int8_t *input, *output_c, *output_opt;
+    const int size = input_wd * input_ht * channels;
+    const int32_t activation_min = -128;
+    const int32_t activation_max = 127;
+    const uint16_t pad_wd = 1;
+    const uint16_t pad_ht = 1;
+    const uint16_t stride_wd = 1;
+    const uint16_t stride_ht = 1;
+    const uint16_t filter_ht = 3;
+    const uint16_t filter_wd = 3;
+    const uint16_t out_wd = input_wd / stride_wd;
+    const uint16_t out_ht = input_ht / stride_ht;
+    const int out_size = out_wd * out_ht * channels;
+
+    input = memalign(16, size);
+    output_c = memalign(16, out_size);
+    output_opt = memalign(16, out_size);
+
+    if (input == NULL || output_c == NULL || output_opt == NULL) {
+        printf(ANSI_COLOR_RED"%s allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        goto max_pool_s8_cleanup;
+    }
+
+    for (int i = 0; i < size; ++i) {
+        input[i] = rand() % 256 - 128;
+    }
+
+    /* enable profiler */
+    profile_c_start();
+
+    /* C function */
+    esp_nn_max_pool_s8_ansi(input, input_wd, input_ht, output_c, out_wd, out_ht,
+                            stride_wd, stride_ht, filter_wd, filter_ht, pad_wd, pad_ht,
+                            activation_min, activation_max, channels);
+
+    profile_c_end();
+    profile_opt_start();
+
+    /* Optimized function */
+    esp_nn_max_pool_s8(input, input_wd, input_ht, output_opt, out_wd, out_ht,
+                       stride_wd, stride_ht, filter_wd, filter_ht, pad_wd, pad_ht,
+                       activation_min, activation_max, channels);
+
+    /* disable profiler */
+    profile_opt_end();
+
+
+    bool ret = CHECK_EQUAL(output_c, output_opt, out_wd * out_ht * channels);
+    if (ret == false) {
+        printf(ANSI_COLOR_RED"%s failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        printf("Output: \n");
+        PRINT_ARRAY_HEX(output_opt, out_wd * out_ht * channels, 1);
+        printf("Expected: \n");
+        PRINT_ARRAY_HEX(output_c, out_wd * out_ht * channels, 1);
+        printf("Input:\n");
+        PRINT_ARRAY_HEX(input, 8, size / 8);
+        goto max_pool_s8_cleanup;
+    }
+    printf(ANSI_COLOR_GREEN"%s passed\n"ANSI_COLOR_RESET, __FUNCTION__);
+
+max_pool_s8_cleanup:
+    if (input) {
+        free(input);
+    }
+    if (output_c) {
+        free(output_c);
+    }
+    if (output_opt) {
+        free(output_opt);
+    }
+}
--- a/code/components/esp-nn/tests/src/relu_test.c
+++ b/code/components/esp-nn/tests/src/relu_test.c
@@ -0,0 +1,83 @@
+// Copyright 2020-2021 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+
+#include <esp_nn.h>
+#include "test_utils.h"
+
+void esp_nn_relu6_s8_test()
+{
+    const int size = 1600 + 8 + 7;
+    int8_t *input, *inout_ansi, *inout_opt;
+
+    input = memalign(16, size);
+    inout_ansi = memalign(16, size);
+    inout_opt = memalign(16, size);
+
+    if (input == NULL || inout_ansi == NULL || inout_opt == NULL) {
+        printf(ANSI_COLOR_RED"%s allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        goto relu6_s8_cleanup;
+    }
+    /* Generate filter data between -128 -> +127 */
+    for (int i = 0; i < size; ++i) {
+        input[i] = rand() % 255 - 128;
+        inout_ansi[i] = input[i];
+        inout_opt[i] = input[i];
+    }
+
+    /* enable profiler */
+    profile_c_start();
+
+    /* C function */
+    esp_nn_relu6_s8_ansi(inout_ansi, size);
+
+    profile_c_end();
+    profile_opt_start();
+
+    /* Optimized function */
+    esp_nn_relu6_s8(inout_opt, size);
+
+    /* disable profiler */
+    profile_opt_end();
+
+    bool ret = CHECK_EQUAL(inout_ansi, inout_opt, size);
+    if (ret == false) {
+        printf(ANSI_COLOR_RED"%s failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        printf("Output: \n");
+        PRINT_ARRAY_HEX(inout_opt, size, 1);
+        printf("Expected: \n");
+        PRINT_ARRAY_HEX(inout_ansi, size, 1);
+        printf("Input:\n");
+        PRINT_ARRAY_HEX(input, size, 1);
+        goto relu6_s8_cleanup;
+    }
+    printf(ANSI_COLOR_GREEN"%s passed\n"ANSI_COLOR_RESET, __FUNCTION__);
+
+relu6_s8_cleanup:
+    if (input) {
+        free (input);
+    }
+    if (inout_ansi) {
+        free (inout_ansi);
+    }
+    if (inout_opt) {
+        free (inout_opt);
+    }
+
+}
--- a/code/components/esp-nn/tests/src/softmax_test.c
+++ b/code/components/esp-nn/tests/src/softmax_test.c
@@ -0,0 +1,101 @@
+// Copyright 2022 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+
+#include <esp_nn.h>
+#include "test_utils.h"
+
+void esp_nn_softmax_s8_test()
+{
+    const int32_t height = 8;
+    const int32_t width = 32;
+    const int32_t diff_min = -128;
+    const int32_t mult = INT32_MAX / 2;
+    const int32_t shift = 7;
+    void *scratch_buf = NULL;
+    const int size = width * height;
+    int8_t *input, *out_ansi, *out_opt;
+
+    input = memalign(16, size);
+    out_ansi = memalign(16, size);
+    out_opt = memalign(16, size);
+
+    if (input == NULL || out_ansi == NULL || out_opt == NULL) {
+        printf(ANSI_COLOR_RED"%s buffer allocations failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        goto softmax_s8_cleanup;
+    }
+
+    /* Generate input data between -128 -> +127 */
+    for (int i = 0; i < size; ++i) {
+        input[i] = rand() % 255 - 128;
+    }
+
+    /* enable profiler */
+    profile_c_start();
+
+    /* C function */
+    esp_nn_softmax_s8_ansi(input, height, width, mult, shift, diff_min, out_ansi);
+
+    profile_c_end();
+
+    int32_t scratch_buf_size = esp_nn_get_softmax_scratch_size(width, height);
+    if (scratch_buf_size) {
+        scratch_buf = memalign(4, scratch_buf_size);
+        if (scratch_buf == NULL) {
+            printf(ANSI_COLOR_RED"%s scratch_buf alloc failed size %d\n"ANSI_COLOR_RESET, __FUNCTION__, scratch_buf_size);
+            goto softmax_s8_cleanup;
+        }
+        esp_nn_set_softmax_scratch_buf(scratch_buf);
+    }
+
+    profile_opt_start();
+
+    /* Optimized function */
+    esp_nn_softmax_s8(input, height, width, mult, shift, diff_min, out_opt);
+
+    /* disable profiler */
+    profile_opt_end();
+
+    bool ret = CHECK_EQUAL(out_ansi, out_opt, size);
+    if (ret == false) {
+        printf(ANSI_COLOR_RED"%s failed\n"ANSI_COLOR_RESET, __FUNCTION__);
+        printf("Output: \n");
+        PRINT_ARRAY_HEX(out_opt, width, height);
+        printf("Expected: \n");
+        PRINT_ARRAY_HEX(out_ansi, width, height);
+        printf("Input:\n");
+        PRINT_ARRAY_HEX(input, width, height);
+        goto softmax_s8_cleanup;
+    }
+    printf(ANSI_COLOR_GREEN"%s passed\n"ANSI_COLOR_RESET, __FUNCTION__);
+
+softmax_s8_cleanup:
+    if (input) {
+        free (input);
+    }
+    if (out_ansi) {
+        free (out_ansi);
+    }
+    if (out_opt) {
+        free (out_opt);
+    }
+    if (scratch_buf) {
+        free (scratch_buf);
+    }
+}
--- a/code/components/esp-nn_20220827.zip
+++ b/code/components/esp-nn_20220827.zip
--- a/code/components/esp32-camera-master_neu_20220121.zip
+++ b/code/components/esp32-camera-master_neu_20220121.zip
--- a/code/components/esp32-camera-master_old_version.zip
+++ b/code/components/esp32-camera-master_old_version.zip
--- a/code/components/jomjol_controlcamera/ClassControllCamera.cpp
+++ b/code/components/jomjol_controlcamera/ClassControllCamera.cpp
@@ -263,6 +263,9 @@ void CCamera::EnableAutoExposure(int flashdauer)
        ESP_LOGE(TAGCAMERACLASS, "Camera Capture Failed");
        LEDOnOff(false);
        LightOnOff(false);
+        LogFile.SwitchOnOff(true);
+        LogFile.WriteToFile("Camera Capture Failed (Procedure 'EnableAutoExposure') --> Reboot"
+                "Check that your camera module is working and connected properly.");
        doReboot();
    }
    esp_camera_fb_return(fb);        
@@ -313,7 +316,7 @@ esp_err_t CCamera::CaptureToBasisImage(CImageBasis *_Image, int delay)
        LightOnOff(false);

        LogFile.SwitchOnOff(true);
-        LogFile.WriteToFile("Camera is not working anymore - most propably hardware problem (instablility, ...). "
+        LogFile.WriteToFile("Camera is not working anymore (CCamera::CaptureToBasisImage) - most propably hardware problem (instablility, ...). "
                "System will reboot.");
        doReboot();

@@ -410,6 +413,9 @@ esp_err_t CCamera::CaptureToFile(std::string nm, int delay)
        ESP_LOGE(TAGCAMERACLASS, "CaptureToFile: Camera Capture Failed");
        LEDOnOff(false);
        LightOnOff(false);
+        LogFile.SwitchOnOff(true);
+        LogFile.WriteToFile("Camera Capture Failed (CCamera::CaptureToFile) --> Reboot"
+                "Check that your camera module is working and connected properly.");
        doReboot();

        return ESP_FAIL;
--- a/code/components/jomjol_fileserver_ota/server_file.cpp
+++ b/code/components/jomjol_fileserver_ota/server_file.cpp
@@ -95,8 +95,13 @@ esp_err_t get_tflite_file_handler(httpd_req_t *req)
        _filename = std::string(entry->d_name);
        printf("File: %s\t", _filename.c_str());

+        // ignore all files with starting dot (hidden files)
+        if (_filename.rfind(".", 0) == 0) {
+            continue;
+        }
+
        _fileext = _filename;
-        pos = _fileext.find(".");
+        pos = _fileext.find_last_of(".");
        if (pos != std::string::npos)
            _fileext = _fileext.erase(0, pos + 1);

--- a/code/components/jomjol_fileserver_ota/server_ota.cpp
+++ b/code/components/jomjol_fileserver_ota/server_ota.cpp
@@ -416,6 +416,8 @@ void task_reboot(void *pvParameter)
 }

 void doReboot(){
+    LogFile.SwitchOnOff(true);
+    LogFile.WriteToFile("Reboot triggert by Software (5s).");
    ESP_LOGI(TAGPARTOTA, "Reboot in 5sec");
    LogFile.WriteToFile("Reboot in 5sec");
    xTaskCreate(&task_reboot, "reboot", configMINIMAL_STACK_SIZE * 64, NULL, 10, NULL);
@@ -435,7 +437,7 @@ esp_err_t handler_reboot(httpd_req_t *req)

    LogFile.WriteToFile("handler_reboot");
    ESP_LOGI(TAGPARTOTA, "!!! System will restart within 5 sec!!!");
-    const char* resp_str = "!!! System will restart within 5 sec!!!";
+    const char* resp_str = "<body style='font-family: arial'> <h3 id=t></h3></body><script>var h='Rebooting!<br>The page will automatically reload.<br>'; document.getElementById('t').innerHTML=h; setInterval(function (){h +='.'; document.getElementById('t').innerHTML=h; fetch(window.location.hostname,{mode: 'no-cors'}).then(r=>{window.location.replace('/wasserzaehler_roi.html');})}, 1000);</script>";
    httpd_resp_send(req, resp_str, strlen(resp_str)); 
    
    doReboot();
--- a/code/components/jomjol_flowcontroll/CMakeLists.txt
+++ b/code/components/jomjol_flowcontroll/CMakeLists.txt
@@ -2,6 +2,6 @@ FILE(GLOB_RECURSE app_sources ${CMAKE_CURRENT_SOURCE_DIR}/*.*)

 idf_component_register(SRCS ${app_sources}
                    INCLUDE_DIRS "."
-                    REQUIRES jomjol_tfliteclass jomjol_helper jomjol_controlcamera jomjol_mqtt jomjol_fileserver_ota jomjol_image_proc jomjol_wlan)
+                    REQUIRES jomjol_tfliteclass jomjol_helper jomjol_controlcamera jomjol_mqtt jomjol_influxdb jomjol_fileserver_ota jomjol_image_proc jomjol_wlan)


--- a/code/components/jomjol_flowcontroll/ClassFlow.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlow.cpp
@@ -19,7 +19,6 @@ void ClassFlow::SetInitialParameter(void)
 std::vector<string> ClassFlow::ZerlegeZeile(std::string input, std::string delimiter)
 {
 	std::vector<string> Output;
-//	std::string delimiter = " =,";

 	input = trim(input, delimiter);
 	size_t pos = findDelimiterPos(input, delimiter);
--- a/code/components/jomjol_flowcontroll/ClassFlow.h
+++ b/code/components/jomjol_flowcontroll/ClassFlow.h
@@ -26,7 +26,6 @@ struct HTMLInfo
 class ClassFlow
 {
 protected:
-//	std::vector<string> ZerlegeZeile(string input);
 	std::vector<string> ZerlegeZeile(string input, string delimiter = " =, \t");
 	bool isNewParagraph(string input);
 	bool GetNextParagraph(FILE* pfile, string& aktparamgraph);
--- a/code/components/jomjol_flowcontroll/ClassFlowAlignment.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowAlignment.cpp
@@ -19,6 +19,7 @@ void ClassFlowAlignment::SetInitialParameter(void)
    initalrotate = 0;
    anz_ref = 0;
    initialmirror = false;
+    use_antialiasing = false;
    initialflip = false;
    SaveAllFiles = false;
    namerawimage =  "/sdcard/img_tmp/raw.jpg";
@@ -94,7 +95,12 @@ bool ClassFlowAlignment::ReadParameter(FILE* pfile, string& aktparamgraph)
        if ((toUpper(zerlegt[0]) == "SEARCHFIELDY") && (zerlegt.size() > 1))
        {
            suchey = std::stod(zerlegt[1]);
-        }               
+        }   
+        if ((toUpper(zerlegt[0]) == "ANTIALIASING") && (zerlegt.size() > 1))
+        {
+            if (toUpper(zerlegt[1]) == "TRUE")
+                use_antialiasing = true;
+        }   
        if ((zerlegt.size() == 3) && (anz_ref < 2))
        {
            References[anz_ref].image_file = FormatFileName("/sdcard" + zerlegt[0]);
@@ -175,7 +181,10 @@ bool ClassFlowAlignment::doFlow(string time)
 
    if ((initalrotate != 0) || initialflip)
    {
-        rt.Rotate(initalrotate);
+        if (use_antialiasing)
+            rt.RotateAntiAliasing(initalrotate);
+        else
+            rt.Rotate(initalrotate);
        if (SaveAllFiles) AlignAndCutImage->SaveToFile(FormatFileName("/sdcard/img_tmp/rot.jpg"));
    }

--- a/code/components/jomjol_flowcontroll/ClassFlowAlignment.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowAlignment.h
@@ -16,6 +16,7 @@ protected:
    float initalrotate;
    bool initialmirror;
    bool initialflip;
+    bool use_antialiasing;
    RefInfo References[2];
    int anz_ref;
    string namerawimage;
--- a/code/components/jomjol_flowcontroll/ClassFlowCNNGeneral.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowCNNGeneral.cpp
@@ -17,6 +17,7 @@ ClassFlowCNNGeneral::ClassFlowCNNGeneral(ClassFlowAlignment *_flowalign, t_CNNTy
    string cnnmodelfile = "";
    modelxsize = 1;
    modelysize = 1;
+    CNNGoodThreshold = 0.0;
    ListFlowControll = NULL;
    previousElement = NULL;   
    SaveAllFiles = false; 
@@ -27,20 +28,21 @@ ClassFlowCNNGeneral::ClassFlowCNNGeneral(ClassFlowAlignment *_flowalign, t_CNNTy
    flowpostalignment = _flowalign;
 }

-string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution = false)
+string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution, int prev, float _vorgaengerAnalog)
 {
    string result = "";    
+
    if (GENERAL[_analog]->ROI.size() == 0)
        return result;
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout _analog=" + std::to_string(_analog) + ", _extendedResolution=" + std::to_string(_extendedResolution) + ", prev=" + std::to_string(prev));

-    if (CNNType == Analogue)
+    if (CNNType == Analogue || CNNType == Analogue100)
    {
        float zahl = GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float;
        int ergebnis_nachkomma = ((int) floor(zahl * 10) + 10) % 10;
-
-        int prev = -1;
-
-        prev = ZeigerEval(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
+        
+        prev = ZeigerEvalAnalogNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
+//        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(analog) zahl=" + std::to_string(zahl) + ", ergebnis_nachkomma=" + std::to_string(ergebnis_nachkomma) + ", prev=" + std::to_string(prev));
        result = std::to_string(prev);

        if (_extendedResolution && (CNNType != Digital))
@@ -48,7 +50,7 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution

        for (int i = GENERAL[_analog]->ROI.size() - 2; i >= 0; --i)
        {
-            prev = ZeigerEval(GENERAL[_analog]->ROI[i]->result_float, prev);
+            prev = ZeigerEvalAnalogNeu(GENERAL[_analog]->ROI[i]->result_float, prev);
            result = std::to_string(prev) + result;
        }
        return result;
@@ -66,25 +68,31 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
        return result;
    }

-    if (CNNType == DigitalHyprid)
+    if ((CNNType == DoubleHyprid10) || (CNNType == Digital100))
    {
-        int zif_akt = -1;

        float zahl = GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float;
        if (zahl >= 0)       // NaN?
        {
-            if (_extendedResolution)
+            if (_extendedResolution)            // ist nur gesetzt, falls es die erste Ziffer ist (kein Analog vorher!)
            {
                int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
                int ergebnis_vorkomma = ((int) floor(zahl)) % 10;

                result = std::to_string(ergebnis_vorkomma) + std::to_string(ergebnis_nachkomma);
-                zif_akt = ergebnis_vorkomma;
+                prev = ergebnis_vorkomma;
+                if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(dig100-ext) ergebnis_vorkomma=" + std::to_string(ergebnis_vorkomma) + ", ergebnis_nachkomma=" + std::to_string(ergebnis_nachkomma) + ", prev=" + std::to_string(prev));
            }
            else
            {
-                zif_akt = ZeigerEvalHybrid(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, -1, -1);
-                result = std::to_string(zif_akt);
+//                prev = ZeigerEval(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev);
+                if (_vorgaengerAnalog >= 0)
+                    prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, _vorgaengerAnalog, prev, true);
+                else
+                    prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[GENERAL[_analog]->ROI.size() - 1]->result_float, prev, prev);
+                result = std::to_string(prev);
+                if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(dig100)  prev=" + std::to_string(prev));
+        
            }
        }
        else
@@ -98,26 +106,37 @@ string ClassFlowCNNGeneral::getReadout(int _analog = 0, bool _extendedResolution
        {
            if (GENERAL[_analog]->ROI[i]->result_float >= 0)
            {
-                zif_akt = ZeigerEvalHybrid(GENERAL[_analog]->ROI[i]->result_float, GENERAL[_analog]->ROI[i+1]->result_float, zif_akt);
-                result = std::to_string(zif_akt) + result;
+                prev = ZeigerEvalHybridNeu(GENERAL[_analog]->ROI[i]->result_float, GENERAL[_analog]->ROI[i+1]->result_float, prev);
+                if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout#ZeigerEvalHybridNeu()= " + std::to_string(prev));
+                result = std::to_string(prev) + result;
+                if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout#result= " + result);
+                
            }
            else
            {
-                zif_akt = -1;
+                prev = -1;
                result = "N" + result;
+                if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::getReadout(result_float<0 /'N')  result_float=" + std::to_string(GENERAL[_analog]->ROI[i]->result_float));
+        
            }
        }
        return result;
    }

+
    return result;
 }

+/*
 int ClassFlowCNNGeneral::ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger)
 {
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybrid( " + std::to_string(zahl) + ", " + std::to_string(zahl_vorgaenger) + ", " + std::to_string(eval_vorgaenger) + ")");
+                
    int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
+    int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;

-    if (zahl_vorgaenger < 0)                // keine Vorzahl vorhanden !!! --> Runde die Zahl
+
+    if (eval_vorgaenger < 0)                // keine Vorzahl vorhanden !!! --> Runde die Zahl
    {
        if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
            return ((int) round(zahl) + 10) % 10;
@@ -125,46 +144,221 @@ int ClassFlowCNNGeneral::ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int
            return ((int) trunc(zahl) + 10) % 10;
    }

-    if (zahl_vorgaenger > 9.2)              // Ziffernwechsel beginnt
+    // 9.0, da bei getReadout() prev als int übergeben wird (9 statt 9.5)
+    // tritt bei der ersten ziffer von digit auf, wenn analog davor (2. Aufruf von getReadout)
+    if ((zahl_vorgaenger >= 0.5 ) && (zahl_vorgaenger < 9.5))
    {
-        if (eval_vorgaenger == 0)           // Wechsel hat schon stattgefunden
-        {
-            return ((int) round(zahl) + 10) % 10;      // Annahme, dass die neue Zahl schon in der Nähe des Ziels ist
-        }
+        // kein Ziffernwechsel, da Vorkomma weit genug weg ist (0+/-0.5) --> zahl wird gerundet
+        if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
+            return ((int) round(zahl) + 10) % 10;
        else
+            return ((int) trunc(zahl) + 10) % 10;
+    }  
+    else
+    {
+        if (eval_vorgaenger <= 1)  // Nulldurchgang hat stattgefunden (!Bewertung über Prev_value und nicht Zahl!) --> hier aufrunden (2.8 --> 3, aber auch 3.1 --> 3)
        {
-            if (zahl_vorgaenger <= 9.5)     // Wechsel startet gerade, aber beginnt erst
-            {
-                if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
-                    return ((int) round(zahl) + 10) % 10;
-                else
-                    return ((int) trunc(zahl) + 10) % 10;
-            }
+            if (ergebnis_nachkomma > 5)
+                return (ergebnis_vorkomma + 1) % 10;
            else
-            {
-                return ((int) trunc(zahl) + 10) % 10;   // Wechsel schon weiter fortgeschritten, d.h. über 2 als Nachkomma
-            }
+                return ergebnis_vorkomma;
+        }
+        else // bleibt nur >= 9.5 --> noch kein Nulldurchgang --> 2.8 --> 2, und 3.1 --> 2
+        {
+            // hier auf 4 reduziert, da erst ab Vorgänder 9 anfängt umzustellen. Bei 9.5 Vorgänger kann die aktuelle
+            // Zahl noch x.4 - x.5 sein.
+            if (ergebnis_nachkomma >= 4)
+                return ergebnis_vorkomma;
+            else
+                return (ergebnis_vorkomma - 1 + 10) % 10;
        }
    }
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybrid(return -1)  zahl=" + std::to_string(zahl) 
+                        + ", zahl_vorgaenger=" + std::to_string(zahl_vorgaenger) + ", eval_vorgaenger=" + std::to_string(eval_vorgaenger));
+    return -1;

-    if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
-        return ((int) round(zahl) + 10) % 10;
+}
+*/

-    return ((int) trunc(zahl) + 10) % 10;
+int ClassFlowCNNGeneral::ZeigerEvalHybridNeu(float zahl, float zahl_vorgaenger, int eval_vorgaenger, bool AnalogerVorgaenger)
+{
+    int result;
+    int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
+    int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
+
+    if (eval_vorgaenger < 0)
+    {
+        if ((ergebnis_nachkomma <= DigitalUnschaerfe * 10) || (ergebnis_nachkomma >= DigitalUnschaerfe * 10))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
+            result = (int) (round(zahl) + 10) % 10;
+        else
+            result = (int) ((int) trunc(zahl) + 10) % 10;
+
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }
+
+    if (AnalogerVorgaenger)
+    {
+//        result = ZeigerEvalAnalogToDigitNeu(zahl, eval_vorgaenger);
+        result = ZeigerEvalAnalogToDigitNeu(zahl, zahl_vorgaenger);
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - Analoger Vorgänger, Bewertung über ZeigerEvalAnalogNeu = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }
+
+    if ((zahl_vorgaenger >= DigitalUebergangsbereichVorgaenger ) && (zahl_vorgaenger <= (10.0 - DigitalUebergangsbereichVorgaenger)))
+    {
+        // kein Ziffernwechsel, da Vorgänger weit genug weg ist (0+/-DigitalUebergangsbereichVorgaenger) --> zahl wird gerundet
+        if ((ergebnis_nachkomma <= DigitalBand) || (ergebnis_nachkomma >= (10-DigitalBand)))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
+            result = ((int) round(zahl) + 10) % 10;
+        else
+            result = ((int) trunc(zahl) + 10) % 10;
+
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, kein Ziffernwechsel, da Vorkomma weit genug weg = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }  
+
+    if (eval_vorgaenger <= 1)  // Nulldurchgang hat stattgefunden (!Bewertung über Prev_value und nicht Zahl!) --> hier aufrunden (2.8 --> 3, aber auch 3.1 --> 3)
+    {
+        if (ergebnis_nachkomma > 5)
+            result =  (ergebnis_vorkomma + 1) % 10;
+        else
+            result =  ergebnis_vorkomma;
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, Nulldurchgang hat stattgefunden = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }
+
+    // bleibt nur >= 9.5 --> noch kein Nulldurchgang --> 2.8 --> 2, und 3.1 --> 2
+    // hier auf 4 reduziert, da erst ab Vorgänder 9 anfängt umzustellen. Bei 9.5 Vorgänger kann die aktuelle
+    // Zahl noch x.4 - x.5 sein.
+    if (ergebnis_nachkomma >= 4)
+        result =  ergebnis_vorkomma;
+    else
+        result =  (ergebnis_vorkomma - 1 + 10) % 10;
+
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalHybridNeu - KEIN Analoger Vorgänger, >= 9.5 --> noch kein Nulldurchgang = " + std::to_string(result) +
+                                                " zahl: " + std::to_string(zahl) + " zahl_vorgaenger = " + std::to_string(zahl_vorgaenger)+ " eval_vorgaenger = " + std::to_string(eval_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+    return result;
 }

-int ClassFlowCNNGeneral::ZeigerEval(float zahl, int ziffer_vorgaenger)
+
+int ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu(float zahl, float ziffer_vorgaenger)
 {
+    int result;
+    int ergebnis_nachkomma = ((int) floor(zahl * 10)) % 10;
+    int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
+
+    if (ziffer_vorgaenger < 0)
+    {
+        result = (int) floor(zahl);
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " +  std::to_string(AnalogFehler));
+        return result;
+    }
+
+    if ((ziffer_vorgaenger >= DigitalUebergangsbereichVorgaengerAnalogToDigit ) && (ziffer_vorgaenger <= (10.0 - DigitalUebergangsbereichVorgaengerAnalogToDigit)))
+    {
+        // kein Ziffernwechsel, da Vorgänger weit genug weg ist (0+/-DigitalUebergangsbereichVorgaenger) --> zahl wird gerundet
+        if ((ergebnis_nachkomma <= 2) || (ergebnis_nachkomma >= 8))     // Band um die Ziffer --> Runden, da Ziffer im Rahmen Ungenauigkeit erreicht
+            result = ((int) round(zahl) + 10) % 10;
+        else
+            result = ((int) trunc(zahl) + 10) % 10;
+
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - kein Ziffernwechsel, da Vorkomma weit genug weg = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }  
+
+    if (ziffer_vorgaenger <= 1)  // Nulldurchgang hat stattgefunden (!Bewertung über Prev_value und nicht Zahl!) --> hier aufrunden (2.8 --> 3, aber auch 3.1 --> 3)
+    {
+        if (ergebnis_nachkomma > 5)
+            result =  (ergebnis_vorkomma + 1) % 10;
+        else
+            result =  ergebnis_vorkomma;
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - Nulldurchgang hat stattgefunden = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+        return result;
+    }
+
+    // bleibt nur >= 9.5 --> noch kein Nulldurchgang --> 2.8 --> 2, und 3.1 --> 2
+    // hier auf 4 reduziert, da erst ab Vorgänder 9 anfängt umzustellen. Bei 9.5 Vorgänger kann die aktuelle
+    // Zahl noch x.4 - x.5 sein.
+    if (ergebnis_nachkomma >= 4)
+        result =  ergebnis_vorkomma;
+    else
+        result =  (ergebnis_vorkomma - 1 + 10) % 10;
+
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogToDigitNeu - 9.0 --> noch kein Nulldurchgang = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " DigitalUnschaerfe = " +  std::to_string(DigitalUnschaerfe));
+    return result;
+}
+
+int ClassFlowCNNGeneral::ZeigerEvalAnalogNeu(float zahl, int ziffer_vorgaenger)
+{
+    float zahl_min, zahl_max;
+    int result;
+
+    if (ziffer_vorgaenger == -1)
+    {
+        result = (int) floor(zahl);
+        if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - kein Vorgänger - Ergebnis = " + std::to_string(result) +
+                                                    " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " +  std::to_string(AnalogFehler));
+        return result;
+    }
+
+    zahl_min = zahl - AnalogFehler / 10.0;
+    zahl_max = zahl + AnalogFehler / 10.0;
+
+    if ((int) floor(zahl_max) - (int) floor(zahl_min) != 0)
+    {
+        if (ziffer_vorgaenger <= AnalogFehler)
+        {
+            result = ((int) floor(zahl_max) + 10) % 10;
+            if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl uneindeutig, Korrektur nach oben - Ergebnis = " + std::to_string(result) +
+                                                        " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " +  std::to_string(AnalogFehler));
+            return result;
+        }
+        if (ziffer_vorgaenger >= 10 - AnalogFehler)
+        {
+            result = ((int) floor(zahl_min) + 10) % 10;
+            if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl uneindeutig, Korrektur nach unten - Ergebnis = " + std::to_string(result) +
+                                                        " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " +  std::to_string(AnalogFehler));
+            return result;
+        }
+    }
+    
+
+    result = ((int) floor(zahl) + 10) % 10;
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEvalAnalogNeu - Zahl eindeutig, keine Korrektur notwendig - Ergebnis = " + std::to_string(result) +
+                                                " zahl: " + std::to_string(zahl) + " ziffer_vorgaenger = " + std::to_string(ziffer_vorgaenger) + " AnalogFehler = " +  std::to_string(AnalogFehler));
+
+    return result;
+
+}
+
+/*
+int ClassFlowCNNGeneral::ZeigerEval(float zahl, int ziffer_vorgaenger)
+{   
    int ergebnis_nachkomma = ((int) floor(zahl * 10) + 10) % 10;
    int ergebnis_vorkomma = ((int) floor(zahl) + 10) % 10;
-    int ergebnis, ergebnis_rating;
+    int ergebnis;
+    float ergebnis_rating;
+    if (debugdetailgeneral) LogFile.WriteToFile("ClassFlowCNNGeneral::ZeigerEval erg_v=" + std::to_string(ergebnis_vorkomma) + ", erg_n=" + std::to_string(ergebnis_nachkomma) + ", ziff_v=" + std::to_string(ziffer_vorgaenger));

    if (ziffer_vorgaenger == -1)
        return ergebnis_vorkomma % 10;

+    // Ist die aktuelle Stelle schon umgesprungen und die Vorstelle noch nicht?
+    // Akt.: 2.1, Vorstelle = 0.9 => 1.9
+    // Problem sind mehrere Rundungen 
+    // Bsp. zahl=4.5, Vorgänger= 9.6 (ziffer_vorgaenger=0)
+    // Tritt nur auf bei Übergang von analog auf digit
    ergebnis_rating = ergebnis_nachkomma - ziffer_vorgaenger;
    if (ergebnis_nachkomma >= 5)
-        ergebnis_rating-=5;
+        ergebnis_rating-=5.1;
    else
        ergebnis_rating+=5;
    ergebnis = (int) round(zahl);
@@ -172,10 +366,11 @@ int ClassFlowCNNGeneral::ZeigerEval(float zahl, int ziffer_vorgaenger)
        ergebnis-=1;
    if (ergebnis == -1)
        ergebnis+=10;
-
+    
    ergebnis = (ergebnis + 10) % 10;
    return ergebnis;
 }
+*/

 bool ClassFlowCNNGeneral::ReadParameter(FILE* pfile, string& aktparamgraph)
 {
@@ -206,12 +401,12 @@ bool ClassFlowCNNGeneral::ReadParameter(FILE* pfile, string& aktparamgraph)
    while (this->getNextLine(pfile, &aktparamgraph) && !this->isNewParagraph(aktparamgraph))
    {
        zerlegt = this->ZerlegeZeile(aktparamgraph);
-        if ((zerlegt[0] == "LogImageLocation") && (zerlegt.size() > 1))
+        if ((toUpper(zerlegt[0]) == "LOGIMAGELOCATION") && (zerlegt.size() > 1))
        {
            this->LogImageLocation = "/sdcard" + zerlegt[1];
            this->isLogImage = true;
        }
-        if ((zerlegt[0] == "LogImageSelect") && (zerlegt.size() > 1))
+        if ((toUpper(zerlegt[0]) == "LOGIMAGESELECT") && (zerlegt.size() > 1))
        {
            LogImageSelect = zerlegt[1];
            isLogImageSelect = true;            
@@ -221,20 +416,20 @@ bool ClassFlowCNNGeneral::ReadParameter(FILE* pfile, string& aktparamgraph)
        {
            this->logfileRetentionInDays = std::stoi(zerlegt[1]);
        }
-        if ((toUpper(zerlegt[0]) == "MODELTYPE") && (zerlegt.size() > 1))
-        {
-            if (toUpper(zerlegt[1]) == "DIGITHYPRID")
-                CNNType = DigitalHyprid;
-        }
+//        if ((toUpper(zerlegt[0]) == "MODELTYPE") && (zerlegt.size() > 1))
+//        {
+//            if (toUpper(zerlegt[1]) == "DIGITHYPRID")
+//                CNNType = DigitalHyprid;
+//        }

-        if ((zerlegt[0] == "Model") && (zerlegt.size() > 1))
+        if ((toUpper(zerlegt[0]) == "MODEL") && (zerlegt.size() > 1))
        {
            this->cnnmodelfile = zerlegt[1];
        }
-        if ((zerlegt[0] == "ModelInputSize") && (zerlegt.size() > 2))
+        
+        if ((toUpper(zerlegt[0]) == "CNNGOODTHRESHOLD") && (zerlegt.size() > 1))
        {
-            this->modelxsize = std::stoi(zerlegt[1]);
-            this->modelysize = std::stoi(zerlegt[2]);
+            CNNGoodThreshold = std::stof(zerlegt[1]);
        }
        if (zerlegt.size() >= 5)
        {
@@ -256,11 +451,14 @@ bool ClassFlowCNNGeneral::ReadParameter(FILE* pfile, string& aktparamgraph)
        }
    }

+    if (!getNetworkParameter())
+        return false;

-   for (int _ana = 0; _ana < GENERAL.size(); ++_ana)
+
+    for (int _ana = 0; _ana < GENERAL.size(); ++_ana)
        for (int i = 0; i < GENERAL[_ana]->ROI.size(); ++i)
        {
-            GENERAL[_ana]->ROI[i]->image = new CImageBasis(modelxsize, modelysize, 3);
+            GENERAL[_ana]->ROI[i]->image = new CImageBasis(modelxsize, modelysize, modelchannel);
            GENERAL[_ana]->ROI[i]->image_org = new CImageBasis(GENERAL[_ana]->ROI[i]->deltax, GENERAL[_ana]->ROI[i]->deltay, 3);
        }

@@ -398,7 +596,7 @@ bool ClassFlowCNNGeneral::doAlignAndCut(string time)

 void ClassFlowCNNGeneral::DrawROI(CImageBasis *_zw)
 {
-    if (CNNType == Analogue)
+    if (CNNType == Analogue || CNNType == Analogue100)
    {
        int r = 0;
        int g = 255;
@@ -408,7 +606,6 @@ void ClassFlowCNNGeneral::DrawROI(CImageBasis *_zw)
            for (int i = 0; i < GENERAL[_ana]->ROI.size(); ++i)
            {
                _zw->drawRect(GENERAL[_ana]->ROI[i]->posx, GENERAL[_ana]->ROI[i]->posy, GENERAL[_ana]->ROI[i]->deltax, GENERAL[_ana]->ROI[i]->deltay, r, g, b, 1);
-//                _zw->drawCircle((int) (GENERAL[_ana]->ROI[i]->posx + GENERAL[_ana]->ROI[i]->deltax/2), (int)  (GENERAL[_ana]->ROI[i]->posy + GENERAL[_ana]->ROI[i]->deltay/2), (int) (GENERAL[_ana]->ROI[i]->deltax/2), r, g, b, 2);
                _zw->drawEllipse( (int) (GENERAL[_ana]->ROI[i]->posx + GENERAL[_ana]->ROI[i]->deltax/2), (int)  (GENERAL[_ana]->ROI[i]->posy + GENERAL[_ana]->ROI[i]->deltay/2), (int) (GENERAL[_ana]->ROI[i]->deltax/2), (int) (GENERAL[_ana]->ROI[i]->deltay/2), r, g, b, 2);
                _zw->drawLine((int) (GENERAL[_ana]->ROI[i]->posx + GENERAL[_ana]->ROI[i]->deltax/2), (int) GENERAL[_ana]->ROI[i]->posy, (int) (GENERAL[_ana]->ROI[i]->posx + GENERAL[_ana]->ROI[i]->deltax/2), (int) (GENERAL[_ana]->ROI[i]->posy + GENERAL[_ana]->ROI[i]->deltay), r, g, b, 2);
                _zw->drawLine((int) GENERAL[_ana]->ROI[i]->posx, (int) (GENERAL[_ana]->ROI[i]->posy + GENERAL[_ana]->ROI[i]->deltay/2), (int) GENERAL[_ana]->ROI[i]->posx + GENERAL[_ana]->ROI[i]->deltax, (int) (GENERAL[_ana]->ROI[i]->posy + GENERAL[_ana]->ROI[i]->deltay/2), r, g, b, 2);
@@ -422,6 +619,72 @@ void ClassFlowCNNGeneral::DrawROI(CImageBasis *_zw)
    }
 } 

+bool ClassFlowCNNGeneral::getNetworkParameter()
+{
+    if (disabled)
+        return true;
+
+    CTfLiteClass *tflite = new CTfLiteClass;  
+    string zwcnn = "/sdcard" + cnnmodelfile;
+    zwcnn = FormatFileName(zwcnn);
+    printf(zwcnn.c_str());printf("\n");
+    if (!tflite->LoadModel(zwcnn)) {
+        printf("Can't read model file /sdcard%s\n", cnnmodelfile.c_str());
+        LogFile.WriteToFile("Cannot load model");
+        delete tflite;
+        return false;
+    } 
+    tflite->MakeAllocate();
+
+    if (CNNType == AutoDetect)
+    {
+        tflite->GetInputDimension(false);
+        modelxsize = tflite->ReadInputDimenstion(0);
+        modelysize = tflite->ReadInputDimenstion(1);
+        modelchannel = tflite->ReadInputDimenstion(2);
+
+        int _anzoutputdimensions = tflite->GetAnzOutPut();
+        switch (_anzoutputdimensions) 
+        {
+            case 2:
+                CNNType = Analogue;
+                printf("TFlite-Type set to Analogue\n");
+                break;
+            case 10:
+                CNNType = DoubleHyprid10;
+                printf("TFlite-Type set to DoubleHyprid10\n");
+                break;
+            case 11:
+                CNNType = Digital;
+                printf("TFlite-Type set to Digital\n");
+                break;
+            case 20:
+                CNNType = DigitalHyprid10;
+                printf("TFlite-Type set to DigitalHyprid10\n");
+                break;
+//            case 22:
+//                CNNType = DigitalHyprid;
+//                printf("TFlite-Type set to DigitalHyprid\n");
+//                break;
+             case 100:
+                if (modelxsize==32 && modelysize == 32) {
+                    CNNType = Analogue100;
+                    printf("TFlite-Type set to Analogue100\n");
+                } else {
+                    CNNType = Digital100;
+                    printf("TFlite-Type set to Digital\n");
+                }
+                break;
+            default:
+                LogFile.WriteToFile("ERROR ERROR ERROR - tflite passt nicht zur Firmware - ERROR ERROR ERROR (outout_dimension=" + std::to_string(_anzoutputdimensions) + ")");
+                printf("ERROR ERROR ERROR - tflite passt nicht zur Firmware - ERROR ERROR ERROR\n");
+        }
+    }
+
+    delete tflite;
+    return true;
+}
+
 bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
 {
    if (disabled)
@@ -442,32 +705,6 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
    } 
    tflite->MakeAllocate();

-    if (CNNType == AutoDetect)
-    {
-        int _anzoutputdimensions = tflite->GetAnzOutPut();
-        switch (_anzoutputdimensions) 
-        {
-            case 2:
-                CNNType = Analogue;
-                printf("TFlite-Type set to Analogue\n");
-                break;
-            case 11:
-                CNNType = Digital;
-                printf("TFlite-Type set to Digital\n");
-                break;
-            case 20:
-                CNNType = DigitalHyprid10;
-                printf("TFlite-Type set to DigitalHyprid10\n");
-                break;
-            case 22:
-                CNNType = DigitalHyprid;
-                printf("TFlite-Type set to DigitalHyprid\n");
-                break;
-            default:
-                printf("ERROR ERROR ERROR - tflite passt nicht zur Firmware - ERROR ERROR ERROR\n");
-        }
-    }
-
    for (int _ana = 0; _ana < GENERAL.size(); ++_ana)
    {
        for (int i = 0; i < GENERAL[_ana]->ROI.size(); ++i)
@@ -492,6 +729,7 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
                        if (isLogImage)
                            LogImage(logPath, GENERAL[_ana]->ROI[i]->name, &GENERAL[_ana]->ROI[i]->result_float, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
                    } break;
+
                case Digital:
                    {
                        GENERAL[_ana]->ROI[i]->result_klasse = 0;
@@ -500,17 +738,19 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)

                        if (isLogImage)
                        {
+                            string _imagename = GENERAL[_ana]->name +  "_" + GENERAL[_ana]->ROI[i]->name;
                            if (isLogImageSelect)
                            {
                                if (LogImageSelect.find(GENERAL[_ana]->ROI[i]->name) != std::string::npos)
-                                    LogImage(logPath, GENERAL[_ana]->ROI[i]->name, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                                    LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
                            }
                            else
                            {
-                                LogImage(logPath, GENERAL[_ana]->ROI[i]->name, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                                LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
                            }
                        }
                    } break;
+/*
                case DigitalHyprid:
                    {
                        int _num, _nachkomma;
@@ -536,8 +776,20 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
                        if (debugdetailgeneral) LogFile.WriteToFile(_zwres);

                        if (isLogImage)
-                            LogImage(logPath, GENERAL[_ana]->ROI[i]->name, &GENERAL[_ana]->ROI[i]->result_float, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                        {
+                            string _imagename = GENERAL[_ana]->name +  "_" + GENERAL[_ana]->ROI[i]->name;
+                            if (isLogImageSelect)
+                            {
+                                if (LogImageSelect.find(GENERAL[_ana]->ROI[i]->name) != std::string::npos)
+                                    LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                            else
+                            {
+                                LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                        }
                    } break;
+*/
                case DigitalHyprid10:
                    {
                        int _num, _nachkomma;
@@ -560,8 +812,136 @@ bool ClassFlowCNNGeneral::doNeuralNetwork(string time)
                        if (debugdetailgeneral) LogFile.WriteToFile(_zwres);

                        if (isLogImage)
-                            LogImage(logPath, GENERAL[_ana]->ROI[i]->name, &GENERAL[_ana]->ROI[i]->result_float, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                        {
+                            string _imagename = GENERAL[_ana]->name +  "_" + GENERAL[_ana]->ROI[i]->name;
+                            if (isLogImageSelect)
+                            {
+                                if (LogImageSelect.find(GENERAL[_ana]->ROI[i]->name) != std::string::npos)
+                                    LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                            else
+                            {
+                                LogImage(logPath, _imagename, NULL, &GENERAL[_ana]->ROI[i]->result_klasse, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                        }
                    } break;
+
+                case DoubleHyprid10:
+                    {
+                        int _num, _numplus, _numminus;
+                        float _val, _valplus, _valminus;
+                        float _fit;
+                        float _result_save_file;
+
+                        tflite->LoadInputImageBasis(GENERAL[_ana]->ROI[i]->image);        
+                        tflite->Invoke();
+                        if (debugdetailgeneral) LogFile.WriteToFile("Nach Invoke");
+
+                        _num = tflite->GetOutClassification(0, 9);
+                        _numplus = (_num + 1) % 10;
+                        _numminus = (_num - 1 + 10) % 10;
+
+                        _val = tflite->GetOutputValue(_num);
+                        _valplus = tflite->GetOutputValue(_numplus);
+                        _valminus = tflite->GetOutputValue(_numminus);
+
+                        float result = _num;
+
+                        if (_valplus > _valminus)
+                        {
+                            result = result + _valplus / (_valplus + _val);
+                            _fit = _val + _valplus;
+                        }
+                        else
+                        {
+                            result = result - _valminus / (_val + _valminus);
+                            _fit = _val + _valminus;
+
+                        }
+                        if (result > 10)
+                            result = result - 10;
+                        if (result < 0)
+                            result = result + 10;
+
+                        string zw = "_num (p, m): " + to_string(_num) + " " + to_string(_numplus) + " " + to_string(_numminus);
+                        zw = zw + " _val (p, m): " + to_string(_val) + " " + to_string(_valplus) + " " + to_string(_valminus);
+                        zw = zw + " result: " + to_string(result) + " _fit: " + to_string(_fit);
+                        printf("details cnn: %s\n", zw.c_str());
+                        LogFile.WriteToFile(zw);
+
+
+                        _result_save_file = result;
+
+                        if (_fit < CNNGoodThreshold)
+                        {
+                            GENERAL[_ana]->ROI[i]->isReject = true;
+                            result = -1;
+                            _result_save_file+= 100;     // Für den Fall, dass fit nicht ausreichend, soll trotzdem das Ergebnis mit "-10x.y" abgespeichert werden.
+                            string zw = "Value Rejected due to Threshold (Fit: " + to_string(_fit) + "Threshold: " + to_string(CNNGoodThreshold);
+                            printf("Value Rejected due to Threshold (Fit: %f, Threshold: %f\n", _fit, CNNGoodThreshold);
+                            LogFile.WriteToFile(zw);
+                        }
+                        else
+                        {
+                            GENERAL[_ana]->ROI[i]->isReject = false;
+                        }
+
+
+                        GENERAL[_ana]->ROI[i]->result_float = result;
+                        printf("Result General(Analog)%i: %f\n", i, GENERAL[_ana]->ROI[i]->result_float); 
+
+                        if (isLogImage)
+                        {
+                            string _imagename = GENERAL[_ana]->name +  "_" + GENERAL[_ana]->ROI[i]->name;
+                            if (isLogImageSelect)
+                            {
+                                if (LogImageSelect.find(GENERAL[_ana]->ROI[i]->name) != std::string::npos)
+                                    LogImage(logPath, _imagename, &_result_save_file, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                            else
+                            {
+                                LogImage(logPath, _imagename, &_result_save_file, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                        }
+                    }
+                    break;
+                case Digital100:
+                case Analogue100:
+                    {
+                        int _num;
+                        float _result_save_file;
+                        
+                        tflite->LoadInputImageBasis(GENERAL[_ana]->ROI[i]->image);        
+                        tflite->Invoke();
+    
+                        _num = tflite->GetOutClassification();
+                        
+                        GENERAL[_ana]->ROI[i]->result_float = (float)_num / 10.0;
+
+ 
+                        _result_save_file = GENERAL[_ana]->ROI[i]->result_float;
+
+                        
+                        GENERAL[_ana]->ROI[i]->isReject = false;
+                        
+                        printf("Result General(Analog)%i: %f\n", i, GENERAL[_ana]->ROI[i]->result_float); 
+
+                        if (isLogImage)
+                        {
+                            string _imagename = GENERAL[_ana]->name +  "_" + GENERAL[_ana]->ROI[i]->name;
+                            if (isLogImageSelect)
+                            {
+                                if (LogImageSelect.find(GENERAL[_ana]->ROI[i]->name) != std::string::npos)
+                                    LogImage(logPath, _imagename, &_result_save_file, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                            else
+                            {
+                                LogImage(logPath, _imagename, &_result_save_file, NULL, time, GENERAL[_ana]->ROI[i]->image_org);
+                            }
+                        }
+
+                    } break;
+            
                default:
                    break;
            }
@@ -590,11 +970,14 @@ std::vector<HTMLInfo*> ClassFlowCNNGeneral::GetHTMLInfo()
    for (int _ana = 0; _ana < GENERAL.size(); ++_ana)
        for (int i = 0; i < GENERAL[_ana]->ROI.size(); ++i)
        {
+            printf("Image: %d\n", (int) GENERAL[_ana]->ROI[i]->image);
+            if (GENERAL[_ana]->ROI[i]->image)
+            {
                if (GENERAL[_ana]->name == "default")
                    GENERAL[_ana]->ROI[i]->image->SaveToFile(FormatFileName("/sdcard/img_tmp/" + GENERAL[_ana]->ROI[i]->name + ".bmp"));
                else
                    GENERAL[_ana]->ROI[i]->image->SaveToFile(FormatFileName("/sdcard/img_tmp/" + GENERAL[_ana]->name + "_" + GENERAL[_ana]->ROI[i]->name + ".bmp"));
-
+            }

            HTMLInfo *zw = new HTMLInfo;
            if (GENERAL[_ana]->name == "default")
--- a/code/components/jomjol_flowcontroll/ClassFlowCNNGeneral.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowCNNGeneral.h
@@ -8,9 +8,12 @@
 enum t_CNNType {
    AutoDetect,
    Analogue,
+    Analogue100,
    Digital,
-    DigitalHyprid,
+//    DigitalHyprid,
    DigitalHyprid10,
+    DoubleHyprid10,
+    Digital100,
    None
 };

@@ -20,9 +23,17 @@ class ClassFlowCNNGeneral :
 protected:
    t_CNNType CNNType;
    std::vector<general*> GENERAL;
+    float CNNGoodThreshold;
+    float AnalogFehler = 3.0;
+    float AnalogToDigtalFehler = 0.8;
+    float DigitalUnschaerfe = 0.2;
+    int DigitalBand = 3;
+    float DigitalAnalogerVorgaengerUebergangsbereich = 2;
+    float DigitalUebergangsbereichVorgaengerAnalogToDigit = 1; // war vorher 2
+    float DigitalUebergangsbereichVorgaenger = 0.9;

    string cnnmodelfile;
-    int modelxsize, modelysize;
+    int modelxsize, modelysize, modelchannel;
    bool isLogImageSelect;
    string LogImageSelect;
    ClassFlowAlignment* flowpostalignment;
@@ -30,13 +41,19 @@ protected:
    bool SaveAllFiles;   
 //    bool extendedResolution;

-    int ZeigerEval(float zahl, int ziffer_vorgaenger);
-    int ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger);
+//    int ZeigerEval(float zahl, int ziffer_vorgaenger);
+//    int ZeigerEvalHybrid(float zahl, float zahl_vorgaenger, int eval_vorgaenger);
+    int ZeigerEvalAnalogNeu(float zahl, int ziffer_vorgaenger);
+    int ZeigerEvalAnalogToDigitNeu(float zahl, float ziffer_vorgaenger);
+    int ZeigerEvalHybridNeu(float zahl, float zahl_vorgaenger, int eval_vorgaenger, bool AnalogerVorgaenger = false);
+


    bool doNeuralNetwork(string time); 
    bool doAlignAndCut(string time);

+    bool getNetworkParameter();
+
 public:
    ClassFlowCNNGeneral(ClassFlowAlignment *_flowalign, t_CNNType _cnntype = AutoDetect);

@@ -44,7 +61,7 @@ public:
    bool doFlow(string time);

    string getHTMLSingleStep(string host);
-    string getReadout(int _analog, bool _extendedResolution);   
+    string getReadout(int _analog, bool _extendedResolution = false, int prev = -1, float _vorgaengerAnalog = -1);   

    void DrawROI(CImageBasis *_zw); 

--- a/code/components/jomjol_flowcontroll/ClassFlowControll.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowControll.cpp
@@ -49,6 +49,9 @@ std::string ClassFlowControll::doSingleStep(std::string _stepname, std::string _
    if ((_stepname.compare("[MQTT]") == 0) || (_stepname.compare(";[MQTT]") == 0)){
        _classname = "ClassFlowMQTT";
    }
+    if ((_stepname.compare("[InfluxDB]") == 0) || (_stepname.compare(";[InfluxDB]") == 0)){
+        _classname = "ClassFlowInfluxDB";
+    }

    for (int i = 0; i < FlowControll.size(); ++i)
        if (FlowControll[i]->name().compare(_classname) == 0){
@@ -67,14 +70,16 @@ std::string ClassFlowControll::TranslateAktstatus(std::string _input)
        return ("Take Image");
    if (_input.compare("ClassFlowAlignment") == 0)
        return ("Aligning");
-    //if (_input.compare("ClassFlowAnalog") == 0)
-    //    return ("Analog ROIs");
    if (_input.compare("ClassFlowCNNGeneral") == 0)
        return ("Digitalization of ROIs");
    if (_input.compare("ClassFlowMQTT") == 0)
        return ("Sending MQTT");
+    if (_input.compare("ClassFlowInfluxDB") == 0)
+        return ("Sending InfluxDB");
    if (_input.compare("ClassFlowPostProcessing") == 0)
        return ("Processing");
+    if (_input.compare("ClassFlowWriteList") == 0)
+        return ("Processing");

    return "Unkown Status";
 }
@@ -180,7 +185,13 @@ ClassFlow* ClassFlowControll::CreateClassFlow(std::string _type)
    }
    if (toUpper(_type).compare("[MQTT]") == 0)
        cfc = new ClassFlowMQTT(&FlowControll);
+
+    if (toUpper(_type).compare("[INFLUXDB]") == 0)
+        cfc = new ClassFlowInfluxDB(&FlowControll);
        
+    if (toUpper(_type).compare("[WRITELIST]") == 0)
+        cfc = new ClassFlowWriteList(&FlowControll);
+
    if (toUpper(_type).compare("[POSTPROCESSING]") == 0)
    {
        cfc = new ClassFlowPostProcessing(&FlowControll, flowanalog, flowdigit); 
@@ -294,6 +305,7 @@ bool ClassFlowControll::doFlow(string time)
            if (i) i -= 1;    // vorheriger Schritt muss wiederholt werden (vermutlich Bilder aufnehmen)
            result = false;
            if (repeat > 5) {
+                LogFile.SwitchOnOff(true);
                LogFile.WriteToFile("Wiederholung 5x nicht erfolgreich --> reboot");
                doReboot();
                // Schritt wurde 5x wiederholt --> reboot
@@ -482,6 +494,8 @@ bool ClassFlowControll::ReadParameter(FILE* pfile, string& aktparamgraph)
                // reboot notwendig damit die neue wlan.ini auch benutzt wird !!!
                fclose(pfile);
                printf("do reboot\n");
+                LogFile.SwitchOnOff(true);
+                LogFile.WriteToFile("Reboot to activate new HOSTNAME.");
                esp_restart();
                hard_restart();                   
                doReboot();
@@ -575,6 +589,8 @@ esp_err_t ClassFlowControll::GetJPGStream(std::string _fn, httpd_req_t *req)
        {
            std::vector<HTMLInfo*> htmlinfo;
            htmlinfo = GetAllDigital();
+            printf("After getClassFlowControll::GetAllDigital\n");
+
            for (int i = 0; i < htmlinfo.size(); ++i)
            {
                if (_fn == htmlinfo[i]->filename)
@@ -632,35 +648,7 @@ esp_err_t ClassFlowControll::GetJPGStream(std::string _fn, httpd_req_t *req)
    return result;
 }

-
-string ClassFlowControll::getJSON()
+string ClassFlowControll::getJSON(std::string _id, std::string _mac)
 {
-    std::vector<NumberPost*>* NUMBERS = flowpostprocessing->GetNumbers();
-
-    std::string json="{\n";
-
-    for (int i = 0; i < (*NUMBERS).size(); ++i)
-    {
-        json += "\"" + (*NUMBERS)[i]->name + "\":\n";
-        json += "  {\n";
-        if ((*NUMBERS)[i]->ReturnValue.length() > 0)
-            json += "    \"value\": "      + (*NUMBERS)[i]->ReturnValue          + ",\n";
-        else
-            json += "    \"value\": \"\",\n";
-        json += "    \"raw\": \""        + (*NUMBERS)[i]->ReturnRawValue              + "\",\n";
-        json += "    \"error\": \""     + (*NUMBERS)[i]->ErrorMessageText             + "\",\n";
-        if ((*NUMBERS)[i]->ReturnRateValue.length() > 0)
-            json += "    \"rate\": "      + (*NUMBERS)[i]->ReturnRateValue                + ",\n";
-        else
-            json += "    \"rate\": \"\",\n";
-
-        json += "    \"timestamp\": \"" + (*NUMBERS)[i]->timeStamp                    + "\"\n";
-        if ((i+1) < (*NUMBERS).size())
-            json += "  },\n";
-        else
-            json += "  }\n";
-    }
-    json += "}";
-
-    return json;
+    return flowpostprocessing->GetJSON(_id, _mac);
 }
--- a/code/components/jomjol_flowcontroll/ClassFlowControll.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowControll.h
@@ -9,7 +9,9 @@
 #include "ClassFlowCNNGeneral.h"
 #include "ClassFlowPostProcessing.h"
 #include "ClassFlowMQTT.h"
+#include "ClassFlowInfluxDB.h"
 #include "ClassFlowCNNGeneral.h"
+#include "ClassFlowWriteList.h"


 #define READOUT_TYPE_VALUE 0
@@ -48,7 +50,7 @@ public:
 	string UpdatePrevalue(std::string _newvalue, std::string _numbers, bool _extern);
 	string GetPrevalue(std::string _number = "");	
 	bool ReadParameter(FILE* pfile, string& aktparamgraph);	
-	string getJSON();
+	string getJSON(std::string _id = "", std::string _mac = "");

 	string TranslateAktstatus(std::string _input);

--- a/code/components/jomjol_flowcontroll/ClassFlowDefineTypes.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowDefineTypes.h
@@ -7,6 +7,7 @@ struct roi {
    int posx, posy, deltax, deltay;
    float result_float;
    int result_klasse;
+    bool isReject;
    string name;
    CImageBasis *image, *image_org;
 };
@@ -36,9 +37,10 @@ struct NumberPost {
    float PreValue;             // letzter Wert, der gut ausgelesen wurde
    float Value;                // letzer ausgelesener Wert, inkl. Korrekturen
    string ReturnRateValue;      // RückgabewertRate
+    string ReturnChangeAbsolute;      // RückgabewertRate
    string ReturnRawValue;      // Rohwert (mit N & führenden 0)    
    string ReturnValue;         // korrigierter Rückgabewert, ggf. mit Fehlermeldung
-    string ReturnPreValue;  // korrigierter Rückgabewert ohne Fehlermeldung
+    string ReturnPreValue;      // korrigierter Rückgabewert ohne Fehlermeldung
    string ErrorMessageText;        // Fehlermeldung bei Consistency Check
    int AnzahlAnalog;
    int AnzahlDigital;
--- a/code/components/jomjol_flowcontroll/ClassFlowInfluxDB.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowInfluxDB.cpp
@@ -0,0 +1,161 @@
+#include <sstream>
+#include "ClassFlowInfluxDB.h"
+#include "Helper.h"
+#include "connect_wlan.h"
+
+#include "time_sntp.h"
+#include "interface_influxdb.h"
+#include "ClassFlowPostProcessing.h"
+
+#include <time.h>
+
+void ClassFlowInfluxDB::SetInitialParameter(void)
+{
+    uri = "";
+    database = "";
+    measurement = "";
+
+    OldValue = "";
+    flowpostprocessing = NULL;  
+    user = "";
+    password = "";   
+    previousElement = NULL;
+    ListFlowControll = NULL; 
+    disabled = false;
+    InfluxDBenable = false;
+}       
+
+ClassFlowInfluxDB::ClassFlowInfluxDB()
+{
+    SetInitialParameter();
+}
+
+ClassFlowInfluxDB::ClassFlowInfluxDB(std::vector<ClassFlow*>* lfc)
+{
+    SetInitialParameter();
+
+    ListFlowControll = lfc;
+    for (int i = 0; i < ListFlowControll->size(); ++i)
+    {
+        if (((*ListFlowControll)[i])->name().compare("ClassFlowPostProcessing") == 0)
+        {
+            flowpostprocessing = (ClassFlowPostProcessing*) (*ListFlowControll)[i];
+        }
+    }
+}
+
+ClassFlowInfluxDB::ClassFlowInfluxDB(std::vector<ClassFlow*>* lfc, ClassFlow *_prev)
+{
+    SetInitialParameter();
+
+    previousElement = _prev;
+    ListFlowControll = lfc;
+
+    for (int i = 0; i < ListFlowControll->size(); ++i)
+    {
+        if (((*ListFlowControll)[i])->name().compare("ClassFlowPostProcessing") == 0)
+        {
+            flowpostprocessing = (ClassFlowPostProcessing*) (*ListFlowControll)[i];
+        }
+    }
+}
+
+
+bool ClassFlowInfluxDB::ReadParameter(FILE* pfile, string& aktparamgraph)
+{
+    std::vector<string> zerlegt;
+
+    aktparamgraph = trim(aktparamgraph);
+
+    if (aktparamgraph.size() == 0)
+        if (!this->GetNextParagraph(pfile, aktparamgraph))
+            return false;
+
+    if (toUpper(aktparamgraph).compare("[INFLUXDB]") != 0) 
+        return false;
+
+    while (this->getNextLine(pfile, &aktparamgraph) && !this->isNewParagraph(aktparamgraph))
+    {
+        printf("while loop reading line: %s\n", aktparamgraph.c_str());
+        zerlegt = this->ZerlegeZeile(aktparamgraph);
+        if ((toUpper(zerlegt[0]) == "USER") && (zerlegt.size() > 1))
+        {
+            this->user = zerlegt[1];
+        }  
+        if ((toUpper(zerlegt[0]) == "PASSWORD") && (zerlegt.size() > 1))
+        {
+            this->password = zerlegt[1];
+        }               
+        if ((toUpper(zerlegt[0]) == "URI") && (zerlegt.size() > 1))
+        {
+            this->uri = zerlegt[1];
+        }
+        if (((toUpper(zerlegt[0]) == "MEASUREMENT")) && (zerlegt.size() > 1))
+        {
+            this->measurement = zerlegt[1];
+        }
+        if (((toUpper(zerlegt[0]) == "DATABASE")) && (zerlegt.size() > 1))
+        {
+            this->database = zerlegt[1];
+        }
+    }
+
+    if ((uri.length() > 0) && (database.length() > 0) && (measurement.length() > 0)) 
+    { 
+        printf("Init InfluxDB with uri: %s, measurement: %s, user: %s, password: %s\n", uri.c_str(), measurement.c_str(), user.c_str(), password.c_str());
+        InfluxDBInit(uri, database, measurement, user, password); 
+        InfluxDBenable = true;
+    } else {
+        printf("InfluxDB init skipped as we are missing some parameters");
+    }
+   
+    return true;
+}
+
+
+string ClassFlowInfluxDB::GetInfluxDBMeasurement()
+{
+    return measurement;
+}
+
+
+bool ClassFlowInfluxDB::doFlow(string zwtime)
+{
+    if (!InfluxDBenable)
+        return true;
+
+    std::string result;
+    std::string resulterror = "";
+    std::string resultraw = "";
+    std::string resultrate = "";
+    std::string resulttimestamp = "";
+    string zw = "";
+    string namenumber = "";
+
+    if (flowpostprocessing)
+    {
+        std::vector<NumberPost*>* NUMBERS = flowpostprocessing->GetNumbers();
+
+        for (int i = 0; i < (*NUMBERS).size(); ++i)
+        {
+            result =  (*NUMBERS)[i]->ReturnValue;
+            resultraw =  (*NUMBERS)[i]->ReturnRawValue;
+            resulterror = (*NUMBERS)[i]->ErrorMessageText;
+            resultrate = (*NUMBERS)[i]->ReturnRateValue;
+            resulttimestamp = (*NUMBERS)[i]->timeStamp;
+
+            namenumber = (*NUMBERS)[i]->name;
+            if (namenumber == "default")
+                namenumber = "value";
+            else
+                namenumber = namenumber + "/value";
+
+            if (result.length() > 0 && resulttimestamp.length() > 0)   
+                InfluxDBPublish(namenumber, result, resulttimestamp);
+        }
+    }
+   
+    OldValue = result;
+    
+    return true;
+}
--- a/code/components/jomjol_flowcontroll/ClassFlowInfluxDB.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowInfluxDB.h
@@ -0,0 +1,31 @@
+#pragma once
+#include "ClassFlow.h"
+
+#include "ClassFlowPostProcessing.h"
+
+#include <string>
+
+class ClassFlowInfluxDB :
+    public ClassFlow
+{
+protected:
+    std::string uri, database, measurement;
+    std::string OldValue;
+	ClassFlowPostProcessing* flowpostprocessing;  
+    std::string user, password; 
+    bool InfluxDBenable;
+
+    void SetInitialParameter(void);        
+
+public:
+    ClassFlowInfluxDB();
+    ClassFlowInfluxDB(std::vector<ClassFlow*>* lfc);
+    ClassFlowInfluxDB(std::vector<ClassFlow*>* lfc, ClassFlow *_prev);
+
+    string GetInfluxDBMeasurement();
+
+    bool ReadParameter(FILE* pfile, string& aktparamgraph);
+    bool doFlow(string time);
+    string name(){return "ClassFlowInfluxDB";};
+};
+
--- a/code/components/jomjol_flowcontroll/ClassFlowMQTT.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowMQTT.cpp
@@ -25,7 +25,8 @@ void ClassFlowMQTT::SetInitialParameter(void)
    OldValue = "";
    flowpostprocessing = NULL;  
    user = "";
-    password = "";   
+    password = ""; 
+    SetRetainFlag = 0;  
    previousElement = NULL;
    ListFlowControll = NULL; 
    disabled = false;
@@ -99,6 +100,12 @@ bool ClassFlowMQTT::ReadParameter(FILE* pfile, string& aktparamgraph)
        {
            this->uri = zerlegt[1];
        }
+        if ((toUpper(zerlegt[0]) == "SETRETAINFLAG") && (zerlegt.size() > 1))
+        {
+            if (toUpper(zerlegt[1]) == "TRUE")
+                SetRetainFlag = 1;  
+        }
+

        if ((toUpper(zerlegt[0]) == "CLIENTID") && (zerlegt.size() > 1))
        {
@@ -118,7 +125,7 @@ bool ClassFlowMQTT::ReadParameter(FILE* pfile, string& aktparamgraph)
        mainerrortopic = maintopic + "/connection";
        printf("Init MQTT with uri: %s, clientname: %s, user: %s, password: %s, maintopic: %s\n", uri.c_str(), clientname.c_str(), user.c_str(), password.c_str(), mainerrortopic.c_str());
        MQTTInit(uri, clientname, user, password, mainerrortopic, 60); 
-        MQTTPublish(mainerrortopic, "connected");
+        MQTTPublish(mainerrortopic, "connected", SetRetainFlag);
        MQTTenable = true;
    }
   
@@ -142,6 +149,7 @@ bool ClassFlowMQTT::doFlow(string zwtime)
    std::string resultraw = "";
    std::string resultrate = "";
    std::string resulttimestamp = "";
+    std::string resultchangabs = "";
    string zw = "";
    string namenumber = "";

@@ -150,17 +158,17 @@ bool ClassFlowMQTT::doFlow(string zwtime)
    zw = maintopic + "/" + "uptime";
    char uptimeStr[11];
    sprintf(uptimeStr, "%ld", (long)getUpTime());
-    MQTTPublish(zw, uptimeStr);
+    MQTTPublish(zw, uptimeStr, SetRetainFlag);

    zw = maintopic + "/" + "freeMem";
    char freeheapmem[11];
    sprintf(freeheapmem, "%zu", esp_get_free_heap_size());
-    MQTTPublish(zw, freeheapmem);
+    MQTTPublish(zw, freeheapmem, SetRetainFlag);

    zw = maintopic + "/" + "wifiRSSI";
    char rssi[11];
    sprintf(rssi, "%d", get_WIFI_RSSI());
-    MQTTPublish(zw, rssi);
+    MQTTPublish(zw, rssi, SetRetainFlag);


    if (flowpostprocessing)
@@ -173,6 +181,7 @@ bool ClassFlowMQTT::doFlow(string zwtime)
            resultraw =  (*NUMBERS)[i]->ReturnRawValue;
            resulterror = (*NUMBERS)[i]->ErrorMessageText;
            resultrate = (*NUMBERS)[i]->ReturnRateValue;
+            resultchangabs = (*NUMBERS)[i]->ReturnChangeAbsolute;
            resulttimestamp = (*NUMBERS)[i]->timeStamp;

            namenumber = (*NUMBERS)[i]->name;
@@ -183,23 +192,27 @@ bool ClassFlowMQTT::doFlow(string zwtime)

            zw = namenumber + "value"; 
            if (result.length() > 0)   
-                MQTTPublish(zw, result);
+                MQTTPublish(zw, result, SetRetainFlag);

            zw = namenumber + "error"; 
            if (resulterror.length() > 0)  
-                MQTTPublish(zw, resulterror, 1);
+                MQTTPublish(zw, resulterror, SetRetainFlag);

            zw = namenumber + "rate"; 
            if (resultrate.length() > 0)   
-                MQTTPublish(zw, resultrate);
+                MQTTPublish(zw, resultrate, SetRetainFlag);
+
+            zw = namenumber + "changeabsolut"; 
+            if (resultchangabs.length() > 0)   
+                MQTTPublish(zw, resultchangabs, SetRetainFlag);

            zw = namenumber + "raw"; 
            if (resultraw.length() > 0)   
-                MQTTPublish(zw, resultraw);
+                MQTTPublish(zw, resultraw, SetRetainFlag);

            zw = namenumber + "timestamp";
            if (resulttimestamp.length() > 0)
-                MQTTPublish(zw, resulttimestamp);
+                MQTTPublish(zw, resulttimestamp, SetRetainFlag);


            std::string json = "";
@@ -218,7 +231,7 @@ bool ClassFlowMQTT::doFlow(string zwtime)
            json += ",\"timestamp\":\""+resulttimestamp+"\"}";

            zw = namenumber + "json";
-            MQTTPublish(zw, json);
+            MQTTPublish(zw, json, SetRetainFlag);
        }
    }
    else
@@ -234,7 +247,7 @@ bool ClassFlowMQTT::doFlow(string zwtime)
                    result = result + "\t" + zw;
            }
        }
-        MQTTPublish(topic, result);
+        MQTTPublish(topic, result, SetRetainFlag);
    }
    
    OldValue = result;
--- a/code/components/jomjol_flowcontroll/ClassFlowMQTT.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowMQTT.h
@@ -13,6 +13,7 @@ protected:
    std::string OldValue;
 	ClassFlowPostProcessing* flowpostprocessing;  
    std::string user, password; 
+    int SetRetainFlag;
    bool MQTTenable;

    std::string maintopic, mainerrortopic; 
--- a/code/components/jomjol_flowcontroll/ClassFlowPostProcessing.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowPostProcessing.cpp
@@ -15,6 +15,42 @@
 #define PREVALUE_TIME_FORMAT_INPUT "%d-%d-%dT%d:%d:%d"


+std::string ClassFlowPostProcessing::GetJSON(std::string _id, std::string _mac, std::string _lineend)
+{
+    std::string json="{" + _lineend;
+
+    for (int i = 0; i < NUMBERS.size(); ++i)
+    {
+        json += "\"" + NUMBERS[i]->name + "\":"  + _lineend;
+        json += "  {"  + _lineend;
+
+        if (_id.length() > 0)
+            json += "    \"ID\": \"" + _id + "\","  + _lineend;
+        if (_mac.length() > 0)
+            json += "    \"MAC\": \"" + _mac + "\","  + _lineend;
+
+        if (NUMBERS[i]->ReturnValue.length() > 0)
+            json += "    \"value\": \""      + NUMBERS[i]->ReturnValue          + "\"," + _lineend;
+        else
+            json += "    \"value\": \"\","  + _lineend;
+        json += "    \"raw\": \""        + NUMBERS[i]->ReturnRawValue              + "\","  + _lineend;
+        json += "    \"error\": \""     + NUMBERS[i]->ErrorMessageText             + "\","  + _lineend;
+        if (NUMBERS[i]->ReturnRateValue.length() > 0)
+            json += "    \"rate\": "      + NUMBERS[i]->ReturnRateValue                + ","  + _lineend;
+        else
+            json += "    \"rate\": \"\","  + _lineend;
+
+        json += "    \"timestamp\": \"" + NUMBERS[i]->timeStamp                    + "\""  + _lineend;
+        if ((i+1) < NUMBERS.size())
+            json += "  }," + _lineend;
+        else
+            json += "  }" + _lineend;
+    }
+    json += "}";
+
+    return json;
+}
+
 string ClassFlowPostProcessing::GetPreValue(std::string _number)
 {
    std::string result;
@@ -41,6 +77,8 @@ void ClassFlowPostProcessing::SetPreValue(float zw, string _numbers, bool _exter
        if (NUMBERS[j]->name == _numbers)
        {
            NUMBERS[j]->PreValue = zw;
+            NUMBERS[j]->ReturnPreValue = std::to_string(zw);
+            NUMBERS[j]->PreValueOkay = true;
            if (_extern)
            {
                time(&(NUMBERS[j]->lastvalue));
@@ -200,8 +238,9 @@ void ClassFlowPostProcessing::SavePreValue()

        _zw = NUMBERS[j]->name + "\t" + NUMBERS[j]->timeStamp + "\t" + RundeOutput(NUMBERS[j]->PreValue, NUMBERS[j]->Nachkomma) + "\n";
        printf("Write PreValue Zeile: %s\n", _zw.c_str());
-
-        fputs(_zw.c_str(), pFile);
+        if (pFile) {
+            fputs(_zw.c_str(), pFile);
+        }
    }

    UpdatePreValueINI = false;
@@ -505,7 +544,6 @@ void ClassFlowPostProcessing::InitNUMBERS()

        _number->ReturnRawValue = "";      // Rohwert (mit N & führenden 0)    
        _number->ReturnValue = "";         // korrigierter Rückgabewert, ggf. mit Fehlermeldung
-//        _number->ReturnValueNoError = "";  // korrigierter Rückgabewert ohne Fehlermeldung
        _number->ErrorMessageText = "";        // Fehlermeldung bei Consistency Check
        _number->ReturnPreValue = "";
        _number->PreValueOkay = false;
@@ -524,7 +562,6 @@ void ClassFlowPostProcessing::InitNUMBERS()
        _number->Value = 0;                // letzer ausgelesener Wert, inkl. Korrekturen
        _number->ReturnRawValue = "";      // Rohwert (mit N & führenden 0)    
        _number->ReturnValue = "";         // korrigierter Rückgabewert, ggf. mit Fehlermeldung
-//        _number->ReturnValueNoError = "";  // korrigierter Rückgabewert ohne Fehlermeldung
        _number->ErrorMessageText = "";        // Fehlermeldung bei Consistency Check

        _number->Nachkomma = _number->AnzahlAnalog;
@@ -532,8 +569,10 @@ void ClassFlowPostProcessing::InitNUMBERS()
        NUMBERS.push_back(_number);
    }

-    for (int i = 0; i < NUMBERS.size(); ++i)
+    for (int i = 0; i < NUMBERS.size(); ++i) {
        printf("Number %s, Anz DIG: %d, Anz ANA %d\n", NUMBERS[i]->name.c_str(), NUMBERS[i]->AnzahlDigital, NUMBERS[i]->AnzahlAnalog);
+    }
+
 }

 string ClassFlowPostProcessing::ShiftDecimal(string in, int _decShift){
@@ -612,18 +651,29 @@ bool ClassFlowPostProcessing::doFlow(string zwtime)

        UpdateNachkommaDecimalShift();

+        int previous_value = -1;
+
+        if (NUMBERS[j]->analog_roi)
+        {
+            NUMBERS[j]->ReturnRawValue = flowAnalog->getReadout(j, NUMBERS[j]->isExtendedResolution); 
+            if (NUMBERS[j]->ReturnRawValue.length() > 0)
+            {
+                char zw = NUMBERS[j]->ReturnRawValue[0];
+                if (zw >= 48 && zw <=57)
+                    previous_value = zw - 48;
+            }
+        }
+
+        if (NUMBERS[j]->digit_roi && NUMBERS[j]->analog_roi)
+            NUMBERS[j]->ReturnRawValue = "." + NUMBERS[j]->ReturnRawValue;
+
        if (NUMBERS[j]->digit_roi)
        {
            if (NUMBERS[j]->analog_roi) 
-                NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, false);
+                NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, false, previous_value, NUMBERS[j]->analog_roi->ROI[0]->result_float) + NUMBERS[j]->ReturnRawValue;
            else
-                NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, NUMBERS[j]->isExtendedResolution);        // Extended Resolution nur falls es keine analogen Ziffern gibt
+                NUMBERS[j]->ReturnRawValue = flowDigit->getReadout(j, NUMBERS[j]->isExtendedResolution, previous_value);        // Extended Resolution nur falls es keine analogen Ziffern gibt
        }
-        if (NUMBERS[j]->digit_roi && NUMBERS[j]->analog_roi)
-            NUMBERS[j]->ReturnRawValue = NUMBERS[j]->ReturnRawValue + ".";
-
-        if (NUMBERS[j]->analog_roi)
-            NUMBERS[j]->ReturnRawValue = NUMBERS[j]->ReturnRawValue + flowAnalog->getReadout(j, NUMBERS[j]->isExtendedResolution); 

        NUMBERS[j]->ReturnRawValue = ShiftDecimal(NUMBERS[j]->ReturnRawValue, NUMBERS[j]->DecimalShift);

@@ -675,7 +725,7 @@ bool ClassFlowPostProcessing::doFlow(string zwtime)

        if (NUMBERS[j]->useMaxRateValue && PreValueUse && NUMBERS[j]->PreValueOkay)
        {
-            float _ratedifference;                                                   
+            float _ratedifference;  
            if (NUMBERS[j]->RateType == RateChange)
                _ratedifference = NUMBERS[j]->FlowRateAct;
            else
@@ -691,6 +741,7 @@ bool ClassFlowPostProcessing::doFlow(string zwtime)
            }
        }

+        NUMBERS[j]->ReturnChangeAbsolute = RundeOutput(NUMBERS[j]->Value - NUMBERS[j]->PreValue, NUMBERS[j]->Nachkomma);                                                
        NUMBERS[j]->lastvalue = imagetime;
        NUMBERS[j]->PreValue = NUMBERS[j]->Value;
        NUMBERS[j]->PreValueOkay = true;
--- a/code/components/jomjol_flowcontroll/ClassFlowPostProcessing.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowPostProcessing.h
@@ -60,6 +60,8 @@ public:
    string GetPreValue(std::string _number = "");
    void SetPreValue(float zw, string _numbers, bool _extern = false);

+    std::string GetJSON(std::string _id = "", std::string _mac = "", std::string _lineend = "\n");
+
    void UpdateNachkommaDecimalShift();

    std::vector<NumberPost*>* GetNumbers(){return &NUMBERS;};
--- a/code/components/jomjol_flowcontroll/ClassFlowWriteList.cpp
+++ b/code/components/jomjol_flowcontroll/ClassFlowWriteList.cpp
@@ -0,0 +1,97 @@
+#include <sstream>
+#include "ClassFlowWriteList.h"
+#include "Helper.h"
+
+#include "time_sntp.h"
+
+
+#include <time.h>
+
+void ClassFlowWriteList::SetInitialParameter(void)
+{
+    flowpostprocessing = NULL;  
+    previousElement = NULL;
+    ListFlowControll = NULL; 
+    disabled = false;
+}       
+
+ClassFlowWriteList::ClassFlowWriteList()
+{
+    SetInitialParameter();
+}
+
+ClassFlowWriteList::ClassFlowWriteList(std::vector<ClassFlow*>* lfc)
+{
+    SetInitialParameter();
+
+    ListFlowControll = lfc;
+    for (int i = 0; i < ListFlowControll->size(); ++i)
+    {
+        if (((*ListFlowControll)[i])->name().compare("ClassFlowPostProcessing") == 0)
+        {
+            flowpostprocessing = (ClassFlowPostProcessing*) (*ListFlowControll)[i];
+        }
+    }
+}
+
+
+bool ClassFlowWriteList::ReadParameter(FILE* pfile, string& aktparamgraph)
+{
+    std::vector<string> zerlegt;
+
+    aktparamgraph = trim(aktparamgraph);
+
+    if (aktparamgraph.size() == 0)
+        if (!this->GetNextParagraph(pfile, aktparamgraph))
+            return false;
+
+    if (toUpper(aktparamgraph).compare("[MQTT]") != 0)       // Paragraph passt nich zu MakeImage
+        return false;
+
+    while (this->getNextLine(pfile, &aktparamgraph) && !this->isNewParagraph(aktparamgraph))
+    {
+        zerlegt = this->ZerlegeZeile(aktparamgraph);
+/*
+        if ((toUpper(zerlegt[0]) == "USER") && (zerlegt.size() > 1))
+        {
+            this->user = zerlegt[1];
+        }  
+*/
+    }
+   
+    return true;
+}
+
+
+
+bool ClassFlowWriteList::doFlow(string zwtime)
+{
+    std::string line = "";
+
+    std::string result;
+    std::string resulterror = "";
+    std::string resultraw = "";
+    std::string resultrate = "";
+    std::string resulttimestamp = "";
+    string zw = "";
+    string namenumber = "";
+
+    if (flowpostprocessing)
+    {
+        std::vector<NumberPost*>* NUMBERS = flowpostprocessing->GetNumbers();
+
+        for (int i = 0; i < (*NUMBERS).size(); ++i)
+        {
+            result =  (*NUMBERS)[i]->ReturnValue;
+            resultraw =  (*NUMBERS)[i]->ReturnRawValue;
+            resulterror = (*NUMBERS)[i]->ErrorMessageText;
+            resultrate = (*NUMBERS)[i]->ReturnRateValue;
+            resulttimestamp = (*NUMBERS)[i]->timeStamp;
+
+            line = line + resulttimestamp + "\t" + resultraw + "\t" + result + "\t" + resultraw + "\t" + resultrate + "\t" + resulttimestamp + "\t"; 
+
+        }
+    }
+    
+    return true;
+}
--- a/code/components/jomjol_flowcontroll/ClassFlowWriteList.h
+++ b/code/components/jomjol_flowcontroll/ClassFlowWriteList.h
@@ -0,0 +1,22 @@
+#pragma once
+#include "ClassFlow.h"
+#include "ClassFlowPostProcessing.h"
+
+#include <string>
+
+class ClassFlowWriteList :
+    public ClassFlow
+{
+protected:
+	ClassFlowPostProcessing* flowpostprocessing;  
+	void SetInitialParameter(void);        
+
+public:
+    ClassFlowWriteList();
+    ClassFlowWriteList(std::vector<ClassFlow*>* lfc);
+
+    bool ReadParameter(FILE* pfile, string& aktparamgraph);
+    bool doFlow(string time);
+    string name(){return "ClassFlowWriteList";};
+};
+
--- a/code/components/jomjol_image_proc/CFindTemplate.h
+++ b/code/components/jomjol_image_proc/CFindTemplate.h
@@ -20,7 +20,7 @@ struct RefInfo {
    int fastalg_max = -1;
    float fastalg_SAD = -1;
    float fastalg_SAD_criteria = -1;
-    int alignment_algo = 0;             // 0 = "Default" (nur R-Kanal), 1 = "HighAccurity" (RGB-Kanal), 2 = "Fast" (1.x RGB, dann isSimilar)
+    int alignment_algo = 0;             // 0 = "Default" (nur R-Kanal), 1 = "HighAccuracy" (RGB-Kanal), 2 = "Fast" (1.x RGB, dann isSimilar)
 };


--- a/code/components/jomjol_image_proc/CRotateImage.cpp
+++ b/code/components/jomjol_image_proc/CRotateImage.cpp
@@ -156,12 +156,140 @@ void CRotateImage::Rotate(float _angle, int _centerx, int _centery)
    RGBImageRelease();
 }

+
+
+void CRotateImage::RotateAntiAliasing(float _angle, int _centerx, int _centery)
+{
+    int org_width, org_height;
+    float m[2][3];
+
+    float x_center = _centerx;
+    float y_center = _centery;
+    _angle = _angle / 180 * M_PI;
+
+    if (doflip)
+    {
+        org_width = width;
+        org_height = height;
+        height = org_width;
+        width = org_height;
+        x_center =  x_center - (org_width/2) + (org_height/2);
+        y_center =  y_center + (org_width/2) - (org_height/2);
+        if (ImageOrg)
+        {
+            ImageOrg->height = height;
+            ImageOrg->width = width;
+        }
+    }
+    else
+    {
+        org_width = width;
+        org_height = height;
+    }
+
+    m[0][0] = cos(_angle);
+    m[0][1] = sin(_angle);
+    m[0][2] = (1 - m[0][0]) * x_center - m[0][1] * y_center;
+
+    m[1][0] = -m[0][1];
+    m[1][1] = m[0][0];
+    m[1][2] = m[0][1] * x_center + (1 - m[0][0]) * y_center;
+
+    if (doflip)
+    {
+        m[0][2] = m[0][2] + (org_width/2) - (org_height/2);
+        m[1][2] = m[1][2] - (org_width/2) + (org_height/2);
+    }
+
+    int memsize = width * height * channels;
+    uint8_t* odata;
+    if (ImageTMP)
+    {
+        odata = ImageTMP->RGBImageLock();
+    }
+    else
+    {
+        odata = (unsigned char*)GET_MEMORY(memsize);
+    }
+    
+
+    int x_source_1, y_source_1, x_source_2, y_source_2;
+    float x_source, y_source;
+    float quad_ul, quad_ur, quad_ol, quad_or;
+    stbi_uc* p_target;
+    stbi_uc *p_source_ul, *p_source_ur, *p_source_ol, *p_source_or;
+
+    RGBImageLock();
+
+    for (int x = 0; x < width; ++x)
+        for (int y = 0; y < height; ++y)
+        {
+            p_target = odata + (channels * (y * width + x));
+
+            x_source = (m[0][0] * x + m[0][1] * y);
+            y_source = (m[1][0] * x + m[1][1] * y);
+
+            x_source += (m[0][2]);
+            y_source += (m[1][2]);
+
+            x_source_1 = (int)x_source;
+            x_source_2 = x_source_1 + 1;
+            y_source_1 = (int)y_source;
+            y_source_2 = y_source_1 + 1;
+
+            quad_ul = (x_source_2 - x_source) * (y_source_2 - y_source);
+            quad_ur = (1- (x_source_2 - x_source)) * (y_source_2 - y_source);
+            quad_or = (x_source_2 - x_source) * (1-(y_source_2 - y_source));
+            quad_ol = (1- (x_source_2 - x_source)) * (1-(y_source_2 - y_source));
+
+
+            if ((x_source_1 >= 0) && (x_source_2 < org_width) && (y_source_1 >= 0) && (y_source_2 < org_height))
+            {
+                p_source_ul = rgb_image + (channels * (y_source_1 * org_width + x_source_1));
+                p_source_ur = rgb_image + (channels * (y_source_1 * org_width + x_source_2));
+                p_source_or = rgb_image + (channels * (y_source_2 * org_width + x_source_1));
+                p_source_ol = rgb_image + (channels * (y_source_2 * org_width + x_source_2));
+                for (int _channels = 0; _channels < channels; ++_channels)
+                {
+                    p_target[_channels] = (int)((float)p_source_ul[_channels] * quad_ul
+                                                + (float)p_source_ur[_channels] * quad_ur
+                                                + (float)p_source_or[_channels] * quad_or
+                                                + (float)p_source_ol[_channels] * quad_ol);
+                }
+            }
+            else
+            {
+                for (int _channels = 0; _channels < channels; ++_channels)
+                    p_target[_channels] = 255;
+            }
+        }
+
+    //    memcpy(rgb_image, odata, memsize);
+    memCopy(odata, rgb_image, memsize);
+
+    if (!ImageTMP)
+    {
+        stbi_image_free(odata);
+    }
+    if (ImageTMP)
+        ImageTMP->RGBImageRelease();
+
+    RGBImageRelease();
+}
+
+
 void CRotateImage::Rotate(float _angle)
 {
 //    printf("width %d, height %d\n", width, height);
    Rotate(_angle, width / 2, height / 2);
 }

+void CRotateImage::RotateAntiAliasing(float _angle)
+{
+//    printf("width %d, height %d\n", width, height);
+    RotateAntiAliasing(_angle, width / 2, height / 2);
+}
+
 void CRotateImage::Translate(int _dx, int _dy)
 {
    int memsize = width * height * channels;
--- a/code/components/jomjol_image_proc/CRotateImage.h
+++ b/code/components/jomjol_image_proc/CRotateImage.h
@@ -11,7 +11,11 @@ class CRotateImage: public CImageBasis
        CRotateImage(CImageBasis *_org, CImageBasis *_temp, bool _flip = false);

        void Rotate(float _angle);
+        void RotateAntiAliasing(float _angle);
+       
        void Rotate(float _angle, int _centerx, int _centery);
+        void RotateAntiAliasing(float _angle, int _centerx, int _centery);
+
        void Translate(int _dx, int _dy);
        void Mirror();
 };
--- a/code/components/jomjol_influxdb/CMakeLists.txt
+++ b/code/components/jomjol_influxdb/CMakeLists.txt
@@ -0,0 +1,7 @@
+FILE(GLOB_RECURSE app_sources ${CMAKE_CURRENT_SOURCE_DIR}/*.*)
+
+idf_component_register(SRCS ${app_sources}
+                    INCLUDE_DIRS "."
+                    REQUIRES tflite-lib esp_http_client jomjol_logfile)
+
+
--- a/code/components/jomjol_influxdb/interface_influxdb.cpp
+++ b/code/components/jomjol_influxdb/interface_influxdb.cpp
@@ -0,0 +1,114 @@
+#include "interface_influxdb.h"
+
+//#define LOG_LOCAL_LEVEL ESP_LOG_DEBUG
+#include "esp_log.h"
+#include <time.h>
+#include "ClassLogFile.h"
+#include "esp_http_client.h"
+
+#define MAX_HTTP_OUTPUT_BUFFER 2048
+
+static const char *TAG_INTERFACEINFLUXDB = "interface_influxdb";
+
+std::string _influxDBURI;
+std::string _influxDBDatabase;
+std::string _influxDBMeasurement;
+std::string _influxDBUser;
+std::string _influxDBPassword;
+
+static esp_err_t http_event_handler(esp_http_client_event_t *evt)
+{
+    switch(evt->event_id)
+    {
+        case HTTP_EVENT_ERROR:
+            ESP_LOGE(TAG_INTERFACEINFLUXDB, "HTTP Client Error encountered");
+            break;
+        case HTTP_EVENT_ON_CONNECTED:
+            ESP_LOGI(TAG_INTERFACEINFLUXDB, "HTTP Client Connected");
+            break;
+        case HTTP_EVENT_HEADERS_SENT:
+            ESP_LOGV(TAG_INTERFACEINFLUXDB, "HTTP Client sent all request headers");
+            break;
+        case HTTP_EVENT_ON_HEADER:
+            ESP_LOGV(TAG_INTERFACEINFLUXDB, "Header: key=%s, value=%s", evt->header_key, evt->header_value);
+            break;
+        case HTTP_EVENT_ON_DATA:
+            ESP_LOGV(TAG_INTERFACEINFLUXDB, "HTTP Client data recevied: len=%d", evt->data_len);
+            break;
+        case HTTP_EVENT_ON_FINISH:
+            ESP_LOGI(TAG_INTERFACEINFLUXDB, "HTTP Client finished");
+            break;
+         case HTTP_EVENT_DISCONNECTED:
+            ESP_LOGI(TAG_INTERFACEINFLUXDB, "HTTP Client Disconnected");
+            break;
+    }
+    return ESP_OK;
+}
+
+void InfluxDBPublish(std::string _key, std::string _content, std::string _timestamp) {
+    char response_buffer[MAX_HTTP_OUTPUT_BUFFER] = {0};
+    esp_http_client_config_t http_config = {
+       .user_agent = "ESP32 Meter reader",
+       .method = HTTP_METHOD_POST,
+       .event_handler = http_event_handler,
+       .buffer_size = MAX_HTTP_OUTPUT_BUFFER,
+       .user_data = response_buffer
+    };
+
+    if (_influxDBUser.length() && _influxDBPassword.length()){
+       http_config.username = _influxDBUser.c_str();
+       http_config.password = _influxDBPassword.c_str();
+       http_config.auth_type = HTTP_AUTH_TYPE_BASIC;
+    }
+
+    // generate timestamp (TODO: parse result timestamp passed as string and convert it to POSIX timestamp?)
+    time_t now = time(NULL);
+    char nowTimestamp[21];
+    // pad with zeroes to get nanoseconds
+    sprintf(nowTimestamp,"%jd000000000", (intmax_t)now);
+    
+    std::string payload = _influxDBMeasurement + " " + _key + "=" + _content + " " + nowTimestamp;
+    payload.shrink_to_fit();
+    ESP_LOGI(TAG_INTERFACEINFLUXDB, "sending line to influxdb: %s\n", payload.c_str());
+
+    // use the default retention policy of the database
+    std::string apiURI = _influxDBURI + "/api/v2/write?bucket=" + _influxDBDatabase + "/";
+    apiURI.shrink_to_fit();
+    http_config.url = apiURI.c_str();
+    ESP_LOGI(TAG_INTERFACEINFLUXDB, "API URI: %s", apiURI.c_str());
+
+    esp_http_client_handle_t http_client = esp_http_client_init(&http_config);
+    ESP_LOGI(TAG_INTERFACEINFLUXDB, "client is initialized%s\n", "");
+
+    esp_http_client_set_header(http_client, "Content-Type", "text/plain");
+    ESP_LOGI(TAG_INTERFACEINFLUXDB, "header is set%s\n", "");
+
+    ESP_ERROR_CHECK(esp_http_client_set_post_field(http_client, payload.c_str(), payload.length()));
+    ESP_LOGI(TAG_INTERFACEINFLUXDB, "post payload is set%s\n", "");
+
+    esp_err_t err = ESP_ERROR_CHECK_WITHOUT_ABORT(esp_http_client_perform(http_client));
+
+    if( err == ESP_OK ) {
+      ESP_LOGI(TAG_INTERFACEINFLUXDB, "HTTP request was performed%s\n", "");
+      int status_code = esp_http_client_get_status_code(http_client);
+      ESP_LOGI(TAG_INTERFACEINFLUXDB, "HTTP status code %d\n", status_code);
+    } else {
+      ESP_LOGW(TAG_INTERFACEINFLUXDB, "HTTP request failed%s\n", "");
+    }
+    esp_http_client_cleanup(http_client);
+}
+
+
+void InfluxDBInit(std::string _uri, std::string _database, std::string _measurement, std::string _user, std::string _password){
+    _influxDBURI = _uri;
+    _influxDBDatabase = _database;
+    _influxDBMeasurement = _measurement;
+    _influxDBUser = _user;
+    _influxDBPassword = _password;
+ 
+}
+
+void InfluxDBdestroy() {
+}
+
+
--- a/code/components/jomjol_influxdb/interface_influxdb.h
+++ b/code/components/jomjol_influxdb/interface_influxdb.h
@@ -0,0 +1,13 @@
+#ifndef INTERFACE_INFLUXDB_H
+#define INTERFACE_INFLUXDB_H
+
+#include <string>
+#include <map>
+#include <functional>
+
+void InfluxDBInit(std::string _influxDBURI, std::string _database, std::string _measurement, std::string _user, std::string _password);
+void InfluxDBdestroy();
+
+void InfluxDBPublish(std::string _key, std::string _content, std::string _timestamp);
+
+#endif //INTERFACE_INFLUXDB_H
--- a/code/components/jomjol_logfile/ClassLogFile.cpp
+++ b/code/components/jomjol_logfile/ClassLogFile.cpp
@@ -73,7 +73,7 @@ void ClassLogFile::WriteToDedicatedFile(std::string _fn, std::string info, bool

 //    pFile = OpenFileAndWait(_fn.c_str(), "a"); 
    pFile = fopen(_fn.c_str(), "a+");
-    printf("Logfile opened: %s\n", _fn.c_str());
+//    printf("Logfile opened: %s\n", _fn.c_str());

    if (pFile!=NULL) {
        if (_time)
--- a/code/components/jomjol_mqtt/interface_mqtt.h
+++ b/code/components/jomjol_mqtt/interface_mqtt.h
@@ -10,7 +10,7 @@ void MQTTdestroy();

 //void MQTTInit(std::string _mqttURI, std::string _clientid, std::string _user = "", std::string _password = "");

-void MQTTPublish(std::string _key, std::string _content, int retained_flag = 0);
+void MQTTPublish(std::string _key, std::string _content, int retained_flag = 1);            // retained Flag as Standart

 bool MQTTisConnected();

--- a/code/components/jomjol_tfliteclass/CTfLiteClass.cpp
+++ b/code/components/jomjol_tfliteclass/CTfLiteClass.cpp
@@ -87,6 +87,19 @@ void CTfLiteClass::GetInputDimension(bool silent = false)
  }
 }

+int CTfLiteClass::ReadInputDimenstion(int _dim)
+{
+  if (_dim == 0)
+    return im_width;
+  if (_dim == 1)
+    return im_height;
+  if (_dim == 2)
+    return im_channel;
+
+  return -1;
+}
+
+

 int CTfLiteClass::GetAnzOutPut(bool silent)
 {
--- a/code/components/jomjol_tfliteclass/CTfLiteClass.h
+++ b/code/components/jomjol_tfliteclass/CTfLiteClass.h
@@ -9,7 +9,6 @@
 #include "tensorflow/lite/micro/micro_error_reporter.h"
 #include "tensorflow/lite/micro/micro_interpreter.h"
 #include "tensorflow/lite/schema/schema_generated.h"
-//#include "tensorflow/lite/version.h"
 #include "tensorflow/lite/micro/kernels/micro_ops.h"
 #include "esp_err.h"
 #include "esp_log.h"
@@ -65,8 +64,6 @@ class CTfLiteClass
        bool LoadInputImageBasis(CImageBasis *rs);
        void Invoke();
        int GetAnzOutPut(bool silent = true);        
-//        void GetOutPut();
-//        int GetOutClassification();
        int GetOutClassification(int _von = -1, int _bis = -1);

        int GetClassFromImageBasis(CImageBasis *rs);
@@ -74,5 +71,6 @@ class CTfLiteClass

        float GetOutputValue(int nr);
        void GetInputDimension(bool silent);
+        int ReadInputDimenstion(int _dim);
 };

--- a/code/components/jomjol_tfliteclass/server_tflite.cpp
+++ b/code/components/jomjol_tfliteclass/server_tflite.cpp
@@ -314,6 +314,7 @@ esp_err_t handler_wasserzaehler(httpd_req_t *req)
        
        std::vector<HTMLInfo*> htmlinfodig;
        htmlinfodig = tfliteflow.GetAllDigital();  
+
        for (int i = 0; i < htmlinfodig.size(); ++i)
        {
            if (tfliteflow.GetTypeDigital() == Digital)
--- a/code/components/jomjol_wlan/connect_wlan.cpp
+++ b/code/components/jomjol_wlan/connect_wlan.cpp
@@ -151,8 +151,24 @@ void wifi_init_sta(const char *_ssid, const char *_password, const char *_hostna

    if ((_ipadr != NULL) && (_gw != NULL) && (_netmask != NULL))
    {
+    /*
+       tcpip_adapter_dhcpc_stop(TCPIP_ADAPTER_IF_STA);
+        tcpip_adapter_ip_info_t ip_info;
+        int a, b, c, d;
+        strinttoip4(_ipadr, a, b, c, d);
+        IP4_ADDR(&ip_info.ip, a, b, c, d);
+        strinttoip4(_gw, a, b, c, d);
+        IP4_ADDR(&ip_info.gw, a, b, c, d);
+        strinttoip4(_netmask, a, b, c, d);
+        IP4_ADDR(&ip_info.netmask, a, b, c, d);
+
+        tcpip_adapter_set_ip_info(TCPIP_ADAPTER_IF_STA, &ip_info);
+    */
+
+
        ESP_LOGI(TAG, "set IP %s, GW %s, Netmask %s manual", _ipadr, _gw, _netmask);
        esp_netif_dhcpc_stop(my_sta);
+
        esp_netif_ip_info_t ip_info;
        int a, b, c, d;
        strinttoip4(_ipadr, a, b, c, d);
@@ -168,6 +184,22 @@ void wifi_init_sta(const char *_ssid, const char *_password, const char *_hostna
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK(esp_wifi_init(&cfg));

+    if ((_ipadr != NULL) && (_gw != NULL) && (_netmask != NULL))
+    {
+        if (_dns == NULL)
+            _dns = _gw;
+            
+        ESP_LOGI(TAG, "set DNS manual");
+        esp_netif_dns_info_t dns_info;
+        ip4_addr_t ip;
+        ip.addr = esp_ip4addr_aton(_dns);
+        ip_addr_set_ip4_u32(&dns_info.ip, ip.addr);
+        ESP_ERROR_CHECK(esp_netif_set_dns_info(my_sta, ESP_NETIF_DNS_MAIN, &dns_info));
+    }
+
+
+
+
    esp_event_handler_instance_t instance_any_id;
    esp_event_handler_instance_t instance_got_ip;
    ESP_ERROR_CHECK(esp_event_handler_instance_register(WIFI_EVENT,
--- a/code/components/tflite-lib/CMakeLists.txt
+++ b/code/components/tflite-lib/CMakeLists.txt
@@ -1,3 +1,5 @@
+## TODO: GLOB is not a good way to collect files. Use explicit file list instead
+
 cmake_minimum_required(VERSION 3.5)

 set(tflite_dir "${CMAKE_CURRENT_SOURCE_DIR}/tensorflow/lite")
@@ -16,14 +18,32 @@ file(GLOB srcs_kernels
          "${tfmicro_kernels_dir}/*.c"
          "${tfmicro_kernels_dir}/*.cc")

+# remove sources which will be provided by esp_nn
+list(REMOVE_ITEM srcs_kernels
+          "${tfmicro_kernels_dir}/add.cc"
+          "${tfmicro_kernels_dir}/conv.cc"
+          "${tfmicro_kernels_dir}/depthwise_conv.cc"
+          "${tfmicro_kernels_dir}/fully_connected.cc"
+          "${tfmicro_kernels_dir}/mul.cc"
+          "${tfmicro_kernels_dir}/pooling.cc"
+          "${tfmicro_kernels_dir}/softmax.cc")
+
+FILE(GLOB esp_nn_kernels
+          "${tfmicro_kernels_dir}/esp_nn/*.cc")
+
 set(lib_srcs
          "${srcs_micro}"
          "${srcs_kernels}"
+          "${esp_nn_kernels}"
          "${src_micro_frontend}"
          "${tflite_dir}/kernels/kernel_util.cc"
          "${tflite_dir}/micro/memory_planner/greedy_memory_planner.cc"
          "${tflite_dir}/micro/memory_planner/linear_memory_planner.cc"
-          "${tflite_dir}/c/common.c"
+          "${tflite_dir}/micro/arena_allocator/non_persistent_arena_buffer_allocator.cc"
+          "${tflite_dir}/micro/arena_allocator/persistent_arena_buffer_allocator.cc"
+          "${tflite_dir}/micro/arena_allocator/recording_single_arena_buffer_allocator.cc"
+          "${tflite_dir}/micro/arena_allocator/single_arena_buffer_allocator.cc"
+          "${tflite_dir}/c/common.cc"
          "${tflite_dir}/core/api/error_reporter.cc"
          "${tflite_dir}/core/api/flatbuffer_conversions.cc"
          "${tflite_dir}/core/api/op_resolver.cc"
@@ -36,15 +56,17 @@ idf_component_register(
            INCLUDE_DIRS "." "third_party/gemmlowp"
                         "third_party/flatbuffers/include"
                         "third_party/ruy"
-                         "third_party/kissfft")
+                         "third_party/kissfft"
+            REQUIRES "esp-nn")

 # Reduce the level of paranoia to be able to compile TF sources
 target_compile_options(${COMPONENT_LIB} PRIVATE
  -Wno-maybe-uninitialized
  -Wno-missing-field-initializers
+  -DESP_NN # enables ESP-NN optimizations by Espressif
  -Wno-type-limits)

-target_compile_options(${COMPONENT_LIB} PRIVATE -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -O3 -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wmissing-field-initializers -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wstrict-aliasing -Wno-unused-parameter -DESP -DESP_NN -Wno-nonnull -Wno-nonnull -Wno-nonnull)
-target_compile_options(${COMPONENT_LIB} PRIVATE $<$<COMPILE_LANGUAGE:CXX>: -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -O3 -Werror -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wmissing-field-initializers -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wstrict-aliasing -Wno-unused-parameter -DESP -DESP_NN -Wno-return-type -Wno-strict-aliasing -std=gnu++14 -Wno-return-type -Wno-strict-aliasing -std=gnu++14 -Wno-return-type -Wno-strict-aliasing -std=gnu++14 >)
+target_compile_options(${COMPONENT_LIB} PRIVATE -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -O3 -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wmissing-field-initializers -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wstrict-aliasing -Wno-unused-parameter -Wno-nonnull)
+target_compile_options(${COMPONENT_LIB} PRIVATE $<$<COMPILE_LANGUAGE:CXX>: -std=c++11 -fno-rtti -fno-exceptions -fno-threadsafe-statics -fno-unwind-tables -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -O3 -Werror -Wsign-compare -Wdouble-promotion -Wshadow -Wunused-variable -Wmissing-field-initializers -Wunused-function -Wswitch -Wvla -Wall -Wextra -Wstrict-aliasing -Wno-unused-parameter -Wno-return-type -Wno-strict-aliasing -std=gnu++14 >)
 target_compile_options(${COMPONENT_LIB} INTERFACE $<$<IN_LIST:-DTF_LITE_STATIC_MEMORY,$<TARGET_PROPERTY:${COMPONENT_LIB},COMPILE_OPTIONS>>:-DTF_LITE_STATIC_MEMORY>)
 target_link_libraries(${COMPONENT_LIB} PRIVATE -lm)
--- a/code/components/tflite-lib/tensorflow/lite/builtin_op_data.h
+++ b/code/components/tflite-lib/tensorflow/lite/builtin_op_data.h
@@ -0,0 +1,22 @@
+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+// Compatibility shim for new location of interface definitions.
+
+#ifndef TENSORFLOW_LITE_BUILTIN_OP_DATA_H_
+#define TENSORFLOW_LITE_BUILTIN_OP_DATA_H_
+
+#include "tensorflow/lite/c/builtin_op_data.h"
+
+#endif  // TENSORFLOW_LITE_BUILTIN_OP_DATA_H_
--- a/code/components/tflite-lib/tensorflow/lite/builtin_ops.h
+++ b/code/components/tflite-lib/tensorflow/lite/builtin_ops.h
@@ -0,0 +1,193 @@
+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#ifndef TENSORFLOW_LITE_BUILTIN_OPS_H_
+#define TENSORFLOW_LITE_BUILTIN_OPS_H_
+
+// DO NOT EDIT MANUALLY: This file is automatically generated by
+// `schema/builtin_ops_header/generator.cc`.
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+// The enum for builtin operators.
+// Note: CUSTOM, DELEGATE, and PLACEHOLDER_FOR_GREATER_OP_CODES are 3 special
+// ops which are not real built-in ops.
+typedef enum {
+  kTfLiteBuiltinAdd = 0,
+  kTfLiteBuiltinAveragePool2d = 1,
+  kTfLiteBuiltinConcatenation = 2,
+  kTfLiteBuiltinConv2d = 3,
+  kTfLiteBuiltinDepthwiseConv2d = 4,
+  kTfLiteBuiltinDepthToSpace = 5,
+  kTfLiteBuiltinDequantize = 6,
+  kTfLiteBuiltinEmbeddingLookup = 7,
+  kTfLiteBuiltinFloor = 8,
+  kTfLiteBuiltinFullyConnected = 9,
+  kTfLiteBuiltinHashtableLookup = 10,
+  kTfLiteBuiltinL2Normalization = 11,
+  kTfLiteBuiltinL2Pool2d = 12,
+  kTfLiteBuiltinLocalResponseNormalization = 13,
+  kTfLiteBuiltinLogistic = 14,
+  kTfLiteBuiltinLshProjection = 15,
+  kTfLiteBuiltinLstm = 16,
+  kTfLiteBuiltinMaxPool2d = 17,
+  kTfLiteBuiltinMul = 18,
+  kTfLiteBuiltinRelu = 19,
+  kTfLiteBuiltinReluN1To1 = 20,
+  kTfLiteBuiltinRelu6 = 21,
+  kTfLiteBuiltinReshape = 22,
+  kTfLiteBuiltinResizeBilinear = 23,
+  kTfLiteBuiltinRnn = 24,
+  kTfLiteBuiltinSoftmax = 25,
+  kTfLiteBuiltinSpaceToDepth = 26,
+  kTfLiteBuiltinSvdf = 27,
+  kTfLiteBuiltinTanh = 28,
+  kTfLiteBuiltinConcatEmbeddings = 29,
+  kTfLiteBuiltinSkipGram = 30,
+  kTfLiteBuiltinCall = 31,
+  kTfLiteBuiltinCustom = 32,
+  kTfLiteBuiltinEmbeddingLookupSparse = 33,
+  kTfLiteBuiltinPad = 34,
+  kTfLiteBuiltinUnidirectionalSequenceRnn = 35,
+  kTfLiteBuiltinGather = 36,
+  kTfLiteBuiltinBatchToSpaceNd = 37,
+  kTfLiteBuiltinSpaceToBatchNd = 38,
+  kTfLiteBuiltinTranspose = 39,
+  kTfLiteBuiltinMean = 40,
+  kTfLiteBuiltinSub = 41,
+  kTfLiteBuiltinDiv = 42,
+  kTfLiteBuiltinSqueeze = 43,
+  kTfLiteBuiltinUnidirectionalSequenceLstm = 44,
+  kTfLiteBuiltinStridedSlice = 45,
+  kTfLiteBuiltinBidirectionalSequenceRnn = 46,
+  kTfLiteBuiltinExp = 47,
+  kTfLiteBuiltinTopkV2 = 48,
+  kTfLiteBuiltinSplit = 49,
+  kTfLiteBuiltinLogSoftmax = 50,
+  kTfLiteBuiltinDelegate = 51,
+  kTfLiteBuiltinBidirectionalSequenceLstm = 52,
+  kTfLiteBuiltinCast = 53,
+  kTfLiteBuiltinPrelu = 54,
+  kTfLiteBuiltinMaximum = 55,
+  kTfLiteBuiltinArgMax = 56,
+  kTfLiteBuiltinMinimum = 57,
+  kTfLiteBuiltinLess = 58,
+  kTfLiteBuiltinNeg = 59,
+  kTfLiteBuiltinPadv2 = 60,
+  kTfLiteBuiltinGreater = 61,
+  kTfLiteBuiltinGreaterEqual = 62,
+  kTfLiteBuiltinLessEqual = 63,
+  kTfLiteBuiltinSelect = 64,
+  kTfLiteBuiltinSlice = 65,
+  kTfLiteBuiltinSin = 66,
+  kTfLiteBuiltinTransposeConv = 67,
+  kTfLiteBuiltinSparseToDense = 68,
+  kTfLiteBuiltinTile = 69,
+  kTfLiteBuiltinExpandDims = 70,
+  kTfLiteBuiltinEqual = 71,
+  kTfLiteBuiltinNotEqual = 72,
+  kTfLiteBuiltinLog = 73,
+  kTfLiteBuiltinSum = 74,
+  kTfLiteBuiltinSqrt = 75,
+  kTfLiteBuiltinRsqrt = 76,
+  kTfLiteBuiltinShape = 77,
+  kTfLiteBuiltinPow = 78,
+  kTfLiteBuiltinArgMin = 79,
+  kTfLiteBuiltinFakeQuant = 80,
+  kTfLiteBuiltinReduceProd = 81,
+  kTfLiteBuiltinReduceMax = 82,
+  kTfLiteBuiltinPack = 83,
+  kTfLiteBuiltinLogicalOr = 84,
+  kTfLiteBuiltinOneHot = 85,
+  kTfLiteBuiltinLogicalAnd = 86,
+  kTfLiteBuiltinLogicalNot = 87,
+  kTfLiteBuiltinUnpack = 88,
+  kTfLiteBuiltinReduceMin = 89,
+  kTfLiteBuiltinFloorDiv = 90,
+  kTfLiteBuiltinReduceAny = 91,
+  kTfLiteBuiltinSquare = 92,
+  kTfLiteBuiltinZerosLike = 93,
+  kTfLiteBuiltinFill = 94,
+  kTfLiteBuiltinFloorMod = 95,
+  kTfLiteBuiltinRange = 96,
+  kTfLiteBuiltinResizeNearestNeighbor = 97,
+  kTfLiteBuiltinLeakyRelu = 98,
+  kTfLiteBuiltinSquaredDifference = 99,
+  kTfLiteBuiltinMirrorPad = 100,
+  kTfLiteBuiltinAbs = 101,
+  kTfLiteBuiltinSplitV = 102,
+  kTfLiteBuiltinUnique = 103,
+  kTfLiteBuiltinCeil = 104,
+  kTfLiteBuiltinReverseV2 = 105,
+  kTfLiteBuiltinAddN = 106,
+  kTfLiteBuiltinGatherNd = 107,
+  kTfLiteBuiltinCos = 108,
+  kTfLiteBuiltinWhere = 109,
+  kTfLiteBuiltinRank = 110,
+  kTfLiteBuiltinElu = 111,
+  kTfLiteBuiltinReverseSequence = 112,
+  kTfLiteBuiltinMatrixDiag = 113,
+  kTfLiteBuiltinQuantize = 114,
+  kTfLiteBuiltinMatrixSetDiag = 115,
+  kTfLiteBuiltinRound = 116,
+  kTfLiteBuiltinHardSwish = 117,
+  kTfLiteBuiltinIf = 118,
+  kTfLiteBuiltinWhile = 119,
+  kTfLiteBuiltinNonMaxSuppressionV4 = 120,
+  kTfLiteBuiltinNonMaxSuppressionV5 = 121,
+  kTfLiteBuiltinScatterNd = 122,
+  kTfLiteBuiltinSelectV2 = 123,
+  kTfLiteBuiltinDensify = 124,
+  kTfLiteBuiltinSegmentSum = 125,
+  kTfLiteBuiltinBatchMatmul = 126,
+  kTfLiteBuiltinPlaceholderForGreaterOpCodes = 127,
+  kTfLiteBuiltinCumsum = 128,
+  kTfLiteBuiltinCallOnce = 129,
+  kTfLiteBuiltinBroadcastTo = 130,
+  kTfLiteBuiltinRfft2d = 131,
+  kTfLiteBuiltinConv3d = 132,
+  kTfLiteBuiltinImag = 133,
+  kTfLiteBuiltinReal = 134,
+  kTfLiteBuiltinComplexAbs = 135,
+  kTfLiteBuiltinHashtable = 136,
+  kTfLiteBuiltinHashtableFind = 137,
+  kTfLiteBuiltinHashtableImport = 138,
+  kTfLiteBuiltinHashtableSize = 139,
+  kTfLiteBuiltinReduceAll = 140,
+  kTfLiteBuiltinConv3dTranspose = 141,
+  kTfLiteBuiltinVarHandle = 142,
+  kTfLiteBuiltinReadVariable = 143,
+  kTfLiteBuiltinAssignVariable = 144,
+  kTfLiteBuiltinBroadcastArgs = 145,
+  kTfLiteBuiltinRandomStandardNormal = 146,
+  kTfLiteBuiltinBucketize = 147,
+  kTfLiteBuiltinRandomUniform = 148,
+  kTfLiteBuiltinMultinomial = 149,
+  kTfLiteBuiltinGelu = 150,
+  kTfLiteBuiltinDynamicUpdateSlice = 151,
+  kTfLiteBuiltinRelu0To1 = 152,
+  kTfLiteBuiltinUnsortedSegmentProd = 153,
+  kTfLiteBuiltinUnsortedSegmentMax = 154,
+  kTfLiteBuiltinUnsortedSegmentSum = 155,
+  kTfLiteBuiltinAtan2 = 156,
+  kTfLiteBuiltinUnsortedSegmentMin = 157,
+} TfLiteBuiltinOperator;
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+#endif  // TENSORFLOW_LITE_BUILTIN_OPS_H_
--- a/code/components/tflite-lib/tensorflow/lite/c/c_api_types.h
+++ b/code/components/tflite-lib/tensorflow/lite/c/c_api_types.h
@@ -98,6 +98,7 @@ typedef enum {
  kTfLiteResource = 14,
  kTfLiteVariant = 15,
  kTfLiteUInt32 = 16,
+  kTfLiteUInt16 = 17,
 } TfLiteType;

 // Legacy. Will be deprecated in favor of TfLiteAffineQuantization.
@@ -111,6 +112,18 @@ typedef struct TfLiteQuantizationParams {
  int32_t zero_point;
 } TfLiteQuantizationParams;

+// --------------------------------------------------------------------------
+// Opaque types used by c_api.h, c_api_opaque.h and common.h.
+
+// TfLiteOpaqueContext is an opaque version of TfLiteContext;
+typedef struct TfLiteOpaqueContext TfLiteOpaqueContext;
+
+// TfLiteOpaqueNode is an opaque version of TfLiteNode;
+typedef struct TfLiteOpaqueNode TfLiteOpaqueNode;
+
+// TfLiteOpaqueTensor is an opaque version of TfLiteTensor;
+typedef struct TfLiteOpaqueTensor TfLiteOpaqueTensor;
+
 #ifdef __cplusplus
 }  // extern C
 #endif
--- a/code/components/tflite-lib/tensorflow/lite/c/common.cc
+++ b/code/components/tflite-lib/tensorflow/lite/c/common.cc
@@ -14,13 +14,19 @@ limitations under the License.
 ==============================================================================*/

 #include "tensorflow/lite/c/common.h"
+
 #include "tensorflow/lite/c/c_api_types.h"
+#ifdef TF_LITE_TENSORFLOW_PROFILER
+#include "tensorflow/lite/tensorflow_profiler_logger.h"
+#endif

 #ifndef TF_LITE_STATIC_MEMORY
 #include <stdlib.h>
 #include <string.h>
 #endif  // TF_LITE_STATIC_MEMORY

+extern "C" {
+
 size_t TfLiteIntArrayGetSizeInBytes(int size) {
  static TfLiteIntArray dummy;

@@ -34,13 +40,13 @@ size_t TfLiteIntArrayGetSizeInBytes(int size) {

 int TfLiteIntArrayEqual(const TfLiteIntArray* a, const TfLiteIntArray* b) {
  if (a == b) return 1;
-  if (a == NULL || b == NULL) return 0;
+  if (a == nullptr || b == nullptr) return 0;
  return TfLiteIntArrayEqualsArray(a, b->size, b->data);
 }

 int TfLiteIntArrayEqualsArray(const TfLiteIntArray* a, int b_size,
                              const int b_data[]) {
-  if (a == NULL) return (b_size == 0);
+  if (a == nullptr) return (b_size == 0);
  if (a->size != b_size) return 0;
  int i = 0;
  for (; i < a->size; i++)
@@ -52,7 +58,7 @@ int TfLiteIntArrayEqualsArray(const TfLiteIntArray* a, int b_size,

 TfLiteIntArray* TfLiteIntArrayCreate(int size) {
  size_t alloc_size = TfLiteIntArrayGetSizeInBytes(size);
-  if (alloc_size <= 0) return NULL;
+  if (alloc_size <= 0) return nullptr;
  TfLiteIntArray* ret = (TfLiteIntArray*)malloc(alloc_size);
  if (!ret) return ret;
  ret->size = size;
@@ -60,7 +66,7 @@ TfLiteIntArray* TfLiteIntArrayCreate(int size) {
 }

 TfLiteIntArray* TfLiteIntArrayCopy(const TfLiteIntArray* src) {
-  if (!src) return NULL;
+  if (!src) return nullptr;
  TfLiteIntArray* ret = TfLiteIntArrayCreate(src->size);
  if (ret) {
    memcpy(ret->data, src->data, src->size * sizeof(int));
@@ -97,9 +103,14 @@ void TfLiteFloatArrayFree(TfLiteFloatArray* a) { free(a); }
 void TfLiteTensorDataFree(TfLiteTensor* t) {
  if (t->allocation_type == kTfLiteDynamic ||
      t->allocation_type == kTfLitePersistentRo) {
-    free(t->data.raw);
+    if (t->data.raw) {
+#ifdef TF_LITE_TENSORFLOW_PROFILER
+      tflite::OnTfLiteTensorDealloc(t);
+#endif
+      free(t->data.raw);
+    }
  }
-  t->data.raw = NULL;
+  t->data.raw = nullptr;
 }

 void TfLiteQuantizationFree(TfLiteQuantization* quantization) {
@@ -108,31 +119,31 @@ void TfLiteQuantizationFree(TfLiteQuantization* quantization) {
        (TfLiteAffineQuantization*)(quantization->params);
    if (q_params->scale) {
      TfLiteFloatArrayFree(q_params->scale);
-      q_params->scale = NULL;
+      q_params->scale = nullptr;
    }
    if (q_params->zero_point) {
      TfLiteIntArrayFree(q_params->zero_point);
-      q_params->zero_point = NULL;
+      q_params->zero_point = nullptr;
    }
    free(q_params);
  }
-  quantization->params = NULL;
+  quantization->params = nullptr;
  quantization->type = kTfLiteNoQuantization;
 }

 void TfLiteSparsityFree(TfLiteSparsity* sparsity) {
-  if (sparsity == NULL) {
+  if (sparsity == nullptr) {
    return;
  }

  if (sparsity->traversal_order) {
    TfLiteIntArrayFree(sparsity->traversal_order);
-    sparsity->traversal_order = NULL;
+    sparsity->traversal_order = nullptr;
  }

  if (sparsity->block_map) {
    TfLiteIntArrayFree(sparsity->block_map);
-    sparsity->block_map = NULL;
+    sparsity->block_map = nullptr;
  }

  if (sparsity->dim_metadata) {
@@ -141,13 +152,13 @@ void TfLiteSparsityFree(TfLiteSparsity* sparsity) {
      TfLiteDimensionMetadata metadata = sparsity->dim_metadata[i];
      if (metadata.format == kTfLiteDimSparseCSR) {
        TfLiteIntArrayFree(metadata.array_segments);
-        metadata.array_segments = NULL;
+        metadata.array_segments = nullptr;
        TfLiteIntArrayFree(metadata.array_indices);
-        metadata.array_indices = NULL;
+        metadata.array_indices = nullptr;
      }
    }
    free(sparsity->dim_metadata);
-    sparsity->dim_metadata = NULL;
+    sparsity->dim_metadata = nullptr;
  }

  free(sparsity);
@@ -156,16 +167,16 @@ void TfLiteSparsityFree(TfLiteSparsity* sparsity) {
 void TfLiteTensorFree(TfLiteTensor* t) {
  TfLiteTensorDataFree(t);
  if (t->dims) TfLiteIntArrayFree(t->dims);
-  t->dims = NULL;
+  t->dims = nullptr;

  if (t->dims_signature) {
-    TfLiteIntArrayFree((TfLiteIntArray *) t->dims_signature);
+    TfLiteIntArrayFree((TfLiteIntArray*)t->dims_signature);
  }
-  t->dims_signature = NULL;
+  t->dims_signature = nullptr;

  TfLiteQuantizationFree(&t->quantization);
  TfLiteSparsityFree(t->sparsity);
-  t->sparsity = NULL;
+  t->sparsity = nullptr;
 }

 void TfLiteTensorReset(TfLiteType type, const char* name, TfLiteIntArray* dims,
@@ -185,20 +196,16 @@ void TfLiteTensorReset(TfLiteType type, const char* name, TfLiteIntArray* dims,
  tensor->is_variable = is_variable;

  tensor->quantization.type = kTfLiteNoQuantization;
-  tensor->quantization.params = NULL;
+  tensor->quantization.params = nullptr;
 }

 TfLiteStatus TfLiteTensorCopy(const TfLiteTensor* src, TfLiteTensor* dst) {
-  if (!src || !dst)
-    return kTfLiteOk;
-  if (src->bytes != dst->bytes)
-    return kTfLiteError;
-  if (src == dst)
-    return kTfLiteOk;
+  if (!src || !dst) return kTfLiteOk;
+  if (src->bytes != dst->bytes) return kTfLiteError;
+  if (src == dst) return kTfLiteOk;

  dst->type = src->type;
-  if (dst->dims)
-    TfLiteIntArrayFree(dst->dims);
+  if (dst->dims) TfLiteIntArrayFree(dst->dims);
  dst->dims = TfLiteIntArrayCopy(src->dims);
  memcpy(dst->data.raw, src->data.raw, src->bytes);
  dst->buffer_handle = src->buffer_handle;
@@ -216,8 +223,17 @@ void TfLiteTensorRealloc(size_t num_bytes, TfLiteTensor* tensor) {
  // TODO(b/145340303): Tensor data should be aligned.
  if (!tensor->data.raw) {
    tensor->data.raw = (char*)malloc(num_bytes);
+#ifdef TF_LITE_TENSORFLOW_PROFILER
+    tflite::OnTfLiteTensorAlloc(tensor, num_bytes);
+#endif
  } else if (num_bytes > tensor->bytes) {
+#ifdef TF_LITE_TENSORFLOW_PROFILER
+    tflite::OnTfLiteTensorDealloc(tensor);
+#endif
    tensor->data.raw = (char*)realloc(tensor->data.raw, num_bytes);
+#ifdef TF_LITE_TENSORFLOW_PROFILER
+    tflite::OnTfLiteTensorAlloc(tensor, num_bytes);
+#endif
  }
  tensor->bytes = num_bytes;
 }
@@ -229,6 +245,8 @@ const char* TfLiteTypeGetName(TfLiteType type) {
      return "NOTYPE";
    case kTfLiteFloat32:
      return "FLOAT32";
+    case kTfLiteUInt16:
+      return "UINT16";
    case kTfLiteInt16:
      return "INT16";
    case kTfLiteInt32:
@@ -263,14 +281,6 @@ const char* TfLiteTypeGetName(TfLiteType type) {
  return "Unknown type";
 }

-TfLiteDelegate TfLiteDelegateCreate(void) {
-  TfLiteDelegate d = {
-      .data_ = NULL,
-      .Prepare = NULL,
-      .CopyFromBufferHandle = NULL,
-      .CopyToBufferHandle = NULL,
-      .FreeBufferHandle = NULL,
-      .flags = kTfLiteDelegateFlagsNone,
-  };
-  return d;
-}
+TfLiteDelegate TfLiteDelegateCreate() { return TfLiteDelegate{}; }
+
+}  // extern "C"
--- a/code/components/tflite-lib/tensorflow/lite/c/common.h
+++ b/code/components/tflite-lib/tensorflow/lite/c/common.h
@@ -173,8 +173,9 @@ void TfLiteFloatArrayFree(TfLiteFloatArray* a);
    }                                                 \
  } while (false)
 #else  // TF_LITE_STRIP_ERROR_STRINGS
-#define TF_LITE_KERNEL_LOG(context, ...)
-#define TF_LITE_MAYBE_KERNEL_LOG(context, ...)
+#define ARGS_UNUSED(...) (void)sizeof(#__VA_ARGS__)
+#define TF_LITE_KERNEL_LOG(context, ...) ARGS_UNUSED(__VA_ARGS__)
+#define TF_LITE_MAYBE_KERNEL_LOG(context, ...) ARGS_UNUSED(__VA_ARGS__)
 #endif  // TF_LITE_STRIP_ERROR_STRINGS

 // Check whether value is true, and if not return kTfLiteError from
@@ -316,6 +317,7 @@ typedef union TfLitePtrUnion {
  uint8_t* uint8;
  bool* b;
  int16_t* i16;
+  uint16_t* ui16;
  TfLiteComplex64* c64;
  TfLiteComplex128* c128;
  int8_t* int8;
@@ -459,7 +461,8 @@ typedef struct TfLiteTensor {
  // Optional. Encodes shapes with unknown dimensions with -1. This field is
  // only populated when unknown dimensions exist in a read-write tensor (i.e.
  // an input or output tensor). (e.g.  `dims` contains [1, 1, 1, 3] and
-  // `dims_signature` contains [1, -1, -1, 3]).
+  // `dims_signature` contains [1, -1, -1, 3]). Note that this field only
+  // exists when TF_LITE_STATIC_MEMORY is not defined.
  const TfLiteIntArray* dims_signature;
 } TfLiteTensor;

@@ -839,6 +842,12 @@ typedef struct TfLiteContext {
                                   size_t* bytes);
 } TfLiteContext;

+// `TfLiteRegistrationExternal` is an external version of `TfLiteRegistration`
+// for C API which doesn't use internal types (such as `TfLiteContext`) but only
+// uses stable API types (such as `TfLiteOpaqueContext`). The purpose of each
+// field is the exactly the same as with `TfLiteRegistration`.
+typedef struct TfLiteRegistrationExternal TfLiteRegistrationExternal;
+
 typedef struct TfLiteRegistration {
  // Initializes the op from serialized data.
  // Called only *once* for the lifetime of the op, so any one-time allocations
@@ -900,8 +909,31 @@ typedef struct TfLiteRegistration {
  // Note: It is the responsibility of the registration binder to set this
  // properly.
  int version;
+
+  // The external version of `TfLiteRegistration`. Since we can't use internal
+  // types (such as `TfLiteContext`) for C API to maintain ABI stability.
+  // C API user will provide `TfLiteRegistrationExternal` to implement custom
+  // ops. We keep it inside of `TfLiteRegistration` and use it to route
+  // callbacks properly.
+  TfLiteRegistrationExternal* registration_external;
 } TfLiteRegistration;

+// Old version of `TfLiteRegistration` to maintain binary backward
+// compatibility.
+// WARNING: This structure is deprecated / not an official part of the API.
+// It should be only used for binary backward compatibility.
+typedef struct TfLiteRegistration_V1 {
+  void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
+  void (*free)(TfLiteContext* context, void* buffer);
+  TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
+  TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
+  const char* (*profiling_string)(const TfLiteContext* context,
+                                  const TfLiteNode* node);
+  int32_t builtin_code;
+  const char* custom_name;
+  int version;
+} TfLiteRegistration_V1;
+
 // The flags used in `TfLiteDelegate`. Note that this is a bitmask, so the
 // values should be 1, 2, 4, 8, ...etc.
 typedef enum TfLiteDelegateFlags {
--- a/code/components/tflite-lib/tensorflow/lite/context_util.h
+++ b/code/components/tflite-lib/tensorflow/lite/context_util.h
@@ -0,0 +1,51 @@
+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+// This provides a few C++ helpers that are useful for manipulating C structures
+// in C++.
+#ifndef TENSORFLOW_LITE_CONTEXT_UTIL_H_
+#define TENSORFLOW_LITE_CONTEXT_UTIL_H_
+
+#include <stddef.h>
+
+#include "tensorflow/lite/c/common.h"
+
+namespace tflite {
+
+// Provide a range iterable wrapper for TfLiteIntArray* (C lists that TfLite
+// C api uses. Can't use the google array_view, since we can't depend on even
+// absl for embedded device reasons.
+class TfLiteIntArrayView {
+ public:
+  // Construct a view of a TfLiteIntArray*. Note, `int_array` should be non-null
+  // and this view does not take ownership of it.
+  explicit TfLiteIntArrayView(const TfLiteIntArray* int_array)
+      : int_array_(int_array) {}
+
+  TfLiteIntArrayView(const TfLiteIntArrayView&) = default;
+  TfLiteIntArrayView& operator=(const TfLiteIntArrayView& rhs) = default;
+
+  typedef const int* const_iterator;
+  const_iterator begin() const { return int_array_->data; }
+  const_iterator end() const { return &int_array_->data[int_array_->size]; }
+  size_t size() const { return end() - begin(); }
+  int operator[](size_t pos) const { return int_array_->data[pos]; }
+
+ private:
+  const TfLiteIntArray* int_array_;
+};
+
+}  // namespace tflite
+
+#endif  // TENSORFLOW_LITE_CONTEXT_UTIL_H_
--- a/code/components/tflite-lib/tensorflow/lite/core/api/flatbuffer_conversions.cc
+++ b/code/components/tflite-lib/tensorflow/lite/core/api/flatbuffer_conversions.cc
@@ -208,6 +208,14 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
      return ParseBatchToSpaceNd(op, error_reporter, allocator, builtin_data);
    }

+    case BuiltinOperator_BROADCAST_ARGS: {
+      return ParseBroadcastArgs(op, error_reporter, allocator, builtin_data);
+    }
+
+    case BuiltinOperator_BROADCAST_TO: {
+      return ParseBroadcastTo(op, error_reporter, allocator, builtin_data);
+    }
+
    case BuiltinOperator_CALL_ONCE: {
      return ParseCallOnce(op, error_reporter, allocator, builtin_data);
    }
@@ -336,6 +344,10 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
      return ParseLogSoftmax(op, error_reporter, allocator, builtin_data);
    }

+    case BuiltinOperator_LSTM: {
+      return ParseLSTM(op, error_reporter, allocator, builtin_data);
+    }
+
    case BuiltinOperator_MAXIMUM: {
      return ParseMaximum(op, error_reporter, allocator, builtin_data);
    }
@@ -481,6 +493,11 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
      return ParseSquare(op, error_reporter, allocator, builtin_data);
    }

+    case BuiltinOperator_SQUARED_DIFFERENCE: {
+      return ParseSquaredDifference(op, error_reporter, allocator,
+                                    builtin_data);
+    }
+
    case BuiltinOperator_SQUEEZE: {
      return ParseSqueeze(op, error_reporter, allocator, builtin_data);
    }
@@ -605,37 +622,6 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
      *builtin_data = params.release();
      return kTfLiteOk;
    }
-    case BuiltinOperator_LSTM: {
-      auto params = safe_allocator.Allocate<TfLiteLSTMParams>();
-      TF_LITE_ENSURE(error_reporter, params != nullptr);
-      if (const auto* lstm_params = op->builtin_options_as_LSTMOptions()) {
-        params->activation =
-            ConvertActivation(lstm_params->fused_activation_function());
-        params->cell_clip = lstm_params->cell_clip();
-        params->proj_clip = lstm_params->proj_clip();
-        switch (lstm_params->kernel_type()) {
-          case LSTMKernelType_FULL:
-            params->kernel_type = kTfLiteLSTMFullKernel;
-            break;
-          case LSTMKernelType_BASIC:
-            params->kernel_type = kTfLiteLSTMBasicKernel;
-            break;
-          default:
-            TF_LITE_REPORT_ERROR(error_reporter,
-                                 "Unhandled LSTM kernel type: %d",
-                                 lstm_params->kernel_type());
-            return kTfLiteError;
-        }
-        params->asymmetric_quantize_inputs =
-            lstm_params->asymmetric_quantize_inputs();
-      } else {
-        TF_LITE_REPORT_ERROR(error_reporter,
-                             "No valid LSTM builtin options exist");
-        return kTfLiteError;
-      }
-      *builtin_data = params.release();
-      return kTfLiteOk;
-    }
    case BuiltinOperator_UNIDIRECTIONAL_SEQUENCE_LSTM: {
      return ParseUnidirectionalSequenceLSTM(op, error_reporter, allocator,
                                             builtin_data);
@@ -859,14 +845,25 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
    // TODO(aselle): Implement call in BuiltinOptions, but nullptrs are
    // ok for now, since there is no call implementation either.
    case BuiltinOperator_CALL:
+    case BuiltinOperator_COMPLEX_ABS:
    case BuiltinOperator_CONCAT_EMBEDDINGS:
    case BuiltinOperator_COS:
    case BuiltinOperator_CUSTOM:
+    case BuiltinOperator_DENSIFY:
+    case BuiltinOperator_DYNAMIC_UPDATE_SLICE:
    case BuiltinOperator_EMBEDDING_LOOKUP:
    case BuiltinOperator_EQUAL:
+    case BuiltinOperator_HASHTABLE_FIND:
+    case BuiltinOperator_HASHTABLE_IMPORT:
+    case BuiltinOperator_HASHTABLE_SIZE:
+    case BuiltinOperator_IMAG:
    case BuiltinOperator_MATRIX_DIAG:
    case BuiltinOperator_MATRIX_SET_DIAG:
+    case BuiltinOperator_NON_MAX_SUPPRESSION_V4:
+    case BuiltinOperator_NON_MAX_SUPPRESSION_V5:
    case BuiltinOperator_RELU_N1_TO_1:
+    case BuiltinOperator_RELU_0_TO_1:
+    case BuiltinOperator_SCATTER_ND:
    case BuiltinOperator_SELECT:
    case BuiltinOperator_SELECT_V2:
    case BuiltinOperator_SLICE:
@@ -874,24 +871,17 @@ TfLiteStatus ParseOpDataTfLite(const Operator* op, BuiltinOperator op_type,
    case BuiltinOperator_TOPK_V2:
    case BuiltinOperator_TRANSPOSE:
    case BuiltinOperator_RANGE:
-    case BuiltinOperator_SQUARED_DIFFERENCE:
-    case BuiltinOperator_REVERSE_V2:
-    case BuiltinOperator_WHERE:
    case BuiltinOperator_RANK:
-    case BuiltinOperator_NON_MAX_SUPPRESSION_V4:
-    case BuiltinOperator_NON_MAX_SUPPRESSION_V5:
-    case BuiltinOperator_SCATTER_ND:
-    case BuiltinOperator_DENSIFY:
-    case BuiltinOperator_SEGMENT_SUM:
-    case BuiltinOperator_BROADCAST_TO:
-    case BuiltinOperator_RFFT2D:
-    case BuiltinOperator_IMAG:
    case BuiltinOperator_REAL:
-    case BuiltinOperator_COMPLEX_ABS:
-    case BuiltinOperator_HASHTABLE_FIND:
-    case BuiltinOperator_HASHTABLE_IMPORT:
-    case BuiltinOperator_HASHTABLE_SIZE:
-    case BuiltinOperator_BROADCAST_ARGS:
+    case BuiltinOperator_RFFT2D:
+    case BuiltinOperator_SEGMENT_SUM:
+    case BuiltinOperator_REVERSE_V2:
+    case BuiltinOperator_UNSORTED_SEGMENT_MAX:
+    case BuiltinOperator_UNSORTED_SEGMENT_MIN:
+    case BuiltinOperator_UNSORTED_SEGMENT_PROD:
+    case BuiltinOperator_UNSORTED_SEGMENT_SUM:
+    case BuiltinOperator_ATAN2:
+    case BuiltinOperator_WHERE:
      return kTfLiteOk;
    case BuiltinOperator_PLACEHOLDER_FOR_GREATER_OP_CODES:
      return kTfLiteError;
@@ -916,6 +906,9 @@ TfLiteStatus ConvertTensorType(TensorType tensor_type, TfLiteType* type,
    case TensorType_INT16:
      *type = kTfLiteInt16;
      return kTfLiteOk;
+    case TensorType_UINT16:
+      *type = kTfLiteUInt16;
+      return kTfLiteOk;
    case TensorType_INT32:
      *type = kTfLiteInt32;
      return kTfLiteOk;
@@ -1085,6 +1078,22 @@ TfLiteStatus ParseBatchToSpaceNd(const Operator*, ErrorReporter*,
  return kTfLiteOk;
 }

+// We have this parse function instead of directly returning kTfLiteOk from the
+// switch-case in ParseOpData because this function is used as part of the
+// selective registration for the OpResolver implementation in micro.
+TfLiteStatus ParseBroadcastArgs(const Operator*, ErrorReporter*,
+                                BuiltinDataAllocator*, void**) {
+  return kTfLiteOk;
+}
+
+// We have this parse function instead of directly returning kTfLiteOk from the
+// switch-case in ParseOpData because this function is used as part of the
+// selective registration for the OpResolver implementation in micro.
+TfLiteStatus ParseBroadcastTo(const Operator*, ErrorReporter*,
+                              BuiltinDataAllocator*, void**) {
+  return kTfLiteOk;
+}
+
 TfLiteStatus ParseCallOnce(const Operator* op, ErrorReporter* error_reporter,
                           BuiltinDataAllocator* allocator,
                           void** builtin_data) {
@@ -1605,6 +1614,40 @@ TfLiteStatus ParseLogSoftmax(const Operator*, ErrorReporter*,
  return kTfLiteOk;
 }

+TfLiteStatus ParseLSTM(const Operator* op, ErrorReporter* error_reporter,
+                       BuiltinDataAllocator* allocator, void** builtin_data) {
+  CheckParsePointerParams(op, error_reporter, allocator, builtin_data);
+
+  SafeBuiltinDataAllocator safe_allocator(allocator);
+  auto params = safe_allocator.Allocate<TfLiteLSTMParams>();
+  TF_LITE_ENSURE(error_reporter, params != nullptr);
+  if (const auto* lstm_params = op->builtin_options_as_LSTMOptions()) {
+    params->activation =
+        ConvertActivation(lstm_params->fused_activation_function());
+    params->cell_clip = lstm_params->cell_clip();
+    params->proj_clip = lstm_params->proj_clip();
+    switch (lstm_params->kernel_type()) {
+      case LSTMKernelType_FULL:
+        params->kernel_type = kTfLiteLSTMFullKernel;
+        break;
+      case LSTMKernelType_BASIC:
+        params->kernel_type = kTfLiteLSTMBasicKernel;
+        break;
+      default:
+        TF_LITE_REPORT_ERROR(error_reporter, "Unhandled LSTM kernel type: %d",
+                             lstm_params->kernel_type());
+        return kTfLiteError;
+    }
+    params->asymmetric_quantize_inputs =
+        lstm_params->asymmetric_quantize_inputs();
+  } else {
+    TF_LITE_REPORT_ERROR(error_reporter, "No valid LSTM builtin options exist");
+    return kTfLiteError;
+  }
+  *builtin_data = params.release();
+  return kTfLiteOk;
+}
+
 // We have this parse function instead of directly returning kTfLiteOk from the
 // switch-case in ParseOpData because this function is used as part of the
 // selective registration for the OpResolver implementation in micro.
@@ -2156,6 +2199,14 @@ TfLiteStatus ParseSquare(const Operator*, ErrorReporter*, BuiltinDataAllocator*,
  return kTfLiteOk;
 }

+// We have this parse function instead of directly returning kTfLiteOk from the
+// switch-case in ParseOpData because this function is used as part of the
+// selective registration for the OpResolver implementation in micro.
+TfLiteStatus ParseSquaredDifference(const Operator*, ErrorReporter*,
+                                    BuiltinDataAllocator*, void**) {
+  return kTfLiteOk;
+}
+
 TfLiteStatus ParseStridedSlice(const Operator* op,
                               ErrorReporter* error_reporter,
                               BuiltinDataAllocator* allocator,
@@ -2337,6 +2388,31 @@ TfLiteStatus ParseVarHandle(const Operator* op, ErrorReporter* error_reporter,
  return kTfLiteOk;
 }

+TfLiteStatus ParseWhile(const Operator* op, ErrorReporter* error_reporter,
+                        BuiltinDataAllocator* allocator, void** builtin_data) {
+  CheckParsePointerParams(op, error_reporter, allocator, builtin_data);
+
+  SafeBuiltinDataAllocator safe_allocator(allocator);
+  std::unique_ptr<TfLiteWhileParams,
+                  SafeBuiltinDataAllocator::BuiltinDataDeleter>
+      params = safe_allocator.Allocate<TfLiteWhileParams>();
+  TF_LITE_ENSURE(error_reporter, params != nullptr);
+
+  const WhileOptions* schema_params = op->builtin_options_as_WhileOptions();
+
+  if (schema_params != nullptr) {
+    params->cond_subgraph_index = schema_params->cond_subgraph_index();
+    params->body_subgraph_index = schema_params->body_subgraph_index();
+  } else {
+    // TODO(b/157480169): We should either return kTfLiteError or fill in some
+    // reasonable defaults in the params struct. We are not doing so until we
+    // better undertand the ramifications of changing the legacy behavior.
+  }
+
+  *builtin_data = params.release();
+  return kTfLiteOk;
+}
+
 // We have this parse function instead of directly returning kTfLiteOk from the
 // switch-case in ParseOpData because this function is used as part of the
 // selective registration for the OpResolver implementation in micro.
--- a/code/components/tflite-lib/tensorflow/lite/core/api/flatbuffer_conversions.h
+++ b/code/components/tflite-lib/tensorflow/lite/core/api/flatbuffer_conversions.h
@@ -98,6 +98,15 @@ TfLiteStatus ParseBatchToSpaceNd(const Operator* op,
                                 BuiltinDataAllocator* allocator,
                                 void** builtin_data);

+TfLiteStatus ParseBroadcastArgs(const Operator* op,
+                                ErrorReporter* error_reporter,
+                                BuiltinDataAllocator* allocator,
+                                void** builtin_data);
+
+TfLiteStatus ParseBroadcastTo(const Operator* op, ErrorReporter* error_reporter,
+                              BuiltinDataAllocator* allocator,
+                              void** builtin_data);
+
 TfLiteStatus ParseCallOnce(const Operator* op, ErrorReporter* error_reporter,
                           BuiltinDataAllocator* allocator,
                           void** builtin_data);
@@ -232,6 +241,9 @@ TfLiteStatus ParseLogSoftmax(const Operator* op, ErrorReporter* error_reporter,
                             BuiltinDataAllocator* allocator,
                             void** builtin_data);

+TfLiteStatus ParseLSTM(const Operator* op, ErrorReporter* error_reporter,
+                       BuiltinDataAllocator* allocator, void** builtin_data);
+
 TfLiteStatus ParseMaximum(const Operator* op, ErrorReporter* error_reporter,
                          BuiltinDataAllocator* allocator, void** builtin_data);

@@ -344,6 +356,11 @@ TfLiteStatus ParseSqrt(const Operator* op, ErrorReporter* error_reporter,
 TfLiteStatus ParseSquare(const Operator* op, ErrorReporter* error_reporter,
                         BuiltinDataAllocator* allocator, void** builtin_data);

+TfLiteStatus ParseSquaredDifference(const Operator* op,
+                                    ErrorReporter* error_reporter,
+                                    BuiltinDataAllocator* allocator,
+                                    void** builtin_data);
+
 TfLiteStatus ParseStridedSlice(const Operator* op,
                               ErrorReporter* error_reporter,
                               BuiltinDataAllocator* allocator,
@@ -379,6 +396,9 @@ TfLiteStatus ParseVarHandle(const Operator* op, ErrorReporter* error_reporter,
                            BuiltinDataAllocator* allocator,
                            void** builtin_data);

+TfLiteStatus ParseWhile(const Operator* op, ErrorReporter* error_reporter,
+                        BuiltinDataAllocator* allocator, void** builtin_data);
+
 TfLiteStatus ParseZerosLike(const Operator* op, ErrorReporter* error_reporter,
                            BuiltinDataAllocator* allocator,
                            void** builtin_data);
--- a/code/components/tflite-lib/tensorflow/lite/core/api/op_resolver.h
+++ b/code/components/tflite-lib/tensorflow/lite/core/api/op_resolver.h
@@ -23,6 +23,16 @@ limitations under the License.
 #include "tensorflow/lite/core/api/error_reporter.h"
 #include "tensorflow/lite/schema/schema_generated.h"

+// Opaque type similar to TfLiteDelegate / TfLiteOpaqueDelegate.
+// This is used for cases (e.g. when using "TF Lite with Google Play Services")
+// where the TF Lite runtime might be built using a newer (or older)
+// version of the TF Lite sources than the app, and hence might have a
+// different definition of the TfLiteDelegate type. TF Lite APIs use
+// TfLiteOpaqueDelegate rather than TfLiteDelegate when they want to
+// refer to a delegate defined with that potentially different version
+// of the TfLiteDelegate type.
+struct TfLiteOpaqueDelegateStruct;
+
 namespace tflite {

 /// Abstract interface that returns TfLiteRegistrations given op codes or custom
@@ -37,8 +47,10 @@ class OpResolver {
  virtual const TfLiteRegistration* FindOp(const char* op,
                                           int version) const = 0;

+  // Represents a sequence of delegates.
  using TfLiteDelegatePtrVector =
      std::vector<std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>>;
+
  // Returns optional delegates for resolving and handling ops in the flatbuffer
  // model. This may be used in addition to the standard TfLiteRegistration
  // lookup for graph resolution.
@@ -47,16 +59,55 @@ class OpResolver {
    return {};
  }

-  // Represent a function that creates a TfLite delegate instance.
+  // Represents a function that creates a TfLite delegate instance.
  using TfLiteDelegateCreator =
      std::function<std::unique_ptr<TfLiteDelegate, void (*)(TfLiteDelegate*)>(
          int /*num_threads*/)>;
+
+  // Represents a sequence of delegate creator functions.
  using TfLiteDelegateCreators = std::vector<TfLiteDelegateCreator>;
+
  // Returns a vector of delegate creators to create optional delegates for
  // resolving and handling ops in the flatbuffer model. This may be used in
  // addition to the standard TfLiteRegistration lookup for graph resolution.
+  //
+  // Note that this method is not used (will not be called) if you are using
+  // TF Lite in Google Play Services; the GetOpaqueDelegateCreators method
+  // (see below) is used for that case.
  virtual TfLiteDelegateCreators GetDelegateCreators() const { return {}; }

+  // TODO(b/202712825): it would be nice if we could avoid the need for separate
+  // "opaque" types & methods for use only with TF Lite in Google Play Services.
+
+  // Represents an opaque delegate instance.
+  // WARNING: Experimental interface, subject to change.
+  using TfLiteOpaqueDelegatePtr =
+      std::unique_ptr<TfLiteOpaqueDelegateStruct,
+                      void (*)(TfLiteOpaqueDelegateStruct*)>;
+
+  // Represents a function that creates an opaque delegate instance.
+  // WARNING: Experimental interface, subject to change.
+  using TfLiteOpaqueDelegateCreator =
+      std::function<TfLiteOpaqueDelegatePtr(int /*num_threads*/)>;
+
+  // Represents a sequence of opaque delegate creator functions.
+  // WARNING: Experimental interface, subject to change.
+  using TfLiteOpaqueDelegateCreators = std::vector<TfLiteOpaqueDelegateCreator>;
+
+  // Returns a vector of opaque delegate creators to create optional opaque
+  // delegates for resolving and handling ops in the flatbuffer model. This may
+  // be used in addition to the standard TfLiteRegistration lookup for graph
+  // resolution.
+  //
+  // Note that this method will be called only if you are using TF Lite in
+  // Google Play Services; if you are using regular TF Lite, GetDelegateCreators
+  // (see above) is used instead.
+  //
+  // WARNING: Experimental interface, subject to change.
+  virtual TfLiteOpaqueDelegateCreators GetOpaqueDelegateCreators() const {
+    return {};
+  }
+
  virtual ~OpResolver() {}

 private:
--- a/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/fft.cc
+++ b/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/fft.cc
@@ -13,10 +13,10 @@ See the License for the specific language governing permissions and
 limitations under the License.
 ==============================================================================*/
 #include "tensorflow/lite/experimental/microfrontend/lib/fft.h"
-#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"

 #include <string.h>

+#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"

 void FftCompute(struct FftState* state, const int16_t* input,
                int input_scale_shift) {
@@ -37,9 +37,9 @@ void FftCompute(struct FftState* state, const int16_t* input,

  // Apply the FFT.
  kissfft_fixed16::kiss_fftr(
-    reinterpret_cast<kissfft_fixed16::kiss_fftr_cfg>(state->scratch),
-    state->input,
-    reinterpret_cast<kissfft_fixed16::kiss_fft_cpx*>(state->output));
+      reinterpret_cast<kissfft_fixed16::kiss_fftr_cfg>(state->scratch),
+      state->input,
+      reinterpret_cast<kissfft_fixed16::kiss_fft_cpx*>(state->output));
 }

 void FftInit(struct FftState* state) {
--- a/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/fft_util.cc
+++ b/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/fft_util.cc
@@ -13,10 +13,11 @@ See the License for the specific language governing permissions and
 limitations under the License.
 ==============================================================================*/
 #include "tensorflow/lite/experimental/microfrontend/lib/fft_util.h"
-#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"

 #include <stdio.h>

+#include "tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h"
+
 int FftPopulateState(struct FftState* state, size_t input_size) {
  state->input_size = input_size;
  state->fft_size = 1;
--- a/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h
+++ b/code/components/tflite-lib/tensorflow/lite/experimental/microfrontend/lib/kiss_fft_int16.h
@@ -31,4 +31,3 @@ namespace kissfft_fixed16 {
 #undef KISS_FFT_H

 #endif  // TENSORFLOW_LITE_EXPERIMENTAL_MICROFRONTEND_LIB_KISS_FFT_INT16_H_
-
--- a/code/components/tflite-lib/tensorflow/lite/kernels/internal/common.h
+++ b/code/components/tflite-lib/tensorflow/lite/kernels/internal/common.h
@@ -15,6 +15,7 @@ limitations under the License.
 #ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_COMMON_H_
 #define TENSORFLOW_LITE_KERNELS_INTERNAL_COMMON_H_

+#include <algorithm>
 #ifndef ALLOW_SLOW_GENERIC_DEPTHWISECONV_FALLBACK
 #ifdef GEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK
 #define ALLOW_SLOW_GENERIC_DEPTHWISECONV_FALLBACK
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
jomjol	bce99da6e5	v11.2.0	2022-08-28 20:04:36 +02:00
jomjol	234925c850	Merge branch 'rolling'	2022-08-28 19:54:15 +02:00
jomjol	c9b7a5f84c	v11.2.0	2022-08-28 19:52:51 +02:00
jomjol	338184712d	Delete dig-cont_0570_s3_q.tflite	2022-08-27 20:41:14 +02:00
jomjol	b2c510d73e	Rolling 20220827	2022-08-27 20:21:53 +02:00
jomjol	6a5d5511f1	Merge pull request #949 from caco3/beautify-restart Beautify restart	2022-08-27 18:03:03 +02:00
jomjol	ec99ac3a3b	Merge pull request #951 from haverland/rolling Testcases for #942	2022-08-27 18:01:43 +02:00
Frank Haverland	f676f12e02	Testcases for https://github.com/jomjol/AI-on-the-edge-device/issues/942	2022-08-27 16:48:42 +02:00
George Ruinelli	296a50a6d2	.	2022-08-26 23:51:24 +02:00
George Ruinelli	b1ee3d8793	Show progress on reboot and reload page automatically	2022-08-26 23:45:25 +02:00
jomjol	993fbfe5a8	Rolling 2022-08-26	2022-08-26 21:20:26 +02:00
jomjol	2b60e81a52	Rolling 2022-08-24	2022-08-24 17:52:21 +02:00
jomjol	11418459b8	Merge pull request #939 from caco3/rolling Extend Config Page	2022-08-24 17:46:34 +02:00
George Ruinelli	3b8b8e47da	Added link to wiki. Added filter for CNNs: on digital selection show all files except those which contain '/ana' in theri name and vice versa for the analog selection.	2022-08-22 23:48:48 +02:00
jomjol	ae302d49ef	Rolling 2022-08-22	2022-08-22 22:17:32 +02:00
jomjol	0153229d3c	Merge pull request #936 from haverland/rolling add test case to reproduce	2022-08-22 21:07:02 +02:00
jomjol	eeb74dd6fd	Merge pull request #931 from caco3/master added favicon and adjusted window title to show hostname first	2022-08-22 21:01:46 +02:00
Frank Haverland	ecc62a3ba9	add test case to reproduce	2022-08-22 21:01:44 +02:00
jomjol	aca60465f0	v11.1.1	2022-08-22 18:20:00 +02:00
jomjol	57bdca37fc	v11.1.1	2022-08-22 18:12:08 +02:00
jomjol	6409397770	Merge pull request #933 from haverland/rolling Fix for #921, #919	2022-08-22 17:58:30 +02:00
Frank Haverland	974044adf0	last analog result_float as Readout parameter, testcases for #921 , #919	2022-08-22 16:10:07 +02:00
Frank Haverland	59aeeda786	Merge branch 'rolling' into analogtodig_as_float	2022-08-22 10:33:10 +02:00
Frank Haverland	b6bf8d992f	Test case for postprocessing	2022-08-22 10:31:25 +02:00
jomjol	9d31edc67a	Rolling 2022-08-21	2022-08-21 21:33:56 +02:00
George Ruinelli	1b4f4bdd6d	added favicon and adjusted window title to show hostname first	2022-08-21 21:05:59 +02:00
jomjol	c9a879d329	v11.1.0	2022-08-21 17:56:46 +02:00
jomjol	ea69b1be00	v11.1.0	2022-08-21 17:44:34 +02:00
jomjol	2a8b3a87ea	Merge pull request #922 from haverland/rolling Rolling	2022-08-21 16:47:07 +02:00
jomjol	52783734ce	Merge pull request #925 from jochenchrist/master Remove .DS_Store	2022-08-21 16:44:12 +02:00
jomjol	0e1b390ec6	Merge pull request #918 from ppisljar/patch-1 Update FeatureRequest.md	2022-08-21 16:41:30 +02:00
jochen	ab49bdf82f	.DS_Store removed	2022-08-20 19:25:52 +02:00
jochenchrist	25e7051271	Delete .DS_Store	2022-08-20 19:23:08 +02:00
Frank Haverland	7315f9adfc	Merge branch 'jomjol:rolling' into rolling	2022-08-19 21:33:08 +02:00
Frank Haverland	af1aee4ac3	add testcase for #921	2022-08-19 21:30:11 +02:00
Frank Haverland	d6ff7eef88	fix problems with early transition of digits if analog pointers. #921	2022-08-19 21:05:23 +02:00
Frank Haverland	7706b4dbc3	fix for #919 the prev is int, so <9.0 instead of <9.5	2022-08-18 19:28:13 +02:00
Peter Pisljar	3561ecd2b7	Update FeatureRequest.md	2022-08-18 15:16:57 +02:00
jomjol	74c7ff7fdf	v11.0.1	2022-08-15 22:48:42 +02:00
jomjol	a68ce353ad	Merge pull request #910 from haverland/rolling Fix naming of models and new version	2022-08-13 15:58:57 +02:00
Frank Haverland	0d168f3445	Merge branch 'jomjol:rolling' into rolling	2022-08-13 15:38:57 +02:00
Frank Haverland	073e04a3cc	fix naming of models and new versions	2022-08-13 15:37:04 +02:00
jomjol	591dc048d4	v11.0.0	2022-08-13 14:26:04 +02:00
jomjol	bfe8d3b37a	v11.0.0	2022-08-13 14:20:40 +02:00
jomjol	9695dba415	Merge branch 'master' into rolling	2022-08-07 21:20:28 +02:00
jomjol	6a48f0502e	Merge pull request #885 from haverland/rolling CNNThreshold removed vor Analog100 and Digital100	2022-08-07 21:19:37 +02:00
Frank Haverland	4a8d6592ab	CNNThreshold removed for Analog100 and Digital100	2022-07-28 19:43:45 +02:00
jomjol	434aebd641	Merge pull request #881 from haverland/rolling Ignore hidden files in configuration->model selection	2022-07-25 19:11:29 +02:00
Frank Haverland	c124c38e70	ignore hidden files in configuration->model selection	2022-07-25 16:30:11 +02:00
Frank Haverland	e6d60bb124	Merge branch 'jomjol:rolling' into rolling	2022-07-24 20:20:53 +02:00
jomjol	085ea2028c	v10.6.1	2022-07-24 19:07:43 +02:00
jomjol	0e7c600cf7	Rolling v10.6.1	2022-07-24 18:53:25 +02:00
Frank Haverland	3f3532defe	Revert "Fix for #712 "Incorrect rollover digital numbers"" This reverts commit `11bfaf0e91`.	2022-07-20 19:12:19 +02:00
Frank Haverland	a0ffc88e47	Merge branch 'rolling' of https://github.com/haverland/AI-on-the-edge-device into rolling	2022-07-20 18:37:05 +02:00
Frank Haverland	11bfaf0e91	Fix for #712 "Incorrect rollover digital numbers"	2022-07-20 18:35:42 +02:00
jomjol	189093d548	Merge pull request #862 from haverland/rolling Version 1.0 of digital and analog categorical models	2022-07-18 20:33:02 +02:00
Frank Haverland	34eb89b1b6	fix extension finding for tlfite modellist	2022-07-18 19:18:09 +02:00
Frank Haverland	b725d242d3	Add Error to Logfile if output-dimenstion of model inconsistent.	2022-07-18 18:36:42 +02:00
Frank Haverland	568e88314b	Merge branch 'jomjol:rolling' into rolling	2022-07-17 20:00:28 +02:00
jomjol	aaad952782	v10.6.0	2022-07-17 09:34:27 +02:00
jomjol	dc27911174	Preparation for v10.6.0	2022-07-17 09:00:10 +02:00
Frank Haverland	42678ae8e1	Merge branch 'jomjol:rolling' into rolling	2022-07-16 23:09:41 +02:00
Frank Haverland	b6341992f6	Version 1.0 of digital and analog categorical models	2022-07-16 21:29:56 +02:00
jomjol	17fd0f96df	Rolling 20220716_2	2022-07-16 20:59:09 +02:00
jomjol	0b039e8d8c	Merge pull request #861 from haverland/rolling fix false value if analog + dighybrid on last digit if previous >=9.5	2022-07-16 20:07:02 +02:00
Frank Haverland	eab8a6d96b	fix false value if analog + dighybrid on last digit if previous >=9.5	2022-07-16 19:37:30 +02:00
jomjol	9f72bf117e	Rolling 20220716	2022-07-16 08:09:18 +02:00
jomjol	058e9436f0	Merge pull request #858 from haverland/rolling Rolling: Add analog classification model	2022-07-16 07:50:00 +02:00
Frank Haverland	ebd8fe0e4f	Merge branch 'rolling' of https://github.com/haverland/AI-on-the-edge-device into rolling	2022-07-03 21:02:00 +02:00
Frank Haverland	7970f5f691	Add analog classification (100) as modeltype analog100	2022-07-03 21:01:42 +02:00
jomjol	0689ca86c5	Merge branch 'rolling' of https://github.com/jomjol/AI-on-the-edge-device into rolling	2022-07-01 07:27:28 +02:00
jomjol	5783e0f182	Rolling 20220107	2022-07-01 07:27:26 +02:00
jomjol	e190aa8d03	Merge pull request #839 from haverland/rolling Rolling: Add testing for CNNFlowcontroll	2022-06-28 07:09:35 +02:00
Frank Haverland	15bac02cbf	Merge branch 'jomjol:rolling' into rolling	2022-06-26 23:23:16 +02:00
Frank Haverland	7fdacce9a2	Merge branch 'rolling' of https://github.com/haverland/AI-on-the-edge-device into rolling	2022-06-26 23:21:39 +02:00
Frank Haverland	78ea179e9a	added testing to cnnflowcontroll::ZeigerEval/ZeigerEvalHybrid	2022-06-26 23:21:01 +02:00
jomjol	2cfc3fadb2	Rolling 20220626	2022-06-26 21:57:20 +02:00
jomjol	60cdee2ea6	Merge pull request #834 from haverland/rolling Rolling: Fix -1 issue on watermeter (digital100 + analog)	2022-06-26 21:12:43 +02:00
Frank Haverland	a17ac6860d	Merge branch 'jomjol:rolling' into rolling	2022-06-26 13:47:06 +02:00
Frank Haverland	ef24466702	revert comment out in ZeigerEval use ZeigerEvalHybrid instead of ZeigerEval in DoubleHyprid10/Digital100 branch	2022-06-26 13:44:55 +02:00
jomjol	de78027b0e	Merge pull request #837 from toolsfactory/master Added Homie Mqtt request	2022-06-26 12:11:35 +02:00
jomjol	28c40a0ad2	Merge branch 'rolling' into master	2022-06-26 12:11:02 +02:00
Michael Geissler	230bbb2239	Added Homie Mqtt request	2022-06-26 08:14:41 +02:00
Frank Haverland	bda2913a32	cleanup, disable debugging	2022-06-24 14:59:51 +02:00
Frank Haverland	78daaf6e5b	fix -1 on the last digital digit if using digital100 model	2022-06-24 14:58:00 +02:00
jomjol	8939167a39	Rolling 20220618_v3	2022-06-18 16:01:25 +02:00
jomjol	b48c6111d9	Merge pull request #827 from haverland/rolling Rolling: Add hostname to title of configuration	2022-06-18 16:00:24 +02:00
Frank Haverland	de1e919b96	add hostname to configure	2022-06-18 15:39:54 +02:00
jomjol	35a96027f1	Rolling 20220618_v2	2022-06-18 14:42:30 +02:00
jomjol	7a36bfa2ff	Rolling 20220618	2022-06-18 11:10:31 +02:00
jomjol	dfeac0cb7f	Rolling 20220609	2022-06-09 21:16:05 +02:00
jomjol	dfec780157	Merge pull request #817 from haverland/rolling Digital100 model processing added	2022-06-01 20:34:43 +02:00
Frank Haverland	4cc93324f8	Digital100 model	2022-05-31 22:33:12 +02:00
Frank Haverland	bf8833eae6	add Digital100 model processing with output 0.0-9.9 as categorical network	2022-05-30 23:22:17 +02:00
jomjol	00028010ee	Rolling 20220526	2022-05-26 20:31:26 +02:00
jomjol	cce812ff11	Rolling 20220417	2022-04-17 21:51:26 +02:00
jomjol	ccb4bd595c	Rolling 20220415	2022-04-15 14:49:58 +02:00
jomjol	dc7eedcd8f	Merge pull request #774 from wetneb/534-influxdb InfluxDB integration	2022-04-15 14:39:53 +02:00
jomjol	40faa78929	Merge branch 'rolling' into 534-influxdb	2022-04-15 14:39:09 +02:00
Antonin Delpeuch	eb53db00d0	First draft of InfluxDB integration, for #534	2022-04-15 10:19:14 +02:00
jomjol	658ef39736	Rolling 20220322	2022-03-22 20:51:55 +01:00
jomjol	0e90bcb2ef	Rolling 20220320	2022-03-20 21:36:44 +01:00
jomjol	a020fce2d7	Merge pull request #725 from mkelley88/master Update README.md	2022-03-15 20:27:34 +01:00
Matthew T. Kelley	525ccf7aba	Update README.md Made changes to increase clarity of documentation.	2022-03-15 04:50:37 -04:00
jomjol	ebcfc16f63	Create dig-s0-q-20220224.tflite	2022-02-24 21:17:54 +01:00