How to build a high-performance front-end intelligent inference engine

Aliyunqi 2021-02-23 03:38:44

Brief introduction: What is a front-end intelligent inference engine, and how do you build and apply one?

What is a front-end intelligent inference engine

Before discussing the front-end intelligent inference engine, let's start with what "on-device intelligence" is.

On-device intelligence (On-Device Machine Learning) refers to running machine learning applications on the device side. Here, "device side" is relative to cloud services: it can be a mobile phone, an IoT device, and so on.

Traditionally, because of model size and limited machine computing power, much machine learning has been done on the server side, for example Amazon AWS's "Amazon Rekognition Service" and Google's "Google Cloud Vision Service". With the growing computing power of on-device hardware, represented by mobile phones, and the evolution of model design itself, smaller and more capable models can gradually be deployed and run on the device.

-- Reference from 《

Compared with cloud deployment, the app side has more direct access to user characteristics, and it also has the following advantages:

  • Low latency: processing on the device saves the network transmission time of the data.
  • Resource savings: the device's own computing power and storage space are put to full use.
  • Good privacy: data is generated and consumed on the device, which avoids the risk of privacy leaks during transmission.

These are the advantages of on-device intelligence, but it is not a panacea. It still has some limitations:

  • Limited device resources: on-device computing power and storage are limited, so large-scale, high-intensity, continuous computation is not possible.
  • Small algorithm scale: on-device computing power is small and only a single user's data is available, so the algorithm cannot reach the optimum.
  • Limited user data: on-device data is not suitable for long-term storage, and the amount of available data is limited.

Similarly, front-end intelligence refers to running machine learning applications on the front end (web, H5, mini programs, and so on).

So what is a front-end intelligent inference engine?

Put simply, the front-end intelligent inference engine is the component that uses the front end's computing power to execute a model.

Existing front-end inference engines in the industry

Here are three common inference engines:

  • TensorFlow.js (hereinafter tfjs)
  • ONNX.js
  • WebDNN

What matters most for an on-device inference engine? Performance, of course! The better the performance, the more application scenarios become possible on the device. Let's look at how these three inference engines compare:

(The data below was measured with the MobileNetV2 classification model.)

CPU (pure JS computation)

As you can see, in a pure JS environment a single classification takes more than 1500 ms. Imagine a camera that needs to classify the objects it captures in real time (for example, predicting whether the subject is a cat or a dog): every prediction would take 1500 ms, which is intolerable.


In the WASM environment, the best performer, ONNX.js, reaches 135 ms, or about 7 fps, which is barely usable. tfjs, by contrast, takes a poor 1501 ms. The difference is that ONNX.js uses workers for multi-threaded acceleration, so it performs best.
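The latency figures quoted above translate into frame rates with simple arithmetic. The following is a minimal sketch, assuming inferences run back to back with no other per-frame work; `latencyToFps` is a hypothetical helper and not part of any of the engines.

```typescript
// Hypothetical helper, not part of tfjs, ONNX.js, or WebDNN.
// Converts the time one inference takes into the maximum number of
// inferences (frames) that fit into one second.
function latencyToFps(latencyMs: number): number {
  if (latencyMs <= 0) {
    throw new RangeError("latencyMs must be positive");
  }
  return 1000 / latencyMs;
}

// Latencies quoted in this article for MobileNetV2.
const pureJsFps = latencyToFps(1500); // well below 1 frame per second
const onnxWasmFps = latencyToFps(135); // roughly 7 frames per second
```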


Finally, in the GPU environment, tfjs and ONNX.js both reach a relatively good level of performance, while WebDNN performs worse.

Besides the three engines above, there are also Baidu's paddle.js, Taobao's mnn.js, and others, which are not discussed here.

Of course, when choosing a suitable inference engine, you also need to consider the ecosystem, engine maintenance, and so on, in addition to performance. Taken together, tfjs is the most suitable front-end inference engine currently available, because it can rely on TensorFlow's powerful ecosystem, full-time maintenance by Google's official team, and so on. By comparison, the ONNX framework is relatively niche, and ONNX.js has not been maintained for nearly a year. WebDNN is competitive in neither performance nor ecosystem.

High-performance computing solutions on the front end

As the previous section shows, the common ways to do high-performance computing on the front end are WASM and WebGL-based GPU computing. There is also asm.js, which is not discussed here.


Most readers are probably familiar with WASM, so here is only a brief introduction:

WebAssembly is a new type of code that can run in modern web browsers and provides new features and major gains in performance. It is not primarily intended to be written by hand; rather, it is designed to be an effective compilation target for low-level source languages such as C, C++, and Rust.

This has huge implications for the web platform: it provides a way to run code written in multiple languages on the web at near-native speed, with client apps running on the web that previously could not have done so.

What's more, you don't even have to know how to write WebAssembly code to take advantage of it. WebAssembly modules can be imported into a web (or Node.js) app, exposing WebAssembly functions for use via JavaScript. JavaScript frameworks can use WebAssembly to gain massive performance advantages and new features while still making functionality easily available to web developers. -- Excerpted from MDN, 《WebAssembly Concepts》
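To illustrate how a WebAssembly module exposes a function to JavaScript, here is a minimal sketch in TypeScript. The module bytes are written out by hand only to keep the example self-contained (they encode a single exported function `add`); in practice the module would be compiled from C, C++, Rust, or similar. The variable names are illustrative, and the sketch assumes an environment where the standard `WebAssembly` global is available (modern browsers and Node.js).

```typescript
// Bytes of a minimal WebAssembly module exporting add(a: i32, b: i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: a + b
]);

// Synchronous compilation is fine for a tiny module like this one; larger
// modules are normally loaded with the asynchronous WebAssembly.instantiate.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// The exported WebAssembly function is called like any JavaScript function.
const wasmAdd = wasmInstance.exports.add as (a: number, b: number) => number;
const wasmSum = wasmAdd(2, 3);
```

The calling code never touches WebAssembly text or binary details beyond loading the module, which is the point the MDN excerpt makes.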


What? Isn't WebGL for graphics rendering? Isn't it for 3D? How can it do high-performance computing?

Some readers may have heard of the gpgpu.js library, which uses WebGL for general-purpose computation. What is the underlying principle? (To follow the rest of this article, please skim this article first: 《Using WebGL2 for GPU Computing on the Web Front End》.)
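The general idea behind WebGL-based computation can be sketched without a GPU: input data lives in textures, a fragment shader runs once per output texel, and the result is written to another texture. The TypeScript below is a CPU emulation of that pattern under these assumptions; it makes no WebGL calls, and `GpuTexture`, `fetchTexel`, and `runKernel` are hypothetical names used only for illustration.

```typescript
// CPU emulation of the texture-in, texture-out compute pattern.
interface GpuTexture {
  width: number;
  height: number;
  data: Float32Array; // 4 floats (r, g, b, a) per texel, stored row by row
}

type Rgba = [number, number, number, number];

function createTexture(width: number, height: number, values: number[] = []): GpuTexture {
  const data = new Float32Array(width * height * 4);
  data.set(values.slice(0, data.length));
  return { width, height, data };
}

// Plays the role of a texel read inside a shader.
function fetchTexel(tex: GpuTexture, x: number, y: number): Rgba {
  const o = (y * tex.width + x) * 4;
  return [tex.data[o], tex.data[o + 1], tex.data[o + 2], tex.data[o + 3]];
}

// Plays the role of a draw call: the "shader" runs once per output texel.
// On a GPU these invocations run in parallel; here they are a plain loop.
function runKernel(
  width: number,
  height: number,
  shader: (x: number, y: number) => Rgba,
): GpuTexture {
  const out = createTexture(width, height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      out.data.set(shader(x, y), (y * width + x) * 4);
    }
  }
  return out;
}

// Element-wise addition of two 2x1 textures (8 numbers each).
const texA = createTexture(2, 1, [1, 2, 3, 4, 5, 6, 7, 8]);
const texB = createTexture(2, 1, [10, 20, 30, 40, 50, 60, 70, 80]);
const texSum = runKernel(2, 1, (x, y) => {
  const a = fetchTexel(texA, x, y);
  const b = fetchTexel(texB, x, y);
  return [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]];
});
```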

Optimizing inference engine performance

Okay, we now know two ways to do high-performance computing on the front end. But what if the performance of the existing frameworks (tfjs, ONNX.js) does not meet our needs? How can we further improve engine performance and put the engine into production?

The answer: roll up your sleeves and optimize the performance yourself. Yes, it is that simple and blunt. Taking tfjs as an example (the other frameworks follow the same principles), the following describes several techniques for optimizing engine performance.

At the beginning of last year, our team had an in-depth exchange with Google's tfjs team. Google clearly indicated that the future direction of tfjs is WASM computation first, while WebGL computation would receive no new features and would be kept in maintenance mode. However, at this stage browsers and apps do not yet fully support WASM (for example, features such as SIMD and multi-threading), so WASM cannot be used in production for the time being. For now, therefore, we still need to rely on WebGL's computing power. Unfortunately, the WebGL performance of tfjs on mobile devices is still unsatisfactory; in particular, performance on low-end phones cannot meet our business requirements. With no alternative, we had to go in and optimize the engine ourselves. So everything that follows is about WebGL computation.

N techniques for optimizing WebGL high-performance computing

Technique 1: Vectorized computation

Vectorized computation means using GLSL's vec2/vec4/matrix data types for calculation. The GPU's biggest advantage is parallel computing, and vector operations exploit that parallelism as much as possible.

For example, an inner product from a matrix multiplication:
c = a1 * b1 + a2 * b2 + a3 * b3 + a4 * b4;
can be rewritten as
c = dot(vec4(a1, a2, a3, a4), vec4(b1, b2, b3, b4));
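The rewrite above can be checked on the CPU. The following TypeScript is a minimal sketch that mirrors the arithmetic only: `dot4` stands in for GLSL's built-in `dot` on two `vec4` values, and the function names are hypothetical. It shows that processing four elements per step gives the same result as the scalar loop; the speed-up itself comes from the GPU hardware and is not reproduced here.

```typescript
type Float4 = [number, number, number, number];

// CPU stand-in for GLSL's dot(vec4, vec4).
function dot4(a: Float4, b: Float4): number {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Scalar form: one multiply-add per iteration. Assumes equal lengths.
function innerProductScalar(a: number[], b: number[]): number {
  let c = 0;
  for (let i = 0; i < a.length; i++) {
    c += a[i] * b[i];
  }
  return c;
}

// Vectorized form: four elements per iteration, as a shader would do when
// each RGBA texel packs four consecutive values. Inputs whose length is
// not a multiple of 4 are treated as zero-padded.
function innerProductVec4(a: number[], b: number[]): number {
  let c = 0;
  for (let i = 0; i < a.length; i += 4) {
    const va: Float4 = [a[i] ?? 0, a[i + 1] ?? 0, a[i + 2] ?? 0, a[i + 3] ?? 0];
    const vb: Float4 = [b[i] ?? 0, b[i + 1] ?? 0, b[i + 2] ?? 0, b[i + 3] ?? 0];
    c += dot4(va, vb);
  }
  return c;
}

const rowOfA = [1, 2, 3, 4, 5, 6];
const colOfB = [6, 5, 4, 3, 2, 1];
const scalarResult = innerProductScalar(rowOfA, colOfB);
const vectorResult = innerProductVec4(rowOfA, colOfB);
```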

Vectorization should also be combined with memory layout optimization.

Technique 2: Memory layout optimization

If you have read the article mentioned above, 《Using WebGL2 for GPU Computing on the Web Front End》, you will know that all data on the GPU is stored in textures, and a texture is essentially a structure of width n × height m × 4 channels (RGBA). So how do we store a four-dimensional 3 × 224 × 224 × 150 matrix in it? This is where matrix encoding comes in: storing a high-dimensional matrix in a texture of a particular shape using a particular format. The data layout inside the texture in turn affects read and write performance during computation. Take a simple example:
With a conventional memory layout, the computation traverses the matrix row by row or column by column. The GPU cache, however, is tiled, that is, it caches n × n blocks, where n varies by chip. Traversing in this way therefore causes frequent cache misses, which become the performance bottleneck. So we need to optimize performance through the memory layout.
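The two ideas in this section, encoding a tensor into an RGBA texture and choosing a cache-friendly layout, can be sketched as index arithmetic. The TypeScript below is only one possible scheme under stated assumptions, not the layout any particular engine uses: all function names are hypothetical, the texture is assumed to be near-square, and the tiled layout assumes the matrix dimensions are multiples of the tile size. A real engine must also respect the device's maximum texture size.

```typescript
// Smallest near-square RGBA texture that holds `elementCount` values,
// four values per texel.
function textureShapeFor(elementCount: number): { width: number; height: number } {
  const texels = Math.ceil(elementCount / 4);
  const width = Math.ceil(Math.sqrt(texels));
  const height = Math.ceil(texels / width);
  return { width, height };
}

// Texel coordinate and channel of element `index` in a plain row-major encoding.
function texelOf(index: number, width: number): { x: number; y: number; channel: number } {
  const texel = Math.floor(index / 4);
  return { x: texel % width, y: Math.floor(texel / width), channel: index % 4 };
}

// Offset of matrix element (row, col) when the matrix is stored tile by
// tile: each tile x tile block is contiguous, so a kernel working on one
// block touches one compact region instead of many distant rows.
function tiledOffset(row: number, col: number, cols: number, tile: number): number {
  const tilesPerRow = cols / tile;
  const tileIndex = Math.floor(row / tile) * tilesPerRow + Math.floor(col / tile);
  return tileIndex * tile * tile + (row % tile) * tile + (col % tile);
}

// Re-order a row-major matrix into the tiled layout.
function toTiled(matrix: Float32Array, rows: number, cols: number, tile: number): Float32Array {
  const out = new Float32Array(matrix.length);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out[tiledOffset(r, c, cols, tile)] = matrix[r * cols + c];
    }
  }
  return out;
}

// The 3 x 224 x 224 x 150 matrix from the text, flattened and packed.
const shape4d = textureShapeFor(3 * 224 * 224 * 150);

// A 4 x 4 matrix holding 0..15, re-ordered into 2 x 2 tiles.
const rowMajor4x4 = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);
const tiled4x4 = toTiled(rowMajor4x4, 4, 4, 2);
```

In the tiled result, the four values of each 2 × 2 block sit next to each other, which is the property a tiled cache benefits from.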

Technique 3: Graph optimization

A model is made up of individual operators, and on the GPU each operator is implemented as a WebGL program. Every program switch incurs a considerable performance cost. So if there is a way to reduce the number of programs in a model, the performance gain is also considerable.
We fuse the nodes in the graph structure that can be fused (n OPs -> 1 OP) and implement a new OP for each new compute node. This greatly reduces the number of OPs, and with it the number of programs, which improves inference performance, especially on low-end phones.
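As an illustration of the n OP -> 1 OP idea, the sketch below fuses runs of consecutive element-wise operators in a linear chain into a single operator. It is a simplified model under stated assumptions, not the tfjs graph format or the authors' fusion rules: the `Op` interface and function names are hypothetical, and only element-wise operators are considered fusable.

```typescript
interface Op {
  name: string;
  // Per-element function for element-wise ops; left undefined for ops that
  // need the whole tensor and are therefore not fused in this sketch.
  scalar?: (x: number) => number;
  run: (input: Float32Array) => Float32Array;
}

function elementwise(name: string, scalar: (x: number) => number): Op {
  return { name, scalar, run: (input) => input.map((x) => scalar(x)) };
}

// Merge each run of consecutive element-wise ops into one op, so the run
// costs one pass (one program) instead of several.
function fuseElementwise(ops: Op[]): Op[] {
  const fused: Op[] = [];
  for (const op of ops) {
    const prev = fused[fused.length - 1];
    if (prev && prev.scalar && op.scalar) {
      const f = prev.scalar;
      const g = op.scalar;
      fused[fused.length - 1] = elementwise(`${prev.name}+${op.name}`, (x) => g(f(x)));
    } else {
      fused.push(op);
    }
  }
  return fused;
}

function runGraph(ops: Op[], input: Float32Array): Float32Array {
  return ops.reduce((tensor, op) => op.run(tensor), input);
}

const graph: Op[] = [
  elementwise("mul2", (x) => x * 2),
  elementwise("add1", (x) => x + 1),
  elementwise("relu", (x) => Math.max(0, x)),
];
const fusedGraph = fuseElementwise(graph); // three ops become one
const graphInput = new Float32Array([-2, 0, 3]);
const unfusedOutput = runGraph(graph, graphInput);
const fusedOutput = runGraph(fusedGraph, graphInput);
```

Both graphs produce the same values, but the fused one makes a single pass over the data, which corresponds to a single program on the GPU.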

Technique 4: Mixed-precision computation

All of the calculations above are based on conventional floating-point computation, that is, float32 single-precision computation. So can mixed-precision computation, for example mixing float16, float32, and uint8, be implemented on the GPU? The answer is yes. The value of mixed-precision computation on the GPU lies in increasing the effective GPU bandwidth. Each pixel of a WebGL texture contains four RGBA channels, and each channel holds at most 32 bits, so we can store as much data as possible in those 32 bits. If the precision is float16, two float16 values fit in one channel, so the bandwidth is 2 times what it was before; similarly, with uint8 the bandwidth is 4 times. This performance improvement is huge.

Technique N: ...

There are many more optimization techniques; they are not all listed here.
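The bandwidth argument comes down to how many values fit into one 32-bit channel. The TypeScript below is a minimal sketch of the uint8 case, packing four 8-bit values into one 32-bit word, together with the texel counts for each precision. The function names are hypothetical, and how the packed word is carried in a texture channel (for example, which texture format is used) is engine-specific and not covered here.

```typescript
// Pack four 8-bit values into one unsigned 32-bit word, lowest byte first.
function packUint8x4(a: number, b: number, c: number, d: number): number {
  return ((a & 0xff) | ((b & 0xff) << 8) | ((c & 0xff) << 16) | ((d & 0xff) << 24)) >>> 0;
}

// Recover the four 8-bit values from a packed word.
function unpackUint8x4(word: number): [number, number, number, number] {
  return [word & 0xff, (word >>> 8) & 0xff, (word >>> 16) & 0xff, (word >>> 24) & 0xff];
}

// Number of RGBA texels needed for `count` values when every 32-bit channel
// stores values of `bitsPerValue` bits.
function texelsNeeded(count: number, bitsPerValue: 8 | 16 | 32): number {
  const valuesPerTexel = 4 * (32 / bitsPerValue);
  return Math.ceil(count / valuesPerTexel);
}

const packedWord = packUint8x4(1, 2, 3, 255);
const unpackedBytes = unpackUint8x4(packedWord);

// The same 1024 values need fewer texels, and so fewer reads, at lower precision.
const texelsFloat32 = texelsNeeded(1024, 32);
const texelsFloat16 = texelsNeeded(1024, 16);
const texelsUint8 = texelsNeeded(1024, 8);
```

Halving or quartering the number of texels for the same amount of data is where the 2 times and 4 times figures in the text come from.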

Where the engine is used in production

At present, our deeply optimized engine has been deployed in many application scenarios across Ant Group and the Alibaba economy. Typical examples include the pet recognition demonstrated at the beginning of the article, as well as card recognition, the broken-screen camera, and so on.

Elsewhere in the industry, there have been popular applications such as virtual makeup.

Readers of this article are welcome to let their imagination run and discover more interesting intelligent scenarios.

Future outlook

With the upgrading of models and the in-depth optimization of engines on the market, I believe tfjs will shine in more interactive scenarios, such as AI-powered front-end games, AR, and VR. For now, what we need to do is stay calm, stand on the shoulders of giants, keep polishing our engine, and wait for the flowers to bloom.

Author: Green wall


This article is original content from Alibaba Cloud and may not be reproduced without permission.


