R&D Solutions: Yiche Front-End Monitoring System

itread01 2021-02-23 02:02:37


# Background

Self-developed tools exist to solve internal problems. I hope the following questions resonate with you:

1. Do you know whether your important business pages are serving users normally?
2. Can you detect business anomalies quickly, before a problem erupts on a large scale?
3. Can you see a problem intuitively without going to the user's computer, get a view of the whole project, and drill down from the macro view to the micro view to quickly locate the information behind an online alert?
4. When communicating with other departments, can you produce solid evidence, for example that an interface was unreachable at a given time and that the request parameters were correct, to help the server side **troubleshoot** the problem?
5. Product and design colleagues want to improve the user experience, and R&D keeps iterating on feature versions. How effective are the optimizations we believe in, and how do we measure them?
6. Which advertising slot or resource is more valuable? How can we address users' pain points more precisely and better empower the business?

Answering these questions requires the support of **data metrics**. Starting from problems that keep recurring or that cannot be explained to other departments, we set out to build a product that helps us solve them. This is how Yiche Front-End Monitoring came into being. It provides multi-scenario, multi-dimensional real-time monitoring dashboards and full-link monitoring of the browser client, so that the team can move from after-the-fact tracing and remediation to early warning and rapid root-cause location.
After detailed planning, we divided front-end monitoring into four phases: exception monitoring (phase 1), performance monitoring (phase 2), event tracking (phase 3), and behavior collection (phase 4). R&D officially started on June 23, 2020, and the project is now in phase 2.

# Key structure

To meet the requirements above, the monitoring system is divided into four stages: metric collection, metric storage, statistics and analysis, and visualization.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161132813-571147359.png)

**Metric collection stage:** An SDK integrated into the front end collects request, performance, exception, and other metrics, does some simple processing on the client, and then reports them to the server.

**Metric storage stage:** Receives the information reported by the front end. Its main purpose is to persist the data.

**Statistics and analysis stage:** In automatic analysis, the program finds problems through statistics over the data and triggers alerts. In manual analysis, users inspect the specific log data through visual data panels to find the root cause of an anomaly.

**Visualization stage:** Through the visualization platform, users trace user behavior across these metrics (API monitoring, exception monitoring, resource monitoring, performance monitoring) to locate problems.

# Overall architecture

As statistical requirements grew and more front-end applications went live, the data volume rose from a little over 1 million records per day in the early days to about 70 million records per day now. The architecture has also gone through three iterations.
This is the latest version of the architecture. The main flow passes through six layers.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161202658-493211893.png)

**Collection layer:** PC and H5 share one SDK that listens for events and collects metrics. The collected metrics are pushed through a REST interface to Logback. Logback pushes the different types of metric data to the Flume cluster over long-lived connections, and the Flume cluster collects the data and distributes it to Kafka topics for storage.

**Processing layer:** Flink consumes the data in real time. There are three kinds of consumption: offline data landing, real-time ETL + graph, and detail logs.

**Storage layer:** Offline data is stored in HDFS, real-time ETL + graph data is stored in MySQL, and detail logs go to ES.

**Statistics layer:** Metrics are aggregated and counted offline (DW, DM) and in real time (minute -> ten-minute -> hour level).

**Application layer:** Interfaces query the summary tables and the detail data in ES.

**Display layer:** The front end renders charts, reports, details, links, and other information.

# Technical solutions

## Data collection

The initial vision for data collection was to be **non-invasive** to the business: business systems need no modification, only an embedded snippet of code.
All of this collection is therefore handled automatically by the SDK. The SDK listens globally for several kinds of events: errors, resource exceptions, page performance, and API calls.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161216386-1081843.png)

From these listeners, three kinds of metrics are collected.

**Exception collection:** Listens for the `error` and `unhandledrejection` events to capture exceptions from JS, images, CSS, and so on.

**Performance collection:** Calls the browser-native `performance.timing` API to capture page performance metrics.

**Interface collection:** Proxies the global XHR via `Object.defineProperty` to capture the browser's XHR/Fetch requests.

## Data collection SDK architecture

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161239970-1393936010.png)

The SDK is divided into two parts:

**Part one:** The SDK core, which drives the SDK and contains the entry point, core utilities, and generic inference.

**Part two:** Also called the plug-in part (the blue area). It implements the collection of the three kinds of metrics above.

Next we describe the second part in detail: the collection scheme for each metric.

## Exception collection scheme

By listening for `error` events, the SDK can capture exceptions of all kinds (JS errors, image loading, CSS loading, JS loading, Promise rejections, and so on). It also supports capturing InternalError, ReferenceError, and [the rest of the 7 error types](https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Global_Objects/Error).
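
As a rough illustration of the error-type support mentioned above, the sketch below maps a thrown value to one of the seven error type names listed on the linked MDN page. `classifyError` and the `"UnknownError"` fallback label are hypothetical names chosen for this example, not part of the SDK.

```typescript
// The seven error type names from the MDN Error page.
// InternalError is non-standard and only exists in some engines.
const KNOWN_ERROR_TYPES = [
  "EvalError",
  "InternalError",
  "RangeError",
  "ReferenceError",
  "SyntaxError",
  "TypeError",
  "URIError",
] as const;

type KnownErrorType = (typeof KNOWN_ERROR_TYPES)[number];

// Hypothetical helper: returns the error's type name when it is one of the
// known types, "Error" for any other Error, and "UnknownError" for values
// that are not Error instances (for example a thrown string).
function classifyError(
  value: unknown,
): KnownErrorType | "Error" | "UnknownError" {
  if (!(value instanceof Error)) {
    return "UnknownError";
  }
  const errorName = value.name;
  const match = KNOWN_ERROR_TYPES.find((candidate) => candidate === errorName);
  return match ?? "Error";
}
```

A label like this could be attached to each report so that dashboards can group exceptions by type.
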
The key code of the SDK is shown below.

### Listening for events

```typescript
/**
 * Listen for the error and unhandledrejection events to handle exception information.
 *
 * @param {YicheMonitorInstance} instance SDK instance
 */
export default function setupErrorPlugin(instance: YicheMonitorInstance) {
  // JS errors or static resource loading errors
  on('error', (e: Event, url: any, lineno: any) => {
    handleError(instance, e, url, lineno);
  });

  // Promise errors (not supported by IE)
  on('unhandledrejection', (e: any) => {
    handleError(instance, e);
  });
}
```

### Determining the exception type

```typescript
/**
 * W3C mode supports ErrorEvent, so all exception details are taken from it.
 *
 * @param {YicheMonitorInstance} instance SDK instance
 * @param {Event} event Resource error, code error, or unhandled rejection
 */
function handleW3C(instance: YicheMonitorInstance, event: any) {
  switch (event.type) {
    // Script error or resource error
    case 'error':
      if (event instanceof ErrorEvent) {
        reportJSError(instance, event);
      } else {
        reportResourceError(instance, event);
      }
      break;
    // A Promise was rejected and the rejection was not caught
    case 'unhandledrejection':
      reportPromiseError(instance, event);
      break;
  }
}
```

### Capturing exception data

```typescript
/**
 * Report a JS exception.
 *
 * @param {YicheMonitorInstance} instance SDK instance
 * @param {ErrorEvent} event
 */
export default function reportJSError(
  instance: YicheMonitorInstance,
  event: ErrorEvent,
): void {
  // Build the report data
  const report = new ReportDataStruct('error', 'js');
  const errorInfo = event.error
    ? event.error.message
    : `Unknown error: ${event.message}`;

  // Set the error information. This also covers the "Script error" reported
  // for cross-origin scripts loaded without the crossorigin attribute.
  report.setData({
    det: errorInfo.substring(0, 2000),
    des: event.error ? event.error.stack : '',
    defn: event.filename,
    deln: event.lineno,
    delc: event.colno,
    rre: 1,
  });
}
```

### Handling IE compatibility

IE compatibility has to be handled when catching exceptions. The solution for IE is as follows:

```typescript
/**
 * Error fields available in IE 8. For IE 8 we only need the error itself:
 *
 * 1. Error message
 * 2. Error page
 * 3. Error line number (files are usually minified, so line statistics for IE 8 mean little)
 *
 * @param {string} error Error message
 * @param {string | undefined} url URL where the exception occurred
 * @param {number | undefined} lineno Line number of the exception; IE has no column
 */
export function handleIE8Error(
  error: string,
  url?: string | undefined,
  lineno?: number | undefined,
) {
  return {
    colno: 0,
    lineno: lineno,
    filename: url,
    message: error,
    error: {
      message: error,
      stack: `IE8 Error:${error}`,
    },
  } as ErrorEvent;
}

/**
 * In IE 9 the error details have to be read from the event target.
 *
 * @param { Element | any } error The element that raised the exception in IE 9
 */
export function handleIE9Error(error: any) {
  // Get the event
  const event = error.currentTarget.event;
  return {
    colno: event.errorCharacter,
    lineno: event.errorLine,
    filename: event.errorUrl,
    message: event.errorMessage,
    error: {
      message: event.errorMessage,
      stack: `IE9 Error:${event.errorMessage}`,
    },
  } as ErrorEvent;
}
```

## Performance collection scheme

### Browser page loading process

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161302193-1167348100.png)

### Getting performance metrics

Through the browser-native [Navigation Timing API](https://w3c.github.io/navigation-timing/) we can obtain the performance metrics of the **page loading process** shown above and use them for performance analysis. Its timestamps are in milliseconds.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161343315-1496046922.png)

We also rely on the [PerformanceObserver API](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceObserver) to
measure [FCP](https://web.dev/fcp/), [LCP](https://web.dev/lcp/), [FID](https://web.dev/fid/), [TTI](https://web.dev/tti/), [TBT](https://web.dev/tbt/), [CLS](https://web.dev/cls/), and other key metrics.

### Detailed calculation formulas

| Metric | Meaning | Formula |
| --- | --- | --- |
| ttfb | Time to first byte | `timing.responseStart - timing.requestStart` |
| domReady | DOM ready time | `timing.domContentLoadedEventEnd - timing.fetchStart` |
| pageLoad | Full page load time | `timing.loadEventStart - timing.fetchStart` |
| dns | DNS lookup time | `timing.domainLookupEnd - timing.domainLookupStart` |
| tcp | TCP connection time | `timing.connectEnd - timing.connectStart` |
| ssl | SSL connection time | `timing.secureConnectionStart > 0 ? timing.connectEnd - timing.secureConnectionStart : 0` |
| contentDownload | Content transfer time | `timing.responseEnd - timing.responseStart` |
| domParse | DOM parsing time | `timing.domInteractive - timing.responseEnd` |
| resourceDownload | Resource loading time | `timing.loadEventStart - timing.domContentLoadedEventEnd` |
| waiting | Request response time | `timing.responseStart - timing.requestStart` |
| fpt | White screen time (old metric) | `timing.responseEnd - timing.fetchStart` |
| tti | Time to first interaction | `timing.domInteractive - timing.fetchStart` |
| firstByte | First packet time | `timing.responseStart - timing.domainLookupStart` |
| domComplete | DOM completion time | `timing.domComplete - timing.domLoading` |
| fp | White screen time (new metric) | `performance.getEntriesByType('paint')[0]` |
| fcp | First contentful paint | `performance.getEntriesByType('paint')[1]` |
| lcp | Largest contentful paint time on the first screen | `PerformanceObserver` observing `largest-contentful-paint` entries |
| Fast-open ratio | | Sampled PVs with full page load time ≤ a threshold (such as 2s) / total sampled PVs * 100% |
| Slow-open ratio | | Sampled PVs with full page load time ≥ a threshold (such as 5s) / total sampled PVs * 100% |

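
The formulas in the table can be sketched as pure functions over a timing snapshot. This is a minimal sketch, not the SDK's implementation: `TimingSnapshot`, `PageMetrics`, `computePageMetrics`, and `openRatio` are hypothetical names, and the snapshot is assumed to be a plain copy of the `performance.timing` fields so that the code can also run outside the browser.

```typescript
// Subset of the performance.timing fields used by the formulas (milliseconds).
interface TimingSnapshot {
  fetchStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number; // 0 when the page is not served over HTTPS
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventStart: number;
}

interface PageMetrics {
  ttfb: number;
  domReady: number;
  pageLoad: number;
  dns: number;
  tcp: number;
  ssl: number;
  contentDownload: number;
  domParse: number;
  resourceDownload: number;
}

// Applies the table's formulas one by one.
function computePageMetrics(t: TimingSnapshot): PageMetrics {
  return {
    ttfb: t.responseStart - t.requestStart,
    domReady: t.domContentLoadedEventEnd - t.fetchStart,
    pageLoad: t.loadEventStart - t.fetchStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
    contentDownload: t.responseEnd - t.responseStart,
    domParse: t.domInteractive - t.responseEnd,
    resourceDownload: t.loadEventStart - t.domContentLoadedEventEnd,
  };
}

// Percentage of sampled page views whose full load time passes the check.
// Use mode "fast" for the fast-open ratio (load time <= threshold) and
// mode "slow" for the slow-open ratio (load time >= threshold).
function openRatio(
  pageLoadsMs: number[],
  thresholdMs: number,
  mode: "fast" | "slow",
): number {
  if (pageLoadsMs.length === 0) {
    return 0;
  }
  const matching = pageLoadsMs.filter((loadMs) =>
    mode === "fast" ? loadMs <= thresholdMs : loadMs >= thresholdMs,
  ).length;
  return (matching / pageLoadsMs.length) * 100;
}
```

In a browser, the snapshot could be filled from `performance.timing` after the `load` event has fired; before that, `loadEventStart` is still 0 and the derived values would be meaningless.
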
## Network request collection scheme

Network requests are collected by proxying XHR with `Object.defineProperty`. The key code is as follows.

### Rewriting XMLHttpRequest

This part can directly follow the implementation of [ajax-hook](https://github.com/wendux/Ajax-hook).

```javascript
// realXhr, hookFunction, getterFactory and setterFactory are omitted
// from this excerpt (see ajax-hook).
export function hook(proxy) {
  // Keep a reference to the native constructor
  window[realXhr] = window[realXhr] || XMLHttpRequest;

  // Replace the global constructor with a proxy
  XMLHttpRequest = function () {
    const xhr = new window[realXhr]();
    for (let attr in xhr) {
      let type = "";
      try {
        type = typeof xhr[attr];
      } catch (e) {}
      if (type === "function") {
        // Wrap methods
        this[attr] = hookFunction(attr);
      } else {
        // Intercept reads and writes of properties
        Object.defineProperty(this, attr, {
          get: getterFactory(attr),
          set: setterFactory(attr),
          enumerable: true,
        });
      }
    }
    const that = this;
    xhr.getProxy = function () {
      return that;
    };
    this.xhr = xhr;
  };

  return window[realXhr];
}
```

### Intercepting all requests

A page normally calls multiple interfaces, say 20 requests. We want to wait until all the requests in a phase have finished, then merge them into a single record and report it in one go, which effectively reduces the number of concurrent reporting requests.
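
The merge-and-report idea described above can be sketched without any XHR details. In this minimal sketch, `RequestOutcome`, `BatchReport`, `RequestBatch`, and the `flush` callback are hypothetical names, and the rule "report once every started request has finished" is an assumption that mirrors the description.

```typescript
interface RequestOutcome {
  url: string;
  httpStatus: number; // 0 means the request never received a response
  durationMs: number;
}

interface BatchReport {
  succeeded: RequestOutcome[];
  failed: RequestOutcome[];
}

class RequestBatch {
  private started = 0;
  private succeeded: RequestOutcome[] = [];
  private failed: RequestOutcome[] = [];
  private flush: (report: BatchReport) => void;

  constructor(flush: (report: BatchReport) => void) {
    this.flush = flush;
  }

  // Call when a request is sent.
  begin(): void {
    this.started += 1;
  }

  // Call when a request finishes, whether it succeeded or failed.
  finish(outcome: RequestOutcome): void {
    const ok =
      (outcome.httpStatus >= 200 && outcome.httpStatus < 300) ||
      outcome.httpStatus === 304;
    if (ok) {
      this.succeeded.push(outcome);
    } else {
      this.failed.push(outcome);
    }

    // Every started request has finished: emit one merged report and reset.
    if (this.succeeded.length + this.failed.length === this.started) {
      this.flush({ succeeded: this.succeeded, failed: this.failed });
      this.started = 0;
      this.succeeded = [];
      this.failed = [];
    }
  }
}
```

With this counting rule, a batch closes as soon as no request is in flight, so requests that start later simply open a new batch.
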
The key code of the plug-in is as follows:

```typescript
/**
 * Ajax request plug-in
 *
 * @author wubaiqing
 */
// `proxy` is the ajax-hook-style interceptor. filterDomain, handleReportDataStruct,
// handleAllRequestData and handleErrorData are SDK helpers omitted from this excerpt.

// Records of in-flight requests, and a counter of the requests in this batch
let allRequestRecordArray: any = [];
let allRequestRecordCount: any = [];
// Successful requests (2xx and 304)
let allRequestData: any = [];
// Failed requests (timeouts, 405, and so on)
let errorData: any = [];

/**
 * Listen for Ajax request information.
 *
 * @param {YicheMonitorInstance} instance SDK instance
 */
export default function setupAjaxPlugin(instance: YicheMonitorInstance) {
  let id = 0;
  proxy({
    onRequest: (config, handler) => {
      // Skip requests to Tingyun, Sherlock Holmes, and APM
      if (filterDomain(config)) {
        // Add a record to the request queue
        allRequestRecordArray.push({
          id,
          timeStamp: new Date().getTime(), // Start time, used to compute the request duration
          config, // Request address, body, and so on
          handler, // XHR entity
        });
        // Count the request
        allRequestRecordCount.push(1);
        id++;
      }
      handler.next(config);
    },
    // Triggered once when a request fails
    onError: (err, handler) => {
      if (allRequestRecordArray.length === 0) {
        handler.next(err);
        return;
      }
      // Iterate backwards so that splice does not skip the next element
      for (let i = allRequestRecordArray.length - 1; i >= 0; i--) {
        const currentData = allRequestRecordArray[i];
        if (
          currentData.handler.xhr.status === 0 && // No response was received
          currentData.handler.xhr.readyState === 4
        ) {
          errorData.push(
            JSON.stringify(handleReportDataStruct(instance, currentData)),
          );
          allRequestRecordArray.splice(i, 1);
        }
      }
      sendAllRequestData(instance);
      handler.next(err);
    },
    onResponse: (response, handler) => {
      // Nothing to do when no request was recorded
      if (allRequestRecordArray.length === 0) {
        handler.next(response);
        return;
      }
      // Iterate backwards so that splice does not skip the next element
      for (let i = allRequestRecordArray.length - 1; i >= 0; i--) {
        const currentData = allRequestRecordArray[i];
        // A request counts as finished once loading completes, whether it succeeded or failed
        if (currentData.handler.xhr.readyState === 4) {
          if (
            (currentData.handler.xhr.status >= 200 &&
              currentData.handler.xhr.status < 300) ||
            currentData.handler.xhr.status === 304
          ) {
            // Normal request
            allRequestData.push(
              JSON.stringify(handleReportDataStruct(instance, currentData)),
            );
          } else if (currentData.handler.xhr.status > 0) {
            // Failed request that has a status code
            errorData.push(
              JSON.stringify(handleReportDataStruct(instance, currentData)),
            );
          }
          // Remove the finished record from the queue
          allRequestRecordArray.splice(i, 1);
        }
      }
      // Send the data
      sendAllRequestData(instance);
      handler.next(response);
    },
  });
}

function sendAllRequestData(instance) {
  // Report only after every counted request has finished
  if (
    allRequestData.length + errorData.length ===
    allRequestRecordCount.length
  ) {
    // Report the merged request data
    if (allRequestData.length > 0 || errorData.length > 0) {
      handleAllRequestData(instance);
    }
    // Report the failed requests
    if (errorData.length > 0) {
      handleErrorData(instance);
    }
    // Reset the queue, the counter, and both result lists
    allRequestRecordArray = [];
    allRequestRecordCount = [];
    allRequestData = [];
    errorData = [];
  }
}
```

## Loading the probe

There are two ways to load the probe, each with its own advantages and disadvantages:

**Synchronous loading:** The collection SDK is placed in the head, before all other JS requests. Loading order matters: if the SDK is placed after other JS requests, exceptions thrown by the earlier JS cannot be captured. Because the JS resource is loaded up front, this has some impact on performance.

**Asynchronous loading:** The collection SDK is injected into the page by executing JS. If we can guarantee that the first-screen JS throws no exceptions, the SDK can be loaded asynchronously, which benefits first-screen optimization.
We currently use the first approach, **synchronous loading**.

# Product screenshots

## Home page

The home page shows the status of all applications, so the abnormal data of each application can be spotted at a glance.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161416646-1203440516.png)

## Application dashboard page

To check the details of an application, open that application's dashboard page. It shows the status of the application's important front-end information over the past hour. The main measures at present are page performance, resource exceptions, JS exceptions, and API success rate.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161425488-1297879458.png)

## Details page

On the details page you can see the detailed data of the application. It helps the team move from after-the-fact tracing and remediation to early warning and rapid root-cause location.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161450754-1508935139.png)

# Problems encountered

After the SDK collected and reported the metrics, some **pre-processing** filter operations used to run on the reporting server, such as:

- Masking blacklisted entries.
- Peak shaving and valley filling of metrics.
- Transformation of application information.
- Obtaining the client IP.
- Token verification.

Pre-processing has a drawback: the server has to parse and convert every record. When the data volume reached about 70 million records per day, the reporting server could no longer cope. So we changed **pre-processing** of the data into **post-processing** after the data has landed. Post-processing filters out blacklisted entries and abnormal metrics during data cleaning. This reduces the pressure on the reporting server.
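
A post-processing cleaning step of this kind might look like the sketch below. The record shape, the `appId` blacklist, and the plausibility bound are assumptions made for illustration; the article does not describe the actual cleaning rules, and `cleanRecords` is a hypothetical name.

```typescript
// Hypothetical shape of a landed raw record.
interface RawRecord {
  appId: string;
  metric: string;
  value: number; // metric value in milliseconds
}

// Hypothetical cleaning rules.
interface CleaningRules {
  blockedAppIds: Set<string>; // blacklisted applications
  maxPlausibleValueMs: number; // values above this are treated as abnormal
}

// Returns a new array without blacklisted or abnormal records.
// The input array is left untouched, so the raw data stays available.
function cleanRecords(records: RawRecord[], rules: CleaningRules): RawRecord[] {
  return records.filter(
    (record) =>
      !rules.blockedAppIds.has(record.appId) &&
      Number.isFinite(record.value) &&
      record.value >= 0 &&
      record.value <= rules.maxPlausibleValueMs,
  );
}
```

Because the filter runs during cleaning rather than on the reporting server, the reporting path only has to accept and persist records.
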
The warehouse also keeps all of the raw data, so if something goes wrong we can trace the source and restore the data.

# Overall plan

The plan is divided into four phases. We are currently in phase 2, performance monitoring.

| Plan | Goal | Priority | Supported platforms | Main problems to solve |
| :---: | :---: | :---: | :---: | :---: |
| Phase 1 | Exception monitoring | High | PC, Mobile, mini programs | Scope of user impact of exceptions; awareness of resource loading exceptions; awareness of network request exceptions; awareness of code errors; analysis of code error details (SourceMap) |
| Phase 2 | Performance monitoring | High | | Performance values (first byte, DOMReady, full page load, redirect, DNS, TCP, request response, and so on); API monitoring (success rate, time taken on success, number of failures, and so on); statistics on resources referenced by the page and the share of each resource type (JS, CSS, images, fonts, iFrame, Ajax, and so on); numeric comparison across the 95% user, the 99% user, and the average user |
| Phase 3 | Event tracking | Medium | | Operating system, resolution, browser; event categories (click events, scroll events); specific event types (clicking a banner image); time of the event; position where the event was triggered (mouse X and Y, from which heat maps can be generated); visitor ID; user ID; link acquisition |
| Phase 4 | Behavior collection | Low | | Entering a page; leaving a page; clicking an element; scrolling a page; operation path; custom events (such as clicking an advertising slot image); a Chrome plug-in for viewing tracking points intuitively |

# Other

A self-developed APM system is easy to connect and integrate with internal systems. For example, SourceMap files can be pushed directly after an application is released, and page performance can be analyzed automatically after a release goes live. If there is no need to build such a system at the current stage of development but the business still needs the capability, third-party products are worth considering.

## Commercial product analysis

| | Yiche | [Tingyun](https://www.tingyun.com/) | [Alibaba Cloud ARMS](https://www.aliyun.com/product/arms) | [Fundebug](https://www.yuque.com/ab93na/project/fcqxvc?inner=2HKDM) | [Yueying](https://yueying.effirst.com) | [FrontJS](https://www.frontjs.com/) |
| --- | --- | --- | --- | --- | --- | --- |
| Page performance monitoring | Full-featured | Basic | Full-featured | Weak | Full-featured | Full-featured |
| Exception monitoring | Basic | Basic | Full-featured | Full-featured | Full-featured | Full-featured |
| API monitoring | Full-featured | Basic | Full-featured | Basic | Basic | Basic |
| Page load waterfall | None | Full-featured | Basic | None | None | Full-featured |
| Interactivity | Good | Average | Good | Unclear | Good | Good |

## Key metrics compared with Alibaba Cloud ARMS

We compared some key metrics between Yiche Front-End Monitoring and Alibaba Cloud ARMS. The **mean values** differ by about 5%-8%.

![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161632849-1036373400.jpg)
![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161640796-828289541.jpg)
![](https://img2020.cnblogs.com/blog/328599/202102/328599-20210222161645032-1511614805.jpg)

## Reference links

- [User Timing API](https://w3c.github.io/user-timing/)
- [Long Tasks API](https://w3c.github.io/longtasks/)
- [Element Timing API](https://wicg.github.io/element-timing/)
- [Navigation Timing API](https://w3c.github.io/navigation-timing/)
- [Resource Timing API](https://w3c.github.io/resource-timing/)
- [Server timing](https://w3c.github.io/server-timing/)
- [Custom Metrics](https://web.dev/custom-metrics/)
- [Lighthouse performance scoring](https://web.dev/performance-scoring/)
- [FCP](https://web.dev/fcp/)
- [LCP](https://web.dev/lcp/)
- [FID](https://web.dev/fid/)
- [TTI](https://web.dev/tti/)
- [TBT](https://web.dev/tbt/)
- [CLS](https://web.dev/cls/)

Copyright notice
This article was written by [itread01]. Please include a link to the original when reposting. Thank you.
https://qdmana.com/2021/02/20210222210428971c.html
