Technology Sharing

Front-end data collection and reporting

2024-07-12

What is event tracking?

Event tracking (often translated literally from the Chinese term as "buried points") refers to the techniques and implementation process for capturing, processing, and sending data about user behavior or business processes.
"Buried point" is a term of art in the data field and is also in common use across the Internet industry.

Tracking data is the foundation of product data analysis. It is typically used for recommendation-system feedback, for monitoring and analyzing user behavior, and for measuring the effect of new features or operational campaigns.

Tracking is built on two important concepts, event and param, together with the values a param can take.

  • Event: Something that happened in the application, such as a user operation, a system event, or a system error. For example, the Nipai product includes the following events: enter_page (enter page), leave_page (leave page).
  • Attribute (param): An attribute defined to describe the event or a user group segment, such as language preference or geographic location. Taking the "enter after-class exercise" event as an example, it carries the following event attributes: enter_from (the page the user came from), class_id (course id), etc.
  • Attribute value: The concrete value an attribute takes when the behavior is triggered. For example: enter_from: home, system, etc.
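The relationship between event, attribute, and attribute value can be sketched as a small TypeScript type. The names TrackEvent and buildEvent, and the event name enter_exercise, are illustrative assumptions and do not come from any SDK mentioned in this article; enter_from and class_id are the attributes named above.

```typescript
// Hypothetical shapes for illustration; not taken from any specific SDK.
type ParamValue = string | number | boolean;

interface TrackEvent {
  event: string;                       // what happened, e.g. "enter_page"
  params: Record<string, ParamValue>;  // attributes describing the event
}

// Builds an event object. The params are copied so that later mutation of the
// caller's object does not change an event that was already created.
function buildEvent(event: string, params: Record<string, ParamValue> = {}): TrackEvent {
  return { event, params: { ...params } };
}

// "enter_exercise" is a made-up event name; the attribute values are examples.
const enterExercise = buildEvent("enter_exercise", { enter_from: "home", class_id: 1024 });
```

Here enter_from is the attribute and "home" is its attribute value for this particular trigger.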

Mainstream solutions

Traceless tracking (full tracking) uses the listeners built into the browser or app to monitor page views, clicks, and other user behavior. It is generally used for coarse-grained data analysis; the company's slardar is an example.

Disadvantages:

  • The data is noisy: everything is collected whether it is useful or not
  • Tracking points cannot be customized, so specific events and business attributes cannot be collected
  • Less information is available to DA

Advantages:

  • Easy to adopt, almost non-intrusive, no additional development cost
  • User operations are captured very completely, with almost nothing missed

Code tracking: front-end developers write custom monitoring and collection logic in the code.

Disadvantages:

  • The workload is large, and the approach is highly invasive to the code, which makes later maintenance inconvenient.

Advantages:

  • Tracking points can be placed precisely, with clear event identification
  • Business attributes are very rich
  • The way a tracking point is triggered can be defined flexibly
  • DA can use the data more conveniently and accurately

Tracking SDK: the SDK exposes an interface for reporting tracking points, and the monitoring and collection process is transparent to developers. The company's tea is an example.

Disadvantages: NA

Advantages:

  • Business developers only need to focus on event identification, business attributes, and so on
  • It combines the advantages of both traceless tracking and code tracking
Common tracking attributes

The front end usually reports tracking points per page. Common event attributes are as follows:

Attribute | Description
uid | User ID. If the user is not logged in, a specific ID is returned.
url | The URL of the page that triggers the current event
eventTime | Timestamp at which the tracking point was triggered
localTime | Local time of the user who triggered the tracking point, in the standard YYYY-MM-DD HH:mm:ss format, which makes direct string queries convenient later
deviceType | Type of device the user is currently using, such as Apple, Samsung, Chrome
deviceId | ID of the device the user is currently using
osType | System type: windows, macos, ios, android
osVersion | System version
appVersion | Application version
appId | Current application id
extra | Custom data, usually a serialized string; its data structure should remain stable
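As a minimal sketch of how some of these attributes could be assembled, the snippet below formats localTime as YYYY-MM-DD HH:mm:ss and serializes extra. The function names and the choice of JSON for extra are assumptions made for illustration; only the field names come from the table.

```typescript
// Field names follow the attribute table; everything else is illustrative.
interface CommonAttributes {
  uid: string;
  url: string;
  eventTime: number;  // millisecond timestamp
  localTime: string;  // "YYYY-MM-DD HH:mm:ss" in the user's local time zone
  extra?: string;     // serialized custom data
}

const pad2 = (n: number): string => (n < 10 ? "0" + n : String(n));

// Formats a Date with the local-time getters, so the result reflects the
// user's own clock rather than UTC.
function formatLocalTime(d: Date): string {
  const date = `${d.getFullYear()}-${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
  const time = `${pad2(d.getHours())}:${pad2(d.getMinutes())}:${pad2(d.getSeconds())}`;
  return `${date} ${time}`;
}

// Assumption: extra is serialized as JSON. Any stable serialization would do.
function buildCommonAttributes(uid: string, url: string, now: Date, extra?: object): CommonAttributes {
  const attrs: CommonAttributes = {
    uid,
    url,
    eventTime: now.getTime(),
    localTime: formatLocalTime(now),
  };
  if (extra !== undefined) attrs.extra = JSON.stringify(extra);
  return attrs;
}
```

Because the format is fixed-width and zero-padded, plain string comparison on localTime sorts chronologically, which is what makes direct string queries practical.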

Common tracking events

Event | When it is reported | Description
Page stay | When the current page is switched or unloaded | Records how long the previous page was viewed
pv | When entering the page | Page view count; uv can be derived by de-duplicating on deviceId
Interaction events | When a user interaction event is triggered | For example, click, long press, etc.
Logical events | When the logical conditions are met | For example, login, page jump, etc.
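The page-stay and uv rows can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: createStayTimer and countUv are made-up names, and the clock is passed in as a parameter so the logic does not depend on a browser (in a page you would pass Date.now).

```typescript
interface StayRecord {
  page: string;
  duration: number; // milliseconds spent on the page
}

// Tracks how long the user stays on each page.
function createStayTimer(now: () => number) {
  let currentPage: string | null = null;
  let enteredAt = 0;
  return {
    // Call on every page switch. Returns the stay record of the page being
    // left, or null when this is the first page.
    enter(page: string): StayRecord | null {
      const t = now();
      const previous = currentPage === null ? null : { page: currentPage, duration: t - enteredAt };
      currentPage = page;
      enteredAt = t;
      return previous;
    },
  };
}

// uv is derived from pv records by de-duplicating on deviceId.
function countUv(pvRecords: { deviceId: string }[]): number {
  const seen: Record<string, true> = {};
  for (const record of pvRecords) seen[record.deviceId] = true;
  return Object.keys(seen).length;
}
```

The record returned by enter is what would be reported as the page-stay event for the previous page.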

Performance Data Collection Solution

Currently, most performance indicator data comes from the window.performance API.

Parameter | Description
connectEnd | The timestamp when the HTTP (TCP) connection between the browser and the server has been established. If a persistent connection is used, the value equals fetchStart. Connection establishment means that all handshake and authentication processes have completed.
connectStart | The timestamp when the browser starts to establish the HTTP (TCP) connection, that is, when the domain name lookup has ended. If a persistent connection is used, or the information is stored in a cache or local resource, this value equals fetchStart.
domComplete | The timestamp when parsing of the current document is complete, that is, when Document.readyState becomes 'complete' and the corresponding readystatechange event is triggered.
domContentLoadedEventEnd | The timestamp when all scripts that need to be executed immediately have been executed (regardless of execution order).
domContentLoadedEventStart | The timestamp when the parser fires the DOMContentLoaded event, that is, when all scripts that need to be executed have been parsed.
domInteractive | The timestamp when the DOM structure of the current page has been parsed and embedded resources start to load (that is, when Document.readyState changes to 'interactive' and the corresponding readystatechange event is triggered).
domLoading | The timestamp when parsing of the DOM structure of the current page begins (that is, when Document.readyState changes to 'loading' and the corresponding readystatechange event is triggered).
domainLookupEnd | The timestamp when the DNS lookup ends. If a local cache (no DNS query) or a persistent connection is used, this value equals fetchStart.
domainLookupStart | The timestamp when the DNS lookup starts. If a persistent connection is used, or the information is stored in a cache or local resource, this value equals fetchStart.
fetchStart | The timestamp when the browser is ready to fetch the document with an HTTP request. This happens before any application cache is checked.
loadEventEnd | The timestamp when the load event handler has finished. If the event has not been fired yet, or has not finished, the value is 0.
loadEventStart | The timestamp when the load event is fired. If the event has not been fired yet, the value is 0.
navigationStart | The timestamp when the unload of the previous page in the same browser window ended. If there is no previous page, this value equals fetchStart.
redirectEnd | The timestamp when the last HTTP redirect completed (that is, when the last byte of the HTTP response was received). If there was no redirect, or the redirect was to a different origin, the value is 0.
redirectStart | The timestamp when the first HTTP redirect started. If there was no redirect, or the redirect was to a different origin, the value is 0.
requestStart | The timestamp when the browser sent the HTTP request to the server (or started reading the local cache).
responseEnd | The timestamp when the browser received the last byte from the server, the local cache, or a local resource (or the close time, if the HTTP connection was closed before that).
responseStart | The timestamp when the browser received the first byte from the server (or read it from the local cache). If the transport layer fails after the request has started and the connection is reopened, this is the start time of the new request.
secureConnectionStart | The timestamp when the browser and the server started the handshake for a secure (HTTPS) connection. If the current page does not require a secure connection, the value is 0.
unloadEventEnd | The counterpart of unloadEventStart: the timestamp when handling of the unload event finished. If there is no previous page, the value is 0.
unloadEventStart | The timestamp when the unload event of the previous page was fired. If there is no previous page, the value is 0.

Common performance indicators

Indicator | Description
FP | First paint time
FCP | First contentful paint time, when page content is drawn for the first time
FMP | First meaningful paint time; FMP >= FCP
TTI | Time until the page is fully interactive
FID | Delay of the user's first interaction during the page loading phase
MPFID | Maximum delay a user interaction may encounter during the page loading phase
LOAD | Time at which the page is fully loaded (when the load event fires)

FP

The FP (First Paint) indicator usually reflects the page's white-screen time, which in turn reflects the page's network loading performance. The better the loading performance, the shorter the white screen and the lower the chance of losing the user.

This indicator can be obtained with performance.getEntriesByType('paint'). Among the paint entries provided by the PerformancePaintTiming API, the one named first-paint holds the FP data, and its startTime is the FP time.

FCP

FCP (First Contentful Paint) is the point in time when content is first rendered. In performance statistics, the span from the moment the user starts to access the page until FCP can be treated as the time without content. Generally, FCP >= FP.

This indicator can also be obtained with performance.getEntriesByType('paint'). Among the paint entries provided by the PerformancePaintTiming API, the one named first-contentful-paint holds the FCP data.
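Reading FP and FCP from the paint entries can be sketched as below. The helper name pickPaintMetrics and the PaintEntry type are assumptions for illustration; the entry names first-paint and first-contentful-paint and the startTime field are those of PerformancePaintTiming.

```typescript
// Structural subset of PerformancePaintTiming used by this sketch.
interface PaintEntry {
  name: string;      // "first-paint" or "first-contentful-paint"
  startTime: number; // milliseconds since the start of navigation
}

// Extracts FP and FCP from a list of paint entries. A metric is left
// undefined when the browser did not report the corresponding entry.
function pickPaintMetrics(entries: PaintEntry[]): { fp?: number; fcp?: number } {
  const result: { fp?: number; fcp?: number } = {};
  for (const entry of entries) {
    if (entry.name === "first-paint") result.fp = entry.startTime;
    if (entry.name === "first-contentful-paint") result.fcp = entry.startTime;
  }
  return result;
}

// In a browser: pickPaintMetrics(performance.getEntriesByType("paint"))
```

Keeping the extraction separate from the performance call means the same function also works on entries delivered to a PerformanceObserver callback.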

FMP

FMP (First Meaningful Paint) is the time it takes to draw meaningful content for the first time. When the layout and text content of the entire page are rendered, it can be considered that the first meaningful content has been drawn. Therefore, FMP measures the time it takes for users to see the main content of the web page, and is an important measurement indicator from the perspective of user experience.

A method of calculating FMP that is now widely accepted in the front-end community is to take "the paint time after the largest layout change during the loading and rendering of the page." You can use a MutationObserver to watch every DOM change on the page and, in its callback, compute a score for the current DOM tree. The moment at which the score changes most sharply is taken as the FMP.
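The last step of that approach, picking the moment of the sharpest score change, can be sketched as below. How the DOM tree is scored is not specified here, so the scores are treated as given inputs recorded by the MutationObserver callback. The names ScoreSample and estimateFmp are hypothetical, and taking "sharpest change" to mean "largest increase over the previous sample" is an assumption.

```typescript
interface ScoreSample {
  time: number;  // when the MutationObserver callback ran (ms)
  score: number; // DOM-tree score computed in that callback
}

// Returns the time of the sample with the largest score increase over the
// previous sample, or undefined when fewer than two samples were recorded.
function estimateFmp(samples: ScoreSample[]): number | undefined {
  let bestTime: number | undefined;
  let bestDelta = -Infinity;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].score - samples[i - 1].score;
    if (delta > bestDelta) {
      bestDelta = delta;
      bestTime = samples[i].time;
    }
  }
  return bestTime;
}
```

For example, with scores 0, 5, 60, 70 recorded at 0, 100, 300, 500 ms, the largest jump (5 to 60) lands at 300 ms, which would be taken as the FMP.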

TTI

TTI (Time To Interactive) is the time it takes from the page loading to the page being fully interactive. When the page is fully interactive, the following three conditions are met:

  • The page already displays useful content.
  • The event handlers associated with the visible elements on the page have been registered.
  • An event handler can start executing within 50ms after the event occurs.

Resource loading indicators

window.performance.getEntriesByType('resource') returns performance data for every resource (js, css, img...) loaded by the current page, which can be used to collect static resource performance data.

The main resource types (initiatorType) are: script, link, img, css, xmlhttprequest, beacon, fetch, and other.
Reference: PerformanceResourceTiming - Web APIs | MDN
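One way to turn the resource entries into reportable data is to group them by initiatorType, as in the sketch below. The helper name summarizeResources and the fields chosen for the summary are assumptions; name, initiatorType, and duration are real fields of PerformanceResourceTiming.

```typescript
// Structural subset of PerformanceResourceTiming used by this sketch.
interface ResourceEntry {
  name: string;          // resource URL
  initiatorType: string; // "script", "img", "css", ...
  duration: number;      // milliseconds
}

interface TypeSummary {
  count: number;
  totalDuration: number;
  maxDuration: number;
}

// Groups resource entries by initiatorType and accumulates simple statistics.
function summarizeResources(entries: ResourceEntry[]): Record<string, TypeSummary> {
  const summary: Record<string, TypeSummary> = {};
  for (const entry of entries) {
    const s = summary[entry.initiatorType]
      || (summary[entry.initiatorType] = { count: 0, totalDuration: 0, maxDuration: 0 });
    s.count += 1;
    s.totalDuration += entry.duration;
    s.maxDuration = Math.max(s.maxDuration, entry.duration);
  }
  return summary;
}

// In a browser: summarizeResources(performance.getEntriesByType("resource"))
```

Reporting the per-type summary instead of every raw entry keeps the payload small, at the cost of losing per-URL detail.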

Other indicator calculation methods

Indicator | Description | Calculation
DNS lookup | DNS phase time | domainLookupEnd - domainLookupStart
TCP connection | TCP phase time | connectEnd - connectStart
SSL connection | SSL connection time | connectEnd - secureConnectionStart
First byte of network request | Time to first byte (TTFB) | responseStart - requestStart
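The calculations in the table can be written as one small function. The name computeNetworkMetrics is hypothetical. The guard on secureConnectionStart is an added assumption based on the earlier description that the field is 0 when no secure connection is used, in which case the subtraction would be meaningless.

```typescript
// The timing fields used by the formulas in the table.
interface TimingFields {
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  secureConnectionStart: number;
  requestStart: number;
  responseStart: number;
}

function computeNetworkMetrics(t: TimingFields) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    // secureConnectionStart is 0 for pages that do not use a secure connection.
    ssl: t.secureConnectionStart > 0 ? t.connectEnd - t.secureConnectionStart : 0,
    ttfb: t.responseStart - t.requestStart,
  };
}

// In a browser: computeNetworkMetrics(performance.timing)
```

The same field names also exist on the navigation entry returned by performance.getEntriesByType('navigation'), so the function can be fed from either source.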

Error data collection solution

There are three types of errors:

  • Resource loading errors. Use addEventListener('error', callback, true) to catch resource loading failures in the capture phase.
  • JS execution errors. Catch them with window.onerror.
    • Errors from cross-origin scripts are reported only as "Script error.", with no specific message or stack trace. To get the details, add the crossorigin="anonymous" attribute to the script tag and configure CORS on the resource server, for example Access-Control-Allow-Origin: *
  • Promise errors. Catch them with addEventListener('unhandledrejection', callback). The event carries no line or column numbers, so the relevant error information has to be thrown manually.
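// `monitor` is assumed to be the page's shared collector object
const monitor = { errors: [] }

// Catch resource loading failures in the capture phase (third argument true),
// because error events fired on resources do not bubble
window.addEventListener('error', e => {
    const target = e.target
    // JS runtime errors have window as their target; window.onerror handles those
    if (target !== window) {
        monitor.errors.push({
            type: target.localName,
            url: target.src || target.href,
            msg: (target.src || target.href) + ' failed to load',
            time: Date.now()
        })
    }
}, true)

// Listen for JS errors
window.onerror = function(msg, url, row, col, error) {
    monitor.errors.push({
        type: 'javascript',
        row: row,
        col: col,
        msg: error && error.stack ? error.stack : msg,
        url: url,
        time: Date.now()
    })
}

// Listen for promise errors; the drawback is that no line number is available
window.addEventListener('unhandledrejection', e => {
    monitor.errors.push({
        type: 'promise',
        msg: (e.reason && e.reason.message) || e.reason || '',
        time: Date.now()
    })
})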

Data reporting solution

Two issues need to be considered when reporting data:

  • If the data reporting interface shares a domain name with the business system, reporting requests can compete with business requests for network resources, because browsers limit the number of concurrent requests per domain.
  • Browsers usually ignore asynchronous Ajax requests issued while the page is unloading. If the data must be sent, a common workaround is a synchronous Ajax request in the unload or beforeunload handler, which delays unloading. From the user's perspective, the page jump becomes slow.

Navigator.sendBeacon

Advantages:

  • Reliably sends data when the page is unloaded, without blocking the page from closing.
  • Supports sending data in the background.

Disadvantages:

  • Only POST requests can be sent, and the response cannot be read.

Apart from Internet Explorer, mainstream modern browsers support Beacon very well. Reference: Beacon - MDN documentation

The Beacon interface is used to schedule asynchronous, non-blocking requests to a web server.

  • Beacon requests use the HTTP POST method and do not require a response.
  • Beacon requests are guaranteed to be initiated before the page is unloaded.

In plain terms, Beacon sends data to the server asynchronously and ensures that the request goes out before the page is unloaded, which solves the problem of Ajax requests being terminated during unload. It is used as follows:

navigator.sendBeacon(url, data);

The data parameter is optional and can be an ArrayBufferView, Blob, DOMString, or FormData. The method returns true if the browser successfully queued the beacon request for sending, and false otherwise.

When using Beacon, the backend must accept the parameters via POST. Because of cross-domain restrictions, the backend interface also needs CORS configured. In addition, the request headers must qualify as CORS-safelisted request headers, which means the Content-Type must be application/x-www-form-urlencoded, multipart/form-data, or text/plain.

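// Content types allowed for a CORS-safelisted request header
type ContentType = 'application/x-www-form-urlencoded' | 'multipart/form-data' | 'text/plain';

// Serialize params as base64-encoded JSON.
// Note: btoa only accepts Latin-1 characters, so other characters must be escaped first.
const serilizeParams = (params: object) => {
    return window.btoa(JSON.stringify(params))
}

// FormData is sent as multipart/form-data, one of the content types above.
// Returns true if the browser queued the request.
function sendBeacon(url: string, params: object): boolean {
  const formData = new FormData()
  formData.append('params', serilizeParams(params))
  return navigator.sendBeacon(url, formData)
}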

Image

Advantages:

  • Easy to use, widely compatible, and able to report across domains.
  • Does not block page loading or closing.

Disadvantages:

  • Only GET requests can be sent, and the response cannot be read.
  • Asynchronous operations are not supported.

sendBeacon is not available in every browser. As a workaround, you can rely on the fact that most browsers finish loading images before the page is unloaded, and report data by adding an img to the page.

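function sendImage(url: string, params: object) {
  const img = new Image()

  img.style.display = 'none'

  // Remove the node once the request has finished, whether it succeeded or failed
  const removeImage = function() {
    if (img.parentNode) {
      img.parentNode.removeChild(img)
    }
  }

  img.onload = removeImage
  img.onerror = removeImage

  // base64 output can contain "+", "/" and "=", so encode it for the query string
  img.src = `${url}?params=${encodeURIComponent(serilizeParams(params))}`

  document.body.appendChild(img)
}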

Because img requests use the GET method and servers limit the length of the URI, a URL that is too long produces an HTTP 414 error. Pay attention to the reporting frequency and limit the number of attributes uploaded in a single request.
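One way to stay under the URI limit is to split the queued events across several image requests. The sketch below is an illustration under stated assumptions: buildReportUrls is a made-up helper, the default limit of 2000 characters is arbitrary (the real ceiling depends on the server and on any proxies in between), and the batch is encoded as URL-escaped JSON rather than with the base64 helper used above.

```typescript
// Splits events into batches so that each GET URL stays within maxLength.
// A single event that is too large on its own still gets its own URL, which
// may exceed the limit; trimming such an event is left to the caller.
function buildReportUrls(baseUrl: string, events: object[], maxLength: number = 2000): string[] {
  const toUrl = (batch: object[]): string =>
    `${baseUrl}?params=${encodeURIComponent(JSON.stringify(batch))}`;

  const urls: string[] = [];
  let batch: object[] = [];
  for (const event of events) {
    const candidate = batch.concat([event]);
    if (batch.length > 0 && toUrl(candidate).length > maxLength) {
      // Adding this event would overflow the URL: flush the current batch first.
      urls.push(toUrl(batch));
      batch = [event];
    } else {
      batch = candidate;
    }
  }
  if (batch.length > 0) urls.push(toUrl(batch));
  return urls;
}
```

Each returned URL can then be assigned to the src of its own image. The batch is re-serialized on every iteration, which is fine for small queues but worth optimizing for large ones.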

Compatibility solution

Prefer the sendBeacon method and use the Image method as a fallback.


function sendLog(url: string, params: object) {
    if(navigator.sendBeacon) {
        sendBeacon(url, params)
    } else {
        sendImage(url, params)
    }
}

Why do so many people use a GIF for tracking?
To report data to the server, you can call an interface, request an ordinary file, or request an image resource. As long as the data arrives, the server does not care how it was reported, whether by requesting a GIF file, requesting a JS file, or calling an interface. So why do so many systems settle on requesting a GIF image to report data?
● It avoids cross-domain problems
The reporting domain is usually not the current page's domain, so every interface request is cross-domain. A cross-domain request with an incorrect configuration is easily blocked by the browser and raises an error, which is unacceptable. The src attribute of an image is not subject to this restriction, so the request can still be sent. (This rules out reporting through an interface.)
● It avoids blocking page loading and hurting user experience
For most resource types, the browser does not actually send the request until the node has been inserted into the DOM tree. Repeated DOM operations cause performance problems, and loading js/css resources also blocks page rendering and affects user experience.
Image requests are the exception. An image does not have to be inserted into the DOM: creating a new Image object in js is enough to send the request, and nothing is blocked. In a browser environment without js, an img tag can make the same request. Other resource types cannot do this. (This rules out reporting through file requests.)
● GIF is the smallest compared with PNG/JPG
The smallest valid BMP file takes 74 bytes and the smallest PNG takes 67 bytes, while a valid GIF needs only 43 bytes.
For the same response, GIF saves about 41% of traffic compared with BMP and about 35% compared with PNG.
Most implementations report with a 1x1 pixel transparent GIF
1x1 pixel is the smallest valid image. Because the report rides on an image, the image should be transparent so that it does not affect how the page itself looks. Transparency is also cheap: a single binary bit marks the image as transparent, so no color data needs to be stored, which keeps the file small.

XMLHttpRequest or Fetch API

Advantages:

  • Asynchronous requests can be sent, with support for multiple HTTP methods such as GET and POST.
  • The response can be read and processed further.

Disadvantages:

  • Request and response logic must be handled manually.
  • Cross-domain requests must be handled (for example by configuring CORS).

Use XMLHttpRequest or the Fetch API to send asynchronous requests that report data. You can use GET or POST and send the data as URL parameters or as the request body.
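A POST report through the Fetch API might look like the sketch below. The fetch function is passed in as a parameter so the snippet does not depend on browser globals, and the names reportWithFetch, ReportInit, and the /log URL are hypothetical. The keepalive option is a standard fetch option that asks the browser to let the request outlive the page; whether you need it is a judgment call.

```typescript
// Minimal structural types so the sketch does not rely on DOM typings.
interface ReportInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
  keepalive: boolean;
}
type FetchLike<R> = (url: string, init: ReportInit) => R;

// Sends the payload as a JSON request body and returns whatever the given
// fetch function returns (a Promise<Response> for the browser's fetch).
function reportWithFetch<R>(fetchFn: FetchLike<R>, url: string, payload: object): R {
  return fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    keepalive: true,
  });
}

// In a browser: reportWithFetch(fetch, "/log", { event: "enter_page" })
```

Note that application/json is not a CORS-safelisted content type, so a cross-domain report sent this way triggers a preflight request and needs the CORS configuration mentioned in the disadvantages.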

WebSocket

Advantages:

  • Good real-time performance and support for two-way communication.
  • Suitable for real-time monitoring and large-scale data reporting.

Disadvantages:

  • The server needs to support the WebSocket protocol.
  • It is relatively complex and not suitable for simple tracking needs.

Reporting platform

Common front-end data tracking tools include Google Analytics, Baidu Statistics, Umeng Statistics, etc. You can also report through the company's internal interface or platform.

Take Google Analytics as an example:
Google Analytics is a website analysis tool developed by Google that is used to track and report website traffic. It helps website owners understand visitor behavior, including who they are, where they come from, what they do on the website, and more. With Google Analytics, website owners can better understand their audience, optimize website content and marketing strategies, and thus improve website performance and user experience. Google Analytics provides a wealth of data analysis features, including real-time data, user behavior analysis, conversion tracking, traffic source analysis, and more. It is a powerful tool that is widely used in various websites and online marketing activities.

How to Use Google Analytics

To use Google Analytics you first need a Google account, which you have to create yourself. You also need to know where to sign in. The two addresses used here are:

Google Tag Manager: tagmanager.google.com/

Analytics: analytics.google.com/

Google Tag Manager
Google Tag Manager (GTM) is a tag management system developed and provided by Google. It lets webmasters manage and deploy tracking codes, analytics codes, and marketing tags without modifying the website code. With GTM, users can add, update, and delete tags without relying on developers.

In plain terms, GTM's main function is to collect the tracking events triggered by the front end. Data reporting is implemented by customizing the trigger conditions and the callbacks for triggered events. Here, it is used to collect data and report it to Google Analytics.
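On the front-end side, events usually reach GTM through its data layer array, where an object with an event key can be matched by a custom event trigger configured in the GTM console. The sketch below assumes the standard GTM container snippet is installed, which creates window.dataLayer. The helper pushToDataLayer and the event and attribute names are illustrative.

```typescript
interface DataLayerEvent {
  event: string;          // the name a GTM custom event trigger matches on
  [key: string]: unknown; // additional attributes readable as data layer variables
}

// Pushes a custom event onto a data layer array. In a page, the array would
// be window.dataLayer; it is passed in here to keep the sketch free of globals.
function pushToDataLayer(
  dataLayer: DataLayerEvent[],
  event: string,
  params: Record<string, unknown> = {},
): void {
  // `event` is written last so a stray params.event cannot override the name.
  dataLayer.push({ ...params, event });
}

// In a browser: pushToDataLayer(window.dataLayer, "enter_page", { enter_from: "home" })
```

The trigger and the tag that forwards the event to Google Analytics are then configured in GTM itself, not in code.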

Google Analytics
As the name suggests, it is a website used to collect, view and display data.