Abstract: Huang Ting, cloud video architect at Huawei, starts from the current state of video transmission, analyzes the logic behind different services choosing different video transmission modes, and shares the value proposition of Huawei Cloud's new media network.
As the audio and video industry develops, users expect ever higher picture quality, smoother playback, lower interaction latency, and freedom from terminal limitations. These expectations pose greater challenges for video transmission. Different services currently use different video transmission modes, and how to meet user needs in a comprehensive, systematic way on top of today's transmission landscape has become a common problem for the industry.
The talk mainly covers the following: the characteristics of video development; the logic behind the choice of audio and video transmission technology in IPTV, OTT, and RTC; and, based on insight into the future of the audio and video transmission industry, the value proposition of Huawei Cloud's new media network.
Digital transmission moving to IP. In television, delivery has moved from traditional cable-based digital TV to IP-based IPTV. In production, it has moved from traditional SDI-based transmission to IP-based transmission. Both are examples of digital transmission moving to IP. Moving digital transmission to IP is the foundation for moving video distribution into the public domain.
Video distribution moving into the public domain. Where there is a public domain, there is a corresponding private domain. Private-domain video distribution generally means distribution over a managed network, while public-domain distribution generally means distribution over the Internet. Common examples are the move from IPTV to OTT and from in-house video conferencing to online video conferencing. The shift of video distribution into the public domain is closely tied to the development of video transmission technology.
Diversification of service experience. Different services have different experience requirements, mainly in three dimensions: quality, scale, and latency. For example: live streaming latency <5s; RTC latency <400ms; cloud gaming latency <10
Consider how the VR field is changing. VR currently comes in two main forms. One is PC VR, pictured above: while playing, the headset must be tethered by cable to a PC host, and the game runs on the PC, so freedom of movement is limited. The other is the standalone VR headset, an all-in-one design. With no "tether", movement is unrestricted, but because the computing unit sits in the headset itself, power consumption limits the available computing power.
There is now also a standalone headset plus PC approach, which removes the "tether" while still using the PC's computing power: the image rendered on the PC is encoded and sent to the VR headset over Wi-Fi. Compared with PC VR, this can be seen as digital transmission moving to IP.
Cloud VR goes one step further. It renders the game directly with cloud computing power, so the video must be distributed over the public-domain network. Meeting the experience requirements also calls for further optimization of both communication technology and video transmission technology. There is still a gap between the current experience and the ideal.
Experience requirements differ in more than latency. In terms of scale, for example, different services have different distribution requirements. Cloud gaming demands low latency and also very high quality: anyone who has played Honor of Kings knows that at only 30 frames per second the picture is noticeably less smooth.
Understanding the video requirements of different application scenarios helps explain the logic of technology selection in each service area and makes it quicker to spot the shortcomings of current technology. The following sections look at the logic behind the selection of audio and video transmission technology from the perspective of IPTV, OTT, and RTC services.
IPTV is a system built and operated by carriers. Its main services include live broadcast, video on demand, time shifting, catch-up TV, and NPVR, and it must deliver TV-grade quality with live broadcast around the clock. Its main advantage is that carriers can build their own managed network to guarantee a TV-grade user experience.
IPTV mainly uses two transmission modes. One is multicast, used mainly for live broadcast; it greatly reduces the load on streaming media servers at peak times. The other is unicast, which uses RTSP for media signaling control and RTP for audio and video data transmission; it is used mainly for video on demand, time shifting, catch-up TV, NPVR, and similar services.
There are currently two ways to deal with this problem (packet loss): FEC and ARQ (also called RET). FEC is used mainly in multicast scenarios. It is effective against random packet loss, and because redundancy is generated per channel rather than independently for each user, it is efficient. ARQ can be used in both multicast and unicast scenarios and handles consecutive packet loss better. In multicast scenarios the two techniques are generally used together.
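To make the FEC idea concrete, here is a minimal Python sketch of the simplest form of packet-level redundancy: one XOR parity packet per group of media packets. It is an illustration under assumptions, not the scheme of any particular IPTV deployment; the function names `xor_parity` and `recover_missing` and the group size of four are hypothetical. One parity packet can repair one lost packet in its group, which is why this style of FEC suits random loss, while a burst of losses has to fall back to ARQ retransmission.

```python
def xor_parity(packets):
    """Return one parity packet: the byte-wise XOR of equally sized packets."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("this sketch assumes equal-size packets")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) from the others plus parity.

    Returns None when zero or several packets are missing, because a single
    XOR parity packet can repair exactly one loss per group.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        return None
    present = [p for p in received if p is not None]
    # XOR of all surviving packets and the parity equals the lost packet.
    return xor_parity(present + [parity])


# Hypothetical group of four media packets protected by one parity packet.
group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = xor_parity(group)

# Random loss of one packet: FEC repairs it without a retransmission.
recovered = recover_missing([b"pkt0", None, b"pkt2", b"pkt3"], parity)

# Consecutive loss of two packets: this parity cannot help, so ARQ is needed.
burst_result = recover_missing([None, None, b"pkt2", b"pkt3"], parity)
```

Because the parity packet is computed once per channel, the same redundancy serves every receiver of the multicast stream.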
Channel switching time is the time from the user pressing the channel button on the remote control until the new channel's picture appears. This section describes how to shorten it in multicast scenarios. To display the picture quickly, the set-top box (STB) must be able to start decoding quickly. The moment at which the STB joins the multicast group depends on when the user switches channels, which is random, so the first packets it receives usually cannot be decoded immediately, and this slows channel switching.
An independent FCC server is introduced. While joining the multicast group, the STB also requests a unicast stream from the FCC server, which guarantees that every request is served starting from an I-frame. The STB can then decode from the very first packet it receives, which speeds up channel switching. Before optimization, channel switching takes 1-3s; after optimization it can be shortened to 300-500ms.
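The FCC behaviour described above can be sketched as a small per-channel buffer that always answers a new viewer from the most recent I-frame. This is a simplified model, not a real FCC server: the class name `FccChannelBuffer`, the frame representation as `(sequence, frame_type)` tuples, and the buffer size are assumptions made for illustration.

```python
from collections import deque


class FccChannelBuffer:
    """Keeps the most recent frames of one channel for fast channel change."""

    def __init__(self, max_frames=300):
        # Each entry is (sequence_number, frame_type), frame_type in {"I", "P", "B"}.
        self.frames = deque(maxlen=max_frames)

    def on_multicast_frame(self, seq, frame_type):
        """Record a frame as it is seen on the multicast stream."""
        self.frames.append((seq, frame_type))

    def burst_for_new_viewer(self):
        """Frames to unicast to a joining STB: latest I-frame up to the live edge."""
        frames = list(self.frames)
        for idx in range(len(frames) - 1, -1, -1):
            if frames[idx][1] == "I":
                return frames[idx:]
        return []  # no I-frame buffered yet, nothing decodable to send


channel = FccChannelBuffer()
for seq, kind in enumerate(["I", "P", "P", "I", "P", "P"]):
    channel.on_multicast_frame(seq, kind)

# The viewer joins after frame 5; the burst starts at the I-frame with seq 3.
burst = channel.burst_for_new_viewer()
```

The STB can decode the first frame of the burst immediately, instead of waiting for the next I-frame to arrive on the multicast stream.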
In essence, IPTV moves TV audio and video transmission technology onto IP. Because network conditions are good, the technology choices are not especially complex, and more emphasis is placed on system stability and cross-vendor integration.
Because of their huge number of online users, the first problem OTT video platforms must solve is large-scale distribution of video content. First, coverage is broad: content must reach users worldwide, over the public domain and across carriers. Second, the user base is large. Finally, the service must be low-cost and reliable.
The mainstream solution today is to distribute through mature third-party CDN services. Netflix, for example, moved toward building its own CDN (Open Connect) as its business grew, but it remains friendly to third-party CDNs, so that if its own CDN fails it can quickly shift traffic to a third-party CDN service and keep the business available.
In addition, OTT VOD faces a series of experience problems. For example: unstable bandwidth degrades the playback experience; CPU contention on the terminal affects the stability of player decoding; and because average access conditions differ across countries and regions, a single piece of content has to satisfy the experience requirements of different users and different terminals at the same time.
From 2009 onward, ABR technologies such as HLS, MSS, and DASH appeared one after another. ABR adjusts the quality of the video stream based on real-time measurement of the user's bandwidth and the terminal's CPU usage. These technologies are also friendly to HTTP CDNs. However, ABR only standardizes the format exchanged between server and client; the quality of the experience still depends on the rate adaptation algorithm.
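Since the rate adaptation algorithm is left to the player, a minimal sketch of one possible rule may help. The Python below picks the highest rendition that fits within a safety margin of the measured bandwidth and steps down one level when the CPU is overloaded. The bitrate ladder, the 0.8 safety factor, the 0.9 CPU limit, and the function name `pick_bitrate` are all hypothetical; production players use considerably more elaborate logic.

```python
# Hypothetical bitrate ladder in kbps, sorted from lowest to highest quality.
LADDER_KBPS = [400, 1200, 2500, 5000, 8000]


def pick_bitrate(measured_kbps, cpu_usage, ladder=LADDER_KBPS,
                 safety=0.8, cpu_limit=0.9):
    """Choose the rendition for the next segment.

    measured_kbps: recent download throughput estimate.
    cpu_usage: terminal CPU usage in the range 0.0 to 1.0.
    """
    budget = measured_kbps * safety  # leave headroom for bandwidth fluctuation
    affordable = [rate for rate in ladder if rate <= budget]
    choice = affordable[-1] if affordable else ladder[0]
    if cpu_usage > cpu_limit:
        # Decoder is struggling: step down one rendition if possible.
        idx = ladder.index(choice)
        choice = ladder[max(idx - 1, 0)]
    return choice


normal = pick_bitrate(measured_kbps=4000, cpu_usage=0.30)    # 2500
busy_cpu = pick_bitrate(measured_kbps=4000, cpu_usage=0.95)  # 1200
weak_net = pick_bitrate(measured_kbps=300, cpu_usage=0.30)   # 400
```

Two players using the same HLS or DASH manifest can therefore deliver different experiences, depending only on how this decision is made.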
Live streaming can be subdivided by whether it is sensitive to end-to-end (E2E) latency.
The first category is latency-insensitive. Live news, for example, does not need to interact with the audience, so it can still use the CDN-friendly HLS and DASH protocols, although latency can be as high as 10-30s.
The second category is latency-sensitive. Webcasts, for example, involve bullet comments, ordinary comments, and other interaction with the audience, so E2E latency must be below 5s. Vendors in this category choose the lower-latency RTMP and HTTP-FLV technology stack.
Overseas and domestic technology stacks differ somewhat. Overseas, many web-based clients must be considered, so low-latency transmission is mostly built on the CMAF format. There are three technologies: DASH LL, LLHLS, and LHLS. E2E latency with this stack can also be kept within 5s.
Another very important factor in the OTT personal live streaming experience is the stability of the upstream push, because once the pushed stream degrades, viewing quality drops for the entire audience. There are currently three main push protocols: RTMP, SRT, and RIST. RTMP is the mainstream; its advantages are maturity, stability, and a good ecosystem, with nearly all encoding tools supporting it. SRT and RIST are based on UDP; their main advantages are in long-distance (for example, transoceanic), high-bitrate, and weak-network transmission. In addition, compared with optimizing congestion algorithms at the TCP layer, SRT and RIST can optimize their transmission algorithms at the application layer, which makes updates easier. Some large transoceanic live streams use these protocols for the first mile.
SRT has relatively mature open-source community support. RIST defines only a standardized syntax, leaving vendors free to innovate on algorithms on top of it without affecting interoperability.
As the pandemic continues, demand for real-time interaction is growing explosively. RTC technology is widely used in entertainment, co-hosted live streaming, online education, online meetings, healthcare, finance, and other scenarios.
RTC has three main architectures: Mesh, SFU, and MCU. The advantage of Mesh is simplicity: no server is involved. The disadvantage is that as the number of participants grows, the load on each client's CPU and network resources grows with it, so a call generally supports no more than 6 participants at once. The way forward is to add servers and move to a centralized architecture: SFU or MCU.
An SFU server only forwards client data. Compared with Mesh, it greatly reduces each client's uplink bandwidth pressure and CPU consumption. The disadvantage is that the downlink still carries multiple streams. An MCU solves this by mixing and transcoding streams on the server; the disadvantages are higher server-side compute load, less flexible picture composition, and higher deployment cost than an SFU.
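The trade-off between the three architectures can be illustrated by counting the video streams each client handles. The following Python sketch assumes every participant publishes one stream and watches everyone else; the function name `streams_per_client` is hypothetical.

```python
def streams_per_client(participants, architecture):
    """Return (uplink_streams, downlink_streams) for one client.

    Assumes each participant publishes one stream and subscribes to all others.
    """
    others = participants - 1
    if architecture == "mesh":
        return (others, others)  # send to and receive from every peer directly
    if architecture == "sfu":
        return (1, others)       # one upload, server forwards the other streams
    if architecture == "mcu":
        return (1, 1)            # one upload, one mixed stream back
    raise ValueError(f"unknown architecture: {architecture}")


six_party = {arch: streams_per_client(6, arch) for arch in ("mesh", "sfu", "mcu")}
# {'mesh': (5, 5), 'sfu': (1, 5), 'mcu': (1, 1)}
```

In a six-party call the Mesh client already encodes and uploads five copies of its video, which is consistent with the practical ceiling mentioned above, while the MCU's single downlink stream is paid for with server-side mixing and transcoding.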
Centralized SFU and MCU architectures suit small-scale scenarios, such as private-domain use like traditional in-house enterprise video calls. With the rise of public-domain services, they can no longer meet requirements. For example, in a conference where users a and b are in China and users c and d are in the United States, if the centralized SFU is deployed in the United States, communication between a and b suffers; if it is deployed in China, communication between c and d suffers.
The cascaded SFU architecture allows one meeting to span multiple SFUs. Its advantages are: the number of participants can grow dynamically; a suitable routing strategy reduces cross-border and cross-carrier bandwidth cost; and because terminals connect to a nearby SFU, errors can be recovered quickly between the terminal and its nearest SFU, improving the real-time audio and video experience. This architectural evolution addresses the public-domain and scale problems of RTC services.
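The bandwidth argument can be checked with a rough count of the streams that cross a border in the four-user example (a and b in China, c and d in the United States). The Python sketch below is a simplified model under stated assumptions: each user publishes one stream and subscribes to all others, a centralized SFU sits in one region, and a cascaded deployment has one SFU per region that relays each published stream once to every other region. The function names are hypothetical.

```python
def cross_border_streams_centralized(user_regions, sfu_region):
    """Streams crossing a border when a single SFU in sfu_region serves everyone."""
    total_users = len(user_regions)
    remote_users = sum(1 for region in user_regions.values() if region != sfu_region)
    # Each remote user uploads 1 stream and downloads the other users' streams.
    return remote_users * (1 + (total_users - 1))


def cross_border_streams_cascaded(user_regions):
    """Streams crossing a border with one SFU per region relaying to the others."""
    regions = set(user_regions.values())
    # Each published stream is relayed once to every other region's SFU.
    return len(user_regions) * (len(regions) - 1)


users = {"a": "CN", "b": "CN", "c": "US", "d": "US"}
centralized = cross_border_streams_centralized(users, sfu_region="US")  # 8
cascaded = cross_border_streams_cascaded(users)                         # 4
```

Under these assumptions the cascade halves the cross-border traffic, and the media between a and b never leaves China.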
The cascaded SFU still leaves some problems unsolved. For example, how can different participants in the same room each get a good experience at the same time? The industry generally uses two techniques: SVC and Simulcast.
Simulcast, or simultaneous broadcast, means the sender sends multiple video streams of different quality levels to the SFU, and the SFU decides which stream to forward to each receiver based on network conditions, screen layout, and so on. Its advantage is that it places no extra requirements on conventional decoders; its disadvantage is higher bandwidth consumption.
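As a rough illustration of the SFU-side decision, the Python sketch below forwards the highest-quality simulcast stream that fits a receiver's estimated downlink bandwidth. The three layers, their bitrates, and the function name `choose_simulcast_layer` are assumptions for illustration; a real SFU would also weigh screen layout and other signals.

```python
# Hypothetical simulcast layers published by one sender: (name, bitrate in kbps),
# sorted from lowest to highest quality.
SIMULCAST_LAYERS = [("low", 150), ("mid", 500), ("high", 1500)]


def choose_simulcast_layer(receiver_kbps, layers=SIMULCAST_LAYERS):
    """Pick the best layer that fits the receiver's estimated bandwidth."""
    chosen = layers[0][0]  # always forward at least the lowest layer
    for name, kbps in layers:
        if kbps <= receiver_kbps:
            chosen = name
    return chosen


# Three receivers in the same room with different network conditions.
selection = {kbps: choose_simulcast_layer(kbps) for kbps in (100, 600, 2000)}
# {100: 'low', 600: 'mid', 2000: 'high'}
```

Each receiver gets an independently decodable stream, which is why ordinary decoders suffice, while the sender pays by uploading all three layers at once.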
SVC, scalable video coding, is a coding technique that builds a single video stream in layers, with each layer improving on the quality of the layer below it. It supports three kinds of scalability: temporal, spatial, and quality. The SFU decides which layers to forward to each receiver. Temporal scalability is currently the mainstream. The advantage is lower bandwidth consumption; the disadvantage is that only some decoders support SVC decoding.
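Temporal scalability can be sketched as frame filtering at the SFU: each frame carries a temporal layer id, and dropping the higher layers lowers the frame rate without re-encoding. The Python below assumes a common three-layer pattern (layer ids 0, 2, 1, 2 repeating); the frame dictionaries and the function name `forward_temporal_layers` are hypothetical.

```python
def forward_temporal_layers(frames, max_tid):
    """Keep only frames whose temporal layer id is at most max_tid."""
    return [frame for frame in frames if frame["tid"] <= max_tid]


# Eight consecutive frames of one SVC stream with three temporal layers.
pattern = [0, 2, 1, 2, 0, 2, 1, 2]
stream = [{"seq": seq, "tid": tid} for seq, tid in enumerate(pattern)]

full_rate = [f["seq"] for f in forward_temporal_layers(stream, max_tid=2)]
half_rate = [f["seq"] for f in forward_temporal_layers(stream, max_tid=1)]
quarter_rate = [f["seq"] for f in forward_temporal_layers(stream, max_tid=0)]
# half_rate keeps frames 0, 2, 4, 6; quarter_rate keeps frames 0 and 4.
```

This works because, in such a layering, a frame only references frames in its own or lower temporal layers, so the remaining frames stay decodable after the higher layers are dropped.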
Whereas OTT ABR performs multi-bitrate encoding on the server, RTC performs it on the client. This removes one transcoding step and so reduces E2E latency. It is another difference in technology selection that stems from the diversification of service experience.
Because RTC is used mainly in scenarios that demand low latency, it responds to network changes more "proactively" in order to improve the user experience.
First, at the transport layer RTC chooses RTP over UDP for real-time media transmission, which lays the groundwork for the proactive strategies that follow. Compared with IPTV's private-domain transmission, RTC's public-domain transmission has a richer set of packet loss recovery mechanisms, including FEC, NACK, RED, RTX, and PLI.
Packet loss recovery alone is not enough. The client still needs some buffer to absorb network jitter and packet loss; otherwise, by the time a retransmission arrives, the frame may already be out of date. But adding buffer also adds latency, so the client runs a dynamic jitter buffer algorithm to handle packet loss, reordering, and late arrival. It also smooths the displayed frame rate.
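A minimal sketch of a dynamic jitter buffer follows. It reorders frames by sequence number, holds a few frames before releasing them, adapts that depth to the observed jitter, and discards frames that arrive after their slot has been played. The class name `JitterBuffer`, the 33 ms frame interval, and the depth formula are assumptions for illustration, not the algorithm of any specific RTC stack.

```python
import heapq
import math


class JitterBuffer:
    """Reorders frames and trades a little delay for smooth playout."""

    def __init__(self, frame_interval_ms=33, min_depth=1, max_depth=10):
        self.frame_interval_ms = frame_interval_ms
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.depth = min_depth  # frames held back before playout
        self.heap = []          # min-heap ordered by sequence number
        self.next_seq = None    # lowest sequence number still playable

    def adapt(self, observed_jitter_ms):
        """Hold enough frames to cover the observed jitter, within limits."""
        wanted = math.ceil(observed_jitter_ms / self.frame_interval_ms) + 1
        self.depth = max(self.min_depth, min(self.max_depth, wanted))

    def push(self, seq, payload):
        """Insert an arriving frame; drop it if its slot was already played."""
        if self.next_seq is not None and seq < self.next_seq:
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        """Release frames in order while more than `depth` frames are buffered."""
        ready = []
        while len(self.heap) > self.depth:
            seq, payload = heapq.heappop(self.heap)
            self.next_seq = seq + 1
            ready.append((seq, payload))
        return ready


jb = JitterBuffer()
jb.adapt(observed_jitter_ms=40)  # depth becomes 3 frames
played = []
for seq in [1, 3, 2, 4, 5, 6]:   # frames 2 and 3 arrive out of order
    jb.push(seq, f"frame-{seq}")
    played += [s for s, _ in jb.pop_ready()]
jb.push(1, "late duplicate")     # too late: slot 1 already played, so dropped
```

A larger `depth` absorbs more jitter and gives retransmissions time to arrive, at the cost of added latency, which is exactly the tension the paragraph above describes.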
The core of low latency is avoiding network congestion. Once large amounts of data are buffered in the network, latency rises, and this is the job of the congestion control algorithm. Its goal is to bring the sending rate close to the available rate while keeping queue occupancy as low as possible.
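The stated goal can be sketched as a simple delay-based rate controller: probe upward while queuing delay stays low, and back off as soon as it rises. The Python below is a deliberately simplified illustration and is not GCC, NADA, or SCReAM; the 20 ms threshold, the step factors, the rate limits, and the function name `next_send_rate` are all assumptions.

```python
def next_send_rate(rate_kbps, queuing_delay_ms, delay_threshold_ms=20,
                   increase=1.05, decrease=0.85,
                   min_kbps=100, max_kbps=10000):
    """Return the sending rate for the next interval.

    queuing_delay_ms: estimated extra delay caused by queues along the path.
    """
    if queuing_delay_ms > delay_threshold_ms:
        # Queues are building: the sending rate exceeds the available rate.
        candidate = rate_kbps * decrease
    else:
        # Queues are short: probe gently for more bandwidth.
        candidate = rate_kbps * increase
    return max(min_kbps, min(max_kbps, candidate))


probing = next_send_rate(1000, queuing_delay_ms=5)      # rises above 1000
backing_off = next_send_rate(1000, queuing_delay_ms=50)  # falls below 1000
floor = next_send_rate(100, queuing_delay_ms=50)         # clamped at 100
```

Reacting to delay rather than waiting for packet loss is what keeps queue occupancy low: the sender slows down before buffers along the path fill up.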
RMCAT is an IETF working group. Its work includes defining requirements and designing congestion control algorithms for RTP-based real-time media transmission. RMCAT has three algorithms: GCC, NADA, and SCReAM. Because GCC is used in the Chrome browser, it is currently the more mature algorithm. It comes in two versions: GCC-REMB and the newer GCC-TFB. The advantages of the newer version are that the algorithm runs on one end only, which eases version evolution, and that the sender can allocate bandwidth according to the nature of each type of content, which is more flexible.
First, many services. Edge services are becoming more varied, from today's mature download, video on demand, live streaming, and RTC to the fast-growing cloud gaming and cloud XR. Different kinds of services are deployed on the same node, including caching, stream push, stream pull, forwarding, and cloud rendering. A siloed ("chimney") architecture faces a series of problems, including management of network, compute, and storage resources and differentiated experience management.
Second, high requirements. New media expression is more immersive and places higher demands on audio and video transmission. The increase is across the board and mainly includes:
The main metrics for flat (2D) video include the instant-start rate, the stall rate, and the playback success rate, while many more factors affect the immersion of a VR experience.
Third, rapid development. In an increasingly competitive industry, companies need a differentiated experience, which in practice requires fast innovation. Technology is moving quickly, and in the process our customers have run into some pain points:
To address these three challenges, we put forward the value proposition of Huawei Cloud's new media network. The vision is to build a new media network for entertainment video, communication video, and industry video that meets the need for efficient video transmission.
Our value proposition is:
The new media network combines the following characteristics:
Low-latency, fully connected, large-scale real-time audio and video distribution
Based on Huawei Cloud's new media network, we support technology upgrades for online education and help build better online education platforms. Under the traditional architecture, achieving both low-latency interaction and large-scale distribution requires two products, RTC and CDN, which raises four problems:
With the Huawei Cloud new media network architecture, a single Huawei RTC service provides the functions of the original two products. The main advantages are:
High-throughput, immersive new media transmission
Huawei Cloud's tile-wise streaming technology addresses two major problems in the VR industry today. First, VR headsets have limited computing power and cannot meet the hardware requirements of 8K VR content. Second, transmitting the full VR picture consumes too much bandwidth.
Our solution is to preprocess the original 8K VR content into two streams: a 4K panoramic background stream and a high-definition foreground stream. The high-definition foreground stream is divided into tiles. Based on the user's field of view, the player downloads only the corresponding high-definition tiles, and at the same time downloads the 4K panoramic background stream, which is shown briefly when the user turns their head.
The advantages of this solution are: a terminal with 4K hardware decoding can play 8K VR content; network download bandwidth is reduced by 75%; and through device-cloud collaboration, the delay before the high-definition picture appears after the user turns is only 100-200ms, which the human eye can hardly perceive.
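The viewport-driven tile selection described above can be sketched as follows. This Python example considers only the horizontal direction and assumes the panorama is cut into 8 equal columns with a 90-degree field of view; the function name `visible_tiles` and these numbers are hypothetical and do not describe Huawei Cloud's actual tiling.

```python
def visible_tiles(yaw_deg, fov_deg=90, columns=8):
    """Indices of the tile columns overlapping the horizontal field of view.

    yaw_deg: direction the user is facing, in degrees (any value, wraps at 360).
    Assumes fov_deg is smaller than 360.
    """
    tile_width = 360 / columns
    start = (yaw_deg - fov_deg / 2) % 360  # left edge of the view, in [0, 360)
    end = start + fov_deg                  # right edge, may exceed 360 (wraps)
    tiles = []
    idx = int(start // tile_width)
    while idx * tile_width < end:
        tiles.append(idx % columns)        # wrap around the panorama seam
        idx += 1
    return tiles


facing_front = visible_tiles(0)    # view spans 315..45 degrees: tiles 7 and 0
facing_right = visible_tiles(90)   # view spans 45..135 degrees: tiles 1 and 2
off_grid = visible_tiles(100)      # view spans 55..145 degrees: tiles 1, 2, 3
```

The player would request only these high-definition tiles and rely on the 4K background stream for everything outside them, which is where the bandwidth saving comes from.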
Device, edge, and cloud collaborative innovation, with flexibly defined media processing pipelines
Douyu is currently working with Huawei Cloud to build a cloud special effects marketplace, opening up room for imagination and creating a better interactive live streaming experience.
This solution has several advantages. First, it brings new possibilities to live streaming: special effects run directly in the cloud, so the app consumes less power and the streamer no longer has to worry about the battery; and cloud servers are powerful, so effects look better and more advanced effects algorithms are available.
Second, it forms an algorithm ecosystem: the cloud aggregates a wide range of effects, for example beautification tuned to different faces and skin tones. The innovation cycle is shorter, and streamers can try new effects sooner.
Third, a high-quality experience: relying on Huawei Cloud's new media network and Huawei RTC, real-time beautification latency can be below 400ms, and new effects take effect immediately with no app update.
Three characteristics of video development ：
Three key tools for video transmission technology selection:
Three value propositions of Huawei Cloud's new media network: