Huawei HoloSens 2020 Intelligent Vision Tech Express


CONTENTS

Preface
  Embrace the Intelligent Vision, Build an Intelligent World

01 5G
  Discussion on the Impact of 5G on Intelligent Vision
  5G-enabled Image Encoding and Transmission Technologies
  Products and Solutions Catalog

02 AI
  Image, Algorithm, and Storage Trends Led by AI
  Discussion on Frontend Intelligence Trends
  Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
  SuperColor Technology
  Storage EC Technology
  Multi-Lens Synergy Technology
  Video Codec Technology
  Chip Evolution and Development
  Algorithm Repository Technology
  Products and Solutions Catalog

03 Cloud Service
  Discussion on Video Cloud Service Trends
  P2P Technology
  Products and Solutions Catalog

04 Ecosystem
  Discussion on Intelligent Vision Ecosystem Trends
  Products and Solutions Catalog

05 Appendix
  Product Portfolio
  Abbreviations
  Legal Statement

Preface

Embrace the Intelligent Vision, Build an Intelligent World

— President of Huawei Intelligent Vision Domain

In the past 120 years, three industrial revolutions have made breakthroughs in fields such as electricity and information technology, dramatically improving productivity and our daily lives. Today, the fourth industrial revolution, driven by AI and ICT technologies, is ushering in an intelligent era where all things are sensing, interconnected, and intelligent. Vision, the core of biological evolution, will serve as a significant enabler in this era. The combination of AI and vision systems will enable machines to perceive information and respond intelligently, revolutionizing people's work and everyday lives and improving productivity and security.

Today, we are delighted to see that new ICT technologies, such as 5G, AI, and machine vision are being put into commercial use, and playing a significant role in the video surveillance industry. 2020 marks the first year of 5G commercialization as well as a turning point of AI development. Additionally, machine vision now surpasses human vision to obtain more information in specific scenarios. The three technologies are interwoven with each other, fueling the development of intelligent vision.

Intelligent vision serves as the eyes of the intelligent world, the core of worldwide sensory connections, and a key enabler of the digital transformation of industries. Huawei Intelligent Vision looks forward to working with partners across industries to drive industry development and the intelligent transformation of cities, production, and people's lives with the power of technology, building an intelligent world where all things can sense.

Huawei remains steady in its commitment to embed 5G technologies into intelligent vision, which opens up opportunities by providing high bandwidth, low latency, and broad connection capabilities.

Huawei is developing intelligent cameras the way it develops smartphones: by revolutionizing the technical architecture, ecosystem, and industry chain. Huawei embeds an innovative operating system (OS) into software-defined cameras (SDCs) to enable remote loading of intelligent algorithms anytime, anywhere. The HoloSens Store allows users to download and install algorithms on cameras depending on their needs.

Huawei adheres to the "platform + ecosystem" strategy to build a future-proof intelligent vision ecosystem and empower more industries. Huawei is committed to providing platforms and opening algorithms and applications to benefit vendors and customers across industries.

Huawei develops cloud-edge-device synergy to maximize data value. Huawei will give full play to the technical advantages of the device-edge-cloud industry chain, develop devices based on cloud technologies, and empower the cloud through interconnection with various devices, thereby advancing the digital transformation of all industries.

01 5G

Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog

Discussion on the Impact of 5G on Intelligent Vision

Niu Liyang, Liu Zhen

1. 5G Development

New 5G infrastructure is driving the expansion of the global digital economy, and each country's information capability is increasingly represented by the state of its 5G networks. 5G is revolutionizing the whole industry chain, from electronic devices to base station equipment to mobile phones. Major economies around the world are therefore accelerating their application of 5G and actively exploring upstream and downstream industries to seize the strategic high ground. According to TeleGeography, a prominent telecommunications market research company, the number of commercial 5G networks worldwide had reached 82 by June 2020 and was expected to double by the end of 2020.

2. Features of 5G Networks

With their high bandwidth, low latency, and massive connectivity, 5G networks contribute to the building of a fully connected world. They have three major applications: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Users can select the 5G devices they require according to different scenarios, and developers can select development scenarios based on the types of applications they want to create.

Figure: 5G application scenarios (source: International Telecommunication Union (ITU), partly updated) — eMBB: fast transmission at Gbit/s rates, 3D and UHD video, cloud-based office/gaming, and augmented reality (AR); URLLC: industrial automation, self-driving cars, and high-reliability applications such as mobile healthcare; mMTC: smart home, smart city, voice intercom, and intelligent video surveillance.

3. Impact of 5G on Intelligent Vision

Comparison between 5G and 4G:

                         4G           5G
Latency                  10 ms        1 ms
Downlink service rate    10 Mbit/s    2 Gbit/s
Uplink service rate      1 Mbit/s     200 Mbit/s

In the 4G era, video services were limited to the consumer field because of the low bandwidth and high latency of 4G networks. Compared with 4G, however, 5G improves the service rate by about 100-fold and reduces latency by about 10-fold, enriching video application scenarios: from remote areas with complex terrain, to mines, factories, and harbors where cabling is difficult, to venues requiring security for major events.

Extending the breadth of intelligent vision

5G increases the peak transmission rate limit, laying a solid foundation for the internet of everything. It will play an important role in communications among machines and drive innovation across a range of emerging industries. Because of its high mobility and low power consumption, 5G is capable of supporting a wide array of frontend devices, such as vehicle-mounted devices, drones, wearables, and industrial robots, which will serve as significant carriers for video awareness. It is estimated that by 2023, the number of connected short-distance Internet of Things (IoT) terminals will reach 15.7 billion. In addition, the 5G network can be sliced into multiple subnets to meet the differing requirements of terminals in terms of latency, bandwidth, number of connections, and security. This will further enrich the application scenarios of 5G.

Figure: A 5G camera installed atop Mount Qomolangma (at Rongbuk Monastery), and a video image from the 5G camera.

Typical application case

Optical fibers deployed at harbors are prone to corrosion, and those on gantry cranes can easily become entangled during operations. To solve this problem, HD cameras are connected to 5G networks to monitor gantry cranes, so that operators can remotely check lifting and hoisting operations in real time and promptly identify anomalies. In addition, powered by 5G and artificial intelligence (AI), most container hoisting operations can be completed by machines, greatly improving efficiency. When 5G is applied at a harbor, transfer efficiency doubles, and the deployment and maintenance costs of optical fibers fall by about CNY 100,000 each year. Operators also no longer need to work at heights, which improves their work efficiency and ensures their safety.

Diverse 5G terminals become enablers of intelligent vision; network slicing enriches 5G application scenarios

Figure: Network slicing — a single 5G network is sliced into private subnets for different services (a harbor private network, a bus private network, and an emergency assurance private network), while diverse 5G terminals such as drones, wearables, industrial robots, and vehicle-mounted devices connect as enablers of intelligent vision.

5G ushers in the AI era

With its low latency, 5G serves as the supporting system for AI. During the industrial revolutions, people increased their productivity by mastering mechanical energy. At present, we are experiencing an AI revolution, in which people are improving the intelligent capabilities of machines by harnessing computing power. As the cost of computing power drops, the cloud, edges, and devices are coming to possess ample computing power, which they can use to perform video-based analysis with intelligent algorithms and generate massive amounts of valuable data. This data can only be fully utilized when it is quickly transferred among the cloud, edges, and devices.

5G is revolutionizing the way we think about AI. AI is now deeply rooted in the video surveillance industry, which in turn poses increasingly high requirements on video and image quality. 4K video encoded in the H.265 format requires an average transmission bandwidth of 10 Mbit/s to 20 Mbit/s, but when intelligent services are enabled, the instantaneous peak transmission rate can soar to over 100 Mbit/s, far higher than 4G networks provide. Once connected to 5G networks, cameras can use the high bandwidth to quickly deliver detailed, high-quality video images, improving intelligent analysis performance.

Figure: 4G vs. 5G — a 720p camera on a 4G network (about 1 Mbit/s of bandwidth) produces low-definition video that cannot be used for intelligent services, whereas a 4K camera on a 5G network (about 200 Mbit/s) delivers high-quality video that meets the requirements of intelligent services.

Figure: 5G-enabled remote gantry crane operation — optical fibers on existing gantry cranes easily become entangled, and cabling is subject to sea tide impact, while precise control requires up to 18 HD cameras per crane across some 50 cranes. With 5G, HD cameras obtain full coverage, and operators move from on-site operation to a central control room, where each operator can remotely run two or three gantry cranes at the same time using remote detection and a remote control joystick.


The high bandwidth and low latency of 5G enable wireless video transmission, extending the boundaries of intelligent vision applications. When powered by 5G, cameras can connect to massive numbers of sensors to implement multi-dimensional awareness. Additionally, as 5G develops, it is enabling the creation of various innovative kinds of devices, fueling the digital transformation of all industries.

Typical application case

Major economies around the globe are seeking to digitally transform their manufacturing sectors. Aircraft manufacturing is the most valuable sector of the manufacturing industry. Aircraft manufacturers adopt 5G and AI technologies for quality assurance, reducing the time required for carbon fiber stitching gap checks from 40 minutes to 2 minutes. In addition, 5G cameras provide a wide range of intelligent applications in factories, including safety helmet detection, workwear detection, and perimeter intrusion detection.

Intelligent capabilities are like electric power: electric power possesses great potential, but cannot be directly applied in industries until a power transmission network is built. 5G, in essence, serves as the transmission network for computing power and intelligent data. It enables the full implementation of intelligent capabilities, and by doing so is promoting the intelligent transformation of industries and people's everyday lives.

Figure: Intelligent data transmission among devices, edge nodes, and the cloud — 5G links AI-capable cameras and edge nodes (here, at an aircraft manufacturing plant) with cloud AI.

4. Application Bottlenecks of 5G in Intelligent Vision

Every technology encounters difficulties when it is applied, and 5G is no exception in intelligent vision. The 5G uplink and downlink bandwidths are unbalanced, and the total uplink bandwidth of a single base station is limited to around 300 Mbit/s. Most of the time, cameras upload P-frames, which contain only the changes from the previous frame, but they also periodically upload I-frames, which contain the full image information. As a result, bandwidth usage fluctuates dramatically: the instantaneous transmission rate of a single 4K camera can reach 60 Mbit/s, so if five 4K cameras are connected to a single 5G base station, its uplink bandwidth will be insufficient for video transmission at peak moments. Video encoding therefore needs to be optimized so that cameras can adapt to the limited uplink bandwidth of 5G networks. In addition, packet loss and bit errors during wireless transmission may cause image quality issues such as artifacts and video stuttering, which call for more reliable transmission modes.

A 5G network uses short wavelengths for transmission, which results in fast signal attenuation: network bandwidth decreases rapidly as distance increases, so the number of cameras that can be connected to a single 5G base station is limited. In addition, carriers build 5G base stations according to their actual requirements, weighing construction costs against benefits, so 5G coverage will remain limited in the short term. It is therefore important to use 5G base station resources properly and efficiently and to improve the coverage and access capability of each base station.

Figure: Bandwidth attenuation with distance for a camera with a built-in 5G module — about 210 Mbit/s at 100 m, 140 Mbit/s at 200 m, 90 Mbit/s at 300 m, and 60 Mbit/s at 400 m.

Figure: Two bottlenecks and their remedies — limited uplink bandwidth calls for more efficient encoding, while the packet loss and bit errors that frequently occur during wireless transmission (causing artifacts and video stuttering) call for more reliable transmission.

To solve these problems, 5G cameras should not simply be combinations of cameras and 5G modules. Instead, they should provide efficient video and image encoding capabilities to reduce the bandwidth required for transmission, together with reliable transmission technologies that prevent the packet loss and bit errors which occur during wireless transmission. In this way, 5G base station resources can be utilized properly.

Chen Yun, Liu Zhen

5G-enabled Image Encoding and Transmission Technologies

5G expands the scope of intelligent vision, and embeds artificial intelligence (AI) into a wide range of industries. However, due to the limitations of 5G New Radio (NR), wireless 5G networks feature limited uplink bandwidth, and have high requirements for network stability. Technical innovations have sought to overcome these challenges for utilizing 5G in intelligent vision applications.

1. Challenges to Video and Image Transmission on 5G Networks

Video and image transmission requires high uplink bandwidth and stable wireless networks

5G networks adopt a time-division transmission mode: under typical configurations, they spend 80% of the time transmitting downlink data and 20% transmitting uplink data. Generally, the uplink bandwidth of a single 5G base station accounts for only 20% of the total bandwidth and can reach 300 Mbit/s. In the intelligent vision industry, however, video and image transmission requires far more uplink bandwidth than 5G networks provide.

Figure: Wired transmission vs. typical wireless time-division transmission — a wired link works in full-duplex mode and can send and receive data packets at any time, whereas 5G uplink data can be sent only during the uplink time segments, about 20% of the total. In the 4:1 and 8:2 subframe configurations, a time segment labeled D carries downlink data, a segment labeled U carries uplink data, and a segment labeled S is configurable.

In addition, during video and image transmission, an I-frame containing the full image information is sent first, after which P-frames containing changes in the image from previous frames are sent, followed by an I-frame being sent again. The size of I-frames is larger than that of P-frames. As a result, image data occupies uneven network bandwidth during the 10 ms time window. Sending P-frames does not require a lot of bandwidth, but sending I-frames requires a high amount. For example, the average bit rate of 4K video streams is 12 Mbit/s to 20 Mbit/s, and the peak bit rate during I-frame transmission can reach 60 Mbit/s. This is known as I-frame burst, as it places great strain on the data transmission time window on 5G networks.

Figure: Bandwidth usage in a 10 ms time window, with each column indicating the size of a frame — small P-frames are interspersed with periodic, much larger I-frames, producing sharp peaks in file size over time.
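To see why an I-frame burst strains the uplink, consider a back-of-the-envelope model. The numbers in the sketch below (25 fps, a 50-frame GOP, a 12 Mbit/s average stream, and an I-frame assumed to be 10 times the size of a P-frame) are illustrative assumptions, not measurements of any particular encoder.

```python
# Back-of-the-envelope model of an I-frame burst (illustrative numbers only).
FPS = 25            # frames per second
GOP = 50            # one I-frame followed by 49 P-frames
AVG_RATE = 12e6     # average stream bit rate: 12 Mbit/s
RATIO = 10          # assumption: an I-frame is 10x the size of a P-frame

avg_frame_bits = AVG_RATE / FPS
# One GOP holds (GOP - 1) P-frames plus one I-frame worth RATIO P-frames:
p_bits = avg_frame_bits * GOP / (GOP - 1 + RATIO)
i_bits = RATIO * p_bits

slot = 1 / FPS      # each frame must be delivered within its 40 ms slot
print(f"P-frame {p_bits/1e6:.2f} Mbit, I-frame {i_bits/1e6:.2f} Mbit")
print(f"average rate {AVG_RATE/1e6:.0f} Mbit/s, "
      f"instantaneous rate during an I-frame {i_bits/slot/1e6:.0f} Mbit/s")
```

Under these assumptions the stream averages 12 Mbit/s but briefly demands roughly 100 Mbit/s while an I-frame is on the wire, which is why a handful of unsmoothed cameras can saturate a base station's roughly 300 Mbit/s uplink.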


Efficiently utilizing 5G base station resources to promote the large-scale commercial use of 5G in intelligent vision

In actual applications, a 5G base station always connects to multiple cameras at the same time. In this case, I-frame bursts may occur simultaneously for multiple cameras, resulting in I-frame collision, further intensifying the pressure on 5G NR bandwidth. According to tests, the probability of I-frame collision is close to 100% when over 7 cameras using traditional encoding algorithms are connected to a single 5G base station.

Furthermore, 5G networks are challenged by unstable transmission. Compared with wired network transmission, 5G wireless network transmission is subject to packet loss and bit errors, especially during network congestion. This results in video quality issues, such as image delays, artifacts, and video stuttering, which in turn affect backend intelligent applications.

In addition to limited uplink bandwidth and network transmission reliability, 5G networks feature a fast attenuation speed, which restricts the coverage of a single base station. This also affects the commercial use of 5G in intelligent vision. 5G transmission is mainly conducted on the millimeter wave and sub-6 GHz (centimeter-level wavelength) bands. These two bands feature short wavelengths, resulting in limited transmission range, poor penetration and diffraction performance, and faster 5G network attenuation. Therefore, the coverage of a single 5G base station is far smaller than that of a 4G base station. In addition, unlike 4G base stations which cover almost all areas, carriers build 5G base stations based on actual project requirements with construction costs and benefits taken into consideration. Therefore, efficiently utilizing 5G base station resources is essential to improving the coverage and access capabilities of a single base station, and to achieving the large-scale commercial use of 5G in intelligent vision.

Figure: Probability of I-frame collision — for cameras at 25 frames per second with GOP lengths of 25, 30, and 60, the probability that the I-frames of all cameras avoid collision falls steeply as the number of cameras rises from 1 to 13; scattering the data packets of three cameras across 5 seconds prevents I-frame collision.

Figure: Total uplink bandwidth of a 5G network (outdoor macrocell) decreases as the coverage radius increases — about 210 Mbit/s at 100 m, 140 Mbit/s at 200 m, 90 Mbit/s at 300 m, and 60 Mbit/s at 400 m. Roughly 40% of the covered area supports 6-8 camera access channels, while the remaining 60% supports only 2-3.
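The collision behavior can be sanity-checked with a short Monte Carlo sketch. The GOP period and I-frame transmission window below are assumptions chosen for illustration; the qualitative result (collisions become near-certain within a handful of cameras) matches the tests cited above.

```python
import random

def collision_prob(n_cameras, gop_period=1.0, i_window=0.04, trials=20_000):
    """Probability that at least two cameras' I-frame windows overlap,
    with each camera's GOP phase drawn uniformly at random."""
    hits = 0
    for _ in range(trials):
        starts = sorted(random.uniform(0, gop_period) for _ in range(n_cameras))
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        gaps.append(starts[0] + gop_period - starts[-1])   # wrap-around gap
        hits += any(g < i_window for g in gaps)
    return hits / trials

for n in (2, 4, 7, 10):
    print(f"{n} cameras: collision probability ≈ {collision_prob(n):.2f}")
```

Deliberately staggering the GOP phases (the "scattered within 5 seconds" scheme shown in the figure) removes the randomness and drives this probability to zero.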


2. Key Technologies

The biggest challenges for large-scale commercial use of 5G in intelligent vision are efficiently utilizing 5G uplink bandwidth and preventing packet loss and bit errors. As a remedy, the industry at large has sought to optimize image encoding and transmission.

Image encoding optimization

Image encoding optimization is designed to eliminate I-frame bursts and reduce the bandwidth required for video and image transmission. Region of interest (ROI)-based encoding compresses image backgrounds, which reduces the overall bandwidth required. In addition, stream smoothing optimizes I-frames, reducing the peak bandwidth required and preventing network congestion.

ROI-based encoding technology, reducing the average bandwidth required for video transmission

In the intelligent vision industry, the bandwidth required for video transmission has soared as image resolution has continually increased. On top of that, high-quality person and vehicle images are captured and transmitted for intelligent analysis, which requires even more bandwidth than video transmission. In real-world applications, however, people tend to focus only on the key information in video and images, such as pedestrians and vehicles, and have little need for high-definition image backgrounds. ROI-based encoding technology was developed with this understanding in mind: it automatically distinguishes the image foreground from the background, ensuring high resolution within the ROIs while compressing the background, which reduces the overall bandwidth required for transmission. This technology shrinks video streams and snapshots, with the average bit rate a remarkable 30% lower in complex scenarios and 60% lower in simple scenarios.

Figure: ROI-based video encoding vs. the traditional encoding method — AI algorithms in the encoder separate original video/image streams into a foreground, which receives normal encoding to ensure high image quality, and a background, which receives compressed encoding to reduce the bit rate. Compared with standard H.265 encoding, the average bit rate of 1080p video drops by 30% in complex scenarios, 50% in common scenarios, and 60% in simple scenarios.
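The savings reported above follow from a simple bit-budget argument. In the sketch below, the foreground fractions and the background compression factor are hypothetical choices made so the output lands near the reported 30-60% range; a real encoder derives them from AI segmentation and rate control.

```python
def roi_bitrate(baseline_mbps, fg_fraction, bg_factor):
    """Estimate the average bit rate when the foreground keeps normal
    encoding and the background is compressed bg_factor times harder.
    Crudely assumes bits are proportional to picture area."""
    return baseline_mbps * (fg_fraction + (1 - fg_fraction) / bg_factor)

BASELINE = 4.0   # hypothetical 1080p stream, Mbit/s, standard H.265
for scenario, fg in (("complex", 0.60), ("common", 0.40), ("simple", 0.25)):
    after = roi_bitrate(BASELINE, fg, bg_factor=5)
    print(f"{scenario:8s}: {BASELINE:.1f} -> {after:.2f} Mbit/s "
          f"({(1 - after / BASELINE):.0%} lower)")
```

With these assumed foreground shares, the estimate yields roughly 32%, 48%, and 60% reductions — close to the figures the article reports.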

I-frame optimization, reducing peak bandwidth required for transmission

The peak bit rate during I-frame bursts is extremely high, which can lead to network congestion. To address this, the industry has adopted stream smoothing technology, which adjusts encoder parameters to control the size and frequency of I-frames, reducing the peak bandwidth required for video transmission during I-frame bursts.

Figure: File size over time before and after I-frame optimization — after stream smoothing, the peak bit rate of I-frames is reduced by 40%, reducing the network congestion caused by I-frame bursts.

Transmission optimization

Transmission optimization technology focuses mainly on intelligent flow control and network transmission reliability. Intelligent flow control detects the network transmission status in real time and adjusts packet sending parameters accordingly, improving overall network bandwidth usage. Network transmission reliability can be enhanced via automatic repeat request (ARQ) and forward error correction (FEC) technologies, which help prevent packet loss and bit errors.

Intelligent flow control

In wireless transmission, if data keeps being sent while the network is congested, transmission capabilities deteriorate sharply. Intelligent flow control technology uses flow control units to monitor the length of data queues in real time and adjust packet sending parameters accordingly. This allows more data to be sent during off-peak moments and prevents data from stacking up during peaks, optimizing network bandwidth usage.

Figure: No flow control vs. intelligent flow control — without flow control, the encoder sends data directly into the channel, causing network congestion, packet loss, and video delay and stuttering at the receiver. With intelligent flow control, a flow control unit between the encoder and the channel monitors network status in real time and adjusts the encoder and packet sending parameters based on queue length, preventing data stacking and delivering smooth, clear video.
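The following toy simulation illustrates the queue-length principle behind intelligent flow control: the sender watches its backlog and throttles the encoder when the queue grows, rather than pushing packets blindly into a congested channel. The channel model and thresholds are invented for illustration.

```python
import random

random.seed(7)
QUEUE_LIMIT = 40      # backlog threshold above which the encoder is throttled
queue = 0             # packets waiting to be sent
capacity = 10         # packets the channel drains per tick (fluctuates)

for tick in range(60):
    throttled = queue > QUEUE_LIMIT
    queue += 4 if throttled else 12          # encoder output per tick
    capacity = max(2, min(14, capacity + random.choice((-3, 0, 3))))
    queue = max(0, queue - capacity)         # channel drains what it can
    if tick % 12 == 0:
        print(f"tick {tick:2d}: queue={queue:3d}, "
              f"capacity={capacity:2d}, throttled={throttled}")
```

Without the throttle branch, the queue grows without bound whenever channel capacity dips below the encoder's output — exactly the data stacking the figure depicts.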

Enhanced transmission reliability to prevent packet loss and bit errors

Video transmission over the Transmission Control Protocol (TCP) is inefficient, particularly when packet loss occurs on wireless networks. On 5G networks, video and images are therefore transmitted over the User Datagram Protocol (UDP), with reliability added in two ways: acknowledgment and retransmission mechanisms based on ARQ, and redundancy-based FEC. ARQ adds verification and retransmission on top of conventional UDP-based transmission: if the receiver detects that a transmitted data packet is incorrect, it requests that the transmitter retransmit the packet. FEC reserves verification and error-correction bits during data transmission: when the receiver detects an error in the data, it uses the error-correction bits to perform exclusive-or (XOR) operations that restore the data. These transmission optimization technologies keep video transmission smooth even when the packet loss rate approaches 10%. However, the reliability mechanisms must be deployed on both the peripheral units (PUs) and the backend platforms.
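A toy stop-and-wait model of the ARQ path, assuming a fixed packet loss probability and per-packet acknowledgments (real implementations pipeline packets, batch acknowledgments, and bound the retry budget):

```python
import random

def send_with_arq(n_packets, loss_rate=0.08, max_retries=5):
    """Retransmit each packet until it is acknowledged or retries run out."""
    sent = delivered = 0
    for _ in range(n_packets):
        for _attempt in range(1 + max_retries):
            sent += 1
            if random.random() > loss_rate:     # packet and ACK both arrive
                delivered += 1
                break
    return delivered, sent

delivered, sent = send_with_arq(10_000)
print(f"delivered {delivered}/10000 packets in {sent} transmissions "
      f"(~{sent/delivered:.2f} per packet)")
```

At an 8% loss rate, essentially every packet gets through at a cost of only a few percent extra transmissions — but each retransmission adds latency, which is why FEC complements ARQ.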

Figure: ARQ vs. FEC — with ARQ, data packets flow from sender to receiver; when the receiver detects an incorrect packet ("NOT OK!"), it requests a retransmission. With FEC, the original data B = [D1, D2, D3, D4] is multiplied by a redundant coding matrix A (a 4x4 identity matrix extended with a parity row [R11, R12, R13, R14]) to form the sent data C1 = [D1, D2, D3, D4, C1]. A packet such as D2 that is lost during transmission can be restored from the received data C2 and the redundancy coding matrix (A ^ B = C, so data lost from B can be recovered as C ^ A).
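The XOR recovery step is easy to demonstrate. The sketch below uses a single XOR parity packet protecting four data packets — the simplest FEC configuration consistent with the parity-row matrix in the figure; production codes add more parity and interleaving.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"packet-D1", b"packet-D2", b"packet-D3", b"packet-D4"]

# Sender: compute the parity packet C1 = D1 ^ D2 ^ D3 ^ D4 and send all five.
parity = data[0]
for packet in data[1:]:
    parity = xor_bytes(parity, packet)

# Receiver: D2 is lost in transit; XOR the parity with the survivors.
received = [data[0], None, data[2], data[3]]
recovered = parity
for packet in received:
    if packet is not None:
        recovered = xor_bytes(recovered, packet)

assert recovered == data[1]   # the lost packet is restored, no retransmission
print("recovered:", recovered)
```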

3. Camera Bit Rate and Base Station Coverage After Optimization

These innovations have helped facilitate the commercial use of 5G in intelligent vision. More specifically, ROI-based encoding and I-frame optimization reduce the average bit rate at the encoding end and the peak bit rate, so that 5G uplink bandwidth is utilized more efficiently. Intelligent flow control and transmission reliability technologies enable cameras to actively monitor their data sending queues, preventing network congestion and improving 5G bandwidth usage. In addition, these advancements in encoding and transmission allow a single 5G base station to connect to more cameras and increase its coverage range.

Figure: Peak bandwidth required for video transmission (unit: Mbit/s) before and after optimization for 4K and 1080p video, and the resulting increase in the number of 1080p and 4K cameras that a single 5G base station (uplink bandwidth: 300 Mbit/s) can support within 400 m.

Products and Solutions Catalog

Tan Shenquan, Liu Zhen

Huawei 5G Cameras

Huawei has leveraged its accumulated prowess in 5G and network communications to release a series of patented innovations that resolve longstanding 5G transmission challenges, such as the limited coverage of individual 5G base stations, low uplink bandwidth, and packet loss. Huawei has also launched a series of related products, such as 5G cameras, that can be applied across a wide range of industries, including intelligent harbors and manufacturing.

Intelligent encoding and I-frame optimization, improving resource utilization of 5G base stations

5G networks feature limited uplink bandwidth, resulting in network congestion when I-frame bursts occur during video transmission. To resolve this problem, Huawei has proposed a region of interest (ROI)-based encoding technology that increases the compression ratio of image backgrounds, reducing the average bit rate of video streams. Furthermore, I-frame optimization technology reduces the bandwidth required for video transmission at peak moments, preventing network congestion. After optimization, both the maximum number of cameras that can be connected to a single 5G base station and the coverage of a 5G base station increase by two to three times, significantly improving the resource utilization of 5G base stations.

User Datagram Protocol (UDP)-based reliable transmission, ensuring smooth, efficient video transmission

To prevent packet loss and bit errors during wireless transmission, Huawei has adopted UDP together with a dynamic optimization policy, keeping video transmission smooth even when the packet loss rate is within 10%.

Huawei 5G Camera Models

M2281-10-QLI-W5, M6781-10-GZ40-W5, X7341-10-HMI-W5

• Large-scale access: Built-in integrated antenna with intelligent encoding and transmission optimization for 5G New Radio (NR), ensuring large-scale access of 5G cameras
• Clear, smooth video: Image encoding and transmission optimization technologies keep video smooth even when the packet loss rate reaches 10%
• Flexible deployment: Support for the n78, n79, and n41 frequency bands and standalone (SA)/non-standalone (NSA) hybrid networking
• AI-powered innovation: Professional-grade artificial intelligence (AI) chips and a dedicated software-defined camera (SDC) operating system (OS), supporting a wide range of intelligent functions such as person analysis, crowd flow analysis, and vehicle analysis, plus support for long-tail algorithms

02 AI

Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
SuperColor Technology
Storage EC Technology
Multi-Lens Synergy Technology
Video Codec Technology
Chip Evolution and Development
Algorithm Repository Technology
Products and Solutions Catalog

Ge Xinyu, Zhang Yingjun

Image, Algorithm, and Storage Trends Led by AI


1. AI+Video Future Prospects

AI has become a core enabler of digital transformation across industries

As artificial intelligence (AI) technology matures and an intelligent society develops, AI is being used in a wide range of industries. Currently, the transportation industry is using AI+video to improve the efficacy of traffic management. In the future, AI+video will gradually be embedded in more sectors, such as government, finance, energy, and education.

The rapid development of AI is driving considerable growth within the global video analysis industry

In recent years, the fast development of deep learning technology has driven the rapid growth of the overall video analysis industry. According to statistics, from 2018 to 2023, the compound annual growth rate (CAGR) of the video analysis product market is predicted to reach 37.1%. Additionally, the proportion of intelligent cameras powered by deep learning is expected to increase from 5% to 66%.

Figure: AI+video is expanding from transportation into government, finance, energy, and education.

Figure: Proportion of intelligent cameras shipped with deep-learning-based vs. rules-based analytics, 2018-2023, and video analysis application revenue — 2018 global revenue of 0.38 billion with a 2018-2023 CAGR of 37.1%; year-over-year revenue growth of 66.4% (2018), 63.6% (2019), 42.9% (2020), 34.4% (2021), 26.1% (2022), and 22.3% (2023). Data source: IHS Markit, 2019.

Transport networks can use AI to: Recognize key people and vehicles, thereby improving traffic safety governance in urban areas; realize refined management of urban traffic and promote smooth traffic optimization based on precise data.

Governments can use AI to: Improve their administrative efficiency by informatizing infrastructure; improve the intelligence of various application systems; enhance information awareness, analysis, and processing capabilities by analyzing massive video data.

Banks can use AI to: Turn their focus from improving service efficiency to enhancing marketing, improving the intelligence of unstaffed bank branches, and accelerating the reconstruction of smart branches.

Energy companies can use AI to: Realize visualized exploration and development, and construct intelligent pipelines and gas stations.

Educational institutions can use AI to: Establish uniform systems across countries/regions; promote intelligent education; establish intelligent education demonstration areas; and drive education networking.


2. To Achieve AI Development, an Image Quality Assessment Standard is Needed for Intelligent Cameras

Why is it necessary to have an image quality assessment standard?

Machines are capable of conducting a wide range of recognition tasks, including recognizing objects such as pedestrians, cyclists, and vehicles. To improve the recognition accuracy of AI algorithms, high-quality video is needed.

Figure: Machine recognition tasks (pedestrians, cyclists, vehicles), ReID technology, and full-color imaging in low-light conditions.

All-scenario and all-weather coverage: New intelligent applications pose higher requirements on full-color imaging in low-light conditions, which is now a trend within the industry. For example, person re-identification (ReID) requires cameras to accurately capture the color of the surroundings and the gait details of people. Against this backdrop, infrared multi-spectral light compensation technology has been proposed, enabling cameras to perform better in low-light conditions in an environmentally friendly way.

AI and image enhancement technologies have developed rapidly. Technologies such as AI noise reduction use global and local optimization methods to improve image quality. They focus on optimizing image quality for targets such as license plates, which greatly enhances the accuracy of image recognition. However, the industry still lacks a complete and objective image assessment standard.

The status quo of image quality assessment standards

The rapid development of AI in recent years has revolutionized the public safety industry. In the past, video needed to be watched by people; now, machines also play an important role in viewing and analyzing video. However, current technical standards do not reflect the true capabilities of today's video surveillance technologies.

The current Chinese national standard GA/T 1127-2013 (General technical requirements for cameras used in security video surveillance) mainly lists requirements for camera network access and manual video viewing. Under the traditional assessment method, experienced workers grade images subjectively, but this method cannot be used in machine assessment. Now that AI is enabling image assessment to become increasingly objective, an objective image assessment standard needs to be formulated.


Thoughts and suggestions on the design of a standard system

Key issues relating to the formulation of a new standard

There are five key issues to consider when developing an image quality assessment system for intelligent cameras.

Objectivity of camera imaging quality assessment: When humans judge imaging quality with their eyes, the assessment is subjective. An objective quality assessment model would build on the existing full-reference, reduced-reference, and no-reference models within the industry.

Consistency of assessment results and subjective perception: The assessment result arrived at by intelligent vision must be consistent with subjective perception. This is a key factor that any standard system must promote and recognize.

Identity of assessment scenario and real environment: Currently, the image quality indicators of cameras are mainly evaluated using test cards and software, or by manual judgment. This differs from the actual scenarios in which cameras are used, which involve moving objects such as people and vehicles. In addition, infrared multi-spectral light compensation technology is widely used in actual scenarios, so the spectral characteristics of the test target must be consistent with real targets.

Concordance of assessment indicators and actual effect: Currently, the image quality indicators of cameras are tested separately, and the relationships and weights of indicators for different intelligent tasks are not considered. The assessment indicators should be associated with user scenarios and reflect the practicability of the service.

Repeatability of assessment methods: Different assessors should get the same result regardless of time or place.

The assessment dimensions should include the user task type, the user scenario type, and the basic factors of customer image assessment. Score weighting should be decided based on each user task and scenario to calculate the overall score.

Milestones in image and video quality assessment standards, 1997-2019:
• Recommendation ITU-R BT.500-7 (1997), Methodology for the subjective assessment of the quality of television pictures (revised as BT.500-11 in 2002, BT.500-12 in 2009, and BT.500-13 in 2012)
• GY/T 134 (1998), The method for the subjective assessment of the quality of digital television pictures
• FRTV Phase I (2000) and Phase II (2003), Full Reference (FR) objective video quality models that predict the quality of standard definition television
• Recommendation ITU-R BT.1788 (2007), Methodology for the subjective assessment of video quality in multimedia applications
• RRNR-TV (2009), Reduced Reference (RR) and No Reference (NR) objective video quality models that predict the quality of standard definition television
• HDTV Phase I (2010), FR and RR objective video quality models that predict the quality of high definition television; QART (Quality Assessment for Recognition Tasks) (2010)
• GB 50198-2011, Technical code for project of civil closed circuit monitoring television system
• Recommendation ITU-T J.341 (2011), Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference, together with its counterpart for HDTV in the presence of a reduced reference signal
• NORM (No Reference Metric) (2012 to now) and AVHD (Audiovisual HD Quality) (2017 to now)
• GA/T 1127-2013, General technical requirements for cameras used in security video surveillance
• GA/T 1356-2018, Specifications for compliance tests with national standard GB/T 25724-2017


Indicator system for the image quality assessment of intelligent cameras

Figure: The overall score is aggregated from recognition-task scores (person recognition, behavior recognition, gait recognition, license plate recognition, multi-algorithm integration, and so on), combined by user task weight. Each task is scored across user scenarios — even illumination in the daytime, backlight in the daytime, light raking in the daytime, low light at night, low light with glare, rain and snow, rain and fog, and so on — and combined by user scenario weight. Each scenario score is produced by a calculation function f(x) over the basic image indicator factors: objective single-frame quality factors in the spatial domain (definition, texture detail, noise, contrast, color reproduction, color sensitivity, color saturation, exposure quality, geometric distortion) and objective quality factors in the temporal domain (stability, frame rate).
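As a minimal sketch of the aggregation just described, assume the per-scenario scores (the outputs of some calculation function f(x) over the basic indicator factors) are already known; all tasks, scenarios, weights, and scores below are hypothetical.

```python
# Hypothetical per-(task, scenario) scores, assumed already computed by f(x)
# from the basic image indicator factors (definition, noise, contrast, ...).
scores = {
    "person recognition": {"even daytime light": 92, "low light at night": 74},
    "license plate recognition": {"even daytime light": 88, "low light at night": 65},
}
scenario_weight = {"even daytime light": 0.6, "low light at night": 0.4}
task_weight = {"person recognition": 0.5, "license plate recognition": 0.5}

def task_score(task: str) -> float:
    """Aggregate one task's scenario scores by user scenario weight."""
    return sum(scenario_weight[s] * v for s, v in scores[task].items())

overall = sum(task_weight[t] * task_score(t) for t in scores)
for t in scores:
    print(f"{t}: {task_score(t):.1f}")
print(f"overall score: {overall:.1f}")
```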

3. Service Development Requirements for AI Algorithms and Future Evolution

Evolution from traditional single-object analysis to multi-object associative recognition

The traditional single-object recognition method cannot accurately recognize or analyze occluded objects. Instead, multiple algorithms must be integrated to improve recognition efficiency, which has become a key service requirement and the future direction of algorithm evolution.


Evolution from traditional service closed-loop in a single area to comprehensive security protection

Social and transportation development facilitates provincial and national population mobility. The traditional service model, with a closed loop in a single area, therefore cannot meet the requirements of comprehensive security protection, which is gradually developing towards cross-region intelligent management.

Comprehensive intelligence across all scenarios: Implement closed-loop video surveillance for key areas such as city entrances, railway and subway stations, bus stations and bus stops, airports, pedestrian zones, urban-rural intersections, street communities, and agricultural trade markets.

Multi-dimensional data collision and analysis: Align vast quantities of video and image data with multi-dimensional social data, such as travel data, to better analyze people.

Full awareness of people and vehicles within a residential community: Collect and update data for people and vehicles entering and leaving residential communities in real time every day, and quickly and accurately recognize objects.

4. Storage Requirements of AI Development

The status quo of video and image storage

To improve recognition accuracy, AI algorithms pose higher requirements on the image quality of cameras (including definition and resolution). In smart cities and intelligent transportation systems, HD cameras are widely deployed, which requires considerable storage space for video and images. As storage durations and coverage areas increase, a range of problems follows: limited equipment room footprint, high power consumption, and maintenance difficulties.

In a medium-sized city:

Video resolution    Storage duration    Coverage area
1080p               30 days             Key areas
4K                  90 days             All areas

At this scale, storage can demand 40+ cabinets (straining the limited equipment room footprint and requiring line reconstruction), consume 440+ kW of power, and suffer maintenance difficulties from component, node, and site faults.

Customers' primary concern is how to improve storage space utilization and reduce the equipment room footprint, storage deployment costs, power consumption, and total cost of ownership (TCO).
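Simple arithmetic shows the scale behind those numbers. The average bit rates and the citywide camera count in this sketch are assumptions chosen for illustration; the retention periods come from the table above.

```python
def per_camera_tb(bitrate_mbps: float, days: int) -> float:
    """Raw storage for one camera: bit rate x retention, in terabytes."""
    return bitrate_mbps * 1e6 / 8 * days * 86_400 / 1e12

tb_1080p = per_camera_tb(2.0, 30)    # assume ~2 Mbit/s average for 1080p
tb_4k = per_camera_tb(8.0, 90)       # assume ~8 Mbit/s average for 4K
print(f"1080p, 30 days: {tb_1080p:.2f} TB per camera")
print(f"4K,    90 days: {tb_4k:.2f} TB per camera")

CITY_CAMERAS = 100_000               # hypothetical medium-sized city
print(f"{CITY_CAMERAS:,} 4K cameras: {CITY_CAMERAS * tb_4k / 1e3:,.0f} PB")
# Millions of cameras therefore land in the exabyte range, which is why
# storage utilization, footprint, and power dominate the TCO discussion.
```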


5. Trends

The core objective of AI is to turn the physical world into metadata for analysis. However, in actual applications, a single piece of metadata is generally useless. This requires frontend devices to go from uni-dimensional data collection to multi-dimensional data awareness, and backend platforms to evolve from relying on image intelligence to data intelligence. In this way, data can be fully associated and utilized for analysis and prediction.

Frontend devices: from uni-dimensional data collection to multi-dimensional data awareness

Figure: From siloed systems where data is isolated across departments (Department A: person, vehicle; Department B: phone, relationship; Department C: accommodation, travel) to an aggregated data lake with multi-dimensional data awareness (+time/space/multi-modal) where data has converged — supported by pixel-level image segmentation, diversified awareness dimensions, and integrated device forms.

Future trends

In smart cities and intelligent transportation systems, video streams are mainly used to conduct AI analysis of people and vehicles. A balance needs to be struck between lowering storage costs and ensuring the accuracy of this analysis.

High-density storage: more storage media per unit.

Video compression: Deep video compression enables better utilization of storage space. For example, region of interest (ROI) compression technology separates and extracts ROIs from the background to reduce the video bit rate and storage space without decreasing the ROI detection rate. In one motor vehicle scene, the bit rate drops from 2,642 kbit/s before compression to 551 kbit/s after compression.


Backend platform: from image intelligence to data intelligence

Figure: Image intelligence alone is unforeseeable; data intelligence — converging video and image data with Internet of Things (IoT) data and Internet data — is foreseeable, enabling association, analysis, and prediction.

Discussion on Frontend Intelligence Trends

Xu Tongjing

The aim of artificial intelligence (AI) is to train computers to see, hear, and read like human beings. Current AI technologies are mainly used to recognize images, speech, and text. The renowned experimental psychologist D. G. Treichler proposed that 83% of the information we obtain from the world around us comes through vision. Accordingly, over 50% of today's AI applications are related to intelligent vision, and around 65% of industry digitalization information comes from intelligent vision. In addition, to bridge the physical and digital worlds, all things must be sensing. The type, quantity, and quality of data collected by frontend sensing devices determine the level of intelligence that can be achieved.

Figure: How humans obtain information — vision 83%, hearing 11%, smell 3.5%, touch 1.5%, taste 1%.

1. Five Advantages of Frontend Intelligence

Superior imaging quality with ultimate computing power

Intelligent cameras, the sensing devices of the intelligent vision sector, were introduced around five years ago. Unlike traditional IP cameras (IPCs), intelligent cameras can adapt to challenging environments and collect video data of a higher quality. Historically, due to immature algorithms and chips, cameras could not provide sharp, HD-quality images in harsh weather conditions such as rain, sandstorms, and overcast days. In addition, factors such as poor installation angles, occlusion, low light, and low resolution may also lead to inaccurate object recognition. If imaging quality cannot be guaranteed, intelligence remains an unachievable mirage.

With AI algorithms, intelligent cameras can automatically adjust image signal processing (ISP) parameters such as shutter speed, aperture, and exposure according to the ambient lighting and object speed, delivering optimal images for further detection and recognition and associating face images with personal data.

Figure: Intelligent image quality adjustment.

Applicable to varied scenarios

Intelligent vision systems are increasingly expected to satisfy the needs of various industries for various intelligent applications at various times and in various scenarios. For example, cameras must be able to detect vehicle queue length and accidents in the daytime, detect parking violations at night, and load different algorithms at different preset positions.

Thanks to frontend intelligence, customers can load their desired algorithms on intelligent cameras to satisfy their personalized or scenario-specific requirements. This also helps reduce risk exposure in the delivery of diversified algorithms. In addition, lightweight container technology is used to construct an integrated multi-algorithm framework. This enables each algorithm to operate independently, ensuring service continuity during algorithm upgrade and switchover. Customers can also flexibly choose their desired intelligent capabilities to adapt to specific application scenarios.

System linkage within milliseconds

In many industries, such as transportation and emergency response, fast response and closed-loop management are the basic and also the most critical requirements of services. Frontend intelligence enables cameras to analyze video in real time and to immediately link related service systems upon detecting objects that trigger behavior analysis rules, in locations such as airports and high-speed rail stations.

In road traffic scenarios, cameras need to link external devices such as illuminators, radar detectors, and traffic signal detectors within milliseconds. For example, cameras need to work with illuminators to provide enhanced lighting for specific areas at the right moment or periodically synchronize with traffic signal detectors to accurately detect traffic incidents. In other linkage scenarios, for example, linkage between radar detectors and PTZ dome cameras or between barrier gates/swing gates and cameras, frontend intelligence can dramatically improve the system response efficiency and ensure quick service closure.

Optimal computing efficiency

Video plays an essential role in some key industries such as social governance and transportation. However, the traditional video surveillance market tends to be saturated and cannot satisfy digital transformation across industries. Thanks to ultimate computing power, a lot of intelligent applications are now possible. Compared with backend intelligence, frontend intelligence improves computing efficiency by 30% to 60%.

With frontend intelligence, each camera processes only one video channel at the frontend, which poses lower requirements on computing power, and directly obtains raw data for analysis, further reducing computational requirements and enhancing processing efficiency. Frontend intelligence also enables cameras to deliver high-quality images to the backend, so the backend platform can focus on intelligent analysis while focusing less on secondary image decoding. With the same computing power, image analysis is roughly 10 times more efficient than video analysis. Moving intelligence to the frontend can maximize the value of intelligent applications for customers with limited resources.

Figure: Frontend intelligence in practice — intelligent cameras mounted on gantries work with radar for vehicle capture and vehicle feature extraction, and, together with millimeter-wave radar, issue collision warnings upon lane changes even when motor vehicles, non-motorized vehicles, and pedestrians appear simultaneously. Compared with backend intelligence, frontend intelligence achieves markedly higher computing efficiency.

Improved engineering efficiency

To apply intelligent applications on a large scale, engineering issues must be considered. A top concern for engineering vendors is upgrading and reconstructing the live network using existing investments at the lowest cost. The prevalence of intelligent cameras (including common cameras with inclusive AI computing power) onto which intelligent algorithms can be dynamically loaded can dramatically improve frontend data collection quality, enhance intelligent analysis efficiency by 10-fold and intelligent application availability by several-fold, and lower the total cost of ownership (TCO) by over 50%.

In addition, frontend intelligence enables a camera to run multiple algorithms concurrently. For example, an intelligent camera can simultaneously load algorithms for traffic violation detection, vehicle capture and recognition, and traffic flow statistics, whereas multiple devices were required to support these functions in the past. This sharply lowers engineering implementation difficulty and improves engineering efficiency.

Figure: Compared with backend intelligence, frontend intelligence improves intelligent analysis efficiency by 10-fold, improves intelligent application availability by several-fold, and reduces TCO by over 50%.

2. Key Factors for Implementing Frontend Intelligence

In terms of product technologies, intelligent cameras must be equipped with AI main control chips and intelligent operating systems to implement frontend intelligence.

The most basic functionality of a camera is to shoot HD video around the clock, and sharp, HD images are the most basic requirement of computer vision. Computing power is required to optimize images and improve the intelligent recognition rate, and in scenarios where intelligent services demand high real-time performance, ultimate computing power is required to meet real-time data awareness, computing, and response requirements.

Customers require cameras with different hardware forms and software with different capabilities depending on the usage scenario. Currently, most cameras are designed for specific scenarios, but their software and hardware are closely coupled. If software can be decoupled from hardware, users can install desired algorithms on cameras just like installing apps on smartphones. This maximizes the value of hardware, saves overall costs, and improves user experience. To decouple software from hardware, an open and intelligent operating system is required. With the intelligent operating system, differences between bottom-layer hardware are no longer obstacles. After the computing and orchestration capabilities of bottom-layer hardware devices are invoked, they are uniformly encapsulated by the operating system. This significantly simplifies development and allows developers to focus solely on the software's functional capabilities. In addition, the lightweight container is used to construct an integrated multi-algorithm framework, where each algorithm runs independently in a virtual space, allowing independent loading and online upgrading. In summary, an intelligent camera operating system is the basis of frontend intelligence.

Computing power is the foundation of intelligent capabilities, while professional AI chips give a huge boost to computing power. Accelerated by dedicated hardware, these AI chips support tera-scale computing and visual processing based on deep learning on a neural network. To support frontend intelligence, cameras must be equipped with professional AI chips.

The industry has reached a consensus on frontend intelligence and related standards. Mainstream vendors and users in the industry are actively embracing frontend intelligence. Vendors in the industry have launched products such as software-defined cameras and scenario-specific intelligent cameras. The industry ecosystem is thriving.

Intelligent awareness can help collect multi-dimensional data, dramatically improve the data collection quality, and unleash the value of mass video data while reducing computing power required for backend data processing and the overall TCO. In addition, distributed processing significantly improves system reliability.

In the mobile Internet sector, the app market provides an overwhelming number of apps. Users can download and install desired apps on their smartphones. In the intelligent video sector, the burning question is: How can we aggregate excellent ecosystem partners to provide superior algorithms and applications to meet customers' fragmented and long-tail requirements? To address this issue, the intelligent algorithm platform was developed, which aggregates ecosystem partners in the intelligent vision sector to provide intelligent video/image applications for a range of industries. The platform protects developers' rights and interests through license files and verification mechanisms and also allows users to easily choose from a range of reliable intelligent algorithms. In addition, intelligent cameras can connect to a range of hardware sensors in wired or wireless mode to help build a multi-dimensional awareness ecosystem. With a rich ecosystem, a large number of long-tail algorithms dedicated to specific industries can be quickly released to meet the requirements of various scenarios.

From the perspective of application ecosystems, frontend intelligence requires a future-proof algorithm and hardware ecosystem to boost industry digital transformation.


Xu Tongjing

AI/Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms

Beijing CSVision Technology Co., Ltd.

1. Background

The public safety industry has developed rapidly over the past 40 years, and there is now a large market for public safety products and services. The emergence of innovative technologies such as 5G, cloud storage, new video codec technologies, and video/image analysis technologies has driven the industry to expand, and it now encompasses a much wider range of fields. These technologies are driving the public safety industry into a brand-new era in which images are clearer, more accurate, and more comprehensible, and more data value is mined from a multitude of sources, including masses of video and image data; service data from the transportation industry, governmental bodies, campuses, and enterprises; and multi-dimensional big data from the Internet and the Internet of Things (IoT). Furthermore, the industry is becoming increasingly intelligent: risks can now be identified in advance, achieving proactive surveillance and prevention and enabling the public safety industry to shift from merely perceiving to being capable of foresight. Scenario-based intelligent video and image cloud platforms need to adapt to this.

Like the public safety industry itself, public safety management platforms have gone through multiple phases. They have evolved from standalone software, to embedded software, to distributed systems, and finally to intelligent video and image cloud platforms. Driven by scenario-based requirements and related technologies, intelligent video and image cloud platforms have been optimized to overcome the shortcomings of previous phases.

2. Cloud Storage: Managing a Massive Amount of Video and Images

In the field of public safety, the main problem video and image platforms encounter is how to store and search through a massive amount of video and images. If millions of cameras are deployed, tens of exabytes (EB) will be required to store the video and images they generate. Traditional distributed video surveillance platforms cannot solve this problem because they are independent and cannot connect or share with each other. The emergence of cloud storage technologies has solved this problem.
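As a rough sense of that scale, the back-of-the-envelope sketch below sizes such a deployment; the camera count, per-stream bitrate, and retention period are assumed values, not figures from the text.

```python
# Back-of-the-envelope sizing for an EB-scale deployment.
# Bitrate and retention values below are illustrative assumptions only.
CAMERAS = 2_000_000          # deployed cameras (assumed)
BITRATE_BPS = 4e6            # 4 Mbit/s per stream (assumed)
RETENTION_DAYS = 90          # days each recording is kept (assumed)

bytes_per_camera_day = BITRATE_BPS / 8 * 86_400        # ~43.2 GB per camera per day
total_bytes = bytes_per_camera_day * RETENTION_DAYS * CAMERAS
print(f"total: {total_bytes / 1e18:.1f} EB")           # ~7.8 EB at these values
```

Higher resolutions, higher frame rates, or longer retention quickly push this estimate into the tens of exabytes the text mentions.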

Smooth expansion

Storage nodes can be added online at any time and anywhere to meet storage capacity requirements. Data is automatically reallocated after a scale-out and restored after a scale-in.


[Figure: video technology landscape, covering video sensing, video encoding (H.264/MPEG-4 AVC; 720P, 1080P, 2K, and 4K), video storage, network transmission, and intelligent analysis]

3. AI Technology: Better Understanding of Video and Images

The development of video-related technologies has played a significant role in the advancement of the public safety industry, and the emergence of visual artificial intelligence (AI) technologies has strengthened this role. AI will be key to the future development of the public safety industry. The increase in hardware computing power and the optimization of software frameworks have driven the explosive development of AI technologies. As the field where AI is implemented most directly, the public safety industry in turn drives the development of AI technologies. The emergence of AI makes it possible for intelligent video and image cloud platforms to store and manage video and image data in a structured manner and better understand video and images.

Cloud-edge synergy

Edge computing enables real-time feedback and alleviates the pressure on network transmission bandwidth. The cloud focuses on non-real-time, long-period, and service decision-making scenarios, while edges function as terminals that collect high-value data to better support big data analysis on the cloud. The cloud delivers service rules to edges, thereby optimizing their service decision-making. Enabling the cloud and edges to collaborate is the best way to cope with the huge amounts of data generated by AI-powered systems.

Video and image resource pool design

The video and image resource pool aggregates and manages various types of video and images in the local domain and provides resource management services for external systems, for example, the upper-level domain. Various themed libraries and specialized libraries are formed through the aggregation and governance of video and images. External services include basic data services such as data queries, data subscriptions, database views, and data sharing, as well as basic application services such as full-database searches, model matching, feature analysis, relationship graphs, and convergent big data.

Multi-dimensional application of video and images

Intelligent video and image cloud platforms perform multi-dimensional data convergence and matching on managed data and external data. For example, they trigger multi-dimensional alert tasks, report alarms in real time, and analyze events from multiple dimensions based on real-time data. Another example is that they analyze and mine internal relationships among historical data to quickly identify anomalies. In addition, they use knowledge graph technology to mine relationships between people, between events, and between people and events, providing a decision-making basis for major events and improving the intelligent analysis capability of the entire system.

Efficient storage

Bandwidth aggregation enables the unlimited expansion of storage bandwidth, and block-level data striping enables fast concurrent access. Structured sequential storage allows the structured storage of unstructured video data, achieving efficient video storage and quick searches. Increased computing power enables video synopsis and data migration, so less storage space is required on the video cloud.


4. 5G Enabling Extensive Large-Capacity Connections

Intelligent video and image cloud platforms use existing network technologies to achieve heterogeneous data connections among multiple networks. However, wireless-based applications are not widely used due to limited bandwidth and poor real-time performance. With enhanced mobile broadband, platforms can utilize 5G's reliability, low latency, wide coverage, and mass connections to improve their connection capabilities. In addition, since nearly 80% of typical AI and 5G applications overlap, 5G can fully support the large-scale implementation of AI applications.

More intelligent and clearer

5G- and AI-powered 4K and 8K video surveillance solutions have become an optimal choice due to their high frame rate, ultra-high definition (UHD) technology, and wide dynamic range. AI algorithms can extract more detailed information, such as a person's physical characteristics and behavior, from video, making it possible for an intelligent video and image cloud platform to be applied in more diverse industry scenarios. With the help of augmented reality (AR), virtual reality (VR), and 5G technologies, the platform can provide immersive services for specific scenarios such as power monitoring and repair, and intelligent manufacturing.

More convenient and efficient

5G enables the formation of a quick surveillance solution comprising a command center and AI-powered cameras. Data is transmitted quickly and securely, and the cameras provide all required computing power. In addition, AirFlash 5G implements fast data dumping for train-to-ground communications, such as metro and railway communications. This solution efficiently transmits video of passenger cars, crew compartments, and loading areas to the ground, so that security risks can be detected promptly.


5. Future Trends

Powered by 5G, AI, and video and image applications, intelligent video and image cloud platforms are being applied in various industry scenarios. They have gradually evolved to become the core of video surveillance systems. In the future, intelligent video and image cloud platforms will continue to develop in the following ways:

More compatible and open

Platforms are required to be compatible with frontend devices (cameras, digital video recorders, digital video servers, network video recorders, and central video recorders); backend storage media (IP storage area network, fiber channel storage area network, network attached storage, direct attached storage, and cloud storage); various compression algorithms (MJPEG, H.263, MPEG4, H.264, H.265, and SVAC); and multiple AI algorithms (vehicle analysis, pedestrian analysis, and behavior analysis). These high compatibility requirements demand a great degree of openness. As a result, more compatible and open platforms will be the trend.

Data visualization and integration

Intelligent video and image cloud platforms converge IoT data in the form of video and images. The question of how to intuitively and efficiently display and apply IoT data will become a much-debated topic for this kind of platform.

Efficient management of unstructured data

The core service will still focus on video and image management, in terms of data storage, forwarding, searching, and application. Response speed is one of the most important indicators for measuring platform performance. How to efficiently manage unstructured data will be researched to improve the response speed, and structured data technology may be an important tool for solving this issue.

Layered software design

For innovative technologies such as AI, cloud computing, and 5G, the key is implementing industry requirements. Although video surveillance platforms in different industries differ greatly in terms of functionality, on an architectural level they are basically the same, and their differences are mainly limited to the application layer. This not only ensures the stability of the basic architecture but also enables these platforms to adapt to a huge range of industry applications. In the future, a layered software design will be introduced so that the diverse requirements of each industry can be quickly met and customer stickiness improved.


Widely used edge-cloud synergy

With the increase of edge computing power and the concurrent decrease in the cost of unit computing power, more mature video structuring technologies will make structured descriptions possible for video. Tens of billions of structured and semi-structured data records will be generated from massive amounts of accumulated video data. Edge-cloud synergy will help intelligent video and image cloud platforms cope with this explosive data growth.


AI/Chip Evolution and Development

Yang Shengkai, You Shiping

The core components of a camera are the image sensor and the image processing chip. Optimal image quality requires both light processing by the lens and image sensor and image processing by image processing chips. Chips, as the core processors, therefore play a critical role in high-definition (HD) video surveillance. To support frontend intelligence, cameras must be equipped with professional Artificial Intelligence (AI) chips.

1. Chip Classification

Chips can be classified into general-purpose chips and special-purpose chips according to their intended application.

General-purpose chips, designed to execute general tasks, are mainly central processing unit (CPU) chips based on various architectures, such as x86, Arm, MIPS, PowerPC, and RISC-V. This type of chip can run operating systems and provide abundant peripheral interfaces, meeting diversified application requirements. As market competition gets fiercer, chip architectures such as MIPS and PowerPC are becoming less prevalent and may one day disappear altogether.

Special-purpose chips are designed for a specific type of application, for example, graphics processing units (GPUs) for image processing, digital signal processors (DSPs) for digital signal processing, AI chips for AI acceleration, and field-programmable gate arrays (FPGAs) for hardware programming in specific application scenarios. This type of chip is efficient in processing specific applications but weak in processing general services.

[Figure: chip classification, with general-purpose chips (x86, Arm, MIPS, PowerPC, RISC-V) and special-purpose chips (GPU, DSP, AI chip, FPGA)]

In fact, general-purpose and special-purpose chips are starting to converge in terms of chip design. For example, general-purpose chips are integrating GPUs, DSPs, and even FPGAs to provide acceleration capabilities for specific applications, while special-purpose chips are also integrating general-purpose CPUs to provide flexibility and independent deployment capabilities.

2. Chip History

The CPU has been synonymous with Intel/x86 ever since its advent back in 1971, when Intel developed the world's first microprocessor, the 4004, a 4-bit processor capable of performing 60,000 operations per second (OPS). This epoch-making product, despite its weak performance, had far-reaching implications when it debuted. In the next few years, Intel quickly rolled out more processors such as the 4040, 8008, and 8080.

x86 chip

In 1978, Intel launched the first 16-bit processor, the i8086, which gave rise to the x86 architecture. This processor used an instruction set called the x86 instruction set, which has evolved since and is in ubiquitous use today.

In the 1980s, Intel made available the 80286, 80386, and 80486 processors, which featured over 1 million transistors and a CPU clock speed of up to 50 MHz. In the 1990s, Intel launched the Pentium/P6 series of processors, integrating technologies such as superscalar execution, multi-level caches, branch prediction, and single instruction, multiple data (SIMD) into the CPU. In 2000, Intel launched the Pentium 4 series of processors, which later gained 64-bit and virtualization support. In 2010, Intel introduced its all-new Intel Core series of processors, which adopted the tick-tock production model: every microarchitecture change (tock) was followed by a die shrink of the process technology (tick). However, as die shrinks became increasingly difficult, that model was later replaced with a three-element cycle known as Process-Architecture-Optimization (PAO).


Arm processors

Arm processors can be traced back to Cambridge Processor Unit Ltd (CPU), founded in 1978 in Cambridge, UK and renamed Acorn in 1979. The company specialized in electronic devices. In 1985, Acorn developed its first-generation 32-bit, 6 MHz processor based on the reduced instruction set computer (RISC) architecture and named it the Acorn RISC Machine (Arm).

RISC features low power consumption and high cost-effectiveness since it supports simple instructions. Therefore, RISC is perfect for mobile devices. Apple's Newton personal digital assistant (PDA) was a well-known early device that used an Arm processor.

In the 1990s, the creators of the Arm processor developed 32-bit embedded RISC processors aimed at embedded applications featuring low power consumption, low costs, and high performance. At the turn of the 21st century, Arm processors came to dominate the booming global mobile phone market, a position they continue to enjoy to this day.


AI chips

Currently, there is no recognized standard for the definition of AI chips. Generally speaking, chips for AI applications are all called AI chips. AI chips can be classified into three types according to their design principle: (1) chips for accelerating the training and inference of machine learning algorithms, especially deep neural network (DNN) algorithms; (2) brain-like chips inspired by biological brains; and (3) general-purpose AI chips that can efficiently compute various AI algorithms.


[Figure: processor evolution timelines]

x86 timeline: 8086/8088 (1978): real mode, 16-bit processor. Pentium (1993): superscalar, data/instruction caches, branch prediction, SIMD (MMX). Pentium 4 series (2000-2006): NetBurst microarchitecture, hyper-threading, 64-bit architecture, SSE2/3 with virtualization support. Core (2010): Westmere microarchitecture, 32 nm process. Core series, second to tenth generation (2011-2019): process evolving 32 nm -> 14 nm -> 10 nm; architectures Sandy Bridge/Ivy Bridge/Haswell/Broadwell/Skylake/Ice Lake...

Arm timeline: Arm7 (1993): 32-bit, Arm v4 architecture, von Neumann architecture. Arm9 (1998): Arm v5, Harvard architecture, able to run advanced embedded operating systems such as Linux. Arm11 (2002): Arm v6 architecture with SIMD instructions. Cortex-A8 (2005): Arm v7-A architecture with superscalar and multi-core support. Cortex-A9 (2007): Arm v7-A architecture with out-of-order execution and instruction set enhancements. Cortex-A15 (2010): Arm v7-A architecture, 15-24 pipeline stages, hardware virtualization. Cortex-A53/57/A72/A76/A78 (2012-2020): Arm v8-A architecture, 64- or 32-bit mode, TrustZone, dual-issue mechanism. Cortex-X1 (2020): Arm v8-A architecture, big-core design, performance-first.


The rise of deep learning has driven the growth of AI chips. Deep learning poses high requirements on computing power that cannot be met by traditional CPUs. Algorithm researchers found that GPUs are ideal for processing training and inference tasks of deep learning algorithms. In the past five years, a large number of dedicated AI chips and chip vendors have emerged with the explosive growth of deep learning. NVIDIA has seized this opportunity to develop a series of AI chips, such as Tesla and Jetson, dedicated to deep learning tasks.

Representative AI chips by vendor:

- NVIDIA: training with Tesla P100/V100/A100; inference with Tesla P4/T4 and Jetson TX2
- Google: training with TPU V1/V2/V3
- Huawei: training with Ascend 910; inference with Ascend 310
- Alibaba: inference with Hanguang 800
- Cambricon: inference with Siyuan 270/220/100

3. Chip Industry Chain

From chip design to delivery, the division of work in the chip industry chain consists of six parts.

Design software: key tools used by chip manufacturers to design chip architecture; currently, electronic design automation (EDA) software is used to design chips. Representatives: Synopsys (US), Cadence (US), Mentor Graphics (Germany).

Instruction set architecture (ISA): the cornerstone of the chip ecosystem, since it determines the OS that runs on a chip. Representatives: the IA64 instruction set (Intel), the Arm instruction set (Arm), the RISC-V open-source instruction set (RISC-V Foundation), etc.

Chip design: responsible for chip layout design, for example, key intellectual property (IP) cores of chips and complete System on Chip (SoC) designs. Representatives: Intel (US), Qualcomm (US), Samsung (Republic of Korea), NVIDIA (US), MediaTek (Taiwan of China), HiSilicon (China).

Manufacturing equipment: cutting-edge devices used to produce chips, mostly mask aligners that produce integrated circuits using the photolithography process. Representatives: ASML (Netherlands) is the top mask aligner manufacturer in the world; Shanghai Micro Electronics Equipment is the top mask aligner manufacturer in China but is well below the top players.

Wafer foundry: the process from chip layout to product manufacturing. Representatives: TSMC (Taiwan of China), Samsung (Republic of Korea), GLOBALFOUNDRIES (US), UMC (Taiwan of China), SMIC (China), etc.

Packaging and testing: generally performed in packaging and testing factories; the purpose is to package chips after chip slicing and test the electrical performance of each chip to ensure that its functions and performance counters meet requirements. Representatives: ASE Group (Taiwan of China), Amkor Technology (US), JCET Group (China), TSHT (China), etc.


4. Chip Development Trends

Development trends of general-purpose chips

Arm chips dominate the embedded system market, while x86 chips dominate the desktop and data center markets. However, the competition between the two has never ceased. Some server-level Arm chips have emerged and are competing for the data center market, although x86 chips still take the lead there.

Arm chips have weak single-core performance, so they use a large number of CPU cores for better performance. x86 chips have strong single-core performance, so they use a relatively small number of CPU cores. Because of these architectural differences, the two types of chips suit different service scenarios.

However, due to slow manufacturing process evolution, deep x86 pipelines, and difficulties in microarchitecture breakthroughs, x86 chips are likely to integrate more cores. In addition, Arm has launched the Cortex-X architecture aimed at the data center market, trying to achieve maximum performance while de-emphasizing performance per watt. In other words, Arm chips will evolve from having small cores to having large cores.

General-purpose chips are expected to develop along the following lines:

Microarchitecture evolution: Instructions per cycle (IPC) improves by roughly 10% each generation and remains an important indicator of a general-purpose chip's processing capability. Chips can integrate larger caches, more execution units, and more accurate branch prediction and task scheduling mechanisms to continuously improve IPC.

Process evolution: Currently, general-purpose chips use the 7 nm process, with 5 nm and 3 nm processes in the pipeline. Although chip miniaturization is nearing its physical limits, process evolution is still the key to improving chip performance.

Power consumption control: The increasing number of general-purpose chip cores expands the chip size. A more refined and intelligent power consumption control mechanism is required to enable large-scale commercial use of chips.

Interconnection bandwidth: Multiple chips can be interconnected to enhance single-node processing capabilities.

Development trends of AI chips

Currently, AI chips can surpass human beings in some specific tasks. However, they are far from reaching human intelligence in terms of universality and adaptability. Most AI chips are used only to accelerate specific algorithms and cannot process general tasks. In the future, general-purpose AI chips are forecast to have the following capabilities:

Programmability: AI chips can adapt to algorithm evolution and application diversity.

Dynamic architecture variability: AI chips can adapt to different algorithms to implement efficient computing.

High computing efficiency: Algorithms place ever-growing demands on computing power, and insufficient computing efficiency restricts the implementation of many algorithms.

High energy efficiency: AI chips can be applied to embedded devices thanks to a high energy efficiency ratio (EER).

Easy application development: AI chips can provide complete software stacks to facilitate AI development.


Kang Ming, Ding Fuqiang, Liu Yanyan

Algorithm Repository Technology

1. Background

As artificial intelligence (AI) technology proliferates, using intelligence to improve the efficacy of video and image capture has become a trend in the industry.

[Figure: background and requirements. The current situation features intelligent objects (examples: pedestrians, motor vehicles, and non-motorized vehicles) and algorithms dedicated to various scenarios (examples: smoke and fire detection in the warehousing industry, safety helmet detection in the construction industry, and discharge detection in the water conservancy management industry); the requirement is to construct a multi-algorithm platform supporting video and image intelligence. Algorithm repository technology supports multiple algorithms, algorithms from multiple vendors, and the coexistence of multiple versions; it provides optimal resource utilization, flexible scheduling capabilities, unified APIs for integrating service-oriented algorithms, large-scale data management, flexible orchestration for rapidly combining service-oriented algorithms, and a "survival of the fittest" mechanism for quick algorithm optimization, ensuring efficient application implementation]

2. Traditional Algorithm Repository Solutions

In the early stages of the industry, solutions were provided to support co-deploying multiple algorithms in the same system. The following figure shows the integration of multiple algorithms from different independent software vendors (ISVs): the application system integrates algorithm & application devices from various vendors.

[Figure: an application system integrating vehicle algorithm & application devices (vendors A and B) and other algorithm & application devices (vendors C and D)]


Although the preceding approach is the most commonly used one among traditional video surveillance vendors, it is also the most inefficient and wasteful for users.

This solution:

- In principle, can integrate the service capabilities of multiple algorithms.
- Supports separate algorithm deployment on devices.
- Allows the application system to integrate algorithm & application devices from various vendors.

In general, this solution is a stack of independent algorithm services with no overall algorithm repository architecture or technical support. Its weaknesses are as follows:

- Algorithms are independently deployed, and hardware resources cannot be shared.
- Service data is stored on devices and cannot be shared.
- The application system needs to adapt to the open APIs of various devices, which requires a heavy integration workload.
- Algorithm devices are independent of each other. To associate services between algorithm devices, the application system needs to perform comprehensive processing, which poses technical challenges.
- Systems need to be maintained separately, and the hardware and software differences between systems increase maintenance difficulty.
- Algorithm evolution requires support from algorithm vendors.

3. Current Algorithm Repository Solutions in the Industry

As AI and cloud-based big data technologies mature, AI algorithms demand increasing computing power, new algorithms and algorithm types are released at a faster rate, and more and more scenario-based algorithm applications have come into being. Building a more efficient and sustainable resource management system that complies with this technology evolution trend has become a common requirement. Traditional video surveillance vendors and emerging companies specializing in AI and professional cloud-based big data have provided technical solutions to meet this requirement.

[Figure: logical architecture of the algorithm repository. Algorithm management covers algorithm configuration, scheduling, running monitoring, evaluation, and orchestration. A service capability openness layer exposes person, vehicle, behavior analysis, holographic profile, and other services. The algorithm platform hosts analysis and search/clustering algorithms from multiple vendors and versions, covering person analysis, vehicle analysis, structured analysis, behavior analysis, scenario-specific analysis, person search by feature, vehicle search by feature, structured-data-based search, feature clustering, and video synopsis]

The main technical solutions are the algorithm SDK integration solution, the algorithm container image integration solution, and the algorithm app service integration solution.


Algorithm SDK integration solution

Technically speaking, most algorithm vendors (such as Hikvision, Dahua, and Uniview) have launched their own algorithm SDK integration solutions. In these solutions, most self-developed or purchased algorithms are integrated via SDKs, but multi-algorithm capabilities are not supported. To date, only Huawei's algorithm repository has achieved multi-vendor SDK integration: nearly 50 algorithms have been integrated, providing users with a large-scale SDK interconnection experience.

[Figure: logical architecture of Huawei's multi-algorithm SDK integration solution. ISV applications access the vPaaS algorithm repository through eSDK/RESTful APIs and an API gateway (APIGW). Video/image analysis and search services load vendor algorithm plug-ins (vendors A through J) in containers, in SDK plug-in mode or API mode, on shared IaaS hardware (CPUs and AI accelerator cards). SDK algorithm repositories cover vehicle, person, data structuring, and long-tail algorithms, serving vehicle, person, pedestrian/vehicle, and long-tail applications through unified capability openness. Data is managed in a data lake of video/images, features, and structured data, with cameras and networked platforms providing video/image, metadata, and platform connections]
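Hypothetically, the plug-in contract that each vendor implements in such an SDK mode might look like the sketch below; the class and method names are illustrative, not Huawei's actual SDK.

```python
from abc import ABC, abstractmethod

# Hypothetical plug-in contract for an SDK-integrated algorithm repository:
# every vendor ships a plug-in with the same lifecycle so the platform can
# host many of them side by side. Names are illustrative only.
class AlgorithmPlugin(ABC):
    @abstractmethod
    def init(self, device: str) -> None:
        """Claim accelerator resources (e.g., a slot on an AI accelerator card)."""

    @abstractmethod
    def analyze(self, frame: bytes) -> dict:
        """Return structured results (targets, features, attributes)."""

    @abstractmethod
    def release(self) -> None:
        """Free resources so another plug-in or version can be swapped in."""

class VendorAVehiclePlugin(AlgorithmPlugin):
    def init(self, device: str) -> None: ...
    def analyze(self, frame: bytes) -> dict:
        return {"targets": [], "vendor": "A", "type": "vehicle"}
    def release(self) -> None: ...
```

A uniform contract like this is what lets the repository schedule, upgrade, and evaluate plug-ins from many vendors without per-vendor integration work.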


Algorithm container image integration solution

The following figure shows the logical architecture of the algorithm container image integration solution. In this solution, algorithms are packaged into container images. The algorithms in the images have basic algorithm service capabilities and provide services for external systems through APIs. Upper-layer applications manage a combination of algorithm images to form an overall service. This solution is used by Alibaba and Huawei (including HUAWEI CLOUD EI). It is applicable to the integration of analysis and recognition algorithms, but not to the integration of search or match algorithms.

[Figure: logical architecture of the algorithm container image integration solution. Vendor X's application invokes intelligent video/image recognition and intelligent video/image search algorithm containers through personalized APIs; the algorithm repository and a data lake (video, images, and structured data) run on sharable cloud platform hardware]
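A minimal sketch of how an upper-layer application might chain two such algorithm containers over HTTP follows; the endpoints and payload shapes are hypothetical, since the article does not define the container APIs.

```python
import requests

# Combining two algorithm containers behind HTTP APIs into one service.
# The URLs and payloads below are hypothetical, for illustration only.
RECOGNITION_API = "http://recognizer:8080/v1/analyze"
SEARCH_API = "http://searcher:8080/v1/search"

def analyze_and_search(image_bytes: bytes) -> dict:
    # Call the recognition container to get structured features from the image.
    features = requests.post(RECOGNITION_API, data=image_bytes, timeout=10).json()
    # Feed those features to the search container to find similar records.
    return requests.post(SEARCH_API, json=features, timeout=10).json()
```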

Algorithm app service integration solution

The algorithm app service integration solution is essentially similar to traditional platform or device connection solutions. It can be classified as proactive (initiator: platform) or passive (initiator: algorithm service); both types require connection to the APIs designed for algorithm services or platforms. The following figure shows the logical architecture of the algorithm app service integration solution.

[Figure: logical architecture of the algorithm app service integration solution. An ISV application (examples: Ropeok, Hikvision, and Kedacom) or platform either invokes algorithm services at the application layer or adapts to algorithm service APIs; the algorithm services run in containers/VMs behind a unified capability layer]

4. Benefits to Intelligent Video Surveillance

Algorithm repository technology can fully utilize the on-demand allocation and elastic capacity expansion of cloud-based resources to meet changing resource requirements in actual use. Services derived from algorithm repository technology, such as traffic-based scheduling and inter-domain algorithm collaboration, bring high practical value to users.


Traffic-based scheduling

Traffic-based scheduling improves hardware resource usage, implements on-demand resource scheduling between multiple algorithms in a system, and supports resource sharing between services, meeting the requirements for scaling out/in specified service resources in a short period of time.

[Figure: GPU resource pooling with elastic, dynamic algorithm scheduling. The traffic difference between day and night is substantial, and idle resources are not fully used at night. Tasks are queued by priority: urgent, important, and common tasks (such as fast analysis of recordings, analysis of video from sites for key organizations and key public places, virtual checkpoints in non-constraint scenarios, access control systems in communities, and video from common sites) and idle-resource-reuse tasks (such as non-critical site analysis, algorithm updates and data cleansing, and past video analysis). A resource scheduling policy center monitors status and traffic and reallocates GPUs between active and standby algorithms across three scenarios: off-peak hours of the active algorithm, peak hours (for example, morning and evening rush hours with traffic aggregation at checkpoints), and burst traffic during emergencies or major events]

Inter-domain algorithm collaboration

The upper-level domain is responsible for building the overall algorithm repository. Lower-level domains download desired algorithms from the upper-level domain for specific services, fully utilizing the resources of the upper- and lower-level domains. In addition, traffic-based scheduling can be used to pool resources for the entire area, implementing one cloud for the entire area, or even the whole country, in terms of hardware resources and service capabilities.

[Figure: inter-domain algorithm collaboration. Lower-level domains match and download algorithms (for example, vehicle and synopsis algorithms) from the upper-level domain, with cloud-edge synergy at each domain's service entry for analysis and search tasks]
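A minimal sketch of priority-based GPU scheduling in that spirit is shown below, assuming the four priority classes from the figure; it is an illustration of the idea, not Huawei's scheduler.

```python
import heapq
from dataclasses import dataclass, field

# Priority classes taken from the scheduling description above; the scheduler
# itself is a hypothetical sketch, not an actual implementation.
URGENT, IMPORTANT, COMMON, IDLE_REUSE = 0, 1, 2, 3

@dataclass(order=True)
class Task:
    priority: int                       # lower value = scheduled first
    name: str = field(compare=False)
    gpus_needed: int = field(compare=False)

class GpuPool:
    def __init__(self, total_gpus: int):
        self.free = total_gpus
        self.queue: list[Task] = []

    def submit(self, task: Task) -> None:
        heapq.heappush(self.queue, task)

    def release(self, gpus: int) -> None:
        """Called when a task finishes (or an idle-reuse task is preempted)."""
        self.free += gpus

    def dispatch(self) -> None:
        # Serve the highest-priority queued task while it fits the free GPUs.
        while self.queue and self.queue[0].gpus_needed <= self.free:
            task = heapq.heappop(self.queue)
            self.free -= task.gpus_needed
            print(f"running {task.name} on {task.gpus_needed} GPU(s)")

pool = GpuPool(total_gpus=4)
pool.submit(Task(IDLE_REUSE, "past-video analysis", 4))    # night-time backfill
pool.submit(Task(URGENT, "checkpoint burst analysis", 2))  # daytime burst
pool.dispatch()   # the urgent task runs first; the backfill waits for free GPUs
```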


5. Challenges and Trends

Challenges

Coupling

In the algorithm SDK integration solution, the algorithm repository framework and the algorithms share the same open-source software, so an algorithm may fail to run properly due to a mismatch of open-source software versions; the algorithm vendor and the algorithm repository platform vendor need to work together to solve the problem. If the algorithm SDK is instead deployed as an independent process, performance deteriorates due to inter-process calls, especially when processing large amounts of data.

Business mode

The traditional method of using independent algorithm devices is prone to resource siloing. Algorithm repositories, based on the cloud-based architecture, can solve the problem of resource siloing, but they require algorithm vendors to separate algorithms from services. This has the downside of reducing device sales and affecting device solutions. In this scenario, algorithm vendors and platform vendors need to work together to find a new business cooperation mode to achieve a win-win outcome.

Standards

Currently, industry groups have been organizing the discussion and formulation of technical standards for algorithm repositories. However, due to the business model issues above and vendors' own interests, vendors in the industry are not active in formulating technical standards (especially SDK integration standards) for algorithm repositories. Vendors that already use algorithm repositories need to develop more high-level services and solutions based on them to attract users to algorithm repository technology. Users will in turn drive major vendors to increase investment in and support for algorithm repository technology.

Trends

Driven by the requirements for hierarchical user construction and the separate management of data, algorithms, and computing power, the traditional mode of one vendor leading one project no longer exists. This drives vendors to seek new cooperation modes and therefore provides great opportunities for promoting and popularizing algorithm repository technology.

Once industry standards are released for algorithm repository technology, industry vendors can be encouraged to form a standard algorithm repository technical architecture based on the cloud. This will then encourage the development of more useful industry applications and allow for the realization of an all-encompassing cloud where multiple algorithms coexist and various upper-layer applications can be developed.


Cheng Min, Huang Jinxin, Shi Changshou, Fang Guangxiang, Jia Lin


AI/SuperColor Technology


1. High-Sensitivity Sensor Technology

SuperColor technology utilizes innovative sensors and algorithm-based image processing so cameras can deliver sharp, full-color images at night, improving recognition accuracy in extreme darkness. However, the supplementary lighting required for night-time capture causes severe light pollution, especially at traffic checkpoints, which affects driver safety and disturbs nearby residents. In recent years, various manufacturers have endeavored to improve night image quality while reducing light pollution, but have so far made little progress. Therefore, this issue remains an important area of research within the camera industry.

Traditional image sensors adopt the Bayer filter, which features a 4 x 4 array consisting of four 2 x 2 grids. Each grid contains one red pixel (R), one blue pixel (B), and two green pixels (G). The energy of the red, green, and blue spectra passes through pixels of the corresponding color, respectively.

The following figure shows the light transmittance of a sensor with a Bayer filter and that of a high-sensitivity sensor.

[Figure: light transmittance is about 1/3 for a sensor with the Bayer filter vs. about 2/3 for a high-sensitivity sensor; primary color combinations: red + green = yellow, red + blue = purple, blue + green = cyan, red + green + blue = white]

High-sensitivity sensors adopt color filters with higher light transmittance to enhance the light sensitivity of each pixel. When equipped with these sensors, cameras can deliver images with a higher signal-to-noise ratio (SNR) in low light conditions, improving image-based recognition.

RYYB sensor: Replaces the green pixels with yellow pixels that allow both the green spectrum and the red spectrum to pass through, improving light transmittance by 40% compared to a sensor with a Bayer filter.

RCCB sensor: Replaces the green pixels with clear pixels that allow the full spectrum to pass through, improving light transmittance by 80% compared to a sensor with a Bayer filter.

RGB-IR sensor: Adds pixels sensitive to the infrared (IR) spectrum to the traditional Bayer array of red, green, and blue pixels. In low light conditions, the IR spectrum can supplement light, which enables the sensor to deliver images with a high SNR.

When the sensor size, pixel size, and processing method of the high-sensitivity sensor are the same as those of the Bayer sensor, it can deliver higher-quality images at night, as shown in the following figures.

[Figures: imaging effect of the Bayer sensor vs. the high-sensitivity sensor at 0.001 lux]

[Figure: pixel layouts of the conventional Bayer array (rows of R G R G / G B G B) and the high-sensitivity RYYB, RCCB, and RGB-IR patterns]


2. DNN ISP-based Noise Reduction

Noise is a kind of image distortion that occurs during the signal acquisition process. There are several types of noise, including photon shot noise, dark-current shot noise, thermal noise, fixed-pattern noise, and readout noise. Generally, to obtain high-quality images, we need to eliminate noise without damaging information integrity. In short, noise reduction removes valueless information from images and improves encoding efficiency. Conventional noise reduction algorithms are classified into two categories: spatial and temporal.

Spatial noise reduction, also called single-frame noise reduction, reduces noise by processing frames individually. The non-local means (NL-means) algorithm and the block-matching and 3D filtering (BM3D) algorithm are common spatial noise reduction methods. The NL-means algorithm takes a target pixel, searches the entire image for similar pixels, weights them by similarity, and then takes the mean of these pixels to optimize the image. The BM3D algorithm searches for pixels similar to the target pixel, transfers them to the frequency domain, performs filtering and thresholding, and then transfers them back to the spatial domain.


[Figure: conventional spatial noise reduction flows. NL-means: search for similar pixels -> weighted averaging -> denoised pixels. BM3D: search for similar pixels -> integrate the pixels into a 3D matrix -> 3D linear transformation -> filtering and thresholding -> 3D inverse transform -> denoised pixels]
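For a concrete feel of the NL-means step, the sketch below uses OpenCV's built-in NL-means denoiser; the file names and filter strengths are illustrative values.

```python
import cv2

# Conventional spatial (single-frame) noise reduction using OpenCV's
# NL-means implementation; input/output paths are hypothetical.
noisy = cv2.imread("night_frame.png")
denoised = cv2.fastNlMeansDenoisingColored(
    noisy,
    None,
    h=10,                  # filter strength for the luminance component
    hColor=10,             # filter strength for the color components
    templateWindowSize=7,  # patch size used to compare pixel similarity
    searchWindowSize=21,   # neighborhood searched for similar patches
)
cv2.imwrite("night_frame_denoised.png", denoised)
```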



Temporal noise reduction, also known as multi-frame noise reduction, introduces several adjacent frames (temporal domain information) and performs weighted averaging on similar pixels in the spatial domain to reduce noise. However, if there are moving objects in two consecutive frames, an error occurs when two pixels belonging to different objects are filtered, resulting in motion blur. Therefore, the objective of temporal noise reduction is to accurately detect the motion strength and perform weighted averaging on the results of temporal filtering and spatial filtering. When the motion strength is considerable, the temporal-domain weight coefficient decreases and the spatial-domain weight coefficient increases. When the motion strength is minor, the temporal-domain weight coefficient increases and the spatial-domain weight coefficient decreases.

Traditional noise reduction technologies are unable to retain image details or edge information. Additionally, they cannot accurately evaluate motion strength, resulting in motion blur when obvious noise is removed. Against this backdrop, the deep neural network (DNN) image signal processing (ISP)-based noise reduction technology has been proposed. Based on a brand-new algorithm architecture, it is better at distinguishing between motion regions and non-motion regions, and between noise and image details, effectively resolving the problems that traditional noise reduction technologies are unable to overcome.

The DNN ISP-based noise reduction algorithm consists of a preprocessing algorithm, deep learning network, and post-processing algorithm. The preprocessing algorithm transforms data into a format suitable for network input and sends images to the deep learning network, which is formed of convolutional layers. Then, the network reduces noise on these images according to a pre-trained weight, and sends them to the post-processing unit where they are transformed into denoised images.

The DNN ISP-based noise reduction technology trains the network using data that includes real noise, which enables it to reduce 3 dB to 6 dB more noise than a conventional algorithm and facilitate more accurate image recognition.
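A DnCNN-style residual network is one common way to realize such a denoiser; the sketch below is an illustrative stand-in for the network described above, not Huawei's model.

```python
import torch
import torch.nn as nn

# A minimal DnCNN-style residual denoiser: stacked convolutional layers
# predict the noise, which is subtracted from the input frame. Purely
# illustrative; layer counts and widths are assumptions.
class DenoiseNet(nn.Module):
    def __init__(self, channels: int = 3, features: int = 64, depth: int = 8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x - self.body(x)   # residual learning: subtract predicted noise

frame = torch.rand(1, 3, 256, 256)   # a preprocessed frame batch
clean = DenoiseNet()(frame)          # post-processing would follow here
```

Training such a network on data containing real sensor noise, as the text describes, is what lets it separate noise from fine detail better than hand-tuned filters.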

[Figures: effect of the conventional noise reduction algorithm vs. the DNN ISP-based noise reduction algorithm]

[Figure: conventional temporal noise reduction. Temporal-domain filtering and spatial-domain filtering results are weighted according to the detected motion area to produce the final filtering result]

[Figure: DNN ISP-based noise reduction. Video frame -> preprocessing -> deep learning network -> post-processing -> denoised video frame]

3. Formulation of Glare Measurement Standard

In addition to sensor and image processing technologies, night-time image capture is also affected by lighting conditions. Illuminators can provide improved lighting, but this causes uncomfortable glare in scenarios such as urban roads and alleys, and has led to numerous complaints from nearby residents. Therefore, the industry desperately needs a standard by which glare can be measured and evaluated.

Definition of glare

Glare refers to visual conditions in which there is excessive contrast or an inappropriate distribution of light sources that disturbs the observers or limits their ability to distinguish details and objects.

Measurement of glare

Currently, in most scenarios, glare is measured using the unified glare rating (UGR) proposed by the International Commission on Illumination (CIE), and this method has been widely applied in the field of lighting:

UGR = 8 x log10[(0.25 / Lb) x Σ (Ls^2 x w / P^2)]

In the formula above:

Lb: background luminance (unit: cd/m2)

Ls: luminance of each light source from the perspective of the observer (unit: cd/m2)

w: solid angle from the light source to the eyes of the observer

P: position index of each light source

n: number of light sources (positive integer); the sum runs over all n sources

Glare levels by UGR index: 10, imperceptible; 16, acceptable; 19, borderline; 22, uncomfortable; 28, disabling.

However, when it comes to video surveillance, light compensation is generally used in scenarios where illumination is extremely poor, and the UGR cannot be directly applied. Instead, a threshold increment formula is introduced as another measurement of glare in the video surveillance industry. Threshold increment is the measure of disability glare expressed as the percentage increase in contrast required between an object and its background for it to be seen equally well with a source of glare present:

TI = 65 x Lv / Lav^χ

In the formula above:

Lv: luminance received by the observer (unit: cd/m2)

Lav: average background luminance (unit: cd/m2)

χ: correction index (0 < χ < 1)

From this formula, we can infer that the threshold increment can be reduced by lowering Lv and increasing Lav simultaneously.

Currently, there is no universal method for measuring glare within the video surveillance industry, and the values of Lv and Lav can only be obtained through complex theoretical calculations and optical design. The threshold increment formula provides a quantitative way to measure disability glare in the industry and helps formulate appropriate standards. As a result, vendors will be able to produce illuminators that meet the glare index requirements.
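Both measures can be computed directly from the formulas above, as the sketch below shows; note that the 65 multiplier and the default exponent of 0.8 follow the common road-lighting form of the threshold increment and are assumptions here.

```python
import math

# Sketches of the two glare measures above. The 65 multiplier and the
# default chi of 0.8 follow the common road-lighting form of the threshold
# increment; they are assumptions, not values stated in the text.
def ugr(lb: float, sources: list[tuple[float, float, float]]) -> float:
    """Unified glare rating; sources is a list of (Ls, w, P) per light source."""
    return 8 * math.log10(0.25 / lb * sum(ls**2 * w / p**2 for ls, w, p in sources))

def threshold_increment(lv: float, lav: float, chi: float = 0.8) -> float:
    """Disability glare (%): lowering Lv or raising Lav reduces the value."""
    return 65 * lv / lav**chi

print(f"UGR = {ugr(lb=50, sources=[(5000, 0.1, 1.0)]):.1f}")   # example values
print(f"TI(%) = {threshold_increment(lv=0.4, lav=1.0):.1f}")
```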


4. Intelligent Light Compensation

Currently, the industry primarily adopts IR flash, LED strobe, and dual-spectrum fusion technologies to reduce the glare of illuminators at night, but color cast and LED light pollution remain significant. Therefore, intelligent light compensation technology has been introduced to enable cameras to capture sharp, full-color images at night without producing severe light pollution.

LED illuminators at traffic checkpoints are a main source of light pollution. High-speed vehicles can quickly pass the snapshot area, and the snapshot distance ranges from 20 m to 30 m. In this case, LED illuminators produce little light pollution. However, traditional omnidirectional light compensation technology produces large light pollution areas and severe glare at a distance of over 50 m, affecting drivers on the target and surrounding lanes. Additionally, the illuminators feature scattered luminous energy, resulting in poor light compensation efficiency.

Intelligent light compensation enables cameras to compensate a specific amount of light for targets and regions of interest (ROIs), performing weak light compensation for video streaming (light intensity: 0-20 lux) and strong light compensation for image capture (light intensity: 50-100 lux). This means gas-discharge flash lights (light intensity: 20,000 lux) are no longer needed and light pollution is effectively reduced. Additionally, the light cup and architecture can be altered to effectively cut off stray light and glare, thereby reducing light pollution and improving light compensation efficiency.

This targeted, intelligent light compensation technology helps effectively cut out stray lights and eliminates the need for gas-discharge flash lights. It can integrate a high-sensitivity sensor and DNN ISP-based noise reduction technologies to enable cameras to capture sharp, full-color images in low light conditions without producing light pollution.

[Figures: severe color cast under traditional light compensation vs. sharp, full-color images under intelligent light compensation]

[Figure: traditional vs. intelligent light compensation. Traditional omnidirectional LED compensation produces a large light pollution area with scattered luminous energy; stray light has limited impact on the driver in the target lane but severely affects drivers and passengers beyond 50 m. The environment-friendly smart LED illuminator performs weak light compensation for video (0-20 lux) and strong light compensation for snapshots (50-100 lux), eliminating the need for gas-discharge flash lights; it delivers excellent compensation at 18-40 m for better snapshots and cuts off light beyond 50 m to ensure safe driving]

Gong Junhui, Yue Boxuan, Xu Zhen

Video Codec Technology


It’s widely acknowledged that we live in the mobile Internet era, in which streaming media, and more specifically, diverse video formats – from humorous short video clips on social networking apps, to interactive live streams, and ubiquitous video surveillance, have reinvented daily life. However, the sheer amount of video data generated after image collection can be enormous, which places great strain on video transmission and backend storage. Fortunately, the arrival of video codec technology has made video transmission and storage easier and more efficient than ever.

When a camera is used to shoot a 1-minute 4K video (3840 x 2160 pixels), the uncompressed video data volume is about 17.38 GB. In this scenario:


If the bandwidth is 100 Mbit/s, it takes about 24 minutes to transmit a 1-minute video.

At full write speed, a 10 TB hard disk can store only about 9 hours of such video.
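These figures can be sanity-checked with a few lines of arithmetic; the sketch below assumes 8-bit YUV 4:2:0 sampling at about 24 fps, since the article does not state the exact basis of its 17.38 GB.

```python
# Sanity-checking the raw-video figures above (assumption: 8-bit YUV 4:2:0,
# i.e., 1.5 bytes per pixel, at ~24 fps; the article's exact basis is unstated).
W, H, BYTES_PER_PIXEL, FPS, SECONDS = 3840, 2160, 1.5, 24, 60

raw_bytes = W * H * BYTES_PER_PIXEL * FPS * SECONDS
print(f"raw 1-minute clip: {raw_bytes / 1e9:.1f} GB")                   # ~17.9 GB
print(f"transfer at 100 Mbit/s: {raw_bytes * 8 / 100e6 / 60:.0f} min")  # ~24 min
print(f"hours on a 10 TB disk: {10e12 / raw_bytes / 60:.1f} h")         # ~9.3 h
```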

1. Foundational Concepts

A video feed is a sequence of images (called frames) captured and eventually displayed at a given frequency. The frames are classified into three types: I-frame, P-frame, and B-frame.

[Figure: a typical frame sequence of I-frames, P-frames, and B-frames]

I-frame, also called the key frame, is a single frame of digital content that the compressor examines independently of the frames that precede and follow it, and it stores all of the data needed to display that frame. I-frames therefore contain the most bits and take up the most space on the storage medium. After proper intra-frame compression, decoding requires only the current frame's data. An I-frame can be used as the reference frame for subsequent frames, or as a standalone image.

P-frame (predicted picture) holds only the changes in the image from the previous frame. During decoding, the previously buffered video frames are overlaid with the difference defined by the current frame to produce the final video images.

B-frame is a bi-directional predictive frame that records the difference between the current frame and its preceding and succeeding frames. During decoding, the preceding and succeeding frames are overlaid with the current frame to produce the final video images. Because it depends on subsequent frames, this frame type is not suitable for real-time transmission, for example, in video conferences.


2. Main Process

Video codec technology involves compressing video in a step-by-step manner. Current video codec solutions are mainly in the hybrid encoding format (predictive coding + transform coding), and contain the following main processes: prediction, transformation, quantization, and entropy encoding.

[Figure: hybrid encoding pipeline. Original video -> prediction (intra-frame prediction eliminates spatial redundancy; inter-frame prediction eliminates temporal redundancy) -> prediction residuals -> transformation (time domain to frequency domain) -> transformation coefficients -> quantization -> quantized data -> entropy encoding -> encoded video for channel transmission; the decoder reverses the process through entropy decoding, dequantization, anti-transformation, and prediction to produce the decoded video]

Source: Overview of the High Efficiency Video Coding (HEVC) Standard

Prediction: uses intra-frame and inter-frame prediction technologies to eliminate spatial and temporal redundancy in video for compression.

Transformation: converts data from the time domain to the frequency domain to eliminate the correlation between adjacent data, achieving spatial redundancy elimination.

Quantization: eliminates information that is imperceptible to human eyes to reduce the amount of encoded data and improve the compression ratio, achieving psychovisual redundancy elimination.

Entropy encoding: reduces data redundancy based on the probability characteristics of the data to be encoded.
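A toy walk-through of the transformation and quantization stages on a single 8 x 8 block (JPEG-style, purely illustrative of the principle) follows.

```python
import numpy as np
from scipy.fftpack import dct, idct

# Toy demonstration of the transformation -> quantization stages on one
# 8 x 8 block (JPEG-style; real codecs use more elaborate schemes).
rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float) - 128    # centered pixel values

def dct2(b):
    return dct(dct(b.T, norm="ortho").T, norm="ortho")      # 2-D forward DCT

def idct2(b):
    return idct(idct(b.T, norm="ortho").T, norm="ortho")    # 2-D inverse DCT

coeffs = dct2(block)                      # space domain -> frequency domain
step = 16                                 # one flat quantization step, for simplicity
quantized = np.round(coeffs / step)       # lossy stage: small coefficients become 0
reconstructed = idct2(quantized * step)   # decoder side: dequantize + anti-transform
print(int(np.count_nonzero(quantized)), "of 64 coefficients survive quantization")
```

The zeroed coefficients are exactly what entropy encoding then compresses so efficiently.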


3. History of Video Codec Standards

During the entire video codec process, including prediction, transformation, quantization, and entropy encoding, numerous methods can be used within each sub-process to perform phased compression. For example, both discrete cosine transform (DCT) algorithms and continuous wavelet transform (CWT) can facilitate transform coding and thereby contribute to full compression.

As early as the early 1990s, the ITU Telecommunication Standardization Sector (ITU-T) formulated the first video codec standard, H.261, to ensure the interoperability of products from different vendors. With the flourishing of the video industry, video codec standards have continued to evolve.

The following table provides a comparative overview of the three common video codec standards.

In the video surveillance field, H.264 is the dominant video codec standard in use. Due to their advanced technology and standout performance, H.265 chips are expected to eventually replace H.264 chips, and become the mainstream technology within the industry.

[Figure: PSNR (dB) vs. bit rate (kbit/s) for successive codec generations: JPEG (1990), H.261 (1991), H.262/MPEG-2 (1995), H.264/MPEG-4 AVC (2003), and H.265/MPEG-HEVC (2013); each generation achieves roughly a 50% bit-rate reduction at equal quality. Source: Fraunhofer Heinrich Hertz Institute, Versatile Video Coding (VVC)]

H.264 (AVC). Background: proposed by ITU-T and ISO/IEC in 2003. Bit-rate usage: 100%. Complexity: 100%. Main technical methods: motion compensation for multiple reference frames, adaptive 4 x 4 and 8 x 8 integer transforms, adaptive frame encoding, etc.

H.265 (HEVC). Background: proposed by ITU-T VCEG in 2013. Bit-rate usage: 50%. Complexity: 300%. Main technical methods: multiple transform sizes (from 4 x 4 to 32 x 32), multiple intra-frame prediction modes, tree-structured prediction units, etc.

H.265+ (Video Surveillance). Background: launched by video surveillance vendors for surveillance scenarios based on H.265 (HEVC). Bit-rate usage: 35%. Complexity: 300%. Main technical methods: dynamic GOP, dynamic FPS, dynamic ROI, long-term reference frames and background frames, etc.


4. ROI Encoding Technology

In video surveillance scenarios, certain areas under observation, such as the sky and grassy areas, can be neglected. Encoding and transmitting data from the entire surveillance area can unnecessarily strain network bandwidth and storage resources.

With the H.264/H.265 standards, region of interest (ROI) technology optimizes encoding for video surveillance scenarios. ROIs are extracted using an ROI extraction algorithm; during encoding and quantization, ROIs are encoded at high quality while non-ROIs are heavily compressed. This lessens network bandwidth and storage space demands without compromising overall image quality.

ROI extraction is the basis of ROI coding technology, and directly determines the final effects of video coding. ROI extraction algorithms can be classified into background modeling algorithms and deep learning-based object detection algorithms.

These days, the Gaussian mixture model (GMM) and the visual background extractor (ViBe) are the most prevalent background modeling algorithms for advanced foreground object segmentation. A background modeling algorithm runs in the following three steps:

1. Background model setup: establish a mathematical model by extracting background features from a video sequence.
2. Foreground detection: compare the current image with the background model to identify and locate moving objects.
3. Model update: update the background model at a specific rate to adapt to changes in background objects, such as illumination, rain, and fog.
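As a concrete illustration of these three steps, here is a minimal sketch using OpenCV's Gaussian-mixture (MOG2) background subtractor; ViBe is not bundled with stock OpenCV, the video path is a placeholder, and the thresholds are arbitrary choices.

```python
import cv2

# Gaussian-mixture background model; apply() performs foreground detection
# and also updates the model at the configured learning rate.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("surveillance.mp4")   # placeholder input path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)           # foreground mask + model update
    # Drop shadow pixels (marked 127 by MOG2), keep confident foreground.
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
    # Extract moving-object ROIs as bounding boxes from the foreground mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    rois = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
    # 'rois' would be handed to the encoder to keep these regions at high quality.
cap.release()
```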

[Figure: Background modeling workflow. Video input initializes the background model; foreground detection (detect) runs against the model and outputs object detection results, while the background model is continuously updated.]

The foreground indicates the extracted ROIs and the background indicates the non-ROIs.

[Figure: ROI coding pipeline. Live video input passes through ROI extraction and in-depth coding (normal encoding of the foreground, compression of the background, and real-time re-coding of foreground and background) to produce the video output.]


In deep learning-based object detection, a video image is fed into a deep neural network, for example a convolutional neural network (CNN or ConvNet). By learning the internal rules of a large amount of sample data, the network acquires analysis capabilities that can be used to classify all objects in an image and predict their locations. Common deep learning-based object detection algorithms include the region-based convolutional neural network (R-CNN), you only look once (YOLO), and single shot detector (SSD) algorithms.

The benefits and drawbacks of conventional background modeling algorithms and deep learning-based object detection algorithms are as follows:

ROI extraction algorithm | Benefits | Drawbacks
Background modeling algorithm | Foreground tracking is comparatively accurate. | The system is sensitive only to movement, so it extracts moving objects such as animals and waving leaves from video images rather than the specific objects, such as motor vehicles, non-motorized vehicles, and individuals, that video surveillance services are interested in.
Deep learning-based object detection algorithm | Target regions are accurately recognized. | ROIs are extracted as rectangles, which contain many redundant areas.

To address these drawbacks, some video surveillance vendors have enhanced their ROI extraction algorithms. The improved algorithms take into account both moving objects and the specific objects that are the target of video surveillance services, deeply filter out interference from animals, shaking greenery, and other elements in the background, and refine the ROI, within a limited byte budget, from rectangular cutouts to pixel-level segmentation with delicate edge extraction, for the following purposes:

To save storage space and transmission bandwidth resources.

To ensure image quality via an optimal peak signal-to-noise ratio (PSNR).

To maintain high-precision video surveillance services.
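For illustration, the following hedged sketch uses a pretrained torchvision Faster R-CNN model (one member of the R-CNN family mentioned above) to extract rectangular ROIs for surveillance-relevant classes. The image path, the 0.5 confidence threshold, and the choice of model are assumptions, not values from the text; a recent torchvision (0.13+) is assumed for the weights API.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# COCO class IDs: 1 person, 2 bicycle, 3 car, 4 motorcycle, 6 bus, 8 truck.
TARGET_CLASSES = {1, 2, 3, 4, 6, 8}

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

frame = convert_image_dtype(read_image("frame.jpg"), torch.float)  # placeholder path
with torch.no_grad():
    pred = model([frame])[0]

# Keep confident detections of target classes as ROIs; all else stays background.
rois = [box.tolist()
        for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"])
        if label.item() in TARGET_CLASSES and score.item() > 0.5]
print(rois)   # [x1, y1, x2, y2] rectangles for the ROI encoder
```

Note how this sketch exhibits exactly the rectangle drawback from the table: each ROI is a bounding box, which is why refined pixel-level segmentation is the enhancement direction described above.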

Gong Junhui, Yue Boxuan, Xu Zhen


Storage EC Technology

1. Introduction

Redundant array of independent disks (RAID) is a well-known basic disk array technology used in storage systems. Nowadays, there are multiple RAID levels, each of which provides a different kind of data protection.

RAID 5 stores data and parity information on different disks. If one disk is damaged, RAID 5 automatically uses the remaining parity information to rebuild the corrupted data once the disk has been replaced, ensuring data integrity. However, traditional RAID 5 technology has some limitations:

1. It tolerates only disk faults, and each RAID group can withstand only one disk failure at a time.
2. At least one global hot spare disk needs to be prepared for each node.
3. An independent RAID controller card needs to be configured.

Storage erasure coding (EC) is a method of data protection in which the original data and parity data are stored on different nodes, effectively breaking through the limitations of traditional RAID 5 technology:

1. Data is protected as long as the number of failed disks does not exceed the limit.
2. No independent hot spare disks are required, and data can be read from and written to all hard disks.
3. No additional hardware is required.

The protection level of EC technology is represented as N + M, where N indicates the number of data fragments and M indicates the number of parity fragments. Take the 4 + 2 level as an example. When receiving data, the system:

1. Divides the data into four data fragments.
2. Calculates two parity fragments based on the EC algorithm.
3. Writes the six (4 + 2) fragments to different nodes.

[Figure: RAID 5 striping. Five data stripes (A to E) are distributed across five hard disks, with each stripe's parity block (AP, BP, CP, DP, EP) rotated onto a different disk.]

[Figure: Storage EC write process at the 4 + 2 level. (1) Incoming data is divided into four data fragments; (2) two parity fragments are calculated using the EC algorithm; (3) the six fragments are written to disks on six different nodes (node 1 to node 6).]
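The 4 + 2 flow can be illustrated with a minimal Python sketch: data is split into four fragments, two parity fragments are computed (one plain XOR parity and one RAID-6-style weighted parity over GF(2^8)), and a single lost data fragment is rebuilt from the XOR parity. Production EC uses full Reed-Solomon codes and can rebuild any two lost fragments; that general case is out of scope for this toy.

```python
def gf_mul(a: int, b: int) -> int:
    # Multiply two bytes in GF(2^8) modulo the polynomial x^8+x^4+x^3+x^2+1 (0x11d).
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
        b >>= 1
    return p

def ec_encode(data: bytes, n: int = 4):
    # 1. Divide data into n equal-size data fragments (zero-padded).
    size = -(-len(data) // n)
    frags = [bytearray(data[i * size:(i + 1) * size].ljust(size, b"\0"))
             for i in range(n)]
    # 2. Calculate two parity fragments: P = XOR of fragments,
    #    Q = sum over GF(2^8) of g^i * fragment_i with generator g = 2.
    p, q = bytearray(size), bytearray(size)
    coef = 1
    for frag in frags:
        for j, byte in enumerate(frag):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)
    # 3. The six (4 + 2) fragments would be written to six different nodes.
    return frags, p, q

frags, p, q = ec_encode(b"surveillance video data block")
lost = frags[2]                      # pretend the node holding fragment 3 fails
rebuilt = bytearray(p)               # rebuild from P and the surviving fragments
for i, frag in enumerate(frags):
    if i != 2:
        for j, byte in enumerate(frag):
            rebuilt[j] ^= byte
assert rebuilt == lost               # single-fragment loss recovered
```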

2. Architectures

Storage EC architectures can be classified as either asymmetric or symmetric, depending on whether metadata management nodes, which manage metadata such as file directories and blocks, are independently deployed.

Asymmetric architecture

Asymmetric architecture has dedicated metadata management nodes so that metadata and service data can be stored separately. A typical asymmetric architecture is the Hadoop Distributed File System (HDFS). Accessing data involves two steps:

1. The client communicates with a metadata management node to obtain the storage location of the service data.
2. The client communicates with a storage node to read the data.

Asymmetric architecture is mainly applied to large-file storage. Information such as file directories and blocks is stored on metadata management nodes; as more files are stored, they take up more memory on these nodes, and distributed storage performance deteriorates. Asymmetric architecture has the following characteristics:

1. Metadata management nodes need to be deployed independently, and their deployment scheme is complex.
2. If metadata management nodes fail, services are interrupted.
3. The metadata management node specifications are limited, which can cause performance bottlenecks.

Symmetric architecture

In this type of architecture, no independent metadata management nodes are deployed. The system evenly distributes metadata and service data across storage nodes, preventing system resource contention, and automatically reconstructs any metadata and service data stored on a failed node to ensure service continuity. When accessing service data, the client communicates directly with storage nodes, eliminating the performance bottlenecks of management nodes. Several renowned storage vendors, for example Dell EMC Isilon, use this architecture. It is applicable to both large and small objects (files) but has high entry criteria: metadata and service data must be evenly distributed across nodes, and reads and writes must be balanced based on client IP addresses and storage resource usage.

[Figure: Asymmetric architecture, where the client first queries a management node (which stores metadata) and then reads service data from storage nodes, versus symmetric architecture, where the client communicates directly with storage nodes and metadata and service data are distributed across all storage nodes.]

3. Key Technology 1 – Globally-Balanced Data Distribution

Compared with traditional storage, storage EC technology achieves load balancing by distributing data across nodes in either of the following modes:

Hash distribution

A typical example is a shard-based distributed hash table (DHT), which implements balanced data distribution. The storage system is divided into multiple shards at a fixed granularity (for example, 1 MB). IDs for these shards are created based on the file ID and start logical block address (LBA), and are used as keys to calculate the hash value of each shard and allocate it to the DHT ring.


When the storage system receives an I/O request (containing a file ID, start LBA, and data), it uses the DHT algorithm to determine which server node will process the request. The DHT provides the following properties:

Balance: Data is distributed across nodes as evenly as possible.

Monotonicity: When new nodes are added to the system, only a small proportion of shards is redistributed to them, reducing the data migration workload.

Sequential distribution

Sequential distribution is relatively common in distributed table systems, for example Bigtable. Tables are split into multiple tablets, and the control server distributes the tablets' data across storage nodes based on specific policies. Each tablet is equivalent to a leaf node in a tree structure; as data is inserted and deleted, some tablets grow much larger while others shrink.

Metatables are introduced as a type of index to support a larger cluster scale. They maintain information about the nodes where the user tables are located, reducing reads and writes of the root table.
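A minimal consistent-hashing sketch of the DHT idea described above, with shard keys derived from a file ID and start LBA; the node names, virtual-node count, and 1 MB shard granularity are illustrative assumptions.

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Stable hash onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class DHTRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each physical node gets many virtual points for balance.
        self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def locate(self, shard_key: str) -> str:
        # A shard goes to the first node clockwise from its hash.
        i = bisect.bisect(self.points, h(shard_key)) % len(self.ring)
        return self.ring[i][1]

# Shard IDs from file ID + start LBA at 1 MB granularity, as in the text.
ring = DHTRing(["node1", "node2", "node3"])
before = {f"file42:{lba}": ring.locate(f"file42:{lba}")
          for lba in range(0, 16 << 20, 1 << 20)}

ring = DHTRing(["node1", "node2", "node3", "node4"])   # add a node
moved = sum(before[k] != ring.locate(k) for k in before)
print(f"shards remapped after adding node4: {moved}/{len(before)}")
```

Only a fraction of the shards move to the new node, which is the monotonicity property described above.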

[Figure: Hash addressing and sequential distribution. In hash addressing, keys are hashed to shards (P1 to P9) on the DHT ring and mapped to physical nodes and disks; after node 4 is added, only a few shards are remapped. In sequential distribution, a root table indexes optional metatables, which in turn index the user tables holding contiguous key ranges (1-1000, 1001-2000, ..., 6001-7000) across disks.]


4. Key Technology 2 – Fast Data Reconstruction

When a disk or node in the system is corrupted, the system reconstructs the data. First, it runs erasure coding on the normal data blocks to calculate the data blocks that need to be reconstructed. Then, it writes the reconstructed data blocks to normal disks. Storage EC technology thereby supports parallel, rapid troubleshooting and data rebuilding:

1. Data blocks are scattered across different storage nodes. During the repair process, data is concurrently reconstructed on multiple nodes, but each node only needs to reconstruct a small amount. This effectively avoids the performance bottleneck caused by reconstructing large amounts of data on a single node.
2. Data is distributed across nodes to ensure that it can be accessed and reconstructed even when a node fails.
3. Loads are automatically balanced when a fault occurs or capacity is expanded.

[Figure: Fast data reconstruction. If disk 2 on node 3 fails, data from other nodes is reconstructed into four copies and scattered to the other normal disks on node 3.]

5. Benefits in the Public Safety Industry

In the public safety industry, storage EC technology will:

Improve system reliability and ensure service continuity even when a node fails or historical data is corrupted.

Distribute shards across nodes and enable high-concurrency access.

Restore data quickly by carrying out concurrent reconstruction of multiple shards when a fault occurs.

As high-definition video surveillance develops rapidly, data volumes continue to increase, especially in large- and medium-sized projects such as intelligent transportation systems and intelligent campuses. In the next few years, storage EC technology will play an important role in storage solutions and is expected to provide the mainstream, optimal storage solution for public safety.
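The concurrent-rebuild idea can be sketched as follows. The in-memory SURVIVORS table is a toy stand-in for fetching surviving fragments from other storage nodes, and the per-stripe rebuild reuses the XOR recovery shown in the earlier EC example.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for the cluster: surviving fragments (data + parity) per stripe.
# In a real system these would be fetched from the other storage nodes.
SURVIVORS = {
    0: [b"\x01\x02", b"\x04\x08", b"\x10\x20", b"\x5a\x99"],  # last one is parity
    1: [b"\x03\x05", b"\x06\x0a", b"\x0c\x14", b"\x09\x1b"],
}

def rebuild_stripe(stripe_id: int) -> bytes:
    # XOR the surviving fragments with the parity to recover the lost one,
    # exactly as in the 4 + 2 example above.
    frags = SURVIVORS[stripe_id]
    rebuilt = bytearray(len(frags[0]))
    for frag in frags:
        for j, byte in enumerate(frag):
            rebuilt[j] ^= byte
    return bytes(rebuilt)

# Many small rebuilds run concurrently, so no single node becomes the
# bottleneck of the repair (point 1 in the list above).
with ThreadPoolExecutor(max_workers=8) as pool:
    rebuilt = dict(zip(SURVIVORS, pool.map(rebuild_stripe, SURVIVORS)))
print(rebuilt)
```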

Zhou Jiangli, Xu Zhen


Multi-Lens Synergy Technology

In our intelligent age, many industries have deployed intelligent vision solutions. Cameras are the basic video surveillance product, and when combined with intelligence they offer best-in-class panoramic surveillance and object detection functions such as license plate recognition. Against this backdrop, multi-lens cameras are needed to meet the diverse surveillance requirements of different industries. A multi-lens camera provides views from multiple perspectives at the same time, with the lenses collaborating with each other. This multi-lens synergy technology helps supercharge legacy applications and is an important foundation for intelligent vision.

1. Background

Single-lens common cameras are widely deployed for conventional surveillance. As video surveillance enters the intelligent era, these cameras cannot meet the requirements of complex scenarios. Multi-lens cameras comprise a wide-angle prime lens and at least one zoom lens, improving surveillance efficiency and coverage. Multi-lens cameras come in diverse forms, including combinations of a single box camera with up to two pan-tilt-zoom (PTZ) dome cameras, or dual-lens PTZ dome cameras.

Single-lens camera:
• Each camera is installed and deployed independently. Multiple cameras are needed for wide-scale coverage.
• Limited field of view; high false negative rate in detail capture.

Multi-lens camera:
• A single camera provides a wide field of view and functions as multiple cameras.
• Lenses collaborate effectively across distances and devices: the wide-angle prime lens detects objects within the field of view, while the long-focus zoom lens rotates and zooms in on the video image to focus on the object.


2. Technical Principles

When the wide-angle prime lens detects that an object enters the surveillance scope, it sends the object's coordinates to the zoom lens, which then focuses on and captures the object. This capability supports a wide range of applications such as license plate recognition. Automatic calibration is needed to ensure collaboration between the lenses.

Multi-lens cameras run automatic calibration and real-time focusing technologies. The former ensures synergy among the lenses, while the latter enables quick zoom and accurate focus.

Automatic calibration

Automatic calibration requires the two lenses to capture images of basically the same field of view. The camera system uses a dedicated algorithm to extract and match feature points from the two images, obtains the coordinate mapping between the two lenses from the matched coordinates, and uses this calibration for future surveillance.

[Figure: Automatic calibration process. The wide-angle prime lens and the zoom lens each capture basically the same image; feature extraction produces an object feature set for each; feature points are matched and mapped; the coordinates of the matched feature points yield the coordinate mapping between the two lenses, completing calibration.]
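A minimal OpenCV sketch of the calibration idea, assuming ORB features and a homography as the coordinate mapping; the document does not name the exact algorithm, and the image paths are placeholders.

```python
import cv2
import numpy as np

# Images of basically the same field of view from the two lenses (placeholder paths).
wide = cv2.imread("wide_angle.jpg", cv2.IMREAD_GRAYSCALE)
zoom = cv2.imread("zoom.jpg", cv2.IMREAD_GRAYSCALE)

# Feature extraction on both images.
orb = cv2.ORB_create(nfeatures=2000)
kp_w, des_w = orb.detectAndCompute(wide, None)
kp_z, des_z = orb.detectAndCompute(zoom, None)

# Feature point matching; keep the strongest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_w, des_z), key=lambda m: m.distance)[:200]

# Use the matched coordinates to obtain the mapping between the two lenses.
src = np.float32([kp_w[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_z[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# H now maps a detection in the wide-angle view to zoom-lens coordinates,
# which is the calibration the camera stores for future surveillance.
print(H)
```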


Real-time focusing

Real-time focusing technology creates a mapping between object distance, image distance, and focal length to ensure seamless collaboration across the video streams. With this technology, the camera system establishes a conversion relationship between the camera position and the object distance/depth, helping control the lens to take quality snapshots.

Focus calibration must first be performed on the camera. For fixed cameras, a geometrical optics principle lets the zoom lens determine the object distance, establishing the conversion between camera position and object distance. After calibration, the camera can calculate the image distance from the focal length and object distance. These variables correspond to the focus position of the camera's built-in motor, so the camera can control the motor and adapt its field of view (FoV) based on the image distance.

In actual use, the zoom lens rotates or zooms in/out to take snapshots; the multi-lens camera obtains the object distance (for example, 20 m) from the lens position, such as PTZ data (for example, 20° pan and 15° tilt). The system then controls the lens to focus accurately on the object based on that distance.
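The geometrical-optics conversion can be illustrated with the thin-lens equation, 1/f = 1/u + 1/v: given the focal length f and the object distance u inferred from the PTZ position, the camera solves for the image distance v and drives the focus motor accordingly. The motor mapping below is a made-up linear scale, since the real lookup table is device-specific.

```python
def image_distance(focal_length_m: float, object_distance_m: float) -> float:
    # Thin-lens equation: 1/f = 1/u + 1/v  ->  v = f*u / (u - f)
    f, u = focal_length_m, object_distance_m
    return f * u / (u - f)

def motor_steps(v_m: float, steps_per_mm: float = 50.0, v_min_m: float = 0.05) -> int:
    # Made-up linear mapping from image distance to focus-motor position.
    return round((v_m - v_min_m) * 1000 * steps_per_mm)

u = 20.0   # object distance inferred from PTZ data (e.g. 20 deg pan, 15 deg tilt)
f = 0.05   # assumed 50 mm focal length of the zoom lens
v = image_distance(f, u)
print(f"image distance: {v*1000:.2f} mm -> motor position: {motor_steps(v)} steps")
```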

3. Applicable Scenarios

Multi-lens cameras are suitable for various scenarios such as transportation surveillance and open squares.

In transportation surveillance scenarios, the wide-angle prime lens helps simultaneously detect multiple objects such as vehicles and pedestrians, while the zoom lens captures object detail, such as license plates.

[Figure: Focusing and calibration process: obtain the camera position information, obtain the object distance based on the geometrical optics principle, and establish the conversion relationship between camera position and object distance. Real-time focusing process: detect an object, obtain the object distance, and control the focus position based on the object distance and the camera motor, taking sharp snapshots in real time.]

Multi-lens cameras are often installed on street poles at a height of 5-10 m. At such heights, video images vary with the zoom ratio and have a large depth of field (DoF), so the camera may fail to focus on the desired object. Additionally, crowd and vehicle flows change throughout the day, which can affect snapshot efficiency.

[Figure: Street scenario. Real-time focusing geometry is determined by the device height, object height, vertical angle of view, and the capture position range in front of the object.]

Open public squares

A multi-lens camera can perform the workload of multiple cameras. The following shows the advantages of multi-lens cameras over common cameras when deployed in open squares.

Common camera (6 cameras on 4 poles): Multiple cameras are required to provide detailed video streams and panoramic coverage. Because of poor zoom functionality, common cameras may miss targets during detection, and blind spots exist at the borders of the cameras' fields of view.

Multi-lens camera (2 cameras on 2 poles): Multi-lens cameras are equipped with a prime lens with a wide field of view and a zoom lens that supports flexible object movement detection, improving the surveillance scope and reducing blind spots. This reduces the number of cameras and poles required, saving maintenance costs. Users can select any object in the video image from the prime lens, and the zoom lens captures that object.

[Figure: Open square deployment. Lens synergy provides a panoramic view plus accurate object details: the wide-angle prime lens performs panoramic surveillance while the zoom lens performs detail capture.]

4. Prospects

Intelligent video surveillance poses increasingly high requirements on cameras. Multi-lens synergy technology, which was proposed to solve major problems facing video surveillance, must also keep innovating to meet the ever-changing requirements of customers.

Current issues: surveillance blind spots; high construction costs for busy areas, such as crossroads; ever-changing surveillance scenarios and requirements.

Prospects: more multi-lens camera forms (3-lens or 4-lens bullet/PTZ dome cameras, fixed lens + rotatable lens, rotatable lens + rotatable lens) and advanced data processing technologies (stream switching and more).

Goals: efficient collaboration among modules, a larger surveillance scope, and simplified deployment.

Liu Lei, Shen Zifu, Liu Tengjun, Zhang Tian

Products and Solutions Catalog

1. Huawei Software-Defined Camera

Artificial intelligence (AI) is not only a feature but also the core competitiveness of software-defined cameras (SDCs), and AI chips are the key to adding powerful intelligence to SDCs. Huawei SDCs adopt professional NPUs with 25 times the computing power of CPUs, enabling visual analysis and computing across trillions of records.

Dedicated AI chip:

1. Moving AI capabilities to cameras can dramatically improve intelligent recognition performance and reduce bandwidth usage, and is a development trend in the video surveillance industry.

2. Huawei, utilizing its own dedicated AI chips, has released a series of SDCs with computing power specifications ranging from 1 TOPS to 20 TOPS, aimed at satisfying surveillance requirements in various scenarios.

Huawei SDC Series:

X Series (Ultimate AI): flagship series, 4–20 TOPS
M Series (Professional AI): high-end series, 1–2 TOPS
C Series (Basic AI): mid-range series, 1 TOPS
D Series (Inclusive AI): best-value series, 1 TOPS

Key Intelligent Services:

Behavior analysis (such as abandoned object detection, loitering detection, and tripwire crossing detection)

Crowd flow analysis (queue length detection, crowd density detection, and heat map)

Intelligent vehicle analysis (such as vehicle feature recognition and traffic violation detection)

Forest fire detection

Person capture and personal feature extraction

Long-Tail Algorithms:

Third-party long-tail algorithms, such as safety helmet detection, smoke detection, floating debris recognition, and attendance detection, can be released in the Huawei HoloSens Store and downloaded and loaded onto SDCs.

2. Huawei Intelligent Video Storage

To cope with issues such as siloed system construction, limited storage space, and low intelligence, Huawei has used cloud and AI technologies to develop the Intelligent Video Storage (IVS) solution, featuring an algorithm ecosystem, elastic resource utilization, and effective storage. Huawei HoloSens IVS products are widely used in scenarios such as intelligent campus and intelligent transportation.

Huawei IVS supports intelligent edge-cloud synergy, enabling independent management and fast closure of edge services as well as unified aggregation, alerting, and search of global services. This dramatically improves intelligent application efficiency and intelligence coverage across industries.

Huawei IVS, based on the vPaaS 2.0 architecture, complies with the "platform + ecosystem" strategy and provides an algorithm repository framework that enables multiple algorithms to run concurrently on one application platform.

Huawei IVS offers all-scenario intelligence solutions from the edge to the center, including Micro Edge, Lite Edge, and Central Platform, and provides services such as multi-dimensional data analysis, storage, and search, accelerating digital transformation across industries.

Huawei IVS Series:

IVS9000 Series: all-cloud solution for level-1 centers; 64–384 TOPS computing power; 768-channel access; 36 or 38 disks.

IVS3800 Series (IVS3800, IVS3800X): for large- and medium-sized campus and transportation; 4–32 TOPS computing power; 64-channel image analysis; 8 or 16 disks.

IVS1800 Series: for campus, education, and banking branches; 32 TOPS computing power; 16-channel parallel analysis.

ITS800 Series: for intelligent transportation; 20 TB storage per device.

NVR800 Series: device-edge intelligent synergy and inclusive AI for distribution scenarios; 1, 2, 4, or 8 disks.

Major Intelligent Services:

Omni-Data Structuring: object classification, vehicle attribute recognition, vehicle search by attribute, personal attribute recognition, personal feature extraction, person search by image, cyclist attribute recognition, license plate recognition, in-vehicle feature recognition.

Person Clustering: N:N clustering, N:M clustering, holographic profile.

Person Recognition: person capture, personal attribute recognition, personal feature extraction, person search by image.

Vehicle Recognition: vehicle capture, vehicle attribute recognition, license plate recognition, in-vehicle feature recognition, vehicle feature recognition, vehicle search by image.

Behavior Analysis: perimeter detection, tripwire crossing detection, loitering detection, area entry/exit detection, fast movement detection, head counting, queue length detection, crowd density detection.


Gong Liye, Tan Shenquan, You Shiping

03 Cloud Service

Discussion on Video Cloud Service Trends
P2P Technology
Products and Solutions Catalog

Wang Kun, Wang Hongwei

Discussion on Video Cloud Service Trends

1. Origin of Video Cloud Services

Since video surveillance entered the Chinese market in the late 1970s, video surveillance technology has evolved through four stages over more than 30 years of development: analog surveillance, analog-digital surveillance, networked surveillance, and intelligent surveillance.

Traditional video surveillance systems collect massive amounts of unstructured video data with low value density. In the past, security personnel needed to manually view video feeds to discover potential risks. However, due to limited labor resources, security personnel may suffer from fatigue after long viewing sessions and miss important information, and the number of monitors in a surveillance center is always far smaller than the total number of cameras. In large-scale surveillance scenarios, security personnel therefore cannot accurately and efficiently monitor all sites around the clock. The ever-expanding scale of video surveillance has brought an exponential increase in surveillance data, which is difficult to search, view, and analyze manually, challenging the system architecture, data management, and data analysis of the traditional video surveillance field. Three technologies address these challenges:

AI-powered intelligent analysis: enabling fast mining and extraction of valuable data. With field-proven intelligent algorithms and GPU chips, valuable data can be quickly extracted from mass video data or provided directly by cameras. Intelligent algorithms such as vehicle recognition and object recognition have been or will be widely used.

IoT technology: enriching video surveillance data. Valuable data (such as structured and semi-structured data relating to people and vehicles) extracted from massive quantities of raw video is itself present on a massive scale. Although the data is seemingly disordered, it embodies substantial relationships among people, vehicles, and objects.

Cloud computing technology: maximizing video surveillance resource usage. After mass video data and IoT data are collected and mined, they are applied to a range of industry applications such as video applications, vehicle applications, and multi-dimensional data applications. Currently, these applications are constructed as independent subsystems and finally interconnected through the upper-layer service system.

Against this backdrop, the next-generation intelligent video surveillance system, the video cloud, which centers on AI, big data, and cloud computing, rises to the occasion.

Analog surveillance (1970s-1990s): analog matrix, optical transceivers, VCR-based storage, simple management.

Analog-digital surveillance (from 1999): DVR/DVS, analog/digital matrix, local digital storage, simple management.

Networked surveillance (from 2004-2006): DVS/codec, IP cameras (IPCs), IP network switches, professional storage devices, streaming media servers, service management platforms.

Intelligent surveillance (from 2015): common and intelligent HD cameras, intelligent analysis, big data, cloud computing, an open and standard architecture, multi-service convergence.


2. What Is Video Cloud Service?

Cloud service concept

The video cloud service is a video streaming media service based on cloud computing technologies. It provides customers with a range of services such as surveillance device (camera and NVR) access, pan-tilt-zoom (PTZ) control, and audio, video, and intelligent data upload, storage, processing, and distribution across the network. The video cloud service enables customers to build a professional video surveillance system cost-effectively and efficiently, quickly build applications and intelligent surveillance solutions based on computer vision and video analysis, and easily develop online industry video services. At present, video cloud services are widely used in varied scenarios such as intelligent store, education, community, factory, and construction site solutions.

Cloud service categories

For different implementation capabilities:

PaaS-based video cloud service: The software R&D platform is provided to users as a service, so software developers can develop new applications without having to purchase equipment such as servers.

SaaS-based video cloud service: SaaS is a software deployment mode in which software is provided over the Internet. Users rent web-based software from providers to manage enterprise businesses.

For different application scenarios:

Public cloud-based video cloud service: Third-party cloud service providers own and operate public cloud resources (such as servers and storage space) and provide them to small- and medium-sized enterprises and individuals over the Internet.

Hybrid cloud-based video cloud service: On-premises infrastructure or a private cloud is combined with the public cloud, so data and applications can move freely between the two, making deployment more flexible. This type of cloud service targets governments, industry customers, and large enterprises.

Private cloud-based video cloud service: Physically located in enterprise data centers, private cloud resources can also be hosted by third-party service providers. This type of cloud service is aimed at key sectors such as government, transportation, and finance.

Cloud service value

Core value 1: Focus on core services. Cloud services give full play to the social division of labor and lower the barriers to video access. Video cloud service providers can provide horizontal SaaS or PaaS services that let industry video service enterprises focus on their core businesses and quickly build industry-specific video services for market segments.

Core value 2: No need for equipment rooms. Cloud services free governments and enterprises from the need for equipment rooms, reducing capital expenditure (CAPEX) on equipment room construction or leasing.

Core value 3: Professional maintenance. Cloud services free governments, enterprises, and individuals from the need to build their own systems and hire dedicated maintenance personnel, since cloud service providers offer 24/7 professional maintenance.

Core value 4: Excellent agility and elasticity. Customers can flexibly increase or decrease resources depending on their actual service volume, without spending time and resources on equipment room construction or on system construction, deployment, and debugging.

3. Future Development

COVID-19 has transformed the role and importance of digital experiences in people's lives. Demand is soaring for information sharing, cloud services, and cross-region collaboration, and video plays an essential role in modernized, efficient, intelligent, and refined societal governance. The epidemic has also revealed the importance of a country's infrastructure capabilities, especially in digital information technologies. China has started new infrastructure construction, which opens a window of opportunity for video cloud services.

First, production supervision is being strengthened in key new infrastructure fields, including healthcare, manufacturing, power grid, Internet of Vehicles (IoV), ultra high definition (UHD), education, and harbors; video cloud services are the basic services of this infrastructure construction. Second, access bandwidth now exceeds 100 Mbit/s. This alleviates the pressure on video backhaul bandwidth, improves video cloud service access capabilities, adds weight to edges, balances service indicators against network costs, and helps build competitive service products. Third, the uplink bandwidth of the backbone network exceeds 50 Gbit/s, which brings large video traffic onto high-speed networks; video reliability and traceability are therefore vital for strengthening information security protection.

Globalization has been torn apart during this epidemic, and countries are reshaping their industrial and commercial systems to ensure supply security. The epidemic has highlighted the fact that globalization has put every country at the mercy of others. Globalization accelerated the spread of the virus, setting off widespread social discontent; the concept of contractual obligation, a foundation of western civilization, collapsed in the face of life and death; countries competed for masks and food; and fierce contests over currency, trade, science and technology, meteorology, biology, oil, and aerospace are in full swing. Global solidarity and cooperation have become empty words. All countries have begun to focus on supply security, and national and regional ecosystems will surpass global ecosystems. An industry-oriented commercial ecosystem enabling system is therefore required to help local ecosystem participants find their positions, quickly engage in the industry ecosystem, and develop commercialized, stable supply capabilities. Local independent supply, as a strategic reserve, can ensure supply security in emergencies. This is why the video cloud, as future infrastructure, needs to place greater emphasis on supply security.

The epidemic has also accelerated the digital transformation of enterprises. Cameras are basic devices that generate massive amounts of data, and industry video cloud services are ingress points for data traffic, which can help HUAWEI CLOUD win in the multi-cloud era. Cameras are already widely used in enterprise video surveillance, and the advent of intelligent cameras produces a large amount of structured data that is widely applied in enterprise management, production, and supply.

Frontend intelligence

Full-fledged AI technologies also play a significant role in the intelligent vision sector. AI technologies enable intelligent recognition, extraction of valuable information from video, and alarm linkage, which dramatically improve efficiency across industries. Cameras mainly collect video and image data, which is then uploaded to equipment rooms, resulting in high Border Gateway Protocol (BGP) bandwidth costs. Embedding computing power in cameras allows intelligent recognition algorithms to be deployed directly on cameras, sharply decreasing the amount of data transmitted on the network.

Device-cloud synergy

1. Reduce the video transmission bandwidth, and thus network costs, without compromising image quality by using cameras that encode in H.265.

2. Flexibly deploy computing power on devices, edges, and the cloud to meet different latency, cost, and SLA requirements and achieve optimal performance in varied scenarios.

In addition, peer-to-peer (P2P) technology gives full play to devices and edges, reducing traffic forwarded through the cloud via NAT traversal and lowering bandwidth costs. Combining this with content delivery network (CDN) bandwidth improves distribution efficiency and reduces costs, and deploying video cloud services at edges close to users further reduces cloud-based traffic forwarding costs.


Scenario-specific intelligence, boosting industry digitalization

Video acts as the data awareness entry point for enterprise digitalization. Cameras – basic devices of enterprises – collect video data, which boosts digital transformation of enterprises. There are many opportunities in vertical industries, which is why we need to focus on digital transformation of vertical industries to continuously explore application scenarios and provide value-added services accordingly.

[Figure: Intelligent twins of industry digitalization. Intelligent awareness (intelligent vision, intelligent olfaction, intelligent gustation, and intelligent somatosensation) gathers video collection, image capture, sound pickup, behavior capture, and gas, liquid, and vibration detection; big data, AI platforms, and the video cloud support intelligent decision-making and intention judgment; human-machine interfaces (AI, VR, and mobile phone interfaces) serve government, enterprise, household, and individual users.]

[Figure: Device-edge-cloud deployment. Devices connect to nearby aggregation points and edge nodes (edge storage, nearby forwarding, CDN nodes for live streaming, edge computing, and hybrid cloud) across regions, with central forwarding through public cloud nodes.]

Wu Xiaoliang, Zhang Yinqun

P2P Technology

With improvements in network quality and bandwidth, and the proliferation of 4G/5G and smartphones, access options for household and public video have become more diverse. This presentation describes the use of peer-to-peer (P2P) technology in the video surveillance sector for transferring video over the Internet.

1. What Is P2P?

P2P is a communication model that allows peers to share resources with each other without a central server. Using P2P decreases the number of nodes involved in network transmission and prevents information loss. Unlike the client/server (C/S) model, which relies on a central server, each node on a P2P network acts as both a server and a client, and nodes communicate with each other directly.

[Figure: A P2P network of directly connected peers versus a classic C/S network in which clients communicate through a central server.]

Currently, the most common Internet access mode places P2P communication hosts on both sides of Network Address Translation (NAT) devices, such as firewalls. NAT is a technique for translating the address in an IPv4 packet header into another address. NAT is widely applied to solve the IPv4 address exhaustion issue, but it prevents the establishment of P2P sessions: NAT does not allow hosts on the public network to proactively access hosts on the private network, while P2P requires that both communication parties be able to proactively access each other. Therefore, this presentation focuses on how to conduct effective P2P communication in NAT traversal scenarios.

2. P2P Implementation Solutions in NAT Traversal Scenarios

The following describes common P2P implementation solutions in NAT traversal scenarios.

P2P implementation solution based on reverse connection

In this scenario, client A is located behind a NAT gateway while client B is directly reachable, so client A can directly connect to client B through TCP. However, if client B attempts to establish a TCP connection with client A for P2P communication, the connection fails because the NAT device in front of client A rejects the connection request. To communicate with client A, client B cannot initiate a connection request directly. Instead, client B sends a connection request to centralized server S, which forwards the request to client A, asking client A to connect to client B (that is, to establish a reverse connection). After receiving the request from the server, client A initiates a TCP connection to client B. Entries for this TCP connection are created on the NAT device, so the TCP connection is established between clients A and B.

[Figure: Reverse connection: (1) client B requests a reverse connection from centralized server S (public IP address); (2) the server forwards the request to client A behind the NAT gateway; (3) client A establishes the reverse connection to client B.]
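A minimal sketch of client A's side of the reverse-connection flow. The server address and the one-JSON-object-per-line control protocol are hypothetical stand-ins; the document does not specify a wire format.

```python
import json
import socket

SERVER = ("server.example.com", 7000)   # centralized server S (placeholder address)

def run_client_a():
    # Client A sits behind NAT, so it keeps an outbound control connection to S;
    # outbound connections are always permitted by NAT.
    ctrl = socket.create_connection(SERVER)
    ctrl.sendall(b'{"cmd": "register", "id": "clientA"}\n')
    for line in ctrl.makefile():
        msg = json.loads(line)          # hypothetical one-JSON-per-line protocol
        if msg.get("cmd") == "reverse_connect":
            # S forwarded B's request: A dials out to B, which its NAT permits,
            # creating the entries that let the TCP session exist.
            peer = socket.create_connection((msg["host"], msg["port"]))
            peer.sendall(b"hello from A\n")
            peer.close()

run_client_a()
```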


P2P implementation solution based on UDP hole punching

In this solution, NAT forwarding entries are created on the NAT devices of both clients with the help of the centralized server, so that packets sent by the two clients can pass directly through each other's NAT devices, establishing a connection between the two clients. UDP hole punching applies to the following typical scenarios.

Scenario 1: In a simple scenario, the two clients are located behind the same NAT device, that is, on the same private network.

In this scenario, clients A and B each establish a UDP connection with the centralized server. After NAT translation, the internal IP addresses and port numbers of clients A and B are translated to the public IP address and port numbers of the NAT device. After clients A and B obtain each other's internal and public IP addresses and port numbers from the centralized server, they send UDP data packets to each other to set up a connection, attempting the public IP address and the private IP address at the same time. Once the private IP address connects, the private network connection is used preferentially.

The UDP hole punching process is as follows:

1. Client A sends a request to centralized server S to connect to client B.
2. Centralized server S sends client A's public IP address, private IP address, and port number to client B, and client B's public IP address, private IP address, and port number to client A.
3. Clients A and B send UDP data packets to each other to set up a connection. Because clients A and B are on the same private network, the UDP data packets travel directly over the private network.

[Figure: UDP hole punching process when both clients are located behind the same NAT device, shown before, during, and after hole punching: clients A and B on one private network behind a NAT device (public IP address), with centralized server S on the Internet.]

Scenario 2: In a common scenario, the two clients are located behind different NAT devices, that is, on different private networks.

This scenario is similar to scenario 1. The difference is that clients A and B are connected to different NAT devices, so their addresses are translated into different public IP addresses during IP address and port mapping. After clients A and B obtain each other's public and private IP addresses and port numbers from the centralized server, they send UDP data packets to each other to set up a connection. However, because clients A and B are on different private networks and no route exists on the public network for their private IP addresses, UDP data packets destined for the private IP addresses are delivered to incorrect hosts.

The packet sent by client A toward client B over the public network creates a session entry on client A's NAT device, and the packet sent by client B toward client A likewise creates a session entry on client B's NAT device; unreachable packets are simply discarded. Once clients A and B have each sent packets to the public IP address and port number of the peer's NAT device, the "hole" between them is punched. From then on, sending data packets to the peer's public IP address is equivalent to sending UDP data packets to the peer client, and real P2P data transmission starts.

The UDP hole punching process is as follows:

1. Client A sends a request to centralized server S to connect to client B.
2. Centralized server S sends client A's public IP address, private IP address, and port number to client B, and client B's public IP address, private IP address, and port number to client A.
3. Client A sends a message to client B. Because clients A and B belong to different private networks and there is no public network route, the message is unreachable; however, a session entry is created on client A's NAT device.
4. Client B sends a message to client A. Because the session entry has been created on client A's NAT device, client B's message reaches client A through the public IP address. Client A then connects to client B in the same way, implementing P2P communication.

[Figure: UDP hole punching process when the two clients are located behind different NAT devices, shown before, during, and after hole punching: each client on its own private network behind its own NAT device (public IP address), with centralized server S on the Internet.]


Scenario 3: In a complex scenario, clients are located behind two layers of NAT devices. Generally, the top-layer NAT device is provided by a telecom carrier, and the second-layer NAT devices are usually home NAT routers.

It is assumed that NAT device C, provided by the telecom carrier, performs public IP address translation for its internal nodes, which include NAT devices A and B and clients A and B. Clients A and B can reach the centralized server only after their private IP addresses are translated into public IP addresses through the two layers of NAT. Clients A and B obtain each other's public IP address and port number from the centralized server and perform hole punching. The data packets used for hole punching are forwarded by NAT device C, and the hole punching process is the same as in scenario 2.

[Figure: UDP hole punching process when clients are located behind two-layer NAT devices, shown before, during, and after hole punching: clients A and B behind NAT devices A and B respectively, both behind carrier NAT device C, with centralized server S on the Internet.]
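A minimal sketch of one client's side of UDP hole punching, as in scenario 2. The rendezvous address and the "host port" reply format are hypothetical stand-ins for centralized server S and its protocol.

```python
import socket
import time

RENDEZVOUS = ("server.example.com", 7001)   # centralized server S (placeholder)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"register clientA", RENDEZVOUS)  # NAT now maps our private endpoint

# Made-up exchange: S replies with the peer's public endpoint as "host port".
data, _ = sock.recvfrom(1024)
host, port = data.decode().split()
peer = (host, int(port))

sock.settimeout(1.0)
for _ in range(10):
    # Outbound packets punch the hole: they create the session entry on our
    # NAT device, so the peer's packets to our public endpoint get through.
    sock.sendto(b"punch", peer)
    try:
        msg, addr = sock.recvfrom(1024)
        print("P2P path established with", addr)
        break
    except socket.timeout:
        time.sleep(0.5)                       # retry until both holes exist
```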

3. Application of P2P Technology in Video Surveillance Systems

In a video surveillance system, a camera or client obtains peer network information from a cloud-based server and actively establishes a P2P connection with the peer end to transmit media streams and control signaling.

In the public cloud scenario, a camera and a mobile client, functioning as P2P clients, are deployed on different private networks that are isolated from the public network by a carrier-provided NAT gateway. Through hole punching, the camera and the mobile client punch a "hole" in the NAT gateway, so NAT is no longer an obstacle to P2P session establishment.

Camera side: The camera registers with the cloud-based server and exchanges the necessary information about its network (internal IP address, internal service port, public IP address, and public service port) with the server for network analysis and connection establishment.

Client side: The client connects to the cloud-based server in the same way and exchanges its information with the server.

After the client, camera, and cloud-based server complete this information exchange and the client requests video streams from the camera, the client and the camera attempt to establish a P2P connection. The service process is as follows:

1. The camera registers with the cloud-based server and comes online; the user logs in to the client, which connects to the server.
2. The client obtains the device list and requests video streams.
3. The server requests video streams from the camera.
4. The camera forwards video streams through the server.
5. The client and camera establish a P2P connection; once established, video transmission switches from cloud-based forwarding to P2P.

Major benefits of P2P technology for video surveillance systems:

Reduces the computing and network resources required for forwarding video streams through the cloud-based server, lowering service costs accordingly.

Enables users to enjoy HD video without stuttering.


Su Rui, Zhang Yinqun

Huawei HoloSens Cloud Service provides a wide range of cloud-based capabilities for Huawei devices such as cameras, network video recorders (NVRs), and intelligent video storage (IVS), as well as third-party devices. These capabilities include cloud-based video access, storage, viewing, and analysis. Software service providers can develop industry video applications based on Huawei HoloSens Cloud Service, for example, intelligent store and intelligent kindergarten solutions.

Products and Solutions Catalog

Huawei HoloSens Cloud Service (Only Available in China)

Open Access

Quick access for devices from third-party vendors

Standard APIs that enable ISV application innovation

Intelligent analysis via device-cloud synergy

Intelligent Ecosystem

Open intelligent algorithms

Intelligent algorithm/app store on the cloud

A rich selection of WeCode applets

Security and Trustworthiness

Encrypted data transmission and storage, video watermark

Dynamic privacy mask

E2E traceability

Seamless Experience

Consistent service experience on multiple types of devices

Device-cloud-edge synergy, superior user experience

Unified architecture and service based on HUAWEI CLOUD and hybrid cloud


[Figure: Huawei HoloSens Cloud Service architecture. Devices (Huawei SDC, NVR800, ecosystem devices, and third-party devices) connect through cloud-edge-device synergy to cloud capabilities for industry video data: video access, storage, and viewing; customer group, hotspot, and customer flow analysis; and remote inspection. The HoloSens Store aggregates ecosystem partners as an algorithm and application transaction platform (algorithm aggregation on the cloud, algorithm loading onto devices, selected best-in-class algorithms) to boost industry digitalization. SaaS applications serve intelligent chain stores (shopping malls/supermarkets, 4S stores, restaurants), intelligent education (kindergartens, primary schools), intelligent breeding, and intelligent construction sites (construction sites, residential complexes), accessed via the Huawei HoloSens App (for enterprises) and the Huawei HoloSens PC Client, an application client based on video intelligence.]

04 Ecosystem

Discussion on Intelligent Vision Ecosystem Trends
Products and Solutions Catalog

Discussion on Intelligent Vision Ecosystem Trends

Yu Zhuo, Liang Jiani

1. Status Quo of the Video Surveillance Industry

Driven by 5G, AI, big data, and cloud computing technologies, the video surveillance industry has grown from a conventional industry into an intelligent one. It is an unstoppable trend for the industry to transform from single-dimensional to multi-dimensional data applications and to develop comprehensive applications that converge video surveillance data with traditional service data. In addition, industry participants, including AI vendors, IT vendors (in the fields of IT infrastructure, big data, cloud, and computing), and industry application vendors, are all vying for primacy. The industry chain, however, is such an all-inclusive and intricate ecosystem that even a market-dominant enterprise struggles to be an all-rounder, and enterprises also face growing competitive pressure from all-in-one, non-decoupled products.

The concept of manufacturers going it alone is becoming less and less popular nowadays. In this era, enterprises with various roles in the industry should collaborate with others to build up an ecosystem filled with partnerships throughout the industry chain. In other words, the nature of competition in the video surveillance market has changed – from competition between hardware and solutions to competition between ecosystems in the industry chain. The ecosystem is now the arena of competition for leading enterprises in the video surveillance sector, with technological enablement, platform openness, and partnership enablement being the core areas of competition.

2. Ten-Year Progression of the Video Surveillance Industry

During the past decade, the video surveillance industry has undergone progressive improvement, moving through the eras of network surveillance, HD surveillance, and intelligent surveillance to data-enabled surveillance. The video surveillance ecosystem has evolved in parallel, from the initial networking platform to its current state, featuring AI, applications, and big data.


[Figure: Ecosystem evolution. Conventional video surveillance involved Party A, an integrator, and a video surveillance vendor; intelligent video surveillance brings together Party A, the integrator, the video surveillance vendor, an IT vendor, an AI vendor, and an application vendor.]

IT vendor: supplies fundamental computing hardware, storage hardware, as well as cloud and big data capabilities.

AI vendor: supplies fundamental capabilities of AI algorithms and AI chips, such as vehicle-related algorithms and algorithms for various industry segments.

Application vendor: supplies fundamental applications for each industry segment as well as end-to-end (E2E) services, such as the integrated video surveillance management platform and integrated command platform used in traffic management.


3. Challenges for the Video Surveillance Industry

Video surveillance has undergone constant development in recent years. Developing hardware-based algorithms is one of the important industry trends, together with algorithm value transfer to the frontend – simple algorithms that support closed-loop management can be directly loaded on frontend devices. As deep learning has been growing from a fledgling technology to a thriving one, the value of training data samples has surpassed the value of algorithms themselves. Consequently, algorithm vendors are finding it difficult to maintain competitiveness and prevent new entrants from outcompeting them. Application vendors are also confronted with challenges such as customizing applications for different regions, large-scale replication, and high competition in the homogeneous application market.

Algorithms relating to people, motor vehicles, and non-motorized vehicles are becoming more and more mature, and are widely used in traffic management applications. However, there is still a long way to go for other industries' intelligent apps to achieve this level of efficacy.

For example, various industries demand various long-tail algorithms, such as algorithms for head counting and health monitoring of livestock in animal husbandry, kitchen environment and standard operating procedure (SOP) monitoring in the catering industry, and transaction monitoring in the retail industry. However, many long-tail algorithms suffer from multiple issues such as difficulties in data acquisition and algorithm training, as well as enormous and diverse customization requirements.


[Figure: Major challenges for AI vendors and application vendors: enormous and diverse requirements for customized long-tail algorithms. Vendors have evolved from general surveillance platform vendors to industry solution vendors, then to AI and application vendors, and now to AI, application, and big data vendors. Plotted by market demand scale from high to low, example algorithms include fire and smoke detection, person recognition, video structuring, recognition of license plates from the Chinese mainland, recognition of license plates from countries/regions outside the Chinese mainland, floating debris detection, behavior analysis, video synopsis, video search, safety helmet detection, and excavator recognition.]

4. Future of Video Surveillance

The principal driving forces of AI development will remain computing power, algorithms, and data, while the driving forces of AI-powered video surveillance span computing power, algorithms, data, solutions, engineering, and services. The core foundation of industry development is to provide viable, all-round solutions by combining technologies with industry attributes. Whatever direction the video surveillance industry and its technologies take, the methodological principle of using technical means to solve industry challenges will never change, and it will continue to guide every technical activity of an enterprise.

Years of evolution have seen the transition from analog cameras to intelligent cameras and on to modern software-defined cameras, which feature user-defined algorithms that satisfy user requirements in various scenarios. Most cameras in the industry today carry only optical sensors. To satisfy the needs of various industries, ecosystem cameras that support all types of interfaces are required, so that they can connect to other types of sensors such as humidity, temperature, and pH sensors.

With advanced technologies and supportive policies, various industries are gradually making inroads into Smart City construction, which requires more intelligent, multifaceted, and comprehensive video surveillance ICT infrastructure. In some industries, video surveillance scenarios are being segmented in ever finer detail, breeding a batch of budding companies that specialize in providing algorithms and applications.

[Figure: "Platform + AI + Ecosystem" and industry intelligence build an open cooperation system. Industry scenarios such as intelligent campus, intelligent grid, intelligent transportation, intelligent community, intelligent business, and intelligent prison run on an infrastructure platform that provides resource orchestration and scheduling, automatic deployment, VM and container resources, and auto-scaling, compensating for inadequate camera hardware scalability. The AI ecosystem spans algorithms (person, vehicle, long tail, audio, and behavior) and applications (data management and intelligent analysis).]

The camera of the future will support third-party hardware ecosystems. Through standard serial interfaces such as RS-485 (also known as EIA-485), cameras can connect, with or without a pan-tilt unit (PTU), to multiple types of sensors, including liquid temperature, liquid level, conductivity, air temperature, tilt, pH, and pull sensors.
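As a concrete illustration, the sketch below polls a pH sensor and a liquid temperature sensor over an RS-485 bus, assuming the sensors speak Modbus RTU, a protocol commonly carried on RS-485. The serial port, slave address, and register map are hypothetical placeholders, not values from any camera or sensor specification.

import minimalmodbus

# Hypothetical bus: USB RS-485 adapter on /dev/ttyUSB0, sensor at slave address 1.
sensor = minimalmodbus.Instrument('/dev/ttyUSB0', slaveaddress=1)
sensor.serial.baudrate = 9600

def read_ph():
    # Assumed register 0: pH scaled by 100 (e.g. 712 means pH 7.12).
    return sensor.read_register(0, functioncode=3) / 100.0

def read_liquid_temperature():
    # Assumed register 1: temperature in units of 0.1 degrees Celsius.
    return sensor.read_register(1, functioncode=3) / 10.0

print('pH:', read_ph(), '| liquid temperature (C):', read_liquid_temperature())

The appeal of a shared bus like RS-485 is visible even in this toy: adding another sensor type is a new register mapping, not a new physical interface on the camera.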

In building the video surveillance ecosystem, it is essential to build a general-purpose platform. Open APIs can create a swarm effect, with open-source technologies laying the foundation for ecosystem expansion and development. The entire ecosystem consists of shared platforms, open-source software, open APIs, and ecosystem business models.

Once new Internet technologies are integrated into the video surveillance industry, an increasing number of algorithm developers will join online store-like algorithm platforms, enabling a new ecosystem business model. Algorithm developers and users will no longer have to search for each other; instead, they will transact directly on universal algorithm marketplaces in a direct-to-customer (D2C) model.

[Figure: From software-defined to software- and hardware-defined: cameras evolve from a few sensors to massive sensors, connecting directly or through a third-party RTU to micrometeorological, liquid volume, liquid level, liquid temperature, and conductivity sensors. From project-based model to business model: before, users searched for algorithms and developers searched for users; after, AI algorithms and massive data flow between them directly through the Huawei HoloSens Store.]

Yu Zhuo, Liang Jiani


Tan Shenquan, Su Rui, Liang Jiani

Products and Solutions Catalog

1. Huawei Eco-Cube Camera

Huawei Eco-Cube cameras can be flexibly deployed in challenging environments, for example, areas where power supply or network deployment is difficult. They provide an AIoT-Hub to which a variety of sensors can connect, allowing video data to be fused with IoT data for multi-dimensional awareness. The Eco-Cube brings an entirely new architectural perspective to next-generation cameras.

The Eco-Cube camera adopts a functional compartment design inspired by space stations. The camera’s built-in AIoT-Hub allows connections with various third-party sensors to collect multi-dimensional data. The camera, with its cylindrical body and droplet-shaped cover, can perfectly blend into the environment. In addition, the camera features a dovetail slot for easy installation and can be disassembled with the push of a single button.

The Eco-Cube camera can be deployed in areas with difficulties in network deployment. An ad-hoc network can be quickly built between the primary camera and secondary cameras through the VideoX wireless transmission technology. In this way, video data from secondary cameras can be transferred through wireless networks. The Eco-Cube camera can also be used in areas without power grid access because it can be powered by solar panels. With built-in SuperColor technology, the camera can deliver full-color images at night without producing any light pollution. In addition, the camera supports one-click optimization, visualized and remote inspection, and self-cleaning, reducing on-site O&M.

The AIoT-Hub features a variety of sensor interfaces that enable the camera to connect to a variety of sensors to collect multi-dimensional data such as water level, air quality, temperature, and wind speed.
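A minimal sketch of the kind of fusion this enables is shown below: a camera detection event is stamped with the latest readings from attached sensors, so downstream systems receive one multi-dimensional record. The Sensor class, field names, and fixed values are illustrative assumptions, not a Huawei API.

import json
import time

class Sensor:
    """Stand-in for a driver behind an AIoT-Hub sensor interface."""
    def __init__(self, name, read_fn):
        self.name = name
        self.read_fn = read_fn
    def read(self):
        return self.read_fn()

# Fixed return values stand in for real sensor drivers.
sensors = [
    Sensor('water_level_m', lambda: 2.31),
    Sensor('air_quality_aqi', lambda: 47),
    Sensor('wind_speed_mps', lambda: 3.8),
]

def fuse(detection):
    # Stamp a camera detection event with current sensor readings.
    record = dict(detection)
    record['timestamp'] = time.time()
    record['sensors'] = {s.name: s.read() for s in sensors}
    return record

print(json.dumps(fuse({'event': 'water_level_alarm', 'confidence': 0.93}), indent=2))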

Huawei Eco-Cube Camera Models: M7341-10-I-RT | X7341-10-HMI | M7641-10-Z23-RT

Innovative design | No time or space restrictions | Unrestricted awareness

- All-scenario adaptation: no power grid | no wired connection | no light pollution | no onsite O&M
- Environment-blending: aesthetic, user-friendly design, allowing the camera to perfectly blend into the environment; simplified installation, enabling the camera to be installed in various locations and used for various scenarios
- Multi-dimensional awareness: AIoT-Hub for the primary camera; AI chip + SDC OS + HoloSens Store, supporting on-demand loading of intelligent algorithms
- Agile release: algorithm marketing


2. Huawei HoloSens Store (Only Available in China)

Huawei HoloSens Store integrates high-quality third-party algorithms and applications that run on Huawei's software-defined cameras (SDCs) and Intelligent Video Storage (IVS) platforms. In this store, customers from various industries can choose from a variety of reliable intelligent algorithms.

- Unified OS: provides standard, service-oriented APIs on SDCs and IVS platforms to run, manage, upgrade, and monitor third-party algorithms; a sketch of what such an interface could look like follows this list.
- One-stop development platform: provides comprehensive and efficient algorithm development and commissioning services for developers, based on the algorithm development and training capabilities of Huawei ModelArts and the remote connection capabilities of the Intelligent Vision Ecosystem Lab.
- HoloSens iClient: allows users to load and update third-party algorithms on Huawei intelligent vision products and manage algorithm licenses.
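To illustrate what a service-oriented algorithm-management API could look like, the sketch below uploads and activates an algorithm package on a camera over REST. The endpoints, authentication scheme, and response fields are hypothetical placeholders invented for this example; they are not the documented HoloSens interfaces.

import requests

CAMERA = 'https://192.0.2.10'                  # example address from the TEST-NET range
HEADERS = {'Authorization': 'Bearer <token>'}  # placeholder credential

def install_algorithm(package_path):
    # 1. Upload the algorithm package to the camera.
    with open(package_path, 'rb') as f:
        resp = requests.post(f'{CAMERA}/api/algorithms',
                             headers=HEADERS, files={'package': f})
    resp.raise_for_status()
    algo_id = resp.json()['id']
    # 2. Activate it; a microservice-style camera OS can swap algorithms
    #    in and out without interrupting the video service.
    requests.post(f'{CAMERA}/api/algorithms/{algo_id}/activate',
                  headers=HEADERS).raise_for_status()
    return algo_id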

For Users

- More Choices: numerous algorithms; for diversified industries; convenient order placement
- Fast Replacement: algorithm updates are pushed like app updates; one-stop algorithm management on the iClient; intelligent algorithm matching for new demands
- Safe to Use: select, high-quality algorithms; service continuity during algorithm loading; online verification and closed-loop services

For Developers

- Fast Rollout: one-stop development platform, enabling fast algorithm rollout at low costs; enablement plan, HUAWEI Developer Day, and HUAWEI Developer
- Automatic Testing: automatic testing laboratory, providing online testing services for hundreds of products in terms of application stability, compatibility, and performance optimization
- Broad Market Application: powerful marketing channels, helping partners demonstrate and promote algorithms so that they can quickly capitalize on their applications and algorithms

HoloSens Store and Its Support System

[Figure: Developers complete application development, algorithm model development, model training, and algorithm debugging online on the HoloSens one-stop development platform, release to the HoloSens Store, and load and manage algorithms through the HoloSens iClient, empowering diverse industries such as intelligent campus, intelligent transportation, intelligent finance, intelligent education, and intelligent energy. SDC and IVS are based on a unified OS, opening capabilities to build a future-proof algorithm and application ecosystem.]

Scan to access the HoloSens Store.

Tan Shenquan, Su Rui, Liang Jiani


05 Appendix

Abbreviations
Legal Statement
Product Portfolio


Abbreviations

AI: Artificial Intelligence
API: Application Programming Interface
ARM: Advanced RISC Machine
ARQ: Automatic Repeat Request
AVHD: Audiovisual HD Quality
BM3D: Block-Matching and 3D Filtering
CAGR: Compound Annual Growth Rate
CAPEX: Capital Expenditure
CDN: Content Distribution Network
CIE: International Commission on Illumination
CNN: Convolutional Neural Network
CPU: Central Processing Unit
DHT: Distributed Hash Table
DNN: Deep Neural Network
DVR: Digital Video Recorder
DVS: Digital Video Server
EC: Erasure Coding
eMBB: Enhanced Mobile Broadband
eSDK: Ecosystem Software Development Kit
FEC: Forward Error Correction
GMM: Gaussian Mixture Model
GPU: Graphics Processing Unit
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ISP: Image Signal Processing
ISV: Independent Software Vendor
ITU-T: International Telecommunication Union-Telecommunication Standardization Sector
IVS: Intelligent Video Storage
KNN: K-Nearest Neighbors
LED: Light Emitting Diode
Mbps: Megabit Per Second
mMTC: Massive Machine-Type Communications
NAT: Network Address Translation
NL-Means: Non-Local Means
NORM: No-Reference Metric
NPU: Neural Processing Unit
NVR: Network Video Recorder
P2P: Peer to Peer
PaaS: Platform as a Service
PAO: Process-Architecture-Optimization
PSNR: Peak Signal-to-Noise Ratio
QART: Quality Assessment for Recognition Tasks
QP: Quantization Parameter
RAID: Redundant Array of Independent Disks
ReID: Person Re-Identification
RFID: Radio Frequency Identification
RISC: Reduced Instruction Set Computing
ROI: Region of Interest
SaaS: Software as a Service
SDC: Software-Defined Camera
SDK: Software Development Kit
SLA: Service Level Agreement
TCO: Total Cost of Ownership
TCP: Transmission Control Protocol
TI: Threshold Increment
UDP: User Datagram Protocol
UGR: Unified Glare Rating
URLLC: Ultra-Reliable Low-Latency Communication
VCR: Video Cassette Recorder
ViBe: Visual Background Extractor


Legal Statement

About This Document

Huawei HoloSens Intelligent Vision Tech Express, jointly published by the Huawei Data Storage and Intelligent Vision Product Customer Experience Dept and the Huawei Intelligent Vision PDU, presents the latest industry insights, hot technical topics, and product and solution developments in intelligent vision.

Copyright Statement

The copyright of this Tech Express belongs to Huawei Technologies Co., Ltd. and is protected by law. No part of this Tech Express may be copied, translated, modified, or distributed by any individual or organization in any form or by any means without the prior written consent of Huawei Technologies Co., Ltd. Include "Source: Huawei Technologies Co., Ltd." when reproducing, distributing, or using any part of this Tech Express in any form or by any means. Huawei Technologies Co., Ltd. will investigate and pursue the legal liability of any individual or organization that violates the preceding statements.

Responsibility Declaration

To the maximum extent permitted by law, the content of this Tech Express is provided "as is". It does not represent the views of Huawei Technologies Co., Ltd., and does not serve as a warranty, guarantee, or representation of any kind, either expressed or implied, including but not limited to warranties of fitness for a particular purpose or of merchantability. Huawei Technologies Co., Ltd. does not guarantee the accuracy of the information provided in this Tech Express. The information in this document is subject to correction, modification, and change without notice. Huawei Technologies Co., Ltd. does not assume responsibility for any decision made, or negative consequence suffered, by any individual or organization based on the contents of this document.

Change History

Issue 1, December 2020


Huawei HoloSens Intelligent Vision Product Portfolio

Industry applications: Safe City | Intelligent Transportation | Intelligent Campus

Cloud: HoloSens IVS9000 (VCN, VCM, and Video Big Data)
Edge: HoloSens IVS3800: IVS3800S\IVS3800XS (storage), IVS3800F\IVS3800XF (storage, compute, and search), IVS3800C\IVS3800XC (compute); HoloSens IVS1800: IVS1800 C08-4T, IVS1800 C08-16T, IVS1800 C08-32T; VCN5X0: VCN510/520, VCN540
Device: HoloSens SDC: X series (Ultimate AI), M series (Professional AI), C series (Basic AI), D series (Inclusive AI); HoloSens Store

HoloSens SDC

Huawei HoloSens SDC continuously evolves based on professional AI chips, an open-ended SDC OS, and a future-proof algorithm and application ecosystem. Ultimate AI computing power gives the SDC self-learning capabilities; deepened software-hardware decoupling allows partners to develop more algorithms for intelligent applications; and the open-ended architecture eliminates the boundary between software and hardware, making the SDC an intelligent enabler for multi-dimensional data awareness.

Professional AI Chip
AI chips are key to adding true intelligence to SDCs. Huawei SDC, relying on advanced Neural Processing Units (NPUs) such as Ascend chips, continues to evolve from inference only to inference and training, enabling visual analysis and computing of trillions of records.

Open-ended SDC OS
A dedicated OS, the industry's first SDC OS launched by Huawei, runs on SDCs. The OS features a standard, unified software running environment where software is decoupled from hardware. It opens service-oriented interfaces for ecosystem building and makes the SDC truly software-defined. The HoloSens SDC OS adopts a lightweight microservice architecture featuring loose coupling, high performance, and high reliability, ensuring service continuity during algorithm upgrade and switchover.

Future-proof Ecosystem
Based on the open-ended OS, a complete ecosystem tool chain is available to implement standard connection, training, and rollout of algorithms, opening up the software architecture. Additionally, the SDC uses a modular design to collect multi-dimensional sensory information, opening up the hardware architecture. Huawei SDC supports on-demand loading of algorithms and hardware capabilities, turning common cameras into dedicated cameras within seconds and adding intelligence to a variety of industries.
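As an illustration of this software-hardware decoupling, the sketch below shows the kind of metadata an algorithm package might declare so that a camera OS can decide whether it can host the algorithm. The manifest schema is an assumption made for this example, not the actual HoloSens package format.

manifest = {
    'name': 'helmet-detection',            # illustrative long-tail algorithm
    'version': '1.2.0',
    'model_file': 'helmet_detector.om',
    'min_compute_tops': 4,                 # camera must offer at least 4 TOPS
    'input': {'width': 1920, 'height': 1080, 'format': 'NV12'},
    'outputs': ['bounding_boxes', 'alarm_events'],
}

def is_compatible(camera_tops):
    # A camera OS would run a check like this before loading the package.
    return camera_tops >= manifest['min_compute_tops']

print(is_compatible(8))   # True: an 8 TOPS camera can host this algorithm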

[Figure: AI computing power (TOPS, from 1 to 20) across the C, M, and X series, rising from basic AI to ultimate AI within a future-proof ecosystem.]

X (eXtra) Series: Ultimate AI
Camera categories: Person Data Structuring | Omni-Data Structuring | Vehicle Data Structuring | Integrated ITS | Security Situation Awareness

X1281-F: 4T 8MP Face Capture Box Camera
X2221-FL: 4T 2MP Face Capture Softlight Bullet Camera
X2221-CL: 4T 2MP Face Recognition Softlight Bullet Camera
X3221-C: 4T 2MP Ultra-Low-Light Face Recognition IR Fixed Dome Camera
X2241-FLI: 4T 4MP Ultra-Low-Light Face Capture Bullet Camera
X2222-CL: 4T 2MP Face Recognition Softlight Bullet Camera
X2241-CL: 4T 4MP Face Recognition Softlight Bullet Camera
X2221-10-FL: 4T 2MP Face Capture Softlight Bullet Camera
X2241-10-HLI: 4T 4MP SuperColor Multi-Algorithm-Concurrency Bullet Camera
X2241-HL: 4T 4MP Multi-Algorithm Concurrency Bullet Camera
X2281-HL: 4T 8MP Multi-Algorithm Concurrency Bullet Camera
X8341-10-HLI-PT: 5T Single-PTZ Compound-Eye Camera
X8341-10-HLI-PT2: 8T Dual-PTZ Compound-Eye Camera
X2221-VL: 4T 2MP Vehicle Recognition Softlight Bullet Camera
X2331-10-TL: 4T 3MP Low-Light ITS AI Bullet Camera
X2391-10-TL: 4T 9MP ITS AI Bullet Camera
X2391-20-T: 20T 9MP Low-Light ITS AI Bullet Camera
X2221-I: 4T 2MP Ultra-Low Light IR Bullet Camera
X3221-I: 4T 2MP Ultra-Low Light IR Dome Camera
X6981-Z20: 4T 4K 20x Ultra-Low Light IR PTZ Dome Camera
X6921-Z48: 4T 2MP Starlight Laser IR PTZ Dome Camera
X6721-Z37: 4T 2MP 37x Ultra-Low Light IR PTZ Dome Camera
X6721-GZ37: 4T 2MP 37x Ultra-Low Light IR PTZ Dome Camera
X6781-Z37: 4T 4K 37x Starlight IR PTZ Dome Camera
X7341-10-HMI: 4 TOPS 4MP AI Fixed Dome Eco-Cube Camera
X6621-Z30: 4T 2MP 30x Starlight PTZ Dome Camera
X1221-Fb: 4T 2MP Ultra-Low Light Box Camera

M (Magic) Series: Professional AI
Camera categories: Person Data Structuring | Omni-Data Structuring | Vehicle Data Structuring | Integrated ITS | Security Situation Awareness

M2140-EFI(6mm): 1T 4MP IR Face Capture Bullet Camera
M2120-EFI(6mm): 2MP Face Capture IR Bullet Camera
M2241-EFL: 2T 4MP Face Capture Bullet Camera
M2121-EFL(8-32mm): 1T 2MP Face Capture Bullet Camera
M2140-EFL(6mm): 1T 4MP Face Capture Bullet Camera
M2140-EFL(7-35mm): 1T 4MP Face Capture Bullet Camera
M2120-EFL(7-35mm): 1T 2MP Face Capture Bullet Camera
M2121-ECL(2.8-12mm): 1T 2MP Face Recognition Softlight Bullet Camera
M2120-EFL(2.8-12mm): 1T 2MP Face Capture Bullet Camera
M1221-Q: 2T 2MP Multi-Algorithm Box Camera
M1281-Q: 2T 8MP Multi-Algorithm Box Camera
M1241-Q: 2T 4MP Multi-Algorithm Box Camera
M2221-QL: 2T 2MP Multi-Algorithm Bullet Camera
M2221-QIn: 2T 2MP Ultra-Low Light Invisible IR Bullet Camera
M2121-10-EI-S(8-32mm): 1T 2MP Class-D Anti-corrosion Infrared Bullet Camera
M2241-QL: 2T 4MP Multi-Algorithm Bullet Camera
M3220-10-EI-Sf: 1T 2MP IR AI VF Dome Camera
M3250-10-EI-Sf: 1T 5MP IR AI VF Dome Camera
M6721-E-Z31: 1T 2MP Multi-Algorithm PTZ Dome Camera
M6741-E-Z37: 1T 4MP Multi-Algorithm PTZ Dome Camera
M6620-10-EZ33-S: 1T 2MP Class-D Anti-corrosion Infrared PTZ Dome Camera
M2120-EVL(7-35mm): 1T 2MP Vehicle Recognition Bullet Camera
M2121-EVL(2.8-12\8-32mm): 1T 2MP Vehicle Recognition Bullet Camera
M2140-EVL(7-35mm): 1T 4MP Vehicle Recognition Bullet Camera
M2141-EVL(2.8-12mm): 1T 4MP Vehicle Recognition Bullet Camera
M2121-EVL-Sf(2.8-12mm): 1T 2MP Vehicle Recognition Bullet Camera


M2331-T: 3MP Integrated ITS Camera
M2331-TG: 3MP Integrated ITS Camera
M2391-T: 9MP Integrated ITS Camera
M2391-TG: 9MP Integrated ITS Camera

M2120-10-EBIn: 1T 2MP Invisible IR AI Bullet Camera
M2140-EBI(3.6/6mm): 1T 4MP Behavior Analysis Bullet Camera
M2141-10-EGI: 1T 4MP IR AI Bullet Network Camera
M2121-10-EI: 1T 2MP IR AI Bullet Camera
M2120-10-EI: 1T 2MP IR AI Bullet Network Camera
M2150-10-EGI: 1T 5MP IR AI Bullet Network Camera
M2150-10-EI: 1T 5MP IR AI Bullet Camera
M3221-10-EI: 1T 2MP IR AI VF Dome Camera
M2120-10-EI(7-35mm): 1T 2MP IR AI Bullet Camera
M2121-10-EL(2.8-12mm): 1T 2MP AI Bullet Camera
M8544-EL-Z37: Smart Tracking System
M6621-10-EBIn-Z23: 2T 2MP Low-Light Invisible IR PTZ Dome Camera
M3220-10-EI: 1T 2MP IR AI VF Dome Network Camera
M3250-10-EI: 1T 5MP IR AI VF Dome Network Camera
M2121-10-EI(8-32mm): 1T 2MP IR AI Bullet Camera
M9341-10-Th(75mm): 2T Thermal & Optical Bi-spectrum Network Positioning System
M6781-10-GZ40-W5: 2T 8MP 5G AI PTZ Dome Camera
M6741-10-Z40-E2: 2T 4MP Multi-Lens AI PTZ Dome Camera
M7341-10-I-RT: 2 TOPS 4MP Microwave AI Eco-Cube Fixed Dome Camera

C (Credible) Series: Basic AI

C2120-10-FI(6-9mm): 1T 2MP Face Capture IR AI Bullet Camera
C2120-10-FI: 1T 2MP Face Capture IR AI Bullet Camera
C2120-10-CI(6mm): 1T 2MP Face Recognition Bullet Network Camera
C6650-10-Z33: 1T 5MP AI IR PTZ Dome Camera
C6620-10-Z23: 1T 2MP AI IR PTZ Dome Camera
C2150-10-I-PU(3.6\6mm): 1T 5MP IR AI Dome Camera
C2150-10-IU: 1T 5MP IR AI Dome Camera
C3220-10-I: 1T 2MP IR AI Dome Camera
C3221-10-I: 1T 2MP IR AI Dome Camera
C3250-10-I(U): 1T 5MP IR AI Dome Camera
C1220-10(-Fb): 1T 2MP AI Box Camera
C2120-10-LU(2.8-12mm): 2MP AI Softlight Bullet Camera
C2120-I: 2MP Starlight Infrared Bullet Camera
C2120-I-Sf: 2MP Starlight Infrared Bullet Camera
C2121-I: 2MP Super Starlight Infrared Bullet Camera
C2121-I-Sf: 2MP Starlight Infrared Bullet Camera
C2120-I-P(3.6/6mm): 2MP Starlight Infrared Bullet Camera
C2120-I(3.6/6mm): 2MP Starlight Infrared Bullet Camera
C2150-I(3.6/6mm): 5MP Starlight IR Bullet Camera
C2150-I-P(3.6/6mm): 5MP Starlight IR Bullet Camera
C2150-I: 5MP Starlight IR Bullet Camera
C2141-I: 4MP Super Starlight Infrared Bullet Camera
C6620-Z23(-sf): 2MP Starlight Infrared PTZ Dome Camera
C6620-10-Z23/Z33: 1T 2MP Starlight Infrared PTZ Dome Camera
C3050-I(2.8/3.6mm): 5MP Starlight Infrared Fixed Dome Camera
C3020-I(2.8/3.6mm): 2MP Starlight Infrared Fixed Dome Camera
C3020-EI-P(2.8/3.6/6mm): 2MP Starlight Infrared Fixed Dome Camera
C2120-EI(3.6/6mm): 2MP Infrared Bullet Camera
C2120-EI-P(3.6/6mm): 2MP Starlight IR Bullet Camera
C3220-10-IU: 1T 2MP Starlight IR Dome Camera
C3221-10-IU: 1T 2MP IR AI VF Dome Camera
C6650-10-Z33: 1T 5MP Starlight Infrared PTZ Dome Camera
C3220-10-I-PU(2.8/3.6/6mm): 1T 2MP IR AI Fixed Dome Network Camera
C3250-10-I-PU(2.8/3.6/6mm): 1T 5MP IR AI Fixed Dome Network Camera
C2120-10-I-PU(3.6/6mm): 1T 2MP IR AI Bullet Network Camera
C2120-10-I-P(3.6/6mm): 1T 2MP IR AI Bullet Network Camera
C2150-10-I-P(3.6/6mm): 1T 5MP IR AI Bullet Network Camera
C3220-10-I-P(2.8/3.6/6mm): 1T 2MP IR AI Fixed Dome Network Camera
C3220-10-I(2.8/3.6mm): 1T 2MP IR AI Fixed Dome Network Camera
C2120-10-L-P(3.6mm): 1T 2MP Softlight AI Bullet Network Camera
C3050-10-I-P(2.8mm/3.6mm/6mm): 1T 5MP IR AI Fixed Dome Network Camera
C3050-I-P(2.8mm/3.6mm/6mm): 5MP Starlight Infrared Fixed Dome Camera
C3050-I-P(2.8/3.6mm): 5MP Starlight Infrared Fixed Dome Camera

Center Platform: HoloSens IVS9000

Huawei HoloSens IVS9000 provides large-capacity, high-concurrency video access, storage, forwarding, analysis, and search capabilities, making it ideal for medium- and large-sized Safe City projects. As the intelligent center of the entire network, the IVS9000 processes complex, multi-dimensional, cross-domain services. It aggregates and shares video resources from provincial and city offices, assisting the command center in cross-city collaboration.

- VCN: provides functions such as real-time surveillance, forwarding, video recording, backup, security alarms, intelligent analysis, voice intercom, and voice broadcast.
- VCM: provides functions such as video analysis and data search.

Lite Edge: HoloSens IVS3800

The Huawei HoloSens IVS edge solution provides more efficient storage, analysis, and search capabilities. It supports medium- and small-sized service coverage, regional autonomy, and fast service deployment in city offices, district/county branches, and campuses.

HoloSens Intelligent Video Storage

Huawei Intelligent Video Storage (IVS) is based on a cloud architecture in which software is decoupled from hardware and data is decoupled from applications. It uses mission-critical technologies such as cloud computing, cloud storage, and big data to provide full-stack, all-cloud collaboration capabilities. Huawei HoloSens IVS can be used in Safe City projects and other scenarios requiring large-scale surveillance. The solution uses distributed cloud computing, high-performance big data, and intelligent analysis technologies to provide a high-density, elastically scaling resource pool. Huawei HoloSens IVS uses the algorithm repository service to integrate third-party face-, vehicle-, and person-related algorithms, as well as reverse image search algorithms. Cooperation between software and hardware accelerates analysis, enabling searches across hundreds of billions of data records within seconds. In addition, multiple platforms can collaborate to provide more efficient services. HoloSens IVS uses intelligent insight to help create safer cities.
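The principle behind second-level search over enormous record sets can be illustrated at toy scale: faces or vehicles are reduced to normalized feature vectors, and a query becomes a single matrix-vector product followed by a top-k selection. Production systems shard the gallery across nodes and use approximate nearest-neighbor indexes; the sizes and random data below are stand-ins chosen for this sketch.

import numpy as np

rng = np.random.default_rng(0)
# 100,000 stored 256-dimensional feature vectors stand in for a real gallery.
gallery = rng.normal(size=(100_000, 256)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)  # L2-normalize once

def top_k(query, k=5):
    # Cosine similarity against every record via one matrix-vector product.
    q = query / np.linalg.norm(query)
    scores = gallery @ q
    return np.argsort(scores)[::-1][:k]   # indices of the k best matches

print(top_k(rng.normal(size=256).astype(np.float32)))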

[Figure: HoloSens IVS architecture. Service platforms: VCN and VCM run on a cloud computing management platform, and Video Big Data runs on a big data support platform, together providing access, storage, forwarding, transcoding, face/person/vehicle analysis, video structuring, search, and multi-algorithm scheduling over identity, pedestrian, vehicle, and case libraries, with basic components such as a distributed database. Infrastructure: general-purpose CPUs, GPUs, and NPUs; cloud storage and cloud network resource pools spanning storage, compute, and network devices; managed with HCS and Docker.]

All-Cloud Synergy: on-demand combinations of storage, compute, and search capabilities
Hardcore Innovation: in-house innovation; 800-channel access (2x the industry average); 768-channel forwarding (3x the industry average); 384-channel computing (4x the industry average)
Data Intelligence: multi-algorithm rollout in one week; app rollout in one week; N:N data clustering among millions of records

IVS3800S\IVS3800XS (Storage): 64-bit multi-core high-performance processor
IVS3800F\IVS3800XF (Storage + Compute + Search): 64-bit multi-core high-performance processor; AI-accelerated processing unit
IVS3800C\IVS3800XC (Compute): 64-bit multi-core high-performance processor; AI-accelerated processing unit

※ Some of the preceding specifications are supported after software upgrade.


Micro Edge: HoloSens IVS1800

An intelligent micro edge platform that integrates storage, compute, and search.
- Models: IVS1800-C08-4T | IVS1800-C08-16T | IVS1800-C08-32T (16-/32-/64-channel, 8 disks)
- Multi-algorithm concurrency | 16-channel video analysis | 64-channel image analysis
- 2 U embedded server | AI-accelerated processor

Video Content Node (VCN)

Huawei VCN is applicable to small-sized campuses, communities, and intelligent power distribution rooms. It supports a wide assortment of functions such as live video surveillance, video forwarding, video search, video playback, PTZ control, local video viewing, and alarm linkage.
- VCN510-8: 8 channels
- VCN510-8P: 8 channels, PoE power supply
- VCN510-16: 16 channels
- VCN510-16P: 16 channels, PoE power supply
- VCN520-32: 32 channels
- VCN540-64: 64 channels

Secure: N+0 cluster; SafeVideo+
Open: supports 300+ brands of cameras; supports connections to surveillance platforms of 50+ brands; allows the eSDK to integrate partners' video surveillance capabilities.


About Huawei HoloSens Intelligent Vision

Intelligent vision serves as the eyes of the intelligent world, the core enabler of worldwide sensory interconnectivity, and a key enabler for digital transformation of industries. Huawei intelligent vision refers to the technologies and methods revolving around the use of non-contact optical sensors to automatically receive and perform intelligent analysis on large amounts of image data, so as to obtain desired information and control machines or processes. An intelligent vision system, covering image collection and perception, data processing and analysis, and decision making and execution, generally consists of multiple units such as algorithms, software, and hardware. Huawei intelligent vision extends from industrial to non-industrial sectors and is widely applied in various industries, such as intelligent video surveillance, autonomous driving, robotics, and consumer electronics.

Huawei Intelligent Vision, with Huawei HoloSens as its brand name, serves as an entrance to the intelligent world based on multi-dimensional awareness and data intelligence. Huawei HoloSens integrates technical advantages in connectivity, computing, cloud technology, and devices to provide competitive multi-spectral intelligent awareness devices; delivers optimal intelligent video storage solutions for edge scenarios where massive amounts of video data are generated; provides powerful device-edge-cloud synergy solutions and business models with cloud services at the core; and builds an open, competitive, and operable intelligent vision ecosystem.

Huawei HoloSens Intelligent Vision provides software-defined cameras (HoloSens SDCs), intelligent video storage (HoloSens IVS), and a one-stop intelligent video algorithm store (HoloSens Store) for sectors such as transportation, campus, education, and finance. Additionally, Huawei joins hands with partners such as algorithm, application, and hardware vendors to embed intelligence into all industries.

From video surveillance to intelligent vision, Huawei takes the lead in research and technology development by leveraging its core technical advantages. In the future, Huawei will explore vehicle-mounted and industrial vision products to embrace larger markets. Huawei remains committed to providing the most competitive multi-dimensional awareness and device-edge-cloud synergy solutions in order to become a pioneer in the intelligent vision industry.

HUAWEI TECHNOLOGIES CO., LTD. Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, P. R. China Tel: +86-755-28780808 www.huawei.com

Disclaimer

This document may contain predictive information, including but not limited to information about future finance, operations, product series, and new technologies. Due to uncertain factors, actual results may differ greatly from the predictive information. Therefore, the information in this document is for reference only and does not constitute any offer or commitment. Huawei is not liable for any action you take based on this document. Huawei may change the information at any time without notice.

Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved.

Trademarks and Permissions: HUAWEI and other Huawei trademarks are trademarks or trade names of Huawei Technologies Co., Ltd. All other trademarks, product names, service names, and company names mentioned in this document are the property of their respective holders.

No part of this document may be reproduced or transmitted in any form or by any means without the prior written consent of Huawei Technologies Co., Ltd.