CONTENTS

Preface
Embrace the Intelligent Vision, Build an Intelligent World

01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog

02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
SuperColor Technology
Storage EC Technology
Multi-Lens Synergy Technology
Video Codec Technology
Chip Evolution and Development
Algorithm Repository Technology
Products and Solutions Catalog

03 Cloud Service
Discussion on Video Cloud Service Trends
P2P Technology
Products and Solutions Catalog

04 Ecosystem
Discussion on Intelligent Vision Ecosystem Trends
Products and Solutions Catalog

05 Appendix
Product Portfolio
Abbreviations
Legal Statement
Preface

Embrace the Intelligent Vision, Build an Intelligent World
— President of Huawei Intelligent Vision Domain
In the past 120 years, three industrial revolutions have made breakthroughs in fields such as electricity and information technologies, dramatically improving productivity and our daily life. Today, the fourth industrial revolution, driven by AI and ICT technologies, ushers in an intelligent era where all things are sensing, interconnected, and intelligent. Vision, the core of biological evolution, will serve as a significant enabler in this era. The combination of AI and vision systems will enable machines to perceive information and respond intelligently, which revolutionizes people's work and everyday life, and improves productivity and security.
Today, we are delighted to see that new ICT technologies, such as 5G, AI, and machine vision are being put into commercial use, and playing a significant role in the video surveillance industry. 2020 marks the first year of 5G commercialization as well as a turning point of AI development. Additionally, machine vision now surpasses human vision to obtain more information in specific scenarios. The three technologies are interwoven with each other, fueling the development of intelligent vision.
Intelligent vision serves as the eyes of the intelligent world, the core of worldwide sensory connections, and a key enabler for the digital transformation of industries. Huawei Intelligent Vision looks forward to working together with our partners across industries to drive industry development and the intelligent transformation of cities, production, and people's lives with the power of technology, to build an intelligent world where all things can sense.
Huawei remains steady in its commitment to embed 5G technologies into intelligent vision, which opens up opportunities by providing high bandwidth, low latency, and broad connection capabilities.
Huawei is developing intelligent cameras in the same way it develops smartphones, revolutionizing the technical architecture, ecosystem, and industry chain. Huawei embeds an innovative operating system (OS) into software-defined cameras (SDCs) to enable remote loading of intelligent algorithms anytime, anywhere. The HoloSens Store allows users to download and install algorithms on cameras depending on their needs.
Huawei adheres to the "platform + ecosystem" strategy to build a future-proof intelligent vision ecosystem and empower more industries. Huawei is committed to providing platforms and opening algorithms and applications to benefit vendors and customers across industries.
Huawei develops cloud-edge-device synergy to maximize data value. Huawei will give full play to the technical advantages of the device-edge-cloud industry chain, develop devices based on cloud technologies, and empower the cloud through interconnection with various devices, thereby advancing the digital transformation of all industries.
01 5G
Discussion on the Impact of 5G on Intelligent Vision
5G-enabled Image Encoding and Transmission Technologies
Products and Solutions Catalog
Discussion on the Impact of 5G on Intelligent Vision
Niu Liyang, Liu Zhen

1. 5G Development

New 5G infrastructure is driving the expansion of the global digital economy, and each country's information capability is represented by the state of its 5G networks. 5G is revolutionizing the whole industry chain, from electronic devices to base station equipment to mobile phones. Major economies around the world are therefore accelerating their application of 5G and actively exploring upstream and downstream industries to seize the strategic high ground. According to TeleGeography, a prominent telecommunications market research company, the number of commercial 5G networks worldwide had reached 82 by June 2020 and was expected to double by the end of 2020.

2. Features of 5G Networks

With their high bandwidth, low latency, and massive connectivity, 5G networks contribute to the building of a fully connected world. They have three major application categories: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Users can select the 5G devices they require according to different scenarios, and developers can select development scenarios based on the types of applications they want to create.

[Figure: 5G application scenarios. eMBB: fast transmission at Gbit/s, 3D video and UHD video, cloud-based office/gaming, augmented reality (AR). URLLC: industrial automation, self-driving cars, high-reliability applications such as mobile healthcare. mMTC: smart home, smart city, voice intercom, intelligent video surveillance. Source: International Telecommunication Union (ITU), partly updated.]

Comparison between 5G and 4G

                            4G            5G
    Latency                 10 ms         1 ms
    Downlink service rate   10 Mbit/s     2 Gbit/s
    Uplink service rate     1 Mbit/s      200 Mbit/s

3. Impact of 5G on Intelligent Vision

Extending the breadth of intelligent vision

In the 4G era, video services were limited to the consumer field because of the low bandwidth and high latency of 4G networks. Compared with 4G, 5G improves the service rate by about 100-fold and reduces latency by about 10-fold, enriching video application scenarios: from remote areas with complex terrain, to mines, factories, and harbors where cabling is difficult, to venues requiring security for major events.
5G increases the peak transmission rate limit, laying a solid foundation for the internet of everything. It will play an important role in communications among machines and drive innovation across a range of emerging industries. Because of its high mobility and low power consumption, 5G is capable of supporting a wide array of frontend devices, such as vehicle-mounted devices, drones, wearables, and industrial robots, which will serve as significant carriers for video awareness. It is estimated that by 2023, the number of connected short-distance Internet of Things (IoT) terminals will reach 15.7 billion. In addition, the 5G network can be sliced into multiple subnets to meet the differing requirements of terminals in terms of latency, bandwidth, number of connections, and security. This will further enrich the application scenarios of 5G.
[Figure: A 5G camera installed atop Mount Qomolangma, near Rongbuk Monastery, and a video image from the 5G camera.]
Typical application case

Optical fibers deployed at harbors are prone to corrosion, and those on gantry cranes can easily become entangled during operations. To solve this problem, HD cameras are connected to 5G networks to monitor gantry cranes, so that operators can remotely check lifting and hoisting operations in real time and promptly identify anomalies. In addition, powered by 5G and artificial intelligence (AI), most container hoisting operations can be completed by machines, greatly improving efficiency. When 5G is applied in a harbor, the transfer efficiency of the harbor is doubled, and the deployment and maintenance costs of optical fibers are reduced by about CNY100,000 each year. Additionally, operators no longer need to work at heights, which improves their work efficiency and ensures their safety.

[Figure: Diverse 5G terminals — drones, wearables, industrial robots, and vehicle-mounted devices — become enablers of intelligent vision, while network slicing enriches 5G application scenarios: the 5G network is sliced into harbor, bus, and emergency assurance private networks.]
5G ushers in the AI era

With its low latency, 5G serves as the supporting system for AI. During the Industrial Revolutions, people increased their productivity by mastering mechanical energy. At present, we are experiencing an AI revolution, in which people are improving the intelligent capabilities of machines by harnessing computing power. As the cost of computing power drops, the cloud, edges, and devices are coming to possess ample computing power, which they can use to perform video-based analysis using intelligent algorithms, and generate massive amounts of valuable data. This data can only be fully utilized when it is quickly transferred among the cloud, edges, and devices.

5G is revolutionizing the way we think about AI. AI is now deeply rooted in the video surveillance industry, which in turn poses increasingly high requirements on video and image quality. 4K video encoded in the H.265 format requires an average transmission bandwidth of 10 Mbit/s to 20 Mbit/s. However, when intelligent services are enabled, the immediate peak transmission rate will soar to over 100 Mbit/s, far higher than that provided by 4G networks. Once they are connected to 5G networks, cameras can utilize the high bandwidth to quickly deliver detailed, high-quality video images, thereby improving intelligent analysis performance.
[Figure: A 720p camera on a 4G network (bandwidth: 1 Mbit/s) vs. a 4K camera on a 5G network (bandwidth: 200 Mbit/s) — low-definition 720p video cannot be used for intelligent services, while high-quality 4K video meets their requirements.]

[Figure: 5G networks enable full HD camera coverage of a harbor. Optical fibers on existing gantry cranes easily become entangled, and cabling is subject to sea tide impact; 50 gantry cranes are each fitted with 10 to 18 cameras, with 18 HD cameras required for precise control. On-site operation is replaced by remote operation: operators in the central control room use remote detection and remote control joysticks to operate two or three gantry cranes at the same time.]
The high bandwidth and low latency of 5G enable wireless video transmission, extending the boundary of intelligent vision applications. When powered by 5G, cameras can connect to massive numbers of sensors to implement multi-dimensional awareness. Additionally, as 5G develops, it is enabling the creation of various innovative devices, fueling the digital transformation of all industries.

Intelligent capabilities are like electric power. Electric power possesses great potential, but it cannot be directly applied in industries unless a power transmission network is built. 5G, in essence, serves as the transmission network for computing power and intelligent data. It enables the full implementation of intelligent capabilities and, by doing so, promotes the intelligent transformation of industries and people's everyday life.

[Figure: Intelligent data transmission on devices, edges, and the cloud — AI-enabled devices connect over 5G to edge nodes and the cloud AI.]

Typical application case

Major economies around the globe are seeking to digitally transform their manufacturing sectors. Aircraft manufacturing is the most valuable sector of the manufacturing industry. Aircraft manufacturers adopt 5G and AI technologies for quality assurance, reducing the time required for carbon fiber stitching gap checks from 40 minutes to 2 minutes. In addition, 5G cameras provide a wide range of intelligent applications in factories, including safety helmet detection, workwear detection, and perimeter intrusion detection.

4. Application Bottlenecks of 5G in Intelligent Vision

Every technology encounters difficulties when it is being applied, and 5G is no exception in intelligent vision. The 5G uplink and downlink bandwidths are unbalanced, and the total 5G uplink bandwidth of a single base station is limited to around 300 Mbit/s. Most of the time, cameras upload P-frames, which contain only the changes from the previous frame, but they also periodically upload I-frames, which contain all image information. As a result, bandwidth usage can fluctuate dramatically: the instantaneous transmission rate of a single 4K camera can reach 60 Mbit/s. If five 4K cameras are connected to a single 5G base station, the uplink bandwidth of the base station will be insufficient for video transmission during peak periods. Therefore, video encoding needs to be optimized so cameras can adapt to the limited uplink bandwidth of 5G networks. In addition, packet loss and bit errors during wireless transmission may cause image quality issues such as artifacts and video stuttering, which call for more reliable transmission modes.

A 5G network uses short wavelengths for transmission, which results in fast signal attenuation: network bandwidth decreases rapidly as the distance increases, so the number of cameras that can be connected to a single 5G base station is limited. In addition, carriers tend to build 5G base stations based on their actual requirements in terms of construction costs and benefits, and 5G coverage will remain limited in the short term. It is therefore important to use 5G base station resources efficiently and improve the coverage and access capability of each base station.

[Figure: Bandwidth attenuation of a 5G base station with a built-in 5G module — about 210 Mbit/s at 100 m, 140 Mbit/s at 200 m, 90 Mbit/s at 300 m, and 60 Mbit/s at 400 m. Limited uplink bandwidth, frequent packet loss and bit errors during wireless transmission, and the resulting artifacts and video stuttering call for more efficient encoding and more reliable transmission.]

To solve these problems, 5G cameras should not simply be combinations of cameras and 5G modules. Instead, they should provide efficient video/image encoding capabilities to reduce the bandwidth required for transmission. Additionally, reliable transmission technologies are needed to prevent the packet loss and bit errors that occur during wireless transmission. In this way, 5G base station resources can be utilized properly.
5G-enabled Image Encoding and Transmission Technologies
Chen Yun, Liu Zhen
5G expands the scope of intelligent vision and embeds artificial intelligence (AI) into a wide range of industries. However, due to the limitations of 5G New Radio (NR), wireless 5G networks offer limited uplink bandwidth and place high demands on network stability. Technical innovations have sought to overcome these challenges and enable the use of 5G in intelligent vision applications.
1. Challenges to Video and Image Transmission on 5G Networks

Video and image transmission requires high uplink bandwidth and stable wireless networks

5G networks adopt a time-division transmission mode: under typical configurations, 80% of the time is spent transmitting downlink data and 20% transmitting uplink data. Generally, the uplink bandwidth of a single 5G base station accounts for only about 20% of the total bandwidth, reaching roughly 300 Mbit/s. However, in the intelligent vision industry, video and image transmission requires far higher uplink bandwidth than 5G networks provide.

[Figure: Wired transmission vs. typical wireless time-division transmission. A wired link works in full-duplex mode and can receive and send data packets at any time. On a 5G link, time segments are divided between downlink (D), uplink (U), and configurable (S) use; in a 4:1 subframe configuration (D D D S U) or an 8:2 configuration (D D D D D D D S U U), uplink transmission occupies only about 20% of the total time, and uplink data packets can be sent only during those segments.]

In addition, during video and image transmission, an I-frame containing the full image information is sent first, after which P-frames containing only the changes from previous frames are sent, followed periodically by another I-frame. I-frames are much larger than P-frames, so image data occupies network bandwidth unevenly within a 10 ms time window: sending P-frames requires little bandwidth, while sending I-frames requires a great deal. For example, the average bit rate of a 4K video stream is 12 Mbit/s to 20 Mbit/s, but the peak bit rate during I-frame transmission can reach 60 Mbit/s. This phenomenon, known as an I-frame burst, places great strain on the data transmission time window of 5G networks.

[Figure: Bandwidth usage in a 10 ms time window, with each column indicating the size of a frame — periodic large I-frames tower over the small P-frames sent between them.]
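The strain an I-frame burst places on the uplink can be made concrete with a toy model. All parameters below are illustrative assumptions chosen to be consistent with the figures quoted above (25 fps, 12-20 Mbit/s average, ~60 Mbit/s peak); they are not measured camera data.

```python
# Toy model of an I-frame burst (illustrative assumptions, not measured
# camera data): a 25 fps stream with a GOP length of 25 sends one I-frame
# per second. With a 16 Mbit/s average bit rate and I-frames assumed to be
# 4x the size of P-frames, the instantaneous rate while the I-frame is on
# the wire far exceeds the average rate.

FPS = 25          # frames per second
GOP = 25          # frames per group of pictures: one I-frame per second
AVG_RATE = 16e6   # average bit rate (bit/s), within the quoted 12-20 Mbit/s
I_TO_P = 4        # assumed I-frame / P-frame size ratio

def frame_sizes(avg_rate=AVG_RATE, gop=GOP, fps=FPS, ratio=I_TO_P):
    """Split one GOP's bit budget into 1 I-frame and (gop - 1) P-frames."""
    bits_per_gop = avg_rate * gop / fps
    p_bits = bits_per_gop / (ratio + gop - 1)
    return ratio * p_bits, p_bits  # (I-frame bits, P-frame bits)

i_bits, p_bits = frame_sizes()
peak_rate = i_bits * FPS  # the I-frame must fit into one 40 ms frame slot
print(f"I-frame: {i_bits/1e6:.2f} Mbit, P-frame: {p_bits/1e6:.2f} Mbit")
print(f"average rate {AVG_RATE/1e6:.0f} Mbit/s, peak rate {peak_rate/1e6:.0f} Mbit/s")
```

Under these assumptions the instantaneous rate during the I-frame lands near the 60 Mbit/s peak quoted above, several times the average rate — which is exactly why a handful of cameras can exhaust a 300 Mbit/s uplink during bursts.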
Efficiently utilizing 5G base station resources to promote the large-scale commercial use of 5G in intelligent vision
In actual applications, a 5G base station always connects to multiple cameras at the same time. In this case, I-frame bursts may occur simultaneously for multiple cameras, resulting in I-frame collision, further intensifying the pressure on 5G NR bandwidth. According to tests, the probability of I-frame collision is close to 100% when over 7 cameras using traditional encoding algorithms are connected to a single 5G base station.
Furthermore, 5G networks are challenged by unstable transmission. Compared with wired network transmission, 5G wireless network transmission is subject to packet loss and bit errors, especially during network congestion. This results in video quality issues, such as image delays, artifacts, and video stuttering, which in turn affect backend intelligent applications.
In addition to limited uplink bandwidth and network transmission reliability, 5G signals attenuate quickly, which restricts the coverage of a single base station and also affects the commercial use of 5G in intelligent vision. 5G transmission is mainly conducted on the millimeter wave and sub-6 GHz (centimeter-level wavelength) bands. These two bands feature short wavelengths, resulting in limited transmission range, poor penetration and diffraction performance, and faster network attenuation. Therefore, the coverage of a single 5G base station is far smaller than that of a 4G base station. In addition, unlike 4G base stations, which cover almost all areas, carriers build 5G base stations based on actual project requirements, with construction costs and benefits taken into consideration. Efficiently utilizing 5G base station resources is therefore essential to improving the coverage and access capabilities of a single base station, and to achieving the large-scale commercial use of 5G in intelligent vision.
[Figure: Probability that the I-frames of cameras sharing a base station do not collide, as a function of the number of cameras (1 to 13) for 25 fps streams with GOP lengths of 25, 30, and 60 — the no-collision probability falls rapidly as cameras are added. Scattering the data packets of three cameras within a 5-second window prevents I-frame collision.]

[Figure: Total uplink bandwidth of an outdoor macrocell 5G base station decreases as the coverage radius increases — about 210 Mbit/s at 100 m, 140 Mbit/s at 200 m, 90 Mbit/s at 300 m, and 60 Mbit/s at 400 m. About 40% of the coverage area supports 6-8 access channels, while 60% supports only 2-3.]
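I-frame collision is essentially the birthday problem. The Monte Carlo sketch below uses a deliberately simplified model (each camera's I-frame occupies exactly one of 25 fixed 40 ms slots per second, chosen by its start phase); it illustrates why the collision probability climbs so quickly with camera count, but it is not the test setup behind the figures quoted above.

```python
import random

# Simplified Monte Carlo model of I-frame collision: each camera streams at
# 25 fps with a GOP of 25, so its I-frame lands in one of 25 fixed 40 ms
# slots per second, determined by the camera's start phase. Two cameras
# "collide" when their I-frames occupy the same slot.

def collision_probability(n_cameras, slots=25, trials=20_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        phases = [rng.randrange(slots) for _ in range(n_cameras)]
        if len(set(phases)) < n_cameras:  # some slot holds two I-frames
            hits += 1
    return hits / trials

# Birthday-problem effect: the probability rises quickly with the number
# of cameras sharing one base station.
for n in (2, 5, 7, 10, 13):
    print(n, round(collision_probability(n), 3))
```

Real I-frames span several slots and GOPs drift, so measured collision rates are higher than this toy model's; the qualitative trend, however, is the same.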
2. Key Technologies

The biggest challenge for large-scale commercial use of 5G in intelligent vision is efficiently utilizing 5G uplink bandwidth and preventing packet loss and bit errors. As a remedy, the industry at large has sought to optimize image encoding and transmission.

Image encoding optimization is designed to eliminate I-frame bursts and reduce the bandwidth required for video and image transmission. Region of interest (ROI)-based encoding technology is used to compress image backgrounds, which reduces the overall bandwidth required. In addition, stream smoothing technology is adopted to optimize I-frames, thereby reducing the peak bandwidth required and preventing network congestion.

Image encoding optimization

ROI-based encoding technology, reducing the average bandwidth required for video transmission

In the intelligent vision industry, the bandwidth required for video transmission has soared as image resolution has continually increased. On top of that, high-quality person and vehicle images are captured and transmitted for intelligent analysis, which requires even higher bandwidth than video transmission. However, in real-world applications, people tend to focus only on key information in video and images, such as pedestrians and vehicles, and have little need for high-definition image backgrounds. ROI-based encoding technology was developed with this understanding in mind. It automatically distinguishes the image foreground from the background, ensuring high resolution in the ROI while compressing the background, which reduces the overall bandwidth required for transmission. This technology reduces the size of video streams and snapshots, cutting the average bit rate by a remarkable 30% in complex scenarios and 60% in simple scenarios.
[Figure: ROI-based video encoding vs. the traditional encoding method. AI algorithms process the original video/image streams to separate foreground from background; the encoder applies normal encoding to the foreground to ensure high image quality and compressed encoding to the background to reduce the bit rate.]

[Figure: Average bit rate of 1080p video, standard H.265 encoding vs. ROI-based encoding — reduced by 30% in complex scenarios, 50% in common scenarios, and 60% in simple scenarios.]
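The savings arithmetic behind ROI-based encoding can be sketched with a two-term model: foreground bits are kept at full quality while background bits are divided by a compression factor. The parameters below are illustrative assumptions, not Huawei's encoder model; the 30%/50%/60% figures above are scenario-level averages, not outputs of this formula.

```python
# Back-of-the-envelope sketch of ROI-based encoding savings (assumed
# parameters for illustration only).

def roi_bitrate(full_rate, fg_fraction, bg_compression):
    """Bit rate after ROI encoding.

    full_rate      -- bit rate with uniform (non-ROI) encoding
    fg_fraction    -- share of the frame's bits spent on the foreground
    bg_compression -- factor by which background bits are reduced
    """
    foreground = full_rate * fg_fraction                    # full quality
    background = full_rate * (1 - fg_fraction) / bg_compression
    return foreground + background

full = 4.0  # Mbit/s, e.g. a 1080p stream
# Complex scene: lots of moving foreground, so the saving is smaller.
complex_scene = roi_bitrate(full, fg_fraction=0.6, bg_compression=5)
# Simple scene: mostly static background, so the saving is larger.
simple_scene = roi_bitrate(full, fg_fraction=0.2, bg_compression=5)
print(f"complex: {complex_scene:.2f} Mbit/s ({1 - complex_scene/full:.0%} saved)")
print(f"simple:  {simple_scene:.2f} Mbit/s ({1 - simple_scene/full:.0%} saved)")
```

With these assumed parameters the model lands near the quoted range (about 32% saved in the complex scene and 64% in the simple one), showing why mostly-static scenes benefit most.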
I-frame optimization, reducing peak bandwidth required for transmission

The peak bit rate during I-frame bursts is extremely high, which can lead to network congestion. To address this, the industry has adopted stream smoothing technology to adjust encoder parameters and control the size and frequency of I-frames, reducing the peak bandwidth required for video transmission during I-frame bursts.

[Figure: File size over time, before and after I-frame optimization — the peak bit rate of I-frames is reduced by 40% after stream smoothing, reducing the network congestion caused by I-frame bursts.]

Transmission optimization

Transmission optimization technology mainly focuses on intelligent flow control and network transmission reliability. Intelligent flow control detects the network transmission status in real time and adjusts data packet sending parameters accordingly, improving overall network bandwidth usage. Network transmission reliability can be enhanced via automatic repeat request (ARQ) and forward error correction (FEC) technologies, which help prevent packet loss and bit errors.

Intelligent flow control

In wireless transmission, if data is continuously sent while the network is congested, transmission capability deteriorates sharply. Intelligent flow control technology uses flow control units to detect the length of data queues in real time and adjust the data packet sending parameters accordingly. This allows more data to be sent during off-peak periods and prevents data stacking during peak periods, optimizing network bandwidth usage.

[Figure: No flow control vs. intelligent flow control. Without flow control, the encoder sends data directly to the channel; packets are prone to loss and network congestion, causing video delay and stuttering at the receiver. With intelligent flow control, a flow control unit monitors the network status in real time and adjusts the encoder and packet sending parameters based on the length of the data queues, preventing data stacking and delivering smooth, clear video.]
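The queue-based pacing idea described above can be illustrated with a leaky-bucket-style sketch. This is an assumption of how such a flow control unit could behave, not Huawei's implementation: packets wait in a local send queue and are released at no more than the detected channel rate per tick, so a burst is spread across ticks instead of being dumped onto a congested channel.

```python
from collections import deque

# Illustrative leaky-bucket-style pacing (a sketch of the idea, not a real
# flow control unit): the sender never exceeds the channel rate per tick;
# excess packets wait in the queue and drain during quieter ticks.

def pace(arrivals, channel_rate=10):
    """arrivals: packets arriving each tick; returns packets sent each tick."""
    queue = deque()
    sent = []
    for burst in arrivals:
        queue.extend([None] * burst)       # enqueue this tick's packets
        n = min(channel_rate, len(queue))  # never exceed the channel rate
        for _ in range(n):
            queue.popleft()
        sent.append(n)
    # Drain whatever is still queued after the last arrival tick.
    while queue:
        n = min(channel_rate, len(queue))
        for _ in range(n):
            queue.popleft()
        sent.append(n)
    return sent

# A 60-packet I-frame burst followed by 5-packet P-frames: the on-wire
# peak drops from 60 packets in one tick to the channel rate.
profile = pace([60, 5, 5, 5, 5, 5], channel_rate=10)
print(profile)
```

A production flow controller would additionally feed the measured queue depth back to the encoder (for example, to trigger stream smoothing) rather than letting the queue grow without bound.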
Enhanced transmission reliability to prevent packet loss and bit errors

Video transmission over the Transmission Control Protocol (TCP) is inefficient, particularly when packet loss occurs on wireless networks. On 5G networks, video and images are therefore transmitted over the User Datagram Protocol (UDP), with reliability added in one of two ways: an acknowledgment and retransmission mechanism based on ARQ, or FEC. ARQ adds verification and retransmission on top of conventional UDP-based transmission: if the receiver detects that a transmitted data packet is incorrect, it requests that the transmitter retransmit the packet. FEC reserves verification and error correction bits during data transmission: when the receiver detects an error in the data, it uses the error correction bits to perform exclusive or (XOR) operations that restore the data. These transmission optimization technologies can ensure smooth video transmission even when the packet loss rate approaches 10%. However, the reliability mechanisms must be deployed on both the peripheral units (PUs) and the backend platforms.

[Figure: ARQ vs. FEC. With ARQ, the receiver detects an incorrect packet and asks the sender to retransmit it. With FEC, the sent data C1 is derived from the original data B via a redundancy coding matrix A; a packet such as D2 that is lost in transmission can be restored from the received data C2 and the redundancy coding matrix.]

3. Camera Bit Rate and Base Station Coverage After Optimization

These innovations have helped facilitate the commercial use of 5G in intelligent vision. More specifically, ROI-based encoding and I-frame optimization reduce the average and peak bit rates at the encoding end, so that 5G uplink bandwidth is utilized more efficiently. Intelligent flow control and transmission reliability technologies enable cameras to actively monitor their data sending queues, which helps prevent network congestion and improves 5G bandwidth usage. Together, these advancements in encoding and transmission allow a single 5G base station to connect to more cameras and increase its coverage range.

[Figure: Peak bandwidth required for video transmission before and after optimization (unit: Mbit/s; uplink bandwidth: 300 Mbit/s) — 1080p video reduced from 20 to 6, and 4K video from 60 to 15 — and the number of cameras that can be connected to a single 5G base station within 400 m after optimization: 83 1080p cameras or 41 4K cameras.]
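The XOR principle behind FEC can be shown with a minimal single-parity sketch. The deployed scheme described above uses a redundancy coding matrix; this toy version (all names hypothetical) sends one XOR parity packet per group and can restore exactly one lost packet without any retransmission.

```python
from functools import reduce

# Minimal single-parity FEC sketch: one XOR parity packet per group of
# equal-length data packets; any single lost packet equals the XOR of all
# the surviving packets (because x ^ x = 0).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Append the XOR of all equal-length packets as a parity packet."""
    return packets + [reduce(xor_bytes, packets)]

def recover(received):
    """received: packet group with exactly one lost packet marked as None."""
    lost_index = received.index(None)
    present = [p for p in received if p is not None]
    restored = reduce(xor_bytes, present)  # XOR of the rest = missing packet
    out = list(received)
    out[lost_index] = restored
    return out[:-1]  # drop the parity packet, return the data packets

data = [b"D1__", b"D2__", b"D3__", b"D4__"]
group = add_parity(data)
group[1] = None                 # D2 is lost in transmission
print(recover(group))           # D2 restored from the surviving packets
```

Unlike ARQ, no round trip is needed, which is why FEC suits latency-sensitive video; the cost is the extra parity bandwidth, and recovering multiple losses per group requires a stronger code than single parity.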
Products and Solutions Catalog
Huawei 5G Cameras
Tan Shenquan, Liu Zhen

Huawei has leveraged its accumulated prowess in 5G and network communications to release a series of patented innovations that resolve longstanding 5G transmission challenges, such as the limited coverage of individual 5G base stations, low uplink bandwidth, and packet loss. Huawei has also launched a series of related products, such as 5G cameras, that can be applied across a wide range of industries, including intelligent harbors and manufacturing.

Intelligent encoding and I-frame optimization, improving resource utilization of 5G base stations

5G networks feature limited uplink bandwidth, resulting in network congestion when I-frame bursts occur during video transmission. To resolve this problem, Huawei has proposed a region of interest (ROI)-based encoding technology to increase the compression ratio of image backgrounds, which reduces the average bit rate of video streams. Furthermore, I-frame optimization technology reduces the bandwidth required for video transmission during peak periods, preventing network congestion. After the optimization, both the maximum number of cameras that can be connected to a single 5G base station and the coverage of a 5G base station have increased by two to three times, significantly improving the resource utilization of 5G base stations.

User Datagram Protocol (UDP)-based reliable transmission, ensuring smooth, efficient video transmission

To prevent packet loss and bit errors during wireless transmission, Huawei has adopted UDP with a dynamic optimization policy to ensure smooth video transmission even when packet loss occurs.

Huawei 5G Camera Models

M2281-10-QLI-W5, M6781-10-GZ40-W5, X7341-10-HMI-W5

Large-scale access: Built-in integrated antenna, with intelligent encoding and transmission optimization for 5G New Radio (NR), ensuring large-scale access of 5G cameras
Flexible deployment: Supports the n78, n79, and n41 frequency bands and standalone (SA)/non-standalone (NSA) hybrid networking
Clear, smooth video: Image encoding and transmission optimization technologies ensure smooth video transmission even when the packet loss rate reaches 10%
AI-powered innovation: Professional-grade artificial intelligence (AI) chips and a dedicated software-defined camera (SDC) operating system (OS) support a wide range of intelligent functions, such as person analysis, crowd flow analysis, and vehicle analysis, as well as long-tail algorithms
02 AI
Image, Algorithm, and Storage Trends Led by AI
Discussion on Frontend Intelligence Trends
Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
SuperColor Technology
Storage EC Technology
Multi-Lens Synergy Technology
Video Codec Technology
Chip Evolution and Development
Algorithm Repository Technology
Products and Solutions Catalog
Image, Algorithm, and Storage Trends Led by AI
Ge Xinyu, Zhang Yingjun
1. AI+Video Future Prospects

AI has become a core enabler of digital transformation across industries

As artificial intelligence (AI) technology matures and an intelligent society develops, AI is being used in a wide range of industries. Currently, the transportation industry is using AI+video to improve the efficiency of traffic management. In the future, AI+video will gradually be embedded in more sectors, such as government, finance, energy, and education.

The rapid development of AI is driving considerable growth within the global video analysis industry

In recent years, the fast development of deep learning technology has driven the rapid growth of the overall video analysis industry. According to statistics, from 2018 to 2023, the compound annual growth rate (CAGR) of the video analysis product market is predicted to reach 37.1%. Additionally, the proportion of intelligent cameras powered by deep learning is expected to increase from 5% to 66%.
Transportation
Government
Finance
Energy
Education
[Chart: proportion of intelligent cameras shipped with deep-learning-based vs. rules-based analytics, 2018-2023]
[Chart: video analysis applications. 2018 global revenue: US$0.38 billion; 2018-2023 CAGR: 37.1%; YoY revenue growth: 66.4% (2018), 63.6% (2019), 42.9% (2020), 34.4% (2021), 26.1% (2022), and 22.3% (2023)]
Data source: IHS Markit 2019
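The stated 37.1% CAGR can be cross-checked against the year-on-year growth figures from the chart: compounding the 2019-2023 growth rates and taking the fifth root recovers the same number.

```python
# YoY revenue growth of the video analysis market, 2019-2023 (figures from the chart)
yoy = [0.636, 0.429, 0.344, 0.261, 0.223]

growth_factor = 1.0
for g in yoy:
    growth_factor *= 1 + g  # cumulative growth over the five years

cagr = growth_factor ** (1 / len(yoy)) - 1
assert abs(cagr - 0.371) < 0.005  # matches the stated 2018-2023 CAGR of 37.1%
```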
Transport networks can use AI to: Recognize key people and vehicles, thereby improving traffic safety governance in urban areas; realize refined management of urban traffic and promote smooth traffic optimization based on precise data.
Governments can use AI to: Improve their administrative efficiency by informatizing infrastructure; improve the intelligence of various application systems; enhance information awareness, analysis, and processing capabilities by analyzing massive video data.
Banks can use AI to: Turn their focus from improving service efficiency to enhancing marketing, improving the intelligence of unstaffed bank branches, and accelerating the reconstruction of smart branches.
Energy companies can use AI to: Realize visualized exploration and development, and construct intelligent pipelines and gas stations.
Educational institutions can use AI to: Establish uniform systems across countries/regions; promote intelligent education; establish intelligent education demonstration areas; and drive education networking.
Why is it necessary to have an image quality assessment standard?
The status quo of image quality assessment standards
Machines are capable of conducting a wide range of recognition tasks, including recognizing objects such as pedestrians, cyclists, and vehicles. To improve the recognition accuracy of AI algorithms, high-quality video is needed.
All-scenario and all-weather coverage: New intelligent applications pose higher requirements on full-color imaging in low light conditions, and this is now a trend within the industry. For example, person re-identification (ReID) requires cameras to accurately capture the color of the surroundings and the gait details of people. Against this backdrop, infrared multi-spectral light compensation technology has been proposed, which enables cameras to perform better in low light conditions in an environmentally friendly way.
AI and image enhancement technologies have developed rapidly. Technologies such as AI noise reduction use global and local optimization methods to improve image quality. They focus on optimizing image quality for targets such as license plates, which greatly enhances the accuracy of image recognition. However, the industry still lacks a complete and objective image assessment standard.
The rapid development of AI in recent years has revolutionized the public safety industry. In the past, video needed to be watched by people, but now, machines also play an important role in viewing and analyzing video. However, the current technical standards do not reflect the true capabilities of today’s video surveillance technologies.
The current Chinese national standard GA/T 1127–2013 mainly lists requirements for camera network access and manual video viewing. According to the traditional assessment method, experienced workers grade images subjectively, but this method cannot be used in machine assessment. Now that AI is enabling image assessment to become increasingly objective, an objective image assessment standard needs to be formulated.
[Figure: machine recognition tasks such as pedestrians, cyclists, and vehicles; ReID; full-color imaging in low light conditions]
2. To Achieve AI Development, an Image Quality Assessment Standard is Needed for Intelligent Cameras
[Timeline: image quality assessment standards and projects, 1997-2018]
1997: Recommendation ITU-R BT.500-7, Methodology for the subjective assessment of the quality of television pictures
1998: GY/T 134, The method for the subjective assessment of the quality of digital television picture
2000: FRTV Phase I, Full Reference (FR) objective video quality models that predict the quality of standard definition television
2002: Recommendation ITU-R BT.500-11, revision of the BT.500 methodology
2003: FRTV Phase II, Full Reference (FR) objective video quality models that predict the quality of standard definition television
2007: Recommendation ITU-R BT.1788, Methodology for the subjective assessment of video quality in multimedia applications
2009: Recommendation ITU-R BT.500-12, revision of the BT.500 methodology; RRNR-TV, Reduced Reference (RR) and No Reference (NR) objective video quality models that predict the quality of standard definition television
2010: HDTV Phase I, Full Reference (FR) and Reduced Reference (RR) objective video quality models that predict the quality of high definition television; QART (Quality Assessment for Recognition Tasks)
2011: GB 50198-2011, Technical code for project of civil closed circuit monitoring television system; Recommendation ITU-T J.341, Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference; Recommendation ITU-T J.342, Objective multimedia video quality measurement of HDTV for digital cable television in the presence of a reduced reference signal
2012: Recommendation ITU-R BT.500-13, revision of the BT.500 methodology; AVHD (Audiovisual HD Quality), 2012 to now
2013: GA/T 1127-2013, General technical requirements for cameras used in security video surveillance
2018: GA/T 1356-2018, Specifications for compliance tests with national standard GB/T 25724-2017; NORM (No Reference Metric), 2017 to now

Key issues relating to the formulation of a new standard

There are five key issues to consider when developing an image quality assessment system for intelligent cameras.

Objectivity of camera imaging quality assessment
When humans judge imaging quality using their eyes, their assessment is subjective. An objective quality assessment model would be based on existing full-reference, reduced-reference, or no-reference models within the industry.

Consistency of assessment result and subjective perception
The assessment result arrived at by intelligent vision must be consistent with subjective human perception. This is a key factor that any standard system must promote and recognize.

Identity of assessment scenario and real environment
Currently, the image quality indicators of cameras are mainly evaluated using test cards and software or by manual judgment. This is different from the actual scenarios where these cameras would be used, which involve moving objects like people and vehicles. In addition, infrared multi-spectral light compensation technology is widely used in actual scenarios. Therefore, the spectral characteristics of the target must be consistent.

Concordance of assessment indicators and actual effect
Currently, the image quality indicators of cameras are tested separately, and the relationship and weight of indicators for different intelligent tasks are not considered. The assessment indicators should be associated with user scenarios and reflect the practicability of the service.

Repeatability of assessment methods
Different assessors should get the same result regardless of time or place.

Thoughts and suggestions on the design of a standard system

The assessment dimensions should include the user task type, the user scenario type, and basic image assessment factors. Score weighting should be decided based on each user task and scenario to calculate the overall score.
3. Service Development Requirements for AI Algorithms and Future Evolution
Evolution from traditional single-object analysis to multi-object associative recognition
[Figure: indicator system for the image quality assessment of intelligent cameras]
The overall score is aggregated, by user task weight, from the scores of individual recognition tasks (for example, person recognition, behavior recognition, gait recognition, and license plate recognition). Each task score is in turn aggregated, by user scenario weight, from scenario scores covering daytime conditions (even illumination, backlight, and light raking), nighttime conditions (low light and low light with glare), and adverse weather (rain and snow, and rain and fog). Each scenario score is produced by a calculation function f(x) over the basic image indicator factors:
Objective quality factors of a single frame in the spatial domain: definition, texture detail, noise, contrast, color reproduction, color sensitivity, color saturation, exposure quality, and geometric distortion
Objective quality factors in the temporal domain: stability and frame rate
[Figure: multi-algorithm integration of person, behavior, gait, and license plate recognition]
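The two-level weighted aggregation described by the indicator system can be sketched as follows; all weights and scores here are invented for illustration:

```python
def weighted(scores: dict, weights: dict) -> float:
    """Weighted sum of scores; weights at each level are assumed to sum to 1."""
    return sum(scores[k] * weights[k] for k in scores)

# Level 1: aggregate scenario scores into one recognition-task score
scenario_scores = {"even illumination": 92, "backlight": 78, "low light": 65}
scenario_weights = {"even illumination": 0.5, "backlight": 0.3, "low light": 0.2}
person_score = weighted(scenario_scores, scenario_weights)  # ~82.4

# Level 2: aggregate task scores into the overall score
task_scores = {"person recognition": person_score, "license plate recognition": 81.0}
task_weights = {"person recognition": 0.6, "license plate recognition": 0.4}
overall = weighted(task_scores, task_weights)  # ~81.84
```

A real standard would additionally define how each scenario score is computed from the spatial- and temporal-domain factors via f(x); the sketch only shows the weighting structure.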
The traditional single-object recognition method cannot accurately recognize or analyze occluded objects. Instead, multiple algorithms must be integrated to improve recognition efficiency, which has become a key service requirement and future direction for algorithm evolution.
Evolution from traditional service closed-loop in a single area to comprehensive security protection
[Figure: key areas such as railway/subway stations, bus stations/bus stops, airports, and pedestrian zones]
4. Storage Requirements of AI Development

The status quo of video and image storage
To improve recognition accuracy, AI algorithms pose higher requirements on the image quality of cameras (including definition and resolution). In smart cities and intelligent transportation systems, HD cameras are widely deployed, and this requires considerable storage space for video and images. As a result, storage duration and coverage areas increase, which can lead to a range of problems such as a limited equipment room footprint, high power consumption, and maintenance difficulties.
Customers' primary concern is how to improve storage space utilization and reduce equipment room footprint, storage deployment costs, power consumption, and total cost of ownership (TCO).
In a medium-sized city:

Video resolution | Storage duration | Coverage area
1080p | 30 days | Key areas
4K | 90 days | All areas

Resulting challenges include a limited equipment room footprint (40+ cabinets and line reconstruction), high power consumption (440+ kW), and maintenance difficulties (component, node, and site faults).
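The storage figures behind such a deployment are easy to estimate. The sketch below assumes illustrative per-camera bit rates (roughly 4 Mbit/s for 1080p and 16 Mbit/s for 4K) and invented camera counts; these are not figures from the text:

```python
def storage_tb(cameras: int, bitrate_mbps: float, days: int) -> float:
    """Raw capacity in terabytes for continuous recording."""
    seconds = days * 24 * 3600
    total_bits = cameras * bitrate_mbps * 1e6 * seconds
    return total_bits / 8 / 1e12  # bits -> bytes -> terabytes

key_areas = storage_tb(cameras=10_000, bitrate_mbps=4, days=30)   # ~13,000 TB
all_areas = storage_tb(cameras=50_000, bitrate_mbps=16, days=90)  # ~780,000 TB
```

Under these assumptions, moving from 1080p/30-day key-area coverage to 4K/90-day all-area coverage multiplies raw capacity by roughly 60x, which is why compression efficiency and high-density media dominate the TCO discussion.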
Comprehensive intelligence across all scenarios: Implement closed-loop video surveillance for key areas such as city's entrances, railway stations, subway stations, bus stations, airports, pedestrian zones, urban-rural intersections, street communities, and agricultural trade markets.
Multi-dimensional data collision and analysis: Align vast quantities of video and image data with multi-dimensional social data such as travel data, to better analyze people.
Full awareness of people and vehicles within a residential community: Collect and update data for people and vehicles entering and leaving residential communities in real time every day; quickly and accurately recognize objects.
Social and transportation development facilitates provincial and national population mobility. The traditional service model, with its closed loop in a single area, therefore cannot meet the requirements of comprehensive security protection, which is gradually developing towards cross-region intelligent management.
5. Trends
The core objective of AI is to turn the physical world into metadata for analysis. However, in actual applications, a single piece of metadata is generally useless. This requires frontend devices to go from uni-dimensional data collection to multi-dimensional data awareness, and backend platforms to evolve from relying on image intelligence to data intelligence. In this way, data can be fully associated and utilized for analysis and prediction.
Frontend devices: from uni-dimensional data collection to multi-dimensional data awareness
In smart cities and intelligent transportation systems, video streams are mainly used to conduct AI analysis of people and vehicles. A balance needs to be struck between lowering storage costs and ensuring the accuracy of this analysis.
Future trends
High-density storage: more storage media per unit
Video compression: Deep video compression enables better utilization of storage space. For example, region of interest (ROI) compression technology separates and extracts ROIs from the background to reduce video bit rate and storage space without decreasing the ROI detection rate.
[Figure: region of interest (ROI) compression example. Bit rate before compression: 2642 kbit/s; bit rate after compression: 551 kbit/s; the motor vehicle ROI remains sharp]
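From the figures in this ROI compression example, the saving can be quantified directly:

```python
before_kbps, after_kbps = 2642, 551  # bit rates from the ROI compression example

ratio = before_kbps / after_kbps         # ~4.8x lower bit rate
saving = 1 - after_kbps / before_kbps    # ~79% less storage per camera

# Storage for one camera over 30 days of continuous recording, in gigabytes
seconds = 30 * 24 * 3600
gb_before = before_kbps * 1000 * seconds / 8 / 1e9  # ~856 GB
gb_after = after_kbps * 1000 * seconds / 8 / 1e9    # ~179 GB
```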
[Figure: from siloed systems where data is isolated (Departments A, B, and C each holding person, vehicle, phone, relationship, travel, and accommodation data) to multi-dimensional data awareness (+time/space/multi-modal) where data converges into an aggregated data lake, with diversified awareness dimensions, integrated device forms, and pixel-level image segmentation]
Backend platform: from image intelligence to data intelligence
Image intelligence is unforeseeable; data intelligence, which converges video with Internet of Things (IoT) data and Internet data, is foreseeable.
Xu Tongjing
Discussion on Frontend Intelligence Trends

The aim of artificial intelligence (AI) is to train computers to see, hear, and read like human beings. Current AI technologies are mainly used to recognize images, speech, and text. Renowned experimental psychologist D. G. Treichler proposed that 83% of the information we obtain from the world around us comes through our vision. Therefore, over 50% of AI applications nowadays are related to intelligent vision, and around 65% of industry digitalization information comes from intelligent vision. In addition, to bridge the physical and digital worlds, all things must be sensing. The type, quantity, and quality of data collected by frontend sensing devices determine the intelligence level.

1. Five Advantages of Frontend Intelligence

Superior imaging quality with ultimate computing power

Intelligent cameras, as sensing devices in the intelligent vision sector, were introduced around five years ago. Different from traditional IP cameras (IPCs), intelligent cameras can adapt to challenging environments and collect video data of a higher quality. However, due to immature algorithms and chips, intelligent cameras could not provide sharp, HD-quality images in harsh weather conditions such as rain, sandstorms, and overcast days. In addition, factors such as a poor installation angle, occlusion, low light, and low resolution may also lead to inaccurate object recognition. If imaging quality cannot be guaranteed, intelligence will remain an unachievable mirage.

With AI algorithms, intelligent cameras can automatically adjust image signal processing (ISP) parameters such as shutter speed, aperture, and exposure according to the ambient lighting and object speed, deliver optimal images for further detection and recognition, and associate face images with personal data.

[Chart: 83% of the information we obtain comes through vision; the remaining senses account for 11%, 3.5%, 1.5%, and 1%]
[Figure: intelligent image quality adjustment]
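How such automatic adjustment might work can be sketched with a toy control rule. Everything here, the mid-grey target, the thresholds, and the rule itself, is invented for illustration; real ISP pipelines are far more sophisticated:

```python
def adjust_shutter(mean_luma: float, object_speed_px: float, shutter_s: float) -> float:
    """Toy ISP rule: lengthen exposure in dark scenes to brighten the image,
    but cap the exposure time when objects move fast, to limit motion blur."""
    target = 118  # mid-grey target for 8-bit luma (invented threshold)
    proposed = shutter_s * target / max(mean_luma, 1)  # proportional brightness correction
    max_shutter = 1 / max(object_speed_px, 30)         # faster motion, shorter exposure cap
    return min(proposed, max_shutter)

# A dim scene (mean luma 59) would double the exposure, but fast motion caps it at 1/60 s
new_shutter = adjust_shutter(mean_luma=59, object_speed_px=60, shutter_s=1 / 100)
```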
Applicable to varied scenarios
Intelligent vision systems are increasingly expected to satisfy the needs of various industries for various intelligent applications at various times and in various scenarios. For example, cameras must be able to detect vehicle queue length and accidents in the daytime and detect parking violations at night or load different algorithms at different preset positions.
Thanks to frontend intelligence, customers can load their desired algorithms on intelligent cameras to satisfy their personalized or scenario-specific requirements. This also helps reduce risk exposure in the delivery of diversified algorithms. In addition, lightweight container technology is used to construct an integrated multi-algorithm framework. This enables each algorithm to operate independently, ensuring service continuity during algorithm upgrade and switchover. Customers can also flexibly choose their desired intelligent capabilities to adapt to specific application scenarios.
System linkage within milliseconds
In many industries, such as transportation and emergency response, fast response and closed-loop management are the basic and also the most critical requirements of services. Frontend intelligence enables cameras to analyze video in real time and to immediately link related service systems upon detecting objects that trigger behavior analysis rules, in locations such as airports and high-speed rail stations.
In road traffic scenarios, cameras need to link external devices such as illuminators, radar detectors, and traffic signal detectors within milliseconds. For example, cameras need to work with illuminators to provide enhanced lighting for specific areas at the right moment or periodically synchronize with traffic signal detectors to accurately detect traffic incidents. In other linkage scenarios, for example, linkage between radar detectors and PTZ dome cameras or between barrier gates/swing gates and cameras, frontend intelligence can dramatically improve the system response efficiency and ensure quick service closure.
Optimal computing efficiency
Video plays an essential role in key industries such as social governance and transportation. However, the traditional video surveillance market is nearing saturation and cannot satisfy digital transformation across industries. Thanks to ultimate computing power, many new intelligent applications are now possible. Compared with backend intelligence, frontend intelligence improves computing efficiency by 30% to 60%.
With frontend intelligence, each camera processes only one video channel at the frontend, which poses lower requirements on computing power, and directly obtains raw data for analysis, further reducing computational requirements and enhancing processing efficiency. Frontend intelligence also enables cameras to deliver high-quality images to the backend, so the backend platform can focus on intelligent analysis while focusing less on secondary image decoding. With the same computing power, image analysis is roughly 10 times more efficient than video analysis. Moving intelligence to the frontend can maximize the value of intelligent applications for customers with limited resources.
[Figure: an intelligent camera linked with millimeter-wave radar issues a collision warning upon lane change when motor vehicles, non-motorized vehicles, and pedestrians appear simultaneously]
[Figure: gantry-mounted intelligent cameras with radar perform vehicle capture and vehicle feature extraction]
[Chart: computing efficiency of frontend intelligence vs. backend intelligence]
Improved engineering efficiency

To apply intelligent applications on a large scale, engineering issues must be considered. A top concern for engineering vendors is upgrading and reconstructing the live network using existing investments and at the lowest cost. The prevalence of intelligent cameras (including common cameras with inclusive AI computing power), where intelligent algorithms can be dynamically loaded, can dramatically improve frontend data collection quality, enhance intelligent analysis efficiency by 10-fold and intelligent application availability by several-fold, and lower the total cost of ownership (TCO) by over 50%.

In addition, frontend intelligence enables a camera to run multiple algorithms concurrently. For example, an intelligent camera can simultaneously load multiple algorithms such as traffic violation detection, vehicle capture and recognition, and traffic flow statistics, whereas multiple devices were required to support these functions in the past. This sharply lowers the engineering implementation difficulty and improves the engineering efficiency.

2. Key Factors for Implementing Frontend Intelligence

The most basic functionality of a camera is to shoot HD video around the clock, and sharp HD images are the most basic requirement for computer vision. Computing power is required to optimize images and improve the intelligent recognition rate. In scenarios where intelligent services require high real-time performance, ultimate computing power is required to meet real-time data awareness, computing, and response requirements.

In terms of product technologies, intelligent cameras must be equipped with AI main control chips and intelligent operating systems to implement frontend intelligence.
[Chart: frontend vs. backend intelligence. Intelligent analysis efficiency: improved 10-fold; intelligent application availability: improved several-fold; TCO: reduced by over 50%]
Customers require cameras with different hardware forms and software with different capabilities depending on the usage scenario. Currently, most cameras are designed for specific scenarios, but their software and hardware are closely coupled. If software can be decoupled from hardware, users can install desired algorithms on cameras just like installing apps on smartphones. This maximizes the value of hardware, saves overall costs, and improves user experience. To decouple software from hardware, an open and intelligent operating system is required. With the intelligent operating system, differences between bottom-layer hardware are no longer obstacles. After the computing and orchestration capabilities of bottom-layer hardware devices are invoked, they are uniformly encapsulated by the operating system. This significantly simplifies development and allows developers to focus solely on the software's functional capabilities. In addition, the lightweight container is used to construct an integrated multi-algorithm framework, where each algorithm runs independently in a virtual space, allowing independent loading and online upgrading. In summary, an intelligent camera operating system is the basis of frontend intelligence.
Computing power is the foundation of intelligent capabilities, while professional AI chips give a huge boost to computing power. Accelerated by dedicated hardware, these AI chips support tera-scale computing and visual processing based on deep learning on a neural network. To support frontend intelligence, cameras must be equipped with professional AI chips.
The industry has reached a consensus on frontend intelligence and related standards. Mainstream vendors and users in the industry are actively embracing frontend intelligence. Vendors in the industry have launched products such as software-defined cameras and scenario-specific intelligent cameras. The industry ecosystem is thriving.
Intelligent awareness can help collect multi-dimensional data, dramatically improve the data collection quality, and unleash the value of mass video data while reducing computing power required for backend data processing and the overall TCO. In addition, distributed processing significantly improves system reliability.
In the mobile Internet sector, the app market provides an overwhelming number of apps. Users can download and install desired apps on their smartphones. In the intelligent video sector, the burning question is: How can we aggregate excellent ecosystem partners to provide superior algorithms and applications to meet customers' fragmented and long-tail requirements? To address this issue, the intelligent algorithm platform was developed, which aggregates ecosystem partners in the intelligent vision sector to provide intelligent video/image applications for a range of industries. The platform protects developers' rights and interests through license files and verification mechanisms and also allows users to easily choose from a range of reliable intelligent algorithms. In addition, intelligent cameras can connect to a range of hardware sensors in wired or wireless mode to help build a multi-dimensional awareness ecosystem. With a rich ecosystem, a large number of long-tail algorithms dedicated to specific industries can be quickly released to meet the requirements of various scenarios.
From the perspective of application ecosystems, frontend intelligence requires a future-proof algorithm and hardware ecosystem to boost industry digital transformation.
AI/Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
Beijing CSVision Technology Co., Ltd.
1. Background

The public safety industry has developed rapidly over the past 40 years, and there is now a large market for public safety products and services. The emergence of innovative technologies such as 5G, cloud storage, new video codecs, and video/image analysis has driven the industry to expand into a much wider range of fields. These technologies are driving the public safety industry into a brand-new era in which images are clearer, more accurate, and comprehensible, and more data value is mined from a multitude of sources: masses of video and image data; service data from the transportation industry, governmental bodies, campuses, and enterprises; and multi-dimensional big data from the Internet and the Internet of Things (IoT). Furthermore, the industry is becoming increasingly intelligent, and risks can now be identified in advance, achieving proactive surveillance and prevention and enabling the public safety industry to shift from merely perceiving to being capable of foresight. Scenario-based intelligent video and image cloud platforms need to adapt to this.

Like the public safety industry itself, public safety management platforms have gone through multiple phases. They have evolved from standalone software to embedded software to distributed systems and finally to intelligent video and image cloud platforms. Driven by scenario-based requirements and related technologies, intelligent video and image cloud platforms have been optimized to overcome the shortcomings of previous phases.

2. Cloud Storage: Managing a Massive Amount of Video and Images

In the field of public safety, the main problem video and image platforms encounter is how to store and search through a massive amount of video and images. If millions of cameras are deployed, tens of exabytes (EB) will be required to store the video and images they generate. Traditional distributed video surveillance platforms cannot solve this problem because they are independent and cannot connect or share with each other. The emergence of cloud storage technologies has solved this problem.
Smooth expansion

Storage nodes can be added online at any time and anywhere to meet storage capacity requirements. This means data can be automatically reallocated after a scale-out and restored after a scale-in.
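One common way to make such reallocation automatic and cheap is consistent hashing: each storage node owns arcs of a hash ring, and adding a node moves only the blocks that fall into its arcs. A minimal sketch (node names and virtual-node count are invented; the document does not describe the actual mechanism):

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring with virtual nodes for an even load spread."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def locate(self, block_id: str) -> str:
        """The node owning the first ring position at or after the block's hash."""
        i = bisect.bisect(self.keys, h(block_id)) % len(self.ring)
        return self.ring[i][1]

before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])  # scale-out: one node added
blocks = [f"video-block-{i}" for i in range(1000)]
moved = sum(before.locate(b) != after.locate(b) for b in blocks)
# Only roughly 1/4 of the blocks migrate to the new node; the rest stay where they are
```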
[Figure: evolution of video surveillance platforms, covering video sensing, video encoding (from MPEG-4 to H.264/AVC), video storage in distributed databases, network transmission, and intelligent analysis, with resolutions growing from DV/720P through 1080P and 2K to 4K]
3. AI Technology: Better Understanding of Video and Images

The development of video-related technologies has played a significant role in the advancement of the public safety industry, and the emergence of visual artificial intelligence (AI) technologies has strengthened this role. AI will be key to the future development of the public safety industry. The increase in hardware computing power and the optimization of software frameworks have driven the explosive development of AI technologies. As the field where AI is implemented most directly, the public safety industry in turn drives the development of AI technologies. The emergence of AI makes it possible for intelligent video and image cloud platforms to store and manage video and image data in a structured manner, and to better understand video and images.
Cloud-edge synergy

Edge computing enables real-time feedback and alleviates the pressure on network transmission bandwidth. The cloud focuses on non-real-time, long-period, and service decision-making scenarios, while edges function as terminals that collect high-value data to better support big data analysis on the cloud. The cloud delivers service rules to edges, thereby optimizing their service decision-making. Enabling the cloud and edges to collaborate is the best way to cope with the huge amounts of data generated by AI-powered systems.

Video and image resource pool design

The video and image resource pool aggregates and manages various types of video and images in the local domain and provides resource management services for external systems, for example, the upper-level domain. Various themed libraries and specialized libraries are formed through the aggregation and governance of video and images. External services include basic data services such as data queries, data subscriptions, database views, and data sharing, as well as basic application services such as full-database searches, model matching, feature analysis, relationship graphs, and convergent big data.

Multi-dimensional application of video and images

Intelligent video and image cloud platforms perform multi-dimensional data convergence and matching on managed data and external data. For example, they trigger multi-dimensional alert tasks, report alarms in real time, and analyze events from multiple dimensions based on real-time data. Another example is that they analyze and mine internal relationships among historical data to quickly identify anomalies. In addition, they use knowledge graph technology to mine relationships among people, among events, and between people and events, providing a decision-making basis for major events and improving the intelligent analysis capability of the entire system.

Efficient storage

Bandwidth aggregation enables the unlimited expansion of storage bandwidth, and block-level data striping enables fast concurrent access. Structured sequential storage allows the structured storage of unstructured video data, achieving efficient video storage and quick searches. Increased computing power enables video synopsis and data migration, so less storage space is required on the video cloud.
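Block-level striping, mentioned above, deals fixed-size blocks round-robin across storage nodes so that reads and writes can hit all nodes in parallel. A minimal sketch (block size and node count are arbitrary choices for illustration):

```python
def stripe(data: bytes, nodes: int, block: int = 4):
    """Split data into fixed-size blocks and deal them round-robin across nodes."""
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    layout = [[] for _ in range(nodes)]
    for i, b in enumerate(blocks):
        layout[i % nodes].append(b)  # block i lives on node i % nodes
    return layout

def reassemble(layout) -> bytes:
    """Read blocks back in round-robin order; each node serves its share concurrently."""
    total = sum(len(node) for node in layout)
    return b"".join(layout[i % len(layout)][i // len(layout)] for i in range(total))

layout = stripe(b"ABCDEFGHIJ", nodes=3)
assert reassemble(layout) == b"ABCDEFGHIJ"
```

Real systems add parity or erasure-coded blocks on top of this layout so that a node failure does not lose data; the sketch shows only the bandwidth-aggregation aspect.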
4. 5G Enabling Extensive Large-Capacity ConnectionsIntelligent video and image cloud platforms use existing network technologies to achieve heterogeneous data connections among multiple networks. However, wireless-based applications are not widely used due to limited bandwidth and poor real-time performance. With enhanced mobile broadband, platforms can utilize 5G's reliability, low latency, wide coverage, and mass connections to improve their connection capabilities. In addition, since nearly 80% of typical AI and 5G applications overlap, 5G can completely support the large-scale implementation of AI applications.
More intelligent and clearer5G- and AI-powered 4K and 8K video surveillance solutions have become an optimal choice due to their high frame rate, ultra high definition (HD) technology, and wide dynamic range. AI algorithms can extract more detailed information such as a person's physical characteristics and behavior from video, making it possible for an intelligent video and image cloud platform to be applied in more diverse industry scenarios. With the help of augmented reality (AR), virtual reality (VR), and 5G technologies, the platform can provide immersive services for specific scenarios such as power monitoring and repair, and intelligent manufacturing.
More convenient and efficient

5G enables the formation of a quick surveillance solution comprising a command center and AI-powered cameras. Data is transmitted quickly and securely, and the cameras provide all required computing power. In addition, AirFlash 5G implements fast data dumping for train-to-ground communications, such as metro and railway communications. This solution efficiently transmits data for passenger cars and crew compartments, as well as loading video, to the ground, so that security risks can be detected promptly.
AI/Discussion on Development Trends Among Intelligent Video and Image Cloud Platforms
5. Future Trends

Powered by 5G, AI, and video and image applications, intelligent video and image cloud platforms are being applied in various industry scenarios. They have gradually evolved to become the core of video surveillance systems. In the future, intelligent video and image cloud platforms will continue to develop in the following ways:
More compatible and open

Platforms are required to be compatible with frontend devices (cameras, digital video recorders, digital video servers, network video recorders, and central video recorders); backend storage media (IP storage area network, fiber channel storage area network, network attached storage, direct attached storage, and cloud storage); various compression algorithms (MJPEG, H.263, MPEG4, H.264, H.265, and SVAC); and multiple AI algorithms (vehicle analysis, pedestrian analysis, and behavior analysis). These high compatibility requirements demand a great degree of openness. As a result, more compatible and open platforms will be the trend.
Data visualization and integration

Intelligent video and image cloud platforms converge IoT data in the form of video and images. The question of how to intuitively and efficiently display and apply IoT data will become a much-debated topic for this kind of platform.
Efficient management of unstructured data

The core service will still focus on video and image management, in terms of data storage, forwarding, searching, and application. Among management indicators, response speed is one of the most important for measuring platform performance. How to efficiently manage unstructured data will be researched to improve response speed, and structured data technology may be an important tool for solving this issue.
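One way to read "structured data technology" here is an index of structured metadata over the unstructured clips, so attribute queries never touch the video itself. The sketch below illustrates the idea; all field names and clip IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical clip catalog: unstructured video blobs plus structured metadata.
clips = [
    {"id": "c1", "camera": "gate-1", "hour": 8, "objects": ["person", "vehicle"]},
    {"id": "c2", "camera": "gate-1", "hour": 9, "objects": ["person"]},
    {"id": "c3", "camera": "lot-2",  "hour": 8, "objects": ["vehicle"]},
]

# Inverted index: (field, value) -> set of clip IDs containing that attribute.
index = defaultdict(set)
for clip in clips:
    index[("camera", clip["camera"])].add(clip["id"])
    index[("hour", clip["hour"])].add(clip["id"])
    for obj in clip["objects"]:
        index[("object", obj)].add(clip["id"])

def search(**criteria):
    """Intersect index entries; no video data is scanned during the search."""
    sets = [index[(field, value)] for field, value in criteria.items()]
    return set.intersection(*sets) if sets else set()

print(search(camera="gate-1", object="person"))
```

Response time then depends on set intersections over small ID sets rather than on the volume of stored video, which is the performance argument the paragraph makes.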
Layered software design

Implementing industry requirements is key to developing innovative technologies such as AI, cloud computing, and 5G. Although video surveillance platforms in different industries differ greatly in terms of functionality, they are basically the same at the architectural level, and their differences are mainly limited to the application layer. This not only ensures the stability of the basic architecture but also enables these platforms to adapt to a huge range of industry applications. In the future, a layered software design will be introduced so that the diverse requirements of each industry can be met quickly and customer stickiness can be improved.
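The layered design could look like the following sketch (class and method names are invented for illustration): a stable base layer provides the common video services, and each industry supplies only a thin application layer on top.

```python
class BasePlatform:
    """Stable base layer shared by all industries: access, storage, analysis."""

    def ingest(self, camera_id: str) -> str:
        # Stand-in for video access; returns a stream handle.
        return f"stream from {camera_id}"

    def analyze(self, stream: str) -> dict:
        # Stand-in for the shared analysis service.
        return {"stream": stream, "objects": ["person", "vehicle"]}


class TrafficApp(BasePlatform):
    """Industry-specific application layer: only this part differs per industry."""

    def handle(self, camera_id: str):
        result = self.analyze(self.ingest(camera_id))
        return [o for o in result["objects"] if o == "vehicle"]


class RetailApp(BasePlatform):
    def handle(self, camera_id: str):
        result = self.analyze(self.ingest(camera_id))
        return [o for o in result["objects"] if o == "person"]
```

Because both applications inherit the same base, the base architecture can be hardened once while new industries are served by writing only the top layer, which is the customer-stickiness argument above.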
Widely used edge-cloud synergy

With the increase of edge computing power and the concurrent decrease in the cost of unit computing power, more mature video structuring technologies will make structured descriptions possible for video. Tens of billions of structured and semi-structured data records will be generated based on massive amounts of accumulated video data. Edge-cloud synergy will help intelligent video and image cloud platforms cope with this explosive data growth.
Chip Evolution and Development

Yang Shengkai, You Shiping

The core components of a camera are the image sensor and the image processing chip. Optimal image quality requires both light processing by the lens and image sensor and image processing by the image processing chip. Chips, as the core processors, therefore play a critical role in high-definition (HD) video surveillance. To support frontend intelligence, cameras must also be equipped with professional artificial intelligence (AI) chips.

1. Chip Classification

Chips can be classified into general-purpose chips and special-purpose chips according to their intended application.

General-purpose chips, designed to execute general tasks, are mainly central processing unit (CPU) chips based on architectures such as x86, Arm, MIPS, PowerPC, and RISC-V. This type of chip can run operating systems and provides abundant peripheral interfaces, meeting diverse application requirements. As market competition intensifies, architectures such as MIPS and PowerPC are becoming less prevalent and may one day disappear altogether.

Special-purpose chips are designed for a specific type of application, for example, graphics processing units (GPUs) for image processing, digital signal processors (DSPs) for digital signal processing, AI chips for AI acceleration, and field-programmable gate arrays (FPGAs) for hardware programming in specific application scenarios. This type of chip is efficient at its target applications but weak at general-purpose processing.

In fact, general-purpose and special-purpose chips are starting to converge in terms of chip design. For example, general-purpose chips are integrating GPUs, DSPs, and even FPGAs to accelerate specific applications, while special-purpose chips are integrating general-purpose CPUs to provide flexibility and independent deployment capabilities.

Figure: Chip classification. General-purpose chips: x86, Arm, MIPS, PowerPC, RISC-V. Special-purpose chips: GPU, DSP, AI chip, FPGA.

2. Chip History

The CPU has been synonymous with Intel and x86 ever since its advent in 1971, when Intel developed the world's first microprocessor, the 4004, a 4-bit processor capable of performing 60,000 operations per second (OPS). This epoch-making product, despite its weak performance, had far-reaching implications when it debuted. In the next few years, Intel quickly rolled out more processors, such as the 4040, 8008, and 8080.

x86 chips

In 1978, Intel launched the first 16-bit processor, the i8086, which gave rise to the x86 architecture. This processor used an instruction set called the x86 instruction set, which has evolved since and is in ubiquitous use today.

In the 1980s, Intel made available the 80286, 80386, and 80486 processors, which featured over 1 million transistors and CPU clock speeds of up to 50 MHz. In the 1990s, Intel launched the Pentium/P6 series of processors, integrating technologies such as superscalar execution, multi-level caches, branch prediction, and single instruction, multiple data (SIMD) into the CPU. From 2000 to 2006, the Pentium 4 series, built on the NetBurst microarchitecture, added hyper-threading and, in later models, 64-bit and virtualization support. In 2010, Intel introduced its all-new Intel Core series of processors, produced under the tick-tock model, in which every microarchitecture change (tock) was followed by a die shrink of the process technology (tick). However, as die shrinks became increasingly difficult, that model was later replaced with a three-element cycle known as Process-Architecture-Optimization (PAO).
Arm processors

Arm processors can be traced back to Cambridge Processor Unit Ltd (CPU Ltd), a company specializing in electronic devices founded in Cambridge, UK, in 1978 and renamed Acorn in 1979. In 1985, Acorn developed its first-generation 32-bit, 6 MHz processor based on the reduced instruction set computer (RISC) architecture, naming it the Acorn RISC Machine (Arm).

Because RISC supports simple instructions, it features low power consumption and high cost-effectiveness, making it well suited to mobile devices. Apple's Newton personal digital assistant (PDA) was a well-known early device that used an Arm processor.

In the 1990s, 32-bit embedded RISC Arm processors were developed for embedded applications, featuring low power consumption, low cost, and high performance. At the turn of the 21st century, Arm processors gradually came to dominate the booming global mobile phone market, a position they continue to enjoy to this day.

AI chips

Currently, there is no recognized standard for the definition of AI chips. Generally speaking, chips for AI applications are all called AI chips. AI chips can be classified into three types according to their design principle: (1) chips that accelerate the training and inference of machine learning algorithms, especially deep neural network (DNN) algorithms; (2) brain-like chips inspired by biological brains; and (3) general-purpose AI chips that can efficiently compute various AI algorithms.
Figure: x86 and Arm processor timelines.
x86: 8086/8088 (1978: real mode, 16-bit) -> Pentium (1993: superscalar, data/instruction caches, branch prediction, SIMD/MMX) -> Pentium 4 series (2000–2006: NetBurst microarchitecture, hyper-threading, 64-bit architecture, SSE2/3, virtualization support) -> Core (2010: Westmere microarchitecture, 32 nm process) -> Core series, second to tenth generation (2011–2019: process 32 nm -> 14 nm -> 10 nm; Sandy Bridge/Ivy Bridge/Haswell/Broadwell/Skylake/Ice Lake...).
Arm: Arm7 (1993: 32-bit, Arm v4, von Neumann architecture) -> Arm9 (1998: Arm v5, Harvard architecture, able to run advanced embedded operating systems such as Linux) -> Arm11 (2002: Arm v6, SIMD instructions) -> Cortex-A8 (2005: Arm v7-A, superscalar) -> Cortex-A9 (2007: Arm v7-A, out-of-order execution and instruction set enhancements) -> Cortex-A15 (2010: Arm v7-A, hardware virtualization, multi-core, 15–24 pipeline stages) -> Cortex-A53/A57/A72/A76/A78 (2012–2020: Arm v8-A, 64- or 32-bit mode, TrustZone, dual-issue) -> Cortex-X1 (2020: Arm v8-A, big-core design, performance-first).
3. Chip Industry Chain

From chip design to delivery, the division of work in the chip industry chain consists of six parts:

- Design software: The key tool chip designers use to design chip architecture; currently, electronic design automation (EDA) software is used. Representative vendors: Synopsys (US), Cadence (US), and Mentor Graphics (Germany).
- Instruction set architecture (ISA): The cornerstone of the chip ecosystem, since it determines the OS that runs on a chip. Examples: the IA64 instruction set (Intel), the Arm instruction set (Arm), and the RISC-V open-source instruction set (RISC-V Foundation).
- Chip design: Responsible for chip layout design, for example, key intellectual property (IP) cores and complete System on Chip (SoC) designs. Representative vendors: Intel (US), Qualcomm (US), Samsung (Republic of Korea), NVIDIA (US), MediaTek (Taiwan of China), and HiSilicon (China).
- Manufacturing equipment: Cutting-edge equipment used to produce chips, mostly mask aligners, which produce integrated circuits using the photolithography process. ASML (Netherlands) is the world's top mask aligner manufacturer; Shanghai Micro Electronics Equipment is the top mask aligner manufacturer in China but remains well behind the top players.
- Wafer foundry: The process from chip layout to manufactured product. Representative vendors: TSMC (Taiwan of China), Samsung (Republic of Korea), GLOBALFOUNDRIES (US), UMC (Taiwan of China), and SMIC (China).
- Packaging and testing: Generally performed in packaging and testing factories, whose purpose is to package chips after wafer slicing and test the electrical performance of each chip, ensuring that chip functions and performance indicators meet requirements. Representative vendors: ASE Group (Taiwan of China), Amkor Technology (US), JCET Group (China), and TSHT (China).

The rise of deep learning has driven the growth of AI chips. Deep learning poses high requirements on computing power that traditional CPUs cannot meet, and algorithm researchers found that GPUs are ideal for processing the training and inference tasks of deep learning algorithms. In the past five years, a large number of dedicated AI chips and chip vendors have emerged with the explosive growth of deep learning. NVIDIA has also seized this opportunity to develop a series of AI chips, such as Tesla and Jetson, dedicated to deep learning tasks.

AI chip vendors and representative products:
- NVIDIA: training with Tesla P100/V100/A100; inference with Tesla P4/T4 and Jetson TX2
- Google: training with TPU v1/v2/v3
- Huawei: training with Ascend 910; inference with Ascend 310
- Alibaba: inference with Hanguang 800
- Cambricon: inference with Siyuan 270/220/100
4. Chip Development Trends

Development trends of general-purpose chips

Arm chips dominate the embedded system market, while x86 chips dominate the desktop and data center markets. However, the competition between the two has never ceased. Some server-class Arm chips have emerged and are now competing for the data center market, although x86 chips still take the lead there.

Arm chips have weak single-core performance, so they use a large number of CPU cores for better overall performance. x86 chips have strong single-core performance, so they use relatively few cores. Because of these architectural differences, the two types of chips suit different service scenarios.

Due to slow manufacturing process evolution, deep x86 pipelines, and difficulties in achieving microarchitecture breakthroughs, x86 chips are likely to integrate more cores. Arm, in turn, has launched the Cortex-X architecture aimed at the data center market, pursuing maximum performance while de-emphasizing performance per watt. In other words, Arm chips will evolve from small cores to large cores.

- Microarchitecture evolution: Instructions per cycle (IPC) will improve by about 10% each generation. IPC remains an important indicator of the processing capability of general-purpose chips. Chips can integrate larger caches, more execution units, and more accurate branch prediction and task scheduling mechanisms to continuously improve IPC.
- Process evolution: General-purpose chips currently use the 7 nm process, with 5 nm and 3 nm processes in the pipeline. Although chip miniaturization is nearing its physical limits, process evolution is still key to improving chip performance.
- Power consumption control: The increasing number of general-purpose chip cores expands chip size. A more refined and intelligent power consumption control mechanism is required to enable large-scale commercial use.
- Interconnection bandwidth: Multiple chips can be interconnected to enhance single-node processing capabilities.

Development trends of AI chips

Currently, AI chips can surpass human beings in some specific tasks. However, they are far from reaching human intelligence in terms of universality and adaptability. Most AI chips are used only to accelerate specific algorithms and cannot process general tasks. In the future, general-purpose AI chips are forecast to have the following capabilities:

- Programmability: AI chips can adapt to algorithm evolution and application diversity.
- Dynamic architecture variability: AI chips can adapt to different algorithms to implement efficient computing.
- High computing efficiency: Algorithms have endless requirements on computing power, and insufficient computing power restricts the implementation of many algorithms.
- High energy efficiency: A high energy efficiency ratio (EER) allows AI chips to be applied in embedded devices.
- Easy application development: AI chips can provide complete software stacks to facilitate AI development.
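As a quick illustration of the compounding effect of the roughly 10%-per-generation IPC improvement noted above (the five-generation horizon is an arbitrary choice for the example):

```python
# Compound a 10% per-generation IPC gain over five generations.
ipc = 1.0
for generation in range(5):
    ipc *= 1.10  # 10% IPC improvement per generation

# Five generations compound to roughly a 61% uplift at equal clock speed.
print(round(ipc, 2))
```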
Kang Ming, Ding Fuqiang, Liu Yanyan
Algorithm Repository Technology
1. Background

As artificial intelligence (AI) technology proliferates, using intelligence to improve the efficacy of video and image capture has become an industry trend. Intelligent analysis now covers both common intelligent objects (examples: pedestrians, motor vehicles, and non-motorized vehicles) and algorithms dedicated to specific scenarios (examples: smoke and fire detection in the warehousing industry, safety helmet detection in the construction industry, and discharge detection in the water conservancy management industry).

This situation places new requirements on system architecture: construct a multi-algorithm platform to support video and image intelligence. Such a platform must support multiple algorithms, algorithms from multiple vendors, and the coexistence of multiple versions. It must provide optimal resource utilization, flexible scheduling capabilities, and unified APIs for integrating service-oriented algorithms; large-scale data management and flexible orchestration for rapidly combining service-oriented algorithms; and a mechanism for quick algorithm optimization and "survival of the fittest", ensuring efficient application implementation. Algorithm repository technology is designed to meet these requirements.

2. Traditional Algorithm Repository Solutions

In the early stage of the industry, some solutions were provided to support co-deploying multiple algorithms in the same system. The following figure shows the integration of multiple algorithms from different independent software vendors (ISVs).

Figure: The application system integrates separate algorithm & application devices from various vendors (vehicle algorithm devices from vendors A and B, other algorithm devices from vendors C and D).
This solution is, in principle, able to integrate the service capabilities of multiple algorithms: it supports separate algorithm deployment on devices and allows the application system to integrate algorithm & application devices from various vendors. Although it is the solution most commonly provided by traditional video surveillance vendors, it is also the most inefficient and wasteful solution for users.

In general, this solution is a stack of independent algorithm services with no overall algorithm repository architecture or technical support. Its weaknesses are as follows:
- Algorithms are deployed independently, so hardware resources cannot be shared.
- Service data is stored on devices and cannot be shared.
- The application system needs to adapt to the open APIs of various devices, which requires a heavy integration workload.
- Algorithm devices are independent of each other. To associate services between devices, the application system must perform comprehensive processing, which poses technical challenges.
- Systems must be maintained separately, and hardware and software differences between systems increase maintenance difficulty.
- Algorithm evolution requires support from algorithm vendors.
3. Current Algorithm Repository Solutions in the Industry

As AI and cloud-based big data technologies mature, AI algorithms have increasing requirements on computing power, new algorithms and algorithm types are being released at a faster rate, and more and more scenario-based algorithm applications have come into being. Building a more efficient and sustainable resource management system that complies with this technology evolution trend has become a common requirement. Traditional video surveillance vendors, as well as emerging companies specializing in AI and professional cloud-based big data, have provided technical solutions to meet it.
Figure: Logical architecture of the algorithm repository. Algorithm management covers algorithm configuration, algorithm scheduling, algorithm running monitoring, algorithm evaluation, and algorithm orchestration. The algorithm platform hosts analysis and search/clustering algorithms from multiple vendors and versions, and opens service capabilities such as person service, vehicle service, behavior analysis, and holographic profile. The main technical solutions are the algorithm SDK integration solution, the algorithm container image integration solution, and the algorithm app service integration solution.

Figure: Existing solutions and their weaknesses. Capabilities such as person analysis, vehicle analysis, structured analysis, behavior analysis, scenario-specific analysis, person search by feature, vehicle search by feature, structured-data-based search, feature clustering, and video synopsis are stacked as independent services across vendors (A and B) and versions (A and B).
Algorithm SDK integration solution

Technically speaking, most algorithm vendors (such as Hikvision, Dahua, and Uniview) have launched their own algorithm SDK integration solutions. In these solutions, most self-developed or purchased algorithms are integrated via SDKs, but multi-algorithm capability is not supported. Currently, only Huawei's algorithm repository has achieved large-scale SDK integration: nearly 50 algorithms have been integrated, providing users with a large-scale SDK interconnection experience.

Figure: Logical architecture of the Huawei multi-algorithm SDK integration solution. ISV applications access the vPaaS algorithm repository through eSDK/RESTful APIs and an API gateway. Video/image analysis and search services run in containers and integrate algorithms either as vendor algorithm plug-ins (SDK plug-in mode, with per-domain repositories for vehicle, person, data structuring, and long-tail algorithms) or as vendor service images (API mode). Semi-structured data is kept in a data lake (video/image, feature, and structured data), hardware is shared among cloud platforms, and AI accelerator cards and CPUs form the IaaS layer, with video/image access from cameras and networked platforms.
Algorithm container image integration solution
The following figure shows the logical architecture of the algorithm container image integration solution. In this solution, algorithms are packaged into container images. The algorithms in the images have basic algorithm service capabilities and provide services to external systems through APIs. Upper-layer applications manage a combination of algorithm images to form an overall service.

This solution is used by Alibaba and Huawei (including HUAWEI CLOUD EI). It is applicable to the integration of analysis and recognition algorithms, but not to the integration of search or match algorithms.

Figure: Logical architecture of the algorithm container image integration solution. A vendor's application invokes intelligent video/image recognition and search algorithm containers in the algorithm repository through personalized APIs, on top of a data lake (video, images, and structured data) and sharable cloud platform hardware.
Algorithm app service integration solution

The algorithm app service integration solution is essentially similar to traditional platform or device connection solutions. It can be classified as proactive (initiator: platform) or passive (initiator: algorithm service). Both types require connection to the APIs designed for algorithm services or platforms. The following figure shows the logical architecture of the algorithm app service integration solution.

Figure: Logical architecture of the algorithm app service integration solution. An ISV application (examples: Ropeok, Hikvision, and Kedacom) either invokes algorithm services at the application layer through a platform or adapts directly to the APIs of algorithm services running in containers/VMs, with unified capability openness.

4. Benefits to Intelligent Video Surveillance

Algorithm repository technology can fully utilize the on-demand allocation and elastic capacity expansion of cloud-based resources to meet changing resource requirements in actual use. Services derived from algorithm repository technology, such as traffic-based scheduling and inter-domain algorithm collaboration, bring high practical value to users.
Traffic-based scheduling

Figure: GPU resource pooling with dynamic, elastic algorithm scheduling based on service awareness. Tasks enter a priority queue ranging from urgent tasks (fast analysis of recordings), through important tasks (analysis of video from sites for key organizations and key public places) and common tasks (virtual checkpoints in non-constraint scenarios, access control systems in communities, and video from common sites), down to idle-resource-reuse tasks (non-critical site analysis, algorithm updates and data cleansing, and past video analysis). A resource scheduling policy center performs status and traffic monitoring and dispatches analysis (active or standby algorithm), video structuring, and other tasks through traffic distribution and caching. Scenario 1, off-peak hours of the active algorithm: the traffic difference between day and night is substantial, so otherwise idle resources run low-priority tasks at night. Scenario 2, peak hours of the active algorithm: during video surveillance for emergencies or major events, resources are released from lower-priority tasks. Scenario 3, burst traffic of the active algorithm: traffic aggregation at checkpoints during morning and evening rush hours is absorbed through real-time, dynamic, elastic scheduling.
Traffic-based scheduling improves hardware resource usage, implements on-demand resource scheduling between multiple algorithms in a system, and supports resource sharing between services, meeting the requirement to scale specified service resources out or in within a short period of time.

Inter-domain algorithm collaboration

The upper-level domain is responsible for building the overall algorithm repository. Lower-level domains download the algorithms they need from the upper-level domain for specific services, fully utilizing the resources of both the upper- and lower-level domains. In addition, traffic-based scheduling can be used to pool resources for an entire area, implementing one cloud for the whole area, or even the whole country, in terms of hardware resources and service capabilities.

Figure: Inter-domain algorithm collaboration with cloud-edge synergy. The upper-level domain matches and downloads algorithms (vehicle algorithm, synopsis algorithm, and others) to lower-level domains, each of which provides a service entry for analysis and search tasks.
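The traffic-based scheduling behavior described above can be sketched with a simple priority queue (all task names, priorities, and the GPU count are illustrative): a fixed GPU pool always serves the highest-priority pending work, so idle hours are filled with backlog tasks and burst traffic displaces them.

```python
import heapq

GPUS = 4  # illustrative size of the pooled GPU resource

def schedule(tasks: dict, gpus: int = GPUS) -> set:
    """Assign GPUs to the highest-priority tasks; lower numbers mean higher priority."""
    queue = [(priority, name) for name, priority in tasks.items()]
    heapq.heapify(queue)
    picks = min(gpus, len(queue))
    return {name for _, name in (heapq.heappop(queue) for _ in range(picks))}

# Night: only backlog work is pending, so otherwise idle GPUs run low-priority tasks.
night = {"past_video_analysis": 3, "algorithm_update": 3, "data_cleansing": 3}

# Morning rush: burst checkpoint traffic (priority 0) displaces backlog work.
rush = {"checkpoint_analysis": 0, "key_site_analysis": 1,
        "common_site_video": 2, "past_video_analysis": 3, "data_cleansing": 3}

print(schedule(night))
print(schedule(rush))
```

A production policy center would add traffic monitoring and preemption of running tasks; the heap captures only the core rule that the pool always drains from the top of the priority order.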
5. Challenges and Trends

Challenges

Coupling: In the algorithm SDK integration solution, the algorithm repository framework and the algorithms share the same open-source software, so an algorithm may fail to run properly when open-source software versions mismatch; the algorithm vendor and the algorithm repository platform vendor then need to work together to solve the problem. If the algorithm SDK is instead deployed as an independent process, performance deteriorates due to inter-process calls, especially when processing large amounts of data.

Business mode: The traditional method of using independent algorithm devices is prone to resource siloing. Algorithm repositories, based on a cloud architecture, can solve this problem, but they require algorithm vendors to separate algorithms from services, which reduces device sales and affects device-based solutions. Algorithm vendors and platform vendors therefore need to find a new business cooperation mode to achieve a win-win outcome.

Standards: Industry groups have been organizing the discussion and formulation of technical standards for algorithm repositories. However, due to the business model issues above and vendors' own interests, industry vendors are not active in formulating these standards (especially SDK integration standards). Vendors that already use algorithm repositories need to develop more high-level services and solutions on top of them to attract users, who will in turn drive major vendors to increase investment in and support for algorithm repository technology.

Trends

Driven by user requirements for hierarchical construction and the separate management of data, algorithms, and computing power, the traditional mode of one vendor leading one project no longer exists. This drives vendors to seek new cooperation modes and therefore provides great opportunities for promoting and popularizing algorithm repository technology.

Once industry standards for algorithm repository technology are released, industry vendors can converge on a standard, cloud-based algorithm repository architecture. This will encourage the development of more useful industry applications and allow for an all-encompassing cloud where multiple algorithms coexist and various upper-layer applications can be developed.
Cheng Min, Huang Jinxin, Shi Changshou, Fang Guangxiang, Jia Lin

SuperColor Technology

SuperColor technology utilizes innovative sensor and algorithm-based image processing so that cameras can deliver sharp, full-color images at night, improving recognition accuracy in extreme darkness. Conventional night-time capture, by contrast, relies on supplemental lighting, which causes severe light pollution, especially at traffic checkpoints, affecting driver safety and disturbing nearby residents. In recent years, various manufacturers have endeavored to improve night image quality while reducing light pollution, but have so far made little progress, so this issue remains an important area of research within the camera industry.

1. High-Sensitivity Sensor Technology
Traditional image sensors adopt the Bayer filter, a 4 x 4 array consisting of four 2 x 2 grids. Each grid contains one red pixel (R), one blue pixel (B), and two green pixels (G), and light energy in the red, green, and blue spectra passes through pixels of the corresponding color.

High-sensitivity sensors adopt color filters with higher light transmittance to enhance the light sensitivity of each pixel. Cameras equipped with these sensors can deliver images with a higher signal-to-noise ratio (SNR) in low-light conditions, improving image-based recognition. The following figure compares the light transmittance of a sensor with a Bayer filter and that of a high-sensitivity sensor.

- RYYB sensor: Replaces the green pixels with yellow pixels that pass both the green and red spectra, improving light transmittance by 40% compared with a sensor with a Bayer filter.
- RCCB sensor: Replaces the green pixels with clear pixels that pass the full visible spectrum, improving light transmittance by 80% compared with a sensor with a Bayer filter.
- RGB-IR sensor: Adds infrared (IR)-sensitive pixels to the traditional Bayer array. In low-light conditions, IR light supplements the visible spectrum, enabling the sensor to deliver images with a high SNR.
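Under a simplified linear model, the green signal at an RYYB yellow pixel can be recovered by subtracting the red estimate, since a yellow pixel responds to red plus green light. The sketch below illustrates this idea; all numeric values are invented for the example and real pipelines use calibrated color-correction matrices.

```python
# Illustrative linear responses at one RYYB pixel site (arbitrary units).
red_response = 40      # measured at a neighboring R pixel
yellow_response = 100  # measured at the Y pixel: passes red + green light

# Simplified model: Y = R + G, so the green component is recovered by subtraction.
green_estimate = yellow_response - red_response
print(green_estimate)

# The Y pixel collects more photons than a plain G pixel would (100 vs 60 here),
# which is the transmittance gain that improves low-light SNR.
assert yellow_response > green_estimate
```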
SuperColor Technology
[Figure: pixel layouts of a sensor with the Bayer filter (repeating R G / G B pattern; light transmittance about 1/3) and high-sensitivity sensors (RYYB, RCCB, RGB-IR; light transmittance about 2/3). Color mixing: Red + Green = Yellow; Red + Blue = Purple; Blue + Green = Cyan; Red + Green + Blue = White.]
[Figure: conventional spatial noise reduction, showing the NL-means algorithm and BM3D algorithm flows with an effect comparison.]
2. DNN ISP-based Noise Reduction
Noise is a kind of image distortion that occurs during the signal acquisition process. There are several types of noise, including dominant shot noise, dark-current shot noise, thermal noise, fixed-pattern noise, and readout noise. Generally, to obtain high-quality images, we need to eliminate noise in images without damaging information integrity. In short, noise reduction removes valueless information from images and improves encoding efficiency. Conventional noise reduction algorithms are classified into two categories: spatial and temporal.
Spatial noise reduction, also called single-frame noise reduction, reduces noise by processing frames individually. The non-local means (NL-means) algorithm and block matching 3D filtering (BM3D) algorithm are common spatial noise reduction methods. The NL-means algorithm takes a target pixel, searches the entire image for similar pixels, weights them by similarity, and then takes the weighted mean of these pixels to optimize the image. The BM3D algorithm searches for pixels similar to the target pixel, transforms them to the frequency domain, performs filtering and thresholding, and then transforms the result back to the spatial domain.
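The NL-means idea can be sketched in a few lines. The following is an illustrative single-pixel version; the patch size, search window, and filtering parameter h are arbitrary values chosen for clarity, not values from any real camera pipeline:

```python
import numpy as np

def nl_means_pixel(img, y, x, patch=3, search=7, h=10.0):
    """Denoise one pixel with a simplified NL-means: weight every patch
    in a search window by its similarity to the patch around (y, x),
    then take the weighted mean of the candidate pixel values."""
    pad = patch // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    ref = padded[y:y + patch, x:x + patch]        # patch centered on (y, x)
    half = search // 2
    weights, values = [], []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            cy, cx = y + dy, x + dx
            if not (0 <= cy < img.shape[0] and 0 <= cx < img.shape[1]):
                continue
            cand = padded[cy:cy + patch, cx:cx + patch]
            dist2 = np.mean((ref - cand) ** 2)    # patch similarity
            weights.append(np.exp(-dist2 / (h * h)))
            values.append(img[cy, cx])
    return float(np.sum(np.array(weights) * np.array(values)) / np.sum(weights))
```

Averaging over many similar patches pulls an outlier pixel back toward the values of its lookalikes elsewhere in the image.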
When the sensor size, pixel size, and processing method of a high-sensitivity sensor are the same as those of a Bayer sensor, the high-sensitivity sensor delivers higher-quality images at night, as shown in the following figures.
Imaging effect of the Bayer sensor (0.001 lux) Imaging effect of the high-sensitivity sensor (0.001 lux)
[Figure: BM3D flow: search for similar pixels, integrate them into a 3D matrix, conduct 3D linear transformation, perform filtering and thresholding, perform the 3D inverse transform, and generate denoised pixels. NL-means flow: search for similar pixels, run the weighted average algorithm, and generate denoised pixels.]
Cheng Min, Huang Jinxin, Shi Changshou, Fang Guangxiang, Jia Lin
Temporal noise reduction, also known as multi-frame noise reduction, introduces several adjacent frames (temporal domain information) and performs weighted averaging on similar pixels in the spatial domain to reduce noise. However, if there are moving objects in two consecutive frames, an error occurs when two pixels belonging to different objects are filtered, resulting in motion blur. Therefore, the objective of temporal noise reduction is to accurately detect the motion strength and perform weighted averaging on the results of temporal filtering and spatial filtering. When the motion strength is considerable, the temporal-domain weight coefficient decreases and the spatial-domain weight coefficient increases. When the motion strength is minor, the temporal-domain weight coefficient increases and the spatial-domain weight coefficient decreases.
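The weighting logic described above can be sketched as follows. The box blur, linear weighting curve, and motion threshold here are simplifications chosen for clarity, not the actual filter design:

```python
import numpy as np

def motion_adaptive_denoise(prev, curr, motion_thresh=8.0):
    """Blend temporal and spatial filtering per pixel: where motion is
    strong, trust the spatially filtered current frame; where the scene
    is static, average with the previous frame (temporal filtering)."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    # Motion strength estimated as the absolute frame difference.
    motion = np.abs(curr - prev)
    # Temporal-domain weight shrinks as motion strength grows.
    w_temporal = np.clip(1.0 - motion / motion_thresh, 0.0, 1.0)
    # Simple 3x3 box blur stands in for the spatial filter.
    padded = np.pad(curr, 1, mode="reflect")
    spatial = sum(
        padded[dy:dy + curr.shape[0], dx:dx + curr.shape[1]]
        for dy in range(3) for dx in range(3)
    ) / 9.0
    temporal = 0.5 * (curr + prev)
    return w_temporal * temporal + (1.0 - w_temporal) * spatial
```

In static regions the temporal weight dominates and noise averages out across frames; in moving regions the spatial result dominates, avoiding motion blur.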
Traditional noise reduction technologies are unable to retain image details or edge information. Additionally, they cannot accurately evaluate motion strength, resulting in motion blur when obvious noise is removed. Against this backdrop, the deep neural network (DNN) image signal processing (ISP)-based noise reduction technology has been proposed. Based on a brand-new algorithm architecture, it is better at distinguishing between motion regions and non-motion regions, and between noise and image details, effectively resolving the problems that traditional noise reduction technologies are unable to overcome.
The DNN ISP-based noise reduction algorithm consists of a preprocessing algorithm, deep learning network, and post-processing algorithm. The preprocessing algorithm transforms data into a format suitable for network input and sends images to the deep learning network, which is formed of convolutional layers. Then, the network reduces noise on these images according to a pre-trained weight, and sends them to the post-processing unit where they are transformed into denoised images.
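The three-stage structure (preprocessing, convolutional network, post-processing) can be sketched as follows. The hand-written smoothing kernels stand in for weights that a real network would learn from data containing real noise, and the "network" here is deliberately tiny; this only illustrates the data flow, not the actual algorithm:

```python
import numpy as np

def conv2d(x, kernel):
    """'Same'-size convolution of a single-channel image with one kernel."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, pad, mode="reflect")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def dnn_isp_denoise(frame, weights):
    """Preprocessing -> stacked convolutional layers -> post-processing."""
    x = frame.astype(np.float64) / 255.0            # preprocess: normalize
    for kernel in weights:                          # "network": conv layers
        x = np.maximum(conv2d(x, kernel), 0.0)      # ReLU activation
    return np.clip(x * 255.0, 0, 255)               # postprocess: denormalize

# Stand-in "pre-trained" weights: two smoothing kernels.
weights = [np.full((3, 3), 1.0 / 9.0), np.full((3, 3), 1.0 / 9.0)]
```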
The DNN ISP-based noise reduction technology trains the network using data that includes real noise, which enables it to reduce 3 dB to 6 dB more noise than a conventional algorithm and facilitate more accurate image recognition.
Effect of conventional noise reduction algorithm Effect of DNN ISP-based noise reduction algorithm
[Figure: conventional temporal noise reduction: temporal-domain filtering and spatial-domain filtering of the motion area are combined into a weighted filtering result. DNN ISP-based noise reduction: a video frame passes through preprocessing, the deep learning network, and post-processing to produce a denoised video frame.]
3. Formulation of Glare Measurement Standard

In addition to sensor and image processing technologies, night-time image capture is also affected by lighting conditions. Illuminators can provide improved lighting, but this causes uncomfortable glare in scenarios such as urban roads and alleys, and has led to numerous complaints from nearby residents. Therefore, the industry desperately needs a standard by which glare can be measured and evaluated.

Definition of glare
Glare refers to visual conditions in which there is excessive contrast or an inappropriate distribution of light sources that disturbs the observers or limits their ability to distinguish details and objects.

Measurement of glare
Currently, in most scenarios, glare is measured using the unified glare rating (UGR) proposed by the International Commission on Illumination (CIE), and this method has been widely applied in the field of lighting:

UGR = 8 lg[(0.25 / Lb) x Σ (Ls^2 x w / P^2)], summed over all n light sources

In the formula above:
Lb: background luminance (unit: cd/m2)
Ls: luminance of each light source from the perspective of the observer (unit: cd/m2)
w: solid angle from the light source to the eyes of the observer
P: position index of each light source
n: number of light sources (positive integer)

Glare levels:
Glare Index (UGR) | Description
10 | Imperceptible
16 | Acceptable
19 | Borderline
22 | Uncomfortable
28 | Disabling

However, when it comes to video surveillance, light compensation is generally used in scenarios where illumination is extremely poor, and the UGR cannot be directly applied. Instead, a threshold increment formula is introduced as another measurement of glare in the video surveillance industry. Threshold increment is the measure of disability glare expressed as the percentage increase in contrast required between an object and its background for it to be seen equally well with a source of glare present:

TI = 65 x Lv / Lav^χ (%)

In the formula above:
Lv: luminance received by the observer (unit: cd/m2)
Lav: average background luminance (unit: cd/m2)
χ: correction index (0 < χ < 1)

From the preceding formula, we can infer that the threshold increment can be reduced by lowering Lv and increasing Lav simultaneously.

Currently, there is no universal method for measuring glare within the video surveillance industry. In addition, the values of Lv and Lav can only be obtained through complex theoretical calculations and optical design. The threshold increment formula provides a quantitative way to measure disability glare in the industry and helps us to formulate appropriate standards. As a result, vendors will be able to produce illuminators that meet the glare index requirements.

4. Intelligent Light Compensation

Currently, the industry primarily adopts IR flash, LED strobe, and dual-spectrum fusion technologies to reduce the glare of illuminators at night, but color cast and LED light pollution remain significant. Therefore, intelligent light compensation technology has been introduced to enable cameras to capture sharp, full-color images at night without producing severe light pollution.
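The threshold increment relationship can be turned into a small calculator. Note that the scaling constant of 65 is an assumption borrowed from the CIE road-lighting convention, since the exact constant in the document's formula did not survive reproduction; the correction index χ is the `chi` parameter:

```python
def threshold_increment(lv, lav, chi=0.8, k=65.0):
    """Disability glare as a percentage: TI = k * Lv / Lav**chi.

    lv  : luminance received by the observer (cd/m^2)
    lav : average background luminance (cd/m^2)
    chi : correction index, 0 < chi < 1
    k   : scaling constant (65 assumed, per the CIE road-lighting convention)
    """
    if not (0 < chi < 1) or lav <= 0:
        raise ValueError("require 0 < chi < 1 and lav > 0")
    return k * lv / (lav ** chi)
```

The formula's behavior matches the inference in the text: TI falls when Lv is lowered or when Lav is raised.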
Traditional light compensation
LED illuminators at traffic checkpoints are a main source of light pollution. High-speed vehicles pass the snapshot area quickly, and the snapshot distance ranges from 20 m to 30 m, within which LED illuminators produce little light pollution. However, traditional omnidirectional light compensation produces large light pollution areas and severe glare at distances of over 50 m, affecting drivers on the target and surrounding lanes. Additionally, the illuminators scatter their luminous energy, resulting in poor light compensation efficiency.

Intelligent light compensation
This technology enables cameras to compensate a specific amount of light for targets and regions of interest (ROIs), performing weak light compensation for video streaming (light intensity: 0–20 lux) and strong light compensation for image capture (light intensity: 50–100 lux). This means gas-discharge flash lights (light intensity: 20,000 lux) are no longer needed and light pollution is effectively reduced. Additionally, the light cup and architecture can be altered to effectively cut off stray light and glare, thereby reducing light pollution and improving light compensation efficiency.
This targeted, intelligent light compensation technology helps effectively cut off stray light and eliminates the need for gas-discharge flash lights. It can be integrated with high-sensitivity sensor and DNN ISP-based noise reduction technologies to enable cameras to capture sharp, full-color images in low light conditions without producing light pollution.
[Figure: effect comparison: severe color cast under traditional light compensation versus sharp, full-color images under intelligent light compensation.]
[Figure: light distribution comparison. A traditional illuminator (LED strobe light plus gas-discharge flash light) performs omnidirectional light compensation with a large light pollution area, scattered luminous energy, and poor compensation; its stray light has a limited impact on driver 1 but severely affects driver 2 and nearby drivers and passengers. An environment-friendly smart LED illuminator performs weak light compensation for video (0–20 lux) and strong light compensation for snapshots (50–100 lux), needs no flash lights and therefore produces no light pollution, provides excellent light compensation at 18–40 m for a better snapshot effect, and cuts off light at distances of over 50 m to ensure safe driving.]
Gong Junhui, Yue Boxuan, Xu Zhen
Video Codec Technology
It’s widely acknowledged that we live in the mobile Internet era, in which streaming media, and more specifically, diverse video formats – from humorous short video clips on social networking apps, to interactive live streams, and ubiquitous video surveillance, have reinvented daily life. However, the sheer amount of video data generated after image collection can be enormous, which places great strain on video transmission and backend storage. Fortunately, the arrival of video codec technology has made video transmission and storage easier and more efficient than ever.
When a camera is used to shoot a 1-minute 4K video (3840 x 2160 pixels), the uncompressed video data volume is about 17.38 GB. Under this scenario:
If the bandwidth is 100 Mbit/s, it takes about 24 minutes to transmit the 1-minute video.
At full speed, a 10 TB hard disk can store only about 9 hours of such video.

1. Foundational Concepts
A video feed is a sequence of images (called frames) captured and eventually displayed at a given frequency. The frames are classified into three types: I-frame, P-frame, and B-frame.

[Figure: a group of pictures consisting of an I-frame, P-frames, and B-frames.]

I-frame: also called the key frame, a single frame of digital content that the compressor examines independently of the frames that precede and follow it, and that stores all of the data needed to display that frame. I-frames contain the largest number of bits and therefore take up the most space on the storage medium. After proper intra-frame compression, decoding only requires the current frame data. An I-frame can be used as the reference frame for subsequent frames, or as a standalone image.

P-frame: a predicted frame that holds only the changes in the image from the previous frame. During decoding, the previously buffered video frames need to be overlaid with the difference defined by the current frame to produce the final video images.

B-frame: a bi-directional predictive frame that records the difference between the current frame and both its preceding and succeeding frames. During decoding, the preceding and succeeding frames need to be overlaid with the current frame to produce the final video images. Because it depends on subsequent frames, this frame type is not suitable for real-time transmission, for example, in video conferences.
2. Main Process
Video codec technology involves compressing video in a step-by-step manner. Current video codec solutions are mainly in the hybrid encoding format (predictive coding + transform coding), and contain the following main processes: prediction, transformation, quantization, and entropy encoding.
[Figure: hybrid encoding framework. The original video undergoes prediction (intra-frame prediction eliminates spatial redundancy; inter-frame prediction eliminates temporal redundancy), producing prediction residuals; transformation converts the residuals from the time domain to the frequency domain, producing transformation coefficients; quantization produces quantized data; and entropy encoding produces the encoded video for channel transmission. Decoding reverses the process through dequantization and anti-transformation to produce the decoded video. Source: Overview of the High Efficiency Video Coding (HEVC) Standard]
Prediction: uses intra-frame prediction and inter-frame prediction technologies to eliminate spatial and temporal redundancy in video for compression.
Transformation: converts data from the time domain to the frequency domain to eliminate the correlation between adjacent data, that is, spatial redundancy elimination.
Quantization: eliminates information that is imperceptible to human eyes, reducing the amount of encoded data and improving the compression ratio, that is, psychovisual redundancy elimination.
Entropy encoding: reduces data redundancy based on the probability characteristics of the data to be encoded.
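The transformation and quantization stages can be illustrated with a toy 8 x 8 block codec. This sketch uses a floating-point orthonormal DCT and a single flat quantization step; real codecs use integer transforms and per-frequency quantization matrices, so this only demonstrates the principle:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    m = np.zeros((n, n))
    for k in range(n):
        for i in range(n):
            m[k, i] = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def encode_block(block, qstep=10.0):
    """Transform (spatial domain -> frequency domain), then quantize."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T            # 2-D DCT: energy compaction
    return np.round(coeffs / qstep)     # quantization: the lossy step

def decode_block(qcoeffs, qstep=10.0):
    """Dequantize, then inverse transform back to the spatial domain."""
    d = dct_matrix(qcoeffs.shape[0])
    return d.T @ (qcoeffs * qstep) @ d
```

For a smooth block, most quantized coefficients become zero (which entropy encoding then compresses very well), while the reconstruction error stays bounded by the quantization step.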
3. History of Video Codec Standards
During the entire video codec process, including prediction, transformation, quantization, and entropy encoding, numerous methods can be used within each sub-process to perform phased compressed encoding. For example, discrete cosine transform (DCT) and discrete wavelet transform (DWT) algorithms can both facilitate transform coding, contributing to fully compressed encoding.
As early as 1993, the ITU Telecommunication Standardization Sector (ITU-T) had formulated the first video codec standard H.261, to ensure the interoperability of products from different vendors. With the flourishing of the video industry, video codec standards have continued to evolve.
The following table provides a comparative overview of the three common video codec standards.
In the video surveillance field, H.264 is the dominant video codec standard in use. Due to their advanced technology and standout performance, H.265 chips are expected to eventually replace H.264 chips, and become the mainstream technology within the industry.
[Figure: rate-distortion comparison (PSNR in dB versus bit rate in kbit/s) of JPEG (1990), H.261 (1991), H.262/MPEG-2 (1995), H.264/MPEG-4 AVC (2003), and H.265/MPEG-HEVC (2013). Each new generation achieves a bit-rate reduction of about 50% at the same quality. Source: Versatile Video Coding (VVC), Fraunhofer Heinrich Hertz Institute]
Standard | H.264 (AVC) | H.265 (HEVC) | H.265+ (Video Surveillance)
Background | Proposed by ITU-T and ISO/IEC in 2003 | Proposed by ITU-T VCEG in 2013 | Launched by video surveillance vendors for surveillance scenarios based on H.265 (HEVC)
Bit-rate usage | 100% | 50% | 35%
Complexity | 100% | 300% | 300%
Main technical methods | Motion compensation for multiple reference frames, adaptive 4 x 4 and 8 x 8 integer conversion, adaptive frame encoding, etc. | Multiple transform sizes (from 4 x 4 to 32 x 32), multiple intra-frame prediction modes, tree-structure prediction units, etc. | Dynamic GOP, dynamic FPS, dynamic ROI, long-term reference frame and background frame, etc.
4. ROI Encoding Technology
In video surveillance scenarios, certain areas under observation, such as the sky and grassy areas, can be neglected. Encoding and transmitting data from the entire surveillance area can unnecessarily strain network bandwidth and storage resources.
With the H.264/H.265 standards, encoding optimization is performed on regions of interest (ROIs) in video surveillance scenarios. ROIs are extracted using an ROI extraction algorithm. During encoding and quantization, ROIs are encoded normally, while non-ROIs are compressed more heavily. This lessens network bandwidth and storage space demands without compromising overall image quality.
ROI extraction is the basis of ROI coding technology, and directly determines the final effects of video coding. ROI extraction algorithms can be classified into background modeling algorithms and deep learning-based object detection algorithms.
These days, Gaussian Mixture Model (GMM) and Visual Background Extractor (ViBE) are the most prevalent background modeling algorithms for advanced foreground object segmentation. The background modeling algorithm development process consists of the following three steps:
1. Background model setup: establishing a mathematical model by extracting background features in a video sequence.
2. Foreground detection: subtracting the detected image from the background model to identify and locate moving objects.
3. Model update: updating the background model at a specific rate to adapt to changes to background objects, such as illumination, rain, and fog.
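The three steps can be sketched with the simplest possible background model, a running average. GMM and ViBE are far more sophisticated, but they follow the same setup/detect/update loop; the threshold and learning rate below are arbitrary illustrative values:

```python
import numpy as np

class RunningAverageBackground:
    """Minimal background model: (1) initialize from the first frame,
    (2) detect foreground by thresholded subtraction, and (3) update
    the model at a fixed rate to absorb slow scene changes."""

    def __init__(self, first_frame, threshold=25.0, learning_rate=0.05):
        self.model = first_frame.astype(np.float64)   # step 1: model setup
        self.threshold = threshold
        self.rate = learning_rate

    def apply(self, frame):
        frame = frame.astype(np.float64)
        # Step 2: foreground = pixels that differ strongly from the model.
        mask = np.abs(frame - self.model) > self.threshold
        # Step 3: blend the new frame into the model (background adapts).
        self.model = (1 - self.rate) * self.model + self.rate * frame
        return mask
```

Feeding frames in sequence yields a foreground mask per frame, which is the raw material for ROI extraction.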
[Figure: background modeling flow. Video input initializes the background model; the model is used to detect the foreground and output the object detection result, which in turn updates the background model.]
The foreground indicates the extracted ROIs and the background indicates the non-ROIs.
[Figure: ROI encoding flow. Live video input undergoes ROI extraction and in-depth coding, with normal encoding of the foreground, compression of the background, and real-time re-coding of foreground and background, before video output.]
In deep learning-based object detection, a video image is transferred to a deep neural network, for example, a convolutional neural network (CNN or ConvNet). Analysis capabilities are acquired through the deep neural network model, which learns the internal rules of a large amount of sample data. These capabilities can be further utilized to categorize all objects in images and predict the corresponding locations of those objects. Common deep learning-based object detection algorithms include the region-based convolutional neural network (R-CNN) algorithm, the you only look once (YOLO) algorithm, and the single shot detector (SSD) algorithm.

The benefits and drawbacks of conventional background modeling algorithms and deep learning-based object detection algorithms are as follows:

ROI Extraction Algorithm | Benefits | Drawbacks
Background modeling algorithm | Foreground tracking is comparatively accurate. | The surveillance system is sensitive only to moving objects, resulting in the excessive extraction of moving objects such as animals and waving leaves from video images, rather than specific objects, such as motor vehicles, non-motorized vehicles, and individuals, which may be of interest for video surveillance purposes.
Deep learning-based object detection algorithm | Target regions are accurately recognized. | ROIs are extracted as rectangles, containing many redundant areas.

Some vendors in the video surveillance industry have made enhancements to address the disadvantages of these ROI extraction algorithms. New and improved ROI extraction algorithms take into account both moving objects and the specific objects targeted by video surveillance services, and deeply filter out interference in the form of animals, shaking greenery, or other elements. In addition, the ROI can be protected within a limited number of bytes, moving from rectangular cutouts to refined pixel-level segmentation and delicate edge extraction, for the following purposes:
To save storage space and transmission bandwidth resources.
To ensure image quality via an optimal peak signal-to-noise ratio (PSNR).
To maintain high-precision video surveillance services.
Zhou Jiangli, Xu Zhen
Storage EC Technology

1. Introduction
Redundant array of independent disks (RAID) is a well-known basic disk array technology used in storage systems. Nowadays, there are multiple RAID levels, each of which provides different kinds of data protection.

RAID 5 stores data and parity information on different disks. If one disk is damaged, RAID 5 automatically uses the remaining parity information to recreate the corrupted data once the disk has been replaced, ensuring data integrity.

However, traditional RAID 5 technology has some limitations:
1. It only tolerates single-disk faults: each RAID group can only withstand one disk failure at a time.
2. At least one global hot spare disk needs to be prepared for each node.
3. An independent RAID controller card needs to be configured.

Storage erasure coding (EC) is a method of data protection in which the original data and parity data are stored on different nodes, effectively breaking through the limitations of traditional RAID 5 technology. The protection level of EC technology is represented by N + M, where N indicates the number of data fragments and M indicates the number of parity fragments. Storage EC offers the following advantages:
1. Data is protected as long as the number of failed disks does not exceed the limit.
2. No independent hot spare disks are required, and data can be read from and written to all hard disks.
3. No additional hardware is required.

Take the 4 + 2 level as an example. When receiving data, the system:
1. Divides the data into four data fragments.
2. Calculates two parity fragments based on the EC algorithm.
3. Writes the six (4 + 2) fragments to different nodes.
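The fragment-and-parity mechanics of the 4 + 2 example can be sketched for the simplest case. Production EC uses Reed-Solomon codes over Galois fields so that M parity fragments tolerate M simultaneous failures; this illustrative sketch uses a single XOR parity fragment (the N + 1 case), which can rebuild exactly one lost fragment:

```python
def split_fragments(data, n=4):
    """Divide data into n equal-size data fragments (zero-padded)."""
    size = -(-len(data) // n)  # ceiling division
    data = data.ljust(n * size, b"\x00")
    return [data[i * size:(i + 1) * size] for i in range(n)]

def xor_parity(fragments):
    """Compute one parity fragment as the byte-wise XOR of all fragments."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)

def recover(fragments, parity):
    """Rebuild a single lost fragment (marked None) by XOR-ing the survivors."""
    lost = fragments.index(None)
    survivors = [f for f in fragments if f is not None] + [parity]
    rebuilt = xor_parity(survivors)
    restored = list(fragments)
    restored[lost] = rebuilt
    return restored
```

Because XOR is its own inverse, the XOR of the surviving fragments and the parity fragment reproduces the missing fragment exactly.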
RAID 5 stripe layout (P indicates parity data):
  | Hard disk 1 | Hard disk 2 | Hard disk 3 | Hard disk 4 | Hard disk 5
Data stripe 1 | A11 | A12 | A13 | A14 | AP
Data stripe 2 | B11 | B12 | B13 | BP | B14
Data stripe 3 | C11 | C12 | CP | C13 | C14
Data stripe 4 | D11 | DP | D12 | D13 | D14
Data stripe 5 | EP | E11 | E12 | E13 | E14
[Figure: 4 + 2 EC write process. Data is (1) divided into four data fragments, (2) used to calculate two parity fragments, and (3) written as six fragments across disks on Nodes 1 to 6.]
2. Architectures
Storage EC architecture can be classified as either asymmetric or symmetric, depending on whether metadata management nodes are independently deployed. This type of node manages metadata such as file directories and blocks.

Asymmetric architecture
Asymmetric architecture has dedicated metadata management nodes so that metadata and service data can be stored separately. A typical asymmetric architecture is the Hadoop Distributed File System (HDFS). The process of accessing data involves two steps:
1. The client communicates with a metadata management node to obtain the storage location of the service data.
2. The client communicates with a storage node to read data.

Asymmetric architecture is mainly applied to large file storage. Information such as file directories and blocks is stored on metadata management nodes. As more files are stored, they take up more memory resources of the metadata management nodes, and the distributed storage performance deteriorates as a result.

Asymmetric architecture has the following characteristics:
1. Metadata management nodes need to be deployed independently, and their deployment scheme is complex.
2. If metadata management nodes fail, services will be interrupted.
3. The metadata management node specifications are limited, which can cause performance bottlenecks.

Symmetric architecture
In this type of architecture, no independent metadata management nodes are deployed. The system evenly distributes metadata and service data across storage nodes, preventing system resource contention. It also automatically reconstructs any metadata and service data that were stored on a failed node to ensure service continuity. When accessing service data, the client directly communicates with storage nodes, eliminating the performance bottlenecks of management nodes. Multiple renowned storage vendors, for example, Dell EMC Isilon, use this architecture. This kind of architecture is applicable to both large and small objects (files), but has high entry criteria: metadata and service data must be evenly distributed across nodes, and reads and writes are balanced based on client IP addresses and storage resource usage.

[Figure: in the asymmetric architecture, the client first queries a management node (which stores metadata) and then accesses storage nodes (which store service data); in the symmetric architecture, the client accesses storage nodes directly, and metadata and service data are stored across storage nodes.]

3. Key Technology 1 – Globally-Balanced Data Distribution
Compared with traditional storage, storage EC technology achieves load balancing by distributing data across nodes in either of the following modes:

Hash distribution
A typical example is a shard-based distributed hash table (DHT), which can implement balanced data distribution. A storage system is divided into multiple shards at a fixed granularity (for example, 1 MB). IDs for these shards are created based on the file ID and start logical block address (LBA), and they are used as keys to calculate the hash value of each shard and allocate each shard to the DHT ring.
When a storage system receives an I/O request (whether it is for a file ID, start LBA, or data), it uses the DHT algorithm to determine which server node will process the request. The DHT implements the following functions:
Balance: Data is distributed across nodes as evenly as possible.
Monotonicity: When new nodes are added to the system, the system only redistributes a small proportion of shards to them, to reduce the data migration workload.

Sequential distribution
Sequential distribution is relatively common in distributed table systems, for example, Bigtable. Tables are split into multiple tablets, and data from these tablets is distributed across storage nodes by the control server based on specific policies. Each tablet is equivalent to a leaf node in a tree structure. Some tablets may become much larger while others become much smaller as data is inserted and deleted. Metatables are introduced as a type of index to support a larger cluster scale. They maintain information about the nodes where the user tables are located, reducing reads and writes of the root table.
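The balance and monotonicity properties of hash distribution can be demonstrated with a toy consistent-hashing ring. MD5 here is just a convenient hash, and this is not the DHT used by any particular storage product:

```python
import bisect
import hashlib

def _hash(key):
    """Map a string key to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal DHT-style ring: each shard maps to the first node
    clockwise from its hash. Adding a node redistributes only the
    shards that fall in the new node's arc (monotonicity)."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self.ring, (_hash(node), node))

    def locate(self, shard_id):
        h = _hash(shard_id)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

When a node joins, only the keys in its arc change owner, and every key that moves lands on the new node; all other mappings are untouched.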
[Figure: hash addressing maps keys to shards (P1 to P9) on physical nodes, and adding node 4 redistributes only a few shards; in sequential distribution, a root table points to metatables, which point to user tables split into tablets by key range (1-1000, 1001-2000, and so on) across disks.]
4. Key Technology 2 - Fast Data Reconstruction
When a disk or node within the system is corrupted, the system reconstructs the data. First, it runs erasure coding on the normal data blocks to calculate the data blocks that need to be reconstructed. Then, the system writes the reconstructed data blocks to normal disks. Storage EC technology supports parallel and rapid troubleshooting and data rebuilding:
1. Data blocks are scattered across different storage nodes. During the repair process, data is concurrently reconstructed on multiple nodes, but each node only needs to reconstruct a small amount. This effectively avoids performance bottlenecks caused by the reconstruction of large amounts of data on a single node.
2. Data is distributed across nodes to ensure that it can be accessed and reconstructed even when a node fails.
3. Loads are automatically balanced when a fault occurs or capacity is expanded.

[Figure: if disk 2 on node 3 fails, the affected data is reconstructed from fragments on other nodes and scattered to the other normal disks on node 3.]

5. Benefits in the Public Safety Industry
In the public safety industry, storage EC technology will:
Improve system reliability and ensure service continuity even when a node fails or historical data is corrupted.
Distribute shards across nodes and enable high-concurrency access.
Restore data quickly by carrying out concurrent reconstruction of multiple shards when a fault occurs.

As high-definition video surveillance develops rapidly, data volumes continue to increase, especially in large- and medium-sized projects such as intelligent transportation systems and intelligent campuses. In the next few years, storage EC technology will play an important role in storage solutions, and it is believed that it will provide the mainstream, optimal storage solution for public safety.
Liu Lei, Shen Zifu, Liu Tengjun, Zhang Tian
Multi-Lens Synergy Technology
In our intelligent age, many industries have deployed intelligent vision solutions. Cameras are the basic video surveillance product, but when combined with intelligence, they offer best-in-class functions such as panoramic surveillance, object movement detection, and license plate recognition. Against this backdrop, multi-lens cameras are needed to meet the diverse surveillance requirements of different industries. A multi-lens camera provides views from multiple perspectives at the same time, and its lenses collaborate with each other. This multi-lens synergy technology helps supercharge legacy applications, and is an important foundation for intelligent vision.
1. Background
Single-lens common cameras are widely deployed for conventional surveillance. As video surveillance enters the intelligent era, these cameras are insufficient and cannot meet the requirements of complex scenarios. Multi-lens cameras comprise a wide-angle prime lens and a minimum of one zoom lens, improving surveillance efficiency and coverage. Multi-lens cameras are available in diversified forms, including combinations of a single box camera and up to two pan–tilt–zoom (PTZ) dome cameras, or dual-lens PTZ dome cameras.
Single-lens camera
• Each camera is installed and deployed independently. Multiple cameras are needed for wide-scale coverage.
• Limited field of view and a high false negative rate in detail capture.

Multi-lens camera
• A single camera provides a wide field of view and functions as multiple cameras.
• Lenses can effectively collaborate across distances and devices. The wide-angle prime lens detects objects within the field of view, while the long-focus zoom lens rotates and zooms in on the video image to focus on the object.
Wide-angle prime lens
57
2. Technical Principles

Multi-lens cameras run automatic calibration and real-time focusing technologies. The former ensures synergy among the lenses, while the latter enables quick zoom and accurate focus.

Automatic calibration

When the wide-angle prime lens detects that an object enters the surveillance scope, it sends the object's coordinates to the zoom lens, which then focuses on and captures the object. This feature supports a wide range of applications, such as license plate recognition. Automatic calibration is needed to ensure collaboration between the lenses.

Automatic calibration requires the two lenses to capture images of basically the same field of view. The camera system uses a dedicated algorithm to extract and match feature points from the two images, then uses the coordinates of the matched points to obtain the mapping data between the two lenses, and applies this calibration in future surveillance.

[Figure: Automatic calibration process. The wide-angle prime lens and the zoom lens capture basically the same image; feature extraction produces an object feature set for each lens; matched feature points yield the coordinate mapping between the two lenses.]
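The mapping step can be illustrated with a small sketch. The function names and the affine model below are illustrative assumptions, not the camera's actual algorithm: given three non-collinear matched feature points, an affine coordinate mapping between the two lenses can be solved exactly from two 3x3 linear systems.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def calibrate(pairs):
    """Fit x' = a*x + b*y + tx and y' = c*x + d*y + ty from three matched
    feature points ((x, y) in the prime lens, (x', y') in the zoom lens)."""
    A = [[x, y, 1.0] for (x, y), _ in pairs]
    a, b, tx = solve3(A, [xp for _, (xp, _) in pairs])
    c, d, ty = solve3(A, [yp for _, (_, yp) in pairs])
    return (a, b, tx, c, d, ty)

def map_point(params, x, y):
    """Map a prime-lens coordinate into the zoom lens using the calibration."""
    a, b, tx, c, d, ty = params
    return (a * x + b * y + tx, c * x + d * y + ty)

# Example with a synthetic transform (scale 2, shift by (5, -3)):
params = calibrate([((0, 0), (5, -3)), ((1, 0), (7, -3)), ((0, 1), (5, -1))])
zoom_xy = map_point(params, 2, 3)  # -> (9.0, 3.0)
```

A production system would fit the mapping to many matched points with least squares and reject outliers, but the principle, coordinates of matched feature points determining the inter-lens mapping, is the same.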
Liu Lei, Shen Zifu, Liu Tengjun, Zhang Tian
Real-time focusing

A real-time focusing technology creates a mapping among object distance, image distance, and focal length to ensure seamless collaboration across video streams. With this technology, the camera system establishes a conversion relationship between the camera position and the object distance, helping control the lens to take quality snapshots.
Focus calibration must first be performed on the camera. For fixed cameras, a geometrical optics principle ensures that the zoom lens can determine the object distance, establishing the conversion between camera position and object distance. After calibration, the camera calculates the image distance from the focal length and object distance. These variables correspond to the focus position of the camera's built-in motor, so the camera can control the motor and adapt its field of view (FoV) based on the image distance.
In actual usage, the zoom lens rotates or zooms in/out to take snapshots, helping the multi-lens camera obtain the object distance (for example, 20 m) based on the lens position such as PTZ data (for example, 20° pan and 15° tilt). Then, the system controls the lens to accurately focus on the object based on the object distance.
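The conversion described above can be sketched with the thin-lens equation. The calibration table and motor-step values below are hypothetical placeholders; a real lens would populate such a table during focus calibration.

```python
def image_distance(focal_length_m, object_distance_m):
    """Thin-lens equation 1/f = 1/u + 1/v, solved for the image distance v."""
    f, u = focal_length_m, object_distance_m
    return f * u / (u - f)

# Hypothetical calibration table of (image distance in mm, focus motor step)
# pairs, measured once per lens during focus calibration.
CALIBRATION = [(50.0, 0), (50.5, 120), (51.5, 300), (53.0, 520)]

def focus_motor_step(image_dist_mm):
    """Linearly interpolate the focus motor step for a given image distance."""
    pts = sorted(CALIBRATION)
    if image_dist_mm <= pts[0][0]:
        return pts[0][1]
    if image_dist_mm >= pts[-1][0]:
        return pts[-1][1]
    for (d0, s0), (d1, s1) in zip(pts, pts[1:]):
        if d0 <= image_dist_mm <= d1:
            return s0 + (s1 - s0) * (image_dist_mm - d0) / (d1 - d0)

# Example: a 50 mm lens focused on an object 20 m away (from the PTZ data).
v_mm = image_distance(0.050, 20.0) * 1000.0   # image distance, about 50.13 mm
step = focus_motor_step(v_mm)                 # motor step to drive the focus
```

This shows why obtaining the object distance from the lens position is sufficient: once the object distance is known, the image distance and therefore the motor position follow directly.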
3. Applicable Scenarios
Multi-lens cameras are suitable for various scenarios such as transportation surveillance and open squares.
In transportation surveillance scenarios, the wide-angle prime lens helps simultaneously detect multiple objects such as vehicles and pedestrians, while the zoom lens captures object detail, such as license plates.
[Figure: Focusing and calibration process. Calibration: obtain the camera position information, obtain the object distance based on the geometrical optics principle, and establish the conversion relationship between camera position and object distance. Real-time focusing: detect an object, obtain the object distance through the conversion relationship, control the focus position based on the object distance and camera motor, and take sharp snapshots in real time.]
Multi-lens cameras are often installed on street poles at a height of 5-10 m. At such heights, video images vary across zoom ratios and have a large depth of field (DoF), so the camera may fail to focus on the desired object. Additionally, crowd and vehicle flows change throughout the day, which can affect snapshot efficiency.
[Figure: Street scenario. Real-time focusing covers the capture position range in front of the object, which is determined by the device height, object height, and vertical angle of view.]
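The geometry in the street scenario can be sketched as follows. This is a simplified pinhole model under assumed example values (6 m pole, 30° downward tilt, 20° vertical angle of view), not the camera's actual capture-range computation.

```python
import math

def ground_capture_range(device_height_m, tilt_deg, vfov_deg):
    """Approximate the ground-distance range covered by the vertical field of
    view of a camera mounted at device_height_m and tilted down by tilt_deg."""
    near_angle = math.radians(tilt_deg + vfov_deg / 2.0)  # steepest ray
    far_angle = math.radians(tilt_deg - vfov_deg / 2.0)   # shallowest ray
    near = device_height_m / math.tan(near_angle)
    # If the shallowest ray points at or above the horizon, the range is open.
    far = device_height_m / math.tan(far_angle) if far_angle > 0 else float("inf")
    return near, far

# Example: camera on a 6 m pole, tilted 30 degrees down, 20-degree vertical FoV.
near, far = ground_capture_range(6.0, 30.0, 20.0)  # roughly 7.2 m to 16.5 m
```

The strong dependence of this range on mounting height and tilt is why pole-height variation changes the depth of field and the capture position range from site to site.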
A multi-lens camera can perform the workloads of multiple cameras. The following shows the advantages of multi-lens cameras over common cameras when deployed in open public squares.

Common camera: 6 cameras on 4 poles. Multiple cameras are required to provide detailed video streams and panoramic coverage. Because of poor zoom functionality, common cameras may miss targets, and blind spots exist at the borders of the cameras' fields of view.

Multi-lens camera: 2 cameras on 2 poles. This reduces the number of cameras and poles required, saving maintenance costs.

4. Prospects

Intelligent video surveillance poses increasingly high requirements on cameras. The multi-lens synergy technology, proposed to solve major problems facing video surveillance, must also keep evolving to meet the ever-changing requirements of customers.

Current issues:
• Surveillance blind spots
• High construction costs for busy areas, such as crossroads
• Ever-changing surveillance scenarios and requirements

Prospects:
• More multi-lens camera forms: 3-lens or 4-lens bullet/PTZ dome cameras, fixed lens + rotatable lens, and rotatable lens + rotatable lens
• Stream switching and more advanced data processing technologies

Goals:
• Efficient collaboration among modules
• Larger surveillance scope
• Simplified deployment
Multi-lens cameras are equipped with a prime lens with a wide field of view, and a zoom lens that supports flexible object movement detection. This improves the surveillance scope and reduces blind spots.
Lens synergy: the wide-angle prime lens provides the panoramic view, while the zoom lens captures accurate object details. Users can select any object in the video image from the prime lens, so that the zoom lens captures the object.
1. Huawei Software-Defined Camera

Artificial Intelligence (AI) is not only a feature, but also the core competitiveness of software-defined cameras (SDCs). AI chips are the key to adding powerful intelligence to SDCs. Huawei SDCs adopt professional NPUs with 25 times the computing power of CPUs, enabling visual analysis and computing of trillions of records.
• Moving AI capabilities to cameras can dramatically improve intelligent recognition performance and reduce bandwidth usage, and is a development trend in the video surveillance industry.
• Huawei, utilizing its own dedicated AI chips, has released a series of SDCs with computing power ranging from 1 TOPS to 20 TOPS to satisfy surveillance requirements in various scenarios.
Huawei SDC Series (with dedicated AI chips):
• X Series (flagship, Ultimate AI): 4–20 TOPS
• M Series (high-end, Professional AI): 1–2 TOPS
• C Series (mid-range, Basic AI): 1 TOPS
• D Series (best-value, Inclusive AI): 1 TOPS
Key Intelligent Services:
• Behavior analysis (such as abandoned object detection, loitering detection, and tripwire crossing detection)
• Crowd flow analysis (queue length detection, crowd density detection, and heat map)
• Forest fire detection
• Person capture and personal feature extraction
• Intelligent vehicle analysis (such as vehicle feature recognition and traffic violation detection)

Long-Tail Algorithms:
Third-party long-tail algorithms, such as safety helmet detection, smoke detection, floating debris recognition, and attendance detection, can be released in the Huawei HoloSens Store and downloaded and loaded onto SDCs.

Gong Liye, Tan Shenquan, You Shiping
2. Huawei Intelligent Video Storage

To cope with issues such as siloed system construction, limited storage space, and low intelligence, Huawei has utilized cloud and AI technologies to develop the Intelligent Video Storage (IVS) solution, featuring an algorithm ecosystem, elastic resource utilization, and effective storage. Huawei HoloSens IVS products are widely used in a variety of scenarios, such as intelligent campus and intelligent transportation.
Huawei IVS supports intelligent edge-cloud synergy to enable independent management and fast closure of edge services, as well as unified aggregation, alerting, and search of global services. This can dramatically improve intelligent application efficiency and intelligence coverage across industries.
Huawei IVS, based on the vPaaS 2.0 architecture, complies with the "platform + ecosystem" strategy and provides an algorithm repository framework that enables concurrent operation of multiple algorithms on one application platform.

Huawei IVS provides all-scenario intelligence solutions from the edge to the center, including Micro Edge, Lite Edge, and Central Platform, and offers a range of services such as multi-dimensional data analysis, storage, and search, accelerating digital transformation across industries.
Huawei IVS Series:
• IVS9000 Series (all-cloud solution; level-1 center): 64–384 TOPS computing power, 768-channel access, 36 or 38 disks
• IVS3800 Series (IVS3800/IVS3800X; large- and medium-sized campus and transportation): 4–32 TOPS computing power, 64-channel image analysis, 8 or 16 disks
• IVS1800 Series (campus, education, and banking branch): 32 TOPS computing power, 16-channel parallel analysis
• ITS800 Series (intelligent transportation): 20 TB storage per device
• NVR800 Series (distribution scenarios; device-edge intelligent synergy, inclusive AI): 1, 2, 4, or 8 disks
Major Intelligent Services:
• Omni-data structuring: object classification, vehicle attribute recognition, vehicle search by attribute, personal attribute recognition, personal feature extraction, person search by image, cyclist attribute recognition, license plate recognition, and in-vehicle feature recognition
• Person clustering: N:N clustering, N:M clustering, and holographic profile
• Person recognition: person capture, personal attribute recognition, personal feature extraction, and person search by image
• Vehicle recognition: vehicle capture, vehicle attribute recognition, license plate recognition, in-vehicle feature recognition, vehicle feature recognition, and vehicle search by image
• Behavior analysis: perimeter detection, tripwire crossing detection, loitering detection, area entry/exit detection, fast movement detection, head counting, queue length detection, and crowd density detection
03 Cloud Service

Discussion on Video Cloud Service Trends
P2P Technology
Products and Solutions Catalog
Wang Kun, Wang Hongwei
Discussion on Video Cloud Service Trends
1. Origin of Video Cloud Services

Since video surveillance entered the Chinese market in the late 1970s, video surveillance technology has evolved through four stages over more than 30 years of development: analog surveillance, analog-digital surveillance, networked surveillance, and intelligent surveillance.

Traditional video surveillance systems collect massive amounts of unstructured video data with low value density. In the past, security personnel needed to manually view video feeds to discover potential risks. However, due to limited labor resources, security personnel may suffer from fatigue after long viewing sessions and miss important information. The number of monitors in a surveillance center is also always far smaller than the number of cameras, so in large-scale surveillance scenarios security personnel cannot accurately and efficiently monitor all sites around the clock. The ever-expanding scale of video surveillance has driven an exponential increase in surveillance data, which is difficult to manually search, view, and analyze. This massive increase in data creates challenges for system architecture, data management, and data analysis in the traditional video surveillance field.

Three technologies address these challenges:

AI-powered intelligent analysis: enabling fast mining and extraction of valuable data
With field-proven intelligent algorithms and GPU chips, valuable data can be quickly extracted from mass video data or directly provided by cameras. Intelligent algorithms such as vehicle recognition and object recognition have been or will be widely used.

IoT technology: enriching video surveillance data
Valuable data (such as structured and semi-structured data relating to people and vehicles) extracted from massive quantities of raw video is itself present on a massive scale. Although the data is seemingly disordered, it embodies substantial relationships among people, vehicles, and objects.

Cloud computing technology: maximizing video surveillance resource usage
After mass video data and IoT data are collected and mined, they are applied to a range of industry applications such as video applications, vehicle applications, and multi-dimensional data applications. Currently, these applications are constructed independently as subsystems and interconnected through the upper-layer service system.

Against this backdrop, the next-generation intelligent video surveillance system that centers on AI, big data, and cloud computing, the video cloud, rises to the occasion.
Evolution of video surveillance:
• 1970s–1990s, analog surveillance: analog matrix, optical transceivers, VCR-based storage, simple management
• 1999, analog-digital surveillance: DVR/DVS, analog/digital matrix, local digital storage, simple management
• 2004–2006, networked surveillance: DVS/codec, IP cameras (IPCs), IP network switches, professional storage devices, streaming media servers, service management platform
• 2015, intelligent surveillance: common and intelligent HD cameras, intelligent analysis, big data, cloud computing, open and standard architecture, multi-service convergence
2. What Is Video Cloud Service?

Cloud service concept

The video cloud service is a video streaming media service based on cloud computing technologies. It provides customers with a range of services, such as surveillance device (camera and NVR) access, pan-tilt-zoom (PTZ) control, and audio, video, and intelligent data upload, storage, processing, and distribution across the network. The video cloud service enables customers to build a professional video surveillance system in a cost-effective and efficient manner, quickly build applications and intelligent surveillance solutions based on computer vision and video analysis, and easily develop online industry video services. At present, video cloud services are widely used in varied scenarios, such as intelligent store, education, community, factory, and construction site solutions.

Cloud service categories

For different implementation capabilities:
• SaaS-based video cloud service: SaaS is a software deployment mode in which software is provided over the Internet. Users rent web-based software from providers to manage enterprise businesses.
• PaaS-based video cloud service: The software R&D platform is provided to users as a service, so that software developers can develop new applications without having to purchase equipment such as servers.

For different application scenarios:
• Public cloud-based video cloud service: Third-party cloud service providers own and operate public cloud resources (such as servers and storage space) and provide them to small- and medium-sized enterprises and individuals over the Internet.
• Hybrid cloud-based video cloud service: The on-premises infrastructure or private cloud is combined with the public cloud, so data and applications can move freely between the private and public clouds, making deployment more flexible. This type of cloud service is targeted at governments, industry customers, and large enterprises.
• Private cloud-based video cloud service: Physically located in enterprise data centers, private cloud resources can also be hosted by third-party service providers. This type of cloud service is aimed at key sectors such as government, transportation, and finance.

Cloud service value

Core value 1: Focus on core services. Cloud services give full play to the social division of labor and lower the requirements on video access. Video cloud service providers can offer horizontal SaaS or PaaS services that enable industry video service enterprises to focus on their core businesses and quickly build industry-specific video services for market segments.

Core value 2: No need for equipment rooms. Cloud services free governments and enterprises from the need for equipment rooms, helping them reduce capital expenditure (CAPEX) on equipment room construction or leasing.

Core value 3: Professional maintenance. Cloud services free governments, enterprises, and individuals from the need to build their own systems and hire dedicated maintenance personnel, since cloud service providers offer 24/7 professional maintenance services.

Core value 4: Excellent agility and elasticity. Customers can flexibly increase or decrease resources depending on their actual service volume, without spending time and resources on equipment room construction or on system construction, deployment, and debugging.
3. Future Development

COVID-19 has transformed the role and importance of digital experiences in people's lives. Demand is soaring for information sharing, cloud services, and cross-region collaboration, and video plays an essential role in modernized, efficient, intelligent, and refined societal governance. The epidemic has also revealed the importance of a country's infrastructure capabilities, especially in digital information technologies. China has started new infrastructure construction, which opens a window of opportunity for video cloud services.

First, production supervision is being strengthened in key new infrastructure fields, including healthcare, manufacturing, power grid, Internet of Vehicles (IoV), ultra-high definition (UHD), education, and harbors; video cloud services are the basic services of this infrastructure construction. Second, access bandwidth is exceeding 100 Mbit/s, which alleviates the pressure on video backhaul bandwidth, improves video cloud service access capabilities, adds weight to edges, balances service indicators against network costs, and helps build competitive service products. Third, the uplink bandwidth of the backbone network is exceeding 50 Gbit/s, leading to large video traffic on high-speed networks, so video reliability and traceability are vital for further strengthening information security protection.

The epidemic has also strained globalization, and countries are reshaping their industrial and commercial systems to ensure supply security. Globalization accelerated the spread of the virus and exposed how dependent every country is on others, setting off widespread social discontent: countries have competed for masks and food, while frictions relating to currency, trade, science and technology, meteorology, biology, oil, and aerospace have intensified, and global solidarity and cooperation have weakened. All countries have begun to focus on supply security, and national and regional ecosystems will grow faster than global ones. An industry-oriented commercial ecosystem enabling system is therefore required to help local ecosystem participants find their positions, quickly engage in the industry ecosystem, and develop commercialized and stable supply capabilities; local independent supply, as a strategic reserve, can ensure supply security in emergencies. This is why the video cloud, as future infrastructure, needs to place greater emphasis on supply security.

The epidemic has also accelerated the digital transformation of enterprises. Cameras are basic devices that generate massive amounts of data, and industry video cloud services are ingress points for data traffic, which can help HUAWEI CLOUD succeed in the multi-cloud era. Cameras are already widely used in enterprise video surveillance, and the emergence of intelligent cameras produces large amounts of structured data, which is widely applied in enterprise management, production, and supply.
Frontend intelligence

Mature AI technologies play a significant role in the intelligent vision sector, enabling intelligent recognition, extraction of valuable information from video, and alarm linkage, which dramatically improve efficiency across industries. Cameras mainly collect video and image data, which is then uploaded to equipment rooms, resulting in high Border Gateway Protocol (BGP) bandwidth costs. Embedding computing power in cameras allows intelligent recognition algorithms to be deployed directly on the cameras, sharply decreasing the amount of data transmitted over the network.

Device-cloud synergy

1. Reduce the video transmission bandwidth to lower network costs without compromising image quality by using cameras that support the H.265 encoding format.
2. Flexibly deploy computing power on devices, edges, and the cloud to meet different latency, cost, and SLA requirements and achieve optimal performance in varied scenarios.
3. Use peer-to-peer (P2P) technology to give full play to devices and edges and reduce traffic forwarded through the cloud by means of NAT traversal, lowering bandwidth costs. Combine this with content delivery network (CDN) bandwidth to improve distribution efficiency, and deploy video cloud services at edges, close to users, to further reduce cloud-based forwarding costs.
Scenario-specific intelligence, boosting industry digitalization
Video acts as the data awareness entry point for enterprise digitalization. Cameras, the basic devices of enterprises, collect video data that boosts the digital transformation of enterprises. There are many opportunities in vertical industries, so we need to focus on their digital transformation, continuously explore application scenarios, and provide value-added services accordingly.
[Figure: Intelligent twins of industry digitalization. Intelligent awareness (intelligent vision, olfaction, gustation, and somatosensation) collects video, image, sound, behavior, gas, liquid, and vibration data; the AI platform, video cloud, and big data support intelligent decisions and intention judgment; human-machine interfaces (AI, VR, and mobile phones) serve governments, enterprises, households, and individuals.]
[Figure: Edge deployment architecture. IP cameras at the device layer connect to edge nodes for edge storage and nearby forwarding; aggregation points in each region host CDN nodes for live streaming, edge computing, and hybrid cloud; public cloud nodes provide central forwarding across regions.]
Wu Xiaoliang, Zhang Yinqun
P2P Technology

With improvements in network quality and bandwidth, and the proliferation of 4G/5G and smartphones, access options for household and public video have become more diverse. This article describes the use of peer-to-peer (P2P) technology in the video surveillance sector for the transfer of video over the Internet.

1. What Is P2P?

P2P is a communication model that allows peers to share resources with each other without a central server. The use of P2P can decrease the number of nodes involved in network transmission and prevent information loss. Unlike the client/server (C/S) model with a central server, each node on a P2P network acts as both a server and a client, and nodes can communicate with each other directly.

[Figure: P2P network (peers connected directly to one another) versus classic C/S network (clients connected through a central server).]

Currently, the most common Internet access mode places P2P communication hosts on both sides of Network Address Translation (NAT) devices, such as firewalls. NAT is a technique for translating the address in an IPv4 packet header into another address. NAT is widely applied to mitigate IPv4 address exhaustion, but it prevents the establishment of P2P sessions: NAT does not allow hosts on the public network to proactively access hosts on the private network, while P2P requires that both communicating parties be able to proactively access each other. Therefore, this article focuses on how to conduct effective P2P communication in NAT traversal scenarios.

2. P2P Implementation Solutions in NAT Traversal Scenarios

The following describes common P2P implementation solutions in NAT traversal scenarios.

P2P implementation solution based on reverse connection

In this scenario, client A is located behind a NAT gateway while client B is directly reachable, so client A can directly connect to client B through TCP. However, if client B attempts to establish a TCP connection with client A for P2P communication, the connection fails because the NAT device behind which client A is located rejects the connection request. To communicate with client A, client B therefore does not initiate a connection request directly. Instead, client B sends a connection request to centralized server S, which forwards the request to client A, asking client A to connect to client B (that is, to establish a reverse connection). After receiving the request from the server, client A initiates a TCP connection to client B. Entries for this TCP connection are then created on the NAT device, and the TCP connection is established between clients A and B.

[Figure: Reverse connection process. 1. Client B requests a reverse connection from centralized server S (public IP address). 2. Server S forwards the request to client A behind the NAT gateway. 3. Client A establishes the reverse connection to client B.]
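The reverse-connection flow can be sketched on localhost. This is a simulation under stated assumptions: a queue stands in for centralized server S forwarding the request, there is no real NAT device, and all names and ports are hypothetical.

```python
import queue
import socket
import threading

# The queue plays the role of centralized server S relaying B's request to A.
relay = queue.Queue()

def client_a():
    """Client A (behind NAT): it can only make outbound connections."""
    b_addr = relay.get()                       # 2. S forwards B's request to A
    sock = socket.create_connection(b_addr)    # 3. A dials out (reverse connection),
    sock.sendall(b"stream-from-A")             #    which would create NAT entries
    sock.close()

def client_b(result):
    """Client B: listens for A, then asks S to have A connect back."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))                 # ephemeral port stands in for B's address
    srv.listen(1)
    relay.put(srv.getsockname())               # 1. request a reverse connection via S
    conn, _ = srv.accept()                     # A's outbound connection arrives
    result.append(conn.recv(1024))
    conn.close()
    srv.close()

received = []
t_b = threading.Thread(target=client_b, args=(received,))
t_a = threading.Thread(target=client_a)
t_b.start()
t_a.start()
t_b.join()
t_a.join()
```

Note how the data still flows from B's perspective as if B had connected to A: only the direction of the TCP handshake is reversed, which is exactly what lets it pass A's NAT.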
P2P implementation solution based on UDP hole punching

In this solution, related NAT forwarding entries are created on the NAT devices of both clients through the centralized server, so packets sent by the two clients can pass directly through each other's NAT devices, thereby establishing a connection between the two clients.

UDP hole punching applies to the following typical scenarios.

Scenario 1: In a simple scenario, the two clients are located behind the same NAT device, that is, on the same private network.

In this scenario, clients A and B each establish a UDP connection with the centralized server. After NAT translation, the internal IP addresses and port numbers of clients A and B are translated into the public IP address and port numbers of the NAT device. After clients A and B obtain each other's internal and public IP addresses and port numbers from the centralized server, they send UDP data packets to each other to set up a connection, attempting the public and private addresses at the same time. Once the private address is connected, the private network connection is used preferentially.

The UDP hole punching process is as follows:
1. Client A sends a request to centralized server S to connect to client B.
2. Centralized server S sends client A's public IP address, private IP address, and port number to client B, and client B's public IP address, private IP address, and port number to client A.
3. Clients A and B send UDP data packets to each other to set up a connection. Because clients A and B are on the same private network, the UDP data packets are sent directly through the private network.

[Figure: UDP hole punching process when both clients are located behind the same NAT device, showing the states before, during, and after hole punching.]
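The "try both addresses, prefer the private one" step can be sketched as follows. The demo setup is hypothetical: a local listener stands in for the peer's private address, a closed port stands in for an unreachable public address, and TCP is used only to make the race observable in a few lines.

```python
import socket
import threading

def try_candidates(candidates, timeout=1.0):
    """Try connections to every candidate (label, host, port) in parallel, as
    clients do with the private and public addresses learned from server S.
    Returns (label, socket), preferring the private-network path."""
    results = {}
    def attempt(label, host, port):
        try:
            results[label] = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            pass  # candidate unreachable, e.g. a private address on another LAN
    threads = [threading.Thread(target=attempt, args=c) for c in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for preferred in ("private", "public"):
        if preferred in results:
            for label, s in results.items():
                if label != preferred:
                    s.close()  # drop the non-preferred path
            return preferred, results[preferred]
    return None, None

# Demo: a listener stands in for the peer's private address; a freshly closed
# port stands in for a public address that is not reachable.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(2)
private_port = listener.getsockname()[1]
dead = socket.socket()
dead.bind(("127.0.0.1", 0))
dead_port = dead.getsockname()[1]
dead.close()  # nothing listens here any more

label, conn = try_candidates([("private", "127.0.0.1", private_port),
                              ("public", "127.0.0.1", dead_port)])
```

When both paths succeed, keeping the private one avoids hairpinning traffic through the NAT device's public interface, which not all NAT devices support.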
Scenario 2: In a common scenario, the two clients are located behind different NAT devices, that is, on different private networks.

This scenario is similar to scenario 1, except that clients A and B are connected to different NAT devices, so their addresses are translated into different public IP addresses during IP address and port mapping. After clients A and B obtain each other's public and private IP addresses and port numbers from the centralized server, they send UDP data packets to each other to set up a connection. However, because clients A and B are on different private networks and no route exists for their private IP addresses on the public network, UDP data packets destined for the private IP addresses are sent to incorrect hosts or discarded.

The packet sent by client A toward client B's public address creates a session entry on client A's NAT device, and the packet sent by client B toward client A likewise creates a session entry on client B's NAT device; if the destination is not yet reachable, the packet itself is discarded. Once clients A and B have each sent a packet to the public IP address and port number of the peer's NAT device, the "hole" between them is punched: sending data packets to each other's public addresses is now equivalent to sending UDP data packets to the peer client, and real P2P data transmission starts.

The UDP hole punching process is as follows:
1. Client A sends a request to centralized server S to connect to client B.
2. Centralized server S sends client A's public IP address, private IP address, and port number to client B, and client B's public IP address, private IP address, and port number to client A.
3. Client A sends a message to client B. Because clients A and B belong to different private networks and there is no public network route, the message is unreachable; however, a session entry is created on client A's NAT device.
4. Client B sends a message to client A. Because the session entry has already been created on client A's NAT device, the message reaches client A through the public IP address. Client A then connects to client B in the same way to implement P2P communication.

[Figure: UDP hole punching process when the two clients are located behind different NAT devices, showing the states before, during, and after hole punching.]
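The packet exchange in steps 3 and 4 can be simulated on localhost. This is a sketch under stated assumptions: there is no real NAT device here, so the two UDP sockets simply stand in for clients A and B after server S has told each one the other's address, and the comments describe what a real NAT would do at each step.

```python
import socket

# Two UDP sockets stand in for clients A and B on their "public" addresses.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))
b.bind(("127.0.0.1", 0))
a_addr, b_addr = a.getsockname(), b.getsockname()

# Step 3: A's first packet toward B's public address would be dropped by B's
# NAT in a real deployment, but it creates the outbound session entry
# ("punches the hole") on A's own NAT device.
a.sendto(b"punch", b_addr)

# Step 4: B's packet to A's public address now passes A's NAT because the
# session entry already exists, so it is delivered to client A.
b.sendto(b"hello-from-B", a_addr)
a.settimeout(2.0)
data, source = a.recvfrom(1024)   # A receives B's datagram directly, peer to peer
```

The key asymmetry is that A's seemingly wasted first packet is what makes B's reply deliverable; real implementations keep sending such keepalive packets so the NAT session entries do not expire.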
Scenario 3: In a complex scenario, clients are located behind two layers of NAT devices. Generally, the top-layer NAT device is provided by a telecom carrier, and the second-layer NAT devices are home NAT routers.

It is assumed that NAT device C is provided by a telecom carrier and provides public IP address translation for its internal nodes, which include NAT devices A and B and clients A and B. Clients A and B can connect to the centralized server only after their private IP addresses are translated into public IP addresses through both layers of NAT devices. Clients A and B obtain each other's public IP addresses and port numbers from the centralized server and perform hole punching; the data packets used for hole punching are forwarded by NAT device C, and the hole punching process is the same as in scenario 2.
[Figure: UDP hole punching process when clients are located behind two layers of NAT devices, showing the states before, during, and after hole punching. Clients A and B sit behind NAT devices A and B respectively, which in turn sit behind carrier NAT device C; the hole punching packets are forwarded by NAT device C.]
In the public cloud scenario, a camera and a mobile client, functioning as P2P clients, are deployed on different private networks which are isolated from the public network through a NAT gateway provided by the carrier. Through the hole punching technology, the camera and mobile client punch a "hole" on the NAT gateway. In this way, NAT will no longer be an obstacle for P2P session establishment.
Camera side: The camera registers with the cloud-based server and exchanges necessary information (about the network where the camera is located, including the internal IP address, internal service port, public IP address, and public service port) with the server to implement network analysis and connection establishment.
Client side: The client connects to the cloud-based server in the same way and exchanges information with the server.
3. Application of P2P Technology in Video Surveillance Systems
In a video surveillance system, a camera or client obtains peer network information from a cloud server, and actively establishes a P2P connection with the peer end to transmit media streams and control signaling.
After the client, camera, and cloud-based server complete information exchange and the client attempts to request video streams from the camera, the client and the camera attempt to establish a P2P connection. The specific service process is as follows:
1. Register the camera with the server and bring the camera online; log in to the client and connect to the server.
2. Obtain the device list and request video streams.
3. Request video streams.
4. Forward video streams.
5. Establish a P2P connection. After the connection is established, the video transmission mode is switched from cloud-based forwarding to P2P.
Major benefits of P2P technology to video surveillance systems:
● Reduces the computing and network resources required for forwarding video streams through the cloud-based server, lowering service costs accordingly.
● Enables users to enjoy HD video without stuttering.
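The exchange above can be sketched from the client side as follows. This is a minimal illustration, assuming a hypothetical rendezvous message format in which the server replies with the peer's public endpoint as a plain "ip:port" string; it is not a real Huawei API:

```python
import socket

def parse_endpoint(data: bytes) -> tuple:
    """Parse the peer's public endpoint ("ip:port") sent by the server."""
    ip, port = data.decode().split(":")
    return ip, int(port)

def punch(sock: socket.socket, session_id: bytes,
          server_addr: tuple, tries: int = 3) -> tuple:
    # 1. Register with the cloud-based server; this outbound packet also
    #    creates the NAT mapping that the peer will later target.
    sock.sendto(b"REGISTER " + session_id, server_addr)
    # 2. The server replies with the peer's public IP address and port.
    data, _ = sock.recvfrom(1024)
    peer = parse_endpoint(data)
    # 3. Send a few packets toward the peer's public endpoint to open the
    #    "hole"; the peer does the same in the opposite direction.
    for _ in range(tries):
        sock.sendto(b"punch", peer)
    return peer
```

Once packets flow in both directions, media streams can be carried over the same socket and cloud-based forwarding can be switched off, which is where the cost and latency savings come from.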
Wu Xiaoliang, Zhang Yinqun
Cloud Service/Products and Solutions Catalog
Su Rui, Zhang Yinqun
Huawei HoloSens Cloud Service provides a wide range of cloud-based capabilities for Huawei devices such as cameras, network video recorders (NVRs), and intelligent video storage (IVS), as well as third-party devices. These capabilities include cloud-based video access, storage, viewing, and analysis. Software service providers can develop industry video applications based on Huawei HoloSens Cloud Service, for example, intelligent store and intelligent kindergarten solutions.
Huawei HoloSens Cloud Service (Only Available in China)
Open Access
● Quick access for devices from third-party vendors
● Standard APIs that enable ISV application innovation
● Intelligent analysis via device-cloud synergy
Intelligent Ecosystem
● Open intelligent algorithms
● Intelligent algorithm/app store on the cloud
● A rich selection of WeCode applets
Security and Trustworthiness
● Encrypted data transmission and storage, video watermark
● Dynamic privacy mask
● E2E traceability
Seamless Experience
● Consistent service experience on multiple types of devices
● Device-cloud-edge synergy, superior user experience
● Unified architecture and service based on HUAWEI CLOUD and hybrid cloud
(Figure: Huawei HoloSens Cloud Service overview. Industry video data is handled through cloud-based video access, storage, and viewing. SaaS applications such as customer group analysis, hotspot analysis, customer flow analysis, and remote inspection serve stores and other sites. The service is open to devices from various vendors, including Huawei SDC, NVR800, ecosystem devices, and third-party devices. The HoloSens Store aggregates ecosystem partners and boosts industry digitalization. Application clients built on video intelligence include the Huawei HoloSens App (for enterprises), the Huawei HoloSens App, and the Huawei HoloSens PC Client, backed by an algorithm and application transaction platform that aggregates algorithms on the cloud, loads them onto devices, and offers selected, best-in-class algorithms. Solutions cover intelligent chain stores, intelligent education, intelligent breeding, and intelligent construction sites, serving shopping malls/supermarkets, 4S stores, restaurants, kindergartens, primary schools, construction sites, and residential complexes, all through cloud-edge-device synergy.)
Ecosystem/Discussion on Intelligent Vision Ecosystem Trends
Yu Zhuo, Liang Jiani
1. Status Quo of the Video Surveillance Industry
Driven by 5G, AI, big data, and cloud computing technologies, the video surveillance industry has grown from a conventional industry into an intelligent one. Transforming from single-dimensional to multi-dimensional data applications, and developing comprehensive applications that converge video surveillance data with traditional service data, is an unstoppable industry trend. In addition, industry participants, including AI vendors, IT vendors (in the fields of IT infrastructure, big data, cloud, and computing), and industry application vendors, are all vying for primacy. The industry chain, however, is such an all-inclusive and intricate ecosystem that even an enterprise that reigns supreme in the market struggles to be a sophisticated all-rounder. Enterprises also face increasing market competition from all-in-one, non-decoupled products.
The concept of manufacturers going it alone is becoming less and less popular. In this era, enterprises playing various roles in the industry should collaborate to build an ecosystem of partnerships throughout the industry chain. In other words, the nature of competition in the video surveillance market has changed: from competition between hardware and solutions to competition between industry-chain ecosystems. The ecosystem is now the arena of competition for leading enterprises in the video surveillance sector, with technological enablement, platform openness, and partnership enablement being the core areas of competition.
2. Ten-Year Progression of the Video Surveillance Industry
During the past decade, the video surveillance industry has undergone progressive improvement, moving from the network surveillance and HD surveillance eras through the intelligent surveillance era to the data-enabled surveillance era. The video surveillance ecosystem has evolved in parallel, from the initial networking platform to its current state, featuring AI, applications, and big data.
Discussion on Intelligent Vision Ecosystem Trends
(Figure: ecosystem evolution. Conventional video surveillance involved only the video surveillance vendor, the integrator, and Party A. Intelligent video surveillance adds IT vendors, AI vendors, and application vendors.)
IT vendor: supplies fundamental computing hardware, storage hardware, as well as cloud and big data capabilities.
AI vendor: supplies fundamental capabilities of AI algorithms and AI chips, such as vehicle-related algorithms and algorithms for various industry segments.
Application vendor: supplies fundamental applications for each industry segment as well as end-to-end (E2E) services, such as the integrated video surveillance management platform and integrated command platform used in traffic management.
3. Challenges for the Video Surveillance Industry
Video surveillance has undergone constant development in recent years. Developing hardware-based algorithms is one of the important industry trends, together with the transfer of algorithm value to the frontend: simple algorithms that support closed-loop management can be loaded directly onto frontend devices. As deep learning has grown from a fledgling technology into a thriving one, the value of training data samples has surpassed the value of the algorithms themselves. Consequently, algorithm vendors are finding it difficult to maintain competitiveness and prevent new entrants from outcompeting them. Application vendors are also confronted with challenges such as customizing applications for different regions, large-scale replication, and intense competition in a homogeneous application market.
Algorithms relating to people, motor vehicles, and non-motorized vehicles are becoming more and more mature, and are widely used in traffic management applications. However, there is still a long way to go for other industries' intelligent apps to achieve this level of efficacy.
For example, various industries demand various long-tail algorithms, such as algorithms for head counting and health monitoring of livestock in animal husbandry, kitchen environment and standard operating procedure (SOP) monitoring in the catering industry, and transaction monitoring in the retail industry. However, many long-tail algorithms suffer from multiple issues such as difficulties in data acquisition and algorithm training, as well as enormous and diverse customization requirements.
Network Surveillance Era (1997–2008): The video surveillance system was complex, involving IP cameras (IPCs), network video recorders (NVRs), and software systems.
HD Surveillance Era (2009–2012): The video surveillance system gradually expanded to a platform-based application that integrates data transmission, video, alarms, and control.
Intelligent Surveillance Era (2013–2018): Computer vision technology promotes the advancement of AI-based video surveillance.
Data-enabled Surveillance Era (since 2019): A huge amount of structured data is generated with the wide use of AI in video surveillance. Cloud and big data technologies are required to solve the problems of massive data search and multi-dimensional data matching.
(Figure: major challenges for AI vendors and application vendors: enormous and diverse requirements for customized long-tail algorithms. Example algorithms, spanning high to low market demand scale, include person recognition, video structuring, recognition of license plates from the Chinese mainland, behavior analysis, video synopsis, video search, fire and smoke detection, recognition of license plates from countries/regions outside the Chinese mainland, floating debris detection, safety helmet detection, and excavator recognition. Vendors have evolved from general surveillance platform vendors and industry solution vendors to AI and application vendors, and then to AI, application, and big data vendors.)
4. Future of Video Surveillance
The principal driving forces of AI development in the future will still be computing power, algorithms, and data, while the driving forces for AI-powered video surveillance include computing power, algorithms, data, solutions, engineering, and services. The core foundation of industry development is to provide viable and all-round solutions by combining technologies with industry attributes. No matter what the trend of the video surveillance industry and technological advancement will be, the methodological principle, using technical means to solve industry challenges, will never change. That principle will continue to guide every technical activity of an enterprise.
Years of evolution have seen the transition from analog cameras to intelligent cameras and on to modern software-defined cameras, which feature user-defined algorithms that can satisfy user requirements in various scenarios. Most cameras in the industry today feature only optical sensors. To satisfy the needs of various industries, ecosystem cameras that support all types of interfaces are required in order to connect to various other types of sensors, such as humidity sensors, temperature sensors, and pH sensors.
With advanced technologies and policies, various industries are motivated to gradually make inroads into Smart City, which requires more intelligent, multifaceted, and all-round video surveillance ICT construction. In some industries, scenarios are segmented in a supplemental manner for video surveillance, giving rise to a batch of budding companies that specialize in providing algorithms and applications.
(Figure: "Platform + AI + Ecosystem + Industry intelligence" builds an open cooperation system. The infrastructure platform provides resource orchestration and scheduling, automatic deployment, VM and container resources, and auto-scaling. The AI ecosystem spans algorithms and applications for persons, vehicles, long-tail scenarios, audio, and behavior, as well as data management and intelligent analysis. Industry intelligence covers intelligent campus, grid, transportation, community, business, and prison scenarios. A noted constraint: inadequate camera hardware scalability.)
The camera of the future will support third-party hardware ecosystems. Through standard interfaces such as RS-485 (also known as EIA-485), cameras can connect, with or without a pan-tilt unit (PTU), to multiple types of sensors such as liquid temperature sensors, liquid level sensors, conductivity sensors, air temperature sensors, tilt sensors, pH sensors, and pull sensors.
In the construction of the video surveillance ecosystem, it is essential to build a general-purpose platform. Open APIs can contribute to the swarm effect where open-source technologies lay the foundation for ecosystem expansion and development. The entire ecosystem consists of shared platforms, open-source software, open APIs, and ecosystem business models.
Once new Internet technologies are integrated into the video surveillance industry, an increasing number of algorithm developers will join the online store-like algorithm platform, allowing for the realization of a new video surveillance ecosystem business model. Algorithm developers and users alike will no longer have to look for each other but will instead partake in direct-to-customer (D2C) transactions on universal algorithm markets.
Camera
Micrometeorological sensor
Liquid volume sensor
Liquid level sensor
Third-party RTU
Camera
Liquid temperature
sensor
Liquid level sensor
Conductivity sensor
Massive data
AIAI algorithms
User Developer
BEFORE
AFTER
User Developer
Search for algorithms
Search for users
Huawei HoloSens Store
From software-defined to software- and hardware-defined
From project-based model to business model
Few sensors Massive sensors
... ...
Ecosystem/Products and Solutions Catalog
Tan Shenquan, Su Rui, Liang Jiani
Huawei Eco-Cube Camera Models
Huawei Eco-Cube cameras can be flexibly deployed in challenging environments, for example, areas where power supply or network deployment is difficult. They provide an AIoT-Hub that a variety of sensors can connect to, allowing video data to be fused with IoT data to implement multi-dimensional awareness. Huawei Eco-Cube cameras bring an entirely new architectural perspective to next-generation cameras.
The Eco-Cube camera adopts a functional compartment design inspired by space stations. The camera’s built-in AIoT-Hub allows connections with various third-party sensors to collect multi-dimensional data. The camera, with its cylindrical body and droplet-shaped cover, can perfectly blend into the environment. In addition, the camera features a dovetail slot for easy installation and can be disassembled with the push of a single button.
The Eco-Cube camera can be deployed in areas with difficulties in network deployment. An ad-hoc network can be quickly built between the primary camera and secondary cameras through the VideoX wireless transmission technology. In this way, video data from secondary cameras can be transferred through wireless networks. The Eco-Cube camera can also be used in areas without power grid access because it can be powered by solar panels. With built-in SuperColor technology, the camera can deliver full-color images at night without producing any light pollution. In addition, the camera supports one-click optimization, visualized and remote inspection, and self-cleaning, reducing on-site O&M.
The AIoT-Hub features a variety of sensor interfaces that enable the camera to connect to a variety of sensors to collect multi-dimensional data such as water level, air quality, temperature, and wind speed.
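As a rough illustration of this kind of video-IoT fusion, the sketch below (illustrative Python; the field names and event shape are invented, not an AIoT-Hub API) tags a video event with the sensor reading closest to it in time:

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class Reading:
    ts: float      # timestamp in seconds
    kind: str      # e.g. "water_level", "wind_speed"
    value: float

def nearest_reading(readings, ts):
    """Return the reading closest in time to ts.
    `readings` must be sorted by ts."""
    times = [r.ts for r in readings]
    i = bisect_left(times, ts)
    candidates = readings[max(0, i - 1):i + 1]
    return min(candidates, key=lambda r: abs(r.ts - ts))

def fuse(video_event, readings):
    """Attach the nearest sensor reading to a video event dict."""
    r = nearest_reading(readings, video_event["ts"])
    return {**video_event, "sensor": {"kind": r.kind, "value": r.value}}

readings = [Reading(0.0, "water_level", 1.2),
            Reading(10.0, "water_level", 1.5)]
event = fuse({"ts": 1.0, "type": "motion"}, readings)
assert event["sensor"]["value"] == 1.2   # reading at t=0 is closest to t=1
```

Real deployments would also handle sensor dropout, clock skew between the camera and sensors, and windowed aggregation rather than a single nearest sample, but the time-alignment step above is the core of multi-dimensional awareness.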
1. Huawei Eco-Cube Camera
Models: M7341-10-I-RT, X7341-10-HMI, M7641-10-Z23-RT
All-scenario adaptation (no time or space restrictions): no power grid | no wired connection | no light pollution | no onsite O&M
Environment-blending (innovative design): aesthetic, user-friendly design, allowing the camera to perfectly blend into the environment; simplified installation, enabling the camera to be installed in various locations and used for various scenarios
Multi-dimensional awareness (unrestricted awareness): AIoT-Hub for the primary camera; AI chip + SDC OS + HoloSens Store, supporting on-demand loading of intelligent algorithms
Agile release: algorithm marketing
2. Huawei HoloSens Store (Only Available in China)
Huawei HoloSens Store integrates high-quality third-party algorithms and applications that can run on Huawei's Software-Defined Cameras (SDCs) and Intelligent Video Storage (IVS) platforms. In this store, customers from various industries can choose from a variety of reliable intelligent algorithms.
Unified OS: provides standard and service-oriented APIs on SDCs and IVS platforms to run, manage, upgrade, and monitor third-party algorithms.
One-stop development platform: provides comprehensive and efficient algorithm development and commissioning services for developers, based on the algorithm development and training capabilities of Huawei ModelArts and the remote connection capabilities of the Intelligent Vision Ecosystem Lab.
HoloSens iClient: allows users to load and update third-party algorithms on Huawei intelligent vision products and manage algorithm licenses.
For Users
More Choices
● Numerous algorithms
● For diversified industries
● Convenient order placement
Fast Replacement
● Algorithm updates are pushed like app updates
● One-stop algorithm management on the iClient
● Intelligent algorithm match for new demands
Safe to Use
● Select, high-quality algorithms
● Service continuity during algorithm loading
● Online verification and closed-loop services
For Developers
Fast Rollout
● One-stop development platform, enabling fast algorithm rollout at low costs
● Enablement plan, HUAWEI Developer Day, and HUAWEI Developer
Automatic Testing
● Automatic testing laboratory, providing online testing services for hundreds of products in terms of application stability, compatibility, and performance optimization
Broad Market Application
● Powerful marketing channels, helping partners to demonstrate and promote algorithms, so that they can quickly capitalize on their applications and algorithms
(Figure: HoloSens Store and its support system. Developers carry out application development, algorithm model development, model training, and algorithm debugging online on the HoloSens one-stop development platform. The HoloSens Store and HoloSens iClient then handle algorithm loading and management, empowering diverse industries such as intelligent campus, intelligent transportation, intelligent finance, intelligent education, and intelligent energy. SDC and IVS are based on a unified OS, opening capabilities to build a future-proof algorithm and application ecosystem.)
Scan to access the HoloSens Store.
Abbreviations
AI: Artificial Intelligence
API: Application Programming Interface
ARM: Advanced RISC Machine
ARQ: Automatic Repeat Request
AVHD: Audiovisual HD Quality
BM3D: Block-Matching and 3D Filtering
CAGR: Compound Annual Growth Rate
CAPEX: Capital Expenditure
CDN: Content Distribution Network
CIE: International Commission on Illumination
CNN: Convolutional Neural Network
CPU: Central Processing Unit
DHT: Distributed Hash Table
DNN: Deep Neural Network
DVR: Digital Video Recorder
DVS: Digital Video Server
EC: Erasure Coding
eMBB: Enhanced Mobile Broadband
eSDK: Ecosystem Software Development Kit
FEC: Forward Error Correction
GMM: Gaussian Mixture Model
GPU: Graphics Processing Unit
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ISP: Image Signal Processing
ISV: Independent Software Vendor
ITU-T: International Telecommunication Union - Telecommunication Standardization Sector
IVS: Intelligent Video Storage
KNN: K-Nearest Neighbors
LED: Light Emitting Diode
Mbps: Megabits Per Second
mMTC: Massive Machine-Type Communications
NAT: Network Address Translation
NL-Means: Non-Local Means
NORM: No-Reference Metric
NPU: Neural Processing Unit
NVR: Network Video Recorder
P2P: Peer to Peer
PaaS: Platform as a Service
PAO: Process-Architecture-Optimization
PSNR: Peak Signal-to-Noise Ratio
QART: Quality Assessment for Recognition Tasks
QP: Quantization Parameter
RAID: Redundant Arrays of Independent Disks
ReID: Person Re-Identification
RFID: Radio Frequency Identification
RISC: Reduced Instruction Set Computing
ROI: Region of Interest
SaaS: Software as a Service
SDC: Software-Defined Camera
SDK: Software Development Kit
SLA: Service Level Agreement
TCO: Total Cost of Ownership
TCP: Transmission Control Protocol
TI: Threshold Increment
UDP: User Datagram Protocol
UGR: Unified Glare Rating
URLLC: Ultra-Reliable Low-Latency Communication
VCR: Video Cassette Recorder
ViBe: Visual Background Extractor
Appendix/Legal Statement
Legal Statement
About This Document
Huawei HoloSens Intelligent Vision Tech Express, jointly published by Huawei Data Storage and Intelligent Vision Product Customer Experience Dept and Huawei Intelligent Vision PDU, presents the latest industry insights, hot technical topics, as well as product and solution developments in intelligent vision.
Copyright Statement
The copyright of this Tech Express belongs to Huawei Technologies Co., Ltd. and is protected by law.
No part of this Tech Express may be copied, translated, modified, or distributed by any individual or organization in any form or by any means without the prior written consent of Huawei Technologies Co., Ltd. Include "Source: Huawei Technologies Co., Ltd." when reproducing, distributing, or using any parts of this Tech Express in any form or by any means. Huawei Technologies Co., Ltd. will investigate and affix legal liability to any individual or organization involved in the violation of the preceding statements.
Responsibility Declaration
To the maximum extent permitted by law, the content of this Tech Express is provided "as is". It does not represent the views of Huawei Technologies Co., Ltd., and does not serve as a warranty, guarantee, or representation of any kind, either expressed or implied, including but not limited to warranties of suitability for specific purposes and for commercial use. Huawei Technologies Co., Ltd. does not guarantee the accuracy of information provided in this Tech Express. The information in this document is subject to correction, modification, and change without notice. Huawei Technologies Co., Ltd. does not assume responsibility for any decision-making or negative consequence caused by any individual or organization based on the contents of this document.
Change History
Issue 1, December 2020
Huawei HoloSens Intelligent Vision Product Portfolio
Industry applications: Safe City, Intelligent Transportation, Intelligent Campus
Cloud: HoloSens IVS9000 (VCN, VCM, Video Big Data)
Edge: HoloSens IVS3800 (IVS3800S/IVS3800XS for storage; IVS3800F/IVS3800XF for storage, compute, and search; IVS3800C/IVS3800XC for compute); HoloSens IVS1800 (IVS1800 C08-4T, C08-16T, C08-32T); VCN5X0 (VCN510/520, VCN540)
Device: HoloSens SDC (X series: Ultimate AI; M series: Professional AI; C series: Basic AI; D series: Inclusive AI); HoloSens Store
HoloSens SDC
Huawei HoloSens SDC continuously evolves based on professional AI chips, an open-ended SDC OS, and a future-proof algorithm and application ecosystem. Ultimate AI computing power provides the SDC with self-learning capabilities; deepened software-hardware decoupling allows partners to develop more algorithms for intelligent applications; the open-ended architecture eliminates the boundary between software and hardware, making the SDC an intelligent enabler for multi-dimensional data awareness.
Professional AI Chip
AI chips are key to adding true intelligence to the SDC. Huawei SDC, relying on advanced Neural Processing Units (NPUs) such as Ascend chips, continues to evolve from inference only to inference and training, enabling visual analysis and computing of trillions of records.
Open-ended SDC OS
A dedicated OS, the industry's first SDC OS launched by Huawei, runs on SDCs. The OS features a standard and unified software running environment where software can be decoupled from hardware. The OS opens service-oriented interfaces for ecosystem building and makes the SDC truly software-defined. The HoloSens SDC OS adopts a lightweight microservice architecture featuring loose coupling, high performance, and high reliability, ensuring service continuity during algorithm upgrade and switchover.
Future-proof Ecosystem
Based on the open-ended OS, a complete ecosystem tool chain is available to implement standard connection, training, and rollout of algorithms, opening the software architecture. Additionally, the SDC uses a modular design to collect multi-dimensional sensory information, opening the hardware architecture. Huawei SDC supports on-demand loading of algorithms and hardware capabilities, turning common cameras into dedicated cameras within seconds and adding intelligence to a variety of industries.
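The on-demand loading described above can be pictured as a small registry that swaps algorithms in without dropping service: the old version keeps serving until the new one is ready. This is an illustrative Python sketch with invented names, not the SDC OS API:

```python
class AlgorithmRegistry:
    """Toy model of on-demand algorithm loading with seamless swap."""

    def __init__(self):
        self._active = {}     # name -> callable algorithm instance

    def load(self, name, factory):
        # Build the new version fully before swapping it in, so frames
        # keep flowing through the old version during the upgrade.
        new_instance = factory()
        self._active[name] = new_instance

    def run(self, name, frame):
        algo = self._active.get(name)
        if algo is None:
            raise KeyError(f"algorithm {name!r} not loaded")
        return algo(frame)

registry = AlgorithmRegistry()
registry.load("plate_recognition", lambda: (lambda frame: f"v1:{frame}"))
assert registry.run("plate_recognition", "img01") == "v1:img01"

# Upgrade: loading v2 replaces v1 atomically, without unloading first.
registry.load("plate_recognition", lambda: (lambda frame: f"v2:{frame}"))
assert registry.run("plate_recognition", "img01") == "v2:img01"
```

The build-then-swap ordering is what a microservice-style camera OS needs to guarantee service continuity during algorithm upgrade and switchover.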
(Chart: AI computing power in TOPS, from 1 to 20, increasing from the C series through the M series to the X series, all running the SDC OS within a future-proof ecosystem.)
X (eXtra) Series: Ultimate AI. Camera categories: Person Data Structuring, Omni-Data Structuring, Vehicle Data Structuring, Integrated ITS, and Security Situation Awareness cameras.
M (Magic) Series: Professional AI. Camera categories: Person Data Structuring, Omni-Data Structuring, and Vehicle Data Structuring cameras.
X series models:
X1281-F: 4T 8MP Face Capture Box Camera
X2221-FL: 4T 2MP Face Capture Softlight Bullet Camera
X2221-CL: 4T 2MP Face Recognition Softlight Bullet Camera
X3221-C: 4T 2MP Ultra-Low Light Face Recognition Infrared Fixed Dome Camera
X2241-FLI: 4T 4MP Ultra-Low Light Face Capture Bullet Camera
X2222-CL: 4T 2MP Face Recognition Softlight Bullet Camera
X2241-CL: 4T 4MP Face Recognition Softlight Bullet Camera
X2221-10-FL: 4T 2MP Face Capture Softlight Bullet Camera (NEW, SuperColor)
X2241-10-HLI: 4T 4MP SuperColor Multi-Algorithm-Concurrency Bullet Camera
X2241-HL: 4T 4MP Multi-Algorithm Concurrency Bullet Camera
X2281-HL: 4T 8MP Multi-Algorithm Concurrency Bullet Camera (NEW)
X8341-10-HLI-PT: 5T Single-PTZ Compound-Eye Camera
X8341-10-HLI-PT2: 8T Dual-PTZ Compound-Eye Camera
X2221-VL: 4T 2MP Vehicle Recognition Softlight Bullet Camera
X2331-10-TL: 4T 3MP Low-Light ITS AI Bullet Camera
X2391-10-TL: 4T 9MP ITS AI Bullet Camera
X2391-20-T: 20T 9MP Low-Light ITS AI Bullet Camera (NEW)
X2221-I: 4T 2MP Ultra-Low Light IR Bullet Camera
X3221-I: 4T 2MP Ultra-Low Light IR Dome Camera
X6981-Z20: 4T 4K 20x Ultra-Low Light IR PTZ Dome Camera
X6921-Z48: 4T 2MP Starlight Laser IR PTZ Dome Camera
X6721-Z37: 4T 2MP 37x Ultra-Low Light IR PTZ Dome Camera
X6721-GZ37: 4T 2MP 37x Ultra-Low Light IR PTZ Dome Camera
X6781-Z37: 4T 4K 37x Starlight IR PTZ Dome Camera
X7341-10-HMI: 4 TOPS 4MP AI Fixed Dome Eco-Cube Camera
X6621-Z30: 4T 2MP 30x Starlight PTZ Dome Camera
X1221-Fb: 4T 2MP Ultra-Low Light Box Camera (NEW)
M series models:
M2140-EFI(6mm): 1T 4MP IR Face Capture Bullet Camera
M2120-EFI(6mm): 2MP Face Capture IR Bullet Camera
M2241-EFL: 2T 4MP Face Capture Bullet Camera
M2121-EFL(8-32mm): 1T 2MP Face Capture Bullet Camera
M2140-EFL(6mm): 1T 4MP Face Capture Bullet Camera
M2140-EFL(7-35mm): 1T 4MP Face Capture Bullet Camera
M2120-EFL(7-35mm): 1T 2MP Face Capture Bullet Camera
M2121-ECL(2.8-12mm): 1T 2MP Face Recognition Softlight Bullet Camera
M2120-EFL(2.8-12mm): 1T 2MP Face Capture Bullet Camera
M1221-Q: 2T 2MP Multi-Algorithm Box Camera
M1281-Q: 2T 8MP Multi-Algorithm Box Camera
M1241-Q: 2T 4MP Multi-Algorithm Box Camera
M2221-QL: 2T 2MP Multi-Algorithm Bullet Camera
M2221-QIn: 2T 2MP Ultra-Low Light Invisible IR Bullet Camera
M2121-10-EI-S(8-32mm): 1T 2MP Class-D Anti-corrosion Infrared Bullet Camera
M2241-QL: 2T 4MP Multi-Algorithm Bullet Camera
M3220-10-EI-Sf: 1T 2MP IR AI VF Dome Camera
M6721-E-Z31: 1T 2MP Multi-Algorithm PTZ Dome Camera
M6741-E-Z37: 1T 4MP Multi-Algorithm PTZ Dome Camera
M3250-10-EI-Sf: 1T 5MP IR AI VF Dome Camera
M6620-10-EZ33-S: 1T 2MP Class-D Anti-corrosion Infrared PTZ Dome Camera
M2120-EVL(7-35mm): 1T 2MP Vehicle Recognition Bullet Camera
M2121-EVL(2.8-12\8-32mm): 1T 2MP Vehicle Recognition Bullet Camera
M2140-EVL(7-35mm): 1T 4MP Vehicle Recognition Bullet Camera
M2141-EVL(2.8-12mm): 1T 4MP Vehicle Recognition Bullet Camera
M2121-EVL-Sf(2.8-12mm): 1T 2MP Vehicle Recognition Bullet Camera
Copyright © Huawei Technologies Co., Ltd. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.
Integrated ITS Camera models:
M2331-T: 3MP Integrated ITS Camera
M2331-TG: 3MP Integrated ITS Camera
M2391-T: 9MP Integrated ITS Camera
M2391-TG: 9MP Integrated ITS Camera

M series models (continued):
M2120-10-EBIn: 1T 2MP Invisible IR AI Bullet Camera
M2140-EBI(3.6/6mm): 1T 4MP Behavior Analysis Bullet Camera
M2141-10-EGI: 1T 4MP IR AI Bullet Network Camera
M2121-10-EI: 1T 2MP IR AI Bullet Camera
M2120-10-EI: 1T 2MP IR AI Bullet Network Camera
M2150-10-EGI: 1T 5MP IR AI Bullet Network Camera
M2150-10-EI: 1T 5MP IR AI Bullet Camera
M3221-10-EI: 1T 2MP IR AI VF Dome Camera
M2120-10-EI(7-35mm): 1T 2MP IR AI Bullet Camera
M2121-10-EL(2.8-12mm): 1T 2MP AI Bullet Camera
M8544-EL-Z37: Smart Tracking System
M6621-10-EBIn-Z23: 2T 2MP Low-Light Invisible IR PTZ Dome Camera
M3220-10-EI: 1T 2MP IR AI VF Dome Network Camera
M3250-10-EI: 1T 5MP IR AI VF Dome Network Camera (NEW)
M2121-10-EI(8-32mm): 1T 2MP IR AI Bullet Camera (NEW)
M9341-10-Th(75mm): 2T Thermal & Optical Bi-spectrum Network Positioning System (NEW)
M6781-10-GZ40-W5: 2T 8MP 5G AI PTZ Dome Camera (NEW)
M6741-10-Z40-E2: 2T 4MP Multi-Lens AI PTZ Dome Camera (NEW)
M7341-10-I-RT: 2 TOPS 4MP Microwave AI Eco-Cube Fixed Dome Camera (NEW)

C (Credible) Series: Basic AI. Camera categories: Person Data Structuring, Omni-Data Structuring, and Security Situation Awareness cameras.
C series models:
C2120-10-FI(6-9mm): 1T 2MP Face Capture IR AI Bullet Camera
C2120-10-FI: 1T 2MP Face Capture IR AI Bullet Camera
C2120-10-CI(6mm): 1T 2MP Face Recognition Bullet Network Camera (NEW)
C6650-10-Z33: 1T 5MP AI IR PTZ Dome Camera
C6620-10-Z23: 1T 2MP AI IR PTZ Dome Camera
C2150-10-I-PU(3.6\6mm): 1T 5MP IR AI Dome Camera
C2150-10-IU: 1T 5MP IR AI Dome Camera
C3220-10-I: 1T 2MP IR AI Dome Camera
C3221-10-I: 1T 2MP IR AI Dome Camera
C3250-10-I(U): 1T 5MP IR AI Dome Camera
C1220-10(-Fb): 1T 2MP AI Box Camera
C2120-10-LU(2.8-12mm): 2MP AI Softlight Bullet Camera
C2120-I: 2MP Starlight Infrared Bullet Camera
C2120-I-Sf: 2MP Starlight Infrared Bullet Camera
C2121-I: 2MP Super Starlight Infrared Bullet Camera
C2121-I-Sf: 2MP Starlight Infrared Bullet Camera
C2120-I-P(3.6/6mm): 2MP Starlight Infrared Bullet Camera
C2120-I(3.6/6mm): 2MP Starlight Infrared Bullet Camera
C2150-I(3.6/6mm): 5MP Starlight IR Bullet Camera
C2150-I-P(3.6/6mm): 5MP Starlight IR Bullet Camera
C2150-I: 5MP Starlight IR Bullet Camera
C2141-I: 4MP Super Starlight Infrared Bullet Camera
C6620-Z23(-sf): 2MP Starlight Infrared PTZ Dome Camera
C6620-10-Z23/Z33: 1T 2MP Starlight Infrared PTZ Dome Camera
C3050-I(2.8/3.6mm): 5MP Starlight Infrared Fixed Dome Camera
C3020-I(2.8/3.6mm): 2MP Starlight Infrared Fixed Dome Camera
C3020-EI-P(2.8/3.6/6mm): 2MP Starlight Infrared Fixed Dome Camera
C2120-EI(3.6/6mm): 2MP Infrared Bullet Camera
C2120-EI-P(3.6/6mm): 2MP Starlight IR Bullet Camera
C3220-10-IU: 1T 2MP Starlight IR Dome Camera
C3221-10-IU: 1T 2MP IR AI VF Dome Camera
C6650-10-Z33: 1T 5MP Starlight Infrared PTZ Dome Camera (NEW)
C3220-10-I-PU(2.8/3.6/6mm): 1T 2MP IR AI Fixed Dome Network Camera (NEW)
C3250-10-I-PU(2.8/3.6/6mm): 1T 5MP IR AI Fixed Dome Network Camera (NEW)
C2120-10-I-PU(3.6/6mm): 1T 2MP IR AI Bullet Network Camera (NEW)
C2120-10-I-P(3.6/6mm): 1T 2MP IR AI Bullet Network Camera (NEW)
C2150-10-I-P(3.6/6mm): 1T 5MP IR AI Bullet Network Camera (NEW)
C3220-10-I-P(2.8/3.6/6mm): 1T 2MP IR AI Fixed Dome Network Camera (NEW)
C3220-10-I(2.8/3.6mm): 1T 2MP IR AI Fixed Dome Network Camera
C2120-10-L-P(3.6mm): 1T 2MP Softlight AI Bullet Network Camera
C3050-10-I-P(2.8mm/3.6mm/6mm): 1T 5MP IR AI Fixed Dome Network Camera
C3050-I-P(2.8mm/3.6mm/6mm): 5MP Starlight Infrared Fixed Dome Camera
C3050-I-P(2.8/3.6mm): 5MP Starlight Infrared Fixed Dome Camera (NEW)
Huawei HoloSens IVS9000 provides large-capacity and high-concurrency video access, storage, forwarding, analysis, and searching capabilities, making it well suited to medium- and large-sized Safe City projects. As the intelligent center of the entire network, Huawei IVS9000 processes complex, multi-dimensional, and cross-domain services. It aggregates and shares video resources from provincial and city offices, assisting the command center in cross-city collaboration.
VCN: Provides functions such as real-time surveillance, forwarding, video recording, backup, security alarm, intelligent analysis, voice intercom, and voice broadcast.
VCM: Provides functions such as video analysis and data search.
Lite Edge HoloSens IVS3800
Center Platform HoloSens IVS9000
Huawei HoloSens IVS edge solution provides more efficient storage, analysis, and search capabilities. It also supports medium- and small-sized service coverage, regional autonomy, and fast service deployment in city offices, district/county branches, and campuses.
Huawei Intelligent Video Storage is built on a cloud architecture in which software is decoupled from hardware and data is decoupled from applications. It uses a variety of mission-critical technologies, such as cloud computing, cloud storage, and big data, to provide full-stack, all-cloud collaboration capabilities. Huawei HoloSens IVS can be used in Safe City projects and other scenarios requiring large-scale surveillance. The solution uses distributed cloud computing, high-performance big data, and intelligent analysis technologies to provide a high-density resource pool featuring elastic scaling.

Huawei HoloSens IVS uses the algorithm repository service to integrate third-party face, vehicle, and person analysis algorithms, as well as reverse image search algorithms. Faster analysis is achieved through software-hardware collaboration, enabling searches across hundreds of billions of data records within seconds. In addition, multiple platforms can collaborate to provide more efficient services. HoloSens IVS uses intelligent insight to help create safer cities.
HoloSens Intelligent Video Storage
Service Platforms
Infrastructure
VCN VCM
Cloud Computing Management Platform
Video Big Data
Big Data Support Platform
Access
Storage
Forwarding
Transcoding
Face analysis
Person analysis
Vehicle analysis
Video structuring
Search
Multi-algorithm
scheduling
Identity library
Pedestrian library
Vehicle library
Case library
···
Basic components
Distributed DB ···
General-purpose CPU, GPU, and NPU
Cloud storage resource pool
Storage device | Network device | Compute device
HCS, Docker
Cloud network resource pool
All-Cloud Synergy Hardcore Innovation Data Intelligence
In-house innovation
On-demand combinations of storage, compute, and search capabilities
800-channel access, 2x the industry average
768-channel forwarding, 3x the industry average
384-channel computing, 4x the industry average
Multi-algorithm rollout in one week
App rollout in one week
N:N data clustering among millions of records
IVS3800S | IVS3800XS
IVS3800F | IVS3800XF
IVS3800C | IVS3800XC
Storage
64-bit multi-core high-performance processor
Storage+Compute+Search
64-bit multi-core high-performance processor | AI-accelerated processing unit
Compute
64-bit multi-core high-performance processor | AI-accelerated processing unit
※ Some of the preceding specifications are supported only after a software upgrade.
Micro Edge HoloSens IVS1800
Huawei VCN is applicable to small-sized campuses, communities, and intelligent power distribution rooms. It supports a wide assortment of functions, such as live video surveillance, video forwarding, video search, video playback, PTZ control, local video viewing, and alarm linkage.
Video Content Node
VCN510-8: 8 channels
VCN510-8P: 8 channels, PoE power supply
VCN510-16: 16 channels
VCN510-16P: 16 channels, PoE power supply
VCN520-32: 32 channels
VCN540-64: 64 channels
Secure
• N+0 cluster
• SafeVideo+
Open
• Supports 300+ camera brands
• Supports connections to surveillance platforms of 50+ brands
• Allows the eSDK to integrate partners' video surveillance capabilities
IVS1800-C08-4T | IVS1800-C08-16T | IVS1800-C08-32T
Multi-algorithm concurrency | 16-channel video analysis | 64-channel image analysis
2U embedded server | AI-accelerated processor
Intelligent Micro Edge Platform That Integrates Storage, Compute, and Search
16-/32-/64-channel, 8 disks
32-/64-channel, 8 disks
Huawei HoloSens Intelligent Vision Product Portfolio
About Huawei HoloSens Intelligent Vision
Intelligent vision serves as the eyes of the intelligent world, the core enabler of worldwide sensory interconnectivity, and a key enabler for the digital transformation of industries. Huawei intelligent vision refers to the technologies and methods revolving around the use of non-contact optical sensors to automatically receive and perform intelligent analysis on large amounts of image data, so as to obtain desired information and control machines or processes. An intelligent vision system, covering image collection and perception, data processing and analysis, and decision making and execution, generally consists of multiple units such as algorithms, software, and hardware. Huawei intelligent vision extends from industrial to non-industrial sectors and is widely applied in various industries, such as intelligent video surveillance, autonomous driving, robotics, and consumer electronics.

Huawei Intelligent Vision, with Huawei HoloSens as its brand name, serves as an entrance to the intelligent world based on multi-dimensional awareness and data intelligence. Huawei HoloSens integrates technical edges in connectivity, computing, cloud technology, and devices to provide competitive multi-spectral intelligent awareness devices; delivers optimal intelligent video storage solutions for edge scenarios where massive amounts of video data are generated; provides powerful device-edge-cloud synergy solutions and business models with cloud services as the core; and builds an open, competitive, and operable intelligent vision ecosystem.

Huawei HoloSens Intelligent Vision provides software-defined cameras (HoloSens SDCs), intelligent video storage (HoloSens IVS), and a one-stop intelligent video algorithm store (HoloSens Store) for sectors such as transportation, campus, education, and finance. Additionally, Huawei joins hands with partners such as algorithm, application, and hardware vendors to embed intelligence into all industries.
From video surveillance to intelligent vision, Huawei takes the lead in research and technology development by leveraging its core technical advantages. In the future, Huawei will explore vehicle-mounted and industrial vision products to embrace larger markets. Huawei remains committed to providing the most competitive multi-dimensional awareness and device-edge-cloud synergy solutions in order to become a pioneer in the intelligent vision industry.
HUAWEI TECHNOLOGIES CO., LTD.
Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129, P. R. China
Tel: +86-755-28780808
www.huawei.com
Disclaimer
This document may contain predictive information, including but not limited to information about future finance, operations, product series, and new technologies. Due to uncertain factors, the information may differ greatly from actual results. Therefore, the information in this document is for reference only and does not constitute any offer or commitment. Huawei is not liable for any actions you take based on this document. Huawei may change the information at any time without notice.
Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved.
Trademarks and Permissions
HUAWEI and other Huawei trademarks are trademarks or trade names of Huawei Technologies Co., Ltd. All other trademarks, product names, service names, and company names mentioned in this document are the property of their respective holders.
No part of this document may be reproduced or transmitted in any form or by any means without the prior written consent of Huawei Technologies Co., Ltd.