Post on 06-May-2023
Network Working Group A. Clemm
Internet-Draft L. Dong
Intended status: Informational Futurewei
Expires: January 12, 2023 G. Mirsky
Ericsson
L. Ciavaglia
Rakuten Mobile
J. Tantsura
Microsoft
M-P. Odini
July 11, 2022
Green Networking Metrics
draft-cx-green-metrics-00
Abstract
This document explains the need for network instrumentation that
makes it possible to assess the power consumption, energy efficiency,
and carbon footprint associated with a network, its equipment, and
the services that are provided over it. It also suggests a set of
related metrics that, once made visible, can help to optimize a
network’s energy efficiency and "greenness".
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 12, 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
Clemm, et al. Expires January 12, 2023 [Page 1]
Internet-Draft July 2022
This document is subject to BCP 78 and the IETF Trust’s Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Definitions and Acronyms
3. Energy Metrics
3.1. Energy Metrics related to Equipment
3.1.1. Base Metrics
3.1.2. Virtualization Considerations
3.2. Energy Metrics related to Flows
3.3. Energy Metrics related to Paths
3.4. Energy Metrics related to the Network-at-Large
4. Other considerations and discussion items
4.1. User perspective
4.2. Holistic perspective
4.3. Sustainable equipment production
5. IANA Considerations
6. Security Considerations
7. Acknowledgments
8. Informative References
Authors’ Addresses
1. Introduction
Climate change and the need to curb greenhouse emissions have been
recognized by the United Nations and by most governments as one of
the big challenges of our time. As a result, improving energy
efficiency and reducing power consumption are becoming of increasing
importance for society and for many industries. The networking
industry is no exception.
Networks themselves consume significant amounts of energy.
Therefore, the networking industry has an important role to play in
meeting sustainability goals. Future networking advances will
increasingly need to focus on becoming more energy-efficient and
reducing carbon footprint, both for economic reasons and for reasons
of corporate responsibility. This shift has already begun and
sustainability is already becoming an important concern for network
providers [telefonica2020].
There are many vectors along which networks can be made "greener".
At the foundation is the network equipment itself: making such
equipment more energy-efficient is a big factor in helping networks
become greener. However, opportunities also exist at the level of
protocols (e.g., reduction of transmission waste and enabling of
rapid control loops), at the level of the overall network (e.g., path
optimization that considers energy efficiency as a cost factor), and
at the architecture level (e.g., placement of contents and functions)
[I.D.draft-cwx-green-ps].
Regardless of the particular approach chosen, assessing its impact
requires visibility into the actual energy consumption that is
occurring and, ideally, the ability to attribute that consumption to
its sources. As the adage goes, you cannot manage what you cannot
measure. By extension, you cannot optimize what you have no
visibility into. The ability to instrument networks in a way that
allows the assessment of energy consumption is hence an important
enabler for potential energy optimizations: it allows the
effectiveness of measures being taken to be assessed, and it enables
(for example) control loops that take energy as an input. Before
instrumenting, however, it needs to be clear which metrics network
providers will be interested in and which metrics applications will
seek to optimize.
This document defines a set of metrics that make it possible to
assess the "greenness" of networks and that form the basis for
optimizing the energy efficiency, carbon footprint, and environmental
sustainability of networks and of the services provided over them.
These metrics are intended to serve as the foundation for possible
later IETF standardization activities, such as the definition of
related YANG modules [RFC7950] or energy-related control protocol
extensions.
Please note that throughout this document, we use the terms "green"
and "energy efficient" interchangeably. In general, we use these
terms in a broad sense that also encompasses carbon footprint and
sustainability, except where explicitly mentioned otherwise.
Likewise, we treat "energy efficiency" as synonymous with "energy
utilization efficiency", broadly referring to the efficiency with
which energy is utilized.
2. Definitions and Acronyms
Carbon footprint: as used in this document, the amount of carbon
emissions associated with the use or deployment of technology,
usually directly correlated with the associated energy consumption
CPU: Central Processing Unit
IPFIX: IP Flow Information eXport
TCAM: Ternary Content-Addressable Memory
pWh: picowatt hour (10^-12 Wh)
Wh: Watt hour
3. Energy Metrics
We categorize energy metrics at the following levels:
o At the device/equipment level. This concerns aspects such as the
energy consumption of a device as a whole or of equipment
components such as line cards or individual ports. It includes
metrics that would, for example, be found in equipment data sheets.
o At the flow level. This concerns the energy consumption of flows.
Metrics at this level attribute energy consumption to a flow.
o At the path level. These metrics attest to the end-to-end energy
efficiency of paths. They reflect the energy intensity of a path
(e.g., the amount of energy drawn when the path is selected) and
take into account, for example, whether a given path includes
segments known to be energy-intensive.
o At the network level. These metrics aggregate energy consumption
across a network to provide a holistic picture of the "network as
a system".
3.1. Energy Metrics related to Equipment
3.1.1. Base Metrics
Arguably, the most relevant energy metrics relate to equipment as a
whole. After all, it is devices that draw power.
The power consumption of a device can be divided into the
consumption of its core components (e.g., the backplane and CPU) and
the additional consumption incurred per line card and per port.
[I.D.draft-manral-bmwg-power-usage] summarizes the device factors
that affect power consumption: base chassis power, number of line
cards, number of active ports, port settings, port utilization, the
implementation of packet classification in Ternary Content-
Addressable Memory (TCAM), the size of the TCAM, and the firmware
version.
Furthermore, it is important to understand the difference between
power consumption when a resource is idle and when it is under load.
This helps to understand the incremental cost of additional
transmission versus the initial cost of transmission. Generally, the
cost of the first bit could be considered very high, as it requires
powering up a device, port, etc. The cost of transmitting additional
bits (beyond the first) is many orders of magnitude lower.
Likewise, the incremental CPU and memory cost of processing
additional packets becomes fairly negligible.
The first set of metrics corresponds to ratings of the device:
o Power consumption when idle (e.g. Watts)
o Power consumption when fully loaded (e.g. Watts)
o Power consumption at various loads: e.g. 50% utilization, 90%
utilization
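The idle, full-load, and intermediate ratings above can be combined
into a simple load-to-power model. The following Python sketch is
illustrative only: it assumes (hypothetically) that power varies
linearly between adjacent rated load points, and the function names
and example ratings are invented. It contrasts the average energy
per bit, which carries the idle baseline, with the marginal energy
per additional bit, which reflects only the slope of the curve.

```python
from bisect import bisect_right

# Hypothetical data-sheet style ratings: utilization (0..1) -> power in watts.
Ratings = dict[float, float]


def _segment(ratings: Ratings, utilization: float):
    """Return the two rated points that bracket the given utilization."""
    points = sorted(ratings.items())
    loads = [u for u, _ in points]
    if len(points) < 2 or not loads[0] <= utilization <= loads[-1]:
        raise ValueError("utilization outside the rated range")
    # Clamp the index so the top rated point maps onto the last segment.
    i = min(max(bisect_right(loads, utilization), 1), len(points) - 1)
    return points[i - 1], points[i]


def power_at(ratings: Ratings, utilization: float) -> float:
    """Piecewise-linear interpolation of power draw (W) at a utilization."""
    (u0, p0), (u1, p1) = _segment(ratings, utilization)
    return p0 + (p1 - p0) * (utilization - u0) / (u1 - u0)


def average_j_per_bit(ratings: Ratings, utilization: float,
                      capacity_bps: float) -> float:
    """Total power / carried bit rate (J/bit); undefined at zero load."""
    return power_at(ratings, utilization) / (utilization * capacity_bps)


def marginal_j_per_bit(ratings: Ratings, utilization: float,
                       capacity_bps: float) -> float:
    """Extra energy per additional bit: slope of the curve at this load."""
    (u0, p0), (u1, p1) = _segment(ratings, utilization)
    return (p1 - p0) / ((u1 - u0) * capacity_bps)


if __name__ == "__main__":
    ratings = {0.0: 400.0, 0.5: 440.0, 0.9: 470.0, 1.0: 480.0}  # hypothetical
    capacity = 100e9  # 100 Gbit/s, hypothetical
    u = 0.1
    print(f"power at {u:.0%}: {power_at(ratings, u):.1f} W")
    print(f"average:  {average_j_per_bit(ratings, u, capacity):.2e} J/bit")
    print(f"marginal: {marginal_j_per_bit(ratings, u, capacity):.2e} J/bit")
```

With these example ratings, the average energy per bit at 10%
utilization is roughly fifty times the marginal energy per additional
bit, which illustrates why the first bits are expensive and
additional bits are comparatively cheap.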
These metrics should be maintained for the device as a whole and for
its subcomponents, i.e., for the chassis by itself, for each line
card, and for each port. They should also take into account aspects
such as the current memory configuration, as the overall energy
consumption of a device is a function of the energy consumption of
the components that the system is composed of.
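Because the overall consumption of a device is a function of its
components, per-component values can be rolled up along the device
hierarchy. The following Python sketch assumes, for illustration
only, that power can be attributed additively to each component; in
practice, shared elements such as power supplies and fans may be
hard to apportion. The class name, helper, and example values are
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a hypothetical device hierarchy (chassis, line card, port)."""
    name: str
    own_power_w: float  # power attributed to this component alone
    children: list["Component"] = field(default_factory=list)

    def total_power_w(self) -> float:
        """Own power plus the rolled-up power of all subcomponents."""
        return self.own_power_w + sum(
            (c.total_power_w() for c in self.children), 0.0)

    def report(self, depth: int = 0) -> list[str]:
        """One line per component, indented by depth, showing totals."""
        lines = [f"{'  ' * depth}{self.name}: {self.total_power_w():.1f} W"]
        for child in self.children:
            lines.extend(child.report(depth + 1))
        return lines


def make_line_card(name: str, ports: int) -> Component:
    # Hypothetical values: 50 W per line card, 5 W per active port.
    return Component(name, 50.0,
                     [Component(f"{name}/port{i}", 5.0) for i in range(ports)])


if __name__ == "__main__":
    chassis = Component("chassis", 200.0,
                        [make_line_card("lc0", 4), make_line_card("lc1", 4)])
    print("\n".join(chassis.report()))
```

Keeping per-node totals makes it possible to report the same metric
at every level of the hierarchy, as suggested above.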
The metrics could be provided by the data sheet associated with the
device, or they could be measured as part of a deployment. For
maximum accuracy and comparability, they should reflect pre-defined
environmental settings, e.g., operating temperature, relative
humidity, and barometric pressure. For example, ATIS (Alliance for
Telecommunications Industry Solutions) [ATIS0600015.02] defines a
reference environment under which to measure router power
consumption: a temperature of 25 degrees Celsius (with a deviation of
at most 3 degrees Celsius), relative humidity between 30% and 75%,
and barometric pressure between 812 and 1020 mbar. In the AC power
configuration, the router should be evaluated at 230 VAC (within 1%
deviation) and at 50 or 60 Hz (within 1% deviation) [Ahn2014].
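A measurement harness might check the recorded conditions against
such a reference environment before accepting a reading. The
following Python sketch encodes the values as summarized above;
interpreting "within 1% deviation" as a relative tolerance around the
nominal value is an assumption, the names are hypothetical, and the
ATIS document remains the authoritative source for the actual
requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementConditions:
    temperature_c: float
    relative_humidity_pct: float
    pressure_mbar: float
    ac_voltage_v: float
    ac_frequency_hz: float


def _within_pct(value: float, nominal: float, pct: float) -> bool:
    """True if value deviates from nominal by at most pct percent."""
    return abs(value - nominal) <= nominal * pct / 100.0


def reference_violations(c: MeasurementConditions) -> list[str]:
    """Names of violated conditions; an empty list means compliant."""
    violations = []
    if abs(c.temperature_c - 25.0) > 3.0:
        violations.append("temperature")
    if not 30.0 <= c.relative_humidity_pct <= 75.0:
        violations.append("relative humidity")
    if not 812.0 <= c.pressure_mbar <= 1020.0:
        violations.append("barometric pressure")
    if not _within_pct(c.ac_voltage_v, 230.0, 1.0):
        violations.append("AC voltage")
    # Either 50 Hz or 60 Hz is acceptable, each within 1%.
    if not any(_within_pct(c.ac_frequency_hz, f, 1.0) for f in (50.0, 60.0)):
        violations.append("AC frequency")
    return violations
```

Returning the list of violations, rather than a single boolean, lets
a tool record why a reading was excluded from comparison.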
The second set of metrics reflects the actual power being drawn
during operation. It is the type of data that might be provided as
management data. Again, it should be provided for the device as a
whole, as well as for the subcomponents reflected in the device
hierarchy: line cards, ports, etc.
o Current power consumption (e.g. Watts)
o Energy consumed since system start (or since module insertion, at
the level of a line card, or since port activation, at the level
of a port) and over the past minute (e.g., Watt hours)
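Where a device exposes only instantaneous power readings, the energy
values above could be approximated from periodic samples. The
following Python sketch uses trapezoidal integration over
(timestamp, watts) samples; the sampling approach and the function
names are assumptions for illustration, and a device that maintains
its own cumulative energy counter would not need this step.

```python
Sample = tuple[float, float]  # (timestamp in seconds, power in watts)

JOULES_PER_WH = 3600.0


def energy_wh(samples: list[Sample]) -> float:
    """Trapezoidal integral of time-ordered power samples, in watt hours."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be in time order")
        joules += (p0 + p1) / 2.0 * (t1 - t0)  # trapezoid area in joules
    return joules / JOULES_PER_WH


def energy_in_window_wh(samples: list[Sample], now_s: float,
                        window_s: float = 60.0) -> float:
    """Energy over the last window_s seconds (default: the past minute).

    Simplification: only samples inside the window are used; no
    interpolation is done at the window edges.
    """
    recent = [(t, p) for t, p in samples if now_s - window_s <= t <= now_s]
    return energy_wh(recent)
```

The accuracy of such an approximation depends on the sampling
interval relative to how quickly the power draw changes.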
The third set of metrics is derived from the earlier metrics. These
metrics normalize power consumption relative to line speed or to the
amount of traffic that is passed.
o Current power consumption / kilooctet
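As a minimal sketch, the normalized metric could be obtained by
dividing current power by the current traffic rate, which yields an
energy per kilooctet; this interpretation of "power consumption /
kilooctet" is an assumption, as are the function names below. A
normalization by line speed (capacity) is shown alongside for
comparison.

```python
JOULES_PER_WH = 3600.0


def wh_per_kilooctet(power_w: float, throughput_octets_per_s: float) -> float:
    """Current power divided by current traffic rate, per kilooctet.

    W / (octets/s) = J/octet; multiplying by 1000 gives J per kilooctet,
    which is then converted to watt hours.
    """
    if throughput_octets_per_s <= 0:
        raise ValueError("no traffic: the normalized metric is undefined")
    return power_w / throughput_octets_per_s * 1000.0 / JOULES_PER_WH


def watts_per_gbps(power_w: float, line_speed_bps: float) -> float:
    """Power normalized to line speed (capacity), in W per Gbit/s."""
    return power_w / (line_speed_bps / 1e9)
```

Normalizing by traffic reflects how efficiently the device is being
used at the moment, while normalizing by line speed characterizes
the equipment independently of current load.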
The fourth set of metrics reflects expected values of incremental
energy usage. These metrics could be relevant for use cases that
assess the cost of additional traffic. [Bolla2011] and [Ahn2014]
found that the power consumption of a router is in direct proportion
to the link utilization as well as to packet sizes.
o Incremental energy per packet, per kilooctet, and per gigaoctet
(possible units include pWh, picowatt hours)
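The following Python sketch shows one way to express an incremental
value in these units. It assumes, for illustration only, a constant
marginal energy per bit derived from the additional power drawn for
an additional traffic rate; a per-packet processing term, which the
packet-size dependence noted above suggests, is deliberately
omitted. Function names and example values are hypothetical.

```python
JOULES_PER_WH = 3600.0
JOULES_PER_PWH = JOULES_PER_WH * 1e-12  # 1 pWh = 3.6e-9 J


def marginal_j_per_bit(delta_power_w: float, delta_rate_bps: float) -> float:
    """Additional power per additional bit rate: W / (bit/s) = J/bit."""
    if delta_rate_bps <= 0:
        raise ValueError("delta_rate_bps must be positive")
    return delta_power_w / delta_rate_bps


def incremental_pwh_per_packet(j_per_bit: float, packet_octets: int) -> float:
    """Incremental energy of one packet, in picowatt hours."""
    return j_per_bit * packet_octets * 8 / JOULES_PER_PWH


def incremental_wh_per_gigaoctet(j_per_bit: float) -> float:
    """Incremental energy of 10^9 octets, in watt hours."""
    return j_per_bit * 8e9 / JOULES_PER_WH


if __name__ == "__main__":
    # Hypothetical: 10 W more drawn when 100 Gbit/s more traffic is carried.
    j = marginal_j_per_bit(10.0, 100e9)
    print(f"{incremental_pwh_per_packet(j, 1500):.1f} pWh per 1500-octet packet")
    print(f"{incremental_wh_per_gigaoctet(j):.2e} Wh per gigaoctet")
```

Under these hypothetical numbers, a 1500-octet packet costs roughly
333 pWh, which shows why picowatt hours are a convenient unit at the
per-packet level.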
In addition to these metrics, it is conceivable for the device to
also reflect other relevant context, such as the sustainability
rating of its power source. This could potentially be reflected
along a scale ranging from diesel-generator powered, via the
conventional power grid, to renewable (powered by wind turbines,
capture of excess heat, etc.). The environmental status of the
device could also be taken into consideration, such as whether it is
deployed in a data center and its share in contributing to the need
for cooling. It is conceivable, for example, to introduce
corresponding metrics indicating a "green rating" of a device and/or
of the context in which it has been deployed.
3.1.2. Virtualization Considerations
Instrumentation should also take into account the possibility of
virtualization. This is particularly important as networking
functions may increasingly be virtualized and hosted (for example) in
a data center, and overlay networks may be formed. Likewise, many
applications that are expected to optimize energy consumption may be
hosted on controllers and applied to soft switches, VNFs (Virtual
Network Functions), or network slices. Attributing the actual power
consumed to such virtualized entities is a non-trivial task. It
involves navigating layers of indirection to assess the actual energy
usage of, and contribution by, individual entities. While it would
be possible in such cases to simply revert to energy metrics of CPUs
and data centers as a whole, doing so loses the ability to account
for energy consumption on the basis of the networking decisions being
made.
For example, virtualized networking functions could be hosted in
containers or virtual machines running on a CPU in a data center,
instead of on a regular network appliance such as a router or a
switch, leading to very different power consumption characteristics.
A data center CPU could be more power-efficient and consume power
more proportionally to actual CPU load. Virtualization could also
result in using fewer servers. [Energystar] reports that one
watt-hour of energy savings at the server level results in roughly
1.9 watt-hours of facility-level energy savings, by reducing energy
waste in the power infrastructure and reducing the energy needed to
cool the waste heat produced by the server.
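The [Energystar] ratio lends itself to a quick estimate of what
server consolidation through virtualization might save. The
following Python sketch is illustrative only: the function name and
inputs are hypothetical, it assumes that each removed server would
otherwise have drawn a constant average power, and it ignores the
extra load absorbed by the remaining servers, so it overstates the
savings.

```python
# Ratio as reported by [Energystar]; actual values depend on the facility.
FACILITY_WH_PER_SERVER_WH = 1.9


def consolidation_savings_wh(servers_before: int, servers_after: int,
                             avg_server_power_w: float, hours: float,
                             facility_ratio: float = FACILITY_WH_PER_SERVER_WH,
                             ) -> tuple[float, float]:
    """Return (server-level, facility-level) savings in watt hours.

    Upper-bound estimate: the added load on the remaining servers is
    not subtracted.
    """
    if servers_after > servers_before:
        raise ValueError("consolidation cannot increase the server count")
    server_level = (servers_before - servers_after) * avg_server_power_w * hours
    return server_level, server_level * facility_ratio
```

For example, consolidating a hypothetical 10 servers averaging 200 W
onto 4 servers for one day yields 28.8 kWh of server-level savings
and roughly 54.7 kWh at the facility level.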
Instrumentation needs to reflect these facts and facilitate
attributing power consumption correctly. Alternatively, a simpler
solution may be to forgo energy metrics for virtualized functions
entirely and to focus instead on instrumenting the underlying hosting
infrastructure and optimizing its energy footprint. In the meantime,
the attribution of energy consumption and carbon footprint to
individual functions that run on top of that infrastructure may be a
topic for further research.
3.2. Energy Metrics related to Flows
Energy metrics related to flows attempt to capture the contribution
of a given flow to energy consumption. In their basic incarnation,
these metrics reflect the energy consumption at a given device. They
could be used in conjunction with IPFIX [RFC7011] and modeled as
Information Elements to be treated analogously to other flow
statistics [RFC7012]. The following is a corresponding set of flow
energy metrics:
o Incremental energy consumed over the duration of the flow.
This is the incremental energy consumption that is directly caused
by the flow, representing the difference between the amount of
energy consumed with the flow and the amount of energy that would
have been consumed without the flow. (It should be noted that
this metric may be difficult to assess in practice.)
o Amortized energy consumed over the duration of the flow.
This is the flow’s share of the energy consumed during the flow,
computed by multiplying the total energy consumption incurred
during that time by the proportion of the flow’s traffic to the
overall traffic.
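The two flow metrics above can differ substantially, because the
amortized value carries a share of the device’s idle baseline while
the incremental value does not. The following Python sketch
illustrates this under stated assumptions: the amortized computation
follows the description above, whereas the incremental value is only
estimated from a hypothetical marginal energy-per-bit parameter,
since the consumption "without the flow" cannot be measured
directly. Function names and example values are invented.

```python
JOULES_PER_WH = 3600.0


def amortized_flow_energy_wh(flow_octets: float, total_octets: float,
                             device_energy_wh: float) -> float:
    """Device energy over the flow's duration times the flow's traffic share."""
    if total_octets <= 0 or not 0 <= flow_octets <= total_octets:
        raise ValueError("need total_octets > 0 and 0 <= flow_octets <= total")
    return device_energy_wh * flow_octets / total_octets


def estimated_incremental_flow_energy_wh(flow_octets: float,
                                         marginal_j_per_bit: float) -> float:
    """Model-based estimate: flow bits times a marginal energy per bit.

    The true incremental value requires the counterfactual consumption
    without the flow; this only approximates it with a linear model.
    """
    return flow_octets * 8 * marginal_j_per_bit / JOULES_PER_WH
```

For example, if a device consumed 500 Wh during a flow that carried
1% of its traffic (10^10 of 10^12 octets), the amortized share is
5 Wh, whereas a hypothetical marginal cost of 10^-10 J/bit yields an
incremental estimate of only about 0.002 Wh.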
A second set of flow-related energy metrics might aggregate the
flow’s energy consumption over the entire flow path. In that case,
the flow energy consumption is summed across the systems along the
traversed path. In practice, this will be more difficult to assess
for many reasons, including the impacts of load balancing and PREOF
(Packet Replication, Elimination, and Ordering Functions [RFC8655]),
the challenge of tracing the actual routes taken by production
traffic, and more.
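Assuming the traversed path is known, which as noted above is often
not the case in practice, the path-level aggregation reduces to a sum
over per-hop records. The following Python sketch is hypothetical in
its names and inputs; it reports hops without a record separately,
so that an incomplete sum is not mistaken for a complete one.

```python
from collections.abc import Mapping, Sequence


def path_flow_energy_wh(path: Sequence[str],
                        per_hop_wh: Mapping[str, float],
                        ) -> tuple[float, list[str]]:
    """Sum a flow's per-hop energy (Wh) along a known path.

    Returns the total over hops that reported a value, plus the list of
    hops that did not report one.
    """
    missing = [hop for hop in path if hop not in per_hop_wh]
    total = sum((per_hop_wh[hop] for hop in path if hop in per_hop_wh), 0.0)
    return total, missing
```

A collector could, for instance, fill per_hop_wh from per-device
flow records keyed by device, flagging the result as partial whenever
the list of missing hops is non-empty.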
3.3. Energy Metrics related to Paths
Energy metrics related to paths involve assessing the carbon
footprint of paths and optimizing those paths so that the overall
footprint is minimized, then applying techniques such as path-aware
networking [I.D.draft-chunduri-rtgwg-preferred-path-routing] or
segment routing [RFC8402] to steer traffic along the paths that are
deemed "the greenest" among alternatives. This also includes aspects
such as considering incremental energy usage in routing decisions.
Optimizing cost has a long tradition in networking; many existing
mechanisms can be leveraged for greener networking simply by
introducing energy footprint as a cost factor. Low-hanging fruit
includes the use of energy-related parameters as cost parameters in
control planes, whether distributed (e.g., IGP) or conceptually
centralized via SDN controllers. In addition to the power consumption
over a path itself, other factors might be considered, such as
whether a path involves intermediate routers that are powered by
renewable energy sources, as might be determined by an aggregate
sustainability score. After all, paths with devices powered by
solar, wind, or geothermal energy might be preferable to paths
involving devices powered by conventional energy that may include
fossil fuel or nuclear resources.
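Introducing energy as a cost factor means that a standard
shortest-path computation can select the "greenest" path. The
following Python sketch runs Dijkstra’s algorithm with a per-link
energy cost (for example, incremental Wh per gigaoctet); the
topology format, cost semantics, and function name are assumptions
for illustration, and link costs must be non-negative.

```python
import heapq
from typing import Optional

# Hypothetical topology: node -> {neighbor: non-negative energy cost of link}.
Graph = dict[str, dict[str, float]]


def greenest_path(graph: Graph, src: str,
                  dst: str) -> tuple[Optional[list[str]], float]:
    """Dijkstra's algorithm using energy as the link cost."""
    dist = {src: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, src)]
    done: set[str] = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == dst:
            break
        for nbr, cost in graph.get(node, {}).items():
            if cost < 0:
                raise ValueError("energy costs must be non-negative")
            candidate = d + cost
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    if dst not in dist:
        return None, float("inf")  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


if __name__ == "__main__":
    topology = {"A": {"B": 2.0, "C": 1.0}, "B": {"D": 2.0}, "C": {"D": 1.5}}
    print(greenest_path(topology, "A", "D"))
```

In practice, energy would more likely be blended with existing
metrics, e.g., as a weighted sum with latency or a sustainability
score, which only changes how each link cost is computed.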
The following is a corresponding set of candidate metrics:
o Energy rating of a path. This could be computed as a function of
the energy ratings of the different hops along the path.
o Current power consumption across a path. This could be computed
by aggregating the current power per packet (or per kilooctet,
etc.) of each of the hops along the path.
o Incremental energy for a packet over a path. This could be
computed by aggregating the incremental energy per packet of each
of the hops along the path.
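The candidate path metrics can be sketched as simple aggregations
over per-hop values. In the following Python sketch, the field
names, the 0.0 to 1.0 rating scale, and the choice of the minimum
("weakest link") as the function for combining energy ratings are
all assumptions for illustration; an average or a traffic-weighted
combination would be equally plausible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopEnergy:
    """Hypothetical per-hop inputs, as a device might report them."""
    name: str
    energy_rating: float  # hypothetical scale: 0.0 (worst) .. 1.0 (greenest)
    current_wh_per_kilooctet: float
    incremental_pwh_per_packet: float


def path_energy_metrics(hops: list[HopEnergy]) -> dict[str, float]:
    """Aggregate per-hop values into the candidate path metrics."""
    if not hops:
        raise ValueError("a path needs at least one hop")
    return {
        # Weakest link: a path is only as green as its least green hop.
        "energy_rating": min(h.energy_rating for h in hops),
        # Additive metrics: each hop contributes its own share.
        "current_wh_per_kilooctet": sum(h.current_wh_per_kilooctet for h in hops),
        "incremental_pwh_per_packet": sum(h.incremental_pwh_per_packet
                                          for h in hops),
    }
```

Additive aggregation suits the consumption metrics, whereas the
rating needs an explicit policy, since ratings are not quantities
that add up along a path.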
3.4. Energy Metrics related to the Network-at-Large
Ultimately, the goal of energy optimization and reduction of carbon
footprint is to minimize the aggregate amount of energy used across
the entire network, as well as to minimize the overall carbon
footprint of the network as a whole. Accordingly, metrics that
aggregate energy usage across the network as a whole are needed.
In order to account for changing traffic profiles, growth in user
traffic, etc., additional metrics are needed that normalize the
total by the volume of services supported and the volume of traffic
passed. Corresponding metrics will generally be computed at the
level of Operational Support Systems (or Business Support Systems)
for the entire network.
Some of the metrics used include the following [telefonica2020]:
o Total energy consumption (MWh)
o Electricity from renewable sources (%)
o Network energy efficiency (MWh/PB)
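At the OSS level, these three metrics follow directly from
network-wide totals. The following Python sketch is illustrative
only: grouping consumption by site, the function name, and the
example figures are assumptions, and how renewable energy is
accounted for (e.g., on-site generation versus purchased
certificates) is left open.

```python
def network_energy_metrics(energy_mwh_by_site: dict[str, float],
                           renewable_mwh: float,
                           traffic_pb: float) -> dict[str, float]:
    """Total energy (MWh), renewable share (%), and efficiency (MWh/PB)."""
    total = sum(energy_mwh_by_site.values())
    if total <= 0 or traffic_pb <= 0:
        raise ValueError("total energy and traffic must be positive")
    if not 0 <= renewable_mwh <= total:
        raise ValueError("renewable energy must lie between 0 and the total")
    return {
        "total_energy_mwh": total,
        "renewable_pct": 100.0 * renewable_mwh / total,
        "mwh_per_pb": total / traffic_pb,
    }
```

For example, hypothetical sites consuming 600, 300, and 100 MWh, of
which 450 MWh is renewable, while carrying 20 PB, yield 1000 MWh in
total, a 45% renewable share, and 50 MWh/PB.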
4. Other considerations and discussion items
This document is intended to spark discussion about which energy
metrics will be useful for reducing the carbon footprint of networks:
metrics that provide visibility into energy consumption, that help
optimize networks under green criteria, and that enable the next
generation of energy-aware controllers and services. Clearly, other
metrics are conceivable, and more considerations apply beyond those
currently reflected in this document. The following subsections
highlight items that warrant further discussion and that might be
addressed in greater detail in future revisions of this document.
4.1. User perspective
Arguably, attributing energy usage to individual users and making
users aware of the energy implications of their communication
behavior may provide interesting possibilities to reduce energy
footprint by guiding their behavior accordingly. For example, the
network could present clients with energy statistics related to their
communication usage. This could be supported by metrics related to
service instances, such as energy usage statistics in addition to
statistics regarding volume, duration, and number of transactions.
Such approaches would raise questions about how to actually collect
such statistics accurately (versus just computing them via a formula)
and what to actually include as part of those statistics (amortized
vs. incremental energy contribution, attribution of cost for path
resilience or for retransmissions due to congestion, etc.). They
also raise questions about how the statistics would be used in
practice. For example, energy-based charging might be explored as an
alternative to volume-based charging; however, in practice the two
may be strongly correlated, and energy-based charging may be rejected
by customers for the same reasons that volume-based charging
frequently is.
4.2. Holistic perspective
Network equipment itself is only one contributor to a network’s
carbon footprint. Arguably just as important are aspects outside the
network equipment, such as cooling and ventilation. These aspects
need to be considered. However, reflecting them here would arguably
amount to "boiling the ocean"; they are therefore not addressed in
this document.
4.3. Sustainable equipment production
Internet energy consumption can be considered to comprise two major
components [Raghavan2011]: (1) the energy of the devices that make up
the Internet infrastructure, including routers, LAN devices, and
cellular and telecommunication infrastructure; and (2) more broadly,
with the rise of peer-to-peer applications and cloud services, the
energy consumption of end systems, including desktops, laptops,
smartphones, cloud servers, and application servers that are not in
the cloud.
For those two components, the following factors need to be taken
into consideration when calculating energy consumption:
o Energy consumed in manufacturing of the devices and end-systems,
as well as the contribution from their components and materials.
o The replacement lifespan of the devices and end systems: desktops
and laptops are typically replaced in 3-4 years, smartphones in 2
years, application servers and cloud servers in 3 years, routers
and WiFi-LAN switches in 3 years, cellular towers and
telecommunication switches in 10 years, fiber optics in 10 years,
copper in 30 years, etc. As the pace of technology advancement
increases, replacement lifespans might decrease over time.
o Operational maintenance: the network would not be functional
without various software and protocol implementations. The energy
consumed in creating software is complicated to assess because
software creation is overwhelmingly human-driven; it usually
includes the energy used by the facilities of software companies
and the human energy of the programmers.
o Replacement: The energy consumed in replacing devices and end
systems can vary. Replacement can be very energy-intensive for
large devices, e.g., cellular towers, or for environmentally
unfriendly equipment, such as submarine communication cables.
o Disposal: There is a substantial energy cost in disposing of and
recycling old devices and equipment.
By combining the energy consumed in running each device that builds
the Internet [JuniperRouterPower] with the energy consumption of the
end systems, while also accounting for the energy consumed in the
manufacturing, operational maintenance, replacement (given device
lifespans), and disposal of those devices and equipment, we may
arrive at an estimate of the energy consumption of the network as a
whole.
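One way to combine these factors is to spread the one-time energies
(manufacturing, replacement, disposal) over each device’s lifespan
and add them to its yearly operational energy. The following Python
sketch illustrates this under stated assumptions: the class, field
names, and all energy figures are hypothetical; only the lifespans
in the example (3 years for routers, 2 years for smartphones) are
taken from the list above; and operational maintenance (software
creation) is omitted because, as noted, it is hard to quantify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleProfile:
    """Hypothetical per-device lifecycle energy figures (kWh)."""
    name: str
    operational_kwh_per_year: float
    manufacturing_kwh: float
    replacement_kwh: float  # energy spent swapping the unit in or out
    disposal_kwh: float     # disposal and recycling
    lifespan_years: float

    def annualized_kwh(self) -> float:
        """Yearly running energy plus one-time energies spread over lifespan."""
        if self.lifespan_years <= 0:
            raise ValueError("lifespan must be positive")
        one_time = self.manufacturing_kwh + self.replacement_kwh + self.disposal_kwh
        return self.operational_kwh_per_year + one_time / self.lifespan_years


def fleet_annual_kwh(fleet: list[tuple[LifecycleProfile, int]]) -> float:
    """Sum annualized energy over (profile, unit count) pairs."""
    return sum((p.annualized_kwh() * count for p, count in fleet), 0.0)


if __name__ == "__main__":
    router = LifecycleProfile("router", 8760.0, 3000.0, 300.0, 300.0, 3.0)
    phone = LifecycleProfile("smartphone", 5.0, 60.0, 0.0, 4.0, 2.0)
    print(fleet_annual_kwh([(router, 1), (phone, 100)]))
```

Shorter lifespans raise the annualized share of the one-time
energies, which is why the trend toward faster replacement noted
above matters for the overall estimate.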
5. IANA Considerations
This document does not have any IANA requests.
6. Security Considerations
When instrumenting a network for energy metrics, it is important that
implementations are secured to ensure that data is accurately
measured and cannot be tampered with. For example, an attacker might
try to tamper with energy readings to confuse a controller that is
trying to minimize power consumption, leading to increased power
consumption instead.
In addition, access to the data needs to be secured in the same ways
as for other sensitive management data, for example by using secure
management protocols and by subjecting energy data that is
maintained in YANG datastores to access control via NACM (NETCONF
Access Control Model).
However, it should be noted that this draft specifies only the
metrics themselves, not how to instrument networks accordingly. For
the definition of the metrics themselves, security considerations
thus do not really apply.
7. Acknowledgments
Acknowledgments will be added in a future revision of this document.
8. Informative References
[Ahn2014] Ahn, J. and H. S. Park, "Measurement and modeling the
power consumption of router interface",
DOI: 10.1109/ICACT.2014.6779082, 16th International
Conference on Advanced Communication Technology, pp.
860-863, 2014,
<https://ieeexplore.ieee.org/document/6779082>.
[ATIS0600015.02]
ATIS, "Energy Efficiency for Telecommunication Equipment:
Methodology for Measurement and Reporting - Transport and
Optical Access Requirements", March 2016.
[Bolla2011]
Bolla, R., Bruschi, R., Lombardo, C., and D. Suino,
"Evaluating the energy-awareness of future Internet
devices", DOI: 10.1109/HPSR.2011.5986001, 2011 IEEE 12th
International Conference on High Performance Switching and
Routing, pp. 36-43, 2011,
<https://ieeexplore.ieee.org/document/5986001>.
[Energystar]
EnergyStar, "12 Ways to Save Energy in the Data Center,
Server Virtualization", 2022,
<https://www.energystar.gov/products/
low_carbon_it_campaign/12_ways_save_energy_data_center/
server_virtualization>.
[I.D.draft-chunduri-rtgwg-preferred-path-routing]
Bryant, S. E., Chunduri, U., and A. Clemm, "Preferred Path
Routing Framework", May 2022,
<https://datatracker.ietf.org/doc/html/draft-chunduri-
rtgwg-preferred-path-routing-01>.
[I.D.draft-cwx-green-ps]
Clemm, A. and C. Westphal, "Challenges and Opportunities
in Green Networking", June 2022.
[I.D.draft-manral-bmwg-power-usage]
Manral, V., "Benchmarking Power usage of networking
devices", Jan 2011.
[JuniperRouterPower]
Juniper, "Power Requirements for an MX960 Router", 2021.
[Raghavan2011]
Raghavan, B. and J. Ma, "The energy and emergy of the
Internet", HotNets-X: Proceedings of the 10th ACM Workshop
on Hot Topics in Networks, pp. 1-6, 2011,
<https://dl.acm.org/doi/10.1145/2070562.2070571>.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
"Specification of the IP Flow Information Export (IPFIX)
Protocol for the Exchange of Flow Information", RFC 7011,
September 2013,
<https://datatracker.ietf.org/doc/html/rfc7011>.
[RFC7012] Claise, B., Ed. and B. Trammell, Ed., "Information Model
for IP Flow Information Export (IPFIX)", RFC 7012,
September 2013,
<https://datatracker.ietf.org/doc/html/rfc7012>.
[RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
RFC 7950, August 2016,
<https://datatracker.ietf.org/doc/html/rfc7950>.
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
Decraene, B., Litkowski, S., and R. Shakir, "Segment
Routing Architecture", RFC 8402, July 2018,
<https://datatracker.ietf.org/doc/html/rfc8402>.
[RFC8655] Finn, N., Thubert, P., Varga, B., and J. Farkas,
"Deterministic Networking Architecture", RFC 8655, October
2019, <https://datatracker.ietf.org/doc/html/rfc8655>.
[telefonica2020]
Telefonica, "Telefonica Consolidated Annual Report 2020",
2020.
Authors’ Addresses
Alexander Clemm
Futurewei
2220 Central Expressway
Santa Clara CA 95050
USA
Email: ludwig@clemm.org
Lijun Dong
Futurewei
2220 Central Expressway
Santa Clara CA 95050
USA
Email: lijun.dong@futurewei.com
Greg Mirsky
Ericsson
Email: gregimirsky@gmail.com
Laurent Ciavaglia
Rakuten Mobile
Email: laurent.ciavaglia@rakuten.com
Jeff Tantsura
Microsoft
Email: jefftant.ietf@gmail.com
Marie-Paule Odini
Email: mp.odini@orange.fr
Network Working Group A. Clemm
Internet-Draft C. Westphal
Intended status: Informational Futurewei
Expires: January 12, 2023 J. Tantsura
Microsoft
L. Ciavaglia
Rakuten Mobile
M-P. Odini
July 11, 2022
Challenges and Opportunities in Green Networking
draft-cx-green-ps-00
Abstract
Reducing technology’s carbon footprint is one of the big challenges
of our age. Networks are an enabler of applications that reduce this
footprint, but they also contribute substantially to this footprint
themselves. The biggest opportunities to reduce the energy footprint
may not be networking-specific, for instance, general power
efficiency gains in hardware or hosting of equipment in more
cooling-efficient buildings. Yet methods to make networking
technology itself "greener" also need to be explored. This document
outlines a corresponding set of opportunities, along with associated
research challenges, for reducing this footprint and reducing network
energy demand.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 12, 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Definitions and Acronyms
3. Contributors to Network Energy Consumption
4. Challenges and Opportunities - Equipment Level
5. Challenges and Opportunities - Protocol Level
5.1. Data Volume Reduction
5.2. Traffic Adaptation
5.3. Enabling Network Energy Saving Mechanisms
5.4. Network Addressing
6. Challenges and Opportunities - Network Level
7. Challenges and Opportunities - Architecture Level
8. Conclusions
9. IANA Considerations
10. Security Considerations
11. Acknowledgments
12. Informative References (TBD)
Authors’ Addresses
1. Introduction
Climate change and the need to curb greenhouse gas emissions have
been recognized by the United Nations and by most governments as one
of the big challenges of our time. As a result, improving energy
efficiency and reducing power consumption are becoming increasingly
important for society and for many industries. The networking
industry is no exception.
Arguably, networks can already be considered "green" technology in
that they enable many applications that allow users and whole
industries to save energy and become significantly more sustainable.
For example, networking makes it possible (at least to an extent) to
replace travel with teleconferencing; it enables many employees to
work from home and "telecommute," thus reducing the need for actual
commuting; IoT applications that facilitate automated monitoring and
control from remote sites help make agriculture more sustainable by
minimizing the use of resources such as water and fertilizer; and
networked smart buildings allow for greater energy optimization and
sparser use of lighting and HVAC (heating, ventilation, and air
conditioning) than their non-networked, not-so-smart counterparts.
That said, networks themselves consume significant amounts of energy.
Therefore, the networking industry has an important role to play in
meeting sustainability goals, not just by enabling others to reduce
their reliance on energy, but also by reducing its own. Future
networking advances will increasingly need to focus on improving
energy efficiency and reducing carbon footprint, both for economic
reasons and for reasons of corporate responsibility. This shift has
already begun, and sustainability is becoming an important concern
for network providers. In some cases, such as networked data
centers, the ability to procure enough energy becomes a bottleneck
that prohibits further growth, and greater sustainability thus
becomes a business necessity.
For example, in its annual report, Telefonica states that in 2020 its
network’s energy consumption per PB of data amounted to 78 MWh
[telefonica2020]. This rate has been decreasing dramatically (a
five-fold decrease over five years), although the efficiency gains
are being offset by simultaneous growth in data volume. The same
report states as an important corporate goal to continue on that
trajectory and to reduce overall carbon emissions by 70% over the
next five years.
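The figures above can be turned into approximate yearly rates. The
short Python sketch below does this arithmetic under an assumption of
our own, not taken from the cited report: that both the five-fold
efficiency gain and the 70% emissions target are spread evenly, in a
compounding sense, over five years. Under that assumption, energy per
PB fell by roughly 27.5% per year, and the emissions goal corresponds
to a reduction of roughly 21.4% per year.

```python
# Back-of-the-envelope conversion of multi-year factors into yearly rates.
# Assumption (illustrative only): changes compound evenly over the period.

def annual_rate(total_factor: float, years: int) -> float:
    """Return the per-year factor r such that r ** years == total_factor."""
    return total_factor ** (1.0 / years)


# Energy per PB fell five-fold over five years, i.e. a factor of 1/5.
efficiency_step = annual_rate(1 / 5, 5)   # about 0.72 per year

# Implied figure five years before 2020, if the decline was exactly five-fold.
mwh_per_pb_2020 = 78.0
mwh_per_pb_five_years_earlier = mwh_per_pb_2020 * 5

# Goal: cut carbon emissions by 70% over five years, i.e. keep 30%.
emissions_step = annual_rate(0.30, 5)     # about 0.79 per year

print(f"energy per PB: -{(1 - efficiency_step) * 100:.1f}% per year")
print(f"implied earlier figure: {mwh_per_pb_five_years_earlier:.0f} MWh/PB")
print(f"emissions goal: -{(1 - emissions_step) * 100:.1f}% per year")
```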
From a technical perspective, multiple vectors along which networks
can be made "greener" should be considered:
o At the equipment level. Perhaps the most promising vector for
improving networking sustainability concerns the network equipment
itself. At the most fundamental level, networks (even softwarized
ones) involve appliances, i.e., equipment that relies on electrical
power to perform its function. However, beyond making those
appliances merely energy-efficient, there are other important ways
in which equipment can help networks become greener. These include
support for port power-saving modes that reduce power consumption
for resources that are not fully utilized, as well as
instrumentation that precisely monitors power usage at different
levels of granularity, enabling (for example) controller
applications that aim to optimize energy usage across the network.
(As a side note, the term "device", as used in this draft, refers
to networking equipment. We do not take into consideration end-user
devices and endpoints such as mobile phones or computing equipment.)
o At the protocol level. Energy efficiency and greenness are
aspects that are rarely considered when designing network
protocols, which suggests that there may be plenty of untapped
potential. Some aspects involve designing protocols in ways that
reduce redundant or wasteful transmission of data, allowing not
only for better network utilization but also for greater goodput
per unit of energy consumed. Techniques include approaches that
reduce the "header tax" incurred by payloads as well as methods
that reduce wasteful retransmissions. Likewise, restructuring
addresses in ways that minimize the size of lookup tables, and thus
the associated memory and its energy use, can play a role as well.
Another role of protocols is to enable functionality that improves
energy efficiency at the network level, such as discovery protocols
that allow for quick adaptation to network components being taken
into and out of service dynamically depending on network
conditions.
o At the network level. Perhaps the greatest opportunities to
realize power savings exist at the level of the network as a whole.
For example, optimizing energy efficiency may involve directing
traffic in such a way that equipment that is not needed at the
moment can be isolated, so that it can be powered down or brought
into power-saving mode. By the same token, traffic should not be
directed in a way that requires bringing additional equipment
online or out of power-saving mode in cases where alternative
traffic paths are available whose incremental energy cost would
amount to zero. Likewise, some networking devices may be more
power-intensive than others; their use might be avoided unless
required to meet peak capacity demands. Generally, incremental
power consumption can be viewed as a cost metric that networks
should strive to minimize and should consider as part of routing
and network path optimization.
o At the architecture level. The current network architecture
supports a wide range of applications, but does not take energy
efficiency into account as one of its design parameters. One can
argue that the most significant shift toward energy efficiency of
the last two decades has been the deployment of Content Delivery
Network (CDN) overlays: while these were set up to reduce latency
and minimize bandwidth consumption, retrieving content from a local
cache is, from a network perspective, also much greener. What other
architectural shifts can reduce energy consumption?
We believe that network standardization organizations in general, and
the IETF in particular, can make important contributions to each of
these vectors. In this document, we therefore explore each of these
vectors in further detail and, for each, point out specific
challenges for the IETF.
It should be noted that this document borrows heavily from a prior
paper, [GreenNet22]. This material has been both expanded (for
example, in terms of some of the opportunities) and pruned (for
example, in terms of background on prior scholarly work). In
addition, unlike the prior paper, this document focuses on and
attempts to articulate specific challenges related to work that
could be championed by the IETF.
2. Definitions and Acronyms
TBD
3. Contributors to Network Energy Consumption
When exploring possibilities to improve energy efficiency, it is
important to understand which aspects contribute to power consumption
the most and hence where the greatest potential for power savings
lies.
Power is ultimately drawn by devices. The power consumption of a
device can be divided into the consumption of the core device (the
backplane and CPU, if you will) and the additional consumption
incurred per port and line card. Furthermore, it is important to
understand the difference between power consumption when a resource
is idling and when it is under load. This helps to understand the
incremental cost of additional transmission versus the initial cost
of transmission.
In typical networking devices, only roughly half of the energy
consumption is associated with the data plane [bolla2011energy]. An
idle base system typically consumes more than half of the power of
the same system running at full load [chabarek08], [cervero19]. This
means that a device’s power consumption does not increase linearly
with the volume of forwarded traffic but instead resembles a step
function. Generally, the cost of the first bit is very high, as it
requires powering up a device, port, etc. The cost of transmitting
additional bits (beyond the first) is many orders of magnitude
lower. Likewise, the incremental cost of the additional CPU and
memory needed to process additional packets is fairly negligible.
By the same token, generally speaking, it is more energy-efficient to
transmit a large volume of data in one burst (and to turn off the
interface when idle) than to transmit continuously at a lower rate.
In that sense, it can be the duration of the transmission that
dominates the energy consumption, not the actual data rate.
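The step-function behavior described above can be illustrated with a
deliberately simple interface power model: a large fixed cost whenever
the interface is powered on, a small load-proportional component, and
a low sleep-state draw. All numbers in the Python sketch below are
hypothetical and chosen only so that idle power exceeds half of
full-load power, in line with the measurements cited above. Wake-up
energy and wake-up latency are ignored, which favors the burst case.

```python
from dataclasses import dataclass


@dataclass
class PortPowerModel:
    """Illustrative interface power model; all values are hypothetical."""
    p_idle_w: float       # draw while powered on but carrying no traffic
    p_max_w: float        # draw at full line rate
    p_sleep_w: float      # draw in a low-power (sleep) state
    capacity_gbps: float  # line rate

    def power_w(self, rate_gbps: float) -> float:
        """Large fixed cost once on, plus a small load-proportional part."""
        load = min(rate_gbps / self.capacity_gbps, 1.0)
        return self.p_idle_w + (self.p_max_w - self.p_idle_w) * load


def energy_continuous_j(m: PortPowerModel, volume_gbit: float, window_s: float) -> float:
    """Spread the volume evenly over the window; the port never sleeps."""
    return m.power_w(volume_gbit / window_s) * window_s


def energy_burst_j(m: PortPowerModel, volume_gbit: float, window_s: float) -> float:
    """Send at line rate, then sleep for the rest of the window.

    Wake-up energy and latency are ignored, which flatters the burst case.
    """
    busy_s = volume_gbit / m.capacity_gbps
    if busy_s > window_s:
        raise ValueError("volume does not fit in the window at line rate")
    return m.power_w(m.capacity_gbps) * busy_s + m.p_sleep_w * (window_s - busy_s)


port = PortPowerModel(p_idle_w=6.0, p_max_w=10.0, p_sleep_w=1.0, capacity_gbps=100.0)
print(energy_continuous_j(port, 1000.0, 100.0))  # ~640 J: 10 Gb/s for 100 s
print(energy_burst_j(port, 1000.0, 100.0))       # 190 J: 10 s busy + 90 s asleep
```

With these assumed parameters, the burst strategy uses less than a
third of the energy of continuous transmission. Real devices add
wake-up costs and latency constraints that narrow this gap.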
The implications for green networking from an energy-savings
standpoint are significant: potentially the largest gains can be made
when network resources can effectively be taken off the grid (i.e.,
isolated and removed from service so that they can be powered down
while not needed). Likewise, for applications where this is
possible, it may be desirable to replace continuous traffic at low
data rates with traffic that is sent in bursts at high data rates, in
order to maximize the time during which resources can be idled. At
the same time, any non-idle resources should be utilized to the
greatest extent possible, as the incremental energy cost is
negligible. Of course, this needs to occur while still taking other
operational goals into consideration, such as protection against
failures (allowing for readily available redundancy and spare
capacity in case of failure) and load balancing (for increased
operational robustness). As data transmission needs tend to
fluctuate wildly and occur in bursts, any optimization schemes need
to be highly adaptive and allow for very short control loops.
As a result, emphasis needs to be given to technology that, for
example, allows for very efficient and rapid discovery, monitoring,
and control of networking resources at the device level, so that
they can be dynamically taken offline or brought back into service;
that does so at the network level without requiring extensive
convergence of state across the network or recalculation of routes
and other optimization problems; and that supports rapid power-cycle
and initialization schemes at the network equipment level.
4. Challenges and Opportunities - Equipment Level
Perhaps the most obvious opportunities to make networking technology
more energy-efficient exist at the equipment level. After all,
networking involves physical equipment to receive and transmit data.
Making such equipment more power-efficient, having it dissipate less
heat so that it consumes less energy and needs less cooling, making
it eco-friendly to deploy, sourcing sustainable materials, and
facilitating recycling of equipment at the end of its life cycle all
contribute to making networks greener. More specific and unique to
networking are schemes to reduce the energy usage of transmission
technology, from wireless (antennas) to optical (lasers).
Beyond such "first-order" opportunities, network equipment plays an
equally important role in enabling and supporting green networking
at other levels. Of prime importance is the equipment’s ability to
give the management and control planes visibility into its current
energy usage. Such visibility enables control loops for energy
optimization schemes, allowing applications to obtain feedback
regarding the energy implications of their actions, from setting up
paths across the network that require the least incremental amount
of energy to quantifying energy-cost metrics used to optimize
forwarding decisions.
One prerequisite for such schemes is to have proper instrumentation
in place to monitor current power consumption at the level of
networking devices as a whole, line cards, and individual ports.
Such instrumentation should also make it possible to assess the
energy efficiency and carbon footprint of the device as a whole. In
addition, it would be desirable to relate this power consumption to
data rates as well as to current traffic, for example, to indicate
current energy consumption relative to interface speeds, as well as
the incremental energy consumption that is expected for incremental
traffic (to aid control schemes that aim to "shave" power off current
services or to minimize the incremental use of power for additional
traffic). This is an area where the current state of the art is
sorely lacking and standardization lags behind; for example, as of
today, no standardized YANG data models [RFC7950] for network energy
consumption that can be used in conjunction with management and
control protocols have been defined.
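To make the desired granularity concrete, the Python sketch below
models hypothetical per-device, per-line-card, and per-port energy
readings together with some of the derived metrics mentioned above:
power relative to interface speed and to current traffic, and the
expected incremental power for incremental traffic. It is not a YANG
module, and all class and field names are illustrative assumptions
rather than part of any existing or proposed data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PortEnergy:
    name: str
    power_w: float           # measured draw attributed to this port
    speed_gbps: float        # configured interface speed
    traffic_gbps: float      # current traffic carried by the port
    incr_w_per_gbps: float   # estimated extra watts per additional Gb/s

    def watts_per_gbps_capacity(self) -> float:
        """Current draw relative to interface speed."""
        return self.power_w / self.speed_gbps


@dataclass
class LineCardEnergy:
    name: str
    base_power_w: float      # card draw not attributed to any single port
    ports: List[PortEnergy] = field(default_factory=list)

    def power_w(self) -> float:
        return self.base_power_w + sum(p.power_w for p in self.ports)


@dataclass
class DeviceEnergy:
    name: str
    chassis_power_w: float   # backplane, CPU, fans, power supplies
    cards: List[LineCardEnergy] = field(default_factory=list)

    def power_w(self) -> float:
        return self.chassis_power_w + sum(c.power_w() for c in self.cards)

    def traffic_gbps(self) -> float:
        return sum(p.traffic_gbps for c in self.cards for p in c.ports)

    def watts_per_gbps(self) -> float:
        """Device draw relative to current traffic (infinite when idle)."""
        t = self.traffic_gbps()
        return float("inf") if t == 0 else self.power_w() / t

    def incremental_w(self, port_name: str, extra_gbps: float) -> float:
        """Expected extra draw if extra_gbps more traffic used this port."""
        for card in self.cards:
            for port in card.ports:
                if port.name == port_name:
                    return port.incr_w_per_gbps * extra_gbps
        raise KeyError(port_name)


dev = DeviceEnergy("r1", 200.0, [LineCardEnergy("lc0", 50.0, [
    PortEnergy("eth0", 10.0, 100.0, 40.0, 0.05),
    PortEnergy("eth1", 8.0, 100.0, 10.0, 0.05),
])])
print(dev.power_w(), dev.watts_per_gbps(), dev.incremental_w("eth0", 10.0))
```

Note how the fixed chassis and card draw dominate the per-Gb/s figure
at low traffic, while the incremental cost of extra traffic is small,
mirroring the step-function behavior discussed in Section 3.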
Instrumentation should also take into account the possibility of
virtualization, which introduces layers of indirection when assessing
actual energy usage. For example, virtualized networking functions
could be hosted in containers or virtual machines running on CPUs in
a data center instead of on a regular network appliance such as a
router or a switch, leading to very different power consumption
characteristics. A data center CPU, for instance, could be more
power-efficient and consume power more proportionally to actual CPU
load. Instrumentation needs to reflect these facts and facilitate
attributing power consumption correctly.
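As one possible illustration of such attribution, the Python sketch
below splits a server’s measured power among the virtualized
functions it hosts. The policy is an assumption chosen for
illustration, not an established method: idle power is shared in
proportion to allocated vCPUs, and the load-dependent remainder in
proportion to CPU time actually used. The function and VNF names are
hypothetical.

```python
from typing import Dict, Tuple


def attribute_server_power(server_power_w: float, idle_power_w: float,
                           vnfs: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Split a server's measured power among the virtualized functions it hosts.

    vnfs maps a VNF name to (allocated_vcpus, cpu_seconds_used_in_interval).
    Idle power is shared by allocated vCPUs; the load-dependent remainder is
    shared by CPU time actually used. If measured power is below the idle
    estimate, the dynamic part is clamped to zero.
    """
    dynamic_w = max(server_power_w - idle_power_w, 0.0)
    total_vcpus = sum(vcpus for vcpus, _ in vnfs.values())
    total_used = sum(used for _, used in vnfs.values())
    shares: Dict[str, float] = {}
    for name, (vcpus, used) in vnfs.items():
        static_w = idle_power_w * vcpus / total_vcpus if total_vcpus else 0.0
        dyn_w = dynamic_w * used / total_used if total_used else 0.0
        shares[name] = static_w + dyn_w
    return shares


print(attribute_server_power(300.0, 150.0, {"vFW": (4, 30.0), "vRouter": (12, 90.0)}))
# {'vFW': 75.0, 'vRouter': 225.0}
```

Other policies (for example, attributing idle power only to the
functions that prevent the server from being powered down) are
equally plausible; the point is that the chosen policy has to be
explicit for the resulting figures to be comparable.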
Beyond monitoring and providing visibility into power consumption,
control knobs are needed to configure energy saving policies. For
instance, power saving modes are common in endpoints (such as mobile
phones or notebook computers) but sorely lacking in networking
equipment.
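As an illustration of what such a control knob might look like, the
Python sketch below models a hypothetical per-port sleep policy with
hysteresis: a port enters sleep only after traffic has stayed below a
low threshold for a configured hold time, and wakes only when traffic
exceeds a higher threshold, which avoids rapid toggling. The parameter
names are invented for this example and do not correspond to any
existing configuration model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortSleepPolicy:
    """Hypothetical per-port power-saving knobs (not from any existing model)."""
    sleep_below_gbps: float  # traffic level below which the port may sleep
    wake_above_gbps: float   # traffic level above which a sleeping port wakes
    idle_hold_s: float       # how long traffic must stay low before sleeping

    def __post_init__(self) -> None:
        # Hysteresis: the wake threshold must be strictly above the sleep one.
        if self.wake_above_gbps <= self.sleep_below_gbps:
            raise ValueError("wake_above_gbps must exceed sleep_below_gbps")


def simulate(policy: PortSleepPolicy, samples_gbps: List[float],
             interval_s: float = 1.0) -> List[bool]:
    """Return the port state after each traffic sample (True = asleep)."""
    asleep, low_for_s, states = False, 0.0, []
    for rate in samples_gbps:
        is_low = rate < policy.sleep_below_gbps
        # Track how long traffic has continuously been below the sleep threshold.
        low_for_s = low_for_s + interval_s if is_low else 0.0
        if asleep:
            asleep = rate <= policy.wake_above_gbps
        else:
            asleep = is_low and low_for_s >= policy.idle_hold_s
        states.append(asleep)
    return states


policy = PortSleepPolicy(sleep_below_gbps=1.0, wake_above_gbps=5.0, idle_hold_s=3.0)
print(simulate(policy, [8.0, 0.5, 0.2, 0.1, 2.0, 4.0, 6.0]))
# [False, False, False, True, True, True, False]
```

In the sample trace, the port sleeps after three low samples and stays
asleep for moderate traffic (2 and 4 Gb/s) until traffic exceeds the
5 Gb/s wake threshold.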
5. Challenges and Opportunities - Protocol Level
There are several opportunities for energy savings at the protocol
level. We group them into three main categories: protocols designed
to reduce the volume of data to be transmitted; protocols designed to
optimize data transmission rates under energy considerations; and
protocols that enable energy optimization schemes at the network
level. A fourth category, "other", is used to capture any other
aspects not easily categorized into the other three.
5.1. Data Volume Reduction
The first category involves designing protocols in such a way that
they reduce the volume of data that needs to be transmitted for any
given purpose. Loosely speaking, by reducing this volume, more
traffic can be served by the same amount of networking
infrastructure, hence reducing overall energy consumption.
Possibilities here include protocols that avoid unnecessary
retransmissions. At the application layer, protocols may also use
coding mechanisms that encode information close to the Shannon limit.
Currently, most Internet traffic consists of video streaming, and
video encoders are already quite efficient and keep improving,
resulting in energy savings as one of many advantages (although these
are offset by increasingly higher resolutions). However, it is not
clear that the extra work required to achieve higher compression
ratios for the payloads results in a net energy gain: what is saved
in the network may be offset by the compression/decompression effort.
Further research on this aspect is necessary.
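The trade-off can be expressed as a simple break-even condition:
compression pays off only if the network energy saved on the bytes it
removes exceeds the energy spent compressing and decompressing. The
Python sketch below encodes that condition. The per-byte energy
figures are assumed inputs, not measurements, and in practice depend
heavily on the codec, the hardware, the path length, and how many
times the compressed content is transmitted.

```python
def compression_net_gain_j(size_bytes: float, ratio: float,
                           net_j_per_byte: float, codec_j_per_byte: float) -> float:
    """Network energy saved minus compression/decompression energy spent.

    ratio            -- compressed size / original size, e.g. 0.6
    net_j_per_byte   -- assumed end-to-end network energy per byte sent
    codec_j_per_byte -- assumed compress + decompress energy per input byte
    A positive result means compression yields a net energy gain.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0, 1]")
    saved_j = size_bytes * (1.0 - ratio) * net_j_per_byte
    spent_j = size_bytes * codec_j_per_byte
    return saved_j - spent_j


def break_even_net_j_per_byte(ratio: float, codec_j_per_byte: float) -> float:
    """Network energy per byte above which compression pays off."""
    if ratio >= 1.0:
        return float("inf")  # nothing is saved, so compression never pays off
    return codec_j_per_byte / (1.0 - ratio)


# Hypothetical inputs: 1 MB object, 40% size reduction, 2 nJ/byte codec cost.
print(break_even_net_j_per_byte(0.6, 2e-9))         # ~5e-9 J/byte
print(compression_net_gain_j(1e6, 0.6, 1e-8, 2e-9))  # positive: worth it
print(compression_net_gain_j(1e6, 0.6, 2e-9, 2e-9))  # negative: not worth it
```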
At the transport protocol layer, TCP and, to some extent, QUIC rely
on packet drops to detect and react to congestion. This is a highly
energy-inefficient way to signal congestion, since the sender only
becomes aware of the congestion roughly one RTT later, and since the
effort spent transmitting the packet from the source up to the point
where it is dropped is wasted. This calls for new transport protocols
that react to congestion without dropping packets. ECN [RFC2481] is a
possible solution, but it is not widely deployed. DCTCP
[alizadeh2010DCTCP] is tuned for data centers. Qualitative
Communication [QUAL] [westphal2021qualitative] allows nodes to react
to congestion by dropping only some of the data in a packet, thereby
only partially wasting the resources consumed in transmitting the
packet up to that point. We believe there is a need for novel
transport protocols for the WAN that ensure that no energy is wasted
transmitting packets that will eventually be dropped.
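A simplified model makes the cost of loss-based congestion signaling
visible: the energy spent carrying the discarded bytes up to the drop
point is lost, and the data must additionally be retransmitted, which
the sketch does not count. The Python sketch below assumes a uniform,
hypothetical per-byte, per-hop forwarding energy and contrasts
dropping an entire packet with trimming part of its payload, in the
spirit of Qualitative Communication.

```python
def wasted_energy_j(packet_bytes: float, hops_before_drop: int,
                    j_per_byte_per_hop: float, kept_fraction: float = 0.0) -> float:
    """Energy spent carrying the discarded bytes of a packet up to the drop point.

    kept_fraction = 0.0 models a conventional drop of the whole packet;
    kept_fraction > 0.0 models trimming part of the payload and forwarding
    the rest. Retransmission energy is not included.
    """
    if not 0.0 <= kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be within [0, 1]")
    discarded_bytes = packet_bytes * (1.0 - kept_fraction)
    return discarded_bytes * hops_before_drop * j_per_byte_per_hop


E_HOP = 1e-8  # hypothetical forwarding energy per byte per hop (J)
full_drop = wasted_energy_j(1500, 5, E_HOP)                   # whole packet lost at hop 5
trimmed = wasted_energy_j(1500, 5, E_HOP, kept_fraction=0.6)  # 60% of bytes still delivered
print(full_drop, trimmed)
```

The later a packet is dropped, the more energy is wasted, which is
why signaling congestion early (as ECN does) or discarding only part
of a packet reduces waste.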
Another way to reduce the bandwidth consumed by network protocols is
to reduce their header tax, for example by applying header
compression. An example in the IETF is Robust Header Compression
(ROHC) [RFC3095]. Again, reducing protocol header size saves energy
when forwarding packets, but at the cost of maintaining
compression/decompression state and performing these operations. The
gain from such protocol optimization further depends on the
application: whether it sends packets with large payloads close to
the MTU (in which case the header tax, and any savings, are very
limited), or whether it sends packets with very small payloads
(making the header tax more pronounced and the savings more
significant).
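The dependence on payload size can be quantified with a small
calculation. The Python sketch below uses the uncompressed IPv6, UDP,
and RTP header sizes (40, 8, and 12 bytes) and assumes, for
illustration, a 3-byte compressed header; actual ROHC header sizes
vary with the profile and the state of the compressor. For a 20-byte
payload, the header tax drops from 75% to about 13%, whereas for a
1400-byte payload it is only about 4% even without compression.

```python
def header_overhead(payload_bytes: int, header_bytes: int) -> float:
    """Fraction of transmitted bytes that are header (the "header tax")."""
    return header_bytes / (header_bytes + payload_bytes)


UNCOMPRESSED = 40 + 8 + 12   # IPv6 + UDP + RTP headers, in bytes
COMPRESSED = 3               # assumed size of a compressed header (illustrative)

for payload in (20, 160, 1400):
    before = header_overhead(payload, UNCOMPRESSED)
    after = header_overhead(payload, COMPRESSED)
    print(f"payload {payload:5d} B: header tax {before:6.1%} -> {after:6.1%}")
```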
An alternative to reducing the amount of protocol data is to design
routing protocols that are more efficient to process at each node.
For instance, path-based forwarding and labels, such as in MPLS
[RFC3031], simplify the next-hop lookup, thereby reducing energy
consumption. It is unclear whether keeping some state at the router
to speed up lookups is more energy-efficient than a stateless lookup
that is more computationally intensive. Other methods to speed up
next-hop lookups include geographic routing (e.g.,
[herzen2011PIE]). Some network protocols could be designed to reduce
the next-hop lookup computation at a router. It is unclear whether
Longest Prefix Match (LPM) is inefficient from an energy point of
view, or whether it accounts for a significant part of a router’s
energy budget.
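The difference in lookup work can be sketched by counting table
probes. The Python example below uses one textbook LPM technique (one
hash table per prefix length, searched from longest to shortest) for
IPv4 only, and compares it with a single exact-match probe into a
label table. Probe counts are at best a rough proxy for energy:
production routers typically use TCAMs or compressed tries, and the
sketch does not answer which approach is cheaper in practice. Routes,
next hops, and the label value are hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical FIB: one hash table per prefix length (IPv4 only).
fib_by_len: Dict[int, Dict[int, str]] = {}


def add_route(prefix: str, next_hop: str) -> None:
    net = ipaddress.ip_network(prefix)
    if net.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    fib_by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop


def lpm_lookup(addr: str) -> Tuple[Optional[str], int]:
    """Return (next_hop, number of table probes), trying longest prefixes first."""
    a = int(ipaddress.IPv4Address(addr))
    probes = 0
    for plen in sorted(fib_by_len, reverse=True):
        probes += 1
        mask = ((1 << plen) - 1) << (32 - plen)
        hop = fib_by_len[plen].get(a & mask)
        if hop is not None:
            return hop, probes
    return None, probes


# Label switching: a single exact-match probe into the incoming label map.
label_table: Dict[int, str] = {16001: "B"}


def label_lookup(label: int) -> Tuple[Optional[str], int]:
    return label_table.get(label), 1


for prefix, hop in [("0.0.0.0/0", "D"), ("10.0.0.0/8", "A"),
                    ("10.1.0.0/16", "B"), ("10.1.2.0/24", "C")]:
    add_route(prefix, hop)

print(lpm_lookup("10.1.2.3"))    # ('C', 1)
print(lpm_lookup("10.9.9.9"))    # ('A', 3)
print(lpm_lookup("192.0.2.1"))   # ('D', 4)
print(label_lookup(16001))       # ('B', 1)
```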
5.2. Traffic Adaptation
The second category involves designing protocols in such a way that
the rate of transmission is chosen to maximize energy efficiency.
For example, Traffic Engineering (TE) can be used to influence the
rate adaptation mechanism [ren2018jordan]. By choosing where to send
the traffic, TE can artificially congest links so as to trigger rate
adaptation and therefore reduce the total amount of traffic. Most TE
systems attempt to minimize Maximal Link Utilization (MLU), but
energy-saving mechanisms could decide to do the opposite (maximize
the minimal utilization of the links that remain in use, thereby
concentrating traffic on fewer links) and attempt to turn off some
resources to save power.
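The contrast between the two objectives can be shown for the simple
case of parallel links of equal capacity. The Python sketch below
compares spreading a demand evenly over all links (minimizing MLU)
with packing it onto as few links as possible under a utilization
cap, so that the remaining links could be put to sleep. The 80% cap
is an assumed safety margin for bursts and failures, not a
recommended value.

```python
import math
from typing import Tuple


def spread_min_mlu(demand_gbps: float, n_links: int,
                   capacity_gbps: float) -> Tuple[int, float]:
    """Classic TE: spread load evenly over all parallel links (minimizes MLU)."""
    return n_links, demand_gbps / (n_links * capacity_gbps)


def consolidate(demand_gbps: float, n_links: int, capacity_gbps: float,
                max_util: float = 0.8) -> Tuple[int, float]:
    """Energy-oriented TE: use as few links as possible below a utilization
    cap, so that the remaining links can be put to sleep."""
    needed = max(1, math.ceil(demand_gbps / (capacity_gbps * max_util)))
    if needed > n_links:
        raise ValueError("demand exceeds capacity under the utilization cap")
    return needed, demand_gbps / (needed * capacity_gbps)


print(spread_min_mlu(120.0, 4, 100.0))  # (4, 0.3): all links on, lightly loaded
print(consolidate(120.0, 4, 100.0))     # (2, 0.6): two links can be switched off
```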
5.3. Enabling Network Energy Saving Mechanisms
Novel protocols are also needed in two dimensions. First, protocols
are needed to discover which links are available and/or
energy-efficient. For instance, links may be turned off in order to
save energy and turned back on based upon the elasticity of demand.
Protocols should be devised to discover when this happens and to
maintain a view of the topology that remains consistent despite
frequent topology updates caused by the power cycling of network
resources.
Second, protocols are required to quickly converge onto an
energy-efficient path once a new topology is created by turning
links on or off. Current routing protocols may provide fast recovery
in the case of failure. However, failures are (hopefully) relatively
rare events, whereas an energy-efficient network is expected to turn
off links aggressively.
Some mechanism is needed to present to the management layer a view of
the network that identifies opportunities to turn resources off
(routers/links) while still providing a decent level of Quality of
Experience (QoE) to users. This becomes more complex as QoE
expectations shift from the current best-effort delivery model to
more sophisticated mechanisms with, for instance, latency, bandwidth,
or reliability guarantees.
5.4. Network Addressing
There are other ways to shave off energy usage from networks. One
example concerns network addressing. Address tables can get very
large, resulting in large forwarding tables that require a
considerable amount of memory, in addition to large amounts of state
that must be maintained and synchronized. From an energy footprint
perspective, both can be considered wasteful and offer opportunities
for improvement. At the protocol level, rethinking how addresses are
structured can allow for flexible addressing schemes that can be
exploited in network deployments that are less energy-intensive by
design. This can be complemented by clever address allocation
schemes that minimize the number of required forwarding entries in
deployments.
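The effect of allocation on forwarding table size can be illustrated
with Python’s standard ipaddress module, whose collapse_addresses()
function merges adjacent prefixes. The sketch below assumes that all
prefixes share the same next hop, which is what allows them to be
aggregated; the prefixes themselves are hypothetical. Eight
contiguous /24 prefixes collapse into a single /21 entry, whereas
eight non-adjacent /24 prefixes still require eight entries.

```python
import ipaddress
from typing import List


def aggregate(prefixes: List[str]) -> List[ipaddress.IPv4Network]:
    """Merge adjacent prefixes, assuming they all share one next hop."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))


contiguous = [f"10.0.{i}.0/24" for i in range(8)]       # 10.0.0.0 - 10.0.7.255
scattered = [f"10.0.{i * 2}.0/24" for i in range(8)]    # every other /24

print(aggregate(contiguous))        # [IPv4Network('10.0.0.0/21')]
print(len(aggregate(scattered)))    # 8
```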
6. Challenges and Opportunities - Network Level
Networks have been optimized for many years under many criteria, for
example to maximize network utilization and to minimize cost. Hence,
it is straightforward to add "greenness" (including energy
efficiency, power consumption, and carbon footprint) as an important
optimization criterion.
This includes assessing the carbon footprints of paths and optimizing
those paths so that the overall footprint is minimized, then applying
techniques such as path-aware networking or Segment Routing [RFC8402]
to steer traffic along those paths. It also includes aspects such as
considering incremental energy usage in routing decisions.
Optimizing cost has a long tradition in networking; many of the
existing mechanisms can be leveraged for greener networking simply by
introducing energy footprint as a cost factor. Low-hanging fruit
includes using energy-related parameters as cost metrics in control
planes, whether distributed (e.g., IGPs) or conceptually centralized
(e.g., SDN controllers).
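As a minimal sketch of energy as a cost factor, the Python example
below runs Dijkstra’s algorithm with incremental energy as the link
metric: links that are already powered on and carrying traffic get a
small cost, while a link that would have to be woken up carries its
(assumed) wake-up cost. The topology and costs are hypothetical, and a
real IGP would use integer metrics distributed by the protocol rather
than a central computation like this one.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, incremental energy cost of using the link)].
Graph = Dict[str, List[Tuple[str, float]]]


def least_energy_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm using incremental energy as the link metric."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        raise ValueError(f"no path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


# Hypothetical costs (watts): A-B-E-D uses links that are already active;
# A-C-D has fewer hops but requires waking the sleeping link C-D.
g: Graph = {
    "A": [("B", 0.5), ("C", 0.5)],
    "B": [("E", 0.5)],
    "E": [("D", 0.5)],
    "C": [("D", 40.0)],
}
print(least_energy_path(g, "A", "D"))  # (1.5, ['A', 'B', 'E', 'D'])
```

Here the three-hop path over already-active links (cost 1.5) is
preferred over the two-hop path that would require waking the
sleeping link (cost 40.5).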
Other opportunities concern adding energy awareness to dynamic path
selection schemes, which requires corresponding instrumentation as
mentioned earlier. Again, considerable energy savings can potentially
be realized by taking resources offline (e.g., putting them into
power-saving or hibernation mode) when they are not needed under
current network demand and load conditions. Therefore, weaning such
resources from traffic becomes an important consideration for
energy-efficient traffic steering. This contrasts, and indeed
conflicts, with existing schemes that typically aim to create
redundancy and load-balance traffic across a network to achieve even
resource utilization. This usually occurs for important reasons,
such as making networks more resilient, optimizing service levels,
and increasing fairness. One of the big challenges hence concerns how
resource-weaning schemes that realize energy savings can be
accommodated without cannibalizing other important goals,
counteracting other established mechanisms, or destabilizing the
network.
As an important prerequisite to capturing many of those
opportunities, good abstractions (and corresponding instrumentation)
that make it easy to assess energy cost and carbon footprint will be
required. These abstractions need to account not only for the energy
cost associated with packet forwarding across a given path, but also
for the related cost of processing, memory, and maintaining state, to
result in a holistic picture. Optimizing carbon footprint in many
cases involves trade-offs that concern not only packet forwarding but
also aspects such as keeping state, caching data, or running
computations at the edge instead of elsewhere. (Note: there may be a
differential between running a computation at an edge server and at
a hyperscale DC. The latter is often better optimized than the
former.) Likewise, other aspects of carbon footprint beyond mere
energy intensity should be considered. For instance, some network
segments may be powered by more sustainable energy sources than
others, and some network equipment may be more environmentally
friendly to build, deploy, and recycle, all of which can be
reflected in the abstractions to consider.
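One way to combine these factors into a single abstraction is
sketched below in Python: each hop reports the energy attributable to
a flow for forwarding, processing, and state, together with the
carbon intensity of its energy supply, and the path footprint is the
sum over hops. The structure and all values are illustrative
assumptions. In the example, the path that uses more energy still has
the lower carbon footprint because it runs on a cleaner supply.

```python
from dataclasses import dataclass
from typing import List, Tuple

J_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


@dataclass
class HopCost:
    """Hypothetical per-hop cost abstraction for one flow over one interval."""
    forwarding_j: float    # energy for forwarding the flow's packets
    processing_j: float    # extra CPU/NPU work done for the flow
    state_j: float         # memory and state kept on behalf of the flow
    g_co2_per_kwh: float   # carbon intensity of this hop's energy supply

    def energy_j(self) -> float:
        return self.forwarding_j + self.processing_j + self.state_j

    def carbon_g(self) -> float:
        return self.energy_j() / J_PER_KWH * self.g_co2_per_kwh


def path_footprint(hops: List[HopCost]) -> Tuple[float, float]:
    """Return (total energy in J, total carbon in g CO2) for a path."""
    return (sum(h.energy_j() for h in hops), sum(h.carbon_g() for h in hops))


grid_path = [HopCost(3000.0, 400.0, 200.0, 400.0)] * 2   # less energy, carbon-heavy supply
green_path = [HopCost(5000.0, 300.0, 100.0, 50.0)] * 2   # more energy, cleaner supply
print(path_footprint(grid_path))
print(path_footprint(green_path))
```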
A related set of challenges concerns the fact that such schemes
result in much greater dynamicity and continuous change in the
network, as resources may be steered away from (when possible) and
then leveraged again (when necessary) in rapid succession. This puts
significant stress on convergence schemes, which challenges the
scalability of solutions and their ability to perform fast enough.
Network-wide convergence imposes high cost and incurs significant
delay, and is hence not suitable for such schemes. The answer will
in all likelihood need to be mechanisms that do not require
convergence beyond the vicinity of the affected network device.
Especially where central network controllers are involved that are
responsible for aspects such as the configuration of paths and the
positioning of network functions, and that aim for global
optimization, the impact of churn needs to be minimized. This means
that, for example, extensive recalculation of routes and paths based
on the current energy state of the network needs to be avoided.
An opportunity may lie in distinguishing between the "energy modes"
of different domains. For instance, in a highly trafficked core, the
energy challenge is to transmit the traffic efficiently. The amount
of traffic is relatively smooth (due to the multiplexing of many
sessions) and predictable. In this case, there is no need to
optimize on a per-session basis, or even at a short time scale. In
the access networks connecting to that core, though, there are
opportunities for fast convergence: traffic is much more bursty and
less predictable, and the network should be able to be more
reactive. Other domains, such as DCs, may also have more variable
workloads and different traffic patterns.
7. Challenges and Opportunities - Architecture Level
Another possibility to improve network energy efficiency is to
organize networks in a way that they can best serve important
applications, such as content retrieval or remote computation, so as
to minimize energy consumption. This can minimize the amount of
communication that needs to take place in the first place, although
energy savings within the network may at least in part be offset by
additional energy consumption elsewhere. The following are some
examples that suggest that it may be worthwhile to reconsider the
ways in which networks are architected in order to minimize their
carbon footprint.
For example, Content Delivery Networks (CDNs) have reduced the energy
expenditure of the Internet by caching content near users. The
content is sent only a few times over the WAN and is then served
locally. This shifts energy consumption from networking to storage.
Further methods can reduce energy usage even more [bianco2016energy]
[mathew2011energy] [islam2012evaluating]. Whether the overall energy
savings are net positive depends on the actual deployment, but from
the network operator’s perspective, it at least shifts the energy
bill away from the network to the CDN operator.
While CDNs operate as an overlay, another architecture, Information
Centric Networking [ahlgren2012survey], has been proposed to provide
CDN features directly in the network; it is also studied in the IRTF
ICNRG. This, however, shifts the energy consumption back to the
network operator and requires some power-hungry hardware, such as
chips for lookups of larger names and memory for the in-network
cache. As a result, it is unclear whether there is an actual energy
gain from the dissemination and retrieval of content via in-network
caches.
Fog computing and placing intelligence at the edge are other
architectural directions for reducing the amount of energy that is
spent on packet forwarding and in the network. There again, the
trade-off is between performing computation in an energy-optimized
data center at very large scale, which requires transmitting
significant volumes of data across many nodes and long distances,
and performing computational tasks at the edge, where energy may not
be used as efficiently (there is less multiplexing of resources, and
smaller sites are inherently less efficient due to their smaller
scale) but the amount of long-distance network traffic is
significantly reduced. Softwarization, containers, and microservices
are direct enablers for such architectures, and the deployment of
programmable network infrastructure (for instance, Infrastructure
Processing Units (IPUs) or SmartNICs that offload some computations
from the CPU onto the NIC) will help realize them. However, the power
consumption characteristics of CPUs differ from those of NPUs,
another aspect to be considered in conjunction with virtualization.
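Under strong simplifying assumptions, this trade-off reduces to a
break-even data volume: network energy proportional to bytes times
hops, and compute energy scaled by a single site-overhead factor
(folding in cooling and the lower server efficiency of small sites).
The Python sketch below is illustrative only; the per-byte energy,
hop counts, and overhead factors are hypothetical. With these
numbers, running a task at the edge uses less total energy once it
involves more than about 700 MB of data.

```python
def total_energy_j(data_bytes: float, hops: int, compute_j: float,
                   site_factor: float, j_per_byte_per_hop: float) -> float:
    """Network energy to move the data plus compute energy scaled by site overhead.

    site_factor folds facility overhead (e.g. cooling) and server efficiency
    into one multiplier; smaller edge sites are assumed to have a larger one.
    """
    return data_bytes * hops * j_per_byte_per_hop + compute_j * site_factor


def break_even_bytes(compute_j: float, edge_factor: float, dc_factor: float,
                     edge_hops: int, dc_hops: int, j_per_byte_per_hop: float) -> float:
    """Data volume above which running the task at the edge uses less energy.

    Assumes dc_hops > edge_hops and edge_factor >= dc_factor.
    """
    extra_compute_j = compute_j * (edge_factor - dc_factor)
    saved_per_byte_j = (dc_hops - edge_hops) * j_per_byte_per_hop
    return extra_compute_j / saved_per_byte_j


E = 1e-8  # hypothetical network energy per byte per hop (J)
for size in (1e8, 1e9):
    edge = total_energy_j(size, 2, 100.0, 1.8, E)
    dc = total_energy_j(size, 12, 100.0, 1.1, E)
    print(f"{size:.0e} bytes: edge {edge:.0f} J, hyperscale DC {dc:.0f} J")
print(f"break-even: {break_even_bytes(100.0, 1.8, 1.1, 2, 12, E):.2e} bytes")
```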
Other possibilities concern taking economic aspects into
consideration, such as providing incentives to users of networking
services in order to minimize energy consumption and emission
impact. An example of this is given in [wolf2014choicenet], which
could be expanded to include energy incentives.
Other approaches consider performing a late binding of data and the
functions to be performed on the data [krol2017NFaaS]. The COIN
Research Group in the IRTF focuses on similar issues. Jointly
optimizing for the total energy cost, taking into account networking
and computing (and the different energy cost of computing in a
hyperscale DC versus an edge node), is still an area of open
research.
In summary, rethinking the overall network (and networked
application) architecture can be an opportunity to significantly
reduce the energy cost at the network layer, for example by
performing tasks that involve massive communications closer to the
user. To what extent these shifts result in a net reduction of
carbon footprint is an important question that requires further
analysis on a case-by-case basis.
8. Conclusions
How to make networks "greener" and reduce their carbon footprint is
an important problem for the networking industry to address, both for
societal and for economic reasons. This document has highlighted
some of the technical challenges and opportunities in that regard,
for example:
o Equipment instrumentation advances for improved energy-awareness,
definition and standardization of granular management information;
o Protocol advances for improving the ratio of goodput to throughput
and reducing waste: reduction in header tax and protocol verbosity,
improvements in coding, etc.;
o Protocol advances to enable rapidly taking down, bringing back
online, and discovering the availability and power-saving status of
networking resources while minimizing the need for reconvergence
and propagation of state;
o Network advances that allow resources to be dynamically taken
offline where feasible while minimizing churn;
o Energy footprint aware traffic steering and routing; carbon
footprint as a traffic cost metric to optimize;
o Reorganization of the networking architecture for important classes
of applications (examples: content delivery, right-placing of
computational intelligence) to optimize the green footprint, and
holistic approaches to trade off carbon footprint between
forwarding, storage, and computation;
o Security issues imposed by greater energy awareness, to minimize
the new attack surfaces that would allow an adversary to turn off
resources, or to waste energy;
o Reliability issues for a network that relies on less resource
diversity and has more operational complexity.
Of those, perhaps the key challenge to address right away concerns
the ability to expose, at a fine granularity, the energy impact of
any networking action. Providing visibility into this impact will
enable many approaches towards a solution. It will be key to
implementing optimization via control loops that assess the energy
impact of the decisions taken. It will also help to answer questions
such as: is caching (with the associated storage energy) better than
retransmitting from a different server (with the associated
networking cost)? Is compression more energy-efficient than
transmitting uncompressed data, once the computation cost of
compression is factored in? Which compression scheme is more energy
efficient? Is the energy saved by computing at an efficient
hyperscale DC offset by the networking cost of reaching that DC? Is
the overhead of gathering and transmitting fine-grained energy
telemetry data offset by the total energy gained by way of the better
decisions that this data enables? Is the energy cost of transmitting
data to a LEO constellation compensated by the fact that, once in the
constellation, the networking is powered by solar energy? Is the
energy cost of launching rockets to place routers in Low Earth Orbit
amortized over time?
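The compression question reduces to a simple energy balance once
fine-grained energy data is available. The sketch below assumes,
purely for illustration, that compression costs a fixed energy per
input byte and that the end-to-end network path costs a fixed energy
per transmitted bit; the function name and all parameter values are
hypothetical, not measurements from this document.

```python
def compression_saves_energy(size_bytes: int,
                             ratio: float,
                             compress_j_per_byte: float,
                             net_j_per_bit: float) -> bool:
    """Return True if compressing before sending uses less total energy.

    Hypothetical linear model:
      ratio               = compressed size / original size (0 < ratio <= 1)
      compress_j_per_byte = CPU energy spent per input byte
      net_j_per_bit       = energy per bit summed over all hops on the path
    Decompression cost at the receiver is ignored for simplicity.
    """
    sent_uncompressed_j = size_bytes * 8 * net_j_per_bit
    sent_compressed_j = size_bytes * ratio * 8 * net_j_per_bit
    compute_j = size_bytes * compress_j_per_byte
    return compute_j + sent_compressed_j < sent_uncompressed_j


# Hypothetical 10 MB payload that compresses to 40% of its size.
print(compression_saves_energy(10_000_000, 0.4, 1e-9, 1e-8))  # True
print(compression_saves_energy(10_000_000, 0.4, 1e-7, 1e-8))  # False
```

Under this model, compression pays off exactly when
compress_j_per_byte < 8 * net_j_per_bit * (1 - ratio), which is why
the answer depends on both the scheme and the network it feeds.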
Determining where the sweet spots are and optimizing networks along
those lines will be key to making networks "greener". We expect to
see significant advances across these areas and believe that the
IETF has an important role to play in facilitating this.
9. IANA Considerations
This document does not have any IANA requests.
10. Security Considerations
Security considerations may appear to be orthogonal to green
networking considerations. However, there are a number of important
caveats.
Security vulnerabilities of networks may manifest themselves in
compromised energy efficiency. For example, attackers could aim to
increase energy consumption in order to drive up the victims' energy
bills. Specific vulnerabilities will depend on the particular
mechanisms. For example, in the case of monitoring energy
consumption data, tampering with such data might result in
compromised energy optimization control loops. Hence any mechanisms
to instrument and monitor the network for such data need to be
properly secured to ensure authenticity.
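As one illustration of ensuring authenticity, a collector could check
a message authentication code on each energy reading before feeding
it into a control loop. The sketch below assumes a key pre-shared
between device and collector and a hypothetical reading format; in
practice, management data would more typically be protected by a
secure transport (for example, SSH or TLS) rather than per-reading
tags.

```python
import hashlib
import hmac
import json


def sign_reading(key: bytes, reading: dict) -> str:
    """HMAC-SHA256 tag over a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_reading(key: bytes, reading: dict, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_reading(key, reading), tag)


key = b"hypothetical-pre-shared-key"
reading = {"device": "router-1", "interface": "eth0", "watts": 182.5}
tag = sign_reading(key, reading)
print(verify_reading(key, reading, tag))                      # True
print(verify_reading(key, {**reading, "watts": 90.0}, tag))   # False: tampered
```

A tag alone does not prevent replay of an old, valid reading; a
timestamp or sequence number inside the signed payload would be
needed for that.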
In some cases there are inherent tradeoffs between security and the
maximal energy efficiency that might otherwise be achieved. An
example is encryption, which requires additional computation for
encryption and decryption activities and security handshakes, in
addition to the need to send more traffic than necessitated by the
entropy of the actual data stream. Likewise, mechanisms that allow
resources to be turned on or off could become a target for attackers.
11. Acknowledgments
Acknowledgments will be added at a later stage.
12. Informative References (TBD)
[ahlgren2012survey]
Ahlgren, B., Dannewitz, C., Imbrenda, C., Kutscher, D.,
and B. Ohlman, "A survey of information-centric
networking", IEEE Communications Magazine Vol.50 No.7,
2012.
[alizadeh2010DCTCP]
Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data
Center TCP (DCTCP)", ACM SIGCOMM pp.63-74, 2010.
[bianco2016energy]
Bianco, A., Mashayekhi, R., and M. Meo, "Energy
consumption for data distribution in content delivery
networks", IEEE International Conference on Communications
(ICC) pp.1-6, 2016.
[bolla2011energy]
Bolla, R., Bruschi, R., Davoli, F., and F. Cucchietti,
"Energy Efficiency in the Future Internet: A Survey of
Existing Approaches and Trends in Energy-Aware Fixed
Network Infrastructures", IEEE Communications Surveys and
Tutorials Vol.13 No.2, pp.223-244, 2011.
[cervero19]
Cervero, A. G., Chincoli, M., Dittmann, L., Fischer, A.,
and A. Garcia, "Green Wired Networks", Wiley Journal on
Large-Scale Distributed Systems and Energy
Efficiency pp.41-80, 2019.
[chabarek08]
Chabarek, J., Sommers, J., Barford, P., Tsiang, D., and S.
Wright, "Power awareness in network design and routing",
IEEE Infocom pp.457-465, 2008.
[GreenNet22]
Clemm, A. and C. Westphal, "Challenges and Opportunities
in Green Networking", 1st International Workshop on
Network Energy Efficiency in the Softwarization Era IEEE
NetSoft 2022, June 2022.
[herzen2011PIE]
Herzen, J., Westphal, C., and P. Thiran, "Scalable routing
easy as PIE: A practical isometric embedding protocol",
19th IEEE International Conference on Network Protocols
(ICNP) pp.49-58, 2011.
[islam2012evaluating]
Islam, S. U. and J. Pierson, "Evaluating Energy
Consumption in CDN Servers", Proceedings of the Second
International Conference on ICT as Key Technology against
Global Warming pp.64-78, 2012.
[krol2017NFaaS]
Krol, M. and I. Psaras, "NFaaS: Named Function as a
Service", ACM SIGCOMM ICN Conference, 2017.
[mathew2011energy]
Mathew, V., Sitaraman, R., and P. Shenoy, "Energy-Aware
Load Balancing in Content Delivery Networks", CoRR
http://arxiv.org/abs/1109.5641, 2011.
[QUAL] Li, R., Makhijani, K., Yousefi, H., Westphal, C., Xong,
L., Wauters, T., and F. D. Turck, "A framework for
Qualitative Communications using Big Packet Protocol",
Proceedings ACM Sigcomm Workshop On Networking For
Emerging Applications And Technologies pp.22-28, 2019.
[ren2018jordan]
Ren, J., Ren, K., Westphal, C., Wang, J., Wang, J., Song,
T., Liu, S., and J. Wang, "JORDAN: A Novel Traffic
Engineering Algorithm for Dynamic Adaptive Streaming over
HTTP", IEEE International Conference on Computing,
Networking and Communications (ICNC) pp.581-587, 2018.
[RFC2481] Ramakrishnan, K. and S. Floyd, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", RFC 2481,
DOI 10.17487/RFC2481, January 1999,
<https://www.rfc-editor.org/info/rfc2481>.
[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
Label Switching Architecture", RFC 3031,
DOI 10.17487/RFC3031, January 2001,
<https://www.rfc-editor.org/info/rfc3031>.
[RFC3095] Bormann, C., Burmeister, C., Degermark, M., Fukushima, H.,
Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le,
K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro, K.,
Wiebke, T., Yoshimura, T., and H. Zheng, "RObust Header
Compression (ROHC): Framework and four profiles: RTP, UDP,
ESP, and uncompressed", RFC 3095, DOI 10.17487/RFC3095,
July 2001, <https://www.rfc-editor.org/info/rfc3095>.
[RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
RFC 7950, DOI 10.17487/RFC7950, August 2016,
<https://www.rfc-editor.org/info/rfc7950>.
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
Decraene, B., Litkowski, S., and R. Shakir, "Segment
Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
July 2018, <https://www.rfc-editor.org/info/rfc8402>.
[telefonica2020]
Telefonica, "Consolidated Management Report 2020", 2021.
[westphal2021qualitative]
Westphal, C., He, D., Makhijani, K., and R. Li,
"Qualitative Communications for Augmented Reality and
Virtual Reality", 22nd IEEE International Conference on
High Performance Switching and Routing (HPSR) pp.1-6,
2021.
[wolf2014choicenet]
Wolf, T., Griffioen, J., Calvert, K., Dutta, R.,
Rouskas, G., Baldin, I., and A. Nagurney, "ChoiceNet:
Toward an Economy Plane for the Internet", SIGCOMM
Computer Communication Review Vol.44 No.3, July 2014.
Authors’ Addresses
Alexander Clemm
Futurewei
2330 Central Expressway
Santa Clara, CA 95050
USA
Email: ludwig@clemm.org
Cedric Westphal
Futurewei
Email: cedric.westphal@futurewei.com
Jeff Tantsura
Microsoft
Email: jefftant.ietf@gmail.com
Laurent Ciavaglia
Rakuten Mobile
Email: laurent.ciavaglia@rakuten.com
Marie-Paule Odini
Email: mp.odini@orange.fr
Network Working Group T. Eckert, Ed.
Internet-Draft Futurewei Technologies USA
Intended status: Informational M. Boucadair
Expires: 12 January 2023 Orange
P. Thubert
Cisco Systems, Inc.
J. Tantsura
Microsoft
11 July 2022
IETF and Energy - An Overview
draft-eckert-ietf-and-energy-overview-03
Abstract
This memo provides an overview of work performed by or proposed
within the IETF related to energy and/or "green" topics: awareness,
management, control, or reduction of energy consumption, and
sustainability as it relates to the IETF.
This document is written to help those unfamiliar with the work but
interested in it, in the hope of raising more interest in
energy-related activities within the IETF, such as identifying gaps
and investigating solutions as appropriate.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 12 January 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
Eckert, et al. Expires 12 January 2023 [Page 1]
Internet-Draft energy-overview July 2022
This document is subject to BCP 78 and the IETF Trust’s Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Energy Saving: An Introduction . . . . . . . . . . . . . . . 3
2.1. Digitization . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Energy Saving Through Scale . . . . . . . . . . . . . . . 4
2.2.1. An Example: Telephony . . . . . . . . . . . . . . . . 5
2.2.2. The Packet Multiplexing Principle . . . . . . . . . . 5
2.2.3. End-to-End Transport . . . . . . . . . . . . . . . . 5
2.2.4. Global vs Restricted Connectivity: The Internet Routing
Architectures . . . . . . . . . . . . . . . . . . . . 5
2.2.5. Freedom to Innovate . . . . . . . . . . . . . . . . . 6
2.2.6. End-to-End Encryption . . . . . . . . . . . . . . . . 6
2.2.7. Converged Networks . . . . . . . . . . . . . . . . . 6
2.2.7.1. IntServ and DetNet . . . . . . . . . . . . . . . 7
2.2.7.2. DiffServ . . . . . . . . . . . . . . . . . . . . 7
2.2.7.3. SIP . . . . . . . . . . . . . . . . . . . . . . . 7
3. Higher or New Energy Consumption . . . . . . . . . . . . . . 8
4. Some Notes on Sustainability . . . . . . . . . . . . . . . . 9
4.1. Follow the Energy Cloud Scheduling . . . . . . . . . . . 9
4.2. Minimize Generated Heat . . . . . . . . . . . . . . . . . 10
4.3. Heat Recovery . . . . . . . . . . . . . . . . . . . . . . 10
4.4. Telecollaboration . . . . . . . . . . . . . . . . . . . . 10
5. Energy Optimization in Specific Networks . . . . . . . . . . 12
5.1. Low Power and Lossy Networks (LLN) . . . . . . . . . . . 12
5.1.1. 6LOWPAN WG . . . . . . . . . . . . . . . . . . . . . 13
5.1.2. LPWAN WG . . . . . . . . . . . . . . . . . . . . . . 13
5.1.3. 6TISCH WG . . . . . . . . . . . . . . . . . . . . . . 13
5.1.4. 6LO WG . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.5. ROLL WG . . . . . . . . . . . . . . . . . . . . . . . 14
5.2. Constrained Nodes and Networks . . . . . . . . . . . . . 15
5.2.1. LWIG WG . . . . . . . . . . . . . . . . . . . . . . . 15
5.2.2. CoRE and CoAP . . . . . . . . . . . . . . . . . . . . 15
5.2.3. Satellite Constellations . . . . . . . . . . . . . . 16
5.2.4. Devices with Batteries . . . . . . . . . . . . . . . 16
5.3. Sample Technical Enablers . . . . . . . . . . . . . . . . 17
5.3.1. (IP) Multicast . . . . . . . . . . . . . . . . . . . 17
5.3.1.1. Power Saving with Multicast . . . . . . . . . . . 17
5.3.1.2. Power Waste Through Multicast-based Service
Coordination . . . . . . . . . . . . . . . . . . . 18
5.3.1.3. Multicast Problems in Wireless Networks . . . . . 18
5.3.2. Sleepy Nodes . . . . . . . . . . . . . . . . . . . . 19
5.4. (Lack of) Power Benchmarking Proposals . . . . . . . . . 20
6. Energy Management Networks . . . . . . . . . . . . . . . . . 21
6.1. Smart Grid . . . . . . . . . . . . . . . . . . . . . . . 21
6.2. Syncro Phasor Networks . . . . . . . . . . . . . . . . . 22
7. (Limited) Energy Management for Networks . . . . . . . . . . 23
7.1. Some Metrics . . . . . . . . . . . . . . . . . . . . . . 23
7.2. EMAN WG . . . . . . . . . . . . . . . . . . . . . . . . . 23
8. Power-awareness in Forwarding and Routing Protocols . . . . . 25
8.1. Power Aware Networks (PANET) . . . . . . . . . . . . . . 25
8.2. SDN-based Semantic Forwarding . . . . . . . . . . . . . . 26
8.3. Misc . . . . . . . . . . . . . . . . . . . . . . . . . . 26
9. Gaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
10. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
11. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 27
12. Informative References . . . . . . . . . . . . . . . . . . . 28
Authors’ Addresses . . . . . . . . . . . . . . . . . . . . . . . 40
1. Introduction
This document summarizes work that has been proposed to or performed
within the IETF/IRTF. In particular, it covers IETF/IRTF RFCs as well
as ISE RFCs, and IETF/IRTF or individual-submission drafts that were
abandoned for various reasons (e.g., lack of momentum, broad scope).
A given work can relate to energy in various ways, and this memo
groups these into categories. This classification does not attempt
to propose a formal taxonomy; it is used only for the sake of better
readability. Each technology is listed under the category that is
most significant for it, for example, the narrowest applicable one.
This memo usually refers to technologies by a significant early RFC
or a specific draft version, as opposed to the newest one. This is
contrary to the common practice in IETF documents of referring to
the newest version, but it allows readers to better understand the
historic timeline in which a specific technology was introduced.
Especially successful IETF technologies will have newer RFCs that
update such initial work.
2. Energy Saving: An Introduction
Technologies that simply save energy compared to earlier/other
alternatives are the broadest and least specific category. In this
memo, energy saving simply refers to savings measured in some unit of
electrical energy, such as kWh, and does not take other aspects into
account. See Section 4 for more details.
2.1. Digitization
Digitization describes the transformation of processes that are
non-digital, or less digital, into processes that are more digital
and rely on computer networking. For comparable process results, the
digitized option is often, but not always, less energy consuming.
Consider, for example, energy consumption in the evolution of
messaging, starting from postal mail, over telegrams and various
other historic forms, to solutions including e-mail utilizing, for
example, the IETF "Simple Mail Transfer Protocol" (SMTP, [RFC822]),
group communications utilizing the IETF "Network News Transfer
Protocol" (NNTP, [RFC3977]), or the almost infinite set of
communication options built on top of the IETF "Hypertext Transfer
Protocol" (HTTP, [RFC2086] and successors) and the IETF "HyperText
Markup Language" (HTML, [RFC1866] and successors).
Traditionally, digitization had only an "incidental", not an
"intentional", relationship to energy consumption: if it saved
energy, this was not only not a target benefit, it was probably not
even recognized as one until recently. Instead, the evolution was
driven by benefits other than energy, namely utility benefits such as
improved speed, functionality/flexibility, accessibility,
scalability, and reduced cost.
In hindsight though, digitization through IETF technologies, and
specifically the Internet, is likely the largest contributor to
energy saving amongst all the categories that relate IETF work to
energy, but it is also the hardest to pinpoint on any specific
technology/RFC. Instead, it is often the combination of the whole
stack of deployed protocols and operational practices that
contributes to energy saving through digitization. If only the energy
consumption of equivalent services is compared, the Internet as well
as all other TCP/IP networks are likely the biggest energy saving
development of the past few decades. On the other hand, they are also
the cause of the biggest new source of energy consumption, because of
all the new services introduced with the Internet in the past
decades and the hyper-scaling that the Internet affords them.
2.2. Energy Saving Through Scale
2.2.1. An Example: Telephony
In most cases, energy saving through the use of IETF protocols
compared to earlier (digitized or non-digitized) solutions is purely
a result of the reduction in the energy cost per bit in networking
over the decades. For example, digital voice telephony through the
IETF "Session Initiation Protocol" (SIP, [RFC2543] and successors)
can easily be assumed to be more energy efficient on a
per-voice-minute basis than prior voice technologies such as analog
or digital "Time Division Multiplex" (TDM) telephony, solely because
of this evolution, mostly in device as well as physical-layer and
link-layer networking technologies.
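The per-voice-minute argument is simple arithmetic: energy per minute
equals bit rate times 60 seconds times energy per bit, so any
reduction in energy per bit carries over one-to-one. The sketch below
uses a 64 kbit/s stream (the G.711 payload rate) and two hypothetical,
not measured, energy-per-bit values standing in for an older and a
newer technology generation.

```python
def joules_per_voice_minute(bit_rate_bps: float, joules_per_bit: float) -> float:
    """Energy needed to carry one minute of a constant-bit-rate voice stream."""
    return bit_rate_bps * 60 * joules_per_bit


# Hypothetical energy-per-bit values for two technology generations.
older = joules_per_voice_minute(64_000, 1e-6)
newer = joules_per_voice_minute(64_000, 1e-8)
print(f"older: {older:.4f} J/min, newer: {newer:.4f} J/min")
```

Packet header overhead would raise the effective bit rate somewhat,
but it does not change the proportional effect of a lower energy per
bit.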
2.2.2. The Packet Multiplexing Principle
Nevertheless, the packet multiplexing model employed by the IETF
networking protocols IP ([RFC791]) and IPv6 ([RFC1883] and
successors) was at the heart of successfully supporting the scaling
that brought down the cost per bit through ever faster links and
network nodes, especially for networks larger than building scale.
While the IETF protocols were neither the first packet networking
protocols nor, during their early decades, necessarily the most
widely deployed ones, they were the ones that, at least during the
1990s, started to break away from other protocols, both in scale of
deployment and in the development of further technologies to support
this scaling.
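One way to see why packet multiplexing scales well is the
statistical multiplexing gain: if traffic sources peak at different
times, a shared packet-multiplexed link only needs capacity for the
peak of the aggregate, whereas dedicated circuits must each be sized
for their own peak. The traffic profiles below are made up for
illustration.

```python
def sum_of_peaks(profiles: list[list[float]]) -> float:
    """Capacity needed if every source gets its own dedicated capacity."""
    return sum(max(p) for p in profiles)


def peak_of_sum(profiles: list[list[float]]) -> float:
    """Capacity needed if all sources share one packet-multiplexed link."""
    return max(sum(samples) for samples in zip(*profiles))


# Hypothetical load (Mbit/s) of three sources over four time intervals.
profiles = [
    [10, 2, 1, 3],
    [1, 9, 2, 2],
    [2, 1, 8, 2],
]
print(sum_of_peaks(profiles), peak_of_sum(profiles))  # 27 13
```

The peak of the sum can never exceed the sum of the peaks; the gap
between the two is capacity (and thus equipment and energy) that a
shared network does not need to provision.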
2.2.3. End-to-End Transport
At the core of scalability, even up to now, is the lightweight
per-packet processing enabled by the end-to-end congestion and loss
management architecture as embodied in the IETF "Transmission
Control Protocol" (TCP, [RFC793] and successors, e.g.,
[I-D.ietf-tcpm-rfc793bis]). This model eliminated more expensive
per-hop, per-packet processing, such as would be required for
reliable hop-by-hop forwarding through per-hop ARQ, and was therefore
key to scaling routers cost-effectively.
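To illustrate where the work happens in this model: the sender, not
the routers, adjusts its sending rate in reaction to loss, so routers
only need to forward or drop packets. The sketch below shows a
simplified additive-increase/multiplicative-decrease (AIMD) window
update in the spirit of TCP congestion avoidance; it deliberately
omits slow start, fast recovery, and retransmission timers, and the
loss pattern is hypothetical.

```python
def aimd(loss_events: list[bool], cwnd: float = 1.0,
         increase: float = 1.0, decrease: float = 0.5) -> list[float]:
    """Return the congestion window (in segments) after each round trip.

    Per round trip, the sender adds `increase` segments if no loss was
    detected, otherwise multiplies the window by `decrease` (never going
    below one segment). No router on the path keeps per-flow state.
    """
    history = []
    for lost in loss_events:
        if lost:
            cwnd = max(1.0, cwnd * decrease)
        else:
            cwnd += increase
        history.append(cwnd)
    return history


print(aimd([False, False, False, True, False]))  # [2.0, 3.0, 4.0, 2.0, 3.0]
```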
2.2.4. Global vs Restricted Connectivity: The Internet Routing
Architectures
The meshed peer-to-peer and transitive routing of the Internet,
enabled by the IETF "Border Gateway Protocol" (BGP, [RFC4271] as well
as predecessors), is another key factor in successful scalability,
because it enabled competitive market forces to explore markets
quickly.
Prior to the Internet, the public often had access to international
networking connections only through highly regulated data networks,
often run by per-country monopolies.
2.2.5. Freedom to Innovate
Such (non-IP) networks also often did not allow as much "freedom to
innovate" (as it is often called in the IETF) for the applications
running over them. Instead, those networks explored the coupling of
packet transport with higher-layer services to give the network
operator some degree of revenue sharing with the services running on
top of it. Such approaches resulted not only in a higher cost of
those services but also in (likely) preferential and (often)
exclusionary treatment of network traffic not fitting the perceived
highest-revenue service options.
2.2.6. End-to-End Encryption
When the same business practices were applied to IP networks, this
was one of the key factors leading to the development of IETF
end-to-end encryption through protocols such as "Transport Layer
Security" (TLS, [RFC2246] and its successors). This further
strengthened the ability to scale services/applications at minimal
additional cost for the underlying packet transport, arguably driving
innovation into ever faster networking technology and likely lower
cost per bit.
2.2.7. Converged Networks
Another key factor supporting scaling were IETF technologies that
allowed multiplexing different types of traffic (e.g., realtime vs.
non-realtime) that previously used separate networks with typically
incompatible networking technologies.
Eliminating multiple physical networks with separate routing/
forwarding nodes and separate links affords significant energy
savings, even at the same generation of speed and hence energy per
bit, simply by avoiding the N-fold production and operation of
equipment and links. Of course, the CAPEX and OPEX of multiple,
technology-diverse networks and host stacks was originally the core
reason for unified networks, and energy saving is, in hindsight, just
incidental (as for all other cases mentioned here).
2.2.7.1. IntServ and DetNet
The first (non-IETF) widely adopted technology promising converged
networks was "Asynchronous Transfer Mode" (ATM), which was designed
and deployed at the end of the 1980s specifically to support
multiplexing of "Data Voice and Video". At that time, both voice and
video required loss-free, deterministic, bounded-latency, low-jitter
transport and therefore had their own Time-Division-Multiplex (TDM)
networks, both separate from so-called data networks using packet
multiplexing. However, due to its cell-forwarding nature, ATM was
very expensive on a per-bit basis.
At the end of the 1980s, [BOUNDED_LATENCY] proved that
variable-length packet multiplexing in networks can also support
bounded latency with calculations that are not NP-hard. This led to
the IETF "Integrated Services WG" (INTSERV) supporting such
guaranteed throughput and bounded latency traffic via [RFC2212], and
to the demise of ATM.
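The end-to-end delay bound of the guaranteed service in [RFC2212] is
short enough to sketch. The function below is a transcription of
that formula for a flow described by a token bucket (depth b, rate
r), peak rate p, and maximum datagram size M, served at reserved
rate R, where Ctot and Dtot are the error terms summed over all hops;
readers should check it against the RFC itself, and the example
values are hypothetical.

```python
def guaranteed_delay_bound(b: float, r: float, p: float, M: float,
                           R: float, c_tot: float, d_tot: float) -> float:
    """End-to-end delay bound in seconds for the guaranteed service.

    b, M, c_tot are in bytes; r, p, R are in bytes/second; d_tot is in
    seconds. The reserved rate R must be at least the token rate r.
    """
    if R < r:
        raise ValueError("reserved rate R must be >= token rate r")
    if p > R:
        # Case p > R >= r: a burst can build a queue at rate p - R.
        return (b - M) / R * (p - R) / (p - r) + (M + c_tot) / R + d_tot
    # Case r <= p <= R: the reservation keeps up with the peak rate.
    return (M + c_tot) / R + d_tot


d = guaranteed_delay_bound(b=10_000, r=100_000, p=1_000_000, M=1_500,
                           R=200_000, c_tot=3_000, d_tot=0.002)
print(f"{d * 1000:.1f} ms")  # 62.3 ms
```

Because the bound depends only on per-flow parameters and summed
per-hop error terms, it can be computed in closed form rather than by
searching over traffic patterns.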
IntServ has so far seen little traction because, for its original
use cases (voice and video), it too got superseded, as explained in
the following section. However, this type of service is being
revisited for a broader set of use cases [RFC8575] in the DetNet WG,
which should enable even further network infrastructure convergence
for IoT and industrial markets.
2.2.7.2. DiffServ
Due to the much higher per-packet processing overhead of INTSERV
compared to standard (so-called Best-Effort) Internet traffic, the
INTSERV model was already recognized in the 1990s as unable to
support the highest scale at the lowest cost, leading to the
parallel development of the IETF "Differentiated Services WG"
(DIFFSERV) model defined in [RFC2475]. DiffServ has since become the
dominant technology for multiplexing applications and services
originally not designed for the Internet onto a common TCP/IP
network infrastructure, specifically voice and video over UDP
([RFC768]), including RTP [RFC3550] and SIP.
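In DiffServ, a host or edge router marks packets with a DiffServ code
point (DSCP), and routers then apply a per-hop behavior per traffic
aggregate rather than per flow, which keeps per-packet processing
cheap. A minimal sketch of marking voice traffic on a UDP socket with
the Expedited Forwarding (EF) code point 46 follows; it assumes a
platform such as Linux where the IP_TOS socket option is honored, and
whether the marking is respected depends on the policy of each
network the packets cross.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, commonly used for voice


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the IPv4 TOS byte.

    The two low-order (ECN) bits are left at zero.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(dscp_to_tos(EF_DSCP))  # 184
```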
2.2.7.3. SIP
Most notably, over the past two decades SIP has eliminated the
additional network infrastructures previously required for (voice)
telephony services, starting in the early 2000s with
commercial/enterprise deployments and today even removing the option
of any (non-IP/SIP) analog or digital (ISDN) telephone service
connection, instead delivering those purely as services over
adaptation interfaces on home routers (TBD: Any RFC to cite for those
tunneling/adaptation services ?).
3. Higher or New Energy Consumption
Digitized, network-centric workflows may consume more energy than
their non-digitized counterparts, as may new network-centric
workflows that have no easily comparable prior workflows.
In one type of case, the energy consumption per instance is lower
than in the non-digitized/non-Internet-digitized case, but the total
number of (Internet-)digitized instances is orders of magnitude
larger than that of the alternative options, typically because of
their higher utility or lower overall cost.
For example, each instance of (simple text) email consumes less
energy than sending a letter or postcard. Even streaming a movie or
TV series consumes less energy than renting a DVD (DVDvsStreaming,
https://www.smithsonianmag.com/science-nature/streaming-movie-less-energy-dvd-180951586).
Nevertheless, the total number of instances, and as a result the
total energy consumption, of email and streaming easily outranks
that of their predecessor technologies.
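The arithmetic behind this observation is that total energy is
per-instance energy times the number of instances, so a lower
per-instance cost is outweighed as soon as usage grows by more than
the ratio of old to new per-instance energy. The numbers below are
hypothetical units chosen only to show the calculation.

```python
def total_energy(energy_per_instance: float, instances: int) -> float:
    """Aggregate energy: per-instance energy times number of instances."""
    return energy_per_instance * instances


def break_even_growth(old_per_instance: float, new_per_instance: float) -> float:
    """Usage growth factor at which the new technology's total equals the old one's."""
    return old_per_instance / new_per_instance


# Hypothetical units: a letter costs 100, an email costs 1.
print(break_even_growth(100.0, 1.0))                              # 100.0
print(total_energy(1.0, 1_000_000) > total_energy(100.0, 1_000))  # True
```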
While these instances look beneficial from a simple per-instance
energy consumption metric, their overall scale and the resulting
energy consumption may in itself become an issue, especially when
the energy demand they create risks outstripping the possible energy
production, in the short or long term. This concern is nowadays
often raised against the "digital economy", where the network energy
consumption is typically cited as a small contributor relative to
that of its applications, such as those running in Data Centers (DC).
In other cases, digitized workflows often require significantly more
energy than their pre-digitization alternatives. The most well-known
example of this is likely crypto-currencies based on "proof-of-work"
computations (mining), which per unit of currency value can cost 10
to 30 times or more the energy consumed by, for example, gold mining
(very much depending on the highly fluctuating price of the
crypto-currency). Nevertheless, their overall utility compared to
such prior currencies or valuables makes them highly successful in
the market.
In general, the digital economy tends to be more energy intensive
per unit of utility/value (for example, by replacing a lot of manual
labor with computation), and/or it allows for faster growth of its
workflows.
The lower the cost of network traffic, and the more ubiquitously
accessible network connectivity is, the more competitive and/or
successful most of these new workflows of the digital economy can be.
Given how TCP/IP-based networks, especially the Internet, have
excelled through their design principles (and success) in reducing
network traffic cost and providing ubiquitous access over the past
few decades, as outlined above, one can say that IETF technologies,
and especially the Internet, are the most important enabler of the
digital economy and of the energy consumption it produces.
4. Some Notes on Sustainability
Sustainability is the principle of utilizing resources in a way that
does not diminish or exhaust them over the long term. Beyond the
energy saving covered above, sustainability relates for the IETF
specifically to the use of renewable sources of energy to minimize
the exhaustion of fossil resources, and to the impact of IETF
technologies on global warming, to avoid worsening living conditions
on the planet.
While there seems to be no IETF work specifically targeting
sustainability (TBD: did we miss anything ?), the Internet itself
can, similarly to its role in digitization, play a key role in
building sustainable networked IT infrastructures. The following
subsections list example areas where global, high-performance,
low-cost Internet networking is a key requirement.
4.1. Follow the Energy Cloud Scheduling
Renewable energy sources (except for hydropower) commonly have
fluctuating energy output. For example, solar energy output
correlates with the day/night cycle and the strength of sunlight.
Cloud Data Centers (DC) consume a significant share of the IT
sector's energy. Some workloads may simply be scheduled to consume
energy in accordance with the amount of renewable energy available
at the time, which does not require the network. Significant
workloads, however, are not elastic in time, such as interactive
cloud-based applications or entertainment (gaming, etc.). These
workloads may be instantiated at, or even dynamically (over time)
migrated to, a DC location with sufficient renewable energy, and the
Internet (or large TCP/IP OTT backbone networks) will serve as the
fabric to access the remote DC and to coordinate the
instantiation/migration.
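A minimal sketch of the placement decision described above, assuming
a scheduler that knows the renewable power currently available at
each candidate DC. The DataCenter type, the site names, and the
greedy "most spare renewable power" rule are hypothetical choices for
illustration, not a mechanism defined by the IETF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCenter:
    name: str
    renewable_kw: float  # renewable power available right now (hypothetical input)
    committed_kw: float  # power already allocated to running workloads

    def spare_kw(self) -> float:
        return self.renewable_kw - self.committed_kw


def place_workload(sites: list[DataCenter], demand_kw: float) -> Optional[str]:
    """Greedily place a workload at the site with the most spare renewable power.

    Returns the chosen site name, or None if no site can cover the demand
    from renewable power alone (a real scheduler might then defer the
    workload or fall back to grid power).
    """
    candidates = [dc for dc in sites if dc.spare_kw() >= demand_kw]
    if not candidates:
        return None
    best = max(candidates, key=lambda dc: dc.spare_kw())
    best.committed_kw += demand_kw  # reserve the power for this workload
    return best.name


sites = [
    DataCenter("north", renewable_kw=500, committed_kw=450),
    DataCenter("south", renewable_kw=800, committed_kw=200),
    DataCenter("east", renewable_kw=300, committed_kw=0),
]
print(place_workload(sites, 250))  # south
print(place_workload(sites, 400))  # None
```

Re-running the placement as renewable output changes over the day
would correspond to the dynamic migration case.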
4.2. Minimize Generated Heat
The majority of energy in cloud DCs normally also ends up as wasted
exhaust heat, requiring even more energy for cooling. The warmer the
location, the more energy needs to be spent on cooling. For this
reason, DCs in cooler climates, such as
https://greenmountain.no/power-and-cooling/, can help to reduce the
overall DC energy consumption significantly (independent of whether
the energy consumed in the DC is itself renewable). The Internet
again plays the role of providing access to this type of DC, whose
location is optimized not for consumption but for the sustainable
generation of compute and storage.
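One common way to quantify this effect is the Power Usage
Effectiveness (PUE) metric, the ratio of total facility energy to the
energy used by the IT equipment; everything above 1.0 is overhead,
much of it cooling. The comparison below uses hypothetical PUE values
for a warm and a cool site and is only a sketch, not a measurement of
any specific DC.

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy for a given IT energy and PUE.

    PUE = total facility energy / IT equipment energy, so PUE >= 1.0.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


it_load_kwh = 1_000_000                        # hypothetical yearly IT energy
warm = facility_energy_kwh(it_load_kwh, 1.6)   # hypothetical warm-climate site
cool = facility_energy_kwh(it_load_kwh, 1.1)   # hypothetical cool-climate site
print(f"overhead avoided: {warm - cool:,.0f} kWh")
```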
4.3. Heat Recovery
Exhaust heat, especially from compute in DCs, can be recovered when
it is coupled to heating systems ranging in size from individual
family homes through larger buildings (hotels, etc.) all the way to
district heating systems. A provider of such compute-generated heat
as a service can sell the compute capacity as long as there is
cost-efficient network connectivity. "Cloud & Heat" is an example of
a company offering such infrastructures and services
(https://www.cloudandheat.com/wp-content/uploads/2020/02/2020_CloudHeat-Whitepaper-Cost-saving-Potential.pdf).
4.4. Telecollaboration
Telecollaboration has a long history in the IETF, resulting in
multiple core technologies over the decades.
If one considers textual communications via email and netnews (using,
e.g., NNTP) as early forms of telecollaboration, then
telecollaboration history through IETF technology reaches back into
the 1980s and earlier.
Around 1990, the IETF work on IP Multicast (e.g., [RFC1112] and
later) enabled the first efficient forms of audio/video group
collaboration through an overlay network over the Internet called
the MBone (https://en.wikipedia.org/wiki/Mbone), which was also used
by the IETF for more than a decade to provide remote collaboration
for its own (in-person + remote participation) meetings.
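A rough sketch of why IP multicast made group audio/video efficient:
with unicast, the sender transmits a separate copy along the path to
each receiver, so shared links carry the same data repeatedly,
whereas with multicast each link of the distribution tree carries a
single copy. The tree and node names below are hypothetical.

```python
def path_to_root(parent: dict[str, str], node: str) -> list[tuple[str, str]]:
    """Links (child, parent) on the path from `node` up to the tree root."""
    links = []
    while node in parent:
        links.append((node, parent[node]))
        node = parent[node]
    return links


def link_transmissions(parent: dict[str, str], receivers: list[str]) -> tuple[int, int]:
    """Return (unicast, multicast) link transmissions for one packet."""
    paths = [path_to_root(parent, r) for r in receivers]
    unicast = sum(len(p) for p in paths)                  # one copy per receiver per link
    multicast = len({link for p in paths for link in p})  # one copy per tree link
    return unicast, multicast


# Hypothetical tree: source S -> R1 -> {R2, R3} -> receivers A, B, C, D.
parent = {"R1": "S", "R2": "R1", "R3": "R1",
          "A": "R2", "B": "R2", "C": "R3", "D": "R3"}
print(link_transmissions(parent, ["A", "B", "C", "D"]))  # (12, 7)
```

The savings grow with group size and with how much of the paths the
receivers share.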
With the advent of SIP in the early 2000s, commercial
telecollaboration started to be built most often on SIP-based session
and application protocols, with multiple IETF working groups
contributing to that protocol suite (TBD: how much more example/
details should we have here). Using this technology and the
Internet, telecollaboration became more immersive, first with
life-size video, called Telepresence
(https://en.wikipedia.org/wiki/Telepresence), and later with even
more immersive forms such as AR/VR telecollaboration.
In 2011, the IETF opened the "Real-Time Communication in
WEB-browsers" (RTCWEB) WG, whose protocols, towards the end of that
decade, became the most widely supported cross-platform standard for
hundreds of commercial and free telecollaboration solutions,
including Cisco Webex (which is also used by the IETF itself), Zoom,
and the new IETF collaboration suite MeetEcho (TBD: good references
here ?).
While the various forms of telecollaboration are mostly instances of
digitization, they are discussed under sustainability because their
comparison to in-person travel is nowadays based not on a simple
comparison of energy, but on their impact on global warming, a key
factor for sustainability.
Telecollaboration was primarily developed because of its utility for
the participants: to avoid travel, originally predominantly for
business communications/collaborations. It saw an extreme increase
in use (TBD: references) during the COVID-19 pandemic that began in
late 2019, when international travel in particular, and often even
working from an office, was frequently prohibited. This forced
millions of people to work from home using commercial
telecollaboration tools. It equally caused most in-person events that
were not cancelled to be moved to a telecollaboration platform over
the Internet, most of them likely relying on RTCWEB protocols.
A realistic comparison of the energy consumption of teleconferencing
and in-person travel is complex, but for the last decades it has
commonly been based on calculating some form of CO2 emission
equivalent of the energy consumed. It therefore compares not simply
the energy consumption, but weighs it by its impact on one of the key
factors known to affect sustainable living conditions (CO2 emission).
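The weighting described above amounts to multiplying the energy
consumed by an emission factor (carbon intensity) of the energy
source. The following minimal Python sketch illustrates this; the
function name and all numeric values are hypothetical placeholders
chosen for illustration, not measured data.

```python
# Illustrative sketch: weighting energy consumption by carbon intensity.
# All numbers below are hypothetical placeholders, not measured data.

def co2e_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Return CO2-equivalent emissions in kg for a given energy use.

    energy_kwh: energy consumed (kWh)
    intensity_g_per_kwh: emission factor of the energy source (g CO2e/kWh)
    """
    if energy_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("energy and intensity must be non-negative")
    return energy_kwh * intensity_g_per_kwh / 1000.0


# The same (hypothetical) energy use from two (hypothetical) energy sources:
# equal energy can have a very different climate impact.
same_energy_kwh = 10.0
mostly_renewable = co2e_kg(same_energy_kwh, 50.0)   # 0.5 kg
mostly_fossil = co2e_kg(same_energy_kwh, 800.0)     # 8.0 kg
print(mostly_renewable, mostly_fossil)
```

The sketch only shows that identical energy use can map to very
different CO2 equivalents depending on the energy source; real
comparisons such as [VC2014] account for many more factors.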
[VC2014] is a good example of a comparison between travel and
telecollaboration taking various factors into account and using CO2
emission equivalents as its core metric. That paper concludes that
the carbon/energy cost of telecollaboration could be as little as 7%
of that of an in-person meeting. Those numbers depend on various
assumptions and change when the time-effort of participants is converted
to carbon/energy costs. These numbers should be even more favorable to
telecollaboration today: the cost of Internet traffic per bit goes
down while the cost of fossil fuel for travel goes up.
Recently, air travel has also come under more scrutiny because the
greenhouse gas emissions of air travel at the altitudes used by
commercial aviation have been calculated to have a higher global
warming impact than the same amount of CO2 would have if it were
emitted at surface level. One publicly funded organization offering
carbon offset services calculates with a factor of 3 applied to the
CO2 emissions of an airplane
(https://www.atmosfair.de/de/fliegen_und_klima/flugverkehr_und_klima/klimawirkung_flugverkehr/).
In summary, telecollaboration has a higher sustainability benefit
compared to travel than a pure comparison of energy consumption
suggests, because renewable energy is harder to use in transportation
than in networking. This is most extreme when telecollaboration
replaces air travel, because of the even higher global warming impact
of using fossil fuels in air travel.
5. Energy Optimization in Specific Networks
5.1. Low Power and Lossy Networks (LLN)
Low Power and Lossy Networks are networks in which nodes and/or radio
links have constraints. Low power consumption constraints in nodes
often originate from the need to operate nodes for as long as
possible from a battery and/or from energy harvesting, such as (today
most commonly) solar panels associated with the node, or ambient
energy, such as energy harvested from movement for wearable nodes or
by piezo cells for mechanically operated nodes such as switches.
Several IETF WGs have produced or are producing work primarily
intended to support LLN across multiple layers of the protocol stack.
[RFC8352] gives a good overview of the energy-consumption-related
communication challenges and solutions produced by the IETF for this
space.
To minimize the energy needs of such nodes, their network
data-processing mechanisms have to be optimized. This includes packet
header compression, fragmentation (to avoid latency through large
packets at low bitrates), packet bundling (to consume radio energy
only during short time periods), radio energy tuning (to just reach
the destination(s)), minimization of multicasting (to eliminate the
need for radio receivers to consume energy), and so on. [RFC8352]
gives a more detailed overview, especially for the different L2
technologies, such as IEEE 802.15.4 type (low power) wireless
networks, Bluetooth Low Energy (BLE), WiFi (IEEE 802.11), and DECT
ULE.
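The benefit of packet bundling can be illustrated with a toy radio
energy model in which every frame pays a fixed wake-up cost plus a
per-byte cost for header and payload. The following Python sketch is
only an illustration under that assumption; the function names and
all parameter values (header size, wake-up energy, per-byte energy,
maximum payload) are hypothetical and not taken from any specific
radio.

```python
# Toy radio energy model (hypothetical parameters, for illustration only):
# every frame pays a fixed radio wake-up cost plus a per-byte cost for
# header and payload.

def frame_energy_uj(payload_bytes, header_bytes=20, wakeup_uj=100.0, per_byte_uj=2.0):
    """Energy in microjoules to transmit one frame."""
    return wakeup_uj + (header_bytes + payload_bytes) * per_byte_uj


def energy_unbundled(n_readings, reading_bytes, **radio):
    """One frame per reading: wake-up and header cost are paid n times."""
    return n_readings * frame_energy_uj(reading_bytes, **radio)


def energy_bundled(n_readings, reading_bytes, max_payload=100, **radio):
    """Pack as many readings as fit into each frame before transmitting."""
    per_frame = max_payload // reading_bytes
    if per_frame < 1:
        raise ValueError("a single reading does not fit into one frame")
    total = 0.0
    remaining = n_readings
    while remaining > 0:
        k = min(per_frame, remaining)
        total += frame_energy_uj(k * reading_bytes, **radio)
        remaining -= k
    return total


# Ten 4-byte sensor readings, sent individually vs. bundled into one frame.
print(energy_unbundled(10, 4))  # 1480.0
print(energy_bundled(10, 4))    # 220.0
```

The saving comes from amortizing the fixed wake-up and header cost;
the price is added latency, because readings wait until a frame is
filled or a timer expires.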
In the INTernet area of the IETF, several LLN specific WGs exist(ed):
5.1.1. 6LOWPAN WG
The "IPv6 over Low power WPAN (Wireless Personal Area Networks)"
(6lowpan) WG ran from 2005 to 2014 and produced 6 RFCs that adapt IPv6
to IEEE 802.15.4 type (low power) wireless networks: three standards
covering transmission procedures ("Transmission of IPv6 Packets over
IEEE 802.15.4 Networks" [RFC4944]), compression of IPv6 (and
transport) packet headers ("Compression Format for IPv6 Datagrams
over IEEE 802.15.4-Based Networks" [RFC6282]), and modifications for
neighbor discovery (ND) ("Neighbor Discovery Optimization for IPv6
over Low-Power Wireless Personal Area Networks (6LoWPANs)" [RFC6775],
6LOWPAN-ND), as well as 3 informational RFCs about the WPAN space and
applying IPv6 to it.
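To illustrate how much of the 40-byte IPv6 header can be elided by
[RFC6282], the following Python sketch decodes the two-byte
LOWPAN_IPHC base encoding and computes the resulting compressed header
size. It is a simplified sketch: it handles only stateless
compression (CID=0, SAC=0, DAC=0) with unicast destinations (M=0), and
it does not count the LOWPAN_NHC bytes that follow when the Next
Header field is compressed (NH=1). The function name is hypothetical.

```python
# Simplified LOWPAN_IPHC (RFC 6282, Section 3.1) size calculation.
# Base encoding (2 bytes, most significant bit first):
#   byte 0: 0 1 1 | TF(2) | NH(1) | HLIM(2)
#   byte 1: CID | SAC | SAM(2) | M | DAC | DAM(2)

TF_INLINE_BYTES = {0b00: 4, 0b01: 3, 0b10: 1, 0b11: 0}    # traffic class / flow label
ADDR_INLINE_BYTES = {0b00: 16, 0b01: 8, 0b10: 2, 0b11: 0}  # SAM/DAM with SAC/DAC=0, M=0


def iphc_header_size(b0: int, b1: int) -> int:
    """Return the compressed IPv6 header size in bytes (base + inline fields)."""
    if b0 >> 5 != 0b011:
        raise ValueError("not a LOWPAN_IPHC dispatch")
    tf, nh, hlim = (b0 >> 3) & 0b11, (b0 >> 2) & 0b1, b0 & 0b11
    cid, sac, sam = (b1 >> 7) & 1, (b1 >> 6) & 1, (b1 >> 4) & 0b11
    m, dac, dam = (b1 >> 3) & 1, (b1 >> 2) & 1, b1 & 0b11
    if cid or sac or dac or m:
        raise NotImplementedError("context-based or multicast compression not sketched")
    size = 2                         # the IPHC base encoding itself
    size += TF_INLINE_BYTES[tf]
    size += 1 if nh == 0 else 0      # NH=0: Next Header carried inline
    size += 1 if hlim == 0 else 0    # HLIM=0: Hop Limit carried inline
    size += ADDR_INLINE_BYTES[sam] + ADDR_INLINE_BYTES[dam]
    return size


# Nothing elided: as large as the uncompressed 40-byte IPv6 header.
print(iphc_header_size(0x60, 0x00))  # 40
# TF elided, Hop Limit 64, both addresses derived from link-layer addresses.
print(iphc_header_size(0x7A, 0x33))  # 3
```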
5.1.2. LPWAN WG
Since 2014, the "IPv6 over Low Power Wide-Area Networks" (LPWAN) WG
has produced 4 RFCs for low-power wide area networks, such as LoRaWAN
(https://en.wikipedia.org/wiki/LoRa), three of which are standards:
[RFC8724], [RFC8824], [RFC9011].
5.1.3. 6TISCH WG
Since 2013, the "IPv6 over the TSCH mode of IEEE 802.15.4e" (6tisch)
WG has produced 7 RFCs for a mode of 802.15.4 called "Time-Slotted
Channel Hopping" (TSCH), which supports deterministic latency and
reduces energy consumption, compared to 802.15.4 without TSCH, by
scheduling traffic into well-defined time slots.
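In TSCH, every timeslot is numbered by an Absolute Slot Number (ASN);
a node's schedule consists of cells (slot offset, channel offset)
within a repeating slotframe, and the node only needs to turn on its
radio in its scheduled slots. The following Python sketch illustrates
this timing and the channel hopping rule channel =
hopping_sequence[(ASN + channel_offset) mod sequence length] used by
IEEE 802.15.4 TSCH. The function names, slotframe length, cells and
hopping sequence are hypothetical examples.

```python
# Sketch of TSCH timing. A cell (slot_offset, channel_offset) in a
# slotframe of length L is active when ASN % L == slot_offset.

def active_cells(asn, slotframe_len, cells):
    """Return the cells (slot_offset, channel_offset) scheduled at this ASN."""
    return [cell for cell in cells if asn % slotframe_len == cell[0]]


def channel_for(asn, channel_offset, hopping_sequence):
    """Channel used by a cell at this ASN (channel hopping)."""
    return hopping_sequence[(asn + channel_offset) % len(hopping_sequence)]


def radio_duty_cycle(slotframe_len, cells):
    """Fraction of timeslots in which the node has to wake up its radio."""
    return len({slot for slot, _ in cells}) / slotframe_len


# Hypothetical schedule: two cells in a 101-slot slotframe.
cells = [(0, 0), (50, 3)]
hopping_sequence = [11, 12, 13, 14]  # hypothetical channel list
print(active_cells(151, 101, cells))          # [(50, 3)]
print(channel_for(7, 3, hopping_sequence))    # 13
print(radio_duty_cycle(101, cells))
```

The node sleeps in all other slots, so its radio duty cycle is set by
the schedule rather than by idle listening.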
5.1.4. 6LO WG
Since 2013, the "IPv6 over Networks of Resource-constrained Nodes"
(6lo) WG has generalized the work of 6lowpan to LLN in general,
producing 17 RFCs for IPv6-over-l2foo adaptation layer specifications,
information models, cross-adaptation layer specification (such as
header specifications) and maintenance and informational documents
for other pre-existing IETF work in this space.
5.1.5. ROLL WG
In the RouTinG (RTG) area of the IETF, the "Routing Over Low power
and Lossy networks" (ROLL) WG has produced 23 RFCs since 2008.
Initially, it produced requirement RFCs for different types of
"Low-power and Lossy Networks": urban [RFC5548], industrial
[RFC5673], home automation [RFC5826], and building automation
[RFC5867].
Since then, its work has mostly focused on the "IPv6 Routing Protocol
for Low-Power and Lossy Networks" (RPL) [RFC6550], which is used in a
wide variety of the IPv6 LLN instances described above, some of which
are discussed in two ROLL applicability statement RFCs,
"Applicability Statement: The Use of the Routing Protocol for Low-
Power and Lossy Networks (RPL) Protocol Suite in Home Automation and
Building Control" [RFC7733] and "Applicability Statement for the
Routing Protocol for Low-Power and Lossy Networks (RPL) in Advanced
Metering Infrastructure (AMI) Networks" [RFC8036].
The ROLL WG also wrote a more generic RFC for LLN, "Terms Used in
Routing for Low-Power and Lossy Networks" [RFC7102]. RPL has a
highly configurable set of functions to support (energy) constrained
networks. Unconstrained root node(s), typically edge routers between
the RPL network and a backbone network, calculate
"Destination-Oriented Directed Acyclic Graphs" (DODAG) and can use
strict hop-by-hop source routing with dedicated IPv6 routing headers
[RFC9008] to minimize the routing-related compute and memory
requirements of constrained nodes. "The Trickle Algorithm" [RFC6206]
minimizes routing-related packets through automatic lazy updates.
While RPL is naturally a mesh network routing protocol, in which all
nodes are usually expected to participate, RPL also supports even
more lightweight leaf nodes [RFC9010].
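The "lazy update" behavior of Trickle can be sketched as a small state
machine implementing the six rules of [RFC6206]: intervals double
while the network is consistent, transmissions are suppressed once k
consistent messages have been heard, and an inconsistency shrinks the
interval back to Imin. The following Python class is a minimal
illustration; the class and method names are hypothetical, and
scheduling the callbacks is left to the caller.

```python
import random


class TrickleTimer:
    """Minimal Trickle state machine following the rules of RFC 6206.

    The caller is responsible for invoking fire_t() at time self.t within the
    current interval and interval_expired() at the end of the interval.
    """

    def __init__(self, imin, imax_doublings, k, rng=None):
        self.imin = imin
        self.imax = imin * 2 ** imax_doublings  # largest interval size
        self.k = k                              # redundancy constant
        self.rng = rng or random.Random()
        self.interval = imin                    # rule 1 allows any value in [Imin, Imax]
        self._start_interval()

    def _start_interval(self):
        # Rule 2: reset the counter and pick t uniformly in [I/2, I).
        self.c = 0
        self.t = self.rng.uniform(self.interval / 2, self.interval)

    def hear_consistent(self):
        # Rule 3: count consistent transmissions heard in this interval.
        self.c += 1

    def fire_t(self):
        # Rule 4: transmit only if fewer than k consistent messages were heard.
        return self.c < self.k

    def interval_expired(self):
        # Rule 5: double the interval, capped at Imax, and start a new one.
        self.interval = min(2 * self.interval, self.imax)
        self._start_interval()

    def hear_inconsistent(self):
        # Rule 6: on inconsistency, shrink back to Imin (if not already there).
        if self.interval > self.imin:
            self.interval = self.imin
            self._start_interval()


timer = TrickleTimer(imin=1.0, imax_doublings=3, k=1, rng=random.Random(1))
sizes = []
for _ in range(5):
    timer.interval_expired()
    sizes.append(timer.interval)
print(sizes)  # [2.0, 4.0, 8.0, 8.0, 8.0]
```

In a stable network, transmissions therefore become exponentially
rarer and are further suppressed when neighbors have already sent the
same information, which is where the energy saving comes from.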
The 2013 [I-D.ajunior-energy-awareness-00] proposes introducing
energy-related parameters into RPL to support the calculation and
selection of the most energy-efficient paths. The 2017 "An energy
optimization routing scheme for LLSs"
[I-D.wang-roll-energy-optimization-scheme] observed that DODAGs in
RPL tend to require more energy in nodes closer to the root and
proposed specific optimizations to reduce this problem. Neither of
these drafts proceeded in the IETF.
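The idea of selecting the most energy-efficient path rather than the
shortest one can be sketched with a standard shortest-path (Dijkstra)
computation over a per-link energy cost. This is a centralized Python
illustration, not how RPL objective functions are specified; the
function name, the topology and the "energy" costs (for example
reflecting a battery-depleted node A) are hypothetical.

```python
import heapq


def cheapest_path(graph, src, dst, weight):
    """Dijkstra over graph = {node: {neighbor: attrs}} with weight(attrs) >= 0."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, attrs in graph.get(u, {}).items():
            nd = d + weight(attrs)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Hypothetical topology: leaf L can reach root R via A (short, but A is
# battery-depleted and therefore expensive) or via B and C (one hop longer).
graph = {
    "L": {"A": {"energy": 1.0}, "B": {"energy": 1.0}},
    "A": {"R": {"energy": 10.0}},
    "B": {"C": {"energy": 1.0}},
    "C": {"R": {"energy": 1.0}},
}
print(cheapest_path(graph, "L", "R", lambda a: 1))            # (['L', 'A', 'R'], 2.0)
print(cheapest_path(graph, "L", "R", lambda a: a["energy"]))  # (['L', 'B', 'C', 'R'], 3.0)
```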
While the original use-cases for RPL were energy- and size-limited
networks, its design is to a large extent not scale-limited. Because
of this, and due to its reduced compute/memory requirements for
networks of the same size compared to other routing protocols,
especially the so-called link-state "Interior Gateway Protocols"
(IGP), such as the most commonly used IS-IS [RFC1142] and OSPF
[RFC2328], RPL has also proliferated into use-cases for
non-constrained networks, for example to automatically support the
largest possible networks, such as in [RFC8994].
5.2. Constrained Nodes and Networks
(Power) constrained nodes and/or networks exist in a much broader
variety than those coupled with low-power and lossy networks. For
example, WiFi and mobile network connections are not considered to be
lossy networks, and personal mobile nodes with either connection are
orders of magnitude less constrained than nodes typically attached to
LLN networks. Therefore, broader IETF work not focused primarily on
LLN typically just uses the terms lightweight or constrained (nodes
and networks).
5.2.1. LWIG WG
Since 2013, the "Light-Weight Implementation Guidance" (lwig) WG has
produced 6 informational RFCs on the group's subject, much of which
indirectly supports power-efficient network implementations via
lightweight nodes/links, but it also addressed the topic explicitly,
including via the aforementioned [RFC8352] and [RFC9178], "Building
Power-Efficient Constrained Application Protocol (CoAP) Devices for
Cellular Networks".
5.2.2. CoRE and CoAP
In the APPlication (APP) area of the IETF, the "Constrained RESTful
Environments" (core) WG has produced 21 RFCs since 2010, most of them
for or related to "The Constrained Application Protocol" (CoAP)
[RFC7252], which can best be described as a replacement for HTTP for
constrained environments, using UDP instead of TCP and DTLS instead of
TLS, compact binary message formats instead of human readable textual
formats, RESTful message exchange semantic instead of a broader set
of options (in HTTP), but also more functionality such as (multicast)
discovery and directory services, therefore providing a more
comprehensive set of common application functions with more compact
on-the-wire/radio encoding than its unconstrained alternatives.
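To make the compactness argument concrete, the following Python sketch
encodes a CoAP request with the 4-byte fixed header (version, type,
token length, code, message ID) followed by Uri-Path options, as
defined in the CoAP specification (RFC 7252). It is a simplified
sketch: the function name is hypothetical, no payload is supported,
and extended option delta/length encodings are not implemented.

```python
import struct

COAP_VERSION = 1
MESSAGE_TYPES = {"CON": 0, "NON": 1, "ACK": 2, "RST": 3}
CODE_GET = 0x01      # code 0.01
OPT_URI_PATH = 11    # Uri-Path option number


def encode_coap(msg_type, code, message_id, uri_path=(), token=b""):
    """Encode a CoAP message without payload: fixed header, token, Uri-Path options.

    Simplification: only option deltas and lengths below 13 are supported,
    i.e. no extended option encodings.
    """
    if len(token) > 8:
        raise ValueError("token length must be 0..8")
    first = (COAP_VERSION << 6) | (MESSAGE_TYPES[msg_type] << 4) | len(token)
    out = bytearray(struct.pack("!BBH", first, code, message_id))
    out += token
    previous_option = 0
    for segment in uri_path:
        value = segment.encode("utf-8")
        delta = OPT_URI_PATH - previous_option  # 0 for repeated Uri-Path options
        if delta >= 13 or len(value) >= 13:
            raise NotImplementedError("extended option encoding not sketched")
        out.append((delta << 4) | len(value))
        out += value
        previous_option = OPT_URI_PATH
    return bytes(out)


request = encode_coap("CON", CODE_GET, 0x1234, uri_path=["temp"])
print(request.hex(), len(request))  # 40011234b474656d70 9
```

The complete confirmable GET request for /temp takes 9 bytes, whereas
the HTTP/1.1 request line "GET /temp HTTP/1.1" with its CRLF alone
takes 20 bytes, before any header fields.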
"Object Security for Constrained RESTful Environments" (OSCORE)
[RFC8613] is a further product of the CoRE WG, providing a more
lightweight, message-layer-based security alternative to DTLS.
While originally designed for LLN, CoAP is transcending LLN and
equally becoming a standard in unconstrained environments such as
wired/Ethernet industrial Machine-to-Machine (M2M) communications,
because of its simplicity and flexibility, and because it allows
relying on a single set of protocols across the widest range of
deployment scenarios.
In the SECurity (SEC) area of the IETF, the "Authentication and
Authorization for Constrained Environments" (ace) working group has
produced 4 RFCs since 2014 for security functions in constrained
environments, for example CoAP-based variants of prior HTTPS-based
protocols, such as EST-coaps [RFC9148] for the HTTPS-based EST
[RFC7030].
Constrained node support in cryptography especially entails support
for Elliptic Curve (EC) public keys, due to their shorter key sizes
and lower compute requirements compared to RSA public keys of the same
cryptographic strength. While the benefits of EC over RSA were
already making them preferred, this "additional market space"
(constrained node) benefit helped their faster market proliferation
even beyond constrained networks.
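The key-size argument can be quantified with the comparable-strength
values published by NIST in SP 800-57 Part 1. The following Python
snippet merely tabulates them (using the lower bound of each ECC
range) and computes how many times larger an RSA modulus is than an
EC key of the same strength; the table and function names are
illustrative.

```python
# Comparable key sizes in bits for a given security strength, following the
# NIST SP 800-57 Part 1 comparison (ECC column: lower bound of each range).
COMPARABLE_KEY_BITS = {
    # security strength: (RSA modulus bits, ECC key bits)
    112: (2048, 224),
    128: (3072, 256),
    192: (7680, 384),
    256: (15360, 512),
}


def rsa_to_ecc_ratio(strength_bits):
    """How many times larger the RSA modulus is than the EC key."""
    rsa_bits, ecc_bits = COMPARABLE_KEY_BITS[strength_bits]
    return rsa_bits / ecc_bits


for strength in sorted(COMPARABLE_KEY_BITS):
    print(strength, round(rsa_to_ecc_ratio(strength), 1))
```

The gap widens with increasing strength, which matters for constrained
nodes that must store, transmit and compute with these keys.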
5.2.3. Satellite Constellations
Emerging communication infrastructures may have specific requirements
on power consumption. Such requirements should be taken into account
when designing/customizing techniques (e.g., routing) to be enabled
in such networks. For example,
[I-D.lhan-problems-requirements-satellite-net] identifies a set of
requirements (including power) for satellite constellations.
5.2.4. Devices with Batteries
Many IETF protocols (e.g., [RFC3948]) were designed to accommodate
the presence of middleboxes mainly by encouraging clients to issue
frequent keepalives. Such a strategy has implications for
battery-supplied devices. In order to optimize battery consumption
for such devices, [RFC6887] specifies a deterministic method so that
clients can control state in the network, including its lifetime.
Keepalive messages may thus be optimized as a function of the
network policies.
A_REC#2 of [RFC7849] further insists on the importance of saving
battery, which is drained by keepalive messages, and recommends
supporting collaborative means to control state in the network
rather than relying on heuristics.
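The battery impact of keepalives scales inversely with the keepalive
interval, which is why knowing the actual state lifetime in the
network helps. The following Python sketch shows this arithmetic; the
function names, the intervals and the per-keepalive energy are
hypothetical values chosen for illustration only.

```python
SECONDS_PER_DAY = 86_400


def keepalives_per_day(interval_s):
    """Number of keepalive wake-ups per day for a fixed sending interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return SECONDS_PER_DAY / interval_s


def daily_keepalive_energy_j(interval_s, energy_per_keepalive_j):
    """Energy per day spent only on waking up and sending keepalives."""
    return keepalives_per_day(interval_s) * energy_per_keepalive_j


# Hypothetical numbers: a short, conservative heuristic interval versus a
# longer interval that a client could use once it knows the actual state
# lifetime in the network (e.g. learned through an explicit protocol).
heuristic_j = daily_keepalive_energy_j(20, 0.5)
informed_j = daily_keepalive_energy_j(1800, 0.5)
print(heuristic_j, informed_j)  # 2160.0 24.0
```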
5.3. Sample Technical Enablers
5.3.1. (IP) Multicast
5.3.1.1. Power Saving with Multicast
IP Multicast was introduced with [RFC1112]; today also called "Any
Source Multicast" (ASM), it has various protocols standardized in the
IETF across multiple working groups. There are also MPLS and BIER
multicast protocols from the IETF, developed in the equally named WGs.
These three network-layer multicast technologies can be power saving
technologies when used to distribute data, because they reduce the
number of packets that need to be sent across the network (through
in-network replication where needed). Because most current link and
router technologies do not allow significant amounts of energy to be
saved at lower than maximum utilization, these benefits are often
only theoretical, though. Software routers are the most likely to
exhibit energy consumption that is somewhat proportional to their
throughput, at least for the forwarding (CPU) chip.
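The packet reduction can be illustrated by counting link
transmissions for one data packet on a distribution tree: with
unicast (ingress replication) every receiver's copy crosses every link
on its path, while with multicast each tree link carries one copy. The
following Python sketch performs this count; the function names and
the tree are hypothetical.

```python
def path_to_source(node, parent):
    """Links (child, parent) traversed from a receiver up to the tree root."""
    links = []
    while node in parent:
        links.append((node, parent[node]))
        node = parent[node]
    return links


def link_transmissions(receivers, parent):
    """(unicast, multicast) number of link transmissions for one data packet."""
    paths = [path_to_source(r, parent) for r in receivers]
    unicast = sum(len(p) for p in paths)                  # one copy per receiver per link
    multicast = len({link for p in paths for link in p})  # one copy per tree link
    return unicast, multicast


# Hypothetical distribution tree: source S -> router R1 -> receivers A, B, C
parent = {"R1": "S", "A": "R1", "B": "R1", "C": "R1"}
print(link_transmissions(["A", "B", "C"], parent))  # (6, 4)
```

As noted above, fewer transmitted packets translate into energy
savings only to the extent that links and routers consume power in
proportion to their load.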
Likewise, in large backbone networks, IP multicast can free up
bandwidth for other traffic, such as unicast traffic, which may allow
avoiding upgrades to faster and potentially more power-consuming
routers/links. Today, these benefits too are most often outweighed
by the lower per-bit energy consumption of newer generations of
routers and links, though.
Multicasting can also save energy on the transmitting station across
radio links, compared to replicated unicast traffic, but this is
rarely significant because, except in fully battery-powered mesh
networks, there are typically non-energy-constrained nodes, such as
(commonly) the wired access points in WiFi networks.
As a result, multicasting today typically has no significant power
saving benefits with available network technologies. Instead, it is
used (for data distribution) when a unicast alternative (with
so-called ingress replication) is not feasible due to the total
amount of traffic it would generate. This includes wireless/radio
networks, where airtime is likewise the limiting factor.
5.3.1.2. Power Waste Through Multicast-based Service Coordination
(IP) multicast is often used not only to distribute data requested by
receivers, but also for coordination-type functions such as service
or resource announcement, discovery, or selection. These multicast
messages may not carry a lot of data, but they cause recurring, often
periodic packets to be sent across a domain and waste energy because
of various ill-advised designs, including, but not limited to, the
following issues:
(a) The receivers of such packets may not even need to receive them,
but the protocol shares a multicast group with another protocol that
the client does need to receive.
(b) The receiver should not need to receive the packet as far as
multicast is concerned, but the underlying link-layer technology
still makes the receiver consume the packet at link-layer.
(c) The information received is not new, but just periodically
refreshed.
(d) The packet was originated for a service selection by a client,
and the receiving device is even responding, but the client then
chooses to select another device for the service/resource.
These problems are particularly severe in the presence of so-called
"sleepy" nodes (Section 5.3.2) that need to wake up to receive such
packets (unnecessarily). It is worse when the network itself is an
LLN, in which the forwarders themselves are power constrained and,
for example, periodic multicasting of such coordination packets
wastes energy on those forwarders as well, compared to better
alternatives.
In 2006, the IETF standardized "Source Specific Multicast" (SSM)
[RFC4607], a variation of IP Multicast that cannot be used for these
types of coordination functions but is only meant for (and usable
for) actual data distribution. SSM was introduced for reasons other
than the power-related issues described above, but deprecating the
use of ASM is one way to avoid/minimize its ill-advised use for these
types of coordination functions when energy efficiency is an issue.
[RFC8815] is an example of deprecating ASM for other reasons in
Service Provider networks.
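The reason SSM does not lend itself to these coordination functions is
that an SSM receiver subscribes to a channel (S,G) and therefore must
already know the source S; a newly appearing, unknown device cannot
reach it. The following Python sketch models receiver-side membership
for ASM (*,G) and SSM (S,G). The class is hypothetical, and only IPv4
is handled, using the IPv4 SSM range 232.0.0.0/8.

```python
import ipaddress

SSM_RANGE_V4 = ipaddress.ip_network("232.0.0.0/8")


class ReceiverMembership:
    """Hypothetical receiver-side view of ASM (*,G) and SSM (S,G) subscriptions."""

    def __init__(self):
        self.asm_groups = set()    # (*,G): traffic from any source is accepted
        self.ssm_channels = set()  # (S,G): only the named source is accepted

    def join_asm(self, group):
        self.asm_groups.add(ipaddress.ip_address(group))

    def join_ssm(self, source, group):
        g = ipaddress.ip_address(group)
        if g not in SSM_RANGE_V4:
            raise ValueError(f"{group} is not in the IPv4 SSM range")
        self.ssm_channels.add((ipaddress.ip_address(source), g))

    def accepts(self, source, group):
        s, g = ipaddress.ip_address(source), ipaddress.ip_address(group)
        return g in self.asm_groups or (s, g) in self.ssm_channels


m = ReceiverMembership()
m.join_ssm("192.0.2.10", "232.1.1.1")
print(m.accepts("192.0.2.10", "232.1.1.1"))  # True: the subscribed source
print(m.accepts("192.0.2.99", "232.1.1.1"))  # False: an unknown new source
```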
5.3.1.3. Multicast Problems in Wireless Networks
[RFC9119] covers multicast challenges and solutions (proposals) for
IP Multicast over Wi-Fi. With respect to power consumption, it
discusses the following aspects:
(a) Unnecessary wake-up of power constrained Wi-Fi Stations (STA)
nodes can be minimized by wireless Access Points (APs) that buffer
multicast packets so they are sent only periodically when those nodes
wake up.
(b) WiFi access points with "Multiple Input Multiple Output" (MIMO)
antenna diversity can focus transmitted (unicast) packets so that they
are not "broadcast" to all receivers within a particular maximum
distance from the AP; because multicast cannot benefit from this, WiFi
multicast transmission becomes even less desirable.
(c) It lists the most widely deployed protocols using aforementioned
coordination via IP multicast and describes their specific challenges
and possible improvements.
(d) Existing proprietary conversion of WiFi multicast to Wi-Fi
unicast packets.
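Item (a) can be illustrated with a toy model: without buffering, a
power-saving station has to be awake for every multicast frame, while
with AP buffering it only needs to wake up after those delivery
beacons (DTIM beacons in IEEE 802.11) that announce buffered traffic.
The following Python sketch counts data wake-ups in both cases; the
function names, arrival times and DTIM period are hypothetical, and
the model ignores the reception of beacons that carry no buffered
traffic.

```python
import math


def wakeups_without_buffering(arrival_times_s):
    """Station has to receive every multicast frame when it is sent."""
    return len(arrival_times_s)


def wakeups_with_buffering(arrival_times_s, dtim_period_s):
    """AP holds frames until the next DTIM beacon; one wake-up per such beacon."""
    return len({math.ceil(t / dtim_period_s) for t in arrival_times_s})


arrivals = [0.05, 0.12, 0.31, 0.33, 0.34]  # hypothetical multicast arrival times (s)
print(wakeups_without_buffering(arrivals))    # 5
print(wakeups_with_buffering(arrivals, 0.3))  # 2
```

The trade-off is latency: buffered frames can be delayed by up to one
DTIM period.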
[I-D.desmouceaux-ipv6-mcast-wifi-power-usage] focuses on IPv6-related
concerns of multicast traffic in large wireless networks. This
document provides a set of statistics and the induced device power
consumption of such flows.
5.3.2. Sleepy Nodes
Sleepy nodes are one of the most common design solutions in support
of power saving. This includes LLN-level constrained nodes, but also
nodes with significant battery capacity, such as mobile phones,
tablets, and notebooks, because battery lifetime has long been a key
selling factor. As a result, vendors attempt to optimize power
consumption across all hardware and software components of such
nodes, including the interface hardware and protocols used across the
nodes' WiFi and mobile radios.
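Why sleeping matters so much can be seen from the usual duty-cycle
estimate: the average current is the time-weighted mix of active and
sleep current, and battery lifetime is capacity divided by that
average. The following Python sketch applies this idealized formula;
the function names and the battery and current values are
hypothetical.

```python
def average_current_ma(duty_cycle, active_ma, sleep_ma):
    """Time-weighted average current of a node that is awake duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be within [0, 1]")
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def battery_lifetime_h(capacity_mah, duty_cycle, active_ma, sleep_ma):
    """Idealized lifetime: ignores self-discharge, temperature and voltage effects."""
    return capacity_mah / average_current_ma(duty_cycle, active_ma, sleep_ma)


# Hypothetical node: 2000 mAh battery, 20 mA awake, 0.01 mA asleep.
always_on = battery_lifetime_h(2000, 1.0, 20, 0.01)   # 100 hours
one_percent = battery_lifetime_h(2000, 0.01, 20, 0.01)
print(round(always_on), round(one_percent))
```

With these assumed values, sleeping 99% of the time extends lifetime
by almost two orders of magnitude, which is why protocols that wake
sleepy nodes unnecessarily are so costly.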
Restating from [I-D.bormann-core-roadmap-05]: CoAP has basic support
for sleepy nodes by allowing caching of resource information in
(non-sleepy) proxy nodes. [RFC7641] enhances this support by enabling
sleepy nodes to update caching intermediaries on their own schedule.
Around 2012/2013, there was significant further work on support for
sleepy nodes in CoAP, resulting in a long list of drafts whose
sleepy-node benefits are discussed in [I-D.bormann-core-roadmap-05]:
[I-D.vial-core-mirror-server], [I-D.vial-core-mirror-proxy],
[I-D.fossati-core-publish-option], [I-D.giacomin-core-sleepy-option],
[I-D.castellani-core-alive],
[I-D.rahman-core-sleepy-problem-statement], [I-D.rahman-core-sleepy],
[I-D.rahman-core-sleepy-nodes-do-we-need],
[I-D.fossati-core-monitor-option]. None of these drafts proceeded,
though.
One partial solution to some sleepy-node issues related to their
energy consumption, especially those caused by the use of multicast
(Section 5.3.1.2, Section 5.3.1.3), is the use of the "Constrained
RESTful Environments (CoRE) Resource Directory" (CoRE-RD) [RFC9176].
It allows sleepy nodes to register their resources, and other nodes to
discover them, via unicast, and avoids waking up sleepy nodes when
they are not selected by a resource consumer.
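The energy benefit of a directory comes from moving discovery off the
sleepy nodes: a node registers once via unicast while awake, and
consumers query the directory instead of multicasting a request that
would wake every node. The following Python class is a hypothetical
in-memory stand-in that only illustrates this pattern; it does not
reproduce the registration or lookup interfaces of [RFC9176].

```python
class ToyResourceDirectory:
    """In-memory stand-in for a resource directory (hypothetical API).

    Nodes register once (via unicast) while awake; consumers query the
    directory instead of sending a multicast discovery request that would
    wake every sleepy node.
    """

    def __init__(self):
        self._entries = {}  # endpoint name -> list of (path, resource type)

    def register(self, endpoint, links):
        # Re-registration replaces the previous entry for this endpoint.
        self._entries[endpoint] = list(links)

    def lookup(self, resource_type):
        """Return sorted (endpoint, path) pairs offering the resource type."""
        return sorted(
            (ep, path)
            for ep, links in self._entries.items()
            for path, rt in links
            if rt == resource_type
        )


rd = ToyResourceDirectory()
rd.register("sensor-1", [("/temp", "temperature"), ("/hum", "humidity")])
rd.register("sensor-2", [("/temp", "temperature")])
print(rd.lookup("temperature"))  # [('sensor-1', '/temp'), ('sensor-2', '/temp')]
```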
A partial alternative to CoRE-RD is "DNS-Based Service Discovery"
(DNS-SD) [RFC6763], combined with, for example, the "Service
Registration Protocol for DNS-Based Service Discovery"
[I-D.ietf-dnssd-srp]. Services can be seen as a subset of resources,
and in networks where DNS has to be supported anyhow for other
reasons, DNS-SD may be a sufficient alternative to CoRE-RD. It is
used for this purpose, for example, in Thread
(https://en.wikipedia.org/wiki/Thread_(network_protocol)), where the
only multicast-based coordination is the one used to establish
network-wide parameters, such as the address(es) of the DNS-SD
server(s).
"Building Power-Efficient Constrained Application Protocol (CoAP)
Devices for Cellular Networks" [RFC9178] discusses sleepy devices,
especially the use of CoAP PubSub [I-D.ietf-core-coap-pubsub] as a
mechanism to build proxies for sleepy devices. Standardized proxy
infrastructures are best built with standard data models, such as
"Sensor Measurement Lists" (SenML) [RFC8428] for sensors, which are
likely the largest group of sleepy devices, especially in LLN.
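As an illustration of such a standard data model, a SenML pack uses
base fields (bn, bt, bu) that apply to all following records until
redefined, so a proxy can store compact packs and resolve them into
self-contained records on demand. The following Python sketch
resolves only the bn/bt/bu and n/t/u/v fields of [RFC8428] and
ignores all others; the function name and sample values are
hypothetical.

```python
import json


def resolve_senml(pack_json):
    """Resolve a SenML JSON pack into self-contained records (partial sketch).

    Handles only bn/bt/bu and n/t/u/v; other fields are ignored.
    Base values apply to all following records until redefined.
    """
    base_name, base_time, base_unit = "", 0, None
    resolved = []
    for rec in json.loads(pack_json):
        base_name = rec.get("bn", base_name)
        base_time = rec.get("bt", base_time)
        base_unit = rec.get("bu", base_unit)
        out = {"n": base_name + rec.get("n", ""), "t": base_time + rec.get("t", 0)}
        unit = rec.get("u", base_unit)
        if unit is not None:
            out["u"] = unit
        if "v" in rec:
            out["v"] = rec["v"]
        resolved.append(out)
    return resolved


pack = '''[
  {"bn": "urn:dev:ow:10e2073a01080063:", "bt": 1276020076, "bu": "Cel",
   "n": "temp", "v": 23.5},
  {"n": "temp", "t": 60, "v": 23.6}
]'''
for record in resolve_senml(pack):
    print(record)
```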
"Reducing Energy Consumption of Router Advertisements" [RFC7772]
eliminates/reduces the energy impact of the ubiquitous IPv6 "Neighbor
Discovery" (ND) protocol on sleepy nodes by recommending that
multicast "Router Advertisement" (RA) messages be replaced with
so-called directed unicast versions, thereby not waking up sleepy
nodes (with an IP multicast RA message). This was already allowed in
ND [RFC4861], but not recommended as the default. Note that
[RFC7772] does not provide all the energy-related optimizations of ND
developed by 6LoWPAN through [RFC6775].
[I-D.chakrabarti-nordmark-energy-aware-nd] proposes generalizing
those optimizations to all IPv6 links, but has not been further
pursued by the IETF so far.
5.4. (Lack of) Power Benchmarking Proposals
[I-D.petrescu-v6ops-ipv6-power-ipv4] presented some measurement
results of the power consumption when using IPv6 vs. IPv4, with a
focus on mobile devices. Such measurements are not backed by formal
benchmarking methodologies that would establish solid and reliable
references for comparing and interpreting data.
https://www.ietf.org/proceedings/103/slides/slides-103-saag-iot-benchmarking-00
presented a benchmark example, but with a focus on the power cost of
encryption.
6. Energy Management Networks
Use of IETF protocols in networks that manage power consumption and
production is another broad area of digitization.
6.1. Smart Grid
"Smart Grid" is the most well-known instance of such energy
management networks. According to
https://en.wikipedia.org/wiki/Smart_grid, the term covers aspects
mostly centered around intelligent, measured, and controlled
consumption of energy. This
includes "Advanced Metering Infrastructure" / "Smart Meters", remote
controllable "distribution boards", "circuit breakers", "load
control" and "smart appliances". Use cases for the "Smart Grid"
include for example timed and measured operations of home devices
such as washers or charging cars, when energy consumption is below
average.
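Such timed operation can be sketched as choosing the start time of an
appliance run that minimizes the summed price (or, equally, carbon
intensity) over its duration. The following Python sketch uses a
sliding window over an hourly forecast; the function name and the
forecast values are hypothetical.

```python
def best_start_hour(hourly_price, duration_h):
    """Return (start_index, total_cost) of the cheapest contiguous window."""
    if not 0 < duration_h <= len(hourly_price):
        raise ValueError("duration must fit into the price horizon")
    window = sum(hourly_price[:duration_h])
    best_start, best_cost = 0, window
    for start in range(1, len(hourly_price) - duration_h + 1):
        # Slide the window by one hour: add the new hour, drop the oldest one.
        window += hourly_price[start + duration_h - 1] - hourly_price[start - 1]
        if window < best_cost:
            best_start, best_cost = start, window
    return best_start, best_cost


# Hypothetical hourly price forecast (arbitrary units) and a 3-hour washer run.
forecast = [30, 28, 25, 12, 10, 11, 27, 35]
print(best_start_hour(forecast, 3))  # (3, 33)
```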
The 2011 "Internet Protocols for the Smart Grid" [RFC6272] is a quite
comprehensive (66 page) overview of all IETF protocols considered to
be necessary or beneficial for Smart Grid networks. This document
was written in response to interest by the (not-yet-smart grid)
community in utilizing the IETF TCP/IP technologies to evolve
previously non-TCP/IP networks, and in view of the risk that
unnecessary reinvention of the wheel/protocols would be done by that
community instead of reusing what was already well specified by the
IETF.
Most of the overview in this document is not specific to networks
used for Smart Grid applications, but is just summarized in the
document for the above-described outreach and education of the
community. The aspect most specific to Smart Grids is the adaptation
of IPv6 network technologies to LLN networks (see Section 5.1), which
back in 2011 was still somewhat in its infancy: smart meters, circuit
breakers, load measurement devices, car chargers and so on would all
most likely be connected to the network via low-power radio networks,
which ideally would utilize IPv6 directly. Support for LLN networks
with IPv6 has improved considerably in IETF specifications over the
past decade.
6.2. Syncro Phasor Networks
Power output of multiple power plants/generators into the same power
grid needs to be synchronized, both in power levels based on
consumption and in phase (of the 50/60 Hz AC, depending on the
continent), to avoid energy generated out of phase, which is not only
wasted but can actually burn out power lines or permanently damage
power generators. When generators go out of sync, they have to be
emergency switched off, resulting in (rolling) blackouts that worsen
the conditions well beyond their likely root cause, such as a single
overloaded region.
Syncro Phasor Networks are networks whose goal is to support
synchronization of power generators across a power grid, ultimately
also permitting larger and more resilient power grids to be built.
"Phasor Measurement Units" (PMU) are their core sensing elements.
Since about 2012? these networks have started to move from
traditional SCADA towards more TCP/IP-based networking and
application technologies "to improve power system reliability and
visibility through wide area measurement and control, by fostering
the use and capabilities of synchrophasor technology"
(www.naspi.org).
With their fast control loop reaction times and measurement
requirements, they also benefit from reliable, fast propagation of
PMU data as well as stricter clock synchronization than most Smart
Grid applications. For example, transmission lines expand under heat
that is caused by electrical load and/or environmental temperature by
as much as 30% (between the coldest and the hottest or highest-load
times), impacting the necessary phase relationship of power
generation on either end (because the speed-of-light propagation
delay depends on the effective length of the contracted/expanded
wire).
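As a rough illustration of why wire length matters, the following
Python sketch converts a line length into a propagation delay and then
into a phase offset of the 50/60 Hz AC waveform (phase = delay *
frequency * 360 degrees). It is a simplified model: the function
names are hypothetical, and it assumes a propagation velocity equal to
(or a given fraction of) the speed of light in vacuum.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def propagation_delay_s(length_m, velocity_factor=1.0):
    """One-way delay along the line; velocity_factor is an assumption (<= 1)."""
    return length_m / (SPEED_OF_LIGHT_M_S * velocity_factor)


def phase_shift_deg(delay_s, grid_hz):
    """Fraction of one AC cycle elapsed during the delay, in degrees."""
    return (delay_s * grid_hz * 360.0) % 360.0


delay = propagation_delay_s(300_000)  # hypothetical 300 km line
print(round(delay * 1000, 3), round(phase_shift_deg(delay, 50), 2))
```

Under these assumptions, 300 km correspond to about 1 ms of delay,
i.e. roughly 18 degrees at 50 Hz, and any change of the effective
length changes this offset proportionally.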
The length of transmission wires can be measured by sending data
across the transmission lines and measuring its propagation latency
with the help of accurate clock synchronization between sender and
receiver(s), using, for example, network-based clock synchronization
protocols. The IETF "Network Time Protocol version 4" (NTPv4)
[RFC5905] is one option for this. The IEEE PTP protocol is often
preferred, though, because it specifies better how measurements can
be integrated at the hardware level of Ethernet interfaces, making it
easier to achieve higher accuracy, such as a Maximum Time Interval
Error (MTIE) of less than 1 msec. See for example [NASPICLOCK].
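Both NTP and PTP derive clock offset and path delay from an exchange
of four timestamps. The following Python sketch shows the on-wire
calculation used by NTP [RFC5905]; the function name and the
timestamps in the example are hypothetical.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from one NTP-style exchange.

    t1: request sent (client clock)     t2: request received (server clock)
    t3: response sent (server clock)    t4: response received (client clock)
    Assumes symmetric one-way delays; any asymmetry shows up as offset error.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical exchange: server clock 5 ms ahead, 10 ms each way, 1 ms processing.
offset, delay = ntp_offset_and_delay(0.000, 0.015, 0.016, 0.021)
print(round(offset, 6), round(delay, 6))  # 0.005 0.02
```

The symmetry assumption is why hardware timestamping close to the
interface, as emphasized for PTP, improves accuracy: it removes
variable, asymmetric software delays from the measurement.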
The "North American SynchroPhasor Initiative" (NASPI),
https://www.naspi.org, is an example of an organization in support of
synchrophasor networking. It is an ongoing project of the US
"Department of Energy" (DoE).
7. (Limited) Energy Management for Networks
7.1. Some Metrics
A 2010-2013 draft [I-D.manral-bmwg-power-usage], which was not
adopted, discussed and proposed metrics for power consumption that
were intended to be used for benchmarking.
The later work in Section 7.2 instead referred to metrics from other
SDOs for measuring power consumption.
A 2011-2012 draft [I-D.jennings-energy-pricing], which was not
adopted, discusses and proposes a data model to communicate the
time-varying cost of energy, in support of time-shifting the power
consumption of network-attached or managed equipment.
7.2. EMAN WG
While the IETF did specify a few MIBs with aspects related to power
management, it was only with the formation of the "Energy Management"
(EMAN) WG, which ran from 2010 to 2015 and released 7 RFCs, that the
IETF produced a comprehensive set of MIB-based standards for managing
energy/power for network equipment and associated devices, and
integrated previously scattered power-management-related work in the
IETF.
EMAN produced (solely) a set of data/information models (MIBs). It
does not introduce any new protocol/stacks nor does it address
"questions regarding Smart Grid, electricity producers, and
distributors" (from [RFC7603]).
[I-D.claise-power-management-arch] describes the initial EMAN
architecture as envisioned by some of the core contributors to the
WG. It was rewritten in EMAN as the "Energy Management Framework"
[RFC7326]. "Requirements for Energy Management" are defined in
[RFC6988].
According to [RFC7326], "the (EMAN) framework presents a physical
reference model and information model. The information model
consists of an Energy Management Domain as a set of Energy Objects.
Each Energy Object can be attributed with identity, classification,
and context. Energy Objects can be monitored and controlled with
respect to power, Power State, energy, demand, Power Attributes, and
battery. Additionally, the framework models relationships and
capabilities between Energy Objects."
One category of use-cases of particular interest to network equipment
vendors was and is the management of "Power over Ethernet" (PoE) via
the EMAN framework, measuring and controlling Ethernet-connected
devices through their PoE-supplied power. Besides industrial
equipment, surveillance cameras, and office equipment such as WiFi
access points and phones, PoE is also positioned as a new approach for
replacing most in-building automation components, including security
control for doors/windows, as well as environmental controls and
lighting, through the use of an in-ceiling, PoE-enabled IP/Ethernet
infrastructure.
EMAN produced version 4 of the "Entity MIB" (ENTITY-MIB) [RFC6933],
primarily to introduce globally unique UUIDs for physical entities,
which allow better linking across different entities, such as a PoE
port on an Ethernet switch and the device connected to that switch
port.
The "Monitoring and Control MIB for Power and Energy" [RFC7460]
specifies a MIB for monitoring the Power State and energy consumption
of networked devices. The document discusses the link with other
MIBs, such as the ENTITY-MIB; the ENTITY-SENSOR-MIB [RFC3433], for
which it amends missing accuracy information to meet IEC power
monitoring requirements; the "Power Ethernet MIB"
(POWER-ETHERNET-MIB) [RFC3621] to manage PoE; and the pre-existing
IETF MIB for Uninterruptible Power Supplies (UPS) (UPS-MIB)
[RFC1628], allowing, for example, the building of control systems
that manage shutdowns of devices in case of power failure based on
UPS battery capacity and device consumption/priorities. Similarly,
the EMAN "Definition of Managed Objects for Battery Monitoring"
[RFC7577] defines objects to support battery monitoring in managed
devices.
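A control system of the kind described above could, for example,
decide which devices to keep powered from the remaining UPS battery
based on device consumption and priority. The following Python
sketch is a hypothetical greedy policy, not something specified by
the EMAN MIBs: it keeps devices in priority order as long as the total
load still allows the battery to last for a required runtime. Device
names, power values and priorities are invented for illustration, and
battery efficiency and discharge curves are ignored.

```python
def shed_loads(devices, battery_wh, required_runtime_h):
    """Pick devices to keep powered on UPS battery (hypothetical greedy policy).

    devices: list of (name, watts, priority); a lower priority value means
    more important. Returns (kept, shed) lists of device names.
    """
    budget_w = battery_wh / required_runtime_h
    kept, shed, load_w = [], [], 0.0
    for name, watts, _priority in sorted(devices, key=lambda d: d[2]):
        if load_w + watts <= budget_w:
            kept.append(name)
            load_w += watts
        else:
            shed.append(name)
    return kept, shed


devices = [
    ("core-router", 150, 0),
    ("switch", 80, 1),
    ("wifi-ap", 40, 2),
    ("lab-server", 200, 3),
    ("phone", 10, 4),
]
# 600 Wh remaining and 2 hours required -> a 300 W budget.
print(shed_loads(devices, 600, 2))
```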
The pre-existing IETF "Entity State MIB" (ENTITY-STATE-MIB) [RFC4268]
allows specifying the operational state of entities specified via the
ENTITY-MIB with respect to their power consumption and operational
capabilities (e.g., "coldStandby", "hotStandby", "ready", etc.).
Devices can also act as proxies to provide MIB interfaces for
monitoring and control of power for other devices that may use other
protocols, such as a home gateway interfacing with various
vendor-specific protocols of home equipment.
The EMAN "Energy Object Context MIB" [RFC7461] defines the ENERGY-
OBJECT-CONTEXT-MIB and IANA-ENERGY-RELATION-MIB, both of which serve
to "address device identification, context information, and the
energy relationships between devices" according to [RFC7461].
To automatically discover and negotiate PoE power consumption between
switch and client, non-IETF technologies can be used, such as the
IEEE "Link Layer Discovery Protocol" (LLDP) and related MIBs such as
the LLDP-EXT-MED-MIB.
Finally, the "Energy Management (EMAN) Applicability Statement"
[RFC7603] provides an overview of EMAN from a user/operator
perspective, reviewing a range of typical scenarios it can support
as well as how it can link to a variety of pre-existing, non-IETF
standards relevant to power management. Its intended applicability
includes home, core, and data center (DC) networks.
There are currently no equivalent YANG modules. Such modules would
not only echo the EMAN MIBs but could also allow controlling
dedicated power optimization engines instead of relying upon static,
frozen, vendor-specific optimizations.
8. Power-awareness in Forwarding and Routing Protocols
8.1. Power Aware Networks (PANET)
In 2013 and 2014, several drafts proposed how networks themselves,
specifically those of Internet Service Providers (ISPs), could become
"power aware", meaning that their power consumption could be
regulated (or could self-regulate) based on the currently required
performance of the network and/or the available power. This would be
achieved by reducing excess (or overly power expensive) network
capacity through switching off or low-powering components such as
redundant routers, line cards, interfaces, or links, or by reducing
the bitrates of links.
The 2013 "Power-Aware Networks (PANET): Problem Statement"
[I-D.zhang-panet-problem-statement] gives an overview of this
concept, as does "Power-aware Routing and Traffic Engineering:
Requirements, Approaches, and Issues" [I-D.zhang-greennet] from the
same year.
The 2014 [I-D.retana-rtgwg-eacp] exemplifies the concept and
discusses key challenges: the reduced resilience against failures
when redundant components are switched off, and the risk of increased
stretch (path length), and therefore latency, under partial network
component shutdown or downspeeding. It also discusses the idea of
saving energy through (periodic) microsleeps, as possible with
"Energy Efficient Ethernet" links
(<https://en.wikipedia.org/wiki/Energy-Efficient_Ethernet>). The 2013
draft "Reducing Power Consumption using BGP with power source data"
[I-D.mjsraman-panet-inter-as-power-source] proposed BGP attributes to
allow the calculation of power-efficient (or, for example, green)
paths.
One core market driver for this work was the rolling blackouts that
especially affected India at the time of these drafts, raising the
desire to, for example, reduce the total power consumption of a
network during such energy emergencies.
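The trade-off discussed above, between switching off redundant
components and preserving resilience and bounded stretch, can be
illustrated with the following Python sketch. It is not taken from
any of the referenced drafts: it greedily tries to switch off
candidate links in a given order (for example, most power-hungry
first, which is an assumption), and keeps a link switched off only if
all node pairs stay reachable and no hop count exceeds `max_stretch`
times its value in the full topology. Traffic load and link capacity
are deliberately ignored, and all function names are hypothetical.

```python
from collections import deque


def hop_counts(nodes, links):
    """Return {(src, dst): hops} for every reachable pair, using BFS."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    dist = {}
    for src in nodes:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for dst, hops in seen.items():
            dist[(src, dst)] = hops
    return dist


def links_to_switch_off(nodes, links, candidates, max_stretch):
    """Greedily switch off candidate links, in the given order, as long as
    every pair reachable in the full topology stays reachable and its hop
    count stays within max_stretch times the original hop count.
    Links are (a, b) tuples; candidates must use the same orientation."""
    baseline = hop_counts(nodes, links)
    active = set(links)
    switched_off = []
    for link in candidates:
        trial = active - {link}
        dist = hop_counts(nodes, trial)
        if all(pair in dist and dist[pair] <= max_stretch * hops
               for pair, hops in baseline.items()):
            active = trial
            switched_off.append(link)
    return switched_off


nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
order = [("A", "C"), ("A", "B"), ("C", "D")]  # hypothetical power ranking
print(links_to_switch_off(nodes, links, order, max_stretch=2.0))
print(links_to_switch_off(nodes, links, order, max_stretch=3.0))
```

In this four-node ring A-B-C-D with an additional A-C link, a stretch
bound of 2 allows only the A-C link to be switched off, while a bound
of 3 also allows A-B, at the cost of the A-B path growing from one to
three hops. The remaining C-D link is never switched off because
doing so would partition the network.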
While there was technical interest in the IETF, the vendors mostly
present in the IETF did not consider the market significance to be
high enough. Furthermore, unlike, for example, today's standard PC
hardware designs, traditional routers exhibit little power savings
when components such as line cards or interfaces are shut down.
In addition, SDN/controller-based solutions were relatively in their
infancy back in 2013/2014, and technologies that would allow an SDN
controller to have resilient (self-healing) connectivity, such as
those described in [RFC8368] and [RFC8994], were not yet available.
The resulting risk of severely impacting network reliability was one
of the key factors why the PANET work has not proceeded so far.
8.2. SDN-based Semantic Forwarding
Recently, [I-D.boucadair-irtf-sdn-and-semantic-routing] provided the
following feature as an example of the capabilities that can be
offered by appropriate control of forwarding elements:
Energy-efficient Forwarding: An important effort was made in the past
to optimize the energy consumption of network elements. However,
such optimization is node-specific and no standard means to optimize
the energy consumption at the scale of the network have been defined.
For example, many nodes (also, service cards) are deployed as
backups.
A controller-based approach can be implemented so that the route
selection process optimizes the overall energy consumption of a path.
Such a process takes into account the current load, avoids waking
nodes/cards for handling "sparse" traffic (i.e., a minor portion of
the total traffic), considers node-specific data (e.g., [RFC7460]),
etc. This off-line Semantic Routing approach will transition
specific cards/nodes to "idle" and wake them as appropriate, etc.,
without breaking service objectives. Moreover, such an approach will
have to maintain an up-to-date topology even if a node is in an
"idle" state (such nodes may be removed from adjacency tables if they
don’t participate in routing advertisements).
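A minimal sketch of such a route selection, assuming a simple
per-node energy model, is shown below in Python. The model (a
per-unit-of-traffic cost for each node traversed plus a one-time
wake-up penalty for nodes that are currently idle) and all names are
hypothetical; a controller could instead use node-specific data such
as that available via [RFC7460]. The path search itself is a
standard Dijkstra computation over node costs.

```python
import heapq


def energy_aware_path(graph, src, dst, traffic, idle, per_unit, wake_cost):
    """Dijkstra over node costs: entering node v costs per_unit[v] * traffic,
    plus wake_cost[v] if v is currently idle (hypothetical energy model).
    graph maps each node to its neighbors.
    Returns (total_cost, path), or (inf, []) if dst is unreachable."""
    def node_cost(v):
        cost = per_unit[v] * traffic
        if v in idle:
            cost += wake_cost[v]
        return cost

    start = node_cost(src)
    best = {src: start}
    prev = {}
    heap = [(start, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return cost, path[::-1]
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph[u]:
            new_cost = cost + node_cost(v)
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    return float("inf"), []


graph = {"S": ["A", "B"], "A": ["S", "T"], "B": ["S", "T"], "T": ["A", "B"]}
per_unit = {"S": 1, "A": 1, "B": 2, "T": 1}
wake_cost = {"A": 100}
idle = {"A"}
print(energy_aware_path(graph, "S", "T", 5, idle, per_unit, wake_cost))
print(energy_aware_path(graph, "S", "T", 200, idle, per_unit, wake_cost))
```

With sparse traffic (5 units), the sketch routes via the active node
B (path S-B-T, cost 20) rather than waking the idle node A; with 200
units of traffic, the lower per-unit cost of A outweighs its wake-up
penalty and S-A-T (cost 700) is selected. The sketch does not model
nodes returning to idle or the need to keep idle nodes in the
topology, which the text above identifies as a further requirement.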
8.3. Misc
The non-adopted, expired 2013 draft
[I-D.okamoto-ccamp-midori-gmpls-extension-reqs] discusses power
awareness in routing in conjunction with Traffic Engineering
(tunnels), specifically in the context of Generalized MPLS (GMPLS),
which covers, e.g., various lower-layer technologies such as switched
optical fiber networks. Its primary claim is that the existing
management objects are not sufficient to express aspects related to
energy management, and thus do not allow building energy-conscious
policies into the Path Computation Element (PCE) for such GMPLS
networks.
The non-adopted 2012 "Requirements for an Energy-Efficient Network
System" [I-D.suzuki-eens-requirements] proposes signaling network
capacity towards the DC, for example based on load or on network
energy management, in support of appropriate performance control
(such as VM migration) in the DC, or vice versa (DC load-based
traffic engineering in the network to support that DC load).
The non-adopted 2012 "Building power optimal Multicast Trees"
[I-D.mjsraman-rtgwg-pim-power] proposes that (PIM-based) IP multicast
routing could make local routing choices among "Equal-Cost Multipath"
(ECMP) "Reverse Path Forwarding" (RPF) alternatives based on the
energy that would be consumed in the router. Examples are when one
ECMP alternative would use a more power-efficient line card, or when
one ECMP alternative is on the same line card as the interfaces to
which the packets need to be routed, thereby avoiding forwarding the
packets across separate ingress and egress line cards.
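The following Python sketch illustrates this kind of local RPF
choice. The energy model (a per-packet cost for the ingress line card
plus a fixed cost for each fabric crossing to a different egress line
card), the `RpfCandidate` type, and the `pick_rpf` function are
assumptions made for illustration; [I-D.mjsraman-rtgwg-pim-power] may
define the selection differently. Ties are broken by neighbor address
so that the choice is deterministic, and the addresses are taken from
documentation ranges.

```python
from typing import NamedTuple


class RpfCandidate(NamedTuple):
    neighbor: str   # upstream PIM neighbor address (illustrative)
    line_card: int  # line card hosting the incoming (RPF) interface


def pick_rpf(candidates, egress_cards, card_energy, fabric_energy):
    """Return the ECMP RPF candidate with the lowest estimated per-packet
    energy: the ingress line card's cost plus fabric_energy for every
    egress line card other than the ingress card. Ties are broken by
    neighbor address so the result is deterministic."""
    def estimate(cand):
        crossings = len(set(egress_cards) - {cand.line_card})
        return card_energy[cand.line_card] + fabric_energy * crossings

    return min(candidates, key=lambda c: (estimate(c), c.neighbor))


a = RpfCandidate("192.0.2.1", 1)
b = RpfCandidate("198.51.100.1", 2)
# The outgoing interfaces for the (S,G) entry are all on line card 2.
print(pick_rpf([a, b], {2}, {1: 1.0, 2: 1.0}, fabric_energy=0.5))
print(pick_rpf([a, b], {2}, {1: 0.4, 2: 1.0}, fabric_energy=0.5))
```

When both line cards are equally efficient, the candidate on the same
line card as the outgoing interfaces is preferred (estimated cost 1.0
versus 1.5); when the other line card is sufficiently more efficient,
it is preferred despite the fabric crossing (0.9 versus 1.0).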
9. Gaps
The 2012 "Towards an Energy-Efficient Internet"
[I-D.winter-energy-efficient-internet] summarizes some of the same
work items as this document (as of 2012) and lists additional
non-adopted drafts. It also identifies three gap areas that it
suggests the IETF work on: "Load-adaptive Resource Management",
"Energy-efficient Protocol Design", and "Energy-efficiency Metrics
and Standard Benchmarking Methodologies".
Some aspects of those gap areas were partially tackled by later work
in the IETF, but, broadly speaking, most of those areas remain open
to a wide range of possible further IETF/IRTF work.
10. Summary
TBD
11. Changelog
[RFC-Editor: this section to be removed in final document.]
The master for this document is hosted at
<http://github.com/toerless/energy>. Please submit issues and/or pull
requests for proposed changes, or join the team of authors and edit
yourself.
00: Initial version
01: Added co-author (Mohamed Boucadair); long list of typo fixes,
editorial improvements in abstract, introduction, and other chapters.
Added sections on satellite networks, devices with batteries, power
benchmarking, and SDN-based forwarding semantics.
02: Minor text edits (med), added pointer to additional draft (med),
added co-author Pascal (tte).
03: Added Jeff Tantsura as co-author.
12. Informative References
[BOUNDED_LATENCY]
Cruz, R.L., "A calculus for network delay. I. Network
elements in isolation", DOI 10.1109/18.61109,
IEEE Transactions on Information Theory (Volume: 37,
Issue: 1), 1991,
<https://ieeexplore.ieee.org/document/61109>.
[I-D.ajunior-energy-awareness-00]
Junior, A. and R. C. Sofia, "Energy-awareness metrics
global applicability guidelines", Work in Progress,
Internet-Draft, draft-ajunior-energy-awareness-00, 16
October 2012, <https://www.ietf.org/archive/id/draft-
ajunior-energy-awareness-00.txt>.
[I-D.bormann-core-roadmap-05]
Bormann, C., "CoRE Roadmap and Implementation Guide", Work
in Progress, Internet-Draft, draft-bormann-core-roadmap-
05, 21 October 2013, <https://www.ietf.org/archive/id/
draft-bormann-core-roadmap-05.txt>.
[I-D.boucadair-irtf-sdn-and-semantic-routing]
Boucadair, M., Trossen, D., and A. Farrel, "Considerations
for the use of SDN in Semantic Routing Networks", Work in
Progress, Internet-Draft, draft-boucadair-irtf-sdn-and-
semantic-routing-01, 31 May 2022,
<https://www.ietf.org/archive/id/draft-boucadair-irtf-sdn-
and-semantic-routing-01.txt>.
[I-D.castellani-core-alive]
Castellani, A. P. and S. Loreto, "CoAP Alive Message",
Work in Progress, Internet-Draft, draft-castellani-core-
alive-00, 29 March 2012, <https://www.ietf.org/archive/id/
draft-castellani-core-alive-00.txt>.
[I-D.chakrabarti-nordmark-energy-aware-nd]
Chakrabarti, S., Nordmark, E., and M. Wasserman, "Energy
Aware IPv6 Neighbor Discovery Optimizations", Work in
Progress, Internet-Draft, draft-chakrabarti-nordmark-
energy-aware-nd-02, 12 March 2012,
<https://www.ietf.org/archive/id/draft-chakrabarti-
nordmark-energy-aware-nd-02.txt>.
[I-D.claise-power-management-arch]
Claise, B., Parello, J., and B. Schoening, "Power
Management Architecture", Work in Progress, Internet-
Draft, draft-claise-power-management-arch-02, 22 October
2010, <https://www.ietf.org/archive/id/draft-claise-power-
management-arch-02.txt>.
[I-D.desmouceaux-ipv6-mcast-wifi-power-usage]
Desmouceaux, Y., "Power consumption due to IPv6 multicast
on WiFi devices", Work in Progress, Internet-Draft, draft-
desmouceaux-ipv6-mcast-wifi-power-usage-01, 1 August 2014,
<https://www.ietf.org/archive/id/draft-desmouceaux-ipv6-
mcast-wifi-power-usage-01.txt>.
[I-D.fossati-core-monitor-option]
Fossati, T., Giacomin, P., and S. Loreto, "Monitor Option
for CoAP", Work in Progress, Internet-Draft, draft-
fossati-core-monitor-option-00, 9 July 2012,
<https://www.ietf.org/archive/id/draft-fossati-core-
monitor-option-00.txt>.
[I-D.fossati-core-publish-option]
Fossati, T., Giacomin, P., and S. Loreto, "Publish Option
for CoAP", Work in Progress, Internet-Draft, draft-
fossati-core-publish-option-03, 6 January 2014,
<https://www.ietf.org/archive/id/draft-fossati-core-
publish-option-03.txt>.
[I-D.giacomin-core-sleepy-option]
Fossati, T., Giacomin, P., Loreto, S., and M. Rossini,
"Sleepy Option for CoAP", Work in Progress, Internet-
Draft, draft-giacomin-core-sleepy-option-00, 29 February
2012, <https://www.ietf.org/archive/id/draft-giacomin-
core-sleepy-option-00.txt>.
[I-D.ietf-core-coap-pubsub]
Koster, M., Keranen, A., and J. Jimenez, "Publish-
Subscribe Broker for the Constrained Application Protocol
(CoAP)", Work in Progress, Internet-Draft, draft-ietf-
core-coap-pubsub-10, 4 May 2022,
<https://www.ietf.org/archive/id/draft-ietf-core-coap-
pubsub-10.txt>.
[I-D.ietf-dnssd-srp]
Lemon, T. and S. Cheshire, "Service Registration Protocol
for DNS-Based Service Discovery", Work in Progress,
Internet-Draft, draft-ietf-dnssd-srp-14, 11 July 2022,
<https://www.ietf.org/archive/id/draft-ietf-dnssd-srp-
14.txt>.
[I-D.ietf-tcpm-rfc793bis]
Eddy, W. M., "Transmission Control Protocol (TCP)
Specification", Work in Progress, Internet-Draft, draft-
ietf-tcpm-rfc793bis-28, 7 March 2022,
<https://www.ietf.org/archive/id/draft-ietf-tcpm-
rfc793bis-28.txt>.
[I-D.jennings-energy-pricing]
Jennings, C. and B. Nordman, "Communication of Energy
Price Information", Work in Progress, Internet-Draft,
draft-jennings-energy-pricing-01, 10 July 2011,
<https://www.ietf.org/archive/id/draft-jennings-energy-
pricing-01.txt>.
[I-D.lhan-problems-requirements-satellite-net]
Han, L., Li, R., Retana, A., Chen, M., Su, L., Jiang, T.,
and N. Wang, "Problems and Requirements of Satellite
Constellation for Internet", Work in Progress, Internet-
Draft, draft-lhan-problems-requirements-satellite-net-03,
6 July 2022, <https://www.ietf.org/archive/id/draft-lhan-
problems-requirements-satellite-net-03.txt>.
[I-D.manral-bmwg-power-usage]
Manral, V., Sharma, P., Banerjee, S., and Y. Ping,
"Benchmarking Power usage of networking devices", Work in
Progress, Internet-Draft, draft-manral-bmwg-power-usage-
04, 12 March 2013, <https://www.ietf.org/archive/id/draft-
manral-bmwg-power-usage-04.txt>.
[I-D.mjsraman-panet-inter-as-power-source]
Raman, S., Venkataswami, B. V., Raina, G., and K.
Veezhinathan, "Reducing Power Consumption using BGP with
power source data", Work in Progress, Internet-Draft,
draft-mjsraman-panet-inter-as-power-source-00, 25 January
2013, <https://www.ietf.org/archive/id/draft-mjsraman-
panet-inter-as-power-source-00.txt>.
[I-D.mjsraman-rtgwg-pim-power]
Raman, S., Venkataswami, B. V., Raina, G., and V. Srini,
"Building power optimal Multicast Trees", Work in
Progress, Internet-Draft, draft-mjsraman-rtgwg-pim-power-
02, 27 March 2012, <https://www.ietf.org/archive/id/draft-
mjsraman-rtgwg-pim-power-02.txt>.
[I-D.okamoto-ccamp-midori-gmpls-extension-reqs]
Okamoto, S., "Requirements of GMPLS Extensions for Energy
Efficient Traffic Engineering", Work in Progress,
Internet-Draft, draft-okamoto-ccamp-midori-gmpls-
extension-reqs-02, 14 March 2013,
<https://www.ietf.org/archive/id/draft-okamoto-ccamp-
midori-gmpls-extension-reqs-02.txt>.
[I-D.petrescu-v6ops-ipv6-power-ipv4]
Petrescu, A., Said, S. B. H., Philippot, O., and T.
Vincent, "Power Consumption of IPv6 vs IPv4 in
Smartphone", Work in Progress, Internet-Draft, draft-
petrescu-v6ops-ipv6-power-ipv4-00, 13 March 2017,
<https://www.ietf.org/archive/id/draft-petrescu-v6ops-
ipv6-power-ipv4-00.txt>.
[I-D.rahman-core-sleepy]
Rahman, A., "Enhanced Sleepy Node Support for CoAP", Work
in Progress, Internet-Draft, draft-rahman-core-sleepy-05,
11 February 2014, <https://www.ietf.org/archive/id/draft-
rahman-core-sleepy-05.txt>.
[I-D.rahman-core-sleepy-nodes-do-we-need]
Rahman, A., "Sleepy Devices: Do we need to Support them in
CORE?", Work in Progress, Internet-Draft, draft-rahman-
core-sleepy-nodes-do-we-need-01, 11 February 2014,
<https://www.ietf.org/archive/id/draft-rahman-core-sleepy-
nodes-do-we-need-01.txt>.
[I-D.rahman-core-sleepy-problem-statement]
Rahman, A., Fossati, T., Loreto, S., and M. Vial, "Sleepy
Devices in CoAP - Problem Statement", Work in Progress,
Internet-Draft, draft-rahman-core-sleepy-problem-
statement-01, 21 October 2012,
<https://www.ietf.org/archive/id/draft-rahman-core-sleepy-
problem-statement-01.txt>.
[I-D.retana-rtgwg-eacp]
Retana, A., White, R., and M. Paul, "A Framework and
Requirements for Energy Aware Control Planes", Work in
Progress, Internet-Draft, draft-retana-rtgwg-eacp-03, 24
October 2014, <https://www.ietf.org/archive/id/draft-
retana-rtgwg-eacp-03.txt>.
[I-D.suzuki-eens-requirements]
Suzuki, T. and T. Tarui, "Requirements for an Energy-
Efficient Network System", Work in Progress, Internet-
Draft, draft-suzuki-eens-requirements-00, 15 October 2012,
<https://www.ietf.org/archive/id/draft-suzuki-eens-
requirements-00.txt>.
[I-D.vial-core-mirror-proxy]
Vial, M., "CoRE Mirror Server", Work in Progress,
Internet-Draft, draft-vial-core-mirror-proxy-01, 13 July
2012, <https://www.ietf.org/archive/id/draft-vial-core-
mirror-proxy-01.txt>.
[I-D.vial-core-mirror-server]
Vial, M., "CoRE Mirror Server", Work in Progress,
Internet-Draft, draft-vial-core-mirror-server-01, 10 April
2013, <https://www.ietf.org/archive/id/draft-vial-core-
mirror-server-01.txt>.
[I-D.wang-roll-energy-optimization-scheme]
Wang, H., Wei, M., Li, S., Huang, Q., Wang, P., and C.
Wang, "An energy optimization routing scheme for LLSs",
Work in Progress, Internet-Draft, draft-wang-roll-energy-
optimization-scheme-00, 21 February 2017,
<https://www.ietf.org/archive/id/draft-wang-roll-energy-
optimization-scheme-00.txt>.
[I-D.winter-energy-efficient-internet]
Winter, R., Jeong, S., and J. Choi, "Towards an Energy-
Efficient Internet", Work in Progress, Internet-Draft,
draft-winter-energy-efficient-internet-01, 22 October
2012, <https://www.ietf.org/archive/id/draft-winter-
energy-efficient-internet-01.txt>.
[I-D.zhang-greennet]
Zhang, B., Shi, J., Dong, J., and M. Zhang, "Power-aware
Routing and Traffic Engineering: Requirements, Approaches,
and Issues", Work in Progress, Internet-Draft, draft-
zhang-greennet-01, 10 January 2013,
<https://www.ietf.org/archive/id/draft-zhang-greennet-
01.txt>.
[I-D.zhang-panet-problem-statement]
Zhang, B., Shi, J., Dong, J., Zhang, M., and M. Boucadair,
"Power-Aware Networks (PANET): Problem Statement", Work in
Progress, Internet-Draft, draft-zhang-panet-problem-
statement-03, 15 October 2013,
<https://www.ietf.org/archive/id/draft-zhang-panet-
problem-statement-03.txt>.
[NASPICLOCK]
NASPI Time Synchronization Task Force, "Time Synchronization in the Electric
Power System", March 2017,
<https://www.naspi.org/sites/default/files/
reference_documents/tstf_electric_power_system_report_pnnl
_26331_march_2017_0.pdf>.
[RFC1112] Deering, S., "Host extensions for IP multicasting", STD 5,
RFC 1112, DOI 10.17487/RFC1112, August 1989,
<https://www.rfc-editor.org/info/rfc1112>.
[RFC1142] Oran, D., Ed., "OSI IS-IS Intra-domain Routing Protocol",
RFC 1142, DOI 10.17487/RFC1142, February 1990,
<https://www.rfc-editor.org/info/rfc1142>.
[RFC1628] Case, J., Ed., "UPS Management Information Base",
RFC 1628, DOI 10.17487/RFC1628, May 1994,
<https://www.rfc-editor.org/info/rfc1628>.
[RFC1866] Berners-Lee, T. and D. Connolly, "Hypertext Markup
Language - 2.0", RFC 1866, DOI 10.17487/RFC1866, November
1995, <https://www.rfc-editor.org/info/rfc1866>.
[RFC1883] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 1883, DOI 10.17487/RFC1883,
December 1995, <https://www.rfc-editor.org/info/rfc1883>.
[RFC2086] Myers, J., "IMAP4 ACL extension", RFC 2086,
DOI 10.17487/RFC2086, January 1997,
<https://www.rfc-editor.org/info/rfc2086>.
[RFC2212] Shenker, S., Partridge, C., and R. Guerin, "Specification
of Guaranteed Quality of Service", RFC 2212,
DOI 10.17487/RFC2212, September 1997,
<https://www.rfc-editor.org/info/rfc2212>.
[RFC2246] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
RFC 2246, DOI 10.17487/RFC2246, January 1999,
<https://www.rfc-editor.org/info/rfc2246>.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, DOI 10.17487/RFC2475, December 1998,
<https://www.rfc-editor.org/info/rfc2475>.
[RFC2543] Handley, M., Schulzrinne, H., Schooler, E., and J.
Rosenberg, "SIP: Session Initiation Protocol", RFC 2543,
DOI 10.17487/RFC2543, March 1999,
<https://www.rfc-editor.org/info/rfc2543>.
[RFC3433] Bierman, A., Romascanu, D., and K.C. Norseth, "Entity
Sensor Management Information Base", RFC 3433,
DOI 10.17487/RFC3433, December 2002,
<https://www.rfc-editor.org/info/rfc3433>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
July 2003, <https://www.rfc-editor.org/info/rfc3550>.
[RFC3621] Berger, A. and D. Romascanu, "Power Ethernet MIB",
RFC 3621, DOI 10.17487/RFC3621, December 2003,
<https://www.rfc-editor.org/info/rfc3621>.
[RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M.
Stenberg, "UDP Encapsulation of IPsec ESP Packets",
RFC 3948, DOI 10.17487/RFC3948, January 2005,
<https://www.rfc-editor.org/info/rfc3948>.
[RFC3977] Feather, C., "Network News Transfer Protocol (NNTP)",
RFC 3977, DOI 10.17487/RFC3977, October 2006,
<https://www.rfc-editor.org/info/rfc3977>.
[RFC4268] Chisholm, S. and D. Perkins, "Entity State MIB", RFC 4268,
DOI 10.17487/RFC4268, November 2005,
<https://www.rfc-editor.org/info/rfc4268>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006,
<https://www.rfc-editor.org/info/rfc4271>.
[RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for
IP", RFC 4607, DOI 10.17487/RFC4607, August 2006,
<https://www.rfc-editor.org/info/rfc4607>.
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
DOI 10.17487/RFC4861, September 2007,
<https://www.rfc-editor.org/info/rfc4861>.
[RFC4944] Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler,
"Transmission of IPv6 Packets over IEEE 802.15.4
Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007,
<https://www.rfc-editor.org/info/rfc4944>.
[RFC4949] Shirey, R., "Internet Security Glossary, Version 2",
FYI 36, RFC 4949, DOI 10.17487/RFC4949, August 2007,
<https://www.rfc-editor.org/info/rfc4949>.
[RFC5548] Dohler, M., Ed., Watteyne, T., Ed., Winter, T., Ed., and
D. Barthel, Ed., "Routing Requirements for Urban Low-Power
and Lossy Networks", RFC 5548, DOI 10.17487/RFC5548, May
2009, <https://www.rfc-editor.org/info/rfc5548>.
[RFC5673] Pister, K., Ed., Thubert, P., Ed., Dwars, S., and T.
Phinney, "Industrial Routing Requirements in Low-Power and
Lossy Networks", RFC 5673, DOI 10.17487/RFC5673, October
2009, <https://www.rfc-editor.org/info/rfc5673>.
[RFC5826] Brandt, A., Buron, J., and G. Porcu, "Home Automation
Routing Requirements in Low-Power and Lossy Networks",
RFC 5826, DOI 10.17487/RFC5826, April 2010,
<https://www.rfc-editor.org/info/rfc5826>.
[RFC5867] Martocci, J., Ed., De Mil, P., Riou, N., and W. Vermeylen,
"Building Automation Routing Requirements in Low-Power and
Lossy Networks", RFC 5867, DOI 10.17487/RFC5867, June
2010, <https://www.rfc-editor.org/info/rfc5867>.
[RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
"Network Time Protocol Version 4: Protocol and Algorithms
Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
<https://www.rfc-editor.org/info/rfc5905>.
[RFC6206] Levis, P., Clausen, T., Hui, J., Gnawali, O., and J. Ko,
"The Trickle Algorithm", RFC 6206, DOI 10.17487/RFC6206,
March 2011, <https://www.rfc-editor.org/info/rfc6206>.
[RFC6272] Baker, F. and D. Meyer, "Internet Protocols for the Smart
Grid", RFC 6272, DOI 10.17487/RFC6272, June 2011,
<https://www.rfc-editor.org/info/rfc6272>.
[RFC6282] Hui, J., Ed. and P. Thubert, "Compression Format for IPv6
Datagrams over IEEE 802.15.4-Based Networks", RFC 6282,
DOI 10.17487/RFC6282, September 2011,
<https://www.rfc-editor.org/info/rfc6282>.
[RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J.,
Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur,
JP., and R. Alexander, "RPL: IPv6 Routing Protocol for
Low-Power and Lossy Networks", RFC 6550,
DOI 10.17487/RFC6550, March 2012,
<https://www.rfc-editor.org/info/rfc6550>.
[RFC6690] Shelby, Z., "Constrained RESTful Environments (CoRE) Link
Format", RFC 6690, DOI 10.17487/RFC6690, August 2012,
<https://www.rfc-editor.org/info/rfc6690>.
[RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service
Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013,
<https://www.rfc-editor.org/info/rfc6763>.
[RFC6775] Shelby, Z., Ed., Chakrabarti, S., Nordmark, E., and C.
Bormann, "Neighbor Discovery Optimization for IPv6 over
Low-Power Wireless Personal Area Networks (6LoWPANs)",
RFC 6775, DOI 10.17487/RFC6775, November 2012,
<https://www.rfc-editor.org/info/rfc6775>.
[RFC6887] Wing, D., Ed., Cheshire, S., Boucadair, M., Penno, R., and
P. Selkirk, "Port Control Protocol (PCP)", RFC 6887,
DOI 10.17487/RFC6887, April 2013,
<https://www.rfc-editor.org/info/rfc6887>.
[RFC6933] Bierman, A., Romascanu, D., Quittek, J., and M.
Chandramouli, "Entity MIB (Version 4)", RFC 6933,
DOI 10.17487/RFC6933, May 2013,
<https://www.rfc-editor.org/info/rfc6933>.
[RFC6988] Quittek, J., Ed., Chandramouli, M., Winter, R., Dietz, T.,
and B. Claise, "Requirements for Energy Management",
RFC 6988, DOI 10.17487/RFC6988, September 2013,
<https://www.rfc-editor.org/info/rfc6988>.
[RFC7030] Pritikin, M., Ed., Yee, P., Ed., and D. Harkins, Ed.,
"Enrollment over Secure Transport", RFC 7030,
DOI 10.17487/RFC7030, October 2013,
<https://www.rfc-editor.org/info/rfc7030>.
[RFC7102] Vasseur, JP., "Terms Used in Routing for Low-Power and
Lossy Networks", RFC 7102, DOI 10.17487/RFC7102, January
2014, <https://www.rfc-editor.org/info/rfc7102>.
[RFC7326] Parello, J., Claise, B., Schoening, B., and J. Quittek,
"Energy Management Framework", RFC 7326,
DOI 10.17487/RFC7326, September 2014,
<https://www.rfc-editor.org/info/rfc7326>.
[RFC7460] Chandramouli, M., Claise, B., Schoening, B., Quittek, J.,
and T. Dietz, "Monitoring and Control MIB for Power and
Energy", RFC 7460, DOI 10.17487/RFC7460, March 2015,
<https://www.rfc-editor.org/info/rfc7460>.
[RFC7461] Parello, J., Claise, B., and M. Chandramouli, "Energy
Object Context MIB", RFC 7461, DOI 10.17487/RFC7461, March
2015, <https://www.rfc-editor.org/info/rfc7461>.
[RFC7577] Quittek, J., Winter, R., and T. Dietz, "Definition of
Managed Objects for Battery Monitoring", RFC 7577,
DOI 10.17487/RFC7577, July 2015,
<https://www.rfc-editor.org/info/rfc7577>.
[RFC7603] Schoening, B., Chandramouli, M., and B. Nordman, "Energy
Management (EMAN) Applicability Statement", RFC 7603,
DOI 10.17487/RFC7603, August 2015,
<https://www.rfc-editor.org/info/rfc7603>.
[RFC7641] Hartke, K., "Observing Resources in the Constrained
Application Protocol (CoAP)", RFC 7641,
DOI 10.17487/RFC7641, September 2015,
<https://www.rfc-editor.org/info/rfc7641>.
[RFC768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
DOI 10.17487/RFC0768, August 1980,
<https://www.rfc-editor.org/info/rfc768>.
[RFC7733] Brandt, A., Baccelli, E., Cragie, R., and P. van der Stok,
"Applicability Statement: The Use of the Routing Protocol
for Low-Power and Lossy Networks (RPL) Protocol Suite in
Home Automation and Building Control", RFC 7733,
DOI 10.17487/RFC7733, February 2016,
<https://www.rfc-editor.org/info/rfc7733>.
[RFC7772] Yourtchenko, A. and L. Colitti, "Reducing Energy
Consumption of Router Advertisements", BCP 202, RFC 7772,
DOI 10.17487/RFC7772, February 2016,
<https://www.rfc-editor.org/info/rfc7772>.
[RFC7849] Binet, D., Boucadair, M., Vizdal, A., Chen, G., Heatley,
N., Chandler, R., Michaud, D., Lopez, D., and W. Haeffner,
"An IPv6 Profile for 3GPP Mobile Devices", RFC 7849,
DOI 10.17487/RFC7849, May 2016,
<https://www.rfc-editor.org/info/rfc7849>.
[RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791,
DOI 10.17487/RFC0791, September 1981,
<https://www.rfc-editor.org/info/rfc791>.
[RFC793] Postel, J., "Transmission Control Protocol", STD 7,
RFC 793, DOI 10.17487/RFC0793, September 1981,
<https://www.rfc-editor.org/info/rfc793>.
[RFC8036] Cam-Winget, N., Ed., Hui, J., and D. Popa, "Applicability
Statement for the Routing Protocol for Low-Power and Lossy
Networks (RPL) in Advanced Metering Infrastructure (AMI)
Networks", RFC 8036, DOI 10.17487/RFC8036, January 2017,
<https://www.rfc-editor.org/info/rfc8036>.
[RFC822] Crocker, D., "STANDARD FOR THE FORMAT OF ARPA INTERNET
TEXT MESSAGES", STD 11, RFC 822, DOI 10.17487/RFC0822,
August 1982, <https://www.rfc-editor.org/info/rfc822>.
[RFC8352] Gomez, C., Kovatsch, M., Tian, H., and Z. Cao, Ed.,
"Energy-Efficient Features of Internet of Things
Protocols", RFC 8352, DOI 10.17487/RFC8352, April 2018,
<https://www.rfc-editor.org/info/rfc8352>.
[RFC8368] Eckert, T., Ed. and M. Behringer, "Using an Autonomic
Control Plane for Stable Connectivity of Network
Operations, Administration, and Maintenance (OAM)",
RFC 8368, DOI 10.17487/RFC8368, May 2018,
<https://www.rfc-editor.org/info/rfc8368>.
[RFC8428] Jennings, C., Shelby, Z., Arkko, J., Keranen, A., and C.
Bormann, "Sensor Measurement Lists (SenML)", RFC 8428,
DOI 10.17487/RFC8428, August 2018,
<https://www.rfc-editor.org/info/rfc8428>.
[RFC8575] Jiang, Y., Ed., Liu, X., Xu, J., and R. Cummings, Ed.,
"YANG Data Model for the Precision Time Protocol (PTP)",
RFC 8575, DOI 10.17487/RFC8575, May 2019,
<https://www.rfc-editor.org/info/rfc8575>.
[RFC8613] Selander, G., Mattsson, J., Palombini, F., and L. Seitz,
"Object Security for Constrained RESTful Environments
(OSCORE)", RFC 8613, DOI 10.17487/RFC8613, July 2019,
<https://www.rfc-editor.org/info/rfc8613>.
[RFC8724] Minaburo, A., Toutain, L., Gomez, C., Barthel, D., and JC.
Zuniga, "SCHC: Generic Framework for Static Context Header
Compression and Fragmentation", RFC 8724,
DOI 10.17487/RFC8724, April 2020,
<https://www.rfc-editor.org/info/rfc8724>.
[RFC8815] Abrahamsson, M., Chown, T., Giuliano, L., and T. Eckert,
"Deprecating Any-Source Multicast (ASM) for Interdomain
Multicast", BCP 229, RFC 8815, DOI 10.17487/RFC8815,
August 2020, <https://www.rfc-editor.org/info/rfc8815>.
[RFC8824] Minaburo, A., Toutain, L., and R. Andreasen, "Static
Context Header Compression (SCHC) for the Constrained
Application Protocol (CoAP)", RFC 8824,
DOI 10.17487/RFC8824, June 2021,
<https://www.rfc-editor.org/info/rfc8824>.
[RFC8994] Eckert, T., Ed., Behringer, M., Ed., and S. Bjarnason, "An
Autonomic Control Plane (ACP)", RFC 8994,
DOI 10.17487/RFC8994, May 2021,
<https://www.rfc-editor.org/info/rfc8994>.
[RFC9008] Robles, M.I., Richardson, M., and P. Thubert, "Using RPI
Option Type, Routing Header for Source Routes, and IPv6-
in-IPv6 Encapsulation in the RPL Data Plane", RFC 9008,
DOI 10.17487/RFC9008, April 2021,
<https://www.rfc-editor.org/info/rfc9008>.
[RFC9010] Thubert, P., Ed. and M. Richardson, "Routing for RPL
(Routing Protocol for Low-Power and Lossy Networks)
Leaves", RFC 9010, DOI 10.17487/RFC9010, April 2021,
<https://www.rfc-editor.org/info/rfc9010>.
[RFC9011] Gimenez, O., Ed. and I. Petrov, Ed., "Static Context
Header Compression and Fragmentation (SCHC) over LoRaWAN",
RFC 9011, DOI 10.17487/RFC9011, April 2021,
<https://www.rfc-editor.org/info/rfc9011>.
[RFC9119] Perkins, C., McBride, M., Stanley, D., Kumari, W., and JC.
Zúñiga, "Multicast Considerations over IEEE 802 Wireless
Media", RFC 9119, DOI 10.17487/RFC9119, October 2021,
<https://www.rfc-editor.org/info/rfc9119>.
[RFC9148] van der Stok, P., Kampanakis, P., Richardson, M., and S.
Raza, "EST-coaps: Enrollment over Secure Transport with
the Secure Constrained Application Protocol", RFC 9148,
DOI 10.17487/RFC9148, April 2022,
<https://www.rfc-editor.org/info/rfc9148>.
[RFC9176] Amsüss, C., Ed., Shelby, Z., Koster, M., Bormann, C., and
P. van der Stok, "Constrained RESTful Environments (CoRE)
Resource Directory", RFC 9176, DOI 10.17487/RFC9176, April
2022, <https://www.rfc-editor.org/info/rfc9176>.
[RFC9178] Arkko, J., Eriksson, A., and A. Keränen, "Building Power-
Efficient Constrained Application Protocol (CoAP) Devices
for Cellular Networks", RFC 9178, DOI 10.17487/RFC9178,
May 2022, <https://www.rfc-editor.org/info/rfc9178>.
[VC2014] Ong, D., Moors, T., and V. Sivaraman, "Comparison of the
energy, carbon and time costs of videoconferencing and in-
person meetings", DOI 10.1016/j.comcom.2014.02.009, 2014,
<https://www.sciencedirect.com/science/article/pii/
S0140366414000620>.
Authors’ Addresses
Toerless Eckert (editor)
Futurewei Technologies USA
2220 Central Expressway
Santa Clara, CA 95050
United States of America
Email: tte@cs.fau.de
Mohamed Boucadair
Orange
35000 Rennes
France
Email: mohamed.boucadair@orange.com
Pascal Thubert
Cisco Systems, Inc.
45 Allee des Ormes - BP1200, Building D
06254 MOUGINS Sophia Antipolis
France
Phone: +33 497 23 26 34
Email: pthubert@cisco.com
Jeff Tantsura
Microsoft
Email: jefftant.ietf@gmail.com
Internet Research Task Force J. François
Internet-Draft Inria
Intended status: Informational A. Clemm
Expires: 12 January 2023 Futurewei Technologies, Inc.
D. Papadimitriou
Nokia
S. Fernandes
Central Bank of Canada
S. Schneider
Digital Railway (DSD) at Deutsche Bahn
11 July 2022
Research Challenges in Coupling Artificial Intelligence and Network
Management
draft-francois-nmrg-ai-challenges-00
Abstract
This document introduces the challenges to overcome when network
management problems require coupling with AI solutions. On one hand,
there are many difficult problems in Network Management that to this
date have no good solutions, or where any solutions come with
significant limitations and constraints. Artificial Intelligence may
help produce novel solutions to those problems. On the other hand,
for several reasons (computational costs of AI solutions, privacy of
data), distributing AI tasks has become essential. It is thus also
expected that networks SHOULD be operated efficiently to support
those tasks.
To identify the right set of challenges, the document defines a
method based on the evolution and nature of NM problems. These are
considered in parallel with the advances and nature of existing AI
solutions, in order to highlight where AI and NM have already been
coupled or could benefit from tighter integration. The method thus
aims at evaluating the gap between NM problems and AI solutions.
Challenges are derived accordingly, on the assumption that solving
them will help reduce the gap between NM and AI.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
François, et al. Expires 12 January 2023 [Page 1]
Internet-Draft Coupling AI and network management July 2022
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 12 January 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 5
3. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . 5
4. Difficult problems in network management . . . . . . . . . . 5
5. High-level challenges in adopting AI in NM . . . . . . . . . 8
6. AI techniques for network management . . . . . . . . . . . . 10
6.1. Problem type and mapping . . . . . . . . . . . . . . . . 10
6.1.1. Sub-challenge: Suitable Approach for Given Input . . 10
6.1.2. Sub-challenge: Suitable Approach for Desired
Output . . . . . . . . . . . . . . . . . . . . . . . 11
6.1.3. Sub-challenge: Tailoring the AI Approach to the Given
Problem . . . . . . . . . . . . . . . . . . . . . . . 12
6.2. Performance of produced models . . . . . . . . . . . . . 13
6.3. Lightweight AI . . . . . . . . . . . . . . . . . . . . . 15
6.4. AI for planning of actions . . . . . . . . . . . . . . . 16
7. Network data as input for ML algorithms . . . . . . . . . . . 18
7.1. Data for AI-based NM solutions . . . . . . . . . . . . . 19
7.2. Data collection . . . . . . . . . . . . . . . . . . . . . 20
7.3. Usable data . . . . . . . . . . . . . . . . . . . . . . . 21
8. Acceptability of AI . . . . . . . . . . . . . . . . . . . . 22
8.1. Explainability of Network-AI products . . . . . . . . . 23
8.2. AI-based products and algorithms in production
systems . . . . . . . . . . . . . . . . . . . . . . . . . 24
8.3. AI with humans in the loop . . . . . . . . . . . . . . . 25
9. Security Considerations . . . . . . . . . . . . . . . . . . . 26
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 26
11.1. Normative References . . . . . . . . . . . . . . . . . . 26
11.2. Informative References . . . . . . . . . . . . . . . . . 26
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 32
Authors’ Addresses . . . . . . . . . . . . . . . . . . . . . . . 32
1. Introduction
The functional scope of network management (NM) is very large,
ranging from monitoring to accounting, from network provisioning to
service diagnostics, from usage accounting to security. The taxonomy
defined in [Hoo18] extends the traditional Fault, Configuration,
Accounting, Performance, Security (FCAPS) domains by considering
additional functional areas but above all by promoting additional
views. For instance, network management approaches can be classified
according to the technologies, methods or paradigms they rely on.
Methods include common approaches such as mathematical optimization
or queuing theory, but also techniques that have been widely applied
in recent decades, such as game theory, data analysis, data mining
and machine learning. Among management paradigms, autonomic and
cognitive management are listed. As highlighted by this taxonomy,
the definition of automated and more intelligent techniques has been
promoted to support efficient network management operations.
Research in NM and more generally in networking has been very active
in the area of applied ML [Bou18].
However, to keep networks operating within pre-defined safety
bounds, NM still relies heavily on established procedures. Even
after several cycles of adding automation, those procedures remain
mostly fixed, in the sense that the exact control loop and all
possibilities are defined in advance. They are thus mostly
deterministic by nature, or at least operate within maximal error
bounds. There have been many proposals to make networks smarter or
more intelligent through ML, but without large-scale adoption for
running real networks, because ML shifts the paradigm towards
stochastic methods.
ML is the sub-area of AI that attracts most of the attention
nowadays, but AI encompasses other areas, including knowledge
representation, inference rule engines, statistical methods and, by
extension, techniques that make it possible to observe and perform
actions on a system.
It is thus legitimate to ask whether ML, or AI in general, could be
helpful for NM in terms of practical deployment. This question is
closely tied to the problems that NM aims to address.
Independently of NM, ML solutions have been introduced to
approximately solve problems that are very complex in nature, i.e.,
for which no known algorithm finds an optimal solution in polynomial
time. This is the case for NP-hard problems. In those cases,
solutions typically rely on heuristics that may not yield optimal
results, or on algorithms that run into scalability issues and
struggle to produce timely results due to the exponential search
space. Such problems exist in NM: for instance, resource allocation
for service function chaining or network slicing are recent examples
that have gained interest in our community with the rise of SDN.
Many proposals formulate the problem as a Mixed-Integer Linear
Program (MILP) combined with heuristics to reach a satisfactory
trade-off among solution quality, computation time, and model
size/dimensionality. Hence, ML is recognized as well suited to
making progress on this type of problem [Kaf19].
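As an illustration of this trade-off between solution quality and
computation time, the following minimal Python sketch contrasts an
exhaustive search with the classic first-fit-decreasing bin-packing
heuristic on a toy VNF placement problem. It assumes simplifications
not taken from this document: each VNF has a single hypothetical CPU
demand, all servers have the same capacity, and the only objective is
to minimize the number of servers used. The function names are
illustrative only.

```python
from itertools import product

def exhaustive_min_servers(demands, capacity, n_servers):
    """Exact search: try all n_servers ** len(demands) VNF-to-server
    assignments and return the fewest servers used by a feasible one."""
    best = None
    for assignment in product(range(n_servers), repeat=len(demands)):
        load = [0] * n_servers
        for vnf, server in enumerate(assignment):
            load[server] += demands[vnf]
        if all(x <= capacity for x in load):
            used = sum(1 for x in load if x > 0)
            if best is None or used < best:
                best = used
    return best

def first_fit_decreasing(demands, capacity):
    """Heuristic: place the largest VNFs first, each into the first opened
    server with enough room, opening a new server when none fits.
    Assumes every single demand fits on an empty server."""
    free = []  # remaining capacity of each opened server
    for d in sorted(demands, reverse=True):
        for i, room in enumerate(free):
            if d <= room:
                free[i] -= d
                break
        else:  # no opened server had enough room
            free.append(capacity - d)
    return len(free)

demands = [5, 5, 4, 4, 3, 3]  # hypothetical CPU units per VNF
print(exhaustive_min_servers(demands, capacity=12, n_servers=len(demands)))  # 2
print(first_fit_decreasing(demands, capacity=12))  # 3
```

Here the exact search checks 6**6 = 46,656 assignments to find the
two-server optimum, while the heuristic answers almost instantly but
opens a third server. Each additional VNF multiplies the exact search
space by the number of servers, which is the combinatorial explosion
described above.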
However, not all NM problems are NP-hard. Due to real-time
constraints, some involve very short control loops that require both
rapid decisions and the ability to rapidly adapt to new situations
and different contexts. Even in that case, time is critical and
approximate solutions are usually more acceptable. Again, this is
where AI can be beneficial. Expert systems are in fact AI systems
[Ste92], but because they are built from numerous inference rules,
they are not designed to scale with the volume and heterogeneity of
data that can be collected in a network today. In contrast, ML can
automatically learn abstract representations of the rules, which can
later be updated.
Another common type of problem in NM is classification. For
instance, classifying network flows is helpful to detect attack flows
for security purposes, to differentiate QoS among flows (e.g.,
real-time streams that need to be prioritized), etc. At the same
time, ML-based classification algorithms have been widely used in the
literature, with high-quality results when properly applied, leading
to their use in commercial products. Many algorithms, including
decision trees, support vector machines and (deep) neural networks,
have been proven efficient in many areas, notably image and natural
language processing.
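To make the flow classification task concrete, the following minimal
Python sketch trains and applies a nearest-centroid classifier. This
is a deliberately simple stand-in for the decision trees, SVMs and
neural networks named above, written with the standard library only
so that it runs without third-party packages. The two flow features,
the class labels and the training flows are hypothetical.

```python
import math

def fit_centroids(samples, labels):
    """Training: compute the mean feature vector (centroid) of each class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}

def predict(centroids, x):
    """Inference: return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical flow features: [mean packet size (bytes), mean inter-arrival time (ms)]
samples = [[160, 20], [180, 20], [200, 30], [1400, 1], [1500, 2], [1450, 1]]
labels = ["realtime", "realtime", "realtime", "bulk", "bulk", "bulk"]

centroids = fit_centroids(samples, labels)
print(predict(centroids, [170, 25]))  # realtime
print(predict(centroids, [1380, 2]))  # bulk
```

The features are not normalized here, so packet size dominates the
distance; a real classifier would typically scale features and use
far richer flow statistics.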
Finally, many problems still rely on humans in the loop, from
support issues such as dealing with trouble tickets to planning
activities for the roll-out of new services. This creates
operational bottlenecks and is often expensive and error-prone. Such
tasks could be either automated or guided by an AI system to avoid
human bias. Indeed, available human resources are greatly outmatched
by the complexity of the problems to deal with, and this imbalance
will continue to grow with the size of networks and the
heterogeneity of devices, services, etc. Hence, human-based
procedures tend to be either simplistic compared to the problem to
solve, or time-consuming. Security provides notable examples, where
network operators must defend against potentially unknown threats;
as a result, services might be severely affected for hours.
All the aforementioned problems are exacerbated by networks that are
increasingly complex to operate along many dimensions (users,
devices, services, connections, etc.). Therefore, AI is expected to
enable or simplify solving these problems in real networks in the
near future [czb20] [Yan20], as such networks will need to reach
unprecedented levels of performance in terms of throughput, latency,
mobility, security, etc.
2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
3. Acronyms
AI: Artificial Intelligence
GAN: Generative Adversarial Network
GNN: Graph Neural Network
LSTM: Long Short-Term Memory
ML: Machine Learning
MLP: Multilayer Perceptron
NM: Network Management
4. Difficult problems in network management
As mentioned in the introduction, problems to be tackled in NM tend
to be complex and exhibit characteristics that make them candidates
for solutions that involve AI techniques:
* C1: A very large solution space, combinatorially exploding with
the size of the problem domain. This makes it impractical to
explore and test every solution (again, NP-hard problems are
typical here).
* C2: Uncertainty and unpredictability along multiple dimensions,
including the context in which the solution is applied, behavior
of users and traffic, lack of visibility into network state, and
more. In addition, many networks do not exist in isolation but
are subject to myriad interdependencies, some outside their
control. Accordingly, there are many external parameters that
affect the efficiency of the solution to a problem and that cannot
be known in advance: user activity, interconnected networks, etc.
* C3: The need to provide answers (i.e. compute solutions, deliver
verdicts, make decisions) in constrained or deterministic time.
In many cases, context changes dynamically and decisions need to
be made quickly to be of use.
* C4: Data-dependent solutions. To solve a problem accurately, it
can be necessary to rely on large volumes of data, having to deal
with issues that range from data heterogeneity to incomplete data
to general challenges of dealing with high data velocity.
* C5: The need to be integrated with existing automatic and human
processes.
* C6: Solutions MUST be cost-effective, as resources (bandwidth, CPU,
human, etc.) can be limited, notably when part of the processing is
distributed at the network edge or within the network.
Many problems exhibit several of these characteristics. Below is a
non-exhaustive list of complex NM problems for which AI and/or
non-AI-based approaches have been proposed:
* Computation of optimal paths: packet forwarding is not always
based on traditional routing protocols with least cost routing,
but on computation of paths that are optimized for certain
criteria - for example, to meet certain service level objectives, to
result in greater resilience, to balance utilization, to optimize
energy usage, etc. Many of those solutions can be found in SDN,
where a controller or path computation element computes paths that
are subsequently provisioned across the network. However, such
solutions generally do not scale to millions of paths (C1), and
cannot be recomputed in sub-second time scales (C3) to take into
account dynamically changing network conditions (C2). To compute
those paths, operations research techniques have been extensively
used in the literature along with AI methods, as shown in [Lop20].
As such, this problem can be considered close to big data problems,
sharing several of their "Vs": volume, velocity, variety, value, etc.
* Classification of network traffic: a common objective of network
monitoring for operators is to know the type of traffic going
through their networks (web, streaming, gaming, VoIP). By nature,
this task analyzes data (C4) that can vary over time (C2), except in
very particular scenarios such as isolated industrial networks.
However, the output of the classification technique is
time-constrained only in specific cases where fast decisions MUST be
made, for example to reroute traffic. Simple identification based
on IANA-assigned TCP/UDP port numbers was sufficient in the past.
However, with applications using dynamic port numbers, signature
techniques matching the packet payload have been used instead
[Sen04]. To handle applications now encapsulated in encrypted web or
VPN traffic, machine learning has been leveraged [Bri19].
* Network diagnostics: disruptions of networking services can have
many causes. Identifying the root cause can be of high importance
when what is causing the disruption is not properly understood, so
that repair actions can address the root cause versus just working
around the symptoms. Further complicating the matter are
scenarios in which disruptions are not "hard" but involve only a
degradation of service level, and where disruptions are
intermittent, not reproducible, and hard to predict. Artificial
intelligence techniques can offer promising solutions.
* Intent-Based Networking (IBN): Roughly speaking, IBN refers to the
ability to manage networks by articulating desired outcomes
without the need to specify a course of actions to achieve those
outcomes. The ability to determine such courses of actions, in
particular in scenarios with multiple interdependencies,
conflicting goals, large scale, and highly complex and dynamic
environments is a huge and largely unsolved challenge. Artificial
Intelligence techniques can be of help here in multiple ways, from
accurately classifying dynamic context to determine matching
actions to reframing the expression of intent as a game that can
be played (and won) using artificially intelligent techniques.
* VNF placement and SFC design: Virtual Network Functions need to be
placed on physical resources, and Service Function Chains designed,
in an optimized manner to avoid unnecessary use of networking
resources and to minimize energy usage.
* Smart admission control to avoid congestion and oversubscription
of network resources: Admission control needs to be set up and
performed in ways that ensure service levels are optimized in a
manner that is fair and aligned with application needs, and that
congestion is avoided or its effects mitigated.
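The path computation problem in the first item above can be
illustrated with a minimal Python sketch: Dijkstra's algorithm over
link latency, after pruning links whose utilization exceeds a
threshold. This is one simple way to optimize paths for more than
one criterion, not the approach of [Lop20]; the topology, the
latency and utilization values, and the function name are
hypothetical.

```python
import heapq

def constrained_shortest_path(links, src, dst, max_util):
    """Dijkstra over latency, ignoring links whose utilization exceeds max_util.

    links maps a directed link (u, v) to (latency_ms, utilization in [0, 1]).
    Returns (total_latency_ms, path) or None if dst is unreachable.
    """
    adj = {}
    for (u, v), (latency, util) in links.items():
        if util <= max_util:  # prune overloaded links before the search
            adj.setdefault(u, []).append((v, latency))
    queue = [(0, src, [src])]  # (distance so far, node, path)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, latency in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + latency, nxt, path + [nxt]))
    return None

links = {("A", "B"): (5, 0.9), ("B", "D"): (5, 0.2),
         ("A", "C"): (8, 0.3), ("C", "D"): (6, 0.4)}
print(constrained_shortest_path(links, "A", "D", max_util=1.0))  # (10, ['A', 'B', 'D'])
print(constrained_shortest_path(links, "A", "D", max_util=0.8))  # (14, ['A', 'C', 'D'])
```

Each call recomputes one path from scratch; repeating this for
millions of paths whenever utilization changes is exactly the
scalability (C1) and timeliness (C3) issue noted above.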
5. High-level challenges in adopting AI in NM
As shown in the previous section, AI techniques are good candidates
for the difficult NM problems. There have been many proposals, but
most of them remain at the prototype stage or have only been
evaluated through simulation and/or emulation. This raises the
question of why our community invests so much research in this
direction but has not adopted those solutions to operate real
networks. There are several obstacles.
First, AI advances have historically been driven by the image/video,
natural language and signal processing communities, as well as
robotics, for many decades. As a result, the most impressive
applications are in these areas, including the recent spread of
home assistants and the large progress in autonomous vehicles.
Network experts, in contrast, have been focused on building the
Internet, especially on building protocols to interconnect the
world with ever better performance and services. This trend
continues today with 5G in deployment and 6G under definition.
Hence, AI was not our primary focus. However, AI is now considered
a core enabler for future 6G networks, which are sometimes
qualified as AI-native networks.
While major contributions to AI-based solutions for networking have
been made over more than two decades, only a fraction of the
community was concerned with AI during that time. Progress as a
whole, from a community perspective, was thus limited, and was
compensated for by relying on AI developments in the communities
mentioned earlier. Even if our problems share some commonalities
with theirs, for example the volume of data to analyze, there are
many differences: data types are completely different, networks are
by nature heavily distributed, etc. Different problems are likely to
require distinct solutions. In a nutshell, network-tailored AI has
been overlooked.
Second, many AI techniques require sufficient representative data,
regardless of whether the algorithms are supervised or
unsupervised. NM has produced many methods and technologies to
acquire data. However, in most cases, the goal was not to support AI
techniques, which has led to a mismatch. For example, (deep)
learning techniques mostly rely on vectors of (real) numbers as
input, which fits some metrics (packet/byte counts, latency, delays,
etc.) but requires some adjustment for categorical features (IP
addresses, port numbers, etc.) or topological features. Conversions
are usually applied using common techniques such as one-hot encoding
or coarse-grained representations [Sco11]. However, more advanced
techniques have recently been proposed to embed representations of
network entities rather than merely encode them
[Rin17][Evr19][Sol20].
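The conversion of a heterogeneous flow record into a real-valued
vector can be sketched in Python as follows. This is a minimal
illustration under assumptions not in the text: the record fields,
the protocol vocabulary and the function names are hypothetical,
numeric metrics are passed through unchanged, the protocol is
one-hot encoded, and the destination IPv4 address is expanded into
its 32 bits, which is only one of several possible representations.

```python
import ipaddress

PROTOCOLS = ["tcp", "udp", "icmp"]  # hypothetical, fixed vocabulary

def one_hot(value, vocabulary):
    """0/1 vector with a single 1 at the index of value (all zeros if unknown)."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def ipv4_bits(addr):
    """Expand an IPv4 address into 32 binary features, most significant bit first."""
    n = int(ipaddress.IPv4Address(addr))
    return [float((n >> (31 - i)) & 1) for i in range(32)]

def encode_flow(flow):
    """Turn a flow record (dict) into one flat vector of real numbers."""
    return ([float(flow["bytes"]), float(flow["packets"]), float(flow["latency_ms"])]
            + one_hot(flow["proto"], PROTOCOLS)
            + ipv4_bits(flow["dst_ip"]))

vec = encode_flow({"bytes": 1200, "packets": 3, "latency_ms": 12.5,
                   "proto": "udp", "dst_ip": "192.0.2.1"})
print(len(vec))  # 38 = 3 numeric + 3 one-hot + 32 address bits
```

A one-hot encoding of a 16-bit port number would add 65,536
mostly-zero columns, which illustrates why the embedding-based
representations cited above have been explored.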
An additional challenge concerns the fact that AI techniques that
involve analysis of networking data can also lead to the extraction
of sensitive and personally-identifiable information, raising
potential privacy concerns and concerns regarding the potential for
abuse. For example, AI techniques used to analyze encrypted network
traffic with the legitimate goal to protect the network from
intrusions and illegitimate attack traffic could be used to infer
information about network usage and interactions of network users.
Intelligent data analysis and the need to maintain privacy are in
many ways contradictory, resulting in an arms race. Similarly,
training ML solutions on real network data is in many cases
preferable to using less realistic synthetic data sets. However,
network data may contain private or sensitive data, the sharing of
which may be problematic from a privacy standpoint and even result
in legal exposure. The challenge is thus how to allow AI techniques
to perform legitimate network management functions and provide
network owners with operational insights into what is going on in
their networks, while preventing their abuse for other
(illegitimate) purposes.
Finally, networks are already operated thanks to (semi-)automated
procedures involving a large number of resources which are
synchronized with management or orchestration tools. Adding AI
presupposes that it can be seamlessly integrated into pre-existing
processes. Although the goal of these procedures might be solely to
provide relevant information to operators through alerts or
dashboards in the case of monitoring applications, many other
applications rely on those procedures to trigger actions on the
different resources, which can be local or remote. Using AI or any
other approach to derive NM actions adds further constraints on
them, especially regarding timing and synchronization to maintain
coherence across a distributed system.
A related challenge concerns the fact that to be deployed, a solution
needs to not only provide a technical solution but to also be
acceptable to users - in this case, network administrators and
operators. One challenge with automated solutions is that
users want to feel "in control" and able to understand what is going
on, even more so if ultimately those users are the ones who are held
accountable for whether or not the network is running smoothly.
Those same concerns extend to artificially intelligent systems for
obvious reasons. To mitigate those concerns, aspects such as the
ability to explain actions that are taken - or about to be taken - by
AI systems become important.
Beyond reasons of making users more comfortable, there are
potentially also legal or regulatory ramifications to ensure that
actions taken are properly understood. For example, agencies such as
the FCC may impose fines on network operators when services such as
E911 experience outages, as there is a public interest in ensuring
highest availability for such services. In investigating causes for
such outages, the underlying behavior of systems has to be properly
understood, and even more so the reasons for actions that fall under
the realm of network operations.
6. AI techniques for network management
6.1. Problem type and mapping
In the last few years, an increasing number of different AI
techniques have been proposed and applied successfully to a growing
variety of different problems in different domains, including network
management [Mus18], [Xie18]. Some of the more recently proposed AI
approaches are clearly advancements of older approaches, which they
supersede. Many other AI approaches are not predecessors or
successors but simply complementary because they are useful for
different problems or optimize different metrics. In fact, different
AI approaches are useful for different kinds of problem inputs (e.g.,
tabular data vs. text vs. images vs. time series) and also for
different kinds of desired outputs (e.g., a predicted value, a
classification, or an action). Similarly, there may be trade-offs
between multiple approaches that take the same kind of inputs and
desired outputs (e.g., in terms of desired objective, computation
complexity, constraints).
Overall, it is a key challenge of using AI for network management to
properly understand and map which kind of problems with which inputs,
outputs, and objectives are best solved with which kind of AI (or
non-AI) approaches. Given the wealth of existing and newly released
AI approaches, this is far from a trivial task.
6.1.1. Sub-challenge: Suitable Approach for Given Input
Different problems in network management come with widely different
problem parameters. For example, security-related problems may have
large amounts of text or encrypted data as input, whereas forecasting
problems have historical time series data as input. They also vary
in the amount of available data.
Both the type and amount of data influence which AI techniques could
be useful. On one hand, in scenarios with little data, classical
machine learning techniques (e.g., SVM, tree-based approaches, etc.)
are often sufficient and even superior to neural networks. On the
other hand, neural networks have the advantage of learning complex
models from large amounts of data without requiring feature
engineering. Here, different neural network architectures are useful
for different kinds of problems. The traditional and simplest
architectures are (fully connected) multi-layer perceptrons (MLPs),
which are useful for structured, tabular data. For images, videos,
or other high-dimensional data with correlation between "close"
features, convolutional neural networks (CNNs) are useful. Recurrent
neural networks (RNNs), especially LSTMs, and attention-based neural
networks (transformers) are well suited to sequential data such as
time series or text. Finally, Graph Neural Networks (GNNs) can
incorporate and exploit graph-structured input, which is very
useful in network management, e.g., to represent the network
topology.
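The rough guidelines above can be written down as a simple lookup,
shown here as a minimal Python sketch. It only restates the mapping
from input type to architecture given in the text; the input-type
keys, the function name and in particular the numeric threshold for
"little data" are hypothetical, since the text gives no number.

```python
# Rough guidelines from the text, encoded as a lookup table.
ARCHITECTURE_BY_INPUT = {
    "tabular": "MLP",
    "image": "CNN",
    "time_series": "RNN (e.g., LSTM) or transformer",
    "text": "RNN (e.g., LSTM) or transformer",
    "graph": "GNN",
}

# Hypothetical cut-off: the text gives no number for "little data".
SMALL_DATA_THRESHOLD = 10_000

def suggest_approach(input_kind: str, n_samples: int) -> str:
    """Return a starting point, not a final choice, for a given input type and size."""
    if n_samples < SMALL_DATA_THRESHOLD:
        return "classical ML (e.g., SVM, tree-based)"
    return ARCHITECTURE_BY_INPUT.get(input_kind, "no default: needs expert analysis")

print(suggest_approach("graph", 50_000))  # GNN
print(suggest_approach("tabular", 500))   # classical ML (e.g., SVM, tree-based)
```

Such a table can only give a first candidate; as the next paragraph
notes, the best results often come from combining approaches.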
The aforementioned rough guidelines can help identify a suitable AI
approach and neural network architecture. Still, best results are
often only achieved with sophisticated combinations of different
approaches. For example, multiple elements can be combined into one
architecture, e.g., with both CNNs and LSTMs, and multiple separate
AI approaches can be used as an ensemble to combine their strengths.
Here, simplifying the mapping from problem type and input to suitable
AI approaches and architectures is clearly an open challenge. Future
work SHOULD address this challenge by providing both clearer
guidelines and striving for more general AI approaches that can
easily be applied to a large variety of different problem inputs.
6.1.2. Sub-challenge: Suitable Approach for Desired Output
Similar to the challenge of identifying suitable AI approaches for a
given problem input, the desired output for a given problem also
affects which AI approach SHOULD be chosen. Here, the format of the
desired output (single value, class, action, etc.), the frequency of
these outputs and their meaning SHOULD be considered.
Again, there are rough guidelines for identifying a group of suitable
AI approaches. For example, if a single value is required (e.g., the
amount of resources to allocate to a service instance), then typical
supervised regression approaches SHOULD be used. If classification
(e.g., of malware or another security issue [Abd10]) instead of a
value is desired, supervised classification methods SHOULD be used.
Alternatively, unsupervised machine learning can help to cluster
given data into separate groups, which can be useful to analyze
networking data, e.g., for better understanding different types of
traffic or user segments.
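Unsupervised clustering of networking data can be illustrated with a
minimal Python sketch of k-means (Lloyd's algorithm), written with
the standard library only. The per-flow features, the data points and
the hand-picked initial centroids are hypothetical; practical
deployments would typically scale features, choose the number of
clusters carefully and use a better initialization such as k-means++.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid and move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Keep a centroid unchanged if no point was assigned to it.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Hypothetical per-flow features: [mean duration (s), mean volume (KB)]
points = [[1, 2], [1.5, 1.8], [1.2, 2.2], [30, 500], [28, 520], [32, 480]]
centroids, clusters = kmeans(points, [points[0], points[3]])
print([len(c) for c in clusters])  # [3, 3]
print(centroids[1])                # [30.0, 500.0]
```

No labels are used: the two groups (short, small flows and long,
large flows) emerge from the data alone, which is what makes such
methods useful for exploring traffic types or user segments.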
In addition to these classical supervised and unsupervised methods,
reinforcement learning approaches allow active, sequential decisions
rather than simple predictions or classifications. This is often
useful in network management, e.g., to actively control service
scaling and placement as well as flow scheduling and routing.
Reinforcement learning agents autonomously select suitable actions in
a given environment and are especially useful for self-learning
network management. In addition to model-free reinforcement
learning, model-based planning approaches (e.g., Monte Carlo Tree
Search (MCTS)) also allow choosing suitable actions in a given
environment but require full knowledge of the environment dynamics.
In contrast, model-free reinforcement learning is ideal for scenarios
with unknown environment dynamics, which is often the case in network
management.
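Model-free reinforcement learning can be illustrated with a minimal
Python sketch of tabular Q-learning, one of many RL algorithms,
chosen here only for brevity. The toy scaling environment is
hypothetical and deliberately simple: the load level ("low" or
"high") changes at random regardless of the action, the agent picks
how many service instances to run, and the reward table is invented
for illustration. The agent never sees the reward function directly;
it only observes rewards from interaction.

```python
import random

STATES = ["low", "high"]  # observed load level
ACTIONS = [1, 3]          # number of service instances to run

def reward(state, instances):
    """Hypothetical reward: heavy penalty for an SLA violation, otherwise
    a cost proportional to the number of running instances."""
    if state == "high" and instances == 1:
        return -10.0  # under-provisioned under high load
    return -float(instances)

def q_learning(steps=20_000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        if rng.random() < epsilon:  # explore
            action = rng.choice(ACTIONS)
        else:                       # exploit the current estimate
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        next_state = rng.choice(STATES)  # toy: load evolves independently of the action
        # Q-learning update towards r + gamma * max_a' Q(s', a')
        target = r + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)  # {'low': 1, 'high': 3}
```

The learned policy runs one instance under low load and three under
high load. Realistic scaling and placement problems have far larger
state and action spaces, where tables are replaced by function
approximators such as neural networks.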
Similar to the previous sub-challenge, these are just rough
guidelines that can help to select a suitable group of AI approaches.
Identifying the most suitable approach within the group, e.g., the
best out of the many existing reinforcement learning approaches, is
still challenging. And, as before, different approaches could be
combined to enable even more effective network management (e.g.,
heuristics + RL, LSTMs + RL, ...). Here, further research MAY
simplify the mapping from desired problem output to choosing or
designing a suitable AI approach.
6.1.3. Sub-challenge: Tailoring the AI Approach to the Given Problem
After addressing the two aforementioned sub-challenges, one may have
selected a useful kind of AI approach for the given input and output
of a network management problem. For example, one may select
regression and supervised learning to forecast upcoming network
traffic, or reinforcement learning to continuously control network
and service coordination (scaling, placement, etc.).
However, even within each of these fields (regression, reinforcement
learning, etc.), there are many possible algorithms and
hyperparameters to consider. Selecting a suitable algorithm and
parametrizing it with the right hyperparameters is crucial to tailor
the AI approach to the given network management problem.
For example, there are many different regression techniques
(classical linear regression, polynomial regression, lasso/ridge
regression, SVR, regression trees, neural networks, etc.), each with
different benefits and drawbacks and each with its own set of
hyperparameters. Choosing a suitable technique depends on the amount
and structure of the input data as well as on the desired output. It
also depends on the available compute resources and on the compute
time available until a prediction is required. If resources and time
are not a limiting factor, many hyperparameters can be tuned
automatically. In practice, however, the design space of algorithms
and hyperparameters is often so large that it cannot be explored
effectively by automated tuning alone; some initial expertise in
selecting suitable AI algorithms and hyperparameters is also
required.
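When resources allow, automated tuning can be as simple as a grid
search scored by cross-validation. The sketch below tunes the
regularization strength lam of a one-dimensional ridge regression
through the origin, using its closed-form solution. The data, the
grid, and the fold count are hypothetical. Real tuning would
typically use a library and a much larger design space.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression through the origin:
    w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean squared validation error over k contiguous folds."""
    n = len(xs)
    size = n // k
    sq_errors = []
    for f in range(k):
        lo = f * size
        hi = n if f == k - 1 else (f + 1) * size
        # Train on everything outside the fold, validate on the fold.
        w = fit_ridge_1d(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        sq_errors.extend((w * x - y) ** 2 for x, y in zip(xs[lo:hi], ys[lo:hi]))
    return sum(sq_errors) / len(sq_errors)


def grid_search(xs, ys, grid, k=5):
    """Return the hyperparameter value with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 21)]
    ys = [2.0 * x for x in xs]  # hypothetical noiseless data
    print(grid_search(xs, ys, [0.0, 1.0, 10.0, 100.0]))
```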
François, et al. Expires 12 January 2023 [Page 12]
Internet-Draft Coupling AI and network management July 2022
This sub-challenge holds for all fields of AI: supervised learning
(regression and classification), self-supervised learning,
unsupervised learning, and reinforcement learning are each broad and
rapidly growing fields. Selecting suitable algorithms and
hyperparameters to tailor AI approaches to the network management
problem is both an opportunity and a challenge. Here, future work
should further explore these trade-offs and provide clearer
guidelines on how to navigate them for different network management
tasks.
6.2. Performance of produced models
From a general point of view, any AI technique will produce results
with a certain level of quality. This leads to three inherent
questions: (1) How is performance defined in the context of an NM
application? (2) How can it be measured? (3) How can the quality of
the produced results be ensured or improved?
Many metrics have already been defined to evaluate the performance
of AI-based techniques with regard to their NM-level objectives. For
example, QoS metrics (throughput, latency) can serve to measure the
performance of a routing algorithm, along with its computational
complexity (memory consumption, size of routing tables). The
question is how to model and measure these two antagonistic types of
metrics. The numbers of true/false positives/negatives are the most
basic metrics for network attack detection functions. While the
first two questions thus already have answers, even if these can be
improved, question (3) concerns the integration of metrics into AI
algorithms. Its objective is to obtain the best results as
quantified by these metrics. Depending on the type of algorithm,
these metrics are evaluated either online, with a feedback loop (for
example, with reinforcement learning), or in batch, to optimize a
model for a particular context (for example, described by a dataset
for machine learning).
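For attack detection, the basic counts mentioned above translate
directly into derived rates. The following is a minimal sketch. It
assumes binary labels where 1 marks an attack, and the example labels
are made up.

```python
def detection_metrics(y_true, y_pred):
    """Confusion counts and derived rates for a binary detector (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": ratio(tp, tp + fp),  # share of alarms that are real attacks
        "recall": ratio(tp, tp + fn),     # share of attacks that raise an alarm
        "false_positive_rate": ratio(fp, fp + tn),
    }


if __name__ == "__main__":
    print(detection_metrics([1, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1]))
```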
The problem is two-fold. First, performance can be measured through
multiple metrics of different types (numerical or ordinal, for
example), and some can be constrained by fixed boundaries (such as a
maximum latency), which makes their joint use challenging when
creating an AI model to solve an NM problem. Second, metrics differ
from each other in scale, importance, or impact, and their ranges
may vary across domains. It can be hard to assess precisely what a
good or bad value is (as it might depend on multiple other values),
and it is even more difficult to integrate such an assessment into
an AI technique, especially for learning algorithms that adjust
their models based on performance. Indeed, learning algorithms run
through multiple iterations and rely on internal metrics (MAE or
(R)MSE for neural networks, Gini index or entropy for decision
trees, distance to a hyperplane for SVMs, etc.) that are not
strongly correlated with the final metrics of the application. For
instance, a decision tree algorithm for classification aims to
create branches that each contain as much data as possible from a
single class, thus avoiding mixing classes. This is done using a
criterion such as entropy, but such a criterion makes no distinction
between mixing classes A and B and mixing classes A and C. Now
assume that, from an operational point of view, confusing A and B in
the predictions is not critical. The algorithm should then prefer to
mix A and B rather than A and C, even if the former produces more
errors.
Therefore, the internal functioning of AI algorithms should be
refined, in this case by defining a dedicated criterion to replace
entropy as the quality measure when splitting a node into two
branches. This requires integrating the final NM objectives at this
stage.
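One way to build such a criterion is to weight class mixing by an
operational cost matrix. The sketch compares entropy with a
cost-weighted Gini impurity, defined as the sum over i != j of
cost[i][j] * p_i * p_j. This reduces to the ordinary Gini index when
all off-diagonal costs are 1. It is an illustrative choice, not a
criterion prescribed by this document, and the cost values are
hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the class distribution in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def cost_weighted_gini(labels, cost):
    """Sum over class pairs i != j of cost[i][j] * p_i * p_j."""
    n = len(labels)
    p = {k: c / n for k, c in Counter(labels).items()}
    return sum(cost[i][j] * p[i] * p[j] for i in p for j in p if i != j)


# Hypothetical operational costs: confusing A and B is cheap, A/B vs C is not.
COST = {
    "A": {"A": 0.0, "B": 0.1, "C": 1.0},
    "B": {"A": 0.1, "B": 0.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "C": 0.0},
}

if __name__ == "__main__":
    node_ab = ["A"] * 5 + ["B"] * 5
    node_ac = ["A"] * 5 + ["C"] * 5
    print(entropy(node_ab), entropy(node_ac))
    print(cost_weighted_gini(node_ab, COST), cost_weighted_gini(node_ac, COST))
```

Entropy scores both 50/50 nodes identically (1 bit). The
cost-weighted impurity is ten times lower for the A/B mix, so a tree
grown with it would prefer splits that leave A and B together.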
Another concrete example is traffic predictors, which aim to
forecast traffic demands. They only produce an input that is not
necessarily easy to interpret and use by, e.g., capacity allocation
strategies/policies. A traditional traffic prediction that tries to
minimize the (perfectly symmetric) MAE/MSE treats positive and
negative errors identically, and hence is agnostic to the different
meanings (and costs) of under- and over-provisioning. Moreover, such
a prediction does not provide any information on, e.g., how to
dimension resources/capacity to accommodate future demand while
avoiding all under-provisioning (which entails service disruption)
and minimizing over-provisioning (i.e., wasting resources). In other
words, it forces the operator to guess the over-provisioning by
applying uninformed safety margins. A more sensible approach here is
to directly forecast the needed capacity rather than the traffic
[Beg19].
While the above is only one example, the high-level challenge is
devising forecasting models that minimize the correct objective/loss
function for the specific NM task at hand (instead of a generic
MAE/MSE). In this way, the prediction phase becomes an integral part
of NM, and not just a (limited and hard-to-use) input to it. In ML
terms, this maps to solving the loss-metric mismatch in the context
of anticipatory NM [Hua19].
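A common way to encode asymmetric costs is the quantile (pinball)
loss. It is used here as an illustrative stand-in, not as the loss of
[Beg19]. With parameter tau, each unit of under-provisioning costs
tau and each unit of over-provisioning costs 1 - tau. The constant
that minimizes this loss is the tau-quantile of demand, whereas MSE
is minimized by the mean. The demand series and tau = 0.85 are
hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss: under-provisioning (demand above the
    forecast) costs tau per unit, over-provisioning costs (1 - tau) per unit."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)


def mse(y_true, y_pred):
    """Symmetric mean squared error, for comparison."""
    return sum((y - q) ** 2 for y, q in zip(y_true, y_pred)) / len(y_true)


def best_constant(demand, loss, candidates):
    """Capacity level (from candidates) that minimizes the given loss."""
    return min(candidates, key=lambda c: loss(demand, [c] * len(demand)))


if __name__ == "__main__":
    demand = [10, 12, 11, 30, 13, 12, 11, 14, 12, 35]  # hypothetical
    print(best_constant(demand, mse, range(41)))
    print(best_constant(demand, lambda t, p: pinball_loss(t, p, 0.85), range(41)))
```

With this sample series, MSE selects 16 units (the mean), which two
of the ten observed peaks exceed. The pinball loss with tau = 0.85
selects 30 units.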
Another issue for statistical learning (from examples/observations)
is that an estimator extracted from a finite set of input-output
samples, drawn from an unknown probability distribution, needs to be
descriptive enough for unseen input data. In this context, online
monitoring and error control of the properties of these point
estimators (bias, variance, mean squared error, etc.) are critical
in dynamic and uncertain network environments. The same reasoning
applies to interval estimates, i.e., confidence intervals
(frequentist) and credible intervals (Bayesian).
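Monitoring an estimator online can be done in constant memory per
stream. The sketch below tracks the running bias, variance, and MSE
of prediction errors with Welford's algorithm. It also reports an
approximate normal-theory confidence interval for the bias. The
z = 1.96 value (about 95% coverage) and the error stream are
assumptions.

```python
import math


class OnlineErrorMonitor:
    """Running statistics of prediction errors (prediction minus observation)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0  # running mean error, i.e., empirical bias
        self.mse = 0.0   # running mean of squared errors
        self._m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, error):
        # Welford's update: numerically stable, O(1) memory per stream.
        self.n += 1
        delta = error - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (error - self.mean)
        self.mse += (error * error - self.mse) / self.n

    @property
    def variance(self):
        """Unbiased sample variance of the errors."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def bias_interval(self, z=1.96):
        """Approximate confidence interval for the mean error (normal approximation)."""
        if self.n < 2:
            return (-math.inf, math.inf)
        half_width = z * math.sqrt(self.variance / self.n)
        return (self.mean - half_width, self.mean + half_width)


if __name__ == "__main__":
    monitor = OnlineErrorMonitor()
    for e in [1.0, -1.0, 2.0, 0.0]:
        monitor.update(e)
    print(monitor.mean, monitor.variance, monitor.mse, monitor.bias_interval())
```

An interval that stops covering zero as new errors arrive is one
simple signal that the estimator has become biased, for example after
a change in traffic patterns.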
6.3. Lightweight AI
Network management and operations often need to be performed under
strict time constraints, i.e., at line rate, in particular in the
context of autonomic or self-driven networks. Locating NM functions
as close as possible to where forwarding takes place is thus an
interesting option to avoid the additional delays incurred when
these operations are performed remotely, for example in a
centralized controller. In addition, forwarding devices may offer
available resources to supplement or replace edge resources. When
AI is coupled with network management, AI tasks can be offloaded to
network devices or, more generally, embedded within the network.
Time-critical tasks are naturally the best candidates to be
offloaded within the network. Costly learning tasks should be
processed on high-end servers, but the resulting models can be
deployed, configured, modified, and tuned in switches.
Recent advances in network programmability ease the programming of
specific tasks at the data-plane level. P4 [Bos14] is widely used
today for many tasks, including firewalling [Dat18] and bandwidth
management [Che19]. P4 is designed to be agnostic to specific
hardware. Actual switches have particular architectures, and the RMT
(Reconfigurable Match Table) [Bos13] model is generally accepted as
generic enough to represent a limited but essential set of switch
architecture components and functionalities. P4 is inspired by this
architecture. The RMT model allows reconfiguring match-action
tables, where actions can be the usual ones (rewrite some headers,
forward, drop, etc.). Actions are thus applied to packets as they
are forwarded. Actions can also be more complex programs, subject to
some safeguards: no loops, no recursion, etc. The impact on program
development is significant. For example, real-number (floating-point)
operations are not available by default, although they are essential
to many AI algorithms.
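A common workaround for the absence of floating point in the data
plane is fixed-point arithmetic. The sketch below is written in
Python for readability rather than in P4. It scores a linear
classifier using only integer multiplication, addition, and shift.
The Q8 format (SCALE_BITS = 8), the weights, and the features are
hypothetical. Some hardware targets restrict even integer
multiplication, so this only illustrates the number representation.

```python
SCALE_BITS = 8  # hypothetical Q8 format: real value = integer / 2**8


def to_fixed(x):
    """Quantize a real number to a Q8 integer (done offline, e.g., by a controller)."""
    return int(round(x * (1 << SCALE_BITS)))


def fixed_linear_score(features_fx, weights_fx, bias_fx):
    """Integer-only linear score. Each product of two Q8 values is in Q16,
    so the accumulated sum is shifted right by SCALE_BITS to return to Q8."""
    acc = 0
    for f, w in zip(features_fx, weights_fx):
        acc += f * w
    return (acc >> SCALE_BITS) + bias_fx


def classify(features, weights, bias):
    """Quantize inputs, then decide with integer arithmetic only."""
    score = fixed_linear_score(
        [to_fixed(f) for f in features],
        [to_fixed(w) for w in weights],
        to_fixed(bias),
    )
    return score > 0


if __name__ == "__main__":
    print(classify([0.5, 1.25], [2.0, -0.5], -0.25))
```

Here the integer score is 32, i.e., 32 / 256 = 0.125, which matches
the floating-point result 0.5 * 2.0 - 1.25 * 0.5 - 0.25 exactly
because all inputs are representable in Q8.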
In a nutshell, the first challenge in embedding AI in a network is
the capacity of the hardware to support AI operations (architectural
limitation). Using software equipment such as a virtual switch
simplifies the problem but does not fully resolve it since, even in
that case, strict line-rate requirements limit the types of programs
that can be executed. For example, BPF (Berkeley Packet Filter)
programs provide finer control over packet processing in OVS [Cha18]
but still have some limitations: their execution time is bounded by
design to ensure their termination, an essential requirement under
the run-to-completion model that enables high throughput.
The second challenge (resource limitation) of network-embedded AI is
to allocate enough resources to AI tasks while limiting the impact
on the other tasks of network devices, such as forwarding,
monitoring, and filtering. Approximation and/or optimization of AI
tasks are potential directions to help in this area. For instance,
many network monitoring proposals rely on sketches, with well-tuned
data-plane implementations [Liu16][Yan18]. However, no general
optimized AI-programmable abstraction exists that fits all cases,
and proposals are mostly use-case centric. Research in NM on this
issue can benefit from proposals in the field of embedded systems,
which faces the same issues. Binarization of neural networks is one
example [Lia18]. In addition, distributed processing is a common
technique to distribute the load of a single task among multiple
entities. Decomposition of AI tasks among network elements, edge
servers, and controllers has also been proposed [Gup18].
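Binarization replaces real-valued weights and activations with values
in {-1, +1}. A dot product then reduces to XNOR and popcount, which
suit constrained devices. The following is a minimal sketch of that
arithmetic only, not the specific scheme of [Lia18]. Vectors are
packed into Python integers, and the function names are illustrative.

```python
def pack(signs):
    """Pack a {-1, +1} vector into an int: bit i is 1 where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree. Each agreement adds +1
    and each disagreement adds -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * agreements - n


def binary_neuron(x_bits, w_bits, n, threshold=0):
    """Binarized activation: +1 if the dot product reaches the threshold, else -1."""
    return 1 if binary_dot(x_bits, w_bits, n) >= threshold else -1


if __name__ == "__main__":
    a, b = [1, -1, 1, 1], [1, 1, -1, 1]
    print(binary_dot(pack(a), pack(b), len(a)))
```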
6.4. AI for planning of actions
Many tasks in network management revolve around the planning of
actions with the purpose of optimizing a network and facilitating
the delivery of communication services. For example, paths need to
be planned and set up in ways that minimize wasted network resources
(to optimize cost) while facilitating high network utilization
(avoiding bottlenecks and the formation of congestion hotspots) and
ensuring resiliency (by making sure that backup paths are not
congruent with, i.e., do not share links or other resources with,
primary paths). Other examples were mentioned in Section 2.
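A simple, non-optimal way to obtain a primary path and a link-disjoint
backup is to compute a shortest path, remove its links, and search
again. The sketch below does this with breadth-first search on hop
counts over a hypothetical topology. This two-step heuristic can fail
on "trap" topologies, where a disjoint pair exists but the first path
blocks it. Suurballe's algorithm handles that case, which illustrates
how quickly such planning calls for better algorithms.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """Fewest-hop path from src to dst via BFS, ignoring links in `banned`.
    Links are undirected and represented as frozenset({u, v})."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None


def primary_and_backup(adj, src, dst):
    """Two-step heuristic: shortest primary path, then shortest path avoiding its links."""
    primary = shortest_path(adj, src, dst)
    if primary is None:
        return None, None
    used_links = {frozenset(link) for link in zip(primary, primary[1:])}
    return primary, shortest_path(adj, src, dst, banned=used_links)


# Hypothetical topology as a symmetric adjacency list.
TOPOLOGY = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
    "D": ["B", "F"], "E": ["C", "G"], "G": ["E", "F"], "F": ["D", "G"],
}

if __name__ == "__main__":
    print(primary_and_backup(TOPOLOGY, "A", "F"))
```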
The need for planning only increases with the rise of centralized
control planes. The promise of central control is that decisions can
be optimized when made with complete knowledge of relevant context,
as opposed to distributed control that needs to rely on local
decisions being made with incomplete knowledge while incurring higher
overhead to replicate relevant state across multiple systems.
However, as the scale of networks and interconnected systems
continues to grow, so does the size of the planning task. Many
problems are NP-hard. As a result, solutions typically need to rely
on heuristics and algorithms that often result in suboptimal outcomes
and that are challenging to deploy in a scalable manner.
The emergence of Intent-Based Networking emphasizes the need for
automated planning even further. The concept underlying "intent" is
that it should allow users (network operators, not end users of
communication services) to articulate desired outcomes without the
need to specify how to achieve those outcomes. An Intent-Based
System is responsible for translating the intent into courses of
action that achieve the desired outcomes and that continue to
maintain the outcomes over time. How the necessary courses of action
are derived and what planning needs to take place is left open, yet
this is where the real challenge lies. Solutions that rely on clever
algorithms devised by human developers face the same challenges as
any other network management task.
These properties (problems with a clearly defined need, whose
solutions face exploding search spaces, and that today rely on
algorithms and heuristics that in many cases result only in
suboptimal outcomes and significant limitations in scale) make
automated planning of actions an ideal candidate for the application
of AI-based solutions.
In the past, AI applications in network management have largely
focused on classification problems. Examples include the analysis of
traffic flow patterns by Intrusion Protection Systems to detect
suspicious traffic, the classification of encrypted traffic for
improved QoS treatment based on the suspected application type, and
the prediction of performance parameters based on observations. In
addition, AI has been used for troubleshooting and diagnostics, as
well as for automated help and customer support systems. However,
AI-based solutions for the automated planning of actions, including
the automated identification of courses of action, have so far not
been explored much.
A much-publicized leap in AI has been the development of AlphaGo.
Instead of using AI merely to solve classification problems, AlphaGo
has been successful in automatically deriving winning strategies for
board games, specifically the game of Go, which features a
prohibitively large search space that was long thought to put
world-class play beyond the reach of AI. Among the remarkable
aspects of AlphaGo (in particular of its successor, AlphaGo Zero) is
that it is able to identify winning strategies on its own, given
only the rules of the game, without needing those strategies to be
taught or learned from observation.
The challenge for AI in network management is hence: where is the
equivalent of AlphaGo that can be applied to network management (and
networking) problems? Specifically, better solutions are needed that
automatically derive plans and courses of action for network
optimization and similar NP-hard problems, which controllers and
management applications today address with only limited
effectiveness.
Also, evaluating AI algorithms that derive courses of action is more
complex than evaluating more common regression or classification
tasks. Actions need to be applied in order to observe the results
they lead to. However, contrary to game playing, solutions need to
be applied in the real world, where actions have real effects and
consequences. Several directions can be envisioned. First,
incremental application of AI decisions in small steps can allow
operators to carefully observe and detect unexpected effects. This
can be complemented with roll-back techniques. Second, formal
verification techniques can be leveraged to verify that decisions
made by AI remain within safety bounds. Third, sandbox environments
can be used, but they need to be sufficiently representative of the
real world. Building on progress in simulation and emulation, recent
research has led to the definition of digital twins, which imply a
tight coupling between a real system and its digital twin to ensure
parallel but synchronized execution. Alternatively, transfer
learning is another promising area, making it possible to capitalize
on ML models trained in a more generic sandbox environment when
operating a real-world system. Making the use of AI more acceptable
is also an open problem, as highlighted in Section 8.
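Incremental application with roll-back can be structured as a loop.
Each iteration applies one change and checks a safety condition. On
the first violation, the loop undoes everything applied so far. The
following is a minimal sketch. The configuration keys, the capacity
check within_capacity(), and its 100 Mb/s limit are hypothetical
stand-ins for real observations of the network after each step.

```python
_MISSING = object()  # marks keys that did not exist before a change


def within_capacity(config, capacity=100):
    """Hypothetical safety check: total committed rate (Mb/s) must fit the link."""
    return sum(config.values()) <= capacity


def apply_incrementally(config, changes, is_safe):
    """Apply (key, value) changes one at a time; roll back all of them on the
    first unsafe state. Returns True if every change was kept."""
    undo_log = []
    for key, value in changes:
        undo_log.append((key, config.get(key, _MISSING)))
        config[key] = value
        if not is_safe(config):
            # Restore previous values in reverse order of application.
            for k, old in reversed(undo_log):
                if old is _MISSING:
                    del config[k]
                else:
                    config[k] = old
            return False
    return True


if __name__ == "__main__":
    cfg = {"gold": 30, "silver": 40, "bronze": 20}
    ok = apply_incrementally(cfg, [("gold", 40), ("silver", 50)], within_capacity)
    print(ok, cfg)
```

In this example, the first change keeps the total at the limit. The
second change exceeds it, so both are undone and the original
configuration is restored.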
7. Network data as input for ML algorithms
Many applications of AI take data as input. The quality of the
outputs of ML-based techniques is highly dependent on the quality
and quantity of the data used for learning, but also on other
parameters. For example, as modern network infrastructures move
towards higher speed and scale, they aim to support increasingly
demanding services with strict performance guarantees. These often
require resource reconfigurations at run time, in response to
emerging network events, so that reliable delivery at the expected
performance level can be ensured. Timely observation and detection
of events is also of paramount importance for security purposes and
can allow faster execution of remedial actions, thus reducing
service downtime.
Thus, the challenge of data management is multifaceted, as detailed
in the next subsections.
7.1. Data for AI-based NM solutions
For a given network management application, the first problem to
address is to define which data to collect in order to obtain
accurate results. This data selection can require defining
problem-specific data or features (feature engineering).
Firstly, NM has already produced many methods and technologies to
acquire data. However, in most cases, their goal was not to support
AI problems, which leads to a mismatch. Indeed, machine learning
algorithms only work as desired when the data to be analyzed
satisfies certain properties. Many methods rely on vector-based
distances, which assumes that the data encoded into the vectors
respects the semantics of the underlying distance. Taking the first
n bytes of a packet as vectors and computing distances accordingly
is possible but does not capture the semantics of the information
carried in the headers. For example, (deep) learning techniques
mostly rely on vectors of (real) numbers as input, which suits some
metrics (packet/byte counts, latency, delays, etc.) but requires
some adjustment for categorical features (IP addresses, port
numbers, etc.) or topological features. Conversions are usually
applied using common techniques such as one-hot encoding or
coarse-grained representations [Sco11]. However, more advanced
techniques have recently been proposed to embed representations of
network entities rather than relying on pure encoding
[Rin17][Evr19][Sol20]. Data to be handled can also be schema-free or
possibly text-based. One example could be the automated annotation
of management intents provided in an unstructured textual format
(policy descriptions, specifications, etc.) to extract management
entities and operations from them. For that purpose, suitable
annotation models need to be built using existing NER (Named Entity
Recognition) techniques usually applied in NLP. However, these
models need to be carefully crafted or specialized for the network
management (intent) language, which indirectly leads back to the
challenges of AI techniques for NM described earlier.
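The one-hot and coarse-grained conversions mentioned above can be
sketched with the Python standard library. The port vocabulary, the
/24 prefix length, and the flow-record fields are hypothetical. The
resulting prefix string would itself still need encoding, for example
one-hot over the prefixes seen in training.

```python
import ipaddress

# Hypothetical vocabulary of ports that get their own one-hot slot.
WELL_KNOWN_PORTS = (22, 53, 80, 443)


def one_hot_port(port):
    """One-hot vector over WELL_KNOWN_PORTS plus a final 'other' slot."""
    vec = [0.0] * (len(WELL_KNOWN_PORTS) + 1)
    if port in WELL_KNOWN_PORTS:
        vec[WELL_KNOWN_PORTS.index(port)] = 1.0
    else:
        vec[-1] = 1.0
    return vec


def coarse_prefix(addr, prefix_len=24):
    """Coarse-grained view of an IP address: the enclosing network prefix."""
    return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))


def encode_flow(flow):
    """Numeric counters are kept as-is; the destination port is one-hot encoded."""
    numeric = [float(flow["packets"]), float(flow["bytes"]), float(flow["latency_ms"])]
    return numeric + one_hot_port(flow["dst_port"])


if __name__ == "__main__":
    flow = {"src_ip": "192.0.2.77", "dst_port": 443,
            "packets": 10, "bytes": 1500, "latency_ms": 2.5}
    print(encode_flow(flow), coarse_prefix(flow["src_ip"]))
```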
Secondly, the behavior of any network is not just derived from the
events that can be directly observed, such as network traffic
overload, but also from events occurring outside the environment of
the network. The information provided by detectors of such kinds of
events, e.g., natural incidents (earthquakes, storms), can be used
to determine how to adapt the network to avoid potential problems
derived from such events. This information can be provided by Big
Data sources as well as by sensors of many kinds. The AI challenge
related to this task is to process large amounts of data and
associate them with the effects that those events have on the
network. It is hard to determine the static and dynamic relations
between the data provided by external sources and their specific
implications for networks. For instance, the effect of a "flash
crowd" detected in an external source depends on the relation of a
particular network to such an event. This can be addressed by AI and
its particular application to network management. The objective is
to complement a control loop, as shown in [Mar18], by including the
specific AI engines in the decision components as well as in the
processes that close the loop, so that the AI engine can receive
feedback from the network in order to improve its own behavior.
Similar challenges are addressed in other domains, such as image
processing and computer vision, by using artifacts for anticipating
movements in object location and identification.
7.2. Data collection
Once the data has been defined, the second problem to address is
its collection. Monitoring frameworks such as IPFIX [RFC7011] have
been developed for many years, and more recently SDN-based
monitoring solutions have emerged [Yu14][Ngu20]. However, moving
towards more AI for actions in network management also requires
retrieving more than traffic-related information. Indeed,
configuration information such as topologies, routing tables, or
security policies has been shown to be relevant in specific
scenarios. As a result, many different technologies can be used to
retrieve meaningful data. To support improved QoE, monitoring of the
application layer is helpful but far from easy, given the
heterogeneity of end-user applications and the wide use of encrypted
channels. Monitoring techniques need to be reinvented, either
through the definition of new techniques that extract knowledge from
raw measurements [Bri19] or by involving end users through
crowd-sourcing [Hir15] and distributed monitoring.
The requirements on the collection process depend on the kind of
processing. We can distinguish two major classes: batch/offline
vs. real-time/online processing. In particular, real-time monitoring
tools are key in enabling dynamic resource management functions to
operate on short reconfiguration cycles. However, maintaining an
accurate view of the network state requires a vast amount of
information to be collected and processed. While efficient
mechanisms that extract raw measurement data at line rate have
recently been developed, the processing of collected data is still
a costly operation. This involves evaluating and aggregating a vast
amount of state information in response to a diverse set of
monitoring queries before generating accurate reports. Machine
learning methods, e.g., based on regression, can be used to
intelligently filter the raw measurements and thus reduce the volume
of data to process. For example, in [Tan20] the authors propose an
approach in which the classifiers derived for this purpose
(according to measurements of traffic properties) can achieve a
threefold improvement in query processing capability. A residual
question is the storage of raw measurements. In fact, predicting the
lifetime of data is challenging because their analysis may not be
planned but rather triggered by a particular event (for example, an
anomaly or attack). As a result, provisioning storage capacity can
be hard.
In parallel with the continuously increasing dynamicity of networks
and complexity of traffic, there is a trend towards more
customization of user traffic processing [RFC8986][Li19]. As a
result, fine-grained information about network element states is
expected, and new proposals have emerged to collect on-path data or
in-band network telemetry information [Tan20b]. These new approaches
have been designed with considerable flexibility and customization
and could usefully be combined with AI applications. However, the
seamless coupling of telemetry processes with packet forwarding
requires careful definition of solutions that limit the overhead and
the impact on throughput while providing the necessary level of
detail. This shares commonalities with the lightweight AI challenge.
7.3. Usable data
Although there is broad agreement on the need for more shared
datasets, sharing remains quite uncommon in practice. Data contains
private or sensitive information and may not be shared because of
its criticality (it can be used by ill-intentioned adversaries) or
due to laws or regulations, even within the same company. To solve
this issue, anonymization techniques [Dij19] can be enhanced to
optimize the trade-off between valuable data and the (potential)
leakage or reconstruction of sensitive information. Whoever the
final user of the data is, regulations and laws impose rules on data
management, with potentially costly consequences if they are not
respected, whether intentionally or not. Defining a new monitoring
framework should always take security and privacy aspects into
account, for example letting any user/customer access or remove
their own data, as required by the General Data Protection
Regulation (GDPR) in the EU. The challenge here lies in the capacity
to determine what is critical or private information and in
assessing the capacity of an adversary to reconstruct it from other
sources of data. Hence, AI/ML-based solutions will require more data
but also more administrative, legal, and ethical procedures. These
can take a long time and thus slow down the deployment of a new
solution. In addition, this requires interaction with experts from
different domains (e.g., AI engineers and lawyers). The integration
of these non-technical constraints should be considered when
defining new data to be collected or a new technique to collect
data. However, knowing the final use of the data is usually
necessary for ethical and legal assessment, which implies that these
considerations need to be integrated from the early design stages of
new AI-based solutions.
For supervised or semi-supervised training, having a labeled dataset
is a prerequisite. This constitutes a major challenge as well. On one
hand, collectors are able to retrieve data. On the other hand, those
network data are typically unlabeled. This limits the application of
ML to unsupervised learning tasks (learning from data). Because
manual labeling is a tedious task, one option is to leverage AI to
guide humans. This may also support better generalization of a
learned model. Indeed, an underlying challenge is the genericity or
coverage of the datasets. Labels encode values of an objective
function, and the challenge posed by the design of such tools is
tremendous since it involves an M:N relationship: one data type may
be associated with M objective function values, and N data types may
be associated with one objective function. As a result, most
datasets used for research encode a single label for a particular
application, such as an attack label for datasets used in the
context of intrusion detection or an application type for network
traffic used for classification, whereas the value of a single
dataset could be capitalized on in several applications.
Again, researchers need empirical (or at least realistic) datasets
to validate their solutions. Unfortunately, as highlighted above,
obtaining such data from real deployments is tough for various
reasons (business secrets, privacy concerns, concerns that
vulnerabilities are revealed by accident, raw unlabeled data, etc.).
Even if such a dataset is available, it might not be enough to
convincingly validate a new algorithm. Instead of falling back to
artificial testbed experiments or simulation, it would be useful to
be able to generate datasets with characteristics that are not
identical but similar to those of one or more real datasets. Such
synthetic data can be used to validate new management algorithms,
intrusion detection systems, etc. The usage of AI (for example,
GANs) in this area [Hui22] is not yet widespread, and there are
still many concerns that deter researchers, e.g., the fear of
leaking sensitive information from the original dataset into the
synthetic dataset.
8. Acceptability of AI
Networks are critical infrastructures. On one hand, they need to be
operated without interruption and must be interoperable. Networks,
except in a lab, are not isolated, which slows down innovation in
general. For example, changes to Internet routing protocols need to
be accepted by all. The same applies to other protocols. Even
though there have been several versions of major protocols in use,
such as TCP or DNS, there are still some security issues that cannot
be patched with a 100% guarantee. On the other hand, results
provided by AI solutions are uncertain by nature. The same technique
applied in different environments can produce different results. AI
techniques require some effort (time and human) to be properly
configured or stabilized. For instance, reinforcement learning needs
several iterations before being able to produce acceptable results.
These properties of AI techniques are thus somewhat at odds with the
criticality of network infrastructures. With that in mind, the
acceptability of AI to network operators is clearly an obstacle to
its wider adoption.
8.1. Explainability of Network-AI products
A common issue across Machine Learning (ML) applications is that
models are black boxes. This means that, after training, the
knowledge acquired by ML models is unintelligible to humans. As a
result, offering hard guarantees on performance is a very
challenging issue. In addition, complex ML models such as neural
networks, which often have hundreds of thousands of parameters or
more, are very hard to debug or troubleshoot in case of failure.
While this is a common issue for all applications of AI, many areas
cope well with uncertainty and the black-box behavior of AI-based
solutions. For instance, users accept an inherent error in
recommender systems or computer vision solutions.
The networking field has already produced a set of well-established
network management algorithms and methods, with clear performance
guarantees and troubleshooting mechanisms [Rex06][Kr14]. As such,
improving debugging, troubleshooting and guarantees on AI-based
solutions for networking is a must.
AI researchers and practitioners are devoting large research efforts
to improving this aspect of ML models, which is commonly known as
explainability [XAI].
This set of techniques provides insights and, in some cases,
guarantees on the performance and behavior of ML-based solutions.
Understanding such techniques, researching and applying them to
network AI is critical for the success of the field.
There exist several ML-based methods that are human-understandable,
although they are not widely used today. For instance, [Mar20] shows
a method for building anticipation (prediction) models that provide
explanations while determining actions for tuning some parameters of
the network. Other challenges need to be addressed, such as
providing explanations for other ML methods that are widely used.
For instance, xNN/SVM models can be accompanied by digital twins of
the network that are explored in reverse to explain some outputs of
the ML model (e.g., xNN/SVM). In this
context, there already exist several methods [Zil20][Puj21] that
produce human-readable interpretations of trained NN models, by
analyzing their neural activations on different inputs.
8.2. AI-based products and algorithms in production systems
AI-based network management and optimization algorithms are first
trained, then the resulting model is used to produce relevant
inferences in operation, either in management or optimization
scenarios. A relevant question for the success of AI-based solutions
is: where does this training occur?
Traditionally, AI-based models have been trained in the same scenario
where they operate [Val17][Xu18], i.e., the customer network.
However, this presents critical drawbacks. First, training an AI
model for management and operation typically requires generating
network configurations and scenarios that can break the network.
This is because training requires seeing a broad spectrum of
scenarios. Thus, it is not feasible in production networks. Second,
customer networks may not be equipped with the monitoring
infrastructure required to collect the data used in the training
process (e.g., performance metrics).
A more sensible approach is to train the AI-based product in a lab,
for instance on the vendor’s premises. In the lab, AI models can be
trained in a controlled testbed, with any configuration, even ones
that break the network. However, the main challenge here arises from
the fundamental differences between the lab’s network and customer
networks. For instance, the lab’s network topology might be smaller
than a customer’s. As a result, there is a need for models that are
able to generalize. In this context, generalization means that
models should be able to operate in scenarios not seen during
training, with different topologies, routing configurations,
scheduling policies, etc.
To address this generalization problem, two main approaches are
possible. The first one is Transfer Learning [tl1]. With this
technique, the knowledge gained during training in the lab is used to
operate in the customer network. Transfer Learning still requires
that some data from the customer be used to re-train the model (e.g.,
accurate performance measurements). This means that re-training is
required for each customer network. This has important drawbacks:
it represents an added cost, and access to customer data might be
problematic.
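A minimal sketch of the transfer-learning workflow described above,
assuming a one-feature linear model fitted by gradient descent. The
model is first trained on plentiful lab data. It is then fine-tuned
on a small customer sample, starting from the lab weights rather than
from scratch. The data generator, the linear form, and all numbers
are hypothetical simplifications of real network models.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (normalized load, observed delay) -- hypothetical


def make_data(n: int, slope: float, intercept: float, seed: int) -> List[Sample]:
    """Hypothetical generator: delay grows linearly with load, plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, slope * x + intercept + rng.gauss(0.0, 0.1)))
    return data


def train(data: List[Sample], w: float = 0.0, b: float = 0.0,
          lr: float = 0.5, epochs: int = 200) -> Tuple[float, float]:
    """Batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def mse(data: List[Sample], w: float, b: float) -> float:
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    lab = make_data(500, slope=5.0, intercept=1.0, seed=1)       # plentiful lab data
    customer = make_data(20, slope=5.5, intercept=1.5, seed=2)   # scarce customer data
    held_out = make_data(200, slope=5.5, intercept=1.5, seed=3)  # customer evaluation set

    w_lab, b_lab = train(lab, epochs=300)
    w_ft, b_ft = train(customer, w=w_lab, b=b_lab, epochs=20)    # transfer: start from lab weights
    w_sc, b_sc = train(customer, epochs=20)                      # same budget, from scratch

    print("lab model as is:", round(mse(held_out, w_lab, b_lab), 3))
    print("fine-tuned     :", round(mse(held_out, w_ft, b_ft), 3))
    print("from scratch   :", round(mse(held_out, w_sc, b_sc), 3))
```

Even this toy shows the trade-off in the text. Starting from the lab
weights needs far less customer data and compute than training from
scratch. Some customer data is still needed to close the gap left by
the lab model.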
A different approach is to use Graph Neural Networks (GNNs)
[gnn1][gnn2]. GNNs are a novel type of neural network able to
operate and generalize over graphs. This matters because networks are
fundamentally represented as graphs: topology, routing, etc. With
GNNs, vendors can train the AI model in a lab and then use the
resulting model, as is, in different customer networks, without
additional re-training using customer data.
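A GNN can be reused across topologies because its parameters belong
to update and message functions shared by every node, not to a
fixed-size input vector. The sketch below shows a few rounds of
mean-aggregation message passing in plain Python. The two scalar
weights stand in for learned parameters. The node features (e.g.,
offered load) and both topologies are hypothetical, and this is not
the specific architecture of [gnn2].

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbours


def message_passing(graph: Graph, features: Dict[str, float],
                    w_self: float, w_neigh: float, rounds: int = 2) -> Dict[str, float]:
    """Mean-aggregation message passing with weights shared by all nodes.

    Each round: h_v <- relu(w_self * h_v + w_neigh * mean(h_u for u in N(v))).
    The same two weights apply whatever the size or shape of the graph.
    """
    h = dict(features)
    for _ in range(rounds):
        new_h = {}
        for node, neighbours in graph.items():
            agg = sum(h[u] for u in neighbours) / len(neighbours) if neighbours else 0.0
            new_h[node] = max(0.0, w_self * h[node] + w_neigh * agg)
        h = new_h
    return h


if __name__ == "__main__":
    # Hypothetical parameters, "learned" once (e.g., in a lab).
    W_SELF, W_NEIGH = 0.5, 0.5
    lab_ring = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    customer_line = {"p": ["q"], "q": ["p", "r"], "r": ["q", "s"], "s": ["r"]}
    # The same parameters run unchanged on a different, larger topology.
    print(message_passing(lab_ring, {"a": 1.0, "b": 0.0, "c": 0.0}, W_SELF, W_NEIGH))
    print(message_passing(customer_line, {"p": 1.0, "q": 0.0, "r": 0.0, "s": 0.0},
                          W_SELF, W_NEIGH))
```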
8.3. AI with humans in the loop
Depending on the network management task, AI can automate and replace
manual human control, or it can complement human experts and keep them
in the loop. Keeping humans in the loop will be an important step in
building trust in AI approaches and will help ensure the desired
outcomes. There are various ways of keeping humans in the loop in the
different fields of AI, which could be useful for different aspects
of network management.
In classification tasks (e.g., detecting security breaches, malware,
or anomalies), trained AI models provide a confidence score in
addition to the predicted class. If the confidence is high, the
prediction is used directly. If the confidence is too low, a human
expert may step in and make the decision, thereby also providing
valuable training data to improve the AI model. Such approaches are
already used in industry, e.g., to automatically label datasets
(AWS SageMaker). Similar approaches could also be used for other
supervised learning tasks, e.g., regression. Still, keeping humans in
the loop in all phases of the learning process remains an open
challenge.
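The confidence-based hand-off described above can be sketched as
follows, assuming a classifier that returns a label and a confidence
in [0, 1]. Predictions at or above a threshold are applied
automatically. The rest go to a human expert, whose decisions are
kept as new labelled training data. The threshold value, the toy
model, and the `ask_expert` callback are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


@dataclass
class HumanInTheLoopClassifier:
    classify: Callable[[dict], Prediction]  # trained model (hypothetical)
    ask_expert: Callable[[dict], str]       # human decision (hypothetical)
    threshold: float = 0.9                  # assumed operating point
    new_labels: List[Tuple[dict, str]] = field(default_factory=list)

    def decide(self, sample: dict) -> Tuple[str, str]:
        """Return (label, source), where source is 'model' or 'human'."""
        label, confidence = self.classify(sample)
        if confidence >= self.threshold:
            return label, "model"
        human_label = self.ask_expert(sample)
        self.new_labels.append((sample, human_label))  # kept for re-training
        return human_label, "human"


if __name__ == "__main__":
    def toy_model(flow: dict) -> Prediction:
        # Hypothetical scoring on packets per second (pps).
        if flow["pps"] > 10_000:
            return "anomaly", 0.97
        if flow["pps"] < 1_000:
            return "normal", 0.95
        return "anomaly", 0.6  # ambiguous region: low confidence

    hitl = HumanInTheLoopClassifier(toy_model, ask_expert=lambda flow: "normal")
    for flow in ({"pps": 50_000}, {"pps": 200}, {"pps": 4_000}):
        print(flow, hitl.decide(flow))
    print("labels collected for re-training:", hitl.new_labels)
```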
Another field of AI is reinforcement learning, which is useful for
making continuous control decisions in network management, e.g.,
controlling service scaling and placement as well as flow scheduling
and routing over time. Reinforcement learning agents typically
interact with the environment (i.e., the simulated or real network)
completely autonomously, without human feedback. However, there is a
growing number of approaches that put human experts back into the
loop. One approach is offline reinforcement learning, where the
training data does not come from the reinforcement learning agent’s
own exploration but from pre-recorded traces of human experts (e.g.,
placement decisions that were made by humans before). Another
approach is to reward the reinforcement learning agent based on human
feedback rather than a pre-defined reward function [Lee21]. Again,
while there are first promising approaches, more work is required in
this area. Overall, it remains an open challenge to leverage the
benefits of AI while keeping human experts in the loop where useful.
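To make the offline setting concrete, the sketch below learns a
tabular Q-function purely from a fixed log of transitions, with no
further interaction with the network. The log is assumed to hold
scaling decisions previously taken by human operators. The method is
plain Q-learning swept repeatedly over the log. The states, actions,
and rewards are hypothetical. Practical offline RL methods add
safeguards against actions the log does not cover.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (state, action, reward, next_state) recorded from human operators (hypothetical)
Transition = Tuple[str, str, float, str]


def offline_q_learning(log: List[Transition], actions: List[str],
                       gamma: float = 0.9, alpha: float = 0.5,
                       sweeps: int = 200) -> Dict[Tuple[str, str], float]:
    """Repeatedly apply Q-learning updates over a fixed log; no new exploration."""
    q: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in log:
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_policy(q: Dict[Tuple[str, str], float], state: str, actions: List[str]) -> str:
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    actions = ["scale_out", "keep"]
    human_log = [
        ("high_load", "scale_out", 1.0, "normal"),
        ("high_load", "keep", -1.0, "high_load"),
        ("normal", "keep", 0.5, "normal"),
        ("normal", "scale_out", -0.2, "normal"),  # wasteful over-provisioning
    ]
    q = offline_q_learning(human_log, actions)
    for state in ("high_load", "normal"):
        print(state, "->", greedy_policy(q, state, actions))
```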
9. Security Considerations
TODO Security
10. IANA Considerations
This document has no IANA actions.
11. References
11.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC7011] Claise, B., Trammell, B., and P. Aitken, "Specification of
the IP Flow Information Export (IPFIX) Protocol for the
Exchange of Flow Information", STD 77, RFC 7011,
DOI 10.17487/RFC7011, September 2013,
<https://www.rfc-editor.org/info/rfc7011>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8986] Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,
D., Matsushima, S., and Z. Li, "Segment Routing over IPv6
(SRv6) Network Programming", RFC 8986,
DOI 10.17487/RFC8986, February 2021,
<https://www.rfc-editor.org/info/rfc8986>.
11.2. Informative References
[Abd10] Jalil, K. A., Kamarudin, M. H., and M. N. Masrek, "A
Diagnosis Expert System for Network Traffic Management",
2010. IEEE International Conference on Networking and
Information Technology
[Beg19] Bega, D., Gramaglia, M., Fiore, M., Banchs, A., and X.
Costa-Perez, "DeepCog: Cognitive Network Management in
Sliced 5G Networks with Deep Learning", 2019. IEEE
INFOCOM
[Bos13] Bosshart, P., Gibb, G., Kim, H.-S., Varghese, G., McKeown,
N., Izzard, M., Mujica, F., and M. Horowitz, "Forwarding
metamorphosis: Fast programmable match-action processing
in hardware for SDN", 2013. ACM SIGCOMM
[Bos14] Bosshart, P., Daly, D., Gibb, G., Izzard, M., McKeown,
N., Rexford, J., Schlesinger, C., Talayco, D., Vahdat, A.,
Varghese, G., and D. Walker, "P4: programming protocol-
independent packet processors", 2014. SIGCOMM Comput.
Commun. Rev. 44
[Bou18] Boutaba, R., Salahuddin, M. A., Limam, N., Ayoubi, S.,
Shahriar, N., Estrada-Solano, F., and O. M. Caicedo, "A
comprehensive survey on machine learning for networking:
evolution, applications and research opportunities", 2018.
Journal of Internet Services and Applications 9, 16
[Bri19] Brissaud, P.-O., François, J., Chrisment, I., Cholez, T.,
and O. Bettan, "Transparent and Service-Agnostic
Monitoring of Encrypted Web Traffic", 2019. IEEE
Transactions on Network and Service Management, 16 (3)
[Cha18] Chaignon, P., Lazri, K., François, J., Delmas, T., and O.
Festor, "Oko: Extending Open vSwitch with Stateful
Filters", 2018. ACM Symposium on SDN Research (SOSR)
[Che19] Chen, Y., Yen, L., Wang, W., Chuang, C., Liu, Y., and C.
Tseng, "P4-Enabled Bandwidth Management", 2019. Asia-
Pacific Network Operations and Management Symposium
(APNOMS)
[czb20] Clemm, A., Zhani, M. F., and R. Boutaba, "Network
Management 2030: Operations and Control of Network 2030
Services", 2020. Springer Journal of Network and Systems
Management (JNSM)
[Dat18] Datta, R., Choi, S., Chowdhary, A., and Y. Park,
"P4Guard: Designing P4 Based Firewall", 2018. IEEE
Military Communications Conference (MILCOM)
[Dij19] Dijkhuizen, N. V., Ham, J. V. D., and X. Li, "A Survey of
Network Traffic Anonymisation Techniques and
Implementations", 2014. ACM Comput. Surv. 51, 3, Article
52
[Evr19] Evrard, L., François, J., Colin, J.-N., and F. Beck,
"port2dist: Semantic Port Distances for Network
Analytics", 2019. IFIP/IEEE Symposium on Integrated
Network and Service Management (IM)
[gnn1] Battaglia, P. W., et al., "Relational inductive biases,
deep learning, and graph networks", 2018. arXiv preprint
arXiv:1806.01261
[gnn2] Rusek, K., Suárez-Varela, J., Mestres, A., Barlet-Ros, P.,
and A. Cabellos-Aparicio, "Unveiling the potential of
Graph Neural Networks for network modeling and
optimization in SDN", 2019. ACM Symposium on SDN Research
[Gup18] Gupta, A., Harrison, R., Canini, M., Feamster, N.,
Rexford, J., and W. Willinger, "Sonata: query-driven
streaming network telemetry", 2018. ACM SIGCOMM
Conference
[Hir15] Hirth, M., Hossfeld, T., Mellia, M., Schwartz, C., and F.
Lehrieder, "Crowdsourced network measurements: Benefits
and best practices", 2015. Computer Networks. 90
[Hoo18] Hooft, J. V. D., Claeys, M., Bouten, N., Wauters, T.,
Schönwälder, J., Pras, A., Stiller, B., Charalambides, M.,
Badonnel, R., Serrat, J., Santos, C. R. P. D., and F. D.
Turck, "Updated Taxonomy for the Network and Service
Management Research Field", 2018. Journal of Network
and Systems Management (JNSM) 26, 790-808
[Hua19] Huang, C., Zhai, S., Talbott, W., Bautista, M. A., Sun,
S.-Y., Guestrin, C., and J. Susskind, "Addressing the
Loss-Metric Mismatch with Adaptive Loss Alignment", 2019.
ICML
[Hui22] Hui, S., Wang, H., Wang, Z., Yang, X., Liu, Z., Jin, D.,
and Y. Li, "Knowledge Enhanced GAN for IoT Traffic
Generation", 2022. ACM Web Conference 2022 (WWW)
[Kaf19] Kafle, V. P., Martinez-Julia, P., and T. Miyazawa,
"Automation of 5G Network Slice Control Functions with
Machine Learning", 2019. IEEE Communications Standards
Magazine, vol. 3, no. 3, pp. 54-62
[Kr14] Kreutz, D., Ramos, F. M., Verissimo, P. E., Rothenberg, C.
E., Azodolmolky, S., and S. Uhlig, "Software-defined
networking: A comprehensive survey", 2015. Proceedings of
the IEEE, vol. 103, no. 1, pp. 14-76
[Lee21] Lee, K., Smith, L., and P. Abbeel, "Feedback-efficient
interactive reinforcement learning via relabeling
experience and unsupervised pre-training", 2021. arXiv
preprint arXiv:2106.05091
[Li19] Li, R., Makhijani, K., Yousefi, H., Westphal, C., Dong,
L., Wauters, T., and F. D. Turck, "A Framework for
Qualitative Communications Using Big Packet Protocol",
2019. ACM SIGCOMM Workshop on Networking for Emerging
Applications and Technologies (NEAT)
[Lia18] Liang, S., Yin, S., Liu, L., Luk, W., and S. Wei, "FP-BNN:
Binarized neural network on FPGA", 2018. Neurocomputing,
Volume 275
[Liu16] Liu, Z., Manousis, A., Vorsanger, G., Sekar, V., and V.
Braverman, "One Sketch to Rule Them All: Rethinking
Network Flow Monitoring with UnivMon", 2016. ACM SIGCOMM
Conference
[Lop20] López, J., Labonne, M., Poletti, C., and D. Belabed,
"Priority Flow Admission and Routing in SDN: Exact and
Heuristic Approaches", 2020. IEEE International Symposium
on Network Computing and Applications (NCA)
[Mar18] Martinez-Julia, P., Kafle, V. P., and H. Harai,
"Exploiting External Events for Resource Adaptation in
Virtual Computer and Network Systems", 2018. IEEE
Transactions on Network and Service Management, Vol. 15,
No. 2
[Mar20] Martinez-Julia, P., Kafle, V. P., and H. Asaeda,
"Explained Intelligent Management Decisions in Virtual
Networks and Network Slices", 2020. Conference on
Innovation in Clouds, Internet and Networks and Workshops
(ICIN)
[Mus18] Musumeci, F., Rottondi, C., Nag, A., Macaluso, I., Zibar,
D., Ruffini, M., and M. Tornatore, "An overview on
application of machine learning techniques in optical
networks", 2018. IEEE Communications Surveys & Tutorials,
21(2), 1383-1408
[Ngu20] Nguyen, T. G., Phan, T. V., Hoang, D. T., Nguyen, T. N.,
and C. So-In, "Efficient SDN-based traffic monitoring in
IoT networks with double deep Q-network", 2020.
International conference on computational data and social
networks, Springer
[Puj21] Pujol-Perich, D., Suárez-Varela, J., Xiao, S., Wu, B.,
Cabello, A., and P. Barlet-Ros, "NetXplain: Real-time
explainability of Graph Neural Networks applied to
Computer Networks", 2021. MLSys workshop on Graph Neural
Networks and Systems (GNNSys)
[Rex06] Rexford, J., "Route optimization in IP networks", 2006.
Handbook of Optimization in Telecommunications (pp.
679-700), Springer
[Rin17] Ring, M., Dallmann, A., Landes, D., and A. Hotho, "IP2Vec:
Learning Similarities Between IP Addresses", 2017. IEEE
International Conference on Data Mining Workshops (ICDMW)
[Sco11] Coull, S. E., Monrose, F., and M. Bailey, "On Measuring
the Similarity of Network Hosts: Pitfalls, New Metrics,
and Empirical Analyses", 2011. NDSS
[Sen04] Sen, S., Spatscheck, O., and D. Wang, "Accurate, scalable
in-network identification of p2p traffic using application
signatures", 2004. ACM International conference on World
Wide Web (WWW)
[Sol20] Soliman, H. M., Salmon, G., Sovilij, D., and M. Rao, "A
Graph Neural Network Approach for Scalable and Dynamic IP
Similarity in Enterprise Networks", 2020. IEEE
International Conference on Cloud Networking (CloudNet)
[Ste92] Stern, D. and P. Chemouil, "A Diagnosis Expert System for
Network Traffic Management", 1992. Networks, Kobe, Japan
[Tan20] Tangari, G., Charalambides, M., Pavlou, G., Grazian, C.,
and D. Tuncer, "Classification-assisted Query Processing
for Network Telemetry", 2020. Network Traffic Measurement
and Analysis Conference (TMA)
[Tan20b] Lizhuang, T., Wei, S., Zhenyi, Z., Jingying, M., Xiaoxi,
L., and L. Na, "In-band Network Telemetry: A Survey",
2020. Computer Networks. 186. 10.1016
[tl1] Torrey, L. and J. Shavlik, "Transfer learning", 2010.
Handbook of research on machine learning applications and
trends: algorithms, methods, and techniques
[Val17] Valadarsky, A., Schapira, M., Shahaf, D., and A. Tamar,
"Learning to route", 2017. ACM HotNets
[XAI] Samek, W., Wiegand, T., and K.-R. Müller, "Explainable
artificial intelligence: Understanding, visualizing and
interpreting deep learning models", 2017. arXiv preprint
arXiv:1708.08296
[Xie18] Xie, J., Yu, F. R., Huang, T., Xie, R., Liu, J., Wang, C.,
and Y. Liu, "A survey of machine learning techniques
applied to software defined networking (SDN): Research
issues and challenges", 2018. IEEE Communications Surveys
& Tutorials
[Xu18] Xu, Z., Tang, J., Meng, J., Zhang, W., Wang, Y., Liu, C.
H., and D. Yang, "Experience-driven networking: A deep
reinforcement learning based approach", 2018. IEEE INFOCOM
[Yan18] Yang, T., Jiang, J., Liu, P., Huang, Q., Gong, J., Zhou,
Y., Miao, R., Li, X., and S. Uhlig, "Elastic sketch:
adaptive and fast network-wide measurements", 2018. ACM
SIGCOMM Conference
[Yan20] Yang, H., Alphones, A., Xiong, Z., Niyato, D., Zhao, J.,
and K. Wu, "Artificial-Intelligence-Enabled Intelligent
6G Networks", 2020. IEEE Network, vol. 34, no. 6, pp.
272-280
[Yu14] Yu, Y., Qian, C., and X. Li, "Distributed and
collaborative traffic monitoring in software defined
networks", 2014. ACM Hot topics in software defined
networking
[Zil20] Meng, Z., Wang, M., Bai, J., Xu, M., Mao, H., and H. Hu,
"Interpreting Deep Learning-Based Networking Systems",
2020. ACM SIGCOMM
Acknowledgments
This document is the result of a collective work. The authors of this
document are the main contributors and the editors, but contributions
have also been received from the following people, whom we
acknowledge: Laurent Ciavaglia, Felipe Alencar Lopes, Abdelkader
Lahamdi, Albert Cabellos, Jose Suarez-Varela, Marinos Charalambides,
Ramin Sadre, Pedro Martinez-Julia, and Flavio Esposito.
This document is also partially supported by the AI@EDGE project,
funded by the European Union’s Horizon 2020 H2020-ICT-52 call for
projects under grant agreement no. 101015922.
Authors’ Addresses
Jérôme François
Inria
615 rue du jardin botanique
Villers-lès-Nancy
France
Email: jerome.francois@inria.fr
Alexander Clemm
Futurewei Technologies, Inc.
United States of America
Email: alex@clemm.org
Dimitri Papadimitriou
Nokia
Greece
Email: papadimitriou.dimitri.be@gmail.com
Stenio Fernandes
Central Bank of Canada
Canada
Email: steniofernandes@gmail.com
Stefan Schneider
Digital Railway (DSD) at Deutsche Bahn
Germany
Email: stefanschneider93@googlemail.com
Network Working Group Y-G. Hong
Internet-Draft Daejeon University
Intended status: Informational S-B. Oh
Expires: January 12, 2023 KSA
J-S. Youn
DONG-EUI Univ
S-J. Lee
Korea University/KT
H-K. Kahng
Korea University
July 11, 2022
Considerations of deploying AI services in a distributed approach
draft-hong-nmrg-ai-deploy-01
Abstract
As AI technology has matured and begun to be applied in various fields, it has moved from running only on very high-performance servers to running on small hardware, including microcontrollers, low-performance CPUs, and AI chipsets. In this document, we consider how to configure a system for AI inference so as to provide AI services in a distributed approach. We also describe the points to be considered in an environment where a client connects to a cloud server and an edge device and requests an AI service.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 12, 2023.
Hong, et al. Expires January 12, 2023 [Page 1]
Internet-Draft Deploying AI services July 2022
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Procedure to provide AI services . . . . . . . . . . . . . . 4
3. Network configuration structure to provide AI services . . . 5
3.1. AI inference service on Local machine . . . . . . . . . . 6
3.2. AI inference service on Cloud server . . . . . . . . . . 6
3.3. AI inference service on Edge device . . . . . . . . . . . 7
3.4. AI inference service on Cloud server and Edge device . . 8
4. Considerations when configuring a system to provide AI
services . . . . . . . . . . . . . . . . . . . . . . . . . . 9
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
6. Security Considerations . . . . . . . . . . . . . . . . . . . 12
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12
8. Informative References . . . . . . . . . . . . . . . . . . . 12
Authors’ Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction
In the Internet of Things (IoT), the amount of data generated by IoT devices has exploded along with the number of IoT devices, owing to industrial digitization and the development and dissemination of new devices. Various methods are being tried to handle effectively the explosive growth in IoT devices and in the data they generate. One of them is to provide IoT services close to IoT devices and users, instead of relying on cloud computing, which transmits all data generated by IoT devices to a cloud server [I-D.irtf-t2trg-iot-edge].
IoT services have also started to move away from the traditional method of analyzing collected IoT data in the cloud and delivering the analyzed results back to IoT objects or devices. In this context, AIoT (Artificial Intelligence of Things) technology, a combination of IoT technology and artificial intelligence (AI) technology, has started to be discussed at international standardization organizations such as ITU-T. AIoT technology, as discussed by the ITU-T CG-AIoT group, is defined as a technology that combines AI technology and IoT infrastructure to achieve more efficient IoT operations, improve human-machine interaction, and improve data management and analysis [CG-AIoT].
The IETF's first work on applying IoT technology to the Internet was to develop a lightweight protocol stack, in place of the existing TCP/IP protocol stack, so that various types of IoT devices, not only traditional Internet terminals, could connect to the Internet [RFC6574][RFC7452]. These technologies have been developed by the 6LoWPAN, 6lo, 6TiSCH, and CoRE working groups, the T2TRG research group, and others. IoT technology was mounted on resource-constrained devices and connected to the Internet. In the same way, AI technology has matured and been applied in various fields, and it has moved beyond running only on very high-performance servers equipped with GPUs. It is now being developed to run on small hardware, including microcontrollers, low-performance CPUs, and AI chipsets. This development direction is called On-device AI or TinyML [tinyML].
In this document, we consider how to configure a system for AI inference so as to provide AI services in the IoT environment. In the IoT environment, technologies for collecting sensing data from various sensors and delivering it to the cloud have already been studied by many standardization organizations, including the IETF, and many standards have been developed. Now that AI models can be created from the collected data to provide AI services, how to deploy these models as a system has become the main research goal. Until now, it has been common to develop AI services that collect data and perform inference on the servers where the models were trained. However, as AI services spread, using expensive servers to provide them is not appropriate. In addition, the servers that collect data and train models mainly exist as cloud servers, and connecting a large number of terminals to these cloud servers to request AI services raises many problems. Therefore, requesting an AI service from a nearby edge device, rather than from an AI server in a distant cloud, may bring benefits such as real-time service support, reduced network traffic, and better protection of important data [I-D.irtf-t2trg-iot-edge].
Even if an edge device is used to serve AI services, it is still important to connect to an AI server in the cloud for tasks that take a lot of time or require a lot of data. Therefore, offloading techniques that properly distribute the workload between the cloud server and the edge device are also being actively studied. In this document, based on the network structures proposed below, we derive and describe the points to be considered in an environment where a client connects to a server and an edge device and requests an AI service. Specifically, the following considerations and options are derived:
o AI inference service execution entity
o Hardware specifications of the machine to perform AI inference services
o Selection of AI models to perform AI inference services
o A method of providing AI services from cloud servers or edge devices
o Communication method to transmit data to request AI inference service
2. Procedure to provide AI services
Since research on AI services started long ago, AI services can be provided in various forms. However, due to the nature of AI technology, a system for providing AI services generally consists of the following steps [AI_inference_archtecture][Google_cloud_iot].
+-----------+  +-----------+  +-----------+  +-----------+  +-----------+
| Collect & |  | Analysis &|  |   Train   |  | Deploy &  |  | Monitor & |
|   Store   |->| Preprocess|->| AI model  |->| Inference |->| Maintain  |
|   data    |  |   data    |  |           |  | AI model  |  | Accuracy  |
+-----------+  +-----------+  +-----------+  +-----------+  +-----------+
|<--------->|  |<------------------------>|  |<--------->|  |<--------->|
 Sensor, DB             AI Server               Target       AI Server &
                                               machine      Target machine
|<----------->|<-------------------------->|<------------>|<----------->|
   Internet               Local                Internet   Local & Internet
Figure 1: AI service workflow
o Data collection & Store
o Data Analysis & Preprocess
o AI Model Training
o AI Model Deploy & Inference
o Monitor & Maintain Accuracy
In the data collection step, the data required for training is prepared by collecting data from sensors and IoT devices or by using data stored in a database. Equipment involved in this step includes sensors, IoT devices, the servers that store their data, and database servers. Since the operations in this step are conducted over the Internet, many of the IoT technologies developed by the IETF so far are well suited to this step.
In the data analysis and pre-processing step, the features of the prepared data are analyzed and the data is pre-processed for training. Equipment involved in this step includes a high-performance server equipped with a GPU and a database server, and this step is mainly performed in the local network.
In the model training step, an AI model is trained by applying an algorithm suited to the characteristics of the data and the problem to be solved. Equipment involved in this step includes a high-performance server equipped with a GPU, and this step is mainly performed in the local network.
In the model deployment and inference service provision step, the problem to be solved (e.g., a classification or regression problem) is solved using AI technology. Equipment involved in this step may include a target machine, a client, a cloud, etc., that provide AI services. Since various equipment is involved, this step is conducted over the Internet. This document summarizes the factors to be considered in this step.
In the accuracy monitoring step, if performance deteriorates due to new data, a new model is created through re-training, and AI service quality is maintained by using the newly created model. Because re-training and model deployment are performed again, this step involves the same operations as the model training step and the model deployment and inference step described above.
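The monitoring step can be sketched as a rolling accuracy check that signals when re-training is needed. The window size, the accuracy threshold, and the comparison of predictions with later-observed ground truth are assumptions for illustration. The draft does not prescribe a particular trigger.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the last `window` labelled predictions (hypothetical policy)."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.results = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual) -> None:
        self.results.append(predicted == actual)

    def accuracy(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_retraining(self) -> bool:
        # Decide only once the window is full, to avoid reacting to a few samples.
        return len(self.results) == self.results.maxlen and self.accuracy() < self.min_accuracy


if __name__ == "__main__":
    monitor = AccuracyMonitor(window=10, min_accuracy=0.8)
    # The model is first always right; then new data makes it wrong half of the time.
    stream = [("a", "a")] * 10 + [("a", "b"), ("a", "a")] * 5
    for step, (pred, truth) in enumerate(stream, start=1):
        monitor.record(pred, truth)
        if monitor.needs_retraining():
            print(f"step {step}: accuracy {monitor.accuracy():.2f}, re-train and redeploy")
            break
```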
3. Network configuration structure to provide AI services
In general, after the AI model is trained, it can be deployed on a local machine that provides the AI inference service. Alternatively, AI models can be placed on cloud servers or edge devices, and AI service requests can be made remotely. In addition, for overall service performance, some AI service requests can be sent to the cloud server and others to edge devices through appropriate load balancing.
3.1. AI inference service on Local machine
The following figure shows a case where a client module requests an AI service from an AI server module running on the same local machine.
+---------------------------------------------------------------------+
|                                                                     |
|  +-----------------+      Request AI        +-----------------+     |
|  |  Client module  |   Inference service    |  Server module  |     |
|  | for AI service  |----------------------->| for AI service  |     |
|  |                 |<-----------------------|                 |     |
|  +-----------------+        Reply AI        +-----------------+     |
|                         Inference result                            |
+---------------------------------------------------------------------+
                             Local machine
Figure 2: AI inference service on Local machine
This method is often used when a system is configured mainly for training AI models and improving their inference accuracy and performance, without particular consideration of AI services or of model deployment and inference. In this case, the client module that requests the AI inference service and the AI server module that directly performs it are on the same machine. There is therefore little need to consider the communication/network environment or the service provision method. Alternatively, this method can be used when we simply want to set up the AI inference service on a single machine whose AI service is not expected to change, such as an embedded machine or a customized machine.
In this case, the machine does not need the high level of hardware performance required to train the AI model, but it does need enough performance to run the AI inference service. A machine with moderate hardware performance is therefore sufficient.
3.2. AI inference service on Cloud server
The following figure shows the case where the client module that requests the AI service and the AI server module that directly performs the AI service run on different machines.
                                  +--------------------------------------+
+------------------------+        |  +---------------------------+       |
|  +-----------------+   |        |  |   +-----------------+     |       |
|  |  Client module  |<--+--------+--+-->|  Server module  |     |       |
|  | for AI service  |   |        |  |   | for AI service  |     |       |
|  +-----------------+   |        |  |   +-----------------+     |       |
+------------------------+        |  +---------------------------+       |
     Client machine               |          Server machine              |
                                  +--------------------------------------+
                                              Cloud(Internet)
Figure 3: AI inference service on Cloud server
In this case, the client module requesting the AI inference service runs on the client machine, and the AI server module that directly performs the AI inference service runs on a separate server machine located in the cloud network. The client machine does not need high performance, because it only needs to request the AI inference service and, if necessary, deliver the data required for the request. For the AI server module, we can set up our own AI server or use commercial clouds such as Amazon, Microsoft, and Google.
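The request/reply exchange between the client module and a remote AI server module can be sketched with HTTP and JSON from the Python standard library. The `/infer` path, the JSON fields, and the stub model are hypothetical, since the draft does not specify a protocol or message format. A real deployment would add authentication and TLS. Here, a server on localhost stands in for the cloud server machine.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def stub_model(values):
    """Hypothetical inference: classify a sensor-reading vector by its mean."""
    mean = sum(values) / len(values)
    return {"label": "high" if mean > 0.5 else "low", "score": mean}


class InferenceHandler(BaseHTTPRequestHandler):
    """Server module: answers POST /infer with a JSON inference result."""

    def do_POST(self):
        if self.path != "/infer":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(stub_model(request["values"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the example's output quiet
        pass


# Ignore environment proxy settings for this local demonstration.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def request_inference(url, values, timeout=5.0):
    """Client module: send only the data the AI service needs and return the reply."""
    data = json.dumps({"values": values}).encode()
    req = urllib.request.Request(url, data=data,
                                 headers={"Content-Type": "application/json"})
    with _OPENER.open(req, timeout=timeout) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), InferenceHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/infer"
    print(request_inference(url, [0.2, 0.9, 0.7]))
    server.shutdown()
    server.server_close()
```

The same client code would also serve the edge-device case described next. Only the URL changes.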
3.3. AI inference service on Edge device
The following figure shows the case where the client module that requests the AI service and the AI server module that directly performs the AI service are separated, and the AI server module is located on an edge device.
                                  +--------------------------------------+
+------------------------+        |  +---------------------------+       |
|  +-----------------+   |        |  |   +-----------------+     |       |
|  |  Client module  |<--+--------+--+-->|  Server module  |     |       |
|  | for AI service  |   |        |  |   | for AI service  |     |       |
|  +-----------------+   |        |  |   +-----------------+     |       |
+------------------------+        |  +---------------------------+       |
     Client machine               |          Edge device                 |
                                  +--------------------------------------+
                                                Edge network
Figure 4: AI inference service on Edge device
In this case, the client module that requests the AI inference service runs on the client machine, and the AI server module that directly performs the AI inference service runs on the edge device, which is located in the edge network. The AI server module on the edge device can be set up directly on the device, or a commercial edge computing module can be used.
The difference from the cloud case above is that the edge device is usually closer to the client but has lower performance than a cloud server. This gives advantages in data transfer time and, when lighter AI models are used, in inference time, but the edge device can serve fewer inference requests per unit time.
3.4. AI inference service on Cloud server and Edge device
The following figure shows the case where AI server modules that directly perform AI services are distributed in the cloud and edge devices.
                             +-----------------------------------+
+------------------------+   |  +-----------------------------+  |
| +-------------------+  |   |  |   +-------------------+     |  |
| |  Client module    |<-+---+--+-->|  Server module    |     |  |
| |  for AI service   |<-+-+ |  |   |  for AI service   |     |  |
| +-------------------+  | | |  |   +-------------------+     |  |
+------------------------+ | |  +-----------------------------+  |
     Client machine        | |           Edge device             |
                           | +-----------------------------------+
                           |             Edge network
                           | +-----------------------------------+
                           | |  +-----------------------------+  |
                           | |  |   +-------------------+     |  |
                           +-+--+-->|  Server module    |     |  |
                             |  |   |  for AI service   |     |  |
                             |  |   +-------------------+     |  |
                             |  +-----------------------------+  |
                             |          Server machine           |
                             +-----------------------------------+
                                        Cloud(Internet)
Figure 5: AI inference service on Cloud server and Edge device
The AI server module running in the cloud and the one running on the edge device differ in AI inference performance. The client can therefore distribute its AI inference requests between the cloud and the edge device according to the needs of each AI service. For example, requests for an AI service that can accept lower inference accuracy but needs a short inference time can be sent to the edge device.
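The request distribution described above can be sketched as a small client-side routing function. This is a minimal Python sketch, not the draft's method: the `Placement` type, the `choose_placement` policy (the fastest placement that still meets the request's latency and accuracy requirements), and all numeric values are hypothetical and would have to be replaced by measurements from a real deployment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Placement:
    """Profile of one place where inference can run (hypothetical values)."""
    name: str
    latency_ms: float   # expected end-to-end latency: transfer + inference
    accuracy: float     # accuracy of the model deployed at this placement


def choose_placement(placements: Sequence[Placement],
                     max_latency_ms: float,
                     min_accuracy: float) -> Optional[Placement]:
    """Return the fastest placement that meets both requirements, or None."""
    feasible = [p for p in placements
                if p.latency_ms <= max_latency_ms and p.accuracy >= min_accuracy]
    return min(feasible, key=lambda p: p.latency_ms, default=None)


# Illustrative numbers only; a real client would use measured values.
EDGE = Placement("edge", latency_ms=40.0, accuracy=0.70)
CLOUD = Placement("cloud", latency_ms=180.0, accuracy=0.80)

fast_request = choose_placement([EDGE, CLOUD], max_latency_ms=100, min_accuracy=0.65)
precise_request = choose_placement([EDGE, CLOUD], max_latency_ms=500, min_accuracy=0.78)
print(fast_request.name, precise_request.name)  # edge cloud
```

Returning `None` when no placement qualifies leaves the fallback policy (for example, relaxing the accuracy requirement or rejecting the request) to the application.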
4. Considerations when configuring a system to provide AI services
As described in the previous section, the AI server module that directly performs AI inference services using AI models can run on a local machine, a cloud server, or an edge device. In theory, if the AI inference service is performed on a local machine, it can be provided without communication delay or packet loss, but the local machine needs enough hardware performance to perform inference. In a future environment where many kinds of AI services are widely deployed, the cost of such machines becomes important, so local execution is unlikely to be common. In that case, whether AI inference is performed on a cloud server or on a lower-cost edge device can be a determining factor in the system configuration.
Sending an AI inference request to a distant cloud server may take considerable transmission time, but the server can process many requests in a short time and typically achieves higher inference accuracy. Conversely, sending a request to a nearby edge device keeps the transmission time short, but the device cannot process many requests at once and its inference accuracy is typically lower. Therefore, the characteristics and requirements of the AI service should be analyzed to determine whether AI inference is performed on a local machine, a cloud server, or an edge device.
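The trade-off above can be made concrete with a rough latency estimate: round-trip propagation delay, plus payload transfer time, plus any queueing delay at a busy server, plus inference time. The following Python sketch assumes this simple additive model; the payload size, link rates, and timings are illustrative assumptions, not measurements.

```python
def end_to_end_latency_ms(payload_bytes: int, bandwidth_mbps: float, rtt_ms: float,
                          inference_ms: float, queue_ms: float = 0.0) -> float:
    """Rough latency of one inference request under an additive model.

    rtt_ms:   round-trip propagation delay to the server.
    transfer: payload bits divided by link rate (1 Mbit/s carries 1,000 bits per ms).
    queue_ms: time spent waiting while the server handles other requests.
    """
    transfer_ms = payload_bytes * 8 / (bandwidth_mbps * 1000)
    return rtt_ms + transfer_ms + queue_ms + inference_ms


# Hypothetical 500 kB image; link and compute figures are assumptions.
edge = end_to_end_latency_ms(500_000, bandwidth_mbps=100, rtt_ms=5, inference_ms=60)
cloud = end_to_end_latency_ms(500_000, bandwidth_mbps=50, rtt_ms=60, inference_ms=15)
busy_edge = end_to_end_latency_ms(500_000, bandwidth_mbps=100, rtt_ms=5,
                                  inference_ms=60, queue_ms=80)
print(edge, cloud, busy_edge)  # 105.0 155.0 185.0
```

With these assumed numbers the nearby edge device answers in 105 ms and the distant cloud in 155 ms, but 80 ms of queueing at an overloaded edge device is enough to make the cloud the faster choice. This is why edge throughput matters as much as proximity.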
The hardware required by the machine that performs an AI service varies with the characteristics of the service, the data used for training, and the problem to be solved. In general, cloud servers are regarded as having higher performance than edge devices. However, AI inference performance depends on how the hardware (CPU, RAM, GPU, and network interface) of each cloud server and edge device is configured. Ignoring cost, the best choice would be to build the system on the machine with the best hardware performance, but in reality cost must always be considered. Therefore, the performance of the local machine, cloud server, and edge device must be determined according to the characteristics and requirements of the AI service to be performed.
Although not directly related to communication/network, the biggest influence on AI inference services is the AI model to be used for AI inference service. For example, in AI services such as image classification, there are various types of AI models such as ResNet, EfficientNet, VGG, and Inception. These AI models differ in AI inference accuracy, but also in AI model file size and AI inference time. AI models with the highest inference accuracy typically have very large file sizes and take a lot of AI inference time. So, when constructing an AI service system, it is not always good to choose an AI model with the highest AI inference accuracy. Again, it is important to select an AI model according to the characteristics and requirements of the AI service to be performed.
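Model selection under such constraints can be expressed as picking the most accurate candidate that fits a latency budget and a storage limit. The Python sketch below assumes each candidate (for example ResNet, EfficientNet, VGG, or Inception variants) has already been profiled on the target hardware; the names `model_small`, `model_medium`, and `model_large` and all of their numbers are hypothetical placeholders, not benchmark results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    name: str
    accuracy: float       # measured on the target task's validation data
    inference_ms: float   # measured on the target hardware
    file_size_mb: float


def pick_model(candidates: Sequence[ModelProfile], latency_budget_ms: float,
               max_size_mb: float) -> Optional[ModelProfile]:
    """Most accurate model that fits both the latency budget and the size limit."""
    fitting = [m for m in candidates
               if m.inference_ms <= latency_budget_ms and m.file_size_mb <= max_size_mb]
    return max(fitting, key=lambda m: m.accuracy, default=None)


# Placeholder profiles (hypothetical; not benchmark results).
CANDIDATES = [
    ModelProfile("model_small", accuracy=0.71, inference_ms=12, file_size_mb=15),
    ModelProfile("model_medium", accuracy=0.77, inference_ms=45, file_size_mb=100),
    ModelProfile("model_large", accuracy=0.82, inference_ms=160, file_size_mb=550),
]

print(pick_model(CANDIDATES, latency_budget_ms=50, max_size_mb=50).name)     # model_small
print(pick_model(CANDIDATES, latency_budget_ms=200, max_size_mb=1000).name)  # model_large
```

Running the same function with edge-like and cloud-like limits yields the pattern recommended in the following paragraph: a fast, smaller model on the edge device and a more accurate, larger one on the cloud server.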
Based on experiments, it is recommended to use an AI model with high inference accuracy on the cloud server, and on the edge device an AI model that provides fast inference even though its accuracy is slightly lower.
It may be partly an implementation issue, but how AI services are delivered from cloud servers or edge devices should also be considered. With current technology, either a traditional web server framework or a server specialized for AI inference (e.g., Google's TensorFlow Serving) can be used. Traditional web frameworks such as Flask and Django run on many types of machines, but because they are designed for general web services, their service execution time is relatively slow. TensorFlow Serving uses the features of TensorFlow to make AI inference fast and efficient. However, TensorFlow Serving cannot be used on older CPUs that do not support AVX instructions, because standard TensorFlow builds do not run on them. Therefore, rather than always choosing a server specialized for AI inference, the AI server module should be selected in consideration of the hardware characteristics of the AI system that can be built.
The communication method used to transfer the data of an AI inference request is also an important decision in constructing an AI system. The traditional REST approach can be used with a wide range of machines and services, but its performance is lower than that of gRPC, originally developed by Google. Because gRPC supports large-capacity and more efficient data transfer than REST, it offers many advantages for AI inference services.
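As a concrete example of the REST approach, TensorFlow Serving exposes a REST predict endpoint of the form `/v1/models/<name>[/versions/<n>]:predict` that accepts a JSON body with an `"instances"` list, on port 8501 by default (its gRPC API listens on port 8500 by default). The Python sketch below only builds such a request with the standard library. The host `edge-device.example`, the model name `image_classifier`, and the input name `image_bytes` are hypothetical and would have to match the deployed model's signature. It also shows one reason REST is less efficient for binary data: TensorFlow Serving's JSON format carries binary values base64-encoded as `{"b64": ...}`, which enlarges them by about a third, whereas gRPC sends binary Protocol Buffers messages.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_predict_request(host: str, model_name: str, instances: list,
                          port: int = 8501,
                          version: Optional[int] = None) -> urllib.request.Request:
    """Build, without sending, a TensorFlow Serving REST "predict" request."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Binary input travels as a base64 string inside JSON.
image_bytes = bytes(300)  # placeholder payload
encoded = base64.b64encode(image_bytes).decode("ascii")
req = build_predict_request("edge-device.example", "image_classifier",
                            instances=[{"image_bytes": {"b64": encoded}}])
print(req.full_url)  # http://edge-device.example:8501/v1/models/image_classifier:predict
print(len(encoded))  # 400
# Sending it: urllib.request.urlopen(req, timeout=5) returns JSON with a "predictions" key.
```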
Cloud-edge collaboration-based AI service development is actively underway. In particular, in the case of AI services that are sensitive to network delays, such as object recognition and autonomous vehicle services, (micro)services for inference are placed on edge devices to obtain fast inference results and provide services. As such, in the development of intelligent IoT services, various devices that can provide computing services within the network, such as edge devices, are being added as network elements, and the number of IoT devices using them is rapidly increasing. Therefore, a new function for computing resource management and operation is required in terms of providing computing services within the network.
5. IANA Considerations
There are no IANA considerations related to this document.
6. Security Considerations
When an AI service is performed on a local machine, it is not exposed to the network. However, when an AI service is provided through a cloud server or an edge device, its IP address and port number may become known to outsiders, who can then attack it. Therefore, when providing AI services on network-attached machines such as cloud servers and edge devices, the characteristics of the modules to be used should be analyzed carefully, security vulnerabilities identified, and countermeasures taken.
7. Acknowledgements
TBA
8. Informative References
[RFC6574] Tschofenig, H. and J. Arkko, "Report from the Smart Object Workshop", RFC 6574, DOI 10.17487/RFC6574, April 2012, <https://www.rfc-editor.org/info/rfc6574>.
[RFC7452] Tschofenig, H., Arkko, J., Thaler, D., and D. McPherson, "Architectural Considerations in Smart Object Networking", RFC 7452, DOI 10.17487/RFC7452, March 2015, <https://www.rfc-editor.org/info/rfc7452>.
[I-D.irtf-t2trg-iot-edge] Hong, J., Hong, Y., de Foy, X., Kovatsch, M., Schooler, E., and D. Kutscher, "IoT Edge Challenges and Functions", draft-irtf-t2trg-iot-edge-07 (work in progress), June 2022.
[CG-AIoT] "ITU-T CG-AIoT", <https://www.itu.int/en/ITU-T/studygroups/2017-2020/20/Pages/ifa-structure.aspx>.
[tinyML] "tinyML Foundation", <https://www.tinyml.org/>.
[AI_inference_archtecture] "IBM Systems, AI Infrastructure Reference Architecture", <https://www.ibm.com/downloads/cas/W1JQBNJV>.
[Google_cloud_iot] "Bringing intelligence to the edge with Cloud IoT", <https://cloud.google.com/blog/products/gcp/bringing-intelligence-edge-cloud-iot>.
Authors’ Addresses
Yong-Geun Hong Daejeon University 62 Daehak-ro, Dong-gu Daejeon 34520 Korea
Phone: +82 42 280 4841 Email: yonggeun.hong@gmail.com
SeokBeom Oh KSA Digital Transformation Center, 5 Teheran-ro 69-gil, Gangnamgu Seoul 06160 Korea
Phone: +82 2 1670 6009 Email: isb6655@korea.ac.kr
Joo-Sang Youn DONG-EUI University 176 Eomgwangno Busan_jin_gu Busan 614-714 Korea
Phone: +82 51 890 1993 Email: joosang.youn@gmail.com
SooJeong Lee Korea University/KT 2511 Sejong-ro Sejong City 30019 Korea
Email: ngenius@korea.ac.kr
Hyun-Kook Kahng Korea University 2511 Sejong-ro Sejong City 30019 Korea
Email: kahng@korea.ac.kr
Internet Research Task Force C. Zhou
Internet-Draft H. Yang
Intended status: Informational X. Duan
Expires: 12 January 2023 China Mobile
D. Lopez
A. Pastor
Telefonica I+D
Q. Wu
Huawei
M. Boucadair
C. Jacquenet
Orange
11 July 2022
Digital Twin Network: Concepts and Reference Architecture
draft-irtf-nmrg-network-digital-twin-arch-01
Abstract
Digital Twin technology has seen rapid adoption in Industry 4.0. Applying Digital Twin technology to the networking field is meant to enable a variety of rich network applications, realize efficient and cost-effective data-driven network management, and accelerate network innovation.
This document presents an overview of the concepts of Digital Twin Network, provides the basic definitions and a reference architecture, lists a set of application scenarios, and discusses the benefits and key challenges of such technology.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 12 January 2023.
Zhou, et al. Expires 12 January 2023 [Page 1]
Internet-Draft Digital Twin Network Concept July 2022
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
2. Terminology
   2.1. Acronyms & Abbreviations
   2.2. Definitions
3. Introduction and Concepts of Digital Twin Network
   3.1. Background of Digital Twin
   3.2. Digital Twin for Networks
   3.3. Definition of Digital Twin Network
4. Benefits of Digital Twin Network
   4.1. Optimized Network Total Cost of Operation
   4.2. Optimized Decision Making
   4.3. Safer Assessment of Innovative Network Capabilities
   4.4. Privacy and Regulatory Compliance
   4.5. Customized Network Operation Training
5. Challenges to Build Digital Twin Network
6. A Reference Architecture of Digital Twin Network
7. Enabling Technologies to Build Digital Twin Network
   7.1. Data Collection and Data Services
   7.2. Network Modeling
   7.3. Network Visualization
   7.4. Interfaces
8. Interaction with IBN
9. Sample Application Scenarios
   9.1. Human Training
   9.2. Machine Learning Training
   9.3. DevOps-Oriented Certification
   9.4. Network Fuzzing
10. Research Perspectives: A Summary
11. Security Considerations
12. Acknowledgements
13. IANA Considerations
14. Open issues
15. References
   15.1. Normative References
   15.2. Informative References
Appendix A. Change Logs
Authors’ Addresses
1. Introduction
The fast growth of network scale and the increased demand placed on these networks require them to accommodate and adapt dynamically to customer needs, which poses a significant challenge to network operators. Indeed, network operation and maintenance are becoming more complex due to the higher complexity of the managed networks and the sophisticated services they deliver. As such, introducing innovations in network technologies, management, and operation will be increasingly challenging due to the high risk of interfering with existing services and the higher trial costs if no reliable emulation platforms are available.
A Digital Twin is the real-time representation of a physical entity in the digital world. Its characteristics include virtual-reality interrelation and real-time interaction, iterative operation and process optimization, full life-cycle coverage, and a comprehensive data-driven infrastructure. Digital twins are now widely acknowledged in academic publications. See more in Section 3.
A digital twin for networks platform can be built by applying Digital Twin technologies to networks and creating a virtual image of physical network facilities (called herein, emulation). Basically, the digital twin for networks is an extension of network simulation platforms. The main difference from traditional network management systems is the interactive virtual-real mapping and the data-driven approach used to build closed-loop network automation. Therefore, a digital twin network platform is more than an emulation platform or network simulator.
Through real-time data interaction between the physical network and its twin network(s), the digital twin network platform might help network designers achieve simpler, more automated, more resilient, full life-cycle operation and maintenance. More specifically, the digital twin network can be used to develop various rich network applications, assess specific behaviors (including network transformation) before actual implementation in the physical network, tweak the network for better optimized behavior, and run ’what-if’ scenarios that cannot be tested and evaluated easily in the physical network. In addition, service impact analysis tasks can also be facilitated.
2. Terminology
2.1. Acronyms & Abbreviations
IBN: Intent-Based Networking
AI: Artificial Intelligence
CI/CD: Continuous Integration / Continuous Delivery
ML: Machine Learning
OAM: Operations, Administration, and Maintenance
PLM: Product Lifecycle Management
2.2. Definitions
This document makes use of the following terms:
Digital Twin: a virtual instance of a physical system (twin) that is continually updated with the latter’s performance, maintenance, and health status data throughout the physical system’s life cycle.
Digital twin network: a digital twin that is used in the context of networking. This is also called a digital twin for networks. See more in Section 3.3.
3. Introduction and Concepts of Digital Twin Network
3.1. Background of Digital Twin
The concept of the "twin" dates to the National Aeronautics and Space Administration (NASA) Apollo program in the 1970s, where a replica of space vehicles on Earth was built to mirror the condition of the equipment during the mission [Rosen2015].
In 2003, in his product lifecycle management (PLM) course, Michael Grieves introduced the digital twin, a term he attributed to John Vickers, as a "virtual digital representation equivalent to physical products" [Grieves2014]. A digital twin can be defined as a virtual instance of a physical system (twin) that is continually updated with the latter’s performance, maintenance, and health status data throughout the physical system’s life cycle [Madni2019]. By providing a living copy of a physical system, digital twins bring numerous advantages, such as accelerated business processes, enhanced productivity, and faster innovation with reduced costs. So far, digital twins have been successfully applied in the fields of intelligent manufacturing, smart cities, and complex system operation and maintenance, helping not only with object design and testing but also with management aspects [Tao2019].
Compared with a ’digital model’ and a ’digital shadow’, the key difference of a ’digital twin’ is the direction of data flow between the physical and virtual systems [Fuller2020]. Typically, a digital twin system is generated and then synchronized using data flows in both directions between physical and digital components, so that control data can be sent, and changes in either the physical or the digital object are automatically reflected in the other. This is unlike a ’digital model’ or ’digital shadow’, which is usually synchronized manually, lacks control data, and might not have a fully integrated data cycle.
At present (2022), there is no unified definition of digital twin framework. The industry, scientific research institutions, and standards developing organizations are trying to define a general or domain-specific framework of digital twin. [Natis-Gartner2017] proposed that building a digital twin of a physical entity requires four key elements: model, data, monitoring, and uniqueness. [Tao2019] proposed a five-dimensional framework of digital twin {PE, VE, SS, DD, CN}, in which PE represents physical entity, VE represents virtual entity, SS represents service, DD represents twin data, and CN represents the connection between various components. [ISO-2021] issued a draft standard for digital twin manufacturing system, and proposed a reference framework including data collection domain, device control domain, digital twin domain, and user domain.
3.2. Digital Twin for Networks
Communication networks can provide a solid foundation for implementing various ’digital twin’ applications. At the same time, in the face of increasing business types, scale, and complexity, a network itself also needs to use digital twin technology to seek better solutions beyond the physical network. Since 2017, the application of digital twin technology to communication networks has gradually been researched. Some examples are listed below.
In academia, [Dong2019] established a digital twin of a 5G mobile edge computing (MEC) network, used the twin offline to train resource allocation optimization and normalized energy-saving algorithms based on reinforcement learning, and then applied the resulting scheme to the MEC network. [Dai2020] established a digital twin edge network for a mobile edge computing system, in which a twin edge server is used to evaluate the state of the physical (entity) server, and the twin mobile edge computing system provides data for training an offloading strategy. [Nguyen2021] discusses how to deploy a digital twin for complex 5G networks. [Hong2021] presents a digital twin platform for automatic and intelligent management of data center networks and proposes simplified network service management workflows. In addition, international workshops dedicated to digital twins in the network field have already appeared, such as the IEEE DTPI 2021 Digital Twin Network Online Session [DTPI2021] and the IEEE NOMS 2022 TNT workshop [TNT2022].
Although the application of digital twin technology in networking has started, research on digital twins for networks is still in its infancy. Current applications focus on specific scenarios (such as network optimization), where the network digital twin is used merely as a network simulation tool to solve network operation and maintenance problems. Considering the characteristics of digital twin technology and its application in other industries, this document considers that a digital twin network can be regarded as an indispensable part of the overall network system. It can provide a general architecture covering the whole life cycle of the physical network, serve innovative network applications such as network planning, construction, maintenance, and optimization, and improve the level of network automation and intelligence.
3.3. Definition of Digital Twin Network
So far, there is no standard definition of "digital twin network" within the networking industry. This document defines "digital twin network" as a virtual representation of the physical network. Such a virtual representation of the network is meant to be used to analyze, diagnose, emulate, and then control the physical network based on data, models, and interfaces. To that end, a real-time and interactive mapping is required between the physical network and its virtual twin network.
Drawing on the characteristics of digital twins in other industries and on those of networking itself, a digital twin network should involve four key elements: data, mapping, models, and interfaces, as shown in Figure 1.
+-------------+                    +-------------+
|   Mapping   |                    |  Interface  |
+------+------+                    +------+------+
       |        Analyze, Diagnose         |
       |   +--------------------------+   |
       +---|   Digital Twin Network   |---+
       |   +--------------------------+   |
       |         Emulate, Control         |
+------+------+                    +------+------+
|   Models    |                    |    Data     |
+-------------+                    +-------------+
Figure 1: Key Elements of Digital Twin Network
Data: A digital twin network should maintain historical and/or real-time data (configuration data, operational state data, topology data, trace data, metric data, process data, etc.) about its real-world twin (i.e., the physical network) that the models require to represent and understand the states and behaviors of the real-world twin.
The data is characterized as the single source of "truth" and populated in the data repository, which provides timely and accurate data service support for building various models.
Models: Techniques that involve collecting data from one or more sources in the real-world twin and developing a comprehensive representation of the data (e.g., system, entity, process) using specific models. These models serve as the basis for emulation and diagnosis: they capture the dynamics and elements of how the live physical network operates and generate reasoning data used for decision-making.
Various models, such as service models, data models, dataset models, or knowledge graphs, can be used to represent the physical network assets and then be instantiated to serve various network applications.
Interfaces: Standardized interfaces can ensure the interoperability of digital twin network. There are two major types of interfaces:
* The interface between the digital twin network platform and the physical network infrastructure.
* The interface between the digital twin network platform and applications.
The former provides real-time data collection and control on the physical network. The latter helps in delivering application requests to the digital twin network platform and exposing the various platform capabilities to applications.
Mapping: Used to identify the digital twin and the underlying entities and establish a real-time interactive relation between the physical network and the twin network or between two twin networks. The mapping can be:
* One to one (pairing, vertical): Synchronize between a physical network and its virtual twin network with continuous flows.
* One to many (coupling, horizontal): Synchronize among virtual twin networks with occasional data exchange.
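The two kinds of mapping listed above can be illustrated with a toy twin whose state is a table of link utilizations. In this Python sketch, which is an assumption for illustration rather than an architecture defined by this document, the hypothetical `apply_telemetry` method stands for continuous one-to-one synchronization with the physical network, and the hypothetical `exchange` function stands for occasional one-to-many data exchange among twins.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class TwinNetwork:
    """Toy virtual network state: link identifier -> utilization (0.0 to 1.0)."""
    name: str
    link_utilization: Dict[str, float] = field(default_factory=dict)

    def apply_telemetry(self, sample: Dict[str, float]) -> None:
        # One-to-one (vertical) mapping: each telemetry sample from the
        # physical network is applied as soon as it arrives.
        self.link_utilization.update(sample)

    def snapshot(self) -> Dict[str, float]:
        # A copy, so other twins cannot modify this twin's state by accident.
        return dict(self.link_utilization)


def exchange(source: TwinNetwork, targets: Iterable[TwinNetwork]) -> None:
    # One-to-many (horizontal) mapping: occasionally share one twin's state
    # with other twins, e.g. what-if twins used for planning.
    snap = source.snapshot()
    for twin in targets:
        twin.link_utilization.update(snap)


primary = TwinNetwork("primary")
what_if = TwinNetwork("what-if")
for sample in ({"A-B": 0.42}, {"A-C": 0.31}, {"A-B": 0.47}):  # telemetry stream
    primary.apply_telemetry(sample)
exchange(primary, [what_if])
print(what_if.link_utilization)  # {'A-B': 0.47, 'A-C': 0.31}
```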
Such mappings provide good visibility into the actual status, making the digital twin suitable for analyzing and understanding what is going on in the physical network. They also allow using the digital twin to optimize the performance and maintenance of the physical network.
The digital twin network constructed based on the four core technology elements can analyze, diagnose, emulate, and control the physical network in its whole life cycle with the help of optimization algorithms, management methods, and expert knowledge. One of the objectives of such control is to master the digital twin network environment and its elements to derive the required system behavior, e.g., provide:
* repeatability: that is the capacity to replicate network conditions on-demand.
* reproducibility: i.e., the ability to replay successions of events, possibly under controlled variations.
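Repeatability and reproducibility can be sketched by replaying a recorded event log against a copy of a captured baseline state, with any controlled variation driven by a seeded random generator so that the same seed yields the same run. The event format, the `replay` function, and the jitter model below are hypothetical Python illustrations, not mechanisms specified by this document.

```python
import random
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, float]  # (link identifier, change in utilization)


def replay(events: List[Event], initial_state: Dict[str, float],
           jitter: float = 0.0, seed: Optional[int] = None) -> Dict[str, float]:
    """Replay recorded events against a copy of a starting state.

    Starting from the same initial_state each time gives repeatability;
    a seeded random jitter adds controlled variation that the same seed
    reproduces exactly.
    """
    rng = random.Random(seed)
    state = dict(initial_state)  # never mutate the captured baseline
    for link, delta in events:
        noise = rng.uniform(-jitter, jitter) if jitter else 0.0
        state[link] = state.get(link, 0.0) + delta + noise
    return state


baseline = {"A-B": 0.40}
events = [("A-B", 0.10), ("A-C", 0.25), ("A-B", -0.05)]
run1 = replay(events, baseline, jitter=0.02, seed=7)
run2 = replay(events, baseline, jitter=0.02, seed=7)
print(run1 == run2)  # True
```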
Note: Real-time interaction is not mandatory for all twins. When testing some configuration changes or trying some innovative techniques, the digital twins can behave as a simulation platform without the need for real-time telemetry data. Even in this scenario, it is better to have an interactive mapping capability so that validated changes can be tested in the real network whenever the testers require. In most other cases (e.g., network optimization, network fault recovery), real-time interaction between the virtual and real networks is mandatory. In this way, a digital twin network can help achieve the goal of an autonomous or self-driven network.
4. Benefits of Digital Twin Network
A digital twin network can help enable closed-loop network management across the entire lifecycle, from deployment and emulation to visualized assessment, physical deployment, and continuous verification. By doing so, network operators, and to some extent end-users as allowed by specific application interfaces, can maintain a global, systemic, and consistent view of the network. Also, network operators and/or enterprise users can safely exercise the enforcement of network planning policies, deployment procedures, etc., without jeopardizing the daily operation of the physical network.
The main difference between a digital twin network and a simulation platform is the use of interactive virtual-real mapping to build closed-loop network automation. Simulation platforms are the predecessors of the digital twin network. One example is the ns-3 network simulator [NS-3], which can be seen as a variant of a digital twin network but with low fidelity and without interactive interfaces to the real network. Compared with such classical approaches, the key benefits of a digital twin network can be summarized as follows:
1) Because high-fidelity twins are established from real-time data, network simulation is more effective and its cost is relatively low.
2) The impact and risk on running networks is low when automatically applying configuration/policy changes after the full analysis and required verifications (e.g., service impact analysis) within the twin network.
3) The faults of the physical network can be automatically captured by analyzing real-time data, then the correction strategy can be distributed to the physical network elements after conducting adequate analysis within the twins to complete the closed-loop automatic fault repair.
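Benefits 2) and 3) describe the same closed loop: detect a fault from telemetry, try candidate corrections in the twin, and push to the physical network only a correction that passes verification there. The Python sketch below assumes the twin is simply a copy of the latest link-utilization telemetry and that a "fault" means a link above 90% utilization. The `make_reroute` remedy, the threshold, and the `apply_to_physical` callback are hypothetical stand-ins for real models, fault detection, and device configuration.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]          # link identifier -> utilization
Change = Callable[[State], None]  # mutates a state in place


def overloaded(state: State, threshold: float = 0.9) -> List[str]:
    return sorted(link for link, util in state.items() if util > threshold)


def make_reroute(amount: float, src: str, via: Tuple[str, ...]) -> Change:
    """Hypothetical remedy: move `amount` of load from src onto the links in via."""
    def change(state: State) -> None:
        state[src] -= amount
        for link in via:
            state[link] += amount
    return change


def closed_loop(telemetry: State, candidates: List[Tuple[str, Change]],
                apply_to_physical: Callable[[Change], None],
                threshold: float = 0.9) -> Optional[str]:
    """Detect a fault, verify candidate fixes in the twin, apply the first that passes."""
    if not overloaded(telemetry, threshold):
        return None                        # nothing to repair
    for name, change in candidates:
        trial = copy.deepcopy(telemetry)   # twin copy mirrors the latest telemetry
        change(trial)                      # emulate the change in the twin only
        if not overloaded(trial, threshold):
            apply_to_physical(change)      # push only a verified change
            return name
    return "escalate"                      # no candidate passed: hand to an operator


telemetry = {"A-B": 0.95, "A-C": 0.40, "C-B": 0.30}
candidates = [
    ("reroute_60", make_reroute(0.6, "A-B", ("A-C", "C-B"))),  # overloads A-C in the twin
    ("reroute_30", make_reroute(0.3, "A-B", ("A-C", "C-B"))),
]
pushed = []
print(closed_loop(telemetry, candidates, pushed.append))  # reroute_30
```

If no candidate passes in the twin, the sketch returns `"escalate"` instead of touching the physical network, which is the risk reduction that benefit 2) refers to.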
The following subsections elaborate on these benefits in more detail.
4.1. Optimized Network Total Cost of Operation
Large-scale networks are complex to operate. Since there is no effective platform for simulation, network optimization designs have to be tested on the physical network, at the cost of jeopardizing its daily operation and possibly degrading the quality of the services supported by the network. Such assessment also greatly increases network operators' Operational Expenditure (OPEX) budgets.
With a digital twin network platform, network operators can safely emulate candidate optimization solutions before deploying them in the physical network. As a result, operators' OPEX on the physical network deployment can be greatly reduced, at the cost of the complexity of the assessment and the resources it involves.
4.2. Optimized Decision Making
Traditional network operation and management mainly focus on deploying and managing running services, but offer little support for predictive maintenance techniques.
Digital twin network can combine data acquisition, big data processing, and AI modeling not only to assess the status of the network, but also to predict future trends and better organize predictive maintenance. The ability to reproduce network behaviors under various conditions makes it possible to assess the various evolution options as often as required.
4.3. Safer Assessment of Innovative Network Capabilities
Testing a new feature in an operational network is not only complex but also extremely risky. An adequate service impact analysis is required before a new feature is actually activated.
Digital twin network can greatly help assess innovative network capabilities without jeopardizing the daily operation of the physical network. In addition, it helps researchers explore network innovations (e.g., new network protocols, network AI/ML applications) efficiently, and helps network operators deploy new technologies quickly with lower risk. Taking AI/ML applications as an example, there is a conflict between the requirement for continuous high reliability (e.g., 99.999%) and the slow learning speed or phased learning steps of AI/ML algorithms. With a digital twin network, AI/ML models can complete learning and training with sufficient data before being deployed in the real network. This would encourage more network AI innovation in future networks.
4.4. Privacy and Regulatory Compliance
The data confidentiality and privacy requirements imposed on network providers increase the complexity of network management, as decisions made by computation logic such as an SDN controller may rely upon packet payloads. As a result, improving data-driven management requires complementary techniques that provide strict control, based upon security mechanisms, to guarantee data privacy protection and regulatory compliance. This may range from flow identification (using the archetypal five-tuple of addresses, ports, and protocol) to techniques requiring some degree of payload inspection, all of which yield data that may be associated with an individual person and hence require strong protection and/or data anonymization mechanisms.
With the strong modeling capabilities provided by the digital twin network, very limited real data (if any) will be needed to achieve a similar or even higher level of data-driven intelligent analysis. This lower demand for sensitive data makes it easier to satisfy privacy requirements and simplifies the use of privacy-preserving techniques for data-driven operation.
4.5. Customized Network Operation Training
Network architectures can be complex, and their operation requires expert personnel. Digital twin network offers an opportunity to train staff for customized networks and specific user needs. Two salient examples are training on the application of new network architectures and protocols, and the use of "cyber-ranges" to train security experts in threat detection and mitigation.
5. Challenges to Build Digital Twin Network
According to [Hu2021], the main challenges in building and maintaining digital twins can be summarized in the following five aspects:
* Data acquisition and processing
* High-fidelity modeling
* Real-time, two-way communication between the virtual and the real twins
* Unified development platform and tools
* Environmental coupling technologies
Compared with other industrial fields, digital twin in the networking field has its own unique characteristics. On one hand, network elements and systems have a higher level of digitalization, which implies that data acquisition and virtual-real communication are relatively easy to achieve. On the other hand, the network field involves many different types of network elements and topologies, and network size, characterized by the number of nodes and links, grows at a pace that cannot keep up with service needs, especially for end-to-end services that span multiple administrative domains. So, the construction of a digital twin network system needs to consider the following major challenges:
Large scale challenge: A digital twin of a large-scale network significantly increases the complexity of data acquisition and storage, and of the design and implementation of the relevant models. The software and hardware requirements of the digital twin network system will be even more constraining. Therefore, efficient and low-cost tools are required in various fields. Taking data as an example, massive network data can help achieve more accurate models. However, the cost of virtual-real communication and data storage becomes extremely high, especially in the multi-domain data-driven network management case; therefore, efficient data collection tools and data compression methods must be used.
Interoperability: Due to inconsistent technical implementations and the heterogeneity of the technologies adopted by vendors, it is difficult to establish a unified digital twin network system with a common technology in a network domain. Therefore, it is first necessary to propose a unified architecture of digital twin network, in which all components and functionalities are clear to all stakeholders, and then to define standardized and unified interfaces that connect all network twins while ensuring the necessary compatibility.
Data modeling difficulties: Based on large-scale network data, data modeling should not only focus on ensuring the accuracy of model functions, but also has to consider the flexibility and scalability needed to compose and extend models as required to support large-scale and multi-purpose applications. Balancing these requirements further increases the complexity of building efficient and hierarchical functional data models. As an optional solution, straightforwardly cloning the real network using virtualized resources is a feasible way to build the twin network when the network scale is relatively small. However, its resource cost becomes unaffordable for larger-scale networks. In this case, network modeling using mathematical abstraction or leveraging AI algorithms is a more suitable solution.
Real-time requirements: Network services normally have real-time requirements, and processing model simulation and verification through a digital twin network introduces additional service latency. Meanwhile, the real-time requirements further impose performance requirements on the system software and hardware. However, given the nature of distributed systems and propagation delays, it is challenging to keep the physical network and the digital twin network in sync, or to have them synchronize automatically. Having changes to the digital object automatically drive changes in the physical object can be even more challenging. To address these requirements, the functions and processes of the data model need to be based on automated processing mechanisms under various network application scenarios. On the one hand, a simplified process needs to be designed to reduce the time cost of tasks in the network twin as much as possible; on the other hand, it is recommended to define the real-time requirements of different applications, and then match the corresponding computing resources and suitable solutions as needed to complete the task processing in the twin.
Security risks: A digital twin network has to synchronize all or a subset of the data related to the involved physical networks in real time, which inevitably enlarges the attack surface, with a higher risk of information leakage in particular. On one hand, it is mandatory to design more secure data mechanisms leveraging legacy data protection methods, as well as innovative technologies such as blockchain. On the other hand, the system design can limit the data (especially raw data) required to build the digital twin network by leveraging innovative modeling technologies such as federated learning.
In brief, to address the challenges listed above, it is important to first propose a unified architecture of digital twin network, which defines the main functional components and interfaces (Section 6). Then, relying upon such an architecture, research needs to continue on the key enabling technologies, including data acquisition, data storage, data modeling, interface standardization, and security assurance.
6. A Reference Architecture of Digital Twin Network
Based on the definition of the key digital twin network technology elements introduced in Section 3.3, a digital twin network architecture is depicted in Figure 2. This digital twin network architecture is broken down into three layers: Application Layer, Digital Twin Layer, and Physical Network Layer.
   +---------------------------------------------------------+
   |   +-------+   +-------+          +-------+              |
   |   | App 1 |   | App 2 |   ...    | App n |   Application|
   |   +-------+   +-------+          +-------+              |
   +-------------^-------------------+-----------------------+
                 |Capability Exposure| Intent Input
                 |                   |
   +-------------+-------------------v-----------------------+
   |  Instance of Digital Twin Network                       |
   |  +--------+   +------------------------+   +--------+   |
   |  |        |   | Service Mapping Models |   |        |   |
   |  |        |   |  +------------------+  |   |        |   |
   |  | Data   +--->  |Functional Models |  +---> Digital|   |
   |  | Repo-  |   |  +-----+-----^------+  |   | Twin   |   |
   |  | sitory |   |        |     |         |   | Network|   |
   |  |        |   |  +-----v-----+------+  |   | Mgmt   |   |
   |  |        <---+  |  Basic Models    |  <---+        |   |
   |  |        |   |  +------------------+  |   |        |   |
   |  +--------+   +------------------------+   +--------+   |
   +--------^----------------------------+-------------------+
            |                            |
            | data collection            | control
   +--------+----------------------------v-------------------+
   |                   Physical Network                      |
   |                                                         |
   +---------------------------------------------------------+
Figure 2: Reference Architecture of Digital Twin Network
Physical Network: All or a subset of the network elements in the physical network exchange network data and control messages with a network digital twin instance through twin-physical control interfaces. The physical network can be a mobile access network, a transport network, a mobile core, a backbone, etc. It can also be a data center network, a campus enterprise network, an industrial Internet of Things, etc.
The physical network can span a single network administrative domain or multiple network administrative domains.
This document focuses on IETF-related physical networks such as IP bearer networks and data center networks.
Digital Twin Layer: This layer includes three key subsystems: the Data Repository subsystem, the Service Mapping Models subsystem, and the Digital Twin Network Management subsystem. These key subsystems can either be placed in a single network administrative domain and provide services to applications (e.g., an SDN controller) in other network administrative domains, or reside in every network administrative domain and coordinate with each other to provide services to applications in the upper layer.
One or multiple digital twin network instances can be built and maintained:
* Data Repository subsystem is responsible for collecting and storing the various network data used to build the models. It collects and updates the real-time operational data of network elements through the twin southbound interface, and provides data services (e.g., fast retrieval, concurrent conflict handling, batch service) and unified interfaces to the Service Mapping Models subsystem.
* Service Mapping Models complete data modeling, provide data model instances for various network applications, and maximize the agility and programmability of network services. The data models include two major types: basic and functional models.
- Basic models refer to the network element model(s) and network topology model(s) of the network digital twin, built from the basic configuration, environment information, operational state, link topology, and other information of the network element(s), to provide an accurate, real-time characterization of the physical network.
- Functional models refer to the various data models used for network analysis, emulation, diagnosis, prediction, assurance, etc. The functional models can be constructed and expanded along multiple dimensions: by network type, there can be models serving a single network domain or multiple network domains; by function type, they can be divided into state monitoring, traffic analysis, security exercise, fault diagnosis, quality assurance, and other models; by network lifecycle management phase, they can be divided into planning, construction, maintenance, optimization, and operation models. Functional models can also be divided into general models and special-purpose models. Moreover, multiple dimensions can be combined to create a data model for more specific application scenarios.
New applications might need new functional models that do not exist yet. If a new model is needed, the 'Service Mapping Models' subsystem will be triggered to help create it based on data retrieved from the 'Data Repository'.
* Digital Twin Network Management fulfils the management function of digital twin network, records the life-cycle transactions of the twin entity, monitors the performance and resource consumption of the twin entity or even of individual models, visualizes and controls various elements of the network digital twin, including topology management, model management and security management.
Note: 'Data collection' and 'change control' are regarded as southbound interfaces between the virtual and physical networks. From an implementation perspective, they can optionally form a sub-layer or sub-system that provides the common functionalities of data collection and change control, enabled by a specific infrastructure supporting bi-directional flows and facilitating data aggregation, action translation, pre-processing, and ontologies.
Application Layer: Various applications (e.g., Operations, Administration, and Maintenance (OAM)) can effectively run over a digital twin network platform to implement either conventional or innovative network operations, with low cost and less service impact on real networks. Network applications make requests that need to be addressed by the digital twin network. Such requests are exchanged through a northbound interface and are applied through service emulation at the appropriate twin instance(s).
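The data services listed for the Data Repository subsystem (fast retrieval, concurrent conflict handling, batch service) can be illustrated with a small in-memory store. This Python sketch rests on assumptions that are not part of the architecture: the class and method names are hypothetical, a single coarse lock serializes concurrent writers, and two reports for the same element, metric, and timestamp are resolved by letting the last write win.

```python
import bisect
import threading
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (network element, metric name)


class DataRepository:
    """In-memory sketch of the Data Repository subsystem (hypothetical API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # one coarse lock serializes concurrent writers
        self._times: Dict[Key, List[float]] = defaultdict(list)
        self._values: Dict[Key, List[float]] = defaultdict(list)

    def ingest(self, element: str, metric: str, ts: float, value: float) -> None:
        """Store a sample from the southbound interface, keeping samples time-ordered."""
        key = (element, metric)
        with self._lock:
            times, values = self._times[key], self._values[key]
            i = bisect.bisect_left(times, ts)
            if i < len(times) and times[i] == ts:
                values[i] = value  # conflicting report for the same instant: last write wins
            else:
                times.insert(i, ts)  # out-of-order arrivals are slotted into place
                values.insert(i, value)

    def latest(self, element: str, metric: str) -> float:
        """Value with the most recent timestamp (fast retrieval)."""
        with self._lock:
            return self._values[(element, metric)][-1]

    def batch(self, keys: List[Key], start: float, end: float) -> Dict[Key, List[Tuple[float, float]]]:
        """Batch service: all (ts, value) samples with start <= ts < end for several keys."""
        result = {}
        with self._lock:
            for key in keys:
                times, values = self._times[key], self._values[key]
                lo, hi = bisect.bisect_left(times, start), bisect.bisect_left(times, end)
                result[key] = list(zip(times[lo:hi], values[lo:hi]))
        return result
```

Keeping each series sorted by timestamp makes both "latest value" and time-range batch queries cheap, at the cost of an insertion shift when samples arrive out of order.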
7. Enabling Technologies to Build Digital Twin Network
This section briefly describes several key enabling technologies for building a digital twin network system, based on the challenges and the reference architecture described in the above sections. Each of these enabling technologies merits in-depth research in its own right.
7.1. Data Collection and Data Services
Data collection technology is the foundation for building the data repository of a digital twin network. A target-driven mode should be adopted for data collection from heterogeneous data sources: the type, frequency, and method of data collection should meet the needs of the digital twin network applications. Whenever network models are built for a specific network application, the required data can then be efficiently obtained from the data repository.
Diverse existing tools and methods (e.g., SNMP, NETCONF, IPFIX, Telemetry, INT, etc.) can be used to collect different types of network data. In addition, some innovative methods (e.g., sketch-based measurement) can be used to acquire complex network data such as network performance data. Data transformation and aggregation capabilities can also be used to improve the applicability of the data to network modeling. Toward building the data repository of a digital twin system, data collection tools and methods should be as lightweight as possible, so as to reduce the consumption of network equipment resources, while still collecting data meaningful enough to be useful. Several IETF/IRTF solutions for data collection are work in progress, e.g., adaptive subscription defined in [I-D.ietf-netconf-adaptive-subscription], efficient data collection defined in [I-D.zcz-nmrg-digitaltwin-data-collection], and contextual information defined in [I-D.claise-opsawg-collected-data-manifest].
The data repository effectively stores large-scale and heterogeneous network data, and provides data and services to build various network models. It is therefore also necessary to study data service technologies, including fast search, batch data handling, conflict avoidance, data access interfaces, etc.
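Sketch-based measurement trades exactness for bounded memory on the measuring device. As one well-known example, chosen here only for illustration (the document does not prescribe a particular sketch), a Count-Min sketch keeps a fixed depth-by-width array of counters and never under-estimates a flow's count. The flow-key format, the per-row hashing scheme, and the default sizes below are assumptions.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-memory approximate per-flow counters; estimates never fall below the true count."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Derive one hash function per row by prefixing the row number to the key.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # Collisions can only inflate counters, so the minimum over rows is the best estimate.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))
```

Memory stays at depth times width counters no matter how many flows are observed, which is what keeps this style of measurement lightweight on network equipment.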
7.2. Network Modeling
The basic network element models and topology models help generate a virtual twin of the network according to the network element configuration, operational data, network topology relationships, link state, and other network information. The operational status can then be monitored and displayed, and network configuration changes and optimization strategies can be pre-verified.
For small-scale networks, network simulation tools (e.g., [NS-3], [Mininet], etc.) and emulation tools (e.g., [EVE-NG], [GNS3]) can be used to build basic network models. By using the packet processing capability of virtual network elements, such tools can quickly verify the functions of the control plane and the data plane. However, this modeling method also has many limitations, including high resource consumption, poor performance analysis ability, and poor scalability. For large-scale networks, mathematical abstraction methods can be used to build basic network models efficiently. Knowledge graphs, network calculus, and formal verification are candidate methods. Some relevant research has emerged in recent years, such as [Hong2021], [G2-SIGCOMM], and [DNA-2022]. Going forward, improving the extensibility and accuracy of these models remains a big challenge.
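Pre-verifying a configuration change against a basic topology model can be as simple as re-checking reachability in the twin. This Python sketch assumes an undirected topology built from operational link state and a hypothetical "shut down one link" change; the function names are illustrative, and a real pre-verification would check far more than connectivity.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str]


def build_topology(links: Iterable[Link]) -> Dict[str, Set[str]]:
    """Basic topology model: undirected adjacency built from operational link state."""
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_connected(adj: Dict[str, Set[str]]) -> bool:
    """Breadth-first search from an arbitrary node; connected if every node is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(adj)


def safe_to_shut(links: List[Link], victim: Link) -> bool:
    """What-if in the twin: does shutting down `victim` keep every node reachable?"""
    remaining = [link for link in links if set(link) != set(victim)]
    adj = build_topology(remaining)
    for a, b in links:  # nodes left without any link must still count as nodes
        adj.setdefault(a, set())
        adj.setdefault(b, set())
    return is_connected(adj)
```

For example, on a four-node ring A-B-C-D with a spur D-E, shutting A-B is safe because the ring still connects everything, while shutting D-E isolates E.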
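As an illustration of network calculus as a mathematical abstraction, the classic single-node bounds are shown below, assuming a flow constrained by a token-bucket arrival curve (burst b, rate r) crossing a node that offers a rate-latency service curve (rate R, latency T), with r <= R. D_max denotes the worst-case delay and B_max the worst-case backlog; the numbers in the last two lines are illustrative only.

```latex
\begin{aligned}
&\alpha(t) = b + r\,t \ (t > 0), \qquad \beta(t) = R\,(t - T)^{+}, \qquad r \le R \\
&D_{\max} \le T + \frac{b}{R}, \qquad B_{\max} \le b + r\,T \\
&b = 100\ \text{kbit},\ r = 10\ \text{Mbit/s},\ R = 100\ \text{Mbit/s},\ T = 2\ \text{ms}: \\
&D_{\max} \le 2\ \text{ms} + \frac{100\ \text{kbit}}{100\ \text{Mbit/s}} = 3\ \text{ms}, \qquad
 B_{\max} \le 100\ \text{kbit} + 10\ \text{Mbit/s} \times 2\ \text{ms} = 120\ \text{kbit}
\end{aligned}
```

Bounds of this kind are computed in closed form, which is why such abstractions scale to large networks where packet-level emulation does not.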
As an example, the theory of bottleneck structures introduced in [G2-SIGCOMM] and [G2-SIGMETRICS] can be used to construct a mathematical model of the network (see also [I-D.giraltyellamraju-alto-bsg-requirements] for more information). A bottleneck structure is a computational graph that efficiently captures the topology, routing, and flow properties of the network. The graph embeds the latent relationships that exist between bottlenecks and application flows in a distributed system, providing an efficient mathematical framework to compute the ripple effects of perturbations (e.g., a flow arriving in or departing from the system, or a dynamic change in the capacity of a wireless link). Because these perturbations can be seen as mathematical derivatives of the communication system, bottleneck structures can be used to compute optimized network configurations, providing a natural engineering sandbox for building network models. One of the key advantages of bottleneck structures is that they can be used to compute (symbolically or numerically) key performance indicators of the network (e.g., expected flow throughput, projected flow completion time, etc.) without the need for computationally intensive simulators. This capability can be especially useful when building a digital twin of a large-scale network, potentially saving orders of magnitude in computational resources compared to simulation- or emulation-based approaches.
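A simplified way to see such a ripple effect is to compute max-min fair rates, an allocation commonly used to model congestion-controlled flows. The Python sketch below does not construct a bottleneck structure graph; it only implements the classic bottleneck (water-filling) algorithm for max-min fairness and reruns it after a perturbation. Link names, capacities, and flow paths are hypothetical.

```python
from typing import Dict, List


def max_min_rates(capacity: Dict[str, float], paths: Dict[str, List[str]]) -> Dict[str, float]:
    """Max-min fair rates via the classic bottleneck (water-filling) algorithm.

    Assumes every flow traverses at least one link listed in `capacity`.
    """
    remaining = dict(capacity)
    active = {flow: set(links) for flow, links in paths.items()}
    rates: Dict[str, float] = {}
    while active:
        # Equal share each link could still give to the unresolved flows crossing it.
        shares = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                shares[link] = cap / n
        bottleneck = min(shares, key=shares.get)  # the most constrained link saturates first
        share = shares[bottleneck]
        for flow in [f for f, links in active.items() if bottleneck in links]:
            rates[flow] = share
            for link in active.pop(flow):
                remaining[link] -= share  # frozen flows consume capacity on all their links
    return rates
```

With capacities L1 = 10 and L2 = 25 and flows f1 over L1, f2 over L1 and L2, and f3 and f4 over L2, the rates are 5, 5, 10, and 10. If f1 departs, f2, f3, and f4 all settle at 25/3: f1 shares no link with f3 or f4, yet its departure lowers their rates, because f2 is released from L1 and competes harder on L2.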
The functional models aim to realize the dynamic evolution of network performance evaluation and intelligent decision-making. Data-driven AI/ML algorithms will play a major role in building complex network functional models. AI/ML for networking has been a research hotspot in recent years, and many successful cases have been demonstrated, such as [RouteNet] and [MimicNet]. In the future, in addition to improving the generalization ability and interpretability of AI models, focus is also needed on improving the real-time performance and interactivity of model reasoning based on data and control in the network digital twin layer.
7.3. Network Visualization
It is an intrinsic requirement of the digital twin network system to use network visualization technology to present the data and models in the network twin visually and with high fidelity, and to intuitively reflect the interactive mapping between the physical network entities and the network twin. Network visualization technology can help users understand the internal structure of the network, and also help mine valuable information hidden in the network.
Network visualization can use algorithms such as hierarchical layout, heuristic layout, or force-directed layout (or a combination of several algorithms) for topology layout. The related topology data can be acquired using the solutions provided in [RFC8345], [RFC8346], [RFC8944], etc. Meanwhile, the digital twin network system can select different interaction methods, or combinations of them, to realize the visual dynamic interaction mapping of the virtual and real networks. Data query technologies such as SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
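A force-based topology layout can be sketched in a few lines. The following Python code follows the Fruchterman-Reingold scheme (linked nodes attract, every node pair repels, and the step size cools over time); the initial circle placement, the unit drawing area, the iteration count, and the cooling schedule are arbitrary assumptions, and production visualization tools would add scaling and overlap handling.

```python
import math
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


def force_directed_layout(nodes: List[str], edges: List[Edge],
                          iterations: int = 200) -> Dict[str, Tuple[float, float]]:
    """Fruchterman-Reingold-style layout for a unit drawing area (non-empty `nodes`)."""
    n = len(nodes)
    k = math.sqrt(1.0 / n)  # ideal edge length for a unit area
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    max_step = 0.1  # initial "temperature": cap on per-iteration displacement

    def push(u: str, v: str, force: Callable[[float], float],
             disp: Dict[str, List[float]]) -> None:
        # Positive force pushes u away from v, negative pulls it closer; v moves oppositely.
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = max(math.hypot(dx, dy), 1e-9)
        f = force(d)
        disp[u][0] += dx / d * f
        disp[u][1] += dy / d * f
        disp[v][0] -= dx / d * f
        disp[v][1] -= dy / d * f

    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                push(u, v, lambda d: k * k / d, disp)  # every pair repels
        for u, v in edges:
            push(u, v, lambda d: -d * d / k, disp)     # linked nodes attract
        temp = max_step * (1 - step / iterations)      # linear cooling
        for v in nodes:
            length = max(math.hypot(*disp[v]), 1e-9)
            scale = min(length, temp) / length         # cap displacement at `temp`
            pos[v][0] += disp[v][0] * scale
            pos[v][1] += disp[v][1] * scale
    return {v: (p[0], p[1]) for v, p in pos.items()}
```

The topology input could come from the topology data models cited above; only node names and links are needed for the layout itself.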
7.4. Interfaces
Based on the reference architecture, three types of interfaces are involved in building a digital twin network system.
1) Southbound interfaces are twin interfaces between the physical network and its twin entity. They are responsible for information exchange between the physical network and the network digital twin. Candidate interfaces include SNMP, NETCONF, etc.
2) Northbound interfaces are application-facing interfaces between the network digital twin and applications. They are responsible for information exchange between the network digital twin and network applications. A lightweight and extensible [RESTFul] interface is a candidate northbound interface.
3) Internal interfaces are within the network digital twin layer. They are responsible for information exchange among the three subsystems: Data Repository, Service Mapping Models, and Digital Twin Network Management. These interfaces should be high-speed, high-efficiency, and high-concurrency. Candidate interfaces or protocols include XMPP [RFC7622] and HTTP/3 [RFC9114].
All interfaces are recommended to be open and standardized, so as to help avoid hardware or software vendor lock-in and achieve interoperability. Besides the interfaces listed above, new interfaces or protocols can be created to better serve digital twin network systems.
8. Interaction with IBN
Intent-Based Networking (IBN) is an innovative technology for life-cycle network management. Future networks may be intent-based, which means that users can input their abstract 'intent' to the network instead of detailed policies or configurations on the network devices. [I-D.irtf-nmrg-ibn-concepts-definitions] clarifies the concept of "Intent" and provides an overview of IBN functionalities. The key characteristic of an IBN system is that user intent can be assured automatically by continuously adjusting the policies and validating the real-time situation.
IBN can be envisaged in a digital twin network context to show how the digital twin network improves the efficiency of deploying network innovations. To lower the impact on real networks, several rounds of adjustment and validation can be emulated on the digital twin network platform instead of directly on the physical network. Therefore, the digital twin network can be an important enabling platform for implementing IBN systems and speeding up their deployment.
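The several rounds of adjustment and validation mentioned above can be sketched as a loop that emulates each candidate adjustment in the twin and stops at the first one that satisfies the intent. In this hypothetical Python sketch, the intent is "keep every link at or below a maximum utilization" and the candidate adjustments are alternative paths for one traffic demand; the function names, topology, and numbers are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


def emulate_utilization(capacity: Dict[str, float], background: Dict[str, float],
                        demand: float, path: Path) -> Dict[str, float]:
    """Twin-side what-if: per-link utilization if `demand` is routed over `path`."""
    load = dict(background)
    for link in path:
        load[link] = load.get(link, 0.0) + demand
    return {link: load.get(link, 0.0) / cap for link, cap in capacity.items()}


def realize_intent(capacity: Dict[str, float], background: Dict[str, float],
                   demand: float, candidates: List[Path],
                   max_util: float) -> Optional[Path]:
    """Emulate candidate adjustments one round at a time until the intent is met."""
    for path in candidates:
        utilization = emulate_utilization(capacity, background, demand, path)
        if max(utilization.values()) <= max_util:
            return path  # only this validated choice would be pushed to the network
    return None  # no candidate satisfies the intent; report back to the user
```

Every rejected round costs only computation in the twin; the physical network sees a single, already-validated change, or none at all.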
9. Sample Application Scenarios
Digital twin network can be applied to solve different problems in network management and operation.
9.1. Human Training
The usual approach to network OAM, with procedures applied by humans, is prone to errors in all of these procedures, with an impact on network availability and resilience. Response procedures and actions for the most relevant operational requests and incidents are commonly defined to reduce errors to a minimum. The progressive automation of these procedures, such as predictive control or closed-loop management, reduces faults and response time, but a human in the loop is still needed for multiple actions. These processes are not intuitive and require training to learn how to respond.
Using a digital twin network for this purpose in different network management activities will improve operators' performance. One common example is cybersecurity incident handling, where "cyber-range" exercises are executed periodically to train security practitioners. A digital twin network can offer realistic environments, fitted to the real production networks.
9.2. Machine Learning Training
Machine learning requires data and their context to be available in order to be applied. A common approach in the network management environment has been to simulate or import data in a specific environment (the ML developer lab), where they are used to train the selected model, and later, when the model is deployed in production, to re-train or adjust it to the production environment context. This demands a specific adaptation period.
Digital twin network simplifies development across the complete ML lifecycle by providing a realistic environment, including network topologies, to generate the required data in a well-aligned context. The generated datasets belong to the digital twin network and not to the production network, allowing information access by third parties without impacting data privacy.
9.3. DevOps-Oriented Certification
The potential application of CI/CD models to network management operations increases the risk associated with the deployment of non-validated updates, which conflicts with the goal of the certification requirements applied by network service providers. A solution for addressing these certification requirements is to verify the specific impacts of updates on service assurance and SLAs, prior to production release, using a digital twin network environment that replicates the particularities of the network.
Digital twin network control functional block supports such dynamic mechanisms required by DevOps procedures.
9.4. Network Fuzzing
The dependency of network management on programmability increases system complexity. The behavior of new protocol stacks, API parameters, and interactions among complex software components are examples that imply a higher risk of errors or vulnerabilities in software and configuration.
Digital twin network allows fuzz testing techniques to be applied to a twin network environment, with interactions and conditions similar to those of the production network, making it possible to identify and fix vulnerabilities, bugs, and zero-day vulnerabilities before production delivery.
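A minimal mutation-based fuzzer illustrates the idea. The Python sketch below mutates valid seed inputs and records every uncaught exception raised by a target running against the twin. `handle_mtu_command` is a hypothetical, deliberately fragile handler standing in for a real protocol stack or API; the seed input, mutation operators, and run count are assumptions.

```python
import random
from typing import Callable, List, Tuple


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: bit flip, byte insertion, or deletion."""
    buf = bytearray(data)
    op = rng.choice(("flip", "insert", "delete")) if buf else "insert"
    if op == "insert":
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    else:
        i = rng.randrange(len(buf))
        if op == "flip":
            buf[i] ^= 1 << rng.randrange(8)
        else:
            del buf[i]
    return bytes(buf)


def fuzz(target: Callable[[bytes], object], seeds: List[bytes], runs: int,
         seed: int = 0) -> List[Tuple[bytes, str]]:
    """Run mutated inputs against `target` in the twin and record uncaught exceptions."""
    rng = random.Random(seed)  # seeded so that findings are reproducible
    failures: List[Tuple[bytes, str]] = []
    for _ in range(runs):
        case = mutate(rng.choice(seeds), rng)
        try:
            target(case)
        except Exception as exc:  # any crash is a finding to triage
            failures.append((case, type(exc).__name__))
    return failures


def handle_mtu_command(line: bytes) -> bool:
    """Hypothetical, deliberately fragile handler for b"mtu <n>" commands."""
    keyword, value = line.decode("ascii").split(" ")  # crashes on malformed input
    return keyword == "mtu" and 68 <= int(value) <= 9216
```

Because the target runs against the twin, crashing inputs such as a deleted space or a non-ASCII byte are found without touching production devices.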
10. Research Perspectives: A Summary
Research on digital twin network has just started. This document presents an overview of the digital twin network concepts and reference architecture. Looking forward, further elaboration on digital twin network scenarios, requirements, architecture, and key enabling technologies should be investigated by the industry, so as to accelerate the implementation and deployment of digital twin network.
11. Security Considerations
This document describes concepts and definitions of digital twin network. As such, the following security considerations remain high level, i.e., in the form of principles, guidelines or requirements.
Security considerations of the digital twin network include:
* Secure the digital twin system itself.
* Data privacy protection.
Securing the digital twin network system aims at making the digital twin system operationally secure by implementing security mechanisms and applying security best practices. In the context of digital twin network, such mechanisms and practices may consist of data verification and model validation, and of ensuring that mapping operations between the physical network and its digital counterpart are performed by authenticated and authorized users only.
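One way to restrict mapping operations to authenticated and authorized users is to have each change request carry a message authentication code and to check the requested action against a role table before anything is applied to the physical network. The Python sketch below uses HMAC-SHA256 from the standard library; the user names, keys, roles, and request fields are hypothetical, and a real deployment would rely on the transport security and access control of the protocols actually in use, with keys kept out of source code.

```python
import hashlib
import hmac
import json
from typing import Dict, Set

# Hypothetical credentials and roles; real keys would never live in source code.
KEYS: Dict[str, bytes] = {"alice": b"alice-demo-key", "bob": b"bob-demo-key"}
ROLES: Dict[str, Set[str]] = {"alice": {"read", "apply"}, "bob": {"read"}}


def sign(user: str, request: dict) -> str:
    """HMAC-SHA256 tag over a canonical JSON encoding of the request."""
    body = json.dumps(request, sort_keys=True).encode()
    return hmac.new(KEYS[user], body, hashlib.sha256).hexdigest()


def authorize_mapping(user: str, request: dict, tag: str) -> bool:
    """Accept a twin-to-physical mapping operation only if it is authentic and permitted."""
    if user not in KEYS:
        return False  # unknown user: not authenticated
    if not hmac.compare_digest(sign(user, request), tag):  # constant-time comparison
        return False  # tampered request or wrong key
    return request.get("action") in ROLES.get(user, set())  # authorization check
```

Authentication and authorization are checked separately: a correctly signed request from a read-only user is still rejected.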
Synchronizing the data between the physical and the digital twin networks may increase the risk of sensitive data and information leakage. Strict control and security mechanisms must be provided and enabled to prevent data leaks.
12. Acknowledgements
Many thanks to the NMRG participants for their comments and reviews. Thanks to Daniel King, Qiufang Ma, Laurent Ciavaglia, Jerome Francois, Jordi Paillisse, Luis Miguel Contreras Murillo, Alexander Clemm, Qiao Xiang, Ramin Sadre, Pedro Martinez-Julia, Wei Wang, Zongpeng Du, and Peng Liu.
Diego Lopez and Antonio Pastor were partly supported by the European Commission under Horizon 2020 grant agreement no. 833685 (SPIDER), and grant agreement no. 871808 (INSPIRE-5Gplus).
13. IANA Considerations
This document has no requests to IANA.
14. Open issues
* Some technologies (e.g., network connectivity, real-time data communication, collaboration management, conflict detection and resolution, etc.) recently discussed in the IRTF/IETF should be described.
* In the 'Sample Application Scenarios' section, dig deeper into one or two use cases.
* On the research side, the idea behind digital twin networks is reminiscent of earlier work from the 1990s that should be referenced/acknowledged. Examples include the Shadow MIB concept, Inductive Modeling Technique, etc.
15. References
15.1. Normative References
[RFC7622] Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Address Format", RFC 7622, DOI 10.17487/RFC7622, September 2015, <https://www.rfc-editor.org/info/rfc7622>.
[RFC8345] Clemm, A., Medved, J., Varga, R., Bahadur, N., Ananthakrishnan, H., and X. Liu, "A YANG Data Model for Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March 2018, <https://www.rfc-editor.org/info/rfc8345>.
[RFC8346] Clemm, A., Medved, J., Varga, R., Liu, X., Ananthakrishnan, H., and N. Bahadur, "A YANG Data Model for Layer 3 Topologies", RFC 8346, DOI 10.17487/RFC8346, March 2018, <https://www.rfc-editor.org/info/rfc8346>.
[RFC8944] Dong, J., Wei, X., Wu, Q., Boucadair, M., and A. Liu, "A YANG Data Model for Layer 2 Network Topologies", RFC 8944, DOI 10.17487/RFC8944, November 2020, <https://www.rfc-editor.org/info/rfc8944>.
[RFC9114] Bishop, M., Ed., "HTTP/3", RFC 9114, DOI 10.17487/RFC9114, June 2022, <https://www.rfc-editor.org/info/rfc9114>.
15.2. Informative References
[Dai2020] Dai, Y., Zhang, K., Maharjan, S., and Y. Zhang, "Deep Reinforcement Learning for Stochastic Computation Offloading in Digital Twin Networks", IEEE Transactions on Industrial Informatics, vol. 17, no. 17, August 2020.
[DNA-2022] Zhang, P., Gember-Jacobson, A., Zuo, Y., Huang, Y., Liu, X., and H. Li, "Differential Network Analysis", USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), April 2022.
[Dong2019] Dong, R., She, C., Hardjawana, W., Li, Y., and B. Vucetic, "Deep Learning for Hybrid 5G Services in Mobile Edge Computing Systems: Learn from a Digital Twin", IEEE Transactions on Wireless Communications, vol. 18, no. 10, July 2019.
[DTPI2021] "IEEE International Conference on Digital Twins and Parallel Intelligence - Digital Twin Network Session", July 2021, <https://www.dtpi.org/video/10>.
[EVE-NG] "Emulated Virtual Environment Next Generation (EVE-NG)", <https://www.eve-ng.net/>.
[Fuller2020] Fuller, A., Fan, Z., Day, C., and C. Barlow, "Digital Twin: Enabling Technologies, Challenges and Open Research", IEEE Access, vol. 8, pp. 108952-108971, 2020.
[G2-SIGCOMM] Ros-Giralt, J., Amsel, N., Yellamraju, S., Ezick, J., Lethin, R., Jiang, Y., Feng, A., Tassiulas, L., Wu, Z., and K. Bergman, "Designing data center networks using bottleneck structures", ACM SIGCOMM, August 2021.
[G2-SIGMETRICS] Ros-Giralt, J., Bohara, A., Yellamraju, S., Langston, H., Lethin, R., Jiang, Y., Tassiulas, L., Li, J., Tan, Y., and M. Veeraraghavan, "On the Bottleneck Structure of Congestion-Controlled Networks", ACM SIGMETRICS, December 2019.
[GNS3] "Graphical Network Simulator-3 (GNS3)", <https://www.gns3.com/>.
[Grieves2014] Grieves, M., "Digital twin: Manufacturing excellence through virtual factory replication", 2003, <https://www.3ds.com/fileadmin/PRODUCTS-SERVICES/DELMIA/PDF/Whitepaper/DELMIA-APRISO-Digital-Twin-Whitepaper.pdf>.
[Hong2021] Hong, H., Wu, Q., Dong, F., Song, W., Sun, R., Han, T., Zhou, C., and H. Yang, "NetGraph: An Intelligent Operated Digital Twin Platform for Data Center Networks", ACM SIGCOMM 2021 Workshop on Network-Application Integration (NAI ’21), Virtual Event, USA, ACM, New York, NY, USA, 2021.
[Hu2021] Hu, W., Zhang, T., Deng, X., Liu, Z., and J. Tan, "Digital twin: a state-of-the-art review of its enabling technologies, applications and challenges", Journal of Intelligent Manufacturing and Special Equipment, vol. 2, no. 1, pp. 1-34, 2021.
[I-D.claise-opsawg-collected-data-manifest] Claise, B., Quilbeuf, J., Lopez, D. R., Dominguez, I., and T. Graf, "A Data Manifest for Contextualized Telemetry Data", Work in Progress, Internet-Draft, draft-claise-opsawg-collected-data-manifest-02, 20 March 2022, <https://www.ietf.org/archive/id/draft-claise-opsawg-collected-data-manifest-02.txt>.
[I-D.giraltyellamraju-alto-bsg-requirements] Ros-Giralt, J., Yellamraju, S., Wu, Q., Contreras, L. M., Yang, R., and K. Gao, "Supporting Bottleneck Structure Graphs in ALTO: Use Cases and Requirements", Work in Progress, Internet-Draft, draft-giraltyellamraju-alto-bsg-requirements-01, 23 March 2022, <https://www.ietf.org/archive/id/draft-giraltyellamraju-alto-bsg-requirements-01.txt>.
[I-D.ietf-netconf-adaptive-subscription] Wu, Q., Song, W., Liu, P., Ma, Q., Wang, W., and Z. Niu, "Adaptive Subscription to YANG Notification", Work in Progress, Internet-Draft, draft-ietf-netconf-adaptive-subscription-00, 23 June 2022, <https://www.ietf.org/archive/id/draft-ietf-netconf-adaptive-subscription-00.txt>.
[I-D.irtf-nmrg-ibn-concepts-definitions] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", Work in Progress, Internet-Draft, draft-irtf-nmrg-ibn-concepts-definitions-09, 24 March 2022, <https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn-concepts-definitions-09.txt>.
[I-D.zcz-nmrg-digitaltwin-data-collection] Zhou, C., Chen, D., and P. Martinez-Julia, "Data Collection Requirements and Technologies for Digital Twin Network", Work in Progress, Internet-Draft, draft-zcz-nmrg-digitaltwin-data-collection-00, 10 July 2022, <https://www.ietf.org/archive/id/draft-zcz-nmrg-digitaltwin-data-collection-00.txt>.
[ISO-2021] ISO, "Digital Twin manufacturing framework - Part 2: Reference architecture", ISO/CD 23247-2, 2021, <https://www.iso.org/standard/78743.html>.
[Madni2019] Madni, A., Madni, C., and S. Lucero, "Leveraging digital twin technology in model-based systems engineering", Systems, vol. 7, no. 1, p. 7, January 2019.
[MimicNet] Zhang, Q., Ng, K. K. W., Kazer, C. W., Yan, S., Sedoc, J., and V. Liu, "MimicNet: Fast Performance Estimates for Data Center Networks with Machine Learning", ACM SIGCOMM 2021 Conference (SIGCOMM ’21), August 2021.
[Mininet] "Mininet: An Instant Virtual Network on your Laptop", <http://mininet.org/>.
[Natis-Gartner2017] Natis, Y., Velosa, A., and W. R. Schulte, "Innovation insight for digital twins - driving better IoT-fueled decisions", 2017, <https://www.gartner.com/en/documents/3645341>.
[Nguyen2021] Nguyen, H. X., Trestian, R., To, D., and M. Tatipamula, "Digital Twin for 5G and Beyond", IEEE Communications Magazine, vol. 59, no. 2, February 2021.
[NS-3] "Network Simulator (NS-3)", <https://www.nsnam.org/>.
[RESTFul] Richardson, L. and M. Amundsen, "RESTful Web APIs", O’Reilly Media, Inc., 2013.
[Roson2015] Rosen, R., von Wichert, G., Lo, G., and K. D. Bettenhausen, "About the importance of autonomy and DTs for the future of manufacturing", IFAC-PapersOnLine, vol. 48, pp. 567-572, 2015.
[RouteNet] Rusek, K., Suárez-Varela, J., Almasan, P., Barlet-Ros, P., and A. Cabellos-Aparicio, "RouteNet: Leveraging Graph Neural Networks for network modeling and optimization in SDN", IEEE Journal on Selected Areas in Communications (JSAC), vol. 38, no. 10, October 2020.
[Tao2019] Tao, F., Zhang, H., Liu, A., and A. Y. C. Nee, "Digital Twin in Industry: State-of-the-Art", IEEE Transactions on Industrial Informatics, vol. 15, no. 4, April 2019.
[TNT2022] "IEEE International Workshop on Technologies for Network Twins", 2022, <https://sites.google.com/view/tnt-2022/>.
Appendix A. Change Logs
v06 - v07: Addressed reviewer’s comments from the adoption call, including the following major changes.
* Resequenced the sections via adding more subsections on concepts of digital twin network, removing the ’Requirements Language’ section, and moving ahead the ’Challenges’ section.
* Cited more papers, or industrial information on digital twin concepts and digital twin for networks.
* Added more information describing the challenges and key characteristics of digital twin networks.
* Removed the previous open issue on investigating related digital twin network work and identifying the differences and commonalities, and added several new open issues for future study.
* Other editorial changes.
v05 - v06: Addressed comments from the meeting and mailing list, to request an adoption call.
* Remove acronym DTN to avoid conflict with ’Delay Tolerant Network’;
* Elaborate the description of the Digital Twin Network architecture that supports multiple instances;
* Other Editorial changes.
v04 - v05
* Clarify the difference between digital twin network platform and traditional network management system;
* Add more references to research on applying digital twins to the network field;
* Clarify the benefit of ’Privacy and Regulatory Compliance’;
* Refine the description of reference architecture;
* Other Editorial changes.
v03 - v04
* Update data definition and models definitions to clarify their difference.
* Remove the orchestration element and consolidate it into the control functionality building block in the digital twin network.
* Clarify the mapping relation (one to one, and one to many) in the mapping definition.
* Add explanation text for continuous verification.
v02 - v03
* Split interaction with IBN part as a separate section.
* Fill security section;
* Clarify the motivation in the introduction section;
* Use new boilerplate for requirements language section;
* Key elements definition update.
* Other editorial changes.
* Add open issues section.
* Add section on application scenarios.
Authors’ Addresses
Cheng Zhou China Mobile Beijing 100053 China Email: zhouchengyjy@chinamobile.com
Hongwei Yang China Mobile Beijing 100053 China Email: yanghongwei@chinamobile.com
Xiaodong Duan China Mobile Beijing 100053 China Email: duanxiaodong@chinamobile.com
Diego Lopez Telefonica I+D Seville Spain Email: diego.r.lopez@telefonica.com
Antonio Pastor Telefonica I+D Madrid Spain Email: antonio.pastorperales@telefonica.com
Qin Wu Huawei 101 Software Avenue, Yuhua District Nanjing Jiangsu, 210012 China Email: bill.wu@huawei.com
Mohamed Boucadair Orange Rennes 35000 France Email: mohamed.boucadair@orange.com
Christian Jacquenet Orange Rennes 35000 France Email: christian.jacquenet@orange.com
Network Management Research Group J. Paillisse
Internet-Draft P. Almasan
Intended status: Informational M. Ferriol
Expires: 12 January 2023 P. Barlet
A. Cabellos
UPC-BarcelonaTech
S. Xiao
X. Shi
X. Cheng
Huawei
D. Perino
D. Lopez
A. Pastor
Telefonica I+D
11 July 2022
A Performance-Oriented Digital Twin for Carrier Networks
draft-paillisse-nmrg-performance-digital-twin-00
Abstract
This draft introduces the concept of a Network Digital Twin (NDT) for performance evaluation. A Performance NDT is able to produce performance estimates (delay, jitter, loss) of a given input network with a specific topology, traffic demand, and routing and scheduling configuration. Also, this draft discusses the interface of the digital twin, how it relates to existing control plane elements, use cases, and possible implementation options.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 12 January 2023.
Paillisse, et al. Expires 12 January 2023 [Page 1]
Internet-Draft Network Performance Digital Twin July 2022
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
   1.  Introduction
   2.  Terminology
   3.  Architecture of the Network Performance Digital Twin
   4.  Interfaces
     4.1.  Administrator
     4.2.  Configuration Interface
     4.3.  Digital Twin Interface (DTI)
   5.  Mapping to the Network Digital Twin Architecture
   6.  Use Cases
     6.1.  Network Operations and Management
       6.1.1.  Network planning
       6.1.2.  What-if scenarios
       6.1.3.  Troubleshooting
       6.1.4.  Anomaly detection
       6.1.5.  Training
     6.2.  Network Optimization
   7.  Implementation Challenges
     7.1.  Simulation
     7.2.  Emulation
     7.3.  Analytical Modelling
     7.4.  Neural Networks
       7.4.1.  MultiLayer Perceptron
       7.4.2.  Recurrent Neural Networks
       7.4.3.  Convolutional Neural Networks
       7.4.4.  Graph Neural Networks
       7.4.5.  NN Comparison
   8.  Training
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Acknowledgements
   Authors’ Addresses
1. Introduction
A Digital Twin for computer networks is a virtual replica of an existing network whose behavior is equivalent to that of the real one. The key advantage of a Network Digital Twin (NDT) is the ability to recreate the complexities and particularities of the network infrastructure without the deployment cost of a real network. Hence, network administrators can test, deploy, and modify network configurations safely, without affecting the real network. Once the administrator has found a configuration that fulfills the expected objectives, it is deployed to the real network. In addition, interacting with an NDT is faster, safer, and more cost-effective than interacting with the physical network. All these characteristics make an NDT useful for network management tasks ranging from network planning and troubleshooting to optimization.
The concept of an NDT has been proposed in different contexts: network management [I-D.draft-zhou-nmrg-digitaltwin-network-concepts], 5G networks [digital-twin-5G], vehicular networks [digital-twin-vanets], artificial intelligence [digital-twin-AI], and Industry 4.0 [digital-twin-industry], among others.
This draft proposes a Digital Twin for network management with a focus on performance evaluation. That is, given several input parameters (topology, traffic matrix, etc.), a Network Performance Digital Twin (NPDT) predicts network performance metrics such as delay (per path or per link), jitter, or loss. This draft defines the inputs and outputs of such a Digital Twin and its interfaces with other modules in the network control plane, and describes use cases.
In addition, this draft discusses possible implementation options for the NPDT, with a special emphasis on those based on Machine Learning. Section 7 (Implementation Challenges) describes the advantages and limitations of these techniques. For example, most Machine Learning technologies rely heavily on large amounts of data to achieve acceptable accuracy. Another consideration is adapting the architecture of the neural network so that it captures the structure of the input data.
To be usable in practical scenarios (cf. Section 6), such as network optimization, a Network Performance Digital Twin (NPDT) should meet the following requirements:
Fast: low delay when making predictions (on the order of milliseconds), so that it can be used in optimization scenarios that need to evaluate a large number of candidate configurations (cf. Section 6.2).
Accurate: the prediction error (with respect to the ground truth) has to be below a certain threshold for the NPDT to be deployable in real-world networks.
Scalable: support arbitrarily large network topologies.
Variety of Inputs: accept a wide range of combinations of:
* Routing configurations
* Scheduling configurations (FIFO, Weighted Fair Queueing, Deficit Round Robin, etc.)
* Topologies
* Traffic Matrices
* Traffic Models (constant bitrate, Poisson, ON/OFF, etc.)
Accessible: regardless of its internal architecture, the NPDT needs to be easy to use for network engineers and administrators. This includes, but is not limited to: interfaces to communicate with the NPDT that are well known in the networking community, metrics that are readily understood by network engineers, and confidence values for the estimations.
Note that the inputs and outputs described here are examples; other inputs and outputs are possible depending on the specifics of each scenario.
2. Terminology
Digital Twin (DT): A virtual replica of a physical system.
Network Digital Twin (NDT): A virtual replica of a physical network.
Network Performance Digital Twin (NPDT): A virtual replica of a physical network that can accurately predict several performance metrics of the physical network.
Network Optimizer: An algorithm capable of finding the optimal configuration parameters of a network, e.g. OSPF weights, given an optimization objective, e.g. latency below a certain threshold.
Control Plane: Any system, hardware or software, centralized or decentralized, in charge of controlling and managing a physical network. Examples are routing protocols, SDN controllers, etc.
3. Architecture of the Network Performance Digital Twin
Figure 1 presents an overview of the architecture of a Network Performance Digital Twin (NPDT).
                  Administrator
                     Intent
                        |
                        | Intent-Based Interface
                        |
     +------------------+---------------------------+
     |                                              |
     |  Intent-Based              Optimizer         |
     |  Renderer                      +-------------+
     |                         DTI    |   Network   |
     |  Management          Interface | Performance |
     |  Plane               <-------->|   Digital   |
     |                                |    Twin     |
     |                                |             |
     |  Measure      Configure        +-------------+
     |     |             |                          |
     +-----+-------------+--------------------------+
           |             |
           | Measurement | Configuration
           | Interface   | Interface
           |             |
     +-----+-------------+--------------------------+
     |               Physical Network               |
     +----------------------------------------------+
Figure 1: Global architecture of the Performance DT
Each element is defined as:
Network Performance Digital Twin (NPDT): a system capable of generating performance estimates of a specific instance of a network.
Physical Network: a real-world network that can be configured via standard interfaces.
Management Plane: The set of hardware and software elements in charge of controlling the Physical Network. This includes routing processes, optimization algorithms, network controllers, visibility platforms, etc. The definition, organization, and implementation of the elements within the management plane are outside the scope of this document. The elements of the management plane that are relevant to this document are described below.
* Optimizer: a network optimizer that can tune the configuration parameters of a network given one or more optimization objectives, e.g., do not exceed a latency threshold on any path, minimize the load of the most used link, and avoid more than 10 Gbps of traffic at router R4 [DEFO].
* Intent-Based Renderer: a system capable of understanding network intent, according to the definitions in [irtf-nmrg-ibn-concepts-definitions-09].
* Measure: any system to measure the status and performance of a network, e.g. Netflow [RFC3954], streaming telemetry [streaming-telemetry], etc.
* Configure: any system to apply configuration settings to the network devices, e.g. a NETCONF Manager or an end-to-end system to manage device configuration files [facebook-config].
The functions of each interface are:
DT Interface (DTI): an interface to communicate with the Network Performance Digital Twin (NPDT). Inputs to the DT are a description of the network (topology, routing configuration, etc.), and the outputs are performance metrics (delay, jitter, loss; cf. Section 4.3).
Configuration Interface (CI): a standard interface to configure the physical network, such as NETCONF [RFC6241], YANG, OpenFlow [OFspec], LISP [RFC6830], etc.
Measurement Interface (MI): a standard interface to collect network status information, such as Netflow [RFC3954], SNMP, streaming telemetry [openconfig-rtgwg-gnmi-spec-01], etc.
Intent-Based Interface (IBI): an interface for the network administrator to define optimization objectives or run the DT to obtain performance estimates, among others.
4. Interfaces
4.1. Administrator
This interface can be a simple CLI or a state-of-the-art GUI, depending on the final product. In summary, it has to offer the network administrator the following options/features:
* Predict the performance of one or more network scenarios, defined by the administrator. Several use-cases related to this option are detailed in Section 6.1.
* Define network optimization objectives and run the network optimizer.
* Apply the optimized configuration to the physical network.
4.2. Configuration Interface
This interface is used to configure the Physical Network with the configuration parameters obtained from the optimizer. It can consist of one or more IETF protocols for network configuration; a non-exhaustive list includes NETCONF [RFC6241], RESTCONF/YANG [RFC8040], PCE [RFC4655], OVSDB [RFC7047], and LISP [RFC6830]. It is also possible to use standards defined outside the IETF that allow the configuration of elements in the forwarding plane, e.g., OpenFlow [OFspec] or P4 Runtime [P4Rspec].
4.3. Digital Twin Interface (DTI)
This interface can be defined with any widespread data format, such as CSV files or JSON objects. It carries two groups of data, described below for a network with N nodes.
Inputs: data sent to the NPDT to calculate the performance estimates:
* Topology: description of the network topology in graph format, e.g., NetworkX [NetworkXlib].
* Routing configuration: a matrix of size N*N. Each cell contains the path from source N(i) to destination N(j) as a series of nodes of the topology. Note that not all source-destination pairs may have a path. Since the NPDT only needs a sequence of nodes to define a route, it supports different routing protocols, from OSPF, IS-IS or BGP, to SRv6, LISP, etc.
* Traffic Demands: a definition of the traffic that is injected into the network. It can be specified with different granularities, ranging from a list of 5-tuple flows and their associated traffic intensity to an N*N matrix defining the traffic intensity for each source-destination pair. Some source-destination pairs may have zero traffic intensity. The traffic intensity is described by parameters such as bits per second, number of packets, average packet size, etc.
* Traffic Model: the statistical properties of the input traffic, e.g. Video on Demand, backup, VoIP traffic, etc. It can be defined globally for the whole network or individually for each flow in the Traffic Demands.
* Scheduling configuration: attributes associated with the nodes of the topology graph describing the scheduling configuration of the network, that is, (1) the scheduling policy (e.g., FIFO, WFQ, DRR, etc.) and (2) the number of queues per output port.
Outputs: performance estimates of the NPDT: three matrices of size N*N containing the delay, jitter and loss for all the paths in the input topology.
Note that this is an example of the inputs/outputs of a performance NPDT, but other inputs and outputs are possible depending on the specificities of each scenario.
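As an illustration only, the DTI inputs listed above could be serialized as a single JSON object. The following Python sketch assumes hypothetical field names (`topology`, `routing`, `traffic`, `traffic_model`, `scheduling`) and units that this draft does not define, and shows one sanity check a DTI could apply before running the twin: every routed path must start and end at the right nodes and use only links present in the topology.

```python
import json

# Hypothetical DTI input for a 3-node network (N = 3). Field names and units
# are illustrative assumptions, not a format defined by this draft.
dti_input = {
    "topology": {
        "nodes": [0, 1, 2],
        # Directed links with capacity in bits per second.
        "links": [
            {"src": 0, "dst": 1, "capacity_bps": 10_000_000},
            {"src": 1, "dst": 2, "capacity_bps": 10_000_000},
            {"src": 0, "dst": 2, "capacity_bps": 1_000_000},
        ],
    },
    # routing[i][j]: path from node i to node j as a list of nodes,
    # or None when the pair has no path.
    "routing": [
        [None, [0, 1], [0, 1, 2]],
        [None, None, [1, 2]],
        [None, None, None],
    ],
    # traffic[i][j]: average traffic intensity in bits per second.
    "traffic": [
        [0, 2_000_000, 500_000],
        [0, 0, 1_000_000],
        [0, 0, 0],
    ],
    "traffic_model": "poisson",
    # Per-node scheduling attributes (JSON object keys must be strings).
    "scheduling": {str(n): {"policy": "FIFO", "queues": 1} for n in range(3)},
}


def validate_routing(dti: dict) -> list:
    """Return (src, dst) pairs whose path is malformed or uses a missing link."""
    links = {(link["src"], link["dst"]) for link in dti["topology"]["links"]}
    errors = []
    for i, row in enumerate(dti["routing"]):
        for j, path in enumerate(row):
            if path is None:
                continue
            ok = path[0] == i and path[-1] == j and all(
                (a, b) in links for a, b in zip(path, path[1:])
            )
            if not ok:
                errors.append((i, j))
    return errors


payload = json.dumps(dti_input)  # what would be sent over the DTI
print(validate_routing(json.loads(payload)))  # → []
```

Representing routes as node lists keeps the format independent of the routing protocol that produced them, in line with the Routing configuration input described above.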
5. Mapping to the Network Digital Twin Architecture
Since the NPDT is a type of Network Digital Twin, its elements can be mapped to the reference architecture of an NDT described in [I-D.draft-zhou-nmrg-digitaltwin-network-concepts]. Table 1 maps the elements of the NDT reference architecture to those of the NPDT. Note that the Physical Network is the same for both architectures.
   +=====================================+========================+
   |     NDT Reference Architecture      |       This draft       |
   +====================+================+========================+
   | Application Layer  |                | Intent-Based Interface |
   |                    |                +------------------------+
   |                    |                | Optimizer              |
   +--------------------+----------------+------------------------+
   | Digital Twin Layer | Management     | Management Plane       |
   |                    +----------------+------------------------+
   |                    | Service        | Network Performance    |
   |                    | Mapping Models | Digital Twin           |
   |                    +----------------+------------------------+
   |                    | Data           | Optional in production |
   |                    | Repository     | deployments            |
   +--------------------+----------------+------------------------+
   | Physical Network   | Data           | Measurement Interface  |
   |                    | Collection     |                        |
   |                    +----------------+------------------------+
   |                    | Control        | Configuration          |
   |                    |                | Interface              |
   +--------------------+----------------+------------------------+
Table 1: Mapping of NDT reference architecture elements to the architecture of the Network Performance DT.
6. Use Cases
6.1. Network Operations and Management
6.1.1. Network planning
The size and traffic of networks have doubled every year [network-capacity]. To accommodate this growth in users and network applications, networks need periodic upgrades. For example, ISPs might want to increase certain link capacities or add new connections to alleviate the burden on the existing infrastructure. This is typically a cumbersome process that relies on expert knowledge. Furthermore, modern networks are becoming larger and more complex, which makes it even harder for existing solutions to scale [planning-scalability].
Since the NPDT models large infrastructures and can produce accurate and fast performance estimates, it can help in different tasks related to network capacity planning:
* Estimate when an existing network will run out of resources, assuming a given growth in users.
* Use performance estimates to plan the optimal upgrade that can cope with user growth. Network operators can leverage the NPDT to make better planning decisions and anticipate network upgrades.
* Find unconventional topologies: in some networking scenarios, especially datacenter networks, some topologies are well-known to offer high performance [Google-Clos]. However, it is also possible to search for new topologies that optimize performance with the help of algorithms. On one hand, the algorithm explores different topologies and, on the other hand, the NPDT provides fast performance estimations to the algorithm. Hence, the NPDT guides the optimization algorithm towards the topologies with better performance [auto-dc-topology].
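For the first task, a back-of-the-envelope version of the estimate fits in a few lines. This Python sketch is not an NPDT: it assumes, purely for illustration, that every link's load grows by the same yearly percentage and that a link is exhausted once its utilization exceeds 80%. The link names, loads, and the 100% growth rate (traffic doubling each year) are hypothetical. A real NPDT would instead re-evaluate delay and loss for each projected traffic matrix, capturing effects such as rerouting and queueing that this projection ignores.

```python
def years_until_exceeded(load_mbps: int, capacity_mbps: int,
                         growth_pct: int, threshold_pct: int = 80,
                         max_years: int = 50):
    """Whole years until the load exceeds threshold_pct of capacity.

    Assumes (simplification) the load grows by growth_pct every year.
    Integer arithmetic (Mbps, percent) avoids floating-point edge cases
    at the threshold. Returns None if not exceeded within max_years.
    """
    years = 0
    while load_mbps * 100 <= threshold_pct * capacity_mbps:
        if years == max_years:
            return None
        load_mbps = load_mbps * (100 + growth_pct) // 100
        years += 1
    return years


# Hypothetical link loads and capacities in Mbps; a growth of 100% per year
# corresponds to traffic doubling annually.
links = {"A-B": (2000, 10000), "B-C": (6000, 10000), "A-C": (1000, 40000)}
horizon = {name: years_until_exceeded(load, cap, growth_pct=100)
           for name, (load, cap) in links.items()}
print(horizon)                # → {'A-B': 3, 'B-C': 1, 'A-C': 6}
print(min(horizon.values()))  # first link to exceed 80%: → 1
```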
6.1.2. What-if scenarios
The NPDT is well suited to what-if analysis, that is, analyzing the impact of potential scenarios and configurations safely, without any impact on the real network. In this context, the NPDT acts as a safe sandbox: different configurations are applied to the NPDT to understand their impact on the network. Some examples of what-if analysis are:
* What is the impact in my network performance if we acquire company ACME and we incorporate all its employees?
* When will the network run out of capacity if we have an organic growth of users?
* What is the optimal network hardware upgrade given a budget?
* We need to update this path. What is the impact on the performance of the other flows?
* A particular day has a spike of 10% in traffic intensity. How much loss will it introduce? Can we reduce this loss if we rate-limit another flow?
* How many links can fail before the SLA is degraded?
* What happens if link B fails? Is the network able to process the current traffic load?
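The last two questions can be explored by enumerating failure scenarios and evaluating each one. In the Python sketch below, a deliberately simple feasibility check stands in for the NPDT: routing is assumed to be hop-count shortest path, and a scenario is assumed acceptable when every demand is still routed and no surviving link carries more than its capacity. The ring topology, capacities, and demands are hypothetical; an actual NPDT would return per-path delay, jitter, and loss to compare against the SLA.

```python
from collections import deque
from itertools import combinations


def bfs_path(links, src, dst):
    """Hop-count shortest path over undirected links, or None if unreachable."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, [])):  # sorted for deterministic ties
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def feasible(capacity, demands, failed):
    """True if every demand is still routed and no surviving link is overloaded."""
    alive = {link: cap for link, cap in capacity.items() if link not in failed}
    load = dict.fromkeys(alive, 0.0)
    for (src, dst), rate in demands.items():
        path = bfs_path(alive, src, dst)
        if path is None:
            return False
        for a, b in zip(path, path[1:]):
            load[(a, b) if (a, b) in load else (b, a)] += rate
    return all(load[link] <= alive[link] for link in alive)


def max_tolerated_failures(capacity, demands):
    """Largest k such that every combination of k simultaneous link failures
    is feasible (assumes the failure-free network is feasible)."""
    k = 0
    while k < len(capacity) and all(
            feasible(capacity, demands, set(combo))
            for combo in combinations(capacity, k + 1)):
        k += 1
    return k


# Hypothetical 4-node ring with 10 Gbps links and two 4 Gbps demands.
capacity = {("A", "B"): 10, ("B", "C"): 10, ("C", "D"): 10, ("D", "A"): 10}
demands = {("A", "C"): 4, ("B", "D"): 4}
print(feasible(capacity, demands, {("A", "B")}))  # what if A-B fails? → True
print(max_tolerated_failures(capacity, demands))  # → 1
```

Exhaustive enumeration grows combinatorially with the number of links, which is one reason fast per-scenario predictions matter for this kind of analysis.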
6.1.3. Troubleshooting
Many factors can cause network failures (e.g., invalid network configurations, unexpected protocol interactions). Debugging modern networks is complex and time-consuming. Currently, troubleshooting is typically done by human experts who rely on years of experience with networking tools.
Network operators can leverage an NPDT to reproduce previous network failures in order to find the source of service disruptions. Specifically, network operators can replicate past network failure scenarios and analyze their impact on network performance, making it easier to find specific configuration errors. In addition, the NPDT helps in finding more robust network configurations that prevent service disruptions in the future.
6.1.4. Anomaly detection
Since the NPDT models the behaviour of a real-world network, it gives network operators an estimate of the expected network behaviour. When the behaviour of the real-world network deviates from that predicted by the NPDT, the deviation can indicate an anomaly in the real-world network. Such anomalies can appear at different places in a network (e.g., core, edge, IoT), and different data sources can be used to detect them.
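A minimal Python sketch of this comparison follows. It assumes that the NPDT returns a predicted delay per path and that measured delays for the same paths are available through the Measurement Interface; a path is flagged when the relative deviation exceeds a threshold. The 20% threshold and the sample values are hypothetical, and an operational detector would also need to account for the NPDT's own prediction error and for measurement noise.

```python
def flag_anomalies(predicted_ms, measured_ms, rel_threshold=0.2):
    """Return {path: relative error} for paths whose measured delay deviates
    from the NPDT prediction by more than rel_threshold of the prediction."""
    anomalies = {}
    for path, pred in predicted_ms.items():
        meas = measured_ms.get(path)
        if meas is None or pred <= 0:
            continue  # no measurement, or prediction unusable as a baseline
        rel_err = abs(meas - pred) / pred
        if rel_err > rel_threshold:
            anomalies[path] = round(rel_err, 3)
    return anomalies


# Hypothetical per-path delays in milliseconds.
predicted = {("A", "C"): 10.0, ("B", "D"): 8.0, ("A", "D"): 5.0}
measured = {("A", "C"): 10.9, ("B", "D"): 14.0, ("A", "D"): 5.2}
print(flag_anomalies(predicted, measured))  # → {('B', 'D'): 0.75}
```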
6.1.5. Training
As discussed before, the NPDT can be understood as a safe playground where misconfigurations don’t affect the real-world system performance. In this context, the NPDT can play an important role in improving the education and certification process of network professionals, both in basic networking training and advanced scenarios. For example:
* In basic network training, understand how routing modifications impact delay.
* In more advanced studies, showcase the impact of scheduling configuration on flow performance, and how to use them to optimize SLAs.
* In cybersecurity scenarios, evaluate the effects of network attacks and possible counter-measures.
6.2. Network Optimization
Since the NPDT can provide performance estimates on short timescales, it is possible to pair it with a network optimizer (Figure 2). The network administrator defines one or more optimization objectives, e.g., a maximum average delay for all paths in the network. The optimizer can be implemented with a classical optimization algorithm, such as Constraint Programming [DEFO] or Local Search [LS], or with a Machine Learning one, such as Deep Neural Networks [DNN-TM] or Multi-Agent Reinforcement Learning [MARL-TE]. Regardless of the implementation, the optimizer tests various configurations to find the network configuration parameters that satisfy the optimization objectives. To know the performance of a specific network configuration, the optimizer sends that configuration to the NPDT, which predicts its performance metrics.
                  +------------+     Candidate      +-------------+
                  |            | Network Config.    |   Network   |
Optimization----> |  Network   |------------------->| Performance |
 objectives       | Optimizer  |                    |   Digital   |
                  |            |<-------------------|    Twin     |
                  +------------+     Estimated      +-------------+
                        |           Performance
                        |
                        v
         Optimized Network Configuration
Figure 2: Using an NPDT as a network model for an optimizer.
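The interaction in Figure 2 can be sketched in Python as a minimal hill-climbing local search. The sketch is illustrative only and is not the method of [LS] or of any other cited optimizer. `npdt_predict` is a hypothetical stand-in for the NPDT that uses a toy per-link delay of 1/(capacity - load). A configuration is assumed to be one path per flow, chosen from a candidate list. The objective is assumed to be the maximum flow delay.

```python
import random
from typing import Dict, List, Tuple

Path = Tuple[str, ...]    # sequence of link identifiers
Config = Dict[str, Path]  # hypothetical encoding: flow id -> chosen path


def npdt_predict(config: Config, demand: Dict[str, float],
                 capacity: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical stand-in for the NPDT: estimated delay per flow.

    Toy model only: each link contributes 1 / (capacity - load), and an
    overloaded link yields an infinite delay.
    """
    load = {link: 0.0 for link in capacity}
    for flow, path in config.items():
        for link in path:
            load[link] += demand[flow]
    link_delay = {
        link: 1.0 / (capacity[link] - load[link])
        if load[link] < capacity[link] else float("inf")
        for link in capacity
    }
    return {flow: sum(link_delay[link] for link in path)
            for flow, path in config.items()}


def objective(delays: Dict[str, float]) -> float:
    """Assumed optimization objective: the worst (maximum) flow delay."""
    return max(delays.values())


def optimize(candidates: Dict[str, List[Path]], demand: Dict[str, float],
             capacity: Dict[str, float], iterations: int = 200,
             seed: int = 0) -> Config:
    """Hill-climbing local search that queries the NPDT for every candidate."""
    rng = random.Random(seed)
    best = {flow: paths[0] for flow, paths in candidates.items()}
    best_score = objective(npdt_predict(best, demand, capacity))
    for _ in range(iterations):
        flow = rng.choice(list(candidates))
        trial = dict(best)
        trial[flow] = rng.choice(candidates[flow])     # candidate network config.
        score = objective(npdt_predict(trial, demand, capacity))  # estimated perf.
        if score < best_score:                         # keep only improvements
            best, best_score = trial, score
    return best                                        # optimized configuration


if __name__ == "__main__":
    capacity = {"A-C": 10.0, "A-B": 10.0, "B-C": 10.0}
    demand = {"f1": 6.0, "f2": 6.0}
    candidates = {f: [("A-C",), ("A-B", "B-C")] for f in demand}
    print(optimize(candidates, demand, capacity))
```

In this toy topology, the two 6-unit flows cannot share either route without exceeding the 10-unit capacity. The search therefore ends with one flow on the direct link and the other on the detour through B. A real optimizer would use a stronger search strategy and richer NPDT estimates.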
An example optimization use case is multi-objective optimization: commonly, the network administrator defines a set of optimization goals that must be met concurrently [DEFO], for example:
* Bound the latency of all links to a maximum.
* Do not exceed a link utilization of 80%, but only for a subset of the links.
* Route all flows of type B through node 10.
* Avoid more than 35 Gbps of traffic to router R5.
* Minimize the routing cost, that is, the number of flows to re-route [ReRoute-Cost].
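One way to combine these goals is to treat the first four as hard constraints on the NPDT's predictions. The routing cost then becomes the quantity to minimize among the feasible candidates. The sketch below assumes this split. The `Prediction` fields are hypothetical placeholders for whatever the NPDT actually reports. Node and router names come from the example goals. The latency bound is a parameter because the text does not fix its value.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Prediction:
    """Hypothetical NPDT output for one candidate configuration."""
    link_latency_ms: Dict[str, float]
    link_utilization: Dict[str, float]       # 0.0 .. 1.0
    flow_paths: Dict[str, List[str]]         # flow id -> list of nodes
    flow_type: Dict[str, str]
    traffic_to_node_gbps: Dict[str, float]
    rerouted_flows: int                      # routing cost


def violated_goals(p: Prediction, max_latency_ms: float,
                   util_links: Set[str]) -> List[str]:
    """Names of the hard goals (first four in the list) that p violates."""
    violations = []
    if any(lat > max_latency_ms for lat in p.link_latency_ms.values()):
        violations.append("latency bound")
    if any(p.link_utilization[link] > 0.80 for link in util_links):
        violations.append("80% utilization on selected links")
    if any(p.flow_type[f] == "B" and "10" not in path
           for f, path in p.flow_paths.items()):
        violations.append("type-B flows through node 10")
    if p.traffic_to_node_gbps.get("R5", 0.0) > 35.0:
        violations.append("at most 35 Gbps to R5")
    return violations


def pick_best(predictions: List[Prediction], max_latency_ms: float,
              util_links: Set[str]) -> Prediction:
    """Among feasible candidates, minimize the number of re-routed flows."""
    feasible = [p for p in predictions
                if not violated_goals(p, max_latency_ms, util_links)]
    if not feasible:
        raise ValueError("no candidate satisfies all goals")
    return min(feasible, key=lambda p: p.rerouted_flows)
```

Treating goals as constraints plus a single objective is only one design choice. A weighted sum of penalties is a common alternative when no candidate satisfies every goal.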
7. Implementation Challenges
This section presents different technologies that can be used to build a NPDT, and details the advantages and disadvantages of using them to implement a NPDT. It takes into account how they perform with respect to the requirements of accuracy, speed, and scale of the NPDT predictions.
7.1. Simulation
Packet-level simulators, such as OMNET++ [OMNET] and NS-3 [ns-3], simulate network events. In a nutshell, they simulate the operation of a network by processing a series of events, such as the transmission of a packet or the enqueuing and dequeuing of packets in a router. Hence, they offer excellent accuracy when predicting network performance metrics (delay, jitter, and loss), but they take a significant amount of time to run. Their running time scales linearly with the number of packets to simulate.
In fact, the simulation time depends on the number of events to process [limitations-net-sim]. This limits the scalability of simulators even if the topology does not change: higher traffic intensities take longer to simulate because more packets enter the network per unit of time. Likewise, simulating the same traffic intensity in larger topologies also increases the simulation time. For example, consider a simulator that takes 11 hours to process 4 billion events (these values are obtained from an actual simulation). Although 4 billion events may appear to be a large figure, consider:
* A 1 Gbps Ethernet link, transmitting regular frames of the maximum size of 1518 bytes.
* This translates to approx. 82k packets crossing the link per second.
* Assuming a network with 50 links, and that the transmission of a packet over a link equals a single event in the simulator, such a network translates to 82k packets/s/link * 50 links * 1 event/packet ~ 4 million events to simulate one second of network activity.
* Then, with a budget of 4 billion events, it takes 11 hours to simulate only 16 minutes of network activity.
These figures show that, despite the high accuracy of network simulators, they take too much time to calculate performance estimations.
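The arithmetic behind these figures can be reproduced with a few lines of Python. The constants come from the text. Like the text, the sketch counts one event per packet per link and ignores the Ethernet preamble and inter-frame gap.

```python
FRAME_BYTES = 1518            # maximum regular Ethernet frame
LINK_BPS = 1e9                # 1 Gbps link
LINKS = 50
EVENTS_PER_PACKET_PER_LINK = 1
EVENT_BUDGET = 4e9            # events processed in the 11-hour run
WALL_CLOCK_S = 11 * 3600

packets_per_s = LINK_BPS / (FRAME_BYTES * 8)                        # per link
events_per_s = packets_per_s * LINKS * EVENTS_PER_PACKET_PER_LINK  # per simulated second
simulated_s = EVENT_BUDGET / events_per_s                           # network time covered
slowdown = WALL_CLOCK_S / simulated_s                               # wall clock vs. network time

print(f"{packets_per_s:,.0f} packets/s per link")
print(f"{events_per_s:,.0f} events per simulated second")
print(f"{simulated_s / 60:.1f} minutes of network activity")
print(f"{slowdown:.0f}x slower than real time")
```

The script reports roughly 82k packets/s per link, 4.1 million events per simulated second, and about 16 minutes of simulated activity. That makes the simulator about 41 times slower than real time for this modest network.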
7.2. Emulation
Network emulators run the original network software in a virtualized environment. This makes them easy to deploy, and depending on the emulation hardware, they can produce reasonably fast estimations. However, for large-scale networks their speed eventually decreases because they do not run on hardware built specifically for networking. For fully virtualized networks, emulating a network requires as many resources as the real one, which is not cost-effective.
In addition, some studies have reported variable accuracy depending on the emulation conditions, both the parameters and underlying hardware and OS configurations [emulation-perf]. Hence, emulators show some limitations if we want to build a fast and scalable NPDT. However, emulators are useful in other use cases, for example in training, debugging, or testing new features.
7.3. Analytical Modelling
Queueing Theory (QT) is an analytical tool that models computer networks as a series of interconnected queues that are evaluated analytically. The key advantage of QT is its speed: performance estimates are obtained by evaluating mathematical equations rather than by processing individual events. QT is arguably the most popular modeling technique, and it represents a well-established framework that can model complex and large networks.
However, the main limitation of QT is the traffic model: although it offers high accuracy for Poisson traffic models, it presents poor accuracy under realistic traffic models [qt-precision]. Internet traffic has been extensively analyzed in the past two decades, and although the community has not agreed on a universal model, there is consensus that, in general, aggregated traffic shows strong autocorrelation and heavy tails [inet-traffic].
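As a concrete instance of the analytical approach, the classic M/M/1 result gives the mean delay of a single queue as T = 1/(mu - lambda). Here mu is the service rate and lambda the arrival rate, both in packets per second. The sketch below assumes Poisson arrivals, exponentially distributed packet sizes, and Kleinrock's independence approximation, so per-link delays can be added along a path. It shows why QT is fast: each estimate takes only a handful of arithmetic operations. These Poisson assumptions are also what break down under the autocorrelated, heavy-tailed traffic described above.

```python
from typing import Dict, List


def mm1_delay(arrival_pps: float, service_pps: float) -> float:
    """Mean sojourn time (queueing + transmission) of an M/M/1 queue, in seconds."""
    if arrival_pps >= service_pps:
        return float("inf")   # unstable queue: delay grows without bound
    return 1.0 / (service_pps - arrival_pps)


def path_delay(path: List[str], arrival_pps: Dict[str, float],
               capacity_bps: Dict[str, float], mean_pkt_bits: float) -> float:
    """Sum of per-link M/M/1 delays along a path (Kleinrock independence)."""
    return sum(
        mm1_delay(arrival_pps[link], capacity_bps[link] / mean_pkt_bits)
        for link in path
    )
```

Take a 1 Gbps link with a mean packet size of 1000 bytes (mu = 125,000 packets/s) and lambda = 100,000 packets/s. The model gives 40 microseconds per link, or 80 microseconds over two such links.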
7.4. Neural Networks
Finally, Neural Networks (NN) and other Machine Learning (ML) tools are as fast as QT (in the order of milliseconds), and can provide accuracy similar to that of packet-level simulators. They represent an interesting alternative, but have two key limitations. First, they require training the NN with a large amount of data from a wide range of network scenarios: different routings, topologies, scheduling configurations, as well as link failures and network congestion. This dataset may not always be accessible, or easy to produce in a production network (see Section 8). Second, not all NN architectures keep their accuracy when scaling to larger topologies; therefore, some use cases need custom NN architectures.
7.4.1. MultiLayer Perceptron
A MultiLayer Perceptron [MLP] is a basic kind of NN from the family of feedforward NNs. In short, input data is propagated unidirectionally from the input layer of neurons to the output layer. There may be an arbitrary number of hidden layers between the input and output layers. MLPs are widely used for basic ML applications, such as regression.
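The unidirectional propagation described above can be sketched in plain Python. The weights used with it are arbitrary illustrative values, not trained parameters. ReLU hidden activations and a linear output layer are assumed, a common choice for regression. Any mapping from network features to the input vector is left out.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]   # rows = inputs, columns = outputs


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer: y_j = sum_i x_i * w_ij + b_j."""
    return [sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def mlp_forward(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Propagate x from the input layer to the output layer, one layer at a time.

    Hidden layers apply ReLU; the last layer is left linear for regression.
    """
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x
```

Each weight matrix has a fixed number of rows, so the input vector must always have the same length. This is one reason a plain MLP is tied to the shape of the data it was trained on.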
7.4.2. Recurrent Neural Networks
Recurrent Neural Networks [RNN] are a more advanced type of NN because they connect some layers to the previous ones, which gives them the ability to store state. They are mostly used to process sequential data, such as handwriting, text, or audio. They have been used extensively in speech processing [RNN-speech], and in general, Natural Language Processing applications [NLP].
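The state-carrying behaviour can be illustrated with a minimal Elman-style recurrence, h_t = tanh(W_in x_t + W_rec h_(t-1) + b). This cell is simpler than the LSTM described in [RNN]. Any weights passed to it in the example are arbitrary illustrative values.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]   # rows = outputs, columns = inputs


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rnn_run(sequence: List[Vector], w_in: Matrix, w_rec: Matrix,
            b: Vector) -> Vector:
    """Elman-style recurrence h_t = tanh(W_in x_t + W_rec h_(t-1) + b).

    The hidden state h is carried from one sequence element to the next,
    which lets the network "remember" earlier parts of the sequence.
    """
    h = [0.0] * len(b)
    for x in sequence:
        h = [math.tanh(a + r + bi)
             for a, r, bi in zip(matvec(w_in, x), matvec(w_rec, h), b)]
    return h
```

Feeding the same inputs in a different order, for example [[1.0], [0.0]] versus [[0.0], [1.0]], produces a different final state. That memory of earlier inputs is what suits RNNs to sequential data.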
7.4.3. Convolutional Neural Networks
Convolutional Neural Networks (CNN) are Deep Learning NNs designed to process structured arrays of data, such as images. CNNs are highly effective at detecting patterns in the input data. This makes them widely used in computer vision tasks, where they have become the state of the art for many applications, such as image classification [CNN-images]. However, their design assumes grid-structured input, such as the pixels of an image, rather than the irregular graphs that form computer networks; hence, their current design presents limited applicability to computer networks.
7.4.4. Graph Neural Networks
Graph Neural Networks [GNN] are a type of neural network designed to work with graph-structured data. A relevant type of GNN with interesting characteristics for computer networks is the Message Passing Neural Network (MPNN). In a nutshell, an MPNN exchanges a set of messages between the graph nodes in order to learn the relationship between the input graph and the expected outputs of the training dataset. MPNNs are composed of three functions, which are repeated over several iterations, depending on the size of the graph:
* Message: encodes information about the relationship between two adjacent elements of the graph in a message (an n-element array).
* Aggregation: combines the different messages received by a particular node. It is typically an element-wise summation. The result is an array of constant length, independent of the number of received messages.
* Update: combines the hidden state of a node with the aggregated message. The result of this function is used as input to the next message-passing iteration.
Note that the internal architecture of an MPNN is rebuilt for each input graph.
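The three MPNN functions can be illustrated with a toy, untrained example. This is not RouteNet or any other published architecture. Here the message function simply forwards the sender's state, aggregation is the element-wise sum mentioned above, and the update is a fixed average. In a real MPNN, the message and update functions are learned neural networks.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mpnn(edges: List[Tuple[str, str]], h: Dict[str, Vector],
         iterations: int) -> Dict[str, Vector]:
    """Toy message passing over an undirected graph (fixed, untrained functions).

    message:     m_uv = h_u                      (learned in a real MPNN)
    aggregation: element-wise sum of messages    (fixed length, any node degree)
    update:      h_v <- 0.5 * h_v + 0.5 * agg    (learned in a real MPNN)
    """
    neighbors: Dict[str, List[str]] = {v: [] for v in h}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(next(iter(h.values())))
    for _ in range(iterations):
        new_h = {}
        for v in h:
            agg = [0.0] * dim
            for u in neighbors[v]:
                msg = h[u]                                   # message
                agg = [a + m for a, m in zip(agg, msg)]      # aggregation (sum)
            new_h[v] = [0.5 * s + 0.5 * a for s, a in zip(h[v], agg)]  # update
        h = new_h
    return h
```

On the three-node path A-B-C with initial states A=[1.0], B=[0.0], C=[0.0], information starting at A reaches C only after the second iteration. This illustrates why the number of iterations depends on the size of the graph. The same functions are applied at every node, so the code runs unchanged on graphs of any size or shape.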
This ability to understand graph-structured data makes GNNs naturally interesting for a Network Performance Digital Twin. Since computer networks are fundamentally graphs, GNNs have the potential to take a graph of the network as input and produce performance estimations of that network as output [qt-precision].
7.4.5. NN Comparison
Figure 3 presents a comparison of different types of NN that predict the delay of a given input network. We use a dataset of the performance of different network topologies, created with simulation data (i.e., ground truth) from OMNET++. We measure the error relative to the delay of the simulation data. In order to evaluate how well the different NNs deal with different network topologies, we train each NN in three different scenarios:
* Same topology: the training and testing datasets contain the same network topologies.
* Different topology: the training and testing datasets contain different sets of network topologies. The objective is to determine whether the NN keeps the same performance when it is shown a topology it has never seen.
* Link failures: here we remove a random link from the topology.
+----------------------------------------------------------+
|  Mean Absolute Percentage Error of the delay prediction  |
+----------------------+-----------+-----------+-----------+
| Scenario             |    MLP    |    RNN    |    GNN    |
+----------------------+-----------+-----------+-----------+
| Same topology        |   0.123   |   0.1     |   0.020   |
| Different topology   |   11.5    |   0.305   |   0.019   |
| Link failures        |   1.15    |   0.638   |   0.042   |
+----------------------+-----------+-----------+-----------+
Figure 3: Performance comparison of different NN architectures
We can see that all NNs predict the network delay with excellent accuracy if we do not change the topology used during training. However, when it comes to new topologies, the error of the MLP is unacceptable (1150%), as is that of the RNN (around 30%). On the other hand, the GNN generalizes to new topologies, with an error below 2%. Similarly, if a link fails, the RNN has difficulties offering accurate predictions (over 60% error), while the GNN maintains its accuracy (4.2% error). These results show the potential of GNNs to build a Network Performance Digital Twin.
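For reference, the Mean Absolute Percentage Error can be computed as in the sketch below. It assumes per-sample delays from the simulator (ground truth) and from the NN. The values in Figure 3 appear to be expressed as fractions: the text reads 11.5 as 1150% and 0.019 as below 2%.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, returned as a fraction (0.02 == 2%)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)
```

For example, predicting 10.2 ms and 19.6 ms for simulated delays of 10.0 ms and 20.0 ms yields a MAPE of 0.02, i.e., 2%.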
8. Training
Digital Twins based on Machine Learning require a training process before they can be deployed. Commonly, the training process uses a dataset of inputs and expected outputs, which guides the adjustment of the internal parameters of, e.g., the neural network. There are some caveats regarding the training process:
* In order to obtain sufficient accuracy, the training dataset needs to be representative, that is, it must contain samples of a wide range of possible inputs and outputs. In networks, this translates to samples of congested networks, networks with link failures, etc. Otherwise, the resulting algorithm cannot predict such situations.
* However, some kinds of samples, e.g., those of a congested or disrupted network, are difficult to obtain from a production network.
* One way to acquire such samples is to use a testbed, although this may not be possible for some networks, especially large-scale ones. A possible solution in this situation is to develop Neural Networks that are invariant to some of the metrics of the graph, e.g., the number of nodes. That is, the NN does not lose accuracy as the number of nodes increases. This makes it possible to train the NN in a testbed and then deploy it in a network that is larger than the testbed without losing accuracy.
9. IANA Considerations
This memo includes no request to IANA.
10. Security Considerations
An attacker can alter the software image of the NPDT. This could produce inaccurate performance estimations, which could result in network misconfigurations, disruptions, or outages. Hence, in order to prevent the accidental deployment of a malicious NPDT, the software image of the NPDT MUST be digitally signed by the vendor.
11. References
11.1. Normative References
11.2. Informative References
[OMNET] "https://omnetpp.org/", 2022.
[ns-3] "https://www.nsnam.org/", 2022.
[P4Rspec] "https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html", 2021.
[OFspec] "TS-025: OpenFlow Switch Specification https://opennetworking.org/wp-content/uploads/2014/10/openflow-switch-v1.5.1.pdf", 2015.
[NetworkXlib] "https://networkx.org/", 2022.
[openconfig-rtgwg-gnmi-spec-01] Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, C., and C. Morrow, "gRPC Network Management Interface (gNMI)", March 2018, <https://datatracker.ietf.org/doc/html/draft-openconfig-rtgwg-gnmi-spec-01>.
[RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017, <https://www.rfc-editor.org/info/rfc8040>.
[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, <https://www.rfc-editor.org/info/rfc6241>.
[RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 10.17487/RFC6830, January 2013, <https://www.rfc-editor.org/info/rfc6830>.
[RFC4655] Farrel, A., Vasseur, J.-P., and J. Ash, "A Path Computation Element (PCE)-Based Architecture", RFC 4655, DOI 10.17487/RFC4655, August 2006, <https://www.rfc-editor.org/info/rfc4655>.
[RFC7047] Pfaff, B. and B. Davie, Ed., "The Open vSwitch Database Management Protocol", RFC 7047, DOI 10.17487/RFC7047, December 2013, <https://www.rfc-editor.org/info/rfc7047>.
[RFC3954] Claise, B., Ed., "Cisco Systems NetFlow Services Export Version 9", RFC 3954, DOI 10.17487/RFC3954, October 2004, <https://www.rfc-editor.org/info/rfc3954>.
[I-D.draft-zhou-nmrg-digitaltwin-network-concepts] Zhou, C., Yang, H., Duana, X., Lopez, D., Pastor, A., Wu, Q., Boucadir, M., and C. Jacquenet, "Digital Twin Network: Concepts and Reference Architecture", Work in Progress, Internet-Draft, draft-zhou-nmrg-digitaltwin-network-concepts-06, 2 December 2021, <https://datatracker.ietf.org/doc/html/draft-zhou-nmrg-digitaltwin-network-concepts-06>.
[irtf-nmrg-ibn-concepts-definitions-09] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", March 2022, <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-ibn-concepts-definitions-09>.
[digital-twin-5G] Nguyen, H. X., Trestian, R., To, D., and M. Tatipamula, "Digital Twin for 5G and Beyond", 2021, <https://doi.org/10.1109/MCOM.001.2000343>.
[digital-twin-vanets] Zhao, L., Han, G., Li, Z., and L. Shu, "Intelligent Digital Twin-Based Software-Defined Vehicular Networks", 2020, <https://doi.org/10.1109/MNET.011.1900587>.
[digital-twin-industry] Groshev, M., Guimarães, C., Martín-Pérez, J., and A. D. L. Oliva, "Toward Intelligent Cyber-Physical Systems: Digital Twin Meets Artificial Intelligence", 2021, <https://doi.org/10.1109/MCOM.001.2001237>.
[streaming-telemetry] Gupta, A., Harrison, R., Canini, M., Feamster, N., Rexford, J., and W. Willinger, "Sonata: Query-Driven Streaming Network Telemetry", 2018, <https://doi.org/10.1145/3230543.3230555>.
[network-capacity] Ellis, A. D., Suibhne, N. M., Saad, D., and D. N. Payne, "Communication networks beyond the capacity crunch", 2016, <https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2015.0191>.
[planning-scalability] Zhu, H., Gupta, V., Ahuja, S. S., Tian, Y., Zhang, Y., and X. Jin, "Network Planning with Deep Reinforcement Learning", 2021, <https://doi.org/10.1145/3452296.3472902>.
[limitations-net-sim] Rampfl, S., "Network simulation and its limitations", 2013, <https://doi.org/10.2313/NET-2013-08-1_08>.
[emulation-perf] Jurgelionis, A., Laulajainen, J., Hirvonen, M., and A. I. Wang, "An Empirical Study of NetEm Network Emulation Functionalities", 2011, <https://doi.org/10.1109/ICCCN.2011.6005933>.
[qt-precision] Ferriol-Galmés, M., Rusek, K., Suárez-Varela, J., Xiao, S., Cheng, X., Barlet-Ros, P., and A. Cabellos-Aparicio, "RouteNet-Erlang: A Graph Neural Network for Network Performance Evaluation", 2022, <https://arxiv.org/abs/2202.13956>.
[inet-traffic] Popoola, J. and R. Ipinyomi, "Empirical Performance of Weibull Self-Similar Tele-traffic Model", 2017.
[MLP] Pal, S. and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification", 1992, <https://doi.org/10.1109/72.159058>.
[RNN] Hochreiter, S. and J. Schmidhuber, "Long Short-Term Memory", 1997, <https://doi.org/10.1162/neco.1997.9.8.1735>.
[RNN-speech] Mikolov, T., Kombrink, S., Burget, L., Černocký, J., and S. Khudanpur, "Extensions of recurrent neural network language model", 2011, <https://doi.org/10.1109/ICASSP.2011.5947611>.
[GNN] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and G. Monfardini, "The Graph Neural Network Model", 2009, <https://doi.org/10.1109/TNN.2008.2005605>.
[DEFO] Hartert, R., Vissicchio, S., Schaus, P., Bonaventure, O., Filsfils, C., Telkamp, T., and P. Francois, "A Declarative and Expressive Approach to Control Forwarding Paths in Carrier-Grade Networks", 2015, <https://doi.org/10.1145/2785956.2787495>.
[facebook-config] Sung, Y. E., Tie, X., Wong, S. H., and H. Zeng, "Robotron: Top-down Network Management at Facebook Scale", 2016, <https://doi.org/10.1145/2934872.2934874>.
[auto-dc-topology] Salman, S., Streiffer, C., Chen, H., Benson, T., and A. Kadav, "DeepConf: Automating Data Center Network Topologies Management with Machine Learning", 2018, <https://doi.org/10.1145/3229543.3229554>.
[CNN-images] Krizhevsky, A., Sutskever, I., and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", 2012, <https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf>.
[MARL-TE] Bernárdez, G., Suárez-Varela, J., López, A., Wu, B., Xiao, S., Cheng, X., Barlet-Ros, P., and A. Cabellos-Aparicio, "Is Machine Learning Ready for Traffic Engineering Optimization?", 2021, <https://doi.org/10.1109/ICNP52444.2021.9651930>.
[LS] Gay, S., Hartert, R., and S. Vissicchio, "Expect the unexpected: Sub-second optimization for segment routing", 2017, <https://doi.org/10.1109/INFOCOM.2017.8056971>.
[DNN-TM] Valadarsky, A., Schapira, M., Shahaf, D., and A. Tamar, "Learning to Route", 2017, <https://doi.org/10.1145/3152434.3152441>.
[ReRoute-Cost] Zheng, J., Xu, Y., Wang, L., Dai, H., and G. Chen, "Online Joint Optimization on Traffic Engineering and Network Update in Software-defined WANs", 2021, <https://doi.org/10.1109/INFOCOM42981.2021.9488837>.
[NLP] Chowdhary, K. R., "Natural Language Processing", 2020, <https://doi.org/10.1007/978-81-322-3972-7_19>.
[Google-Clos] Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead, A., Bannon, R., Boving, S., Desai, G., Felderman, B., Germano, P., Kanagala, A., Provost, J., Simmons, J., Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., and A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network", 2015, <https://doi.org/10.1145/2785956.2787508>.
[digital-twin-AI] Mozo, A., Karamchandani, A., Gómez-Canaval, S., Sanz, M., Moreno, J. I., and A. Pastor, "B5GEMINI: AI-Driven Network Digital Twin", 2022, <https://www.mdpi.com/1424-8220/22/11/4106>.
Acknowledgements
TBD
Authors’ Addresses
Jordi Paillisse UPC-BarcelonaTech c/ Jordi Girona 1-3 08034 Barcelona Catalonia Spain Email: jordi.paillisse@upc.edu
Paul Almasan UPC-BarcelonaTech c/ Jordi Girona 1-3 08034 Barcelona Catalonia Spain Email: felician.paul.almasan@upc.edu
Miquel Ferriol UPC-BarcelonaTech c/ Jordi Girona 1-3 08034 Barcelona Catalonia Spain Email: miquel.ferriol@upc.edu
Pere Barlet UPC-BarcelonaTech c/ Jordi Girona 1-3 08034 Barcelona Catalonia Spain Email: pere.barlet@upc.edu
Albert Cabellos UPC-BarcelonaTech c/ Jordi Girona 1-3 08034 Barcelona Catalonia Spain Email: alberto.cabellos@upc.edu
Shihan Xiao Huawei China Email: xiaoshihan@huawei.com
Xiang Shi Huawei China Email: shixiang16@huawei.com
Xiangle Cheng Huawei China Email: chengxiangle1@huawei.com
Diego Perino Telefonica I+D Barcelona Spain Email: diego.perino@telefonica.com
Diego Lopez Telefonica I+D Seville Spain Email: diego.r.lopez@telefonica.com
Antonio Pastor Telefonica I+D Madrid Spain Email: antonio.pastorperales@telefonica.com
Internet Research Task Force                                     D. Chen
Internet-Draft                                                   H. Yang
Intended status: Informational                                    K. Yao
Expires: 7 January 2023                                     China Mobile
                                                             G. Fioccola
                                                                   Q. Wu
                                                                  Huawei
                                                             6 July 2022

           Network measurement intent - one of IBN use cases
             draft-yang-nmrg-network-measurement-intent-05
Abstract
As an important technical means of detecting the network state, network measurement has attracted increasing attention as networks evolve. However, with current network measurement technology, the measurement method often does not match the measurement purpose well. To solve this problem, this memo introduces network measurement intent and presents a process for scheduling network resources and measurement tasks to meet the needs of users or network operators. Network measurement intent can be seen as a specific use case of Intent-Based Networking.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 7 January 2023.
Chen, et al. Expires 7 January 2023 [Page 1]
Internet-Draft Network Working Group July 2022
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
   1.  Introduction
   2.  Definitions and Acronyms
   3.  Relationship to Existing Documents
   4.  Overview
   5.  Concrete Examples
     5.1.  Time Accuracy Measurement
     5.2.  Spatial Accuracy Measurement
   6.  Classification of NMI
     6.1.  Static NMI
     6.2.  Dynamic NMI
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors’ Addresses
1. Introduction
With the rapid growth of networks, network scale keeps increasing, while users’ service requirements are becoming stricter and more diversified, e.g., both loss and throughput requirements need to be met simultaneously. At the same time, the growth of network resources can hardly keep up with users’ service requirements. In order to make good use of network resources and improve bandwidth utilization, it becomes necessary to understand the current running state of the network; network measurement, as a technical means of detecting changes in network resources, has therefore received more attention. The continuous development of network measurement technology has also enabled more precise network awareness. However, both traditional network measurement technology (e.g., the delay and loss measurement defined in RFC 2679 [RFC2679] and RFC 2680 [RFC2680]) and the network telemetry technology (RFC 8639 [RFC8639], RFC 8641 [RFC8641], [I-D.ietf-netconf-adaptive-subscription]) that has emerged with the development of software-defined networking in recent years consume network resources when detecting network state changes and feeding back the detection results. Therefore, to some extent, the choice of network measurement method not only yields measurement results of different accuracy, but also imposes different levels of load on the network.
In order to balance the accuracy of network measurement results against the network load, it is very important to choose the appropriate network measurement method according to the specific measurement requirements. As a result, accurate on-demand network measurement technology is becoming more and more important. At the same time, the development of Intent-Based Networking (IBN) enables the network to be configured according to the intent of users or network administrators. Therefore, network measurement can be integrated with IBN, i.e., the demand of users or network administrators to perceive the network state is regarded as network measurement intent.
Our proposed approach uses network measurement intent to acquire network performance information based on the intent of users or network administrators, to verify whether the network measurement results meet the measurement intent, and thereby to further improve the accuracy of configuration in IBN.
2. Definitions and Acronyms
CLI: Command-line Interface.
IBN: Intent-Based Networking.
Policy: A set of rules that governs the choices in behavior of a system.
NMI: Network Measurement Intent. Refers to automatically collecting network status information on demand, based on a user’s or network operator’s demand for network status.
SLA: Service Level Agreement.
3. Relationship to Existing Documents
With the rise of IBN, different groups have different definitions of intent. For example, ONF [ONOS] defines an intent as a list of CLI modes that allows users to pass low-level details to the network. In addition, there are currently two active RG drafts in the NMRG: "Intent-Based Networking - Concepts and Definitions" [I-D.irtf-nmrg-ibn-concepts-definitions] addresses the question "What is an intent?", and [I-D.irtf-nmrg-ibn-intent-classification] addresses the question "Given a specific intent, how can it be parsed or disassembled from different angles?".
Naturally, the question that needs to be solved after the concept definition is "How can a specific intent be realized?". [I-D.irtf-nmrg-ibn-intent-classification] can be considered the first step toward the realization of a given intent; however, it is not enough. Other issues should also be clarified, such as "Is the input intent valid?", "What should the IBN system do when the result is not acceptable?", and "If the result is not acceptable, is human/operator intervention required?". A specific IBN use case helps illustrate the realization procedure, so this document takes network measurement intent as an example.
Referring to the taxonomy of intent proposed in [I-D.irtf-nmrg-ibn-intent-classification], the network measurement intent can be classified into different categories.
Solution: the intent could cover carrier and data center networks.
Intent user type: customer.
Intent type: customer service intent.
Intent scope: Application, QoS.
Network scope: Radio Access, Transport, Edge, Core.
Abstraction: Non-technical.
Lifecycle Requirements: transient.
In order to integrate the NMI with the IBN, in this document we define the components of the NMI interactive process as follows:
* NMI Recognition and Acquisition
* NMI Translation
* NMI Policy
* NMI Orchestration and pre-Verification
* Data Collection and Analytics
* NMI Compliance Assessment
4. Overview
As mentioned above, NMI refers to the on-demand measurement of the network state based on the intent of users or network operators regarding the network state. This intent is usually expressed in the form of a service level objective or a service level expectation. We take the measurement of the performance of a network overwhelmed by traffic (i.e., a busy network) as a simple example, and present the detailed interactive process for the components defined in Section 3.
* NMI Recognition and Acquisition.
- In this function, NMI is recognized by "ingesting" the measurement intent of users or network operators. This function identifies the NMI for the network performance metric that users want to measure, such as delay or jitter, and at the same time allows users to express the NMI in a variety of interactive ways to ensure that the NMI is identified accurately. To achieve this functionality, such an interaction requires the use of the intent northbound interface defined in IBN, e.g., the service interface models in [RFC8299][RFC8466] or the intent interface defined in [TMF1253A].
* NMI Translation.
- In this function, NMI is translated into a corresponding measurement policy, which includes, but is not limited to, the network performance parameters to be measured (such as delay, jitter, and packet loss), the time period to be measured, and the measurement unit. As a simple example, when measuring the performance of a busy network, the period during which the network is busy is not fixed, due to dynamic changes in network characteristics such as the daily bandwidth utilization rate. As a result, the NMI Policy generated by NMI Translation can determine the threshold at which the network is considered busy or congested on a given day, based on historical data learned by AI.
* NMI Policy
- In this function, the NMI policy is translated into actions and instructions invoked against the specified network elements. The NMI policy generated by NMI Translation must therefore be executable; that is, the underlying network devices must support its execution. If the underlying devices cannot execute the generated policy, the policy needs to be adjusted. The policy also needs to be adjusted if the measurement results do not meet the service requirements set by users and network operators.
* NMI Orchestration and pre-Verification.
- In this function, building on the NMI Translation and NMI Policy steps, NMI Orchestration and pre-Verification determines the measurement scheme according to the measurement policy generated by NMI Policy and pre-verifies whether the scheme is feasible.
- Taking busy-period network measurement as an example, besides choosing the measurement scheme and assigning measurement tasks [RFC8639] [RFC8641] [I-D.ietf-netconf-adaptive-subscription] [RFC8194] [I-D.ietf-netmod-eca-policy], this function also needs to determine from the current network state whether the network is busy. In addition, it performs automatic network deployment, e.g., using the model-driven network management approach defined in [RFC8969].
* Data Collection and Analytics.
- In NMI, data collection and analysis are based on the measurement scheme and the set of parameters to be measured that were determined in the previous steps. This function automatically collects the data on demand and generates the corresponding analysis results.
* NMI Compliance Assessment.
- Finally, this function verifies whether the results meet the service requirements and whether the NMI is satisfied. If either condition is not met, the NMI policy is modified and the process re-enters the NMI Policy function.
The measurement flow is shown in the following figure:
            |             ^
   NMI input|             |
  +---------v-------+     |
  | NMI Recognition |     |Measurement
  | and Acquisition |     |Results
  +--------+--------+     |Feedback
           |              |
  +--------v--------+     |
  | NMI Translation |     |
  +--------+--------+     |
           |          +---+-----------+
  +--------v--------+ |NMI Compliance |
  |   NMI Policy    <-+Assessment     |
  +--------+--------+ +--^------------+
           |             |
+----------v----------+ +--+--------------+
| NMI Orchestration   | | Data Collection |
| and pre-Verification| | and Analytics   |
+----------+----------+ +--^--------------+
           |               |
       +---v---------------+---+
       | Network Infrastructure|
       +-----------------------+
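The loop in the figure can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than an interface defined by this document: the names `MeasurementPolicy`, `recognize`, `translate`, `is_executable` and `run_nmi`, the keyword-based recognition, and the rule of halving the measurement period when results fall short of the intent are all assumptions. Orchestration plus data collection, and compliance assessment, are passed in as callables so that a real implementation could be plugged in.

```python
# Hypothetical sketch of the NMI closed loop; names and rules are illustrative.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class MeasurementPolicy:
    """Hypothetical output of NMI Translation."""
    metrics: List[str] = field(default_factory=list)  # e.g. delay, jitter
    period_s: int = 60                                 # measurement period
    unit: str = "ms"                                   # measurement unit


def recognize(user_input: str) -> List[str]:
    """NMI Recognition and Acquisition (toy keyword matching)."""
    known = ("delay", "jitter", "loss")
    return [m for m in known if m in user_input.lower()]


def translate(metrics: List[str]) -> MeasurementPolicy:
    """NMI Translation: build a measurement policy from the recognized NMI."""
    return MeasurementPolicy(metrics=list(metrics))


def is_executable(policy: MeasurementPolicy, supported: Set[str]) -> bool:
    """NMI Policy check: every metric must be measurable by the devices."""
    return set(policy.metrics) <= supported


def run_nmi(user_input: str, supported: Set[str],
            collect: Callable[[MeasurementPolicy], dict],
            meets_intent: Callable[[dict], bool],
            max_rounds: int = 3) -> Optional[dict]:
    """Run the loop; `collect` stands in for orchestration plus data
    collection, `meets_intent` for NMI Compliance Assessment."""
    policy = translate(recognize(user_input))
    for _ in range(max_rounds):
        if not is_executable(policy, supported):
            # Adjust the policy: keep only metrics the devices support.
            policy.metrics = [m for m in policy.metrics if m in supported]
            continue
        if not policy.metrics:
            return None                     # nothing measurable was requested
        results = collect(policy)
        if meets_intent(results):
            return results                  # fed back to the user
        # Not satisfied: re-enter NMI Policy with a finer period (assumption).
        policy.period_s = max(1, policy.period_s // 2)
    return None
```

For the input "Measure delay and jitter" on devices that support only delay measurement, the sketch first drops jitter from the policy (policy adjustment), then measures with a 60 s period and, if the intent is not met, retries with a 30 s period.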
5. Concrete Examples
In this section, we will take SLA measurement intent as an example to illustrate each step of the process.
With the development of measurement technology in recent years, network measurement methods can be divided into active, passive, and hybrid measurement [RFC7799]. Whichever method is used, its consumption of network resources is influenced by network conditions and changes over time. For example, if active measurement messages are sent too frequently, they occupy too much bandwidth and affect the normal operation of actual services; if they are sent too infrequently, some transient network anomalies are missed and the network status cannot be accurately reflected. Passive measurement requires real-time collection of actual service data. If the sampling rate is too high, a large amount of data accumulates in a short time [I-D.ietf-netconf-adaptive-subscription], and the system analyzing these data in real time needs strong processing capacity; if the sampling rate is too low, some network anomalies are again missed.
An IBN-based NMI scheme should accurately measure the network state, especially network anomalies that affect services, while consuming as little network bandwidth as possible and placing only modest demands on the processing capacity of the data analysis system.
Two examples are considered below to illustrate each step of the process: time accuracy measurement and spatial accuracy measurement.
5.1. Time Accuracy Measurement
Taking delay, a network SLA performance metric, as an example, two thresholds, a warning value and an alert value, are set for network delay in advance, as illustrated in the schematic diagram that follows. When the delay is below the warning value, both the network and the service are normal. When the delay is between the warning value and the alert value, the network fluctuates abnormally but the service is still normal. When the delay exceeds the alert value, both the network and the service are abnormal. A different measurement strategy is adopted for each delay range:
* When the network delay exceeds the alert value, or when historical data predict that it will exceed the alert value, passive measurement samples 100% of the service data and the transmission frequency of active measurement is set to its maximum. At the same time, log and alarm data are collected from all network devices, giving the most fine-grained measurement of the network so that the root cause of the problem can be located and the network repaired in time.
* When the network delay exceeds the warning value but is below the alert value, passive measurement samples 60% of the service data, the message transmission frequency of active measurement is set to the median value, and the running-state data of some key devices in the network are collected at the same time.
* When the network delay is below the warning value, passive measurement samples 20% of the service data, the message transmission frequency of active measurement is set to the lowest value, and the running state of network equipment at key nodes can be collected as needed.
[Figure: measured network delay (ms) relative to the "warning" and "alert" thresholds; sampling rate 20% below the warning value, 60% between the warning and alert values, and 100% above the alert value.]
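The three-tier strategy above can be expressed as a small selection function. In this Python sketch the `Strategy` fields and the textual frequency levels are hypothetical, and the handling of values exactly equal to a threshold (equal to the warning value counts as normal, equal to the alert value counts as the middle tier) is an assumption, since the text does not define the boundaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    passive_sampling: float   # fraction of service traffic sampled
    active_rate: str          # active probe frequency level
    device_state: str         # scope of device state collection


def select_strategy(delay_ms: float, warning_ms: float, alert_ms: float,
                    predicted_ms: Optional[float] = None) -> Strategy:
    """Map the current (or predicted) delay to one of three strategies."""
    if warning_ms >= alert_ms:
        raise ValueError("warning threshold must be below alert threshold")
    # Above the alert value, or predicted to exceed it: finest measurement.
    if delay_ms > alert_ms or (predicted_ms is not None
                               and predicted_ms > alert_ms):
        return Strategy(1.00, "maximum", "logs and alarms from all devices")
    # Between the warning and alert values.
    if delay_ms > warning_ms:
        return Strategy(0.60, "median", "running state of key devices")
    # Below the warning value.
    return Strategy(0.20, "minimum", "key nodes, as needed")
```

For instance, with a hypothetical warning value of 20 ms and alert value of 50 ms, `select_strategy(30, 20, 50)` selects 60% sampling, while passing `predicted_ms=55` escalates to 100% sampling before the alert value is actually reached.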
Based on these delay thresholds, with a different measurement strategy for each delay range, the concrete steps of the SLA measurement intent are as follows:
* In NMI Recognition and Acquisition, the SLA measurement intent is recognized, and service requirements and performance metrics are identified through interaction with users. The NMI Recognition and Acquisition module then passes the SLA measurement intent to the NMI Translation module.
* The NMI Translation module consolidates the SLA measurement intent with the measurement policy in NMI Policy, and outputs the executable measurement policy, such as the message transmission frequency of active measurement, the sampling rate of passive measurement, the collection range of equipment running state, etc.
* The NMI Orchestration and pre-Verification module takes the measurement policy as input and, at the orchestration layer, translates it into the specific configuration and execution time of each device in the network under test. It then verifies that the policy can be implemented on the equipment and pre-analyzes the measurement results.
* The Data Collection and Analytics module collects the measurement data according to the configuration and execution time determined in the previous step, performs a simple analysis of the collected data (e.g., verifying the correctness of the measurement data), and then sends the data to the NMI Compliance Assessment module. The NMI Compliance Assessment module then feeds the measurement results back to the user (e.g., whether the results match the user’s intent) to close the loop of the measurement task.
* The NMI Compliance Assessment module evaluates whether the actual measurement results are in line with the user’s intent. If they are, the results are fed back to the user. If they are not, the NMI Policy module is informed so that it adjusts the policy, and the measurement is restarted. Based on the measurement results, the NMI Compliance Assessment module also notifies the NMI Orchestration and pre-Verification module to promptly modify the execution time of the policy, and it adds the measured results to the delay history database to improve the accuracy of delay prediction.
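The feedback into the delay history database can be sketched as follows. The document does not specify how delay is predicted; this Python sketch uses an exponentially weighted moving average (EWMA) purely as a simple stand-in, and the names `DelayHistory`, `assess` and `alpha` are hypothetical.

```python
from collections import deque
from typing import Iterable


class DelayHistory:
    """Hypothetical delay history store with an EWMA-based predictor."""

    def __init__(self, alpha: float = 0.5, maxlen: int = 1000):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.samples = deque(maxlen=maxlen)  # raw measured delays (ms)
        self.estimate = None                 # current EWMA estimate (ms)

    def update(self, delay_ms: float) -> None:
        """Record a measured delay fed back by compliance assessment."""
        self.samples.append(delay_ms)
        if self.estimate is None:
            self.estimate = delay_ms
        else:
            self.estimate = (self.alpha * delay_ms
                             + (1 - self.alpha) * self.estimate)

    def predict(self) -> float:
        """Predicted next delay: here simply the current EWMA estimate."""
        if self.estimate is None:
            raise ValueError("no history yet")
        return self.estimate


def assess(measured_ms: Iterable[float], sla_ms: float,
           history: DelayHistory) -> bool:
    """True if every sample meets the SLA delay bound; all samples are
    added to the history either way, to improve later predictions."""
    measured = list(measured_ms)
    for d in measured:
        history.update(d)
    return all(d <= sla_ms for d in measured)
```

With `alpha = 0.5`, measured delays of 10 ms and 20 ms give a prediction of 15 ms. A predicted value above the alert value is what would trigger the most fine-grained strategy of Section 5.1 ahead of time.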
5.2. Spatial Accuracy Measurement
The desired approach is to accurately measure the network state, especially when there are some issues affecting the service, but at the same time, reduce the resources to be employed to achieve the desired accuracy.
In this regard, the Clustered Alternate-Marking framework [RFC8889] adds flexibility to Performance Measurement (PM) because it can reduce the order of magnitude of the packet counters. This allows the NMI Orchestration and pre-Verification module to supervise, control, and manage PM in large networks.
[RFC8889] introduces the concept of cluster partition of a network. The monitored network can be considered as a whole or split into clusters, which are the smallest subnetworks (group-to-group segments) that maintain the packet loss property for each subnetwork. The clusters can be combined into new connected subnetworks at different levels, forming new clusters, depending on the level of detail required.
The clustered performance measurement intent represents the spatial accuracy, that is, the size of the subnetworks to be considered for monitoring. Monitoring can start at a coarse level without in-depth examination and, if necessary, use the "network zooming" approach.
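To make the cluster idea concrete, the following Python sketch computes the packet loss of a cluster for one marking period as the total of its ingress counters minus the total of its egress counters, which is the per-cluster loss property that [RFC8889] relies on. It assumes the counters have already been collected and aligned to the same alternate-marking period; the measurement-point names are hypothetical.

```python
from typing import Mapping, Sequence


def cluster_loss(counters: Mapping[str, int],
                 ingress: Sequence[str], egress: Sequence[str]) -> int:
    """Packet loss inside one cluster for one marking period.

    counters: packets counted at each measurement point (MP) during the
              same alternate-marking period (hypothetical MP names).
    ingress/egress: MPs on the links entering / leaving the cluster.
    """
    packets_in = sum(counters[mp] for mp in ingress)
    packets_out = sum(counters[mp] for mp in egress)
    return packets_in - packets_out
```

For example, with ingress counts of 100 and 50, a counter of 148 on the link shared by two adjacent clusters, and an egress count of 145, the two clusters lose 2 and 3 packets, and the combined cluster, measured only at its outer points, loses 5.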
Network zooming can be performed in two different ways:
1. change the traffic filter and select more detailed flows;
2. activate new measurement points by defining more specific clusters.
The network-zooming approach implies that some filters, rules, or flow identifiers are changed. These changes must be made without affecting performance, so there may be a transient period to wait after the new network configuration takes effect. In any case, if the performance issue is significant, it is likely to last much longer than this transient period.
The concrete steps of the clustered performance measurement intent are as follows:
* In NMI Recognition and Acquisition, the clustered performance measurement intent is recognized. Then the NMI Recognition and Acquisition module inputs the clustered performance measurement intent into the NMI Translation module.
* The NMI Translation module analyzes the clustered performance measurement intent and outputs the executable measurement policy, such as network partition and the spatial accuracy for the monitoring.
* The NMI Orchestration and pre-Verification module arranges and calibrates the measurement with the specific configuration to split the whole network into clusters at different levels.
* The Data Collection and Analytics module collects the measurement data from the different clusters and sends them to the NMI Compliance Assessment module, which verifies the performance of each cluster and sends the measurement results to the user.
* If a cluster is experiencing packet loss or high delay, the NMI Compliance Assessment module notifies the NMI Orchestration and pre-Verification module to modify the cluster partition of the network for further investigation. The network configuration can be modified immediately to re-partition the network, but only for the cluster with bad performance. In this way, the problem can be localized by successive approximation, down to a detailed per-flow analysis. This is the so-called "closed loop" performance management.
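The successive-approximation step can be sketched as a recursive descent over a tree of clusters. In this Python sketch the `Cluster` type and `localize` function are hypothetical; the tree is assumed to already hold the measured loss of each finer cluster, whereas in practice each zoom step would first reconfigure the measurement points and wait for the transient period described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical node in a cluster partition tree."""
    name: str
    loss: int                                  # measured packet loss
    children: List["Cluster"] = field(default_factory=list)


def localize(cluster: Cluster, threshold: int = 0) -> List[str]:
    """Zoom in only where loss exceeds the threshold and return the
    finest lossy clusters reached by successive approximation."""
    if cluster.loss <= threshold:
        return []                              # healthy: do not zoom in
    lossy: List[str] = []
    for child in cluster.children:
        lossy.extend(localize(child, threshold))
    # No finer partition (or no lossy child): stop zooming here.
    return lossy or [cluster.name]
```

Only the subtree whose loss exceeds the threshold is refined, which mirrors re-partitioning only the cluster with bad performance.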
6. Classification of NMI
In this section, we divide the network measurement intent into static NMI and dynamic NMI according to different requirement characteristics.
6.1. Static NMI
Static NMI refers to measurement intent whose purpose remains unchanged and is independent of the network state and external environment. Static NMI can be translated into determined network performance indicator values, such as concrete delay values, network bandwidth utilization, throughput, and so on.
Because static NMI can be translated into the measurement of determined network performance parameters, the whole process is relatively simple and less error-prone, and only needs to verify whether the measurement results meet the requirements.
6.2. Dynamic NMI
Dynamic NMI refers to measurement intent whose purpose remains unchanged but whose measurement process changes dynamically according to the network state and external environment. Dynamic NMI can also be translated into the measurement of determined network performance parameters; however, the values of these parameters change with the network state and external environment.
An example is the measurement of network performance during busy periods mentioned earlier. Although the network parameters used to judge whether the network is busy are determined, their values differ according to the network state and external environment.
Because of its dynamic nature, dynamic NMI is more complex to process than static NMI. It is necessary to verify not only the accuracy of the requirement analysis but also whether the final measurement results meet the requirements.
7. Security Considerations
This document introduces the network measurement intent, and uses two concrete examples to illustrate the process of network measurement intent. On the basis of existing intent work, this document can be used as a use case for IBN.
[I-D.irtf-nmrg-ibn-concepts-definitions] provides a comprehensive discussion of security considerations in the context of IBN, which also generally apply to the network measurement intent discussed in this document.
8. IANA Considerations
This document has no requests to IANA.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way Delay Metric for IPPM", RFC 2679, DOI 10.17487/RFC2679, September 1999, <https://www.rfc-editor.org/info/rfc2679>.
[RFC2680] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way Packet Loss Metric for IPPM", RFC 2680, DOI 10.17487/RFC2680, September 1999, <https://www.rfc-editor.org/info/rfc2680>.
[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016, <https://www.rfc-editor.org/info/rfc7799>.
[RFC8194] Schoenwaelder, J. and V. Bajpai, "A YANG Data Model for LMAP Measurement Agents", RFC 8194, DOI 10.17487/RFC8194, August 2017, <https://www.rfc-editor.org/info/rfc8194>.
[RFC8299] Wu, Q., Ed., Litkowski, S., Tomotaki, L., and K. Ogaki, "YANG Data Model for L3VPN Service Delivery", RFC 8299, DOI 10.17487/RFC8299, January 2018, <https://www.rfc-editor.org/info/rfc8299>.
[RFC8466] Wen, B., Fioccola, G., Ed., Xie, C., and L. Jalil, "A YANG Data Model for Layer 2 Virtual Private Network (L2VPN) Service Delivery", RFC 8466, DOI 10.17487/RFC8466, October 2018, <https://www.rfc-editor.org/info/rfc8466>.
[RFC8639] Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard, E., and A. Tripathy, "Subscription to YANG Notifications", RFC 8639, DOI 10.17487/RFC8639, September 2019, <https://www.rfc-editor.org/info/rfc8639>.
[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, September 2019, <https://www.rfc-editor.org/info/rfc8641>.
[RFC8889] Fioccola, G., Ed., Cociglio, M., Sapio, A., and R. Sisto, "Multipoint Alternate-Marking Method for Passive and Hybrid Performance Monitoring", RFC 8889, DOI 10.17487/RFC8889, August 2020, <https://www.rfc-editor.org/info/rfc8889>.
[RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and L. Geng, "A Framework for Automating Service and Network Management with YANG", RFC 8969, DOI 10.17487/RFC8969, January 2021, <https://www.rfc-editor.org/info/rfc8969>.
9.2. Informative References
[I-D.ietf-netconf-adaptive-subscription] Wu, Q., Song, W., Liu, P., Ma, Q., Wang, W., and Z. Niu, "Adaptive Subscription to YANG Notification", Work in Progress, Internet-Draft, draft-ietf-netconf-adaptive-subscription-00, 23 June 2022, <https://www.ietf.org/archive/id/draft-ietf-netconf-adaptive-subscription-00.txt>.
[I-D.ietf-netmod-eca-policy] Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, "A YANG Data model for ECA Policy Management", Work in Progress, Internet-Draft, draft-ietf-netmod-eca-policy-01, 19 February 2021, <https://www.ietf.org/archive/id/draft-ietf-netmod-eca-policy-01.txt>.
[I-D.irtf-nmrg-ibn-concepts-definitions] Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Tantsura, "Intent-Based Networking - Concepts and Definitions", Work in Progress, Internet-Draft, draft-irtf-nmrg-ibn-concepts-definitions-09, 24 March 2022, <https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn-concepts-definitions-09.txt>.
[I-D.irtf-nmrg-ibn-intent-classification] Li, C., Havel, O., Olariu, A., Martinez-Julia, P., Nobre, J. C., and D. R. Lopez, "Intent Classification", Work in Progress, Internet-Draft, draft-irtf-nmrg-ibn-intent-classification-08, 18 May 2022, <https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn-intent-classification-08.txt>.
Authors’ Addresses
Danyang Chen China Mobile Beijing 100053 China Email: chendanyang@chinamobile.com
Hongwei Yang China Mobile Beijing 100053 China Email: yanghongwei@chinamobile.com
Kehan Yao China Mobile Beijing 100053 China Email: yaokehan@chinamobile.com
Giuseppe Fioccola Huawei Riesstrasse, 25 80992 Munich Germany Email: giuseppe.fioccola@huawei.com
Qin Wu Huawei 101 Software Avenue, Yuhua District Nanjing 210012 China Email: bill.wu@huawei.com
Internet Research Task Force                                     H. Yang
Internet-Draft                                                   D. Chen
Intended status: Informational                              China Mobile
Expires: 2 January 2023                                      1 July 2022

      One-way delay measurement method based on Digital Twin Network
                 draft-yc-nmrg-dtn-owd-measurement-00
Abstract
This document describes an accurate network delay measurement method based on the digital twin network. The method does not need to send measurement packets, change the physical network configuration, or change the format of service packets, and it does not require physical network elements to support a time synchronization protocol. It supports both two-way and one-way delay measurement of any service packet. The digital twin network architecture used in this document follows the NMRG research group document draft-irtf-nmrg-network-digital-twin-arch-00.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 2 January 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
Yang & Chen Expires 2 January 2023 [Page 1]
Internet-Draft Digital Twin Network July 2022
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
   1.  Introduction
   2.  Conventions Used in This Document
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Method Introduction
   4.  Implementation Process
   5.  Conclusion
   6.  Security Considerations
   7.  IANA Considerations
   8.  Normative References
   Authors’ Addresses
1. Introduction
A digital twin network (DTN) is a virtual representation of the physical network. This virtual representation is meant to be used to analyze, diagnose, emulate, and then control the physical network based on data, models, and interfaces. The DTN architecture diagram is shown in Figure 1.
+---------------------------------------------------------+
|   +-------+   +-------+          +-------+              |
|   | App 1 |   | App 2 |   ...    | App n |  Application |
|   +-------+   +-------+          +-------+              |
+-------------^-------------------+-----------------------+
              |Capability Exposure| Intent Input
              |                   |
+-------------+-------------------v-----------------------+
|  Instance of Digital Twin Network                       |
|  +--------+   +------------------------+   +--------+   |
|  |        |   | Service Mapping Models |   |        |   |
|  |        |   |  +------------------+  |   |        |   |
|  | Data   +--->  |Functional Models |  +---> Digital|   |
|  | Repo-  |   |  +-----+-----^------+  |   | Twin   |   |
|  | sitory |   |        |     |         |   | Network|   |
|  |        |   |  +-----v-----+------+  |   | Mgmt   |   |
|  |        <---+  |   Basic Models   |  <---+        |   |
|  |        |   |  +------------------+  |   |        |   |
|  +--------+   +------------------------+   +--------+   |
+--------^----------------------------+-------------------+
         |                            |
         | data collection            | control
+--------+----------------------------v-------------------+
|                    Physical Network                     |
|                                                         |
+---------------------------------------------------------+
Figure 1: Reference Architecture of Digital Twin Network
The digital twin layer builds a network element model for the physical network elements and instantiates it as twin network elements, so that each physical network element in the physical network has a corresponding twin network element in the digital twin layer. Similarly, each physical flow in the physical network has a corresponding twin flow in the digital twin layer.
Traditional network delay measurement methods, such as active, passive, and hybrid measurement, each have some of the following disadvantages:
1) Active measurement injects measurement packets into the physical network, which can affect the forwarding behavior of actual service traffic, reduce the accuracy of delay measurement, increase the network load, and consume network resources;
2) Accurate delay measurement is not possible for packets of every network protocol; for example, it is difficult to measure the one-way delay of UDP packets;
3) Some solutions need to change the format of service packets to insert measurement parameters, which requires upgrading the physical network. This is difficult to implement, affects the normal forwarding behavior of service packets, and reduces measurement accuracy;
4) Measuring the one-way delay of the network requires a time synchronization protocol that the physical network must support, which makes the solution harder to implement.
2. Conventions Used in This Document
2.1. Terminology
NTP Network Time Protocol
PTP Precision Time Protocol
DTN Digital Twin Network
2.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
3. Method Introduction
The delay measurement method based on DTN is as follows:
1) According to the digital twin network architecture, build a digital twin layer, including twin network elements corresponding to physical network elements, such as twin switches, twin routers, etc.;
2) Time synchronization is maintained between all twin network elements in the digital twin layer:
   a) If multiple twin NEs are in the same physical entity, as in the NFV-based modeling method where multiple twin NEs are deployed on one server and share the same local clock, the twin NEs are inherently time-synchronized;
   b) If multiple twin NEs are deployed on different physical entities, PTP (Precision Time Protocol) [IEEE.1588.2008] or NTP (Network Time Protocol) [RFC5905] is used to synchronize the physical entities and thus all twin NEs;
3) Data transmission from the physical network layer to the digital twin layer uses a delay deterministic network (Detnet), so that the data transmission delay between each physical network element and its twin network element is deterministic or can be pre-calculated, as shown in Figure 2, where T1 to Tn are the data transmission delays. The delay deterministic network can be based on TSN or DIP technology;
4) Consider a flow that enters the physical network at physical network element 1, passes through physical network elements 2 and 3, and finally leaves at physical network element n. When physical network element 1 receives a data packet, it forwards the packet to physical network element 2 as usual and, at the same time, transmits the packet data to twin network element 1. If twin NE 1 receives the data at its local time t1 and the deterministic network transmission delay is T1, the arrival time recorded by twin NE 1 is t1-T1. Similarly, the arrival time recorded by twin NE n is tn-Tn;
5) Finally, the one-way transmission delay of the packet between any two physical network elements is calculated from the arrival times recorded at the corresponding twin network elements.
+-------------------------------------------------------------------+
| Digital Twin Network              +------------+                  |
|                        +----------+ Twin NE 3  +----------+       |
|                        |          +------------+          |       |
|                        |                                  |       |
| +------------+   +-----+------+   +------------+   +------+-----+ |
| | Twin NE 1  +---+ Twin NE 2  +---+ Twin NE 4  +---+ Twin NE n  | |
| +------------+   +------------+   +------------+   +------------+ |
+---------------------------------+---------------------------------+
                                  |
+---------------------------------+---------------------------------+
|                  Delay Deterministic Networking                   |
+---------------------------------^---------------------------------+
                                  |
+---------------------------------+---------------------------------+
| Physical Network                  +------------+                  |
|                        +----------+Physical NE3+----------+       |
|                        |          +------------+          |       |
|                        |                                  |       |
| +------------+   +-----+------+   +------------+   +------+-----+ |
| |Physical NE1+---+Physical NE2+---+Physical NE4+---+Physical NEn| |
| +------------+   +------------+   +------------+   +------------+ |
+-------------------------------------------------------------------+
Figure 2: A Delay Deterministic Network between the Physical Network and the Twin Network
4. Implementation Process
The detailed calculation process is shown in Figure 3:
(1) When the traffic to be measured reaches physical network element 1, physical network element 1 forwards it to physical network element 2 and also transmits the data to twin network element 1, with transmission delay T1. If twin network element 1 receives the data at its local time t1, it records the arrival time of the data as t1-T1;
(2) When the packet is forwarded to physical network element 2, physical network element 2 forwards it to physical network element 3 as usual and also transmits it to twin network element 2, with transmission delay T2. If twin network element 2 receives the packet information at its local time t2, it records the arrival time as t2-T2, and (t2-T2)-(t1-T1) is the one-way delay of the packet from physical network element 1 to physical network element 2;
(3) Similarly, when the packet reaches the nth physical network element, that element also transmits it to twin network element n, with transmission delay Tn. If twin network element n receives it at its local time tn, it records tn-Tn as the arrival time, and (tn-Tn)-(t1-T1) is the one-way transmission delay of the packet from physical network element 1 to physical network element n.
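The arithmetic of steps (1) to (3) is illustrated by the following Python sketch. The function names and the example timestamps are hypothetical; the sketch assumes the twin NE clocks are synchronized as in step 2) of Section 3 and that each Detnet delay Ti is known in advance, and it works in integer microseconds to avoid floating-point rounding.

```python
from typing import Dict, List, Tuple


def recorded_arrival(t_local: int, detnet_delay: int) -> int:
    """Arrival time recorded by a twin NE: its local receive time t_i minus
    the known deterministic transfer delay T_i from its physical NE."""
    return t_local - detnet_delay


def one_way_delays(records: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Per-segment and end-to-end one-way delays along a path.

    records: (NE name, t_i, T_i) for each NE in path order, in microseconds.
    """
    if len(records) < 2:
        raise ValueError("need at least two network elements on the path")
    arrivals = [(name, recorded_arrival(t, T)) for name, t, T in records]
    delays: Dict[str, int] = {}
    # Segment delays: (t_{i+1} - T_{i+1}) - (t_i - T_i)
    for (a, ta), (b, tb) in zip(arrivals, arrivals[1:]):
        delays[f"{a}->{b}"] = tb - ta
    # End-to-end delay: (t_n - T_n) - (t_1 - T_1)
    (first, t_first), (last, t_last) = arrivals[0], arrivals[-1]
    delays[f"{first}->{last}"] = t_last - t_first
    return delays
```

With (t, T) records of (1000, 100), (1450, 200) and (1800, 150) microseconds for NE1, NE2 and NEn, the recorded arrivals are 900, 1250 and 1650, giving 350 us from NE1 to NE2, 400 us from NE2 to NEn, and 750 us end to end.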
In this way, the one-way transmission delay of packets between physical NEs is obtained from the times at which the packet under test reaches the twin NEs. The measurement requires time synchronization only between the twin NEs, not between the physical NEs. The accuracy of the delay measurement depends on the time synchronization accuracy of the twin network elements and of the delay deterministic network. If both use the PTP synchronization protocol, the delay measurement accuracy can reach the nanosecond level.
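The dependence of accuracy on these two factors can be made explicit with a short error model. This is an editorial sketch, not part of the method as specified: a_i denotes the true time at which the packet reaches physical NE i, D_i the actual transfer delay from physical NE i to twin NE i, delta_i its deviation from the nominal Detnet delay T_i, and epsilon_i the offset of twin NE i's clock from a common reference; any fixed copy latency inside the physical NE is assumed to be folded into T_i.

```latex
\begin{aligned}
t_i &= a_i + D_i + \epsilon_i, \qquad D_i = T_i + \delta_i \\
t_i - T_i &= a_i + \delta_i + \epsilon_i \\
(t_n - T_n) - (t_1 - T_1) &= (a_n - a_1) + (\delta_n - \delta_1) + (\epsilon_n - \epsilon_1) \\
\bigl|\,[(t_n - T_n) - (t_1 - T_1)] - (a_n - a_1)\,\bigr|
  &\le |\delta_n - \delta_1| + |\epsilon_n - \epsilon_1|
\end{aligned}
```

Under these assumptions, the true one-way delay a_n - a_1 is recovered exactly when the Detnet delays are perfectly deterministic (all delta_i equal to zero) and the twin NE clocks agree (all epsilon_i equal); otherwise the error is bounded by the Detnet delay variation plus the relative clock offset of the two twin NEs.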
+--------+ +--------+ +--------+ +------+ +------+ +------+ +------+
|Physical| |Physical| |Physical| |Detnet| | Twin | | Twin | | Twin |
|  NE1   | |  NE2   | |  NEn   | |      | | NE1  | | NE2  | | NEn  |
+----+---+ +----+---+ +----+---+ +---+--+ +---+--+ +---+--+ +---+--+
     |          |          |         |        |        |        |
     |1. The packet is sent from physical NE1
     |   to twin NE1, and twin NE1 records the
     |   arrival time of the packet
     +----------+----------+---------+------->|        |        |
     |          |          |         |        |        |        |
     |          |2. The packet is sent from physical NE2
     |          |   to twin NE2, and twin NE2 records the
     |          |   arrival time of the packet
     |          +----------+---------+--------+------->|        |
     |          |          |         |        |        |        |
     |          |          |n. The packet is sent from physical NEn
     |          |          |   to twin NEn, and twin NEn records the
     |          |          |   arrival time of the packet
     |          |          +---------+--------+--------+------->|
     |          |          |         |        |        |        |
Figure 3: Delay Measurement Process
Yang & Chen Expires 2 January 2023 [Page 7]
5. Conclusion
This method enables segment-by-segment or end-to-end one-way delay measurement in the physical network. Its advantages are: no dedicated measurement packets need to be sent; traffic of all protocol types can be measured; neither the physical network configuration nor the traffic data format is changed; the physical network does not need to support a time synchronization protocol; and the measurement accuracy is high, being bounded by the time synchronization accuracy of the twin network elements and of the delay deterministic network.
6. Security Considerations
TBD.
7. IANA Considerations
TBD.
8. Normative References
[IEEE.1588.2008] IEEE, "IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems", July 2008.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010, <https://www.rfc-editor.org/info/rfc5905>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
Authors’ Addresses
Hongwei Yang China Mobile Beijing 100053 China Email: yanghongwei@chinamobile.com
Danyang Chen China Mobile Beijing 100053 China Email: chendanyang@chinamobile.com
Internet Research Task Force                                    H. Yang
Internet-Draft                                                  C. Zhou
Intended status: Informational                             China Mobile
Expires: 2 January 2023                                     1 July 2022

Digital Twin Network Flow Simulation
draft-yz-nmrg-dtn-flow-simulation-00
Abstract
Some important application scenarios of a digital twin network, such as experiments with new network technologies, network configuration verification, and network performance optimization, require the virtual traffic in the twin network to accurately simulate the real traffic in the physical network. The real traffic in the physical network is called the physical traffic, and the virtual traffic in the twin network is called the twin traffic. To achieve high-fidelity simulation of the physical traffic by the twin traffic, this document proposes three characteristics that the twin traffic and the physical traffic should consistently share, and introduces an implementation method for the twin flow.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 2 January 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Yang & Zhou Expires 2 January 2023 [Page 1]
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Conventions Used in This Document . . . . . . . . . . . . . . 4
   2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
   2.2. Requirements Language . . . . . . . . . . . . . . . . . . 4
3. Key characteristics of DTN flow . . . . . . . . . . . . . . . 4
4. DTN flow implementation method . . . . . . . . . . . . . . . 4
5. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 8
6. Security Considerations . . . . . . . . . . . . . . . . . . . 9
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
8. Normative References . . . . . . . . . . . . . . . . . . . . 9
Authors’ Addresses . . . . . . . . . . . . . . . . . . . . . . . 9
1. Introduction
A digital twin network is a virtual representation of the physical network. Such a virtual representation is meant to be used to analyze, diagnose, emulate, and then control the physical network based on data, models, and interfaces. The DTN architecture is shown in Figure 1.
+---------------------------------------------------------+
|   +-------+   +-------+          +-------+              |
|   | App 1 |   | App 2 |   ...    | App n |  Application |
|   +-------+   +-------+          +-------+              |
+-------------^-------------------+-----------------------+
              |Capability Exposure| Intent Input
              |                   |
+-------------+-------------------v-----------------------+
|                        Instance of Digital Twin Network |
|  +--------+   +------------------------+   +--------+   |
|  |        |   | Service Mapping Models |   |        |   |
|  |        |   |  +------------------+  |   |        |   |
|  | Data   +--->  |Functional Models |  +---> Digital|   |
|  | Repo-  |   |  +-----+-----^------+  |   | Twin   |   |
|  | sitory |   |        |     |         |   | Network|   |
|  |        |   |  +-----v-----+------+  |   | Mgmt   |   |
|  |        <---+  |  Basic Models    |  <---+        |   |
|  |        |   |  +------------------+  |   |        |   |
|  +--------+   +------------------------+   +--------+   |
+--------^----------------------------+-------------------+
         |                            |
         | data collection            | control
+--------+----------------------------v-------------------+
|                                                         |
|                    Physical Network                     |
|                                                         |
+---------------------------------------------------------+

Figure 1: Reference Architecture of Digital Twin Network
The digital twin layer builds a network element model for each physical network element and instantiates it as a twin network element, so that each physical network element in the physical network has a corresponding twin network element in the digital twin layer. Similarly, each physical flow in the physical network has a corresponding twin flow in the digital twin layer.
Through real-time data interaction between the physical network and the twin network, the network elements, network topology, network traffic, network status, and other data of the physical network are virtualized in the twin network layer. The physical network and the twin network have the same topology and the same number of NEs, and carry the same traffic information.
2. Conventions Used in This Document
2.1. Terminology
DTN Digital Twin Network
2.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
3. Key characteristics of DTN flow
The twin network layer needs to accurately simulate the traffic of the physical network to support the normal operation of the network application layer. The twin traffic in the twin network layer and the physical traffic in the physical network must satisfy the following three characteristics simultaneously:
1) The forwarding paths of the two types of traffic are consistent, that is, the twin nodes that the twin traffic traverses in the twin network layer correspond to the physical nodes that the physical traffic traverses in the physical network layer;
2) The network performance of the two types of traffic is consistent, that is, the twin traffic and the physical traffic experience the same network delay, packet loss, and jitter;
3) The data characteristics of the two types of traffic are consistent, that is, the packets of the twin traffic and of the physical traffic have the same key characteristics, such as traffic rate, 5-tuple, packet length, and packet priority.
4. DTN flow implementation method
For the twin flow and the physical flow to meet the three characteristics above, three problems need to be solved:
1) Physical network elements and twin network elements need network-wide unique identifiers so that each can be mapped to the other. The twin traffic then traverses the twin network elements that correspond to the physical network elements traversed by the physical traffic, so that both follow the same forwarding path;
2) The physical flow is collected and managed centrally by the Data Repository of the twin network layer and then distributed to each twin network element. Because the time each physical network element takes to complete data collection and data transmission differs, the twin flow must be delayed by a fixed time to ensure that the twin flow and the physical flow have the same forwarding delay, packet loss, and jitter. That is, the twin flow lags the physical flow by a fixed time;
3) The flow data collected by the Data Repository should include the key information of the physical flow, so that the twin flow and the physical flow have consistent data characteristics. The Data Repository can collect the physical flow in full, packet by packet, or partially, at a given sampling rate.
+----------------------------------------------------------------+
| Digital Twin Network                                           |
|                             +----------+                       |
|                    +--------+ Twin NE 3+-------+               |
|                    |        +----------+       |               |
|                    |                           |               |
| +----------+  +----+-----+  +----------+  +----+-----+         |
| | Twin NE 1+--+ Twin NE 2+--+ Twin NE 4+--+ Twin NE n|         |
| +----------+  +----------+  +----------+  +----------+         |
|                                                                |
| +-----------------+                                            |
| | Data Repository |                                            |
| +-----------------+                                            |
+----------^-----------------------------------------------------+
           |
+----------+-----------------------------------------------------+
|                 Delay Deterministic Networking                 |
+----------^-----------------------------------------------------+
           |
+----------+-----------------------------------------------------+
| Physical Network                                               |
|                                 +------------+                 |
|                       +---------+Physical NE3+--------+        |
|                       |         +------------+        |        |
|                       |                               |        |
| +------------+  +-----+------+  +------------+  +-----+------+ |
| |Physical NE1+--+Physical NE2+--+Physical NE4+--+Physical NEn| |
| +------------+  +------------+  +------------+  +------------+ |
+----------------------------------------------------------------+

Figure 2: Twin Flow and Physical Flow
The three problems are solved with the following three methods:
1) Each physical network element has a system MAC address. Because the MAC address is unique across the whole network, it can be used as the identifier of the physical network element. The twin NE ID can be derived by extending the physical NE ID; for example, an 8-bit custom field that identifies the device type can be appended to the system MAC address of the physical NE. Identifying each twin NE by the MAC address of its physical NE both realizes a one-to-one correspondence between physical NEs and twin NEs and makes each twin NE uniquely identifiable across the entire network.
2) The data transmission network between the physical network elements and the Data Repository is a delay deterministic network, such as TSN (Time-Sensitive Networking) or DIP (Deterministic Internet Network). The delays with which different physical network elements transmit data to the Data Repository may differ, but with a delay deterministic network the data transmission delays T1 to Tn are fixed and can be pre-calculated. After calculating T1 to Tn, the Data Repository selects their maximum, Tmax, as the reference time. Assume that the data collected from physical network elements 1 to n arrives at the Data Repository at times t1 to tn. If the data transmission time Ti of physical network element i is less than Tmax, the Data Repository waits for (Tmax - Ti) before transmitting the data to the twin network elements. If Ti = Tmax, then Tmax - Ti = 0 and the Data Repository transmits the data to the twin network elements immediately. Because the Data Repository and the twin network elements are deployed in the same local area network or on the same physical entity (such as a server), the transmission delay between the Data Repository and each twin network element can be ignored. As a result, all twin flows are delayed by the same fixed time Tmax relative to the physical flows, while their forwarding delay, jitter, packet loss, and other performance characteristics remain the same.
3) The data collected by the Data Repository needs to contain the key information of the physical flow, such as the physical network element MAC address, traffic sampling rate, source MAC, destination MAC, protocol type, source IP address, destination IP address, protocol number, source port number, destination port number, packet priority, packet length, and packet forwarding delay. The first two parameters (physical network element MAC address and traffic sampling rate) are mandatory; the remaining fields are optional, depending on application requirements.
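Methods 1) and 3) amount to an identifier scheme and a record layout. The Python sketch below assumes the 8-bit custom field is appended as the least significant byte of a 56-bit twin NE ID, that the sampling rate is a fraction in (0, 1], and that "protocol type" and "protocol number" are carried as integers (for example an EtherType and an IP protocol number); all names are hypothetical.

```python
import string
from dataclasses import asdict, dataclass
from typing import Optional


def twin_ne_id(mac: str, custom: int) -> int:
    """Method 1): 48-bit system MAC followed by a hypothetical 8-bit custom field."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6 or any(
        len(o) != 2 or any(c not in string.hexdigits for c in o) for o in octets
    ):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if not 0 <= custom <= 0xFF:
        raise ValueError("custom field must fit in 8 bits")
    return (int("".join(octets), 16) << 8) | custom


def physical_mac_of(twin_id: int) -> str:
    """Recover the physical NE's MAC address by dropping the custom byte."""
    mac = twin_id >> 8
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


@dataclass
class FlowRecord:
    """Method 3): key information of the physical flow sent to the Data Repository."""
    ne_mac: str                 # mandatory: physical NE MAC address
    sampling_rate: float        # mandatory: assumed fraction of packets sampled, (0, 1]
    src_mac: Optional[str] = None
    dst_mac: Optional[str] = None
    protocol_type: Optional[int] = None    # e.g. an EtherType (assumption)
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    protocol_number: Optional[int] = None  # e.g. an IP protocol number (assumption)
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    priority: Optional[int] = None
    length: Optional[int] = None
    forwarding_delay_ns: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.ne_mac:
            raise ValueError("the physical NE MAC address is mandatory")
        if not 0 < self.sampling_rate <= 1:
            raise ValueError("sampling rate must be in (0, 1]")

    def present_fields(self) -> dict:
        """Only the fields that were actually collected (optional ones may be absent)."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Appending the custom byte keeps the MAC address recoverable from the twin NE ID with a single shift, which is what makes the one-to-one mapping cheap to evaluate in either direction.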
The twin flow is implemented in the following steps, as shown in Figure 3:
(1) Build the digital twin network: the physical network elements and the twin network elements are mapped one-to-one through their network-wide unique identifiers, so that the number of network elements and the topology are the same;
(2) Each physical network element forms a data set of key flow information, such as {network element identifier, sampling rate, source MAC, destination MAC, protocol type, source IP address, destination IP address};
(3) The Data Repository collects the data sets from each physical network element and calculates the maximum data transmission delay Tmax;
(4) After collecting a data set, the Data Repository sends it to the corresponding twin network element according to the physical network element identifier;
(5) The twin network elements generate twin flow according to the sampling rate and flow information in the data set. Because the data from each physical network element reaches its twin network element a fixed Tmax after collection (its transmission delay Ti plus the waiting time Tmax - Ti), all flows in the twin network are delayed by Tmax relative to the physical flows. The transmission delay between the Data Repository and the twin network elements is negligible because they are in the same server or local area network.
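Method 2) and steps (3) to (5) can be sketched in Python as below. The delays are integer nanoseconds, data sets are plain dicts keyed by a hypothetical "ne_id" field, and the rule that a record sampled at rate r stands for round(1/r) twin packets is an assumption for illustration only; this document does not specify how twin NEs expand sampled data.

```python
from collections import defaultdict
from typing import Iterator


def hold_times(delays: dict[str, int]) -> dict[str, int]:
    """Method 2): waiting time Tmax - Ti per physical NE (fixed DetNet delays, ns)."""
    t_max = max(delays.values())
    return {ne: t_max - t_i for ne, t_i in delays.items()}


def dispatch(datasets: list[dict]) -> dict[str, list[dict]]:
    """Step (4): group data sets by the physical NE identifier they carry."""
    per_twin = defaultdict(list)
    for ds in datasets:
        per_twin[ds["ne_id"]].append(ds)
    return dict(per_twin)


def generate_twin_flow(records: list[dict]) -> Iterator[dict]:
    """Step (5), hypothetical expansion: a record sampled at rate r
    is replayed as round(1 / r) twin packets carrying the same key fields."""
    for rec in records:
        copies = round(1 / rec["sampling_rate"])
        fields = {k: v for k, v in rec.items() if k not in ("ne_id", "sampling_rate")}
        for _ in range(copies):
            yield dict(fields)
```

With fixed delays of 300, 800, and 500 ns, hold_times gives 500, 0, and 300 ns, so data captured at the same instant on every physical NE is released to all twin NEs exactly Tmax = 800 ns later.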
+--------+    +--------+    +------------+    +------+
|Physical|    | DetNet |    |    Data    |    | Twin |
|   NE   |    |        |    | Repository |    |  NE  |
+---+----+    +---+----+    +-----+------+    +--+---+
    |             |               |              |
    +------------------------------------------->|  (1)
    |             |               |              |
    +---+         |               |              |  (2)
    |   |         |               |              |
    |<--+         |               |              |
    |             |               |              |
    +-------------+-------------->|              |  (3)
    |             |               |              |
    |             |               +------------->|  (4)
    |             |               |              |
    |             |               |              +---+  (5)
    |             |               |              |   |
    |             |               |              |<--+
    |             |               |              |

(1) According to the characteristics of the physical NEs, build the twin NEs.
(2) The physical NEs collect key flow information and form a data set.
(3) The data set is sent to the Data Repository.
(4) The Data Repository sends the data set to the corresponding twin NE according to the NE identifier in the data set.
(5) The twin NEs generate twin flow according to the data set information.

Figure 3: The Generation Process of Twin Traffic
5. Conclusion
This document describes a method for high-fidelity simulation of DTN twin flow, so that the twin flow and the physical flow have the following three characteristics:
1) The forwarding paths of the two types of flow are the same, that is, the twin nodes traversed by the twin flow correspond to the physical nodes traversed by the physical flow;
2) The network performance of the two types of flow is the same, that is, both experience the same network delay, packet loss, and jitter;
3) The data characteristics of the two types of flow are consistent, that is, they have the same key characteristics, such as flow rate, 5-tuple, packet length, and packet priority.
6. Security Considerations
TBD.
7. IANA Considerations
TBD.
8. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
Authors’ Addresses
Hongwei Yang China Mobile Beijing 100053 China Email: yanghongwei@chinamobile.com
Cheng Zhou China Mobile Beijing 100053 China Email: zhouchengyjy@chinamobile.com
Internet Research Task Force                                    C. Zhou
Internet-Draft                                                  D. Chen
Intended status: Informational                             China Mobile
Expires: 11 January 2023                         P. Martinez-Julia, Ed.
                                                                   NICT
                                                           10 July 2022

Data Collection Requirements and Technologies for Digital Twin Network
draft-zcz-nmrg-digitaltwin-data-collection-00
Abstract
A Digital Twin Network is a network system consisting of a Physical Network and a Twin Network that are mapped to each other interactively and in real time. Constructing a Digital Twin Network requires real-time data from the Physical Network to update the state of the Twin Network. This document describes the data collection requirements and provides data collection methods and tools for building the data repository of a digital twin network.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 11 January 2023.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
Zhou, et al. Expires 11 January 2023 [Page 1]
Internet-Draft Network Working Group July 2022
This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Definitions and Acronyms . . . . . . . . . . . . . . . . . . 3
3. Data Collection Requirements for Digital Twin Network . . . . 3
   3.1. Target Driven and On-demand Collection . . . . . . . . . 3
   3.2. Diverse Tools for Various Data . . . . . . . . . . . . . 4
   3.3. Lightweight and Efficient Collection . . . . . . . . . . 5
   3.4. Open and Standardized Interfaces . . . . . . . . . . . . 5
   3.5. Naming for Caching . . . . . . . . . . . . . . . . . . . 6
   3.6. Efficient Multi-Destination Delivery . . . . . . . . . . 6
4. An Efficient Data Collection Method for Digital Twin Network . 6
   4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 6
   4.2. Efficient Data Collection Mechanism . . . . . . . . . . . 6
   4.3. Data Collection Process . . . . . . . . . . . . . . . . . 8
5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6. Security Considerations . . . . . . . . . . . . . . . . . . . 10
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 10
   8.1. Normative References . . . . . . . . . . . . . . . . . . 10
   8.2. Informative References . . . . . . . . . . . . . . . . . 10
Authors’ Addresses . . . . . . . . . . . . . . . . . . . . . . . 10
1. Introduction
With the deployment of the Internet of Things (IoT), cloud computing, data centers, etc., the scale of networks keeps expanding. The growth in network scale also increases network complexity, which induces many problems. To improve network autonomy and reduce potential negative effects on physical and virtual networks, we consider an endogenously intelligent and autonomous network architecture that achieves self-optimization and self-decision (in general, self-management and self-operation) to be indispensable. Digital twin technology answers the challenge of building self-management systems because it can optimize and validate policies through real-time, interactive mapping with physical entities [I-D.irtf-nmrg-network-digital-twin-arch].
Data is the cornerstone for constructing a digital twin of a network, namely a Digital Twin Network (DTN). At large network scale, data collection, storage, and management face great challenges. Data collection methods and tools therefore need to be target-driven, diverse, lightweight, and efficient, while being open and standardized. Of these requirements, a lightweight and efficient data collection method is the most important. Collecting the full data set requires huge storage space and bandwidth, especially in complex scenarios that require real-time data and traffic from multi-source, heterogeneous devices. It is therefore extremely important to agree on lightweight and efficient data collection, aggregation, and correlation methods for the telemetry data transmission, processing, and storage needed to build a DTN system.
2. Definitions and Acronyms
PN: Physical Network
IMC: Instruction Management Center
DSC: Data Storage Center
DTN: Digital Twin Network
TSE: Telemetry Streaming Element
RDF: Resource Description Framework
CEP: Complex Event Processing
3. Data Collection Requirements for Digital Twin Network
3.1. Target Driven and On-demand Collection
The monitoring data of a network is the basis to build a DTN system. Such data is collected from physical and virtual networks. It includes, but is not limited to, the following types:
* Provisioning and operational status of physical or virtual devices, as well as the network topology with all network elements.
* Running status of physical, logical, or virtual ports and links.
* Logs and events records of all the network elements.
* Statistics (packet loss, traffic throughput, latency, etc.) of flows and ports.
* Various data regarding users and services.
* Life-cycle operation data of all network elements.
* All above data in time series.
The collection of network data for maintaining a DTN should be target-driven and on-demand. It is not always necessary to collect the complete list of network data above, because of the high resource cost (CPU, memory, bandwidth, etc.). The type, frequency, and method of data collection needed to serve a DTN application depend on the specific network topology and application requirements.
3.2. Diverse Tools for Various Data
The different types of network data used to maintain a DTN have different characteristics. Some data (e.g., port statistics, key link information) requires a higher collection frequency, and some data (e.g., flow status, link faults) must be delivered closer to real time. Some data (e.g., device status, port statistics) can be collected directly and simply with common tools, while other data (e.g., per-flow latency, traffic matrix) can only be acquired through complex network measurement. Therefore, multiple tools or methods are needed to collect the massive data required to build the DTN.
Currently, widely used tools such as SNMP, NETCONF, telemetry, INT (In-band Network Telemetry), and DPI (Deep Packet Inspection) are candidate tools for collecting data for a digital twin network. Going forward, new data collection technologies need to be studied in the following aspects, in combination with the data requirements of DTN network applications:
* High-performance data collection technology based on programmable circuits.
* Measurement methods for complex network data such as network performance and network traffic.
* Collaborative data collection technology for multiple data sources.
* Distributed and collaborative data collection technology for complex networks, and the time synchronization of data acquisition.
3.3. Lightweight and Efficient Collection
Data collection tools and methods should be as lightweight as possible, to reduce the use of network equipment resources and ensure that data collection does not affect normal network operation. The major requirements are listed below.
* Data collection tools and methods need to improve execution efficiency and reduce computing, storage, and communication bandwidth costs.
* The collection of redundant data should be avoided or minimized.
* Data compression should be fully exploited for the data set that needs to be collected, to reduce the resource cost in the collection phase.
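As one example of the compression requirement, repetitive telemetry batches compress well with a general-purpose codec. The sketch below uses Python's standard zlib (DEFLATE) and json modules; the document does not prescribe a serialization format or a compression algorithm, so both choices, and the sample port counters, are assumptions.

```python
import json
import zlib


def encode_batch(samples: list[dict]) -> bytes:
    """Serialize a batch of telemetry samples compactly and DEFLATE-compress it."""
    raw = json.dumps(samples, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, level=6)


def decode_batch(blob: bytes) -> list[dict]:
    """Inverse of encode_batch."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical port counters: consecutive samples differ only slightly.
batch = [{"port": "ge-0/0/1", "rx_pkts": 1000 + i, "tx_pkts": 2000 + i} for i in range(200)]
blob = encode_batch(batch)
raw_size = len(json.dumps(batch, separators=(",", ":"), sort_keys=True))
```

Batching samples before compressing matters: the repeated keys and port names across samples are what the codec exploits, so compressing each sample individually would save far less.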
3.4. Open and Standardized Interfaces
Data collection interfaces used to build the DTN should be open and standardized to help avoid hardware or software vendor lock-in and to achieve interoperability. The major requirements of data collection interfaces are:
* Support configuration management, including the data collection protocol, frequency or period, etc.
* Support several speed options (e.g. minute-level, 10-second level, second level (near real time), and real time level) to accommodate different data requirements from applications.
* Be extensible so that more features can be added with limited parameter changes and with backward compatibility.
* Provide a secure and reliable information exchange mechanism.
3.5. Naming for Caching
Both raw network data and knowledge items obtained from monitoring must be uniquely addressable. This means giving each data or knowledge item a unique identifier, or "name", that references it. Caching mechanisms use this name to store the item and to serve it to clients, which request the item by the same name.
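A minimal sketch of naming for caching in Python, assuming a hypothetical hierarchical name layout /dtn/<source>/<kind>/<sequence> and an in-memory cache; the document does not define a naming syntax, so this layout is illustrative only.

```python
from typing import Optional


def make_name(source: str, kind: str, seq: int) -> str:
    """Build a hypothetical hierarchical name such as '/dtn/ne1/port-stats/42'."""
    for part in (source, kind):
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"/dtn/{source}/{kind}/{seq}"


class NamedCache:
    """Store data or knowledge items under their unique name and serve them by name."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, name: str, item: bytes) -> None:
        # A name must keep referring to one item, otherwise caches could disagree.
        if name in self._items and self._items[name] != item:
            raise ValueError(f"name already bound to different content: {name}")
        self._items[name] = item

    def get(self, name: str) -> Optional[bytes]:
        return self._items.get(name)
```

Rejecting a rebinding of an existing name to different content is a design choice in this sketch: it keeps every cached copy of a name equivalent, which is what lets any cache answer a request.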
3.6. Efficient Multi-Destination Delivery
Maintaining DTN systems will not be the sole purpose of communicating monitoring information and knowledge. Other applications may also request raw telemetry data or knowledge items, identifying them by name. Following the recommendations of RFC 9232 [RFC9232], the telemetry system will deliver the requested data or knowledge items to the requesters as efficiently as possible. On the one hand, items will be provided by the cache closest to the destination of the data. On the other hand, items will be replicated at the best nodes, following an efficient multicast spanning tree. Different underlying protocols can be used to achieve this mechanism.
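Serving an item from the closest cache can be sketched as a breadth-first search over the topology, assuming hop count as the distance metric (the document does not fix one) and an adjacency-list topology; the multicast replication part is not covered, and the function name is hypothetical.

```python
from collections import deque
from typing import Optional


def closest_cache(
    topology: dict[str, list[str]], requester: str, holders: set[str]
) -> Optional[tuple[str, int]]:
    """Return (node, hops) of the nearest node caching the item, or None.

    topology: adjacency list, node -> neighbouring nodes.
    holders:  nodes that currently cache the requested name.
    """
    seen = {requester}
    queue = deque([(requester, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in holders:
            return node, hops
        for neighbour in topology.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None
```

Because breadth-first search visits nodes in order of hop count, the first holder it reaches is a nearest one; a weighted metric such as latency would call for Dijkstra's algorithm instead.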
4. An Efficient Data Collection Method for Digital Twin Network
4.1. Overview
The system that manages the DTN maps the PN to the DTN in real time. However, existing methods collect the full data from the PN for modeling and do not consider problems such as time lag, insufficient storage resources, low computational efficiency, and wasted bandwidth caused by data transmission. To solve these problems, this section introduces an efficient data collection method for maintaining the DTN. The method sends instructions to the elements of the PN so that they pre-process the data (data cleaning or knowledge representation) before sending it back to be applied to the DTN.
4.2. Efficient Data Collection Mechanism
The management system consists of the PN and the DTN. The PN includes multiple Data Storage Centers (DSCs) and a Telemetry Streaming Element (TSE), and the DTN includes the Instruction Management Center (IMC) and a Data Storage Center (DSC). The TSE has multiple functions, including data collection, data aggregation, data correlation, knowledge representation, and query. In addition, a Complex Event Processing (CEP) engine is integrated into the TSE to perform queries on the streamed data. The IMC has two functions. On the one hand, it manages the registration of the DSCs on the PN side; the registration information can include key information such as the IP address of the PN-side DSC, the chosen data type, the index names in the data, the data source name, and the data size. On the other hand, it adaptively configures data collection instructions according to the collection requirements of the DSC on the DTN side and looks up the IP addresses to which the instructions are sent. The information carried by an instruction includes rule-based mathematical expressions, executable models in .exe format, dynamic collection frequencies, parameter lists, program text files in .m format, text files with parameter configuration, and other types of files. Instructions are flexible and programmable, and can be created, modified, combined, and deleted at any time as required. When the DTN-side DSC requests data from the IMC, the IMC looks up the IP address of the relevant PN-side DSC in its registration database, which is indexed by critical information such as data type and data name, and configures functional instructions for data processing or knowledge representation according to the demand. The DTN-side DSC stores the useful information, after data processing and knowledge representation, returned by the TSE.
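The IMC's registration and lookup role can be sketched in Python as below. It assumes registrations are indexed by (data type, data name) and that an instruction is an opaque dict; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Registration information provided by a PN-side DSC (fields from the text)."""
    ip: str
    data_type: str
    data_name: str
    source: str
    size_bytes: int


class InstructionManagementCenter:
    def __init__(self) -> None:
        # Registration database indexed by (data type, data name).
        self._db: dict[tuple[str, str], list[Registration]] = {}

    def register(self, reg: Registration) -> None:
        self._db.setdefault((reg.data_type, reg.data_name), []).append(reg)

    def lookup(self, data_type: str, data_name: str) -> list[str]:
        """IP addresses of the PN-side DSCs that hold the requested data."""
        return [r.ip for r in self._db.get((data_type, data_name), [])]

    def handle_request(
        self, data_type: str, data_name: str, instruction: dict
    ) -> list[tuple[str, dict]]:
        """Pair each matching DSC address with the configured instruction."""
        return [(ip, instruction) for ip in self.lookup(data_type, data_name)]
```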
The DSC in the PN side has two functions. On the one hand, it stores data of various types, such as performance indicators, operational status, log, traffic scheduling, business requirements, etc. On the other hand, it has the function of automatically parsing the instructions sent by the TSE. Then the operating environment of the instruction is configured according to the instruction needs, and data processing or knowledge representation is performed based on the instruction. Data processing mainly includes data cleaning, filling missing data, normalization, conflict verification, etc. Knowledge representation refers to the representation of the original data as a data structure that can be used for efficient computation. Such representation results are closer to machine language, which is conducive to the rapid and accurate construction of the model. The role of knowledge representation is to represent the original data as a data structure that can be used to efficiently calculate. Such representation results closer to the machine language, which is conducive to the rapid and accurate construction of the model.
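The data processing steps listed above (cleaning, filling missing data, normalization) can be sketched for a single numeric metric. The specific choices are assumptions: out-of-range readings are treated as missing, gaps are forward-filled, and values are min-max normalized; conflict verification and knowledge representation are not shown.

```python
from typing import Optional, Sequence


def clean(values: Sequence[Optional[float]], low: float, high: float) -> list[Optional[float]]:
    """Data cleaning: treat readings outside [low, high] as missing."""
    return [v if v is not None and low <= v <= high else None for v in values]


def fill_missing(values: Sequence[Optional[float]]) -> list[float]:
    """Forward-fill gaps; leading gaps take the first observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to fill from")
    out, last = [], observed[0]
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def min_max_normalize(values: Sequence[float]) -> list[float]:
    """Normalization: scale values to [0, 1]; a constant series maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values: Sequence[Optional[float]], low: float, high: float) -> list[float]:
    return min_max_normalize(fill_missing(clean(values, low, high)))
```

For example, preprocess([10.0, None, 30.0, -5.0, 20.0], 0.0, 100.0) drops the out-of-range -5.0, forward-fills to [10, 10, 30, 30, 20], and normalizes to [0.0, 0.0, 1.0, 1.0, 0.5].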
      Physical Network         Digital Twin Network
 +-----+   +-----+   +-----+     +-----+   +-----+
 | DSC |...| DSC |   | TSE |     | IMC |   | DSC |
 +--+--+   +--+--+   +--+--+     +--+--+   +--+--+
    |         |         |           |         |
    +---------+-------->|           |         |   1.1. Register
    |         +-------->|           |         |   1.2. Register
    |         |         +---------->|         |   1.3. Register
    |         |         |           |<--------+   2. Data req.
    |         |         |           *         |   3. Query and instruction
    |         |         |           |         |      configuration
    |         |         |<----------+         |   4. Send instructions
    |         |         *           |         |   5. Parse and execute
    |         |         |           |         |      instruction
    |<--------+---------+           |         |   6. Data subscription
    *         |         |           |         |   7. Knowledge
    |         |         |           |         |      representation
    +---------+-------->|           |         |   8. Data pushing
    |         |         *           |         |   9. Data aggregation and
    |         |         |           |         |      correlation
    |         |         +-----------+-------->|   10. Send processed data
    |         |         |           |         |
Figure 1: Data Collection Process
4.3. Data Collection Process
The specific process is as follows:
* The DSCs in the PN side register with the TSE, and the TSE registers with the IMC. Each registration provides the registering entity's IP address, data type, data source, data size, and similar information.
* The DSC in the DTN side sends a data collection request to the IMC.
* Based on the data collection request, the IMC intelligently queries the registered addressing information and configures the data processing instructions.
* Based on the query result, the IMC in the DTN side sends the corresponding instructions to the TSE.
* After receiving the instructions, the TSE parses and executes them. The query function can be performed by the CEP engine, which receives all of the telemetry data and processes it against all of the provided queries.
* The TSE sends a data subscription to the DSC in the PN side.
* The DSC in the PN side either represents the data semantically in RDF form itself, or sends the data in raw form to the TSE, which then performs the semantic representation.
* The DSC in the PN side pushes the data or knowledge items to the TSE.
* The TSE aggregates and correlates the collected data or knowledge items, and then generates aggregated data or knowledge items according to the actual requirements.
* The TSE sends the resulting data or knowledge items to the DSC in the DTN side.
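The steps above can be traced with a small, self-contained Python simulation of a single request. It is a sketch under stated assumptions: the registry and data layouts, the `run_collection` function, and the aggregation operators are hypothetical; step 7 is skipped (raw data is pushed), and correlation is reduced to keying results by DSC address.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical aggregation operators that an instruction may select.
AGGREGATIONS = {"mean": mean, "max": max}


def run_collection(registry: dict[str, list[str]],
                   pn_data: dict[str, dict[str, list[float]]],
                   data_type: str, metric: str, op: str = "mean"):
    """Trace one data collection request through the steps listed above.

    registry: data type -> addresses of registered PN-side DSCs (step 1)
    pn_data:  DSC address -> {metric name: samples} stored by that DSC
    """
    log = [f"2. DTN-side DSC requests {data_type}/{metric}"]
    targets = registry.get(data_type, [])             # 3. IMC queries registrations
    instruction = {"targets": targets, "metric": metric, "op": op}
    log.append(f"4. IMC sends instruction covering {len(targets)} DSC(s)")

    pushed = {}
    for addr in instruction["targets"]:               # 5. TSE executes instruction
        log.append(f"6. TSE subscribes to {addr}")
        pushed[addr] = pn_data[addr][instruction["metric"]]  # 8. raw data pushed

    aggregate = AGGREGATIONS[instruction["op"]]
    result = {addr: aggregate(s) for addr, s in pushed.items()}  # 9. aggregate
    log.append("9. TSE aggregates and correlates")
    log.append("10. TSE sends result to DTN-side DSC")
    return result, log


registry = {"performance": ["192.0.2.10", "192.0.2.11"]}
pn_data = {"192.0.2.10": {"cpu": [20.0, 40.0]},
           "192.0.2.11": {"cpu": [50.0, 70.0]}}
result, log = run_collection(registry, pn_data, "performance", "cpu")
print(result)  # {'192.0.2.10': 30.0, '192.0.2.11': 60.0}
```

Passing `op="max"` instead selects a different aggregation, which loosely reflects the draft's point that instructions are programmable and can be changed per request.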
5. Summary
This draft describes the requirements for data collection and provides the data collection methods or tools required to build the data repository for maintaining DTN systems. These methods or tools should satisfy the requirements of being target-driven, supporting diversity, being lightweight, and being efficient, while also being open and standardized. Among these requirements, being lightweight and efficient is the most important. This draft therefore provides a lightweight and efficient data collection method that is particularly optimized for maintaining DTN systems. Going forward, more methods (transformation and aggregation functions) and tools (solutions) will be studied to extend the contents of this draft.
6. Security Considerations
TBD.
7. IANA Considerations
This document has no requests to IANA.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022, <https://www.rfc-editor.org/info/rfc9232>.
8.2. Informative References
[I-D.irtf-nmrg-network-digital-twin-arch] Zhou, C., Yang, H., Duan, X., Lopez, D., Pastor, A., Wu, Q., Boucadair, M., and C. Jacquenet, "Digital Twin Network: Concepts and Reference Architecture", Work in Progress, Internet-Draft, draft-irtf-nmrg-network-digital-twin-arch-00, 21 March 2022, <https://www.ietf.org/archive/id/draft-irtf-nmrg-network-digital-twin-arch-00.txt>.
Authors’ Addresses
Cheng Zhou
China Mobile
Beijing
100053
China
Email: zhouchengyjy@chinamobile.com

Danyang Chen
China Mobile
Beijing
100053
China