TECDEV-2765 - Cisco Live


Transcript of TECDEV-2765 - Cisco Live

Frank Marsman, Systems Engineer, [email protected]
Mike Mikhail, Architect, [email protected]
Einar Nilsen-Nygaard, Principal Engineer, [email protected]

TECDEV-2765

YANG > Telemetry > Visualization > ML

From YANG to Machine Learning in 4 Hours!

Questions? Use Cisco Webex Teams to chat with the speaker after the session

How:

1. Find this session in the Cisco Events Mobile App
2. Click “Join the Discussion”
3. Install Webex Teams or go directly to the team space
4. Enter messages/questions in the team space

© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco Webex Teams


Agenda

➢ YANG & the data
➢ Telemetry: transport, senders, receiver, collector
➢ TSDB: the datastore
➢ Visualization, dashboards
➢ ML models: the monitor workers
➢ Production line: YANG > Telemetry > TSDB > ML

Machine Learning: subtopics

➢ Definition & Components
➢ Neural Networks
➢ ML Capabilities
➢ Engineering Process
➢ ML @CISCO
➢ Models, Training, Validation
➢ Prediction & Accuracy
➢ Data Engineering
➢ ML Quality Assurance
➢ Case 1: Anomaly detection
➢ Case 2: Anomaly prediction
➢ Case 3: AI game

The Team

• Frank Marsman, Systems Engineer, [email protected]

• Interests: Programmability, Machine Learning

• Mike Mikhail, Delivery Architect, [email protected]

• Interests: Automation, Machine Learning, NFV, SP technologies

• Einar Nilsen-Nygaard, Principal Engineer, [email protected]

• Interests: Programmability, Python, Access Policy, Telemetry


Initiate ML model, initial training…

YANG

Terminology

Components of a working system

Figure labels: YANG models; management applications; YANG modules; NETCONF client; NETCONF server; data stores (config & oper data); NETCONF session.

YANG: Data Models


YANG is a data modeling language

• Readable by humans and machines

• Hierarchical, modular, and extensible

• https://datatracker.ietf.org/wg/netmod/documents/

“Yet Another Next Generation” — really!

Figure: a YANG model (UTF-8 text) describing configuration data, operational data, actions (RPCs), and notifications.

Two flavors of YANG data models

Open Models
• Industry definition: IETF, ITU, OpenConfig, etc.
• Common functionality shared across vendors
• Example: ietf-diffserv-policy.yang (IETF Diffserv data model)

Native Models
• Vendor definition
• Unique to a vendor operating system or platform
• Example: Cisco-IOS-XR-ipv4-bgp-cfg.yang (IOS-XR BGP config data model)

Today, Open Models are a functional subset of Native Models.

YANG data models are hierarchical (trees)

• Example: Cisco XR OSPFv3 module

$ pyang -f tree Cisco-IOS-XR-ipv6-ospfv3-oper@….yang
module: Cisco-IOS-XR-ipv6-ospfv3-oper
  +--ro ospfv3
     +--ro processes
        +--ro process* [process-name]
           +--ro vrfs
           |  +--ro vrf* [vrf-name]
           |     +--ro vrf-name                xr:Cisco-ios-xr-string
           |     +--ro summary-prefixes
           |     |  +--ro summary-prefix*
           |     |     +--ro prefix?               inet:ipv6-address-no-zone
           |     |     +--ro prefix-length?        xr:Ipv6-prefix-length
           |     |     +--ro prefix-metric?        uint32
           |     |     +--ro prefix-metric-type?   Ospfv3-default-metric
           |     |     +--ro tag?                  uint32
           ...

Callouts: downloaded from server (router); type defined in another module; module name; container; list entry (note “*”); leaf; read-only (operational) data.
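The callouts above follow pyang's tree notation: `rw`/`ro` marks configuration vs. read-only operational data, `*` marks a list (or leaf-list) entry, `?` an optional leaf, `[...]` the list keys, and a prefixed type such as `inet:ipv6-address-no-zone` comes from another module. As a rough sketch (our own helper, not part of pyang), the regex below decodes a single tree line; the layout of a 2-space base indent plus 3 columns per level is an assumption about pyang's output.

```python
import re
from typing import NamedTuple, Optional


class TreeNode(NamedTuple):
    depth: int                 # nesting level derived from the indentation
    access: str                # "rw" = configuration, "ro" = read-only (operational)
    name: str
    is_list: bool              # "*" suffix: list or leaf-list entry
    optional: bool             # "?" suffix: optional leaf
    keys: Optional[str]        # "[key ...]" of a list entry
    yang_type: Optional[str]   # leaf type, e.g. "inet:ipv6-address-no-zone"


# Assumed layout: 2-space base indent plus 3 columns ("   " or "|  ") per level.
TREE_LINE = re.compile(
    r"^(?P<indent>[ |]*)\+--(?P<access>rw|ro)\s+"
    r"(?P<name>[\w.-]+)(?P<flag>[*?!]?)"
    r"(?:\s+\[(?P<keys>[^\]]+)\])?"
    r"(?:\s+(?P<type>\S+))?\s*$"
)


def parse_tree_line(line: str) -> Optional[TreeNode]:
    """Decode one 'pyang -f tree' data-node line; return None for other lines."""
    m = TREE_LINE.match(line)
    if m is None:
        return None
    return TreeNode(
        depth=len(m.group("indent")) // 3,
        access=m.group("access"),
        name=m.group("name"),
        is_list=m.group("flag") == "*",
        optional=m.group("flag") == "?",
        keys=m.group("keys"),
        yang_type=m.group("type"),
    )
```

For example, `parse_tree_line("        +--ro process* [process-name]")` reports a read-only list entry at depth 2, keyed by `process-name`, while the `module:` header line returns `None`.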

How and where to find YANG models

Where to find YANG models

• Retrieve from the YANG server, via NETCONF <get-schema>

• GitHub

• https://github.com/YangModels/yang/

• IETF, IEEE, Broadband Forum, and MEF draft and standard models

• Vendor models for Cisco, Ciena, Huawei, and Juniper

• https://github.com/openconfig/public/tree/master/release/models

• OpenConfig models


OpenConfig YANG models

• http://www.openconfig.net/

• https://github.com/openconfig/public

Vendor neutral


NETCONF YANG Module Capabilities

Server hello, as received by the client:

<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
    <capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
    ...
    <capability>urn:ietf:params:xml:ns:yang:ietf-interfaces?module=ietf-interfaces&amp;revision=2014-05-08&amp;features=pre-provisioning,if-mib,arbitrary-names</capability>
  </capabilities>
  <session-id>4</session-id>
</hello>
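Because every supported module is advertised as a capability URI, a client can inventory a server's YANG modules straight from the `<hello>`. A minimal sketch using only the Python standard library (the function name is illustrative); note that the XML parser turns `&amp;` back into `&` before the query string is split.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def yang_modules_from_hello(hello_xml: str) -> dict:
    """Map module name -> namespace, revision and features advertised in a <hello>."""
    root = ET.fromstring(hello_xml)
    modules = {}
    for cap in root.iter(f"{{{NC_NS}}}capability"):
        uri = (cap.text or "").strip()
        namespace, _, query = uri.partition("?")
        params = parse_qs(query)   # "&amp;" was already decoded to "&" by the XML parser
        if "module" not in params:
            continue               # protocol capabilities such as ...:base:1.1
        features = params.get("features", [""])[0]
        modules[params["module"][0]] = {
            "namespace": namespace,
            "revision": params.get("revision", [None])[0],
            "features": [f for f in features.split(",") if f],
        }
    return modules
```

Applied to the hello above, it would report `ietf-interfaces` at revision 2014-05-08 with the features pre-provisioning, if-mib and arbitrary-names.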


Getting YANG Modules

Client request and server reply:

<rpc message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <get-schema xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
    <identifier>ietf-interfaces</identifier>
  </get-schema>
</rpc>

<rpc-reply message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
    module ietf-interfaces {
      // ietf-interfaces YANG module contents here ...
    }
  </data>
</rpc-reply>
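The same exchange can be produced programmatically. Below is a minimal standard-library sketch (function names are hypothetical) that serializes a `<get-schema>` request as defined in RFC 6022 and pulls the module text out of the reply. ElementTree writes namespace prefixes such as `ns0:` instead of default namespaces, which is equivalent XML; in practice a NETCONF library such as ncclient would also handle the SSH transport and message framing.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
MON_NS = "urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring"


def build_get_schema(identifier: str, message_id: int, version: str = "") -> str:
    """Serialize a <get-schema> RPC; the optional <version> selects a revision."""
    rpc = ET.Element(f"{{{NC_NS}}}rpc", {"message-id": str(message_id)})
    req = ET.SubElement(rpc, f"{{{MON_NS}}}get-schema")
    ET.SubElement(req, f"{{{MON_NS}}}identifier").text = identifier
    if version:
        ET.SubElement(req, f"{{{MON_NS}}}version").text = version
    return ET.tostring(rpc, encoding="unicode")


def schema_from_reply(reply_xml: str) -> str:
    """Return the YANG module text carried in the <data> element of the reply."""
    data = ET.fromstring(reply_xml).find(f"{{{MON_NS}}}data")
    return (data.text or "").strip() if data is not None else ""


print(build_get_schema("ietf-interfaces", 102))
```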


Cisco YANG models on github.com

• For your convenience!
• Grouped into subdirectories, per-OS, per-release
• Includes per-platform capabilities data
• Includes copies of all open models supported in each release

YANG: Tools, pyang & YANG Catalog

pyang: Extensible YANG validator and converter

• https://github.com/mbj4668/pyang/

• Open source (ISC license)

• Python based

• Usable as standalone tool or as part of a Python workflow


Validate & Display YANG Modules With pyang

https://github.com/mbj4668/pyang

$ pyang -f tree --tree-depth 5 Cisco-IOS-XR-l2-eth-infra-oper@2015-11-09.yang
Cisco-IOS-XR-l2-eth-infra-oper-sub1@2015-11-09.yang:11: warning: imported module Cisco-IOS-XR-types not used
…:11: warning: imported module Cisco-IOS-XR-types not used
module: Cisco-IOS-XR-l2-eth-infra-oper
  +--ro mac-accounting
  |  +--ro interfaces
  |     +--ro interface* [interface-name]
  |        +--ro interface-name              xr:Interface-name
  |        +--ro state
  |        |  +--ro is-ingress-enabled?         boolean
  |        |  +--ro is-egress-enabled?          boolean
  |        |  +--ro number-available-ingress?   uint32
  |        |  +--ro number-available-egress?    uint32
  |        |  +--ro number-available-on-node?   uint32
  |        +--ro ingress-statistic*
  |        |  +--ro mac-address?   yang:mac-address
  |        |  +--ro packets?       uint64
  |        |  +--ro bytes?         uint64
  |        +--ro egress-statistic*
  |           +--ro mac-address?   yang:mac-address
  |           +--ro packets?       uint64
  |           +--ro bytes?         uint64
  +--ro vlan
     +--ro nodes
        +--ro node* [node-id]

GET: Cisco-IOS-XR-l2-eth-infra-oper:vlan
GET: Cisco-IOS-XR-l2-eth-infra-oper:mac-accounting

pyang also has a --tree-path option

Or try the jstree output format instead?


https://yangcatalog.org

A repository of YANG tools and the metadata around YANG models, with the purpose of driving collaboration between authors and adoption with consumers.

YANG Catalog

Web-Based Searching of YANG Models

Yang Search:
• View model relationships
• Search for nodes
• Display model trees
• REST queries

http://yangcatalog.org/yang-search/

pyang, YANG Catalog: Let’s see it…

Telemetry


“Pushing” More Data Really Does Work Better

Figure: Telemetry vs. SNMP on an NCS5516 (576x100GE). Panels: CPU load (%) vs. number of destinations (1 to 3), values shown 7%, 7%, 8%, 7%, 14%, 20%; counters collected (thousands) over 5 s to 20 s; time to collect all data (seconds) for memory and interface counters. Callouts: more counter data, reduction in CPU load, faster collection.

How Do You See Telemetry?

1. Which Telemetry?

2. What to Observe?

3. How to Observe?

4. Time to Explore


Telemetry: Protocols


Figure: Programmable Interface “Stack”. Protocol layer: NETCONF, RESTCONF, gRPC (alongside SNMP); data models: YANG data model, open and native, for configuration and operational data; device features: Interface, BGP, QoS, ACL, …; bottom layer: physical and virtual network infrastructure.

Figure: the same Programmable Interface “Stack”, now with collectors & applications visibility added.

Data Model-driven Management


Model-Driven Telemetry


Telemetry

YANG Model Data Push – Dial-Out


Telemetry

YANG Model Data Push – Dial-In


IOS-XR Support Matrix

Feature | Classic XR ASR9k | Evolved XR ASR9k | NCS5500 | NCS6k
MDT support | 6.1.1 | 6.1.1 | 6.1.1 | 6.1.3
Data models | YANG (native, OC) | YANG (native, OC) | YANG (native, OC) | YANG (native, OC)
Transport (control protocols) | TCP, UDP (6.2.1) | gRPC (dial-in, dial-out), TCP, UDP (6.2.1) | gRPC (dial-in, dial-out), TCP, UDP (6.2.1) | TCP, UDP (6.2.1); gRPC (mgmt port only, dial-in, dial-out, 6.5.1)
Encoding | GPB / GPB-KV / JSON (6.3.1) | GPB / GPB-KV / JSON (6.3.1) | GPB / GPB-KV / JSON (6.3.1) | GPB / GPB-KV / JSON (6.3.1)
gNMI | | 6.5.1 | 6.5.1 | 6.5.1

NX-OS Support Matrix


IOS-XE Support Matrix

Columns, left to right:
• Switching: CAT 3650 / 3850, CAT 9200L, CAT 9200, CAT 9300L, CAT 9300 / 9400 / 9500, CAT 9500H, CAT 9600
• Wireless: CAT 9800-CL, CAT 9800-40/80
• Routing: ISR 1000, ISR 4000, CSR 1000v, ASR 1000 Fixed, ASR 1000 Modular

Model Driven Configuration Management
• NETCONF: 16.5+ 16.9+ 16.9+ 16.9+ 16.6+ 16.8+ 16.11+ 16.10+ 16.10+ 16.8+ 16.3+ 16.3+ 16.3+ 16.3+
• RESTCONF: 16.7+ 16.9+ 16.9+ 16.9+ 16.7+ 16.8+ 16.11+ 16.11+ 16.11+ 16.8+ 16.6+ 16.6+ 16.6+ 16.6+
• gNMI: 16.12+ 16.12+ 16.12+ 16.8+ 16.10+ 16.11+

Model Driven Telemetry
• NETCONF Dial-In: 16.6+ 16.9+ 16.9+ 16.9+ 16.6+ 16.8+ 16.11+ 16.8+ 16.7+ 16.10+ 16.7+ 16.8+
• gRPC Dial-Out: 16.10+ 16.10+ 16.10+ 16.10+ 16.10+ 16.11+ 16.10+ 16.10+ 16.10+ 16.10+ 16.10+

IOS-XE Support Matrix

Columns, left to right:
• IoT: IR1100, ESR6300, IE3x00, ESS 3300
• SPAG: ASR 900 / 920, NCS 520, NCS 4200
• Cable: cBR-8

Model Driven Configuration Management
• NETCONF: 16.10+ 16.11+ 16.11+ 16.8+ 16.10+ 16.8+ 16.8+
• RESTCONF: 16.10+ 16.11+ 16.11+ 16.8+ 16.10+ 16.8+ 16.8+

Model Driven Telemetry
• NETCONF Dial-In: 16.10+ 16.9+ 16.10+ 16.9+ 16.9+
• gRPC Dial-Out: 16.10+ 16.10+ 16.10+ 16.10+ 16.10+

Telemetry: Server Configuration


Telemetry: Node Configuration

RP/0/RP0/CPU0:PE125#show running-config telemetry model-driven
Fri Apr 21 21:10:32.469 EDT
telemetry model-driven
 destination-group COLL1
  address-family ipv4 192.168.30.101 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
  address-family ipv4 192.168.30.102 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
 !
 destination-group COLL-ROUTING
  address-family ipv4 192.168.30.101 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
.

IOS-XR example


Telemetry: Node Configuration - Continued

.
 !
 sensor-group YD1
  sensor-path Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces
  sensor-path Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters
 !
 sensor-group YD-ROUTING
  sensor-path Cisco-IOS-XR-fib-common-oper:fib
  sensor-path Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes
 !
 subscription SUB1
  sensor-group-id YD1 sample-interval 30000
  destination-id COLL1
 !
 subscription SUB-ROUTING
  sensor-group-id YD-ROUTING sample-interval 30000
  destination-id COLL-ROUTING
 !
!

IOS-XR example
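Since destination groups, sensor groups, and subscriptions repeat the same pattern, this configuration lends itself to templating. The hypothetical helper below renders a block in the shape shown above from plain Python data. It assumes every destination uses self-describing-gpb over TCP, as in this example; IOS XR spells the keyword `address-family`.

```python
def render_xr_mdt(destinations, sensor_groups, subscriptions):
    """Render an IOS-XR 'telemetry model-driven' block (hypothetical helper).

    destinations:  {group: [(ipv4, port), ...]}
    sensor_groups: {group: [sensor_path, ...]}
    subscriptions: {name: (sensor_group, sample_interval_ms, destination_group)}
    """
    lines = ["telemetry model-driven"]
    for group, targets in destinations.items():
        lines.append(f" destination-group {group}")
        for ip, port in targets:
            lines += [f"  address-family ipv4 {ip} port {port}",
                      "   encoding self-describing-gpb",   # assumed for every target
                      "   protocol tcp",
                      "  !"]
        lines.append(" !")
    for group, paths in sensor_groups.items():
        lines.append(f" sensor-group {group}")
        lines += [f"  sensor-path {path}" for path in paths]
        lines.append(" !")
    for name, (group, interval_ms, dest) in subscriptions.items():
        lines += [f" subscription {name}",
                  f"  sensor-group-id {group} sample-interval {interval_ms}",
                  f"  destination-id {dest}",
                  " !"]
    lines.append("!")
    return "\n".join(lines)


print(render_xr_mdt(
    {"COLL1": [("192.168.30.101", 2103), ("192.168.30.102", 2103)]},
    {"YD1": ["Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces"]},
    {"SUB1": ("YD1", 30000, "COLL1")},
))
```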


Telemetry: Node Operation

RP/0/RP0/CPU0:PE125#show telemetry model-driven subscription SUB-ROUTING
Fri Apr 21 21:31:27.002 EDT
Subscription:  SUB-ROUTING
-------------
  State:       ACTIVE
  Sensor groups:
  Id: YD-ROUTING
    Sample Interval:      30000 ms
    Sensor Path:          Cisco-IOS-XR-fib-common-oper:fib
    Sensor Path State:    Resolved
    Sensor Path:          Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes
    Sensor Path State:    Resolved

  Destination Groups:
  Group Id: COLL-ROUTING
    Destination IP:       192.168.30.101
    Destination Port:     2103
    Encoding:             self-describing-gpb
    Transport:            tcp
    State:                Active
    No TLS
    Total bytes sent:     2256783989
    Total packets sent:   32568
    Last Sent time:       2017-04-21 21:31:20.1422343376 -0400
.

IOS-XR example


Telemetry: Node Operation - Continued

.
Collection Groups:
------------------
  Id: 7
    Sample Interval:      30000 ms
    Encoding:             self-describing-gpb
    Num of collection:    607
    Collection time:      Min:    48 ms Max:   559 ms
    Total time:           Min:   280 ms Avg: 18724 ms Max: 21364 ms
    Total Deferred:       0
    Total Send Errors:    0
    Total Send Drops:     0
    Total Other Errors:   4248
    Last Collection Start:2017-04-21 21:30:55.1397666376 -0400
    Last Collection End:  2017-04-21 21:31:15.1417238376 -0400
    Sensor Path:          Cisco-IOS-XR-fib-common-oper:fib

  Id: 8
    Sample Interval:      30000 ms
    Encoding:             self-describing-gpb
    Num of collection:    450
    Collection time:      Min:    44 ms Max:   180 ms
    Total time:           Min:   103 ms Avg:   154 ms Max:   590 ms
    Total Deferred:       0
    Total Send Errors:    0
.

IOS-XR example


Telemetry – Node configuration

IOS XE example

csr1000v#show run | section telemetry
telemetry ietf subscription 22
 encoding encode-kvgpb
 filter xpath /memory-ios-xe-oper:memory-statistics/memory-statistic/free-memory
 stream yang-push
 update-policy periodic 30000
 receiver ip address 192.168.222.43 57500 protocol grpc-tcp
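The sample period is not expressed in the same unit on every platform. Assuming our reading of Cisco's MDT documentation is right (IOS XR and NX-OS `sample-interval` in milliseconds, IOS XE `update-policy periodic` in centiseconds), the IOS XR subscriptions shown earlier push every 30 s, while `update-policy periodic 30000` on IOS XE would mean 300 s. A small conversion sketch with illustrative names:

```python
# Assumed units, as we read Cisco's MDT documentation:
# IOS XR and NX-OS "sample-interval" are in milliseconds,
# IOS XE "update-policy periodic" is in centiseconds (1/100 s).
MS_PER_UNIT = {"ios-xr": 1, "nx-os": 1, "ios-xe": 10}


def interval_seconds(value: int, platform: str) -> float:
    """Convert a configured sample period to seconds."""
    return value * MS_PER_UNIT[platform] / 1000


def convert_interval(value: int, src: str, dst: str) -> int:
    """Re-express a sample period configured on one OS in another OS's units."""
    ms = value * MS_PER_UNIT[src]
    if ms % MS_PER_UNIT[dst]:
        raise ValueError(f"{value} on {src} has no exact {dst} equivalent")
    return ms // MS_PER_UNIT[dst]


print(interval_seconds(30000, "ios-xr"), interval_seconds(30000, "ios-xe"))
```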


Telemetry: Node Operation

IOS XE example

.
csr1000v#show telemetry ietf subscription 22 detail
Telemetry subscription detail:

  Subscription ID: 22
  Type: Configured
  State: Valid
  Stream: yang-push
  Filter:
    Filter type: xpath
    XPath: /memory-ios-xe-oper:memory-statistics/memory-statistic/free-memory
  Update policy:
    Update Trigger: periodic
    Period: 100
  Encoding: encode-kvgpb
  Source VRF:
  Source Address:
  Notes:

  Receivers:
    Address                                    Port     Protocol         Protocol Profile
    ------------------------------------------------------------------
    192.168.222.43                             57500    grpc-tcp


Telemetry – Node Configuration

telemetry
  destination-group 1
    ip address 172.31.219.148 port 50001 protocol gRPC encoding GPB
  sensor-group 1
    path sys/bgp/ depth unbounded
    path sys/intf depth unbounded
  sensor-group 2
    data-source NX-API
    path "show environment power" depth 0
    path "show processes cpu sort" depth 0
  subscription 1
    dst-grp 1
    snsr-grp 1 sample-interval 30000
  subscription 2
    dst-grp 1
    snsr-grp 2 sample-interval 60000

NX-OS Example


Telemetry: Receiver/Collector


Remember the Hardware!

Designed for Model Training

• Eight NVIDIA Tesla V100s

• NVIDIA NVLink Interconnect

• Up to 24 Disks; RAID Controller

• Up to 6 NVMe Drives

• Network: Up to 100 Gbps

• High Availability

UCS C480 ML: Storage, processing, memory

GPUs: 8 x V100 32GB, NVLink interconnect

C220/C240s for Dev & Testing

Pre-packaged: ELK Stack

• Elasticsearch + Logstash + Kibana

• ELK stack + Cisco proto @ https://github.com/cisco/bigmuddy-network-telemetry-stacks

• 3 programs:

• Logstash: Data/log receiver << data goes there

• Elasticsearch: Data store and search engine << the engine indexing and sorting

• Kibana: Data visualization << rendering in several formats

• Easy steps:

• Clone the code

• A script installs the stack in 3 Docker “containers”

• Run it; Logstash listens on TCP 2103 (default)

• Customize Kibana “visualizations” and save them; they are accessible via HTTP links

• Optionally organize dashboards, save


ELK Stack Components

cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ sudo docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED        STATUS        PORTS                    NAMES
94d64b0f0523   logstash:2.3.1        "/bin/sh -c '/star..."   27 hours ago   Up 27 hours                            stack_elk_logstash
daad55f9db47   kibana:4.5.0          "/bin/sh -c '/star..."   27 hours ago   Up 27 hours   0.0.0.0:5601->5601/tcp   stack_elk_kibana
8f4c4ec193fc   elasticsearch:2.3.1   "/docker-entrypoin..."   27 hours ago   Up 27 hours                            stack_elk_elasticsearch

Elasticsearch/Logstash/Kibana containers, and receiving port

cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ netstat -n | grep 2103
tcp6       0      0 192.168.30.101:2103     10.100.25.25:18566      ESTABLISHED
tcp6       0      0 192.168.30.101:2103     10.100.25.25:22514      ESTABLISHED
tcp6       0      0 192.168.30.101:2103     10.100.24.24:46247      ESTABLISHED

cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ netstat -ln | grep 2103
tcp6       0      0 :::2103                 :::*                    LISTEN
udp6       0      0 :::2103                 :::*


Prepackaged: Telemetry Collection Stack

• Receiver >> visualization/dashboards, notifications
• https://xrdocs.io/telemetry/tutorials/2018-06-04-ios-xr-telemetry-collection-stack-intro/
• https://github.com/vosipchu/XR_TCS
• Used in use case 1 (ML Anomaly Detection)


Receiver connected
cisco@mamikhai-ubuntu:/opt/ML$ netstat -n | grep 5432
tcp6       0      0 192.168.30.101:5432     192.168.30.137:41598    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.124:29437    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.135:55154    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.137:34688    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.125:22002    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.135:33199    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.124:25891    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.112:21773    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.178:20541    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.135:51278    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.112:30979    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.124:37490    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.137:54524    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.125:44535    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.124:32766    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.178:54599    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.137:31732    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.125:31449    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.112:51587    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.112:40991    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.135:16006    ESTABLISHED
tcp6       0      0 192.168.30.101:5432     192.168.30.178:16672    ESTABLISHED
.
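With many routers streaming to one receiver, a quick way to see who is connected is to group the ESTABLISHED sessions by remote address. A small sketch that parses `netstat -n` text; the function name is ours, and the column layout it assumes is the Linux net-tools one shown above.

```python
from collections import Counter


def sessions_per_peer(netstat_output: str, local_port: int) -> Counter:
    """Count ESTABLISHED TCP sessions per remote IP on a given local port."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat -n columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if (len(fields) >= 6 and fields[5] == "ESTABLISHED"
                and fields[3].endswith(f":{local_port}")):
            peers[fields[4].rsplit(":", 1)[0]] += 1
    return peers
```

On the 22 sessions listed above it reports six source addresses: 192.168.30.112, .124, .135 and .137 with four sessions each, and .125 and .178 with three each.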


Pipeline dump – receiving data
cisco@mamikhai-ubuntu:/opt/ML$ pipeline troubleshooting start
cisco@mamikhai-ubuntu:/opt/ML$ more ~/analytics/pipeline/bin/dump.txt

------- 2019-12-08 14:03:16.973093852 -0500 EST -------
Summary: GPB(common) Message [192.168.30.125:21743(PE125)/Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global msg len: 1549]
{
    "Source": "192.168.30.125:21743",
    "Telemetry": {
        "node_id_str": "PE125",
        "subscription_id_str": "routing",
        "encoding_path": "Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global",
        "collection_id": 3007999,
        "collection_start_time": 1575831781969,
        "msg_timestamp": 1575831781969,
        "collection_end_time": 1575831781972
    },
    "Rows": [
        {
            "Timestamp": 1575831781971,
            "Keys": {
                "instance-name": "ISIS"
.

Pipeline dump – receiving data (continued)
            },
            "Content": {
                "per-area-data": {
                    "level": "isis-level2",
                    "per-topology-data": {
                        "id": {
                            "af-name": "ipv4",
                            "saf-name": "unicast",
                            "topology-name": "",
                            "vrf-name": ""
                        },
                        "statistics": {
                            "ispf-run-count": 0,
                            "nhc-run-count": 7,
                            "periodic-run-count": 9127,
                            "prc-run-count": 5,
                            "spf-run-count": 9234
                        }
                    },
                    "statistics": {
                        "system-lsp-build-count": 12,
                        "system-lsp-refresh-count": 10436
                    }
                },
                "statistics": {
                    "avg-csnp-process-time": {
                        "nano-seconds": 69306,
                        "seconds": 0
                    },
                    "avg-csnp-recv-rate": 2,
                    "avg-csnp-send-rate": 2,
                    "avg-csnp-transmit-time": {
                        "nano-seconds": 43261,
                        "seconds": 0
                    },
                    "avg-hello-process-time": {
                        "nano-seconds": 17370,
                        "seconds": 0
                    },
                    "avg-hello-recv-rate": 0,
                    "avg-hello-send-rate": 0,
                    "avg-hello-transmit-time": {
                        "nano-seconds": 65944,
                        "seconds": 0
.
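Before this data can be queried, the collector has to turn each message into time-series points. Judging by the measurement names and the `Producer` and `interface-name` tags queried later, the encoding path becomes the measurement, the node and the row `Keys` become tags, and the leaves under `Content` become fields. The sketch below shows one way to do that mapping. It is an illustration, not Pipeline's actual algorithm; joining nested keys with `/` is our assumption, and YANG lists inside `Content` are left as-is.

```python
def flatten(node: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a/b/c": leaf} (the "/" joining is an assumption)."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value             # scalars, and lists kept as-is
    return flat


def message_to_points(msg: dict) -> list:
    """Turn one decoded telemetry message into TSDB-style points (a sketch)."""
    telemetry = msg["Telemetry"]
    points = []
    for row in msg.get("Rows", []):
        points.append({
            "measurement": telemetry["encoding_path"],
            "time_ms": row["Timestamp"],
            "tags": {"Producer": telemetry["node_id_str"], **row.get("Keys", {})},
            "fields": flatten(row.get("Content", {})),
        })
    return points
```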

Telemetry: How to…

Time Series Database

TSDB: Structure


Characteristics of TSDB

• Records are time-stamped or time-series

• Suitable for volatile value records: Financials, weather, sensory data

• Can be aggregated, compressed, sampled over time

• Data lifecycle management

• Efficient and flexible time-based retrieval

• Recently gaining popularity faster than other db categories

Time-referenced records, compression, retention, retrieval
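Aggregation and roll-ups over time are routine TSDB operations. Here is a plain-Python sketch of a fixed-window roll-up, similar in spirit to InfluxQL's `GROUP BY time(...)`; the helper and its parameters are illustrative.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_s, agg=mean):
    """Roll raw (unix_seconds, value) samples up into fixed time windows.

    Returns [(window_start, aggregate)] sorted by time; `agg` can be any
    function over a list of values (mean, max, min, sum, ...)."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)   # align to the window start
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


samples = [(0, 1.0), (10, 3.0), (30, 5.0), (59, 7.0), (60, 9.0)]
print(downsample(samples, 60))
```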


InfluxDB is optimized for collecting, storing, retrieving & processing time series data

Characteristics of time series data
• All time-stamped data (metrics, logs, traces)
• Huge volumes of data
• Push and pull collection methods
• Real-time processing (aggregations, alerts, analytics)
• Time-sensitive life-cycle (roll-ups, long-term storage, eviction)
• High variety of semi-structured data

Time series databases are taking center stage in modern IT; InfluxDB leads the segment.

~125% increase in DB-Engines score over 24 months

InfluxDB, Open Source TSDB
• https://github.com/influxdata/influxdb
• Part of a packaged telemetry stack: https://github.com/vosipchu/XR_TCS

cisco@mamikhai-ubuntu:~$ influx -execute "show diagnostics"
name: build
Branch Build Time Commit                                   Version
------ ---------- ------                                   -------
1.5               cdae4ccde4c67c3390d8ae8a1a06bd3b4cdce5c5 1.5.1
.
name: system
PID  currentTime                    started                        uptime
---  -----------                    -------                        ------
4407 2019-12-15T00:39:02.112022443Z 2019-09-04T20:03:01.975987784Z 2428h36m0.136034659s
cisco@mamikhai-ubuntu:~$
cisco@mamikhai-ubuntu:~$ influx -execute "show measurements" -database="mdt_db"
name: measurements
name
----
Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/levels/level/adjacencies/adjacency
Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global
Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters
Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/protocol/isis/as/information
Cisco-IOS-XR-ip-rsvp-oper:rsvp/counters/interface-messages/interface-message
.
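The measurements above are named after the YANG encoding paths. For reference, points reach InfluxDB 1.x in its text line protocol (`measurement,tag=value field=value timestamp`). Below is a sketch of a formatter for one point; the helper names are ours, and the escaping covers the common cases (commas, spaces, equals signs, quotes) rather than every corner of the specification.

```python
def _escape(text: str, specials: str) -> str:
    """Backslash-escape each special character (line protocol rules)."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value) -> str:
    if isinstance(value, bool):           # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"                 # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(measurement, tags, fields, ts_ns=None) -> str:
    """Format one point as InfluxDB 1.x line protocol (timestamp in nanoseconds)."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):               # sorted tag keys are recommended for writes
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if ts_ns is None else f"{line} {ts_ns}"
```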


Database performance


TSDB: Queries


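Python query, returns a ResultSet (influxdb.DataFrameClient returns pandas DataFrames instead)

#!/usr/bin/env python
from influxdb import InfluxDBClient

# Connect to the local InfluxDB 1.x instance that stores the MDT data
client = InfluxDBClient(host='localhost', port=8086, database='mdt_db')

# Bits per second on PE124's tunnel interfaces over the last hour:
# non_negative_derivative() turns the byte counter into a per-second rate, * 8 converts bytes to bits
data = client.query('SELECT non_negative_derivative("bytes-sent", 1s) * 8 FROM "Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters" WHERE ("Producer" =~ /^PE124$/) AND ("interface-name" =~ /^tunnel/) AND time >= now() - 1h GROUP BY "interface-name" LIMIT 120')

print(data)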

cisco@mamikhai-ubuntu:/opt/ML$ sudo ./read.py
ResultSet({'(u'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters', {u'interface-name': u'tunnel-te12400'})': [{u'non_negative_derivative': 130.66666666666666, u'time': u'2019-12-08T12:31:19.948Z'}, {u'non_negative_derivative': 392, u'time': u'2019-12-08T12:31:49.948Z'}, {u'non_negative_derivative': 213.5644059323446, u'time': u'2019-12-08T12:32:19.953Z'}, {u'non_negative_derivative': 261.10148019735965, u'time': u'2019-12-08T12:32:49.949Z'}, {u'non_negative_derivative': 2577.0666666666666, u'time': u'2019-12-08T12:33:19.949Z'}, {u'non_negative_derivative': 0, u'time': u'2019-12-08T12:33:49.948Z'}, {u'non_negative_derivative': 0, u'time': u'2019-12-08T12:34:19.951Z .
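The query converts the ever-increasing `bytes-sent` counter into throughput: `non_negative_derivative(..., 1s)` gives bytes per second between consecutive samples, and `* 8` turns that into bits per second. Here is a plain-Python approximation of that arithmetic; it is not InfluxDB's code, and the sample counters are made up.

```python
from datetime import datetime, timedelta


def non_negative_derivative(samples, unit=timedelta(seconds=1), scale=1.0):
    """Approximate InfluxQL non_negative_derivative(field, unit) * scale.

    samples: [(datetime, counter_value)] sorted by time.
    Returns [(datetime, rate)] stamped with the later sample's time; negative
    rates (e.g. after a counter reset) are dropped."""
    rates = []
    for (t1, v1), (t2, v2) in zip(samples, samples[1:]):
        elapsed = (t2 - t1) / unit          # elapsed time expressed in `unit`s
        if elapsed <= 0:
            continue
        rate = (v2 - v1) / elapsed * scale
        if rate >= 0:
            rates.append((t2, rate))
    return rates


t0 = datetime(2019, 12, 8, 12, 30, 49)
counters = [(t0, 1000), (t0 + timedelta(seconds=30), 1490), (t0 + timedelta(seconds=60), 100)]
print(non_negative_derivative(counters, scale=8))   # bits per second; the reset is dropped
```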


Visualization

Visualization: Data > graphs


Visualize a query


Visualization: Dashboards


Dashboard: organized collection of visualizations


Dashboard: JSON
{
  "annotations": {
    "list": [
      {
        "$$hashKey": "object:83",
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 12,
  "iteration": 1575818163044,
  "links": [],
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 64,
      "panels": [],
      "repeat": null,
      "title": "Summary",
      "type": "row"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "rgba(50, 172, 45, 0.97)",
        "rgba(237, 129, 40, 0.89)",
        "rgba(245, 54, 54, 0.9)"
      ],
      "datasource": "InfluxDB",
      "format": "none",
.

Visualization: See it…

Machine Learning


How to work with Neural Networks

• What is Machine Learning?

• What is a Neural Network?

• Essentials

• Hyperparameters & weights

• How to Train

• Training parameters

• How training works

• Demo


AI Using ML vs Expert Systems

In an Expert System, the expert's acquired knowledge is digitized in full and used in decision making. The expert specifies every step she/he took to reach the decision, the basis for each step, and how to handle exceptions.

In a machine-learned solution, the expert is asked only for a decision on each training example. A "Supervised Learning" algorithm then uses all the available data to mimic the expert's end behaviour.

ML Basic Definition

Traditional Programming: Input + Developer Logic → Output

ML Training: Input + Output → Logic

ML Process & Use

Figure: ML Training and ML Production, each shown as Input, Model, Output.

Data Science Terms

• Data Science – using data to solve problems

• AI – Teaching computers to solve problems

• ML – Computers teaching themselves

• Supervised – known outcome

• Unsupervised – unknown outcome

• DL – Artificial neural network

Source: https://qph.fs.quoracdn.net/main-qimg-cf42db79eb79239884a29568fcc24002-c

Supervised Learning

Source: https://aldro61.github.io/microbiome-summer-school-2017/figures/figure.classification.vs.regression.png


Why leverage ML

• There is data available

• People make mistakes

• Is it an arduous/boring task?

• Does it require constant attention?

• Is it a difficult decision or prediction?

• Can human bias impact the outcome?

• What’s the motivation?

• Is there an opportunity for efficiency or cost savings?

• Can it grow the business?

What does Machine Learning technically do?

• Defined: An algorithmic approach to iteratively tuning the parameters of a Statistical Model to achieve the best estimated model – effectively training rather than explicitly programming.

• Needs three things:

• Input data – Observations and their associated features related to some phenomenon

• Outcomes – A feature we are trying to predict (supervised learning)

• Measure of success – A measurement of how well our model predicts the outcome using the input data that can be used as a feedback signal

• The main problem in ML/DL – how do we transform our data in a meaningful way to learn this relationship?
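To make the three ingredients concrete, here is a deliberately tiny sketch. It uses a linear model `y = w*x + b` (a statistical model, not a neural network) whose parameters are tuned iteratively by gradient descent, with the mean squared error as the measure of success. The data and learning rate are illustrative.

```python
def train_linear(xs, ys, lr=0.01, epochs=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error.

    xs: input data (observations), ys: known outcomes (supervised learning).
    Returns the tuned parameters and the loss from the last iteration."""
    w, b = 0.0, 0.0                                    # parameters, tuned iteratively
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n          # measure of success
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w                               # feedback: step against the gradient
        b -= lr * grad_b
    return w, b, loss


w, b, loss = train_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])   # data generated by y = 2x + 1
print(round(w, 3), round(b, 3))
```

Nobody wrote the rule `y = 2x + 1` into the program; the loop recovered it from examples, which is the "training rather than explicitly programming" idea in miniature.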


What is a neural network?

Computing systems inspired by the biological neural networks that constitute animal brains.

What is a neural network?

They all boil down to the same thing: a network requires an input (a list of numbers) and uses it to generate an output (another list of numbers).

What is a neural network?

For example, the input could be the pixel values of an image, and the output an indicator of the content of that image.

Parameters and Hyperparameters of a NN


The hyperparameters of a network define the structure and are fixed.

The parameters of the network are not fixed and are tuned during the training phase.

Parameters and Hyperparameters of a NN

The hyperparameters of a network define its overall structure:

• Number of input nodes
• Number of output nodes
• Number of hidden layers and number of neurons in each hidden layer
• Activation functions of layers
• …

The number of input/output nodes is often dictated by the dataset; the number of hidden layers, however, is up to the user. Too many hidden layers lead to overfitting; too few lead to poor performance!

Source: https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6803a989c76

Overfitting – An illustration

As a rough rule of thumb, overfitting is less likely if the size of your dataset is at least twice as large as the number of parameters (weights) in your network!
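The rule of thumb can be checked by counting a network's parameters from its layer sizes. A minimal sketch, assuming fully connected layers with one bias per neuron (biases are counted alongside the weights here); the function name `count_dense_params` and the dataset size of 100 are illustrative, and the 2-3-1 layout is the small example network used later in this section.

```python
def count_dense_params(layer_sizes):
    """Count weights + biases of a fully connected network.

    layer_sizes lists the node count per layer, input first, output last.
    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases (one per receiving neuron).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# 2 input nodes, one hidden layer of 3 nodes, 1 output node
params = count_dense_params([2, 3, 1])
print(params)  # 13  (2*3 + 3 + 3*1 + 1)

# Rough check of the rule of thumb for a hypothetical dataset of 100 samples
dataset_size = 100
print(dataset_size >= 2 * params)  # True
```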

Parameters and Hyperparameters of a NN

The parameters of a network define its behavior:

Each line between two nodes (neurons) represents a connection between them. Each connection has a weight, a floating-point number.

The state of a layer in the network depends only on the state of the previous layer, and is computed using the weights of the connections into that layer.

The weights of the network are tuned during training!

How a Neural Network works

Let’s see how a Neural Network computes the output from a given input.

[Diagram: input layer, hidden layer, output layer]

Hyperparameters:

• Two input nodes

• One hidden layer, containing three nodes

• One output node

• Activation function: Sigmoid (we will see this later)

The input values 0.2 and 0.5 are fed into the two input nodes.

Each hidden node computes a weighted sum of the inputs. For the first hidden node, the connection weights are 0.8 (from input 0.2) and -0.2 (from input 0.5):

0.8 * 0.2 + 0.5 * (-0.2) = 0.16 - 0.1 = 0.06

The other two hidden nodes, using their own weights, get 3.96 and 0.50.

Activation Functions

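This slide is a figure. As a minimal sketch, Sigmoid (used in the worked example of this section) and two other widely used activation functions, tanh and ReLU, can be written as follows; tanh and ReLU are included only as common examples, not as a transcription of the figure.

```python
import math


def sigmoid(x):
    # Output in (0, 1); used in the worked example in this section
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Output in (-1, 1); a zero-centred S-shaped curve
    return math.tanh(x)


def relu(x):
    # Rectified Linear Unit: 0 for negative inputs, identity otherwise
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```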

How a Neural Network works

The activation function, Sigmoid(x) = 1 / (1 + e^(-x)), is then applied to each hidden node's weighted sum:

Sigmoid(0.06) = 0.51499… ≈ 0.51

Sigmoid(3.96) = 0.98129… ≈ 0.98

Sigmoid(0.50) = 0.62245… ≈ 0.62
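The hidden-layer computation can be reproduced in a few lines of Python. This is a minimal sketch: only the two weights of the first hidden node (0.8 and -0.2) appear on the slides, so the other two weighted sums (3.96 and 0.50) are taken as given rather than recomputed.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [0.2, 0.5]
weights_h1 = [0.8, -0.2]  # weights into the first hidden node

# Weighted sum for the first hidden node: 0.8*0.2 + (-0.2)*0.5
z_h1 = sum(w * x for w, x in zip(weights_h1, inputs))

# Weighted sums of all three hidden nodes (the last two are taken from the slide)
hidden_sums = [z_h1, 3.96, 0.50]
hidden_activations = [sigmoid(z) for z in hidden_sums]

print(round(z_h1, 2))                             # 0.06
print([round(a, 2) for a in hidden_activations])  # [0.51, 0.98, 0.62]
```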

Source: https://qph.fs.quoracdn.net/main-qimg-8a19e73bffab9a7f6eab55fd5b47c00a

A variety of different ANN architectures exist to solve different types of problems…

For example:

• Convolutional NNs for computer vision

• Recurrent NNs for natural language processing

• Long Short-Term Memory (LSTM) NNs for time series

Source: https://cdn-images-1.medium.com/max/2000/1*cuTSPlTq0a_327iTPJyD-Q.png

Example of Neural Network classifying handwritten digits

Source: https://www.mdpi.com/applsci/applsci-09-03169/article_deploy/html/images/applsci-09-03169-g001.png

MNIST Dataset

• Contains 70,000 handwritten digits (60,000 training images and 10,000 test images)

• Each digit is 28 * 28 pixels

Neural Network to classify the digits

• 1 input layer of 28 * 28 = 784 neurons (one per pixel)

• 3 hidden layers, having a total of 10,000 neurons

• 1 output layer of 10 neurons (one per digit, 0 to 9)
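The 784 input neurons come from flattening each 28 * 28 image into a single vector, and the predicted digit is the index of the largest of the 10 outputs. A minimal NumPy sketch, using a blank image and made-up output scores purely for illustration:

```python
import numpy as np

# A hypothetical 28x28 grayscale image (pixel values 0..255); all zeros here
image = np.zeros((28, 28), dtype=np.uint8)

# Flatten to one input vector and scale pixels to [0, 1]
x = image.reshape(-1).astype(np.float32) / 255.0
print(x.shape)  # (784,)

# Hypothetical scores from the 10 output neurons (one per digit 0..9)
scores = np.array([0.01, 0.02, 0.05, 0.70, 0.01, 0.08, 0.03, 0.04, 0.03, 0.03])
predicted_digit = int(np.argmax(scores))
print(predicted_digit)  # 3
```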


Training parameters

Training parameters are considered hyperparameters of the system. That means they are set before training and do not change.

However, during training, the network parameters (weights) are continuously changed!

Training parameters define how the network weights are tuned during training. Some training parameters are:

• Learning rate

• Learning rate decay

• Batch size

• …
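How these three training parameters interact can be sketched in plain Python. The exponential decay schedule below is one common form of learning rate decay, chosen here as an assumption (the slides do not say which schedule the demo uses); `decay_rate`, `decay_steps` and the sample values are illustrative.

```python
def decayed_learning_rate(initial_lr, step, decay_rate=0.96, decay_steps=100):
    """Exponential decay: lr = initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def batches(samples, batch_size):
    """Split the training samples into consecutive mini-batches."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # 10 hypothetical training samples
for step, batch in enumerate(batches(samples, batch_size=4)):
    lr = decayed_learning_rate(0.1, step)
    # One weight update would happen here, using `batch` and `lr`
    print(step, batch, round(lr, 5))
```

The batch size decides how many samples contribute to each weight update, while the decay gradually shrinks the size of those updates as training progresses.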

How training works - Example

In our demo, we will train a Neural Network to classify the species of a flower (Iris) based on the sizes of its sepals and petals.

Four measurements, so four input nodes! Example input: (5.1, 3.5, 1.4, 0.2)

Three species, so three output nodes: Setosa, Versicolor, Virginica.

Each species is encoded as a target output vector:

Setosa = (1, 0, 0)

Versicolor = (0, 1, 0)

Virginica = (0, 0, 1)

So this example flower, a Setosa, becomes the input/output pair:

(5.1, 3.5, 1.4, 0.2) -> (1, 0, 0)

During training (input -> output):

(5.1, 3.5, 1.4, 0.2) -> (0.3, 0.2, 0.3)

*training step happens*

(5.1, 3.5, 1.4, 0.2) -> (0.4, 0.1, 0.2)

After the training step, the network's output is closer to the target (1, 0, 0).
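The one-hot encoding and the effect of the training step can be checked numerically. The sketch below measures how far each output is from the Setosa target using a sum of squared errors; that loss function is an assumption made for illustration (the slides do not name the loss used in the demo).

```python
SPECIES = ["Setosa", "Versicolor", "Virginica"]


def one_hot(species):
    """Encode a species name as a target vector, e.g. Setosa -> (1, 0, 0)."""
    return tuple(1 if name == species else 0 for name in SPECIES)


def squared_error(output, target):
    return sum((o - t) ** 2 for o, t in zip(output, target))


target = one_hot("Setosa")   # (1, 0, 0)
before = (0.3, 0.2, 0.3)     # network output before the training step
after = (0.4, 0.1, 0.2)      # network output after the training step

print(target)
print(round(squared_error(before, target), 2))  # 0.62
print(round(squared_error(after, target), 2))   # 0.41
```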

How training works – Test set and Training set

The dataset will be split into two parts:

• Training set: used to train the network

• Test set: used to evaluate the performance of the network
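A minimal sketch of the split in plain Python, assuming a shuffled 80/20 split (the proportion and the seed are illustrative choices; the slides do not state the ratio used in the demo):

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(150))  # e.g. 150 labelled flowers, represented here by indices
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 120 30
```

Shuffling before splitting avoids a test set that contains only one species when the data is sorted by label.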

ML Capabilities


What can a neural network do?


Adversarial attacks on Neural Networks


Want to practice?

• MNIST handwritten digits at http://yann.lecun.com/exdb/mnist/

• CAIDA network-related data sets at http://www.caida.org/data/overview/

• Wide variety of datasets at https://www.kaggle.com/datasets

• Variety of Pcap files at https://www.netresec.com/?page=PcapFiles

• NOAA weather data sets at https://www.ncdc.noaa.gov/cdo-web/datasets

• Wide variety of datasets from Deep Learning at http://deeplearning.net/datasets/

• BGP datasets at http://www.sfu.ca/~ljilja/cnl/projects/BGP_datasets/index.html

Publicly Available Datasets


Demo

Machine Learning in Cisco Products


Machine Learning/Deep Learning at Cisco Systems

• Cisco Encrypted Traffic Analysis: malware detection

• Cisco DNA Analytics: optimizes network functions across users

• Cisco Intersight: detect issues and connect with TAC

• Cisco Spark Assistant: meeting automation and optimization

• Cisco Crosswork: suite

Source: https://png.kisspng.com/20180401/jdw/kisspng-cisco-systems-router-data-center-computer-software-ai-5ac115ef7ef1f6.91676325152260350352.png

CX: Telemetry >> AI / ML


Encrypted Traffic Analysis

Identifying malicious encrypted traffic

Packet lengths, arrival times and durations tend to be inherently different for malware than for benign traffic.

[Figure: packets sent and received between client and server (src, dst) over time, for three flows: Google Search Page Download; Exfiltration and Keylogging; Initiate Command and Control]
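To illustrate the idea, the sketch below turns one flow into the kinds of features named above: packet lengths, inter-arrival times and duration. It is a hypothetical illustration of feature extraction only, not how Cisco Encrypted Traffic Analysis is implemented; the function name and the packet records are made up.

```python
from statistics import mean


def flow_features(packets):
    """packets: list of (timestamp_seconds, length_bytes), sorted by time."""
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "packet_count": len(packets),
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


# Hypothetical flow: timestamps in seconds, lengths in bytes
flow = [(0.00, 517), (0.05, 1460), (0.06, 1460), (0.30, 90)]
print(flow_features(flow))
```

Feature vectors like this, one per flow, are what a classifier would compare between benign and malicious traffic.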

Cisco Crosswork: Situation Manager


Supervised ML for root cause probability


Unsupervised ML for event clustering


Notify to similar “situations”


Machine Learning: Intelligent Control at Scale

• Supervised ML for root cause probability: learn from operators’ marking of causal alerts

• Unsupervised ML for grouping alerts into “Situations” [clustering], and noise reduction: recognize patterns in reported events

• Assign/adjust significance to events

ML Engineering Process


Process Phases for Applied ML/DL - CRISP-DM


Business Understanding

• The most important phase!

• The right answers to the wrong questions

• Framing business problems as data problems

• Decomposing large problems to smaller ones

• Defining baselines

• Success criteria/Expected Value

Source: https://upload.wikimedia.org/wikipedia/commons/b/b9/CRISP-DM_Process_Diagram.png

Data Understanding

• The second most important phase!

• Strengths/limitations of the “raw material”

• Data cost and benefits

• Collect, describe, explore, verify: EDA (exploratory data analysis)

• Data needs in Expected Value context


Data Preparation

• Selecting data/Data integration

• Feature extraction/engineering

• Cleaning data based on EDA

• Standardization/Normalization

• Missing values

• Logic checks
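Three of the preparation steps above (missing values, standardization and normalization) can be sketched with pandas. The column name and values below are hypothetical, and forward-filling is just one way to handle gaps in a time series.

```python
import pandas as pd

# Hypothetical measurements with one missing value
df = pd.DataFrame({"bytes": [100.0, None, 300.0, 500.0]})

# Missing values: carry the last valid observation forward
df["bytes"] = df["bytes"].ffill()

# Standardization (z-score): zero mean, unit standard deviation
df["bytes_std"] = (df["bytes"] - df["bytes"].mean()) / df["bytes"].std()

# Normalization (min-max): rescale into [0, 1]
span = df["bytes"].max() - df["bytes"].min()
df["bytes_norm"] = (df["bytes"] - df["bytes"].min()) / span

print(df)
```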


Modeling

• Supervised/unsupervised learning

• Model selection

• Classification or Regression?

• Prediction or Inference?

• Accuracy or Interpretability?

• Model assumptions

• Model comparison/assessment

• Parameter tuning


Evaluation

• Do the selected model(s) satisfy the original goal(s)?

• Staged deployment or test/control live deployment?

• Stakeholder sign-off

• Regulatory concerns

• Accuracy or Interpretability?

• Go or no-go?


Deployment

• Integration in information system/business process

• Model socialization

• Monitoring/maintenance

• Automation in production

• Documentation/reporting

• Debrief

• Rinse and repeat! Welcome to the life of a data scientist!


YANG > Telemetry > TSDB > ML

YANG >> ML: The Production Line

Production Line

[Diagram: YANG data; Telemetry streamer; Receiver / Collector; Bus / Pipeline; Time Series DB; ML models, AI; Visualization; Orchestration]

Case 1: “Eyes on business comms” Anomaly Detection

https://github.com/mikemikhail/ML-anomaly_detection

https://github.com/mikemikhail/ML-anomaly_detection-demo

Data Source

Business app data

[Diagram: data centers DC 124 and DC 125; stores 112, 135, 137 and 178; two timezones (timezone, timezone -1h); 18 unidirectional communication paths]

Visualized – 8 days

Visualized – 2 business hours


Visualized – 2 off hours


ML Advantage

Why?

Affordable and fast DATA via:

• Telemetry

• NETCONF

• NetFlow

• SNMP

• Syslog

• db

ML versus Designed Logic Systems

Where, and why ML is advantageous

• ML is typically cheaper and faster at achieving the desired behavior

➢ Eliminates the need to design the logic. Only the input/output data, model architecture, parameters and system are designed. Sometimes start by mimicking the characteristics of a manual/human process. Lots of experimentation & testing

➢ Cheaper and faster debug/update cycle. Mostly experimentation & testing

➢ No perfect design. Actually, perfection is undesirable

• Designed Logic is needed for:

➢ Absolutely clear and universal production (but is there, really?!)

➢ Systems that have to be perfect the first time (e.g. a new probe on a planet never explored before)

Next... Evolution

• Add more data: quantity, types

• Gets “smarter” with more historical data, weather/precipitation, seasonal/holidays, promotions, price fluctuations, …!

• When “smart” enough, it can predict the impact of…, and serve as a design/modeling tool

• Within resources, and aware of ROI

• ML(ML) (future)

• ML models to decide on ML models and attributes

• Models of models (future)

• Sticking models together, so that they synergize

This is just a start… What will you make possible?!

ML: Model Training, Validation

What’s an ML Model? Supervised example

A tiny trainable brain

• TRAIN: INFERENCE DATA –t2 → MODEL → compare with LABELS –t2 → loss

• Validate: INFERENCE DATA –t1 → MODEL → compare with LABELS –t1 → ✔︎, RMSE

• Predict: INFERENCE DATA 0 → MODEL → compare with ACTUAL 0 → ?, RMSE
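RMSE and a normalized RMSE (NRMSE) are the comparison metrics used in the validate and predict steps. The sketch below assumes NRMSE means RMSE divided by the mean of the reference values; the training log later in this section reports it both ways, "/prediction" and "/actual", and that exact definition is an assumption here, as are the sample numbers.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def nrmse(predicted, actual, normalize_by="actual"):
    """RMSE divided by the mean of either the actual or the predicted values."""
    ref = actual if normalize_by == "actual" else predicted
    return rmse(predicted, actual) / (sum(ref) / len(ref))


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print(round(rmse(predicted, actual), 2))  # 19.15
print(round(nrmse(predicted, actual), 4))
print(round(nrmse(predicted, actual, normalize_by="prediction"), 4))
```

Normalizing makes the error comparable across paths whose traffic volumes differ by orders of magnitude.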

Basis for Inference

Data inputs and validation data

[Figure: target = the last 60m; inputs taken at T – 60m, T – 1d, T – 1w and T – 2w; day and week timelines]
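With one sample every 30 seconds (the interval visible in the raw data later in this section), the look-back windows translate into fixed row offsets. A minimal pandas sketch of building lagged input columns with `shift`; the synthetic series and the column names are hypothetical, and the demo itself builds these windows with separate database queries instead.

```python
import pandas as pd

SAMPLE_SECONDS = 30  # one telemetry sample every 30 s

# Look-back offsets used as model inputs, converted to a number of rows
lags = {"previous": 60 * 60, "1d": 86400, "1w": 7 * 86400, "2w": 14 * 86400}
lag_rows = {name: seconds // SAMPLE_SECONDS for name, seconds in lags.items()}
print(lag_rows)  # {'previous': 120, '1d': 2880, '1w': 20160, '2w': 40320}

# Hypothetical counter series covering a little over two weeks
index = pd.date_range("2020-01-01", periods=40320 + 120, freq="30s")
series = pd.Series(range(len(index)), index=index, dtype="float64")

# Each lagged column holds the value observed `rows` samples earlier
features = pd.DataFrame({f"d_{name}": series.shift(rows) for name, rows in lag_rows.items()})
features["label"] = series
features = features.dropna()  # keep rows where every look-back exists
print(features.shape)         # (120, 5)
```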

Training: Larger Datasets

Data inputs and label data: 1d periods

[Figure: day and week timelines]

Training

Data inputs and label data: -1d

[Figure: labels at -1d; inputs at T – 60m, T – 1d, T – 1w and T – 2w; day and week timelines]

ML: See model and training cycles

ML: Data Engineering

Data Engineering: Why & What?

It is about the data!

• Arbitrary ML models do not know my system or the meaning of my data

• I don’t know the internals/logic of the model!

• Domain knowledge becomes critical:

• Which data to collect?

• Data trends? Is it time-dependent? Is it strongly cyclical?

• What are you actually looking to learn from the data?

• With your understanding, and an understanding of what certain ML techniques are good at, you can start to experiment!

Useful to Learn and Practice...

Libraries and tools to prep and work with data, samples:

• NumPy: organizing (arrays) and handling/manipulating (algebra) of numbers

• Pandas: data structures and analysis

• Matplotlib, Bokeh, Seaborn, PyPlot, D3.js: graphing

• Jupyter Notebooks: a scratchpad for gluing it all together!

Raw Data → Structure Data → Data Pre-processing → Data Exploration → Insights, Reports, Graphs...

Adhere to the “Rules”

An ML model expects (or risks an exception/error otherwise):

• Specific data type(s): float64, float32, string, …

• Rank [axes]: scalar [0], vector [1], matrix [2], cube [3], …

• But the batch size can vary (per cycle, not per input)

DataFrame: a 2-dimensional labeled data structure with columns of potentially different types. The most commonly used pandas object.
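A short sketch of checking those rules before data reaches a model: one consistent numeric dtype, the expected rank, and a batch dimension that may vary between cycles. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical batch: 3 rows (samples) x 2 feature columns, with mixed dtypes
df = pd.DataFrame({"d_a_previous": [1, 2, 3], "d_a_1d": [4.0, 5.0, 6.0]})
print(df.dtypes.tolist())  # one integer column, one float column

# One consistent data type for every feature
batch = df.astype(np.float32).to_numpy()
print(batch.dtype, batch.ndim, batch.shape)  # float32 2 (3, 2)

# Rank stays 2 and the feature axis stays fixed; only the batch axis may change
smaller_batch = batch[:2]
assert smaller_batch.ndim == 2 and smaller_batch.shape[1] == batch.shape[1]
```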

DataFrame

def read_train_long(record_count, label_prefix, verbose=True):
    ...
    print('\ntraining long data')
    print(train.describe())

training long data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         1.093179e+07  1.210950e+07  1.215980e+07  2.038959e+07
std          1.072837e+07  1.140074e+07  1.141777e+07  1.141292e+07
min          0.000000e+00  1.365000e+03  1.087200e+04  2.841000e+03
25%          1.663094e+06  1.665956e+06  1.689180e+06  1.086874e+07
50%          5.952412e+06  8.211694e+06  8.303540e+06  2.399129e+07
75%          1.959832e+07  2.178496e+07  2.212936e+07  3.107957e+07
max          3.269652e+07  3.305397e+07  3.308679e+07  3.262037e+07

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         7.527995e+06  8.306467e+06  8.337991e+06  1.384803e+07
std          7.289889e+06  7.747967e+06  7.757351e+06  7.804774e+06
min          0.000000e+00  1.380000e+03  8.182000e+03  1.001000e+03
...

The column headers are the column “label”s; describe() gives a math description of each column; the df is 72 columns x 2880 rows.

The Data in a 4 x 24-hour DataFrame, [4x18]x2880

4 24-hour periods x 18 paths x 2880 30-second samples = a DataFrame of 72 columns x 2880 rows
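The 72 columns are every combination of path and look-back period, following the `d_<path>_<period>` naming seen in the describe() output above. A sketch that builds the column list and an empty frame of the right shape; only some of the real path names (te11200, te11201, …) appear in this section, so the path list here is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical path identifiers; the real deployment has 18 paths
paths = [f"path{i:02d}" for i in range(18)]
periods = ["previous", "1d", "1w", "2w"]  # 4 look-back periods
rows_per_day = 24 * 60 * 60 // 30         # 2880 samples at 30 s

columns = [f"d_{path}_{period}" for path in paths for period in periods]
frame = pd.DataFrame(0.0, index=range(rows_per_day), columns=columns)

print(len(columns))  # 72
print(frame.shape)   # (2880, 72): pandas reports (rows, columns)
print(columns[:4])
```

Note that pandas reports the shape as (rows, columns), so the "72 x 2880" DataFrame appears as `(2880, 72)`.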

DataFrame: same format, smaller size

validation data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w
count        1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean         1.358485e+06  9.795289e+05  1.179847e+06  1.246457e+06
std          8.000545e+05  6.292864e+05  6.797141e+05  7.001280e+05
min          1.380000e+03  7.731200e+04  9.600000e+01  1.928000e+03
25%          4.905140e+05  3.478915e+05  6.696790e+05  8.413410e+05
50%          1.362629e+06  1.087073e+06  1.240100e+06  1.377682e+06
75%          2.190353e+06  1.559045e+06  1.827315e+06  1.777683e+06
max          2.840542e+06  2.067730e+06  2.336623e+06  2.262407e+06

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w
count        1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean         7.821397e+05  1.026548e+06  7.012429e+05  7.430594e+05
std          5.078806e+05  4.558566e+05  4.654541e+05  5.078973e+05
min          0.000000e+00  0.000000e+00  0.000000e+00  1.350000e+03
25%          3.321160e+05  5.965380e+05  4.511570e+05  3.695710e+05
50%          9.229315e+05  9.878900e+05  8.389290e+05  7.320545e+05
75%          1.225480e+06  1.511388e+06  1.236286e+06  1.205447e+06
max          1.499793e+06  1.857222e+06  1.556056e+06  1.526029e+06
...

The df is 72 columns x 120 rows.

Fetching, Formatting, Conditioning the Data

import pandas as pd

# Assumes `client` is an InfluxDB DataFrameClient created elsewhere
def read_data(field_key, measurement_name, condition1, condition2, condition3, limit, label):
    # Fetch limit+1 points; the first point is dropped after baselining
    query_db = 'SELECT "%s" FROM "%s" WHERE %s AND %s AND %s LIMIT %d' % (
        field_key, measurement_name, condition1, condition2, condition3, limit + 1)
    data_db = client.query(query_db)
    print('\ndata_db:\n', data_db)
    data_df = pd.DataFrame(data_db[str(measurement_name)])
    print('\ndata_df:\n', data_df)
    print('\ndata_df description:\n', data_df.describe())
    data_df.columns = [label]                     # unique column label
    data_df.reset_index(drop=True, inplace=True)
    data_df = data_df.ffill().bfill()             # fill gaps in the series
    data_df -= data_df.min()                      # baseline the cumulative counter at 0
    data_df.drop(data_df.index[0], inplace=True)
    print('\ndata_df:\n', data_df)
    print('\ndata_df description:\n', data_df.describe())
    # data_df = data_df.sub(data_df.shift(fill_value=0))
    # print('\n', query_db, '\n', data_df.describe())
    return data_df
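The conditioning steps in read_data() can be tried offline on the first raw bytes-sent samples shown on the next slides. This sketch skips the InfluxDB query and builds the single-column DataFrame directly. The commented-out line in the function computes per-interval differences instead of a cumulative baseline; `diff()` below is a comparable way to get them.

```python
import pandas as pd

# First raw 'bytes-sent' counter samples (30 s apart), copied from the Raw Data output
raw = [2269840208, 2269840208, 2269840208, 2269840208, 2269840820,
       2269842350, 2269842554, 2269842554, 2269842554, 2269842554,
       2269842554, 2269853583]
data_df = pd.DataFrame({"bytes-sent": raw})

data_df.columns = ["d_te11200_previous"]       # unique column label
data_df = data_df.ffill().bfill()               # fill any gaps
baselined = data_df - data_df.min()             # cumulative bytes since the minimum
baselined = baselined.drop(baselined.index[0])  # drop the extra first sample
print(baselined["d_te11200_previous"].tolist())
# [0, 0, 0, 612, 2142, 2346, 2346, 2346, 2346, 2346, 13375]

# Alternative: bytes sent per 30 s interval
deltas = data_df["d_te11200_previous"].diff().dropna()
print(deltas.tolist())
```

The baselined values match the "Column Label Changed, Baselined, Conditioned" output for rows 1 to 11.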

Raw Data

data_db:
defaultdict(<class 'list'>, {'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters':
                                  bytes-sent
2020-01-03 06:34:04.032000+00:00  2269840208
2020-01-03 06:34:34.031000+00:00  2269840208
2020-01-03 06:35:04.027000+00:00  2269840208
2020-01-03 06:35:34.031000+00:00  2269840208
2020-01-03 06:36:04.032000+00:00  2269840820
2020-01-03 06:36:34.041000+00:00  2269842350
2020-01-03 06:37:04.032000+00:00  2269842554
2020-01-03 06:37:34.036000+00:00  2269842554
2020-01-03 06:38:04.032000+00:00  2269842554
2020-01-03 06:38:34.033000+00:00  2269842554
2020-01-03 06:39:04.090000+00:00  2269842554
2020-01-03 06:39:34.035000+00:00  2269853583
2020-01-03 06:40:04.036000+00:00  2269854129
2020-01-03 06:40:34.033000+00:00  2269854129
2020-01-03 06:41:04.035000+00:00  2269854228
2020-01-03 06:41:34.033000+00:00  2269855713
2020-01-03 06:42:04.035000+00:00  2269856604
...

In Single Column DataFrame

data_df:
                                  bytes-sent
2020-01-03 06:34:04.032000+00:00  2269840208
2020-01-03 06:34:34.031000+00:00  2269840208
2020-01-03 06:35:04.027000+00:00  2269840208
2020-01-03 06:35:34.031000+00:00  2269840208
2020-01-03 06:36:04.032000+00:00  2269840820
2020-01-03 06:36:34.041000+00:00  2269842350
2020-01-03 06:37:04.032000+00:00  2269842554
2020-01-03 06:37:34.036000+00:00  2269842554
2020-01-03 06:38:04.032000+00:00  2269842554
2020-01-03 06:38:34.033000+00:00  2269842554
2020-01-03 06:39:04.090000+00:00  2269842554
2020-01-03 06:39:34.035000+00:00  2269853583
2020-01-03 06:40:04.036000+00:00  2269854129
2020-01-03 06:40:34.033000+00:00  2269854129
2020-01-03 06:41:04.035000+00:00  2269854228
2020-01-03 06:41:34.033000+00:00  2269855713
2020-01-03 06:42:04.035000+00:00  2269856604
2020-01-03 06:42:34.034000+00:00  2269866268
2020-01-03 06:43:04.034000+00:00  2269866916
...

data_df description:
         bytes-sent
count  2.881000e+03
mean   2.284429e+09
std    1.233722e+07
min    2.269840e+09
25%    2.271348e+09
50%    2.283434e+09
75%    2.296978e+09
max    2.303451e+09

Column Label Changed, Baselined, Conditioned

data_df:
    d_te11200_previous
1                    0
2                    0
3                    0
4                  612
5                 2142
6                 2346
7                 2346
8                 2346
9                 2346
10                2346
11               13375
12               13921
13               13921
14               14020
15               15505
16               16396
17               26060
18               26708
19               28328
...

data_df description:
       d_te11200_previous
count        2.880000e+03
mean         1.459358e+07
std          1.233636e+07
min          0.000000e+00
25%          1.507463e+06
50%          1.362955e+07
75%          2.713798e+07
max          3.361109e+07

Constructing Multi-Column DataFrame

def read_train(record_count, label_prefix, verbose=True):
    for interface in tunnel_ifs:
        query_if = '("interface-name" = \'%s\')' % interface
        # Unique column label, e.g. d_te11200_previous
        label = label_prefix + interface[-7:] + "_previous"
        read_if = read_data('bytes-sent',
                            'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters',
                            query_if, 'time >= now() - {} - 2h -1m'.format(previous), 'time <= now()',
                            record_count, label)
        if interface == tunnel_ifs[0]:
            train = read_if
        else:
            # Concatenate columns
            train = pd.concat([train, read_if], axis=1, sort=False)

        label = label_prefix + interface[-7:] + "_1d"
        read_if = read_data('bytes-sent',
                            'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters',
                            query_if, 'time >= now() - {} - 1d - 1h -1m'.format(previous), 'time <= now()',
                            record_count, label)
        train = pd.concat([train, read_if], axis=1, sort=False)
        ...
        label = label_prefix + interface[-7:] + "_2w"
        read_if = read_data('bytes-sent',
                            'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters',
                            query_if, 'time >= now() - 3w - {} - 1h -1m'.format(previous), 'time <= now()',
                            record_count, label)
        train = pd.concat([train, read_if], axis=1, sort=False)

    train = train.ffill()  # forward-fill gaps left by columns of unequal length
    return train

ML: Data prep demo

The Model, Process

Training: Initial, and Periodic; Predict

“Monitor” flowchart

[Flowchart: Parameters & functions → Model exists? → No: Create Model; Very large data train, validate; Large data train, validate; 60 minute train, validate, predict; Data; NRMSE; timing labels: every 10 minutes, every n cycles]

The Neural Network, and Sample Cycle

# TensorFlow 1.x APIs (tf.train.AdamOptimizer and tf.contrib are not part of the TensorFlow 2.x API)
my_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns=construct_feature_columns(training_examples),
    hidden_units=hidden_units,
    optimizer=my_optimizer,
    model_dir=model_directory,
    label_dimension=len(tunnel_ifs) + len(physical_ifs)
    ...

# Set once, for a new model
hidden_units = [72, 36, 18]  # probably an overkill for our small scale
...

cycle += 1
print('cycle number ', cycle)
# Training parameters: can change every call
dnn_regressor = train_nn_regression_model(
    learning_rate=0.0003,
    steps=1000,
    batch_size=120,
    hidden_units=hidden_units,
    training_examples=read_train(120, 'd_'),
    training_targets=read_train_target(120, 'l_'),
    validation_examples=read_validate(120, 'd_'),
    validation_targets=read_last_target(120, 'v_'),
    prediction=True
    ...
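`clip_gradients_by_norm(my_optimizer, 5.0)` wraps the optimizer so that gradients are rescaled when their norm exceeds 5.0, which keeps individual training steps from blowing up. A plain-Python sketch of the idea, assuming clipping by the global L2 norm over all gradient values (the deck does not show the wrapper's internals, so treat this as an approximation of its behavior):

```python
import math


def clip_by_global_norm(gradients, clip_norm=5.0):
    """Rescale all gradients together if their combined L2 norm exceeds clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in gradients))
    if global_norm <= clip_norm:
        return list(gradients), global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in gradients], global_norm


clipped, norm = clip_by_global_norm([6.0, 8.0])  # norm = 10.0 > 5.0
print(norm, clipped)                             # 10.0 [3.0, 4.0]
```

Scaling every gradient by the same factor shortens the update step but keeps its direction.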

+ Normalization Layer

def read_train_long(record_count, label_prefix, verbose=True):
    global feature_mean
    global feature_std
    global feature_max
    ...
    if feature_mean == 0:
        feature_mean = train.mean().mean()
        print('feature mean: ', feature_mean)
    if feature_std == 0:
        feature_std = train.std().mean()
        print('feature std: ', feature_std)
    if feature_max == 0:
        feature_max = train.max().mean() / 24  # The mean max per 1 hour
        print('feature max: ', feature_max)
    ...

def construct_feature_columns(input_features):
    ...
    # epsilon = 0.000001
    ...
    # choose best normalization of input data
    ...
    return set([tf.feature_column.numeric_column(my_feature,
                                                 normalizer_fn=lambda val: val / feature_max)
                for my_feature in input_features])
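The normalization scheme above can be reproduced on a small frame: the global statistics are means of the per-column statistics, and each input value is divided by `feature_max` (the mean column maximum divided by 24, i.e. roughly one hour's worth of the 24-hour counters). The tiny DataFrame is hypothetical; only the formulas come from the code.

```python
import pandas as pd

# Hypothetical 24-hour baselined counters for two input columns
train = pd.DataFrame({"d_a_previous": [0.0, 1200.0, 2400.0],
                      "d_a_1d": [0.0, 2400.0, 4800.0]})

feature_mean = train.mean().mean()     # mean of the column means
feature_std = train.std().mean()       # mean of the column standard deviations
feature_max = train.max().mean() / 24  # mean column max, per hour


def normalizer_fn(val):
    # Same scaling as the lambda passed to numeric_column
    return val / feature_max


print(feature_mean, feature_max)  # 1800.0 150.0
print(normalizer_fn(train).round(2))
```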

The Model

[Diagram: Input (72 x batch) → Normalization layer (72 nodes) → Input layer (72 nodes) → Hidden layer → Hidden layer → Hidden layer → Output layer (18 nodes) → Output (18 x batch)]
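As a rough sketch of the shapes flowing through this model, assuming the three hidden layers have the 72, 36 and 18 units set by `hidden_units` and use ReLU (the default activation of `tf.estimator.DNNRegressor`), a NumPy forward pass shows a batch of 120 rows of 72 inputs becoming 120 rows of 18 outputs. The weights are random placeholders, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [72, 72, 36, 18, 18]  # inputs, hidden_units [72, 36, 18], 18 outputs
batch_size = 120

# Random placeholder weights and zero biases for each dense layer
weights = [rng.normal(0, 0.1, (n_in, n_out))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.random((batch_size, 72))  # already-normalized inputs
h = x
for i, (w, b) in enumerate(zip(weights, biases)):
    h = h @ w + b
    if i < len(weights) - 1:      # ReLU on hidden layers, linear regression output
        h = np.maximum(h, 0.0)

n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(h.shape)   # (120, 18)
print(n_params)  # 8892
```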

The data - inputs

feature mean: 7692395.787210648
feature std: 7084487.678199986
feature max: 899857.4594907407

training long data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         1.460392e+07  1.148071e+07  1.377143e+07  1.286884e+07
std          1.208010e+07  1.079281e+07  1.288531e+07  1.193284e+07
min          5.194400e+04  8.214400e+04  7.731200e+04  1.128000e+03
25%          5.464170e+06  2.660394e+06  3.144108e+06  2.952690e+06
50%          7.655800e+06  5.238823e+06  6.525733e+06  6.685706e+06
75%          2.390421e+07  2.051300e+07  2.478536e+07  2.295742e+07
max          4.010818e+07  3.499195e+07  3.808687e+07  3.528133e+07

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         9.872998e+06  7.983878e+06  9.340476e+06  8.721320e+06
std          8.096162e+06  7.326994e+06  8.648336e+06  8.025967e+06
min          0.000000e+00  1.080000e+03  0.000000e+00  0.000000e+00
25%          3.781685e+06  2.003244e+06  2.256103e+06  2.102746e+06
50%          5.334513e+06  3.862060e+06  4.561317e+06  4.507384e+06
...

Data indicators, useful for normalization: column key, record count, min value, max value

The Training

period 08 : 3582.59

training_predictions baoundaries[ 7208.3516 3251.541 7443.031 4104.243 4939.107 4307.9517-4889.6 -2828.935 5941.4014 726.0451 7325.6025 4610.5356-3598.1191 3449.1812 635.07806 4949. 4667.8926 -1990.8337 ]

[228423.4 177642.34 234673.56 181417.75 247542.95 188667.36 242869.14185446.6 211784.83 67781.5 213718.23 222849.58 222998.84 172893.5567602.49 174859.56 182980.92 178175.17]

validation_predictions boundaries[ 6515.7734 3770.0752 8726.353 3760.6396 13467.836 3967.8716-5080.483 -3837.6772 5243.997 539.8233 8261.776 11545.85-3959.705 3852.7988 432.32617 4614.4375 4369.189 -3105.7751 ]

[193788.67 157614.84 150174.69 216238.94 187304.67 213302.8288468.47 197036.98 181182.31 51306.69 143677.8 166325.1265046.75 152532.97 51454.254 206665.23 205994.08 187957.39 ]

period 09 : 3548.03

Model training finished.
Final RMSE (on training data): 3548.03
Final RMSE (on validation data): 22812.22
Final NRMSE (/prediction, /actual): 0.25, 0.25

cycle number 788


Training Makes Perfect!


Best Indicator: NRMSE


A Short Term, Small Issue, & the Morning After


A Severe Issue, 1 Store Partially Crippled


A Severe Issue, the 2 main events

Memory issue: NRMSE > 430

Reset (reload): NRMSE > 1,450
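Because NRMSE jumps by orders of magnitude during an incident, a simple threshold is enough to turn it into an alert. A minimal sketch; the function name, the threshold value and the sample NRMSE series are hypothetical (0.25 matches the normal value in the training log; 431 and 1,451 stand in for values just above the two events).

```python
def flag_anomalies(nrmse_values, threshold=10.0):
    """Return the indices of cycles whose NRMSE exceeds the alert threshold."""
    return [i for i, value in enumerate(nrmse_values) if value > threshold]


# Per-cycle NRMSE: normal cycles around 0.25, then two incidents
history = [0.25, 0.27, 0.24, 431.0, 0.26, 1451.0, 0.25]
print(flag_anomalies(history))  # [3, 5]
```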

YANG >> ML: Automation

YANG >> ML: Look at some code

YANG >> ML: Quality Assurance

The importance of visualizing your data!

Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

All four of these datasets have nearly identical:

• Mean of x and mean of y

• Variance of x and variance of y

• Correlation coefficient between x and y

• Linear regression line (y = 3 + 0.5x)

• Coefficient of determination of the linear regression
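The quartet's summary statistics can be verified with the Python standard library. The data below is Anscombe's published quartet (see the Wikipedia source above); plotting the four sets, for example with Matplotlib, is what reveals how different they really are.

```python
from statistics import mean, variance

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Mean x, mean y, variance x, variance y, correlation, intercept, slope."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return mx, my, variance(xs), variance(ys), r, my - slope * mx, slope


for name, (xs, ys) in quartet.items():
    print(name, [round(v, 2) for v in summary(xs, ys)])
```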

Quality Assurance

Ensure the model is doing the job, every step of the way, and getting better as time goes on

• Check the data every step of the way: description, extents, format, graph

• Ensure data quality: real raw data, complete, sufficient [quality learning experience!]

• Run a control model, maybe testing the next release

• Avoid overfitting: periodically train on a bigger set

• Test

• Monitor the “monitors”
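Part of "check the data every step of the way" can be automated so every cycle validates its inputs before training. A minimal sketch of such checks on a pandas DataFrame; the function name `check_frame`, the expected shape and the rule that baselined counters must be non-negative are assumptions for illustration.

```python
import pandas as pd


def check_frame(df, expected_rows, expected_cols):
    """Return a list of data-quality problems found in a feature DataFrame."""
    problems = []
    if df.shape != (expected_rows, expected_cols):
        problems.append(f"shape {df.shape}, expected {(expected_rows, expected_cols)}")
    if df.isna().any().any():
        problems.append("missing values present")
    if (df.select_dtypes("number") < 0).any().any():
        problems.append("negative values in baselined counters")
    constant = [col for col in df.columns if df[col].nunique() <= 1]
    if constant:
        problems.append("constant column(s): " + ", ".join(constant))
    return problems


good = pd.DataFrame({"d_a": [0.0, 5.0, 9.0], "d_b": [1.0, 2.0, 3.0]})
bad = pd.DataFrame({"d_a": [0.0, None, -1.0], "d_b": [7.0, 7.0, 7.0]})
print(check_frame(good, 3, 2))  # []
print(check_frame(bad, 3, 2))
```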

© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public

Check the Data: Input (training target)

           l_te11200     l_te11201     l_te13501     l_te13502     l_te13703  \
count   1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean    2.113163e+06  9.949988e+05  1.371433e+06  5.974629e+05  1.400498e+06
std     6.925446e+05  7.613872e+05  8.314939e+05  6.123204e+05  9.417285e+05
min     7.852000e+04  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
25%     2.056924e+06  1.579850e+04  5.622930e+05  1.453625e+04  2.296910e+05
50%     2.375026e+06  1.365219e+06  1.964674e+06  3.089710e+05  1.966452e+06
75%     2.392869e+06  1.707655e+06  1.982820e+06  1.342634e+06  2.208534e+06
max     3.309991e+06  1.722082e+06  2.450328e+06  1.357508e+06  2.224787e+06

           l_te13704     l_te17801     l_te17802     l_te12400     l_te12401  \
count   1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02    120.000000
mean    5.292162e+05  9.654491e+05  1.236253e+06  1.804734e+06  32161.100000
std     6.398793e+05  7.201600e+05  5.183502e+05  5.926864e+05  19069.038165
min     0.000000e+00  6.040000e+04  0.000000e+00  6.040000e+04      0.000000
25%     1.630975e+04  2.968030e+05  1.061104e+06  1.753321e+06  15048.250000
50%     3.639200e+04  9.669420e+05  1.078636e+06  2.028434e+06  32943.000000
75%     1.342081e+06  1.211375e+06  1.683253e+06  2.046180e+06  48227.000000
max     1.506047e+06  2.422147e+06  2.238537e+06  2.824590e+06  63574.000000
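The summary above has the shape of pandas' `DataFrame.describe()` output. Where pandas is not at hand, the same figures can be sketched with the standard library (Python 3.8+): the quartiles use `statistics.quantiles(..., method="inclusive")`, which matches the linear interpolation pandas uses by default, and the standard deviation is the sample one. The flags in `sanity_flags` are illustrative assumptions; a minimum of zero, as several interfaces above show, is worth a second look.

```python
import statistics


def describe(values):
    """Count, mean, std, min, quartiles and max, like describe() for one column."""
    values = [float(v) for v in values]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


def sanity_flags(summary):
    """Illustrative checks on one column's summary (hypothetical rules)."""
    flags = []
    if summary["min"] < 0:
        flags.append("negative values")  # byte/packet counters should not go negative
    if summary["std"] == 0:
        flags.append("constant")         # a flat series teaches the model nothing
    if summary["min"] == 0:
        flags.append("zero readings")    # idle link, counter reset or missing samples?
    return flags


if __name__ == "__main__":
    column = [0, 15, 30, 45, 60]  # toy column
    summary = describe(column)
    print(summary, sanity_flags(summary))
```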


Check the Data: Model Output

training_predictions boundaries
[ -9676.203 43134.777 30126.502 49435.23 -32687.848
 53458.71 -41498.527 2386.6956 -7947.478 841.84825334.666
 -29982.818 -35702.715 39189.33 822.512443476.05 48315.46 459.85422]
[2567748.5 1706653.8 1980904.5 2473544.2 2354506.8 2123372.2704984.5
 1812864.2 2203519. 67233.766 1694452.8 1959683.92268782.5
 1549306.6 66137.98 2231304.2 1900274.9 1607212.8 ]

validation_predictions boundaries
[ 9514.773 37849.348 3103.1733 43236.957 -5364.3784
 22101.646 -17330.65 16654.14 8049.155 765.13153570.572
 -6731.2876 -15120.497 33285.02 766.89764
 37848.816 19647.463 13101.8 ]
[2846600. 2264547.2 3210291. 2034950. 3650229.2 1717540.41242460.5
 1751227.8 2441805.8 68177.71 2717810.5 3045913.51010453.4
 2042300.1 66662.13 1811712.9 1519572.6 1552536.8 ]
period 08 : 55037.89


Check Progress: RMSE

import math
from sklearn import metrics

periods = 10
steps_per_period = steps // periods  # integer number of training steps per period

...
for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    ...
    # Compute training and validation loss.
    training_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(training_targets, training_predictions))
    validation_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(validation_targets, validation_predictions))
    # Occasionally print the current loss.
    print(" period %02d : %0.2f" % (period, training_root_mean_squared_error))
    # Add the loss metrics from this period to our list.
    training_rmse.append(training_root_mean_squared_error)
    validation_rmse.append(validation_root_mean_squared_error)
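The loop above trains in periods and records training and validation RMSE after each one. To show that pattern end to end without TensorFlow, here is a self-contained sketch in which a toy linear model fitted by full-batch gradient descent stands in for the `DNNRegressor`; the data, learning rate and the name `train_in_periods` are illustrative assumptions.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length, non-empty sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals))


def train_in_periods(train, valid, steps=1000, periods=10, learning_rate=0.1):
    """Fit y = w * x + b in `periods` chunks, logging RMSE after each chunk.

    `train` and `valid` are lists of (x, y) pairs. Returns ((w, b),
    training_rmse, validation_rmse), mirroring the lists the slide's loop fills.
    """
    steps_per_period = steps // periods  # integer step count per period
    w, b = 0.0, 0.0
    n = len(train)
    training_rmse, validation_rmse = [], []
    for period in range(periods):
        # Train the model, starting from the prior state (w, b carry over).
        for _ in range(steps_per_period):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
        # Compute training and validation loss.
        train_error = rmse([w * x + b for x, _ in train], [y for _, y in train])
        valid_error = rmse([w * x + b for x, _ in valid], [y for _, y in valid])
        print(" period %02d : %0.2f" % (period, train_error))
        training_rmse.append(train_error)
        validation_rmse.append(valid_error)
    return (w, b), training_rmse, validation_rmse


if __name__ == "__main__":
    # Toy data on the line y = 3x + 1 (not from the session).
    train = [(i / 10, 3 * (i / 10) + 1) for i in range(20)]
    valid = [(i / 10 + 0.05, 3 * (i / 10 + 0.05) + 1) for i in range(20)]
    train_in_periods(train, valid)
```

Keeping both RMSE lists is what makes the later plots possible: a validation curve that rises while the training curve keeps falling is the classic sign of overfitting.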


Check Progress: RMSE

RMSE (on training data):
 period 00 : 3117491.16
 period 01 : 2921701.20
 period 02 : 2787017.03
 period 03 : 2680300.57
 period 04 : 2589999.24
 period 05 : 2511086.36
 period 06 : 2441079.98
 period 07 : 2378441.63
 period 08 : 2321990.45
 period 09 : 2270826.91

Model training finished.
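Each period still improves training RMSE, but by less each time. The sketch below quantifies this from the numbers in the log and declares convergence once the relative improvement has stayed below a tolerance for a few periods. The tolerance and patience values are assumptions to tune, not figures from the session.

```python
# Training RMSE per period, copied from the log above.
RMSE_HISTORY = [3117491.16, 2921701.20, 2787017.03, 2680300.57, 2589999.24,
                2511086.36, 2441079.98, 2378441.63, 2321990.45, 2270826.91]


def relative_improvements(history):
    """Fractional RMSE drop from each period to the next (positive = improving)."""
    return [(prev - cur) / prev for prev, cur in zip(history, history[1:])]


def has_converged(history, rel_tol=0.01, patience=2):
    """True once each of the last `patience` periods improved by less than rel_tol."""
    improvements = relative_improvements(history)
    if len(improvements) < patience:
        return False
    return all(step < rel_tol for step in improvements[-patience:])


if __name__ == "__main__":
    for period, step in enumerate(relative_improvements(RMSE_HISTORY), start=1):
        print("period %02d: %.1f%% better" % (period, 100 * step))
    print("converged at 1%:", has_converged(RMSE_HISTORY))
```

With these numbers the per-period improvement falls from about 6.3% to about 2.2%, so at a 1% tolerance training would continue.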


Visualize Progress: RMSE

for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    # Take a break and compute predictions.
    training_predictions = dnn_regressor.predict(input_fn=predict_training_input_fn)
    training_predictions = np.array([[item['predictions'][i] for i in range(0,
        len(tunnel_ifs) + len(physical_ifs))] for item in training_predictions])
    ...
    validation_predictions = dnn_regressor.predict(input_fn=predict_validation_input_fn)
    validation_predictions = np.array([[item['predictions'][i]
        for i in range(0, len(tunnel_ifs) + len(physical_ifs))] for item in validation_predictions])
    ...
    if if_plot:
        # RMSE values and graphs
        global x_periods
        x_periods += periods
        plt.ion()


Visualize Progress: RMSE


Validate: Visualize Prediction vs. Actual


Validate: Visualize Prediction vs. Actual


Eyes are Open?


Quality Indicators

• Newborn models start ~random, struggle a bit, then learn at a reasonable rate

• Gets better with experience, without overfitting. Slow to “forget”

• Learns better with bigger and fatter data sets

• Never at a steady state, never perfect

Getting better with time, and with data!


YANG >> ML: Look at quality indicators

Case 2: “When will my switch melt?”

Hardware Resource Utilization Prediction


When Will My Switch Melt?


• What data might be relevant?

• Let’s consider the application of IP access lists or role-based access lists to a Catalyst edge switch

• We can monitor the security access control entries, a limited resource…

GET https://{{switch}}/restconf/data/tcam-details/tcam-detail

{
  "Cisco-IOS-XE-tcam-oper:tcam-detail": {
    "asic-no": 0,
    "name": "Security Access Control Entries",
    "hash-entries-max": 0,
    "tcam-entries-max": 5120,
    "hash-entries-used": 0,
    "tcam-entries-used": 188
  }
}

• How can we predict when we may run out of TCAM space?
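As a first step toward that prediction, the RESTCONF reply can be reduced to used, free and percentage figures. A minimal sketch that parses the payload exactly as shown on the slide; a real reply may wrap `tcam-detail` in a list with one entry per table and ASIC, in which case the key access would need adjusting. `tcam_headroom` is a hypothetical helper name.

```python
import json

# RESTCONF reply body as shown on the slide.
PAYLOAD = """
{
  "Cisco-IOS-XE-tcam-oper:tcam-detail": {
    "asic-no": 0,
    "name": "Security Access Control Entries",
    "hash-entries-max": 0,
    "tcam-entries-max": 5120,
    "hash-entries-used": 0,
    "tcam-entries-used": 188
  }
}
"""


def tcam_headroom(document):
    """Summarize TCAM usage from one tcam-detail entry."""
    detail = document["Cisco-IOS-XE-tcam-oper:tcam-detail"]
    used = detail["tcam-entries-used"]
    capacity = detail["tcam-entries-max"]
    return {
        "name": detail["name"],
        "used": used,
        "free": capacity - used,
        "utilization": used / capacity if capacity else 0.0,
    }


if __name__ == "__main__":
    print(tcam_headroom(json.loads(PAYLOAD)))
```

For the sample, 188 of 5120 entries are in use (about 3.7%), leaving 4932 free; the prediction question is how fast that headroom shrinks.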


Yep, The Data is Available Over CLI Also


DC-C9300-1-Fabric1#show platform hardware fed switch active fwd-asic resource tcam utilization

CAM Utilization for ASIC [0]
Table                                       Max Values    Used Values
--------------------------------------------------------------------------------
Unicast MAC addresses                       32768/1024    25/21
L3 Multicast entries                        8192/512      0/9
L2 Multicast entries                        8192/512      0/11
Directly or indirectly connected routes     24576/8192    51/151
QoS Access Control Entries                  5120          85
Security Access Control Entries             5120          188
Ingress Netflow ACEs                        256           6
Policy Based Routing ACEs                   1024          22
Egress Netflow ACEs                         768           6
Flow SPAN ACEs                              1024          13
Control Plane Entries                       512           259
Tunnels                                     512           18
Lisp Instance Mapping Entries               512           8
Input Security Associations                 256           4
Output Security Associations and Policies   256           5
SGT_DGT                                     8192/512      1/1
CLIENT_LE                                   4096/256      3/0
INPUT_GROUP_LE                              1024          0
OUTPUT_GROUP_LE                             1024          0
Macsec SPD                                  256           2
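When RESTCONF is not an option, the same numbers can be scraped from the CLI output. A rough sketch, assuming the column layout shown above (name, max, used). Some rows carry two slash-separated values; they are kept as separate (used, max) pairs in the order printed, without assuming what each pool represents. The sample text is an excerpt of the table with assumed column spacing, and the function names are hypothetical. Screen-scraping is brittle, which is one reason to prefer the YANG model.

```python
import re

# Excerpt of the CLI table above (column spacing assumed).
SAMPLE = """\
CAM Utilization for ASIC [0]
Table                                       Max Values    Used Values
--------------------------------------------------------------------------------
Unicast MAC addresses                       32768/1024    25/21
QoS Access Control Entries                  5120          85
Security Access Control Entries             5120          188
Control Plane Entries                       512           259
Tunnels                                     512           18
"""

# A name, then a max field and a used field; either field may be "a/b".
ROW = re.compile(r"^(?P<name>\S.*?)\s+(?P<max>\d+(?:/\d+)?)\s+(?P<used>\d+(?:/\d+)?)$")


def parse_tcam_table(text):
    """Map each table name to a list of (used, max) pairs, one per printed pool."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # title, header, separator or anything unexpected
        maxima = [int(v) for v in match.group("max").split("/")]
        used = [int(v) for v in match.group("used").split("/")]
        if len(maxima) == len(used):
            table[match.group("name")] = list(zip(used, maxima))
    return table


def near_full(table, threshold=0.8):
    """Names of tables where any pool's utilization is at or above `threshold`."""
    return sorted(
        name for name, pairs in table.items()
        if any(cap and used / cap >= threshold for used, cap in pairs)
    )


if __name__ == "__main__":
    parsed = parse_tcam_table(SAMPLE)
    print(near_full(parsed, threshold=0.5))
```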


What Do We Know About The Data?

• TCAM entries are a physically limited resource

• They are shared across multiple features, but tend to have fixed pool sizes

• Utilization changes over time

• On enterprise network edge switches, utilization correlates strongly with when users log on

• Would some form of linear regression be useful…? Let’s investigate!
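A minimal sketch of that idea: fit an ordinary least-squares line to recent samples of `tcam-entries-used` and extrapolate it to `tcam-entries-max`. The sample spacing, the window length and the function names are illustrative assumptions; the window in particular is a hyperparameter that needs experimenting with.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / sxx
    return slope, mean_v - slope * mean_t


def time_to_exhaustion(times, values, capacity, window=None):
    """Time from the last sample until the fitted trend reaches `capacity`.

    Only the last `window` samples are used when `window` is given. Returns
    None when the trend is flat or falling (no exhaustion predicted).
    """
    if window is not None:
        times, values = times[-window:], values[-window:]
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - times[-1])


if __name__ == "__main__":
    # Hypothetical hourly samples of tcam-entries-used, growing 10 entries/hour.
    hours = list(range(10))
    used = [188 + 10 * h for h in hours]
    print("hours until 5120 entries:", time_to_exhaustion(hours, used, 5120, window=5))
```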


Resource Utilization & Linear Regression Example


What have we learned?

• Increasing the order of the polynomial you fit can give a curve that tracks your data very closely…but will it be useful for predicting anything beyond your test data?

• Naïve use of linear regression, as an example, may not give good results:

• Too long a time window => really bad fit

• Too short a time window => too spiky, false positives

• Conclusion:

• Run multiple experiments with your observed data

• Play with the hyperparameters you use

• Short-term predictions are pretty good for this use case

• Have a backup!!
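The polynomial point is easy to demonstrate. In this sketch (toy data, not from the session), eight noisy samples of the trend y = 2x are fitted two ways: a degree-7 polynomial that passes exactly through every sample (Lagrange interpolation), and a least-squares straight line. Both are then asked for the next point, x = 8, where the trend gives 16.

```python
# Toy data (not from the session): the trend y = 2x plus fixed "noise".
XS = list(range(8))
NOISE = [0.3, -0.2, 0.4, -0.1, 0.2, -0.4, 0.1, -0.3]
YS = [2 * x + e for x, e in zip(XS, NOISE)]


def interpolating_polynomial(xs, ys, x):
    """Evaluate the degree len(xs)-1 polynomial through every (xs, ys) point (Lagrange form)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                basis *= (x - xk) / (xj - xk)
        total += yj * basis
    return total


def straight_line(xs, ys, x):
    """Evaluate the least-squares straight line through (xs, ys) at x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my + slope * (x - mx)


if __name__ == "__main__":
    worst = max(abs(interpolating_polynomial(XS, YS, x) - y) for x, y in zip(XS, YS))
    print("polynomial, max training error:", worst)
    print("polynomial at x = 8:", interpolating_polynomial(XS, YS, 8))
    print("straight line at x = 8:", straight_line(XS, YS, 8))
```

The polynomial has zero training error yet predicts roughly -44 at x = 8, while the straight line predicts about 15.7: a perfect fit to the observed data is no guarantee of useful predictions.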


Can we do better than LR?

• LR & ARIMA perhaps feel like they’re just applied statistics…and yes, that’s right!

• But are there other approaches?

• We’ve talked about neural networks and application to image processing and other use cases…

• But can we train a neural network to do time-series predictions over hardware resource utilization?

• Let’s take our example and scale it up…


Resource Utilization & TensorFlow RNN Example


What have we learned?

• Yes, an RNN can be used with univariate time series data…and it can do quite well for this use case, but so can simple LR and ARIMA at lower cost!

• Training data and time to train is important:

• Too little data and predictions are not very good

• Too few epochs of training and predictions are not very good

• Diminishing returns at a certain point

• The number of layers you pick has a big impact:

• More layers take longer to train

• Fewer layers can be quicker to train

• Results with fewer layers may be as good as or even better than multi-layer models

• Use tools like TensorBoard to help you visualize what is happening with training

• Can help you see when your training is converging

• Again, experiment with the hyperparameters:

• Batch size, epochs, layers, training steps, validation steps
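Before a univariate series can be fed to an RNN, it is usually reshaped into supervised (history window, future value) pairs and grouped into batches; the history length and batch size are among the hyperparameters listed above. A framework-free sketch of that preparation step, with hypothetical function names (in practice TensorFlow's own dataset utilities would typically do this):

```python
def make_windows(series, history, horizon=1):
    """Split a univariate series into (history window, value `horizon` steps ahead) pairs."""
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be at least 1")
    pairs = []
    for end in range(history, len(series) - horizon + 1):
        window = list(series[end - history:end])
        target = series[end + horizon - 1]
        pairs.append((window, target))
    return pairs


def batches(items, batch_size):
    """Yield consecutive batches; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    utilization = [188, 190, 195, 193, 200, 204, 210, 208]  # toy samples
    for batch in batches(make_windows(utilization, history=3), batch_size=2):
        print(batch)
```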


Case 3: “The Routing Game!” AI Reward-Based


Different from Case 1

• Interacting with network nodes via NETCONF, live. No Telemetry, no TSDB.

• Using the tf.keras API on TensorFlow 2.0+ (previously 1.x)

• Brain is fat and shallow, for fast learning & flexibility (short-term “retention”)

• Machine-explored and frequently optimized target

• Ideal for an analytics slice of the network (e.g. VRF-Lite), with 1:n real-time sampling

[Figure: analytics slice vs. production network]


Playing the Game

Experimenting & improving forever...


The ML Model

import tensorflow as tf

# Wide, shallow model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1200, activation=tf.nn.relu, input_shape=(4, 15)),  # input shape
    tf.keras.layers.Dense(1200, activation=tf.nn.relu),
    tf.keras.layers.Dense(15)
])
...
# Categorical cross entropy per prefix per node
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
...
# Transform probabilities into discrete choice, per route
labels_tf = tf.transpose(tf.nn.softmax(labels_all, axis=0))


[Slide callouts: “4 probabilities each” and “15 prefixes” (the (4, 15) shape); “Normalize vectors into probability distribution between probabilities” (the softmax)]


Probabilities, Normalized [Softmax] - example

tf.Tensor(
[[0.25497461 0.24361039 0.24042456 0.2463647  0.25370661 0.25775105
  0.23473186 0.25987438 0.24686818 0.24663202 0.24490015 0.25310768
  0.24204974 0.25942015 0.25059948]
 [0.24819476 0.2517995  0.25362213 0.25551216 0.24509107 0.2544886
  0.26019332 0.24580416 0.25076208 0.24758969 0.24134524 0.24824475
  0.25225773 0.24561698 0.2488974 ]
 [0.25545497 0.25443653 0.25465455 0.25028481 0.25478405 0.24296617
  0.25593156 0.24964981 0.25029009 0.25659815 0.25335059 0.25056105
  0.25391062 0.24713582 0.25155338]
 [0.24137566 0.25015358 0.25129875 0.24783833 0.24641826 0.24479418
  0.24914326 0.24467165 0.25207965 0.24918014 0.26040402 0.24808652
  0.25178191 0.24782706 0.24894974]], shape=(4, 15), dtype=float64)


Array of Probabilities - example

labels_all tf.Tensor(
[[-3367. -4061. -3539. -3460. -3797. -3243. -3797. -2900. -3716. -3731.
  -2676. -3424. -3780. -3735. -3648.]
 [-3486. -3007. -3136. -2131. -3292. -2908. -2013. -3524. -3364. -3463.
  -3693. -3650. -3480. -2329. -3961.]
 [-2870. -3230. -3699. -3897. -3249. -3318. -3554. -3484. -2941. -3541.
  -3754. -2772. -3735. -3910. -1762.]
 [-3675. -3100. -3024. -3910. -3060. -3929. -4034. -3490. -3377. -2663.
  -3275. -3552. -2403. -3424. -4027.]], shape=(4, 15), dtype=float64)


print('labels_all', tf.argmax(labels_all, axis=0))
labels_all tf.Tensor([2 1 3 1 3 1 1 0 2 3 0 2 3 1 2], shape=(15,), dtype=int64)

Only max values
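`tf.nn.softmax(..., axis=0)` and `tf.argmax(..., axis=0)` both work column by column: each of the 15 prefix columns is normalized across its 4 rows, and argmax picks the row with the highest score. A pure-Python sketch of the same column-wise behavior, using the `labels_all` scores printed above (the helper names are illustrative). Because softmax is monotonic, the argmax of the probabilities equals the argmax of the raw scores.

```python
import math

# labels_all scores from the slide: 4 rows (interfaces) x 15 columns (prefixes).
LABELS_ALL = [
    [-3367, -4061, -3539, -3460, -3797, -3243, -3797, -2900, -3716, -3731, -2676, -3424, -3780, -3735, -3648],
    [-3486, -3007, -3136, -2131, -3292, -2908, -2013, -3524, -3364, -3463, -3693, -3650, -3480, -2329, -3961],
    [-2870, -3230, -3699, -3897, -3249, -3318, -3554, -3484, -2941, -3541, -3754, -2772, -3735, -3910, -1762],
    [-3675, -3100, -3024, -3910, -3060, -3929, -4034, -3490, -3377, -2663, -3275, -3552, -2403, -3424, -4027],
]


def softmax_columns(matrix):
    """Softmax over axis 0: each column becomes a probability distribution."""
    rows, cols = len(matrix), len(matrix[0])
    result = [[0.0] * cols for _ in range(rows)]
    for c in range(cols):
        column = [matrix[r][c] for r in range(rows)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for r in range(rows):
            result[r][c] = exps[r] / total
    return result


def argmax_columns(matrix):
    """Row index of the largest value in each column (like tf.argmax(..., axis=0))."""
    rows, cols = len(matrix), len(matrix[0])
    return [max(range(rows), key=lambda r: matrix[r][c]) for c in range(cols)]


if __name__ == "__main__":
    print(argmax_columns(LABELS_ALL))
```

With raw scores hundreds of units apart, each column's softmax is essentially one-hot, which is why the discrete choice per prefix can be read off with argmax.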

Play the Game


The Routing Table

load_new: 13.0
weight : 2.3
Loss: 23.342491149902344
labels_all tf.Tensor(
[[-2938. -2432. -3260. -3101. -2996. -2440. -3173. -2005. -2725. -3154.
  -2267. -2936. -2979. -3326. -3085.]
 [-3147. -2537. -2499. -1662. -2889. -3094. -1579. -3297. -2955. -2403.
  -2587. -3090. -2348. -2035. -2866.]
 [-1586. -2640. -2977. -3198. -2951. -2512. -3198. -2369. -2155. -3049.
  -3062. -1888. -3117. -2777. -2218.]
 [-3279. -3341. -2214. -2989. -2114. -2904. -3000. -3279. -3115. -2344.
  -3034. -3036. -2506. -2812. -2781.]], shape=(4, 15), dtype=float64)
labels_all tf.Tensor([2 0 3 1 3 0 1 0 2 3 0 2 1 1 2], shape=(15,), dtype=int64)


labels:  tf.Tensor([2 0 3 1 3 0 1 0 2 3 0 2 1 1 2], shape=(15,), dtype=int64)
targets: tf.Tensor([2 1 3 1 3 0 1 0 2 3 0 2 3 1 2], shape=(15,), dtype=int64)

Node    Prefix         gi2  gi3  gi4  gi5
r711    10.1.0.0/16    0    0    1    0
        10.2.0.0/16    1    0    0    0
        10.3.0.0/16    0    0    0    1
r712    10.1.0.0/16    0    1    0    0
        10.2.0.0/16    0    0    0    1
        10.3.0.0/16    1    0    0    0
r713    10.1.0.0/16    0    1    0    0
        10.2.0.0/16    1    0    0    0
        10.3.0.0/16    0    0    1    0
w701    10.1.0.0/16    0    0    0    1
        10.2.0.0/16    1    0    0    0
        10.3.0.0/16    0    0    1    0
w702    10.1.0.0/16    0    1    0    0
        10.2.0.0/16    0    1    0    0
        10.3.0.0/16    0    0    1    0
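Matching the printed `labels` vector against this table suggests how the 15 outputs map to routes: node-major order (r711, r712, r713, w701, w702), three prefixes per node, and each value selects one of gi2 to gi5. A sketch under that inferred mapping (the constants and function names are hypothetical) that rebuilds the one-hot table and lists where the chosen labels differ from the targets:

```python
NODES = ["r711", "r712", "r713", "w701", "w702"]
PREFIXES = ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"]
INTERFACES = ["gi2", "gi3", "gi4", "gi5"]

LABELS = [2, 0, 3, 1, 3, 0, 1, 0, 2, 3, 0, 2, 1, 1, 2]
TARGETS = [2, 1, 3, 1, 3, 0, 1, 0, 2, 3, 0, 2, 3, 1, 2]


def route_key(index):
    """Map a position in the 15-long vector to (node, prefix), node-major (inferred)."""
    return NODES[index // len(PREFIXES)], PREFIXES[index % len(PREFIXES)]


def routing_table(choices):
    """One-hot row over INTERFACES for every (node, prefix)."""
    if len(choices) != len(NODES) * len(PREFIXES):
        raise ValueError("expected one interface choice per (node, prefix)")
    return {
        route_key(i): [1 if k == choice else 0 for k in range(len(INTERFACES))]
        for i, choice in enumerate(choices)
    }


def disagreements(labels, targets):
    """(node, prefix, chosen interface, target interface) wherever they differ."""
    return [
        route_key(i) + (INTERFACES[chosen], INTERFACES[wanted])
        for i, (chosen, wanted) in enumerate(zip(labels, targets))
        if chosen != wanted
    ]


if __name__ == "__main__":
    for (node, prefix), row in routing_table(LABELS).items():
        print(node, prefix, row)
    print(disagreements(LABELS, TARGETS))
```

For the vectors above it reports two differences, r711 10.2.0.0/16 (gi2 chosen, gi3 targeted) and w702 10.1.0.0/16 (gi3 chosen, gi5 targeted); the other 13 routes agree.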


✓ YANG: & the data

✓ Telemetry: transport, senders, receiver, collector

✓ TSDB: the datastore

✓ Visualization, dashboards

✓ ML models: the monitor workers

✓ Production line: YANG > Telemetry > TSDB > ML


We have covered...


✓ Definition & Components

✓ Neural Networks

✓ ML Capabilities

✓ Engineering Process

✓ ML @CISCO

✓ Models, Training, Validation


Machine Learning

✓ Prediction & Accuracy

✓ Data Engineering

✓ ML Quality Assurance

✓ Case 1: Anomaly detection

✓ Case 2: Anomaly prediction

✓ Case 3: AI game


Complete your online session survey

• Please complete your session survey after each session. Your feedback is very important.

• Complete a minimum of 4 session surveys and the Overall Conference survey (starting on Thursday) to receive your Cisco Live t-shirt.

• All surveys can be taken in the Cisco Events Mobile App or by logging in to the Content Catalog on ciscolive.com/emea.

Cisco Live sessions will be available for viewing on demand after the event at ciscolive.com.


#CLEMEA

11:00

BRKOPS-1871 Automate your SW delivery process

09:00 Opening Keynote

17:00 Guest Keynote

18:30 Cisco Live Celebration

09:00

BRKNMS-2032 YANG Data Modeling and NETCONF: Cisco and Industry Developments

11:30

BRKOPS-2285 Programmability with IOS-XR Platforms

BRKSDN-2717 The hitchhiker's guide - Managing your Network as Code (DevOps)

08:30

BRKSDN-2379 13 steps from an unprogrammed to a fully automated network

14:45

BRKPRG-2482 Infrastructure as Code - Building, Deploying, Securing, Monitoring and Managing Robust and Repeatable Networks Using Code and APIs

08:30

BRKNMS-3021 Advanced Cisco IOS Device Instrumentation

14:30

BRKNMS-2285 How to be a hero with Cisco DNA Center Platform APIs

BRKOPS-2562 Data is the new Oil: The Nuts & Bolts of leveraging Cisco DNA Assurance data for creating value added services

17:00

BRKSDN-2497 Build Your API-Based NW Troubleshooting Kit

16:45

BRKOPS-2024 Wireless Automation & Assurance with Cisco DNA Center using APIs

11:00

PSOOPS-2236 Unlocking the power of open platform with Cisco DNA Center Platform

11:15

BRKOPS-3825 Interpreting streaming telemetry data using ML/AI

OPS Operations Track

www.ciscolive.com/emea/learn/technology-tracks/operations.html

Network Programmability


Continue your education


Related sessions

Walk-In Labs

Demos in the Cisco Showcase

Meet the Engineer 1:1 meetings

Thank you