TECDEV-2765 - Cisco Live
Transcript of TECDEV-2765 - Cisco Live
Frank Marsman, Systems Engineer
Mike Mikhail, Delivery Architect
Einar Nilsen-Nygaard, Principal Engineer
TECDEV-2765
YANG > Telemetry > Visualization > ML
From YANG to Machine Learning in 4 Hours!
Questions? Use Cisco Webex Teams to chat with the speaker after the session
How:
1. Find this session in the Cisco Events Mobile App
2. Click “Join the Discussion”
3. Install Webex Teams or go directly to the team space
4. Enter messages/questions in the team space
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public
Agenda
➢ YANG: & the data
➢ Telemetry: transport, senders, receiver, collector
➢ TSDB: the datastore
➢ Visualization, dashboards
➢ ML models: the monitor workers
➢ Production line: YANG > Telemetry > TSDB > ML
Machine Learning - subtopics
➢ Definition & Components
➢ Neural Networks
➢ ML Capabilities
➢ Engineering Process
➢ ML @CISCO
➢ Models, Training, Validation
➢ Prediction & Accuracy
➢ Data Engineering
➢ ML Quality Assurance
➢ Case 1: Anomaly detection
➢ Case 2: Anomaly prediction
➢ Case 3: AI game
The Team
• Frank Marsman, Systems Engineer, [email protected]
• Interests: Programmability, Machine Learning
• Mike Mikhail, Delivery Architect, [email protected]
• Interests: Automation, Machine Learning, NFV, SP technologies
• Einar Nilsen-Nygaard, Principal Engineer, [email protected]
• Interests: Programmability, Python, Access Policy, Telemetry
Terminology
Components of a working system
[Diagram: management applications act as the NETCONF client and use YANG models; a NETCONF session connects the client to the server, which hosts the YANG modules and the data stores (config & oper data).]
YANG is a data modeling language
• Readable by humans and machines
• Hierarchical, modular, and extensible
• https://datatracker.ietf.org/wg/netmod/documents/
“Yet Another Next Generation” (really!)
[Diagram: a YANG model is UTF-8 text that describes configuration data, operational data, actions (RPCs), and notifications.]
Two flavors of YANG data models
Open Models: industry definition (IETF, ITU, OpenConfig, etc.)
• Common functionality shared across vendors
• Example: ietf-diffserv-policy.yang (IETF Diffserv data model)
Native Models: vendor definition
• Unique to a vendor operating system or platform
• Example: Cisco-IOS-XR-ipv4-bgp-cfg.yang (IOS-XR BGP config data model)
Today, Open Models are a functional subset of Native Models.
YANG data models are hierarchical (trees)
• Example: Cisco XR OSPFv3 module
$ pyang -f tree [email protected]
module: Cisco-IOS-XR-ipv6-ospfv3-oper
  +--ro ospfv3
     +--ro processes
        +--ro process* [process-name]
           +--ro vrfs
           |  +--ro vrf* [vrf-name]
           |     +--ro vrf-name            xr:Cisco-ios-xr-string
           |     +--ro summary-prefixes
           |     |  +--ro summary-prefix*
           |     |     +--ro prefix?               inet:ipv6-address-no-zone
           |     |     +--ro prefix-length?        xr:Ipv6-prefix-length
           |     |     +--ro prefix-metric?        uint32
           |     |     +--ro prefix-metric-type?   Ospfv3-default-metric
           |     |     +--ro tag?                  uint32
           ... ... ...
Callouts on the tree:
• The .yang file was downloaded from the server (router)
• “module:” line: module name
• ospfv3: container
• process* [process-name]: list entry (note “*”)
• prefix?: leaf
• ro: read-only (operational) data
• xr:, inet: prefixes: type defined in another module
Where to find YANG models
• Retrieve from the YANG server, via NETCONF <get-schema>
• GitHub
• https://github.com/YangModels/yang/
• IETF, IEEE, Broadband Forum, and MEF draft and standard models
• Vendor models for Cisco, Ciena, Huawei, and Juniper
• https://github.com/openconfig/public/tree/master/release/models
• OpenConfig models
OpenConfig YANG models
• http://www.openconfig.net/
• https://github.com/openconfig/public
Vendor neutral
NETCONF YANG Module Capabilities
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.1</capability>
<capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
...
<capability>urn:ietf:params:xml:ns:yang:ietf-interfaces?
module=ietf-interfaces&revision=2014-05-08
&features=pre-provisioning,if-mib,arbitrary-names</capability>
</capabilities>
<session-id>4</session-id>
</hello>
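Each YANG module capability in the hello message packs the namespace, module name, revision, and features into one URI. As a minimal sketch (the helper name `parse_capability` is an assumption for illustration, not part of any NETCONF library), the string can be taken apart with the Python standard library:

```python
from urllib.parse import parse_qs


def parse_capability(uri):
    """Split a NETCONF capability URI into namespace, module, revision, features."""
    namespace, _, query = uri.partition("?")
    params = parse_qs(query)  # empty dict when the capability has no parameters
    features = params.get("features", [""])[0]
    return {
        "namespace": namespace,
        "module": params.get("module", [None])[0],
        "revision": params.get("revision", [None])[0],
        "features": [f for f in features.split(",") if f],
    }


# Capability string taken from the hello message above
cap = parse_capability(
    "urn:ietf:params:xml:ns:yang:ietf-interfaces?"
    "module=ietf-interfaces&revision=2014-05-08"
    "&features=pre-provisioning,if-mib,arbitrary-names"
)
print(cap["module"], cap["revision"], cap["features"])
```

Base capabilities such as urn:ietf:params:netconf:base:1.1 carry no query part, so the sketch returns None for module and revision in that case.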
Getting YANG Modules
<rpc message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<get-schema xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
<identifier>ietf-interfaces</identifier>
</get-schema>
</rpc>
<rpc-reply message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<data xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring">
module ietf-interfaces {
//ietf-interfaces yang module contents here ...
}
</data>
</rpc-reply>
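The exchange above can be sketched offline with the Python standard library. This is only an illustration of the message shapes: the function names are assumptions, the reply is the abbreviated one from the slide, and a real client would normally let a NETCONF library handle the session and framing.

```python
import xml.etree.ElementTree as ET

BASE_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
MON_NS = "urn:ietf:params:xml:ns:yang:ietf-netconf-monitoring"


def build_get_schema(identifier, message_id):
    """Build a <get-schema> RPC for the given module name (hypothetical helper)."""
    rpc = ET.Element(f"{{{BASE_NS}}}rpc", {"message-id": str(message_id)})
    get_schema = ET.SubElement(rpc, f"{{{MON_NS}}}get-schema")
    ET.SubElement(get_schema, f"{{{MON_NS}}}identifier").text = identifier
    return ET.tostring(rpc, encoding="unicode")


def extract_module(reply_xml):
    """Return the YANG module text carried in the <data> element of the reply."""
    data = ET.fromstring(reply_xml).find(f"{{{MON_NS}}}data")
    return data.text.strip() if data is not None and data.text else None


# Abbreviated reply, as shown on the slide
SAMPLE_REPLY = f"""
<rpc-reply message-id="102" xmlns="{BASE_NS}">
  <data xmlns="{MON_NS}">
    module ietf-interfaces {{
      //ietf-interfaces yang module contents here ...
    }}
  </data>
</rpc-reply>
"""

request = build_get_schema("ietf-interfaces", 102)
print(request)
print(extract_module(SAMPLE_REPLY))
```

ElementTree writes generated namespace prefixes (ns0, ns1) instead of default namespaces; the resulting XML is equivalent to the request shown on the slide.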
Cisco YANG models on github.com
• For your convenience!
• Grouped into subdirectories, per-OS, per-release
• Includes per-platform capabilities data
• Includes copies of all open models supported in each release
pyang: Extensible YANG validator and converter
• https://github.com/mbj4668/pyang/
• Open source (ISC license)
• Python based
• Usable as standalone tool or as part of a Python workflow
Validate & Display YANG Modules With pyang
https://github.com/mbj4668/pyang
$ pyang -f tree --tree-depth 5 Cisco-IOS-XR-l2-eth-infra-oper@2015-11-09.yang
Cisco-IOS-XR-l2-eth-infra-oper-sub1@2015-11-09.yang:11: warning: imported module Cisco-IOS-XR-types not used
[email protected]:11: warning: imported module Cisco-IOS-XR-types not used
module: Cisco-IOS-XR-l2-eth-infra-oper
  +--ro mac-accounting
  |  +--ro interfaces
  |     +--ro interface* [interface-name]
  |        +--ro interface-name       xr:Interface-name
  |        +--ro state
  |        |  +--ro is-ingress-enabled?         boolean
  |        |  +--ro is-egress-enabled?          boolean
  |        |  +--ro number-available-ingress?   uint32
  |        |  +--ro number-available-egress?    uint32
  |        |  +--ro number-available-on-node?   uint32
  |        +--ro ingress-statistic*
  |        |  +--ro mac-address?   yang:mac-address
  |        |  +--ro packets?       uint64
  |        |  +--ro bytes?         uint64
  |        +--ro egress-statistic*
  |           +--ro mac-address?   yang:mac-address
  |           +--ro packets?       uint64
  |           +--ro bytes?         uint64
  +--ro vlan
  |  +--ro nodes
  |     +--ro node* [node-id]
GET: Cisco-IOS-XR-l2-eth-infra-oper:vlan
GET: Cisco-IOS-XR-l2-eth-infra-oper:mac-accounting
Also have --tree-path
Or try jstree instead?
https://yangcatalog.org
A repository of YANG tools and the metadata around YANG models with the purpose of driving collaboration between authors and adoption with consumers.
YANG Catalog
Web-Based Searching of YANG Models
Yang Search: http://yangcatalog.org/yang-search/
• View model relationships
• Search for nodes
• Display model trees
• REST queries
“Pushing” More Data Really Does Work Better
[Charts: telemetry vs. SNMP on an NCS5516 (576×100GE). Panels: CPU load (%) by number of destinations (1 to 3); interface counters collected (thousands) at 5 s, 10 s, 15 s, and 20 s; time in seconds to collect all memory and interface counter data. Callouts: more counter data, reduction in CPU load, faster collection.]
How Do You See Telemetry?
1. Which Telemetry?
2. What to Observe?
3. How to Observe?
4. Time to Explore
Programmable Interface “Stack”
[Diagram labels: physical and virtual network infrastructure; device features (interface, BGP, QoS, ACL, …); data (configuration, operational); data models (YANG data model, open and native; SNMP); protocol / programmable interfaces (NETCONF, RESTCONF, gRPC). A second build of the slide adds “Collectors & Applications Visibility”.]
Data Model-driven Management
Model-Driven Telemetry
Telemetry: YANG Model Data Push – Dial-Out
Telemetry: YANG Model Data Push – Dial-In
IOS-XR Support Matrix
• Classic XR ASR9k: MDT support 6.1.1; data models YANG (native, OC); transport TCP, UDP (6.2.1); encoding GPB / GPB-KV / JSON (6.3.1)
• Evolved XR ASR9k: MDT support 6.1.1; data models YANG (native, OC); transport gRPC (dial-in, dial-out), TCP, UDP (6.2.1); encoding GPB / GPB-KV / JSON (6.3.1)
• NCS5500: MDT support 6.1.1; data models YANG (native, OC); transport gRPC (dial-in, dial-out), TCP, UDP (6.2.1); encoding GPB / GPB-KV / JSON (6.3.1)
• NCS6k: MDT support 6.1.3; data models YANG (native, OC); transport TCP, UDP (6.2.1), gRPC (mgmt port only, dial-in, dial-out, 6.5.1); encoding GPB / GPB-KV / JSON (6.3.1)
• gNMI (new): 6.5.1 6.5.1 6.5.1 (three of the four platforms; the blank cell was lost in extraction)
NX-OS Support Matrix
IOS-XE Support Matrix
Model Driven Configuration Management (platform: NETCONF / RESTCONF)
Switching
• CAT 3650 / 3850: 16.5+ / 16.7+
• CAT 9200L: 16.9+ / 16.9+
• CAT 9200: 16.9+ / 16.9+
• CAT 9300L: 16.9+ / 16.9+
• CAT 9300 / 9400 / 9500: 16.6+ / 16.7+
• CAT 9500H: 16.8+ / 16.8+
• CAT 9600: 16.11+ / 16.11+
Wireless
• CAT 9800-CL: 16.10+ / 16.11+
• CAT 9800-40/80: 16.10+ / 16.11+
Routing
• ISR 1000: 16.8+ / 16.8+
• ISR 4000: 16.3+ / 16.6+
• CSR 1000v: 16.3+ / 16.6+
• ASR 1000 Fixed: 16.3+ / 16.6+
• ASR 1000 Modular: 16.3+ / 16.6+
The rows below list values in slide order; blank cells were lost in extraction.
gNMI: 16.12+ 16.12+ 16.12+ 16.8+ 16.10+ 16.11+
Model Driven Telemetry
NETCONF Dial-In: 16.6+ 16.9+ 16.9+ 16.9+ 16.6+ 16.8+ 16.11+ 16.8+ 16.7+ 16.10+ 16.7+ 16.8+
gRPC Dial-Out: 16.10+ 16.10+ 16.10+ 16.10+ 16.10+ 16.11+ 16.10+ 16.10+ 16.10+ 16.10+ 16.10+
IOS-XE Support Matrix
Platforms: IoT (IR1100, ESR6300, IE3x00, ESS 3300); SPAG (ASR 900 / 920, NCS 520, NCS 4200); Cable (cBR-8)
The rows below list values in slide order; blank cells were lost in extraction.
Model Driven Configuration Management
NETCONF: 16.10+ 16.11+ 16.11+ 16.8+ 16.10+ 16.8+ 16.8+
RESTCONF: 16.10+ 16.11+ 16.11+ 16.8+ 16.10+ 16.8+ 16.8+
Model Driven Telemetry
NETCONF Dial-In: 16.10+ 16.9+ 16.10+ 16.9+ 16.9+
gRPC Dial-Out: 16.10+ 16.10+ 16.10+ 16.10+ 16.10+
Telemetry: Node Configuration
IOS-XR example
RP/0/RP0/CPU0:PE125#show running-config telemetry model-driven
Fri Apr 21 21:10:32.469 EDT
telemetry model-driven
 destination-group COLL1
  address-family ipv4 192.168.30.101 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
  address-family ipv4 192.168.30.102 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
 !
 destination-group COLL-ROUTING
  address-family ipv4 192.168.30.101 port 2103
   encoding self-describing-gpb
   protocol tcp
  !
.
Telemetry: Node Configuration - Continued
IOS-XR example
.
 !
 sensor-group YD1
  sensor-path Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces
  sensor-path Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters
 !
 sensor-group YD-ROUTING
  sensor-path Cisco-IOS-XR-fib-common-oper:fib
  sensor-path Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes
 !
 subscription SUB1
  sensor-group-id YD1 sample-interval 30000
  destination-id COLL1
 !
 subscription SUB-ROUTING
  sensor-group-id YD-ROUTING sample-interval 30000
  destination-id COLL-ROUTING
 !
!
Telemetry: Node Operation
IOS-XR example
RP/0/RP0/CPU0:PE125#show telemetry model-driven subscription SUB-ROUTING
Fri Apr 21 21:31:27.002 EDT
Subscription:  SUB-ROUTING
-------------
  State:       ACTIVE
  Sensor groups:
  Id: YD-ROUTING
    Sample Interval:      30000 ms
    Sensor Path:          Cisco-IOS-XR-fib-common-oper:fib
    Sensor Path State:    Resolved
    Sensor Path:          Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes
    Sensor Path State:    Resolved
  Destination Groups:
  Group Id: COLL-ROUTING
    Destination IP:       192.168.30.101
    Destination Port:     2103
    Encoding:             self-describing-gpb
    Transport:            tcp
    State:                Active
    No TLS
    Total bytes sent:     2256783989
    Total packets sent:   32568
    Last Sent time:       2017-04-21 21:31:20.1422343376 -0400
.
Telemetry: Node Operation - Continued
IOS-XR example
.
  Collection Groups:
  ------------------
    Id: 7
    Sample Interval:      30000 ms
    Encoding:             self-describing-gpb
    Num of collection:    607
    Collection time:      Min: 48 ms Max: 559 ms
    Total time:           Min: 280 ms Avg: 18724 ms Max: 21364 ms
    Total Deferred:       0
    Total Send Errors:    0
    Total Send Drops:     0
    Total Other Errors:   4248
    Last Collection Start: 2017-04-21 21:30:55.1397666376 -0400
    Last Collection End:   2017-04-21 21:31:15.1417238376 -0400
    Sensor Path:          Cisco-IOS-XR-fib-common-oper:fib

    Id: 8
    Sample Interval:      30000 ms
    Encoding:             self-describing-gpb
    Num of collection:    450
    Collection time:      Min: 44 ms Max: 180 ms
    Total time:           Min: 103 ms Avg: 154 ms Max: 590 ms
    Total Deferred:       0
    Total Send Errors:    0
.
Telemetry – Node configuration
IOS XE example
csr1000v#show run | section telemetry
telemetry ietf subscription 22
encoding encode-kvgpb
filter xpath /memory-ios-xe-oper:memory-statistics/memory-statistic/free-memory
stream yang-push
update-policy periodic 30000
receiver ip address 192.168.222.43 57500 protocol grpc-tcp
Telemetry: Node Operation
IOS XE example
csr1000v#show telemetry ietf subscription 22 detail
Telemetry subscription detail:
Subscription ID: 22
Type: Configured
State: Valid
Stream: yang-push
Filter:
Filter type: xpath
XPath: /memory-ios-xe-oper:memory-statistics/memory-statistic/free-memory
Update policy:
Update Trigger: periodic
Period: 100
Encoding: encode-kvgpb
Source VRF:
Source Address:
Notes:
Receivers:
Address Port Protocol Protocol Profil
------------------------------------------------------------------
192.168.222.43 57500 grpc-tcp
Telemetry – Node Configuration
NX-OS Example
telemetry
  destination-group 1
    ip address 172.31.219.148 port 50001 protocol gRPC encoding GPB
  sensor-group 1
    path sys/bgp/ depth unbounded
    path sys/intf depth unbounded
  sensor-group 2
    data-source NX-API
    path "show environment power" depth 0
    path "show processes cpu sort" depth 0
  subscription 1
    dst-grp 1
    snsr-grp 1 sample-interval 30000
  subscription 2
    dst-grp 1
    snsr-grp 2 sample-interval 60000
Remember the Hardware!
UCS C480 ML: Storage, processing, memory
Designed for Model Training
• Eight NVIDIA Tesla V100s
• NVIDIA NVLink Interconnect
• Up to 24 Disks; RAID Controller
• Up to 6 NVMe Drives
• Network: Up to 100GB
• High Availability
GPUs: 8 × V100 32GB, NVLink Interconnect
C220/240’s for Dev & Testing
Pre-packaged: ELK Stack
• Elasticsearch + Logstash + Kibana
• ELK stack + Cisco proto @ https://github.com/cisco/bigmuddy-network-telemetry-stacks
• 3 programs:
• Logstash: Data/log receiver << data goes there
• Elasticsearch: Data extractor << the engine indexing and sorting
• Kibana: Data visualization << rendering in several formats
• Easy steps:
• Clone the code
• Script installs in 3 Docker “containers”
• Run, Logstash listens on TCP 2103 (default)
• Customize Kibana “visualizations”, save, accessible via http: links
• Optionally organize dashboards, save
ELK Stack Components
Elasticsearch/Logstash/Kibana containers, and receiving port
cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ sudo docker ps
CONTAINER ID   IMAGE                 COMMAND                  CREATED        STATUS        PORTS                    NAMES
94d64b0f0523   logstash:2.3.1        "/bin/sh -c '/star..."   27 hours ago   Up 27 hours                            stack_elk_logstash
daad55f9db47   kibana:4.5.0          "/bin/sh -c '/star..."   27 hours ago   Up 27 hours   0.0.0.0:5601->5601/tcp   stack_elk_kibana
8f4c4ec193fc   elasticsearch:2.3.1   "/docker-entrypoin..."   27 hours ago   Up 27 hours                            stack_elk_elasticsearch
cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ netstat -n | grep 2103
tcp6 0 0 192.168.30.101:2103 10.100.25.25:18566 ESTABLISHED
tcp6 0 0 192.168.30.101:2103 10.100.25.25:22514 ESTABLISHED
tcp6 0 0 192.168.30.101:2103 10.100.24.24:46247 ESTABLISHED
cisco@mamikhai-ubuntu:/var/local/stack_elk/logstash_data$ netstat -ln | grep 2103
tcp6 0 0 :::2103 :::* LISTEN
udp6 0 0 :::2103 :::*
Prepackaged: Telemetry Collection Stack
• Receiver >> visualization/dashboards, notifications
• https://xrdocs.io/telemetry/tutorials/2018-06-04-ios-xr-telemetry-collection-stack-intro/
• https://github.com/vosipchu/XR_TCS
• Used in use case 1 (ML Anomaly Detection)
Receiver connected
cisco@mamikhai-ubuntu:/opt/ML$ netstat -n | grep 5432
tcp6 0 0 192.168.30.101:5432 192.168.30.137:41598 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.124:29437 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.135:55154 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.137:34688 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.125:22002 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.135:33199 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.124:25891 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.112:21773 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.178:20541 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.135:51278 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.112:30979 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.124:37490 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.137:54524 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.125:44535 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.124:32766 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.178:54599 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.137:31732 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.125:31449 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.112:51587 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.112:40991 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.135:16006 ESTABLISHED
tcp6 0 0 192.168.30.101:5432 192.168.30.178:16672 ESTABLISHED
.
Pipeline dump – receiving data
cisco@mamikhai-ubuntu:/opt/ML$ pipeline troubleshooting start
cisco@mamikhai-ubuntu:/opt/ML$ more ~/analytics/pipeline/bin/dump.txt
------- 2019-12-08 14:03:16.973093852 -0500 EST -------
Summary: GPB(common) Message [192.168.30.125:21743(PE125)/Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global msg len: 1549]
{
    "Source": "192.168.30.125:21743",
    "Telemetry": {
        "node_id_str": "PE125",
        "subscription_id_str": "routing",
        "encoding_path": "Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global",
        "collection_id": 3007999,
        "collection_start_time": 1575831781969,
        "msg_timestamp": 1575831781969,
        "collection_end_time": 1575831781972
    },
    "Rows": [
        {
            "Timestamp": 1575831781971,
            "Keys": {
                "instance-name": "ISIS"
.
Pipeline dump – receiving data - continued
            },
            "Content": {
                "per-area-data": {
                    "level": "isis-level2",
                    "per-topology-data": {
                        "id": {
                            "af-name": "ipv4",
                            "saf-name": "unicast",
                            "topology-name": "",
                            "vrf-name": ""
                        },
                        "statistics": {
                            "ispf-run-count": 0,
                            "nhc-run-count": 7,
                            "periodic-run-count": 9127,
                            "prc-run-count": 5,
                            "spf-run-count": 9234
                        }
                    },
                    "statistics": {
                        "system-lsp-build-count": 12,
                        "system-lsp-refresh-count": 10436
                    }
                },
                "statistics": {
                    "avg-csnp-process-time": {
                        "nano-seconds": 69306,
                        "seconds": 0
                    },
                    "avg-csnp-recv-rate": 2,
                    "avg-csnp-send-rate": 2,
                    "avg-csnp-transmit-time": {
                        "nano-seconds": 43261,
                        "seconds": 0
                    },
                    "avg-hello-process-time": {
                        "nano-seconds": 17370,
                        "seconds": 0
                    },
                    "avg-hello-recv-rate": 0,
                    "avg-hello-send-rate": 0,
                    "avg-hello-transmit-time": {
                        "nano-seconds": 65944,
                        "seconds": 0
.
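A collector has to turn each nested message like the one in the dump into flat records before they fit a time series database. The sketch below is one possible way to do that, not the method the pipeline tool itself uses; the function names and the "Producer" tag layout are assumptions, and the sample message is a shortened copy of the dump.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {'a/b/c': value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def rows_to_points(message):
    """Yield one flat record per row: measurement, tags, timestamp, fields."""
    telemetry = message["Telemetry"]
    for row in message["Rows"]:
        yield {
            "measurement": telemetry["encoding_path"],
            "tags": {"Producer": telemetry["node_id_str"], **row["Keys"]},
            "time_ms": row["Timestamp"],
            "fields": flatten(row["Content"]),
        }


# Shortened copy of the dumped message
message = {
    "Source": "192.168.30.125:21743",
    "Telemetry": {
        "node_id_str": "PE125",
        "encoding_path": "Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global",
    },
    "Rows": [{
        "Timestamp": 1575831781971,
        "Keys": {"instance-name": "ISIS"},
        "Content": {
            "per-area-data": {
                "level": "isis-level2",
                "per-topology-data": {"statistics": {"spf-run-count": 9234}},
            },
            "statistics": {"avg-csnp-recv-rate": 2},
        },
    }],
}

points = list(rows_to_points(message))
print(points[0]["tags"])
print(points[0]["fields"])
```

Keys become tags and leaf values become fields, which matches the way the later queries filter on "Producer" and "interface-name".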
Characteristics of TSDB
• Records are time-stamped or time-series
• Suitable for volatile value records: Financials, weather, sensory data
• Can be aggregated, compressed, sampled over time
• Data lifecycle management
• Efficient and flexible time-based retrieval
• Recently gaining popularity faster than other db categories
Time-referenced records, compression, retention, retrieval
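Two of the characteristics above, aggregation over time and data lifecycle management, can be illustrated in a few lines of plain Python. This is a toy sketch with made-up samples and hypothetical function names; a real TSDB does the same work with continuous queries and retention policies.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def apply_retention(samples, now, max_age_seconds):
    """Drop samples older than the retention window."""
    return [(ts, v) for ts, v in samples if now - ts <= max_age_seconds]


# Made-up samples taken every 30 seconds
samples = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 70.0)]

print(downsample(samples, 60))                        # [(0, 15.0), (60, 40.0), (120, 70.0)]
print(apply_retention(samples, now=120, max_age_seconds=60))
```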
InfluxDB is optimized for collecting, storing, retrieving & processing of time series data
Characteristics of time series data
● All time-stamped data (metrics, logs, traces)
● Huge volumes of data
● Push and pull collection methods
● Real-time processing (aggregations, alerts, analytics)
● Time-sensitive life cycle (roll-ups, long-term storage, eviction)
● High variety of semi-structured data
Time series databases are taking center stage in modern IT; InfluxDB leads the segment.
~125% increase in DB-Engines score over 24 months
InfluxDB, Open Source TSDB
• https://github.com/influxdata/influxdb
• Part of a packaged telemetry stack https://github.com/vosipchu/XR_TCS
cisco@mamikhai-ubuntu:~$ influx -execute "show diagnostics"
name: build
Branch Build Time Commit                                   Version
------ ---------- ------                                   -------
1.5               cdae4ccde4c67c3390d8ae8a1a06bd3b4cdce5c5 1.5.1
.
name: system
PID  currentTime                    started                        uptime
---  -----------                    -------                        ------
4407 2019-12-15T00:39:02.112022443Z 2019-09-04T20:03:01.975987784Z 2428h36m0.136034659s
cisco@mamikhai-ubuntu:~$
cisco@mamikhai-ubuntu:~$ influx -execute "show measurements" -database="mdt_db"
name: measurements
name
----
Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/levels/level/adjacencies/adjacency
Cisco-IOS-XR-clns-isis-oper:isis/instances/instance/statistics-global
Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters
Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/protocol/isis/as/information
Cisco-IOS-XR-ip-rsvp-oper:rsvp/counters/interface-messages/interface-message
.
Database performance
Python query, return a dataframe
#!/usr/bin/env python

from influxdb import InfluxDBClient

# Connect to the local InfluxDB instance that holds the telemetry data
client = InfluxDBClient(host='localhost', port=8086, database='mdt_db')

# Bits per second sent on each tunnel interface of PE124 over the last hour
data = client.query(
    'SELECT non_negative_derivative("bytes-sent", 1s) * 8 '
    'FROM "Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters" '
    'WHERE ("Producer" =~ /^PE124$/) AND ("interface-name" =~ /^tunnel/) '
    'AND time >= now() - 1h GROUP BY "interface-name" LIMIT 120'
)
print(data)

cisco@mamikhai-ubuntu:/opt/ML$ sudo ./read.py
ResultSet({'(u'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters', {u'interface-name': u'tunnel-te12400'})': [
{u'non_negative_derivative': 130.66666666666666, u'time': u'2019-12-08T12:31:19.948Z'},
{u'non_negative_derivative': 392, u'time': u'2019-12-08T12:31:49.948Z'},
{u'non_negative_derivative': 213.5644059323446, u'time': u'2019-12-08T12:32:19.953Z'},
{u'non_negative_derivative': 261.10148019735965, u'time': u'2019-12-08T12:32:49.949Z'},
{u'non_negative_derivative': 2577.0666666666666, u'time': u'2019-12-08T12:33:19.949Z'},
{u'non_negative_derivative': 0, u'time': u'2019-12-08T12:33:49.948Z'},
{u'non_negative_derivative': 0, u'time': u'2019-12-08T12:34:19.951Z
.
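The query turns a cumulative byte counter into a bit rate. As a rough sketch of what non_negative_derivative(..., 1s) * 8 computes, the same arithmetic can be done by hand; the counter values below are made up (chosen so that the first two rates match the first two values in the output above), and the function name is hypothetical.

```python
def non_negative_rate_bps(samples):
    """Convert (time_seconds, byte_counter) samples into (time, bits per second).

    Negative deltas (for example after a counter reset) are dropped,
    which is the "non-negative" part of the derivative.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0 or t1 <= t0:
            continue
        rates.append((t1, delta / (t1 - t0) * 8))  # bytes/s -> bits/s
    return rates


# Made-up counter readings at the 30-second sample interval
samples = [(0, 1000), (30, 1490), (60, 2960), (90, 100), (120, 100)]
for t, bps in non_negative_rate_bps(samples):
    print(t, round(bps, 2))
```

The reading at t=90 is lower than the previous one, so no rate is emitted for that interval.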
Visualize a query
Dashboard: organized collection of visualizations
Dashboard: JSON
{
  "annotations": {
    "list": [
      {
        "$$hashKey": "object:83",
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 12,
  "iteration": 1575818163044,
  "links": [],
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 64,
      "panels": [],
      "repeat": null,
      "title": "Summary",
      "type": "row"
    },
    {
      "cacheTimeout": null,
      "colorBackground": false,
      "colorValue": false,
      "colors": [
        "rgba(50, 172, 45, 0.97)",
        "rgba(237, 129, 40, 0.89)",
        "rgba(245, 54, 54, 0.9)"
      ],
      "datasource": "InfluxDB",
      "format": "none",
.
How to work with Neural Networks
• What is Machine Learning?
• What is a Neural Network
• Essentials
• Hyperparameters & weights
• How to Train
• Training parameters
• How training works
• Demo
AI Using ML vs Expert Systems
In an Expert System, the expert's full knowledge is captured, digitized, and used in decision making. The expert specifies every step taken to reach a decision, the basis for each step, and how to handle exceptions.
In a machine-learned solution, the expert is asked only for the decision on each training example. A "Supervised Learning" algorithm then learns, from all the available data, to mimic the expert's end behaviour.
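The contrast can be made concrete with a toy example. Everything here is hypothetical: the CPU readings, the labels, and the 80% rule are invented for illustration, and the "learning" is the simplest possible one-threshold search rather than any method from the session.

```python
def expert_rule(cpu_percent):
    """Expert system: the expert writes the rule down explicitly (hypothetical rule)."""
    return cpu_percent > 80


def learn_threshold(examples):
    """Supervised learning: find the threshold that best reproduces the expert's labels."""
    best_threshold, best_correct = None, -1
    for threshold in sorted(value for value, _ in examples):
        correct = sum((value > threshold) == label for value, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# The expert only labels examples: (CPU %, "is this a problem?")
examples = [(20, False), (45, False), (70, False), (85, True), (90, True), (99, True)]

threshold = learn_threshold(examples)
print(threshold)                                   # 70
print(all((v > threshold) == label for v, label in examples))
```

The learned threshold (70) differs from the expert's stated rule (80) yet agrees with the expert on every labelled example, which is all a supervised learner is asked to do.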
ML Basic Definition
• Traditional Programming: input + developer logic → output
• ML Training: input + output → logic
ML Process & Use
• ML Training: input + output → model
• ML Production: input + model → output
Data Science Terms
• Data Science – using data to solve problems
• AI – Teaching computers to solve problems
• ML – Computers teaching themselves
• Supervised – known outcome
• Unsupervised – unknown outcome
• DL – Artificial neural network
Source: https://qph.fs.quoracdn.net/main-qimg-cf42db79eb79239884a29568fcc24002-c
Supervised Learning
Source: https://aldro61.github.io/microbiome-summer-school-2017/figures/figure.classification.vs.regression.png
Why leverage ML
• There is data available
• People make mistakes
• Is it an arduous or boring task?
• Requires constant attention
• Is it a difficult decision or prediction?
• Can human bias impact the outcome?
• What’s the motivation?
• Is there an opportunity for efficiency or cost savings?
• Can it grow the business?
What does Machine Learning technically do?
• Defined: An algorithmic approach to iteratively tuning the parameters of a statistical model to achieve the best estimated model; effectively, training rather than explicitly programming.
• Needs three things:
• Input data – Observations and their associated features related to some phenomenon
• Outcomes – A feature we are trying to predict (supervised learning)
• Measure of success – A measurement of how well our model predicts the outcome using the input data that can be used as a feedback signal
• The main problem in ML/DL – how do we transform our data in a meaningful way to learn this relationship?
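The three ingredients listed above (input data, outcomes, and a measure of success used as a feedback signal) can be seen in the smallest possible model. The sketch below fits a straight line with gradient descent on made-up data; the learning rate and epoch count are arbitrary illustrative choices.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Iteratively tune w and b of y = w*x + b to minimise the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= learning_rate * grad_w   # the feedback signal nudges each parameter
        b -= learning_rate * grad_b
    mse = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, mse


xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # input data (one feature per observation)
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # outcomes, generated from y = 2x + 1

w, b, mse = train_linear(xs, ys)
print(round(w, 3), round(b, 3))   # 2.0 1.0
```

Nobody programmed the values 2 and 1; they were recovered from examples alone.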
What is a neural network?
• Computing systems inspired by the biological neural networks that constitute animal brains.
• They all boil down to the same thing: a network takes an input (a list of numbers) and uses it to generate an output (another list of numbers).
• The input could be the pixel values of an image; the output could be an indicator of the content of that image.
Parameters and Hyperparameters of a NN
The hyperparameters of a network define its overall structure and are fixed:
• Number of input nodes
• Number of output nodes
• Number of hidden layers and number of neurons in each hidden layer
• Activation functions of layers
• …
The parameters of the network are not fixed and are tuned during the training phase.
The number of input and output nodes is often dictated by the dataset. However, the number of hidden layers is up to the user. Too many hidden layers leads to overfitting; too few leads to poor performance!
Overfitting – An illustration
Source: https://medium.com/greyatom/what-is-underfitting-and-overfitting-in-machine-learning-and-how-to-deal-with-it-6803a989c76
As a rule of thumb, overfitting is unlikely if your dataset is at least twice as large as the number of parameters (weights) in your network!
Parameters and Hyperparameters of a NN
TECDEV-2765 106
The parameters of a network define its behavior;
Each line between two nodes (neurons) represents a connection between them. Each connection has a weight; a floating point number.
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public
Parameters and Hyperparameters of a NN
TECDEV-2765 107
The parameters of a network define its behavior;
The state of a layer in the network depends only on the state of the previous layer, and is computed using the weights of the connections to that layer.
The weights of the network are tuned during training!
How a Neural Network works
Let’s see how a Neural Network computes the output from a given input
[Figure: a network with input, hidden and output layers]
Hyperparameters:
• Two input nodes
• One hidden layer, containing three nodes
• One output node
• Activation function: Sigmoid (we will see this later)
Inputs: 0.2, 0.5
Weights into the first hidden node: 0.8, -0.2
First hidden node: 0.8 * 0.2 + 0.5 * (-0.2) = 0.16 - 0.1 = 0.06
Hidden-layer sums: 0.06, 3.96, 0.50
Activation Functions
How a Neural Network works
Inputs: 0.2, 0.5
Hidden-layer sums: 0.06, 3.96, 0.50
Sigmoid(0.06) = 0.51499…
Sigmoid(3.96) = 0.98129…
Sigmoid(0.50) = 0.62245…
Hidden-layer activations: 0.51, 0.98, 0.62
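The arithmetic on these slides can be reproduced with a few lines of Python. This is a minimal sketch: only the two weights into the first hidden node (0.8 and -0.2) are visible on the slides, so the other two hidden-node sums are taken as given rather than computed.

```python
import math


def sigmoid(x):
    """Logistic activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [0.2, 0.5]
weights_to_first_hidden = [0.8, -0.2]

# Weighted sum for the first hidden node: 0.8 * 0.2 + (-0.2) * 0.5
weighted_sum = sum(w * x for w, x in zip(weights_to_first_hidden, inputs))

# The other two sums (3.96 and 0.50) are copied from the slide.
hidden_sums = [weighted_sum, 3.96, 0.50]
activations = [sigmoid(s) for s in hidden_sums]

print(round(weighted_sum, 2), [round(a, 2) for a in activations])
```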
Source: https://qph.fs.quoracdn.net/main-qimg-8a19e73bffab9a7f6eab55fd5b47c00a
Source: https://cdn-images-1.medium.com/max/2000/1*cuTSPlTq0a_327iTPJyD-Q.png
A variety of different ANN architectures exist to solve different types of problems…
For example:
• Convolutional NNs for computer vision
• Recurrent NNs for natural language processing
• Long Short-Term Memory (LSTM) NNs for time series
Source: https://www.mdpi.com/applsci/applsci-09-03169/article_deploy/html/images/applsci-09-03169-g001.png
MNIST Dataset
• Contains 70,000 handwritten digits (60,000 for training, 10,000 for testing)
• Each digit is 28 * 28 pixels
Example of Neural Network classifying handwritten digits
Neural Network to classify the digits
• 1 input layer of 28 * 28 = 784 neurons
• 3 hidden layers, having a total of 10,000 neurons
• 1 output layer of 10 neurons
Example of Neural Network classifying handwritten digits
How to work with Neural Networks
• What is Machine Learning?
• What is a Neural Network
• Essentials
• Hyperparameters & weights
• How to Train
• Training parameters
• How training works
• Demo
Training parameters are considered hyperparameters of the system. That means they do not change.
However, during training, the network parameters are continuously changed!
Training parameters
Training parameters define how the network weights are tuned during training. Some training parameters are:
• Learning rate
• Learning rate decay
• Batch size
• …
Training parameters
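As an illustration of how these three training parameters interact, here is a small sketch. The exponential decay schedule and the helper names are assumptions chosen for the example; the deck does not say which decay schedule it uses.

```python
def decayed_learning_rate(initial_rate, decay, epoch):
    """Exponential decay (one common schedule): the rate shrinks by `decay` each epoch."""
    return initial_rate * (decay ** epoch)


def batches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # stand-in for ten training examples
for epoch in range(3):
    rate = decayed_learning_rate(0.1, 0.5, epoch)
    sizes = [len(batch) for batch in batches(samples, 4)]
    print(epoch, rate, sizes)
```

Each batch would drive one weight update, scaled by the current learning rate.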
In our demo, we will train a Neural Network to classify the species of a flower based on the sizes of the sepals and petals.
Three species, so three output nodes!
How training works - Example
Four measurements, so four input nodes!
How training works - Example
Input measurements: 5.1, 3.5, 1.4, 0.2
Output classes, one-hot encoded:
• Setosa = (1, 0, 0)
• Versicolor = (0, 1, 0)
• Virginica = (0, 0, 1)
Training example: (5.1, 3.5, 1.4, 0.2) -> (1, 0, 0)
How training works - Example
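The species-to-vector mapping is a one-hot encoding. A minimal sketch follows; the helper names `one_hot` and `decode` are hypothetical and not taken from the demo code.

```python
SPECIES = ["Setosa", "Versicolor", "Virginica"]


def one_hot(species):
    """Encode a species name as a tuple with a single 1 at its index."""
    return tuple(1 if name == species else 0 for name in SPECIES)


def decode(outputs):
    """Pick the species whose output node has the highest value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return SPECIES[best_index]


print(one_hot("Setosa"))
print(decode((0.4, 0.1, 0.2)))
```

Decoding by the largest output is what turns the three output node values back into a single predicted class.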
During training (input -> output):
Before the training step: (5.1, 3.5, 1.4, 0.2) -> (0.3, 0.2, 0.3)
*training step happens*
After the training step: (5.1, 3.5, 1.4, 0.2) -> (0.4, 0.1, 0.2)
The output moves toward the target (1, 0, 0) for Setosa.
How training works - Example
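To show what a single training step does numerically, here is a deliberately simplified sketch: one layer of sigmoid outputs, no hidden layer and no biases, trained by gradient descent on squared error. The starting weights and learning rate are made up, so the numbers will not match the slide; only the direction of change is the point.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, inputs):
    """One output per weight row: sigmoid of the weighted sum of the inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def training_step(weights, inputs, target, learning_rate):
    """One gradient-descent step on the loss 0.5 * sum((output - target)^2)."""
    outputs = forward(weights, inputs)
    new_weights = []
    for row, out, tgt in zip(weights, outputs, target):
        delta = (out - tgt) * out * (1.0 - out)  # dLoss/d(weighted sum)
        new_weights.append([w - learning_rate * delta * x for w, x in zip(row, inputs)])
    return new_weights


inputs = [5.1, 3.5, 1.4, 0.2]
target = [1, 0, 0]                       # Setosa, one-hot encoded
weights = [[0.0] * 4 for _ in range(3)]  # hypothetical starting weights

before = forward(weights, inputs)
weights = training_step(weights, inputs, target, learning_rate=0.1)
after = forward(weights, inputs)
print([round(v, 2) for v in before], "->", [round(v, 2) for v in after])
```

After the step, the first output rises toward 1 and the other two fall toward 0.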
The dataset will be split into two parts:
• Training set: used to train the network
• Test set: used to evaluate the performance of the network
How training works – Test set and Training set
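A split like this can be sketched with the standard library alone. The 20% test fraction and the fixed seed are assumptions for illustration; the deck does not state the ratio used in the demo.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction as the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# 150 stand-in sample indices, for example.
train_set, test_set = train_test_split(range(150))
print(len(train_set), len(test_set))
```

Shuffling before splitting matters when the data is sorted by class, otherwise the test set could contain only one species.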
What can a neural network do?
Adversarial attacks on Neural Networks
Want to practice?
• MNIST handwritten digits at http://yann.lecun.com/exdb/mnist/
• CAIDA network-related data sets at http://www.caida.org/data/overview/
• Wide variety of datasets at https://www.kaggle.com/datasets
• Variety of Pcap files at https://www.netresec.com/?page=PcapFiles
• NOAA weather data sets at https://www.ncdc.noaa.gov/cdo-web/datasets
• Wide variety of datasets from Deep Learning at http://deeplearning.net/datasets/
• BGP datasets at http://www.sfu.ca/~ljilja/cnl/projects/BGP_datasets/index.html
Publicly Available Datasets
Machine Learning/Deep Learning at Cisco Systems
• Cisco Encrypted Traffic Analysis
• Malware detection
• Cisco DNA Analytics
• Optimizes network functions across users
• Cisco Intersight
• Detect issues and connect with TAC
• Cisco Spark Assistant
• Meeting automation and optimization
• Cisco Crosswork
• Suite
Source: https://png.kisspng.com/20180401/jdw/kisspng-cisco-systems-router-data-center-computer-software-ai-5ac115ef7ef1f6.91676325152260350352.png
CX: Telemetry >> AI / ML
Identifying malicious encrypted traffic
[Figure: client/server packet exchanges (sent packets, received packets, src and dst) fed to a model, shown for three flows: Google Search Page Download, Initiate Command and Control, Exfiltration and Keylogging]
Packet lengths, arrival times and durations tend to be inherently different for malware than for benign traffic.
Supervised ML for root cause probability
Unsupervised ML for event clustering
Notify to similar “situations”
Machine Learning: Intelligent Control at Scale
• Supervised ML for root cause probability
  • Learn from operators' marking of causal alerts
• Unsupervised ML for grouping alerts into “Situations” [clustering], and noise reduction
  • Recognize patterns in reported events
  • Assign/adjust significance to events
Process Phases for Applied ML/DL - CRISP-DM
Business Understanding
• The most important phase!
• The right answers to the wrong questions
• Framing business problems as data problems
• Decomposing large problems to smaller ones
• Defining baselines
• Success criteria/Expected Value
Source: https://upload.wikimedia.org/wikipedia/commons/b/b9/CRISP-DM_Process_Diagram.png
Data Understanding
• The second most important phase!
• Strengths/limitations of the “raw material”
• Data cost and benefits
• Collect, describe, explore, verify - EDA
• Data needs in Expected Value context
Data Preparation
• Selecting data/Data integration
• Feature extraction/engineering
• Cleaning data based on EDA
• Standardization/Normalization
• Missing values
• Logic checks
Modeling
• Supervised/unsupervised learning
• Model selection
• Classification or Regression?
• Prediction or Inference?
• Accuracy or Interpretability?
• Model assumptions
• Model comparison/assessment
• Parameter tuning
Evaluation
• Does selected model(s) satisfy original goal(s)?
• Staged deployment or test/control live deployment?
• Stakeholder sign-off
• Regulatory concerns
• Accuracy or Interpretability
• Go or no-go?
Deployment
• Integration in information system/business process
• Model socialization
• Monitoring/maintenance
• Automation in production
• Documentation/reporting
• Debrief
• Rinse and repeat! Welcome to the life of a data scientist!
Production Line
Pipeline stages:
• YANG data
• Telemetry streamer
• Receiver / Collector
• Bus / Pipeline
• Time Series DB
• ML models, AI
• Visualization
• Orchestration
Case 1: “Eyes on business comms”
Anomaly Detection
https://github.com/mikemikhail/ML-anomaly_detection
https://github.com/mikemikhail/ML-anomaly_detection-demo
Data Source
Business app data
[Figure: stores (112, 135, 137, 178) and data centers (DC 124, DC 125) in two time zones (timezone, timezone -1h), connected by 18 unidirectional communication paths]
Visualized – 8 days
Visualized – 2 business hours
Visualized – 2 off hours
Why?
Affordable and fast DATA via Telemetry
NETCONF
Netflow
SNMP
Syslog
db
ML versus Designed Logic Systems
• ML is typically cheaper, faster to achieve desired behavior
➢ Eliminate the need to design the logic. Only design input/output data, model architecture, parameters, system. Sometimes start by mimicking manual/human process characteristics. Lots of experimentation & testing
➢ Cheaper and faster debug/update cycle. Mostly experimentation & testing
➢ No perfect design. Actually perfection is undesirable
• Designed Logic is needed for:
➢ Absolutely clear and universal production (but is there, really?!)
➢ Has to be perfect the first time (e.g. a new probe on a planet never explored before)
Where, and why ML is advantageous
Next... Evolution
• Add more data: quantity, types
• Gets “smarter” with: more historical data, weather/precipitation, seasonal/holidays, promotions, price fluctuations, …!
• When “smart” enough, can predict impact of…, design/modeling tool
• Within resources, and aware of ROI
• ML(ML) (future)
• ML models to decide on ML models and attributes
• Models of models (future)
• Sticking models together, will synergize
This is just a start… What will you make possible?!
What’s an ML Model? Supervised example
A tiny trainable brain
[Figure: three uses of the same model]
• TRAIN: DATA –t2 > MODEL > INFERENCE, compare with LABELS –t2 > loss
• Validate: DATA –t1 > MODEL > INFERENCE, compare with LABELS –t1 > ✔︎, RMSE
• Predict: DATA 0 > MODEL > INFERENCE, compare with ACTUAL 0 > ?, RMSE
Basis for Inference
Data inputs and validation data
Target: last 60m
Inputs: T – 60m, T – 1d, T – 1w, T – 2w
[Figure axes: day, week]
TECDEV-2765 183
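The idea of pairing a 60-minute target window with earlier windows can be sketched as plain list slicing. This is an illustration under stated assumptions: samples arrive every 30 seconds, "T – 60m" is read as the hour immediately before the target hour, and the other lags are read as the same hour one day, one week and two weeks earlier. The function and constant names are hypothetical.

```python
SAMPLE_SECONDS = 30
SAMPLES_PER_HOUR = 3600 // SAMPLE_SECONDS  # 120 samples in a 60-minute window

# Assumed lag of each input window behind the target window, in hours.
LAGS_IN_HOURS = {"previous": 1, "1d": 24, "1w": 24 * 7, "2w": 24 * 14}


def lagged_windows(series, target_end):
    """Return the target window ending at target_end and one input window per lag."""
    target = series[target_end - SAMPLES_PER_HOUR:target_end]
    inputs = {}
    for name, hours in LAGS_IN_HOURS.items():
        end = target_end - hours * SAMPLES_PER_HOUR
        inputs[name] = series[end - SAMPLES_PER_HOUR:end]
    return target, inputs


series = list(range(50000))  # stand-in for a long counter series
target, inputs = lagged_windows(series, target_end=len(series))
print(len(target), {name: len(window) for name, window in inputs.items()})
```

The series must cover at least two weeks plus two hours before `target_end`, otherwise the oldest window would be empty or truncated.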
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public
Training: Larger Datasets
Data inputs and label data: 1d periods
[Figure axes: day, week]
Training
Data inputs and label data: -1d
Labels: -1d
Inputs: T – 60m, T – 1d, T – 1w, T – 2w
[Figure axes: day, week]
Data Engineering: Why & What?
• Arbitrary ML models do not know my system or meaning of my data
• I don’t know internals/logic of model!
• Domain knowledge becomes critical:
• Which data to collect?
• Data trends? Is it time-dependent? Is it strongly cyclical?
• What are you actually looking to learn from the data?
• With your understanding and with an understanding of what certain ML techniques are good at, you can start to experiment!
It is about the data!
Useful to Learn and Practice..
libraries and tools, samples:
• Numpy: Organizing (arrays) and handling/manipulating (algebra) of numbers
• Pandas: Data structure and analysis
• Matplotlib, Bokeh, Seaborn, PyPlot, D3.js: Graphing
• Jupyter Notebooks: A scratchpad for gluing it all together!
Libraries and tools to prep and work with data
Raw Data > Structure Data > Data Pre-processing > Data Exploration > Insights, Reports, Graphs...
Adhere to the “Rules”
ML model may accept (or risk exception/error):
• Specific data type(s): float64, float32, string, …
• Rank [axes]: Scalar [0], vector [1], array [2], cube [3], …
• But batch size can vary (per cycle, not per input)
DataFrame
• A 2-dimensional labeled data structure with columns of potentially different types. Most commonly used pandas object.
DataFrame
def read_train_long(record_count, label_prefix, verbose=True):
    .
    print('\ntraining long data')
    print(train.describe())

training long data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w  \
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         1.093179e+07  1.210950e+07  1.215980e+07  2.038959e+07
std          1.072837e+07  1.140074e+07  1.141777e+07  1.141292e+07
min          0.000000e+00  1.365000e+03  1.087200e+04  2.841000e+03
25%          1.663094e+06  1.665956e+06  1.689180e+06  1.086874e+07
50%          5.952412e+06  8.211694e+06  8.303540e+06  2.399129e+07
75%          1.959832e+07  2.178496e+07  2.212936e+07  3.107957e+07
max          3.269652e+07  3.305397e+07  3.308679e+07  3.262037e+07

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w  \
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         7.527995e+06  8.306467e+06  8.337991e+06  1.384803e+07
std          7.289889e+06  7.747967e+06  7.757351e+06  7.804774e+06
min          0.000000e+00  1.380000e+03  8.182000e+03  1.001000e+03

Callouts: column “label”; df is 72 x 2880; math description of column
The Data in 4x 24-hour DataFrame, [4x18]x2880
4 24-hour periods x 18 paths x 2880 30-second samples = 72-column x 2880-row DataFrame
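The DataFrame dimensions follow directly from the sampling interval, as this small sketch shows. The function name is hypothetical; the 30-second interval, 4 periods and 18 paths come from the slide.

```python
SAMPLE_SECONDS = 30


def frame_shape(period_hours, periods, paths):
    """Return (columns, rows) for one column per period and path, one row per sample."""
    rows = period_hours * 3600 // SAMPLE_SECONDS
    columns = periods * paths
    return columns, rows


print(frame_shape(period_hours=24, periods=4, paths=18))  # training frame
print(frame_shape(period_hours=1, periods=4, paths=18))   # 60-minute frame
```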
DataFrame: same format, smaller size
validation data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w  \
count        1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean         1.358485e+06  9.795289e+05  1.179847e+06  1.246457e+06
std          8.000545e+05  6.292864e+05  6.797141e+05  7.001280e+05
min          1.380000e+03  7.731200e+04  9.600000e+01  1.928000e+03
25%          4.905140e+05  3.478915e+05  6.696790e+05  8.413410e+05
50%          1.362629e+06  1.087073e+06  1.240100e+06  1.377682e+06
75%          2.190353e+06  1.559045e+06  1.827315e+06  1.777683e+06
max          2.840542e+06  2.067730e+06  2.336623e+06  2.262407e+06

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w  \
count        1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean         7.821397e+05  1.026548e+06  7.012429e+05  7.430594e+05
std          5.078806e+05  4.558566e+05  4.654541e+05  5.078973e+05
min          0.000000e+00  0.000000e+00  0.000000e+00  1.350000e+03
25%          3.321160e+05  5.965380e+05  4.511570e+05  3.695710e+05
50%          9.229315e+05  9.878900e+05  8.389290e+05  7.320545e+05
75%          1.225480e+06  1.511388e+06  1.236286e+06  1.205447e+06
max          1.499793e+06  1.857222e+06  1.556056e+06  1.526029e+06
.
df is 72 x 120
Fetching, Formatting, Conditioning the Data
def read_data(field_key, measurement_name, condition1, condition2, condition3, limit, label):
    query_db = str('SELECT "%s" FROM "%s" WHERE %s AND %s AND %s LIMIT %d ' % (
        field_key, measurement_name, condition1, condition2, condition3, limit+1))
    data_db = client.query(query_db)
    print('\ndata_db:\n', data_db)
    data_df = pd.DataFrame(data_db[str(measurement_name)])
    print('\ndata_df:\n', data_df)
    print('\ndata_df description:\n', data_df.describe())
    data_df.columns = [label]
    data_df.reset_index(drop=True, inplace=True)
    data_df.fillna(method='ffill', inplace=True)
    data_df.fillna(method='bfill', inplace=True)
    data_df -= data_df.min()
    data_df.drop(data_df.index[0], inplace=True)
    print('\ndata_df:\n', data_df)
    print('\ndata_df description:\n', data_df.describe())
    sys.exit()
    # data_df = data_df.sub(data_df.shift(fill_value=0))
    # print('\n', query_db, '\n', data_df.describe())
    return data_df
Raw Data
data_db:
defaultdict(<class 'list'>, {'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters':
                                  bytes-sent
2020-01-03 06:34:04.032000+00:00  2269840208
2020-01-03 06:34:34.031000+00:00  2269840208
2020-01-03 06:35:04.027000+00:00  2269840208
2020-01-03 06:35:34.031000+00:00  2269840208
2020-01-03 06:36:04.032000+00:00  2269840820
2020-01-03 06:36:34.041000+00:00  2269842350
2020-01-03 06:37:04.032000+00:00  2269842554
2020-01-03 06:37:34.036000+00:00  2269842554
2020-01-03 06:38:04.032000+00:00  2269842554
2020-01-03 06:38:34.033000+00:00  2269842554
2020-01-03 06:39:04.090000+00:00  2269842554
2020-01-03 06:39:34.035000+00:00  2269853583
2020-01-03 06:40:04.036000+00:00  2269854129
2020-01-03 06:40:34.033000+00:00  2269854129
2020-01-03 06:41:04.035000+00:00  2269854228
2020-01-03 06:41:34.033000+00:00  2269855713
2020-01-03 06:42:04.035000+00:00  2269856604
.
In Single Column DataFrame
data_df:
                                  bytes-sent
2020-01-03 06:34:04.032000+00:00  2269840208
2020-01-03 06:34:34.031000+00:00  2269840208
2020-01-03 06:35:04.027000+00:00  2269840208
2020-01-03 06:35:34.031000+00:00  2269840208
2020-01-03 06:36:04.032000+00:00  2269840820
2020-01-03 06:36:34.041000+00:00  2269842350
2020-01-03 06:37:04.032000+00:00  2269842554
2020-01-03 06:37:34.036000+00:00  2269842554
2020-01-03 06:38:04.032000+00:00  2269842554
2020-01-03 06:38:34.033000+00:00  2269842554
2020-01-03 06:39:04.090000+00:00  2269842554
2020-01-03 06:39:34.035000+00:00  2269853583
2020-01-03 06:40:04.036000+00:00  2269854129
2020-01-03 06:40:34.033000+00:00  2269854129
2020-01-03 06:41:04.035000+00:00  2269854228
2020-01-03 06:41:34.033000+00:00  2269855713
2020-01-03 06:42:04.035000+00:00  2269856604
2020-01-03 06:42:34.034000+00:00  2269866268
2020-01-03 06:43:04.034000+00:00  2269866916
.
data_df description:
         bytes-sent
count  2.881000e+03
mean   2.284429e+09
std    1.233722e+07
min    2.269840e+09
25%    2.271348e+09
50%    2.283434e+09
75%    2.296978e+09
max    2.303451e+09
Column Label Changed, Baselined, Conditioned
data_df:
    d_te11200_previous
1                    0
2                    0
3                    0
4                  612
5                 2142
6                 2346
7                 2346
8                 2346
9                 2346
10                2346
11               13375
12               13921
13               13921
14               14020
15               15505
16               16396
17               26060
18               26708
19               28328
.
data_df description:
       d_te11200_previous
count        2.880000e+03
mean         1.459358e+07
std          1.233636e+07
min          0.000000e+00
25%          1.507463e+06
50%          1.362955e+07
75%          2.713798e+07
max          3.361109e+07
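The conditioning steps in `read_data` (forward fill, backward fill, subtract the minimum, drop the first row) can be mirrored on a plain list to see what each one does. This is a sketch without pandas; `condition` is a hypothetical helper, and `None` stands in for a missing sample.

```python
def condition(counter_samples):
    """Fill gaps, rebase the counter to start at zero, and drop the first sample."""
    filled = list(counter_samples)
    # Forward fill: a missing sample takes the previous value.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    # Backward fill: covers gaps at the very start of the series.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    # Baseline: subtract the minimum so the counter starts at zero.
    baseline = min(filled)
    rebased = [value - baseline for value in filled]
    # Drop the first row, as the slide code does.
    return rebased[1:]


raw = [2269840208, None, 2269840208, 2269840208, 2269840820, 2269842350]
print(condition(raw))
```

The sketch assumes at least one sample is present; an all-missing series would need separate handling.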
Constructing Multi-Column DataFrame
def read_train(record_count, label_prefix, verbose=True):
    for interface in tunnel_ifs:
        query_if = str('("interface-name" = \'%s\')' % (interface))
        label = str(label_prefix + interface[-7:] + "_previous")
        read_if = read_data('bytes-sent', 'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters', query_if, 'time >= now() - {} - 2h -1m'.format(previous), 'time <= now()', record_count, label)
        if interface == tunnel_ifs[0]:
            train = read_if
        else:
            train = pd.concat([train, read_if], axis=1, sort=False)
        label = str(label_prefix + interface[-7:] + "_1d")
        read_if = read_data('bytes-sent', 'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters', query_if, 'time >= now() - {} - 1d - 1h -1m'.format(previous), 'time <= now()', record_count, label)
        train = pd.concat([train, read_if], axis=1, sort=False)
        .
        label = str(label_prefix + interface[-7:] + "_2w")
        read_if = read_data('bytes-sent', 'Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters', query_if, 'time >= now() - 3w - {} - 1h -1m'.format(previous), 'time <= now()', record_count, label)
        train = pd.concat([train, read_if], axis=1, sort=False)
    train.fillna(method='ffill', inplace=True)
    return train
Concatenate columns
Unique column label
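The unique column labels come from the prefix, the last seven characters of the interface name, and a period suffix. A small sketch of that naming scheme follows; the interface names are made up for illustration, and only the label pattern is taken from the slide code.

```python
SUFFIXES = ["_previous", "_1d", "_1w", "_2w"]


def column_labels(interfaces, label_prefix):
    """Build one label per interface and period: prefix + last 7 chars + suffix."""
    return [label_prefix + interface[-7:] + suffix
            for interface in interfaces
            for suffix in SUFFIXES]


tunnel_ifs = ["tunnel-te11200", "tunnel-te11201"]  # hypothetical interface names
labels = column_labels(tunnel_ifs, "d_")
print(labels)
```

With 18 interfaces this scheme yields 72 distinct labels, one per DataFrame column.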
Training: Initial, and Periodic; Predict
“Monitor” flowchart
[Flowchart labels: Parameters & functions; Model exists?; No; Create Model; Very large data train, validate; Every n cycles; Large data train, validate; 10 minutes; 60 minute train, validate, predict; Data; NRMSE]
The Neural Network, and Sample Cycle
.
my_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
dnn_regressor = tf.estimator.DNNRegressor(
    feature_columns=construct_feature_columns(training_examples),
    hidden_units=hidden_units,
    optimizer=my_optimizer,
    model_dir=model_directory,
    label_dimension=len(tunnel_ifs) + len(physical_ifs)
.
hidden_units = [72, 36, 18]  # probably an overkill for our small scale
.
cycle += 1
print('cycle number ', cycle)
dnn_regressor = train_nn_regression_model(
    learning_rate = 0.0003,
    steps = 1000,
    batch_size = 120,
    hidden_units = hidden_units,
    training_examples = read_train(120, 'd_'),
    training_targets = read_train_target(120, 'l_'),
    validation_examples = read_validate(120, 'd_'),
    validation_targets = read_last_target(120, 'v_'),
    prediction = True
.
Set once, for a new model
Can change every call
+ Normalization Layer
def read_train_long(record_count, label_prefix, verbose=True):
    global feature_mean
    global feature_std
    global feature_max
    .
    if feature_mean == 0:
        feature_mean = train.mean().mean()
        print('feature mean: ', feature_mean)
    if feature_std == 0:
        feature_std = train.std().mean()
        print('feature std: ', feature_std)
    if feature_max == 0:
        feature_max = train.max().mean() / 24  # The mean max per 1 hour
        print('feature max: ', feature_max)
    .
def construct_feature_columns(input_features):
    .
    # epsilon = 0.000001
    .
    # choose best normalization of input data
    .
    return set([tf.feature_column.numeric_column(my_feature, normalizer_fn=lambda val: (val) / (feature_max))
                for my_feature in input_features])
207TECDEV-2765
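The scaling applied by the normalizer above can be mirrored without TensorFlow. In this sketch the column values are made up; the scale is computed the same way as `feature_max` on the slide (mean of the per-column maxima, divided by 24 to approximate a per-hour maximum for 24-hour cumulative data).

```python
def feature_max_per_hour(columns):
    """Mean of the per-column maxima, divided by 24 (as feature_max on the slide)."""
    maxima = [max(values) for values in columns.values()]
    return sum(maxima) / len(maxima) / 24


def normalize(values, scale):
    """Divide every value by the shared scale, like the slide's normalizer_fn."""
    return [value / scale for value in values]


# Made-up cumulative byte counts for two columns.
columns = {"d_a": [0, 1200, 2400], "d_b": [0, 600, 4800]}
scale = feature_max_per_hour(columns)
print(scale, normalize(columns["d_a"], scale))
```

Using one shared scale for all columns keeps their relative magnitudes intact, which a per-column scale would not.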
The Model
[Diagram: Input, 72 x batch > Normalization layer, 72 nodes > Input layer, 72 nodes > Hidden layer > Hidden layer > Hidden layer > Output layer, 18 nodes > Output, 18 x batch]
The data - inputs
feature mean: 7692395.787210648
feature std: 7084487.678199986
feature max: 899857.4594907407

training long data
       d_te11200_previous  d_te11200_1d  d_te11200_1w  d_te11200_2w  \
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         1.460392e+07  1.148071e+07  1.377143e+07  1.286884e+07
std          1.208010e+07  1.079281e+07  1.288531e+07  1.193284e+07
min          5.194400e+04  8.214400e+04  7.731200e+04  1.128000e+03
25%          5.464170e+06  2.660394e+06  3.144108e+06  2.952690e+06
50%          7.655800e+06  5.238823e+06  6.525733e+06  6.685706e+06
75%          2.390421e+07  2.051300e+07  2.478536e+07  2.295742e+07
max          4.010818e+07  3.499195e+07  3.808687e+07  3.528133e+07

       d_te11201_previous  d_te11201_1d  d_te11201_1w  d_te11201_2w  \
count        2.880000e+03  2.880000e+03  2.880000e+03  2.880000e+03
mean         9.872998e+06  7.983878e+06  9.340476e+06  8.721320e+06
std          8.096162e+06  7.326994e+06  8.648336e+06  8.025967e+06
min          0.000000e+00  1.080000e+03  0.000000e+00  0.000000e+00
25%          3.781685e+06  2.003244e+06  2.256103e+06  2.102746e+06
50%          5.334513e+06  3.862060e+06  4.561317e+06  4.507384e+06
.
Callouts: data indicators, useful for normalization (column key, record count, min value, max value)
The Training
.
period 08 : 3582.59
training_predictions baoundaries
[ 7208.3516 3251.541 7443.031 4104.243 4939.107 4307.9517 -4889.6 -2828.935 5941.4014 726.0451 7325.6025 4610.5356 -3598.1191 3449.1812 635.07806 4949. 4667.8926 -1990.8337 ]
[228423.4 177642.34 234673.56 181417.75 247542.95 188667.36 242869.14 185446.6 211784.83 67781.5 213718.23 222849.58 222998.84 172893.55 67602.49 174859.56 182980.92 178175.17]
validation_predictions boundaries
[ 6515.7734 3770.0752 8726.353 3760.6396 13467.836 3967.8716 -5080.483 -3837.6772 5243.997 539.8233 8261.776 11545.85 -3959.705 3852.7988 432.32617 4614.4375 4369.189 -3105.7751 ]
[193788.67 157614.84 150174.69 216238.94 187304.67 213302.8288468.47 197036.98 181182.31 51306.69 143677.8 166325.1265046.75 152532.97 51454.254 206665.23 205994.08 187957.39 ]
period 09 : 3548.03
Model training finished.
Final RMSE (on training data): 3548.03
Final RMSE (on validation data): 22812.22
Final NRMSE (/prediction, /actual): 0.25, 0.25
cycle number 788
Training Makes Perfect!
Best Indicator: NRMSE
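RMSE and a normalized variant can be computed as sketched below. Dividing by the mean of the actual values is one common normalization and is an assumption here; the deck does not show exactly how its NRMSE is normalized.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nrmse(predictions, actuals):
    """RMSE divided by the mean of the actual values (assumed normalization)."""
    mean_actual = sum(actuals) / len(actuals)
    return rmse(predictions, actuals) / mean_actual


predictions = [110.0, 90.0]
actuals = [100.0, 100.0]
print(rmse(predictions, actuals), nrmse(predictions, actuals))
```

Normalizing makes the error comparable across busy and quiet hours, which a raw RMSE in bytes is not.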
A Short Term, Small Issue, & the Morning After
A Severe Issue, 1 Store Partially Crippled
A Severe Issue, the 2 main events
Memory issue: NRMSE > 430
Reset (reload): NRMSE > 1,450
Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
The importance of visualizing your data!
All four of these datasets have the same:
• Mean of x and mean of y
• Variance of x and variance of y
• Correlation coefficient between x and y
• Linear regression line (y = 3 + 0.5x)
• Coefficient of determination of the linear regression
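These shared statistics can be checked directly. The sketch below uses the published Anscombe's quartet values and computes the means, the variance of x, and the least-squares line by hand; the `summary` helper is a hypothetical name.

```python
def summary(xs, ys):
    """Mean of x and y, sample variance of x, and the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"mean_x": mean_x, "mean_y": mean_y, "var_x": sxx / (n - 1),
            "slope": slope, "intercept": mean_y - slope * mean_x}


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in quartet.items():
    print(name, {key: round(value, 2) for key, value in summary(xs, ys).items()})
```

The summaries agree to about two decimal places, yet the four scatter plots look nothing alike, which is why plotting comes before modeling.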
© 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public
Quality Assurance
• Check data every step of the way: description, extents, format, graph
• Ensure data quality: real raw data, complete, sufficient [quality learning experience!]
• Run a control model, maybe testing next release
• Avoid overfitting. Train on bigger set periodically
• Test
• Monitor the “monitors”
Ensure model is doing the job, every step of the way, better as time goes
Check the Data: Input
training target
          l_te11200     l_te11201     l_te13501     l_te13502     l_te13703  \
count  1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02
mean   2.113163e+06  9.949988e+05  1.371433e+06  5.974629e+05  1.400498e+06
std    6.925446e+05  7.613872e+05  8.314939e+05  6.123204e+05  9.417285e+05
min    7.852000e+04  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
25%    2.056924e+06  1.579850e+04  5.622930e+05  1.453625e+04  2.296910e+05
50%    2.375026e+06  1.365219e+06  1.964674e+06  3.089710e+05  1.966452e+06
75%    2.392869e+06  1.707655e+06  1.982820e+06  1.342634e+06  2.208534e+06
max    3.309991e+06  1.722082e+06  2.450328e+06  1.357508e+06  2.224787e+06

          l_te13704     l_te17801     l_te17802     l_te12400     l_te12401  \
count  1.200000e+02  1.200000e+02  1.200000e+02  1.200000e+02    120.000000
mean   5.292162e+05  9.654491e+05  1.236253e+06  1.804734e+06  32161.100000
std    6.398793e+05  7.201600e+05  5.183502e+05  5.926864e+05  19069.038165
min    0.000000e+00  6.040000e+04  0.000000e+00  6.040000e+04      0.000000
25%    1.630975e+04  2.968030e+05  1.061104e+06  1.753321e+06  15048.250000
50%    3.639200e+04  9.669420e+05  1.078636e+06  2.028434e+06  32943.000000
75%    1.342081e+06  1.211375e+06  1.683253e+06  2.046180e+06  48227.000000
max    1.506047e+06  2.422147e+06  2.238537e+06  2.824590e+06  63574.000000
.
Check the Data: Model Output
.
training_predictions baoundaries
[ -9676.203 43134.777 30126.502 49435.23 -32687.848 53458.71 -41498.527 2386.6956 -7947.478 841.84825334.666 -29982.818 -35702.715 39189.33 822.512443476.05 48315.46 459.85422]
[2567748.5 1706653.8 1980904.5 2473544.2 2354506.8 2123372.2704984.5 1812864.2 2203519. 67233.766 1694452.8 1959683.92268782.5 1549306.6 66137.98 2231304.2 1900274.9 1607212.8 ]
validation_predictions boundaries
[ 9514.773 37849.348 3103.1733 43236.957 -5364.3784 22101.646 -17330.65 16654.14 8049.155 765.13153570.572 -6731.2876 -15120.497 33285.02 766.89764 37848.816 19647.463 13101.8 ]
[2846600. 2264547.2 3210291. 2034950. 3650229.2 1717540.41242460.5 1751227.8 2441805.8 68177.71 2717810.5 3045913.51010453.4 2042300.1 66662.13 1811712.9 1519572.6 1552536.8 ]
period 08 : 55037.89
.
Check Progress: RMSE

periods = 10
steps_per_period = steps / periods
.
for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    .
    # Compute training and validation loss.
    training_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(training_predictions, training_targets))
    validation_root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(validation_predictions, validation_targets))
    # Occasionally print the current loss.
    print(" period %02d : %0.2f" % (period, training_root_mean_squared_error))
    # Add the loss metrics from this period to our list.
    training_rmse.append(training_root_mean_squared_error)
    validation_rmse.append(validation_root_mean_squared_error)
.
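The RMSE tracked in the loop above is the square root of the mean squared difference between predictions and targets. A minimal standard-library sketch of that calculation follows; the helper name `rmse` and the sample numbers are illustrative assumptions, not part of the session code, which uses `metrics.mean_squared_error`.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values purely for illustration.
example = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print("RMSE: %0.2f" % example)
```

Because RMSE is expressed in the same units as the target, values in the millions are plausible when the targets are interface counters of that magnitude.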
Check Progress: RMSE

RMSE (on training data):
 period 00 : 3117491.16
 period 01 : 2921701.20
 period 02 : 2787017.03
 period 03 : 2680300.57
 period 04 : 2589999.24
 period 05 : 2511086.36
 period 06 : 2441079.98
 period 07 : 2378441.63
 period 08 : 2321990.45
 period 09 : 2270826.91
Model training finished.
.
Visualize Progress: RMSE

for period in range(0, periods):
    # Train the model, starting from the prior state.
    dnn_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )
    # Take a break and compute predictions.
    training_predictions = dnn_regressor.predict(input_fn=predict_training_input_fn)
    training_predictions = np.array([[item['predictions'][i] for i in range(0,
        len(tunnel_ifs) + len(physical_ifs))] for item in training_predictions])
    .
    validation_predictions = dnn_regressor.predict(input_fn=predict_validation_input_fn)
    validation_predictions = np.array([[item['predictions'][i]
        for i in range(0, len(tunnel_ifs) + len(physical_ifs))] for item in validation_predictions])
    .
    if if_plot:
        # RMSE values and graphs
        global x_periods
        x_periods += periods
        plt.ion()
.
Visualize Progress: RMSE
Validate: Visualize Prediction vs. Actual
Quality Indicators
• Newborns start ~random, struggle a bit, then learn at a reasonable rate
• Gets better with experience, without overfitting. Slow to “forget”
• Learns better with bigger and fatter data sets
• Never at a steady state, never perfect
Getting better with time, and with data!
When Will My Switch Melt?
• What data might be relevant?
• Let’s consider the application of IP access lists or role-based access lists to a Catalyst edge switch
• We can monitor the security access control entries, a limited resource…
GET https://{{switch}}/restconf/data/tcam-details/tcam-detail
{
"Cisco-IOS-XE-tcam-oper:tcam-detail": {
"asic-no": 0,
"name": "Security Access Control Entries",
"hash-entries-max": 0,
"tcam-entries-max": 5120,
"hash-entries-used": 0,
"tcam-entries-used": 188
}
}
• How can we predict when we may run out of TCAM space?
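A first step towards a prediction is turning each reply into a single utilization figure that can be sampled over time. The sketch below assumes the reply body has exactly the shape shown on the slide; the helper name `tcam_utilization` is hypothetical, and a real reply would come from the RESTCONF GET rather than a string constant.

```python
import json

# Reply body copied from the slide, used here in place of a live request.
SAMPLE_REPLY = """
{
  "Cisco-IOS-XE-tcam-oper:tcam-detail": {
    "asic-no": 0,
    "name": "Security Access Control Entries",
    "hash-entries-max": 0,
    "tcam-entries-max": 5120,
    "hash-entries-used": 0,
    "tcam-entries-used": 188
  }
}
"""

def tcam_utilization(reply_text):
    """Return (name, used, maximum, fraction_used) for one tcam-detail entry."""
    detail = json.loads(reply_text)["Cisco-IOS-XE-tcam-oper:tcam-detail"]
    used = detail["tcam-entries-used"]
    maximum = detail["tcam-entries-max"]
    # Guard against a zero-sized pool so the division cannot fail.
    fraction = used / maximum if maximum else 0.0
    return detail["name"], used, maximum, fraction

name, used, maximum, fraction = tcam_utilization(SAMPLE_REPLY)
print(f"{name}: {used}/{maximum} ({fraction:.1%})")
```

Sampling this value at a fixed interval gives the univariate time series that the later regression experiments need.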
Yep, The Data is Available Over CLI Also
DC-C9300-1-Fabric1#show platform hardware fed switch active fwd-asic resource tcam utilization
CAM Utilization for ASIC [0]
Table                                       Max Values    Used Values
--------------------------------------------------------------------------------
Unicast MAC addresses                       32768/1024    25/21
L3 Multicast entries                        8192/512      0/9
L2 Multicast entries                        8192/512      0/11
Directly or indirectly connected routes     24576/8192    51/151
QoS Access Control Entries                  5120          85
Security Access Control Entries             5120          188
Ingress Netflow ACEs                        256           6
Policy Based Routing ACEs                   1024          22
Egress Netflow ACEs                         768           6
Flow SPAN ACEs                              1024          13
Control Plane Entries                       512           259
Tunnels                                     512           18
Lisp Instance Mapping Entries               512           8
Input Security Associations                 256           4
Output Security Associations and Policies   256           5
SGT_DGT                                     8192/512      1/1
CLIENT_LE                                   4096/256      3/0
INPUT_GROUP_LE                              1024          0
OUTPUT_GROUP_LE                             1024          0
Macsec SPD                                  256           2
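If only the CLI is available, the same counters can be scraped from this output. The following is a rough sketch, assuming each data row ends with a "max" token and a "used" token separated by whitespace; `parse_tcam_table` is a hypothetical helper. The slide does not say what the two slash-separated numbers mean, so the sketch keeps both without interpreting them.

```python
import re

# A few rows copied from the slide, used as sample input.
SAMPLE_OUTPUT = """\
CAM Utilization for ASIC [0]
Table Max Values Used Values
--------------------------------------------------------------------------------
Unicast MAC addresses 32768/1024 25/21
QoS Access Control Entries 5120 85
Security Access Control Entries 5120 188
Control Plane Entries 512 259
"""

# name, then "max" and "used" columns; each column is N or N/M.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<max>\d+(?:/\d+)?)\s+(?P<used>\d+(?:/\d+)?)\s*$"
)

def parse_tcam_table(text):
    """Return {table name: (max_values, used_values)} with tuples of ints."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # banner, header and separator lines do not match
        max_values = tuple(int(v) for v in match["max"].split("/"))
        used_values = tuple(int(v) for v in match["used"].split("/"))
        rows[match["name"]] = (max_values, used_values)
    return rows

table = parse_tcam_table(SAMPLE_OUTPUT)
max_values, used_values = table["Security Access Control Entries"]
print(used_values[0], "of", max_values[0], "entries used")
```

Screen scraping like this is brittle across software releases, which is one reason the model-driven interface shown earlier is usually preferable.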
What Do We Know About The Data?
• TCAM entries are a physically limited resource
• They are shared across multiple features, but tend to have fixed pool sizes
• Utilization changes over time
• On enterprise network edge switches, utilization correlates strongly with when users log on
• Would some form of linear regression be useful…? Let’s investigate!
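To make the linear regression idea concrete, here is a minimal sketch: fit a straight line to recent utilization samples and extrapolate to the point where it reaches the pool size. The function names and the sample values are made up for illustration; only the capacity of 5120 comes from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def time_to_exhaustion(xs, ys, capacity):
    """Extrapolate the fitted line to the x at which y reaches capacity.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope

# Hypothetical samples: minutes since start vs. TCAM entries used.
minutes = [0, 10, 20, 30, 40]
entries_used = [188, 238, 288, 338, 388]

eta = time_to_exhaustion(minutes, entries_used, capacity=5120)
print("Projected exhaustion at minute", eta)
```

A straight line is only a reasonable model over a window where growth is roughly steady, which is why the choice of window matters so much.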
What have we learned?
• Increasing the order of the polynomial you fit can give a curve that tracks very closely to your data… but will it be useful for prediction against anything but your test data?
• Naïve use of linear regression, as an example, may not give good results:
• Too long a time window => really bad fit
• Too short a time window => too spiky, false positives
• Conclusion:
• Run multiple experiments with your observed data
• Play with the hyperparameters you use
• Short-term predictions are pretty good for this use case
• Have a backup!!
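The advice to run multiple experiments can be sketched as a small loop over candidate window sizes, scoring each by how well its fit predicts samples that were held back. Everything here is an assumption for illustration: the helper names, the made-up series (flat, then a steady ramp), and the use of mean absolute error as the score.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def holdout_error(series, window, horizon):
    """Fit on the `window` samples just before the last `horizon` samples,
    then return the mean absolute error of the extrapolation over them."""
    train_end = len(series) - horizon
    xs = list(range(train_end - window, train_end))
    slope, intercept = fit_line(xs, series[train_end - window:train_end])
    errors = [abs(slope * x + intercept - series[x])
              for x in range(train_end, len(series))]
    return sum(errors) / horizon

# Made-up series: flat overnight, then a steady ramp as users log on.
series = [200] * 30 + [200 + 10 * i for i in range(1, 31)]

for window in (5, 20, 50):
    error = holdout_error(series, window, horizon=5)
    print("window", window, "holdout error", round(error, 1))
```

On this noise-free series the long window does worst because it mixes the flat period into the fit. With noisy real data, very short windows would also degrade, which this toy series does not show.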
Can we do better than LR?
• LR & ARIMA perhaps feel like they’re just applied statistics…and yes, that’s right!
• But are there other approaches?
• We’ve talked about neural networks and application to image processing and other use cases…
• But can we train a neural network to do time-series predictions over hardware resource utilization?
• Let’s take our example and scale it up…
What have we learned?
• Yes, an RNN can be used with univariate time series data…and it can do quite well for this use case, but so can simple LR and ARIMA at lower cost!
• Training data and time to train is important:
• Too little data and predictions are not very good
• Too few epochs of training and predictions are not very good
• Diminishing returns at a certain point
• The number of layers you pick has a big impact:
• Lots of layers takes longer to train
• Fewer layers can be quicker to train
• Results with fewer layers may be as good as or even better than multi-layer models
• Use tools like TensorBoard to help you visualize what is happening with training
• Can help you see when your training is converging
• Again, experiment with the hyperparameters:
• Batch size, epochs, layers, training steps, validation steps
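Before an RNN can be trained on a univariate series, the series is typically cut into overlapping windows, each paired with the value that follows it. The sketch below shows one common way to do that, assuming a `[samples][timesteps][features]` layout; the helper name `make_windows`, the lookback of 3, and the sample values are illustrative assumptions, not the session's code.

```python
def make_windows(series, lookback):
    """Split a univariate series into (inputs, targets).

    Each input holds `lookback` consecutive values, shaped as
    [timesteps][1 feature]; the target is the value that follows.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        window = series[start:start + lookback]
        inputs.append([[value] for value in window])
        targets.append(series[start + lookback])
    return inputs, targets

# Made-up utilization samples.
series = [188, 190, 195, 203, 210, 214]
inputs, targets = make_windows(series, lookback=3)

print(len(inputs), "training samples")
print("first input:", inputs[0], "-> target:", targets[0])
```

The lookback length is one more hyperparameter to experiment with, alongside batch size, epochs and layer count.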
Different from Case 1
• Interacting with network nodes via NETCONF, live. No Telemetry, no TSDB.
• Using the tf.keras library on TensorFlow 2.0+/latest [previously 1.x]
• Brain is fat and shallow, for fast learning & flexibility (short term “retention”)
• Machine explored and frequently optimized target
• Ideal for: analytics slice [ex. VRF-Lite] of the network, with 1:n real-time sampling
Playing the Game
Experimenting & improving forever..
The ML Model

# Wide, shallow model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1200, activation=tf.nn.relu, input_shape=(4, 15)),  # input shape
    tf.keras.layers.Dense(1200, activation=tf.nn.relu),
    tf.keras.layers.Dense(15)
])
.
# Categorical cross entropy per prefix per node
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
.
# transform probabilities into discrete choice, per route
labels_tf = tf.transpose(tf.nn.softmax(labels_all, axis=0))
.
Slide callouts:
• 4 probabilities each
• 15 prefixes
• Between probabilities
• Normalize vectors into probability distribution
Probabilities, Normalized [Softmax] - example

tf.Tensor(
[[0.25497461 0.24361039 0.24042456 0.2463647  0.25370661 0.25775105
  0.23473186 0.25987438 0.24686818 0.24663202 0.24490015 0.25310768
  0.24204974 0.25942015 0.25059948]
 [0.24819476 0.2517995  0.25362213 0.25551216 0.24509107 0.2544886
  0.26019332 0.24580416 0.25076208 0.24758969 0.24134524 0.24824475
  0.25225773 0.24561698 0.2488974 ]
 [0.25545497 0.25443653 0.25465455 0.25028481 0.25478405 0.24296617
  0.25593156 0.24964981 0.25029009 0.25659815 0.25335059 0.25056105
  0.25391062 0.24713582 0.25155338]
 [0.24137566 0.25015358 0.25129875 0.24783833 0.24641826 0.24479418
  0.24914326 0.24467165 0.25207965 0.24918014 0.26040402 0.24808652
  0.25178191 0.24782706 0.24894974]], shape=(4, 15), dtype=float64)
.
Array of Probabilities - example

labels_all tf.Tensor(
[[-3367. -4061. -3539. -3460. -3797. -3243. -3797. -2900. -3716. -3731.
  -2676. -3424. -3780. -3735. -3648.]
 [-3486. -3007. -3136. -2131. -3292. -2908. -2013. -3524. -3364. -3463.
  -3693. -3650. -3480. -2329. -3961.]
 [-2870. -3230. -3699. -3897. -3249. -3318. -3554. -3484. -2941. -3541.
  -3754. -2772. -3735. -3910. -1762.]
 [-3675. -3100. -3024. -3910. -3060. -3929. -4034. -3490. -3377. -2663.
  -3275. -3552. -2403. -3424. -4027.]], shape=(4, 15), dtype=float64)
.
Slide callouts: probabilities, shape

print('labels_all', tf.argmax(labels_all, axis=0))
.
labels_all tf.Tensor([2 1 3 1 3 1 1 0 2 3 0 2 3 1 2], shape=(15,), dtype=int64)
.
Slide callout: only max values
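The two operations used on these slides, softmax down each column and then argmax to keep only the largest entry, can be illustrated without TensorFlow. This is a standard-library sketch, not the session's code: the helper names are hypothetical, and the input is just the first three columns of the `labels_all` tensor shown above.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def column(matrix, index):
    """Return one column of a list-of-rows matrix."""
    return [row[index] for row in matrix]

# First three columns of labels_all from the slide (4 rows x 3 columns).
scores = [
    [-3367.0, -4061.0, -3539.0],
    [-3486.0, -3007.0, -3136.0],
    [-2870.0, -3230.0, -3699.0],
    [-3675.0, -3100.0, -3024.0],
]

choices = []
for j in range(len(scores[0])):
    probabilities = softmax(column(scores, j))   # normalize along axis 0
    choices.append(probabilities.index(max(probabilities)))  # argmax

print(choices)
```

With scores this far apart, the softmax output is effectively one-hot, so the argmax simply picks the row holding the largest raw score in each column.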
The Routing Table

load_new: 13.0
weight : 2.3
Loss: 23.342491149902344
labels_all tf.Tensor(
[[-2938. -2432. -3260. -3101. -2996. -2440. -3173. -2005. -2725. -3154.
  -2267. -2936. -2979. -3326. -3085.]
 [-3147. -2537. -2499. -1662. -2889. -3094. -1579. -3297. -2955. -2403.
  -2587. -3090. -2348. -2035. -2866.]
 [-1586. -2640. -2977. -3198. -2951. -2512. -3198. -2369. -2155. -3049.
  -3062. -1888. -3117. -2777. -2218.]
 [-3279. -3341. -2214. -2989. -2114. -2904. -3000. -3279. -3115. -2344.
  -3034. -3036. -2506. -2812. -2781.]], shape=(4, 15), dtype=float64)
labels_all tf.Tensor([2 0 3 1 3 0 1 0 2 3 0 2 1 1 2], shape=(15,), dtype=int64)
.
labels:
tf.Tensor([2 0 3 1 3 0 1 0 2 3 0 2 1 1 2], shape=(15,), dtype=int64)
targets:
tf.Tensor([2 1 3 1 3 0 1 0 2 3 0 2 3 1 2], shape=(15,), dtype=int64)

                   gi2  gi3  gi4  gi5
r711  10.1.0.0/16    0    0    1    0
      10.2.0.0/16    1    0    0    0
      10.3.0.0/16    0    0    0    1
r712  10.1.0.0/16    0    1    0    0
      10.2.0.0/16    0    0    0    1
      10.3.0.0/16    1    0    0    0
r713  10.1.0.0/16    0    1    0    0
      10.2.0.0/16    1    0    0    0
      10.3.0.0/16    0    0    1    0
w701  10.1.0.0/16    0    0    0    1
      10.2.0.0/16    1    0    0    0
      10.3.0.0/16    0    0    1    0
w702  10.1.0.0/16    0    1    0    0
      10.2.0.0/16    0    1    0    0
      10.3.0.0/16    0    0    1    0
.
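The routing table on this slide appears to be the 15-element `labels` vector read node by node, three prefixes per node, with each value selecting one of the four interfaces. The sketch below reproduces that mapping and counts how many routes agree with `targets`. The node-major ordering is an assumption inferred from the slide (it matches every row shown), and `routing_table` is a hypothetical helper.

```python
NODES = ["r711", "r712", "r713", "w701", "w702"]
PREFIXES = ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"]
INTERFACES = ["gi2", "gi3", "gi4", "gi5"]

# Values copied from the labels and targets tensors on the slide.
labels = [2, 0, 3, 1, 3, 0, 1, 0, 2, 3, 0, 2, 1, 1, 2]
targets = [2, 1, 3, 1, 3, 0, 1, 0, 2, 3, 0, 2, 3, 1, 2]

def routing_table(choices):
    """Map a flat list of interface indices to {(node, prefix): interface}.

    Assumes node-major order: all prefixes of the first node, then the next.
    """
    table = {}
    for position, choice in enumerate(choices):
        node = NODES[position // len(PREFIXES)]
        prefix = PREFIXES[position % len(PREFIXES)]
        table[(node, prefix)] = INTERFACES[choice]
    return table

table = routing_table(labels)
matches = sum(1 for label, target in zip(labels, targets) if label == target)

print("r711 10.1.0.0/16 ->", table[("r711", "10.1.0.0/16")])
print(matches, "of", len(labels), "routes match the targets")
```

Comparing `labels` with `targets` in this way gives a quick per-route view of how far the current routing is from the target the model is steering towards.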
We have covered..
✓ YANG: & the data
✓ Telemetry: transport, senders, receiver, collector
✓ TSDB: the datastore
✓ Visualization, dashboards
✓ ML models: the monitor workers
✓ Production line: YANG > Telemetry > TSDB > ML
Machine Learning
✓ Definition & Components
✓ Neural Networks
✓ ML Capabilities
✓ Engineering Process
✓ ML @CISCO
✓ Models, Training, Validation
✓ Prediction & Accuracy
✓ Data Engineering
✓ ML Quality Assurance
✓ Case 1: Anomaly detection
✓ Case 2: Anomaly prediction
✓ Case 3: AI game
Complete your online session survey
• Please complete your session survey after each session. Your feedback is very important.
• Complete a minimum of 4 session surveys and the Overall Conference survey (starting on Thursday) to receive your Cisco Live t-shirt.
• All surveys can be taken in the Cisco Events Mobile App or by logging in to the Content Catalog on ciscolive.com/emea.
Cisco Live sessions will be available for viewing on demand after the event at ciscolive.com.
#CLEMEA
OPS Operations Track: Network Programmability
www.ciscolive.com/emea/learn/technology-tracks/operations.html
• 09:00 Opening Keynote
• 17:00 Guest Keynote
• 18:30 Cisco Live Celebration
• 11:00 BRKOPS-1871: Automate your SW delivery process
• 09:00 BRKNMS-2032: YANG Data Modeling and NETCONF: Cisco and Industry Developments
• 11:30 BRKOPS-2285: Programmability with IOS-XR Platforms
• BRKSDN-2717: The hitchhiker's guide - Managing your Network as Code (DevOps)
• 08:30 BRKSDN-2379: 13 steps from an unprogrammed to a fully automated network
• 14:45 BRKPRG-2482: Infrastructure as Code - Building, Deploying, Securing, Monitoring and Managing Robust and Repeatable Networks Using Code and APIs
• 08:30 BRKNMS-3021: Advanced Cisco IOS Device Instrumentation
• 14:30 BRKNMS-2285: How to be a hero with Cisco DNA Center Platform APIs
• BRKOPS-2562: Data is the new Oil: The Nuts & Bolts of leveraging Cisco DNA Assurance data for creating value added services
• 17:00 BRKSDN-2497: Build Your API-Based NW Troubleshooting Kit
• 16:45 BRKOPS-2024: Wireless Automation & Assurance with Cisco DNA Center using APIs
• 11:00 PSOOPS-2236: Unlocking the power of open platform with Cisco DNA Center Platform
• 11:15 BRKOPS-3825: Interpreting streaming telemetry data using ML/AI
Continue your education
• Related sessions
• Walk-In Labs
• Demos in the Cisco Showcase
• Meet the Engineer 1:1 meetings