Deployment manual - EU Digital Scoreboard


Deployment manual

Support services for the Digital Agenda Data Tool

SMART 2015/1086

Deployment manual detailing the complete process that would allow a third party to replicate the entire content and functionalities of the website in a new suitable environment

Last update: December 2018

Digital Agenda Data Website –Deployment Manual (D7)

Page 3 Support services for the Digital Agenda Data website - SMART 2015/1086

Table of contents

1. Current installation
1.1. Domain names
1.2. Deployment diagram
1.3. Operating system, system tools, libraries and prerequisites
1.3.1. Libraries, tools, prerequisites
1.4. Description of 3rd party software components
1.4.1. Virtuoso Open-Source Edition
1.4.2. Content Registry
1.4.3. Plone
1.4.4. Apache Web Server
1.4.5. NGINX web server
1.4.6. Matomo web analytics
1.4.7. MariaDB
1.4.8. Apache Solr
1.4.9. HAProxy
1.4.10. Memcached
1.4.11. Postfix
1.5. Source code
1.6. Data files
1.7. Secrets
2. Installation using docker
2.1. Prerequisites
2.2. Running the stack
2.3. Restoring data files
2.3.1. Plone data
2.3.2. Virtuoso data
2.3.3. Content Registry settings
2.3.4. Restore Matomo database
2.3.5. Rebuild exports and search index
2.3.6. Cleanup
2.4. Running under a different URL
2.5. Testing installation
2.6. Application-specific settings


1. Current installation

This document describes the hardware and software architecture currently used (as of September 2018) by the production and test instances of Digital Agenda Data Website.

It contains a detailed technical description of the software components, their configuration and installation steps on a clean system.

1.1. Domain names

PRODUCTION
digital-agenda-data.eu                   Visualisation website (Plone)
www.digital-agenda-data.eu               Alias for digital-agenda-data.eu
digital-agenda-data.eu/data              Content Registry
virtuoso.digital-agenda-data.eu          OpenLink Virtuoso
semantic.digital-agenda-data.eu          This namespace is also used by triples inside the RDF data and is served by the OpenLink Virtuoso Faceted Browser tool (Linked Data API)
digital-agenda-data.eu/analytics         Matomo (web analytics tool)

TEST
test.digital-agenda-data.eu              Visualisation website (Plone)
test.digital-agenda-data.eu/analytics    Matomo (web analytics tool) for test environment
test-cr.digital-agenda-data.eu,
test.digital-agenda-data.eu/data         Content Registry (test)
test-virtuoso.digital-agenda-data.eu     OpenLink Virtuoso

* All these domains are virtual hosts and point to a single IP address (85.9.22.69) but can be moved to different hosts if required.

1.2. Deployment diagram

The main software components and their interaction are depicted in the diagram below:

[Figure: deployment diagram of the PRODUCTION environment. Components shown: Apache web server; nginx; haproxy (load balancer); Plone instances 1 to 3 (ZEO clients; Plone 4.3 official docker image with memcached modules); ZEO server with ZODB file storage; Content Registry; Virtuoso; ELDA; Apache Solr search; web analytics (PHP, MariaDB); postfix; cron / scripts. Connection labels: HTTP, HTTP proxy, RPC, file I/O, SPARQL/HTTP, SPARQL/JDBC.]


1.3. Operating system, system tools, libraries and prerequisites

All software runs on a Linux server (CPU: Intel® Quad Core™ i7-4770 HyperThreading, RAM: 16 GB DDR3, RAID 1 2*2 TB SATA):

▪ Linux CentOS 7.2 64bit

▪ Kernel 3.10.0-327.4.5.el7.x86_64 (updated regularly).

1.3.1. Libraries, tools, prerequisites

The architecture is based on Docker microservices orchestrated using docker-compose.

The only prerequisites which must be installed on the host server are:

▪ docker-ce (https://docs.docker.com/install/linux/docker-ce/centos/#install-docker-ce)

▪ docker-compose (https://docs.docker.com/compose/install/)

▪ apache httpd or any other web server (https://httpd.apache.org)

▪ git client

▪ certbot for generating Let’s Encrypt SSL certificates (https://certbot.eff.org/all-instructions)

All other software components, including scheduled jobs and scripts, run as docker containers, as described below.

The entire setup is maintained online in the official git repository (https://github.com/digital-agenda-data/scoreboard.docker), including the default docker-compose.yml file and detailed technical instructions (see README.md in the repo). In case of any discrepancy with this document, the online version prevails.

1.4. Description of 3rd party software components

The following software components have not been developed specifically for the Digital Agenda Scoreboard project.

1.4.1. Virtuoso Open-Source Edition

This component is used as storage engine for semantic and relational data.

Vendor:                OpenLink Software
Open Source:           Yes, GPL v2 (http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSLicense)
Technology:            native/mixed
Home page:             http://docs.openlinksw.com/virtuoso
Source code:           https://github.com/openlink/virtuoso-opensource
Current version:       07.20.3229 (release 7.2.5.1 – Aug 2018)
Management interface:  http://virtuoso.digital-agenda-data.eu (production instance)
                       http://test-virtuoso.digital-agenda-data.eu (test instance)
Docker image:          https://hub.docker.com/r/tenforce/virtuoso/

1.4.2. Content Registry

This component is used to maintain and browse through data and metadata

Vendor:                TripleDev, funded by the European Environment Agency and DG-Connect
Open Source:           Yes
Technology:            Java
Source code:           https://github.com/digital-agenda-data/scoreboard.contreg
Management interface:  http://digital-agenda-data.eu/data (production instance)
                       http://test.digital-agenda-data.eu/data (test instance)
Docker image:          https://hub.docker.com/r/digitalagendadata/cr/

1.4.3. Plone

This component is used as a content management framework and application server for the visualisation website.

Vendor:           The Plone Foundation
Open Source:      Yes
Technology:       Python
Home page:        http://plone.org/
Source code:      https://github.com/plone
Current version:  Plone 4.3.17
Docker image:     https://hub.docker.com/r/digitalagendadata/plone/, based on https://hub.docker.com/_/plone/

1.4.4. Apache Web Server

This is the public web server and reverse proxy that stands in front of all application servers.

Vendor:           The Apache Software Foundation
Open Source:      Yes
Technology:       Native
Home page:        http://httpd.apache.org/
Current version:  2.4.6
Virtual Hosts:    Production:
                  ▪ digital-agenda-data.eu (main website)
                  ▪ www.digital-agenda-data.eu (alias)
                  ▪ semantic.digital-agenda-data.eu (Linked Data API)
                  ▪ virtuoso.digital-agenda-data.eu (Virtuoso)
                  Test:
                  ▪ test.digital-agenda-data.eu (website)
                  ▪ test-cr.digital-agenda-data.eu (Content Registry)
                  ▪ test-virtuoso.digital-agenda-data.eu (Virtuoso)

1.4.5. NGINX web server

This runs as a docker container and is used internally to route connections to Plone, Content Registry, Virtuoso and Matomo.

Open Source:      Yes
Technology:       Native
Home page:        https://nginx.org/en/
Source code:      http://hg.nginx.org/nginx/
Current version:  1.15.2
Docker image:     https://hub.docker.com/_/nginx/

1.4.6. Matomo web analytics

This PHP software is used for web statistics. It might be better known by its previous name, Piwik.

Open Source:      Yes
Technology:       PHP
Home page:        https://matomo.org/
Source code:      https://github.com/matomo-org/matomo
Current version:  3.5+
Docker image:     https://hub.docker.com/_/matomo/

1.4.7. MariaDB

The underlying relational database used by Matomo web analytics.

Open Source:      Yes
Technology:       Native
Home page:        https://mariadb.org/
Source code:      https://github.com/MariaDB/server
Current version:  10.2
Docker image:     https://hub.docker.com/_/mariadb/

1.4.8. Apache Solr

A full-text search engine, used by the indicator search functionality of the main website.

Open Source:      Yes
Technology:       Java
Home page:        http://lucene.apache.org/solr/guide/6_6/
Source code:      https://github.com/apache/lucene-solr
Current version:  6.6
Docker image:     https://hub.docker.com/_/solr/

1.4.9. HAProxy

An HTTP load balancer, used to distribute incoming connections to the main website (routed to several Plone instances).

Open Source:      Yes
Technology:       Native
Home page:        http://www.haproxy.org/
Source code:      https://github.com/haproxy/haproxy
Current version:  1.8
Docker image:     https://hub.docker.com/r/eeacms/haproxy/ (based on https://hub.docker.com/_/haproxy/)

1.4.10. Memcached

Internal cache store

Open Source:      Yes
Technology:       Native
Home page:        https://memcached.org/
Source code:      https://github.com/memcached/memcached
Current version:  1.5
Docker image:     https://hub.docker.com/_/memcached/

1.4.11. Postfix

A simple SMTP server for sending email notifications.

Open Source:      Yes
Technology:       Native
Home page:        http://www.postfix.org/
Source code:      http://cdn.postfix.johnriley.me/mirrors/postfix-release/index.html
Current version:  2.10
Docker image:     https://hub.docker.com/r/eeacms/postfix/

1.5. Source code

The following components have been developed specifically for the Digital Agenda Scoreboard project:

Description:  Installation scripts / container orchestration
Technology:   Docker/Bash/Python
URL:          https://github.com/digital-agenda-data/scoreboard.docker

Description:  Build scripts for Plone
Technology:   Python
URL:          https://github.com/digital-agenda-data/scoreboard.buildout

Description:  Visualisation website (Plone)
Technology:   Javascript (96%), Python
URL:          https://github.com/digital-agenda-data/scoreboard.visualization

Description:  Plone theme, CSS stylesheets, front-end widgets (e.g. navigation)
Technology:   Javascript (55%), CSS (43%), Python
URL:          https://github.com/digital-agenda-data/scoreboard.theme

Description:  Backend components (data access, SPARQL queries)
Technology:   Python
URL:          https://github.com/digital-agenda-data/edw.datacube

Description:  Content Registry with modifications for the Digital Agenda Scoreboard
URL:          https://github.com/digital-agenda-data/scoreboard.contreg

Description:  RDF data model and initial content of the triple store
Technology:   RDF
URL:          https://github.com/digital-agenda-data/rdf

Description:  Various utility scripts used in day-to-day maintenance
Technology:   scripts
URL:          https://github.com/digital-agenda-data/scripts

Description:  Installation scripts for the entire solution on a development environment using Vagrant and VirtualBox (deprecated and unmaintained, but may be used as a reference for deployment on a host machine without using docker)
Technology:   scripts
URL:          https://github.com/digital-agenda-data/scoreboard.vagrant

1.6. Data files

The following files need to be backed up and restored in the case of a server migration in order to keep all current data. They are not stored in GitHub repositories and cannot be restored by any other means except a clean re-upload of all datasets and re-configuration of all visualisations:

File(s)              Docker volume   Approx size   Description
virtuoso.db          virtuoso_db     2 GB          Semantic data
Data.fs              zeodata         15 MB         ZODB database, contains metadata for all objects (datasets, charts, users, settings, etc.) created in the visualisation website
blobstorage/*                        150 MB        Directory containing binary objects (images, attachments, etc.)
mariadb data files   mariadb         1.5 GB        Matomo data (web statistics)
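A common way to archive a Docker named volume is to mount it read-only into a throwaway container and tar its contents. The sketch below is only an illustration under several assumptions: the volume names in the table match the names reported by docker volume ls (docker-compose usually prefixes them with the project name), the alpine image is acceptable as a helper container, the destination is an absolute path on the host, and the owning service has been stopped first so the copy is consistent. backup_volume and the DOCKER override are hypothetical helpers, not part of the project scripts.

```shell
# Hypothetical helper: archive one named volume into <dest>/<volume>.tar.gz.
# ${DOCKER:-docker} allows a dry run, e.g. DOCKER=echo prints the command
# instead of executing it.
backup_volume() {
  volume=$1   # volume name as shown by `docker volume ls`
  dest=$2     # absolute path of an existing host directory
  ${DOCKER:-docker} run --rm \
    -v "$volume":/source:ro \
    -v "$dest":/backup \
    alpine tar czf "/backup/$volume.tar.gz" -C /source .
}
```

For example, DOCKER=echo backup_volume virtuoso_db /var/backups prints the docker command that would be run for the Virtuoso volume without touching any data.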

The following data files can be copied, but can also be re-created:

File(s)                    Docker volume       Approx size   Description
./.env                     N/A                 4 KB          Configuration file containing passwords and other settings
/etc/letsencrypt/archive   N/A                 100 KB        SSL (HTTPS) certificates generated using certbot/letsencrypt tool
acl/users.xml,
acl/cr.groups.xml          cr_home             50 KB         Access Control Lists (ACLs) designating access permissions of users and groups to the functionality of Content Registry.
staging/*                                      14 GB         Staging files uploaded in Content Registry (e.g. *.mdb)
filestore/*                                    0             Files uploaded by users, and also content of the dynamically editable documentation sections of the Content Registry user interface.
*                          exported_datasets   700 MB        Datasets exported as csv, tsv and ttl (e.g. as listed in page http://digital-agenda-data.eu/datasets/digital_agenda_scoreboard_key_indicators#download)

Some of the above files can be re-created if lost:

• The files in /var/local/(test-)cr/apphome/staging/* can be downloaded or obtained in other ways from their original providers, such as Eurostat. For example, the latest MS Access statistics on the ICT survey can be downloaded from the Eurostat website (http://epp.eurostat.ec.europa.eu/portal/page/portal/information_society/data/comprehensive_databases). They can be downloaded directly into the above location in the Content Registry file system, or via the "Staging files" section in Content Registry's "Admin actions".

• The files in /var/local/(test-)cr/apphome/acl/* can either be edited manually or be generated when building Content Registry from source code, by providing the corresponding folder path in the build properties.

• The files in /var/local/(test-)cr/apphome/filestore/* can only be re-created by users re-uploading them via dedicated sections in the Content Registry web interface.

• SSL certificates can be regenerated using the certbot/letsencrypt tool, after correct DNS configuration (hostnames must point to the new server before generating certificates).

• The exported datasets are generated automatically every day (using scripts located in the cron container).

1.7. Secrets

To fully take over the project you will need admin access to the following repositories and services:

▪ Host machine (ssh digital-agenda-data.eu)

▪ GitHub (https://github.com/digital-agenda-data)

▪ Docker Hub (https://hub.docker.com/u/digitalagendadata)

▪ Plone admin password (https://digital-agenda-data.eu/login)

▪ Virtuoso dba password (https://virtuoso.digital-agenda-data.eu/conductor/)

▪ Piwik admin password (https://digital-agenda-data.eu/analytics)

▪ Other application passwords stored in <APP_HOME>/.env (different passwords for test and production)
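Because the application passwords live in <APP_HOME>/.env, it can help to confirm that the file defines a non-empty value for each password parameter named in section 2.2 before handing over or starting the stack. The following is a minimal sketch, assuming the file uses plain NAME=value lines (the usual docker-compose .env format); missing_env_vars is a hypothetical helper, and it only checks presence, not whether a default password was actually changed.

```shell
# Hypothetical helper: print every variable name that has no non-empty
# NAME=value line in the given env file.
missing_env_vars() {
  env_file=$1
  shift
  for name in "$@"; do
    # a variable counts as set only when at least one character follows "="
    grep -Eq "^${name}=.+" "$env_file" || echo "$name"
  done
}
```

A possible call is missing_env_vars .env CR_DB_PASSWORD CR_DB_RO_PASSWORD VIRTUOSO_DBA_PASSWORD MARIADB_PASSWORD MARIADB_PIWIK_PASSWORD; no output means every listed parameter has a value.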


2. Installation using docker

2.1. Prerequisites

▪ Install git: yum install -y git (CentOS)

▪ Install docker-ce (https://docs.docker.com/install/linux/docker-ce/centos/#install-docker-ce) and docker-compose (yum install -y docker-compose)

▪ Install Apache httpd if not already installed

▪ Clone the bootstrap repository: git clone https://github.com/digital-agenda-data/scoreboard.docker.git. The created folder is referred to below as <APP_HOME>

▪ Copy the Apache configuration files from <APP_HOME>/etc to /etc

▪ Create the DocumentRoot folders: /var/www/html and /var/www/test-html

▪ Prepare the data files (see 1.6) into folder <APP_HOME>/data

▪ Read the instructions from <APP_HOME>/README.md before proceeding further
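Before moving on, the presence of the prerequisite tools can be verified from the shell. This is a small sketch, assuming the binaries are named git, docker, docker-compose and httpd as on a typical CentOS host; missing_tools is a hypothetical helper and it does not check versions.

```shell
# Hypothetical helper: print each tool name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running missing_tools git docker docker-compose httpd should print nothing on a host that satisfies the list above.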

2.2. Running the stack

In order to install all applications using Docker, use the scripts maintained at https://github.com/digital-agenda-data/scoreboard.docker (connection to the Internet is required).

Run the commands below, replacing “production” with “test” if setting up a new test environment:

cd <APP_HOME>
cp .env.PRODUCTION .env
vim .env

Change the parameters CR_DB_PASSWORD, CR_DB_RO_PASSWORD, VIRTUOSO_DBA_PASSWORD, MARIADB_PASSWORD and MARIADB_PIWIK_PASSWORD.

cp docker-compose.production.yml docker-compose.override.yml
vim docker-compose.override.yml

Review all configuration settings.

docker-compose pull

(this command downloads all necessary images and can take up to 20 minutes, depending on the internet connection speed)

docker-compose build

(this command updates some of the docker images with local deployment settings)

docker-compose up -d

(this command starts all containers; it is required before restoring the data files)

Test that all containers are running by executing docker-compose ps and checking the output. All containers should be in state “Up”.
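The state check can be scripted by filtering the output of docker-compose ps. The sketch below assumes the classic docker-compose (v1) table layout, with two header lines followed by one row per container and the state shown as a separate "Up" word; newer Compose releases print a different layout, so treat not_up_containers as a hypothetical helper to adapt rather than a fixed recipe.

```shell
# Hypothetical helper: read `docker-compose ps` output on stdin and print
# the name (first column) of every container whose row does not show "Up".
not_up_containers() {
  # NR > 2 skips the header row and the dashed separator line
  awk 'NR > 2 && NF > 0 && $0 !~ / Up( |$)/ { print $1 }'
}
```

Used as docker-compose ps | not_up_containers, an empty result means every container is reported as Up.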

The following ports should be open on the host machine:

▪ 81 (production) / 82 (test) - opened by nginx
▪ 8891 (production) / 8892 (test) - opened by virtuoso

The following ports should be open, but only visible to the internal docker network:

▪ 443 (nginx)
▪ 8080 (cr, plone, zeo)
▪ 8983 (solr)
▪ 1111 (virtuoso)
▪ 25 (mail)
▪ 9000 (piwik)
▪ 3306 (mariadb)
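The host-side ports can be checked against the list of listening TCP sockets. This sketch assumes the ss tool (iproute2) is available on the host and that published Docker ports appear as ordinary listening sockets there; missing_ports is a hypothetical helper that only inspects the host, not the internal docker network.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print each expected
# port (given as arguments) that is not in the list of listening ports.
missing_ports() {
  # column 4 is "Local Address:Port"; the port is the part after the last ":"
  listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')
  for port in "$@"; do
    printf '%s\n' "$listening" | grep -qxF "$port" || echo "$port"
  done
}
```

For a production host, ss -ltn | missing_ports 81 8891 should print nothing; for a test host the ports would be 82 and 8892.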

2.3. Restoring data files

These steps should be performed only once, after the initial deployment.

2.3.1. Plone data

Prerequisite: an archive file called plone_files.tar.gz found in <APP_HOME>/data.

docker-compose stop plone zeoserver
docker cp data/plone_files.tar.gz rsync:/plone-data
docker-compose exec rsync-server sh
cd /plone-data
tar xzvf plone_files.tar.gz
chown -R 500:500 blobstorage/ filestorage/
rm plone_files.tar.gz
exit
docker-compose restart zeoserver plone

2.3.2. Virtuoso data

Prerequisite: an archive file called virtuoso.db.gz found in <APP_HOME>/data.

docker-compose stop virtuoso
docker cp data/virtuoso.db.gz rsync:/virtuoso-data
docker-compose exec rsync-server sh
cd /virtuoso-data
rm -f virtuoso.db virtuoso.lck virtuoso.pxa virtuoso.trx virtuoso-temp.db virtuoso.tdb
gzip -d -c virtuoso.db.gz > virtuoso.db
rm virtuoso.db.gz
exit
docker-compose restart virtuoso memcached

2.3.3. Content Registry settings

Prerequisite: the files listed below found in <APP_HOME>/data:

docker cp data/cr/apphome/acl/users.xml rsync:/var/local/cr/apphome/acl
docker cp data/cr/apphome/acl/cr.groups.xml rsync:/var/local/cr/apphome/acl
docker cp data/cr/apphome/staging rsync:/var/local/cr/apphome
# copy each staging database
for path in data/cr/apphome/staging/*; do
  file=$(basename "$path")
  docker cp "$path" "rsync:/var/local/cr/apphome/staging/$file"
done
docker-compose restart cr

2.3.4. Restore Matomo database

Prerequisite: a database dump called piwik.sql found in <APP_HOME>/data

docker cp data/piwik.sql rsync:/var/lib/mysql
docker-compose exec mariadb bash
cd /var/lib/mysql
mysql -u root -p"$MYSQL_ROOT_PASSWORD" piwik < piwik.sql
rm -f piwik.sql
exit

Note: Additional configuration steps are necessary after all solution components are running. To execute these steps, follow the instructions shown at https://digital-agenda-data.eu/analytics. See also section “Piwik configuration” in <APP_HOME>/README.md.

2.3.5. Rebuild exports and search index

docker-compose exec cron bash
source /etc/environment
cd /var/cron/scripts/
./export_datasets.sh
python /var/cron/scripts/solr-index.py --core scoreboard --base-path ${SCOREBOARD_URL} --solr ${SOLR_URL}

2.3.6. Cleanup

After restoration of all data files, the “rsync” service is no longer needed and should be deleted from docker-compose.override.yml, especially if you plan to perform multiple installations on the same host (e.g. production and test).

2.4. Running under a different URL

If the main domain name (digital-agenda-data.eu) has to be changed, the following configuration files and application settings must be updated:

/etc/httpd/conf.d/*.conf (this is where the virtual hosts are defined on the host machine)
<APP_HOME>/nginx/project-*.conf (this is where more virtual hosts and the reverse proxy are defined)
<APP_HOME>/cron/scripts/exportcl.sh (this script uses the SPARQL endpoint)
<APP_HOME>/.env and <APP_HOME>/docker-compose.yml
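After editing these files, a recursive search can reveal any place that still mentions the old domain. The sketch below is an illustration only: files_mentioning_domain is a hypothetical helper, and the search covers files on disk, so settings stored inside Plone or Content Registry would still need to be reviewed by hand.

```shell
# Hypothetical helper: list files under the given paths that still contain
# the given domain name as a literal string.
files_mentioning_domain() {
  domain=$1
  shift
  # -r: recurse, -l: print file names only, -F: treat the domain as fixed text
  grep -rlF -- "$domain" "$@" 2>/dev/null | sort
}
```

For instance, with APP_HOME set to the checkout folder, files_mentioning_domain digital-agenda-data.eu /etc/httpd/conf.d "$APP_HOME/nginx" "$APP_HOME/cron/scripts" "$APP_HOME/.env" lists the files that still need attention.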

2.5. Testing installation

To test that all applications were correctly installed, run the following checks:

1. Run a SPARQL query (https://digital-agenda-data.eu/sparql)

2. Log in to Content Registry (https://digital-agenda-data.eu/data/login.action)

3. Check a chart (https://digital-agenda-data.eu/charts/desi-composite)

4. Check the dataset indicators table (https://digital-agenda-data.eu/datasets/desi/indicators)

5. Add a comment and check mail (https://digital-agenda-data.eu/board/digital_agenda_scoreboard_key_indicators). If no email is received, check the distribution list at https://digital-agenda-data.eu/@@ploneboard_notification

6. Download a csv file (https://digital-agenda-data.eu/download/DESI.csv.zip)

7. Run a search (https://test.digital-agenda-data.eu/search-indicators?q=media)

8. Check the web analytics configuration (https://digital-agenda-data.eu/portal_skins/custom/analytics.js/manage_main)

9. Check for live activity in Matomo (https://digital-agenda-data.eu/analytics)

10. Review the Privacy page (https://digital-agenda-data.eu/privacy). It should contain an embedded iframe showing the status (Opt-In/Opt-Out) for web analytics.

11. Check mail settings (https://digital-agenda-data.eu/@@mail-controlpanel)

12. Check log files for errors:

/var/log/httpd/*.log
cd <APP_HOME>
docker-compose logs nginx
docker-compose logs plone
docker-compose logs zeoserver
docker-compose logs virtuoso
docker-compose logs cr
docker-compose logs piwik
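Some of the checks above can be approximated with a quick HTTP smoke test. The sketch below takes a handful of paths from the checklist and reports the HTTP status code for each; smoke_urls and smoke_check are hypothetical helpers, curl is assumed to be installed, and a 200 response only shows that a page is served, so the manual checks remain necessary.

```shell
# Hypothetical helper: print the URLs to probe for a given site root.
# The paths are taken from the checklist in this section.
smoke_urls() {
  base=$1
  for path in /sparql /data/login.action /charts/desi-composite \
              /datasets/desi/indicators /download/DESI.csv.zip /privacy; do
    printf '%s%s\n' "$base" "$path"
  done
}

# Hypothetical helper: print "<status code> <url>" for each URL.
smoke_check() {
  smoke_urls "$1" | while read -r url; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}
```

Calling smoke_check https://digital-agenda-data.eu (or the test site root) gives a one-line-per-URL overview that can be compared before and after a migration.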

2.6. Application-specific settings

The following settings are explained in more detail in deliverable D2 - Technical Report:

▪ Open https://digital-agenda-data.eu and log in with an administrator account using the Login link in the footer

▪ Check the SMTP settings in page https://digital-agenda-data.eu/mail-controlpanel

▪ Check the reCAPTCHA keys in page https://digital-agenda-data.eu/recaptcha-settings

▪ Check the email addresses that receive notifications when comments are posted, in page https://digital-agenda-data.eu/ploneboard_notification

▪ Check the properties in page https://digital-agenda-data.eu/portal_registry/ (select prefix IDataCubeSettings from the dropdown list). In particular, the test instance must have different values for parameters DEFAULT_CR_URL, DEFAULT_SPARQL_ENDPOINT and DEFAULT_USER_SPARQL_ENDPOINT