Deployment manual - EU Digital Scoreboard
Deployment manual
Support services for the Digital Agenda Data Tool
SMART 2015/1086
Deployment manual detailing the complete process that would allow a third party to
replicate the entire content and functionalities of the website in a new suitable
environment
Last update: December 2018
Digital Agenda Data Website –Deployment Manual (D7)
Page 3 Support services for the Digital Agenda Data website - SMART 2015/1086
Table of contents
1. Current installation
1.1. Domain names
1.2. Deployment diagram
1.3. Operating system, system tools, libraries and prerequisites
1.3.1. Libraries, tools, prerequisites
1.4. Description of 3rd party software components
1.4.1. Virtuoso Open-Source Edition
1.4.2. Content Registry
1.4.3. Plone
1.4.4. Apache Web Server
1.4.5. NGINX web server
1.4.6. Matomo web analytics
1.4.7. MariaDB
1.4.8. Apache Solr
1.4.9. HAProxy
1.4.10. Memcached
1.4.11. Postfix
1.5. Source code
1.6. Data files
1.7. Secrets
2. Installation using docker
2.1. Prerequisites
2.2. Running the stack
2.3. Restoring data files
2.3.1. Plone data
2.3.2. Virtuoso data
2.3.3. Content Registry settings
2.3.4. Restore Matomo database
2.3.5. Rebuild exports and search index
2.3.6. Cleanup
2.4. Running under a different URL
2.5. Testing installation
2.6. Application-specific settings
1. Current installation
This document describes the hardware and software architecture used (as of September 2018) by the
production and test instances of the Digital Agenda Data website.
It contains a detailed technical description of the software components, their configuration and
installation steps on a clean system.
1.1. Domain names
PRODUCTION
digital-agenda-data.eu                  Visualisation website (Plone)
www.digital-agenda-data.eu              Alias for digital-agenda-data.eu
digital-agenda-data.eu/data             Content Registry
virtuoso.digital-agenda-data.eu         OpenLink Virtuoso
semantic.digital-agenda-data.eu         This namespace is also used by triples inside the RDF data and is served by the OpenLink Virtuoso Faceted Browser tool (Linked Data API)
digital-agenda-data.eu/analytics        Matomo (web analytics tool)
TEST
test.digital-agenda-data.eu             Visualisation website (Plone)
test.digital-agenda-data.eu/analytics   Matomo (web analytics tool) for test environment
test-cr.digital-agenda-data.eu          Content Registry (test)
test.digital-agenda-data.eu/data        Content Registry (test)
test-virtuoso.digital-agenda-data.eu    OpenLink Virtuoso
* All these domains are virtual hosts and point to a single IP address (85.9.22.69) but can be moved to
different hosts if required.
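Because all virtual hosts are expected to resolve to the same address, a quick DNS check can confirm this before and after a migration. The following is a minimal Python sketch, not part of the project: the host list and IP address are copied from the table above, while the function name `hosts_not_pointing_to` and its injectable `resolve` parameter are illustrative assumptions.

```python
import socket

# Hostnames from the table in section 1.1 (path-based entries such as /data
# and /analytics share the hostname of the main site).
HOSTS = [
    "digital-agenda-data.eu",
    "www.digital-agenda-data.eu",
    "virtuoso.digital-agenda-data.eu",
    "semantic.digital-agenda-data.eu",
    "test.digital-agenda-data.eu",
    "test-cr.digital-agenda-data.eu",
    "test-virtuoso.digital-agenda-data.eu",
]
EXPECTED_IP = "85.9.22.69"  # address quoted in this manual; change after a migration


def hosts_not_pointing_to(expected_ip, hosts, resolve=socket.gethostbyname):
    """Return {host: resolved address or error text} for every mismatch."""
    mismatches = {}
    for host in hosts:
        try:
            address = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            address = "unresolved: %s" % exc
        if address != expected_ip:
            mismatches[host] = address
    return mismatches
```

Calling `hosts_not_pointing_to(EXPECTED_IP, HOSTS)` returns an empty dictionary when every hostname resolves to the expected address.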
1.2. Deployment diagram
The main software components and their interaction are depicted in the diagram below:
[Figure: deployment diagram. Components shown: Apache web server; nginx; haproxy (load balancer); Plone instances 1 to 3 (Zeo clients; Plone 4.3, official docker image); Zeo server (ZODB, file I/O); memcached; Content Registry; Virtuoso; ELDA; Apache Solr search; web analytics (PHP, MariaDB); postfix; cron / scripts. Connection labels: HTTP, HTTP Proxy, RPC, SPARQL/HTTP, SPARQL/JDBC.]
1.3. Operating system, system tools, libraries and prerequisites
All software runs on a Linux server (CPU: Intel® Core™ i7-4770, quad-core with Hyper-Threading;
RAM: 16 GB DDR3; disks: 2 × 2 TB SATA in RAID 1):
▪ Linux CentOS 7.2 64bit
▪ Kernel 3.10.0-327.4.5.el7.x86_64 (kept up to date).
1.3.1. Libraries, tools, prerequisites
The architecture is based on Docker microservices orchestrated using docker-compose.
The only prerequisites which must be installed on the host server are:
▪ docker-ce (https://docs.docker.com/install/linux/docker-ce/centos/#install-docker-ce)
▪ docker-compose (https://docs.docker.com/compose/install/)
▪ apache httpd or any other web server (https://httpd.apache.org)
▪ git client
▪ certbot for generating Let’s Encrypt SSL certificates (https://certbot.eff.org/all-instructions)
All other software components, including scheduled jobs and scripts, run as docker containers, as
described below.
The entire setup is maintained online in the official git repository
(https://github.com/digital-agenda-data/scoreboard.docker), including the default docker-compose.yml
file and detailed technical instructions (see README.md in the repo). In case of any discrepancy with
this document, the online version prevails.
1.4. Description of 3rd party software components
The following software components have not been developed specifically for the Digital Agenda
Scoreboard project.
1.4.1. Virtuoso Open-Source Edition
This component is used as the storage engine for semantic and relational data.
Vendor OpenLink Software
Open Source Yes, GPL v2 (http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSLicense)
Technology native/mixed
Home page http://docs.openlinksw.com/virtuoso
Source code https://github.com/openlink/virtuoso-opensource
Current version 07.20.3229 (release 7.2.5.1 – Aug 2018)
Management interface http://virtuoso.digital-agenda-data.eu (production instance), http://test-virtuoso.digital-agenda-data.eu (test instance)
Docker image https://hub.docker.com/r/tenforce/virtuoso/
1.4.2. Content Registry
This component is used to maintain and browse data and metadata.
Vendor TripleDev, funded by the European Environment Agency and DG-Connect
Open Source Yes
Technology Java
Source code https://github.com/digital-agenda-data/scoreboard.contreg
Management interface http://digital-agenda-data.eu/data (production instance), http://test.digital-agenda-data.eu/data (test instance)
Docker image https://hub.docker.com/r/digitalagendadata/cr/
1.4.3. Plone
This component is used as a content management framework and application server for the
visualisation website.
Vendor The Plone Foundation
Open Source Yes
Technology Python
Home page http://plone.org/
Source code https://github.com/plone
Current version Plone 4.3.17
Docker image https://hub.docker.com/r/digitalagendadata/plone/, based on https://hub.docker.com/_/plone/
1.4.4. Apache Web Server
This is the public web server and reverse proxy that stands in front of all application servers.
Vendor The Apache Software Foundation
Open Source Yes
Technology Native
Home page http://httpd.apache.org/
Current version 2.4.6
Virtual Hosts Production:
▪ digital-agenda-data.eu (main website)
▪ www.digital-agenda-data.eu (alias)
▪ semantic.digital-agenda-data.eu (Linked Data API)
▪ virtuoso.digital-agenda-data.eu (Virtuoso)
Test:
▪ test.digital-agenda-data.eu (website)
▪ test-cr.digital-agenda-data.eu (Content Registry)
▪ test-virtuoso.digital-agenda-data.eu (Virtuoso)
1.4.5. NGINX web server
This runs as a docker container and is used internally to route connections to Plone, Content Registry,
Virtuoso and Matomo.
Open Source Yes
Technology Native
Home page https://nginx.org/en/
Source code http://hg.nginx.org/nginx/
Current version 1.15.2
Docker image https://hub.docker.com/_/nginx/
1.4.6. Matomo web analytics
This PHP software is used for web statistics. It might be better known by its previous name, Piwik.
Open Source Yes
Technology PHP
Home page https://matomo.org/
Source code https://github.com/matomo-org/matomo
Current version 3.5+
Docker image https://hub.docker.com/_/matomo/
1.4.7. MariaDB
The underlying relational database used by Matomo web analytics.
Open Source Yes
Technology Native
Home page https://mariadb.org/
Source code https://github.com/MariaDB/server
Current version 10.2
Docker image https://hub.docker.com/_/mariadb/
1.4.8. Apache Solr
A full-text search engine, used by the indicator search functionality of the main website.
Open Source Yes
Technology Java
Home page http://lucene.apache.org/solr/guide/6_6/
Source code https://github.com/apache/lucene-solr
Current version 6.6
Docker image https://hub.docker.com/_/solr/
1.4.9. HAProxy
An HTTP load balancer, used to distribute incoming connections to the main website across several
Plone instances.
Open Source Yes
Technology Native
Home page http://www.haproxy.org/
Source code https://github.com/haproxy/haproxy
Current version 1.8
Docker image https://hub.docker.com/r/eeacms/haproxy/ (based on https://hub.docker.com/_/haproxy/)
1.4.10. Memcached
Internal cache store
Open Source Yes
Technology Native
Home page https://memcached.org/
Source code https://github.com/memcached/memcached
Current version 1.5
Docker image https://hub.docker.com/_/memcached/
1.4.11. Postfix
A simple SMTP server for sending email notifications.
Open Source Yes
Technology Native
Home page http://www.postfix.org/
Source code http://cdn.postfix.johnriley.me/mirrors/postfix-release/index.html
Current version 2.10
Docker image https://hub.docker.com/r/eeacms/postfix/
1.5. Source code
The following components have been developed specifically for the Digital Agenda Scoreboard
project:
Description | Technology | URL
Installation scripts / container orchestration | Docker/Bash/Python | https://github.com/digital-agenda-data/scoreboard.docker
Build scripts for Plone | Python | https://github.com/digital-agenda-data/scoreboard.buildout
Visualisation website (Plone) | Javascript (96%), Python | https://github.com/digital-agenda-data/scoreboard.visualization
Plone theme, CSS stylesheets, front-end widgets (e.g. navigation) | Javascript (55%), CSS (43%), Python | https://github.com/digital-agenda-data/scoreboard.theme
Backend components (data access, SPARQL queries) | Python | https://github.com/digital-agenda-data/edw.datacube
Content Registry with modifications for the Digital Agenda Scoreboard | Java (see 1.4.2) | https://github.com/digital-agenda-data/scoreboard.contreg
RDF data model and initial content of the triple store | RDF | https://github.com/digital-agenda-data/rdf
Various utility scripts used in day-to-day maintenance | scripts | https://github.com/digital-agenda-data/scripts
Installation scripts for the entire solution on a development environment using Vagrant and VirtualBox (deprecated and unmaintained, but may be used as a reference for deployment on a host machine without using docker) | scripts | https://github.com/digital-agenda-data/scoreboard.vagrant
1.6. Data files
The following files need to be backed up and restored in the case of a server migration in order to
keep all current data. They are not stored in GitHub repositories and cannot be restored by any means
other than a clean re-upload of all datasets and re-configuration of all visualisations:
File(s) | Docker volume | Approx size | Description
virtuoso.db | virtuoso_db | 2 GB | Semantic data
Data.fs | zeodata | 15 MB | ZODB database, contains metadata for all objects (datasets, charts, users, settings, etc.) created in the visualisation website
blobstorage/* |  | 150 MB | Directory containing binary objects (images, attachments, etc.)
mariadb data files | mariadb | 1.5 GB | Matomo data (web statistics)
The following data files can be copied, but can also be re-created:
File(s) | Docker volume | Approx size | Description
./.env | N/A | 4 KB | Configuration file containing passwords and other settings
/etc/letsencrypt/archive | N/A | 100 KB | SSL (HTTPS) certificates generated using certbot/letsencrypt tool
acl/users.xml, acl/cr.groups.xml | cr_home | 50 KB | Access Control Lists (ACLs) designating access permissions of users and groups to the functionality of Content Registry.
staging/* |  | 14 GB | Staging files uploaded in Content Registry (e.g. *.mdb)
filestore/* |  | 0 | Files uploaded by users, and also content of the dynamically editable documentation sections of the Content Registry user interface.
* | exported_datasets | 700 MB | Datasets exported as csv, tsv and ttl (e.g. as listed in page http://digital-agenda-data.eu/datasets/digital_agenda_scoreboard_key_indicators#download)
Some of the above files can be re-created if lost:
• The files in /var/local/(test-)cr/apphome/staging/* can be downloaded or obtained in other ways from
their original providers, such as Eurostat. For example, the latest MS Access statistics on the ICT
survey can be downloaded from
http://epp.eurostat.ec.europa.eu/portal/page/portal/information_society/data/comprehensive_databases.
They can be downloaded directly into the above location in the Content Registry file system, or
via the "Staging files" section in Content Registry's "Admin actions".
• The files in /var/local/(test-)cr/apphome/acl/* can be either edited manually or created when
building Content Registry from source code with the corresponding folder path provided in the build
properties.
• The files in /var/local/(test-)cr/apphome/filestore/* can only be re-created by users re-uploading them
via dedicated sections in the Content Registry web interface.
• SSL certificates can be regenerated using the certbot/letsencrypt tool, after correct DNS
configuration (hostnames must point to the new server before generating certificates).
• The exported datasets are generated automatically, daily (using scripts located in the cron
container).
1.7. Secrets
To fully take over the project you will need admin access to the following repositories and services:
▪ Host machine (ssh digital-agenda-data.eu)
▪ GitHub (https://github.com/digital-agenda-data)
▪ Docker Hub (https://hub.docker.com/u/digitalagendadata)
▪ Plone admin password (https://digital-agenda-data.eu/login)
▪ Virtuoso dba password (https://virtuoso.digital-agenda-data.eu/conductor/)
▪ Piwik admin password (https://digital-agenda-data.eu/analytics)
▪ Other application passwords stored in <APP_HOME>/.env (different passwords for test and
production)
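Before starting the stack it can help to confirm that the password variables in <APP_HOME>/.env are actually set. The sketch below is a hedged illustration, not a project script: the variable names are those listed in section 2.2, the simple KEY=VALUE file layout is an assumption, and the helper names `parse_env` and `missing_secrets` are hypothetical.

```python
# Variable names taken from section 2.2 of this manual.
REQUIRED_SECRETS = (
    "CR_DB_PASSWORD",
    "CR_DB_RO_PASSWORD",
    "VIRTUOSO_DBA_PASSWORD",
    "MARIADB_PASSWORD",
    "MARIADB_PIWIK_PASSWORD",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_secrets(env_text, required=REQUIRED_SECRETS):
    """Return the required variables that are absent or empty."""
    values = parse_env(env_text)
    return [name for name in required if not values.get(name)]
```

For example, `missing_secrets(open(".env").read())` run from <APP_HOME> lists the variables that still need a value. The check only detects missing or empty entries; it cannot tell whether a default password was left unchanged.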
2. Installation using docker
2.1. Prerequisites
▪ Install git: yum install -y git (CentOS)
▪ Install docker-ce (https://docs.docker.com/install/linux/docker-ce/centos/#install-docker-ce)
and docker-compose (yum install -y docker-compose)
▪ Install Apache httpd if not already installed
▪ Clone the bootstrap repository: git clone https://github.com/digital-agenda-data/scoreboard.docker.git
The created folder is referred to below as <APP_HOME>
▪ Copy the Apache configuration files from <APP_HOME>/etc to /etc
▪ Create the DocumentRoot folders: /var/www/html and /var/www/test-html
▪ Place the data files (see 1.6) in folder <APP_HOME>/data
▪ Read the instructions from <APP_HOME>/README.md before proceeding further
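The restore steps in section 2.3 each expect specific files under <APP_HOME>/data. As a rough pre-flight check, the following Python sketch lists which of them are missing. The relative paths are the ones named in section 2.3; the function name `missing_data_files` is a hypothetical helper introduced only for illustration.

```python
import os

# Relative paths named as prerequisites in section 2.3 of this manual.
EXPECTED_DATA_FILES = (
    "plone_files.tar.gz",
    "virtuoso.db.gz",
    "piwik.sql",
    os.path.join("cr", "apphome", "acl", "users.xml"),
    os.path.join("cr", "apphome", "acl", "cr.groups.xml"),
    os.path.join("cr", "apphome", "staging"),
)


def missing_data_files(data_dir, expected=EXPECTED_DATA_FILES):
    """Return the expected paths that do not exist under data_dir."""
    return [
        path for path in expected
        if not os.path.exists(os.path.join(data_dir, path))
    ]
```

Running `missing_data_files("data")` from <APP_HOME> should return an empty list before any of the restore steps are started.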
2.2. Running the stack
In order to install all applications using Docker, use the scripts maintained at
https://github.com/digital-agenda-data/scoreboard.docker (connection to the Internet is required).
Run the commands below, replacing “production” with “test” when setting up a new test environment:
cd <APP_HOME>
cp .env.PRODUCTION .env
vim .env
and change parameters CR_DB_PASSWORD, CR_DB_RO_PASSWORD,
VIRTUOSO_DBA_PASSWORD, MARIADB_PASSWORD, MARIADB_PIWIK_PASSWORD.
cp docker-compose.production.yml docker-compose.override.yml
vim docker-compose.override.yml
and review all configuration.
docker-compose pull
(this command downloads all necessary images and will take up to 20 minutes, depending on the
internet connection speed)
docker-compose build
(this command updates some of the docker images with local deployment settings)
docker-compose up -d
(this command starts all containers and it’s required before restoring the data files)
Test that all containers are running by executing docker-compose ps and checking the output. All
containers should be in state “Up”.
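The manual check of the docker-compose ps output can be automated roughly as follows. This Python sketch assumes the classic table layout printed by docker-compose (a header row, a dashed separator, then rows with columns separated by two or more spaces); that layout and the function name `services_not_up` are assumptions and may need adjusting for other docker-compose versions.

```python
import re


def services_not_up(ps_output):
    """Return (name, state) pairs for containers whose State column lacks 'Up'.

    Assumes a header row, a dashed separator line, then one row per
    container with columns Name, Command, State, Ports.
    """
    lines = ps_output.splitlines()
    # Skip everything up to and including the dashed separator line.
    start = next(
        (i + 1 for i, text in enumerate(lines) if set(text.strip()) == {"-"}),
        0,
    )
    problems = []
    for line in lines[start:]:
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) < 3:
            continue  # blank or wrapped line
        name, state = columns[0], columns[2]
        if not state.startswith("Up"):
            problems.append((name, state))
    return problems
```

The text to analyse can be captured with, for example, `subprocess.check_output(["docker-compose", "ps"], text=True)` run from <APP_HOME>.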
The following ports should be open on the host machine:
▪ 81 (production) / 82 (test) - opened by nginx
▪ 8891 (production) / 8892 (test) - opened by virtuoso
The following ports should be open, but only visible to the internal docker network:
▪ 443 (nginx)
▪ 8080 (cr, plone, zeo)
▪ 8983 (solr)
▪ 1111 (virtuoso)
▪ 25 (mail)
▪ 9000 (piwik)
▪ 3306 (mariadb)
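The host-level ports can be probed with a plain TCP connection attempt. The sketch below is illustrative only: the port numbers are those listed above for nginx and virtuoso, while the `HOST_PORTS` mapping and the function name `closed_ports` are hypothetical. The ports visible only on the internal docker network cannot be checked this way from the host.

```python
import socket

# Host-level ports listed in section 2.2 (nginx, virtuoso).
HOST_PORTS = {"production": (81, 8891), "test": (82, 8892)}


def closed_ports(host, ports, timeout=2.0):
    """Return the ports on host that refuse or time out on a TCP connect."""
    closed = []
    for port in ports:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            closed.append(port)
    return closed
```

On the host machine, `closed_ports("127.0.0.1", HOST_PORTS["production"])` should return an empty list once the production stack is up.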
2.3. Restoring data files
These steps should be performed only once, after the initial deployment.
2.3.1. Plone data
Prerequisite: an archive file called plone_files.tar.gz found in <APP_HOME>/data.
docker-compose stop plone zeoserver
docker cp data/plone_files.tar.gz rsync:/plone-data
docker-compose exec rsync-server sh
cd /plone-data
tar xzvf plone_files.tar.gz
chown -R 500.500 blobstorage/ filestorage/
rm plone_files.tar.gz
exit
docker-compose restart zeoserver plone
2.3.2. Virtuoso data
Prerequisite: an archive file called virtuoso.db.gz found in <APP_HOME>/data.
docker-compose stop virtuoso
docker cp data/virtuoso.db.gz rsync:/virtuoso-data
docker-compose exec rsync-server sh
cd /virtuoso-data
rm -f virtuoso.db virtuoso.lck virtuoso.pxa virtuoso.trx virtuoso-temp.db virtuoso.tdb
gzip -d -c virtuoso.db.gz > virtuoso.db
rm virtuoso.db.gz
exit
docker-compose restart virtuoso memcached
2.3.3. Content Registry settings
Prerequisite: the files listed below found in <APP_HOME>/data:
docker cp data/cr/apphome/acl/users.xml rsync:/var/local/cr/apphome/acl
docker cp data/cr/apphome/acl/cr.groups.xml rsync:/var/local/cr/apphome/acl
docker cp data/cr/apphome/staging rsync:/var/local/cr/apphome
# copy each staging database
for path in data/cr/apphome/staging/*; do
  file=$(basename "$path")
  docker cp "data/cr/apphome/staging/$file" "rsync:/var/local/cr/apphome/staging/$file"
done
docker-compose restart cr
2.3.4. Restore Matomo database
Prerequisite: a database dump called piwik.sql found in <APP_HOME>/data
docker cp data/piwik.sql rsync:/var/lib/mysql
docker-compose exec mariadb bash
cd /var/lib/mysql
mysql -u root -p$MYSQL_ROOT_PASSWORD piwik < piwik.sql
rm -f piwik.sql
exit
Note: Additional configuration steps are necessary after all solution components are running. To
execute these steps, follow the instructions shown at https://digital-agenda-data.eu/analytics. See also
section “Piwik configuration” from <APP_HOME>/README.md.
2.3.5. Rebuild exports and search index
docker-compose exec cron bash
source /etc/environment
cd /var/cron/scripts/
./export_datasets.sh
python /var/cron/scripts/solr-index.py --core scoreboard --base-path ${SCOREBOARD_URL} --solr ${SOLR_URL}
2.3.6. Cleanup
After restoration of all data files, the “rsync” service is no longer needed and should be deleted from
docker-compose.override.yml, especially if you plan to perform multiple installations on the same host
(e.g. production and test).
2.4. Running under a different URL
If the main domain name (digital-agenda-data.eu) has to be changed, the following configuration
files and application settings must be updated:
/etc/httpd/conf.d/*.conf (this is where the virtual hosts are defined on the host machine)
<APP_HOME>/nginx/project-*.conf (this is where more virtual hosts and reverse proxy are defined)
<APP_HOME>/cron/scripts/exportcl.sh (this script uses the SPARQL endpoint)
<APP_HOME>/.env and <APP_HOME>/docker-compose.yml
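To find every place where the old domain name still appears in the files listed above, a simple recursive search is usually enough. The following Python sketch is one possible approach; the function name `files_mentioning` and the default list of file suffixes are assumptions chosen to match the file types mentioned in this section.

```python
import os


def files_mentioning(root, needle, suffixes=(".conf", ".sh", ".yml", ".env")):
    """Yield (path, line number, line) for each line under root containing needle."""
    for folder, _dirs, names in os.walk(root):
        for name in sorted(names):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")
```

For example, iterating over `files_mentioning("/etc/httpd/conf.d", "digital-agenda-data.eu")` and over the same call with <APP_HOME> as the root prints the lines that still need to be edited. Settings stored inside the applications themselves (see section 2.6) are not covered by a file search.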
2.5. Testing installation
To test that all applications were correctly installed, run the following checks:
1. Run a SPARQL query (https://digital-agenda-data.eu/sparql)
2. Log in to Content Registry (https://digital-agenda-data.eu/data/login.action)
3. Check a chart (https://digital-agenda-data.eu/charts/desi-composite)
4. Check the dataset indicators table (https://digital-agenda-data.eu/datasets/desi/indicators)
5. Add a comment and check mail
(https://digital-agenda-data.eu/board/digital_agenda_scoreboard_key_indicators). If no email is
received, check the distribution list at https://digital-agenda-data.eu/@@ploneboard_notification
6. Download a CSV file (https://digital-agenda-data.eu/download/DESI.csv.zip)
7. Run a search (https://test.digital-agenda-data.eu/search-indicators?q=media)
8. Check web analytics configuration
(https://digital-agenda-data.eu/portal_skins/custom/analytics.js/manage_main)
9. Check for live activity in Matomo (https://digital-agenda-data.eu/analytics)
10. Review the Privacy page (https://digital-agenda-data.eu/privacy). It should contain an embedded
iframe showing the status (Opt-In/Opt-Out) for web analytics.
11. Check mail settings (https://digital-agenda-data.eu/@@mail-controlpanel)
12. Check log files for errors:
/var/log/httpd/*.log
cd <APP_HOME>
docker-compose logs nginx
docker-compose logs plone
docker-compose logs zeoserver
docker-compose logs virtuoso
docker-compose logs cr
docker-compose logs piwik
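Some of the checks above can be scripted as a basic smoke test. The Python sketch below requests a subset of the URLs listed above (checks 1, 3, 4 and 6) and reports any that do not answer with HTTP 200. It is an illustration under assumptions: the helper names `http_status` and `failing_urls` are hypothetical, and a 200 response only shows that the page is served, not that its content is correct.

```python
import urllib.error
import urllib.request

# URLs from checks 1, 3, 4 and 6 in the list above.
CHECK_URLS = [
    "https://digital-agenda-data.eu/sparql",
    "https://digital-agenda-data.eu/charts/desi-composite",
    "https://digital-agenda-data.eu/datasets/desi/indicators",
    "https://digital-agenda-data.eu/download/DESI.csv.zip",
]


def http_status(url, timeout=30):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError):
        return None


def failing_urls(urls, fetch=http_status):
    """Return {url: status} for every URL whose status is not 200."""
    results = {url: fetch(url) for url in urls}
    return {url: status for url, status in results.items() if status != 200}
```

`failing_urls(CHECK_URLS)` returns an empty dictionary when all four pages respond normally. The remaining checks (logins, comments, email delivery, analytics) still need to be performed manually.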
2.6. Application-specific settings
The following settings are explained in more detail in deliverable D2 - Technical Report:
▪ Open https://digital-agenda-data.eu and log in with an administrator account, using the Login
link in the footer
▪ Check the SMTP settings in page https://digital-agenda-data.eu/mail-controlpanel
▪ Check the reCAPTCHA keys in page https://digital-agenda-data.eu/recaptcha-settings
▪ Check the email addresses that receive notifications when comments are posted, in page
https://digital-agenda-data.eu/ploneboard_notification
▪ Check the properties in page https://digital-agenda-data.eu/portal_registry/ (select prefix
IDataCubeSettings from the dropdown list). In particular, the test instance must have different
values for parameters DEFAULT_CR_URL, DEFAULT_SPARQL_ENDPOINT,
DEFAULT_USER_SPARQL_ENDPOINT