UNIVERSIDADE TECNOLÓGICA FEDERAL DO PARANÁ
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO APLICADA
TATIANE ARAUJO MUNIZ LAUTERT
UNIDADES DE SAÚDE PÚBLICA EM CURITIBA: UMA ANÁLISE EXPLORATÓRIA E UM PROTÓTIPO DE DASHBOARD DE SAÚDE PARA APOIO À DECISÃO NO DOMÍNIO DA GESTÃO EM SAÚDE.
DISSERTATION
CURITIBA
2020
Public health units in Curitiba: an exploratory analysis and a health dashboard prototype for decision support in health management domain.
Dissertation presented to the Graduate Program in Applied Computing (Programa de Pós-Graduação em Computação Aplicada, PPGCA) of the Universidade Tecnológica Federal do Paraná (UTFPR) as a requirement for obtaining the degree of “Mestra em Ciências” (Master of Science) - Area of Concentration: Computer Systems Engineering.
Advisor: Prof. Dr. Nádia Puchalski Kozievitch
Co-advisor: Prof. Dr. Monika Akbar
4.0 International
This license allows sharing, remixing, adapting, and building upon the work, even for commercial purposes, provided that credit is attributed to the author(s). Content produced by third parties that is cited and referenced in this work is not covered by the license.
Ministry of Education
Universidade Tecnológica Federal do Paraná, Curitiba Campus
TATIANE ARAUJO MUNIZ LAUTERT
UNIDADES DE SAÚDE PÚBLICA EM CURITIBA: UMA ANÁLISE EXPLORATÓRIA E UM PROTÓTIPO DE DASHBOARD DE SAÚDE PARA APOIO À DECISÃO NO DOMÍNIO DA GESTÃO EM SAÚDE.
Master’s research work presented as a requirement for obtaining the degree of Master in Applied Computing from the Universidade Tecnológica Federal do Paraná (UTFPR). Area of concentration: Computer Systems Engineering.
Approval date: 17 December 2020
Prof. Nadia Puchalski Kozievitch, Doctorate - Universidade Tecnológica Federal do Paraná
Prof. Anelise Munaretto Fonseca, Doctorate - Universidade Tecnológica Federal do Paraná
Prof. Artur Ziviani, Doctorate - Laboratório Nacional de Computação Científica
Document generated by the UTFPR Academic System from the data in the Defense Minutes on 17/12/2020.
“I dedicate this work to my father, Israel Araujo Muniz (in memoriam), who taught me that to learn one must remain humble.”
ACKNOWLEDGEMENTS
First, I thank God for the opportunity to pursue this master’s degree.
To my advisor, Professor Dr. Nádia Puchalski Kozievitch, for all her support, motivation in difficult moments, availability, and tireless guidance.
To my co-advisor, Professor Dr. Monika Akbar, also for the reviews and contributions given throughout this work.
To the professors and coordinators of the graduate program, for all the content taught, the guidance, and the knowledge shared in each course.
To the other professionals of the institution who directly and/or indirectly contribute so that we have access to the whole infrastructure.
To my family, for their patience, support, and understanding in moments of absence so that this work could be completed.
To my husband, Filipe, for his words of cheer, incentive, and encouragement, even in the moments when I myself did not believe it possible to carry on.
To my daughters, Melissa and Sophie, for the inevitable moments of absence, for their support, and for making me want to be an ever better person.
"There are many hypotheses in science that are
wrong. That’s perfectly alright; it’s the aperture
to finding out what’s right. Science is a
self-correcting process. To be accepted, new
ideas must survive the most rigorous standards
of evidence and scrutiny." Carl Sagan
RESUMO
LAUTERT, Tatiane Araujo Muniz. Unidades de Saúde Pública em Curitiba: Uma análise exploratória e um protótipo de dashboard de saúde para apoio à decisão no domínio da gestão em saúde. 2020. 70 f. Dissertação (Mestrado em Computação Aplicada) – Universidade Tecnológica Federal do Paraná. Curitiba, 2020.
Garantir serviços de saúde adequados à população é um desafio, principalmente em países em desenvolvimento, onde recursos limitados devem ser otimizados para atingir um percentual maior da população. Para avaliar adequadamente os serviços de saúde e priorizar novos investimentos, é importante coletar, integrar e analisar grande quantidade de dados relevantes. Esta dissertação apresenta um estudo agregando dados socioambientais, socioeconômicos, geográficos, de transporte e de saúde disponíveis ao público de diferentes fontes para a cidade de Curitiba, Brasil, com o objetivo de compreender a dinâmica das necessidades atuais de saúde e o uso do sistema de saúde pelos moradores da cidade. Esta dissertação apresenta uma análise detalhada das consultas médicas, do bairro de procedência dos pacientes, das condições de saúde mais frequentes relatadas nas fichas de atendimento médico. Também analisamos o aparecimento de certas doenças infecciosas nos bairros da cidade e as correlações entre essas doenças e diversos fatores socioambientais e socioeconômicos. Por fim, o artigo discute os resultados da análise exploratória, destacando possíveis pontos que merecem atenção especial ou investimentos da gestão municipal para melhor atender o público.
Palavras-chave: Dados Abertos. Dados de Saúde. Análise Exploratória. Visualização de Dados. Dashboard.
ABSTRACT
LAUTERT, Tatiane Araujo Muniz. Unidades de Saúde Pública em Curitiba: Uma análise exploratória e um protótipo de dashboard de saúde para apoio à decisão no domínio da gestão em saúde. 2020. 70 p. Dissertation (Master’s Degree in Applied Computing) – Universidade Tecnológica Federal do Paraná. Curitiba, 2020.
Guaranteeing adequate health services to the population is a challenge, especially in developing countries where limited resources must be optimized in order to reach a larger percentage of the population. To properly assess health services and prioritize new investments, it is important to collect, integrate, and analyze large amounts of relevant data. This dissertation presents a study aggregating publicly available socio-environmental, socio-economic, geographical, transportation, and health data from different sources for the city of Curitiba in Brazil, to understand the dynamics of current healthcare needs and healthcare usage by the city dwellers. It presents a detailed analysis of medical appointments, the neighborhoods where patients come from, and the most frequent health conditions reported in the medical assistance records. We also analyze the onset of certain infectious diseases in the city’s neighborhoods and the correlations between these diseases and various socio-environmental and socio-economic factors. Finally, the dissertation discusses the findings from our exploratory analysis, highlighting possible points deserving special attention or investments from the city management to better serve the public.
Keywords: Open Data. Health Data. Exploratory Data Analysis. Data Visualization. Dashboard.
LIST OF FIGURES
Figure 1 – Research Areas.
Figure 2 – Methodology steps.
Figure 3 – RDA COVID-19 Guidelines and Recommendations.
Figure 4 – Steps for investigating disease outbreaks.
Figure 5 – Data Visualization and Exploratory Data Analysis (EDA) relationship.
Figure 6 – Steps taken in preparation for data analysis.
Figure 7 – Number of medical appointments across the years by month.
Figure 8 – Distribution of appointments by age range for each gender.
Figure 9 – Population + Medical appointments per health units + Income per household.
Figure 10 – Histogram showing the distance of the HU to the bus stops in meters.
Figure 11 – The top 5 Health Units most visited by patients from other neighborhoods.
Figure 12 – Time comparison of general medical visits and top 5 health units visited by patients from other neighborhood.
Figure 13 – Number of appointments by hour per day of the week for all health units.
Figure 14 – Number of appointments by hour per day of the week for top 5 health units visited by patients from other neighborhoods.
Figure 15 – Distribution of appointment by gender and age range in the top 5 health units.
Figure 16 – Influenza (Jan-2017 to Dec-2019).
Figure 17 – Dengue (Jan-2017 to Dec-2019).
Figure 18 – Escherichia coli (Jan-2017 to Dec-2019).
Figure 19 – Hepatitis (Jan-2017 to Dec-2019).
Figure 20 – Yellow Fever (Jan-2017 to Dec-2019).
Figure 21 – Measles (Jan-2017 to Dec-2019).
Figure 22 – Meningitis (Jan-2017 to Dec-2019).
Figure 23 – Health Dashboard Architecture.
Figure 24 – Health Dashboard database tables and relationship.
Figure 25 – Interface of the Health Dashboard Prototype.
Figure 26 – Health Dashboard Filters.
Figure 27 – Health Dashboard Map Layers.
Figure 28 – Health Dashboard - Graphs available in the prototype.
Figure 29 – Heat Map Layer for a given infection disease.
Figure 30 – Region Layer.
Figure 31 – Neighborhood Layer.
Figure 32 – Health Unit Layer.
Figure 33 – Health Units and Neighborhood Layers.
Figure 34 – Users ability to complete usability testing scenarios.
Figure 35 – Evaluation Questionnaire Results - Easiness of Use and Graph Complexity.
Figure 36 – Evaluation Questionnaire Results - Relevance of Filters.
LIST OF TABLES
Table 1 – Summary of challenges, guidelines and recommendations of RDA COVID-19 Working Group.
Table 2 – Types of Health Units from Curitiba.
Table 3 – Database table unidade_de_saude.
Table 4 – Total of medical appointments over the years.
Table 5 – Quality standards for public transportation by bus in meters.
Table 6 – Top 5 ICD for medical visits performed outside the patient’s residential neighborhood.
Table 7 – Top 5 ICD for all the medical visits.
Table 8 – Selected Infectious Diseases.
Table 9 – Cases of selected diseases over the years.
Table 10 – Medical appointments which triggered hospitalization.
Table 11 – Medical appointments which triggered hospitalization by Gender.
Table 12 – Usability testing scenarios.
Table 13 – Evaluation Questionnaire.
Table 14 – Suggestions for Improvement.
Table 15 – Medical Records Data Dictionary.
LIST OF ACRONYMS
BHU – Basic Health Units
BSI – British Standards Institution
CIC – Cidade Industrial de Curitiba
CNES – Cadastro Nacional de Estabelecimentos de Saúde
CDC – Centers for Disease Control and Prevention
DDL – Data Definition Language
DML – Data Manipulation Language
DQL – Data Query Language
ECU – Emergency Care Units
EDA – Exploratory Data Analysis
ESRI – Environmental Systems Research Institute
GIS – Geographic Information System
IBGE – Brazilian Institute of Geography and Statistics
ICD – International Classification of Diseases
IPPUC – Instituto de Pesquisa e Planejamento de Curitiba
ISO – International Organization for Standardization
ITU – International Telecommunication Union
NNDSS – National Notifiable Diseases Surveillance System
PSAC – Psychosocial Attention Centers
RDA – Research Data Alliance
SARS – Severe Acute Respiratory Syndrome
SUS – Sistema Único de Saúde
UTFPR – Universidade Tecnológica Federal do Paraná
WHO – World Health Organization
CONTENTS
1 INTRODUCTION
1.1 GENERAL OBJECTIVE
1.1.1 Specific Objectives
1.2 METHODOLOGY
1.3 PUBLICATIONS
1.4 DISSERTATION STRUCTURE
2 BASIC CONCEPTS AND RELATED WORK
2.1 GEOGRAPHIC INFORMATION SYSTEMS
2.2 SMART CITIES
2.3 OPEN DATA
2.3.1 Public Health Data
2.3.2 Disease Outbreaks
2.4 EXPLORATORY DATA ANALYSIS
2.5 CHALLENGES
3 DATA COLLECTION, PRE-PROCESSING, AND INTEGRATION
3.1 PREPARATION FOR EXPLORATORY DATA ANALYSIS
4 EXPLORATORY DATA ANALYSIS
4.1 MEDICAL APPOINTMENTS ACROSS THE MONTHS AND YEARS
4.2 GENDER AND AGE RANGE
4.3 LOCATION ANALYSIS OF THE HEALTH UNITS
4.4 PUBLIC TRANSPORT ACCESSIBILITY TO HEALTH UNITS
4.5 DISPLACEMENTS TO ARRIVE AT HEALTH UNITS
4.6 SELECTED INFECTIOUS DISEASES
4.6.1 Distribution of the selected diseases
4.7 HOSPITALIZATION
5 HEALTH DASHBOARD PROTOTYPE
5.1 ARCHITECTURE
5.1.1 Technologies
5.1.2 Database
5.2 APPLICATION INTERFACE
5.3 USABILITY TEST
5.4 LESSONS LEARNT
6 CONCLUSION
REFERENCES
APPENDIX
1 INTRODUCTION
People are usually attracted to live in cities due to city services (such as health, education, water), job prospects, and living conditions (access to transportation, public facilities, etc.). Cities influence people’s health and well-being through policies and interventions, including those addressing social inclusion and social support; support for healthy and active lifestyles (for example, the existence of cycling lanes); safety and environmental issues supporting children and the elderly population; working conditions; climate change preparedness; among others¹.
According to the World Health Organization (WHO)², today’s cities are facing a triple health burden: infectious diseases (such as pneumonia, dengue, HIV/AIDS and tuberculosis); noncommunicable diseases (such as heart disease, stroke, asthma and other respiratory illnesses, cancers, diabetes and depression); and violence and injuries, including road traffic injuries.
On the other hand, the increase in life expectancy and the aging of the world population may increase demand on health systems, which are already saturated in most countries (BITTENCOURT; HORTALE, 2009). High population density, combined with poverty and the lack of adequate sanitation, can create conditions where infectious diseases spread easily. This increasing demand on the health system directly affects the quality of services provided.
Governments of several countries are taking initiatives to provide open health data, such as those of the European Data Portal³ and the United States⁴, to support primary health care and local services⁵. Meanwhile, since the approval of the Brazilian Access to Information Law in 2011⁶, public agencies have been making their data available to citizens in a variety of ways: through transparency portals (e.g., the government of Paraná⁷) and APIs (Application Programming Interfaces) (e.g., the Brazilian Central Bank⁸). Along the same line, Curitiba, the largest city of the Brazilian state of Paraná and the eighth most populous city of Brazil, has been participating in open data initiatives along with several government stakeholders, such as Instituto de Pesquisa e Planejamento de Curitiba (IPPUC)⁹ and the Municipality of Curitiba through its Open Data Portal¹⁰.
1 https://www.euro.who.int/en/health-topics/environment-and-health/urban-health/publications/2019/implementation-framework-for-phase-vii-20192024-of-the-who-european-healthy-cities-network-goals-requirements-and-strategic-approaches-2019, last accessed 22-Jul-2020.
2 https://www.who.int/health-topics/urban-health, last accessed 22-Jul-2020.
3 https://www.europeandataportal.eu/en/highlights/open-health-data-european-data-portal, last accessed 22-Feb-2020.
4 https://healthdata.gov/, last accessed 22-Feb-2020.
5 https://www.euro.who.int/__data/assets/pdf_file/0003/376833/almaty-acclamation-mayors-eng.pdf, last accessed 22-Jul-2020.
6 http://www.planalto.gov.br/ccivil_03/_ato2011-2014/2011/lei/l12527.htm, last accessed 23-Jul-2020.
7 http://www.transparencia.pr.gov.br/, last accessed 22-Jul-2020.
8 https://dadosabertos.bcb.gov.br/, last accessed 22-Jul-2020.
Figure 1 – Research Areas. (Diagram relating Smart City, Geographic Information Systems (GIS), Exploratory Data Analysis (data analysis and visualization), Open Data (health data) aggregated with other sources (socio-economic, census, and georeferenced data), and the Health Dashboard Prototype (dashboard).)
Source: The Author.
Along with the data, technologies such as Geographic Information Systems (GIS) assist in the analysis of services within a city. The Open Cities Project¹¹ is an example, in which geographic and open data are combined to help urban planners and city administrators make better-informed decisions on natural disaster preparedness.
This dissertation presents a new analysis based on distinct data sources, including public health data, health units’ locations, patients’ residence locations and the availability of bus stops near the health units. Data about medical appointments from public Health Units of Curitiba is investigated and integrated with the city’s georeferenced data. More specifically, the work includes an analysis of patients’ residence neighborhoods versus the health units they visited, a categorization of the most frequent diseases these patients were diagnosed with when the medical appointment took place, and the spatial distribution of a group of diseases. The goal is to understand the medical appointments and the diseases at public health units in Curitiba through an exploratory data analysis and to relate this information to other factors and data sources (such as health unit locations and the accessibility of bus stops).
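As a rough illustration of the residence-versus-health-unit comparison described above, the sketch below counts, for each health unit, the appointments made by patients who live in a different neighborhood. The record fields (`health_unit`, `unit_neighborhood`, `patient_neighborhood`) and the sample values are hypothetical placeholders, not the schema or data used in this dissertation.

```python
from collections import Counter


def out_of_neighborhood_visits(appointments):
    """Count, per health unit, visits by patients living in another neighborhood."""
    counts = Counter()
    for appt in appointments:
        # A visit is "displaced" when the residence and unit neighborhoods differ.
        if appt["patient_neighborhood"] != appt["unit_neighborhood"]:
            counts[appt["health_unit"]] += 1
    return counts


# Hypothetical records, for illustration only (not data from the study).
sample = [
    {"health_unit": "HU A", "unit_neighborhood": "N1", "patient_neighborhood": "N1"},
    {"health_unit": "HU A", "unit_neighborhood": "N1", "patient_neighborhood": "N2"},
    {"health_unit": "HU B", "unit_neighborhood": "N2", "patient_neighborhood": "N3"},
    {"health_unit": "HU A", "unit_neighborhood": "N1", "patient_neighborhood": "N3"},
]

displaced = out_of_neighborhood_visits(sample)
# Rank health units by the number of visits from other neighborhoods.
ranking = displaced.most_common()
```

Ranking the resulting counts gives a simple "most visited from outside" list of health units; a real analysis would presumably run over the full appointment records rather than an in-memory list.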
The research areas of this dissertation are summarized in Figure 1.
9 http://ippuc.org.br/, last accessed 22-Feb-2020.
10 http://www.curitiba.pr.gov.br/DADOSABERTOS/, last accessed 22-Feb-2020.
11 https://www.worldbank.org/en/region/sar/publication/planning-open-cities-mapping-project, last accessed 13-Sep-2020.
1.1 GENERAL OBJECTIVE
The general objective of this dissertation is to understand the medical appointments at public health units in Curitiba through an exploratory data analysis, relating this information to other factors and data sources, such as health unit locations, the accessibility of bus stops close to these health units, and the displacements made by citizens to reach the health units; aggregating it with socio-economic data; and analyzing the geographical distribution of selected infectious diseases. Building on the exploratory data analysis, a further objective is to create a prototype of a health data dashboard, leveraging the code base developed by (PARCIANELLO, 2019).
1.1.1 Specific Objectives
To achieve the general objective, the following specific objectives are considered:
1. Explore related work on geographic information systems, smart cities, open data with a focus on health data, disease outbreaks and exploratory data analysis.
2. Review the challenges of each area in order to understand their context.
3. Collect the raw data required and make it available for future research.
4. Process, clean and aggregate the data to enable an exploratory data analysis.
5. Report any findings from the exploratory data analysis that might help the health department make decisions and hence contribute to public policies for designing or implementing better services.
6. Develop a Health Dashboard prototype with geographic and temporal data, leveraging the code base developed by (PARCIANELLO, 2019).
7. Provide the solution for user testing and evaluation.
8. Analyse the usability testing and evaluation results and identify suggestions for improvements based on the user evaluation.
Figure 2 – Methodology steps.
1. Basic Concepts & Related Work: geographic information systems; smart cities; open data; health data; disease outbreaks; exploratory data analysis.
2. Collect, Pre-process & Integrate Data: provide a background on the Curitiba health system; describe the process to collect, pre-process and integrate the data in preparation for the exploratory data analysis.
3. Exploratory Data Analysis: perform an exploratory data analysis on medical appointments across months and years; age and gender distribution; location analysis of health units; public transport access to health units; displacements; infectious diseases; hospitalization.
4. Implement Health Dashboard Prototype: create the data model and consolidate the data; develop a Health Dashboard Prototype.
5. Testing & Evaluation: provide the link of the prototype for user testing and evaluation; identify opportunities for enhancements.
6. Conclusion: report any findings identified as part of the exploratory data analysis; analyze the results of testing and evaluation of the Health Dashboard Prototype.
Source: The Author.
1.2 METHODOLOGY
The methodology of this dissertation is divided into six steps, as shown in Figure 2. The first step involved researching the theoretical concepts related to the intended objectives; accordingly, this work presents the basic concepts and related work on Geographic Information Systems (GIS), Smart Cities, Open Data with a focus on Public Health data, Disease Outbreaks, and Exploratory Data Analysis, as well as the challenges related to health data management. The second step consisted of providing background on the Curitiba health system and describing the process used to collect, pre-process, and integrate the data in preparation for the exploratory data analysis. In the third step, a detailed exploratory data analysis is presented. The fourth step consisted of creating a health dashboard prototype leveraging the code developed by (PARCIANELLO, 2019). At this stage, the dashboard was completely redesigned for the purpose of this work, and its performance was enhanced through the use of materialized views. In the fifth step, the dashboard was made available online and 7 users were invited to perform a usability test and evaluation of the dashboard and to provide input for future enhancements; the results of the usability test and evaluation are presented as part of this step. In the sixth and final step, the conclusion of this work is presented.
1.3 PUBLICATIONS
The following publication is directly related to this dissertation:
• LAUTERT, F.; LAUTERT, T.; Kozievitch, N. P.; GOMEZ-JR, L. C. Is the location of Public Health Units in Curitiba meeting the citizen’s needs? In: Workshop on Big Social Data and Urban Computing, 2018, Rio de Janeiro. Proceedings of the Workshop on Big Social Data and Urban Computing co-located with 44th International Conference on Very Large Data Bases (VLDB 2018), 2018.
Additionally, during the Master’s degree course, the following research works on distinct subjects were also produced and presented. The first was also published as a book chapter for the 10th Brazilian Workshop on Agile Methods (WBMA-2019), which is part of the Agile Brazil conference.
• LAUTERT, T.; NETO, A. G.; Kozievitch, N. P. A survey on agile practices and challenges of a global software development team, 2019, Belo Horizonte. 10th Brazilian Workshop on Agile Methods (WBMA-2019) (Agile Brazil 2019), 2019.
• LAUTERT, T. How businesses are benefited from crowd source review systems, 2018, Curitiba. III Workshop de Computação Social (UTFPR), 2018.
1.4 DISSERTATION STRUCTURE
This dissertation is organized as follows: Section 2 presents the basic concepts and gives an overview of the related work. The data collection, pre-processing, and integration are presented in Section 3, followed by the Exploratory Data Analysis in Section 4. Details about the Health Dashboard prototype and its evaluation are presented in Section 5. Finally, the conclusion is presented in Section 6.
2 BASIC CONCEPTS AND RELATED WORK
Several health challenges can be listed as urgent for the next decade: (i) earning public trust, (ii) harnessing new technologies, (iii) stopping infectious diseases, (iv) elevating health in the climate debate, (v) delivering health in conflict and crisis, (vi) making health care fairer, (vii) expanding access to medicine, (viii) preparing for epidemics, among others (World Health Organization, 2020). This dissertation addresses the first three challenges: by using public data from the Curitiba public health system to provide an overview of the health system; by using techniques such as Geographic Information Systems to better understand the data; and by understanding the behaviour of infectious diseases in the public health system.
2.1 GEOGRAPHIC INFORMATION SYSTEMS
A Geographic Information System (GIS) is a framework for gathering, managing, and analysing data (ESRI, 2019). GIS integrates many types of data: it analyzes spatial location and organizes layers of information into visualizations using maps and 3D scenes. GIS thereby helps to reveal deeper insights into data, such as patterns, relationships, and situations. In public health, GIS can support maps and analyses used to monitor and prevent pandemics and chronic or infectious diseases, and to monitor environmental quality that may affect community health.
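One elementary spatial operation behind such analyses is measuring the distance between two georeferenced points, for instance a health unit and the bus stops around it. The sketch below uses the haversine great-circle formula; the function names, the spherical Earth radius, and the coordinates are illustrative assumptions and do not reproduce the tooling used in this work.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_stop_distance(unit, stops):
    """Distance in meters from a health unit to its closest bus stop."""
    return min(haversine_m(unit[0], unit[1], s[0], s[1]) for s in stops)


# Hypothetical (lat, lon) coordinates, for illustration only.
health_unit = (-25.4300, -49.2700)
bus_stops = [(-25.4310, -49.2700), (-25.4400, -49.2800)]
closest = nearest_stop_distance(health_unit, bus_stops)
```

A GIS or spatial database would normally provide this operation directly, with proper map projections; the hand-written formula is only meant to make the underlying computation concrete.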
The use of GIS techniques in the context of public health to investigate epidemiology is of particular interest to a number of researchers. The classic example is the one produced in 1854 by (SNOW, 1856), who used maps to plot cholera cases and found a correlation with the water supplied from a pump on Broad Street in London, United Kingdom. In more recent studies, (NETO et al., 2014) demonstrate the use of GIS through a mobile application that provides maps, demographic data, health data and the urban structure of some regions in the state of São Paulo. (CAVALCANTE et al., 2018) describe an application showing the location of both public and private health units, as well as the medical specialties offered by these health units. In another study, (SOUSA et al., 2018) demonstrate a platform for preventing and combating mosquito-borne diseases, in which users can directly report mosquito breeding sites or disease cases.
Another example is presented by (VILAICHONE et al., 2020), a community-based study in Bhutan that assessed antibiotic resistance patterns of H. pylori strains in different geographical locations of the country in order to guide H. pylori treatment and reduce gastric cancer mortality.
As highlighted by (HINO et al., 2006), analyzing disease distribution and determinants in populations in space and time is an essential aspect of epidemiology. According to the literature review of (RUSHTON, 2003), public health is now presented with the opportunity to examine key relationships between the health characteristics of populations and both human and physical environmental characteristics.
2.2 SMART CITIES
According to the 2018 Revision of World Urbanization Prospects¹, 55% of the world’s population lives in urban areas today, a share that is expected to increase to 68% by 2050.
As mentioned by (DAMERI; CAMILLE, 2014), in order to face the increasing problems
of urban areas, local public government, companies, not-for-profit organizations and the citizens
themselves embraced the idea of a smarter city, using more technologies, creating better life
conditions and safeguarding the environment.
As smart city efforts become a way of managing rapid urban growth efficiently and effectively, a certain degree of standardisation is needed. Organizations such as the International Organization for Standardization (ISO)², the British Standards Institution (BSI)³, the International Telecommunication Union (ITU)⁴, and others have published standards focused on Smart Cities. BSI, for example, defines a smart city as one where there is “effective integration of physical, digital and human systems in the built environment to deliver a sustainable, prosperous and inclusive future for its citizens”⁵. For ISO⁶, a Smart City is one that dramatically increases the pace at which it improves its social, economic and environmental (sustainability) outcomes.
In 2009, IBM published the IBM Smarter Cities campaign⁷, divided into 3 pillars: planning and management, efficient daily management and infrastructure.
Data is also a focus in smart cities (Harrison et al., 2010; BATTY et al., 2012). For (Harrison et al., 2010), Smart Cities are urban areas that exploit operational data to optimize the operation of city services. (BATTY et al., 2012) describe that cities can only be smart if there are intelligent functions that are able to integrate and synthesize the data to some purpose, in a way that improves the efficiency, equity, sustainability and quality of life in cities.
1 https://population.un.org/wup/, last accessed 08-Jul-2019.
2 https://www.iso.org/, last accessed 22-Feb-2020.
3 https://www.bsigroup.com/, last accessed 22-Feb-2020.
4 https://www.itu.int/, last accessed 22-Feb-2020.
5 https://shop.bsigroup.com/upload/PASs/Free-Download/PAS180.pdf, last accessed 08-Mar-2020.
6 https://www.iso.org/obp/ui/#iso:std:iso:37106:ed1:v1:en, last accessed 08-Mar-2020.
7 https://www.ibm.com/smarterplanet/us/en/, last accessed 16-Jun-2019.
2.3 OPEN DATA
As highlighted in (CALEGARI et al., 2016), we live in the age of data, and the digitalization of cities has led to the production of massive data sets and data streams related to the urban environment. Open data is the concept that governmental data should be available to anyone, with the possibility of redistribution in any form without copyright restrictions (KASSEN, 2013). The overall intention of the open data movement is to make local, regional and national data available in a machine-readable format, or in a form that allows direct manipulation using software tools, for purposes such as cross tabulation, visualization, and mapping, among others (GURSTEIN, 2011).
As an example of data limitation, (SILVEIRA et al., 2015) refer to the many mobility
models that are used to describe or predict urban mobility, most of which are limited to a
single source of data for such analysis.
Open data has been used in several contexts in the city of Curitiba. (KOZIEVITCH et
al., 2017) evaluate three decades of business activity through open data in the city of Curitiba,
and (VILA et al., 2016) present an exploratory analysis of the bus service from the
perspective of pattern discovery, statistical analysis, data integration, and the use of connected
and open data. (NAKONETCHNEL et al., 2017) analyze the open data of public transportation
in the cities of Curitiba and New York.
2.3.1 Public Health Data
Gathering relevant public health data is a challenge; health data are rarely available,
although they are needed to monitor population health, prevent pandemics and chronic diseases,
and monitor environmental quality that may affect the population’s health. Modern cities are
flooded with data, and new information sources provide opportunities for novel applications that
will improve citizens’ quality of life (KATAKIS, 2015).
Among the government systems aimed at managing and improving healthcare access for
Brazilian citizens (such as Cartão SUS8, CNES9 and E-Saúde10), most do not provide
open data for research purposes. Although Brazil provides universal access to public healthcare
services11, data about specific issues such as immigrants or the relation with transportation
are rare.
Considering the public health data from Curitiba, several studies can be mentioned.
In the study of (LIMA et al., 2019), open data of Curitiba public health is aggregated with
transportation data, analysing the accessibility to Curitiba public health units via public trans-
portation. The study of (OLIVEIRA et al., 2018) presented a characterization of Paraguay’s
public health data, and using the information about the city of Asuncion a comparison was made
with Curitiba’s public health data. An approach based on evaluation to model the health care for
the elderly population in Curitiba health units is proposed by (SANTIN et al., 2017), segregating
the health data of the elderly population from the non-elderly.
Considering the most recent pandemic, (DONG et al., 2019) present an online interactive
web-based dashboard12, developed in response to the ongoing public health emergency, to
visualise and track reported cases of the novel coronavirus (COVID-19) in real time.
Furthermore, as a response to the global COVID-19 pandemic, the global Research Data Alliance
(RDA) community (RDA COVID-19 Working Groups, 2020) created a set of guidelines and
recommendations for sharing health data under the present COVID-19 circumstances. The report13
also includes legal and ethical considerations, research software, community participation and
indigenous data (Figure 3).
Table 1 presents a summary of the challenges, guidelines and recommendations pre-
sented in the report of RDA COVID-19 Working Group.
8 http://portalsaude.saude.gov.br/index.php/oministerio/principal/secretarias/sgep/cartao-nacional-de-saude, last accessed 16-Nov-2018.
9 http://datasus.saude.gov.br/sistemas-e-aplicativos/cadastros-nacionais/cnes, last accessed 16-Nov-2018.
10 http://esaude.curitiba.pr.gov.br/PortalSaude, last accessed 16-Nov-2018.
11 http://portalms.saude.gov.br/index.php/sistema-unico-de-saude/sistema-unico-de-saude, last accessed 16-Nov-2018.
12 https://arcg.is/0fHmTX, last accessed 22-Feb-2020.
13 https://www.rd-alliance.org/global-research-data-alliance-community-response-global-covid-19-pandemic, last accessed on 06-May-2020.
Figure 3 – RDA COVID-19 Guidelines and Recommendations.
Source: Global Research Data Alliance (2020).
Table 1 – Summary of challenges, guidelines and recommendations of RDA COVID-19 Working Group.
| Sub-groups / cross cutting themes | Challenges | Guidelines for researchers | Recommendations for funders / policy makers |
|---|---|---|---|
| Clinical | Promotion of clinical data sharing is important due to many studies and trials being performed under enormous time pressure | Standardised clinical terminologies should be used and a fair balance achieved between timely data sharing and protecting privacy and confidentiality | Measures should be taken in order to organise the sharing of data and trial documents in a suitable, trustworthy and secure data repository |
| Omics | An increased need of rapid openness for omics data to gain early insights into molecular biology of the processes at cellular level | Omics research should be a collaborative effort to learn the genetic determinants of COVID-19 susceptibility, severity and outcomes | Promote use of domain specific repositories to enable standardisation of terms and enforce metadata standards |
| Epidemiology | Data and models are frequently incomplete, provisional, and subject to correction under changing conditions | Data models must include clinical data, disease milestones, indicators and reporting data, contact tracing and personal risk factors | Incentivise the publication of situational data, analytical models, scientific findings, and reports used in decision making |
| Social Sciences | Need equal inclusion of social and economic issues with medical information to enable evidence-based decision making | Promote interoperable cross disciplinary and cross-cultural data use and collaboration for managing social science data during pandemics | Robust funding streams for social science research for understanding and managing the behavioural, and economic aspects |
| Community | Need specific guidelines for enabling citizen scientists undertaking research to contribute to a common body of knowledge | Encourage public and patient involvement (PPI) throughout the data management lifecycle from research question to final data sharing and usage | Balance between timely testing and contact tracing, emergency response, community safety and individual privacy concerns |
| Indigenous Data Guidelines | Indigenous data rights, priorities and interests must be recognised in COVID-19 research activities | Co-determination of data collection, ownership, sharing and use priorities is the central principle of Indigenous data sovereignty | CARE Principles of Indigenous Data Governance set minimal guidance for collectors, users and stewards of data |
| Legal and Ethical Considerations | Achieve a balance between rights of people and interests of researchers and policymakers | Ethical instruments should be interpreted with the law, and can guide the interpretation of the law if the law does not address a particular issue | During a pandemic, ethical review and approval for legally sharing data should be expedited |
| Research Software | Need systems in place for rapid dissemination of data and accelerated and reproducible research during a pandemic | It is critical for software that is used in data analysis to produce results that can, if necessary, be reproduced | Funders must allocate financial resources to support the development and maintenance of new research software |
Source: Adapted from RDA (2020).
2.3.2 Disease Outbreaks
A disease outbreak is the occurrence of disease cases in excess of normal expectancy14.
The number of cases varies according to the disease-causing agent, and the size and type of
previous and existing exposure to the agent, and the definition may vary for countries, regions
and even cities.
In Brazil, this information can be found in Epidemiological Bulletins, which contain
detailed data and analysis15. In July 2000, the Brazilian Ministry of Health implemented
EpiSUS-Advanced, a Training Program in Applied Epidemiology. Professionals from this training
program created a Guide for Investigations of Outbreaks or Epidemics16. EpiSUS-Advanced
has become one of the main response strategies to public health emergencies nationwide.
According to the guide, there are 10 steps for the investigation of a disease outbreak, as
summarized in Figure 4.
14 https://www.who.int/environmental_health_emergencies/disease_outbreaks/en/, last accessed 19-July-2020.
15 https://www.saude.gov.br/boletins-epidemiologicos, last accessed 19-July-2020.
16 http://www.saude.gov.br/images/pdf/2018/novembro/21/guia-investigacao-surtos-epidemias-web.pdf, last accessed 19-July-2020.
Figure 4 – Steps for investigating disease outbreaks
Source: Adapted from EpiSUS-Advanced Guide (2018).
In the USA, this information is provided by the Centers for Disease Control and Prevention
(CDC)17. The WHO outbreak definition (World Health Organization and others, 1999) states that,
for a defined area, the average number of cases from previous years can be taken as a threshold,
and all observations above that threshold should be considered an outbreak. To detect
outbreaks of these diseases, we use a modified version of the World Health Organization
(WHO) outbreak definition. Instead of using the raw disease count, we set the threshold for an
outbreak at two standard deviations in excess of the endemic channel (i.e., the average) (BRADY et
al., 2015).
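The modified threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the case counts are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which was used.

```python
from statistics import mean, stdev


def outbreak_threshold(baseline_counts):
    """Endemic channel (mean of past counts) plus two standard deviations.

    Assumption: the sample standard deviation is used.
    """
    return mean(baseline_counts) + 2 * stdev(baseline_counts)


def flag_outbreaks(baseline_counts, observed_counts):
    """Return True for every observation above the outbreak threshold."""
    threshold = outbreak_threshold(baseline_counts)
    return [count > threshold for count in observed_counts]


# Hypothetical case counts for one area: past periods and new observations.
baseline = [10, 12, 9, 11, 13, 10, 12, 11]
observed = [11, 14, 25]
flags = flag_outbreaks(baseline, observed)
```

With the plain WHO definition the threshold would be the mean alone, so adding two standard deviations makes the detection more conservative.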
2.4 EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis (EDA) is a branch of statistical analysis (TUKEY, 1977).
It is an approach for analyzing datasets in order to summarize the main characteristics of the
data, often using visual methods and graphical techniques to explore, gain insight into or
categorize the data. Figure 5 shows that EDA is a continuous approach in the context of the Data
Science process and that it is primarily used for seeing what the data can tell.
17 https://wwwnc.cdc.gov/travel/destinations/traveler/none/brazil, last accessed 19-July-2020.
Figure 5 – Data Visualization and Exploratory Data Analysis (EDA) relationship.
Source: Schutt and O’Neil (2013).
According to (Ferreira et al., 2013), with the increasing volume of urban data and more
data becoming available, new opportunities arise from data-driven analysis, which can reveal
opportunities for improvements in urban spaces. The data present many challenges because
they are complex and contain geographical and temporal components in addition to multiple
variables. Similarly, (EDSALL, 2003) presents a system for exploring multidimensional health
statistics. The system makes use of GIS to explore health-statistics data of many dimensions; it
is an interactive system that allows multiple perspectives on complex information.
2.5 CHALLENGES
Several challenges can be listed within health data:
Huge volume of data: Many cities are taking initiatives to provide access to open data. The
challenge is how to make sense of such a large amount of data, since the data are complex and
might contain geographical and temporal components (Ferreira et al., 2013).
Data integrity and consistency: When correlating multiple data sources, integrity and
consistency among the different sources of data is a challenge, especially in terms of
inconsistencies in data vocabulary, lack of a common identifier across different databases,
missing data and others (OLIVEIRA et al., 2018).
Prior knowledge required to make use of data: There are several limitations in the use of
open data (JANSSEN et al., 2012), which include technologies, metadata and standardization.
Health data and privacy concerns: In recent years, as health systems are being introduced,
patient records are becoming increasingly electronic. Technology helps improve the quality
of health care (Meingast et al., 2006), but there are still many concerns related to privacy when it
comes to health data. Data access, storage, and integrity are key challenges when it comes to
electronic patient records.
Health data and differences in data definitions and/or measurement methods: Health data
is derived from health information systems, including health-facility records, surveys or vital
statistics, and it may not be representative of the entire population of a country and in some
cases may not even be accurate. Comparisons between populations or over time can also be
complicated by differences in data definitions and/or measurement methods. Although some
countries may have multiple sources of data for the same year, it is more usual for data not to be
available for every population or year18.
The Brazilian Unified Health System (SUS)19 is one of the largest and most complex
public health systems in the world, encompassing individual treatment and ensuring full,
universal and free access for the entire population of the country. SUS was created in 1990,
following the change in the Federal Constitution of 1988 (CF-88), whose Article 196 (see
footnote 20) defines that health is everyone’s right and the duty of the State. Before the changes
in the CF-88, the public health system provided assistance only to workers linked to Social
Security: approximately 30 million people had access to hospital services, and philanthropic
entities were responsible for serving other citizens. (CASTRO et al., 2019) present how SUS has
contributed to improving the health and well-being of the Brazilian population in the 30 years
since its inception and how it has helped to reduce inequalities in health; they also discuss how
recent political measures may present a threat to the future expansion and sustainability of the
SUS.
From a data measurement perspective, a fully integrated health system is still lacking.
For example, when a patient goes to a private hospital, their historical medical records may
not be available in that hospital’s system unless the patient has always been treated in that same
hospital. The same happens when the patient goes to a public hospital: their full medical
records may not be available if the patient has previously been treated in a private hospital.
18 https://www.who.int/gho/publications/world_health_statistics/2018/en/, last accessed 09-Jun-2019.
19 http://www.saude.gov.br/sistema-unico-de-saude, last accessed 19-Aug-2019.
20 https://www.jusbrasil.com.br/topicos/920107/artigo-196-da-constituicao-federal-de-1988, last accessed 16-May-2020.
3 DATA COLLECTION, PRE-PROCESSING, AND INTEGRATION
For this dissertation, the dataset of medical services was used, focusing on
Emergency Care Units and Basic Health Units. According to information from the city’s
Department of Transportation, Curitiba has an average of 250 bus lines and 9,940 bus stops (VILA
et al., 2016). The city also has 23 bus terminals and one intercity and interstate terminal, which
also offers train services. The medical services dataset was integrated with the transportation
data. One of the major challenges of the data integration was the consistency of data across the
different sources (different description names, overlapping data, among others). The following
section provides more details on the data cleaning and integration process.
Table 2 – Types of Health Units from Curitiba
| Unit Type | N. of Units |
|---|---|
| Basic Health Units (BHU) | 111 |
| Health Spaces | 68 |
| Psychosocial Attention Centers (PSAC) | 13 |
| Emergency Care Units (ECU) | 9 |
| Medical Specialty Centers | 5 |
| Therapeutic Residences | 5 |
| Hospitals | 2 |
| Clinical Analysis Laboratory | 1 |
| Central of Vaccines | 1 |
| Zoonosis Center | 1 |
Source: The Author.
3.1 PREPARATION FOR EXPLORATORY DATA ANALYSIS
The data used in this dissertation originate from the Curitiba Open Data Portal. Only
the medical appointments from Basic Health Units (BHU) and Emergency Care Units
(ECU) were used.
The raw data is available in Comma Separated Values (CSV) format and, for most of
this dissertation, covers the period from January 2017 to December 2019. Each file contains
three months of data, and the summary of the columns is presented in Table 3. Additionally,
socioeconomic and census data are also aggregated, such as income per household data published
by Curitiba Agency1 and population density published by IBGE2.
1 http://www.agenciacuritiba.com.br/, last accessed 16-May-2020.
2 https://www.ibge.gov.br/, last accessed 16-May-2020.
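Because each raw file covers three months, the files have to be concatenated before analysis. The sketch below shows one way to do this with the Python standard library. The function name, the `;` delimiter and the column `data_atendimento` are assumptions made for illustration and are not taken from the portal; the in-memory strings stand in for the real quarterly files.

```python
import csv
import io


def read_quarterly_files(file_objects, delimiter=";"):
    """Concatenate the rows of several CSV files that share the same header."""
    rows = []
    for handle in file_objects:
        reader = csv.DictReader(handle, delimiter=delimiter)
        rows.extend(reader)
    return rows


# Two tiny in-memory stand-ins for quarterly files (hypothetical columns).
q1 = io.StringIO("data_atendimento;cd_equip\n2017-01-05;101\n2017-03-20;102\n")
q2 = io.StringIO("data_atendimento;cd_equip\n2017-04-02;101\n")
appointments = read_quarterly_files([q1, q2])
```

With real files, each element of `file_objects` would be an open file handle, for example one obtained with `open(path, newline="", encoding=...)`, with the encoding chosen to match the portal's export.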
Table 3 – Database table unidade_de_saude
| Column Name | Description |
|---|---|
| Cd_equip | Code of the health unit |
| Nome_abrev | Short name of the health unit |
| Nome_mapa | Name of the health unit on the map |
| Cd_bairro | Code of the neighborhood |
| Bairro | Neighborhood |
| Quadr_equi | Block of the health unit |
| Cd_regiona | Code of the Region |
| Regional | Region |
| Func_manha | Whether or not it is open in the morning |
| Func_tarde | Whether or not it is open in the afternoon |
| Func_24hr | Whether or not it is open 24 hours a day |
| Desativado | Whether or not it is disabled |
| Coord_e | Coordinate e |
| Coord_n | Coordinate n |
| Geom | Geometry |
Source: The Author.
Figure 6 – Steps taken in preparation for data analysis.
Source: The Author.
The following technologies were used: PostgreSQL3, PostGIS4, QGIS5, versions 2.14.11
(Essen) and 2.18.18 (Las Palmas), Gephi6, Tableau Desktop version7 and Microsoft Excel8.
Figure 6 shows the major steps of the data transformation. In the first phase, data from different
sources was collected. In the second phase, the data was cleaned and integrated, followed by the
analysis and visualization.
3 https://www.postgresql.org, last accessed 16-Nov-2018.
4 https://postgis.net, last accessed 16-Nov-2018.
5 https://www.qgis.org/en/site/, last accessed 16-Nov-2018.
6 https://gephi.org/, last accessed 16-Nov-2018.
7 https://www.tableau.com/pt-br, last accessed 14-Nov-2020.
8 https://www.microsoft.com/pt-br/microsoft-365/excel, last accessed 14-Nov-2020.
4 EXPLORATORY DATA ANALYSIS
Three years of medical appointment data (2017, 2018 and 2019), covering 31 different
medical procedures, were used as input. Table 4 shows the number of appointments across these
months and years. Note that the month of February 2017 is incomplete because the data was
missing from the portal.
4.1 MEDICAL APPOINTMENTS ACROSS THE MONTHS AND YEARS
Table 4 and Figure 7 show the number of appointments by month across the years of 2017,
2018 and 2019. On average, 280,580 appointments were made per month in 2017, 313,075 in
2018 and 334,210 in 2019. January and December were the months with the lowest number of
appointments, and May, August and October the months with the highest number. The total of
medical appointments per month thus shows an increase over the years.
Table 4 – Total of medical appointments over the years.
| Months | 2017 | 2018 | 2019 |
|---|---|---|---|
| Jan | 229,452 | 276,070 | 281,713 |
| Feb | 80 | 262,529 | 293,435 |
| Mar | 282,615 | 316,269 | 299,559 |
| Apr | 255,302 | 340,316 | 333,390 |
| May | 303,252 | 350,874 | 361,254 |
| Jun | 292,728 | 327,678 | 345,401 |
| Jul | 272,112 | 309,117 | 346,419 |
| Aug | 302,229 | 345,852 | 367,456 |
| Sep | 292,787 | 307,728 | 353,917 |
| Oct | 310,991 | 349,514 | 398,133 |
| Nov | 300,311 | 315,633 | 351,056 |
| Dec | 244,601 | 255,320 | 278,788 |
| Total | 3,086,460 | 3,756,900 | 4,010,521 |
Source: The Author.
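As a worked check, the monthly averages quoted in this section can be recomputed from the values in Table 4. In the sketch below the variable and function names are illustrative. Leaving out the incomplete month of February 2017 is an assumption: the text does not state it, but it is the choice that reproduces the reported 2017 average.

```python
# Monthly totals transcribed from Table 4 (January to December).
appointments_by_month = {
    2017: [229452, 80, 282615, 255302, 303252, 292728,
           272112, 302229, 292787, 310991, 300311, 244601],
    2018: [276070, 262529, 316269, 340316, 350874, 327678,
           309117, 345852, 307728, 349514, 315633, 255320],
    2019: [281713, 293435, 299559, 333390, 361254, 345401,
           346419, 367456, 353917, 398133, 351056, 278788],
}

# Assumption: the incomplete month is excluded from the average.
INCOMPLETE_MONTHS = {(2017, 2)}


def monthly_average(year):
    """Mean number of appointments per month, skipping incomplete months."""
    values = [
        total
        for month, total in enumerate(appointments_by_month[year], start=1)
        if (year, month) not in INCOMPLETE_MONTHS
    ]
    return sum(values) / len(values)


averages = {year: round(monthly_average(year)) for year in appointments_by_month}
```

Averaging 2017 over all twelve months instead would give a noticeably lower value, which is why the treatment of the incomplete month matters when comparing years.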
Figure 7 – Number of medical appointments across the years by month.
Source: The Author.
4.2 GENDER AND AGE RANGE
According to the 2010 census data1, the gender distribution in Curitiba is 916,792
women and 835,115 men, which means that 52% of the population is female and 48% is male.
From the 3 years of medical appointments, we analyzed how the number of appointments is
distributed by gender, taking into consideration the ratio and proportion of the male and female
population distribution. The results show that 60.62% were female patients while 39.38% were
male patients. According to (BERTAKIS et al., 2000), women had a significantly higher mean
number of visits to their primary care clinic and diagnostic services than men; this behavior is
confirmed by the numbers presented in this dissertation.
The distribution of the number of appointments per age range among female and male
patients is shown in Figure 8. The only age range in which the number of male patients is higher
than the number of female patients is from 0 to 4 years old. In all other age ranges the
number of female patients is higher than the number of male patients, even in non-reproductive
age ranges.
1 https://censo2010.ibge.gov.br/sinopse, last accessed 09-Dec-2018.
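The comparison between the population split and the appointment split can be made explicit with a short calculation. The sketch below uses only the figures quoted above; the variable names are illustrative, and relating appointment shares from 2017 to 2019 to 2010 census counts is a simplifying assumption. Appointments count visits rather than unique patients.

```python
# Figures quoted in this section.
women, men = 916_792, 835_115
appointment_share_female = 0.6062
appointment_share_male = 0.3938

# Share of women in the 2010 census population.
population_share_female = women / (women + men)

# Appointments per resident for women relative to men
# (assumption: the census counts approximate the 2017-2019 population).
relative_rate = (appointment_share_female / women) / (appointment_share_male / men)
```

Under these assumptions the per-resident appointment rate for women is roughly 1.4 times the rate for men, so the larger female share of appointments is not explained by population size alone.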
Figure 8 – Distribution of appointments by age range for each gender.
Source: The Author.
4.3 LOCATION ANALYSIS OF THE HEALTH UNITS
Of the 75 existing neighborhoods in Curitiba, 33 do not have a health unit.
Most of these 33 neighborhoods are those with the highest income per household2. On the other
hand, 29 neighborhoods have 2 or more health units, and the Industrial District of Curitiba
(CIC) has the highest number of health units (a total of 16).
Figure 9-left shows a heat map with the distribution of the population according to data
from the 2010 census3. As a comparison, Figure 9-center shows the medical appointments per
health unit for the year of 2017, and there is a notable overlap when compared to the heat map
on the left. Additionally, Figure 9-right shows a heat map of income per household, in which
there is an empty space around the downtown area. Taken together, these maps indicate that the
appointments at health units, and the units themselves, are located closer to the
population with the lowest income.
2 http://www.agencia.curitiba.pr.gov.br/arquivos/regionais/perfil-economico-regional-matriz.pdf, last accessed 16-Nov-2018.
3 http://www.ippuc.org.br/nossobairro/nosso_bairro.htm, last accessed 16-Nov-2018.
Figure 9 – Population + Medical appointments per health units + Income per household.
Source: The Author.
4.4 PUBLIC TRANSPORT ACCESSIBILITY TO HEALTH UNITS
According to (FERRAZ; TORRES, 2014), there are twelve main factors that influence the
quality of urban public transportation, including accessibility, frequency of service, time of
journey, capacity, security, vehicle resources, stop resources, information system, connectivity,
operator behavior and road conditions. This section analyzes the existence of bus stops near
the health units from the perspective of the distance between the health unit and the bus stop or
bus station. Table 5 shows the quality standards defined by (FERRAZ; TORRES, 2014) for the
accessibility factor, considering the walking distance parameter from the start to the end of the
trip.
Considering the quality standards presented, all 125 health units considered in
this dissertation have either a bus stop or a bus station well located, within a walking distance
(radius) of less than 300 meters. Figure 10 shows a histogram with the number of health units
within each defined bucket of distance from the health units to bus stops or bus stations.
Table 5 – Quality standards for public transportation by bus in meters.
| Factor | Parameter | Good | Regular | Bad |
|---|---|---|---|---|
| Accessibility | Walking distance | < 300 | 300-500 | > 500 |
Source: Adapted from Ferraz and Torres (2004).
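The standard in Table 5 can be applied programmatically once the distance from each health unit to its closest stop is known. The sketch below is illustrative only: the function names and coordinates are hypothetical, and it assumes projected coordinates in meters and straight-line distance, which is shorter than or equal to the real walking distance.

```python
import math


def accessibility_rating(distance_m):
    """Classify a walking distance in meters using the bands of Table 5."""
    if distance_m < 300:
        return "Good"
    if distance_m <= 500:
        return "Regular"
    return "Bad"


def nearest_stop_distance(unit_xy, stops_xy):
    """Straight-line distance from a health unit to its closest bus stop.

    Assumption: all coordinates are projected and expressed in meters.
    """
    return min(math.dist(unit_xy, stop) for stop in stops_xy)


# Hypothetical coordinates in meters.
unit = (0.0, 0.0)
stops = [(120.0, 160.0), (400.0, 300.0)]
distance = nearest_stop_distance(unit, stops)
rating = accessibility_rating(distance)
```

In a PostGIS setting the same distance would normally be computed in the database; the pure-Python version is shown only to make the classification rule explicit.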
Figure 10 – Histogram showing the distance of the HU to the bus stops in meters.
Source: The Author.
4.5 DISPLACEMENTS TO ARRIVE AT HEALTH UNITS
In this section, the displacements made by citizens to arrive at the public health
units are analyzed. For this analysis, only the data from 2017 is used, due to the large amount of
data and computer processing limitations.
The results show that 12.34% of the 3,086,460 medical appointments made in 2017
were performed outside the neighborhood where the patients reside. This means that a daily
average of 1,130 people went to another neighborhood to have a medical appointment. Figure 11
shows the displacement made by the patients from their residential neighborhood to the 5 health
units most visited by patients from outside the neighborhood where the unit is located. These 5
health units are emergency care units (ECU) that provide service 24 hours a day:
Cajuru, Boqueirão, Boa Vista, Campo Comprido and Sítio Cercado. Figure 11 is a
network graph that highlights the intensity of the flow of patients who had their medical
procedures outside their residential neighborhood; thicker lines represent a greater flow of
people. Looking at the numbers of those 5 units, a total of 28.64% of the medical appointments
were performed by people from other neighborhoods in 2017.
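The edge weights of such a network graph are counts of appointments per pair of neighborhoods. A minimal sketch of this aggregation follows, together with an arithmetic check of the daily average. The function name and the sample records are hypothetical. The divisor of 337 days (a 365-day year without February, whose data is incomplete) is an assumption that happens to reproduce the reported figure; it is not stated in the text.

```python
from collections import Counter


def displacement_flows(appointments):
    """Count appointments per (home neighborhood, unit neighborhood) pair,
    keeping only those made outside the patient's home neighborhood."""
    return Counter((home, unit) for home, unit in appointments if home != unit)


# Hypothetical records: (patient's neighborhood, health unit's neighborhood).
records = [
    ("Uberaba", "Cajuru"),
    ("Cajuru", "Cajuru"),
    ("Uberaba", "Cajuru"),
    ("Xaxim", "Boqueirão"),
    ("Boa Vista", "Boa Vista"),
]
flows = displacement_flows(records)
outside_share = sum(flows.values()) / len(records)

# Arithmetic check of the figures quoted in the text.
outside_2017 = 0.1234 * 3_086_460
daily_average_337_days = outside_2017 / 337  # assumption: February excluded
daily_average_365_days = outside_2017 / 365
```

Each key of `flows` corresponds to one line of the graph and its count to the line thickness. Dividing by 365 days instead gives roughly 1,043 per day, so the choice of divisor changes the headline number.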
Note that ECU Boqueirão is among the top 5 units that receive the most people from other
neighborhoods, but it does not have a bus stop within the 200-meter area. For the medical visits
performed outside the patients’ residential neighborhood in 2017, the top 5 International
Classification of Diseases (ICD) codes are presented in Table 6. For comparison purposes, the
top 5 ICD codes of all the medical visits of the same period are shown in Table 7.
Figure 11 – The top 5 Health Units most visited by patients from other neighborhoods.
Source: The Author.
Table 6 – Top 5 ICD for medical visits performed outside the patient’s residential neighborhood.
| Top 5 outside residential neighborhood | Total |
|---|---|
| Acute upper respiratory infections, unspecified | 14512 |
| Acute tonsillitis, unspecified | 9229 |
| Acute nasopharyngitis [common cold] | 8969 |
| General Medical Examination | 8309 |
| Other gastroenteritis and colitis of infectious and unspecified origin | 8098 |
Source: The Author.
Table 7 – Top 5 ICD for all the medical visits.
| Top 5 ICDs for All Medical Visits | Total |
|---|---|
| General Medical Examination | 309366 |
| Issue of Repeat Prescription | 174749 |
| Acute upper respiratory infections, unspecified | 124942 |
| Essential (primary) hypertension | 108797 |
| Acute nasopharyngitis [common cold] | 95349 |
Source: The Author.
Based on this information, it is possible to notice that the great majority of the medical
visits made outside the patients’ residential neighborhood concern complications of the
respiratory tract, which usually get aggravated at night. To confirm this information, the times of
these medical visits were checked in Figure 12. The numbers on the left represent the number of
medical appointments in general, and the blue line shows the number of medical appointments by
hour at the health units. The numbers on the right represent the number of medical
appointments at the top 5 health units most visited by patients from other neighborhoods, and the
red line shows the number of medical appointments by hour at these top 5 health units.
In summary, this analysis points out that the number of medical visits is constant from 14:00
to 23:00 in the top 5 health units most visited by patients from other neighborhoods, while the
number of general medical appointments drops sharply after this period. Figure 13 shows the
number of appointments by hour per day of the week for all health units, while Figure 14 shows
the same view but only for the top 5 health units visited by patients from other neighborhoods.
Figure 13 and Figure 14 also reinforce the finding that when the top 5 facilities get more patients,
the number of appointments in the general facilities declines proportionally.
Figure 12 – Time comparison of general medical visits and top 5 health units visited by patients from other neighborhoods.
Source: The Author.
Figure 13 – Number of appointments by hour per day of the week for all health units.
Source: The Author.
Figure 14 – Number of appointments by hour per day of the week for top 5 health units visited by patients from other neighborhoods.
Source: The Author.
Regarding the distribution of medical appointments by gender, looking only at the top
5 health units most visited by patients from other neighborhoods, female patients represent
54% while male patients represent 46%. Compared to the overall distribution
mentioned in Section 4.2, the distribution is quite different. Finally, Figure 15 shows the
gender distribution per age range in these top 5 health units: age ranges 0-4 and 0-14 show the
same pattern as the overall distribution in Figure 8, while in all other ranges the differences in
the number of appointments decrease.
Figure 15 – Distribution of appointments by gender and age range in the top 5 health units
Source: The Author.
4.6 SELECTED INFECTIOUS DISEASES
The diseases were selected based on the following criteria:
1. A list of the most important diseases was ranked, using their symptoms and TF-IDF
(term frequency–inverse document frequency) as a quantifier, based on more than 7 million
bibliographic records from PubMed4.
2. Once identified, they were checked against the National Notifiable Diseases Surveillance
System (NNDSS)5.
3. The diseases selected in items 1 and 2 were ranked by highest occurrence in the
open data of the health units in Curitiba.
4 https://pubmed.ncbi.nlm.nih.gov, last accessed 13-Mar-2020.
5 https://www.cdc.gov/, last accessed 13-Mar-2020.
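To make the quantifier in the first criterion concrete, the sketch below computes a plain TF-IDF score for terms in a small collection of tokenized documents. It illustrates the general technique only: the function name and the toy documents are hypothetical, and the exact weighting variant applied to the PubMed records is not stated in the text (here, term frequency is the count divided by document length and the inverse document frequency uses the natural logarithm).

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per tokenized document.

    tf  = occurrences of the term / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy documents standing in for symptom terms found in bibliographic records.
docs = [
    ["fever", "headache", "rash"],
    ["fever", "cough"],
    ["fever", "jaundice", "jaundice"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "fever" here, receives a score of zero, while a term concentrated in one document is weighted highly. This is what makes the measure useful for separating distinctive symptoms from generic ones.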
The occurrences of the selected diseases are shown in Table 8 for the period from
January 2017 to December 2019. The selected ICDs are grouped into 7 categories of infectious
diseases, and their occurrences are plotted by neighborhood to show the geographical distribution.
Escherichia coli: Also known as E. coli, it is a bacterium commonly found in the lower
intestine of humans and warm-blooded animals. Most E. coli strains are harmless, but some can
cause severe diarrhea. E. coli is transmitted by contaminated water or food. The Environmental
Institute of Paraná6 monitors the waters used for recreation, and places can be flagged
as appropriate or inappropriate for bathing. The quality of the water indicates the amount of
sewage present in it. For this, Escherichia coli is used as an indicator: the greater the number
of bacteria in the water, the greater the amount of sewage and, consequently, the greater the
probability of the existence of pathogenic (disease-causing) organisms.
Dengue and Yellow Fever: are transmitted by the Aedes aegypti mosquito7.
Dengue is transmitted by the female mosquito, because the female
needs a blood protein for the maturation of the eggs. The Municipal Dengue Control Program is
part of the routine of the Environmental Health Center. Dengue manifests in two forms: Classical
Dengue and Hemorrhagic Dengue8. In December 2019, for example, Paraná state had 2,631
dengue cases and 10 municipalities in an epidemic situation9.
Yellow Fever can be divided into wild yellow fever and urban yellow fever; both are
caused by the same virus but transmitted by different mosquitoes. Wild yellow fever is
transmitted by Haemagogus or Sabethes vectors, and urban yellow fever is transmitted by Aedes aegypti.
Measles: is a highly contagious viral disease, transmitted from person to person through
coughing, sneezing, close personal contact or direct contact with infected secretions10. The most
common complications are otitis, pneumonia, diarrhea, encephalitis and neurological problems.
The disease can, in some cases, lead to death.
Hepatitis: viral hepatitis is a serious public health problem in Brazil and worldwide11.
The most relevant etiological agents in Curitiba are the hepatitis A, B and C viruses. It is
estimated that 1.9% of the Curitiba population has the hepatitis C virus and that 60% of those
infected have not yet been diagnosed12.
6 http://www.iap.pr.gov.br/pagina-297.html, last accessed 05-Jul-2020.
7 http://www.saude.curitiba.pr.gov.br/orientacao-e-prevencao/dengue.html, last accessed 04-Jul-2020.
8 http://www.saude.curitiba.pr.gov.br/images/MS%20Guia-febre-amarela-2018.pdf, last accessed 04-Jul-2020.
9 http://www.aen.pr.gov.br/modules/noticias/article.php?storyid=105013&tit=Parana-tem-2.631-casos-de-dengue-e-10-municipios-em-epidemia, last accessed 19-Jul-2020.
10 http://www.saude.curitiba.pr.gov.br/noticias/1169-tire-todas-as-duvidas-sobre-a-vacinacao-contra-o-sarampo.html, last accessed 05-Jul-2020.
11 http://www.saude.curitiba.pr.gov.br/images/PROT.HEPATITES%20VIRAIS%20-2018%201.pdf, last accessed 05-Jul-2020.
Meningitis: is an inflammatory process of the meninges, the membranes that surround
the brain and spinal cord¹³. It can be caused by various infectious agents, such as bacteria, viruses,
parasites and fungi, or also by non-infectious processes. Bacterial and viral meningitis are the
most important from the point of view of public health, due to their magnitude, ability to cause
outbreaks, and in the case of bacterial meningitis, due to the severity of cases.
Influenza: every year, vaccination campaigns against influenza are promoted
throughout Brazil. The campaigns focus on the groups of people most susceptible to
the disease.
Table 8 – Selected Infectious Diseases
ICD          Description
A04*         Escherichia coli
A90          Dengue
A95*         Yellow Fever
B05*         Measles
B15*         Hepatitis
B16*         Hepatitis
B17*         Hepatitis
G008, G009   Meningitis
J10*         Influenza
Source: The Author.
4.6.1 Distribution of the selected diseases
Table 9 – Cases of selected diseases over the years.
Year    Influenza   Dengue   E. Coli   Hepatitis   Yellow Fever   Measles   Meningitis
2017    3853        786      100       79          12             2         23
2018    3886        525      143       117         96             2         21
2019    3150        737      141       132         108            51        8
Total   10889       2048     384       328         216            55        52
Source: The Author.
Table 9 presents the distribution over the years of the previously selected diseases.
For normalization, the occurrence index for ratio and proportion, as presented in Equation 1,
was used, considering the population density by neighborhood (based on census data from 2010).
Figures 16 to 22 present the neighborhoods that have the highest occurrences
of the selected infectious diseases, where darker colors represent a higher number of cases per
neighborhood.
12 https://www.curitiba.pr.gov.br/noticias/centro-de-referencia-tem-97-de-cura-para-casos-de-hepatite-c/51698, last accessed 05-Jul-2020.
13 http://www.saude.curitiba.pr.gov.br/12-vigilancia/460-meningites.html, last accessed 11-Jul-2020.
Figure 16 – Influenza (Jan-2017 to Dec-2019). Figure 17 – Dengue (Jan-2017 to Dec-2019).
Source: The Author.
Among the selected diseases, Influenza and Dengue have a greater presence in Curitiba.
In addition, it is possible to observe a sign of high correlation in the geographical distribution
of dengue and meningitis cases (52 meningitis cases). According to the Human Symptoms and Diseases
Network presented by (ZHOU et al., 2014), Dengue and Meningitis share some symptoms:
meningitis contains 57% of dengue symptoms. There are also reports in the literature indicating
that meningitis is a rare complication of dengue, as reported in (PUCCIONI-SOHLER et al.,
2013).
$$\mathit{OccurrenceIndex} = \frac{\mathit{TotalOfOccurrences}}{\mathit{PopulationByNeighborhood}} \times 1000 \qquad (1)$$
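As a worked illustration of the occurrence index in Equation 1, the following Python sketch computes the number of cases per 1,000 inhabitants. The function name and the sample figures (25 cases in a neighborhood of 50,000 residents) are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the data used in this dissertation.

```python
def occurrence_index(total_occurrences: int, population: int) -> float:
    """Cases per 1,000 inhabitants of a neighborhood (Equation 1)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total_occurrences / population * 1000


# Hypothetical neighborhood: 25 cases among 50,000 residents.
index = occurrence_index(25, 50_000)
print(f"{index:.2f} cases per 1,000 inhabitants")
```

Normalizing in this way allows neighborhoods of very different sizes to be compared on the same scale.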
4.7 HOSPITALIZATION
Within the selected data, it is also possible to see how many medical appointments triggered
hospitalization. Table 10 shows that about 0.7% of the medical visits required hospitalization.
Figure 18 – Escherichia coli (Jan-2017 to Dec-2019). Figure 19 – Hepatitis (Jan-2017 to Dec-2019).
Source: The Author.
Figure 20 – Yellow Fever (Jan-2017 to Dec-2019). Figure 21 – Measles (Jan-2017 to Dec-2019).
Source: The Author.
Figure 22 – Meningitis (Jan-2017 to Dec-2019).
Source: The Author.
Table 10 – Medical appointments which triggered hospitalization
Year   Total Hosp   TotalPerYear   %Total Hosp
2017   23681        3086460        0.77%
2018   28185        3756900        0.75%
2019   27516        4010521        0.69%
Source: The Author.
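The percentages in Table 10 follow from dividing the number of hospitalizations by the total number of medical appointments in each year. The short Python sketch below reproduces them from the figures in the table; the variable names are illustrative only.

```python
# (hospitalizations, total appointments) per year, copied from Table 10.
table_10 = {
    2017: (23681, 3086460),
    2018: (28185, 3756900),
    2019: (27516, 4010521),
}

# Percentage of appointments that triggered hospitalization, two decimals.
rates = {
    year: round(hosp / total * 100, 2)
    for year, (hosp, total) in table_10.items()
}

for year, rate in rates.items():
    print(f"{year}: {rate:.2f}%")
```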
Table 11 – Medical appointments which triggered hospitalization by Gender
Year   Female Hosp   Male Hosp   %Female Hosp   %Male Hosp
2017   11732         11949       46%            55%
2018   14296         13889       47%            53%
2019   14143         13373       47%            53%
Source: The Author.
In the context of gender distribution, Table 11 shows that a higher percentage of
male patients required hospitalization. These numbers also take into consideration the
ratio and proportion of the male and female population distribution in Curitiba.
5 HEALTH DASHBOARD PROTOTYPE
The exploratory data analysis presented in Section 4 of this dissertation was a manual
process which demanded a great amount of effort and required knowledge of different tools, as
shown in Figure 6, as well as previous knowledge of Data Definition Language (DDL), Data Manipulation
Language (DML) and Data Query Language (DQL) in order to create the database tables needed,
insert the data into the database tables, read and aggregate the data from different sources,
and show the data in a way that facilitates the visualization of a large amount of data. This
shows that, although open data is available for the general public to use, the process of making
use of a large amount of data and transforming data into information is complex. With this
in mind, this section presents a Health Data Dashboard prototype which leverages the code
produced by (PARCIANELLO, 2019). The purpose of (PARCIANELLO, 2019)’s prototype,
named Origin-destination, was to understand the patterns of public transportation systems in
Curitiba¹. For this dissertation, the code has been redesigned for the purpose of analyzing
Curitiba public health data and enabling the analysis of the health data in a dynamic way. From a
database design point of view, Origin-destination uses indexes and table partitioning in order
to improve performance, while the Health Dashboard uses materialized views. With regard to
application-database access management, the Health Dashboard has been enhanced to use prepared
(parameterized) statements.
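To illustrate why prepared (parameterized) statements matter, the sketch below uses Python's built-in sqlite3 module as a stand-in. It is not the code of the prototype, which is written in PHP against PostgreSQL; the table name, columns and rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for the appointments data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointment (unit TEXT, disease TEXT)")
conn.executemany(
    "INSERT INTO appointment VALUES (?, ?)",
    [("UPA A", "Dengue"), ("UPA A", "Influenza"), ("UBS B", "Dengue")],
)


def count_by_disease(connection: sqlite3.Connection, disease: str) -> int:
    # The "?" placeholder sends the value separately from the SQL text,
    # so user input is never interpreted as SQL.
    row = connection.execute(
        "SELECT COUNT(*) FROM appointment WHERE disease = ?", (disease,)
    ).fetchone()
    return row[0]


print(count_by_disease(conn, "Dengue"))
# A classic injection attempt is treated as a plain string and matches nothing.
print(count_by_disease(conn, "Dengue' OR '1'='1"))
```

Had the filter value been concatenated into the SQL string instead, the second call would have matched every row.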
5.1 ARCHITECTURE
This section presents details of the technologies used and the database design.
5.1.1 Technologies
The prototype enables end-users to easily access and analyze Curitiba public health
data, without the need for any software installation, data manipulation, or previous knowledge of
the data. Also, the prototype uses only open-source technologies to avoid any kind of spending
on software acquisition and/or contracting.
Database Server: the database server is a Linux server running the Debian 9 distribution² with
two cores of AMD EPYC 7401 processors and 1.5 GB of memory; the database system used was
PostgreSQL 9.5 x64³ with the PostGIS 2.1.1 extension⁴.
1 https://github.com/yussefparcianello/OrigemDestinoStpCuritiba, last accessed 18-Oct-2020.
2 https://www.debian.org/index.pt.html, last accessed 05-Nov-2020.
Application Server: the application server where the online prototype has been made
available for testing and evaluation is a Linux server with 12 cores, 16GB of RAM and the Ubuntu
distribution⁵, running Apache Server⁶ and PHP⁷. The development
environment of the application is an Intel Core i3-6006U x64 2.0GHz with 8GB of RAM and Windows 10
64 bits, running Apache Server and PHP.
Maps and Libraries: Open Street Map⁸ combined with the Leaflet library⁹ and the Heatmap
plugin¹⁰ have been used for mapping and data visualization.
Other libraries used: JQuery v. 3.3.1¹¹, Bootstrap v. 3.3.7¹², and Chart.js v. 2.7.3¹³.
Figure 23 shows the technologies used and how they are segregated from Client and
Server side perspectives. The source code of the prototype is available on GitHub¹⁴ and a video
showing the prototype usage is available on YouTube¹⁵.
5.1.2 Database
To return the database query results in a fast and efficient way, materialized views have
been created. Materialized views are database objects that contain the results of a query. They
are useful for faster access to data and for making the data available in a form that facilitates
displaying it in a graph without the need to query multiple sources of data at
the time of processing. The use of materialized views in this prototype is an enhancement to
(PARCIANELLO, 2019)’s work.
Health Data Source Table: Similar to the data used in the exploratory data analysis
section, only the medical appointments data from Basic Health Units (BHU) and
Emergency Care Units (ECU) has been used. A total of 14,382,414 medical records from 2016 to 2020 have
been inserted into this database table. Details of the database columns are shown in Figure 24.
3 https://www.postgresql.org/, last accessed 05-Nov-2020.
4 https://postgis.net/, last accessed 05-Nov-2020.
5 https://ubuntu.com/, last accessed 05-Nov-2020.
6 https://httpd.apache.org/, last accessed 05-Nov-2020.
7 https://www.php.net/, last accessed 05-Nov-2020.
8 https://www.openstreetmap.org/, last accessed 18-Oct-2020.
9 https://leafletjs.com/, last accessed 18-Oct-2020.
10 https://github.com/Leaflet/Leaflet.heat, last accessed 18-Oct-2020.
11 jquery.com/, last accessed 18-Oct-2020.
12 getbootstrap.com/, last accessed 18-Oct-2020.
13 www.chartjs.org/, last accessed 18-Oct-2020.
14 https://github.com/TatianeLautert/dadosAbertosSaudeCuritiba, last accessed on 18-Oct-2020.
15 https://youtu.be/47KMZO-N91A, last accessed 08-Nov-2020.
Figure 23 – Health Dashboard Architecture.
[Diagram labels: Server-Side, Client-Side; Data Visualization (leaflet.js, OpenStreetMap); Filter Panel (bootstrap.js); Graphs (jquery.js, chart.js); Web Server (Apache HTTP Server, PHP); js libraries and plugins; Database Server (Postgres, PostGIS).]
Source: The Author.
Geodata Tables: 3 geo-referenced database tables were used to plot the maps in
the prototype; details of these database tables are shown in Figure 24. Database table
limites_legais.divisa_de_bairros stores the geo-referenced data of Curitiba neighborhoods. Table
limites_legais.divisa_de_regionais stores the geo-referenced data of Curitiba macro regions. Table
saude.unidade_saude stores the geo-referenced data of Curitiba Health Units.
The 4 materialized views store consolidated information from the Health Data Source table in
terms of (i) number of medical appointments by month and year, (ii)
number of medical appointments by age range and gender, (iii) number of medical appointments
by hour and day of the week and (iv) number of medical appointments by selected infectious
disease per week number.
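The idea behind the first of these views can be sketched as follows. This is only an illustration under stated assumptions: SQLite, used here through Python's built-in sqlite3 module, has no materialized views, so a summary table created with CREATE TABLE ... AS SELECT plays that role. In PostgreSQL the corresponding statements would be CREATE MATERIALIZED VIEW ... AS SELECT and REFRESH MATERIALIZED VIEW. The table name appointment, the view name mv_appointments_by_month and the sample rows are hypothetical; only the column name dt_atendimento comes from Figure 24.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointment (dt_atendimento TEXT)")
conn.executemany(
    "INSERT INTO appointment VALUES (?)",
    [("2019-03-01",), ("2019-03-15",), ("2019-04-02",)],
)

# Precompute the aggregate once; the summary table stands in for a
# materialized view in this sketch.
conn.execute(
    """
    CREATE TABLE mv_appointments_by_month AS
    SELECT strftime('%Y', dt_atendimento) AS year,
           strftime('%m', dt_atendimento) AS month,
           COUNT(*) AS total
    FROM appointment
    GROUP BY strftime('%Y', dt_atendimento), strftime('%m', dt_atendimento)
    """
)

# The dashboard query now reads the small precomputed table.
rows = conn.execute(
    "SELECT year, month, total FROM mv_appointments_by_month "
    "ORDER BY year, month"
).fetchall()
print(rows)
```

The trade-off is that the precomputed result must be refreshed whenever new records are loaded into the source table.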
Figure 24 – Health Dashboard database tables and relationship.
Geodata Tables:
limites_legais.divisa_de_regionais: gid, codigo, nome, tipo, shape_area, shape_len, geom
limites_legais.divisa_de_bairros: gid, codigo, nome, tipo, shape_area, shape_len, geom
saude.unidade_saude: gid, cd_equi, nome, coord_e, coord_n, geom
Health Data Source Table:
saude.tatiane_atendimento_unidade_saude: dt_atendimento, dt_nascimento, sexo, cod_tipo_unidade, tipo_unidade, cod_unidade, desc_unidade, cod_procedimento, desc_procedimento, cod_cbo, desc_cbo, cod_cid, desc_cid, solicitacao_exames, qt_prescrita_farmacia_ctbana, qt_dispensada_farmacia_ctbana, qt_medicamento_nao_padronizado, encaminhamento_atendimento_especialista, area_atuacao, desencadeou_internamento, dt_internamento, estab_solicitante, estab_destino, estab_destino, estab_destino, tratamento_domicilio, abastecimento, energia_eletrica, tipo_habitacao, destino_lixo, fezes_urina, comodos, em_casodoenca, grupo_comunitario, meio_comunicacao, meio_transporte, municipio, bairro, nacionalidade, cod_usuario, origem_usuario, residente, cod_profissional
Source: The Author.
Figure 25 – Interface of the Health Dashboard Prototype.
Source: The Author.
5.2 APPLICATION INTERFACE
In the prototype, all interaction takes place on a single screen. As shown in Figure 25,
the web interface is divided vertically in two: an interactive map with a search panel and map
layer options is available on the right; on the left, the results of the search are available in the
form of dynamic graphs. Through the search panel users can filter the data according to their
needs, and through the map layer option users can select the map layers they wish to see on the
map.
Figure 26 shows the filter options available in the search panel. The first filter option
allows users to select the data by year, the second filter option allows users to filter the data based
on each Health Unit, the third filter allows users to filter by month, the fourth option can be used
to filter by day of the week, and the last option allows users to select a particular Infectious
Disease.
Figure 27 shows the mapping layers available in the prototype. By default, the Health Units,
Disease Outbreak Heat Map, and Neighborhood layers are selected; the Region layer is
not selected by default, in order to make the map visualization cleaner. The user can select
the desired mapping layers to be shown by clicking on the icon and then checking or unchecking
each mapping layer checkbox.
Figure 26 – Health Dashboard Filters.
Source: The Author.
Figure 27 – Health Dashboard Map Layers.
Source: The Author.
Figure 28 shows the 4 graphs available in the prototype. Graph A from Figure 28 shows
the number of medical appointments over the months. Data is displayed according to the filter
criteria selected by the user. When the user hovers the mouse over the bars of the graph, a
message is displayed with the exact number of medical appointments for that given month. In
this example, the title of the graph shows that a filter for the year 2020 has been selected.
As part of the exploratory data analysis in Section 4, the same information was presented
in the form of a table and a graph in Figure 7 for the years 2017, 2018, and 2019. In that section, the
data presented was fixed and previous knowledge of the data source, database
queries and a visualization tool was required to display the information. With the prototype, the user can access the
information without any previous knowledge, and data is dynamically displayed according to the
user’s preference and needs.
Figure 28 – Health Dashboard - Graphs available in the prototype.
Source: The Author.
Graph B from Figure 28 shows the number of medical appointments by age range and
gender. The data is displayed according to the filter criteria defined by the user. The
same information was presented in Section 4, Figure 8; however, with the prototype the data can
be dynamically displayed according to the user’s preference and needs.
Graph C from Figure 28 shows the number of medical appointments by hour per day of
the week. Data is displayed according to the filter criteria defined by the user. It allows users
to identify the volume of medical appointments by hour for a given day of the week, and it is
also possible to select a specific health unit. This information was also presented in Section 4,
Figure 13 and Figure 14; however, with the prototype it becomes very easy for the user to apply
different filters. It is also possible, for example, to check the peak times of appointments for a
specific health unit.
Figure 29 – Heat Map Layer for a given infectious disease.
Source: The Author.
Graph D from Figure 28 shows the number of cases of a given infectious disease by week number.
Data is displayed according to the filter criteria defined by the user. The red bars show that there
has been an outbreak of the given infectious disease in those weeks. In this graph, the modified
WHO definition of disease outbreak described in Section 2.3.2 has been used to detect an
outbreak, i.e. the threshold for an outbreak in a given week is defined at two standard deviations
in excess. The heat map of outbreaks is based on the same method; however, the criteria are
re-applied at each neighborhood level. As an example, Figure 29 shows the heat map where
the outbreak criteria have been applied by neighborhood; areas in red on the heat map mean
that there has been a great number of outbreaks in the given period. As opposed to the other
graphs available on the Dashboard, Graph D is a new form of data visualization when compared
to Section 4.
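The outbreak rule described above can be sketched in Python as follows. This is a minimal illustration and not the code of the prototype: it assumes that the baseline is the mean of the weekly counts in the selected period and that the sample standard deviation is used, whereas the exact definition is the one given in Section 2.3.2. The function name and the weekly counts are hypothetical.

```python
from statistics import mean, stdev


def outbreak_weeks(weekly_cases: list[int]) -> list[int]:
    """Return 1-based week numbers whose count exceeds mean + 2 * SD."""
    threshold = mean(weekly_cases) + 2 * stdev(weekly_cases)
    return [
        week
        for week, cases in enumerate(weekly_cases, start=1)
        if cases > threshold
    ]


# Hypothetical weekly counts for one disease; week 8 is a clear spike.
cases = [3, 4, 2, 5, 3, 4, 3, 30, 4, 3]
print(outbreak_weeks(cases))
```

Applying the same function to the weekly counts of each neighborhood separately corresponds to the per-neighborhood criteria used for the heat map.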
The selected infectious diseases available for user selection in Graph D of the prototype
are the same infectious diseases presented in the exploratory data analysis section,
namely Escherichia coli, Dengue, Yellow Fever, Measles, Hepatitis. Additionally, considering the
ongoing COVID-19 pandemic, the following are also available for user selection: Confirmed-Covid
cases, Suspected-Covid cases, and Severe Acute Respiratory Syndrome (SARS).
Figure 30 – Region Layer.
Source: The Author.
Besides the heat map for infectious diseases, there are other map layer options
available in the prototype. Figure 30 shows the Region Layer, Figure 31 shows the Neighborhood
Layer, and Figure 32 shows the Health Units Layer. Users are also able to select and view multiple
layers at the same time: Figure 33 shows a combined view of both the Neighborhood Layer and
the Health Units Layer.
5.3 USABILITY TEST
An experiment was conducted from 10-Oct-2020 to 23-Oct-2020 with seven participants:
1 participant from the Health Science area, 1 from Architecture, and 5 from the Computer
Science area (1 IT Architect, 1 IT Quality Engineer, 1 IT Manager, 1 System Analyst, 1 Post
Doctoral Student). It is important to mention that, given the COVID-19 pandemic scenario and the
social distancing guidelines in force, it was not possible to perform the usability test in
deeper detail or in person.
In order to test the usability of the prototype, 3 usability testing scenarios, presented in
Table 12, and an evaluation questionnaire with 4 questions, presented in Table 13, were defined.
The participants were asked to execute 1 low, 1 medium, and 1 high complexity usability testing
scenario. For each scenario, participants were asked to provide their responses based on the
results they could see either on the Dashboard Graphs or on the Dashboard Map. The purpose
of this usability test is to measure the user success rate, which is defined as the percentage of
tasks that users complete correctly, as explained by the Nielsen Norman Group¹⁶. After executing
the usability testing scenarios, participants were asked to respond to an evaluation questionnaire
to assess their perception of the following aspects: (i) ease of use, (ii) relevance of the filters
available, (iii) complexity in understanding the graphs, and (iv) suggestions for improvement, if
any.
Figure 31 – Neighborhood Layer.
Source: The Author.
Figure 32 – Health Unit Layer.
Source: The Author.
Figure 33 – Health Units and Neighborhood Layers.
Source: The Author.
The instructions on how to access the prototype were provided to each user by email,
and each email contained a user-specific link to a Google Sheet¹⁷ with the description of each
usability testing scenario and the questionnaire where users could input their answers.
16 https://www.nngroup.com/articles/success-rate-the-simplest-usability-metric/, last accessed 01-Nov-2020.
17 https://docs.google.com/spreadsheets/, last accessed 14-Nov-2020.
Table 12 – Usability testing scenarios
Scenario ID   Scenario Complexity   Usability Testing Scenario
1             Low                   How many medical visits were there in March 2019?
2             Medium                In October 2018, was there any age range in which there was a greater number of medical appointments for male patients?
3             High                  Based on the heat map, using all the data available, which region or neighborhood presented a great number of outbreaks for confirmed cases of Covid?
Source: The Author.
Table 13 – Evaluation Questionnaire
Question ID   Evaluation Question                                               Options
1             Was it easy to use the prototype?                                 Yes / No
2             Do you think the filter options were relevant?                    Yes / No
3             How do you assess the complexity in understanding the graphs?     Easy / Medium / Difficult
4             Do you have any suggestion for improvements?                      Open-ended Question
Source: The Author.
Figure 34 shows the results of the users’ ability to complete each usability testing
scenario. For scenario 1, all 7 users executed the scenario and provided the correct
answer to the question. For scenario 2, 4 users were able to complete the scenario successfully,
2 users executed the scenario but provided a partially right answer to the question, and 1 user
executed the scenario but provided a wrong answer to the question. For scenario 3, all 7 users
executed the scenario and provided the correct answer to the question. This shows that for the
low complexity usability testing scenario the user success rate was
100%, for the medium complexity scenario the success rate was 57%, and for the high
complexity scenario the success rate was 100%.
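The reported success rates can be reproduced with the short Python sketch below, which divides the number of correct answers by the 7 participants in each scenario. Partially right and wrong answers are counted as unsuccessful here, an assumption that matches the 57% reported for the medium complexity scenario; the variable names are illustrative only.

```python
# Correct answers per scenario out of 7 participants, as reported for Figure 34.
PARTICIPANTS = 7
correct_answers = {"low": 7, "medium": 4, "high": 7}

# Success rate as a whole-number percentage of correctly completed tasks.
success_rate = {
    scenario: round(correct / PARTICIPANTS * 100)
    for scenario, correct in correct_answers.items()
}
print(success_rate)
```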
Figure 35 shows the results of the users’ evaluation of the prototype in terms of ease
of use and graph complexity. All 7 users evaluated the prototype as easy to use; 6 users evaluated
the complexity in understanding the graphs as easy and 1 user evaluated it as difficult.
Figure 36 shows the results of the users’ evaluation of the relevance of the filters available
in the prototype. All 7 users evaluated the filter options as relevant.
Table 14 shows the suggestions for improvement of the prototype given by the users.
There were 7 suggestions in total and some of these suggestions can be implemented as future
work.
Suggestions 1, 2, 4, 5, and 6 can be easily implemented and would bring value to the user
experience. Suggestion 3 needs to be further analyzed and tested to determine the best
approach. Suggestion 7 would require additional data to be aggregated into our existing database.
Figure 34 – Users ability to complete usability testing scenarios.
Source: The Author.
Figure 35 – Evaluation Questionnaire Results - Easiness of Use and Graph Complexity.
Source: The Author.
5.4 LESSONS LEARNT
Several factors need to be considered when designing a health dashboard prototype,
such as:
1. Analysis of the technology to be used in terms of cost and library compatibility. In this
prototype, only open source technologies were used and, as mentioned by (PARCIANELLO,
2019), due to library incompatibilities, adding new functionality often
requires redesigning the application to replace conflicting plugins with others.
Figure 36 – Evaluation Questionnaire Results - Relevance of Filters.
Source: The Author.
Table 14 – Suggestions for Improvement
ID   Suggestion for Prototype Improvement
1    Show labels and units of x and y axis of the graphs.
2    The first bar graph (graph A in Figure 28) has many colors, it would be easier to understand if only one color is used.
3    The last bar chart (graph D in Figure 28) shows no data when some options in the Search Panel are selected. This might be confusing for some users. An alternative could be to hide this graph when it is empty.
4    The top right icon has a light color that makes it hard to identify at first sight. The suggestion is to change the color and move the Search Panel to the same level (to the left). You can also put a floating div tag and put these two options inside of it and add a legend ("menu"). This could help the user to locate these options.
5    The Search Panel is fixed, making it flexible would help to give the users more functionalities to view the data.
6    Lack of legend for the heat map and the number of cases could also appear when hovering the mouse over the map, just as it appeared in the bar graphs; I had difficulties to answer question 3, because the answer varied according to the zoom I used to look at the map (when I zoomed in a lot, I couldn’t distinguish which neighborhood had the most reddish coloring, so I preferred to use less zoom and answer by region).
7    If possible add a lethality rate of the coronavirus in the form of a graph, for example, in one week x number of people have been tested positively versus how many people died.
Source: The Author.
2. Previous knowledge of the data is required, therefore an exploratory data analysis is an
important step to understanding what the data can tell and it serves as a basis on what
visual information to show on the prototype.
3. Identification of relevant filters for end-users to explore the data in an efficient manner.
Ideally, before implementing such filters it is important to assess the users’ requirements.
4. Analysis of what are the best ways to display the data and what statistical method to apply.
For example, one of the challenges faced during this research was to find an appropriate
method for identifying disease outbreaks. Discrepancies were even found between
national and international methods for publication of disease outbreaks: while the CDC
presents disease outbreak information in an easily accessible way on its website, in Brazil such information
is published in the Brazilian Epidemiological Bulletins.
5. The prototype must provide a good user experience, functionalities need to be intuitive
and simple.
6. Use of techniques to enhance the system performance in fetching and retrieving the data
from the database.
7. Use of data security measures to avoid malicious SQL injections. In our health dashboard
prototype, the security measure chosen was the use of prepared statements to access the
data from the database server.
8. Public health data must always be anonymized to be displayed in this kind of dashboard.
The public health data available in Curitiba Open Data Portal was already anonymized,
but in scenarios where such type of dashboard accesses the data directly from the health
system, data anonymization is a crucial step to be taken.
9. Consideration of user feedback for prototype enhancement. During the evaluation ques-
tionnaire the users provided good suggestions which can enhance the user experience of
our prototype.
Lastly, it would be helpful if governing bodies defined standards for data sharing and use of
technologies, which would allow any city to scale Health Dashboards
in a plug and play mode. For example, the open data could be made available via
Application Programming Interfaces (APIs) with appropriate documentation to facilitate
interaction between multiple applications, and template applications could be provided so that any city could
simply leverage the template code and deploy a new Dashboard instance without the need
to develop it from scratch.
6 CONCLUSION
There are several challenges related to health data, mainly faced by large urban centers.
New technologies, infectious diseases, public trust, delivering health in conflict and crisis,
making health care fairer, and expanding access to medicine are some of the problems that impact
the development and planning of a city (World Health Organization, 2020). Ideally, data must be
accompanied by openly accessible metadata so that it can be discovered, interpreted correctly,
and reused for subsequent research. If crises such as the COVID-19 pandemic are considered,
data combined with context and meaning turns into knowledge for informing the public health
response (RDA COVID-19 Working Groups, 2020), and the use of common metadata standards,
as well as vocabularies, is recommended.
In this sense, this dissertation presented an exploratory analysis using sociopolitical,
geographical, transportation and health data in Curitiba, relating this information to other factors,
such as health unit locations, bus stop accessibility, displacements performed by the citizens to
arrive at the health units and analysis of the distribution of selected infectious diseases across the city’s
neighborhoods.
In order to provide a better understanding of the context of each related area, this dissertation
presented a summary of the basic concepts about GIS, smart cities, and open data with focus on
health data, among others. The main challenges of each area have also been presented.
In our preliminary data analysis, the results show that the bus stops are well located and
allow citizens to arrive at the majority of health units without the need for long walks, fulfilling
their function of being located where the population needs them. However, it is also observed that there is a
relevant amount of displacements from outside the citizens’ neighborhood to arrive at the top 5
ECU, especially looking for treatment of respiratory problems. Further research on
related works and with the municipality of Curitiba is required to understand whether those numbers are expected
and why. As future work, it would be useful to analyse whether the displacements are caused by
specific demands per region, so that the results of the analysis could be used to improve
the distribution of specialty center units according to demand. The results also showed some
differences in gender behaviour towards medical visits, with the majority (60.62%) being
female visits, except in the age range from 0 to 4 years old. Similar behaviour could also be seen
in the related work explored.
The initial data analysis also showed that, among the 7 categories of selected infectious
diseases, influenza and dengue have a greater presence in Curitiba when compared to the other
categories considered in this dissertation. A sign of correlation in the geographical distribution
of dengue and meningitis could also be observed. Further research is needed to confirm this behaviour
in more detail.
Our Health Data Prototype showed that it can facilitate the analysis of health data
aggregated with other sources, both for the general public and for professionals of the Health Science
departments. It also showed that, even with a large amount of data, the dashboard’s
performance in showing the graphs and maps is not affected when appropriate data modeling
techniques are used, and that, from a user’s point of view, no prior knowledge of the data or
technology is required to perform the search and analysis of the data through the dashboard.
As future work, we can mention the enhancement of the Health Data prototype based
on the users’ inputs, the inclusion of other data sets, such as air quality data and how it correlates
with diseases of the respiratory tract, and the enhancement of data visualization and statistical methods to
allow for prediction of health service demands per region.
As the population increases, it is important to conduct further analysis to identify whether
the health services are still adequate to the needs of the population. In this dissertation the
demographic data source is the 2010 census. In Brazil, the demographic census is conducted every
10 years; in 2020, as a result of the Covid-19 pandemic, IBGE decided to postpone the
census to 2021, as per the communication published in March 2020¹. In this scenario, as future
work, newer census data can be used as it becomes available to analyse how the increase
in population affects the quality of and demand for health services. Other methods, such as
leveraging crowd-sourced data, could also be used to bring insights into the population’s perception of
the quality of the health services, for example by using Google rating and review data of the health
units available on Google Maps.
The Municipal government of Curitiba could use the results of this research to assess
whether the different data sources administered by the city can be normalized in order to facilitate
future research. Other studies using data from different cities could use the numbers from this
dissertation as a baseline for comparison. The results can also be used as a starting point for
professionals in the public health field to obtain insights for more detailed future research.
1 https://www.ibge.gov.br/novo-portal-destaques/27161-censo-2020-adiado-para-2021.html, last accessed Dec-27-2020.
REFERENCES
BATTY, M.; AXHAUSEN, K. W.; GIANNOTTI, F.; POZDNOUKHOV, A.; BAZZANI,A.; WACHOWICZ, M.; OUZOUNIS, G.; PORTUGALI, Y. Smart cities of the future. TheEuropean Physical Journal Special Topics, v. 214, n. 1, p. 481–518, Nov 2012. Available at:https://doi.org/10.1140/epjst/e2012-01703-3.
BERTAKIS, Klea D; AZARI, Rahman; HELMS, L. Jays; CALLAHAN, Edward J; ROBBINS,John A. Gender differences in the utilization of health care services. Journal of FamilyPractice, Appleton-Century-Crofts, v. 49, n. 2, p. 147–152, 2000. ISSN 0094-3509.
BITTENCOURT, Roberto José; HORTALE, Virginia Alonso. Intervenção para solucionara superlotação nos serviços de emergência hospitalar: uma revisão sistemática. Cadernosde Saúde, scielo, v. 25, p. 1439 – 1454, 07 2009. ISSN 0102-311X. Available at: http://www.scielo.br/scielo.php?script=sci\_arttext&pid=S0102-311X2009000700002&nrm=iso.
BRADY, Oliver J; SMITH, David L; SCOTT, Thomas W; HAY, Simon I. Dengue diseaseoutbreak definitions are implicitly variable. Epidemics, Elsevier, v. 11, p. 92–102, 2015.
CALEGARI, G. Re; CELINO, I.; PERONI, D. City data dating: Emerging affinities betweendiverse urban datasets. Information Systems, volume 57, p. 223–240, 2016.
CASTRO, Marcia C; MASSUDA, Adriano; ALMEIDA, Gisele; MENEZES-FILHO,Naercio Aquino; ANDRADE, Monica Viegas; NORONHA], Kenya Valéria Micaela[de Souza; ROCHA, Rudi; MACINKO, James; HONE, Thomas; TASCA, Renato;GIOVANELLA, Ligia; MALIK, Ana Maria; WERNECK, Heitor; FACHINI, Luiz Augusto;ATUN, Rifat. Brazil’s unified health system: the first 30 years and prospects for thefuture. The Lancet, v. 394, n. 10195, p. 345 – 356, 2019. ISSN 0140-6736. Available at:http://www.sciencedirect.com/science/article/pii/S0140673619312437.
CAVALCANTE, J. L. S. B.; NETO, M. S.; KOZIEVITCH, N. P. Utilização e estudo de dados desaúde georreferenciados para desenvolvimento de aplicação móvel. GeoInfo, p. 170–175, 2018.
DAMERI, Renata; CAMILLE, Rosenthal_Sabroux. Smart city and value creation. Springer, p.pp. 1–12, 06 2014.
DONG, Ensheng; DU, Hongru; GARDNER, Lauren. An interactive web-based dashboard totrack covid-19 in real time. Lancet Infect Dis, v. 5, 02 2019.
EDSALL, Robert M. Design and usability of an enhanced geographic information system forexploration of multivariate health statistics. The Professional Geographer, Routledge, v. 55,
62
n. 2, p. 146–160, 2003. Available at: https://www.tandfonline.com/doi/abs/10.1111/0033-0124.5502003.
ESRI. What is gis. Available in https://www.esri.com/en-us/what-is-gis/overview, lastaccessed 17-Jun-2019, 2019.
FERRAZ, A. C. P.; TORRES, I. G. E. Transporte público urbano. Rima, 2014.
Ferreira, N.; Poco, J.; Vo, H. T.; Freire, J.; Silva, C. T. Visual exploration of big spatio-temporalurban data: A study of new york city taxi trips. IEEE Transactions on Visualization andComputer Graphics, v. 19, n. 12, p. 2149–2158, Dec 2013. ISSN 1077-2626.
GURSTEIN, Michael. Open data: Empowering the empowered or effective datause for everyone? First Monday, v. 16, n. 2, 2011. ISSN 13960466. Available at:https://firstmonday.org/ojs/index.php/fm/article/view/3316.
Harrison, C.; Eckman, B.; Hamilton, R.; Hartswick, P.; Kalagnanam, J.; Paraszczak, J.; Williams,P. Foundations for smarter cities. IBM Journal of Research and Development, v. 54, n. 4, p.1–16, July 2010. ISSN 0018-8646.
HINO, P.; VILLA, T. C. S.; SASSAKI, C. M.; NOGUEIRA, J. de A.; SANTOS, C. B. dos. Onthe mode of communication of cholera. Revista Latino-Americana de Enfermagem, v. 14(6),p. 939–943, 2006.
JANSSEN, M.; CHARALABIDIS, Y.; ZUIDERWIJK, A. Benefits, adoption barriers andmyths of open data and open government. Information Systems Management, volume 29, p.258–268, 2012.
KASSEN, Maxat. A promising phenomenon of open data: A case study of the chicago open dataproject. Government Information Quarterly, v. 30, 10 2013.
KATAKIS, I. Mining urban data (part a). Information Systems, volume 54, p. 113–114, 2015.
KOZIEVITCH, N. P.; SILVA, T. H.; ZIVIANI, A.; COSTA, G.; LUGO, G. Three decades ofbusiness activity evolution in curitiba: A case study. Annals of Data Science, v. 4, p. 1–21,2017.
LIMA, C. D.; PEIXOTO, A. M.; GOMES-JR, L. C.; LUDERS, R.; FONSECA, K. V. O.Avaliação da qualidade do transporte público no acesso a unidades de saúde de curitiba. In:Anais do III Workshop de Computação Urbana (COURB 2019). Gramado, RS, Brasil: SBC,2019. v. 1. Available at: http://sbrc2019.sbc.org.br/wp-content/uploads/2019/05/courb2019.pdf.
63
Meingast, M.; Roosta, T.; Sastry, S. Security and privacy issues with health care informationtechnology. 2006 International Conference of the IEEE Engineering in Medicine andBiology Society, p. 5453–5458, Aug 2006. ISSN 1557-170X.
NAKONETCHNEL, E. C.; KOZIEVITCH, N. P.; CAPPIELLO, C.; VITALI, M.; AKBAR, M.Mobility open data: Use case for curitiba and new york. Anais do XIII Escola Regional deBanco de Dados, ERBD, p. 111–114, 2017.
NETO, Virgilio Cavicchioli; CHIARI-CORREIA, Natalia; CARVALHO, Isabelle; PISA,Ivan Torres; ALVES, Domingos. Desenvolvimento e integração de mapas dinâmicosgeorreferenciados para o gerenciamento e vigilância em saúde. Journal of Health Informatics,v. 6, p. 3, 01 2014.
OLIVEIRA, Matheus F. A. de; KOZIEVITCH, Nádia P.; BIM, Silvia A.; LEGAL-AYALA,Horacio. Caracterização dos dados públicos de saúde do paraguai. In: Anais da XIV EscolaRegional de Banco de Dados (ERBD 2018). Porto Alegre, RS, Brasil: SBC, 2018. p. 12–21.ISSN 2595-413X. Available at: https://ojs.sbc.org.br/index.php/erbd/article/view/2825.
PARCIANELLO, Yussef. Análise de origem-destino do uso do sistema de transportecoletivo de curitiba sob o ponto de vista de regions of interest. Master’s Thesis,Universidade Tecnológica Federal do Paraná, Curitiba, 2019. Available at: http://repositorio.utfpr.edu.br/jspui/handle/1/4821.
PUCCIONI-SOHLER, Marzia; ROSADAS, Carolina; CABRAL-CASTRO, Mauro Jorge.Neurological complications in dengue infection: a review for clinical practice. Arquivos deNeuro-Psiquiatria, scielo, v. 71, p. 667 – 671, 09 2013. ISSN 0004-282X. Available at: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0004-282X2013001000667&nrm=iso.
RDA COVID-19 Working Groups. RDA COVID-19 Working Group Recommendationsand Guidelines, 1st release. Research Data Alliance, 2020. Available at: https://doi.org/10.15497/rda00046.
RUSHTON, Gerard. Public health, gis, and spatial analytic tools. Annual Review of PublicHealth, v. 24, n. 1, p. 43–56, 2003. PMID: 12471269.
SANTIN, Priscila L.L.; MUNARETTO, Anelise; FONSECA, Mauro. Modelagem do perfilde atendimento aos idosos nas unidades de saúde de curitiba. In: Anais do I Workshop deComputação Urbana (COURB 2017). Porto Alegre, RS, Brasil: SBC, 2017. v. 1. ISSN2595-2706. Available at: https://ojs.sbc.org.br/index.php/courb/article/view/2575.
SILVEIRA, L. M.; ALMEIDA, J. M.; MARQUES-NETO, H. T.; ZIVIANI, A. Mobdatu:Um novo modelo de previsão de mobilidade humana para dados heterogêneos. XXXIIISimposio Brasileiro de Redes de Computadores e Sistemas Distribuídos, Vitoria / ES.
64
Anais do XXXIII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos,SBRC’2015, p. 515–528, 2015.
SNOW, John. On the mode of communication of cholera. Edinburgh medical journal, v. 1,7,p. 668–670, 01 1856.
SOUSA, Leonardo; MELLO, Rafael de; CEDRIM, Diego; GARCIA, Alessandro; MISSIER,Paolo; UCHôA, Anderson; OLIVEIRA, Anderson; ROMANOVSKY, Alexander. Vazadengue:An information system for preventing and combating mosquito-borne diseases with socialnetworks. Information Systems, v. 75, p. 26–42, 02 2018.
TUKEY, John W. Exploratory data analysis. Addison-Wesley, 1977.
VILA, J. R.; KOZIEVITCH, N.; FONSECA, K.; GADDA, T.; ROSA, M.; GOMES-JR, L. C.;AKBAR, M. Urban mobility challenges – an exploratory analysis of public transportation data incuritiba. Revista de Informática Aplicada, v. 12, p. 1, 12 2016.
VILAICHONE, Ratha-Korn; AUMPAN, Natsuda; RATANACHU-EK, Thawee; UCHIDA,Tomohisa; TSHERING, Lotay; MAHACHAI, Varocha; YAMAOKA, Yoshio. Population-basedstudy of helicobacter pylori infection and antibiotic resistance in bhutan. International Journalof Infectious Diseases, v. 97, 05 2020.
World Health Organization. Urgent health challenges for the next decade. WHO,2020. Available at: https://www.who.int/news-room/photo-story/photo-story-detail/urgent-health-challenges-for-the-next-decade.
World Health Organization and others. WHO guidelines for epidemic preparedness andresponse to measles outbreaks. WHO, 1999. Available at: https://www.who.int/csr/resources/publications/measles/WHO_CDS_CSR_ISR_99_1/en/.
ZHOU, Xuezhong; MENCHE, Jörg; BARABASI, Albert-Laszlo; SHARMA, Amitabh. Humansymptoms–disease network. Nature communications, v. 5, p. 4212, 06 2014.
66
APPENDIX A - MEDICAL RECORDS DATA DICTIONARY
Table 15 – Medical Records Data Dictionary.

| Column Name (Portuguese) | Column Name (English) | Description | Type | Size |
|---|---|---|---|---|
| dt_atendimento | Date of Appointment | Date when the medical appointment occurred | DATE | |
| dt_nascimento | Date of Birth | Date of birth of the patient | DATE | |
| sexo | Gender | Gender of the patient | VARCHAR2 | 1 |
| cod_tipo_unidade | Code of Unit Type | Code of the health unit type | NUMBER | 5 |
| tipo_unidade | Type of Unit | Type of the health unit | VARCHAR2 | 50 |
| cod_unidade | Code of Unit | Code of the health unit | VARCHAR2 | 150 |
| desc_unidade | Description of the Unit | Description of the health unit | VARCHAR2 | 80 |
| cod_procedimento | Code of Procedure | Code of procedure performed | VARCHAR2 | 12 |
| desc_procedimento | Description of Procedure | Description of procedure performed | VARCHAR2 | 255 |
| cod_cbo | Code of CBO | Code of professional occupation | VARCHAR2 | 8 |
| desc_cbo | Description of CBO | Description of professional occupation | VARCHAR2 | 200 |
| cod_cid | Code of CID | Code of diagnosis | VARCHAR2 | 4 |
| desc_cid | Description of CID | Description of diagnosis | VARCHAR2 | 150 |
| solicitacao_exames | Request for Exam | Indicates whether an exam request has occurred | VARCHAR2 | 3 |
| qt_presc_farm_ctbana | Qty Medicine Prescribed | Quantity of medicine prescribed at the Curitibana pharmacy | NUMBER | 10 |
| qt_disp_farm_ctbana | Qty Medicine Released | Quantity of medicine released from the Curitibana pharmacy | NUMBER | 10 |
| qt_med_nao_padron | Qty Non-Standardized Medicine | Quantity of non-standardized medicine | NUMBER | 10 |
| enc_atendimento_espec | Referral to Specialist Service | Indicates whether referral was made to specialist care | VARCHAR2 | 3 |
| area_atuacao | Practice Area | Area of practice | VARCHAR2 | 255 |
| desencadeou_intern | Triggered Hospitalization | Indicates whether hospitalization was triggered | VARCHAR2 | 3 |
| dt_internamento | Date of Hospitalization | Date of the patient’s hospitalization | DATE | |
| estab_solicitante | Requesting Facility | Facility that requested the hospitalization | VARCHAR2 | 80 |
| estab_destino | Targeted Facility | Facility in which hospitalization occurred | VARCHAR2 | 80 |
| cid_internamento | CID of Hospitalization | Code of the diagnosis of hospitalization | VARCHAR2 | 4 |
| tratamento_domicilio | Home Treatment | Type of water treatment at home | VARCHAR2 | 30 |
| abastecimento | Supply | Type of water supply at home | VARCHAR2 | 40 |
| energia_eletrica | Electricity | Indicates whether there is electricity in the household | VARCHAR2 | 3 |
| tipo_habitacao | Type of Housing | Type of housing | VARCHAR2 | 60 |
| destino_lixo | Garbage Disposal | Waste destination at home | VARCHAR2 | 30 |
| fezes_urina | Faeces/Urine | Destination of faeces / urine at home | VARCHAR2 | 30 |
| comodos | Rooms | Quantity of rooms at home | NUMBER | 5 |
| em_casodoenca | In Case of Illness | Services sought in the event of illness | VARCHAR2 | 40 |
| grupo_comunitario | Community Group | Community group in which the patient participates | VARCHAR2 | 40 |
| meio_comunicacao | Means of Communication | Communication media used at home | VARCHAR2 | 40 |
| meio_transporte | Means of Transportation | Means of transport used at home | VARCHAR2 | 40 |
| municipio | Municipality | Patient municipality | VARCHAR2 | 50 |
| bairro | Neighborhood | Patient neighborhood | VARCHAR2 | 72 |
| nacionalidade | Nationality | Patient’s nationality | | |
| cod_usuario | user_code | Unique user code | NUMBER | 10 |
| origem_usuario | user_origin | 1 - Resident in the municipality; 2 - Non-resident in the municipality | NUMBER | 1 |
| residente | resident | 1 - With definitive registration at BHU; 2 - Without definitive registration at BHU | NUMBER | 1 |
| cod_profissional | professional_code | Professional unique code | NUMBER | 10 |

Source: Adapted from Curitiba Open Data Portal (2019)
APPENDIX B - SCRIPTS FOR MATERIALIZED VIEWS
-- Number of appointments per health unit, month, and year.
-- count(*) has no alias, so the resulting column is named "count".
CREATE MATERIALIZED VIEW public.qtd_atendimento_mes_ano AS
SELECT
    cod_unidade,
    desc_unidade,
    EXTRACT(MONTH FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS month,
    EXTRACT(YEAR FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS year,
    count(*)
FROM saude.tatiane_atendimento_unidade_saude
GROUP BY 1, 2, 3, 4;

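
As a usage illustration, the view above could be refreshed and queried roughly as follows. This is a minimal sketch that assumes a PostgreSQL database in which the view has already been created; the year 2018 and the limit of 10 rows are arbitrary example values, not figures taken from the dissertation.

```sql
-- Rebuild the stored result after new appointment records are loaded.
REFRESH MATERIALIZED VIEW public.qtd_atendimento_mes_ano;

-- Ten busiest (health unit, month) pairs in one year.
-- The unaliased count(*) column of the view is exposed as "count".
SELECT
    desc_unidade,
    month,
    "count" AS appointments
FROM public.qtd_atendimento_mes_ano
WHERE year = 2018          -- arbitrary example year
ORDER BY appointments DESC, desc_unidade, month
LIMIT 10;
```

Because a materialized view stores its result, queries like this one read precomputed counts instead of re-parsing every dt_atendimento string, at the cost of having to refresh the view when the source table changes.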
-- Number of appointments per patient age, month, year, gender, and health unit,
-- together with the age bracket derived from the age.
-- Note: age_range holds the age in years; the bracket label is the last,
-- unaliased column (PostgreSQL names it "case" by default).
-- age() with a single argument counts from the current date, so ages are
-- relative to the day the view is built or refreshed.
CREATE MATERIALIZED VIEW public.qtd_atendimento_faixa_etaria_genero AS
SELECT
    date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) AS age_range,
    EXTRACT(MONTH FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS month,
    EXTRACT(YEAR FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS year,
    sexo AS gender,
    cod_unidade,
    count(*),
    CASE
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 4 THEN '0 to 4'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 4
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 14 THEN '5 to 14'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 14
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 24 THEN '15 to 24'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 24
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 34 THEN '25 to 34'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 34
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 44 THEN '35 to 44'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 44
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 54 THEN '45 to 54'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 54
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 64 THEN '55 to 64'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 64
         AND date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) <= 74 THEN '65 to 74'
        WHEN date_part('year', age(to_date(dt_nascimento, 'DD-MM-YYYY'))) > 74 THEN '75+'
    END
FROM atendimento_unidade_saude
GROUP BY 1, 2, 3, 4, 5;

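
The age-and-gender view above could be summarised per age bracket along the following lines. This is only a sketch under stated assumptions: it presumes PostgreSQL, where the unaliased CASE column receives the default name "case" (a reserved word, hence the double quotes) and the unaliased count(*) column is named "count"; the year 2018 is an arbitrary example value.

```sql
-- Appointments per age bracket and gender for one year.
SELECT
    "case" AS age_bracket,
    gender,
    SUM("count") AS appointments
FROM public.qtd_atendimento_faixa_etaria_genero
WHERE year = 2018          -- arbitrary example year
GROUP BY "case", gender
-- Sorting by the smallest age in each bracket keeps '5 to 14' ahead of
-- '15 to 24', which a plain text sort on the label would not do.
ORDER BY MIN(age_range), gender;
```

The query relies on the first column of the view, age_range, holding the age in years rather than the bracket label, which is what the view definition computes despite the column name.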
-- Number of appointments per health unit, hour of day, day of week, year, and month.
-- EXTRACT(DOW ...) returns 0 for Sunday through 6 for Saturday.
CREATE MATERIALIZED VIEW public.qtd_atendimento_hora_dia_semana AS
SELECT
    cod_unidade,
    desc_unidade,
    EXTRACT(HOUR FROM to_timestamp(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS hour,
    EXTRACT(DOW FROM to_timestamp(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS day_of_week,
    EXTRACT(YEAR FROM to_timestamp(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS year,
    EXTRACT(MONTH FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS month,
    count(*)
FROM saude.tatiane_atendimento_unidade_saude
GROUP BY 1, 2, 3, 4, 5, 6;

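
A possible way to read the hour and day-of-week view is sketched below: it totals appointments across all health units for each weekday and hour. The sketch assumes PostgreSQL, where EXTRACT(DOW ...) yields 0 for Sunday through 6 for Saturday; the English weekday labels and the year 2018 are illustrative choices, not part of the original scripts.

```sql
-- City-wide appointments per weekday and hour for one year.
SELECT
    CASE day_of_week
        WHEN 0 THEN 'Sunday'
        WHEN 1 THEN 'Monday'
        WHEN 2 THEN 'Tuesday'
        WHEN 3 THEN 'Wednesday'
        WHEN 4 THEN 'Thursday'
        WHEN 5 THEN 'Friday'
        WHEN 6 THEN 'Saturday'
    END AS weekday,
    hour,
    SUM("count") AS appointments
FROM public.qtd_atendimento_hora_dia_semana
WHERE year = 2018          -- arbitrary example year
GROUP BY day_of_week, hour
ORDER BY day_of_week, hour;
```

Summing is needed because the view keeps one row per health unit and month, so a given weekday and hour appears in many rows.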
-- Number of appointments per diagnosis (CID), year, week, month, and neighborhood.
-- EXTRACT(WEEK ...) returns the ISO 8601 week number.
CREATE MATERIALIZED VIEW public.qtd_doenca_semana_ano AS
SELECT
    cod_cid,
    desc_cid,
    EXTRACT(YEAR FROM to_timestamp(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS year,
    EXTRACT(WEEK FROM to_timestamp(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS Week_Number,
    EXTRACT(MONTH FROM to_date(dt_atendimento, 'DD/MM/YYYY HH24:MI:SS')) AS month,
    bairro,
    count(*)
FROM saude.tatiane_atendimento_unidade_saude
GROUP BY 1, 2, 3, 4, 5, 6;
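
As an illustration of how the weekly diagnosis view might be used, the sketch below lists weekly counts per neighborhood for a single diagnosis code. It assumes PostgreSQL, where the unquoted alias Week_Number is stored as week_number. The code 'A90' (dengue fever in ICD-10) is only an example value, and the way codes are actually stored in cod_cid is an assumption.

```sql
-- Weekly appointment counts per neighborhood for one diagnosis code.
SELECT
    year,
    week_number,
    bairro,
    SUM("count") AS cases
FROM public.qtd_doenca_semana_ano
WHERE cod_cid = 'A90'      -- example code; storage format is an assumption
GROUP BY year, week_number, bairro
ORDER BY year, week_number, cases DESC;
```

The SUM is needed because the view also groups by month, so a week that spans two months is split across two rows. Since the week number follows ISO 8601 while the year is the calendar year, the first days of January can carry week 52 or 53 of the previous ISO year.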