Construction d'un score prédictif du cancer du sein adapté à la ...

HAL Id: tel-03360108https://tel.archives-ouvertes.fr/tel-03360108

Submitted on 30 Sep 2021

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.

Construction d’un score prédictif du cancer du seinadapté à la population française : détermination de

seuils de risque pour un dépistage organisé personnalisé.Emmanuel Bonnet

To cite this version:Emmanuel Bonnet. Construction d’un score prédictif du cancer du sein adapté à la populationfrançaise : détermination de seuils de risque pour un dépistage organisé personnalisé.. Cancer. Uni-versité Montpellier, 2020. Français. �NNT : 2020MONTS086�. �tel-03360108�

https://tel.archives-ouvertes.fr/tel-03360108

https://hal.archives-ouvertes.fr

THÈSE POUR OBTENIR LE GRADE DE DOCTEUR

DE L’UNIVERSITÉ DE MONTPELLIER

En Biostatistiques

École doctorale n°166 : Information, Structures, Systèmes (I2S)

Unité de recherche UPRES EA 2415 : Aide à la Décision Médicale Personnalisée

Présentée par Emmanuel BONNET Le 24 Novembre 2020

Sous la direction de Paul LANDAIS et Jean-Pierre DAURES

Devant le jury composé de

Christine LASSET

Catherine QUANTIN

Stéphane ZERVOUDIS

Paul LANDAIS

Jean-Pierre DAURES

PU-PH

PU-PH

PU-PH

PU-PH

PU-PH

Université Lyon 1

Université de Bourgogne

Université Ioannina, Athènes

Université de Montpellier

Université de Montpellier

Rapporteur

Rapporteur

Examinateur

Directeur

Co-directeur

Construct ion d’un score prédict if du cancer du sein adapté à la populat ion française :

déterminat ion de seui ls de r isque pour un dépistage organisé personnal isé.

Remerciements

Je tiens à exprimer mes sincères et profonds remerciements à monsieur le Professeur

Paul Landais, mon directeur de thèse. Je vous remercie pour vos précieux conseils, votre

confiance, votre écoute, et votre soutien durant ces 3 années, mais aussi pour votre aide

précieuse pour la rédaction et les nombreuses relectures des articles et de ce manuscrit.

Merci aussi pour la motivation et la positivité que vous m’avez fournies tout au long de

ce travail et plus particulièrement durant les derniers mois qui ont été particulièrement

difficiles. Ce fût un immense honneur et un plaisir de mener cette thèse à vos côtés.

Je remercie aussi monsieur le Professeur Jean-Pierre Daurès, mon co-directeur de thèse,

pour son aide précieuse tout au long de cette thèse. Votre confiance, vos conseils, votre

rigueur, et votre immense expertise dans un grand nombre de domaines dont le dépistage

organisé du cancer du sein m’ont été d’une grande aide pendant ces 3 années. Vous vous

êtes toujours intéressé à l’avancée de mes travaux, vous avez toujours été disponible et à

l’écoute. Pour tout cela merci encore.

Je tiens à remercier vivement les membres du jury madame le Professeur Christine

Lasset et madame le Professeur Catherine Quantin d’avoir accepté d’être les rapporteurs

de ce travail et pour leurs remarques pertinentes qui ont contribué à l’amélioration de

ce manuscrit. Je tiens aussi à remercier monsieur le Docteur Stéphane Zervoudis qui a

accepté d’examiner ma thèse.

Je voudrais également remercier toute l’équipe de Dépistage 34 (CRCDC-OC site Hé-

rault) sans qui ce travail n’aurait pas été possible. Je voudrais en particulier remercier

Jean-Loup Chappert d’avoir tout mis en œuvre pour que le recueil des données se passe

dans les meilleures conditions possibles, Marc Galindo pour son aide dans la création du

CRF et toute ses réponses dès que je lui demandais de nouvelles informations sur la base

de données et enfin Chantal pour son aide dans le test du questionnaire et sa bonne hu-

meur à chaque fois que je passais dans vos locaux. Un grand merci également à Madeline,

Xavière et Séverine pour leur efficacité et la grande qualité de leur travail dans le recueil

des données.

Je remercie aussi toute l’équipe de l’IURC (Institut Universitaire de Recherche Cli-

nique). Je ne peux pas citer tout le monde alors je me contenterai de faire quelques

mentions particulières. Je commence donc par remercier Françoise avec qui j’ai eu le plai-

sir de partager mon bureau pendant près de 5 ans et qui m’a accompagné dans toute cette

aventure, Sylvie pour sa gentillesse et son aide pour toutes les questions administratives

ainsi que Sandy pour son aide précieuse et sa réactivité pour les appels à projets et les

questions juridiques.

Je remercie aussi mes amis. Merci à vous pour tous les bons moments passés ensemble

et à venir !

Enfin, mes derniers remerciements sont pour ma famille. Un merci très spécial à ma

maman pour m’avoir donné le goût des maths et pour ses encouragements tout au long

de ce travail de thèse. Merci aussi de toujours avoir été présente quand j’en avais besoin.

Ce travail de thèse aborde un sujet qui me touche beaucoup, je tiens à adresser une

pensée particulière à mamie Mado, Babeth et Edith.

Sommaire

Liste des abréviations 1

Introduction 2

1 Optimisation de la recherche bibliographique 4

1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 La méthode TEMAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Points clés de la mise en œuvre de la méthode dans R . . . . . . . . . . . . 5

1.3.1 Extraction des articles de PubMed . . . . . . . . . . . . . . . . . . 5

1.3.2 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3.2.1 Matrice termes-documents . . . . . . . . . . . . . . . . . . 5

1.3.2.2 Division maximisant le �2 . . . . . . . . . . . . . . . . . . 6

1.3.2.3 Sélection des termes . . . . . . . . . . . . . . . . . . . . . 7

1.3.2.4 Itérations . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Application web TEMAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4.1 Shiny . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4.1.1 Shiny server . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4.1.2 Shiny UI . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.4.2 Hébergement de l’application . . . . . . . . . . . . . . . . . . . . . 10

1.5 Article 1 : Optimizing Literature Search : TEMAS, A New Text-Mining

Algorithm-Assisted Search Tool . . . . . . . . . . . . . . . . . . . . . . . . 10

1.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2 Score de risque de cancer du sein 42

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.2 Création du questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.2.1 Concertations et méthode de création du questionnaire . . . . . . . 43

2.2.2 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.3 Création de la base de données anonymisée . . . . . . . . . . . . . . . . . . 43

2.3.1 Nombre de sujets nécessaire . . . . . . . . . . . . . . . . . . . . . . 43

2.3.2 Construction de la cohorte rétrospective . . . . . . . . . . . . . . . 44

2.3.2.1 Identifier les femmes à risque moyen dans l’Hérault . . . . 44

2.3.2.2 Collection des mammographies . . . . . . . . . . . . . . . 44

2.3.2.3 Constitution de la cohorte . . . . . . . . . . . . . . . . . . 45

2.3.2.4 Enquête téléphonique . . . . . . . . . . . . . . . . . . . . 45

2.4 Analyse statistique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.4.1 Modèle de Cox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2.4.2 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

2.4.3 Détermination des seuils de risque . . . . . . . . . . . . . . . . . . . 47

2.4.3.1 Détermination de seuil à l’aide d’une courbe ROC . . . . . 47

2.4.3.2 Détermination des seuils par rapport de risque . . . . . . . 48

2.5 Article 2 : Towards a personalized organized screening in breast cancer :

practical determination of thresholds of risk in women at average-risk . . . 52

2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Conclusion 71

Bibliographie 75

Bibliographie de la thèse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Bibliographie des facteurs de risque . . . . . . . . . . . . . . . . . . . . . . . . . 78

Annexes 109

A Questionnaire 109

B Code première étape de TEMAS 115

B.1 Shiny UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

B.2 Shiny server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

C Annexes Article 1 128

C.1 Annexe 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

C.2 Annexe 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

C.3 Annexe 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

D Annexes Article 2 134

D.1 Annexe 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

D.2 Annexe 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

Liste des abréviations

AP Average Precision

CRCDC-OC Centre Régional de Coordination des Dépistages des Cancers Occi-

tanie

DCG Dicsounted Cumulative Gain

INCa Institut National du cancer

LR Likelihood ratio (rapport de vraissemblance)

MAP Mean Average Precision

OR Odds Ratio

ROC Receiver Operating Characteristic

RR Risque Relatif

RV Reactive Values

Se Sensibilité

Sp Spécificité

TEMAS Text Mining Algorithm-assisted Search

UI User Interface

1

Introduction

Le cancer du sein touche en France près de 60.000 nouvelles femmes par an, avec une

survie nette à 5 ans de 87%. Cependant, si le cancer du sein est dépisté à un stade précoce,

la survie à 5 ans est de 99% [16].

Pour tenter de dépister le cancer du sein chez les femmes à un stade précoce, la France

a mis en œuvre un programme national de dépistage organisé (anciennement Programme

National de Dépistage Systématique du cancer du sein) en 1993, après un premier essai

en 1989, et l’a officialisé en mai 1994 par arrêté. En juin 1996, 20 départements français

avaient intégré ce programme de dépistage. L’objectif de cette politique de santé publique

est de réduire la mortalité par cancer du sein ainsi que la lourdeur et les séquelles des

traitements, grâce à une détection précoce de la maladie. Généralisé en mars 2004 à

l’ensemble du territoire, le programme national de dépistage organisé du cancer du sein

s’adresse à toutes les femmes de 50 à 74 ans et consiste en la réalisation tous les deux ans

d’une mammographie, avec une deuxième lecture systématique en cas de cliché normal.

Si la mammographie est suspecte, le radiologue réalise un bilan complémentaire pour le

classement définitif de l’image (cliché de profil, agrandissement et/ou échographie) puis

propose le cas échéant une surveillance ou des prélèvements pour un examen histologique.

Ce dépistage organisé concerne les femmes dites à risque "moyen" de cancer du sein. En

effet, les femmes présentant des facteurs de risque de haut ou très haut risque ne sont pas

incluses dans le dépistage organisé [11]. Il s’agit par exemple des femmes présentant des

mutations génétiques prédisposant au cancer du sein (BRCA1-2 par exemple), des femmes

ayant des antécédents familiaux importants de cancer du sein ou encore des femmes pour

lesquelles une biopsie aurait mis en évidence un facteur de risque histologique.

En prenant en compte ces critères de non inclusion, la population cible du dépistage

organisé du cancer du sein correspond à 10 millions de femmes en France, avec un taux

de participation effectif au dépistage organisé d’environ 50%. Ce taux est assez bas et la

France vise un taux de participation à 65%. À la demande de la ministre en charge de

la Santé en octobre 2015, dix ans après la généralisation de ce programme de dépistage

à l’ensemble du territoire, l’Institut national du cancer a organisé une large concertation

citoyenne et scientifique sur le dépistage du cancer du sein, invitant les citoyennes, les

professionnels de santé, les associations et les institutions à réfléchir à l’amélioration de la

politique de dépistage du cancer du sein. Les citoyennes interrogées ont exprimé le souhait

que le dépistage organisé soit de plus en plus ciblé, au sein même de la population concer-

née par le dépistage organisé (femmes à risque "moyen" aujourd’hui). Une proposition de

conclusion a été que le dépistage organisé devienne de plus en plus personnalisé selon les

2

Chapitre 0

facteurs de risque de chacune.

En octobre 2016, la ministre des Affaires sociales et de la Santé, a engagé une rénova-

tion profonde du programme de dépistage organisé du cancer du sein. Ce nouveau plan

d’actions envisageait de proposer aux femmes un suivi plus personnalisé, mieux coordonné

et impliquant davantage le médecin traitant [17]. Dans ce rapport rédigé par l’INCa il est

souligné que les projets de recherche portant sur les outils et méthodes d’évaluation du

niveau de risque, dont le scoring, permettant d’orienter au mieux les personnes vers un

dépistage plus personnalisé seraient encouragés.

C’est dans le but de proposer une nouvelle stratégie de dépistage organisé plus person-

nalisé que ce travail de thèse a été mené. Les femmes actuellement incluses dans le dépis-

tage organisé du cancer du sein en France sont les femmes considérées à risque "moyen".

En effet, comme indiqué dans l’arrêté du 29 septembre 2006 relatif aux programmes de

dépistage des cancers, les femmes à risque "élevé" ou "très élevé" ne sont pas incluses dans

le dépistage organisé et doivent bénéficier d’une surveillance personnalisée. Comme nous

l’avons rappelé plus haut, il s’agit des femmes porteuses d’une mutation délétère prédis-

posant au cancer du sein ou à forte probabilité d’en être porteuses (BRCA1 ou BRCA2

par exemple) [20], des femmes pour lesquelles une biopsie a mis en évidence un facteur

de risque histologique (hyperplasie canalaire atypique ou néoplasie lobulaire in situ), ou

encore des femmes ayant un antécédent personnel de cancer du sein.

Concernant un dépistage plus personnalisé, nous avons fait l’hypothèse, parmi les

femmes dites à risque "moyen", que toutes n’avaient pas un risque homogène. Notre hy-

pothèse de travail était que, au sein du groupe des femmes à risque « moyen », il exis-

terait aussi un sous-groupe à risque plus bas et un autre à risque plus élevé. Nous nous

sommes alors attachés à créer un score de risque pour tester notre hypothèse. Celui-ci

a été construit à partir d’un questionnaire qui a tenu compte des facteurs de risque de

cancer du sein retrouvés dans la littérature scientifique. Ce questionnaire a été soumis à

des femmes de 50 à 70 ans dans le département de l’Hérault et qui avaient été suivies à

partir de 2006 jusqu’en 2018. C’est au sein de cette cohorte rétrospective de femmes à

risque "moyen" de cancer du sein que nous avons cherché à identifier trois sous-groupes à

risque gradué de cancer du sein : un groupe à bas risque, un groupe à risque "moyen" et

un groupe de femmes à plus haut risque.

3

Chapitre 1

Optimisation de la recherche

bibliographique

1.1 Introduction

Une des premières étapes de la thèse a été de retrouver dans la littérature les po-

tentiels facteurs de risque de cancer du sein pour pouvoir créer un questionnaire à poser

aux femmes incluses dans le dépistage afin d’établir leur score de risque. La recherche

bibliographique est partie intégrante de tout projet de recherche. Nous avons utilisé le

moteur de recherche PubMed qui indexe plus de 30 millions d’articles provenant de plus

de 5200 revues scientifiques. Le nombre d’articles est donc très important et il augmente

en permanence avec en moyenne deux articles ajoutés toutes les minutes. La recherche

documentaire devient alors difficile lorsque des milliers d’articles sont potentiellement

retenus suite à une recherche.

La littérature sur les facteurs de risque de cancer du sein sur les dix dernières années

est extrêmement fournie avec près de 16.000 articles. Il est alors apparu nécessaire de

développer une méthode pour faciliter la recherche bibliographique face à des bases de

données de littérature scientifique de plus en plus fournies. C’est donc dans cette optique

que nous avons créé TEMAS (Text-Mining Algorithm assisted Search), une méthode

permettant de réduire la quantité d’articles à lire tout en augmentant le taux de pertinence

de la recherche. Cet outil a été développé sous forme d’une application web facile à prendre

en main et utilisable sans connaissances particulières en informatique ou en statistique.

1.2 La méthode TEMAS

PubMed regroupe tellement d’articles que, pour permettre l’exhaustivité de la re-

cherche, la sensibilité de la recherche sur les travaux préliminaires que nous avons réalisé

était basse. BestMatch [10], un algorithme utilisant le principe de "machine learning" a été

développé en 2017 pour offrir une alternative au classement des résultats par date dans

PubMed et offrir un classement des résultats plus pertinent. Cependant, malgré cette

avancée, le taux de pertinence d’une recherche PubMed semblait rester faible. L’ajout de

décisions de l’utilisateur allié à du text-mining au cours du processus de recherche nous

4

Chapitre 1 1.3. Points clés de la mise en œuvre de la méthode dans R

semblait nécessaire pour diminuer le nombre d’articles à lire tout en augmentant le taux

de pertinence de la recherche. L’intégralité de la méthode est présentée dans la section

1.5, dans l’article qui a été soumis pour publication.

1.3 Points clés de la mise en œuvre de la méthode

dans R

1.3.1 Extraction des articles de PubMed

Le premier défi est d’arriver à récupérer les articles de PubMed dans R, un langage

de programmation et un logiciel libre destiné aux statistiques et à la science des don-

nées (https ://cran.r-project.org/). Le National Center for Biotechnology Information

(NCBI) qui regroupe plusieurs bases de données dont PubMed a développé des "Applica-

tion Programming Interfaces" (APIs) qui, comme leur nom l’indique, sont des interfaces

qui permettent une connexion entre les serveurs du NCBI et d’autres applications, ici R.

Cet ensemble d’APIs se nomme "Entrez Programming Utilities" (E-utilities). Le package

"rentrez" [31] a été créé pour permettre d’utiliser ces APIs sous R. Ce package, grâce

à la fonction "entrez_search" permet dans un premier temps de récupérer dans R l’en-

semble des "PubMed Identifiers" (PMIDs) correspondant à une recherche PubMed. Ces

PMID ne changent pas avec le temps ou pendant le traitement et ne sont jamais réuti-

lisés, ils identifient bien chaque article indexé par PubMed de façon unique. La fonction

"entrez_fetch" permet ensuite d’importer dans R, au format XML (Extensible Markup

Language), l’ensemble des informations (titre, résumé, nom du journal, auteurs, date de

publication, Mesh headings, doi...) des articles à récupérer à partir de la liste des PMIDs

identifiés auparavant. Puis, par une extraction par balise XML, nous avons créé une base

de données avec, pour chaque article, son PMID, son titre et son résumé.

1.3.2 Classification

1.3.2.1 Matrice termes-documents

Nous avons utilisé l’algorithme de classification du package Rainette [2] du logiciel R,

qui est inspiré de la "méthode Reinert" [26, 27] et du logiciel IRaMuTeQ [25, 24].

La méthode de classification utilisée est une classification descendante hiérarchique

(CDH). La première étape est la création d’une matrice terme-document qui est une

matrice binaire de présence/absence de chaque terme dans chaque document. Un exemple

de matrice est présenté dans la table 1.1.

5

Chapitre 1 1.3. Mise en oeuvre de la méthode dans R

Table 1.1 – Exemple de matrice termes-documents

terme 1 terme 2 terme 3 terme 4article 1 0 0 1 1article 2 1 1 0 0article 3 0 1 1 1article 4 1 0 1 1article 5 1 1 1 0

1.3.2.2 Division maximisant le �2

Le but est ensuite de diviser cette matrice en deux classes en maximisant la valeur du

�2. Cependant avec une matrice de plus de 10.000 documents il n’est pas envisageable de

tester tous les regroupements possibles pour déterminer le �2 maximal.

Pour diminuer ce temps de calcul, une Analyse Factorielle des Correspondances (AFC)

est réalisée sur la matrice termes-documents, et les documents sont ordonnés selon leurs

coordonnées sur le premier axe de l’AFC. Les �2 sont calculés pour chaque groupement

des n documents de coordonnées les plus basses versus tous les autres documents, et le

regroupement donnant le plus grand �2 est conservé. Ensuite chaque document est tour

à tour changé de groupe. Si cette réaffectation augmente le �2 elle est conservée, sinon

l’article est remis dans son groupe. Cette opération est répétée jusqu’à ce qu’il n’y ait

plus d’augmentation du �2. Cela permet alors de séparer la matrice termes-documents en

2 classes. Un exemple est présenté dans la Table 1.2.

Table 1.2 – Séparation de la matrice terme documents en deux classes

terme 1 terme 2 terme 3 terme 4

Classe 1article 1 0 0 1 1article 4 1 0 1 1

Classe 2article 2 1 1 0 0article 3 0 1 1 1article 5 1 1 1 0

Le �2 est calculé en sommant les lignes de chaque classe et en calculant le �2 sur le

tableau obtenu. En reprenant notre exemple, cela donne un �2 de 3.142 (Table 1.3).

Table 1.3 – Matrice pour calcul du �2

terme 1 terme 2 terme 3 terme 4Classe 1 1 0 2 2Classe 2 2 3 2 1

6

Chapitre 1 1.4. Application web TEMAS

1.3.2.3 Sélection des termes

La dernière étape est le choix des termes à garder dans chacune des classes pour les

prochaines itérations. Tout d’abord on sépare la matrice termes-documents en 2 matrices,

une pour chaque classe. Dans chacune des deux matrices les fréquences de chaque terme

sont calculées, et tous les termes qui apparaissent trop peu de fois (nous avons gardé

la valeur par défaut qui est de 3) sont supprimés. Pour chaque terme, un coefficient de

contingence est aussi calculé entre les deux matrices et, si ce coefficient est important

(supérieur à 0.3 par défaut), le terme en question n’est conservé que dans la matrice dans

laquelle il est surreprésenté. Pour terminer avec l’exemple, cela pourrait donner le résultat

présenté dans la table 1.4.

Table 1.4 – Sélection des termes

Classe 1 et Classe 2

terme 3 terme 4 terme 1 terme 2article 1 1 1 article 2 1 1article 4 1 1 article 3 0 1

article 5 1 1

1.3.2.4 Itérations

Pour finir, il faut itérer ces processus de séparation puis de sélection des termes.

Parmi les deux nouvelles matrices terme-document obtenues, on sélectionne la matrice

qui contient le plus de documents et on recommence l’algorithme k fois.

Le résultat après les k itérations est une classification hiérarchique descendante en k

groupes, avec une représentation graphique sous la forme d’un dendrogramme dont un

exemple est présenté dans la figure 1. L’avantage de Rainette est que l’utilisateur peut

remonter dans l’arbre complet de la classification et explorer la classification de 2 à k

groupes pour choisir le nombre de classes qui semble le plus adapté.

1.4 Application web TEMAS

Le but de TEMAS était qu’il soit accessible et utilisable par le plus grand nombre

d’utilisateurs. L’idée est donc venue de créer une application web intuitive, utilisable

sans connaissances particulières en statistiques ou en programmation, et sans nécessité de

télécharger un logiciel.

7


Figure 1 – Exemple de dendrogramme

1.4.1 Shiny

La méthode TEMAS ayant été développée avec le logiciel R [23], il semblait cohérent

que, derrière une interface web, les tris d’articles et les calculs continuent d’être exécutés

sous R. Il se trouve qu’il existe un package R nommé Shiny [30] développé par RStudio,

qui permet justement de construire des applications web interactives directement à partir

de R et qui peuvent être enrichies grâce à des thèmes CSS (Cascading Style Sheets) ou

encore à des actions de type JavaScript. CSS est un langage de feuille de style utilisé pour

décrire la présentation d’un document écrit en HTML (HyperText Markup Language). Un

"thème" est l’habillage graphique d’un site. Il en définit les éléments identifiants : couleurs,

style graphique... Il est constitué de déclarations CSS et s’applique à une structure HTML

donnée qui n’est pas interchangeable. Pour cela, il faut combiner deux fichiers, un fichier

serveur qui va être responsable de toute la partie de calculs et un fichier UI (User Interface)

qui va offrir un environnement graphique dans lequel évolue l’utilisateur de l’application

web.

Le code de la première étape de l’application web TEMAS est présenté en Annexe B.

La totalité du code de l’application web représente plus de 2500 lignes.

1.4.1.1 Shiny server

La partie serveur est la plus proche du langage de programmation de R. Une différence

importante est la notion de valeurs réactives. Puisque notre application Shiny est interac-

tive, les valeurs d’entrée peuvent changer à tout moment et les valeurs de sortie doivent

8


être mises à jour immédiatement pour refléter ces modifications. Ces valeurs d’entrée sont

donc des objets nommés "input" et qui sont des "ReactiveValues" (RV) définis par une

entrée de l’utilisateur dans le navigateur web. Ces input sont définis comme "source réac-

tive". Les valeurs de sortie sont des "output", qui sont aussi des RV et sont définis comme

"point de terminaison réactif". Un "point de terminaison réactif" est généralement un élé-

ment qui apparaît dans la fenêtre du navigateur de l’utilisateur, tel qu’un graphique ou un

tableau. L’output utilise l’input et, à chaque fois que l’input change, l’output est notifié

et ré-exécuté. Une source réactive peut être connectée à plusieurs points de terminaison,

et vice versa.

La plupart des applications simples n’utilisent que des sources réactives et des points de

terminaison réactifs, connectant les sources directement aux points de terminaison, mais

il est également possible de placer des composants réactifs entre les sources et les points

de terminaison. Ces composants sont appelés conducteurs réactifs. Un conducteur peut à

la fois être dépendant et avoir des dépendances. En d’autres termes, il peut être à la fois

un parent et un enfant dans un graphique de structure réactive. Les sources ne peuvent

être que des parents (ils peuvent avoir des dépendances), et les points de terminaison

ne peuvent être que des enfants (ils peuvent être dépendants) dans le graphique réactif

présenté en Figure 2. Les conducteurs réactifs sont à la fois des parents et des enfants,

et peuvent être utiles pour encapsuler des opérations lentes, complexes ou coûteuses en

calcul entre source et point de terminaison.

Figure 2 – Graphique du cheminement réactif

1.4.1.2 Shiny UI

Shiny comprend un certain nombre de fonctionnalités permettant de disposer les com-

posants d’une application sur une page web. Cela est rendu possible grâce à un système

de "grille fluide" issu de "Bootstrap 2 grid system" où le code HTML a été remplacé par

du code R.

De plus Shiny propose un nombre important de "widgets" (outils) permettant via l’UI

de récupérer dans la partie server l’ensemble des inputs de l’utilisateur. L’input peut

être aussi bien un fichier que l’utilisateur charge sur l’application, une date à renseigner

ou encore un choix parmi une liste de propositions prédéfinies.

9

Chapitre 1

1.5. Article 1 : Optimizing Literature Search : TEMAS, A New Text-Mining

Algorithm-Assisted Search Tool

Pour guider l’utilisateur lors de la prise en main de l’application, de nombreux outils

ont aussi été utilisés. Premièrement, la mise en place de "conditional panels" c’est-à-dire

que les différentes parties de la page web apparaissent au fur et à mesure de l’avancée de

l’utilisateur, conditionnellement à ses actions, pour le guider. Deuxièmement, l’utilisation

de la fonction "validate" qui permet de vérifier que l’entrée de l’utilisateur est cohérente

avec l’entrée attendue en vérifiant, par exemple, que l’extension du fichier fourni par

l’utilisateur est bien la bonne et, si ce n’est pas le cas d’informer l’utilisateur le plus

précisément possible sur son erreur. Enfin le package "shinyWidgets" [22] qui permet de

générer des "pop-up" pour alerter l’utilisateur, en cas de réussite et de finalisation d’une

opération longue par exemple, ou alors en cas d’alerte ou d’erreur.

1.4.2 Hébergement de l’application

L’application web est hébergée sur un server Linux Ubuntu 18.04 sur lequel a été

déployé Shiny Server et accessible à l’adresse https://shiny.temas-bonnet.site.

Le HTTPS, qui garantit que les informations échangées entre l’utilisateur et le serveur

seront chiffrées est maintenant généralisé à l’Internet entier et un site simplement en

HTTP peut même être bloqué par certains navigateurs. Il était donc nécessaire de chiffrer

notre protocole de communication. Il fallait dans un premier temps obtenir des certificats

(SSL/TLS) pour le serveur. Nous avons pour cela utilisé "Let’s Encrypt", qui est une

autorité de certification gratuite, automatisée et ouverte mise à disposition par la société

d’utilité publique "Internet Security Research Group" (ISRG). La dernière étape a été de

mettre en place une redirection automatique et permanente de toutes les connexions en

HTTP (via le port 80) vers la connexion HTTPS (port 443).

1.5 Article 1 : Optimizing Literature Search : TEMAS,

A New Text-Mining Algorithm-Assisted Search

Tool

10

Optimizing Literature Search: TEMAS, A New Text-Mining Algorithm-Assisted

Search Tool

Bonnet Emmanuel*, Daurès Jean-Pierre, Landais Paul

Montpellier University, EA2415, Clinical Research University Institute, Montpellier, France

*Corresponding author: E. Bonnet

Institut Universitaire de Recherche Clinique, 34093 Montpellier CEDEX 5, FRANCE

Tel.: +33 4 11 75 98 39

E-mail: [email protected]

ABSTRACT

Background: Literature search is challenging when thousands of articles are potentially

involved. To facilitate literature search we created TEMAS a Text Mining Algorithm-assisted

Search tool that we compared to a PubMed reference search (RS) in the context of etiological

epidemiology.

Methods: The 4 steps of TEMAS are: 1) a classic PubMed global search 2) a first sort

removing articles without abstracts or containing off-topic terms 3) a clustering step with a

descending hierarchical classification regrouping articles in independent classes 4) a final sort

extracting from the targeted class the abstracts containing the terms of interest, with a link to

the corresponding PubMed articles. Validation was performed for risk factors of breast

cancer. We estimated the precision and recall rate compared to RS. Average precision and

discounted cumulative gain (DCG) were also computed to perform a ranking-based

evaluation. We also compared TEMAS results with articles selected in two meta-analyses.

Results: For risk factors of breast cancer, breastfeeding, mammographic density, oral

contraceptive, and menarche were explored. TEMAS consistently increased precision vs RS

Chapitre 1 1.5. Article 1 : TEMAS

11

(from 23% to 32%), with a recall rate from 95% to 97%, and divided the number of selected

articles to read from 2.3 to 4.8 times. Mean average precision for 100 articles was 47.4% for

TEMAS vs 20.9% for PubMed ranked by best match, and DCG showed a consistent

improvement for TEMAS compared to PubMed best match.

Discussion: TEMAS divided the results of a literature search by 3.2, and improved the

precision rate, the average precision, and the DCG compared to RS for epidemiological

studies. Reducing the number of selected articles inevitably impacted the recall rate. However, it

remained satisfactory and did not bias the corpus of information. Moreover, the recall rate was 100%

for the two meta-analyses we analyzed, which suggests that the loss of recall rate observed above

concerned articles not relevant enough to be included in the meta-analyses.

Conclusion: TEMAS provides a user-friendly interface for non-specialists of literature search

confronted with thousands of articles and appeared useful for meta-analyses.

Keywords: Bibliography; Literature search; Text Mining; recall rate; precision; web

application

CONTRIBUTIONS TO THE LITERATURE

· Literature search is challenging when thousands of articles are potentially involved.

To facilitate it, we created TEMAS which was applied to etiological studies in

epidemiology. It divided the results of a literature search by 3, improved the precision

rate together with the average precision, and the discounted cumulative gain

compared to the PubMed best match.

· TEMAS web application provides a user-friendly interface for non-specialists of

literature search, and does not require any specific knowledge.


12

· Moreover, TEMAS appears as a potentially useful tool for meta-analyses with a 100%

recall rate.

· These findings contribute to make literature search simpler and more efficient.

INTRODUCTION

Literature search is an integral part of any research project. It requires using the correct syntax

and the right engine, as well as appropriately selecting and managing information and

documents. This last issue becomes complex when the search results bring thousands of

articles.

The PubMed search engine indexes more than 30 million citations of the biomedical

literature, from more than 5,200 worldwide journals and online books. Several communities

have addressed the challenge of automated information retrieval in the literature1–3

. Several

methods are provided to perform systematic reviews such as PRISMA4, MOOSE

5, or

COSMO-E6. These methods are very effective when the research question is clearly

formulated, however, they afford little guidance when the informational need of the researcher

is less circumscribed, for instance looking for risk factors or predictive factors. There are also

many methods using machine learning algorithms7–9

, or methods focused on specific topics10

.

However, these methods do not provide information for sorting without a priori a large

number of articles.

So we set up a new tool allowing researchers to easily explore very large volumes of data,

selecting relevant articles by a text mining approach. We assessed whether TEMAS, a text

mining algorithm-assisted search, was more relevant than a classic PubMed search in the field

of etiological epidemiology, which studies the causal factors of diseases. Most etiological

studies are observational case/control studies or exposed/non exposed cohorts. The validation

of this tool was carried out by comparing TEMAS and a classic PubMed search for non-genetic


13

risk factors of breast cancer and for meta-analyses including case-controls or cohorts studies.

MATERIALS AND METHODS

We developed an interactive web application using Shiny11

to implement our new Text

Mining Algorithm-assisted Search (TEMAS) tool. Shiny is a R software12

package that

enables building easily interactive web apps straight from R. Shiny, which combines the

computational power of R with the interactivity of the modern web. We complied with

MECIR13

guidelines for searching and selecting studies (1.5 and 1.6) available in Additional

file 3.

This 4 step web application is available at https://shiny.temas-bonnet.site.

TEMAS Tool description

A chart describing the TEMAS tool with the proceeding steps appears in Figure 1.


14

Figure 1: Flowchart describing the proceeding steps of TEMAS


15

TEMAS Step 1: Global Search

Our new tool allows choosing a date range for the study and entering the search keywords

(labeled “Global Search”). An advanced search is possible since it respects the PubMed search

syntax. The number of PubMed answers matching the search criteria is displayed.

All the abstracts are retrieved using the package rentrez14

of R software, which provides an

interface to the NCBI’s EUtils API. It enables users to search databases like PubMed (which

is, here, the default database), processing the results of those searches, and loading data into

their R sessions. The result of this research can be downloaded as a file in CSV format.

We chose a time period from 2006 to 2017 to perform our documentary search15

. We used an

advanced search with the following syntax:

((risk_factors[MeSH Terms] OR

risk_factor[Title/Abstract] OR

risk_factors[Title/Abstract]) AND

(breast neoplasms[MeSH Terms] OR

breast_cancer[Title/Abstract] OR

breast_neoplasms[Title/Abstract])

)

TEMAS Step 2: First sort

A first sort allows removing PubMed results without abstracts (letter to the editor, teaching

case, erratum). A second pass can be performed to eliminate all articles that contain off-topic

terms (selected by the user) of the search. The result of this step is downloaded as a file in

TXT format that will be used at the next step.


16

TEMAS Step 3: Clustering exploration and extraction

Clustering exploration

The resulting abstracts are analyzed using the Rainette package16

. This is a R package for

multidimensional analysis of texts and questionnaires that enables statistical analyses of large

corpora of texts17

. More specifically, it allows classifying abstracts by similarity using Reinert’s

classification18

.

It is a descending hierarchical classification (DHC) carried out in two stages, which offers

a global vision of the corpus explored. After a corpus partitioning, statistically independent

classes of words are identified, which are characterized by specific correlated terms. This

type of analysis enables grouping articles according to concepts. These classes are interpretable

according to their profiles, which are characterized by specific correlated terms. The DHC

summarizes this process by a dendrogram, which offers the possibility to choose a

clusterization including 2 to 6 classes. Once these classes characterized, the class of interest

can be identified among the available classes. This is the class that contains the terms or

concepts that best match the user's search.

Clustering extraction

Then the extraction step begins. At this stage, there are two distinct possibilities whether

the search deals with a single term or multiple associated terms:

– Single term extraction: Selects the abstracts containing the “term of interest” within

the retained class of interest mentioned above.

– Multiple terms extraction: First, target the most representative term, here called “main

term”, and define the “complementary term”. Then define the maximum distance

(expressed as a number of characters) allowed between “main” and “complementary”

terms. The selected articles contain the “main term” and the associated


17

“complementary term” within the defined maximum distance allowed (e.g. for “oral

contraceptive”, the “main” term is “contraceptive” and the “complementary” term is

“oral”).

These terms were called “extracted term(s)”.

TEMAS Step 4: Final sort

This last step enables carrying out the final sorting of the articles extracted from the former

step. A default choice is offered, which only keeps articles the abstracts of which contain at

least one of the following terms: Odds Ratio, Odds, OR, Relative Risk, RR, Hazards Ratio, or

HR. Other relevant terms for the targeted search may be introduced also.

TEMAS effectiveness analysis

Precision and recall rates

We studied the “relevance” of our information retrieval19. The information retrieval

evaluation is based on relevance metrics, namely recall and precision that were estimated as

follows20–22

.

!"#$%$&'(!)*" = ('+,-"!(&.(!"/"0)'*()!*$#/"%($'(*1"(23(4)*)-)%"(567893: !"% ; <

*&*)/('+,-"!(&.()!*$#/"%($'(*1"(23(4)*)-)%"567893: !"% ; <

!"#$$% #&! = %'()*! %+,% !$!-#'&%# &."$!/%.'%&0!%12345%6#&#*#/!'()*! %+,% !$!-#'&%# &."$!/%.'%&0!%75%6#&#*#/!

Precision is the fraction of relevant instances among the retrieved instances. Recall is the

fraction of the total amount of relevant instances that were actually retrieved. Both precision

and recall are therefore based on an understanding and measure of relevance.

The measurement of precision and recall rates requires a qualified individual to inspect the

output from a search and to address the output into two groups of articles: relevant and not

relevant.


18

The recall rate cannot be calculated for the RS. Indeed, all articles in the PubMed

database should be read to calculate the RS recall rate. The recall rate of a PubMed search is

not 100% since there may be relevant articles that are not extracted by the RS. Conversely, it

was possible to calculate TEMAS recall rate relative to RS. Precision was calculated for both

TEMAS and RS.

To assess TEMAS’ efficiency versus a classic PubMed search, we performed a reference

search (RS) with the syntax below:

("Global Search" AND

("extracted word(s)"[MeSH Terms] OR

"extracted word(s)"[Title/Abstract])

)

This RS retrieved nRS articles, which corresponds to the total number of RS articles.

In order to ascertain the content of each selected article, a stratified random sampling of 300

articles to read was performed; 89:;<>

8?>× @AA articles in the 'BCDEF articles and

8?>%G%89:;<>8?>

× @AA articles in 'HF I 'BCDEF articles.

A figure explaining the stratified random sampling is available in Additional file 1.

Two authors (EB and PL) independently read the 300 randomly selected articles for each risk

factor to assess their relevance. The criterion of relevance was the presence, in the abstract, of

an odds-ratio (OR), a relative risk (RR), or a hazards ratio (HR) for the “extracted word(s)”.

The final step in the comparison was to verify that a bias was not introduced in the

representativeness of the articles selected by TEMAS in relation to the RS articles. We

performed a Chi2 test (with simulated P-value for small numbers) to compare the distribution

of OR (< 1, = 1 or > 1) among the relevant articles retained by TEMAS and RS.


19

Average Precision and Discount Cumulative Gain (DCG)

To evaluate our method, we also calculated the average precision and the Discount

Cumulative Gain (DCG)23

to take into account the notion of ranking. We therefore retrieved

the first 100 most recent TEMAS articles ranked by decreasing publication date, and the first

100 PubMed articles classified by "Best Match"24

, the relevance search for PubMed.

Precision at n (KL') is the proportion of the top-n documents (the first n in the ranking) that

are relevant. If { MN ON � N P} are the relevant documents retrieved at rank ' the first n in the

ranking, then: KL' =% P8. The Average Precision (AP) is the mean of the precision scores

after each relevant document retrieved:

4KL' = Q KL RPRSMT

We computed 4KL' for ' U {VWN WAN XWN YAA}

The Mean Average Precision (MAP) is the arithmetic mean of the average precision values

for an information retrieval system over a set of q queries. Let 4KRL'%be the average

precision at n for query .:

34KL' = YZ[4KRL'

\

RSM

Discounted Cumulated Gain (DCG) assumes that the greater the ranked position of a relevant

document, the less valuable it is for the searcher, because the less likely it is that the searcher

will ever examine the document, and at least has to pay more effort to find it. DCG

formalizes these assumptions by crediting a retrieval system for retrieving relevant

documents by relevance, which is discounted by a factor dependent on the logarithm of the

document’s ranked position. Let R (. U {YN� N '}] be the rating of article .; R = A if the

article is not relevant and R = Y if the article is relevant.


20

^_`8 = M a[ RlogO .

8

RSO

Meta-analysis

We tested whether TEMAS would be suitable to find all the articles selected by the

researchers in meta-analyzes. On the one hand, we performed a reference search following

the queries described in the meta-analysis method to retrieve the articles. On the other hand,

we followed TEMAS method from step 1 to step 3 using the same query strategy. If the

relevance criterion in the meta-analysis was the presence of an OR, a RR, or a HR, we

stopped TEMAS at step 3.We performed TEMAS step 4 otherwise.

We then retrieved all the articles used in two meta-analyses and compared these articles to

those retrieved by the reference search and TEMAS.

Material studied

We performed these effectiveness analyses for several searches:

- Precision and recall, average precision and DCG were calculated for four non-genetic

risks factors of breast cancer: breastfeeding, menarche, oral contraceptive, and breast

density;

- Comparisons with two meta-analyses were performed. The first on physical activity,

the second for age at menarche and at menopause as well, as risk factors for breast

cancer.

RESULTS

TEMAS search

TEMAS step 1: Global Search

With this Global Search (GS) we obtained a set of 15,582 articles.


21

TEMAS step 2: First sort

We excluded articles without abstract, and we defined the following terms as off topic since

they were downstream the screening procedure:

- “mastectom” for mastectomy, mastectomies...

- “metasta” for metastatic, metastasis...

- “chemot” for chemotherapy, chemotherapies

- “surger” for surgery and surgeries

- “surgic” for surgical

Among the articles extracted in Step 1, we excluded 954 articles without abstract and

4,582 off-topic articles. We thus obtained a database of 10,046 classifiable articles.

TEMAS step 3: Clustering exploration

Clustering:

As a result of the classification, 99.9% of the abstracts were classified in five disjoint classes

as shown on Figure 2.


22

Figure 2: TEMAS dendrogram for breast cancer risk factors

Class 1 gathered genetic terms such as: polymorphism, gene, allele, genotype, DNA or

genetic.

Class 2 included risk factors such as menopause, BMI, age, hormone, parity for the most


23

relevant terms. It also comprised “statistical” terms such as: ratio, confidence interval (CI),

hazards ratio (HR), relative risk (RR), logistic, Cox. In this class we therefore expected to

find articles relative to risk factors corresponding to our criterion of relevance, which was the

presence in the abstract of an odds-ratio (OR), a relative risk (RR), or a Hazards Ratio (HR)

for the ”risk factor studied”.

Class 3 gathered all breast tumor detection methods with terms such as MRI, ultrasound,

mammogram, image, or biopsy.

Class 4 was related to public health aspects of screening. Indeed, the selected terms were:

screen, program, access, public, social or communication. Thus this class referred to a systemic

approach and management of breast cancer screening rather than a risk factors search.

Class 5 was focused on treatments as suggested by the first terms of the class: therapy,

treatment or treat. This class was therefore of less interest because it referred to conditions that

were downstream the screening of breast cancer.

Therefore Class 2 appeared to be the most appropriate class of interest because it

contained information related to risk factors.

We illustrate single, multiple terms extraction or both as follows.

Single term extraction:

– Menarche: the extraction term was “menarch” and extraction led to 246 abstracts.

Multiple term extraction is illustrated by the two following examples:

– Oral contraceptive: the main term retained was “contracept” and the complementary

term was “oral” with a maximum distance of 25 characters. This extraction led to 144

articles.

– Breast/mammographic density:


24

i. Breast density: the main term was “dens” and the complementary term was

“breast” with a maximum distance of 10 characters. This extraction led to 204

articles;

ii. Mammographic density: the main term was “dens” and the complementary term

was “mammo” with a maximum distance of 25 characters. This extraction led to

316 articles;

iii. After removing duplicates, this extraction led to 368 abstracts.

Both single and multiple terms extraction:

– Breastfeed/Breast-feed:

i. Breastfeed single term extraction: extraction term was “breastf” and extraction led

to 162 abstracts;

ii. Breast-feed multiple term extraction: main term was “feed” and complementary

term was “breast” with a maximum distance of 10 characters. This extraction led

to 176 abstracts;

iii. After removing duplicates, this extraction led to 189 abstracts.

TEMAS step 4: Final sort

We chose the default choice which only kept articles the abstracts of which contain at least

one of the following terms: Odds Ratio, Odds, OR, Relative Risk, RR, Hazards Ratio or HR.

All the complete search queries are available in Additional file 2.

Thus the final TEMAS and RS databases for the 4 risk factors are shown in Table 1


25

Table 1: retained articles for TEMAS and RS

Databases

Risk factors

!"#$

%&$

Breastfeeding 142 328

Menarche 157 446

Oral contraceptive 112 306

Breast/mammographic density 174 836

TEMAS effectiveness analysis

Precision and recall rates

The calculation method and results for precision and recall are displayed in Figures 3 to 6.

Figure 3: RS and TEMAS precision and recall rates for breastfeeding


26

Figure 4: RS and TEMAS precision and recall rates for menarche

Figure 5: RS and TEMAS precision and recall rates for oral contraceptive


27

Figure 6: RS and TEMAS precision and recall rates for mammographic

density

For “breastfeeding” we performed a stratified random sampling of 300 articles as follows:

130 out of 142 in the TEMAS set, and 170 out of 186 (nRS - nTEMAS) articles for the RS set.

This gave a representative sample of 300 articles. The RS returned 23% of relevant articles.

On the other hand, the precision of TEMAS was 51%. Thus, TEMAS effectively concentrated

the relevant articles by dividing by 2.3 the number of articles to be read and by increasing by

28% (p < 0.0001) the precision rate. Moreover, the recall rate of TEMAS was 97%. The

precision and recall rates for the other risk factors studied are displayed in Table 2.


28

Table 2: Recall and precision rates for the risk factors studied comparing TEMAS to the

Reference Search

Risk factor of breast

cancer

RS

Precision*

TEMAS

Precision*

TEMAS recall rate

relative to RS

breastfeeding 23 % 51 % 97 %

menarche 13 % 36 % 95 %

oral contraceptive 20 % 52 % 95 %

breast density 8 % 33 % 95 %

*Relevance criterion: presence in the abstract of an odds ratio (OR), a hazards ratio (HR), or a relative risk (RR)

for the risk factor studied.

For “menarche” TEMAS divided by 2.8 the number of relevant articles to be read and

increased by 23% (p < 0.0001) the precision rate.

For “oral contraceptive” TEMAS divided by 2.7 the number of articles to be read, and

increased by 32% (p < 0.0001) the precision rate.

For “breast density” TEMAS concentrated the relevant articles dividing by 4.8 the number

of articles to be read, and increased by 26% (p < 0.0001) the precision rate.

For the four risk factors studied, the slight loss in the recall rate did not affect the

representativeness of TEMAS with a distribution of ORs, shown in Table 3, not significantly

different from RS (p = 1).


29

Table 3: Comparison of the distribution of the ORs for relevant articles between TEMAS and RS.

Risk factor Breastfeeding Menarche Oral

Contraceptive

Mammographic

density

Databases RS* TEMAS RS TEMAS RS TEMAS RS TEMAS

OR

<1 82% 82% 5% 5% 4.9% 5.3% 0% 0%

=1 16% 17% 30% 29% 26% 25% 9% 10%

>1 2% 2% 65% 66% 69% 70% 91% 90%

P-value p=1 p=1 p=1 p=1

*RS: Reference Search

Average Precision and Discount Cumulative Gain (DCG)

The results for AP and DCG are displayed in Tables 4 and 5. For all the risk factors studied

the average precision and the DCG are higher with TEMAS than with a reference search on

PubMed. '()@25 is 60.16% for TEMAS and 24.22% for PubMed, and '()@100 is still

better with TEMAS (48.28%) vs PubMed (21.5%).

Table 4: Average precision for TEMAS and PubMed Best Match.

Breast cancer risk factors N TEMAS (%) PubMed Best Match (%)

Breastfeeding

25 73.3 47.7

50 69.9 40.1

75 66.6 36.9

100 63.6 35.0

Age at menarche

25 67.8 25.0

50 58.6 22.3

75 54.0 22.6

100 52.2 22.3

Oral contraceptive use

25 70.9 11.6

50 59.5 15.9

75 57.2 17.5

100 56.1 18.7


30

Breast density

25 57.2 18.5

50 50.9 18.0

75 45.6 19.3

100 42.8 19.3

AP (%) calculated for the first 25, 50, 75 and 100 articles read (N) of the breast cancer risk factors studied.

Table 5: Discounted Cumulative Gain for TEMAS and PubMed Best Match.

Breast cancer risk factors N TEMAS PubMed Best Match

Breastfeeding

25 6.5 3.2

50 8.5 4.4

75 10.4 5.6

100 12.4 6.8

Age at menarche

25 5.0 1.5

50 7.1 2.6

75 9.1 3.5

100 10.7 4.4

Oral contraceptive use

25 5.0 0.8

50 7.9 1.9

75 10.3 3.0

100 12.4 4.0

Breast density

25 3.6 1.5

50 4.8 2.0

75 6.5 3.2

100 7.8 3.7

DCG calculated for the first 25, 50, 75 or 100 articles read (N) of the breast cancer risk factors studied

Meta-analysis

We focused on two meta-analyses to test the ability of TEMAS to retrieve all the articles

selected by the researchers to perform their meta-analysis.

The first one was focused on physical activity and the risk of breast cancer25

and based on

prospective studies. The authors selected 31 articles to perform their meta-analysis; we will

call these articles “relevant articles”. We performed a PubMed reference search with the

search terms indicated in the meta-analysis that led to 552 articles. Among these articles,

there were 30/31 relevant articles. The missing article was either found by the authors in


31

another database or through another search not detailed in the article methodology. TEMAS

(whose method applied to this performance test for meta-analysis is detailed in Additional

file 2) led to a total of 145 articles among which we also found the 30 relevant articles. The

recall rate for TEMAS with this meta-analysis was therefore 100%, while dividing the

number of articles to be read by 3.8.

The second meta-analysis26

was focused on age at menarche and at menopause as risk factors

for breast cancer. The articles included in the meta-analysis were divided into 2 groups,

cohort studies and case-control studies. There were 31 relevant articles available on PubMed.

Among these 31 articles, 26 were found by the reference search; other articles came from

other sources, as indicated in the methodology. The recall rate for TEMAS (we stopped

TEMAS in step 3 because there was no condition on the presence of OR or RR in the meta-

analysis methodology) was 100% and divided the number of articles to read by 2.56 (5431

articles to read with TEMAS vs 13885 articles to read with a PubMed reference search).

For case-control studies, TEMAS recall rate was also 100%, all 52 relevant articles have been

found; while dividing the number of articles to read by 2.92.

For each of the three meta-analyses tested, TEMAS recall rate was 100%. Therefore, it

appears that TEMAS can help reducing the time required to search the literature for articles

on PubMed, in order to perform a meta-analysis, by dividing between 2.5 to 3.8 times the

number of articles to be read, while maintaining a recall rate of 100%.

DISCUSSION

Our Text-Mining Algorithm-assisted Search enabled making a global search on all the risk

factors of a given disease by performing once a single classification. Then we could extract

from this classification the articles concerning the risk factors of interest. For each risk factor

studied the number of articles to read was divided by 3.2 on average (2.3 to 4.8) compared to

RS. Thus, if we considered the five risk factors studied, RS lead to 2607 articles to read when


32

it was 706 for TEMAS. Moreover, we improved the precision rate by 23% to 32%, according

to the risk factor studied. Whatever the bibliographic search, TEMAS always improved the

precision rate even if RS provided poor precision. TEMAS can be used to look for risk

factors whatever the studied conditions, and appeared of interest for literature search for

epidemiological studies. In this context we retained the most used metrics, namely precision

and recall, average precision and Discounted Cumulative Gain to quantify the performance of

TEMAS information retrieval.

The workload dedicated to reading all the articles selected by a classic search becomes so

important that new approaches are needed. Several approaches are based on text-mining

using machine learning methods7–9

. However, they need to be trained on a carefully

constructed training data set representative of the topic of interest. Of note, it is necessary to

train these algorithms for each targeted topic. Moreover, it requires specific skills to

implement these algorithms. For instance, text mining tools have been implemented to extract

information about microbial biodiversity in food10

. However, this search cannot be extended to

another subject without re-calibration. Conversely, TEMAS does not require a training set. It

uses a hierarchical classification procedure based on co-occurrences on terms appearing in the

abstracts. This framework enabled to develop different types of searches without training step.

Furthermore, this method can be implemented by non-specialists thanks to the R Shiny web

app that helped providing a user-friendly interface available at https://shiny.temas-

bonnet.site.

Our method has some limits. At the classification stage, the choice of the optimal number

of classes might seem complex. Indeed, there are no prescriptive thresholding rules. In our

classification procedures, we kept 5 classes for breast cancer since it enabled obtaining

enough disjoined classes and an interesting clustering for our searches. The optimal number

of classes is different from a search to another.

For multi terms searches, we selected a distance of 10 to 25 characters between the main


33

and complementary terms depending on the risk factor studied and on the retained proximity

between the retained terms. These thresholds had been tested and appeared satisfactory without

omitting relevant articles on one side, and on the other side limiting irrelevant information.

Reducing the number of articles to read inevitably impacted the recall rate. However, the

recall rate of TEMAS compared to RS remained satisfactory, ranging from 95% to 97% for

the risk factors of breast cancer. It meant only one to three missing articles per search. In

addition, we ranked the relevant articles according to the value of the OR (< 1, = 1, or > 1)

for the risk factor studied. We did not observe any significant difference in the distribution of

ORs between RS and TEMAS relevant articles. Even if we did not get a recall rate of 100%,

the loss in recall rate did not bias the overall information given the above-mentioned

distribution of the OR values.

We also checked whether TEMAS would achieve a 100% recall on articles selected in

meta-analyses. We tested two meta-analyses, the recall rate was 100%. The loss of recall rate

that we observed for the risk factors analyses might have targeted articles not relevant enough

to be included in a meta-analysis. Of note, this PubMed-based search did not explore either

other databases or other sources (grey literature, theses,..)

Future work will test whether our approach applies to other medical literature databases.

CONCLUSION

Faced with the growing number of articles indexed in PubMed, TEMAS offers an effective

way to better target relevant articles in the literature. TEMAS was applied to etiological

studies in epidemiology. It divided the number of articles to read by 3, while increasing

search accuracy by at least 23% with a satisfactory recall rate. It also improved the average

precision and the discounted cumulative gain compared to PubMed best match. Moreover,

TEMAS appears as a potentially useful tool for meta-analytical search with a 100% recall

rate. In addition TEMAS tool is accessible online by everyone thanks to the web application


34

that we have developed.

LIST OF ABBREVIATIONS

· AP: Average Precision

· DCG: Discounted Cumulative Gain

· HR: Hazards Ratio

· MAP : Mean Average Precision

· OR: Odd Ratio

· RS : Reference Search

· TEMAS: Text-Mining Algorithm-Assisted Search

DECLARATIONS

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the

corresponding author (EB) on request.

Competing interests

The authors declare no competing interests

Funding

This work was funded by the University of Montpellier and Nîmes University Hospital which

supported EB phD thesis.

Authors' contributions

EB, PL and JPD conceived the project; EB designed the web application; EB collected the

material; EB and PL read the selected articles; EB, PL and JPD participated in results


35

interpretation, wrote and revised the manuscript.

Acknowledgements

We acknowledge Montpellier University and Nîmes University Hospital for enabling to

develop this research.

REFERENCES

1. Fleuren WWM, Alkema W. Application of text mining in the biomedical domain.

Methods. 2015;74:97-106. doi:10.1016/j.ymeth.2015.01.015

2. Song M. TAKES: Two-step Approach for Knowledge Extraction in Biomedical

Digital Libraries. J Inf Sci Theory Pract. 2014;2(1):6-21. doi:10.1633/jistap.2014.2.1.1

3. Huang CC, Lu Z. Community challenges in biomedical text mining over 10 years:

Success, failure and the future. Brief Bioinform. 2016;17(1):132-144.

doi:10.1093/bib/bbv024

4. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic

reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol.

2009;62(10):1006-1012. doi:10.1016/j.jclinepi.2009.06.005

5. Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis Of Observational Studies in

Epidemiology A Proposal for Reporting. JAMA. 2000;28328315(15):2008-2012.

doi:10.1001/jama.283.15.2008

6. Dekkers OM, Vandenbroucke JP, Cevallos M, Renehan AG, Altman DG, Egger M.

COSMOS-E: Guidance on conducting systematic reviews and meta-analyses of

observational studies of etiology. PLoS Med. 2019;16(2):e1002742.

doi:10.1371/journal.pmed.1002742

7. Simon C, Davidsen K, Hansen C, Seymour E, Barnkob MB, Olsen LR. BioReader: A


36

text mining tool for performing classification of biomedical literature. BMC

Bioinformatics. 2019;19. doi:10.1186/s12859-019-2607-x

8. Ševa J, Wiegandt DL, Götze J, et al. VIST - a Variant-Information Search Tool for

precision oncology. BMC Bioinformatics. 2019;20(1). doi:10.1186/s12859-019-2958-3

9. Lin RTK, Dai HJ, Bow YY, Chiu JL Te, Tsai RTH. Using conditional random fields

for result identification in biomedical abstracts. Integr Comput Aided Eng.

2009;16(4):339-352. doi:10.3233/ICA-2009-0321

10. Chaix E, Deléger L, Bossy R, Nédellec C. Text mining tools for extracting information

about microbial biodiversity in food. Food Microbiol. 2019;81:63-75.

doi:10.1016/j.fm.2018.04.011

11. Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie JM. shiny: Web Application

Framework for R. 2019.

12. R Core Team. R: A Language and Environment for Statistical Computing. 2019.

https://www.r-project.org/.

13. Higgins JPT, Lasserson T, Tovey D, Thomas J. Methodological Expectations of

Cochrane Intervention Reviews ( MECIR ) Standards for the conduct and reporting of

new Cochrane Intervention Reviews , reporting of protocols and the planning , conduct

and reporting of updates Flemyng and Rachel Churchill V. Cochrane Methods.

2020;(March).

14. Winter DJ. rentrez: An R package for the NCBI eUtils API. R J. 2017;9(2):520-526.

doi:10.32614/rj-2017-058

15. Baumann N. How to use the medical subject headings (MeSH). Int J Clin Pract.

2016;70(2):171-174. doi:10.1111/ijcp.12767

16. Barnier J. rainette: The Reinert Method for Textual Data Clustering. 2020.

https://cran.r-project.org/package=rainette.


37

17. Ratinaud P, Marchand P. Application de la méthode ALCESTE aux “gros” corpus et

stabilité des “mondes lexicaux”: analyse du “CableGate” avec IRAMUTEQ

[Application of the ALCESTE method to “big” corpora and stability of “lexical

worlds”: analysis of “CableGate” with IRAMUTEQ]. Actes des 11èmes Journées Int

d’Analyse des Données Textuelles. 2012:835–844. http://lexicometrica.univ-

paris3.fr/jadt/jadt2012/Communications/Ratinaud, Pierre et al. - Application de la

methode Alceste.pdf. Accessed July 15, 2019.

18. Reinert M. Une méthode de classification descendante hiérarchique : application à

l’analyse lexicale par contexte. Cah l’analyse des données. 1983;3(2):187-198.

http://www.numdam.org/article/CAD_1983__8_2_187_0.pdf. Accessed July 15, 2019.

19. Kagolovsky Y, Möhr JR. A new approach to the concept of “relevance” in information

retrieval (IR). In: IOS Press, ed. Studies in Health Technology and Informatics. Vol 84.

; 2001:348-352. doi:10.3233/978-1-60750-928-8-348

20. Sampath Kumar B, Pavithra S. Evaluating the searching capabilities of search engines

and metasearch engines: a comparative study. Ann Libr Inf Stud. 2010;57(2):87-97.

https://pdfs.semanticscholar.org/cff6/e1fa3ab2b406b0b68bd9ee85ce26703cb020.pdf.

Accessed July 30, 2019.

21. Marchionini G. Evaluating digital libraries: A longitudinal and multifaceted view. Libr

Trends. 2000;49(2):304-333. https://core.ac.uk/download/pdf/4817672.pdf. Accessed

July 30, 2019.

22. Harter SP, Hert CA. Evaluation of Information Retrieval Systems: Approaches, Issues,

and Methods. Annu Rev Inf Sci Technol. 1997;32:3-94.

https://eric.ed.gov/?id=EJ565471. Accessed July 30, 2019.

23. Dupret G, Piwowarski B. Model based comparison of discounted cumulative gain and

average precision. In: Journal of Discrete Algorithms. Vol 18. Elsevier; 2013:49-62.


38

doi:10.1016/j.jda.2012.10.002

24. Fiorini N, Canese K, Starchenko G, et al. Best Match: New relevance search for

PubMed. PLoS Biol. 2018;16(8). doi:10.1371/journal.pbio.2005343

25. Wu Y, Zhang D, Kang S. Physical activity and risk of breast cancer: A meta-analysis

of prospective studies. Breast Cancer Res Treat. 2013;137(3):869-882.

doi:10.1007/s10549-012-2396-7

26. Hamajima N, Hirose K, Tajima K, et al. Menarche, menopause, and breast cancer risk:

Individual participant meta-analysis, including 118 964 women with breast cancer

from 117 epidemiological studies. Lancet Oncol. 2012;13(11):1141-1151.

doi:10.1016/S1470-2045(12)70425-4


39

Chapitre 1 1.6. Discussion

1.6 Discussion

Face au nombre grandissant d’articles, TEMAS appliqué à des recherches d’études

étiologiques en épidémiologie permet de diviser le nombre d’articles à lire par 3 tout en

augmentant la précision de la recherche entre 23% et 32% grâce aux différents choix de

l’utilisateur pendant le processus de recherche TEMAS. En effet l’utilisateur prend des

décisions à trois niveaux : sur les termes qu’il juge hors sujet afin d’éliminer les articles les

contenant, sur la classification en choisissant le nombre de classes qui permet d’obtenir des

classes assez distinctes tout en offrant une classe intéressante pour la recherche concernée ;

enfin sur les critères qu’il fixe pour évaluer la pertinence des articles (présence d’OR, RR

ou HR par exemple). Sur l’ensembles des exemples que nous avons étudié TEMAS permet

d’améliorer la précision par rapport à une recherche de référence de 27% en moyenne. La

précision finale de TEMAS dépend donc en grande partie de la pertinence des articles

retournés par la recherche PubMed de base. Si la recherche PubMed renvoie un taux de

réponses pertinentes bas, TEMAS peut l’améliorer mais ne peut pas retrouver des articles

qui ne contiennent pas, dès le départ, l’information pertinente recherchée.

Dans une recherche bibliographique, la sémantique est très importante et, même si

BestMatch [10] propose une méthode de machine-learning qui est censée améliorer le clas-

sement des résultats de PubMed, la décision humaine reste indispensable. Cela permet

d’obtenir avec TEMAS des résultats intéressants en taux de précision et de rappel bruts

sur l’ensemble des articles d’une recherche, mais aussi de meilleures performances sur le

classement (ranking) des articles pertinents avec une meilleure précision moyenne (Ave-

rage Precision) sur les 100 premiers articles ainsi qu’un meilleur "gain cumulé réduit" ou

Discounted Cumulative Gain (DCG) par rapport au BestMatch de PubMed.

Il existe encore une grande disparité dans les informations données par les auteurs dans

les abstracts. Nous l’avons également noté à l’occasion d’une recherche bibliographique

que nous avons réalisée sur les modèles prédictifs. Une grande partie des abstracts ne

présentaient ni le nombre de sujets inclus, ni aucune notion statistiques de type HR ou RR

entre les groupes analysés. Ce sont pourtant des notions fondamentales pour identifier les

articles pertinents. Le TRIPOD [6] (Transparent Reporting of a multivariable prediction

model for Individual Prognosis or Diagnosis) est une liste de contrôle de 22 éléments jugés

essentiels pour la rédaction d’un article sur une étude de modèle de prédictif et qui vise

à améliorer la transparence du "reporting" de celle-ci, quelles que soient les méthodes

utilisées. Il existe dans cette liste de contrôle une phrase sur les abstracts qui conseille

de "Présenter un résumé des objectifs, le design de l’étude, la taille de l’échantillon, les

prédicteurs, l’événement étudié, l’analyse statistique, les résultats, et la conclusion". Mais

il ne s’agit que d’une phrase avec peu de détails. En juillet 2020 une version spéciale pour

les abstracts a été publiée, le TRIPOD for Abstracts [12]. Ces nouvelles recommandations

sur la rédaction des résumés d’articles en matière d’analyse pronostique ou diagnostique

40


les rendront plus facilement analysables dans le cadre d’une recherche bibliographique.

Les indicateurs de performance peuvent varier selon le type de recherche. Comme

expliqué ci-dessus, les articles sur les modèles prédictifs sont très mal référencés dans

PubMed, ce qui peut réduire la performance de TEMAS. A contrario, choisir à l’étape

4 un terme très précis (par exemple, « HR » ou « survival » si le but de la recherche

est d’extraire uniquement des articles parlant de survie) permettrait de réduire le nombre

d’articles en ciblant beaucoup plus précisément la recherche par rapport au choix par

défaut que nous avons utilisé qui est présence de HR, OR ou RR. Cela permettrait alors

d’optimiser la performance de TEMAS.

TEMAS a été créé pour faciliter la recherche bibliographique dans le cas où le nombre

d’articles à compulser est important à très important. Un des premier choix demandé

à l’utilisateur est la plage de dates pour la recherche. Celle-ci doit tenir compte de cet

objectif et permettre de retourner un nombre conséquent d’articles pour que l’étape de

classification soit efficace et pertinente. Cependant, le choix des plages de dates reste

indépendant de TEMAS. Il dépend du choix éclairé du chercheur en fonction de l’objectif

de sa recherche. Une recherche portant sur des traitements par exemple, doit tenir compte

des recommandations d’usage des médicaments qui changent régulièrement, de la date

d’introduction d’un nouveau traitement ou de sa date de retrait du marché.

Selon le nombre d’articles que l’on choisit au départ, la durée totale du processus TE-

MAS peut varier. Les deux étapes qui nécessitent le plus de temps de traitement sont les

étapes 1 de recherche globale et 3 de clusterisation. L’étape 1 nécessite un temps d’extrac-

tion d’environ 10 à 15 secondes tous les 1000 articles (selon l’encombrement des serveurs

PubMed). Ainsi, le temps de traitement est-il directement lié au nombre d’articles que l’on

souhaite extraire. Nous avons également évalué la durée de l’étape de classification. Elle

varie aussi en fonction du nombre d’articles : 5 secondes pour 1940 articles, 50 secondes

pour 13.000 articles ou encore 1 minute 40 pour 22.000 articles.

Pour finir, nous avons présenté dans l’article le cas le plus simple de recherche. Ce-

pendant concernant l’étape 3 d’extraction, une seule classe peut être sélectionnée pour

cibler les termes d’intérêt. Il est cependant tout à fait possible, dans un deuxième temps,

de choisir une deuxième classe pour répéter le processus. De plus, il peut être intéressant

de regarder le clustering avec une classe de moins, il se peut alors que les deux classes

contenant des mots clés pertinents soient regroupées. Pour l’extraction à la fin de l’étape

3, nous avons aussi présenté le cas le plus simple dans l’article, cependant l’application

web permet de réaliser plusieurs extractions distinctes. Par exemple "density" peut être

choisi en terme principal et "breast" puis "mammographic" en termes secondaires. De plus,

une étape "bonus" a même été développée pour permettre aux utilisateurs de fusionner les

deux bases ainsi extraites et de retirer automatiquement tous les doublons qui pourraient

apparaître dans les requêtes.

41

Chapitre 2

Score de risque de cancer du sein

2.1 Introduction

Nous souhaitions explorer si les femmes à risque "moyen" qui suivent le dépistage or-

ganisé ne présenteraient pas des niveaux de risque hétérogènes vis-à-vis de la survenue du

cancer du sein au cours de leur suivi. Nous avons fait l’hypothèse que, dans cette popu-

lation, deux seuils pourraient être identifiés. Un seuil bas au-dessous duquel les femmes

seraient à bas risque de survenue d’un cancer du sein,et un seuil haut, au-dessus du-

quel le risque serait plus élevé et se rapprocherait du niveau de risque des femmes qui

ne suivent pas le dépistage organisé car à plus haut risque. Si l’hypothèse se vérifiait,

on pourrait alors proposer une nouvelle stratégie personnalisée du dépistage organisé du

cancer du sein selon trois groupes à risque pour cette population de femmes considérée

jusqu’à aujourd’hui comme homogène et à risque "moyen".

Pour développer ce raisonnement nous devions d’abord créer un nouveau score de

risque adapté à cette population de femmes françaises à risque « moyen ». En effet, les

scores existants ne sont pas adaptés aux femmes françaises [21]. De plus nous souhaitions

introduire de nouvelles variables incluant la densité mammaire, l’activité physique ou les

difficultés socio-économiques. Dans un premier temps nous avons donc exploré la littéra-

ture afin d’identifier les facteurs de risque les plus pertinents à inclure dans le modèle.

La bibliographie sur les principaux facteurs de risques, a été extraite grâce à TEMAS.

Elle est présentée en fin du document. Ensuite nous avons inclus ces facteurs dans un

questionnaire d’enquête pour interroger des femmes dans le cadre d’une enquête de co-

hortes rétrospective chez des femmes âgées de 50 à 60ans, suivies pendant 12 années à

partir de 2006. Les données ont été recueillies chez 3077 femmes héraultaises par deux

enquêtrices du Centre Régional de Coordination des Dépistages des Cancers–Occitanie

(CRCDC-OC).

La base de donnée préalablement anonymisée nous a été confiée pour l’analyse. Nous

avons alors établi un score de risque puis déterminé les seuls optimaux pour identifier

trois groupes à risque gradué de cancer du sein.

42

Chapitre 2 2.2. Création du questionnaire

2.2 Création du questionnaire

2.2.1 Concertations et méthode de création du questionnaire

La création du questionnaire a été une étape très importante. En effet, il était néces-

saire que les questions soient facilement compréhensibles, que les questions soient posées

par oral pour limiter l’incompréhension ou les réponses inadaptées et de ne pas oublier de

détail dans les réponses, car une fois que le recueil de données a commencé il n’est plus

possible de modifier ce questionnaire.

Pour cela, nous avons longuement échangé avec le Docteur Jean-Loup Chappert, mé-

decin responsable pour le département de l’Hérault du CRCDC-Occitanie, Marc Galindo,

informaticien du CRCDC-Occitanie, le Professeur Thierry Maudelonde, gynécologue mé-

dical et endocrinologue, ainsi que Faiza Bessaoud, biostatisticienne au registre des tumeurs

de l’Hérault. Ces échanges nous ont permis d’affiner les questions quand cela était néces-

saire.

2.2.2 Questionnaire

Le questionnaire est présenté en Annexe A.

2.3 Création de la base de données anonymisée

2.3.1 Nombre de sujets nécessaire

Pour estimer le nombre de sujets nécessaire, nous avons pris comme hypothèse un

rapport de risque supérieur ou égal à deux entre les groupes, et un ratio de 4 femmes

n’ayant pas développé de cancer pour une femme ayant développé un cancer sur une

période de suivi de 12 ans, avec une puissance de 80%.

Considérons :

ncancer D

�

1:96p

.c C 1/ Np.1 Np/C 0:84p

c:p1.1 p1/C p0.1 p0/�2

c.p1 p0/2

nnon cancer D c:ncancer

avec

43

Chapitre 2 2.3. Création de la base de données anonymisée

OR D 2I Rapport de risque entre les groupes définis par :

� groupe 1 : score de risque < s

� groupe 2 : score de risque � s

c D 4I Rapport nombre de témoins / nombre de cas

p0 D P.score de risque > 0:0457/ D 0:026

p1 Dp0:OR

1C p0.OR 1/

Np Dp1 C c:p0

1C c

Cela donne une taille d’échantillon nécessaire de 543 femmes ayant développé un cancer

du sein sur la période suivi et de 2174 femmes n’ayant pas développé de cancer.

2.3.2 Construction de la cohorte rétrospective

2.3.2.1 Identifier les femmes à risque moyen dans l’Hérault

La population d’intérêt étant la population des femmes incluses dans le dépistage

organisé du cancer du sein, nous nous sommes rapprochés du Centre Régional de Coordi-

nation des Dépistages des Cancers en Occitanie le CRCDC-Occitanie pour le département

de l’Hérault (anciennement Dépistage34). Cette structure suit toutes les femmes incluses

dans le dépistage. Elle envoie à toutes les femmes de l’Hérault âgées de 50 à 74 ans la

lettre d’invitation pour réaliser une mammographie gratuite dans le cabinet de radiologie

de leur choix ou dans le Mammobile [8], camion pourvu d’un mammographe allant à la

rencontre des femmes dans certaines zones moins pourvues en cabinet de radiologie. Les

résultats des mammographies sont colligés par le CRCDC-OC. Si l’examen a été considéré

comme sans anomalie par un premier radiologue, alors une deuxième lecture est réalisée

systématiquement par un deuxième radiologue agréé dans les locaux du site de l’Hérault

du CRCDC-Occitanie.

La base de données du CRCDC-OC site Hérault regroupe donc l’ensemble des femmes

de notre population d’intérêt.

2.3.2.2 Collection des mammographies sur une période de 12 années

Pour constituer la cohorte il fallait identifier les cas, c’est-à-dire les femmes qui ont

présenté un cancer au cours du suivi, et les contrôles qui n’ont pas présenté de cancer du

sein sur la période.

Le croisement des données du CRCDC-OC avec le registre de tumeurs de l’Hérault

permet une exhaustivité sur la connaissance du développement ou non d’un cancer sur la

44

Chapitre 2 2.3. Création de la base de données anonymisée

période de suivi, que celui-ci soit directement diagnostiqué suite au dépistage organisé ou

que ce soit un cancer de l’intervalle.

La base de données ainsi constituée contenait tous les cas et les contrôles auxquelles

poser les questions relatives à leur exposition à un ou plusieurs facteurs de risque de cancer

du sein sur la base du questionnaire. La base de données des femmes incluait la cotation

de leur densité mammaire selon la classification BI-RADS [1].

2.3.2.3 Constitution de la cohorte

Le flow-chart présenté plus bas détaille la constitution de l’échantillon d’étude. La

base de départ était constituée de 33.858 femmes qui ont été dépistées en 2006-2007 dans

l’Hérault. Pour 896 d’entre elles, elles ont présenté un cancer au cours de leur suivi de 12

années et 32.962 n’en ont pas présenté. Dans la mesure où nous ne pouvions pas interroger

autant de contrôles, nous avons procédé à un tirage au sort pour retenir 5000 contrôles.

Nous avons exclu les 91 femmes pour lesquelles un cancer a été diagnostiqué pendant les

2 premières années de suivi pour éliminer les cancers qui auraient pu commencer à se

développer avant l’inclusion dans la cohorte. Les appels téléphoniques des enquêtrices ont

donc ciblé 5805 femmes. Certaines d’entre elles n’ont pas pu être rejointes (non réponse

ou coordonnées téléphoniques non pertinentes) soit 141 cas et 2057 contrôles. Par ailleurs,

570 femmes ont refusé l’enquête, pour 67 cas et 463 contrôles. Le modèle du score a donc

été développé à partir de 597 cas et 2241 contrôles. Comme expliqué plus bas, pour l’étape

de rééchantillonnage, nous n’avons retenu dans le modèle final que les femmes qui avaient

répondu à toutes les questions sans aucune réponse de type « je ne sais pas ».

2.3.2.4 Enquête téléphonique

Pour obtenir des réponses les plus pertinentes possibles, nous avons retenu le principe

d’une enquête téléphonique avec des enquêtrices formées, plutôt que par courrier. Nous

avons répondu à l’appel à projets pour la recherche médicale 2018 de la Fondation de

l’Avenir pour obtenir le budget nécessaire au recrutement d’enquêtrices. Ces enquêtrices

officiaient au CRCDC-Occitanie.

Le CRF (Case Report Form) a été créé sous Microsoft Access. Access permet, grâce

à des macros, la création de questions "conditionnelles". Par exemple, si la patiente ré-

pondait "non" à la question "Avez-vous eu des grossesses avant 2006", les sous-questions

sur le nombre de grossesses et l’âge de la première grossesse n’apparaissaient pas, elles

n’apparaissaient que si la réponse à la question principale était "oui". Cela permettait de

limiter au maximum les erreurs ou oublis de saisie car l’enquêtrice savait que toutes les

questions qui s’affichaient nécessitaient une réponse. Cela a aussi permis d’optimiser le

temps des enquêtrices, en effet en reprenant le même exemple où la patiente disait qu’elle

n’avait pas eu de grossesse, en plus des sous-questions, c’était la totalité des deux ques-

45

Chapitre 2 2.4. Analyse statistique

tions suivantes sur le nombre d’enfants et la durée d’allaitement qui étaient masquées et

la prochaine question pertinente apparaissait à la suite.

Le CRCDC-Occitanie a lié le logiciel de recueil des données du questionaire avec sa

base de données. Cela lui a permis d’établir une liaison entre les données du question-

naire et celles du fichier de dépistage. Ainsi, pour chaque femme, l’occurrence ou non

d’un cancer du sein était ainsi identifiée grâce aux remontées réalisées périodiquement à

partir des dossiers médicaux ou du registre des tumeurs. Une extraction des données to-

talement anonymisée nous a été fournie par le CRCDC–Occitanie pour effectuer l’analyse

des données.

2.4 Analyse statistique

2.4.1 Modèle de Cox

Le but était de créer un score prédictif individuel de cancer du sein à partir de données

de survie avec comme événement la survenue d’un cancer du sein au cours de la période

de 12 ans de suivi.

Le modèle de Cox était le modèle le plus adapté, mais ce modèle est semi-paramétrique

c’est-à-dire qu’il ne cherche pas à estimer la fonction ƒ0, qui est la même pour tous les

individus à un instant donné, mais s’intéresse uniquement au rapport du risque instantané

d’événements pour deux individus exposés à des facteurs de risques différents.

Il est donc nécessaire d’estimer ƒ0 séparément pour obtenir un score individuel. Nous

avons donc utilisé l’estimateur de Breslow [3] défini dans l’équation 2.1 pour estimer ƒ0.

Oƒ0.t/ DX

i WTi6t

diX

j2R.Ti /

exp. O0Zj /(2.1)

avec :

— T1 < T2 < � � � < T12 les temps d’événements distincts

— di le nombre d’événements détectés à Ti

— Zi la valeur des covariables de la patiente i

— R.Ti/ l’ensemble des individus à risque au temps T i (juste avant Ti)

— O le vecteur des coefficients estimés par le modèle de Cox.

Cela permet alors d’estimer la fonction de risque F pour chaque individu comme une

fonction de ses covariables et du temps :

OF .t jZ/ D 1 OS.t jZ/ D 1 exp�

Oƒ0.t/ exp. O0Z/�

46


2.4.2 Bootstrap

Notre base de données finale d’analyse contient 2241 femmes sans cancer du sein et

527 femmes qui ont développé un cancer du sein pendant la période de suivi de 12 ans, ce

qui donne un taux de cancer du sein dans la base de 19,04%. Or dans la base de départ

du CRCDC-Oc, il y avait 33.858 femmes dont 896 ayant développé un cancer du sein

pendant le suivi, soit un taux de cancer de 2,65%. En effet, nous n’avons interrogé qu’un

sous échantillon tiré aléatoirement parmi les des 32.962 femmes n’ayant pas développé de

cancer.

Pour que le score de risque et les seuils estimés correspondent au taux de cancer

observé dans la base du CRCDC-Oc, nous avons utilisé une méthode de rééchantillon-

nage. Le boostrap est une méthode de rééchantillonnage qui consiste à créer de nouveaux

échantillons, uniquement par tirage avec remise, à partir de l’échantillon initial.

Nous avons ainsi créé 100.000 bases bootstrap de 33.858 "patientes bootstrap", conte-

nant chacune :

— 896 "cancer boostrap" i.e résultats du tirage aléatoire avec remise de 896 femmes

parmi les 527 femmes ayant développé un cancer du sein pendant le suivi et pour

lesquelles nous disposions de toutes les réponses pour toutes les variables d’intérêt.

— 32.962 "non cancer bootstrap" i.e résultats du tirage aléatoire avec remise de 32.962

femmes parmi les 2241 femmes n’ayant pas développé de cancer du sein pendant

le suivi et pour lesquelles nous disposions de toutes les réponses pour toutes les

variables d’intérêt.

Comme expliqué dans la partie 2.4.1, le modèle de Cox est un modèle semi-paramétrique

qui s’intéresse uniquement au rapport des risques instantanés d’événements pour deux in-

dividus exposés à des facteurs de risques différents. Les coefficients estimés ( O) à partir

de notre base de données restent donc identiques pour toutes les bases bootstrap. Cepen-

dant, l’estimateur de Breslow, défini dans l’équation 2.1, utilise le nombre d’événements à

chaque temps, ainsi que les valeurs des covariables des patients à risque à chaque temps.

Le ƒ0 a donc été recalculé grâce à l’estimateur de Breslow dans chacune des bases

bootstrap, pour chaque "patiente bootstrap". Cela a permis ensuite d’estrimer le risque

individuel de chaque "patiente bootstrap" en rapport avec le taux de cancer dans la base

du CRCDC-Oc. La moyenne du risque de l’ensemble des 100.000 bases bootstrap était de

2.65%, IC95% =[2,61 ;2,69]

2.4.3 Détermination des seuils de risque

2.4.3.1 Détermination de seuil à l’aide d’une courbe ROC

Nous avons créé dans la section 2.4.1 notre fonction de score de risque OF .t jZ/. Nous

notons pour la suite OF .Z/ D OF .12jZ/ puisque nous évaluons le risque à 12 ans. Un seuil

47

s définit une règle de classement de la forme :

(

OF .Zj / < s; l’individu est affecté à la classe non cancer du sein

OF .Zj / � s; l’individu est affecté à la classe cancer du sein

La courbe ROC (Receiver Operating Characteristic) est définie comme l’ensemble des

points .1 Sp.s/; Se.s// avec s 2 Œ0; 1�, avec :

— La sensibilité Se.s/ D P. OF .Zj / � sj la femme a développé un cancer du sein/

— La spécificité Sp.s/ D P. OF .Zj / < sj la femme n’a pas développé de cancer du sein/

Cette courbe ROC est très souvent utilisée pour déterminer un seuil optimal. Le critère

le plus utilisé est la recherche du seuil optimal s� qui minimise la distance euclidienne

entre la courbe ROC et le point de coordonnées (0,1) i.e. on cherche s� qui minimise

.1 Se.s//2 C .1 Sp.s//2. Cette méthode prend en compte le rapport entre le nombre

de faux positifs et de faux négatifs ainsi que P.M/ le pourcentage de malades dans la

population.

Une autre possibilité consiste à minimiser directement le nombre de mal classés F.s/

(faux positifs et faux négatifs) défini par F.s/ D N Œ.1 Se.s//P.M/C .1 Sp.s//P.M/�,

avec N le nombre de patients. Cette méthode prend en compte uniquement P.M/ et dé-

pend donc étroitement de la structure de la population à laquelle on l’applique. L’index

de Youden est un cas particulier de cette méthode dans le cas où P.M/ D 0:5.

Il est aussi possible de pondérer la sensibilité et la spécificité selon les coûts poten-

tiellement différents entre les faux positifs ou les faux négatifs. Dans ce cas, des poids q0

et q1 sont définis a priori et on cherche, par exemple, le seuil optimal s� qui maximise

q0Sp.s/C q1Se.s/.

Cependant, comme notre événement est binaire, développement ou non d’un cancer

du sein pendant la période de suivi ces différentes méthodes ne permettent de créer que 2

classes. Pour créer 3 classes, la méthode des surfaces ROC est utilisable, mais elle nécessite

une variable événement à 3 catégories. Il a donc fallu trouver une autre méthode pour

séparer notre population en trois groupes de risque gradué.

2.4.3.2 Détermination des seuils par rapport de risque

Nous avons fixé comme contrainte d’obtenir 3 groupes à risque gradué : risque "plus

bas", risque "moyen" et risque "plus élevé" avec un rapport de risque d’au moins 2 entre

les groupe connexes, à partir de notre variable événement binaire, diagnostic ou non

d’un cancer du sein. Ce choix d’un rapport de risque de 2 a été motivé par le fait que

2 � P.M/ > 5%, ce qui correspond à un niveau de risque proche de celui des femmes

déjà identifiées comme à risque élevé et non incluses dans le dépistage organisé du cancer

du sein. Nous avons ensuite décidé de conserver ce rapport en symétrie pour identifier le

groupe à plus bas risque.

48

Pour cela nous nous sommes intéressés aux rapports de vraisemblance (LR) définis

comme suit pour chaque seuil s :

LRC.s/ DSe.s/

1 Sp.s/LR .s/ D

1 Se.s/

Sp.s/

On veut donc que :

P.M jS > s/

P.M jS < s/� 2 8s � s1

et

P.M jS < s/

P.M jS > s/� 0:5 8s � s2

Ce qui revient à dire que l’on cherche s1 et s2 tels que :

— Les femmes avec un score supérieur à s1 ont 2 fois plus de risque de développer un

cancer que les femmes avec un score inférieur à s1

— Les femmes avec un score inférieur à s2 ont 2 fois moins de risque de développer un

cancer que les femmes avec un score supérieur à s2

Or, pour le seuil supérieur P.M jS < s/#P.M/. Dans ce premier cas on souhaite que,

8s � s1 :P.M jS > s/

P.M/� 2

De plus :

P.M jS > s/ DP.M \ S > s/

P.S > s/

DP.S > sjM/P.M/

P.S > s/

DP.S > sjM/P.M/

P.S > sjM/P.M/C P.S > sj NM/P. NM/

DSe.s/P.M/

Se.s/P.M/C .1 Sp.s//.1 P.M//

Donc

49

P.M jS > s/

P.M/D

Se.s/

Se.s/P.M/C .1 Sp.s//.1 P.M//

DSe.s/

P.M/.Se.s/C Sp.s/ 1/C .1 Sp.s//

D

Se.s/

1 Sp.s/

P.M/�

Se.s/

1 Sp.s/ 1

�

C 1

DLRC.s/

P.M/.LRC.s/ 1/C 1

Ainsi

P.M jS > s/

P.M/� 2

)LRC.s/

P.M/.LRC.s/ 1/C 1� 2

) LRC.s/ � 2 ŒP.M/.LRC.s/ 1/C 1� avec LRC croissante

) LRC.s/ � 2 ŒP.M/LRC.s/ P.M/C 1�

) LRC.s/ � 2P.M/LRC.s/C 2.1 P.M//

) LRC.s/ 2P.M/LRC.s/ � 2.1 P.M//

) LRC.s/ Œ1 2P.M/� � 2.1 P.M//

) LRC.s/ � 21 P.M/

1 2P.M/

Or ici P.M/ D 2:6%, et LRC est croissante, on cherche donc s1 tel que 8s � s1 :

LRC.s/ � LRC.s1/ D 2:054

Pour le seuil inférieur P.M jS > s/#P.M/. Dans ce second cas on souhaite que, 8s �

s2 :P.M jS < s/

P.M/� 0:5

50

Avec

P.M jS < s/ DP.S < sjM/P.M/

P.S < s/

DP.S < sjM/P.M/

P.S < sjM/P.M/C P.S < sj NM/P. NM/

D.1 Se.s//P.M/

.1 Se.s//P.M/C Sp.s/.1 P.M//

Donc

P.M jS < s/

P.M/D

.1 Se.s//

.1 Se.s//P.M/C Sp.s/.1 P.M//

D

1 Se.s/

Sp.s/

P.M/1 Se.s/

Sp.s/C .1 P.M//

DLR .s/

P.M/LR .s/C .1 P.M//

Ainsi

P.M jS < s/

P.M/� 0:5

)LR .s/

P.M/LR .s/C .1 P.M//� 0:5

) LR .s/ � 0:5 ŒP.M/LR .s/C .1 P.M//� avec LR croissante

) LR .s/ � 0:5P.M/LR .s/C 0:5.1 P.M//

) LR .s/ 0:5P.M/LR .s/ � 0:5.1 P.M//

) LR .s/ Œ1 0:5P.M/� � 0:5.1 P.M//

) LR .s/ � 0:51 P.M/

1 0:5P.M/

Or ici P.M/ D 2:6%, puisque LR est croissante, on cherche donc s2 tel que 8s � s2 :

LR .s/ � LR .s2/ D 0:493

Nos seuils optimaux sont alors définis comme suit :

— s1 est l’abscisse du point d’intersection de la droite y D 2:054 et de la courbe

représentative de la fonction LRC

51

Chapitre 2

2.5. Article 2 : Towards a personalized organized screening in breast cancer : practical

determination of thresholds of risk in women at average-risk

— s2 est l’abscisse du point d’intersection de la droite y D 0:493 et de la courbe

représentative de la fonction LR

Nous avons mis en œuvre cette approche dans l’article 2 présenté dans la partie 2.5 et

soumis à l’European Journal of Epidemiology.

2.5 Article 2 : Towards a personalized organized scree-

ning in breast cancer : practical determination of

thresholds of risk in women at average-risk

52

European Journal of Epidemiology manuscript No.(will be inserted by the editor)

Towards a personalized organized screening in breast

cancer : practical determination of thresholds of risk

in women at average-risk

Emmanuel Bonnet · Jean-Pierre

Daures · Paul Landais

Received: date / Accepted: date

Abstract In France, more than 10 million women at ”average” risk of breastcancer (BC), are included in the organized BC screening. Existing predictivemodels of BC risk are not adapted to the French population. Thus, we setup a new score in the French Herault region and looked for a graded levelof risk in a population at ”average” risk. We recruited a retrospective co-hort of women, aged 50 to 60, who underwent the organized BC screening,2241 non-cancer women and 527 who developed a BC during a 12-year follow-up period (2006-2018). The risk factors identified were high breast density(ACR BI-RADS grading)(B vs A: HR 1.41, 95%CI [1.05;1.9], p=0.023; C vs A:HR=1.65 [1.2;2.27], p=0.02 ; D vs A: HR=2.11 [1.25;3.58],p=0.006), a historyof maternal breast cancer (HR=1.61 [1.24;2.09], p < 0.001), and socioeconomicdifficulties (HR 1.23 [1.09;1.55], p=0.003). While early menopause (HR=0.36[0.13;0.99], p=0.003) and an age at menarche after 12 years (HR=0.77 [0.63;0.95],p=0.047) were protective factors. We identified 3 groups at risk: lower, aver-age, and higher, respectively. A low threshold was characterized at 1.9% ofrisk and a high threshold at 4.5%. Mean risks in the 3 groups of risk were1.37%, 2.68%, and 5.84%, respectively. Thus, 12% of women presented a levelof risk different from the average risk group, corresponding to 600,000 womeninvolved in the French organized BC screening, enabling to propose a newstrategy for performing an organized and personalized national BC screening.

E. BonnetMontpellier University, EA2415Institut Universitaire de recherche clinique, 34093 Montpellier CEDEX 5, FRANCETel.: +33 4 11 75 98 39E-mail: [email protected]

J.P. DauresMontpellier University EA2415 and Beau Soleil Clinic, Montpellier, France

P. LandaisMontpellier University, EA2415, France

Chapitre 2 2.5. Towards a personalized organized screening

53

2 Emmanuel Bonnet et al.

Keywords Breast cancer · Mass Screening · Scoring · Risk Factors · Breastdensity

1 Introduction

Breast cancer is the most common cancer in women and the leading cause ofdeath from cancer in women in France. The earlier breast cancer is detected,the greater the chances of recovery. For a breast cancer detected at an earlystage, 5-year survival is 99% and 27% for a metastatic cancer [1]. The differencein the detection stage also influences the cost of medical care [2]. While highrisk women [3] receive an individualized monitoring, the majority of womenare included in the nationwide organized breast cancer screening. In France,this screening consists of a mammography (2 views) every two years with adouble reading of mammographic images.

In April 2017, the Ministry of Health published an action plan to renovatethe organized breast cancer screening [4] based on the recommendations ofthe National Cancer Institute. This reorganization aimed to enable a morepersonalized organized screening, and encouraged research projects on toolsand methods for assessing the level of risk, including scoring. We wonderedwhether, among the 10 million women at ”average-risk” targeted for the Frenchorganized screening [5], some women would be at lower-risk than average riskand others at higher risk.

Several predictive models already exist, such as Gail [6], BRCAPRO [7],BODICEA [8], or Tyrer-Cuzick [9] models. But they are not adapted to theEuropean population [10] and most of them include BRCA1 and BRCA2 mu-tations. However, in France, women presenting with these mutations are notincluded in the organized nationwide screening. Moreover, it is difficult to es-timate an individual risk precisely [11,12]. We, therefore, intended to set upa new risk model, including new variables, adapted to the French population,and enabling to explore whether in this average risk population it was possibleto identify women at lower risk of breast cancer (or higher risk, respectively)in order to adapt the organized screening accordingly.

2 Material and methods

2.1 Database

We used the database of the Center that coordinates breast cancer screen-ing (CRCDC-OC) in the Herault department, France. This database registerswomen with precise, verified, and consolidated information on the developmentor not of breast cancer during the follow-up of their screening. We enrolledwomen, aged 50 to 60 at inclusion, who experienced a 12 year breast cancerscreening from 2006. We extracted a database of 33,858 women, including 805women who developed breast cancer in the 3-12 years following inclusion. We


54

Towards a personalized organized screening in breast cancer 3

excluded women who have been diagnosed with breast cancer during the firsttwo year of follow up to avoid including women who might have already de-veloped a cancer but not yet diagnosed at inclusion. Data were recovered by atelephone survey. It was not possible to interview 33,000 women. Therefore, onone hand we selected all the women who developed a breast cancer during thefollow-up period, and on the other hand, we randomly selected 5000 womenamong those who did not develop breast cancer.

2.2 Power

We hypothesized a risk ratio ≥ 2, with a ratio of 4 to 1 between women whodid not develop breast cancer and those who did during the follow-up period,together with a power of 80%. It gave a required sample size of 543 womenwith breast cancer during the follow-up period and 2174 women without breastcancer.

2.3 Study questionnaire

We looked for the risk factors for breast cancer in average risk women reportedin the literature prior to creating the study questionnaire. We elaborated atext mining algorithm-assisted search (TEMAS) (available at shiny.temas-bonnet.site) to optimize the bibliographic search on very large volumes ofscientific articles. We also included the variables used in Gail’s model relativeto women at average risk. The study questionnaire is provided in Supplemen-tary material 1.

2.4 Data collection

Data were collected by telephone by two trained investigators. Responses tothe study questionnaire were collected between November 2018 and May 2019.The inclusion flowchart is shown on Figure 1. This survey enabled collecting3077 questionnaires, including 597 questionnaires from women who developeda breast cancer. The Case Report Form was developed with the ACCESSsoftware. The analysis database was provided by the CRCDC-OC while havinganonymized the data beforehand. This study complies the MR003 CNIL’s(National Commission for Informatics and Liberties) methodology (declarationn◦: 2200402v0).

2.5 Statistical analysis

Data management was performed on the fly. Statistical analysis was carriedout using R software [13].


55


Fig. 1 Flowchart of the retrospective cohort study

2.5.1 Descriptive analysis

We discretized the variables according to the literature. For duration of oralcontraceptive [14] we retained the classes 0, [1,5[, [5,10[, and ≥ 10. For thenumber of biopsies, we complied with Gail’s classes [6] 0, 1, ≥ 2. Breast den-sity was categorized in four classes according to the ACR BI-RADS (BreastImaging-Reporting and Data System) Atlas [15] : A: the breasts are almostentirely fatty, B: there are scattered areas of fibroglandular density, C: the


56

breasts are heterogeneously dense, and D: the breasts are extremely dense.For the number of hours per week of physical activity we followed WHO rec-ommendations [16] : 0, [1-5[, and ≥ 5. Alcohol consumption (in glasses perweek) was discretized as follows : 0, [1,10[, ≥ 10 according to Sante PubliqueFrance (French public health service) recommendations [17]. Age at menarchewas divided as follow: < 12 and ≥ 12 [18]. Age at first live birth was dis-cretized in 3 classes [19]: nulliparous, ≤ 24, and > 24. For number of children,we used the average number of children in the generation of interest [20]: 0,[1,3[, and ≥ 3. For the total duration of breastfeeding we followed WHO andUNICEF recommendations [21]: 0, [1,7[, and ≥ 7. For age at menopause, wechose to use the definition of the national college of French gynecologists ofearly menopause [22]: < 41, ≥ 41. We created a socio-economic hardship bi-nary variable that was yes if the interviewee presented financial hardship and /or was not going on vacation. We also studied overweight (i.e. BMIgeq25). Wealso studied detailed family history of breast cancer : mother, sister, daugh-ter, and aunt. All the quantitative variables were also tested as continuousvariables.

2.5.2 Cox model

All variables with a P-value <0.2 in the univariate models were included ina global Cox model, then a selection with the stepwise Akaike informationcriterion (AIC) was achieved (age was forced in the model) in order to ob-

tain β, the estimated coefficients in the Cox proportional hazards regressionmodel. Schoenfeld residuals were computed to check the proportional hazardsassumption.

For our work, the goal was to define an individual score. We used Breslow’sestimator [23] defined below to estimate Λ0, the cumulative base-line hazardfunction.

Λ0(t) =∑

i:Ti6t

di∑

j∈R(Ti)

exp(β′Zj)(1)

with:

– T1 < T2 < · · · < T12 distinct times to event– di the number of events detected at Ti

– Zi the value of the covariates for patient i– R(Ti) all individuals still at risk at time T−

i (just before Ti)

– β the estimated coefficients in the Cox proportional hazards regressionmodel

This makes it possible to estimate a risk function F for each individual asa function of its covariates :

F (t|Z) = 1− S(t|Z) = 1− exp(

−Λ0(t) exp(β′Z)

)


57

2.5.3 Bootstrap

As we performed a random sampling among women without breast cancer, inour database, the distribution between women who developed cancer and theothers was no longer representative of the starting population. To remedy this,we performed bootstrap samples. We computed 100,000 bootstrap databasesusing stratified random sampling with replacement among women for whomwe got answers to all the covariates of the final Cox model, in order to obtaina database close to the initial CRCDC database (i.e. 33,858 women including896 who developed cancer in the 12 years of follow-up)

In these databases, mean risk was identical to the one of the CRCDCdatabase. Then, we estimated, for each woman, Λ0(t) as described in 2.5.2,and calculated F (12|Z) to obtain a risk score of developing a breast cancerduring the follow-up period.

2.5.4 Thresholds and likelihood ratios

In France, the target population of the organized screening is greater than 10million women. Current participation is about 50% [5] with a target of 65%.The number of women actually screened is therefore between 5 and 6.5 million.Among women at average risk, we hypothesized that 10% of them might have arisk different from the “average” risk, either lower or higher. This average riskwas 2.6% in this population. From a clinical standpoint, we looked for womenthat might have a risk two times lower than this “average” risk. This groupwas considered as at lower-risk. We also looked for women that might have arisk two times higher than this “average” risk. This group was considered asat higher-risk. We then searched for the corresponding thresholds. We lookedfor s1, the higher risk threshold, and s2, the lower risk threshold, such as:

P(M |R > s)

P(M |R < s)≥ 2 ∀s ≥ s1

and

P(M |R < s)

P(M |R > s)≤ 0.5 ∀s ≤ s2

with:

– M represents the event ”developing a breast cancer during the 12 yearsfollow-up”

– R represents the risk as determined by the score

This method then enabled to delineate 3 groups according to the level ofrisk: lower risk, average risk, and higher risk.

Given that, for each threshold s :

LR + (s) =sensitivity(s)

1− specificity(s)LR− (s) =

1− sensitivity(s)

specificity(s)


58


We were able to plot the curves LR+(s) and LR-(s), then to find s1 ands2 according to the calculations available in Supplementary material 2.

Women with a risk score below the ”low threshold” will be considered atlower risk of breast cancer, women with a risk level between ”low threshold”and ”high threshold” will be considered at average risk and finally, womenwith a risk score above the ”high threshold” will be considered at higher riskof breast cancer in this population of women at average risk.

3 Results

3.1 Descriptive analysis

Descriptive analysis results are presented in Table 1. The significant risk fac-tors in the univariate analysis were the number of biopsies (p = 0.042), breastdensity (p = 0.012), total duration of breastfeeding (p = 0.026), financial dif-ficulties and/or not going on vacation (p = 0.002), mother history of breastcancer (p = 0.001), and aunt history of breast cancer(p = 0.029).

We explored whether a selection bias might have occurred. We analyzedthe distribution of the two variables (age and breast density) we had for womenwho could not be reached or who refused to answer our questions. The resultsare shown in Tables 3 and 4. The only significant difference was observed forbreast density of women who did not develop cancer during follow-up. Thewomen who responded to the survey had significantly higher breast density onaverage. So the women selected, in the group without breast cancer, were moreat risk than those who did not respond. Under these conservative conditions,we considered that the selection bias would not penalize the analysis. More-over, we did not retained the fact that breast density as such was a criterionof choice for women to answer the questionnaire or not.

3.2 Cox model

The final model, presented in Table 2.5.2, has been built with 2768 women forwhom we had answers to all the covariates of the model, including 2241 non-cancer patients and 527 women who developed cancer during the follow-upperiod.

The risk factors selected by the model were high breast density (ACRBI-RADS grading) (B vs A: HR 1.41, 95% CI [1.05;1.9], p=0.023; C vs A:HR=1.65, 95% CI [1.2;2.27], p=0.02 ; D vs A: HR=2.11, CI [1.25;3.58],p=0.006),history of maternal breast cancer (HR=1.61, 95% CI [1.24;2.09], p ¡ 0.001), andsocioeconomic difficulties (HR 1.23, 95% CI [1.09;1.55], p=0.003). While earlymenopause (HR=0.36, 95% CI [0.13;0.99], p=0.003) and an age at menar-che after 12 years (HR=0.77, 95% CI [0.63;0.95], p=0.047). Age (HR 1.024,95% CI [0.997;1.05], p = 0.084) and interaction between age at menarche andmenopause (HR 2.73, 95% CI [0.91;8.19], p = 0.073) have been forced in themodel.


59


3.3 Bootstrap

We computed 100,000 bootstrap databases using random sampling with re-placement among the 2768 women for whom we had answers to all the covari-ates of the final Cox model as described in 2.5.3. F (t) has been calculated forall women in the 100,000 databases.

3.4 Threshold determination

Pre-test prevalence of the disease was the percentage of breast cancer in theCRCDC-OC database i.e. 2.6%.

So, as described in Supplementary material 2, we looked for :

– s1 such as ∀s ≥ s1 :

LR+(s) ≥ 2.054

– s2 such as ∀s ≤ s2 :

LR−(s) ≤ 0.493

We then plot LR+(s) and LR-(s) in Figure 2. This method gave a lowthreshold at 1.9% and a high threshold at 4.5%.

0.00 0.01 0.02 0.03 0.04 0.05

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Risk score

LR

Lower risk Average risk Higher risk

0.019

LR+

LR−

0.045

LR − (s2) = 0.493

LR + (s1) = 2.054

Fig. 2 Determination of the three risk groups with optimal thresholds in women at averagerisk of breast cancer: Lower risk, Average risk and Higher risk.


60


On the one hand 9.39% of women presented a risk lower than 1.9%. Amongthis group, the mean risk was 1.37%. On the other hand, 2.85% of womenpresented a risk higher than 4.5% with a mean risk of 5.84%.

4 Discussion

In a retrospective cohort of French women, aged 50 to 60, at average risk ofbreast cancer and followed-up for 12 years, from 2006 to 2018, who underwentthe organized screening, we identified two additional levels of risk, either twicelower than the average-risk, or twice higher.

Our partition into three groups at risk of breast cancer in these womenshowed that the level of risk in the so-called “average-risk” population is nothomogeneous.

Depending on the presence or level of clinical, genetic or biological mark-ers, treatment strategies are personalized to be as effective as possible. Thisapproach can be extended to public health and in particular for primary andsecondary prevention. Indeed, offering personalized prevention strategies couldincrease adherence to the program, and its the effectiveness as well. Thus,identifying new groups at risk among women at average risk offers new per-spectives.

Women’s wishes for a more personalized organized screening [24] thereforeappears attainable. Indeed, women interviewed during the French citizen con-sultation on breast cancer screening in 2017, expected the organized screeningto be more targeted, thanks to advances in research, and to become morepersonalized according to individual risk factors. Physicians, also consulted,mentioned the value of determining the level of individual risk [25,26]. Theyalso ask for a score in French women in order to substantiate the levels of risk.This risk assessment would be used by the screening management structures tooffer a personalized approach of the organized screening, adapted to the levelof risk of each woman. We developed a user friendly computer application inthis perspective.

The value of the organized screening is undeniable. In France, it enables aregular follow-up of invited women, aged 50 to 74, to carry out a free mam-mogram in the radiology office of their choice every two years. Moreover, ifthe initial assessment does not present any anomaly, a second reading of themammography is systematically performed by another radiologist. The sec-ond independent reading is a guarantee of reliability, with around 8% of thebreast cancers detected by the independent second reading[27,28]. The prin-ciple of an organized breast cancer screening has been challenged given theclaims of over-diagnosis or over-treatment. A reassessment of the organizedscreening that would allow a personalization according to the individual levelof risk, would enable a standardized and quality monitoring of women whileproducing evaluative data.

We identified three levels of risk with a risk ratio of at least 2 between thegroup at average risk and the group at lower risk on the one hand, and with the


61


group at higher risk on the other hand. It lead to fairly large group sizes. Thechoice of a risk divided or multiplied by a factor two remains arbitrary thoughclinically relevant. Moreover, sensitivity studies could be carried out by varyingthis ratio according to the beliefs of epidemiologists or physicians specializedin this field. Furthermore, the choice could be supported by a cost functiontaking into account monetary cost and years of life adjusted for quality of life.

According to the two thresholds of lower and higher risk, 12% of womenwere thus identified, corresponding to more than 600,000 women in the Frenchpopulation. In this population at ”average” risk of breast cancer, a new screen-ing strategy could therefore be proposed for the groups at lower or higher risk.For the group at average-risk of breast cancer risk, corresponding to 88% ofthe presently screened population, the current screening rate, i.e. a mammo-gram every two years, is maintained as such. For the group at lower risk thatincludes 9.4% of the screened population with a level of risk of 1.37% of de-veloping breast cancer within the 12 years of follow-up, we propose to reducethe frequency of mammograms with a mammogram every three or more yearsinstead of the current two years. This spacing of mammograms could reducethe risk of over-diagnosis while maintaining a regular follow-up. Moreover, itwould reduce, the risk of radiation-induced cancer, even if already very low.However, the risk of interval cancer can’t be excluded. This proposal will haveto be tested with an appropriate design. For the 2.9% of women at higher risk,increasing the frequency of mammograms is possible. However, high breastdensity is an important risk factor of this group. The distribution of high orvery high breast density the group at higher risk was 60% compared to 28%in the group at average risk and 3% in the group at lower risk. Furthermore,the sensitivity of the mammogram falls to 63% [29] for very dense breasts.The sensitivity of cancer detection via a mammogram appeared inversely pro-portional to the breast density [30]. Thus, it did not seem that increasing thefrequency of mammograms would bring a significant benefit. Based on recentresults, an alternative would be to offer these women an abbreviated breastMRI, which is associated with a higher rate of breast cancer detection amongwomen with dense breasts [31], while maintaining a two year frequency forsurveillance. This would bring the care strategy for these women at higher-risk closer to the strategy applied to women already identified at high risk(i.e. different from women here identified in the “average” risk population),and benefiting from an individual follow-up [3].

Such a personalization of the organized screening might make women feel-ing more involved. For women at lower-risk, a mammogram every three yearsmay be less binding than every two years. For women at higher risk, offering amore specific examination may increase adhesion to the organized screening.These strategies could then increase the participation of French women to theorganized screening which is currently around 50% [5], though the target is65%.

As previously stated, the cost of the case finding (including diagnosis andtreatment of patients diagnosed) should be economically balanced in relationto possible expenditure on medical care as a whole [32]. An analysis of the


62


medico-economic gain must be carried out taking into account, in particular,interval cancers, over-diagnosis and the stage of the cancers detected.

This study proposes a new approach to personalize the organized screeningby stratifying the risk within the population at ”average” risk and currentlyconsidered as homogeneous. However, the results of this study remain to beconfirmed in other contexts to validate whether the proposed strategy couldbe implemented. Six variables were included in the score of breast cancer risk,two of which (age and breast density) are already included in the databasesof the different French CRCDCs. Only four more questions are required to beasked in order to establish the score. This information should be made easierand simpler to collect. Moreover, an internal validation on another database,for example of women aged 40 to 50 in the Herault department, could beperformed. A double external geographic validation could be carried out onthe databases from other CDCRCs and a database from another country [33].

We collected information of interest for the creation of the score at t0, theinclusion time. But, over time, the value of variables retained at t0 may changeand the risk at t0 + x years may also change. It will therefore be necessary toregularly recalculate women risk of breast cancer in order to assign them tothe appropriate group of risk.

A re-evaluation of the score will also be necessary on regular time intervalsto reappraise the coefficients estimated by the model in order to evaluatewhether they would need to be adjusted.

Our proposal for a new personalized organized screening must be validatedby other studies.

5 Declarations

5.1 Funding

This work was funded by the University of Montpellier and Nımes UniversityHospital which supported Emmanuel Bonnet PhD thesis. A complementaryfunding was obtained from the Fondation de l’Avenir for the implementationof the telephone survey.

5.2 Conflicts of interest/Competing interests

The authors declare that they have no conflict of interest.

5.3 Ethics approval

We obtained the authorization n◦: 2200402v0 from the CNIL’s (National Com-mission for Informatics and Liberties).


63


5.4 Consent to participate

Verbal informed consent was obtained from all individual participants prior tothe telephone interview.

5.5 Consent for publication

Not applicable

5.6 Availability of data and material

The data used and analyzed during the current study are available from thecorresponding author (EB).

5.7 Code availability

The code used during the current study is available from the correspondingauthor (EB).

5.8 Authors’ contributions

All authors contributed to the study conception and design. Material prepara-tion and analysis were performed by Emmanuel Bonnet. The first draft of themanuscript was written by Emmanuel Bonnet and all authors commented onprevious versions of the manuscript. All authors read and approved the finalmanuscript.

References

1. D. Lefeuvre, N. Catajar, C. Le, B. Benjamin, and P. J. Bousquet, “Depistage du cancerdu sein : impact sur les trajectoires de soins,” tech. rep., Institut national du cancer,2018.

2. N. Jay, G. Nuemi, M. Gadreau, and C. Quantin, “A data mining approach for groupingand analyzing trajectories of care using claim data: The example of breast cancer,”BMC Medical Informatics and Decision Making, vol. 13, nov 2013.

3. Haute Autorite de Sante, “Depistage du cancer du sein en France : identification desfemmes a haut risque et modalites de depistage,” tech. rep., HAS, 2014.

4. INCa, “Plan d’action pour la renovation du depistage organise du cancer du sein,” tech.rep., Institut National du Cancer, 2017.

5. Sante publique France, “Taux de participation au programme de depistage organise ducancer du sein 2018-2019 et evolution depuis 2005,” tech. rep., Sante publique France,2020.

6. M. H. Gail, L. A. Brinton, D. P. Byar, D. K. Corle, S. B. Green, C. Schairer, and J. J.Mulvihill, “Projecting individualized probabilities of developing breast cancer for whitefemales who are being examined annually,” Journal of the National Cancer Institute,vol. 81, pp. 1879–1886, dec 1989.


64


7. D. M. Euhus, K. C. Smith, L. Robinson, A. Stucky, O. I. Olopade, S. Cummings,J. E. Garber, A. Chittenden, G. B. Mills, P. Rieger, and Others, “Pretest prediction ofBRCA1 or BRCA2 mutation by risk counselors and the computer model BRCAPRO,”Journal of the National Cancer Institute, vol. 94, no. 11, pp. 844–851, 2002.

8. A. C. Antoniou, P. P. Pharoah, P. Smith, and D. F. Easton, “The BOADICEA model ofgenetic susceptibility to breast and ovarian cancer,” British Journal of Cancer, vol. 91,pp. 1580–1590, oct 2004.

9. J. Tyrer, S. W. Duffy, and J. Cuzick, “A breast cancer prediction model incorporatingfamilial and personal risk factors,” Statistics in Medicine, vol. 23, pp. 1111–1130, apr2004.

10. R. Pastor-Barriuso, N. Ascunce, M. Ederra, N. Erdozain, A. Murillo, J. E. Ales-Martınez, and M. Pollan, “Recalibration of the Gail model for predicting invasive breastcancer risk in Spanish women: A population-based cohort study,” Breast Cancer Re-

search and Treatment, vol. 138, pp. 249–259, feb 2013.11. A. McTiernan, M. A. Gilligan, and C. Redmond, “Assessing individual risk for breast

cancer: Risky business,” Journal of Clinical Epidemiology, vol. 50, pp. 547–556, may1997.

12. X. Wang, Y. Huang, L. Li, H. Dai, F. Song, and K. Chen, “Assessment of performance ofthe Gail model for predicting breast cancer risk: A systematic review and meta-analysiswith trial sequential analysis,” Breast Cancer Research, vol. 20, mar 2018.

13. R Core Team, R: A Language and Environment for Statistical Computing. R Founda-tion for Statistical Computing, Vienna, Austria, 2019.

14. L. S. Mørch, C. W. Skovlund, P. C. Hannaford, L. Iversen, S. Fielding, and Ø. Lidegaard,“Contemporary hormonal contraception and the risk of breast cancer,” New England

Journal of Medicine, vol. 377, pp. 2228–2239, dec 2017.15. C. J. D’Orsi and Acr, 2013 ACR BI-RADS Atlas: Breast Imaging Reporting and Data

System. American College of Radiology, 2014.16. WHO, Global recommendations on physical activity for health. WHO Library

Cataloguing-in-Publication Data, 2010.17. Sante publique France (MILDECA), “Nouvelles recommandations sur l’alimentation, y

compris l’alcool, l’activite physique et la sedentarite — Mildeca,” jan 2019.18. L. Lalys and J. C. Pineau, “Age at menarche in a group of French schoolgirls,” Pediatrics

International, vol. 56, no. 4, pp. 601–604, 2014.19. S. Volant, “Un premier enfant a 28,5 ans en 2015 : 4,5 ans plus tard qu’en 1974,” INSEE

Premiere, vol. 1642, pp. 2015–2018, mar 2017.20. I. Robert-Bobee, “2,1 enfants par femme pour les generations nees entre 1947 et 1963,”

Insee Focus, vol. Avril, apr 2015.21. WHO, “Planning Guide for national implementation of the Global Strategy for Infant

and Young Child Feeding,” tech. rep., WHO, 2007.22. College National des Gynecologues et Obstetriciens Francais, “La menopause.”23. BRESLOW and N. E., “Contribution to discussion of paper by D. R. Cox,” J. Roy.

Statist. Soc., Ser. B, vol. 34, pp. 216–217, 1972.24. INCa, “Avis de la conference des citoyennes,” tech. rep., INCa, 2016.25. INCa, “Avis de la conference des professionnels,” tech. rep., INCa, 2016.26. C. Cases, M. Di Palma, E. Drahi, S. Fainzang, P. Landais, S. De Montgolfier, F. Paccaud,

J.-P. Riviere, and D. Thouvenin, “Rapport du comite d’orientation sur le depistage ducancer du sein,” tech. rep., INCA, 2016.

27. I. Doutriaux-Dumoulin, A. Allioux, L. Campion, P. Meingan, and L. Molina, “Cancersdetectes par le deuxieme lecteur : analyse des donnees de la campagne de depistage ducancer du sein en Loire-Atlantique, 2003-2005 (nouveau cahier des charges),” Journal

de radiologie, vol. 1137, no. 12, pp. 1839–1910, 2007.28. E. L. Thurfjell, K. A. Lernevall, and A. A. Taube, “Benefit of independent double

reading in a population-based mammography screening program,” Radiology, vol. 191,pp. 241–244, apr 1994.

29. P. A. Carney, D. L. Miglioretti, B. C. Yankaskas, K. Kerlikowske, R. Rosenberg,C. M. Rutter, B. M. Geller, L. A. Abraham, S. H. Taplin, M. Dignan, G. Cutter,and R. Ballard-Barbash, “Individual and combined effects of age, breast density, andhormone replacement therapy use on the accuracy of screening mammography,” Annals

of Internal Medicine, vol. 138, pp. 168–175, feb 2003.


65


30. J. S. Drukteinis, B. P. Mooney, C. I. Flowers, and R. A. Gatenby, “Beyond mammog-raphy: New frontiers in breast cancer screening,” jun 2013.

31. C. E. Comstock, C. Gatsonis, G. M. Newstead, B. S. Snyder, I. F. Gareen, J. T. Bergin,H. Rahbar, J. S. Sung, C. Jacobs, J. A. Harvey, M. H. Nicholson, R. C. Ward, J. Holt,A. Prather, K. D. Miller, M. D. Schnall, and C. K. Kuhl, “Comparison of AbbreviatedBreast MRI vs Digital Breast Tomosynthesis for Breast Cancer Detection amongWomenwith Dense Breasts Undergoing Screening,” JAMA - Journal of the American Medical

Association, vol. 323, no. 8, pp. 746–756, 2020.32. J. M. G. Wilson, G. Jungner, and World Health Organization, “Principles and practice

of screening for disease,” Public health papers, vol. 34, 1968.33. G. Iatrakis, J. P. Daures, N. Geahchan, T. Maudelonde, A. Bothou, C. Chraibi, O. Omar,

S. Voiculescu, E. Antoniou, T. Youseff, P. Tsikouras, G. Galazios, A. Chalazonitis, andS. Zervoudis, “Manosmed University’s Risk factor calculator for female breast cancer:Preliminary data,” Review of Clinical Pharmacology and Pharmacokinetics, Interna-

tional Edition, vol. 32, no. 1, pp. 23–27, 2018.


66

Table 1 Descriptive analysis

Breast cancerNon (%)

Yesn (%)

P-value

Age (years) [49,53[ 626 (25.2) 156 (26.1) 0.164[53,57[ 915 (36.9) 196 (32.8)[57,60] 939 (37.9) 245 (41.0)

Oral contraceptive use No 605 (24.5) 165 (27.7) 0.111Yes 1867 (75.5) 430 (72.3)

Contraceptive pill duration of use 0 605 (24.8) 165 (28.5) 0.156(years) [1,5[ 520 (21.3) 127 (21.9)

[5,10[ 327 (13.4) 80 (13.8)≥ 10 986 (40.4) 207 (35.8)

Number of breast biopsies 0 2182 (88.8) 506 (86.2) 0.0421 231 (9.4) 61 (10.4)≥ 2 45 (1.8) 20 (3.4)

Breast density A 338 (13.6) 60 (10.1) 0.012(BI-RADS classification) B 1483 (59.8) 348 (58.3)

C 604 (24.4) 168 (28.1)D 55 (2.2) 21 (3.5)

Physical activity 0 972 (39.3) 225 (37.7) 0.093(hours/week) [1,5[ 1185 (47.9) 275 (46.1)

≥ 5 318 (12.8) 97 (16.2)Alcohol consumption 0 1329 (53.6) 344 (57.7) 0.16(glasses/week) [1,10[ 920 (37.1) 197 (33.1)

≥ 10 229 (9.2) 55 (9.2)Age at menarche (years) <12 483 (20.1) 133 (23.1) 0.121

≥ 12 1919 (79.9) 442 (76.9)Age at first live birth (years) ≤ 24 1361 (55.0) 334 (56.4) 0.226

>24 911 (36.8) 200 (33.8)nulliparous 201 (8.1) 58 (9.8)

Number of children 0 245 (9.9) 72 (12.1) 0.068[1,3[ 1694 (68.3) 379 (63.5)≥ 3 541 (21.8) 146 (24.5)

Total duration of breastfeeding 0 1383 (55.8) 343 (57.6) 0.026(months) [1,7[ 739 (29.8) 149 (25.0)

≥ 7 355 (14.3) 104 (17.4)Overweight (BMI ≥ 25) No 2120 (87.7) 488 (85.2) 0.122

Yes 298 (12.3) 85 (14.8)Financial difficulties and/or not going on vacation No 1518 (61.9) 323 (54.8) 0.002

Yes 935 (38.1) 266 (45.2)Age at menopause (years) ≥ 41 2229 (94.1) 532 (95.3) 0.311

< 41 139 (5.9) 26 (4.7)Mother history of breast cancer No 2251 (92.4) 517 (87.9) 0.001

Yes 184 (7.6) 71 (12.1)Sister history of breast cancer No 2261 (91.6) 536 (90.1) 0.268

Yes 207 (8.4) 59 (9.9)Daughter history of breast cancer No 2451 (99.1) 588 (98.7) 0.499

Yes 23 (0.9) 8 (1.3)Aunt history of breast cancer No 2271 (93.9) 521 (91.2) 0.029

Yes 148 (6.1) 50 (8.8)


67

Table 3 Distribution of breast density according to the answer or not to the study ques-tionnaire

Breastdensity

Unreachableor refusal

Answers P-value

Nobreastcancer

A 17.2% 13.7%

0.0007B 58.1% 59.8%C 21.7% 24.3%D 3.0% 2.2%

Breastcancer

A 11.7% 10.5%

0.6035B 61.4% 58.1%C 23.3% 28.0%D 3.6% 3.4%

Table 4 Distribution of age according to the answer or not to the study questionnaire

Unreachableor refusal

Answers P-value

mean (sd) mean(sd)

No breast cancer 55 (3.2) 55.1 (3.2) 0.384Breast cancer 55.7 (3.1) 55.3 (3.3) 0.125

Table 5 Cox model for the occurrence of breast cancer

Hazard Ratio 95%CI P-value

Age 1.024 0.997-1.05 0.084Breast density A - - -

B 1.411 1.05-1.9 0.023C 1.651 1.2-2.27 0.002D 2.112 1.25-3.58 0.006

Mother history of breast cancer No - - -Yes 1.612 1.24-2.09 < 0.001

Financial difficulties and/or not going on vacation No - - -Yes 1.299 1.09-1.55 0.003

Age at menopause ≥ 41 - - -< 41 0.364 0.13-0.99 0.047

Age at menarche < 12 - - -≥ 12 0.774 0.63-0.95 0.015

Age at menopause < 41 and Age at menarche ≥ 12 2.73 0.91-8.19 0.073


68


2.6 Discussion

Dans cette cohorte rétrospective de femmes françaises âgées de 50 à 60 ans, incluses

dans le dépistage organisé du cancer du sein, et donc considérées actuellement de façon

homogène à risque "moyen" de cancer du sein, nous avons identifié deux sous-groupes à

niveaux de risque différents à 12 ans de suivi : un plus bas risque deux fois moins élevé

qui concerne 9,37% des femmes incluses dans le dépistage organisé avec un risque moyen

à 1,37% ; et un plus haut risque, multiplié par deux par rapport au risque moyen, qui

concerne 2,85% des femmes participant au dépistage organisé, avec un risque moyen de

5,84%. Le choix choix d’un facteur multiplicatif de 2 (ou de 0.5) a été retenu comme étant

le rapport de risque maximum tel qu’au moins 10% de la population ciblée par le dépis-

tage organisé soit incluse dans les groupes à plus haut ou à plus bas risque pour que les

proposition de nouvelles stratégies de dépistage soient réalisables et acceptables tant d’un

point de vue de la prise en compte du risque de cancer du sein et des conséquences pos-

sibles en termes de changement d’algorithme décisionnel, d’acceptabilité par les femmes

dépistées, de faisabilité sur le plan organisationnel et de médico-économiquement perti-

nent. Le groupe à risque moyen reste celui avec le plus grand effectif, soit 87,8% de la

population des femmes dépistées. Leur risque moyen était de 2,68%, soit très proche du

risque moyen observé dans la base totale du CRCDC-Oc qui est de 2,65%. Notre score

a identifié 12,2% des femmes comme ayant un risque différent du risque moyen, ce qui

représente plus de 600.000 femmes dans la population des femmes françaises qui adhèrent

actuellement au dépistage organisé du cancer du sein. Si la France atteint son objectif de

65% d’adhésion au dépistage organisé, cela correspondra à près de 800.000 femmes.

Nous avons proposé un contrôle d’un éventuel biais de sélection relatif au recrutement

des femmes qui ont présenté un cancer du sein et des femmes contrôles qui n’en ont

pas présenté. L’approche que nous avons réalisée n’est cependant pas optimale car elle

est liée à la nature des informations dont nous disposions. En effet, pour les contrôles

qui n’ont pas participé à l’étude (absence de réponse après 5 relances téléphoniques,

numéro de téléphone non attribué ou encore refus de réponse) nous n’avons pu recueillir

aucune information supplémentaire. Les seules informations disponibles étaient celles déjà

présentes dans la base de données du CRCDC, soient l’âge et la densité mammaire. C’est

la raison pour laquelle la comparaison n’a pu être réalisée que sur ces deux variables

afin d’évaluer un éventuel biais de sélection. Ainsi, n’avons pas la possibilité d’écarter

totalement un biais de sélection éventuel sur les autres variables.

La création de ce score avait pour but de répondre à la demande à la fois des citoyennes

et des médecins, dont les recommandations avaient été recueillies lors des concertations

citoyennes et professionnelles [14, 15]. La demande principale était de rendre le dépistage

organisé du cancer du sein plus personnalisé. Nous avons donc, grâce à l’identification par

notre score de trois groupes à risque gradué, proposé une nouvelle stratégie personnalisée

69


pour le dépistage organisé du cancer du sein. Pour les 88% des femmes dans le groupe

à risque moyen, étant donné que leur score de risque est identique au taux de cancer

observé dans la base du CRCDC-Oc, un dépistage par mammographie tous les deux ans

serait maintenu. Pour les femmes à plus bas risque nous proposons une mammographie

tous les 3 ans. Cet espacement des mammographies pourrait permettre de réduire le sur-

diagnostic et le sur-traitement mais aussi, même si il est très faible le risque de cancer

radio- induit. Pour les femmes à plus haut risque, qui ont en moyenne des seins plus

denses, nous ne pensons pas qu’augmenter la fréquence des mammographies soit adapté.

En effet, la sensibilité de la mammographie est inversement proportionnelle à la densité

mammaire [9] et chute à 63% pour les seins très denses [4]. Selon des résultats récents,

l’IRM mammaire abrégée semble être associée à une meilleure détection des cancers pour

les femmes présentant des seins denses [7]. Nous proposons donc pour le groupe des femmes

à haut risque une IRM mammaire abrégée tous les deux ans.

L’étude internationale My Personalized Breast Screening (MyPeBS) est actuellement

en cours de recrutement. Son but est aussi d’étudier l’intérêt d’une stratégie de dépistage

organisé basée sur le risque. Cependant le score de risque utilisé est le score MammoRisk R .

Il s’agit d’une méthode propriétaire de calcul de risque brevetée. Nous n’avons donc pas

pu l’utiliser sur notre base de données pour comparer les résultats. De plus MammoRisk R

nécessite, en plus de trois questions (âge, historique familial de cancer du sein, antécé-

dent de biopsie) et de la densité mammaire (selon la classification BI-RADS), un score

polygénique qui résulte de la combinaison de plusieurs SNPs suite à une analyse de prélè-

vement salivaire. Le coût d’un tel test est donc beaucoup plus important et semble difficile

à proposer aux 10 millions de femmes incluses dans le dépistage organisé du cancer du

sein. Notre score nécessite seulement la réponse à 6 questions simples dont deux (âge

et densité mammaire) sont déjà collectées par les structures de dépistage. Il suffit donc

d’obtenir la réponse à 4 questions complémentaires pour les femmes concernées : leur âge

aux premières règles, leur âge à la ménopause, la présence ou non de cancer du sein chez

leur mère et la présence ou non de difficultés socio-économiques.

Notre travail a pour originalité de se fonder sur des variables que l’on peut facile-

ment recueillir. Dans notre proposition, contrairement aux modèles développés notam-

ment outre Atlantique, et dans l’essai européen en cours, nous n’avons pas retenu la

mesure de variables génétiques, (hors recherche ciblée de BRCA1-BRCA2), pour deux

raisons principales : elles ont un coût souvent élevé et leur faisabilité à grande échelle est

peu compatible avec l’accès à la prévention de toutes. Nous proposons chez des femmes

à risque "moyen", entre 50 et 70 ans, de personnaliser le dépistage organisé à partir d’un

modèle fondé sur 6 variables et conduisant à trois stratégies définies selon le niveau de

risque de cancer du sein des femmes de 50 à 74 ans. Bien entendu, cette stratégie de

dépistage personnalisée à partir de deux seuils, l’un bas, l’autre élevé, doit être validée

par des études complémentaires adaptées.

70

Conclusion

L’objectif principal de ce travail de thèse a été de créer un nouveau score de risque du

cancer du sein, adapté à la population de femmes françaises à risque "moyen" de cancer

du sein incluses dans le dépistage organisé, afin de pouvoir proposer une nouvelle stratégie

en la personnalisant grâce à la détermination de 3 groupes de risque gradué.

Dans une première partie, afin de pouvoir gérer l’énorme quantité d’articles dédiés à la

recherche de facteurs de risque de cancer du sein disponibles dans la littérature scientifique,

nous avons développé une nouvelle méthode pour faciliter la recherche bibliographique.

Cette recherche bibliographique est fondée sur le principe de la fouille de données. Nous

l’avons nommée TEMAS pour Text-Mining Algorithm-Assisted Search. Elle divise la re-

cherche bibliographique en 4 étapes qui permettent à l’utilisateur d’inclure une fouille

de textes et d’introduire une classification des articles. Cette méthode permet de mieux

cibler les articles pertinents afin de réduire la charge de travail dédiée à la lecture de tous

les articles finalement retenus.

Plusieurs méthodes de recherche bibliographique basées sur le text-mining utilisant

l’apprentissage statistique [29, 28, 19] existent. Cependant, l’apprentissage statistique né-

cessite un entraînement de la méthode sur un échantillon d’apprentissage construit de

manière à être représentatif du sujet d’intérêt. Il est donc nécessaire d’avoir des compé-

tences particulières pour mettre en œuvre ces méthodes et pour les entraîner sur chaque

thème de recherche.

L’analyse de la qualité de TEMAS a été réalisée sur plusieurs critères. Les taux de

précision et de rappel permettent d’analyser globalement, sur l’ensemble des résultats,

le pourcentage d’articles pertinents retrouvés par la méthode ainsi que le pourcentage

d’articles pertinents non retrouvés par la méthode. La MAP (Mean Average Precision)

et le DCG (Discounted Cumulative Gain) permettent d’analyser les performances d’une

méthode en prenant en compte le "ranking" des articles. TEMAS a montré de meilleures

performances globales que des recherches PubMed de référence en termes de précision

et de nombre d’articles à lire. Notre nouvelle méthode s’est aussi montrée plus efficace

en prenant en compte le classement ou "ranking" comparée au BestMatch, la méthode

utilisée par PubMed pour classer les articles par pertinence. TEMAS a aussi été testé

sur deux méta analyses sur des cohortes rétrospectives et des études cas témoins et a

réussi a retrouver la totalité des articles inclus dans les méta-analyses tout en diminuant

le nombre d’articles récupérés.

TEMAS utilise la base de données de PubMed comme point de départ de la recherche.

71

Conclusion

Nous avons fait ce choix car PubMed propose une API ouverte efficace qui permet de

récupérer toutes les informations sur les articles ainsi que les abstracts dans R pour

pouvoir réaliser les analyses. De futurs tests pourraient être réalisés sur d’autres bases de

données comme Embase ou Google scholar.

TEMAS est accessible en ligne grâce à l’application web interactive que nous avons

développé. Cette application web permet d’utiliser la méthode TEMAS depuis un navi-

gateur internet sans connaissances particulières en informatique ou en statistique et sans

nécessité de télécharger un logiciel. L’application été développée pour guider l’utilisateur

à travers toutes les étapes avec un tutoriel en haut de chaque page, l’utilisation de pan-

neaux conditionnels s’affichant au fur et à mesure de l’avancée, et la mise en place de

messages d’alerte pop-up détaillés et informatifs en cas d’avertissement ou d’erreurs.

TEMAS semble donc être une méthode intéressante et facile à mettre en œuvre pour

optimiser la recherche bibliographique. Cependant, même si l’intégralité des articles in-

clus dans les méta-analyses ont été récupérés par TEMAS, sur les recherches globales les

taux de rappel se situaient entre 95% et 97%. Cela ne représente que 1 à 3 articles qui

sont passés au travers des mailles du filet pour chaque facteur de risque parmi tous les

articles relatifs aux facteurs de risque. La raison pour laquelle ces articles ont été manqués

a été activement recherchée mais n’a pas été identifiée. L’information apportée par ces

articles était toutefois redondante avec celle des articles récupérés. Nous n’avons pas mis

en évidence de biais d’information.

La seconde partie avait pour but de proposer une nouvelle stratégie pour le dépistage

organisé du cancer du sein. Pour arriver à cet objectif plusieurs étapes ont été nécessaires.

TEMAS nous a permis d’identifier, parmi plus de 16.000 articles présents dans la

littérature scientifique, les facteurs de risque à inclure dans notre questionnaire. Notre

questionnaire a été développé et validé avec l’aide de professionnels : médecin responsable

du dépistage organisé, gynécologue, et biostatisticienne du registre des tumeurs. Il contient

33 questions clefs. Nous avons cherché à le rendre le plus complet possible sans être

trop long. L’implémentation du CRF a ensuite été réalisée sous Access avec l’aide de

l’informaticien du CRCDC pour lier ce questionnaire à la base de données du CRCDC-

OC.

Le recueil des 3077 questionnaires a été réalisé par téléphone par des enquêtrices

formées au préalable pour obtenir les réponses les plus pertinentes possibles. Ce recueil a

duré plus de 9 mois et a nécessité plus de 6500 appels (en comptant les rappels en cas de

non réponse). Le data management a été réalisé au fil de l’eau pour pouvoir recontacter

directement les femmes interrogées en cas de données aberrantes ou d’absence de réponse

à une question. Cela a permis d’avoir une base de données propre et sans aucune valeur

absente.

Un modèle de Cox a ensuite été réalisé sur les "complete cases" c’est à dire uniquement

72

Conclusion

sur les femmes qui avaient répondu à toutes les questions retenues dans le modèle de Cox

(réponse autre que "je ne sais pas"). Le modèle a donc été créé à partir de 527 femmes

ayant développé un cancer pendant les 12 ans de suivi et de 2 241 femmes n’ayant pas

développé de cancer. Ce modèle, grâce à l’utilisation de l’estimateur de Breslow, a ensuite

permis d’estimer pour chaque femme un score de risque.

Le but de ce score de risque était de proposer une nouvelle stratégie pour le dépistage

organisé. En effet les femmes, les professionnels de santé et le groupe d’experts du comité

d’orientation ont exprimé lors des concertations sur le dépistage organisé du cancer du

sein en 2016 [14, 15, 5] leur volonté d’un dépistage plus personnalisé en fonction du niveau

de risque. L’INCa a donc dans ce sens formulé un plan d’action pour la rénovation du

dépistage organisé du cancer du sein [17] en 2017 dans lequel sont encouragés les projets

de recherche portant sur les outils et méthodes d’évaluation du niveau de risque, dont

le scoring, permettant d’orienter au mieux les femmes vers un dépistage organisé plus

personnalisé.

Nous avons identifié trois groupes avec un rapport de risque d’au moins un facteur 2

entre le groupe à risque "moyen" et les groupes à plus bas et plus haut risque. Ce choix

d’un facteur 2 est arbitraire mais reste pertinent d’un point de vue clinique. Il permet dans

un même temps de conserver des tailles de groupes à haut et bas risque assez importantes

pour pouvoir proposer une nouvelle stratégie de dépistage. Le choix du rapport de risque

pourrait aussi être guidé par la création d’une fonction de coût prenant en compte le coût

monétaire ainsi que le gain d’espérance de vie ajusté sur la qualité de vie de la nouvelle

stratégie. Nous sommes dans une situation d’évaluation des coûts du dépistage analogue

à celle du diagnostic lorsque l’on évalue les coûts de prises en charge qui diffèrent selon le

grade du cancer diagnostiqué [18].

Notre séparation en 3 groupes de risque gradué nous a permis d’identifier 12% des

femmes incluses dans le dépistage organisé du cancer du sein comme ayant un niveau de

risque différent de celui du risque moyen. Cela correspond à plus de 1,2 millions de femmes

ciblées par le dépistage organisé, et plus de 600.000 femmes participant actuellement au

dépistage organisé. Le taux de participation n’est actuellement que de 50%. Ce taux

d’adhésion est faible et la France souhaite atteindre au moins 65%. La personnalisation du

dépistage organisé pourrait rendre les femmes plus impliquées et donc augmenter le taux

de participation. En effet les 9,4% de femmes à bas risque avec une mammographie tous

les 3 ans pourraient être moins soucieuses du risque de sur-diagnostic, de sur-traitement

ou, même s’il est très faible, du risque de cancer du sein radio induit. Les 2,9% des femmes

à plus haut risque pourraient être incitées à adhérer au dépistage organisé en comprenant

que leur niveau de risque est plus élevé que la moyenne des femmes et leur permettrait

de bénéficier d’une prise en charge plus stringente que celle des femmes à risque "moyen".

Par ailleurs, le score est calculé à l’inclusion et prédit le risque de cancer pour un suivi

de 12 années. Compte tenu de l’évolution des facteurs de risque avec le temps, ce score

73

Conclusion

de risque pourra être recalculé à l’occasion des convocations itératives car les valeurs des

variables d’une femme à l’inclusion peuvent changer au cours du temps. Enfin, comme

tout score prédictif, les coefficients du modèle devront aussi être mis à jour au cours du

temps.

Concernant ce travail, une validation externe sur une base de femmes de 40 à 50 dans

le département de l’Hérault pourrait être réalisée, et une validation externe géographique

pourrait aussi être menée sur les bases d’autres départements membres du CRCDC-OC

ou non, et sur une base d’un autre pays [13].

Enfin, la proposition de cette nouvelle stratégie du dépistage organisé du cancer du sein

doit être validée par des études spécifiques pour savoir quel est le gain médico-économique

réel en tenant compte notamment des cancers d’intervalle, du sur-diagnostic, du stade des

cancers détectés et du sur-traitement.

En conclusion, la population dite aujourd’hui à risque "moyen" de cancer du sein

présente en fait un risque hétérogène de survenue d’un cancer du sein. Ainsi l’opportunité

de personnaliser le dépistage organisé du cancer du sein pourrait augmenter l’efficacité

du dépistage en diminuant le sur-diagnostic chez les femmes à plus bas risque et en

augmentant la sensibilité du dépistage chez les femmes à plus haut risque, mais aussi en

augmentant l’adhésion des femmes en les rendant plus impliquées dans ce programme de

dépistage organisé personnalisé.

74

Bibliographie

Bibliographie de la thèse

75

[1] American College of Radiology. ACR BI-RADS Atlas : Breast Imaging Reporting and

Data System ; Mammography, Ultrasound, Magnetic Resonance Imaging, Follow-up

and Outcome Monitoring, Data Dictionary. ACR, American College of Radiology,

2013.

[2] J. Barnier. rainette : The Reinert Method for Textual Data Clustering, 2020.

[3] BRESLOW and N. E. Contribution to discussion of paper by D. R. Cox. J. Roy.

Statist. Soc., Ser. B, 34 :216–217, 1972.

[4] P. A. Carney, D. L. Miglioretti, B. C. Yankaskas, K. Kerlikowske, R. Rosenberg,

C. M. Rutter, B. M. Geller, L. A. Abraham, S. H. Taplin, M. Dignan, G. Cutter,

and R. Ballard-Barbash. Individual and combined effects of age, breast density,

and hormone replacement therapy use on the accuracy of screening mammography.

Annals of Internal Medicine, 138(3) :168–175, feb 2003.

[5] C. Cases, M. Di Palma, É. Drahi, S. Fainzang, P. Landais, S. De Montgolfier, F. Pac-

caud, J.-P. Rivière, and D. Thouvenin. Rapport du comité d’orientation sur le dé-

pistage du cancer du sein. Technical report, INCA, 2016.

[6] G. S. Collins, J. B. Reitsma, D. G. Altman, and K. G. Moons. Transparent Re-

porting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis

(TRIPOD) : the TRIPOD statement. Journal of Clinical Epidemiology, 68(2) :112–

121, feb 2015.

[7] C. E. Comstock, C. Gatsonis, G. M. Newstead, B. S. Snyder, I. F. Gareen, J. T. Ber-

gin, H. Rahbar, J. S. Sung, C. Jacobs, J. A. Harvey, M. H. Nicholson, R. C. Ward,

J. Holt, A. Prather, K. D. Miller, M. D. Schnall, and C. K. Kuhl. Comparison of

Abbreviated Breast MRI vs Digital Breast Tomosynthesis for Breast Cancer Detec-

tion among Women with Dense Breasts Undergoing Screening. JAMA - Journal of

the American Medical Association, 323(8) :746–756, 2020.

[8] J. P. Daures, G. Berthaud, A. Stoebner, C. Arnaud, J. L. Lamarque, J. C. Laurent,

D. Cherischeikh, P. Boulet, and J. Pujol. Diagnosis of cancer of the breast by mam-

mography screening in a mobile van : the 5,000 first results of the Herault experience.

Pathologie-biologie, 39(9) :853–854, nov 1992.

[9] J. S. Drukteinis, B. P. Mooney, C. I. Flowers, and R. A. Gatenby. Beyond mammo-

graphy : New frontiers in breast cancer screening, jun 2013.

[10] N. Fiorini, K. Canese, G. Starchenko, E. Kireev, W. Kim, V. Miller, M. Osipov,

M. Kholodov, R. Ismagilov, S. Mohan, J. Ostell, and Z. Lu. Best Match : New

relevance search for PubMed. PLoS Biology, 16(8), aug 2018.

76

[11] Haute Autorité de Santé. Dépistage du cancer du sein en France : identification des

femmes à haut risque et modalités de dépistage. Technical report, HAS, 2014.

[12] P. Heus, J. B. Reitsma, G. S. Collins, J. A. Damen, R. J. Scholten, D. G. Altman,

K. G. Moons, and L. Hooft. Transparent Reporting of Multivariable Prediction

Models in Journal and Conference Abstracts : TRIPOD for Abstracts. Annals of

internal medicine, jun 2020.

[13] G. Iatrakis, J. P. Daures, N. Geahchan, T. Maudelonde, A. Bothou, C. Chraibi,

O. Omar, S. Voiculescu, E. Antoniou, T. Youseff, P. Tsikouras, G. Galazios, A. Cha-

lazonitis, and S. Zervoudis. Manosmed University’s Risk factor calculator for female

breast cancer : Preliminary data. Review of Clinical Pharmacology and Pharmacoki-

netics, International Edition, 32(1) :23–27, 2018.

[14] INCa. Avis de la conférence des citoyennes. Technical report, INCa, 2016.

[15] INCa. Avis de la conférence des professionnels. Technical report, INCa, 2016.

[16] INCa. Cancer du sein. Quelques chiffres. Technical report, Institut National du

Cancer, 2016.

[17] INCa. Plan d’action pour la rénovation du dépistage organisé du cancer du sein.

Technical report, Institut National du Cancer, 2017.

[18] N. Jay, G. Nuemi, M. Gadreau, and C. Quantin. A data mining approach for grouping

and analyzing trajectories of care using claim data : The example of breast cancer.

BMC Medical Informatics and Decision Making, 13(1), nov 2013.

[19] R. T. Lin, H. J. Dai, Y. Y. Bow, J. L. T. Chiu, and R. T. H. Tsai. Using conditional

random fields for result identification in biomedical abstracts. Integrated Computer-

Aided Engineering, 16(4) :339–352, 2009.

[20] J. Moretta, P. Berthet, V. Bonadona, O. Caron, O. Cohen-Haguenauer, C. Colas,

C. Corsini, V. Cusin, A. De Pauw, C. Delnatte, S. Dussart, C. Jamain, M. Longy,

E. Luporsi, C. Maugard, T. D. Nguyen, P. Pujol, D. Vaur, N. Andrieu, C. Lasset,

and C. Noguès. The French Genetic and Cancer Consortium guidelines for multigene

panel analysis in hereditary breast and ovarian cancer predisposition. Bulletin du

Cancer, 105(10) :907–917, oct 2018.

[21] R. Pastor-Barriuso, N. Ascunce, M. Ederra, N. Erdozáin, A. Murillo, J. E. Alés-

Martínez, and M. Pollán. Recalibration of the Gail model for predicting invasive

breast cancer risk in Spanish women : A population-based cohort study. Breast

Cancer Research and Treatment, 138(1) :249–259, feb 2013.

[22] V. Perrier, F. Meyer, and D. Granjon. shinyWidgets : Custom Inputs Widgets for

Shiny, 2020.

[23] R Core Team. R : A Language and Environment for Statistical Computing. R

Foundation for Statistical Computing, Vienna, Austria, 2019.

77

[24] P. Ratinaud. Iramuteq : Interface de R pour les Analyses Multidimensionnelles de

Textes et de Questionnaires, 2015.

[25] P. Ratinaud and P. Marchand. Application de la méthode ALCESTE aux "gros"

corpus et stabilité des "mondes lexicaux" : analyse du "CableGate" avec IRAMU-

TEQ [Application of the ALCESTE method to "big" corpora and stability of "lexical

worlds" : analysis of "CableGate" with IRAMUTEQ]. Actes des 11èmes Journées

Internationales d’Analyse des Données Textuelles (JADT), pages 835–844, 2012.

[26] M. Reinert. Une méthode de classification descendante hiérarchique : application à

l’analyse lexicale par contexte. Cahiers de l’analyse des données, 3(2) :187–198, 1983.

[27] M. Reinert. Alceste une méthodologie d’analyse des données textuelles et une ap-

plication : Aurelia De Gerard De Nerval. Bulletin de Méthodologie Sociologique,

26(1) :24–54, mar 1990.

[28] J. Ševa, D. L. Wiegandt, J. Götze, M. Lamping, D. Rieke, R. Schäfer, P. Jähnichen,

M. Kittner, S. Pallarz, J. Starlinger, U. Keilholz, and U. Leser. VIST - a Variant-

Information Search Tool for precision oncology. BMC Bioinformatics, 20(1), dec

2019.

[29] C. Simon, K. Davidsen, C. Hansen, E. Seymour, M. B. Barnkob, and L. R. Olsen.

BioReader : A text mining tool for performing classification of biomedical literature.

BMC Bioinformatics, 19, 2019.

[30] J. M. Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie. shiny : Web Application

Framework for R, 2019.

[31] D. J. Winter. rentrez : An R package for the NCBI eUtils API. R Journal, 9(2) :520–

526, 2017.

78

Breast density

Aitken, Zoe et al. « Screen-film mammographic density and breast cancer risk : a comparison of thevolumetric standard mammogram form and the interactive threshold measurement methods. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Association forCancer Research, cosponsored by the American Society of Preventive Oncology 19.2 (2010), p. 418-428. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-09-1059.

Assi, Valentina et al. « A case-control study to assess the impact of mammographic density on breastcancer risk in women aged 40-49 at intermediate familial risk. » eng. In : International journal ofcancer 136.10 (2015), p. 2378-2387. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.29275.

Attam, A et al. « Mammographic density as a risk factor for breast cancer in a low risk population. »eng. In : Indian journal of cancer 45.2 (2008), p. 50-53. issn : 0019-509X (Print). doi : 10.4103/0019-509x.41770.

Bae, Jong-Myon et Eun Hee Kim. « Breast Density and Risk of Breast Cancer in Asian Women : AMeta-analysis of Observational Studies. » eng. In : Journal of preventive medicine and public health= Yebang Uihakhoe chi 49.6 (2016), p. 367-375. issn : 2233-4521 (Electronic). doi : 10.3961/jpmph.16.054.

Bertrand, Kimberly A et al. « Dense and nondense mammographic area and risk of breast cancerby age and tumor characteristics. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Societyof Preventive Oncology 24.5 (2015), p. 798-809. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-14-1136.

Boyd, Norman et al. « Mammographic density and breast cancer risk : evaluation of a novel methodof measuring breast tissue volumes. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Societyof Preventive Oncology 18.6 (2009), p. 1754-1762. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-09-0107.

Boyd, Norman F. « Mammographic density and risk of breast cancer. » eng. In : American Society ofClinical Oncology educational book. American Society of Clinical Oncology. Annual Meeting (2013).issn : 1548-8756 (Electronic). doi : 10.1200/EdBook_AM.2013.33.e57.

Boyd, Norman F et al. « Body size, mammographic density, and breast cancer risk. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 15.11 (2006), p. 2086-2092.issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-06-0345.

Boyd, Norman F et al. « Mammographic density and the risk and detection of breast cancer. » eng.In : The New England journal of medicine 356.3 (2007), p. 227-236. issn : 1533-4406 (Electronic).doi : 10.1056/NEJMoa062790.

Brand, Judith S et al. « Automated measurement of volumetric mammographic density : a tool forwidespread breast cancer risk assessment. » eng. In : Cancer epidemiology, biomarkers & preven-tion : a publication of the American Association for Cancer Research, cosponsored by the AmericanSociety of Preventive Oncology 23.9 (2014), p. 1764-1772. issn : 1538-7755 (Electronic). doi :10.1158/1055-9965.EPI-13-1219.

Brentnall, Adam R et al. « Mammographic density adds accuracy to both the Tyrer-Cuzick and Gailbreast cancer risk models in a prospective UK screening cohort. » In : Breast cancer research : BCR17.1 (2015), p. 147. issn : 1465-542X. doi : 10.1186/s13058-015-0653-5. url : http://www.ncbi.nlm.nih.gov/pubmed/26627479http://www.pubmedcentral.nih.gov/articlerender.

fcgi?artid=PMC4665886.Chiarelli, Anna M et al. « Influence of patterns of hormone replacement therapy use and mam-

mographic density on breast cancer detection. » eng. In : Cancer epidemiology, biomarkers &prevention : a publication of the American Association for Cancer Research, cosponsored by theAmerican Society of Preventive Oncology 15.10 (2006), p. 1856-1862. issn : 1055-9965 (Print).doi : 10.1158/1055-9965.EPI-06-0290.

Bibliographie des facteurs de risque

79

Chiu, Sherry Yueh-Hsia et al. « Effect of baseline breast density on breast cancer incidence, stage,mortality, and screening parameters : 25-year follow-up of a Swedish mammographic screening. »eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 19.5 (2010),p. 1219-1228. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-09-1028.

Conroy, Shannon M et al. « Mammographic density and hormone receptor expression in breastcancer : the Multiethnic Cohort Study. » eng. In : Cancer epidemiology 35.5 (2011), p. 448-452.issn : 1877-783X (Electronic). doi : 10.1016/j.canep.2010.11.011.

Cummings, Steven R et al. « Prevention of breast cancer in postmenopausal women : approaches toestimating and reducing risk. » eng. In : Journal of the National Cancer Institute 101.6 (2009),p. 384-398. issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djp018.

Ding, Jane et al. « Evaluating the effectiveness of using standard mammogram form to predict breastcancer risk : case-control study. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Societyof Preventive Oncology 17.5 (2008), p. 1074-1081. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-2634.

Ding, Jane et al. « Mammographic density, estrogen receptor status and other breast cancer tumorcharacteristics. » eng. In : The breast journal 16.3 (2010), p. 279-289. issn : 1524-4741 (Electronic).doi : 10.1111/j.1524-4741.2010.00907.x.

Edwards, Brandy L et al. « The Association of Mammographic Density and Molecular Breast CancerSubtype. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the AmericanAssociation for Cancer Research, cosponsored by the American Society of Preventive Oncology 26.10(2017), p. 1487-1492. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-16-0881.

Fair, Alecia Malin et al. « Increased vitamin D and calcium intake associated with reduced mammogra-phic breast density among premenopausal women. » eng. In : Nutrition research (New York, N.Y.)35.10 (2015), p. 851-857. issn : 1879-0739 (Electronic). doi : 10.1016/j.nutres.2015.07.004.

Felix, Ashley S et al. « Relationships between mammographic density, tissue microvessel density,and breast biopsy diagnosis. » eng. In : Breast cancer research : BCR 18.1 (2016), p. 88. issn :1465-542X (Electronic). doi : 10.1186/s13058-016-0746-9.

Fowler, Erin E E et al. « Automated Percentage of Breast Density Measurements for Full-fieldDigital Mammography Applications. » eng. In : Academic radiology 21.8 (2014), p. 958-970. issn :1878-4046 (Electronic). doi : 10.1016/j.acra.2014.04.006.

Ghosh, Karthik et al. « Independent association of lobular involution and mammographic breastdensity with breast cancer risk. » eng. In : Journal of the National Cancer Institute 102.22 (2010),p. 1716-1723. issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djq414.

Gill, Jasmeet K et al. « The association of mammographic density with ductal carcinoma in situ ofthe breast : the Multiethnic Cohort. » eng. In : Breast cancer research : BCR 8.3 (2006), R30. issn :1465-542X (Electronic). doi : 10.1186/bcr1507.

Habel, Laurel A et al. « Case-control study of mammographic density and breast cancer risk usingprocessed digital mammograms. » eng. In : Breast cancer research : BCR 18.1 (2016), p. 53. issn :1465-542X (Electronic). doi : 10.1186/s13058-016-0715-3.

Heine, John J et al. « A novel automated mammographic density measure and breast cancer risk. »eng. In : Journal of the National Cancer Institute 104.13 (2012), p. 1028-1037. issn : 1460-2105(Electronic). doi : 10.1093/jnci/djs254.

Heusinger, Katharina et al. « Mammographic density as a risk factor for breast cancer in a Germancase-control study. » eng. In : European journal of cancer prevention : the official journal of theEuropean Cancer Prevention Organisation (ECP) 20.1 (2011), p. 1-8. issn : 1473-5709 (Electronic).doi : 10.1097/CEJ.0b013e328341e2ce.

Hwang, E Shelley et al. « Association between breast density and subsequent breast cancer followingtreatment for ductal carcinoma in situ. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Society

80

of Preventive Oncology 16.12 (2007), p. 2587-2593. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-0458.

Keller, Brad M et al. « Preliminary evaluation of the publicly available Laboratory for Breast Ra-diodensity Assessment (LIBRA) software tool : comparison of fully automated area and volumetricdensity measures in a case-control study with digital mammography. » eng. In : Breast cancer re-search : BCR 17 (2015), p. 117. issn : 1465-542X (Electronic). doi : 10.1186/s13058-015-0626-8.

Kerlikowske, Karla et al. « Breast cancer risk by breast density, menopause, and postmenopausalhormone therapy use. » eng. In : Journal of clinical oncology : official journal of the AmericanSociety of Clinical Oncology 28.24 (2010), p. 3830-3837. issn : 1527-7755 (Electronic). doi : 10.1200/JCO.2009.26.4770.

Kerlikowske, Karla et al. « Combining quantitative and qualitative breast density measures to assessbreast cancer risk. » eng. In : Breast cancer research : BCR 19.1 (2017), p. 97. issn : 1465-542X(Electronic). doi : 10.1186/s13058-017-0887-5.

Kerlikowske, Karla et al. « Longitudinal measurement of clinical mammographic breast density toimprove estimation of breast cancer risk. » eng. In : Journal of the National Cancer Institute 99.5(2007), p. 386-395. issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djk066.

Ko, Hyeonyoung et al. « Comparison of the association of mammographic density and clinical factorswith ductal carcinoma in situ versus invasive ductal breast cancer in Korean women. » eng. In :BMC cancer 17.1 (2017), p. 821. issn : 1471-2407 (Electronic). doi : 10.1186/s12885-017-3841-0.

Krishnan, Kavitha et al. « Mammographic density and risk of breast cancer by mode of detectionand tumor size : a case-control study. » eng. In : Breast cancer research : BCR 18.1 (2016), p. 63.issn : 1465-542X (Electronic). doi : 10.1186/s13058-016-0722-4.

Lee, Charmaine Pei Ling et al. « Mammographic Breast Density and Common Genetic Variantsin Breast Cancer Risk Prediction. » eng. In : PloS one 10.9 (2015), e0136650. issn : 1932-6203(Electronic). doi : 10.1371/journal.pone.0136650.

Lokate, Mariëtte et al. « Mammographic density and breast cancer risk : the role of the fat surroun-ding the fibroglandular tissue. » eng. In : Breast cancer research : BCR 13.5 (2011), R103. issn :1465-542X (Electronic). doi : 10.1186/bcr3044.

Ma, Huiyan et al. « Is there a difference in the association between percent mammographic densityand subtypes of breast cancer ? Luminal A and triple-negative breast cancer. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 18.2 (2009), p. 479-485.issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0805.

Manduca, Armando et al. « Texture features from mammographic images and risk of breast cancer. »eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 18.3 (2009),p. 837-845. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0631.

Martin, Lisa J et al. « Family history, mammographic density, and risk of breast cancer. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 19.2 (2010),p. 456-463. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-09-0881.

Masala, Giovanna et al. « Mammographic breast density and breast cancer risk in a Mediterraneanpopulation : a nested case-control study in the EPIC Florence cohort. » eng. In : Breast cancerresearch and treatment 164.2 (2017), p. 467-473. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-017-4274-9.

Maskarinec, Gertraud et al. « Tumor characteristics and family history in relation to mammographicdensity and breast cancer : The French E3N cohort. » eng. In : Cancer epidemiology 49 (2017),p. 156-160. issn : 1877-783X (Electronic). doi : 10.1016/j.canep.2017.07.003.

McCormack, Valerie A et Isabel dos Santos Silva. « Breast density and parenchymal patternsas markers of breast cancer risk : a meta-analysis. » eng. In : Cancer epidemiology, biomarkers &prevention : a publication of the American Association for Cancer Research, cosponsored by the

81

American Society of Preventive Oncology 15.6 (2006), p. 1159-1169. issn : 1055-9965 (Print). doi :10.1158/1055-9965.EPI-06-0034.

Mitchell, Gillian et al. « Mammographic density and breast cancer risk in BRCA1 and BRCA2mutation carriers. » eng. In : Cancer research 66.3 (2006), p. 1866-1872. issn : 0008-5472 (Print).doi : 10.1158/0008-5472.CAN-05-3368.

Nguyen, Tuong L et al. « Breast Cancer Risk Associations with Digital Mammographic Density byPixel Brightness Threshold and Mammographic System. » eng. In : Radiology 286.2 (2018), p. 433-442. issn : 1527-1315 (Electronic). doi : 10.1148/radiol.2017170306.

Nguyen, Tuong L et al. « Mammographic density defined by higher than conventional brightnessthresholds better predicts breast cancer risk. » eng. In : International journal of epidemiology 46.2(2017), p. 652-661. issn : 1464-3685 (Electronic). doi : 10.1093/ije/dyw212.

Nguyen, Tuong Linh et al. « Mammographic density defined by higher than conventional brightnessthreshold better predicts breast cancer risk for full-field digital mammograms. » eng. In : Breastcancer research : BCR 17 (2015), p. 142. issn : 1465-542X (Electronic). doi : 10.1186/s13058-015-0654-4.

Nickson, Carolyn et al. « AutoDensity : an automated method to measure mammographic breastdensity that predicts breast cancer risk and screening outcomes. » eng. In : Breast cancer research :BCR 15.5 (2013), R80. issn : 1465-542X (Electronic). doi : 10.1186/bcr3474.

Nielsen, Mads et al. « Mammographic texture resemblance generalizes as an independent risk factorfor breast cancer. » eng. In : Breast cancer research : BCR 16.2 (2014), R37. issn : 1465-542X(Electronic). doi : 10.1186/bcr3641.

Olson, Janet E et al. « The influence of mammogram acquisition on the mammographic density andbreast cancer association in the Mayo Mammography Health Study cohort. » eng. In : Breast cancerresearch : BCR 14.6 (2012), R147. issn : 1465-542X (Electronic). doi : 10.1186/bcr3357.

Olsson, Åsa et al. « Breast density and mode of detection in relation to breast cancer specific survival :a cohort study. » eng. In : BMC cancer 14 (2014), p. 229. issn : 1471-2407 (Electronic). doi :10.1186/1471-2407-14-229.

Pettersson, Andreas et al. « Mammographic density phenotypes and risk of breast cancer : a meta-analysis. » eng. In : Journal of the National Cancer Institute 106.5 (2014). issn : 1460-2105 (Elec-tronic). doi : 10.1093/jnci/dju078.

Pettersson, Andreas et al. « Nondense mammographic area and risk of breast cancer. » eng. In :Breast cancer research : BCR 13.5 (2011), R100. issn : 1465-542X (Electronic). doi : 10.1186/bcr3041.

Phipps, Amanda I et al. « Risk factors for ductal, lobular, and mixed ductal-lobular breast cancer ina screening population. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 19.6 (2010), p. 1643-1654. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-10-0188.

Pollán, Marina et al. « Mammographic density and risk of breast cancer according to tumor charac-teristics and mode of detection : a Spanish population-based case-control study. » eng. In : Breastcancer research : BCR 15.1 (2013), R9. issn : 1465-542X (Electronic). doi : 10.1186/bcr3380.

Ramón Y Cajal, Teresa et al. « Mammographic density and breast cancer in women from high riskfamilies. » eng. In : Breast cancer research : BCR 17.1 (2015), p. 93. issn : 1465-542X (Electronic).doi : 10.1186/s13058-015-0604-1.

Rauh, C et al. « Percent Mammographic Density and Dense Area as Risk Factors for Breast Cancer. »eng. In : Geburtshilfe und Frauenheilkunde 72.8 (2012), p. 727-733. issn : 0016-5751 (Print). doi :10.1055/s-0032-1315129.

Razzaghi, Hilda et al. « Association between mammographic density and basal-like and luminal Abreast cancer subtypes. » eng. In : Breast cancer research : BCR 15.5 (2013), R76. issn : 1465-542X(Electronic). doi : 10.1186/bcr3470.

82

Razzaghi, Hilda et al. « Mammographic density and breast cancer risk in White and African AmericanWomen. » eng. In : Breast cancer research and treatment 135.2 (2012), p. 571-580. issn : 1573-7217(Electronic). doi : 10.1007/s10549-012-2185-3.

Rice, Megan S, Bernard A Rosner et Rulla M Tamimi. « Percent mammographic density prediction :development of a model in the nurses’ health studies. » eng. In : Cancer causes & control : CCC28.7 (2017), p. 677-684. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-017-0898-7.

Rice, Megan S et al. « Breast cancer risk prediction : an update to the Rosner-Colditz breast cancerincidence model. » eng. In : Breast cancer research and treatment 166.1 (2017), p. 227-240. issn :1573-7217 (Electronic). doi : 10.1007/s10549-017-4391-5.

Sartor, Hanna et al. « Mammographic density in relation to tumor biomarkers, molecular subtypes,and mode of detection in breast cancer. » eng. In : Cancer causes & control : CCC 26.6 (2015),p. 931-939. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-015-0576-6.

Schoemaker, M J et al. « Combined effects of endogenous sex hormone levels and mammographicdensity on postmenopausal breast cancer risk : results from the Breakthrough Generations Study. »eng. In : British journal of cancer 110.7 (2014), p. 1898-1907. issn : 1532-1827 (Electronic). doi :10.1038/bjc.2014.64.

Shepherd, John A et al. « Volume of mammographic density and risk of breast cancer. » eng. In :Cancer epidemiology, biomarkers & prevention : a publication of the American Association forCancer Research, cosponsored by the American Society of Preventive Oncology 20.7 (2011), p. 1473-1482. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-10-1150.

Stuedal, Anne et al. « Does breast size modify the association between mammographic density andbreast cancer risk ? » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 17.3 (2008), p. 621-627. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-2554.

Tesic, Vanja et al. « Mammographic density and estimation of breast cancer risk in intermediate riskpopulation. » eng. In : The breast journal 19.1 (2013), p. 71-78. issn : 1524-4741 (Electronic). doi :10.1111/tbj.12051.

Trieu, Phuong Dung (Yun) et al. « Inconsistencies of Breast Cancer Risk Factors between the Northernand Southern Regions of Vietnam. » eng. In : Asian Pacific journal of cancer prevention : APJCP18.10 (2017), p. 2747-2754. issn : 2476-762X (Electronic). doi : 10.22034/APJCP.2017.18.10.2747.

Trieu, Phuong Dung Yun et al. « Risk Factors of Female Breast Cancer in Vietnam : A Case-ControlStudy. » eng. In : Cancer research and treatment : official journal of Korean Cancer Association49.4 (2017), p. 990-1000. issn : 2005-9256 (Electronic). doi : 10.4143/crt.2016.488.

Vachon, Celine M et al. « Mammographic breast density as a general marker of breast cancer risk. »eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 16.1 (2007),p. 43-49. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-06-0738.

Warwick, Jane et al. « Mammographic breast density refines Tyrer-Cuzick estimates of breast cancerrisk in high-risk women : findings from the placebo arm of the International Breast Cancer Inter-vention Study I. » eng. In : Breast cancer research : BCR 16.5 (2014), p. 451. issn : 1465-542X(Electronic). doi : 10.1186/s13058-014-0451-5.

Wong, C S et al. « Mammographic density and its interaction with other breast cancer risk factors inan Asian population. » eng. In : British journal of cancer 104.5 (2011), p. 871-874. issn : 1532-1827(Electronic). doi : 10.1038/sj.bjc.6606085.

Woolcott, Christy G et al. « Mammographic density, parity and age at first birth, and risk of breastcancer : an analysis of four case-control studies. » eng. In : Breast cancer research and treatment132.3 (2012), p. 1163-1171. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-011-1929-9.

Yaghjyan, Lusine et al. « Mammographic breast density and breast cancer risk by menopausal status,postmenopausal hormone use and a family history of breast cancer ». In : Cancer Causes and

83

Control 23.5 (2012), p. 785-790. issn : 09575243. doi : 10.1007/s10552- 012- 9936- 7. url :http://www.ncbi.nlm.nih.gov/pubmed/22438073.

Yaghjyan, Lusine et al. « Mammographic breast density and breast cancer risk : interactions ofpercent density, absolute dense, and non-dense areas with breast cancer risk factors. » eng. In :Breast cancer research and treatment 150.1 (2015), p. 181-189. issn : 1573-7217 (Electronic). doi :10.1007/s10549-015-3286-6.

Yaghjyan, Lusine et al. « Mammographic breast density and subsequent risk of breast cancer in post-menopausal women according to the time since the mammogram. » eng. In : Cancer epidemiology,biomarkers & prevention : a publication of the American Association for Cancer Research, cospon-sored by the American Society of Preventive Oncology 22.6 (2013), p. 1110-1117. issn : 1538-7755(Electronic). doi : 10.1158/1055-9965.EPI-13-0169.

Yaghjyan, Lusine et al. « Mammographic breast density and subsequent risk of breast cancer inpostmenopausal women according to tumor characteristics. » eng. In : Journal of the NationalCancer Institute 103.15 (2011), p. 1179-1189. issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djr225.

Yaghjyan, Lusine et al. « Postmenopausal mammographic breast density and subsequent breast can-cer risk according to selected tissue markers. » eng. In : British journal of cancer 113.7 (2015),p. 1104-1113. issn : 1532-1827 (Electronic). doi : 10.1038/bjc.2015.315.

Breast feeding

Akbari, Atieh et al. « Parity and breastfeeding are preventive measures against breast cancer inIranian women. » eng. In : Breast cancer (Tokyo, Japan) 18.1 (2011), p. 51-55. issn : 1880-4233(Electronic). doi : 10.1007/s12282-010-0203-z.

Anothaisintawee, Thunyarat et al. « Risk factors of breast cancer : a systematic review and meta-analysis. » eng. In : Asia-Pacific journal of public health 25.5 (2013), p. 368-387. issn : 1941-2479(Electronic). doi : 10.1177/1010539513488795.

Atkinson, Rachel L et al. « Epidemiological risk factors associated with inflammatory breast cancersubtypes. » eng. In : Cancer causes & control : CCC 27.3 (2016), p. 359-366. issn : 1573-7225(Electronic). doi : 10.1007/s10552-015-0712-3.

Awatef, Msolly et al. « Breastfeeding reduces breast cancer risk : a case-control study in Tunisia. »eng. In : Cancer causes & control : CCC 21.3 (2010), p. 393-397. issn : 1573-7225 (Electronic).doi : 10.1007/s10552-009-9471-3.

Bano, Raisa et al. « Potential Risk Factors for Breast Cancer in Pakistani Women. » eng. In : AsianPacific journal of cancer prevention : APJCP 17.9 (2016), p. 4307-4312. issn : 2476-762X (Elec-tronic).

Bertrand, Kimberly A et al. « Differential Patterns of Risk Factors for Early-Onset Breast Can-cer by ER Status in African American Women. » eng. In : Cancer epidemiology, biomarkers &prevention : a publication of the American Association for Cancer Research, cosponsored by theAmerican Society of Preventive Oncology 26.2 (2017), p. 270-277. issn : 1538-7755 (Electronic).doi : 10.1158/1055-9965.EPI-16-0692.

Bhadoria, A S et al. « Reproductive factors and breast cancer : a case-control study in tertiary carehospital of North India. » eng. In : Indian journal of cancer 50.4 (2013), p. 316-321. issn : 1998-4774(Electronic). doi : 10.4103/0019-509X.123606.

Butt, Salma et al. « Breastfeeding in relation to risk of different breast cancer characteristics. » eng.In : BMC research notes 7 (2014), p. 216. issn : 1756-0500 (Electronic). doi : 10.1186/1756-0500-7-216.

Chen, Lu et al. « Reproductive Factors and Risk of Luminal, HER2-Overexpressing, and Triple-Negative Breast Cancer Among Multiethnic Women. » eng. In : Cancer epidemiology, biomarkers& prevention : a publication of the American Association for Cancer Research, cosponsored by theAmerican Society of Preventive Oncology 25.9 (2016), p. 1297-1304. issn : 1538-7755 (Electronic).doi : 10.1158/1055-9965.EPI-15-1104.

84

Chollet-Hinton, Lynn et al. « Biology and Etiology of Young-Onset Breast Cancers among Pre-menopausal African American Women : Results from the AMBER Consortium. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for Cancer Re-search, cosponsored by the American Society of Preventive Oncology 26.12 (2017), p. 1722-1729.issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-17-0450.

Connor, Avonne E et al. « Pre-diagnostic breastfeeding, adiposity, and mortality among parous His-panic and non-Hispanic white women with invasive breast cancer : the Breast Cancer Health Dis-parities Study. » eng. In : Breast cancer research and treatment 161.2 (2017), p. 321-331. issn :1573-7217 (Electronic). doi : 10.1007/s10549-016-4048-9.

De Silva, Malintha et al. « Prolonged breastfeeding reduces risk of breast cancer in Sri Lankan women :a case-control study. » eng. In : Cancer epidemiology 34.3 (2010), p. 267-273. issn : 1877-783X(Electronic). doi : 10.1016/j.canep.2010.02.012.

Delort, Laetitia et al. « Risk factors for early age at breast cancer onset–the "COSA program"population-based study. » eng. In : Anticancer research 27.2 (2007), p. 1087-1094. issn : 0250-7005(Print).

Dianatinasab, Mostafa et al. « Hair Coloring, Stress, and Smoking Increase the Risk of Breast Can-cer : A Case-Control Study. » eng. In : Clinical breast cancer 17.8 (2017), p. 650-659. issn : 1938-0666(Electronic). doi : 10.1016/j.clbc.2017.04.012.

Elkum, Naser et al. « Obesity is a significant risk factor for breast cancer in Arab women. » eng. In :BMC cancer 14 (2014), p. 788. issn : 1471-2407 (Electronic). doi : 10.1186/1471-2407-14-788.

Faheem, Muhammad et al. « Risk factors for breast cancer in patients treated at NORI Hospital,Islamabad. » eng. In : JPMA. The Journal of the Pakistan Medical Association 57.5 (2007), p. 242-245. issn : 0030-9982 (Print).

Gajalakshmi, Vendhan et al. « Breastfeeding and breast cancer risk in India : a multicenter case-control study. » eng. In : International journal of cancer 125.3 (2009), p. 662-665. issn : 1097-0215(Electronic). doi : 10.1002/ijc.24429.

Galukande, Moses et al. « Breast Cancer Risk Factors among Ugandan Women at a Tertiary Hos-pital : A Case-Control Study. » eng. In : Oncology 90.6 (2016), p. 356-362. issn : 1423-0232 (Elec-tronic). doi : 10.1159/000445379.

Gaudet, Mia M et al. « Risk factors by molecular subtypes of breast cancer across a population-basedstudy of women 56 years or younger. » eng. In : Breast cancer research and treatment 130.2 (2011),p. 587-597. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-011-1616-x.

Ghiasvand, Reza et al. « Risk factors for breast cancer among young women in southern Iran. » eng.In : International journal of cancer 129.6 (2011), p. 1443-1449. issn : 1097-0215 (Electronic). doi :10.1002/ijc.25748.

Giudici, Fabiola et al. « Breastfeeding : a reproductive factor able to reduce the risk of luminal Bbreast cancer in premenopausal White women. » eng. In : European journal of cancer prevention :the official journal of the European Cancer Prevention Organisation (ECP) 26.3 (2017), p. 217-224.issn : 1473-5709 (Electronic). doi : 10.1097/CEJ.0000000000000220.

Hadjisavvas, Andreas et al. « An investigation of breast cancer risk factors in Cyprus : a case controlstudy. » eng. In : BMC cancer 10 (2010), p. 447. issn : 1471-2407 (Electronic). doi : 10.1186/1471-2407-10-447.

Hajian-Tilaki, K O et T Kaveh-Ahangar. « Reproductive factors associated with breast cancerrisk in northern Iran. » eng. In : Medical oncology (Northwood, London, England) 28.2 (2011),p. 441-446. issn : 1559-131X (Electronic). doi : 10.1007/s12032-010-9498-z.

Holm, Johanna et al. « Assessment of Breast Cancer Risk Factors Reveals Subtype Heterogeneity. »eng. In : Cancer research 77.13 (2017), p. 3708-3717. issn : 1538-7445 (Electronic). doi : 10.1158/0008-5472.CAN-16-2574.

Hosseinzadeh, Mina et al. « Risk factors for breast cancer in Iranian women : a hospital-based case-control study in tabriz, iran. » eng. In : Journal of breast cancer 17.3 (2014), p. 236-243. issn :1738-6756 (Print). doi : 10.4048/jbc.2014.17.3.236.

85

Huo, D et al. « Parity and breastfeeding are protective against breast cancer in Nigerian women. »eng. In : British journal of cancer 98.5 (2008), p. 992-996. issn : 0007-0920 (Print). doi : 10.1038/sj.bjc.6604275.

Ilic, Milena, Hristina Vlajinac et Jelena Marinkovic. « Breastfeeding and Risk of Breast Can-cer : Case-Control Study. » eng. In : Women & health 55.7 (2015), p. 778-794. issn : 1541-0331(Electronic). doi : 10.1080/03630242.2015.1050547.

Iqbal, Javaid et al. « Risk Factors for Premenopausal Breast Cancer in Bangladesh. » eng. In :International journal of breast cancer 2015 (2015), p. 612042. issn : 2090-3170 (Print). doi : 10.1155/2015/612042.

Islami, F et al. « Breastfeeding and breast cancer risk by receptor status–a systematic review andmeta-analysis. » eng. In : Annals of oncology : official journal of the European Society for MedicalOncology 26.12 (2015), p. 2398-2407. issn : 1569-8041 (Electronic). doi : 10.1093/annonc/mdv379.

Jeong, Seok Hun et al. « Risk Reduction of Breast Cancer by Childbirth, Breastfeeding, and TheirInteraction in Korean Women : Heterogeneous Effects Across Menopausal Status, Hormone Re-ceptor Status, and Pathological Subtypes. » eng. In : Journal of preventive medicine and pu-blic health = Yebang Uihakhoe chi 50.6 (2017), p. 401-410. issn : 2233-4521 (Electronic). doi :10.3961/jpmph.17.152.

Joukar, Farahnaz et al. « The Investigation of Risk Factors Impacting Breast Cancer in GuilanProvince. » eng. In : Asian Pacific journal of cancer prevention : APJCP 17.10 (2016), p. 4623-4629. issn : 2476-762X (Electronic). doi : 10.22034/apjcp.2016.17.10.4623.

Kabat, Geoffrey C et al. « Reproductive and menstrual factors and risk of ductal carcinoma in situ ofthe breast in a cohort of postmenopausal women. » eng. In : Cancer causes & control : CCC 22.10(2011), p. 1415-1424. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-011-9814-8.

Kawai, Masaaki et al. « Reproductive factors and breast cancer risk in relation to hormone receptorand menopausal status in Japanese women. » eng. In : Cancer science 103.10 (2012), p. 1861-1870.issn : 1349-7006 (Electronic). doi : 10.1111/j.1349-7006.2012.02379.x.

Kim, Yeonju et al. « Dose-dependent protective effect of breast-feeding against breast cancer amongever-lactated women in Korea. » eng. In : European journal of cancer prevention : the official journalof the European Cancer Prevention Organisation (ECP) 16.2 (2007), p. 124-129. issn : 0959-8278(Print). doi : 10.1097/01.cej.0000228400.07364.52.

Kocic, B, B Petrovic et S Filipovic. « Risk factors for breast cancer : a hospital-based case-controlstudy. » eng. In : Journal of B.U.ON. : official journal of the Balkan Union of Oncology 13.2 (2008),p. 231-234. issn : 1107-0625 (Print).

Kwan, Marilyn L et al. « Breastfeeding, PAM50 tumor subtype, and breast cancer prognosis and survi-val. » eng. In : Journal of the National Cancer Institute 107.7 (2015). issn : 1460-2105 (Electronic).doi : 10.1093/jnci/djv087.

Laamiri, Fatima Zahra et al. « Risk factors associated with a breast cancer in a population of Moroccanwomen whose age is less than 40 years : a case control study. » eng. In : The Pan African medicaljournal 24 (2016), p. 19. issn : 1937-8688 (Electronic). doi : 10.11604/pamj.2016.24.19.8784.

Lambertini, Matteo et al. « Reproductive behaviors and risk of developing breast cancer accordingto tumor subtype : A systematic review and meta-analysis of epidemiological studies. » eng. In :Cancer treatment reviews 49 (2016), p. 65-76. issn : 1532-1967 (Electronic). doi : 10.1016/j.ctrv.2016.07.006.

Liu, Yan-Ting et al. « Physiological, reproductive factors and breast cancer risk in Jiangsu provinceof China. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.3 (2011), p. 787-790.issn : 2476-762X (Electronic).

Lodha, R et al. « Association between reproductive factors and breast cancer in an urban set up atcentral India : a case-control study. » eng. In : Indian journal of cancer 48.3 (2011), p. 303-307.issn : 1998-4774 (Electronic). doi : 10.4103/0019-509X.84928.

Lodha, Rama S et al. « Risk factors for breast cancer among women in Bhopal urban agglomerate :a case-control study. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.8 (2011),p. 2111-2115. issn : 2476-762X (Electronic).

86

Lumachi, Franco et al. « Estrogen therapy and risk of breast cancer in postmenopausal women : acase-control study and results of a multivariate analysis. » eng. In : Menopause (New York, N.Y.)17.3 (2010), p. 524-528. issn : 1530-0374 (Electronic). doi : 10.1097/gme.0b013e3181ca0c74.

Ma, Huiyan et al. « Reproductive factors and breast cancer risk according to joint estrogen andprogesterone receptor status : a meta-analysis of epidemiological studies. » eng. In : Breast cancerresearch : BCR 8.4 (2006), R43. issn : 1465-542X (Electronic). doi : 10.1186/bcr1525.

Ma, Huiyan et al. « Reproductive factors and the risk of triple-negative breast cancer in white womenand African-American women : a pooled analysis. » eng. In : Breast cancer research : BCR 19.1(2017), p. 6. issn : 1465-542X (Electronic). doi : 10.1186/s13058-016-0799-9.

Makama, M et al. « An association study of established breast cancer reproductive and lifestyle riskfactors with tumour subtype defined by the prognostic 70-gene expression signature (MammaPrint( R©)). »eng. In : European journal of cancer (Oxford, England : 1990) 75 (2017), p. 5-13. issn : 1879-0852(Electronic). doi : 10.1016/j.ejca.2016.12.024.

Matalqah, Laila et al. « Predictors of breast cancer among women in a northern state of Malaysia :a matched case-control study. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.6(2011), p. 1549-1553. issn : 2476-762X (Electronic).

McLaughlin, John R et al. « Reproductive risk factors for ovarian cancer in carriers of BRCA1 orBRCA2 mutations : a case-control study. » eng. In : The Lancet. Oncology 8.1 (2007), p. 26-34.issn : 1470-2045 (Print). doi : 10.1016/S1470-2045(06)70983-4.

Minami, Yuko et al. « Being breastfed in infancy and adult breast cancer risk among Japanese wo-men. » eng. In : Cancer causes & control : CCC 23.2 (2012), p. 389-398. issn : 1573-7225 (Electro-nic). doi : 10.1007/s10552-011-9888-3.

Morales, Luisa et al. « Factors associated with breast cancer in Puerto Rican women. » eng. In :Journal of epidemiology and global health 3.4 (2013), p. 205-215. issn : 2210-6014 (Electronic). doi :10.1016/j.jegh.2013.08.003.

Msolly, Awatef, Olfa Gharbi et Slim Ben Ahmed. « Impact of menstrual and reproductive factorson breast cancer risk in Tunisia : a case-control study. » eng. In : Medical oncology (Northwood,London, England) 30.1 (2013), p. 480. issn : 1559-131X (Electronic). doi : 10.1007/s12032-013-0480-4.

Nagrani, Rajini et al. « Understanding rural-urban differences in risk factors for breast cancer inan Indian population. » eng. In : Cancer causes & control : CCC 27.2 (2016), p. 199-208. issn :1573-7225 (Electronic). doi : 10.1007/s10552-015-0697-y.

Naieni, Kourosh Holakouie et al. « Risk factors of breast cancer in north of Iran : a case-control inMazandaran Province. » eng. In : Asian Pacific journal of cancer prevention : APJCP 8.3 (2007),p. 395-398. issn : 1513-7368 (Print).

Navarro-Ibarra, María Jossé et al. « [INFLUENCE OF REPRODUCTIVE FACTORS, BREAST-FEEDING AND OBESITY ON THE RISK OF BREAST CANCER IN MEXICAN WOMEN]. »spa. In : Nutricion hospitalaria 32.1 (2015), p. 291-298. issn : 1699-5198 (Electronic). doi : 10.3305/nh.2015.32.1.9049.

Oh, Hannah et al. « Breast cancer risk factors in relation to estrogen receptor, progesterone receptor,insulin-like growth factor-1 receptor, and Ki67 expression in normal breast tissue. » eng. In : NPJbreast cancer 3 (2017), p. 39. issn : 2374-4677 (Print). doi : 10.1038/s41523-017-0041-7.

Ozmen, Vahit et al. « Breast cancer risk factors in Turkish women–a University Hospital based nestedcase control study. » eng. In : World journal of surgical oncology 7 (2009), p. 37. issn : 1477-7819(Electronic). doi : 10.1186/1477-7819-7-37.

Palmer, Julie R et al. « Parity, lactation, and breast cancer subtypes in African American women :results from the AMBER Consortium. » eng. In : Journal of the National Cancer Institute 106.10(2014). issn : 1460-2105 (Electronic). doi : 10.1093/jnci/dju237.

Pan, Hong et al. « Reproductive factors and breast cancer risk among BRCA1 or BRCA2 mutationcarriers : results from ten studies. » eng. In : Cancer epidemiology 38.1 (2014), p. 1-8. issn : 1877-783X (Electronic). doi : 10.1016/j.canep.2013.11.004.

87

Phipps, Amanda I et al. « Reproductive and hormonal risk factors for postmenopausal luminal, HER-2-overexpressing, and triple-negative breast cancer. » eng. In : Cancer 113.7 (2008), p. 1521-1526.issn : 0008-543X (Print). doi : 10.1002/cncr.23786.

Phipps, Amanda I et al. « Reproductive history and oral contraceptive use in relation to risk of triple-negative breast cancer. » eng. In : Journal of the National Cancer Institute 103.6 (2011), p. 470-477.issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djr030.

Rice, Megan S et al. « Breast cancer risk prediction : an update to the Rosner-Colditz breast cancerincidence model. » eng. In : Breast cancer research and treatment 166.1 (2017), p. 227-240. issn :1573-7217 (Electronic). doi : 10.1007/s10549-017-4391-5.

Ritte, Rebecca et al. « Reproductive factors and risk of hormone receptor positive and negative breastcancer : a cohort study. » eng. In : BMC cancer 13 (2013), p. 584. issn : 1471-2407 (Electronic).doi : 10.1186/1471-2407-13-584.

Sarmento de Almeida, Gibran et al. « Reproductive risk factors differ among breast cancer patientsand controls in a public hospital of Paraiba, northeast Brazil. » eng. In : Asian Pacific journalof cancer prevention : APJCP 16.7 (2015), p. 2959-2965. issn : 2476-762X (Electronic). doi :10.7314/apjcp.2015.16.7.2959.

Shantakumar, Sumitra et al. « Reproductive factors and breast cancer risk among older women. »eng. In : Breast cancer research and treatment 102.3 (2007), p. 365-374. issn : 0167-6806 (Print).doi : 10.1007/s10549-006-9343-4.

Shema, Lilach et al. « The association between breastfeeding and breast cancer occurrence amongIsraeli Jewish women : a case control study. » eng. In : Journal of cancer research and clinicaloncology 133.8 (2007), p. 539-546. issn : 0171-5216 (Print). doi : 10.1007/s00432-007-0199-8.

Shinde, Shivani S et al. « Higher parity and shorter breastfeeding duration : association with triple-negative phenotype of breast cancer. » eng. In : Cancer 116.21 (2010), p. 4933-4943. issn : 0008-543X (Print). doi : 10.1002/cncr.25443.

Stendell-Hollis, Nicole R et al. « Investigating the association of lactation history and postmeno-pausal breast cancer risk in the Women’s Health Initiative. » eng. In : Nutrition and cancer 65.7(2013), p. 969-981. issn : 1532-7914 (Electronic). doi : 10.1080/01635581.2013.815787.

Stuebe, Alison M et al. « Lactation and incidence of premenopausal breast cancer : a longitudinalstudy. » eng. In : Archives of internal medicine 169.15 (2009), p. 1364-1371. issn : 1538-3679(Electronic). doi : 10.1001/archinternmed.2009.231.

Sugawara, Yumi et al. « Lactation pattern and the risk for hormone-related female cancer in Japan :the Ohsaki Cohort Study. » eng. In : European journal of cancer prevention : the official journalof the European Cancer Prevention Organisation (ECP) 22.2 (2013), p. 187-192. issn : 1473-5709(Electronic). doi : 10.1097/CEJ.0b013e3283564610.

Tao, Ping et al. « [Risk factors of breast cancer in Asian women : a meta-analysis]. » chi. In : Zhonghualiu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi 32.2 (2011), p. 164-169. issn : 0254-6450(Print).

Tehranian, Najmeh et al. « Risk factors for breast cancer in Iranian women aged less than 40 years. »eng. In : Asian Pacific journal of cancer prevention : APJCP 11.6 (2010), p. 1723-1725. issn : 2476-762X (Electronic).

Toss, Angela et al. « The impact of reproductive life on breast cancer risk in women with family historyor BRCA mutation. » eng. In : Oncotarget 8.6 (2017), p. 9144-9154. issn : 1949-2553 (Electronic).doi : 10.18632/oncotarget.13423.

Unar-Munguía, Mishel et al. « Breastfeeding Mode and Risk of Breast Cancer : A Dose-ResponseMeta-Analysis. » eng. In : Journal of human lactation : official journal of International LactationConsultant Association 33.2 (2017), p. 422-434. issn : 1552-5732 (Electronic). doi : 10.1177/0890334416683676.

Warner, Erica T et al. « Reproductive factors and risk of premenopausal breast cancer by age atdiagnosis : are there differences before and after age 40 ? » eng. In : Breast cancer research andtreatment 142.1 (2013), p. 165-175. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-013-2721-9.

88

Xing, Peng, Jiguang Li et Feng Jin. « A case-control study of reproductive factors associated withsubtypes of breast cancer in Northeast China. » eng. In : Medical oncology (Northwood, London,England) 27.3 (2010), p. 926-931. issn : 1559-131X (Electronic). doi : 10.1007/s12032-009-9308-7.

Yanhua, Che et al. « Reproductive variables and risk of breast malignant and benign tumours inYunnan province, China. » eng. In : Asian Pacific journal of cancer prevention : APJCP 13.5(2012), p. 2179-2184. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2012.13.5.2179.

Zhou, Ying et al. « Association between breastfeeding and breast cancer risk : evidence from a meta-analysis. » eng. In : Breastfeeding medicine : the official journal of the Academy of BreastfeedingMedicine 10.3 (2015), p. 175-182. issn : 1556-8342 (Electronic). doi : 10.1089/bfm.2014.0141.

Oral contraceptive

Anothaisintawee, Thunyarat et al. « Risk factors of breast cancer : a systematic review and meta-analysis. » eng. In : Asia-Pacific journal of public health 25.5 (2013), p. 368-387. issn : 1941-2479(Electronic). doi : 10.1177/1010539513488795.

Beaber, Elisabeth F et al. « Oral contraceptives and breast cancer risk overall and by molecularsubtype among young women. » eng. In : Cancer epidemiology, biomarkers & prevention : a publi-cation of the American Association for Cancer Research, cosponsored by the American Society ofPreventive Oncology 23.5 (2014), p. 755-764. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-13-0944.

Beaber, Elisabeth F et al. « Recent oral contraceptive use by formulation and breast cancer riskamong women 20 to 49 years of age. » eng. In : Cancer research 74.15 (2014), p. 4078-4089. issn :1538-7445 (Electronic). doi : 10.1158/0008-5472.CAN-13-3400.

Beji, N K et N Reis. « Risk factors for breast cancer in Turkish women : a hospital-based case-controlstudy. » eng. In : European journal of cancer care 16.2 (2007), p. 178-184. issn : 0961-5423 (Print).doi : 10.1111/j.1365-2354.2006.00711.x.

Bernholtz, Shiri et al. « Cancer risk in Jewish BRCA1 and BRCA2 mutation carriers : effects of oralcontraceptive use and parental origin of mutation. » eng. In : Breast cancer research and treatment129.2 (2011), p. 557-563. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-011-1509-z.

Bidgoli, Sepideh Arbabi, Tara Eftekhari et Reza Sadeghipour. « Role of xenoestrogens andendogenous sources of estrogens on the occurrence of premenopausal breast cancer in Iran. » eng.In : Asian Pacific journal of cancer prevention : APJCP 12.9 (2011), p. 2425-2430. issn : 2476-762X(Electronic).

Braganza, Melissa Z et al. « Benign breast and gynecologic conditions, reproductive and hormonalfactors, and risk of thyroid cancer. » eng. In : Cancer prevention research (Philadelphia, Pa.) 7.4(2014), p. 418-425. issn : 1940-6215 (Electronic). doi : 10.1158/1940-6207.CAPR-13-0367.

Brohet, Richard M et al. « Oral contraceptives and breast cancer risk in the international BRCA1/2carrier cohort study : a report from EMBRACE, GENEPSO, GEO-HEBON, and the IBCCS Col-laborating Group. » eng. In : Journal of clinical oncology : official journal of the American Societyof Clinical Oncology 25.25 (2007), p. 3831-3836. issn : 1527-7755 (Electronic). doi : 10.1200/JCO.2007.11.1179.

Cauchi, John Paul, Liberato Camilleri et Christian Scerri. « Environmental and lifestyle riskfactors of breast cancer in Malta-a retrospective case-control study. » eng. In : The EPMA journal7.1 (2016), p. 20. issn : 1878-5077 (Print). doi : 10.1186/s13167-016-0069-z.

Chollet-Hinton, Lynn et al. « Biology and Etiology of Young-Onset Breast Cancers among Pre-menopausal African American Women : Results from the AMBER Consortium. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for Cancer Re-search, cosponsored by the American Society of Preventive Oncology 26.12 (2017), p. 1722-1729.issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-17-0450.

Cook, Linda S et al. « Hormone contraception before the first birth and endometrial cancer risk. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Association for

89

Cancer Research, cosponsored by the American Society of Preventive Oncology 23.2 (2014), p. 356-361. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-13-0943.

Delort, Laetitia et al. « Risk factors for early age at breast cancer onset–the "COSA program"population-based study. » eng. In : Anticancer research 27.2 (2007), p. 1087-1094. issn : 0250-7005(Print).


Elebro, Karin et al. « Age at first childbirth and oral contraceptive use are associated with risk ofandrogen receptor-negative breast cancer : the Malmö Diet and Cancer Cohort. » eng. In : Cancercauses & control : CCC 25.8 (2014), p. 945-957. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-014-0394-2.


Ferris, J S et al. « Oral contraceptive and reproductive risk factors for ovarian cancer within sistersin the breast cancer family registry. » eng. In : British journal of cancer 110.4 (2014), p. 1074-1080.issn : 1532-1827 (Electronic). doi : 10.1038/bjc.2013.803.

Figueiredo, Jane C et al. « Oral contraceptives and postmenopausal hormones and risk of contrala-teral breast cancer among BRCA1 and BRCA2 mutation carriers and noncarriers : the WECAREStudy. » eng. In : Breast cancer research and treatment 120.1 (2010), p. 175-183. issn : 1573-7217(Electronic). doi : 10.1007/s10549-009-0455-5.

Figueiredo, Jane C et al. « Oral contraceptives, postmenopausal hormones, and risk of asynchronousbilateral breast cancer : the WECARE Study Group. » eng. In : Journal of clinical oncology : officialjournal of the American Society of Clinical Oncology 26.9 (2008), p. 1411-1418. issn : 1527-7755(Electronic). doi : 10.1200/JCO.2007.14.3081.

Folger, Suzanne G et al. « Risk of breast cancer associated with short-term use of oral contracep-tives. » eng. In : Cancer causes & control : CCC 18.2 (2007), p. 189-198. issn : 0957-5243 (Print).doi : 10.1007/s10552-006-0086-7.

Ghiasvand, Reza et al. « Risk factors for breast cancer among young women in southern Iran. » eng.In : International journal of cancer 129.6 (2011), p. 1443-1449. issn : 1097-0215 (Electronic). doi :10.1002/ijc.25748.

Gill, Jasmeet K et al. « Oral contraceptive use and risk of breast carcinoma in situ (United States). »eng. In : Cancer causes & control : CCC 17.9 (2006), p. 1155-1162. issn : 0957-5243 (Print). doi :10.1007/s10552-006-0056-0.

Haile, Robert W et al. « BRCA1 and BRCA2 mutation carriers, oral contraceptive use, and breastcancer before age 50. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 15.10 (2006), p. 1863-1870. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-06-0258.

Hosseinzadeh, Mina et al. « Risk factors for breast cancer in Iranian women : a hospital-based case-control study in tabriz, iran. » eng. In : Journal of breast cancer 17.3 (2014), p. 236-243. issn :1738-6756 (Print). doi : 10.4048/jbc.2014.17.3.236.

Hunter, David J et al. « Oral contraceptive use and breast cancer : a prospective study of youngwomen. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the AmericanAssociation for Cancer Research, cosponsored by the American Society of Preventive Oncology 19.10(2010), p. 2496-2502. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-10-0747.

Ichida, Miho et al. « No increase in breast cancer risk in Japanese women taking oral contraceptives : acase-control study investigating reproductive, menstrual and familial risk factors for breast cancer. »eng. In : Asian Pacific journal of cancer prevention : APJCP 16.9 (2015), p. 3685-3690. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2015.16.9.3685.

90

Iodice, S et al. « Oral contraceptive use and breast or ovarian cancer risk in BRCA1/2 carriers :a meta-analysis. » eng. In : European journal of cancer (Oxford, England : 1990) 46.12 (2010),p. 2275-2284. issn : 1879-0852 (Electronic). doi : 10.1016/j.ejca.2010.04.018.

Kahlenborn, Chris et al. « Oral contraceptive use as a risk factor for premenopausal breast cancer :a meta-analysis. » eng. In : Mayo Clinic proceedings 81.10 (2006), p. 1290-1302. issn : 0025-6196(Print). doi : 10.4065/81.10.1290.

Karim, Syed Mustafa et al. « Oral contraceptives, abortion and breast cancer risk : a case controlstudy in Saudi Arabia. » eng. In : Asian Pacific journal of cancer prevention : APJCP 16.9 (2015),p. 3957-3960. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2015.16.9.3957.

Kotsopoulos, Joanne et al. « Timing of oral contraceptive use and the risk of breast cancer in BRCA1mutation carriers. » eng. In : Breast cancer research and treatment 143.3 (2014), p. 579-586. issn :1573-7217 (Electronic). doi : 10.1007/s10549-013-2823-4.

Kwan, Marilyn L et al. « Epidemiology of breast cancer subtypes in two prospective cohort studies ofbreast cancer survivors. » eng. In : Breast cancer research : BCR 11.3 (2009), R31. issn : 1465-542X(Electronic). doi : 10.1186/bcr2261.

Lee, Eunjung et al. « Effect of reproductive factors and oral contraceptives on breast cancer riskin BRCA1/2 mutation carriers and noncarriers : results from a population-based study. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 17.11 (2008),p. 3170-3178. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0396.

Lodha, R et al. « Association between reproductive factors and breast cancer in an urban set up atcentral India : a case-control study. » eng. In : Indian journal of cancer 48.3 (2011), p. 303-307.issn : 1998-4774 (Electronic). doi : 10.4103/0019-509X.84928.

Lodha, Rama S et al. « Risk factors for breast cancer among women in Bhopal urban agglomerate :a case-control study. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.8 (2011),p. 2111-2115. issn : 2476-762X (Electronic).

Lumachi, Franco et al. « Estrogen therapy and risk of breast cancer in postmenopausal women : acase-control study and results of a multivariate analysis. » eng. In : Menopause (New York, N.Y.)17.3 (2010), p. 524-528. issn : 1530-0374 (Electronic). doi : 10.1097/gme.0b013e3181ca0c74.

Lund, Eiliv et al. « Hormone replacement therapy and breast cancer in former users of oral contraceptives–The Norwegian Women and Cancer study. » eng. In : International journal of cancer 121.3 (2007),p. 645-648. issn : 0020-7136 (Print). doi : 10.1002/ijc.22699.

Matalqah, Laila et al. « Predictors of breast cancer among women in a northern state of Malaysia :a matched case-control study. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.6(2011), p. 1549-1553. issn : 2476-762X (Electronic).

Nichols, Hazel B et al. « Oral contraceptive use and risk of breast carcinoma in situ. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 16.11 (2007), p. 2262-2268.issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-0456.

Nindrea, Ricvan Dana, Teguh Aryandono et Lutfan Lazuardi. « Breast Cancer Risk From Mo-difiable and Non-Modifiable Risk Factors among Women in Southeast Asia : A Meta-Analysis. »eng. In : Asian Pacific journal of cancer prevention : APJCP 18.12 (2017), p. 3201-3206. issn :2476-762X (Electronic). doi : 10.22034/APJCP.2017.18.12.3201.

Nyante, Sarah J et al. « The association between oral contraceptive use and lobular and ductal breastcancer in young women. » eng. In : International journal of cancer 122.4 (2008), p. 936-941. issn :1097-0215 (Electronic). doi : 10.1002/ijc.23163.

Ozmen, Vahit et al. « Breast cancer risk factors in Turkish women–a University Hospital based nestedcase control study. » eng. In : World journal of surgical oncology 7 (2009), p. 37. issn : 1477-7819(Electronic). doi : 10.1186/1477-7819-7-37.

Phillips, Lynette S et al. « Reproductive and hormonal risk factors for ductal carcinoma in situ of thebreast. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the American

91

Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 18.5(2009), p. 1507-1514. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0967.


Poosari, Arisara et al. « Hormonal contraceptive use and breast cancer in Thai women. » eng. In :Journal of epidemiology 24.3 (2014), p. 216-220. issn : 1349-9092 (Electronic). doi : 10.2188/jea.je20130121.

Rejali, Lokman, Mohd Hasni Jaafar et Noor Hassim Ismail. « Serum selenium level and otherrisk factors for breast cancer among patients in a Malaysian hospital. » eng. In : Environmentalhealth and preventive medicine 12.3 (2007), p. 105-110. issn : 1342-078X (Print). doi : 10.1007/BF02898024.

Ritte, Rebecca et al. « Reproductive factors and risk of hormone receptor positive and negative breastcancer : a cohort study. » eng. In : BMC cancer 13 (2013), p. 584. issn : 1471-2407 (Electronic).doi : 10.1186/1471-2407-13-584.

Rosenberg, Lynn et al. « A case-control study of oral contraceptive use and incident breast cancer. »eng. In : American journal of epidemiology 169.4 (2009), p. 473-479. issn : 1476-6256 (Electronic).doi : 10.1093/aje/kwn360.

Rosenberg, Lynn et al. « Oral contraceptive use and estrogen/progesterone receptor-negative breastcancer among African American women. » eng. In : Cancer epidemiology, biomarkers & preven-tion : a publication of the American Association for Cancer Research, cosponsored by the AmericanSociety of Preventive Oncology 19.8 (2010), p. 2073-2079. issn : 1538-7755 (Electronic). doi :10.1158/1055-9965.EPI-10-0428.

Shantakumar, Sumitra et al. « Age and menopausal effects of hormonal birth control and hormonereplacement therapy in relation to breast cancer risk. » eng. In : American journal of epidemiology165.10 (2007), p. 1187-1198. issn : 0002-9262 (Print). doi : 10.1093/aje/kwm006.

Sofi, Nighat Y et al. « Reproductive factors, nutritional status and serum 25(OH)D levels in womenwith breast cancer : A case control study. » eng. In : The Journal of steroid biochemistry andmolecular biology 175 (2018), p. 200-204. issn : 1879-1220 (Electronic). doi : 10.1016/j.jsbmb.2017.11.003.

Sweeney, Carol et al. « Oral, injected and implanted contraceptives and breast cancer risk amongU.S. Hispanic and non-Hispanic white women. » eng. In : International journal of cancer 121.11(2007), p. 2517-2523. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.22970.

Tazhibi, Mehdi et al. « Hormonal and reproductive risk factors associated with breast cancer in Isfahanpatients. » eng. In : Journal of education and health promotion 3 (2014), p. 69. issn : 2277-9531(Print). doi : 10.4103/2277-9531.134818.


Thorbjarnardottir, Thuridur et al. « Oral contraceptives, hormone replacement therapy and breastcancer risk : a cohort study of 16 928 women 48 years and older. » eng. In : Acta oncologica (Stock-holm, Sweden) 53.6 (2014), p. 752-758. issn : 1651-226X (Electronic). doi : 10.3109/0284186X.2013.878471.


Trivers, Katrina F et al. « Oral contraceptives and survival in breast cancer patients aged 20 to 54years. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the AmericanAssociation for Cancer Research, cosponsored by the American Society of Preventive Oncology 16.9(2007), p. 1822-1827. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-0053.

92

Veneroso, Carmela, Robert Siegel et Paul H Levine. « Early age at first childbirth associated withadvanced tumor grade in breast cancer. » eng. In : Cancer detection and prevention 32.3 (2008),p. 215-223. issn : 1525-1500 (Electronic). doi : 10.1016/j.cdp.2008.04.001.

Vessey, M et R Painter. « Oral contraceptive use and cancer. Findings in a large cohort study,1968-2004. » eng. In : British journal of cancer 95.3 (2006), p. 385-389. issn : 0007-0920 (Print).doi : 10.1038/sj.bjc.6603260.

Williams, Lindsay A et al. « Reproductive risk factor associations with lobular and ductal carcinomain the Carolina Breast Cancer Study. » eng. In : Cancer causes & control : CCC 29.1 (2018),p. 25-32. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-017-0977-9.

Work, M E et al. « Reproductive risk factors and oestrogen/progesterone receptor-negative breastcancer in the Breast Cancer Family Registry. » eng. In : British journal of cancer 110.5 (2014),p. 1367-1377. issn : 1532-1827 (Electronic). doi : 10.1038/bjc.2013.807.

Work, Meghan E et al. « Risk factors for uncommon histologic subtypes of breast cancer usingcentralized pathology review in the Breast Cancer Family Registry. » eng. In : Breast cancer researchand treatment 134.3 (2012), p. 1209-1220. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-012-2056-y.

Xu, Ya-Li et al. « A case-control study on risk factors of breast cancer in China. » eng. In : Archives ofmedical science : AMS 8.2 (2012), p. 303-309. issn : 1896-9151 (Electronic). doi : 10.5114/aoms.2012.28558.

Age at menarche and at menopause

Adami, H O et al. « Absence of association between reproductive variables and the risk of breastcancer in young women in Sweden and Norway. » eng. In : British journal of cancer 62.1 (1990),p. 122-126. issn : 0007-0920 (Print). doi : 10.1038/bjc.1990.242.

Ahmed, Kawsar et al. « Association Assessment among Risk Factors and Breast Cancer in a LowIncome Country : Bangladesh. » eng. In : Asian Pacific journal of cancer prevention : APJCP 16.17(2015), p. 7507-7512. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2015.16.17.7507.

Alexander, F E, M M Roberts et A Huggins. « Risk factors for breast cancer with applications toselection for the prevalence screen. » eng. In : Journal of epidemiology and community health 41.2(1987), p. 101-106. issn : 0143-005X (Print). doi : 10.1136/jech.41.2.101.

Alsaker, Mirjam D K et al. « The association of reproductive factors and breastfeeding with longterm survival from breast cancer. » eng. In : Breast cancer research and treatment 130.1 (2011),p. 175-182. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-011-1566-3.

Ambrosone, Christine B et al. « Associations between estrogen receptor-negative breast cancer andtiming of reproductive events differ between African American and European American women. »eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 23.6 (2014),p. 1115-1120. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-14-0110.

Anderson, Kristin N, Richard B Schwab et Maria Elena Martinez. « Reproductive risk factors andbreast cancer subtypes : a review of the literature. » eng. In : Breast cancer research and treatment144.1 (2014), p. 1-10. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-014-2852-7.

Arthur, Rhonda et al. « Association between lifestyle, menstrual/reproductive history, and histolo-gical factors and risk of breast cancer in women biopsied for benign breast disease. » eng. In :Breast cancer research and treatment 165.3 (2017), p. 623-631. issn : 1573-7217 (Electronic). doi :10.1007/s10549-017-4347-9.

Bakken, Kjersti et al. « Menopausal hormone therapy and breast cancer risk : impact of different treat-ments. The European Prospective Investigation into Cancer and Nutrition. » eng. In : Internationaljournal of cancer 128.1 (2011), p. 144-156. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.25314.

Balekouzou, Augustin et al. « Reproductive risk factors associated with breast cancer in women inBangui : a case-control study. » eng. In : BMC women’s health 17.1 (2017), p. 14. issn : 1472-6874(Electronic). doi : 10.1186/s12905-017-0368-0.

93

Beasley, Jeannette M et al. « Alcohol and risk of breast cancer in Mexican women. » eng. In : Cancercauses & control : CCC 21.6 (2010), p. 863-870. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-010-9513-x.

Becher, Heiko, Silke Schmidt et Jenny Chang-Claude. « Reproductive factors and familial pre-disposition for breast cancer by age 50 years. A case-control-family study for assessing main effectsand possible gene-environment interaction. » eng. In : International journal of epidemiology 32.1(2003), p. 38-48. issn : 0300-5771 (Print). doi : 10.1093/ije/dyg003.

Beji, N K et N Reis. « Risk factors for breast cancer in Turkish women : a hospital-based case-controlstudy. » eng. In : European journal of cancer care 16.2 (2007), p. 178-184. issn : 0961-5423 (Print).doi : 10.1111/j.1365-2354.2006.00711.x.

Beral, Valerie et al. « Breast cancer risk in relation to the interval between menopause and startinghormone therapy. » eng. In : Journal of the National Cancer Institute 103.4 (2011), p. 296-305.issn : 1460-2105 (Electronic). doi : 10.1093/jnci/djq527.

Bergkvist, L et al. « The risk of breast cancer after estrogen and estrogen-progestin replacement. »eng. In : The New England journal of medicine 321.5 (1989), p. 293-297. issn : 0028-4793 (Print).doi : 10.1056/NEJM198908033210505.

Bertrand, Kimberly A et al. « Pubertal growth and adult height in relation to breast cancer risk inAfrican American women. » eng. In : International journal of cancer 141.12 (2017), p. 2462-2470.issn : 1097-0215 (Electronic). doi : 10.1002/ijc.31019.

Bodicoat, Danielle H et al. « Timing of pubertal stages and breast cancer risk : the BreakthroughGenerations Study. » eng. In : Breast cancer research : BCR 16.1 (2014), R18. issn : 1465-542X(Electronic). doi : 10.1186/bcr3613.

Braaten, Tonje et al. « Education and risk of breast cancer in the Norwegian-Swedish women’s lifestyleand health cohort study. » eng. In : International journal of cancer 110.4 (2004), p. 579-583. issn :0020-7136 (Print). doi : 10.1002/ijc.20141.

Brinton, L A, R Hoover et J F Jr Fraumeni. « Reproductive factors in the aetiology of breastcancer. » eng. In : British journal of cancer 47.6 (1983), p. 757-762. issn : 0007-0920 (Print). doi :10.1038/bjc.1983.128.

Brinton, L A et al. « Modification of oral contraceptive relationships on breast cancer risk by selectedfactors among younger women. » eng. In : Contraception 55.4 (1997), p. 197-203. issn : 0010-7824(Print). doi : 10.1016/s0010-7824(97)00012-7.

Brinton, Louise A et al. « Menopausal hormone therapy and breast cancer risk in the NIH-AARPDiet and Health Study Cohort. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Societyof Preventive Oncology 17.11 (2008), p. 3150-3160. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0435.

Brouckaert, Olivier et al. « Reproductive profiles and risk of breast cancer subtypes : a multi-centercase-only study. » eng. In : Breast cancer research : BCR 19.1 (2017), p. 119. issn : 1465-542X(Electronic). doi : 10.1186/s13058-017-0909-3.

Butt, Zeeshan et al. « Breast cancer risk factors : a comparison between pre-menopausal and post-menopausal women. » eng. In : JPMA. The Journal of the Pakistan Medical Association 62.2(2012), p. 120-124. issn : 0030-9982 (Print).

Calle, Eugenia E et al. « Overweight, obesity, and mortality from cancer in a prospectively studiedcohort of U.S. adults. » eng. In : The New England journal of medicine 348.17 (2003), p. 1625-1638.issn : 1533-4406 (Electronic). doi : 10.1056/NEJMoa021423.

Calle, Eugenia E et al. « The American Cancer Society Cancer Prevention Study II Nutrition Cohort :rationale, study design, and baseline characteristics. » eng. In : Cancer 94.9 (2002), p. 2490-2501.issn : 0008-543X (Print). doi : 10.1002/cncr.101970.

Cauley, J A et al. « Bone mineral density and risk of breast cancer in older women : the study ofosteoporotic fractures. Study of Osteoporotic Fractures Research Group. » eng. In : JAMA 276.17(1996), p. 1404-1408. issn : 0098-7484 (Print).

94

Chang-Claude, Jenny et al. « Age at menarche and menopause and breast cancer risk in the Interna-tional BRCA1/2 Carrier Cohort Study. » eng. In : Cancer epidemiology, biomarkers & prevention :a publication of the American Association for Cancer Research, cosponsored by the American So-ciety of Preventive Oncology 16.4 (2007), p. 740-746. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-06-0829.

Chaveepojnkamjorn, Wisit et al. « Body Mass Index and Breast Cancer Risk among Thai Preme-nopausal Women : a Case-Control Study. » eng. In : Asian Pacific journal of cancer prevention :APJCP 18.11 (2017), p. 3097-3101. issn : 2476-762X (Electronic). doi : 10.22034/APJCP.2017.18.11.3097.

Chen, Wendy Y et al. « Moderate alcohol consumption during adult life, drinking patterns, andbreast cancer risk. » eng. In : JAMA 306.17 (2011), p. 1884-1890. issn : 1538-3598 (Electronic).doi : 10.1001/jama.2011.1590.

Chie, W C et al. « Age at any full-term pregnancy and breast cancer risk. » eng. In : American journalof epidemiology 151.7 (2000), p. 715-722. issn : 0002-9262 (Print). doi : 10.1093/oxfordjournals.aje.a010266.

Cui, Yong et al. « Associations of hormone-related factors with breast cancer risk according to hormonereceptor status among white and African American women. » eng. In : Clinical breast cancer 14.6(2014), p. 417-425. issn : 1938-0666 (Electronic). doi : 10.1016/j.clbc.2014.04.003.

Daling, J R et al. « Risk of breast cancer among young women : relationship to induced abortion. »eng. In : Journal of the National Cancer Institute 86.21 (1994), p. 1584-1592. issn : 0027-8874(Print). doi : 10.1093/jnci/86.21.1584.


Dumeaux, Vanessa, Elin Alsaker et Eiliv Lund. « Breast cancer and specific types of oral contra-ceptives : a large Norwegian cohort study. » eng. In : International journal of cancer 105.6 (2003),p. 844-850. issn : 0020-7136 (Print). doi : 10.1002/ijc.11167.

Ernster, V L et al. « Benign and malignant breast disease : initial study results of serum and breastfluid analyses of endogenous estrogens. » eng. In : Journal of the National Cancer Institute 79.5(1987), p. 949-960. issn : 0027-8874 (Print).

Ewertz, M et S W Duffy. « Risk of breast cancer in relation to reproductive factors in Denmark. »eng. In : British journal of cancer 58.1 (1988), p. 99-104. issn : 0007-0920 (Print). doi : 10.1038/bjc.1988.172.

Fentiman, I S, D S Allen et G T H Ellison. « The impact of the Occupation of Guernsey 1940-1945on breast cancer risk factors and incidence. » eng. In : International journal of clinical practice 61.6(2007), p. 937-943. issn : 1368-5031 (Print). doi : 10.1111/j.1742-1241.2007.01288.x.

Flesch-Janys, Dieter et al. « Risk of different histological types of postmenopausal breast cancer bytype and regimen of menopausal hormone therapy. » eng. In : International journal of cancer 123.4(2008), p. 933-941. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.23655.

Friedenreich, C M, H E Bryant et K S Courneya. « Case-control study of lifetime physical activityand breast cancer risk. » eng. In : American journal of epidemiology 154.4 (2001), p. 336-347. issn :0002-9262 (Print). doi : 10.1093/aje/154.4.336.

Ghiasvand, Reza et al. « Postmenopausal breast cancer in Iran ; risk factors and their populationattributable fractions. » eng. In : BMC cancer 12 (2012), p. 414. issn : 1471-2407 (Electronic).doi : 10.1186/1471-2407-12-414.

Graham, S et al. « Nutritional epidemiology of postmenopausal breast cancer in western New York. »eng. In : American journal of epidemiology 134.6 (1991), p. 552-566. issn : 0002-9262 (Print). doi :10.1093/oxfordjournals.aje.a116129.


95

Hamajima, N. et al. « Menarche, menopause, and breast cancer risk : individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. » eng. In :The Lancet Oncology 13.11 (2012), p. 1141-1151. issn : 1474-5488 (Electronic). doi : 10.1016/S1470-2045(12)70425-4. url : https://pubmed.ncbi.nlm.nih.gov/23084519/.

Han, Ding-fen et al. « [A case-control study on the risk of female breast cancer in Wuhan area]. » chi.In : Zhonghua liu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi 25.3 (2004), p. 256-260.issn : 0254-6450 (Print).

Hiatt, R A et al. « Exogenous estrogen and breast cancer after bilateral oophorectomy. » eng. In :Cancer 54.1 (1984), p. 139-144. issn : 0008-543X (Print). doi : 10.1002/1097-0142(19840701)54:1<139::aid-cncr2820540128>3.0.co;2-x.

Hislop, T G et al. « Relationship between risk factors for breast cancer and hormonal status. » eng.In : International journal of epidemiology 15.4 (1986), p. 469-476. issn : 0300-5771 (Print). doi :10.1093/ije/15.4.469.

Hopper, J L et al. « Design and analysis issues in a population-based, case-control-family study ofthe genetic epidemiology of breast cancer and the Co-operative Family Registry for Breast CancerStudies (CFRBCS). » eng. In : Journal of the National Cancer Institute. Monographs 26 (1999),p. 95-100. issn : 1052-6773 (Print). doi : 10.1093/oxfordjournals.jncimonographs.a024232.

Horn, Julie et al. « Reproductive factors and the risk of breast cancer in old age : a Norwegian cohortstudy. » eng. In : Breast cancer research and treatment 139.1 (2013), p. 237-243. issn : 1573-7217(Electronic). doi : 10.1007/s10549-013-2531-0.

Hu, Xiao-feng et al. « [Population-attributable risk estimates for breast cancer in Chinese females]. »chi. In : Zhonghua zhong liu za zhi [Chinese journal of oncology] 35.10 (2013), p. 796-800. issn :0253-3766 (Print).

Huo, D et al. « Parity and breastfeeding are protective against breast cancer in Nigerian women. »eng. In : British journal of cancer 98.5 (2008), p. 992-996. issn : 0007-0920 (Print). doi : 10.1038/sj.bjc.6604275.

Iqbal, Javaid et al. « Risk Factors for Premenopausal Breast Cancer in Bangladesh. » eng. In :International journal of breast cancer 2015 (2015), p. 612042. issn : 2090-3170 (Print). doi : 10.1155/2015/612042.

Islam, T et al. « Reproductive and hormonal risk factors for luminal, HER2-overexpressing, and triple-negative breast cancer in Japanese women. » eng. In : Annals of oncology : official journal of theEuropean Society for Medical Oncology 23.9 (2012), p. 2435-2441. issn : 1569-8041 (Electronic).doi : 10.1093/annonc/mdr613.

John, Esther M, Pamela L Horn-Ross et Jocelyn Koo. « Lifetime physical activity and breastcancer risk in a multiethnic population : the San Francisco Bay area breast cancer study. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Association forCancer Research, cosponsored by the American Society of Preventive Oncology 12.11 Pt 1 (2003),p. 1143-1152. issn : 1055-9965 (Print).

Johnson, K C, J Hu et Y Mao. « Passive and active smoking and breast cancer risk in Canada, 1994-97. » eng. In : Cancer causes & control : CCC 11.3 (2000), p. 211-221. issn : 0957-5243 (Print).doi : 10.1023/a:1008906105790.

Jung, Keum Ji et al. « Duration of ovarian hormone exposure and gynecological cancer risk in Koreanwomen : the Korean Heart Study. » eng. In : Cancer epidemiology 41 (2016), p. 1-7. issn : 1877-783X(Electronic). doi : 10.1016/j.canep.2016.01.005.

Kabat, Geoffrey C et al. « Reproductive and menstrual factors and risk of ductal carcinoma in situ ofthe breast in a cohort of postmenopausal women. » eng. In : Cancer causes & control : CCC 22.10(2011), p. 1415-1424. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-011-9814-8.

Kawai, Masaaki et al. « Reproductive factors and breast cancer risk in relation to hormone receptorand menopausal status in Japanese women. » eng. In : Cancer science 103.10 (2012), p. 1861-1870.issn : 1349-7006 (Electronic). doi : 10.1111/j.1349-7006.2012.02379.x.

96

Kawai, Masaaki et al. « Reproductive factors, exogenous female hormone use and breast cancer riskin Japanese : the Miyagi Cohort Study. » eng. In : Cancer causes & control : CCC 21.1 (2010),p. 135-145. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-009-9443-7.

Key, T J et al. « Soya foods and breast cancer risk : a prospective study in Hiroshima and Nagasaki,Japan. » eng. In : British journal of cancer 81.7 (1999), p. 1248-1256. issn : 0007-0920 (Print).doi : 10.1038/sj.bjc.6690837.

Kirsh, Victoria et Nancy Kreiger. « Estrogen and estrogen-progestin replacement therapy and riskof postmenopausal breast cancer in Canada. » eng. In : Cancer causes & control : CCC 13.6 (2002),p. 583-590. issn : 0957-5243 (Print). doi : 10.1023/a:1016330024268.

Ko, Hyeonyoung et al. « Comparison of the association of mammographic density and clinical factorswith ductal carcinoma in situ versus invasive ductal breast cancer in Korean women. » eng. In :BMC cancer 17.1 (2017), p. 821. issn : 1471-2407 (Electronic). doi : 10.1186/s12885-017-3841-0.

Kotsopoulos, Joanne et al. « Oophorectomy after menopause and the risk of breast cancer in BRCA1and BRCA2 mutation carriers. » eng. In : Cancer epidemiology, biomarkers & prevention : a publi-cation of the American Association for Cancer Research, cosponsored by the American Society ofPreventive Oncology 21.7 (2012), p. 1089-1096. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-12-0201.


Lacey, James V Jr et al. « Breast cancer epidemiology according to recognized breast cancer riskfactors in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial Cohort. »eng. In : BMC cancer 9 (2009), p. 84. issn : 1471-2407 (Electronic). doi : 10.1186/1471-2407-9-84.

Largent, Joan A et al. « Reproductive history and risk of second primary breast cancer : the WE-CARE study. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the Ame-rican Association for Cancer Research, cosponsored by the American Society of Preventive Oncology16.5 (2007), p. 906-911. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-06-1003.

Lazovich, D et al. « Induced abortion and breast cancer risk. » eng. In : Epidemiology (Cambridge,Mass.) 11.1 (2000), p. 76-80. issn : 1044-3983 (Print). doi : 10.1097/00001648-200001000-00016.

Lecarpentier, Julie et al. « Breast Cancer Risk Associated with Estrogen Exposure and Trunca-ting Mutation Location in BRCA1/2 Carriers. » eng. In : Cancer epidemiology, biomarkers &prevention : a publication of the American Association for Cancer Research, cosponsored by theAmerican Society of Preventive Oncology 24.4 (2015), p. 698-707. issn : 1538-7755 (Electronic).doi : 10.1158/1055-9965.EPI-14-0884.

Lee, N C et al. « A case-control study of breast cancer and hormonal contraception in Costa Rica. »eng. In : Journal of the National Cancer Institute 79.6 (1987), p. 1247-1254. issn : 0027-8874(Print).

Lee, Sulggi et al. « Postmenopausal hormone therapy and breast cancer risk : the Multiethnic Cohort. »eng. In : International journal of cancer 118.5 (2006), p. 1285-1291. issn : 0020-7136 (Print). doi :10.1002/ijc.21481.

Leon Guerrero, Rachael T et al. « Risk factors for breast cancer in the breast cancer risk modelstudy of Guam and Saipan. » eng. In : Cancer epidemiology 50.Pt B (2017), p. 221-233. issn :1877-783X (Electronic). doi : 10.1016/j.canep.2017.04.008.

Li, Christopher I, Alyson J Littman et Emily White. « Relationship between age maximum height isattained, age at menarche, and age at first full-term birth and breast cancer risk. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 16.10 (2007), p. 2144-2149.issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-07-0242.

Li, Christopher I, Kathleen E Malone et Janet R Daling. « Interactions between body mass indexand hormone therapy and postmenopausal breast cancer risk (United States). » eng. In : Cancer

97

causes & control : CCC 17.5 (2006), p. 695-703. issn : 0957-5243 (Print). doi : 10.1007/s10552-005-0001-7.

Li, Hui et al. « BMI, reproductive factors, and breast cancer molecular subtypes : A case-control studyand meta-analysis. » eng. In : Journal of epidemiology 27.4 (2017), p. 143-151. issn : 1349-9092(Electronic). doi : 10.1016/j.je.2016.05.002.

Liu, Yan-Ting et al. « Physiological, reproductive factors and breast cancer risk in Jiangsu provinceof China. » eng. In : Asian Pacific journal of cancer prevention : APJCP 12.3 (2011), p. 787-790.issn : 2476-762X (Electronic).

Lumachi, F et al. « Breast cancer risk in healthy and symptomatic women : results of a multiva-riate analysis. A case-control study. » eng. In : Biomedicine & pharmacotherapy = Biomedecine& pharmacotherapie 56.8 (2002), p. 416-420. issn : 0753-3322 (Print). doi : 10.1016/s0753-3322(02)00252-4.

Ma, Huiyan et al. « Reproductive factors and breast cancer risk according to joint estrogen andprogesterone receptor status : a meta-analysis of epidemiological studies. » eng. In : Breast cancerresearch : BCR 8.4 (2006), R43. issn : 1465-542X (Electronic). doi : 10.1186/bcr1525.

Ma, Huiyan et al. « Reproductive factors and the risk of triple-negative breast cancer in white womenand African-American women : a pooled analysis. » eng. In : Breast cancer research : BCR 19.1(2017), p. 6. issn : 1465-542X (Electronic). doi : 10.1186/s13058-016-0799-9.

Magnusson, C et al. « Breast-cancer risk following long-term oestrogen- and oestrogen-progestin-replacement therapy. » eng. In : International journal of cancer 81.3 (1999), p. 339-344. issn :0020-7136 (Print). doi : 10.1002/(sici)1097-0215(19990505)81:3<339::aid-ijc5>3.0.co;2-6.

Magnusson, C M K et al. « Body fatness and physical activity at young ages and the risk of breastcancer in premenopausal women. » eng. In : British journal of cancer 93.7 (2005), p. 817-824. issn :0007-0920 (Print). doi : 10.1038/sj.bjc.6602758.

Marchbanks, Polly A et al. « The NICHD Women’s Contraceptive and Reproductive ExperiencesStudy : methods and operational results. » eng. In : Annals of epidemiology 12.4 (2002), p. 213-221.issn : 1047-2797 (Print). doi : 10.1016/s1047-2797(01)00274-5.

Marcus, P M et al. « Adolescent reproductive events and subsequent breast cancer risk. » eng. In :American journal of public health 89.8 (1999), p. 1244-1247. issn : 0090-0036 (Print). doi : 10.2105/ajph.89.8.1244.

McCredie, M et al. « Reproductive factors and breast cancer in New Zealand. » eng. In : Internationaljournal of cancer 76.2 (1998), p. 182-188. issn : 0020-7136 (Print). doi : 10.1002/(sici)1097-0215(19980413)76:2<182::aid-ijc3>3.0.co;2-t.

McCredie, M R et al. « Breast cancer in Australian women under the age of 40. » eng. In : Cancercauses & control : CCC 9.2 (1998), p. 189-198. issn : 0957-5243 (Print). doi : 10.1023/a:1008886328352.

Miller, A B et al. « Canadian National Breast Screening Study : 1. Breast cancer detection anddeath rates among women aged 40 to 49 years. » eng. In : CMAJ : Canadian Medical Associationjournal = journal de l’Association medicale canadienne 147.10 (1992), p. 1459-1476. issn : 0820-3946 (Print).

Mills, P K et al. « Prospective study of exogenous hormone use and breast cancer in Seventh-dayAdventists. » eng. In : Cancer 64.3 (1989), p. 591-597. issn : 0008-543X (Print). doi : 10.1002/1097-0142(19890801)64:3<591::aid-cncr2820640305>3.0.co;2-u.

Monninkhof, E M, Y T van der Schouw et P H Peeters. « Early age at menopause and breastcancer : are leaner women more protected ? A prospective analysis of the Dutch DOM cohort. »eng. In : Breast cancer research and treatment 55.3 (1999), p. 285-291. issn : 0167-6806 (Print).doi : 10.1023/a:1006277207963.

Morabia, A et al. « Relation of breast cancer with passive and active exposure to tobacco smoke. »eng. In : American journal of epidemiology 143.9 (1996), p. 918-928. issn : 0002-9262 (Print). doi :10.1093/oxfordjournals.aje.a008835.

98

Motie, Mohammad R et al. « Modifiable Risk of Breast Cancer in Northeast Iran : Hope for theFuture. A Case-Control Study. » eng. In : Breast care (Basel, Switzerland) 6.6 (2011), p. 453-456.issn : 1661-3791 (Print). doi : 10.1159/000335203.

Msolly, Awatef, Olfa Gharbi et Slim Ben Ahmed. « Impact of menstrual and reproductive factorson breast cancer risk in Tunisia : a case-control study. » eng. In : Medical oncology (Northwood,London, England) 30.1 (2013), p. 480. issn : 1559-131X (Electronic). doi : 10.1007/s12032-013-0480-4.

Newcomb, P A et M T Mandelson. « A record-based evaluation of induced abortion and breastcancer risk (United States). » eng. In : Cancer causes & control : CCC 11.9 (2000), p. 777-781.issn : 0957-5243 (Print). doi : 10.1023/a:1008980804706.

Newcomb, P A et al. « Pregnancy termination in relation to risk of breast cancer. » eng. In : JAMA275.4 (1996), p. 283-287. issn : 0098-7484 (Print).

Nomura, A M et al. « The association of replacement estrogens with breast cancer. » eng. In : In-ternational journal of cancer 37.1 (1986), p. 49-53. issn : 0020-7136 (Print). doi : 10.1002/ijc.2910370109.

Olsson, Håkan L, Christian Ingvar et Anna Bladström. « Hormone replacement therapy containingprogestins and given continuously increases breast carcinoma risk in Sweden. » eng. In : Cancer97.6 (2003), p. 1387-1392. issn : 0008-543X (Print). doi : 10.1002/cncr.11205.

Pan, Hong et al. « Reproductive factors and breast cancer risk among BRCA1 or BRCA2 mutationcarriers : results from ten studies. » eng. In : Cancer epidemiology 38.1 (2014), p. 1-8. issn : 1877-783X (Electronic). doi : 10.1016/j.canep.2013.11.004.

Parameshwari, P, K Muthukumar et H Gladius Jennifer. « A population based case control studyon breast cancer and the associated risk factors in a rural setting in kerala, southern India. » eng.In : Journal of clinical and diagnostic research : JCDR 7.9 (2013), p. 1913-1916. issn : 2249-782X(Print). doi : 10.7860/JCDR/2013/5830.3356.

Park, Sue Kyung et al. « Intrauterine environment and breast cancer risk in a population-based case-control study in Poland. » eng. In : International journal of cancer 119.9 (2006), p. 2136-2141.issn : 0020-7136 (Print). doi : 10.1002/ijc.21974.

Phipps, Amanda I et al. « Reproductive and hormonal risk factors for postmenopausal luminal, HER-2-overexpressing, and triple-negative breast cancer. » eng. In : Cancer 113.7 (2008), p. 1521-1526.issn : 0008-543X (Print). doi : 10.1002/cncr.23786.


Pike, M C et al. « Oral contraceptive use and early abortion as risk factors for breast cancer in youngwomen. » eng. In : British journal of cancer 43.1 (1981), p. 72-76. issn : 0007-0920 (Print). doi :10.1038/bjc.1981.10.

Primic-Zakelj, M et al. « Breast-cancer risk and oral contraceptive use in Slovenian women aged 25to 54. » eng. In : International journal of cancer 62.4 (1995), p. 414-420. issn : 0020-7136 (Print).doi : 10.1002/ijc.2910620410.

Rajbongshi, N et al. « A matched case control study of risk indicators of breast cancer in assam,India. » eng. In : Mymensingh medical journal : MMJ 24.2 (2015), p. 385-391. issn : 1022-4742(Print).

Rebbeck, T R et al. « Pharmacogenetic modulation of combined hormone replacement therapy byprogesterone-metabolism genotypes in postmenopausal breast cancer risk. » eng. In : Americanjournal of epidemiology 166.12 (2007), p. 1392-1399. issn : 1476-6256 (Electronic). doi : 10.1093/aje/kwm239.

Ritte, Rebecca et al. « Height, age at menarche and risk of hormone receptor-positive and -negativebreast cancer : a cohort study. » eng. In : International journal of cancer 132.11 (2013), p. 2619-2629. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.27913.

99

Roddam, A W et al. « Active and passive smoking and the risk of breast cancer in women aged 36-45years : a population based case-control study in the UK. » eng. In : British journal of cancer 97.3(2007), p. 434-439. issn : 0007-0920 (Print). doi : 10.1038/sj.bjc.6603859.

Rohan, T E et A J McMichael. « Oral contraceptive agents and breast cancer : a population-basedcase-control study. » eng. In : The Medical journal of Australia 149.10 (1988), p. 520-526. issn :0025-729X (Print).

Ronco, Alvaro L, Eduardo De Stefani et Hugo Deneo-Pellegrini. « Risk factors for premeno-pausal breast cancer : a case-control study in Uruguay. » eng. In : Asian Pacific journal of cancerprevention : APJCP 13.6 (2012), p. 2879-2886. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2012.13.6.2879.

Rookus, M A et F E van Leeuwen. « Induced abortion and risk for breast cancer : reporting (recall)bias in a Dutch case-control study. » eng. In : Journal of the National Cancer Institute 88.23 (1996),p. 1759-1764. issn : 0027-8874 (Print). doi : 10.1093/jnci/88.23.1759.

Rosato, Valentina et al. « Reproductive and hormonal factors, family history, and breast canceraccording to the hormonal receptor status. » eng. In : European journal of cancer prevention : theofficial journal of the European Cancer Prevention Organisation (ECP) 23.5 (2014), p. 412-417.issn : 1473-5709 (Electronic). doi : 10.1097/CEJ.0b013e3283639f7a.

Rosenberg, L et al. « A case-control study of the risk of breast cancer in relation to oral contraceptiveuse. » eng. In : American journal of epidemiology 136.12 (1992), p. 1437-1444. issn : 0002-9262(Print). doi : 10.1093/oxfordjournals.aje.a116464.

Ross, R K et al. « A case-control study of menopausal estrogen therapy and breast cancer. » eng. In :JAMA 243.16 (1980), p. 1635-1639. issn : 0098-7484 (Print).

Ross, R K et al. « Effect of hormone replacement therapy on breast cancer risk : estrogen versusestrogen plus progestin. » eng. In : Journal of the National Cancer Institute 92.4 (2000), p. 328-332. issn : 0027-8874 (Print). doi : 10.1093/jnci/92.4.328.

Rossing, M A et al. « Oral contraceptive use and risk of breast cancer in middle-aged women. » eng.In : American journal of epidemiology 144.2 (1996), p. 161-164. issn : 0002-9262 (Print). doi :10.1093/oxfordjournals.aje.a008903.

Sanderson, M et al. « Abortion history and breast cancer risk : results from the Shanghai BreastCancer Study. » eng. In : International journal of cancer 92.6 (2001), p. 899-905. issn : 0020-7136(Print). doi : 10.1002/ijc.1263.

Sarmento de Almeida, Gibran et al. « Reproductive risk factors differ among breast cancer patientsand controls in a public hospital of Paraiba, northeast Brazil. » eng. In : Asian Pacific journalof cancer prevention : APJCP 16.7 (2015), p. 2959-2965. issn : 2476-762X (Electronic). doi :10.7314/apjcp.2015.16.7.2959.

Schairer, C et al. « Menopausal estrogen and estrogen-progestin replacement therapy and risk ofbreast cancer (United States). » eng. In : Cancer causes & control : CCC 5.6 (1994), p. 491-500.issn : 0957-5243 (Print). doi : 10.1007/BF01831376.

Schuurman, A G, P A van den Brandt et R A Goldbohm. « Exogenous hormone use and the riskof postmenopausal breast cancer : results from The Netherlands Cohort Study. » eng. In : Cancercauses & control : CCC 6.5 (1995), p. 416-424. issn : 0957-5243 (Print). doi : 10.1007/BF00052181.

Sepandi, Mojtaba et al. « Breast cancer risk factors in women participating in a breast screeningprogram : a study on 11,850 Iranian females. » eng. In : Asian Pacific journal of cancer prevention :APJCP 15.19 (2014), p. 8499-8502. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2014.15.19.8499.

Shantakumar, Sumitra et al. « Age and menopausal effects of hormonal birth control and hormonereplacement therapy in relation to breast cancer risk. » eng. In : American journal of epidemiology165.10 (2007), p. 1187-1198. issn : 0002-9262 (Print). doi : 10.1093/aje/kwm006.

Sharma, A et al. « Assessing the malignant potential of ovarian inclusion cysts in postmenopausalwomen within the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) : a prospectivecohort study. » eng. In : BJOG : an international journal of obstetrics and gynaecology 119.2 (2012),p. 207-219. issn : 1471-0528 (Electronic). doi : 10.1111/j.1471-0528.2011.03038.x.

100

Sighoko, Dominique et al. « Breast cancer in pre-menopausal women in West Africa : analysis oftemporal trends and evaluation of risk factors associated with reproductive life. » eng. In : Breast(Edinburgh, Scotland) 22.5 (2013), p. 828-835. issn : 1532-3080 (Electronic). doi : 10.1016/j.breast.2013.02.011.

Sigurdson, Alice J et al. « Cancer incidence in the US radiologic technologists health study, 1983-1998. » eng. In : Cancer 97.12 (2003), p. 3080-3089. issn : 0008-543X (Print). doi : 10.1002/cncr.11444.

Siskind, V et al. « Breast cancer and breastfeeding : results from an Australian case-control study. »eng. In : American journal of epidemiology 130.2 (1989), p. 229-236. issn : 0002-9262 (Print). doi :10.1093/oxfordjournals.aje.a115329.

Sisti, Julia S et al. « Reproductive factors, tumor estrogen receptor status and contralateral breastcancer risk : results from the WECARE study. » eng. In : SpringerPlus 4 (2015), p. 825. issn :2193-1801 (Print). doi : 10.1186/s40064-015-1642-y.

Stahlberg, Claudia et al. « Increased risk of breast cancer following different regimens of hormonereplacement therapy frequently used in Europe. » eng. In : International journal of cancer 109.5(2004), p. 721-727. issn : 0020-7136 (Print). doi : 10.1002/ijc.20016.

Suzuki, Reiko et al. « Body weight and postmenopausal breast cancer risk defined by estrogen andprogesterone receptor status among Swedish women : A prospective cohort study. » eng. In : In-ternational journal of cancer 119.7 (2006), p. 1683-1689. issn : 0020-7136 (Print). doi : 10.1002/ijc.22034.

Tao, Ping et al. « [Risk factors of breast cancer in Asian women : a meta-analysis]. » chi. In : Zhonghualiu xing bing xue za zhi = Zhonghua liuxingbingxue zazhi 32.2 (2011), p. 164-169. issn : 0254-6450(Print).


Thomas, D B, J P Persing et W B Hutchinson. « Exogenous estrogens and other risk factors forbreast cancer in women with benign breast diseases. » eng. In : Journal of the National CancerInstitute 69.5 (1982), p. 1017-1025. issn : 0027-8874 (Print).


Trentham-Dietz, A et al. « Weight change and risk of postmenopausal breast cancer (UnitedStates). » eng. In : Cancer causes & control : CCC 11.6 (2000), p. 533-542. issn : 0957-5243(Print). doi : 10.1023/a:1008961931534.

Trieu, Phuong Dung Yun et al. « Risk Factors of Female Breast Cancer in Vietnam : A Case-ControlStudy. » eng. In : Cancer research and treatment : official journal of Korean Cancer Association49.4 (2017), p. 990-1000. issn : 2005-9256 (Electronic). doi : 10.4143/crt.2016.488.

Tryggvadóttir, L et al. « Breastfeeding and reduced risk of breast cancer in an Icelandic cohortstudy. » eng. In : American journal of epidemiology 154.1 (2001), p. 37-42. issn : 0002-9262 (Print).doi : 10.1093/aje/154.1.37.

Turkoz, Fatma P et al. « Association between common risk factors and molecular subtypes in breastcancer patients. » eng. In : Breast (Edinburgh, Scotland) 22.3 (2013), p. 344-350. issn : 1532-3080(Electronic). doi : 10.1016/j.breast.2012.08.005.

Ursin, G et al. « Oral contraceptives and premenopausal bilateral breast cancer : a case-control study. »eng. In : Epidemiology (Cambridge, Mass.) 3.5 (1992), p. 414-419. issn : 1044-3983 (Print). doi :10.1097/00001648-199209000-00006.

Ursin, G et al. « Use of oral contraceptives and risk of breast cancer in young women. » eng. In : Breastcancer research and treatment 50.2 (1998), p. 175-184. issn : 0167-6806 (Print). doi : 10.1023/a:1006037823178.

Viladiu, P et al. « A breast cancer case-control study in Girona, Spain. Endocrine, familial andlifestyle factors. » eng. In : European journal of cancer prevention : the official journal of the

101

European Cancer Prevention Organisation (ECP) 5.5 (1996), p. 329-335. issn : 0959-8278 (Print).doi : 10.1097/00008469-199610000-00004.

Wang, Q S et al. « A case-control study of breast cancer in Tianjin, China. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 1.6 (1992), p. 435-439.issn : 1055-9965 (Print).

Warner, Erica T et al. « Reproductive factors and risk of premenopausal breast cancer by age atdiagnosis : are there differences before and after age 40 ? » eng. In : Breast cancer research andtreatment 142.1 (2013), p. 165-175. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-013-2721-9.

Weinstein, A L et al. « Breast cancer risk and oral contraceptive use : results from a large case-controlstudy. » eng. In : Epidemiology (Cambridge, Mass.) 2.5 (1991), p. 353-358. issn : 1044-3983 (Print).doi : 10.1097/00001648-199109000-00007.

Wingo, P A et al. « Age-specific differences in the relationship between oral contraceptive use andbreast cancer. » eng. In : Obstetrics and gynecology 78.2 (1991), p. 161-170. issn : 0029-7844 (Print).

Work, Meghan E et al. « Risk factors for uncommon histologic subtypes of breast cancer usingcentralized pathology review in the Breast Cancer Family Registry. » eng. In : Breast cancer researchand treatment 134.3 (2012), p. 1209-1220. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-012-2056-y.

Wrensch, M R et al. « Breast cancer risk in women with abnormal cytology in nipple aspirates ofbreast fluid. » eng. In : Journal of the National Cancer Institute 93.23 (2001), p. 1791-1798. issn :0027-8874 (Print). doi : 10.1093/jnci/93.23.1791.

Wu, A H et al. « Menstrual and reproductive factors and risk of breast cancer in Asian-Americans. »eng. In : British journal of cancer 73.5 (1996), p. 680-686. issn : 0007-0920 (Print). doi : 10.1038/bjc.1996.118.



Yang, C P et al. « Noncontraceptive hormone use and risk of breast cancer. » eng. In : Cancer causes& control : CCC 3.5 (1992), p. 475-479. issn : 0957-5243 (Print). doi : 10.1007/BF00051360.

Ye, Z et al. « Breast cancer in relation to induced abortions in a cohort of Chinese women. » eng. In :British journal of cancer 87.9 (2002), p. 977-981. issn : 0007-0920 (Print). doi : 10.1038/sj.bjc.6600603.

Yuan, J M et al. « Risk factors for breast cancer in Chinese women in Shanghai. » eng. In : Cancerresearch 48.7 (1988), p. 1949-1953. issn : 0008-5472 (Print).

Familial history of breast cancer

Ahern, Thomas P et al. « Family History of Breast Cancer, Breast Density, and Breast Cancer Riskin a U.S. Breast Cancer Screening Population. » eng. In : Cancer epidemiology, biomarkers &prevention : a publication of the American Association for Cancer Research, cosponsored by theAmerican Society of Preventive Oncology 26.6 (2017), p. 938-944. issn : 1538-7755 (Electronic).doi : 10.1158/1055-9965.EPI-16-0801.

Ahmed, Kawsar et al. « Association Assessment among Risk Factors and Breast Cancer in a LowIncome Country : Bangladesh. » eng. In : Asian Pacific journal of cancer prevention : APJCP 16.17(2015), p. 7507-7512. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2015.16.17.7507.

102

Anderson, Kristin et al. « Family history of breast and ovarian cancer and triple negative subtypein hispanic/latina women. » eng. In : SpringerPlus 3 (2014), p. 727. issn : 2193-1801 (Print). doi :10.1186/2193-1801-3-727.

Balekouzou, Augustin et al. « Behavioral risk factors of breast cancer in Bangui of Central AfricanRepublic : A retrospective case-control study. » eng. In : PloS one 12.2 (2017), e0171154. issn :1932-6203 (Electronic). doi : 10.1371/journal.pone.0171154.

Bethea, Traci N et al. « Family History of Cancer in Relation to Breast Cancer Subtypes in AfricanAmerican Women. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 25.2 (2016), p. 366-373. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-15-1068.

Bevier, Melanie, Kristina Sundquist et Kari Hemminki. « Risk of breast cancer in families of mul-tiple affected women and men. » eng. In : Breast cancer research and treatment 132.2 (2012),p. 723-728. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-011-1915-2.

Biggar, Robert J, Andrew W Bergen et Gry N Poulsen. « Impact of x chromosome genes inexplaining the excess risk of cancer in males. » eng. In : American journal of epidemiology 170.1(2009), p. 65-71. issn : 1476-6256 (Electronic). doi : 10.1093/aje/kwp083.

Brewer, Hannah R et al. « Family history and risk of breast cancer : an analysis accounting forfamily structure. » eng. In : Breast cancer research and treatment 165.1 (2017), p. 193-200. issn :1573-7217 (Electronic). doi : 10.1007/s10549-017-4325-2.

Cerne, J Z et al. « Hormone replacement therapy and some risk factors for breast cancer among Slove-nian postmenopausal women. » eng. In : Climacteric : the journal of the International MenopauseSociety 14.4 (2011), p. 458-463. issn : 1473-0804 (Electronic). doi : 10.3109/13697137.2010.541307.

Colditz, Graham A et al. « Family history and risk of breast cancer : nurses’ health study. » eng.In : Breast cancer research and treatment 133.3 (2012), p. 1097-1104. issn : 1573-7217 (Electronic).doi : 10.1007/s10549-012-1985-9.

Crest, Anthony B et al. « Varying levels of family history of breast cancer in relation to mammographicbreast density (United States). » eng. In : Cancer causes & control : CCC 17.6 (2006), p. 843-850.issn : 0957-5243 (Print). doi : 10.1007/s10552-006-0026-6.


Gledo, Ibrahim, Nurka Pranjic et Subhija Parsko. « Quality of life factor as breast cancer risks. »eng. In : Materia socio-medica 24.3 (2012), p. 171-177. issn : 1512-7680 (Print). doi : 10.5455/msm.2012.24.171-177.

Gokdemir-Yazar, Ozden et al. « Family history attributes and risk factors for breast cancer inTurkey. » eng. In : Asian Pacific journal of cancer prevention : APJCP 15.6 (2014), p. 2841-2846.issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2014.15.6.2841.

Gorlova, Olga Y et al. « Aggregation of cancer among relatives of never-smoking lung cancer pa-tients. » eng. In : International journal of cancer 121.1 (2007), p. 111-118. issn : 0020-7136 (Print).doi : 10.1002/ijc.22615.


Hemminki, Kari et al. « Risk of familial breast cancer is not increased after pregnancy. » eng. In :Breast cancer research and treatment 108.3 (2008), p. 417-420. issn : 0167-6806 (Print). doi :10.1007/s10549-007-9611-y.

Kariri, Mueen et al. « Risk Factors for Breast Cancer in Gaza Strip, Palestine : a Case-ControlStudy. » eng. In : Clinical nutrition research 6.3 (2017), p. 161-171. issn : 2287-3732 (Print). doi :10.7762/cnr.2017.6.3.161.

103

Kazerouni, Neely et al. « Family history of breast cancer as a risk factor for ovarian cancer in aprospective study. » eng. In : Cancer 107.5 (2006), p. 1075-1083. issn : 0008-543X (Print). doi :10.1002/cncr.22082.

Kilfoy, Briseis A et al. « Family history of malignancies and risk of breast cancer : prospective datafrom the Shanghai women’s health study. » eng. In : Cancer causes & control : CCC 19.10 (2008),p. 1139-1145. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-008-9181-2.


Martin, Lisa J et al. « Family history, mammographic density, and risk of breast cancer. » eng.In : Cancer epidemiology, biomarkers & prevention : a publication of the American Associationfor Cancer Research, cosponsored by the American Society of Preventive Oncology 19.2 (2010),p. 456-463. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-09-0881.

Maskarinec, Gertraud et al. « Mammographic density and breast cancer risk by family history inwomen of white and Asian ancestry. » eng. In : Cancer causes & control : CCC 26.4 (2015), p. 621-626. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-015-0551-2.

Metcalfe, Kelly et al. « Family history of cancer and cancer risks in women with BRCA1 or BRCA2mutations. » eng. In : Journal of the National Cancer Institute 102.24 (2010), p. 1874-1878. issn :1460-2105 (Electronic). doi : 10.1093/jnci/djq443.

Naieni, Kourosh Holakouie et al. « Risk factors of breast cancer in north of Iran : a case-control inMazandaran Province. » eng. In : Asian Pacific journal of cancer prevention : APJCP 8.3 (2007),p. 395-398. issn : 1513-7368 (Print).

Nemesure, Barbara et al. « Risk factors for breast cancer in a black population–the Barbados NationalCancer Study. » eng. In : International journal of cancer 124.1 (2009), p. 174-179. issn : 1097-0215(Electronic). doi : 10.1002/ijc.23827.

Nielsen, Henriette Roed et al. « No evidence of increased breast cancer risk for proven noncarriersfrom BRCA1 and BRCA2 families. » eng. In : Familial cancer 15.4 (2016), p. 523-528. issn :1573-7292 (Electronic). doi : 10.1007/s10689-016-9898-0.

Nindrea, Ricvan Dana, Teguh Aryandono et Lutfan Lazuardi. « Breast Cancer Risk From Mo-difiable and Non-Modifiable Risk Factors among Women in Southeast Asia : A Meta-Analysis. »eng. In : Asian Pacific journal of cancer prevention : APJCP 18.12 (2017), p. 3201-3206. issn :2476-762X (Electronic). doi : 10.22034/APJCP.2017.18.12.3201.

Okobia, Michael et al. « Case-control study of risk factors for breast cancer in Nigerian women. »eng. In : International journal of cancer 119.9 (2006), p. 2179-2185. issn : 0020-7136 (Print). doi :10.1002/ijc.22102.

Palmer, Julie R et al. « Family history of cancer and risk of breast cancer in the Black Women’sHealth Study. » eng. In : Cancer causes & control : CCC 20.9 (2009), p. 1733-1737. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-009-9425-9.

Reiner, Anne S et al. « Risk of asynchronous contralateral breast cancer in noncarriers of BRCA1 andBRCA2 mutations with a family history of breast cancer : a report from theWomen’s EnvironmentalCancer and Radiation Epidemiology Study. » eng. In : Journal of clinical oncology : official journalof the American Society of Clinical Oncology 31.4 (2013), p. 433-439. issn : 1527-7755 (Electronic).doi : 10.1200/JCO.2012.43.2013.

Ronco, Alvaro L, Eduardo De Stefani et Hugo Deneo-Pellegrini. « Risk factors for premeno-pausal breast cancer : a case-control study in Uruguay. » eng. In : Asian Pacific journal of cancerprevention : APJCP 13.6 (2012), p. 2879-2886. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2012.13.6.2879.

Ronco, Alvaro L et al. « Food patterns and risk of breast cancer : A factor analysis study in Uruguay. »eng. In : International journal of cancer 119.7 (2006), p. 1672-1678. issn : 0020-7136 (Print). doi :10.1002/ijc.22021.

104

Shamsi, Uzma et al. « A multicenter matched case control study of breast cancer risk factors amongwomen in Karachi, Pakistan. » eng. In : Asian Pacific journal of cancer prevention : APJCP 14.1(2013), p. 183-188. issn : 2476-762X (Electronic). doi : 10.7314/apjcp.2013.14.1.183.

Shiyanbola, Oyewale O et al. « Emerging Trends in Family History of Breast Cancer and AssociatedRisk. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication of the AmericanAssociation for Cancer Research, cosponsored by the American Society of Preventive Oncology 26.12(2017), p. 1753-1760. issn : 1538-7755 (Electronic). doi : 10.1158/1055-9965.EPI-17-0531.

Trentham-Dietz, Amy et al. « Breast cancer risk factors and second primary malignancies amongwomen with breast cancer. » eng. In : Breast cancer research and treatment 105.2 (2007), p. 195-207.issn : 0167-6806 (Print). doi : 10.1007/s10549-006-9446-y.

Tse, Lap Ah et al. « Familial risks and estrogen receptor-positive breast cancer in Hong Kong Chinesewomen. » eng. In : PloS one 10.3 (2015), e0120741. issn : 1932-6203 (Electronic). doi : 10.1371/journal.pone.0120741.

Walker, Meghan J et al. « Impact of familial risk and mammography screening on prognostic in-dicators of breast disease among women from the Ontario site of the Breast Cancer Family Re-gistry. » eng. In : Familial cancer 13.2 (2014), p. 163-172. issn : 1573-7292 (Electronic). doi :10.1007/s10689-013-9689-9.

Welsh, Megan L et al. « Population-based estimates of the relation between breast cancer risk, tumorsubtype, and family history. » eng. In : Breast cancer research and treatment 114.3 (2009), p. 549-558. issn : 1573-7217 (Electronic). doi : 10.1007/s10549-008-0026-1.



Zhou, Wen-Bin et al. « The influence of family history and histological stratification on breast cancerrisk in women with benign breast disease : a meta-analysis. » eng. In : Journal of cancer research andclinical oncology 137.7 (2011), p. 1053-1060. issn : 1432-1335 (Electronic). doi : 10.1007/s00432-011-0979-z.

Physical activity

Bardia, Aditya et al. « Recreational physical activity and risk of postmenopausal breast cancer basedon hormone receptor status. » eng. In : Archives of internal medicine 166.22 (2006), p. 2478-2483.issn : 0003-9926 (Print). doi : 10.1001/archinte.166.22.2478.

Breslow, R A et al. « Long-term recreational physical activity and breast cancer in the NationalHealth and Nutrition Examination Survey I epidemiologic follow-up study. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 10.7 (2001), p. 805-808.issn : 1055-9965 (Print).

Cerhan, J R et al. « Physical activity, physical function, and the risk of breast cancer in a prospectivestudy among elderly women. » eng. In : The journals of gerontology. Series A, Biological sciencesand medical sciences 53.4 (1998), p. M251-6. issn : 1079-5006 (Print). doi : 10.1093/gerona/53a.4.m251.

Chang, Shih-Chen et al. « Association of energy intake and energy balance with postmenopausalbreast cancer in the prostate, lung, colorectal, and ovarian cancer screening trial. » eng. In : Cancerepidemiology, biomarkers & prevention : a publication of the American Association for CancerResearch, cosponsored by the American Society of Preventive Oncology 15.2 (2006), p. 334-341.issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-05-0479.

105

Dallal, Cher M et al. « Long-term recreational physical activity and risk of invasive and in situbreast cancer : the California teachers study. » eng. In : Archives of internal medicine 167.4 (2007),p. 408-415. issn : 0003-9926 (Print). doi : 10.1001/archinte.167.4.408.

Dorgan, J F et al. « Physical activity and risk of breast cancer in the Framingham Heart Study. »eng. In : American journal of epidemiology 139.7 (1994), p. 662-669. issn : 0002-9262 (Print). doi :10.1093/oxfordjournals.aje.a117056.

Eliassen, A Heather et al. « Physical activity and risk of breast cancer among postmenopausal wo-men. » eng. In : Archives of internal medicine 170.19 (2010), p. 1758-1764. issn : 1538-3679 (Elec-tronic). doi : 10.1001/archinternmed.2010.363.

Howard, Regan A et al. « Physical activity and breast cancer risk among pre- and postmenopausalwomen in the U.S. Radiologic Technologists cohort. » eng. In : Cancer causes & control : CCC 20.3(2009), p. 323-333. issn : 1573-7225 (Electronic). doi : 10.1007/s10552-008-9246-2.

Lee, I M et al. « Physical activity and breast cancer risk : the Women’s Health Study (United States). »eng. In : Cancer causes & control : CCC 12.2 (2001), p. 137-145. issn : 0957-5243 (Print). doi :10.1023/a:1008948125076.

Leitzmann, Michael F et al. « Prospective study of physical activity and risk of postmenopausal breastcancer. » eng. In : Breast cancer research : BCR 10.5 (2008), R92. issn : 1465-542X (Electronic).doi : 10.1186/bcr2190.

Luoto, R et al. « The effect of physical activity on breast cancer risk : a cohort study of 30,548women. » eng. In : European journal of epidemiology 16.10 (2000), p. 973-980. issn : 0393-2990(Print). doi : 10.1023/a:1010847311422.

Margolis, Karen L et al. « Physical activity in different periods of life and the risk of breast cancer :the Norwegian-Swedish Women’s Lifestyle and Health cohort study. » eng. In : Cancer epidemio-logy, biomarkers & prevention : a publication of the American Association for Cancer Research,cosponsored by the American Society of Preventive Oncology 14.1 (2005), p. 27-32. issn : 1055-9965(Print).

Maruti, Sonia S et al. « A prospective study of age-specific physical activity and premenopausalbreast cancer. » eng. In : Journal of the National Cancer Institute 100.10 (2008), p. 728-737. issn :1460-2105 (Electronic). doi : 10.1093/jnci/djn135.

McTiernan, Anne et al. « Recreational physical activity and the risk of breast cancer in postme-nopausal women : the Women’s Health Initiative Cohort Study. » eng. In : JAMA 290.10 (2003),p. 1331-1336. issn : 1538-3598 (Electronic). doi : 10.1001/jama.290.10.1331.

Mertens, Amy J et al. « Physical activity and breast cancer incidence in middle-aged women : aprospective cohort study. » eng. In : Breast cancer research and treatment 97.2 (2006), p. 209-214.issn : 0167-6806 (Print). doi : 10.1007/s10549-005-9114-7.

Moradi, Tahereh et al. « Physical activity and risk for breast cancer a prospective cohort study amongSwedish twins. » eng. In : International journal of cancer 100.1 (2002), p. 76-81. issn : 0020-7136(Print). doi : 10.1002/ijc.10447.

Patel, Alpa V et al. « Recreational physical activity and risk of postmenopausal breast cancer in alarge cohort of US women. » eng. In : Cancer causes & control : CCC 14.6 (2003), p. 519-529.issn : 0957-5243 (Print). doi : 10.1023/a:1024895613663.

Peters, Tricia M et al. « Physical activity and postmenopausal breast cancer risk in the NIH-AARPdiet and health study. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 18.1 (2009), p. 289-296. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0768.

Pronk, A et al. « Physical activity and breast cancer risk in Chinese women. » eng. In : British journalof cancer 105.9 (2011), p. 1443-1450. issn : 1532-1827 (Electronic). doi : 10.1038/bjc.2011.370.

Rintala, by Pirjo E et al. « Self-experienced physical workload and risk of breast cancer. » eng.In : Scandinavian journal of work, environment & health 28.3 (2002), p. 158-162. issn : 0355-3140(Print). doi : 10.5271/sjweh.659.

106

Rockhill, B et al. « A prospective study of recreational physical activity and breast cancer risk. »eng. In : Archives of internal medicine 159.19 (1999), p. 2290-2296. issn : 0003-9926 (Print). doi :10.1001/archinte.159.19.2290.

Rockhill, B et al. « Physical activity and breast cancer risk in a cohort of young women. » eng. In :Journal of the National Cancer Institute 90.15 (1998), p. 1155-1160. issn : 0027-8874 (Print). doi :10.1093/jnci/90.15.1155.

Schnohr, Peter et al. « Physical activity in leisure-time and risk of cancer : 14-year follow-up of 28,000Danish men and women. » eng. In : Scandinavian journal of public health 33.4 (2005), p. 244-249.issn : 1403-4948 (Print). doi : 10.1080/14034940510005752.

Sesso, H D, R S Jr Paffenbarger et I M Lee. « Physical activity and breast cancer risk in theCollege Alumni Health Study (United States). » eng. In : Cancer causes & control : CCC 9.4(1998), p. 433-439. issn : 0957-5243 (Print). doi : 10.1023/a:1008827903302.

Silvera, Stephanie A Navarro et al. « Energy balance and breast cancer risk : a prospective cohortstudy. » eng. In : Breast cancer research and treatment 97.1 (2006), p. 97-106. issn : 0167-6806(Print). doi : 10.1007/s10549-005-9098-3.

Steindorf, Karen et al. « Physical activity and risk of breast cancer overall and by hormone receptorstatus : the European prospective investigation into cancer and nutrition. » eng. In : Internationaljournal of cancer 132.7 (2013), p. 1667-1678. issn : 1097-0215 (Electronic). doi : 10.1002/ijc.27778.

Suzuki, Reiko et al. « Leisure-time physical activity and breast cancer risk defined by estrogen andprogesterone receptor status–the Japan Public Health Center-based Prospective Study. » eng. In :Preventive medicine 52.3-4 (2011), p. 227-233. issn : 1096-0260 (Electronic). doi : 10.1016/j.ypmed.2011.01.016.

Suzuki, Sadao et al. « Effect of physical activity on breast cancer risk : findings of the Japan colla-borative cohort study. » eng. In : Cancer epidemiology, biomarkers & prevention : a publication ofthe American Association for Cancer Research, cosponsored by the American Society of PreventiveOncology 17.12 (2008), p. 3396-3401. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-08-0497.

Tehard, Bertrand et al. « Effect of physical activity on women at increased risk of breast cancer :results from the E3N cohort study. » eng. In : Cancer epidemiology, biomarkers & prevention : apublication of the American Association for Cancer Research, cosponsored by the American Societyof Preventive Oncology 15.1 (2006), p. 57-64. issn : 1055-9965 (Print). doi : 10.1158/1055-9965.EPI-05-0603.

Thune, I et al. « Physical activity and the risk of breast cancer. » eng. In : The New England journal ofmedicine 336.18 (1997), p. 1269-1275. issn : 0028-4793 (Print). doi : 10.1056/NEJM199705013361801.

Wyrwich, K W et F D Wolinsky. « Physical activity, disability, and the risk of hospitalization forbreast cancer among older women. » eng. In : The journals of gerontology. Series A, Biologicalsciences and medical sciences 55.7 (2000), p. M418-21. issn : 1079-5006 (Print). doi : 10.1093/gerona/55.7.m418.

107

Annexes

108

Annexe A

Questionnaire

Questionnaire Facteurs de risque du Cancer du Sein chez la femme

française.

Nous allons commencer par vous poser quelques questions sur vos antécé-

dents personnels :

1. Avez-vous déjà eu un cancer du sein ?

� Oui

� Non

2. Quel était votre âge aux premières règles ? tt (années)

3. Avez-vous utilisé une contraception hormonale avant 2006 ?

� Oui ; Si oui, pendant combien de temps ? tt (années)

� Non

4. Avez-vous eu des grossesses avant 2006 :

� Oui ; Si oui :

— Nombre de grossesses ? tt

— Age lors de votre première grossesse : tt (années)

� Non

5. Combien avez-vous eu d’enfants avant 2006 ? tt Et combien de filles parmi ces

enfants ? tt

6. Quelle a été votre durée totale d’allaitement avant 2006 ? (Si vous avez allaité plus

d’un enfant comptez le temps d’allaitement cumulé pour tous vos enfants)

tt mois

7. Avez-vous des soeurs ?

� Oui ; Si oui, combien ? tt

� Non

8. Etiez-vous ménopausée en 2006 ?

� Oui ;Si oui, âge à la ménopause ? tt (années)

� Non

109

Annexe A

� Ne sait pas

9. Avez-vous pris un traitement hormonal pour la ménopause avant 2006 ? (si vous

étiez ménopausée en 2006)

� Non

� Oui : Si oui,

— Nombre d’années de traitement avant 2006 ? tt (années).

— Quel traitement avez-vous pris ? : : : : : : : : : : : : ; � Ne sait pas

10. Avez-vous réalisé des biopsies du sein avant 2006 ?

� Non

� Oui ; Si oui, combien ? tt

Résultat de la 1ère biopsie ?

� Normal

� Lésion non proliférative

� Prolifération sans atypie

� Prolifération avec atypie

� Carcinome lobulaire in situ

� Inconnu

Résultat de la 2ième biopsie ? (si plus d’une biopsie)

� Normal





� Inconnu

Résultat de la 3ième biopsie ? (si plus de deux biopsies)

� Normal





� Inconnu

11. Avez-vous pris un traitement contre l’ostéoporose avant 2006 ?

� Ne sait pas

110

Annexe A

� Non

� Oui ; Si oui, lequel ? (si plusieurs cochez plusieurs cases)

— Raloxifène (Evista R , Optruma R ) � Oui � Non � Ne sait pas

— Biphosphonates (Didronel R , Skelid R , Alendronate R , Actonel R , Clastoban R ,

Fosamax R , Bonviva R ) � Oui � Non � Ne sait pas

— Tériparatide (Forsteo R ) � Oui � Non � Ne sait pas

— Dénosumab (Prolia R , Xgeva R ) � Oui � Non � Ne sait pas

— Autre : � Oui � Non, Si oui lequel ?: : : : : : : : : : : :

12. Avez-vous eu des fractures non causées par un traumatisme important avant 2006 ?

� Non

� Oui ; Si oui, quelle localisation ?

— Poignet : � Oui � Non

— Fémur : � Oui � Non

— Tassement de vertèbres : � Oui � Non

13. A l’âge de 40 (années) quelle était votre :

— Taille t t tcm

— Poids moyen t t tKg

14. Après la ménopause quelle était votre : (si ménopausée en 2006)

— Poids moyen t t tKg

15. Avez-vous présenté un cancer avant 2006 ? (autre qu’un cancer du sein)

� Non

� Oui ; Si oui, de quelle nature ? (si plusieurs cochez plusieurs cases)

� Ovaire

� Colon

� Poumon ; Si oui, avez-vous reçu un traitement par radiothérapie ? � Oui

/ � Non

� Autre ; Si oui, quelle localisation ? : : : : : : : : : : : :

16. Avez-vous eu une maladie de Hodgkin ?

� Non

� Oui ; Si oui, avez-vous eu une radiothérapie du thorax ?

� Oui

� Non

111

Annexe A

Nous allons maintenant vous poser des questions sur vos habitudes de

vie et socio-économiques :

17. Quelle était votre activité physique avant 2006 :

— Nombre d’heure(s) de sport par semaine : tt heures

— Pendant combien d’années avant 2006 ? tt (années)

18. En 2006, par semaine, combien buviez-vous de :

— Verre(s) de vin ? (10cl) tt

— Cannette(s)/bouteille(s) de bière ? (25cl) tt

— Dose(s) d’alcool fort (3cl d’alcool à 40ı) tt

— Pendant combien d’années avant 2006 ? tt (années)

19. Y avait-il des périodes dans le mois où vous rencontriez de réelles difficultés finan-

cières à faire face à vos besoins (alimentation, loyer, électricité. . . ) avant 2006 ?

� Oui

� Non

20. Partiez-vous régulièrement en vacances avant 2006 ?

� Oui

� Non

21. En cas de difficultés, y avait-il dans votre entourage des personnes sur qui vous

pouviez compter pour vous héberger quelques jours en cas de besoin avant 2006 ?

� Oui

� Non

22. En cas de difficultés, y avait-il dans votre entourage des personnes sur qui vous

pouviez compter pour vous apporter une aide matérielle ou financière avant 2006 ?

� Oui

� Non

Nous allons finir par vos antécédents familiaux de cancer :

Antécédents familiaux de cancer du sein :

23. votre mère a-t-elle présenté un cancer du sein ?

� Oui ; Si oui, à quel âge ? tt et en quelle année ? t t tt

� Non

� Ne sait pas

24. votre sœur a-t-elle présenté un cancer du sein ?répéter si plusieurs


112

Annexe A

� Non

� Sans objet

� Ne sait pas

25. votre fille a-t-elle présenté un cancer du sein ?répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

26. votre tante maternelle a-t-elle présenté un cancer du sein ?répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

27. votre père a-t-il présenté un cancer du sein ?


� Non

� Ne sait pas

28. votre tante paternelle a-t-elle présenté un cancer du sein ?répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

Antécédents familiaux de cancer de l’ovaire :

Dans votre famille, le cas échéant,

29. votre mère a-t-elle présenté un cancer de l’ovaire ?


� Non

� Ne sait pas

30. votre soeur a-t-elle présenté un cancer de l’ovaire ? répéter si plusieurs


� Non

� Sans objet

113

Annexe A

� Ne sait pas

31. votre fille a-t-elle présenté un cancer de l’ovaire ? répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

32. votre tante maternelle a-t-elle présenté un cancer de l’ovaire ? répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

33. votre tante paternelle a-t-elle présenté un cancer de l’ovaire ? répéter si plusieurs


� Non

� Sans objet

� Ne sait pas

114

Annexe B

Code première étape de TEMAS

B.1 Shiny UI

ui <- fluidPage(

useShinyjs(),

tags$head(

tags$style(HTML("hr {border-top: 1px solid #000000;}")),

tags$style(

HTML(".info_rounded {

background-color: #e7f3fe;

border-left: 6px solid #2196F3;

border-radius: 0px 10px 10px 0px; } ")

),

tags$style(

HTML("

.shiny-output-error-validation {

color: red;

}

")

)),

titlePanel("Step 1 : Global Search"),

div(style = "position:absolute;left:23em;top:20px;",

actionButton("help", "HELP", icon=icon(’info-circle’),

style="color: #000000; background-color: #FFE436;

border-color: #000000")),

tags$div(style = "position:absolute;left:29em;top:20px;",

a(class="btn btn-default",

href="https://shiny.temas-bonnet.site/",

"Back Home",style="color: #000000;

background-color: #A9CCE3; border-color: #000000")),

115

Annexe B

sidebarLayout(

sidebarPanel(

bsModal(id="dataset", title = "Help Step 1 : Global Search",

trigger = "help", size="large",

tags$div(HTML("<ol>

<li> Define search parameters </li>

<ul>

<li> Enter your search keywords </li>

<ul>

<li> You can use advanced search since it respects

<a href=\"https://www.nlm.nih.gov/bsd/disted/pubmedtutorial/020_350.html\"

target=\"_blank\">PubMed syntax.</a> </li>

<li> Use parentheses wisely</li>

<li> The use of the \"NOT\" operator is not recommended.</li>

<li> For \"phrase searching\" (e.g. breast cancer) use \"_\" instead of

enclosing the phrase in double quotes (e.g. breast_cancer instead of

\"breast cancer\")</li>

</ul>

<div class=\"info_rounded notes\">

Example: 

(breast_cancer[MeSH Terms] OR breast_cancer[Title/Abstract]) AND

(risk_factor[Title/Abstract] OR risk_factor[MeSH Terms])

</div>

<li> Select the date range for the articles you want to retrieve (default

setting is a search over a range of 10 years from today’s date)</li>

<li> Click on the \"Save search parameters\" button and wait until \"Done!\"

appears on your screen</li>

</ul>

<li> Create database </li>

<ul>

<li> Select the maximum number of articles you want to retrieve

(default setting is the number of PubMed answers) </li>

<li> Click on the \"Create database\" button and wait until \"Done!\"

appears on your screen </li>

<li> A preview of your database is displayed on the right of your screen</li>

</ul>

<li> Download database </li>

116

Annexe B

<ul>

<li> Click on the \"Download database\" button to save the formatted

database. </li>

</ul>

<li> Go to Step 2 </li>

<ul>

<li> Click on the \"Step 2 : First sort\" link to go to Step 2 </li>

</ul>

</ol>")

)

),

conditionalPanel("output.re_1 == ’1’",

fluidRow(

column(10,h3("1. Define search \n parameters")),

column(1,actionButton("re_1h", "",

icon = icon("angle-double-up")),align="right")

),

textAreaInput("text", label = h4("Pubmed Search"),

height = "90px" ,value = "((risk_factor[MeSH Terms]

OR risk_factor[Title/Abstract]

OR risk_factors[Title/Abstract])

AND (Breast_cancer[MeSH Terms]

OR Breast_cancer[Title/Abstract]

OR breast_neoplasms[Title/Abstract]))",

width = ’100%’,resize = "vertical"),

textOutput("texterror_format"),

dateRangeInput("dates", label = h4("Publication dates"),

start = Sys.Date() - 3652,

end = Sys.Date(), max = Sys.Date(),

weekstart = 1),

actionButton(inputId = "go",

label = "Save search parameters")),

conditionalPanel("output.re_1 == ’1bis’",

fluidRow(

column(10,h3("1. Define search parameters")),

column(1,actionButton("re_1", "",

icon = icon("angle-double-down")),align="right")

)

117

Annexe B

),

conditionalPanel("input.go",

h4("Query Translation:"),

textOutput("QueryTranslation"),

h4("Number of PubMed answers:"),

textOutput("count"),

hr(),

h3("2. Create database"),

numericInput("num_max", label = h4("Maximum number

of articles to retrieve \n (approx 15s/1000 articles)"),

value = 500,min=100,max=50000),

actionButton(inputId = "base",

label = "Create database"),

),

conditionalPanel("input.base",

hr(),

h3("3. Download database (optional)"),

downloadButton("downloadData", "Download database"),

hr(),

h3(uiOutput("tab"))

)

),

mainPanel(

conditionalPanel("input.base",

h3("Database Preview"),

tableOutput("test")

)

)

)

)

118

Annexe B

B.2 Shiny server

library(backports)

if (!require("devtools"))

install.packages("devtools")

if (!require("rclipboard"))

install.packages("rclipboard")

if (!require("shinyWidgets"))

install.packages("shinyWidgets")

if (!require("processx"))

install.packages("processx")

#install.packages("nlme")

library(nlme)

if (!require("scales"))

install.packages("scales")

library("devtools")

library(shiny)

library(rentrez)

library("XML")

#install.packages("reutils",check_built=T)

# library(reutils)

library(rclipboard)

library(lubridate)

library(stringr)

library("shinyWidgets")

if(!require("shinycssloaders"))

install.packages("shinycssloaders")

if(!require("shinyjs"))

install.packages("shinyjs")

if(!require("shinyBS"))

install.packages("shinyBS")

library(shinycssloaders)

library(shinyjs)

library(shinyBS)

#devtools::install_github("gschofl/reutils")

# set_entrez_key("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX")

Sys.getenv("ENTREZ_KEY")

119

Annexe B

options("scipen"=100)

# options(reutils.api.key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

# options(reutils.email = "[email protected]")

#rsconnect::appDependencies()

‘%then%‘ <- shiny:::‘%OR%‘

humanTime <- function() {

format(Sys.time(), "%Y%m%d-%H%M%OS")

}

name_doss <- humanTime()

path <- paste0("~/",name_doss)

dir.create(path)

setwd(path)

server <- function(input, output,session) {

output$clip <- renderUI({

rclipButton("clipbtn", "Copy",name_doss, icon("clipboard"))

})

sendSweetAlert(

session = session,

title = "Copy and save you session ID :",

text = h3(name_doss),

type = "info",

closeOnClickOutside = F

)

observeEvent(input$clipbtn,

{ clipr::write_clip(name_doss,allow_non_interactive=T)})

onclick("downloadData", toggle(id = "last", anim = F))

observe({

input$text

memo1 <- as.character(input$text)

print(memo1)

print(memo1 != "")

output$texterror_format <- renderText({

validate(need(input$text != "","Please enter keyword(s)") %then%

need(input$text != " ","Please enter keyword(s)")

)

120

Annexe B

})

toggleState("go",!is.null(input$text) && input$text != "" &&

input$text != " ")

})

output$re_1 <- renderText(’1’)

outputOptions(output, "re_1", suspendWhenHidden = FALSE)

observeEvent(input$re_1,{

output$re_1 <- renderText(’1’)

})

observeEvent(input$re_1h,{

output$re_1 <- renderText(’1bis’)

})

rv <- reactiveValues(data = NULL)

rv <- reactiveValues(datedebut = NULL)

rv <- reactiveValues(datefin = NULL)

rv <- reactiveValues(n_max =NULL)

rv <- reactiveValues(datedebut1 =NULL)

rv <- reactiveValues(datefin1 =NULL)

observeEvent(input$go, {

rv$datedebut1 <- input$dates[1]

rv$datefin1 <- input$dates[2]

rv$data <- input$text

rv$datedebut <- str_c(str_sub(rv$datedebut1 ,1,4),"/",

str_sub(rv$datedebut1 ,6,7),"/",str_sub(rv$datedebut1 ,9,10))

rv$datefin <- str_c(str_sub(rv$datefin1 ,1,4),"/",

str_sub(rv$datefin1 ,6,7),"/",str_sub(rv$datefin1 ,9,10))

})

pubmed <- reactiveValues(ids="Attente confirmation")

a <- "Resultat"

observeEvent(input$go,{

121

Annexe B

output$re_1 <- renderText(’1bis’)

sendSweetAlert(

session = session,

btn_labels = NA,

title = "Saving parameters...",

text = "Please wait until \"Done !\" appears on your screen.",

closeOnClickOutside = F,

type = "warning"

)

pubmed$total <- entrez_search(db = "pubmed", term = rv$data ,

mindate=rv$datedebut,maxdate=rv$datefin,use_history = T)

pubmed$ids <- pubmed$total$ids

print(length(pubmed$ids))

pubmed$history <- pubmed$total$web_history

output$QueryTranslation <- renderText(pubmed$total$QueryTranslation)

output$count <- renderText(pubmed$total$count)

updateNumericInput(session, "num_max",

value = pubmed$total$count, min = 100,

max = pubmed$total$count)

sendSweetAlert(

session = session,

title = "Done !",

text = "Search parameters saved !",

type = "success"

)

})

date_jour <- str_sub(date(),start = 9,end = 10)

date_mois <- str_sub(date(),start = 5,end = 7)

date_annee <- str_sub(date(),start = 21,end = 24)

date_heure <- str_c(str_sub(date(),start = 12,end = 13),"h",

str_sub(date(),start = 15,end = 16))

name_id <- str_c("shiny.step1_",date_jour,"_",date_mois, "_" , date_annee,

"_" ,date_heure,".csv")

rv <- reactiveValues(title_abstract = "Etape 2 : creation de la base de donnees")

output$base <-renderPrint({

122

Annexe B

rv$title_abstract})

num <- 1

observeEvent(input$base, {rv$l <- length(pubmed$ids) })

observeEvent(input$base, {rv$title_abstract <- NULL })

observeEvent(input$base, {

rv$n_max <- min(input$num_max,pubmed$total$count)

sendSweetAlert(

session = session,

btn_labels = NA,

title = "Downloading articles...",

text = "Please wait until \"Done !\" appears on your screen.",


type = "warning"

)

pubmed$ids2 <- NULL

niter <- floor(rv$n_max/10000)

for (i in 0:(max(0,(niter-1)))) {

sendSweetAlert(

session = session,

btn_labels = NA,


text = tags$span("Please wait until \"Done !\" appears on your screen.",

tags$br(),

paste("Step",i+1,"/",niter+2)

),


type = "warning"

)

print("passage")

article <- entrez_fetch(db="pubmed",

web_history = pubmed$total$web_history ,rettype ="xml",parsed = T,

retmax = min(10000,rv$n_max), retstart = i*10000)

b <- getNodeSet(article,"//MedlineCitation")

print(article)

print(b)

print(length(b))

for (i in 1:length(b)) {

123

Annexe B

bbb <- xmlSerializeHook(b[[i]])

bbb <- xmlDeserializeHook(bbb)

ttt <- XML::xpathSApply(bbb, "//Abstract", XML::xmlValue)

aaa <- XML::xpathSApply(bbb, "///MedlineCitation/PMID[@Version=’1’]",

XML::xmlValue)

if(length(ttt)==0) ttt <- "NA"

if(length(aaa)==0) aaa <- "NA"

ttt <- paste((XML::xpathSApply(bbb, "//ArticleTitle", XML::xmlValue)),ttt)

rv$title_abstract <-c(rv$title_abstract,ttt)

pubmed$ids2 <-c(pubmed$ids2,aaa)

print(i)

}

}

if(niter>=1 & rv$n_max != 10000 & rv$n_max != 20000 &

rv$n_max != 30000& rv$n_max != 40000& rv$n_max != 50000){

sendSweetAlert(

session = session,

btn_labels = NA,



tags$br(),

paste("Step",niter+1,"/",niter+2)

),


type = "warning"

)

print("pas_bon")

article <- entrez_fetch(db="pubmed",

web_history = pubmed$total$web_history ,rettype ="xml",parsed = T,

retmax = (rv$n_max-10000*niter), retstart = (niter*10000))


print(length(b))






XML::xmlValue)


124

Annexe B


ttt <- paste((XML::xpathSApply(bbb, "//ArticleTitle", XML::xmlValue)),ttt)



print(i)

}

}

while(length(pubmed$ids2)<rv$n_max & (rv$n_max + zzz < pubmed$total$count)){

sendSweetAlert(

session = session,

btn_labels = NA,



tags$br(),

paste("Step",niter+2,"/",niter+2)

),


type = "warning"

)

article <- entrez_fetch(db="pubmed",web_history = pubmed$total$web_history,

rettype ="xml",parsed = T,retmax = (rv$n_max-length(pubmed$ids2)),

retstart= rv$n_max+zzz )

zzz <- zzz + (rv$n_max-length(pubmed$ids2))


print(length(b))






XML::xmlValue)



ttt <- paste((XML::xpathSApply(bbb, "//ArticleTitle", XML::xmlValue)),

ttt)



125

Annexe B

print(i)

}

}

num <- 10

print(length(pubmed$ids2))

output$base <-renderPrint({

head(rv$title_abstract)})

datas2 <- reactive({rv$title_abstract})

clean_data_bis <- reactive({str_replace_all(datas2(),"-","")})

clean_data <- reactive({str_replace_all(clean_data_bis(),"[[:punct:]]"," ")})

clean_data3 <- reactive({str_replace_all(clean_data(),"[[:cntrl:]]"," ")})

clean_data5 <- reactive({str_to_lower(clean_data3())})

clean_data4 <- reactive({iconv(clean_data5(), to="ASCII//TRANSLIT//IGNORE")})

clean_data_bis2 <- reactive({str_replace_all(clean_data4(),"\"","")})

clean_data2 <- reactive({cbind(pubmed$ids2,clean_data_bis2())})

final_data <- clean_data2()

colnames(final_data)<- c("uid","Title_and_abstract")

write.table(final_data, name_id,sep = ";",row.names = FALSE)

output$test <-renderTable({

head(final_data,n=10L,addrownums=F)})

sendSweetAlert(

session = session,

title = "Done !",

text = "Search complete !",

type = "success"

)

output$downloadData <- downloadHandler(

filename = function() {

126

Annexe B

name_id

},

content = function(file) {

write.table(clean_data2(), file,sep = ";",row.names = FALSE)

}

)

}

)

url <- a("Step 2 : First sort",

href=paste0("http://shiny.temas-bonnet.site/Step2_id/?id_sess=",

name_doss))

output$tab <- renderUI({

tagList("4. Go to", url)

})

}

127

Annexe C

Annexes Article 1

128

Supplementary Appendix 1 : Extra figures

Supplemental Figure 1: Stratified random sampling of 300 articles of RS and

TEMAS databases

Annexe C

C.1 Annexe 1

129

Additional file 2: Complete PubMed search queries

SEARCH QUERIES OF REFERENCE SEARCHES FOR BREAST CANCER:

– Breastfeeding:


(breast_feeding[MeSH Terms] OR

breastfeeding[Title/Abstract] OR

breastfeed[Title/Abstract] OR

breast_feeding[Title/Abstract] OR

breast_feed[Title/Abstract] OR

breastfed[Title/Abstract])

)

– Menarche:


(menarche[MeSH Terms] OR

menarche[Title/Abstract] OR

menarcheal[Title/Abstract])

)

– Oral contraceptive:


(oral_contraceptive[MeSH Terms] OR

oral_contraceptive[Title/Abstract] OR

oral_contraceptives[Title/Abstract] OR

oral_contraception[Title/Abstract])

)

– Breast/mammographic density:


(Breast_Density[MeSH Terms] OR

Breast_Density[Title/Abstract] OR

Breast_Percent_Density[Title/Abstract] OR

(Breast[Title/Abstract] AND Percent_Density[Title/Abstract]) OR

Dense_Breasts[Title/Abstract] OR

Mammographic_Densities[Title/Abstract] OR

Mammographic_Density[Title/Abstract] OR

Annexe C

C.2 Annexe 2

130

Mammographic_Percent_Density[Title/Abstract])

)

SEARCH QUERIES FOR META-ANALYSES:

Physical activity:

· RS: (physical_activity AND breast_cancer AND cohort) before Nov 2012

è 552 articles

· TEMAS :

o Step 1: (Breast_cancer AND Cohort) before Nov 2012 è 34076 articles

o Step 2 : Removing 2773 articles without abstract è 31303 articles

o Step 3 : Extraction 2 terms “physic” and “activ” è 330 articles

o Step 4 : Removing articles without OR, HR or RR è 145 articles

Cohorts menarche menopause:

· RS: (breast cancer AND (cohort OR prospective)) AND (hormonal

contraceptives OR hormone replacement OR menarche OR menopause OR

reproductive OR oral contraceptive OR (Cancer AND risk))before

31/01/2012 è 13885 articles

· TEMAS :

o Step 1: (Breast_cancer AND (Cohort OR Prospective)) before Feb

2012 è 34200 articles


o Step 3 : Extraction “hormon-contra”, “hormon-replac”, “menarch”, “menopaus”,

“reproduct”, “contracept-oral”, “cancer-risk” è 5431 articles

o Step 4 : No sorting because no rule of relevance in the meta-analysis

Case control studies menarche menopause:

· RS: (breast cancer AND Case control) AND (hormonal contraceptives

OR hormone replacement OR menarche OR menopause OR reproductive OR

oral contraceptive OR (Cancer AND risk))before 31/01/2012 è 13885 articles

· TEMAS :

o Step 1: (Breast_cancer AND Case_control) before Feb 2012 è 20150

articles


o Step 3 : Extraction “hormon-contra”, “hormon-replac”, “menarch”, “menopaus”,

“reproduct”, “contracept-oral”, “cancer-risk” è 3400 articles

o Step 4 : No sorting because no rule of relevance in the meta-analysis

Annexe C

131

Additional file 3: Methodological Expectations of Cochrane

Intervention Reviews (MECIR)

Higgins JPT, Lasserson T, Chandler J, Tovey D, Thomas, J, Flemyng E, Churchill R. Methodological

Expectations of Cochrane Intervention Reviews. Cochrane: London, Version October 2019

1.5 Searching for studies

C24 Searching general bibliographic databases

and CENTRAL

TEMAS was only developed for the PubMed

database for the moment

C25 Searching specialist bibliographic databases Not Applicable

C26 Searching for different types of evidence Not Applicable

C27 Searching trials registers Not Applicable

C28 Searching for grey literature Not Applicable

C29 Searching within other reviews Not Applicable

C30 Searching reference lists Not Applicable

C31 Searching by contacting relevant

individuals and organizations

Not Applicable

C32 Structuring search strategies for

bibliographic databases

Within each concept, terms are joined together

with the Boolean ‘OR’ operator, and the

concepts are combined with the Boolean ‘AND’

operator. The ‘NOT’ operator was avoided to

avoid the danger of inadvertently removing

records that are relevant from the search set.

All the searches are detailed in the article or in

Additional file 2

C33 Developing search strategies for

bibliographic databases

We identified MeSH terms and free-text terms

(considering, for example, spelling variants,

synonyms…).

C34 Using search filters No search filters were used

C35 Restricting database searches We chose a 10 year date restriction for recall

and precision and date restrictions indicated in

the explored meta-analysis.

C36 Documenting the search process The search process is documented in detail

throughout the process to ensure that all the

searches are reproducible.

C37 Rerunning searches No need to rerun searches, searches are part of

the validation steps

C38 Incorporating findings from rerun searches Not Applicable

Annexe C

C.3 Annexe 3

132

1.6 Selecting studies to include in the review

C39 Making inclusion decisions Two people (EB and PL) worked independently

to determine whether each study meets the

relevance criteria. C40 Excluding studies without useable data Articles without abstract (letter to the editor,

teaching case, erratum)

C41 Documenting decisions about records

identified

All the selection steps are detailed in the article

or in the Additional file 2

C42 Collating multiple reports Not Applicable

Annexe C

133

Annexe D

Annexes Article 2

D.1 Annexe 1

Le questionnaire est présenté dans l’annexe A.

D.2 Annexe 2

Le calcul pour la détermination des seuils par rapport de risque est présenté dans la

section 2.4.3.2.

134

Construction d’un score prédictif du cancer du sein adapté à la population fran-

çaise : détermination de seuils de risque pour un dépistage organisé personnalisé.

Résumé : Le cancer du sein de la femme concerne près de 60.000 nouveaux cas par an, avec unesurvie nette à 5 ans de 87%. Cependant, si le cancer du sein est dépisté à un stade précoce, la survieà 5 ans est de 99%. Pour tenter de dépister le cancer du sein chez les femmes à un stade précoce, laFrance a mis en œuvre un programme national de dépistage organisé pour les femmes âgées de 50à 74 ans et à risque "moyen". En 2017, les citoyennes et les professionnels ont exprimé leur souhaitque le dépistage organisé devienne de plus en plus personnalisé selon les facteurs de risque. Dansla première partie de cette thèse nous avons développé TEMAS (Text-mining Algorithm- assistedSearch), une nouvelle méthode de recherche de la littérature utilisant des méthodes de text-mininget de classification. Nous avons ensuite créé une application web permettant de mettre en œuvreTEMAS sans nécessité de connaissances particulières. Dans la deuxième partie de cette thèse nousavons développé un questionnaire à partir des facteurs de risque de cancer du sein identifiés grâce àTEMAS. Nous avons posé ce questionnaire à 3077 femmes incluses dans le dépistage organisé afind’établir leur score individuel de risque de cancer du sein au cours d’un suivi de 12 ans grâce à unmodèle de Cox et à l’estimateur de Breslow. Grâce à ce score, nous avons ensuite, estimé deux seuilsde risque optimaux pour créer, au sein de la population actuellement considérée à risque "moyen"trois groupes à risque gradué : risque plus bas, risque moyen et risque plus élevé. En conclusion,nous avons proposé une nouvelle stratégie pour un dépistage organisé personnalisé en fonction duniveau de risque.Mots clés : Cancer du sein, Dépistage organisé personnalisé, Score de risque, Risque gradué,Recherche de la littérature, text-mining, application web, TEMAS (Text Mining algorithm assistedsearch)

Construction of a predictive score for breast cancer adapted to the French popu-

lation : determination of risk thresholds for a personalized organized screening.

Abstract : Breast cancer affects nearly 60,000 new women in France each year, with a net 5-yearsurvival of 87%. However, for a breast cancer detected at an early stage, the 5-year survival is 99%.In an attempt to detect breast cancer in women at an early stage, France has set up a nationalscreening program organized for women aged 50 to 74 and at "medium" risk. In 2017, Frenchcitizens and professionals have expressed their wish that the organized screening becomes more andmore personalized according to risk factors. In the first part of this thesis, we developed TEMAS(Text-mining Algorithm- assisted Search), a new literature search method using text-mining andclassification methods. We then created a web application allowing people to implement TEMASwithout the need for special knowledge. In the second part of this thesis, given the risk factorsidentified with TEMAS, we developed a questionnaire. This questionnaire was completed by 3,077women included in the organized screening in the Hérault department, France, in order to estimatetheir individual risk of breast cancer for a 12-year follow-up, based on a score. This score wasconstructed using a Cox model and the Breslow estimator. Then, within the population currentlyconsidered at "medium" risk, this score enabled to delineate two risk thresholds, on the one hand oflower risk, and on the other hand of higher risk. Then, according to the level of risk within the threegroups of graduated risk, we proposed a new strategy that personalized the organized screening.Keywords : Breast cancer, Personnalized organized screening, Risk score, Graduated risk, Litera-ture search, text-mining, web application. TEMAS (Text Mining algorithm assisted search)

Construction d'un score prédictif du cancer du sein adapté à la ...

Documents

Transcript of Construction d'un score prédictif du cancer du sein adapté à la ...