Automatic generation of documents

Rosa Gini, Jacopo Pasquini

Agenzia Regionale di Sanità della Toscana

September 13, 2006

1 Introduction

2 Automated documents
   Automated and actual documents
   Motivation
   Elements of an automatically generated document

3 Examples

4 Workflow for an automated document to complete its task
   The general scheme
   Generating dta files
   Basic elements
   Control code
   Document production

5 Writing ado commands?
   Analysis
   Elements

Stata cannot hold a mouse

- While a common way of generating documents is via visual programs, such as MS Office or OpenOffice, it is completely impossible for Stata to produce documents this way, since it lacks eyes to format a table and hands to hold a mouse in order to cut-and-paste graphs.
- Nevertheless, documents such as paper reports, web pages, screen presentations, ... can also be obtained through a markup language: html, LaTeX, ...

Stata can write text

- A markup language is a programming language for composing documents.
- The code that generates a document is simply a text file, which contains both the contents of the document and the instructions for the markup language to compose them in a beautiful way.
- Stata is able to write text, basically via the file suite of commands.
- This talk summarizes some experience in making Stata produce documents this way.
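
As a minimal sketch of the file suite in use, the do-file fragment below writes a two-line LaTeX fragment. It assumes the auto dataset shipped with Stata; the file name hello.tex and the handle name doc are illustrative choices, not taken from the talk.

```stata
* Minimal sketch: let Stata write a small piece of LaTeX code.
* hello.tex and the handle name doc are hypothetical.
sysuse auto, clear
quietly summarize price
local meanprice = string(r(mean), "%9.2f")   // format the result as text

file open doc using "hello.tex", write replace text
file write doc "\section{Prices}" _n
file write doc "The mean price is `meanprice' dollars." _n
file close doc
```

The resulting hello.tex is plain text, so it can be inspected with `type hello.tex` before it is passed to the markup language.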

What is an automated document?

An automated document is a piece of Stata code whose aim is

- to analyze data
- to write a piece of markup language code that presents the data in an attractive form, i.e. to generate an actual document (in pdf, html, ...)

Warning

It is useful to distinguish between

- the automated document, which is a text file (containing Stata code) for Stata to execute,
- the code of the actual document, which is a text file (containing the markup language code) for the markup language to interpret (possibly via compilation),
- the actual generated document, which is a pdf or html file for a user to read, print, browse or show.

When to write an automated document?

It is worthwhile investing time in producing an automated document when:

- the actual document that is needed is based on figures that can change (e.g. periodically): an automated document easily generates an updated actual document when the data change
- and/or the document is long but structured: an automated document is a piece of Stata code, hence it writes the code of the actual document by means of loops, conditional statements, ... that can repeat the same operations hundreds of times in very little time and without mistakes
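
The second point can be sketched with a loop and a conditional statement. The fragment below is only an illustration for a do-file: it assumes the auto dataset shipped with Stata, and the file name sections.tex, the handle doc and the threshold of 5 are hypothetical.

```stata
* Sketch: one subsection per level of rep78, written by a loop.
* sections.tex, the handle doc and the threshold 5 are hypothetical.
sysuse auto, clear
quietly levelsof rep78, local(levels)

file open doc using "sections.tex", write replace text
foreach l of local levels {
    quietly count if rep78 == `l'
    local n = r(N)
    file write doc "\subsection{Repair record `l'}" _n
    file write doc "Number of cars: `n'." _n
    * Conditional statement: flag small groups
    if `n' < 5 {
        file write doc "This group is small." _n
    }
}
file close doc
```

Rerunning the same code after the data change regenerates every subsection, which is the point of the first item as well.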

Recall that we are talking about documents in which data are analyzed and presented, so we think of such a document as a sequence of basic elements for data analysis, such as tables, graphics, ...

Structure of a markup language code

Normally we produce several files for the markup language to interpret:

- the basic elements of the document, such as tables, graphics, ...
- the control code of the document, which assembles the basic elements and provides the general structure, possibly adding tables of contents or similar features that allow navigating through the document

Traditional basic elements

- Tables
- Graphics

Other basic elements

- Lists
- Trees
- ...

[Flow diagram of the general scheme; each step is labelled with the Stata tools that carry it out]

- data + macros -> dta for tables: xcontract, statsby, parmby, manual code (postfile)
- data + macros -> dta for graphs: statsby, parmby, manual code
- data + macros -> dta for other: manual code, ...
- dta for tables -> tables: reshape, ..., listtex, file write
- dta for graphs -> graphs: twoway graph, file write and do, graph export
- dta for other -> other output: file write, manual code, ...
- data + macros -> graphs: graph, twoway graph, file write and do, graph export
- data + macros -> other output: file write, other programs (winexec)
- data + macros -> control code: file write
- tables, graphs, other output, control code -> document (screen, paper, web): compilation, browsing, ...

Generating dta files

data + macros -> dta for tables, dta for graphs, dta for other (xcontract, statsby, parmby, manual code with postfile)

- this is the actual data analysis
- the results of the analysis are (generally) stored as dta files, conveniently kept in a separate directory
- Stata commands do not uniformly support storing results in dta files: some do not work with a facility such as statsby
- when this support is missing, it is necessary to use postfile

Fortunately, Roger Newson's parmby provides this support for all estimation commands at once.
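
A minimal sketch of the two official routes mentioned on this slide, statsby and postfile, written for a do-file. It assumes the auto dataset shipped with Stata; the directory name results and the file and variable names are hypothetical.

```stata
* Sketch: store analysis results as dta files in a separate directory.
* The directory results and all file names are hypothetical.
capture mkdir results
sysuse auto, clear

* Route 1: statsby collects r() results by group into a dta file
statsby mean=r(mean) sd=r(sd) n=r(N), by(foreign) ///
    saving("results/price_by_origin", replace): summarize price

* Route 2: postfile, for results that must be collected by hand
tempname handle
postfile `handle' foreign median using "results/median_by_origin", replace
forvalues f = 0/1 {
    quietly summarize price if foreign == `f', detail
    post `handle' (`f') (r(p50))
}
postclose `handle'
```

With the saving() option, statsby leaves the data in memory untouched, so the postfile loop can keep working on the same dataset.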

Basic elements: tables

dta for tables -> tables (reshape, ..., listtex, file write)

- this step is supported by several of Roger Newson's commands, mainly listtex
- some features of LaTeX or html tables may be available only by writing markup language code directly (grouping columns, ...)
- the better one knows the markup language, the more finely one can tune the layout of tables

However, since it is simply a matter of writing text, one can generate code for dozens of tables at once, according to style parameters stored in Stata macros.

Note that this step produces text files containing pieces of markup language code (hence with extensions tex, htm, ...), conveniently stored in a separate directory (capture mkdir) named something like tables.
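
The step can be sketched as follows for a do-file. This assumes the user-written listtex package is installed (ssc install listtex) and that its rstyle(), head() and foot() options behave as described in its help file; the macro colspec, the file names and the choice of variables are hypothetical.

```stata
* Sketch: one LaTeX table per group, styled through a macro.
* Requires the user-written listtex package: ssc install listtex
* colspec and the file names are hypothetical.
capture mkdir tables
sysuse auto, clear
local colspec "lrr"    // style parameter: column alignment

forvalues f = 0/1 {
    listtex make price mpg using "tables/cars_`f'.tex" if foreign == `f', ///
        rstyle(tabular) head("\begin{tabular}{`colspec'}") ///
        foot("\end{tabular}") replace
}
```

Changing colspec once changes every generated table, which is the sense in which style parameters can live in Stata macros.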

Basic elements: graphs

dta for graphs -> graphs (twoway graph, file write and do, graph export)
data + macros -> graphs (graph, twoway graph, file write and do, graph export)

- Stata's graph commands often do analysis and graph composition at the same time
- sometimes, when producing graphs that depend on the data (e.g. a line for each year), one can make the automated document write temporary files containing pieces of Stata code and execute them (file write and do)
- graphs must then be exported in suitable formats (pdf, png, ...) for the markup language to read them

Hence this step produces image files in the formats that the particular markup language accepts, conveniently stored in a separate directory (capture mkdir) named something like figures.
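
A sketch of the "file write and do" technique for a do-file: the graph command is assembled from the data, written to a small do-file and then executed. It assumes the auto dataset shipped with Stata and a Stata installation that can export png; makegraph.do, the handle name and the figure name are hypothetical.

```stata
* Sketch: write a graph command that depends on the data, then run it.
* makegraph.do and the figure name are hypothetical.
capture mkdir figures
sysuse auto, clear
quietly levelsof foreign, local(groups)

tempname code
file open `code' using "makegraph.do", write replace text
file write `code' "twoway"
foreach g of local groups {
    * one plot per group found in the data
    file write `code' " (scatter mpg weight if foreign == `g')"
}
file write `code' _n
file write `code' `"graph export "figures/mpg_by_origin.png", replace"' _n
file close `code'

do makegraph.do
```

For a fixed number of plots the twoway command could be typed directly; writing it to a file pays off when the number of plots is only known from the data.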

Basic elements: other output

dta for other -> other output (file write, manual code, ...)
data + macros -> other output (file write, other programs (winexec))

- many other elements accepted by the markup language can be generated
- typically lists, trees, ...
- but also archives of data (for php to read them)

Hence this step produces files, which can be text files containing markup language code, image files, ..., all conveniently stored in a separate directory (capture mkdir) named something like other.
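
As an illustration of an element other than a table or a graph, the fragment below writes an html list. It assumes the auto dataset shipped with Stata; the file name makes.htm, the handle lst and the choice of the first five observations are hypothetical.

```stata
* Sketch: write an html list with file write.
* makes.htm and the handle lst are hypothetical.
capture mkdir other
sysuse auto, clear

file open lst using "other/makes.htm", write replace text
file write lst "<ul>" _n
forvalues i = 1/5 {
    local m = make[`i']        // value of make in observation i
    file write lst "<li>`m'</li>" _n
}
file write lst "</ul>" _n
file close lst
```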

Control code

data + macros -> control code (file write)

Data and macros must be used to produce the control code, which gathers all the pieces together.

Hence this step produces a single text file containing markup language code, named something like main.tex or index.htm.
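
A sketch of control code for LaTeX, written with file write. The table names cars and prices, the figure name and the directory layout are hypothetical; the fragment only writes main.tex and assumes that the files it refers to have been produced separately.

```stata
* Sketch: write the control code that assembles the basic elements.
* The table and figure names are hypothetical.
file open main using "main.tex", write replace text
file write main "\documentclass{article}" _n
file write main "\usepackage{graphicx}" _n
file write main "\begin{document}" _n
file write main "\tableofcontents" _n
foreach t in cars prices {
    file write main "\section{Table `t'}" _n
    file write main "\input{tables/`t'.tex}" _n
}
file write main "\includegraphics{figures/mpg_weight.png}" _n
file write main "\end{document}" _n
file close main
```

The list after `foreach t in` could equally be held in a macro built from the data, so that the control code follows whatever elements were generated.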

Document production: LaTeX

tables, graphs, other output, control code -> document (screen, paper), by compilation

- the file main.tex must be compiled (winexec) to produce a single pdf file named main.pdf
- which can then be viewed on screen or printed
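
A sketch of the compilation step, assuming that pdflatex is installed and on the system path and that main.tex is in the current working directory. The slide names winexec; shell is shown as an alternative because it waits for the compiler to finish, whereas winexec starts the program and returns immediately.

```stata
* Sketch: compile main.tex from within Stata.
* Assumes pdflatex is on the system path (an assumption, not from the talk).
shell pdflatex main.tex

* Alternative named on the slide; it does not wait for the program to finish
* and may need the full path to the executable:
* winexec pdflatex main.tex
```

If the document contains a table of contents, the compilation normally has to be run twice.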

Document production: html

tables, graphs, other output, control code -> document (web), by browsing

- there is no need to compile
- the file index.htm can be viewed in any browser

Can some parts of the process be standardised?

- This process is very flexible
- This way one can obtain virtually any kind of automatic document
- Is it possible to standardize some steps by writing ado files?

It would be very useful if the possibility of storing the results of basic analysis in dta files became more systematic.

Just like graph’s schemes:

- Creating Stata “schemes” for LaTeX tables?
- Creating Stata “schemes” for html tables (css)?

Implement an ado file for trees?

References

Agenzia Regionale di Sanità della Toscana. Isa 65+. Indicatori sulla salute e l'assistenza agli anziani. www.arsanita.toscana.it, 2005.

Rosa Gini, Jacopo Pasquini. 2006. pr0020: Automatic generation of documents. The Stata Journal 6(1) 245-269.

Ghostscript, Ghostview and GSview. http://www.cs.wisc.edu/~ghost/.

Leslie Lamport. LaTeX: A Document Preparation System. Addison-Wesley Publishing Co., 2nd edition, 1994.

Roger Newson. 2003. st0043: Confidence intervals and p-values for delivery to the end user. The Stata Journal 3(3) 245-269.

HyperText Markup Language Home Page. www.w3.org/MarkUp, 2004.