Talend Component Kit Developer Reference Guide


Table of Contents

1. Talend Component Design Choices
   1.1. Component API
   1.2. Isolated
   1.3. REST
   1.4. Fixed set of icons
2. Talend Component Documentation Overview
   2.1. Getting help
   2.2. First steps
   2.3. Learning about Talend Component features
3. Talend Component Getting Started
   3.1. Introducing Talend Component
   3.2. Talend Component System Requirement
   3.3. Quick Start
4. Talend Component Documentation
   4.1. Talend Components Definitions Documentation
   4.2. Components Packaging
   4.3. Build tools
   4.4. Services
   4.5. Talend Component Testing Documentation
5. Talend Component Testing Documentation
   5.1. Best practices
   5.2. component-runtime-testing
   5.3. Beam testing
   5.4. Multiple environments for the same tests
   5.5. Secrets/Passwords and Maven
   5.6. Generating data?
6. Talend Component Best Practices
   6.1. Organize your code
   6.2. Modelize your configuration
   6.3. I/O configuration
   6.4. Processor configuration
   6.5. I/O recommendations
   6.6. I/O limitations
   6.7. Handle UI interactions
   6.8. Version and component
   6.9. Don’t forget to test
   6.10. Contribute to this guide
7. Talend Component REST API Documentation
   7.1. HTTP API
   7.2. Web forms and REST API
   7.3. Logging
   7.4. Server Configuration
8. Wrapping a Beam I/O
   8.1. Limitations
   8.2. Wrap an input
   8.3. Wrap an output
   8.4. Tip
   8.5. Advanced
9. Talend Component Appendix
   9.1. ContainerManager or the classloader manager

1. Talend Component Design Choices

1.1. Component API

The component API is built on several strong choices:

1. it is declarative (through annotations) to ensure it is

a. evolutive (it can get new features without breaking old code)

b. static as much as possible

1.1.1. Evolution

Being fully declarative, any new API can be added iteratively without requiring any changes to existing components.

Example (a projection of a potential Beam evolution):

@ElementListener
public MyOutput onElement(MyInput data) {
    return ...;
}

wouldn’t be affected by the addition of the new Timer API which can be used like:

@ElementListener
public MyOutput onElement(MyInput data,
                          @Timer("my-timer") Timer timer) {
    return ...;
}

1.1.2. Static

UI friendly

The intent of the framework is to fit a Java UI as well as a web UI. It must be understood as both colocalized and remote UIs. The direct impact of that choice is to move as much logic as possible to the UI side for UI related actions. Typically we want to validate a pattern, a size, … on the client side and not on the server side. Being static encourages this practice.

Auditable and with clear expectations

The other goal of being really static in its definition is to ensure the model will not be mutated at runtime and that all the auditing and modelling can be done before, in the design phase.

Dev friendly

Being static also ensures the development can be validated as much as possible through build tools. This doesn’t replace the requirement to test the components but helps the developer maintain their components with automated tools.

1.1.3. Flexible data modeling

1.1.4. Generic and specific

The processor API supports JsonObject as well as any custom model. The intent is to support generic component development which needs to access configured "object paths" as well as specific components which rely on a well defined path from the input.

A generic component would look like:

@ElementListener
public MyOutput onElement(JsonObject input) {
    return ...;
}

A specific component would look like (with MyInput a POJO):

@ElementListener
public MyOutput onElement(MyInput input) {
    return ...;
}

No runtime assumption

By design the framework must run in DI (plain standalone Java program) but also in Beam pipelines. It is also out of scope of the framework to handle the way the runtime serializes - if needed - the data. For that reason it is primordial to not import serialization constraints in the stack. This is why JsonObject is not an IndexedRecord from Avro for instance: to not impose any implementation. Any actual serialization concern - implementation - should either be hidden in the framework runtime (= outside component developer scope) or in the runtime integration with the framework (Beam integration for instance). In this context, JSON-P is a good compromise because it brings a very powerful API with very few constraints.

1.2. Isolated

The components must be able to execute even if they have conflicting libraries. For that purpose their classloaders must be isolated: a component defines its dependencies based on a Maven format and is always bound to its own classloader.

1.3. REST

1.3.1. Consumable model

The definition payload is as flat as possible and strongly typed to ensure it can be manipulated by consumers. This way the consumers can add/remove fields with just some mapping rules and don’t require any abstract tree handling.

The execution (runtime) configuration is the concatenation of a few framework metadata (only the version actually) and a key/value model of the instance of the configuration based on the definition properties paths for the keys. This enables the consumers to maintain and work with the keys/values up to their need.

The framework not being responsible for any persistence, it is crucial to ensure consumers can handle it from end to end, which includes the ability to search for values (update a machine, update a port, etc.) and keys (new encryption rule on key certificate for instance).

Talend Component is a metamodel provider (to build forms) and a runtime execution platform (it takes a configuration instance and uses it volatilely to execute a component logic). This implies it can’t own the data more than defining the contract it has for these two endpoints and must let the consumers handle the data lifecycle (creation, encryption, deletion, …).

1.3.2. Execution with streaming

A new mime type called talend/stream is introduced to define a streaming format.

It basically matches a JSON object per line:

{"key1":"value1"}{"key2":"value2"}{"key1":"value11"}{"key1":"value111"}{"key2":"value2"}

1.4. Fixed set of icons

Icons (@Icon) are based on a fixed set. Even if a custom icon is usable, this comes without any guarantee. The reason is that components can be used in any environment and require a kind of uniform look which can’t be guaranteed outside the UI itself, so defining only keys is the best way to communicate this information.

when you exactly know how you will deploy your component (i.e. in the Studio) then you can use @Icon(value = CUSTOM, custom = "…") to use a custom icon file.

2. Talend Component Documentation Overview

2.1. Getting help

The Talend Component framework is under the responsibility of Mike Hirt's team.

2.2. First steps

If you know nothing about Talend Components, the getting started guide is the place to start with.

• From scratch: Overview | Requirements

• Tutorial: Code | Run

2.3. Learning about Talend Component features

• Core features: Overview

• Advanced: Testing

3. Talend Component Getting Started

3.1. Introducing Talend Component

Talend Component intends to simplify the development of connectors at two main levels:

Runtime

how to inject the specific component code into a job or pipeline. It should unify as much as possible the code required to run in DI and BEAM environments.

Graphical interfaces

unify the code required to be able to render in a browser (web) or the Eclipse-based Studio (SWT).

3.2. Talend Component System Requirement

Talend Component requires Java 8. You can download it on the Oracle website.


To develop a component or the project itself it is recommended to use Apache Maven 3.5.0. You can download it on the Apache Maven website.

3.3. Quick Start

• Generate a component

• Create an input component

• Create an output component

• Test your components

• Configuration and sensitive data

• Create components for REST API

• How to test a REST API

• Dev vs CI setup

4. Talend Component Documentation

4.1. Talend Components Definitions Documentation

4.1.1. Components Definition

Talend Component framework relies on several primitive components.

They can all use @PostConstruct and @PreDestroy to initialize/release some underlying resource at the beginning/end of the processing.

in distributed environments the class constructor will be called on the cluster manager node, while methods annotated with @PostConstruct and @PreDestroy will be called on the worker nodes. Thus, partition plan computation and pipeline tasks will be performed on different nodes.


1. The created task consists of a Jar file containing a class which describes the pipeline (flow) which should be processed in the cluster.

2. During the partition plan computation step the pipeline is analyzed and split into stages. The Cluster Manager node instantiates mappers/processors, gets the estimated data size using the mappers, and splits the created mappers according to the estimated data size. All instances are then serialized and sent to the Worker nodes.

3. Serialized instances are received and deserialized, and methods annotated with @PostConstruct are called. After that, pipeline execution starts. The processor’s @BeforeGroup annotated method is called before processing the first element of a chunk. After processing the number of records estimated as the chunk size, the processor’s @AfterGroup annotated method is called. The chunk size is calculated depending on the environment the pipeline is processed by. After the pipeline is processed, methods annotated with @PreDestroy are called.


all framework managed methods MUST be public too. Private methods are ignored.

in terms of design the framework tries to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This will allow new features of the underlying implementations to be added incrementally.


PartitionMapper

A PartitionMapper is a component able to split itself to make the execution more efficient.

This concept is borrowed from the big data world and useful only in this context (BEAM executions). The overall idea is to divide the work before executing it to try to reduce the overall execution time.

The process is the following:

1. Estimate the size of the data you will work on. This part is often heuristic and not very precise.

2. From that size the execution engine (runner for beam) will request the mapper to split itself in N mappers with a subset of the overall work.

3. The leaf (final) mappers will be used as a Producer (actual reader) factory.

this kind of component MUST be Serializable to be distributable.

Definition

A partition mapper requires 3 methods marked with specific annotations:

1. @Assessor for the evaluating method

2. @Split for the dividing method

3. @Emitter for the Producer factory

@Assessor

The assessor method will return the estimated size of the data related to the component (depending on its configuration). It MUST return a Number and MUST not take any parameter.

Here is an example:

@Assessor
public long estimateDataSetByteSize() {
    return ....;
}

@Split

The split method will return a collection of partition mappers and can optionally take a @PartitionSize long value which is the requested size of the dataset per sub partition mapper.

Here is an example:


@Split
public List<MyMapper> split(@PartitionSize final long desiredSize) {
    return ....;
}

@Emitter

The emitter method MUST not have any parameter and MUST return a producer. It generally uses the partition mapper configuration to instantiate/configure the producer.

Here is an example:

@Emitter
public MyProducer create() {
    return ....;
}
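To show how the three methods fit together, here is a minimal, hypothetical sketch of a complete partition mapper. The class and producer names (DemoMapper, DemoProducer), the estimation heuristic and the split strategy are assumptions for the example; the import locations follow the org.talend.sdk.component.api packages referenced elsewhere in this guide.

import java.io.File;
import java.io.Serializable;
import java.util.List;

import static java.util.Collections.singletonList;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.input.Assessor;
import org.talend.sdk.component.api.input.Emitter;
import org.talend.sdk.component.api.input.PartitionMapper;
import org.talend.sdk.component.api.input.PartitionSize;
import org.talend.sdk.component.api.input.Split;

@PartitionMapper(family = "demo", name = "file-mapper")
public class DemoMapper implements Serializable {

    private final File file;

    public DemoMapper(@Option("file") final File file) {
        this.file = file;
    }

    @Assessor
    public long estimateDataSetByteSize() {
        // heuristic: use the raw file size as the dataset size
        return file.length();
    }

    @Split
    public List<DemoMapper> split(@PartitionSize final long desiredSize) {
        // naive strategy: no real partitioning, the mapper returns itself as the only leaf
        return singletonList(this);
    }

    @Emitter
    public DemoProducer create() {
        // DemoProducer is a hypothetical @Producer component like the one in the next section
        return new DemoProducer(file);
    }
}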

Producer

A Producer is the component interacting with a physical source. It produces input data for the processing flow.

A producer is a very simple component which MUST have a @Producer method without any parameter and returning any data:

@Producer
public MyData produces() {
    return ...;
}

Processor

A Processor is a component responsible for converting incoming data to another model.

A processor MUST have a method decorated with @ElementListener taking incoming data and returning the processed data:

@ElementListener
public MyNewData map(final MyData data) {
    return ...;
}


this kind of component MUST be Serializable since it is distributed.

if you don’t care much about the type of the parameter and need to access data on a "map like" rule set, then you can use JsonObject as the parameter type and Talend Component will just wrap the data to enable you to access it as a map. The parameter type is not enforced, i.e. if you know you will get a SuperCustomDto then you can use that as the parameter type, but for a generic component reusable in any chain it is more than highly encouraged to use JsonObject until you have an evaluation language based processor (which has its own way to access components). Here is an example:

@ElementListener
public MyNewData map(final JsonObject incomingData) {
    String name = incomingData.getString("name");
    int age = incomingData.getInt("age");
    return ...;
}

// equivalent to (using a POJO):

public class Person {
    private String name;
    private int age;

    // getters/setters
}

@ElementListener
public MyNewData map(final Person person) {
    String name = person.getName();
    int age = person.getAge();
    return ...;
}

A processor also supports @BeforeGroup and @AfterGroup which MUST be methods without parameters and returning void (result would be ignored). This is used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size.

this is estimated so you don’t have any guarantee on the size of a group. You can literally have groups of size 1.

The common usage is to batch records for performance reasons:

10 |

@BeforeGroup
public void initBatch() {
    // ...
}

@AfterGroup
public void endBatch() {
    // ...
}

it is a good practice to support a maxBatchSize here and potentially commit before the end of the group in case the computed group size is way too big for your backend.
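As an illustration of that advice, here is a minimal sketch of a processor which buffers records and flushes either when the group ends or when a configured maxBatchSize is reached. The component name, the maxBatchSize option name and the sendToBackend call are assumptions for the example; MyData is the placeholder type used in the previous snippets.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.processor.AfterGroup;
import org.talend.sdk.component.api.processor.BeforeGroup;
import org.talend.sdk.component.api.processor.ElementListener;
import org.talend.sdk.component.api.processor.Processor;

@Processor(family = "demo", name = "batched-output")
public class BatchedOutput implements Serializable {

    private final int maxBatchSize;
    private transient List<MyData> buffer;

    public BatchedOutput(@Option("maxBatchSize") final int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    @BeforeGroup
    public void initBatch() {
        buffer = new ArrayList<>();
    }

    @ElementListener
    public void onElement(final MyData data) {
        buffer.add(data);
        if (buffer.size() >= maxBatchSize) {
            flush(); // don't wait for the end of a potentially huge group
        }
    }

    @AfterGroup
    public void endBatch() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sendToBackend(buffer); // placeholder for the actual bulk call of your backend
        buffer.clear();
    }

    private void sendToBackend(final List<MyData> records) {
        // ...
    }
}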

Multiple outputs

In some cases you may want to split the output of a processor in two. A common example is "main" and "reject" branches where part of the incoming data is put in a specific bucket to be processed later.

This can be done using @Output. This can be used as a replacement of the returned value:

@ElementListener
public void map(final MyData data, @Output final OutputEmitter<MyNewData> output) {
    output.emit(createNewData(data));
}

Or you can pass it a string which will represent the new branch:


@ElementListener
public void map(final MyData data,
                @Output final OutputEmitter<MyNewData> main,
                @Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
    if (isRejected(data)) {
        rejected.emit(createNewData(data));
    } else {
        main.emit(createNewData(data));
    }
}

// or simply

@ElementListener
public MyNewData map(final MyData data,
                     @Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
    if (isSuspicious(data)) {
        rejected.emit(createNewData(data));
        return createNewData(data); // in this case we continue the processing anyway but notified another channel
    }
    return createNewData(data);
}

Multiple inputs

Having multiple inputs is close to the output case except it doesn’t require a wrapper OutputEmitter:

@ElementListener
public MyNewData map(@Input final MyData data1, @Input("input2") final MyData2 data2) {
    return createNewData(data1, data2);
}

@Input takes the input name as its parameter; if not set it uses the main (default) input branch.

due to the work required to not use the default branch, it is recommended to use it when possible and not to name branches depending on the component semantic.


Output

An Output is a Processor returning no data.

Conceptually an output is a listener of data. It perfectly matches the concept of a processor. Being the last of the execution chain or returning no data will make your processor an output:

@ElementListener
public void store(final MyData data) {
    // ...
}

Combiners?

For now Talend Component doesn’t enable you to define a Combiner. It would be the symmetric part of the partition mapper and would allow aggregating results into a single one.

4.1.2. Configuring components

Components are configured through their constructor parameters. They can all be marked with @Option which lets you give a name to parameters (if not, the bytecode name is used, which can require you to compile with the -parameters flag to not have arg0, arg1, … as names).

The parameter types can be primitives or complex objects with fields decorated with @Option exactly like method parameters.

it is recommended to use simple models which can be serialized by components to avoid headaches when implementing serialized components.

Here is an example:


class FileFormat implements Serializable {
    @Option("type")
    private FileType type = FileType.CSV;

    @Option("max-records")
    private int maxRecords = 1024;
}

@PartitionMapper(family = "demo", name = "file-reader")
public MyFileReader(@Option("file-path") final File file,
                    @Option("file-format") final FileFormat format) {
    // ...
}

Using this kind of API makes the configuration extensible and component oriented, letting the user define all he needs.

The instantiation of the parameters is done from the properties passed to the component (see next part).

Primitives

What is considered a primitive in this mechanism is a class which can be directly converted from a String to the expected type.

It obviously includes all Java primitives and the String type itself, but also all the types with an org.apache.xbean.propertyeditor.Converter; a short sketch follows the list below.

This includes out of the box:

• BigDecimal

• BigInteger

• File

• InetAddress

• ObjectName

• URI

• URL

• Pattern
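Here is a minimal sketch of how such primitives surface in a constructor. The component name, option names and the property values in the comments are made up; the conversion from the String properties to the typed parameters is handled by the framework.

import java.net.URL;
import java.util.regex.Pattern;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.input.Emitter;

@Emitter(family = "demo", name = "http-input")
public class HttpInput {
    // endpoint = http://localhost:8080/api    and    filter = ^talend.*
    // would be converted to the typed parameters below
    public HttpInput(@Option("endpoint") final URL endpoint,
                     @Option("filter") final Pattern filter) {
        // ...
    }
    // the @Producer method is omitted from this sketch
}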

Complex object mapping

The conversion from properties to object is using the dotted notation. For instance:

file.path = /home/user/input.csv
file.format = CSV


will match

public class FileOptions {
    @Option("path")
    private File path;

    @Option("format")
    private Format format;
}

assuming the method parameter was configured with @Option("file").
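For illustration, here is a minimal sketch of such a constructor parameter (the component name is hypothetical; FileOptions is the class above):

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.input.Emitter;

@Emitter(family = "demo", name = "file-input")
public class FileInput {
    public FileInput(@Option("file") final FileOptions file) {
        // after instantiation: file.path == /home/user/input.csv and file.format == CSV
    }
}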

List case

Lists use the same syntax but to define their elements they rely on an indexed syntax. Assuming the list parameter is named files and the elements are of FileOptions type, here is how to define a list of 2 elements:

files[0].path = /home/user/input1.csv
files[0].format = CSV
files[1].path = /home/user/input2.xml
files[1].format = EXCEL

Map case

Inspired by the list case, the map uses .key[index] and .value[index] to represent its keys and values:

// Map<String, FileOptions>
files.key[0] = first-file
files.value[0].path = /home/user/input1.csv
files.value[0].type = CSV
files.key[1] = second-file
files.value[1].path = /home/user/input2.xml
files.value[1].type = EXCEL

// Map<FileOptions, String>
files.key[0].path = /home/user/input1.csv
files.key[0].type = CSV
files.value[0] = first-file
files.key[1].path = /home/user/input2.xml
files.key[1].type = EXCEL
files.value[1] = second-file


don’t abuse the map type. If it is not needed for your configuration (= if you can configure your component with an object) don’t use it.

Constraints and validation on the configuration/input

It is common to need to add, as metadata, that a field is required, that another has a minimum size, etc. This is done with the validation annotations in the org.talend.sdk.component.api.configuration.constraint package:

API | Name | Parameter Type | Description | Supported Types | Metadata sample
@org.talend.sdk.component.api.configuration.constraint.Max | maxLength | double | Ensure the decorated option size is validated with a higher bound. | CharSequence | {"validation::maxLength":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Min | minLength | double | Ensure the decorated option size is validated with a lower bound. | CharSequence | {"validation::minLength":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Pattern | pattern | string | Validate the decorated string with a java pattern, you can use the xregex library in javascript. | CharSequence | {"validation::pattern":"test"}
@org.talend.sdk.component.api.configuration.constraint.Max | max | double | Ensure the decorated option size is validated with a higher bound. | Number, int, short, byte, long, double, float | {"validation::max":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Min | min | double | Ensure the decorated option size is validated with a lower bound. | Number, int, short, byte, long, double, float | {"validation::min":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Required | required | - | Mark the field as being mandatory. | Object | {"validation::required":"true"}
@org.talend.sdk.component.api.configuration.constraint.Max | maxItems | double | Ensure the decorated option size is validated with a higher bound. | Collection | {"validation::maxItems":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Min | minItems | double | Ensure the decorated option size is validated with a lower bound. | Collection | {"validation::minItems":"12.34"}
@org.talend.sdk.component.api.configuration.constraint.Uniques | uniqueItems | - | Ensure the elements of the collection must be distinct (kind of set). | Collection | {"validation::uniqueItems":"true"}

using the programmatic API the metadata are prefixed by tcomp:: but this prefix is stripped in the web for convenience; the previous table uses the web keys.
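A minimal sketch of these constraints applied to options (the class, field names and values are made up for the example):

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.constraint.Max;
import org.talend.sdk.component.api.configuration.constraint.Min;
import org.talend.sdk.component.api.configuration.constraint.Pattern;
import org.talend.sdk.component.api.configuration.constraint.Required;

public class ConnectionConfig {

    @Option
    @Required
    @Pattern("^https?://.*") // must look like a http(s) url
    private String url;

    @Option
    @Min(1)
    @Max(65535)
    private int port;

    @Option
    @Min(1) // interpreted as minLength when applied to a CharSequence
    private String user;
}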

Marking a configuration as a particular type of data

It is common to classify the incoming data. You can see it as tagging it with several types. The most common ones are:

• datastore: all the data you need to connect to the backend

• dataset: a datastore coupled with all the data you need to execute an action

API | Type | Description | Metadata sample
org.talend.sdk.component.api.configuration.type.DataSet | dataset | Mark a model (complex object) as being a dataset. | {"tcomp::configurationtype::type":"dataset","tcomp::configurationtype::name":"test"}
org.talend.sdk.component.api.configuration.type.DataStore | datastore | Mark a model (complex object) as being a datastore (connection to a backend). | {"tcomp::configurationtype::type":"datastore","tcomp::configurationtype::name":"test"}

the component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration.

Those configuration types can be composed to provide one configuration item. For example a dataset type will often need a datastore type to be provided, and a datastore type (that provides the connection information) will be used to create a dataset type.

Those configuration types will also be used at design time to create shared configuration that can be stored and used at runtime.

For example, we can think about a relational database that supports JDBC (a sketch in code follows the list):

• A datastore may provide:

◦ jdbc url, username, password

• A dataset may be:


◦ datastore (that will provide the connection data to the database)

◦ table name, data []
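A hypothetical sketch of that JDBC example using the configuration type annotations; the class names, option names and the configuration type names passed to the annotations are made up:

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.type.DataSet;
import org.talend.sdk.component.api.configuration.type.DataStore;

@DataStore("JdbcConnection")
class JdbcConnection {
    @Option
    private String url;

    @Option
    private String username;

    @Option
    private String password;
}

@DataSet("JdbcTable")
class JdbcTable {
    @Option
    private JdbcConnection connection; // the datastore is composed into the dataset

    @Option
    private String tableName;
}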

The component server will scan all those configuration types and provide a configuration type index. This index can be used for the integration into the targeted platforms (studio, web applications…).

The configuration type index is represented as a flat tree that contains all the configuration types represented as nodes and indexed by their ids.

Also, every node can point to other nodes. This relation is represented as an array of edges that provides the children ids.

For example, a configuration type index for the above example will be:

{
  nodes: {
    "idForDstore": { datastore: "datastore data", edges: [id: "idForDset"] },
    "idForDset": { dataset: "dataset data" }
  }
}

Define links between properties

It can be needed to define a binding between properties; a set of annotations allows you to do it:

API | Name | Description | Metadata sample
@org.talend.sdk.component.api.configuration.condition.ActiveIf | if | If the evaluation of the element at the location matches value then the element is considered active, otherwise it is deactivated. | {"condition::if::target":"test","condition::if::value":"value1,value2"}
@org.talend.sdk.component.api.configuration.condition.ActiveIfs | ifs | Allows to set multiple visibility conditions on the same property. | {"condition::if::value::0":"value1,value2","condition::if::value::1":"SELECTED","condition::if::target::0":"sibling1","condition::if::target::1":"../../other"}

The target element location is specified as a relative path to the current location using Unix path characters. The configuration class delimiter is /. The parent configuration class is specified by "..". Thus ../targetProperty denotes a property which is located in the parent configuration class and has the name targetProperty.
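As an illustration, here is a small hypothetical sketch of @ActiveIf on a sibling property (field names are made up):

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.condition.ActiveIf;

public class DatastoreConfig {

    @Option
    private boolean useAuthentication;

    // only active when the sibling useAuthentication option is set to true
    @Option
    @ActiveIf(target = "useAuthentication", value = "true")
    private String token;
}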


using the programmatic API the metadata are prefixed by tcomp:: but this prefix is stripped in the web for convenience; the previous table uses the web keys.

Add hints about the rendering based on configuration/component knowledge

In some cases it can be needed to add some metadata about the configuration to let the UI render the configuration properly. A simple example is a password value that must be hidden and not shown as a simple clear input box. For these cases - when the component developer wants to influence the UI rendering - you can use a particular set of annotations:

API | Description | Generated property metadata
@org.talend.sdk.component.api.configuration.ui.DefaultValue | Provide a default value the UI can use - only for primitive fields. | {"ui::defaultvalue::value":"test"}
@org.talend.sdk.component.api.configuration.ui.OptionsOrder | Allows to sort a class properties. | {"ui::optionsorder::value":"value1,value2"}
@org.talend.sdk.component.api.configuration.ui.layout.AutoLayout | Request the renderer to do what it thinks is best. | {"ui::autolayout":"true"}
@org.talend.sdk.component.api.configuration.ui.layout.GridLayout | Advanced layout to place properties by row, this is exclusive with @OptionsOrder. | {"ui::gridlayout::value1::value":"first|second,third","ui::gridlayout::value2::value":"first|second,third"}
@org.talend.sdk.component.api.configuration.ui.layout.GridLayouts | Allow to configure multiple grid layouts on the same class, qualified with a classifier (name). | {"ui::gridlayout::Advanced::value":"another","ui::gridlayout::Main::value":"first|second,third"}
@org.talend.sdk.component.api.configuration.ui.layout.HorizontalLayout | Put on a configuration class it notifies the UI a horizontal layout is preferred. | {"ui::horizontallayout":"true"}
@org.talend.sdk.component.api.configuration.ui.layout.VerticalLayout | Put on a configuration class it notifies the UI a vertical layout is preferred. | {"ui::verticallayout":"true"}
@org.talend.sdk.component.api.configuration.ui.widget.Code | Mark a field as being represented by some code widget (vs textarea for instance). | {"ui::code::value":"test"}
@org.talend.sdk.component.api.configuration.ui.widget.Credential | Mark a field as being a credential. It is typically used to hide the value in the UI. | {"ui::credential":"true"}
@org.talend.sdk.component.api.configuration.ui.widget.Structure | Mark a List<String> or Map<String, String> field as being represented as the component data selector (field names generally, or field names as key and type as value). | {"ui::structure::type":"null","ui::structure::discoverSchema":"test","ui::structure::value":"test"}
@org.talend.sdk.component.api.configuration.ui.widget.TextArea | Mark a field as being represented by a textarea (multiline text input). | {"ui::textarea":"true"}

using the programmatic API the metadata are prefixed by tcomp:: but this prefix is stripped in the web for convenience; the previous table uses the web keys.

target support should cover org.talend.core.model.process.EParameterFieldType but we need to ensure web renderers are able to handle the same widgets.
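Here is a short, hypothetical sketch combining some of these UI hints; the class, field names and row layout are arbitrary and the nested @GridLayout.Row usage is an assumption based on the metadata format above:

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.ui.layout.GridLayout;
import org.talend.sdk.component.api.configuration.ui.widget.Credential;
import org.talend.sdk.component.api.configuration.ui.widget.TextArea;

@GridLayout({
    @GridLayout.Row({ "username", "password" }),
    @GridLayout.Row({ "description" })
})
public class AccountConfig {

    @Option
    private String username;

    @Option
    @Credential // rendered as a hidden/password input
    private String password;

    @Option
    @TextArea // rendered as a multiline input
    private String description;
}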

4.1.3. Gallery

Widgets

The Studio and Web renderings shown in the original table are screenshots; only the widget names and their code are kept here.

Input/Text

@Option
String config;

Password

@Option
@Credential
String config;

Property validation

/** configuration class */
@Option
@Validable("url")
String config;

/** service class */
@AsyncValidation("url")
ValidationResult doValidate(String url) {
}

Datastore validation

@Datastore
@Checkable
public class config {
    /** config ... */
}

/** service class */
@HealthCheck
public HealthCheckStatus testConnection() {
}

4.1.4. Registering components

As seen in the Getting Started, you need an annotation to register your component through its family method. Multiple components can use the same family value but the pair family+name MUST be unique for the system.

If you desire (recommended) to share the same component family name instead of repeating yourself in all family methods, you can use the @Components annotation on the root package of your component; it will enable you to define the component family and the categories the component belongs to (default is Misc if not set). Here is a sample package-info.java:


@Components(name = "my_component_family", categories = "My Category")package org.talend.sdk.component.sample;

import org.talend.sdk.component.api.component.Components;

For an existing component it can look like:

@Components(name = "Salesforce", categories = {"Business", "Cloud"})package org.talend.sdk.component.sample;

import org.talend.sdk.component.api.component.Components;

Components metadata

Components can require a few metadata to be integrated in Talend Studio or the Cloud platform. Here is how to provide this information. These metadata are set on the component class and belong to the org.talend.sdk.component.api.component package.

API | Description
@Icon | Set an icon key used to represent the component. Note you can use a custom key with the custom() method but it is not guaranteed the icon will be rendered properly.
@Version | Set the component version, default to 1.

Example:

@Icon(FILE_XML_O)
@PartitionMapper(name = "jaxbInput")
public class JaxbPartitionMapper implements Serializable {
    // ...
}

Management of configuration versions

If some impacting changes happen on the configuration they can be managed through a migration handler at the component level (to enable support of trans-model migration).

The @Version annotation supports a migrationHandler method which will take the implementation migrating the incoming configuration to the current model.

For instance if the filepath configuration entry from v1 changed to location in v2 you can remap the value to the right key in your MigrationHandler implementation.


it is recommended to not manage all migrations in the handler itself but rather to split them into services you inject into the migration handler (through its constructor):

// full component code structure skipped for brevity, kept only the migration part
@Version(value = 3, migrationHandler = MyComponent.Migrations.class)
public class MyComponent {
    // the component code...

    private interface VersionConfigurationHandler {
        Map<String, String> migrate(Map<String, String> incomingData);
    }

    public static class Migrations implements MigrationHandler {
        private final List<VersionConfigurationHandler> handlers;

        // VersionConfigurationHandler implementations are decorated with @Service
        public Migrations(final List<VersionConfigurationHandler> migrations) {
            this.handlers = migrations;
            this.handlers.sort(/*some custom logic*/);
        }

        @Override
        public Map<String, String> migrate(int incomingVersion, Map<String, String> incomingData) {
            Map<String, String> out = incomingData;
            for (final VersionConfigurationHandler handler : handlers) {
                out = handler.migrate(out);
            }
            return out;
        }
    }
}

What is important in this snippet is not so much the way the code is organized but rather the fact that you organize your migrations the way which fits your component best. If migrations are not conflicting there is no need for something fancy, just apply them all; but if you need to apply them in order you need to ensure they are sorted. Said otherwise: don’t see this API as a migration API but as a migration callback, and adjust the migration code structure you need behind the MigrationHandler based on your component requirements. The service injection enables you to do so.


@PartitionMapper

@PartitionMapper will obviously mark a partition mapper:

@PartitionMapper(family = "demo", name = "my_mapper")public class MyMapper {}

@Emitter

@Emitter is a shortcut for @PartitionMapper when you don’t support distribution. Said otherwise, it will enforce an implicit partition mapper execution with an assessor size of 1 and a split returning itself.

@Emitter(family = "demo", name = "my_input")public class MyInput {}

@Processor

A class decorated with @Processor will be considered as a processor:

@Processor(family = "demo", name = "my_processor")public class MyProcessor {}

4.1.5. Internationalization

In the simplest case you should store messages using a ResourceBundle properties file in your component module to use internationalization. The location of the properties file should be in the same package as the related component(s) and is named Messages (ex: org.talend.demo.MyComponent will use org.talend.demo.Messages[locale].properties).

Default components keys

Out of the box components are internationalized using the same location logic for the resource bundle and here is the list of supported keys:

Name Pattern | Description
${family}._displayName | the display name of the family
${family}.${configurationType}.${name}._displayName | the display name of a configuration type (dataStore or dataSet)
${family}.${component_name}._displayName | the display name of the component (used by the GUIs)
${property_path}._displayName | the display name of the option.
${simple_class_name}.${property_name}._displayName | the display name of the option using its class name.
${property_path}._placeholder | the placeholder of the option.

Example of configuration for a component named list belonging to the family memory (@Emitter(family = "memory", name = "list")):

memory.list._displayName = Memory List

Configuration classes are also translatable using the simple class name in the messages properties file. This is useful when you have some common configuration shared within multiple components.

If you have a configuration class like:

public class MyConfig {

    @Option
    private String host;

    @Option
    private int port;
}

You can give it a translatable display name by adding ${simple_class_name}.${property_name}._displayName to Messages.properties under the same package as the config class.

MyConfig.host._displayName = Server Host Name
MyConfig.host._placeholder = Enter Server Host Name...

MyConfig.port._displayName = Server Port
MyConfig.port._placeholder = Enter Server Port...

If you have a display name using the property path, it will override the display name defined using the simple class name. This rule also applies to placeholders.

4.2. Components Packaging


4.2.1. Component Loading

Talend Component scanning is based on a plugin concept. To ensure plugins can be developed in parallel and to avoid conflicts, it requires to isolate plugins (components or components grouped in a single jar/plugin).

Here we have multiple options which are (high level):

• flat classpath: listed for completeness but rejected by design because it doesn’t match this requirement at all.

• tree classloading: a shared classloader inherited by plugin classloaders, but plugin classloader classes are not seen by the shared classloader nor by other plugins.

• graph classloading: this one allows you to link the plugins and dependencies together dynamically in any direction.

If you want to map it to concrete common examples, the tree classloading is commonly used by Servlet containers where plugins are web applications, and the graph classloading can be illustrated by OSGi containers.

In the spirit of avoiding a lot of complexity added by this layer, Talend Component relies on a tree classloading. The advantage is you don’t need to define the relationship with other plugins/dependencies (it is built-in).

Here is a representation of this solution (shown as a diagram in the original guide): the interesting part is that the shared area contains the Talend Component API, which is by default the only set of classes shared across the whole set of plugins.

Then each plugin will be loaded in its own classloader with its dependencies.

Packaging a plugin

this part explains the overall way to handle dependencies but the Talend Maven plugin provides a shortcut for that.

A plugin is just a jar which was enriched with the list of its dependencies. By default Talend Component runtime is able to read the output of maven-dependency-plugin in the TALEND-INF/dependencies.txt location so you just need to ensure your component defines the following plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.0.2</version>
  <executions>
    <execution>
      <id>create-TALEND-INF/dependencies.txt</id>
      <phase>process-resources</phase>
      <goals>
        <goal>list</goal>
      </goals>
      <configuration>
        <outputFile>${project.build.outputDirectory}/TALEND-INF/dependencies.txt</outputFile>
      </configuration>
    </execution>
  </executions>
</plugin>

If you check your jar once built you will see that the file contains something like:

$ unzip -p target/mycomponent-1.0.0-SNAPSHOT.jar TALEND-INF/dependencies.txt

The following files have been resolved:
   org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided
   org.apache.geronimo.specs:geronimo-annotation_1.3_spec:jar:1.0:provided
   org.superbiz:awesome-project:jar:1.2.3:compile
   junit:junit:jar:4.12:test
   org.hamcrest:hamcrest-core:jar:1.3:test

What is important to see is the scope associated to the artifacts:

• the API (component-api and geronimo-annotation_1.3_spec) are provided because you can consider them to be there when executing (it comes with the framework)

• your specific dependency (awesome-project) is compile: it will be included as a needed dependency by the framework (note that using runtime works too).

• the other dependencies will be ignored (test dependencies)

Packaging an application

Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.


Dependencies

The way the framework resolves dependencies is based on a local maven repository layout. As a quick reminder it looks like:

.
├── groupId1
│   └── artifactId1
│       ├── version1
│       │   └── artifactId1-version1.jar
│       └── version2
│           └── artifactId1-version2.jar
└── groupId2
    └── artifactId2
        └── version1
            └── artifactId2-version1.jar

This is all the layout the framework will use. Concretely the logic will convert the tuple {groupId, artifactId, version, type (jar)} to the path in the repository: for example org.superbiz:awesome-project:1.2.3 (jar) resolves to org/superbiz/awesome-project/1.2.3/awesome-project-1.2.3.jar.

Talend Component runtime has two ways to find an artifact:

• from the file system based on a configured maven 2 repository.

• from a fatjar (uber jar) with a nested maven repository under MAVEN-INF/repository.

The first option will use either - by default - ${user.home}/.m2/repository or a specific path configured when creating a ComponentManager. The nested repository option will need some configuration during the packaging to ensure the repository is well created.

Create a nested maven repository with maven-shade-plugin

To create the nested MAVEN-INF/repository repository you can use the nested-maven-repository extension:


<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>

Listing needed plugins

Plugins are programmatically registered in general, but if you want to make some of them automatically available you need to generate a TALEND-INF/plugins.properties which will map a plugin name to coordinates found with the maven mechanism we just talked about.

Here again we can enrich maven-shade-plugin to do it:


<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>

maven-shade-plugin extensions

Here is a final job/application bundle based on maven shade plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <configuration>
    <createDependencyReducedPom>false</createDependencyReducedPom>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedClassifierName>shaded</shadedClassifierName>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository-maven-plugin</artifactId>
      <version>${the.version}</version>
    </dependency>
  </dependencies>
</plugin>


the configuration unrelated to transformers can depend on your application.

ContainerDependenciesTransformer is the one to embed a maven repository and PluginTransformer is used to create a file listing (one per line) a list of artifacts (representing plugins).

Both transformers share most of their configuration:

• session: must be set to ${session}. This is used to retrieve dependencies.

• scope: a comma separated list of scope to include in the artifact filtering (note that the default will rely on provided but you can replace it by compile, runtime, runtime+compile, runtime+system, test).

• include: a comma separated list of artifact to include in the artifact filtering.

• exclude: a comma separated list of artifact to exclude in the artifact filtering.

• userArtifacts: a list of artifacts (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline - mainly useful for PluginTransformer.

• includeTransitiveDependencies: should transitive dependencies of the components be included, true by default.

• includeProjectComponentDependencies: should project component dependencies be included, false by default (normally a job project uses isolation for components so this is not needed).


to use with the component tooling, it is recommended to keep the default locations. Also if you feel you need to use project dependencies, you may need to refactor your project structure to ensure you keep component isolation. Talend Component lets you handle that part but the recommended practice is to use userArtifacts for the components and not the project <dependencies>.

ContainerDependenciesTransformer

ContainerDependenciesTransformer specific configuration is the following one:

• repositoryBase: base repository location (defaults to MAVEN-INF/repository).

• ignoredPaths: a comma separated list of folders to not create in the output jar, this is common for the ones already created by other transformers/build parts.

PluginTransformer

PluginTransformer specific configuration is the following one:


• pluginListResource: location of the generated plugin list resource (defaults to TALEND-INF/plugins.properties).

Example: if you want to list only the plugins you use, you can configure this transformer like that:

<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
  <session>${session}</session>
  <include>org.talend.sdk.component:component-x,org.talend.sdk.component:component-y,org.talend.sdk.component:component-z</include>
</transformer>

4.3. Build tools

4.3.1. Maven Plugin

talend-component-maven-plugin intends to help you write components, validating that components match best practices and also transparently generating metadata used by Talend Studio.

Here is how to use it:

<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${component.version}</version>
</plugin>

Note that this plugin is also an extension so you can declare it in your build/extensions block as:

<extension>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${component.version}</version>
</extension>

Used as an extension, dependencies, validate and documentation goals will be set up.

Dependencies

The first goal is a shortcut for the maven-dependency-plugin, it will create the TALEND-INF/dependencies.txt file with the compile and runtime dependencies to let the component use it at runtime:

<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${component.version}</version>
  <executions>
    <execution>
      <id>talend-dependencies</id>
      <goals>
        <goal>dependencies</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Validate

The most important goal is here to help you validate the common programming model of the component. Here is the execution definition to activate it:

<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${component.version}</version>
  <executions>
    <execution>
      <id>talend-component-validate</id>
      <goals>
        <goal>validate</goal>
      </goals>
    </execution>
  </executions>
</plugin>

By default it will be bound to the process-classes phase. When executing it will do several validations which can be switched off by setting the corresponding flags to false in the <configuration> block of the execution:

Name | Description | Default
validateInternationalization | Validates resource bundles are present and contain commonly used keys (like _displayName) | true
validateModel | Ensure components pass validations of the ComponentManager and Talend Component runtime | true
validateSerializable | Ensure components are Serializable - note this is a sanity check, the component is not actually serialized here, if you have a doubt ensure to test it. It also checks any @Internationalized class is valid and has its keys. | true
validateMetadata | Ensure components define an @Icon and @Version. | true
validateDataStore | Ensure any @DataStore defines a @HealthCheck. | true
validateComponent | Ensure the native programming model is respected, you can disable it when using another programming model like in the beam case. | true
validateActions | Validate action signatures for the ones not tolerating dynamic binding (@HealthCheck, @DynamicValues, …). It is recommended to keep it true. | true
validateFamily | Validate the family, i.e. that the package containing the @Components also has an @Icon. | true
validateDocumentation | Ensure all 1. components and 2. @Option properties have a documentation using @Documentation | true

Documentation

This goal generates an Asciidoc file documenting your component from the configuration model (@Option) and the @Documentation you can put on options and the component itself.

<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${component.version}</version>
  <executions>
    <execution>
      <id>talend-component-documentation</id>
      <goals>
        <goal>asciidoc</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Name | Description | Default
level | Which level are the root titles | 2 which means ==
output | Where to store the output, it is NOT recommended to change it | ${classes}/TALEND-INF/documentation.adoc
formats | A map of the renderings to do, keys are the format (pdf or html) and values the output paths | -
attributes | A map of asciidoctor attributes when formats is set | -
templateDir / templateEngine | Template configuration for the rendering | -
title | Document title | ${project.name}
attachDocumentations | Should the documentations (.adoc, and formats keys) be attached to the project (and deployed) | true

if you use the extension you can add the property talend.documentation.htmlAndPdf and set it to true in your project to automatically get an html and PDF rendering of the documentation.

Render your documentation

HTML

To render the generated documentation you can use the Asciidoctor Maven plugin (or the Gradle equivalent):


<plugin> (1)
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${talend-component-kit.version}</version>
  <executions>
    <execution>
      <id>documentation</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>asciidoc</goal>
      </goals>
    </execution>
  </executions>
</plugin>
<plugin> (2)
  <groupId>org.asciidoctor</groupId>
  <artifactId>asciidoctor-maven-plugin</artifactId>
  <version>1.5.6</version>
  <executions>
    <execution>
      <id>doc-html</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>process-asciidoc</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
        <sourceDocumentName>documentation.adoc</sourceDocumentName>
        <outputDirectory>${project.build.directory}/documentation</outputDirectory>
        <backend>html5</backend>
      </configuration>
    </execution>
  </executions>
</plugin>

1. Will generate the components documentation in target/classes/TALEND-INF/documentation.adoc.

2. Will render the documentation as an html file in target/documentation/documentation.html.

ensure to execute it after the documentation generation.

PDF

If you prefer a PDF rendering you can configure the following execution in the asciidoctor plugin (note that you can configure both executions if you want both HTML and PDF rendering):

<plugin>
  <groupId>org.asciidoctor</groupId>
  <artifactId>asciidoctor-maven-plugin</artifactId>
  <version>1.5.6</version>
  <executions>
    <execution>
      <id>doc-html</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>process-asciidoc</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
        <sourceDocumentName>documentation.adoc</sourceDocumentName>
        <outputDirectory>${project.build.directory}/documentation</outputDirectory>
        <backend>pdf</backend>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.asciidoctor</groupId>
      <artifactId>asciidoctorj-pdf</artifactId>
      <version>1.5.0-alpha.16</version>
    </dependency>
  </dependencies>
</plugin>

Include the documentation into a document

If you want to add some more content or add a title, you can include the generated document into another document using the Asciidoc include directive.

A common example is:


= Super Components
Super Writer
:toc:
:toclevels: 3
:source-highlighter: prettify
:numbered:
:icons: font
:hide-uri-scheme:
:imagesdir: images

include::{generated_doc}/documentation.adoc[]

This assumes you pass the attribute generated_doc to the plugin, which can be done this way:

<plugin>
  <groupId>org.asciidoctor</groupId>
  <artifactId>asciidoctor-maven-plugin</artifactId>
  <version>1.5.6</version>
  <executions>
    <execution>
      <id>doc-html</id>
      <phase>prepare-package</phase>
      <goals>
        <goal>process-asciidoc</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/asciidoc</sourceDirectory>
        <sourceDocumentName>my-main-doc.adoc</sourceDocumentName>
        <outputDirectory>${project.build.directory}/documentation</outputDirectory>
        <backend>html5</backend>
        <attributes>
          <generated_doc>${project.build.outputDirectory}/TALEND-INF</generated_doc>
        </attributes>
      </configuration>
    </execution>
  </executions>
</plugin>

This is optional but allows you to reuse Maven placeholders to pass paths, which is quite convenient in an automated build.

More

You can find more customizations on the Asciidoctor website.


Web

Testing the rendering of your component(s) configuration into the Studio is just a matter of deploying a component in Talend Studio (you can have a look at the link:studio.html[Studio Documentation] page). But don’t forget the component can also be deployed into a Cloud (web) environment. To ease the testing of the related rendering, you can use the goal web of the plugin:

mvn talend-component:web

Then you can test your component by going to localhost:8080. You need to select which component form you want to see using the treeview on the left; the form will then be displayed on the right.

The two available configurations of the plugin are serverPort, which is a shortcut to change the default port (8080) of the embedded server, and serverArguments, to pass Meecrowave options to the server. More on that configuration is available at openwebbeans.apache.org/meecrowave/meecrowave-core/cli.html.

this command reads the component jar from the local Maven repository, so ensure to install the artifact before using it.

Generate inputs or outputs

The Mojo generate (Maven plugin goal) of the same plugin also embeds a generator you can use to bootstrap any input or output component:


<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${talend-component.version}</version>
  <executions>
    <execution> ①
      <id>generate-input</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <type>input</type>
      </configuration>
    </execution>
    <execution> ②
      <id>generate-output</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <type>output</type>
      </configuration>
    </execution>
  </executions>
</plugin>

① Generates an input (partition mapper + emitter)

② Generates an output

It is intended to be used from the command line (or IDE Maven integration):

$ mvn talend-component:generate \
  -Dtalend.generator.type=[input|output] \ ①
  [-Dtalend.generator.classbase=com.test.MyComponent] \ ②
  [-Dtalend.generator.family=my-family] \ ③
  [-Dtalend.generator.pom.read-only=false] ④

① select the type of component you want: input to generate a mapper and emitter, output to generate an output processor

② set the class name base (it will be suffixed by the component type); if not set, the package will be guessed and the class name based on the basedir name

③ set the component family to use; defaults to the base dir name with component[s] removed from the name (ex: my-component will lead to my as the family if not explicitly set)

④ should the generator try to add component-api in the pom if it is not already there; if you added it you can set it to false directly in the pom

For this command to work you just need to register the plugin:

<plugin>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>talend-component-maven-plugin</artifactId>
  <version>${talend-component.version}</version>
</plugin>

Talend Component Archive

Component ARchive (.car) is the way to bundle a component to share it in the Talend ecosystem. It is a plain Java ARchive (.jar) containing a metadata file and a nested Maven repository containing the component and its dependencies.

mvn talend-component:car

It will create a .car in your build directory which is shareable on Talend platforms.

Note that this CAR is executable and exposes the command studio-deploy, which takes a Talend Studio home location as a parameter. When executed, it will install the dependencies into the studio and register the component in your instance. Here is a sample launch command:

# for a studio
java -jar mycomponent.car studio-deploy /path/to/my/studio

# for a m2 provisioning
java -jar mycomponent.car maven-deploy /path/to/.m2/repository

4.3.2. Gradle Plugin

gradle-talend-component intends to help you write components by validating that they match best practices. It is inspired by the Maven plugin and adds the ability to automatically generate the dependencies.txt file the SDK uses to build the component classpath. For more information on the configuration you can check out the Maven properties matching the attributes.

Here is how to use it:

buildscript {
  repositories {
    mavenLocal()
    mavenCentral()
  }
  dependencies {
    classpath "org.talend.sdk.component:gradle-talend-component:${talendComponentVersion}"
  }
}

apply plugin: 'org.talend.sdk.component'
apply plugin: 'java'

// optional customization
talendComponentKit {
  // dependencies.txt generation, replaces maven-dependency-plugin
  dependenciesLocation = "TALEND-INF/dependencies.txt"
  skipDependenciesFile = false

  // classpath for validation utilities
  sdkVersion = "${talendComponentVersion}"
  apiVersion = "${talendComponentApiVersion}"

  // documentation
  skipDocumentation = false
  documentationOutput = new File(....)
  documentationLevel = 2 // first level will be == in the generated adoc
  documentationTitle = 'My Component Family' // default to project name
  documentationAttributes = [:] // adoc attributes
  documentationFormats = [:] // renderings to do

  // validation
  skipValidation = false
  validateFamily = true
  validateSerializable = true
  validateInternationalization = true
  validateModel = true
  validateMetadata = true
  validateComponent = true
  validateDataStore = true
  validateDataSet = true
  validateActions = true

  // web
  serverArguments = []
  serverPort = 8080

  // car
  carOutput = new File(....)
  carMetadata = [:] // custom meta (string key-value pairs)
}


4.4. Services

4.4.1. Internationalization

Recommended practices for internationalization are:

• store messages using ResourceBundle properties file in your component module

• the location of the properties is the same package as the related component(s) and the bundle is named Messages (ex: org.talend.demo.MyComponent will use org.talend.demo.Messages[locale].properties)

• for your own messages use the internationalization API

Internationalization API

The overall idea is to design your messages as methods returning String values and to back the templates by a ResourceBundle located in the same package as the interface defining these methods and named Messages.

this is the mechanism to use to internationalize your own messages in your own components.

To ensure your internationalization API is identified, you need to mark it with @Internationalized:

@Internationalized ①
public interface Translator {

    String message();

    String templatizedMessage(String arg0, int arg1); ②

    String localized(String arg0, @Language Locale locale); ③
}

① @Internationalized allows you to mark a class as an i18n service

② you can pass parameters and the message will use the MessageFormat syntax to be resolved based on the ResourceBundle template

③ you can use @Language on a Locale parameter to manually specify the locale to use; note that a single value will be used (the first parameter tagged as such)

4.4.2. Providing some actions for consumers/clients

In some cases you will want to add some actions that are unrelated to the runtime. A simple example is to enable clients - the users of the plugin/library - to test if a connection works. Even more concretely: is my database up?


To do so you need to define an @Action, which is a method with a name (representing the event name), in a class decorated with @Service:

@Service
public class MyDbTester {
    @Action(family = "mycomp", value = "test")
    public Status doTest(final IncomingData data) {
        return ...;
    }
}

services are singletons, so if you need some thread safety ensure they match that requirement. They shouldn’t store any state either (state is held by the component) since they can be serialized at any time.

services are usable in components as well (matched by type) and allow you to reuse some shared logic like a client. Here is a sample with a service used to access files:

@Emitter(family = "sample", name = "reader")public class PersonReader implements Serializable {  // attributes skipped to be concise

  public PersonReader(@Option("file") final File file,  final FileService service) {  this.file = file;  this.service = service;  }

  // use the service  @PostConstruct  public void open() throws FileNotFoundException {  reader = service.createInput(file);  }

}

the service is passed to the constructor automatically and can be used as a bean; you only need to call the service’s methods.
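For completeness, the FileService injected above could be as simple as the following sketch; createInput is just the made-up helper used by the reader, not a framework API:

@Service
public class FileService {

    // shared logic reused by the components of the family (hypothetical helper)
    public BufferedReader createInput(final File file) throws FileNotFoundException {
        return new BufferedReader(new FileReader(file));
    }
}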

Particular action types

Some actions are so common and need a clear contract that they are defined as first-class API citizens; this is the case for wizards or healthchecks for instance. Here is the list of all actions:


API Type Description Return type Sample returned type

@org.talend.sdk.component.api.service.completion.DynamicValues dynamic_values Marks a method as being useful to fill potential values of a string option for a property denoted by its value. You can link a field as being completable using @Proposable(value). The resolution of the completion action is then done through the component family and value of the action. The callback doesn’t take any parameter. Values {"items":[{"id":"value","label":"label"}]}

@org.talend.sdk.component.api.service.healthcheck.HealthCheck healthcheck This class marks an action doing a connection test. HealthCheckStatus {"comment":"Something went wrong","status":"KO"}

@org.talend.sdk.component.api.service.schema.DiscoverSchema schema Marks an action as returning a discovered schema. Its parameter MUST be the type decorated with @Structure. Schema {"entries":[{"name":"column1","type":"STRING"}]}

@org.talend.sdk.component.api.service.Action user - any -

@org.talend.sdk.component.api.service.asyncvalidation.AsyncValidation validation Marks a method as being used to validate a configuration. IMPORTANT: this is a server validation so only use it if you can’t use other client side validation to implement it. ValidationResult {"comment":"Something went wrong","status":"KO"}

4.4.3. Built in services

The framework provides some built-in services you can inject by type in components and actions out of the box.

Here is the list:

Type Description

org.talend.sdk.component.api.service.cache.LocalCache Provides a small abstraction to cache data which doesn’t need to be recomputed very often. Commonly used by actions for the UI interactions. Note: you can also use the local cache as an interceptor with @Cached.

org.talend.sdk.component.api.service.dependency.Resolver Allows you to resolve a dependency from its Maven coordinates. Note: it assumes the dependency is locally available to the execution instance, which is not guaranteed yet by the framework.

javax.json.spi.JsonProvider A JSON-P instance. Prefer the other JSON-P instances if you don’t exactly know why you use this one.

javax.json.JsonBuilderFactory A JSON-P instance. It is recommended to use this one instead of a custom one for memory/speed optimizations.

javax.json.JsonWriterFactory A JSON-P instance. It is recommended to use this one instead of a custom one for memory/speed optimizations.

javax.json.JsonReaderFactory A JSON-P instance. It is recommended to use this one instead of a custom one for memory/speed optimizations.

javax.json.stream.JsonParserFactory A JSON-P instance. It is recommended to use this one instead of a custom one for memory/speed optimizations.

javax.json.stream.JsonGeneratorFactory A JSON-P instance. It is recommended to use this one instead of a custom one for memory/speed optimizations.

org.talend.sdk.component.api.service.configuration.LocalConfiguration Represents the local configuration which can be used during the design. It is not recommended to use it for the runtime since the local configuration is generally different and the instances are distinct.

Every interface that extends HttpClient and that contains methods annotated with @Request Lets you define an HTTP client in a declarative manner using an annotated interface. See the HttpClient usage section below for details.

HttpClient usage

Let’s assume that we have a REST API defined like below, and that it requires a basic authentication header.

GET /api/records/{id} -

POST /api/records with a JSON payload to be created
{"id":"some id", "data":"some data"}

To create an HTTP client able to consume this REST API, we will define an interface that extends HttpClient.

The HttpClient interface lets you set the base for the HTTP address that our client will hit.

The base is the part of the address that we will need to add to the request path to hit the API.

Every method of our interface annotated with @Request will define an HTTP request. Every request can also have a @Codec that lets us encode/decode the request/response payloads.

if your payloads are String or Void you can skip the encoder/decoder.

public interface APIClient extends HttpClient {
    @Request(path = "api/records/{id}", method = "GET")
    @Codec(decoder = RecordDecoder.class) // decoder = decode returned data to the Record class
    Record getRecord(@Header("Authorization") String basicAuth, @Path("id") int id);

    @Request(path = "api/records", method = "POST")
    @Codec(encoder = RecordEncoder.class, decoder = RecordDecoder.class) // encoder = encode record to fit the request format (json in this example)
    Record createRecord(@Header("Authorization") String basicAuth, Record record);
}

The interface should extend HttpClient.


In the codec classes (classes that implement Encoder/Decoder) you can inject any of your services annotated with @Service or @Internationalized into the constructor. The i18n services can be useful to provide i18n messages for error handling, for example.
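As an illustration only, a codec pair for the Record payloads used above could look like this sketch. It assumes the Encoder/Decoder contracts exchange raw byte arrays (check the component-api version you use) and relies on JSON-B for the actual (de)serialization; RecordCodec is a made-up name.

import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;
import javax.json.bind.Jsonb;
import javax.json.bind.JsonbBuilder;
import org.talend.sdk.component.api.service.http.Decoder;
import org.talend.sdk.component.api.service.http.Encoder;

public class RecordCodec implements Encoder, Decoder {

    private final Jsonb jsonb = JsonbBuilder.create();

    @Override
    public byte[] encode(final Object value) {
        // serialize the payload (a Record here) to JSON bytes
        return jsonb.toJson(value).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Object decode(final byte[] value, final Type expectedType) {
        // deserialize the JSON payload into the expected type
        return jsonb.fromJson(new String(value, StandardCharsets.UTF_8), expectedType);
    }
}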

This interface can be injected into our component classes or services to consume the defined API.

@Service
public class MyService {

    private APIClient client;

    public MyService(..., APIClient client) {
        //...
        this.client = client;
        client.base("http://localhost:8080"); // init the base of the api, often in a PostConstruct or init method
    }

    //...
    // Our get request
    Record rec = client.getRecord("Basic MLFKG?VKFJ", 100);

    //...
    // Our post request
    Record newRecord = client.createRecord("Basic MLFKG?VKFJ", new Record());
}

Note: by default /+json payloads are mapped to JSON-P and /+xml to JAX-B if the model has an @XmlRootElement annotation.
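For instance, a hedged sketch of a JAX-B friendly payload model matching that note could simply be (accessors omitted, as in the other samples):

@XmlRootElement
public class Record {

    private String id;
    private String data;

    // getters and setters omitted for brevity
}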

Advanced HTTP client request customization

For advanced cases you can customize the Connection directly using @UseConfigurer on the method. It will call your custom instance of Configurer. Note that you can use some @ConfigurerOption parameters in the method signature to pass some configurer configuration.

For instance if you have this configurer:


public class BasicConfigurer implements Configurer {
    @Override
    public void configure(final Connection connection, final ConfigurerConfiguration configuration) {
        final String user = configuration.get("username", String.class);
        final String pwd = configuration.get("password", String.class);
        connection.withHeader(
            "Authorization",
            Base64.getEncoder().encodeToString((user + ':' + pwd).getBytes(StandardCharsets.UTF_8)));
    }
}

You can then set it on a method to automatically add the basic header with this kind of API usage:

public interface APIClient extends HttpClient {
    @Request(path = "...")
    @UseConfigurer(BasicConfigurer.class)
    Record findRecord(@ConfigurerOption("username") String user, @ConfigurerOption("password") String pwd);
}

4.4.4. Services and interceptors

For common concerns like caching, auditing, etc., it can be handy to use an interceptor-like API. It is enabled by the framework on services.

An interceptor defines an annotation marked with @Intercepts, which defines the implementation of the interceptor (an InterceptorHandler).

Here is an example:

@Intercepts(LoggingHandler.class)
@Target({ TYPE, METHOD })
@Retention(RUNTIME)
public @interface Logged {
    String value();
}

The handler is then created from its constructor and can take service injections (by type). The first parameter, however, can be a BiFunction<Method, Object[], Object> which represents the invocation chain if your interceptor can be used with others.


if you write a generic interceptor it is important to pass the invoker as the first parameter. If you don’t do so you can’t combine interceptors at all.

Here is an interceptor implementation for our @Logged API:

public class LoggingHandler implements InterceptorHandler {
    // injected
    private final BiFunction<Method, Object[], Object> invoker;
    private final SomeService service;

    // internal
    private final ConcurrentMap<Method, String> loggerNames = new ConcurrentHashMap<>();

    public LoggingHandler(final BiFunction<Method, Object[], Object> invoker, final SomeService service) {
        this.invoker = invoker;
        this.service = service;
    }

    @Override
    public Object invoke(final Method method, final Object[] args) {
        final String name = loggerNames.computeIfAbsent(method, m -> findAnnotation(m, Logged.class).get().value());
        service.getLogger(name).info("Invoking {}", method.getName());
        return invoker.apply(method, args);
    }
}

This implementation is compatible with interceptor chains since it takes the invoker as the first constructor parameter, and it also takes a service injection. Then the implementation just does what is needed - logging the invoked method here.

the findAnnotation method - inherited from InterceptorHandler - is a utility method to find an annotation on a method or class (in this order).
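On the consumer side, a hedged sketch of how the @Logged annotation above could then be used on a service method (the service and logger names are made up):

@Service
public class AuditedService {

    @Logged("audited-service") // LoggingHandler wraps this call at runtime
    public String doWork(final String input) {
        return input == null ? "" : input.trim();
    }
}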

4.4.5. Creating a job pipeline

Job Builder

The Job builder lets you create a job pipeline programmatically using Talend components (Producers and Processors). The job pipeline is an acyclic graph, so you can build complex pipelines.

Let’s take a simple use case where we have 2 data sources (employee and salary) that we will format to CSV and write the result to a file.


A job is defined based on components (nodes) and links (edges) to connect their branches together.

Every component is defined by a unique id and a URI that identifies the component.

The URI follows the form: [family]://[component][?version][&configuration]

• family: the name of the component family

• component: the name of the component

• version: the version of the component; it’s represented in a key=value format where the key is __version and the value is a number.

• configuration: here you can provide the component configuration as key=value tuples where the key is the path of the configuration and the value is the configuration value in string format.

URI Example

job://csvFileGen?__version=1&path=/temp/result.csv&encoding=utf-8"

configuration parameters must be URI/URL encoded.
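For example, a configuration value containing reserved characters could be encoded like this before being concatenated into the URI (plain Java, shown as a sketch; the path value is made up):

// Java 10+: encode the configuration value, then build the component URI
final String path = URLEncoder.encode("/temp/result with spaces.csv", StandardCharsets.UTF_8);
final String uri = "job://csvFileGen?__version=1&path=" + path + "&encoding=utf-8";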

Here is a more concrete job example:

Job.components() ①
        .component("employee", "db://input")
        .component("salary", "db://input")
        .component("concat", "transform://concat?separator=;")
        .component("csv", "file://out?__version=2")
    .connections() ②
        .from("employee").to("concat", "string1")
        .from("salary").to("concat", "string2")
        .from("concat").to("csv")
    .build() ③
    .run(); ④

① We define all the components that will be used in the job pipeline.

② Then, we define the connections between the components to construct the job pipeline. The links from → to use the component id and the default input/output branches. You can also connect a specific branch of a component, if it has multiple or named input/output branches, using the methods from(id, branchName) → to(id, branchName). In the example above, the concat component has two inputs (string1 and string2).

③ In this step, we validate the job pipeline by asserting that :

• It has some starting components (components that don’t have a from connection and that need to be of type producer).

• There are no cyclic connections, as the job pipeline needs to be an acyclic graph.

• All the components used in connections are already declared.

• Each connection is used only once; you can’t connect a component input/output branch twice.

④ We run the job pipeline.

In this version, the execution of the job is linear: the components are not executed in parallel even if some steps may be independent.

Environment/Runner

Depending on the configuration you can select which environment you execute your job in.

To select the environment the logic is the following one:

1. if an org.talend.sdk.component.runtime.manager.chain.Job.ExecutorBuilder is passed through the job properties then use it (supported types are an ExecutorBuilder instance, a Class or a String).

2. if an ExecutorBuilder SPI is present then use it (it is the case if component-runtime-beam is present in your classpath).

3. else just use a local/standalone execution.

In the case of a Beam execution you can customize the pipeline options using system properties. They have to be prefixed by talend.beam.job.. For instance, to set the appName option you will set -Dtalend.beam.job.appName=mytest.

Key Provider

The job builder lets you set a key provider to join your data when a component has multiple inputs. The key provider can be set contextually to a component or globally to the job:


Job.components()  .component("employee","db://input")  .property(GroupKeyProvider.class.getName(),  (GroupKeyProvider) context -> context.getData().getString("id")) ①  .component("salary", "db://input")  .component("concat", "transform://concat?separator=;")  .connections()  .from("employee").to("concat", "string1")  .from("salary").to("concat", "string2")  .build()  .property(GroupKeyProvider.class.getName(), ②  (GroupKeyProvider) context -> context.getData().getString("employee_id"))  .run();

① Here we have defined a key provider for the data produced by the component employee

② Here we have defined a key provider for all the data manipulated in this job.

If the incoming data has different ids you can provide a complex global key provider relying on the context that gives you the component id and the branch name.

GroupKeyProvider keyProvider = context -> {
    if ("employee".equals(context.getComponentId())) {
        return context.getData().getString("id");
    }
    return context.getData().getString("employee_id");
};

Beam case

For the Beam case, you need to rely on the Beam pipeline definition and use the component-runtime-beam dependency, which provides Beam bridges.

I/O

org.talend.sdk.component.runtime.beam.TalendIO provides a way to convert a partition mapper or a processor to an input or processor using the read or write methods.


public class Main {
    public static void main(final String[] args) {
        final ComponentManager manager = ComponentManager.instance();
        Pipeline pipeline = Pipeline.create();
        // Create a Beam input from the mapper and apply it to the pipeline
        pipeline.apply(TalendIO.read(manager.findMapper("sample", "reader", 1, new HashMap<String, String>() {{
                    put("fileprefix", "input");
                }}).get()))
                // prepare it for the output record format (see next part)
                .apply(new ViewsMappingTransform(emptyMap(), "sample"))
                // Create a Beam output from the Talend processor and apply it to the pipeline
                .apply(TalendIO.write(manager.findProcessor("test", "writer", 1, new HashMap<String, String>() {{
                    put("fileprefix", "output");
                }}).get(), emptyMap()));

        // ... run pipeline
    }
}

Processors

org.talend.sdk.component.runtime.beam.TalendFn provides the way to wrap a processor in a Beam PTransform and integrate it in the pipeline.

public class Main {
    public static void main(final String[] args) {
        // Component manager and pipeline initialization...

        // Create a Beam PTransform from the processor and apply it to the pipeline
        pipeline.apply(TalendFn.asFn(manager.findProcessor("sample", "mapper", 1, emptyMap()).get(), emptyMap()));

        // ... run pipeline
    }
}

The multiple inputs/outputs are represented by a Map element in the Beam case to avoid using multiple inputs/outputs.

you can use ViewsMappingTransform or CoGroupByKeyResultMappingTransform to adapt the input/output format to the record format representing the multiple inputs/outputs, so a kind of Map<String, List<?>>, but materialized as a JsonObject. Input data must be of type JsonObject in this case.


Deployment

Since Beam serializes components, it is crucial to add the component-runtime-standalone dependency to the project. It will take care of providing an implicit and lazy ComponentManager managing the component in a fatjar case.

Convert a Beam.io in a component I/O

For simple I/O you can get automatic conversion of the Beam.io to a component I/O transparently if you decorate your PTransform with @PartitionMapper or @Processor.

The limitations are:

• Inputs must implement PTransform<PBegin, PCollection<?>> and must be a BoundedSource.

• Outputs must implement PTransform<PCollection<?>, PDone> and just register a DoFn on the input PCollection.

More information on that topic is available on the How to wrap a Beam I/O page.
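To give the shape of such a wrapper, here is a hedged sketch of an input PTransform exposed as a component. The family/name, the file path and the delegation to TextIO are illustrative assumptions; refer to the How to wrap a Beam I/O page for the reference approach.

@PartitionMapper(family = "beamsample", name = "fileinput") // made-up family/name
public class FileInput extends PTransform<PBegin, PCollection<String>> {

    @Override
    public PCollection<String> expand(final PBegin input) {
        // delegates to a bounded Beam source (TextIO is backed by a BoundedSource)
        return input.apply(TextIO.read().from("/tmp/input.txt"));
    }
}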

4.4.6. Advanced: define a custom API

It is possible to extend the Component API for custom front features.

What is important here is to keep in mind that you should do it only if it targets non-portable components (only used by the Studio or Beam).

In terms of organization it is recommended to create a custom xxxx-component-api module with the new set of annotations.

Extending the UI

To extend the UI just add an annotation that can be put on @Option fields and that is decorated with @Ui. All its members will be put in the metadata of the parameter. Example:

@Ui
@Target(TYPE)
@Retention(RUNTIME)
public @interface MyLayout {
}
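For illustration only, the custom annotation could then be used like this (MyConfig is a made-up configuration class); how the metadata is consumed is up to the custom front:

@MyLayout // its members (none here) end up in the metadata of the parameter
public class MyConfig implements Serializable {

    @Option
    private String endpoint;
}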

4.5. Talend Component Testing Documentation


4.5.1. Best practises

this part is mainly about tools usable with JUnit. You can use most of these techniques with TestNG as well; check out the documentation if you need to use TestNG.

Parameterized tests

This is a great solution to repeat the same test multiple times. The overall idea is to define a test scenario (I test function F) and to make the input/output data dynamic.

JUnit 4

Here is an example. Let’s assume we have this test which validates the connection URI using ConnectionService:

public class MyConnectionURITest {
    @Test
    public void checkMySQL() {
        assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));
    }

    @Test
    public void checkOracle() {
        assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));
    }
}

We can clearly see that the test method is always the same except for the value. It can therefore be rewritten using the JUnit Parameterized runner like this:


@RunWith(Parameterized.class) ①
public class MyConnectionURITest {

    @Parameterized.Parameters(name = "{0}") ②
    public static Iterable<String> uris() { ③
        return asList(
            "jdbc:mysql://localhost:3306/mysql",
            "jdbc:oracle:thin:@//myhost:1521/oracle");
    }

    @Parameterized.Parameter ④
    public String uri;

    @Test
    public void isValid() { ⑤
        assertNotNull(uri);
    }
}

① Parameterized is the runner understanding @Parameters and how to use it. Note that you can generate random data here if desired.

② by default the name of the executed test is the index of the data; here we customize it using the first parameter toString() value to have something more readable

③ the @Parameters method MUST be static and return an array or iterable of the data used by the tests

④ you can then inject the current data using the @Parameter annotation; it can take a parameter if you use an array of arrays instead of an iterable of objects in @Parameterized, and you can select which item you want injected this way

⑤ the @Test method will be executed using the contextual data; in this sample it gets executed twice with the 2 specified urls

you don’t have to define a single @Test method; if you define multiple, each of them will be executed with all the data (i.e. if we add a test in the previous example you will get 4 test executions - 2 per data set, i.e. 2x2)

JUnit 5

JUnit 5 reworked this feature to make it way easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.

The main difference is that you can also define inline, on the test method, that it is a parameterized test and what the values are:


@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void mytest(String currentValue) {
    // do test
}

However you can still use the previous behavior using a method binding configuration:

@ParameterizedTest
@MethodSource("stringProvider")
void mytest(String currentValue) {
    // do test
}

static Stream<String> stringProvider() {
    return Stream.of("foo", "bar");
}

This last option allows you to inject any type of value - not only primitives - which is very common when defining scenarii.
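For instance, still as a sketch, a scenario object can be injected the same way (Scenario is a made-up holder reusing the ConnectionService from the earlier example):

@ParameterizedTest
@MethodSource("scenarii")
void validateUri(final Scenario scenario) {
    assertEquals(scenario.expected, new ConnectionService().isValid(scenario.uri));
}

static Stream<Scenario> scenarii() {
    return Stream.of(
        new Scenario("jdbc:mysql://localhost:3306/mysql", true),
        new Scenario("not a uri", false));
}

// simple value holder for the test data (hypothetical)
static class Scenario {
    final String uri;
    final boolean expected;

    Scenario(final String uri, final boolean expected) {
        this.uri = uri;
        this.expected = expected;
    }
}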

don’t forget to add the junit-jupiter-params dependency to benefit from this feature.

4.5.2. component-runtime-testing

component-runtime-junit

component-runtime-junit is a small test library allowing you to validate simple logic based on Talend Component tooling.

To import it, add the following dependency to your project:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-junit</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

This dependency also provides some mocked components that you can use with your own component to create tests.

The mocked components are provided under the family test:


• emitter : a mock of an input component

• collector : a mock of an output component

JUnit 4

Then you can define a standard JUnit test and use the SimpleComponentRule rule:

public class MyComponentTest {

    @Rule ①
    public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        Job.components() ②
                .component("mycomponent", "yourcomponentfamily://yourcomponent?" + createComponentConfig())
                .component("collector", "test://collector")
            .connections()
                .from("mycomponent").to("collector")
            .build()
            .run();

        final List<MyRecord> records = components.getCollectedData(MyRecord.class); ③
        doAssertRecords(records); // depending your test
    }
}

① the rule will create a component manager and provide two mock components: an emitter and a collector. Don’t forget to set the root package of your component to enable it.

② you define any chain you want to test, it generally uses the mock as source or collector

③ you validate your component behavior; for a source you can assert that the right records were emitted in the mock collector

JUnit 5

The JUnit 5 integration is mainly the same as for JUnit 4, except it uses the new JUnit 5 extension mechanism.

The entry point is the @WithComponents annotation you put on your test class, which takes the component package you want to test. You can use @Injected to inject into a test class field an instance of ComponentsHandler, which exposes the same utilities as the JUnit 4 rule:


@WithComponents("org.talend.sdk.component.junit.component") ①public class ComponentExtensionTest {  @Injected ②  private ComponentsHandler handler;

  @Test  public void manualMapper() {  final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {

  {  values = asList("a", "b");  }  });  assertFalse(mapper.isStream());  final Input input = mapper.create();  assertEquals("a", input.next());  assertEquals("b", input.next());  assertNull(input.next());  }}

① The annotation defines which components to register in the test context.

② The field allows to get the handler to be able to orchestrate the tests.

if it is the first time you use JUnit 5, don’t forget the imports changed and you must use org.junit.jupiter.api.Test instead of org.junit.Test. Some IDE and surefire versions can also require you to install either a plugin or a specific configuration.

Mocking the output

Using the component "test"/"collector" as in previous sample stores all records emitted bythe chain (typically your source) in memory, you can then access them usingtheSimpleComponentRule.getCollectoedRecord(type). Note that this method filters by type, ifyou don’t care of the type just use Object.class.

Mocking the input

The input mocking is symmetric to the output, but here you provide the data you want to inject:


public class MyComponentTest {

    @Rule
    public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        components.setInputData(asList(createData(), createData(), createData())); ①

        Job.components() ②
                .component("emitter", "test://emitter")
                .component("out", "yourcomponentfamily://myoutput?" + createComponentConfig())
            .connections()
                .from("emitter").to("out")
            .build()
            .run();

        assertMyOutputProcessedTheInputData();
    }
}

① using setInputData you prepare the execution(s) to have a fake input when using the "test"/"emitter" component.

Creating runtime configuration from component configuration

The component configuration is a POJO (using @Option on fields) and the runtime configuration (ExecutionChainBuilder) uses a Map<String, String>. To make the conversion easier, the JUnit integration provides a SimpleFactory.configurationByExample utility to get this map instance from a configuration instance.

Example:

final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits

final Map<String, String> configuration = configurationByExample(componentConfig);

The same factory provides a fluent DSL to create the configuration by calling configurationByExample without any parameter. The advantage is to be able to convert an object as a Map<String, String> as seen previously, or as a query string to use it with the Job DSL:


final String uri = "family://component?" +  configurationByExample().forInstance(componentConfig).configured().toQueryString();

It handles the encoding of the URI to ensure it is correctly done.

Testing a Mapper

The SimpleComponentRule also allows you to test a mapper on its own: you can get an instance from a configuration and execute this instance to collect the output. Here is a snippet doing that:

public class MapperTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
            "org.company.talend.component");

    @Test
    public void mapper() {
        final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
            values = asList("a", "b");
        }});
        assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
    }
}

Testing a Processor

As for the mapper, a processor is testable in isolation. The case is a bit more complex since you can have multiple inputs and outputs:


public class ProcessorTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
            "org.company.talend.component");

    @Test
    public void processor() {
        final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
        final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
                new JoinInputFactory().withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
                        .withInput("second", asList(new Transform.Record("1"), new Transform.Record("2")))
        );
        assertEquals(2, outputs.size());
        assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
        assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
    }
}

Here again the rule allows you to instantiate a Processor from your code and then to collect the output from the inputs you pass in. There are two convenient implementations of the input factory:

1. MainInputFactory for processors using only the default input.

2. JoinInputFactory for processors using multiple inputs; it has a method withInput(branch, data) where the first argument is the branch name and the second the data used by the branch.

you can also implement your own input representation if needed by implementing org.talend.sdk.component.junit.ControllableInputFactory.

component-runtime-testing-spark

The following artifact allows you to test against a Spark cluster:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-testing-spark</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>


JUnit 4

The usage relies on a JUnit TestRule. It is recommended to use it as a @ClassRule to ensure a single instance of a Spark cluster is built, but you can also use it as a simple @Rule, which means it will be created per method instead of per test class.

It takes as parameters the Spark and Scala versions to use. It will then fork a master and N slaves. Finally it will give you submit* methods allowing you to send jobs either from the test classpath or from a shade if you run it as an integration test.

Here is a sample:

public class SparkClusterRuleTest {

    @ClassRule
    public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

    @Test
    public void classpathSubmit() throws IOException {
        SPARK.submitClasspath(SubmittableMain.class, getMainArgs());

        // do wait the test passed
    }
}

this works with @Parameterized, so you can submit a bunch of jobs with different args and even combine it with a Beam TestPipeline if you make it transient!

JUnit 5

The JUnit 5 integration of that Spark cluster logic uses the @WithSpark marker for the extension and lets you, optionally, inject through @SparkInject the BaseSpark<?> handler to access the Spark cluster meta information - like its host/port.

Here is a basic test using it:


@WithSpark
class SparkExtensionTest {

    @SparkInject
    private BaseSpark<?> spark;

    @Test
    void classpathSubmit() throws IOException {
        final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
        if (out.exists()) {
            out.delete();
        }
        spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());

        await().atMost(5, MINUTES).until(
            () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
            equalTo("b -> 1\na -> 1"));
    }
}

How to know the job is done

In its current state, SparkClusterRule doesn’t allow you to know when a job execution is done - even if it exposes the webui url so you can poll it to check. The best option at the moment is to ensure the output of your job exists and contains the right value.

Awaitility or an equivalent library can help you write such logic.

Here are the coordinates of the artifact:

<dependency>
  <groupId>org.awaitility</groupId>
  <artifactId>awaitility</artifactId>
  <version>3.0.0</version>
  <scope>test</scope>
</dependency>

And here is how to wait until a file exists and its content (for instance) is the expected one:


await()
    .atMost(5, MINUTES)
    .until(
        () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
        equalTo("the expected content of the file"));

component-runtime-http-junit

The HTTP JUnit module allows you to mock REST APIs very easily. Here are its coordinates:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-http-junit</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

this module uses Apache Johnzon and Netty; if you have any conflict (in particular with Netty) you can add the classifier shaded to the dependency, and the two dependencies are shaded, avoiding conflicts with your component.

It supports both JUnit 4 and JUnit 5, but the overall concept is the exact same one: the extension/rule is able to serve precomputed responses saved in the classpath.

You can plug your own ResponseLocator to map a request to a response, but the default implementation - which should be sufficient in most cases - will look in talend/testing/http/<class name>_<method name>.json. Note that you can also put it in talend/testing/http/<request path>.json.

JUnit 4

JUnit 4 setup is done through two rules: JUnit4HttpApi, which is responsible for starting the server, and JUnit4HttpApiPerMethodConfigurator, which is responsible for configuring the server per test and also handles the capture mode (see later).

if you don’t use the JUnit4HttpApiPerMethodConfigurator, the capture feature will be deactivated and the per-test mocking will not be available.

Most of the tests will look like:


public class MyRESTApiTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void direct() throws Exception {
        // ... do your requests
    }
}

SSL

For tests using SSL based services, you will need to use activeSsl() on the JUnit4HttpApi rule.

If you need to access the server SSL socket factory you can do it from the HttpApiHandler (the rule):

@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();

@Test
public void test() throws Exception {
    final HttpsURLConnection connection = getHttpsConnection();
    connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
    // ....
}

JUnit 5

JUnit 5 uses a JUnit 5 extension based on the HttpApi annotation you can put on your test class. You can inject the test handler (which has some utilities for advanced cases) through @HttpApiInject:


@HttpApi
class JUnit5HttpApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void getProxy() throws Exception {
        // .... do your requests
    }
}

the injection is optional and the @HttpApi annotation allows you to configure several behaviors of the test.

SSL

For tests using SSL based services, you will need to use @HttpApi(useSsl = true).

You can access the client SSL socket factory through the api handler:

@HttpApi(useSsl = true)
class MyHttpsApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void test() throws Exception {
        final HttpsURLConnection connection = getHttpsConnection();
        connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
        // ....
    }
}

Capturing mode

The strength of this implementation is to run a small proxy server and auto-configure the JVM: http[s].proxyHost, http[s].proxyPort, HttpsURLConnection#defaultSSLSocketFactory and SSLContext#default are auto-configured to work out of the box with the proxy.

It allows you to keep the native and real URLs in your tests. For instance this test is perfectly valid:


public class GoogleTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void google() throws Exception {
        assertEquals(HttpURLConnection.HTTP_OK, get("https://google.fr?q=Talend"));
    }

    private int get(final String uri) throws Exception {
        // do the GET request, skipped for brevity
    }
}

If you execute this test, it will fail with an HTTP 400 because the proxy doesn’t find the mocked response. You can create it manually as seen in the introduction of the module, but you can also set the property talend.junit.http.capture to the folder where to store the captures. It must be the root folder and not the folder where the JSON files are (i.e. not prefixed by talend/testing/http by default).

Generally you will want to use src/test/resources. If new File("src/test/resources") resolves to the valid folder when executing your test (Maven default), then you can just set the system property to true; otherwise you need to adjust the system property value accordingly.

Once you have run the tests with this system property, the testing framework will have created the correct mock response files and you can remove the system property. The test will still pass, using google.com…even if you disconnect your machine from the internet.

The rule (extension) is doing all the work for you :).

Passthrough mode

Setting the talend.junit.http.passthrough system property to true, the server will just be a proxy and will execute each request against the actual server - like in capturing mode.

4.5.3. Beam testing

If you want to ensure your component works in Beam, the minimum to do is to try it with the direct runner (if you don’t want to use Spark).

Check beam.apache.org/contribute/testing/ out for more details.


4.5.4. Multiple environments for the same tests

JUnit (4 or 5) already provides some ways to parameterize tests and execute the same "test logic" against several data sets. However it is not that convenient for testing multiple environments.

For instance, with Beam, you may want to test your code against multiple runners, and that requires solving conflicts between runner dependencies, setting up the correct classloaders, etc. It is a lot of work!

To simplify such cases, the framework provides you a multi-environment support for your tests.

It is in the junit module and is usable with JUnit 4 and JUnit 5.

JUnit 4

@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
    @Test
    public void test1() {
        // ...
    }
}

The MultiEnvironmentsRunner will execute the test(s) for each defined environment. It means it will run test1 for Env1 and Env2 in the previous example.

By default the JUnit4 runner will be used to execute the tests in one environment, but you can use @DelegateRunWith to use another runner.

JUnit 5

JUnit 5 configuration is close to JUnit 4 one:

@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {

    @EnvironmentalTest
    void test1() {
        // ...
    }
}


The main difference is that you don’t use a runner (it doesn’t exist in JUnit 5) and you replace @Test by @EnvironmentalTest.

the main difference with the JUnit 4 integration is that the tests are executed one after the other for all environments, instead of running all tests in each environment sequentially. It means, for instance, that @BeforeAll and @AfterAll are executed once for all runners.

Provided environments

The provided environments set up the contextual classloader to load the related runner of Apache Beam.

Package: org.talend.sdk.component.junit.environment.builtin.beam

the configuration is read from system properties, environment variables,….

Class Name Description

ContextualEnvironment Contextual Contextual runner

DirectRunnerEnvironment Direct Direct runner

FlinkRunnerEnvironment Flink Flink runner

SparkRunnerEnvironment Spark Spark runner

Configuring environments

If the environment extends BaseEnvironmentProvider and therefore defines an environment name - which is the case of the default ones - you can use EnvironmentConfiguration to customize the system properties used for that environment:


@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Direct",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Spark",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(FlinkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Flink",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {

    @EnvironmentalTest
    void execute() {
        // run some pipeline
    }
}

if you set the system property <environment name>.skip=true then the environment related executions will be skipped.

Advanced usage

this usage assumes Beam 2.4.0 is in use and the classloader fix about the PipelineOptions is merged.

Dependencies:


<dependencies>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-junit</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.jboss.shrinkwrap.resolver</groupId>
    <artifactId>shrinkwrap-resolver-impl-maven</artifactId>
    <version>3.0.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-beam</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-standalone</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>

These dependencies bring into the test scope the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit.

Then, using the fluent DSL to define jobs - which assumes your job is linear and each step sends a single value (no multi-input/multi-output) - you can write this kind of test:

@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
    @EnvironmentalTest
    void testWithStandaloneAndBeamEnvironments() {
        from("myfamily://in?config=xxxx")
            .to("myfamily://out")
            .create()
            .execute();
        // add asserts on the output if needed
    }
}


It will execute the chain twice:

1. with a standalone environment to simulate the studio

2. with a beam (direct runner) environment to ensure the portability of your job

4.5.5. Secrets/Passwords and Maven

If you want, you can reuse your Maven settings.xml servers - including the encrypted ones. org.talend.sdk.component.maven.MavenDecrypter will give you the ability to find a server username/password from a server identifier:

final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");
// decrypted.getUsername();
// decrypted.getPassword();

It is very useful to avoid storing secrets and to test against real systems on a continuous integration platform.

even if you don’t use Maven on the platform, you can generate the settings.xml and settings-security.xml files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.

4.5.6. Generating data?

Several data generators exist if you want to populate objects with semantics a bit more evolved than a plain random string like commons-lang3:

• github.com/Codearte/jfairy

• github.com/DiUS/java-faker

• github.com/andygibson/datafactory

• …

A bit more advanced, these ones allow you to bind generic data directly on a model - but data quality is not always there:

• github.com/devopsfolks/podam

• github.com/benas/random-beans

• …

Note there are two main kinds of implementations:

• the one using a pattern and random generated data

• a set of precomputed data extrapolated to create new values


Check against your use case to know which one is the best.

an interesting alternative to data generation is to import real data and use Talend Studio to sanitize the data (remove sensitive information, replacing it by generated or anonymized data) and just inject that file into the system.

If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans which is pretty good on that topic.

5. Talend Component Testing Documentation

5.1. Best practises

this part is mainly about tools usable with JUnit. You can use most of these techniques with TestNG as well; check out the documentation if you need to use TestNG.

5.1.1. Parameterized tests

This is a great solution to repeat the same test multiple times. The overall idea is to define a test scenario (I test function F) and to make the input/output data dynamic.

JUnit 4

Here is an example. Let’s assume we have this test which validates the connection URI using ConnectionService:

public class MyConnectionURITest {  @Test  public void checkMySQL() {  assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));  }

  @Test  public void checkOracle() {  assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));  }}

We can clearly see that the test method is always the same except for the value. It can therefore be rewritten using the JUnit Parameterized runner like this:

@RunWith(Parameterized.class) ①public class MyConnectionURITest {

  @Parameterized.Parameters(name = "{0}") ②  public static Iterable<String> uris() { ③  return asList(  "jdbc:mysql://localhost:3306/mysql",  "jdbc:oracle:thin:@//myhost:1521/oracle");  }

  @Parameterized.Parameter ④  public String uri;

  @Test  public void isValid() { ⑤  assertNotNull(uri);  }}

① Parameterized is the runner understanding @Parameters and how to use it. Note that youcan generate random data here if desired.

② by default the name of the executed test is the index of the data, here we customize itusing the first parameter toString() value to have something more readable

③ the @Parameters method MUST be static and return an array or iterable of the data usedby the tests

④ you can then inject the current data using @Parameter annotation, it can take aparameter if you use an array of array instead of an iterable of object in @Parameterizedand you can select which item you want injected this way

⑤ the @Test method will be executed using the contextual data, in this sample we’ll getexecuted twice with the 2 specified urls

you don’t have to define a single @Test method; if you define multiple, each of them will be executed with all the data (i.e. if we add a test in the previous example you will get 4 test executions - 2 per data set, i.e. 2x2)

JUnit 5

JUnit 5 reworked this feature to make it way easier to use. The full documentation isavailable at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.

The main difference is you can also define inline on the test method that it is aparameterized test and which are the values:


@ParameterizedTest@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })void mytest(String currentValue) {  // do test}

However you can still use the previous behavior using a method binding configuration:

@ParameterizedTest@MethodSource("stringProvider")void mytest(String currentValue) {  // do test}

static Stream<String> stringProvider() {  return Stream.of("foo", "bar");}

This last option allows you to inject any type of value - not only primitives - which is very common when defining scenarii.

don’t forget to add junit-jupiter-params dependency to benefit from thisfeature.

5.2. component-runtime-testing

5.2.1. component-runtime-junit

component-runtime-junit is a small test library allowing you to validate simple logic based on Talend Component tooling.

To import it add to your project the following dependency:

<dependency>  <groupId>org.talend.sdk.component</groupId>  <artifactId>component-runtime-junit</artifactId>  <version>${talend-component.version}</version>  <scope>test</scope></dependency>

This dependency also provides some mocked components that you can use with your own component to create tests.

The mocked components are provided under the family test :


• emitter : a mock of an input component

• collector : a mock of an output component

JUnit 4

Then you can define a standard JUnit test and use the SimpleComponentRule rule:

public class MyComponentTest {

    @Rule ①
    public final SimpleComponentRule components =
            new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        Job.components() ②
            .component("mycomponent", "yourcomponentfamily://yourcomponent?" + createComponentConfig())
            .component("collector", "test://collector")
            .connections()
            .from("mycomponent").to("collector")
            .build()
            .run();

        final List<MyRecord> records = components.getCollectedData(MyRecord.class); ③
        doAssertRecords(records); // depending on your test
    }
}

① the rule will create a component manager and provide two mock components: an emitter and a collector. Don't forget to set the root package of your component to enable it.

② you define any chain you want to test; it generally uses the mock as source or collector

③ you validate your component behavior; for a source you can assert that the right records were emitted in the mock collector

JUnit 5

The JUnit 5 integration is mainly the same as for JUnit 4 except that it uses the new JUnit 5 extension mechanism.

The entry point is the @WithComponents annotation you put on your test class, which takes the component package you want to test, and you can use @Injected to inject in a test class field an instance of ComponentsHandler, which exposes the same utilities as the JUnit 4 rule:

@WithComponents("org.talend.sdk.component.junit.component") ①
public class ComponentExtensionTest {
    @Injected ②
    private ComponentsHandler handler;

    @Test
    public void manualMapper() {
        final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {
            {
                values = asList("a", "b");
            }
        });
        assertFalse(mapper.isStream());
        final Input input = mapper.create();
        assertEquals("a", input.next());
        assertEquals("b", input.next());
        assertNull(input.next());
    }
}

① The annotation defines which components to register in the test context.

② The field lets you get the handler to orchestrate the tests.

if it is the first time you use JUnit 5, don't forget that the imports changed and you must use org.junit.jupiter.api.Test instead of org.junit.Test. Some IDE versions and surefire versions can also require you to install either a plugin or a specific configuration.

Mocking the output

Using the component "test"/"collector" as in the previous sample stores all records emitted by the chain (typically your source) in memory; you can then access them using SimpleComponentRule.getCollectedData(type). Note that this method filters by type; if you don't care about the type just use Object.class.
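For instance, a minimal sketch (reusing the components rule from the sample above) collecting everything the mock collector received, whatever the record type:

// collect all outputs without filtering on a specific record class
final List<Object> allOutputs = components.getCollectedData(Object.class);
assertFalse(allOutputs.isEmpty());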

Mocking the input

The input mocking is symmetric to the output, but here you provide the data you want to inject:

public class MyComponentTest {

    @Rule
    public final SimpleComponentRule components =
            new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        components.setInputData(asList(createData(), createData(), createData())); ①

        Job.components() ②
            .component("emitter", "test://emitter")
            .component("out", "yourcomponentfamily://myoutput?" + createComponentConfig())
            .connections()
            .from("emitter").to("out")
            .build()
            .run();

        assertMyOutputProcessedTheInputData();
    }
}

① using setInputData you prepare the execution(s) to have a fake input when using the "test"/"emitter" component.

② you define the chain under test, using the mock emitter as the source of the output component.

Creating runtime configuration from component configuration

The component configuration is a POJO (using @Option on fields) and the runtime configuration (ExecutionChainBuilder) uses a Map<String, String>. To make the conversion easier, the JUnit integration provides a SimpleFactory.configurationByExample utility to get this map instance from a configuration instance.

Example:

final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits

final Map<String, String> configuration = configurationByExample(componentConfig);

The same factory provides a fluent DSL to create the configuration by calling configurationByExample without any parameter. The advantage is to be able to convert an object into a Map<String, String> as seen previously, or into a query string to use it with the Job DSL:

final String uri = "family://component?" +
    configurationByExample().forInstance(componentConfig).configured().toQueryString();

It handles the encoding of the URI to ensure it is correctly done.

Testing a Mapper

The SimpleComponentRule also allows you to test a mapper in isolation: you can get an instance from a configuration and execute this instance to collect the output. Here is a snippet doing that:

public class MapperTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY =
            new SimpleComponentRule("org.company.talend.component");

    @Test
    public void mapper() {
        final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
            values = asList("a", "b");
        }});
        assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
    }
}

Testing a Processor

As for the mapper, a processor is testable in isolation. The case is a bit more complex since you can have multiple inputs and outputs:

public class ProcessorTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY =
            new SimpleComponentRule("org.company.talend.component");

    @Test
    public void processor() {
        final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
        final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
            new JoinInputFactory()
                .withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
                .withInput("second", asList(new Transform.Record("1"), new Transform.Record("2"))));
        assertEquals(2, outputs.size());
        assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
        assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
    }
}

Here again the rule allows you to instantiate a Processor from your code and then collect the output from the inputs you pass in. There are two convenient implementations of the input factory:

1. MainInputFactory for processors using only the default input.

2. JoinInputFactory for processors using multiple inputs; it has a method withInput(branch, data) where the first argument is the branch name and the second the data used by that branch.

you can also implement your own input representation if needed by implementing org.talend.sdk.component.junit.ControllableInputFactory.

5.2.2. component-runtime-testing-spark

The following artifact allows you to test against a Spark cluster:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-testing-spark</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

JUnit 4

The usage relies on a JUnit TestRule. It is recommended to use it as a @ClassRule to ensure a single instance of the Spark cluster is built, but you can also use it as a simple @Rule, which means it will be created per method instead of per test class.

It takes as parameters the Scala and Spark versions to use. It will then fork a master and N slaves. Finally it gives you submit* methods allowing you to send jobs either from the test classpath or from a shade if you run it as an integration test.

Here is a sample:

public class SparkClusterRuleTest {

    @ClassRule
    public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

    @Test
    public void classpathSubmit() throws IOException {
        SPARK.submitClasspath(SubmittableMain.class, getMainArgs());

        // wait for and assert the job output
    }
}

this works with @Parameterized so you can submit a bunch of jobs with different args and even combine it with a Beam TestPipeline if you make it transient!
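A hedged sketch of that combination (the argument values are purely illustrative, and it assumes submitClasspath accepts the program arguments as in the sample above):

@RunWith(Parameterized.class)
public class ParameterizedSparkSubmitTest {

    @ClassRule
    public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

    @Parameterized.Parameters(name = "{0}")
    public static Iterable<String> args() {
        // hypothetical program arguments, one submission per value
        return asList("--mode=batch", "--mode=sample");
    }

    @Parameterized.Parameter
    public String arg;

    @Test
    public void classpathSubmit() throws IOException {
        SPARK.submitClasspath(SubmittableMain.class, arg);
        // then await the job output as shown below
    }
}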

JUnit 5

The JUnit 5 integration of that Spark cluster logic uses the @WithSpark marker for the extension and lets you, optionally, inject through @SparkInject the BaseSpark<?> handler to access the Spark cluster meta information - like its host/port.

Here is a basic test using it:

@WithSpark
class SparkExtensionTest {

    @SparkInject
    private BaseSpark<?> spark;

    @Test
    void classpathSubmit() throws IOException {
        final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
        if (out.exists()) {
            out.delete();
        }
        spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());

        await().atMost(5, MINUTES).until(
            () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
            equalTo("b -> 1\na -> 1"));
    }
}

How to know the job is done

In its current state, SparkClusterRule doesn't let you know when a job execution is done - even if it exposes the web UI url so you can poll it to check. The best option at the moment is to ensure the output of your job exists and contains the right value.

Awaitility or an equivalent library can help you write such logic.

Here are the coordinates of the artifact:

<dependency>
  <groupId>org.awaitility</groupId>
  <artifactId>awaitility</artifactId>
  <version>3.0.0</version>
  <scope>test</scope>
</dependency>

And here is how to wait until a file exists and its content (for instance) is the expected one:

await()
    .atMost(5, MINUTES)
    .until(
        () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
        equalTo("the expected content of the file"));

5.2.3. component-runtime-http-junit

The HTTP JUnit module allows you to mock a REST API very easily. Here are its coordinates:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-http-junit</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

this module uses Apache Johnzon and Netty; if you have any conflict (in particular with Netty) you can add the shaded classifier to the dependency, and the two dependencies are then shaded, avoiding the conflicts with your component.

It supports JUnit 4 and JUnit 5 as well, but the overall concept is exactly the same: the extension/rule is able to serve precomputed responses saved in the classpath.

You can plug your own ResponseLocator to map a request to a response, but the default implementation - which should be sufficient in most cases - will look in talend/testing/http/<class name>_<method name>.json. Note that you can also put it in talend/testing/http/<request path>.json.

JUnit 4

JUnit 4 setup is done through two rules: JUnit4HttpApi, which is responsible for starting the server, and JUnit4HttpApiPerMethodConfigurator, which is responsible for configuring the server per test and also handles the capture mode (see later).

if you don't use the JUnit4HttpApiPerMethodConfigurator, the capture feature will be deactivated and the per test mocking will not be available.

Most of the tests will look like:

public class MyRESTApiTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void direct() throws Exception {
        // ... do your requests
    }
}

SSL

For tests using SSL based services, you will need to use activeSsl() on the JUnit4HttpApi rule.

If you need to access the server SSL socket factory, you can do it from the HttpApiHandler (the rule):

@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();

@Test
public void test() throws Exception {
    final HttpsURLConnection connection = getHttpsConnection();
    connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
    // ....
}

JUnit 5

JUnit 5 uses a JUnit 5 extension based on the HttpApi annotation you can put on your test class. You can inject the test handler (which has some utilities for advanced cases) through @HttpApiInject:

@HttpApi
class JUnit5HttpApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void getProxy() throws Exception {
        // .... do your requests
    }
}

the injection is optional and @HttpApi allows you to configure several behaviors of the test.

SSL

For tests using SSL based services, you will need to use @HttpApi(useSsl = true).

You can access the client SSL socket factory through the api handler:

@HttpApi(useSsl = true)
class MyHttpsApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void test() throws Exception {
        final HttpsURLConnection connection = getHttpsConnection();
        connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
        // ....
    }
}

Capturing mode

The strength of this implementation is to run a small proxy server and auto configure the JVM: http[s].proxyHost, http[s].proxyPort, HttpsURLConnection#defaultSSLSocketFactory and SSLContext#default are auto configured to work out of the box with the proxy.

It allows you to keep the native and real URLs in your tests. For instance this test is perfectly valid:

public class GoogleTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void google() throws Exception {
        assertEquals(HttpURLConnection.HTTP_OK, get("https://google.fr?q=Talend"));
    }

    private int get(final String uri) throws Exception {
        // do the GET request, skipped for brevity
    }
}

If you execute this test, it will fail with an HTTP 400 because the proxy doesn't find the mocked response. You can create it manually as seen in the introduction of the module, but you can also set the property talend.junit.http.capture to the folder where to store the captures. It must be the root folder and not the folder where the json files are (i.e. not prefixed by talend/testing/http by default).

Generally you will want to use src/test/resources. If new File("src/test/resources") resolves to the valid folder when executing your test (Maven default), then you can just set the system property to true; otherwise you need to adjust the system property value accordingly.

Once you have run the tests with this system property, the testing framework will have created the correct mock response files and you can remove the system property. The test will still pass, using google.com…even if you disconnect your machine from the internet.
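As a minimal sketch, assuming the property only needs to be set before the rules execute (which a static initializer guarantees), a one-off recording run could look like:

public class GoogleTest {

    static {
        // one-off recording run: capture the real responses under src/test/resources
        // (assumes Maven's default working directory; remove once the mocks are generated)
        System.setProperty("talend.junit.http.capture", "true");
    }

    // ... same rules and test as above
}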

The rule (extension) is doing all the work for you :).

Passthrough mode

Setting the talend.junit.http.passthrough system property to true, the server will just be a proxy and will execute each request against the actual server - like in capturing mode.

5.3. Beam testing

If you want to ensure your component works in Beam, the minimum to do is to try it with the direct runner (if you don't want to use Spark).

Check out beam.apache.org/contribute/testing/ for more details.

5.4. Multiple environments for the same tests

JUnit (4 or 5) already provides some ways to parameterize tests and execute the same "test logic" against several data sets. However it is not that convenient for testing multiple environments.

For instance, with Beam, you may want to test your code against multiple runners, which requires solving conflicts between runner dependencies, setting up the correct classloaders, etc…It is a lot of work!

To simplify such cases, the framework provides you a multi-environment support for your tests.

It is in the junit module and is usable with JUnit 4 and JUnit 5.

5.4.1. JUnit 4

@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
    @Test
    public void test1() {
        // ...
    }
}

The MultiEnvironmentsRunner will execute the test(s) for each defined environment. It means it will run test1 for Env1 and Env2 in the previous example.

By default the JUnit 4 runner will be used to execute the tests in one environment, but you can use @DelegateRunWith to use another runner.

5.4.2. JUnit 5

JUnit 5 configuration is close to the JUnit 4 one:

@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {

    @EnvironmentalTest
    void test1() {
        // ...
    }
}

The main difference is that you don't use a runner (it doesn't exist in JUnit 5) and you replace @Test by @EnvironmentalTest.

the main difference with the JUnit 4 integration is that the tests are executed one after the other for all environments instead of running all tests in each environment sequentially. It means, for instance, that @BeforeAll and @AfterAll are executed once for all runners.
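For example, a minimal sketch (hypothetical E1/E2 environment classes) illustrating that lifecycle:

@Environment(E1.class)
@Environment(E2.class)
class LifecycleTest {

    @BeforeAll
    static void setupOnce() {
        // executed a single time, not once per environment
    }

    @EnvironmentalTest
    void test1() {
        // executed once for E1, then once for E2
    }
}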

5.4.3. Provided environments

The provided environments set up the contextual classloader to load the related Apache Beam runner.

Package: org.talend.sdk.component.junit.environment.builtin.beam

the configuration is read from system properties, environment variables,….

Class                     Name        Description
ContextualEnvironment     Contextual  Contextual runner
DirectRunnerEnvironment   Direct      Direct runner
FlinkRunnerEnvironment    Flink       Flink runner
SparkRunnerEnvironment    Spark       Spark runner

5.4.4. Configuring environments

If the environment extends BaseEnvironmentProvider and therefore defines an environment name - which is the case of the default ones - you can use EnvironmentConfiguration to customize the system properties used for that environment:

@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Direct",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Spark",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(FlinkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Flink",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {

    @EnvironmentalTest
    void execute() {
        // run some pipeline
    }
}

if you set the system property <environment name>.skip=true then the environment related executions will be skipped.

Advanced usage

this usage assumes Beam 2.4.0 is in use and the classloader fix about the PipelineOptions is merged.

Dependencies:

<dependencies>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-junit</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.jboss.shrinkwrap.resolver</groupId>
    <artifactId>shrinkwrap-resolver-impl-maven</artifactId>
    <version>3.0.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-beam</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-standalone</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>

These dependencies bring into the test scope the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit.

Then, using the fluent DSL to define jobs - which assumes your job is linear and each step sends a single value (no multi-input/multi-output) - you can write this kind of test:

@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
    @EnvironmentalTest
    void testWithStandaloneAndBeamEnvironments() {
        from("myfamily://in?config=xxxx")
            .to("myfamily://out")
            .create()
            .execute();
        // add asserts on the output if needed
    }
}

It will execute the chain twice:

1. with a standalone environment to simulate the studio

2. with a beam (direct runner) environment to ensure the portability of your job

5.5. Secrets/Passwords and Maven

If you desire you can reuse your Maven settings.xml servers - including the encrypted ones. org.talend.sdk.component.maven.MavenDecrypter will give you the ability to find a server username/password from a server identifier:

final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");
// decrypted.getUsername();
// decrypted.getPassword();

It is very useful to avoid storing secrets and to test on real systems on a continuous integration platform.

even if you don't use Maven on the platform, you can generate the settings.xml and settings-security.xml files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.

5.6. Generating data?

Several data generators exist if you want to populate objects with semantics a bit more evolved than a plain random string like commons-lang3:

• github.com/Codearte/jfairy

• github.com/DiUS/java-faker

• github.com/andygibson/datafactory

• …

A bit more advanced, these ones allow you to bind generic data directly on a model - but data quality is not always there:

• github.com/devopsfolks/podam

• github.com/benas/random-beans

• …

Note there are two main kinds of implementations:

• the ones using a pattern and randomly generated data

• a set of precomputed data extrapolated to create new values

Check against your use case to know which one is the best.

an interesting alternative to data generation is to import real data and use Talend Studio to sanitize the data (remove sensitive information, replacing it by generated or anonymized data) and just inject that file into the system.

If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans which is pretty good on that topic.

6. Talend Component Best Practices

6.1. Organize your code

A few recommendations apply to the way component packages are organized:

1. ensure to create a package-info.java with the component family/categories at the root of your component package:

@Components(family = "jdbc", categories = "Database")
package org.talend.sdk.component.jdbc;

import org.talend.sdk.component.api.component.Components;

2. create a package for the configuration

3. create a package for the actions

4. create a package for the components and one subpackage by type of component (input, output, processors, …)

6.2. Modelize your configuration

It is recommended to ensure your configuration is serializable since it is likely you will pass it through your components, which can be serialized.
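For example, a minimal sketch of such a configuration (hypothetical fields, assuming the @Option annotation from the configuration API package):

import java.io.Serializable;

import org.talend.sdk.component.api.configuration.Option;

public class MySerializableConfiguration implements Serializable {

    @Option
    private String endpoint; // hypothetical option

    @Option
    private int batchSize; // hypothetical option
}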

6.3. I/O configuration

The first step to build a component is to identify the way it must be configured.

It is generally split into two main concepts:

1. the DataStore which is the way you can access the backend


2. the DataSet which is the way you interact with the backend

Here are some examples to let you get an idea of what you put in each category:

Accessing a relational database like MySQL:
  DataStore: the JDBC driver, url, username and password
  DataSet: the query to execute, row mapper, …

Accessing a file system:
  DataStore: the file pattern (or directory + file extension/prefix/…)
  DataSet: the file format, potentially the buffer size, …

It is common to make the dataset include the datastore, since both are required to work. However it is recommended to replace this pattern by composing both in a higher level configuration model:

@DataSet
public class MyDataSet {
    // ...
}

@DataStore
public class MyDataStore {
    // ...
}

public class MyComponentConfiguration {
    @Option
    private MyDataSet dataset;

    @Option
    private MyDataStore datastore;
}

6.4. Processor configuration

Processor configuration is simpler than I/O configuration since it is specific each time. For instance a mapper will take the mapping between the input and output models:

public class MappingConfiguration {
    @Option
    private Map<String, String> fieldsMapping;

    @Option
    private boolean ignoreCase;

    //...
}

6.5. I/O recommendations

I/O are particular because they can be linked to a set of actions. It is recommended to wire all the ones you can apply to ensure the consumers of your component can provide a rich experience to their users.

Here are the most common ones:

Type: DataStore
Action: @Checkable
Description: expose a way to ensure the datastore/connection works

Configuration example:

@DataStore
@Checkable
public class JdbcDataStore implements Serializable {

    @Option
    private String driver;

    @Option
    private String url;

    @Option
    private String username;

    @Option
    private String password;
}

Action example:

@HealthCheck
public HealthCheckStatus healthCheck(@Option("datastore") JdbcDataStore datastore) {
    if (!doTest(datastore)) {
        // often add an exception message mapping or equivalent
        return new HealthCheckStatus(Status.KO, "Test failed");
    }
    return new HealthCheckStatus(Status.OK, "Connection successful");
}

6.6. I/O limitations

Until the studio integration is complete, it is recommended to limit processors to 1 input.

6.7. Handle UI interactions

It is also recommended to provide as much information as possible to let the UI work with the data during its edition.

6.7.1. Validations

Light validations

The light validations are all the validations you can execute on the client side. They are listed in the UI hint part.

These are the ones to use first, before going with custom validations, since they are more efficient.

Custom validations

These ones require custom code to be executed; they are heavier, so try to avoid using them for simple validations you can do with the previous part.

Here you define an action taking some parameters needed for the validation, and you link the option you want to validate to this action. Here is an example to validate a dataset; for our JDBC driver we could have:

// ...
public class JdbcDataStore implements Serializable {

    @Option
    @Validable("driver")
    private String driver;

    // ...
}

@AsyncValidation("driver")
public ValidationResult validateDriver(@Option("value") String driver) {
    if (findDriver(driver) != null) {
        return new ValidationResult(Status.OK, "Driver found");
    }
    return new ValidationResult(Status.KO, "Driver not found");
}

Note that you can also make a class validable and you can use it to validate a form if you put it on your whole configuration:

// note: some parts of the API were removed for brevity

public class MyConfiguration {

    // a lot of @Options
}

public class MyComponent {
    public MyComponent(@Validable("configuration") MyConfiguration config) {
        // ...
    }

    //...
}

@AsyncValidation("configuration")
public ValidationResult validateDriver(@Option("value") MyConfiguration configuration) {
    if (isValid(configuration)) {
        return new ValidationResult(Status.OK, "Configuration valid");
    }
    return new ValidationResult(Status.KO, "Driver not valid ${because ...}");
}

the parameter binding of the validation method uses the same logic as the component configuration injection. Therefore the @Option specifies the prefix to use to reference a parameter. It is recommended to use @Option("value") until you know exactly why you don't use it. This way the consumer can match the configuration model and just prefix it with value. to send the instance to validate.

6.7.2. Completion

It can be neat and user friendly to provide completion on some fields. Here is an example for the available drivers:

// ...
public class JdbcDataStore implements Serializable {

    @Option
    @Completable("driver")
    private String driver;

    // ...
}

@Completion("driver")
public CompletionList findDrivers() {
    return new CompletionList(findDriverList());
}

6.7.3. Don’t forget the component representation

Each component must have its own icon:

@Icon(Icon.IconType.DB_INPUT)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper implements Serializable {
}

you can use talend.surge.sh/icons/ to identify the one you want to use.

6.8. Version and component

Not mandatory for the first version but recommended: enforce the version of your component.

@Version(1)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper implements Serializable {
}

If you break a configuration entry in a later version ensure to:

1. upgrade the version

2. support a migration of the configuration

@Version(value = 2, migrationHandler = JdbcPartitionMapper.Migrations.class)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper implements Serializable {

    public static class Migrations implements MigrationHandler {
        // implement your migration
    }
}

6.9. Don't forget to test

Testing the components is crucial. You can use unit tests and simple standalone JUnit, but it is highly recommended to have a few Beam tests to ensure your component works in the Big Data world.

6.10. Contribute to this guide

Don't hesitate to send your feedback on writing components and any best practices you encounter.

7. Talend Component REST API Documentation

a test environment is available on Heroku and browsable using the Talend Component Kit Server Restlet Studio instance.

7.1. HTTP API

The HTTP API intends to expose most Talend Component features over HTTP; it is a standalone Java HTTP server.

WebSocket protocol is activated for the endpoints as well; instead of /api/v1 they use the base /websocket/v1, see the WebSocket part for more details.

Here is the API:

7.1.1. REST resources of Component Runtime :: Server

0.0.3


POST api/v1/action/execute

This endpoint will execute any UI action and serialize the response as a JSON (pojo model). It takes as input the family, type and name of the related action to identify it, and its configuration as a flat key value set using the same kind of mapping as for components (option path as key).

Request

Content-Type: application/json
Request Body: (java.util.Map<java.lang.String, java.lang.String>)
Query Param: action, java.lang.String
Query Param: family, java.lang.String
Query Param: type, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (java.lang.RuntimeException)

400 Bad Request

Response Body: (org.talend.sdk.component.server.front.model.error.ErrorPayload)

{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}

404 Not Found

Response Body: (org.talend.sdk.component.server.front.model.error.ErrorPayload)

{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}

GET api/v1/action/index

This endpoint returns the list of available actions for a certain family and potentially filters the output, limiting it to some families and types of actions.

Request

No body
Query Param: family, java.lang.String
Query Param: language, java.lang.String
Query Param: type, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.ActionList)

{
  "items": [
    {
      "component": "string",
      "name": "string",
      "properties": [
        {
          "defaultValue": "string",
          "displayName": "string",
          "metadata": {
          },
          "name": "string",
          "path": "string",
          "placeholder": "string",
          "type": "string",
          "validation": {
            "enumValues": [
              "string"
            ],
            "max": 0,
            "maxItems": 0,
            "maxLength": 0,
            "min": 0,
            "minItems": 0,
            "minLength": 0,
            "pattern": "string",
            "required": false,
            "uniqueItems": false
          }
        }
      ],
      "type": "string"
    }
  ]
}

GET api/v1/component/dependencies

Returns a list of dependencies for the given components.

don't forget to add the component itself since it will not be part of the dependencies.

Then you can use /dependency/{id} to download the binary.

Request

No body
Query Param: identifier, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.Dependencies)

{
  "dependencies": {
  }
}

GET api/v1/component/dependency/{id}

Returns the binary of the dependency represented by id. It can be maven coordinates for dependencies or a component id.

Request

No body
Path Param: id, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (javax.ws.rs.core.StreamingOutput)

404 Not Found

Response Body: (org.talend.sdk.component.server.front.model.error.ErrorPayload)

{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}

GET api/v1/component/details

Returns the set of metadata about a few components identified by their 'id'.


Request

No body
Query Param: identifiers, java.lang.String
Query Param: language, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.ComponentDetailList)

{
  "details": [
    {
      "actions": [
        {
          "family": "string",
          "name": "string",
          "properties": [
            {
              "defaultValue": "string",
              "displayName": "string",
              "metadata": {
              },
              "name": "string",
              "path": "string",
              "placeholder": "string",
              "type": "string",
              "validation": {
                "enumValues": [
                  "string"
                ],
                "max": 0,
                "maxItems": 0,
                "maxLength": 0,
                "min": 0,
                "minItems": 0,
                "minLength": 0,
                "pattern": "string",
                "required": false,
                "uniqueItems": false
              }
            }
          ],
          "type": "string"
        }
      ],
      "displayName": "string",
      "icon": "string",
      "id": {
        "family": "string",
        "familyId": "string",
        "id": "string",
        "name": "string",
        "plugin": "string",
        "pluginLocation": "string"
      },
      "inputFlows": [
        "string"
      ],
      "links": [
        {
          "contentType": "string",
          "name": "string",
          "path": "string"
        }
      ],
      "outputFlows": [
        "string"
      ],
      "properties": [
        {
          "defaultValue": "string",
          "displayName": "string",
          "metadata": {
          },
          "name": "string",
          "path": "string",
          "placeholder": "string",
          "type": "string",
          "validation": {
            "enumValues": [
              "string"
            ],
            "max": 0,
            "maxItems": 0,
            "maxLength": 0,
            "min": 0,
            "minItems": 0,
            "minLength": 0,
            "pattern": "string",
            "required": false,
            "uniqueItems": false
          }
        }
      ],
      "type": "string",
      "version": 0
    }
  ]
}

400 Bad Request

Response Body: (java.util.Map<java.lang.String, org.talend.sdk.component.server.front.model.error.ErrorPayload>)

GET api/v1/component/icon/family/{id}

Returns a particular family icon in raw bytes.

Request

No body
Path Param: id, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (byte[])

{}

404 Not Found

Response Body: (org.talend.sdk.component.server.front.model.error.ErrorPayload)

{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}

GET api/v1/component/icon/{id}

Returns a particular component icon in raw bytes.

Request

No body
Path Param: id, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (byte[])

{}

404 Not Found

Response Body: (org.talend.sdk.component.server.front.model.error.ErrorPayload)

{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}

GET api/v1/component/index

Returns the list of available components.

Request

No body
Query Param: includeIconContent, boolean
Query Param: language, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.ComponentIndices)

{
  "components": [
    {
      "categories": [
        "string"
      ],
      "displayName": "string",
      "familyDisplayName": "string",
      "icon": {
        "customIcon": {
        },
        "customIconType": "string",
        "icon": "string"
      },
      "iconFamily": {
        "customIcon": {
        },
        "customIconType": "string",
        "icon": "string"
      },
      "id": {
        "family": "string",
        "familyId": "string",
        "id": "string",
        "name": "string",
        "plugin": "string",
        "pluginLocation": "string"
      },
      "links": [
        {
          "contentType": "string",
          "name": "string",
          "path": "string"
        }
      ],
      "version": 0
    }
  ]
}

POST api/v1/component/migrate/{id}/{configurationVersion}

Allows to migrate a component configuration without calling any component execution.

Request

Content-Type: application/json
Request Body: (java.util.Map<java.lang.String, java.lang.String>)
Path Param: configurationVersion, int
Path Param: id, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (java.util.Map<java.lang.String, java.lang.String>)

GET api/v1/configurationtype/index

Returns all available configuration types - storable models.

Request

No body
Query Param: language, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.ConfigTypeNodes)

{
  "nodes": {
  }
}

POST api/v1/configurationtype/migrate/{id}/{configurationVersion}

Allows to migrate a configuration without calling any component execution.

Request

Content-Type: application/json
Request Body: (java.util.Map<java.lang.String, java.lang.String>)
Path Param: configurationVersion, int
Path Param: id, java.lang.String

Response

Content-Type: application/json


200 OK

Response Body: (java.util.Map<java.lang.String, java.lang.String>)

GET api/v1/documentation/component/{id}

Returns an asciidoctor version of the documentation for the component represented by its identifier id.

Format can be either asciidoc or html - if not it will fall back to asciidoc - and if html is selected you get a partial document.

it is recommended to use the asciidoc format and handle the conversion on your side if you can; the html flavor handles only a limited set of the asciidoc syntax, like plain tables, paragraphs and titles.

The documentation will likely be the family documentation, but you can use anchors to access a particular component (_componentname_inlowercase).

Request

No body
Path Param: id, java.lang.String
Query Param: format, java.lang.String
Query Param: language, java.lang.String

Response

Content-Type: application/json

200 OK

Response Body: (org.talend.sdk.component.server.front.model.DocumentationContent)

{
  "source": "string",
  "type": "string"
}

GET api/v1/environment

Returns the environment of this instance. Useful to check the version or configure a healthcheck for the server.

Request

No body


Response

Content-Type: */*

200 OK

Response Body: (org.talend.sdk.component.server.front.model.Environment)

{
  "commit": "string",
  "latestApiVersion": 0,
  "time": "string",
  "version": "string"
}

POST api/v1/execution/read/{family}/{component}

deprecated

Read inputs from an instance of mapper. The number of returned records is enforced to be limited to 1000. The format is a JSON based format where each line is a json record.

Request

Content-Type: application/json
Request Body: (java.util.Map<java.lang.String, java.lang.String>)
Path Param: component, java.lang.String
Path Param: family, java.lang.String
Query Param: size, long

Response

Content-Type: talend/stream

204 No Content

POST api/v1/execution/write/{family}/{component}

deprecated

Sends records using a processor instance. Note that the processor should have only an input. Behavior for other processors is undefined. The input format is a JSON based format where each line is a json record - same as for the symmetric endpoint.

Request

Content-Type: talend/stream
Request Body: (java.io.InputStream)
Path Param: component, java.lang.String
Path Param: family, java.lang.String
Query Param: group-size, long

Response

Content-Type: application/json

204 No Content

to ensure the migration can be activated, you need to set in the execution configuration you send to the server the version it was created with (the component version, available in the component detail endpoint) with the key tcomp::component::version.
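A minimal sketch (reusing configurationByExample from the testing section) of how that key can be added to the payload sent to these endpoints:

// hedged sketch: add the version the configuration was created with so the server can migrate it
final Map<String, String> payload = new HashMap<>(configurationByExample(componentConfig));
payload.put("tcomp::component::version", "1"); // version exposed by the component detail endpoint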

7.1.2. Deprecated endpoints

If some endpoints are intended to disappear they will be deprecated. In practice it means a header X-Talend-Warning will be returned with some message as value.
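A minimal sketch, using plain java.net, of how a client can surface that deprecation warning (the endpoint is just an example taken from this documentation):

final HttpURLConnection connection =
        (HttpURLConnection) new URL("http://localhost:8080/api/v1/environment").openConnection();
final String warning = connection.getHeaderField("X-Talend-Warning");
if (warning != null) {
    // the endpoint is deprecated, log or propagate the message
    System.err.println(warning);
}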

7.1.3. WebSocket transport

You can connect on any endpoint by replacing /api with /websocket and appending /<http method> to the URL, formatting the request as:

SEND
destination: <endpoint after v1>
<headers>

<payload>^@

For instance:

SEND
destination: /component/index
Accept: application/json

^@

The response is formatted as follow:

MESSAGE
status: <http status code>
<headers>

<payload>^@

if you have a doubt about the endpoints, they are all logged during startup and you can find them in the logs.

If you don't want to create a pool of connections per endpoint/verb, you can use the bus endpoint: /websocket/v1/bus. This endpoint requires that you add the header destinationMethod to each request with the verb value - the default would be GET:

SEND
destination: /component/index
destinationMethod: GET
Accept: application/json

^@

7.2. Web forms and REST API

The component-form library provides a way to build a component REST API facade compatible with the react form library.

A trivial facade can be:

@Path("tacokit-facade")
@ApplicationScoped
public class ComponentFacade {

    @Inject
    private Client client;

    @Inject
    private UiSpecService uiSpecService;

    @Inject
    private ActionService actionService;

    @POST
    @Path("action")
    public UiActionResult action(@QueryParam("family") final String family, @QueryParam("type") final String type,
            @QueryParam("action") final String action, final Map<String, Object> params) {
        try {
            return actionService.map(type, client.action(family, type, action, params));
        } catch (final WebException exception) {
            final UiActionResult payload = actionService.map(exception);
            throw new WebApplicationException(Response.status(exception.getStatus()).entity(payload).build());
        }
    }

    @GET
    @Path("index")
    public ComponentIndices getIndex(@QueryParam("language") @DefaultValue("en") final String language) {
        final ComponentIndices index = client.index(language);
        // our mapping is a bit different so rewrite links
        index.getComponents().stream().flatMap(c -> c.getLinks().stream()).forEach(link -> link.setPath(
            link.getPath().replaceFirst("\\/component\\/", "\\/tcomp-facade\\/").replace("/details?identifiers=", "/detail/")));
        return index;
    }

    @GET
    @Path("detail/{id}")
    public Ui getDetail(@QueryParam("language") @DefaultValue("en") final String language,
            @PathParam("id") final String id) {
        final List<ComponentDetail> details = client.details(language, id, new String[0]).getDetails();
        if (details.isEmpty()) {
            throw new WebApplicationException(Response.Status.BAD_REQUEST);
        }
        return uiSpecService.convert(details.iterator().next());
    }
}

the Client can be created using ClientFactory.createDefault(System.getProperty("app.components.base", "http://localhost:8080/api/v1")) and the service can be a simple new UiSpecService().

All the conversion between the component model (REST API) and the uiSpec model is done through the UiSpecService. It is based on the object model which will be mapped to a ui model. The advantage of having a flat model in the component REST API is to make these layers easy to customize.

You can completely control the available components, tune the rendering by switching the uiSchema if desired, or add/remove parts of the form. You can also add custom actions/buttons for specific needs of the application.

the /migrate endpoint has nothing special so it was not shown in the previous snippet, but if you need it you must add it as well.

7.2.1. Use UiSpec model without all the tooling

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-form-model</artifactId>
  <version>${talend-component-kit.version}</version>
</dependency>

This maven dependency provides the UISpec model classes. You can use the Ui API (with or without the builders) to create UiSpec representations.

Example:

final Ui form1 = ui()
    // (1)
    .withJsonSchema(JsonSchema.jsonSchemaFrom(Form1.class).build())
    // (2)
    .withUiSchema(uiSchema()
        .withKey("multiSelectTag")
        .withRestricted(false)
        .withTitle("Simple multiSelectTag")
        .withDescription("This datalist accepts values that are not in the list of suggestions")
        .withWidget("multiSelectTag")
        .build())
    // (3)
    .withProperties(myFormInstance)
    .build();

// (4)
final String json = jsonb.toJson(form1);

1. We extract the JsonSchema from reflection on the class Form1. Note that @JsonSchemaIgnore allows to ignore a field and @JsonSchemaProperty allows to rename a property,

2. We build programmatically, using the builder API, a UiSchema,

3. We pass an instance of the form to let the serializer extract its JSON model,

4. We serialize the Ui model, which can be used by UiSpec compatible front widgets.

the model uses the JSON-B API to define the binding; ensure to have an implementation in your classpath. This can be done by adding these dependencies:

<dependency>
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-jsonb_1.0_spec</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-json_1.1_spec</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.johnzon</groupId>
  <artifactId>johnzon-jsonb</artifactId>
  <version>${johnzon.version}</version> <!-- 1.1.5 for instance -->
</dependency>

7.2.2. Javascript integration

Default javascript integration goes through Talend UI Forms library.

It is bundled as an NPM module called component-kit.js. It provides a default trigger implementation for the UIForm.

Here is how to use it:

import React from 'react';
import UIForm from '@talend/react-forms/lib/UIForm/UIForm.container';
import TalendComponentKitTrigger from 'component-kit.js';

export default class ComponentKitForm extends React.Component {
  constructor(props) {
    super(props);
    this.trigger = new TalendComponentKitTrigger({ url: '/api/to/component/server/proxy' });
    this.onTrigger = this.onTrigger.bind(this);
    // ...
  }

  onTrigger(event, payload) {
    return this.trigger.onDefaultTrigger(event, payload);
  }

  // ...

  render() {
    if (!this.state.uiSpec) {
      return (<div>Loading ...</div>);
    }

    return (
      <UIForm
        data={this.state.uiSpec}
        onTrigger={this.onTrigger}
        onSubmit={this.onSubmit}
      />
    );
  }
}

7.3. Logging

The logging uses Log4j2. You can specify a custom configuration using the system property -Dlog4j.configurationFile or by adding a log4j2.xml file into the classpath.

Here are some common configurations:

• Console logging:

<?xml version="1.0"?>
<Configuration status="INFO">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="[%d{HH:mm:ss.SSS}][%highlight{%-5level}][%15.15t][%30.30logger] %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>

This outputs messages looking like:

[16:59:58.198][INFO ][           main][oyote.http11.Http11NioProtocol] Initializing ProtocolHandler ["http-nio-34763"]

• JSON logging:

<?xml version="1.0"?>
<Configuration status="INFO">
  <Properties>
    <!-- DO NOT PUT logSource there, it is useless and slow -->
    <Property name="jsonLayout">{"severity":"%level","logMessage":"%encode{%message}{JSON}","logTimestamp":"%d{ISO8601}{UTC}","eventUUID":"%uuid{RANDOM}","@version":"1","logger.name":"%encode{%logger}{JSON}","host.name":"${hostName}","threadName":"%encode{%thread}{JSON}","stackTrace":"%encode{%xThrowable{full}}{JSON}"}%n</Property>
  </Properties>
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="${jsonLayout}"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>

Output messages look like:


{"severity":"INFO","logMessage":"Initializing ProtocolHandler [\"http-nio-46421\"]","logTimestamp":"2017-11-20T16:04:01,763","eventUUID":"8b998e17-7045-461c-8acb-c43f21d995ff","@version":"1","logger.name":"org.apache.coyote.http11.Http11NioProtocol","host.name":"TLND-RMANNIBUCAU","threadName":"main","stackTrace":""}

• Rolling file appender

<?xml version="1.0"?>
<Configuration status="INFO">
  <Appenders>
    <RollingRandomAccessFile name="File" fileName="${LOG_PATH}/application.log" filePattern="${LOG_PATH}/application-%d{yyyy-MM-dd}.log">
      <PatternLayout pattern="[%d{HH:mm:ss.SSS}][%highlight{%-5level}][%15.15t][%30.30logger] %msg%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB" />
        <TimeBasedTriggeringPolicy interval="1" modulate="true"/>
      </Policies>
    </RollingRandomAccessFile>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="File"/>
    </Root>
  </Loggers>
</Configuration>

More details are available on RollingFileAppender documentation.

of course you can compose the previous layouts (message format) and appenders (where logs are written).

7.4. Server Configuration

The server module contains several configurations you can set in:

• Environment variables

• System properties

• A file located based on the --component-configuration CLI option

the configuration is read from system properties, environment variables,….

talend.component.server.component.coordinates (default: -)
  A comma separated list of gav to locate the components.

talend.component.server.component.registry (default: -)
  A property file where the value is a gav of a component to register (complementary with coordinates).

talend.component.server.documentation.active (default: true)
  Should the /documentation endpoint be activated.

talend.component.server.execution.dataset.retriever.timeout (default: 180)
  How long the read execution endpoint can last (max).

talend.component.server.execution.pool.size (default: 64)
  The size of the execution pool for runtime endpoints.

talend.component.server.execution.pool.wait (default: PT10S)
  How long the application waits during shutdown for the execution tasks to complete.

talend.component.server.jaxrs.exceptionhandler.defaultMessage (default: false)
  If set it will replace any message for exceptions. Set to false to use the actual exception message.

talend.component.server.maven.repository (default: ${user.home}/.m2/repository)
  The local maven repository used to locate components and their dependencies.

talend.component.server.monitoring.brave.reporter.async (default: console)
  When using the url or kafka reporter, you can configure the async reporter with properties passed to this configuration entry. Ex: messageTimeout=5000,closeTimeout=5000.

talend.component.server.monitoring.brave.reporter.type (default: auto)
  The brave reporter to use to send the spans. Supported values are [auto, console, noop, url]. When configuration is needed, you can use this syntax to configure the reporter: <name>(config1=value1,config2=value2), for example: url(endpoint=http://brave.company.com). In auto mode, if the environment variable TRACING_ON doesn't exist or is set to false, noop will be selected; if it is set to true, TRACING_KAFKA_URL, TRACING_KAFKA_TOPIC and TRACING_SAMPLING_RATE will configure the kafka reporter.

talend.component.server.monitoring.brave.sampling.action.rate (default: -1)
  The accuracy rate of the sampling for action endpoints.

talend.component.server.monitoring.brave.sampling.component.rate (default: -1)
  The accuracy rate of the sampling for component endpoints.

talend.component.server.monitoring.brave.sampling.configurationtype.rate (default: -1)
  The accuracy rate of the sampling for environment endpoints.

talend.component.server.monitoring.brave.sampling.documentation.rate (default: -1)
  The accuracy rate of the sampling for the documentation endpoint.

talend.component.server.monitoring.brave.sampling.environment.rate (default: -1)
  The accuracy rate of the sampling for environment endpoints.

talend.component.server.monitoring.brave.sampling.execution.rate (default: 1)
  The accuracy rate of the sampling for execution endpoints.

talend.component.server.monitoring.brave.sampling.rate (default: -1)
  The accuracy rate of the sampling.

talend.component.server.monitoring.brave.service.name (default: component-server)
  The name used by the brave integration (zipkin).

talend.component.server.security.command.handler (default: securityNoopHandler)
  How to validate a command/request. Accepted values: securityNoopHandler.

talend.component.server.security.connection.handler (default: securityNoopHandler)
  How to validate a connection. Accepted values: securityNoopHandler.

8. Wrapping a Beam I/O

8.1. Limitations

This part is limited to particular kinds of Beam PTransform:

• the PTransform<PBegin, PCollection<?>> for the inputs

• the PTransform<PCollection<?>, PDone> for the outputs. The outputs also must use a single (composite or not) DoFn in their apply method.

8.2. Wrap an input

Assume you want to wrap an input like this one (based on existing Beam ones):

@AutoValue
public abstract [static] class Read extends PTransform<PBegin, PCollection<String>> {

    // config

    @Override
    public PCollection<String> expand(final PBegin input) {
        return input.apply(
            org.apache.beam.sdk.io.Read.from(new BoundedElasticsearchSource(this, null)));
    }

    // ... other transform methods
}

To wrap the Read in a framework component, you create a transform delegating to this one with at least a @PartitionMapper annotation (you likely want to follow the best practices as well, adding @Icon and @Version) and using @Option constructor injections to configure the component:

@PartitionMapper(family = "myfamily", name = "myname")
public class WrapRead extends PTransform<PBegin, PCollection<String>> {
    private PTransform<PBegin, PCollection<String>> delegate;

    public WrapRead(@Option("dataset") final WrapReadDataSet dataset) {
        delegate = TheIO.read().withConfiguration(this.createConfigurationFrom(dataset));
    }

    @Override
    public PCollection<String> expand(final PBegin input) {
        return delegate.expand(input);
    }

    // ... other methods like the mapping with the native configuration (createConfigurationFrom)
}

8.3. Wrap an output

Assume you want to wrap an output like this one (based on existing Beam ones):

@AutoValue
public abstract [static] class Write extends PTransform<PCollection<String>, PDone> {

    // configuration withXXX(...)

    @Override
    public PDone expand(final PCollection<String> input) {
        input.apply(ParDo.of(new WriteFn(this)));
        return PDone.in(input.getPipeline());
    }

    // other methods of the transform
}

You can wrap this output exactly the same way as for the inputs, but using @Processor this time:

@Processor(family = "myfamily", name = "myname")
public class WrapWrite extends PTransform<PCollection<String>, PDone> {
    private PTransform<PCollection<String>, PDone> delegate;

    public WrapWrite(@Option("dataset") final WrapReadDataSet dataset) {
        delegate = TheIO.write().withConfiguration(this.createConfigurationFrom(dataset));
    }

    @Override
    public PDone expand(final PCollection<String> input) {
        return delegate.expand(input);
    }

    // ... other methods like the mapping with the native configuration (createConfigurationFrom)
}

8.4. Tip

Note that the class org.talend.sdk.component.runtime.beam.transform.DelegatingTransform fully delegates the "expansion" to another transform. Therefore you can extend it and just implement the configuration mapping:

@Processor(family = "beam", name = "file")
public class BeamFileOutput extends DelegatingTransform<PCollection<String>, PDone> {

    public BeamFileOutput(@Option("output") final String output) {
        super(TextIO.write()
            .withSuffix("test")
            .to(FileBasedSink.convertToFileResourceIfPossible(output)));
    }
}

8.5. Advanced

In terms of classloading, when you write an IO, all the Beam SDK Java core stack is assumed provided by the Talend Component Kit runtime, so never include it in compile scope - it would be ignored anyway.

8.5.1. Coder

If you need a JSonCoder you can use the org.talend.sdk.component.runtime.beam.factory.service.PluginCoderFactory service, which gives you access to the JSON-P and JSON-B coders.

8.5.2. Sample

Here is a sample input based on beam Kafka:

@Version
@Icon(Icon.IconType.KAFKA)
@Emitter(name = "Input")
@AllArgsConstructor
@Documentation("Kafka Input")
public class KafkaInput extends PTransform<PBegin, PCollection<JsonObject>> { ①

    private final InputConfiguration configuration;

    private final JsonBuilderFactory builder;

    private final PluginCoderFactory coderFactory;

    private KafkaIO.Read<byte[], byte[]> delegate() {
        final KafkaIO.Read<byte[], byte[]> read = KafkaIO.<byte[], byte[]>read()
            .withBootstrapServers(configuration.getBootstrapServers())
            .withTopics(configuration.getTopics().stream().map(InputConfiguration.Topic::getName).collect(toList()))
            .withKeyDeserializer(ByteArrayDeserializer.class)
            .withValueDeserializer(ByteArrayDeserializer.class);
        if (configuration.getMaxResults() > 0) {
            return read.withMaxNumRecords(configuration.getMaxResults());
        }
        return read;
    }

    @Override ②
    public PCollection<JsonObject> expand(final PBegin pBegin) {
        final PCollection<KafkaRecord<byte[], byte[]>> kafkaEntries = pBegin.getPipeline().apply(delegate());
        return kafkaEntries.apply(ParDo.of(new RecordToJson(builder))).setCoder(coderFactory.jsonp()); ③
    }

    @AllArgsConstructor
    private static class RecordToJson extends DoFn<KafkaRecord<byte[], byte[]>, JsonObject> {

        private final JsonBuilderFactory builder;

        @ProcessElement
        public void onElement(final ProcessContext context) {
            context.output(toJson(context.element()));
        }

        // todo: we shouldn't be typed string/string so make it evolving
        private JsonObject toJson(final KafkaRecord<byte[], byte[]> element) {
            return builder.createObjectBuilder()
                .add("key", new String(element.getKV().getKey()))
                .add("value", new String(element.getKV().getValue()))
                .build();
        }
    }
}

① the PTransform generics define that it is an input (PBegin marker)

② the expand method chains the native IO with a custom mapper (RecordToJson)

③ the mapper uses the JSON-P coder automatically created from the contextual component
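The sample relies on an InputConfiguration class that is not shown. As a sketch only, inferred from the getters the sample calls (getBootstrapServers(), getTopics() with Topic::getName and getMaxResults()); the field names, annotations and documentation strings are assumptions, not the real class:

import java.io.Serializable;
import java.util.List;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.meta.Documentation;

import lombok.Data;

// Hypothetical configuration matching the getters used in the sample above.
@Data
public class InputConfiguration implements Serializable {

    @Option
    @Documentation("Kafka bootstrap servers, for example host1:9092,host2:9092.")
    private String bootstrapServers;

    @Option
    @Documentation("Topics to subscribe to.")
    private List<Topic> topics;

    @Option
    @Documentation("If greater than 0, stop after this number of records (useful for tests).")
    private int maxResults;

    @Data
    public static class Topic implements Serializable {

        @Option
        @Documentation("Topic name.")
        private String name;
    }
}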

Since the Beam wrapper does not respect the standard Kit programming model (no @Emitter, for instance), you need to set the <talend.validation.component>false</talend.validation.component> property in your pom.xml (or the equivalent for Gradle) to skip the Kit component programming model validations.


9. Talend Component Appendix

9.1. ContainerManager or the classloader manager

The entry point of the API is the ContainerManager. It enables you to define what the shared classloader is and to create children:

try (final ContainerManager manager = new ContainerManager( ①
    ContainerManager.DependenciesResolutionConfiguration.builder() ②
        .resolver(new MvnDependencyListLocalRepositoryResolver("META-INF/talend/dependencies.list"))
        .rootRepositoryLocation(new File(System.getProperty("user.home"), ".m2/repository"))
        .create(),
    ContainerManager.ClassLoaderConfiguration.builder() ③
        .parent(getClass().getClassLoader())
        .classesFilter(name -> true)
        .parentClassesFilter(name -> true)
        .create())) {

  // create plugins

}

① the ContainerManager is an AutoCloseable, so you can use it in a try/finally block if desired. Note: it is recommended to keep it running if you can reuse plugins, to avoid recreating classloaders and to mutualize them. This manager has two main configuration entries: how to resolve dependencies for plugins from the plugin file/location, and how to configure the classloaders (what the parent classloader is, how to handle parent first/last delegation, etc.).

② the DependenciesResolutionConfiguration allows you to pass a custom Resolver which is used to build the plugin classloaders. For now the library only provides MvnDependencyListLocalRepositoryResolver, which reads the output of mvn dependencies:list put in the plugin jar and resolves the dependencies from a local Maven repository. Note that SNAPSHOTs are only resolved based on their name and not from metadata (only useful in development). To continue the comparison with a Servlet server, you can easily implement an unpacked war resolver if you want.

③ the ClassLoaderConfiguration configures how the whole container/plugin pair behaves: what the shared classloader is, which classes are loaded from the shared loader first (intended to be used for the API, which should not be loaded from the plugin loader), and which classes are loaded from the parent classloader (useful, for instance, to avoid loading a "common" library from the parent classloader; can be neat for guava, commons-lang3, etc.).


Once you have a manager you can create plugins:

final Container plugin1 = manager.create( ①
    "plugin-id", ②
    new File("/plugin/myplugin1.jar")); ③

① to create a plugin Container just use the create method of the manager

② you can give an explicit id to the plugin (if you skip it, the manager will use the jar name)

③ you specify the plugin root jar

To create the plugin container, the Resolver will resolve the dependencies needed for the plugin, then the manager will create the plugin classloader and register the plugin Container.
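Putting the two snippets together gives the following sketch of the overall lifecycle. Nothing new is introduced besides the imports (whose package names are assumed, so adjust them to your Talend Component Kit runtime) and an illustrative plugin path; as recommended above, the manager stays open while the plugins it created are in use:

import java.io.File;

// Package names are assumed for illustration; adjust them to your runtime dependencies.
import org.talend.sdk.component.container.Container;
import org.talend.sdk.component.container.ContainerManager;
import org.talend.sdk.component.dependencies.maven.MvnDependencyListLocalRepositoryResolver;

public class PluginHost {

    public void run() {
        try (final ContainerManager manager = new ContainerManager(
                ContainerManager.DependenciesResolutionConfiguration.builder()
                        .resolver(new MvnDependencyListLocalRepositoryResolver("META-INF/talend/dependencies.list"))
                        .rootRepositoryLocation(new File(System.getProperty("user.home"), ".m2/repository"))
                        .create(),
                ContainerManager.ClassLoaderConfiguration.builder()
                        .parent(getClass().getClassLoader())
                        .classesFilter(name -> true)
                        .parentClassesFilter(name -> true)
                        .create())) {

            // the Resolver reads META-INF/talend/dependencies.list from the plugin jar
            // and resolves the listed artifacts from the local Maven repository
            final Container plugin1 = manager.create("plugin-id", new File("/plugin/myplugin1.jar"));

            // use plugin1 while the manager (and the classloaders it created) stays open
        }
    }
}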

9.1.1. Listener for plugin registration

It is common to need to perform some actions when a plugin is registered or unregistered. For that purpose, a ContainerListener can be used:

public class MyListener implements ContainerListener {

  @Override
  public void onCreate(final Container container) {
    System.out.println("Container #" + container.getId() + " started.");
  }

  @Override
  public void onClose(final Container container) {
    System.out.println("Container #" + container.getId() + " stopped.");
  }
}

They are registered on the manager directly:

final ContainerManager manager = getContainerManager();
final ContainerListener myListener = new MyListener();

manager.registerListener(myListener); ①
// do something
manager.unregisterListener(myListener); ②

① registerListener is used to add the listener; from then on it receives events, but it does not get any event for already created containers (see the sketch after these notes).

② you can remove a listener with unregisterListener at any time.
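To make note ① concrete, here is a sketch reusing the pieces from the snippets above (getContainerManager(), MyListener and the create call; the jar path is illustrative): the listener is registered before the plugin is created, so it receives the onCreate event.

final ContainerManager manager = getContainerManager(); // obtained as in the snippet above
final ContainerListener myListener = new MyListener();

manager.registerListener(myListener);     // only containers created from now on are notified
final Container plugin = manager.create(  // triggers myListener.onCreate(plugin)
    "plugin-id",
    new File("/plugin/myplugin1.jar"));   // illustrative path
// ... work with the plugin ...
manager.unregisterListener(myListener);   // later containers are no longer notified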
