Extraction Model

Subjects

An extraction model consists of one more more subjects. Each subject has its own set of tables and relations. Relations can also be global, which allows setting of global defaults that can be overridden by subjects. By default, any row found in a table in the extraction model is fetched in its entirety.

Tables

A table on a subject consists of the following:

table
The name of the table
column
The name of the column to extract values out of, when used together with values
values
A single number of string or an array of numbers or strings

Examples:

Example Description
All departments A table entry with just a table name will fetch all rows for that table
One department A table entry with a single column/value will fetch one row
Two departments A table entry with multiple column/values will fetch multiple rows
Two tables A table entry with multiple column/values will fetch multiple rows
Two subjects Two subjects with one table each

Relations

A relation enables or disables the processing of a foreign key in the database schema. A relationship is incoming or outgoing as seen from the perspective of a subject. All outgoing not null foreign keys must be processed to satisfy the foreign key constraint. This type of relationship is therefore always enabled and cannot be disabled,

A relation can be applied globally or to a subject. A global relation is always included in all subjects. Relations in a subject are only processed on rows related to the subject. See Extraction for more information.

defaults
Add all relations from a couple of selected types. See defaults for more details.
table
A foreign key constraint is identified by specifying a table and column in a relation. The first foreign key relationship to match the table and column is used. Compound foreign keys are fully supported, but can only be identified by a specifying a single column.
column
Must be specified when using table to identify a foreign key.
type
One of incoming or outgoing, with incoming the default. This identifies the direction of a relationship from the perspective of a subject.
name
Optional and purely for informational purposes.
disabled
Foreign key relations can be disabled. This is useful in the blacklisting approach where the everything default is used and then relations disabled.
sticky
Sticky relations can be used to keep track of which rows are directly connected to the subject. See sticky relations for more details.

Compound keys are also supported, see e.g. Compound Foreign Keys

A relationship is uniquely identified by its table, column, type and name. Identical relationships are processed in order and merged according to the following rules:

  • If any relation is disabled, then the relation is disabled and not processed.
  • If any relation is sticky, then the relation is sticky.

Examples:

Example Description
Default relations for a department Default relations for a department
Incoming Relation Incoming Relation
Relation for two departments 1 Default relations for two departments
Relation for two departments 2 Alternative default relations for two departments
Relation for two departments 3 Another alternative default relations for two departments
Relation for an employee All relations
Outgoing relation Outgoing relation
Disabled incoming relation Blacklisting approach with a disabled incoming relation
Disabled outgoing relation Blacklisting approach with a disabled outgoing relation

Defaults

Default relations can be set by using the relations default key. There are four default settings that can be combined in an additive way:

Setting Default Meaning
all-outgoing-not-null yes Always satisfy not null foreign key constraints
all-outgoing-nullable yes Ensures that complete rows are fetched
all-incoming no Process incoming foreign keys
everything no All of the above

If no defaults are specified, a single relation of type all-outgoing-nullable is used. The all-outgoing-not-null default is always present. The combination of these two ensures that whenever a row is encountered, all outgoing foreign keys are processed. This causes rows referenced by the foreign key to be fetched.

This is the default setting:

- relations:
  - {defaults: all-outgoing-not-null}
  - {defaults: all-outgoing-nullable}

To add all incoming relations to the default, use:

- relations:
 - {defaults: everything}

Since all-outgoing-not-null is always included implicitly, the above is equivalent to:

- relations:
  - {defaults: all-outgoing-nullable}
  - {defaults: all-incoming}

Use this to disable all relations except the minimal required all-outgoing-not-null:

- relations:
  - {defaults: all-outgoing-not-null}

Setting default relations is useful when using the blacklisting approach. See Disabled incoming relation and Disabled outgoing relation.

Examples:

Example Description
all-outgoing-not-null all-outgoing-not-null
all-outgoing-nullable all-outgoing-nullable
all-incoming all-incoming
all-incoming and all-outgoing-nullable all-incoming and all-outgoing-nullable
everything everything

Includes

Yaml files can be included in each other using the include directive. For example having this in a top level file:

- include basic-tables.yaml
- subject:
  - tables:
    - {table: departments}

and this in another file called basic-tables.yaml

- subject:
  - tables:
    - {table: building_types}
    - {table: something_essential}

is equivalent to:

- subject:
  - tables:
    - {table: building_types}
    - {table: something_essential}
- subject:
  - tables:
    - {table: departments}

Includes can be done anywhere, so that e.g. a common file of relations can be defined and then used in several subjects like so:

- subject:
  - tables:
    - {table: employees, column: name, values: ['John', 'Jane']}
    - include employee-relations.yaml
- subject:
  - tables:
    - {table: departments, column: name, values: ['Research']}
    - include employee-relations.yaml

This is useful in complex databases where several combinations of relations and tables could get combined in several subjects.

Sticky relations

What can quickly happen when doing an extraction with a complicated database schema is an explosion of data. In many of these cases, just enabling a foreign key relationship can pull in lots of unwanted data. An easy solution to prevent this is to make use of the sticky relations. When this flag is set on a relation, then the relation is only processed if there is a direct graph of sticky relations back to a subject. The rules of transmitting stickiness are:

  • Subject’s table’s rows start off being sticky
  • Non-sticky relations are always processed, however the fetched rows aren’t marked sticky. This is the default behavior.
  • A sticky relationship is only processed if the row is sticky
  • Stickiness is only transmitted if a) the row is sticky and b) the relationship is sticky

This behavior can be summarized in a table:

Fetched row sticky Relationship sticky Relationship is processed Processed row sticky
No No Yes No
Yes No Yes No
No Yes No -
Yes Yes Yes Yes

See Sticky Relations for an example.