Extraction Model¶
Subjects¶
An extraction model consists of one more more subjects. Each subject has its own set of tables and relations. Relations can also be global, which allows setting of global defaults that can be overridden by subjects. By default, any row found in a table in the extraction model is fetched in its entirety.
Tables¶
A table on a subject consists of the following:
- table
- The name of the table
- column
- The name of the column to extract values out of, when used together with
values
- values
- A single number of string or an array of numbers or strings
Examples:
Example | Description |
All departments | A table entry with just a table name will fetch all rows for that table |
One department | A table entry with a single column/value will fetch one row |
Two departments | A table entry with multiple column/values will fetch multiple rows |
Two tables | A table entry with multiple column/values will fetch multiple rows |
Two subjects | Two subjects with one table each |
Relations¶
A relation enables or disables the processing of a foreign key in the database schema. A relationship is incoming
or outgoing
as seen from the perspective of a subject. All outgoing
not null foreign keys must be processed to satisfy the foreign key constraint. This type of relationship is therefore always enabled and cannot be disabled,
A relation can be applied globally or to a subject. A global relation is always included in all subjects. Relations in a subject are only processed on rows related to the subject. See Extraction for more information.
- defaults
- Add all relations from a couple of selected types. See defaults for more details.
- table
- A foreign key constraint is identified by specifying a
table
andcolumn
in a relation. The first foreign key relationship to match the table and column is used. Compound foreign keys are fully supported, but can only be identified by a specifying a single column.- column
- Must be specified when using
table
to identify a foreign key.- type
- One of
incoming
oroutgoing
, withincoming
the default. This identifies the direction of a relationship from the perspective of a subject.- name
- Optional and purely for informational purposes.
- disabled
- Foreign key relations can be disabled. This is useful in the blacklisting approach where the
everything
default is used and then relations disabled.- sticky
- Sticky relations can be used to keep track of which rows are directly connected to the subject. See sticky relations for more details.
Compound keys are also supported, see e.g. Compound Foreign Keys
A relationship is uniquely identified by its table
, column
, type
and name
. Identical relationships are processed in order and merged according to the following rules:
- If any relation is disabled, then the relation is disabled and not processed.
- If any relation is sticky, then the relation is sticky.
Examples:
Example | Description |
Default relations for a department | Default relations for a department |
Incoming Relation | Incoming Relation |
Relation for two departments 1 | Default relations for two departments |
Relation for two departments 2 | Alternative default relations for two departments |
Relation for two departments 3 | Another alternative default relations for two departments |
Relation for an employee | All relations |
Outgoing relation | Outgoing relation |
Disabled incoming relation | Blacklisting approach with a disabled incoming relation |
Disabled outgoing relation | Blacklisting approach with a disabled outgoing relation |
Defaults¶
Default relations can be set by using the relations
default
key. There are four default settings that can be combined in an additive way:
Setting | Default | Meaning |
all-outgoing-not-null | yes | Always satisfy not null foreign key constraints |
all-outgoing-nullable | yes | Ensures that complete rows are fetched |
all-incoming | no | Process incoming foreign keys |
everything | no | All of the above |
If no defaults are specified, a single relation of type all-outgoing-nullable
is used. The all-outgoing-not-null
default is always present. The combination of these two ensures that whenever a row is encountered, all outgoing foreign keys are processed. This causes rows referenced by the foreign key to be fetched.
This is the default setting:
- relations:
- {defaults: all-outgoing-not-null}
- {defaults: all-outgoing-nullable}
To add all incoming relations to the default, use:
- relations:
- {defaults: everything}
Since all-outgoing-not-null
is always included implicitly, the above is equivalent to:
- relations:
- {defaults: all-outgoing-nullable}
- {defaults: all-incoming}
Use this to disable all relations except the minimal required all-outgoing-not-null
:
- relations:
- {defaults: all-outgoing-not-null}
Setting default relations is useful when using the blacklisting approach. See Disabled incoming relation and Disabled outgoing relation.
Examples:
Example | Description |
all-outgoing-not-null | all-outgoing-not-null |
all-outgoing-nullable | all-outgoing-nullable |
all-incoming | all-incoming |
all-incoming and all-outgoing-nullable | all-incoming and all-outgoing-nullable |
everything | everything |
Includes¶
Yaml files can be included in each other using the include
directive. For example having this in a top level file:
- include basic-tables.yaml
- subject:
- tables:
- {table: departments}
and this in another file called basic-tables.yaml
- subject:
- tables:
- {table: building_types}
- {table: something_essential}
is equivalent to:
- subject:
- tables:
- {table: building_types}
- {table: something_essential}
- subject:
- tables:
- {table: departments}
Includes can be done anywhere, so that e.g. a common file of relations can be defined and then used in several subjects like so:
- subject:
- tables:
- {table: employees, column: name, values: ['John', 'Jane']}
- include employee-relations.yaml
- subject:
- tables:
- {table: departments, column: name, values: ['Research']}
- include employee-relations.yaml
This is useful in complex databases where several combinations of relations and tables could get combined in several subjects.
Sticky relations¶
What can quickly happen when doing an extraction with a complicated database schema is an explosion of data. In many of these cases, just enabling a foreign key relationship can pull in lots of unwanted data. An easy solution to prevent this is to make use of the sticky
relations. When this flag is set on a relation, then the relation is only processed if there is a direct graph of sticky relations back to a subject. The rules of transmitting stickiness are:
- Subject’s table’s rows start off being sticky
- Non-sticky relations are always processed, however the fetched rows aren’t marked sticky. This is the default behavior.
- A sticky relationship is only processed if the row is sticky
- Stickiness is only transmitted if a) the row is sticky and b) the relationship is sticky
This behavior can be summarized in a table:
Fetched row sticky | Relationship sticky | Relationship is processed | Processed row sticky |
No | No | Yes | No |
Yes | No | Yes | No |
No | Yes | No | - |
Yes | Yes | Yes | Yes |
See Sticky Relations for an example.