Column-level lineage
Supported sources
Column-level lineage is supported in all Synq supported data-warehouses and relational databases. In addition to that it also works down to Looker and Tableau.
Visibility into issues
There are two possible when column-level lineage might not be accurate and we clearly indicate that.
Parsing of SQL failed
In case our parser is unable to understand the SQL query, we indicate that both on the Code & Changes
of an asset and on the lineage graph.
Extraction of dependencies failed
In case when SQL contains edge case constructs, we might not correctly resolve how data in the columns flows. To prevent such issues, we do post-validation of our SQL understanding and expose those validation results both in Code & Changes
tab of an asset and on the lineage graph.
Limitations
We do our best to improve support for all different syntaxes and constructs available in different SQL dialects. If any feature needed is not supported please report that to us.
BigQuery STRUCT
literal lineage
We track lineage of individual STRUCT
fields and RECORD
column itself, so lineage will work when using them in CTEs and record simplified lineage as part of single SQL. When going from one asset to another we expose only information about columns and not the RECORD
fields.
Consider this example:
WITH c AS (
SELECT STRUCT (c1 AS a, c2 AS b) AS field FROM `prj`.`ds`.`tbl`),
d AS (SELECT STRUCT (field.a as d) double_nested
FROM c)
SELECT double_nested.d
FROM d
In this case we will correctly track simplified lineage from output column d
to column c1
in prj
.ds
.tbl
.
When we have two assets, one defined as
CREATE VIEW ds.rec_tbl(r) AS
SELECT STRUCT (c1 AS a, c2 AS b) AS r FROM `prj`.`ds`.`tbl`)
Second defined by query
SELECT r.c1 AS r FROM `prj`.`ds`.`rec_tbl`
We show that column r
depends on both c1
and c2
. We might consider changing that behaviour later.