[GTALUG] looking for a tool to transform table relationships

Christopher Browne cbbrowne at gmail.com
Fri Jul 20 16:34:17 EDT 2018


On Fri, 20 Jul 2018 at 13:42, Alex Volkov via talk <talk at gtalug.org> wrote:
>
> Hey Everyone,
>
> I'm looking for a tool to transform (possibly migrate) data from one
> postgres db to another.
>
> I have two postgres databases -- old (not normalized) has all the data
> and new (normalized with some major schema changes) has no data.
>
> The new db has more tables and table relationship structured
> differently, some data residing in old db is in text column in csv
> format, whereas in new database it's a separate table, and so on.
>
> I've been thinking of writing a script that would just transform data
> from one format to the other, but before writing a bunch of code that's
> going to be run exactly once, I'm wondering if there a tool out there
> which I can use to express one-way transformation rules for these databases.


I have done this a few times ;-)

The first time, it was pretty much "scripts manipulating text files"...

These days, I seek to get the data out in some sort of "logical form",
where, as much as possible, internal details such as object IDs get
thrown out, and the data then gets loaded into some tables, and then
cleansed and loaded into the target schema.

Often, what happens is that the data files represent data that I
receive from someone else (that used to operate a domain name
registry...), and so the notion of using dblink isn't ever in my head.

But then, in effect, the data flows...

- Loaded into raw_this, and raw_that, and raw_other...

- Some cleansing is done to make sure that invalid data is
fixed or thrown out, giving me tables cleansed_this, cleansed_that,
cleansed_other...

Finally, the data from the "cleansed" tables gets loaded into the
new schema essentially via a series of INSERT statements.

By having that intermediate stage, you have some ability to
audit the data, to know that all the data loaded was in good
form, or to have specific identification of data discarded because
it was bad., or to identify all the data that got modified to
clean it up (e.g. - I have lots of postal addresses, so have to
clean up country codes and the like).

-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


More information about the talk mailing list