[GTALUG] looking for a tool to transform table relationships

Alex Volkov alex at flamy.ca
Fri Jul 20 18:17:52 EDT 2018


Hey Chris,

Thank you for your reply, this is the way to do this task properly. In 
my case I don't need to clean any data, it already comes validated from 
the db, so I might get away with using dblink and some functions, if 
that fails, I'll do dump/restore into temporary tables.

So far I figured out the right package that's has dblink.so (used to be 
dblink.sql), then I activated it from under postgres user with CREATE 
EXTENSON dblink;
Now I figured out the syntax of dblink command that allows me to connect 
via socket, it's --  FROM dblink('dbname=fmcc host=/var/run/postgresql 
port=5432 password=test'...

The only problem is I'm getting this interesting error, for some reason 
dblink *really wants to do password verification*

ERROR:  password is required
DETAIL:  Non-superuser cannot connect if the server does not request a 
password.
HINT:  Target server's authentication method must be changed.
********** Error **********

ERROR: password is required
SQL state: 2F003
Detail: Non-superuser cannot connect if the server does not request a 
password.
Hint: Target server's authentication method must be changed.

Alex.

On 2018-07-20 04:34 PM, Christopher Browne wrote:
> On Fri, 20 Jul 2018 at 13:42, Alex Volkov via talk <talk at gtalug.org> wrote:
>> Hey Everyone,
>>
>> I'm looking for a tool to transform (possibly migrate) data from one
>> postgres db to another.
>>
>> I have two postgres databases -- old (not normalized) has all the data
>> and new (normalized with some major schema changes) has no data.
>>
>> The new db has more tables and table relationship structured
>> differently, some data residing in old db is in text column in csv
>> format, whereas in new database it's a separate table, and so on.
>>
>> I've been thinking of writing a script that would just transform data
>> from one format to the other, but before writing a bunch of code that's
>> going to be run exactly once, I'm wondering if there a tool out there
>> which I can use to express one-way transformation rules for these databases.
>
> I have done this a few times ;-)
>
> The first time, it was pretty much "scripts manipulating text files"...
>
> These days, I seek to get the data out in some sort of "logical form",
> where, as much as possible, internal details such as object IDs get
> thrown out, and the data then gets loaded into some tables, and then
> cleansed and loaded into the target schema.
>
> Often, what happens is that the data files represent data that I
> receive from someone else (that used to operate a domain name
> registry...), and so the notion of using dblink isn't ever in my head.
>
> But then, in effect, the data flows...
>
> - Loaded into raw_this, and raw_that, and raw_other...
>
> - Some cleansing is done to make sure that invalid data is
> fixed or thrown out, giving me tables cleansed_this, cleansed_that,
> cleansed_other...
>
> Finally, the data from the "cleansed" tables gets loaded into the
> new schema essentially via a series of INSERT statements.
>
> By having that intermediate stage, you have some ability to
> audit the data, to know that all the data loaded was in good
> form, or to have specific identification of data discarded because
> it was bad., or to identify all the data that got modified to
> clean it up (e.g. - I have lots of postal addresses, so have to
> clean up country codes and the like).
>



More information about the talk mailing list