Read CSV files from AWS S3

Cédric BLANC

started a topic about 3 years ago

I am new to Semarchy and I am working on what will be my first Continuous Integration job. Can a job be set up to read CSV files directly from AWS S3 buckets?

Best Answer

Cédric BLANC said about 3 years ago

Here's the key idea: to make a continuous load work, you need to insert data into an SD or SA table. xDM won't pull the data itself.

So a typical example looks something like this:

insert into my_mdm_schema.sd_contact (
    b_loadid
  , b_classname
  , b_pubid
  , b_sourceid
  , attribute1
  , ...
)
select
    my_repo_schema.get_continuous_loadid('CONTACT') as b_loadid
  , 'Contact' /* case sensitive! */
  , 'SAP' as b_pubid
  , id as b_sourceid
  , att1
  , ...
from my_stg_schema.contact ;

That SQL is simple, but it's run outside of xDM by a data integration technology (or just by manual SQL).

So your select portion could indeed come directly from a CSV file in S3 rather than from the staging schema in my example. That's a nice S3 feature. But it would need to be the data integration technology calling it rather than xDM.
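To illustrate what "the data integration technology calling it" could look like, here is a minimal Python sketch, not an official Semarchy method. It assumes a PostgreSQL-style database reached through a DB-API driver that uses `%s` placeholders, a CSV file with `id` and `att1` columns, and boto3 for the S3 download. The helper names (`csv_to_sd_rows`, `fetch_csv_from_s3`, `load_contacts`) are hypothetical, and only `attribute1` is mapped since the original column list is elided.

```python
import csv
import io

# Same target table and columns as the SQL example above.
# The %s placeholder style is an assumption (e.g. psycopg2); other drivers differ.
INSERT_SQL = (
    "insert into my_mdm_schema.sd_contact "
    "(b_loadid, b_classname, b_pubid, b_sourceid, attribute1) "
    "values (%s, %s, %s, %s, %s)"
)


def csv_to_sd_rows(csv_text, load_id, publisher="SAP", class_name="Contact"):
    """Turn CSV text (assumed columns: id, att1) into tuples for INSERT_SQL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (load_id, class_name, publisher, row["id"], row["att1"])
        for row in reader
    ]


def fetch_csv_from_s3(bucket, key):
    """Download one CSV object from S3 as text (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return body.read().decode("utf-8")


def load_contacts(conn, csv_text):
    """Insert the CSV rows into the SD table using an open DB-API connection."""
    cur = conn.cursor()
    try:
        # Assumption: the load ID function can be called with a plain SELECT.
        cur.execute("select my_repo_schema.get_continuous_loadid('CONTACT')")
        load_id = cur.fetchone()[0]
        cur.executemany(INSERT_SQL, csv_to_sd_rows(csv_text, load_id))
        conn.commit()
    finally:
        cur.close()
```

A scheduler or integration tool would then call something like `load_contacts(conn, fetch_csv_from_s3("my-bucket", "contacts.csv"))`, where the bucket and key names are placeholders. xDM picks the rows up once they are in the SD table.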
