Channel: SCN: Message List

Re: Approach on loading multiple data sources sequentially within one job


From my document on Table_Comparison:

Sorted Input:

Often the most efficient solution when dealing with large data sources, because DS reads the comparison table only once. This option can only be selected when it is guaranteed that the incoming data are sorted in exactly the same order as the primary key in the comparison table. In most cases incoming data must be pre-sorted, e.g. using a Query transform with an Order-by (that may be pushed down to the underlying database), to take advantage of this functionality.
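To illustrate why sorted input lets the comparison table be read only once, here is a minimal Python sketch of a merge-style comparison. It is not how DS implements Table_Comparison internally; the function name `compare_sorted`, the row layout and the `INSERT`/`UPDATE` labels are assumptions made for the example. It also assumes both inputs are sorted ascending on the same key and that keys are unique.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]


def compare_sorted(source: Iterable[Row], target: Iterable[Row],
                   key: str) -> Iterator[Tuple[str, Row]]:
    """Compare two inputs sorted ascending by `key` in a single pass.

    Hypothetical helper: yields ("INSERT", row) for source rows missing
    from the target and ("UPDATE", row) for rows whose content differs.
    """
    src_iter, tgt_iter = iter(source), iter(target)
    src, tgt = next(src_iter, None), next(tgt_iter, None)
    while src is not None:
        # Advance the target cursor past keys smaller than the source key.
        # Each target row is visited at most once, so no cache is needed.
        while tgt is not None and tgt[key] < src[key]:
            tgt = next(tgt_iter, None)
        if tgt is None or tgt[key] > src[key]:
            yield ("INSERT", src)   # key not present in the target
        elif tgt != src:
            yield ("UPDATE", src)   # same key, different column values
        src = next(src_iter, None)


target_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 4, "v": "d"}]
source_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
changes = list(compare_sorted(source_rows, target_rows, "id"))
```

If either side arrives unsorted, the cursor logic above breaks down, which is why a sort (and the memory to hold it) is needed before the comparison can start.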

 

From the log, it seems:

  1. You don't have a primary key in the comparison table. DS therefore has to sort it first, caching all of its rows in memory, which is very time-consuming.
  2. Your source is a content extractor, isn't it? It doesn't deliver the data in the correct order, so DS has to perform another sort operation.

 

Stage your input data in a database table with the correct primary key (you can use the Data_Transfer transform for that purpose) and make sure your comparison table has the same primary key column(s). Your job will terminate in seconds.

