DS: Sequential File stage

The Sequential File stage is a file stage. It allows you to read data from or write data to one or more flat files.

A maximum of 2 GB can be read.
Up to 30 columns can be read.
------------------------------------------

Input:
Input can be
- Fixed-length file
- Delimited file
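
To make the difference between the two input types concrete, here is a minimal sketch in plain Python (not DataStage code) of how one record would be parsed in each format. The column names, the column widths and the comma delimiter are assumptions chosen only for illustration.

```python
# Illustration only: plain Python, not DataStage.
# Column names, widths and the delimiter are hypothetical.

FIXED_LAYOUT = [("id", 4), ("name", 10), ("city", 8)]  # (column, width in characters)


def parse_fixed(record, layout=FIXED_LAYOUT):
    """Slice a fixed-length record into columns by position."""
    row, pos = {}, 0
    for column, width in layout:
        row[column] = record[pos:pos + width].strip()
        pos += width
    return row


def parse_delimited(record, columns=("id", "name", "city"), delimiter=","):
    """Split a delimited record into columns on the delimiter."""
    return dict(zip(columns, record.split(delimiter)))


# Build a 22-character fixed-length record by padding each field to its width.
fixed_record = "0001" + "Alice".ljust(10) + "London".ljust(8)
fixed_row = parse_fixed(fixed_record)
delimited_row = parse_delimited("0001,Alice,London")
print(fixed_row)
print(delimited_row)
```

Because every fixed-length record has the same size, the position of any record in the file can be computed without scanning, which is what makes the reader options described later possible.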

Parallelism:

Delimited file -
The stage executes in parallel mode by default if reading multiple files, but executes sequentially if it is reading only one file.
Number Of readers per node - Can be > 1 to read the file in parallel.
Read from multiple nodes - Can be YES if reading multiple files.

Fixed-length file -
For fixed-length files, you can configure the stage to behave differently:

* You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.
* You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.

Number Of readers per node - Can be > 1 to read the file in parallel.
Read from multiple nodes - Can be YES.
These two options are mutually exclusive.

-------------------------------------------------------------------------

Important Options:

First Line is Column Names: If set to True, the first line of the file contains column names on writing and is ignored on reading.

Keep File Partitions: Set True to partition the read data set according to the organization of the input file(s).

Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.
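
The three Reject Mode settings can be sketched in plain Python as follows. This is only an illustration of the behaviour described above, not how the stage is implemented. The function name `read_rows`, the rejection rule (wrong number of columns) and the sample data are assumptions.

```python
# Illustration only: plain Python, not DataStage.
# The mode names mirror the Reject Mode options; everything else is hypothetical.

def read_rows(lines, expected_columns, reject_mode="Continue", delimiter=","):
    """Return (good_rows, rejected_lines) for a list of delimited lines."""
    good, rejects = [], []
    for line in lines:
        fields = line.split(delimiter)
        if len(fields) == expected_columns:
            good.append(fields)
        elif reject_mode == "Fail":
            # Fail: stop as soon as any row is rejected.
            raise ValueError(f"rejected row: {line!r}")
        elif reject_mode == "Output":
            # Output: this list stands in for the reject link.
            rejects.append(line)
        # Continue: the rejected row is simply discarded.
    return good, rejects


lines = ["1,Alice,London", "2,Bob", "3,Carol,Paris"]
good_continue, rejects_continue = read_rows(lines, 3, "Continue")
good_output, rejects_output = read_rows(lines, 3, "Output")
print(good_continue, rejects_continue)
print(good_output, rejects_output)
```

With "Continue" the bad row disappears without trace, so "Output" is the safer choice whenever rejected rows need to be inspected later.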

Number Of readers per node: This is an optional property that applies only to files containing fixed-length records. It is mutually exclusive with the Read from multiple nodes property. It specifies the number of instances of the file read operator on a processing node. The default is one operator per node per input data file. If numReaders is greater than one, each instance of the file read operator reads a contiguous range of records from the input file. The starting record location in the file for each operator, or seek location, is determined by the data file size, the record length, and the number of instances of the operator, as specified by numReaders.

The resulting data set contains one partition per instance of the file read operator, as determined by numReaders.

This provides a way of partitioning the data contained in a single file. Each node reads a single file, but the file can be divided according to the number of readers per node, and written to separate partitions. This method can result in better I/O performance on an SMP system.
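
The seek-location arithmetic described above can be sketched in plain Python. The source does not state exactly how DataStage distributes leftover records, so the near-even split used here is an assumption, and `reader_ranges` is a hypothetical helper, not a DataStage function.

```python
# Illustration only: plain Python, not DataStage.
# The near-even split of records across readers is an assumption.

def reader_ranges(file_size, record_length, num_readers):
    """Return one (seek_offset_in_bytes, record_count) pair per reader."""
    if file_size % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    total_records = file_size // record_length
    base, extra = divmod(total_records, num_readers)
    ranges, start_record = [], 0
    for reader in range(num_readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if reader < extra else 0)
        ranges.append((start_record * record_length, count))
        start_record += count
    return ranges


even = reader_ranges(file_size=1000, record_length=10, num_readers=4)
uneven = reader_ranges(file_size=1000, record_length=10, num_readers=3)
print(even)
print(uneven)
```

Each pair corresponds to one partition of the resulting data set. This calculation only works because every record has the same length, which is why the property is limited to fixed-length files.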

Shows multiple readers on one node being used to effectively partition a sequential file

Read from multiple nodes: This is an optional property that applies only to files containing fixed-length records. It is mutually exclusive with the Number of Readers Per Node property. Set this to Yes to allow individual files to be read by several nodes. This can improve performance on a cluster system.

InfoSphere DataStage knows the number of nodes available and, using the fixed-length record size and the actual size of the file to be read, allocates the reader on each node a separate region within the file to process. The regions will be of roughly equal size.
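
As a rough illustration of region-based reading, the plain Python sketch below lets each "node" seek to its own region of a fixed-length file and read only the records in it. The record length, the number of nodes, the sample data and the way region boundaries are rounded are all assumptions. An in-memory stream stands in for the real file.

```python
# Illustration only: plain Python, not DataStage.
# Record length, node count and boundary rounding are hypothetical.
import io

RECORD_LENGTH = 8  # assumed fixed record length in bytes


def read_region(stream, node_index, node_count, file_size, record_length=RECORD_LENGTH):
    """Read the records that fall in one node's region of the file."""
    total_records = file_size // record_length
    first = total_records * node_index // node_count
    last = total_records * (node_index + 1) // node_count
    stream.seek(first * record_length)  # jump straight to the region start
    data = stream.read((last - first) * record_length)
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


# Ten 8-byte records: REC00000 .. REC00009
records = [f"REC{n:05d}".encode("ascii") for n in range(10)]
payload = b"".join(records)

# Each node opens its own handle on the same data, as separate readers would.
partitions = [read_region(io.BytesIO(payload), node, 3, len(payload)) for node in range(3)]
print([len(p) for p in partitions])
```

The regions do not overlap and together cover the whole file, so concatenating the partitions in node order gives back the original record sequence.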

Shows multiple nodes being used to partition a sequential file

---------------------------------------------------

"No. of Readers per Node" vs. "Read from Multiple Nodes" properties:

1. Sequential files can only be read sequentially, i.e. 1 file per node.

2. The "Read from Multiple Nodes" property works only when multiple files are read.

3. A single file can be read in parallel with "No. of Readers per Node" set to greater than 1, but only 1 node can read it.

With "multiple readers per node", if you specify N readers per node for one sequential file, only one node gets used, and each reader on that node reads 1/N of the lines in the file.

More details: http://datastage4u.wordpress.com/2011/04/26/reading-file-using-sequential-file-stage/

Thursday, 7 November 2013
