Monday, 11 November 2013

Oracle

     
DROP – delete the table from db

TRUNCATE – delete contents of table

Rownum, Rowid             
Rownum - Virtual
rowid - Physical (used in sub query)       

Functions available in Orc
Rank() function                

Sysdate                System date      
Joins                     
Group by                            
Order by                             
Top 5                    
Sequence use in oracle                

Sample queries –
To  search word containing %
 SELECT col_name FROM tbl_nam WHERE col_name LIKE '%?%%' ESCAPE '?';

SUBSTR returns a specified portion of a string eg SUBSTR('BCDEF',4) output BCDE

INSTR provides character position in which a pattern is found in a string. eg INSTR('ABC-DC-F','-',2) output 7 (2nd occurence of '-')

HAVING clause is used when you want to specify a condition for a group function and it is written after GROUP BY clause.

Number of days between two days - weekends

select sysdate - creation_dttm - 2 * (to_char(sysdate, 'WW') - to_char(creation_dttm, 'WW')) from the_table                    

Compare and contrast TRUNCATE and DELETE for a table.
Both the truncate and delete command have the desired outcome of getting rid of all the rows in a table. The difference between the two is that the truncate command is a DDL operation and just moves the high water mark and produces few rollback data. The delete command, on the other hand, is a DML operation, which will produce rollback data and thus take longer to complete.

Explain the difference between ARCHIVELOG mode and NOARCHIVELOG mode and the benefits and disadvantages to each.
ARCHIVELOG mode is a mode that you can put the database in for creating a backup of all transactions that have occurred in the database so that you can recover to any point in time. NOARCHIVELOG mode is basically the absence of ARCHIVELOG mode and has the disadvantage of not being able to recover to any point in time. NOARCHIVELOG mode does have the advantage of not having to write transactions to an archive log and thus increases the performance of the database slightly.

Explain the difference between $ORACLE_HOME and $ORACLE_BASE.
ORACLE_BASE is the root directory for oracle. ORACLE_HOME located beneath ORACLE_BASE is where the oracle products reside.

How would you determine the time zone under which a database was operating?
SELECT dbtimezone FROM DUAL;
Describe the use of %ROWTYPE and %TYPE in PL/SQL
 %ROWTYPE allows you to associate a variable with an entire table row. The %TYPE associates a variable with a single column type.

What is the fastest query method for a table
Fetch by rowid

Sunday, 10 November 2013

Evaluation sequences for transformer stage

To write efficient Transformer stage derivations, it helps to understand what items get evaluated and when.

The evaluation sequence for a Transformer stage is:
Evaluate each stage variable initial value
For each input row to process:
   Evaluate each stage variable derivation value, unless
   the derivation is empty
   For each output link:
      Evaluate constraint, if true:
         Evaluate each column derivation value
         Write the output row
   Next output link
Next input row
The evaluation sequence for a Transformer stage that has a loop condition defined is:
Evaluate each stage variable initial value
For each input row to process:
   Evaluate each stage variable derivation value (unless empty)
   Evaluate each loop variable initial value
   While the evaluated loop condition is true:
         Evaluate each loop variable derivation value (unless empty)
         For each output link:
            Evaluate constraint, if true:
               Evaluate each column derivation value
               Write the output row
         Next output link
   Loop back to While
Next input row
The stage variables, loop variables, and the columns within a link are evaluated in the order in which they are displayed on the parallel job canvas. Similarly, the output links are also evaluated in the order in which they are displayed.

Examples

Certain constructs are inefficient if they are included in output column derivations, because they are evaluated once for every output column that uses them. The following examples describe these constructs:
The same part of an expression is used in multiple column derivations.
For example, if you want to use the same substring of an input column in multiple columns in output links, you might use the following test in a number of output columns derivations:
IF (DSLINK1.col1[1,3] = "001") THEN ...
In this case, the evaluation of the substring of DSLINK1.col1[1,3] is repeated for each column that uses it. The evaluation can be made more efficient by moving the substring calculation into a stage variable. The substring is then evaluated once for every input row. This example has thus stage variable definition for StageVar1:
DSLINK1.col1[1,3]
Each column derivation starts with this test:
IF (StageVar1 = "001") THEN ...
This example can be improved further by also moving the string comparison into the stage variable. The stage variable is then defined as follows:
IF (DSLink1.col1[1,3] = "001") THEN 1 ELSE 0
Each column derivation starts with this test:
IF (StageVar1) THEN
The improved construct reduces both substring function evaluations and string comparisons.
An expression includes calculated constant values.
For example, a column definition might include a function call that returns a constant value:
Str(" ",20)
This function returns a string of 20 spaces. The function is evaluated every time the column derivation is evaluated. It is more efficient to calculate the constant value once. You can assign an initial value to a stage variable in the Variables tab of the Stage Properties window. The initial value is set to using this expression:
Str(" ", 20)
You do not supply the derivation of the stage variable on the main Transformer page. The initial value of the stage variable is evaluated once, before any input rows are processed. Because the derivation expression of the stage variable is empty, the stage variable is not re-evaluated for each input row. Change any expression that previously used the function Str(" ", 20) to use the stage variable instead.
The same considerations apply to any expression, or part of an expression, that generates a constant value. For example, the following expression concatenates two strings:
"abc" : "def"
The abcdef concatenation is repeated every time the column derivation is evaluated. Since the subpart of the expression is constant, this constant part of the expression can again be moved into a stage variable, using the initial value setting to perform the concatenation once.
An expression requiring a type conversion is used as a constant, or it is used in multiple places
For example, an expression might include the following code:
DSLink1.col1+"1"
In this example, the "1" is a string constant, and so must be converted from a string to an integer each time the expression is evaluated. The solution in this case is to change the constant from a string to an integer:
DSLink1.col1+1
If DSLINK1.col1 is a string field, however, then a conversion is required every time the expression is evaluated. If an input column is used in more than one expression, where it requires the same type conversion in each expression, it is more efficient to use a stage variable to perform the conversion once. You can create, for this example, an integer stage variable, specify its derivation to be DSLINK1.col1, and then use the stage variable in place of DSLink1.col1, where that conversion is required. When you use stage variables to evaluate parts of expressions, you must set the data type of the stage variable correctly for that context. Otherwise, needless conversions are required wherever that variable is used.
The advice on the use of stage variables equally applies to loop variables when you use the Transformer stage looping facilities. You add the evaluation to a stage variable if it is evaluated once per input row, or to a loop variable if it is evaluated once per looped output row when a loop condition was specified.

Transformer Stage

What is Transformer Stage?

DataStage provides several stages to load the data into the data warehouse or data marts. The stages classified into General, Database, Developement and Debug, File, Processing, Real time etc. and the transformer stage is a processing stage.

Local Variables:

You can declare and use your own variables within a Transformer stage. Such variables are accessible only from the Transformer stage in which they declared. Since these variables are local to the transformer, the name local variables.

The uses:

The values assigned by expressions.
They are used in expressions which define an output column derivation.
Expressions evaluating a variable can include other variables or the variable being evaluated itself.

Advantageous of stage variables:

1. Same part of an expression used in multiple column derivations.

For example, IF (DSLINK1.col1 [1, 3] = "001") THEN { statements evaluated here} used in multiple output column derivations, the expression evaluates for each column that uses it. Use the stage variable by moving this substring calculation and the substring evaluates just once for every input row.

2. Where an expression includes calculated constant values.

A column definition might include a function call that returns a constant value, such as: Str(" ",20).This returns a string of 20 spaces. In this case, the function would be evaluated every time the column derivation is evaluated. It is more efficient to calculate the constant value just once for the whole Transformer.

Use the stage variable to use this function. This case, the function would still be evaluated once for every input row.

Assign the initial value from the Stage Properties to the stage variable and leave the derivation empty. This way, the function evaluated only once for whole transformer processing.

3. Where an expression requires a type conversion and used as a constant, or used in multiple places

System variables

WebSphere DataStage provides a set of variables containing useful system information that you can access from an output derivation or constraint.

Name Description:

@FALSE The value replaced with 0.

@TRUE The value replaced with 1.

@INROWNUM Input row counter.

@OUTROWNUM Output row counter (per link).

@NUMPARTITIONS The total number of partitions for the stage.

@PARTITIONNUM The partition number for the particular instance.

Transformer stage: Input page

Partitioning tab:

The Partitioning tab allows you to specify details about how the incoming data partitioned or collected when input to the Transformer stage. It also allows you to specify that the data should be sorted on input.

By default the Transformer stage will attempt to preserve partitioning of incoming data, or use its own partitioning method according to what the previous stage in the job dictates.

In The transformer stage, the stage tab allows to set various properties in the General like Execution mode, Preserve partitioning, Advanced tab, Triggers tab etc.

Thursday, 7 November 2013

Sequential File stage

The Sequential File stage is a file stage. It allows you to read data from or write data to one or more flat files 

Maximum  2gb can be read
Upto 30 columns can be read
------------------------------------------

Input :
Input can be
 - Fixed length file
 - Delimiter file 

Parallelism :

Delimiter file -  
The stage executes in parallel mode by default if reading multiple files but executes sequentially if it is only reading one file.
Number Of readers per node - Can be > 1 to read the file parallel.
Read from multiple nodes -Can be YES if reading multiple files

Fixed length file - 
For fixed-width files, you can configure the stage to behave differently:

*  You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.
*  You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.

Number Of readers per node - Can be > 1 to read the file parallel.
Read from multiple nodes -Can be YES
These two options are mutually exclusive.

-------------------------------------------------------------------------

Important Options:

First Line is Column Names: If set true, the first line of a file contains column names on writing and is ignored on reading.

Keep File Partitions: Set True to partition the read data set according to the organization of the input file(s).

Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.

Number Of readers per node: This is an optional property and only applies to files containing fixed-length records, it is mutually exclusive with the Read from multiple nodes property. Specifies the number of instances of the file read operator on a processing node. The default is one operator per node per input data file. IfnumReaders is greater than one, each instance of the file read operator reads a contiguous range of records from the input file. The starting record location in the file for each operator, or seek location, is determined by the data file size, the record length, and the number of instances of the operator, as specified by numReaders.

The resulting data set contains one partition per instance of the file read operator, as determined by numReaders.

This provides a way of partitioning the data contained in a single file. Each node reads a single file, but the file can be divided according to the number of readers per node, and written to separate partitions. This method can result in better I/O performance on an SMP system.

Shows multiple readers on one node being used to effectively partition a sequential file
Shows multiple readers on one node being used to effectively partition a sequential file

Read from multiple nodes: This is an optional property and only applies to files containing fixed-length records, it is mutually exclusive with the Number of Readers Per Node property. Set this to Yes to allow individual files to be read by several nodes. This can improve performance on a cluster system.

InfoSphere DataStage knows the number of nodes available, and using the fixed length record size, and the actual size of the file to be read, allocates the reader on each node a separate region within the file to process. The regions will be of roughly equal size.

Shows multiple nodes being used top partition a sequential file
Shows multiple nodes being used top partition a sequential file
---------------------------------------------------


"No. Of Nodes per Node" Vs. "Read from Multiple Nodes" properties : 

1. Sequential Files can only be read sequential i.e. 1 file per node. 

2. "Read from Multiple Nodes" property works only when multiple files are read. 

3. A single file can be read in parallel with "No. of readers per Node" set to greater then 1. but only 1 Node can read. 

 There is "multiple readers per node". In this case, if you specify N readers per node for one sequential file, only one node gets used, and each reader on that node reads 1/N of the lines in the file. 


More Details in - http://datastage4u.wordpress.com/2011/04/26/reading-file-using-sequential-file-stage/