DS

Saturday, 8 February 2014

ICONV, OCONV

ICONV and OCONV functions are quite often used to handle data in Datastage.
ICONV converts a string to an internal storage format and OCONV converts an expression to an output format.

Syntax:
Iconv (string, conversion code)
Oconv(expression, conversion )

<Karthi> This function available only in server job, not in parallel jobs.

ICONV

The following example show date conversions from external to internal format. All of these ICONV functions return the internal date 14143:

DateConversions:
  PRINT ICONV("20 SEP 2006","D")
  PRINT ICONV("09-20-2006","D")
  PRINT ICONV("09/20/2006","D")

The following example shows time conversions from external to internal format:

TimeConversions:
  PRINT ICONV("13:21","MT");       ! Returns 48060
  PRINT ICONV("1:21PM","MT");      ! Returns 48060
  PRINT ICONV("13:21:01","MT");    ! Returns 48061
  PRINT ICONV("13:21:01.65","MT"); ! Returns 48061.65

OCONV

The following example shows date conversions:

DateConversions:
  ! Month Abbreviation Formats:
  PRINT OCONV(0,"D");       ! "31 DEC 1967"
  PRINT OCONV(14100,"D");   ! "08 AUG 2006"
  PRINT OCONV(14100,"D2");  ! "08 AUG 06"
  PRINT OCONV(DATE(),"D");  ! current date in above format
  PRINT OCONV(@DATE,"D");   ! current date in above format
  PRINT OCONV(14120,"D-")   ! "08-28-2006"
  PRINT OCONV(14120,"D/")   ! "08/28/2006"
  PRINT OCONV(14120,"DE")   ! "28/08/2006"
  PRINT OCONV(14120,"D2/")  ! "08/28/06"
  PRINT OCONV(14120,"D2-E") ! "28-08-06"

The following example shows time conversions:

TimeConversions:
  PRINT OCONV(0,"MT")
  PRINT OCONV(TIME(),"MT")
  PRINT OCONV(TIME(),"MTH")
  PRINT OCONV(TIME(),"MTS")
  PRINT OCONV(TIME(),"MTS.")
  PRINT OCONV(TIME(),"MTHS*")

{
node “node1″
{
fastname “SVR1″
pools “”
resource disk “C:/IBM/InformationServer/Server/Datasets/Node1″ {pools “”}
resource scratchdisk “C:/IBM/InformationServer/Server/Scratch/Node1″ {pools “}
}
node “node2″
{
fastname “SVR1″
pools “”
resource disk “C:/IBM/InformationServer/Server/Datasets/Node1″ {pools “”}
resource scratchdisk “C:/IBM/InformationServer/Server/Scratch/Node1″ {pools “”}
}
node “node3″
{
fastname “SVR2″
pools “” “sort”
resource disk “C:/IBM/InformationServer/Server/Datasets/Node1″ {pools “”}
resource scratchdisk “C:/IBM/InformationServer/Server/Scratch/Node1″ {pools ”" }
}

}

This is a 3 node configuration file. Lets go through the basic entries and what it represents.

Fastname – This refers to the node name on a fast network. From this we can imply that the nodes node1 and node2 are on the same physical node. However if we look at node3 we can see that it is on a different physical node (identified by SVR2). So basically in node1 and node2 , all the resources are shared. This means that the disk and scratch disk specified is actually shared between those two logical nodes. Node3 on the other hand has its own disk and scratch disk space.

Pools – Pools allow us to associate different processing nodes based on their functions and characteristics. So if you see an entry other entry like “node0” or other reserved node pools like “sort”,”db2”,etc.. Then it means that this node is part of the specified pool. A node will be by default associated to the default pool which is indicated by “”. Now if you look at node3 can see that this node is associated to the sort pool. This will ensure that that the sort stage will run only on nodes part of the sort pool.

Resource disk - This will specify Specifies the location on your server where the processing node will write all the data set files. As you might know when Datastage creates a dataset, the file you see will not contain the actual data. The dataset file will actually point to the place where the actual data is stored. Now where the dataset data is stored is specified in this line.

Resource scratchdisk – The location of temporary files created during Datastage processes, like lookups and sorts will be specified here. If the node is part of the sort pool then the scratch disk can also be made part of the sort scratch disk pool. This will ensure that the temporary files created during sort are stored only in this location. If such a pool is not specified then Datastage determines if there are any scratch disk resources that belong to the default scratch disk pool on the nodes that sort is specified to run on. If this is the case then this space will be used.

SMP, MPP

SMP is the Symmetric Multi Processing in which some hardware resources are shared symmetrically among processors. The Processors communicate via shared memory and have single Operating system.
A symmetric multiprocessor system is a multiprocessor system with centralized shared memory called main memory . . . operating under a single operating system with two or more homogeneous processors--ie, it is not a heterogeneous computing system.

MPP (massively parallel processing) is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory . Typically, MPP processors communicate using some messaging interface.

In terms of software, there are two different types of database software: symmetric multiprocessing (SMP) and massively parallel processing (MPP).
An SMP database system is a database system that runs on one or more machines with several identical processors sharing the same disk storage. When an SMP database system runs on more than one machine, it is called a clustered configuration. The database is physically located in a single disk storage system.
Examples of SMP database systems are SQL Server, Oracle, DB/2, Informix, and Sybase.
An MPP database system is a database system that runs on more than one machine where each machine has its own disk storage. The database is physically located in several disk storage systems that are interconnected to each other. An MPP database system is also known as a parallel database system. Examples of MPP database systems are Teradata, Neoview, Netezza, and DATAllegro.

The machines in SMP and MPP database systems are called nodes.
An MPP database system is faster and more scalable than an SMP database system. In an MPP database system, a table is physically located in several nodes, each with its own storage

Tuesday, 4 February 2014

SQL Query

Query to display middle records drop first 5 last 5 records in emp table

select * from emp where rownum<=(select count(*)-5 from emp) - select * from emp where rownum<=5;

Query to display first N records

select * from(select * from emp order by rowid) where rownum<=&n;

Query to display odd records only?

Q). select * from emp where (rowid,1) in (select rowid,mod (rownum,2) from emp);

Query to display even records only?

Q.) select * from emp where (rowid,0) in (select rowid,mod (rownum,2) from emp);

How to display duplicate rows in a table?

Q). select * from emp where deptno=any

(select deptno from emp having count(deptno)>1 group by deptno);

Query to display 3rd highest and 3rd lowest salary?

Q). select * from emp e1 where 3=(select count(distinct sal) from emp e2 where e1.sal<=e2.sal)

union

select * from emp e3 where 3=(select count(distinct sal) from emp e4 where e3.sal>=e4.sal);

Query to display Nth record from the table?

Q). select * from emp where rownum<=&n minus select * from emp where rownum<&n;

Query to display the records from M to N;

Q.) select ename from emp group by rownum,ename having rownum>1 and rownum<6;

select deptno,ename,sal from emp where rowid in(select rowid from emp

where rownum<=7 minus select rowid from emp where rownum<4);

select * from emp where rownum<=7 minus select * from emp where rownum<5;

Query to delete the duplicate records?

Q). delete from dup where rowid not in(select max(rowid)from dup group by eno);

Query to display the duplicate records?

Q). select * from dup where rowid not in(select max(rowid)from dup group by eno);

Query for joining two tables(OUTER JOIN)?

Q). select e.ename,d.deptno from emp e,dept d where e.deptno(+)=d.deptno order by e.deptno;

select empno,ename,sal,dept.* from emp full outer join dept on emp.deptno=dept.deptno;

Right Outer Join:

select empno,ename,sal,dept.* from emp right outer join dept on emp.deptno=dept.deptno;

Left Outer Join:

select empno,ename,sal,dept.* from emp left outer join dept on emp.deptno=dept.deptno

Query for joining table it self(SELF JOIN)?

Q). select e.ename “employee name”,e1.ename “manger name” from emp e,emp e1 where e.mgr=e1.empno;

Query for combining two tables(INNER JOIN)?

select emp.empno,emp.ename,dept.deptno from emp,dept where emp.deptno=dept.deptno;

By using aliases:

select e.empno,e.ename,d.deptno from emp e,dept d where e.deptno=d.deptno;

select empno,ename,sal,dept.* from emp join dept on emp.deptno=dept.deptno:

Find the particular employee salary?

for maximum:

select * from emp where sal in(select min(sal)from

(select sal from emp group by sal order by sal desc)

where rownum<=&n);

select * from emp a where &n=(select count(distinct(sal)) from emp b where a.sal<=b.sal);

for minimum:

select * from emp where sal in(select max(sal) from(select sal from emp group by sal order by sal asc) where rownum<=&n);

select * from emp a where &n=(select count(distinct(sal)) from emp b where a.sal>=b.sal)

Find the lowest 5 employee salaries?

Q). select * from (select * from emp order by sal asc) where rownum<6;

Find the top 5 employee salaries queries

select * from (select * from emp order by sal desc) where rownum<6;

Find lowest salary queries

select * from emp where sal=(select min(sal) from emp);

Find highest salary queries

select * from emp where sal=(select max(sal) from emp);

Monday, 3 February 2014

difference between join lookup merge

http://mydatastage-notes.blogspot.in/2013/03/difference-between-joinlookup-and-merge.html

Wednesday, 29 January 2014

configuring DB2 database connectivity in a parallel environment in DataStage

Configure DataStage to run on all processing nodes that function as DB2 servers.

Steps

To configure InfoSphere DataStage to access IBM DB2 servers, include a node definition for each IBM DB2 server node in the configuration file. Also, complete the following steps.

Running the IBM DB2 configuration script
Use the DB2 configuration script to configure access to IBM DB2 databases.

Procedure
From the Linux or UNIX command line, run the $APT_ORCHHOME/bin/db2setup.sh script once for each IBM DB2 database that users will access. Pass the database name as an argument.

For example, the following command calls db2setup.sh to configure access to the database db2db:

db2setup.sh db2db

Granting user privileges to run IBM DB2 Enterprise stage jobs
You must grant privileges to each user who runs jobs that contain the DB2 Enterprise stage.

Procedure
Run the $APT_ORCHHOME/bin/db2grant.sh script once for each user. Use the following syntax:

db2grant.sh database_name user_name

Accessing a remote IBM DB2 server from IBM DB2 Enterprise stages
You use an DB2 client to connect from an IBM DB2 Enterprise stage to a remote IBM DB2 server.

Thanks - http://datastage4you.blogspot.in/2012/08/configuring-db2-database-connectivity.html#.Uuf9QhC6bIU

DataStage Architecture

DataStage Architecture

This is the info as per my knowladge.

What is the architecture of data stage?

Architecture of DS is client/server architecture.

We have different types of client /server architecture for DataStage starting from the different versions.The latest version is DataStage 8.7

http://www-01.ibm.com/support/docview.wss?uid=swg27008803

1. Datastage 7.5 (7.5.1 or 7.5.2) version-standalone

DataStage 7.5 version was a standalone version where DataStage engine, service and repository (metadata) was all installed in once server and client was installed in local PC and access the servers using the ds-client. Here the users are created in Unix/windows DataStage server and was added to the dstage group (dsadm is the owner of the DataStage and dstage is the group of that.)To give access to the new user just create new Unix/windows user in the DS-server and add them to dstage group. The will have access to the DataStage server from the client.

Client components & server components

Client components are 4 types they are

Data stage designer
Data stage administrator
Data stage director
Data stage manager

Data stage designer is user for to design the jobs. All the DataStage development activities are done here. For a DataStage developer he should know this part very well.

Data stage manager is used for to import & export the project to view & edit the contents of the repository. This is handled by DataStage operator/administrator

Data stage administrator is used for creating the project, deleting the project & setting the environment variables. This is handled by DataStage administrator

Data stage director is use for to run the jobs, validate the jobs, scheduling the jobs. This is handled by DataStage developer/operator

Server components

DS server: runs executable server jobs, under the control of the DS director, that extract,transform, and load data into a DWH.

DS Package installer: A user interface used to install packaged DS jobs and plug-in;

Repository or project: a central store that contains all the information required to build DWH or data mart.

More reference on DataStage 7.5

ftp://ftp.software.ibm.com/software/data/db2imstools/db2tools/pdf/d...

http://it.toolbox.com/wiki/index.php/DataStage_Enterprise_Edition

http://etl-tools.info/infosphere-datastage-ee.htm

http://h71028.www7.hp.com/enterprise/downloads/DataStage%20Product%...

http://it.toolbox.com/blogs/infosphere/new-release-datastage-753-th...

2.Datastage 8.0 (8.1 and 8.5)version-standalone

DataStage 8 version was a standalone version where DataStage engine and service are in DataStage server but the Database part repository (metadata) was installed in Oracle/DB2 Database server and client was installed in local PC and accesses the servers using the ds-client.

Metadata (Repository): This will be created as one database and will have 2 schemas (xmeta and isuser).This can be made as RAC DB (Active/Active in 2 servers, if any one DB failed means the other will be switch over without connection lost of the DataStage jobs running) where

xmeta :will have information about the project and DataStage software
iauser: will have information about the user of DataStage in IIS or webconsole

Note: we can install 2 or 3 DataStage instance in the same server like ds-8.0 or ds-8.1 or ds-8.5 and bring up any version whenever we want to work on that. This will reduce the hardware cost. But only one instance can be up and running.

The DataStage 8 was also a standalone version but here the 3 components were introduced defiantly.

1.information server(IIS)- isadmin

2.websphere server- wasadmin

3. Datastage server- dsadm

1. The IIS also called as DataStage webconsole was introduced where in which it will have all the user information of the DataStage. This is general accessed in web browser and don’t need and DataStage software installation.

After the DataStage installation. The IIS or webconsole will be generated and will have isadmin as administrator to mange this web console. once we login into the web console using isadmin we need to map the “dsadm” user in the engine credentials”(dsadm is the unix/windows user created in the datastage server with dstage group).Then after the mapping the new users will be created in the same user components(note:The users “xxx” created are internally tagged to dsadm mapped user which internally making connecting between unix datastage server and IIS webconsole.All the files/project ..etc created using “xxx” will be owned by “dsadm” user in the unix server)

We can restrict the “xxx” users here to access 1 or 2 projects.

http://www-01.ibm.com/support/docview.wss?uid=swg27009428&aid=1

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r1/index.jsp

https://www-304.ibm.com/support/docview.wss?uid=swg27013419

http://it.toolbox.com/blogs/infosphere/user-and-group-security-for-...

http://mayurdsguru.files.wordpress.com/2010/12/datastage_admin.pdf

Client components & server components

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r0/index.jsp?to...

Client components are

Data stage designer
Data stage administrator
Data stage director
IBM import export manager
Webconsole
IBM infosphere DataStage and Qualitystage multi-client manager
Others I have not come across J

Data stage designer is user for to design the jobs. All the DataStage development activities are done here. For a DataStage developer he should know this part very well.

Data stage administrator is used for creating the project, deleting the project & setting the environment variables. This is handled by DataStage administrator

Data stage director is use for to run the jobs, validate the jobs, scheduling the jobs. This is handled by DataStage developer/operator

IBM import export manager is used for to import & export the project to view & edit the contents of the repository. This is handled by DataStage operator/administrator

Webconsole is use for to create the datastage users and do the administration .This is handled by DataStage administrator

Multi-client manager is use for to install multipal client like ds-7.5,ds-8.1 or ds-8.5 in the local pc and can swap to any version when it is required. This is used by DataStage developer/operator/administrator/all

Server components:

-- IBM InfoSphere Blueprint Director
-- IBM InfoSphere Business Glossary
-- IBM InfoSphere DataStage
-- IBM InfoSphere FastTrack
-- IBM InfoSphere Information Analyzer
-- IBM InfoSphere Information Services Director
-- IBM InfoSphere Metadata Server
-- IBM InfoSphere Metadata Workbench
-- IBM InfoSphere QualityStage

http://www-01.ibm.com/support/docview.wss?uid=swg27016910

http://it.toolbox.com/blogs/infosphere/ten-reasons-why-you-need-dat...

3.Datastage 8.5version-Cluster(HA-High Availability clusters)

DataStage 8.5 version was a also have HA-High Availability clusters setup. All the function and working is same as DataStage 8.5 standalone but the hardware and software structure will be different.

1. DataStage engine Tier is in different server (2 Active/Active or Active/passive) and

2. Service Tire is in different server (2 Active/Active or Active/passive) and

3. Metadata Database part (repository) tire is in different server (2 Active/Active or Active/passive) was installed in Oracle/DB2 Database server with RAC(means 2 Database server in Active/Active mode, if one DB fails the other will be switched immediately and no connection lost)

The whole DataStage HA is made in such way that any fail in any part may be engine/service or metadata tire. It will automatically switch to other Active servers and without connection lost of the current DataStage jobs running. This is the amazing setup done and it is implementing in out Citibank project and I am lucky to work on this.

Also we can have multiple DataStage engines for ex: Singapore/Malaysia/Thiland/Russia(4 Engine tries) running for the same 2 service Tires/Medata DB Tires.(This will reduce the cost of the Hardware)

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r5/index.jsp

http://www.google.com.sg/url?q=http://www.scribd.com/doc/46695490/D...

http://it.toolbox.com/blogs/infosphere/the-2010-datastage-roadmap-n...

Courtest & Thanks - http://www.channeldb2.com/profiles/blogs/datastage-architecture