DataStage Architecture
This is the information as per my knowledge.
What is the architecture of DataStage?
The architecture of DataStage (DS) is a client/server architecture.
The client/server layout differs between DataStage versions. The latest version is DataStage 8.7.
1. DataStage 7.5 (7.5.1 or 7.5.2) version - standalone
DataStage 7.5 was a standalone version: the DataStage engine, services and repository (metadata) were all installed on one server, and the client was installed on a local PC and accessed the server through the DS client. Users are created as Unix/Windows users on the DataStage server and added to the dstage group (dsadm is the owner of DataStage and dstage is its group). To give access to a new user, just create a new Unix/Windows user on the DS server and add them to the dstage group. They will then have access to the DataStage server from the client.
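The user setup described above can be sketched as a small helper that only builds the commands an administrator might run. This is a minimal sketch assuming a Linux-style DS server with the standard useradd and usermod tools; the function name and the example user name are hypothetical, and nothing is executed here.

```python
import re
import shlex

DS_GROUP = "dstage"  # group named in the text; dsadm is the owning user


def provisioning_commands(username, group=DS_GROUP):
    """Build (but do not run) the commands that give a new user DS access."""
    # Assumption: conventional lowercase Unix account names only.
    if not re.fullmatch(r"[a-z_][a-z0-9_-]*", username):
        raise ValueError(f"unexpected user name: {username!r}")
    return [
        # Create the OS account with a home directory.
        ["useradd", "-m", username],
        # Append the account to the dstage group, keeping its other groups.
        ["usermod", "-a", "-G", group, username],
    ]


if __name__ == "__main__":
    for cmd in provisioning_commands("etl_dev1"):
        print(shlex.join(cmd))
```

On a Windows DS server the equivalent steps would go through the local user and group management tools instead.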
Client components & server components
There are 4 client components:
- DataStage Designer
- DataStage Administrator
- DataStage Director
- DataStage Manager
DataStage Designer is used to design the jobs. All the DataStage development activities are done here, so a DataStage developer should know this part very well.
DataStage Manager is used to import and export projects and to view and edit the contents of the repository. This is handled by the DataStage operator/administrator.
DataStage Administrator is used for creating projects, deleting projects and setting the environment variables. This is handled by the DataStage administrator.
DataStage Director is used to run, validate and schedule the jobs. This is handled by the DataStage developer/operator.
Server components
DS server: runs executable server jobs, under the control of the DS Director, that extract, transform and load data into a data warehouse (DWH).
DS Package Installer: a user interface used to install packaged DS jobs and plug-ins.
Repository or project: a central store that contains all the information required to build a DWH or data mart.
More reference on DataStage 7.5
2. DataStage 8.0 (8.1 and 8.5) version - standalone
DataStage 8 was a standalone version where the DataStage engine and services are on the DataStage server, but the database part, the repository (metadata), was installed on an Oracle/DB2 database server. The client was installed on a local PC and accesses the servers through the DS client.
Metadata (Repository): This will be created as one database and will have 2 schemas (xmeta and iauser). It can be set up as a RAC DB (Active/Active on 2 servers; if one DB fails, the other takes over without the running DataStage jobs losing their connection), where
- xmeta: holds information about the projects and the DataStage software
- iauser: holds information about the DataStage users in IIS or the web console
Note: we can install 2 or 3 DataStage instances on the same server, such as ds-8.0, ds-8.1 or ds-8.5, and bring up whichever version we want to work on. This reduces the hardware cost. But only one instance can be up and running at a time.
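The "only one instance up at a time" rule from the note can be pictured with a toy model. This is only a sketch of the rule as I understand it, not how DataStage itself enforces it; the class and method names are hypothetical.

```python
class InstanceHost:
    """Toy model of one server holding several DataStage installs."""

    def __init__(self, installed):
        self.installed = set(installed)
        self.active = None  # at most one instance runs at a time

    def bring_up(self, version):
        if version not in self.installed:
            raise ValueError(f"{version} is not installed on this host")
        if self.active is not None and self.active != version:
            # Another version is already running, so refuse to start a second.
            raise RuntimeError(
                f"{self.active} is running; stop it before starting {version}"
            )
        self.active = version

    def bring_down(self):
        self.active = None


if __name__ == "__main__":
    host = InstanceHost(["ds-8.0", "ds-8.1", "ds-8.5"])
    host.bring_up("ds-8.1")
    host.bring_down()
    host.bring_up("ds-8.5")
    print(host.active)
```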
DataStage 8 was also a standalone version, but here 3 distinct components were introduced, each with its own administrative user:
1. Information Server (IIS) - isadmin
2. WebSphere server - wasadmin
3. DataStage server - dsadm
1. The IIS, also called the DataStage web console, was introduced to hold all the user information for DataStage. It is generally accessed in a web browser and doesn't need any DataStage software installation.
After the DataStage installation, the IIS web console is available, with isadmin as the administrator who manages it. Once we log in to the web console as isadmin, we need to map the "dsadm" user in the engine credentials (dsadm is the Unix/Windows user created on the DataStage server in the dstage group). After the mapping, new users are created in the same user component. (Note: the users "xxx" created here are internally tagged to the mapped dsadm user, which makes the connection between the Unix DataStage server and the IIS web console. All the files, projects, etc. created by "xxx" will be owned by the "dsadm" user on the Unix server.)
We can restrict the "xxx" users here so that they can access only 1 or 2 projects.
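The mapping between web console users and the dsadm engine credential can be sketched as a small data model. This is a simplified illustration under my own assumptions (one shared mapped OS user, a plain set of allowed projects per user); the class, function and project names are hypothetical and are not part of any IIS API.

```python
from dataclasses import dataclass, field

ENGINE_OS_USER = "dsadm"  # OS account mapped in the engine credentials


@dataclass
class WebConsoleUser:
    name: str
    projects: set = field(default_factory=set)  # projects this user may open


def os_owner(user):
    """Every web console user acts on the engine host as the mapped OS user."""
    return ENGINE_OS_USER


def can_open(user, project):
    """Project-level restriction: only explicitly granted projects are allowed."""
    return project in user.projects


if __name__ == "__main__":
    xxx = WebConsoleUser("xxx", {"PROJ_A"})
    print(os_owner(xxx), can_open(xxx, "PROJ_A"), can_open(xxx, "PROJ_B"))
```

This is why files created by "xxx" show up on the Unix server as owned by dsadm: the web console identity decides which projects are visible, while the mapped OS user decides file ownership.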
Client components & server components
The client components are:
- DataStage Designer
- DataStage Administrator
- DataStage Director
- IBM Import Export Manager
- Web console
- IBM InfoSphere DataStage and QualityStage Multi-Client Manager
- Others I have not come across :)
DataStage Designer is used to design the jobs. All the DataStage development activities are done here, so a DataStage developer should know this part very well.
DataStage Administrator is used for creating projects, deleting projects and setting the environment variables. This is handled by the DataStage administrator.
DataStage Director is used to run, validate and schedule the jobs. This is handled by the DataStage developer/operator.
IBM Import Export Manager is used to import and export projects and to view and edit the contents of the repository. This is handled by the DataStage operator/administrator.
The web console is used to create the DataStage users and do the administration. This is handled by the DataStage administrator.
Multi-Client Manager is used to install multiple clients, such as ds-7.5, ds-8.1 or ds-8.5, on the local PC and to swap to any version when required. It is used by DataStage developers, operators and administrators alike.
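As a quick reference, the tool-to-role split described above can be written as a lookup table. The short tool names and the helper function are my own shorthand, and the role assignments simply repeat what is listed in this post.

```python
# Tool -> roles that typically handle it, as listed in the text above.
CLIENT_TOOLS = {
    "Designer": ["developer"],
    "Administrator": ["administrator"],
    "Director": ["developer", "operator"],
    "Import Export Manager": ["operator", "administrator"],
    "Web console": ["administrator"],
    "Multi-Client Manager": ["developer", "operator", "administrator"],
}


def tools_for(role):
    """Return the client tools a given role would normally use."""
    return [tool for tool, roles in CLIENT_TOOLS.items() if role in roles]


if __name__ == "__main__":
    print(tools_for("operator"))
```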
Server components:
-- IBM InfoSphere Blueprint Director
-- IBM InfoSphere Business Glossary
-- IBM InfoSphere DataStage
-- IBM InfoSphere FastTrack
-- IBM InfoSphere Information Analyzer
-- IBM InfoSphere Information Services Director
-- IBM InfoSphere Metadata Server
-- IBM InfoSphere Metadata Workbench
-- IBM InfoSphere QualityStage
3. DataStage 8.5 version - Cluster (HA - High Availability clusters)
DataStage 8.5 can also be set up with HA (High Availability) clusters. All the functions and workings are the same as in DataStage 8.5 standalone, but the hardware and software structure is different.
1. The DataStage engine tier is on different servers (2, Active/Active or Active/Passive), and
2. The service tier is on different servers (2, Active/Active or Active/Passive), and
3. The metadata database (repository) tier is on different servers (2, Active/Active or Active/Passive), installed on Oracle/DB2 database servers with RAC (meaning 2 database servers in Active/Active mode; if one DB fails, the other takes over immediately and no connection is lost)
The whole DataStage HA setup is built in such a way that a failure in any part, whether the engine, service or metadata tier, automatically switches to the other active servers without the currently running DataStage jobs losing their connection. This is an amazing setup; it is being implemented in our Citibank project and I am lucky to work on it.
We can also have multiple DataStage engines, for example Singapore/Malaysia/Thailand/Russia (4 engine tiers), running against the same 2 service tiers/metadata DB tiers. (This reduces the hardware cost.)
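The tier layout and the failover idea can be pictured with a toy model. This is only a sketch of the concept in Active/Passive form: in a real installation the switch is handled by the cluster software and RAC, not by application code, and all host names below are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    nodes: list  # host names, in preference order
    down: set = field(default_factory=set)

    def active_node(self):
        """Return the first healthy node, i.e. the one serving requests."""
        for node in self.nodes:
            if node not in self.down:
                return node
        raise RuntimeError(f"no healthy node left in the {self.name} tier")

    def fail(self, node):
        """Mark a node as failed so the next healthy one takes over."""
        self.down.add(node)


# One shared service tier and metadata tier, as in the text.
services = Tier("services", ["svc-1", "svc-2"])
metadata = Tier("metadata", ["db-1", "db-2"])

# Four regional engine tiers that all use the shared tiers above.
engines = {
    region: Tier(f"engine-{region}", [f"{region}-eng-1", f"{region}-eng-2"])
    for region in ("singapore", "malaysia", "thailand", "russia")
}

if __name__ == "__main__":
    services.fail("svc-1")  # simulate losing the first service node
    print(services.active_node(), metadata.active_node())
```

Sharing one pair of service and metadata tiers across the four engine tiers is what saves hardware: only the engine tier is repeated per region.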
Courtesy & Thanks - http://www.channeldb2.com/profiles/blogs/datastage-architecture