Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide 11g Release 2 (11.2) Part Number E10935-02 |
|
|
View PDF |
One of the main functions of an Extract, Transform, and Load (ETL) tool is to transform data. Oracle Warehouse Builder provides various methods of transforming data. This chapter provides an overview of data transformation in Warehouse Builder.
This chapter contains the following topics:
After you import your source data and define the target, you can consider how to transform the source data into the output desired for the target. In Warehouse Builder, you specify how to transform the data by designing mappings in the Mapping Editor. A mapping is a Warehouse Builder entity that describes the sequence of operations required to extract data from sources, transform the data, and load the data into one or more targets.
The fundamental unit of design for a mapping is the operator. You use an operator to represent each distinct operation you want to perform in the mapping. Operations include extracting data, loading data, and transforming data (aggregating, joining, performing a lookup, and so on). To indicate the order of operations, you connect the mappings with data flow connections.
To specify data transformation in a mapping, select from the many prebuilt transformation operators or design a new transformation. The prebuilt transformation operators enable commonly performed operations such as filtering, joining, and sorting. Warehouse Builder also includes prebuilt operators for complex operations such as merging data, cleansing data, or profiling data.
If none of the prebuilt transformation operators meet your needs, you can design a new one. You can design the new transformation operator based on the Oracle Database library of PL/SQL functions, procedures, package functions, and package procedures.
Extraction and loading operations are represented by any of the numerous source and target operators. For example, a Table operator represents a table and a Flat File operator represents a flat file. Whether that operator specifies an extraction or loading operation depends on how you connect the operator relative to other operators in the mapping.
An important distinction to understand is the difference between the operator in the mapping and the object it represents. The operator and the object are separate entities until you bind the two together. For example, when you add a table operator to a mapping, you can bind that operator to a specific table in the repository. With the operator bound to the table, you can synchronize changing definitions between the two. If the table operator represents a target and you change the operator in the mapping, then you can propagate those changes back to the table in the repository. If the operator represents a source that incurred a change in its metadata definition, then you can reimport the table in the Design Center and then propagate those changes to the table operator in the Mapping Editor.
Mappings describe a series of operations that extract data from sources, transform it, and load it into targets. They provide a visual representation of the flow of the data and the operations performed on the data. When you design a mapping in Oracle Warehouse Builder, you use the Mapping Editor interface.
Alternatively, you can create and define mappings using OMB*Plus, the scripting interface for Warehouse Builder as described in Oracle Warehouse Builder API and Scripting Reference.
Based on the ETL logic that you define in a mapping, Warehouse Builder generates the code required to implement your design. Warehouse Builder can generate code for the following languages:
PL/SQL mappings, see "PL/SQL Mappings"
SQL*Loader mappings, see "SQL*Loader Mappings"
SAP ABAP mappings, see "SAP ABAP Mappings"
Code Template mappings, "Code Template (CT) Mappings"
The basic design element for a mapping is the operator. Use operators to represent sources and targets in the data flow. Also use operators to define how to transform the data from source to target. The operators that you select as sources affect how you design the mapping. Based on the operators that you select, Warehouse Builder assigns the mapping to one of the following mapping generation languages. Each of these code languages require you to adhere to certain rules when designing a mapping.
SQL
For mappings that contains code templates, Warehouse Builder generates SQL code.
PL/SQL
For all mappings that do not contain either a Flat File operator as a source or a SAP/R3 source, Warehouse Builder generates PL/SQL code. Design considerations for PL/SQL mappings depend upon whether you specify a row-based or set-based operating mode as described in Chapter 10, "Understanding Performance and Advanced ETL Concepts".
SQL*Loader
When you define a Flat File operator as a source, Warehouse Builder generates SQL*Loader code. To design a SQL*Loader mapping correctly, follow the guidelines described in "Flat File Source Operators".
ABAP
When you define a SAP/R3 source, Warehouse Builder generates ABAP code. For mapping design considerations for SAP sources, see "Creating SAP Extraction Mappings".
As you design a mapping, you select operators from the Mapping Editor palette and drag them onto the canvas.
This section introduces the types of operators and refers you to other chapters in this manual for detailed information.
Source and Target Operators: These operators represent Oracle Database objects, flat files, remote sources or targets, and non-Oracle sources and targets.
Remote and Non-Oracle Source and Target Operators: The special requirements for using these operators are discussed in "Using Remote and non-Oracle Source and Target Operators".
Transformation Operators: These operators transform data.
Pre/Post Processing Operators: These operators call a function or procedure before or after executing a mapping.
Pluggable Mapping Operators: These are mappings that function as operators in other mappings.
Real-time Data Warehousing Operators: These operators are used in creating realtime and batch mappings.
Use source and target operators to represent relational database objects and flat file objects.
Table 4-1 lists each source and target operator alphabetically and gives a brief description. For more information about these operators, see Chapter 25, "Source and Target Operators".
Table 4-1 Source and Target Operators
Icon | Operator Name | Description |
---|---|---|
Constant |
Produces a single output group that can contain one or more constant attributes. |
|
Construct Object |
Produces object types and collection types. |
|
Cube |
Represents a cube that you previously defined. |
|
Data Generator |
Provides information such as record number, system date, and sequence values. |
|
Dimension |
Represents a dimension that you previously defined. |
|
Expand Object |
Expands an object type to obtain the individual attributes that comprise the object type. |
|
External Table |
Represents an external table that you previously defined or imported. |
|
Flat File |
Represents a flat file that you previously defined or imported. |
|
Materialized View |
Represents a materialized view that you previously defined |
|
Queue |
Represents an advanced queue that you previously defined. |
|
Sequence |
Generates sequential numbers that increment for each row. |
|
Table |
Represents a table that you previously defined or imported. |
|
Varray Iterator |
Iterates through the values in the table type. |
|
View |
Represents a view that you previously defined or imported. |
To transform your source data, use data flow operators in your mapping.
Table 4-2 lists each data flow operator alphabetically and gives a brief description. For more information about these operators, see Chapter 26, "Data Flow Operators", Chapter 22, "Name and Address Cleansing", and Chapter 23, "Matching, Merging, and Deduplication".
Table 4-2 Data Flow Operators
Icon | Operator Name | Description |
---|---|---|
Aggregator |
Performs data aggregations, such as SUM and AVG, and provides an output row set with aggregated data. |
|
Anydata Cast |
Converts an object of type Sys.AnyData to either a primary type or to a user-defined type. |
|
Deduplicator |
Removes duplicate data in a source by placing a DISTINCT clause in the select code represented by the mapping. |
|
Expression |
Enables you to write SQL expressions that define nonprocedural algorithms for one output parameter of the operator. The expression text can contain combinations of input parameter names, variable names, and library functions. |
|
Filter |
Conditionally filters out rows from a row set. |
|
Joiner |
Joins multiple row sets from different sources with different cardinalities and produces a single output row set. |
|
Lookup |
Performs a lookup of data from a lookup object such as a table, view, materialized view, external table, cube, or dimension. |
|
Match Merge |
Data quality operator that identifies matching records and merges them into a single record. |
|
Name and Address |
Identifies and corrects errors and inconsistencies in name and address source data. |
|
Pivot |
Transforms a single row of attributes into multiple rows. Use this operator to transform data that contained across attributes instead of rows. |
|
Set Operation |
Performs union, union all, intersect, and minus operations in a mapping. |
|
Sorter |
Sorts attributes in ascending or descending order. |
|
Splitter operator |
Splits a single input row set into several output row sets using a boolean split condition. |
|
Subquery Filter |
filter rows based on the results of a subquery. |
|
Table Function |
Enables you to develop custom code to manipulate a set of input rows and return a set of output rows of the same or different cardinality that can be queried like a physical table. |
|
Transformation |
Transforms the attribute value data of rows within a row set using a PL/SQL function or procedure. |
|
Unpivot |
Converts multiple input rows into one output row. It enables you to extract from a source once and produce one row from a set of source rows that are grouped by attributes in the source data. |
Use Pre/Post Processing operators to perform processing before or after executing a mapping. The Mapping parameter operator is used to provide values to and from a mapping.
Table 4-3 lists the Pre/Post Processing operators and the Mapping Parameter operators and gives a brief description. For more details about these operators, see Chapter 26, "Data Flow Operators".
Table 4-3 Pre/Post Processing Operators
Icon | Operator | Description |
---|---|---|
Mapping Input Parameter |
Passes parameter values into a mapping. |
|
Mapping Output Parameter |
Sends values out of a mapping. |
|
Post-Mapping Process |
Calls a function or procedure after executing a mapping. |
|
Pre-Mapping Process |
Calls a function or procedure prior to executing a mapping. |
A pluggable mapping is a reusable grouping of mapping operators that behaves as a single operator.
Table 4-4 lists the Pluggable Mapping operators and gives a brief description. For more information, see Chapter 26, "Data Flow Operators".
Table 4-4 Pluggable Mapping Operators
Icon | Operator | Description |
---|---|---|
Pluggable Mapping |
Represents a reusable mapping. |
|
Pluggable Mapping Input Signature |
A combination of input attributes that flow into the pluggable mapping. |
|
Pluggable Mapping Output Signature |
A combination of output attributes that flow out of the pluggable mapping. |
Real-time data warehousing operators enable you to use queues to perform Change Data Capture. You can create mappings that capture changes in source objects and then apply them to target tables.
Table 4-5 lists the real-time data warehousing operators.
Transformations are PL/SQL functions, procedures, packages, and types that enable you to transform data. You use transformations when designing mappings and process flows that define ETL processes.
Transformations are stored in the Warehouse Builder workspace and can be used in the project in which they are defined.
Transformation packages are deployed at the package level but executed at the transformation level.
In Warehouse Builder, transformations can be categorized as follows:
The following sections provide more details about these types of transformations.
Warehouse Builder provides a set of predefined transformations that enable you to perform common transformation operations. These predefined transformations are part of the public Oracle Predefined library that consists of built-in and seeded functions and procedures. You can directly use these predefined transformations to transform your data.
Predefined transformations are organized into the following categories:
Administration
Character
Control Center
Conversion
Date
Numeric
OLAP
Others
SYS
Spatial
Streams
XML
For more information about the transformations that belong to each category, see Chapter 28, "Warehouse Builder Transformations Reference".
A custom transformation is one that is created by the user. Custom transformations can use predefined transformations as part of their definition.
Custom transformations contain the following categories:
Functions: The Functions category contains standalone functions. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also created automatically under the Transformations node of every Oracle module, DB2 module, and SQL Server module in the Projects Navigator.
Functions can be defined by the user or imported from a database. A function transformation takes 0 to n input parameters and produces a result value.
Procedures: The Procedures category contains any standalone procedures used as transformations. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Globals Navigator.
Procedures can be defined by the user or imported from a database. A procedure transformation takes 0 to n input parameters and produces 0 to n output parameters.
Table functions: The Table Functions category contains any standalone table functions you can use as transformations. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Projects Navigator.
The Table functions category is also listed under Packages. Any table functions created here belong to the package.
Packages: The Packages category contains packages, which in turn contain functions, procedures, table functions, and PL/SQL types. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Globals Navigator.
PL/SQL packages can be created or imported in Warehouse Builder. The package body may be modified. For packages that are imported from database objects, the package header (which is the signature for the function or procedure) cannot be modified. However, for packages that are imported from Warehouse Builder, the package header can be modified. For instructions, see "Importing Transformations"
PL/SQL Types: The PL/SQL Types category contains any standalone PL/SQL types. This includes PL/SQL record types, REF cursor types, and nested table types. The PL/SQL Types category is automatically created in each package that you define under the Packages node in the Transformations node of the Projects Navigator. It is also available under every package that you define in the Globals Navigator. In the Globals Navigator, expand the Public Transformations node, the Oracle node, the Custom node, and then the Packages node to define PL/SQL types under a package.
For further instructions, see "Defining Custom Transformations".
A transformation library consists of a set of reusable transformations. Each time you create a repository, Warehouse Builder creates a Transformation Library containing transformation operations for that repository. This library contains the standard public Oracle Predefined library and an additional library for each Oracle module defined within the project.
Transformation libraries are available under the Public Transformations node of the Globals Navigator in the Design Center.
Transformation libraries can be categorized as follows:
Public Oracle Predefined Library
This is a collection of predefined functions from which you can define procedures for your public Oracle Custom library. The public Oracle Predefined library is contained in the Globals Navigator. Expand the Pre-Defined node under the Public Transformations node. Each category of predefined transformations is represented by a separate node. Expand the node for a category to view the predefined transformations in that category. For example, expand the Character node to view the predefined character transformations contained in the public Oracle Predefined library.
This is a collection of reusable transformations created by the user. These transformations are categorized as functions, procedures, and packages defined within your workspace.
The transformations in the public Oracle Custom library are available under the Custom node of the Public Transformations node. Any transformation that you create under this node is available across all projects in the workspace. For information about creating transformations in the public Oracle Custom library, see "Defining Custom Transformations".
When you deploy a transformation defined in the public Oracle Custom library, the transformation is deployed to the location that is associated with the default control center.
Because transformations can be used at different points in the ETL process, Warehouse Builder enables you to access transformation libraries from different points in the Design Center.
You can access the transformation libraries using the following:
Expression Builder
While creating mappings, you may need to create expressions to transform your source data. The Expression Builder interface enables you to create the expressions required to transform data. Because these expressions can include transformations, Warehouse Builder enables you to access transformation libraries from the Expression Builder.
Transformation libraries are available under the Transformations tab of the Expression Builder. The Private node under TRANSFORMLIBS contains transformations that are available only in the current project. These transformations are created under the Transformations node of the Oracle module. The Public node contains the custom transformations from the public Oracle Custom library and the predefined transformations from the public Oracle Predefined library.
Add Transformation Operator dialog box
The Transformation operator in the Mapping Editor enables you to add transformations, both from the public Oracle Predefined library and the public Oracle Custom library, to a mapping. You can use this operator to transform data as part of the mapping.
Function Editor, Procedure Editor, Edit Function dialog box, or Edit Procedure dialog box
The Implementation tab of these editors enables you to specify the PL/SQL code that is part of the function or procedure body. You can use transformations in the PL/SQL code.