Skip Headers
Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide
11g Release 2 (11.2)

Part Number E10935-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

4 Overview of Transforming Data

One of the main functions of an Extract, Transform, and Load (ETL) tool is to transform data. Oracle Warehouse Builder provides various methods of transforming data. This chapter provides an overview of data transformation in Warehouse Builder.

This chapter contains the following topics:

About Data Transformation in Oracle Warehouse Builder

After you import your source data and define the target, you can consider how to transform the source data into the output desired for the target. In Warehouse Builder, you specify how to transform the data by designing mappings in the Mapping Editor. A mapping is a Warehouse Builder entity that describes the sequence of operations required to extract data from sources, transform the data, and load the data into one or more targets.

The fundamental unit of design for a mapping is the operator. You use an operator to represent each distinct operation you want to perform in the mapping. Operations include extracting data, loading data, and transforming data (aggregating, joining, performing a lookup, and so on). To indicate the order of operations, you connect the mappings with data flow connections.

To specify data transformation in a mapping, select from the many prebuilt transformation operators or design a new transformation. The prebuilt transformation operators enable commonly performed operations such as filtering, joining, and sorting. Warehouse Builder also includes prebuilt operators for complex operations such as merging data, cleansing data, or profiling data.

If none of the prebuilt transformation operators meet your needs, you can design a new one. You can design the new transformation operator based on the Oracle Database library of PL/SQL functions, procedures, package functions, and package procedures.

Extraction and loading operations are represented by any of the numerous source and target operators. For example, a Table operator represents a table and a Flat File operator represents a flat file. Whether that operator specifies an extraction or loading operation depends on how you connect the operator relative to other operators in the mapping.

An important distinction to understand is the difference between the operator in the mapping and the object it represents. The operator and the object are separate entities until you bind the two together. For example, when you add a table operator to a mapping, you can bind that operator to a specific table in the repository. With the operator bound to the table, you can synchronize changing definitions between the two. If the table operator represents a target and you change the operator in the mapping, then you can propagate those changes back to the table in the repository. If the operator represents a source that incurred a change in its metadata definition, then you can reimport the table in the Design Center and then propagate those changes to the table operator in the Mapping Editor.

About Mappings

Mappings describe a series of operations that extract data from sources, transform it, and load it into targets. They provide a visual representation of the flow of the data and the operations performed on the data. When you design a mapping in Oracle Warehouse Builder, you use the Mapping Editor interface.

Alternatively, you can create and define mappings using OMB*Plus, the scripting interface for Warehouse Builder as described in Oracle Warehouse Builder API and Scripting Reference.

Based on the ETL logic that you define in a mapping, Warehouse Builder generates the code required to implement your design. Warehouse Builder can generate code for the following languages:

About Operators

The basic design element for a mapping is the operator. Use operators to represent sources and targets in the data flow. Also use operators to define how to transform the data from source to target. The operators that you select as sources affect how you design the mapping. Based on the operators that you select, Warehouse Builder assigns the mapping to one of the following mapping generation languages. Each of these code languages require you to adhere to certain rules when designing a mapping.

Types of Operators

As you design a mapping, you select operators from the Mapping Editor palette and drag them onto the canvas.

This section introduces the types of operators and refers you to other chapters in this manual for detailed information.

Source and Target Operators

Use source and target operators to represent relational database objects and flat file objects.

Table 4-1 lists each source and target operator alphabetically and gives a brief description. For more information about these operators, see Chapter 25, "Source and Target Operators".

Table 4-1 Source and Target Operators

Icon Operator Name Description
This image displays the icon for the Constant operator.

Constant

Produces a single output group that can contain one or more constant attributes.

This image displays the icon for the Construct Object operator.

Construct Object

Produces object types and collection types.

This image displays the icon for the Cube operator.

Cube

Represents a cube that you previously defined.

This image displays the icon for the Data Generator operator.

Data Generator

Provides information such as record number, system date, and sequence values.

This image displays the icon for the Dimension operator.

Dimension

Represents a dimension that you previously defined.

This image displays the icon for the Expand Object operator.

Expand Object

Expands an object type to obtain the individual attributes that comprise the object type.

This image displays the icon for the External Table operator.

External Table

Represents an external table that you previously defined or imported.

This image displays the icon for the Flat File operator.

Flat File

Represents a flat file that you previously defined or imported.

This image displays the icon for the Materialized View operator.

Materialized View

Represents a materialized view that you previously defined

This image displays the icon for the Queue operator.

Queue

Represents an advanced queue that you previously defined.

This image displays the icon for the Sequence operator.

Sequence

Generates sequential numbers that increment for each row.

This image displays the icon for the Table operator.

Table

Represents a table that you previously defined or imported.

This image displays the icon for the Varray Iterator operator.

Varray Iterator

Iterates through the values in the table type.

This image displays the icon for the View operator.

View

Represents a view that you previously defined or imported.


Transformation Operators

To transform your source data, use data flow operators in your mapping.

Table 4-2 lists each data flow operator alphabetically and gives a brief description. For more information about these operators, see Chapter 26, "Data Flow Operators", Chapter 22, "Name and Address Cleansing", and Chapter 23, "Matching, Merging, and Deduplication".

Table 4-2 Data Flow Operators

Icon Operator Name Description
This image displays the icon for the Aggregator operator.

Aggregator

Performs data aggregations, such as SUM and AVG, and provides an output row set with aggregated data.

This image displays the icon for the Anydata Cast operator.

Anydata Cast

Converts an object of type Sys.AnyData to either a primary type or to a user-defined type.

This image displays the icon for the Deduplicator operator.

Deduplicator

Removes duplicate data in a source by placing a DISTINCT clause in the select code represented by the mapping.

This image displays the icon for the Expression operator.

Expression

Enables you to write SQL expressions that define nonprocedural algorithms for one output parameter of the operator. The expression text can contain combinations of input parameter names, variable names, and library functions.

This image displays the icon for the Filter operator.

Filter

Conditionally filters out rows from a row set.

This image displays the icon for the Joiner operator.

Joiner

Joins multiple row sets from different sources with different cardinalities and produces a single output row set.

This image displays the icon for the Lookup operator.

Lookup

Performs a lookup of data from a lookup object such as a table, view, materialized view, external table, cube, or dimension.

This image displays the icon for the Match Merge operator.

Match Merge

Data quality operator that identifies matching records and merges them into a single record.

This image displays the icon for the Name and Address operator.

Name and Address

Identifies and corrects errors and inconsistencies in name and address source data.

This image displays the icon for the Pivot operator.

Pivot

Transforms a single row of attributes into multiple rows. Use this operator to transform data that contained across attributes instead of rows.

This image displays the icon for the Set Operation operator.

Set Operation

Performs union, union all, intersect, and minus operations in a mapping.

This image displays the icon for the Sorter operator.

Sorter

Sorts attributes in ascending or descending order.

This image displays the icon for the Splitter operator.

Splitter operator

Splits a single input row set into several output row sets using a boolean split condition.

This image displays the icon for the Subquery Filter operator.

Subquery Filter

filter rows based on the results of a subquery.

This image displays the icon for the Table Function operator.

Table Function

Enables you to develop custom code to manipulate a set of input rows and return a set of output rows of the same or different cardinality that can be queried like a physical table.

This image displays the icon for the Transformation operator.

Transformation

Transforms the attribute value data of rows within a row set using a PL/SQL function or procedure.

This image displays the icon for the Unpivot operator.

Unpivot

Converts multiple input rows into one output row. It enables you to extract from a source once and produce one row from a set of source rows that are grouped by attributes in the source data.


Pre/Post Processing Operators

Use Pre/Post Processing operators to perform processing before or after executing a mapping. The Mapping parameter operator is used to provide values to and from a mapping.

Table 4-3 lists the Pre/Post Processing operators and the Mapping Parameter operators and gives a brief description. For more details about these operators, see Chapter 26, "Data Flow Operators".

Table 4-3 Pre/Post Processing Operators

Icon Operator Description
This image displays the icon for the Mapping Input Parameter operator.

Mapping Input Parameter

Passes parameter values into a mapping.

This image displays the icon for the Mapping Output Parameter operator.

Mapping Output Parameter

Sends values out of a mapping.

This image displays the icon for the Post-Mapping Process operator.

Post-Mapping Process

Calls a function or procedure after executing a mapping.

This image displays the icon for the Pre-Mapping Process operator.

Pre-Mapping Process

Calls a function or procedure prior to executing a mapping.


Pluggable Mapping Operators

A pluggable mapping is a reusable grouping of mapping operators that behaves as a single operator.

Table 4-4 lists the Pluggable Mapping operators and gives a brief description. For more information, see Chapter 26, "Data Flow Operators".

Table 4-4 Pluggable Mapping Operators

Icon Operator Description
This image displays the icon for the Pluggable Mapping operator.

Pluggable Mapping

Represents a reusable mapping.

This image displays the icon for the Pluggable Mapping Input Signature operator.

Pluggable Mapping Input Signature

A combination of input attributes that flow into the pluggable mapping.

This image displays the icon for the Pluggable Mapping Output Signature operator.

Pluggable Mapping Output Signature

A combination of output attributes that flow out of the pluggable mapping.


Real-time Data Warehousing Operators

Real-time data warehousing operators enable you to use queues to perform Change Data Capture. You can create mappings that capture changes in source objects and then apply them to target tables.

Table 4-5 lists the real-time data warehousing operators.

Table 4-5 Pluggable Mapping Operators

Icon Operator Description
This image displays the icon for the LCR Cast operator.

LCR Cast

Expands an LCR (Logical Change Record) object into its constituent columns.

This image displays the icon for the LCR Splitter operator.

LCR Splitter

Directs changes to different tables along data flow paths.


About Transformations

Transformations are PL/SQL functions, procedures, packages, and types that enable you to transform data. You use transformations when designing mappings and process flows that define ETL processes.

Transformations are stored in the Warehouse Builder workspace and can be used in the project in which they are defined.

Transformation packages are deployed at the package level but executed at the transformation level.

Types of Transformations

In Warehouse Builder, transformations can be categorized as follows:

The following sections provide more details about these types of transformations.

Predefined Transformations

Warehouse Builder provides a set of predefined transformations that enable you to perform common transformation operations. These predefined transformations are part of the public Oracle Predefined library that consists of built-in and seeded functions and procedures. You can directly use these predefined transformations to transform your data.

Predefined transformations are organized into the following categories:

  • Administration

  • Character

  • Control Center

  • Conversion

  • Date

  • Numeric

  • OLAP

  • Others

  • SYS

  • Spatial

  • Streams

  • XML

For more information about the transformations that belong to each category, see Chapter 28, "Warehouse Builder Transformations Reference".

Custom Transformations

A custom transformation is one that is created by the user. Custom transformations can use predefined transformations as part of their definition.

Custom transformations contain the following categories:

  • Functions: The Functions category contains standalone functions. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also created automatically under the Transformations node of every Oracle module, DB2 module, and SQL Server module in the Projects Navigator.

    Functions can be defined by the user or imported from a database. A function transformation takes 0 to n input parameters and produces a result value.

  • Procedures: The Procedures category contains any standalone procedures used as transformations. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Globals Navigator.

    Procedures can be defined by the user or imported from a database. A procedure transformation takes 0 to n input parameters and produces 0 to n output parameters.

  • Table functions: The Table Functions category contains any standalone table functions you can use as transformations. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Projects Navigator.

    The Table functions category is also listed under Packages. Any table functions created here belong to the package.

  • Packages: The Packages category contains packages, which in turn contain functions, procedures, table functions, and PL/SQL types. This category is available under the Custom node of the Public Transformations node in the Globals Navigator. It is also automatically created under the Transformations node of each Oracle module in the Globals Navigator.

    PL/SQL packages can be created or imported in Warehouse Builder. The package body may be modified. For packages that are imported from database objects, the package header (which is the signature for the function or procedure) cannot be modified. However, for packages that are imported from Warehouse Builder, the package header can be modified. For instructions, see "Importing Transformations"

  • PL/SQL Types: The PL/SQL Types category contains any standalone PL/SQL types. This includes PL/SQL record types, REF cursor types, and nested table types. The PL/SQL Types category is automatically created in each package that you define under the Packages node in the Transformations node of the Projects Navigator. It is also available under every package that you define in the Globals Navigator. In the Globals Navigator, expand the Public Transformations node, the Oracle node, the Custom node, and then the Packages node to define PL/SQL types under a package.

For further instructions, see "Defining Custom Transformations".

About Transformation Libraries

A transformation library consists of a set of reusable transformations. Each time you create a repository, Warehouse Builder creates a Transformation Library containing transformation operations for that repository. This library contains the standard public Oracle Predefined library and an additional library for each Oracle module defined within the project.

Transformation libraries are available under the Public Transformations node of the Globals Navigator in the Design Center.

Types of Transformation Libraries

Transformation libraries can be categorized as follows:

  • Public Oracle Predefined Library

    This is a collection of predefined functions from which you can define procedures for your public Oracle Custom library. The public Oracle Predefined library is contained in the Globals Navigator. Expand the Pre-Defined node under the Public Transformations node. Each category of predefined transformations is represented by a separate node. Expand the node for a category to view the predefined transformations in that category. For example, expand the Character node to view the predefined character transformations contained in the public Oracle Predefined library.

  • Public Oracle Custom Library

    This is a collection of reusable transformations created by the user. These transformations are categorized as functions, procedures, and packages defined within your workspace.

    The transformations in the public Oracle Custom library are available under the Custom node of the Public Transformations node. Any transformation that you create under this node is available across all projects in the workspace. For information about creating transformations in the public Oracle Custom library, see "Defining Custom Transformations".

    When you deploy a transformation defined in the public Oracle Custom library, the transformation is deployed to the location that is associated with the default control center.

Accessing Transformation Libraries

Because transformations can be used at different points in the ETL process, Warehouse Builder enables you to access transformation libraries from different points in the Design Center.

You can access the transformation libraries using the following:

  • Expression Builder

    While creating mappings, you may need to create expressions to transform your source data. The Expression Builder interface enables you to create the expressions required to transform data. Because these expressions can include transformations, Warehouse Builder enables you to access transformation libraries from the Expression Builder.

    Transformation libraries are available under the Transformations tab of the Expression Builder. The Private node under TRANSFORMLIBS contains transformations that are available only in the current project. These transformations are created under the Transformations node of the Oracle module. The Public node contains the custom transformations from the public Oracle Custom library and the predefined transformations from the public Oracle Predefined library.

  • Add Transformation Operator dialog box

    The Transformation operator in the Mapping Editor enables you to add transformations, both from the public Oracle Predefined library and the public Oracle Custom library, to a mapping. You can use this operator to transform data as part of the mapping.

  • Function Editor, Procedure Editor, Edit Function dialog box, or Edit Procedure dialog box

    The Implementation tab of these editors enables you to specify the PL/SQL code that is part of the function or procedure body. You can use transformations in the PL/SQL code.