What is ETL

What is ETL

What is ETL?

ETL means Extract, Transformation and Load. It is a process of extracting data from multiple sources, transformed data as per business requirement and load it in one single or multiple places. Generally it is used to create Data Marts and Data Warehouses.

Following are the detail explanation on Extract, Transformation and Load process.

Extract: It is a process of collecting data from multiple sources (source may be a database or multiple databases, a file or multiple files.) in to a single place called staging area. The main purpose in this to collect data from different sources in to a single staging database for further processing. The extracted data or staging database is now the primary source for Transformation.

Transformation: In this stage we perform various transformation to clean staging data as per business requirement. Cleaning data includes applying data conversion, removing unwanted characters, etc. The cleaned data is then store in temporary transformation table (It is also used to store historical data in some scenario). After cleaning the staging data (stored in temporary transformation tables) complex business logics are implemented to achieve business goals. The final clean data with business logic is stored in transformation database table. The final transformation database tables are source for creating data marts andor data warehouses. Below are some example of transformation process

  • Use Common Data types for database columns coming from different sources. Example customer related data is coming from two different sources having different datatypes for some columns.
  • Combine multiple columns in to one. Example Combine “First_Name” and “Last_Name” in to a single column named “Full_Name”.
  • Change Date Format.
  • Remove unwanted characters. Example in a flat file source contains data enclosed using pipe delimiter.

Load: It is the final stage of ETL process. Used to create data marts and data warehouses. Following are some main activities perform in this stage

  • Combine multiple transformation table data in to a single data mart or data warehousing table.
  • Create Dimensions Tables.
  • Create Facts Tables.
  • Apply slowly changing dimension.
  • Implementing star or snow flake schema between Dimension and Fact tables.

Note: Click on links to know more about a particular topic.

0 Comments
Leave A Comment

Please login to post your valuable comments.

Join the newsletter

Get the latest vLemonn news first

share