Incorta Intelligent Ingest for Azure Synapse


Introduction:

For customers using Azure Synapse Analytics who aim to provide business intelligence for Oracle E-Business Suite and Oracle ERP Cloud, the journey from data to insights can be both lengthy and expensive.

Customer Challenges:
Existing data warehouses and reporting must be rebuilt
Phased migrations require simultaneous reporting from EBS and ERP Cloud
Typically, historical transactional data is not migrated
Oracle Cloud doesn’t provide public APIs for bulk data extracts

About the Incorta Intelligent Ingest Feature for Azure Synapse

Incorta’s Intelligent Ingest accelerates the path from enterprise data to analytics by connecting to enterprise applications, including those that lack bulk-extract APIs. It defines the relationships and logic that business users require, and loads the data into Azure Synapse via Azure Data Lake. This streamlines integration between complex data sources and Microsoft Azure Synapse Analytics, accelerating data mart deployment and automating model design and source schema mapping with Incorta data applications.

Incorta Data Applications, also known as Blueprints, offer a faster, more effective method for analyzing and understanding data. They eliminate the need for traditional data modeling and ETL/ELT processes while preserving the granularity of the raw data sets. By leveraging Incorta Blueprints, customers gain rapid access to enterprise data and can reduce implementation time by as much as 85 percent.

High-level Incorta Intelligent Ingest process:
Connect Incorta to your data source with a data application, and load the pre-existing physical schema fact and dimension tables.
Copy the physical schema tables to Parquet files on Azure Data Lake Storage (ADLS) Gen2.
Load the Synapse data marts from the Parquet files (see the sketch after this list).
Visualize the data with Power BI.
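
To make step 3 concrete, here is a minimal Python sketch that loads a Synapse dedicated SQL pool table from the shared Parquet files using a COPY INTO statement issued through pyodbc. The workspace, database, table, credentials, and ADLS path are illustrative assumptions, not values produced by Incorta:

    # Minimal sketch: load a Synapse dedicated SQL pool table from Parquet on ADLS Gen2.
    # Workspace, database, table, and storage path are illustrative assumptions.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        "SERVER=myworkspace.sql.azuresynapse.net;"
        "DATABASE=sales_mart;UID=loader;PWD=<password>"
    )

    copy_stmt = """
    COPY INTO dbo.GL_BALANCES
    FROM 'https://mydatalake.dfs.core.windows.net/incorta/parquet/EBS_GL/GL_BALANCES/*.parquet'
    WITH (FILE_TYPE = 'PARQUET', CREDENTIAL = (IDENTITY = 'Managed Identity'))
    """

    with conn:
        conn.execute(copy_stmt)  # pyodbc commits the transaction when the block exits cleanly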

How Incorta Intelligent Ingest Works:

The Incorta components below work together with Intelligent Ingest to move data from source systems to Azure Synapse. Let’s briefly review each component before walking through the Intelligent Ingest flow.

Data Connectors:

The pipeline starts with data connectors, which extract data from the different source systems and applications and provide schema information and data structures.

Data Loader:

The data loader handles data management, loading, and scheduling of the schemas. It also manages metadata, compaction, and logging while loading the data. Data is loaded into Parquet files on Azure Data Lake Storage Gen2 (ADLS Gen2).
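
Because the loader writes standard Parquet, the landed files can be read back with any Parquet-aware tool. Below is a minimal sketch using pandas with the adlfs filesystem driver; the storage account, container, and path layout are illustrative assumptions:

    # Minimal sketch: read the loader's Parquet output from ADLS Gen2.
    # Requires the adlfs package; account and path are illustrative assumptions.
    import pandas as pd

    df = pd.read_parquet(
        "abfss://incorta@mydatalake.dfs.core.windows.net/parquet/EBS_GL/GL_BALANCES",
        storage_options={"account_name": "mydatalake", "anon": False},
    )
    print(df.dtypes)
    print(len(df))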

Physical Schema:

The physical schema is metadata about the properties of each physical source table. It includes the source or location of the data, the table columns, data types, joins between tables, formula columns (computed and persisted columns), and runtime security filters.

Business Schema:

The Business Schema is a logical grouping of one or more physical tables and a subset of physical columns. It is used to create a business semantic layer or business view, and provides end-users with an intuitive entry point into the data.

Materialized Views:

Materialized views (MVs) persist business views to the data lake. They can also implement complex transformation logic in Scala, R, SQL, and Python. MVs are written and debugged in a notebook interface, can read from multiple data sources, and can include arbitrarily complex logic, including machine learning algorithms.
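
For illustration, a Python MV is a short PySpark script. The read() and save() helpers below follow Incorta’s MV notebook conventions, while the schema names and aggregation logic are illustrative assumptions:

    # Minimal sketch of a Python materialized view, assuming Incorta's MV notebook
    # helpers read() and save(); schema names and logic are illustrative.
    from pyspark.sql import functions as F

    invoices = read("EBS_AR.RA_CUSTOMER_TRX_LINES")   # source physical schema table
    summary = (invoices
               .groupBy("CUSTOMER_TRX_ID")
               .agg(F.sum("EXTENDED_AMOUNT").alias("INVOICE_TOTAL")))
    save(summary)  # persist the MV result to the data lake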

Incorta Blueprints:

Blueprints capture best practices and pre-built content for accessing, organizing, and presenting data from popular business solutions. They contain data connection properties, physical schema descriptions, business views, materialized views, and sample queries. Blueprints slash the time and effort required to bring enterprise application data to Azure Synapse users.

Microsoft Azure Synapse:

Azure Synapse reads the prepared, user-friendly Parquet files from the shared data lake, making the data available to analytics users. Since the data lake consists of standard Parquet files, companies can optionally access the data from other third-party tools, such as Databricks, to implement machine learning on Spark.
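
For example, a Databricks notebook can read the same Parquet files and fit a model with Spark MLlib. The storage path, column names, and model choice below are illustrative assumptions (spark is the session Databricks provides):

    # Minimal sketch: read the shared Parquet from Databricks and fit an MLlib model.
    # Path and column names are illustrative; spark is predefined in Databricks.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    lines = spark.read.parquet(
        "abfss://incorta@mydatalake.dfs.core.windows.net/parquet/EBS_AR/RA_CUSTOMER_TRX_LINES"
    )
    assembled = VectorAssembler(
        inputCols=["QUANTITY_ORDERED", "UNIT_SELLING_PRICE"],
        outputCol="features",
    ).transform(lines.na.drop())
    model = LinearRegression(featuresCol="features", labelCol="EXTENDED_AMOUNT").fit(assembled)
    print(model.coefficients)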

Intelligent Ingest Limitations:

If you want to perform an incremental table load, the incremental column name must be LAST_UPDATE_DATE.
Encrypted columns are not supported; they are ingested into Synapse as strings, as they exist in Parquet.
SSO/LDAP users are not supported. The Python script uses only Incorta internal users for authentication.
For incremental data loading, deduplication is only supported across multiple load jobs, as the solution is built on the assumption that a single load job does not contain duplicates.
If you have multiple tenants in Incorta, Synapse will be supported for a single tenant only.
Tenant and schema export/import for data destinations are not supported.

Authored By
Venkat Nagalla
Senior Tech Lead
Data & Insights Practice
