
CI-D Inbound Data Batch Integration Patterns – Microsoft Power Query – Part 1

Overview

This is the third blog article from the Dynamics 365 Customer Insights Data – Inbound Data Batch Integration Patterns series.

Please make sure to go through the first published blog post of the series, as it provides essential concepts and semantics about the Customer Insights Data (CI-D) data processing architecture that we'll reuse throughout this post.

In this third blog post, we'll focus on the Microsoft Power Query Integration Pattern.

Microsoft Power Query is a common experience found in different Microsoft products such as Excel, Power BI or the Power Platform.

In the case of CI-D, the underlying execution engine is the Power Platform, where a CI-D data source using the Power Query option will be created as a Power Platform Data Flow (PPDF) in the underlying Dataverse environment supporting the CI-D instance.

Power Platform Dataflows originating from CI-D Data Sources are created as Analytical Dataflows. The resulting Data Sets from these Analytical Dataflows are stored in the Source Lake.


Microsoft Power Query is an “import” type pattern which induces data movement and requires Source Lake storage capacity. It’s important to understand that when using a Power Query flow, the resulting output Data Set will be stored in the Source Lake.


As part of this blog post series, we propose 5 decision factors to be detailed and assessed when evaluating Integration Patterns.

These decision factors are presented in descending order of priority for the decision process.


The decision factors are described in the first blog post from this series.

2.1 Data Sources Types, Volumes and Wrangling complexity

Integration Pattern fits when: Expected data volumes and transformation complexity are low to medium, and/or the identified Data Sources do not raise particular concerns (in terms of connectivity, data type conversion complexity, etc.).

Description
Though Power Query can handle significant data volumes when the proper conditions are met, its performance and reliability can be impaired by multiple factors, such as the connectors being used or the wrangling complexity. We do not recommend leveraging this pattern for data sources that exceed 5 million records.

You should be aware of Power Query best practices when moving forward with this pattern. Selecting the proper Data Connector for a particular Data Source and optimizing your query are key to achieving the best level of performance and reliability. You should also check whether a Data Source supports Query Folding, as folding allows part or all of the resulting query execution to be pushed down to the Data Source engine.
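As a sketch, the following Power Query (M) query illustrates a folding-friendly shape against a hypothetical SQL Server source; the server, database, table, and column names are illustrative only:

```powerquery-m
// Folding-friendly sketch against a hypothetical SQL Server source.
// Server, database, table, and column names are illustrative only.
let
    Source = Sql.Database("contoso-sql.database.windows.net", "MembershipDb"),
    Members = Source{[Schema = "dbo", Item = "Member"]}[Data],

    // Table.SelectRows and Table.SelectColumns fold to a SQL WHERE clause and
    // SELECT list, so filtering and column pruning execute on the database
    // engine instead of in the Power Query mashup engine.
    ActiveMembers = Table.SelectRows(Members, each [Status] = "Active"),
    Result = Table.SelectColumns(ActiveMembers, {"MemberId", "Email", "RenewalDate"})
in
    Result
```

Conversely, steps such as Table.Buffer or custom M functions applied early in a query typically break folding, forcing all downstream steps to run in the mashup engine; where available, the query folding indicators in Power Query Online help you verify which steps fold.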

2.2 Existing Integration Solutions / Integration IT Teams

Integration Pattern fits when: Limited pre-existing Data Integration capabilities are available in the customer landscape and/or Data Integration Teams have limited availability for the CI-D implementation project.

Description
The overall Solution architecture should benefit from the Customer's existing Integration solutions and the associated Integration teams' expertise, where they exist, ensuring that the CI-D implementation can rely on proven industrialization and support processes.

Power Query is to be favored when such solutions / teams are lacking in the customer context, but it can also be leveraged in pre-implementation phases (POC, POV, rapid prototyping of Data Flows in a CI-D sandbox environment) where Integration teams may not yet be engaged.

Another use case is a requirement for Business User autonomy, following a “Citizen Data Integrator” approach, as Power Query offers a highly interactive and intuitive experience for rapidly and iteratively building queries.

2.3 Existing Azure assets

Integration Pattern fits when: Customer has a low to medium level of pre-existing Azure services and/or Azure knowledge.

Description
This pattern does not require pre-existing investments in Azure services (such as ADLS Gen2 or Synapse).

Existing Customer investments in ADLS Gen2, Azure Data Lake, and Dataverse should be identified and leveraged, as they may provide acceleration benefits for the CI-D implementation.

2.4 Cost Of Ownership

Integration Pattern fits when: The customer expects a low cost of ownership associated with its CI-D Solution Architecture.

Description
When a Customer lacks existing Data integration solutions, CI-D provides the Power Query experience at no extra cost for the customer (cf. the Dynamics 365 Licensing Guide).

2.5 Time to market

Integration Pattern fits when: Customer expects a fast CI-D implementation with an optimized overall project duration until the first Go-Live.

Description
Power Query has been designed to address typical data integration challenges so that the Data Preparation time of the project can be drastically reduced.

It can also support a “Citizen Data Integrator” approach, as noted earlier, particularly when the IT Teams in charge of Data integration have no bandwidth for the project implementation.


3.1 Context

This walkthrough is based on a real customer implementation of CI-D. The customer is a recognized association of professionals in the Financial Services industry.

The Customer was looking to increase its renewal rates by providing an enhanced membership experience. Understanding which members are most at risk of not renewing is essential to reaching the right members, with the right messaging, at the right time.

The Customer already had invested in Azure and had experience with Power BI.

Its four primary objectives were:
  • Create a unified view of its members through the consolidation of member, campaign, and behavioral data.
  • Enhance data from scores and models and use a propensity model to measure member engagement and likelihood to renew.
  • Democratize data through dashboards - creating a suite of Power BI dashboards to be a self-serve model and provide training and support to all user groups.
  • Enable more personalized marketing communications to deliver the right message to the right customer at the right time.
3.2 Data Sources

Three established Data Sources were identified:
  • Membership Management, based on SQL Server in a dedicated Azure VNet and accessed via a data gateway, contributes 500k members, their segments, and some activities (e.g. meeting attendance, mentoring details, purchases, etc.), spanning a total of 14 tables.
  • Google Analytics: website tracking data is staged in Google BigQuery, where some pre-filtering keeps only relevant information about identifiable users before ingestion into CI-D.
  • Email Marketing data is acquired by tracking email communications (clicks, opens, bounces, etc.) for service messaging and marketing communications. The exposed API only allows access to the last 30 days of data, so an intermediary extraction and persistence layer was designed using an Azure Function and an Azure SQL Database, which is then ingested by Customer Insights Data (around 3.5 million records).
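To illustrate, a minimal (hypothetical) M query navigating to the pre-filtered BigQuery staging table might look like the following; the project, dataset, table, and column names are placeholders:

```powerquery-m
// Sketch: reading the pre-filtered Google Analytics staging table from BigQuery.
// Project, dataset, table, and column names are placeholders.
let
    Source = GoogleBigQuery.Database(),
    Project = Source{[Name = "contoso-analytics"]}[Data],
    Dataset = Project{[Name = "ga_staging", Kind = "Schema"]}[Data],
    Sessions = Dataset{[Name = "identified_sessions", Kind = "Table"]}[Data],

    // This filter folds to BigQuery SQL, so only rows for identifiable users
    // are transferred into CI-D's Source Lake.
    Identified = Table.SelectRows(Sessions, each [user_id] <> null)
in
    Identified
```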
3.3 Decision Factors assessment

  • Data Sources Types, Volumes and Wrangling complexity: Data Sources are of a low-complexity type (SQL Server, Google BigQuery) with existing connectors in Power Query. Data volume is relatively low: 500K customer profiles and a total of 3.5 million marketing emails are ingested into CI-D. Wrangling tasks happen before data ingestion into CI-D, leading to a low wrangling complexity for PPDF: Google Analytics data is pre-filtered in Google BigQuery, and the Azure Function service associated with the Azure SQL Database handles the first ingestion from the Email Marketing tool APIs. Power Query’s “Query Folding” capability can be leveraged on the SQL Server and BigQuery Data Sources to optimize data ingestion processing.
  • Existing Integration Solutions / Integration IT Teams: The Customer had IT Teams (internal and partners) already skilled on some Azure services such as the Data Gateway and Azure Functions.
  • Existing Azure assets: The Customer has a low level of pre-existing Azure assets, relying on Azure to operate SQL Server in an IaaS model, plus a few other services in place such as the Data Gateway.
  • Cost Of Ownership: The Customer expected a streamlined architecture that limits recurring OPEX fees and does not induce heavy workloads on the IT side for run and support activities.
  • Time to market: The Customer first wanted to quickly prove CI-D value, starting from a POC in a very iterative approach. Since Power Query provides native connectivity to its SQL Server sources and BigQuery, it allows quickly ingesting the available data into CI-D and focusing on prototyping the first CI-D use cases.


4.1 Batch Data Processing Architecture – Data Integration with Power Query Connectors (PPDF)



4.2 Walkthrough Implementation example



5.1 Pattern pre-requisites and associated services entitlement

Leveraging the Microsoft Power Query Integration Pattern (a.k.a. Power Platform Dataflow) requires a Dataverse environment to be provisioned.

The Dataverse environment is created by the customer through the Power Platform Admin Center prior to CI-D instance creation: Create a new environment - Dynamics 365 Customer Insights | Microsoft Learn


A data source based on Power Query creates a dataflow in Dataverse. Don't change the name or modify the query steps of a dataflow in the Power Platform admin center when it is used in Customer Insights - Data.
Modifying or renaming a dataflow causes issues with the references between the data source and the Dataverse dataflow.

5.1.1 Dataverse entitlement

When leveraging PPDF, be aware that all Power Query inbound integration flows write data files into the Dataverse environment associated with CI-D, and these files count toward the Dataverse File Storage entitlement.

For the vast majority of our customers, the Dataverse File Storage consumed by CI-D PPDF flows is covered by the acquired CI-D licenses, as CI-D licenses come with an additional Dataverse capacity entitlement, as described in our Dynamics 365 Licensing Guide.

5.1.2 Power Query Online Service Entitlement

The Customer Insights Data subscription provides the customer with the usage of the Power Query Online service.

Limits are enforced on this service’s usage. Power Query Online limits are described in this table; refer to the “Dataflows in Dynamics 365 Customer Insights” product row, which provides the limits that apply to CI-D usage.


5.2 Accessing firewalled and / or On-Premises Data Sources

The Power Query integration pattern can access on-premises data sources or cloud resources which are restricted in terms of network access (not exposed on Public Networks).

To achieve this, you must set up one or more On-Premises Data Gateways, which act as a bridge facilitating secure data transfer between your on-premises data sources (or other cloud resources) and the service. Virtual Network Data Gateways can also be used to access Azure cloud data that cannot be exposed on public networks.

Please note that the Power Query integration pattern still requires a supported connector to access your on-premises / restricted cloud data sources.


Data gateways from an existing Power BI or Power Apps environment are visible and can be reused in Customer Insights, provided the data gateway and the Customer Insights Data environment are in the same Azure region.
Ensure you follow the guidance and troubleshooting links provided in our public documentation for Data Gateway usage.

5.3 Incremental Data

As of today, though Power Platform Dataflows support incremental refresh capabilities, the Power Query pattern doesn’t support incremental data refresh in a CI-D context.
