DataLakeIngestion

Description

This solution automatically ingests binary data created in an external datastore into Microsoft Energy Data Services. While the example uses Azure Data Lake Storage, any compatible storage solution could theoretically be used.

Shared Prerequisites

These prerequisites are needed to deploy the solution. Expand each prerequisite in the list below to see example code.

Azure CLI

Download from aka.ms/azurecli.
Log in to the Azure CLI with the command below, using an account with subscription owner rights:

az login

Verify that the right subscription is selected:

az account show

If the correct subscription is not selected, run the following command:

az account set --subscription <subscription-id>
Azure Resource Providers
az provider register --namespace Microsoft.DataFactory
az provider register --namespace Microsoft.DataLakeStore
az provider register --namespace Microsoft.OpenEnergyPlatform
az provider register --namespace Microsoft.Sql
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.Synapse
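Registration can take a few minutes. To verify that a provider has finished registering, you can query its registration state (shown here for Microsoft.Synapse; repeat for the other namespaces as needed):

```shell
# Prints "Registered" once registration has completed.
az provider show --namespace Microsoft.Synapse --query registrationState --output tsv
```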
Azure Resource Group
az group create `
    --name <resource-group> `
    --location <location>
Example
az group create `
    --name medssynapse-rg `
    --location westeurope
Azure Data Lake Storage or Azure Storage Account
az storage account create `
    --name <storage-account> `
    --resource-group <resource-group> `
    --sku Standard_LRS `
    --hns true
Example
az storage account create `
    --name eirikmedsadls `
    --resource-group medssynapse-rg `
    --sku Standard_LRS `
    --hns true

Then create a container to use as the source.

az storage container create `
    --account-name <storage-account> `
    --name <container> `
    --auth-mode login
Example
az storage container create `
    --account-name eirikmedsadls `
    --name medssource `
    --auth-mode login
Azure Synapse Workspace
az synapse workspace create `
    --name <workspace-name> `
    --file-system <filesystem> `
    --resource-group <resource-group> `
    --storage-account <storage-account> `
    --sql-admin-login-user <username> `
    --sql-admin-login-password <password> 
Example
az synapse workspace create `
    --name eirikmedssynapse `
    --file-system synapsefs `
    --resource-group medssynapse-rg `
    --storage-account eirikmedsadls `
    --sql-admin-login-user mysqladmin `
    --sql-admin-login-password mysqlpassword1! 

Open the Synapse Workspace firewall for public access. The example below allows traffic from all IP addresses; narrow the range for production use.

az synapse workspace firewall-rule create `
    --name <rule-name> `
    --resource-group <resource-group> `
    --workspace-name <workspace-name> `
    --start-ip-address <start-ip> `
    --end-ip-address <end-ip>
Example
az synapse workspace firewall-rule create `
    --name allowAll `
    --resource-group medssynapse-rg `
    --workspace-name eirikmedssynapse `
    --start-ip-address 0.0.0.0 `
    --end-ip-address 255.255.255.255
Microsoft Energy Data Services

As this is a gated Public Preview product, please see the instructions at learn.microsoft.com.


Authentication Mechanism

There are two main ways to authenticate your API calls to Microsoft Energy Data Services. Choose only one of them, and follow its guide throughout.

NOTE: Click the header link of your chosen authentication method to proceed with the pipeline deployment.

Alt 1. Managed Identity

This method uses the managed identity of your Synapse Workspace in conjunction with the scope of the Microsoft Energy Data Services Application Registration. It is the easiest way to authenticate, as no secrets need to be stored for the runtime to work. However, security policies may be in place that prevent this authentication mechanism.

Pros

  • Fewer prerequisites and less configuration
  • Less secrets management
  • Pipeline runs are quicker due to fewer activities

Cons

  • May be restricted by policy
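In the pipeline itself, the managed identity is used by configuring the Web activity's authentication as the system-assigned managed identity, with the resource set to the client ID of the Microsoft Energy Data Services Application Registration. As an illustration of the same token flow, any Azure compute resource with a managed identity can request a token from the Azure Instance Metadata Service (the `<client-id>` placeholder is that app registration's client ID):

```shell
# Illustration only: request a managed-identity access token from the
# Azure Instance Metadata Service. This works from inside Azure compute
# (not from a local machine); Synapse performs the equivalent step for you.
curl -s -H "Metadata: true" "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=<client-id>"
```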

Alt 2. Access Token (WORK IN PROGRESS)

This authentication method uses an Application Registration to fetch an access token, and passes it in the API requests.

Pros

  • Easier to debug
  • Should work "no matter what"

Cons

  • Additional prerequisites required
  • Pipeline runs are slower due to the additional token activities
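As a sketch of what the pipeline's token activity does, the access token can be fetched with the standard Azure AD client credentials flow. The placeholders below (tenant ID, client ID, client secret) come from the Application Registration and are assumptions outside this guide:

```shell
# Request an access token via the OAuth 2.0 client credentials grant.
# The returned JSON contains an "access_token" field, which is then
# passed as a Bearer token in the Authorization header of API calls.
curl -X POST "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token" `
    -d "grant_type=client_credentials" `
    -d "client_id=<client-id>" `
    -d "client_secret=<client-secret>" `
    -d "scope=<client-id>/.default"
```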