- Tomáš Pajurek - CTO at Spotflow, Team Lead and Software Architect at Datamole.
- David Nepožitek - Software Engineer at Spotflow and Datamole.
Datamole helps industrial companies become more sustainable and profitable by developing innovative data & AI-driven solutions. Spotflow is Datamole's spin-off, building an industrial IoT platform for managing devices and collecting data reliably at scale.
See the Course Outline in the root of the repository.
- On-demand compute & storage resources accessible over the Internet.
- Mostly multi-tenant.
- Infrastructure as a Service (IaaS): virtual machines, disks, networking, dedicated machines.
- Platform as a Service (PaaS): web servers, databases, serverless, Kubernetes.
- Software as a Service (SaaS): applications, analytics, machine learning, geospatial, etc.
- New ideas can be tested immediately without up-front investments and delays.
- Infrastructure for the applications can be added just-in-time.
- Including e.g. GPUs, NVMes, SGX/SEV.
- No need to plan too far ahead.
- Suitable for incremental development.
- No need to over-allocate resources. Infrastructure over-provisioning can be avoided with pay-as-you-go pricing.
- The capacity of the system can be adjusted/scaled to current demand.
- Scaling is very flexible and can happen even on a less-than-hourly basis.
Compared to on-premises solutions, deploying the system in multiple regions is easier if needed.
Different systems or even different parts of a single system can choose the trade-off between:
- IaaS services that can be highly customized but bring higher operational overhead.
- PaaS and SaaS services that are not as customizable but bring lower operational overhead.
- Infrastructure can be redesigned at any time.
- The cost of provisioning new infrastructure does not have to be considered.
- ROI of existing infrastructure does not have to be considered.
Network errors, overloaded servers, unavailable services, VMs being restarted, and DNS being reconfigured are the norm; software running in the cloud must be designed to handle them.
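To make the "design for failure" point concrete, here is a minimal sketch of a retry with exponential backoff (the `sendAsync` operation is a placeholder, not a specific API):

```csharp
// Retries a transient-failure-prone operation with exponential backoff
// (1 s, 2 s, 4 s, ...). Any network call could stand in for sendAsync.
async Task<string> SendWithRetryAsync(Func<Task<string>> sendAsync, int maxAttempts = 5)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await sendAsync();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));
        }
    }
}

// Example usage with a fake operation that fails twice, then succeeds.
var attempts = 0;
var result = await SendWithRetryAsync(() =>
    ++attempts < 3
        ? Task.FromException<string>(new HttpRequestException("transient error"))
        : Task.FromResult("ok"));
Console.WriteLine($"{result} after {attempts} attempts");
```

Note that production code would retry only errors known to be transient; catching every exception is an oversimplification here.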
Everything is exposed as an API over the Internet. There is no layer of physical security as in traditional data centers. Every mistake in configuration or access control is a very serious one.
Also, from time to time, a cloud service is deprecated, which requires migration to another service.
IaaS services are quite generic and portable. However, PaaS and SaaS services are often very specific to the cloud provider, and migration to other providers is hard or impossible without a system redesign (e.g., due to reliance on specific service guarantees or a specific cost model).
- Significant issue for some companies.
- Government access requests?
- Quite complicated cost structure - compute, storage, egress, ingress, and more.
- Can be much more expensive than comparable on-premises "bare metal" hardware.
- For predictable workloads, cost can be optimized with capacity reservations but will typically remain higher than on-premises.
There are multiple points of view:
- Performance.
- Reliability.
- Power usage.
- Real-time.
- Deployment of devices into the field.
- Networking.
- Security.
- Updates.
- (Stateful) connection of many devices.
- Storing of data.
- Processing of data.
- Cloud-to-device communication.
The development of backends for IoT solutions is software engineering like any other. However, some challenges are more common in IoT solutions than in typical business applications. Many of these challenges are also relevant for observability solutions.
- Long connectivity disruptions, as well as other failures, are the norm, not an exception.
- Network bandwidth is limited.
- Disk space is limited.
- Harsh prioritization is needed.
- Extremely unreliable clocks on devices.
It is much easier to process data from devices if there are at least some guarantees about ordering.
In-order processing is not trivial in distributed systems.
- Retries, fail-overs.
- Trade-off with high availability.
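As a toy illustration of what restoring order involves, here is a sketch that buffers out-of-order messages by sequence number (the message shape is made up):

```csharp
using System.Collections.Generic;

// Restores processing order using a sequence number: out-of-order
// messages are buffered and released once the gap is filled.
var buffer = new SortedDictionary<long, string>();
long nextSeq = 0;

void OnMessage(long seq, string payload)
{
    buffer[seq] = payload;
    while (buffer.TryGetValue(nextSeq, out var ready))
    {
        buffer.Remove(nextSeq);
        Console.WriteLine($"processing #{nextSeq}: {ready}");
        nextSeq++;
    }
}

// Messages may arrive out of order:
OnMessage(1, "b");
OnMessage(0, "a"); // releases #0 and then the buffered #1
OnMessage(2, "c");
```

In a real system this is complicated further by retries (duplicate sequence numbers), fail-overs (losing the buffer), and the need to bound the buffer's memory.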
Logging/tracing every single transaction (e.g., collecting one sensor reading) is not feasible:
- Designing a logging/tracing strategy is non-trivial.
- Debugging is harder.
Typical IoT solutions need at least some functionality to work in a close-to-real-time manner (i.e., with minimal latency from the moment a device sends a data point to the moment the data point is processed).
Low latency negatively impacts throughput (e.g., the possibility of batching data points is limited).
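A hedged sketch of this trade-off, batching data points until a size or time limit is hit (the limits and channel setup are arbitrary):

```csharp
using System.Threading.Channels;

var channel = Channel.CreateUnbounded<int>();

// Producer: a device emitting one data point every 30 ms.
_ = Task.Run(async () =>
{
    for (var i = 0; i < 10; i++)
    {
        await channel.Writer.WriteAsync(i);
        await Task.Delay(30);
    }
    channel.Writer.Complete();
});

// Consumer: flush when 4 points accumulate or 100 ms pass, whichever
// comes first. Larger batches raise throughput but add latency.
var reader = channel.Reader;
while (await reader.WaitToReadAsync())
{
    var batch = new List<int>();
    using var window = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
    try
    {
        while (batch.Count < 4 && await reader.WaitToReadAsync(window.Token))
            while (batch.Count < 4 && reader.TryRead(out var item))
                batch.Add(item);
    }
    catch (OperationCanceledException) { /* time window elapsed */ }
    Console.WriteLine($"flushing batch of {batch.Count} data points");
}
```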
Devices (clients) that are distributed across the globe (or at least across multiple regions) bring additional challenges:
- Varying network reliability and latency.
- Might lead to geo-distributed system architecture.
PaaS serving as a general-purpose transactional SQL database built on top of Microsoft SQL Server.
Azure Docs: https://learn.microsoft.com/en-us/azure/azure-sql/database
Serverless PaaS for running pieces of code (functions) with minimal management overhead.
Azure Docs: https://docs.microsoft.com/en-us/azure/azure-functions/
The function is triggered by an HTTP request and its return value is the HTTP response to the request.
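For illustration, a minimal sketch of such a function in the dotnet-isolated worker model (the `Ping` name is a placeholder; the real assignment functions are created later in this lesson):

```csharp
using System.Net;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;

public class Ping
{
    // Triggered by GET /api/ping; the returned HttpResponseData
    // becomes the HTTP response sent back to the caller.
    [Function("Ping")]
    public async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequestData req)
    {
        var response = req.CreateResponse(HttpStatusCode.OK);
        await response.WriteStringAsync("pong");
        return response;
    }
}
```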
Infrastructure as Code (IaC) tool for Azure services. It allows defining the infrastructure in a declarative way (JSON) and is useful for managing large deployments in a repeatable and predictable way.
Azure Docs: https://docs.microsoft.com/en-us/azure/templates/
```json
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2022-09-01",
            "name": "mystorageaccount",
            "location": "westeurope",
            "sku": {
                "name": "Standard_LRS"
            },
            "kind": "StorageV2"
        }
    ]
}
```
Your client operates a delivery company with five sorting facilities. In these facilities, robots retrieve parcels from the inbound zones and transport them to the outbound zones, where they are prepared for the next stages of delivery. The client wants to keep track of the parcel movement within the facilities and get daily reports.
Example:
Robot R-1 moves parcel 4242 from inbound zone I-12 to outbound zone O-25 in 40 seconds.
The client needs answers for the following:
- What was the daily volume of parcels transported within the facilities?
- What was the daily average transportation time?
- How did parcel 4242 move within facility F-1 on the day 2024-04-20?
They want to consume the data via an HTTP API from their auditing service.
Let's assume that (date, facilityId, parcelId) identifies exactly one transport within the facility. That is, on a specific day, in a particular facility, each parcel could be transported at most once.
- HTTP API
- Event Consumer
- Stats Reporter
- Storage
Request Method: POST

Request Query Parameters: None

Request Body:

```json
{
    "transportId": "15asd55cvgh",
    "parcelId": "sf546ad465asd",
    "facilityId": "prague-e12",
    "transportedAt": "2022-04-05T15:01:02Z",
    "locationFrom": "in-25",
    "locationTo": "out-35",
    "transportDurationSec": 31,
    "deviceId": "sorter-1654345"
}
```

Response Codes:
- `201 Created` - Event was successfully stored.
- `400 Bad Request` - Body is not in the correct form.

Response Body: None
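One possible way to model this payload in the backend is a C# record whose properties map to the JSON fields via `System.Text.Json` attributes (the type name is illustrative, not prescribed by the assignment):

```csharp
using System.Text.Json.Serialization;

// Possible DTO for the Event Consumer request body; each property
// mirrors one JSON field of the payload shown above.
public record TransportEvent(
    [property: JsonPropertyName("transportId")] string TransportId,
    [property: JsonPropertyName("parcelId")] string ParcelId,
    [property: JsonPropertyName("facilityId")] string FacilityId,
    [property: JsonPropertyName("transportedAt")] DateTimeOffset TransportedAt,
    [property: JsonPropertyName("locationFrom")] string LocationFrom,
    [property: JsonPropertyName("locationTo")] string LocationTo,
    [property: JsonPropertyName("transportDurationSec")] int TransportDurationSec,
    [property: JsonPropertyName("deviceId")] string DeviceId);
```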
Request Method: GET

Request Query Parameters:
- `date` - the day for which the statistics are calculated, in the form `yyyy-MM-dd`

Request Body: None

Response Codes:
- `200 OK` - Statistics calculated and returned in the body
- `204 No Content` - No events exist for the given day
- `400 Bad Request` - The query parameter `date` is not correct

Response Body:

```json
{
    "day": "20220405",
    "totalTransported": 42,
    "avgDurationOfTransportationSec": 40.2
}
```
Request Method: GET

Request Query Parameters:
- `date` - the day of the transportation, in the form `yyyy-MM-dd`
- `facilityId` - name of the sorting facility
- `parcelId` - id of the parcel

Request Body: None

Response Codes:
- `200 OK` - Transport found and returned in the body
- `204 No Content` - No matching transport exists for the given parameters
- `400 Bad Request` - A query parameter is missing or not correct

Response Body:

```json
{
    "transportedDate": "2022-04-05",
    "facilityId": "prague",
    "parcelId": "123",
    "transportedAt": "2022-04-05T15:01:02+00:00",
    "locationFrom": "in-25",
    "locationTo": "out-35",
    "timeSpentSeconds": 31,
    "deviceId": "sorter-1654345",
    "transportId": "15asd55cvgh"
}
```
Prerequisites: Azure CLI
Clone the repository:

```bash
git clone https://github.com/datamole-ai/mff-cloud-app-development.git
```
Navigate to the `lesson-1/arm` directory and see the ARM template. It contains:
- Function App
- Azure SQL
- Storage Account
- App Insights
First, create a new resource group. In the following command, replace `<resource-group>` with your own name (e.g., `mff-iot-{name}-{surname}`).

```bash
az group create --location 'WestEurope' -n <resource-group>
```
Edit the values in `lesson-1/arm/resources.azrm.parameters.json` so they are unique.

Then, deploy the infrastructure defined in `lesson-1/arm/resources.azrm.json` with the following command:
```powershell
cd lesson-1/arm
az deployment group create `
    --name "deploy-mff-task-components" `
    --resource-group "<resource-group>" `
    --template-file "resources.azrm.json" `
    --parameters "resources.azrm.parameters.json" `
    --parameters adminPassword=<password-to-sql-server>
```
Copy the connection string to the database for local development. It should appear in the output as follows:

```json
"outputs": {
    "sqlConnectionString": {
        "type": "String",
        "value": "Server=tcp:mff-iot-sql-...database.windows.net,1433;Initial Catalog=transports;User ID=mffAdmin;Password=<password-to-sql-server>;"
    }
}
```
Prerequisites: Azure Functions Core Tools
Create the Azure Functions .NET project (docs here):

```bash
mkdir ./iot-usecase-1
cd ./iot-usecase-1
func init "AzureFunctions" --worker-runtime "dotnet-isolated" --target-framework "net8.0"
```
Create the individual Azure Functions:

```bash
cd ./iot-usecase-1/AzureFunctions
func new --name "EventConsumer" --template "HTTP trigger" --authlevel "function"
func new --name "GetDailyStatistics" --template "HTTP trigger" --authlevel "function"
func new --name "GetTransport" --template "HTTP trigger" --authlevel "function"
```
Publish the functions to Azure:

```bash
cd iot-usecase-1/AzureFunctions
func azure functionApp publish "<name-of-the-functionapp>" --show-keys
```
In the output, you will receive the URI of each Azure Function. Note them down.
```text
EventConsumer - [httpTrigger]
    Invoke url: https://<name-of-the-functionapp>.azurewebsites.net/api/eventconsumer?code=<code>
GetDailyStatistics - [httpTrigger]
    Invoke url: https://<name-of-the-functionapp>.azurewebsites.net/api/getdailystatistics?code=<code>
GetTransport - [httpTrigger]
    Invoke url: https://<name-of-the-functionapp>.azurewebsites.net/api/gettransport?code=<code>
```
You can test the functions with your HTTP client of choice:

```bash
curl "https://<name-of-the-functionapp>.azurewebsites.net/api/gettransport?code=<code>"
```
Open the Azure Portal in your browser.
Find the SQL Database resource.
Go to the Query Editor in the left panel.
Log in using the admin credentials.
Execute the following query:
```sql
CREATE TABLE [Transports] (
    [TransportedDate] date NOT NULL,
    [FacilityId] nvarchar(255) NOT NULL,
    [ParcelId] nvarchar(255) NOT NULL,
    [TransportedAt] datetimeoffset NOT NULL,
    [LocationFrom] nvarchar(max) NOT NULL,
    [LocationTo] nvarchar(max) NOT NULL,
    [TimeSpentSeconds] bigint NOT NULL,
    [DeviceId] nvarchar(max) NOT NULL,
    [TransportId] nvarchar(max) NOT NULL,
    CONSTRAINT [PK_Transports] PRIMARY KEY ([TransportedDate], [FacilityId], [ParcelId])
);
```
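For illustration, a sketch of inserting one row into this table from C# with the `Microsoft.Data.SqlClient` NuGet package (the reference implementation may structure this differently; the connection string placeholder comes from the ARM output above):

```csharp
using Microsoft.Data.SqlClient;

// Sketch: insert one transport row using a parameterized command.
var connectionString = "<the-connection-string-from-arm>";

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

await using var command = new SqlCommand(
    @"INSERT INTO [Transports]
          ([TransportedDate], [FacilityId], [ParcelId], [TransportedAt],
           [LocationFrom], [LocationTo], [TimeSpentSeconds], [DeviceId], [TransportId])
      VALUES (@date, @facility, @parcel, @at, @from, @to, @seconds, @device, @transport);",
    connection);

command.Parameters.AddWithValue("@date", new DateTime(2022, 4, 5));
command.Parameters.AddWithValue("@facility", "prague-e12");
command.Parameters.AddWithValue("@parcel", "sf546ad465asd");
command.Parameters.AddWithValue("@at", DateTimeOffset.Parse("2022-04-05T15:01:02Z"));
command.Parameters.AddWithValue("@from", "in-25");
command.Parameters.AddWithValue("@to", "out-35");
command.Parameters.AddWithValue("@seconds", 31L);
command.Parameters.AddWithValue("@device", "sorter-1654345");
command.Parameters.AddWithValue("@transport", "15asd55cvgh");

await command.ExecuteNonQueryAsync();
```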
You can find the reference implementation in `sln/AzureFunctions`.
Note that it is definitely not production-ready for many reasons (missing error handling, validation, and observability). It should rather serve as a minimal example of how to glue the Azure resources and code together.
PowerShell:

```powershell
$body = @{
    locationFrom="a";
    locationTo="b";
    transportDurationSec=30;
    parcelId="1";
    transportedAt="2022-04-05T15:01:02Z";
    deviceId="sorter-123";
    facilityId="facility-123";
    transportId="t-4156";
} | ConvertTo-Json
Invoke-WebRequest -Uri <event-consumer-uri> -Method Post -Body $body -ContentType "application/json"
```
cURL:

```bash
curl -X POST -H "Content-Type: application/json" \
    -d '{"parcelId": "12345", "facilityId": "prague", "transportedAt": "2022-04-05T15:01:02Z", "locationFrom": "in-25", "locationTo": "out-35", "transportDurationSec": 50, "deviceId": "sorter-1654345", "transportId": "t-4156"}' \
    <event-consumer-uri>
```
PowerShell:

```powershell
Invoke-WebRequest -Uri "https://<name-of-the-functionapp>.azurewebsites.net/api/getdailystatistics?code=<func_code>&date=2022-04-05"
```

cURL:

```bash
curl "https://<name-of-the-functionapp>.azurewebsites.net/api/getdailystatistics?code=<func_code>&date=2022-04-05"
```
PowerShell:

```powershell
Invoke-WebRequest -Uri "https://<name-of-the-functionapp>.azurewebsites.net/api/gettransport?code=<func_code>&date=2022-04-05&facilityId=prague&parcelId=123"
```

cURL:

```bash
curl "https://<name-of-the-functionapp>.azurewebsites.net/api/gettransport?code=<func_code>&date=2022-04-05&facilityId=prague&parcelId=123"
```
Log output of each function can be read via Portal -> Function App `<name-of-the-functionapp>` -> Functions -> Select the Function -> Monitor -> Logs tab.
NOTE: Only HTTP-triggered functions can be tested locally.
Add the connection string from the ARM deployment to `local.settings.json`. The values will be accessible to the functions as environment variables and also automatically loaded as configuration.
```json
{
    "IsEncrypted": false,
    "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "FUNCTIONS_WORKER_RUNTIME": "dotnet-isolated",
        // Add this:
        "TransportsDbConnectionString": "<the-connection-string-from-arm>"
    }
}
```
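Inside the function code, such a value can then be read like any other environment variable; a minimal sketch:

```csharp
// Values from local.settings.json (and, when deployed, the Function
// App's application settings) are exposed as environment variables.
var connectionString =
    Environment.GetEnvironmentVariable("TransportsDbConnectionString")
    ?? throw new InvalidOperationException("TransportsDbConnectionString is not set.");
```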
Then navigate to `<project-root>/sln/AzureFunctions` and run:

```bash
func start
```

Then use the requests from the section Test.
Open the Azure Portal in your browser.
Find the SQL Database resource.
Go to the Query Editor in the left panel.
Log in using the admin credentials.
Find the table: Tables -> dbo.Transports.
Right-click and select "Select Top 1000 Rows".