Organizations produce, consume, and store large amounts of data. Streaming data is data that is continuously generated by various sources, processed incrementally, and analyzed in near real time. It gives businesses a live view of all aspects of the organization, enabling leaders to make decisions quickly and respond to issues faster than other forms of data processing allow.
What is Data Engineering?
Data Engineering is the process of taking raw data sets and building the infrastructure required to transform them into meaningful data, stored so that it is accessible in a usable format. This involves sourcing data (from databases, data warehouses, application programming interfaces, etc.), moving it (from source to target), storing it, securing it, and governing it (for quality).
What is Public Cloud?
The public cloud is defined as computing services offered by third-party providers over the public internet and available to anyone who wants to use or purchase them. These services may be free or sold on demand, so customers pay only for the CPU cycles, storage, or bandwidth they consume. Azure is Microsoft's public cloud platform, with three main capabilities: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
What is an Analytical Data Flow?
A dataflow that loads data into analytical entities is categorized as an analytical dataflow.
Assumptions
While it is essentially free to upload data into the cloud and relatively inexpensive to store it there, extracting data from the cloud can be costly. A common theme in Data Management is to hit an application or data source once, grab everything, and store it for future use. This approach lends itself well to cloud computing, which offers large storage capacity and avoids the hardware maintenance costs that businesses incur with on-prem data centers. Further, organizations need to protect their data against hackers and data breaches. Beyond the loss of data, breaches can be costly to resolve and can damage the organization’s brand and customer experience. Therefore, security should be part of every data decision you make.
Let’s get started!
Log in to your Azure portal at https://portal.azure.com/
Set up a Resource Group. This is a logical grouping of resources. In an enterprise, resource groups are typically broken out by department, line of business, or subject matter.
Next, create a Storage Account so the data can be stored as binary large objects (blobs) in Azure Blob Storage. Azure Blob Storage (like Azure Data Lake Storage) is comparable to the Hadoop Distributed File System (HDFS): it manages large data sets running on commodity hardware within Azure.
Note: Tie the Storage Account to the Resource Group and Region. It is important to be consistent on the Region because moving data between regions incurs additional charges. As always, choose your performance tier based on the data needs of your organization.
Azure Blob Storage
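Before creating these resources, it can help to sanity-check your plan. The sketch below is only an illustration: the resource names and regions are hypothetical. It relies on the Azure rule that storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, and it flags any planned resource whose region differs from the one you chose.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the storage account naming rule."""
    return _STORAGE_NAME.fullmatch(name) is not None


def regions_mismatched(resources: dict, expected_region: str) -> list:
    """Return the names of resources whose region differs from expected_region.

    Regions are compared case-insensitively and without spaces, so
    "East US" and "eastus" are treated as the same region.
    """
    def normalize(region: str) -> str:
        return region.replace(" ", "").lower()

    expected = normalize(expected_region)
    return sorted(
        name for name, region in resources.items()
        if normalize(region) != expected
    )


# Hypothetical plan for this walkthrough (names and regions are placeholders).
planned = {
    "rg-streaming": "East US",
    "cryptostream01": "eastus",
    "ehns-crypto": "West US",
}
```

With the plan above, regions_mismatched(planned, "eastus") would report only ehns-crypto, which you would then move to the same region before deploying.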
Next, create an Event Hub Namespace. This is a logical grouping of Event Hubs. Then create your Event Hub within the namespace. The Event Hub is your message broker: it ingests streaming data and feeds it to Azure.
Azure Event Hub
Now that your Event Hub is set up, configure its shared access signature (SAS) policies and copy the connection string for the policy you will use. These policies govern who can manage, send to, and listen on a specific Event Hub.
Connection
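An Event Hubs connection string is a semicolon-separated list of key=value pairs (Endpoint, SharedAccessKeyName, SharedAccessKey, and optionally EntityPath). As a rough sketch, the helper below splits one apart so you can confirm which policy and Event Hub it points to before using it. The function names and the sample values are made up for illustration; the key shown is not a real secret.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Event Hubs connection string into its key/value parts."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 keys often end in "=" padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value
    return parts


def describe_access(conn_str: str) -> dict:
    """Summarize a connection string without exposing the secret key."""
    parts = parse_connection_string(conn_str)
    required = ("Endpoint", "SharedAccessKeyName", "SharedAccessKey")
    missing = [name for name in required if name not in parts]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    return {
        "endpoint": parts["Endpoint"],
        "policy": parts["SharedAccessKeyName"],
        # EntityPath is only present on Event Hub level connection strings.
        "event_hub": parts.get("EntityPath"),
    }


# Placeholder values only; substitute the string copied from the portal.
SAMPLE = (
    "Endpoint=sb://example-ns.servicebus.windows.net/;"
    "SharedAccessKeyName=send-only;"
    "SharedAccessKey=abc123==;"
    "EntityPath=crypto-prices"
)
```

A policy that only grants Send is usually enough for a producer application, which keeps the blast radius small if the connection string leaks.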
Azure Stream Analytics allows you to run queries on a live stream. A Stream Analytics job consists of three main parts: an input, an output, and a query that defines how Stream Analytics will process the data.
Azure Stream Analytics
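To make the query part more concrete, here is a plain-Python sketch of the kind of aggregation a streaming query often performs: averaging prices per symbol over fixed, non-overlapping (tumbling) time windows. This is not Stream Analytics code; in the service you would write this in its SQL-like query language. The event fields (symbol, price, ts) are assumptions, and the window boundary convention used here (start inclusive, end exclusive) is a simplification.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average price per (window_start, symbol) over fixed-size windows.

    events: iterable of dicts with 'symbol', 'price', and 'ts' (epoch seconds).
    Each event falls into exactly one window: [start, start + window_seconds).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of prices, count]
    for event in events:
        window_start = int(event["ts"] // window_seconds) * window_seconds
        bucket = totals[(window_start, event["symbol"])]
        bucket[0] += event["price"]
        bucket[1] += 1
    return {
        key: total / count
        for key, (total, count) in sorted(totals.items())
    }


# Hypothetical events for illustration only.
sample_events = [
    {"symbol": "BTC", "price": 100.0, "ts": 0},
    {"symbol": "BTC", "price": 200.0, "ts": 5},
    {"symbol": "ETH", "price": 10.0, "ts": 3},
    {"symbol": "BTC", "price": 300.0, "ts": 10},
]
```

Calling tumbling_window_average(sample_events, 10) groups the first two BTC events into the window starting at 0 and the last one into the window starting at 10, which is the shape of result a dashboard would then plot over time.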
Now that we’ve established that you can send messages through Event Hub and store them in containers, the Stream Analytics job will take messages from Event Hub and put them in your desired output, which in this case is a Power BI data set that will be consumed in Power BI Pro to show updates in real time.
Power BI
What About the Data?
I used an API from CryptoCompare.com to get current pricing for Bitcoin (BTC), Ethereum (ETH), and Litecoin (LTC). I ran a Python application to pull the data from CryptoCompare and push it to Azure Event Hub using the connection string from the shared access policy.
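The original application is not shown here, so the following is only a minimal sketch of how such a producer might look. It assumes CryptoCompare's public multi-price endpoint and its response shape (a JSON object keyed by symbol, then by currency), and it assumes the azure-eventhub package (version 5) is installed. The function names and the message fields are illustrative choices, not the author's.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: CryptoCompare's public multi-price endpoint; check their docs.
PRICE_URL = "https://min-api.cryptocompare.com/data/pricemulti"


def build_price_url(symbols, currency="USD"):
    """Build the request URL for the given symbols, e.g. ["BTC", "ETH"]."""
    query = urllib.parse.urlencode(
        {"fsyms": ",".join(symbols), "tsyms": currency}, safe=","
    )
    return f"{PRICE_URL}?{query}"


def fetch_prices(symbols, currency="USD", timeout=10):
    """Fetch current prices; assumed shape: {"BTC": {"USD": 123.4}, ...}."""
    url = build_price_url(symbols, currency)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def to_messages(prices, currency="USD", timestamp=None):
    """Turn the price response into one JSON message per symbol."""
    ts = time.time() if timestamp is None else timestamp
    return [
        json.dumps(
            {"symbol": symbol, "price": quote[currency],
             "currency": currency, "ts": ts}
        )
        for symbol, quote in prices.items()
    ]


def send_to_event_hub(messages, conn_str, eventhub_name):
    """Send the messages to an Event Hub as a single batch."""
    # Imported lazily so the helpers above work without the SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        conn_str, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        for message in messages:
            batch.add(EventData(message))
        producer.send_batch(batch)
```

A polling loop would call fetch_prices(["BTC", "ETH", "LTC"]), pass the result through to_messages, and hand the messages to send_to_event_hub along with the connection string from the shared access policy, sleeping a few seconds between iterations. Keep the connection string in an environment variable or a secret store rather than in the source file.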
How can I use Streaming Data in my Organization?
Once largely confined to media and finance, data streaming is now used across all industries. Streaming connects decision makers with data, providing them with the insights they need to take action. For instance, streaming coal reclaimer idle time or aircraft sensor data could enable leaders to forecast maintenance more proactively and efficiently, thereby minimizing downtime and revenue impact.
What are you waiting for? Get started with Azure today!
Have more Power BI questions? Our team of Power BI experts is here to help!