Azure Synapse Analytics - Highlights

Paraphrased notes from Microsoft documentation:

Azure Synapse Analytics is an integrated cloud-based platform for big data processing and analysis.

You can use it to build descriptive, diagnostic, predictive, and prescriptive analytics solutions.

Azure Synapse Analytics combines a centralized service for data storage and processing with an extensible architecture through which linked services enable you to integrate commonly used data stores, processing platforms, and visualization tools.

A Synapse Analytics workspace defines an instance of the Synapse Analytics service in which you can manage the services and data resources needed for your analytics solution. 

After creating a Synapse Analytics workspace, you can manage its services and perform data analytics tasks with them by using Synapse Studio, a web-based portal for Azure Synapse Analytics.

One of the core resources in a Synapse Analytics workspace is a data lake, in which data files can be stored and processed at scale. A workspace typically has a default data lake, which is implemented as a linked service to an Azure Data Lake Storage Gen2 container.
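As a small illustration of how files in such a container are usually addressed, the sketch below builds an abfss:// URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The helper name abfss_uri and the account, container, and file names are hypothetical and not taken from the notes above.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a file or folder in an ADLS Gen2 container."""
    # Leading slashes are dropped so the path is relative to the container root.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical storage account, container, and file path, for illustration only.
uri = abfss_uri("mydatalake", "files", "/sales/2023/orders.csv")
print(uri)
```

A URI like this is what you would typically pass to Spark or another engine when reading from the workspace's data lake.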

The Azure Synapse SQL system uses a distributed query processing model to parallelize SQL operations, resulting in a highly scalable solution for relational data processing. Azure Synapse Analytics supports SQL-based data querying and manipulation through two kinds of SQL pool that are based on the SQL Server relational database engine:

  • A built-in serverless pool that is optimized for using relational SQL semantics to query file-based data in a data lake.
  • Custom dedicated SQL pools that host relational data warehouses.
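The distributed query processing idea mentioned above can be pictured with a toy example. This is not how Synapse SQL is implemented; it is only a rough sketch, in plain Python, of splitting rows across partitions, aggregating each partition independently, and merging the partial results. All function names, the partition count, and the sample rows are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key, n):
    """Assign each row to one of n partitions by hashing its key column."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        partitions[hash(row[key]) % n].append(row)
    return partitions


def partial_sum(partition):
    """Per-partition work, roughly SUM(amount) ... GROUP BY category."""
    totals = Counter()
    for row in partition:
        totals[row["category"]] += row["amount"]
    return totals


def distributed_sum(rows, n=4):
    """Aggregate each partition in parallel, then merge the partial totals."""
    partitions = distribute(rows, "category", n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(partial_sum, partitions))
    result = Counter()
    for partial in partials:
        result.update(partial)  # adds the partial totals key by key
    return dict(result)


# Made-up sample data.
rows = [
    {"category": "Bikes", "amount": 100},
    {"category": "Helmets", "amount": 40},
    {"category": "Bikes", "amount": 250},
]
totals = distributed_sum(rows)
print(totals)
```

Because each partition is processed on its own, adding more partitions (and more workers) lets the same query cover more data, which is the intuition behind the scalability claim.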

Apache Spark is an open source platform for big data analytics. In Azure Synapse Analytics, you can create one or more Spark pools and use interactive notebooks to combine code and notes as you build solutions for data analytics, machine learning, and data visualization.

A notebook enables you to interactively run Python code in an Apache Spark pool and embed notes using Markdown.
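A notebook cell along these lines might look like the following sketch. It assumes a Spark session is available (in a Synapse notebook one is normally predefined as spark) and that the data is a CSV file with Category and Amount columns; the function name, column names, and file path are hypothetical.

```python
def total_by_category(spark, path):
    """Read CSV files from the data lake and total Amount per Category."""
    # Imported here so the function can be defined even where PySpark is absent.
    from pyspark.sql import functions as F

    df = spark.read.load(path, format="csv", header=True, inferSchema=True)
    return (
        df.groupBy("Category")
        .agg(F.sum("Amount").alias("TotalAmount"))
        .orderBy("Category")
    )


# In a notebook attached to a Spark pool (hypothetical path):
# total_by_category(spark, "abfss://files@mydatalake.dfs.core.windows.net/sales/*.csv").show()
```

Notes explaining the result would then go in a separate Markdown cell next to the code cell.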

Azure Synapse Analytics includes built-in support for creating, running, and managing pipelines that orchestrate the activities necessary to retrieve data from a range of sources, transform the data as required, and load the resulting transformed data into an analytical store. Pipelines in Azure Synapse Analytics are based on the same underlying technology as Azure Data Factory.
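To make the orchestration idea concrete, here is a toy sketch in plain Python. It does not use the Synapse or Data Factory APIs; it only shows activities being run in dependency order (retrieve, then transform, then load). The activity names, the sample rows, and the in-memory "analytical store" are invented for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(activities, depends_on):
    """Run activities in dependency order, passing each its upstream outputs."""
    outputs = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = [outputs[dep] for dep in depends_on.get(name, [])]
        outputs[name] = activities[name](*upstream)
    return outputs


analytical_store = []  # stand-in for a real analytical store


def extract():
    # Pretend these rows came from a source system.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]


def transform(rows):
    # Convert the amount column from text to a number.
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(rows):
    analytical_store.extend(rows)
    return len(rows)  # number of rows loaded


outputs = run_pipeline(
    {"extract": extract, "transform": transform, "load": load},
    {"transform": ["extract"], "load": ["transform"]},
)
print(outputs["load"], analytical_store)
```

In a real pipeline, the activities and their dependencies are defined in the pipeline itself rather than in hand-written code like this.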

Azure Synapse Data Explorer is a data processing engine in Azure Synapse Analytics that is based on the Azure Data Explorer service. Data Explorer uses an intuitive query syntax named Kusto Query Language (KQL) to enable high-performance, low-latency analysis of batch and streaming data.

Azure Synapse Analytics can be integrated with other Azure data services to build end-to-end analytics solutions. These services include:

  • Azure Synapse Link 
  • Microsoft Power BI 
  • Microsoft Purview 
  • Azure Machine Learning

Common use cases for Azure Synapse Analytics are:

  • Large-scale data warehousing
  • Advanced analytics
  • Data exploration and discovery
  • Real-time analytics
  • Data integration
  • Integrated analytics
