Compared: Azure Data Lake Store and Azure Blob Storage

As part of my learning, I look out for information presented as tables and comparison charts, since they summarize lengthy topics and are useful for reviewing what I have learned. I post them with the tag ComparisonChart so I can revisit them occasionally.

| | Azure Data Lake Store | Azure Blob Storage |
|---|---|---|
| Purpose | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios |
| Use Cases | Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data |
| Key Concepts | Data Lake Store account contains folders, which in turn contain data stored as files | Storage account has containers, which in turn hold data in the form of blobs |
| Structure | Hierarchical file system | Object store with flat namespace |
| API | REST API over HTTPS | REST API over HTTP/HTTPS |
| Server-side API | WebHDFS-compatible REST API | Azure Blob Storage REST API |
| Hadoop File System Client | Yes | Yes |
| Data Operations - Authentication | Based on Azure Active Directory identities | Based on shared secrets: Account Access Keys and Shared Access Signature Keys |
| Data Operations - Authentication Protocol | OAuth 2.0. Calls must contain a valid JWT (JSON Web Token) issued by Azure Active Directory | Hash-based Message Authentication Code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over a part of the HTTP request |
| Data Operations - Authorization | POSIX Access Control Lists (ACLs). ACLs based on Azure Active Directory identities can be set at file and folder level | For account-level authorization, use Account Access Keys. For account, container, or blob authorization, use Shared Access Signature Keys |
| Data Operations - Auditing | Available | Available |
| Encryption of data at rest | Transparent, server-side: with service-managed keys, or with customer-managed keys in Azure Key Vault | Transparent, server-side: with service-managed keys, or with customer-managed keys in Azure Key Vault (coming soon). Client-side encryption is also supported |
| Management operations (e.g. Account Create) | Role-based access control (RBAC) provided by Azure for account management | Role-based access control (RBAC) provided by Azure for account management |
| Developer SDKs | .NET, Java, Python, Node.js | .NET, Java, Python, Node.js, C++, Ruby |
| Analytics Workload Performance | Optimized performance for parallel analytics workloads. High throughput and IOPS | Not optimized for analytics workloads |
| Size limits | No limits on account sizes, file sizes or number of files | Specific documented limits apply |
| Geo-redundancy | Locally redundant (multiple copies of data in one Azure region) | Locally redundant (LRS), geo-redundant (GRS), read-access geo-redundant (RA-GRS) |
| Service state | Generally available | Generally available |
| Regional availability | See here | See here |
| Price | See Pricing | See Pricing |
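
To make the "Authentication Protocol" row for Blob Storage a little more concrete, here is a minimal sketch of what an HMAC signature over part of a request looks like. It only shows the general technique (HMAC-SHA256 keyed with the Base64-decoded account key, then Base64-encoded). The function name `sign_request`, the demo key, and the simplified string-to-sign are my own assumptions for illustration; the real service defines its own canonical string-to-sign format, which is not reproduced here.

```python
import base64
import hashlib
import hmac


def sign_request(account_key_b64: str, string_to_sign: str) -> str:
    """Return the Base64-encoded HMAC-SHA256 of string_to_sign.

    account_key_b64 is assumed to be a Base64-encoded shared secret,
    which is how storage account access keys are normally presented.
    """
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values for illustration only: not a real key, and not the
# service's actual canonical string-to-sign layout.
demo_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
demo_string_to_sign = "GET\n/myaccount/mycontainer/myblob"

signature = sign_request(demo_key, demo_string_to_sign)
print(signature)
```

The point of the sketch is that the shared secret itself is not put into the request, only a hash keyed with it, whereas the Data Lake Store column relies on a token issued by Azure Active Directory instead.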

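The "Structure" row is easier to remember with a small example. In a flat namespace, a "folder" is only a naming convention: blob names contain a delimiter such as `/`, and a listing with a prefix and delimiter groups them into virtual directories. The sketch below imitates that grouping in plain Python on the client side; it does not call any Azure SDK, and `list_virtual_directory` and the sample blob names are hypothetical.

```python
def list_virtual_directory(blob_names, prefix="", delimiter="/"):
    """Return (virtual_folders, blobs) found directly under prefix.

    blob_names is a flat list of full blob names; nothing here is a real
    directory, the hierarchy is inferred from the delimiter alone.
    """
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # More path segments follow, so this is a virtual folder.
            folders.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


# Hypothetical container contents stored as a flat list of names.
flat_container = [
    "logs/2017/01/app.log",
    "logs/2017/02/app.log",
    "logs/readme.txt",
    "backup.zip",
]

print(list_virtual_directory(flat_container))
print(list_virtual_directory(flat_container, prefix="logs/"))
```

In a hierarchical file system such as Data Lake Store, folders are real objects that can carry their own ACLs, which is why the authorization row differs between the two columns.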