Compared: Azure Data Lake Store and Azure Blob Storage
As part of my learning, I look out for information presented as tables and comparison charts, since they summarize lengthy topics and are handy for reviewing what I have learned. I post them with the tag ComparisonChart so I can revisit them occasionally.
Feature | Azure Data Lake Store | Azure Blob Storage |
---|---|---|
Purpose | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios |
Use Cases | Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming and general purpose data |
Key Concepts | A Data Lake Store account contains folders, which in turn contain data stored as files | A storage account has containers, which in turn hold data in the form of blobs |
Structure | Hierarchical file system | Object store with flat namespace |
API | REST API over HTTPS | REST API over HTTP/HTTPS |
Server-side API | WebHDFS-compatible REST API | Azure Blob Storage REST API |
Hadoop File System Client | Yes | Yes |
Data Operations - Authentication | Based on Azure Active Directory Identities | Based on shared secrets - Account Access Keys and Shared Access Signature Keys. |
Data Operations - Authentication Protocol | OAuth 2.0. Calls must contain a valid JWT (JSON Web Token) issued by Azure Active Directory | Hash-based Message Authentication Code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over a part of the HTTP request. |
Data Operations - Authorization | POSIX Access Control Lists (ACLs). ACLs based on Azure Active Directory Identities can be set at file and folder level. | For account-level authorization: use Account Access Keys. For account, container, or blob authorization: use Shared Access Signature Keys. |
Data Operations - Auditing | Available. See here for information. | Available |
Encryption of data at rest | Transparent, server-side | |
Management operations (e.g. Account Create) | Role-based access control (RBAC) provided by Azure for account management | Role-based access control (RBAC) provided by Azure for account management |
Developer SDKs | .NET, Java, Python, Node.js | .NET, Java, Python, Node.js, C++, Ruby |
Analytics Workload Performance | Optimized performance for parallel analytics workloads. High Throughput and IOPS. | Not optimized for analytics workloads |
Size limits | No limits on account sizes, file sizes or number of files | Specific limits documented here |
Geo-redundancy | Locally redundant (multiple copies of data in one Azure region) | Locally redundant (LRS), geo-redundant (GRS), read-access geo-redundant (RA-GRS). See here for more information |
Service state | Generally available | Generally available |
Regional availability | See here | See here |
Price | See Pricing | See Pricing |
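
To make the Blob Storage authentication row a bit more concrete, here is a minimal Python sketch of the HMAC idea: the account key is Base64-decoded, used to compute an HMAC-SHA256 over a string derived from the request, and the result is Base64-encoded again. The function name `sign_request`, the demo key, and the shortened string-to-sign are my own assumptions for illustration; the real service requires a longer canonicalized string built from specific headers and the resource path, so treat this as a sketch of the mechanism rather than a working client.

```python
import base64
import hashlib
import hmac


def sign_request(string_to_sign: str, account_key_b64: str) -> str:
    """Return the Base64-encoded HMAC-SHA256 signature of string_to_sign."""
    # Account keys are handed out Base64-encoded, so decode to raw bytes first.
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical values for illustration only; not a real account, key, or
# the full canonical string the service expects.
DEMO_KEY = base64.b64encode(b"not-a-real-account-key").decode("ascii")
DEMO_STRING_TO_SIGN = (
    "GET\n"
    "x-ms-date:Mon, 01 Jan 2018 00:00:00 GMT\n"
    "/myaccount/mycontainer/myblob"
)

if __name__ == "__main__":
    signature = sign_request(DEMO_STRING_TO_SIGN, DEMO_KEY)
    # The signature travels in the Authorization header next to the account name.
    print("Authorization: SharedKey myaccount:" + signature)
```

Because the signature depends on both the secret key and the request contents, changing either one produces a different value. That is the contrast with the Data Lake Store column, where the caller presents a token issued by Azure Active Directory and never holds a shared account key.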