Architecture for High-Throughput Low-Latency Big Data Pipeline on Cloud
Excerpts from the article "Scalable Efficient Big Data Pipeline Architecture" by Satish Chandra Gupta
The big data pipeline is the railroad on which the heavy wagons of ML run.
A data pipeline stitches together the end-to-end operation: collecting the data, transforming it into insights, training a model, delivering insights, and applying the model wherever and whenever action needs to be taken to achieve the business goal.
There are five stages in the big data pipeline (a minimal sketch follows the list):
🔹 Collect - Collect data from internal & external sources
🔹 Ingest - Ingest data through batch jobs and streams
🔹 Store - Store in Data Lake and/or Warehouse
🔹 Compute - Compute analytics aggregations and ML features
🔹 Use - Use it in dashboards, data science, and ML
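
To make the Ingest, Compute, and Store stages concrete, here is a minimal sketch using PySpark Structured Streaming. The broker address, the clickstream topic, and the s3a://my-lake/... paths are illustrative assumptions, not details from the article; a real deployment would substitute its own sources and sinks.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Ingest: read a stream of events from Kafka.
# (Requires the spark-sql-kafka connector package; broker and topic
# names below are assumptions for illustration.)
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load())

# Compute: per-minute event counts, tolerating 10 minutes of late data.
# The Kafka source provides a `timestamp` column we can watermark on.
counts = (events
    .withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count())

# Store: append the aggregates to a data-lake path as Parquet.
# (Lake and checkpoint locations are assumed paths.)
query = (counts.writeStream
    .format("parquet")
    .option("path", "s3a://my-lake/aggregates/")
    .option("checkpointLocation", "s3a://my-lake/_checkpoints/aggregates/")
    .outputMode("append")
    .start())

query.awaitTermination()
```

The Use stage then reads these Parquet aggregates from the lake, whether through a dashboard, a notebook, or an ML feature job.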