Data Loading and Storage
Frameworks
| Name / Framework | What It Is (Summary) | Further Reading |
|---|---|---|
| Kimball Dimensional Modeling | Bottom-up warehouse modeling using facts & dimensions; optimized for analytics. | https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/ |
| Inmon Corporate Information Factory | Top-down architecture built around a centralized, normalized enterprise data warehouse that feeds downstream data marts. | https://en.wikipedia.org/wiki/Bill_Inmon |
| Star Schema / Snowflake Schema | Star: a central fact table joined to denormalized dimension tables. Snowflake: the same layout with dimensions normalized into sub-tables. | https://www.databricks.com/glossary/star-schema |
| OLTP vs OLAP | OLTP systems handle many short transactional reads and writes; OLAP systems handle large analytical queries over historical data. | https://www.ibm.com/cloud/learn/olap-vs-oltp |
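| CAP Theorem | States that a distributed data store can guarantee at most two of consistency, availability, and partition tolerance; in practice, it must trade consistency against availability when a network partition occurs. | https://en.wikipedia.org/wiki/CAP_theorem |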
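| ACID vs BASE Models | ACID (atomicity, consistency, isolation, durability) gives strict transactional guarantees; BASE (basically available, soft state, eventual consistency) relaxes them in favor of availability and scale. | https://www.geeksforgeeks.org/difference-between-acid-and-base-in-dbms/ |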
| Data Lakes vs Lakehouse Architecture | Data lakes store raw data cheaply in open file formats; lakehouses add warehouse features such as ACID transactions and schema enforcement on top of that storage. | https://www.databricks.com/discover/pages/lakehouse |
| Lambda Architecture | Runs a batch layer and a real-time (speed) layer in parallel and merges their results in a serving layer. | https://lambda-architecture.net/ |
| Kappa Architecture | Stream-only architecture that simplifies Lambda by removing the batch layer; historical data is reprocessed by replaying the event log. | https://martin.kleppmann.com/2015/01/29/kappa-architecture.html |
| Hot / Warm / Cold Data Tiering | Places data on storage tiers by access frequency: fast, costlier storage for hot data and cheaper, slower storage for cold or archival data. | https://www.ibm.com/think/topics/data-tiering |
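| Sharding & Partitioning Strategies | Partitioning splits a dataset into smaller pieces (for example by key range or hash); sharding spreads those pieces across multiple nodes to improve performance and scalability. | https://www.mongodb.com/docs/manual/sharding/ |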
| CDC (Change Data Capture) | Captures row-level changes (inserts, updates, deletes) in a source database and streams them downstream in near real time for synchronization and analytics. | https://debezium.io/documentation/ |
| Slowly Changing Dimensions (SCD Types 0–6) | Methods for handling changes to dimension attributes over time, such as overwriting the old value (Type 1) or adding a new versioned row (Type 2). | https://www.kimballgroup.com/2003/02/design-tip-6-slowly-changing-dimensions/ |
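| Message Queue & Stream Processing (Kafka, Kinesis, Pulsar) | Platforms that transport and buffer event streams between producers and consumers, decoupling them and enabling real-time data pipelines. | https://kafka.apache.org/documentation/ |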
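
To make the Slowly Changing Dimensions row more concrete, here is a minimal in-memory sketch of a Type 2 update, where a change closes the current row and appends a new versioned one. The `CustomerRow` class, its fields, and the `apply_scd2` helper are hypothetical names chosen for illustration; a real warehouse would typically do this with SQL `MERGE` or a modeling tool rather than Python lists.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int                  # natural (business) key
    city: str                         # the tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(rows: List[CustomerRow], customer_id: int,
               new_city: str, as_of: date) -> List[CustomerRow]:
    """Return a new row list with the change applied as SCD Type 2."""
    result: List[CustomerRow] = []
    found_current = False
    changed = False
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            found_current = True
            if row.city == new_city:
                result.append(row)  # attribute unchanged: keep the row as-is
            else:
                # Close the current version instead of overwriting it.
                result.append(replace(row, valid_to=as_of))
                changed = True
        else:
            result.append(row)
    # Append a new current version for a changed or brand-new customer.
    if changed or not found_current:
        result.append(CustomerRow(customer_id, new_city, valid_from=as_of))
    return result


# Usage: a customer is first seen in Oslo, then moves to Bergen.
history = apply_scd2([], 1, "Oslo", date(2023, 1, 1))
history = apply_scd2(history, 1, "Bergen", date(2024, 6, 1))
for row in history:
    print(row)
```

The old row is kept with a `valid_to` date, so queries can reconstruct what the dimension looked like at any point in time. A Type 1 update would instead overwrite `city` and lose that history.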

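As a rough illustration of the Sharding & Partitioning row, the sketch below assigns keys to shards with a stable hash. The function names `shard_for` and `partition` and the sample keys are assumptions made for this example, not part of any particular database's API. It uses `hashlib` rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process by default and would not give stable shard assignments.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by the shard each one hashes to."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)


# Usage: distribute a few hypothetical user IDs across 4 shards.
keys = ["user-1", "user-2", "user-3", "user-4", "user-5"]
layout = partition(keys, 4)
for shard_id in sorted(layout):
    print(shard_id, layout[shard_id])
```

One design caveat: with plain modulo hashing, changing `num_shards` remaps most keys to a different shard. Systems that need to add or remove nodes often use consistent hashing or range-based partitioning to limit that data movement.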