Introduction to data modeling part 2

4 min readSep 26, 2022

You can check part one here

What is Key Pints about Data Modeling:

1-Data Organizations:

§ The organization of the data for your applications is extremely important and makes everyone’s life easier.

2-Use Cases:

Having a well thought out and organized data model is critical to how that data can later be used. Queries that could have been straightforward and simple might become complicated queries if data modeling isn’t well thought out.

3-Starting early:

§ Thinking and planning ahead will help you be successful. This is not something you want to leave until the last minute.

4-Iterative Process:

§ Data modeling is not a fixed process. It is iterative as new requirements and data are introduced. Having flexibility will help as new information becomes available.

Example of Way Data Modeling is Important:

Let’s tale an example from Udacity. Here, a Udacity data engineer would help structure the data so it can be used by different people within Udacity for further analysis and also shared with the learner on the website. For instance, Wen we want to track the student’s progress with a Nanodegree program, we want to aggregate data across students and projects within a Nanodegree. In a relational database, this requires the data to be structured in ways that each student’s data is tracked across all Nanodegree programs that s/he has ever enrolled in. The data also needs to track the students’ progress with each of those Nanodegree programs. The data model is critical for accurately representing each data object. For instance, a data table would track a student’s progress on project submissions, i.e., whether they passed or failed a specific rubric requirement. Furthermore, the data model should ensure that a student’s progress is updated and aggregated to provide an indicator of whether the student passed all the rubric requirements and successfully finished the project. Data modeling is critical to track all of these pieces of data, so the tables are speaking to each other, updating the tables correctly and meeting defined rules.

Who Should focus on learning data modeling?

Everyone who deals with data such as Data Scientist, Data engineers and Software Engineers.

Introduction To Relational Databases?

o The relational model and the relational database were invented by IBM and Edgar Codd in the late 60’s early 70’s. The software system used to maintain a relational database is referred to as an EDBMS which stands for the relational database management system.

Example of EDMS include:

Oracle
Teradata
MySQL
PostgreSQL
Microsoft SQL Server

Advantages of Using a Relational Database

Flexibility for writing in SQL queries.
Modeling the data not modeling queries
Ability to do JOINS
Ability to do aggregations and analytics
Secondary indexes available:

You have the advantage of being able to add another index to help with quick searching.

Smaller Data volumes:

If you have a smaller data volume (and not big data) you can use a relational database for its simplicity.

ACID Transactions:

Allows you to meet a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, and thus maintain data integrity.

Easier to change to business requirements.

· ACID Transactions

Properties of database transactions intended to guarantee validity even in the event of errors or power failures.

Atomicity:

The whole transaction is processed, or nothing is processed. A commonly cited example of an atomic transaction is money transactions between two bank accounts. The Transaction of transferring money from one account to the other is make up of two operations. First, you have to withdraw money in one account, and second you have to save the withdrawn money to the second account. An atomic transaction, i.e., This Ensures that if either of those two operations (withdrawing money from the 1st account or saving the money to the 2nd account) fail, the money is neither lost not created.

Consistency:

Only Transactions that abide by constraints and rules are written into the database, otherwise the database keeps the previous state. The data should be correct all rows and tables.

Isolation:

Transactions are processed independently and securely; order does not matter. A low level of isolation enables many users to access the data simultaneously, however this also increases the possibilities of concurrency effects (e.g., lost updates). On the other hand, a high level of isolation reduces these chances of concurrency effects, but also uses more system resources and transactions blocking each other.

Durability:

Completed transactions are saved to database even in cases of system failure. A commonly cited example includes tracking flight seat bookings. So once the flight booking records a confirmed seat booking, the seat remains booked even if a system failure occurs.

When not to use a relational database?

Have large amount of data:

Relational Databases are not distributed databases and because of this they can only scale vertically by adding more storage in the machine itself. You are limited by how much you can scale and how much data you can store on one machine. You cannot add more machines like you can in NoSQL databases.

Need to be able to store different data type formats:

Relational databases are not designed to handle unstructured data.

Need high throughput / Fast reads:

While ACID transactions bring benefits, they also slow down the process of reading and writing data. If you need very fast reads and writes, using a relational database may not suit your needs.

Need a flexible schema:

Flexible schema can allow for columns to be added that do not have to be used by every row, saving disk space.

Need High availability:

The act that relational databases are not distributed (and even when they are, they have a coordinator/worker architecture), they have a single point of failure. When that database goes down, a fail-over to a backup system occurs and takes time.

Need high availability:

The fact that relational databases are not distributed (and even when they are, they have a coordinator/worker architecture), they have a single point of failure. When that database goes down, a fail-over to a backup system occurs and takes time.

Need horizontal scalability:

Horizontal scalability is the ability to add more machines or nodes to a system to increase performance and space for data.

· Note

Relational Databases are not traditionally Horizontally salable.