There are several different kinds of database representations. An Entity Relationship Diagram or a ER model, an Enhanced Entity Relationship Diagram EER model, Object Relational Model etc. ER Model is the most universally followed database convention at least in my experience with software application development. Any enterprise application today would surely have a database connected to it, so I think that understanding database design and to be able to design databases is a “must have” for Software developers personnel. In this article I would walk through some of the essentials to give you the basic idea by giving you an example. Once you know how it’s done in a “Non Textbook” way I guess reading the text book or some advanced tutorial would be very easy and interesting.
The First Step in ER modeling:
The first step in designing the database is to know what are the business objects or so called “Entities”. How to determine what the entities are? The textbook says Read the problem statement or the scenario and try to identify nouns. From a collection of nouns it would be much clearer to determine what the entities are in your system. No doubt it’s a text book approach. I have had very few instances where I had a problem statement and I could do a noun analysis for an object oriented or the database design. So try to think in terms of the business and ask yourself why? This is called the “ROOT CAUSE ANALYSIS”. Ask yourself the question Why the customer wants to develop this system, who would be using the system, what features should be provided and most importantly what data needs to be stored?.
E.g. If someone wants to develop an online forum and you have to design the database for them, try to think what a forum does and generate a problem statement. The requirement would be vaguely “A discussion forum has many categories. In each category there are many topics and each topic may have many messages by different users. Old messages (let’s say 10 days old) are deleted automatically by the system. User can post new messages and edit their message” what would the database design be?It is evident with the problem statement that you need to keep track of categories, messages (post) and user. An entity in the system might be a strong or a weak entity. A Weak entity is one which cannot exist on its own in a system and requires a relationship with a strong entity (the one with a primary key) to identify its records. Such an entity is described by a double lined rectangle in the model. What attributes to store? What is a primary key? What is a partial Key? What are candidate keys? What is a super key? How to select them: Simply ask yourself what is the information that the system should store for a particular entity, importantly how that information would be identified at a later time? Is there a way, an attribute which uniquely identifies any record/row/ tuple in the table? What are some of the other important attributes the system has to store?
The answer to these questions would be very important to you in terms of database design. Primarily while resolving the ER model you would be converting all your entities to tables so what are the columns of that table. They are the characteristics of the entity. e.g. In our scenario what are the important attributes of a User (in this example we only require a loginid and a password but in a slightly complex scenario like a university registration system there might be many attributes like address , phone number , date of birth , previous college , social security etc Similarly what is the data about the categories we want to store (I guess the category name , Description , we would like to know the number of messages in the category and the topic with maximum posts and stuff like that but lets not talk about that here)The set of attributes which can together uniquely identify an entity instance constitute the Super key. The minimum set of attributes (subset of super key) which can uniquely identify a record in the database is called the candidate key. There can be more than one candidate key for the entity e.g. A Student number and the Social security number we use only one as the primary key of the table and the candidate key which we use to uniquely identify any record in the system is called the Primary Key ( The programmer assigned candidate key). Note in case of weak entities the identifying attribute is called the partial key instead of the primary key.
Step 2 : What are relationships and why? What is cardinality and multiplicity?
Once you have identified the entities, all of them think of the relationships and the cardinalities or multiplicity of the relationship. For the sake of simplicity check each of the entity against all others in order to clarify the design. Start with the User. How is the user related to Categories? A user can create a new category? If yes then we have a relationship. Then there are 2 things that need to be answered.1. Can a Single User create more than 1 category2. Can a Single Category be created by more than one user These two questions are the key to determine the cardinalities which would indicate how we resolve the database design in terms of tables.The relationship can have the following cardinalities:
· 1 to 1· 1 to Many· Many to 1· Many to Many
Similarly then compare if the user and topic has a relationship. That would be pretty similar to the first one. Now also compare the user with all the other entities.Note that relationship may also have attributes like the date/time when the message was posted etc.Relationships are indicated as diamonds in the design. A relationship with one of the participating entities is weak is called a weak relationship and is indicated with a double lined diamond. Resolving Relationship in the form of physical Database design:It’s a Science not an art J So you can generate the database physical design by pushing a button given the logical design , but the logical design is not a science its an art , thankfully otherwise we would be out of business JJ.
For 1 to 1 and many to many relationships create a new table for the relationship. Include the primary key of each participating entry. The union of all the primary keys which are referenced as foreign keys can uniquely identify any record in the table. So if the cardinality of both the participating entries is the same make a separate table. However if it is not take the primary key of the entity where cardinality is 1 and include it in the table where it is many as a foreign key. This is also called an identifying relationship sometimes. I would talk a bit about the enhanced ER in my next article; hope this would help you in learning the basics of database design.Some good database design tools are Dezign for databases by datanamic, Microsoft Visio, ERWin, Rational Rose and perhaps some other free design tool. If you have any queries, suggestions or feedback I would really appreciate it.This is a sample DB Design for a STUDENT GRADING SYSTEM which is a solved exercise from “Fundamentals of Database Systems” from ELmasri/ Navathe Question 16 and Question 18
Here is Question 18 where a new structure is to be designed that holds the previous college info , degrees and courses taken at another college:
Note: If I missed an important design concept you were looking for or is there anything you might wanna discuss ? Was this helpful to you ? I would appreciate your comments . Thanks