Vishal SQL Blog: Building Fast-Performing Database Models

Performance tuning of different database model types depends solely on what the database is servicing, in terms of applications connected to that database.Different database model types are tuned in different ways. In general, a database model can be tuned based on what its dependant applications require. It comes down to what the end-users need. The two extreme ends of the scale are the OLTP database model and the data warehouse database model.

Factors Affecting OLTP Database Model Tuning
An OLTP database services the Internet. The primary characteristics of OLTP databases are as follows:

Large user population — OLTP databases have an immeasurably large user population, all trying to get at the same information at once.

Very high concurrency — Concurrency implies a very high degree of sharing of the same information.

Large database size — OLTP databases have small to large databases, depending on application type and user population. A large globally available online book retailer might have a multitude of servers all over the world. A site advertising local night spots for only a single city, in a specific country, has local appeal and, thus, potentially far less information.

Reaction time — Real-time, instantaneous reaction to database changes and activities are essential. If you withdraw cash from an ATM at your bank and then check your statement online in an hour or so, you would expect to see the transaction. Similarly, if you purchase something online, you would hope to see the transaction on your credit card account within minutes, if not seconds.

Small transactions — Users retrieve single records or very small joins.

Granularity — Many OLTP database models are highly normalized structures, but this is often a mistake. OLTP databases allow access to small chunks of data; however, the problem is that sometimes those small chunks of data can actually equate to large multiple table joins caused by excessive normalization. If a table structure is normalized to the point of catering for all business rules in the table structure, performance problems may well arise, even for users seeking to view 10 to 20 records on a single screen. A prime example of this is a user logging onto a bank account a getting bank statement. If all the information on a single sheet of paper (a short Web page) is in a multitude of tables, that user could become seriously irritated with all the data glued together (if it takes more than seven seconds for a response). Thousands of other users could be accessing the same data at the same time.

Manageability — This is usually possible but quite often difficult. OLTP database user populations are generally globally based, round the clock and 365 days a year. This can make managing an OLTP database complex and awkward.

Service window — As already stated, OLTP databases must be alert, awake, and ready for use permanently. This is an ideal, but many service providers sell themselves based on the ability to provide availability at slightly less than 100 percent. Less than 100 percent service time allows for small servicing windows of time.

Factors Affecting Client-Server Database Model Tuning
There are plenty of client-server environments servicing small numbers of users in the range of tens of users or even less. The primary characteristics of client-server databases are as follows:

Small user population — A company can be small or large, on local-or wide-area networks. Predicting and measuring internal company use is much easier than trying to cater to OLTP database capacity requirements.

Low level of concurrency — Company-wide client-server databases have measurable user populations. These populations can be extremely small or relatively large, but it is a quantifiable service requirement because of being a measurable user population. OLTP database requirements are actually quantifiable; however, for OLTP databases, user populations are immeasurably larger, but OLTP database use can often have sudden increases (or decreases), even occasional massive spikes (jumps in end-users). Client-server database concurrency levels are much more predictable than OLTP databases. Predictability implies the ability to prepare for and cater to application requirements more easily.

Database size — Client-server databases are usually small in size. Anything too large, and a client-server architecture simply won't be able to cope with requirements. One solution to over use of client-server architectures is extremely costly hardware. At that stage, costs can probably be reduced by implementing OLTP and data warehouse architectural approaches.

Reaction time — Client-server reaction times are generally acceptable as real-time for single record user interface actions, and perhaps minutes for reporting requirements.

Small and large transactions — Client-server environments combine both small and large transactions in the form of user interface connectivity to data, plus reporting needs, which are small enough to manage at the same time. This type of service is possible because both user population numbers and concurrency requirement levels are low.

Granularity — All items of data are often relatively small and table structures can be more mathematical in nature. Client-server databases can even incorporate large quantities of business rule structure into table structures by utilizing very high levels of normalization, beyond 3NFs.

Note Once again application of high-level normalization is, in my opinion, often more mathematical than practical. Let applications do the number crunching and leave the database to store the data. Don't put too much processing into the database. It is quite possible, but can become very complicated to manage, change, and administer. Modern application SDKs are more than capable of intense processing and number crunching. The purpose of a relational database is to store and apply structure to data. Object databases manage processing inside database objects well. Relational databases do not!

Manageability — Data is fairly easily manageable not only because parameters are small and quantifiable but also because everyone goes home at night, giving plenty of down time for maintenance.

Service window — See this same explanation in the previous section, "Factors Affecting OLTP Database Model Tuning."

Factors Affecting Data Warehouse Database Model Tuning
Data warehouses are all about seriously large amounts of data and a very few — often very technically challenging — application environments:

Minimal user population — Administrators, developers, and analytical-type end-users typically access data warehouses. Those analytical end-users are usually knowledgeable and executive or middle-management level. One of the primary purposes of storing lots and lots of old data in a data warehouse is to help with forecasting for the future. This type of user population finds this type of information extremely useful.

Very low concurrency — There is very little data sharing in a data warehouse. Most activity is read-only, or bulk updates to fact tables, when the database is not being used for reporting and analysis. Concurrency is not really an issue.

Frightening database size — Data warehouses can become incredibly large. Administrators and developers must decide how much detail to retain, when to remove data, when to summarize, and what to summarize. A lot of these decisions are done during production when the data ware-house is in use. It is very difficult to predict what will be needed in design and development phases. Ad-hoc queries can cause serious problems if a data warehouse is very large. User education in relation to how to code proper joins may be essential; otherwise, provision of efficiency providing structures such as pre-built joins and aggregations in materialized views can help.

Note Materialized views copy data, allowing access to physical copies of data and avoiding underlying table access, expensive joins, and aggregations. A relational database allowing use of materialized views uses something called query rewrite. Query rewrite is where requested access to a table in a query, is potentially replaced with access to a much smaller, and more efficient materialized view. I/O and processing activity are substantially reduced. Query performance is helped enormously.

Reaction time — Data warehouse reaction times are acceptable as hours and perhaps even longer. Reaction times depend on various factors, such as data warehouse database physical size, complexity of end-user reporting and analytical requests, granularity of data, and general end-user understanding of the scale of data warehouses.

Incredibly large transactions — Users retrieve large amounts of data, using both simple reporting and highly complex analytical techniques. The fewer tables in joins, the better. Updates are best performed periodically in large batch operations.

Very low granularity — A star schema is the best route to adopt for a data warehouse because it minimizes on the potential numbers of tables in joins. A star schema contains a single large fact table connected to a single layer of very small, descriptive, static dimensional tables. Very small tables can be joined with a single very large table fairly efficiently. When joins involve more than one very large table, serious performance problems can arise.

Very demanding manageability — Because of their size, extremely large databases can become difficult to manage. The larger a database becomes, the more time and hardware resources needed to use and alter that data. Demanding manageability is gradually replaced with more sophisticated means of handling sheer database sized, such as hugely expensive hardware and special tricks (such as clustering, partitioning, parallel processing, and materialized views). Data warehouses are, more often than not, largely read-only structures. This gives far more flexibility, allowing for more available options to cope with a very demanding physical database size.

Service window — Data warehouse service windows are generally not an issue because end-user usage is driven by occasional bursts of furious I/O activity, but generally not constant usage as with an OLTP database. Most I/O activity is read-only. This, of course, depends on the real-time capability of a data warehouse. Real-time reporting requirements in a data warehouse complicate everything substantially, requiring constant real-time updating.

Note One way to alleviate performance issues with data warehouses is the use of data marts. A data mart is a subsection of a larger single data warehouse. A large data warehouse can consist of a number of very large fact tables, linked to the same dimensions. A data mart can be pictured as a single large fact table (perhaps one or two fact table star schemas) linked to a single set of dimensions.

Understanding Database Model Tuning
The biggest problem with database model tuning is that it really must be done during the design phase, and preferably before any development is complete. This is the case as far as tables and their inter-relationships are concerned. Data warehouses are largely read-only and are not as restrictive with production-phase changes. Data warehouses are mostly read-only type environments. Read-only environments can take advantage of specialized database structures, which overlay, duplicate, and summarize data in tables. Materialized views are used extensively in data warehouses and even some OLTP databases. A materialized view allows for copying of table data, either as individual tables or joins. The result is a physical copy of data. Queries then execute against the materialized view copy, which is built based on the requirements of a single query or a group of queries. The result is better performance.

Tuning a database model is the most difficult and expensive option because SQL code depends on the structure of the underlying database model; extensive application code changes can result. The database model underpins and supports everything else. Changes to a database model can cause major application changes, obviously applying after development of application code. The point is that database model tuning changes (such as changes to underlying tables) can affect everything else. Changing everything from database model up is very expensive because everything is dependent on the database model. Everything must be changed. This is why it is so important to get the database model correct before development begins. Unfortunately, we don't live in an ideal world, but we can strive for it. Big changes to database model table structure can often result in what amounts to full rewrites of application software.

An effective way to performance-tune a database model after development is complete, is the creation of alternate indexing. Stored procedures can also help by compartmentalizing, speeding up and organizing what already exists.

When it comes to database model tuning, at the worst and most expensive end of the scale are normalization, denormalization, changes to referential integrity and table structure, and anything else that changes the basic table structure. At best, and with minimal intrusion on existing tables and relationships, alternate indexing, materialized views, clustering, and other such tricks, can help to enhance a database model, without messing around with critical underlying table structure. Database objects such as materialized views and clustering help to circumvent table changes by creating copies and overlays of existing table structures, without affecting those existing tables, and obviously avoiding changes to any dependent application coding already written, tested, debugged, and in general use in a production environment. The down side to overlaying and copying is that there is a limit to how many things such as materialized views that can be created. Too much can hinder rather than help performance.

So, now you know why OLTP databases need less granularity, some denormalization, and small quantities of data. The same applies with the other extreme in that data warehouses need highly denormalized (simple) table structures to minimize table numbers in join queries, thus not severely impeding data warehouse reporting performance.

Vishal SQL Blog

Sunday, September 23, 2007

Building Fast-Performing Database Models

No comments:

Blog Archive

About Me

Links