A Case Study of a Web-Based Online Analytical Processing System

Rumpi Gravenstein                                                        Michael A. Rife

Rumken Inc.                                                                 The Goodyear Tire & Rubber Company

Jennifer Croy                                                                Gerhardt Martin

Rumken Inc.                                                                 The Goodyear Tire & Rubber Company

 

Topics of Discussion

·        Introduction

·        Existing Process

·        Application Vision

·        Project Goals

·        Project Opportunities

·        Application Design

·        Design Opportunity Impacts

·        Lessons Learned

·        Conclusion

Introduction

This is a case study of the development of a custom built, web-based On Line Analytical Processing (OLAP) system.  This OLAP system is designed to analyze returned goods information.  The project started with users requesting a number of enhancements to an existing system.  These requests eventually turned into a multi-year effort that continues today. This paper is a review of the key processes and business issues the project team faced. 

Background

The project is being developed for an engineering group responsible for identifying and managing technical problems in a large, multi-national company’s products.  Over the course of a year, products returned by customers can cost the company millions of dollars.  Therefore is of great interest to determine why they were returned.  The intention is to predict, within a few months of a product’s first return, the total number of returns to expect from that product’s build.  With an accurate prediction, a product deficiency can be eliminated early in its life cycle, saving the company money.

This project’s goal was to provide company product development engineers the facilities with which this type of analysis can efficiently be performed in a large multi-national setting.

Existing Process

The existing product analysis system was implemented with an Oracle database and a mish-mash of Oracle Forms, Reports, Graphs and C programs running in a client server environment.  The professional programming group managed the Oracle tools and C programming environment.  The product engineers used these professionally managed tools along with their own SAS and Excel reports.  The product engineers used their  SAS/Excel environment to perform an initial high level return analysis and then used the Oracle tools and C programs to perform drill down analysis in suspected problem areas. 

Each month new data from corporate systems is extracted, put into flat files and sent to be loaded into our Oracle product analysis database.  These extract files are then loaded into the Oracle database with a number of automated procedures.  Some are written in C/Pro*C; others use the Oracle SQL*Loader environment.  After the data is loaded, the development engineering group runs a SAS extract procedure that primed their system with that month’s new data.  From here, additional SAS/Excel reports are run and then, where warranted, the Oracle tools are used to drill further into the concern areas.

This environment was difficult to maintain. The difficulty started with two separate groups trying to maintain different portions of the same system.  Coordination between the groups was not always possible.  An example that illustrates one of the many problems we encountered is how data errors were handled.  When a product engineer noticed an obvious error, he would fix the SAS extract data file and then re-run the SAS/Excel reports.  This approach allowed the product engineers to find and fix data errors within a few hours.  The professional programmers could not fix the data in the Oracle database because they were working under a policy that stated that their data had to match the data provided by the corporate data systems.  They were therefore reluctant to ‘fix’ data without that data first being fixed and resent from the source corporate system.  The net effect of this policy was that the professional programmer maintained environment would take weeks or even months to reflect data corrections.  This delay was a source of continual friction. 

A number of other issues existed as well as the underlying problem was that two groups with totally different management directives were responsible for maintaining portions of the same system.  The engineers primarily needed to generate accurate reports on a strict schedule.  The programming group needed to accurately provide a mirror of the data housed in the corporate systems.  This project would eventually resolve this problem, but more on that later. 

Application Vision

Given the existing environment, the project team developed a new data application vision.  The vision started with the existing goal that all of the various corporate divisions would provide periodic data feeds to a central database. This database should house a number of globally accessible analysis tools and reports, which the product development engineers could use to monitor product performance.  This environment would provide a high level of security where users could only run reports on data for which they were responsible.  For instance, the European division would be restricted to running reports on products that were either produced or returned with European division processes.  Only a single group, responsible for generating global reports, should be given access to all the data.  Finally, to the extent that a user wanted to perform a division specific analysis not available in the global database, that user could extract the desired data and perform the analysis locally using local tools.  Figure 1 illustrates this general data vision.  In this figure the arrows between the various divisions and the database support three types of communications:  data uploads from the division to the central database, user communications that support running the various reports and tools, and division specific data extracts.

 


Figure 1.  Application Data Vision

 


The pre-existing system did not support this vision in two very important areas:

·         Client/Server environment was not readily accessible globally

·         Existing data security configuration did not adequately secure data

Project Goals

Given this application vision, a set of new project goals were defined to move the existing system to one that better met the vision.  The project goals were:

1.       Store global data:  The existing system stored global data but did not make it securely accessible to all users.

2.       Provide more standardized analysis tools/reports.  There were a large number of desired reports that had been requested and were waiting on implementation resources.

3.       Provide global system access.  The existing system was readily accessible at one site only.  Other sites used the system through a Citrix/Windows Terminal architecture that was slow, expensive to operate, and cumbersome.

4.       Increase report throughput.  All reports had been interactive, requiring users to wait for one report to finish before requesting the next.

5.       Better automate quarterly report generation.  The product engineering process required to generate quarterly reports could take as long as three to four weeks.

6.       Improve data accuracy.  Data sources were providing inaccurate data that would introduce outliers on reports.  This data needed to be fixed in order to perform accurate analysis.

7.       Coordinate programming support.  Programmer support was split into two areas that had different objectives.  This caused problems in that one group would make a change that the other group might not support for several weeks. The unstated goal was to move all programming support to the professional programmers.  As this goal is not a technical but rather an organizational goal, we will not discuss it further except to say that management has now moved all of the programming support to the professional programming group leaving the development engineers with more time to analyze the data.

 

These goals reflected not only the application vision but also general usability concerns.

Project Opportunities

The project goals were then reviewed and organized into the following technical challenges. 

Single threaded user reports.  The existing client/server environment required a user to wait for the current report to finish before requesting the next report.  This process was inefficient as no report queuing system existed.

Reports ran slow.  The existing system was slow.  A typical report might take 15-30 minutes to run.  The slowness was attributed to an overly normalized database design.

Presentations were difficult to develop.  Oracle Report and Pro*C reports were delivered in an either a flat text format or in a printout.  Once one of these reports were generated it was very difficult to electronically annotate the result or even include it in PowerPoint presentations.

Report selection criteria were difficult to use.  Report selection criteria were very lengthy.  Users wanted report criteria organized in the same way files and directories are organized in the Windows Explorer.

Limited report criterion.  The existing reporting environment was limited to 6 reporting criteria.  Additional criteria were desired.

Limited access.  The client/server configuration required to run the environment was only installed in one location.  Users in remote locations could not easily access the system.

Difficult version control environment.  The client/server system was difficult to maintain.  Version upgrades had to be coordinated with a number of Oracle and other application programs.  This coordination made version upgrades difficult.

Limited security.  The existing system did not include a robust security system.  Once a user had access, he had access to all of the data.

Data accuracy.  Some data was clearly wrong.  For instance there are products that are sold for $40 and recorded as a warranty credit for several thousand.  One of the issues faced was that bad data was first recognized during the generation of the quarterly reports.  Fixing data problems as the data is loaded should then significantly reduce the quarterly report generation time.

Application Design

The application design took a while to take shape.  Initially, the global nature of the application suggested some type of web design.  It was also clear that client/server designs would not work because they were too difficult to maintain.  In addition to these considerations, the user community was very familiar with spreadsheet technology.  Excel held considerable appeal because users could take a graph or spreadsheet, annotate it, and then place the result into a PowerPoint presentation or an e-mail.  If the new environment included Excel as the reporting tool, the development team hoped that they could leverage some of the existing Excel applications to reduce the conversion time from the old SAS/Excel reports to the new web Excel environment.  Given all of these issues, it was decided that the application environment needed to be accessible via standard web browsers and require a minimum of non-standard software.  These constraints led to the interactive application design shown in Figure 2.  By interactive we mean that a user requests a report and the browser waits for the results.  In this configuration the Oracle Web-Form requests the report and waits for it to finish.  A key issue in this configuration is that the user is required to have only two locally installed programs to run the application, a browser that can run Oracle web forms and Excel.  The RMDS box in Figure 2 refers to a Report Management Distribution System (RMDS).  This is the area where often-viewed reports should be stored so that they need not be repeatedly regenerated from the database. 

 

In Figure 2 data flows in the following way.  A user starts a browser that connects to an Oracle web form.  This communication occurs via an http communication link.  Once the web form is started, the web form starts a SQL session with the database.  This session is used to access a SQL report package which generates the desired report data into a working table.  From here the Oracle web form drops the data out of the working table into a flat file on the web server which is downloaded to the browser via the http browser link.  Once the data arrives at the user’s PC, the browser recognizes that the extension of the downloaded file is an Excel file extension and should be viewed from Excel.  The browser therefore starts Excel and loads the data into the Excel environment. The user now has several options, he can e-mail the result to a friend, print it out or forward it on to the RMDS environment.

 


Figure 2.  Interactive Application Configuration

Our application environment vision was split into two components, the interactive configuration just described and the batch configuration shown in Figure 3.  In the batch configuration, a user could request multiple reports with the system e-mailing the resulting Excel workbooks back to him.  Here again the user need only have a minimum of software installed locally: a web browser, an e-mail system, and Excel.  Alternatively, if the user just needs a printout, he need only have a browser that can run Oracle web forms.

In Figure 3 the data flow is as follows.   A user starts a browser that connects to an Oracle web form.  This communication occurs with an http link.  Once the web form is started, the web form starts a SQL session with the database.  This session then puts a record into the job queue table to queue up a report.  From here the Job Queue Manager takes the report request and schedules it to run at the desired time.  The report is run via the same report package that was used in the interactive environment.  The report package generates the report data into the work table.  Once the report package has finished generating the report, the job queue manager alerts the Excel environment that a report is waiting to be generated.  The Excel program finds the report, generates the associated workbook and then can do one of the following:  e-mail the workbook back to the requesting user, print the workbook out on the requested printer, or forward the workbook to the RMDS environment. 


Figure 3. Batch Application Configuration

This is the environment that we then implemented.  As users started using this environment they initially focused on running reports in the interactive mode, similar to how they had run reports in the past in their client/server system.  Gradually, they recognized the benefit of running reports in batch mode.  Today, they run nearly all of their reports in batch mode.

This configuration had the additional benefit that remote users on the company’s WAN now had direct access to the application environment.  The need for the Citrix terminal emulation software had been removed.  They also saw significant improvements over their terminal emulation response times.  Finally, users could log into the company WAN over a dial-in network and access the system from their homes and hotel rooms with a minimum of software (browser, Excel, and optionally an e-mail application) installed on their personal computers. 

The two key components that enabled this batch environment are the Job Queue manager and the NT/Excel batch server.  The Excel batch server environment is described in Paper 951: Spinning Web Reports from Oracle Databases with Excel.  We leave it to the reader to read this presentation for further details.  The job-queue manager coordinates running reports and starting the Excel batch server.  The job queue manager technology relies on the Oracle DBMS_LOCK package to protect critical codes sections and the DBMS_ALERT package to alert the constantly waiting Excel batch server to the arrival of a new report to run.  The code supports running multiple reports at the same time in different Oracle threads.  This takes advantage of the multiple processors on the Oracle database server.  In fact it is not uncommon for this application to fully utilize the Oracle RDBMS host hardware, a 6 processor Sun Enterprise server.

The vision for the new system also included a simplified loading environment.  In the existing system, data was loaded with C/Pro*C and SQL*Loader programs.  The C/Pro*C programs were deemed difficult to maintain.  In the interest of reducing the overall system complexity, it was decided that all load programs should use the Oracle SQL*Loader tool.  See Paper 421:  Replacing 3GL Data Loading with SQL*Loader and Database Triggers for more on the advantages of this approach and the details of how to implement it.

Design Opportunity Impacts

In this section we will review the various application opportunities and how the new application vision impacted them.

Single threaded reports.  The pre-existing system forced a user to finish a report before he could start the next.  The new application environment allows users to batch submit a number of reports and then check a queue screen to determine the reports’ progress.

 

Reports run slow.  The existing system had reports that could take an hour or more to run.  This was considered too slow.  The new application environment included a database redesign, which denormalized some of the key fact tables.  These denormalizations along with the introduction of a batch environment have resolved the concern. 

 

Presentations were difficult to develop.  The existing system reports were either printed from the Oracle Development tool environment (Developer Release 1) or saved as postscript files.  Neither output format lent itself to being easily editable.  Forget editing the data, in this environment it was even difficult to cut and paste report outputs into presentations.  The new application environment delivered all reports in Excel workbooks, which do not suffer from these problems.

 

Report selection criteria were difficult to use.  Very long lists of report selection criteria made the lists difficult to use effectively.  This, combined with the lack of organization within the criteria, made this issue a sore point with the user community.  The new application environment made extensive use of the Oracle Forms hierarchical tree structure tool.  Presenting selection criteria within an hierarchical tree simplified the selection of the correct criteria.  There is another presentation given at this conference (Paper 455:  Tips and Tricks for Developer Forms Hierarchical Tree Tools) that delves into the technical details of this approach.  Figure 4 is an example of a hierarchical data view.  This view illustrates an organizational tree in which the folks to the right report to those on their left.  However, the development team did not stop here.  A number of unique, division specific key codes were combined into universal codes that applied to all of the application’s data sources.  These universal codes could then be used to query the data in a much more straightforward manner.


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 4.  Hierarchical View Example

 

Limited report criterion.  The existing reporting environment supported just six report criteria.  The new reporting form used hierarchical selection criteria in a tabbed window environment to offer the user a number of new report parameters.  The report code supported this criteria selection complexity through the use of dynamically generated PL/SQL statements.

 

Limited access.  The client/server system access was limited to those sites that had installed a compatible version of the client tools or the Citrix windows terminal client.  No other site was able to duplicate our client configuration as the coordination of the client side tools proved to be very difficult.  The problem was that other sites used different versions of the Oracle client side tools and could not migrate these tool versions until other Oracle client dependent applications had been migrated as well.  The Windows terminal server configuration proved to be costly and out of favor with the technical computing organization.  The new application structure simplified the client side installation, reducing the client software requirements to a smaller, easier to coordinate set of tools: Oracle web form enabled browser an e-mail system and Excel, thus removing the need for remote sites to access a Windows terminal server.  This new configuration moved the majority of the configuration complexity from a client’s PC to a centrally maintained application web server and Oracle database platform. 

 

Difficult version control environment.  The client/server environment was very difficult to upgrade as coordination was required with other client/server applications running in the same environment.  The new application environment required a minimal client configuration that, because it was small and for the most part independent of the application, easy to upgrade and maintain.  The complex portions of the system were installed on a web server and an Oracle database server.  These hardware platforms were centrally located and therefore much easier to maintain.

 

Limited security.  A user that was given access in the prior system was given access to all of the data.  In the new application environment, the Oracle web form report requesting application restricted users to only requesting data and reports to which they had been given access.  Putting the security in the report requesting screen proved to be the simplest way to implement a very complicated security requirement.  The development team looked at using native Oracle table and row level security approaches but decided against them as they all proved to have a significant negative impact on the report execution times.

 

Data accuracy.  The database design was revised to include additional columns for critical fields that had frequently suspect values.  Thus, our table had the original date of production and a reporting date of production field.  The reporting date is used by the reports and is a computed field while the original date of production is the date provided by the data source.  This approach allowed the system to maintain a ‘mirror’ of the source data while providing more accurate data for the application reports.  Also, the new SQL*Loader based load programs provided extensive reporting on all suspect data.  Although it is still early in the process, it is our hope that these new and improved load reports will catch errors much earlier in the quarterly reporting process, thereby reducing the time and effort required to run the quarterly report.  We should be able to estimate the effectiveness of this approach later this year.

Lessons Learned

Through the course of working on this project there are a number of lessons that have been learned.  Here are some of our highlights:

Gantt charts should focus on the next two to six months

Business processes are in a constant state of flux.  If you spend a lot of time detailing out day and week long tasks 12 months from now, there is a good chance that the effort will be wasted as the business, project staff and a number of other unknowns are likely to have changed, potentially negating the planning effort.  Our experience has shown that it is best to gradually put less detail into the plan as you move further into the future.  In fact, we would recommend that you include only summary level tasks as you schedule six months or more into the future.

Lowest level task durations should be a calendar week or less

Low-level detail tasks that have durations of more then a week are difficult to manage.  We have found that programmers are generally an optimistic lot.  They tend to think that current delays can be made up tomorrow.  Therefore programmers can think they are on-track on long (several week) running tasks until a few days before the task is due.  Given this tendency, it is advisable to split long running tasks into subtasks, at most a week long, so that the project can be accurately tracked by the project manager. 

All third party dependencies should be identified in Gantt Charts

This is a standard, understood principal.  However, it bears repeating that any dependency outside of your development team or outside of the direct control of the project manager needs to be explicitly identified in the project plan.  These dependencies should be noted so that higher-level managers can be made aware of all possible project coordination ‘opportunities’.

All user accessible features should be demonstrated to users as early and often as possible

Again this reflects a standard principal.  The more often demonstrations can be presented to the customer the less likely he will be surprised by what you deliver and the closer your delivery matches what he really wants. 

Product releases should occur at least once a year

This issue speaks to the business horizon under which most sponsors work.  They need to be able to regularly report verifiable progress on projects justifying their belief in the worthiness of your work and of the appropriateness of the resources they have allocated for your project.  As such, it is our experience that it is in your best interest to give the project sponsor a deliverable at least once a year.

Deliver and document cost savings (sponsor justifications) as soon and often as possible

Every cost savings story that your project generates should be documented and presented to the sponsor.  Taking this approach can help maintain, and possibly even generate more, support from your sponsor and others within the organization.

Avoid part-time team members

Part-time project team members can add development resources to a project.  Our issue with them is that if you’re not careful, they will require extra project coordination time as they timeshare your project work with their other project commitments.  This issue is a corollary of the mythical man-month, which states that you cannot add 100 people to a 100-day task and expect that task to finish in a day.  Please note that this is a general rule and, as such, there are exceptions.  Clearly if the part-time resource adds a ‘lightly’ needed or highly regarded skill, it may be in the project’s best interest to take advantage of this part-time team member.

Avoid death spirals

Be careful to avoid resource death spirals.  Resource death spirals most often occur when project resources are stretched too thin.  An example of a resource death spiral is when old programs are being maintained by applying patches.  These patches usually complicate the original programs resulting in a more complex code base.  The more complex code base then requires additional patches, which get progressively to apply as the underlying code gets progressively more complicated.   This process repeats in a spiraling effect.  An organization going through intermediate or final stage death spirals can be recognized by a large number of open bugs, an unhappy user community, and a rapidly aging manager.   The cleanest way to break out of a death spiral is to make a revolutionary change in the underlying program.  Normally the change is to replace the existing programs and start over.

The Cost, Quality, Time Compromise

In most projects that we have been involved with, one of the team members will state that of the three most basic project goals: low cost, high quality, and a short development time, you can only realize two.  This logic is based on linear thinking in which it is assumed that as you spend more time on a project, the project quality will improve and the cost will go up.  This does not have to be the case if you apply the right combination of ingenuity and technical insight.  We have found that far often teams get stuck thinking that the next problem needs to be solved in the same way that the last problem was solved, and since the last solution took three people two weeks, the next solution will take about the same time.  Avoid such thinking.  Instead, spend more time designing.  If you are blocked and can not see an easier way to do the work, look at rotating a new resource on to the development or design team.  Perhaps add a consultant.  Having said all of this, there are some problems that are intractable and cannot easily be sped up by innovation and ingenuity.  Our point is that out of the project gate you shouldn’t settle for the status quo, rather first search for an elegant, innovative solution and only after that fails, go with a tried and true approach.

Conclusion

This paper presented a custom made OLAP environment.  The basic tools used to create this environment were Oracle Web-Forms and Excel.  As we write this paper, over 7,300 reports have been run on this system with users located throughout the world.  We have some users that request as many as 100 reports per day.  We believe that the technology used in developing this project represents a straightforward methodology which you too can use to build a custom web enabled OLAP environment.