SPSS Data Warehousing and Systems Design

Data Warehousing is the process of organizing your data so that analysis and report creation are simple, fast, efficient, error-free, and cost-effective.

Typically, an enterprise has data stored in a variety of formats, with multiple types of software, and often stored on different operating systems. The Data Warehousing approach is to combine all of this information into a small set of files (generally 2 to 3) that contain all of the information commonly used for reporting and/or data mining. By having your information all in one place and in one format, reporting and exploration become much easier.

The typical data warehouse that we develop contains a single file of customer or client activity data tagged with all the relevant demographic and organizational data, and a file of aggregate data that capture summary data for quick reporting.

For instance, a chain of stores has customers, purchases, and numerous store locations. (This is, of course, a simplification. Usually a data warehouse will combine data from 10 to 20 different source files.) We connect the purchases data to the customer data, and the combined purchases/customer data to the store data. The result is the customer-level part of the data warehouse. From there we create a set of aggregate data at the department and store levels for easy reporting. This aggregate file allows for fast processing and, if set up right, can be the backbone of most or all of your summary reporting.
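The store example above can be sketched in a few lines of code. This is only an illustration of the two-file idea (a customer-level file plus an aggregate file), written in Python rather than SPSS so it can be run anywhere. All of the field names, sample records, and the function names build_customer_level and build_aggregates are hypothetical and are not taken from an actual warehouse.

```python
from collections import defaultdict

# Hypothetical source data: in practice these would come from separate files.
customers = {
    101: {"name": "A. Lee", "zip": "78701"},
    102: {"name": "B. Cruz", "zip": "75201"},
}
stores = {
    1: {"city": "Austin"},
    2: {"city": "Dallas"},
}
purchases = [
    {"customer_id": 101, "store_id": 1, "department": "Grocery", "amount": 25.0},
    {"customer_id": 102, "store_id": 1, "department": "Grocery", "amount": 10.5},
    {"customer_id": 101, "store_id": 1, "department": "Pharmacy", "amount": 4.5},
    {"customer_id": 102, "store_id": 2, "department": "Grocery", "amount": 8.0},
]


def build_customer_level(purchases, customers, stores):
    """Tag every purchase with its customer and store attributes."""
    rows = []
    for purchase in purchases:
        row = dict(purchase)
        customer = customers[purchase["customer_id"]]
        store = stores[purchase["store_id"]]
        # Prefix the attached fields so their origin stays obvious.
        row.update({"customer_" + key: value for key, value in customer.items()})
        row.update({"store_" + key: value for key, value in store.items()})
        rows.append(row)
    return rows


def build_aggregates(rows):
    """Summarize the customer-level rows by store and department."""
    totals = defaultdict(lambda: {"n_purchases": 0, "total_amount": 0.0})
    for row in rows:
        key = (row["store_id"], row["department"])
        totals[key]["n_purchases"] += 1
        totals[key]["total_amount"] += row["amount"]
    return dict(totals)


customer_level = build_customer_level(purchases, customers, stores)
aggregates = build_aggregates(customer_level)

for (store_id, department), summary in sorted(aggregates.items()):
    print(store_id, department, summary["n_purchases"], summary["total_amount"])
```

The aggregate file is built from the customer-level file rather than from the raw sources, so the two always agree and summary reports never need to touch the larger file.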

Characteristics of My Data Warehouses

My data warehouses are:

  • Designed to be "punch of a button" easy to run.
  • Designed with automated file builds and automated standard reports.
  • Designed to allow for quick and easy ad-hoc analyses.
  • Designed using well-structured code, making them easy to maintain.
  • Well documented externally (how to run, how to maintain, sources of data files, data contacts, flowcharts/diagrams of program and data flows, etc.).
  • Well documented internally, with the standard program information (name, author, purpose, file and data sources, change log, etc.) as well as references to the specific state or federal rules or other specifications driving that particular piece of code, any deficiencies in the data, clarifications of complex procedures, etc.
  • Designed with automatic data verification routines to spot any problems that occur.
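As a rough idea of what an automatic data verification routine might check, here is a minimal sketch in Python. It assumes a purchases file keyed to customer and store files; the field names, the sample records, and the function verify_purchases are hypothetical, and a real routine would be driven by the rules of the particular data.

```python
# Hypothetical lookup data and incoming records for illustration only.
customers = {101: {"name": "A. Lee"}, 102: {"name": "B. Cruz"}}
stores = {1: {"city": "Austin"}, 2: {"city": "Dallas"}}
purchases = [
    {"customer_id": 101, "store_id": 1, "amount": 25.0},
    {"customer_id": 999, "store_id": 1, "amount": 10.5},   # no such customer
    {"customer_id": 102, "store_id": 7, "amount": -3.0},   # no such store, negative amount
    {"customer_id": 102, "store_id": 2, "amount": None},   # missing amount
]


def verify_purchases(purchases, customers, stores):
    """Return a list of (record number, problem) pairs; empty means no problems found."""
    problems = []
    for number, purchase in enumerate(purchases):
        if purchase.get("customer_id") not in customers:
            problems.append((number, "unknown customer_id"))
        if purchase.get("store_id") not in stores:
            problems.append((number, "unknown store_id"))
        amount = purchase.get("amount")
        if amount is None:
            problems.append((number, "missing amount"))
        elif amount < 0:
            problems.append((number, "negative amount"))
    return problems


problems = verify_purchases(purchases, customers, stores)
for number, problem in problems:
    print("record", number, "-", problem)
```

Running checks like these at the start of every file build means a bad source file is flagged before it reaches the reports.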

Our Data Warehousing Project Options:

We can handle data warehousing projects in a number of ways. We charge by the week (35 hours) and are very flexible in how we structure the project to fit your needs and schedule. The design of the Data Warehouse can be done within a 35-hour time frame if your staff already has some SPSS knowledge and a thorough knowledge of the data and the necessary reports.

We tailor each data warehouse project to the customer's needs. Depending on the system requirements and your staff's level of expertise, we will customize the project so that the end result is both a data warehouse and someone at your organization who can use and maintain it.

For sites where the staff have minimal or no SPSS skills, we can combine the data warehousing project with SPSS/PASW training so that your staff develops the necessary programming skills by building pieces of the data warehouse and eventually putting it all together.

This is the most effective method of training because your staff actually works on the implementation of your data warehouse. This means that your staff will be intimately knowledgeable about the inner workings of your warehouse, how best to use it and, perhaps more importantly, how to maintain it. For the duration of the project, we will be working alongside your staff doing the research and coding the SPSS systems, as well as operating as expert consultants for your staff when they have questions (and there are typically many questions).

Your main decision with regard to the format of your data warehousing project is how much training, if any, you want to include. If you have a staff of experienced SPSS programmers, you may want to minimize the training aspect of your project. On the other hand, if your staff has moderate, little, or no SPSS knowledge, you will probably want a 50/50 split of training time to complement the implementation time. (See the training page for your training options.)

And please note: We do not have to be in the same place at the same time to accomplish all of this. With the magic of the internet, we can do all of the work long distance. This includes customer support, where we use LOGMEIN.com to connect to your computer and provide the training or support you need at the time.


The cost is $140 per hour, or $4395 per week (35 hours), plus sales tax, which is 8.25% in Austin.

Copyright © 2001-2010, Greg Black, Statistics by Greg Black