SDG: A Flexible Synthetic-Databases-Generator

Dr. Jamal Alsabbagh, alsabbaj@gvsu.edu

Large test databases are often needed for benchmarking DBMS query optimizers, tuning performance, and testing queries and database programs. One seemingly obvious option is to use one of the many publicly available databases. Such databases are, however, domain-specific by nature and have their own data distributions, which makes them inappropriate for controlled experimentation. In this work, we have developed SDG (Synthetic Database Generator), which can generate very large synthetic databases (millions of records). SDG generates one or more related relations according to user-defined parameters that specify such characteristics as the number of relations in the schema, relation sizes, value distributions, selectivity factors, and key/foreign-key relationships. The application domain is specified to SDG by a small sample (about a dozen rows) from the desired domain, which acts as a seed for the generated synthetic database. SDG is written in SQL and generates relational data directly, so there is no need for the extra step of loading flat files into a database.
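
The seed-based idea can be illustrated with a minimal sketch. It is not SDG's actual implementation, which is not shown here; it only assumes that a small seed table can be expanded inside the database with plain SQL. The sketch uses Python's standard sqlite3 module to run the SQL, and the table names (seed, customer, orders), column names, and the generate function are all hypothetical. Values are drawn uniformly from the seed by modulo arithmetic, which is the simplest possible value distribution.

```python
import sqlite3

def generate(conn, seed_rows, n_parents, n_children):
    """Expand a small seed sample into two related synthetic tables.

    Hypothetical illustration only: 'customer' is the parent relation,
    'orders' is the child relation holding a foreign key to it.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE seed (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    # Number the seed rows 1..k explicitly so they can be picked by modulo.
    cur.executemany(
        "INSERT INTO seed (id, name, city) VALUES (?, ?, ?)",
        [(i, name, city) for i, (name, city) in enumerate(seed_rows, start=1)],
    )
    k = len(seed_rows)
    cur.execute(
        "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    cur.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "cust_id INTEGER REFERENCES customer(cust_id))"
    )
    # A recursive CTE produces the integers 1..n_parents; each integer
    # selects one seed row, so seed values repeat uniformly.
    cur.execute(
        """
        WITH RECURSIVE nums(i) AS (
            SELECT 1 UNION ALL SELECT i + 1 FROM nums WHERE i < ?
        )
        INSERT INTO customer (cust_id, name, city)
        SELECT i, s.name || '_' || i, s.city
        FROM nums JOIN seed s ON s.id = (i - 1) % ? + 1
        """,
        (n_parents, k),
    )
    # Child rows reference existing parent keys, so every foreign key is valid.
    cur.execute(
        """
        WITH RECURSIVE nums(i) AS (
            SELECT 1 UNION ALL SELECT i + 1 FROM nums WHERE i < ?
        )
        INSERT INTO orders (order_id, cust_id)
        SELECT i, (i - 1) % ? + 1 FROM nums
        """,
        (n_children, n_parents),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
sample = [("Ann", "Detroit"), ("Bob", "Chicago"), ("Eve", "Lansing")]
generate(conn, sample, n_parents=9, n_children=30)
```

In this sketch, a predicate such as city = 'Detroit' selects one third of the customer rows because the three seed cities are reused evenly; a skewed distribution or a different selectivity factor would require a different mapping from row number to seed row.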

