A practical problem of interest is indexing high-dimensional data based on query access patterns. The first iteration of this work used only characteristics of the query history to recommend a set of indexes that reduce a high-dimensional data set to a number of lower-dimensional spaces. While this technique worked well in many situations, certain combinations of query selectivity and data correlation could lead the algorithm to over-recommend indexes. The current iteration provides a more general solution by taking into account both the characteristics of the query access patterns and the data itself. The technique first mines the query access patterns to determine a set of potential indexes, and then performs a cost-based evaluation of those potential indexes against an approximation of the data set in order to decide which indexes to recommend.
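The two-phase idea summarized above can be illustrated with a minimal sketch. It is not the algorithm used in this work: the function names, the support threshold `min_support`, the gain threshold `min_gain`, the greedy selection loop, and the cost model (counting the rows of a small data sample that an index would have to touch) are all assumptions chosen for illustration. In particular, the toy cost model ignores the extra cost of higher-dimensional indexes.

```python
from collections import Counter
from itertools import combinations


def mine_candidate_indexes(query_attrs, min_support, max_size=3):
    """Phase 1 (hypothetical): attribute sets that co-occur in at least
    a min_support fraction of the query history become candidates."""
    counts = Counter()
    for attrs in query_attrs:
        attrs = sorted(set(attrs))
        for k in range(1, min(max_size, len(attrs)) + 1):
            for combo in combinations(attrs, k):
                counts[combo] += 1
    threshold = min_support * len(query_attrs)
    return [frozenset(c) for c, n in counts.items() if n >= threshold]


def index_cost(query, index, sample):
    """Assumed cost model: number of sample rows that satisfy the query's
    range predicates on the indexed attributes (a full scan otherwise)."""
    usable = [a for a in index if a in query]
    if not usable:
        return len(sample)
    return sum(
        all(query[a][0] <= row[a] <= query[a][1] for a in usable)
        for row in sample
    )


def recommend(queries, sample, min_support=0.3, min_gain=0.05):
    """Phase 2 (hypothetical): greedily keep a candidate only if it lowers
    the estimated workload cost on the data sample by a meaningful amount."""
    candidates = mine_candidate_indexes([q.keys() for q in queries], min_support)
    chosen = []
    best = [len(sample)] * len(queries)  # start from full scans

    def total_with(idx):
        return sum(min(b, index_cost(q, idx, sample)) for q, b in zip(queries, best))

    while candidates:
        idx = min(candidates, key=total_with)
        gain = sum(best) - total_with(idx)
        if gain < min_gain * len(sample) * len(queries):
            break  # remaining candidates add little; do not over-recommend
        chosen.append(idx)
        best = [min(b, index_cost(q, idx, sample)) for q, b in zip(queries, best)]
        candidates.remove(idx)
    return chosen


# Toy data: attributes "a" and "b" are perfectly correlated.
sample = [{"a": i, "b": i, "c": i % 2} for i in range(100)]
queries = [{"a": (0, 9), "b": (0, 9)}] * 3
print(recommend(queries, sample))
```

In this toy workload the query history alone makes `a`, `b`, and `{a, b}` look equally attractive. Because `a` and `b` are correlated in the data, evaluating the candidates against the sample shows that a second index adds no benefit once the first is chosen, so only one index is recommended.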
The effectiveness of using query access patterns to address the index selection problem rests on the assumption that future query access patterns will be similar to the ones from which the indexes were derived. To decouple our solution from this assumption, we plan to incorporate a method that tracks the similarity of new query access patterns to the historical query access patterns and provides feedback to the user when the difference is large enough to warrant an index update.
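One possible shape for such a similarity check is sketched below. Since this is planned work, nothing here reflects an actual design: representing a workload as a histogram of queried attribute sets, comparing histograms with cosine similarity, and the `threshold` value of 0.8 are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pattern_histogram(queries):
    """Count how often each set of queried attributes appears."""
    return Counter(frozenset(q) for q in queries)


def cosine_similarity(h1, h2):
    """Cosine similarity between two attribute-set histograms."""
    dot = sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())
    n1 = sqrt(sum(v * v for v in h1.values()))
    n2 = sqrt(sum(v * v for v in h2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


def needs_index_update(historical, recent, threshold=0.8):
    """Flag the workload as drifted when similarity falls below the
    (assumed) threshold, signalling that the indexes may need revisiting."""
    sim = cosine_similarity(pattern_histogram(historical), pattern_histogram(recent))
    return sim < threshold


historical = [["a", "b"]] * 8 + [["c"]] * 2
drifted = [["d", "e"]] * 10
print(needs_index_update(historical, historical))  # same workload
print(needs_index_update(historical, drifted))     # disjoint workload
```

A check of this kind compares only which attributes are queried together. A fuller measure would presumably also account for changes in query selectivity, since selectivity affects the index recommendation as well.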