Using Genetic Algorithms to Solve Knapsack Problems

Motivation

Aspera offers strategic software license management solutions that are supported by  SmartTrack. SmartTrack is a software solution that allows automated software license management and optimization for many different software products. In this thesis we will focus on the distribution of Microsoft SQL Server licenses in single and multi-cluster environments, which is calculated by a metric engine that is part of SmartTrack. Currently the metric engine  only works for single cluster environments and it is not possible to provide an existing set of licenses for the distribution calculation. Additionally, the calculation of the optimal distribution (i.e. distribution of licenses that cover all installations with the lowest sum of costs of licenses) of Microsoft SQL Server licenses is known to be NP-complete.POE14 Since a greedy algorithm is used for the calculation of the distribution an optimal solution is often not found.

Related Work

The distribution of Microsoft SQL Server licenses over clusters is similar to the knapsack problem,POE14 which can be solved with genetic algorithms.EIB15 Thus, in this thesis we will implement and evaluate different solutions to this combinatorial optimization problem that will use genetic algorithms. In addition to genetic algorithms this thesis will combine genetic algorithms with constraint solving. It was shown in another master thesis by author Kristian Scholz that using constraint solving for the Microsoft SQL Server license distribution problem is working, but sometimes runs into local maxima and won’t find the global maxima (i.e. license distribution with the lowest cost).SCH17

Approach

To use genetic algorithms for the license optimization problem, we require a representation of the solution domain as well as a fitness function. Thus, the first step in this thesis will be to model a genetic representation of the solution domain. The solution domain consists of a topology of clusters with Microsoft SQL Server installations, a distribution of licenses that cover the installations, and a set of licenses. A solution is valid if all installations are covered by licenses.

For the fitness function we plan to compare a simple function, which just sums up the cost of all used licenses, with an improved function.

All algorithms that are developed and implemented in the course of this thesis should work with a fixed set of existing licenses. This set of existing licenses must be completely distributed over all installations, before any new licenses are added.

Two different approaches are planned for implementation. The first, which is planned to work for single and multi-cluster environments, will solely use the genetic algorithm to find a solution. For the second approach it is planned to use the output of the constraint solver from Scholz’s master thesis as input and give the calculated solution back to the constraint solver. With this combination of genetic algorithms and constraint solving, we hope to overcome the problem of local maxima of the constraint solving approach.

Both approaches will be evaluated and compared to the existing creedy algorithm and constraint solving solutions.

Project information

Status:

Finished

Thesis for degree:

Master

Student:

Sebastian Rabenhorst

Supervisor:
Id:

2018-001