Development of a Run-time Data Integration Framework in a JVM Environment

Data integration can be defined as the process of combining data from different sources to provide a unified view of the data, on which important business decisions can be based. Such an integration platform must be able to handle heterogeneous data from a large number of sources. Data reads and writes must be carried out systematically so that they exhaust neither memory nor processing capacity. An efficient run-time solution that can coherently access the right set of data, and only when it is required, is therefore important.

KiScript is a scripting language developed by Kisters that provides effective functionality for data processing. It is a complete language environment with its own IDE, syntax and console, and it combines elements of BASIC, Pascal, C and Java. A KiScript script is compiled into object code, which is then executed to obtain the desired result; this approach gives good performance. KiScript also provides a range of functions and modules that make it a stable and flexible framework for data processing.
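The requirement to read data only when it is required can be illustrated with Java's lazy streams. The following is a minimal sketch under stated assumptions. It is not part of KiScript or of the framework developed in this thesis. `LazySource`, `rows` and `firstEvenRows` are hypothetical names, and the "source" is a simulated range of CSV-like rows rather than a real file or database. `Stream` operations are evaluated lazily and `limit` short-circuits the pipeline, so the source is read only until enough matching rows have been found.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical lazy data source, used only for illustration.
class LazySource {
    // Counts how many rows were actually pulled from the simulated source.
    static final AtomicInteger READS = new AtomicInteger();

    // Simulates a large source of CSV-like rows "id,value";
    // nothing is read until a terminal operation runs.
    static Stream<String> rows(int totalRows) {
        return IntStream.range(0, totalRows)
                .mapToObj(i -> {
                    READS.incrementAndGet(); // one "read" per row that is actually requested
                    return i + "," + (i * 10);
                });
    }

    // Returns the first n rows with an even id; limit(n) stops reading
    // from the source as soon as n matching rows have been found.
    static List<String> firstEvenRows(int totalRows, int n) {
        return rows(totalRows)
                .filter(r -> Integer.parseInt(r.split(",")[0]) % 2 == 0)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = firstEvenRows(1_000_000, 3);
        System.out.println(result + " after " + READS.get() + " reads");
    }
}
```

On a simulated source of one million rows, `firstEvenRows(1_000_000, 3)` should pull only the first five rows (ids 0 to 4) through the pipeline, because three even ids have been found by then. A real connector would replace the simulated `IntStream` with something like a cursor over a file or a database result set.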

The main objective of this Master's thesis is to research, implement and analyze whether an efficient run-time solution based on Java or other JVM languages is feasible, and whether such a solution can achieve results similar to or better than those of KiScript. If it succeeds, users would no longer need to learn a dedicated language such as KiScript for data integration. Java and other JVM-based languages are widely used, so users could adopt the solution easily. The main focus is on performing row-wise transformations using lambda expressions and dynamic class generation. A second goal is to deliver a generic template that can be extended and reused for suitable functionality.
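Row-wise transformations with lambda expressions can be pictured as an ordered list of small functions, each applied to one row at a time. The sketch below makes several assumptions. Rows are represented as `Map<String, Object>`, and the names `RowPipeline`, `then`, `run`, `row` and `celsiusPipeline` are hypothetical. It illustrates only the lambda part of the approach and does not cover dynamic class generation. To extend the template, a new transformation is added by passing another lambda to `then`. This is one possible way to make a generic, reusable template extensible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical generic template: an ordered list of row-wise
// transformations, each supplied as a lambda expression.
class RowPipeline {
    private final List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();

    // Registers one transformation step and returns this pipeline for fluent chaining.
    RowPipeline then(UnaryOperator<Map<String, Object>> step) {
        steps.add(step);
        return this;
    }

    // Applies all steps, in registration order, to each row independently.
    List<Map<String, Object>> run(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> current = new LinkedHashMap<>(row); // copy so the input rows stay unchanged
            for (UnaryOperator<Map<String, Object>> step : steps) {
                current = step.apply(current);
            }
            out.add(current);
        }
        return out;
    }

    // Builds a row from alternating column-name/value arguments.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Example pipeline: convert a Fahrenheit column to Celsius, then derive an alert flag.
    static RowPipeline celsiusPipeline() {
        return new RowPipeline()
                .then(r -> {
                    double f = (Double) r.remove("temp_f");
                    r.put("temp_c", (f - 32) * 5 / 9);
                    return r;
                })
                .then(r -> {
                    r.put("alert", (Double) r.get("temp_c") > 30.0);
                    return r;
                });
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = Arrays.asList(
                row("station", "A", "temp_f", 212.0),
                row("station", "B", "temp_f", 50.0));
        for (Map<String, Object> r : celsiusPipeline().run(input)) {
            System.out.println(r);
        }
    }
}
```

With the two example steps, an input row with `temp_f` = 212.0 becomes a row with `temp_c` = 100.0 and `alert` = true. Copying each row before the steps run keeps the input data unchanged. The cost is one extra map allocation per row. A performance-oriented variant could replace the generic map with typed row classes, for instance classes generated at run time.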

Project information

Thesis for degree:

Shilpa Sreenivasa