Discovering Deviations of Domain Specific Source Language from Domain Language

Well-designed software is often written following Domain-Driven Design, the practice introduced by Eric Evans, so that the business domain maps cleanly onto the source code. One of the core concepts of Domain-Driven Design is the Ubiquitous Language, which defines how the concepts in the domain model are named. With a ubiquitous language, programmers, domain experts and customers can communicate easily with one another, and the concepts under discussion carry no second meanings. As the domain evolves, however, the ubiquitous language grows larger and more complicated. Companies hire new developers, who sometimes introduce new identifiers for an existing concept, so new terms enter the vocabulary. This makes it very difficult to keep the ubiquitous language consistent across the whole project. This thesis is about identifying and normalizing the source code vocabulary, particularly the terms that denote concepts. Normalization takes place in three steps: parsing the source code vocabulary; applying an identifier splitting and expansion algorithm to split and expand the concept names and identifier names; and constructing a graph of concepts and identifiers that is used to suggest problems in the naming of concepts.
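The three steps can be illustrated with a minimal sketch. It is not the algorithm used in the thesis, only one plausible reading of it: the abbreviation table, the function names and the sample identifiers are all hypothetical, and the "graph" is reduced to a mapping from each expanded term to the spellings and identifiers that use it.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table (an assumption for illustration);
# in practice it would be derived from the project's domain glossary.
ABBREVIATIONS = {"cust": "customer", "acct": "account", "num": "number"}


def split_identifier(name):
    """Split camelCase, PascalCase and snake_case names into lowercase tokens."""
    tokens = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalised or lowercase words, and digit runs.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]


def build_term_graph(identifiers):
    """Map each expanded term to its raw spellings and the identifiers using them."""
    graph = defaultdict(lambda: defaultdict(set))
    for ident in identifiers:
        for raw in split_identifier(ident):
            expanded = ABBREVIATIONS.get(raw, raw)
            graph[expanded][raw].add(ident)
    return graph


def inconsistent_terms(graph):
    """Return the terms that appear under more than one spelling."""
    return {
        term: sorted(spellings)
        for term, spellings in graph.items()
        if len(spellings) > 1
    }


if __name__ == "__main__":
    # Hypothetical identifiers that a parser might extract from source code.
    vocabulary = ["custAcctNum", "customer_account", "AccountNumber"]
    for term, spellings in inconsistent_terms(build_term_graph(vocabulary)).items():
        print(term, "is spelled as", spellings)
```

In this sketch a term is reported when the code base refers to it by more than one spelling, for example both `cust` and `customer`. A fuller approach would also compare the expanded terms against the concept names of the domain model itself.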

Project information

Status:

Finished

Thesis for degree:

Master

Student:

Arj Shahid

Supervisor:
Id:

2020-025