Systematic Classification and Prioritization of Code Smells based on Historical Data


“Those who don’t know history are doomed to repeat it.” ― Edmund Burke

Code smell detection tools detect anomalous code based on current structural information (i.e., metrics). This information is important for selecting refactoring candidates for the current or next release. However, not all detected code smells are equally important to refactor. Some of them carry less risk (i.e., they have been stable over several releases, cause little harm, and undergo few or no changes). Therefore, to prioritize code smells for refactoring, we should examine the change history to reveal how they behaved in the past (i.e., whether they changed frequently or not).

Although many tools and techniques can detect the worst smells (based on a specific quality model), the history of those smells remains a mystery. By investigating their history, we can reveal their characteristics and patterns. Each characteristic should have a clear reason why it arises. Knowing the characteristics and their reasons can help a developer justify and prioritize the smells. In addition, mining the change history is still a tedious task. Selecting the right version of the repository, analyzing the intended artifacts, and presenting the results are all still done manually using a revision control tool (e.g., Git). Although mining software repository tools exist, they have different purposes and scopes and do not focus on the change history of code smells.
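As a rough illustration of how the manual Git step could be automated, the sketch below counts how many commits touched each file. It assumes a local Git repository and relies on `git log --name-only --pretty=format:`, which prints only the paths changed by each commit. The function names `count_changes` and `git_change_counts` are hypothetical, and the sketch ignores renames and does not restrict the history to particular releases.

```python
import subprocess
from collections import Counter


def count_changes(log_text: str) -> Counter:
    """Count commits per path from `git log --name-only --pretty=format:` output.

    Each non-empty line is a path touched by one commit, so the number of
    times a path appears equals the number of commits that changed it.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def git_change_counts(repo_dir: str) -> Counter:
    """Run git in `repo_dir` (assumed to be a Git working copy) and count changes."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_changes(result.stdout)
```

The resulting counts could then be joined with the output of a smell detector, so that each detected smell carries a change frequency.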

Therefore, in this thesis, we plan to develop a systematic classification process to identify a set of code smell characteristics based on change history. These characteristics could indicate or justify whether a code smell needs to be refactored urgently or can be deferred to a later release. The process may then help prioritize the list of identified code smells by assigning a weight or rank to each characteristic.

Research Questions

  1. What are the characteristics of detected code smells (e.g., god class) based on change history?
    • Possible characteristics:
      • H1: Stable but ugly -> Metrics/Indicators?
      • H2: Non-stable and ugly -> Metrics/Indicators?
      • H3: [TBC] -> Metrics/Indicators?
  2. What steps or procedures are needed to extract change history information and classify particular detected code smells?
    • What are the generic steps?
      • H4: Extract, analyze and present metrics
    • What are the requirements?
      • Tools, repository, version, etc.
  3. How should we prioritize based on this information and these metrics?
    • How should the smells be ranked, or how much weight should be assigned to each?
      • H5: Ugly and non-stable smells have a high rank/weight in terms of risk
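To make hypotheses H1, H2 and H5 more concrete, the following minimal sketch classifies a detected smell as "stable but ugly" or "non-stable and ugly" from its change rate, and ranks smells by a weighted score. Everything here is an assumption for illustration: the `SmellRecord` fields, the threshold of one change per release, and the weights are hypothetical placeholders, not values proposed by the thesis.

```python
from dataclasses import dataclass

# Hypothetical threshold: more than one change per release counts as non-stable.
CHANGE_RATE_THRESHOLD = 1.0

# Hypothetical weights reflecting H5: non-stable smells carry more risk.
WEIGHTS = {"stable-ugly": 1.0, "non-stable-ugly": 2.0}


@dataclass
class SmellRecord:
    entity: str      # e.g., class name flagged as a god class
    severity: float  # smell score from a detection tool (assumed to be available)
    changes: int     # number of changes in the observed history
    releases: int    # number of releases covered by that history


def classify(record: SmellRecord) -> str:
    """Label a smell by its change rate (changes per release)."""
    rate = record.changes / record.releases if record.releases else 0.0
    return "non-stable-ugly" if rate > CHANGE_RATE_THRESHOLD else "stable-ugly"


def prioritize(records: list[SmellRecord]) -> list[SmellRecord]:
    """Sort smells from highest to lowest weighted risk score."""
    return sorted(
        records,
        key=lambda r: r.severity * WEIGHTS[classify(r)],
        reverse=True,
    )
```

With these placeholder values, a moderately severe smell that changes often can outrank a more severe smell that has stayed untouched, which is the behaviour H5 anticipates.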

Planning Phases

Phase 1: Investigate characteristics or patterns in the change history of detected code smells

Phase 2: Develop an approach to extract, analyze and classify those smells from change history data

Phase 3: Validate the proposed classification approach against other change history data and assess its precision and applicability

Intended Results

  • A set of code smell characteristics (after classification) based on change history
    • Metrics that represent the identified characteristics
  • A systematic approach to extract, analyze and classify those characteristics from change history
  • A rank or weight of badness (in terms of risk) for each identified characteristic, for prioritizing refactoring candidates


  1. Vidal, S. A., Marcos, C., & Díaz-Pace, J. A. (2016). An approach to prioritize code smells for refactoring. Automated Software Engineering, 23(3), 501–532. doi:10.1007/s10515-014-0175-x
  2. Palomba, F., Bavota, G., Di Penta, M., Oliveto, R., Poshyvanyk, D., & De Lucia, A. (2015). Mining Version Histories for Detecting Code Smells. IEEE Transactions on Software Engineering, 41(5), 462–489. doi:10.1109/TSE.2014.2372760
  3. Snipes, W., Robinson, B., & Murphy-Hill, E. (2011). Code Hot Spot: A tool for extraction and analysis of code change history. In 2011 27th IEEE International Conference on Software Maintenance (ICSM) (pp. 392–401). IEEE. doi:10.1109/ICSM.2011.6080806

Project information



Thesis for degree:



Namitha Raj Basavarajappa