Applying Cluster Analysis to Software Reuse

Copyright Lombard Hill Group


 

Abstract
Cluster analysis, a method of classifying objects based on specific features, has many potential benefits for software reuse. At the Manufacturing Productivity section of Hewlett-Packard, we applied such an analysis to a set of reusable assets and identified "include file" configurations that would minimize the effort of maintaining those files. In this paper, we define cluster analysis within the context of group technology and present an example of how it was applied.

 

Keywords: cluster analysis, group technology, manufacturing concepts, domain analysis.

 

1. Background
The purpose of group technology in manufacturing is to exploit economies of scope by identifying similarities among the parts to be manufactured and among the sequences of machines needed to process those parts. Parts are classified into families on the basis of characteristics such as size and geometry. The machines used to process each part family are placed near each other in "machine cells".

Wemmerlov and Hyer [Wemm] highlight three ways that group technology benefits are achieved:

* By performing similar activities together, less time is wasted in changing from one unrelated activity to the next
* By standardizing closely related items or activities, unnecessary duplication of effort is avoided
* By efficiently storing and retrieving information related to recurring problems, search time is reduced

 

2. Position

One means of identifying families in group technology is cluster analysis, which is "concerned with grouping of objects into homogeneous clusters (groups) based on the object features" [Kusi]. Cluster analysis was applied to software reuse at the Manufacturing Productivity (MP) section of the Software Technology Division of HP to solve a problem in maintaining the section's reusable assets, called "utilities".

The MP section produces large application software for manufacturing resource planning. The MP reuse program started in 1983 and continues to the present. The original motivation for pursuing reuse was to increase engineers' productivity so that critical milestones could be met [Nish]. MP has since discovered that reuse also eases the maintenance burden and supports product enhancement. Reuse was practised in the form of reusable assets (application/architecture utilities and shared files) and generated code. The 685 reusable assets totalled 55 thousand lines of non-commented source statements (KNCSS). They were written in Pascal and in SPL, the Systems Programming Language for the HP 3000 Computer System. The development and target operating system was MPE XL, the HP 3000's Multi-Programming Executive.

The utilities are numerous (685) and small, ranging from 14 to 619 non-commented source statements each. In the manufacturing systems software developed by MP, a transaction is a subunit of the system: a cohesive set of activities used for inventory management.

Each transaction calls the utilities it needs, and those utilities are contained in an include file specific to that transaction. Because each transaction is usually created by a different engineer, this practice has led to a proliferation of different include files. When a utility is modified, every include file that contains it must be identified and updated with the new version, which has required a tremendous amount of effort.

To reduce the effort required for future updates, cluster analysis was applied to the way transactions use utilities.

First, a 13 x 11 matrix was created with transactions as rows and utilities as columns (Figure 1). A "1" indicates that the transaction calls the utility, and a "0" indicates that it does not.

Input Matrix

(Rows are transactions; columns are reusable assets)

 

Row  1  2  3  4  5  6  7  8  9 10 11
  1  1  0  1  0  0  0  0  0  0  0  0
  2  0  0  1  0  0  0  0  1  0  0  1
  3  0  0  1  0  0  0  0  1  0  0  0
  4  0  0  0  0  0  0  0  1  0  0  1
  5  0  0  1  0  0  0  0  0  0  0  1
  6  0  0  1  1  0  0  0  0  0  0  1
  7  0  0  1  0  0  0  0  0  0  0  0
  8  0  1  0  0  1  0  1  0  0  0  1
  9  0  1  0  0  1  0  1  0  0  0  0
 10  0  1  0  0  1  1  1  0  1  1  1
 11  0  1  0  1  1  1  1  0  1  1  1
 12  0  1  0  1  0  0  1  0  0  0  1
 13  0  1  0  1  1  0  1  0  0  0  1

 

Columns (reusable assets):
1=Adj-summary-qty
2=Autobofill
3=check'store'up
4=invalid-fld-check
5=potency-inv-qty
6=prep'for'pcm
7=print-document
8=report-neg-qty
9=send'to'pcm
10=update'pcm'buff
11=write-stock-file

Rows (transactions):
1=adjhand
2=issalloc
3=isseu
4=issord
5=issunp
6=issbo
7=move
8=recinloc
9=recinsp
10=recwo
11=recrun
12=recpopt
13=recpoit
 


 

Figure 1
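The matrix in Figure 1 can be rebuilt mechanically from each transaction's list of utility calls. The minimal Python sketch below does this. The call lists are read off Figure 1 itself, and the names `CALLS` and `incidence_matrix` are hypothetical: they are not part of MP's Pascal and SPL code, which is not shown in this paper.

```python
# Illustrative sketch: rebuild the Figure 1 matrix from call lists.
# CALLS and incidence_matrix are hypothetical names; the call lists are
# read off Figure 1 (utility numbers follow the column legend).
CALLS = {
    "adjhand":  {1, 3},
    "issalloc": {3, 8, 11},
    "isseu":    {3, 8},
    "issord":   {8, 11},
    "issunp":   {3, 11},
    "issbo":    {3, 4, 11},
    "move":     {3},
    "recinloc": {2, 5, 7, 11},
    "recinsp":  {2, 5, 7},
    "recwo":    {2, 5, 6, 7, 9, 10, 11},
    "recrun":   {2, 4, 5, 6, 7, 9, 10, 11},
    "recpopt":  {2, 4, 7, 11},
    "recpoit":  {2, 4, 5, 7, 11},
}


def incidence_matrix(calls, n_utilities):
    """One 0/1 row per transaction, in dict order; column j is utility j + 1."""
    return [[1 if u in used else 0 for u in range(1, n_utilities + 1)]
            for used in calls.values()]


matrix = incidence_matrix(CALLS, 11)
for name, row in zip(CALLS, matrix):
    print(f"{name:9} {' '.join(map(str, row))}")
```

Each printed row matches the corresponding row of Figure 1, so the same call lists can feed any clustering step.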

The matrix was then used as input to a clustering algorithm provided by Dr. Andrew Kusiak of the University of Iowa.
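Kusiak's algorithm itself is not reproduced here. As a stand-in, the sketch below uses a different, widely known technique, single-linkage agglomerative clustering with Jaccard similarity between the transactions' sets of called utilities, to show how a clustering step can split the 13 transactions into groups. The function names and the choice of k = 2 clusters are assumptions made for illustration. On the Figure 1 data this stand-in happens to recover the same two transaction groups reported for Figure 2, but it is not the method used in the study.

```python
# Stand-in technique: single-linkage agglomerative clustering on Jaccard
# similarity. This is NOT the Kusiak algorithm used in the study.
# Utilities called by transactions 1..13, read off Figure 1.
CALLS = [
    {1, 3}, {3, 8, 11}, {3, 8}, {8, 11}, {3, 11}, {3, 4, 11}, {3},
    {2, 5, 7, 11}, {2, 5, 7}, {2, 5, 6, 7, 9, 10, 11},
    {2, 4, 5, 6, 7, 9, 10, 11}, {2, 4, 7, 11}, {2, 4, 5, 7, 11},
]


def jaccard(a, b):
    """Utilities called by both transactions / utilities called by either."""
    return len(a & b) / len(a | b)


def single_linkage(rows, k):
    """Repeatedly merge the two most similar clusters until k remain.
    Cluster similarity is the best similarity between any two members."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None  # (similarity, index of first cluster, index of second)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(jaccard(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    # Report 1-based transaction numbers, as in Figure 1.
    return [sorted(i + 1 for i in c) for c in clusters]


groups = single_linkage(CALLS, k=2)
print(groups)  # [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13]]
```

With a different similarity measure, linkage rule, or number of clusters the grouping could differ, so a result like this would still need checking against the engineers' knowledge of the transactions.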

The solution, shown in Figure 2, reorders the reusable assets (the columns) into clusters. The results suggest placing utilities 1, 3 and 8 into a single include file for transactions 1 to 7.

Utilities 2, 5, 6, 7, 9 and 10 should be placed into a second include file for transactions 8 to 13.

Utilities 4 and 11 are called by transactions in both groups, so they can either be placed in both include files or be given a separate include file of their own.

 

Cluster Analysis Solution

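(Rows are transactions; columns are reusable assets, reordered by the clustering)

Row  1  3  8  2  5  6  7  9 10 11  4
  1  1  1  0  0  0  0  0  0  0  0  0
  2  0  1  1  0  0  0  0  0  0  1  0
  3  0  1  1  0  0  0  0  0  0  0  0
  4  0  0  1  0  0  0  0  0  0  1  0
  5  0  1  0  0  0  0  0  0  0  1  0
  6  0  1  0  0  0  0  0  0  0  1  1
  7  0  1  0  0  0  0  0  0  0  0  0
  8  0  0  0  1  1  0  1  0  0  1  0
  9  0  0  0  1  1  0  1  0  0  0  0
 10  0  0  0  1  1  1  1  1  1  1  0
 11  0  0  0  1  1  1  1  1  1  1  1
 12  0  0  0  1  0  0  1  0  0  1  1
 13  0  0  0  1  1  0  1  0  0  1  1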

 

Figure 2
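The step from a grouping of transactions to include-file contents can also be written down. The hypothetical `assign_utilities` function below puts a utility into one group's include file when only that group calls it, and reports it as shared when transactions in more than one group call it. Applied to the Figure 1 call lists and to transactions 1 to 7 and 8 to 13, it reproduces the split described above: utilities 1, 3 and 8; utilities 2, 5, 6, 7, 9 and 10; and shared utilities 4 and 11.

```python
# Hypothetical helper: turn a grouping of transactions into include-file
# contents. A utility called by only one group goes into that group's file;
# a utility called by several groups is reported as shared.
CALLS = {  # transaction number -> utilities it calls (Figure 1)
    1: {1, 3}, 2: {3, 8, 11}, 3: {3, 8}, 4: {8, 11}, 5: {3, 11},
    6: {3, 4, 11}, 7: {3}, 8: {2, 5, 7, 11}, 9: {2, 5, 7},
    10: {2, 5, 6, 7, 9, 10, 11}, 11: {2, 4, 5, 6, 7, 9, 10, 11},
    12: {2, 4, 7, 11}, 13: {2, 4, 5, 7, 11},
}


def assign_utilities(calls, groups):
    owners = {}  # utility -> indices of the groups whose transactions call it
    for g, members in enumerate(groups):
        for t in members:
            for u in calls[t]:
                owners.setdefault(u, set()).add(g)
    per_group = [sorted(u for u, gs in owners.items() if gs == {g})
                 for g in range(len(groups))]
    shared = sorted(u for u, gs in owners.items() if len(gs) > 1)
    return per_group, shared


# Transactions 1-7 and 8-13, the grouping described in the text.
per_group, shared = assign_utilities(CALLS, [range(1, 8), range(8, 14)])
print(per_group)  # [[1, 3, 8], [2, 5, 6, 7, 9, 10]]
print(shared)     # [4, 11]
```

Whether the shared utilities are copied into each group's file or moved to a file of their own is left to the caller, matching the two options given in the text.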

 

Benefits of Cluster Analysis for Reuse

Cluster analysis is useful in designing include files whose contents reduce the effort needed to maintain them. In the MP example, thirteen individual include files, one for each transaction, were maintained before the analysis. By using cluster analysis, we identified the commonalities and differences among the thirteen include files and specified a core set of two include files. Reengineering the thirteen include files into two reduces the number of files to maintain by about 85% (11 of 13).
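The 85% figure is the arithmetic (13 - 2) / 13, about 0.846. The sketch below reproduces it and also counts, for a single utility, how many include files must be edited when that utility changes, before and after the reorganization. The helper name `files_to_edit` is hypothetical, and the sketch assumes that shared utilities 4 and 11 are copied into both cluster include files; if they were instead moved into a third include file, the file count would be 3 and the reduction about 77%.

```python
# Maintenance arithmetic for the reorganization (illustrative sketch).
# Assumes shared utilities 4 and 11 are copied into both cluster include files.
CALLS = [  # utilities called by transactions 1..13 (Figure 1)
    {1, 3}, {3, 8, 11}, {3, 8}, {8, 11}, {3, 11}, {3, 4, 11}, {3},
    {2, 5, 7, 11}, {2, 5, 7}, {2, 5, 6, 7, 9, 10, 11},
    {2, 4, 5, 6, 7, 9, 10, 11}, {2, 4, 7, 11}, {2, 4, 5, 7, 11},
]
GROUPS = [range(0, 7), range(7, 13)]  # 0-based rows: transactions 1-7, 8-13

files_before = len(CALLS)   # one include file per transaction
files_after = len(GROUPS)   # one include file per cluster
reduction = (files_before - files_after) / files_before
print(f"{files_before} -> {files_after} include files ({reduction:.0%} fewer)")


def files_to_edit(utility):
    """(before, after): include files to update when `utility` changes."""
    before = sum(utility in used for used in CALLS)
    after = sum(any(utility in CALLS[t] for t in g) for g in GROUPS)
    return before, after


print(files_to_edit(3))   # (6, 1)
print(files_to_edit(11))  # (9, 2)
```

For example, a change to utility 11 (write-stock-file) touches nine per-transaction include files under the old scheme but at most two after clustering.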

Cluster analysis also has further applications in software reuse. It may be used to identify "families of systems", that is, systems that share the same features. For example, cluster analysis can be applied to a matrix whose rows are software systems or products and whose columns are their features. The analysis would cluster the features with the systems or products that have them, thereby helping to identify families of systems that share common features. This information may be useful in determining which reusable assets to create.

3. Comparison
Other researchers have used cluster analysis to classify reusable software. For example, Maarek and Kaiser [Maar] describe the use of conceptual clustering to classify software components. Taxonomies of software components are created "by using a classification scheme based on conceptual clustering, where the physical closeness of components mirrors their conceptual similarity." Objects are gathered into clusters whose members are more similar to each other than to the members of other clusters.

 

Acknowledgements

My acknowledgements to Dr. Andrew Kusiak, Dr. Sylvia Kwan and Alvina Nishimoto for their help and input to this paper.

[Kusi] Kusiak, Andrew and Wing Chow, "Decomposition of Manufacturing Systems", IEEE Journal of Robotics and Automation, vol. 4, no. 5, October 1988.

[Maar] Maarek, Yoelle and Gail Kaiser, "Using Conceptual Clustering for Classifying Reusable Ada Code", Using Ada: ACM SIGAda International Conference, December 9-11, 1987, ACM Press, New York, 1987.

[Nish] Nishimoto, Alvina, "Evolution of a Reuse Program in a Maintenance Environment", 2nd Irvine Software Symposium, March 1992.

[Wemm] Wemmerlov, Urban and Nancy Hyer, "Group Technology", chapter 17 in Handbook of Industrial Engineering, Gavriel Salvendy, ed., John Wiley & Sons, 1992.