Credit Scoring and Data Mining Computer Lab Workshop 2 using Excel We will use the data on 100 customers that is found on cc100.xls. Open that Excel file and note what the data means. Most of the data describes the customer attributes but some describe the subsequent history. DL1 is number of times in year 1 month overdue, DL2 is number of times 2 months overdue etc. Definition of bad=1 is at least once 1 month overdue (i,e. 1 or more in DL1).
We use Pivot Tables in Excel to display analysis. For real scorecard building one would use the Chi module is SPSS, or the Enterprise Miner module in SAS ( though that chooses classes automatically so some just use the standard SAS module) or dedicated credit scoring systems like Paragon’s DSS..
#p#分页标题#e#1. To use the pivot tables proceed as follows. Highlight the whole table from a1: w101. Go to Pivot Table and Pivot Chart Report in Data When the first screen of the Wizard appears you want to create to click on data is in Microsoft Excel list and create Pivot table Then go to next screen and make sure you are using data in A1:W101. Go to next screen and say you want to go to existing worksheet and chose to start table at A120. Then scroll down to A120. Drag res ( residential status ) to pivot column Drag good=1 and bad=1 to data section.http://www.ukassignment.org/yingguozuoye/ying_liu_zuo_dai_/
You should get that res status O ( owner) has 37 goods and 20 bads, residential status P ( with parents) has 10 goods and 7 bads; residential status T( furnished tenants) has 7 goods and 4 bads. To get ratio of goods to bads in cell b124 type =b122/b123. Copy this across to columns C,D, E and F. #p#分页标题#e# Which outcomes of residential status would you combine? My suggestion would be to have classes of {O}, {P,T,X} and {U}. Do you agree?
2.The pivot table for “children” has already been done just above. What classes would you combine of these? Similarly spouses employment category has a table above. What classes would you form there? Look at the definitions of the attributes as well as the good:bad ratio.
3. You might want to look at combinations of variables. So create a two dimensional pivot table to look at the interaction between residential status and having a cheque guarantee card. Again highlight the whole table from a1: w101. Go to Pivot Table and Pivot Chart Report in Data When the first screen of the Wizard appears you want to create to click on data is in Microsoft Excel list and create Pivot table Then go to next screen and make sure you are using data in A1:W101. Go to next screen and say you want to go to existing worksheet and chose to start table at A130.#p#分页标题#e# Then scroll down to A130. Drag res ( residential status ) to pivot row and drag cheque to pivot column Drag good=1 and bad=1 to data section.
You should get that Of owners with cheque guarantee card 28 are good and 16 are bad Of owners with no cheque guarantee card 9 are good and 4 are bad; Of with parents with cheque guarantee card 4 are good and 6 are bad Of with parents with no cheque guarantee card 6 are good and 1 is bad
Which groups would you put together to form classes now. I would be tempted to make it { owners}, { P or T or X with no cheque guarantee card}, {P or T with card and U with no card} and { U or X with card}. What groups would you choose. #p#分页标题#e#
4. For binary variables decide what would be good/bad ratio for interactions of cheque account deposit account current account phone Can you work out more than two interactions?
5. For continuous variables do same thing and then combine to get sensible coarse classifying for age Why does pivot tables not work as well for this. Can you construct the best monotone coarse classifier?
B: Constructing KS and ROC Curves#p#分页标题#e#
Suppose one has constructed a scorecard how does one then calculate the KS and ROC curves? We will use the linear regression based scorecard we obtained in the previous workshop where
100Y= -1 +17X(18-33) -11X(33-48) -31X(49-54) +19X(55 -63) + 68X(0child) +63X(1child) +67X(2child) +30X(B,Y) +6X(R,H,X) –14X(Z, no response) +23X(P,E,L)
Go to sheet 2 where the data used in constructing this scorecard is. In row 104 type in the scoring values starting in column E. So type in 17 in E104, -11 in F104 , -31 in G104 and so in until one finishes with 23 in O104. Go to T1 and type in “score”. In T2 we calculate the score of that person by typing in =SUMPRODUCT(E2:O2,E$104:O$104) This should give a score of 3. Copy cell T2 down the column to T101, che Copy column D, the good=1 column into column U by highlighting the top of column D ( the cell with a D in it , clicking copy and then going to the top of U ( the cell with U) in it and paste.#p#分页标题#e#
These, score and whether person was good are the two pieces of information needed to evaluate the ROC and KS curve. We will use another sheet to evaluate the curves on. Highlight columns T and U. Press copy go to Sheet 4 and with cursor on the first column go to Paste Special in the Edit menu. Click the values box ( otherwise we only get the formulae which don’t make sense in this sheet) and then click on OK. Highlighting the two columns A and B go to Sort in the Data menu and sort on score column with scores ascending. In cell B102 ( at bottom of good column) use Summation sign to get =SUM(B2:B101), the number of goods, In c102 type =100-b102 and in d102 type =b102. We will use the C and D column to find how many bads ( C) and How many goods(D) have values less than that customer. Type 0s in C1 and D1 In C2, type =IF(b2=0,C1+1,C1) which adds one if the b column is 0 ( a bad). Copy this down to C101. In D2 type =IF(b2=1,d1+1,D1) which does the same thing for the goods. To get these into percentages in E1 type %of bads, in F1 type % of goods In E2 type =C2/C$102 and copy down the E column. #p#分页标题#e#In F2 type =D2/D$102 and copy down the F column
To get KS value in g column type in G2 =E2-F2 and copy this down the whole column. The largest value ( 0.66) is the KS statistic
To get KS chart go to Chart Wizard Choose line and in data range type in or highlight E1 to F101 ( E1:F101). Let this chart be in the current sheet and tidy up the colours and background fill. This is the KS chart
To get the ROC chart go to Chart Wizard again Choose scatter this time and choose the variant where the data points are connected by a line, The data range is E2:F101 and when it comes to labelling the axes label the x axis % bads and the Y axis % goods.
Exercise: Can you work out how to calculate the Gini coefficient form the numbers given in the spreadsheet?#p#分页标题#e# |