Table of Contents:

  o DATA DESCRIPTION
  o PROJECT OVERVIEW: A FUND RAISING NET RETURN PREDICTION MODEL
  o FINAL REPORT
  o DATA SOURCES and ORDER & TYPE OF THE VARIABLES IN THE DATA SETS
  o SUMMARY STATISTICS (MIN & MAX)
  o DATA (PRE)PROCESSING
  o TERMINOLOGY-GLOSSARY

+--------------------------------------------------------------------+
|                          DATA DESCRIPTION                          |
+--------------------------------------------------------------------+

The data is in comma delimited format. The raw dataset, cty_raw.dat, contains 95412 records and 481 fields. The first/header row of the data set contains the field names.

The data dictionary is in the file cty_dic.txt. The fields in the data dictionary are ordered by the position of the fields in the raw data set.

Blanks in the string (or character) variables/fields and periods in the numeric variables correspond to missing values.

Each record has a unique record identifier or index (field name: CONTROLN).

For each record, there are two target/dependent variables (field names: TARGET_B and TARGET_D). TARGET_B is a binary variable indicating whether or not the record responded to the promotion of interest ("97NK" mailing), while TARGET_D contains the donation amount (in dollars) and is only observed for those that responded to the promotion.

+--------------------------------------------------------------------+
|    PROJECT OVERVIEW: A Fund Raising Net Return Prediction Model    |
+--------------------------------------------------------------------+

BACKGROUND AND OBJECTIVES
-------------------------

The data set is proprietary and has been obtained from an unnamed charity that will be called CTY hereafter. This data is to be used only for class purposes and is not to be shared with others.

CTY is a not-for-profit organization that provides programs and services for a specific group with a specific injury and its related diseases. With an in-house database of over 13 million donors, CTY is also one of the largest direct mail fund raisers in the country.
The project is to analyze the results of one of CTY's recent fund raising appeals. This mailing was sent to a total of 3.5 million CTY donors who were on the CTY database as of June 1997. Everyone included in this mailing had made at least one prior donation to CTY.

The mailing included a gift (or "premium") of personalized name & address labels plus an assortment of 10 note cards and envelopes. All of the donors who received this mailing were acquired by CTY through similar premium-oriented appeals such as this.

One group that is of particular interest to CTY is "Lapsed" donors. These are individuals who made their last donation to CTY 13 to 24 months ago. They represent an important group to CTY, since the longer someone goes without donating, the less likely they will be to give again. Therefore, recapture of these former donors is a critical aspect of CTY's fund raising efforts.

However, CTY has found that there is often an inverse correlation between likelihood to respond and the dollar amount of the gift, so a straight response model (a classification or discrimination task) will most likely net only very low dollar donors. High dollar donors will fall into the lower deciles, which would most likely be suppressed from future mailings. The lost revenue of these suppressed donors would then offset any gains due to the increased response rate of the low dollar donors.

Therefore, to improve the cost-effectiveness of future direct marketing efforts, CTY wishes to develop a model that will help them maximize the net revenue (a regression or estimation task) generated from future renewal mailings to Lapsed donors.

POPULATION
----------

The population for this analysis will be Lapsed CTY donors who received the June '97 renewal mailing (appeal code "97NK"). Therefore, the analysis data set contains a subset of the total universe who received the mailing.
The analysis file includes all Lapsed donors who received the mailing, with responders to the mailing marked with a flag in the TARGET_B field. The total dollar amount of each responder's gift is in the TARGET_D field.

The overall response rate for this direct mail promotion is 5.1%. The distribution of the target fields is as follows:

Target Variable: Binary Indicator of Response to 97NK Mailing

                                Cumulative   Cumulative
TARGET_B   Frequency   Percent   Frequency      Percent
------------------------------------------------------
       0       90569      94.9       90569         94.9
       1        4843       5.1       95412        100.0

Target Variable: Donation Amount (in $) to 97NK Mailing

Variable       N        Mean     Minimum       Maximum
------------------------------------------------------
TARGET_D   95412   0.7930732           0   200.0000000
------------------------------------------------------

The average donation amount (in $) among the responders is:

Target Variable: Donation Amount (in $) to 97NK Mailing

    N         Mean     Minimum       Maximum
-----------------------------------------------
 4843   15.6243444   1.0000000   200.0000000
-----------------------------------------------

COST MATRIX
-----------

The package cost (including the mail cost) is $0.68 per piece mailed.

ANALYSIS TIME FRAME AND REFERENCE DATE
--------------------------------------

The 97NK mailing was sent out in June 1997. All information included in the file (excluding the giving history date fields) is reflective of behavior prior to 6/97. This date may be used as the reference date in generating the "number of months since" or "time since" or "elapsed time" variables. Participants can also find the reference date information in the field ADATE_2. This field contains the dates the 97NK promotion was mailed.
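
As an illustration of the "number of months since" calculation described above, the following is a minimal sketch, not a prescribed method. It assumes that the date fields are stored as YYMM numbers in the 1900s (an inference from the minimum and maximum values listed in the summary statistics, e.g., 9704 to 9706 for ADATE_2; check cty_dic.txt before relying on it), that a value of 0 marks an unknown date, and that June 1997 (9706) is the reference date. The function names are hypothetical.

```python
REFERENCE_YYMM = 9706  # June 1997, the month of the 97NK mailing


def yymm_to_months(yymm):
    """Convert a YYMM number (e.g. 9706) to an absolute month count.

    Assumption: all years fall in the 1900s.
    """
    year, month = divmod(int(yymm), 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid YYMM value: {yymm!r}")
    return (1900 + year) * 12 + (month - 1)


def months_since(value, reference=REFERENCE_YYMM):
    """Months elapsed from a YYMM date up to the reference date.

    Blanks, periods and None are treated as missing (per the data
    description); 0 is also treated as missing (an assumption).
    Dates after the reference date give a negative result.
    """
    if value is None or str(value).strip() in ("", "."):
        return None
    yymm = int(float(value))
    if yymm == 0:
        return None
    return yymm_to_months(reference) - yymm_to_months(yymm)
```

For example, months_since(9606) gives 12 and months_since(9512) gives 18, while a missing value such as "." gives None and is left for the imputation step.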
+--------------------------------------------------------------------+
|                            FINAL REPORT                            |
+--------------------------------------------------------------------+

Once again, the objective of the analysis is to maximize the net revenue generated from this mailing - a censored regression or estimation problem. The response variable is, thus, continuous (for lack of a better common term). Although we are releasing both the binary and the continuous versions of the target variable (TARGET_B and TARGET_D respectively), we will use the predicted value of the donation (dollar) amount (for the target variable TARGET_D) in evaluating the results. So, predicting the value of the binary target variable TARGET_B and its associated probability/strength will not be sufficient.

The typical outcome of predictive modeling in database marketing is an estimate of the expected response/return per customer in the database. A marketer will mail to a customer so long as the expected return from an order exceeds the cost invested in generating the order, i.e., the cost of promotion. For our purpose, the package cost (including the mail cost) is $0.68 per piece mailed.

The final report will consist of a summary of the analysis, the completed questionnaire (cty_que.txt), and an evaluation of the model on the net revenue generated on a hold-out or validation sample retained in-house. The validation sample will be made available (with the fields TARGET_B and TARGET_D deleted) upon request to allow a verification that the software developed in the project can read the validation data correctly.

The performance measure of interest is:

   Sum (the actual donation amount - $0.68) over all records for which the expected revenue (or predicted value of the donation) is over $0.68.

This is a direct measure of profit.
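
The performance measure above can be sketched in a few lines. This is an illustration only; the function name net_profit and the toy numbers in the usage note are hypothetical and are not taken from the data set.

```python
MAIL_COST = 0.68  # package cost (including the mail cost) per piece mailed


def net_profit(actual, predicted, cost=MAIL_COST):
    """Sum of (actual donation - cost) over the records selected for mailing.

    A record is selected when its predicted donation is strictly greater
    than the cost, as in the stated performance measure.
    `actual` holds the TARGET_D values, `predicted` the model estimates.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a - cost for a, p in zip(actual, predicted) if p > cost)
```

With actual donations [0, 10, 0, 25] and predictions [0.50, 3.00, 0.90, 0.20], only the second and third records are mailed, so the measure is (10 - 0.68) + (0 - 0.68) = 8.64. The $25 donor is missed because the prediction fell below the mailing cost, which is the situation the project overview warns about for high dollar donors.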
+--------------------------------------------------------------------+
|  DATA SOURCES and ORDER & TYPE OF THE VARIABLES IN THE DATA SETS   |
+--------------------------------------------------------------------+

The dataset includes:

  o 24 months of detailed CTY promotion and giving history (covering the period 12 to 36 months prior to the "97NK" mailing)
  o A summary of the promotions sent to the donors over the most recent 12 months prior to the "97NK" mailing (by definition, none of these donors responded to any of these promotions)
  o Summary variables reflecting each donor's lifetime giving history (e.g., total # of donations prior to "97NK" mailing, total $ amount of the donations, etc.)
  o Overlay demographics, including a mix of household and area level data
  o All other available data from the CTY database (e.g., date of first gift, state, origin source, etc.)

The fields are described in greater detail in the data dictionary file (filename: cty_dic.txt).

The names of the variables in the raw data set are included in each file as the top (header) record. For your information, they are listed below again (ordered by data set position) along with the field type information (Num: numeric, Char: string/character).
Field Field Name Type ------------------------ 1 ODATEDW Num 2 OSOURCE Char 3 TCODE Num 4 STATE Char 5 ZIP Char 6 MAILCODE Char 7 CTYSTATE Char 8 DOB Num 9 NOEXCH Char 10 RECINHSE Char 11 RECP3 Char 12 RECPGVG Char 13 RECSWEEP Char 14 MDMAUD Char 15 DOMAIN Char 16 CLUSTER Char 17 AGE Num 18 AGEFLAG Char 19 HOMEOWNR Char 20 CHILD03 Char 21 CHILD07 Char 22 CHILD12 Char 23 CHILD18 Char 24 NUMCHLD Num 25 INCOME Num 26 GENDER Char 27 WEALTH1 Num 28 HIT Num 29 MBCRAFT Num 30 MBGARDEN Num 31 MBBOOKS Num 32 MBCOLECT Num 33 MAGFAML Num 34 MAGFEM Num 35 MAGMALE Num 36 PUBGARDN Num 37 PUBCULIN Num 38 PUBHLTH Num 39 PUBDOITY Num 40 PUBNEWFN Num 41 PUBPHOTO Num 42 PUBOPP Num 43 DATASRCE Char 44 MALEMILI Num 45 MALEVET Num 46 VIETVETS Num 47 WWIIVETS Num 48 LOCALGOV Num 49 STATEGOV Num 50 FEDGOV Num 51 SOLP3 Char 52 SOLIH Char 53 MAJOR Char 54 WEALTH2 Num 55 GEOCODE Char 56 COLLECT1 Char 57 VETERANS Char 58 BIBLE Char 59 CATLG Char 60 HOMEE Char 61 PETS Char 62 CDPLAY Char 63 STEREO Char 64 PCOWNERS Char 65 PHOTO Char 66 CRAFTS Char 67 FISHER Char 68 GARDENIN Char 69 BOATS Char 70 WALKER Char 71 KIDSTUFF Char 72 CARDS Char 73 PLATES Char 74 LIFESRC Char 75 PEPSTRFL Char 76 POP901 Num 77 POP902 Num 78 POP903 Num 79 POP90C1 Num 80 POP90C2 Num 81 POP90C3 Num 82 POP90C4 Num 83 POP90C5 Num 84 ETH1 Num 85 ETH2 Num 86 ETH3 Num 87 ETH4 Num 88 ETH5 Num 89 ETH6 Num 90 ETH7 Num 91 ETH8 Num 92 ETH9 Num 93 ETH10 Num 94 ETH11 Num 95 ETH12 Num 96 ETH13 Num 97 ETH14 Num 98 ETH15 Num 99 ETH16 Num 100 AGE901 Num 101 AGE902 Num 102 AGE903 Num 103 AGE904 Num 104 AGE905 Num 105 AGE906 Num 106 AGE907 Num 107 CHIL1 Num 108 CHIL2 Num 109 CHIL3 Num 110 AGEC1 Num 111 AGEC2 Num 112 AGEC3 Num 113 AGEC4 Num 114 AGEC5 Num 115 AGEC6 Num 116 AGEC7 Num 117 CHILC1 Num 118 CHILC2 Num 119 CHILC3 Num 120 CHILC4 Num 121 CHILC5 Num 122 HHAGE1 Num 123 HHAGE2 Num 124 HHAGE3 Num 125 HHN1 Num 126 HHN2 Num 127 HHN3 Num 128 HHN4 Num 129 HHN5 Num 130 HHN6 Num 131 MARR1 Num 132 MARR2 Num 133 MARR3 Num 134 MARR4 Num 135 HHP1 
Num 136 HHP2 Num 137 DW1 Num 138 DW2 Num 139 DW3 Num 140 DW4 Num 141 DW5 Num 142 DW6 Num 143 DW7 Num 144 DW8 Num 145 DW9 Num 146 HV1 Num 147 HV2 Num 148 HV3 Num 149 HV4 Num 150 HU1 Num 151 HU2 Num 152 HU3 Num 153 HU4 Num 154 HU5 Num 155 HHD1 Num 156 HHD2 Num 157 HHD3 Num 158 HHD4 Num 159 HHD5 Num 160 HHD6 Num 161 HHD7 Num 162 HHD8 Num 163 HHD9 Num 164 HHD10 Num 165 HHD11 Num 166 HHD12 Num 167 ETHC1 Num 168 ETHC2 Num 169 ETHC3 Num 170 ETHC4 Num 171 ETHC5 Num 172 ETHC6 Num 173 HVP1 Num 174 HVP2 Num 175 HVP3 Num 176 HVP4 Num 177 HVP5 Num 178 HVP6 Num 179 HUR1 Num 180 HUR2 Num 181 RHP1 Num 182 RHP2 Num 183 RHP3 Num 184 RHP4 Num 185 HUPA1 Num 186 HUPA2 Num 187 HUPA3 Num 188 HUPA4 Num 189 HUPA5 Num 190 HUPA6 Num 191 HUPA7 Num 192 RP1 Num 193 RP2 Num 194 RP3 Num 195 RP4 Num 196 MSA Num 197 ADI Num 198 DMA Num 199 IC1 Num 200 IC2 Num 201 IC3 Num 202 IC4 Num 203 IC5 Num 204 IC6 Num 205 IC7 Num 206 IC8 Num 207 IC9 Num 208 IC10 Num 209 IC11 Num 210 IC12 Num 211 IC13 Num 212 IC14 Num 213 IC15 Num 214 IC16 Num 215 IC17 Num 216 IC18 Num 217 IC19 Num 218 IC20 Num 219 IC21 Num 220 IC22 Num 221 IC23 Num 222 HHAS1 Num 223 HHAS2 Num 224 HHAS3 Num 225 HHAS4 Num 226 MC1 Num 227 MC2 Num 228 MC3 Num 229 TPE1 Num 230 TPE2 Num 231 TPE3 Num 232 TPE4 Num 233 TPE5 Num 234 TPE6 Num 235 TPE7 Num 236 TPE8 Num 237 TPE9 Num 238 PEC1 Num 239 PEC2 Num 240 TPE10 Num 241 TPE11 Num 242 TPE12 Num 243 TPE13 Num 244 LFC1 Num 245 LFC2 Num 246 LFC3 Num 247 LFC4 Num 248 LFC5 Num 249 LFC6 Num 250 LFC7 Num 251 LFC8 Num 252 LFC9 Num 253 LFC10 Num 254 OCC1 Num 255 OCC2 Num 256 OCC3 Num 257 OCC4 Num 258 OCC5 Num 259 OCC6 Num 260 OCC7 Num 261 OCC8 Num 262 OCC9 Num 263 OCC10 Num 264 OCC11 Num 265 OCC12 Num 266 OCC13 Num 267 EIC1 Num 268 EIC2 Num 269 EIC3 Num 270 EIC4 Num 271 EIC5 Num 272 EIC6 Num 273 EIC7 Num 274 EIC8 Num 275 EIC9 Num 276 EIC10 Num 277 EIC11 Num 278 EIC12 Num 279 EIC13 Num 280 EIC14 Num 281 EIC15 Num 282 EIC16 Num 283 OEDC1 Num 284 OEDC2 Num 285 OEDC3 Num 286 OEDC4 Num 287 OEDC5 Num 288 OEDC6 Num 
289 OEDC7 Num 290 EC1 Num 291 EC2 Num 292 EC3 Num 293 EC4 Num 294 EC5 Num 295 EC6 Num 296 EC7 Num 297 EC8 Num 298 SEC1 Num 299 SEC2 Num 300 SEC3 Num 301 SEC4 Num 302 SEC5 Num 303 AFC1 Num 304 AFC2 Num 305 AFC3 Num 306 AFC4 Num 307 AFC5 Num 308 AFC6 Num 309 VC1 Num 310 VC2 Num 311 VC3 Num 312 VC4 Num 313 ANC1 Num 314 ANC2 Num 315 ANC3 Num 316 ANC4 Num 317 ANC5 Num 318 ANC6 Num 319 ANC7 Num 320 ANC8 Num 321 ANC9 Num 322 ANC10 Num 323 ANC11 Num 324 ANC12 Num 325 ANC13 Num 326 ANC14 Num 327 ANC15 Num 328 POBC1 Num 329 POBC2 Num 330 LSC1 Num 331 LSC2 Num 332 LSC3 Num 333 LSC4 Num 334 VOC1 Num 335 VOC2 Num 336 VOC3 Num 337 HC1 Num 338 HC2 Num 339 HC3 Num 340 HC4 Num 341 HC5 Num 342 HC6 Num 343 HC7 Num 344 HC8 Num 345 HC9 Num 346 HC10 Num 347 HC11 Num 348 HC12 Num 349 HC13 Num 350 HC14 Num 351 HC15 Num 352 HC16 Num 353 HC17 Num 354 HC18 Num 355 HC19 Num 356 HC20 Num 357 HC21 Num 358 MHUC1 Num 359 MHUC2 Num 360 AC1 Num 361 AC2 Num 362 ADATE_2 Num 363 ADATE_3 Num 364 ADATE_4 Num 365 ADATE_5 Num 366 ADATE_6 Num 367 ADATE_7 Num 368 ADATE_8 Num 369 ADATE_9 Num 370 ADATE_10 Num 371 ADATE_11 Num 372 ADATE_12 Num 373 ADATE_13 Num 374 ADATE_14 Num 375 ADATE_15 Num 376 ADATE_16 Num 377 ADATE_17 Num 378 ADATE_18 Num 379 ADATE_19 Num 380 ADATE_20 Num 381 ADATE_21 Num 382 ADATE_22 Num 383 ADATE_23 Num 384 ADATE_24 Num 385 RFA_2 Char 386 RFA_3 Char 387 RFA_4 Char 388 RFA_5 Char 389 RFA_6 Char 390 RFA_7 Char 391 RFA_8 Char 392 RFA_9 Char 393 RFA_10 Char 394 RFA_11 Char 395 RFA_12 Char 396 RFA_13 Char 397 RFA_14 Char 398 RFA_15 Char 399 RFA_16 Char 400 RFA_17 Char 401 RFA_18 Char 402 RFA_19 Char 403 RFA_20 Char 404 RFA_21 Char 405 RFA_22 Char 406 RFA_23 Char 407 RFA_24 Char 408 CARDPROM Num 409 MAXADATE Num 410 NUMPROM Num 411 CARDPM12 Num 412 NUMPRM12 Num 413 RDATE_3 Num 414 RDATE_4 Num 415 RDATE_5 Num 416 RDATE_6 Num 417 RDATE_7 Num 418 RDATE_8 Num 419 RDATE_9 Num 420 RDATE_10 Num 421 RDATE_11 Num 422 RDATE_12 Num 423 RDATE_13 Num 424 RDATE_14 Num 425 RDATE_15 Num 426 RDATE_16 Num 427 
RDATE_17 Num 428 RDATE_18 Num 429 RDATE_19 Num 430 RDATE_20 Num 431 RDATE_21 Num 432 RDATE_22 Num 433 RDATE_23 Num 434 RDATE_24 Num 435 RAMNT_3 Num 436 RAMNT_4 Num 437 RAMNT_5 Num 438 RAMNT_6 Num 439 RAMNT_7 Num 440 RAMNT_8 Num 441 RAMNT_9 Num 442 RAMNT_10 Num 443 RAMNT_11 Num 444 RAMNT_12 Num 445 RAMNT_13 Num 446 RAMNT_14 Num 447 RAMNT_15 Num 448 RAMNT_16 Num 449 RAMNT_17 Num 450 RAMNT_18 Num 451 RAMNT_19 Num 452 RAMNT_20 Num 453 RAMNT_21 Num 454 RAMNT_22 Num 455 RAMNT_23 Num 456 RAMNT_24 Num 457 RAMNTALL Num 458 NGIFTALL Num 459 CARDGIFT Num 460 MINRAMNT Num 461 MINRDATE Num 462 MAXRAMNT Num 463 MAXRDATE Num 464 LASTGIFT Num 465 LASTDATE Num 466 FISTDATE Num 467 NEXTDATE Num 468 TIMELAG Num 469 AVGGIFT Num 470 CONTROLN Num 471 TARGET_B Num 472 TARGET_D Num 473 HPHONE_D Num 474 RFA_2R Char 475 RFA_2F Char 476 RFA_2A Char 477 MDMAUD_R Char 478 MDMAUD_F Char 479 MDMAUD_A Char 480 CLUSTER2 Num 481 GEOCODE2 Char

+--------------------------------------------------------------------+
|                   SUMMARY STATISTICS (MIN & MAX)                   |
+--------------------------------------------------------------------+

Summary statistics are provided for the numeric variables only.
-------- ------------------------- Variable Minimum Maximum -------- ------------------------- ODATEDW 8306.00 9701.00 TCODE 0 72002.00 DOB 0 9710.00 AGE 1.0000000 98.0000000 NUMCHLD 1.0000000 7.0000000 INCOME 1.0000000 7.0000000 WEALTH1 0 9.0000000 HIT 0 241.0000000 MBCRAFT 0 6.0000000 MBGARDEN 0 4.0000000 MBBOOKS 0 9.0000000 MBCOLECT 0 6.0000000 MAGFAML 0 9.0000000 MAGFEM 0 5.0000000 MAGMALE 0 4.0000000 PUBGARDN 0 5.0000000 PUBCULIN 0 6.0000000 PUBHLTH 0 9.0000000 PUBDOITY 0 8.0000000 PUBNEWFN 0 9.0000000 PUBPHOTO 0 2.0000000 PUBOPP 0 9.0000000 MALEMILI 0 99.0000000 MALEVET 0 99.0000000 VIETVETS 0 99.0000000 WWIIVETS 0 99.0000000 LOCALGOV 0 99.0000000 STATEGOV 0 99.0000000 FEDGOV 0 87.0000000 WEALTH2 0 9.0000000 POP901 0 98701.00 POP902 0 23766.00 POP903 0 35403.00 POP90C1 0 99.0000000 POP90C2 0 99.0000000 POP90C3 0 99.0000000 POP90C4 0 99.0000000 POP90C5 0 99.0000000 ETH1 0 99.0000000 ETH2 0 99.0000000 ETH3 0 99.0000000 ETH4 0 99.0000000 ETH5 0 99.0000000 ETH6 0 22.0000000 ETH7 0 72.0000000 ETH8 0 99.0000000 ETH9 0 67.0000000 ETH10 0 46.0000000 ETH11 0 47.0000000 ETH12 0 72.0000000 ETH13 0 97.0000000 ETH14 0 57.0000000 ETH15 0 81.0000000 ETH16 0 86.0000000 AGE901 0 84.0000000 AGE902 0 84.0000000 AGE903 0 84.0000000 AGE904 0 84.0000000 AGE905 0 84.0000000 AGE906 0 84.0000000 AGE907 0 75.0000000 CHIL1 0 99.0000000 CHIL2 0 99.0000000 CHIL3 0 99.0000000 AGEC1 0 99.0000000 AGEC2 0 99.0000000 AGEC3 0 99.0000000 AGEC4 0 99.0000000 AGEC5 0 99.0000000 AGEC6 0 99.0000000 AGEC7 0 99.0000000 CHILC1 0 99.0000000 CHILC2 0 99.0000000 CHILC3 0 99.0000000 CHILC4 0 99.0000000 CHILC5 0 99.0000000 HHAGE1 0 99.0000000 HHAGE2 0 99.0000000 HHAGE3 0 99.0000000 HHN1 0 99.0000000 HHN2 0 99.0000000 HHN3 0 99.0000000 HHN4 0 99.0000000 HHN5 0 99.0000000 HHN6 0 99.0000000 MARR1 0 99.0000000 MARR2 0 99.0000000 MARR3 0 73.0000000 MARR4 0 99.0000000 HHP1 0 650.0000000 HHP2 0 700.0000000 DW1 0 99.0000000 DW2 0 99.0000000 DW3 0 99.0000000 DW4 0 99.0000000 DW5 0 99.0000000 DW6 0 99.0000000 DW7 0 
99.0000000 DW8 0 99.0000000 DW9 0 99.0000000 HV1 0 6000.00 HV2 0 6000.00 HV3 0 13.0000000 HV4 0 13.0000000 HU1 0 99.0000000 HU2 0 99.0000000 HU3 0 99.0000000 HU4 0 99.0000000 HU5 0 99.0000000 HHD1 0 99.0000000 HHD2 0 99.0000000 HHD3 0 99.0000000 HHD4 0 99.0000000 HHD5 0 99.0000000 HHD6 0 99.0000000 HHD7 0 99.0000000 HHD8 0 50.0000000 HHD9 0 99.0000000 HHD10 0 99.0000000 HHD11 0 99.0000000 HHD12 0 99.0000000 ETHC1 0 75.0000000 ETHC2 0 99.0000000 ETHC3 0 99.0000000 ETHC4 0 55.0000000 ETHC5 0 99.0000000 ETHC6 0 99.0000000 HVP1 0 99.0000000 HVP2 0 99.0000000 HVP3 0 99.0000000 HVP4 0 99.0000000 HVP5 0 99.0000000 HVP6 0 99.0000000 HUR1 0 99.0000000 HUR2 0 99.0000000 RHP1 0 85.0000000 RHP2 0 90.0000000 RHP3 0 61.0000000 RHP4 0 40.0000000 HUPA1 0 99.0000000 HUPA2 0 99.0000000 HUPA3 0 99.0000000 HUPA4 0 99.0000000 HUPA5 0 99.0000000 HUPA6 0 99.0000000 HUPA7 0 99.0000000 RP1 0 99.0000000 RP2 0 99.0000000 RP3 0 99.0000000 RP4 0 99.0000000 MSA 0 9360.00 ADI 0 651.0000000 DMA 0 881.0000000 IC1 0 1500.00 IC2 0 1500.00 IC3 0 1500.00 IC4 0 1500.00 IC5 0 174523.00 IC6 0 99.0000000 IC7 0 99.0000000 IC8 0 99.0000000 IC9 0 99.0000000 IC10 0 99.0000000 IC11 0 99.0000000 IC12 0 50.0000000 IC13 0 61.0000000 IC14 0 99.0000000 IC15 0 99.0000000 IC16 0 99.0000000 IC17 0 99.0000000 IC18 0 99.0000000 IC19 0 99.0000000 IC20 0 99.0000000 IC21 0 50.0000000 IC22 0 99.0000000 IC23 0 99.0000000 HHAS1 0 99.0000000 HHAS2 0 99.0000000 HHAS3 0 99.0000000 HHAS4 0 99.0000000 MC1 0 99.0000000 MC2 0 99.0000000 MC3 0 99.0000000 TPE1 0 99.0000000 TPE2 0 99.0000000 TPE3 0 99.0000000 TPE4 0 99.0000000 TPE5 0 71.0000000 TPE6 0 47.0000000 TPE7 0 25.0000000 TPE8 0 99.0000000 TPE9 0 99.0000000 PEC1 0 99.0000000 PEC2 0 99.0000000 TPE10 0 90.0000000 TPE11 0 76.0000000 TPE12 0 99.0000000 TPE13 0 99.0000000 LFC1 0 99.0000000 LFC2 0 99.0000000 LFC3 0 99.0000000 LFC4 0 99.0000000 LFC5 0 99.0000000 LFC6 0 99.0000000 LFC7 0 99.0000000 LFC8 0 99.0000000 LFC9 0 99.0000000 LFC10 0 99.0000000 OCC1 0 99.0000000 OCC2 0 
99.0000000 OCC3 0 99.0000000 OCC4 0 99.0000000 OCC5 0 99.0000000 OCC6 0 43.0000000 OCC7 0 55.0000000 OCC8 0 99.0000000 OCC9 0 99.0000000 OCC10 0 99.0000000 OCC11 0 99.0000000 OCC12 0 99.0000000 OCC13 0 99.0000000 EIC1 0 99.0000000 EIC2 0 65.0000000 EIC3 0 99.0000000 EIC4 0 99.0000000 EIC5 0 99.0000000 EIC6 0 64.0000000 EIC7 0 99.0000000 EIC8 0 99.0000000 EIC9 0 99.0000000 EIC10 0 99.0000000 EIC11 0 99.0000000 EIC12 0 67.0000000 EIC13 0 99.0000000 EIC14 0 99.0000000 EIC15 0 99.0000000 EIC16 0 99.0000000 OEDC1 0 99.0000000 OEDC2 0 99.0000000 OEDC3 0 99.0000000 OEDC4 0 99.0000000 OEDC5 0 99.0000000 OEDC6 0 99.0000000 OEDC7 0 99.0000000 EC1 0 170.0000000 EC2 0 99.0000000 EC3 0 99.0000000 EC4 0 99.0000000 EC5 0 99.0000000 EC6 0 37.0000000 EC7 0 99.0000000 EC8 0 99.0000000 SEC1 0 97.0000000 SEC2 0 99.0000000 SEC3 0 30.0000000 SEC4 0 72.0000000 SEC5 0 99.0000000 AFC1 0 97.0000000 AFC2 0 99.0000000 AFC3 0 78.0000000 AFC4 0 99.0000000 AFC5 0 99.0000000 AFC6 0 30.0000000 VC1 0 99.0000000 VC2 0 99.0000000 VC3 0 99.0000000 VC4 0 99.0000000 ANC1 0 83.0000000 ANC2 0 99.0000000 ANC3 0 31.0000000 ANC4 0 92.0000000 ANC5 0 47.0000000 ANC6 0 14.0000000 ANC7 0 99.0000000 ANC8 0 55.0000000 ANC9 0 68.0000000 ANC10 0 99.0000000 ANC11 0 43.0000000 ANC12 0 52.0000000 ANC13 0 50.0000000 ANC14 0 27.0000000 ANC15 0 32.0000000 POBC1 0 99.0000000 POBC2 0 99.0000000 LSC1 0 99.0000000 LSC2 0 99.0000000 LSC3 0 99.0000000 LSC4 0 99.0000000 VOC1 0 99.0000000 VOC2 0 99.0000000 VOC3 0 99.0000000 HC1 0 31.0000000 HC2 0 52.0000000 HC3 0 99.0000000 HC4 0 99.0000000 HC5 0 99.0000000 HC6 0 99.0000000 HC7 0 99.0000000 HC8 0 99.0000000 HC9 0 90.0000000 HC10 0 62.0000000 HC11 0 99.0000000 HC12 0 99.0000000 HC13 0 99.0000000 HC14 0 99.0000000 HC15 0 30.0000000 HC16 0 99.0000000 HC17 0 99.0000000 HC18 0 99.0000000 HC19 0 99.0000000 HC20 0 99.0000000 HC21 0 99.0000000 MHUC1 0 21.0000000 MHUC2 0 5.0000000 AC1 0 99.0000000 AC2 0 99.0000000 ADATE_2 9704.00 9706.00 ADATE_3 9604.00 9606.00 ADATE_4 9511.00 9609.00 
ADATE_5 9604.00 9604.00 ADATE_6 9601.00 9603.00 ADATE_7 9512.00 9602.00 ADATE_8 9511.00 9605.00 ADATE_9 9509.00 9511.00 ADATE_10 9510.00 9511.00 ADATE_11 9508.00 9511.00 ADATE_12 9507.00 9510.00 ADATE_13 9502.00 9507.00 ADATE_14 9504.00 9506.00 ADATE_15 9504.00 9504.00 ADATE_16 9502.00 9504.00 ADATE_17 9501.00 9503.00 ADATE_18 9409.00 9508.00 ADATE_19 9409.00 9411.00 ADATE_20 9411.00 9412.00 ADATE_21 9409.00 9410.00 ADATE_22 9408.00 9506.00 ADATE_23 9312.00 9407.00 ADATE_24 9405.00 9406.00 CARDPROM 1.0000000 61.0000000 MAXADATE 9608.00 9702.00 NUMPROM 4.0000000 195.0000000 CARDPM12 0 19.0000000 NUMPRM12 1.0000000 78.0000000 RDATE_3 9605.00 9806.00 RDATE_4 9510.00 9804.00 RDATE_5 9604.00 9803.00 RDATE_6 9510.00 9805.00 RDATE_7 9512.00 9610.00 RDATE_8 9511.00 9806.00 RDATE_9 9509.00 9609.00 RDATE_10 9510.00 9806.00 RDATE_11 9509.00 9805.00 RDATE_12 9509.00 9806.00 RDATE_13 9502.00 9603.00 RDATE_14 9406.00 9603.00 RDATE_15 9412.00 9603.00 RDATE_16 9411.00 9805.00 RDATE_17 9502.00 9512.00 RDATE_18 9412.00 9601.00 RDATE_19 9409.00 9509.00 RDATE_20 9411.00 9508.00 RDATE_21 9409.00 9508.00 RDATE_22 9409.00 9510.00 RDATE_23 9309.00 9507.00 RDATE_24 9309.00 9504.00 RAMNT_3 2.0000000 50.0000000 RAMNT_4 1.0000000 100.0000000 RAMNT_5 4.0000000 50.0000000 RAMNT_6 1.0000000 100.0000000 RAMNT_7 1.0000000 250.0000000 RAMNT_8 1.0000000 500.0000000 RAMNT_9 1.0000000 1000.00 RAMNT_10 0.3000000 500.0000000 RAMNT_11 1.0000000 300.0000000 RAMNT_12 1.0000000 300.0000000 RAMNT_13 0.1000000 500.0000000 RAMNT_14 1.0000000 200.0000000 RAMNT_15 1.0000000 300.0000000 RAMNT_16 0.5000000 500.0000000 RAMNT_17 1.0000000 500.0000000 RAMNT_18 1.0000000 1000.00 RAMNT_19 1.0000000 970.0000000 RAMNT_20 0.5000000 250.0000000 RAMNT_21 1.0000000 300.0000000 RAMNT_22 0.2900000 300.0000000 RAMNT_23 0.3000000 200.0000000 RAMNT_24 1.0000000 225.0000000 RAMNTALL 13.0000000 9485.00 NGIFTALL 1.0000000 237.0000000 CARDGIFT 0 41.0000000 MINRAMNT 0 1000.00 MINRDATE 7506.00 9702.00 MAXRAMNT 5.0000000 5000.00 
MAXRDATE 7510.00 9702.00 LASTGIFT 0 1000.00 LASTDATE 9503.00 9702.00 FISTDATE 0 9603.00 NEXTDATE 7211.00 9702.00 TIMELAG 0 1088.00 AVGGIFT 1.2857143 1000.00 CONTROLN 1.0000000 191779.00 TARGET_B 0 1.0000000 TARGET_D 0 200.0000000 HPHONE_D 0 1.0000000 CLUSTER2 1.0000000 62.0000000
--------------------------------------

+--------------------------------------------------------------------+
|                        DATA (PRE)PROCESSING                        |
+--------------------------------------------------------------------+

General
-------

  o The field CONTROLN is a unique record identifier (an index) and should not be used in modeling.
  o The response flag (field name: TARGET_B) indicates whether or not the lapsed donor responded to the campaign. This field should not be used as a predictor for TARGET_D. Similarly, TARGET_D should not be used as a predictor for TARGET_B.
  o Blanks in string or character variables correspond to missing values. Periods and/or blanks in the numeric variables correspond to missing values.

Data preprocessing tasks include the following:

Noisy Data
----------

Some of the fields in the analysis file may contain data entry and/or formatting errors. You are expected to clean these fields (without excluding the records).

Records and Fields with Missing and Sparse Data
-----------------------------------------------

Discovery methods vary in the way they treat missing values. While some simply disregard missing values or omit the corresponding records, others infer missing values from known values, or treat missing data as a special value to be included additionally in the attribute domain.

For the purposes of this project, records and/or fields should not be omitted from the analysis because they contain missing data. Instead, the missing data should be inferred from known values (e.g., mean, median, mode, a modeled value, or any other way supported by your tool). One exception to this rule is the attributes with 99.5 percent or more missing values.
You are expected to omit these attributes from the analysis. You are also expected to drop attributes with 'sparse' distributions. Sparse data occur when the events actually represented in the given data make up only a very small subset of the event space.

Fields Containing Constants
---------------------------

Fields containing a constant value (i.e., there is only one value for all the records) should be dropped from the analysis. Attributes containing missing values and one valid level (e.g., 'Y') are not considered constants and should be included in the analysis.

Time Frame and Date Fields
--------------------------

This mailing was sent to a total of 3.5 million CTY donors who were on the CTY database as of June 1997. All information contained in the analysis dataset reflects the donor status prior to 6/97 (except the gift receipt dates, which will follow the promotion dates). This date could be used as the "end date" or "reference date" in the calculation of "number of months since" variables.

ATTRIBUTE TYPE
--------------

See the data dictionary to determine the attribute types.

+--------------------------------------------------------------------+
|                        TERMINOLOGY-GLOSSARY                        |
+--------------------------------------------------------------------+

[GLOSSARY]

  o attribute = field = variable = feature
  o responders = targets
  o non-responders = non-targets
  o output = target = dependent variable
  o inputs = independent variables
  o analysis file = analysis sample = raw data
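
As an illustration of the rules in the DATA (PRE)PROCESSING section, the following is a minimal sketch of field selection and imputation. It assumes that records have been read as dictionaries of strings keyed by field name (for example with Python's csv.DictReader), and it uses the median for numeric fields and the mode for character fields, which is only one of the options the section allows. The function names are hypothetical, and the check for 'sparse' distributions is not covered because no threshold is given for it.

```python
from collections import Counter
from statistics import median


def is_missing(value):
    """Blanks and periods mark missing values."""
    return value is None or str(value).strip() in ("", ".")


def select_fields(rows, exclude=("CONTROLN",), max_missing=0.995):
    """Return the names of the fields to keep for modeling.

    Dropped: the record identifier, fields with 99.5 percent or more
    missing values, and fields holding one value for all the records.
    A field with missing values plus one valid level is kept.
    """
    keep = []
    for name in rows[0]:
        if name in exclude:
            continue
        values = [row[name] for row in rows]
        present = [v for v in values if not is_missing(v)]
        if 1 - len(present) / len(values) >= max_missing:
            continue
        if len(present) == len(values) and len(set(present)) == 1:
            continue
        keep.append(name)
    return keep


def impute(rows, name, numeric):
    """Return the column `name` with missing values filled in.

    Numeric fields get the median of the known values, character
    fields get the most frequent known value (the mode).
    """
    present = [row[name] for row in rows if not is_missing(row[name])]
    if numeric:
        fill = median(float(v) for v in present)
        return [fill if is_missing(r[name]) else float(r[name]) for r in rows]
    fill = Counter(present).most_common(1)[0][0]
    return [fill if is_missing(r[name]) else r[name] for r in rows]
```

Note that select_fields keeps both TARGET_B and TARGET_D; removing one target from the inputs when modeling the other still has to be done separately.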