Theoretical Background of Model Development
The formulation of this model is loosely grounded in concepts developed by Volkwein and Szelest. Their study of student loan defaults is based on four theoretical perspectives: 1) human capital and public subsidy, 2) borrower's ability to pay, 3) organizational structural/functional approaches, and 4) student-institution fit models from other literature. Each of these perspectives supports the choice of variables in our model that are influential to default behaviour.
Brief descriptions of these perspectives are as follows. Human capital theory is based on the inherent value of a person's skills and knowledge, and it relates the acquisition of skills and knowledge to educational investment. Public subsidy theory states that low-income but capable students will benefit from the investment in education when the benefits of education exceed the cost of obtaining it. Borrower's ability to pay theory relates the income levels of students and of their parents to the borrower's ability to repay loans. Organizational structural/functional approaches hold that organizational characteristics exert influence on student choices and behaviour, including repayment of loans. Student-institution fit models from other literature comprise many individual student traits that help explain repayment behaviour. Volkwein and Szelest provide a more thorough explanation of these concepts.26
Methodology of Model
The methodology of the default model consists of the steps taken to implement a mathematical representation of default patterns. These steps begin with obtaining the data and end with the realization of the model.
One of the model's advantages is the magnitude of the sample size. Models from nearly all other studies are based on sample sizes ranging from the hundreds to less than ten thousand. Often, the small sample sizes would represent a 'universe' of borrowers. TG's data files contain approximately three-quarters of the borrowers in Texas during the period from April 1990 to the end of September 1991. Since we have approximately 170,000 observations and the data represent the majority of borrowers in Texas, this model should produce a more robust inference of the patterns of defaults.
Our first step in model formulation was to select possible characteristics of default and to perform cross-tabulations of characteristics that have a possibility of being associated with default behaviour. The statistical significance of the relationship between default and each characteristic was tested with a chi-square test in several different cohorts. Since this model is focused on predicting future borrowers who separate from institutions, we attempted to use characteristics at the time the borrower left school. However, the enrollment status variables (whether a student withdrew from school, whether a student graduated, whether a student had less than half-time status, whether a student returned half or full-time, and whether a student had other statuses) were not available, so we used data available from November 1995. Some other variables were eliminated because of collinearity or unavailability of data. Table 5 contains the means and standard deviations for the selected variables.
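As an illustration of the screening step described above, the following Python sketch cross-tabulates one characteristic against default and computes the Pearson chi-square statistic. It is a minimal sketch under stated assumptions, not the procedure actually run on TG's data files: the field names (school_type, defaulted), the function names, and the counts are all hypothetical, and the p-value helper applies only to tables with one degree of freedom.

```python
from collections import Counter
import math

def crosstab(records, characteristic, outcome="defaulted"):
    """Count borrowers in each (characteristic value, outcome) cell."""
    return Counter((r[characteristic], r[outcome]) for r in records)

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a crosstab."""
    rows = sorted({key[0] for key in table})
    cols = sorted({key[1] for key in table})
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            # Expected count if the characteristic and default were independent.
            expected = row_tot[r] * col_tot[c] / total
            stat += (table[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof

def p_value_1dof(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical sample: 100 two-year and 100 four-year borrowers (made-up counts).
sample = (
    [{"school_type": "two_year", "defaulted": True}] * 30
    + [{"school_type": "two_year", "defaulted": False}] * 70
    + [{"school_type": "four_year", "defaulted": True}] * 10
    + [{"school_type": "four_year", "defaulted": False}] * 90
)

stat, dof = chi_square(crosstab(sample, "school_type"))
print(stat, dof, p_value_1dof(stat))
```

In a screen of this kind, a characteristic would be kept as a candidate only if the association is significant, and the same test would be repeated within each cohort examined.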
Each observation represents a borrower. About 19 percent of the observations were eliminated due to missing values in explanatory variables. Continuous variables were plotted to check for linearity, and each was modeled as a continuous variable if the relationship between the characteristic and default was considered linear. If the continuous variable exhibited a non-linear pattern, then prediction by category was determined to best represent the relationship.
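The two preparation steps above can be sketched in Python as follows. This is only an illustrative sketch: the report does not state how the linearity plots were produced, so the binning approach, the field names (loan_amount, defaulted), the cut points, and the data shown here are assumptions made for the example.

```python
from bisect import bisect_right

def complete_cases(records, fields):
    """Keep only observations with no missing value in the given fields."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

def default_rate_by_bin(records, variable, edges, outcome="defaulted"):
    """Default rate within each bin of a continuous variable.

    `edges` are ascending cut points; n edges define n + 1 bins.
    A bin with no observations yields None.
    """
    counts = [0] * (len(edges) + 1)
    defaults = [0] * (len(edges) + 1)
    for r in records:
        i = bisect_right(edges, r[variable])
        counts[i] += 1
        if r[outcome]:
            defaults[i] += 1
    return [d / n if n else None for d, n in zip(defaults, counts)]

# Hypothetical borrowers; the last one has a missing explanatory variable.
borrowers = [
    {"loan_amount": 1000, "defaulted": True},
    {"loan_amount": 1500, "defaulted": True},
    {"loan_amount": 3000, "defaulted": True},
    {"loan_amount": 3500, "defaulted": False},
    {"loan_amount": 6000, "defaulted": False},
    {"loan_amount": 7000, "defaulted": False},
    {"loan_amount": None, "defaulted": False},
]

usable = complete_cases(borrowers, ["loan_amount"])
rates = default_rate_by_bin(usable, "loan_amount", edges=[2000, 5000])
print(len(usable), rates)
```

If the binned default rates rise or fall steadily across bins, treating the variable as continuous is plausible; an irregular pattern would instead suggest entering the variable as categories.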
Some literature has emphasized the importance of race and income levels in determining default, so variables representing these factors are included in the model. U.S. Census tract data provide the median income level and the percent share of both Afro-Americans and Hispanics within each borrower's ZIP code tract.
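Attaching area-level census values to individual borrowers amounts to a lookup keyed on ZIP code. The Python sketch below shows one way this could be done; the field names, the ZIP codes, and the figures are hypothetical, and the report does not describe the actual matching procedure.

```python
# Hypothetical names for the census fields attached to each borrower.
CENSUS_FIELDS = ("median_income", "pct_afro_american", "pct_hispanic")

def attach_census(borrowers, census_by_zip):
    """Return borrower records extended with ZIP-level census fields.

    Borrowers whose ZIP code has no census entry receive None for each
    field, which marks the observation as having missing values.
    """
    merged = []
    for b in borrowers:
        area = census_by_zip.get(b["zip_code"], {})
        merged.append({**b, **{f: area.get(f) for f in CENSUS_FIELDS}})
    return merged

# Made-up census table and borrowers for illustration only.
census_by_zip = {
    "00001": {"median_income": 25000, "pct_afro_american": 12.0, "pct_hispanic": 30.0},
}
borrowers_with_zip = [
    {"borrower_id": 1, "zip_code": "00001"},
    {"borrower_id": 2, "zip_code": "00002"},  # no census match
]

enriched = attach_census(borrowers_with_zip, census_by_zip)
print(enriched)
```

Note that these are characteristics of the borrower's area rather than of the borrower, so they act as proxies for individual race and income.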
The choice of the April 1990 to September 1991 cohort was largely discretionary, but it was made to ensure a distribution of students across school types. Most borrowers enter repayment in either November or June, when most students from two and four-year schools enter repayment (see Figure 2). In these two months of the year, students from two and four-year schools dominate those entering repayment. In the remaining months, the type of school that dominates entering repayment varies from month to month. The 18-month cohort contains six months of the former type (November and June) and 12 months of the latter.
Some time-dependent explanatory variables are measured in November 1991. These variables are:
Another set of time-dependent explanatory variables was measured in November 1995, since we could not obtain the data for 1991. These variables are the enrollment statuses previously discussed (in Section V, "Predicting which Borrowers are Most Likely to Default").
These variables are time-dependent since their effect on the model is dynamic with respect to the cohort time period. We take a 'snapshot' of these characteristics in either November 1991 or November 1995 in the model. For example, enrollment status is dynamic since it changes for a borrower as time progresses after the end of the cohort period. We consider variables such as type of school and location of student to be more static, since these tend not to change for an observation over time.