Multiple imputation in the probit model

 
 
 
 
1. Introduction
Statistics attempts to explain and predict the behavior of individuals. In fact, we often need to make decisions and choices, such as yes or no, hire or dismiss, this way or that way. Whenever we make such choices or decisions, we obtain binary variables, which are generally coded as 0 and 1. In this paper, we consider the probit model, one of the special binary response models. Probit analysis is a widely used method for estimating the relationship between a binary response variable and explanatory variables. Measurement errors always exist when the explanatory variables are observed and cannot be avoided, so measurement error must be taken into account in the probit model. Missing data are another very important problem in statistical work: whenever we collect data, some values will almost certainly be missing. Many methods can deal with missing data; here we consider the multiple imputation technique, the regression imputation method and the mean imputation method. Mean imputation substitutes the mean of the non-missing values of a variable for each of its missing values. Regression imputation is an extension of mean imputation, also called conditional mean imputation: each missing value is replaced with an estimated value based on the regression of the missing variable on the other variables in the complete data set. Multiple imputation replaces each missing value with a set of plausible values instead of a single value, in order to reflect the uncertainty about the imputed values. The aim of this thesis is to use the multiple imputation technique, the regression imputation method and the mean imputation method to estimate the missing data, and to compare the degree of bias of the probit ML estimator under the normal distribution.
 
 
In section 2, we introduce the probit model, including the model specification and maximum likelihood estimation. In section 3, we introduce imputation for missing values: we compare the mean imputation method, the regression imputation method and the multiple imputation technique, consider the Berkson error, and discuss the potential effects of measurement error. Section 4 is the simulation study, in which we generate the data sets and estimate the parameters. In section 5, we present the conclusion.
 
 
2. Probit model
In everyday life we sometimes have to choose among alternatives, such as deciding whether or not to go for postgraduate studies or how to commute to work. These decisions are qualitative choices. If there are only two alternatives, the first coded as 1 and the second as 0, the choice is binary. For the analysis, suppose the response variable $y$ is binary; for instance, $y=1$ if a person is employed and $y=0$ otherwise. The binary response model is written as
$$y_i^* = x_i\beta + e_i, \qquad
y_i = \begin{cases} 1 & \text{if } y_i^* \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2.1)$$
where $y_i$ is the indicator of the $i$th individual's response, determined by the latent variable $y_i^*$; $x_i$ is a $1\times K$ vector of explanatory variables; $\beta$ is a $K\times 1$ vector of parameters; and $e_i$ is a normally distributed random error term with zero expectation and unit variance.
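For illustration, a minimal Python sketch of this data-generating process could look as follows; the sample size, the values in beta and the use of numpy are illustrative assumptions rather than part of the model.

```python
import numpy as np

# Illustrative data-generating process for the binary response model (2.1).
rng = np.random.default_rng(0)

n, K = 1000, 2                               # sample size and number of regressors (arbitrary)
beta = np.array([0.5, -1.0])                 # illustrative K x 1 parameter vector
x = rng.normal(size=(n, K))                  # n observations of the 1 x K regressor vector x_i

y_star = x @ beta + rng.standard_normal(n)   # latent variable y_i* = x_i beta + e_i, e_i ~ N(0, 1)
y = (y_star >= 0).astype(int)                # observed response: y_i = 1 if y_i* >= 0, else 0
```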
 
 
Examples of the binary response model are the linear probability model, the probit model and the logit model. The probit and logit models often give similar results, although the logit model assumes a logistic error distribution and the probit model a standard normal one. Unless the sample size is very large and the observations in the tails exert a large influence, the results obtained from the two models will be very close. In this paper, we only discuss the probit model.
 
 
The probit model was introduced by Chester Bliss in 1935. Let $y$ be a binary response variable with two possible values, 0 and 1. We assume $x$ is a $1\times K$ vector of explanatory variables that can influence the outcome of $y$. The probit model then takes the form
$$P(y=1 \mid x) = \Phi(x\beta) \qquad (2.2)$$
where $\beta$ is a vector of unknown parameters estimated by maximum likelihood and $\Phi$ denotes the standard normal cumulative distribution function.
 
 
If the model contains a random error term $e$ that is normally distributed, we can motivate the probit model as a latent variable model. Suppose $e$ is independent of $x$ and follows a normal distribution with mean 0 and variance 1. As in equation (2.1), this gives
 
 
$$\begin{aligned}
\Pr(y_i=1) &= \Pr(y_i^* \ge 0) \\
           &= \Pr(x_i\beta + e_i \ge 0) \\
           &= \Pr(e_i \ge -x_i\beta) \\
           &= 1 - \Phi(-x_i\beta) \\
           &= \Phi(x_i\beta)
\end{aligned} \qquad (2.3)$$
 
 
In the probit model, the discrete dependent variable is linked to a continuous probability scale, since $\Phi(\cdot)\in[0,1]$, so we can use the log-likelihood function to estimate the parameters. Combining the parameter vector $\beta$ with the data $(x_i, y_i)$ for each observation, we obtain the likelihood function
$$L(\beta) = \prod_{i=1}^{n}\,[\Phi(x_i\beta)]^{y_i}\,[1-\Phi(x_i\beta)]^{1-y_i} \qquad (2.4)$$
For a sample of size $n$, the log-likelihood function is
$$\ln L(\beta) = \sum_{i=1}^{n}\left\{\, y_i \log[\Phi(x_i\beta)] + (1-y_i)\log[1-\Phi(x_i\beta)] \,\right\} \qquad (2.5)$$
By maximizing $\ln L(\beta)$ we obtain $\hat\beta$, the MLE of $\beta$. Because $\Phi(\cdot)$ is the standard normal cumulative distribution function, $\hat\beta$ is the probit ML estimator.
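A minimal sketch of this estimator, assuming the log-likelihood (2.5) is maximized numerically with scipy; the function names probit_loglik and probit_mle and the choice of the BFGS optimizer are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_loglik(beta, x, y):
    """Log-likelihood (2.5): sum over i of y_i*log Phi(x_i beta) + (1 - y_i)*log(1 - Phi(x_i beta))."""
    p = norm.cdf(x @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)    # guard against log(0) for extreme linear predictors
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def probit_mle(x, y):
    """Numerically maximize ln L(beta); the maximizer is the probit ML estimate of beta."""
    k = x.shape[1]
    result = minimize(lambda b: -probit_loglik(b, x, y), x0=np.zeros(k), method="BFGS")
    return result.x

# Example, using x and y generated as in the sketch after equation (2.1):
# beta_hat = probit_mle(x, y)
```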
 
 
3. Imputation for missing values
Missing data are a frequent problem in many fields. In statistical analysis, most data sets have missing values for some observations. There are many ways to handle missing data; imputing the missing values is a common approach when not all values are observed. Imputation is a procedure that replaces the holes in the data with appropriate values. Imputation techniques include single imputation methods and the multiple imputation method.
 
 
Mean imputation
Mean imputation is a single imputation method. It substitutes the mean of the non-missing values of a variable for each missing value of that variable. We can compute the imputed value as follows
$$\bar{x}_k = \frac{1}{m}\sum_{r} x_{kr} \qquad (3.1)$$
where $m$ is the number of non-missing values and the sum runs over the observed values $x_{kr}$ of variable $x_k$. We then substitute $\bar{x}_k$ for the missing values.
 
 
Mean imputation does not take the uncertainty of the missing values into account. A potential disadvantage of mean imputation is that it results in biased parameter estimates: replacing all missing values with the mean of the non-missing values artificially reduces the variance. If the data are missing completely at random, the estimate of the mean itself remains unbiased.
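A minimal sketch of mean imputation as in (3.1), assuming missing values are coded as NaN in a numpy array; the helper name mean_impute is illustrative.

```python
import numpy as np

def mean_impute(x):
    """Replace every missing value (coded as NaN) in a 1-D array with the mean
    of the non-missing values, i.e. the x_bar_k of equation (3.1)."""
    x = np.asarray(x, dtype=float).copy()
    observed = ~np.isnan(x)
    x[~observed] = x[observed].mean()   # (1/m) * sum of the m observed values
    return x

# Example: mean_impute([1.0, np.nan, 3.0, np.nan]) returns array([1., 2., 3., 2.])
```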
 
 
Regression imputation
Regression imputation is also a single imputation method. It is an extension of mean imputation, also called conditional mean imputation. Regression imputation replaces each missing value with an estimated value based on the regression of the missing variable on the other variables in the complete data set. If the value of $x_k$ is missing, we can estimate the imputed value as follows
$$\hat{x}_k = z_k\hat{\beta} \qquad (3.2)$$
Using the complete observations $(x_k, z_k)$, we obtain the regression coefficient vector $\hat{\beta}$ and then compute the imputed values.
The major advantage of regression imputation is that it gives an unbiased point estimate of the missing value: the bias is reduced because the relationships with the known variables in the sample are used. However, regression imputation also has a disadvantage: it may intensify the existing relationships in the sample, so that the estimated results become more characteristic of the sample and less generalizable to the population.
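A minimal sketch of regression (conditional mean) imputation as in (3.2), assuming missing values of $x_k$ are coded as NaN and $z$ holds the fully observed covariates; the helper name regression_impute and the use of ordinary least squares with an intercept are illustrative choices.

```python
import numpy as np

def regression_impute(x, z):
    """Replace missing values of x (coded as NaN) with fitted values from an OLS
    regression of x on z over the complete cases, i.e. conditional mean imputation (3.2)."""
    x = np.asarray(x, dtype=float).copy()
    z = np.asarray(z, dtype=float)
    observed = ~np.isnan(x)
    Z = np.column_stack([np.ones(len(x)), z])                 # design matrix with an intercept
    beta_hat, *_ = np.linalg.lstsq(Z[observed], x[observed], rcond=None)
    x[~observed] = Z[~observed] @ beta_hat                    # imputed value z_k * beta_hat
    return x
```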
 
Multiple imputation
Rubin proposed the multiple imputation technique because of the disadvantages of single imputation. Multiple imputation differs from single imputation, which uses a single value to replace each missing value. In multiple imputation we use a regression method to construct m (m > 1) estimated values for each missing value and replace each missing value m times, obtaining m complete data sets. The m complete data sets are then analysed with standard procedures, and the results of the m analyses are combined for the inference.
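A minimal sketch of this idea, assuming a simplified stochastic regression imputation in which each missing value receives its regression prediction plus residual noise (a full multiple imputation would also draw the regression parameters from their posterior); the helper names multiple_impute and combine_estimates are illustrative.

```python
import numpy as np

def multiple_impute(x, z, m=5, rng=None):
    """Create m completed copies of x: each missing value (NaN) is replaced by its
    regression prediction from z plus a random draw from the residual distribution,
    so the m completed data sets differ and reflect the imputation uncertainty."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    observed = ~np.isnan(x)
    Z = np.column_stack([np.ones(len(x)), z])
    beta_hat, *_ = np.linalg.lstsq(Z[observed], x[observed], rcond=None)
    resid = x[observed] - Z[observed] @ beta_hat
    sigma = resid.std(ddof=Z.shape[1])                        # residual standard deviation
    completed = []
    for _ in range(m):
        xi = x.copy()
        xi[~observed] = Z[~observed] @ beta_hat + rng.normal(scale=sigma, size=(~observed).sum())
        completed.append(xi)
    return completed

def combine_estimates(estimates):
    """Pool the m point estimates from the m completed data sets by averaging,
    which is Rubin's rule for the combined point estimate (the combined variance
    would add the within- and between-imputation components)."""
    return np.mean(np.asarray(estimates), axis=0)
```

Each of the m completed data sets can then be analysed with the probit ML estimator, and the m point estimates of beta pooled with combine_estimates.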
 