案例经济分析案例经济分析
project computer class Risk Theory 2017–2018; week 2
案例经济分析
Bailey-Simon tariff method—optimization in R
image.png– an open source, community developed statistical computing system – has acquired significant capability in optimization and nonlinear parameter estimation. Indeed, R is be- lieved to be the leading package for research with statistics, and its optimization capabilities reflect the interests of its creators and users. Thus maximum likelihood computations have particular tools, but there are also a number of function optimization tools that can be used
to construct customized computations. Having tools as well for mathematical programming tasks, R is a strong general-purpose tool for scientific computation. It is also quite adaptable and sometimes is used as a user interface to tools written in other languages.
In this project, we study the problem of constructing a tariff in non-life insurance. We have a portfolio of risks split up by two risk factors, gender i= 1,2 and region of residence j= 1,2,3. The available data are the numbers of policies nijin each class (i,j), as well as the average of the claims rijfiled.
We want to construct a tariff for insuring these policies, charging a premium of the form xiyj, for certain numbers x1,x2 and y1,y2,y3. It can be seen as a basic premium per region, multiplied by a surcharge for gender, or vice versa. We want to find numbers xiand yjsuch that the values xiyjare in some sense ‘close’ to the observed risk premiums rij.
We define ‘close’ as having a small value of a quantity equal to or resembling (o− e)2, with othe observed values rijin each cell and ethe premiums xiyjto be charged.
In insurance applications, the rule is that the deviation of o from its mean value will tend to be larger if this mean is large. So if we look at the unmodified sum-of-squares (o− e)2, the large values of o and e will have a large influence on the fitted values of xiand yj. Now assume that e is the mean value of o, and also a constant times its variance, such as when the observations are Poisson variables or multiples of those. Then if we look at (o − e)2/e instead, a scaled version of the sum-of-squares, all terms in the sum can be seen to have equal mean. See also MART, Ch. 9.3. Such quantities are known as chi-square statistics, and minimizing them to get estimates is called minimum chi-square estimation.
By asymptotic normality, the quantity Σ
i,j
(Oij− E^[Oij])2/E^[Oij] approximately has a χ2
distribution, with as the number of degrees of freedom k the number of cells minus the number of parameters estimated.
The purpose of this project is to learn about
• using matrices, functions, iteration in R;
1 Reading thedata
Our data are average claims and the numbers of policies in a portfolio split up by gender, row-wise, and by three regions of residence, column-wise:
Gender
Region 1
Policies Claims
Region 2
Policies Claims
Region 3
Policies Claims
1
800 550
2400 364
1200 455
2
3200 625
1600 455
800 518
First we read the data and tell R to store them in 2 × 3 matrices:
n <- rbind(c(800, 2400, 1200),
c(3200, 1600, 800)) ## nr. of policies r <- rbind(c(550, 364, 455),
c(625, 455, 518)) ## average claim amounts
2 Usingthegeneralpurposeroutineoptim()
Our first plan of action to find optimal premiums is simply to invoke R’s general purpose multivariate optimization routine optim(). The first argument of optim() is a vector of length equal to the number of parameters, and initially set to some starting values. The second is a function to be minimized. Further and optional arguments might instruct optim to use a specific optimization technique or to stop only after many iterations.
The first two elements of the parameter vector pars for the function dist1 defined below represent x1,x2, the last three are y1,y2,y3. Using the fact that the outer product ˙x⊗ ˙y is a matrix with (i, j) element xiyj, a straightforward function to compute the ‘distance’ d(˙x,˙y) = (o− e)2/eis as follows:
dist1 <- function (pars) { x <- pars[1:2] y <- pars[3:5] f <- x o y ## o gives outer product sum(n * (r - f) ^ 2 / f) ## i.e., sum{(o-e)^2/e} }
Note that oijhere is actually nijrij, and eijis nijfij, which explains the n appearing in the numerator for (o−e)2/ein the final line of the script. Since xiyjis to be used as a premium for cell (i,j), all parameters xi,yjmust be positive. If the parameters are not constrained to be positive, it is easy to see that the distance function has minimal value −∞, obtained when any f-value tends to −∞. Later on we will see some ways to find the solution we need, forcing optim to scan only the positive region.
We minimize this function dist1 by calling optim(), taking all starting values of the pa- rameters equal to one. By this choice, the initial fitted values are also all equal to 1, meaning we start the search for the minimum in a point very far removed from the minimum when all parameters are positive.
oo <- optim(rep(1, 5), dist1) dist1(rep(1, 5)) ## 2,594,325,600 oo$value ## 2554.5 oo$convergence ## 1 oo$counts ## 502 NA
Instead of the outer product, we might have used for-loops the generate the matrix of cross- products xiyj. But unless the result in the next step of a for-loop depends on the result of the previous step(s), such as in recursive algorithms, it is better to use vector operations whenever possible. First of all, they are easier to read, and second, they are very much faster.The result of the optimization
案例经济分析用R语言computer class Risk Theory Ba
2017-07-30