STATG003/M003 STATISTICAL COMPUTING
ASSESSMENT 3 (2017/18 SESSION)
Your solutions should be your own work and are to be handed in by yourself to the
Statistical Science Departmental Office by 4pm on MONDAY, 23rd of April 2018.
Detailed submission instructions are given below.
Before you hand in your work, complete and sign the slip below this rubric, cut it off
and attach it firmly to your work.
When you submit your work, please make sure that someone in the Departmental
Office records on their list of students that you have handed in your work.
Late submission will incur a penalty unless there are extenuating circumstances (e.g.
medical) supported by appropriate documentation. Penalties are set out in the latest
editions of the Statistical Science Departmental Student Handbooks, available from
the departmental web pages.
Failure to submit this in-course assessment will mean that your overall examination
mark is recorded as non-complete, i.e. you will not obtain a pass for the course.
Any plagiarism or collusion will normally result in zero marks for all students
involved, which may also mean that your overall examination mark is recorded as
non-complete. Guidelines as to what constitutes plagiarism and collusion may be
found in the Departmental Student Handbooks. The Turn-It-In plagiarism detection
system may be used to scan your submission for evidence of plagiarism or collusion.
Your grade will be provisional until confirmed by the Statistics Examiners’ Meeting
in June 2018.
General feedback will be given via Moodle.
Declaration: I am aware of the UCL Statistical Science Department’s regulations on
plagiarism for assessed coursework. I have read the guidelines in the student handbook,
and understand what constitutes plagiarism.
I hereby affirm that the work I am submitting for this in-course assessment is entirely
my own.
Please write your name in block letters:
Your student number:
Signature:
Date:
STATG003/M003 Assessment 3 — instructions
1. You are required to write a single R function. The code for this function should
be saved in a .r file named by your student number. For example, if your student
number is 17101710, your code should be saved in the file 17101710.r.
2. Your function should be thoroughly commented. It should consist of a header
section summarising the logical structure, followed by the main body of the function.
The main body should itself contain comments.
3. You are required to submit the following:
A printout of your R script.
An electronic copy of your R script (see below).
A brief explanation of how your function works, along with a summary of its
output. The explanation should include, for example, details of any mathematical
calculations that you carried out before implementing the IWLS algorithm.
Where you have made decisions regarding what to produce by way of output,
you should justify these decisions. As a rough guide, this explanation/summary
should be no more than 2 pages long.
4. Your function should not create any output files.
5. Printouts and explanations should be handed in to the Statistical Science
Departmental Office. Remember to complete a plagiarism declaration, and to attach it to
your work. You should ensure that all printouts are clearly identified with
your student number. Your name should only be on the cover sheet.
6. Electronic copies of your R function should be submitted via the Moodle page for the
course. Look for the link with the heading “Use this link to submit your project
ICA3” and follow the instructions.
STATG003/M003 Assessment 3 — R function
Suppose that $Y$ is a vector of geometric random variables, with $Y_i \sim \mathrm{Geo}(\pi_i)$ so that
$$P(Y_i = y) = \pi_i (1 - \pi_i)^{y-1} \qquad (y = 1, 2, 3, \ldots),$$
with $E(Y_i) = 1/\pi_i = \mu_i$, say, and $\mathrm{Var}(Y_i) = (1 - \pi_i)/\pi_i^2$. Suppose also that $x_i$ is a vector
of covariates, forming the $i$th row of a matrix $X$, such that
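[Equation image missing from the source: it relates the linear predictor $\eta_i$, a function of $\mu_i$, to $x_i$ and $\beta$.]
for some coefficient vector $\beta$.
This can be regarded as a GLM, since the geometric distribution is in the exponential
family and $\eta_i$ is a monotonic function of $\mu_i$.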
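One way to see the exponential-family claim is to rewrite the probability function as below. This is a sketch in standard GLM notation: the symbols $\theta_i$, $b(\cdot)$ and $\phi$ are not used in the assignment itself and are introduced here only for illustration. It does not depend on the choice of link function.

```latex
\begin{align*}
P(Y_i = y) &= \pi_i (1 - \pi_i)^{y-1}
            = \exp\left\{ y \log(1 - \pi_i) + \log\frac{\pi_i}{1 - \pi_i} \right\}, \\
\theta_i &= \log(1 - \pi_i), \qquad
b(\theta_i) = \theta_i - \log\left(1 - e^{\theta_i}\right), \qquad \phi = 1, \\
b'(\theta_i) &= \frac{1}{1 - e^{\theta_i}} = \frac{1}{\pi_i} = \mu_i, \qquad
b''(\theta_i) = \frac{e^{\theta_i}}{\left(1 - e^{\theta_i}\right)^2}
             = \frac{1 - \pi_i}{\pi_i^2} = \mu_i(\mu_i - 1).
\end{align*}
```

The last expression agrees with the variance stated above, since $(1 - \pi_i)/\pi_i^2 = \mu_i(\mu_i - 1)$ when $\mu_i = 1/\pi_i$.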
Write an R function to fit such a model using iterative weighted least squares, and to
check the fitted model. Your function should be called grm (‘geometric regression model’).
The arguments to the function should be y, a vector of responses to be modelled using the
geometric distribution as described above; X, a design matrix of covariates; and startval,
an initial estimate of the model coefficients. If the user does not supply a value of startval,
you should either provide a default (e.g. a vector of zeroes or any other sensible choice) or
find some other way of starting the algorithm.
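The argument handling described above might be sketched as follows. This is a minimal sketch, not a required design: the helper name grm_check_inputs, the wording of the error messages and the decision to reject missing values are all assumptions made here for illustration.

```r
# Hypothetical helper (name and messages are illustrative assumptions).
# The default for startval is a vector of zeroes, one per column of X;
# R evaluates it lazily, i.e. only after X has been coerced to a matrix.
grm_check_inputs <- function(y, X, startval = rep(0, ncol(X))) {
  X <- as.matrix(X)
  if (!is.numeric(y) || !is.numeric(X)) {
    stop("y and X must both be numeric")
  }
  if (length(y) != nrow(X)) {
    stop("y has ", length(y), " elements but X has ", nrow(X), " rows")
  }
  # Assumption: missing values are treated as an error rather than dropped
  if (anyNA(y) || anyNA(X)) {
    stop("y and X must not contain missing values")
  }
  # The geometric distribution defined above has support 1, 2, 3, ...
  if (any(y < 1) || any(y != round(y))) {
    stop("y must contain positive integers (1, 2, 3, ...)")
  }
  if (length(startval) != ncol(X)) {
    stop("startval must have one element per column of X")
  }
  invisible(list(y = y, X = X, startval = startval))
}

# Usage: valid inputs pass silently and pick up the default startval
ok <- grm_check_inputs(c(1, 2, 3), cbind(1, c(0.1, 0.2, 0.3)))
```

Relying on the lazily evaluated default keeps the argument order y, X, startval intact while still letting callers omit startval.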
Your function should run without user intervention, and its value should be a list
object containing at least the following components (you may add more components if you
feel that these would be useful):
y: The observed responses.
fitted: The fitted values.
betahat: The estimated regression coefficients.
sebeta: The standard errors of the estimated regression coefficients.
cov.beta: The covariance matrix of the estimated regression coefficients.
p: The number of coefficients estimated in the linear predictor.
df.residual: The residual degrees of freedom.
deviance: The deviance for the model.
The structure of your function should be similar to the following:
1. Check that the dimensions of y and X are compatible, and that the data are suitable
for modelling using the geometric distribution; if not, stop with an appropriate
error message.
2. Carry out the IWLS procedure to fit the model, and output the results to screen (as
described below).
3. Produce residual plots and other appropriate model diagnostics.
4. Assemble the results into a list object, and return this as the value of the function.
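The IWLS iteration in step 2 can be sketched generically as below. Important assumption: the link function used here, eta = log(mu - 1), is hypothetical and chosen only so that the sketch runs; it is not taken from the assignment, and the names iwls_sketch, linkinv, mu.eta and varfun are likewise illustrative. Only the variance function mu * (mu - 1) follows from the distribution as stated.

```r
# ASSUMED link, for illustration only: eta = log(mu - 1)
linkinv <- function(eta) 1 + exp(eta)   # mu as a function of eta
mu.eta  <- function(eta) exp(eta)       # derivative d mu / d eta
# Variance function: (1 - pi) / pi^2 with pi = 1 / mu
varfun  <- function(mu) mu * (mu - 1)

iwls_sketch <- function(y, X, startval = rep(0, ncol(X)),
                        tol = 1e-8, maxit = 50) {
  beta <- startval
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu  <- linkinv(eta)
    dmu <- mu.eta(eta)
    z   <- eta + (y - mu) / dmu          # working response
    w   <- dmu^2 / varfun(mu)            # working weights
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta.new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta.new - beta)) < tol
    beta <- beta.new
    if (converged) break
  }
  # Recompute the weights at the final estimate for the covariance matrix
  eta <- drop(X %*% beta)
  mu  <- linkinv(eta)
  w   <- mu.eta(eta)^2 / varfun(mu)
  list(betahat = beta, cov.beta = solve(crossprod(X, w * X)),
       fitted = mu, iterations = iter, converged = converged)
}

# Usage on simulated data (rgeom counts failures, hence the + 1)
set.seed(1)
n <- 2000
X <- cbind(intercept = 1, x = rnorm(n))
beta.true <- c(0.5, -0.3)
y <- rgeom(n, prob = 1 / linkinv(drop(X %*% beta.true))) + 1
fit <- iwls_sketch(y, X)
```

With a different link, only linkinv and mu.eta would need to change; the loop itself is the usual Fisher scoring update written as a weighted regression.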
In step 2, the screen output should consist of: a table showing the estimated
coefficients, their standard errors, z-statistics and associated p-values; the number of coefficients
estimated; the residual degrees of freedom for the fitted model; and the deviance for the
fitted model. You may output any other relevant information if you wish.
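A coefficient table of the kind described above could be assembled as in the following sketch. The function name coef_table and the column labels are assumptions chosen for illustration; the p-values are two-sided and based on the standard normal distribution.

```r
# Hypothetical helper: build the coefficient table from the estimates
# and their covariance matrix.
coef_table <- function(betahat, cov.beta) {
  sebeta <- sqrt(diag(cov.beta))      # standard errors
  z      <- betahat / sebeta          # z-statistics
  pval   <- 2 * pnorm(-abs(z))        # two-sided normal p-values
  cbind(Estimate = betahat, `Std. Error` = sebeta,
        `z value` = z, `Pr(>|z|)` = pval)
}

# Usage with made-up numbers, printed to three significant digits
tab <- coef_table(c(a = 1, b = -2), diag(c(0.25, 1)))
print(signif(tab, 3))
```

The residual degrees of freedom to print alongside this table would be the number of observations minus the number of estimated coefficients.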
In step 3, you should use your knowledge of model checking for GLMs to produce an
appropriate selection of diagnostics. You do not have to produce the same plots as R does
when you plot a glm object.
Your function must not use the glm command (nor anything similar such as glm.fit)!
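Two quantities that commonly feed such diagnostics are Pearson and deviance residuals. The sketch below derives them from the probability function and variance given earlier, so the formulas are worth checking by hand; they do not depend on the link function. All function names are illustrative assumptions.

```r
# Variance function: Var(Y) = (1 - pi) / pi^2 = mu * (mu - 1)
varfun <- function(mu) mu * (mu - 1)

# x * log(x / m), using the convention 0 * log(0) = 0
xlogx <- function(x, m) ifelse(x == 0, 0, x * log(x / m))

# Unit deviance: twice the log-likelihood gap between the saturated
# model (mu = y) and the fitted model, for each observation
unit_deviance <- function(y, mu) {
  2 * (xlogx(y - 1, mu - 1) - xlogx(y, mu))
}

geom_deviance  <- function(y, mu) sum(unit_deviance(y, mu))
pearson_resid  <- function(y, mu) (y - mu) / sqrt(varfun(mu))
deviance_resid <- function(y, mu) {
  # pmax guards against tiny negative values caused by rounding
  sign(y - mu) * sqrt(pmax(unit_deviance(y, mu), 0))
}

# Usage with made-up fitted values
y  <- c(1, 3, 5, 2)
mu <- c(2, 3, 4, 2.5)
d  <- geom_deviance(y, mu)
rp <- pearson_resid(y, mu)
rd <- deviance_resid(y, mu)
```

Plotting either set of residuals against the fitted values or the linear predictor is one possible starting point for the residual plots.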
STATG003/M003 Assessment 3 — hints
1. There is no single ‘right answer’ to this question. To obtain a good mark you need
to approach the problem sensibly, and to provide a clear justification of what you’re
doing. Credit will be given for code that is clear and readable. In particular, code
that is inadequately commented will be penalised.
2. You should ensure that your function produces output that is clearly and
appropriately labelled and formatted.
3. You are not required to analyse any data here; however, when marking this
assessment, your function will be tested on one or more datasets to ensure that it works
correctly. You may therefore wish to test your function on a simple dataset
before submission, and optionally submit your test script along with your function as
described below.
4. If desired, you may use the IWLS function from Workshop 8 as a starting point for
this assessment.
5. To explain how your function works, you will probably need to use quite a lot of
mathematical notation. You are encouraged to use LaTeX. That being said, a legible
handwritten explanation is also perfectly acceptable.
6. In order to explain how your function works, you will have to explain that the given
distribution is in the exponential family.
7. Your scripts will be tested by calling your function from a program that assumes that
you have done exactly what the question asks for. This means, for example, that you
must specify your function’s arguments in the order given above, and that the names
of respective elements of the list result must be the same as those given above. If
you do not do this, your function will fail when called, and you will lose marks.
8. R has some built-in routines relating to the geometric distribution. You may use
these if you think they would be useful; however, note that the definition of the
distribution in R is slightly different from that given above.
9. If you have not already done so, please read the general feedback on the first ICA on
Moodle. Also read the feedback on ICA 2 when it is made available.
10. In case you are stuck or need advice, queries regarding this assessment should be
made during an office hour. For the details of the office hours, and a link to book an
appointment, please see the Moodle page.
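On hint 8: R's dgeom and rgeom count the number of failures before the first success, so their support starts at 0, whereas the distribution in this assessment counts trials and starts at 1. The short check below illustrates the shift by one; the variable names are illustrative.

```r
p <- 0.3
y <- 1:5

# Probability function as defined in the assignment (support 1, 2, 3, ...)
pmf_assignment <- p * (1 - p)^(y - 1)

# R's parameterisation has support 0, 1, 2, ..., so shift y down by one
pmf_r <- dgeom(y - 1, prob = p)

same <- isTRUE(all.equal(pmf_assignment, pmf_r))

# Simulating on the assignment's scale means shifting rgeom's draws up by one
set.seed(1)
y_sim <- rgeom(1000, prob = p) + 1
```

The same shift applies to pgeom and qgeom if those are used.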
STATG003/M003 Assessment 3 — Optional test case script
You are allowed to write a second script which loads a dataset, fits a regression model
using your implementation of grm, and outputs a selection of estimates and diagnostics.
The choice of data is yours, but the execution must be reproducible by any users of your
script. Hence, limit yourself to datasets which can be loaded from an R package, or which
can be constructed from R code within the script itself. For the former, we recommend the
package datasets. The choice of data and output is yours to make. The goal of this script
is for you to demonstrate to us an example of your script working in practice, in case we
have any problems running it on our own test cases. For instance, if your script works
correctly with the data provided by you but not with all of our test cases, we will be able
to give you appropriate credit for demonstrating a situation in which the script works. For
that to be possible however, we require that your test case script is clearly written and
commented. As long as the code is clear and reproducible, the format is up to you.
If you make use of this option, upload the test script as a second file. If your student
number is 17101710, say, use the format 17101710test.r.
STATG003/M003 Assessment 3 — marking guidelines
This assessment is marked out of 50. The marks are roughly subdivided into the following
components: 11 marks for correct implementation of the IWLS algorithm, 21 marks for
correct checking of input, for correct presentation of output, and for good coding style,
and 18 marks for clear explanation of how your function works, for correct diagnostics, for
correct mathematical expressions for the variance function, the deviance, etc.