2019-05-30

STATG003/M003 STATISTICAL COMPUTING

ASSESSMENT 3 (2017/18 SESSION)

Your solutions should be your own work and are to be handed in by yourself to the Statistical Science Departmental Office by 4pm on MONDAY, 23rd of April 2018.

Detailed submission instructions are given below.

Before you hand in your work, complete and sign the slip below this rubric, cut it off and attach it firmly to your work.

When you submit your work, please make sure that someone in the Departmental Office records on their list of students that you have handed in your work.

Late submission will incur a penalty unless there are extenuating circumstances (e.g. medical) supported by appropriate documentation. Penalties are set out in the latest editions of the Statistical Science Departmental Student Handbooks, available from the departmental web pages.

Failure to submit this in-course assessment will mean that your overall examination mark is recorded as non-complete, i.e. you will not obtain a pass for the course.

Any plagiarism or collusion will normally result in zero marks for all students involved, which may also mean that your overall examination mark is recorded as non-complete. Guidelines as to what constitutes plagiarism and collusion may be found in the Departmental Student Handbooks. The Turn-It-In plagiarism detection system may be used to scan your submission for evidence of plagiarism or collusion.

Your grade will be provisional until confirmed by the Statistics Examiners’ Meeting in June 2018.

General feedback will be given via Moodle.



Declaration: I am aware of the UCL Statistical Science Department’s regulations on plagiarism for assessed coursework. I have read the guidelines in the student handbook, and understand what constitutes plagiarism.

I hereby affirm that the work I am submitting for this in-course assessment is entirely my own.

Please write your name in block letters:

Your student number:

Signature:

Date:



STATG003/M003 Assessment 3 — instructions

1. You are required to write a single R function. The code for this function should be saved in a .r file named by your student number. For example, if your student number is 17101710, your code should be saved in the file 17101710.r.

2. Your function should be thoroughly commented. It should consist of a header section summarising the logical structure, followed by the main body of the function. The main body should itself contain comments.

3. You are required to submit the following:

• A printout of your R script.

• An electronic copy of your R script (see below).

• A brief explanation of how your function works, along with a summary of its output. The explanation should include, for example, details of any mathematical calculations that you carried out before implementing the IWLS algorithm. Where you have made decisions regarding what to produce by way of output, you should justify these decisions. As a rough guide, this explanation/summary should be no more than 2 pages long.

4. Your function should not create any output files.

5. Printouts and explanations should be handed in to the Statistical Science Departmental Office. Remember to complete a plagiarism declaration, and to attach it to your work. You should ensure that all printouts are clearly identified with your student number. Your name should only be on the cover sheet.

6. Electronic copies of your R function should be submitted via the Moodle page for the course. Look for the link with the heading “Use this link to submit your project ICA3” and follow the instructions.

STATG003/M003 Assessment 3 — R function

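Suppose that $Y$ is a vector of geometric random variables, with $Y_i \sim \mathrm{Geo}(\pi_i)$, so that

$$P(Y_i = y) = \pi_i (1 - \pi_i)^{y-1}, \qquad y = 1, 2, 3, \ldots,$$

with $E(Y_i) = 1/\pi_i = \mu_i$, say, and $\mathrm{Var}(Y_i) = (1 - \pi_i)/\pi_i^2$. Suppose also that $x_i$ is a vector of covariates, forming the $i$th row of a matrix $X$, such that

$$g(\mu_i) = \eta_i = x_i^{\mathsf T}\beta$$

for some coefficient vector $\beta$ and a monotonic link function $g$ (its specific form was given as an image in the original and is not reproduced here).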

This can be regarded as a GLM, since the geometric distribution is in the exponential family and $\eta_i$ is a monotonic function of $\mu_i$.
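Hint 6 below asks for this step. A minimal worked version, using the usual GLM notation $f(y;\theta,\phi) = \exp\{(y\theta - b(\theta))/\phi + c(y,\phi)\}$ (the symbols $\theta$, $b$, $c$, $\phi$, $\ell$, $V$ and $D$ are standard notation rather than part of the brief), is sketched below. None of it depends on the link $g$. In the deviance, the term $(y_i-1)\log\{(y_i-1)/(\mu_i-1)\}$ is read as $0$ when $y_i = 1$, and the expression requires $\mu_i > 1$.

```latex
\begin{aligned}
P(Y_i = y) &= \pi_i(1-\pi_i)^{y-1}
  = \exp\Bigl\{\, y\log(1-\pi_i) + \log\frac{\pi_i}{1-\pi_i} \Bigr\}, \qquad y = 1, 2, \ldots\\
\theta_i &= \log(1-\pi_i), \qquad
  b(\theta_i) = \theta_i - \log\bigl(1-e^{\theta_i}\bigr), \qquad
  \phi = 1, \qquad c(y,\phi) = 0\\
E(Y_i) &= b'(\theta_i) = \frac{1}{1-e^{\theta_i}} = \frac{1}{\pi_i} = \mu_i\\
\operatorname{Var}(Y_i) &= \phi\, b''(\theta_i) = \frac{e^{\theta_i}}{(1-e^{\theta_i})^2}
  = \frac{1-\pi_i}{\pi_i^2}
  \quad\Longrightarrow\quad V(\mu_i) = \mu_i(\mu_i - 1)\\
\ell_i(\mu_i) &= (y_i - 1)\log(\mu_i - 1) - y_i\log\mu_i\\
D &= 2\sum_{i=1}^{n}\bigl[\ell_i(y_i) - \ell_i(\mu_i)\bigr]
  = 2\sum_{i=1}^{n}\Bigl[(y_i-1)\log\frac{y_i-1}{\mu_i-1} - y_i\log\frac{y_i}{\mu_i}\Bigr]
\end{aligned}
```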

Write an R function to fit such a model using iterative weighted least squares, and to check the fitted model. Your function should be called grm (‘geometric regression model’).

The arguments to the function should be y, a vector of responses to be modelled using the geometric distribution as described above; X, a design matrix of covariates; and startval, an initial estimate of the model coefficients. If the user does not supply a value of startval, you should either provide a default (e.g. a vector of zeroes or any other sensible choice) or find some other way of starting the algorithm.
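Because the brief's link function is not reproduced above, the following is only a sketch of the IWLS loop under an assumed link $g(\mu) = \log(\mu - 1)$, chosen because it keeps $\mu_i > 1$ for every value of $\eta_i$. With $V(\mu) = \mu(\mu - 1)$ and $g'(\mu) = 1/(\mu - 1)$, each iteration regresses the working response $z_i = \eta_i + (y_i - \mu_i)/(\mu_i - 1)$ on $X$ with weights $w_i = 1/\{V(\mu_i) g'(\mu_i)^2\} = (\mu_i - 1)/\mu_i$. It is written in dependency-free Python so that it can be checked in isolation; the names solve and iwls_geometric are hypothetical, and the default start (setting the mean to $y_i + 0.5$ when no startval is given) is one sensible choice among several.

```python
import math


def solve(A, b):
    """Solve A x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X'WX is singular: check the design matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def iwls_geometric(y, X, startval=None, tol=1e-8, maxit=50):
    """Fit the geometric GLM by IWLS under the ASSUMED link eta = log(mu - 1)."""
    n, p = len(y), len(X[0])
    if startval is None:
        # Default start: mu = y + 0.5, so eta = log(y - 0.5) is finite even when y = 1.
        eta = [math.log(yi - 0.5) for yi in y]
    else:
        eta = [sum(xij * bj for xij, bj in zip(row, startval)) for row in X]
    beta, converged = None, False
    for iteration in range(1, maxit + 1):
        mu = [1.0 + math.exp(e) for e in eta]                            # inverse link
        z = [e + (yi - m) / (m - 1.0) for e, yi, m in zip(eta, y, mu)]  # working response
        w = [(m - 1.0) / m for m in mu]                                  # 1 / (V(mu) g'(mu)^2)
        XtWX = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        XtWz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(XtWX, XtWz)                                     # weighted LS step
        eta = [sum(xij * bj for xij, bj in zip(row, new_beta)) for row in X]
        if beta is not None and max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            converged = True
        beta = new_beta
        if converged:
            break
    mu = [1.0 + math.exp(e) for e in eta]
    return {"betahat": beta, "fitted": mu, "iterations": iteration, "converged": converged}
```

For an intercept-only model the fitted mean is the sample mean, so with y = 1, ..., 5 the estimate should be $\log(3 - 1) = \log 2$; this is a quick sanity check worth running on any implementation.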

Your function should run without user intervention, and its value should be a list object containing at least the following components (you may add more components if you feel that these would be useful):

y: The observed responses.

fitted: The fitted values.

betahat: The estimated regression coefficients.

sebeta: The standard errors of the estimated regression coefficients.

cov.beta: The covariance matrix of the estimated regression coefficients.

p: The number of coefficients estimated in the linear predictor.

df.residual: The residual degrees of freedom.

deviance: The deviance for the model.
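Most of these components follow from the converged IWLS quantities. For the geometric distribution the dispersion is $\phi = 1$, so cov.beta is $(X^{\mathsf T}WX)^{-1}$ evaluated at the fitted values, sebeta is the square root of its diagonal, p is the number of columns of X and df.residual is $n - p$. The deviance is $D = 2\sum_i[(y_i-1)\log\{(y_i-1)/(\mu_i-1)\} - y_i\log(y_i/\mu_i)]$, obtained by comparing with the saturated model $\mu_i = y_i$. The Python sketch below computes these; a dict stands in for R's list, betahat would come from the fitting loop, the helper names are hypothetical, and the weights $w_i = (\mu_i - 1)/\mu_i$ assume the link $\log(\mu - 1)$, which is not necessarily the one in the brief.

```python
import math


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def geometric_deviance(y, mu):
    """D = 2 * sum[(y-1) log((y-1)/(mu-1)) - y log(y/mu)], with 0*log(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        first = (yi - 1) * math.log((yi - 1) / (mi - 1)) if yi > 1 else 0.0
        total += 2.0 * (first - yi * math.log(yi / mi))
    return total


def summary_components(y, X, mu):
    """Assemble the components named in the brief from y, X and fitted means mu."""
    n, p = len(X), len(X[0])
    w = [(m - 1.0) / m for m in mu]  # IWLS weights under the ASSUMED link log(mu - 1)
    XtWX = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
            for j in range(p)]
    cov_beta = invert(XtWX)          # phi = 1, so no dispersion scaling is needed
    return {
        "y": y,
        "fitted": mu,
        "sebeta": [math.sqrt(cov_beta[j][j]) for j in range(p)],
        "cov.beta": cov_beta,
        "p": p,
        "df.residual": n - p,
        "deviance": geometric_deviance(y, mu),
    }
```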

The structure of your function should be similar to the following:

1. Check that the dimensions of y and X are compatible, and that the data are suitable for modelling using the geometric distribution; if not, stop with an appropriate error message.

2. Carry out the IWLS procedure to fit the model, and output the results to screen (as described below).

3. Produce residual plots and other appropriate model diagnostics.

4. Assemble the results into a list object, and return this as the value of the function.
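Step 1 can be made concrete with a handful of checks. The Python sketch below (hypothetical function name check_inputs; X is taken to be a list of rows) stops with a specific message when the lengths disagree, when a response is not a positive integer (the support above is 1, 2, 3, ...), when every response equals 1 (the log-likelihood is then $-\sum_i \log\mu_i$, maximised only on the boundary $\pi_i = 1$), or when startval has the wrong length. Requiring $n > p$, so that df.residual is positive, is a design choice rather than something the brief demands.

```python
def check_inputs(y, X, startval=None):
    """Validate y, X and startval before fitting; return (n, p) if all checks pass."""
    n = len(y)
    if n == 0 or len(X) != n:
        raise ValueError(f"y has length {n} but X has {len(X)} rows")
    p = len(X[0])
    if p == 0 or any(len(row) != p for row in X):
        raise ValueError("X must be a non-empty matrix with rows of equal length")
    if n <= p:
        raise ValueError(f"need more observations than coefficients (n = {n}, p = {p})")
    for yi in y:
        # the geometric distribution above has support 1, 2, 3, ...
        if not float(yi).is_integer() or yi < 1:
            raise ValueError(f"responses must be positive integers; found {yi!r}")
    if all(yi == 1 for yi in y):
        raise ValueError("all responses equal 1: the MLE lies on the boundary pi = 1")
    if startval is not None and len(startval) != p:
        raise ValueError(f"startval has length {len(startval)}, expected {p}")
    return n, p
```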

In step 2, the screen output should consist of: a table showing the estimated coefficients, their standard errors, z-statistics and associated p-values; the number of coefficients estimated; the residual degrees of freedom for the fitted model; and the deviance for the fitted model. You may output any other relevant information if you wish.
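The z-statistics and p-values in the table follow the usual large-sample Wald argument: $z_j = \hat\beta_j / \mathrm{se}(\hat\beta_j)$ is compared with a standard normal, giving the two-sided p-value $2\{1 - \Phi(|z_j|)\} = \mathrm{erfc}(|z_j|/\sqrt{2})$. A t-distribution is not needed because the dispersion $\phi = 1$ is known. The Python sketch below builds that screen output; the column layout loosely imitates R's coefficient table, and the function names and formatting are illustrative choices, not requirements of the brief.

```python
import math


def two_sided_p(z):
    """Two-sided p-value 2 * (1 - Phi(|z|)) for a standard normal Wald statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def format_fit(names, betahat, sebeta, df_residual, deviance):
    """Build the step-2 screen output: coefficient table plus p, residual df and deviance."""
    header = ("".ljust(12) + "Estimate".rjust(10) + "Std. Error".rjust(12)
              + "z value".rjust(9) + "Pr(>|z|)".rjust(11))
    lines = [header]
    for name, b, se in zip(names, betahat, sebeta):
        z = b / se
        lines.append(f"{name:<12}{b:>10.4f}{se:>12.4f}{z:>9.3f}{two_sided_p(z):>11.4g}")
    lines.append(f"Number of coefficients estimated: {len(betahat)}")
    lines.append(f"Residual degrees of freedom: {df_residual}")
    lines.append(f"Residual deviance: {deviance:.4f}")
    return "\n".join(lines)
```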

In step 3, you should use your knowledge of model checking for GLMs to produce an appropriate selection of diagnostics. You do not have to produce the same plots as R does when you plot a glm object.

Your function must not use the glm command (nor anything similar such as glm.fit)!
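Most GLM diagnostics start from residuals on a standardised scale. For this model the Pearson residual is $r^P_i = (y_i - \mu_i)/\sqrt{\mu_i(\mu_i - 1)}$, using $V(\mu) = \mu(\mu - 1)$, and the deviance residual is $r^D_i = \mathrm{sign}(y_i - \mu_i)\sqrt{d_i}$, where $d_i = 2[(y_i-1)\log\{(y_i-1)/(\mu_i-1)\} - y_i\log(y_i/\mu_i)]$ is observation $i$'s contribution to the deviance, so that $\sum_i (r^D_i)^2 = D$. The Python sketch below computes both (function names hypothetical). Plotting them against fitted values or the linear predictor, a normal Q-Q plot of deviance residuals, and leverages or Cook's distances from the weighted hat matrix are common choices, though the brief leaves the selection open.

```python
import math


def unit_deviance(y, mu):
    """Contribution d_i = 2[(y-1) log((y-1)/(mu-1)) - y log(y/mu)]; 0*log(0) taken as 0."""
    first = (y - 1) * math.log((y - 1) / (mu - 1)) if y > 1 else 0.0
    return max(0.0, 2.0 * (first - y * math.log(y / mu)))  # clip tiny negative rounding


def pearson_residuals(y, mu):
    """(y - mu) / sqrt(V(mu)) with the geometric variance function V(mu) = mu (mu - 1)."""
    return [(yi - mi) / math.sqrt(mi * (mi - 1.0)) for yi, mi in zip(y, mu)]


def deviance_residuals(y, mu):
    """sign(y - mu) * sqrt(d_i); their squares sum to the model deviance."""
    out = []
    for yi, mi in zip(y, mu):
        sign = (yi > mi) - (yi < mi)
        out.append(sign * math.sqrt(unit_deviance(yi, mi)))
    return out
```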

STATG003/M003 Assessment 3 — hints

1. There is no single ‘right answer’ to this question. To obtain a good mark you need to approach the problem sensibly, and to provide a clear justification of what you’re doing. Credit will be given for code that is clear and readable. In particular, code that is inadequately commented will be penalised.

2. You should ensure that your function produces output that is clearly and appropriately labelled and formatted.

3. You are not required to analyse any data here; however, when marking this assessment, your function will be tested on one or more datasets to ensure that it works correctly. You may therefore wish to test your function on a simple dataset before submission, and optionally submit your test script along with your function as described below.

4. If desired, you may use the IWLS function from Workshop 8 as a starting point for this assessment.

5. To explain how your function works, you will probably need to use quite a lot of mathematical notation. You are encouraged to use LaTeX. That being said, a legible handwritten explanation is also perfectly acceptable.

6. In order to explain how your function works, you will have to show that the given distribution belongs to the exponential family.

7. Your scripts will be tested by calling your function from a program that assumes that you have done exactly what the question asks for. This means, for example, that you must specify your function’s arguments in the order given above, and that the names of the elements of the returned list must be the same as those given above. If you do not do this, your function will fail when called, and you will lose marks.

8. R has some built-in routines relating to the geometric distribution. You may use these if you think they would be useful; however, note that the definition of the distribution in R is slightly different from that given above.
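The difference is in the support: R's dgeom, pgeom, qgeom and rgeom describe the number of failures before the first success, $P(X = x) = \pi(1-\pi)^x$ for $x = 0, 1, 2, \ldots$, whereas $Y$ above counts trials, so $Y = X + 1$. In R terms, $P(Y_i = y)$ is dgeom(y - 1, prob) with prob equal to $\pi_i$, and rgeom(n, prob) + 1 draws from the distribution in the brief. The Python sketch below mirrors both forms (function names hypothetical) and checks numerically that the shifted version has mean $1/\pi$ and variance $(1-\pi)/\pi^2$.

```python
def dgeom_r_style(x, prob):
    """R-style geometric pmf: number of failures x = 0, 1, 2, ... before the first success."""
    return prob * (1.0 - prob) ** x if x >= 0 else 0.0


def dgeo_brief(y, pi):
    """Brief's pmf P(Y = y) = pi (1 - pi)^(y - 1), y = 1, 2, ...; the R form at y - 1."""
    return dgeom_r_style(y - 1, pi)


def moments_brief(pi, ymax=5000):
    """Mean and variance of Y by direct summation over y = 1..ymax (tail is negligible)."""
    probs = [(y, dgeo_brief(y, pi)) for y in range(1, ymax + 1)]
    mean = sum(y * p for y, p in probs)
    var = sum(y * y * p for y, p in probs) - mean ** 2
    return mean, var
```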

9. If you have not already done so, please read the general feedback on the first ICA on Moodle. Also read the feedback on ICA 2 when it is made available.

10. In case you are stuck or need advice, queries regarding this assessment should be made during an office hour. For the details of the office hours, and a link to book an appointment, please see the Moodle page.

STATG003/M003 Assessment 3 — Optional test case script

You are allowed to write a second script which loads a dataset, fits a regression model using your implementation of grm, and outputs a selection of estimates and diagnostics. The choice of data and output is yours, but the execution must be reproducible by any user of your script. Hence, limit yourself to datasets which can be loaded from an R package, or which can be constructed from R code within the script itself. For the former, we recommend the package datasets. The goal of this script is for you to demonstrate to us an example of your script working in practice, in case we have any problems running it on our own test cases. For instance, if your script works correctly with the data provided by you but not with all of our test cases, we will be able to give you appropriate credit for demonstrating a situation in which the script works. For that to be possible, however, we require that your test case script is clearly written and commented. As long as the code is clear and reproducible, the format is up to you.

If you make use of this option, upload the test script as a second file. If your student number is 17101710, say, use the format 17101710test.r.
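One way to construct reproducible data in code is to simulate from the model itself with a fixed seed, so that the true coefficients are known. The Python sketch below illustrates the idea (in R the same role could be played by set.seed() together with rgeom(n, prob) + 1). It draws each response by inversion, using $Y = \lceil \log U / \log(1-\pi) \rceil$ with $U$ uniform on $(0, 1]$, which has $P(Y > k) = (1-\pi)^k$. The link $\mu = 1 + e^{\eta}$, the covariate distribution, the seed and the function names are all illustrative assumptions.

```python
import math
import random


def rgeo(pi, rng):
    """One draw from P(Y = y) = pi (1 - pi)^(y - 1), y = 1, 2, ..., by inversion."""
    if pi >= 1.0:
        return 1
    u = 1.0 - rng.random()                      # u in (0, 1], so log(u) is finite
    return max(1, math.ceil(math.log(u) / math.log(1.0 - pi)))


def simulate_grm_data(n, beta0, beta1, seed=2018):
    """Reproducible (y, X) from the model, ASSUMING the link mu = 1 + exp(eta)."""
    rng = random.Random(seed)                   # fixed seed gives the same data every run
    X, y = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        mu = 1.0 + math.exp(beta0 + beta1 * x)  # mean under the assumed link
        X.append([1.0, x])                      # intercept column plus one covariate
        y.append(rgeo(1.0 / mu, rng))           # pi = 1 / mu
    return y, X
```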

STATG003/M003 Assessment 3 — marking guidelines

This assessment is marked out of 50. The marks are roughly subdivided into the following components: 11 marks for correct implementation of the IWLS algorithm; 21 marks for correct checking of input, for correct presentation of output, and for good coding style; and 18 marks for clear explanation of how your function works, for correct diagnostics, and for correct mathematical expressions for the variance function, the deviance, etc.



