Project 2 for 2018
• INSTRUCTIONS:
– This project is worth 20% of your overall marks for this course (for all students, enrolled in STAT2008, STAT4038 or STAT6038).
– If you wish, you may work together with one other student in doing the analyses and present a single (joint) report. If you choose to do this, both of you will be awarded the same total mark. Students enrolled under different course codes may work together. You may NOT work in groups of more than two students, and the usual ANU examination rules on plagiarism still apply with respect to people not in your group. This means you should not discuss the project (questions, solutions, code, etc.) with your classmates or any other individuals if they are not in your group. You can discuss the project with me (Anton Westveld) or your tutors.
– Please submit your project on Wattle. As a group you should submit only one project. Make sure to place the names and IDs of the individuals in your group on the front page of your project. When uploading to Wattle you will submit:
1. Your project/report.
2. An ‘.R’ file containing the R code you used for the project.
– Projects should be typed. Your project may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of those results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.
– Unless otherwise advised, use a significance level of 5%.
– Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix that is in addition to the above page limits; however the appendix will generally not be marked, only checked if there is some question about what you have actually done.
– Projects will be marked by your tutor (or one of your two tutors, for joint projects). You may ask any of the tutors or me (Anton Westveld) questions about this project up to 4 pm on Thursday 17 May 2018.
– Late projects will NOT be accepted after the deadline without an extension. Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission by no later than 12 noon on Thursday 17 May 2018. Even with an extension, all projects must be submitted reasonably close to the original deadline to allow time for the marking to be completed.
1. (100 points) You will explore the techniques for the course by examining data on the number of visits to a health care professional in Australia from 1977-78. The data have been placed on Wattle. The variables are:
–sex : 1 if female, 0 if male
–age : Age in years divided by 100 (measured as mid-point of 10 age groups from 15-19 years to 65-69, with 70 or more treated as 72)
–income : Annual income in Australian dollars divided by 1000 (measured as mid-point of coded ranges Nil, less than 200, 200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-, 7001-, 8001-10000, 10001-12000, 12001-14000, with 14001- treated as 15000)
–insurance : insurance contract (medlevy : Medibank levy, levyplus : private health insurance, freepoor : government insurance due to low income, freerepa : government insurance due to old age, disability or veteran status)
–illness : number of illnesses in past 2 weeks
–actdays : number of days of reduced activity in past 2 weeks due to illness or injury
–hscore : general health score using Goldberg’s method (from 0 to 12). High score indicates bad health
–chcond : chronic condition (np : no problem, la : limiting activity, nla : not limiting activity)
–doctorco : number of consultations with a doctor or specialist in the past 2 weeks
–nondocco : number of consultations with non-doctor health professionals (chemist, optician, physiotherapist, social worker, district community nurse, chiropodist or chiropractor) in the past 2 weeks
–hospadmi : number of admissions to a hospital, psychiatric hospital, nursing or convalescent home in the past 12 months (up to 5 or more admissions which is coded as 5)
–hospdays : number of nights in a hospital, etc. during most recent admission: taken, where appropriate, as the mid-point of the intervals 1, 2, 3, 4, 5, 6, 7, 8-14, 15-30, 31-60, 61-79, with 80 or more nights coded as 80. If no admission in past 12 months then equals zero.
–prescrib : total number of prescribed medications used in past 2 days
–nonpresc : total number of non-prescribed medications used in past 2 days
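The coding described in the variable list can be sketched in R as follows. This is only an illustration under stated assumptions: the three rows below are made-up values (not taken from the Wattle data), the data frame name toy and the derived columns age_years and income_aud are hypothetical, and the ordering of the factor levels is a choice made here, not something the project prescribes.

```r
# Made-up rows, used only to illustrate the variable coding described above.
# The real data are on Wattle; none of these values come from that file.
toy <- data.frame(
  sex       = c(1, 0, 1),
  age       = c(0.22, 0.37, 0.72),   # age in years divided by 100
  income    = c(1.5, 9, 15),         # annual income (AUD) divided by 1000
  insurance = c("medlevy", "levyplus", "freerepa"),
  chcond    = c("np", "nla", "la"),
  stringsAsFactors = FALSE
)

# Treat the categorical variables as factors with explicit levels,
# so that levels absent from a subset (e.g. freepoor) are still known.
toy$sex       <- factor(toy$sex, levels = c(0, 1), labels = c("male", "female"))
toy$insurance <- factor(toy$insurance,
                        levels = c("medlevy", "levyplus", "freepoor", "freerepa"))
toy$chcond    <- factor(toy$chcond, levels = c("np", "nla", "la"))

# Undo the scaling to recover the original units (hypothetical column names).
toy$age_years  <- toy$age * 100
toy$income_aud <- toy$income * 1000

# Inspect the resulting structure.
str(toy)
```

Declaring the factor levels explicitly is one possible design choice; it keeps the baseline category under your control when the variables are later used as covariates.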
(a) (15 points) Conduct an exploratory data analysis, where the response y = doctorco + nondocco (i.e. the total number of visits to a health care professional in the past two weeks) in relation to the other variables, which should be considered explanatory variables (covariates). In doing your analysis make sure to