Disease prediction
2021-09-16

You are working with a research team that is trying to understand the risk factors for, and causes of, cardiovascular disease. The team has completed a cohort study in which baseline information on various cardiovascular risk factors was collected from US adults. The patients were then followed for ten years to observe whether they developed coronary heart disease. You are provided with a dataset in which all the baseline and outcome information is summarized in a spreadsheet, with one subject per row and one column per characteristic. Missing data are indicated by “NA”.

1. Conduct preliminary checks of the data and remove any erroneous records. Comment on any changes made to the dataset and on the extent of missing data. Check the structure of all variables and make sure R reads each one in with an appropriate type. Make a publication-quality table summarizing the crude (unadjusted) association between each variable and the outcome, using counts and column percentages for categorical variables and means and standard deviations for continuous variables. For each predictor variable, select an appropriate statistical test and describe its result in a couple of sentences. (9 marks)
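A minimal sketch of these steps in base R follows. It uses a small synthetic data frame (simulated values, not the study data) containing only `age`, `glucose`, `diabetes` and `TenYearCHD`; the helper names `summ_cont` and `summ_cat` and the rule that age must be positive are hypothetical choices for illustration. Continuous variables are compared with a Welch two-sample t-test (the `t.test` default), and categorical variables with a chi-squared test, switching to Fisher's exact test when any expected cell count is below 5.

```r
# Synthetic stand-in data (hypothetical values, NOT the study dataset)
set.seed(1)
n <- 200
chd <- data.frame(
  age = round(rnorm(n, 50, 8)),
  glucose = round(rnorm(n, 82, 20)),
  diabetes = factor(rbinom(n, 1, 0.05), levels = 0:1, labels = c("No", "Yes")),
  TenYearCHD = factor(rbinom(n, 1, 0.15), levels = 0:1, labels = c("No", "Yes"))
)
chd$glucose[sample(n, 10)] <- NA  # introduce some missing values
chd$age[1] <- -5                  # an impossible value, to illustrate cleaning

# Preliminary checks: look at ranges, then drop implausible records
summary(chd)
chd <- chd[is.na(chd$age) | chd$age > 0, ]  # hypothetical plausibility rule

# Extent of missing data: percentage missing per variable, and complete cases
round(100 * colMeans(is.na(chd)), 1)
sum(complete.cases(chd))

# Continuous predictor: mean (SD) in each outcome group, Welch t-test p-value
summ_cont <- function(name, x, y) {
  m <- tapply(x, y, mean, na.rm = TRUE)
  s <- tapply(x, y, sd, na.rm = TRUE)
  data.frame(Variable = paste(name, "mean (SD)"),
             CHD_no = sprintf("%.1f (%.1f)", m[1], s[1]),
             CHD_yes = sprintf("%.1f (%.1f)", m[2], s[2]),
             p = sprintf("%.3f", t.test(x ~ y)$p.value))
}

# Categorical predictor: n (column %) per outcome group, chi-squared or Fisher
summ_cat <- function(name, x, y) {
  tab <- table(x, y)                        # rows = levels, columns = outcome
  pct <- 100 * prop.table(tab, margin = 2)  # column percentages
  ct <- suppressWarnings(chisq.test(tab))
  p <- if (any(ct$expected < 5)) fisher.test(tab)$p.value else ct$p.value
  data.frame(Variable = paste(name, rownames(tab)),
             CHD_no = sprintf("%d (%.1f%%)", as.vector(tab[, 1]), pct[, 1]),
             CHD_yes = sprintf("%d (%.1f%%)", as.vector(tab[, 2]), pct[, 2]),
             p = c(sprintf("%.3f", p), rep("", nrow(tab) - 1)))
}

tab1 <- rbind(
  summ_cont("Age (years)", chd$age, chd$TenYearCHD),
  summ_cont("Glucose (mg/dL)", chd$glucose, chd$TenYearCHD),
  summ_cat("Diabetes", chd$diabetes, chd$TenYearCHD)
)
print(tab1, row.names = FALSE)
```

On the real data the same two helpers would be called once per predictor and the resulting rows combined with `rbind`.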

2. For fasting plasma glucose and diabetes status, illustrate the relationship between each variable and the outcome using appropriate plots that explore these crude relationships overall and stratified by stroke history and by age. Describe what the graphs show. (6 marks)
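One possible way to draw such plots in base R, sketched on simulated data (hypothetical values, not the study data): a box plot of glucose by outcome within each stroke category, and a bar chart of the crude CHD proportion by diabetes status within age groups. The age cut-points and the output file name are arbitrary assumptions; swapping the stratifiers (glucose by age group, diabetes by stroke) follows the same pattern.

```r
# Synthetic stand-in data (hypothetical values, NOT the study dataset)
set.seed(2)
n <- 500
chd <- data.frame(
  age = round(runif(n, 32, 70)),
  glucose = round(rnorm(n, 82, 20)),
  diabetes = rbinom(n, 1, 0.1),
  prevalentStroke = rbinom(n, 1, 0.1),
  TenYearCHD = rbinom(n, 1, 0.15)
)
# Labelled versions for plotting; the age cut-points are a hypothetical choice
chd$ageGroup <- cut(chd$age, breaks = c(30, 45, 55, 70), include.lowest = TRUE)
chd$stroke <- factor(chd$prevalentStroke, levels = 0:1,
                     labels = c("No stroke", "Past stroke"))
chd$CHD <- factor(chd$TenYearCHD, levels = 0:1, labels = c("No CHD", "CHD"))
chd$Diabetes <- factor(chd$diabetes, levels = 0:1,
                       labels = c("No diabetes", "Diabetes"))

out <- file.path(tempdir(), "glucose_chd.pdf")  # hypothetical output file
pdf(out, width = 10, height = 5)
par(mfrow = c(1, 2), mar = c(9, 4, 2, 1))

# Panel 1: glucose distribution by outcome, within each stroke category
boxplot(glucose ~ CHD + stroke, data = chd, las = 2, xlab = "",
        ylab = "Fasting glucose (mg/dL)", main = "Glucose by CHD and stroke")

# Panel 2: crude ten-year CHD proportion by diabetes status, within age groups
risk <- with(chd, tapply(TenYearCHD, list(Diabetes, ageGroup), mean))
barplot(risk, beside = TRUE, legend.text = TRUE, ylab = "Proportion with CHD",
        xlab = "Age group (years)", main = "CHD risk by diabetes and age")

invisible(dev.off())
```

Writing to a PDF device rather than the screen keeps the sketch runnable from a script; in an interactive session the `pdf()` and `dev.off()` lines can be dropped.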

3. Construct a multivariable model to estimate the association between fasting plasma glucose (mg/dL) and incident coronary heart disease, adjusting for age, gender, cigarette smoking (currentSmoker), total cholesterol and past stroke. Present a table of the crude and adjusted measures of association derived from the model. Comment on the model fit. Discuss whether it is appropriate to adjust for systolic blood pressure in the model. (9 marks)
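A minimal sketch of crude and adjusted logistic regression models in base R, on simulated data whose generating coefficients are invented, so the resulting odds ratios say nothing about real CHD risk. Glucose is rescaled to units of 10 mg/dL (an arbitrary, hypothetical choice) so that the odds ratio is per 10 mg/dL. The intervals are Wald intervals; `confint()` gives profile-likelihood intervals as an alternative. Model fit is checked with a likelihood-ratio test and with a Hosmer-Lemeshow statistic computed by hand over deciles of predicted risk, because base R has no built-in Hosmer-Lemeshow function.

```r
# Synthetic stand-in data; the coefficients used to simulate CHD are invented
set.seed(3)
n <- 1000
chd <- data.frame(
  age = round(runif(n, 32, 70)),
  male = rbinom(n, 1, 0.45),
  currentSmoker = rbinom(n, 1, 0.5),
  totChol = round(rnorm(n, 235, 40)),
  prevalentStroke = rbinom(n, 1, 0.05),
  glucose = round(rnorm(n, 82, 20))
)
lp <- with(chd, -6.8 + 0.06 * age + 0.5 * male + 0.3 * currentSmoker +
             0.003 * totChol + 0.8 * prevalentStroke + 0.01 * glucose)
chd$TenYearCHD <- rbinom(n, 1, plogis(lp))

# Express glucose per 10 mg/dL so the odds ratio has a readable scale
chd$glucose10 <- chd$glucose / 10

# Crude and adjusted logistic regression models
crude <- glm(TenYearCHD ~ glucose10, family = binomial, data = chd)
adj <- glm(TenYearCHD ~ glucose10 + age + male + currentSmoker + totChol +
             prevalentStroke, family = binomial, data = chd)

# Odds ratio with Wald 95% confidence interval for one model term
or_ci <- function(fit, term) {
  est <- coef(summary(fit))[term, c("Estimate", "Std. Error")]
  exp(est[["Estimate"]] + c(0, -1.96, 1.96) * est[["Std. Error"]])
}
res <- rbind(Crude = or_ci(crude, "glucose10"),
             Adjusted = or_ci(adj, "glucose10"))
colnames(res) <- c("OR", "lower95", "upper95")
round(res, 2)

# Model fit 1: likelihood-ratio test for the added covariates
anova(crude, adj, test = "Chisq")

# Model fit 2: Hosmer-Lemeshow statistic over deciles of predicted risk
p_hat <- fitted(adj)
grp <- cut(p_hat, quantile(p_hat, seq(0, 1, 0.1)), include.lowest = TRUE)
obs <- tapply(chd$TenYearCHD, grp, sum)   # observed cases per decile
expd <- tapply(p_hat, grp, sum)           # expected cases per decile
size <- tapply(p_hat, grp, length)        # subjects per decile
hl <- sum((obs - expd)^2 / (expd * (1 - expd / size)))
hl_p <- pchisq(hl, df = 8, lower.tail = FALSE)  # df = groups - 2
hl_p
```

With missing values in the real dataset, both models should be fitted to the same complete-case subset; otherwise the likelihood-ratio comparison is not valid.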

4. Interpret the results of your study and discuss the findings with reference to the public health measures they may support. Contrast these findings with those of two other similar studies, excluding any Framingham-related studies. (6 marks)

Assume all patients have been followed for the same amount of time.

Assume the measurements were recorded with little error.
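Because everyone is assumed to have the same ten years of follow-up (the first assumption above), the outcome can be treated as a binary ten-year cumulative incidence (risk) rather than a time-to-event outcome. One common way to model it, not necessarily the one intended here, is logistic regression; the sketch below uses the covariates named in task 3, with glucose in mg/dL so that a 10 mg/dL difference corresponds to the odds ratio shown. This odds ratio approximates the risk ratio only when CHD is uncommon.

```latex
\hat{R} = \frac{\text{number who developed CHD during follow-up}}{\text{number followed from baseline}}

\operatorname{logit}\Pr(\text{TenYearCHD}_i = 1)
  = \beta_0 + \beta_1\,\text{glucose}_i + \beta_2\,\text{age}_i + \beta_3\,\text{male}_i
  + \beta_4\,\text{currentSmoker}_i + \beta_5\,\text{totChol}_i + \beta_6\,\text{prevalentStroke}_i

\mathrm{OR}_{10\ \text{mg/dL}} = \exp(10\,\beta_1)
```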

Variables are:

male: 1 = male, 0 = female.

age: age in years.

education: 1 = Some High School; 2 = High School graduate; 3 = Some University or Polytechnic; 4 = Completed University degree.

currentSmoker: 0 = non-smoker, 1 = smoker.

cigsPerDay: estimated number of cigarettes smoked per day.

BPMeds: 0 = not on blood pressure medication, 1 = on blood pressure medication.

prevalentStroke: history of stroke (past stroke).

prevalentHyp: diagnosis of hypertension.

diabetes: 0 = no, 1 = yes.

totChol: serum total cholesterol (mg/dL).

sysBP: systolic blood pressure (mmHg).

diaBP: diastolic blood pressure (mmHg).

BMI: body mass index (kg/m²).

heartRate: heart rate (beats per minute).

glucose: fasting serum glucose (mg/dL).

TenYearCHD: outcome variable; incident coronary heart disease (CHD) within ten years (0 = no CHD, 1 = CHD).
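A minimal sketch of reading the spreadsheet into R and giving the coded variables appropriate types, assuming it has been saved as a CSV file. Because the dataset itself is not reproduced here, the sketch first writes four made-up rows to a temporary file (the name `chd_toy.csv` is hypothetical) so that it runs on its own; only a subset of the variables is shown. The 0 = no, 1 = yes coding for `prevalentStroke` is an assumption, since the variable list does not state it.

```r
# Hypothetical toy rows standing in for the real spreadsheet (values are made up)
toy <- data.frame(
  male = c(1, 0, 0, 1),
  age = c(40, 45, 50, 60),
  education = c(4, 2, 1, NA),
  currentSmoker = c(0, 0, 1, 1),
  prevalentStroke = c(0, 0, 0, 1),
  diabetes = c(0, 0, 0, 1),
  glucose = c(80, 75, NA, 110),
  TenYearCHD = c(0, 0, 0, 1)
)
path <- file.path(tempdir(), "chd_toy.csv")  # hypothetical file name
write.csv(toy, path, row.names = FALSE)       # missing values are written as NA

# Read the file, treating the string "NA" as missing
chd <- read.csv(path, na.strings = "NA")

# Recode numeric codes as labelled factors so R treats them as categorical
chd$male <- factor(chd$male, levels = c(0, 1), labels = c("Female", "Male"))
chd$education <- factor(chd$education, levels = 1:4,
                        labels = c("Some High School", "High School graduate",
                                   "Some University or Polytechnic",
                                   "Completed University degree"))
# 0/1 indicators; the 0 = no, 1 = yes coding of prevalentStroke is assumed
for (v in c("currentSmoker", "prevalentStroke", "diabetes", "TenYearCHD")) {
  chd[[v]] <- factor(chd[[v]], levels = c(0, 1), labels = c("No", "Yes"))
}

str(chd)             # confirm each variable now has the intended type
colSums(is.na(chd))  # number of missing values per variable
```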

