2023-01-04

### What to Submit
- An **R/Python Markdown file in HTML** or a link to such a file, containing all necessary code to reproduce the reported results. No page limit.
- A web link to a **movie recommendation app**, e.g., a Shiny app, built by your team. You can share the link to your source code with us, or submit a copy of your code as one zip file on Coursera/Canvas.

### HTML Markdown File (5pt)

It should contain the following two components.


#### System I: Recommendation based on genres


Suppose you know the user's favorite genre. How would you recommend movies to him/her?

Propose **two** recommendation schemes along with all necessary technical details.

For example, you can recommend the top five most popular movies in that genre; then you have to define what you mean by "most popular". Or recommend the top five highly rated movies in that genre; again, you need to define what you mean by "highly rated". (Will a movie that receives only one 5-point review be considered highly rated?) Or recommend the trendiest movies in that genre; define how you measure trendiness.
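As a rough illustration of two such schemes, the sketch below ranks movies within a genre either by rating count ("most popular") or by average rating among movies with at least `min_n` ratings ("highly rated"). The toy data, the column names, the helper `top_by_genre`, and the threshold `min_n` are all assumptions made for this example, not part of the assignment; the threshold is one possible way to keep a single 5-point review from dominating.

```r
# Toy data; values and column names are made up for illustration.
ratings <- data.frame(
  MovieID = c(1, 1, 1, 2, 3, 3, 3, 3),
  Rating  = c(5, 4, 4, 5, 3, 4, 2, 3)
)
movies <- data.frame(
  MovieID = c(1, 2, 3),
  Title   = c("Movie A", "Movie B", "Movie C"),
  Genres  = c("Comedy|Romance", "Comedy", "Comedy|Drama"),
  stringsAsFactors = FALSE
)

# Per-movie rating count and average rating.
n_ratings  <- tapply(ratings$Rating, ratings$MovieID, length)
avg_rating <- tapply(ratings$Rating, ratings$MovieID, mean)
stats <- data.frame(MovieID = as.numeric(names(n_ratings)),
                    n       = as.vector(n_ratings),
                    avg     = as.vector(avg_rating))
stats <- merge(movies, stats, by = "MovieID")

# Hypothetical helper: rank movies of one genre under one of two schemes.
top_by_genre <- function(stats, genre, by = c("popular", "rated"),
                         min_n = 2, top = 5) {
  by <- match.arg(by)
  # Plain substring match on the pipe-separated genre string.
  in_genre <- stats[grepl(genre, stats$Genres, fixed = TRUE), ]
  if (by == "popular") {
    ranked <- in_genre[order(-in_genre$n), ]          # most ratings first
  } else {
    eligible <- in_genre[in_genre$n >= min_n, ]       # drop thinly rated movies
    ranked <- eligible[order(-eligible$avg), ]        # highest average first
  }
  head(ranked, top)
}

top_by_genre(stats, "Comedy", by = "popular")
top_by_genre(stats, "Comedy", by = "rated", min_n = 2)
```

In this toy data, "Movie B" has a single 5-point rating, so it ranks last under the popularity scheme and is excluded from the highly rated scheme.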


For this part, you do not really need `recommenderlab`. Some data wrangling/summary tools would be enough.

#### System II: Collaborative recommendation system


Review **two** collaborative recommendation algorithms: UBCF and IBCF. (We suggest reading Sec 2.1-2.2 of the [recommenderlab tutorial](https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf).)

Follow the steps below to provide your review.

----
For **UBCF**, use the following options:
- `normalize = 'center'`: Let R denote the rating matrix with rows as users and columns as movies; this option means that we subtract the row mean from each non-NA entry. Here, row means are computed from non-NA entries only; for example, the mean of the vector `(2, 4, NA, NA)` is 3.
- `nn = 20`: the neighborhood size is 20. That is, the prediction for a new user is based on ratings from the 20 users who are most similar to this new user.
- `weighted = TRUE`: (this is the default option) Ratings from users that are more similar to the new user receive higher weights. That is, we use equation (4) (on page 6) instead of equation (3) (on page 5) in the [recommenderlab tutorial](https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf).
- `method = 'Cosine'`: this similarity measure is defined in the 2nd paragraph of Sec 2.1 in [recommenderlab tutorial](https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf). Remember to transform this measure to be between 0 and 1. There is a typo in the transformation formula in the tutorial; see below.

![Screen%20Shot%202022-11-22%20at%202.22.05%20PM.png](https://campuspro-uploads.s3.us-west-2.amazonaws.com/3d46bbba-b94e-4270-82c1-e453ce55b35e/edbfa39f-c392-4030-95fe-3c4c7d75165e/Screen%20Shot%202022-11-22%20at%202.22.05%20PM.png)
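The centering and similarity options above can be sketched on toy vectors as follows. This is only an illustration under two assumptions that the text here does not spell out: that the cosine is computed over the movies both users have rated, and that the corrected transformation is `(1 + cos) / 2`, which maps [-1, 1] to [0, 1]. Check both against the corrected formula in the screenshot. The helper names `center_row` and `cosine_01` are hypothetical.

```r
# Center a rating vector by the mean of its non-NA entries.
center_row <- function(r) r - mean(r, na.rm = TRUE)

center_row(c(2, 4, NA, NA))   # -1  1 NA NA  (the row mean is 3)

# Cosine similarity over co-rated movies, mapped to [0, 1].
# ASSUMPTION: the transformation is (1 + cos) / 2.
cosine_01 <- function(a, b) {
  both <- !is.na(a) & !is.na(b)          # movies rated by both users
  if (!any(both)) return(NA_real_)       # no overlap: similarity undefined
  cs <- sum(a[both] * b[both]) /
    sqrt(sum(a[both]^2) * sum(b[both]^2))
  (1 + cs) / 2
}

u <- center_row(c(5, 3, NA, 1))
v <- center_row(c(4, 2, 3, NA))
cosine_01(u, v)
```

Note that a centered vector that is all zeros on the co-rated movies gives a 0/0 cosine (`NaN`); a full implementation has to decide how to treat that case.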

Demonstrate how UBCF predicts the ratings of a new user based on training data. Use the first 500 users from MovieLens as training and predict the ratings of the 501st user.
```{r}
library(recommenderlab)
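# Base URL of the course's copy of the MovieLens data
myurl = "https://liangfgithub.github.io/MovieData/"
# Fields are separated by '::'. Splitting on ':' produces an empty column
# between fields; the recycled colClasses pattern drops those ('NULL').
ratings = read.csv(paste0(myurl, 'ratings.dat?raw=true'),
                   sep = ':',
                   colClasses = c('integer', 'NULL'),
                   header = FALSE)
colnames(ratings) = c('UserID', 'MovieID', 'Rating', 'Timestamp')

# Build a sparse user-by-movie matrix and wrap it as a realRatingMatrix
i = paste0('u', ratings$UserID)
j = paste0('m', ratings$MovieID)
x = ratings$Rating
tmp = data.frame(i, j, x, stringsAsFactors = TRUE)
Rmat = sparseMatrix(as.integer(tmp$i), as.integer(tmp$j), x = tmp$x)
rownames(Rmat) = levels(tmp$i)
colnames(Rmat) = levels(tmp$j)
Rmat = new('realRatingMatrix', data = Rmat)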

train = Rmat[1:500, ]
test = Rmat[501, ]
```

Store the predicted ratings for the 501st user in a vector named `mypred`. **Remember to provide all necessary code so we can reproduce your calculation** for `mypred`.

Next, compare your prediction with the one from `recommenderlab`:

```{r}
recommender.UBCF <- Recommender(train, method = "UBCF",
                                parameter = list(normalize = 'center',
                                                 method = 'Cosine',
                                                 nn = 20))

p.UBCF <- predict(recommender.UBCF, test, type="ratings")
p.UBCF <- as.numeric(as(p.UBCF, "matrix"))

sum(is.na(p.UBCF) != is.na(mypred)) ### should be zero
max(abs(p.UBCF - mypred), na.rm = TRUE)  ### should be less than 1e-06
```
The last two commands above check that (1) `p.UBCF` and `mypred` assign NA to the same set of movies and (2) their non-NA predictions are very close (**the maximum difference should be less than 1e-06**).

**NAs in the prediction**. In `mypred` and `p.UBCF`, a movie may receive an NA prediction for one of two reasons: 1) none of the 20 most similar users has rated this movie yet; 2) the 501st user has watched this movie before (i.e., s/he has already assigned a rating to this movie).
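The weighted prediction and the two NA cases can be illustrated on a toy example. The sketch below assumes the nearest neighbors and their similarities have already been found, and that the weights are renormalized over the neighbors who actually rated each movie; the function name `predict_ubcf_toy` and all data values are made up for illustration, and this is not presented as the `recommenderlab` implementation.

```r
# Hypothetical helper: weighted-average prediction from a few neighbors.
predict_ubcf_toy <- function(new_user, neighbors, sims) {
  new_mean <- mean(new_user, na.rm = TRUE)
  # Center each neighbor by their own row mean (non-NA entries only).
  centered <- neighbors - rowMeans(neighbors, na.rm = TRUE)
  rated <- !is.na(centered)
  # Row i of each matrix is multiplied by sims[i] (column-wise recycling).
  num <- colSums(centered * sims, na.rm = TRUE)
  den <- colSums(rated * sims)        # total weight of neighbors who rated
  pred <- num / den + new_mean        # add the new user's mean back
  pred[den == 0] <- NA                # reason 1: no neighbor rated the movie
  pred[!is.na(new_user)] <- NA        # reason 2: the user already rated it
  pred
}

new_user  <- c(m1 = 4, m2 = NA, m3 = NA, m4 = 2)
neighbors <- rbind(u1 = c(5, 3, NA, 1),
                   u2 = c(4, 2, NA, NA))
colnames(neighbors) <- names(new_user)
sims <- c(0.9, 0.6)                   # made-up similarities in [0, 1]

predict_ubcf_toy(new_user, neighbors, sims)
```

Here only `m2` gets a prediction, 2.6 (the weighted centered rating -0.6 / 1.5 = -0.4, plus the new user's mean of 3); `m3` is NA because no neighbor rated it, and `m1` and `m4` are NA because the new user already rated them.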

----
Do the same for IBCF. For **IBCF**, use the following options:
- `normalize = 'center'`
- `k = 30`: the neighborhood size for items is 30.
- `weighted = TRUE`: (this is the default option) That is, we use equation (5) (on page 7) in the [recommenderlab tutorial](https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf).
- `method = 'Cosine'`
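To make the role of `k` concrete, here is a toy sketch of item-based prediction: each item keeps only its `k` most similar items, and a prediction for item i is the similarity-weighted average of the user's centered ratings on those neighbors. The similarity matrix, the helpers `keep_top_k` and `predict_ibcf_toy`, and the tie-breaking by position are assumptions for illustration; `recommenderlab` may order ties or handle edge cases differently.

```r
# Hypothetical helper: keep the k largest similarities in each row.
keep_top_k <- function(S, k) {
  diag(S) <- NA                                  # an item is not its own neighbor
  for (i in seq_len(nrow(S))) {
    ranked <- order(S[i, ], decreasing = TRUE, na.last = NA)
    others <- setdiff(seq_len(ncol(S)), head(ranked, k))
    S[i, others] <- NA                           # discard everything else
  }
  S
}

# Hypothetical helper: weighted average over the rated neighbors of each item.
predict_ibcf_toy <- function(new_user, S_topk) {
  new_mean <- mean(new_user, na.rm = TRUE)
  centered <- new_user - new_mean
  rated <- !is.na(centered)
  W <- S_topk[, rated, drop = FALSE]             # similarities to rated items
  num <- rowSums(sweep(W, 2, centered[rated], `*`), na.rm = TRUE)
  den <- rowSums(W, na.rm = TRUE)
  pred <- num / den + new_mean
  pred[den == 0] <- NA                           # no rated item among neighbors
  pred[rated] <- NA                              # already rated by the user
  pred
}

items <- paste0("m", 1:4)
S <- matrix(c(1.0, 0.8, 0.3, 0.6,
              0.8, 1.0, 0.5, 0.2,
              0.3, 0.5, 1.0, 0.9,
              0.6, 0.2, 0.9, 1.0),
            nrow = 4, byrow = TRUE, dimnames = list(items, items))
new_user <- c(m1 = 5, m2 = NA, m3 = 3, m4 = NA)

predict_ibcf_toy(new_user, keep_top_k(S, k = 2))
```

Note that after truncation the similarity matrix is generally no longer symmetric: `m1` keeps `m4` as a neighbor, and whether `m4` keeps `m1` depends on `m4`'s own ranking.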

Again, demonstrate how IBCF predicts the ratings of the 501st user based on ratings from the first 500 users.

Store your prediction for the 501st user in a vector named `mypred`. Then compare your prediction with the one from `recommenderlab`:

```{r}
recommender.IBCF <- Recommender(train, method = "IBCF",
                                parameter = list(normalize = 'center',
                                                 method = 'Cosine',
                                                 k = 30))

p.IBCF <- predict(recommender.IBCF, test, type="ratings")
p.IBCF <- as.numeric(as(p.IBCF, "matrix"))

## first output: should be less than 10
sum(is.na(p.IBCF) != is.na(mypred))

## second output: should be less than 10%
mydiff = abs(p.IBCF - mypred)
sum(mydiff[!is.na(mydiff)] > 1e-6) / sum(!is.na(mydiff))
```
The first output counts the movies on which `p.IBCF` and `mypred` disagree about NA assignment. You should aim for fewer than 10 mismatches.

The second output measures the proportion of non-NA predictions that disagree (difference bigger than 1e-06). You should aim to keep this number below 10%.

**Question**: why do we encounter such a big discrepancy for IBCF, but not for UBCF? I have a partial answer but would like students to think about it.


### A Movie Recommendation App (5pt)

Build a Shiny app (or any other app) with one algorithm from **System I** and one algorithm from **System II**.

For the algorithm from **System I**, your app needs to take input from a user on his/her favorite genre. For the algorithm from **System II**, your app needs to provide some sample movies and ask the user to rate them.


> **Output of your App**: N movies recommended to the user after his/her input, where N >= 5.

For **System I**, you should save your top choices for each genre, for example, as a table, so you do not need to recompute them each time. For **System II**, you can use either UBCF or IBCF from `recommenderlab` with any choices for similarity measure, neighborhood size, and normalization; there is no need to stick to the options used in your Markdown file, which are used for demonstration purposes. Also, students are not required to use their own implementation of UBCF/IBCF; your implementation in the Markdown file, again, is for demonstration purposes.
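One simple way to avoid recomputing the System I results is to store the per-genre picks once and load them when the app starts. The sketch below uses `saveRDS`/`readRDS` with a temporary file; the file location, the table contents, and the helper `recommend_for_genre` are placeholders for illustration.

```r
# Precomputed picks per genre (made-up values for illustration).
top_picks <- list(
  Comedy = data.frame(MovieID = c(3, 1),
                      Title = c("Movie C", "Movie A"),
                      stringsAsFactors = FALSE),
  Drama  = data.frame(MovieID = 3,
                      Title = "Movie C",
                      stringsAsFactors = FALSE)
)

# Done once, offline. In a real app this would be a file shipped with the app.
cache_file <- file.path(tempdir(), "top_picks.rds")
saveRDS(top_picks, cache_file)

# Done at app startup: load the table instead of recomputing it.
cached <- readRDS(cache_file)

# Hypothetical lookup used by the app when the user picks a genre.
recommend_for_genre <- function(genre, n = 5) head(cached[[genre]], n)

recommend_for_genre("Comedy")
```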

You need a front page that allows users to select which system to use. For example,
![UI_image%20%281%29.png](https://campuspro-uploads.s3.us-west-2.amazonaws.com/497eef81-a2cf-4d1c-923e-22a7e4dcb092/368df833-ba22-414f-8fd1-a3e796342fc9/UI_image%20%281%29.png)

We will test your app: 3pt if it works, and 2pt for design. For example, an app like \[[**this**](https://philippsp.shinyapps.io/BookRecommendation/)\] will receive 2pt for design.
