Congratulations on getting a job as a data scientist at Paramount Pictures!
Please see the Data Analysis Project for your assignment.
Below you will find the files that you will need. Your boss has just acquired data on how much audiences and critics like movies, as well as numerous other variables describing the movies. This dataset is provided below, and it includes information from Rotten Tomatoes and IMDB for a random sample of movies. She is interested in learning what attributes make a movie popular. She is also interested in learning something new about movies. She wants your team to figure it all out. As part of this project, you will complete exploratory data analysis (EDA), modeling, and prediction. All analysis must be completed using the R programming language via RStudio, and your write-up must be an R Markdown document. To help you get started, we provide a template Rmd file (see Rmd template in the Required files section below). Download this file and fill in each section.
IMPORTANT:
Analyses completed using software other than R, or not written up using R Markdown, will receive a 0 on the project regardless of their content.
Required files
• Data - Save this file in the same directory as the Rmd template (provided below). NOTE: If you are using Chrome as your browser, you might need to change the .gz extension at the end of the downloaded file's name to .Rdata. movies.Rdata
• Codebook - Review this file to find out what each column in the data represents. movies_codebook.html
• Rmd template - You must use this template to write up your project. Save the data and this file in the same directory. reg_model_project.Rmd
• Assessment rubric - You might want to review the assessment rubric while working on your project so that you have some idea of how your peers will evaluate your work. reg_model_project_rubric.html
More information on the data
The data set consists of 651 randomly sampled movies produced and released before 2016. Some of these variables are only there for informational purposes and do not make any sense to include in a statistical analysis. It is up to you to decide which variables are meaningful and which should be omitted. For example, information in the `actor1` through `actor5` variables was used to determine whether the movie's cast includes an actor or actress who has won a Best Actor or Best Actress Oscar. You might also choose to omit certain observations or restructure some of the variables to make them suitable for answering your research questions. When you are fitting a model, you should also be careful about collinearity, as some of these variables may be dependent on each other. Source: Rotten Tomatoes and IMDB APIs.