You work for the online store Ice, which sells video games all over the world. User and expert reviews, genres, platforms (e.g. Xbox or PlayStation), and historical data on game sales are available from open sources in the file games.csv
. You need to identify patterns that determine whether a game succeeds or not. This will allow you to spot potential big winners and plan advertising campaigns.
In front of you is data going back to 2016. Let’s imagine that it’s December 2016 and you’re planning a campaign for 2017.
(The important thing is to get experience working with data. It doesn't really matter whether you're forecasting 2017 sales based on data from 2016 or 2027 sales based on data from 2026.)
The dataset contains the abbreviation ESRB. The Entertainment Software Rating Board evaluates a game's content and assigns an age rating such as Teen or Mature.
-
Replace the column names (make them lowercase).
-
Convert the data to the required types.
-
Describe the columns where the data types have been changed and why.
-
If necessary, decide how to deal with missing values:
-
Explain why you filled in the missing values as you did or why you decided to leave them blank.
-
Why do you think the values are missing? Give possible reasons.
-
Pay attention to the abbreviation TBD (to be determined. Specify how you intend to handle such cases.
-
Calculate the total sales (the sum of sales in all regions) for each game and put these values in a separate column.
-
-
Look at how many games were released in different years. Is the data for every period significant?
-
Look at how sales varied from platform to platform. Choose the platforms with the greatest total sales and build a distribution based on data for each year. Find platforms that used to be popular but now have zero sales.
-
Determine what period you should take data for. To do so, look at your answers to the previous questions. The data should allow you to build a prognosis for 2017.
-
Work only with the data that you've decided is relevant. Disregard the data for previous years.
-
Which platforms are leading in sales? Which ones are growing or shrinking? Select several potentially profitable platforms.
-
Build a box plot for the global sales of all games, broken down by platform. Are the differences in sales significant? What about average sales on various platforms? Describe your findings.
-
Take a look at how user and professional reviews affect sales for one popular platform (you choose). Build a scatter plot and calculate the correlation between reviews and sales. Draw conclusions.
-
Keeping your conclusions in mind, compare the sales of the same games on other platforms.
-
Take a look at the general distribution of games by genre. What can we say about the most profitable genres? Can you generalize about genres with high and low sales?
For each region (NA, EU, JP), determine:
-
The top five platforms. Describe variations in their market shares from region to region.
-
The top five genres. Explain the difference.
-
Do ESRB ratings affect sales in individual regions?