book_r.py
import pandas as pd import numpy as np # DO NOT CHANGE THE VARIABLE NAMES # Set the precision of our dataframes to one decimal place. pd.set_option('precision', 1) # Create a Pandas DataFrame that contains the ratings some users have given to a series of books. # The ratings given are in the range from 1 to 5, with 5 being the best score. # The names of the books, the corresponding authors, and the ratings of each user are given below: books = pd.Series(data = ['Great Expectations', 'Of Mice and Men', 'Romeo and Juliet', 'The Time Machine', 'Alice in Wonderland' ]) authors = pd.Series(data = ['Charles Dickens', 'John Steinbeck', 'William Shakespeare', ' H. G. Wells', 'Lewis Carroll' ]) # User ratings are in the order of the book titles mentioned above # If a user has not rated all books, Pandas will automatically consider the missing values as NaN. # If a user has mentioned `np.nan` value, then also it means that the user has not yet rated that book. user_1 = pd.Series(data = [3.2, np.nan ,2.5]) user_2 = pd.Series(data = [5., 1.3, 4.0, 3.8]) user_3 = pd.Series(data = [2.0, 2.3, np.nan, 4]) user_4 = pd.Series(data = [4, 3.5, 4, 5, 4.2]) # Use the data above to create a Pandas DataFrame that has the following column # labels: 'Author', 'Book Title', 'User 1', 'User 2', 'User 3', 'User 4'. # Let Pandas automatically assign numerical row indices to the DataFrame. # TO DO: Create a dictionary with the data given above dat = # TO DO: Create a Pandas DataFrame using the dictionary created above book_ratings = # TO DO: # If you created the dictionary correctly you should have a Pandas DataFrame # that has column labels: # 'Author', 'Book Title', 'User 1', 'User 2', 'User 3', 'User 4' # and row indices 0 through 4. # Now replace all the NaN values in your DataFrame with the average rating in # each column. Replace the NaN values in place. # HINT: Use the `pandas.DataFrame.fillna(value, inplace = True)` function for substituting the NaN values. # Write your code below:
solution.py
import pandas as pd import numpy as np pd.set_option('precision', 1) books = pd.Series(data = ['Great Expectations', 'Of Mice and Men', 'Romeo and Juliet', 'The Time Machine', 'Alice in Wonderland' ]) authors = pd.Series(data = ['Charles Dickens', 'John Steinbeck', 'William Shakespeare', ' H. G. Wells', 'Lewis Carroll' ]) user_1 = pd.Series(data = [3.2, np.nan ,2.5]) user_2 = pd.Series(data = [5., 1.3, 4.0, 3.8]) user_3 = pd.Series(data = [2.0, 2.3, np.nan, 4]) user_4 = pd.Series(data = [4, 3.5, 4, 5, 4.2]) dat = {'Book Title' : books, 'Author' : authors, 'User 1' : user_1, 'User 2' : user_2, 'User 3' : user_3, 'User 4' : user_4} book_ratings = pd.DataFrame(dat) book_ratings.fillna(book_ratings.mean(), inplace = True)
Level Up
From the DataFrame above can you now pick all the books that had a rating of 5? Hint – You can do this in just one line of code, by using pandas.DataFrame.any. Try to do it yourself first, you’ll find the answer below:
pandas.DataFrame.any
DataFrame.any
(axis=0, bool_only=None, skipna=True, level=None, **kwargs) Return whether any element is True, potentially over an axis. Returns False unless there is at least one element within a series or along a Dataframe axis that is True or equivalent (e.g. non-zero or non-empty).
Here is one of the possible solutions:
best_rated = book_ratings[(book_ratings == 5).any(axis = 1)]['Book Title'].values
Explanation
# Step 1 - Highlight the rows containing a value = 5. # This step will replace all values != 5 with NaN in the entire DataFrame. best_rated = book_ratings[(book_ratings == 5)] # Step 2 - Extract the DataFrame containing only those rows that have a value = 5. best_rated = book_ratings[(book_ratings == 5).any(axis = 1)] # Step 3 - Find the corresponding Book Title of the rows identified above. # Returns a numpy.ndarray containing only the Book Titles best_rated = book_ratings[(book_ratings == 5).any(axis = 1)]['Book Title'].values
The code above returns a NumPy ndarray that only contains the names of the books that had a rating of 5.