Data Cleaning with Pandas
Data Transformation
Question 1 of 4
- Load corona_data.csv into a pandas DataFrame. Transform this data to have columns Date, Country, TotalCases, TotalDeaths. Save the result in the CSV file corona_transformed.csv.
- Load lab_reading.csv into a pandas DataFrame. Transform this data to have columns Date, CO2, Rain and Methane. Fill Reading into the appropriate column and save the result in the CSV file lab_reading_transformed.csv. Note: In case of duplicates, take the minimum value.
- Load treatment_info.csv into a pandas DataFrame. Transform this data to have columns Date, Treatment Type and Dosage. Save the result in the CSV file treatment_info_transformed.csv.
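The lab reading task (spread Reading into CO2, Rain and Methane columns, keeping the minimum on duplicates) could be sketched with pivot_table. The layout of lab_reading.csv is not shown in the exercise, so the long format below, the column name Type, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

# Made-up sample rows; the real file layout is an assumption.
# Assumed long format: one row per (Date, Type) with the value in Reading.
lab = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "Type": ["CO2", "CO2", "Rain", "Methane", "CO2"],  # "Type" is a hypothetical column name
    "Reading": [410.0, 405.0, 2.5, 1.9, 412.0],
})

# pivot_table turns each Type value into its own column;
# aggfunc="min" keeps the smallest Reading when a (Date, Type) pair repeats.
wide = (
    lab.pivot_table(index="Date", columns="Type", values="Reading", aggfunc="min")
       .reindex(columns=["CO2", "Rain", "Methane"])  # column order requested by the exercise
       .reset_index()
)
wide.columns.name = None  # drop the leftover "Type" label on the column axis

# With the real data, the result would then be saved:
# wide.to_csv("lab_reading_transformed.csv", index=False)
```

Cells with no reading for a given date stay null, which seems the safest default unless the exercise data suggests otherwise.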
Data Cleaning - Heart Disease Dataset
Question 2 of 4
- Load heart_disease_raw.csv into a DataFrame. Perform the following operations:
  1. Display the number of rows and columns in the dataset.
  2. Display column names and their datatypes.
  3. Rename column Heart_ stroke to Heart_Stroke.
  4. Display a sample of 15 records.
  5. Adjust the Gender column to have only M, F and null.
  6. Adjust education to have only: Uneducate, Primary School, Graduate, Post Graduate, null.
  7. Adjust Exercise to have only: null, daily, weekly and monthly.
  8. Fill missing values in numeric columns with their mean.
  9. Fill missing values in categorical columns with the most frequent value.
  10. Remove duplicate records.
  11. Replace outliers in numeric columns with the mean value.
  12. Ensure Gender, education, Exercise, prevalentStroke, Heart_Stroke are of category dtype.
  13. Ensure other columns also have appropriate datatypes.
  14. Save the cleaned data as heart_disease_cleaned.csv.
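Steps 5, 8 and 9 of the heart disease exercise might look roughly like the sketch below. The raw spellings of Gender and the numeric column name age are assumptions, since the contents of heart_disease_raw.csv are not shown; the sample rows are made up.

```python
import pandas as pd

# Made-up sample; the raw label variants and the "age" column are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "f", "M", None, "female", "M"],
    "age": [50.0, None, 61.0, 45.0, 39.0, 61.0],
})

# Step 5: map assumed variants onto the allowed labels.
# Any value missing from the mapping becomes null.
gender_map = {"m": "M", "male": "M", "f": "F", "female": "F"}
df["Gender"] = df["Gender"].str.strip().str.lower().map(gender_map)

# Step 8: fill missing numeric values with the column mean.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Step 9: fill missing categorical values with the most frequent value (mode).
cat_cols = df.select_dtypes(exclude="number").columns
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```

The same mapping approach should carry over to education and Exercise, with a dictionary built from whatever variants actually appear in the data.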
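For the outlier and dtype steps of the heart disease exercise (steps 11 and 12), one possible sketch follows. The exercise does not define what counts as an outlier, so the 1.5 x IQR rule used here is an assumption, as is the choice to use the mean of the non-outlier values as the replacement. The column name cholesterol and the helper replace_outliers_with_mean are hypothetical.

```python
import pandas as pd

# Made-up sample; "cholesterol" is a hypothetical numeric column.
df = pd.DataFrame({
    "cholesterol": [200.0, 210.0, 190.0, 205.0, 195.0, 900.0],
    "Heart_Stroke": ["No", "No", "Yes", "No", "Yes", "No"],
})

def replace_outliers_with_mean(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the mean of the inliers.

    Assumes missing values were filled beforehand; a null would be
    treated as an outlier here and replaced as well.
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    inlier = s.between(q1 - k * iqr, q3 + k * iqr)
    return s.where(inlier, s[inlier].mean())

df["cholesterol"] = replace_outliers_with_mean(df["cholesterol"])

# Step 12: store label columns with the category dtype.
df["Heart_Stroke"] = df["Heart_Stroke"].astype("category")
```

Using the inlier mean avoids the extreme values pulling the replacement toward themselves; if the exercise intends the plain column mean, replace s[inlier].mean() with s.mean().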
Data Cleaning - Baseball Player Dataset
Question 3 of 4
- Load Baseball player.txt into a DataFrame. Perform the following operations:
  1. During load, assign the header from file Baseball Player - Clean.txt.
  2. Display a sample of 15 records to understand the data.
  3. Handle missing values in numeric columns with the mean and in categorical columns with the mode.
  4. Remove duplicate records.
  5. Handle outliers in numeric columns by clipping to the 1st and 99th percentiles.
  6. Drop duplicate records.
  7. Create a new column height_cm by converting the existing inch value (1 inch = 2.54 cm).
  8. Bin players into age groups 10-15, 15-20, and so on, until the last value is included.
  9. Replace the underscore (_) in the position column with a space.
  10. Split the Name into FirstName, MiddleName, LastName and store them in three columns.
  11. For non-numeric values, check whether there are casing or spelling mistakes. If so, fix them.
  12. Ensure all columns have appropriate datatypes.
  13. Save the cleaned data as tab-separated values in baseball_player_cleaned.tsv.
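Steps 5, 7 and 8 of the baseball exercise (percentile clipping, the inch to cm conversion, and age binning) could be sketched as below. The column names Height and Age, the new column name age_group, and the sample values are assumptions; only height_cm and the bin scheme come from the exercise.

```python
import numpy as np
import pandas as pd

# Made-up sample; "Height" (inches) and "Age" are assumed column names.
players = pd.DataFrame({
    "Height": [70.0, 72.0, 74.0, 76.0],
    "Age": [22.5, 27.0, 31.2, 35.0],
})

# Step 5: clip to the 1st and 99th percentiles of the column.
low, high = players["Height"].quantile([0.01, 0.99])
players["Height"] = players["Height"].clip(lower=low, upper=high)

# Step 7: 1 inch = 2.54 cm.
players["height_cm"] = players["Height"] * 2.54

# Step 8: 5-year bins starting at 10, extended until the maximum age is covered.
upper = int(np.ceil(players["Age"].max() / 5) * 5)
edges = list(range(10, upper + 5, 5))
labels = [f"{a}-{b}" for a, b in zip(edges[:-1], edges[1:])]
players["age_group"] = pd.cut(
    players["Age"], bins=edges, labels=labels, include_lowest=True
)
```

With the default right-closed intervals, an age exactly on a boundary (for example 15) falls into the lower group (10-15); pass right=False to pd.cut if the opposite convention is wanted, and extend the last edge accordingly.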
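For the text cleanup steps of the baseball exercise (replacing underscores in the position column and splitting Name into three parts), a minimal sketch follows. The column name Position, the helper split_name, and the sample names are hypothetical; how to treat names with one word or more than three words is not specified in the exercise, so the rule below (first word, last word, everything in between) is an assumption.

```python
import pandas as pd

# Made-up sample rows for illustration only.
players = pd.DataFrame({
    "Name": ["John Smith", "Mary Ann Lee", "Cher"],
    "Position": ["First_Baseman", "Relief_Pitcher", "Outfielder"],
})

# Replace underscores with spaces (literal match, not a regex).
players["Position"] = players["Position"].str.replace("_", " ", regex=False)

def split_name(name: str) -> pd.Series:
    """Split a full name into first, middle and last parts.

    Everything between the first and last word is treated as the middle name.
    """
    parts = name.split()
    first = parts[0] if parts else None
    last = parts[-1] if len(parts) > 1 else None
    middle = " ".join(parts[1:-1]) or None
    return pd.Series({"FirstName": first, "MiddleName": middle, "LastName": last})

# apply() with a function that returns a Series yields a DataFrame,
# which is joined back on the index.
players = players.join(players["Name"].apply(split_name))
```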
Answering from Baseball Player Dataset
Question 4 of 4
- Load baseball_player_cleaned.tsv into a DataFrame. Answer the following questions:
  1. Which player has the highest BMI? (BMI = weight / height_cm^2)
  2. Which team has the highest average player age?
  3. How many players are in each position category?
  4. What is the average weight of players in each age group?
  5. Which player has the longest name (in terms of characters)?
  6. How many players have age above 30 and weight below 70?
  7. What is the distribution of players across different age groups?
  8. Which player has the highest number of characters in their name?
  9. How many players have Center as their position and are in the 20-25 age group?
  10. What is the average height of players in each position category?
  11. What is the average height and weight of U-23 baseball players?
  12. Display the most common position hired by each team.
  13. Display the average age for each team and position.
  14. Display MinHeight, MaxHeight, AverageHeight, MinWeight, MaxWeight and AverageWeight for each position.
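Several of these questions reduce to a derived column plus a groupby. The sketch below covers questions 1, 2, 3 and 14 on made-up rows; the column names Name, Team, Position, Weight and Age are assumptions about the cleaned file, and only height_cm and the output column names of question 14 come from the exercise.

```python
import pandas as pd

# Made-up sample; with the real data this would be
# pd.read_csv("baseball_player_cleaned.tsv", sep="\t").
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["X", "X", "Y", "Y"],
    "Position": ["Catcher", "Pitcher", "Catcher", "Pitcher"],
    "height_cm": [180.0, 190.0, 175.0, 185.0],
    "Weight": [80.0, 85.0, 90.0, 80.0],
    "Age": [25, 31, 28, 24],
})

# Question 1: BMI as defined in the exercise (weight / height_cm^2).
players["BMI"] = players["Weight"] / players["height_cm"] ** 2
top_bmi = players.loc[players["BMI"].idxmax(), "Name"]

# Question 2: team with the highest average age.
oldest_team = players.groupby("Team")["Age"].mean().idxmax()

# Question 3: number of players per position.
position_counts = players["Position"].value_counts()

# Question 14: named aggregation gives the requested column names directly.
summary = players.groupby("Position").agg(
    MinHeight=("height_cm", "min"),
    MaxHeight=("height_cm", "max"),
    AverageHeight=("height_cm", "mean"),
    MinWeight=("Weight", "min"),
    MaxWeight=("Weight", "max"),
    AverageWeight=("Weight", "mean"),
)
```

Note that idxmax returns only the first match, so ties (for example two players with the same BMI) would need an explicit filter on the maximum value instead.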
