Saturday, 20 October 2018

Data Cleansing with R

Data munging is ____
A process to clean messy data

Binning is a method to manage ______ data
Noisy data
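
As a rough illustration of how binning tames noisy data, here is a minimal sketch of "smoothing by bin means" in base R. The `prices` vector and the choice of three equal-frequency bins are assumptions made up for this example, not something taken from a real dataset.

```r
# Hypothetical noisy measurements (made-up values for illustration)
prices <- c(4, 8, 15, 21, 21, 24, 25, 28, 34)

# Equal-frequency binning: sort the values, then assign 3 values to each of 3 bins
sorted <- sort(prices)
bin_id <- rep(1:3, each = 3)

# Smoothing by bin means: replace every value with the mean of its bin
smoothed <- ave(sorted, bin_id, FUN = mean)

print(data.frame(original = sorted, bin = bin_id, smoothed = smoothed))
```

Each value is pulled toward its neighbours in the same bin, which dampens small random fluctuations at the cost of some detail.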

Ignoring missing values in your dataset is an easier and more correct approach than replacing them with mean / median values
Correct only in some cases, such as when a record is missing more than 30-40% of its values
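
The trade-off above can be sketched in base R: drop only the records that are mostly empty, and fill the remaining gaps with a column median. The data frame `df` and the 40% cut-off are assumptions chosen for illustration; the right threshold depends on your data.

```r
# Hypothetical data frame with missing values (made-up for illustration)
df <- data.frame(
  age    = c(25, NA, 31, 40, NA),
  income = c(50000, 62000, NA, 58000, NA),
  score  = c(7, 8, 6, NA, NA)
)

# Fraction of missing fields in each record (row)
row_missing <- rowMeans(is.na(df))

# Drop records missing more than 40% of their fields (assumed threshold)
kept <- df[row_missing <= 0.4, ]

# Impute the remaining gaps in each column with that column's median
imputed <- kept
for (col in names(imputed)) {
  med <- median(imputed[[col]], na.rm = TRUE)
  imputed[[col]][is.na(imputed[[col]])] <- med
}

print(imputed)
```

Here only the last record, which has no usable values at all, is discarded; the other records keep their observed values and have single gaps filled in.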

Can a technically correct dataset still be incorrect for data analysis?
Yes, a technically correct dataset is not necessarily clean enough for analysis
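
A small sketch of the difference, in base R: the column below has the right type, so it is technically correct, yet some of its values make no sense for analysis. The `people` data frame and the 0 to 120 plausibility range for age are assumptions invented for this example.

```r
# Hypothetical records (made-up for illustration)
people <- data.frame(
  name = c("A", "B", "C"),
  age  = c(34, -5, 210),
  stringsAsFactors = FALSE
)

# Technically correct: the age column is numeric, as expected
type_ok <- is.numeric(people$age)

# Not necessarily clean: check each age against an assumed plausible range
plausible <- people$age >= 0 & people$age <= 120

print(type_ok)
print(people[!plausible, ])
```

The type check passes, but the range check flags two of the three records as needing attention before any analysis.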

Data cleaning is the most time-consuming process in data analysis
True
