CONTEXT:  Let the computer clean up your database?  Well, that sounds scarier than “Synthetic Data”, but this author remembers when we used to write things down on a 3-part NCR case report form, so who knows where this tech will take us in the clinical research world.  One to watch?

IMPACT:  Unknown

READ TIME:  2 mins

Quality Level Mean [1 – 10]:  7

1. “‘PClean users can give PClean hints about how to reason more effectively about their database, and tune its performance — unlike previous probabilistic programming approaches to data cleaning, which relied primarily on generic inference algorithms that were often too slow or inaccurate,’ says Mansinghka.”

2. “PClean uses a knowledge-based approach to automate the data cleaning process: Users encode background knowledge about the database and what sorts of issues might appear.” 

3. “PClean builds on recent progress in probabilistic programming, including a [new AI programming model] built at MIT’s Probabilistic Computing Project that makes it much easier to apply realistic models of human knowledge to interpret data.”

4. “David Pfau, a senior research scientist at DeepMind, [noted] that PClean meets a business need: ‘When you consider that the vast majority of business data out there is not images of dogs, but entries in relational databases and spreadsheets, it’s a wonder that things like this don’t yet have the success that deep learning has.’”

5. “Agrawal says she hopes PClean will free up data scientists’ time, ‘to focus on the problems they care about instead of data cleaning.’”
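For readers wondering what “encode background knowledge … and what sorts of issues might appear” (quote 2) looks like in practice, here is a toy, single-column sketch of the underlying Bayesian idea: a prior built from known valid values and their frequencies, times a simple error model, picks the most probable clean value for a messy cell. This is NOT PClean’s API or model; the function names, the typo_rate and per_edit_penalty parameters, the edit-distance error model, and the city list are all hypothetical assumptions for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


def clean_value(observed: str, prior_counts: Counter, typo_rate: float = 0.05,
                per_edit_penalty: float = 0.1) -> str:
    """Return the most probable clean value for one observed cell.

    Hypothetical model (an assumption, not PClean's actual model):
      P(clean) comes from known value frequencies (the background knowledge);
      P(observed | clean) is 1 - typo_rate for an exact match, otherwise
      typo_rate * per_edit_penalty ** edit_distance (unnormalized).
    """
    total = sum(prior_counts.values())
    best_value, best_score = observed, 0.0  # fall back to the raw value
    for candidate, count in prior_counts.items():
        d = edit_distance(observed, candidate)
        likelihood = (1 - typo_rate) if d == 0 else typo_rate * per_edit_penalty ** d
        score = (count / total) * likelihood  # unnormalized posterior
        if score > best_score:
            best_value, best_score = candidate, score
    return best_value


# Hypothetical background knowledge: valid city names and how often they occur.
known = Counter({"Boston": 50, "Cambridge": 30, "Somerville": 20})

if __name__ == "__main__":
    for messy in ["Cambrigde", "Bostn", "Boston"]:
        print(messy, "->", clean_value(messy, known))
```

The point of the design is that the user supplies the knowledge (valid values, how likely typos are) and inference does the rest; a real system would model many columns and their relationships jointly rather than one field at a time.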

Source URL: https://news.mit.edu/2021/system-cleans-messy-data-tables-automatically-0511