The challenges of dirty data

Date: May 14, 2014

Data pros often talk about the importance of data cleaning and data governance to analytics initiatives. But as unstructured data, such as content from social media platforms, becomes more central to analysis, there is some debate about when to cleanse the data. Many data scientists, for example, want to see the data unvarnished so they can identify outliers and other trends.

At the AIIM New England chapter meeting, we discussed best practices for dealing with dirty data. Steve Weissman of the Holly Group, who is also president of the chapter, was on hand to offer some thoughts.

"It's not so much whether to clean dirty data, but when," he said. "There is value to getting all the raw data in, so that whoever is doing the analysis can make their own decisions about what biases are introduced by the dirty data. The alternative is to clean it up first."

According to Weissman, unstructured data in particular may contain important information that is difficult to clean up but worth retaining. Audience members also discussed the possibility of segmenting data outliers, then reintegrating the segment once it has been analyzed.
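The segment-then-reintegrate approach the audience described could look something like the minimal sketch below. It assumes simple dictionary records with a numeric field, and it uses the interquartile-range (IQR) rule to flag outliers. That rule is our own illustrative choice and was not something discussed at the meeting. The function names and the `mentions` field are hypothetical. The raw list is never modified, which keeps the original data available for anyone who wants it unvarnished.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the IQR rule (illustrative choice)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def segment_outliers(records, key, k=1.5):
    """Split records into (inliers, outliers) without altering the raw list."""
    low, high = iqr_bounds([r[key] for r in records], k)
    inliers = [r for r in records if low <= r[key] <= high]
    outliers = [r for r in records if not (low <= r[key] <= high)]
    return inliers, outliers


def reintegrate(inliers, reviewed_outliers):
    """Merge the analyzed outlier segment back in as a new list."""
    return inliers + reviewed_outliers


if __name__ == "__main__":
    # Hypothetical raw data: daily social media mention counts.
    raw = [{"id": i, "mentions": v}
           for i, v in enumerate([12, 15, 14, 13, 16, 15, 14, 950])]
    inliers, outliers = segment_outliers(raw, "mentions")
    print("outlier segment:", outliers)
    # After an analyst reviews the segment, keep what is judged valid.
    combined = reintegrate(inliers, outliers)
    print(len(raw), len(combined))
```

Keeping the outliers in their own segment, rather than deleting them, leaves the decision about what counts as "dirty" with whoever does the analysis.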

For more, check out this video.
