“Everyone should be able to pull and analyze data”

February 11, 2014

“Data overload” by Islam Elsedoudi, via Flickr (CC BY-SA 2.0, http://creativecommons.org/licenses/by-sa/2.0/)

Everyone should be able to write spaghetti code, and everyone should be able to pull and analyze data. And I’m not just talking about business-folk here.


Look at what’s going on in the digital humanities. Now, even literature, history, and religious scholars can use data to shed new insight on old texts. How awesome is that? But you have to be able to actually analyze the data. That means being able to query and scrub; that means knowing a bit of probability and statistics. The difference between a median and mean would be a start.

So yes, it’s no longer acceptable to say, “I suck at math!” and then ignore that part of the world.

I suck at physical exercise, but that doesn’t mean it’s OK for me to melt into a chair all day. We all need to work at the important stuff in life, and understanding data has become terribly important.

I agree with the overall sentiment of the quote: more people should be able to do basic data scraping and analysis. Unfortunately, I don’t see it happening anytime soon, for two reasons. The tools for analyzing data are complicated for non-engineers, and most people receive no training in programming (to script and pull the data in the first place) or statistics (to crunch the data and draw valid insights).

Even if everyone had the skills and tools necessary to pull and analyze the data, there would still be a need for skilled analysts / data scientists. Executives and product managers often don’t have the time to do analysis themselves; it’s not efficient for them to do so. Analysts fulfill an important role by distilling raw data into products and insights.

How to remove “smart” quotes from a text file

October 11, 2010

If you’ve copied and pasted text from Microsoft Word, chances are the text contains so-called smart quotes. Some programs don’t handle these characters very well. You can turn them off in Word, but if you’re trying to remedy the problem after the fact, sed is your old friend. I’ll show you how to replace these curly quotes with the traditional straight quote.

Recall that you can do global find/replace by using sed.

sed 's/[”“]/"/g' File.txt

This won’t change the contents of File.txt, but you can save the results to a new file:

sed 's/[”“]/"/g' File.txt > WithoutSmartQuotes.txt
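
Word also inserts curly single quotes and apostrophes, and the same find/replace idea covers them. Here is a minimal sketch, not a definitive recipe: sample.txt and cleaned.txt are hypothetical file names, and the scratch directory is only there so nothing real gets touched. It uses one -e expression per character instead of a bracket expression, on the assumption that you may not know whether your locale is UTF-8 (a bracket expression can match individual bytes in a non-UTF-8 locale).

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample containing curly double and single quotes.
printf '%s\n' '“Hello,” she said. It’s a ‘test’.' > sample.txt

# One expression per curly character; each is matched as a literal
# sequence, so this does not rely on the locale being UTF-8.
sed -e 's/“/"/g' -e 's/”/"/g' \
    -e "s/‘/'/g" -e "s/’/'/g" \
    sample.txt > cleaned.txt

cat cleaned.txt   # prints: "Hello," she said. It's a 'test'.
```

The original sample.txt is left alone; only cleaned.txt contains the straight quotes.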

If you wish to save the files in place, overwriting the original contents, you would do

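sed -i.bk 's/[”“]/"/g' File.txt

This tells sed to make the change “in place”, while backing up the original file to File.txt.bk in case anything goes wrong. The backup suffix is attached directly to -i (no space) because that form is accepted by both GNU sed and the BSD sed that ships with macOS; GNU sed would read a separate ".bk" argument as the script itself.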
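
If something does go wrong, the backup is what lets you undo the edit. The sketch below runs the in-place edit on a throwaway file and then restores the original from the backup. The sample text and scratch directory are assumptions for illustration only.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample file with curly double quotes.
printf '%s\n' 'He wrote “draft” twice: “draft”.' > File.txt

# Edit in place; the original is kept as File.txt.bk.
sed -i.bk -e 's/“/"/g' -e 's/”/"/g' File.txt

echo "edited:   $(cat File.txt)"
echo "original: $(cat File.txt.bk)"

# Undo the change by moving the backup over the edited file.
mv File.txt.bk File.txt
```

After the final mv, File.txt is back to its original contents and the backup file is gone.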

To fix the smart quotes in all the text files in a directory, do the following:

for i in *.txt; do sed -i.bk 's/[”“]/"/g' "$i"; done

At the conclusion of the command, you will have double the number of text files in the directory, due to all the backup files. When you’ve concluded that the changes are correct (do a diff File.txt File.txt.bk to see the difference), you can delete all the backup files with rm *.bk.
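
Checking every file by hand gets tedious in a large directory, so the review step can be scripted too. This is only a sketch under assumptions: a.txt and b.txt are hypothetical sample files, and cmp is used as a quick "did anything change?" test before showing the diff.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample files: one with curly quotes, one without.
printf '%s\n' 'She said “hi”.' > a.txt
printf '%s\n' 'No curly quotes here.' > b.txt

# Fix every text file in place, keeping a .bk backup of each.
for f in *.txt; do
    sed -i.bk -e 's/“/"/g' -e 's/”/"/g' "$f"
done

# Compare each file with its backup and report what changed.
# (*.txt does not match the backups, which end in .bk.)
for f in *.txt; do
    if cmp -s "$f" "$f.bk"; then
        echo "unchanged: $f"
    else
        echo "changed:   $f"
        # diff exits non-zero when files differ; that is expected here.
        diff "$f.bk" "$f" || true
    fi
done
```

Once the reported changes look right, rm *.bk clears out the backups as described above.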