Tuesday, 19 February 2013

Two changes at the same time in Python

I needed to remove the endings of a few thousand variable names in fifteen different pandas DataFrames. The variable names often included the year, which made merging difficult because the same variable had a different name in each year. No big problem:
i=0
for db in dflist:
    db.columns = [varName.replace(yearEndList[i], "") for varName in db.columns]
    i = i + 1
I also wanted to change the names to lower case. Again, no big problem:
db.columns = [varName.lower() for varName in db.columns]
But I kept wondering whether it could be done more elegantly: both replace and lower at the same time, not sequentially, in an elegant and fast way. A for loop could do this; for instance, something like this pseudocode seems natural:
for varName in db.columns:
    replace(yearEndList[i], "")
    lower()
It is implicit that we are doing things with the variable in the loop.

Perhaps there is no similarly readable way to make multiple changes in a list comprehension. Yes, it can be done, but not very elegantly, or can it?
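One possible answer, as a minimal sketch: Python string methods return new strings, so replace and lower can be chained inside a single comprehension. The helper name clean_names and the sample column names are made up for illustration; plain lists stand in for db.columns and yearEndList.

```python
def clean_names(names, suffix):
    """Strip `suffix` from each name and lower-case it in one pass."""
    # str.replace returns a new string, so .lower() can be chained onto it.
    return [name.replace(suffix, "").lower() for name in names]

# Hypothetical column names and year endings, standing in for
# db.columns and yearEndList.
columns_by_year = [["Income2005", "Age2005"], ["Income2006", "Age2006"]]
year_endings = ["2005", "2006"]

# zip pairs each list of names with its year ending, so no counter is needed.
cleaned = [clean_names(cols, ending)
           for cols, ending in zip(columns_by_year, year_endings)]
print(cleaned)  # [['income', 'age'], ['income', 'age']]
```

With real DataFrames, the same idea would presumably be db.columns = clean_names(db.columns, ending) inside a loop over zip(dflist, yearEndList). Note that replace removes the year wherever it appears in a name, not only at the end.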



Wednesday, 23 January 2013

Argh! BOM, UTF-8, and solution

Potentially useful rant: If you ever have a problem importing and analyzing what you believe is a standard .csv file (for example, using Python and pandas' read_csv), you may want to know that sometimes the .csv file starts with a hidden marker, the BOM (byte order mark), which signals the encoding used, such as UTF-8. After wasting too much time discovering and dealing with this, I found a quick solution: Open the .csv file in Notepad++, go to Encoding and select "Encode in UTF-8 without BOM." Save the file again and the problem is gone.
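The problem can also be handled from Python without re-saving the file. The sketch below uses only the standard library: it writes a small made-up CSV with a BOM to a temporary file, then reads the header twice. The file name and the helper read_header are assumptions for illustration.

```python
import csv
import os
import tempfile

# Write a small CSV that starts with a UTF-8 BOM (the "utf-8-sig" codec
# adds it when writing).
path = os.path.join(tempfile.mkdtemp(), "with_bom.csv")
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("id,name\n1,Ada\n")

def read_header(path, encoding):
    """Return the first row of the CSV, decoded with the given encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return next(csv.reader(f))

# Plain "utf-8" keeps the BOM as an invisible character glued to the
# first column name.
print(read_header(path, "utf-8"))      # ['\ufeffid', 'name']
# "utf-8-sig" strips the BOM if one is present.
print(read_header(path, "utf-8-sig"))  # ['id', 'name']
```

pandas' read_csv accepts an encoding argument as well, so pd.read_csv(path, encoding="utf-8-sig") should have the same effect, although I have not tested it against the original files.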