Tony Schneider

UTF-8 Good, Windows-1252 Bad

So I’m using Sidekiq with sidekiq-scheduler in production at work to handle background tasks. Well, the other night around 10pm, just as I was settling down to watch the last episode of Mad Men, all hell broke loose.

After digging through the logs to find out what happened, all I could find was this:

2012-05-07T01:59:10Z 24340 TID-fmbc0 ERROR: Manager#assign died
2012-05-07T01:59:10Z 24340 TID-fmbc0 ERROR: "\xC3" on US-ASCII
2012-05-07T01:59:10Z 24340 TID-fmbc0 ERROR: /home/web_apps/dream/shared/bundle/ruby/1.9.1/gems/json-1.6.6/lib/json/common.rb:148:in `encode'

None of the stack trace bubbled up to my app, so I knew I had no chance of rescuing the exception. That left me with one thing to do: try to figure out what the heck caused this. Fortunately, I was able to lean on a few more seasoned colleagues for advice.

After some discussion, we concluded that our app was being sent a Windows-1252 encoded character and was trying to parse it as UTF-8.
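
To make the failure mode concrete, here is a minimal sketch of how bytes and encoding labels can disagree in Ruby 1.9+. The sample strings are made up for illustration, and I can't confirm this is exactly what happened inside the Sidekiq manager; it only shows the general class of error.

```ruby
# Hypothetical sample data, not taken from the production logs.

# 1. A string whose bytes include 0xC3 but which Ruby believes is US-ASCII.
#    Transcoding it fails, because 0xC3 is not a valid US-ASCII byte.
mislabeled = "caf\xC3\xA9".force_encoding('US-ASCII')
conversion_error = nil
begin
  mislabeled.encode('UTF-8')
rescue Encoding::InvalidByteSequenceError => e
  conversion_error = e
end

# 2. A Windows-1252 byte (0xE9 is "e with acute accent" there) is not
#    valid UTF-8 on its own, so treating it as UTF-8 yields a broken string.
raw = "caf\xE9"
treated_as_utf8  = raw.dup.force_encoding('UTF-8')
decoded_properly = raw.dup.force_encoding('Windows-1252').encode('UTF-8')

puts conversion_error.class            # the transcoding failure from step 1
puts treated_as_utf8.valid_encoding?   # false
puts decoded_properly.valid_encoding?  # true
```

The point is that the same bytes can be fine or fatal depending on which encoding Ruby thinks they are in.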

So, to combat such things, we came up with this:

require 'iconv' # part of the standard library on Ruby 1.9

# This method exists because it has been shown that we cannot
# trust data coming from facebook
#
def clean_fb_str(str)
  # UTF-8 -> UTF-8 with //IGNORE drops any bytes that are not valid UTF-8
  ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
  if str == ic.iconv(str)
    # valid UTF-8 string should match its iconv version
    str
  else
    # UTF-8 conversion changed string, force windows-1252 into utf-8
    windows_ic = Iconv.new('UTF-8//IGNORE//TRANSLIT', 'WINDOWS-1252')
    windows_ic.iconv(str)
  end
end

It seems to do the job, as we haven’t had a recurrence of the issue yet.

NOTE: After deploying this on Ruby 1.9.3, I’ve noticed we are getting deprecation warnings for Iconv. I suppose we should be using String#encode instead.
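
For what it's worth, a String#encode version might look roughly like the sketch below. The method name clean_fb_str_encode is hypothetical, and this has not been run against the production data described above. It assumes that anything which is not valid UTF-8 is Windows-1252, which is the same guess the Iconv version makes.

```ruby
# Hypothetical Iconv-free variant of clean_fb_str (a sketch, not deployed code).
def clean_fb_str_encode(str)
  # Work on copies so the caller's string keeps its original encoding label.
  utf8 = str.dup.force_encoding('UTF-8')

  # If the bytes already form valid UTF-8, return them as-is.
  return utf8 if utf8.valid_encoding?

  # Otherwise assume the bytes are Windows-1252 and transcode to UTF-8,
  # dropping any byte that has no mapping (similar in spirit to //IGNORE).
  str.dup.force_encoding('Windows-1252')
     .encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
end

cleaned = clean_fb_str_encode("caf\xE9") # 0xE9 is a lone Windows-1252 byte
puts cleaned.encoding
puts cleaned.valid_encoding?
```

One difference from the Iconv version: there is no equivalent of //TRANSLIT here, so unmappable bytes are simply dropped rather than approximated.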


