back home

Thursday, February 4th, 2010

My time in Finland has passed far too quickly, but it was a great experience that I wouldn’t want to miss.

Besides getting to know heaps of students from half the world I also managed to learn a little bit of Finnish. I’m not sure if I will ever need

again, but who knows, after all almost one in thousand people worldwide speaks finnsih. [stunning number yeah] I got to know the Finns as a very nice people that may seem a little reserved at first, but when you managed to get to know them a little they are much more welcoming than some of the mediterranean folks. But of course the Finns also had some irritating customs. These included not really putting spice in their food, lighting hundreds of kilometres of streets through Nowhere and heating the pedestrian precinct. Maybe Finland could use some inspiration from southern europe for their cuisine ;)

I also had a few interesting courses in Finland that ranged from data compression to machine learning, it’s not unlikely that I’ll do one of my theses in on of these fields. But I have yet to decide about those…

And now a few more pictures from Finland:

shannon coding experiments

Friday, October 2nd, 2009

I just came back from today’s exercise in information theoretic modelling (so far I would call it redundancy and data compression, but that doesn’t sound anywhere as cool) with a question that had formed itself during the exercise.

We had the task to implement a Shannon-Fano encoding (a very simple data compression that uses variable length prefix codes) as homework. Basically the algorithm creates a binary tree so that the more frequent characters are higher up than the less frequent ones. This tree is then used to create binary prefix codes. The tree is constructed from a sorted table with the frequencies by subdividing it into 2 halfs that are as equal as possible.

This is the part where I didn’t pay too much attention and thought the equality was the most important goal when dividing. In fact the dividing step is quite simple as it is supposed to be a simple split at the right point. Instead I tried a brute force approach to find the best 1:1 sized  division. Later I came up with an approximation that is quite close to the brute force results and is much faster for bigger frequency tables. O(2^n) -> O(n)

In the exercise i found out how it was supposed to work and I started comparing the two approaches (brute force/approximation vs real shannon)

My understanding of the problem was that a very equal division means that the length of the codewords will be close to the length that is most suitable for the entropy – and thus would minimize average code length.

After trying it out I was proved wrong. I used a few pieces from the English wikipedia to test. And all of them turned out to be slightly longer using my encoding (difference between brute force and the approximation was negligible). On average Shannon was 3% better than my approach.

After thinking about it for a while I think I found the problem. It seems that my approach favours equal code length a little too much.

The following dataset illustrates the difference

char: C B A D E F
freq: 25% 22% 20% 15% 13% 5%

The two algorithms then give the following output
Shannon tree vs equal-split tree
As you can see the tree is more balanced for the equal-split but if you compare the differences in code length it is obvious that the second is worse for encoding.

char shannon equal-split len*probability (shannon) len*probability (equal-split) contribution to entropy
a 10 011 0.4 0.6 0.4644
b 01 10 0.44 0.44 0.4806
c 00 00 0.5 0.5 0.5000
d 110 111 0.45 0.45 0.4105
e 1110 110 0.52 0.39 0.3826
f 1111 010 0.2 0.15 0.2161
2.51 2.53 2.4542

The leftmost column represents the ideal ratio of this letter for the total encoding and the 2 columns right of this one show, how well the two algorithms work for that specific char. Most interesting is the last line where you can see that both algorithms are still a bit away from the lower bound imposed by the entropy. So the actual difference between the two algorithms is very slim. The effect of course varies with the dataset but I didn’t find any instances where the equal-split was closer to the entropy than shannon.
So far I wasn’t able to come up with a mathematical proof if shannon is always better, but maybe I’ll have the right idea one of these days…

You can also download the source from my experiments, but don’t expect too much as I have only just started with python ;)
soucecode (.py, 7.5kb)

Minkä maalainen sinä olet?

Tuesday, September 29th, 2009

*Which country are you from?

Wow time passes quick. It’s already one month since i arrived here in Helsinki. On the one hand it feels like yesterday as I get to see new stuff every day but on the other hand it’s like I’ve been around for ages ;)

Yesterday I went to Nuuksio Nat. park for a second time, but this time there was eight of us and one even had GPS on his mobile phone … Of course that doesnt mean that we were able to follow a straight path and once again, the last thing we found was the lake, but we found a lot of other interesting things on the way. Most notably we found a trampoline that led to a fair bit of fun in the forest (until we were chased away by the owner of the trampoline ;)

[paragraph only suitable for computer science students, others might wanna skip ;þ ] Although some of you might not believe that, I’m actually learning some things at uni here, too. On of them is Python. So far the language has impressed me with a few nice features such as self managing lists, operator overloading and polymorphy, but I think it also has some serious drawbacks. One of them is how iterators are done. Why cant python have something like hasNext() instead of that stupid exception that you have to catch if you do anything else than the standard for-each loop? Also i don’t like using “self.” at least once per line, feels like python had added object orientation by a little child that is determined to get the rectangle block through the triangle hole of his toy… And never try to collaborate with ppl  in python unless everyone uses the same kind of indentation. As of now I’d say python is nice for a quick tryout, but thats it!

Apart from computer science I’m still learning Finnish here. That language also has some curious features.

By omitting one little “a”, you can turn a meeting into a murder: tapaan means I meet, whereas tapan means I kill, dangerous language. Also we learned that one reason for Finns having a less favourable image abroad is that they don’t use Anteeksi, “excuse me/please” very much, however they use Kiitos, thank you a lot. If you know that, they appear quite a lot less rude :)

But this weekend we wont need Finnish so much. We found a quite cheap overnight cruise to Stockholm that we will be taking from sunantai (Sunday) evening to tiistai (Tuesday) morning. Quite cheap means that we will have 4m² for 3 ppl but I don’t think we will need our beds all that much. We’re not quite sure yet what we are going to do in Stockholm, but i don’t expect that to be a problem, when Erasmus students go somewhere, the plan is made when they get there!

That’s it for now, more will follow sometime after Stockholm

news from the north

Wednesday, September 9th, 2009

While looking at a calendar I realized that I’ve already been up here for more than one week and the thought crossed my mind that maybe I should write something new here ;)

As it turns out my room isnt that bad to reach after all because I also found a train running nearby. Also I wont have to go to uni all that often for the first half. The semester is divided in 2 sections here and in the first one I will only have classes Tuesday and Thursday :D These classes include python (programming), information theoretic modelling (computer science theory) and a beginners course in finnish (the language still seems crazy).

Together with other excahnge students we already made a short trip to the Island of Suomenlinna just south of Helsinki. There we walked through the remains of a huge castle built in the 18th century and had a pick-nick. Also we went to a few parties already, but sadly thats one of the things really expensive here. One pint of beer usually sets u back 5€ at least. Nevertheless i think there will be more than enough cheaper private parties with the other exchange students ;)

Another expensive thing (there are many of those) is buying groceries. The market that is closest to my student housing complex has quite a few items that cost twice as much as in Munich. But yesterday I have found a Lidl only one train stop away that has more reasonable prices (still expensive but acceptable)

One of the few things that are actually cheaper here is public transport. As long as you don’t have to go to Vantaa (north/northeast) or Espoo (west) you get one month for some 40€ or so. There is also a student discount but it only applies if you stay for more than one semester.

What is really silly is something about the Lyyra card. That card is something like a student card and it gets you many advantages… if u have one. The problem is that it takes almost a month to get it and you need to pay the card fee with bank transfer… very convenient for exchange students that have only just arrived…. Let’s hope that it arrives at the end of September….

One thing that happens to many Erasmus students is that you mostly get to meet other exchange students. To avoid that I went for a programme called ALICE where u get to meet Finnish students who want to improve their German. My partner will be a Maikki, a finnish student of international relations. As most Finnish students she seems to be in heaps of student organizations, clubs and other stuff, so It seems im going to get to know quite a few Finns.

As I dont have a working camera charger (working on that) right now, I dont have many pictures to show you, but if you know me on facebook you can always check out the “linked photos”, there are quite a few there.

first images from the north

Tuesday, September 1st, 2009

Yesterday I arrived here in Helsinki and I like it already (perhaps because uni hasn’t started yet *g*)

I’ll be living in a student housing complex in the north-east of the city not too far from the uni campus. I took a few pictures of this place for you to have a look:

Puhutteko te englantia?*

Saturday, August 29th, 2009
* Finnish for “Do you speak english?”

The time has come again, after 2 years in munich I became a little bored and decided to see something new. As some of you might know I’m going to spend the next winter term at Helsink University in Finland. Although my trip is starting next Monday, I cant exactly say that I’m fully packed, but who cares, still more than 24hrs to go!

I’ve already learned a tiny piece of that crazy language, but for anyone who thinks of that French is a very hard language…. think again. Finnish has cases in its grammar for counting or to describe relations to surfaces or volumes. Also there seems to be no limit for the length of a word as you can pick from a heap of suffixes to spice it up. The only easy part is pronunciation. Coming from Germany it seems you can say most words just the way it’s written.

There’ll be more to read here about my 4 month study trip soon (well if i can spare some time ;)  )

stay tuned…