Music and internet, what else? Now with OpenData!: May 2010

Sunday, May 30, 2010

Master of Orion and other obsessions

So I yet again took one of my all-time favorite games out: Master of Orion, a turn based space exploration game by MicroProse. A screenshot:

It runs perfectly in dosbox, though I haven't gotten the sound working yet-- but that's a linux and lazyness issue (read: it worked fine on windows last time I tried, but this game doesn't really need sound anyway). A quick google showed me you can actually pick this game up, together with Master of Orion 2, for a mere 6 dollars (.). Or you could go to your favorite abandonia website and grab it there.

More nostalgia today:
Entombed, the first (metal) band I ever saw live. In a forgotten few sectors on my harddrive, there it was: Wolverine Blues:

A pretty damn awesome piece of rollin' death metal. For all those people who love AC/DC and secretly want death metal; for all those metal fans who secretly enjoy rock, I present to you a fine, fine album.

Sunday, May 23, 2010

Project Euler | Problem 6 (&7!)

Problem 6 is one of those where functional programming just makes things WAY too easy. Look at this:

squares x = [y^2| y<-[1..x]]
answer = (sum ([1..100]))^2 - sum(squares 100)

If you know how list comprehension (the [result | var<-generator;condition] part) works, and you know that [x..y] is all the natural numbers between x and y, including x and y, you can understand the above.
List comprehension works like this:
The generator generates one value. Each occurrence of var is replaced by the value, the conditions, of any, are evaluated, and if all of them apply, result is evaluated and "returned". Since functional languages like these are lazy, evaluation of a list comprehension only occurs when a value from the list is actually needed. Thus, one can define all prime numbers as follows:

allPrimes = [y |y<-[1..];isPrime( y )]

And use it as a list containing all possible primes (assuming the isPrime function exists). Just be sure not to call reverse or something on an infinite list like this. If you try to find the xth prime, all you have to do is allPrimes!x and the list will be evaluated up to the xth result.

[edit]
Holy crap, I almost answered problem 7 without even reading the question. Given allPrimes from above and isPrime from problem 3, we can do:

allPrimes ! 10001

And we're given the prime we want.
[/edit]

By the way, this album is fucking badass:

The Cosmic Hearse, a blog I've followed for nearly a year now, posted this and I've been hooked ever since. Its sound is fairly repetitive, but Kool Keith is just really fucking solid and there are some great samples and beats here. One of the strongest rap albums out there, possibly.

Project Euler | Problem 5

Well, it's about time for the next problem! The solution isn't hard to find with pen and paper if you realize that what is asked is the least common multiple of the numbers 1 to 20. Since 20 is quite a small number, we can easily find this through the prime decompositions of all these numbers. Here's how it works for 1 to 10 (which, as given, gives 2520 as a result):

2=2
3=3
4=2*2
5=5
6=2*3
7=7
8=2*2*2
9=3*3
10=2*5

Note that any multiple n needs to be dividable by each of the above. In other words, the multiple n needs to contain the prime decomposition of each of the above numbers. Next, observe that 2 occurs at most 3 times, 3 at most 2 times, and 5 and 7 both at most once in each prime decomposition. Thus, the number we're looking for is 2*2*2*3*3*5*7=5*7*8*9=2520.
Analogously;

11=11
12=2*2*3
13=13
14=2*7
15=3*5
16=2*2*2*2
17=17
18=2*3*3
19=19
20=2*2*5

So the number is 2520*2*11*13*17*19=232792560. Surprisingly big number, isn't it?

The question remains, is this the most efficient method? Probably not, since determining the prime decomposition in the way I did in a previous post less than p/2*(n/2) modular operations and (p-1) divisions, where p is the amount of (non-distinct) primes in the decomposition and n is the value to be decomposed. The actual amount of modular operations is the multiplication of (n/p_i/2) for p_i=1,p_1,p_2..p_y, where p_1..p_y are the primes in the decomposition of n. I'm pretty sure this can be optimized to p/2*sqrt( n) by changing the condition in the while loop from x/2 to sqrt( x). Sounds efficient? I'd say so. You really don't want to use this to reverse an RSA public key, though: we know that in the above, p=2, but also that n=2^(2048) or n=2^(3072), for a reasonably secure key. that still means we need sqrt(2^(2048)) operations with this naive method, which is insanely huge. Nevertheless, it might be fun to try this method, or preferably, the more efficient method called Number Field Sieve, which is mentioned here.
To be continued?

Project Euler | Problem 4

This problem asks you to find the biggest 6-digit palindrome that is the product of two 3-digit numbers. I found this one quite boring, so I'll just put the code (in the functional language Amanda) I used to solve it here and not think about it too much.

condition x = (~.empty) [y | y <- [100..999] ; divides y x ; (x/y)<1000 ]
isPalindrome len val = (take len (itoa val))=((reverse.drop len) (itoa val))
pali6 = reverse [x| x <- [100000..999999]; isPalindrome 3 x]
answer = hd [x | x <- pali6 ; condition x]
divides y x = (x%y = 0)

Those familiar with haskell should probably be able to understand this. It takes about half a minute to evaluate, which is probably because I first calculate all the palindromes and then check them. Also, it should be possible to do the whole thing in one line, but it gets kinda hard to read that way.

Saturday, May 22, 2010

Project Euler | Problem 3

Ah, an interesting problem!
Problem 3 (clicky) asks you to find the largest prime factor of a pretty big number. Now, you could (like me) try to figure out how to do prime factorization efficiently. wikipedia has some really interesting articles on this subject (I might write a little bit about them later). However, compared to real prime decomposition problems (like in RSA), the primes are hard to find because the factors are close together and are really, really large, compared to the number in the problem.

So I decided to try the brute-force approach. First I tried to do it top-down, ie. count from the square root of x to 1 and return the first factor, find the factor of that and so on. There are two flaws in that approach: first, the decomposition of the biggest (possibly non-prime) factor does not always provide the biggest prime factor. Second, the difference between the square root and the biggest prime factor is pretty large (in this particular case, on the order of 1000 times bigger than the biggest prime factor).
In short, brute-force works better by counting from 1. Here's some python that does just that:

def find_factor(x):
 if (x%2==0):
  return 2
 i=3
 while i<(x/2):
  if (x%i==0):
   return i
  i+=2
 return x
def decompose(x):
 res = []
 while(x>=2):
  f = find_factor(x)
  res.append(f)
  x=x/f
 return res
print decompose(600851475143)

decompose finds the whole decomposition, while find_factor finds the smallest factor of a number, which is always a prime factor. This is easy to prove:
Assume x is the smallest factor of n, but x is not prime. That means x has some prime decomposition that contains p, p!=x. But that means p<x, and because x divides n, p divides n, and thus p is a smaller factor of n than x, which is a contradiction.
Finding this factor is pretty fast:

namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py
[71, 839, 1471, 6857L]
Exit [0]
0.02 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 56KB

However, if we try to find the decomposition of 600851475133, it takes a lot longer:

namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py
[7, 13, 47, 140484329L]
Exit [0]
47.08 user, 0.26 system, 65.80 elapsed -- Max VSize = 7212KB, Max RSS = 3364KB

Moving on to the C code:

#include  //for printf
long long find_factor(long long x){
 if (x%2==0){
  return 2;
 }
 long long i=3; //we can start at 3, since 2 is the only even prime
 while (i<(x/2)){
  if((x%i)==0){
   return i;
  }
  //even factors are caught by the initial if
  i+=2;
 }
 return x;
}
int main(){
 long long target = 600851475143LL;
 long long f = 0L;
 int count = 0;
 printf("Decomposing: %lld\n",target);
 while (target>=2){
  f = find_factor(target);
  printf("Factor%d:%lld\n",count,f);
  target = (target/f);
  count+=1;
 }
 return 0;
}

This is, as expected, really fast:

namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out
 Decomposing: 600851475143
Factor0:71
Factor1:839
Factor2:1471
Factor3:6857
Exit [0]
0.00 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 60KB

And the other number:

namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out
 Decomposing: 600851475133
Factor0:7
Factor1:13
Factor2:47
Factor3:140484329
Exit [0]
1.26 user, 0.01 system, 1.70 elapsed -- Max VSize = 1616KB, Max RSS = 356KB

So the C code is about 35 times faster than python.

I've also tried to do this in Amanda, but it doesn't support long longs (I think it uses the C type long int underwater).

Thursday, May 20, 2010

Project Euler | Problem 2

Okay, so here's something about PE, problem 2 (clicky). Again, the problem is not very hard, it's just a matter of toying with writing efficient code (which is a pretty new idea to me, usually I only do this for more complex algorithms, rarely during actual coding). So, it's C time! Also, there will be some python, since Fibonacci is a nice toy example for generators.
Usually, I check for evenness with %2. However, C allows you to do bitwise operators while actually making sense (I've been told this is more of a pain in Java, but I haven't tried it in that language myself). So you can do this:

#include <stdio.h> //for printf
int main(){
 long long sum  = 0;
 int curr = 2;
 int prev = 1;
 int tmp  = 0;
 unsigned long long limit = 4000000000;
 printf("Up to: %lld\n",limit);
 //loop through all fibonaci numbers, starting with 1 and 2
 while (curr < limit){
  if(!(curr&1)){ //if even, add curr
   sum+=curr;
  }
  tmp=curr;
  curr+=prev;
  prev=tmp;
 }
 printf("Result:%lld\n",sum);
 return 0;
}

Note the brackets, which are needed since ! takes precedence.
This is pretty fast:

namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out
4613732
Exit [0]
0.00 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 56KB

Now, on to python. This shows why generators are cool: they're like functions, except they yield a value, instead of return, and when they yield, the context is saved and on the next call, they continue. More on generators: in the python docs and on wikipedia.

def fib(first, second):
 prev=first
 curr=second
 yield prev
 yield curr
 while True:
  tmp=curr
  curr+=prev
  prev=tmp
  yield (prev+curr)

STOP = 4000000

result = 0
for i in fib(1,2):
 if i%2==0:
  result+=i
 if i > STOP:
  break
print result

By the way, what wikipedia says about generator expressions ("Python has a syntax modelled on that of list comprehensions") makes sense if you're into functional programming (or mathematical set notation). So I figured I could do some Amanda here too, just for fun:

fib a b = a:(fib b (a+b))
f = fib 1 2

efib = [x | x<- f ; (x%2=0)]

res (x:y:xs) = x +y+ res xs, if y<4000000
             = x         , otherwise

main = res efib

Note that Amanda has lazy evaluation, ie. it only evaluates something when it needs that value. main= (...) simply executes the command when the code is loaded. fib is a function that generates all the fibonacci numbers, while efib is a function that picks out all the even ones with list comprehension. Fun fact: if we run efib by itself, it will generate an infinite list. If we interrupt the program and scroll back, we can see that: [2, 8, 34, 144, 610, 2584, 10946, 46368, 196418, 832040, 3524578, 14930352, 63245986, 267914296, 1134903170, 512559680, are all the even integers in the fibonacci series below whatever maximum amanda defines for numbers (beyond this, the number just wraps to negative).

Project Euler | Problem 1

Okay, so I'm going to try and learn some more languages, and improve my knowledge of those I already know. I'm going to keep a log of the code I write on this blog. I'll try and write the solution in as many languages as possible. Plus, maybe the fact that everyone can read my ugly code means I'll try and write things in a nicer way. This post-series is mostly just personal; I don't expect people to read this.

Anyway, lets get started with Problem 1. Of course, the basic solution is simple: walk over the numbers between 0 and 1000, check whether 3 or 5 divides it, add them to the sum. I figured I could play with some efficiency, but it turned out (clicky) that there is simply an x86 operator that does the modular operation, so there's not a whole lot to speed up here. Here's what I did in C:

#include 
int main(){
 int i = 0;
 int sum=0;
 while (i<1000){
  if ((i%3==0) | (i%5==0)){
   sum+=i;
  }
  i++;
 }
 printf("%d\n",sum);
}

I'm kind of bothered by the fact that I need the include. But anyway, you can also do this without a division if you really want it. it's easiest to demonstrate with python:

rresult=0
for i in range(999,0,-3):
 result+=i

for i in range(995,0,-5):
 result+=i

for i in range(15,1000,15):
 result-=i
print result

Of course, these numbers are pretty boring. I tried to scale things up a bit, so I modified the C code to use longs for the sum and had both go up to 1 000 000. Using memtime (a little tool to record time and used memory for whatever you feed it), I recorded the time each took. As expected, python is way slower:

namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out 
233333166668
Exit [0]
0.01 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 56KB
namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py 
233333166668
Exit [0]
0.27 user, 0.00 system, 0.40 elapsed -- Max VSize = 12472KB, Max RSS = 8544KB

However, if I change the python code to match what the C code does, this happens:

namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py 
233333166668
Exit [0]
0.74 user, 0.00 system, 0.80 elapsed -- Max VSize = 7212KB, Max RSS = 3312KB

Surprising? Perhaps at first, yes. But if you know a little bit about python, you'll know it has a lot of C in the background. Looks like even though the range() function takes a large amount of memory, it's way faster than manual computation.

So that's it. Enough toying with code; go listen to this magnificent post-hardcore album:

There's a sample track (Vulgar Theories) for listening on the dischord website (take note: flash).

Tuesday, May 18, 2010

Python!

So someone asked me a little while ago how to get some set of files to be sorted on creation date and then have a number in the filename to indicate the sorting. I figured it'd be easy with bash, but being the bash newbie I am, I switched to do some python. Using some quick googling, I wrote the code down there.
It does things pretty simple: it uses os.stat to query the ctime, puts it in a tuple together with the filename, adds it all to a list, sorts that list and then copies the files to their new names.

import os, glob, sys, shutil
from stat import ST_CTIME
path = '.'

lijstje = []
for f in glob.glob( os.path.join(path, '*') ):
 lijstje.append((os.stat(f)[ST_CTIME],f))

counter = 0
lijstje = sorted(lijstje)
for x in lijstje:
 date, name = x
 if counter <10:
  dest = "00%d%s"%(counter,name[2:])
 elif counter <100:
  dest = "0%d%s"%(counter,name[2:])
 else:
  dest = "%d%s"%(counter,name[2:])
 #print dest
 shutil.copy(name, dest)
 counter +=1

Here's what it does:


namnatulco@namnatulco-desktop:~/test/subdir$ ls -Al
total 4
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 a
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 b
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 f
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 hello_thar
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 test
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:45 testfile
-rw-r--r-- 1 namnatulco namnatulco 444 2010-05-18 21:45 test.py
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 x

I like to see what happens, so I commented the print line in the code out. Here's what it outputs:

namnatulco@namnatulco-desktop:~/test/subdir$ python test.py
000a
001x
002b
003f
004hello_thar
005test
006testfile
007test.py

And the resulting ls -Al:

namnatulco@namnatulco-desktop:~/test/subdir$ ls -Al
total 8
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 000a
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 001x
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 002b
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 003f
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 004hello_thar
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 005test
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:46 006testfile
-rw-r--r-- 1 namnatulco namnatulco 444 2010-05-18 21:46 007test.py
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 a
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 b
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 f
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 hello_thar
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 test
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:45 testfile
-rw-r--r-- 1 namnatulco namnatulco 444 2010-05-18 21:45 test.py
-rw-r--r-- 1 namnatulco namnatulco   0 2010-05-18 21:44 x

edit: yes, I know this is probably a lot easier in just plain bash, but I just like to play around in python. Besides, this'll probably work on windows :)

Music and internet, what else? Now with OpenData!