Saturday, May 22, 2010

Project Euler | Problem 3

Ah, an interesting problem!
Problem 3 (clicky) asks you to find the largest prime factor of a pretty big number. Now, you could (like me) try to figure out how to do prime factorization efficiently; Wikipedia has some really interesting articles on the subject (I might write a little bit about them later). However, in real prime factorization problems (like in RSA), the factors are hard to find because they are close together and really, really large; compared to those, the number in this problem is easy.

So I decided to try the brute-force approach. First I tried to do it top-down, i.e., count down from the square root of x to 1 and return the first factor found, then find a factor of that, and so on. There are two flaws in that approach: first, the factor found this way is the biggest factor below the square root, which may not be prime, and its decomposition does not always contain the biggest prime factor. Second, the gap between the square root and the biggest prime factor is pretty large (in this particular case, the square root is more than 100 times bigger than the biggest prime factor).
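For reference, here's roughly what that top-down idea looks like in Python (a minimal sketch of the abandoned approach, not the code I ended up using):

import math

def find_factor_topdown(x):
    # count down from sqrt(x); the first divisor we hit is the
    # largest factor <= sqrt(x), which may well be composite
    i = int(math.sqrt(x))
    while i > 1:
        if x % i == 0:
            return i
        i -= 1
    return x # no divisor found, so x is prime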
In short, brute force works better by counting up from 1. Here's some Python that does just that:
def find_factor(x):
    # returns the smallest factor of x (always prime),
    # or x itself if x is prime
    if x % 2 == 0:
        return 2
    i = 3 # we can start at 3, since 2 is the only even prime
    while i < (x / 2):
        if x % i == 0:
            return i
        i += 2 # even factors are caught by the initial if
    return x

def decompose(x):
    res = []
    while x >= 2:
        f = find_factor(x)
        res.append(f)
        x = x / f
    return res

print decompose(600851475143)

decompose finds the whole decomposition, while find_factor finds the smallest factor (greater than 1) of a number, which is always a prime factor. This is easy to prove:
Assume x is the smallest factor of n (with x > 1), but x is not prime. Then x has some prime decomposition containing a prime p, with p != x. That means p < x, and because p divides x and x divides n, p divides n; thus p is a smaller factor of n than x, which is a contradiction.
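If you prefer an empirical check on top of the proof, here's a quick sanity test (just a sketch, reusing find_factor from above together with a naive is_prime):

def is_prime(n):
    # naive trial division
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

for n in range(2, 10000):
    assert is_prime(find_factor(n))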
Finding this factor is pretty fast:
namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py
[71, 839, 1471, 6857L]
Exit [0]
0.02 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 56KB

However, if we try to find the decomposition of 600851475133, it takes a lot longer:
namnatulco@namnatulco-desktop:~/codestuff$ memtime python test.py
[7, 13, 47, 140484329L]
Exit [0]
47.08 user, 0.26 system, 65.80 elapsed -- Max VSize = 7212KB, Max RSS = 3364KB
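Most of that time goes into the very last step: once x has been reduced to 140484329, which is prime, find_factor grinds all the way up to x/2 before giving up. A standard improvement (which the code above doesn't use) is to stop at the square root instead, since a composite number always has a factor no bigger than its square root. A sketch:

def find_factor_sqrt(x):
    if x % 2 == 0:
        return 2
    i = 3
    while i * i <= x: # stop at sqrt(x) instead of x/2
        if x % i == 0:
            return i
        i += 2
    return x # no factor <= sqrt(x), so x is prime

For 140484329 that's about 6000 loop iterations instead of about 35 million.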
Moving on to the C code:
#include <stdio.h> //for printf

long long find_factor(long long x){
    if (x % 2 == 0){
        return 2;
    }
    long long i = 3; //we can start at 3, since 2 is the only even prime
    while (i < (x / 2)){
        if ((x % i) == 0){
            return i;
        }
        //even factors are caught by the initial if
        i += 2;
    }
    return x;
}
int main(){
    long long target = 600851475143LL;
    long long f = 0L;
    int count = 0;
    printf("Decomposing: %lld\n", target);
    while (target >= 2){
        f = find_factor(target);
        printf("Factor%d:%lld\n", count, f);
        target = target / f;
        count += 1;
    }
    return 0;
}

This is, as expected, really fast:
namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out
Decomposing: 600851475143
Factor0:71
Factor1:839
Factor2:1471
Factor3:6857
Exit [0]
0.00 user, 0.00 system, 0.10 elapsed -- Max VSize = 1616KB, Max RSS = 60KB
And the other number:
namnatulco@namnatulco-desktop:~/codestuff$ memtime ./a.out
Decomposing: 600851475133
Factor0:7
Factor1:13
Factor2:47
Factor3:140484329
Exit [0]
1.26 user, 0.01 system, 1.70 elapsed -- Max VSize = 1616KB, Max RSS = 356KB
So the C code is about 37 times faster than the Python version (comparing user time).

I've also tried to do this in Amanda, but it doesn't support long longs (I think it uses the C type long int under the hood).
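For what it's worth, assuming Amanda's long int is 32 bits, the target simply doesn't fit, which is easy to check in Python:

print 600851475143 > 2**31 - 1 # True: too big for a signed 32-bit long
print 600851475143 <= 2**63 - 1 # True: fits in a signed 64-bit long long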
