Using Python, I would like to implement a function that takes a natural number n as input and outputs a list of natural numbers [y1, y2, y3, ...] such that n + y1*y1 and n + y2*y2 and n + y3*y3 and so forth is again a square.
What I tried so far is to obtain one y-value using the following function:
def find_square(n:int) -> tuple[int, int]:
if n%2 == 1:
y = (n-1)//2
x = n+y*y
return (y,x)
return None
It works fine, eg. find_square(13689) gives me a correct solution y=6844. It would be great to have an algorithm that yields all possible y-values such as y=44 or y=156.
>Solution :
Simplest slow approach is of course for given N just to iterate all possible Y and check if N + Y^2 is square.
But there is a much faster approach using integer Factorization technique:
Lets notice that to solve equation N + Y^2 = X^2, that is to find all integer pairs (X, Y) for given fixed integer N, we can rewrite this equation to N = X^2 - Y^2 = (X + Y) * (X - Y) which follows from famous school formula of difference of squares.
Now lets rename two factors as A, B i.e. N = (X + Y) * (X - Y) = A * B, which means that X = (A + B) / 2 and Y = (A - B) / 2.
Notice that A and B should be of same odditiy, either both odd or both even, otherwise in last formulas above we can’t have whole division by 2.
We will factorize N into all possible pairs of two factors (A, B) of same oddity. For fast factorization in code below I used simple to implement but yet quite fast algorithm Pollard Rho, also two extra algorithms were needed as a helper to Pollard Rho, one is Fermat Primality Test (which allows fast checking if number is probably prime) and second is Trial Division Factorization (which helps Pollard Rho to factor out small factors, which could cause Pollard Rho to fail).
Pollard Rho for composite number has time complexity O(N^(1/4)) which is very fast even for 64-bit numbers. Any faster factorization algorithm can be chosen if needed a bigger space to be searched. My fast algorithm time is dominated by speed of factorization, remaining part of algorithm is blazingly fast, just few iterations of loop with simple formulas.
If your N is a square itself (hence we know its root easily), then Pollard Rho can factor N even much faster, within O(N^(1/8)) time. Even for 128-bit numbers it means very small time, 2^16 operations, and I hope you’re solving your task for less than 128 bit numbers.
If you want to process a range of possible N values then fastest way to factorize them is to use techniques similar to Sieve of Erathosthenes, using set of prime numbers, it allows to compute all factors for all N numbers within some range. Using Sieve of Erathosthenes for the case of range of Ns is much faster than factorizing each N with Pollard Rho.
After factoring N into pairs (A, B) we compute (X, Y) based on (A, B) by formulas above. And output resulting Y as a solution of fast algorithm.
Following code as an example is implemented in pure Python. Of course one can use Numba to speed it up, Numba usually gives 30-200 times speedup, for Python it achieves same speed as optimized C++. But I thought that main thing here is to implement fast algorithm, Numba optimizations can be done easily afterwards.
I added time measurement into following code. Although it is pure Python still my fast algorithm achieves 8500x times speedup compared to regular brute force approach for limit of 1 000 000.
You can change limit variable to tweak amount of searched space, or num_tests variable to tweak amount of different tests.
Following code implementes both solutions – fast solution find_fast() described above plus very tiny brute force solution find_slow() which is very slow as it scans all possible candidates. This slow solution is only used to compare correctness in tests and compare speedup.
Code below uses nothing except few standard Python library modules, no external modules were used.
def find_slow(N):
import math
def is_square(x):
root = int(math.sqrt(float(x)) + 0.5)
return root * root == x, root
l = []
for y in range(N):
if is_square(N + y ** 2)[0]:
l.append(y)
return l
def find_fast(N):
import itertools, functools
Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
fs = factor(N)
mfs = {}
for e in fs:
mfs[e] = mfs.get(e, 0) + 1
fs = sorted(mfs.items())
del mfs
Ys = set()
for take_a in itertools.product(*[
(range(v + 1) if k != 2 else range(1, v)) for k, v in fs]):
A = Prod([p ** t for (p, _), t in zip(fs, take_a)])
B = N // A
assert A * B == N, (N, A, B, take_a)
if A < B:
continue
X = (A + B) // 2
Y = (A - B) // 2
assert N + Y ** 2 == X ** 2, (N, A, B, X, Y)
Ys.add(Y)
return sorted(Ys)
def trial_div_factor(n, limit = None):
# https://en.wikipedia.org/wiki/Trial_division
fs = []
while n & 1 == 0:
fs.append(2)
n >>= 1
all_checked = False
for d in range(3, (limit or n) + 1, 2):
if d * d > n:
all_checked = True
break
while True:
q, r = divmod(n, d)
if r != 0:
break
fs.append(d)
n = q
if n > 1 and all_checked:
fs.append(n)
n = 1
return fs, n
def fermat_prp(n, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
import random
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def pollard_rho_factor(n):
# https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
import math, random
fs, n = trial_div_factor(n, 1 << 7)
if n <= 1:
return fs
if fermat_prp(n):
return sorted(fs + [n])
for itry in range(8):
failed = False
x = random.randint(2, n - 2)
for cycle in range(1, 1 << 60):
y = x
for i in range(1 << cycle):
x = (x * x + 1) % n
d = math.gcd(x - y, n)
if d == 1:
continue
if d == n:
failed = True
break
return sorted(fs + pollard_rho_factor(d) + pollard_rho_factor(n // d))
if failed:
break
assert False, f'Pollard Rho failed! n = {n}'
def factor(N):
import functools
Prod = lambda it: functools.reduce(lambda a, b: a * b, it, 1)
fs = pollard_rho_factor(N)
assert N == Prod(fs), (N, fs)
return sorted(fs)
def test():
import random, time
limit = 1 << 20
num_tests = 20
t0, t1 = 0, 0
for i in range(num_tests):
if (round(i / num_tests * 1000)) % 100 == 0 or i + 1 >= num_tests:
print(f'test {i}, ', end = '', flush = True)
N = random.randrange(limit)
tb = time.time()
r0 = find_slow(N)
t0 += time.time() - tb
tb = time.time()
r1 = find_fast(N)
t1 += time.time() - tb
assert r0 == r1, (N, r0, r1, t0, t1)
print(f'\nTime slow {t0:.05f} sec, fast {t1:.05f} sec, speedup {round(t0 / max(1e-6, t1))} times')
if __name__ == '__main__':
test()
Output:
test 0, test 2, test 4, test 6, test 8, test 10, test 12, test 14, test 16, test 18, test 19,
Time slow 26.28198 sec, fast 0.00301 sec, speedup 8732 times