Plotting Collatz conjecture values - Python

Question

I am writing a program to display the behavior or certain seeds when you apply the Collatz conjecture algorithm to them. I would like to know what I should change and also if there is a way to see which seed produced a certain plot. I am also having some problems with the color.

import matplotlib.pyplot as plt import numpy as np def collatz(seed): """Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later""" y = [seed] while seed != 1: if seed % 2 == 0: seed = seed/2 else: seed = (3*seed) + 1 y.append(seed) return y colors = plt.cm.bwr(np.linspace(0, 1, 1500)) for i in range(1, 1500): #collecting data by applying the algorithm to a certain set of seeds Y = collatz(i) plt.plot(Y, color = colors[i]) plt.title("Collatz conjecture") plt.xlabel("Number of iterations until stuck") plt.ylabel("Values reached") plt.show()

I have rolled back Rev 4 → 3. Please see What should I do when someone answers my question?. — Sᴀᴍ Onᴇᴌᴀ, CommentedMay 2, 2024 at 22:08

tdy · Accepted Answer · 2024-04-30 14:50:53Z

Note how you're manually repeating the number of seeds (1500):

colors = plt.cm.bwr(np.linspace(0, 1, 1500)) for i in range(1, 1500): ...

That's always a good sign it should be a variable.

One option is to put the plotting code into a function parametrized by n_seeds (and then add 1 to make the naming accurate):

def plot_collatz(n_seeds=1500): # ------------ """Plot Collatz sequences for a given number of seeds""" colors = plt.cm.bwr(np.linspace(0, 1, n_seeds+1)) # --------- for seed in range(1, n_seeds+1): # --------- plt.plot(collatz(seed), color=colors[seed]) plt.title("Collatz conjecture") plt.xlabel("Number of iterations until stuck") plt.ylabel("Values reached") plt.show() plot_collatz(10)

Didn't you fall into your own trap by having n_seeds+1 twice? Why not make this another variable? :) — infinitezero, CommentedMay 1, 2024 at 6:53
Well that depends what the "trap" is. For a beginner review, I think the main takeaway can just be that if we want to change the number of seeds, now we only have to do it once. That takeaway message isn't affected by having multiple n_seeds+1 (and if we were to address the latter, I wouldn't recommend making a throwaway variable for n_seeds+1). — tdy, CommentedMay 1, 2024 at 11:14
@nocomment This figure is the output of plot_collatz(10) (preceding code block), so it doesn't contain seed 1161. If you're curious about the higher seeds, the figure gets quite busy, but here's the version with 1500 sequences. — tdy, CommentedMay 2, 2024 at 1:52
@nocomment as the answer above says, the main idea behind this plot is to show how certain seeds behave if you apply the algorithm. Yeah, it isn't that great to see which seed produced a certain plot but I think it makes a good job at displaying how chaotic the sequence is. Maybe paired with a scatter plot it would make more sense — Lorenzo, CommentedMay 2, 2024 at 4:54

AJNeufeld · Accepted Answer · 2024-05-01 22:39:22Z

Integer arithmetic

seed = seed/2 takes the value of seed (initially an integer) and divides it by 2, and stores the result as a float back in seed.

>>> 10 / 2 5.0

If the collatz sequence value ever exceeds \$2^{53}\$, seed % 2 == 0 will always be true, due to the limited float precision.

>>> 2 ** 53 + 1.0 # You'd expect this to be odd, but it's not! 9007199254740992.0

Instead, you should be using integer division: seed = seed // 2. Python's integers have variable length representations, so can easily hold exceedingly large integer values precisely.

You could also use an Augmented Assignment operator here (seed //= 2). It saves typing a variable name an additional time, and -- when the left-hand-side is more complex than a simple variable, such as self.seed or a[i] -- avoids evaluating the term a second time, which can be a speed improvement.

Thanks for the suggestion. So I should use this "contracted form" just for complex cases, right? Also, I don't understand how is it possible that 2^53 causes error: there is only one float digit (0) — Lorenzo, CommentedMay 1, 2024 at 20:00
@Lorenzo I'd probably use the //= form here as well. Less to type/maintain and it's how people usually do it. I've avoided it with ints only when hardcore micro-optimizing. — no comment, CommentedMay 1, 2024 at 20:04
Okay so it makes the code more readable and the double // makes so that i am not storing useless float digits. Does this make the program faster? — Lorenzo, CommentedMay 1, 2024 at 20:08
@Lorenzo Actually the // instead of / might make it slower. Because ints tend to be slower than floats, due to their arbitrary size that needs to be handled. You could measure to find out. But... you really should use ints. Otherwise you risk getting wrong results! — no comment, CommentedMay 1, 2024 at 20:14
@AJNeufeld Btw here's my benchmark for the 27 vs 29 ns that I mentioned. — no comment, CommentedMay 1, 2024 at 22:03

J_H · Accepted Answer · 2024-04-30 20:28:47Z

identifiers

The literature tends to speak of "values" generated by the \$3n+1\$ process. A "seed" is used to start the process, and we see successive values grow from there. It is perfectly sensible to accept a seed parameter. But during the growth loop, please use another name, perhaps n.

annotation

It wouldn't hurt to point out we accept seed: int, since a variant rational processes might perform the hailstone operation on a seed: Fraction. (Though I confess the lack of from fractions import Fraction is kind of a tip-off, here.)

It would also set you up for a numba JIT decorator.

single responsibility

Thank you for this beautiful docstring.

def collatz(seed): """Apply Collatz conjecture algorithm and stores all the values of the seed after every iteration to display them later"""

It is concise and accurate. It also helps us to see we are doing

compute and
store

Consider focusing on compute, so storage is someone else's problem.

from typing import Generator def collatz(seed: int) -> Generator[int, None, None]: """Generates hailstone values.""" n = seed while n > 1: if n % 2: n = 3 * n + 1 else: n //= 2 yield n def main(num_seeds: int = 1500) -> None: ... for seed in range(1, num_seeds + 1): y = list(collatz(seed)) ...

color maps

The cm.bwr colormap is very nice. You might consider playing with cyclic colormaps, to highlight the chaotic non-linear nature of the data you're plotting. Or cycle through a categorical map.

You may find that plotting points rather than lines, and using a partly transparent beta, allows you to achieve higher visual data density.

caching

If you summarize a given seed as producing an (iterations, max_value) tuple, you might find the @lru_cache or @cache decorators to be helpful; or introduce your own caching array. The literature sometimes uses delay to describe number of iterations for a given seed. Leavens and Vermeulen did some interesting work in this space in 1992.

Of course, the calculations we're doing are so lightweight that time spent on cache checks threatens to swamp the time spent calculating. So we'd want to unroll a bit, perhaps by doing half a dozen "ordinary" function calls followed by one where we check the cache. Or equivalently, to handle every seventh seed, do a modulo congruent to zero test. Or perhaps use some bigger prime.

Thanks for the exhaustive answer. Wouldn't a cyclic colormap "hide" the chaos of the algorithm? I think a non cyclic colormap makes it easier to estimate the seed: small seeds = bluish color, big seeds = reddish color (obviously in respect to the chosen interval). Btw this is my first matplotlib program and >I am still learning the basics of python so I don't really know how to check cache. — Lorenzo, CommentedMay 1, 2024 at 20:16
Well, Tufte tells us that there's more than one perceptual channel to access the visual cortex, and more than one thing that that cortex is good at. It all depends on what aspect of the underlying data you wish to highlight. — J_H, CommentedMay 1, 2024 at 20:30
About caching: looking up ints in a dict is also pretty fast, especially when they're a dense as they might be here. Btw if caching actually doesn't help, then another idea: They're already using NumPy. We could run all seeds in parallel. Put them all in an array, then do a step for all of them, then filter out the 1s. Until none are left. That could be competitive/interesting. — no comment, CommentedMay 1, 2024 at 20:40
The cPython dict implementation is impressive, but for the current "dense" situation it can't possibly be competitive with array access, as it necessarily has to do more stuff. // Running all seeds in parallel certainly does hold some charm. Go for it! Create an implementation, post an Answer; I will upvote it. — J_H, CommentedMay 1, 2024 at 21:34
Oh I wasn't comparing it with array (or list?) access. I was talking about your last paragraph, which sounds like caching with a dict might make their code slower. Although with an array/list cache, you might need to know the top peak value in advance or always check whether to expand it, which could be more costly than the dict access. I've already done the NumPy version now and it's indeed faster, but I don't think it fits as an answer here. I'll put a link in the next comment. — no comment, CommentedMay 1, 2024 at 21:47

Stack Exchange Network

Plotting Collatz conjecture values - Python

3 Answers 3

Integer arithmetic

identifiers

annotation

single responsibility

color maps

caching

Linked

Hot Network Questions

Plotting Collatz conjecture values - Python

3 Answers 3

Integer arithmetic

identifiers

annotation

single responsibility

color maps

caching

Linked

Related

Hot Network Questions