3

This is a program to solve a differential equation numerically using the Euler method. As it stands it is very slow, and I need to run 10000 Monte Carlo simulations.

The differential equation is the stochastic Landau-Lifschitz-Gilbert equation with the Spin Transfer Torque effect; it is basically a differential equation involving vector cross products.

I cannot use scipy because random noise is added in each simulation step, and (AFAIK) scipy's solvers cannot do that. I also need to normalize the resulting vector in each step.

import numpy as np
from matplotlib import pyplot as plt


class Mtj:
    def __init__(self, step_size, t_max):
        # Physical constants
        self.mu0 = 4 * np.pi * 1e-7
        self.gamma = 2.2127e5
        self.kb = 1.38e-23
        self.e = 1.6e-19
        self.me = 9.1e-31
        self.hbar = 1.054e-34
        self.alpha = 0.05
        self.gamma_eff = self.gamma / (1 + self.alpha**2)
        # MTJ Dimensions
        self.D = 50e-9
        self.tf = 1.1e-9
        self.tox = 1.4e-9
        self.AMTJ = np.pi * (self.D**2) / 4
        self.Vf = self.AMTJ * self.tf
        # Demag tensors
        self.Nx = (np.pi * self.tf) / (4 * self.D)
        self.Ny = self.Nx
        self.Nz = 1 - (2 * self.Nx)
        # Mag field constants
        self.HEX = -(5 / 1e4) / self.mu0
        self.Hex = np.array([0, self.HEX, 0])
        self.HPMA = None
        self.HVCMA = None
        self.HDEMAG = None
        self.temperature = None
        # Magnetic Parameters
        self.Ms0 = 0.786e6
        self.Ki0 = 5.274e-4
        self.Beta0 = 97.3e-15
        # Electrical
        self.VMTJ = -0.2
        self.Istt = None
        self.Jstt = None
        self.Hstt = None
        # Temperature dependent parameters
        self.Msx = 1.73
        self.Tc = 875
        self.Ms = None
        self.gammai = 2.18
        self.Ki = None
        self.gammab = 2.83
        self.Beta = None
        self.P0 = 0.58
        self.asp = 2e-5
        self.P = None
        # MTJ resistance
        self.phi = 0.4
        self.RA = 5e3
        self.F = 33141
        self.Rp = (self.tox / (self.F * self.phi**0.5 * self.AMTJ)) * np.exp((1.025 * self.tox * (2 * self.e * self.me * self.phi)**0.5) / self.hbar)
        self.Vh = 0.5
        self.TMR = 1.0
        # simulation initialization
        self.step_size = step_size
        self.t_max = t_max
        self.num_steps = round(self.t_max / self.step_size)
        self.m_initial = np.array([0, 0, 1])
        self.magnetization = np.zeros((self.num_steps, 3))
        self.time = np.zeros(self.num_steps)
        self.m = self.m_initial
        self.m_r = np.array([0, 0, 1])

    def Set_temp_params(self, temperature):
        self.temperature = temperature
        self.Ms = self.Ms0 * (1 - (self.temperature / self.Tc)**self.Msx)
        self.Ki = self.Ki0 * (self.Ms / self.Ms0)**self.gammai
        self.Beta = self.Beta0 * (self.Ms / self.Ms0)**self.gammab
        self.P = self.P0 * (1 - self.asp * self.temperature**1.5)
        self.Hstt = (self.hbar * self.P) / (2 * self.e * self.mu0 * self.Ms * self.tf) * self.gamma

    def Set_mag_fields(self):
        self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
        self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)
        self.HDEMAG = -self.Ms * np.array([self.Nx, self.Ny, self.Nz])

    def calculate_Rmtj(self):
        self.Rmtj = (self.Rp * (1 + (self.VMTJ/self.Vh)**2 + self.TMR)) \
            / (1 + (self.VMTJ**2/self.Vh**2) + self.TMR * (0.5 * (1 + self.m[2])))

    def llg(self):
        self.calculate_Rmtj()
        self.Istt = self.VMTJ / self.Rmtj
        self.Jstt = self.Istt / self.AMTJ
        sigmath = np.random.randn(3)
        sigmath = sigmath / np.linalg.norm(sigmath)
        Hthermal = np.sqrt((2 * self.kb * self.temperature * self.alpha) / (
            self.mu0 * self.Ms * self.gamma * self.Vf * self.step_size)) * sigmath
        Hpma = np.array([0, 0, self.HPMA * self.m[2]])
        Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
        Hdemag = self.HDEMAG * self.m
        Heff = Hpma + Hdemag + self.Hex + Hvcma + Hthermal
        dmdt = -self.gamma_eff * np.cross(self.m, Heff) - self.alpha * self.gamma_eff * np.cross(self.m, np.cross(self.m, Heff)) - \
            self.Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
        return dmdt

    def run(self):
        for i in range(0, self.num_steps):
            dmdt = self.llg()
            self.m = self.m + self.step_size * dmdt
            self.m /= np.linalg.norm(self.m)
            self.magnetization[i] = self.m
            self.time[i] = i * self.step_size

    def plot(self):
        fig, ax = plt.subplots()
        ax.plot(self.time, self.magnetization[:, 0], label='Mx')
        ax.plot(self.time, self.magnetization[:, 1], label='My')
        ax.plot(self.time, self.magnetization[:, 2], label='Mz')
        ax.set_xlabel('Time (s)')
        ax.set_ylabel('Magnetization')
        ax.legend()
        plt.show()


mtj = Mtj(1e-12, 10e-9)
mtj.Set_temp_params(450)
mtj.Set_mag_fields()
mtj.run()
mtj.plot()

I know MATLAB would be faster, but Python is convenient. Please give me suggestions on any way to speed up my code.

9
  • 3
    Have you tried using the PyPy JIT to run your code? It might speed things up.
    – Dair
    Commented May 15, 2024 at 16:40
  • I don't see anything stochastic here.
    – vnp
    Commented May 15, 2024 at 18:15
  • @vnp, you're looking for that np.random.randn(3) call. 8^)
    – J_H
    Commented May 15, 2024 at 18:40
  • Like J_H mentioned in the answer, @numba.jit is a good start, but if you really want performance, write it in C or C++. That's the depressing truth about Python performance: there isn't any. The good news is that your code is mainly numeric calculations, so translating it should be fairly straightforward.
    – Passer By
    Commented May 16, 2024 at 8:29
  • numpy is for mass processing of data; there is overhead for type and range checking. Atomic calls make it slow: using numpy.sin for a single argument is slower than using math.sin. Thus it might be helpful to generate all the random numbers that are used at once, instead of only 3 in every loop.
    Commented May 16, 2024 at 9:35

5 Answers

7

This was very difficult for a quantitative analyst to examine because the separation between variables and constants was not distinct. It has already been observed in another answer that your naming convention is odd. Attaching constants to the class is a bad choice and makes it more difficult to understand what is updated on the instance and what is not.

Indicating explicitly what the constants are

The first thing I did was rename all of your constants so that I could look for optimisations in the functions:

import numpy as np
from matplotlib import pyplot as plt

# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)

# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf

# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)

# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None

# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15

# Electrical
C_VMTJ = -0.2

# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5

# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
    (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0


class Mtj:
    def __init__(self, step_size, t_max):
        # Mag field constants
        self.HPMA = None
        self.HVCMA = None
        self.HDEMAG = None
        self.temperature = None
        # Electrical
        self.Istt = None
        self.Jstt = None
        self.Hstt = None
        # Temperature dependent parameters
        self.Ms = None
        self.Ki = None
        self.Beta = None
        self.P = None
        # simulation initialization
        self.step_size = step_size
        self.t_max = t_max
        self.num_steps = round(self.t_max / self.step_size)
        self.m_initial = np.array([0, 0, 1])
        self.magnetization = np.zeros((self.num_steps, 3))
        self.time = np.zeros(self.num_steps)
        self.m = self.m_initial
        self.m_r = np.array([0, 0, 1])

    def Set_temp_params(self, temperature):
        self.temperature = temperature
        self.Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
        self.Ki = C_Ki0 * (self.Ms / C_Ms0) ** C_gammai
        self.Beta = C_Beta0 * (self.Ms / C_Ms0) ** C_gammab
        self.P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
        self.Hstt = (C_hbar * self.P) / (2 * C_e * C_mu0 * self.Ms * C_tf) * C_gamma

    def Set_mag_fields(self):
        self.HPMA = (2 * self.Ki) / (C_mu0 * self.Ms * C_tf)
        self.HVCMA = - (2 * self.Beta * abs(C_VMTJ)) / (C_mu0 * self.Ms * C_tox * C_tf)
        self.HDEMAG = -self.Ms * np.array([C_Nx, C_Ny, C_Nz])

    def calculate_Rmtj(self):
        self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
            / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))

    def llg(self):
        self.calculate_Rmtj()
        self.Istt = C_VMTJ / self.Rmtj
        self.Jstt = self.Istt / C_AMTJ
        sigmath = np.random.randn(3)
        sigmath = sigmath / np.linalg.norm(sigmath)
        Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
            C_mu0 * self.Ms * C_gamma * C_vf * self.step_size)) * sigmath
        Hpma = np.array([0, 0, self.HPMA * self.m[2]])
        Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
        Hdemag = self.HDEMAG * self.m
        Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
        dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m, np.cross(self.m, Heff)) - \
            self.Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
        return dmdt

    def run(self):
        for i in range(0, self.num_steps):
            dmdt = self.llg()
            self.m = self.m + self.step_size * dmdt
            self.m /= np.linalg.norm(self.m)
            self.magnetization[i] = self.m
            self.time[i] = i * self.step_size

    def plot(self):
        fig, ax = plt.subplots()
        ax.plot(self.time, self.magnetization[:, 0], label='Mx')
        ax.plot(self.time, self.magnetization[:, 1], label='My')
        ax.plot(self.time, self.magnetization[:, 2], label='Mz')
        ax.set_xlabel('Time (s)')
        ax.set_ylabel('Magnetization')
        ax.legend()
        plt.show()

Mixing initialisation and execution

Secondly, you have a separate process for initialising step sizes, which you do on the instance, and a separate method for setting temperature parameters. You should set the temperature parameters when the class is initialised.

Then I relabelled the temperature-dependent names as constants internal to the class, so that it is more obvious what is updated during a run.

class Mtj:
    def __init__(self, step_size, t_max, temperature):
        # Mag field constants
        self.HPMA = None
        self.HVCMA = None
        self.HDEMAG = None
        # Temperature
        self.temperature = temperature
        self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
        self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
        self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
        self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
        self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
        # Electrical
        self.Istt = None
        self.Jstt = None
        # simulation initialization
        self.step_size = step_size
        self.t_max = t_max
        self.num_steps = round(self.t_max / self.step_size)
        self.m_initial = np.array([0, 0, 1])
        self.magnetization = np.zeros((self.num_steps, 3))
        self.time = np.zeros(self.num_steps)
        self.m = self.m_initial
        self.m_r = np.array([0, 0, 1])

    def Set_mag_fields(self):
        self.HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
        self.HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
        self.HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])

    def calculate_Rmtj(self):
        self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
            / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))

    def llg(self):
        self.calculate_Rmtj()
        self.Istt = C_VMTJ / self.Rmtj
        self.Jstt = self.Istt / C_AMTJ
        sigmath = np.random.randn(3)
        sigmath = sigmath / np.linalg.norm(sigmath)
        Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
            C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
        Hpma = np.array([0, 0, self.HPMA * self.m[2]])
        Hvcma = np.array([0, 0, self.HVCMA * self.m[2]])
        Hdemag = self.HDEMAG * self.m
        Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
        dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m, np.cross(self.m, Heff)) - \
            self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
        return dmdt

After doing this I could see that Set_mag_fields was just setting constants as well, so it was also moved into the initialisation.

class Mtj:
    def __init__(self, step_size, t_max, temperature):
        # Temperature
        self.temperature = temperature
        self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
        self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
        self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
        self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
        self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
        # Mag field constants
        self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
        self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
        self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])

Standardising routines

I removed calculate_Rmtj. A better name for it would have been set_Rmtj, but it was a one-line function that sat oddly with the rest of your code style.

Unfortunately I have to get back to work now, but with the increased visibility it should be easier to look for further optimisations and any possible vectorisation. I am sure this can be improved further for performance gains.

Full Updated code

import numpy as np
from matplotlib import pyplot as plt

# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)

# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf

# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)

# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None

# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15

# Electrical
C_VMTJ = -0.2

# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5

# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
    (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0


class Mtj:
    def __init__(self, step_size, t_max, temperature):
        # Temperature
        self.temperature = temperature
        self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
        self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
        self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
        self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
        self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
        # Mag field constants
        self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
        self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
        self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
        # Electrical
        self.Istt = None
        self.Jstt = None
        # simulation initialization
        self.step_size = step_size
        self.t_max = t_max
        self.num_steps = round(self.t_max / self.step_size)
        self.m_initial = np.array([0, 0, 1])
        self.magnetization = np.zeros((self.num_steps, 3))
        self.time = np.zeros(self.num_steps)
        self.m = self.m_initial
        self.m_r = np.array([0, 0, 1])

    def llg(self):
        self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
            / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
        self.Istt = C_VMTJ / self.Rmtj
        self.Jstt = self.Istt / C_AMTJ
        sigmath = np.random.randn(3)
        sigmath = sigmath / np.linalg.norm(sigmath)
        Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
            C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath
        Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
        Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
        Hdemag = self.C_HDEMAG * self.m
        Heff = Hpma + Hdemag + C_Hex + Hvcma + Hthermal
        dmdt = -C_gamma_eff * np.cross(self.m, Heff) - C_alpha * C_gamma_eff * np.cross(self.m, np.cross(self.m, Heff)) - \
            self.C_Hstt * self.Jstt * np.cross(self.m, np.cross(self.m, self.m_r))
        return dmdt

    def run(self):
        for i in range(0, self.num_steps):
            dmdt = self.llg()
            self.m = self.m + self.step_size * dmdt
            self.m /= np.linalg.norm(self.m)
            self.magnetization[i] = self.m
            self.time[i] = i * self.step_size

    def plot(self):
        fig, ax = plt.subplots()
        ax.plot(self.time, self.magnetization[:, 0], label='Mx')
        ax.plot(self.time, self.magnetization[:, 1], label='My')
        ax.plot(self.time, self.magnetization[:, 2], label='Mz')
        ax.set_xlabel('Time (s)')
        ax.set_ylabel('Magnetization')
        ax.legend()
        plt.show()


mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()

Actually you can now observe that these lines:

sigmath = np.random.randn(3)
sigmath = sigmath / np.linalg.norm(sigmath)
Hthermal = np.sqrt(
    (2 * C_kb * self.temperature * C_alpha) / (C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)
) * sigmath

contain only constants and random variables. You can hoist this out of llg and generate one large vectorised array up front, rather than recalculating it in every loop iteration.

Profiling

np.cross accounts for about 60% of the profiled runtime.

[Profiling screenshot: np.cross dominates the runtime, called 50,000 times]

It runs 50,000 times. You can observe that one of the terms does not need to be calculated twice: np.cross(self.m, Heff) can be precalculated and substituted in, which reduces the count to 40,000 calls.

You may also find that your triple vector cross products are computed more quickly using the alternative formulation below, but for this you would have to experiment.

[Image: alternative formulation of the triple cross product, m × (m × H) = m(m·H) - H(m·m), as implemented below]

Edited with suggestions

I moved the constants and random variables into the initialisation and also reduced the cross product computations to 40,000. Speed improved:

[Profiling screenshot: timings after hoisting the constants and removing the duplicated cross product]

I also used the alternative formulation for the triple cross product as follows:

m_dot_m = np.dot(self.m, self.m)
m_dot_Heff = np.dot(self.m, Heff)
m_dot_r = np.dot(self.m, self.m_r)
m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
cross_m_Heff = np.cross(self.m, Heff)

dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * m_x_m_x_Heff - \
    self.C_Hstt * self.Jstt * m_x_m_x_r

This was very effective; notice the time for llg:

[Profiling screenshot: llg now takes a small fraction of its previous time]

The real gains

You mentioned that you need to do this 10,000 times. You cannot vectorize the time loop of a single simulation, because each Euler step depends on the result of the previous one.

However you can, and should, vectorize across the 10,000 simulations. For example, the array magnetization currently has shape (num_steps, 3); instead you would aim to build a 3D tensor of shape (num_steps, 3, 10000).

That is why I reformulated the equation using dot products, so that later you can vectorize all of these formulae to obtain the speed from numpy's C implementation.
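To make the tensor idea concrete, here is a minimal sketch (not the code above) of what a batched Euler loop could look like. The function and argument names, and the lumped hstt_jstt coefficient, are placeholders; in the real model Jstt would be recomputed from m[:, 2] as an (n_runs, 1) array at every step, and the output here is stored as (num_steps, n_runs, 3), which is the same tensor up to a transpose.

import numpy as np

def run_batch(num_steps, n_runs, step_size, gamma_eff, alpha, hstt_jstt,
              h_pma, h_vcma, h_demag, h_ex, h_th_scale, m_r, rng=None):
    """Evolve n_runs independent trajectories at once (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.tile([0.0, 0.0, 1.0], (n_runs, 1))            # (n_runs, 3) state
    out = np.empty((num_steps, n_runs, 3))

    for i in range(num_steps):
        # one thermal-noise vector per run, normalised row-wise
        sig = rng.standard_normal((n_runs, 3))
        sig /= np.linalg.norm(sig, axis=1, keepdims=True)

        h_eff = np.zeros((n_runs, 3))
        h_eff[:, 2] += (h_pma + h_vcma) * m[:, 2]
        h_eff += h_demag * m + h_ex + h_th_scale * sig

        # batched dot products, shape (n_runs, 1)
        m_dot_m = np.sum(m * m, axis=1, keepdims=True)
        m_dot_h = np.sum(m * h_eff, axis=1, keepdims=True)
        m_dot_r = np.sum(m * m_r, axis=1, keepdims=True)

        mxmxh = m * m_dot_h - h_eff * m_dot_m            # m x (m x Heff)
        mxmxr = m * m_dot_r - m_r * m_dot_m              # m x (m x m_r)
        mxh = np.cross(m, h_eff)

        dmdt = -gamma_eff * mxh - alpha * gamma_eff * mxmxh - hstt_jstt * mxmxr
        m = m + step_size * dmdt
        m /= np.linalg.norm(m, axis=1, keepdims=True)
        out[i] = m
    return out

Storing the state as an (n_runs, 3) array means every Euler step is a handful of vectorised numpy calls, regardless of how many Monte Carlo runs are launched.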

Thus far

import numpy as np
from matplotlib import pyplot as plt

# Physical constants
C_mu0 = 4 * np.pi * 1e-7
C_gamma = 2.2127e5
C_kb = 1.38e-23
C_e = 1.6e-19
C_me = 9.1e-31
C_hbar = 1.054e-34
C_alpha = 0.05
C_gamma_eff = C_gamma / (1 + C_alpha ** 2)

# MTJ dimensions
C_D = 50e-9
C_tf = 1.1e-9
C_tox = 1.4e-9
C_AMTJ = np.pi * (C_D ** 2) / 4
C_vf = C_AMTJ * C_tf

# Demag tensors
C_Nx = (np.pi * C_tf) / (4 * C_D)
C_Ny = C_Nx
C_Nz = 1 - (2 * C_Nx)

# Mag field constants
C_HEX = -(5 / 1e4) / C_mu0
C_Hex = np.array([0, C_HEX, 0])
# C_HPMA = None
# C_HVCMA = None
# C_HDEMAG = None

# Magnetic Parameters
C_Ms0 = 0.786e6
C_Ki0 = 5.274e-4
C_Beta0 = 97.3e-15

# Electrical
C_VMTJ = -0.2

# Temperature dependent parameters
C_Msx = 1.73
C_Tc = 875
C_gammai = 2.18
C_gammab = 2.83
C_P0 = 0.58
C_asp = 2e-5

# MTJ resistance
C_phi = 0.4
C_RA = 5e3
C_F = 33141
C_Rp = (C_tox / (C_F * C_phi ** 0.5 * C_AMTJ)) * np.exp(
    (1.025 * C_tox * (2 * C_e * C_me * C_phi) ** 0.5) / C_hbar)
C_Vh = 0.5
C_TMR = 1.0


class Mtj:
    def __init__(self, step_size, t_max, temperature):
        # Temperature
        self.temperature = temperature
        self.C_Ms = C_Ms0 * (1 - (self.temperature / C_Tc) ** C_Msx)
        self.C_Ki = C_Ki0 * (self.C_Ms / C_Ms0) ** C_gammai
        self.C_Beta = C_Beta0 * (self.C_Ms / C_Ms0) ** C_gammab
        self.C_P = C_P0 * (1 - C_asp * self.temperature ** 1.5)
        self.C_Hstt = (C_hbar * self.C_P) / (2 * C_e * C_mu0 * self.C_Ms * C_tf) * C_gamma
        # Mag field constants
        self.C_HPMA = (2 * self.C_Ki) / (C_mu0 * self.C_Ms * C_tf)
        self.C_HVCMA = - (2 * self.C_Beta * abs(C_VMTJ)) / (C_mu0 * self.C_Ms * C_tox * C_tf)
        self.C_HDEMAG = -self.C_Ms * np.array([C_Nx, C_Ny, C_Nz])
        # Electrical
        self.Istt = None
        self.Jstt = None
        # simulation initialization
        self.step_size = step_size
        self.t_max = t_max
        self.num_steps = round(self.t_max / self.step_size)
        self.m_initial = np.array([0, 0, 1])
        self.magnetization = np.zeros((self.num_steps, 3))
        self.time = np.zeros(self.num_steps)
        self.m = self.m_initial
        self.m_r = np.array([0, 0, 1])
        sigmath = np.random.randn(3, self.num_steps)
        sigmath = sigmath / np.linalg.norm(sigmath, axis=0)
        self.C_Hthermal = np.sqrt((2 * C_kb * self.temperature * C_alpha) / (
            C_mu0 * self.C_Ms * C_gamma * C_vf * self.step_size)) * sigmath

    def llg(self, i):
        self.Rmtj = (C_Rp * (1 + (C_VMTJ / C_Vh) ** 2 + C_TMR)) \
            / (1 + (C_VMTJ ** 2 / C_Vh ** 2) + C_TMR * (0.5 * (1 + self.m[2])))
        self.Istt = C_VMTJ / self.Rmtj
        self.Jstt = self.Istt / C_AMTJ
        Hpma = np.array([0, 0, self.C_HPMA * self.m[2]])
        Hvcma = np.array([0, 0, self.C_HVCMA * self.m[2]])
        Hdemag = self.C_HDEMAG * self.m
        Heff = Hpma + Hdemag + C_Hex + Hvcma + self.C_Hthermal[:, i]
        m_dot_m = np.dot(self.m, self.m)
        m_dot_Heff = np.dot(self.m, Heff)
        m_dot_r = np.dot(self.m, self.m_r)
        m_x_m_x_Heff = self.m * m_dot_Heff - Heff * m_dot_m
        m_x_m_x_r = self.m * m_dot_r - self.m_r * m_dot_m
        cross_m_Heff = np.cross(self.m, Heff)
        dmdt = -C_gamma_eff * cross_m_Heff - C_alpha * C_gamma_eff * m_x_m_x_Heff - \
            self.C_Hstt * self.Jstt * m_x_m_x_r
        return dmdt

    def run(self):
        for i in range(0, self.num_steps):
            dmdt = self.llg(i)
            self.m = self.m + self.step_size * dmdt
            self.m /= np.linalg.norm(self.m)
            self.magnetization[i] = self.m
            self.time[i] = i * self.step_size

    def plot(self):
        fig, ax = plt.subplots()
        ax.plot(self.time, self.magnetization[:, 0], label='Mx')
        ax.plot(self.time, self.magnetization[:, 1], label='My')
        ax.plot(self.time, self.magnetization[:, 2], label='Mz')
        ax.set_xlabel('Time (s)')
        ax.set_ylabel('Magnetization')
        ax.legend()
        plt.show()


mtj = Mtj(1e-12, 10e-9, 450)
mtj.run()
mtj.plot()
5
  • Takes less than one second to run on my end. A significant improvement in terms of speed.
    – Kate
    Commented May 16, 2024 at 15:01
  • Thanks! Actually, using an explicit cross product made it even faster. I precalculated the random number array, and now it runs in about 0.5 seconds. Could you tell me how you got that screenshot of benchmark statistics?
    Commented May 16, 2024 at 15:45
  • I used the profiling tool in the professional version of PyCharm, which uses the cProfile profiler.
    – Attack68
    Commented May 16, 2024 at 16:19
  • Choosing a "this is a constant" prefix is a design choice, that's cool. But please prefer lowercase "c_" over the proposed "C_", out of respect for PEP 8. (I gave the OP a pass on most naming details, due to very helpful adherence to math conventions, which are a bit different from Python conventions.) Yeah, yeah, I know, manifest constants are typically upcased. But here, mixed case sometimes makes an identifier more readable, more familiar.
    – J_H
    Commented May 16, 2024 at 20:51
  • PEP 8 module-level constants are usually all caps. The only reason I didn't use all caps is that it was easier to stick with the OP's original naming for comparison and to avoid errors when editing. I prefer the capital C because it is more obvious, better suited the task presented here, and made the answer clearer to me and, in my opinion, to most.
    – Attack68
    Commented May 16, 2024 at 21:12
4

I'm not an expert, but let me raise some very common tips that make calculations faster and that you might already be aware of.

Decreasing the number of function calls

You call llg() 10000 times in the loop, which in turn calls calculate_Rmtj(). Calling a function is slow. Did you try expanding llg() and calculate_Rmtj() directly in the loop? (Yes, that will make your code messy.) Also, you call randn(), cross() and norm() in the loop, and some of them can be slow. It looks like randn() can be called before the loop, i.e., building a (10000, 3) array with randn(10000, 3) and then indexing into it inside the loop, as sketched below.
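For example, a minimal sketch of drawing all the noise up front (variable names are illustrative):

import numpy as np

rng = np.random.default_rng()
num_steps = 10_000

# draw every step's thermal-noise vector up front, then normalise row by row
sigmath_all = rng.standard_normal((num_steps, 3))
sigmath_all /= np.linalg.norm(sigmath_all, axis=1, keepdims=True)

# inside the Euler loop, replace np.random.randn(3) with a cheap lookup:
# sigmath = sigmath_all[i]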

Using a JIT compiler

If you want to make your functions run faster, a JIT compiler might be the thing. numba is a very popular package for this purpose. Using it you can, for example, make llg() run a lot faster (by orders of magnitude), but you need to modify your code; a rough sketch follows.
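As a rough illustration only, here is what a numba-friendly core might look like, assuming the effective field is simplified to a precomputed constant part plus pre-drawn thermal noise (heff_const, h_th_all and prefactor are hypothetical names, not the OP's exact terms):

import numpy as np
from numba import njit

@njit(cache=True)
def euler_llg(m0, heff_const, h_th_all, gamma_eff, alpha, prefactor, m_r, dt):
    """Sketch of a jitted Euler loop: plain arrays and scalar arithmetic only."""
    n = h_th_all.shape[0]
    out = np.empty((n, 3))
    m = m0.copy()
    for i in range(n):
        heff = heff_const + h_th_all[i]      # simplified effective field
        # explicit cross product keeps the jitted loop to simple arithmetic
        mxh = np.empty(3)
        mxh[0] = m[1] * heff[2] - m[2] * heff[1]
        mxh[1] = m[2] * heff[0] - m[0] * heff[2]
        mxh[2] = m[0] * heff[1] - m[1] * heff[0]
        m_dot_h = m[0] * heff[0] + m[1] * heff[1] + m[2] * heff[2]
        m_dot_m = m[0] * m[0] + m[1] * m[1] + m[2] * m[2]
        m_dot_r = m[0] * m_r[0] + m[1] * m_r[1] + m[2] * m_r[2]
        mxmxh = m * m_dot_h - heff * m_dot_m          # m x (m x Heff)
        mxmxr = m * m_dot_r - m_r * m_dot_m           # m x (m x m_r)
        dmdt = -gamma_eff * mxh - alpha * gamma_eff * mxmxh - prefactor * mxmxr
        m = m + dt * dmdt
        m = m / np.sqrt(m[0] * m[0] + m[1] * m[1] + m[2] * m[2])
        out[i] = m
    return out

The first call pays a compilation cost; subsequent calls run as compiled machine code.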

Using local variables

Using local variables is generally faster. I'm not completely sure about this, but accessing self.something from a function scope over and over does not look economical. You can try passing those values to the function as arguments, or binding them to locals once, as in the toy sketch below.
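A toy sketch of the difference (not the OP's full method, just the attribute-lookup pattern):

import numpy as np

class Demo:
    """Same computation, with and without repeated attribute lookups."""
    def __init__(self):
        self.alpha = 0.05
        self.gamma_eff = 2.2127e5 / (1 + self.alpha ** 2)

    def torque_slow(self, m, heff):
        # every self.xxx is an attribute lookup repeated on each use
        return (-self.gamma_eff * np.cross(m, heff)
                - self.alpha * self.gamma_eff * np.cross(m, np.cross(m, heff)))

    def torque_fast(self, m, heff):
        # read the attributes into locals once, then reuse them
        alpha, gamma_eff = self.alpha, self.gamma_eff
        mxh = np.cross(m, heff)
        return -gamma_eff * mxh - alpha * gamma_eff * np.cross(m, mxh)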

    3

    physical constants

    Thank you for using MKS, it's all very clear.

     self.e = 1.6e-19
     self.me = 9.1e-31

    The number of coulombs is correct to the digits given, but it's just too easy to specify it to five significant figures, in keeping with the γ sig figs: 1.6022. Similarly, 9.1094 for the mass, which had just two sig figs.

    I can see some small attraction to always having self.this and self.that, but still, choosing to allocate new storage for these on each object instance is an odd modeling choice. Consider putting such constants at top-level of this or another module. Consider relying on a pypi package which already offers them, such as astropy or scipy:

    from scipy.constants import e, m_e, mu_0 

    When we examine a particular "constant", this (modern) OP code has the curious artifact of relying on the value of mu_0 that was used from 1948 through 2018. The redefinition of 2019 altered its value to the slightly different one we see in the scipy package. This was due to ditching the platinum kilogram prototype, which regrettably was starting to exhibit discrepancies of ~ 50 μg.
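    A quick way to see the difference on your own machine (purely an illustration):

    import numpy as np
    from scipy.constants import mu_0

    legacy_mu0 = 4 * np.pi * 1e-7   # the exact pre-2019 definition used in the OP's code
    print(legacy_mu0)               # 1.2566370614359173e-06
    print(mu_0)                     # the measured CODATA value that scipy ships
    print(mu_0 - legacy_mu0)        # tiny, but no longer exactly zero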

    tl;dr: These details are boring and not part of the core contribution of this work; outsource them so you needn't worry about them.

    naming

    These are distinct identifiers:

     self.HEX = -(5 / 1e4) / self.mu0
     self.Hex = np.array([0, self.HEX, 0])

    But I do wish you'd pick a slightly different name for that first one. Sometimes colleagues discuss code over the telephone. Try to choose names that will sound distinct when recited aloud, or when recited mentally during quiet code reading.

     def Set_temp_params(self, temperature):
         ...

     def Set_mag_fields(self):

    Pep-8 asked you nicely to please use a lowercase set_ prefix for such methods.

    cite a reference

     def llg(self): 

    The Review Context made it clear enough that this is the Landau-Lifschitz-Gilbert equation, and I thank you for that. But the source code really needs a """docstring""" or at least a # comment to that effect.

    Citing some author would also shed a great deal of light on F, Rmtj, and the various other expressions. As written this code is on the opaque side. I do appreciate that it is trying hard to conform to math notation conventions.

    profiling

    This submission would have benefited from the inclusion of profiling measurements.

    speed

    We pick a random vector in 3-space, good.

     def llg(self):
         ...
         sigmath = np.random.randn(3)

    But then we run() it \$10^4\$ times. There's an opportunity to roll all the random numbers at once here, and let numpy / BLAS worry about looping.

    Or if you wish to conserve memory, pick some compromise, perhaps by rolling 300 numbers at a time and doing a hundred runs through this.
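    A sketch of that compromise, with illustrative names and numbers:

    import numpy as np

    rng = np.random.default_rng()
    total_steps = 10_000
    chunk = 300

    done = 0
    while done < total_steps:
        n = min(chunk, total_steps - done)
        block = rng.standard_normal((n, 3))            # a few hundred draws at a time
        block /= np.linalg.norm(block, axis=1, keepdims=True)
        for sigmath in block:                          # each row is one step's noise vector
            ...                                        # do one Euler step with sigmath here
        done += n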

    If you don't go that route, consider adding type annotation which mypy lints clean on, and then giving this method numba's @njit decorator.

    You're computing nine temporary variables, each of them changing slightly on every iteration. It's not clear to me how big the partial derivative is for each of them. If some of them are fairly stable, changing by some tiny epsilon with every iteration, consider hoisting them out of this function, that is, out of the loop. So if we unroll the loop ten times, we compute just once, at the top of the unroll, a smallish change that is 10x bigger than what the OP code computes, rather than computing order-of-magnitude smaller values ten times. Then we still need to loop back \$10^3\$ times, an improvement over \$10^4\$ times.

    3
    • 1
      Thank you, I will try to incorporate them. Another thing I noticed is that the code became about 5-6x faster when I used an explicit cross product instead of the numpy built-in one. Would you happen to know why?
      Commented May 16, 2024 at 8:40
    • Is there a way I can put my new code somewhere so we can continue this discussion? Should I post the new code in this question itself?
      Commented May 16, 2024 at 8:51
    • When you take the tour, it describes a couple of possible outcomes. For example, you might choose to post a self-answer, or create a related new question.
      – J_H
      Commented May 16, 2024 at 12:52
    3
     self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
     self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)

    With interpreted languages, I'm never sure how much common subexpression elimination to expect.

     def Set_temp_params(self, temperature):
         self.temperature = temperature
         temperature_coefficient = 1 - (self.temperature / self.Tc) ** self.Msx
         self.Ms = self.Ms0 * temperature_coefficient
         self.Ki = self.Ki0 * temperature_coefficient ** self.gamma_i
         self.Beta = self.Beta0 * temperature_coefficient ** self.gamma_b
         …

     def Set_mag_fields(self):
         fudge = 2 / (self.mu0 * self.Ms * self.tf)
         self.HPMA = self.Ki * fudge
         self.HVCMA = -(self.Beta * abs(self.VMTJ)) / self.tox * fudge
         …

    should be algebraically equivalent, and the expression for HVCMA isn't overly long any more.
    (I noticed another instance of self.mu0 * self.Ms in llg().)

    1
    • 1
      \$\begingroup\$"I'm never sure how much common subexpression elimination" last I checked python doesn't have it, period. Which makes sense when you consider all the types are statically unknown so operators can be arbitrary functions.\$\endgroup\$
      – Passer By
      CommentedMay 16, 2024 at 8:34
    2

    I am not very qualified to comment on your algorithm, so this is a naive review from the point of view of a developer, but here is some generic advice that can be used in any situation where you want to optimize slow code or identify bottlenecks.

    I often use the codetiming package for this purpose.

    So I applied a decorator to each function, like:

    @Timer(name="llg", text="Ran in {:0.8f} seconds", logger=logger.debug)
    def llg(self):
        ....

    And then this code to produce a breakdown by function:

    logger.info("Timing summary")
    logger.info("--------------")
    total_time = sum(Timer.timers.values())
    logger.info(f"Total time: {total_time:.8f}")
    logger.info("name                           time       cost  ")
    logger.info("------------------------------ ---------- ------")
    for k, v in Timer.timers.items():
        logger.info(f"{k:30} {v:.8f} {(v / total_time):.2%}")

    Result:

     2024-05-15 18:30:44,361 - INFO - L163 - Timing summary
     2024-05-15 18:30:44,361 - INFO - L164 - --------------
     2024-05-15 18:30:44,361 - INFO - L166 - Total time: 7.78605968
     2024-05-15 18:30:44,361 - INFO - L167 - name                           time       cost
     2024-05-15 18:30:44,361 - INFO - L168 - ------------------------------ ---------- ------
     2024-05-15 18:30:44,361 - INFO - L170 - __init__                       0.00012388 0.00%
     2024-05-15 18:30:44,361 - INFO - L170 - Set_temp_params                0.00000414 0.00%
     2024-05-15 18:30:44,361 - INFO - L170 - Set_mag_fields                 0.00001162 0.00%
     2024-05-15 18:30:44,361 - INFO - L170 - calculate_Rmtj                 0.02407876 0.31%
     2024-05-15 18:30:44,361 - INFO - L170 - llg                            3.64657636 46.83%
     2024-05-15 18:30:44,361 - INFO - L170 - run                            4.11526491 52.85%

    This is a quick and convenient way to pinpoint slow code. And then you can dig further by wrapping selected blocks of code like this:

    with Timer(name="some code", text="Ran in {:0.8f} seconds", logger=logger.info):
        # do something
        ...

    Check for example the block in function llg that has np.cross...

    The second strategy is that you can cache the results of expensive operations. The docs have a basic example about it:

    @cache
    def factorial(n):
        return n * factorial(n-1) if n else 1

    >>> factorial(10)   # no previously cached result, makes 11 recursive calls
    3628800
    >>> factorial(5)    # just looks up cached value result
    120
    >>> factorial(12)   # makes two new recursive calls, the other 10 are cached
    479001600

    As long as the arguments are "hashable" you should be able to apply that logic to any purpose. Some variables do not change over the course of program execution, so if you find yourself repeatedly calculating the same values over and over, this "trick" will save a lot of precious (CPU) time.

    You will quite likely have to refactor your code to benefit from this approach: you would need to pass arguments to your functions rather than rely on self.whatever. I find the self.whatever style bug-prone, because it's not immediately clear where and how the values are instantiated. Or you could use setters and getters.

    But this is not my code and I have my own way of doing things so...

    In functions like Set_mag_fields where you set three class attributes:

    def Set_mag_fields(self):
        self.HPMA = (2 * self.Ki) / (self.mu0 * self.Ms * self.tf)
        self.HVCMA = - (2 * self.Beta * abs(self.VMTJ)) / (self.mu0 * self.Ms * self.tox * self.tf)
        self.HDEMAG = -self.Ms * np.array([self.Nx, self.Ny, self.Nz])

    You could return a namedtuple instead, if that makes sense.
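    For instance, a small hypothetical sketch of that shape (the helper name and parameters are made up for illustration):

    from collections import namedtuple
    import numpy as np

    # bundle for the three field terms
    MagFields = namedtuple("MagFields", ["hpma", "hvcma", "hdemag"])

    def compute_mag_fields(Ki, Beta, Ms, VMTJ, mu0, tf, tox, demag_tensor):
        """Return the three field terms as one immutable value instead of
        mutating three separate attributes on self."""
        hpma = (2 * Ki) / (mu0 * Ms * tf)
        hvcma = -(2 * Beta * abs(VMTJ)) / (mu0 * Ms * tox * tf)
        hdemag = -Ms * np.asarray(demag_tensor)
        return MagFields(hpma, hvcma, hdemag)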

    Misc: as per PEP 8, constants and magic numbers should be uppercase.

    The class could perfectly well be written as a dataclass.
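    A minimal sketch of the dataclass idea, with illustrative fields rather than a drop-in replacement:

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class MtjSim:
        """Illustrative only: holds the simulation inputs as declared fields."""
        step_size: float
        t_max: float
        temperature: float
        m: np.ndarray = field(default_factory=lambda: np.array([0.0, 0.0, 1.0]))

        def __post_init__(self):
            # derived state computed once the declared fields are set
            self.num_steps = round(self.t_max / self.step_size)
            self.magnetization = np.zeros((self.num_steps, 3))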

