2
\$\begingroup\$

Problem

This project gets the past ~20 days of equity data (for tickers such as $AAPL, $AMZN, $GOOG) plus the current equity "quote" (which arrives every 60 seconds from a free API) and estimates seven "real-time" future price targets. It uses only one main class, EQ, and performs this task slowly via a cron job.

Code in the EQ class scrapes equities data using EquityRecords, calculates sector (e.g., technology, healthcare) coefficients using SectorMovers, estimates equity prices, and finally writes ~8,000 HTML strings (~100Kb-120Kb) for each equity on .md files for viewing.

Performance

The goal is to make EQ as fast and efficient as possible on a single server. However, all comments, help, and advice are very welcome.

This is my first scripting project and it has many issues. I also couldn't include EQ in the post due to the character limit. Please see the entire code at this GitHub link.


EquityRecords

date_default_timezone_set("UTC");
ini_set('max_execution_time', 0);
ini_set('memory_limit', '-1');
set_time_limit(0);

// EquityRecords::allEquitiesSignleJSON(new EquityRecords());

class EquityRecords {

    const NUMBER_OF_STOCKS_PER_REQUEST = 100;
    const NEW_LINE = "\n";

    /**
     *
     * @var a string of iextrading symbols
     */
    const SYMBOLS_PATH = '/../../config/z-iextrading-symbs.md';

    /**
     *
     * @var a string of our symbols json directory
     */
    const SYMBOLS_DIR = "/../../blog-back/equities/real-time-60sec/z-raw-equilibrium-estimation";

    /**
     *
     * @var a string of target path and query
     */
    const TARGET_QUERY = "stock/market/batch?symbols=";

    /**
     *
     * @var a string of iextrading base URL
     */
    const BASE_URL = "https://api.iextrading.com/1.0/";

    /**
     *
     * @var a string of iextrading end point
     */
    const END_POINT = "&types=quote,chart&range=1m&last=10";

    /**
     *
     * @var an integer for maximum number of stocks per URL on each call
     */

    //***************** A ********************** //
    // public static function getSymbols() {
    //     return array_map(function($line){ return str_getcsv($line, "\t"); }, file(__DIR__ . self::SYMBOLS_PATH));
    // }

    public static function getSymbols() {
        //***************** START: ALL SYMBOLS ARRAY ********************** //
        // var: is a filename path directory, where there is an md file with list of equities
        $list_of_equities_file = __DIR__ . self::SYMBOLS_PATH;

        // var: is content of md file with list of equities
        $content_of_equities = file_get_contents($list_of_equities_file);

        // var is an array(3) of equities such as: string(4) "ZYNE", string(10) "2019-01-04", string(27) "ZYNERBA PHARMACEUTICALS INC"
        // $symbols_array=preg_split('/\r\n|\r|\n/', $content_of_equities);
        $symbols_array = preg_split('/\R/', $content_of_equities);
        //***************** END: ALL SYMBOLS ARRAY ********************** //

        // child and mother arrays are created to help calling equities in batches of 100, which seems to be the API limit.
        $child = array();
        $mother = array();

        // var: is 100 counter
        $limit_counter = self::NUMBER_OF_STOCKS_PER_REQUEST;
        foreach ($symbols_array as $ticker_arr) {
            $limit_counter = $limit_counter - 1;
            $symbols_array = preg_split('/\t/', $ticker_arr);
            array_push($child, $symbols_array);
            if ($limit_counter <= 0) {
                $limit_counter = self::NUMBER_OF_STOCKS_PER_REQUEST;
                array_push($mother, $child);
                $child = array();
            }
        }

        return $mother;
    }

    public static function allEquitiesSignleJSON() {
        $equity_arrays = EquityRecords::getSymbols();
        $base_url = self::BASE_URL . self::TARGET_QUERY;
        $current_time = date("Y-m-d-H-i-s");
        $all_equities = array();

        // ticker: AAPL, GE, AMD
        foreach ($equity_arrays as $ticker_arr) {
            $ticker = array_column($ticker_arr, 0);
            $equity_url = $base_url . implode("%2C", $ticker) . self::END_POINT;
            $raw_eauity_json = file_get_contents($equity_url);
            $raw_equity_array = json_decode($raw_eauity_json, true);
            $all_equities = array_merge($all_equities, $raw_equity_array);
        }

        $all_equities_json = json_encode($all_equities);
        $symbols_dir = __DIR__ . self::SYMBOLS_DIR;
        if (!is_dir($symbols_dir)) {mkdir($symbols_dir, 0755, true);}

        $raw_equity_file = $symbols_dir . "/" . $current_time . ".json";
        $fp = fopen($raw_equity_file, "x+");
        fwrite($fp, $all_equities_json);
        fclose($fp);
        echo "YAAAY! Equity JSON file success at " . __METHOD__ . " ! 💚 " . self::NEW_LINE;
    }
}

SectorMovers

date_default_timezone_set("UTC");
ini_set('max_execution_time', 0);
ini_set('memory_limit', '-1');
set_time_limit(0);

require_once __DIR__ . "/EquityRecords.php";

SectorMovers::getSectors();

class SectorMovers {

    /**
     *
     * @var a string of iextrading base URL
     */
    const BASE_URL = "https://api.iextrading.com/1.0/";

    /**
     *
     * @var a string of target path and query
     */
    const TARGET_QUERY = "stock/market/batch?symbols=";

    /**
     *
     * @var a string for backend path for every sector
     */
    const EACH_SECTOR_DIR_PREFIX = "/../../blog-back/sectors/real-time-60sec/z-raw-sector-";

    /**
     *
     * @var a string for backend path for index sector
     */
    const INDEX_SECTOR_DIR_PREFIX = "/../../blog-back/sectors/real-time-60sec/y-index/";

    /**
     *
     * @var a string for live data path
     */
    const LIVE_DATA_DIR = "/../../../public_html/blog/files/";
    const DIR_FRONT_SECTOR_COEF_FILENAME = "s-1.txt"; // Filename that records sector coefficient JSON

    public static function getSectors() {
        $base_url = self::BASE_URL . self::TARGET_QUERY;
        $current_time = date("Y-m-d-H-i-s");
        $permission = 0755;
        $index_data = array("Overall" => array("sector_weight" => 1, "sector_coefficient" => 1, "sector_value" => 0));
        $sector_movers = SectorMovers::iexSectorParams();

        foreach ($sector_movers as $sector_mover) {
            // $sector_url = $base_url . implode(",", array_keys($sector_mover["selected_tickers"])) . "&types=quote&range=1m";
            $sector_url = $base_url . implode("%2C", array_keys($sector_mover["selected_tickers"])) . "&types=quote&range=1m";
            $rawSectorJson = file_get_contents($sector_url);
            $raw_sector_array = json_decode($rawSectorJson, true);

            // ******************* Back Data ***************** //
            // Write the raw file in the back directories
            // $rawSectorDir = __DIR__ . self::EACH_SECTOR_DIR_PREFIX . $sector_mover["directory"];

            // // if back directory not exist
            // if (!is_dir($rawSectorDir)) {mkdir($rawSectorDir, $permission, true);}

            // // create and open/write/close sector data to back directories
            // $rawSectorFile = $rawSectorDir . "/" . $current_time . ".json";
            // $fp = fopen($rawSectorFile, "a+");
            // fwrite($fp, $rawSectorJson);
            // fclose($fp);
            // ******************* End Back Data ***************** //

            // Calculate the real-time index
            $index_value = 0;
            foreach ($raw_sector_array as $ticker => $ticker_stats) {
                if (isset($sector_mover["selected_tickers"][$ticker], $ticker_stats["quote"], $ticker_stats["quote"]["extendedChangePercent"], $ticker_stats["quote"]["changePercent"], $ticker_stats["quote"]["ytdChange"])) {
                    $change_amount = ($ticker_stats["quote"]["extendedChangePercent"] + $ticker_stats["quote"]["changePercent"] + $ticker_stats["quote"]["ytdChange"]) / 200;
                    $index_value += $sector_mover["sector_weight"] * $sector_mover["selected_tickers"][$ticker] * $change_amount;
                }
            }

            $index_data[$sector_mover["sector"]] = array("sector_weight" => $sector_mover["sector_weight"], "sector_coefficient" => $sector_mover["sector_coefficient"], "sector_value" => $index_value);
            $index_data["Overall"]["sector_value"] += $index_data[$sector_mover["sector"]]["sector_value"];
        }

        // Calculate the index factor for better visibility between -1 and +1
        $front_index_data = array();
        foreach ($index_data as $sector_name => $sector_index_data) {
            // $index_sign = $sector_index_data["sector_value"];
            // if ($index_sign < 0) {
            //     $index_sign = - $index_sign;
            // }
            $index_sign = abs($sector_index_data["sector_value"]);
            $index_factor = 1;
            for ($i = 0; $i <= 10; $i++) {
                $index_factor = pow(10, $i);
                if (($index_factor * $index_sign) > 1) {
                    $index_factor = pow(10, $i - 1);
                    break;
                }
            }

            // $index_factor = 10 ** strlen(preg_match('~\.\K0+~', $float, $zeros) ? $zeros[0] : 0);
            $front_index_data[$sector_name] = $sector_index_data["sector_weight"] * $sector_index_data["sector_coefficient"] * $sector_index_data["sector_value"] * $index_factor;
        }

        // Write the index file
        $index_sector_dir = __DIR__ . self::INDEX_SECTOR_DIR_PREFIX;
        if (!is_dir($index_sector_dir)) {mkdir($index_sector_dir, $permission, true);}

        $index_sector_file = $index_sector_dir . $current_time . ".json";
        $index_sector_json = json_encode($front_index_data, JSON_FORCE_OBJECT);
        $fp = fopen($index_sector_file, "a+");
        fwrite($fp, $index_sector_json);
        fclose($fp);

        $sector_dir = __DIR__ . self::LIVE_DATA_DIR;
        if (!is_dir($sector_dir)) {mkdir($sector_dir, $permission, true);} // if data directory did not exist

        // if s-1 file did not exist
        if (!file_exists($sector_dir . self::DIR_FRONT_SECTOR_COEF_FILENAME)) {
            $handle = fopen($sector_dir . self::DIR_FRONT_SECTOR_COEF_FILENAME, "wb");
            fwrite($handle, "d");
            fclose($handle);
        }

        $sector_coef_file = $sector_dir . self::DIR_FRONT_SECTOR_COEF_FILENAME;
        copy($index_sector_file, $sector_coef_file);
        echo "YAAAY! " . __METHOD__ . " updated sector coefficients successfully 💚!\n";

        return $front_index_data;
    }

    public static function iexSectorParams() {
        $sector_movers = array(
            array(
                "sector" => "IT",
                "directory" => "information-technology",
                "sector_weight" => 0.18,
                "sector_coefficient" => 4,
                "selected_tickers" => array(
                    "AAPL" => 0.18, "AMZN" => 0.16, "GOOGL" => 0.14, "IBM" => 0.2, "MSFT" => 0.1,
                    "FB" => 0.1, "NFLX" => 0.08, "ADBE" => 0.06, "CRM" => 0.04, "NVDA" => 0.02,
                ),
            ),
            array(
                "sector" => "Telecommunication",
                "directory" => "telecommunication-services",
                "sector_weight" => 0.12,
                "sector_coefficient" => 4,
                "selected_tickers" => array(
                    "VZ" => 0.18, "CSCO" => 0.16, "CMCSA" => 0.14, "T" => 0.12, "CTL" => 0.1,
                    "CHTR" => 0.1, "S" => 0.08, "DISH" => 0.06, "USM" => 0.04, "VOD" => 0.02,
                ),
            ),
            array(
                "sector" => "Finance",
                "directory" => "financial-services",
                "sector_weight" => 0.1,
                "sector_coefficient" => 6,
                "selected_tickers" => array(
                    "JPM" => 0.18, "GS" => 0.16, "V" => 0.14, "BAC" => 0.12, "AXP" => 0.1,
                    "WFC" => 0.1, "USB" => 0.08, "PNC" => 0.06, "AMG" => 0.04, "AIG" => 0.02,
                ),
            ),
            array(
                "sector" => "Energy",
                "directory" => "energy",
                "sector_weight" => 0.1,
                "sector_coefficient" => 6,
                "selected_tickers" => array(
                    "CVX" => 0.18, "XOM" => 0.16, "APA" => 0.14, "COP" => 0.12, "BHGE" => 0.1,
                    "VLO" => 0.1, "APC" => 0.08, "ANDV" => 0.06, "OXY" => 0.04, "HAL" => 0.02,
                ),
            ),
            array(
                "sector" => "Industrials",
                "directory" => "industrials",
                "sector_weight" => 0.08,
                "sector_coefficient" => 8,
                "selected_tickers" => array(
                    "CAT" => 0.18, "FLR" => 0.16, "GE" => 0.14, "JEC" => 0.12, "JCI" => 0.1,
                    "MAS" => 0.1, "FLS" => 0.08, "AAL" => 0.06, "AME" => 0.04, "CHRW" => 0.02,
                ),
            ),
            array(
                "sector" => "Materials and Chemicals",
                "directory" => "materials-and-chemicals",
                "sector_weight" => 0.08,
                "sector_coefficient" => 8,
                "selected_tickers" => array(
                    "DWDP" => 0.18, "APD" => 0.16, "EMN" => 0.14, "ECL" => 0.12, "FMC" => 0.1,
                    "LYB" => 0.1, "MOS" => 0.08, "NEM" => 0.06, "PPG" => 0.04, "MLM" => 0.02,
                ),
            ),
            array(
                "sector" => "Utilities",
                "directory" => "utilities",
                "sector_weight" => 0.08,
                "sector_coefficient" => 8,
                "selected_tickers" => array(
                    "PPL" => 0.18, "PCG" => 0.16, "SO" => 0.14, "WEC" => 0.12, "PEG" => 0.1,
                    "XEL" => 0.1, "D" => 0.08, "NGG" => 0.06, "NEE" => 0.04, "PNW" => 0.02,
                ),
            ),
            array(
                "sector" => "Consumer Discretionary",
                "directory" => "consumer-discretionary",
                "sector_weight" => 0.08,
                "sector_coefficient" => 8,
                "selected_tickers" => array(
                    "DIS" => 0.18, "HD" => 0.16, "BBY" => 0.14, "CBS" => 0.12, "CMG" => 0.1,
                    "MCD" => 0.1, "GPS" => 0.08, "HOG" => 0.06, "AZO" => 0.04, "EXPE" => 0.02,
                ),
            ),
            array(
                "sector" => "Consumer Staples",
                "directory" => "consumer-staples",
                "sector_weight" => 0.06,
                "sector_coefficient" => 8,
                "selected_tickers" => array(
                    "PEP" => 0.18, "PM" => 0.16, "PG" => 0.14, "MNST" => 0.12, "TSN" => 0.1,
                    "CPB" => 0.1, "HRL" => 0.08, "SJM" => 0.06, "CAG" => 0.04, "KHC" => 0.02,
                ),
            ),
            array(
                "sector" => "Defense",
                "directory" => "defense-and-aerospace",
                "sector_weight" => 0.04,
                "sector_coefficient" => 10,
                "selected_tickers" => array(
                    "BA" => 0.18, "LMT" => 0.16, "UTX" => 0.14, "NOC" => 0.12, "HON" => 0.1,
                    "RTN" => 0.1, "TXT" => 0.08, "LLL" => 0.06, "COL" => 0.04, "GD" => 0.02,
                ),
            ),
            array(
                "sector" => "Health",
                "directory" => "health-care-and-pharmaceuticals",
                "sector_weight" => 0.04,
                "sector_coefficient" => 10,
                "selected_tickers" => array(
                    "UNH" => 0.18, "JNJ" => 0.16, "PFE" => 0.14, "UHS" => 0.12, "AET" => 0.1,
                    "RMD" => 0.1, "TMO" => 0.08, "MRK" => 0.06, "ABT" => 0.04, "LLY" => 0.02,
                ),
            ),
            array(
                "sector" => "Real Estate",
                "directory" => "real-estate",
                "sector_weight" => 0.04,
                "sector_coefficient" => 10,
                "selected_tickers" => array(
                    "CCI" => 0.18, "AMT" => 0.16, "AVB" => 0.14, "HCP" => 0.12, "RCL" => 0.1,
                    "HST" => 0.1, "NCLH" => 0.08, "HLT" => 0.06, "ARE" => 0.04, "AIV" => 0.02,
                ),
            ),
        );

        return $sector_movers;
    }
}

Acknowledgment

I'd like to thank these users for being so kind and helpful; I was able to implement some of their advice in the code.

\$\endgroup\$
3
  • 1
    \$\begingroup\$@Emma, can you change the title to represent the business requirement rather than what you are doing? E.g. "Creating Equity Dump".\$\endgroup\$
    – JaDogg
    Commented Mar 21, 2019 at 16:10
  • 1
    \$\begingroup\$The first thing you need to do is discover what the bottlenecks are. Have you tried profiling your code? Common bottlenecks are slow APIs, too many database calls or file accesses, inefficient loops, or simply bad code/algorithms. There's way too much code in your project for me to dive into, but I think you can probably find the bottlenecks yourself. Once located, you need to find a way to eliminate them, if possible.\$\endgroup\$ Commented Mar 31, 2019 at 23:16
  • 1
    \$\begingroup\$So you're doing a slow calculation and updating lots of files every minute? And how often are those files read on average? If it's less than once per minute, you might have a good case for generating the data on demand instead.\$\endgroup\$ Commented Apr 1, 2019 at 8:00

1 Answer

2
+50
\$\begingroup\$

Overall the methods look a bit too long. In this presentation about cleaning up code Rafael Dohms talks about limiting the indentation level to one per method and keeping methods to ~15 lines or less. (see the slides here).
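As a rough illustration of that advice, here is a minimal sketch of how the batching and URL-building steps of EquityRecords could be pulled into small single-purpose methods. The class name EquityRecordsSketch and the method names toBatches() and buildBatchUrl() are hypothetical; only the constants are copied from the question.

```php
<?php
// Hypothetical sketch: small, single-purpose helpers with one indentation level each.
final class EquityRecordsSketch
{
    const NUMBER_OF_STOCKS_PER_REQUEST = 100;
    const BASE_URL = 'https://api.iextrading.com/1.0/';
    const TARGET_QUERY = 'stock/market/batch?symbols=';
    const END_POINT = '&types=quote,chart&range=1m&last=10';

    // Splits a flat list of tickers into request-sized batches.
    public static function toBatches(array $tickers): array
    {
        return array_chunk($tickers, self::NUMBER_OF_STOCKS_PER_REQUEST);
    }

    // Builds the batch URL for one group of tickers.
    public static function buildBatchUrl(array $tickers): string
    {
        return self::BASE_URL . self::TARGET_QUERY . implode('%2C', $tickers) . self::END_POINT;
    }
}

// Usage: one URL per batch of (at most) 100 tickers.
$urls = array_map(
    ['EquityRecordsSketch', 'buildBatchUrl'],
    EquityRecordsSketch::toBatches(['AAPL', 'GE', 'AMD'])
);
```

Note that array_chunk() also returns a final batch that is smaller than the chunk size, so no tickers are left out when the list length is not a multiple of 100.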

Either you didn't comprehend it, or you didn't want to heed the advice of the first section of my answer to your first question. You don't need to have an instance of the EQ class that holds values that come from the static methods. You could simply call the static methods wherever the properties of that instance are currently used. For example, in the static method EQ::getEquilibriums() the symbols are used like this:

foreach ($class_obj->symbols as $symb => $arr) { 

Instead of utilizing $class_obj->symbols, just call EQ::getSymbols(). The result could be stored in a local variable if it needs to be used multiple times within a method/function.

foreach (self::getSymbols() as $symb => $arr) { 

Notice that this example uses the keyword self instead of EQ. This is a shortcut that can be used when accessing methods and static properties on the same class - see this example in the documentation for the scope resolution operator.

The same is true for the other methods called by that method - e.g. EQ::getCharts(). EQ::getOverallCoef() can just call EQ::getSectors() to get the sectors. And those methods can store fetched data the first time in static variables instead of re-fetching data on subsequent calls.
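A minimal sketch of that caching idea follows, assuming a static local variable is an acceptable place to keep the data. EqSketch, loadSymbols() and the $loads counter are hypothetical stand-ins and not part of the real EQ class.

```php
<?php
// Hypothetical stand-in for EQ: the real getSymbols() reads a symbols file.
final class EqSketch
{
    /** @var int Counts how often the expensive load runs (for demonstration only). */
    public static $loads = 0;

    public static function getSymbols(): array
    {
        // A static local variable keeps its value between calls.
        static $symbols = null;
        if ($symbols === null) {
            self::$loads++;
            $symbols = self::loadSymbols();
        }
        return $symbols;
    }

    // Placeholder for the real (slow) file or API read.
    private static function loadSymbols(): array
    {
        return ['AAPL' => ['sector' => 'IT'], 'GE' => ['sector' => 'Industrials']];
    }
}

foreach (EqSketch::getSymbols() as $symb => $arr) {
    echo $symb, PHP_EOL;
}
EqSketch::getSymbols(); // served from the cached value, no second load
```

The same pattern would apply to the sector and chart getters, so that repeated calls within one cron run do not re-read files or re-query the API.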

There shouldn't be a need to create that new EQ() object and pass it to the methods. So this line:

EQ::getEquilibriums(new EQ()); 

should be updated like this:

EQ::getEquilibriums(); 

If you need to check whether any of those helper methods returns nothing (i.e. the following check at the end of the EQ constructor):

if ($this->symbols == null || $this->sector == null || $this->overall == null || $this->emojis == null) { 

then check for each case in the respective getter method and consider throwing an exception if appropriate.
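One possible shape for such a getter is sketched below. EqGetterSketch and its $path parameter are hypothetical; the real getter would use the class's own path constant, and the choice of RuntimeException is only an assumption.

```php
<?php
final class EqGetterSketch
{
    // Hypothetical getter: $path stands in for the real symbols file location.
    public static function getSymbols(string $path): array
    {
        $lines = is_readable($path)
            ? file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
            : false;
        // Fail loudly here instead of checking for null in a constructor later.
        if ($lines === false || $lines === []) {
            throw new RuntimeException("No symbols could be read from {$path}");
        }
        return $lines;
    }
}

try {
    EqGetterSketch::getSymbols('/nonexistent/symbols.md');
} catch (RuntimeException $e) {
    echo 'Caught: ', $e->getMessage(), PHP_EOL;
}
```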


The array returned by SectorMovers::iexSectorParams() could be declared as a constant and the method can be removed.
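For illustration, a class constant holding the array might look like the sketch below. Array-valued class constants are available since PHP 5.6. The class name SectorMoversSketch and the constant name SECTOR_PARAMS are hypothetical, and only two abbreviated sectors are shown.

```php
<?php
final class SectorMoversSketch
{
    // Abbreviated: two sectors with two tickers each, values copied from the question.
    const SECTOR_PARAMS = [
        [
            'sector' => 'IT',
            'directory' => 'information-technology',
            'sector_weight' => 0.18,
            'sector_coefficient' => 4,
            'selected_tickers' => ['AAPL' => 0.18, 'AMZN' => 0.16],
        ],
        [
            'sector' => 'Energy',
            'directory' => 'energy',
            'sector_weight' => 0.1,
            'sector_coefficient' => 6,
            'selected_tickers' => ['CVX' => 0.18, 'XOM' => 0.16],
        ],
    ];
}

// The loop in getSectors() could then iterate over the constant directly.
foreach (SectorMoversSketch::SECTOR_PARAMS as $sector_mover) {
    echo $sector_mover['sector'], ': ',
        implode(',', array_keys($sector_mover['selected_tickers'])), PHP_EOL;
}
```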


The following three lines in EquityRecords::allEquitiesSignleJSON():

$fp = fopen($raw_equity_file, "x+"); fwrite($fp, $all_equities_json); fclose($fp); 

should likely be replaceable with a call to file_put_contents().

You pointed me to the SO question with this accepted answer which claims "the fwrite() is a smidgen faster." and cites this article. I would be curious if that is still the case in PHP 7. I will research this.
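A sketch of that replacement is shown below, with one caveat: mode "x+" makes fopen() fail when the file already exists, whereas file_put_contents() overwrites by default, so an explicit guard is needed to keep the same behavior. The path and payload here are hypothetical placeholders.

```php
<?php
// Hypothetical path and payload; the real code builds these from constants.
$raw_equity_file = sys_get_temp_dir() . '/equity-sketch-' . uniqid() . '.json';
$all_equities_json = json_encode(['AAPL' => ['quote' => ['latestPrice' => 1.0]]]);

// Guard against overwriting, mirroring what mode "x+" enforced.
if (file_exists($raw_equity_file)) {
    throw new RuntimeException("Refusing to overwrite {$raw_equity_file}");
}

// One call opens, writes and closes; LOCK_EX takes an exclusive lock while writing.
$bytes = file_put_contents($raw_equity_file, $all_equities_json, LOCK_EX);
if ($bytes === false) {
    throw new RuntimeException("Could not write {$raw_equity_file}");
}
```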


You could also consider using explode() instead of preg_split() if it works - depending on the delimiter. Refer to answers to _In PHP, which is faster: preg_split or explode?_ for more information.

Tip: If you don't need the power of regular expressions, you can choose faster (albeit simpler) alternatives like explode() or str_split(). [1]

[1] https://www.php.net/manual/en/function.preg-split.php#refsect1-function.preg-split-notes
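As a small sketch of where the swap is straightforward and where it is not: splitting on a single tab gives the same result with either function, while '/\R/' matches several kinds of line break and has no single-delimiter equivalent. The sample strings below are hypothetical but follow the symbol/date/name shape described in the question.

```php
<?php
// Hypothetical tab-separated line in the shape described in the question.
$line = "ZYNE\t2019-01-04\tZYNERBA PHARMACEUTICALS INC";

$with_regex = preg_split('/\t/', $line);
$with_explode = explode("\t", $line); // same parts, no regex engine involved

// Line splitting is different: '/\R/' matches \n, \r\n and \r, so a plain
// explode("\n", ...) is only equivalent if the file uses \n line endings.
$content = "AAPL\t2019-01-04\tAPPLE INC\nGE\t2019-01-04\tGENERAL ELECTRIC CO";
$lines = explode("\n", $content);
```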

\$\endgroup\$