I've found some resources about PHP performance optimization here and there, but I don't know where to start. I currently insert 130k CSV rows in approximately 7 minutes. My script reads the CSV file and sends $batch_size rows per query, while doing some string operations such as converting dates to the correct format or replacing , with . for the floats.
    $db = new PDO("mysql:host=$db_host;port=$db_port;charset=utf8", "$db_username", "$db_password");
    $csv_path = "path/to/myfile.csv";
    $table = "TableName";

    //Start of the query
    $query = "USE ".$db_name.";";
    $FilePath = $csv_path;
    $is_not_title_row = FALSE; //Useful not to take the first row
    $row_batch_count = 0;      //Counter for batches
    //1 data = 1 part of the row
    $number_of_data = 0;
    $data = '';

    // Open file in read-only mode
    $handle = fopen($FilePath, "r");

    while (($data = fgetcsv($handle, 2000, ";", chr(8))) !== FALSE) {
        $number_of_data = count($data) - 1;

        // Generate MySQL query
        if ($is_not_title_row) {
            $query .= "INSERT INTO ".$table." VALUES (";

            //We read all the data of one row
            for ($c = 0; $c <= $number_of_data; $c++) {
                $data[$c] = utf8_decode($data[$c]);
                $data[$c] = str_replace('/', '-', $data[$c]);
                $data[$c] = str_replace(',', '.', $data[$c]);

                //If it's a date, we convert it to the right date format
                if (DateTime::createFromFormat('d-m-Y', $data[$c]) !== FALSE) {
                    $data[$c] = date("Y-m-d", strtotime($data[$c]));
                    $query .= "'" . mres($data[$c]) . "',";
                }
                //If there is nothing, we send a NULL value
                else if (mres($data[$c]) == NULL) {
                    $query .= "NULL,";
                }
                else {
                    $query .= "'" . mres($data[$c]) . "',";
                }

                //If this is the end of the INSERT INTO we remove the trailing comma
                if ($c == $number_of_data) {
                    $query = substr_replace($query, "", -1);
                }
            }

            $query .= ");";
            $row_batch_count++;

            //If we are at the end of the batch, we send the query then start building another
            if ($row_batch_count == $batch_size) {
                echo "There are ".$row_batch_count." rows sent<br/>";
                echo "<br/><br/>";

                $query_to_execute = $db->prepare($query);
                $query_to_execute->execute();

                $query = "USE ".$db_name.";";
                $row_batch_count = 0;
            }
        }

        //Useful to skip the first (title) row
        if (!$is_not_title_row) {
            $is_not_title_row = TRUE;
        }
    }

    //Without more batches, we execute the query (= we are at the end of the file)
    $query_to_execute = $db->prepare($query);
    $query_to_execute->execute();
    $query_to_execute = NULL;
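The per-field clean-up inside the loop (utf8_decode, the '/' and ',' replacements, the d-m-Y date check and the NULL handling) can be read as one small transformation. Below is a minimal sketch of it as a standalone helper, assuming PHP 7.1+ type hints; normalize_field() is a hypothetical name that is not part of the original script, and escaping with mres() would still happen where the value is placed into the query.

    // Hypothetical helper sketching the per-field clean-up done in the loop above.
    function normalize_field(string $value): ?string
    {
        $value = utf8_decode($value);             // same decoding as the loop above
        $value = str_replace('/', '-', $value);   // e.g. 12/05/2020 -> 12-05-2020
        $value = str_replace(',', '.', $value);   // e.g. 3,14 -> 3.14 for floats

        // If the field parses as a d-m-Y date, convert it to MySQL's Y-m-d format
        $date = DateTime::createFromFormat('d-m-Y', $value);
        if ($date !== false) {
            return $date->format('Y-m-d');
        }

        // Empty fields are reported as null so the caller can emit NULL in the query
        return $value === '' ? null : $value;
    }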
Solution
Instead of doing one INSERT INTO for each row, I simply insert multiple rows in a single INSERT INTO. The resulting query looks like this:
    INSERT INTO Table VALUES (x,x,x),(x,x,x),(x,x,x),(...
The CSV insertion is now 30 times faster 🚀
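A minimal sketch of how such a batch statement can be assembled, assuming the fields have already been cleaned as in the original loop and that mres() is the same escaping wrapper as before; build_batch_insert() is a hypothetical name, not the exact code used:

    // Hypothetical helper: build one multi-row INSERT for a batch of cleaned rows.
    function build_batch_insert(string $table, array $rows): string
    {
        $tuples = [];
        foreach ($rows as $row) {
            $values = [];
            foreach ($row as $field) {
                // NULL for empty fields, quoted/escaped value otherwise
                $values[] = ($field === null || $field === '')
                    ? 'NULL'
                    : "'" . mres($field) . "'";
            }
            $tuples[] = '(' . implode(',', $values) . ')';
        }

        // One statement with many value tuples: INSERT INTO t VALUES (...),(...),(...)
        return 'INSERT INTO ' . $table . ' VALUES ' . implode(',', $tuples) . ';';
    }

    // Usage inside the read loop: collect $batch_size cleaned rows, then send them at once.
    // $db->exec(build_batch_insert($table, $batch));

The gain comes from paying the per-statement overhead (query parsing and the client/server round trip, plus one commit per statement when autocommit is on) once per batch instead of once per row.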
Comments:

$handle, $table, etc.: "there are significant pieces of the core functionality missing, and we need you to fill in the details. Excerpts of large projects are fine, but if you have omitted too much, then reviewers are left imagining how your program works."

chr(8) in fgetcsv(). That looks weird to me. It's also difficult to understand why you process the rows the way you do, without any idea of what the data looks like and what it has to become.
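On the chr(8) point: the enclosure argument of fgetcsv() is ordinarily a quote character, while chr(8) is the backspace control character. A minimal sketch of the conventional call, assuming the fields are double-quoted:

    // fgetcsv($stream, $length, $separator, $enclosure): the usual enclosure is '"',
    // whereas chr(8) is the backspace control character.
    while (($data = fgetcsv($handle, 2000, ";", '"')) !== FALSE) {
        // ... process $data as above ...
    }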