PgQ Generic high-performance queue for PostgreSQL © 2008 by Skype.
Agenda
• Introduction to queuing
• Problems with standard SQL
• Solution by exporting MVCC info
• PgQ architecture and API
• Use-cases
• Future
Queue properties
• Data is created during ordinary transactions
• But we want to process it later
• After it is processed, it's useless
[Diagram: Producer: change_password -> password event; User events; Consumer: mailer]
Queue goals
• High-throughput
  • No locking during writing / reading
  • Parallel writes
  • Batched reads
• Low-latency
  • Data available in reasonably short time
• Robust
  • Returns all events
  • Repeatable reads
Implementing a queue with standard SQL
Standard SQL – row-by-row
• Reading process:
  • Select first unprocessed row
  • Update it as in-progress
  • Later update it as done or delete it.
• High-throughput – NO
• Low-latency – YES
• Robust – YES
Standard SQL – SELECT with LIMIT
• Reading process:
  • Select several unprocessed rows with LIMIT
  • Later delete all of them.
• High-throughput – YES
• Low-latency – YES
• Robust – NO
Standard SQL – rotated tables
• Reading process:
  • Rename current event table
  • Create new empty event table
  • Read renamed table
• High-throughput – YES
• Low-latency – NO
• Robust – YES
Standard SQL – group by nr / date
• Reading process:
  • Request block of events for reading
  • Read them
  • Tag the block of events as done
• High-throughput – YES
• Low-latency – YES
• Robust – NO
No good way to implement a queue with standard SQL
Postgres-specific solution, ideas
• Vadim Mikheev (rserv)
  • We can export internal Postgres visibility info (transaction id / snapshot).
• Jan Wieck (Slony-I)
  • If we have 2 snapshots, we can query events that happened between them.
  • “Agreeable order” – order taken from sequence in AFTER trigger
Postgres-specific solution, PgQ improvements
• Optimized querying that tolerates long transactions
• Optimized rotation: the time when a query is run on both the old and new table is minimal (long tx problem)
• 64-bit stable external transaction IDs
• Simple architecture – pull-only readers
• Queue component is generic
Postgres-specific solution, MVCC basics
• Transaction IDs (txid) are assigned sequentially
• Transactions can be open for a variable amount of time; their operations should be invisible for that time
• A snapshot represents a point in time – it divides txids into visible ones and invisible ones
Postgres-specific solution, details
• Event log table:
  • (ev_txid, ev_data)
• Tick table where snapshots are stored
  • (tick_id, tick_snapshot)
• Result:
  • High-performance – YES
  • Low-latency – YES
  • Robust – YES
Postgres-specific solution – Snapshot basics
• Xmin – lowest transaction ID in progress
• Xmax – first unassigned transaction ID
• Xip – list of transaction IDs in progress
• txid_visible_in_snapshot(txid, snap) =
    txid < snap.xmin OR ( txid < snap.xmax AND txid NOT IN (snap.xip) )
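The visibility rule above can be tried out without a database. The following is a minimal Python sketch; the `Snapshot` class and the sample transaction IDs are hypothetical stand-ins for a real `txid_snapshot` value, not part of PgQ.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory stand-in for a txid_snapshot value."""
    xmin: int             # lowest transaction ID still in progress
    xmax: int             # first unassigned transaction ID
    xip: FrozenSet[int]   # transaction IDs in progress when the snapshot was taken


def visible_in_snapshot(txid: int, snap: Snapshot) -> bool:
    """Apply the rule from the slide to a single transaction ID."""
    if txid < snap.xmin:
        return True
    return txid < snap.xmax and txid not in snap.xip


# Sample snapshot: transactions 100 and 103 were still running, 105 was not assigned yet.
snap = Snapshot(xmin=100, xmax=105, xip=frozenset({100, 103}))
visibility = {txid: visible_in_snapshot(txid, snap)
              for txid in (99, 100, 101, 103, 104, 105)}
```

In this example 99, 101 and 104 come out visible, while 100 and 103 (in progress) and 105 (not yet assigned) do not.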
Postgres-specific solution – Core API
• Current transaction details:
  • txid_current(): int8
  • txid_current_snapshot(): txid_snapshot
• Snapshot components:
  • txid_snapshot_xmin(snap): int8
  • txid_snapshot_xmax(snap): int8
  • txid_snapshot_xip(snap): SETOF int8
• Visibility check:
  • txid_visible_in_snapshot(txid, snap): bool
Query between snapshots
Query between snapshots – Simple version
• Snapshot 1 – xmin1, xmax1, xip1
• Snapshot 2 – xmin2, xmax2, xip2
• Query (is_visible is short for txid_visible_in_snapshot):
    SELECT * FROM queue
     WHERE ev_txid BETWEEN xmin1 AND xmax2
       AND NOT is_visible(ev_txid, snap1)
       AND is_visible(ev_txid, snap2)
• Index scan between xmin1 and xmax2
Query between snapshots – optimized version
• Query must be done in 2 parts – range scan and list of explicit ids
    SELECT * FROM queue
     WHERE ( ev_txid IN (xip1)
             OR ( ev_txid BETWEEN xmax1 AND xmax2 ) )
       AND NOT is_visible(ev_txid, snap1)
       AND is_visible(ev_txid, snap2)
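To see that the simple and the optimized query select the same events, both filters can be modelled in Python over an in-memory list. This is only a sketch: the `Snapshot` tuple, the sample snapshots and the event list are made up for illustration, and `is_visible` follows the visibility rule from the snapshot basics slide.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    xmin: int
    xmax: int
    xip: FrozenSet[int]


Event = Tuple[int, str]  # (ev_txid, ev_data)


def is_visible(txid: int, snap: Snapshot) -> bool:
    return txid < snap.xmin or (txid < snap.xmax and txid not in snap.xip)


def batch_simple(events: List[Event], snap1: Snapshot, snap2: Snapshot) -> List[Event]:
    # One wide range scan from xmin1 to xmax2, then the two visibility checks.
    return [ev for ev in events
            if snap1.xmin <= ev[0] <= snap2.xmax
            and not is_visible(ev[0], snap1) and is_visible(ev[0], snap2)]


def batch_optimized(events: List[Event], snap1: Snapshot, snap2: Snapshot) -> List[Event]:
    # Explicit list (xip1) plus a narrow range scan from xmax1 to xmax2.
    return [ev for ev in events
            if (ev[0] in snap1.xip or snap1.xmax <= ev[0] <= snap2.xmax)
            and not is_visible(ev[0], snap1) and is_visible(ev[0], snap2)]


snap1 = Snapshot(xmin=10, xmax=20, xip=frozenset({10, 15}))
snap2 = Snapshot(xmin=15, xmax=30, xip=frozenset({15, 25}))
events = [(txid, "event from tx %d" % txid) for txid in (5, 10, 12, 15, 22, 25, 31)]

simple = batch_simple(events, snap1, snap2)
optimized = batch_optimized(events, snap1, snap2)
```

With this sample data both versions return the events of transactions 10 and 22: transaction 10 was in progress at snapshot 1 and finished by snapshot 2, and transaction 22 started and finished between the two snapshots.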
Query between snapshots – more optimizations
• More optimizations
  • Pick txids that were actually committed
  • Decrease explicit list by accumulating nearby ones into range scan
• Final notes:
  • The values must be substituted literally into the final query; Postgres is not able to plan a parametrized query.
  • PgQ itself uses UNION ALL instead of OR. But OR seems to work at least on 8.3.
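The idea of folding nearby explicit txids into range scans can be sketched as follows. The grouping rule (`max_gap`) is an assumption made for illustration; the slides do not say which heuristic PgQ actually uses.

```python
from typing import Iterable, List, Tuple


def _flush(run: List[int], ranges: List[Tuple[int, int]], explicit: List[int]) -> None:
    # A run of several nearby txids becomes one range, a lone txid stays explicit.
    if len(run) > 1:
        ranges.append((run[0], run[-1]))
    else:
        explicit.append(run[0])


def compact_txids(txids: Iterable[int], max_gap: int = 10) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split txids into (ranges, explicit_list); max_gap is a hypothetical tuning knob."""
    ranges: List[Tuple[int, int]] = []
    explicit: List[int] = []
    run: List[int] = []
    for txid in sorted(set(txids)):
        if run and txid - run[-1] > max_gap:
            _flush(run, ranges, explicit)
            run = []
        run.append(txid)
    if run:
        _flush(run, ranges, explicit)
    return ranges, explicit


ranges, explicit = compact_txids([100, 101, 104, 500, 900, 903], max_gap=5)
```

Here `ranges` is `[(100, 104), (900, 903)]` and `explicit` is `[500]`. A range may cover txids that were not in the original list (102 and 103 here), so the visibility checks against both snapshots still have to be applied to rows found by a range scan.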
Query between snapshots – helper function
• All complexity can be put into helper function
    SELECT range_start, range_end, explicit_list
      FROM txid_query_helper(snap1, snap2);
• This results in query:
    SELECT * FROM queue
     WHERE ev_txid IN (explicit_list)
        OR ( ev_txid BETWEEN range_start AND range_end
             AND NOT is_visible(ev_txid, snap1)
             AND is_visible(ev_txid, snap2) )
Take a deep breath. There is PgQ.
PgQ architecture
• Ticker (pgqadm.py -d config.ini ticker)
  • Inserts ticks – per-queue snapshots
  • Vacuums tables
  • Rotates tables
  • Re-inserts retry events
• Event Producers
  • pgq.insert_event()
  • pgq.sqltriga() / pgq.logutriga()
• Event Consumers
  • Need to register
  • Poll for batches
PgQ event structure

    CREATE TABLE pgq.event (
        ev_id     int8 NOT NULL,
        ev_txid   int8 NOT NULL DEFAULT txid_current(),
        ev_time   timestamptz NOT NULL DEFAULT now(),
        -- rest are user fields --
        ev_type   text,  -- what to expect from ev_data
        ev_data   text,  -- main data, urlenc, xml, json
        ev_extra1 text,  -- metadata
        ev_extra2 text,  -- metadata
        ev_extra3 text,  -- metadata
        ev_extra4 text   -- metadata
    );
    CREATE INDEX txid_idx ON pgq.event (ev_txid);
PgQ ticker
• Reads event id sequence for each queue.
• If new events have appeared, then inserts tick if:
  • Configurable amount of events have appeared: ticker_max_count (500)
  • Configurable amount of time has passed from last tick: ticker_max_lag (3 sec)
• If no events in the queue, creates tick if some time has passed.
  • ticker_idle_period (60 sec)
• Configuring from command line:
  • pgqadm.py ticker.ini config my_queue ticker_max_count=100
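The ticker rules above boil down to a small decision function. The sketch below is an illustration only: the function name, its arguments and the use of `>=` at the thresholds are assumptions, while the default values are the ones quoted on the slide.

```python
def should_tick(new_events: int,
                seconds_since_last_tick: float,
                max_count: int = 500,        # ticker_max_count
                max_lag: float = 3.0,        # ticker_max_lag, in seconds
                idle_period: float = 60.0    # ticker_idle_period, in seconds
                ) -> bool:
    """Decide whether a new tick should be inserted for one queue."""
    if new_events > 0:
        # Busy queue: tick on enough events or on enough elapsed time.
        return new_events >= max_count or seconds_since_last_tick >= max_lag
    # Idle queue: tick only after the idle period has passed.
    return seconds_since_last_tick >= idle_period


decisions = [
    should_tick(500, 0.1),   # many events, tick immediately
    should_tick(10, 1.0),    # few events, too early
    should_tick(10, 3.0),    # few events, lag reached
    should_tick(0, 30.0),    # idle, too early
    should_tick(0, 60.0),    # idle period reached
]
```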
PgQ API: event insertion
• Single event insertion:
  • pgq.insert_event(queue, ev_type, ev_data): int8
• Bulk insertion, in single transaction:
  • pgq.current_event_table(queue): text
• Inserting with triggers:
  • pgq.sqltriga(queue, ...) – partial SQL format
  • pgq.logutriga(queue, ...) – urlencoded format
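Single event insertion from a Python producer could look roughly like this. It is a sketch under assumptions: `curs` is any DB-API cursor that uses the `%s` parameter style (as psycopg2 does), and the `insert_event` and `password_event` helpers are hypothetical names, not part of PgQ. Only the SQL function `pgq.insert_event(queue, ev_type, ev_data)` is taken from the slide.

```python
from urllib.parse import urlencode


def insert_event(curs, queue: str, ev_type: str, ev_data: str) -> int:
    """Insert one event inside the caller's transaction and return its event id."""
    curs.execute("select pgq.insert_event(%s, %s, %s)", [queue, ev_type, ev_data])
    return curs.fetchone()[0]


def password_event(username: str) -> str:
    """Build an urlencoded payload for a hypothetical 'password changed' event."""
    return urlencode({"username": username})
```

The event becomes part of the producer's own transaction, so the caller commits or rolls back as usual, for example `insert_event(curs, 'user_events', 'password', password_event('bob'))` followed by a commit on the connection.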
PgQ API: insert complex event with pure SQL
• Create a table and attach the trigger (column types added here so the statement is valid):
    CREATE TABLE queue.some_event (col1 text, col2 text);
    CREATE TRIGGER some_trg BEFORE INSERT ON queue.some_event
        FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('dstqueue', 'SKIP');
• Plain insert works:
    INSERT INTO queue.some_event (col1, col2) VALUES ('value1', 'value2');
• Type safety, default values, sequences, constraints!
• Several tables can insert into the same queue.
PgQ API: reading events
• Registering
  • pgq.register_consumer(queue, consumer)
  • pgq.unregister_consumer(queue, consumer)
• Reading
  • pgq.next_batch(queue, consumer): int8
  • pgq.get_batch_events(batch_id): SETOF record
  • pgq.finish_batch(batch_id)
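The three reading calls are meant to be used in sequence. Below is a minimal sketch of one polling step; it assumes a DB-API cursor with the `%s` parameter style (as in psycopg2), that `pgq.next_batch` returns NULL when no batch is ready, and that `select * from pgq.get_batch_events(...)` is accepted as written. `process_one_batch` and `handle_event` are hypothetical names.

```python
def process_one_batch(curs, queue, consumer, handle_event):
    """Fetch and process one batch; return its batch_id, or None if nothing is ready."""
    curs.execute("select pgq.next_batch(%s, %s)", [queue, consumer])
    batch_id = curs.fetchone()[0]
    if batch_id is None:
        return None  # nothing to do yet, the caller should sleep and poll again

    curs.execute("select * from pgq.get_batch_events(%s)", [batch_id])
    for row in curs.fetchall():
        handle_event(row)

    # Mark the batch as done; the caller commits the transaction afterwards.
    curs.execute("select pgq.finish_batch(%s)", [batch_id])
    return batch_id
```

The consumer must have been registered once with `pgq.register_consumer(queue, consumer)` before the first call. A real consumer would call this function in a loop and commit after each finished batch.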
Remote event tracking
• Async operation allows coordinating work between several databases.
• Occasionally data itself allows tracking:
  • e.g. delete order.
• If not, then explicit tracking is needed.
  • pgq_ext module.
• Tracking can happen in multiple databases.
Tracking events
• Per-event overhead
• Need to avoid accumulating
• pgq_ext solution
  • pgq_ext.is_event_done(consumer, batch_id, ev_id)
  • pgq_ext.set_event_done(consumer, batch_id, ev_id)
  • If batch changes, deletes old events
• E.g. email sender, plproxy.
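The behaviour described above can be illustrated with a toy in-memory model. This is not the pgq_ext implementation, only a sketch of the semantics listed on the slide: an event is remembered as done for the current batch, and remembered events are dropped once a new batch_id shows up. The `EventTracker` class and the `send_once` helper are hypothetical.

```python
class EventTracker:
    """Toy model of per-event tracking; state is kept per consumer."""

    def __init__(self):
        self._state = {}  # consumer -> (batch_id, set of done ev_ids)

    def is_event_done(self, consumer, batch_id, ev_id):
        cur = self._state.get(consumer)
        return cur is not None and cur[0] == batch_id and ev_id in cur[1]

    def set_event_done(self, consumer, batch_id, ev_id):
        cur = self._state.get(consumer)
        if cur is None or cur[0] != batch_id:
            # Batch changed: forget the events of the old batch.
            cur = (batch_id, set())
            self._state[consumer] = cur
        cur[1].add(ev_id)


tracker = EventTracker()
sent = []


def send_once(consumer, batch_id, ev_id):
    # Skip events that were already handled in an earlier attempt at this batch.
    if not tracker.is_event_done(consumer, batch_id, ev_id):
        sent.append(ev_id)
        tracker.set_event_done(consumer, batch_id, ev_id)


for ev_id in (1, 2, 2, 1, 3):   # the batch is partly replayed
    send_once("mailer", 10, ev_id)
```

After the loop `sent` is `[1, 2, 3]`: the replayed events 2 and 1 are not sent a second time.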
Tracking batches
• Minimal per-event overhead
• Requires that the whole batch is processed in one TX
  • pgq_ext.is_batch_done(consumer, batch_id)
  • pgq_ext.set_batch_done(consumer, batch_id)
• E.g. replication, most of the Skytools partitioning scripts.
Use-case: row counter for count(*) speedup

    import sys
    import pgq

    class RowCounter(pgq.Consumer):
        def process_batch(self, db, batch_id, ev_list):
            tbl = self.cf.get('table_name')
            delta = 0
            # count inserts and deletes on the watched table
            for ev in ev_list:
                if ev.type == 'I' and ev.extra1 == tbl:
                    delta += 1
                elif ev.type == 'D' and ev.extra1 == tbl:
                    delta -= 1
                ev.tag_done()
            q = 'select update_stats(%s, %s)'
            db.cursor().execute(q, [tbl, delta])

    RowCounter('row_counter', 'db', sys.argv[1:]).start()

Config file:

    [row_counter]
    db = ...
    pgq_queue_name = ...
    table_name = ...
    job_name = ...
    logfile = ...
    pidfile = ...
Use-case: copy queue to different database

    import sys
    import pgq

    class QueueMover(pgq.RemoteConsumer):
        def process_remote_batch(self, db, batch_id, ev_list, dst_db):
            # prepare data
            rows = []
            for ev in ev_list:
                rows.append([ev.type, ev.data, ev.time])
                ev.tag_done()

            # insert data
            fields = ['ev_type', 'ev_data', 'ev_time']
            curs = dst_db.cursor()
            dst_queue = self.cf.get('dst_queue_name')
            pgq.bulk_insert_events(curs, rows, fields, dst_queue)

    script = QueueMover('queue_mover', 'src_db', 'dst_db', sys.argv[1:])
    script.start()
Use-case: email sender
• Non-transactional, so need to track event-by-event
• Needs to commit at each event
Use-case: replication (Londiste)
• Per-batch tracking on remote side
• COPY as a parallel consumer
  • Register, then start COPY
  • If COPY finishes, applies events from queue for that table
  • Then gives it over to main consumer
• Example session:
    $ ed replic.ini; ed ticker.ini
    $ londiste.py replic.ini provider install
    $ londiste.py replic.ini subscriber install
    $ pgqadm.py -d ticker.ini ticker
    $ londiste.py -d replic.ini replay
    $ londiste.py replic.ini provider add table1 table2 ...
    $ londiste.py replic.ini subscriber add table1 table2 ...
Future: cascaded queues
• The goal is to have an exact copy of the queue on several nodes, so a reader can freely switch between them.
• Exact means tick_id + events. For simplicity the txids and snapshots are not carried over.
• To allow consumers to randomly switch between nodes, a global horizon is kept. Each node has a main worker that sends its lowest tick_id to the provider. The worker on the master node sends the global lowest tick_id to the queue, where each worker can see it.
• Such a design allows workers to care only about 2 nodes.
• Fancy stuff: merging of plproxy partitions.
Questions?
PgQ queue info table

    create table pgq.queue (
        queue_id                    serial,
        queue_name                  text not null,
        queue_ntables               integer not null default 3,
        queue_cur_table             integer not null default 0,
        queue_rotation_period       interval not null default '2 hours',
        queue_ticker_max_count      integer not null default 500,
        queue_ticker_max_lag        interval not null default '3 seconds',
        queue_ticker_idle_period    interval not null default '1 minute',
        queue_data_pfx              text not null,
        queue_event_seq             text not null,
        queue_tick_seq              text not null
    );

Kaya Weers
 
WebMethods to MuleSoft Migration: Seamless API Integration
WebMethods to MuleSoft Migration: Seamless API IntegrationWebMethods to MuleSoft Migration: Seamless API Integration
WebMethods to MuleSoft Migration: Seamless API Integration
Prowess Software Services Inc
 
The History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
The History of Artificial Intelligence: From Ancient Ideas to Modern AlgorithmsThe History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
The History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
isoftreview8
 
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdfDoctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
davidandersonofficia
 
Presentation Session 5 Transition roadmap.pdf
Presentation Session 5 Transition roadmap.pdfPresentation Session 5 Transition roadmap.pdf
Presentation Session 5 Transition roadmap.pdf
Mukesh Kala
 
Beginners: Radio Frequency, Band and Spectrum (V3)
Beginners: Radio Frequency, Band and Spectrum (V3)Beginners: Radio Frequency, Band and Spectrum (V3)
Beginners: Radio Frequency, Band and Spectrum (V3)
3G4G
 
Next Generation of Developer by Ben Hicks
Next Generation of Developer by Ben HicksNext Generation of Developer by Ben Hicks
Next Generation of Developer by Ben Hicks
gdgcincy
 
real time ai agent examples | AI agent development
real time ai agent examples | AI agent developmentreal time ai agent examples | AI agent development
real time ai agent examples | AI agent development
ybobbyyoung
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
BrainSell Technologies
 
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruptionFault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Karri Huhtanen
 
"Smarter, Faster, Autonomous: A Deep Dive into Agentic AI & Digital Agents"
"Smarter, Faster, Autonomous: A Deep Dive into Agentic AI & Digital Agents""Smarter, Faster, Autonomous: A Deep Dive into Agentic AI & Digital Agents"
"Smarter, Faster, Autonomous: A Deep Dive into Agentic AI & Digital Agents"
panktiskywinds12
 
Assuring Your SD-WAN to Deliver Unparalleled Digital Experiences
Assuring Your SD-WAN to Deliver Unparalleled Digital ExperiencesAssuring Your SD-WAN to Deliver Unparalleled Digital Experiences
Assuring Your SD-WAN to Deliver Unparalleled Digital Experiences
ThousandEyes
 
The Gold Jacket Journey - How I passed 12 AWS Certs without Burning Out (and ...
The Gold Jacket Journey - How I passed 12 AWS Certs without Burning Out (and ...The Gold Jacket Journey - How I passed 12 AWS Certs without Burning Out (and ...
The Gold Jacket Journey - How I passed 12 AWS Certs without Burning Out (and ...
VictorSzoltysek
 
Microsoft Power Platform in 2025_Piyush Gupta_.pptx
Microsoft Power Platform in 2025_Piyush Gupta_.pptxMicrosoft Power Platform in 2025_Piyush Gupta_.pptx
Microsoft Power Platform in 2025_Piyush Gupta_.pptx
Piyush Gupta
 
Leading a High-Stakes Database Migration
Leading a High-Stakes Database MigrationLeading a High-Stakes Database Migration
Leading a High-Stakes Database Migration
ScyllaDB
 
Managing Changing Data with FME: Part 2 – Flexible Approaches to Tracking Cha...
Managing Changing Data with FME: Part 2 – Flexible Approaches to Tracking Cha...Managing Changing Data with FME: Part 2 – Flexible Approaches to Tracking Cha...
Managing Changing Data with FME: Part 2 – Flexible Approaches to Tracking Cha...
Safe Software
 
EIS-Manufacturing-AI–Product-Data-Optimization-Webinar-2025.pptx
EIS-Manufacturing-AI–Product-Data-Optimization-Webinar-2025.pptxEIS-Manufacturing-AI–Product-Data-Optimization-Webinar-2025.pptx
EIS-Manufacturing-AI–Product-Data-Optimization-Webinar-2025.pptx
Earley Information Science
 
Artificial Intelligence (AI) Security, Attack Vectors, Defense Techniques, Et...
Artificial Intelligence (AI) Security, Attack Vectors, Defense Techniques, Et...Artificial Intelligence (AI) Security, Attack Vectors, Defense Techniques, Et...
Artificial Intelligence (AI) Security, Attack Vectors, Defense Techniques, Et...
Salman Baset
 
Autopilot for Everyone Series - Session 3: Exploring Real-World Use Cases
Autopilot for Everyone Series - Session 3: Exploring Real-World Use CasesAutopilot for Everyone Series - Session 3: Exploring Real-World Use Cases
Autopilot for Everyone Series - Session 3: Exploring Real-World Use Cases
UiPathCommunity
 
A11y Webinar Series - Level Up Your Accessibility Game_ A11y Audit, WCAG, and...
A11y Webinar Series - Level Up Your Accessibility Game_ A11y Audit, WCAG, and...A11y Webinar Series - Level Up Your Accessibility Game_ A11y Audit, WCAG, and...
A11y Webinar Series - Level Up Your Accessibility Game_ A11y Audit, WCAG, and...
Julia Undeutsch
 
Design pattern talk by Kaya Weers - 2025
Design pattern talk by Kaya Weers - 2025Design pattern talk by Kaya Weers - 2025
Design pattern talk by Kaya Weers - 2025
Kaya Weers
 
WebMethods to MuleSoft Migration: Seamless API Integration
WebMethods to MuleSoft Migration: Seamless API IntegrationWebMethods to MuleSoft Migration: Seamless API Integration
WebMethods to MuleSoft Migration: Seamless API Integration
Prowess Software Services Inc
 
The History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
The History of Artificial Intelligence: From Ancient Ideas to Modern AlgorithmsThe History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
The History of Artificial Intelligence: From Ancient Ideas to Modern Algorithms
isoftreview8
 
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdfDoctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
Doctronic's 5M Seed Funding Pioneering AI-Powered Healthcare Solutions.pdf
davidandersonofficia
 
Presentation Session 5 Transition roadmap.pdf
Presentation Session 5 Transition roadmap.pdfPresentation Session 5 Transition roadmap.pdf
Presentation Session 5 Transition roadmap.pdf
Mukesh Kala
 
Beginners: Radio Frequency, Band and Spectrum (V3)
Beginners: Radio Frequency, Band and Spectrum (V3)Beginners: Radio Frequency, Band and Spectrum (V3)
Beginners: Radio Frequency, Band and Spectrum (V3)
3G4G
 
Next Generation of Developer by Ben Hicks
Next Generation of Developer by Ben HicksNext Generation of Developer by Ben Hicks
Next Generation of Developer by Ben Hicks
gdgcincy
 
real time ai agent examples | AI agent development
real time ai agent examples | AI agent developmentreal time ai agent examples | AI agent development
real time ai agent examples | AI agent development
ybobbyyoung
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
BrainSell Technologies
 
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruptionFault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Fault-tolerant, distrbuted AAA architecture supporting connectivity disruption
Karri Huhtanen
 

PgQ Generic high-performance queue for PostgreSQL

  • 1. PgQ Generic high-performance queue for PostgreSQL © 2008 by Skype.
  • 2. Agenda
      • Introduction to queuing
      • Problems with standard SQL
      • Solution by exporting MVCC info
      • PgQ architecture and API
      • Use-cases
      • Future
  • 3. Queue properties
      • Data is created during ordinary transactions
      • But we want to process it later
      • After it is processed, it's useless
      • Diagram: Producer (change_password) -> password event -> User events -> Consumer (mailer)
  • 4. Queue goals
      • High-throughput: no locking during writing / reading, parallel writes, batched reads
      • Low-latency: data available in reasonably short time
      • Robust: returns all events, repeatable reads
  • 5. Implementing a queue with standard SQL
  • 6. Standard SQL - row-by-row
      • Reading process:
          1. Select first unprocessed row
          2. Update it as in-progress
          3. Later update it as done or delete it
      • High-throughput: NO
      • Low-latency: YES
      • Robust: YES
  • 7. Standard SQL – SELECT with LIMIT
      • Reading process:
          1. Select several unprocessed rows with LIMIT
          2. Later delete all of them
      • High-throughput: YES
      • Low-latency: YES
      • Robust: NO
  • 8. Standard SQL – rotated tables
      • Reading process:
          1. Rename current event table
          2. Create new empty event table
          3. Read renamed table
      • High-throughput: YES
      • Low-latency: NO
      • Robust: YES
  • 9. Standard SQL – group by nr / date
      • Reading process:
          1. Request block of events for reading
          2. Read them
          3. Tag the block of events as done
      • High-throughput: YES
      • Low-latency: YES
      • Robust: NO
  • 10. No good way to implement queue with standard SQL
  • 11. Postgres-specific solution, ideas
      • Vadim Mikheev (rserv): we can export internal Postgres visibility info (transaction id / snapshot).
      • Jan Wieck (Slony-I): if we have 2 snapshots, we can query events that happened between them.
      • “Agreeable order”: order taken from a sequence in an AFTER trigger
  • 12. Postgres-specific solution, PgQ improvements
      • Optimized querying that tolerates long transactions
      • Optimized rotation: the time during which a query is run on both the old and the new table is minimal (long tx problem)
      • 64-bit stable external transaction IDs
      • Simple architecture: pull-only readers
      • Queue component is generic
  • 13. Postgres-specific solution, MVCC basics
      • Transaction IDs (txid) are assigned sequentially
      • Transactions can be open for a variable amount of time; their operations should be invisible for that time
      • A snapshot represents a point in time: it divides txids into visible ones and invisible ones
  • 14. Postgres-specific solution, details
      • Event log table: (ev_txid, ev_data)
      • Tick table where snapshots are stored: (tick_id, tick_snapshot)
      • Result:
          High-performance: YES
          Low-latency: YES
          Robust: YES
  • 15. Postgres-specific solution – Snapshot basics
      • Xmin: lowest transaction ID in progress
      • Xmax: first unassigned transaction ID
      • Xip: list of transaction IDs in progress
      • Visibility rule:
            txid_visible_in_snapshot(txid, snap) =
                txid < snap.xmin
                OR ( txid < snap.xmax AND txid NOT IN (snap.xip) )
  • 16. Postgres-specific solution – Core API
      • Current transaction details:
            txid_current(): int8
            txid_current_snapshot(): txid_snapshot
      • Snapshot components:
            txid_snapshot_xmin(snap): int8
            txid_snapshot_xmax(snap): int8
            txid_snapshot_xip(snap): SETOF int8
      • Visibility check:
            txid_visible_in_snapshot(txid, snap): bool
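The visibility rule from the snapshot slides can be modelled in a few lines of Python. This is only an illustrative sketch: the `Snapshot` class and its field names are assumptions that mirror the slide's xmin / xmax / xip terms, and the function is a plain re-statement of the slide's formula, not PostgreSQL's implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory model of a txid snapshot (names follow the slide)."""
    xmin: int                        # lowest transaction ID still in progress
    xmax: int                        # first unassigned transaction ID
    xip: FrozenSet[int] = frozenset()  # transaction IDs in progress


def txid_visible_in_snapshot(txid: int, snap: Snapshot) -> bool:
    """Re-statement of the slide's rule:
    txid < xmin OR (txid < xmax AND txid NOT IN xip)."""
    return txid < snap.xmin or (txid < snap.xmax and txid not in snap.xip)


if __name__ == "__main__":
    snap = Snapshot(xmin=100, xmax=105, xip=frozenset({100, 103}))
    for txid in (99, 100, 101, 103, 104, 105):
        print(txid, txid_visible_in_snapshot(txid, snap))
```

With this model, a txid below xmin is always visible, a txid at or above xmax never is, and anything in between is visible only if it is not listed in xip.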
  • 17. Query between snapshots
  • 18. Query between snapshots – Simple version
      • Snapshot 1: xmin1, xmax1, xip1
      • Snapshot 2: xmin2, xmax2, xip2
      • Query:
            SELECT * FROM queue
             WHERE ev_txid BETWEEN xmin1 AND xmax2
               AND NOT is_visible(ev_txid, snap1)
               AND is_visible(ev_txid, snap2)
      • Index scan between xmin1 and xmax2
  • 19. Query between snapshots – optimized version
      • Query must be done in 2 parts: range scan and list of explicit ids
            SELECT * FROM queue
             WHERE ( ev_txid IN (xip1)
                     OR ( ev_txid BETWEEN xmax1 AND xmax2 ) )
               AND NOT is_visible(ev_txid, snap1)
               AND is_visible(ev_txid, snap2)
  • 20. Query between snapshots – more optimizations
      • More optimizations:
          Pick txids that were actually committed
          Decrease the explicit list by accumulating nearby ones into the range scan
      • Final notes:
          The values must be substituted literally into the final query; Postgres is not able to plan a parametrized query.
          PgQ itself uses UNION ALL instead of OR. But OR seems to work at least on 8.3.
  • 21. Query between snapshots – helper function
      • All complexity can be put into a helper function:
            SELECT range_start, range_end, explicit_list
              FROM txid_query_helper(snap1, snap2);
      • This results in the query:
            SELECT * FROM queue
             WHERE ev_txid IN (explicit_list)
                OR ( ev_txid BETWEEN range_start AND range_end
                     AND NOT is_visible(ev_txid, snap1)
                     AND is_visible(ev_txid, snap2) )
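To see why the simple and optimized "between snapshots" queries select the same events, both WHERE clauses can be replayed against a list of txids in Python. This is a sketch under stated assumptions: `Snapshot`, `between_simple` and `between_optimized` are hypothetical names, the list stands in for the ev_txid column, and no index or planner behaviour is modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory snapshot: xmin, xmax and in-progress txids."""
    xmin: int
    xmax: int
    xip: FrozenSet[int] = frozenset()


def is_visible(txid: int, snap: Snapshot) -> bool:
    # Same rule as on the snapshot basics slide.
    return txid < snap.xmin or (txid < snap.xmax and txid not in snap.xip)


def between_simple(txids: Iterable[int], snap1: Snapshot, snap2: Snapshot) -> List[int]:
    # Simple version: one range from xmin1 to xmax2, then both visibility checks.
    return [t for t in txids
            if snap1.xmin <= t <= snap2.xmax
            and not is_visible(t, snap1) and is_visible(t, snap2)]


def between_optimized(txids: Iterable[int], snap1: Snapshot, snap2: Snapshot) -> List[int]:
    # Optimized version: explicit list (xip1) plus the shorter range xmax1..xmax2.
    return [t for t in txids
            if (t in snap1.xip or snap1.xmax <= t <= snap2.xmax)
            and not is_visible(t, snap1) and is_visible(t, snap2)]


if __name__ == "__main__":
    snap1 = Snapshot(xmin=100, xmax=105, xip=frozenset({100, 103}))
    snap2 = Snapshot(xmin=103, xmax=110, xip=frozenset({103, 108}))
    txids = [98, 100, 101, 103, 104, 105, 107, 108, 109]
    print(between_simple(txids, snap1, snap2))
    print(between_optimized(txids, snap1, snap2))
```

The optimized form narrows the candidate set: anything invisible in snapshot 1 is either in xip1 or at or above xmax1, so a long-running transaction that holds xmin1 back no longer widens the range scan.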
  • 22. Take a deep breath. There is PgQ.
  • 23. PgQ architecture
      • Ticker (pgqadm.py -d config.ini ticker)
          Inserts ticks (per-queue snapshots)
          Vacuums tables
          Rotates tables
          Re-inserts retry events
      • Event producers
          pgq.insert_event()
          pgq.sqltriga() / pgq.logutriga()
      • Event consumers
          Need to register
          Poll for batches
  • 24. PgQ event structure
        CREATE TABLE pgq.event (
            ev_id     int8 NOT NULL,
            ev_txid   int8 NOT NULL DEFAULT txid_current(),
            ev_time   timestamptz NOT NULL DEFAULT now(),
            -- rest are user fields --
            ev_type   text,  -- what to expect from ev_data
            ev_data   text,  -- main data, urlenc, xml, json
            ev_extra1 text,  -- metadata
            ev_extra2 text,  -- metadata
            ev_extra3 text,  -- metadata
            ev_extra4 text   -- metadata
        );
        CREATE INDEX txid_idx ON pgq.event (ev_txid);
  • 25. PgQ ticker
      • Reads the event id sequence for each queue.
      • If new events have appeared, inserts a tick if either:
          a configurable number of events has appeared: ticker_max_count (500)
          a configurable amount of time has passed since the last tick: ticker_max_lag (3 sec)
      • If there are no events in the queue, creates a tick once some time has passed: ticker_idle_period (60 sec)
      • Configuring from the command line:
            pgqadm.py ticker.ini config my_queue ticker_max_count=100
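The tick decision described on the ticker slide can be written as a small predicate. This is a minimal sketch, not PgQ's actual ticker code: `should_tick` and its parameter names are hypothetical, and only the three thresholds and their defaults (500 events, 3 sec, 60 sec) come from the slide.

```python
def should_tick(new_events: int,
                seconds_since_last_tick: float,
                max_count: int = 500,       # ticker_max_count
                max_lag: float = 3.0,       # ticker_max_lag, in seconds
                idle_period: float = 60.0   # ticker_idle_period, in seconds
                ) -> bool:
    """Decide whether a new tick should be inserted for a queue."""
    if new_events > 0:
        # Events are waiting: tick when enough have piled up or enough time has passed.
        return new_events >= max_count or seconds_since_last_tick >= max_lag
    # Idle queue: tick only occasionally.
    return seconds_since_last_tick >= idle_period


if __name__ == "__main__":
    print(should_tick(new_events=500, seconds_since_last_tick=0.1))
    print(should_tick(new_events=10, seconds_since_last_tick=1.0))
    print(should_tick(new_events=0, seconds_since_last_tick=60.0))
```

Lowering max_count or max_lag trades more ticks (and smaller batches) for lower latency.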
  • 26. PgQ API: event insertion
      • Single event insertion:
            pgq.insert_event(queue, ev_type, ev_data): int8
      • Bulk insertion, in a single transaction:
            pgq.current_event_table(queue): text
      • Inserting with triggers:
            pgq.sqltriga(queue, ...) - partial SQL format
            pgq.logutriga(queue, ...) - urlencoded format
  • 27. PgQ API: insert complex event with pure SQL
            CREATE TABLE queue.some_event (col1, col2);
            CREATE TRIGGER some_trg BEFORE INSERT ON queue.some_event
                FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('dstqueue', 'SKIP');
      • Plain insert works:
            INSERT INTO queue.some_event(col1, col2) VALUES ('value1', 'value2');
      • Type safety, default values, sequences, constraints!
      • Several tables can insert into the same queue.
  • 28. PgQ API: reading events
      • Registering
            pgq.register_consumer(queue, consumer)
            pgq.unregister_consumer(queue, consumer)
      • Reading
            pgq.next_batch(queue, consumer): int8
            pgq.get_batch_events(batch_id): SETOF record
            pgq.finish_batch(batch_id)
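The three reading calls are typically used in sequence: ask for a batch, fetch its events, then close the batch. The sketch below shows that order against any DB-API style cursor. It is an illustration only: `consume_one_batch` and `handler` are hypothetical names, the cursor is assumed to use `%s` placeholders (as psycopg2 does), and connecting, registering the consumer and committing are left to the caller.

```python
from typing import Any, Callable, Optional


def consume_one_batch(cursor: Any,
                      queue: str,
                      consumer: str,
                      handler: Callable[[Any], None]) -> Optional[int]:
    """Process at most one batch; return its id, or None if no batch was available."""
    # 1. Ask for the next batch; a NULL result means there is nothing to read yet.
    cursor.execute("select pgq.next_batch(%s, %s)", [queue, consumer])
    batch_id = cursor.fetchone()[0]
    if batch_id is None:
        return None

    # 2. Fetch every event in the batch and hand each row to the caller's handler.
    cursor.execute("select * from pgq.get_batch_events(%s)", [batch_id])
    for row in cursor.fetchall():
        handler(row)

    # 3. Mark the batch as finished so the next call moves on.
    cursor.execute("select pgq.finish_batch(%s)", [batch_id])
    return batch_id
```

A caller would run this in a loop, committing after each finished batch and sleeping briefly whenever it returns None.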
  • 29. Remote event tracking
      • Async operation allows coordinating work between several databases.
      • Occasionally the data itself allows tracking, e.g. deleting an order.
      • If not, then explicit tracking is needed: the pgq_ext module.
      • Tracking can happen in multiple databases.
  • 30. Tracking events
      • Per-event overhead
      • Need to avoid accumulating
      • pgq_ext solution:
            pgq_ext.is_event_done(consumer, batch_id, ev_id)
            pgq_ext.set_event_done(consumer, batch_id, ev_id)
      • If the batch changes, deletes old events
      • E.g. email sender, plproxy.
  • 31. Tracking batches
      • Minimal per-event overhead
      • Requires that the whole batch is processed in one TX
            pgq_ext.is_batch_done(consumer, batch_id)
            pgq_ext.set_batch_done(consumer, batch_id)
      • E.g. replication, most of the Skytools partitioning scripts.
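Per-batch tracking makes re-delivery of a batch harmless: the target side remembers which batch it last applied and skips it if it is seen again. The following in-memory sketch illustrates the idea only. `BatchTracker` and `apply_batch` are hypothetical names, storing a single last batch id per consumer is an assumption, and in a real database the event changes and the "done" marker would be written in the same transaction.

```python
from typing import Any, Callable, Dict, Iterable


class BatchTracker:
    """Hypothetical stand-in for the is_batch_done / set_batch_done pair."""

    def __init__(self) -> None:
        self._last_done: Dict[str, int] = {}  # consumer name -> last applied batch id

    def is_batch_done(self, consumer: str, batch_id: int) -> bool:
        return self._last_done.get(consumer) == batch_id

    def set_batch_done(self, consumer: str, batch_id: int) -> None:
        self._last_done[consumer] = batch_id


def apply_batch(tracker: BatchTracker,
                consumer: str,
                batch_id: int,
                events: Iterable[Any],
                apply_event: Callable[[Any], None]) -> bool:
    """Apply a batch once; return False if it had already been applied."""
    if tracker.is_batch_done(consumer, batch_id):
        return False  # re-delivered batch: skip it entirely
    for ev in events:
        apply_event(ev)
    tracker.set_batch_done(consumer, batch_id)
    return True


if __name__ == "__main__":
    applied = []
    tracker = BatchTracker()
    print(apply_batch(tracker, "replica", 1, ["e1", "e2"], applied.append))
    print(apply_batch(tracker, "replica", 1, ["e1", "e2"], applied.append))
    print(applied)
```

Per-event tracking follows the same pattern with one marker per event, which is why the slides call out its higher overhead.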
  • 32. Use-case: row counter for count(*) speedup
        import sys
        import pgq

        class RowCounter(pgq.Consumer):
            def process_batch(self, db, batch_id, ev_list):
                tbl = self.cf.get('table_name')
                delta = 0
                # count inserts and deletes for the configured table
                for ev in ev_list:
                    if ev.type == 'I' and ev.extra1 == tbl:
                        delta += 1
                    elif ev.type == 'D' and ev.extra1 == tbl:
                        delta -= 1
                    ev.tag_done()
                # apply the net change once per batch
                q = 'select update_stats(%s, %s)'
                db.cursor().execute(q, [tbl, delta])

        RowCounter('row_counter', 'db', sys.argv[1:]).start()

      • Config file:
        [row_counter]
        db = ...
        pgq_queue_name = ...
        table_name = ...
        job_name = ...
        logfile = ...
        pidfile = ...
  • 33. Use-case: copy queue to different database
        import sys
        import pgq

        class QueueMover(pgq.RemoteConsumer):
            def process_remote_batch(self, db, batch_id, ev_list, dst_db):
                # prepare data
                rows = []
                for ev in ev_list:
                    rows.append([ev.type, ev.data, ev.time])
                    ev.tag_done()

                # insert data
                fields = ['ev_type', 'ev_data', 'ev_time']
                curs = dst_db.cursor()
                dst_queue = self.cf.get('dst_queue_name')
                pgq.bulk_insert_events(curs, rows, fields, dst_queue)

        script = QueueMover('queue_mover', 'src_db', 'dst_db', sys.argv[1:])
        script.start()
  • 34. Use-case: email sender
      • Non-transactional, so need to track event-by-event
      • Needs to commit at each event
  • 35. Use-case: replication (Londiste)
      • Per-batch tracking on remote side
      • COPY as a parallel consumer
          Register, then start COPY
          If COPY finishes, applies events from queue for that table
          Then gives it over to main consumer
      • Example session:
            $ ed replic.ini; ed ticker.ini
            $ londiste.py replic.ini provider install
            $ londiste.py replic.ini subscriber install
            $ pgqadm.py -d ticker.ini ticker
            $ londiste.py -d replic.ini replay
            $ londiste.py replic.ini provider add table1 table2 ...
            $ londiste.py replic.ini subscriber add table1 table2 ...
  • 36. Future: cascaded queues
      • The goal is to have an exact copy of the queue on several nodes, so that a reader can freely switch between them.
      • Exact means tick_id + events. For simplicity, the txids and snapshots are not carried over.
      • To allow consumers to switch between nodes at will, a global horizon is kept. Each node has a main worker that sends its lowest tick_id to its provider. The worker on the master node sends the global lowest tick_id to the queue, where each worker can see it.
      • This design allows workers to care about only 2 nodes.
      • Fancy stuff: merging of plproxy partitions.
  • 37. Questions?
  • 38. PgQ queue info table
        create table pgq.queue (
            queue_id                  serial,
            queue_name                text not null,
            queue_ntables             integer not null default 3,
            queue_cur_table           integer not null default 0,
            queue_rotation_period     interval not null default '2 hours',
            queue_ticker_max_count    integer not null default 500,
            queue_ticker_max_lag      interval not null default '3 seconds',
            queue_ticker_idle_period  interval not null default '1 minute',
            queue_data_pfx            text not null,
            queue_event_seq           text not null,
            queue_tick_seq            text not null
        );