
Questions tagged [big-data]

2 votes
1 answer
239 views

What is an optimal system design for tracking product views per user that is scalable?

I have a web application that contains products and users. There are 10,000+ products and 100,000+ users, to give a sense of the scale that's required. For some application-specific reasons, I need to ...
asked by kitkat
0 votes
1 answer
94 views

Data file ingestion with MinIO and Kafka

I want to collect a lot of files (file data + metadata) from local servers to a central server. The files are important; I need to ensure that no files are lost. Local servers: implement a collector to ...
asked by kietheros
3 votes
1 answer
978 views

How to store a huge volume of time-series datapoints in an efficient way?

We have an application producing 5k-10k datapoints per second. Each datapoint has more than one metric, alongside its time of creation. We are looking for an efficient, scalable way to store this huge ...
asked by Paul Benn
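For write-heavy time-series workloads like the one above, a common pattern is to buffer datapoints in memory and flush them in time-partitioned batches rather than writing each point individually. A minimal sketch, assuming Python and plain newline-delimited JSON files as a stand-in for a columnar format such as Parquet; `PartitionedWriter`, the hour-partition layout, and the batch size are all illustrative choices, not the asker's actual stack:

```python
import json
import time
from collections import defaultdict
from pathlib import Path

class PartitionedWriter:
    """Buffer datapoints in memory and flush them in batches to
    hour-partitioned newline-delimited JSON files."""

    def __init__(self, root: Path, batch_size: int = 5000):
        self.root = root
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # partition key -> pending rows

    def write(self, ts: float, metrics: dict) -> None:
        # Partition by UTC hour, e.g. "2024/01/15/13".
        key = time.strftime("%Y/%m/%d/%H", time.gmtime(ts))
        buf = self.buffers[key]
        buf.append({"ts": ts, **metrics})
        if len(buf) >= self.batch_size:
            self.flush(key)

    def flush(self, key: str) -> None:
        part_dir = self.root / key
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / f"batch-{time.time_ns()}.ndjson"
        with path.open("w") as f:
            for row in self.buffers.pop(key, []):
                f.write(json.dumps(row) + "\n")

    def flush_all(self) -> None:
        for key in list(self.buffers):
            self.flush(key)
```

Batching amortizes per-write overhead, and the hour partitions let later queries prune by time range without scanning everything.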
5 votes
1 answer
1k views

How do you perform accumulation on large data sets and pass the results as a REST API response?

I have around 125 million event records on s3. The s3 bucket structure is: year/month/day/hour/*. Inside each hour directory, we have files for every minute. A typical filename looks like this: ...
asked by Namah
1 vote
0 answers
478 views

How to (simply) architect a way to ingest multiple types of large files, process them, and send data in chunks to web services?

Note: all of this would be in AWS. What would you suggest for building something that: takes in several different input file types (e.g. CSV, JSON, JSONL, XML, .gz, ...) and that can be ...
0 votes
0 answers
63 views

Should aggregated data include metadata?

I want to create an aggregation job that executes a big DB query and flushes the results into BigQuery. My question is: should I include only the IDs of the entities (campaign ID, advertiser ID, user ID) or should ...
asked by Avi L
0 votes
1 answer
95 views

A program design question: is using HDFS from C a good idea for reading large data?

I mainly have three groups of CSV files (each file is divided into several small files): the first group of CSV files totals 600+ GB (maybe 200+ GB if stored as binary ints, since CSV stores digits as characters), ...
asked by heisthere
2 votes
2 answers
3k views

From Oracle to Apache Parquet: how to handle eventual consistency?

I have an existing production Oracle Database. However, there are performance issues for certain kind of operations, because of the volume of the data, or the complexity of queries. That's why I ...
asked by Klun
1 vote
1 answer
882 views

Loading the Date dimension table of a data warehouse

I have a general question about loading data into a data warehouse (DW). This is basically a follow-up to an older question of mine. I have a general understanding problem about filling the [Date] ...
asked by Steffen Mangold
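A Date dimension is usually pre-generated rather than loaded from source data: you enumerate every calendar day in the warehouse's horizon and derive the attributes from the date itself. A minimal sketch in Python; the column names and the `YYYYMMDD` surrogate key are common conventions, assumed here for illustration:

```python
from datetime import date, timedelta

def date_dimension_rows(start: date, end: date):
    """Yield one Date-dimension row per calendar day from start to end, inclusive."""
    d = start
    while d <= end:
        yield {
            "date_key": int(d.strftime("%Y%m%d")),  # surrogate key, e.g. 20240101
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day": d.day,
            "weekday": d.isoweekday(),              # 1 = Monday ... 7 = Sunday
            "is_weekend": d.isoweekday() >= 6,
        }
        d += timedelta(days=1)

# Generate one full year in a single pass; bulk-insert the rows afterwards.
rows = list(date_dimension_rows(date(2024, 1, 1), date(2024, 12, 31)))
```

Because the table is small (a century is under 37,000 rows), it is typically loaded once up front and extended rarely, not refreshed with each ETL run.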
3 votes
2 answers
167 views

Enterprise application warehousing and relational database

I have a general question about design patterns for an enterprise application. I have read a lot about it, but it's actually hard to find an answer because most of what you find is rather about how to design a data ...
asked by Steffen Mangold
3 votes
2 answers
2k views

Aggregation and storage system design for user event processing?

I have an eCommerce-like system which produces 5,000 user events per second (of different kinds, such as product search/product view/profile view). Now, for reporting, business users would like to view the ...
asked by M Sach
1 vote
3 answers
292 views

Query 30 million HTML documents

I have 30-ish million HTML documents in a file system. There is no emergency: the files are in a reasonable directory tree, and it's not breaking the file system. But I'd like to be able to organize and ...
asked by Martin K
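A common first step for querying a large static document collection is to build a metadata index once, so queries never have to walk the file system again. A minimal sketch using SQLite from the Python standard library; the table layout and the idea of indexing only path/size/mtime are illustrative assumptions (a real index would likely also extract titles or full text):

```python
import sqlite3
from pathlib import Path

def build_index(root: Path, db_path: str = "docs.db") -> sqlite3.Connection:
    """Scan the directory tree once and record per-file metadata in SQLite,
    so the collection can be queried without re-walking the file system."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            path  TEXT PRIMARY KEY,
            size  INTEGER,
            mtime REAL
        )""")
    rows = ((str(p), p.stat().st_size, p.stat().st_mtime)
            for p in root.rglob("*.html"))
    conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

Even at 30 million rows this stays a modest single-file database, and ad-hoc SQL replaces repeated multi-hour directory scans.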
0 votes
0 answers
85 views

Generating a fake number for a 25-digit PII number in a file containing millions of rows

I have to expose some sensitive data containing a PII column with a 25-digit number. The rest of the columns aren't PII. This is done so that the data can be safely shared with the larger ...
asked by stormfield
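One standard approach to this kind of problem is keyed pseudonymization: derive the fake number deterministically from the real one with an HMAC, so the same input always maps to the same token but nothing is recoverable without the key. A minimal sketch, assuming Python; note this is a one-way pseudonym, not format-preserving encryption, and `pseudonymize` and the key handling are illustrative:

```python
import hashlib
import hmac

def pseudonymize(pii: str, key: bytes) -> str:
    """Deterministically map a 25-digit PII number to a 25-digit fake number.
    The same input with the same key always yields the same token."""
    digest = hmac.new(key, pii.encode(), hashlib.sha256).digest()
    # Interpret the MAC as a big integer and keep the low 25 decimal digits.
    return str(int.from_bytes(digest, "big") % 10**25).zfill(25)
```

Determinism keeps joins across files intact, while the secret key (which must never ship with the shared data) prevents anyone from reversing or re-deriving the mapping. With 10^25 possible tokens, accidental collisions across millions of rows are astronomically unlikely.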
2 votes
0 answers
29 views

How to design a report processing model using Spark in the most efficient way

I have a reporting system which gets time-series data from numerous meters (here I refer to it as raw_data). I need to generate several reports based on different combinations of the incoming ...
asked by Remis Haroon - رامز
2 votes
2 answers
1k views

Designing a big data web app

How do you design a website that allows users to query a large amount of user data? More specifically: there are ~100 million users with ~100 TB of data; the data is stored in HDFS (not a database); the number ...
asked by Minh Thai
