Thanos use case - Any ideas on this scale? #4872
Replies: 3 comments
Hi, in case Slack didn't answer, here are my thoughts: I can't say I've worked at that scale with Thanos, but I have with other software. Remote write is certainly the better approach compared to the sidecar. In general, I can't say whether Thanos will scale well at that size; as mentioned, the compactor is a bottleneck that can't easily be fixed as of today. My suggestion above is based on experience with Thanos and other software at large scale, but I don't claim it will work for sure. It would be great to know how you solved this in the end.
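For reference, the remote-write path usually means running Thanos Receive as the write endpoint in front of object storage. A minimal sketch of such an invocation (addresses, paths, and the bucket config file are placeholders, not a tested deployment):

```shell
# Thanos Receive: accepts Prometheus remote_write and uploads TSDB blocks
# to object storage. --objstore.config-file points at a bucket config
# (e.g. S3 endpoint/credentials); the paths here are hypothetical.
thanos receive \
  --grpc-address=0.0.0.0:10901 \
  --remote-write.address=0.0.0.0:19291 \
  --tsdb.path=/var/thanos/receive \
  --tsdb.retention=1d \
  --label='receive_replica="0"' \
  --objstore.config-file=/etc/thanos/bucket.yml
```

Each Prometheus instance would then point its `remote_write` URL at `http://<receive-host>:19291/api/v1/receive`.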
Did you test Thanos on your infrastructure? I am currently deploying Thanos and need some feedback about using Thanos Store / Receiver / bucket / ... Thanks :)
Hi, can you share your Thanos use case? We are currently studying Thanos.
Hello,
We are looking at using Thanos. We have the following environment/use case and want to know whether Thanos would be a good fit.
- There will be around 27,000 devices across different sites for which metrics are scraped.
- These devices are scraped by a total of 3,000-5,000 Prometheus instances.
- Metrics ingestion into Thanos will be via remote write.
- The total number of metrics will be around 5 million per minute.
- Data will be stored in S3.
- Data will be retained for three years and must remain queryable.
- We will be using a containerized environment (Cloud Foundry, with Amazon EKS as another option under consideration).
- Current limitations in CF: max 16 GB memory per container, max 20 GB disk space per container.
- Containers will have a fixed open file limit of 16k.
- We will run the different Thanos components as separate apps in Cloud Foundry.
Question to the community: is Thanos a viable solution given the above requirements and limitations? Does anyone have experience using Thanos in a similar infrastructure or deployment, and at this scale?
What are your thoughts on the data retention requirement and the ability to query three years of data?
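On the three-year retention question, a common pattern is to let the Compactor downsample and keep raw data for a shorter window while retaining the 5m and 1h resolutions longer, which keeps long-range queries tractable. A sketch of the relevant flags (the concrete retention windows below are illustrative choices, not recommendations, and the paths are placeholders):

```shell
# Thanos Compactor: downsampling is on by default; retention is set per resolution.
# Example: raw data kept 90 days, 5-minute downsampled data 1 year,
# 1-hour downsampled data 3 years.
thanos compact \
  --wait \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --retention.resolution-raw=90d \
  --retention.resolution-5m=1y \
  --retention.resolution-1h=3y
```

Note that only one Compactor instance should run against a given bucket, which is part of why it becomes a bottleneck at scale.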
What would be the best way to set up Thanos Store, taking into account the limits mentioned above (disk space and open file limits)? We have done some initial testing and are now sharding data into two-week windows. Is there another way to go about this? Needing separate stores that each shard two weeks of data will mean a lot of stores across three years of data (150+ when taking HA into account).
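For what it's worth, the two-week sharding described above maps onto the Store Gateway's time-based partitioning flags. A sketch of one shard (the timestamps and paths are placeholders):

```shell
# One Store Gateway shard serving only a fixed two-week window of blocks.
# --min-time/--max-time accept RFC3339 timestamps or relative durations
# (e.g. --min-time=-6w for a rolling recent-data shard).
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --min-time=2021-10-01T00:00:00Z \
  --max-time=2021-10-15T00:00:00Z
```

Since older data is downsampled, older shards serve far fewer blocks, so they may be able to cover much wider windows than two weeks, reducing the total store count below the 150+ estimate.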
PS: Also posted the same in the Thanos Slack channel.