
    Which Cloud Run autoscaling setting should you set if you want to limit cost?



    Is there a minimal time after which a Cloud Run instance isn't scaled down when using always-allocated CPU?


    I currently have a typical Cloud Run service implemented and want to extend it with some asynchronous functionality, which is executed in response to an incoming HTTP request. These asynchronous tasks would take no longer than 5-10 minutes.

    I am now wondering whether Cloud Run services with the "always allocated CPU" option enabled can guarantee a 15-minute window of allocated CPU time after the response to the last request has been sent. I understand that an instance which has not received a request in more than 15 minutes is subject to termination. But does it also hold the other way around?

    I found the following paragraph in the Cloud Run documentation:

    Even if CPU is always allocated, Cloud Run autoscaling is still in effect, and may terminate container instances if they aren't needed to handle incoming traffic. An instance will never stay idle for more than 15 minutes after processing a request (unless it is kept active using min instances).

    (https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation)

    That is the only place in the article where the 15-minute interval is mentioned, though, and the Gantt chart in the article also does not show any fixed amount of guaranteed CPU-allocation time after the last response is sent.

    Is there some guaranteed CPU-time interval after a request?

    Tags: google-cloud-platform, google-cloud-run

    asked May 16 at 20:00 by Lucas Mähn, edited May 17 at 14:05

    2 Answers


    My take from the container contract was that it's best to confine yourself to the request/response flow, but I have mostly been practicing this because it lends itself to an easier time tracing requests.

    While there is nothing that clearly says it is not permitted to use the allotted idle time for out-of-band processing as you intended, it would probably be prudent to use either Cloud Tasks or the new Cloud Run Jobs workload, if that's an option.
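
    For illustration, here is a minimal Node.js sketch of the Cloud Tasks route (the project, location, queue name, and worker URL are placeholders; the worker would be a separate service that performs the 5-10 minute job inside its own request):

    const {CloudTasksClient} = require('@google-cloud/tasks');

    const client = new CloudTasksClient();

    // Placeholder identifiers -- substitute your own.
    const parent = client.queuePath('my-project', 'europe-west1', 'async-work');
    const workerUrl = 'https://my-worker-abc123.a.run.app/process';

    // Enqueue the slow work, then respond to the user immediately.
    async function enqueueTask(payload) {
      const [task] = await client.createTask({
        parent,
        task: {
          httpRequest: {
            httpMethod: 'POST',
            url: workerUrl,
            headers: {'Content-Type': 'application/json'},
            body: Buffer.from(JSON.stringify(payload)).toString('base64'),
          },
        },
      });
      return task.name;
    }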

    answered May 19 at 9:41 by Filip Dupanović

    I guess you are right. Cloud Run Jobs and Cloud Functions gen. 2 (because of concurrency support) + Pub/Sub actually look like they would fit well. I guess that even with always-allocated CPU enabled, the Cloud Run use case is really just for workloads bound to a request/response. It seems like I misunderstood the value of this new option... –

    Lucas Mähn May 22 at 16:25

    This is not possible, per the documentation you provided (and likewise the CPU allocation (services) documentation):

    Note that even if CPU is always allocated, Cloud Run autoscaling is still in effect, and may terminate container instances if they aren't needed to handle incoming traffic. An instance will never stay idle for more than 15 minutes after processing a request unless it is kept active using minimum instances.

    One way of keeping idle instances permanently available is to set min-instances to a value of 1 or more; however, this incurs cost even when the service is not actively handling any requests.

    You can check this documentation about container instance autoscaling for additional information.

    answered May 17 at 4:09 by Robert G, edited May 20 at 5:49

    This does not answer my question, unfortunately. The request timeout refers to the time between a request and its response. I am interested in how long an instance stays active or idle (is not scaled down) after it no longer processes any requests. That would allow asynchronous processing of tasks that are merely triggered by an incoming request; the requesting user does not need to wait for the completion of the asynchronous task, though. –

    Lucas Mähn May 17 at 7:36

    I would also consider it bad practice to have such long response times and a reconnection mechanism as you suggested, since Cloud Run is not designed for stateful applications. –

    Lucas Mähn May 17 at 7:41

    I've updated my answer. Kindly check. –

    Robert G May 20 at 5:49


    Source: stackoverflow.com

    About container instance autoscaling


    In Cloud Run, each revision is automatically scaled to the number of container instances needed to handle all incoming requests or events. When a revision does not receive any traffic, by default it is scaled in to zero container instances. However, if desired, you can change this default to specify an instance to be kept idle or "warm" using the minimum instances setting.

    In addition to the rate of incoming requests or events, the number of instances scheduled is affected by the following factors (a rough way to combine them follows the list):

    The CPU utilization of existing instances when they are processing requests or events, with a target of keeping scheduled instances at 60% CPU utilization

    The maximum concurrency setting

    The maximum number of container instances setting

    The minimum number of container instances setting
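
    Putting these factors together, a rough mental model (an approximation for intuition, not a formula from the documentation) is:

    instances ≈ ceil(concurrent requests / maximum concurrency)

    clamped between the minimum and maximum instance settings, and raised further if the resulting instances would run above the 60% CPU target.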

    About maximum container instances

    In some cases you may want to limit the total number of container instances that can be started, for cost control reasons, or for better compatibility with other resources used by your service. For example, your Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections.

    You can use the maximum container instances setting to limit the total number of instances that can be started in parallel, as documented in Setting a maximum number of container instances.
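
    To make the database example above concrete, here is a minimal Node.js sketch using the pg library (the environment variable names and the numbers are illustrative assumptions): each instance holds its own small pool, so the worst-case connection count is max instances times pool size.

    const {Pool} = require('pg');

    // Each Cloud Run instance gets its own pool. With a maximum of 20
    // instances and 5 connections per pool, the worst case is
    // 20 * 5 = 100 concurrent connections -- keep that product below
    // your database's connection limit.
    const pool = new Pool({
      host: process.env.DB_HOST,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
      max: 5, // connections per instance
    });

    module.exports = pool;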

    Exceeding maximum instances

    Under normal circumstances, your revision scales out by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests can be queued for up to 10 seconds. During this time window, if an instance finishes processing requests, it becomes available to process queued requests. If no instances become available during the window, the request fails with a 429 error code.
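
    If callers can tolerate the extra latency, one way to ride out that 10-second queueing window is to retry on 429 with exponential backoff. A minimal sketch, assuming Node 18+ for the built-in fetch (the helper name is ours, not a Cloud Run API):

    // Retry on HTTP 429 with exponential backoff: 0.5 s, 1 s, 2 s, ...
    async function callWithRetry(url, options = {}, attempts = 4) {
      for (let i = 0; i < attempts; i++) {
        const res = await fetch(url, options);
        if (res.status !== 429) return res;
        await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** i));
      }
      throw new Error(`Still receiving 429 after ${attempts} attempts`);
    }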

    Scaling guarantees

    The maximum instances limit is an upper limit per revision. Setting a high limit does not mean that your revision will scale out to the specified number of container instances. It only means that the number of container instances for this revision should not exceed the maximum.

    Exceeding max instances

    In some cases, such as rapid traffic surges or system maintenance, Cloud Run might, for a short period of time, create more container instances than are specified in the maximum instances setting. New instances can be started in excess of the maximum instances setting to replace existing instances and to provide a grace period for inflight requests to finish processing.

    The maximum instance limit can be exceeded under normal operation a few times per week. The grace period usually lasts up to 15 minutes, or up to the value specified in the request timeout setting. These extra instances are destroyed within 15 minutes after they become idle.

    If many replacements are needed, the updates are usually spread out over many minutes or hours, but each replacement has an excess instance for just the grace period. Instances in excess of the maximum instance value are normally less than twice the configured maximum instances limit, but can be much larger for sudden large traffic spikes.

    Load tests experience more instances exceeding the maximum instances setting because the system may change where traffic spikes are served to preserve capacity for existing workloads that have sustained load patterns.

    If your service cannot tolerate this temporary behavior, you may want to factor in a safety margin and set a lower maximum instances value.

    Traffic splits

    Because the maximum instances limit is a limit for each revision, if the service splits traffic across multiple revisions, the total number of instances for the service can exceed the maximum instances per revision. This can be observed in the Instance Count metrics.

    Deployments

    When you deploy a new revision to serve 100% of the traffic, Cloud Run gradually migrates traffic from the revision previously serving 100% of the traffic to the new one. Because the maximum instances limit is a limit for each revision, during a deployment, the total number of instances for the service can exceed the maximum instances per revision. This can be observed in the Instance Count metrics.

    Idle instances and minimizing cold starts

    Cloud Run does not immediately shut down instances once they have handled all requests. To minimize the impact of cold starts, Cloud Run may keep some instances idle for a maximum of 15 minutes. These instances are ready to handle requests in case of a sudden traffic spike.

    For example, when a container instance has finished handling requests, it may remain idle for a period of time in case another request needs to be handled. An idle container instance may persist resources, such as open database connections. Note that CPU is only allocated during request processing unless you explicitly configure your service to have CPU always allocated.

    To keep idle instances permanently available, use the min-instance setting. Note that using this feature will incur cost even when the service is not actively serving requests.

    What's next

    To manage the maximum number of instances of your Cloud Run services, see Setting a maximum number of container instances.

    Source: cloud.google.com

    3 Ways to Optimize Cloud Run Response Times

    Season of Scale

    “Season of Scale” is a blog and video series to help enterprises and developers build scale and resilience into your design patterns. In this series we plan on walking you through some patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

    In Season 2, we’re covering how to optimize your applications to improve instance startup time! If you haven’t seen Season 1, check it out here.

    How to improve Compute Engine startup times

    How to improve App Engine startup times

    How to improve Cloud Run startup times (this article)

    Critter Junction has created a pretty diverse compute infrastructure across Compute Engine, App Engine, and Cloud Run. We learned in Season 1 that they decided to go with Cloud Run for their Layout App. To refresh: the Layout App is a key part of the game, letting you share house layouts with other players. Now they're looking to optimize Cloud Run for scalability.

    The Layout App

    Check out the video

    Review

    Since containerizing their Node.js application, they’ve decided to run it on Cloud Run for its portability, statelessness, and autoscaling, even to zero.

    Unlike their online site running on App Engine, they haven’t needed to write warmup wrappers in their code, because Cloud Run may keep some idle instances around to handle spikes in traffic.

    Cold starts on Cloud Run

    The thing is, Cloud Run will terminate unused container instances after some time, which means a cold start can still occur.

    After looking at recent deployments of the Layout App, we noticed a few things that could be improved to minimize cold start latency.

    First, they happened to be using a dynamic language with dependent libraries, like imported modules in Node.js.

    They weren’t using global variables.

    And their container base images were about 700 megabytes in size.

    This meant that, overall, they were facing longer load times at container startup, or required additional computation before the server could start listening for requests. So they want to optimize their service's startup speed to minimize this latency.

    Let’s dive into each of these.

    #1 Create a leaner service

    For starters, on Cloud Run, the size of your container image does not affect cold start or request processing time.

    Large container images, however, mean slower build times, and slower deployment times.

    You want to be extra careful when it comes to applications written in dynamic languages. For example, if you’re using Node.js or Python, module loading that happens on process startup will add latency during a cold start.

    Also be aware that some modules run initialization code upon import.

    To build a leaner service you can:

    Minimize the number and size of dependencies if you’re using a dynamic language.

    Instead of computing things upon startup, compute them lazily (see the sketch after this list).

    Shorten your initializations and reduce the time it takes to start your HTTP server.

    And use code-loading optimizations like PHP’s composer autoloader optimization.
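
    For example, a minimal sketch of the lazy pattern in Node.js ('heavy-module' and buildIndex() are hypothetical stand-ins for whatever is expensive in your service):

    let index; // stays undefined until the first request needs it

    function getIndex() {
      if (!index) {
        const heavy = require('heavy-module'); // require deferred from cold start
        index = heavy.buildIndex();            // expensive computation deferred too
      }
      return index;
    }

    // The first request on an instance pays the initialization cost once;
    // later requests served by the same instance reuse the cached result.
    exports.handler = (req, res) => {
      res.send(getIndex());
    };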

    #2 Use global variables

    In Cloud Run, you can’t assume that service state is preserved between requests. But, Cloud Run does reuse individual container instances to serve ongoing traffic.

    That means you can declare a global variable and reuse its value on subsequent requests handled by the same container instance.

    You can also cache objects in memory. Moving this from the request logic to global scope means better performance.

    Now this doesn’t exactly help cold start times, but once the container is initialized, cached objects can help reduce latency during subsequent ongoing requests.

    For example, if you move per-request logic to global scope, a cold start should last approximately the same amount of time (and if you add extra caching logic that you wouldn't have in a warm request, it will increase the cold-start time), but any subsequent request served by that warm instance will then see improved latency.

    // Global (instance-wide) scope
    // This computation runs at instance cold start
    const instanceVar = heavyComputation();

    /**
     * HTTP function that declares a variable.
     *
     * @param {Object} req request context.
     * @param {Object} res response context.
     */
    exports.scopeDemo = (req, res) => {
      // Per-function scope
      // This computation runs every time this function is called
      const functionVar = lightComputation();
      res.send(`Per instance: ${instanceVar}, per function: ${functionVar}`);
    };

    A lot of this boils down to creating a leaner service.

    Source: medium.com
