Unable to stream API response in Flask application on Google Cloud

I’m developing a small test website using the OpenAI API. I’m trying to stream GPT’s response, the way https://chat.openai.com/chat does. This works fine when I run my Flask application on a local development server, but when I deploy the app to Google Cloud, the response arrives all at once instead of being streamed. I tried disabling buffering as described at https://cloud.google.com/appengine/docs/flexible/how-requests-are-handled?tab=python#x-accel-buffering, but that didn’t resolve the issue. I suspect the problem lies in how I’ve configured my app on Google Cloud (or failed to configure it).

Here is my current code, which works when I run the application locally.

main.py

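from flask import Flask, Response, request

app = Flask(__name__)

# gpt_model is the application's own OpenAI wrapper, defined elsewhere.

@app.route('/stream_response', methods=['POST'])
def stream_response():
    prompt_text = request.form['prompt']

    def generate():
        # Yield each text fragment as soon as the API delivers it.
        for chunk in gpt_model.get_response(prompt_text, stream=True):
            for choice in chunk['choices']:
                delta: dict = choice['delta']
                if 'content' in delta:
                    yield delta['content']

    response = Response(generate(), content_type='text/html')
    # Ask any nginx-style proxy in front of the app not to buffer the body.
    response.headers['X-Accel-Buffering'] = 'no'
    return response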

prompt.html

<script>
    function streamResponse() {
        var promptText = document.getElementById("prompt").value;
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/stream_response", true);
        xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        xhr.onprogress = function () {
            document.getElementById("response-container").style.display = "block";
            document.getElementById("response").innerHTML = xhr.responseText;
            console.log(xhr.responseText);
        };
        xhr.send("prompt=" + encodeURIComponent(promptText));
    }
</script>

Google Cloud app.yaml

runtime: python310

handlers:
- url: /.*
  script: auto

Google Cloud deployment process

gcloud app deploy

Solution:

Your app.yaml indicates that you’re deploying to the Google App Engine (GAE) Standard Environment, whereas the X-Accel-Buffering page you linked documents the flexible environment. GAE Standard doesn’t support streaming; the documentation says:

App Engine does not support streaming responses where data is sent in incremental chunks to the client while a request is being processed. All data from your code is collected as described above and sent as a single HTTP response.

Chat UIs also often rely on WebSockets. GAE Standard doesn’t support WebSockets, but GAE Flex does (see the docs).
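To make the quoted buffering behavior concrete, here is a minimal sketch in plain Python. It is a toy model only, not App Engine's actual implementation: `slow_chunks`, `buffering_front_end`, and `arrival_times` are hypothetical names introduced for illustration, and the sleep stands in for waiting on the OpenAI API. It shows why a generator that yields incrementally can still reach the client as one piece when something in front of it collects the whole body first.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def slow_chunks(chunks: Iterable[str], delay: float) -> Iterator[str]:
    """Stand-in for generate(): yield each chunk after a short wait."""
    for chunk in chunks:
        time.sleep(delay)  # simulates waiting for the next API fragment
        yield chunk


def buffering_front_end(body: Iterable[str]) -> Iterator[str]:
    """Toy model of a buffering layer: consume the whole body, then emit it once."""
    yield "".join(body)


def arrival_times(body: Iterable[str]) -> List[Tuple[float, str]]:
    """Record how long after the start each piece becomes visible to the client."""
    start = time.monotonic()
    return [(time.monotonic() - start, piece) for piece in body]


if __name__ == "__main__":
    parts = ["Hello", ", ", "world!"]

    # Direct streaming: three pieces, each arriving one delay apart.
    for elapsed, piece in arrival_times(slow_chunks(parts, 0.05)):
        print(f"streamed  {elapsed:.2f}s  {piece!r}")

    # Behind the buffering layer: one piece, visible only after all delays.
    for elapsed, piece in arrival_times(buffering_front_end(slow_chunks(parts, 0.05))):
        print(f"buffered  {elapsed:.2f}s  {piece!r}")
```

In this model the Flask code is unchanged in both cases; only the layer in front of it differs. That matches the symptom in the question, where the same application streams locally but delivers everything at once after deployment.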
