Architecture of ChatGPT4

Here is the technical architecture of ChatGPT 4.x, which I managed to put together into one piece from technical tidbits sourced from around the internet.

I drew this architecture in a cloud-agnostic way, but it is deployable on *Amazon Web Services (AWS), Microsoft Azure, or Google Cloud*.
OpenAI has not disclosed any significant technical infrastructure detail about the architecture that GPT runs on. Given the cut-throat competition they face, it is understandable that they chose to stay siloed. But there are trustworthy sources that have revealed astounding information about their Transformer over time.

Here we go –

1) Typically, an *API Gateway* fronts the stack to meter user traffic and streamline access to the GPT models (a minimal rate-limiting sketch follows below).
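To make that concrete, here is a minimal, hypothetical token-bucket rate limiter of the kind a gateway applies per API key. The class and the limits are illustrative, not OpenAI's actual gateway configuration:

```python
import time

class TokenBucket:
    """Hypothetical per-key rate limiter, the kind an API gateway
    applies in front of a model endpoint. Limits are illustrative."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per API key: e.g. 5 requests/sec with a burst of 10.
limiter = TokenBucket(rate_per_sec=5, burst=10)
if limiter.allow():
    pass  # forward the request to the model endpoint
else:
    pass  # respond with HTTP 429 Too Many Requests
```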

2) Users are authenticated through an IdP (Identity Provider) that can handle SSO/OAuth and IAM functions. As a bit of trivia, ChatGPT used the Auth0 product to authenticate its user base (see the token-validation sketch below).
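For illustration, here is a hedged sketch of how a backend might validate an OAuth bearer token issued by an IdP like Auth0, using the PyJWT library. The issuer and audience URLs are placeholders, not OpenAI's real tenant:

```python
import jwt  # PyJWT; pip install "pyjwt[crypto]"
from jwt import PyJWKClient

# Hypothetical issuer/audience; real values come from your IdP tenant.
ISSUER = "https://example-tenant.auth0.com/"
AUDIENCE = "https://api.example.com/chat"

jwks_client = PyJWKClient(ISSUER + ".well-known/jwks.json")

def authenticate(bearer_token: str) -> dict:
    """Validate an access token against the IdP's public signing keys
    and return the verified claims."""
    signing_key = jwks_client.get_signing_key_from_jwt(bearer_token)
    return jwt.decode(
        bearer_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=ISSUER,
    )
```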

3) Then comes the Big Elephant: the GPT model. Nah, I didn't intend to fat-shame the GPT model; I say so because of its predictably "gigantic" size. It should be in terabytes. Let's do some simple maths: Mistral AI's 7B model has a physical size of 13.5 GB, and GPT-4 is reportedly a 1.8-trillion-parameter model, about 257 times bigger than Mistral's. So, in layman's maths, GPT-4's physical size should be at least 257 × 13.5 GB. That is easily 3+ terabytes (worked out in the snippet below).
Transformer Architecture: The Transformer architecture came into the limelight when ChatGPT's arrival broke the internet. A Transformer is a deep learning network comprised of many encoding and decoding layers. The embeddings (the input prompt in vector form) are passed through encoding layers, then run through a whole stack of attention layers (which weigh the context of the prompt) to derive context, and then the sequence is run through decoding layers to generate meaningful output, shaped by sampling controls (like temperature). This Transformer architecture is based on the **"Attention Is All You Need"** paper originally published by Ashish Vaswani and his colleagues (a minimal attention sketch follows below).
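
Here is that back-of-the-envelope arithmetic as a runnable snippet. The Mistral figures come from the text above; the GPT-4 parameter count is the widely rumoured one, not an official number:

```python
# Back-of-the-envelope model size, following the article's reasoning.
mistral_params = 7e9          # Mistral 7B
mistral_size_gb = 13.5        # its on-disk size
gpt4_params = 1.8e12          # rumoured GPT-4 parameter count

scale = gpt4_params / mistral_params            # ~257x
gpt4_size_tb = scale * mistral_size_gb / 1024   # GB -> TB
print(f"{scale:.0f}x larger, ~{gpt4_size_tb:.1f} TB of weights")
# -> 257x larger, ~3.4 TB of weights
```
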
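To make the attention step concrete, here is a minimal NumPy sketch of the scaled dot-product attention from that paper, softmax(QKᵀ/√d_k)·V. This is a toy illustration, nothing close to GPT's production kernels:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core operation from "Attention Is All You Need":
    softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each token attends to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # stand-in for token embeddings
out = scaled_dot_product_attention(x, x, x)    # self-attention
print(out.shape)  # (4, 8): one context-aware vector per token
```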

4) Finally, here comes the heavy machinery that GPT runs on: NVIDIA GPUs. NVIDIA's official sources confirm that GPT runs on a vast array of NVIDIA H100 Tensor Core GPUs. To meet exploding user traffic, the VMs were scaled out with NVIDIA Quantum-2 InfiniBand networking. That was the latest as of mid-2023; upgrades may well have been made along the way since then.
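For a sense of what the software side sees, here is a quick PyTorch snippet to inventory the CUDA devices visible to a process. On the clusters described above these would report H100s; the same calls work on any CUDA-capable machine:

```python
import torch

# List the CUDA devices visible to this process, with their memory.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
else:
    print("No CUDA devices visible")
```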

Besides everything said already, log trails and monitoring infrastructure must be in the picture somewhere. And the MLOps pipeline is altogether a different topic, which is kinda outside this "production-cut" workflow.
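As a token illustration of that monitoring layer, here is a hypothetical request handler that emits the kind of latency/usage records a monitoring stack would scrape. The handler, field names, and format are made up for illustration:

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("inference")

def handle_request(prompt: str) -> str:
    """Hypothetical handler that logs per-request usage metrics."""
    start = time.monotonic()
    completion = "..."  # placeholder for the actual model call
    log.info("request served: prompt_tokens=%d latency_ms=%.1f",
             len(prompt.split()), (time.monotonic() - start) * 1000)
    return completion
```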

Running an LLM at this scale might cost a few thousand dollars on the cloud; building one costs billions of dollars.

There are tons of LLMs already out there in the market. ChatGPT isn't just famous because it was the first of its kind; it was, and still is, one of the top 3 since inception.
