OpenAI just launched GPT-4, and it can handle both images and text.

It is a multimodal model - accepts both image and text inputs, emits text outputs.

Improved capabilities

  1. Greater creativity and advanced reasoning abilities.
  2. Accepts images as inputs, enabling tasks such as caption generation and classification (a rough request sketch follows this list).
  3. Longer context of up to 25,000 words, allowing long-form content creation use cases.
  4. Safer and more aligned. It is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on OpenAI’s internal evaluations.
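
Image inputs are not generally available through the API at launch (OpenAI is previewing them with select partners), so the request shape below is an assumption rather than documented usage; it is a minimal sketch of what caption generation might look like through the Chat Completions endpoint using the openai Python package.

```python
# Rough sketch: caption generation with an image input.
# NOTE: image input is not generally available at launch; the message shape
# below (a list of content parts including an "image_url" entry) is an
# assumption about how multimodal requests may be expressed.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumes the multimodal-enabled model keeps this name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Write a one-sentence caption for this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```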

Pricing

  1. gpt-4 (8K context): $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens.
  2. gpt-4-32k (32K context): $0.06 per 1K prompt tokens and $0.12 per 1K completion tokens.

(seems a little steep to us)
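
For a rough sense of scale under the 8K-context rates above, a request with a 1,000-token prompt and a 500-token completion works out to about 1.0 × $0.03 + 0.5 × $0.06 = $0.06, roughly 20x the cost of the same call to gpt-3.5-turbo at $0.002 per 1K tokens.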

Availability

  1. API - you need to join the waitlist. Developers can get prioritized API access by contributing model evaluations to OpenAI Evals (see the request sketch after this list).
  2. ChatGPT Plus - ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a dynamically adjusted usage cap.
  3. Duolingo, Khan Academy, Stripe, Be My Eyes, and Mem, among others, are already using it.
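
Once off the waitlist, a text-only GPT-4 request should look like any other Chat Completions call with the new model name; the following is a minimal sketch using the openai Python package (the prompt and parameters are placeholders).

```python
# Minimal sketch: text-only GPT-4 request via the Chat Completions API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key improvements in GPT-4 in two sentences."},
    ],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```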

Limitations

  1. GPT-4 has similar limitations as earlier GPT models. Most importantly, it is still not fully reliable (it “hallucinates” facts and makes reasoning errors). Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of specific applications.
  2. GPT-4 generally lacks knowledge of events that occurred after September 2021, when the vast majority of its pre-training data cuts off, and the model does not learn from its experience.
  3. It can sometimes make simple reasoning errors that do not seem to comport with its competence across so many domains, or be overly gullible in accepting obviously false statements from a user.
  4. It can fail at hard problems the same way humans do, such as introducing security vulnerabilities into the code it produces.