OpenAI – Patrick

tiktoken: get number of tokens from string + openai pricing

November 22, 2023 | theptrk | OpenAI

Get the number of tokens: the encoding cl100k_base is used by gpt-4, gpt-3.5-turbo, and text-embedding-ada-002. This is how you encode a string into tokens. You can abstract this into a separate function. Source: openai-cookbook (GitHub). OpenAI pricing: as of November 22, 2023. Splitting chunks to manage context window limits: gpt-3.5-turbo-1106 has a limit of 16,385…
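A minimal sketch of the two ideas in this excerpt, counting tokens with cl100k_base and splitting a token list into chunks that fit a limit. It assumes the tiktoken package is installed (pip install tiktoken). The names count_tokens and split_tokens are illustrative and not taken from the post, and the chunk size you pass in is up to you.

```python
from typing import List


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens in `text` under the given tiktoken encoding."""
    # Imported here so split_tokens below can be used without tiktoken installed.
    import tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


def split_tokens(tokens: List[int], max_tokens: int) -> List[List[int]]:
    """Split a token list into consecutive chunks of at most `max_tokens` tokens.

    This is a naive fixed-size split (an assumption for illustration); it does
    not try to respect sentence or paragraph boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]
```

To turn text into chunks, one option is to encode it with tiktoken.get_encoding("cl100k_base").encode(text), pass the result to split_tokens, and call decode on each chunk. Leave headroom below the model's limit for the prompt and the reply.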

How to use OpenAI whisper with python

June 15, 2023 | July 14, 2024 | theptrk | OpenAI, python

OpenAI open-sourced their speech recognition model: https://github.com/openai/whisper. There are 5 sizes: tiny, base, small, medium, and large. Note: you need to pip install openai-whisper (available on PyPI). Here is code to transcribe an audio file.
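The excerpt stops before the code, so the following is a sketch of what transcription with the openai-whisper package typically looks like, not the post's own listing. It assumes openai-whisper and ffmpeg are installed. The function name transcribe_file and the file name audio.mp3 are hypothetical, and the size check only covers the five sizes named above.

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe_file(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file with Whisper and return the recognized text."""
    # Only the five sizes listed in the post are accepted in this sketch.
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {MODEL_SIZES}")

    # Imported here so the size check can run without the package installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_size)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # dict with "text", "segments", "language"
    return result["text"]


if __name__ == "__main__":
    # "audio.mp3" is a placeholder path; replace it with a real file.
    print(transcribe_file("audio.mp3", model_size="base"))
```

Smaller models load and run faster, while larger ones are generally more accurate, so base is a reasonable starting point for a first test.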