Strategy 1: increment a suffix, like “file.pdf(1)”. Version: simple. Version: guards against race conditions. The simple version is safe if only one process writes to the location; otherwise it can hit a race condition. The safer version “opens” the file. Strategy 2: attach a timestamp.
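The two strategies in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the function names `unique_path_simple`, `open_unique`, and `timestamped_path` are hypothetical, and the timestamp format is an assumption.

```python
import os
import time


def unique_path_simple(path):
    """Strategy 1, simple: return `path`, or `path(1)`, `path(2)`, ...

    Only safe with a single writer: another process could create the
    file between the exists() check and the later write.
    """
    candidate, n = path, 0
    while os.path.exists(candidate):
        n += 1
        candidate = f"{path}({n})"
    return candidate


def open_unique(path):
    """Strategy 1, race-safe: create the file and return (file, final_path).

    Mode "x" raises FileExistsError if the name is taken, so checking
    and creating happen in one step.
    """
    candidate, n = path, 0
    while True:
        try:
            return open(candidate, "x"), candidate
        except FileExistsError:
            n += 1
            candidate = f"{path}({n})"


def timestamped_path(path):
    """Strategy 2: insert a timestamp before the extension (format assumed)."""
    root, ext = os.path.splitext(path)
    return f"{root}_{time.strftime('%Y%m%d-%H%M%S')}{ext}"
```

The race-safe variant hands back an open file object, so the caller writes to it directly instead of reopening the path by name.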
Removing stopwords with NLTK
Code with explanations: Just the code
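As a rough sketch of what stopword removal with NLTK can look like (the helper names `load_nltk_stopwords` and `remove_stopwords` are hypothetical, and splitting on whitespace is a simplifying assumption rather than the post's tokenizer):

```python
def load_nltk_stopwords(language="english"):
    """Load NLTK's stopword list as a set (needs `pip install nltk`)."""
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)  # one-time corpus download
    return set(stopwords.words(language))


def remove_stopwords(text, stop_words):
    """Keep whitespace-separated tokens whose lowercase form is not a stopword."""
    return [tok for tok in text.split() if tok.lower() not in stop_words]
```

Typical usage would be `remove_stopwords("The cat sat on the mat", load_nltk_stopwords())`. Passing the set in as an argument also lets you add or remove words before filtering.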
How to write files and read files in python
In short: how would you optimize saving expensive API calls? Explanation: say you have a function with output you want to save. It could be from: Step: Create the directory you are saving to. I usually write to the system tmp directory or a project tmp directory and put “tmp” in my gitignore so…
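One way to sketch the idea, assuming the output is JSON-serializable: save the result to a file in a tmp directory on the first call and read it back afterwards. The name `cached_call`, the `api_cache` folder, and the use of JSON are all assumptions for illustration.

```python
import json
import os
import tempfile


def cached_call(name, fn, cache_dir=None):
    """Return fn()'s result, reusing a JSON file on disk when one exists."""
    # Default to a folder inside the system tmp directory (assumed layout).
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "api_cache")
    os.makedirs(cache_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(cache_dir, f"{name}.json")

    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # cache hit: skip the expensive call

    result = fn()  # cache miss: make the expensive call once
    with open(path, "w") as f:
        json.dump(result, f)
    return result
```

Deleting the file for a given `name` forces the next call to run the function again.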
How to use the python rank-bm25 library
Note: this library is called rank-bm25 on PyPI (pypi) and NOT bm25. Official docs (github) say: However, you often want to clean the corpus (lowercase, remove punctuation) before indexing it. Once you do that, you must keep the original corpus around so that results can index back into the original unedited strings. This works for most cases! Note,…
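A minimal sketch of that pattern: index a cleaned copy of the corpus, then map scores back to the original strings by position. The helpers `tokenize` and `search` are hypothetical names, and the cleaning rule (lowercase, strip ASCII punctuation, split on whitespace) is an assumption. `BM25Okapi` and `get_scores` come from the rank-bm25 package.

```python
import string


def tokenize(text):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def search(corpus, query, n=1):
    """Return the top-n ORIGINAL documents for `query`."""
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # Index the cleaned tokens; keep `corpus` untouched for display.
    bm25 = BM25Okapi([tokenize(doc) for doc in corpus])
    scores = bm25.get_scores(tokenize(query))

    # Scores are positional, so index i refers to corpus[i].
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:n]]
```

The query goes through the same `tokenize` function as the corpus, so both sides are cleaned the same way.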
How to time a function in python (bonus: intervals ⏱️)
Method 1: use time.time() and do calculations. Method 2: create a Timer class that stores the start time and the last time recorded.
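Method 2 might look something like the sketch below. The method names `interval` and `total` are assumptions, not the post's API.

```python
import time


class Timer:
    """Stores the start time and the last time recorded."""

    def __init__(self):
        self.start = time.time()
        self.last = self.start

    def interval(self):
        """Seconds since the previous interval() call (or since start)."""
        now = time.time()
        elapsed = now - self.last
        self.last = now  # the next interval is measured from here
        return elapsed

    def total(self):
        """Seconds since the timer was created."""
        return time.time() - self.start
```

Calling `timer.interval()` after each stage of a function gives per-stage timings, while `timer.total()` gives the overall time. For short measurements, `time.perf_counter()` is a drop-in alternative with higher resolution.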
Install nginx locally with docker
Step: download Docker Desktop. Step: install nginx with “pull” and “run”. Key problem: where is nginx serving? “80/tcp”. Step: forward host machine requests to the container: “0.0.0.0:80->80/tcp”.
How to invert docker ps output
How: Step: create a new bash script (“.docker_ps_invert.sh”). Step: allow executing. Why? Alternative: you can choose specific columns to show (like below)… But what if you want to see every column? That’s where the script above shines.
How to format and filter docker output
install nginx on mac with homebrew
Q: How do I install nginx? Q: Where is nginx configured? Q: Where is nginx hosted on homebrew?
Chunking text
These are basic chunking utilities for quickly splitting large text blocks into smaller chunks. The post starts with character-based chunking, then word-based, then sentence-based.
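The three approaches can be sketched as below. The function names are hypothetical, and the sentence splitter is a simple regex on `.`, `!`, and `?`, which is an assumption that will mis-split abbreviations such as "e.g.".

```python
import re


def chunk_by_chars(text, size):
    """Split text into pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_words(text, size):
    """Split text into pieces of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_by_sentences(text, size):
    """Split text into pieces of at most `size` sentences."""
    # Break after ., ! or ? when followed by whitespace (naive splitter).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]
```

Character chunks can cut words in half, word chunks can cut sentences in half, and sentence chunks vary the most in length, so the right choice depends on what consumes the chunks.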