I was fortunate to get early access to OpenAI's GPT-3, which gave anyone with coding chops the power of the most advanced language model to date (175B parameters) via a wonderfully designed API.
In tinkering, I became particularly impressed by GPT-3's ability to parse text. Curious to explore its limits, I built an ingredient parser: take a photo of a nutrition facts label and GPT-3 will parse each ingredient (ignoring all other text), assign an emoji, give a definition, and determine whether it's unhealthy. Who knows how GPT-3 makes this last judgement, especially since I provided only a few examples. But then, isn't it fairly subjective for humans, too?
Behind the scenes, I'm passing the image to an OCR service so I can then send the extracted text to GPT-3. Rudimentary string matching lets me find the returned ingredients in the OCR data and highlight them in the interface. It takes a while for GPT-3 to return all the desired information, so I resorted to otherwise superfluous tactics (staggered display of information, drawn-out animations, etc.) to improve perceived performance.
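To make the "rudimentary string matching" step concrete, here is a minimal sketch. It assumes the OCR service returns a flat list of words in reading order, each with a pixel bounding box; the `OcrWord` type and `find_ingredient` function are hypothetical names for illustration, not any particular OCR service's API. Each ingredient name is normalized (lowercased, punctuation stripped), and every run of consecutive OCR words that matches is merged into one highlight box.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One word from a hypothetical OCR response, with its bounding box in pixels."""
    text: str
    x: int
    y: int
    w: int
    h: int


def normalize(s: str) -> str:
    # Lowercase and drop punctuation so "SUGAR," matches "sugar".
    return re.sub(r"[^a-z0-9]", "", s.lower())


def find_ingredient(words: list[OcrWord], ingredient: str) -> list[tuple[int, int, int, int]]:
    """Return an (x, y, w, h) box for every run of OCR words matching the ingredient name."""
    target = [t for t in (normalize(p) for p in ingredient.split()) if t]
    # Skip OCR "words" that are pure punctuation so they don't break a run.
    tokens = [(normalize(w.text), w) for w in words]
    tokens = [(n, w) for n, w in tokens if n]
    if not target:
        return []

    boxes = []
    k = len(target)
    for i in range(len(tokens) - k + 1):
        if [n for n, _ in tokens[i:i + k]] == target:
            run = [w for _, w in tokens[i:i + k]]
            # Union of the matched words' boxes.
            x0 = min(w.x for w in run)
            y0 = min(w.y for w in run)
            x1 = max(w.x + w.w for w in run)
            y1 = max(w.y + w.h for w in run)
            boxes.append((x0, y0, x1 - x0, y1 - y0))
    return boxes


words = [
    OcrWord("INGREDIENTS:", 10, 100, 90, 12),
    OcrWord("WATER,", 105, 100, 50, 12),
    OcrWord("CANE", 160, 100, 40, 12),
    OcrWord("SUGAR,", 205, 100, 52, 12),
]
print(find_ingredient(words, "cane sugar"))  # [(160, 100, 97, 12)]
```

One limitation of this approach: an ingredient that wraps onto the next line yields a single oversized box spanning both lines, so a fuller version would emit one box per text line.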
Much of the work here was in "training" GPT-3: providing the right set of examples that yield reliable results. Particularly tricky was teaching it to avoid text that looks like an ingredient but is not; for example, note that "high fructose corn syrup" (which appears at the bottom of the label only to say it is not an ingredient, lol) is correctly excluded by GPT-3.
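The actual prompt isn't shown here, so the following is only a hypothetical sketch of what few-shot "training" of this kind can look like. It assumes a pipe-delimited output format (`emoji | name | definition | yes/no`) and made-up examples. The second example shows the kind of case described above: a label that mentions high fructose corn syrup only to say it isn't present, paired with an output that leaves it out. The prompt string would then be sent to GPT-3's completion endpoint (not shown), and the returned text parsed line by line.

```python
from dataclasses import dataclass

# Hypothetical few-shot examples: OCR'd label text paired with the desired output.
# The second one teaches the model to skip text that merely mentions an ingredient.
EXAMPLES = [
    (
        "INGREDIENTS: WATER, CANE SUGAR, CITRIC ACID.",
        "💧 | water | Plain drinking water. | no\n"
        "🍬 | cane sugar | Sugar refined from sugarcane. | yes\n"
        "🍋 | citric acid | An acid found in citrus fruits, used for tartness. | no",
    ),
    (
        "INGREDIENTS: OATS, HONEY. MADE WITH NO HIGH FRUCTOSE CORN SYRUP.",
        "🌾 | oats | A whole grain cereal. | no\n"
        "🍯 | honey | A sweetener made by bees. | yes",
    ),
]


def build_prompt(label_text: str) -> str:
    """Assemble a few-shot prompt; the model is expected to continue after the last 'Ingredients:'."""
    parts = ["Extract each ingredient from the label text. Ignore all other text.\n"]
    for text, output in EXAMPLES:
        parts.append(f"Label: {text}\nIngredients:\n{output}\n")
    parts.append(f"Label: {label_text}\nIngredients:\n")
    return "\n".join(parts)


@dataclass
class Ingredient:
    emoji: str
    name: str
    definition: str
    unhealthy: bool


def parse_completion(completion: str) -> list[Ingredient]:
    """Parse 'emoji | name | definition | yes/no' lines, skipping malformed ones."""
    results = []
    for line in completion.strip().splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            continue  # the model can drift from the format; drop such lines
        emoji, name, definition, flag = fields
        results.append(Ingredient(emoji, name, definition, flag.lower() == "yes"))
    return results


prompt = build_prompt("INGREDIENTS: RICE FLOUR, SALT.")
sample = "🍚 | rice flour | Flour milled from rice. | no\n🧂 | salt | Sodium chloride. | yes\nnot a valid line"
items = parse_completion(sample)
```

Ending the prompt with an open `Ingredients:` line invites the model to continue in the same format as the examples. Parsing defensively matters because the output varies from run to run.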
I should mention that this is not a practical application of GPT-3; you wouldn't want randomness in ingredient definitions, for example. And clearly, this was less "solve a problem" and more "let's see what this shiny new technology can do." But hey, it's a side project, so I do what I want. :)