MakeThingsWork.dev

Hidden AI Library Prompts

Times change fast, and tooling for AI integrations is evolving rapidly. MCP definitions, AI frameworks, and agentic browsers are the "new JavaScript framework every day" phenomenon all over again. Everyone is trying to be first to the next big thing before someone else gets lucky and finds the pot of gold. I'd say this is perfectly normal and mimics the process most disruptive technologies go through. The smartphone wasn't awesome on day one, and there are tons of odd Frankenstein devices that show us how crazy that time was.

But as a programmer I can't help but furrow my brow in mild annoyance at the number of things being thrown against the proverbial wall in hopes that they stick 1. I don't want things sticking to my wall, dangit. Okay, enough of that analogy. I'm focusing specifically on the code frameworks that seek to streamline interaction with AI services.

I'd say that I'm putting in, at best, a quarter of the effort needed to keep up with everything, so I don't claim to have full mental coverage of what's out there (and I'm not sure how you would). But there's a huge number of pre-written libraries that handle AI interactions, and I think we should be very cautious about rushing to adopt them. The main reasons for this are:

  1. We don't always know for sure what they're doing under the hood.
  2. It's just not that hard to do on your own.

Hey! You Should Understand How That Library Works But Let's Face It: You Usually Don't Care

I would say that when it comes to adopting libraries, most people don't dig too deeply into how they work. I mean, I've had times where I've popped open GitHub and dug through a library's commits and files to figure out how a certain feature works, or hoped to unlock some undocumented method I could use. But in general, when I find a library that works, I download it, read the quick start, and test to see if it does the thing. Mission accomplished! I use pagy for everything and have never bothered looking at the source code. Honestly, this is fine for most things; no sense in reinventing the wheel.

On the other hand, inversely, 2 with AI the wheel is still being invented. We're not even sure it's a wheel, and in the long run we're not positive we even want to roll. We might want a dodecahedron that we use to fly. I've clearly lost the analogy again, but the point is: AI tooling is far from mature, and you should dig deeper than you normally would when you're using an AI library/framework.

Example: LangChain.rb

Before we get into this, I want to be clear: I love this library. It's clean, lightweight, and relatively unopinionated, and it offers a bunch of super useful helpers.

"Why focus on this framework in particular?" You might ask. Easy: I love Ruby.

This framework ports some LangChain-like behavior over into a handy set of Ruby classes, and it's beautifully easy to set up. A coworker and I used it to stand up a neat little proof-of-concept chatbot in a few hours. It wasn't production ready, but it served its purpose.
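
For a sense of what "easy to set up" looks like, here's a minimal sketch, assuming the OpenAI adapter and an API key in your environment. The model defaults and response accessor names can vary between gem versions, so treat this as illustrative rather than copy-paste:

require "langchain"

# Hedged sketch: the OpenAI adapter ships with langchainrb, but check the
# README for the adapters and options your version actually supports.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# A one-off chat call; a real chatbot would keep the message history around.
response = llm.chat(messages: [{ role: "user", content: "Say hi in one sentence." }])
puts response.chat_completion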

One of the neat and handy features this library has is an OutputFixingParser class that detects when a response fails to conform to the requested structured output and attempts to fix it.

From their docs:

begin
  parser.parse(llm_response)
rescue Langchain::OutputParsers::OutputParserException => e
  fix_parser = Langchain::OutputParsers::OutputFixingParser.from_llm(
    llm: llm,
    parser: parser
  )
  fix_parser.parse(llm_response)
end
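
For context, the parser, llm, and llm_response in that snippet come from the library's structured output workflow. A rough sketch, with a toy schema and the usual caveat that accessor names can differ between versions:

# Toy schema for illustration only.
json_schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" }
  },
  required: ["name"]
}

parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema(json_schema)
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# get_format_instructions tells the model what JSON shape you expect;
# llm_response is just the raw string that comes back.
llm_response = llm.chat(
  messages: [{ role: "user", content: "Describe a person. #{parser.get_format_instructions}" }]
).completion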

Now that's just pretty darn cool. But wait... how is it fixing the output?

From the docs:

If the parser fails to parse the LLM response, you can use the OutputFixingParser. It sends an error message, prior output, and the original prompt text to the LLM, asking for a "fixed" response:

Okay, so it's sending a prompt to fix the output. I thought that was an interesting (and logical) solution. I'd actually dealt with this same problem in a project I was building at work: while doing some formal prompt testing, ChatGPT would randomly return malformed JSON. I asked ChatGPT for a solution, and it came up with this little nugget to fix the JSON response:

require "json"

def try_to_fix_json(raw)
  cleaned = raw.dup

  # Remove code fences if still present
  cleaned.gsub!(/\A```(?:json)?\s*|\s*```\Z/, '')

  # Remove trailing commas (before closing braces/brackets)
  cleaned.gsub!(/,(\s*[}\]])/, '\1')

  # Fix unquoted strings in arrays (very basic heuristic)
  cleaned.gsub!(/(\[.*?)(\w+)(\s*[\]])/, '\1"\2"\3')

  # Attempt parse again
  begin
    return JSON.parse(cleaned)
  rescue JSON::ParserError => e
    puts "❌ JSON still malformed after cleanup: #{e.message}"
    return nil
  end
end
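
Wiring it in is nothing fancy; a hypothetical usage sketch (not the actual project code), falling back to the helper above only when the normal parse blows up:

def parse_llm_json(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  # Fall back to the cleanup heuristic above; this may still return nil.
  try_to_fix_json(raw)
end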

And honestly, that was completely good enough for what I was doing, but it's far from robust. The idea of sending a prompt and asking the LLM itself to fix the output is nicer. So I could just use this!

But how does it work? Well, if we look at the source, the Langchain::OutputParsers::OutputFixingParser.from_llm method looks like this:

def self.from_llm(llm:, parser:, prompt: nil)
  new(llm: llm, parser: parser, prompt: prompt || naive_fix_prompt)
end

So if you don't pass your own prompt, it defaults to its own naive_fix_prompt, which does the following:

private_class_method def self.naive_fix_prompt
  Langchain::Prompt.load_from_path(
    file_path: Langchain.root.join("langchain/output_parsers/prompts/naive_fix_prompt.yaml")
  )
end

Interesting, what's in that YAML file?

_type: prompt
input_variables:
  - instructions
  - completion
  - error
template: | 
  Instructions:
  --------------
  {instructions}
  --------------
  Completion:
  --------------
  {completion}
  --------------
  
  Above, the Completion did not satisfy the constraints given in the Instructions.
  Error:
  --------------
  {error}
  --------------
  
  Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:

Nice, so that's a cool prompt. But here's the main take-home of this post:

You wouldn't know what prompt was being sent to the LLM if you didn't dig into how this feature worked.

And that's where you need to be cautious.

Testing is Everything

If you've read Chip Huyen's AI Engineering, this will sound familiar: one of the most critical steps when implementing AI into anything is evaluation. When I say evaluation, I mean rigorously testing your prompts across multiple models using high-quality data. This should be the bare minimum, because AI has a non-zero error rate, and you really need to know that number before you set anything loose on the world (or your project, or your company's internal data/processes).
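
Even a dead-simple harness will get you that number. Here's a hypothetical sketch; the test cases and the block that calls your model are stand-ins for whatever your project actually does:

require "json"

# Hypothetical harness: test_cases is your curated, high-quality data, and the
# block is whatever actually sends the prompt to a model and returns raw text.
def malformed_output_rate(test_cases)
  failures = test_cases.count do |input|
    raw = yield(input)
    begin
      JSON.parse(raw)
      false # parsed fine
    rescue JSON::ParserError
      true  # counts against this prompt/model combo
    end
  end
  failures.to_f / test_cases.size
end

# e.g. rate = malformed_output_rate(my_cases) { |input| call_my_llm(input) }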

If you implemented something in production that used the OutputFixingParser mentioned above without testing it, you couldn't know how often that built-in prompt might fail, and that could cause serious issues. Honestly, it probably wouldn't; this is a pretty basic feature, and if your exception handling is good it won't bring the world down. But the point is that you should know what prompts your libraries are sending to your LLMs. They're burning your tokens and you're relying on them to succeed, so it's important to uncover them and test them alongside your regular prompts if you intend to use them.
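
And if the built-in prompt doesn't hold up in your testing, the prompt: keyword on from_llm (shown earlier) lets you hand the fixer a vetted template of your own. A sketch, assuming a custom template uses the same instructions/completion/error variables as the library's naive_fix_prompt.yaml above, and with a made-up file path:

# Assumption: the custom template is formatted with the same input variables
# as the built-in naive_fix_prompt.yaml shown earlier.
my_fix_prompt = Langchain::Prompt::PromptTemplate.new(
  template: File.read("prompts/my_fix_prompt.txt"), # hypothetical path
  input_variables: ["instructions", "completion", "error"]
)

fix_parser = Langchain::OutputParsers::OutputFixingParser.from_llm(
  llm: llm,
  parser: parser,
  prompt: my_fix_prompt
)
fix_parser.parse(llm_response)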

Conclusion

Just be careful. Use libraries, experiment with them, implement them. Just like we always have - but take a little extra time to dig up what prompts an AI library might be sending to your model.

Thanks for reading!

-- Rick


  1. What a super weird thing to say. When is that a good thing? I mean a quick Google search will tell you it's a method to determine if pasta is cooked but that's a very specific scenario. How did that expand to software? Anyway...

  2. Homestar Runner Reference

#ai #llm #ruby