
Reliability Calls #1: A Masterclass Series on Reliable AI


Every month, we dive into what it takes to build and scale Reliable AI — the kind that survives edge cases, earns trust in production, and becomes mission-critical. From identifying critical reliability challenges in AI to showcasing practical solutions and their impact, this is where leaders working on AI projects come to learn, share insights, and stay ahead in making AI reliable for the enterprise.
Data readiness is a myth! Don't let it hold back your AI initiatives

Organizations routinely delay mission-critical AI deployments by months while trying to get data "AI-ready" — an unrealistic expectation in complex organizations with diverse systems and tribal knowledge.

What's discussed in the video

My name is Rob Dominguez. I'm on the engineering team over at Hasura, focusing on a new product called PromptQL, and that's probably what drew you to the call today. First off, what's in a name? Chances are, if you're a familiar guest of the show, you've heard us call this the community call for quite a while. At this point, though, we've shifted the name to Reliability Calls. The reason is that as an organization and as an industry, everybody is focusing more and more on AI, and in particular on the reliability and accuracy of those AI tools. And we really feel that what we've been building with PromptQL focuses tremendously on being both reliable and accurate. So that's just a little bit of housekeeping on the name change. There are a couple of exciting announcements coming up, starting at the beginning of next week and then throughout June and July, so please keep a lookout for social media and any posts we'll have around those updates. Let me mute my notifications as well so they're out of the way. Do not disturb. There we are. All right, a couple of other housekeeping things. If this is your first time on the call, we have an engagement option over on the right-hand side. If you click on it, you'll see a few things. First off, in the chat, go ahead and drop a message letting us know where you're joining from. We love to see the global reach we have on these calls, so say hi. Additionally, we have a Q&A option. If you have any questions about anything we present during the call, please let us know in the Q&A section.
We can actually, as part of this platform, bring those questions up onto the stage so everybody can see and read them, and it'll be very clear what's being discussed. We'll also make sure that anybody presenting is able to address those questions at the end of each segment. Thanks to everybody who's jumping in and letting us know where they're joining from. We've got somebody from ABQ. Thank you, Andrew. Gary from Tennessee. Roll Tide Gary, from Alabama born and bred, but out here in SF now, so I'll see you in October, I guess. Anybody else, please let us know where you're joining from. Without further ado... actually, let me look at the agenda real quick. You can see we have three things today, all continuing along with that theme of reliability and accuracy. Towards the end we're also going to talk about some updates to our UI: some changes in what we call the PromptQL console, things being rolled out over the next couple of weeks that hopefully will get everybody excited about their experience. All right. So first up, I'm going to bring Anushrut up on the stage. Harsha, who's behind the scenes organizing things, will go ahead and throw Anushrut up here with me. How are you, man? I'm good. I was thrown on the stage, so I said ouch. Oh, OK. That's fair. I like to think I'm good at some things; I'm not good at reading things in a mirrored setting, and I don't know, for me, what I'm looking at right now. I guess because I'm displaying my screen, it's not mirrored in front of me. So I'm going to try this real quick: how the agentic semantic layer makes AI understand your business data. I did it. OK, cool. No AI needed there. Yeah, for sure.
So I think the key message I want to drive home through this short conversation is this: your employees and your customers speak a certain language. You understand your domain, your business, and your systems, and you can work with your data and your systems very reliably because you understand your domain and speak that language. But your AI does not. You have predefined AI pipelines which speak a certain language that is not your business's language. What you will learn here is how to make your AI speak your business's language so that it becomes highly reliable in your context. Awesome. All right. Harsha is going to take me off the stage, and Anushrut, it's yours. Awesome. Hey, folks. So let's get this started. Let me share my screen. On chat, can I get a confirmation that you can see my screen? I'm assuming yes. So, okay. What's the secret weapon behind reliable AI? It's basically an AI that can speak your company's language, as I said. And how do we enable your AI to speak your company's language? With a semantic layer that your engineers don't have to build. I've said a lot, but I'll dive deep into what I mean by all of this and show you how it works. But before that, I would love to hear from all of you. Imagine you had a hundred percent accurate AI for any analysis, intelligence, decision-making, or automation that you needed. What kind of massive business value, and when I say massive I really mean massive, it could even be hundreds of millions of dollars, could you unlock if your AI, your large language models, could reliably do something for you, for your customers, or for your employees? So please scan this QR code and just type it out. What value would you be able to unlock if you had reliable AI? And as you do that, let's see if there are live responses coming in. Oops, it wasn't activated, was it? OK.
OK, folks, scan this QR code and type in things you think you could unlock with a reliable AI. If I don't see a response in the next ten seconds, either my polling app is broken or it's not. Okay. Predict what deals are going to close this quarter? Makes sense, great sales use case. Get detailed analysis on marketing campaigns: great go-to-market use case. I'm assuming the first one needs to connect to Salesforce and a few other of your sales systems and then answer a question reliably on top of that. And detailed analysis on marketing campaigns: pull a bunch of data from Marketo and all the other tools you're using and get insights on that. Which is fair. This is what you want AI to do, and what it claims it can do. But you don't use AI for any of this today, do you? Why is that? Why is there this value gap between the claims of AI and the reality of AI? Let's try to break it down. Before there was AI, for any of these things you wanted to do, decision-making, intelligence and insight on your data, creating automations or writing software, generating reports, creating applications, you relied on humans who are experts in their own domains. These are your analysts, your engineers, your data scientists who know how to work with a bunch of different systems under the hood: your databases, your SaaS applications, different microservices, a bunch of documents you might be working with, or just looking up information on the open web. They are the ones gatekeeping the value for the business user from these underlying data sources. That is what we have been trying to replicate with AI. But AI isn't there yet, right? It's not as dynamic, as flexible, or as knowledgeable as the people who do the data work. Why? Because we have been building AI with certain design patterns and certain architectures in mind. We as humans are very flexible.
We don't just process things in our head, right? We consume a bunch of information from different sources. We operate different types of systems using the different tools available at hand. We dynamically think about the strategies required for the business problem we are trying to solve. We are very explainable: if someone asks how we did something, we can tell them. We can course-correct our own mistakes. And finally, we were not hired for a single use case. We didn't say, hey, you are the sales forecasting guy; your entire job is to run the same thing every month. No. I am a general analyst; whatever problem you throw at me, I'm going to adapt and solve it based on all of my expertise. So why are we building AI like this? Tell me if I'm wrong. This is the superficial architecture of all the AI applications that you or your team have been building. You ask a question to an agent. This agent might have ten other agents under the hood. Each agent might have ten different tools under the hood. These agents are talking to each other in natural language, and these are specific tools which limit the abilities of each sub-agent. There is no guarantee of reliable orchestration between these agents. There's no guarantee they can transfer a lot of context and data between each other. If my question steps out of line, and there isn't a tool or sub-agent available for it, the system just falls to its knees. And that is the problem, right? Because we are so rigid in our thinking about how to build AI systems like this. So, starting from first principles: what makes any human or any AI reliable is the fact that it is predictable and explainable. Think of your traditional software systems. You know exactly what they're going to do, and you can ask the developer who built them exactly how they work under the hood.
And that's reliable. What makes a human reliable is that you trust an employee, trust a colleague, if they are predictable and consistent in what they do. They don't fail in unexpected ways, and whatever they're doing, they can completely explain it. And hence, you can exercise control over it: you can steer it, you can fix mistakes. This is the same property we want from AI. But consider the standard architectures we've all been thinking about. I'm sure you've all heard of RAG, retrieval-augmented generation. You've all heard of tool calling, tool composition, the model context protocol. You've heard of generating SQL queries, or database queries generally, from natural language. These are the different approaches to connecting data or external systems to your AI; you create agents on top using these methods, and then you orchestrate those agents somehow. But let's look at a RAG system. This is an email provider. You ask it, when was my last Uber trip and how much did I spend? It says April 15th, spent $14. Okay, cool, makes sense. Then I ask it, what time exactly was this trip? Now it says the trip was on June 29, 2024. Why have you lost your context? Because every time I ask a question, it's doing a separate semantic search under the hood and surfacing whatever the top semantically relevant email is. It has no idea about my context, it doesn't understand the follow-up question I'm asking, and it can't realistically sift through the billion emails I have to answer my question. And then I ask it to explain itself: how did you get this answer? You said my last trip was in April. And it says, "each product has its own strengths; deciding which one is best for you depends on what you're trying to do." A completely irrelevant response, right? So there's no guarantee of the breadth or depth of tasks you'll be able to enable.
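The failure mode described here, where every question triggers a fresh, stateless top-1 semantic search, can be sketched in a few lines of Python. The emails and the lexical-overlap scoring function below are toy stand-ins for a real corpus and embedding similarity, not anything from the actual product:

```python
# Toy illustration: stateless retrieval answers each question independently,
# so a follow-up ("what time exactly?") is matched against the whole corpus
# again instead of the email the previous answer came from.

EMAILS = [
    "Uber receipt: your last trip was on April 15 and you did spend $14.00",
    "Uber receipt: trip on June 29, 2024, pickup time exactly 6:12 PM",
]

def score(query: str, doc: str) -> int:
    """Crude word-overlap score standing in for embedding similarity."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def stateless_rag(query: str) -> str:
    # No memory of prior turns: the top-1 document wins every single time.
    return max(EMAILS, key=lambda doc: score(query, doc))

first = stateless_rag("when was my last uber trip and how much did I spend")
follow_up = stateless_rag("what time exactly was this trip")
# The follow-up surfaces a *different* email, contradicting the first answer.
```

With no conversational state, the follow-up query's words happen to match the other receipt better, reproducing exactly the April-then-June contradiction from the demo.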
This is what we've been hearing from all of our customers as well: RAG is not the solution. For finding the right document for one question, maybe it's fine; RAG is great. But for real enterprise use cases, RAG is not the solution. Let's look at tool composition. This is the AI assistant on one of the biggest CRMs. I ask it this question: can you calculate the average length of our sales cycle? It says, can you please provide more details on the specific data report you have in mind? It gives me a suggestion; I click on that suggestion; it just tells me how to create a report myself. I refresh the context and ask again. This time it says your average sales cycle is 71.6 days. I ask it, can you explain that? It says it arbitrarily took the average length of stages 1 through 4. But we have 7 stages; can you look at all of them? It never came back with a response. Refresh the page again, ask the same question; this time it says 2.21 days. No consistency, no repeatability, no explainability into what it's really doing under the hood. I can't work with this anymore; I don't even know what to prompt to get the right answer. So it's not predictable for complex tools or large numbers of tools. Even though we have this amazing standardized protocol for connecting tools to AI systems, the model context protocol, if there are a hundred MCP tools connected to my AI, there is still no guarantee that my AI is picking the right tool, passing the right context, and orchestrating reliably between those tools. So let's look at text-to-SQL, where we have structured data or other data systems and we translate natural language queries into specific database queries. We asked a question like: how many albums from the metal genre have a positive and happy-sounding title? That's a question I genuinely want to ask. But how do I translate "positive and happy sounding" into SQL?
There is no way for me to do that, right? We say SQL is a Turing-complete language, but I can't translate "positive and happy" into SQL. And this is one of the copilots that gives up and says, no, I can't do that. Second, as a business user who doesn't even understand SQL, I can't work with these systems. I can't reliably steer them or understand what they're doing. This is limited to the analysts, to the SQL experts, and I am not one of them. One of our customers said that only their analysts can use something like text-to-SQL. And it's limited to what's in the database: what if I want to connect to external systems? I can't with a text-to-SQL-based solution. So what's the solution? You mix and match all of these and create agents. You can increase the quality of the output, but this keeps increasing the complexity. Tool calling that integrates search-based retrieval and text-to-SQL? Sure, but it has the same pitfalls as tool composition: the complexity of tools. What about multiple agents, which is just a bunch of tools talking to one AI, and then multiple AIs talking to each other? They get caught in collaboration loops, they forget to stick to the plan the orchestrator agent came up with, and there is no reliable way of transferring a lot of data and context between the different agents. So see, this is what the current approaches are doing. Whatever AI system you have built, it takes an input and generates a result. The AI system was created for a use case, and the result gets generated. But if you expect the AI to generate the final result, you are expecting the LLM to hallucinate and just hoping that the hallucination is correct. If you let the AI generate the final result, there is no guarantee; there are no guardrails you can put on it to ensure there will be no hallucinations and guaranteed correct results.
And second, you will be confined to the specific use case your AI system was designed for. That's okay if you have a very clear use case, you know exactly why you're building what you're building, and there's only a tiny breadth of queries you want to run; these systems work incredibly well. But for any enterprise use case, for any real-world use case, you need a general-purpose system, a system which can adapt to whatever you need it to do. Right? So that's what we have tried to build with PromptQL, which is a catch-all domain-specific language that decouples planning and execution. What I mean by that is that whatever LLM you're using, the best state-of-the-art LLM, is responsible only for creating a plan. Think of it this way. If I give an analyst a business problem, say, why do you think our sales are dropping month over month?, the analyst is not going to just blurt an answer back. They are going to say: okay, to answer your question, I first need to go to these 7 different systems and pull out all of this data; these are the kinds of analyses, data compositions, and aggregations I need to run; and then finally I'll structure my response in a certain format. And then I will implement this somehow: I'll write code, click a button called execute, and let my computer execute it. That will fetch the data from the different systems, run the code, do the analysis, and come back with a response. I, as the non-deterministic human, am coming up with the plan. I am not executing it, because I can't read the entire database in my head, and I can't do the math in my head. LLMs are exactly the same way. So let the LLM come up with the plan, and you can put guardrails on those plans. And the plan is in a deterministic language, which you can execute in a deterministic runtime.
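The planning/execution split described above can be sketched as follows. The step vocabulary, the tables, and the guardrail check are all invented for this illustration; PromptQL's real DSL and runtime are far richer:

```python
# Toy plan/execution split: the "LLM" only emits a plan (a list of named
# steps); a deterministic runtime validates and executes it. The result is
# computed from data, never generated by the model.

USERS = [
    {"email": "alice@acme.com"}, {"email": "bob@acme.com"},
    {"email": "carol@globex.com"},
]
INVOICES = [
    {"email": "alice@acme.com", "amount": 120},
    {"email": "bob@acme.com", "amount": 80},
    {"email": "carol@globex.com", "amount": 150},
]

# A plan an LLM might emit: deterministic once written down.
PLAN = [
    {"op": "extract_domains"},
    {"op": "sum_billing_by_domain"},
    {"op": "top_domain"},
]

ALLOWED_OPS = {"extract_domains", "sum_billing_by_domain", "top_domain"}

def execute(plan):
    # Guardrail: reject plans that step outside the allowed vocabulary.
    for step in plan:
        if step["op"] not in ALLOWED_OPS:
            raise ValueError(f"unknown op: {step['op']}")
    state = {}
    for step in plan:
        if step["op"] == "extract_domains":
            state["domains"] = {u["email"].split("@")[1] for u in USERS}
        elif step["op"] == "sum_billing_by_domain":
            totals = {}
            for inv in INVOICES:
                dom = inv["email"].split("@")[1]
                totals[dom] = totals.get(dom, 0) + inv["amount"]
            state["totals"] = totals
        elif step["op"] == "top_domain":
            state["answer"] = max(state["totals"], key=state["totals"].get)
    return state["answer"]
```

Because the plan is data rather than a free-form answer, it can be inspected, edited, and re-run deterministically, which is the property the talk keeps coming back to.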
So now think about all the deterministic guardrails you can put on it. And that execution is what creates the result, which is not AI-generated. That's the entire idea behind PromptQL: decoupling planning and execution and letting the LLM be responsible only for the planning. If you try to map out all of the AI technologies, from narrow to general and unreliable to reliable, what you realize is that the early AI systems with very narrow use cases, the ones that can set a timer or change your thermostat, are also very unreliable. Then you have custom-built agents, which are highly reliable, but only for that specific use case. Then you have the assistants that are getting popular, the ones every third LinkedIn post is about. These are more general-purpose, but they just don't work, as we just saw. So it's all hype. Then there are the super general-purpose assistants like Claude, Manus, and ChatGPT. All of these are human-in-the-loop, for non-enterprise-grade use cases, but they're still somewhat reliable and somewhat general. But if you look, there is one line blocking us from reaching the top right corner: a completely general-purpose, hundred-percent-reliable AI system. And that's what we believe humans are, right? We are very close to being general purpose; we can handle almost any task, and we are pretty reliable at what we do. That is the goal. So let's look at an attempt to build a highly general-purpose and highly reliable AI, which is what we call PromptQL: an accurate AI for any analysis or automation. Let me jump into a demo of PromptQL for those who haven't seen it before, and then I'm going to talk about what makes it reliable. I'm just going to refresh the page to make sure my internet is working.
So this is PromptQL running on top of an enterprise SaaS company's data systems. I can ask any free-form question. Can you help me find the organization, the org, that has brought us the highest billings over all time? Find unique orgs based on the email domains of the users, something like that. Right? I can ask a free-form, natural language query and make it as specific or as general as I want. And I want an AI system that understands what I'm trying to do and breaks it down into a plan. This is what I was talking about: the plan that PromptQL generates. This view is for non-technical users; if you want to look at the actual DSL that PromptQL has generated, it's right here. Hey, Anushrut, could you just bump up the font size a little bit? It's a little hard to see. There you go. Lovely. Thank you so much. Of course. Okay. So you see how PromptQL came up with this plan. I'm going to refer to the natural language plan here because it's easier to talk about and explain. It says: first I'll get all the users and extract their email domains, join this with the invoice items to get the billing amounts. Great. I didn't even have to tell it that the invoice items are where the billings are. Group by email domain, sort by total billing, and then store it in an artifact for you to see. That's it. That was my AI's job, and it's done. Now, this completed in 11 seconds. The underlying deterministic, programmatic runtime is executing this plan and creating this result. And my AI doesn't even answer my question at the end; it just says, hey, this is the answer the system gave back, you look at it. And that's perfect, because now I am very sure this is correct: I understand the plan, I understand that the execution happened deterministically, and this actually came from my data systems. There is no AI generation happening here. Another great thing is I can edit this plan.
I'm like, this is good, but when you are finding the orgs, can you ignore all orgs with less than, let's say, ten projects between their users? So I can just do that: edit my query plan and say, hey, do this instead. And it's like, I get what you're saying; same thing as before, but I need to fetch the project data as well and then ignore those orgs. Perfect. Nice. Now I can ask for data analysis, I can ask semantic questions like: how is the first org feeling about our product? Can you look at the support tickets? So make it call Zendesk, make it call our ticketing system. It's like, yeah, OK: for Williams.com, I need to get all the support tickets, then analyze the tickets, and also the ticket comments, to understand the sentiment, and then share it. This is a more complicated semantic query. So I'm going to execute that. It will take a little bit of time, but it will execute. Okay. These are all 68 support tickets that I have. Based on the status, they show neutral sentiment, suggesting standard technical support interactions rather than major frustrations. Perfect. So see, this is what reliable AI looks like: it tells me exactly what it's going to do, does that deterministically, and keeps me in control as the user. I can ask it to do whatever I want, however I want it done. I can ask a vague question, make it come up with a plan, and then steer that plan to become more specific. So this is PromptQL: an AI platform that delivers human-level reliability for any natural language analysis or automation on your data and systems.
But what it really does under the hood to ensure this reliability is that it learns and codifies the unique language of your business (I'm going to get into this next), which it uses for its LLMs to create these plans and execute them deterministically. Think of it as a staff-level analyst or engineer in your company that anyone in your org, or your customers, can trust. So how do we make the plans reliable? Some of you might have this question: you still have AI generating the plans; that's still non-deterministic, still prone to hallucinations. So what is making your plans reliable? What makes our plans reliable is that they are generated by an AI that understands your company's language. You speak your company's language; all your employees speak your company's language. There's so much tribal knowledge, so much tacit knowledge that you have. There is so much domain context that you already know: the constraints you're working with, what the different terminologies mean, what kind of KPIs you're optimizing. All of this you know, and hence whatever actions you take are based on this knowledge. But your AI does not have it. My AI does not know that, hey, if I need to answer a question on how to improve my sales forecasting, I also need to keep 5 other guardrails, other KPIs which should not go down while I improve a certain metric. This is knowledge that you have and your AI does not. And I can't be expected, as the engineer building this AI system, to think of every single thing and put it as context into the AI, right? So there is something missing between your AI and your data, which your AI should be able to understand so that it speaks the same language you speak in your business. An AI can speak your company's language with what we call an agentic semantic layer.
Think of this agentic semantic layer as a completely bootstrapped, self-improving metadata or context layer, which keeps capturing business context and business knowledge as you keep using the AI. Every time you had to correct the AI, no, no, this is not what I asked; this is what I meant; this is what the terminology means, all of that context you were giving the AI, the AI should be learning from it and improving its own semantic layer. And that's exactly what we have built with something called Autograph. So let me show you Autograph. Actually, let me show you this recorded demo first and then I'll show it to you live as well. This is a good demo for understanding what Autograph really means. What I did is connect a database to PromptQL which has very poorly, completely absurdly, named tables and columns. And I asked a question like: what employees are working in departments with more than ten thousand dollars in budget? The 3 tables I've connected are called MORC, PLUG, and ZORB, completely meaningless names. The AI has no idea what they even mean, right? So the AI says: I don't see any information about employees, departments, or budgets; all I see are 3 tables called MORC, PLUG, and ZORB; I have no idea what to do. Which is already a reliable response, because I know the data is very bad. So I say: can you sample a few rows from each table and figure out which table contains employees? PromptQL says, OK, cool, I'm going to look at every table and sample a few rows. And now I understand: ZORB contains employee information, PLUG contains department information, and MORC is a junction table. So now I can execute this; I can answer your question, because I understand. I, as the developer, didn't give any context to the AI. I let the AI figure it out itself. Okay, this is awesome.
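That bootstrapping step, sampling a few rows from each cryptically named table and inferring what it holds, can be sketched like this. The table names echo the demo (ZORB, PLUG, QWERTY), but the rows and the classification heuristic are toy assumptions standing in for the LLM's judgment:

```python
# Toy sketch: sample rows from badly named tables and guess each table's
# role from the shape of the data, the way the demo let the AI do.

TABLES = {
    "zorb": [{"id": 1, "name": "Ada", "plug_id": 7}],
    "plug": [{"id": 7, "dept": "R&D", "qwerty": 1_500_000}],  # budget in cents
}

def infer_description(rows):
    """Heuristic stand-in for the LLM's look-at-a-sample reasoning."""
    cols = set(rows[0])
    if "name" in cols:
        return "looks like people/employees"
    if "dept" in cols:
        return "looks like departments (qwerty may be a budget)"
    return "unknown"

descriptions = {t: infer_description(rows) for t, rows in TABLES.items()}

def budget_in_dollars(dept_row):
    # The QWERTY column stores cents; divide by 100 for questions in dollars.
    return dept_row["qwerty"] / 100
```

The point is not the heuristic itself but that the descriptions are derived from sampled data rather than supplied up front by an engineer.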
But there's one more caveat: the budget in our data is in cents, not in dollars, and I asked the question in dollars. So you'll have to divide by a hundred, right? Can you do that, please? PromptQL says, cool, I'm going to divide by a hundred. So now it filters down to just 2 employees. Now, what Autograph lets us do is capture this steering we had to do. We had to give context about a certain table; we had to give context about how our data is stored. All of this, my AI should be learning. Autograph runs automatically in the background as well, but here is a manual way of executing it to show what it's doing under the hood. All I'm saying is: suggest metadata improvements based on the recent threads. That's all we ask Autograph to do, automatically, again and again. It says: okay, cool, I'll look at the last hour of conversation; I see there are 3 models; let me analyze the thread state to see if there are any meaningful interactions we had with these models. I can now generate meaningful descriptions for these different tables. That's exactly what it did: it created descriptions for these tables, and it also added the context that, hey, this column called QWERTY holds the department budget, which is in cents. So it not only described every single table, but also every single column. And now all I have to do is click on Apply Suggestion. This semantic layer, this metadata layer, is version-controlled; it has a concept of immutable builds. So it creates a new immutable build on top, which I can then run my evals on to make sure everything is working fine. You see, zero seconds ago a new build was created. And now if I ask the same question, which employees are working in departments with more than ten thousand dollars in budget, this time the AI does not have to say, hey, I don't understand what you're talking about.
No, it understands now, because it has all the context in the semantic layer. And it also understands that the data is in cents and not in dollars, right? So this is what the agentic semantic layer looks like. Let me show it to you live here as well. Let me ask a question. This is a finance database with a lot of transactional data and anti-money-laundering data, and I'll ask it: find accounts with the maximum suspicious AML outgoing amounts for the first quarter; for each, print out the account ID and name. Let me refresh this page again just to make sure it's working fine. So it says: OK, this is the query plan; this is what I need to execute; I'm going to execute this query plan under the hood and then come back with a response. That's perfect. I asked the question for the first quarter, and it assumed the quarter starts in January and ends in March. But let's say this data is for a country where the quarter starts in February and ends in April. So I say: the table values are for a country where the financial quarter is offset by a month, so Q1 is February to April. Now, this is tribal knowledge that I have about my domain; the AI just did what it thought was right, so I had to steer it. And it says, okay, cool, I'll do the same thing, but with the filters from February to April. Perfect. So this is your answer. Now, if I go to Autograph, and I'll skip this question, let's just do the last thread: improve my metadata using insights from the last thread. Again, Autograph runs agentically and automatically under the hood, but you can also execute it manually if you want to run it for a certain set of users, certain types of threads, or certain sets of queries. So it says: OK, I'll first get the schema information and understand the schema, and I'll calculate the time range of this thread.
From the last few hours I'll get the top thread; I can identify several insights to improve the model and metadata descriptions. So let me create these improvements, focusing on the anti-money-laundering and accounts tables. These are the improvements it's suggesting: it needs to add context that the financial quarters are offset by a month, and something else I didn't even realize, some information it needed to do certain joins; it has that as well. Again, apply the suggested improvements and a new build gets created, generated by Autograph. There you go. Next time you ask the question: no more steering, no more nothing. The AI just keeps learning. So that is what we have been able to achieve with this agentic semantic layer: a highly reliable AI that speaks your language, your company's language. Yeah, so let's quickly see what customers have been saying about an AI that speaks their language. Think of it as a CompanyQL, your own QL. One of the directors of data for a Fortune 500 international food chain: we tried building it ourselves and couldn't, then we evaluated a hundred vendors and nothing worked, but then we saw PromptQL, and it was just completely reliable AI. A VP of AI for a Global 2000 internet services company: no other tool has come even close to meeting our expectations; PromptQL has met and exceeded them. The CEO of a high-growth fintech company: PromptQL was able to demonstrate a hundred percent accuracy on the hardest questions in our eval set. One of the things we do with our customers is ask them: give us a set of your hardest questions, no matter what kind of assumptions they require, what data sources they might need, or how complicated the analysis is. And we promise that we'll get you a hundred percent accuracy on top of that.
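Stepping back to the Autograph demos: the loop of harvesting corrections from recent threads, folding them into table and column descriptions, and cutting a new immutable build can be sketched as below. The data structures here are invented for illustration and are not PromptQL's actual metadata format:

```python
# Toy sketch of a version-controlled semantic layer: each applied
# suggestion produces a new immutable build instead of mutating in place,
# so earlier builds stay available for evals and rollback.

import copy

builds = [
    {"version": 1, "descriptions": {}},  # bootstrap: no business context yet
]

def apply_suggestions(suggestions):
    """Create a new immutable build with the merged descriptions."""
    latest = builds[-1]
    new_build = copy.deepcopy(latest)          # never mutate an old build
    new_build["version"] = latest["version"] + 1
    new_build["descriptions"].update(suggestions)
    builds.append(new_build)
    return new_build

# A correction learned from a thread: fiscal quarters are offset by a month.
apply_suggestions({
    "transactions.quarter": "Fiscal quarters are offset by one month; Q1 = Feb-Apr.",
})
```

Keeping every build immutable is what makes the "run my evals on the new build before trusting it" step from the demo possible: the previous build is still there, byte for byte, to compare against.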
So with that, I'll ask one last question, which is: what would you call your company's language? We call it PromptQL, which is a generic term, but think of your own YourCompany-QL, right? What would you call your company's language? And with that, Rob, I'll ask you to come back up. Let's see if by any chance someone is answering this. Yeah, we'll give it a second there. We will have a couple of questions that'll come through. I'm actually going to lead off with one of my own, just because I'm curious. Yeah. But we'll give it a second here to see some responses that come through. It's just a configure. Nice. Harsha, can we pull the question off the stage for just a moment, please? Thank you. PromptQL. Yeah, I had to resist the urge not to say "your mom QL." Sorry. Won't put it in writing. Before we jump into the questions that we have over in the Q&A section, I have one. So we were speaking at a conference last week, and I know that you're back in Vegas this week with another conference that's going on right now — a lot of us are. One of the things that people presented as a barrier to getting involved with AI solutions was their concern about not only data hygiene, but also getting things rectified in advance, and all the work that would go into it before actually connecting to an AI system. So could you speak a little bit to the Autograph use case, and how essentially you can sidestep all that work completely and farm it out to a service, as opposed to having to do all that chore work yourself? Great, great, great. So I'll talk about both Autograph and PromptQL here. See, if you think that you can have perfect data to build perfect AI, you are kidding yourself, right? You will never have perfect data, and you will never be able to train a perfect AI on top of that, or make an AI work perfectly on top of that. So your AI needs to be adaptable, just as you are as a human, right?
You understand — you might also run into unexpected problems, unexpected data messes, right? But then you improvise, adapt, and overcome. That's what PromptQL has been built to do: improvise, adapt, and overcome. And then Autograph is built to learn that this was the problem, so that next time it can account for it. That's one. Second is: PromptQL is great for data prep and data investigation. You can ask it, hey, can you find inconsistencies in my data, and X, Y, Z? And then you can ask it, okay, can you go fix it, please? And it'll be like, okay, cool, this is what I'm going to do to fix it — does this look cool to you? Okay, cool, I'm going to go and do it. It'll take about 20 minutes and fix as much of it as it can — working with your data engineers, of course. Yeah, and I think that's the big point, right? We're saving a tremendous amount of time to market, time to implement a solution, because instead of having to do all the manual work of sanitizing data and making sure that things are semantically like they should be, let's have PromptQL and Autograph do it instead. Easy peasy. Nice. Awesome. Harsha, do you want to start throwing some questions up here that we have from the Q&A section, please? Lovely. So the first one says: is PromptQL working within the GraphQL Hasura interface, or is this a completely separate product which is also working through standard SQL queries as well as searching other data points? Is PromptQL working within Hasura GraphQL? Okay, so think of PromptQL as the evolution of GraphQL. Because one of the use cases that we have is generative APIs, right? You want to consume data APIs in your traditional applications. Previously, you had to write these APIs yourself. With GraphQL, we solved this: your GraphQL API is just there.
We just connected the sources, and we have a GraphQL API. What's next? What's next is natural language. I should be able to build my API endpoints using natural language. My business logic is in natural language: I want to consume X, Y, Z data in a certain JSON format, and this data should come from X, Y, Z sources, and before it gets rendered to the front end, it needs to go through some business logic. All of this I can just say, and PromptQL will write the PromptQL program — and then we have something called the Programs API. Every PromptQL program that you run is itself an API, so you can just call that API every single time you want to do exactly what it did. No more AI there; it's just a static program that gets executed. That's the evolution of GraphQL. That's what the next thing is. And that's PromptQL. And is it a completely separate product which is also working through standard SQL? So, okay: the underlying data engine, the data delivery network, is shared by GraphQL and PromptQL. PromptQL does not use GraphQL under the hood to send requests to this data engine. We use a certain dialect of SQL, because SQL is closer to a Turing-complete language, and LLMs are really good at generating it. But that does not mean the underlying data sources have to be SQL. It can be anything — the same things GraphQL was supporting: SQL, NoSQL, SaaS APIs. It's essentially an implementation detail that you as a user don't have to know about. Exactly. Yeah. Awesome. Excellent. Thanks so much. All right. Next question coming up. We've got a few of these — this is great. Can you manually provide semantic data like synonyms, definitions, et cetera? So my first thought, Anushrut, is the LSP that we have inside of VS Code and things like that. Do you want to jump in? Yeah. Autograph was invented a couple of months ago; before that, that's what we were doing. So, yes — you can provide semantic context manually.
So, first of all, whenever you connect a new data source to PromptQL, you run this command called introspect. Introspection basically pulls out the schemas from your data sources. If you have comments on your Postgres schemas, they all get put into the semantic layer. Then you, as the developer, can manually add a lot more context: this table is about X, Y, Z; use this table only for certain types of questions; this column has information about X, Y, Z. So you can annotate your data a lot more manually — or you can let Autograph do it. So yes, to answer the question. All right, next question coming up here. Can you provide more detail on how to provide context on the data models, the types of data, and how the data is related to other pieces of data? Brilliant question again. I should share my screen again, I think. Okay, share screen. Let's just look at a project of a telecom company. Let's explore it, right? So whenever you connect your data sources — let's just go to the latest build; please pardon my hotel Wi-Fi, no idea what's happening — okay, perfect, I'm just going to make this a little bit easier to see. Okay. So whenever you connect a bunch of data sources — for example, here we have Mongo, we have ClickHouse, we have Aurora, we have Atlas, so a bunch of different data sources — you run the introspect command and it creates the supergraph on its own. So any relationship that existed inside a specific data source will automatically get picked up. All of these connections that you see — all of these are relationships. Then the great thing about the supergraph architecture is that you can create relationships which are cross-domain as well, across data sources. For example, this table, customer_link, is connecting this Aurora customers table to this ClickHouse table. This is a cross-domain relationship.
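To give a concrete feel for what a manually annotated semantic-layer entry might look like, here is an illustrative YAML sketch of a cross-source relationship in the spirit of the customer_link example. The field names only approximate the style of Hasura's metadata — this is a hypothetical sketch, not the exact schema; consult the actual PromptQL/DDN documentation for the real format:

```yaml
# Hypothetical cross-domain relationship annotation (illustrative only).
kind: Relationship
version: v1
definition:
  name: customer_events
  description: >
    Links Aurora customer records to their ClickHouse event rows.
    Use this when a question spans customer profiles and behavior.
  sourceType: Customers        # table in Aurora
  target:
    model:
      name: Events             # table in ClickHouse
      relationshipType: Array
  mapping:
    - source:
        fieldPath:
          - fieldName: customerId
      target:
        modelField:
          - fieldName: customerId
```

The description field is where the manual tribal knowledge lands: it tells both the data layer and the LLM when the relationship is the right one to traverse.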
Now, this also gives your AI the context — and gives our data layer the context — that, hey, this is a relationship that we can operate on. So that works when you can very clearly define a foreign key relationship. Sometimes you can't. Sometimes you can't define a very clear foreign key relationship, but you can still say, hey, this is similar data — and that goes in the semantic layer. So if it's a hard relationship that you can easily quantify, you can just define all of these relationships. That's good. I was going to say, with the visualization as well — one thing that I like to explain to people is that, hey, this visualization is powered by a very human-readable YAML format that we have. And that's the exact same thing that eventually gets passed to PromptQL to understand how everything maps together. Exactly. Awesome. All right. We have more questions coming. Next one up: I have a RAG project that uses a semantic DB for data storage. I have also added re-ranking to it. Is there any other way to improve my model's accuracy? So this is a RAG-specific question, and I don't want to behave like I'm a RAG expert. I'm assuming what you're asking is: how do I improve my semantic search capabilities, right? We are not a semantic search company or a semantic search product, so I would refrain from commenting on that. But the most reliable way of orchestrating your vector database and your semantic search functions will still be PromptQL. Good answer, good answer. All right, Harsha, next one, please. Is there a way to audit or log every SQL query generated by PromptQL for compliance? Sharing my screen again. Yep. Okay, let's do this. So if I go back to my project and go to Insights — that's for this specific thread. For this thread, you can trace every single thing: when was the LLM called, which step took how long, what was the exact SQL query that was generated.
You can pretty much track every single thing. So yes, you have complete visibility into what's happening. Finally, total observability into LLMs. Nice, nice, nice. All right, next question. Harsha, we've got to keep moving through these so we can get through the rest of the show, too. All right: in Hasura, are we able to control access to realms of data based on data in a token? So I guess the question is, do we still have RBAC with PromptQL? Yes. Yes, at the core of it. Without that, you can't power enterprise use cases. So you have role-based access, you have token-based access — the same access control rules that we had before apply today. Harsha, before you put up the next one — I was going to say, let's have this be the last question for this section. I know that there's a lot more. Anushrut, I'm going to ask you — even though I know you probably have to get to the conference floor after this — to stick around after you answer this question live, and to answer any questions that we haven't gotten to in the Q&A via text. So the last question that we have here is: can PromptQL be self-hosted for privacy and data concerns? Yes. So there are multiple ways of hosting PromptQL. You can use our cloud. You can bring your own cloud. You can host the data plane while we host the control plane, or we can do completely self-hosted as well. So there are a bunch of different deployment options. Totally. And I would say that we probably got through like 75 percent of the questions, so you've got a few over there that you'll need to answer before you hop off. Thanks so much for being here today, and thanks for the great demo. Yeah. Thank you so much. Thanks, folks. Awesome. All right. Harsha, if you would please, can we pull that question off? Thank you so much. All right, folks, let's keep chugging along here, because we only have about ten minutes or so left, but we're going to try to squeeze everything in — and we'll probably go a little bit over, just a heads up.
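To make the RBAC point concrete, here is a toy sketch of token-driven row filtering. The rule shapes, roles, and session fields are invented for illustration — this is not PromptQL's actual permission syntax, just the general idea of filtering rows by claims in a token:

```python
# Toy role-based row filtering: each role maps to a predicate over
# (row, session), where session holds claims decoded from a token.
ROW_FILTERS = {
    "admin":   lambda row, session: True,  # admins see everything
    "manager": lambda row, session: row["region"] == session["region"],
    "user":    lambda row, session: row["owner_id"] == session["user_id"],
}

def visible_rows(rows, role, session):
    """Return only the rows the session's role is allowed to see."""
    allow = ROW_FILTERS[role]
    return [r for r in rows if allow(r, session)]

rows = [
    {"id": 1, "owner_id": "u1", "region": "EU"},
    {"id": 2, "owner_id": "u2", "region": "US"},
]
print(visible_rows(rows, "user", {"user_id": "u1"}))    # only row 1
print(visible_rows(rows, "manager", {"region": "US"}))  # only row 2
```

In a real system the filter is pushed down into the generated SQL rather than applied in application code, so the model never even sees rows outside the caller's scope.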
So please do stick around so you can see not only what we're talking about in this next section on reliability, but also what we'll have with regard to UI and UX updates inside the console. So without further ado, I'm going to bring on Vamshi and Tanmay. Vamshi is a principal engineer here at Hasura; Tanmay is our CEO and co-founder. How are you, Tanmay? What's up, everybody? It's been a while since we've had you on the show. How are you? Good. Glad to be back on. Excellent. We've spent a lot of time answering questions, so I'm going to jump out of the way, unshare my screen, and turn it over to you folks. All right. Thank you. So Vamshi and I are going to talk about something that we've been building over the last few weeks — something that emerged as a very natural artifact of what we could do once you have planning, right? So one of the big questions you'll have — let's say you're talking to an agent, or to a human being that's doing some analysis for you — the question you often have is: how do you trust what your colleague, or the person you're depending on, does? For example, let's say you ask this human a question about the product or about customer support. You're like: help me understand this, or fix this for me, or analyze this for me, help me solve this problem, tell me how many active users we had yesterday. In any of these tasks, there's this gigantic implicit assumption that the other person understands what you're saying — and that the person helping you solve the problem is solving it reliably, that there are no holes in their understanding that will cause an incomplete or inaccurate analysis. For example: hey, tell me all of the products in the dairy segment. How would you trust a human that gave you an answer?
Let's say the human gave you an answer that was 1 or 10 or 100. It's an answer — how do I know if the answer is reliable, right? And the way you do is by the exhaustiveness, the rigor, of how they thought about solving that problem, or how they reacted to ambiguities in the data or in the question itself. So they're like: ah, dairy segment — do I have a category called dairy? I don't have a category called dairy. Do I have a product name? And do I think product names will have the word dairy in them, or product descriptions will have the word dairy in them? Maybe I should look at the product descriptions and semantically analyze them to see if they belong to the dairy segment. There are so many different ways. And when you build trust, you're building trust with another human who's solving this problem for you, because they understand that kind of data and your question well enough to figure out the right plan. And so when PromptQL is doing this for you, and you ask PromptQL to do something for you, you need something to help both you, as the user of PromptQL, and PromptQL itself get a sense of whether its work was reliable or not. We call this the reliability score. This is our first release of the reliability score, and we'll gradually amp up the complexity of what it's able to do. But let's take a quick look at it in action, and then Vamshi can talk to us a little bit about what kinds of things it looks for, what kinds of reliability issues it solves for, and what is on our roadmap. So let me share. Can you folks see the supergraph here? Let's see. Make sure it's there on stage. All right, thank you. Cool. We were debating what project to use, and Vamshi is a football person. Are any of you football people? Soccer people or football people? Please do let me know.
And do let me know, and please feel free to post a few questions in the chat as well that we can try out. But Vamshi, where did you get this data from? I think it's a public data set on Kaggle. I can share a link. Awesome. So this is a Kaggle data set, and it basically has a bunch of data about people, and players, and player valuations, and games, and stuff like that. So let's start doing some interesting stuff with it. One of the questions I was trying out was: help me identify players that are underperforming, given their player valuation, in 2023. Is that a way to frame the question, Vamshi, as a football person? Sounds good. Are these the words that you would use? Would you say "player valuation" or "given their value"? I would say "given their valuation," I don't know. All right, cool. So, yeah, doesn't matter, right? I want to find out about people who — you know, oh, this athlete joined this club for like a bazillion euros or dollars or whatever — and are they performing up to the mark? And so PromptQL, like you saw in Anushrut's demo, creates a plan, figures out what to do, comes up with a way of thinking about performance stats. It says: you know what, we'll do goals and assists. Because what PromptQL is trained to do is say, let me try to solve the problem. PromptQL is like an energetic, try-hard intern in your org. PromptQL is trying to have your back. PromptQL is like: you gave me a problem, let me look at the data I have, and to the best of my ability, let me solve that problem. So it found a bunch of data, and then it tried to use that data to come up with an answer. And if you take a look at the quick plan, it says: I'm going to look at goals and assists in their appearances in 2023, and I'm going to identify players who have high valuations but low performance. This is how it defined performance, right?
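The plan described here — goals plus assists as the performance measure, an explicit cutoff for "high valuation" — can be sketched in a few lines. The players, numbers, and thresholds below are made up for illustration; they are not from the Kaggle data set, and this is a caricature of the query plan, not PromptQL's actual code:

```python
# Toy "underperforming players" analysis. Note how the definition bakes in
# exactly the assumptions the reliability score flags below: defenders and
# goalkeepers rarely score, so they get unfairly flagged.
players = [
    {"name": "A", "valuation_eur": 80_000_000, "goals": 2,  "assists": 1},
    {"name": "B", "valuation_eur": 5_000_000,  "goals": 0,  "assists": 0},
    {"name": "C", "valuation_eur": 60_000_000, "goals": 25, "assists": 10},
]

def underperformers(players, valuation_threshold=10_000_000, performance_floor=10):
    """Players with high valuation but low goals + assists."""
    return [
        p["name"] for p in players
        if p["valuation_eur"] >= valuation_threshold
        and p["goals"] + p["assists"] < performance_floor
    ]

print(underperformers(players))  # ['A'] - B is cheap, C performs well
```

Both thresholds are assumptions, which is precisely why the reliability score's commentary on them matters.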
That's how PromptQL decided to define performance. And it made some assumptions that it told me about, right? It's making a few assumptions about how to determine performance and what counts as a high valuation. And what happens on the bottom right, asynchronously? I'm sure there's some data here, and it's kind of fun to see this chart if any of you are curious about what is happening with valuation versus performance, but I'll draw your focus to the bottom right corner, which has a reliability score that pops up asynchronously. This reliability score gives you a meta-commentary — think of it like a code review on your interaction and on PromptQL's work itself. It's going to comment, as a neutral entity, both on you and on PromptQL. And it says: you know, this analysis is only considering goals and assists, which is not the only definition of performance, right? Because you're not capturing what players in defense do, or what goalkeepers do — a goalkeeper scoring even one goal is, like, a superlative performance, perhaps. And it flags the ten million euros threshold: is that what you were looking for or not? And of course, this analysis also includes appearances from all competitions — whereas perhaps restricting to a single league, where the players and teams are at the same level, is a better analysis for things like this. So it gives me that extra note on what could make this answer less reliable, right? And then what you would do is ask PromptQL to retry with these improvements by itself, or you would explicitly say: hey, think about this, or use ten million as a threshold, or change the threshold, or whatever it is. And then you would have a better interaction, right?
And this interaction eventually gets fed into our self-learning system, so that the next time — say next week or the week after — you came back and asked this question, the assumptions you made would automatically be captured, because that is the kind of language you use to reason about things here. When you say "high," that typically means this valuation threshold — that gets captured automatically. But the reliability score is going to be a key ingredient in helping both the user and PromptQL understand how reliable their work and their interaction is. I'll take maybe one more question from the chat if you have any questions on the football data set, but because we're running out of time, I won't show you the few other questions that I had. Vamshi, do you want to give us a quick sense of what kinds of things you're looking at when you think about reliability? What are the buckets that you've come up with, and what are the buckets that perhaps we're going to be looking at in the future? Right. So broadly, the kinds of issues that you would see: first, you have a query-plan quality issue. That is, the query plan that PromptQL picked may not be the most relevant, or may not be what you're looking for when you ask a question. That's one category of issues. And then there's the actual implementation of the query plan. The actual implementation is not as important, because models these days are quite good at writing code — and we also have a loop where, even if a model writes an incorrect SQL query for a step that it has planned, the error message gets fed back to the LLM and the LLM corrects itself. So the second category is not as important. But if there is a logic error in the implementation, and it has high impact, then you would want to see that. So let me quickly switch to the first category.
The first category is what you're seeing here: what could cause an issue with the quality of the query plan? The number one reason is that your query is ambiguous. So when Tanmay asks this question — as someone who understands football — what are you even asking? That's my first question. Are you looking for attackers? Are you looking for defenders? That's probably the first question that I would ask. So because it made a bunch of assumptions, the reliability score evaluation is complaining that, you know what, the query is ambiguous; the query plan that PromptQL picked may not be the right one. If you clarify that — you know what, I'm actually looking for attackers — then this goes away. And similarly, let's quickly try searching for a random player. This next question of mine is going to be offensive to football people, but I don't follow football, so when Vamshi gave me this data set, the first question that I asked was: how is Smith playing these days? I don't even know if any player is called Smith. Go ahead, Vamshi. Right, so a couple of things would happen. One is, A, the question is ambiguous — but also, during implementation, how is the system looking for a player named Smith? Is it doing an exact match? For example, here, step one of the query plan is: search for players with "Smith" in their name. What does this mean? Is it doing a last-name search, or a first-name search, or some similarity search? That is what the evaluation would flag for the second category of issues, the implementation quality. And yeah, there are other classes of issues as well.
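The exact-match versus similarity-search ambiguity Vamshi describes is easy to see in miniature. This toy sketch (invented names, not the demo's actual SQL) shows how a LIKE-style substring search fetches a different set than a strict last-name match:

```python
players = ["Alisson Smith", "John Smithson", "Kyle Walker", "Rob Smith"]

def exact_last_name(players, name):
    """Strict match on the last word of each player's name."""
    return [p for p in players if p.split()[-1].lower() == name.lower()]

def like_search(players, fragment):
    """Case-insensitive substring match, like SQL's name ILIKE '%smith%'."""
    return [p for p in players if fragment.lower() in p.lower()]

print(exact_last_name(players, "Smith"))  # misses 'John Smithson'
print(like_search(players, "Smith"))      # also catches 'John Smithson'
```

Neither strategy is wrong in the abstract — which is why the reliability evaluation comments on the choice rather than silently picking one.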
So in this case, for example, it's going to be checking the quality of our search filtering — is that filtering over-fetching or under-fetching data, essentially? Because the way that you filter might work, sort of, but if you're drastically under-fetching the amount of data, then the reliability is bad, right? You're not capturing enough information. Makes sense. So in this case, because of the search implementation — can you open up the code, Tanmay? Yeah. Because it's doing a LIKE search, you can see that it's a similarity search, so it found the set of players. But if it had not managed to find them, then a quality issue would be raised about the implementation. Correct. And in this case, we're saying that the quality of the implementation is OK, because this is the best you can do here — but the interaction is not a reliable interaction, because if your question is about the one player named Smith, then none of this is going to help; you're going to have to choose the Smith that you're interested in. Right. That makes a lot of sense. I know we're over time, so Rob, I'll hand it back to you. And if you folks have any questions here, please do engage us in the chat, and we'll continue from there. Tanmay, if you can just wrap that up by doing the retry with improvements — but anyway, it's OK, I don't think we have time. If we want to make Rikin wait longer to show us console updates, you can go for it, Vamshi. Yeah, Vamshi. But I think the other thing is, you can try it on a PromptQL project that you have today. I think it's already live, right, Vamshi, for folks? Yes — you'll have to go into your settings to enable this feature once you have it. Awesome. All right. We'll let you continue on that if you want to switch over to one of these projects and retry with improvements. That'll be really fun.
All right. If we have time at the end, we will. There's one question that came through recently, and I want to address it while we're on the call right now. So Harsha, I'll throw the latest question on the stage — it came through about one minute ago — or I'll just go ahead and start reading it out loud. It says: how do you deal with syntactic issues? I imagine you use similarity searches a lot. Syntactic issues for search — well, depending on the data layer and the question that's available, the search strategy is chosen. For example, a previous question that came up mentioned semantic search. If you have semantic search as a tool that's available — a semantic search function that your database exposes, like vector databases, or your own custom search that you've written — PromptQL will use that. PromptQL will use what is available in the system, depending on the task. For example, in this football dataset, the only thing it had were these entities with the default operators that were available, so it used a LIKE search. But let's say I had a function called playerSearch that was available, that was doing a better search — then it would use the playerSearch function automatically. It would call playerSearch with "Smith" and try to get that. So PromptQL chooses its strategy depending on the question and depending on the data layer that you've given it, which is what makes it very useful, because you don't have to pre-commit to a one-size-fits-all strategy. And that's the answer to that question. That's a great answer. And I love that it's not one-size-fits-all. That's wonderful. Yep. Lovely. All right. Vamshi, I think you get to leave too, and it's just gonna be Rikin being brought on stage here. So Harsha, if you'll bring Rikin up with me. Hey man, how are you?
Hey Rob, it's been a while. It's been a while. It's good to see you. It's good to see lights on in the office, and not you just sitting in a dark cave. Lovely. This was done just for this call. OK, do you want to share with us some updates to the UI and UX for the console? Yes — let's keep it on screen. All right. Well, I'll keep this short; my updates are not as exciting as the previous ones. I'll be going through a few quality-of-life features that we've added for improving the management of threads and chats in the console. Anushrut already showed the traces stuff. We've also added a bunch of observability features to help admins and folks maintaining the project get a little more understanding of how the PromptQL project is being used. And towards the end, I'll have a sneak peek at the new PromptQL console that we've been working on. So yeah, let's get into it. I guess the first simple quality-of-life feature I'd like to talk about is that we have LLM-generated thread names now. Initially, the starting few characters of your first user message would become the thread title. But now we have an LLM taking the context of the first interaction and actually generating a name for you, which makes it a little better, I would say. If you don't like the name it generated, you now have the ability to rename it to something you like. So in this case, I'm just going to remove the suffix it's added there — and, as you see, the name has been changed. One other quality-of-life feature is the ability to pin threads. So if there's a thread that you want to come back to later, or something you found very interesting that you'd just like to keep track of, you can pin that thread and it will show up at the top of the threads list. Yeah, that's mainly small improvements around what we let you do with your thread lists and thread chats as they start growing, I guess.
I'll jump into the observability features. So, as Anushrut showed earlier, for every interaction you make with PromptQL, a trace now gets generated, and you can see all the details of that particular interaction — and basically debug and figure out what could be going wrong, or just find any other information you'd be interested in. For admins, we also have this feature called Thread History. What this feature lets you do is, as an admin, go and look at the threads generated on the project. In the playground, you would just see the threads that you have created; this gives you a view of how the project is being used, how the different collaborators on the project are actually working with PromptQL, and what kinds of queries they're running. So you get a read-only view of any thread that has been run by anybody on this particular project. Similarly, we have an API History tab here. As you know, you can also interact with the PromptQL API, so this gives you a very similar view of what API requests are being made and how each interaction went — again, more insight into how your project is being used. We now also have a Monitoring tab, which gives you some metrics on your chat and your API usage. For the API, you can see requests per minute, errors per minute, and latency. I haven't made any requests recently, so it's all empty at this point. I have made some chat requests, so you can see how many queries are happening, and again, errors and the time taken for a query to be answered. So yeah, those were all the changes we made to help people better understand how their project is being used and get some better insights into project usage. With that, I'm going to jump into the new PromptQL console that's coming up very soon.
So it looks very similar, but once I enter a project, you'll see things have changed quite a bit. The general idea we've been going for with this change is that we're trying to make the PromptQL console more tailored towards consumers — folks who come in to actually interact with PromptQL — and hide away all of the innards of how this thing works, all of that metadata-related stuff, so that you get a much cleaner interface. So we have this generally much nicer-looking page, I would say. The chat history has been moved into the sidebar so that it can be hidden away. When you go to a particular chat, here's where you will see some more changes coming in. The main change you will notice is that artifacts are now rendered inline. In the previous version of the console, you had the artifact sidebar, and that's where you would actually see the artifacts — and you kind of had to play this game of reading the text and then checking what's happening in the artifact. Now, with this version, we hope it's a little easier to follow a thread and read the conversation in a more linear fashion. So that's one of the major changes introduced in this revision. All of the other stuff that we had for developers — people maintaining the project — has been tucked away into this dev mode. If you enable dev mode, all the other familiar pages come back: you get back Insights, you get back the settings, and stuff like that. But for consumers, it's a much simpler and cleaner look to work with. So yeah, this should be landing soon — hopefully sometime early next week. And with that, I guess those are all my updates. Pretty quick.
Excellent, man. That was a great update. Great to see those changes coming through — it's going to be a nice, simpler interface. I love the hierarchy. Yeah, looks awesome. Great work. And with that, I think that is the end of our call today; that was the last topic we had. Typically in these calls we'll go through and talk about what's coming up next. I will say, if you go to hasura.io/events, you will find information about the upcoming reliability calls that we have, but I don't have anything formalized in terms of calls to action and that kind of thing. If you're curious about anything that you saw on the call today, please head to hasura.io and you'll find more information there. Additionally, we recently launched a forum for conversations around PromptQL at forum.hasura.io. So if you have questions that you want answered by the team, please check that out as well. Rikin, we'll wave bye to everybody, and then we'll let Harsha send us away. Thanks, folks.