
Fireside chat with Jilong Kan (Sr. Director of Engineering, Flexport)


Flexport’s Insights Builder transforms complex data interactions for customers, making insights and custom dashboards as accessible as asking a question in natural language. But under the hood, this system is powered by a clean semantic foundation, meticulously built to support accuracy, extensibility, and customer context. In this conversation, Jilong covers:

How the team handles evolving business definitions and continuous context
The tooling they’ve built for monitoring accuracy, both online and offline
Lessons learned from turning a great AI idea into a trusted product

It’s not just about being AI-ready; it’s about staying AI-reliable as the ground shifts beneath you.

What's discussed in the video

Go ahead and dive right in. I'm going to turn it over to Rajoshi and Jilong for our first segment. Jilong is the Senior Director of Engineering at Flexport. He's been there for almost four years, and he's been leading some of their AI initiatives. One of those products, Insights Builder, is what we're going to be diving into. For those of you unfamiliar, Flexport is a global logistics and supply chain platform; they coordinate global logistics from the factory to the consumer's door. We'll dive into Insights Builder in just a little bit, but Jilong, I'd love for you to introduce yourself to our audience.

Yeah, sure. Thank you, Rajoshi, thanks for having me today. My name is Jilong, I'm a Senior Director of Engineering at Flexport. I've been here for almost four years, leading teams focused on operations and technology in China, like the forwarding application and the warehouse management system, and currently some AI initiatives, of which Insights Builder is one. I also work in the data and infra domains. Looking forward to this conversation.

Awesome, thanks again for joining us. Maybe just to start off, I'd love for you to tell us a little bit about what the Insights Builder product is and how it even started. Was it something your customers wanted, or had it been part of the roadmap for a while? Tell us a little about the product.

Yeah, sure. Honestly, this product started with a simple observation. In our daily business, our customers continuously ask us things like: where's my shipment? What has my transit time looked like over the past few months? Why am I seeing cost spikes for the last quarter? All of those questions involve navigating several different dashboards and a lot of offline communication, and they typically involve our account management teams helping clients prepare data. So we thought: what if we could remove that friction by empowering users to simply ask a question, conversationally, in natural language? That led us to imagine an interface like Insights Builder, where users type a question in natural language and the system does all the legwork: it fetches the right data, cleans it up, runs validation, generates a summary, and even produces charts for visibility. So I'd say it was a very natural response to pain we saw on the ground, but also a chance for us to proactively reimagine how analytics should work in the era of AI.

Got it. And around when did you start building the product, and when did it ship to your customers?

We started on an internal version back in October or November of last year. After we aligned internally and got approval, we kicked off the full development work, and we rolled out the alpha to our alpha clients at the end of February, along with our winter release.

Okay, that's a pretty impressive timeline. Congratulations.
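The question-to-answer flow Jilong describes maps naturally onto a staged pipeline: draft SQL, validate it, fetch the data, summarize, and optionally chart. A minimal sketch of that shape, with entirely hypothetical function names (this is not Flexport's actual code):

```python
# Hypothetical sketch of the staged flow described above: natural-language
# question -> SQL -> validation -> data -> summary -> optional chart.
# The `llm` and `warehouse` objects and their methods are assumed, not real APIs.

def answer_question(question: str, llm, warehouse) -> dict:
    # 1. Ask the LLM to plan the task and draft SQL.
    sql = llm.generate_sql(question)

    # 2. Validate before executing, rather than running raw LLM output blindly.
    if not warehouse.is_valid_sql(sql):
        sql = llm.repair_sql(question, sql, warehouse.last_error())

    # 3. Fetch the data.
    rows = warehouse.execute(sql)

    # 4. Explain empty results instead of failing silently.
    if not rows:
        return {"summary": llm.explain_empty_result(question, sql), "rows": []}

    # 5. Summarize, and offer a chart when the data's shape supports one.
    return {
        "summary": llm.summarize(question, rows),
        "rows": rows,
        "chart": llm.suggest_chart(question, rows),
    }
```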
Awesome. And for those of you watching, we do have a demo a little later, so we'll get to see this in action. While Jilong and I were chatting ahead of this session, one of the things that really stood out to me was that you mentioned having a really good data foundation that you folks have maintained over the years. I'd love to hear about that. What was that data foundation like, and how did it help accelerate your development of the Insights product?

Yeah, good question. For the forwarding domain, and for supply chain especially, if you take a high-level view of the whole industry, data is really the key to maintaining a high level of user experience. What really differentiates Flexport from other supply chain players adopting AI for data is that Flexport has been accumulating data for quite a long time, and that data was already being used to serve our clients' daily analytics use cases. Before Insights Builder came to the table, clients could already query their platform data through the reporting and analytics capability in Client App. Client App is an external-facing platform built to encompass lots of different capabilities, like shipment visibility and booking and schedule management, as well as data: there's a reporting feature already used by tons of different clients. That's the data foundation for Insights Builder. We're reusing what we had already built to serve our clients' data analytics requirements. That's the key: we had already laid a very good foundation to build on. Users can already access a bunch of well-defined analytics to gain insights in different domains, like volume, transit time, and invoicing. But customizing reports through a complicated reporting system requires some expertise and knowledge: you have to be familiar with the system, know which fields to include in which report, and understand the relationships between different domains. We had already collected quite a bit of feedback and suggestions from clients about how that data navigation could be improved, which gave us really valuable insight, directly from clients, into the direction we could take Insights Builder. So basically, we knew what data matters to clients, and we knew what we needed to do to reshape the way they access that data and use AI to deliver a simple question-asking experience in natural language. Those pieces were really critical to the success of Insights Builder.

Got it. Who are typically the users of the platform? Your customers, what are their profiles?

For our platform, Client App, that's the platform we use to deliver quite a few different capabilities to our clients, the whole freight forwarding client base. And we have the omnichannel business as well. What we're trying to do is combine all those different client profiles onto one single platform, so that in the future every type of client can benefit from all of those features and capabilities.

Has the profile of the people using the platform changed since you introduced the AI-powered capabilities? Because you said they needed to be familiar with the platform and the intricacies and relationships between different data sources.

Yeah, I would say client-profile-wise, not too much difference, because our platform has been there for quite a while.
Our clients really enjoy the features we've already built to give them visibility and real-time communication, and similar kinds of views; they like them already. But one difference we have noticed is that with the Insights Builder rollout, our clients really recognize the technology advantage Flexport has over other players, especially in the AI domain. As I mentioned earlier, Flexport doesn't have to do much marketing to get clients onto the platform to use a new product, because the foundation is already there; users are already on the platform. What we need to do is ship a high-quality product to them, ask them to try the new AI-powered capability, proactively engage with them for feedback, and bring that feedback into the next iteration of the product.

Got it. And how do you gauge success and adoption of this product? Like you said, these are the same users; they were using the features even before the natural language capability. Now that you've introduced it, how do you measure whether this was a successful rollout? What metrics is your team monitoring?

Yeah. Let me give a quick overview of the whole product lifecycle. We launched the first version of Insights Builder back at the end of February, end of March. What we're doing now is further enhancing the system's capability and experience, digesting our clients' feedback, and delivering more features that benefit their data analysis scenarios. We're also planning to take Insights Builder to GA in this year's summer release. The focus areas we're trying to improve are better accuracy and smarter conversation, and we want to build some agentic AI features that really leverage the LLM's skills to give users more from data analytics and insight gathering. So, back to your question about which major metrics we focus on: accuracy and engagement. Accuracy-wise, we've learned that trust takes a long time to build, and one bad answer can sink that trust quickly, so we obsess over reliability. On the live side, we have guardrails. For example, if there's an empty result, zero rows, the system automatically tries to explore options to correct the result and shares with the user the potential reasons. At the same time, the system triggers real-time scoring, and our QA folks offline proactively review the low-score cases so the engineering teams can fix the root causes and hold accuracy to a high bar. On the offline evaluation side, we collect authentic user questions and add them to the backend system, specifically to our evaluation question bank, so we can regularly run evaluation jobs to keep driving accuracy higher. Further, we're adding those as a guardrail in our CI/CD pipeline to block potential regressions.
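A question bank replayed as an automated job, with a score threshold wired into CI/CD, is a common shape for this kind of offline evaluation. A minimal sketch under assumed names and data shapes; the threshold, file format, and stub functions are illustrative, not Flexport's actual tooling:

```python
# Hypothetical offline evaluation harness: replay authentic user questions
# from a question bank, score the answers, and fail CI on a regression.
import json
import sys

ACCURACY_THRESHOLD = 0.90  # illustrative bar, not Flexport's actual number

def run_eval(question_bank_path: str, answer_fn, score_fn) -> float:
    with open(question_bank_path) as f:
        cases = json.load(f)  # assumed shape: [{"question": ..., "expected": ...}]

    scores = [
        score_fn(answer_fn(case["question"]), case["expected"])
        for case in cases
    ]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Stubs for illustration; a real run would plug in the full
    # question-answering pipeline and a semantic-equivalence scorer.
    accuracy = run_eval(
        "question_bank.json",
        answer_fn=lambda q: "",
        score_fn=lambda answer, expected: float(answer == expected),
    )
    print(f"accuracy: {accuracy:.1%}")
    # Used as a CI/CD guardrail: a nonzero exit code blocks the pipeline.
    sys.exit(0 if accuracy >= ACCURACY_THRESHOLD else 1)
```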
For engagement, we're actively working with our business team to bring in more clients to try the new features and give us feedback, so we can keep the momentum and keep making the system better over time.

That's awesome. And we are on an episode of the reliability call, so this is a topic that's super interesting, and we'll dive in a bit more in a moment. One related question: there are always new business rules, new business-language phrases unique to Flexport, or maybe a new definition or a new data source you need to bring into the product. What's the process for keeping this changing or new data in sync with Insights Builder, so you're always able to surface accurate, reliable answers for your customers?

Yeah, that's a good question, for sure. As the product evolves, we keep onboarding data from different domains onto Insights Builder, not only for the external version but also for an internal version. Some data isn't suitable for external facing but is definitely valuable for our internal teams, so we have two different verticals for different user personas. And across domains, some data isn't exclusive to a single domain. For example, we have invoice amount data in forwarding, but the customs domain may have that data as well. The trend we've observed is that those potential conflicts, especially around semantics and definitions in the DDL, can leave the LLM unable to differentiate what exactly the user's intention is and which data it needs to pull to answer the question. That's something we're already facing, and the team has been working on innovative solutions to make the experience smoother. First of all, the principle we follow is that the data schemas onboarded to the platform must be justified, to avoid conflicts as much as possible. It doesn't make sense for two different domains with semantically similar data to be onboarded together into one single database at the same time. That needs to be settled at the very beginning, so we're on the same page across domains. Second, we give the LLM enough training data to help it understand the differences and the appropriate responses. And if conflicts do happen, maybe a user asks a question in a really ambiguous way, those questions need to be clarified. Once clarification is done, the system moves ahead with an assumption, and that assumption is communicated explicitly to the client: I'm giving you this data based on this assumption; if you're not comfortable with it, or something needs to change, just tell me through a follow-up question. That's the approach we're adopting to head off conflicts at the very beginning, though as things evolve, conflict is never totally avoidable, right? Last but not least, we keep collecting feedback from our internal folks and from our clients directly, so we can run as much evaluation and testing as possible to ensure the solutions are consistent and robust.
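The clarify-or-assume behavior Jilong describes (ask a follow-up when two domains are plausible, otherwise answer while stating the assumption out loud) might look roughly like this. A sketch under assumptions: the confidence scores, the 0.15 margin, and the wording are all illustrative:

```python
# Hypothetical sketch of the clarify-or-assume flow for ambiguous questions,
# e.g. invoice data that exists in both the forwarding and customs domains.

def resolve_intent(question: str, candidates: list[dict]) -> dict:
    # candidates: [{"domain": "customs", "confidence": 0.48}, ...], e.g. from
    # an embedding-similarity search over per-domain semantic metadata.
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None

    # Two domains score too close together: ask instead of guessing.
    if runner_up and best["confidence"] - runner_up["confidence"] < 0.15:
        return {
            "action": "clarify",
            "prompt": (
                f"Did you mean {best['domain']} data "
                f"or {runner_up['domain']} data?"
            ),
        }

    # Confident enough: answer, but surface the assumption explicitly so the
    # user can correct it with a follow-up question.
    return {
        "action": "answer",
        "domain": best["domain"],
        "assumption": (
            f"I'm assuming you mean {best['domain']} data; "
            "tell me in a follow-up if that's wrong."
        ),
    }
```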
Got it. You mentioned training the LLMs; is that part of how you keep your system aware of the business terms? Are you folks doing some post-training work on an LLM, or is it at the semantic layer, like metadata, versus training the LLM?

Yeah, I would say we're trying both. We have quite a few different approaches. In general, we have two methods for measuring our system's reliability: one is few-shot accuracy, the other is zero-shot accuracy. Few-shot means we have training data available. Whenever the user asks a question, the system automatically runs RAG: it pulls in data directly via our backend embedding model, does the ranking, and tells the LLM, here is something you can follow to answer the question. That's supported by a really well-defined training-data mechanism built by our team. Zero-shot means we have no reference or clean data available, and the question becomes: what is the system's behavior when answering the user's question then? We're trying both ways to improve accuracy.
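The few-shot path (embed the question, retrieve similar curated examples, rank them, hand them to the LLM as references) is classic retrieval-augmented generation. A minimal sketch assuming an embedding function and a vector store; the names, score cutoff, and prompt wording are illustrative, not Flexport's:

```python
# Hypothetical sketch of the few-shot RAG path: retrieve curated
# question/SQL examples similar to the user's question and include them in
# the prompt; fall back to zero-shot when nothing relevant is found.

def build_prompt(question: str, embed, example_store, k: int = 3) -> str:
    query_vec = embed(question)
    # Assumed API: returns [(score, {"question": ..., "sql": ...}), ...]
    hits = example_store.search(query_vec, top_k=k)

    relevant = [ex for score, ex in hits if score > 0.8]  # illustrative cutoff
    if not relevant:
        # Zero-shot: no reference data, so the model must rely on the
        # warehouse schema alone.
        return f"Answer using the warehouse schema only.\nQuestion: {question}"

    # Few-shot: show the model how similar questions were answered before.
    shots = "\n\n".join(
        f"Q: {ex['question']}\nSQL: {ex['sql']}" for ex in relevant
    )
    return (
        "Follow these vetted examples when drafting SQL.\n\n"
        f"{shots}\n\nQ: {question}\nSQL:"
    )
```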
Got it, got it. Okay. We've touched on the kinds of things you folks look at for accuracy, and from what I hear, accuracy is the top metric in your evaluation. Do you have any examples from the last few months since this product rolled out where you've had to go back and change the way you built something, or rethink the product, based on, I'm going to use the word interesting, interesting responses, or changing definitions? How have the last few months been? Tell us a little about that feedback loop: you built the system with training data and eval sets, and then a real-world use case forced you to go back and change things.

Yeah, there's a lot, so I'll give a few examples: one from the external clients' point of view and one from the internal side. For the external one, we've been talking directly with our clients, the authentic users willing to give us feedback, for the past few months. One of the privileges Flexport has is that our clients include people who genuinely want to revolutionize their supply chain; that's why they chose Flexport as the go-to solution. Those people are really excited about AI, so they gave us really valuable feedback about how they feel about Insights Builder and the potential improvements we can target in future enhancements. One example from clients is that they want the system to behave like a human assistant. They want the system to be aware of context: for example, a user with a report somewhere wants to say, alert me regularly on this report if some abnormal case happens, instead of me coming into the platform to pull the data myself; I expect your system to do this for me. That's a direction we really want to enhance in the future, and based on it we're reworking our backend architecture to make that possible, so the existing AI capabilities can eventually be delivered to all our clients. You'll see some really cool features rolled out to clients along the way in the beta phase. The second example is from our internal teams. If I remember correctly, we've already gone through three fundamental architectural revisions; since last November, we've re-architected the system three times. That was all based on feedback from both internal and external people: we rethink our future, our product, and the limitations we've observed in the current architecture. It hurts, but it's valuable: a really valuable investment in revisiting our decisions and asking what we can do to deliver a better experience for our clients, not only from the reliability perspective but also from the security and scalability perspectives.

Are you able to tell us about one of those instances? What was the feedback, and what kind of re-architecting did you have to do?

Yeah. For example, in our previous architecture the system wasn't very aware of previous context data. If you asked a question that produced a table, that table's data wasn't persisted to the context. The next time you asked a follow-up question, the system would re-run the query to gather the data and, say, render a chart if you wanted to visualize it. That could cause discrepancies between the two turns: maybe at first the LLM gave you SQL with columns A, B, C, but the second time it gave you more columns, or fewer. That creates real inconsistency in the experience. That's one of the reasons we want the backend, we call it the supervisor node, to be aware of the previous context and data: for follow-up questions, we don't have to re-run the whole process of generating and executing SQL again; instead, we simply take the data from the previous context and, for example, render a chart for the user.

Got it. Okay. And maybe this is a good time, if you're willing to share your screen, to show us a little bit of the product and how somebody would use it. That'd be great.
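A supervisor node that persists each turn's result table, so a follow-up like "now chart that by mode" reuses the exact same data instead of regenerating SQL, could be sketched like this. All names and method signatures here are hypothetical, not Flexport's actual architecture:

```python
# Hypothetical sketch of the supervisor-node idea: cache prior
# (question, sql, rows) turns so follow-ups transform existing data rather
# than re-running the LLM -> SQL -> execute path, which can drift columns.

class SupervisorNode:
    def __init__(self, pipeline):
        self.pipeline = pipeline  # assumed: the full NL -> SQL -> data path
        self.context = []         # prior (question, sql, rows) turns

    def handle(self, question: str) -> dict:
        if self.context and self.pipeline.is_follow_up(question, self.context):
            # Follow-up: chart, filter, or pivot the cached rows directly,
            # guaranteeing the same columns as the original answer.
            _, _, rows = self.context[-1]
            return self.pipeline.transform(question, rows)

        # Fresh question: run the whole plan -> SQL -> execute -> summarize
        # path and remember the result for later turns.
        sql, rows, answer = self.pipeline.run(question)
        self.context.append((question, sql, rows))
        return answer
```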
Yeah, sure, let's give a quick demo of the system. Let me see. This is the landing page of Insights Builder. Generally speaking, we have a pretty simple layout: the left panel holds settings and history, and the main area is the conversation, the chatbot area. One thing to note is that because I'm logged in as an internal user, the look and feel will be a bit different, but this showcases the general flow of the system. For example, if I want a pretty standard question answered by Insights Builder, I'll say: summarize the debit statements in the past six months. The system follows an agentic reasoning approach: it breaks the target into several steps to plan the tasks. As you can see, data comes back. For the sake of performance and user experience, we show at most ten records at a time, but you can choose to export the whole dataset offline, to CSV for example. We also suggest follow-up questions; if you want to drill down further, you can follow the guidance here and ask a follow-up.

And this is what you meant earlier: before, it would rerun the whole thing, and now you can follow up and it pulls from this information.

Basically, we break the task down into different steps and reasoning. Once we have the SQL available, we do some validation to ensure the SQL is valid, instead of just executing whatever comes out of the LLM, which would very likely cause errors. And further, if there are empty results, we want to tell the user why we think the results are empty and what adjustments they can try to get data back. Here, I'll do a breakdown by transportation mode for each month and visualize it. This is a follow-up to the question above, so the system should return a chart visualizing the breakdown by that condition. As you can see, we get a stacked column chart back representing the different modes: how many statements there are for each month. Users can play around with it: they can export the whole image, go full screen, and even change the chart style, for example to a line chart instead. We support a number of different chart types, and you can see the chart converted to a different style. And if users have data back and are ready to create a dashboard, they can simply click a button here, then drag and drop the data they've pulled onto the canvas in the right panel. They can resize and lay things out however they want, then save it, and it becomes a dashboard they can refer to in the interface.

And does this keep updating? Is it something that stays real time, or is it a snapshot in time?

It keeps updating.

Okay, got it. Awesome. And this is also where, like you said, if there's something ambiguous, then before it does the processing, it asks the user for clarification.

Yes, if there are cases where the question isn't clear enough for the system to move on, the system asks a clarification question.

Got it. And do you folks use multiple LLMs under the hood, or do you have a specific LLM that you use?

Currently we're pretty much using one single LLM model, but we're exploring different kinds of optimization. Different models have different advantages, so we're trying different LLM combinations to ensure a better experience for different use cases.
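The validate-before-execute guardrail shown in the demo can be implemented with a SQL parser and a small repair loop. A sketch using sqlglot, a common open-source SQL parser; choosing it here is an assumption, and the retry shape and `llm`/`warehouse` helpers are hypothetical:

```python
# Hypothetical validation guardrail: parse LLM-generated SQL before running
# it, ask the LLM to repair parse failures, and explain empty results rather
# than returning a silent empty table.
import sqlglot
from sqlglot.errors import ParseError

def validate_and_execute(question: str, sql: str, llm, warehouse, retries: int = 2):
    for _ in range(retries + 1):
        try:
            sqlglot.parse_one(sql)  # syntax check before touching the database
        except ParseError as err:
            # Feed the parse error back to the LLM for a corrected draft.
            sql = llm.repair_sql(question, sql, str(err))
            continue

        rows = warehouse.execute(sql)
        if not rows:
            # Zero rows: return an explanation of likely causes (filters,
            # date ranges) instead of a bare empty result.
            return [], llm.explain_empty_result(question, sql)
        return rows, None

    raise RuntimeError("could not produce valid SQL after retries")
```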
Got it. And on the data side, does all of the data live in one system, or are there multiple systems, multiple databases or data sources this is coming from?

Well, if you look at it from the whole data lifecycle perspective: the raw data is owned by different services behind the scenes. All of that raw data flows through a pipeline into our data warehouse, and the warehouse has another layer that abstracts the natural data into an analytics database. That analytics database is the one the system relies on to serve users' questions.

Got it, got it. This is super helpful; it really puts a picture to what we've been speaking about, so thank you so much for sharing. I have just a couple more things I wanted us to dive into. One thing we've been seeing as we interact with our customers at PromptQL is that with AI initiatives, the interaction between a business leader and a technical leader becomes so much more important than in pre-AI products, where things were pretty clear: if you want a checkout button, everybody knows what a checkout button is supposed to mean. Whereas with AI, a lot of the time you're building for users you may not entirely empathize with; you're not in their job function, and everything is very new. And a natural language interface means people can ask anything. At Flexport, with your customers, how are you bringing the domain experts and business leaders into the loop? How are they part of the product and the product-building experience? What's that like for you all?

Yeah, actually, this is one of my favorite parts of the whole project: our tech teams sit together with our domain team leaders to build the product for our clients. All the different business teams, operations, demand teams, customs, and finance, gave us really valuable feedback at the very beginning of the Insights Builder project. They told us the typical pains they'd experienced in the past around data analytics for their clients, and we learned from them what the fundamental root causes were: is it purely a data problem, or an experience problem? Is it something we can really change with AI? Based on that feedback, we narrowed down to the areas where we really felt we could make a change with AI. That's the pattern we've kept following all along the way for Insights Builder. We learned how they talk about the business, what terms they use, what things really get their attention and are the bottlenecks in their daily work. We take that very seriously, and we bake all that collaboration and feedback into our systems: into how we parse questions, how we clarify user intention, how we map fields to the user's question. Because it's not based on our own understanding of how we're supposed to answer; we really want the system to behave like a human. And if you want it to behave like a human, the very first resource you can lean on is the business teams themselves: how they interact with their clients on a daily basis, and the challenges and miscommunications they've encountered before. Once we get all that data back into the system, we evaluate it.
And at the end of the day, the AI has to be capable of continuously digesting all that feedback, so its behavior really feels like your data analytics assistant, not some generic system-ish concept. We don't want to build just another generic AI system for analytics. The only way is to involve the people who care about the user experience and make changes on their behalf, to give their clients a better experience with data.

Yeah, that's great to hear. We've been talking about this a lot in terms of finding your dance partner; that's the phrase we've been using. That pairing of a business leader and a tech leader is just so important, especially in these early days of rolling out AI products, where adoption is key. Without that partnership, adoption is just going to take a hit. I have two tangential questions. One is: what's been the most surprising thing for you over the last few months of building and rolling out this software, in a domain like logistics, or just in AI in general? What did you expect to be different, where the experience surprised you?

Yeah, well, looking back at the whole journey so far, I really didn't expect how much of it would come down to understanding the people, just as we talked about earlier. It's about understanding people, not just data. You go in thinking, okay, it's a technical problem: just build a model or write some prompts, and everything will work. That's not the case. Logistics is really messy; it's a really fragmented industry. So the real challenge isn't just getting AI to run. It's understanding your business, understanding your data, and understanding your users. That's been the most surprising thing for me so far with this kind of AI product. It tells me I need empathy for the business, and we need tight collaboration with the people in the trenches. You really don't need a perfect foundation to start, and you don't need people who are already deeply proficient in AI with years of engineering experience to kick it off. You do need to just kick it off, and you do need to start somewhere real, grounded in the actual problem. That's where the magic begins. My team had zero AI knowledge and experience back when we decided to build Insights Builder, but I'm really proud of what the team has delivered so far. For teams that want to start their own AI product, I'd say: don't wait for perfection. Just start. Start on your idea, and everything will work itself out.

And my final question is: what is AI adoption like within Flexport? Not your products, but the AI that you all are using internally: your team, maybe other teams across the company. How much AI is being used? What are you seeing in terms of adoption within Flexport?

Yeah, I would say we've been using AI for quite a long time, and not just LLMs; we've been doing in-house machine learning for quite a long time. Insights Builder is one example where we recently announced using LLMs to empower our users to do this kind of data analytics. But internally, we have quite a lot of systems.
They have already adopted AI in different domains, spanning from visibility, like ETAs: we have to make sure we have a relatively accurate estimate of when a shipment is going to arrive at a port. And we have our internal forwarding application platform, where we break the work down into different work items; you can think of those as tasks that need to be done by operations people. We're working on automating that, driven by different mechanisms: integrations, as well as AI-triggered automation through email parsing or document processing. We've adopted AI for lots of different use cases. We also have a voice agent: whenever some external communication is needed, AI automatically triggers a call to our truckers to handle the business. And Omnichannel has an AI application as well, to make sure your inventory never goes out of stock; that's something we announced in a release last year. So AI adoption at Flexport is really amazing. Due to time constraints I can't go through every single aspect, but I'd say we invest in AI in the areas that really matter to Flexport.

Any favorite tools of yours, personal tools that you've been using, that you'd like to share with the world? Just anything in particular that you use day to day?

Day to day, the one I use the most is definitely ChatGPT.

Got it. Awesome. Jilong, if anyone has further questions, what's the best way for them to reach you?

Yeah, if they have offline questions, feel free to send me an email, or you have my contact: you can collect the questions, I'll work through them, and then you can share the answers with the audience.

Perfect. Thank you so much for your time today. This was absolutely great; it was great to learn about the kinds of things you all are doing and the real-world complexities you're dealing with. Thanks again for your time.

Yeah, thank you.