The Data Driven Podcast had an insightful conversation with Krishna Kumar Ramanujam, the Chief Architect, EVP, and India Country Head for Abzooba. KK is a Solutions Architect with more than 20 years of experience in architecture, design, and development of high-performance software products across domains. He is a specialist in the design of frameworks that enable non-technical end-users to create business applications. He earned a Bachelor's and a Master's degree in Electrical Engineering from the prestigious Indian Institute of Technology, Bombay.
Some of the topics we discussed on the podcast include:
- Working at IBM and KK’s introduction to Data Mining
- Early Data Engineering and Data Science in the NBA and the 1996 Olympics in Atlanta
- A typical company’s AI journey
- The accelerating emergence of Data Science and MLOps
- How choosing the right tools can really make a difference
- Transfer Learning and Data Augmentation
- How small and medium-sized companies can take advantage of MLOps platforms
See the full transcript below:
Michael Waitze 0:00
Okay, we’re on. Hi, this is Michael Waitze, and welcome to the Data Driven Podcast sponsored by Xpresso.ai. Xpresso is an integrated set of frameworks and accelerators to help data scientists build cognitive solutions. Today we are joined by Krishna Kumar Ramanujam, the Chief Architect, EVP, and India Country Head for Abzooba. KK, how are you doing today?
Krishna Kumar Ramanujam 0:21
I’m doing well, Michael, how are you?
Michael Waitze 0:23
I am awesome. Look, let’s get a little bit of your background for context, then we can jump right into some of the main topics we’re going to cover today.
Krishna Kumar Ramanujam 0:32
Sure, I graduated with a degree in electrical engineering from the Indian Institute of Technology Bombay, way back in 1990. And I worked in hardware for a couple of years, I used to roam around with a soldering iron in my hand, doing stuff with microprocessors and assembly language and stuff like that. Then I did my Master's degree, also from IIT Bombay. Then I joined IBM, where I was for about three years. And that's when I got introduced to what was called Data Mining at the time. So we were doing some very interesting work at IBM. Actually, my boss was at the IBM T.J. Watson Research Center in Hawthorne, New York. And he came up with the idea of using data mining for sports, because he wanted to make it, you know, easy to use. So we did data mining for the National Basketball Association. And, you know, that's when I went over to the US. And we worked with a lot of NBA teams, including the LA Lakers, the New York Knicks, Orlando Magic, and so on. So we got to see some NBA games, that was very cool. And we also went to the Atlanta Olympics. Actually, we helped the US women's team analyze their opponents, and we sent them analysis, and you know, they won the Gold Medal, so we got an autographed basketball from them. That's awesome. Yeah, that was very, very cool stuff for a bunch of, you know, young kids straight out of grad school to be doing…so great fun. And we got showcased, you know, as this bunch of people out of IBM Research doing this really great stuff on analytics, and using it in basketball.
Michael Waitze 2:06
What was it like, as IBM, going to the Lakers, that's almost 30 years ago now, and trying to sell them on the fact that you could use data analytics to give them some kind of edge? What was that like?
Krishna Kumar Ramanujam 2:20
Frankly speaking, I was the guy behind the scenes doing the coding. So I wasn't involved in those deals at that time. But it was part of IBM's sports marketing group. So you know, IBM has a lot of money in sponsorship of major sports events, including the Olympics, golf, the tennis majors, and basketball, right? So it was part of a sponsorship deal, really, with the NBA.
Michael Waitze 2:43
I just think it's amazing that even 30 years ago, people were talking about this. And maybe to move forward, we can just get a couple of definitions out of the way, and maybe you can do that. How do you differentiate between data engineering and data science, words I think people throw around but may not really know what they mean? Right?
Krishna Kumar Ramanujam 2:59
That's a very good question, actually. So data science, to me, data science is the entire process of analyzing data and coming up with some, perhaps some predictions or some results of your analysis. And in order to do that, you need to make sure that you have the right data, you have the data in the right volume, you have the cleanest data possible, etc. And that's where data engineering comes in. So sometimes I give talks at colleges and so on, and there are a bunch of people who are studying data science. And I tell them, you know, it's all very nice saying you're a data scientist, but 70% of the work that you're going to be doing in the real world is going to be around data engineering, that means ensuring that you have good, clean data to work with. Because otherwise, your analysis is all going to be, you know, screwed up, because it's garbage in, garbage out. Right. So that's, I think, the fundamental difference.
Michael Waitze 3:50
Can you also walk us through what a typical artificial intelligence or AI journey is for a company that’s starting to implement data analysis, machine learning, and MLOps? And maybe, what are the typical choices that a company has to make around that? What do you see that journey looking like?
Krishna Kumar Ramanujam 4:07
Very interesting question. And, you know, we've been through that at Abzooba, of course. We've been through a lot of services, we provide analytic services to our customers. And lately, and I'm sure we will talk about it, we are developing our own AI platform, or MLOps platform. So every company goes through this journey where you start off with a problem. And maybe you have a couple of people who are interested in analytics, they may not have the knowledge, but they can operate an Excel sheet, and they start digging around and trying to say, you know, can we actually do some predictive modeling? And they read up, maybe they learn Python, and they go ahead and they build models. And that's extremely exciting because you start getting results pretty soon. And in today's world, you know, there's a bunch of libraries out there. So a kid in high school can actually take some of those libraries and build fairly sophisticated models, and that's what happens. That's the first success you get, and you're tremendously excited.
Krishna Kumar Ramanujam 5:00
Right, then you hit kind of a barrier, because what happens then is that your business says, Okay, now I want this in production, because I want to now start using it in real-world scenarios. It's not just a toy anymore that you're playing around with in your little lab or whatever. So you want it in production. And that's when the challenges come in. Because you need to make sure that you can run these models at scale, 24/7, you need to have them responsive. And that's where I think the initial Wild West kind of scenario is no longer applicable. And you need really good, solid processes to be able to ensure that your models get into production. So that's kind of the level two that you get into at that point.
Michael Waitze 5:41
Right. But what are some of these major hurdles? Right, like, like you said, even a high school kid can go out and find some of these models and build really simple predictive analysis, or predictive data analysis, in an Excel spreadsheet. I'm sure that I was doing that, you know, when I was at Morgan Stanley or Goldman Sachs. But when you really want to get into sort of full-scale data analysis, you mentioned earlier, like, you want your data to be clean. How do you even define what clean data is? And what do people really need to consider when they have data and they need to clean it?
Krishna Kumar Ramanujam 6:13
Right, good. Good point there. So there are so many challenges, you know, just considering the cleanliness of data. So you need to ensure that the data is complete, there are no gaps in it, the fields in the data are consistent and reliably consistent, in the sense that if you get some kind of data today, it's going to be the same kind of schema, the same fields that you get, let's say, one month later on. Otherwise, you have to keep changing your model every time, or you're changing your data engineering pipeline every time. So that's one challenge, just making sure that you know what you're analyzing and the fields are all clean. The second challenge could be ensuring that this happens at scale. That means that you need to set up your data pipelines so that when your data is flowing from different data sources, you have it set up so it's a continuous process, you can continually keep analyzing the data, keep building your models, keep retraining them. The third challenge would be just different data sources. You could have data coming in from databases, from flat files, from spreadsheets, from, you know, big data sources, from the cloud, etc., etc. So you need to make sure that your system takes care of all of this. So that's what I mean when I say you move from level one to level two, in the sense it's not just, I have a little CSV file and I'm writing Python code to process it. It's a different, you know, ballgame completely.
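The schema-consistency check KK describes can be sketched in a few lines of Python. The field names and rules here are hypothetical, just to show the idea of rejecting records whose shape or completeness has drifted:

```python
# Minimal sketch of a schema-consistency check for incoming records.
# The expected fields below are hypothetical, for illustration only.
EXPECTED_FIELDS = {"customer_id", "date", "amount"}

def validate_batch(records):
    """Split records into (clean, rejected) based on schema and completeness."""
    clean, rejected = [], []
    for rec in records:
        # Reject records with missing/extra fields or empty values
        if set(rec) == EXPECTED_FIELDS and all(v is not None for v in rec.values()):
            clean.append(rec)
        else:
            rejected.append(rec)
    return clean, rejected

batch = [
    {"customer_id": 1, "date": "2021-05-01", "amount": 9.99},
    {"customer_id": 2, "date": "2021-05-01"},            # missing field
    {"customer_id": 3, "date": None, "amount": 4.50},    # empty value
]
clean, rejected = validate_batch(batch)
print(len(clean), len(rejected))  # 1 clean record, 2 rejected
```

A real pipeline would run a check like this continuously as data flows in, which is exactly the "same schema one month later" guarantee KK is talking about.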
Michael Waitze 7:30
But what do you do if, let's say, the IT team has a vision for what that data analysis, we can talk about predictive, we can talk about a bunch of different types of data analysis, right, descriptive, diagnostic, predictive. If the IT team has one thing in mind, but the business has another thing in mind, right? How do you get those two teams to work together when their vision about what that's supposed to look like, and actually the non-triviality of it, is understood by one side and not by the other side? Right? Businesses sometimes just think, just tell me what to do. How do you get past that?
Krishna Kumar Ramanujam 8:02
Yeah, absolutely. That's a challenge. But you know, in all these kinds of situations, the business obviously takes priority, in my head at least. So you have to make sure that that's the end problem that you're solving. It's the business that provides the funding, firstly, so you need to make sure that they are happy. So you need to ensure that, you know, their needs are met. And you need to work with IT to ensure that the upstream data pipelines make sure that you have the data in place in order to meet that business need. So sometimes, for example, a PoC is a good starting point. You start with a small amount of data that's already clean, make sure the business is happy with the results. And then you go back to the IT guys and say, now I need this on a continual basis, can you do that? So right, that's one way.
Michael Waitze 8:45
Okay. But let's talk a little bit about this idea of non-triviality and scale. Big companies have plenty of resources, but how can small and medium-sized companies implement data analysis so they're not losing out when it comes to understanding the information that they do have, not losing out to bigger companies that just have way more resources?
Krishna Kumar Ramanujam 9:07
Right, that's, you know, it's actually becoming easier today. Because if you were to ask the same question, let's say 10 years back, then there would have been a big gap between the biggies and the small guys, because as you said, the big guys have all the data. But today, it's actually much easier because a lot of that data is available in the open, open-source, or in the public domain. And then there are a bunch of tools, which are called MLOps tools, which help you to actually get started on this journey. So if I were a small to medium-sized company embarking on my data science or analytics journey, I would definitely look at some of these MLOps tools that are available, because that helps you to build your infrastructure, it takes care of what I call the plumbing, so your data scientists don't need to worry about, you know, setting up a Kubernetes cluster or a Spark cluster, or worry about your data pipelines and containerization, all that kind of stuff. You focus primarily on the business problems. I would definitely recommend, you know, investigating some of those tools for people that don't necessarily know them.
Michael Waitze 10:10
Right. I mean, I think the market is out there talking about, we said this already, data analysis, data engineering, and I still think most people are confused about what it is. You just threw out this term MLOps, right? What does that mean? And what does it mean at scale?
Krishna Kumar Ramanujam 10:25
Right. So as I said, one of the big problems of data science is that it's emerging. So obviously, in the last maybe seven or eight years, it's become extremely critical for enterprises. And it's emerging from the labs to places where it's actually running in production. And you have so many organizations which actually depend for their business on AI and ML models which are out there in production. Now, there's a bunch of tools that are available which help you to make that transition from the lab into production, from your development environment into production. And those are what I call MLOps tools. So it's like a mixture, or a marriage, between machine learning and your DevOps. What DevOps did for your software development, MLOps is doing for machine learning models. So they enable you in the entire journey, right from picking data from various sources. So they have data connectors available, which help you to pull data from various sources and keep it inside the MLOps tool. Perhaps some of them have something called AutoML, which is essentially the ability to create models automatically. So you don't have to write any code, you just point the tool to the data, and then it creates a model for you. And maybe it gives you five models, and you can select which one you want. So there's a bunch of tools that do that. Then there's a bunch of tools which help you to take your models into production and manage and monitor them, which means that once you've created a model and you're happy with it, they would actually help you to run experiments, different experiments, on your model. You know, data science is a highly iterative process, right? You need to run hundreds of models before you finally decide on one. So they would help you to track those, compare them, decide, you know, which one works best, and then finally help you to deploy the selected one into production.
And even after that, to monitor it, to ensure that your model is behaving well, there's no drift, there's no, you know, problem in terms of performance, accuracy, etc., etc., even after it has been deployed. So that's what the MLOps tools provide you. Of course, there are, you know, several tools out there which maybe provide bits and pieces of this entire workflow. But this is the workflow that they help you with, so that the data scientist focuses on his job.
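The post-deployment drift check KK mentions can be sketched very simply: compare a statistic of the live data against what the model saw in training. This is a toy illustration with made-up numbers and a made-up threshold, not any particular MLOps tool's method:

```python
# A toy drift check: compare the mean of a live feature window against
# the training mean, flagging drift beyond a threshold expressed as a
# fraction of the training standard deviation. Data here is illustrative.
import statistics

def drift_detected(train_values, live_values, threshold=0.5):
    """Flag drift when the live mean shifts by more than `threshold`
    training standard deviations from the training mean."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    shift = abs(statistics.mean(live_values) - mu)
    return shift > threshold * sigma

train = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2]
stable = [10.0, 10.1, 10.0]     # live data similar to training
shifted = [12.5, 12.8, 13.0]    # live data has drifted

print(drift_detected(train, stable))   # False
print(drift_detected(train, shifted))  # True
```

Production systems use more robust tests (per-feature distribution comparisons, model accuracy on labeled samples), but the shape of the check is the same.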
Michael Waitze 12:35
Right, and what are some of the drawbacks? You said AutoML, right? What are some of the drawbacks of using AutoML versus, I think you've also said, managed MLOps? Right, what are the differences there? And what are some of the benefits and drawbacks?
Krishna Kumar Ramanujam 12:48
Sure. So AutoML tools enable business analysts to really come in, and quickly, out of the box, they provide you certain sort of pre-built models to play with. So that's an advantage, obviously, because if you're a business analyst and you really don't know too much about data science, you can click a few buttons and get started, and you can have predictive models up and running immediately. But the problem with that is, it's like, you know, I give the analogy that any kid can build a drone in his backyard today, right? To build a drone is not very tough, you can probably get all the instructions off the internet and start building a drone, and it will fly and do stuff, right? But to take that same drone and make thousands of them in production, with precision, etc., is a completely different ballgame. So while the business analysts can use AutoML to build these three or four models, I suspect that, you know, with my data scientist hat on, I wouldn't be too happy about it. So that's the difference. A data scientist would want to explore those models, tweak them, probably play with them to ensure that he gets the best performance out of them. He would not be happy with this black box model that he's been given. So that's the difference with AutoML: it provides you something very quickly out of the box, but if you want to tweak it, typically you cannot. So my ideal would be a mixture of both. So I would say, if there were an AutoML tool out there which would generate these four or five or six models for me, but then enable me to open them up and tweak them, to say, how can I improve on what you've given me? That would be really great.
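The "generate candidates, pick the best, then open it up and tweak" workflow KK wishes for can be sketched in plain Python. The candidate "models" here are toy one-parameter predictors, not a real AutoML library:

```python
# Toy AutoML-style loop: score candidate models over a hyperparameter
# grid, pick the best, then hand it back for manual tweaking.
def make_threshold_model(threshold):
    """A one-parameter 'model': predict 1 if x >= threshold else 0."""
    return lambda x: 1 if x >= threshold else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Tiny labeled dataset of (feature, label) pairs
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]

# "AutoML" step: generate several candidates automatically
candidates = {t: make_threshold_model(t) for t in (1, 3, 4, 6)}
scores = {t: accuracy(m, data) for t, m in candidates.items()}
best_t = max(scores, key=scores.get)
print(best_t, scores[best_t])  # threshold 4 scores 1.0

# "Open it up" step: the data scientist tweaks the chosen hyperparameter
tweaked = make_threshold_model(best_t - 0.5)
print(accuracy(tweaked, data))
```

The point of the sketch is the division of labor: the loop finds a good starting point automatically, but the chosen model stays inspectable and tweakable rather than a black box.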
Michael Waitze 14:21
Got it. I mean, I like this idea of data analysis and data science being this massively iterative process. It’s not like you can just put something into a machine and get a solution immediately. Right.
Krishna Kumar Ramanujam 14:32
Right. That's absolutely true. And that's actually another problem. It's kind of the Wild West, you know. The first phase of data science was like the Wild West, in which you have a bunch of really smart people doing all this stuff. And let's say, you know, you have a business problem to solve, and your data scientist has run 100 experiments, and experiment number 100 really worked well. And he says, you know, this is it, this is what I'm going with. And then you ask him, you know, experiment number 73 was pretty good too, and I think I can use that and tweak it a little bit and, you know, use it in a different project. Can you show me that? And usually, the answer is, no, I cannot show you, because it's lost on my laptop, and I forgot what data I used, I forgot what algorithm I used, I forgot the hyperparameters, and so on. So MLOps helps you to, you know, sort of track all these things. So you just click on experiment 73, and you have all the details in front of you. That's a big, big advantage.
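The experiment-tracking idea, being able to pull up "experiment 73" with everything that produced it, reduces to recording a few fields per run. A minimal sketch (field names and values are illustrative; real tools like MLflow add storage, UIs, and artifact logging):

```python
# Minimal sketch of an experiment registry: every run records its
# algorithm, data version, hyperparameters, and score, so a past run
# can be looked up months later instead of being lost on a laptop.
import json

registry = {}

def log_experiment(run_id, algorithm, data_version, hyperparams, score):
    registry[run_id] = {
        "algorithm": algorithm,
        "data_version": data_version,
        "hyperparams": hyperparams,
        "score": score,
    }

log_experiment(73, "gradient_boosting", "sales_v12", {"depth": 4, "lr": 0.1}, 0.89)
log_experiment(100, "gradient_boosting", "sales_v14", {"depth": 6, "lr": 0.05}, 0.93)

# Later: "show me experiment 73" is a lookup, not an archaeology project
print(json.dumps(registry[73], sort_keys=True))
```

With this in place, comparing run 73 against run 100, or rerunning either, is a dictionary lookup rather than a reconstruction effort.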
Michael Waitze 15:28
Years ago, I was reading this book, I believe it was called Inside the Googleplex, or the Googleplex. (Actually, it was “Inside the Plex”) And one of the arguments that the author of this book was making was that Google itself was just obtaining and saving so much data that it was going to be hard for people to compete with them, because they just had this massive trove of data, and they were just very good at organizing it and cleaning it. I'm not suggesting that someone should start a search engine today. But I am asking, what does a small company do today that doesn't have 25 years of data at their disposal, when they're trying to compete, maybe with technology or with a new business model, against a company that does have those 25 years of data? Does that make sense?
Krishna Kumar Ramanujam 16:10
That is definitely a big advantage for the big companies, as I said earlier, right? They have all that data already collected. However, today, what's happening is that you have models out there which have already, in some sense, absorbed that data, in the sense that they've been trained on all that data. So you already have them in place. And you can use techniques like transfer learning, etc., to just tweak those models. So you give them the incremental data that you have collected, perhaps for your domain, and then you transfer the learning, or you essentially retrain those models on this smaller data that you have, which then enables you to build on top of the work that's already been done. So there's a lot of stuff that you can do. Then there are ways of generating, you know, artificial data. Even if you have a small amount of data, you can augment it. Data augmentation techniques are there which enable you to create more data, and then essentially help train your models better. So there are things out there which you can do as a small company.
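The data-augmentation idea can be shown in a few lines: expand a small numeric dataset by jittering each sample with noise while keeping its label. Real pipelines use richer transforms (image crops and flips, text synonym swaps), but the principle is the same; the dataset here is made up:

```python
# Tiny data-augmentation sketch: create extra training samples by
# perturbing the feature of each original sample with Gaussian noise.
import random

def augment(samples, copies=3, noise=0.05, seed=42):
    rng = random.Random(seed)  # seeded for reproducibility
    augmented = list(samples)
    for x, label in samples:
        for _ in range(copies):
            # Perturb the feature, keep the label unchanged
            augmented.append((x + rng.gauss(0, noise), label))
    return augmented

small_dataset = [(1.0, "a"), (2.0, "b"), (3.0, "a")]
bigger = augment(small_dataset)
print(len(small_dataset), "->", len(bigger))  # 3 -> 12
```

For a small company, this is one way to get more training signal out of the limited domain data it actually has.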
Michael Waitze 17:09
When you go in and talk to people about just starting to do data science and data engineering in their companies, what are some of the things that maybe surprise you, or some of the issues that come up that maybe you understand, but that these companies don't understand and are not anticipating? When you tell them, they think, Oh, I didn't even know that that was a thing.
Krishna Kumar Ramanujam 17:28
So I think the first thing is always that in a new organization, when you're introducing AI or ML for the first time, the skepticism is always there, right? I've been doing this, I've been running this business for 20 years, and what do you want to tell me? Right, so that is an initial hump that you have to get over. And that's, I think, probably more true of traditional businesses than new businesses. You know, if you're starting a business right off the bat, you know that machine learning is going to help you, so you probably have that in your strategy right upfront. But in the traditional businesses, I think that's a hump that you have to get over, especially if you've been working in the organization for a long, long time. The second thing is that, you know, in terms of tools and technology, there are so many tools, so many libraries out there, that you tend to get lost in what you want. So that's possibly a pitfall that I would warn against, in the sense that you want to choose the right tools, and you want to make sure that they are as open as possible. Because, for example, if you go with Python, right, you're in safe hands, because the Python community is so vibrant, there are so many libraries out there, etc., that you're in good shape if you choose that. On the other hand, you could choose an MLOps tool, or any tool, which ties you to a specific platform, ties you to a specific set of libraries, and so on, which would then be very difficult for you to get out of. I have spoken to customers who say that, you know, we are trying to move from platform X to platform Y, and it's a nightmare, because it just takes so long, because we have tied up, you know, with platform X. So that's another pitfall that I would definitely avoid. We try to keep the data scientists front and center, so whatever tools they are comfortable with, let's give it to them.
Michael Waitze 19:15
At the beginning of this discussion, we were talking about IBM, right, and this is 25-30 years ago. Back then, you kind of needed to be IBM to have just enough compute, and even enough storage, to be able to participate in data analysis. But today, we've seen a massive drop in storage costs, right? I mean, I have terabytes of data storage, and I'm just one person, but also the rising power of compute as well. So you have this massive drop in storage costs and then a massive increase in compute power. How does that impact this whole field? MLOps, data science, data analysis?
Krishna Kumar Ramanujam 19:52
Absolutely. You know, I can still recall the days when I got my first computer, and it was great that we had a few hundred, or maybe it was tens of, megabytes of RAM. You know, that was so exciting. And now you have all that available in, you know, even the cheapest laptop around. And I'm, you know, digressing, but there was a story I had heard that you had to get permission from the government to actually import RAM in India. So yeah, those are stories which we heard about 20-25 years back. But today, it's so easy. And that's why I say that, you know, a high school kid can write a fairly sophisticated machine learning application today with just a week of learning, because the libraries are there, they do a lot of stuff for you. You can go to the cloud and sign up and get, like, this, you know, heavy-duty hardware pretty cheap. And it's costing you on a minute-by-minute or second-by-second basis, so you only pay for what you use, and you can get storage. So it's a very exciting time. And that's actually part of my problem, because it is so easy that bad practices actually creep in very easily. So that's why I call it the Wild West, because things which we have learned in software engineering, modularity, reusability, you know, all of that kind of goes out of the window, because it's so easy to do, you know, simple stuff. Right?
Michael Waitze 21:17
Right. What was your first computer by the way? Do you remember?
Krishna Kumar Ramanujam 21:20
I don't remember, it must have been an IBM, because I worked for IBM. So it would have been, you know, whatever the IBM PC was at that time.
Michael Waitze 21:27
Yes. So my first computer at school, and this is early in high school, was a TRS-80. A Radio Shack TRS-80, with a tape drive.
Krishna Kumar Ramanujam 21:35
Right, right. Yes, I remember those. Yes, we had those in high school, I worked on one. So you're right, this is not my first computer. My high school one was my first. Yeah, but I don't even remember the brand.
Michael Waitze 21:47
Obviously, when I got to Morgan Stanley, we had a computer. And it was an IBM AT, or XT, I can't remember. And there, we actually had a five-and-a-quarter-inch floppy drive. That was the boot drive. And if you didn't have that thing…
Krishna Kumar Ramanujam 21:57
Yes, yes, you can't start the machine. Yes,
Michael Waitze 22:00
Yes. So just to give people a frame of reference, I have a computer on my desk with an M1 Mac chip in it. And, uh, you know, a terabyte of storage there, so the computer is just so different than it was back then. And I do think it gives companies the ability, like, it's not a frivolous conversation to talk about how different it was. I do think it gives you the possibility to drive this data analysis and install the right tools. And I'm curious, for the tools that you run, I want to talk a little bit about Xpresso, if you don't mind, and how it's different from other MLOps platforms.
Krishna Kumar Ramanujam 22:36
Absolutely. So just one thing on the, you know, the computing power that you're talking about, there's also a downside to it, because it makes programmers actually a bit lazy, if I were to say that, because now you know that the bad code you write will be hidden by the processing power of your computer. So that's, as I said, those are some of the pitfalls that are there. And Xpresso helps to, you know, cover some of those, or helps the programmer to get, you know, really organized. So, you know, when we started on this journey with Abzooba, and it's been 10 years now, we're into services, analytic services, you know, we do AI, ML, we do deep learning, computer vision, NLP, everything. And the top leadership has been in analytics for a long, long time, we have people who have started their own companies earlier, we have people with PhDs in data science, etc. And we realized that, essentially, data science projects are very different from software engineering projects for various reasons, because the requirements are not very clear. It's a spiral implementation, you know, you have to do a lot of experimentation before you reach the result, as opposed to a normal project. If you're developing a website or, you know, a mobile app or something like that, it's pretty linear. Exactly. You know what the endgame is, you have your bugs, obviously, you have your delays, but eventually, you reach there. But in machine learning, even where you want to reach is so open-ended. And because of that, you know, what happens is that good software development practices get thrown out of the window. So what we've tried to do in Xpresso is to introduce those software development best practices right upfront. So for example, you have the concept of components. When you come into Xpresso and you start working on a project, you create components, and we support various types of components.
And those are all object-oriented. So you get a whole skeleton code, we give parent classes, and you write the subclasses, you get the methods very clearly defined, and you have to fill in those methods. Now that helps because, firstly, it's object-oriented design, which is modular, reusable, etc. And also because the structure of that code is standardized, so there's no problem with documentation. So in a real-life project, you have people walking into the project and walking out of the project, and that knowledge transfer becomes very simple. And this is a very practical problem, right? It's very nice to talk about, you know, I built this great NLP engine, but then your team loses one person and all the knowledge goes with him. And it's again very tough to figure out, what did he do? You know, this was his Python notebook, what the heck was he trying, and so on. So now it's all really well organized. So that's one big thing that we've done. We also introduced the concept of data versioning. Now, this is extremely critical for data science projects. Because if you look at a model, it really has three inputs to it. The first is the algorithm, that means, you know, what are you using? Are you using scikit-learn, are you using, you know, a deep learning network, whatever you're using. The second is the training data, because obviously, depending on the training data that you provide, the model output will change. And the third is the hyperparameters. Because each of them has certain, you know, knobs on the outside, which you can control, and you control the training, right? These are the three inputs, so you need to make sure that each of these three inputs is versioned, so that you can repeat the experiment later, even months later. If somebody asks you to, you know, repeat that experiment number 73 that I was talking about, you should be able to do it, because you have the versions, you have the data, and the code is obviously versioned.
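The component skeleton KK describes, a framework-supplied parent class with clearly defined methods the data scientist fills in, can be sketched like this. The class and method names are hypothetical illustrations, not Xpresso's actual API:

```python
# Illustrative component skeleton: the framework defines the parent
# class and lifecycle; the user subclasses it and fills in the methods.
from abc import ABC, abstractmethod

class PipelineComponent(ABC):
    """Parent class supplied by the framework (hypothetical)."""

    @abstractmethod
    def load(self, source):
        """Read raw data from `source`."""

    @abstractmethod
    def transform(self, data):
        """Clean or featurize the loaded data."""

    def run(self, source):
        # Standardized lifecycle: every component runs the same way,
        # which keeps project structure uniform and easy to hand over.
        return self.transform(self.load(source))

class CsvCleaner(PipelineComponent):
    """A user-written component: just fill in the two methods."""

    def load(self, source):
        return [row.strip() for row in source.splitlines()]

    def transform(self, data):
        return [row for row in data if row]  # drop blank rows

print(CsvCleaner().run("a,1\n\nb,2\n"))  # ['a,1', 'b,2']
```

Because every component has the same shape, a newcomer reading someone else's project already knows where loading ends and transformation begins, which is the knowledge-transfer point KK makes.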
Most people version code well; we version the data, and we version the hyperparameters as well. So all of these are versioned, plus the output model is versioned. So this is one concept we have for model lineage: we're able to track your entire process completely. The other thing is around DevOps. You know, as I said, in today's world, containerization is very critical. Good software development and DevOps practices involve containerization, which has not really come down to machine learning very, you know, very cleanly. So we support that right out of the box; we ensure that whatever code you're writing is containerized and deployed. But you don't have to worry about it, you don't have to set up Docker, you don't have to set up Kubernetes. Again, we deploy to Kubernetes, so it becomes scalable, and you get the whole container orchestration advantage. We deploy to Kubeflow, to Spark, but the data scientist doesn't need to worry, he clicks a button, and it's done. So these are some of the advantages that you get from Xpresso.
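The idea of versioning the three inputs of a model can be sketched with content hashes: fingerprint the training data and the hyperparameters so a later rerun can verify it is using exactly the same inputs. Dedicated tools (DVC, MLflow) do this more robustly; this just shows the mechanism:

```python
# Sketch of lightweight versioning: fingerprint training data and
# hyperparameters so an experiment is reproducible months later.
import hashlib
import json

def fingerprint(obj):
    """Stable short hash of any JSON-serializable object."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

training_data = [[1.0, 0], [2.0, 1], [3.0, 1]]
hyperparams = {"depth": 4, "lr": 0.1}

run_record = {
    "data_version": fingerprint(training_data),
    "hyperparam_version": fingerprint(hyperparams),
}

# The same inputs always produce the same versions; changed inputs
# produce different versions, so silent substitutions are detectable.
print(run_record["data_version"] == fingerprint(training_data))  # True
```

Combined with code versioning in git, this pins down all three inputs KK lists: algorithm, data, and hyperparameters.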
Michael Waitze 27:10
And if you’re just starting off, we talked about this a little bit, but maybe you can address this, again, in a little bit more detail. How does that whole setup work better with smaller data sets, right? In other words, if you’re just coming in with a smaller data set, what is the benefit of using this product, if your data sets are smaller…
Krishna Kumar Ramanujam 27:26
So it's really not so much about the data set or the problem that you're going to solve. It's about the process. So we really believe in a structured process. So whether you're doing, you know, a few kilobytes or megabytes of data, or you're doing terabytes or petabytes of data, the process has to be really solid, so that in an enterprise, right, you have confidence, the business people have confidence in your process, that the models you're deploying are tested fully, are functional, high performance, etc., etc. And this, you know, this whole workflow works the same whether you have a small data set or a large data set. So that's the advantage.
Michael Waitze 28:02
And are there any other best practices that companies should consider when they’re implementing this type of stuff?
Krishna Kumar Ramanujam 28:10
Yeah, so for example, I talked about, you know, versioning: I talked about model versioning, data versioning, hyperparameter versioning. You can have standardized mechanisms for exploring your data. So for example, again, data exploration is an art, in some sense. In fact, that's why one of my pet peeves is that this whole field should not be called data science; we should call it data art, because it's very dependent on the practitioner. You know, science is something that is reproducible. If I drop an object, it's going to fall to the ground, whether you drop it or I drop it. But if I solve a data science problem, the results are going to be very different depending on whether I have two years of experience or 20 years of experience. So we are trying to distill all those practices into this, including exploration, you know, standardized mechanisms for exploring and visualizing your data, standard ways of monitoring. You know, that's the other thing, explainability. Explainability is a very big research area in today's world for machine learning. So we're trying to bring in standardized mechanisms for that. Then there are things called feature stores, which are again becoming very critical. These are essentially around the reusability of your features, of your feature engineering. So that's another thing that we're trying to bring into Xpresso.
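The feature-store idea KK closes with, defining an engineered feature once and reusing it across projects, can be sketched minimally. Names here are illustrative; real feature stores (e.g. Feast) add storage, point-in-time joins, and online serving:

```python
# Minimal feature-store sketch: register a feature definition once,
# then reuse it everywhere instead of recomputing it per notebook.
class FeatureStore:
    def __init__(self):
        self._features = {}

    def register(self, name, fn):
        """Register a named feature-engineering function."""
        self._features[name] = fn

    def compute(self, name, raw):
        """Compute a registered feature on raw input data."""
        return self._features[name](raw)

store = FeatureStore()
store.register("avg_spend", lambda purchases: sum(purchases) / len(purchases))

# Two different projects reuse the same single definition
print(store.compute("avg_spend", [10.0, 20.0]))       # 15.0
print(store.compute("avg_spend", [5.0, 5.0, 8.0]))    # 6.0
```

The reusability benefit is exactly what KK describes: the feature logic lives in one place, so every project computes it the same way.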
Michael Waitze 29:24
I'm going to let you go. But I do want to say that the title of this episode has to be Data Science or Data Art, with a question mark after it, because I do believe, as you said, it has to be repeatable, right? So physics is repeatable. Once you understand it, it works. Right? That's the beauty. Yes. But data science, like we said from the beginning, it's an iterative art, and if you iterate differently, if two people see the same data but iterate differently, they're going to come out with different solutions, which means that every result from that data analysis, again, whether it's descriptive, diagnostic, predictive, or prescriptive, is going to be different by definition. So maybe it is art that's created through science? Does that make sense?
Krishna Kumar Ramanujam 30:03
Yes. Yes, that's absolutely right. So that's one of my, you know, passions, essentially, you know, structuring the entire field and trying to bring structure into it, so that, you know, a kid straight out of college is at least at 70% of the efficiency of a person who has, you know, 15 or 20 years of experience. That's good for the enterprise. That's good for organizations like Abzooba and, I'm sure, every other company.
Michael Waitze 30:28
Absolutely…look, that's a great way to end. I want to thank you so much for coming and doing this today. KK, Chief Architect, EVP, and Country Head in India for Abzooba. This was awesome.
Krishna Kumar Ramanujam 30:37
Thank you very much, Michael. My pleasure, and it was great. Thank you for including me on the show.