The future of artificial intelligence (AI) is overflowing with exciting possibilities where data science, knowledgeable teams and advanced tools work together to push the ever-expanding boundaries of technology. But the road from data to a successful AI project is no straight line.
Here’s a fun fact for you: Gartner estimates that 85% of big data projects fail. Tech giants like Microsoft learnt this the hard way when their innocent AI chatbot went on a rampage on Twitter. Like most things in life, AI is tough to get right but easy to mess up.
This doesn’t mean you’re doomed to end up like Microsoft and pull the plug on your beloved AI after months of hard work. To give you a hand, here are seven fundamental tips to consider when building AI that can positively revolutionise your organisation.
1. Clearly define the purpose of the AI project
If you can’t summarise the end goal of your AI in one sentence, then it’s not clear enough. Figuring out your target customers and defining what makes your AI unique are key steps that will drive your approach and increase your chances of success. Here are a few pointers if you’re just starting out with your own AI.
Understand your customers
This is where you ask: who benefits from your AI solution, and what problems can you solve for them? Consider mapping out the key use cases with a group of actual representatives of your target audience to get accurate insight into their needs. If there’s no real need, there will be no adoption and no ROI.
Measure your capabilities
This is where you really flesh out what your solution involves and what you need to make it happen (data, knowledge, tech, etc.). Doing this will give you a clear picture of whether the requirements align with your capabilities and technology.
Evaluate your competition
The end goal of your solution is to be a better alternative to whatever is already out there. This means your AI project has to be a step up from existing solutions. So, what makes your project special?
Define the required quality
How good does your AI need to be to be considered useful? This is the time to define the level of accuracy your customers need and the steps you need to take to achieve it. You should also think about the payoff matrix for quality outcomes (the value or cost attached to each kind of correct and incorrect result) so you can tune your optimisations around that matrix.
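To make the payoff matrix idea concrete, here is a minimal sketch in Python. The confusion counts and the payoff values are hypothetical numbers chosen purely for illustration; the point is only that accuracy and average payoff are different measures, and the second one reflects what each kind of mistake costs you.

```python
# Hypothetical validation results: (actual, predicted) -> number of cases
confusion = {
    ("positive", "positive"): 80,   # true positives
    ("positive", "negative"): 20,   # false negatives
    ("negative", "positive"): 30,   # false positives
    ("negative", "negative"): 870,  # true negatives
}

# Hypothetical payoff per outcome, in whatever unit matters to your business
payoff = {
    ("positive", "positive"): 50.0,    # a correct catch earns value
    ("positive", "negative"): -100.0,  # a miss is expensive
    ("negative", "positive"): -10.0,   # a false alarm is a small cost
    ("negative", "negative"): 0.0,     # nothing happens
}


def accuracy(confusion):
    """Share of cases where the prediction matches the actual class."""
    total = sum(confusion.values())
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / total


def average_payoff(confusion, payoff):
    """Average payoff per case, weighting each outcome by how often it occurs."""
    total = sum(confusion.values())
    return sum(n * payoff[outcome] for outcome, n in confusion.items()) / total


print(accuracy(confusion))
print(average_payoff(confusion, payoff))
```

Two models with the same accuracy can have very different average payoffs, which is why it can help to agree on these values with your customers before tuning anything.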
2. Follow a proven methodology
AI isn’t something you want to improvise as you go. Following a tried and tested methodology will ensure your data science project is reliable and successful.
The most common methodologies are SEMMA and CRISP-DM. We’ll save you the Google search and give you a brief overview of both.
SEMMA stands for Sample, Explore, Modify, Model, and Assess. It’s an iterative process for data mining using thorough modelling techniques. While it’s considered a standard methodology, it focuses on procedures rather than results and leaves out the business aspects. This is where CRISP-DM comes in.
CRISP-DM stands for ‘CRoss-Industry Standard Process for Data Mining’. Unlike SEMMA, this methodology includes a ‘Business Understanding’ phase that focuses on the objectives from a business perspective in relation to data mining definitions. Feel free to dig further into the phases of the CRISP-DM methodology if you suspect this is the one for you.
3. Find data from a trusted source
There’s no way around it: to create AI, machine learning algorithms need data. Before moving any further, you have to define how much data you need and how you intend to get it.
Data scientists have a few options when building training sets to feed into their algorithms: they can buy datasets, find open-sourced datasets, use artificial data, or engage with smart outsourcing solutions where dedicated annotators deliver accurate data to train and develop your AI models. The last option essentially acts as an extension of your in-house resources.
There is, of course, the option of annotating training data yourself, but not everyone has time for that.
4. Choose your algorithms for machine learning
Now for the big question: what machine learning algorithm should you use? According to Microsoft’s guide on choosing algorithms, it depends on your project. Here are a few considerations to help you narrow it down:
- Accuracy of results
- Training time
- Use of linearity
- Number of parameters
- Number of features
There is no shortage of algorithms at your disposal, but of course you’ll want to choose the one that’s best suited for your project. As you may already know, the majority of practical machine learning uses supervised learning. Some popular examples of supervised machine learning algorithms include linear regression for regression problems and support vector machines for classification problems.
However, if you won’t have labelled data on the desired outcomes, then you’ll want to use unsupervised learning. Popular examples of these algorithms include k-means for clustering problems and the apriori algorithm for association rule learning problems.
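As a rough illustration of the difference, the sketch below uses plain Python rather than a machine learning library. The function names `fit_line` and `k_means_1d` and the toy numbers are assumptions made for this example. The first function learns from labelled pairs (supervised); the second groups unlabelled numbers on its own (unsupervised).

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept from labelled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def k_means_1d(points, centroids, iterations=10):
    """Unsupervised: group unlabelled numbers around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; leave it alone if the cluster is empty
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


# Supervised: every input x comes with a known outcome y
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Unsupervised: only inputs, no outcomes; the algorithm finds the two groups itself
centres = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centroids=[0.0, 10.0])

print(slope, intercept)
print(centres)
```

In a real project you would normally reach for an established library instead of hand-written versions like these; the sketch only shows what kind of data each approach needs.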
If you need a refresher, here’s a post you can dig into for a detailed look at the key differences between supervised learning and unsupervised learning.
As for neural networks, a Convolutional Neural Network (CNN) is better suited to computer vision tasks such as image labelling, annotation, and segmentation, whereas a Recurrent Neural Network (RNN) is best for language analysis. Lastly, a Multi-Layer Perceptron (MLP) is ideal for speech recognition and machine translation. (Just to give you a hint.)
Check this resource for a fine breakdown of machine learning algorithms.
5. Design and build your infrastructure
Building an AI infrastructure is a strategic decision where you have to consider things like data storage, computing resources, budget, and time. A useful tutorial series by Intel explains the infrastructures you can choose:
In-house hardware (on-prem)
Building and maintaining your own computing infrastructure in-house requires a lot more upfront effort, but it also gives you more freedom. With on-prem infrastructure, you can choose which GPU to use. There are pre-built DL servers like Nvidia’s DGX Systems, or you can have a custom workstation built by companies like Lambda Labs and AMAX. Another option is to build a DL workstation from scratch.
A cloud provider (like AWS, GCP and Microsoft Azure) makes the most sense when you’re just starting out. You can train your first model on a high-performing GPU for less upfront investment than on-prem, with the added advantage of up-to-date technology and hands-off maintenance. You can also use ML-specific providers (like Paperspace) which tailor their infrastructure offerings to better support deep learning workflows.
Like with everything else on this list, there are questions you need to answer before selecting an infrastructure that will properly support your AI projects. For example, how big is your data set? Do you have a team that can dedicate their time to maintaining on-prem systems? Are you training a model from scratch or using a pretrained model? Answer these questions now so you don’t have to deal with switching infrastructures later.
6. Test and validate your model
AI needs to be trained before it can be useful. This means running your AI application through a training data set so it can create a model, then running that model on an entirely new data set it has never seen to test the accuracy of its results.
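The train-then-test routine can be sketched in a few lines of Python. Everything here is a simplified assumption for illustration: the toy data, the helper names, and the "model", which is just a single cut-off value learned from the training rows.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold back a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(rows):
    """Learn one cut-off: the midpoint between the mean of each class."""
    positives = [x for x, label in rows if label == 1]
    negatives = [x for x, label in rows if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def accuracy(threshold, rows):
    """Share of rows where 'x above the cut-off' agrees with the label."""
    hits = sum((x > threshold) == (label == 1) for x, label in rows)
    return hits / len(rows)


# Hypothetical toy data: (feature, label), with the two classes overlapping a little
data = [(x, 0) for x in range(0, 20)] + [(x, 1) for x in range(15, 35)]

train, test = train_test_split(data)
threshold = train_threshold(train)  # the model only ever sees the training rows

print("train accuracy:", accuracy(threshold, train))
print("test accuracy:", accuracy(threshold, test))
```

The number to report is the test accuracy, because those rows played no part in choosing the cut-off.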
Sounds simple in theory, but there are dangers such as data bias, which results in bad functionality (and bad press). You may have seen the media storm surrounding biased facial recognition software or the racial bias of the beauty pageant bot, Beauty.AI. (Here’s a cheat sheet by Figure Eight on how to prevent bias in your AI projects.)
You’ll get a strong hint that something is amiss with your model if it miraculously spits out 100% accuracy. Overfitting is a classic challenge of AI where your application memorises the training data and performs poorly on real-world data. On the flip side, if you get dismal results that don’t model the training data or generalise to new data, then you’re looking at a case of underfitting. It never ends.
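One simple way to spot both problems is to compare accuracy on the training data with accuracy on unseen data. The sketch below is a rough rule of thumb, not a standard: the function name `diagnose_fit` and both thresholds are assumptions you would adjust for your own project.

```python
def diagnose_fit(train_accuracy, test_accuracy, min_accuracy=0.7, max_gap=0.1):
    """Rough rule of thumb; both thresholds are illustrative assumptions."""
    if train_accuracy < min_accuracy:
        # The model does not even capture the data it was trained on
        return "underfitting"
    if train_accuracy - test_accuracy > max_gap:
        # Strong on data it has seen, much weaker on data it has not
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(1.00, 0.62))  # overfitting
print(diagnose_fit(0.55, 0.53))  # underfitting
print(diagnose_fit(0.91, 0.88))  # reasonable fit
```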
In all honesty, training may take up more time than the actual development, but it’s possibly the most important step in your AI strategy. A trained and tested model is a useful model.
7. Constantly monitor and retrain your model
Once you have a model that’s finally trained and validated, it can be tempting to lean back and call it a day. But the reality your model monitors is dynamic, which means your model should be too.
As Amit Ashwini, former Director of Marketing at CognitiveClouds, writes in a blog post:
“Business conditions change, customers change, products change, changes in your environment can affect your application. Its performance will gradually degrade over time, even though you might not notice. If you’re planning an AI project, you need to account for retraining.”
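A minimal sketch of what accounting for retraining might look like: keep track of how the model performs on recent predictions once the true outcomes are known, and raise a flag when accuracy slips. The class name `AccuracyMonitor`, the window size, and the accuracy threshold are all assumptions made for this example.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions and flags when retraining looks due."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a prediction matched the outcome that was later observed."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None before any outcome is recorded."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Wait for a full window so a few early misses do not trigger a retrain
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.min_accuracy


# Hypothetical stream of (predicted, actual) pairs; the last two predictions are wrong
monitor = AccuracyMonitor(window=5, min_accuracy=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy(), monitor.needs_retraining())
```

Because the window only holds the latest outcomes, a gradual decline shows up here even when the overall lifetime accuracy still looks healthy.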
While this is not exactly a comprehensive guide to the best AI strategy for your project, it’s a solid start to ensure your AI is on the right path. If you have any questions on how you can acquire accurate data to reliably train and develop your models, drop us a note.