This section provides a generic checklist covering the most important aspects of a successful project in this space.
A good starting point is the Engineering Fundamentals Checklist from the CSE code-with-engineering-playbook.
- Can the problem to be solved be clearly and succinctly defined? For example, 'We need to extract the following 6 fields from these n forms'.
- Can the success criteria be clearly defined? For example, 'If the following 6 fields can be extracted 80% of the time from these n forms, then our success criteria are met'.
- Has a hypothesis-driven approach been adopted? This is vital for success on any data-driven project.
See How to Implement Hypothesis-Driven Development for a good example of applying the scientific method to a data-driven software project.
The steps of the scientific method are to:
- Make observations
- Formulate a hypothesis
- Design an experiment to test the hypothesis
- State the indicators to evaluate if the experiment has succeeded
- Conduct the experiment
- Evaluate the results of the experiment
- Accept or reject the hypothesis
- If necessary, make and test a new hypothesis
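
To make the "state the indicators" and "evaluate the results" steps concrete, the example success criterion above (the required fields extracted correctly at least 80% of the time) can be checked against ground truth. The following is a minimal Python sketch, not a prescribed implementation: the function names, the dictionary-per-form data layout, and the use of exact string matching are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one dict per form, mapping field name -> extracted value.
Form = Dict[str, Optional[str]]


def field_accuracy(predictions: List[Form], ground_truth: List[Form],
                   fields: List[str]) -> float:
    """Return the fraction of (form, field) pairs extracted correctly.

    Assumption: a field counts as correct only when the extracted value is
    present and exactly equal to the labelled value.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    correct = 0
    total = 0
    for predicted, labelled in zip(predictions, ground_truth):
        for field in fields:
            total += 1
            value = predicted.get(field)
            if value is not None and value == labelled.get(field):
                correct += 1
    return correct / total if total else 0.0


def meets_success_criterion(accuracy: float, threshold: float = 0.8) -> bool:
    """Accept the hypothesis only when accuracy reaches the agreed threshold."""
    return accuracy >= threshold


# Illustrative data only: two forms, two fields, one extraction error.
labelled_forms = [
    {"invoice_id": "A1", "total": "10.00"},
    {"invoice_id": "B2", "total": "25.50"},
]
extracted_forms = [
    {"invoice_id": "A1", "total": "10.00"},
    {"invoice_id": "B2", "total": "25.80"},
]
accuracy = field_accuracy(extracted_forms, labelled_forms, ["invoice_id", "total"])
criterion_met = meets_success_criterion(accuracy)
```

In a real project, the matching rule (exact match, normalised match, or a tolerance for numeric fields) would itself be agreed as part of the success criteria before the experiment is run.
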
- Sufficient data exists so that the data may be split into training, test and validation sets.
- The data available is representative of what is required to be predicted in the production environment.
- Sufficient labelled or Ground Truth (GT) data exists so that a variety of techniques may be experimented with.
- The data is accessible to the project team and is compliant with the organisation's security policies.
- All regulatory constraints on data collection, analysis, or implementation are clear.
- Data skills exist within the team to be able to analyse and manipulate the data as needed.
- At least two team members with data skills are available during the analysis phase to validate each other's work and to try a diverse range of experimental approaches.
Fast AI also provides a comprehensive checklist for generic data projects.
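
As a quick check on whether enough data exists to be split into training, validation and test sets, the split itself can be sketched in a few lines of Python. This is an illustrative sketch only: the `split_dataset` name, the 70/15/15 proportions, and the fixed seed are assumptions rather than recommendations from this checklist.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train_frac: float = 0.7,
                  val_frac: float = 0.15,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items reproducibly and split into train, validation and test.

    Whatever is left after the train and validation portions becomes the
    test set. The default fractions are assumed values for illustration.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable experiments
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Illustrative usage with hypothetical form file names.
form_files = [f"form_{i:03d}.pdf" for i in range(100)]
train_set, validation_set, test_set = split_dataset(form_files)
```

If any of the three resulting sets is too small to be representative of the production data, that is a signal that more data needs to be collected or labelled before experimentation begins.
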
Now refer to the Decision Guidance section to determine whether the Form Recognizer service is a good fit.