Define core schema #5
Here's my take on a minimal schema, divided by category.
Entity Identification
A unique identifier: Probably something arbitrary or an established scientific name. In the latter case, we'd have to choose which nomenclature.
Taxonomy and nomenclature
This section defines the crop being described, for interoperability with other data sets.
Taxonomy
Taxonomic hierarchy: An array of the taxonomic hierarchy, starting at the family level. e.g.
Names
Scientific nomenclature: The names assigned to this plant by different nomenclatures. The ones relevant to a crop database are ICN and ICNCP. Common plants will often have a name in each nomenclature, and there will be synonyms. Keeping a list of all synonyms is out of scope, but we should list the common ones that some people continue to use.
Common names: By language or locale
Phenology
This section is going to be tricky to define a core schema for, because crops all have different triggers for their various life cycle stages. I'm more or less OK with the idea of a simple 'days to maturity' or 'days to germination' field as a shortcut to this data, as long as we understand that such figures are entirely relative to the environment, and that the common numbers you'll find in English growing literature are usually for temperate climates and are inaccurate elsewhere.
Germination
Fruiting and Flowering triggers
Other phenological properties we might want to consider
Environmental tolerance
Technique / unscientific data
I think Ok, that's what I've got for now... |
Here's a rough example of a crop using something like the above schema
Here's an example of why days-to type data is tricky... The data I was using for some of this entry is this:
So, harvest time for the whole plant is 40-60, or 50-90, or for just an outer leaf harvest it's 25-45. That means the full range for a single property is 25-90. Ouch! We could have that overridden to a more specific value in the cultivar entries, assuming we could get that data, and we could add an additional property like
Not sure how best to model it, and in large part it depends on what data we can get our hands on. |
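One way to picture the "ranges with context" idea: a minimal Python sketch, assuming each days-to-harvest figure is stored together with the context it applies to. The `DayRange` class, its field names, and the context labels are hypothetical and not part of any agreed schema; only the three numeric ranges come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayRange:
    """A 'days to harvest' estimate tied to the context it applies to."""
    context: str  # hypothetical free-text label, e.g. the harvest method
    low: int
    high: int


def overall_range(ranges):
    """Collapse several context-specific ranges into one (low, high) envelope."""
    return (min(r.low for r in ranges), max(r.high for r in ranges))


# The three figures quoted above; the context labels are illustrative only.
days_to_harvest = [
    DayRange("whole plant (first figure)", 40, 60),
    DayRange("whole plant (second figure)", 50, 90),
    DayRange("outer leaves only", 25, 45),
]

print(overall_range(days_to_harvest))  # (25, 90)
```

Keeping the individual ranges lets an application pick the one matching its context, and only fall back to the wide 25-90 envelope when it knows nothing else.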
@andru this all looks really great to me. I agree that anything that is opinion such as Do you think the schema should include linking to the same entity on other sites? Should we use the wikidata identifier when available? |
@andru @roryaronson how familiar are you with linked data? It would be beneficial to use existing properties where possible. For stuff like name, commonName, synonym, etc that should be possible. Would be useful to check what other schemas already cover some of the specific stuff.
Yes, please read this for a much better description. Although it does depend on what the user needs are, which it would be great to tease out into user stories. Are any of these crop specific properties regional? |
Only passing familiarity. I get totally confused when it comes time to choose which ontologies to use to express which relationships. Furthermore, any datasets we can link to instead of maintaining our own data are totally desirable from my perspective, but this has to be balanced with ease of use for the end user. As long as it's open data we could pull it into the flat file with a compile step, so that the source entities link to, e.g., agrovoc for name data, but the final compiled output bundles and transforms that data for ease of use. What property of which ontology would you use to model these kinds of relationships..? entity > taxonomy defined by > http://oek1.fao.org/skosmos/agrovoc/en/page/c_1068
As an additional ID, for sure. As a primary ID this sounds great in theory, but wikipedia/base covers few cultivars beyond the most well-known ones, so either we take on the project of making sure it does, or we don't base our ID system on it. |
@andru this is great! I agree with all your points. I will also add some thoughts: First, I really think that we should be conceiving of this as a distributed model right from the start. This will require a discussion of its own, so I started a new issue for it: #7 If we take a distributed approach, we can drastically simplify the schema requirements that we start with - only the most widely agreed-upon properties - and then begin a formalized process of suggesting and adopting additional properties moving forward. An example: if we start by providing a very bare-bones "base" dataset, with only a small set of common plants, then others can start to build their own derivative datasets on top of it (for specific cultivars, varieties, etc). And they can choose to include schema properties of their own creation (within a minimal framework that we provide, of course). Over time, as consensus is built, the standard schema definition can adopt additional properties when it makes sense to. Let's discuss these ideas more in the other issue, but I thought it was worth mentioning here, as it might take some of the pressure off of the initial schema definition task. In regards to your specific points: I especially love the phenology section - and I agree that it is the trickiest. You sort of touched on one idea that might work: we could "peg" the average numbers to a specific climate, so it is known that they are numbers relative to that climate. With that as a reference, it might be possible to write conversion code that can translate those numbers to other climates. That would be in the domain of the app, though, not the schema. And again, maybe we can leave some of the phenology properties to the creators of derivative datasets in the beginning - but at least provide a container for them to get things started.
A Universally-Unique Identifier will ensure that there are no potential collisions with crops or cultivars that someone else's database provides, and will make it more certain which parent crop is being referenced in the "inherits" property. There will also need to be a "dataset reference" property, as well, that points to the dataset containing the parent - if it's not the same dataset that the current crop is in.
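To make the UUID, "inherits", and "dataset reference" ideas concrete, here is a rough Python sketch. The `CropEntry` class, the `parent_dataset` field name, the dataset names, and the property values are all assumptions made up for illustration; nothing here is an agreed schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CropEntry:
    name: str
    # Random UUID, so entries from independent datasets should not collide.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    inherits: Optional[str] = None        # UUID of the parent crop, if any
    parent_dataset: Optional[str] = None  # None means "same dataset as this entry"
    properties: dict = field(default_factory=dict)


def resolve(entry, datasets, current):
    """Return the entry's properties layered over those of its ancestors."""
    if entry.inherits is None:
        return dict(entry.properties)
    source = entry.parent_dataset or current
    parent = datasets[source][entry.inherits]
    merged = resolve(parent, datasets, source)
    merged.update(entry.properties)  # the child's own values win
    return merged


# Placeholder entries and values, not real crop data.
base = CropEntry("parent crop", properties={"height_cm": 50})
cultivar = CropEntry(
    "child cultivar",
    inherits=base.id,
    parent_dataset="core",
    properties={"leaf": "large"},
)
datasets = {"core": {base.id: base}, "derived": {cultivar.id: cultivar}}

resolved = resolve(cultivar, datasets, "derived")
print(resolved)  # {'height_cm': 50, 'leaf': 'large'}
```

The dataset reference is only needed when the parent lives elsewhere, which is why it defaults to `None` in this sketch.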
That's fine with me. But I think there should be some properties to define overall "size", like "height" and "spread" at maturity. Something that could be used to infer what's possible in terms of spacing... A mature basil plant is much different than a mature pumpkin plant. And I'm still on the fence a little about "yield", but we can leave it out for now. I fully understand that yield is extremely variable depending on climate and growing conditions. But it is less variable when you consider it in relation to other crops. Again the basil vs pumpkin example: I can never expect to get the same amount of yield from a single basil plant as I can from a single pumpkin. What I am hoping to achieve is to give the programs that use this data a real sense of the plants - even if it is just "average" information - it provides necessary knowledge that can be used to generate plans. User-submitted "Guides" are great, but I'm looking toward enabling automatically-generated guides... as much as is conceivable. Great conversation everyone! I'm really excited about this! |
Agreed. I'd also find this super useful. As I'm sure would FarmBot.
It's worth thinking about. Such a property can only hope to describe potential yield per area in ideal growing conditions, which, as you say, can be a ballpark figure to help someone calculate a total yield. It seems like it would need to be a property with ranges and contexts to be useful (to me). In line with the ideas of a distributed database, this property could form part of a DB which deals with more relative estimate figures used to advise growers, as opposed to factual data. |
Yes, and maybe we would even see "regional" databases start popping up - which derive from the core dataset, but override the phenological properties for their specific region. |
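A regional dataset overriding core phenology could be as simple as a recursive merge. This is only a sketch under the assumption that entries are nested dictionaries (as they would be once loaded from YAML or JSON); the function name, key names, and numbers are placeholders, not real crop data.

```python
import copy


def apply_overrides(core, regional):
    """Return a copy of `core` with values from `regional` merged in recursively."""
    merged = copy.deepcopy(core)
    for key, value in regional.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder values for illustration only.
core_entry = {
    "name": "example crop",
    "phenology": {"days_to_germination": [5, 10], "days_to_maturity": [60, 90]},
}
regional_entry = {"phenology": {"days_to_maturity": [45, 70]}}

result = apply_overrides(core_entry, regional_entry)
print(result["phenology"])
```

Only the keys the regional dataset mentions are replaced; everything else falls through from the core entry, and the core entry itself is left unmodified.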
I read a bit about the problems with merging the data. If we want to minimize this problem, it'll be hard to avoid links to climate types, or at least to broad climate families, to reduce the range in things like germination or other durations. For example, in France alone there are 3 or 4 major climate types that can change the ranges a lot. Just depending on whether you are in the north or the south, seeding can be delayed by a month ... |
Yea, for the most part I think the data should be as climate-agnostic as we can - and leave those kinds of decisions (ie: when to seed) up to the application that's using the data. It's tricky to draw the line, but I am in favor of starting with a very small set of standard data properties and then allowing new ones to be discussed/debated and maybe added to the schema one-by-one moving forward. |
That's a good approach: start with everything we agree on without a doubt as the basis. |
We started sketching out the general objects and properties in a wiki: https://github.com/openfarmcc/Crops/wiki/Crop-data-needs But that was just a first sketch/brainstorm, and hasn't received much attention or modification. The idea behind using YAML or JSON was to make something that was database agnostic, so could be shared between many different software platforms. But if mocking something up in MySQL is helpful to you, by all means! Ultimately it doesn't matter a whole lot what format the data is represented in, because it can be stored in a simple format and then imported into just about anything. I think that's the hope anyway. |
Ok, I'll work on it this evening, if "evening" makes any sense in this international discussion! I'm thinking SQL because I'm used to it and, for example, when I see properties like "days to germination" and "days to maturity", my first thought is to create tables "timings" and "timing types" so that another "days to XXX" can be added without touching the DB structure. Maybe it's too much. |
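For reference, the "timings" / "timing types" idea might look roughly like the following, sketched with Python's built-in `sqlite3` module rather than MySQL so it runs anywhere. The table and column names and all values are assumptions for illustration, not a proposed final schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crops (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE timing_types (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE timings (
    crop_id        INTEGER NOT NULL REFERENCES crops(id),
    timing_type_id INTEGER NOT NULL REFERENCES timing_types(id),
    min_days       INTEGER NOT NULL,
    max_days       INTEGER NOT NULL,
    PRIMARY KEY (crop_id, timing_type_id)
);
""")

# Placeholder data, not real crop figures.
conn.execute("INSERT INTO crops (id, name) VALUES (1, 'example crop')")
conn.executemany(
    "INSERT INTO timing_types (id, label) VALUES (?, ?)",
    [(1, "days to germination"), (2, "days to maturity")],
)
conn.executemany(
    "INSERT INTO timings VALUES (?, ?, ?, ?)",
    [(1, 1, 5, 10), (1, 2, 60, 90)],
)

# Adding another "days to XXX" is a row insert, not a schema change.
conn.execute("INSERT INTO timing_types (id, label) VALUES (3, 'days to flowering')")
conn.execute("INSERT INTO timings VALUES (1, 3, 30, 45)")

rows = conn.execute("""
    SELECT tt.label, t.min_days, t.max_days
    FROM timings AS t
    JOIN timing_types AS tt ON tt.id = t.timing_type_id
    WHERE t.crop_id = 1
    ORDER BY tt.id
""").fetchall()
print(rows)
```

The trade-off is the usual one for this kind of lookup-table design: new timing types need no migration, but every read needs a join, and the flat "one field per property" shape of a YAML or JSON entry is lost.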
Great discussion. Any news on this front? You may want to consider following the OpenAPI spec for defining the API. |
There hasn't been much progress on this as a separate entity, though OpenFarm has just been forging ahead with its crop endpoints. |
Not much progress yet - I've been focused on more general farmOS development recently - but it is still my plan to pick this back up when I get farther along, and hopefully we'll all still be open to sharing schemas/datasets at that point. |
A space to discuss a core schema. Entries could contain additional data, but it would be useful to define a schema that all entries share.
@pmackay and @mstenta already got started at this over at https://github.com/openfarmcc/Crops/wiki/Crop-data-needs, but I'm opening this for some parallel discussion which we can port over to the wiki as we reach consensus.