Well, it depends on my material, but when working on PC or with paint, I like to start out black. With a graphite stick I work the other way around, which also works, and in pixel art it really depends on what I'm making. But what I mean by the term 'tone' is really the light and dark. You can make things clear with colour, but I prefer to use colour only to make them clearer. So I generally start with a dark grey, then add lighter tones wherever things are hit by light. I leave black open for details and extreme contrast, as I'm generally more likely to use pure white than pure black (using both creates odd contrast when applied wrongly, or contrast that doesn't suit the piece). I'd really have to show you a progress strip or something to break this down completely, but I'm not sure whether you'd like that posted (and possibly discussed further) in your topic. So PM me or reply here about where you want to go with that. I'll start on an example now anyway.
For your further WIP: Tetra in general, and Aryll's hair and face, fall away into the background. There is too little contrast to make them stand out, and in general the bright colours and the lack of contrast in the actual value/brightness/tone cost the piece its depth. A photograph generally doesn't capture an environment well, because you lose the second eye and with it the depth. Since we make art and create the image ourselves, people often use fogginess or darkness in an environment to suggest depth, and you can make your characters stand out by giving them more brightness or contrast and letting them cast relatively heavy shadows onto the environment.
Now, the art on the characters themselves is good. The environment is a little less so, but hey, that's very much a WIP, as we can all see, and I'm sure you'll take it further without more comments from me. What this piece still lacks is interaction. Aryll stares at the camera while Tetra looks at something non-existent (or at least I can't see it from my viewpoint, so it should not be of interest to me, nor to the characters portrayed in your scene). It'd make more sense for them to interact either with me, giving a first-person view, or with each other. Or possibly with the environment itself. Right now they're just standing there with no relation to each other or to the place they are in. Make sure that how they stand makes sense, not necessarily in relation to each other (though that is likely with only 2 characters), but in relation to the environment as well. Aryll seems to be happily pointing something out, but it is something by my, the viewer's, feet, which is invisible right now.
PS: You draw out a shape by defining 2 areas, 'solid' and 'rest', per surface. But what your shape lacks is depth. This method could work very well when drawing from life, but to really make sure your dynamics are right, you should add a lighter and a darker shade to your initial form to define depth and positioning in the third dimension beforehand. That's just what I can suggest for now, though; perhaps I don't understand your workflow well enough yet.