Edit:

<-- single frame
Quite a bit of improvement overall. The texturing on the bark looks better too. That said, I've got a few things to point out though. I (mostly) addressed these in the following animation:

<-- animation
Firstly, there's one major thing you might want to be aware of in the new composition -- it's called "flyswatting".
In the case of your tree's value-contrast and silhouette combined, the image looks like a bug splattered on the page. It's difficult to see this unless you step back completely and look at the image objectively in its 2D contours/edges placed against the negative space, while also keeping in mind the contrast in color values across the form (as well as the 3d form itself in terms of light and shadow being cast).
The reason your image looks flattened is because, number one, your lighting isn't casting shadows which therefore leads to number two, which is that your image appears formless, which is probably caused by the fact that you were probably not considering its 3d form (or much of anything else aside from the subject matter you intended to draw.) This is definitely a beginner's mistake, but if it's not corrected, you could be drawing for years without realizing what's wrong with your art.

Thankfully there's hope -- you just need to be careful where you place your lighting values, and also be conscious of your silhouette while constructing your overall 2D form (i.e. limb-placement -- in this case, take that literally -- which must be placed in interesting ways in relation to the overall silhouette) in order for your drawing to appear interesting at first glance. I cannot stress how important this is because if the viewer doesn't care for your drawing at first glace, he's not likely to stick around to look at the details you painstakingly worked on either.
In this particular case, regarding colors, you have 5 values to work with -- and that's plenty. You have the lightest color (which should be used sparingly and only on the places you want to draw attention to) and you can use (up to) the next-to-darkest color to shade any closer parts of the form (i.e. anything that uses the brightest highlight color). The highlighted areas draw attention to themselves as being more important since they appear closer. Since our brains are wired to care about the things closer to us first (as they are likely to be the most dangerous!), we notice lighter stuff (such as the shine off a snake's scales) before we notice anything else. Only then does our brain process the darker areas (which generally get lumped into the silhouette in our brain) so that, ultimately, we will notice details in those areas *much* less and devote more processing power to the more important (read: dangerous) elements first.
===============
Form Composition:
===============
You can think of the silhouette and darker areas as the
brain's peripheral vision -- with that, everything is, by definition, much less detailed and more imprecise, which means more shape and color variation is required to get its attention. The more homogenous these 'peripheral' areas are, the less your brain cares about them -- this is the reason the silhouette must be interestingly-shaped and also the reason you must offer a reasonable amount of variation in light and dark areas across the whole sprite (in order to break it up).
Regarding the 3D form's casting of shadows onto other parts of the form (and the depth of the part of the form you are currently rendering), you can represent the deeper (more-shaded) areas with the second-brightest color as the highlight would be used on the closer forms and shade down to the darkest possible color (colors 2-4, with color 5 being the darkest color [and generally only used for outlining and emphasis] with color 2 being the brightest highlight on the deeper areas). In terms of shading, it is sometimes easier to use texture to make the form interesting (rather than shading it directly) because you don't have to be exact with it as long as your representation reads well (and, in pixel art, this is a HUGE time-saver) and this allows pixels to be more efficient than regular digital art in some cases.
================
Shape Composition:
================
As a beginner pixel-artist, one might look at my edit, and at first glance think "whoa! how'd he do that??" -- but this reaction simply stresses how important the above information is to learn. Representing forms must be learned first before trying to refine them (after all, although a tree might appear formless at first glance, everything in nature has something behind it driving the formation of its eventual shape.)
If you open up a program like Graphics Gale, and look at the two image frames in the animation above, back and forth, it will be clear to you that the number of edits I made on this piece is relatively small -- and are also very simple too. At first glance, you might think it to look like a few random pixels placed in the shadowy areas -- and you'd be partially right -- except I placed them carefully with special care as to which values I used at what places, ensuring I had enough shadow between certain areas to define their form.
================
Regarding shadows:
================
It's always good to place the most-used color you want in a given area down first (but **don't** use the outline color in there though, and this applies even when you don't use a traditional outline!!) -- After filling this area, add lighter and darker colors to define detail and texture later -- but only
after you have established a light direction and then used it to cast the respective shadows and placed them where you may. This is pretty fast in pixel art, especially when working with 4 colors -- after all, just as soon as you're done setting up your lights and shadows, you're generally almost done -- the only stuff that's really left is texturing and refinement. And that leads me to my last point: Save refinements (such as texturing) for last.
===============
To sum it all up:
===============
It's better to have a well-composed image with proper lighting representing the shapes and forms than one with refined lighting and details but also fundamentally broken shapes and forms. The latter (fundamentally broken 2D shapes and 3D forms) is the ultimate reason your image looks flyswatted.
In your case, this is because, alongside an incomprehensible and uninteresting silhouette, the bright highlights are used across the entire shape of the silhouette -- this blends every line and every part of the form into a single-colored shape (keep in mind, we're talking about the brain's perception here) making a single, large, bright, formless shape. The key reason for this is that all the colors being so similar makes the shape appear "homogenous" and the brain unconsciously interprets this homogenous shape as
a lumped together single blob of color.
To illustrate this, try looking at your latest image with squinted eyes, and then try to discern where the individual shapes are -- now do the same to my edit -- Which is easier to see, and why is this so?


Hopefully you understand now that, to fix this, you'll have to "break-up" the shapes. This is best done by breaking shapes into imaginary 'layers' (in this case, just two) across the image to give a sense of depth in some areas. This can be done easily with 4 colors because 'layer 1' consists of colors 1-3 (which appears closer to us) and 'layer 2' consists of colors 2-4 (which appears slightly farther from us depending on the intensity of the color values chosen), which ultimately gives the illusion of depth across the whole image without you even having to be entirely conscious of it while you're doing it. When you focus on breaking up the 2D shapes by way of using depth to accomplish this, you automatically achieve a sense of 3D space without much effort. The lighting still is required to cast shadows on the form properly -- but these don't have to be accurate when the shape a shadow would make would break-up an existing shape up too much. In this case, the shadow can be
consciously omitted for the sake of better readability of the image.
The below image should help you to recognize these "shapes" so that you can understand the difference in why my edit reads so much better as a whole than the original:
My Edit:

Your Original:

As you can see, there are a lot more shapes broken up in my edit than there are in the original. The brain distinguishes the boundaries by these broken imaginary 'lines' which are generally led by the highlights and darkness when separating the shapes using depth. As you can see in my original edit, it is quite effective.