Depending on the setup, you can probably get by with just training classes for a few different points. Or do a (simple) volume comparison and/or time-delay thing. This is much easier if you only have two contact mics, as you can compute the position directly from the difference in arrival time (e.g. if the hit arrives at the right contact mic first and then at the left contact mic 150 samples later, that corresponds to a specific place on the table, along the line between the two mics).
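As a rough sketch of the two-mic time-delay idea (not code from any particular setup), you could cross-correlate the two channels to find the delay in samples, then turn that into an offset from the midpoint between the mics. The function names, the `speed` value (how fast the vibration travels through your table) and the mic `spacing` are all assumptions here; the speed in particular depends heavily on the material and would need to be calibrated by hitting a known spot.

```python
import numpy as np

def arrival_delay_samples(left, right):
    """How many samples later the hit shows up in `left` than in `right`.

    Positive result: the right mic heard it first.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    corr = np.correlate(left, right, mode="full")
    # In "full" mode, zero lag sits at index len(right) - 1.
    return int(np.argmax(corr)) - (len(right) - 1)

def offset_from_centre(delay_samples, sample_rate, speed, spacing):
    """Offset in metres from the midpoint between the mics.

    Positive means closer to the right mic. `speed` (m/s) and `spacing` (m)
    are assumed, calibrated values, not something the signal gives you.
    """
    offset = speed * (delay_samples / sample_rate) / 2.0
    # A hit between the mics can't be further out than the mics themselves.
    return float(np.clip(offset, -spacing / 2.0, spacing / 2.0))
```

In practice you would run this on a short window around each detected onset rather than on the whole recording, and noisy or very different-sounding channels will make the correlation peak less reliable.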
If you have 3 contact mics, it becomes a more complex mathematical problem (this paper is a good place to start).
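If you'd rather avoid the closed-form maths, one brute-force option is a grid search: for every candidate point on the table, work out what the delays between mics would be and keep the point that best matches what you measured. This is only a sketch under assumptions (it is not the method from the paper): the function name, the table dimensions, the mic coordinates and the single constant `speed` are all hypothetical, and real surfaces won't behave this neatly.

```python
import numpy as np

def locate_by_grid_search(mics, delays, speed, sample_rate,
                          width, height, step=0.01):
    """Estimate the (x, y) hit position in metres on a width x height table.

    mics:   mic positions in metres, shape (M, 2)
    delays: arrival delay in samples at each mic relative to mic 0
            (so delays[0] is 0)
    speed:  assumed wave speed in the surface, in m/s
    """
    mics = np.asarray(mics, dtype=float)
    delays = np.asarray(delays, dtype=float)

    # Every candidate point on the table, as an (N, 2) array.
    xs = np.arange(0.0, width + step / 2.0, step)
    ys = np.arange(0.0, height + step / 2.0, step)
    gx, gy = np.meshgrid(xs, ys)
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)

    # Distance from every candidate point to every mic, shape (N, M).
    dist = np.linalg.norm(points[:, None, :] - mics[None, :, :], axis=2)

    # Delays each candidate would produce, relative to mic 0, in samples.
    predicted = (dist - dist[:, :1]) / speed * sample_rate

    # Keep the candidate whose predicted delays are closest to the measured ones.
    error = np.sum((predicted - delays) ** 2, axis=1)
    return points[np.argmin(error)]
```

The measured delays could come from pairwise cross-correlation against the first mic. A finer `step` gives more resolution at the cost of more candidates to check.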
In terms of interpolating between trained classes, that’s been pretty tricky to do as well. There’s a bunch of info and examples in this thread here, with some working code I believe. I never got it working “well” but it worked alright.
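For what it's worth, one simple way to interpolate between trained points is inverse-distance weighting: measure how far the incoming hit's descriptors are from each trained class, then blend the known positions of those classes, weighting closer matches more. This is a generic sketch and not the code from that thread; the function name, the idea that you have one distance per class, and the `power` parameter are all assumptions.

```python
import numpy as np

def interpolate_position(class_positions, class_distances, power=2.0):
    """Blend trained positions using inverse-distance weights.

    class_positions: (x, y) of each trained point, shape (K, 2)
    class_distances: distance in descriptor space from the new hit to each
                     trained class (hypothetical input, one value per class)
    power:           higher values make the nearest class dominate more
    """
    positions = np.asarray(class_positions, dtype=float)
    distances = np.asarray(class_distances, dtype=float)

    # An exact match would divide by zero, so just return that class's position.
    if np.any(distances == 0.0):
        return positions[np.argmin(distances)]

    weights = 1.0 / distances ** power
    return (weights[:, None] * positions).sum(axis=0) / weights.sum()
```

How well this works depends entirely on whether distance in descriptor space actually tracks distance on the surface, which is probably the tricky part.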