LaKisha David | Dec. 8, 2019, 10:41 a.m.
Kabagworiwe Safiah and her parents Kabagworiwe Adam and Kabagworiwe Saleimabu are members of the Kassena ethnic group who have tested with Ancestry and subsequently had their DNA profiles uploaded to GEDmatch. This is a narrative of the practice of finding their relatives in GEDmatch.
I did a one-to-many comparison for Safiah to find all the matches in the GEDmatch database that match Safiah at a minimum of 7 cM. There were 129 matches, plus Safiah, for a total of 130. Among these matches were Safiah's parents, Adam and Saleimabu, and several other family members from their local village of Nania in Paga, Ghana.
It is very exciting to see these 129 matches for Safiah. There is a chance that all are true matches, but the reality is that some of them are not. This is a limitation of the testing technology, not of GEDmatch.
Imagine that when the equipment reads a tester's DNA, the 22 pairs of chromosomes are first broken into many pieces and duplicated. The equipment reads each chromosome piece and records it in a text file. Each position along a chromosome piece has two reads, one from the biological mother and one from the biological father.
The problem is that when the pieces are recorded, the equipment cannot tell which parent each read came from. It records only what is there, not how it was inherited. Sometimes the first variant is from the father and sometimes the first variant is from the mother. We just don't know which is which without further analysis. The result is a text file that lists the reads at each chromosome position for both chromosomes of a pair without aligning them by parent. This is an unphased DNA profile.
When the DNA profile is downloaded from Ancestry and uploaded to GEDmatch, the profile is unphased. Regular kits on GEDmatch are unphased kits. This is no fault of Ancestry or GEDmatch, but it is a reality that we must account for when finding matches.
The most accurate way to align the kit on GEDmatch is to test the person's parents and create phased kits. In this case, Safiah's parents have also tested and their kits were uploaded to GEDmatch. I used GEDmatch's Phasing tool to create two phased kits for Safiah: one phased with her mother's kit (M1 kit) and one phased with her father's kit (P1 kit). These phased kits have Safiah's DNA variants aligned by parent, so we know that a given DNA segment was inherited from one (or both) of her parents.
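The idea behind phasing with a parent can be sketched at a single chromosome position. This is a minimal illustration of the general principle, not a description of how GEDmatch's Phasing tool is implemented; the function name and the sample reads are hypothetical.

```python
def allele_from_parent(child_genotype, parent_genotype):
    """Return the allele the child inherited from this parent at one
    position, or None when it cannot be determined.

    Genotypes are unordered two-letter strings such as "AG".
    """
    child = set(child_genotype)
    parent = set(parent_genotype)
    if len(child) == 1:
        # Child has two copies of the same allele, so each parent gave it.
        allele = child_genotype[0]
        # If the parent does not carry it, treat the position as unreadable.
        return allele if allele in parent else None
    shared = child & parent
    if len(shared) == 1:
        # Only one of the child's two alleles could have come from this parent.
        return shared.pop()
    # Both are heterozygous for the same alleles (ambiguous), or nothing is shared.
    return None


# Hypothetical reads at four positions: (child, parent)
positions = [("AG", "AA"), ("CC", "CT"), ("AG", "AG"), ("TT", "CC")]
phased = [allele_from_parent(c, p) for c, p in positions]
print(phased)  # ['A', 'C', None, None]
```

Positions that return None are the ones where a single parent's kit is not enough to decide which variant came from which side.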
At the very least, when two people match, we should expect at least one parent of each person to match as well. For any pair of cousins, one parent of each cousin should also match. This is a fact of inheritance. There are sure to be some exceptions, but this holds in the vast majority of cases. For example, for any person that matches Safiah, we expect, at a very minimum, that the person also matches Safiah's father Adam or Safiah's mother Saleimabu. We also expect that Safiah matches at least one parent of that match.
So let's see how many of Safiah's 129 matches also match her father.
Using a one-to-many with Safiah's P1 kit, her kit phased with her father, we see that she shares 44 matches with her father. Excluding the phased kit itself, Safiah's own kit, and her father's kit, there are 41 matches.
And let's take a look at how many of Safiah's 129 matches also match her mother.
Using a one-to-many with Safiah's M1 kit, her kit phased with her mother, we see that she shares 47 matches with her mother. Excluding the phased kit itself, Safiah's own kit, and her mother's kit, there are 44 matches.
Alright, so there are 41 matches with her father and 44 matches with her mother, for a total of 85 matches. So of the 129 matches that Safiah started with, at most 85 meet the basic expectation of also matching one of her parents. This count does not yet account for kits that match both her mother and her father, or for other reasons to reduce the total.
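The counting above can be written out as a short sketch. The row counts (44 and 47) come from the one-to-many results described here; the kit IDs in the second half are hypothetical and only show why the sum of the two lists is an upper bound when a kit appears on both sides.

```python
def count_without_family(rows, family_rows=3):
    """Drop the phased kit itself, Safiah's own kit, and the parent's kit."""
    return rows - family_rows


paternal = count_without_family(44)   # 41
maternal = count_without_family(47)   # 44
upper_bound = paternal + maternal     # 85
print(paternal, maternal, upper_bound)  # 41 44 85


def supported_by_a_parent(unphased, paternal_matches, maternal_matches):
    """Keep only unphased matches that also appear in a phased match list."""
    return unphased & (paternal_matches | maternal_matches)


# Hypothetical kit IDs, for illustration only.
unphased = {"K1", "K2", "K3", "K4", "K5"}
p1 = {"K1", "K2"}
m1 = {"K2", "K3"}
supported = supported_by_a_parent(unphased, p1, m1)
print(sorted(supported))                  # ['K1', 'K2', 'K3']
print(len(p1) + len(m1), len(supported))  # 4 3
```

In the toy data, K2 matches on both sides, so the two lists add up to 4 while only 3 distinct kits are supported.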
At this point, I would say that Safiah has about 85 possible matches, some of which would prove to be false through further analysis. I consider the following to be true matches: (1) unphased kits that match Safiah's unphased kit at a minimum of 15 cMs on a single segment, (2) unphased kits that match Safiah's unphased kit at 3 to 14.9 cMs and that also match at least one other of Safiah's matches at a minimum of 200 cMs total, including an overlapping segment (e.g., using the 3D chromosome browser and triangulation), and (3) phased kits or Lazarus kits that match Safiah's phased kit at a minimum of 3 cMs.
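The three rules above can be sketched as a single check. This is a simplified illustration under stated assumptions: the function and parameter names are made up, and the sketch does not verify that the corroborating match actually overlaps the same segment, which rule (2) also requires.

```python
SINGLE_SEGMENT_MIN_CM = 15.0   # rule 1: unphased kit, single segment
SMALL_SEGMENT_MIN_CM = 3.0     # lower bound for rules 2 and 3
CORROBORATION_MIN_CM = 200.0   # rule 2: total shared with another match


def is_true_match(segment_cm, phased=False, shared_with_other_match_cm=0.0):
    """Apply the three rules to one candidate match.

    segment_cm: cMs shared with Safiah's kit on a single segment.
    phased: True when a phased or Lazarus kit is compared with her phased kit.
    shared_with_other_match_cm: largest total shared with another of her matches.
    """
    if phased:
        return segment_cm >= SMALL_SEGMENT_MIN_CM            # rule 3
    if segment_cm >= SINGLE_SEGMENT_MIN_CM:
        return True                                          # rule 1
    return (segment_cm >= SMALL_SEGMENT_MIN_CM               # rule 2
            and shared_with_other_match_cm >= CORROBORATION_MIN_CM)


print(is_true_match(17.5))                                     # True
print(is_true_match(8.1))                                      # False
print(is_true_match(8.1, shared_with_other_match_cm=1626.3))   # True
print(is_true_match(3.2, phased=True))                         # True
```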
It turns out that only 53%, or 68 of the 129 matches, remained after excluding those that did not also match her phased kits. However, all of the unphased matches at 14 cMs or more remained as possible matches after comparing the match with Safiah's phased results. From this, we assume that any kit matching at 15 cMs or more will prove to be a true match even if their parents or another close match are not available to provide additional evidence.
In the chart above, cMs is the number of cMs in the match and S1 is the number of matches in Safiah's unphased one-to-many comparison results. For example, using Safiah's unphased kit, she had 9 matches at 40 cMs or greater, 2 matches at 30 cMs - 39 cMs, and 7 matches at 20 cMs - 29 cMs. At 13 cMs, 2 of the 4 kits that matched her unphased kit did not match at least one of her parents. At 7 cMs, only 21% of the matches also matched at least one of her parents. At this rate, we cannot tell which of the matches below 14 cMs would still hold as inherited DNA segments once the matching person also compares their results with their own parents. This is why, when working with unphased kits, we maintain the rule of 15 cMs minimum for a single segment.
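The percentages quoted above are simple retention rates: the number of matches that remained divided by the number that started. A minimal sketch, using only counts stated in the text (the function name is made up for illustration):

```python
def percent_remaining(remaining, total):
    """Share of unphased matches that also matched a phased kit, in percent."""
    return round(100 * remaining / total)


print(percent_remaining(68, 129))  # 53, the overall figure
print(percent_remaining(2, 4))     # 50, the 13 cM row (2 of 4 kits remained)
```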
I then use GEDmatch's 3D chromosome browser to compare all of Safiah's one-to-many matches with her M1 phased kit and her P1 phased kit. The 3D chromosome browser results below are based on a 7 cM threshold. I also removed the kits that were in the TAKiR database (which would be other people from Ghana or Burkina Faso). This leaves a matrix comparing all the matching kits to determine how much they match each other.
In the 3D chromosome browser results, I look for kits that match each other at a minimum of 200 cMs. This stands in for not having their phased kits. The 200 cM threshold is not entirely arbitrary. I assume it is roughly the limit at which a person could name the shared ancestor without DNA testing. According to statistics published by the International Society of Genetic Genealogy (ISOGG), second cousins would share 212.5 cMs or 3.125% of DNA on average. In practice, according to statistics published by The Shared cM Project, second cousins share 233 cMs on average (46 - 515 cMs). Second cousins share great-grandparents. Although the matches would more than likely be true at 15 cMs, using the 200 cM threshold increases the chance of being able to contact the people behind the kits that match each other and confirm their relatedness.
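The ISOGG figures quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming the usual model in which shared DNA halves at each generation along a line of descent, and a genome total of 6,800 cMs, which is simply the total implied by 212.5 cMs being 3.125%. The function name is made up for illustration.

```python
TOTAL_CM = 6800  # assumption: total implied by 212.5 cMs / 3.125%


def expected_shared_fraction(generations_to_ancestor, shared_ancestors=2):
    """Average fraction of DNA two cousins share through common ancestors.

    Each cousin is `generations_to_ancestor` steps below the shared couple,
    so a path between them passes through twice that many inheritances.
    """
    steps = 2 * generations_to_ancestor
    return shared_ancestors * 0.5 ** steps


# Second cousins share great-grandparents, three generations up.
fraction = expected_shared_fraction(3)
print(fraction * 100)       # 3.125
print(fraction * TOTAL_CM)  # 212.5
```

The observed average of 233 cMs from The Shared cM Project differs from this theoretical 212.5 cMs because real inheritance varies from pair to pair, as the wide 46 - 515 cM range shows.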
In Safiah's M1 3D results, there are 14 matches (7 pairs) that are of interest because they share 200 cMs or more total. An additional 6 matches (3 pairs) are also worth investigating because they share 15 cMs - 199.9 cMs total, though I need to check how much they share on a single segment.
In Safiah's P1 3D results, there are 18 matches (9 pairs) that are of interest because they share 200 cMs or more total. An additional 2 matches (1 pair) are also worth investigating because they share 15 cMs - 199.9 cMs total, though I need to check how much they share on a single segment.
So now I have 32 matches (14 + 18) that I am very confident are true matches. There are also an additional 8 matches (6 + 2) that I could check to see if they also share 15 cMs on a single segment.
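Picking out the pairs of interest from a matrix of kit-to-kit totals can be sketched as follows. The two large totals (2,326.7 and 1,626.3 cMs) are the M1 pair totals reported in this narrative; the kit labels and the remaining values are hypothetical, and the matrix is represented here as a plain dictionary rather than anything GEDmatch exports.

```python
def pairs_of_interest(shared_cm, threshold_cm=200.0):
    """Return the kit pairs whose total shared cMs meet the threshold."""
    return sorted(pair for pair, cm in shared_cm.items() if cm >= threshold_cm)


# Keys are (kit, kit) pairs; values are total cMs shared between the two kits.
shared_cm = {
    ("A", "B"): 2326.7,
    ("C", "D"): 1626.3,
    ("A", "C"): 0.0,    # hypothetical: unrelated kits
    ("E", "F"): 42.0,   # hypothetical: below 200 cMs, check single segment
}

pairs = pairs_of_interest(shared_cm)
kits = {kit for pair in pairs for kit in pair}
print(pairs)      # [('A', 'B'), ('C', 'D')]
print(len(kits))  # 4

# Tallies from the M1 and P1 results in the text.
confident = 14 + 18   # 32
to_check = 6 + 2      # 8
print(confident, to_check)  # 32 8
```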
Looking back at the 3D chromosome results for Safiah's M1 and P1 kits, I now look at the first column. This column tells me how much DNA each kit of interest shares with the key phased kits.
In the M1 results, matches in the first pair share 2,326.7 cMs with each other and, from column 1, 17.5 and 17.8 cMs with the M1 phased kit. I'm very confident that these two kits are true matches with Safiah and her mother, and I would contact them saying so.
In the M1 results, matches in the second pair share 1,626.3 cMs with each other and, from column 1, both share 8.1 cMs with the M1 phased kit. Based on the amount of shared DNA between each other, this pair is likely grandparent-grandchild (or uncle/aunt-niece/nephew). Because I'm using the M1 phased kit, I'm very confident that the shared segment is true between Safiah and her mother. However, using unphased kits for the possible grandparent-grandchild pair means that there is a chance that this 8.1 cM segment is not a truly inherited segment between the matches in the grandparent-grandchild pair. After all, only 69% of Safiah's matches at 8 - 8.9 cMs also matched at least one of her parents (see Kabagworiwe Safiah's Matches that Remained picture above). So there is a possibility that when aligned (i.e., phased), this segment would end up not being a true segment for the discovered grandparent-grandchild pair. At the same time, the kits in the grandparent-grandchild pair were read by the equipment separately, so this segment is more than likely not the result of an error. I am fairly confident that this is a true match and would proceed with this pair as a true match. However, I would also encourage the grandparent-grandchild pair to create a phased or Lazarus kit (as I would encourage all potential matches to do) and use that to compare with Safiah's phased kit for stronger evidence of relatedness.
This may seem like a lot of unnecessary steps, but Safiah's P1 3D results illustrate why these steps are necessary. Recall that these matches were selected from Safiah's P1 one-to-many comparison results. The first pairs of interest form a block of 4 matches sharing at least 3,500 cMs with each other. However, looking at column 1, none of these matches actually share at least 7 cMs with Safiah and her father. In a one-to-one comparison between Safiah's P1 kit and each of these 4 kits, none share a minimum of 3 cMs with Safiah and her father. This is a prime case of matches that appear to be true matches in a one-to-many comparison but end up not really matching the key person in a one-to-one comparison, even at a lower threshold. These would not be counted as true matches.
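The column 1 check described above amounts to asking whether each kit in a strongly related pair also shares enough with the key phased kit. A minimal sketch, using the column 1 values given for the two M1 pairs and recording the four P1 kits as None because the one-to-one comparison found nothing at 3 cMs; the labels and function name are hypothetical.

```python
def supported_by_key_kit(cm_with_key_kit, min_cm=3.0):
    """True when a kit shares at least min_cm with the key phased kit.

    None means the one-to-one comparison reported no shared segment.
    """
    return cm_with_key_kit is not None and cm_with_key_kit >= min_cm


column_one = {
    "M1 pair 1, kit a": 17.5,
    "M1 pair 1, kit b": 17.8,
    "M1 pair 2, kit a": 8.1,
    "M1 pair 2, kit b": 8.1,
    "P1 block, kit a": None,
    "P1 block, kit b": None,
    "P1 block, kit c": None,
    "P1 block, kit d": None,
}

kept = [kit for kit, cm in column_one.items() if supported_by_key_kit(cm)]
dropped = [kit for kit in column_one if kit not in kept]
print(len(kept), len(dropped))  # 4 4
```

However much the four P1 kits share with each other, they drop out because none of them shares a segment with the key phased kit.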
Using the 15 cM threshold (or 14 cM, based on the Kabagworiwe Safiah's Matches that Remained picture above), of the 32 confident matches, only 4 are very confident matches, and an additional 4 are already shown to not really be matches after all. Testing additional people and/or phasing kits would provide additional evidence of relatedness and make for more certain matches, even with the actual shared ancestor remaining unknown.
It is exciting for all of us to find relatives. It's exciting for people of African descent to find their relatives from Africa and it's exciting for Safiah's family to find the descendants of those who were taken away during slavery. However, in this excitement, it is important to maintain certain rules based on the technologies we are using to claim relatedness. People of African descent are increasingly finding relatives from Africa so there’s no need to shortcut the verification process.