I am trying to replace values in a list, but I would like to control how often this happens. Specifically, I would like to provide a maximum and a minimum for the number of values that are replaced. For example, I would like to be able to set the maximum number of replaced values to 5 and the minimum to 0, meaning that on any given line at most 5 values will be replaced, but there is a possibility that none are. The numbers could be anything, though; I don’t want to limit myself to 5 and 0. I know this sounds bizarre, but for the type of analysis I want to perform, there needs to be some kind of set frequency.
Based on previous posts, I have been able to find ways to randomly replace values, but I haven’t been able to find anything about setting how frequently the random replacement occurs.
The code that I am using looks like this:
import random
vals = ['*','1','0']
with open("test2.txt","w") as out:
    with open("test.txt", "rt") as f:
        for line in f:
            li=line.strip()
            tabs = li.split("\t")
            geno = tabs[1:]
            print(geno)
            for index, x in enumerate(geno):
                if random.randint(0, 1):
                    geno[index] = random.choice(vals)
            print(geno)
An example of a list that is being used looks like this:
['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 
'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
Here are a few example lines from my data, to help with your answer:
AAD - - 0 - - - 0 - - 0 0 - 0 0 0 0 0 - 0 0 - 0 0 - 0 - 0 - - - - - - 0 - 0 0 0 0 - - 0 - 0 0 0 0 0 - 0 0 0 0 - 0 - 0 0 0 0 - - - 0 - 0 0 0 0 - 0 0 0 0 0 - 0 0 0 - 0 0 0 0 - - - 0 - 0 0 0 - 0 0 0 0 0 0 - 0 0 - 0 0 0 0 - 0 - 0 0 - - 0 0 0 0 0 - 0 0 0 0 - 0 0 0 0 - 0 0 0 0 - - 0 0 - 0 0 - 0 0 0 0 0 0 - - 0 0 0 0 0 - 0 - - 0 0 0 - 0 0 0 0 - 0 - - 0 0 0 0 0 0 0 - 0 - - - 0 - - 0 - 0 0 - 0 - - 0 - - - - - - - - 0 0 - 0 - - - - - - - - - - 0 0 - - - 0 0 0 - 0 0 - - - - - - - 0 0 - - - 0 - - 0 - - - - - 0 - - - - - - - 0 - - 0 0 0 0 - 0 - - 0 0 0 0 0 0 - - - - - 0 - 0 - 0 0 - - - 0 - - - - 0 - - - - - 0 - - - - 0 - 0 0 - - - - 0 - 0 - - - - 0 0 - 0 0 0 - 0 - 0 - - - - 0 - - 0 - 0 - - - 0 0 0 0 - 0 0 - - - 0 - - - - 0 0 - - - 0 - - - - - - - - - - - - - - - - - - - - - - 0 - - - - - - - - - 0 - - 0 - - - - - - - 0 - 0 - 0 - - 0 0 - - - - - - 0 - - - - - - - - - - - - - - 0 - - - - - - 0 - 0 - 0 - - - - - 0 0 - - - 0 - 0 - - 0 0 0 - - - - - - - - 0 - - 0 0 - - - - - - 0 0 - - - - 0 - 0 - - - - 0 - - 0 - - - - - - 0 - 0 - - - 0 - - - - - - - - - - - - - - - - - - 0 0 - - - 0 - - - - 0 - - 0 0 - - - - - - 0 - - 0 - 0 - - - - - - - - - - - 0 - 0 0 - - 0 - - - 0 - - - 0 - - - - - 0 - - - 0 - - - 0 - - - - - - - - - - - 0 - - - - - 0 0 - - - - 1 - 0 - 0 - - - - 0 - - - - 1 - - - - - - - - - - 0 - - - - - - - - - - - - - - - - 0 - 0 - - - - - - - - 0 - - - - - - - - - - - - - - - - - - 0 0 - 0 0 - - - - - - - - - - 0 - - 0 - - - - - - 0 - - - - - - - - - 0 1 - - - - - - - - - - - - - - - - 0 - - - - - - - - - - - 0 1 - - - - - - - - - 0 1 - - - - - - - - 1 0 - - - 0 - - - - - - - - - 0 - 0 - - - - - - - - - - - - 0 - - - 0 0 0 - - - 0 - - - 0 - - - - - - - - 0 - - - - 0 - - - - - 0 - 0 - - - - 0 - - - - - - - 0 - - 0 0 -
AAC 0 - 0 - - - 0 0 - - - 0 - - - - 0 0 0 - - - - 0 0 0 - 0 - - 0 0 0 0 - - - - 0 - 0 0 - 0 - 0 - 0 - - 0 - - 0 0 0 0 - 0 0 - 0 0 0 0 0 0 - 0 0 - - 0 0 0 0 - 0 0 0 0 - - - - - - - 0 0 - 0 - 0 - - 0 - 0 - 0 0 0 - - 0 0 0 - 0 - - - 0 0 0 0 0 0 - - 0 0 0 0 0 0 0 0 - 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 - - 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 - - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 - - - 0 0 - 0 0 - - - 0 0 0 - 0 - 0 0 0 0 0 0 - - 0 - 0 0 0 0 0 - 0 0 0 0 0 - 0 0 - 0 - 0 0 0 - - - - 0 - 0 - 0 0 - - - - 0 0 - 0 - 0 - - - - - - - - - 0 0 - - 1 - 0 - 0 - 0 - 0 - 0 - 0 0 - - - - - - 0 0 0 0 0 0 0 - 0 0 0 - 0 0 0 - 0 0 0 0 0 0 - - 0 - - - - - - - 0 - - - 0 0 - - - - 0 - 0 - 0 0 - - 0 - 0 0 - 0 0 - - 0 - - - - 0 - 0 - 0 - - - - 0 0 0 0 0 0 0 0 - 0 - - 0 0 - - - 0 - 0 0 - 0 0 - 0 - - - 0 0 - - 0 0 - 0 - - - - 0 0 - 0 - 0 0 - - - - - - - - - 0 - - - 0 - - 0 - 0 - - - 0 - - 0 0 0 - - 0 0 - 0 0 - 0 - 0 0 0 0 - - 0 0 0 - 0 - - - 0 - - - 0 - - 0 - 0 - 0 - - - 0 0 0 - 0 - 0 - - - 0 - - - - 0 0 0 - 0 0 - - - 0 0 - 0 - 0 0 - - - - - 0 - 0 - 0 0 - - - - 0 - 0 0 - - - 0 - - - 0 - - - 0 - - - - - 0 - - - - 0 0 0 - - - 0 - 0 0 0 - - - - 0 - - - - - - 0 - 0 0 0 0 - 0 0 0 - 0 - 0 - - - - - - 0 - - 0 - 0 0 0 0 0 - - - - - 0 - - 0 - 0 0 - - - - 0 - - - - - - - - 0 0 - - - - 0 - - 0 - - - 0 - - 0 0 - - - - - - 0 0 - - - - - 0 - 0 0 - 0 0 - - - - - - - - - 0 0 0 - - 0 - - 0 0 - - - - 0 - 0 - - - - - - - 0 0 0 - - - - - 0 - - 0 0 - 0 - - 0 0 - - - - - - - 0 0 0 - 0 0 - - 0 - 0 - - - - 0 - 0 - - - - - - - - - - 0 - -
>Solution :
To elaborate on my comment:
import random

def replace_random_indexes(lst, min_n, max_n, replacements):
    # (1) Figure out how many indexes to change.
    n = random.randint(min_n, max_n)
    if n == 0:  # No changes required? Return original list.
        return lst
    # (2) Get a random set of indexes to change.
    indexes = set(random.sample(range(len(lst)), n))
    # (3) Using a list comprehension, return a new list with the indexes
    #     from `indexes` changed to a random choice from `replacements`.
    return [
        random.choice(replacements)
        if index in indexes
        else value
        for index, value
        in enumerate(lst)
    ]

orig = [1, 2, 3, 4, 5, 6, 7]
for i in range(10):
    print(replace_random_indexes(orig, 0, 3, ["a", "b", "c"]))
This prints, for example:
[1, 'a', 'a', 'c', 5, 6, 7]
['a', 2, 'c', 'a', 5, 6, 7]
[1, 2, 3, 4, 5, 6, 'a']
[1, 'a', 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 'a']
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 'c', 'c']
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 'b', 5, 'c', 'c']
Note that no list has more than 3 items changed, since max_n was 3.
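Two edge cases are worth keeping in mind with the function above. First, `random.sample` raises `ValueError` if `max_n` is larger than the list. Second, a chosen index can be "replaced" with the value it already holds (likely in your data, since `"0"` is both the common value and one of the replacements), so fewer items may visibly change than were picked. Here is a rough sketch of a variant that handles both; the name `replace_random_indexes_strict` is just something I made up for illustration, and whether you want this stricter behaviour depends on your analysis.

```python
import random


def replace_random_indexes_strict(lst, min_n, max_n, replacements):
    """Hypothetical variant: clamp the count and force visible changes."""
    # Clamp both bounds so random.sample never asks for more indexes
    # than the list has.
    upper = min(max_n, len(lst))
    lower = min(min_n, upper)
    n = random.randint(lower, upper)
    indexes = set(random.sample(range(len(lst)), n))

    result = []
    for index, value in enumerate(lst):
        if index in indexes:
            # Only offer candidates that differ from the current value.
            candidates = [r for r in replacements if r != value]
            # If every replacement equals the current value, keep it as-is.
            result.append(random.choice(candidates) if candidates else value)
        else:
            result.append(value)
    return result


orig = ["0"] * 10
changed = replace_random_indexes_strict(orig, 0, 5, ["*", "1", "0"])
n_changed = sum(a != b for a, b in zip(orig, changed))
print(changed, n_changed)
```

With this version, the number of positions that actually differ from the input equals the number drawn, as long as at least one replacement differs from each chosen value.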
To plug that into your original program,
vals = ["*", "1", "0"]
with open("test2.txt", "w") as out, open("test.txt", "rt") as f:
    for line in f:
        li = line.strip()
        tabs = li.split("\t")
        geno = tabs[1:]
        new_geno = replace_random_indexes(geno, 0, 5, vals)
        print(new_geno)
or similar should do the trick.
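The snippet above opens test2.txt but only prints the result. If you want the modified lines written back out, something along these lines might work. This is a sketch under a few assumptions: the first tab-separated column is a label to keep unchanged, the output should be tab-separated like the input, and `rewrite_lines` is a hypothetical helper name. It uses `io.StringIO` in place of real files so it runs on its own.

```python
import io
import random


def replace_random_indexes(lst, min_n, max_n, replacements):
    # Same idea as the function above, with the count capped at len(lst).
    upper = min(max_n, len(lst))
    n = random.randint(min(min_n, upper), upper)
    indexes = set(random.sample(range(len(lst)), n))
    return [
        random.choice(replacements) if index in indexes else value
        for index, value in enumerate(lst)
    ]


def rewrite_lines(src, dst, min_n, max_n, replacements):
    """Read tab-separated lines from src, write modified lines to dst."""
    for line in src:
        # Strip only the newline so empty trailing fields survive.
        tabs = line.rstrip("\n").split("\t")
        label, geno = tabs[0], tabs[1:]
        new_geno = replace_random_indexes(geno, min_n, max_n, replacements)
        dst.write("\t".join([label] + new_geno) + "\n")


# In-memory stand-ins for test.txt and test2.txt.
src = io.StringIO("AAD\t0\t0\t0\t0\t0\t0\nAAC\t0\t0\t0\t0\t0\t0\n")
dst = io.StringIO()
rewrite_lines(src, dst, 0, 5, ["*", "1", "0"])
print(dst.getvalue(), end="")
```

With real files, you would pass the open file objects instead, e.g. `rewrite_lines(f, out, 0, 5, vals)` inside the `with` block.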