Python Assignment Help: Sequence Assignment | Biological DNA Analysis | Biological DNA Sequence Analysis - EssayGhost


Published: 2021-07-25 15:28:24

You will notice that large positive values in the PWM (e.g. 0.59) correspond to positions where the TF has a strong preference for a particular nucleotide (e.g. an A at position 3).

We can now use a PWM to calculate the score of a match between a candidate sequence of L nucleotides and the PWM. For example, the score of candidate sequence TCGATG is obtained by summing up the appropriate values from the PWM:

    Score(TCGATG) = PWM[T][0] + PWM[C][1] + PWM[G][2] + PWM[A][3] + PWM[T][4] + PWM[G][5]
                  = 0.19 + 0.43 + 0.43 + 0.59 + 0.54 + (-0.09) ≈ 2.11

whereas for sequence ACATAG, we get

    Score(ACATAG) = PWM[A][0] + PWM[C][1] + PWM[A][2] + PWM[T][3] + PWM[A][4] + PWM[G][5]
                  = 0.37 + 0.43 + 0.07 + (-1.41) + (-1.41) + (-0.09) ≈ -2.04

The larger the score, the higher the affinity of the transcription factor for the sequence.

Finally, we can use a PWM to scan a longer sequence to identify potential binding sites for the TF. This is done by calculating the PWM score for every possible window of L consecutive nucleotides in the sequence. For example:

    Sequence = ATCGATGAGACTGA
    Score(0) = Score(ATCGAT) = -3.87…
    Score(1) = Score(TCGATG) = 1.65…
    Score(2) = Score(CGATGA) = -4.16…
    Score(3) = Score(GATGAG) = -4.16…
    Score(4) = Score(ATGAGA) = -0.01…
    Score(5) = Score(TGAGAC) = -2.55…
    …

Positions whose score exceeds a user-chosen threshold are reported as putative binding sites for the TF. For example, if we choose a threshold of -0.2, then positions 1 and 4 would be reported as putative sites.

Download TFBSscanning.py from MyCourses. Your work will consist in completing each of the functions given in this program. You must not change the function names or their arguments. For each question, it is your responsibility to test your program to make sure it works on all possible cases, not just those provided as examples.

Question 1 (10 points).
Encoding of DNA nucleotides into integers.

It is often convenient to represent a nucleotide ("A", "C", "G", or "T") as a number between 0 and 3, where "A" is represented as 0, "C" as 1, "G" as 2, and "T" as 3. Complete the encode function that takes as argument a string of one character, and returns an integer between 0 and 3. If the string provided as argument is not one of "A", "C", "G", or "T", the function should return -1.

Example of execution:

    code = encode("G")
    print(code)   → 2
    code = encode("A")
    print(code)   → 0
    code = encode("X")
    print(code)   → -1

Question 2 (20 points). Building a PFM.

Write a function called buildPFM that takes as input a list of DNA sequences identified as binding sites for a given transcription factor, and returns a Position Frequency Matrix (PFM), represented as a list of lists of integers, where PFM[nuc][pos] = number of sequences that have nucleotide nuc at position pos. The function can assume that the input sequences all have the same length, but that length is not necessarily 6 (as in our example).

Example of execution:

    sites = ["ACGATG", "ACAATG", "ACGATC", "ACGATC", "TCGATC", "TCGAGC", "TAGATC", "TAAATC", "AAAATC", "ACGATA"]
    PFM = buildPFM(sites)
    print(PFM)   → [[6, 3, 3, 10, 0, 1], [0, 7, 0, 0, 0, 7], [0, 0, 7, 0, 1, 2], [4, 0, 0, 0, 9, 0]]

Note: In your code, you may find it useful to be able to create a PFM with all entries initialized to zero. If L is the length of the desired PFM, this can be done as follows:

    PFM = [[0 for i in range(L)] for j in range(4)]

Question 3 (10 points).
Building a PWM from a PFM

Write the function getPWMfromPFM, which takes as arguments a PFM and a real number called the pseudocount, and returns a PWM according to the formula given above.

Example of execution:

    PFM = [[6, 3, 3, 10, 0, 1], [0, 7, 0, 0, 0, 7], [0, 0, 7, 0, 1, 2], [4, 0, 0, 0, 9, 0]]
    PWM = getPWMfromPFM(PFM, 0.1)
    print(PWM)   → Output:
    [[0.370356487039949, 0.07638834586345467, 0.07638834586345467, 0.5893480258118245, -1.4149733479708178, -0.37358066281259295],
     [-1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.43628500074825727],
     [-1.4149733479708178, -1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -0.37358066281259295, -0.09275405323689867],
     [0.19781050874891748, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.5440680443502756, -1.4149733479708178]]

Question 4 (20 points). Scoring a sequence of length L

Write a function score that takes as arguments a sequence and a PWM (both of the same length L), and calculates the score of the sequence with that PWM.

Example of execution:

    PWM = [[0.370356487039949, 0.07638834586345467, 0.07638834586345467, 0.5893480258118245, -1.4149733479708178, -0.37358066281259295],
           [-1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.43628500074825727],
           [-1.4149733479708178, -1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -0.37358066281259295, -0.09275405323689867],
           [0.19781050874891748, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.5440680443502756, -1.4149733479708178]]
    s = score("TCGATG", PWM)
    print(s)   → 2.1110425271706337
    s = score("ACATAG", PWM)
    print(s)   → -2.039670915526873

Question 5 (20 points).
Identifying matches in longer sequences

Write a function predictSites that takes as input a DNA sequence and a PWM as positional arguments, as well as a floating point number called threshold as keyword argument, and returns a list of the starting positions of substrings whose score with the given PWM is greater than or equal to the threshold.

Example of execution:

    PWM = [[0.370356487039949, 0.07638834586345467, 0.07638834586345467, 0.5893480258118245, -1.4149733479708178, -0.37358066281259295],
           [-1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.43628500074825727],
           [-1.4149733479708178, -1.4149733479708178, 0.43628500074825727, -1.4149733479708178, -0.37358066281259295, -0.09275405323689867],
           [0.19781050874891748, -1.4149733479708178, -1.4149733479708178, -1.4149733479708178, 0.5440680443502756, -1.4149733479708178]]
    sequence = "GCATCGATGGCAGCGACTACAGCGCTACTACAGCGGAGACGATGCGATCGATACAAT"
    hits = predictSites(sequence, PWM)
    print(hits)   → [3, 38, 43, 47]

Question 6 (20 points): Identification of putative target genes

Genes are portions of DNA sequences. The transcription start site of a gene is the position of the start of the gene in the sequence. Suppose that you have a dictionary of gene names and associated transcription start site positions. For example:

    genes = {"BRCA1": 3, "MYC": 23, "RUNX": 45}

Our goal is to count the number of predicted binding sites located in the neighborhood of the transcription start site of each gene.
For example, if we allow a maximum distance of 10 and the list of PWM hits is [12, 38, 43, 46], then the BRCA1 gene has 1 hit, the MYC gene has 0, and the RUNX gene has 3.

Your task is to write the function countHitsPerGene, which takes as positional arguments the genes dictionary and the list of hits, as well as the keyword argument maxDist, and returns a new dictionary with genes as keys and hit counts as values.

Example of execution:

    genes = {"BRCA1": 3, "MYC": 23, "RUNX": 45}
    hits = [3, 38, 43, 47]
    geneHits = countHitsPerGene(genes, hits, maxDist = 10)
    print(geneHits)   → {"BRCA1": 1, "MYC": 0, "RUNX": 3}
    geneHits = countHitsPerGene(genes, hits, maxDist = 20)
    print(geneHits)   → {"BRCA1": 1, "MYC": 3, "RUNX": 3}
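
The matrix-building steps of Questions 1 to 3 could be sketched as follows. This is only one possible approach, not the reference solution. The PWM formula is not reproduced in this excerpt, so the code assumes log10(((count + pseudocount) / (N + 4 * pseudocount)) / 0.25), with N the number of sites and 0.25 a uniform background frequency. That form is inferred from the example output in Question 3. Skipping non-ACGT characters in buildPFM is likewise an assumption.

```python
import math

NUCLEOTIDES = "ACGT"  # position in this string is the integer code


def encode(nucleotide):
    """Return 0-3 for "A", "C", "G", "T" and -1 for anything else."""
    if len(nucleotide) != 1:
        return -1
    return NUCLEOTIDES.find(nucleotide)  # str.find returns -1 when absent


def buildPFM(sites):
    """Count how many sites carry each nucleotide at each position."""
    length = len(sites[0])  # assumes a non-empty list of equal-length sites
    pfm = [[0 for _ in range(length)] for _ in range(4)]
    for site in sites:
        for pos, nucleotide in enumerate(site):
            code = encode(nucleotide)
            if code >= 0:  # assumption: characters other than ACGT are skipped
                pfm[code][pos] += 1
    return pfm


def getPWMfromPFM(pfm, pseudocount):
    """Convert counts to log-odds scores using the ASSUMED formula above."""
    n_sites = sum(row[0] for row in pfm)  # column 0 sums to the number of sites
    denominator = n_sites + 4 * pseudocount
    return [
        [math.log10(((count + pseudocount) / denominator) / 0.25) for count in row]
        for row in pfm
    ]


sites = ["ACGATG", "ACAATG", "ACGATC", "ACGATC", "TCGATC",
         "TCGAGC", "TAGATC", "TAAATC", "AAAATC", "ACGATA"]
pfm = buildPFM(sites)
pwm = getPWMfromPFM(pfm, 0.1)
print(pfm)
print([round(value, 4) for value in pwm[0]])  # row for nucleotide A
```

With the ten example sites, the counts match the PFM printed in Question 2, and the resulting scores agree with the Question 3 output to within floating-point rounding.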

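The scoring, scanning and gene-counting steps of Questions 4 to 6 could be sketched in the same spirit. Several details here are assumptions rather than part of the assignment text: the PWM is rounded to four decimals to keep the listing short, the default threshold of 1.0 is chosen only because it reproduces the hits listed in Question 5 (the excerpt does not state the default), the default maxDist of 10 is arbitrary, and a hit counts for a gene when its absolute distance to the transcription start site is at most maxDist.

```python
NUCLEOTIDES = "ACGT"  # row order of the PWM: A, C, G, T

# PWM from the examples, rounded to 4 decimals (an abbreviation, not the exact values).
PWM = [
    [0.3704, 0.0764, 0.0764, 0.5893, -1.4150, -0.3736],     # A
    [-1.4150, 0.4363, -1.4150, -1.4150, -1.4150, 0.4363],   # C
    [-1.4150, -1.4150, 0.4363, -1.4150, -0.3736, -0.0928],  # G
    [0.1978, -1.4150, -1.4150, -1.4150, 0.5441, -1.4150],   # T
]


def score(sequence, pwm):
    """Sum pwm[nucleotide][position] over the sequence.

    str.index raises ValueError on characters other than A, C, G, T.
    """
    return sum(pwm[NUCLEOTIDES.index(nuc)][pos] for pos, nuc in enumerate(sequence))


def predictSites(sequence, pwm, threshold=1.0):  # default value is an assumption
    """Return start positions of windows scoring at least the threshold."""
    width = len(pwm[0])
    return [
        start
        for start in range(len(sequence) - width + 1)
        if score(sequence[start:start + width], pwm) >= threshold
    ]


def countHitsPerGene(genes, hits, maxDist=10):  # default value is an assumption
    """Count hits within maxDist (inclusive) of each transcription start site."""
    return {
        gene: sum(1 for hit in hits if abs(hit - tss) <= maxDist)
        for gene, tss in genes.items()
    }


sequence = "GCATCGATGGCAGCGACTACAGCGCTACTACAGCGGAGACGATGCGATCGATACAAT"
genes = {"BRCA1": 3, "MYC": 23, "RUNX": 45}

hits = predictSites(sequence, PWM, threshold=1.0)
print(hits)
print(countHitsPerGene(genes, hits, maxDist=10))
print(countHitsPerGene(genes, hits, maxDist=20))
```

Treating the distance bound as inclusive is consistent with the maxDist = 20 example, where the hit at position 3 lies exactly 20 positions from the MYC start site and is still counted.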