Total points: 75 (+15 bonus)
project 3
Due: 30th Nov, 2018 (Fri)
Written project
1. We have four data privacy techniques:
1. k-anonymity
2. Differential privacy
3. Secure multiparty computation
4. Private information retrieval
For each scenario below, discuss whether eachof the four data privacy techniques would be suitable to resolve the challenge, and explain why.
(a) [4 points] You want to buy a new smart device to encourage a healthy lifestyle by measuring your calorie intake. Specifically, you want to be able to look up the nutritional information of any food using your device at any time. However, you are reluctant to reveal your personal eating habits to the company that owns the device. A new company making these smart devices is willing to use a data privacy technique to protect your privacy.
(b) [4 points] You are curious to know if your friend is earning more money than you are, but both of you are too embarrassed to reveal your salary to anyone. You know he is willing to cooperate with you in finding out an answer, though.
(c) [4 points] You are about to start a new company, and you would like to purchase its web domain as the company’s website. However, you are aware of the practice of cybersquatting; people may purchase the domain first if they know it is in demand, and sell it to you at an elevated price. You want to know if the domain is still available, but you are worried that attempting a DNS query for the domain will lead to some DNS servers purchasing it immediately for cybersquatting. Some DNS servers recognize your need and will cooperate with you if you can identify a useful data privacy technique.
(d) [4 points] A social media company is offering a monetary reward for a challenge: design the best algorithm that can identify which people should be friends, based on their real-life habits, online usage patterns, interests, and other personal data. They want to use the algorithm to improve their friend recommendation service. However, they are aware that simply releasing everyone’s raw data is a violation of their privacy.
2. A certain company backs up its data every day, at 11:59 PM. Every Sunday, a full backup is performed; partial backups are performed on the other weekdays. Every Monday, Wednesday, and Friday, there will be a differential backup. Every Tuesday, Thursday and Saturday, there will be an incremental backup.
For simplicity, assume that the data consists of D files of the same size. On each day, a small percentage p of the files is changed. If a file is changed, the entire file will need to be stored in the backup (it will not be subdivided into chunks, unlike rsync).
(a) [2 points] Assume that no file is changed twice during a week. Calculate the amount of storage required for each of the seven backups for the week, given in terms of p and D.
(b) [2 points] Now, assume that a file can be changed twice. A file, if changed twice or more, will never return to its original state. Which of the above answers, if any, is expected to be different at the mean? Show your workings.
(c) [2 points] Which partial backups cannot be corrupted if we want to be able to restore to the data at Saturday 11:59 PM?
3. Assume the following numbers describe the Bitcoin ecosystem:
I. The size of each block is at most 1 MB (= 1,048,576 Bytes), and there is exactly 1 block per 10 minutes. The reward for a successful block hash is 6 bitcoins.
II. Each transaction is at least 166 bytes in size. A block may contain nothing but transactions.
III. In total, all miners produce 1020 hashes per second. All miners cooperate to ensure that no two miners will attempt the same hash.
IV. It costs $8,000 to buy a mining device to produce 1016 hashes per second, and
$0.002 to run it per second for every miner.
(a) [2 points] What is the minimum price of a bitcoin such that miners avoid making a running loss using the given mining device, if there are no transaction fees?
(b) [2 points] What is the maximum number of transactions per second?
(c) [2 points] Now, suppose that the price of a bitcoin stays the same as your answer in (a), but the reward for a successful block hash has been halved to 3 bitcoins. What is the mean transaction fee (in $) required for each transaction so that miners continue to avoid making a running loss? (Show your workings clearly so that an error from (a) or (b) will not affect your marks for this part.)
(d) [2 points] Suppose the price of a bitcoin is fixed at $2,500, there are no transaction fees, and the reward for a successful block hash is still 6 bitcoins. What is the minimum number of Bitcoin mining devices you would need to buy and run to earn at least $1,000,000 profit in 365 days? Note that your purchases here will affect the total hashing rate above in (III). (The cost of purchase and the cost of running should both be counted as loss.)
Programmingproject
Error Correcting Codes [45 points]
Error Correcting Codes (ECCs) can be used in unsafe transmission or storage devices to improve reliability by providing redundancy. Unlike checksums, ECCs can automatically correct errors up to a certain number of bits. In this question, you will be asked to implement the general Hamming code, which can correct any 1-bit error in a general string. The specifications of the Hamming code are as follows.
1. Write the input data string as a bitstring according to ASCII.
2. Choose enough parity bits and intersperse the data bits with parity bits, so that the i-th parity bit piis followed by exactly 2i−1 − 1 data bits. Denote the i-th bit of H as Hi. Denote the length of H as |H|.
3. Let B(x) be the bitstring representation of the integer x, for example, B(5) = 00101.
4. Define each Mito be a set of specific bit positions in H, such that Miis the set of integers 1 ≤ x ≤ |H| where the i-th least significant bit of B(x) is 1.
5.
Set the parity bit pito be the XOR of all the bits of H in the positions indicated by Mi, except iitself:
pi= Hx
x∈Mi\{i}
Here is a more detailed explanation of Mi. Consider the integer x = 10, so B(10) is 1010. Then the 1st least significant bit is 0, the 2nd is 1, the 3rd is 0, and the 4th is 1 (reverse the bitstring). So x belongs to M2 and M4, not to M1 and M3.
To work out an example:
1. The input data string is “ab”, which becomes 0110000101100010 (16 data bits).
2. We need five parity bits, p1, p2, p3, p4, p5, as follows:
H
案例CS之python案例 data privacy techniq
2018-06-15