A simple bash script to remove photos taken by a specific device


I always use AirDrop to transfer photos shot on my iPhone to my iPad for editing. But since iOS has only one gallery, the transferred photos and the photos taken by the iPad end up mixed together. I wrote this script to remove all the photos taken by the iPhone, so that I could back up only those taken by the iPad and save storage.

for file in * ; do
    if mediainfo "$file" | grep -q 'iPhone 8' ; then
        echo "rm \"$file\"" >> /tmp/rm.sh
    elif exiftool "$file" -model | grep -q 'iPhone 8' ; then
        echo "rm \"$file\"" >> /tmp/rm.sh
    fi
done

Multiprocessing in Python

Recently, due to an excessive need to process binary executable data, I dug into Python's multiprocessing library, which can process data in parallel, unleashing the full potential of modern multi-core processors.

Why use multiprocessing instead of threading?

Obviously, when thinking about parallel computing, threading seems the most suitable option. Different threads have their own stacks while sharing the same address space, which facilitates data exchange. However, unlike other lower-level languages, most implementations of Python have a limitation on threading called the GIL, the Global Interpreter Lock. It guarantees that only one interpreter thread runs at a time, so only one CPU core is utilized. Currently, except for Jython (the Java implementation of Python) and IronPython (the .NET implementation), implementations of Python, including the most popular one, CPython, have this limitation. Therefore, multiprocessing seems to be a universal solution.

Basic concept

The basic usage of multiprocessing is to spawn a process that enters a specific function.
The following example spawns a new process, which then enters the function f. The child process is then joined to the parent process, so the program waits for it to finish executing.

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

There is some OS-specific behavior here, which will be explained in detail later.
When there is a list of pending jobs, a process pool can be used. A process pool guarantees an exact number of running processes: fewer processes cannot unleash the full potential of the system, while more processes consume valuable resources and significantly slow down overall execution.
The following example maps a job list onto a function. Pool(5) means that no more than 5 processes will run concurrently. map takes an iterable. The result of p.map is a list of the return values of the called function, which in this case is f.

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

Apart from map, there are other mapping methods such as imap, which yields the return values as an iterator.

value_itr = p.imap(f, [1, 2, 3])
for v in value_itr:
    print(v)

Its advantage is that we don't have to wait until all the jobs are finished, and we get a clear picture of how many jobs have completed. This allows some interesting tricks, such as displaying a progress bar, passing the return values as a job list to another pool, or building a producer-consumer model to avoid preparing too much data in advance.

Passing data between processes

Since different processes don't share an address space, they can only communicate through IPC (inter-process communication) or shared memory. Python has a Manager class that solves these problems.

from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    l.reverse()

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    l = manager.list(range(10))

    p = Process(target=f, args=(d, l))
    p.start()
    p.join()

    print(d)
    print(l)

In this case, d and l are shared between the processes. Note that a manager list cannot be pickled like a plain list, but it can easily be converted to one with list(l).
Processes can also communicate through pipes.
It is suggested not to store too much data in a manager: synchronization blocks many processes and wastes a lot of time.
If a lot of data must be shared between processes, one option is to benefit from Unix's copy-on-write, which I will discuss next.

Spawning processes on different OSes

As we know, Unix-based systems spawn processes by forking. Everything remains the same except the return value of fork(): the parent process receives the PID of the child, while the child process receives 0. Unix also uses a copy-on-write technique, which means that when forking, memory is not actually copied until one process tries to modify it. Therefore, the forked process can access all the data of its parent without an actual copy being made. This is dramatically faster than a Manager.
However, on Windows, things are different: it creates a new process, imports the whole Python script, and then calls the target function.
This is why we need if __name__ == '__main__':. It ensures that the code does not run unexpectedly while the script is being imported; otherwise, each spawned process would continue to spawn more processes.

One more step further into map

map distributes jobs to the spawned processes. It works like this: first, some processes are spawned; then jobs are sent to those processes in chunks. When a process finishes a chunk of jobs, it communicates with the main process and passes back the return values. Afterwards, it receives new jobs. This loop continues until the job list is empty or the process is killed.
There are two problems:
1. If a process handles more jobs, it consumes more memory, and only killing the process releases that memory; but spawning and killing processes takes time. This can be tuned with the maxtasksperchild parameter of Pool.
2. Jobs are sent to processes in chunks. If the chunk size is big but the iterator produces jobs slowly, more time is spent waiting, and a bigger chunk size also means more memory consumption. This can be tuned with the chunksize parameter of imap.

with Pool(processes=32, maxtasksperchild=100) as po:
    res = po.imap(data_process, sql_iter(), chunksize=1)

Hola mundo

Perhaps this is my first post in Spanish.

Time has passed very quickly. Today I am in my hometown. I saw some papers I wrote in secondary school, and I remembered those things as precisely as if they had happened yesterday. Back then I was very worried about exams; now I am worried about experiments. Both are worries, but what I am doing now is more creative, and I also like computers.

Still, dreaming is difficult for me. Although I rarely have dreams, they are always the same: about girls, or exciting things that happened when I was young.

That is my reason for learning sociology. But sadly, it has not changed me much. I am always looking for a way to live happily.

These are some of my thoughts today. I don't want anyone to understand them. Writing for the sake of writing.

Good night. I love you.






The difference between supervised and unsupervised learning is whether the targets in the training set are labeled by humans. Both have a training set, and both have inputs and outputs.



Machine learning automatically improves computer algorithms through experience. Against it there are three kinds of attacks: model extraction (reverse engineering), attacks on data confidentiality (training data extraction, membership attacks), and attacks on system integrity and availability (data poisoning, adversarial attacks). These attacks occur at different stages of the machine learning pipeline, as shown in the figure:


Training-time attacks: data poisoning



A common attack is model skewing. For example, attackers can systematically and deliberately mark spam as legitimate mail to interfere with Gmail's anti-spam system. In the figure, the red curve is the number of messages reported as legitimate mail and the green curve is the number reported as spam. At the "Skewing Attempt" point in time, the attackers reported a large volume of spam as legitimate mail, so similar spam sent by them was also classified by Gmail as legitimate, which achieved their goal.


According to Biggio, B., Nelson, B., & Laskov, P. (2011), if 40% of the labels in the training data are perturbed, the misclassification rate of the final model rises substantially.





Inference-phase attacks

After training finishes, a machine learning product enters its usage phase. At this stage there are three possible attacks: adversarial inputs (adversarial attacks), model extraction, and training data extraction.





As we can see, the cause of this problem is that the trained model is too linear. An adversarial example only needs to move a small step in each dimension in the direction that lowers the confidence of the correct class. Since the input dimensionality is very high (for example, a 200×200 image has 40,000 pixels, i.e., dimensions), the per-dimension effects add up and can significantly change the result. This attack is called FGSM, the Fast Gradient Sign Method.
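The linearity argument can be checked numerically: for a linear score w·x, perturbing x by ε·sign(w) shifts the score by ε·Σ|wᵢ|, which grows with the dimensionality even though each pixel changes by only ε. The weights below are random stand-ins for illustration, not a real model:

```python
import random

random.seed(0)
n = 40000                      # e.g. a 200x200 image
eps = 0.01                     # tiny per-dimension perturbation
w = [random.uniform(-1, 1) for _ in range(n)]
x = [random.uniform(0, 1) for _ in range(n)]

def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))

# FGSM step on a linear model: move each dimension by eps in the
# direction of the gradient's sign (the gradient of w.x w.r.t. x is w)
x_adv = [xi + eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

shift = score(x_adv) - score(x)    # equals eps * sum(|w_i|)
print(shift)                       # large, despite eps being tiny
```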

















2. The connection string. The orcl after jdbc:oracle:thin:@ depends on what was entered when the database was installed. You can check it in Net Manager; mine is orcl.



System.out.println("-------- Oracle JDBC Connection Testing ------");

try {
    // register the Oracle JDBC driver
    Class.forName("oracle.jdbc.driver.OracleDriver");
} catch (ClassNotFoundException e) {
    System.out.println("Where is your Oracle JDBC Driver?");
    return null;
}

System.out.println("Oracle JDBC Driver Registered!");

Connection connection = null;

try {
    connection = DriverManager.getConnection(
            "jdbc:oracle:thin:@", "scott", "tiger");
} catch (SQLException e) {
    System.out.println("Connection Failed! Check output console");
    return null;
}

if (connection != null) {
    System.out.println("You made it, take control your database now!");
    return connection;
} else {
    System.out.println("Failed to make connection!");
    return null;
}



But do you think that's all? A big pitfall awaits. If no database version was chosen earlier, it defaults to Oracle 12 with the ORACLE driver type, whereas what I use here is Oracle 11 with ORACLE SQL*PLUS. Worse, testing the connection then hangs and cannot be cancelled.

Therefore, you need to change the system's default ORACLE setting: switch ORACLE to ORACLE SQL*PLUS, and at the same time drag in the jar driver package that we tested and know works. Only after changing this setting can you create a new database connection. If you created one before, delete it.







To change the database character set, connect as sysdba:

connect sys/orcl as sysdba;

alter database character set INTERNAL_USE UTF8;

Verify the result with:

select * from nls_database_parameters;







Outline of Design of Operating System

Download the PDF with some graphs: design-of-operating-system-outline

Design of Operating System


OS protection, modes/privileges

User Mode, Kernel Mode


a flag register records what mode the CPU is in

Kernel Mode

In Kernel mode, the executing code has complete and unrestricted access to the underlying hardware. It can execute any CPU instruction and reference any memory address. Kernel mode is generally reserved for the lowest-level, most trusted functions of the operating system. Crashes in kernel mode are catastrophic; they will halt the entire PC.

User Mode

In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode.

Transitioning between User and Kernel mode is expensive.

Memory protection

protect programs from each other

◆  Base and limit registers

◆  Page table pointers, page protection, TLB

◆  Virtual memory

◆  Segmentation

Events (fault, interrupt, trap)

The OS handles unnatural changes in control flow; such a change is called an event.

Synchronous: caused by executing an instruction

Asynchronous: caused by an external event

Events can be unexpected or deliberate (scheduled by the OS or an application)

               Unexpected    Deliberate
Synchronous    fault         syscall trap
Asynchronous   interrupt     software interrupt

Fault: the OS may handle unrecoverable faults by killing the user process. Kernel faults: dereferencing NULL, dividing by zero, undefined instructions.

System call: trap to syscall → handle syscall → kernel routine → restore routine, set to user mode


Interrupt Handler: part of the operating system, running in kernel mode

Two flavors of interrupts

◆  Precise: CPU transfers control only on instruction boundaries

◆  Imprecise: CPU transfers control in the middle of instruction execution

Timer interrupt: OS reclaim control

Synchronization: an interrupt can arrive at any time, so the kernel must guarantee that short instruction sequences execute atomically


Components of the PCB (process control block)

All the state for a program in execution.

Stack, heap, static data, code.


Running, ready, waiting

State queues: the wait queue and the ready queue. The entries in these queues are PCBs.


Process and Thread

PROCESS represented by PCB

◆  An address space

◆  The code for the executing program

◆  The data for the executing program

◆  A set of operating system resources »  Open files, network connections, etc.



◆  An execution stack encapsulating the state of procedure calls

◆  The program counter (PC) indicating the next instruction

◆  A set of general-purpose registers with current values

◆  Current execution state (Ready/Running/Waiting)


Deadlock


Deadlock exists among a set of processes if every process is waiting for an event that can be caused only by another process in the set.

4 prerequisites

  1. Mutual exclusion – At least one resource must be held in a non-sharable mode
  2. Hold and wait – There must be one process holding one resource and waiting for another resource
  3. No preemption – Resources cannot be preempted (critical sections cannot be aborted externally)
  4. Circular wait – There must exist a set of processes [P1, P2, P3, …,Pn] such that P1 is waiting for P2, P2 for P3, etc.
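The conditions can be illustrated with two Python threads and two locks (the names are made up for illustration). Here every thread takes the locks in the same global order, which breaks the circular-wait condition; if one thread instead took lock_b before lock_a, all four conditions could hold and the program could hang:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balance = 0

def worker():
    global balance
    # Both threads acquire the locks in the same order (a, then b),
    # which prevents circular wait and therefore prevents deadlock.
    with lock_a:
        with lock_b:
            balance += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 2
```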


In computer programming, a mutex is a program object that allows multiple program threads to share the same resource, such as file access, but not simultaneously. When a program is started, a mutex is created with a unique name.


In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time. Such changes of the executed task are known as context switches. It is normally carried out by a privileged task or part of the system known as a preemptive scheduler, which has the power to preempt, or interrupt, and later resume, other tasks in the system.


In software engineering, a spinlock is a lock which causes a thread trying to acquire it to simply wait in a loop (“spin”) while repeatedly checking if the lock is available. Since the thread remains active but is not performing a useful task, the use of such a lock is a kind of busy waiting. Once acquired, spinlocks will usually be held until they are explicitly released, although in some implementations they may be automatically released if the thread being waited on (that which holds the lock) blocks, or “goes to sleep”.



A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing. This term was defined formally at some time during the 1970s—an early sighting in the published literature is in Babich’s 1979 article on program correctness.[12] Livelock is a special case of resource starvation; the general definition only states that a specific process is not progressing.[13]

Livelock is a risk with some algorithms that detect and recover from deadlock. If more than one process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be avoided by ensuring that only one process (chosen arbitrarily or by priority) takes action.[14]


◆  Job throughput (# jobs/unit time)

◆  Turnaround time (Tfinish – Tstart)

◆  Waiting time (Avg(Twait): avg time spent on wait queues)

◆  Response time (Avg(Tready): avg time spent on ready queue)

Starvation (a side effect of scheduling)

a process is prevented from making progress because some other process holds the resource it requires

Scheduling modes: first in first out, shortest job first, round robin

Round robin: each task gets the resource for a fixed period of time

Priority scheduling

  • Problem?

◆  Starvation – low priority jobs can wait indefinitely

  • Solution

◆  “Age” processes

»  Increase priority as a function of waiting time

»  Decrease priority as a function of CPU consumption

Multi-level feedback queue


Read and write semaphore


A monitor is a module that encapsulates

◆  Shared data structures

◆  Procedures that operate on the shared data structures

◆  Synchronization between concurrent threads that invoke the procedures

The monitor guarantees that only one thread is active inside it at a time; other threads wait in a queue.

Memory Hierarchy

  • memory segment
  • virtual memory:
    • fixed partitions
    • variable partitions
    • paging (easy to allocate, simplifies protection; still has internal fragmentation, memory-reference overhead (time consuming), and a page table that can be too large)

VAS: virtual address space

PTE: page table entry

VPN: virtual page number

PFN: page frame number

Page fault and page hit: whether the referenced VM word is present in memory (on a page fault, the page is on disk)

Working set locality: at any point in time, programs tend to access a set of active virtual pages called the working set

Programs with better temporal locality will have smaller working sets

If (working set size < main memory size): good performance for a process after the compulsory misses


If (SUM(working set sizes) > main memory size): thrashing, a performance meltdown where pages are swapped (copied) in and out continuously


Page table problem equations:

Page size: the size of each virtual page.

Page table size = page table entry size * number of page table entries


Two level page table case:


2-level page table » Level 1 table: each PTE points to a level-2 page table (always memory resident) » Level 2 table: each PTE points to a page (paged in and out like any other data)

Top-level page table size = number of PTEs * size of a PTE

Each bottom-level page table size = number of PTEs per table * size of a PTE


Number of pages = 2^(number of VPN bits)

Address space size = number of pages * page size

Page table size = 2^(number of VPN bits) * size of a PTE
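A worked instance of these equations, using a common textbook configuration assumed here for illustration: 32-bit virtual addresses, 4 KB pages, 4-byte PTEs.

```python
virtual_address_bits = 32
page_size = 4 * 1024            # 4 KB -> 12 offset bits
pte_size = 4                    # bytes per page table entry

offset_bits = page_size.bit_length() - 1       # log2(page size) = 12
vpn_bits = virtual_address_bits - offset_bits  # 20 VPN bits

num_pages = 2 ** vpn_bits                      # 1,048,576 virtual pages
address_space = num_pages * page_size          # 4 GB address space
page_table_size = num_pages * pte_size         # 4 MB of PTEs per process

print(num_pages, address_space, page_table_size)
```

The 4 MB-per-process result is exactly why the two-level scheme above pages out the bottom-level tables.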

VM’s usage:

Memory management

Memory allocation

Sharing memory: memory is shared on fork, avoiding an intensive memory copy; some data can be copied on write.

Memory protection



Page table translation:


TLB: speeds up the translation; it functions like an L1 cache

TLB hit and TLB miss


The CPU generates a logical address, which contains the page number and the page offset.

The page number is used to index into the page table to get the corresponding page frame number. Once we have the page frame in physical memory (also called main memory), we apply the page offset to get the right word of memory.

Why a TLB (Translation Lookaside Buffer)?

The page table is stored in physical memory and can sometimes be very large. So, to speed up the translation of logical addresses to physical addresses, we use a TLB, which is made of expensive, faster associative memory. Instead of going to the page table first, we go to the TLB and use the page number to index into it, obtaining the corresponding page frame number. If it is found, we completely avoid the page table (because we now have both the page frame number and the page offset) and form the physical address.

TLB Miss

If we don't find the page frame number in the TLB, it is called a TLB miss; only then do we go to the page table to look up the corresponding page frame number.


If we find the page frame number in the TLB, it's called a TLB hit, and we don't need to go to the page table.

Page Fault

Occurs when the referenced page is not present in main memory, regardless of whether the translation went through the TLB or the page table.
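The lookup order above (TLB, then page table, then page fault) can be sketched as follows; all the tables here are tiny made-up dictionaries:

```python
PAGE_SIZE = 4096

tlb = {0: 7}                 # VPN -> PFN; small and fast
page_table = {0: 7, 1: 3}    # VPN -> PFN; complete but in main memory

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                       # TLB hit: page table avoided
        pfn = tlb[vpn]
    elif vpn in page_table:              # TLB miss: walk the page table
        pfn = page_table[vpn]
        tlb[vpn] = pfn                   # cache the translation for next time
    else:                                # page not resident: page fault
        raise LookupError('page fault: VPN %d' % vpn)
    return pfn * PAGE_SIZE + offset

print(translate(100))        # TLB hit  -> 7*4096 + 100 = 28772
print(translate(4096 + 5))   # TLB miss -> 3*4096 + 5   = 12293
```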

Cache Hit


Page Replacement Algorithm, Thrashing

When there’s a page fault and OS must replace the page in physical memory

  • Belady's algorithm
    • Replace the page that will not be used for the longest time in the future
    • Useful as a baseline for evaluating how much a practical algorithm can improve
  • FIFO: first in, first out
    • In computer storage, Bélády's anomaly is the phenomenon in which increasing the number of page frames results in an increase in the number of page faults for certain memory access patterns. This phenomenon is commonly experienced when using the First-In First-Out (FIFO) page replacement algorithm. László Bélády demonstrated this in 1969.[1]
  • LRU: least recently used
    • cannot be done precisely, so we have to approximate it
    • e.g., periodically set the REF bit to 0
  • LRU clock
  • Victim buffer
    • pages chosen for eviction are put into a buffer first
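The FIFO anomaly can be checked with a small simulation; the reference string below is the classic one used to demonstrate it:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames = deque()
    faults = 0
    for page in refs:
        if page not in frames:           # page fault
            faults += 1
            if len(frames) == nframes:
                frames.popleft()         # evict the oldest resident page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9 faults
print(fifo_faults(refs, 4))  # 10 faults: more frames, yet more faults
```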

Working set

The "working set" is shorthand for "the parts of memory that the current algorithm is using" and is determined by which parts of memory the CPU happens to access. It is totally automatic to you. If you are processing an array and storing the results in a table, the array and the table are your working set.

Dynamic locality of process’s memory usage

Working set size is the number of pages in the working set

PFF: page fault frequency

Monitor the fault rate of each process and then decide whether to increase or reduce its memory

Disk seek policies

FCFS: do nothing, serve requests in arrival order

SSTF: shortest seek time first

SCAN: elevator

C-SCAN: elevator only in one direction
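A quick comparison of FCFS and SSTF on a made-up request queue (classic textbook numbers), measuring total head movement in cylinders:

```python
def fcfs_movement(head, requests):
    total = 0
    for r in requests:                   # serve strictly in arrival order
        total += abs(head - r)
        head = r
    return total

def sstf_movement(head, requests):
    pending = list(requests)
    total = 0
    while pending:                       # always serve the closest request
        nxt = min(pending, key=lambda r: abs(head - r))
        total += abs(head - nxt)
        head = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_movement(53, queue))  # 640 cylinders
print(sstf_movement(53, queue))  # 236 cylinders
```

SSTF cuts head movement sharply here, but it can starve far-away requests, which is what SCAN and C-SCAN address.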

Optimized File systems

Fast File System, FFS

FFS Main features:

Cylinder groups to improve locality.

Copy of superblock in each cylinder group (on different platters) to improve fault tolerance.

Free block bitmap per cylinder group,

supports allocation of consecutive blocks.

Larger blocks (multiple of 4kB) for improved throughput.

Clustering of I/O operations to further improve throughput[Pea88].

Fragments (1/2 – 1/8 blocks) to reduce fragmentation.

Pre-allocated inodes scattered throughout cylinder groups,

Bitmap free list for fast allocation.

Ability to rebuild free lists after crash (fsck).


Cylinder groups:


If possible, file direct-blocks and index blocks are allocated from same CG as inode,

up to a maximum (25% of CG capacity) to avoid squeezing out small files.

Attempt to allocate consecutive logical blocks rotationally optimal.

Aided by bitmap free list.

Attempt to allocate (plain) files’ inodes in same CG as directory.

Spread directories across CGs.



Last block of small file is actually array of consecutive fragments.

Fragments only allowed for files not using indirect blocks.

Fragment array is unaligned but must not span blocks.

When extending:

try allocating extra fragment from same block,

otherwise copy.

Allocation supported by using fragment granularity in free block bitmap.



Try to allocate logically contiguous blocks contiguously (up to 64kB).

Delay writes until ready to write whole cluster (if possible).

Read ahead full cluster for sequentially accessed files.

Reallocation of modified blocks to improve locality.

Separate cluster map to help finding clusters.


Variant: Linux Extended-2 File System (Ext2FS)

(Logical) block groups rather than cylinder groups.

Inode array in each block group.

Store (short) symbolic links in inode.

Preallocates contiguous blocks.



Ext2FS Disk layout


FFS Problem: Synchronous updates


Need to be able to recover consistent state of FS after crash.

Synchronous writes of metadata for consistency on disk:

On file creation:

Write inode before updating directory.

Write directory.

On file deletion:

Write updated directory before deallocating inode.

Write updated inode before freeing blocks in bitmap.

On file extension:

Write data block before inode (for security).

Write updated inode/indirect block before updating bitmap.


Similarly on file truncation.

Running fsck after crash rebuilds consistent bitmaps.

Synchronous I/O is the limiting factor when creating/deleting many files.



Delayed writes/soft updates[GP94]:

Consistency does not require synchronous writes.

Must ensure that dependent writes are kept in sequence.

E.g., ensure that on creation inode is written before bitmap.

Unrelated reads or writes can still occur in arbitrary order.

Need to:

enhance disk scheduler to honour write dependencies,

use copy-on-write on source blocks to avoid blocking further modifications.


Slightly more loss of user data during crash.

Much more complicated disk scheduling and interrupt processing.

Disks can reorder write requests internally



1.  Explain the difference between internal and external fragmentation


Internal fragmentation is the wasted space within each allocated block because of rounding up from the actual requested allocation to the allocation granularity. External fragmentation is the various free-space holes that are generated in either your memory or disk space. Externally fragmented blocks are available for allocation, but may be too small to be of any use.








Before going abroad I made an agreement with myself: unless directly quoting a proper noun, not a single English word may appear in a Chinese sentence I speak. Dropping things like "check in" or "my professor" half in English into a Chinese sentence is a bad habit.