hqman

Six Lessons from Dropbox's Python Development (Syncing One Million Files Every Fifteen Minutes)


Dropbox saves one million files every 15 minutes, more than even Twitter users tweet. That mind-blowing statistic was revealed by Rian Hunter, a Dropbox engineer, in his PyCon 2011 presentation "How Dropbox Did It and How Python Helped".

The first part of the presentation covers Dropbox lore: origin stories and other foundational myths. We learn that Dropbox is a San Francisco startup behind what is probably one of the most popular file synchronization and sharing tools in the world, shipping Python on the desktop and supporting millions of users, with more every day.

About halfway through, the talk turns technical. Not much was revealed about how Dropbox handles its massive scale, but there were a number of good lessons to ponder:

  1. Use Python
    • 99.9% of their code is in Python: the server backend, desktop client, website controller logic, API backend, and analytics.
    • They can't use Python on Android due to memory constraints.
    • A single Python code base runs on Windows, Mac, and Linux, using tools like PyObjC, wxPython, ctypes, py2exe, py2app, and PyWin32.
    • Pros: 
      • Developers talk to each other and express ideas in Python
      • Easy to learn, easy to read, easy to write, easy for new people to pick up.
    • Cons: 
      • Don't be silly. 
      • OK, it can use too much memory and be too slow. That is not a big deal on the server side: just buy bigger machines. On the client side, though, you can't get an old PowerPC user to upgrade.
      • Coding in a mixed Python and C environment creates problems, because it's hard to profile across the language boundary, which is exactly what you need to do when fixing memory and CPU problems.
      • Memory fragmentation issues are one reason why scripting languages may not be a good fit for long-running processes.
  2. Just Work Baby
    • Shouldn't matter what file system you are on, what OS you are using, what applications you are using. The product should always just work.
    • Python helped them iterate fast through all the different error cases they experienced on the wide variety of platforms they support.
  3. Release Early
    • Code something in a day and release it. Python makes that easy.
  4. Use C for Inner Loops - Optimizing CPU is easy
    • A way to handle the too slow problem.
    • Optimize inner loops to reduce CPU time. 
    • Looping in Python rather than C took 2.88 s vs. 1.61 s, so about 44% of the Python version's runtime is loop overhead ((2.88 - 1.61) / 2.88 ≈ 0.44).
    • Python VM bytecode dispatches are really slow. 
    • Many tools exist for profiling CPU. 
    • CPU optimizations are usually limited to small code sections.
  5. Poll - Polling 30 Million Clients All Over the World Doesn't Scale
    • Built an HTTP-based notification system so that clients don't have to keep polling the server.
  6. Custom Memory Allocator - Optimizing Memory is Hard
    • This was their biggest problem for a while. The client could use huge amounts of memory that was never freed: a large sync could take up to 1.5 GB, whereas now it rarely uses more than 100 MB.
    • Hard because: 
      • Few tools exist for profiling memory for Python and C
      • Memory bloat has so many causes: leaks in Python and C code; memory fragmentation; inefficient use of memory.
    • Fixing obvious memory inefficiencies didn't help. They thought there was a memory leak, but there wasn't.
    • The problem turned out to be memory fragmentation. Fragmentation happens when blocks of different sizes are continually allocated and freed: free memory ends up scattered in small gaps, so large contiguous blocks can no longer be carved out of it. CPython has no compacting garbage collector to move live objects together, so this fragmented memory could not be reused, and the heap kept growing to satisfy new requests.
    • The solution was a custom allocator. The file metadata objects grow a lot during transfers, so the obvious low-hanging fruit was a custom allocator for them, written in C on top of mmap.
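To make lesson 4's point about bytecode dispatch concrete, here is a minimal sketch (not Dropbox's code) that times a hand-written Python loop against the built-in `sum()`, whose loop runs in C inside CPython. The function names are hypothetical, and the measured ratio depends on the machine and Python version, so it should not be read as reproducing the 2.88 s vs. 1.61 s figures from the talk.

```python
import timeit


def python_loop_sum(values):
    # Hypothetical helper: every iteration executes several bytecodes
    # (load, add, store, jump), each going through the interpreter's dispatch loop.
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    # sum() runs its loop inside CPython's C implementation,
    # so there is no per-element bytecode dispatch.
    return sum(values)


if __name__ == "__main__":
    data = list(range(200_000))
    assert python_loop_sum(data) == builtin_sum(data)
    t_py = timeit.timeit(lambda: python_loop_sum(data), number=20)
    t_c = timeit.timeit(lambda: builtin_sum(data), number=20)
    print(f"Python loop: {t_py:.3f}s  builtin sum (C loop): {t_c:.3f}s")
    # Same arithmetic as the talk's figure: (slow - fast) / slow.
    print(f"Share of the Python loop's time that is overhead: {1 - t_c / t_py:.0%}")
```

The same pattern applies to real inner loops: profile first, then move only the hot loop into C (or into a C-implemented built-in), leaving the rest in Python.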
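Lesson 5 only says Dropbox replaced polling with an HTTP notification system. One common way to build that is long polling: the client asks "has anything changed since version N?" and the server holds the request open until something changes or a timeout expires. The sketch below is an assumption about the general technique, not Dropbox's design; `ChangeNotifier` and its methods are hypothetical, and the HTTP layer is left out, with an in-process `threading.Condition` standing in for the server.

```python
import threading
from typing import Optional


class ChangeNotifier:
    """Hypothetical in-process stand-in for a long-poll notification server."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0

    def publish_change(self) -> int:
        # Called when a file changes server-side; bumps the version and wakes all waiters.
        with self._cond:
            self._version += 1
            self._cond.notify_all()
            return self._version

    def wait_for_change(self, since: int, timeout: float) -> Optional[int]:
        # Long-poll: block until the version is newer than `since`,
        # then return it; return None if nothing changed within `timeout` seconds.
        with self._cond:
            changed = self._cond.wait_for(lambda: self._version > since, timeout=timeout)
            return self._version if changed else None


if __name__ == "__main__":
    notifier = ChangeNotifier()
    threading.Timer(0.1, notifier.publish_change).start()
    print(notifier.wait_for_change(since=0, timeout=5.0))  # 1
    print(notifier.wait_for_change(since=1, timeout=0.2))  # None
```

The design choice is that an idle client costs the server one parked request instead of a steady stream of "anything new?" round trips, which is what stops scaling at tens of millions of clients.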
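Lesson 6's allocator was written in C on top of mmap, and the talk does not describe its internals. As a hedged illustration of why a dedicated allocator can sidestep fragmentation, here is a Python sketch of one classic approach, a fixed-size slot pool (slab) carved out of a single anonymous `mmap` region. The slot-pool design and the `FixedSizePool` name are assumptions for illustration: because every slot has the same size, any freed slot can be reused by the next allocation, so the pool never accumulates unusable gaps.

```python
import mmap


class FixedSizePool:
    """Hypothetical slab-style pool: one mmap'd region split into equal-sized slots."""

    def __init__(self, slot_size: int, slot_count: int) -> None:
        self.slot_size = slot_size
        # Anonymous mapping: memory comes straight from the OS, outside malloc's heap.
        self._region = mmap.mmap(-1, slot_size * slot_count)
        self._view = memoryview(self._region)
        # Free list of slot offsets, ordered so pop() hands out offset 0 first.
        self._free = [i * slot_size for i in range(slot_count - 1, -1, -1)]

    def alloc(self) -> int:
        if not self._free:
            raise MemoryError("pool exhausted")
        return self._free.pop()

    def free(self, offset: int) -> None:
        # A freed slot fits any future request exactly, so no gaps are left behind.
        self._free.append(offset)

    def write(self, offset: int, data: bytes) -> None:
        if len(data) > self.slot_size:
            raise ValueError("record larger than slot")
        self._view[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._view[offset:offset + length])

    def close(self) -> None:
        # The memoryview must be released before the mmap can be closed.
        self._view.release()
        self._region.close()


if __name__ == "__main__":
    pool = FixedSizePool(slot_size=64, slot_count=4)
    a = pool.alloc()
    pool.write(a, b"file-metadata")
    print(pool.read(a, 13))  # b'file-metadata'
    pool.free(a)
    print(pool.alloc() == a)  # True: the freed slot is reused
    pool.close()
```

In C the same idea would hand out raw pointers into the mapping; the trade-off is wasted space when records are much smaller than a slot, which is why such pools are usually reserved for one hot object type, like the transfer metadata mentioned above.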
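Lesson 6 notes that few tools existed for profiling Python memory. Since Python 3.4 the standard library has included `tracemalloc`, which can compare snapshots and attribute memory growth to source lines. The sketch below uses a hypothetical workload (`fill_metadata_cache`) to show the basic pattern. One caveat relevant to the Dropbox story: `tracemalloc` only sees allocations made through Python's own allocators, so it can confirm whether live Python objects are growing, but it cannot show heap fragmentation directly.

```python
import tracemalloc


def fill_metadata_cache(cache: dict, n: int) -> None:
    # Hypothetical workload: keeps a small metadata record for every file it sees.
    for i in range(n):
        cache[f"/photos/{i}.jpg"] = {"size": i, "rev": "x" * 32}


if __name__ == "__main__":
    tracemalloc.start()
    cache: dict = {}
    before = tracemalloc.take_snapshot()
    fill_metadata_cache(cache, 50_000)
    after = tracemalloc.take_snapshot()
    # Group allocation differences by source line, largest growth first.
    for stat in after.compare_to(before, "lineno")[:3]:
        print(stat)
    current, peak = tracemalloc.get_traced_memory()
    print(f"current={current / 1e6:.1f} MB  peak={peak / 1e6:.1f} MB")
    tracemalloc.stop()
```

If the traced total stays flat while the process's resident size keeps climbing, that points away from a leak and toward fragmentation, which matches the diagnosis described in lesson 6.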

Future Directions

  • Dropbox on toasters. File sharing on toasters will be really big.
  • They see folders as a unifying metaphor for storing, organizing, and accessing data in the cloud and on any device, anywhere, anytime. 


